# 「Cryptography-MIT6875」: Lecture 4

In this series, I will learn MIT 6.875, Foundations of Cryptography, lectured by Vinod Vaikuntanathan.
Any corrections and advice are welcome. ^ - ^

Topics:

• Stateless Secret-key Encryption with PRF.
• The Goldreich-Goldwasser-Micali (GGM) PRF construction.

# Recap

In the previous blog, we introduced the second definition of PRG, Next-bit Unpredictability.

Using hybrid argument, we achieved the predicting-to-distinguishing reduction to prove Next-bit Unpredictability = Indistinguishability for PRGs.

We elaborated the theorem of PRG Length Extension and used it to construct stateful secret-key encryption scheme.

For the purpose of stateless encryption, we brought about a new notion, Pseudorandom Functions (PRFs).

# Agenda

In the first place, we’ll finish up the the stateless secret-key encryption scheme.

The last blog elaborated the theorem of PRG Length Extension, if there is a PRG that stretched by one bit, there is one that stretched by poly. many bits.

In this blog, we’ll introduce the theorem of PRF, if there is a PRG, then there is a PRF.

In addition, we will elaborate the Goldreich-Goldwasser-Micali (GGM) construction.

# PRF

## Definition

Recall the definition of PRF.

Collection of the Pseudorandom Functions:

Consider the collection of pseudorandom functions $\mathcal{F}_l=\{f_k:\{0,1\}^l\rightarrow\{0,1\}^m\}_{k\in\{0,1\}^n}$, each of which maps $l$ bits to $m$ bits.

• indexed by a key $k$.
• $n$ : key length, $l$: input length, $m$: output length.
• Independent parameters, all poly(sec-param)=poly(n).
• #functions in $\mathcal{F}_l\le 2^n$ (single exponential in $n$)

Collection of ALL Functions:

Consider the collection of ALL functions $ALL_l=\{f:\{\ 0,1\}^l\rightarrow \{0,1\}^m\}$, each of which is maps $l$ bits to $m$ bits.

• #functions in $ALL_l\le 2^{m2^l}$ (doubly exponential in $l$)

The #functions in the pseudorandom collection is much less than that in the random collection.

But the pseudorandom functions should be “indistinguishable” from random.

There are two worlds, the pseudorandom world and the random world.

• The pseudorandom world
1. Sample a function $f$ from $\mathcal{F}_l$.
The function $f$ is picked one and stays fixed for all. The oracle is instantiated once the $f$ is picked.
2. The oracle responses $f(x)$ for each query $x$.
You can think there is a truth table of $f$ in the oracle.
It responses the same $f(x)$ for the same query $x$ since the function $f$ is fixed.
• The random world
1. sample a function $f$ from $ALL_l$.
Likewise, the function $f$ is picked one and stays fixed for all. The oracle is instantiated once the $f$ is picked.
2. The oracle responses $f(x)$ for each query $x$.
Likewise, it responses the same $f(x)$ for the same query $x$ since the function $f$ is fixed.
• The Distinguisher $D$ has the power of querying the oracle many poly. times and try to guess which world she is in, the pseudorandom world or the random world.

Definition of PRF:

For all p.p.t. $D$, there is a negligible function $\mu$ s.t.

$$|\operatorname{Pr}[f\leftarrow \mathcal{F}_l:D^f(1^n)=1]-\operatorname{Pr}[f\leftarrow ALL_l:D^f(1^n)=1]|\le \mu(n)$$

Notation: Actually, $D$ dose not have a challenge since she gets nothing as input except the secure parameter, but she has the power of querying the oracle many poly. times.

## The Relation in PRG and PRF

Ponder over the relation in PRG and PRF.

They both expand a few random bits into many pseudorandom bit.

PRG expands $n$ bits into $p(n)$ bits while PRF expands $n$ bits into $2^l$ bits that $l=p(n)$.

With a PRG, accessing $2^l$-th bit takes time $2^l$ time. With a PRF, this can be done in time $l$.

So, a PRF = locally accessible (or, random-access) PRG.

## Stateless-key Encryption

We can design stateless secret-key encryption from PRF.

• $Gen(1^n)$: Generate a random $n$-bit key $k$ that defines $f_k:\{0,1\}^l\rightarrow \{0,1\}^m$.
The domain size of $f_k$, $2^l$, had better be super-polynomially large in $n$.
• $Enc(k,m)$: Pick a random $x$ and let the ciphertext $c$ be the pair $(x,y=f_k(x)\oplus m)$.
It’s a polynomial time to evaluate $f_k(x)$ since $f_k$ are random accessible.
• $Dec(k,c=(x,y))$: Output $f_k(x)\oplus y$.

Instead of remembering the whole truth table of $f$, we remember just the key $k$.

Notation: We cannot pick up the same $x$ for the different messages since one-time pad cannot be reused.

Therefore, the domain size of $f_k$ is supposed be super-polynomially large in $n$ to prevent collision of $x$.

### Security of Secret-Key Enc. (for one msg)

In [the second blog] of this series, we defined the computationally indistinguishability of the secret-key encryption scheme.

There are two mathematical forms of Computational Indistinguishability. We can use them to define the security of the secret-key encryption from PRF respectively.

There are two worlds, the left world and the right world.

• They both sampled a key $k$ using $Gen(1^n)$. The left world uses the key to encrypt $m_0$ while the right world uses the key to encrypt $m_1$ using $Enc(k,m)$.
• The distinguisher $D$ gets the ciphertext $c=Enc(k,m)$ and tries to guess which world she is in.

We can get the definitions.

Definition 1

$\begin{array}{l}\text{For all }m_0,m_1,\text{and all p.p.t. }D,\text{there is a negligible function }\mu \text{ s.t.} \\ \operatorname{Pr}[k\leftarrow \mathcal{K}; b\leftarrow \{0,1\}:D(Enc(k,m_b))=b] \le 1/2 + \mu(n)\end{array}$

Definition 2

$\begin{array}{l}\text{For all }m_0,m_1,\text{and all p.p.t. }D,\text{there is a negligible function }\mu \text{ s.t.} \\ \mid \operatorname{Pr}[k\leftarrow \mathcal{K}:D(Enc(k,m_0))=1] -\operatorname{Pr}[k\leftarrow \mathcal{K}:D(Enc(k,m_1))=1]\mid \le \mu(n)\end{array}$

The two definitions are substantially equivalent. The first emphasizes the probability of $D$ guessing correctly while the second emphasized the probability of $D$ guessing she is the world 1.

However, the definition indicates that the distinguisher only sees the ciphertext of one message, $m_0$ or $m_1$, which contradicts the setting that distinguisher has the power of querying the oracle poly. many times.

So this definition only defines the secret-key encryption scheme for one message, which encrypts different messages with different keys, such as one-time pad.

### Security of Secret-Ket Enc. (for many msgs)

We need to modify it for many messages since the oracle is an interactive oracle.

There are two oracles.

• The Left Oracle $Left(\cdot,\cdot)$
• Sample a key $k$ .
The oracle is instantiated once is key is picked.
• For each query $(m_L, m_R)$, the left oracle uses the key $k$ to encrypt the left message $m_L$ by $c:=Enc(k,m_L)$ and responses with $c$.
• The Right Oracle $Right(\cdot,\cdot)$
• Sample a key $k$.
The oracle is instantiated once is key is picked.
• For each query $(m_L,m_R)$, the right oracle uses the key $k$ to encrypt the right message $m_R$ by $c:=Enc(k,m_R)$ and responses with $c$.
• The distinguisher $D$ has the power of asking the oracle many poly. times.
For each query, $D$ gets the ciphertext and tries to guess with which oracle she is interacting (or which message is encrypted).

We can get the secure definitions of the secret-key encryption for many messages.

Definition :

For all p.p.t. $D$, there is a negligible function $\mu$ s.t.
$$\mid \operatorname{Pr}[k\leftarrow \mathcal{K}:D^{Left(\cdot,\cdot)}(1^n)=1] -\operatorname{Pr}[k\leftarrow \mathcal{K}:D^{Left(\cdot,\cdot)}(1^n)=1]\mid\le \mu(n)$$

Crucial Supplement:

Actually, the definitions of security we gave above are not achievable including for one messages and for many messages.

How can we attack them ?

Length Attack. We can use messages of different lengths.

For the secret-key encryption for one message, if $|m_0|\ne|m_1|$, the adversary can distinguish easily only from the length of ciphertext.

For the secret-key encryption for many messages, if $|m_L\ne m_R|$, the adversary can distinguish it easily as well.

It’s a crucial point that the scheme is secure only if it hides all possible information, which many people often overlook in practice.

Therefore, the supplements for these definitions are

• secure definition for one message: $|m_0|=|m_1|$.
• secure definition for many messages: $|m_L|=|m_R|$

### Security Analysis

We’ll prove the definition by hybrid argument.

Proof:

• We know two oracles.
• Left oracle: $c=(x,y=f_k(x)\oplus m_L)$
• Right oracle: $c=(x,y=f_k(x)\oplus m_R)$
• The thing we want to prove is $D$ gets $c$ and cannot distinguish with which oracle she is interacting.
• We want to change the oracle a little bit to define a sequence of hybrid distributions.
Consider the ciphertext as the distribution.
• Hybrid 0: $D$ gets access to the Left oracle.
$c=(x,y=f_k(x)\oplus m_L)$
• Hybrid 1: Replace $f_k$ by a random function.
$c=(x,y=r_x\oplus m_L)$
• Hybrid 2: Replace $f_k$ by a random function.
$c=(x,y=r_x)$
• Hybrid 3: Replace $f_k$ by a random function (like Hybrid 1).
$c=(x,y=r_x\oplus m_R)$
• Hybrid 4: $D$ gets access to the Right oracle (like Hybrid 0).
$c=(x,y=f_k(x)\oplus m_R)$
• The thing we want to prove is that Hybrid 0 and Hybrid 4 are indistinguishable.
• Prove Hybrid 0 = Hybrid 1 (and Hybrid 4 = Hybrid 3).
• It can be proved by PRF security.
• The definition of PRF indicates $D$ cannot distinguish from the pseudorandom world and the random world.
• Prove Hybrid 1 = Hybrid 2 (and Hybrid 3 = Hybrid 2).
• It can be proven by birthday paradox.
• Suppose $D$ has asked the oracle for $q$ times and gets $q$ ciphertexts.
• The oracle in Hybrid 2 only outputs the function value $r_x$ without $\oplus$ operation.
If $D$ gets two same ciphertexts, $D$ knows that she is interacting with the oracle in Hybrid 2.
• The oracle in Hybrid 1 is one-time pad.
If $D$ gets two ciphertexts with the same $x$ but different $y$, $D$ knows that she is interacting with the oracle in Hybrid 1.
• The probability of $D$ distinguishing from Hybrid 1 and Hybrid 2 is up to the collision probability of $x$.
• The domain size of $x$ is $2^l$ that $l$ is super-polynomially large in $n$.
• The probability of picking the same $x$ is at most $q^2/2^l$, which is negligible.
There are $q^2$ possible pairs to match any value in the domain size of $x$.
• The adjacent hybrid distributions are all indistinguishable, so Hybrid 0 and Hybrid 4 are indistinguishable.
• QED.

# Theorem of PRF

Finally, we can get into the theorem of PRF.

(Tired but Happy ^ - ^)

## PRG Length Extension

Before start, let’s look back at PRG Length Extension.

Theorem

Let $G:\{0,1\}^n \rightarrow \{0,1\}^{n+1}$ be a PRG.

Then, for every polynomial $m(n)$, there is a PRG $G':\{0,1\}^n\rightarrow\{0,1\}^{m(n)}$.
Recall the construction of the last blog.

Construction: $G(s)=G_0(s)||G_1(s)$ where $G_0(s)$ is 1 bit and $G_1(s)$ is $n$ bits.

We parse the output of PRG as 1 bit and $n$ bits. The 1 bit is as the output and the $n$ bits are as the seed of next call.

Write the process as a tree.

The problem is accessing the $i$-th output bit takes time linear in $\approx i$.

## Goldreich-Goldwasser-Micali PRF

Theorem:

Let $G$ be a PRF.
Then, for every polynomials $l=l(n),m=m(n)$, there exists a PRF family $\mathcal{F}_l=\{f_s:\{0,1\}^l\rightarrow \{0,1\}^m\}_{s\{0,1\}^n}$.

Note: We will focus on $m=l$. The output length could be made smaller (by truncation) or larger (by expansion with a PRG).

### Construction

Construction: Let $G(s)=G_0(s)||G_1(s)$ where $G_0(s)$ and $G_1(s)$ are both $n$ bits each.

So $G:\{0,1\}^n\rightarrow \{0,1\}^{2n}$ is a PRG that stretches $n$ bits into $2n$ bits and parses the output in half.

Notation: $G_0(s),G_1(s)$ are not functions in $s$. $G_0(s)$ represents the first $n$ bits of $G(s)$’s output and $G_1(s)$ represents the second $n$ bits of $G(s)$’s output.

Write it as a tree as well.

The depth is $l$ so there are $2^l$ leaves or paths.

So each path/leaf labeled by $x\in\{0,1\}^l$ corresponds to $f_s(x)$.

The pseudorandom function family $\mathcal{F}_l$ is defined by a collection of functions $f_s$ where:

$$f_s(x_1x_2\dots x_l)=G_{x_l}(G_{x_{l-1}}(\dots G_{x_1}(s)))$$

If $G:\{0,1\}\rightarrow \{0,1\}^2$

• $f_s$ define $2^l$ presudorandom bits.
• The $x$-th bit can be computed using $l$ evaluations of the PRG $G$.
(as opposed to $x \approx 2^l$ evaluations as before)

## Security Analysis of GGM PRF

### PRG Repetition Lemma

Before proving the security of GGM PRF, let’s introduce a PRG Repetition Lemma, which is the component of the following proof.

Lemma:

Let $G$ be a PRG.

Then, for every polynomials $L=L(n)$, the following two distributions are computationally indistinguishable:

$$(G(s_1),G(s_2),\dots,G(s_L)\approx (u_1,u_2,\dots,u_L)$$

This lemma can be proved easily by hybrid argument since the hybrid distributions are easy to construct.

If there is a p.p.t. distinguisher between the two distributions with distinguishing advantage $\varepsilon$, then there is a p.p.t. distinguisher for $G$ with advantage $\ge \varepsilon/L$.

By hybrid argument, we can achieve the PRG reduction.

### Step 1-Hybrid Argument

Assume that there is a p.p.t. $D$ and a poly. function $p$ s.t.
$|\operatorname{Pr}[f\leftarrow \mathcal{F}_l:D^f(1^n)=1]-\operatorname{Pr}[f\leftarrow ALL_l:D^f(1^n)=1]|\ge 1/p(n)$

Under the assumption, we can derive a contradiction to the security of PRG.

The key idea is argument by levels of the tree. Very ingenious!

For simplicity, we consider the PRG of GGM PRF is $G:\{0,1\}\rightarrow \{0,1\}^2$, so the GGM PRF generate $2^l$ pseudorandom bits.

Consider the hybrid distributions of the $2^l$ bits generated by a tree.

Hybrid 0 (The Pseudorandom World):

In the pseudorandom world, we consider the distribution of $2^l$ bits generated by a GGM tree as Hybrid 0.

For each query $x$, the oracle responses with $b_x$ by $l$ evaluations of PRG.

Hybrid 1:

We can get Hybrid 1 by changing a little to the GGM tree in Hybrid 0.

The difference is the first level of the tree in Hybrid 1, $s_0$ and $s_1$, are random while they are generated by PRG in Hybrid 0.

Hybrid 2:

We can get Hybrid 2 by changing Hybrid 1 a little.

The second level of tree in Hybrid 2, $s_{00}, s_{01}, s_{10}, s_{11}$, are random while they are generated by PRG in Hybrid 1.

Hybrid $l$ (The Random World):

We can define a sequence of hybrids slowly by changing a level of tree each time.

At last, we get the hybrid $l$, the random world, the $l$-th level of which are all random bits.

These hybrids can be efficiently computable using lazy evaluation.

Lazy evaluation means that there is no need to evaluation all bits $b_1b_2\dots b_{2^l}$, the oracle only evaluates $b_x$ for each query $x$.

Let $p_i=\operatorname{Pr}[f\leftarrow H_i:D^f(1^n)=1]$.

We know $p_0-p_l\ge \varepsilon$.

By a hybrid argument, there is some $i$ s.t. $p_i-p_{i+1}\ge \varepsilon/l$.

### Step 2-Reduction

Now we’ll prove that if a distinguisher with advantage $\varepsilon/l$ between hybrids implies a distinguisher with advantage $\ge \varepsilon/ql$ for the PRG. (where $q$ is the number of queries that $D$ makes)

Assume there is a p.p.t. $D$ distinguishing from Hybrid $i$ and Hybrid $i+1$ with advantage $\varepsilon/l$.

We will construct a Repeated Repetition Breaker $D’$ from the distinguisher $D$ so as to get a PRG Breaker from the PRG Repetition Lemma.

• The challenge of $D’$ : She gets a tuple of $(y_1,y_2,\dots, y_q)$ and tries to guess whether they are all pseudorandom or all random.
• $q$ is the upper bound number of queries that $D$ makes and $q=q(n)$.
• $(y_1,y_2,\dots, y_q)$ are either all pseudorandom or all random.
• Suppose each $y_i$ are $2n$ bits.
• Construction of Repeated PRG Breaker $D’$
1. Parse each $y_i$ in half: $y_i=(y_{i,L},y_{i,R})$.
2. Construct a oracle by putting these $y_i$ into the $i$-th level of the tree in order.
It’s a GGM tree only starting from the $i$-th level.
1. If $(y_1,y_2,\dots, y_q)$ are all pseudorandom, this oracle implies Hybrid $i$.
Each pair of $y_i$, $(y_{i,L},y_{i,R})$, actually has an implicit parent in the tree.
We don’t know it but it exists.
2. If $(y_1,y_2,\dots, y_q)$ are all random, this oracle implies Hybrid $i+1$.
3. This oracle can use lazy evaluation as well.
It only responses with $f(x)$ for the each coming query $x$.
3. Let the distinguisher $D$ interact with the oracle we build.
$D$ can make at most $q$ queries.
1. If $D$ distinguishes it as Hybrid $i$, $D’$ outputs ‘Pseudorandom’.
2. If $D$ distinguishes it as Hybrid $i+1$, $D’$ outputs ‘Random’.
• $D’$ has advantage with $\varepsilon/l$ for distinguishing repeated PRG.
• There is a distinguisher with advantage $\varepsilon/lq$ for PRG from the PRG Repetition Lemma.

## Nits of GGM PRF

Although the construction of GGM PRF is elegant, there are some nits.

• Expensive: $l$ invocations of a PRG.

• Sequential: bit by bit, $l$ sequential invocations of a PRG.

• Loss in security reduction:

break PRF with advantage $\varepsilon$ → break PRG with advantage $\varepsilon/ql$,

where $q$ is an arbitrary polynomial. $q =$ # queries of the PRF distinguisher.

So some new questions come into mind.

Is there tighter reduction ?

How to avoid the loss ?

「Cryptography-MIT6875」: Lecture 4

https://f7ed.com/2022/07/08/mit6875-lec4/

f7ed

2022-07-08

2022-08-12