# 「Cryptography-MIT6875」: Lecture 4

Topics:

- Stateless Secret-key Encryption with PRF.
- The Goldreich-Goldwasser-Micali (GGM) PRF construction.

# Recap

In the previous blog, we introduced the second definition of PRG, **Next-bit Unpredictability.**

Using hybrid argument, we achieved the predicting-to-distinguishing reduction to prove **Next-bit Unpredictability = Indistinguishability for PRGs.**

We elaborated the **theorem** of __PRG Length Extension__ and used it to construct stateful secret-key encryption scheme.

For the purpose of stateless encryption, we brought about a new notion, **Pseudorandom Functions (PRFs).**

# Agenda

In the first place, we’ll finish up the the stateless secret-key encryption scheme.

The last blog elaborated the **theorem of PRG Length Extension**, if there is a PRG that stretched by one bit, there is one that stretched by poly. many bits.

In this blog, we’ll introduce the **theorem** of PRF, __if there is a PRG, then there is a PRF.__

In addition, we will elaborate the __Goldreich-Goldwasser-Micali (GGM) construction.__

# PRF

## Definition

Recall the definition of PRF.

**Collection of the Pseudorandom Functions:**

Consider the **collection** of pseudorandom functions $\mathcal{F}_l=\{f_k:\{0,1\}^l\rightarrow\{0,1\}^m\}_{k\in\{0,1\}^n}$, each of which maps $l$ bits to $m$ bits.

- indexed by a key $k$.
- $n$ : key length, $l$: input length, $m$: output length.
- Independent parameters, all poly(sec-param)=poly(n).
- #functions in $\mathcal{F}_l\le 2^n$ (single exponential in $n$)

**Collection of ALL Functions:**

Consider the **collection** of ALL functions $ALL_l=\{f:\{\ 0,1\}^l\rightarrow \{0,1\}^m\}$, each of which is maps $l$ bits to $m$ bits.

- #functions in $ALL_l\le 2^{m2^l}$ (doubly exponential in $l$)

The #functions in the pseudorandom collection is much less than that in the random collection.

But the pseudorandom functions should be “**indistinguishable**” from random.

There are two worlds, the pseudorandom world and the random world.

**The pseudorandom world**- Sample a function $f$ from $\mathcal{F}_l$.

The function $f$ is__picked one and stays fixed__for all. The oracle is instantiated once the $f$ is picked. - The oracle responses $f(x)$ for each query $x$.

You can think there is a truth table of $f$ in the oracle.

It responses the same $f(x)$ for the same query $x$ since the function $f$ is fixed.

- Sample a function $f$ from $\mathcal{F}_l$.
**The random world**- sample a function $f$ from $ALL_l$.

Likewise, the function $f$ is__picked one and stays fixed for all__. The oracle is instantiated once the $f$ is picked. - The oracle responses $f(x)$ for each query $x$.

Likewise, it responses the same $f(x)$ for the same query $x$ since the function $f$ is fixed.

- sample a function $f$ from $ALL_l$.
- The Distinguisher $D$
__has the power of querying the oracle many poly. times__and__try to guess which world__she is in, the pseudorandom world or the random world.

## The Relation in PRG and PRF

Ponder over the relation in PRG and PRF.

They both expand a few random bits into many pseudorandom bit.

PRG expands $n$ bits into $p(n)$ bits while PRF expands $n$ bits into $2^l$ bits that $l=p(n)$.

With a PRG, accessing $2^l$-th bit takes time $2^l$ time. With a PRF, this can be done in time $l$.

So, a __PRF = locally accessible (or, random-access) PRG.__

## Stateless-key Encryption

We can design stateless secret-key encryption from PRF.

- $Gen(1^n)$: Generate a random $n$-bit key $k$ that defines $f_k:\{0,1\}^l\rightarrow \{0,1\}^m$.

The domain size of $f_k$, $2^l$, had better be super-polynomially large in $n$. - $Enc(k,m)$: Pick a
**random**$x$ and let the ciphertext $c$ be the pair $(x,y=f_k(x)\oplus m)$.

It’s a polynomial time to evaluate $f_k(x)$ since $f_k$ are random accessible. - $Dec(k,c=(x,y))$: Output $f_k(x)\oplus y$.

Instead of remembering the whole truth table of $f$, __we remember just the key $k$.__

__Notation__: We **cannot** pick up the same $x$ for the different messages since one-time pad cannot be reused.

Therefore, the **domain size** of $f_k$ __is supposed be super-polynomially large__ in $n$ to prevent collision of $x$.

### Security of Secret-Key Enc. (for one msg)

In [the second blog] of this series, we defined the computationally indistinguishability of the secret-key encryption scheme.

There are two mathematical forms of **Computational Indistinguishability**. We can use them to define the security of the secret-key encryption from PRF respectively.

There are two worlds, the left world and the right world.

- They both sampled a key $k$ using $Gen(1^n)$. The
**left world**uses the key to encrypt $m_0$ while the**right world**uses the key to encrypt $m_1$ using $Enc(k,m)$. - The distinguisher $D$ gets the ciphertext $c=Enc(k,m)$ and tries to guess which world she is in.

We can get the definitions.

*Definition 1*

*Definition 2*

The two definitions __are substantially equivalent.__ The first emphasizes the probability of $D$ guessing correctly while the second emphasized the probability of $D$ guessing she is the world 1.

However, the definition **indicates** that the __distinguisher only sees the ciphertext of one message,__ $m_0$ or $m_1$, which **contradicts** the setting that __distinguisher has the power of querying the oracle poly. many times.__

So this definition only **defines** the __secret-key encryption scheme for one message__, which encrypts different messages with different keys, such as one-time pad.

### Security of Secret-Ket Enc. (for many msgs)

We need to modify it __for many messages__ since the oracle is an **interactive** oracle.

There are two oracles.

- The Left Oracle $Left(\cdot,\cdot)$
- Sample a key $k$ .

The oracle is instantiated once is key is picked. - For each query $(m_L, m_R)$, the left oracle uses the key $k$ to
**encrypt the left message**$m_L$ by $c:=Enc(k,m_L)$ and responses with $c$.

- Sample a key $k$ .
- The Right Oracle $Right(\cdot,\cdot)$
- Sample a key $k$.

The oracle is instantiated once is key is picked. - For each query $(m_L,m_R)$, the right oracle uses the key $k$ to
**encrypt the right message**$m_R$ by $c:=Enc(k,m_R)$ and responses with $c$.

- Sample a key $k$.
- The distinguisher $D$ has the power of asking the oracle many poly. times.

For each query, $D$__gets the ciphertext and tries to guess with which oracle she is interacting__(or which message is encrypted).

We can get the secure definitions of the secret-key encryption for many messages.

### Security Analysis

We’ll prove the definition by **hybrid argument.**

**Proof:**

- We know two oracles.
- Left oracle: $c=(x,y=f_k(x)\oplus m_L)$
- Right oracle: $c=(x,y=f_k(x)\oplus m_R)$

- The thing we
**want to prove**is $D$__gets $c$ and cannot distinguish with which oracle she is interacting.__ - We want to
__change the oracle a little bit__to**define a sequence of hybrid distributions.**

Consider the ciphertext as the distribution.- Hybrid 0: $D$ gets access to the Left oracle.

$c=(x,y=f_k(x)\oplus m_L)$ - Hybrid 1: Replace $f_k$ by a random function.

$c=(x,y=r_x\oplus m_L)$ - Hybrid 2: Replace $f_k$ by a random function.

$c=(x,y=r_x)$ - Hybrid 3: Replace $f_k$ by a random function (like Hybrid 1).

$c=(x,y=r_x\oplus m_R)$ - Hybrid 4: $D$ gets access to the Right oracle (like Hybrid 0).

$c=(x,y=f_k(x)\oplus m_R)$

- Hybrid 0: $D$ gets access to the Left oracle.
- The thing we
**want to prove**is that__Hybrid 0 and Hybrid 4 are indistinguishable.__ - Prove
**Hybrid 0 = Hybrid 1**(and**Hybrid 4 = Hybrid 3**).- It can be proved by
**PRF security**. - The definition of PRF indicates $D$ cannot distinguish from the pseudorandom world and the random world.

- It can be proved by
- Prove
**Hybrid 1 = Hybrid 2**(and**Hybrid 3 = Hybrid 2**).- It can be proven by
**birthday paradox.** - Suppose $D$ has asked the oracle for $q$ times and gets $q$ ciphertexts.
- The oracle in Hybrid 2 only outputs the function value $r_x$ without $\oplus$ operation.

If $D$ gets two same ciphertexts, $D$ knows that she is interacting with the oracle in Hybrid 2. - The oracle in Hybrid 1 is one-time pad.

If $D$ gets two ciphertexts with the same $x$ but different $y$, $D$ knows that she is interacting with the oracle in Hybrid 1.

- The oracle in Hybrid 2 only outputs the function value $r_x$ without $\oplus$ operation.
- The probability of $D$
**distinguishing from**Hybrid 1 and Hybrid 2 is up to the collision probability of $x$.- The domain size of $x$ is $2^l$ that $l$ is super-polynomially large in $n$.
- The probability of picking the same $x$ is at most $q^2/2^l$, which is
__negligible__.

There are $q^2$ possible pairs to match any value in the domain size of $x$.

- It can be proven by
- The adjacent hybrid distributions are all indistinguishable, so Hybrid 0 and Hybrid 4 are indistinguishable.
- QED.

# Theorem of PRF

Finally, we can get into the theorem of PRF.

(Tired but Happy ^ - ^)

## PRG Length Extension

Before start, let’s look back at PRG Length Extension.

*Theorem*

Let $G:\{0,1\}^n \rightarrow \{0,1\}^{n+1}$ be a PRG.

Then, for every polynomial $m(n)$, there is a PRG $G':\{0,1\}^n\rightarrow\{0,1\}^{m(n)}$.

Recall the construction of the last blog.

Construction: $G(s)=G_0(s)||G_1(s)$ where $G_0(s)$ is 1 bit and $G_1(s)$ is $n$ bits.

We __parse the output of PRG as 1 bit and $n$ bits__. The 1 bit is as the output and the $n$ bits are as the seed of next call.

Write the process as a tree.

The **problem** is accessing the $i$-th output bit takes time __linear__ in $\approx i$.

## Goldreich-Goldwasser-Micali PRF

### Construction

Construction: Let $G(s)=G_0(s)||G_1(s)$ where $G_0(s)$ and $G_1(s)$ are both $n$ bits each.

So $G:\{0,1\}^n\rightarrow \{0,1\}^{2n}$ is a PRG that stretches $n$ bits into $2n$ bits and __parses the output in half.__

__Notation__: $G_0(s),G_1(s)$ are not functions in $s$. $G_0(s)$ represents the first $n$ bits of $G(s)$’s output and $G_1(s)$ represents the second $n$ bits of $G(s)$’s output.

Write it as a tree as well.

The depth is $l$ so there are $2^l$ leaves or paths.

So each path/leaf labeled by $x\in\{0,1\}^l$ corresponds to $f_s(x)$.

The pseudorandom function family $\mathcal{F}_l$ is defined by a collection of functions $f_s$ where:

$$

f_s(x_1x_2\dots x_l)=G_{x_l}(G_{x_{l-1}}(\dots G_{x_1}(s)))

$$

If $G:\{0,1\}\rightarrow \{0,1\}^2$

- $f_s$ define $2^l$ presudorandom bits.
- The $x$-th bit can be computed using $l$ evaluations of the PRG $G$.

(as opposed to $x \approx 2^l$ evaluations as before)

## Security Analysis of GGM PRF

### PRG Repetition Lemma

Before proving the security of GGM PRF, let’s introduce **a PRG Repetition Lemma**, which is the component of the following proof.

**Lemma:**

Let $G$ be a PRG.

Then, for every polynomials $L=L(n)$, the following two distributions are computationally indistinguishable:

$$

(G(s_1),G(s_2),\dots,G(s_L)\approx (u_1,u_2,\dots,u_L)

$$

This lemma can be proved easily by hybrid argument since the hybrid distributions are easy to construct.

If there is a p.p.t. __distinguisher between the two distributions__ with distinguishing advantage $\varepsilon$, then there is a p.p.t. __distinguisher for__ $G$ with advantage $\ge \varepsilon/L$.

By hybrid argument, we can __achieve the PRG reduction.__

### Step 1-Hybrid Argument

Prove it by contradiction.

Assume that there is a p.p.t. $D$ and a poly. function $p$ s.t.

$|\operatorname{Pr}[f\leftarrow \mathcal{F}_l:D^f(1^n)=1]-\operatorname{Pr}[f\leftarrow ALL_l:D^f(1^n)=1]|\ge 1/p(n)$

Under the **assumption**, we can __derive a contradiction to the security of PRG.__

The **key idea** is argument by **levels** of the tree. Very ingenious!

For simplicity, we consider the PRG of GGM PRF is $G:\{0,1\}\rightarrow \{0,1\}^2$, so the GGM PRF generate $2^l$ pseudorandom bits.

Consider the hybrid distributions of the $2^l$ bits generated by a tree.

**Hybrid 0 (The Pseudorandom World):**

In the pseudorandom world, we consider the distribution of $2^l$ bits generated by a GGM tree as Hybrid 0.

For each query $x$, the oracle responses with $b_x$ by $l$ evaluations of PRG.

**Hybrid 1:**

We can get Hybrid 1 by changing a little to the GGM tree in Hybrid 0.

The **difference** is the __first level of the tree in Hybrid 1,__ $s_0$ and $s_1$, are **random** while they are generated by PRG in Hybrid 0.

**Hybrid 2:**

We can get Hybrid 2 by changing Hybrid 1 a little.

The __second level of tree in Hybrid 2__, $s_{00}, s_{01}, s_{10}, s_{11}$, are **random** while they are generated by PRG in Hybrid 1.

…

**Hybrid $l$ (The Random World):**

We can define a sequence of hybrids slowly by changing a level of tree each time.

At last, we get the hybrid $l$, the random world, the $l$-th level of which are **all random bits.**

These hybrids can be **efficiently computable** using __lazy evaluation.__

Lazy evaluation means that there is no need to evaluation all bits $b_1b_2\dots b_{2^l}$, the oracle only evaluates $b_x$ for each query $x$.

Let $p_i=\operatorname{Pr}[f\leftarrow H_i:D^f(1^n)=1]$.

We know $p_0-p_l\ge \varepsilon$.

By a __hybrid argument__, there is some $i$ s.t. $p_i-p_{i+1}\ge \varepsilon/l$.

### Step 2-Reduction

Now we’ll **prove** that if a distinguisher with advantage $\varepsilon/l$ __between hybrids__ implies a distinguisher with advantage $\ge \varepsilon/ql$ __for the PRG__. (where $q$ is the number of queries that $D$ makes)

Assume there is a p.p.t. $D$ distinguishing from Hybrid $i$ and Hybrid $i+1$ with advantage $\varepsilon/l$.

We will construct **a Repeated Repetition Breaker $D’$** from the distinguisher $D$ so as to __get a PRG Breaker from the PRG Repetition Lemma.__

- The
**challenge**of $D’$ : She gets a tuple of $(y_1,y_2,\dots, y_q)$ and tries to guess__whether they are all pseudorandom or all random.__- $q$ is the upper bound number of queries that $D$ makes and $q=q(n)$.
- $(y_1,y_2,\dots, y_q)$ are either all pseudorandom or all random.
- Suppose each $y_i$ are $2n$ bits.

**Construction**of__Repeated PRG Breaker__$D’$- Parse each $y_i$ in half: $y_i=(y_{i,L},y_{i,R})$.
- Construct a
**oracle**by__putting these $y_i$ into the $i$-th level of the tree in order.__

It’s a GGM tree only starting from the $i$-th level.- If $(y_1,y_2,\dots, y_q)$ are all
**pseudorandom**, this__oracle implies Hybrid $i$.__

Each pair of $y_i$, $(y_{i,L},y_{i,R})$, actually has an implicit parent in the tree.

We don’t know it but it exists. - If $(y_1,y_2,\dots, y_q)$ are all
**random**, this__oracle implies Hybrid $i+1$.__ - This oracle can use lazy evaluation as well.

It only responses with $f(x)$ for the each coming query $x$.

- If $(y_1,y_2,\dots, y_q)$ are all
- Let the
**distinguisher**$D$__interact with the oracle we build.__

$D$ can make at most $q$ queries.- If $D$ distinguishes it as Hybrid $i$, $D’$ outputs ‘Pseudorandom’.
- If $D$ distinguishes it as Hybrid $i+1$, $D’$ outputs ‘Random’.

- $D’$ has advantage with $\varepsilon/l$ for
__distinguishing repeated PRG.__ - There is a
**distinguisher**with advantage $\varepsilon/lq$__for PRG__from the PRG Repetition Lemma.

## Nits of GGM PRF

Although the construction of GGM PRF is elegant, there are some nits.

Expensive: $l$ invocations of a PRG.

Sequential: bit by bit, $l$ sequential invocations of a PRG.

**Loss**in__security reduction:__break PRF with advantage $\varepsilon$ → break PRG with advantage $\varepsilon/ql$,

where $q$ is an arbitrary polynomial. $q =$ # queries of the PRF distinguisher.

So some new questions come into mind.

Is there tighter reduction ?

How to avoid the loss ?

「Cryptography-MIT6875」: Lecture 4