# 「Cryptography-MIT6875」: Lecture 6 - Number Theory

In this series, I will learn MIT 6.875, Foundations of Cryptography, lectured by Vinod Vaikuntanathan.
Any corrections and advice are welcome. ^ - ^
The motif of this blog is Number Theory, including the second half of Lecture 5 and the Lecture 6.

It’s an excellent opportunity to learn Number Theory in manner of English.

An important point in this blog is that we focus more on the statements, which is useful in later lectures, rather than the proof.

About 70% of the content in this blog is originally and literally from the lecture notes. I just organize and refine it according to the logic of the professor’s narration since the lecture note is awesome.

The rest is my own understanding and derivation of some theorems. And I will be learning Number Theory and completing the omitted proof.

So this blog will be updated continuously.

Topics covered:

• Groups, Order of a group and the Order of an element, Cyclic Groups.
• The Multiplicative Group $\mathbb{Z}_N^*$ and $\mathbb{Z}_P^*$ for a prime $P$.
• Generators of $\mathbb{Z}_P^*$.
• Primes, Primality Testing.
• The Discrete Logarithm (DLOG) problem and a candidate OWF.
• Diffie-Hellman assumptions: DDH and CDH.

# Groups

An Abelian group $\mathbb{G}=(S,\star)$ is a set $S$ together with a operation $\star :S\times S\rightarrow S$ which satisfies

• Identity: There is an element $\mathcal{I}\in S$ such that for $a\in S$, $a\star \mathcal{I}=\mathcal{I}\star a=a$.
• Inverse: For every $a\in S$, there is an element $b\in S$ such that $a\star b =b\star a=\mathcal{I}$.
• Associativity: For every $a,b,c\in S$, $a\star (b\star c)=(a\star b)\star c$.
• Commutativity: For every $a,b\in S$, $a\star b=b\star a$.

## Order of a group and the order of an element

• The order of a group is the number of elements in it, namely $|S|$.
• The order of an element $g\in S$ is the minimal number of times one has to perform the group operation on $g$ to get to the identity element $\mathcal{I}$.
That is, $\mathrm{ord}(g)=\min_{i>0}\{g^i=\mathcal{I}\}$.

Theorem 1 (Lagrange’s Theorem):

The order of any element divides the order of the group.

## Generator of a Group

A generator of a group $\mathbb{G}$ is an element of order $|\mathbb{G}|$. In other words,

$$\mathbb{G}=\{g,g^2,\dots,g^{|\mathbb{G}|}=\mathcal{I}\}$$

## Cyclic group

A group $\mathbb{G}$ is called cyclic if it has a generator.

Theorem 2:

Every group whose order is a prime number is cyclic.
Moreover, every element other than the identity is a generator.

Proof

## Discrete Logarithms

Let $\mathbb{G}$ be a cyclic group. We know that $\mathbb{G}$ has a generator $g$, and that every $h\in \mathbb{G}$ can be written as $h=g^x$ for a unique $x\in\{1,2,\dots,|\mathbb{G}|\}$.

We write

$$x=\operatorname{dlog}_g(h)$$

to denote the fact that $x$ is the discrete logarithm of $h$ to the base $g$.

We will look for groups where computing the group operation is easy (namely, polynomial time) but computing discrete logarithms is hard (namely, exponential or sub-exponential time).

Our source for such groups will come from number theory.

Discrete logarithms in $\mathbb{Z}_N$ are, for better or worse, easy.

# Baby (Computational) Number Theory

The complexity of basic operations with numbers. $n$ denotes the input length for each of these operations. Greatest Common Divisors:

The greatest common divisor (gcd) of positive integers $a$ and $b$ is the largest positive integer $d$ that divides both $a$ and $b$.

$a$ and $b$ are relatively prime if their gcd is 1.

# The Multiplicative Group $\mathbb{Z}_N^*$

The multiplicative group of numbers mod $N$, denoted $\mathbb{Z}_N^*$, consists of the set

$$S=\{1\le a< N:\operatorname{gcd}(a,N)=1\}$$

with multiplication mod $N$ being the operation.

Some further facts about $\mathbb{Z}_N^*$

• The order of $\mathbb{Z}_N^*$, the number of positive integers smaller than $N$ that are relatively prime to it, is called the Euler totient function of $N$ denoted $\varphi(N)$.

• $\varphi(p)=p-1$ if $p$ is prime.

• $\varphi(p^k)=p^k-p^{k-1}$ if $p$ is prime.

• $\varphi(pq)=\varphi(p)\varphi(q)$ if $\operatorname{gcd}(p,q)=1$.

• If $N=\prod _i p_i^{\alpha_i}$ is the prime factorization of $N$, then $\varphi(N)=\prod_i p_i^{\alpha_i-1}(pi-1)$.

# The Multiplicative Group $\mathbb{Z}_p^*$

$\mathbb{Z}_p^*$ is Cyclic.

The following theorem is a very important property of $\mathbb{Z}_p^*$ when $P$ is prime.

Theorem 4 :

If $P$ is prime, then $\mathbb{Z}_p^*$ is a cyclic group.

Proof

Note:

It is very tempting to prove this theorem by appealing the Theorem 2 which says that every group with prime order is cyclic. Be careful, and note that the order of $\mathbb{Z}_p^*$ is $P-1$, which is decidedly not prime.

There are several followup questions.

• How many generators are there for $\mathbb{Z}_p^*$ ?
• How to tell (efficiency) if a given element $g$ is a generator ?
• How to sample a random generator for $\mathbb{Z}_p^*$ ?

## $\mathbb{Z}_p^{*}$ and $\mathbb{Z}_{P-1}$

Before proceeding further, let us note the following structural fact about $\mathbb{Z}_P^*$.

There are two groups are isomorphic with an isomorphism $\phi$ that maps $x\in\mathbb{Z}_{P-1}$ to $g^x\in \mathbb{Z}_P^*$. [isomorphic: 同构的]

In particular, consider $\phi(x)=g^x \pmod P$.

We have $\phi(x+y)=\phi(x)\cdot \phi(y)$.

The isomorphism is efficiently computable in the forward direction (exponentiation, using the repeated squaring algorithm) but not known to be efficiently computable in the reverse direction.

For example, consider $\mathbb{Z}_7^*$ and $\mathbb{Z}_6$:

• $\mathbb{Z}_7^* = \{1,2,3,4,5,6\}$ and there is a generator $g=5$. You can get $\{g,g^2,g^3,g^4,g^5,g^6\} =\{5, 4,6,2,3,1\}$.
When you perform the multiplication on $g$, you will wrap around the group.
That’s an intuitive reason why discrete logarithm is hard in $\mathbb{Z}_p^*$.
• $\mathbb{Z}_6=\{1,2,3,4,5,6\}$ and the generator $g=1$. The group operation in $\mathbb{Z}_6$ is addition.
You can get $\{g,g^2,g^3,g^4,g^5,g^6\}=\{1,2,3,4,5,6\}$.
When you perform the addition on $g$, you just walk along the group.
• We can know that there is a one-to-one mapping $\phi$ from $\mathbb{Z}_6$ to $\mathbb{Z}_7^*$, that is $\mathbb{Z}_6\cong \mathbb{Z}_7^*$ .

Here is another quick application of this isomorphism:

Lemma :

Let $P$ be an odd prime.

If $g$ is a generator of $\mathbb{Z}_P^*$, then so is $g^x$ as long as $x$ and $P-1$ are relatively prime.

Proof:

The following proof is my own deduction since the proof is omitted in the lecture.
Corrections and advice are welcome.

There is a generator $g’$ in $\mathbb{Z}_{P-1}$corresponding to $g$ in $\mathbb{Z}_P^*$ from the isomorphism $\mathbb{Z}_P^*\cong \mathbb{Z}_{P-1}$.

$x$ and $P-1$ are relatively prime, and there are two statements.

In group $\mathbb{Z}_P^*$, if $g$ is a generator, then $g^x$ is a generator.

In group $\mathbb{Z}_{P-1}$, if $g’$ is a generator, then $xg’$ is a generator.

From the isomorphism, we can turn the question in $\mathbb{Z}_P^*$ to a relative question in $\mathbb{Z}_{P-1}$.

It’s easy to prove that if $x$ is a generator in $\mathbb{Z}_{P-1}$ (which can iterate all elements), then $ax$ is also a generator in $\mathbb{Z}_{P-1}$ (which can also iterate all elements) where $\operatorname{gcd}(a, P-1)=1$.

• The main idea is to suppose for a contradiction that there are two different $x_1$ and $x_2$,i.e. $x_1\ne x_2 \pmod{P-1}$, satisfying $ax_1=ax_2 \pmod{P-1}$.
• Then we know $P-1\mid a(x_1-x_2)$.
• From $\operatorname{gcd}(a, P-1)=1$, we know $P-1\mid x_1-x_2$.
• So $x_1= x_2 \pmod{P-1}$. (contradiction)
• QED.

As a corollary, we immediately derive the fact that $\phi(P-1)$ elements of $\mathbb{Z}_p^*$ are generators.

## $\mathbb{Z}_p^*$ has lots of generators

For the first question, $\mathbb{Z}_p^*$ has lots of generators.

Theorem 5:

The number of generators in $\mathbb{Z}_p^*$ is $\varphi(P-1)$.

But how large is $\varphi(P-1)$ asymptotically? This is answered by the following classical theorem.

Theorem 6:

For every integer $N$, $\varphi(N)=\Omega(N/\log \log N)$.

In other words, if you pick a random element of $\mathbb{Z}_p^*$, you will see a generator with probability

$$\varphi(P-1)/(P-1)=\Omega(1/\log\log P)$$

which is polynomial in $1/\log P$. So, reasonably often.[asymptotical: 渐进的]

Difference between Big $\mathcal{O}$ and Big $\Omega$:
Big $\mathcal{O}$ and Big $\Omega$ function are used in computer science to describe the performance or complexity of an algorithm.

• Big $\mathcal{O}$ is used to describe the worst case running time for an algorithm.
• But, Big $\Omega$ notation, on the other hand, is used to describe the best case running time for a given algorithm.

If you want to get a generator in $\mathbb{Z}_p^*$, you will hit a generator very quickly just keeping picking random in $\mathbb{Z}_p^*$.

Then it comes to the second question.

## How to tell a generator of $\mathbb{Z}_p^*$ ?

The answer is that you can tell in poly. time whether $g\in \mathbb{Z}_p^*$ is a generator given the factorization of $P-1$.

### Given the factorization of P-1​

At first, we know that a generator is an element of order $P-1$, so it’s easy to check whether $g^{P-1}=1 \pmod P$.

Then we need to check if there is some smaller power of $g$ that equals $1$ since the order is the minimal power.

However, there are a large number of divisors of $P-1$, roughly $P^{1/\log \log P}$ which is not polynomial (in $\log P$).

It turns out, you do not need to check all divisors, but rather only the terminal divisors (or, the maximal factors less than $P-1$).

If $P-1=\prod_i q_i^{\alpha_i}$, the terminal divisors are $(P-1)/q_i$ for each $i$.

For example:

• If $P-1=p_1^2p_2^3p_3^2$.
• Suppose $g=p_1^2p_2^3$, then $g^{p_1^2p_2^3}=1$.
You can get some powers of $g^{p_1^2p_2^3}$ that equals $1$.
And the power to $g^{p_1^2p_2^3}$ is the multiplication to the exponent $p_1^2p_2^3$.
• You can get $g^{p_1^2p_2^3p_3}=1$ which is smaller power than $P-1$.
• You can also get $g^{p_1^2p_2^3p_3^2}=1$ which is useless since $P-1=p_1^2p_2^3p_3^2$.
• So you only hit one terminal divisor, that is $(P-1)/p_3$.
• Suppose $g=p_1p_2^2p_3$, then $g^{p_1p_2^2p_3}=1$.
Similarly, you can get some powers of $g^{p_1p_2^2p_3}$ that equals $1$.
• You can get $g^{p_1p_2^3p_3^2}=1$ by power $p_2p_3$.
You hit one terminal divisor $(P-1)/p_1$.
• You can get $g^{p_1^2p_2^2p_3^2}=1$ by power $p_1p_3$.
You hit one terminal divisor $(P-1)/p_2$.
• You can get $g^{p_1^2p_2^3p_3}=1$ by power $p_1p_2$.
You hit one terminal divisor $(P-1)/p_3$.
• So you can hit 3 terminal divisors.
• The conclusion is that if there is a smaller power of $g$ that equals 1, then you can definitely hit some powers of terminal divisor that equals 1, which are maximal factors of $P-1$. So you can only check all the terminal divisiors.

Algorithm:

The following algorithm works on $g$ and the prime factorization of $P-1$.

• For each $i$, check if $g^{(P-1)/q_i}\overset{?}= 1\pmod P$.
• If yes for any $i$, say “not a generator”.
• Otherwise, say “generator”.

### Given only g​ and ​P​

That’s nice.

But can one tell if $g$ is a generator given only $g$ and $P$ ?
(as opposed to the prime factorization of $P-1$ which is in general hard to compute)

We don’t know, so there are some ways around it.

Solution 1:

Pick a random $P$ together with its prime factorization of $P-1$.

This, it turns out, can be done due to a clever algorithm of Kalai .

Solution 2:

Pick $P=2Q+1$ where $Q$ is prime.

Such primes like $P$ are called safe primes, and $Q$ is called a Sophie-Germain prime after the famous mathematicians.

While there are infinitely many primes, it has only been conjectured that there are infinitely many Sophie-Germain primes. This remains unproven.

Moreover, the Sophie-Germain primes are considered as the hardest class of primes for discrete logarithms.

Solution 2 is what people typically use in practice.

In practice, you just pick a random prime as $Q$ and check whether $P=2Q+1$ is a prime.

It’s efficient as long as the Sophie-Germain primes are in dense distribution.

The above solution 2 brings about new questions.

How to sample a random prime and how to test primality ?

## Primes

There are some questions about primes.

• How many primes of $n$-bit length ?
• How to test primality ?
• How to sample a $n$-bit random prime ?

### How many primes

The prime number theorem tells us that there are sufficiently many prime numbers.

In particular, letting $\pi(N)$ denote the number of prime numbers less than $N$, we know that $\pi(N)=\Omega(N/\log N)$.

Thus, if you pick a random number smaller than $N$, with probability $1/\log N$ (which is $1$/polynomial in bit-length $n$ in question) you have a prime number at hand.

### Primality Testing

How to recognize that a given number is prime ?

This has been the subject of extensive research in computational number theory with many polynomial-time algorithms, culminating with the deterministic polynomial-time primality testing algorithm of Agrawal, Kayal and Saxena (a.k.a. AKS) in 2002.

### Sample a prime

How to sample a $n$-bit random prime ?

The above two facts put together tell us how to generate a random $n$-bit prime number.

1. Just pick a random number less than $2^n$.
2. And test if it is prime.

In expected $n$ iterations of this procedure, you will find a $n$-bit prime number, even a random prime number.

# One-way Functions

One-way function is the atom of cryptography.

A one-function is a function $f$ that is easy to compute but hard to invert on average.

How to define a one-way function ?

Take 1 (Wrong):

For every p.p.t. algorithm $A$ and every chosen $x$,there is a negligible function $\mu$, s.t.
$$\operatorname{Pr}[A(f(x))=x]<\mu(n)$$
It is hard for $A$ to guess the inverse of $f(x)$ for every chosen $x$.

It seems right.

How about the function $f(x)=0$ which maps any $x$ to $0$ ?

The probability of guessing the inverse of $0$ is negligible if the domain size of $x$ is sufficiently large. Because it is essentially random.

But $f(x)=0$ is not one-way function obviously.

Take 2:

For every p.p.t. algorithm $A$ and every chosen $x$, there is a negligible function $\mu$,s.t.
$$\operatorname{Pr}[A(f(x))\in \tilde{f}(f(x))]<\mu(n)$$
where $\tilde{f}$ is the inverse function.

The difference of the new definition is that $A$ is trying to guess some possible inverses of $f(x)$ rather than the exactly chosen $x$.

In this definition, $f(x)=0$ is not one-way function.

## Candidate: DLOG

Let us present an informal one-way function candidate.

$$f(P,g,x)=(P,g,g^x \pmod P)$$

Computing this function can be done in time polynomial in the input length.

However, inverting is the discrete logarithm problem which is conjectured to be hard.

Defined formally below.

Discrete Log Assumption (DLOG):

For a random $n$-bit prime $P$ and random generator $g$ of $\mathbb{Z}_P^*$, and a random $x\in \mathbb{Z}_{P-1}$, there is no polynomial (in $n$) time algorithm that computes $x$ given $P,g,g^x \pmod P$.

Shorthand:

• Easy: $P,g,x\rightarrow g^x$
• Hard: $P,g,g^x\rightarrow x$

In fact, this is not only a one-way function, but can also be made into a family of one-way permutations.

More on that later, we will see that one-way permutations can be used to build pseudorandom generators; and as we saw already, pseudorandom generators can be sued to build pseudorandom functions and stateless secret-key encryption and authentication.

We can do all the crypto we saw so far based on the hardness of the discrete logarithm problem.

Route: OWF→PRG→PRF.

However, going via this route may not be the most efficient.

So, we will look at related problems and try to build more efficient PRGs and PRFs.

## The Diffie-Hellman Assumptions

There is another route to build PRGs and PRFs efficiently, that is Diffie-Hellman Assumptions.

Route: DH→PRG.

Route: DH→PRF.

Given $g^x$ and $g^y\mod P$, you can easily compute $g^{x+y}=g^x\cdot g^y \pmod P$. But it is hard to compute $g^{xy}\pmod P$.

If you can compute discrete logarithms, then you can compute $x$ from $g^x$, and raise $g^y$ to $x$ to get $(g^y)^x=g^{xy} \pmod P$.

But discrete log is hard, so this isn’t an efficient way to solve the problem.

### CDH Assumption

The problem, called the computational Diffie-Hellman (CDH) problem, appears to be computationally hard, in fact as hard as computing discrete logarithms!

Computational Diffie-Hellman Assumption (CDH):

For a random $n$-bit prime $P$ and random generator $g$ of $\mathbb{Z}_P^*$, and random $x,y\in \mathbb{Z}_{P-1}$, there is no polynomial (in $n$) time algorithm that computes $g^{xy}\pmod P$ given $P,g,g^x\pmod P,g^y\pmod P$.

Shorthand:

• Easy: $P,g,g^x,g^y\rightarrow g^{x+y}$
• Hard: $P,g,g^x,g^y\rightarrow g^{xy}$

Moreover, it appears hard to even tell if you are given the right answer or not!

### DDH Assumption

But this requires some care to formalize.

At first, one may think that given $P,g,g^x,g^y$, it is hard to distinguish between the right answer $g^{xy} \pmod P$ versus a random number $u \mod P$.

Let us call the assumption that this decisional problem is hard the decisional Diffie-Hellman (DDH) assumption.

Decisional Diffie-Hellman Assumption (first take):

For a random $n$-bit prime $P$ and random generator $g$ of $\mathbb{Z}_P^*$, and random $x,y\in \mathbb{Z}_{P-1}$ and a random number $u\in \mathbb{Z}_P^*$, there is no polynomial (in $n$) time algorithm that distinguishes between $(P,g,g^x\pmod P, g^y \pmod P, g^{xy} \pmod P)$ and $(P,g,g^x\pmod P,g^y \pmod P, u\pmod P)$.

However, the assumption turns out to be false.

### DDH is False in $\mathbb{Z}_P^*$

However, the DDH assumption is false in $\mathbb{Z}_P^*$.

It seems awfully strong on first look.

It says not only it is hard to compute $g^{xy}$ from $g^x$ and $g^y$, but also that not even a single bit of $g^{xy}$ can be computed (with any polynomial advantage beyond trivial guessing).

However, there are some information about $g^{xy}$ indeed dose leak from $g^x$ and $g^y$.

Before that, let us take a quick detour in Quadratic Residues.

$$h=g^2 \pmod P$$

• $h$ is a QR if $h=g^2\pmod P$.
• 1/2 of $\mathbb{Z}_P^*$ are QR.
• We can tell efficiently if $h$ is a QR by evaluating $h^{(P-1)/2}\overset{?}=1 \pmod P$.
• $h=g^x$ is a QR if and only if $x$ is even.

Use the facts of quadratic residue in $\mathbb{Z}_P^*$, we know the following statements are equivalent.

• $g^{xy}$ is a QR.
• iff $xy$ is even.
• iff $x$ is even or $y$ is even.
• iff $g^x$ is a QR or $g^y$ is a QR.

Now, we can distinguish between $(P,g,g^x,g^y,g^{xy})$ and $(P,g,g^x,g^y,u)$.

$g^{xy}$ is a quadratic residue with probability $3/4$ while $u$ is a quadratic residue with probability $1/2$.

So the advantage for distinguishing is $1/4$.

Thus, we need to refine our assumption.

Looking at the core reason behind the above attack, we see that there is a 1/2 chance that $g^x$ falls into a subgroup (the subgroup of quadratic residues, to be precise) and once that happens, $g^{xy}$ is also in the subgroup no matter what $y$ is.

These properties are furthermore detectable in polynomial-time which led us to attack.

A solution is to work with subgroup of $\mathbb{Z}_P^*$ of prime order.

In particular, we will take $P=2Q+1$ to be a safe prime and work with $\mathcal{QR}_P$, the subgroup of quadratic residues in $\mathbb{Z}_P^*$.

$\mathcal{QR}_P$ has order $(P-1)/2=Q$ which is indeed prime!

By virtue of this, every non-identity element of $\mathcal{QR}_P$ is its generator w.r.t. theorem 2.

With this change, we can state the following DDH assumption which is widely believed to be true.

Decisional Diffie-Hellman Assumption (final):

Let $P=2Q+1$ be a random $n$-bit safe prime and let $\mathcal{QR}_P$ denote the subgroup of quadratic residues in $\mathbb{Z}_P^*$.

For a random generator $g$ of $\mathcal{QR}_P$, and random $x,y\in \mathbb{Z}_Q$ and a random number $u\in \mathcal{QR}_P$, there is no polynomial (in $n$) that distinguishes between $(P,g,g^x\pmod P,g^y \pmod P, g^{xy}\pmod P)$ and $(P,g,g^x\pmod P, g^y \pmod P,u\pmod P)$.

Shorthand:

• Hard: distinguish between $(P,g,g^x,g^y,g^{xy})$ and $(P,g,g^x,g^y,u)$.

We know DLOG→CDH→DDH but no implications are known in the reverse directions.

 M. Agrawal, N. Kayal, and N. Saxena. Primes is in p. Annals of mathematics, pages 781–793, 2004.

 D. Angluin. Lecture notes on the complexity of some problems est number theory. 1982.

 A. Kalai. Generating random factored numbers, easily. Journal of Cryptology, 16(4):287–289, 2003.