「Cryptography-MIT6875」: Lecture 6 - Number Theory
Topics covered:
- Groups, Order of a group and the Order of an element, Cyclic Groups.
- The Multiplicative Group $\mathbb{Z}_N^*$ and $\mathbb{Z}_P^*$ for a prime $P$.
- Generators of $\mathbb{Z}_P^*$.
- Primes, Primality Testing.
- The Discrete Logarithm (DLOG) problem and a candidate OWF.
- Diffie-Hellman assumptions: DDH and CDH.
Groups
An Abelian group $\mathbb{G}=(S,\star)$ is a set $S$ together with a operation $\star :S\times S\rightarrow S$ which satisfies
- Identity: There is an element $\mathcal{I}\in S$ such that for $a\in S$, $a\star \mathcal{I}=\mathcal{I}\star a=a$.
- Inverse: For every $a\in S$, there is an element $b\in S$ such that $a\star b =b\star a=\mathcal{I}$.
- Associativity: For every $a,b,c\in S$, $a\star (b\star c)=(a\star b)\star c$.
- Commutativity: For every $a,b\in S$, $a\star b=b\star a$.
Order of a group and the order of an element
- The order of a group is the number of elements in it, namely $|S|$.
- The order of an element $g\in S$ is the minimal number of times one has to perform the group operation on $g$ to get to the identity element $\mathcal{I}$.
That is, $\mathrm{ord}(g)=\min_{i>0}\{g^i=\mathcal{I}\}$.
Generator of a Group
A generator of a group $\mathbb{G}$ is an element of order $|\mathbb{G}|$. In other words,
$$ \mathbb{G}=\{g,g^2,\dots,g^{|\mathbb{G}|}=\mathcal{I}\} $$Cyclic group
A group $\mathbb{G}$ is called cyclic if it has a generator.
Proof[2]
Discrete Logarithms
Let $\mathbb{G}$ be a cyclic group. We know that $\mathbb{G}$ has a generator $g$, and that every $h\in \mathbb{G}$ can be written as $h=g^x$ for a unique $x\in\{1,2,\dots,|\mathbb{G}|\}$.
We write
$$
x=\operatorname{dlog}_g(h)
$$
to denote the fact that $x$ is the discrete logarithm of $h$ to the base $g$.
We will look for groups where computing the group operation is easy (namely, polynomial time) but computing discrete logarithms is hard (namely, exponential or sub-exponential time).
Our source for such groups will come from number theory.
Discrete logarithms in $\mathbb{Z}_N$ are, for better or worse, easy.
Baby (Computational) Number Theory
The complexity of basic operations with numbers. $n$ denotes the input length for each of these operations.
Greatest Common Divisors:
The greatest common divisor (gcd) of positive integers $a$ and $b$ is the largest positive integer $d$ that divides both $a$ and $b$.
$a$ and $b$ are relatively prime if their gcd is 1.
The Multiplicative Group $\mathbb{Z}_N^*$
The multiplicative group of numbers mod $N$, denoted $\mathbb{Z}_N^*$, consists of the set
$$ S=\{1\le a< N:\operatorname{gcd}(a,N)=1\} $$with multiplication mod $N$ being the operation.
Some further facts about $\mathbb{Z}_N^*$
The order of $\mathbb{Z}_N^*$, the number of positive integers smaller than $N$ that are relatively prime to it, is called the Euler totient function of $N$ denoted $\varphi(N)$.
$\varphi(p)=p-1$ if $p$ is prime.
$\varphi(p^k)=p^k-p^{k-1}$ if $p$ is prime.
$\varphi(pq)=\varphi(p)\varphi(q)$ if $\operatorname{gcd}(p,q)=1$.
If $N=\prod _i p_i^{\alpha_i}$ is the prime factorization of $N$, then $\varphi(N)=\prod_i p_i^{\alpha_i-1}(pi-1)$.
The Multiplicative Group $\mathbb{Z}_p^*$
$\mathbb{Z}_p^*$ is Cyclic.The following theorem is a very important property of $\mathbb{Z}_p^*$ when $P$ is prime.
Proof[2]
There are several followup questions.
- How many generators are there for $\mathbb{Z}_p^*$ ?
- How to tell (efficiency) if a given element $g$ is a generator ?
- How to sample a random generator for $\mathbb{Z}_p^*$ ?
$\mathbb{Z}_p^{*}$ and $\mathbb{Z}_{P-1}$
Before proceeding further, let us note the following structural fact about $\mathbb{Z}_P^*$.
There are two groups are isomorphic with an isomorphism $\phi$ that maps $x\in\mathbb{Z}_{P-1}$ to $g^x\in \mathbb{Z}_P^*$. [isomorphic: 同构的]
In particular, consider $\phi(x)=g^x \pmod P$.
We have $\phi(x+y)=\phi(x)\cdot \phi(y)$.
The isomorphism is efficiently computable in the forward direction (exponentiation, using the repeated squaring algorithm) but not known to be efficiently computable in the reverse direction.
For example, consider $\mathbb{Z}_7^*$ and $\mathbb{Z}_6$:
- $\mathbb{Z}_7^* = \{1,2,3,4,5,6\}$ and there is a generator $g=5$.
You can get $\{g,g^2,g^3,g^4,g^5,g^6\} =\{5, 4,6,2,3,1\}$.
When you perform the multiplication on $g$, you will wrap around the group.
That’s an intuitive reason why discrete logarithm is hard in $\mathbb{Z}_p^*$. - $\mathbb{Z}_6=\{1,2,3,4,5,6\}$ and the generator $g=1$.
The group operation in $\mathbb{Z}_6$ is addition.
You can get $\{g,g^2,g^3,g^4,g^5,g^6\}=\{1,2,3,4,5,6\}$.
When you perform the addition on $g$, you just walk along the group. - We can know that there is a one-to-one mapping $\phi$ from $\mathbb{Z}_6$ to $\mathbb{Z}_7^*$, that is $\mathbb{Z}_6\cong \mathbb{Z}_7^*$ .
Here is another quick application of this isomorphism:
Lemma :
Let $P$ be an odd prime.
If $g$ is a generator of $\mathbb{Z}_P^*$, then so is $g^x$ as long as $x$ and $P-1$ are relatively prime.
Proof:
There is a generator $g’$ in $\mathbb{Z}_{P-1}$corresponding to $g$ in $\mathbb{Z}_P^*$ from the isomorphism $\mathbb{Z}_P^*\cong \mathbb{Z}_{P-1}$.
$x$ and $P-1$ are relatively prime, and there are two statements.
In group $\mathbb{Z}_P^*$, if $g$ is a generator, then $g^x$ is a generator.
In group $\mathbb{Z}_{P-1}$, if $g’$ is a generator, then $xg’$ is a generator.
From the isomorphism, we can turn the question in $\mathbb{Z}_P^*$ to a relative question in $\mathbb{Z}_{P-1}$.
It’s easy to prove that if $x$ is a generator in $\mathbb{Z}_{P-1}$ (which can iterate all elements), then $ax$ is also a generator in $\mathbb{Z}_{P-1}$ (which can also iterate all elements) where $\operatorname{gcd}(a, P-1)=1$.
- The main idea is to suppose for a contradiction that there are two different $x_1$ and $x_2$,i.e. $x_1\ne x_2 \pmod{P-1}$, satisfying $ax_1=ax_2 \pmod{P-1}$.
- Then we know $P-1\mid a(x_1-x_2)$.
- From $\operatorname{gcd}(a, P-1)=1$, we know $P-1\mid x_1-x_2$.
- So $x_1= x_2 \pmod{P-1}$. (contradiction)
- QED.
As a corollary, we immediately derive the fact that $\phi(P-1)$ elements of $\mathbb{Z}_p^*$ are generators.
$\mathbb{Z}_p^*$ has lots of generators
For the first question, $\mathbb{Z}_p^*$ has lots of generators.
But how large is $\varphi(P-1)$ asymptotically? This is answered by the following classical theorem.
In other words, if you pick a random element of $\mathbb{Z}_p^*$, you will see a generator with probability
$$
\varphi(P-1)/(P-1)=\Omega(1/\log\log P)
$$
which is polynomial in $1/\log P$. So, reasonably often.[asymptotical: 渐进的]
Difference between Big $\mathcal{O}$ and Big $\Omega$:
Big $\mathcal{O}$ and Big $\Omega$ function are used in computer science to describe the performance or complexity of an algorithm.
- Big $\mathcal{O}$ is used to describe the worst case running time for an algorithm.
- But, Big $\Omega$ notation, on the other hand, is used to describe the best case running time for a given algorithm.
If you want to get a generator in $\mathbb{Z}_p^*$, you will hit a generator very quickly just keeping picking random in $\mathbb{Z}_p^*$.
Then it comes to the second question.
How to tell a generator of $\mathbb{Z}_p^*$ ?
The answer is that you can tell in poly. time whether $g\in \mathbb{Z}_p^*$ is a generator given the factorization of $P-1$.
Given the factorization of P-1
At first, we know that a generator is an element of order $P-1$, so it’s easy to check whether $g^{P-1}=1 \pmod P$.
Then we need to check if there is some smaller power of $g$ that equals $1$ since the order is the minimal power.
However, there are a large number of divisors of $P-1$, roughly $P^{1/\log \log P}$ which is not polynomial (in $\log P$).
It turns out, you do not need to check all divisors, but rather only the terminal divisors (or, the maximal factors less than $P-1$).
If $P-1=\prod_i q_i^{\alpha_i}$, the terminal divisors are $(P-1)/q_i$ for each $i$.
For example:
- If $P-1=p_1^2p_2^3p_3^2$.
- Suppose $g=p_1^2p_2^3$, then $g^{p_1^2p_2^3}=1$.
You can get some powers of $g^{p_1^2p_2^3}$ that equals $1$.
And the power to $g^{p_1^2p_2^3}$ is the multiplication to the exponent $p_1^2p_2^3$.- You can get $g^{p_1^2p_2^3p_3}=1$ which is smaller power than $P-1$.
- You can also get $g^{p_1^2p_2^3p_3^2}=1$ which is useless since $P-1=p_1^2p_2^3p_3^2$.
- So you only hit one terminal divisor, that is $(P-1)/p_3$.
- Suppose $g=p_1p_2^2p_3$, then $g^{p_1p_2^2p_3}=1$.
Similarly, you can get some powers of $g^{p_1p_2^2p_3}$ that equals $1$.- You can get $g^{p_1p_2^3p_3^2}=1$ by power $p_2p_3$.
You hit one terminal divisor $(P-1)/p_1$. - You can get $g^{p_1^2p_2^2p_3^2}=1$ by power $p_1p_3$.
You hit one terminal divisor $(P-1)/p_2$. - You can get $g^{p_1^2p_2^3p_3}=1$ by power $p_1p_2$.
You hit one terminal divisor $(P-1)/p_3$. - So you can hit 3 terminal divisors.
- You can get $g^{p_1p_2^3p_3^2}=1$ by power $p_2p_3$.
- The conclusion is that if there is a smaller power of $g$ that equals 1, then you can definitely hit some powers of terminal divisor that equals 1, which are maximal factors of $P-1$. So you can only check all the terminal divisiors.
Algorithm:
The following algorithm works on $g$ and the prime factorization of $P-1$.
- For each $i$, check if $g^{(P-1)/q_i}\overset{?}= 1\pmod P$.
- If yes for any $i$, say “not a generator”.
- Otherwise, say “generator”.
Given only g and P
That’s nice.
But can one tell if $g$ is a generator given only $g$ and $P$ ?
(as opposed to the prime factorization of $P-1$ which is in general hard to compute)
We don’t know, so there are some ways around it.
Solution 1:
Pick a random $P$ together with its prime factorization of $P-1$.
This, it turns out, can be done due to a clever algorithm of Kalai [6].
Solution 2:
Pick $P=2Q+1$ where $Q$ is prime.
Such primes like $P$ are called safe primes, and $Q$ is called a Sophie-Germain prime after the famous mathematicians.
While there are infinitely many primes, it has only been conjectured that there are infinitely many Sophie-Germain primes. This remains unproven.
Moreover, the Sophie-Germain primes are considered as the hardest class of primes for discrete logarithms.
Solution 2 is what people typically use in practice.
In practice, you just pick a random prime as $Q$ and check whether $P=2Q+1$ is a prime.
It’s efficient as long as the Sophie-Germain primes are in dense distribution.
The above solution 2 brings about new questions.
How to sample a random prime and how to test primality ?
Primes
There are some questions about primes.
- How many primes of $n$-bit length ?
- How to test primality ?
- How to sample a $n$-bit random prime ?
How many primes
The prime number theorem tells us that there are sufficiently many prime numbers.
In particular, letting $\pi(N)$ denote the number of prime numbers less than $N$, we know that $\pi(N)=\Omega(N/\log N)$.
Thus, if you pick a random number smaller than $N$, with probability $1/\log N$ (which is $1$/polynomial in bit-length $n$ in question) you have a prime number at hand.
Primality Testing
How to recognize that a given number is prime ?
This has been the subject of extensive research in computational number theory with many polynomial-time algorithms, culminating with the deterministic polynomial-time primality testing algorithm of Agrawal, Kayal and Saxena[1] (a.k.a. AKS) in 2002.
Sample a prime
How to sample a $n$-bit random prime ?
The above two facts put together tell us how to generate a random $n$-bit prime number.
- Just pick a random number less than $2^n$.
- And test if it is prime.
In expected $n$ iterations of this procedure, you will find a $n$-bit prime number, even a random prime number.
One-way Functions
One-way function is the atom of cryptography.
A one-function is a function $f$ that is easy to compute but hard to invert on average.
How to define a one-way function ?
Take 1 (Wrong):
For every p.p.t. algorithm $A$ and every chosen $x$,there is a negligible function $\mu$, s.t.
$$
\operatorname{Pr}[A(f(x))=x]<\mu(n)
$$
It is hard for $A$ to guess the inverse of $f(x)$ for every chosen $x$.
It seems right.
How about the function $f(x)=0$ which maps any $x$ to $0$ ?
The probability of guessing the inverse of $0$ is negligible if the domain size of $x$ is sufficiently large. Because it is essentially random.
But $f(x)=0$ is not one-way function obviously.
Take 2:
For every p.p.t. algorithm $A$ and every chosen $x$, there is a negligible function $\mu$,s.t.
$$
\operatorname{Pr}[A(f(x))\in \tilde{f}(f(x))]<\mu(n)
$$
where $\tilde{f}$ is the inverse function.
The difference of the new definition is that $A$ is trying to guess some possible inverses of $f(x)$ rather than the exactly chosen $x$.
In this definition, $f(x)=0$ is not one-way function.
Candidate: DLOG
Let us present an informal one-way function candidate.
$$
f(P,g,x)=(P,g,g^x \pmod P)
$$
Computing this function can be done in time polynomial in the input length.
However, inverting is the discrete logarithm problem which is conjectured to be hard.
Defined formally below.
In fact, this is not only a one-way function, but can also be made into a family of one-way permutations.
More on that later, we will see that one-way permutations can be used to build pseudorandom generators; and as we saw already, pseudorandom generators can be sued to build pseudorandom functions and stateless secret-key encryption and authentication.
We can do all the crypto we saw so far based on the hardness of the discrete logarithm problem.
Route: OWF→PRG→PRF.
However, going via this route may not be the most efficient.
So, we will look at related problems and try to build more efficient PRGs and PRFs.
The Diffie-Hellman Assumptions
There is another route to build PRGs and PRFs efficiently, that is Diffie-Hellman Assumptions.
Route: DH→PRG.
Route: DH→PRF.
Given $g^x$ and $g^y\mod P$, you can easily compute $g^{x+y}=g^x\cdot g^y \pmod P$. But it is hard to compute $g^{xy}\pmod P$.
If you can compute discrete logarithms, then you can compute $x$ from $g^x$, and raise $g^y$ to $x$ to get $(g^y)^x=g^{xy} \pmod P$.
But discrete log is hard, so this isn’t an efficient way to solve the problem.
CDH Assumption
The problem, called the computational Diffie-Hellman (CDH) problem, appears to be computationally hard, in fact as hard as computing discrete logarithms!
Moreover, it appears hard to even tell if you are given the right answer or not!
DDH Assumption
But this requires some care to formalize.
At first, one may think that given $P,g,g^x,g^y$, it is hard to distinguish between the right answer $g^{xy} \pmod P$ versus a random number $u \mod P$.
Let us call the assumption that this decisional problem is hard the decisional Diffie-Hellman (DDH) assumption.
Decisional Diffie-Hellman Assumption (first take):
For a random $n$-bit prime $P$ and random generator $g$ of $\mathbb{Z}_P^*$, and random $x,y\in \mathbb{Z}_{P-1}$ and a random number $u\in \mathbb{Z}_P^*$, there is no polynomial (in $n$) time algorithm that distinguishes between $(P,g,g^x\pmod P, g^y \pmod P, g^{xy} \pmod P)$ and $(P,g,g^x\pmod P,g^y \pmod P, u\pmod P)$.
However, the assumption turns out to be false.
DDH is False in $\mathbb{Z}_P^*$
However, the DDH assumption is false in $\mathbb{Z}_P^*$.
It seems awfully strong on first look.
It says not only it is hard to compute $g^{xy}$ from $g^x$ and $g^y$, but also that not even a single bit of $g^{xy}$ can be computed (with any polynomial advantage beyond trivial guessing).
However, there are some information about $g^{xy}$ indeed dose leak from $g^x$ and $g^y$.
Before that, let us take a quick detour in Quadratic Residues.
$$
h=g^2 \pmod P
$$
Some facts about Quadratic Residues:
- $h$ is a QR if $h=g^2\pmod P$.
- 1/2 of $\mathbb{Z}_P^*$ are QR.
- We can tell efficiently if $h$ is a QR by evaluating $h^{(P-1)/2}\overset{?}=1 \pmod P$.
- $h=g^x$ is a QR if and only if $x$ is even.
Use the facts of quadratic residue in $\mathbb{Z}_P^*$, we know the following statements are equivalent.
- $g^{xy}$ is a QR.
- iff $xy$ is even.
- iff $x$ is even or $y$ is even.
- iff $g^x$ is a QR or $g^y$ is a QR.
Now, we can distinguish between $(P,g,g^x,g^y,g^{xy})$ and $(P,g,g^x,g^y,u)$.
$g^{xy}$ is a quadratic residue with probability $3/4$ while $u$ is a quadratic residue with probability $1/2$.
So the advantage for distinguishing is $1/4$.
Thus, we need to refine our assumption.
Looking at the core reason behind the above attack, we see that there is a 1/2 chance that $g^x$ falls into a subgroup (the subgroup of quadratic residues, to be precise) and once that happens, $g^{xy}$ is also in the subgroup no matter what $y$ is.
These properties are furthermore detectable in polynomial-time which led us to attack.
A solution is to work with subgroup of $\mathbb{Z}_P^*$ of prime order.
In particular, we will take $P=2Q+1$ to be a safe prime and work with $\mathcal{QR}_P$, the subgroup of quadratic residues in $\mathbb{Z}_P^*$.
$\mathcal{QR}_P$ has order $(P-1)/2=Q$ which is indeed prime!
By virtue of this, every non-identity element of $\mathcal{QR}_P$ is its generator w.r.t. theorem 2.
With this change, we can state the following DDH assumption which is widely believed to be true.
We know DLOG→CDH→DDH but no implications are known in the reverse directions.
References
[1] M. Agrawal, N. Kayal, and N. Saxena. Primes is in p. Annals of mathematics, pages 781–793, 2004.
[2] D. Angluin. Lecture notes on the complexity of some problems est number theory. 1982.
[6] A. Kalai. Generating random factored numbers, easily. Journal of Cryptology, 16(4):287–289, 2003.
「Cryptography-MIT6875」: Lecture 6 - Number Theory