「Algebraic ECCs」: Lec4 Singleton + Plotkin Bounds and RS Code
Topics Covered:
- Singleton Bound
- Plotkin Bound
- Reed-Solomon Codes
- Dual View of RS Codes
- Generalized RS Code
In the previous lecture, we explored the GV bound and the Hamming bound in the asymptotic setting where $n,k,d$ grow large. We would like to push the GV bound up as much as possible while at the same time pushing the Hamming bound down as much as possible.
Today, we are going to learn two additional bounds, the Singleton and Plotkin bounds, which narrow down the yellow region a little bit.
Additionally, we will learn about Reed-Solomon codes, which meet the Singleton bound.
Singleton Bound
Before proving the Singleton bound, let’s examine what it looks like. The Singleton bound states that $k\le n-d+1$ for an $(n,k,d)_q$ code. In the asymptotic setting, this translates to
$$
R\le 1-\delta
$$
as illustrated below:
The purple line represents the Singleton bound for binary codes ($q=2$). This implies that all points above the Singleton bound are unachievable, which we already know from the Hamming bound. Hence, for $q=2$, the Singleton bound is strictly weaker than the Hamming bound, despite its simplicity.
However, when $q\to \infty$, the Singleton bound can be better than the Hamming bound. For $q>2$, the Hamming bound shifts up while the Singleton bound stays the same as illustrated below:
This shows that the region to the bottom right (below the Hamming bound but above the Singleton bound) becomes unachievable. This is information we cannot infer from the Hamming bound alone.
Now, let’s prove the Singleton bound.
Proof of the Singleton Bound:
- For a codeword $c=(x_1,\dots, x_n)\in \mathcal{C}$, consider throwing out the last $d-1$ coordinates. $$ c=(\underbrace{x_1, x_2, \dots, x_{n-d+1}}_{\text{call this }\varphi(c)\in \Sigma^{n-d+1}}, \underbrace{x_{n-d+2}, \dots, x_n}_{\text{get rid of these}}) $$ Define the first $n-d+1$ coordinates as $\varphi(c)\in \Sigma^{n-d+1}$.
Now, we define a new code:
$$
\tilde{\mathcal{C}}=\{\varphi(c):c\in \mathcal{C}\}
$$which is the set of all $\varphi(c)$ for every codeword $c\in \mathcal{C}$. Thus, $\tilde{\mathcal{C}}\subseteq \Sigma^{n-d+1}$.
We derive two key claims:
- Claim 1: $|\mathcal{C}|=|\tilde{\mathcal{C}}|$.
If not, then there must be a collision: $\exists c, c'\in \mathcal{C}$, $c\ne c'$, such that $\varphi(c)=\varphi(c')$. This implies that $c$ and $c'$ differ only in their last $d-1$ coordinates, meaning $\Delta(c, c')\le d-1$, which contradicts the distance $d$ of the original code.
- Claim 2: $|\tilde{\mathcal{C}}|\le q^{n-d+1}$.
This follows because $\tilde{\mathcal{C}}\subseteq \Sigma^{n-d+1}$.
Combining these two claims, we have: $|\mathcal{C}|=q^k\le q^{n-d+1}$. Taking log base $q$ gives us $k\le n-d+1$.
$\blacksquare$
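As a sanity check, here is a minimal sketch of the puncturing argument in Python, using the 3-fold repetition code over $\mathbb{F}_2$ as a toy example (the code and parameters are my choice, not from the lecture):

```python
# Minimal sketch of the puncturing argument, on the 3-fold repetition code
# over F_2 (n = 3, k = 1, d = 3) as a toy example.
n, d, q = 3, 3, 2
code = [(0, 0, 0), (1, 1, 1)]  # repetition code with distance d = 3

# phi keeps the first n - d + 1 coordinates and throws out the last d - 1
punctured = {c[: n - d + 1] for c in code}

# Claim 1: no collisions, so |C| = |C~|
assert len(punctured) == len(code)
# Claim 2: C~ sits inside Sigma^{n-d+1}, so |C| <= q^{n-d+1} = 2^1
assert len(code) <= q ** (n - d + 1)
print("|C| =", len(code), "<= q^(n-d+1) =", q ** (n - d + 1))
```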
Plotkin Bound
Recall that the GV bound only works up to the relative distance $\delta=d/n\le 1-1/q$.
Hence, as depicted below, there is a gap for $\delta\in (1-1/q, 1)$. We wonder whether it is possible to have codes with relative distance greater than $1-1/q$ and rate greater than $0$.
Unfortunately, the answer is no. This is what the Plotkin bound tells us:
Theorem (Plotkin Bound):
Let $\mathcal{C}$ be an $(n,k,d)_q$ code with $d>(1-1/q)\cdot n$. Then $$|\mathcal{C}|\le {d\over d-(1-1/q)\cdot n}.$$
This implies that the asymptotic rate $R=\frac{\log_q|\mathcal{C}|}{n}$ approaches $0$ as $n\to \infty$.
Thus, in order to have a constant-rate code, we must have $d<(1-1/q)\cdot n$.
The proof of the Plotkin bound is omitted in class. For details, refer to “Essential Coding Theory”, Section 4.4.
Instead, we prove a corollary, which extends the Plotkin bound to a trade-off for distances $\delta < 1 - 1/q$:
Corollary:
For any $(n,k,d)_q$ code with relative distance $\delta=d/n$, the rate satisfies $R\le 1-{q\over q-1}\cdot \delta+o(1)$.
Before proving this corollary, let’s look at what the Plotkin bound looks like when $q=2$.
The green line represents the Plotkin bound for binary codes ($q=2$). When $\delta$ is small, the Plotkin bound is a little worse than the Hamming bound. But when $\delta$ gets larger, it is better than the Hamming bound.
When $q>2$, the Plotkin bound is also a straight line ending at $\delta=1-1/q$, which is also the endpoint of the GV bound.
Proof of Corollary of Plotkin Bound:
Assume the Plotkin bound holds.
Choose $n'=\lfloor{dq \over q-1}\rfloor-1$.
Notice that $n'<{dq\over q-1}$, so $d>(1-1/q)\cdot n'$. This will be useful when applying the Plotkin bound.
For all $x\in \Sigma^{n-n'}$, define a new code $\mathcal{C}_x$: $$ \mathcal{C}_x=\{(\underbrace{c_{n-n'+1}, \dots, c_n}_{n'}): c\in \mathcal{C} \text{ with }(\underbrace{c_1, \dots, c_{n-n'}}_{n-n'})=x\} $$ which is the set of ENDs (the last $n'$ symbols) of codewords that BEGIN with $x$.
Now $\mathcal{C}_x$ has distance $\ge d$ with block length $n’<{d\over 1-1/q}$.
Why is this true?
For any two distinct codewords $c, c'\in \mathcal{C}$, their distance is at least $d$. If they share the same first $n-n'$ symbols $x$, they correspond to codewords in $\mathcal{C}_x$.
Thus, their distance must come entirely from the end part, meaning that $\mathcal{C}_x$ also has distance $\ge d$.
Applying the Plotkin bound for $\mathcal{C}_x$ with $d>(1-1/q)\cdot n’$, we have
$$
|\mathcal{C}_x|\le {d\over d-(1-1/q)\cdot n'}= {qd\over qd - (q-1)\cdot n'}\le qd
$$
The second inequality follows from the fact that the denominator $qd-(q-1)\cdot n'$ is a positive integer, so in particular it is $\ge 1$.
We can plug this bound into the original code $\mathcal{C}$, summing over all prefixes $x$, since each codeword of $\mathcal{C}$ appears in exactly one $\mathcal{C}_x$. $$ \begin{align} |\mathcal{C}| =\sum_{x\in \Sigma^{n-n'}}|\mathcal{C}_x| &\le q^{n-n'} \cdot qd \\ &=q^{(n-\lfloor {qd\over q-1}\rfloor +1)}\cdot qd\\ &=\exp_q(n-{qd\over q-1} + o(n))\\ &=\exp_q\left(n \left(1-{q\over q-1}\cdot \delta+o(1)\right)\right) \end{align} $$
where the third equality absorbs the floor operation and the constant factor into $o(n)$, and $\delta=d/n$.
Taking log base $q$ of both sides and dividing by $n$, we have $R\le 1-{q\over q-1}\cdot\delta+o(1)$ as desired.
$\blacksquare$
Discussion of Two Bounds
Both the Singleton and Plotkin bounds are impossibility results: they show which trade-offs between distance and rate cannot be achieved, while the GV bound shows a trade-off that is achievable.
As depicted below, the Plotkin bound seems strictly better than the Singleton bound.
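To see the comparison numerically, here is a small script (my addition, not from the lecture) evaluating the Singleton bound $R\le 1-\delta$ against the corollary's bound $R\le 1-{q\over q-1}\delta$ for $q=2$, ignoring the $o(1)$ term:

```python
# Compare the two asymptotic upper bounds on the rate R at relative distance
# delta (o(1) terms dropped):
#   Singleton:          R <= 1 - delta
#   Plotkin corollary:  R <= 1 - (q/(q-1)) * delta
def singleton(delta):
    return 1 - delta

def plotkin_cor(delta, q):
    return max(0.0, 1 - (q / (q - 1)) * delta)

for delta in (0.1, 0.25, 0.4, 0.5):
    print(f"delta={delta}: Singleton {singleton(delta):.3f}, "
          f"Plotkin corollary (q=2) {plotkin_cor(delta, 2):.3f}")
# For every delta > 0 the Plotkin line lies strictly below Singleton,
# matching the figure's claim that Plotkin dominates for fixed q.
```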
Why would we bother to discuss the Singleton bound?
On one hand, it is true.
But on the other hand, we'll see a family of codes that indeed achieves the Singleton bound.
Before we get there, you might think this impossible, given that the seemingly stronger Plotkin bound lies below the Singleton bound. We table this discussion until a later section.
Reed-Solomon Codes
Reed-Solomon codes are a family of codes that achieve the Singleton bound while admitting efficient error-correction algorithms. They are widely used in practice.
Before getting into Reed-Solomon codes, let's first discuss polynomials over finite fields and the Vandermonde matrix.
Polynomial:
A (univariate) polynomial in variable $X$ over the finite field $\mathbb{F}_q$ of degree $d$ is of the form:
$$
f(X)=a_0+a_1\cdot X + \dots + a_d\cdot X^d
$$
where each coefficient $a_i\in \mathbb{F}_q$ and the leading coefficient $a_d\ne 0$.
The set of all univariate polynomials with coefficients in $\mathbb{F}_q$ is denoted by $\mathbb{F}_q[X]$.
Polynomials over finite fields behave similarly to those over $\mathbb{R}$.
There is a simple but super useful fact about polynomials: low-degree polynomials do not have too many roots.
Fact:
A non-zero polynomial $f$ of degree $d$ over $\mathbb{F}_q$ has at most $d$ roots.
Proof Sketch:
- If $f(\beta)=0$, then $(X-\beta)\mid f$.
- If $f$ has $d+1$ distinct roots $\beta_1,\dots, \beta_{d+1}$, then $(X-\beta_1)(X-\beta_2)\dots(X-\beta_{d+1})\mid f$.
This leads to a contradiction because the product on the left has degree $d+1$, while $f(X)$ has degree $d$.
Example:
Consider the field $\mathbb{F}_3=\{0, 1, 2\}$:
- $f(X)=X^2-1$ has two roots $[f(2)=f(1)=0]$
- $f(X)=X^2+2X+1$ has one root $[f(2)=0]$
- $f(X)=X^2+1$ has no roots
Note: The polynomial $X^2+1$ DOES have a root over $\mathbb{F}_2$, showing that the choice of the field matters.
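The root counts above are easy to verify by brute force; here is a quick check in Python (a sketch I added, assuming nothing beyond the $\mathbb{F}_3$ examples):

```python
# Brute-force root counting over F_3, verifying the three examples above.
q = 3
polys = {
    "X^2 - 1":      lambda x: (x * x - 1) % q,
    "X^2 + 2X + 1": lambda x: (x * x + 2 * x + 1) % q,
    "X^2 + 1":      lambda x: (x * x + 1) % q,
}
for name, f in polys.items():
    roots = [x for x in range(q) if f(x) == 0]
    print(f"{name}: roots {roots}")  # [1, 2], then [2], then []
# Each degree-2 polynomial indeed has at most 2 roots.
```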
Vandermonde Matrix
Next, we will explore some useful facts about Vandermonde matrix.
Vandermonde matrix:
A Vandermonde matrix has the form $$ V=\left[\begin{array}{c} 1 & \alpha_1 & \alpha_1^2 & \dots & \alpha_1^m \\ 1 & \alpha_2 & \alpha_2^2 & \dots & \alpha_2^m \\ \vdots & & & & \vdots \\ 1 & \alpha_n & \alpha_n^2 & \dots & \alpha_n^m \end{array}\right] $$ where $\alpha_1, \dots, \alpha_n\in \mathbb{F}_q$ are distinct. In other words, $V_{ij}=\alpha_i^{j-1}$.
Theorem:
A square Vandermonde matrix (i.e., $m=n-1$) with distinct $\alpha_1,\dots,\alpha_n$ is invertible.
Proof 1:
Consider a square Vandermonde matrix $V\in \mathbb{F}^{n\times n}$.
Suppose $\vec{a}=(a_0, \dots, a_{n-1})$ is a vector. Then $V\cdot \vec{a}$ can be expressed as $$ V\cdot \vec{a}=\left ( \begin{array}{c} \sum_{i=0}^{n-1} a_i\cdot \alpha_1^i \\ \sum_{i=0}^{n-1} a_i\cdot \alpha_2^i \\ \vdots \\ \sum_{i=0}^{n-1} a_i\cdot \alpha_n^i \end{array} \right )=\left ( \begin{array}{c} f(\alpha_1)\\ f(\alpha_2)\\ \vdots \\ f(\alpha_n) \end{array} \right ) $$ where $f(X)=a_0+a_1X+\dots+a_{n-1}X^{n-1}$.
To prove that the Vandermonde matrix is invertible, we’d like to show:
if $V\cdot \vec{a}=0$, then $\vec{a}$ itself must be $0$. Equivalently (taking the contrapositive), if $\vec{a}\ne 0$ then $V\cdot \vec{a}\ne 0$.
This is true because:
- If $\vec{a}$ is a non-zero vector, then $f$ is a non-zero polynomial of degree at most $n-1$, so it cannot have $n$ roots. Since the $\alpha_i$ are distinct, some entry $f(\alpha_i)\ne 0$, i.e., $V\cdot \vec{a}\ne 0$.
Hence, $\text{Ker}(V)=\{0\}$, so $V$ is invertible.
$\blacksquare$
Proof 2:
It can also be proven via the determinant.
The determinant of a Vandermonde matrix is
$$
\det(V)=\prod_{1\le i<j\le n}(\alpha_j-\alpha_i)\ne 0
$$
since $\alpha_i\ne \alpha_j$ for any $i\ne j$. $\blacksquare$
Claim:
Any $(r+1)\times(r+1)$ submatrix of $V$ formed by consecutive rows $i,\dots,i+r$ and consecutive columns with exponents $j,\dots,j+r$ is invertible, with the caveat that either $j=0$ or $\alpha_i\ne 0$ for all $i$.
Proof:
Such a $(r+1)\times (r+1)$ square submatrix takes the following form: $$ \left[ \begin{array}{c} \alpha_i^j & \alpha_i^{j+1} & \dots &\alpha_i^{j+r} \\ \alpha_{i+1}^j & \alpha_{i+1}^{j+1} & \dots &\alpha_{i+1}^{j+r} \\ \vdots \\ \alpha_{i+r}^j & \alpha_{i+r}^{j+1} & \dots & \alpha_{i+r}^{j+r} \end{array} \right]=D\cdot V' $$ where:
- $D$ is a diagonal matrix with diagonal entries $\alpha_i^j, \dots, \alpha_{i+r}^j$,
- $V'$ is a $(r+1)\times (r+1)$ Vandermonde matrix in the points $\alpha_i,\dots,\alpha_{i+r}$.
The invertibility of $D\cdot V'$ follows from the invertibility of both $D$ and $V'$:
- $D$ is invertible if all diagonal entries are non-zero. This is guaranteed if either $j=0$ (every entry is $1$) or $\alpha_i\ne 0$ for all $i$, as required in the caveat.
- $V'$ is invertible by the aforementioned theorem.
$\blacksquare$
These facts about Vandermonde matrices will be useful.
First, they imply that “polynomial interpolation works over $\mathbb{F}_q$”.
Theorem:
Given $(\alpha_i, y_i)\in \mathbb{F}_q\times \mathbb{F}_q$ for $i=1,\dots, d+1$ with the $\alpha_i$ distinct, there is a unique polynomial $f$ of degree at most $d$ such that $f(\alpha_i)=y_i$ for all $i$.
Proof:
If $f(X)=a_0+a_1X+\dots+a_dX^d$, then the requirement that $f(\alpha_i)=y_i$ for all $i$ can be written as $V\cdot \vec{a}=\vec{y}$, where $V$ is a square Vandermonde matrix.
Hence, $\vec{a}=V^{-1}\vec{y}$ is the unique solution, because linear algebra “works” over $\mathbb{F}_q$.
$\blacksquare$
Moreover, the proof implies that we can find $f$ efficiently.
Actually, we can compute $f$ very efficiently by using the NTT (Number Theoretic Transform), which allows for fast multiplication by certain special Vandermonde matrices.
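As an illustration, here is a minimal interpolation sketch over a prime field, with a hypothetical $p=7$ and made-up points. It uses the Lagrange form, which is equivalent to solving the Vandermonde system $V\vec a=\vec y$ (no NTT here); `pow(x, -1, p)` computes inverses mod $p$:

```python
# Interpolation over a prime field F_p; hypothetical example with p = 7.
p = 7
points = [(1, 3), (2, 6), (3, 4)]  # (alpha_i, y_i) pairs, alpha_i distinct

def poly_mul_linear(b, a, p):
    """Multiply polynomial b (coeffs, low to high) by (X - a), mod p."""
    res = [0] * (len(b) + 1)
    for k, c in enumerate(b):
        res[k + 1] = (res[k + 1] + c) % p  # contribution of X * c X^k
        res[k] = (res[k] - a * c) % p      # contribution of -a * c X^k
    return res

def interpolate(points, p):
    """Coefficients of the unique deg <= len(points)-1 poly through points."""
    coeffs = [0] * len(points)
    for i, (ai, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (aj, _) in enumerate(points):
            if j != i:  # build L_i(X) = prod_{j != i} (X - a_j)/(a_i - a_j)
                basis = poly_mul_linear(basis, aj, p)
                denom = denom * (ai - aj) % p
        scale = yi * pow(denom, -1, p) % p
        coeffs = [(c + scale * b) % p for c, b in zip(coeffs, basis)]
    return coeffs

coeffs = interpolate(points, p)
assert all(sum(c * pow(a, k, p) for k, c in enumerate(coeffs)) % p == y
           for a, y in points)
print(coeffs)  # [2, 0, 1], i.e. f(X) = 2 + X^2
```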
Fact:
All functions $f:\mathbb{F}_q\to \mathbb{F}_q$ are polynomials of degree $\le q-1$.
Proof:
There are only $q$ points in $\mathbb{F}_q$, so we can interpolate a (unique) degree $\le q-1$ polynomial through any function.
Example:
$f(X)=X^q$ must have some representation as a degree $\le q-1$ polynomial over $\mathbb{F}_q$.
It is $X^q\equiv X$ because $\alpha^q=\alpha$ for all $\alpha\in \mathbb{F}_q$ (by Fermat’s little theorem).
RS code
Now, we are finally ready to define the Reed-Solomon codes.
The basic idea is that a low-degree polynomial does not have too many roots, which lets the RS code achieve a very good trade-off between rate and distance.
A codeword of a Reed-Solomon code consists of the evaluations of a low-degree polynomial:
Definition:
Given distinct evaluation points $\vec{\alpha}=(\alpha_1, \dots, \alpha_n)\in\mathbb{F}_q^n$ and $k\le n$, $$ \text{RS}_q(\vec{\alpha}, n, k)=\{(f(\alpha_1),\dots,f(\alpha_n)):f\in \mathbb{F}_q[X],\ \deg f\le k-1\}. $$
Since a non-zero polynomial of degree $\le k-1$ has at most $k-1$ roots, these codewords don't have too many zeros, so the distance of the code is good.
Additionally, this definition implies a natural encoding map for RS code:
$$
\vec{x}=(x_0, \dots, x_{k-1})\mapsto (f_{\vec{x}}(\alpha_1),\dots,f_{\vec{x}}(\alpha_{n}))
$$
where $f_{\vec{x}}(X)=x_0+x_1\cdot X+\dots+x_{k-1}\cdot X^{k-1}$.
This also implies that the Reed-Solomon code is a linear code with a Vandermonde matrix as its generator matrix.
Property:
$\text{RS}_q(\vec{\alpha}, n, k)$ is a linear code, and the generator matrix is the $n\times k$ Vandermonde matrix with rows corresponding to $\alpha_1, \alpha_2, \dots, \alpha_n$.
Proof:
The proof is clear when we write down the generator matrix.
In this generator-matrix view, we have $\dim(\text{RS}(n, k))=k$ since the Vandermonde matrix has rank $k$.
Then we can easily compute the distance of this linear code.
Property:
$\text{dist}(\text{RS}_q(\vec{\alpha}, n, k))=n-k+1$.
Proof:
- Since the RS code is linear, $\text{dist}(\text{RS}_q(n, k))=\min_{c\in \text{RS}, c\ne 0}\text{wt}(c)$.
- It suffices to show that the maximum number of zeros in any non-zero codeword is $k-1$; then every non-zero codeword has weight at least $n-(k-1)=n-k+1$.
- Indeed, the number of zeros in a non-zero codeword is the number of roots (among the evaluation points) of a non-zero polynomial of degree at most $k-1$, which is at most $k-1$.
- $\blacksquare$
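To make this concrete, here is a toy check in Python with assumed parameters $q=5$, $n=4$, $k=2$ and evaluation points $1,2,3,4$ (my choice, not from the lecture), brute-forcing all codewords to confirm the dimension and the distance $n-k+1$:

```python
# A toy RS code over F_5 (assumed parameters: q = 5, n = 4, k = 2, evaluation
# points 1, 2, 3, 4), brute-forcing the distance to confirm d = n - k + 1.
from itertools import product

q, n, k = 5, 4, 2
alphas = [1, 2, 3, 4]

def encode(msg):
    # coordinate i is f(alpha_i); this is exactly the Vandermonde generator
    # matrix (V_ij = alpha_i^j) applied to the message vector
    return tuple(sum(m * pow(a, j, q) for j, m in enumerate(msg)) % q
                 for a in alphas)

codewords = {encode(m) for m in product(range(q), repeat=k)}
assert len(codewords) == q ** k          # dimension k: q^k distinct codewords
dist = min(sum(x != y for x, y in zip(c1, c2))
           for c1 in codewords for c2 in codewords if c1 != c2)
print("distance =", dist, "= n - k + 1 =", n - k + 1)  # both are 3
```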
The distance of RS codes achieves the Singleton bound, so they are optimal for any $n$ and $k$ we choose.
Corollary:
RS codes exactly meet the Singleton bound.
Proof Sketch:
- To prove the distance is $n-k+1$, it suffices to show the code can correct any $n-k$ erasures.
- For a codeword, any $n-k$ erasures leave us $k$ remaining non-erased positions, which corresponds to $k$ rows of the generator matrix, forming a $k\times k$ submatrix.
- We can recover the message $\vec{x}$ if and only if this submatrix is invertible, and it is: any $k$ rows of the Vandermonde generator matrix form a square Vandermonde matrix with distinct points, which is invertible by the earlier theorem.
- $\blacksquare$
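Continuing the same toy RS$(4,2)$ example over $\mathbb{F}_5$ (again my parameters, not the lecture's), here is a minimal erasure-decoding sketch: erase $n-k$ positions and recover the message from the surviving $k$ coordinates by solving the $k\times k$ Vandermonde system, here $2\times 2$, by hand:

```python
# Erasure decoding for the toy RS(n=4, k=2) code over F_5: erase n - k
# positions and recover the message from the k survivors.
q, alphas = 5, [1, 2, 3, 4]
msg = (3, 2)                                   # f(X) = 3 + 2X
codeword = [(msg[0] + msg[1] * a) % q for a in alphas]

survivors = [(alphas[i], codeword[i]) for i in (0, 3)]  # positions 1, 2 erased
(a1, y1), (a2, y2) = survivors
f1 = (y2 - y1) * pow(a2 - a1, -1, q) % q       # recover the slope
f0 = (y1 - f1 * a1) % q                        # recover the constant term
assert (f0, f1) == msg
print("recovered message:", (f0, f1))
```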
We have seen that the RS code is MDS, and since its generator matrix is a Vandermonde matrix, the Vandermonde submatrix properties above also apply to RS codes.
Discussion of Two Bounds (cont.)
We previously compared the distance-rate trade-offs given by the Plotkin bound and the Singleton bound.
Both bounds are impossibility results, and the Plotkin bound is strictly better than the Singleton bound for any fixed $q\ge 2$.
This suggests that the Singleton bound should never be achievable. However, MDS codes are defined as codes that achieve the Singleton bound.
Why don’t MDS codes violate the Singleton bound?
The key lies in the fact that the above figure only applies for a fixed $q$.
However, in RS codes, the alphabet size $q$ is NOT fixed—it’s growing with $n$.
Thus, to get an MDS code, $q$ must grow with $n$.
How big does $q$ have to be?
It is an open question in general!
Dual View of RS Codes
What is the parity-check matrix of an RS code?
Before getting to this, we need to recall some preliminary algebra.
Multiplicative Group $\mathbb{F}_q^*$:
The set $\mathbb{F}_q^*$ is the multiplicative group of non-zero elements in $\mathbb{F}_q$, i.e. $\mathbb{F}_q^*=\mathbb{F}_q\backslash\{0\}$. $\mathbb{F}_q^*$ is CYCLIC, which means there's some $\gamma\in \mathbb{F}_q^*$ so that $$ \mathbb{F}_q^*=\{\gamma, \gamma^2, \dots, \gamma^{q-1}\} $$ where $\gamma$ is called a Primitive Element of $\mathbb{F}_q$.
Example:
- 2 is a primitive element of $\mathbb{F}_5$, and $\mathbb{F}_5^*=\{2, 2^2=4, 2^3=3, 2^4=1\}$.
- 4 is NOT a primitive element, since $4^2=1,4^3=-1, 4^4=1, 4^5=-1, \dots$, and we will never generate 2 or 3 as a power of 4.
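A quick brute-force check of this example (a sketch I added for $\mathbb{F}_5$):

```python
# Check which elements of F_5^* are primitive: gamma is primitive iff its
# powers gamma^1, ..., gamma^{q-1} cover all of F_5^*.
q = 5
for g in range(1, q):
    powers = {pow(g, j, q) for j in range(1, q)}
    status = "primitive" if powers == set(range(1, q)) else "not primitive"
    print(g, sorted(powers), status)
# 2 and 3 are primitive; 1 and 4 are not (4 only generates {1, 4}).
```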
Lemma:
For any integer $d$ with $0<d<q-1$, the sum of all $d$-th powers in $\mathbb{F}_q^*$ is zero, i.e.
$$ \sum_{\alpha\in \mathbb{F}_q}\alpha^d =0$$
Proof:
- $\sum_{\alpha\in \mathbb{F}_q}\alpha^d=\sum_{\alpha\in \mathbb{F}_q^*}\alpha^d$
- We are leaving out $\alpha=0$ since $0^d$ contributes 0 to the sum.
$=\sum_{j=0}^{q-2}(\gamma^j)^d$ for a primitive element $\gamma$.
- The index starts from 0 since $\gamma^0=\gamma^{q-1}=1$.
$=\sum_{j=0}^{q-2}(\gamma^d)^j$ for a primitive element $\gamma$.
- We are switching the order of the exponents.
$={1-(\gamma^d)^{q-1}\over 1-\gamma^d}$
- Fact: For any $x\ne 1$, it follows that $(1-x)\cdot (\sum_{j=0}^{n-1}x^j)=1-x^n$, and so $\sum_{j=0}^{n-1}x^j={1-x^n\over 1-x}$ for any $n$. Here $x=\gamma^d\ne 1$ since $\gamma$ is primitive and $0<d<q-1$.
$={1-1\over 1-\gamma^d}=0$
- $(\gamma^d)^{q-1}\cdot \gamma^d = (\gamma^d)^q=\gamma^d$ by Fermat's little theorem.
- $\gamma^d\ne0$ since $\gamma\ne 0$.
- So $(\gamma^d)^{q-1}=1$.
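The lemma is easy to verify numerically; here is a quick check over $\mathbb{F}_5$ (my example), including the excluded case $d=q-1$:

```python
# Numerical check of the lemma over F_5: for 0 < d < q - 1 the sum of all
# d-th powers vanishes, while the excluded case d = q - 1 gives -1.
q = 5
for d in range(1, q):
    s = sum(pow(a, d, q) for a in range(q)) % q
    print(f"d = {d}: sum of d-th powers = {s}")
# d = 1, 2, 3 give 0 as the lemma promises; d = 4 gives 4 = -1 (mod 5),
# since every non-zero term equals 1 by Fermat's little theorem.
```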
Now, we can view the RS code from a new perspective, which lets us describe the parity-check matrix of the RS code.
Proposition:
Let $n=q-1$, and let $\gamma$ be a primitive element of $\mathbb{F}_q$. Then $$ \text{RS}_q((\gamma^0, \gamma^1, \dots, \gamma^{n-1}),n, k)=\{(c_0, c_1, \dots, c_{n-1})\in \mathbb{F}_q^n:c(\gamma^j)=0 \text{ for }j=1, 2, \dots, n-k\}, $$ where we identify a vector $c$ with the polynomial $c(X)=\sum_{\ell=0}^{n-1}c_\ell X^\ell$.
This proposition kind of flips things around.
In the original definition, the messages are coefficients of some polynomial and the codewords are evaluations of that polynomial.
In this view, the codewords themselves are coefficients of some polynomial, and membership in the code is characterized by evaluations of that polynomial.
Proof:
First, let’s prove one direction of this proposition.
- Let $f(X)=\sum_{i=0}^{k-1}f_i\cdot X^i$ be a message, so the RS codeword is $c=(c_0,\dots,c_{n-1})$ with $c_\ell=f(\gamma^\ell)$. We check that $c(\gamma^j)=0$ for each $j=1,\dots,n-k$:
- $c(\gamma^j)=\sum_{\ell=0}^{n-1}c_{\ell}\cdot \gamma^{j\cdot \ell}$
- $=\sum_{\ell=0}^{n-1}(\sum_{i=0}^{k-1}f_i\cdot \gamma^{i\cdot \ell})\cdot \gamma^{j\cdot \ell}$
- $=\sum_{\ell=0}^{n-1}\sum_{i=0}^{k-1}f_i\cdot \gamma^{(i+j)\cdot \ell}$
- $=\sum_{i=0}^{k-1}\sum_{\ell=0}^{n-1}f_i\cdot \gamma^{(i+j)\cdot \ell}$ (switching the order of summation)
- $=\sum_{i=0}^{k-1}f_i\cdot \sum_{\ell=0}^{n-1}\gamma^{(i+j)\cdot \ell}$
- $=\sum_{i=0}^{k-1}f_i\cdot \sum_{\ell=0}^{n-1}(\gamma^{\ell})^{i+j}$
- $0\le i\le k-1$ and $1\le j\le n-k$
- Thus, $1\le i+j\le n-1<q-1$. (strictly less than $q-1$)
- Apply the aforementioned lemma to have $\sum_{\ell=0}^{n-1}(\gamma^{\ell})^{i+j}=0$
- $=0$.
- $\text{RS}_q((\gamma^0, \dots, \gamma^{n-1}),n, k)\\ \subseteq\{(c_0, c_1, \dots, c_{n-1})\in \mathbb{F}_q^n:c(\gamma^j)=0 \text{ for }j=1, 2, \dots, n-k\}$
This shows one direction of the proposition: the RS code is contained in the above set.
To prove the equality, we need to count dimensions.
We will show the set also has dimension $k$.
Let $H$ be a matrix of this form: $$ H=\left[ \begin{array}{c} 1 & \gamma & \gamma^2 & \dots & \gamma^{n-1} \\ 1 & \gamma^2 & \gamma^4 & \dots & \gamma^{2(n-1)} \\ \vdots \\ 1 & \gamma^{n-k} & \gamma^{2(n-k)} & \dots & \gamma^{(n-k)(n-1)} \\ \end{array} \right]\in \mathbb{F}_q^{(n-k)\times n} $$
Consider $H\cdot c$ for an arbitrary vector $c\in \mathbb{F}_q^n$:
- The $j$'th entry is $\sum_{i=0}^{n-1}c_i\cdot \gamma^{ji}=c(\gamma^j)$.
This implies that the set $\{(c_0, c_1, \dots, c_{n-1})\in \mathbb{F}_q^n:c(\gamma^j)=0 \text{ for }j=1, 2, \dots, n-k\}$ is exactly $\text{Ker}(H)$.
This set has dimension $n-(n-k)=k$: the first $n-k$ columns of $H$ form an invertible $(n-k)\times(n-k)$ Vandermonde matrix in the distinct points $\gamma^1,\dots,\gamma^{n-k}$, so $H$ has full rank $n-k$.
The proposition also answers our question about the parity-check matrix of RS codes: $H$ is a parity-check matrix for $\text{RS}_q((\gamma^0, \dots, \gamma^{n-1}), n, k)$.
Notice that the dual code $\text{RS}(n, k)^\perp$ has a generator matrix $H^T$, which again looks a lot like a Vandermonde matrix.
It suggests that the dual of the RS code is basically an RS code.
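Here is a small verification of this dual view (with assumed parameters $q=5$, $\gamma=2$, $n=q-1=4$, $k=2$, matching the setting of the proposition):

```python
# Checking the dual view over F_5 (gamma = 2 is primitive, n = q - 1 = 4,
# k = 2): every RS codeword, read as a polynomial c(X), vanishes at
# gamma^1, ..., gamma^{n-k} -- i.e. H c = 0 for the matrix H above.
from itertools import product

q, gamma, n, k = 5, 2, 4, 2
evals = [pow(gamma, i, q) for i in range(n)]   # points gamma^0, ..., gamma^3

for msg in product(range(q), repeat=k):
    c = [sum(m * pow(a, t, q) for t, m in enumerate(msg)) % q for a in evals]
    for j in range(1, n - k + 1):              # the n - k parity checks
        assert sum(ci * pow(gamma, j * i, q) for i, ci in enumerate(c)) % q == 0
print("every RS codeword lies in Ker(H)")
```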
This particular derivation of the proposition used the choice of evaluation points heavily. However, a statement like this is true in general.
More precisely, we define generalized RS (GRS) codes with an additional parameter vector $\vec{\lambda}$:
Definition:
Given distinct $\alpha_1,\dots,\alpha_n\in \mathbb{F}_q$ and non-zero $\lambda_1,\dots,\lambda_n\in \mathbb{F}_q$, $$ \text{GRS}_q(\vec{\alpha}, n, k, \vec{\lambda})=\{(\lambda_1\cdot f(\alpha_1),\dots,\lambda_n\cdot f(\alpha_n)):f\in \mathbb{F}_q[X],\ \deg f\le k-1\}. $$
The GRS code is almost identical to the RS code, except for the scaling factors $\lambda_i$ applied to each coordinate.
The introduction of $\vec{\lambda}$ enables the property we've just seen: taking the transpose of the parity-check matrix of a GRS code yields a generator matrix for another GRS code.
Theorem:
For any GRS code, there exists a non-zero vector $\vec{\lambda}'$ such that $$ \text{GRS}_q(\vec{\alpha}, n, k, \vec{\lambda})^\perp=\text{GRS}_q(\vec{\alpha}, n, n-k, \vec{\lambda}'). $$
That is, the dual of a GRS code is again a GRS code on the same evaluation points.