「Algebraic ECCs」: Lec1 Basics of ECCs

In this series, I will be learning Algebraic Error Correcting Codes, lectured by Mary Wootters. The lecture videos are available here. Feedback and suggestions are always welcome! ^ - ^

Topics Covered:

  • Basic problem in coding theory
  • Code and codeword
  • Hamming distance and minimum distance
  • Rate
  • Hamming bound on trade-off of the rate and distance

Introduction

This course is named “Algebraic Error Correcting Codes”, a name with two parts:

  1. Error Correcting Codes (ECCs): ECCs are a fundamental tool for communication, storage, complexity theory, algorithm design, cryptography, pseudorandomness, and more.
  2. Algebraic: algebraic techniques are a fundamental tool for designing ECCs.

Basic Problem in Coding Theory

Let’s start with the basic problem in coding theory:

We encode a message $x$ of length $k$ into a codeword $c$ of length $n$, which adds some redundancy when $n>k$. When we transmit or store this codeword, something bad might happen, i.e. some bits might be corrupted. Thus, we are left with a corrupted codeword $\tilde{c}$.

image.png

The goal is to find (something about) the message $x$ given the corrupted codeword $\tilde{c}$.

This has many applications, including communication and storage:

Application 1: Communication

image.png
  1. Suppose Alice has a message $x\in \{0,1\}^{k}$ and wants to send it to Bob.
  2. In order to add some redundancy, she encodes the message into a codeword $c$ and sends the codeword through a noisy channel.
  3. After Alice passes $c$ through the noisy channel, Bob receives $\tilde{c}$ and tries to figure out the original message $x$ that Alice intended to send.

Application 2: Storage

image.png
  1. Suppose $x$ is a file we want to store.
  2. Instead of storing $x$ directly, we encode it as a codeword $c$, introducing redundancy.
  3. If $c$ is stored on a CD or in a RAID array, something bad, like a fire, might happen.
  4. However, the owner still wants to recover the original file $x$.

Based on the above two applications, we summarize four things we care about in a coding scheme:

Things We Care About (Take 1):

  1. We should be able to handle SOMETHING BAD.

  2. We should be able to recover WHAT WE WANT TO KNOW about $x$.

  3. We want to MINIMIZE OVERHEAD.

    Jumping ahead, the overhead is measured by the rate $k/n\in (0, 1]$, which should be as big as possible.

  4. We want to do all this EFFICIENTLY.

The question about these things is: what is the best trade-off between 1-4?

The trade-off depends on how we model things:

  1. What is something bad?
  2. What exactly do we want to know about $x$?
    All of $x$ or the first bit of $x$?
  3. What counts as efficient?

This lecture gives one way of answering these questions; there are many other legitimate ways.

Now, let’s give some formal definitions for a code.

Code

Let $\Sigma$ be any finite set and let $n>0$ be a positive integer.

Code:
A code $\mathcal{C}$ of block length $n$ over an alphabet $\Sigma$ is a subset $\mathcal{C}\subseteq\Sigma^n$. An element $c\in \mathcal{C}$ is called a codeword.
Sometimes, block length is referred to simply as length.

So far, this is not a very interesting definition. An illustrative example is provided below.

Example 1:

image.png

Another example is slightly more interesting.

Example 2:

The following is a set of 7 vectors, which forms a binary code of length 4 over $\Sigma=\{0, 1\}$.

image.png

We can relate this code to the communication between Alice and Bob mentioned earlier.

image.png

Thus, the code $\mathcal{C}$ is the image of this map, i.e. $\mathcal{C}=\mathrm{Im(ENC)}$. In other words, $\mathcal{C}$ is the set of all codewords that can be obtained using this encoding map.

Example 2 can actually be used to correct one erasure (something bad).

An erasure means we know which bit got erased, but we don’t know its original value. Based on the map definition, the missing bit must be 1.

image.png

Furthermore, it can also be used to detect one error (something bad).

An error means we know one bit may have been changed, but we don’t know which one.

image.png

To sum up, we say that the code in Example 2 can correct one erasure or detect one error. But it cannot correct one error.

Let’s see a code that can correct one error.

Example 3:

image.png

Given this encoding map, we can define a code $\mathcal{C}=\mathrm{Im(ENC)}$. Hence, $\mathcal{C}\subseteq\{0,1\}^7$ is a binary code of length 7.

This code has a nice visualization as three overlapping circles.

image.png

We place the message bits $x_1, x_2, x_3, x_4$ in the middle, and the remaining spaces in the circles correspond to the parity bits, each of which is the sum (mod 2) of the other bits in its circle. This ensures that the bits in each circle sum to 0 (mod 2).

With these parity bits, this code can correct one error.
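To make this concrete, here is a minimal Python sketch of such an encoder. The particular assignment of message bits to circles is an assumption for illustration; the lecture's figure may place the bits differently.

```python
def encode(x1, x2, x3, x4):
    """Encode 4 message bits into a 7-bit codeword by appending 3 parity bits."""
    p1 = (x1 + x2 + x4) % 2  # parity bit completing the first circle
    p2 = (x1 + x3 + x4) % 2  # parity bit completing the second circle
    p3 = (x2 + x3 + x4) % 2  # parity bit completing the third circle
    # In a valid codeword, the bits in each circle sum to 0 mod 2.
    return (x1, x2, x3, x4, p1, p2, p3)
```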


With these circles in mind, we can solve the following puzzle more easily.

image.png

In a legit codeword, all three circles should sum to 0. Here, both the green and red circles are messed up while the blue circle is correct, so the flipped bit must lie in both the green and red circles but not in the blue one. Therefore, $\tilde{c}_3$ must be flipped.

But this circle-based solution seems pretty ad hoc. To make it less so, we’ll introduce more definitions that formalize the solution and flesh out the four things we care about for ECCs mentioned before.

Hamming Distance

We first give the definition of the Hamming distance, which is the number of positions in which two vectors $x, y\in \Sigma^n$ differ.

Hamming Distance:
The Hamming Distance between $x,y\in \Sigma^n$ is $$ \Delta(x,y)=\sum_{i=1}^n \mathbb{1}(x_i\ne y_i) $$

Then we can define the relative Hamming distance, which is the Hamming distance normalized by $n$: the fraction of positions in which two vectors differ.

Relative Hamming Distance:
The Relative Hamming Distance between $x,y\in \Sigma^n$ is $$ \delta(x,y)=\frac1 n \sum_{i=1}^n \mathbb{1}(x_i\ne y_i)=\frac{\Delta(x,y)}{n} $$

Another useful definition is the minimum distance, which is the minimum Hamming distance over all distinct pairs of codewords. Jumping ahead, we will see that a code with large minimum distance can be used to correct errors.

Minimum Distance:
The Minimum Distance of a code $\mathcal{C}\subseteq \Sigma^n$ is $$ \min_{c\ne c'\in \mathcal{C}}\Delta(c,c') $$ Sometimes, the minimum distance is referred to simply as distance.
Aside: The Hamming distance is a metric; in particular, it obeys the triangle inequality $\Delta(x, z)\le \Delta(x, y) + \Delta(y, z)$.
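As a quick sanity check of these definitions, here is a small Python sketch that computes the Hamming distance, the relative Hamming distance, and the minimum distance of a code given as an explicit list of codewords (only feasible for small codes).

```python
from itertools import combinations

def hamming_distance(x, y):
    """Δ(x, y): number of positions in which x and y differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def relative_distance(x, y):
    """δ(x, y): fraction of positions in which x and y differ."""
    return hamming_distance(x, y) / len(x)

def minimum_distance(code):
    """Minimum Hamming distance over all distinct pairs of codewords."""
    return min(hamming_distance(c, cp) for c, cp in combinations(code, 2))
```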

Claim:

The code in Example 3 has minimum distance 3.

We can convince ourselves the claim is true with the circles in mind, but we’ll see a much less ad hoc way to prove the distance in Lecture 2.

If this claim is true, the triangle inequality for the Hamming distance explains why that code can correct one error.

A minimum distance of 3 means that the Hamming distance $\Delta(c,c')\ge 3$ for every distinct pair $c, c'\in \mathcal{C}$.

As depicted in the following figure, suppose the corrupted codeword we received is $\tilde{c}$ and the correct codeword is $c$. There is also another codeword $c'\in \mathcal{C}$.

image.png

We know that $\tilde{c}$ has exactly one error, so the Hamming distance $\Delta(\tilde{c}, c)=1$. The triangle inequality

$$
3\le \Delta(c,c')\le \Delta(c, \tilde{c}) + \Delta(\tilde{c}, c')
$$

then tells us that the Hamming distance $\Delta(\tilde{c}, c')\ge 2$ for every codeword $c'$ other than the correct codeword $c$.

Thus, the correct codeword $c\in \mathcal{C}$ is uniquely defined by “the one that is closest to $\tilde{c}$”.

Minimum Distance: Proxy for Robustness

The point of this discussion is that the minimum distance is a reasonable proxy for robustness.

To be more specific, in Example 2, the code had minimum distance 2 and could correct 1 erasure and detect 1 error.

In Example 3, the code had minimum distance 3 and could correct 1 error.

More generally, a code with minimum distance $d$ can:

  1. correct $\le d-1$ erasures
  2. detect $\le d-1$ errors
  3. correct $\lfloor \frac{d-1}2\rfloor$ errors

For points 1 and 3, the (inefficient) algorithm is “if you see $\tilde{c}$, return the $c\in \mathcal{C}$ that’s closest to $\tilde{c}$.”

For point 2, the (inefficient) algorithm is “if $\tilde{c}\notin \mathcal{C}$, report that something went wrong.”
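The following is a minimal Python sketch of these two (inefficient) procedures, treating the code as an explicit set of codewords; the function names are illustrative. The running time is linear in $|\mathcal{C}|$, which is exponential in the message length.

```python
def hamming_distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def decode_nearest(code, received):
    """Correct up to floor((d-1)/2) errors: return the closest codeword."""
    return min(code, key=lambda c: hamming_distance(c, received))

def detect_errors(code, received):
    """Detect up to d-1 errors: report corruption if the word is not a codeword."""
    return tuple(received) not in code
```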

To understand why the algorithm works, we look at the following picture:

image.png
  1. Red Points represent the codewords $c, c’\in \mathcal{C}$.

  2. Orange Circles correspond to the Hamming balls of radius $\lfloor \frac{d-1}{2} \rfloor$ centered at codewords. These Hamming balls are disjoint because the minimum distance is $d$.

    The Hamming ball around a codeword $c$ is the set of points $\{x\in \Sigma^n:\Delta(c, x)\le \lfloor\frac{d-1}{2} \rfloor\}$.
  3. Green Circles correspond to the Hamming balls of radius $d-1$ centered at codewords. These Hamming balls are not disjoint, but they each contain exactly one codeword.

The Hamming balls help explain the (inefficient) algorithms we mentioned before.

  • Correct $\lfloor \frac{d-1}2\rfloor$ errors:
    If $c$ is the “correct” codeword (the left red point) and $\le \lfloor \frac{d-1}{2}\rfloor$ errors are introduced, we may end up with $\tilde{c}_1$ (the purple point) within its orange circle. Since the orange circles are disjoint, the algorithm can recover the codeword $c$ from $\tilde{c}_1$.

  • Detect $d-1$ errors:

    If $\le d-1$ errors are introduced, we may end up with $\tilde{c}_2$ (the pink point) within its green circle. Now it’s possible that $\tilde{c}_2$ came from $c$ or from $c'$; we can’t tell which. However, each green circle contains exactly one codeword, so every other codeword lies outside this ball. Thus, whenever at least one (and at most $d-1$) error occurs, $\tilde{c}_2$ is not a codeword, and we can tell that something went wrong.

  • Correct $d-1$ erasures:

    If $\le d-1$ erasures are introduced in $\tilde{c}$, suppose the candidate codewords are the correct codeword $c$ and some other $c'$. We erase the corresponding positions in both candidates $c$ and $c'$. Since the Hamming distance between $c$ and $c'$ is at least $d$ and at most $d-1$ positions were erased, there must be at least one non-erased position where $c$ and $c'$ differ. This differing position lets us rule out $c'$ and identify the correct codeword $c$ (see the sketch after this list).
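Here is a minimal Python sketch of this erasure-correction procedure, treating the code as an explicit set of codewords and marking erased positions with None (the names are illustrative, not from the lecture).

```python
def correct_erasures(code, received):
    """received marks erased positions with None; assumes at most d-1 erasures."""
    candidates = [c for c in code
                  if all(r is None or r == ci for ci, r in zip(c, received))]
    # With at most d-1 erasures, only the correct codeword is consistent
    # with the non-erased positions, so exactly one candidate remains.
    assert len(candidates) == 1
    return candidates[0]
```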

Returning to the things we care about, we can now clarify the first two things:

Things We Care About (Take 2):

  1. We should be able to handle $\lfloor \frac{d-1}{2}\rfloor$ worst-case errors or $d-1$ worst-case erasures.

  2. We should be able to recover all of $x$ (i.e. correct the errors or erasures).

  3. We want to MINIMIZE OVERHEAD.

    Jumping ahead, the overhead is measured by the rate $k/n\in (0, 1]$, which should be as big as possible.

  4. We want to do all this EFFICIENTLY.

Moreover, the first two things can be combined into a single parameter, the minimum distance $d$, which we want to be as large as possible.

Aside:
In this class, we mostly focus on the worst-case model (also called the “Hamming model” or “adversarial model”). We will briefly discuss the random-error model (also called the “Shannon model” or “stochastic model”).

Rate

Moving on to point 3, what do we mean by “overhead”?

Dimension:
The Message Length (sometimes called Dimension) of a code $\mathcal{C}$ over an alphabet $\Sigma$ is defined to be $$ k=\log_{|\Sigma|}|\mathcal{C}| $$

This definition is in line with the encoding operation

image.png

which encodes a message $x$ of length $k$ over $\Sigma$ into a codeword $c\in \mathcal{C}$.

This map assigns each possible message to a distinct codeword, so we have $|\Sigma|^k=|\mathcal{C}|$, i.e. $k=\log_{|\Sigma|}|\mathcal{C}|$.

Rate:
The Rate of a code $\mathcal{C}\subseteq \Sigma^n$ with block length $n$ over an alphabet $\Sigma$ is $$ \mathcal{R}=\frac{\log_{|\Sigma|}|\mathcal{C}|}{n}=\frac{\text{message length }k}{\text{block length }n} $$

The rate captures the notion of overhead: minimizing the overhead means maximizing the rate.

So if $\mathcal{R}$ is close to $1$, it means not too much overhead. And if $\mathcal{R}$ is close to 0, it means lots of overhead. We want the rate to be as close to 1 as possible.

Now we can refer to a code by these parameters:

Code:
A code with distance $d$, message length $k$, block length $n$, and alphabet $\Sigma$ is called an $(n, k, d)_{\Sigma}$ code.

After clarifying the things we care about, a question arises:

What is the best trade-off between rate and distance?

This question is still open for binary codes!

Hamming Bound: Rate vs. Distance

The Hamming bound gives us one bound on the trade-off between rate and distance. It establishes limitations on how good this trade-off can be.

Let’s return to the picture we had before, with $|\mathcal{C}|$ disjoint Hamming balls of radius $\lfloor \frac{d-1}{2}\rfloor$ in the space $\Sigma^n$.

image.png

The idea of the Hamming bound is: there can’t be too many of them, or they wouldn’t all fit in the space $\Sigma^n$.

To be more precise, we define the Hamming ball as follows.

Hamming Ball:
The Hamming Ball in $\Sigma^n$ of radius $e$ about some point $x\in \Sigma^n$ is $$ B_{\Sigma^n}(x, e)=\{y\in \Sigma^n: \Delta(x, y)\le e\} $$ The Volume of $B_{\Sigma^n}(x, e)$ is denoted by $\mathrm{Vol_{|\Sigma|}}(e, n)=|B_{\Sigma^n}(x, e)|$, representing the number of points in the Hamming ball.
Notice that the right-hand side, $|B_{\Sigma^n}(x, e)|$, mentions $x$, while the volume notation only refers to the radius $e$ and the block length $n$. This is because the volume does not depend on $x$.
Note: the $\Sigma$ is sometimes dropped from the notation $B_{\Sigma^n}(x, e)$. Note: the radius is sometimes given as a relative radius, as in $B_{\Sigma^n}(x, e/n)$.

Say that $|\Sigma|=q$; then we can express the volume of a $q$-ary Hamming ball as

$$
\mathrm{Vol}_q(e, n) = 1 + \binom{n}{1}\cdot (q-1) + \binom{n}{2}\cdot (q-1)^2 + \dots + \binom{n}{e}(q-1)^e
$$

where the term $1$ counts the center itself, the term $\binom{n}{1}\cdot (q-1)$ counts the points at Hamming distance exactly 1 from the center, and so on up to distance $e$.
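As a quick check of this formula, here is a small Python sketch that evaluates the volume of a $q$-ary Hamming ball.

```python
from math import comb

def vol(q, e, n):
    """Number of points of Σ^n within Hamming distance e of a fixed center, |Σ| = q."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))

print(vol(2, 1, 7))  # 8, which reappears in the Hamming bound example below
```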

Armed with these definitions, we can formalize the bound.

If a code $\mathcal{C}\subseteq \Sigma^n$ has distance $d$ and message length $k$, where $|\Sigma|=q$, we have

$$
|\mathcal{C}|\cdot \mathrm{Vol}_q\left(\left\lfloor \frac{d-1}{2}\right \rfloor, n\right) \le q^n
$$

The LHS corresponds to the total volume taken up by the $|\mathcal{C}|$ disjoint Hamming balls, while the RHS is the volume of the whole space $\Sigma^n$.

Taking logs base $q$ of both sides, it can be written as

$$
\log_q |\mathcal{C}|+\log_q \left(\mathrm{Vol}_q\left(\left\lfloor \frac{d-1}{2}\right \rfloor, n\right)\right)\le n
$$

Since $\log_q|\mathcal{C}|=k$, this gives a bound on the rate in terms of the distance.

Hamming Bound:
$$ \text{Rate}=\frac{k}{n} \le 1-\frac{\log_q \left(\mathrm{Vol}_q\left(\left\lfloor \frac{d-1}{2}\right \rfloor, n\right)\right)}{n} $$

Back to Example 3, which was a $(n=7, k=4, d=3)_{q=2}$ code.

We have $\lfloor\frac{d-1}{2}\rfloor=1$, $\mathrm{Vol}_2(1, 7) = 1 + \binom{7}{1}\cdot 1 = 8$.

So we have rate bounded by $\frac k n \le 1 - \frac{\log_2(8)}{n} = 1 - \frac 3 7 = \frac 4 7$.

In fact, the rate $\frac k n= \frac 4 7$, so in this case the Hamming bound is tight!

This means the code achieves the best possible rate among codes with this block length and minimum distance.

When the Hamming bound is tight, we say the code is perfect.
Example 3 (which is perfect) is a special case of something called a Hamming Code.
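For reference, here is a small Python sketch that evaluates the Hamming bound (re-using the volume formula above) and reproduces the $4/7$ computation for Example 3.

```python
from math import comb, log

def vol(q, e, n):
    # same volume formula as in the sketch above
    return sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))

def hamming_bound_on_rate(q, n, d):
    """Upper bound on the rate k/n of a q-ary code with block length n and distance d."""
    return 1 - log(vol(q, (d - 1) // 2, n), q) / n

print(hamming_bound_on_rate(q=2, n=7, d=3))  # 0.571... = 4/7, achieved by Example 3
```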
