Section The Hamming Metric
In our society, a great deal of information is communicated electronically. Bank transactions, television programs, military communications, cell phone calls, digital images, and almost any interchange one can think of either can be or is digitized and transmitted electronically. In many situations we need to compare one set of data to another (e.g., Internet searches for text strings or image matches, DNA strands), and metrics are often used for this purpose. Computers work in a binary system, that is they recognize only zeros and ones. So a digital text message is a string of zeros and ones. That is, a digital message is a collection of elements in the space \(X^n\) for some positive integer \(n\text{,}\) where \(X = \{0,1\}\text{.}\) Each element in \(X^n\) is called a word - that is, a word is an element in \(X^n\) denoted in the form \((x_1, x_1, \ldots,
x_n)\text{.}\) Just like in the English language, where not every combination of letters corresponds to words that make sense, not every word is recognizable as part of an intelligible message. We might, for example, code the letters of the alphabet by assigning numbers 1-26 to the letters, then make them elements of \(X^n\) by converting to binary. The collection of all intelligible words is called a code. So a code is just some subset of \(X^n\) that all parties agree are sensible words. The words in a code are called code words. To deal with problems that occur in transmitting digital messages, like scrambling (encoding) messages, unscrambling (decoding) messages, and detecting and correcting errors in messages, it is useful to have a way to measure distance between words. One way is to use the Hamming metric.
Definition 4.1.
Let \(x = (x_1, x_2, \ldots,
x_n)\) and \(y = (y_1, y_2, \ldots,
y_n)\) be words in \(X^n\text{.}\) The Hamming distance \(d_H\) between \(x\) and \(y\) is
\begin{equation*}
d_H(x,y) = \sum_{i=1}^n | x_i-y_i |\text{.}
\end{equation*}
Recall that for each \(i\) both \(x_i\) and \(y_i\) are either 0 or 1. So
\begin{equation*}
| x_i-y_i | = \begin{cases}0 \amp \text{ if } x_i=y_i \\ 1 \amp \text{ if } x_i \neq y_i. \end{cases}
\end{equation*}
In other words, \(d_H(x,y)\) counts the number of components at which \(x\) and \(y\) are different.
Activity 4.1.
(a)
Explain why \(d_H\) is a metric.
(b)
Suppose we create a code
\begin{equation*}
C = \{c_1,c_2,c_3,c_4,c_5,c_6,c_7,c_8\}
\end{equation*}
in \(X^6\text{,}\) where
\begin{align*}
c_1 \amp = (0,0,0,0,0,0) \amp c_2 \amp =( 0,0,0,0,1,1) \amp c_3 \amp = (0,0,0,1,0,1)\\
c_4 \amp = (0,0,1,0,0,1) \amp c_5 \amp = (0,0,0,1,1,0) \amp c_6 \amp = (0,0,1,0,1,0)\\
c_7 \amp = (0,0,1,1,0,0) \amp c_8 \amp = (0,0,1,1,1,1) \amp \amp
\end{align*}
That is, the words \(c_1\text{,}\) \(c_2\text{,}\) \(c_3\text{,}\) \(c_4\text{,}\) \(c_5\text{,}\) \(c_6\text{,}\) \(c_7\text{,}\) \(c_8\) are the only words that can comprise a message. Find \(d_H(c_2, c_8)\text{.}\)
(c)
Suppose we are on the receiving end of the message
\begin{equation}
(0,0,0,1,1,1) \ (0,0,1,1,0,0) \ (1,0,0,0,0,0) \ (0,0,0,0,1,1) \ (0,0,1,0,0,1)\text{.}\tag{4.1}
\end{equation}
(i)
How do we know that an error has occurred in transmission of the message we received?
(ii)
To correct the errors in this received message, we replace the incorrect words with the code word(s) in \(C\) closest to them. Correct this message. (Note that there may be more than one possible substitution. Find all of the possibilities.)