
Section 32 Bases for Vector Spaces

Subsection Application: Image Compression

If you painted a picture with a sky, clouds, trees, and flowers, you would use a different size brush depending on the size of the features. Wavelets are like those brushes.

―Ingrid Daubechies

The advent of the digital age has presented many new opportunities for the collection, analysis, and dissemination of information. Along with these opportunities come new difficulties as well. All of this digital information must be stored in some way and be retrievable in an efficient manner. One collection of tools that is used to deal with these problems is wavelets. For example, the FBI fingerprint files contain millions of cards, each of which contains 10 rolled fingerprint impressions. Each card produces about 10 megabytes of data. To store all of these cards would require an enormous amount of space, and transmitting one full card over existing data lines is slow and inefficient. Without some sort of image compression, a sortable and searchable electronic fingerprint database would be next to impossible. To deal with this problem, the FBI adopted standards for fingerprint digitization using a wavelet compression standard.

Another problem with electronics is noise. Noise can be a big problem when collecting and transmitting data. Wavelet decomposition filters data by averaging and detailing. The detailing coefficients indicate where the details are in the original data set. If some details are very small in relation to others, eliminating them may not substantially alter the original data set. Similar ideas may be used to restore damaged audio, video, photographs, and medical information.

We will consider wavelets as a tool for image compression. The basic idea behind using wavelets to compress images is that we start with a digital image, made up of pixels. Each pixel can be assigned a number or a vector (depending on the makeup of the image). The image can then be represented as a matrix (or a set of matrices) \(M\text{,}\) where each entry in \(M\) represents a pixel in the image. As a simple example, consider the \(16 \times 16\) image of a flower as shown at left in Figure 32.1. (We will work with small images like this to make the calculations more manageable, but the ideas work for any size image. We could also extend our methods to consider color images, but for the sake of simplicity we focus on grayscale.)

Figure 32.1. Left: A 16 by 16 pixel image. Right: The image compressed.

This flower image is a gray-scale image, so each pixel has a numeric representation between 0 and 255, where 0 is black, 255 is white, and numbers between 0 and 255 represent shades of gray. The matrix for this flower image is

\begin{equation} \left[ {\scriptsize \begin{array}{cccccccccccccccc} 240\amp 240\amp 240\amp 240\amp 130\amp 130\amp 240\amp 130\amp 130\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\\240\amp 240\amp 240\amp 130 \amp 175\amp 175\amp 130\amp 175\amp 175\amp 130\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\\ 240\amp 240\amp 130\amp 130\amp 175\amp 175\amp 130\amp 175\amp 175\amp 130\amp 130\amp 240\amp 240\amp 240\amp 240\amp 240 \\240\amp 130\amp 175\amp 175\amp 130\amp 175\amp 175\amp 175\amp 130\amp 175\amp 175\amp 130\amp 240\amp 240\amp 240\amp 240\\240\amp 240\amp 130\amp 175\amp 175\amp 130\amp 175\amp 130\amp 175 \amp 175\amp 130\amp 240\amp 240\amp 240\amp 240\amp 240\\255\amp 240\amp 240\amp 130\amp 130\amp 175\amp 175\amp 175\amp 130\amp 130\amp 240\amp 240\amp 225\amp 240\amp 240\amp 240\\240\amp 240 \amp 130\amp 175\amp 175\amp 130\amp 130\amp 130\amp 175\amp 175\amp 130\amp 240\amp 225\amp 255\amp 240\amp 240 \\240\amp 240\amp 130\amp 175\amp 130\amp 240\amp 130\amp 240\amp 130\amp 175\amp 130\amp 240\amp 255\amp 255\amp 255\amp 240\\240\amp 240\amp 240\amp 130\amp 240\amp 240\amp 75\amp 240\amp 240\amp 130\amp 240\amp 255\amp 255\amp 255\amp 255\amp 255\\240\amp 240\amp 240\amp 240\amp 240\amp 240 \amp 75\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\\240\amp 240\amp 240 \amp 75\amp 75\amp 240\amp 75\amp 240\amp 75\amp 75\amp 240\amp 240\amp 240\amp 240\amp 240\amp 240\\50\amp 240\amp 240\amp 240\amp 75\amp 240\amp 75\amp 240\amp 75\amp 240\amp 240\amp 240\amp 240\amp 50\amp 240\amp 240 \\240\amp 75\amp 240\amp 240\amp 240\amp 75\amp 75\amp 75\amp 240\amp 240\amp 50\amp 240\amp 50\amp 240\amp 240\amp 50\\240\amp 240\amp 75\amp 240\amp 240\amp 240\amp 75\amp 240\amp 240\amp 50\amp 240\amp 50\amp 240\amp 240\amp 50\amp 240\\75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\\75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75\amp 75 \end{array} }\right]\text{.}\tag{32.1} \end{equation}
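If Python with numpy and matplotlib is available, a matrix of intensity values like this can be rendered directly as a grayscale image. The sketch below uses a small 4 by 4 test matrix of our own rather than the flower itself; replacing it with the full matrix in (32.1) should reproduce the image at left in Figure 32.1.

import numpy as np
import matplotlib.pyplot as plt

# A small 4 by 4 test matrix of intensities (our own example, not part of the text).
M = np.array([[240, 240, 130, 130],
              [240, 130, 175, 130],
              [130, 175, 175, 130],
              [ 75,  75,  75,  75]])
plt.imshow(M, cmap='gray', vmin=0, vmax=255)  # 0 is black, 255 is white
plt.show()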

Now we can apply wavelets to the image and compress it. Essentially, wavelets act by averaging and differencing. The averaging creates smaller versions of the image and the differencing keeps track of how far the smaller version is from a previous copy. The differencing often produces many small (close to 0) entries, and so replacing these entries with 0 doesn't have much effect on the image (this is called thresholding). By introducing long strings of zeros into our data, we are able to store a (compressed) copy of the image in a smaller amount of space. For example, using a threshold value of 10 produces the flower image shown at right in Figure 32.1.
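To make the averaging, differencing, and thresholding concrete, here is a minimal Python sketch (assuming numpy; the helper names haar_step and hard_threshold are ours, not part of the text) applied to a short sample signal.

import numpy as np

def haar_step(signal):
    # Pair up consecutive entries; return their averages and half-differences.
    s = np.asarray(signal, dtype=float)
    return (s[0::2] + s[1::2]) / 2, (s[0::2] - s[1::2]) / 2

def hard_threshold(details, tol):
    # Replace detail coefficients smaller than tol in absolute value with 0.
    return np.where(np.abs(details) < tol, 0.0, details)

signal = np.array([240, 240, 130, 175, 175, 130, 240, 240])
averages, details = haar_step(signal)
print(averages)                      # a smaller version of the signal
print(hard_threshold(details, 10))   # the small details are replaced by 0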

The averaging and differencing is done with special vectors (wavelets) that form a basis for a suitable function space. More details of this process can be found at the end of this section.

Subsection Introduction

In \(\R^n\) we defined a basis for a subspace \(W\) of \(\R^n\) to be a minimal spanning set for \(W\text{,}\) or a linearly independent spanning set (see Definition 6.8). So to consider the idea of a basis in a vector space, we will need the notion of linear independence in that context.

Since we can add vectors and multiply vectors by scalars in any vector space, and because we have a zero vector in any vector space, we can define linear independence of a finite set of vectors in any vector space as follows (compare to Definition 6.2).

Definition 32.2.

A set \(\{\vv_1, \vv_2, \ldots, \vv_k\}\) of vectors in a vector space \(V\) is linearly independent if the vector equation

\begin{equation*} x_1 \vv_1 + x_2 \vv_2 + \cdots + x_k \vv_k = \vzero \end{equation*}

for scalars \(x_1, x_2, \ldots, x_k\) has only the trivial solution

\begin{equation*} x_1 = x_2 = x_3 = \cdots = x_k = 0\text{.} \end{equation*}

If a set of vectors is not linearly independent, then the set is linearly dependent.

Alternatively, we say that the vectors \(\vv_1, \vv_2, \ldots, \vv_k\) are linearly independent (or dependent) if the set \(\{\vv_1, \vv_2, \ldots, \vv_k\}\) is linearly independent (or dependent).

Preview Activity 32.1.

(a)

We can use the tools we developed to determine if a set of vectors in \(\R^n\) is linearly independent to answer the same questions for sets of vectors in other vector spaces. For example, consider the question of whether the set \(\{1+t, 1-t\}\) in \(\pol_1\) is linearly independent or dependent. To answer this question we need to determine if there is a non-trivial solution to the equation

\begin{equation} x_1 (1+t) + x_2(1-t) = 0\text{.}\tag{32.2} \end{equation}

Note that equation (32.2) can also be written in the form

\begin{equation*} (x_1+x_2) + (x_1-x_2)t = 0\text{.} \end{equation*}
(i)

Recall that two polynomials are equal if all coefficients of like powers are the same. By equating coefficients of like power terms, rewrite equation (32.2) as an equivalent system of two equations in the two unknowns \(x_1\) and \(x_2\text{,}\) and solve for \(x_1, x_2\text{.}\)

(ii)

What does your answer to the previous part tell you about the linear independence or dependence of the set \(\{1+t, 1-t\}\) in \(\pol_1\text{?}\)

(iii)

Recall that in \(\R^n\text{,}\) a set of two vectors is linearly dependent if and only if one of the vectors in the set is a scalar multiple of the other and linearly independent if neither vector is a scalar multiple of the other. Verify your answer to part (ii) from a similar perspective in \(\pol_1\text{.}\)

(b)

We can use the same type of method as in part (a) to address the question of whether the set

\begin{equation*} \left\{ \left[ \begin{array}{cc} 1\amp 3 \\ 1\amp 2 \end{array} \right], \left[ \begin{array}{cr} 1\amp -9 \\ 1\amp 8 \end{array} \right], \left[ \begin{array}{cr} 1\amp -1 \\ 1\amp 4 \end{array} \right] \right\} \end{equation*}

is linearly independent or dependent in \(\M_{2 \times 2}\text{.}\) To answer this question we need to determine if there is a non-trivial solution to the equation

\begin{equation} x_1\left[ \begin{array}{cc} 1\amp 3 \\ 1\amp 2 \end{array} \right] + x_2\left[ \begin{array}{cr} 1\amp -9 \\ 1\amp 8 \end{array} \right] + x_3 \left[ \begin{array}{cr} 1\amp -1 \\ 1\amp 4 \end{array} \right] = \vzero\tag{32.3} \end{equation}

for some scalars \(x_1\text{,}\) \(x_2\text{,}\) and \(x_3\text{.}\) Note that the linear combination on the left side of equation (32.3) has entries

\begin{equation*} \left[ \begin{array}{cc} x_1+x_2+x_3\amp 3x_1-9x_2-x_3 \\ x_1+x_2+x_3\amp 2x_1+8x_2+4x_3 \end{array} \right]\text{.} \end{equation*}
(i)

Recall that two matrices are equal if all corresponding entries are the same. Equate corresponding entries of the matrices in equation (32.3) to rewrite the equation as an equivalent system of four equations in the three unknowns \(x_1\text{,}\) \(x_2\text{,}\) and \(x_3\text{.}\)

(ii)

Use appropriate matrix tools and techniques to find all solutions to the system from part (i).

(iii)

What does the set of solutions to the system from part (i) tell you about the linear independence or dependence of the set

\begin{equation*} \left\{ \left[ \begin{array}{cc} 1\amp 3 \\ 1\amp 2 \end{array} \right], \left[ \begin{array}{cr} 1\amp -9 \\ 1\amp 8 \end{array} \right], \left[ \begin{array}{cr} 1\amp -1 \\ 1\amp 4 \end{array} \right] \right\}? \end{equation*}
(iv)

Recall that in \(\R^n\text{,}\) a set of vectors is linearly dependent if and only if one of the vectors in the set is a linear combination of the others and linearly independent if no vector in the set is a linear combination of the others. Verify your answer to part (iii) from a similar perspective in \(\M_{2 \times 2}\text{.}\)

(c)

We will define a basis for a vector space to be a linearly independent spanning set. Which, if any, of the sets in parts (a) and (b) is a basis for its vector space? Explain.

Subsection Linear Independence

The concept of linear independence, which we formally defined in Definition 32.2 and explored in Preview Activity 32.1, provides us with a way to determine whether there is redundancy in a spanning set, so that we can obtain an efficient spanning set.

The definition tells us that a set \(\{\vv_1, \vv_2, \ldots, \vv_k\}\) of vectors in a vector space \(V\) is linearly dependent if there are scalars \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_k\text{,}\) not all of which are 0, so that

\begin{equation*} x_1 \vv_1 + x_2 \vv_2 + \cdots + x_k \vv_k = \vzero\text{.} \end{equation*}

As examples, we saw in Preview Activity 32.1 that the set \(\{1+t, 1-t\}\) is linearly independent in \(\pol_1\text{.}\) The set \(\{1+t, -1+2t+t^2, 1-8t-3t^2\}\text{,}\) on the other hand, is linearly dependent in \(\pol_2\) since \(2(1+t) + 3(-1+2t+t^2) + (1-8t-3t^2) = 0\text{.}\)
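This dependence relation is easy to confirm symbolically; the short check below (using sympy) is ours, not part of the text.

import sympy as sp

t = sp.symbols('t')
combination = 2*(1 + t) + 3*(-1 + 2*t + t**2) + (1 - 8*t - 3*t**2)
print(sp.expand(combination))   # prints 0, confirming the dependence relation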

In addition to the definition, there are other ways to characterize linearly independent and dependent sets in vector spaces as the next theorems illustrate. These characterizations are the same as those we saw in \(\R^n\text{,}\) and the proofs are essentially the same as well. The proof of Theorem 32.3 is similar to that of Theorem 6.4 and is left for the exercises.

Theorem 32.3 is equivalent to the following theorem that provides the corresponding result for linearly independent sets.

One consequence of Theorem 32.3 and Theorem 32.4 is that if a spanning set is linearly dependent, then one of the vectors in the set can be written as a linear combination of the others. In other words, at least one of the vectors is redundant. In that case, we can find a smaller spanning set as the next theorem states. The proof of this theorem is similar to that of Theorem 6.7 and is left for the exercises.

Subsection Bases

A basis for a vector space is a spanning set that is as small as it can be. We already saw how to define bases formally in \(\R^n\text{.}\) We will now formally define a basis for a vector space and understand why with this definition a basis is a minimal spanning set. Bases are important because any vector in a vector space can be uniquely represented as a linear combination of basis vectors. We will see in later sections that this representation will allow us to identify any vector space with a basis of \(n\) vectors with \(\R^n\text{.}\)

To obtain the formal definition of a basis, which is a minimal spanning set, we consider what additional property makes a spanning set a minimal spanning set. As a consequence of Theorem 32.5, if \(S\) is a spanning set that is linearly dependent, then we can find a proper subset of \(S\) that has the same span. Thus, the set \(S\) cannot be a minimal spanning set. However, if \(S\) is linearly independent, then no vector in \(S\) is a linear combination of the others and we need all of the vectors in \(S\) to form the span. This leads us to the following formal characterization of a minimal spanning set, called a basis.

Definition 32.6.

A basis for a vector space \(V\) is a subset \(S\) of \(V\) such that

  1. \(\Span \ S = V\) and

  2. \(S\) is a linearly independent set.

In other words, a basis for a vector space \(V\) is a linearly independent spanning set for \(V\text{.}\) To put it another way, a basis for a vector space is a minimal spanning set for the vector space. Similar reasoning will show that a basis is also a maximal linearly independent set.

The key ideas to take from the previous theorems are:

  • A basis for a vector space \(V\) is a minimal spanning set for \(V\text{.}\)

  • A basis for \(V\) is a subset \(S\) of \(V\) so that

    1. \(S\) spans \(V\) and

    2. \(S\) is linearly independent.

  • No vector in a basis can be written as a linear combination of the other vectors in the basis.

  • If a subset \(S\) of a vector space \(V\) has the property that one of the vectors in \(S\) is a linear combination of the other vectors in \(S\text{,}\) then \(S\) is not a basis for \(V\text{.}\)

As an example of a basis of a vector space, we saw in Preview Activity 32.1 that the set \(S = \{1-t, 1+t\}\) is linearly independent and spans \(\pol_1\text{,}\) and so \(S\) is a basis for \(\pol_1\text{.}\)

Activity 32.2.

(a)

Is \(S = \{1+t, t, 1-t\}\) a basis for \(\pol_1\text{?}\) Explain.

(b)

Explain why the set \(S = \{1, t, t^2, \ldots, t^n\}\) is a basis for \(\pol_n\text{.}\) This basis is called the standard basis for \(\pol_n\text{.}\)

(c)

Show that the set

\begin{equation*} \left\{ \left[ \begin{array}{cc} 1\amp 0 \\ 1\amp 1 \end{array} \right], \left[ \begin{array}{cc} 0\amp 1 \\ 1\amp 1 \end{array} \right], \left[ \begin{array}{cc} 1\amp 1 \\ 1\amp 0 \end{array} \right], \left[ \begin{array}{cc} 1\amp 1 \\ 0\amp 1 \end{array} \right] \right\} \end{equation*}

is a basis for \(\M_{2 \times 2}\text{.}\)

It should be noted that not every vector space has a finite basis. For example, the space \(\pol\) of all polynomials with real coefficients (of any degree) is a vector space, but no finite set of vectors will span \(\pol\text{.}\) In fact, the infinite set \(\{1, t, t^2, \ldots\}\) is both linearly independent and spans \(\pol\text{,}\) so \(\pol\) has an infinite basis.

Subsection Finding a Basis for a Vector Space

We already know how to find bases for certain vector spaces, namely \(\Nul A\) and \(\Col A\text{,}\) where \(A\) is any matrix. Finding a basis for a different kind of vector space will require other methods. Since a basis for a vector space is a minimal spanning set, to find a basis for a given vector space we might begin from scratch, starting with a given vector in the space and adding one vector at a time until we have a spanning set.
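As a quick reminder of those familiar cases, the sketch below (using sympy, with an example matrix of our own) computes such bases: nullspace() returns a basis for \(\Nul A\) and columnspace() returns the pivot columns of \(A\text{,}\) which form a basis for \(\Col A\text{.}\)

import sympy as sp

A = sp.Matrix([[1, 2, 0],
               [0, 1, 1],
               [1, 3, 1]])   # the third row is the sum of the first two
print(A.nullspace())         # one vector: a basis for Nul A
print(A.columnspace())       # the first two columns of A: a basis for Col A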

Activity 32.3.

Let \(W = \{a + bt + ct^3 \mid a, b, c \text{ are scalars } \}\text{.}\) We will find a basis of \(W\) that contains the polynomial \(3+t-t^3\text{.}\)

(a)

Let \(\CS_1 = \{3+t-t^3\}\text{.}\) Find a polynomial \(p(t)\) in \(W\) that is not in \(\Span \ \CS_1\text{.}\) Explain why this means that the set \(\CS_1\) does not span \(W\text{.}\)

(b)

Let \(\CS_2 = \{3+t-t^3, p(t)\}\text{.}\) Find a polynomial \(q(t)\) that is not in \(\Span \ \CS_2\text{.}\) What does this mean about \(\CS_2\) being a possible spanning set of \(W\text{?}\)

(c)

Let \(\CS_3 = \{3+t-t^3, p(t), q(t)\}\text{.}\) Explain why the set \(\CS_3\) is a basis for \(W\text{.}\)

Alternatively, we might construct a basis from a known spanning set.

Activity 32.4.

Let \(W = \left\{\left[ \begin{array}{cc} v+z\amp w+z \\ x\amp y \end{array} \right] \left. \right| v, w, x, y, z \text{ are scalars } \right\}\text{.}\) Assume that \(W\) is a subspace of \(\M_{2 \times 2}\text{.}\)

(a)

Find a set \(S\) of five \(2 \times 2\) matrices that spans \(W\) (since \(W\) is a span of a set of vectors in \(\M_{2 \times 2}\text{,}\) \(W\) is a subspace of \(\M_{2 \times 2}\)). Without doing any computation, can this set \(S\) be a basis for \(W\text{?}\) Why or why not?

(b)

Find a subset \(\CB\) of \(S\) that is a basis for \(W\text{.}\)

Activity 32.3 and Activity 32.4 give us two ways of finding a basis for a subspace \(W\) of a vector space \(V\text{,}\) assuming \(W\) has a basis with finitely many vectors. One way (illustrated in Activity 32.3) is to start by choosing any non-zero vector \(\vw_1\) in \(W\text{.}\) Let \(\CS_1 = \{\vw_1\}\text{.}\) If \(\CS_1\) spans \(W\text{,}\) then \(\CS_1\) is a basis for \(W\text{.}\) If not, there is a vector \(\vw_2\) in \(W\) that is not in \(\Span \ \CS_1\text{.}\) Then \(\CS_2 = \{\vw_1, \vw_2\}\) is a linearly independent set. If \(\Span \ \CS_2 = W\text{,}\) then \(\CS_2\) is a basis for \(W\) and we are done. If not, repeat the process. We will show later that this process must stop as long as we know that \(W\) has a basis with finitely many vectors.
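The following sketch (in Python with sympy; the helper name grow_basis and the candidate vectors are our own) mimics this build-up process using coordinate vectors: a candidate is kept only when it is not already in the span of the vectors chosen so far.

import sympy as sp

def grow_basis(candidates):
    # Keep a candidate only if adding it increases the rank, that is,
    # only if it is not in the span of the vectors already chosen.
    chosen = []
    for v in candidates:
        if not chosen or sp.Matrix.hstack(*chosen, v).rank() > len(chosen):
            chosen.append(v)
    return chosen

# Coordinate vectors (with respect to 1, t, t^2) of 1+t, 2+2t, t^2, and 1+t+t^2.
candidates = [sp.Matrix([1, 1, 0]), sp.Matrix([2, 2, 0]),
              sp.Matrix([0, 0, 1]), sp.Matrix([1, 1, 1])]
print(grow_basis(candidates))   # keeps 1+t and t^2; the other two are redundant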

Another way (illustrated in Activity 32.4) to find a basis for \(W\) is to start with a spanning set \(\CS_1\) of \(W\text{.}\) If \(\CS_1\) is linearly independent, then \(\CS_1\) is a basis for \(W\text{.}\) If \(\CS_1\) is linearly dependent, then one vector in \(\CS_1\) is a linear combination of the others and we can remove that vector to obtain a new set \(\CS_2\) that also spans \(W\text{.}\) If \(\CS_2\) is linearly independent, then \(\CS_2\) is a basis for \(W\text{.}\) If not, we repeat the process as many times as needed until we arrive at a subset \(\CS_k\) of \(\CS_1\) that is linearly independent and spans \(W\text{.}\) We summarize these results in the following theorem.

We conclude this section with the result mentioned in the introduction — that every vector in a vector space with basis \(\B\) can be written in one and only one way as a linear combination of basis vectors. The proof is similar to that of Theorem 6.6 and is left to the exercises.

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 32.9.

Let \(S = \{1, 1+t, 2-t^2, 1+t+t^2, t-t^2\}\text{.}\)

(a)

Does \(S\) span \(\pol_2\text{?}\) Explain.

Solution.

Let \(p(t) = a_0+a_1t+a_2t^2\) be an arbitrary vector in \(\pol_2\text{.}\) If \(p(t)\) is in \(\Span \ S\text{,}\) then there are weights \(c_1\text{,}\) \(c_2\text{,}\) \(c_3\text{,}\) \(c_4\text{,}\) and \(c_5\) such that

\begin{equation*} a_0+a_1t+a_2t^2 = c_1(1) + c_2(1+t) + c_3(2-t^2) + c_4(1+t+t^2) + c_5(t-t^2)\text{.} \end{equation*}

Equating coefficients of like powers gives us the system

\begin{align*} {}c_1 \amp {}+{} \amp {}c_2 \amp {}+{} \amp {2}c_3 \amp {}+{} \amp {}c_4 \amp {}{} \amp {} \amp = a_0\amp {}\\ {} \amp {}{} \amp {}c_2 \amp {}{} \amp {} \amp {}+{} \amp {}c_4 \amp {}+{} \amp {}c_5 \amp = a_1\amp {}\\ {} \amp {}{} \amp {} \amp {}{} \amp {-}c_3 \amp {}+{} \amp {}c_4 \amp {}-{} \amp {}c_5 \amp = a_2\amp {.} \end{align*}

The reduced row echelon form of the coefficient matrix \(A\) is

\begin{equation*} \left[ \begin{array}{cccrr} 1\amp 0\amp 0\amp 2\amp -3 \\ 0\amp 1\amp 0\amp 1\amp 1 \\ 0\amp 0\amp 1\amp -1\amp 1 \end{array} \right]\text{.} \end{equation*}

Since there is a pivot in every row of \(A\text{,}\) the system \(A \vx = \vb\) is always consistent. We conclude that \(S\) does span \(\pol_2\text{.}\)
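This row reduction is easy to verify; the short computation below (using sympy) is ours.

import sympy as sp

A = sp.Matrix([[1, 1, 2, 1, 0],
               [0, 1, 0, 1, 1],
               [0, 0, -1, 1, -1]])
rref, pivots = A.rref()
print(rref)     # matches the reduced row echelon form displayed above
print(pivots)   # (0, 1, 2): a pivot in every row, located in the first three columns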

(b)

Explain why \(S\) is not a basis for \(\pol_2\text{.}\)

Solution.

The fact that the coefficient matrix \(A\) of our system has non-pivot columns means that each vector in \(\pol_2\) can be written in more than one way as a linear combination of vectors in \(S\text{.}\) This means that \(S\) is not linearly independent and so cannot be a basis for \(\pol_2\text{.}\)

(c)

Find a subset of \(S\) that is a basis for \(\pol_2\text{.}\) Explain your reasoning.

Solution.

That the first three columns of \(A\) are pivot columns implies that the polynomials \(1\text{,}\) \(1+t\text{,}\) and \(2-t^2\) are linearly independent. Since there is a pivot in every row of \(A\text{,}\) the three polynomials \(1\text{,}\) \(1+t\text{,}\) and \(2-t^2\) also span \(\pol_2\text{.}\) So \(\{1, 1+t, 2-t^2\}\) is a subset of \(S\) that is a basis for \(\pol_2\text{.}\)

Example 32.10.

Let \(U\) be the set of all matrices of real numbers of the form \(\left[ \begin{array}{cc} u \amp -u-x \\ 0 \amp x \end{array} \right]\) and \(W\) be the set of all real matrices of the form \(\left[ \begin{array}{cc} v \amp 0 \\ w \amp -v \end{array} \right]\text{.}\)

(a)

Find a basis for \(U\) and a basis for \(W\text{.}\)

Solution.

Every matrix in \(U\) has the form

\begin{equation*} \left[ \begin{array}{cc} u \amp -u-x \\ 0 \amp x \end{array} \right] = u\left[ \begin{array}{cr} 1 \amp -1 \\ 0 \amp 0 \end{array} \right] + x\left[ \begin{array}{cr} 0 \amp -1 \\ 0 \amp 1 \end{array} \right]\text{.} \end{equation*}

Let \(S_U = \left\{ \left[ \begin{array}{cr} 1 \amp -1 \\ 0 \amp 0 \end{array} \right], \left[ \begin{array}{cr} 0 \amp -1 \\ 0 \amp 1 \end{array} \right] \right\}\text{.}\) Then \(U = \Span \ S_U\) and \(U\) is a subspace of \(\M_{2 \times 2}\text{.}\) If

\begin{equation*} c_1\left[ \begin{array}{cr} 1 \amp -1 \\ 0 \amp 0 \end{array} \right] + c_2\left[ \begin{array}{cr} 0 \amp -1 \\ 0 \amp 1 \end{array} \right] = 0\text{,} \end{equation*}

then \(c_1=c_2=0\) and \(S_U\) is also linearly independent. This makes \(S_U\) a basis for \(U\text{.}\) Similarly, every matrix in \(W\) has the form

\begin{equation*} \left[ \begin{array}{cc} v \amp 0 \\ w \amp -v \end{array} \right] = v \left[ \begin{array}{cr} 1 \amp 0 \\ 0 \amp -1 \end{array} \right] + w\left[ \begin{array}{cc} 0 \amp 0 \\ 1 \amp 0 \end{array} \right]\text{.} \end{equation*}

Let \(S_W = \left\{ \left[ \begin{array}{cr} 1 \amp 0 \\ 0 \amp -1 \end{array} \right] , \left[ \begin{array}{cc} 0 \amp 0 \\ 1 \amp 0 \end{array} \right] \right\}\text{.}\) Then \(W = \Span \ S_W\) and \(W\) is a subspace of \(\M_{2 \times 2}\text{.}\) If

\begin{equation*} c_1\left[ \begin{array}{cr} 1 \amp 0 \\ 0 \amp -1 \end{array} \right] + c_2\left[ \begin{array}{cc} 0 \amp 0 \\ 1 \amp 0 \end{array} \right] = 0\text{,} \end{equation*}

then \(c_1=c_2=0\) and \(S_W\) is also linearly independent. This makes \(S_W\) a basis for \(W\text{.}\)

(b)

Let \(U + W = \{A+B : A \text{ is in } U \text{ and } B \text{ is in } W\}\text{.}\) Show that \(U+W\) is a subspace of \(\M_{2 \times 2}\) and find a basis for \(U + W\text{.}\)

Solution.

Every matrix in \(U+W\) has the form

\begin{align*} \left[ \begin{array}{cc} u \amp -u-x\\ 0 \amp x \end{array} \right] \amp + \left[ \begin{array}{cc} v \amp 0\\ w \amp -v \end{array} \right] = \left[ \begin{array}{cc} u+v \amp -u-x\\ w \amp x-v \end{array} \right]\\ \amp = u\left[ \begin{array}{cr} 1 \amp -1\\ 0 \amp 0 \end{array} \right] + x\left[ \begin{array}{cr} 0 \amp -1\\ 0 \amp 1 \end{array} \right]\\ \amp \qquad + v\left[ \begin{array}{cr} 1 \amp 0\\ 0 \amp -1 \end{array} \right] + w\left[ \begin{array}{cc} 0 \amp 0\\ 1 \amp 0 \end{array} \right]\text{.} \end{align*}

Let \(S = \left\{ \left[ \begin{array}{cr} 1 \amp -1 \\ 0 \amp 0 \end{array} \right], \left[ \begin{array}{cr} 0 \amp -1 \\ 0 \amp 1 \end{array} \right], \left[ \begin{array}{cr} 1 \amp 0 \\ 0 \amp -1 \end{array} \right], \left[ \begin{array}{cc} 0 \amp 0 \\ 1 \amp 0 \end{array} \right] \right\}\text{.}\) Then \(U+W = \Span \ S\) and \(U+W\) is a subspace of \(\M_{2 \times 2}\text{.}\) If

\begin{equation*} c_1\left[ \begin{array}{cr} 1 \amp -1 \\ 0 \amp 0 \end{array} \right] + c_2\left[ \begin{array}{cr} 0 \amp -1 \\ 0 \amp 1 \end{array} \right] + c_3\left[ \begin{array}{cr} 1 \amp 0 \\ 0 \amp -1 \end{array} \right] + c_4\left[ \begin{array}{cc} 0 \amp 0 \\ 1 \amp 0 \end{array} \right] = 0\text{,} \end{equation*}

then

\begin{align*} {}c_1 \amp {} \amp {} \amp {}+{} \amp {}c_3 \amp {} \amp {} \amp = \amp 0\amp {}\\ {-}c_1 \amp {}-{} \amp {}c_2 \amp {} \amp {} \amp {} \amp {} \amp = \amp 0\amp {}\\ {} \amp {} \amp {} \amp {} \amp {} \amp {} \amp {}c_4 \amp = \amp 0\amp {}\\ {} \amp {} \amp {}c_2 \amp {}-{} \amp {}c_3 \amp {} \amp {} \amp = \amp 0\amp {.} \end{align*}

The reduced row echelon form of \(\left[ \begin{array}{rrrc} 1\amp 0\amp 1\amp 0 \\ -1\amp -1\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 1 \\ 0\amp 1\amp -1\amp 0 \end{array} \right]\) is \(\left[ \begin{array}{ccrc} 1\amp 0\amp 1\amp 0 \\ 0\amp 1\amp -1\amp 0 \\ 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.}\) The vectors that correspond to the pivot columns are linearly independent and span \(U+W\text{,}\) so a basis for \(U+W\) is

\begin{equation*} \left\{ \left[ \begin{array}{cr} 1 \amp -1 \\ 0 \amp 0 \end{array} \right], \left[ \begin{array}{cr} 0 \amp -1 \\ 0 \amp 1 \end{array} \right], \left[ \begin{array}{cc} 0 \amp 0 \\ 1 \amp 0 \end{array} \right] \right\}\text{.} \end{equation*}
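As before, the row reduction can be checked with a short computation (using sympy; the check is ours).

import sympy as sp

A = sp.Matrix([[1, 0, 1, 0],
               [-1, -1, 0, 0],
               [0, 0, 0, 1],
               [0, 1, -1, 0]])
rref, pivots = A.rref()
print(rref)
print(pivots)   # (0, 1, 3): the first, second, and fourth matrices form a basis of U+W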

Subsection Summary

The important idea in this section is that of a basis for a vector space. A basis is a minimal spanning set and another equivalent characterization of the “minimal” property is linear independence.

  • A set \(\{\vv_1, \vv_2, \ldots, \vv_k\}\) of vectors in a vector space \(V\) is linearly independent if the vector equation

    \begin{equation*} x_1 \vv_1 + x_2 \vv_2 + \cdots + x_k \vv_k = \vzero \end{equation*}

    for scalars \(x_1, x_2, \ldots, x_k\) has only the trivial solution

    \begin{equation*} x_1 = x_2 = x_3 = \cdots = x_k = 0\text{.} \end{equation*}

    If a set of vectors is not linearly independent, then the set is linearly dependent.

  • A set \(\{\vv_1, \vv_2, \ldots, \vv_k\}\) of vectors in a vector space \(V\) is linearly independent if and only if none of the vectors in the set can be written as a linear combination of the remaining vectors in the set.

  • A set \(\{\vv_1, \vv_2, \ldots, \vv_k\}\) of vectors in a vector space \(V\) is linearly dependent if and only if at least one of the vectors in the set can be written as a linear combination of the remaining vectors in the set.

  • A basis for a vector space \(V\) is a subset \(S\) of \(V\) such that

    1. \(\Span \ S = V\) and

    2. \(S\) is a linearly independent set.

  • A basis is important in that it provides us with an efficient way to represent any vector in the vector space — any vector can be written in one and only one way as a linear combination of vectors in a basis.

  • To find a basis of a vector space, we can start with a spanning set \(S\) and toss out any vector in \(S\) that can be written as a linear combination of the remaining vectors in \(S\text{.}\) We repeat the process with the remaining subset of \(S\) until we arrive at a linearly independent spanning set. Alternatively, we can build up a basis by starting with a single nonzero vector and repeatedly adding a vector that is not in the span of the vectors already chosen, until the chosen vectors span the space.

Exercises Exercises

1.

Determine if the given sets are linearly independent or dependent in the indicated vector space. If dependent, write one of the vectors as a linear combination of the others. If independent, determine if the set is a basis for the vector space.

(a)

\(\{[1 \ 4 \ 6]^{\tr}, [2 \ -1 \ 3]^{\tr}, [0 \ 1 \ 5]^{\tr}\}\) in \(\R^3\)

(b)

\(\{1-2t^2+t^3, 3-t+4t^3, 2-3t\}\) in \(\pol_3\)

(c)

\(\{1+t, -1-5t+4t^2+t^3, 1+t^2+t^3, t+2t^3\}\) in \(\pol_3\)

(d)

\(\left\{ \left[ \begin{array}{ccc} 1\amp 2\amp 0\\0\amp 1\amp 1 \end{array} \right], \left[ \begin{array}{crc} 1\amp -2\amp 0\\0\amp -1\amp 1 \end{array} \right], \left[ \begin{array}{ccr} 1\amp 2\amp 0\\0\amp 1\amp -1 \end{array} \right] \right\}\) in \(\M_{2 \times 3}\text{.}\)

2.

Let \(S = \{1+t+t^2, t+t^2, 1+t, 1+t^2\}\) in \(\pol_2\text{.}\)

(a)

Show that the set \(S\) spans \(\pol_2\text{.}\)

(b)

Show that the set \(S\) is linearly dependent.

(c)

Find a subset of \(S\) that is a basis for \(\pol_2\text{.}\) Be sure to verify that you have a basis.

3.

Find two different bases for \(\M_{2 \times 2}\text{.}\) Explain how you know that each set is a basis.

4.

The set \(W = \{at+bt^2 \mid a \text{ and } b \text{ are scalars } \}\) is a subspace of \(\pol_3\text{.}\)

(a)

Find a set of vectors in \(\pol_3\) that spans \(W\text{.}\)

(b)

Find a basis for \(W\text{.}\) Be sure to verify that you have a basis.

5.

Suppose that the set \(\{\vu, \vv, \vw\}\) is a basis for a vector space \(V\text{.}\) Is the set \(\{\vu+\vv, \vu+\vw, \vv+\vw\}\) a basis for \(V\text{?}\) Verify your result.

6.

Determine all scalars \(c\) so that the set \(\{c^2+t^2, c+2t, 1+t^2\}\) is a basis for \(\pol_2\text{.}\)

7.

A symmetric matrix is a matrix \(A\) so that \(A^{\tr} = A\text{.}\) Is it possible to find a basis for \(\M_{2 \times 2}\) consisting entirely of symmetric matrices? If so, exhibit one such basis. If not, explain why not.

8.

Find a basis of the subspace of \(\M_{2 \times 3}\) consisting of all matrices of the form \(\left[ \begin{array}{ccc} a\amp b\amp c \\ d\amp e\amp f \end{array} \right]\) where \(c=a-2d\) and \(f = b+3e\text{.}\)

12.

Show that if \(W_1, W_2\) are subspaces of \(V\) such that \(W_1\cap W_2 = \{ \vzero \}\text{,}\) then for any linearly independent vectors \(\vu_1, \vu_2, \ldots, \vu_k\) in \(W_1\) and \(\vv_1, \vv_2, \ldots, \vv_\ell\) in \(W_2\text{,}\) the set \(\{ \vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_k\text{,}\) \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_\ell\}\) is linearly independent in \(V\text{.}\)

13.

Label each of the following statements as True or False. Provide justification for your response. Throughout, let \(V\) be a vector space.

(a) True/False.

If \(\vv\) is in \(V\text{,}\) then the set \(\{\vv\}\) is linearly independent.

(b) True/False.

If a set of vectors spans a subspace, then the set forms a basis of this subspace.

(c) True/False.

If a linearly independent set of vectors spans a subspace, then the set forms a basis of this subspace.

(d) True/False.

If the set \(S\) spans \(V\) and removing any vector from \(S\) makes it not a spanning set anymore, then \(S\) is a basis.

(e) True/False.

If \(S\) is a linearly independent set in \(V\) and for every \(\vu\) in \(V\text{,}\) adding \(\vu\) to \(S\) makes it not linearly independent anymore, then \(S\) is a basis.

(f) True/False.

If a subset \(S\) of \(V\) spans \(V\text{,}\) then \(S\) must be linearly independent.

(g) True/False.

If a subset \(S\) of \(V\) is linearly independent, then \(S\) must span \(V\text{.}\)

(h) True/False.

If \(S\) is a linearly dependent set in \(V\text{,}\) then every vector in \(S\) is a linear combination of the other vectors in \(S\text{.}\)

(i) True/False.

A vector space cannot have more than one basis.

(j) True/False.

If \(\vu\) is a non-zero vector in \(V\text{,}\) then there is a basis of \(V\) containing \(\vu\text{.}\)

(k) True/False.

If \(\vu, \vv\) are two linearly independent vectors in \(V\text{,}\) then there is a basis of \(V\) containing \(\vu, \vv\text{.}\)

(l) True/False.

If \(\vu\) is in a basis of \(V\text{,}\) then \(2\vu\) cannot be in a basis of \(V\text{.}\)

Subsection Project: Image Compression with Wavelets

We return to the problem of image compression introduced at the beginning of this section. The first step in the wavelet compression process is to digitize an image. There are two important ideas about digitization to understand here: intensity levels and resolution. In grayscale image processing, it is common to think of 256 different intensity levels, or scales, of gray ranging from 0 (black) to 255 (white). A digital image can be created by taking a small grid of squares (called pixels) and coloring each pixel with some shade of gray. The resolution of this grid is a measure of how many pixels are used per square inch. An example of a 16 by 16 pixel picture of a flower was shown in Figure 32.1.

An image can be thought of in several ways: as a two-dimensional array; as one long vector by stringing the columns together one after another; or as a collection of column vectors. For simplicity, we will use the last approach in this project. We call each column vector in a picture a signal. Wavelets are used to process signals. After processing we can apply some technique to compress the processed signals.

To process a signal we select a family of wavelets. There are many different families of wavelets — which family to use depends on the problem to be addressed. The simplest family of wavelets is the Haar family. More complicated families of wavelets are usually used in applications, but the basic ideas in wavelets can be seen through working with the Haar wavelets, and their relative simplicity will make the details easier to follow. Each family of wavelets has a father wavelet (usually denoted \(\varphi\)) and a mother wavelet (\(\psi\)).

Wavelets are generated from the mother wavelet by scalings and translations. To further simplify our work we will restrict ourselves to wavelets on [0,1], although this is not necessary. The advantage wavelets have over other methods of data analysis (Fourier analysis, for example) is that with the scalings and translations we are able to analyze frequency behavior on large intervals and also isolate signal discontinuities on very small intervals. The way this is done is by using a large collection (infinite, in fact) of basis functions with which to transform the data. We'll begin by looking at how these basis functions arise.

If we sample data at various points, we can consider our data to represent a piecewise constant function obtained by partitioning [0,1] into \(n\) equal-sized subintervals, where \(n\) represents the number of sample points. For the purposes of this project we will always choose \(n\) to be a power of 2. So we can consider all of our data to represent functions. For us, then, it is natural to look at these functions in the vector space of all functions from \(\R\) to \(\R\text{.}\) Since our data is piecewise constant, we can really restrict ourselves to a subspace of this larger vector space — a subspace of piecewise constant functions. The most basic piecewise constant function on the interval \([0,1]\) is the one whose value is 1 on the entire interval. We define \(\varphi\) to be this constant function (called the characteristic function of the unit interval). That is,

\begin{equation*} \varphi(x) = \begin{cases}1 \amp \text{ if } 0 \leq x \lt 1 \\ 0, \amp \text{ otherwise. } \end{cases} \end{equation*}

This function \(\varphi\) is the Father Haar wavelet.

Figure 32.11. Graphs of \(\varphi(x)\text{,}\) \(\varphi(2x)\text{,}\) and \(\varphi(2x-1)\) from left to right.

This function \(\varphi\) may seem to be a very simple function but it has properties that will be important to us. One property is that \(\varphi\) satisfies a scaling equation. For example, Figure 32.11 shows that

\begin{equation*} \varphi(x) = \varphi(2x) + \varphi(2x-1) \end{equation*}

while Figure 32.12 shows that

\begin{equation*} \varphi(x) = \varphi(2^2x) + \varphi(2^2x-1) + \varphi(2^2x-2) + \varphi(2^2x-3)\text{.} \end{equation*}
Figure 32.12. Graphs of \(\varphi(2^2x)\text{,}\) \(\varphi(2^2x-1)\text{,}\) \(\varphi(2^2x-2)\text{,}\) and \(\varphi(2^2x-3)\text{,}\) from left to right.

So \(\varphi\) is a sum of scalings and translations of itself. In general, for each positive integer \(n\) and integers \(k\) between 0 and \(2^n-1\) we define

\begin{equation*} \varphi_{n,k}(x) = \varphi\left(2^nx-k\right)\text{.} \end{equation*}

Then \(\varphi(x) = \sum_{k=0}^{2^{n}-1} \varphi_{n,k}(x)\) for each \(n\text{.}\)

These functions \(\varphi_{n,k}\) are useful in that they form a basis for the vector space \(V_n\) of all piecewise constant functions on \([0,1]\) that have possible breaks at the points \(\frac{1}{2^n}\text{,}\) \(\frac{2}{2^n}\text{,}\) \(\frac{3}{2^n}\text{,}\) \(\ldots\text{,}\) \(\frac{2^n-1}{2^n}\text{.}\) This is exactly the kind of space in which digital signals live, especially if we sample signals at \(2^n\) evenly spaced points on \([0,1]\text{.}\) Let \(\CB_n = \{ \varphi_{n,k} : 0 \leq k \leq 2^n-1\}\text{.}\) You may assume without proof that \(\CB_n\) is a basis of \(V_n\text{.}\)

Project Activity 32.5.

(a)

Draw the linear combination \(2\varphi_{2,0} - 3\varphi_{2,1} + 17 \varphi_{2,2} + 30 \varphi_{2,3}\text{.}\) What does this linear combination look like? Explain the statement made previously that these \(2^n\) functions \(\varphi_{n,k}\) form a basis for the vector space of all piecewise constant functions on \([0,1]\) that have possible breaks at the points \(\frac{1}{2^n}\text{,}\) \(\frac{2}{2^n}\text{,}\) \(\frac{3}{2^n}\text{,}\) \(\ldots\text{,}\) \(\frac{2^n-1}{2^n}\text{.}\)

(b)

Remember that we can consider our data to represent a piecewise constant function obtained by partitioning \([0,1]\) into \(n\) subintervals, where \(n\) represents the number of sample points. Suppose we collect the following data: \(10\text{,}\) \(13\text{,}\) \(21\text{,}\) \(55\text{,}\) \(3\text{,}\) \(12\text{,}\) \(4\text{,}\) \(18\text{.}\) Explain how we can use this data to define a piecewise constant function \(f\) on \([0,1]\text{.}\) Express \(f\) as a linear combination of suitable functions \(\varphi_{n,k}\text{.}\) Plot this linear combination of \(\varphi_{n,k}\) to verify.

Working with functions can be more cumbersome than working with vectors in \(\R^n\text{,}\) but the digital nature of our data makes it possible to view our piecewise constant functions as vectors in \(\R^n\) for suitable \(n\text{.}\) More specifically, if \(f\) is an element in \(V_n\text{,}\) then \(f\) is a piecewise constant function on \([0,1]\) with possible breaks at the points \(\frac{1}{2^n}\text{,}\) \(\frac{2}{2^n}\text{,}\) \(\frac{3}{2^n}\text{,}\) \(\ldots\text{,}\) \(\frac{2^n-1}{2^n}\text{.}\) If \(f\) has the value \(y_i\) on the interval between \(\frac{i-1}{2^n}\) and \(\frac{i}{2^n}\text{,}\) then we can identify \(f\) with the vector \(\left[y_1 \ y_2 \ \ldots \ y_{2^n}\right]^{\tr}\text{.}\)
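A short Python sketch of this identification (the helper name piecewise_constant is ours) takes the vector of values and returns the corresponding piecewise constant function on \([0,1)\text{.}\)

import numpy as np

def piecewise_constant(values):
    # f(x) = values[i] on the subinterval [i/2^n, (i+1)/2^n), where 2^n = len(values).
    values = np.asarray(values, dtype=float)
    pieces = len(values)
    def f(x):
        return values[min(int(np.floor(x * pieces)), pieces - 1)]
    return f

f = piecewise_constant([50, 16, 14, 28])
print(f(0.1), f(0.3), f(0.6), f(0.9))   # 50.0 16.0 14.0 28.0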

Project Activity 32.6.

(a)

Determine the vector in \(\R^8\) that is identified with \(\varphi\text{.}\)

(b)

Determine the value of \(m\) and the vectors in \(\R^m\) that are identified with \(\varphi_{2,0}\text{,}\) \(\varphi_{2,1}\text{,}\) \(\varphi_{2,2}\text{,}\) and \(\varphi_{2,3}\text{.}\)

We can use the functions \(\varphi_{n,k}\) to represent digital signals, but to manipulate the data in useful ways we need a different perspective. A different basis for \(V_n\) (a wavelet basis) will allow us to identify the pieces of the data that are most important. We illustrate in the next activity with the spaces \(V_1\) and \(V_2\text{.}\)

Project Activity 32.7.

The space \(V_1\) consists of all functions that are piecewise constant on \([0,1]\) with a possible break at \(x=\frac{1}{2}\text{.}\) The functions \(\varphi_{n,k}\) are used to record the values of a signal, and by summing these values we can calculate their average. Wavelets act by averaging and differencing, and so \(\varphi\) does the averaging. We need functions that will perform the differencing.

(a)

Define \(\{\psi_{0,0}\}\) as

\begin{equation*} \psi_{0,0}(x) = \begin{cases}1 \amp \text{ if } 0 \leq x \lt \frac{1}{2} \\ -1 \amp \text{ if } \frac{1}{2} \leq x \lt 1 \\ 0 \amp \text{ otherwise } \end{cases}\text{.} \end{equation*}

A picture of \(\psi_{0,0}\) is shown in Figure 32.13. Since \(\psi_{0,0}\) assumes values of \(1\) and \(-1\text{,}\) we can use \(\psi_{0,0}\) to perform differencing. The function \(\psi = \psi_{0,0}\) is the Mother Haar wavelet. Show that \(\{\varphi, \psi\}\) is a basis for \(V_1\text{.}\)

Figure 32.13. The graphs of \(\psi_{0,0}\text{,}\) \(\psi_{1,0}\) and \(\psi_{1,1}\) from left to right.
(b)

We continue in a manner similar to the one in which we constructed bases for \(V_n\text{.}\) For \(k=0\) and \(k=1\text{,}\) let \(\psi_{1,k}(x) = \psi\left(2^1x-k\right)\text{.}\) Graphs of \(\psi_{1,0}\) and \(\psi_{1,1}\) are shown in Figure 32.13. The functions \(\psi_{1,k}\) assume the values of \(1\) and \(-1\) on smaller intervals, and so can be used to perform differencing on a smaller scale than \(\psi_{0,0}\text{.}\) Show that \(\{\varphi_{0,0}, \psi_{0,0}, \psi_{1,0}, \psi_{1,1}\}\) is a basis for \(V_2\text{.}\)

As Project Activity 32.7 suggests, we can make a basis for \(V_n\) from \(\varphi_{0,0}\) and functions of the form \(\psi_{n,k}\) defined by \(\psi_{n,k}(x) = \psi\left(2^nx-k\right)\) for \(k\) from 0 to \(2^n-1\text{.}\) More specifically, if we let \(\CS_n = \{\psi_{n,k} : 0 \leq k \leq 2^n-1\}\text{,}\) then the set

\begin{equation*} \CW_n = \{\varphi_{0,0}\} \cup \bigcup_{j=0}^{n-1} \CS_j \end{equation*}

is a basis for \(V_n\) (we state this without proof). The functions \(\psi_{n,k}\) are the wavelets.

Project Activity 32.8.

We can now write any function in \(V_n\) using the basis \(\CW_n\text{.}\) As an example, the string 50, 16, 14, 28 represents a piecewise constant function which can be written as \(50 \varphi_{2,0} + 16 \varphi_{2,1} + 14 \varphi_{2,2} + 28 \varphi_{2,3}\text{,}\) an element in \(V_2\text{.}\)

(a)

Specifically identify the functions in \(\CW_0\text{,}\) \(\CW_1\text{,}\) \(\CW_2\text{,}\) and \(\CW_3\text{.}\)

(b)

As mentioned earlier, we can identify a signal, and each wavelet function, with a vector in \(\R^m\) for an appropriate value of \(m\text{.}\) We can then use this identification to decompose any signal as a linear combination of wavelets. We illustrate this idea with the signal \([50 \ 16 \ 14 \ 28]^{\tr}\) in \(\R^4\text{.}\) Recall that we can represent this signal as the function \(f = 50 \varphi_{2,0} + 16 \varphi_{2,1} + 14 \varphi_{2,2} + 28 \varphi_{2,3}\text{.}\)

(i)

Find the vectors \(\vw_1\text{,}\) \(\vw_2\text{,}\) \(\vw_3\text{,}\) and \(\vw_4\) in \(\R^m\) that are identified with \(\varphi_{0,0}\text{,}\) \(\psi_{0,0}\text{,}\) \(\psi_{1,0}\text{,}\) and \(\psi_{1,1}\text{,}\) respectively.

(ii)

Any linear combination \(c_1\varphi_{0,0} + c_2 \psi_{0,0} + c_3 \psi_{1,0} + c_4\psi_{1,1}\) is then identified with the linear combination \(c_1 \vw_1 + c_2 \vw_2 + c_3 \vw_3 + c_4 \vw_4\text{.}\) Use this idea to find the weights to write the function \(f\) as a linear combination of \(\varphi_{0,0}\text{,}\) \(\psi_{0,0}\text{,}\) \(\psi_{1,0}\text{,}\) and \(\psi_{1,1}\text{.}\)

Although it is not necessarily easy to observe, the weights in the decomposition \(f = 27 \varphi_{0,0} + 6 \psi_{0,0} + 17 \psi_{1,0} - 7 \psi_{1,1}\) are just averages and differences of the original weights in \(f = 50 \varphi_{2,0} + 16 \varphi_{2,1} + 14 \varphi_{2,2} + 28 \varphi_{2,3}\text{.}\) To see how, notice that if we take the overall average of the original weights we obtain the value \(27\text{.}\) If we average the original weights in pairs (\(50\) and \(16\text{,}\) and \(14\) and \(28\)) we obtain the values \(33\) and \(21\text{,}\) and if we take the average differences of the original weights in pairs (that is, half of \(50-16\) and half of \(14-28\)) we obtain the values \(17\) and \(-7\text{.}\) We can treat the signal \([33 \ 21]^{\tr}\) formed from the averages of the pairs of the original weights as a smaller copy of the original signal. The average difference of the entries of this new signal is \(6\text{.}\) So the weights in our final decomposition are just the overall average together with the average differences computed at each stage. The coefficients in our final decomposition \(27 \varphi_{0,0} + 6 \psi_{0,0} + 17 \psi_{1,0} - 7 \psi_{1,1}\) are called wavelet coefficients. This is the idea that makes wavelets so useful for image compression. In many images, pixels that are near each other often have similar coloring or shading. These pixels are coded with numbers that are close in value. In the differencing process, these numbers are replaced with numbers that are close to 0. If there is little difference in the shading of adjacent pixels, the image will be changed only a little if the shadings are made the same. This results in replacing these small wavelet coefficients with zeros. If the processed vectors contain long strings of zeros, the vectors can be significantly compressed.
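The arithmetic in the previous paragraph can be traced with a few lines of numpy (this check is ours, not part of the text).

import numpy as np

s = np.array([50., 16., 14., 28.])
avg1 = (s[0::2] + s[1::2]) / 2             # pair averages: [33. 21.]
det1 = (s[0::2] - s[1::2]) / 2             # pair average differences: [17. -7.]
avg0 = (avg1[0::2] + avg1[1::2]) / 2       # overall average: [27.]
det0 = (avg1[0::2] - avg1[1::2]) / 2       # average difference of the pair averages: [6.]
print(np.concatenate([avg0, det0, det1]))  # [27.  6. 17. -7.], the wavelet coefficients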

Once we have recognized the pattern in expressing our original function as an overall average and wavelet coefficients we can perform these operations more quickly with matrices.

Project Activity 32.9.

The process of averaging and differencing discussed in and following Project Activity 32.8 can be viewed as a matrix-vector problem. As we saw in Project Activity 32.8, we can translate the problem of finding wavelet coefficients to the matrix world.

(a)

Consider again the problem of finding the wavelet coefficients contained in the vector \([27 \ 6 \ 17 \ - 7]^{\tr}\) for the signal \([50 \ 16 \ 14 \ 28]^{\tr}\text{.}\) Find the matrix \(A_4\) that has the property that \(A_4 [50 \ 16 \ 14 \ 28]^{\tr} = [27 \ 6 \ 17 \ - 7]^{\tr}\text{.}\) (You have already done part of this problem in Project Activity 32.8.) Explain how \(A_4\) performs the averaging and differencing discussed earlier.

(b)

Repeat the process in part (a) to find the matrix \(A_8\) that converts a signal to its wavelet coefficients.

(c)

The matrix \(A_i\) is called a forward wavelet transformation matrix and \(A_i^{-1}\) is the inverse wavelet transform matrix. Use \(A_8\) to show that the wavelet coefficients for the data string \([80 \ 48 \ 4 \ 36 \ 28 \ 64 \ 6 \ 50]^{\tr}\) are contained in the vector \([39.5 \ 2.5 \ 22 \ 9 \ 16 \ -16 \ -18 \ -22]^{\tr}\text{.}\)

Now we have all of the necessary background to discuss image compression. Suppose we want to store an image. We partition the image vertically and horizontally and record the color or shade at each grid entry. The grid entries will be our pixels. This gives a matrix, \(M\text{,}\) of colors, indexed by pixels or horizontal and vertical position. To simplify our examples we will work in gray-scale, where our grid entries are integers between 0 (black) and 255 (white). We can treat each column of our grid as a piecewise constant function. As an example, the image matrix \(M\) that produced the picture at left in Figure 32.1 is given in (32.1).

We can then apply a 16 by 16 forward wavelet transformation matrix \(A_{16}\) to \(M\) to convert the columns to averages and wavelet coefficients that will appear in the matrix \(A_{16}M\text{.}\) These wavelet coefficients allow us to compress the image — that is, create a smaller set of data that contains the essence of the original image.

Recall that the forward wavelet transformation matrix computes weighted differences of consecutive entries in the columns of the image matrix \(M\text{.}\) If two entries in \(M\) are close in value, the corresponding weighted difference in \(A_{16}M\) will be close to 0. For our example, the matrix \(A_{16}M\) is approximately

\begin{equation*} \left[ {\tiny{ \begin{array}{cccccccccccccccc} 208.0\amp 202.0\amp 178.0\amp 165.0\amp 155.0\amp 172.0\amp 118.0\amp 172.0\amp 155.0\amp 153.0\amp 176.0\amp 202.0\amp 208.0\amp 210.0\amp 209.0\amp 208.0\\ 33.4\amp 24.1\amp - 0.625\amp 0.938\amp - 2.50\amp - 5.94\amp 42.8\amp - 5.94\amp - 2.50\amp 12.8\amp 0.938\amp 24.7\amp 30.6\amp 33.4\amp 32.5\amp 31.6 \\- 1.88\amp - 13.8\amp 19.4\amp 2.50\amp 0.0\amp - 2.50\amp 8.12\amp - 2.50 \amp 0.0\amp 2.50\amp 19.4\amp - 13.8\amp 1.88\amp - 3.75\amp - 1.88\amp 0.0\\ 17.5\amp 61.9\amp 61.9\amp 6.88\amp 0.0\amp 61.9\amp 0.0\amp 61.9\amp 0.0\amp 30.6\amp 65.0\amp 66.9\amp 66.9\amp 19.4\amp 66.9\amp 66.9\\ 0.0\amp 27.5\amp 43.8\amp 16.2\amp 0.0 \amp - 11.2\amp 16.2\amp - 11.2\amp 0.0\amp 16.2\amp 43.8\amp 27.5\amp 0.0\amp 0.0\amp 0.0\amp 0.0 \\ 3.75\amp 0.0\amp 27.5\amp - 11.2\amp 0.0\amp - 16.2\amp 22.5\amp - 16.2\amp 0.0\amp - 11.2\amp 27.5\amp 0.0\amp - 3.75\amp - 7.50\amp - 3.75\amp 0.0\\ 47.5\amp 0.0\amp 0.0\amp 13.8\amp 82.5\amp 0.0\amp 0.0\amp 0.0\amp 82.5\amp 13.8\amp 0.0\amp 3.75\amp 3.75\amp 51.2\amp 3.75\amp 3.75\\ 82.5\amp 41.2\amp 41.2\amp 82.5\amp 82.5\amp 41.2\amp 0.0\amp 41.2\amp 82.5\amp 35.0\amp 35.0\amp 35.0\amp 35.0\amp 82.5\amp 35.0\amp 35.0 \\ 0.0\amp 0.0\amp 0.0\amp 55.0\amp - 22.5\amp - 22.5\amp 55.0\amp - 22.5\amp - 22.5\amp 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\\ 0.0\amp 55.0\amp - 22.5\amp - 22.5\amp 22.5\amp 0.0\amp - 22.5\amp 0.0\amp 22.5\amp - 22.5\amp - 22.5\amp 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\\- 7.50\amp 0.0\amp - 55.0\amp 22.5\amp 22.5\amp - 22.5\amp 0.0\amp - 22.5\amp 22.5\amp 22.5\amp - 55.0\amp 0.0\amp 7.50\amp 0.0\amp 0.0\amp 0.0 \\ 0.0\amp 0.0\amp 0.0\amp 0.0\amp 22.5\amp - 55.0\amp 0.0\amp - 55.0\amp 22.5 \amp 0.0\amp 0.0\amp 0.0\amp - 15.0\amp 0.0\amp - 7.50\amp 0.0\\ 0.0\amp 0.0\amp 0.0\amp - 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp - 55.0\amp 0.0\amp 7.50\amp 7.50\amp 7.50\amp 7.50\amp 7.50\\ 95.0\amp 0.0\amp 0.0\amp - 82.5\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp - 82.5\amp 0.0\amp 0.0\amp 0.0\amp 95.0\amp 0.0\amp 0.0\\ 0.0\amp - 82.5\amp 82.5\amp 0.0\amp 0.0\amp - 82.5\amp 0.0\amp - 82.5\amp 0.0\amp 95.0\amp - 95.0\amp 95.0 \amp - 95.0\amp 0.0\amp 95.0\amp - 95.0\\ 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0 \end{array} }} \right]\text{.} \end{equation*}

Note that there are many wavelet coefficients that are quite small compared to others — the ones where the weighted averages are close to 0. In a sense, the weighted differences tell us how much “detail” about the whole that each piece of information contains. If a piece of information contains only a small amount of information about the whole, then we shouldn't sacrifice much of the picture if we ignore the small “detail” coefficients. One way to ignore the small “detail” coefficients is to use thresholding.

With thresholding (this is hard thresholding, or “keep or kill”), we decide how much of the detail we want to remove (this amount is called the tolerance). So we set a tolerance and then replace each entry in our matrix \(A_{16}M\) whose absolute value is below the tolerance with 0 to obtain a new matrix \(M_1\text{.}\) In our example, if we use a threshold value of 10 we obtain the new matrix \(M_1\text{:}\)

\begin{equation*} \left[ {\tiny{\begin{array}{cccccccccccccccc} 208.0\amp 202.0\amp 178.0\amp 165.0\amp 155.0\amp 172.0\amp 118.0\amp 172.0\amp 155.0\amp 153.0\amp 176.0\amp 202.0\amp 208.0\amp 210.0\amp 209.0\amp 208.0\\ 33.4\amp 24.1\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 42.8 \amp 0.0\amp 0.0\amp 12.8\amp 0.0\amp 24.7\amp 30.6\amp 33.4\amp 32.5\amp 31.6 \\ 0.0\amp - 13.8\amp 19.4\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 19.4\amp - 13.8\amp 0.0\amp 0.0\amp 0.0\amp 0.0\\ 17.5\amp 61.9\amp 61.9\amp 0.0\amp 0.0\amp 61.9\amp 0.0\amp 61.9\amp 0.0\amp 30.6\amp 65.0\amp 66.9\amp 66.9\amp 19.4\amp 66.9\amp 66.9\\ 0.0\amp 27.5\amp 43.8\amp 16.2\amp 0.0\amp - 11.2\amp 16.2\amp - 11.2\amp 0.0\amp 16.2\amp 43.8\amp 27.5\amp 0.0\amp 0.0\amp 0.0\amp 0.0 \\ 0.0\amp 0.0\amp 27.5\amp - 11.2\amp 0.0\amp - 16.2\amp 22.5\amp - 16.2\amp 0.0\amp - 11.2\amp 27.5\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\\ 47.5\amp 0.0\amp 0.0\amp 13.8\amp 82.5\amp 0.0\amp 0.0\amp 0.0\amp 82.5\amp 13.8\amp 0.0\amp 0.0\amp 0.0\amp 51.2\amp 0.0\amp 0.0\\ 82.5\amp 41.2\amp 41.2\amp 82.5\amp 82.5\amp 41.2\amp 0.0\amp 41.2\amp 82.5\amp 35.0\amp 35.0\amp 35.0\amp 35.0\amp 82.5\amp 35.0\amp 35.0 \\ 0.0\amp 0.0\amp 0.0\amp 55.0\amp - 22.5\amp - 22.5\amp 55.0\amp - 22.5\amp - 22.5\amp 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\\ 0.0\amp 55.0\amp - 22.5\amp - 22.5\amp 22.5\amp 0.0\amp - 22.5\amp 0.0\amp 22.5\amp - 22.5\amp - 22.5\amp 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\\ 0.0\amp 0.0\amp - 55.0\amp 22.5\amp 22.5\amp - 22.5\amp 0.0\amp - 22.5\amp 22.5\amp 22.5\amp - 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0 \\ 0.0\amp 0.0\amp 0.0\amp 0.0\amp 22.5\amp - 55.0\amp 0.0\amp - 55.0\amp 22.5 \amp 0.0\amp 0.0\amp 0.0\amp - 15.0\amp 0.0\amp 0.0\amp 0.0\\ 0.0\amp 0.0\amp 0.0\amp - 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp - 55.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\\ 95.0\amp 0.0\amp 0.0\amp - 82.5\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp - 82.5\amp 0.0\amp 0.0\amp 0.0\amp 95.0\amp 0.0\amp 0.0\\ 0.0\amp - 82.5\amp 82.5\amp 0.0\amp 0.0\amp - 82.5\amp 0.0\amp - 82.5\amp 0.0\amp 95.0\amp - 95.0\amp 95.0\amp - 95.0\amp 0.0\amp 95.0\amp - 95.0\\ 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0\amp 0.0 \end{array} }} \right]\text{.} \end{equation*}

We now have introduced many zeros in our matrix. This is where we compress the image. To store the original image, we need to store every pixel. Once we introduce strings of zeros we can identify a new code (say 256) that indicates we have a string of zeros. We can then follow that code with the number of zeros in the string. So if we had a string of 15 zeros in a signal, we could store that information in 2 bytes rather than 15 and obtain significant savings in storage. This process removes some detail from our picture, but only the small detail. To convert back to an image, we just undo the forward processing by multiplying our thresholded matrix \(M_1\) by \(A_{16}^{-1}\text{.}\) The ultimate goal is to obtain significant compression but still have \(A_{16}^{-1}M_1\) retain all of the essence of the original image.
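Here is a compact end-to-end sketch of this pipeline in Python with numpy. The helper names are ours, and the forward transformation matrix is assembled column by column from the averaging and differencing cascade rather than written out by hand. With the full matrix from (32.1) and a tolerance of 10, it should reproduce (up to rounding) the thresholded matrix \(M_1\) above and the reconstructed matrix \(A_{16}^{-1}M_1\) below.

import numpy as np

def haar_coefficients(signal):
    # Repeated averaging and differencing; coarsest coefficients come first.
    s = np.asarray(signal, dtype=float)
    details = []
    while len(s) > 1:
        details = list((s[0::2] - s[1::2]) / 2) + details
        s = (s[0::2] + s[1::2]) / 2
    return np.array(list(s) + details)

def forward_wavelet_matrix(n):
    # Column i of A_n is the transform of the i-th standard basis vector.
    return np.column_stack([haar_coefficients(np.eye(n)[:, i]) for i in range(n)])

def compress(M, tol):
    n = M.shape[0]
    A = forward_wavelet_matrix(n)
    coeffs = A @ M                                    # transform each column of the image
    M1 = np.where(np.abs(coeffs) < tol, 0.0, coeffs)  # hard thresholding
    return np.linalg.solve(A, M1)                     # reconstruct, i.e. compute A^{-1} M1

# A small 4 by 4 grayscale test image (our own), processed with tolerance 10.
M = np.array([[240., 240., 130., 130.],
              [240., 130., 175., 130.],
              [130., 175., 175., 130.],
              [ 75.,  75.,  75.,  75.]])
print(np.round(compress(M, tol=10)))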

In our example using \(M_1\text{,}\) the reconstructed image matrix \(A_{16}^{-1}M_1\) (rounded to the nearest whole number) is

\begin{equation*} \left[ \begin{array}{cccccccccccccccc} 242\amp 240\amp 241\amp 237\amp 132\amp 138\amp 232\amp 138\amp 132\amp 238\amp 239\amp 240\amp 238\amp 244\amp 242\amp 240\\ 242\amp 240\amp 241\amp 127\amp 178\amp 183\amp 122\amp 183\amp 178\amp 128\amp 239\amp 240\amp 238\amp 244\amp 242\amp 240\\ 242\amp 240\amp 131\amp 127\amp 178\amp 183\amp 122\amp 183\amp 178\amp 128\amp 129\amp 240\amp 238\amp 244\amp 242\amp 240 \\ 242\amp 130\amp 176\amp 172\amp 132\amp 183\amp 167\amp 183\amp 132\amp 172\amp 174\amp 130\amp 238\amp 244\amp 242\amp 240 \\ 242\amp 240\amp 131\amp 177\amp 178\amp 133\amp 183\amp 133\amp 178\amp 178\amp 129\amp 240\amp 238\amp 244\amp 242\amp 240 \\ 242\amp 240\amp 241\amp 132\amp 132\amp 178\amp 183\amp 178\amp 132\amp 132\amp 239\amp 240\amp 238\amp 244\amp 242\amp 240 \\ 242\amp 240\amp 131\amp 177\amp 178\amp 133\amp 138\amp 133\amp 178\amp 178\amp 129\amp 240\amp 223\amp 244\amp 242\amp 240 \\ 242\amp 240\amp 131\amp 177\amp 132\amp 243\amp 138\amp 243\amp 132\amp 178\amp 129\amp 240\amp 253\amp 244\amp 242\amp 240 \\ 240\amp 240\amp 239\amp 124\amp 238\amp 234\amp 75\amp 234\amp 238\amp 130\amp 241\amp 244\amp 244\amp 248\amp 244\amp 244 \\ 240\amp 240\amp 239\amp 234\amp 238\amp 234\amp 75\amp 234\amp 238\amp 240\amp 241\amp 244\amp 244\amp 248\amp 244\amp 244 \\ 240\amp 240\amp 239\amp 69\amp 73\amp 234\amp 75\amp 234\amp 73\amp 75\amp 241\amp 244\amp 244\amp 240\amp 244\amp 244 \\ 50\amp 240\amp 239\amp 234\amp 73\amp 234\amp 75\amp 234\amp 73\amp 240\amp 241\amp 244\amp 244\amp 50\amp 244\amp 244 \\ 240\amp 75\amp 239\amp 248\amp 238\amp 69\amp 75\amp 69\amp 238\amp 240\amp 51\amp 240\amp 50\amp 240\amp 240\amp 50 \\ 240\amp 240\amp 74\amp 248\amp 238\amp 234\amp 75\amp 234\amp 238\amp 50\amp 241\amp 50\amp 240\amp 240\amp 50\amp 240 \\ 75\amp 75\amp 74\amp 83\amp 73\amp 69\amp 75\amp 69\amp 73\amp 75\amp 76\amp 75\amp 75\amp 75\amp 75\amp 75\\ 75\amp 75\amp 74\amp 83\amp 73\amp 69\amp 75\amp 69\amp 73\amp 75\amp 76\amp 75\amp 75\amp 75\amp 75\amp 75 \end{array} \right]\text{.} \end{equation*}

We convert this into a gray-scale image and obtain the image at right in Figure 32.1. Compare this image to the original at left in Figure 32.1. It is difficult to tell the difference.

There is a Sage file you can use at faculty.gvsu.edu/schlicks/Wavelets_Sage.html that allows you to create your own 16 by 16 image, process your image with the Haar wavelets in \(\R^{16}\text{,}\) apply thresholding, and reconstruct the compressed image. You can experiment with several different threshold levels and choose the one that you feel gives the best combination of strings of 0s while reproducing a reasonable copy of the original image.

See ccrma.stanford.edu/groups/edison/brahms/brahms.html for a discussion of the denoising of a Brahms recording.
M. Unser and A. Aldroubi, “A review of wavelets in biomedical applications,” Proceedings of the IEEE, Volume 84, Issue 4, April 1996.
The first mention of wavelets appeared in an appendix to the thesis of A. Haar in 1909.