
Section 25 Projections onto Subspaces and the Gram-Schmidt Process in \(\R^n\)

Subsection Application: MIMO Systems

MIMO (Multiple Input Multiple Output) systems are widely used to increase data transmission rates and improve the quality of wireless communication systems. MIMO is one of several forms of smart antenna technology, and it makes use of multiple antennas at the transmitter to send signals and multiple antennas at the receiver to accept the signals. MIMO systems transmit the same data on multiple streams, which introduces redundancy into the system. MIMO systems can utilize bounced and reflected signals to actually improve signal strength. This is very useful in urban environments in the presence of large buildings and the absence of direct line-of-sight transmission. MIMO systems can also transmit several information streams in parallel (known as spatial multiplexing), allowing for increased data transmission rates. The presence of multiple receiver antennas allows for greater reliability in the system.

Figure 25.1. A MIMO system.

In a MIMO system, every transmitter antenna sends a signal to every receiver antenna as illustrated in Figure 25.1. In each of these transmissions the signal can be disturbed in some way. Two types of disturbances are fading and noise. Fading is due to time variations in the received signal, which can occur because of atmospheric conditions or obstacles over the path that vary with time. In a MIMO system we have multipath fading for the different paths the signal takes from different antennas to receivers, which causes fluctuations in amplitude, phase, and angle of arrival of the received signal. Noise is an unwanted signal that interferes with the actual signal. Both fading and noise result in a received signal that is different from the transmitted signal. The problem is how to recover the original signal from the received signal. The QR decomposition is used in this process. To improve the efficiency of MIMO systems, different methods are introduced to determine a QR decomposition. We will discuss MIMO systems in more detail later in this section, along with how Householder transformations can be used to find QR decompositions.

Subsection Introduction

In many situations (least squares approximations, for example) we want to find a vector \(\vw\) in a subspace \(W\) of \(\R^n\) that is the “best” approximation to a vector \(\vv\) not in the subspace. A natural measure of “best” is to find a vector \(\vw\) in \(W\text{,}\) if one exists, such that the distance from \(\vw\) to \(\vv\) is a minimum. This means we want to find \(\vw\) so that the quantity

\begin{equation*} ||\vw - \vv || \end{equation*}

is as small as possible over all vectors \(\vw\) in \(W\text{.}\) To do this, we will need to find a suitable projection of the vector \(\vv\) onto the subspace \(W\text{.}\) We have already done this in the case that \(W = \Span\{\vw_1\}\) is the span of a single vector -- we can project the vector \(\vv\) in the direction of \(\vw_1\text{.}\) Our goal is to generalize this to project a vector \(\vv\) onto an entire subspace in a useful way.

Preview Activity 25.1.

Let \(\CB = \{\vw_1, \vw_2\}\) be a basis for a subspace \(W\) of \(\R^3\text{,}\) where \(\vw_1 = [1 \ 0 \ 0]^{\tr}\) and \(\vw_2 = [0 \ 1 \ 0]^{\tr}\text{.}\) Note that \(\CB\) is an orthonormal basis for \(W\text{.}\) Let \(\vv = [1 \ 2 \ 1]^{\tr}\text{.}\) Note that \(W\) is the \(xy\)-plane and that \(\vv\) is not in \(W\text{,}\) as illustrated in Figure 25.2.

Figure 25.2. The space \(W\) and vectors \(\vw_1\text{,}\) \(\vw_2\text{,}\) and \(\vv\text{.}\)
(a)

Find the projection \(\vu_1\) of \(\vv\) onto \(W_1 = \Span\{\vw_1\}\text{.}\)

Hint.

See Equation (23.2).

(b)

Find the projection \(\vu_2\) of \(\vv\) onto \(W_2 = \Span\{\vw_2\}\text{.}\)

(c)

Calculate the distance between \(\vv\) and \(\vu_1\) and between \(\vv\) and \(\vu_2\text{.}\) Which of \(\vu_1\) and \(\vu_2\) is closer to \(\vv\text{?}\)

(d)

Show that the vector \(\frac{3}{4} [1 \ 2 \ 0]^{\tr}\) is in \(W\) and find the distance between \(\vv\) and \(\frac{3}{4} [1 \ 2 \ 0]^{\tr}\text{.}\)

(e)

Part (d) shows that neither vector \(\vu_1 = \proj_{\vw_1} \vv\) nor \(\vu_2 = \proj_{\vw_2} \vv\) is the vector in \(W\) that is closest to \(\vv\text{.}\) We should probably expect this, since neither projection uses the fact that the other vector might contribute to the closest vector. Let us instead consider the sum \(\vu_3 = \vu_1+\vu_2\text{.}\) Calculate the components of this vector \(\vu_3\) and determine the distance between \(\vu_3\) and \(\vv\text{.}\) Which of the three vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) and \(\vu_3\) in \(W\) is closest to \(\vv\text{?}\)

(f)

A picture of \(\vw_1\text{,}\) \(\vw_2\text{,}\) \(W\text{,}\) and \(\vv\) is shown in Figure 25.2. Draw in \(\vu_1\text{,}\) \(\vu_2\text{,}\) and \(\vu_3\text{.}\) Draw a picture of the vector \(\vw\) in \(W\) that appears to be closest to \(\vv\text{.}\) Does this vector seem to have any relationship to \(\vu_1\text{,}\) \(\vu_2\text{,}\) or \(\vu_3\text{?}\) If yes, what relationship do you see?

Subsection Projections onto Subspaces and Orthogonal Projections

Preview Activity 25.1 gives an indication of how we can project a vector \(\vv\) in \(\R^n\) onto a subspace \(W\) of \(\R^n\text{.}\) If we have an orthogonal basis for \(W\text{,}\) we can just add the projections of \(\vv\) onto each basis vector. The resulting vector is called the projection of \(\vv\) onto the subspace \(W\text{.}\) As we did with projections onto vectors, we can also define the projection of \(\vv\) orthogonal to \(W\text{.}\) Note that to make this all work out properly, we will see that we need an orthogonal basis for \(W\text{.}\)

Definition 25.3.

Let \(W\) be a subspace of \(\R^n\) and let \(\{\vw_1, \vw_2, \ldots, \vw_m\}\) be an orthogonal basis for \(W\text{.}\) The projection of the vector \(\vv\) in \(\R^n\) onto \(W\) is the vector

\begin{equation*} \proj_W \vv = \frac{\vv \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1 + \frac{\vv \cdot \vw_2}{\vw_2 \cdot \vw_2} \vw_2 + \cdots + \frac{\vv \cdot \vw_m}{\vw_m \cdot \vw_m} \vw_m\text{.} \end{equation*}

The projection of \(\vv\) orthogonal to \(W\) is the vector

\begin{equation*} \proj_{\perp W} \vv = \vv - \proj_W \vv\text{.} \end{equation*}

The notation \(\proj_{\perp W} \vv\) indicates that we expect this vector to be orthogonal to every vector in \(W\text{.}\) We address this in the following activity.
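As a quick computational check of Definition 25.3 (not part of the original text), here is a minimal Python sketch; it assumes NumPy is available, and the vectors used below are made-up examples rather than ones from the text.

```python
import numpy as np

def proj_onto_subspace(v, orthogonal_basis):
    """Return proj_W v, where W is spanned by the given mutually orthogonal vectors."""
    return sum((v @ w) / (w @ w) * w for w in orthogonal_basis)

# Hypothetical orthogonal basis for a plane W in R^3 and a vector v not in W.
w1 = np.array([1.0, 1.0, 0.0])
w2 = np.array([1.0, -1.0, 1.0])
v = np.array([1.0, 2.0, 3.0])

p = proj_onto_subspace(v, [w1, w2])   # projection of v onto W
perp = v - p                          # projection of v orthogonal to W
print(p)
print(perp @ w1, perp @ w2)           # both dot products are (numerically) zero
```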

Activity 25.2.

Let \(W = \Span \{\vw_1, \vw_2\}\) in \(\R^3\text{,}\) with \(\vw_1=[1 \ 0 \ 0]^{\tr}\) and \(\vw_2= [0 \ 1 \ 0]^{\tr}\text{,}\) and \(\vv = [1 \ 2 \ 1]^{\tr}\) as in Preview Activity 25.1. Recall that \(\proj_W \vv = [1 \ 2 \ 0]^{\tr}\text{.}\) Find the projection of \(\vv\) orthogonal to \(W\) and show directly that \(\proj_{\perp W} \vv\) is orthogonal to the basis vectors for \(W\) (and hence to every vector in \(W\)).

Activity 25.2 indicates that the vector \(\proj_{\perp W} \vv\) is in fact orthogonal to every vector in \(W\text{.}\) To see that this is true in general, let \(\CB = \{\vw_1, \vw_2, \ldots, \vw_m\}\) be an orthogonal basis for a subspace \(W\) of \(\R^n\) and let \(\vv\) be a vector in \(\R^n\text{.}\) Let

\begin{equation*} \vw = \proj_W \vv = \frac{\vv \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1 + \frac{\vv \cdot \vw_2}{\vw_2 \cdot \vw_2} \vw_2 + \cdots + \frac{\vv \cdot \vw_m}{\vw_m \cdot \vw_m} \vw_m\text{.} \end{equation*}

Then \(\vv - \vw\) is the projection of \(\vv\) orthogonal to \(W\text{.}\) We will show that \(\vv - \vw\) is orthogonal to every basis vector for \(W\text{.}\) Since \(\CB\) is an orthogonal basis for \(W\text{,}\) we know that \(\vw_i \cdot \vw_j = 0\) for \(i \neq j\text{.}\) So if \(k\) is between 1 and \(m\) then

\begin{align*} \vw_k \cdot (\vv - \vw) \amp = (\vw_k \cdot \vv) - (\vw_k \cdot \vw)\\ \amp = (\vw_k \cdot \vv) - \left[\vw_k \cdot \left(\ds \frac{\vv \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1 + \frac{\vv \cdot \vw_2}{\vw_2 \cdot \vw_2} \vw_2 + \cdots + \frac{\vv \cdot \vw_m}{\vw_m \cdot \vw_m} \vw_m \right)\right]\\ \amp = (\vw_k \cdot \vv) - \left(\frac{\vv \cdot \vw_k}{\vw_k \cdot \vw_k}\right) (\vw_k \cdot \vw_k)\\ \amp = (\vw_k \cdot \vv) - (\vv \cdot \vw_k)\\ \amp = 0\text{.} \end{align*}

So the vector \(\vv - \vw\) is orthogonal to every basis vector for \(W\text{,}\) and therefore to every vector in \(W\) (Theorem 23.16). Because \(\CB\) is a basis for \(W\text{,}\) if \(\vx\) is in \(W\text{,}\) then

\begin{equation*} \vx = c_1 \vw_1 + c_2 \vw_2 + \cdots + c_m \vw_m \end{equation*}

for some scalars \(c_1\text{,}\) \(c_2\text{,}\) \(\ldots\text{,}\) \(c_m\text{.}\) So

\begin{equation*} (\vv - \vw) \cdot \vx = \sum_{k=1}^m c_k (\vv - \vw) \cdot \vw_k = 0\text{,} \end{equation*}

and so \(\vv - \vw\) is orthogonal to every vector in \(W\text{.}\)

Subsection Best Approximations

We have seen that \(\proj_{\perp W} \vv\) is orthogonal to every vector in \(W\text{,}\) which suggests that \(\proj_W \vv\) is in fact the vector in \(W\) that is closest to \(\vv\text{.}\) We now verify this fact and conclude that \(\proj_W \vv\) is the vector in \(W\) closest to \(\vv\) and therefore the best approximation of \(\vv\) by a vector in \(W\text{.}\)

Proof.

Let \(\CB = \{\vw_1, \vw_2, \ldots, \vw_m\}\) be an orthogonal basis for a subspace \(W\) of \(\R^n\) and let \(\vv\) be a vector in \(\R^n\text{.}\) Let \(\vx\) be a vector in \(W\text{.}\) The vector \(\proj_W \vv - \vx\) is in \(W\text{,}\) so it is orthogonal to \(\proj_{\perp W} \vv = \vv - \proj_W \vv\text{.}\) Thus, the dotted triangle whose vertices are the tips of the vectors \(\vv\text{,}\) \(\vx\text{,}\) and \(\proj_W \vv\) in Figure 25.5 is a right triangle.

Figure 25.5. The best approximation to \(\vv\) in \(W\)

The Pythagorean theorem then shows that

\begin{equation*} || \vv - \vx ||^2 = || \proj_{\perp W} \vv ||^2 + || \proj_W \vv - \vx ||^2\text{.} \end{equation*}

Now \(\vx \neq \proj_W \vv\text{,}\) so \(|| \proj_W \vv - \vx ||^2 > 0\text{.}\) This implies

\begin{equation*} || \vv - \vx ||^2 > || \proj_{\perp W} \vv ||^2 \end{equation*}

and it follows that

\begin{equation*} || \vv - \vx || > || \proj_{\perp W} \vv ||\text{.} \end{equation*}

This theorem shows that the distance from \(\proj_W \vv\) to \(\vv\) is less than the distance from any other vector in \(W\) to \(\vv\text{.}\) So \(\proj_W \vv\) is the best approximation to \(\vv\) of all the vectors in \(W\text{.}\)

If \(\vv = [v_1 \ v_2 \ v_3 \ \ldots \ v_n]^{\tr}\) and \(\proj_W \vv = [w_1 \ w_2 \ w_3 \ \ldots \ w_n]^{\tr}\text{,}\) then the square of the error in approximating \(\vv\) by \(\proj_W \vv\) is given by

\begin{equation*} || \vv - \proj_W \vv ||^2 = \sum_{i=1}^n (v_i - w_i)^2\text{.} \end{equation*}

So \(\proj_W \vv\) minimizes this sum of squares over all vectors in \(W\text{.}\) As a result, we call \(\proj_W \vv\) the least squares approximation to \(\vv\text{.}\)
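To make the best-approximation property concrete, the following NumPy sketch (illustrative only) compares the distance from \(\vv = [1 \ 2 \ 1]^{\tr}\) to several vectors in the \(xy\)-plane \(W\) of Preview Activity 25.1; the projection \([1 \ 2 \ 0]^{\tr}\) attains the smallest distance.

```python
import numpy as np

# W is the xy-plane in R^3 (Preview Activity 25.1) and v = [1, 2, 1]^T.
v = np.array([1.0, 2.0, 1.0])
candidates = {
    "u1 = proj_{w1} v": np.array([1.0, 0.0, 0.0]),
    "u2 = proj_{w2} v": np.array([0.0, 2.0, 0.0]),
    "(3/4)[1 2 0]^T": 0.75 * np.array([1.0, 2.0, 0.0]),
    "proj_W v": np.array([1.0, 2.0, 0.0]),
}
for name, x in candidates.items():
    print(f"{name:>18s}  distance to v = {np.linalg.norm(v - x):.4f}")
# proj_W v gives the minimum distance, ||v - proj_W v|| = 1.
```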

Activity 25.3.

Let \(\CB = \left\{\left[ \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \end{array} \right], \left[ \begin{array}{r} 0 \\ 1 \\ 0 \\ -1 \end{array} \right], \left[ \begin{array}{r} -1 \\ 1 \\ -1 \\ 1 \end{array} \right]\right\}\) and let \(W = \Span(\CB)\) in \(\R^4\text{.}\) Find the best approximation in \(W\) to the vector \(\vv = \left[ \begin{array}{r} 2 \\ 3 \\ 1 \\ -1 \end{array} \right]\text{.}\)

Subsection The Gram-Schmidt Process

We have seen that orthogonal bases make computations very convenient. However, until now we had no convenient way to construct orthogonal bases. The problem we address in this section is how to create an orthogonal basis from any basis.

Preview Activity 25.4.

Let \(W = \Span \{\vv_1, \vv_2, \vv_3\}\) in \(\R^4\text{,}\) where \(\vv_1 = [1 \ 1 \ 1 \ 1]^{\tr}\text{,}\) \(\vv_2 = [-1 \ 4 \ 4 \ -1]^{\tr}\text{,}\) and \(\vv_3 = [2 \ -1 \ 1 \ 0]^{\tr}\text{.}\) Our goal in this preview activity is to begin to understand how we can find an orthogonal set \(\CB = \{\vw_1, \vw_2, \vw_3\}\) in \(\R^4\) so that \(\Span \ \CB = W\text{.}\) To begin, we could start by letting \(\vw_1 = \vv_1\text{.}\)

(a)

Now we want to find a vector in \(W\) that is orthogonal to \(\vw_1\text{.}\) Let \(W_1 = \Span\{\vw_1\}\text{.}\) Explain why \(\vw_2 = \proj_{\perp W_1} \vv_2\) is in \(W\) and is orthogonal to \(\vw_1\text{.}\) Then calculate the vector \(\vw_2\text{.}\)

(b)

Next we need to find a third vector \(\vw_3\) that is in \(W\) and is orthogonal to both \(\vw_1\) and \(\vw_2\text{.}\) Let \(W_2 = \Span \{\vw_1, \vw_2\}\text{.}\) Explain why \(\vw_3 = \proj_{\perp W_2} \vv_3\) is in \(W\) and is orthogonal to both \(\vw_1\) and \(\vw_2\text{.}\) Then calculate the vector \(\vw_3\text{.}\)

(c)

Explain why the set \(\{\vw_1, \vw_2, \vw_3\}\) is an orthogonal basis for \(W\text{.}\)

Preview Activity 25.4 shows the first steps of the Gram-Schmidt process to construct an orthogonal basis from any basis of a subspace in \(\R^n\text{.}\) To understand why the process works in general, let \(\{\vv_1, \vv_2, \ldots, \vv_m\}\) be a basis for a subspace \(W\) of \(\R^n\text{.}\) Let \(\vw_1 = \vv_1\) and let \(W_1 = \Span\{\vw_1\}\text{.}\) Since \(\vw_1 = \vv_1\) we have that \(W_1 = \Span\{\vw_1\} = \Span\{\vv_1\}\text{.}\)

The vectors \(\vv_1 = \vw_1\) and \(\vv_2\) are possibly not orthogonal, but we know the orthogonal projection of \(\vv_2\) onto \(W_1^{\perp}\) is orthogonal to \(\vw_1\text{.}\) Let

\begin{equation*} \vw_2 = \proj_{\perp W_1} \vv_2 = \vv_2 - \frac{ \vv_2 \cdot \vw_1 }{ \vw_1 \cdot \vw_1 } \vw_1\text{.} \end{equation*}

Then \(\{\vw_1, \vw_2\}\) is an orthogonal set. Note that \(\vw_1 = \vv_1 \neq \vzero\text{,}\) and the fact that \(\vv_2 \notin W_1\) implies that \(\vw_2 \neq \vzero\text{.}\) So the set \(\{\vw_1, \vw_2\}\) is linearly independent, being a set of non-zero orthogonal vectors. Now the question is whether \(\Span\{\vw_1,\vw_2\} = \Span\{\vv_1,\vv_2\}\text{.}\) Note that \(\vw_2\) is a linear combination of \(\vv_1\) and \(\vv_2\text{,}\) so \(\vw_2\) is in \(\Span\{\vv_1,\vv_2\}\text{.}\) Since \(\Span\{\vw_1,\vw_2\}\) is a 2-dimensional subspace of the 2-dimensional space \(\Span\{\vv_1,\vv_2\}\text{,}\) it must be true that \(\Span\{\vw_1,\vw_2\} = W_2 = \Span\{\vv_1,\vv_2\}\text{.}\)

Now we take the next step, adding \(\vv_3\) into the mix. The vector

\begin{equation*} \vw_3 = \proj_{\perp W_2} \vv_3 = \vv_3 - \frac{ \vv_3 \cdot \vw_1 }{ \vw_1 \cdot \vw_1 } \vw_1 - \frac{ \vv_3 \cdot \vw_2 }{ \vw_2 \cdot \vw_2 } \vw_2 \end{equation*}

is orthogonal to both \(\vw_1\) and \(\vw_2\) and, by construction, \(\vw_3\) is a linear combination of \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\text{.}\) So \(\vw_3\) is in \(W_3 = \Span\{\vv_1, \vv_2, \vv_3\}\text{.}\) The fact that \(\vv_3 \notin W_2\) implies that \(\vw_3 \neq \vzero\) and \(\{\vw_1, \vw_2, \vw_3\}\) is a linearly independent set. Since \(\Span\{\vw_1, \vw_2, \vw_3\}\) is a 3-dimensional subspace of the 3-dimensional space \(\Span\{\vv_1,\vv_2,\vv_3\}\text{,}\) we conclude that \(\Span\{\vw_1, \vw_2, \vw_3\} = W_3 = \Span\{\vv_1,\vv_2,\vv_3\}\text{.}\)

We continue inductively in this same manner. If we have constructed a set \(\{\vw_1\text{,}\) \(\vw_2\text{,}\) \(\vw_3\text{,}\) \(\ldots\text{,}\) \(\vw_{k-1}\}\) of \(k-1\) orthogonal vectors such that

\begin{equation*} \Span\{\vw_1, \vw_2, \vw_3, \ldots, \vw_{k-1}\} = \Span\{\vv_1,\vv_2,\vv_3, \ldots, \vv_{k-1}\}\text{,} \end{equation*}

then we let

\begin{align*} \vw_{k} \amp = \proj_{\perp W_{k-1}} \vv_k\\ \amp = \vv_k - \ds \frac{ \vv_k \cdot \vw_1 }{ \vw_1 \cdot \vw_1 } \vw_1 - \frac{ \vv_k \cdot \vw_2 }{ \vw_2 \cdot \vw_2 } \vw_2 - \cdots - \frac{ \vv_k \cdot \vw_{k-1} }{ \vw_{k-1} \cdot \vw_{k-1} } \vw_{k-1}\text{,} \end{align*}

where

\begin{equation*} W_{k-1} = \Span\{\vw_1, \vw_2, \vw_3, \ldots, \vw_{k-1}\}\text{.} \end{equation*}

We know that \(\vw_k\) is orthogonal to \(\vw_1\text{,}\) \(\vw_2\text{,}\) \(\ldots\text{,}\) \(\vw_{k-1}\text{.}\) Since \(\vw_1\text{,}\) \(\vw_2\text{,}\) \(\ldots\text{,}\) \(\vw_{k-1}\text{,}\) and \(\vv_k\) are all in \(W_k = \Span\{\vv_1, \vv_2, \ldots, \vv_k\}\text{,}\) we see that \(\vw_k\) is also in \(W_k\text{.}\) The fact that \(\vv_k \notin W_{k-1}\) implies that \(\vw_k \neq \vzero\text{,}\) so \(\{\vw_1, \vw_2, \ldots, \vw_k\}\) is a linearly independent set. Then \(\Span\{\vw_1, \vw_2, \vw_3, \ldots, \vw_{k}\}\) is a \(k\)-dimensional subspace of the \(k\)-dimensional space \(W_k\text{,}\) so it follows that

\begin{equation*} \Span\{\vw_1, \vw_2, \vw_3, \ldots, \vw_{k}\} = W_k = \Span\{\vv_1,\vv_2,\vv_3, \ldots, \vv_{k}\}\text{.} \end{equation*}

This process will end when we have an orthogonal set \(\{\vw_1\text{,}\) \(\vw_2\text{,}\) \(\vw_3\text{,}\) \(\ldots\text{,}\) \(\vw_{m}\}\) with \(\Span\{\vw_1\text{,}\) \(\vw_2\text{,}\) \(\vw_3\text{,}\) \(\ldots\text{,}\) \(\vw_{m}\}\) = \(W\text{.}\)

We summarize the process in the following theorem.

The Gram-Schmidt process builds an orthogonal basis \(\{\vw_1, \vw_2, \vw_3, \ldots, \vw_{m}\}\) for us from a given basis. To make an orthonormal basis \(\{\vu_1, \vu_2, \vu_3, \ldots, \vu_{m}\}\text{,}\) all we need do is normalize each basis vector: that is, for each \(i\text{,}\) we let

\begin{equation*} \vu_i = \ds \frac{\vw_i}{|| \vw_i ||} \,\text{.} \end{equation*}
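The following Python function is one possible implementation of the process just summarized (a sketch assuming NumPy, not code from the text); it is run on the basis from Example 25.7, whose orthogonal basis is also computed by hand later in this section.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis with the same span as the given
    linearly independent vectors (classical Gram-Schmidt)."""
    ortho = []
    for v in vectors:
        w = v.astype(float)
        for u in ortho:
            w -= (v @ u) / (u @ u) * u   # subtract the projection of v onto each earlier vector
        ortho.append(w)
    return ortho

def normalize(vectors):
    """Divide each vector by its norm to obtain an orthonormal basis."""
    return [w / np.linalg.norm(w) for w in vectors]

# Basis of the subspace W from Example 25.7.
w1 = np.array([1, 0, 0, 0])
w2 = np.array([1, 1, 1, 0])
w3 = np.array([1, 2, 0, 1])

print(gram_schmidt([w1, w2, w3]))               # [1 0 0 0], [0 1 1 0], [0 1 -1 1]
print(normalize(gram_schmidt([w1, w2, w3])))    # the corresponding orthonormal basis
```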

Activity 25.5.

Let \(W = \Span \left\{\left[ \begin{array}{c} 1\\1\\0\\0 \end{array} \right], \left[ \begin{array}{c} 0\\1\\1\\0 \end{array} \right], \left[ \begin{array}{c} 0\\0\\1\\1 \end{array} \right] \right\}\text{.}\)

(a)

Use the Gram-Schmidt process to find an orthogonal basis for \(W\text{.}\)

Hint.

Order your vectors carefully to minimize computation.

(b)

Find the projection of the vector \(\vv = [0 \ 1 \ 1 \ 1]^{\tr}\) onto \(W\text{.}\)

Subsection The QR Factorization of a Matrix

There are several different factorizations, or decompositions, of matrices in which a matrix is written as a product of matrices of certain special types: the LU decomposition using lower and upper triangular matrices (see Section 22), the EVD (eigenvector decomposition) using eigenvectors and diagonal matrices (see Section 19), and, in this section, the QR decomposition. The QR decomposition is one of the most important tools for matrix computations. Jack Dongarra and Francis Sullivan nominated the QR decomposition as one of the “10 algorithms with the greatest influence on the development and practice of science and engineering in the 20th century.” The QR algorithm's importance stems from the fact that it is a “genuinely new contribution to the field of numerical analysis and not just a refinement of ideas given by Newton, Gauss, Hadamard, or Schur.” Most computer algebra systems (e.g., MATLAB) use a variant of the QR decomposition to compute eigenvalues. While there are other methods for approximating eigenvalues, the QR algorithm stands out. For example, the power method is useful when a matrix is large and contains a lot of zero entries (such matrices are called sparse). When most of the entries of a matrix are nonzero, the power method is not recommended; the better, more sophisticated alternative in these situations is the QR decomposition. The QR factorization has applications to solving least squares problems and approximating eigenvalues of matrices.

Activity 25.6.

Let \(A = \left[ \begin{array}{cc} 1\amp 0 \\ 0\amp 0 \\ 0\amp 2 \end{array} \right]\text{.}\)

(a)

Find an orthonormal basis \(\CB\) for \(\Col A\text{.}\) Let \(Q\) be the matrix whose columns are these orthonormal basis vectors.

(b)

Write the columns of \(A\) as linear combinations of the columns of \(Q\text{.}\) That is, if \(A = [\va_1 \ \va_2]\text{,}\) find \([\va_1]_{\CB}\) and \([\va_2]_{\CB}\text{.}\) Let \(R = [[\va_1]_{\CB} \ [\va_2]_{\CB}]\text{.}\)

(c)

Find the product \(QR\) and compare to \(A\text{.}\)

Activity 25.6 contains the main ideas to find the QR factorization of a matrix. Let

\begin{equation*} A = [\va_1 \ \va_2 \ \va_3 \ \cdots \ \va_n ] \end{equation*}

be an \(m \times n\) matrix with rank \(n\) (recall that the rank of a matrix \(A\) is the dimension of the column space of \(A\)). We can use the Gram-Schmidt process to find an orthonormal basis \(\{\vu_1, \vu_2, \ldots, \vu_n\}\) for \(\Col A\text{.}\) Recall also that \(\Span\{\vu_1, \vu_2, \ldots, \vu_k\} = \Span\{\va_1, \va_2, \ldots, \va_k\}\) for any \(k\) between 1 and \(n\text{.}\) Let

\begin{equation*} Q = [\vu_1 \ \vu_2 \ \vu_3 \ \cdots \ \vu_n ]\text{.} \end{equation*}

If \(k\) is between 1 and \(n\text{,}\) then \(\va_k\) is in \(\Span\{\vu_1, \vu_2, \ldots, \vu_k\}\) and

\begin{equation*} \va_k = r_{1k}\vu_1 + r_{2k}\vu_2 + \cdots + r_{kk}\vu_k \end{equation*}

for some scalars \(r_{1k}\text{,}\) \(r_{2k}\text{,}\) \(\ldots\text{,}\) \(r_{kk}\text{.}\) Then

\begin{equation*} Q \left[ \begin{array}{c} r_{1k} \\ r_{2k} \\ \vdots \\ r_{kk} \\ 0 \\ \vdots \\ 0 \end{array} \right] = \va_k\text{.} \end{equation*}

If we let \(\vr_k = \left[ \begin{array}{c} r_{1k} \\ r_{2k} \\ \vdots \\ r_{kk} \\ 0 \\ \vdots \\ 0 \end{array} \right]\) for \(k\) from 1 to \(n\text{,}\) then

\begin{equation*} A = [Q\vr_1 \ Q\vr_2 \ Q\vr_3 \ \cdots \ Q\vr_n]\text{.} \end{equation*}

This is the QR factorization of \(A\) into the product

\begin{equation*} A = QR \end{equation*}

where the columns of \(Q\) form an orthonormal basis for \(\Col A\) and

\begin{equation*} R = [\vr_1 \ \vr_2 \ \vr_3 \ \cdots \ \vr_n ] = \left[ \begin{array}{cccccc} r_{11}\amp r_{12}\amp r_{13}\amp \cdots \amp r_{1n-1}\amp r_{1n} \\ 0\amp r_{22}\amp r_{23}\amp \cdots \amp r_{2n-1}\amp r_{2n} \\ 0\amp 0\amp r_{33}\amp \cdots \amp r_{3n-1}\amp r_{3n} \\ \vdots\amp \vdots \amp \vdots \amp \vdots \amp \vdots \amp \vdots \\ 0\amp 0\amp 0\amp \cdots \amp 0\amp r_{nn} \end{array} \right] \end{equation*}

is an upper triangular matrix. Note that \(Q\) is an \(m \times n\) matrix and \(R = [r_{ij}]\) is an \(n \times n\) matrix with \(r_{ii} \neq 0\) for each \(i\text{.}\)
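The construction above translates directly into code; the sketch below (illustrative, assuming NumPy, with a hypothetical test matrix) builds \(Q\) and \(R\) column by column, using \(r_{ik} = \vu_i \cdot \va_k\text{.}\)

```python
import numpy as np

def qr_by_gram_schmidt(A):
    """QR factorization of an m x n matrix A with rank n:
    Q has orthonormal columns spanning Col A; R is upper triangular."""
    A = A.astype(float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        w = A[:, k].copy()
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]    # coordinate of a_k in the direction u_i
            w -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(w)
        Q[:, k] = w / R[k, k]
    return Q, R

# A hypothetical 3 x 2 matrix with linearly independent columns.
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = qr_by_gram_schmidt(A)
print(Q)
print(R)
print(np.allclose(Q @ R, A))    # True: QR reproduces A
```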

Activity 25.7.

The set \(\{\vu_1, \vu_2\}\text{,}\) where \(\vu_1 = [1 \ 1 \ 1 \ 1]^{\tr}\) and \(\vu_2 = [1 \ -1 \ 1 \ -1]^{\tr}\text{,}\) is an orthogonal basis for the column space of a \(4 \times 2\) matrix \(A = [\va_1 \ \va_2]\text{.}\) Moreover, \(\va_1 = 2\vu_1\) and \(\va_2 = 3\vu_1+4\vu_2\text{.}\) What is \(A\text{?}\)

The QR factorization provides a widely used algorithm (the QR algorithm) for approximating all of the eigenvalues of a matrix. The computer system MATLAB utilizes four versions of the QR algorithm to approximate the eigenvalues of real symmetric matrices, eigenvalues of real nonsymmetric matrices, eigenvalues of pairs of complex matrices, and singular values of general matrices.

The algorithm works as follows.

  • Start with an \(n \times n\) matrix \(A\text{.}\) Let \(A_1 = A\text{.}\)

  • Find the QR factorization for \(A_1\) and write it as \(A_1 = Q_1R_1\text{,}\) where \(Q_1\) is orthogonal and \(R_1\) is upper triangular.

  • Let \(A_2 = Q_1^{-1}A_1Q_1 = Q_1^{\tr}AQ_1 = R_1Q_1\text{.}\) Find the QR factorization of \(A_2\) and write it as \(A_2 = Q_2R_2\text{.}\)

  • Let \(A_3 = Q_2^{-1}A_2Q_2 = Q_2^{\tr}A_2Q_2 = R_2Q_2\text{.}\) Find the QR factorization of \(A_3\) and write it as \(A_3 = Q_3R_3\text{.}\)

  • Continue in this manner to obtain a sequence \(\{A_k\}\) where \(A_k = Q_kR_k\) and \(A_{k+1} = R_kQ_k\text{.}\)

Note that \(A_{k+1} = Q_k^{-1}A_kQ_k\) and so all of the matrices \(A_k\) are similar to each other and therefore all have the same eigenvalues. We won't go into the details, but it can be shown that if the eigenvalues of \(A\) are real and have distinct absolute values, then the sequence \(\{A_i\}\) converges to an upper triangular matrix with the eigenvalues of \(A\) as the diagonal entries. If some of the eigenvalues of \(A\) are complex, then the sequence \(\{A_i\}\) converges to a block upper triangular matrix, where the diagonal blocks are either \(1 \times 1\) (approximating the real eigenvalues of \(A\)) or \(2 \times 2\) (which provide a pair of conjugate complex eigenvalues of \(A\)).
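The iteration just described is easy to experiment with; the following sketch (illustrative only, using NumPy's built-in np.linalg.qr for each factorization and a made-up \(2 \times 2\) matrix) runs a fixed number of steps and reads approximate eigenvalues off the diagonal.

```python
import numpy as np

def qr_eigenvalue_iteration(A, num_steps=100):
    """Run the (unshifted) QR algorithm: A_{k+1} = R_k Q_k where A_k = Q_k R_k.
    When the eigenvalues are real with distinct absolute values, the iterates
    approach an upper triangular matrix whose diagonal holds the eigenvalues."""
    Ak = A.astype(float).copy()
    for _ in range(num_steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

# A small symmetric example; the diagonal of the result approximates the eigenvalues.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.diag(qr_eigenvalue_iteration(A)))
print(np.linalg.eigvals(A))   # compare with NumPy's eigenvalues
```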

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 25.7.

Let \(W\) be the subspace of \(\R^4\) spanned by \(\vw_1 = [1 \ 0 \ 0 \ 0]^{\tr}\text{,}\) \(\vw_2 = [1 \ 1 \ 1 \ 0]^{\tr}\text{,}\) and \(\vw_3 = [1 \ 2 \ 0 \ 1]^{\tr}\text{.}\)

(a)

Use the Gram-Schmidt process to find an orthonormal basis for \(W\text{.}\)

Solution.

First note that \(\vw_1\text{,}\) \(\vw_2\text{,}\) and \(\vw_3\) are linearly independent. We let \(\vv_1 = \vw_1\) and the Gram-Schmidt process gives us

\begin{align*} \vv_2 \amp = \vw_2 - \frac{\vw_2 \cdot \vv_1}{\vv_1 \cdot \vv_1} \vv_1\\ \amp = [1 \ 1 \ 1 \ 0]^{\tr} - \frac{1}{1}[1 \ 0 \ 0 \ 0]^{\tr}\\ \amp = [0 \ 1 \ 1 \ 0]^{\tr} \end{align*}

and

\begin{align*} \vv_3 \amp = \vw_3 - \frac{\vw_3 \cdot \vv_1}{\vv_1 \cdot \vv_1} \vv_1 - \frac{\vw_3 \cdot \vv_2}{\vv_2 \cdot \vv_2} \vv_2\\ \amp = [1 \ 2 \ 0 \ 1]^{\tr} - \frac{1}{1}[1 \ 0 \ 0 \ 0]^{\tr} - \frac{2}{2}[0 \ 1 \ 1 \ 0]^{\tr}\\ \amp = [0 \ 1 \ -1 \ 1]^{\tr}\text{.} \end{align*}

So \(\{\vv_1, \vv_2, \vv_3\}\) is an orthogonal basis for \(W\text{.}\) An orthonormal basis is found by dividing each vector by its magnitude, so

\begin{equation*} \{[1 \ 0 \ 0 \ 0]^{\tr}, \frac{1}{\sqrt{2}} [0 \ 1 \ 1 \ 0]^{\tr}, \frac{1}{\sqrt{3}}[0 \ 1 \ -1 \ 1]^{\tr}\} \end{equation*}

is an orthonormal basis for \(W\text{.}\)

(b)

Find a QR factorization of the matrix \(A = \left[ \begin{array}{cccc} 1\amp 1\amp 1\amp 0 \\ 0\amp 1\amp 2\amp 0 \\ 0\amp 1\amp 0\amp 0 \\ 0\amp 0\amp 1\amp 1 \end{array} \right]\text{.}\)

Solution.

Technology shows that the reduced row echelon form of \(A\) is \(I_4\text{,}\) so the columns of \(A\) are linearly independent and \(A\) has rank \(4\text{.}\) From part (a) we have an orthogonal basis for the span of the first three columns of \(A\text{.}\) To find a fourth vector to add so that the span is \(\Col \ A\text{,}\) we apply the Gram-Schmidt process one more time with \(\vw_4 = [0 \ 0 \ 0 \ 1]^{\tr}\text{:}\)

\begin{align*} \vv_4 \amp = \vw_4 - \frac{\vw_4 \cdot \vv_1}{\vv_1 \cdot \vv_1} \vv_1 - \frac{\vw_4 \cdot \vv_2}{\vv_2 \cdot \vv_2} \vv_2 - \frac{\vw_4 \cdot \vv_3}{\vv_3 \cdot \vv_3} \vv_3\\ \amp = [0 \ 0 \ 0 \ 1]^{\tr} - \frac{0}{1}[1 \ 0 \ 0 \ 0]^{\tr} - \frac{0}{2}[0 \ 1 \ 1 \ 0]^{\tr} - \frac{1}{3}[0 \ 1 \ -1 \ 1]^{\tr}\\ \amp = \frac{1}{3} [0 \ -1 \ 1 \ 2]^{\tr}\text{.} \end{align*}

So \(\{\vu_1, \vu_2, \vu_3, \vu_4\}\) is an orthonormal basis for \(\Col \ A\text{,}\) where

\begin{equation*} \begin{array}{ccc} \vu_1 = [1 \ 0 \ 0 \ 0]^{\tr} \amp \amp \vu_2 = \frac{\sqrt{2}}{2}[0 \ 1 \ 1 \ 0]^{\tr} \\ \vu_3 = \frac{\sqrt{3}}{3}[0 \ 1 \ -1 \ 1]^{\tr} \amp \amp \vu_4 = \frac{\sqrt{6}}{6} [0 \ -1 \ 1 \ 2]^{\tr}. \end{array} \end{equation*}

This makes

\begin{equation*} Q = \left[ \begin{array}{ccrr} 1\amp 0\amp 0\amp 0 \\ 0\amp \frac{\sqrt{2}}{2}\amp \frac{\sqrt{3}}{3}\amp -\frac{\sqrt{6}}{6} \\ 0\amp \frac{\sqrt{2}}{2}\amp -\frac{\sqrt{3}}{3}\amp \frac{\sqrt{6}}{6} \\ 0\amp 0\amp \frac{\sqrt{3}}{3}\amp \frac{\sqrt{6}}{3} \end{array} \right]\text{.} \end{equation*}

To find the matrix \(R\text{,}\) we write the columns of \(A\) in terms of the basis vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\vu_3\text{,}\) and \(\vu_4\text{.}\) Technology shows that the reduced row echelon form of \([Q \ | \ A]\) is

\begin{equation*} \left[ \begin{array}{cccc|cccc} 1\amp 0\amp 0\amp 0\amp 1\amp 1\amp 1\amp 0 \\ 0\amp 1\amp 0\amp 0\amp 0\amp \sqrt{2}\amp \sqrt{2}\amp 0 \\ 0\amp 0\amp 1\amp 0\amp 0\amp 0\amp \sqrt{3}\amp \frac{\sqrt{3}}{3} \\ 0\amp 0\amp 0\amp 1\amp 0\amp 0\amp 0\amp \frac{\sqrt{6}}{3} \end{array} \right]\text{.} \end{equation*}

So

\begin{equation*} R = \left[ \begin{array}{cccc} 1\amp 1\amp 1\amp 0 \\ 0\amp \sqrt{2}\amp \sqrt{2}\amp 0 \\ 0\amp 0\amp \sqrt{3}\amp \frac{\sqrt{3}}{3} \\ 0\amp 0\amp 0\amp \frac{\sqrt{6}}{3} \end{array} \right]\text{.} \end{equation*}
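As a numerical check on this worked example (not part of the original solution), the following NumPy sketch verifies that the columns of \(Q\) are orthonormal and that \(QR\) reproduces \(A\text{.}\)

```python
import numpy as np

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
Q = np.array([[1, 0,     0,      0],
              [0, s2/2,  s3/3,  -s6/6],
              [0, s2/2, -s3/3,   s6/6],
              [0, 0,     s3/3,   s6/3]])
R = np.array([[1, 1,  1,  0],
              [0, s2, s2, 0],
              [0, 0,  s3, s3/3],
              [0, 0,  0,  s6/3]])
A = np.array([[1, 1, 1, 0],
              [0, 1, 2, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 1]])
print(np.allclose(Q.T @ Q, np.eye(4)))   # columns of Q are orthonormal
print(np.allclose(Q @ R, A))             # QR reproduces A
```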

Example 25.8.

Let \(W = \Span\{[1 \ 0 \ 1 \ 0]^{\tr}, [0 \ 0 \ 1 \ -1]^{\tr}, [1 \ 0 \ -1 \ 0]^{\tr}\}\text{.}\) Find the vector in \(W\) that is closest to the vector \([1 \ 1 \ 1 \ 1]^{\tr}\text{.}\) Provide a numeric measure of the error in approximating \([1 \ 1 \ 1 \ 1]^{\tr}\) by this vector.

Solution.

Our job is to find \(\proj_{W} [1 \ 1 \ 1 \ 1]^{\tr}\text{.}\) To do this, we need an orthogonal basis of \(W\text{.}\) Let \(\vv_1 = [1 \ 0 \ 1 \ 0]^{\tr}\text{,}\) \(\vv_2 = [0 \ 0 \ 1 \ -1]^{\tr}\text{,}\) and \(\vv_3 = [1 \ 0 \ -1 \ 0]^{\tr}\text{.}\) Technology shows that each column of the matrix \([\vv_1 \ \vv_2 \ \vv_3]\) is a pivot column, so the set \(\{\vv_1, \vv_2, \vv_3\}\) is a basis for \(W\text{.}\) We apply the Gram-Schmidt process to this basis to obtain an orthogonal basis \(\{\vw_1, \vw_2, \vw_3\}\) of \(W\text{.}\) We start with \(\vw_1= \vv_1\text{,}\) then

\begin{align*} \vw_2 \amp = \vv_2 - \frac{\vv_2 \cdot \vw_1}{ \vw_1 \cdot \vw_1} \vw_1\\ \amp = [0 \ 0 \ 1 \ -1]^{\tr} - \frac{1}{2} [1 \ 0 \ 1 \ 0]^{\tr}\\ \amp = \frac{1}{2}[-1 \ 0 \ 1 \ -2]^{\tr} \end{align*}

and

\begin{align*} \vw_3 \amp = \vv_3 - \frac{\vv_3 \cdot \vw_1}{ \vw_1 \cdot \vw_1} \vw_1 - \frac{\vv_3 \cdot \vw_2}{ \vw_2 \cdot \vw_2} \vw_2\\ \amp = [1 \ 0 \ -1 \ 0]^{\tr} - \frac{0}{2} [1 \ 0 \ 1 \ 0]^{\tr} + \frac{2}{3} \left(\frac{1}{2}[-1 \ 0 \ 1 \ -2]^{\tr}\right)\\ \amp = \frac{1}{3} [2 \ 0 \ -2 \ -2]^{\tr}\text{.} \end{align*}

Then, letting \(\vz = [1 \ 1 \ 1 \ 1]^{\tr}\) we have

\begin{align*} \proj_{W} \vz \amp = \frac{\vz \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1 + \frac{\vz \cdot \vw_2}{\vw_2 \cdot \vw_2} \vw_2 + \frac{\vz \cdot \vw_3}{\vw_3 \cdot \vw_3} \vw_3\\ \amp = \frac{2}{2}[1 \ 0 \ 1 \ 0]^{\tr} + \frac{-1}{3/2}\frac{1}{2}[-1 \ 0 \ 1 \ -2]^{\tr} + \frac{-2/3}{4/3} \frac{1}{3} [2 \ 0 \ -2 \ -2]^{\tr}\\ \amp = [1 \ 0 \ 1 \ 0]^{\tr} - \frac{1}{3}[-1 \ 0 \ 1 \ -2]^{\tr} - \frac{1}{6}[2 \ 0 \ -2 \ -2]^{\tr}\\ \amp = [1 \ 0 \ 1 \ 1]^{\tr}\text{.} \end{align*}

The norm of \(\proj_{W^{\perp}} \vz = \vz - \proj_W \vz\) tells us how well our projection \([1 \ 0 \ 1 \ 1]^{\tr}\) approximates \(\vz = [1 \ 1 \ 1 \ 1]^{\tr}\text{.}\) Now

\begin{equation*} ||\proj_{W^{\perp}} \vz || = ||[1 \ 1 \ 1 \ 1]^{\tr} - [1 \ 0 \ 1 \ 1]^{\tr}|| = ||[0 \ 1 \ 0 \ 0]^{\tr}|| = 1\text{,} \end{equation*}

so \([1 \ 0 \ 1 \ 1]^{\tr}\) is one unit away from \(\vz = [1 \ 1 \ 1 \ 1]^{\tr}\text{.}\)

Subsection Summary

  • The projection of the vector \(\vv\) in \(\R^n\) onto \(W\) is the vector

    \begin{equation*} \proj_W \vv = \frac{\vv \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1 + \frac{\vv \cdot \vw_2}{\vw_2 \cdot \vw_2} \vw_2 + \cdots + \frac{\vv \cdot \vw_m}{\vw_m \cdot \vw_m} \vw_m\text{,} \end{equation*}

    where \(W\) is a subspace of \(\R^n\) with orthogonal basis \(\{\vw_1, \vw_2, \ldots, \vw_m\}\text{.}\) These projections are important in that \(\proj_W \vv\) is the best approximation of the vector \(\vv\) by a vector in \(W\) in the least squares sense.

  • With \(W\) as above, the projection of \(\vv\) orthogonal to \(W\) is the vector

    \begin{equation*} \proj_{\perp W} \vv = \vv - \proj_W \vv\text{.} \end{equation*}

    The norm of \(\proj_{\perp W} \vv\) provides a measure of how well \(\proj_W \vv\) approximates the vector \(\vv\text{.}\)

  • The Gram-Schmidt process produces an orthogonal basis from any basis. It works as follows. Let \(\{\vv_1, \vv_2, \ldots, \vv_m\}\) be a basis for a subspace \(W\) of \(\R^n\text{.}\) The set \(\{\vw_1\text{,}\) \(\vw_2\text{,}\) \(\vw_3\text{,}\) \(\ldots\text{,}\) \(\vw_{m}\}\) defined by

    • \(\vw_1 = \vv_1\text{,}\)

    • \(\vw_2 = \vv_2 - \ds \frac{\vv_2 \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1\text{,}\)

    • \(\vw_3 = \vv_3 - \ds \frac{ \vv_3 \cdot \vw_1 }{ \vw_1 \cdot \vw_1 } \vw_1 - \frac{ \vv_3 \cdot \vw_2 }{ \vw_2 \cdot \vw_2 } \vw_2\text{,}\)

      \(\vdots\)

    • \(\vw_m = \vv_m - \ds \frac{ \vv_m \cdot \vw_1 }{ \vw_1 \cdot \vw_1 } \vw_1 - \frac{ \vv_m \cdot \vw_2 }{ \vw_2 \cdot \vw_2 } \vw_2 - \cdots - \frac{ \vv_m \cdot \vw_{m-1} }{ \vw_{m-1} \cdot \vw_{m-1} } \vw_{m-1}\text{.}\)

    is an orthogonal basis for \(W\text{.}\) Moreover,

    \begin{equation*} \Span\{\vw_1, \vw_2, \vw_3, \ldots, \vw_{k}\} = \Span\{\vv_1,\vv_2,\vv_3, \ldots, \vv_{k}\} \end{equation*}

    for each \(k\) between 1 and \(m\text{.}\)

  • The QR factorization has applications to solving least squares problems and approximating eigenvalues of matrices. The QR factorization writes an \(m \times n\) matrix with rank \(n\) as a product \(A = QR\text{,}\) where the columns of \(Q\) form an orthonormal basis for \(\Col A\) and

    \begin{equation*} R = [\vr_1 \ | \ \vr_2 \ | \ \vr_3 \ | \ \cdots \ | \ \vr_n ] = \left[ \begin{array}{cccccc} r_{11}\amp r_{12}\amp r_{13}\amp \cdots \amp r_{1n-1}\amp r_{1n} \\ 0\amp r_{22}\amp r_{23}\amp \cdots \amp r_{2n-1}\amp r_{2n} \\ 0\amp 0\amp r_{33}\amp \cdots \amp r_{3n-1}\amp r_{3n} \\ \vdots\amp \vdots \amp \vdots \amp \vdots \amp \vdots \amp \vdots \\ 0\amp 0\amp 0\amp \cdots \amp 0\amp r_{nn} \end{array} \right] \end{equation*}

    is an upper triangular matrix.

Exercises Exercises

1.

Let \(W = \Span \{\vw_1, \vw_2\}\) in \(\R^3\text{,}\) with \(\vw_1=[1 \ 2 \ 1]^{\tr}\) and \(\vw_2= [1 \ -1 \ 1]^{\tr}\text{,}\) and \(\vv = [1 \ 0 \ 0]^{\tr}\text{.}\)

(a)

Find \(\proj_W \vv\) and \(\proj_{\perp W} \vv\text{.}\)

(b)

Find the vector in \(W\) that is closest to \(\vv\text{.}\) How close is this vector to \(\vv\text{?}\)

2.

Let \(\CB = \left\{\left[ \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \end{array} \right], \left[ \begin{array}{r} 0 \\ 1 \\ 0 \\ -1 \end{array} \right], \left[ \begin{array}{r} -1 \\ 1 \\ -1 \\ 1 \end{array} \right]\right\}\) and let \(W = \Span(\CB)\) in \(\R^4\text{.}\) Find the best approximation in \(W\) to the vector \(\vv = \left[ \begin{array}{r} 2 \\ 3 \\ 1 \\ -1 \end{array} \right]\) and give a numeric estimate of how good this approximation is.

3.

In this exercise we determine the least-squares line (the line of best fit in the least squares sense) to a set of \(k\) data points \((x_1,y_1)\text{,}\) \((x_2,y_2)\text{,}\) \(\ldots\text{,}\) \((x_k,y_k)\) in the plane. In this case, we want to fit a line of the form \(f(x) = mx+b\) to the data. If the data were to lie on a line, then we would have a solution to the system

\begin{align*} mx_1 + b \amp = y_1\\ mx_2 + b \amp = y_2\\ \vdots\\ mx_k + b \amp = y_k \end{align*}

This system can be written as

\begin{equation*} m \vw_1 + b\vw_2 = \vy\text{,} \end{equation*}

where \(\vw_1 = [x_1 \ x_2 \ \cdots \ x_k]^{\tr}\text{,}\) \(\vw_2 = [1 \ 1 \ \cdots \ 1]^{\tr}\text{,}\) and \(\vy = [y_1 \ y_2 \ \cdots \ y_k]^{\tr}\text{.}\) If the data does not lie on a line, then the system won't have a solution. Instead, we want to minimize the square of the distance between \(\vy\) and a vector of the form \(m\vw_1 + b \vw_2\text{.}\) That is, minimize

\begin{equation} ||\vy - (m \vw_1 + b\vw_2)||^2\text{.}\tag{25.1} \end{equation}

Rephrasing this in terms of projections, we are looking for the vector in \(W = \Span\{\vw_1, \vw_2\}\) that is closest to \(\vy\text{.}\) In other words, the values of \(m\) and \(b\) will occur as the weights when we write \(\proj_{W} \vy\) as a linear combination of \(\vw_1\) and \(\vw_2\text{.}\) The one wrinkle in this problem is that we need an orthogonal basis for \(W\) to find this projection. Find the least squares line for the data points \((1,2)\text{,}\) \((2,4)\) and \((3,5)\) in \(\R^2\text{.}\)

4.

Each set \(S\) is linearly independent. Use the Gram-Schmidt process to create an orthogonal set of vectors with the same span as \(S\text{.}\) Then find an orthonormal basis for the same span.

(a)

\(S = \left\{[1 \ 1 \ 1]^{\tr}, [5 \ -1 \ 2]^{\tr}\right\}\) in \(\R^3\)

(b)

\(S = \left\{[1 \ 0 \ 2]^{\tr}, [-1 \ 1 \ 1]^{\tr}, [1 \ 1 \ 19]^{\tr} \right\}\) in \(\R^3\)

(c)

\(S = \left\{[1 \ 0 \ 1 \ 0 \ 1 \ 0 \ 1]^{\tr}, [-1 \ 2 \ 3 \ 0 \ 1 \ 0 \ 1]^{\tr}, [1 \ 0 \ 4 \ -1 \ 2 \ 0 \ 1]^{\tr}, [1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1]^{\tr} \right\}\) in \(\R^7\)

5.

Let \(S = \{\vv_1, \vv_2, \vv_3\}\) be a set of linearly dependent vectors in \(\R^n\) for some integer \(n \geq 3\text{.}\) What is the result if the Gram-Schmidt process is applied to the set \(S\text{?}\) Explain.

6.

A fellow student wants to find a QR factorization for a \(3 \times 4\) matrix. What would you tell this student and why?

7.

Find the QR factorizations of the given matrices.

(a)

\(\left[ \begin{array}{ccc} 1\amp 2\amp 3 \\ 0\amp 4\amp 5 \\ 0\amp 0\amp 6 \end{array} \right]\)

(b)

\(\left[ \begin{array}{cc} 1\amp 2 \\ 1\amp 3 \\ 1\amp 2 \\ 1\amp 3 \end{array} \right]\)

(c)

\(\left[ \begin{array}{ccc} 0\amp 1\amp 0\\0\amp 0\amp 0\\3\amp 0\amp 4 \\ 0\amp 0\amp 5 \end{array} \right]\text{.}\)

8.

Find an orthonormal basis \(\{\vu_1, \vu_2, \vu_3\}\) of \(\R^3\) such that \(\Span\{\vu_1,\vu_2\} = \Span\{[1 \ 2 \ 3]^{\tr}, [1 \ 1 \ 0]^{\tr}\}\text{.}\)

9.

(a)

Let \(A = \left[ \begin{array}{cc} 2\amp 1\\2\amp 6\\2\amp 6\\2\amp 1 \end{array} \right] = [\va_1 \ \va_2]\text{.}\) A QR factorization of \(A\) has \(Q = [\vu_1 \ \vu_2]\) with \(\vu_1 = \frac{1}{2}[1 \ 1 \ 1 \ 1]^{\tr}\) and \(\vu_2 = \frac{1}{2}[-1 \ 1 \ 1 \ -1]^{\tr}\text{,}\) and \(R = [r_{ij}] = \left[ \begin{array}{cc} 4\amp 7 \\ 0\amp 5 \end{array} \right]\text{.}\) Calculate the dot products \(\va_i \cdot \vu_j\) for \(i\) and \(j\) between \(1\) and \(2\text{.}\) How are these dot products connected to the entries of \(R\text{?}\)

(b)

Explain why the result of part (a) is true in general. That is, if \(A = [\va_1 \ \ \va_2 \ \ldots \ \va_n]\) has QR factorization with \(Q = [\vu_1 \ \vu_2 \ \ldots \ \vu_n]\) and \(R = [r_{ij}]\text{,}\) then \(r_{ij} = \va_j \cdot \vu_i\text{.}\)

10.

Upper triangular matrices play an important role in the QR decomposition. In this exercise we examine two properties of upper triangular matrices.

(a)

Show that if \(R\) and \(S\) are upper triangular \(n \times n\) matrices with positive diagonal entries, then \(RS\) is also an upper triangular \(n \times n\) matrix with positive diagonal entries.

(b)

Show that if \(R\) is an invertible upper triangular matrix with positive diagonal entries, then \(R^{-1}\) is also an upper triangular matrix with positive diagonal entries.

11.

In this exercise we demonstrate that the QR decomposition of an \(m \times n\) matrix with linearly independent columns is unique.

(a)

Suppose that \(A = QR\text{,}\) where \(Q\) is an \(m \times n\) matrix with orthogonal columns and \(R\) is an \(n \times n\) upper triangular matrix. Show that \(R\) is invertible.

Hint.

What can we say about \(\vx\) if \(R \vx = \vzero\text{?}\)

(b)

Show that the only \(n \times n\) orthogonal upper triangular matrix with positive diagonal entries is the identity matrix \(I_n\text{.}\)

Hint.

Let \(A = [\va_1 \ \va_2 \ \ldots \ \va_n] = [a_{ij}]\) be an \(n \times n\) orthogonal upper triangular matrix with positive diagonal entries. What does that tell us about \(\va_i \cdot \va_j\text{?}\)

(c)

Show that if \(Q_1\) and \(Q_2\) are \(m \times n\) matrices with orthogonal columns, and \(S\) is a matrix such that \(Q_1 = Q_2S\text{,}\) then \(S\) is an orthogonal matrix.

Hint.

Write \(Q_1^{\tr}Q_1\) in terms of \(Q_2\) and \(S\text{.}\)

(d)

Suppose that \(Q_1\) and \(Q_2\) are \(m \times n\) matrices with orthogonal columns and \(R_1\) and \(R_2\) are \(n \times n\) upper triangular matrices such that

\begin{equation*} A = Q_1R_1 = Q_2R_2\text{.} \end{equation*}

Show that \(Q_1 = Q_2\) and \(R_1 = R_2\text{.}\) Conclude that the QR factorization of a matrix is unique.

Hint.

Use the previous parts of this problem along with the results of Exercise 10.

12.

Label each of the following statements as True or False. Provide justification for your response. Throughout, let \(V\) be a vector space.

(a) True/False.

If \(\{\vw_1, \vw_2\}\) is a basis for a subspace \(W\) of \(\R^3\text{,}\) then the vector \(\frac{\vv \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1 + \frac{\vv \cdot \vw_2}{\vw_2 \cdot \vw_2} \vw_2\) is the vector in \(W\) closest to \(\vv\text{.}\)

(b) True/False.

If \(W\) is a subspace of \(\R^n\text{,}\) then the vector \(\vv - \proj_{W} \vv\) is orthogonal to every vector in \(W\text{.}\)

(c) True/False.

If \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\vu_3\) are vectors in \(\R^n\text{,}\) then the Gram-Schmidt process constructs an orthogonal set of vectors \(\{\vv_1, \vv_2, \vv_3\}\) with the same span as \(\{\vu_1, \vu_2, \vu_3\}\text{.}\)

(d) True/False.

Any set \(\{\vu_1, \vu_2, \ldots, \vu_k\}\) of orthogonal vectors in \(\R^n\) can be extended to an orthogonal basis of \(V\text{.}\)

(e) True/False.

If \(A\) is an \(n \times n\) matrix with \(AA^{\tr} = I_n\text{,}\) then the rows of \(A\) form an orthogonal set.

(f) True/False.

Every nontrivial subspace of \(\R^n\) has an orthogonal basis.

(g) True/False.

If \(W\) is a subspace of \(\R^n\) satisfying \(W^\perp=\{\vzero\}\text{,}\) then \(W=\R^n\text{.}\)

Subsection Project: MIMO Systems and Householder Transformations

In a simplified model, a MIMO system will have \(k\) transmitting antennas and \(m\) receiving antennas. We record a transmitted symbol in a vector \(\vx = [x_1 \ x_2 \ \ldots \ x_k]^{\tr}\) (one \(x_i\) for each transmitting antenna) and the received symbol as a vector \(\vy = [y_1 \ y_2 \ \ldots \ y_m]^{\tr}\) (one \(y_j\) for each received symbol).

Between each transmit antenna and each receiver antenna is a fading channel. The MIMO system is the collection of fading channels. If we let \(h_{ij}\) be the fading between the \(j\)th transmitter and \(i\)th receiver, then we can represent this multipath fading as an \(m \times k\) matrix \(H = [h_{ij}]\text{.}\) We assume that there is also some noise in the received signal that we represent by \(n_j\) (for the \(j\)th receiving antenna). Our MIMO system is then represented as the matrix-vector equation

\begin{equation*} \vy = H\vx + \vn\text{,} \end{equation*}

where \(\vn = [n_1 \ n_2 \ \ldots \ n_m]^{\tr}\text{.}\) The goal is to reproduce the original signal \(\vx\) from the received signal \(\vy\text{.}\) This is where the QR decomposition comes into play.
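As a concrete illustration of this model (with an entirely made-up channel matrix, symbol alphabet, and noise level, none of which come from the text), the following NumPy sketch simulates one transmission.

```python
import numpy as np

# Illustrative simulation of the MIMO model y = H x + n; all values are hypothetical.
rng = np.random.default_rng(0)

k, m = 2, 3                               # 2 transmit antennas, 3 receive antennas
H = rng.normal(size=(m, k))               # fading matrix
x = rng.choice([-1.0, 1.0], size=k)       # transmitted symbols from a simple +-1 alphabet
n = 0.1 * rng.normal(size=m)              # small noise at the receivers

y = H @ x + n                             # received signal
print("sent:", x, "received:", y)
```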

Project Activity 25.8.

To see why and how the QR decomposition is used in MIMO systems, we begin with an example. Assume that we have two transmitters and three receivers, and suppose \(H = \left[ \begin{array}{cc} 2\amp 1\\2\amp 1\\1\amp 5 \end{array} \right]\text{.}\) From this point, to simplify the situation we assume that noise is negligible and consider the system \(\vy = H\vx\text{.}\) Assume that the received signal is \(\vy = \left[ \begin{array}{c} 2\\1\\4 \end{array} \right]\text{.}\)

(a)

Show that the system \(H \vx = \vy\) is inconsistent.

(b)

When we receive a signal, we need to interpret it, and that is difficult if we cannot find a solution. One reason that the system might not have a solution is that the elements in \(\vx\) have to come from a specified alphabet, and so we are looking for solutions that are in that alphabet, and there may be no direct solution in that alphabet. As a result, we generally have to find a “best” solution in the alphabet space. That can be done with the QR decomposition. Assume that \(H\) has a QR decomposition \(H = QR\) with \(Q\) having orthonormal columns and \(R\) an upper triangular matrix. Explain how the equation \(\vy = H\vx\) can be transformed into

\begin{equation} Q^{\tr} \vy = R\vx\text{.}\tag{25.2} \end{equation}
(c)

Return to our example of \(H = \left[ \begin{array}{cc} 2\amp 1\\2\amp 1\\1\amp 5 \end{array} \right]\text{.}\) Assume that

\begin{equation*} H = QR = \left[ \begin{array}{cr} \frac{2}{3}\amp -\frac{\sqrt{2}}{6} \\ \frac{2}{3}\amp -\frac{\sqrt{2}}{6} \\ \frac{1}{3}\amp \frac{2\sqrt{2}}{3} \end{array} \right] \left[ \begin{array}{cc} 3\amp 3\\0\amp 3\sqrt{2} \end{array} \right]\text{.} \end{equation*}

Use (25.2) to find \(\vx\) if \(\vy = \left[ \begin{array}{c} 2\\1\\4 \end{array} \right]\text{.}\)

In Project Activity 25.8 we saw that a QR decomposition allows us to replace the system \(\vy = H\vx\) with an upper triangular system \(Q^{\tr}\vy = R\vx\) that has a solution. This solution is what is called a least squares solution (which we will discuss in more detail in a later section). The key to all of this is finding an efficient way to calculate a QR decomposition. This is the focus of the remainder of this project.
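To make the decoding step concrete, here is a hedged NumPy sketch with a hypothetical channel matrix and received signal (not the ones from the project activity): it computes a thin QR factorization with np.linalg.qr and solves the upper triangular system \(R\vx = Q^{\tr}\vy\) by back substitution; the helper name back_substitute is made up for this illustration.

```python
import numpy as np

def back_substitute(R, b):
    """Solve R x = b for an upper triangular, invertible R."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Hypothetical 3 x 2 channel matrix H and received signal y.
H = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(H)               # thin QR: Q is 3 x 2, R is 2 x 2 upper triangular
x_hat = back_substitute(R, Q.T @ y)  # estimate of the transmitted symbols
print(x_hat)
```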

Subsection Householder Transformations and the QR Decomposition

There are three main ways to compute a QR decomposition.

  • The method discussed in this section uses Gram-Schmidt orthogonalization. This method is not recommended for numerical use as it is inherently numerically unstable. Under certain circumstances, cancellation can destroy the accuracy of the computed vectors and the resulting vectors may not be orthogonal.

  • The method that we will discuss in this project uses Householder transformations (developed by Alston S. Householder). In Gaussian elimination, we can zero out entries below the diagonal by multiplying by elementary matrices to perform row operations. Householder transformations do something similar, allowing us to replace columns with columns of mostly zeros, which then allows for more efficient computations.

  • Givens (or Jacobi) rotations (used by W. Givens and originally invented by Jacobi for use in solving the symmetric eigenvalue problem) are another method that allows us to selectively introduce zeros into a matrix. We will focus on Householder transformations in this project.

The first step in a QR decomposition using Householder matrices is to create a matrix that is upper triangular. To get to this form, we use a Householder transformation. A Householder transformation (or matrix) is a matrix of the form

\begin{equation*} H = I - 2 \frac{\vv \vv^{\tr}}{||\vv||^2}\text{,} \end{equation*}

where \(\vv\) is a vector in \(\R^n\) and \(I\) is the \(n \times n\) identity matrix.
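The following sketch (assuming NumPy; the vector \(\vx\) is a made-up example) builds a Householder matrix from a given vector and checks numerically the properties explored in Project Activity 25.9: with \(\vv = \vx + \sigma \ve_1\) and \(\sigma = ||\vx||\text{,}\) the matrix sends \(\vx\) to \(-\sigma \ve_1\text{,}\) and it is symmetric and orthogonal.

```python
import numpy as np

def householder(v):
    """Return the Householder matrix H = I - 2 v v^T / ||v||^2."""
    v = v.astype(float).reshape(-1, 1)
    return np.eye(len(v)) - 2.0 * (v @ v.T) / (v.T @ v)

# Hypothetical vector x; sigma = ||x|| and v = x + sigma * e1.
x = np.array([3.0, 4.0, 0.0])
sigma = np.linalg.norm(x)
e1 = np.array([1.0, 0.0, 0.0])
H = householder(x + sigma * e1)

print(H @ x)       # approximately [-5, 0, 0], that is, -sigma * e1
print(H @ H.T)     # approximately the identity: H is symmetric and orthogonal
```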

Project Activity 25.9.

We will discover some properties of Householder transformations in this activity.

(a)

Let \(\vx\) be any vector and let \(\vy\) be a unit vector. In this part of the exercise we show that, with a suitable choice of \(\vv\text{,}\) the Householder transformation \(H\) transforms \(\vx\) into a vector of the same length parallel to \(\vy\text{.}\) That is,

\begin{equation*} H \vx = -\sigma \vy \end{equation*}

for some scalar \(\sigma\text{.}\) A similar argument shows that the Householder transformation built from \(\vv = \vx - \sigma \vy\) sends \(\vx\) to \(\sigma \vy\text{.}\) Letting \(\vy = \ve_1\) gives us the following result.

Hint.

Let \(\sigma = ||\vx||\text{,}\) and let \(\vv = \vx + \sigma \vy\text{.}\) Apply \(H\) to \(\vx\text{,}\) factor out \(\vx + \sigma \vy\text{,}\) and ultimately show that \(H \vx = -\sigma \vy\text{.}\)

(b)

The Householder matrix \(H\) has two other important properties. Show that \(H\) is symmetric and orthogonal.

There are many advantages to using Householder matrices. One is that instead of having to store all entries of the matrix \(H = I - 2 \frac{\vv \vv^{\tr}}{||\vv||^2}\text{,}\) all we need to store is the entries of \(\vv\text{.}\) Another advantage of Householder transformations is that they can be used to efficiently calculate QR decompositions. The next project activity shows how to use Lemma 25.9 and Householder transformations to compute a QR decomposition through an example.

Project Activity 25.10.

Assume that we have five receiving antennas and three transmitting antennas, and that

\begin{equation*} A = \left[ \begin{array}{crc} 1.0\amp 0.0\amp 1.0 \\ 7.0\amp 7.0\amp 8.0 \\1.0\amp 2.0\amp 1.0 \\ 7.0\amp 7.0\amp 6.0 \\ 0.0\amp -1.0\amp 0.0 \end{array} \right]\text{.} \end{equation*}

(We use \(A\) now instead of \(H\) so as to avoid confusion with Householder matrices.) The calculations in this activity can get rather messy, so use appropriate technology and round entries to four decimal places to the right of the decimal.

(a)

Let \(\vx_1\) be the first column of \(A\text{.}\) Identify an appropriate Householder transformation \(H_1\) so that \(H_1 \vx_1\) is a constant multiple of \([1 \ 0 \ 0 \ 0 \ 0]^{\tr}\text{.}\) Determine the matrix \(A_1\text{,}\) where \(A_1 = H_1 A\text{.}\) (Special note: when deciding which of \(\vx \pm \sigma \ve_1\) to use to create \(H\text{,}\) it is best to use the one whose sign is the same as that of \(x_1\) (the first entry of \(\vx\)). We won't go into the details, but this helps prevent problems due to cancellation.)

(b)

Next we consider just the bottom right \(4 \times 2\) portion \(\hat{A}_2\) of the matrix \(A_1\) found in part (a).

(i)

Repeat the process on \(\hat{A}_2\) to find a Householder matrix \(\hat{H}_2\) that will make the first column of \(\hat{A}_2\) have all zeros below its first entry.

(ii)

Let

\begin{equation*} H_2 = \left[ \begin{array}{cc} I_1\amp 0 \\ 0\amp \hat{H}_2 \end{array} \right]\text{.} \end{equation*}

Explain what the matrix \(A_2 = H_2H_1A\) is.

(c)

As a final step, we consider just the bottom right \(3 \times 1\) portion \(\hat{A}_3\) of the matrix \(A_2\) and repeat the process.

(i)

Find a matrix \(\hat{H}_3\) that produces zeros below the first entry.

(ii)

Let

\begin{equation*} H_3 = \left[ \begin{array}{cc}I_2\amp 0 \\ 0\amp \hat{H}_3 \end{array} \right]\text{.} \end{equation*}

What matrix is \(H_3H_2H_1A\text{?}\) Why?

(d)

Explain why \(Q = H_1H_2H_3\) is an orthogonal matrix. Then find an upper triangular matrix \(R\) such that \(A = QR\text{.}\)

Project Activity 25.10 illustrates the general method for finding a QR decomposition of a matrix \(A\) by using Householder transformations to reduce \(A\) to an upper triangular matrix \(R\text{.}\) The product of the Householder matrices is then the orthogonal matrix \(Q\text{.}\) One final note: You may have noticed that the matrix \(Q\) you found in this project was a square \(5 \times 5\) matrix and \(R\) was a \(5 \times 3\) matrix, and so they have different sizes than the QR decomposition in this section. We could use the Gram-Schmidt process to obtain these larger matrices by extending the columns of a \(5 \times 3\) matrix \(Q\) to form an orthonormal basis for \(\R^5\text{.}\) However, if we want a QR decomposition for \(A\) in which \(Q\) is \(5 \times 3\) and \(R\) is \(3 \times 3\) from the work we did in Project Activity 25.10, we can simply delete the last two columns of the matrix \(Q\) and the last two rows of the matrix \(R\) we calculated. This smaller QR decomposition is often referred to as a thin QR decomposition.
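For completeness, here is one possible NumPy implementation of the Householder-based QR computation outlined in Project Activity 25.10 (a sketch under the sign convention described above, not the text's prescribed code); the final two lines extract the thin QR factorization just described.

```python
import numpy as np

def householder_qr(A):
    """Householder QR: reduce A to upper triangular R with reflections H_k,
    so that A = Q R with Q = H_1 H_2 ... H_n (full QR)."""
    m, n = A.shape
    R = A.astype(float)
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        sigma = np.linalg.norm(x)
        if sigma == 0:
            continue
        v = x.copy()
        v[0] += np.copysign(sigma, x[0])   # sign choice that avoids cancellation
        Hk = np.eye(m)
        Hk[k:, k:] -= 2.0 * np.outer(v, v) / (v @ v)
        R = Hk @ R
        Q = Q @ Hk                         # each H_k is symmetric and orthogonal
    return Q, R

A = np.array([[1.0, 0.0, 1.0],
              [7.0, 7.0, 8.0],
              [1.0, 2.0, 1.0],
              [7.0, 7.0, 6.0],
              [0.0, -1.0, 0.0]])           # matrix from Project Activity 25.10
Q, R = householder_qr(A)
Q_thin, R_thin = Q[:, :3], R[:3, :]        # thin QR: 5 x 3 times 3 x 3
print(np.round(Q_thin @ R_thin - A, 10))   # essentially the zero matrix
```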

Jack Dongarra and Francis Sullivan. Introduction to the top 10 algorithms. Computing in Science and Engineering, 2:22-23, 2000.
Beresford N. Parlett. The QR algorithm. Computing in Science and Engineering, 2:38-42, 2000.