Section 24 Orthogonal and Orthonormal Bases in \(\R^n\)

Subsection Application: Rotations in 3D

An aircraft in flight, like a plane or the space shuttle, can perform three independent rotations: roll, pitch, and yaw. Roll is rotation about the axis through the nose and tail of the aircraft, pitch is rotation that moves the nose of the aircraft up or down about the axis from wingtip to wingtip, and yaw is rotation that turns the nose of the aircraft left or right about the axis through the plane from top to bottom. These rotations take place in \(3\)-space, and the axes of the rotations change as the aircraft travels through space. To understand how aircraft maneuver, it is important to know about general rotations in space. These are more complicated than rotations in \(2\)-space and, as we will see later in this section, involve orthogonal sets.

Subsection Introduction

If \(\CB = \{\vv_1, \vv_2, \ldots, \vv_m\}\) is a basis for a subspace \(W\) of \(\R^n\text{,}\) we know that any vector in \(W\) can be written uniquely as a linear combination of the vectors in \(\CB\text{.}\) In the past, the way we have found the coordinates of a vector \(\vx\) in \(W\) with respect to \(\CB\text{,}\) i.e. the weights needed to write \(\vx\) as a linear combination of the elements in \(\CB\text{,}\) has been to row reduce the matrix \([\vv_1 \ \vv_2 \ \cdots \ \vv_m \ | \ \vx]\) to solve the corresponding system. This can be a cumbersome process, especially if we need to do it many times. This process also forces us to determine all of the weights at once. For certain types of bases, namely the orthogonal and orthonormal bases, there is a much easier way to find individual weights for this linear combination.

Recall that two nonzero vectors \(\vu\) and \(\vv\) in \(\R^n\) are orthogonal if \(\vu \cdot \vv = 0\text{.}\) We can extend this idea to an entire set. For example, the standard basis \(\CS = \{\ve_1, \ve_2, \ve_3\}\) for \(\R^3\) has the property that any two distinct vectors in \(\CS\) are orthogonal to each other. The basis vectors in \(\CS\) make a very nice coordinate system for \(\R^3\text{,}\) where the basis vectors provide the directions for the coordinate axes. We could rotate this standard basis, or multiply any of the vectors in the basis by a nonzero constant, and retain a basis in which all distinct vectors are orthogonal to each other (e.g., \(\{[2 \ 0 \ 0]^{\tr}, [0 \ 3 \ 0]^{\tr}, [0 \ 0 \ 1]^{\tr}\}\)). We define this idea of having all vectors be orthogonal to each other for sets, and then for bases.

Definition 24.1.

A non-empty subset \(S\) of \(\R^n\) is orthogonal if \(\vu \cdot \vv = 0\) for every pair of distinct vectors \(\vu\) and \(\vv\) in \(S\text{.}\)

Preview Activity 24.1.

(a)

Determine if the set \(S = \{[1 \ 2 \ 1]^{\tr}, [2 \ -1 \ 0]^{\tr}\}\) is an orthogonal set.

(b)

Orthogonal bases are especially important.

Definition 24.2.

An orthogonal basis \(\CB\) for a subspace \(W\) of \(\R^n\) is a basis of \(W\) that is also an orthogonal set.

Let \(\CB = \{\vv_1, \vv_2, \vv_3\}\text{,}\) where \(\vv_1 = [1 \ 2 \ 1]^{\tr}\text{,}\) \(\vv_2 = [2 \ -1 \ 0]^{\tr}\text{,}\) and \(\vv_3 = [1 \ 2 \ -5]^{\tr}\text{.}\)

(i)

Explain why \(\CB\) is an orthogonal basis for \(\R^3\text{.}\)

(ii)

Suppose \(\vx\) has coordinates \(x_1, x_2, x_3\) with respect to the basis \(\CB\text{,}\) i.e.

\begin{equation*} \vx = x_1 \vv_1 + x_2 \vv_2 + x_3 \vv_3 \,\text{.} \end{equation*}

Substitute this expression for \(\vx\) in \(\vx \cdot \vv_1\) and use the orthogonality property of the basis \(\CB\) to show that \(x_1 = \frac{\vx \cdot \vv_1}{\vv_1 \cdot \vv_1}\text{.}\) Then determine \(x_2\) and \(x_3\) similarly. Finally, calculate the values of \(x_1\text{,}\) \(x_2\text{,}\) and \(x_3\) if \(\vx = [1 \ 1 \ 1]^{\tr}\text{.}\)

(iii)

Find the components of \(\vx = [1 \ 1 \ 1]^{\tr}\) with respect to \(\CB\) by row reducing the augmented matrix \([\vv_1 \ \vv_2 \ \vv_3 \ | \ \vx]\text{.}\) Does this result agree with your work from the previous part?

Subsection Orthogonal Sets

We defined orthogonal sets in \(\R^n\) and orthogonal bases of subspaces of \(\R^n\) in Definitions 24.1 and 24.2. We saw that the standard basis in \(\R^3\) is an orthogonal set and an orthogonal basis of \(\R^3\text{;}\) there are many other examples as well.

Activity 24.2.

Let \(\vw_1 = \left[ \begin{array}{r} -2 \\ 1 \\ -1 \end{array} \right]\text{,}\) \(\vw_2 = \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right]\text{,}\) and \(\vw_3 = \left[ \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \right]\text{.}\) In the same manner as in Preview Activity 24.1, we can show that the set \(S_1 = \left\{ \vw_1, \vw_2, \vw_3 \right\}\) is an orthogonal subset of \(\R^3\text{.}\)

(a)

Is the set \(S_2 = \left\{ \left[ \begin{array}{r} -2 \\ 1 \\ -1 \end{array} \right], \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right], \left[ \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \right], \left[ \begin{array}{r} 1 \\ 2 \\ 0 \end{array} \right] \right\}\) an orthogonal subset of \(\R^3\text{?}\)

(b)

Suppose \(\vv\) is a vector such that \(S_1 \cup \{\vv\}\) is an orthogonal subset of \(\R^3\text{.}\) Then \(\vw_i \cdot \vv = 0\) for each \(i\text{.}\) Explain why this implies that \(\vv\) is in \(\Nul A\text{,}\) where \(A = \left[ \begin{array}{rcr} -2\amp 1\amp -1 \\ 0\amp 1\amp 1 \\ 1\amp 1\amp -1 \end{array} \right]\text{.}\)

(c)

Assuming that the reduced row echelon form of the matrix \(A\) is \(I_3\text{,}\) explain why it is not possible to find a nonzero vector \(\vv\) so that \(S_1 \cup \{\vv\}\) is an orthogonal subset of \(\R^3\text{.}\)

The example from Activity 24.2 suggests that we can have three orthogonal nonzero vectors in \(\R^3\text{,}\) but no more. Orthogonal vectors are, in a sense, as far apart as they can be. So we might expect that there is no linear relationship between orthogonal vectors. The following theorem makes this clear.

Theorem 24.3.

Let \(\{\vv_1, \vv_2, \ldots, \vv_m\}\) be a set of nonzero vectors in \(\R^n\text{.}\) If the vectors are orthogonal to each other, then the set \(\{\vv_1, \vv_2, \ldots, \vv_m\}\) is linearly independent.

Proof.

Let \(S = \{\vv_1, \vv_2, \ldots, \vv_m\}\) be a set of nonzero orthogonal vectors in \(\R^n\text{.}\) To show that \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_m\) are linearly independent, assume that

\begin{equation} x_1\vv_1 + x_2\vv_2 + \cdots + x_m \vv_m = \vzero\tag{24.1} \end{equation}

for some scalars \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_m\text{.}\) We will show that \(x_i = 0\) for each \(i\) from 1 to \(m\text{.}\) Since the vectors in \(S\) are orthogonal to each other, we know that \(\vv_i \cdot \vv_j = 0\) whenever \(i \neq j\text{.}\) Fix an index \(k\) between 1 and \(m\text{.}\) We evaluate the dot product of both sides of (24.1) with \(\vv_k\) and simplify using the dot product properties:

\begin{align} \vv_k \cdot (x_1\vv_1 + x_2\vv_2 + \cdots + x_m \vv_m) \amp = \vv_k \cdot \vzero\notag\\ (\vv_k \cdot x_1\vv_1) + (\vv_k \cdot x_2\vv_2) + \cdots + (\vv_k \cdot x_m \vv_m) \amp = 0\notag\\ x_1(\vv_k \cdot \vv_1) + x_2(\vv_k \cdot \vv_2) + \cdots + x_m(\vv_k \cdot \vv_m) \amp = 0\text{.}\tag{24.2} \end{align}

Now all of the dot products on the left are 0 except for \(\vv_k \cdot \vv_k\text{,}\) so (24.2) becomes

\begin{equation*} x_k (\vv_k \cdot \vv_k) = 0\text{.} \end{equation*}

We assumed that \(\vv_k \neq \vzero\) and since \(\vv_k \cdot \vv_k = || \vv_k ||^2 \neq 0\text{,}\) we conclude that \(x_k = 0\text{.}\) We chose \(k\) arbitrarily, so we have shown that \(x_k =0\) for each \(k\) between 1 and \(m\text{.}\) Therefore, the only solution to equation (24.1) is the trivial solution with \(x_1 = x_2 = \cdots = x_m = 0\) and the set \(S\) is linearly independent.

Subsection Properties of Orthogonal Bases

Orthogonality is a useful and important property for a basis to have. In Preview Activity 24.1 we saw that if a vector \(\vx\) in the span of an orthogonal basis \(\{\vv_1, \vv_2, \vv_3\}\) could be written as a linear combination of the basis vectors as \(\vx = x_1 \vv_1 + x_2 \vv_2 + x_3 \vv_3\text{,}\) then \(x_1 = \frac{\vx \cdot \vv_1}{\vv_1 \cdot \vv_1}\text{.}\) If we continued that same argument we could show that

\begin{equation*} \vx = \left(\frac{\vx \cdot \vv_1}{\vv_1 \cdot \vv_1}\right) \vv_1 + \left(\frac{\vx \cdot \vv_2}{\vv_2 \cdot \vv_2}\right) \vv_2 + \left(\frac{\vx \cdot \vv_3}{\vv_3 \cdot \vv_3}\right) \vv_3\text{.} \end{equation*}

We can apply this idea in general to see how the orthogonality of an orthogonal basis allows us to quickly and easily determine the weights to write a given vector as a linear combination of orthogonal basis vectors. To see why, let \(\CB = \{\vv_1, \vv_2, \ldots, \vv_m\}\) be an orthogonal basis for a subspace \(W\) of \(\R^n\) and let \(\vx\) be any vector in \(W\text{.}\) We know that

\begin{equation*} \vx = x_1\vv_1 + x_2\vv_2 + \cdots + x_m \vv_m \end{equation*}

for some scalars \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_m\text{.}\) Let \(1\leq k\leq m\text{.}\) Then, using orthogonality of vectors \(\vv_1, \vv_2, \ldots, \vv_m\text{,}\) we have

\begin{equation*} \vv_k \cdot \vx = x_1(\vv_k \cdot \vv_1) + x_2(\vv_k \cdot \vv_2) + \cdots + x_m(\vv_k \cdot \vv_m) = x_k \vv_k \cdot \vv_k\text{.} \end{equation*}

So

\begin{equation*} x_k = \ds \frac{\vx \cdot \vv_k}{\vv_k \cdot \vv_k}\text{.} \end{equation*}

Thus, we can calculate each weight individually with two simple dot products. We summarize this discussion in the next theorem.

Theorem 24.4.

Let \(\CB = \{\vv_1, \vv_2, \ldots, \vv_m\}\) be an orthogonal basis for a subspace \(W\) of \(\R^n\text{.}\) If \(\vx\) is any vector in \(W\text{,}\) then

\begin{equation} \vx = \frac{\vx \cdot \vv_1}{\vv_1 \cdot \vv_1} \vv_1 + \frac{\vx \cdot \vv_2}{\vv_2 \cdot \vv_2} \vv_2 + \cdots + \frac{\vx \cdot \vv_m}{\vv_m \cdot \vv_m} \vv_m\text{.}\tag{24.3} \end{equation}
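For instance, the following short NumPy computation (a sketch, not part of the text) finds the weights for the orthogonal basis of \(\R^3\) from Preview Activity 24.1 and checks that they reconstruct \(\vx\text{:}\)

import numpy as np

# Orthogonal basis for R^3 from Preview Activity 24.1 and a sample vector x.
v1 = np.array([1.0, 2.0, 1.0])
v2 = np.array([2.0, -1.0, 0.0])
v3 = np.array([1.0, 2.0, -5.0])
x = np.array([1.0, 1.0, 1.0])

# Each weight takes only two dot products: x_k = (x . v_k) / (v_k . v_k).
weights = [np.dot(x, v) / np.dot(v, v) for v in (v1, v2, v3)]
reconstructed = sum(w * v for w, v in zip(weights, (v1, v2, v3)))

print(weights)                         # the coordinates of x with respect to the basis
print(np.allclose(reconstructed, x))   # True: the weights recover x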

Activity 24.3.

Let \(\vv_1 = [1 \ 0 \ 1]^{\tr}\text{,}\) \(\vv_2 = [0 \ 1 \ 0]^{\tr}\text{,}\) and \(\vv_3 = [0 \ 0 \ 1]^{\tr}\text{.}\) The set \(\CB = \{\vv_1, \vv_2, \vv_3\}\) is a basis for \(\R^3\text{.}\) Let \(\vx = [1 \ 0 \ 0]^{\tr}\text{.}\) Calculate

\begin{equation*} \frac{\vx \cdot \vv_1}{\vv_1 \cdot \vv_1}\vv_1 + \frac{\vx \cdot \vv_2}{\vv_2 \cdot \vv_2}\vv_2 + \frac{\vx \cdot \vv_3}{\vv_3 \cdot \vv_3}\vv_3\text{.} \end{equation*}

Compare to \(\vx\text{.}\) Does this violate Theorem 24.4? Explain.

Subsection Orthonormal Bases

The decomposition (24.3) is even simpler if \(\vv_k \cdot \vv_k = 1\) for each \(k\text{,}\) that is, if \(\vv_k\) is a unit vector for each \(k\text{.}\) In this case, the denominators are all 1 and we don't even need to consider them. We have a familiar example of such a basis for \(\R^n\text{,}\) namely the standard basis \(\CS = \{\ve_1, \ve_2, \ldots, \ve_n\}\text{.}\)

Recall that

\begin{equation*} \vv \cdot \vv = || \vv ||^2\text{,} \end{equation*}

so the condition \(\vv \cdot \vv = 1\) implies that the vector \(\vv\) has norm 1. An orthogonal basis with this additional condition is a very nice basis and is given a special name.

Definition 24.5.

An orthonormal basis \(\CB = \{\vu_1, \vu_2, \ldots, \vu_m\}\) for a subspace \(W\) of \(\R^n\) is an orthogonal basis such that \(|| \vu_k || = 1\) for \(1\leq k\leq m\text{.}\)

In other words, an orthonormal basis is an orthogonal basis in which every basis vector is a unit vector. A good question to ask here is how we can construct an orthonormal basis from an orthogonal basis.

Activity 24.4.

(a)

Let \(\vv_1\) and \(\vv_2\) be orthogonal vectors. Explain how we can obtain unit vectors \(\vu_1\) in the direction of \(\vv_1\) and \(\vu_2\) in the direction of \(\vv_2\text{.}\)

(b)

Show that \(\vu_1\) and \(\vu_2\) from the previous part are orthogonal vectors.

(c)

Use the ideas from this problem to construct an orthonormal basis for \(\R^3\) from the orthogonal basis \(S = \left\{ \left[ \begin{array}{r} -2 \\ 1 \\ -1 \end{array} \right], \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right], \left[ \begin{array}{r} 1 \\ 1 \\ -1 \end{array} \right] \right\}\text{.}\)

In general, we can construct an orthonormal basis \(\{\vu_1, \vu_2, \ldots, \vu_m\}\) from an orthogonal basis \(\CB = \{\vv_1, \vv_2, \ldots, \vv_m\}\) by normalizing each vector in \(\CB\) (that is, dividing each vector by its norm).
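As a small computational sketch (not part of the text), the following NumPy snippet normalizes the orthogonal basis from Activity 24.4 and checks that the resulting vectors are unit vectors that are still orthogonal:

import numpy as np

# Normalize each vector in the orthogonal basis from Activity 24.4(c).
vectors = [np.array([-2.0, 1.0, -1.0]),
           np.array([0.0, 1.0, 1.0]),
           np.array([1.0, 1.0, -1.0])]
unit_vectors = [v / np.linalg.norm(v) for v in vectors]

print([float(np.linalg.norm(u)) for u in unit_vectors])   # each norm is 1
print(float(np.dot(unit_vectors[0], unit_vectors[1])))    # still 0: orthogonality is preserved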

Subsection Orthogonal Matrices

We have seen in the diagonalization process that we diagonalize a matrix \(A\) with a matrix \(P\) whose columns are linearly independent eigenvectors of \(A\text{.}\) In general, calculating the inverse of the matrix whose columns are eigenvectors of \(A\) in the diagonalization process can be time consuming, but if the columns form an orthonormal set, then the calculation is very straightforward.

Activity 24.5.

Let \(\vu_1 = \frac{1}{3}[2 \ 1 \ 2]^{\tr}\text{,}\) \(\vu_2 = \frac{1}{3}[-2 \ 2 \ 1]^{\tr}\text{,}\) and \(\vu_3 = \frac{1}{3}[1 \ 2 \ -2]^{\tr}\text{.}\) It is not difficult to see that the set \(\{\vu_1, \vu_2, \vu_3\}\) is an orthonormal basis for \(\R^3\text{.}\) Let

\begin{equation*} A = [\vu_1 \ \vu_2 \ \vu_3] = \frac{1}{3} \left[ \begin{array}{crr} 2\amp -2\amp 1\\1\amp 2\amp 2\\2\amp 1\amp -2 \end{array} \right]\text{.} \end{equation*}
(a)

Use the definition of the matrix-matrix product to find the entries of the second row of the matrix product \(A^{\tr}A\text{.}\) Why should you have expected the result?

Hint.

How are the rows of \(A^{\tr}\) related to the columns of \(A\text{?}\)

(b)

With the result of part (a) in mind, what is the matrix product \(A^{\tr}A\text{?}\) What does this tell us about the relationship between \(A^{\tr}\) and \(A^{-1}\text{?}\) Use technology to calculate \(A^{-1}\) and confirm your answer.

(c)

Suppose \(P\) is an \(n \times n\) matrix whose columns form an orthonormal basis for \(\R^n\text{.}\) Explain why \(P^{\tr}P = I_n\text{.}\)

The result of Activity 24.5 is that if the columns of a square matrix \(P\) form an orthonormal set, then \(P^{-1} = P^{\tr}\text{.}\) This makes calculating \(P^{-1}\) very easy. Note, however, that this only works if the columns of \(P\) form an orthonormal basis for \(\Col P\text{.}\) You should also note that if \(P\) is an \(n \times n\) matrix satisfying \(P^{\tr}P = I_n\text{,}\) then the columns of \(P\) must form an orthonormal set. Matrices like this appear quite often and are given a special name.
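The following short computation (a NumPy sketch, using the matrix \(A\) from Activity 24.5) illustrates these facts numerically:

import numpy as np

# The matrix from Activity 24.5, whose columns form an orthonormal basis of R^3.
A = (1.0 / 3.0) * np.array([[2.0, -2.0,  1.0],
                            [1.0,  2.0,  2.0],
                            [2.0,  1.0, -2.0]])

print(np.allclose(A.T @ A, np.eye(3)))      # True: A^T A = I_3
print(np.allclose(np.linalg.inv(A), A.T))   # True: A^{-1} = A^T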

Definition 24.6.

An orthogonal matrix is an \(n \times n\) matrix \(P\) such that \(P^{\tr}P = I_n\text{.}\) (It isn't clear why such matrices are called orthogonal since the columns are actually orthonormal, but that is the standard terminology in mathematics.)

Activity 24.6.

As a special case, we apply the result of Activity 24.5 to a \(2 \times 2\) rotation matrix \(P = \left[ \begin{array}{cr} \cos(\theta) \amp -\sin(\theta) \\ \sin(\theta) \amp \cos(\theta) \end{array} \right]\text{.}\)

(a)

Show that the columns of \(P\) form an orthonormal set.

(b)

Use the fact that \(P^{-1} = P^{\tr}\) to find \(P^{-1}\text{.}\) Explain how this shows that the inverse of a rotation matrix by an angle \(\theta\) is just another rotation matrix but by the angle \(-\theta\text{.}\)

Orthogonal matrices are useful because they satisfy some special properties. For example, if \(P\) is an orthogonal \(n \times n\) matrix and \(\vx, \vy \in \R^n\text{,}\) then

\begin{equation*} (P\vx) \cdot (P\vy) = (P\vx)^{\tr}(P\vy) = \vx^{\tr}P^{\tr}P\vy = \vx^{\tr}\vy = \vx \cdot \vy\text{.} \end{equation*}

This property tells us that the matrix transformation \(T\) defined by \(T(\vx) = P\vx\) preserves dot products and, hence, orthogonality. In addition,

\begin{equation*} ||P\vx||^2 = P\vx \cdot P\vx = \vx \cdot \vx = ||\vx||^2\text{,} \end{equation*}

so \(||P\vx|| = ||\vx||\text{.}\) This means that \(T\) preserves length. Such a transformation is called an isometry and it is convenient to work with functions that don't expand or contract things. Moreover, if \(\vx\) and \(\vy\) are nonzero vectors, then

\begin{equation*} \frac{P\vx \cdot P\vy}{||P\vx|| \ ||P\vy||} = \frac{\vx \cdot \vy}{||\vx|| \ ||\vy||}\text{.} \end{equation*}

Thus \(T\) also preserves angles. Transformations defined by orthogonal matrices are very well behaved transformations. To summarize, we have the following theorem.

Theorem 24.7.

Let \(P\) be an \(n \times n\) orthogonal matrix and let \(\vx\) and \(\vy\) be vectors in \(\R^n\text{.}\) Then

  • \((P\vx) \cdot (P\vy) = \vx \cdot \vy\text{,}\)

  • \(||P\vx|| = ||\vx||\text{,}\) and

  • the angle between \(P\vx\) and \(P\vy\) is the same as the angle between \(\vx\) and \(\vy\text{.}\)
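As a quick numerical illustration of these properties (a sketch; the angle and vectors below are arbitrary choices, not from the text), consider a \(2 \times 2\) rotation matrix:

import numpy as np

# An orthogonal matrix (here a rotation) preserves dot products and lengths.
theta = 0.7
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, -2.0])
y = np.array([3.0, 0.5])

print(np.isclose((P @ x) @ (P @ y), x @ y))                  # dot product preserved
print(np.isclose(np.linalg.norm(P @ x), np.linalg.norm(x)))  # length preserved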

We have discussed orthogonal and orthonormal bases for subspaces of \(\R^n\) in this section. There are reasonable questions that follow, such as

  • Can we always find an orthogonal (or orthonormal) basis for any subspace of \(\R^n\text{?}\)

  • Given a vector \(\vv\) in \(W\text{,}\) can we find an orthogonal basis of \(W\) that contains \(\vv\text{?}\)

We will answer these questions in subsequent sections.

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 24.8.

Let \(S = \{\vv_1, \vv_2, \vv_3\}\text{,}\) where \(\vv_1 = [1 \ 1 \ -4]^{\tr}\text{,}\) \(\vv_2 = [2 \ 2 \ 1]^{\tr}\text{,}\) and \(\vv_3 = [1 \ -1 \ 0]^{\tr}\text{.}\)

(a)

Show that \(S\) is an orthogonal set.

Solution.

Using the dot product formula, we see that \(\vv_1 \cdot \vv_2 = 0\text{,}\) \(\vv_1 \cdot \vv_3 = 0\text{,}\) and \(\vv_2 \cdot \vv_3 = 0\text{.}\) Thus, the set \(S\) is an orthogonal set.

(b)

Create an orthonormal set \(S' = \{\vu_1, \vu_2, \vu_3\}\) from the vectors in \(S\text{.}\)

Solution.

To make an orthonormal set \(S' = \{\vu_1, \vu_2, \vu_3\}\) from \(S\text{,}\) we divide each vector in \(S\) by its magnitude. This gives us

\begin{equation*} \vu_1 = \frac{1}{\sqrt{18}}[1 \ 1 \ -4]^{\tr}, \ \vu_2 = \frac{1}{3}[2 \ 2 \ 1]^{\tr}, \ \text{ and } \ \vu_3 = \frac{1}{\sqrt{2}} [1 \ -1 \ 0]^{\tr}\text{.} \end{equation*}
(c)

Just by calculating dot products, write the vector \(\vw = [2 \ 1 \ -1]^{\tr}\) as a linear combination of the vectors in \(S'\text{.}\)

Solution.

Since \(S'\) is an orthonormal basis for \(\R^3\text{,}\) we know that

\begin{align*} \vw \amp = (\vw \cdot \vu_1) \vu_1 + (\vw \cdot \vu_2) \vu_2 + (\vw \cdot \vu_3) \vu_3\\ \amp = \frac{7}{\sqrt{18}}\vu_1 + \frac{5}{3}\vu_2 + \frac{1}{\sqrt{2}}\vu_3\\ \amp = \frac{7}{18}[1 \ 1 \ -4]^{\tr} + \frac{5}{9}[2 \ 2 \ 1]^{\tr} + \frac{1}{2}[1 \ -1 \ 0]^{\tr}\text{.} \end{align*}
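A quick numerical check of this expansion (a sketch, not part of the text):

import numpy as np

# Verify that the dot-product weights reproduce w for the orthonormal set S'.
w = np.array([2.0, 1.0, -1.0])
u1 = np.array([1.0, 1.0, -4.0]) / np.sqrt(18)
u2 = np.array([2.0, 2.0, 1.0]) / 3
u3 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)

expansion = sum(np.dot(w, u) * u for u in (u1, u2, u3))
print(np.allclose(expansion, w))   # True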

Example 24.9.

Let \(\vu_1 = \frac{1}{\sqrt{3}}[1 \ 1 \ 1]^{\tr}\text{,}\) \(\vu_2 = \frac{1}{\sqrt{2}}[1 \ -1 \ 0]^{\tr}\text{,}\) and \(\vu_3 = \frac{1}{\sqrt{6}}[1 \ 1 \ -2]^{\tr}\text{.}\) Let \(\CB = \{\vu_1, \vu_2, \vu_3\}\text{.}\)

(a)

Show that \(\CB\) is an orthonormal basis for \(\R^3\text{.}\)

Solution.

Using the dot product formula, we see that \(\vu_i \cdot \vu_j = 0\) if \(i \neq j\) and that \(\vu_i \cdot \vu_i = 1\) for each \(i\text{.}\) Since orthogonal vectors are linearly independent, the set \(\CB\) is a linearly independent set with \(3\) vectors in a \(3\)-dimensional space. It follows that \(\CB\) is an orthonormal basis for \(\R^3\text{.}\)

(b)

Let \(\vw = [1 \ 2 \ 1]^{\tr}\text{.}\) Find \([\vw]_{\CB}\text{.}\)

Solution.

Since \(\CB\) is an orthonormal basis for \(\R^3\text{,}\) we know that

\begin{equation*} \vw = (\vw \cdot \vu_1) \vu_1 + (\vw \cdot \vu_2) \vu_2 + (\vw \cdot \vu_3) \vu_3\text{.} \end{equation*}

Therefore,

\begin{equation*} [\vw]_{\CB} = [(\vw \cdot \vu_1) \ (\vw \cdot \vu_2) \ (\vw \cdot \vu_3)]^{\tr} = \left[ \frac{4}{\sqrt{3}} \ -\frac{1}{\sqrt{2}} \ \frac{1}{\sqrt{6}} \right]^{\tr}\text{.} \end{equation*}
(c)

Calculate \(||\vw||\) and \(\left|\left|[\vw]_{\CB}\right|\right|\text{.}\) What do you notice?

Solution.

Using the definition of the norm of a vector we have

\begin{align*} ||\vw|| \amp = \sqrt{1^2+2^2+1^2} = \sqrt{6}\\ \left|\left|[\vw]_{\CB}\right|\right| \amp = \sqrt{\left( \frac{4}{\sqrt{3}}\right)^2+\left(-\frac{1}{\sqrt{2}}\right)^2 + \left(\frac{1}{\sqrt{6}}\right)^2} = \sqrt{6}\text{.} \end{align*}

So in this case we have \(||\vw|| = \left|\left|[\vw]_{\CB}\right|\right|\text{.}\)

(d)

Show that the result of part (c) is true in general. That is, if \(\CS = \{\vv_1, \vv_2, \ldots, \vv_n\}\) is an orthonormal basis for \(\R^n\text{,}\) and if \(\vz = c_1\vv_1 + c_2 \vv_2 + \cdots + c_n \vv_n\text{,}\) then

\begin{equation*} ||\vz|| =\sqrt{c_1^2+c_2^2 + \cdots + c_n^2}\text{.} \end{equation*}
Solution.

Let \(\CS = \{\vv_1, \vv_2, \ldots, \vv_n\}\) be an orthonormal basis for \(\R^n\text{,}\) and suppose that \(\vz = c_1\vv_1 + c_2 \vv_2 + \cdots + c_n \vv_n\text{.}\) Then

\begin{align} || \vz || \amp = \sqrt{\vz \cdot \vz}\notag\\ \amp = \sqrt{ (c_1\vv_1 + c_2 \vv_2 + \cdots + c_n \vv_n) \cdot (c_1\vv_1 + c_2 \vv_2 + \cdots + c_n \vv_n)}\text{.}\tag{24.4} \end{align}

Since \(\CS\) is an orthonormal basis for \(\R^n\text{,}\) it follows that \(\vv_i \cdot \vv_j = 0\) if \(i \neq j\) and \(\vv_i \cdot \vv_i = 1\) for each \(i\text{.}\) Expanding the dot product in (24.4), the only terms that won't be zero are the ones that involve \(\vv_i \cdot \vv_i\text{.}\) This leaves us with

\begin{align*} || \vz || \amp = \sqrt{ (c_1\vv_1 + \cdots + c_n \vv_n) \cdot (c_1\vv_1 + \cdots + c_n \vv_n)}\\ \amp = \sqrt{c_1c_1 (\vv_1 \cdot \vv_1) + c_2c_2 (\vv_2 \cdot \vv_2) + \cdots + c_nc_n (\vv_n \cdot \vv_n)}\\ \amp = \sqrt{c_1^2+c_2^2 + \cdots + c_n^2}\text{.} \end{align*}

Subsection Summary

  • A subset \(S\) of \(\R^n\) is an orthogonal set if \(\vu \cdot \vv = 0\) for every pair of distinct vectors \(\vu\) and \(\vv\) in \(S\text{.}\)

  • Any orthogonal set of nonzero vectors is linearly independent.

  • A basis \(\CB\) for a subspace \(W\) of \(\R^n\) is an orthogonal basis if \(\CB\) is also an orthogonal set.

  • An orthogonal basis \(\CB\) for a subspace \(W\) of \(\R^n\) is an orthonormal basis if each vector in \(\CB\) has unit length.

  • If \(\CB = \{\vv_1, \vv_2, \ldots, \vv_m\}\) is an orthogonal basis for a subspace \(W\) of \(\R^n\) and \(\vx\) is any vector in \(W\text{,}\) then

    \begin{equation*} \vx = \sum_{i=1}^m c_i \vv_i \end{equation*}

    where \(c_i = \frac{\vx \cdot \vv_i}{\vv_i \cdot \vv_i}\text{.}\)

  • An \(n \times n\) matrix \(P\) is an orthogonal matrix if \(P^{\tr}P = I_n\text{.}\) Orthogonal matrices are important, in part, because the matrix transformations they define are isometries.

Exercises Exercises

1.

Find an orthogonal basis for the subspace \(W = \{ [x \ y \ z] : 4x-3z = 0\}\) of \(\R^3\text{.}\)

2.

Let \(\{\vv_1, \vv_2, \ldots, \vv_n\}\) be an orthogonal basis for \(\R^n\) and, for some \(k\) between 1 and \(n\text{,}\) let \(W = \Span\{\vv_1,\vv_2, \ldots, \vv_k\}\text{.}\) Show that \(\{\vv_{k+1}, \vv_{k+2}, \ldots, \vv_n\}\) is a basis for \(W^{\perp}\text{.}\)

3.

Let \(W\) be a subspace of \(\R^n\) for some \(n\text{,}\) and let \(\{\vw_1, \vw_2, \ldots, \vw_k\}\) be an orthogonal basis for \(W\text{.}\) Let \(\vx\) be a vector in \(\R^n\) and define \(\vw\) as

\begin{equation*} \vw = \frac{\vx \cdot \vw_1}{\vw_1 \cdot \vw_1} \vw_1 + \frac{\vx \cdot \vw_2}{\vw_2 \cdot \vw_2} \vw_2 + \cdots + \frac{\vx \cdot \vw_k}{\vw_k \cdot \vw_k} \vw_k\text{.} \end{equation*}
(a)

Explain why \(\vw\) is in \(W\text{.}\)

Hint.

Where are \(\vw_1\text{,}\) \(\vw_2\text{,}\) \(\ldots\text{,}\) \(\vw_k\text{?}\)

(b)

Let \(\vz = \vx - \vw\text{.}\) Show that \(\vz\) is in \(W^{\perp}\text{.}\)

Hint.

Take the dot product of \(\vz\) with \(\vw_i\text{.}\)

(c)

Explain why \(\vx\) can be written as a sum of vectors, one in \(W\) and one in \(W^{\perp}\text{.}\)

Hint.

Use parts (a) and (b).

(d)

Suppose \(\vx = \vw+\vw_1\) and \(\vx = \vu+\vu_1\text{,}\) where \(\vw\) and \(\vu\) are in \(W\) and \(\vw_1\) and \(\vu_1\) are in \(W^{\perp}\text{.}\) Show that \(\vw=\vu\) and \(\vw_1 = \vu_1\text{,}\) so that the representation of \(\vx\) as a sum of a vector in \(W\) and a vector in \(W^{\perp}\) is unique.

Hint.

Collect terms in \(W\) and in \(W^{\perp}\text{.}\)

4.

Use the result of Exercise 3 above and the fact that \(W\cap W^\perp=\{\vzero\}\) to show that \(\dim(W)+\dim(W^\perp)=n\) for a subspace \(W\) of \(\R^n\text{.}\) (See Exercise 13 in Section 12 for the definition of the sum of subspaces.)

5.

Let \(P\) be an \(n \times n\) matrix. We showed that if \(P\) is an orthogonal matrix, then \((P\vx) \cdot (P\vy) = \vx \cdot \vy\) for any vectors \(\vx\) and \(\vy\) in \(\R^n\text{.}\) Now we ask if the converse of this statement is true. That is, determine the validity of the following statement: if \((P\vx) \cdot (P\vy) = \vx \cdot \vy\) for any vectors \(\vx\) and \(\vy\) in \(\R^n\text{,}\) then \(P\) is an orthogonal matrix. Verify your answer.

Hint.

Consider \((P \ve_i) \cdot (P \ve_j)\) where \(\ve_t\) is the \(t\)th standard basis vector for \(\R^n\text{.}\)

6.

In this exercise we examine reflection matrices. In the following exercise we will show that the reflection and rotation matrices are the only \(2 \times 2\) orthogonal matrices. We will determine how to represent the reflection across a line through the origin in \(\R^2\) as a matrix transformation. The setup is as follows. Let \(L(\theta)\) be the line through the origin in \(\R^2\) that makes an angle \(\theta\) with the positive \(x\)-axis as illustrated in Figure 24.10.

Figure 24.10. Reflecting across a line \(L(\theta)\text{.}\)
(a)

Find a unit vector \(\vu\) in the direction of the line \(L(\theta)\text{.}\)

(b)

Let \(\vv = \left[ \begin{array}{c} a\\b \end{array} \right]\) be an arbitrary vector in \(\R^2\) as represented in Figure 24.10. Determine the components of the vectors \(\proj_{\vu} \vv\) and \(\proj_{\perp \vu} \vv\text{.}\) Reproduce Figure 24.10 and draw the vectors \(\proj_{\vu} \vv\) and \(\proj_{\perp \vu} \vv\) in your figure.

(c)

The vector labeled \(\vw\) is the reflection of the vector \(\vv\) across the line \(L(\theta)\text{.}\) Write \(\vw\) in terms of \(\vv\) and \(\proj_{\vu} \vv\text{.}\) Clearly explain your method.

(d)

Finally, show that the matrix \(A\) such that \(A \vv = \vw\) is given by

\begin{equation*} A = \left[ \begin{array}{cr} \cos(2\theta)\amp \sin(2\theta) \\ \sin(2\theta)\amp -\cos(2\theta) \end{array} \right]\text{.} \end{equation*}

The matrix \(A\) is the reflection matrix across the line \(L(\theta)\text{.}\) (You will want to look up some appropriate trigonometric identities.)

7.

In this exercise we will show that the only orthogonal \(2 \times 2\) matrices are the rotation matrices \(\left[ \begin{array}{cr} \cos(\theta)\amp -\sin(\theta) \\ \sin(\theta)\amp \cos(\theta) \end{array} \right]\) and the reflection matrices \(\left[ \begin{array}{cr} \cos(\theta)\amp \sin(\theta) \\ \sin(\theta) \amp -\cos(\theta) \end{array} \right]\) (see Exercise 6). Throughout this exercise let \(a\text{,}\) \(b\text{,}\) \(c\text{,}\) and \(d\) be real numbers such that \(M = \left[ \begin{array}{cc} a\amp b \\ c\amp d \end{array} \right]\) is an orthogonal \(2 \times 2\) matrix. Let \(\vv_1= \left[ \begin{array}{c} a \\ c \end{array} \right]\) and \(\vv_2= \left[ \begin{array}{c} b \\ d \end{array} \right]\) be the columns of \(M\text{.}\)

(a)

Explain why the terminal point of \(\vv_1\) in standard position lies on the unit circle. Then explain why there is an angle \(\theta\) such that \(a = \cos(\theta)\) and \(c = \sin(\theta)\text{.}\) What angle, specifically, is \(\theta\text{?}\) Draw a picture to illustrate.

Hint.

Think polar coordinates.

(b)

A similar argument to part (a) shows that there is an angle \(\alpha\) such that \(\vv_2 = \left[ \begin{array}{c} b\\d \end{array} \right] = \left[ \begin{array}{c} \cos(\alpha)\\\sin(\alpha) \end{array} \right]\text{.}\) Given that \(M\) is an orthogonal matrix, how must \(\alpha\) be related to \(\theta\text{?}\) Use this result to find the two possibilities for \(\vv_2\) as a vector in terms of \(\cos(\theta)\) and \(\sin(\theta)\text{.}\) (You will likely want to look up some trigonometric identities for this part of the problem.)

Hint.

What properties do the columns of an orthogonal matrix have?

(c)

By considering the two possibilities from part (b), show that \(M\) is either a rotation matrix or a reflection matrix. Conclude that the only \(2 \times 2\) orthogonal matrices are the reflection and rotation matrices.

8.

Suppose \(A, B\) are orthogonal matrices of the same size.

(a)

Show that \(AB\) is also an orthogonal matrix.

(b)

Show that \(A^\tr\) is also an orthogonal matrix.

(c)

Show that \(A^{-1}\) is also an orthogonal matrix.

9.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

Any orthogonal subset of \(\R^n\) is linearly independent.

(b) True/False.

Every set consisting of a single vector is an orthogonal set.

(c) True/False.

If \(S\) is an orthogonal set in \(\R^n\) with exactly \(n\) nonzero vectors, then \(S\) is a basis for \(\R^n\text{.}\)

(d) True/False.

Every set of three linearly independent vectors in \(\R^3\) is an orthogonal set.

(e) True/False.

If \(A\) and \(B\) are \(n \times n\) orthogonal matrices, then \(A+B\) must also be an orthogonal matrix.

(f) True/False.

If the set \(S=\{\vv_1, \vv_2, \ldots, \vv_n\}\) is an orthogonal set in \(\R^n\text{,}\) then so is the set \(\{c_1\vv_1, c_2\vv_2, \ldots, c_n\vv_n\}\) for any scalars \(c_1\text{,}\) \(c_2\text{,}\) \(\ldots\text{,}\) \(c_n\text{.}\)

(g) True/False.

If \(\CB=\{\vv_1, \vv_2, \ldots, \vv_n\}\) is an orthogonal basis of \(\R^n\text{,}\) then so is \(\{c_1\vv_1, c_2\vv_2, \ldots, c_n\vv_n\}\) for any nonzero scalars \(c_1\text{,}\) \(c_2\text{,}\) \(\ldots\text{,}\) \(c_n\text{.}\)

(h) True/False.

If \(A\) is an \(n\times n\) orthogonal matrix, the rows of \(A\) form an orthonormal basis of \(\R^n\text{.}\)

(i) True/False.

If \(A\) is an orthogonal matrix, any matrix obtained by interchanging columns of \(A\) is also an orthogonal matrix.

Subsection Project: Understanding Rotations in 3-Space

Recall that a counterclockwise rotation of \(2\)-space around the origin by an angle \(\theta\) is accomplished by left multiplication by the matrix \(\left[ \begin{array}{cr} \cos(\theta)\amp -\sin(\theta) \\ \sin(\theta)\amp \cos(\theta) \end{array} \right]\text{.}\) Notice that the columns of this rotation matrix are orthonormal, so this rotation matrix is an orthogonal matrix. As the next activity shows, rotation matrices in 3D are also orthogonal matrices.

Project Activity 24.7.

Let \(R\) be a rotation matrix in 3D. A rotation does not change lengths of vectors, nor does it change angles between vectors. Let \(\ve_1 = [ 1 \ 0 \ 0]^{\tr}\text{,}\) \(\ve_2 = [0 \ 1 \ 0]^{\tr}\text{,}\) and \(\ve_3 = [0 \ 0 \ 1]^{\tr}\) be the standard unit vectors in \(\R^3\text{.}\)

(a)

Explain why the columns of \(R\) form an orthonormal set.

Hint.

How are \(R \ve_1\text{,}\) \(R\ve_2\text{,}\) and \(R\ve_3\) related to the columns of \(R\text{?}\)

(b)

Explain why \(R\) is an orthogonal matrix. What must be true about \(\det(R)\text{?}\)

Hint.

What is \(R^{\tr}\) and what is \(\det(R^{\tr}R)\text{?}\)

By Project Activity 24.7 we know that the determinant of any rotation matrix is either \(1\) or \(-1\text{.}\) A rotation matrix with determinant \(1\) preserves orientation, and we identify these rotations as counterclockwise; we identify rotations with determinant \(-1\) as clockwise. We will set the convention that a rotation is always measured counterclockwise (as we did in \(\R^2\)), so every rotation matrix will have determinant \(1\text{.}\)

Returning to the counterclockwise rotation of \(2\)-space around the origin by an angle \(\theta\) determined by left multiplication by the matrix \(\left[ \begin{array}{cr} \cos(\theta)\amp -\sin(\theta) \\ \sin(\theta)\amp \cos(\theta) \end{array} \right]\text{,}\) we can think of this rotation in \(3\)-space as the rotation that keeps points in the \(xy\) plane in the \(xy\) plane, but rotates these points counterclockwise around the \(z\) axis. In other words, in the standard \(xyz\) coordinate system, with standard basis \(\ve_1\text{,}\) \(\ve_2\text{,}\) \(\ve_3\text{,}\) our rotation matrix \(R\) has the property that \(R \ve_3 = \ve_3\text{.}\) Now \(R \ve_3\) is the third column of \(R\text{,}\) so the third column of \(R\) is \(\ve_3\text{.}\) Similarly, \(R \ve_1\) is the first column of \(R\) and \(R \ve_2\) is the second column of \(R\text{.}\) Since \(R\) is a counterclockwise rotation of the \(xy\) plane around the origin by an angle \(\theta\text{,}\) it follows that this rotation is given by the matrix

\begin{equation} R_{\ve_3}(\theta) = \left[ \begin{array}{ccc} \cos(\theta)\amp -\sin(\theta)\amp 0 \\ \sin(\theta)\amp \cos(\theta)\amp 0 \\ 0\amp 0\amp 1 \end{array} \right]\text{.}\tag{24.5} \end{equation}

In the notation of (24.5), the subscript gives the direction of the line fixed by the rotation and the angle provides the counterclockwise rotation in the plane perpendicular to this vector. This vector is called a normal vector for the rotation. Note also that the columns of \(R_{\ve_3}(\theta)\) form an orthogonal set in which each column vector has norm \(1\text{.}\)
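As a quick numerical check (a sketch, with an arbitrarily chosen sample angle), we can confirm that \(R_{\ve_3}(\theta)\) is an orthogonal matrix with determinant \(1\) that fixes \(\ve_3\text{:}\)

import numpy as np

# The rotation matrix R_{e_3}(theta) from (24.5) for one sample angle.
theta = 0.4
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c,  -s,  0.0],
              [s,   c,  0.0],
              [0.0, 0.0, 1.0]])

print(np.allclose(R.T @ R, np.eye(3)))      # columns are orthonormal
print(np.isclose(np.linalg.det(R), 1.0))    # determinant 1: a counterclockwise rotation
print(R @ np.array([0.0, 0.0, 1.0]))        # e_3 is fixed: [0. 0. 1.]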

This idea describes a general rotation matrix \(R_{\vn}(\theta)\) in 3D by specifying a normal vector \(\vn\) and an angle \(\theta\text{.}\) For example, with roll, a normal vector points from the tail of the aircraft to its tip. It is our goal to understand how we can determine an arbitrary rotation matrix of the form \(R_{\vn}(\theta)\text{.}\) We can accomplish this by using the rotation around the \(z\) axis and change of basis matrices to find rotation matrices around other axes. Let \(\CS = \{\ve_1, \ve_2, \ve_3\}\) be the standard basis for \(\R^3\text{.}\)

Project Activity 24.8.

In this activity we see how to determine the rotation matrix around the \(x\) axis using the matrix \(R_{\ve_3}(\theta)\) and a change of basis.

(a)

Define a new ordered basis \(\CB\) so that our axis of rotation is the third vector. So in this case the third vector in \(\CB\) will be \(\ve_1\text{.}\) The other two vectors need to make \(\CB\) an orthonormal set. So we have plenty of choices. For example, we could set \(\CB = \{\ve_2, \ve_3, \ve_1\}\text{.}\) Find the change of basis matrix \(\underset{\CS \leftarrow \CB}{P}\) from \(\CB\) to \(\CS\text{.}\)

(b)

Use the change of basis matrix from part (a) to find the change of basis matrix \(\underset{\CB \leftarrow \CS}{P}\) from \(\CS\) to \(\CB\text{.}\)

(c)

To find our rotation matrix around the \(x\) axis, we can first change basis from \(\CS\) to \(\CB\text{,}\) then perform a rotation around the new \(z\) axis using (24.5), then change basis back from \(\CB\) to \(\CS\text{.}\) In other words,

\begin{equation*} R_{\ve_1}(\theta) = \underset{\CS \leftarrow \CB}{P} R_{\ve_3}(\theta) \underset{\CB \leftarrow \CS}{P}\text{.} \end{equation*}

Find the entries of this matrix \(R_{\ve_1}(\theta)\text{.}\)
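A numerical sketch of this computation for one sample angle follows (finding the symbolic entries is the point of the activity, so this only serves as a check):

import numpy as np

# Compute R_{e_1}(theta) by conjugating R_{e_3}(theta) with the change of basis matrix.
theta = 0.4
c, s = np.cos(theta), np.sin(theta)
R_e3 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

e1, e2, e3 = np.eye(3)
P = np.column_stack([e2, e3, e1])   # change of basis matrix from B = {e2, e3, e1} to S
R_e1 = P @ R_e3 @ P.T               # P.T = P^{-1} is the change of basis matrix from S to B

print(np.allclose(R_e1.T @ R_e1, np.eye(3)))   # still an orthogonal matrix
print(np.allclose(R_e1 @ e1, e1))              # the x axis is fixed by the rotation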

IMPORTANT NOTE.

We could have considered using \(\CB_1 = \{\ve_3, \ve_2, \ve_1\}\) in Project Activity 24.8 instead of \(\CB = \{\ve_2, \ve_3, \ve_1\}\text{.}\) Then we would have

\begin{equation*} \underset{\CS \leftarrow \CB_1}{P} = \left[ \begin{array}{ccc} 0\amp 0\amp 1 \\ 0\amp 1\amp 0 \\ 1\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

The difference between the two options is that \(\det\left(\underset{\CS \leftarrow \CB_1}{P}\right) = -1\) while \(\det\left(\underset{\CS \leftarrow \CB}{P} \right) = 1\text{.}\) Using \(\CB_1\) gives clockwise rotations while \(\CB\) gives counterclockwise rotations (this is the difference between a left-hand system and a right-hand system). So it is important to ensure that our change of basis matrix has determinant \(1\text{.}\)

We do one more example to illustrate the process before tackling the general case.

Project Activity 24.10.

In this activity we find the rotation around the axis given by the line \(x=y/2=z\text{.}\) This line is in the direction of the vector \(\vn = [1 \ 2 \ 1]^{\tr}\text{.}\) So we start by making a unit vector in the direction of \(\vn\) the third vector in an ordered basis \(\CB\text{.}\) The other two vectors need to make \(\CB\) an orthonormal set with \(\det\left(\underset{\CS \leftarrow \CB}{P} \right) = 1\text{.}\)

(a)

Find a unit vector \(\vw\) in the direction of \(\vn\text{.}\)

(b)

Show that \([2 \ -1 \ 0]^{\tr}\) is orthogonal to the vector \(\vw\) from part (a). Then find a unit vector \(\vv\) that is in the same direction as \([2 \ -1 \ 0]^{\tr}\text{.}\)

(c)

Let \(\vv\) be as in the previous part. Now the trick is to find a third unit vector \(\vu\) so that \(\CB = \{\vu, \vv, \vw\}\) is an orthonormal set. This can be done with the cross product. If \(\va = [a_1 \ a_2 \ a_3]^{\tr}\) and \(\vb = [b_1 \ b_2 \ b_3]^{\tr}\text{,}\) then the cross product \(\va \times \vb\) of \(\va\) and \(\vb\) is the vector

\begin{equation*} \va \times \vb = \left(a_2b_3-a_3b_2\right) \ve_1 - \left(a_1b_3-a_3b_1\right) \ve_2 + \left(a_1b_2-a_2b_1\right) \ve_3\text{.} \end{equation*}

(You can check that \(\{\va \times \vb, \va, \vb\}\) is an orthogonal set that gives the correct determinant for the change of basis matrix.) Use the cross product to find a unit vector \(\vu\) so that \(\CB = \{\vu, \vv, \vw\}\) is an orthonormal set.
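Below is a direct implementation of this cross product formula (a sketch; NumPy's built-in np.cross computes the same thing), applied to the unit vectors \(\vv\) and \(\vw\) from this activity:

import numpy as np

# The cross product formula above, written out directly.
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return np.array([a2 * b3 - a3 * b2,
                     -(a1 * b3 - a3 * b1),
                     a1 * b2 - a2 * b1])

w = np.array([1.0, 2.0, 1.0]) / np.sqrt(6)    # unit vector in the direction of n
v = np.array([2.0, -1.0, 0.0]) / np.sqrt(5)   # unit vector from part (b)
u = cross(v, w)                               # a candidate third vector

print(np.allclose([np.dot(u, v), np.dot(u, w)], 0))                # orthogonal to v and w
print(np.isclose(np.linalg.norm(u), 1.0))                          # already a unit vector
print(np.isclose(np.linalg.det(np.column_stack([u, v, w])), 1.0))  # determinant 1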

(d)

Find the entries of the matrix \(R_{\vw}(\theta)\text{.}\)

In the next activity we summarize the general process to find a 3D rotation matrix \(R_{\vn}(\theta)\) for any normal vector \(\vn\text{.}\) There is a GeoGebra applet at geogebra.org/m/n9gbjhfx that allows you to visualize rotation matrices in 3D.

Project Activity 24.11.

Let \(\vn = [n_1 \ n_2 \ n_3]^{\tr}\) be a normal vector (nonzero) for our rotation. We need to create an orthonormal basis \(\CB = \{\vu, \vv, \vw\}\) where \(\vw\) is a unit vector in the direction of \(\vn\) so that the change of basis matrix \(\underset{\CS \leftarrow \CB}{P}\) has determinant \(1\text{.}\)

(a)

Find, by inspection, a vector \(\vy\) that is orthogonal to \(\vn\text{.}\)

Hint.

You may need to consider some cases to ensure that \(\vy\) is not the zero vector.

(b)

Once we have a normal vector \(\vn\) and a vector \(\vy\) orthogonal to \(\vn\text{,}\) the vector \(\vz = \vy \times \vn\) gives us an orthogonal set \(\{\vz, \vy, \vn\}\text{.}\) We then normalize each vector to create our orthonormal basis \(\CB = \{\vu, \vv, \vw\}\text{.}\) Use this process to find the matrix that produces a \(45^{\circ}\) counterclockwise rotation around the normal vector \([1 \ 0 \ -1]^{\tr}\text{.}\)
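The following function is a sketch of this entire recipe (the helper name rotation_about and the particular choice of the vector \(\vy\) are ours, for illustration only); it is applied to the \(45^{\circ}\) rotation about \([1 \ 0 \ -1]^{\tr}\) described above:

import numpy as np

# Build R_n(theta): an orthonormal basis {u, v, w} with w along n and
# det([u v w]) = 1, then conjugate R_{e_3}(theta) by the change of basis matrix.
def rotation_about(n, theta):
    w = np.asarray(n, dtype=float)
    w = w / np.linalg.norm(w)
    # A vector orthogonal to n, found "by inspection" (the cases avoid the zero vector).
    if np.isclose(w[0], 0.0) and np.isclose(w[1], 0.0):
        y = np.array([1.0, 0.0, 0.0])
    else:
        y = np.array([-w[1], w[0], 0.0])
    z = np.cross(y, w)
    u = z / np.linalg.norm(z)
    v = y / np.linalg.norm(y)
    P = np.column_stack([u, v, w])      # change of basis matrix from B to S, determinant 1
    c, s = np.cos(theta), np.sin(theta)
    R_e3 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return P @ R_e3 @ P.T               # P.T = P^{-1} since P is orthogonal

n = np.array([1.0, 0.0, -1.0])
R = rotation_about(n, np.pi / 4)        # 45 degree counterclockwise rotation about n
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
print(np.allclose(R @ n, n))            # the normal vector is fixed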
