
Section 33 The Dimension of a Vector Space

Subsection Application: Principal Component Analysis

The discipline of statistics is based on the idea of analyzing data. In a large data set one usually wants to understand the relationships among the variables, which can be difficult when the set is too large to examine visually for patterns. Principal Component Analysis (PCA) is a tool for identifying and representing underlying patterns in large data sets, and it has been called one of the most important and valuable linear algebra tools for statistical analysis. PCA is used to transform a collection of variables into a (usually smaller) number of uncorrelated variables called principal components. The principal components form the most meaningful basis from which to view data, removing extraneous information and revealing underlying relationships in the data. PCA thus provides a framework for reducing a complex data set to a lower dimension while retaining the important attributes of the data set. The output helps the experimenter determine which dynamics in the data are important and which can be ignored.

Subsection Introduction

In Section 15 we learned that any two bases for a subspace of \(\R^n\) contain the same number of vectors. This allowed us to define the dimension of a subspace of \(\R^n\text{.}\) In this section we extend the arguments we made in Section 15 to arbitrary vector spaces and define the dimension of a vector space.

Preview Activity 33.1.

The main tool we used to prove that any two bases for a subspace of \(\R^n\) must contain the same number of elements was Theorem 15.1. In this preview activity we will show that the same argument can be used for vector spaces. More specifically, we will prove a special case of the theorem generalizing Theorem 15.1 (stated in this section as Theorem 33.1).

Suppose \(V\) is a vector space with basis \(\CB = \{\vv_1, \vv_2\}\text{.}\) Consider the set \(U=\{\vu_1, \vu_2, \vu_3\}\) of vectors in \(V\text{.}\) We will show that \(U\) is linearly dependent using an approach similar to that of Preview Activity 15.1.

(a)

What vector equation involving \(\vu_1, \vu_2, \vu_3\) do we need to solve to determine linear independence/dependence of these vectors? Use \(x_1, x_2, x_3\) for coefficients.

(b)

Since \(\CB\) is a basis of \(V\text{,}\) it spans \(V\text{.}\) Using this information, rewrite the vectors \(\vu_i\) in terms of \(\vv_j\) and substitute into the above equation to obtain another equation in terms of \(\vv_j\text{.}\)

(c)

Since \(\CB\) is a basis of \(V\text{,}\) the vectors \(\vv_1, \vv_2\) are linearly independent. Using the equation in the previous part, determine what this means about the coefficients \(x_1, x_2, x_3\text{.}\)

(d)

Express the conditions on \(x_1, x_2, x_3\) in the form of a matrix-vector equation. Explain why there are infinitely many solutions for the \(x_i\) and why this means the vectors \(\vu_1, \vu_2, \vu_3\) are linearly dependent.

Subsection Finite Dimensional Vector Spaces

Theorem 33.1 shows that if \(\CB_1\) and \(\CB_2\) are finite bases for a vector space \(V\) (and so each is a linearly independent set by definition), then neither can contain more elements than the other, so the two bases must contain the same number of elements.

Theorem 33.2 states that if a vector space \(V\) has a basis with a finite number of vectors, then the number of vectors in a basis for that vector space is a well-defined number. In other words, the number of vectors in a basis is an invariant of the vector space. This important number is given a name.

Definition 33.3.

A finite-dimensional vector space is a vector space that can be spanned by a finite number of vectors. The dimension of a non-trivial finite-dimensional vector space \(V\) is the number of vectors in a basis for \(V\text{.}\) The dimension of the trivial vector space is defined to be 0.

We denote the dimension of a finite dimensional vector space \(V\) by \(\dim(V)\text{.}\)

Not every vector space is finite dimensional. We have seen, for example, that the vector space \(\pol\) of all polynomials, regardless of degree, is not a finite-dimensional vector space. In fact, the polynomials

\begin{equation*} 1, t, t^2, \ldots, t^n, \ldots \end{equation*}

are linearly independent, so \(\pol\) has an infinite linearly independent set and therefore has no finite basis. A vector space that has no finite basis is called an infinite dimensional vector space.

Activity 33.2.

Since the columns of the \(n \times n\) identity matrix \(I_n\) span \(\R^n\) and are linearly independent, the columns of \(I_n\) form a basis for \(\R^n\) (the standard basis). Consequently, we have that \(\dim(\R^n) = n\text{.}\) In this activity we determine the dimensions of other familiar vector spaces. Find the dimension of each of the indicated vector spaces. Verify your answers.

(a)

\(\pol_1\)

(b)

\(\pol_2\)

(c)

\(\pol_n\)

(d)

\(\M_{2 \times 3}\)

(e)

\(\M_{3 \times 4}\)

(f)

\(\M_{k \times n}\)

Finding the dimension of a finite-dimensional vector space amounts to finding a basis for the space.

Activity 33.3.

Let \(W = \{(a+b) + (a-b)t + (2a+3b)t^2 \mid a, b \text{ are scalars } \}\text{.}\)

(a)

Find a finite set of polynomials in \(W\) that span \(W\text{.}\)

(b)

Determine if the spanning set from part (a) is linearly independent or dependent. Clearly explain your process.

(c)

What is \(\dim(W)\text{?}\) Explain.

Subsection The Dimension of a Subspace

Every subspace of a finite-dimensional vector space is itself a vector space, and since a subspace is contained in a larger vector space it is natural to expect that the dimension of a subspace should be less than or equal to the dimension of the larger vector space. We verify that fact in this section.

Activity 33.4.

Let \(V\) be a finite dimensional vector space of dimension \(n\) and let \(W\) be a subspace of \(V\text{.}\) Explain why \(W\) cannot have dimension larger than \(\dim(V)\text{,}\) and why, if \(W \neq V\text{,}\) then \(\dim(W) \lt \dim(V)\text{.}\)

Subsection Conditions for a Basis of a Vector Space

There are two items we need to confirm before we can state that a subset \(\CB\) of a subspace \(W\) of a vector space is a basis for \(W\text{:}\) the set \(\CB\) must be linearly independent and span \(W\text{.}\) We can reduce the amount of work it takes to show that a set is a basis if we know the dimension of the vector space in advance.

Activity 33.5.

Let \(W\) be a subspace of a vector space \(V\) with \(\dim(W) = k\text{.}\) We know that every basis of \(W\) contains exactly \(k\) vectors.

(a)

Suppose that \(S\) is a subset of \(W\) that contains \(k\) vectors and is linearly independent. In this part of the activity we will show that \(S\) must span \(W\text{.}\)

(i)

Suppose that \(S\) does not span \(W\text{.}\) Explain why this implies that \(W\) contains a set of \(k+1\) linearly independent vectors.

(ii)

Explain why the result of part (i) tells us that \(S\) is a basis for \(W\text{.}\)

(b)

Now suppose that \(S\) is a subset of \(W\) with \(k\) vectors that spans \(W\text{.}\) In this part of the activity we will show that \(S\) must be linearly independent.

(i)

Suppose that \(S\) is not linearly independent. Explain why we can then find a proper subset of \(S\) that is linearly independent but has the same span as \(S\text{.}\)

(ii)

Explain why the result of part (i) tells us that \(S\) is a basis for \(W\text{.}\)

The result of Activity 33.5 is summarized in the following theorem (compare to Theorem 15.4).

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 33.5.

Find a basis and dimension for each of the indicated subspaces of the given vector spaces.

(a)

\(\{a+b(t+t^2) : a, b \in \R\}\) in \(\pol_2\)

Solution.

Let \(W = \{a+b(t+t^2) : a, b \in \R\}\text{.}\) Every element in \(W\) has the form

\begin{equation*} a+b(t+t^2) = a(1) + b(t+t^2)\text{.} \end{equation*}

So \(W = \Span\{1, t+t^2\}\text{.}\) Since neither \(1\) nor \(t+t^2\) is a scalar multiple of the other, the set \(\{1, t+t^2\}\) is linearly independent. Thus, \(\{1, t+t^2\}\) is a basis for \(W\) and \(\dim(W) = 2\text{.}\)

(b)

\(\Span\left\{1, \frac{1}{1+x^2}, \frac{2+x^2}{1+x^2}, \arctan(x)\right\}\) in \(\F\)

Solution.

Let \(W = \Span\left\{1, \frac{1}{1+x^2}, \frac{2+x^2}{1+x^2}, \arctan(x)\right\}\text{.}\) To find a basis for \(W\text{,}\) we find a linearly independent subset of \(\left\{1, \frac{1}{1+x^2}, \frac{2+x^2}{1+x^2}, \arctan(x)\right\}\text{.}\) Consider the equation

\begin{equation*} c_1 (1) + c_2\left(\frac{1}{1+x^2}\right) + c_3\left(\frac{2+x^2}{1+x^2}\right) + c_4 \arctan(x) = 0\text{.} \end{equation*}

To find the weights \(c_i\) for which this equality of functions holds, we use the fact that the equality must hold for every \(x\text{.}\) So we pick four different values for \(x\) to obtain a linear system that we can solve for the weights. Evaluating both sides of the equation at \(x=0\text{,}\) \(x=1\text{,}\) \(x=-1\text{,}\) and \(x=2\) yields the equations

\begin{align*} {}c_1 \amp {}+{} \amp {}c_2 \amp {}+{} \amp {2}c_3 \amp {} \amp {} \amp = \ \amp 0\amp {}\\ {}c_1 \amp {}+{} \amp {\frac{1}{2}}c_2 \amp {}+{} \amp {\frac{3}{2}}c_3 \amp {}+{} \amp {\frac{\pi}{4}}c_4 \amp = \ \amp 0\amp {}\\ {}c_1 \amp {}+{} \amp {\frac{1}{2}}c_2 \amp {}+{} \amp {\frac{3}{2}}c_3 \amp {}-{} \amp {\frac{\pi}{4}}c_4 \amp = \ \amp 0\amp {}\\ {}c_1 \amp {}+{} \amp {\frac{1}{5}}c_2 \amp {}+{} \amp {\frac{6}{5}}c_3 \amp {}+{} \amp {\arctan(2)}c_4 \amp = \ \amp 0\amp {.} \end{align*}

The reduced row echelon form of the coefficient matrix

\begin{equation*} \left[ \begin{array}{cccr} 1\amp 1\amp 2\amp 0 \\ 1\amp \frac{1}{2}\amp \frac{3}{2}\amp \frac{\pi}{4} \\ 1\amp \frac{1}{2}\amp \frac{3}{2}\amp -\frac{\pi}{4} \\ 1\amp \frac{1}{5}\amp \frac{6}{5}\amp \arctan(2) \end{array} \right] \end{equation*}

is \(\left[ \begin{array}{cccc} 1\amp 0\amp 1\amp 0 \\ 0\amp 1\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.}\) The general solution to this linear system is \(c_1 = c_2 = -c_3\) and \(c_4 = 0\text{.}\) Notice that

\begin{equation*} (-1)(1) + (-1)\left( \frac{1}{1+x^2}\right) + \frac{2+x^2}{1+x^2} = 0 \end{equation*}

or

\begin{equation*} \frac{2+x^2}{1+x^2} = 1 + \left( \frac{1}{1+x^2}\right)\text{,} \end{equation*}

so \(\frac{2+x^2}{1+x^2}\) is a linear combination of the other vectors. The vectors corresponding to the pivot columns are linearly independent, so it follows that \(1\text{,}\) \(\frac{1}{1+x^2}\text{,}\) and \(\arctan(x)\) are linearly independent. We conclude that \(\left\{1, \frac{1}{1+x^2}, \arctan(x)\right\}\) is a basis for \(W\) and \(\dim(W) = 3\text{.}\)
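As a quick computational check of the row reduction above, the following sketch uses SymPy (one possible choice of technology, not specified in the text; any CAS that row reduces exactly would work) to reproduce the reduced row echelon form of the coefficient matrix.

```python
# Verify the row reduction in this example with SymPy (one possible tool choice).
from sympy import Matrix, Rational, pi, atan

# Coefficient matrix from evaluating the functions at x = 0, 1, -1, 2.
A = Matrix([
    [1, 1,              2,               0],
    [1, Rational(1, 2), Rational(3, 2),  pi / 4],
    [1, Rational(1, 2), Rational(3, 2), -pi / 4],
    [1, Rational(1, 5), Rational(6, 5),  atan(2)],
])

R, pivots = A.rref()
print(R)       # matches the reduced row echelon form displayed above
print(pivots)  # pivot columns (0, 1, 3), i.e. columns 1, 2, and 4 of the matrix
```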

(c)

\(\{p(t) \in \pol_4 : p(-t) = p(t)\}\) in \(\pol_4\) (The polynomials with the property that \(p(-t) = p(t)\) are called even polynomials.)

Solution.

Let \(W = \{p(t) \in \pol_4 : p(-t) = p(t)\}\text{.}\) Let \(p(t) \in W\) and suppose that \(p(t) = a_0 + a_1t+a_2t^2+a_3t^3+a_4t^4\text{.}\) Since \(p(-t)=p(t)\) for every \(t\text{,}\) we must have \(p(1) = p(-1)\) and \(p(3) = p(-3)\text{.}\) Since \(p(1) = p(-1)\text{,}\) it follows that

\begin{equation*} a_0+a_1+a_2+a_3+a_4 = a_0-a_1+a_2-a_3+a_4 \end{equation*}

or

\begin{equation*} a_1+a_3 = 0\text{.} \end{equation*}

Similarly, the fact that \(p(3) = p(-3)\) yields the equation

\begin{equation*} a_0+3a_1+9a_2+27a_3+81a_4 = a_0-3a_1+9a_2-27a_3+81a_4 \end{equation*}

or

\begin{equation*} a_1+9a_3 = 0\text{.} \end{equation*}

The reduced row echelon form of the coefficient matrix of the system \(a_1+a_3 = 0\) and \(a_1+9a_3 = 0\) is \(I_2\text{,}\) so it follows that \(a_1=a_3=0\text{.}\) Thus, \(p(t) = a_0 + a_2t^2+a_4t^4\) and so \(W = \Span\{1, t^2, t^4\}\text{.}\) Equating like terms in the equation

\begin{equation*} c_1(1) + c_2(t^2) + c_3(t^4) = 0 \end{equation*}

yields \(c_1=c_2=c_3=0\text{.}\) We conclude that \(\{1,t^2,t^4\}\) is linearly independent and is therefore a basis for \(W\text{.}\) Thus, \(\dim(W) = 3\text{.}\) As an alternative solution, notice that \(p(t) = t\) is not in \(W\text{.}\) So \(W \neq V\) and we know that \(\dim(W) \lt \dim(V) = 5\text{.}\) Since \(1\text{,}\) \(t^2\text{,}\) and \(t^4\) are in \(W\text{,}\) we can show as above that \(1\text{,}\) \(t^2\text{,}\) and \(t^4\) are linearly independent, so \(\dim(W) \geq 3\text{.}\) Finally, \(\dim(W)\) cannot be \(4\text{:}\) if it were, then adjoining \(t\) (which is not in \(W\)) to a basis of \(W\) would produce a basis of \(\pol_4\text{,}\) so we could write \(t^3 = q(t) + ct\) for some \(q(t)\) in \(W\) and scalar \(c\text{.}\) But then \(q(t) = t^3-ct\) would satisfy both \(q(-t) = q(t)\) and \(q(-t) = -q(t)\text{,}\) forcing \(q(t) = 0\) and \(t^3 = ct\text{,}\) which is impossible. We can therefore conclude that \(\dim(W)=3\text{.}\)
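A brief symbolic check of the key step above is sketched below (SymPy is one possible choice of technology, not specified in the text): requiring \(p(-t) - p(t)\) to be the zero polynomial forces \(a_1 = a_3 = 0\text{.}\)

```python
# Check that the even-polynomial condition p(-t) = p(t) in P_4 forces a1 = a3 = 0.
from sympy import symbols, Poly, solve

t, a0, a1, a2, a3, a4 = symbols('t a0 a1 a2 a3 a4')
p = a0 + a1*t + a2*t**2 + a3*t**3 + a4*t**4

# p(-t) - p(t) must be the zero polynomial, so each of its coefficients is 0.
coefficient_equations = Poly(p.subs(t, -t) - p, t).coeffs()
print(solve(coefficient_equations, [a0, a1, a2, a3, a4]))  # {a1: 0, a3: 0}
```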

Example 33.6.

Let \(U_{n \times n}\) be the set of all \(n \times n\) upper triangular matrices. Recall that a matrix \(A = [a_{ij}]\) is upper triangular if \(a_{ij} = 0\) whenever \(i > j\text{.}\) That is, a matrix is upper triangular if all entries below the diagonal are 0.

(a)

Show that \(U_{n \times n}\) is a subspace of \(\M_{n \times n}\text{.}\)

Solution.

Since the \(n \times n\) zero matrix \(0_{n \times n}\) has all entries equal to 0, it follows that \(0_{n \times n}\) is in \(U_{n \times n}\text{.}\) Let \(A = [a_{ij}]\) and \(B = [b_{ij}]\) be in \(U_{n \times n}\text{,}\) and let \(C = [c_{ij}] = A+B\text{.}\) Then \(c_{ij} = a_{ij} + b_{ij} = 0 + 0\) when \(i > j\text{.}\) So \(C\) is an upper triangular matrix and \(U_{n \times n}\) is closed under addition. Let \(c\) be a scalar. The \(ij\)th entry of \(cA\) is \(ca_{ij} = c(0) = 0\) whenever \(i > j\text{.}\) So \(cA\) is an upper triangular matrix and \(U_{n \times n}\) is closed under multiplication by scalars. We conclude that \(U_{n \times n}\) is a subspace of \(\M_{n \times n}\text{.}\)

(b)

Find the dimensions of \(U_{2 \times 2}\) and \(U_{3 \times 3}\text{.}\) Explain. Make a conjecture as to what \(\dim(U_{n \times n})\) is in terms of \(n\text{.}\)

Solution.

Let \(M_{11} = \left[ \begin{array}{cc} 1\amp 0 \\ 0\amp 0 \end{array} \right]\text{,}\) \(M_{12} = \left[ \begin{array}{cc} 0\amp 1 \\ 0\amp 0 \end{array} \right]\text{,}\) and \(M_{22}=\left[ \begin{array}{cc} 0\amp 0 \\ 0\amp 1 \end{array} \right]\text{.}\) We will show that \(S = \{M_{11}, M_{12}, M_{22}\}\) is a basis for \(U_{2 \times 2}\text{.}\) Consider the equation

\begin{equation*} x_1M_{11} + x_2M_{12} + x_3 M_{22} = 0\text{.} \end{equation*}

Equating like entries shows that \(x_1 = x_2 = x_3 = 0\text{,}\) and so \(S\) is linearly independent. If \(\left[ \begin{array}{cc} a\amp b \\ 0\amp c \end{array} \right]\) is in \(U_{2 \times 2}\text{,}\) then

\begin{equation*} \left[ \begin{array}{cc} a\amp b \\ 0\amp c \end{array} \right] = aM_{11} + bM_{12} + cM_{22} \end{equation*}

and so \(S\) spans \(U_{2 \times 2}\text{.}\) Thus, \(S\) is a basis for \(U_{2 \times 2}\) and so \(\dim(U_{2 \times 2}) = 3\text{.}\) Similarly, for the \(3 \times 3\) case let \(M_{ij}\) for \(i \leq j\) be the \(3 \times 3\) matrix with a 1 in the \(ij\) position and 0 in every other position. Let

\begin{equation*} S = \{M_{11}, M_{12}, M_{13}, M_{22}, M_{23}, M_{33}\}\text{.} \end{equation*}

Equating corresponding entries shows that if

\begin{equation*} x_1M_{11} + x_2M_{12} + x_3M_{13} + x_4 M_{22}+ x_5 M_{23} + x_6M_{33} = 0\text{,} \end{equation*}

then \(x_1 = x_2 = x_3 = x_4 = x_5 = x_6 = 0\text{.}\) So \(S\) is a linearly independent set. If \(A = [a_{ij}]\) is in \(U_{3 \times 3}\text{,}\) then \(A = \sum_{i \leq j} a_{ij}M_{ij}\) and \(S\) spans \(U_{3 \times 3}\text{.}\) We conclude that \(S\) is a basis for \(U_{3 \times 3}\) and \(\dim(U_{3 \times 3}) = 6\text{.}\) In general, for an \(n \times n\) matrix, the set of matrices \(M_{ij}\text{,}\) one for each entry on and above the diagonal, is a basis for \(U_{n \times n}\text{.}\) There are \(n\) such matrices for the entries on the diagonal. The number of entries above the diagonal is equal to half the total number of entries (\(n^2\)) minus half the number of entries on the diagonal (\(n\)). So there is a total of \(\frac{n^2-n}{2}\) such matrices for the entries above the diagonal. Therefore,

\begin{equation*} \dim(U_{n \times n}) = n+\frac{n^2-n}{2} = \frac{n^2+n}{2}\text{.} \end{equation*}
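The counting argument above can be checked with a short computation; the sketch below (in Python, one possible choice of technology) simply counts the positions \((i,j)\) with \(i \leq j\text{,}\) one basis matrix \(M_{ij}\) per position.

```python
# Count the basis matrices M_ij (one for each position with i <= j) and compare
# with the formula dim(U_{n x n}) = (n^2 + n)/2.
def dim_upper_triangular(n):
    return sum(1 for i in range(1, n + 1) for j in range(1, n + 1) if i <= j)

for n in range(1, 6):
    assert dim_upper_triangular(n) == (n**2 + n) // 2
    print(n, dim_upper_triangular(n))   # 1, 3, 6, 10, 15 for n = 1, ..., 5
```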

Subsection Summary

  • A finite dimensional vector space is a vector space that can be spanned by a finite set of vectors.

  • We showed that any two bases for a finite dimensional vector space must contain the same number of vectors. Therefore, we can define the dimension of a finite dimensional vector space \(V\) to be the number of vectors in any basis for \(V\text{.}\)

  • If \(V\) is a vector space with dimension \(n\) and \(S\) is any linearly independent subset of \(V\) with \(n\) vectors, then \(S\) is a basis for \(V\text{.}\) Otherwise, we could add vectors to \(S\) to make a basis for \(V\) and then \(V\) would have a basis of more than \(n\) vectors.

  • If \(V\) is a vector space with dimension \(n\) and \(S\) is any subset of \(V\) with \(n\) vectors that spans \(V\text{,}\) then \(S\) is a basis for \(V\text{.}\) Otherwise, we could remove vectors from \(S\) to obtain a basis for \(V\) and then \(V\) would have a basis of fewer than \(n\) vectors.

  • For any finite dimensional space \(V\) and a subspace \(W\) of \(V\text{,}\) \(\dim(W)\leq \dim(V)\text{.}\)

Exercises Exercises

1.

Let \(W = \Span\{ 1+t^2, 2+t+2t^2+t^3, 1+t+t^3, t-t^2+t^3\}\) in \(\pol_3\text{.}\) Find a basis for \(W\text{.}\) What is the dimension of \(W\text{?}\)

2.

Let \(A = \left[ \begin{array}{cc} 1\amp 2\\1\amp 0 \end{array} \right]\) and \(B = \left[ \begin{array}{cr} 1\amp 0\\1\amp -1 \end{array} \right]\text{.}\)

(a)

Are \(A\) and \(B\) linearly independent or dependent? Verify your result.

(b)

Extend the set \(S = \{A,B\}\) to a basis for \(\M_{2 \times 2}\text{.}\) That is, find a basis for \(\M_{2 \times 2}\) that contains both \(A\) and \(B\text{.}\)

3.

Let \(A = \left[ \begin{array}{ccc} 1\amp 2\amp 0 \\ 3\amp 0\amp 2 \end{array} \right]\text{,}\) \(B = \left[ \begin{array}{rcc} -1\amp 1\amp 1 \\ 2\amp 4\amp 0 \end{array} \right]\text{,}\) \(C= \left[ \begin{array}{crr} 5\amp 1\amp -3 \\ 0\amp -12\amp 4 \end{array} \right]\text{,}\) \(D = \left[ \begin{array}{crr} 5\amp 4\amp -2 \\ 5\amp -8\amp 6 \end{array} \right]\text{,}\) and \(E = \left[ \begin{array}{rcc} 2\amp 0\amp 0 \\ -2\amp 0\amp 1 \end{array} \right]\) in \(\M_{2 \times 3}\) and let \(S = \left\{ A, B, C, D, E \right\}\text{.}\)

(a)

Is \(S\) a basis for \(\M_{2 \times 3}\text{?}\) Explain.

(b)

Determine if \(S\) is a linearly independent or dependent set. Verify your result.

(c)

Find a basis \(\CB\) for \(\Span \ S\) that is a subset of \(S\) and write all of the vectors in \(S\) as linear combinations of the vectors in \(\CB\text{.}\)

(d)

Extend your basis \(\CB\) from part (c) to a basis of \(\M_{2 \times 3}\text{.}\) Explain your method.

4.

Determine the dimension of each of the following vector spaces.

(a)

\(\Span\{2, 1+t, t^2\}\) in \(\pol_2\)

(b)

The space of all polynomials in \(\pol_3\) whose constant term is 0.

(c)

\(\Nul \left[ \begin{array}{cc} 1\amp 2 \\ 2\amp 4 \end{array} \right]\)

(d)

\(\Span\left\{ [1 \ 2 \ 0 \ 1 \ -1]^{\tr}, [0 \ 1 \ 1 \ 1 \ 0]^{\tr}, [0 \ 3 \ 2 \ 3 \ 0]^{\tr}, [-1 \ 0 \ 1 \ 1 \ 1]^{\tr}\right\}\) in \(\R^5\)

5.

Let \(W\) be the set of matrices in \(\M_{2 \times 2}\) whose diagonal entries sum to 0. Show that \(W\) is a subspace of \(\M_{2 \times 2}\text{,}\) find a basis for \(W\text{,}\) and then find \(\dim(W)\text{.}\)

6.

Show that if \(W\) is a subspace of a finite dimensional vector space \(V\text{,}\) then any basis of \(W\) can be extended to a basis of \(V\text{.}\)

7.

Let \(W\) be the set of all polynomials \(a+bt+ct^2\) in \(\pol_2\) such that \(a+b+c = 0\text{.}\) Show that \(W\) is a subspace of \(\pol_2\text{,}\) find a basis for \(W\text{,}\) and then find \(\dim(W)\text{.}\)

8.

Suppose \(W_1, W_2\) are subspaces in a finite-dimensional space \(V\text{.}\)

(a)

Show that it is not true in general that \(\dim(W_1)+\dim(W_2) \leq \dim(V)\text{.}\)

(b)

Are there any conditions on \(W_1\) and \(W_2\) that will ensure that \(\dim(W_1)+\dim(W_2) \leq \dim(V)\text{?}\)

Hint.

See problem 12 in the previous section.

9.

Suppose \(W_1 \subseteq W_2\) are two subspaces of a finite-dimensional space. Show that if \(\dim(W_1)=\dim(W_2)\text{,}\) then \(W_1=W_2\text{.}\)

Hint.

Consider dimensions.

10.

Suppose \(W_1, W_2\) are both three-dimensional subspaces inside \(\R^4\text{.}\) In this exercise we will show that \(W_1\cap W_2\) contains a plane. Let \(\{\vu_1, \vu_2, \vu_3\}\) be a basis for \(W_1\) and let \(\{\vv_1, \vv_2, \vv_3\}\) be a basis for \(W_2\text{.}\)

(a)

If \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\) are all in \(W_1\text{,}\) explain why \(W_1 \cap W_2\) must contain a plane.

(b)

Now we consider the case where not all of \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\) are in \(W_1\text{.}\) Since the arguments will be the same, let us assume that \(\vv_1\) is not in \(W_1\text{.}\)

(i)

Explain why the set \(\CS = \{\vu_1, \vu_2, \vu_3, \vv_1\}\) is a basis for \(\R^4\text{.}\)

(ii)

Explain why \(\vv_2\) and \(\vv_3\) can be written as linear combinations of the vectors in \(\CS\text{.}\) Use these linear combinations to find two vectors that are in \(W_1 \cap W_2\text{.}\) Then show that these vectors span a plane in \(W_1 \cap W_2\text{.}\)

11.

A magic matrix is an \(n \times n\) matrix in which the sum of the entries along any row, column, or diagonal is the same. For example, the \(3 \times 3\) matrix

\begin{equation*} \left[ \begin{array}{rcc} 5\amp 4\amp 0 \\ -2\amp 3\amp 8 \\ 6\amp 2\amp 1 \end{array} \right] \end{equation*}

is a magic matrix. Note that the entries of a magic matrix can be any real numbers.

(a)

Show that the set of \(n \times n\) magic matrices is a subspace of \(\M_{n \times n}\text{.}\)

Hint.

Consider the row, column, and diagonal sums.

(b)

Let \(V\) be the space of \(3 \times 3\) magic matrices. Find a basis for \(V\) and determine the dimension of \(V\text{.}\)

12.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

The dimension of a finite dimensional vector space is the minimum number of vectors needed to span that space.

(b) True/False.

The dimension of a finite dimensional vector space is the maximum number of linearly independent vectors that can exist in that space.

(c) True/False.

If \(n\) vectors span an \(n\)-dimensional vector space \(V\text{,}\) then these vectors form a basis of \(V\text{.}\)

(d) True/False.

Any set of \(n\) vectors forms a basis in an \(n\)-dimensional vector space.

(e) True/False.

Every vector in a vector space \(V\) spans a one-dimensional subspace of \(V\text{.}\)

(f) True/False.

Any set of \(n\) linearly independent vectors in a vector space \(V\) of dimension \(n\) is a basis for \(V\text{.}\)

(g) True/False.

If \(\{\vv_1, \vv_2, \ldots, \vv_k\}\) is linearly independent in \(V\text{,}\) then \(\dim(V)\geq k\text{.}\)

(h) True/False.

If a set of \(k\) vectors spans \(V\text{,}\) then any set of more than \(k\) vectors in \(V\) is linearly dependent.

(i) True/False.

If an infinite set of vectors spans \(V\text{,}\) then \(V\) is infinite-dimensional.

(j) True/False.

If \(W_1, W_2\) are both two-dimensional subspaces of a three dimensional vector space \(V\text{,}\) then \(W_1\cap W_2\neq \{\vzero \}\text{.}\)

(k) True/False.

If \(\dim(V)=n\) and \(W\) is a subspace of \(V\) with dimension \(n\text{,}\) then \(W=V\text{.}\)

Subsection Project: Understanding Principal Component Analysis

Suppose we were to film an experiment involving a ball that is bouncing up and down. Naively, we set up several cameras to follow the process of the experiment from different perspectives and collect the data. All of this data tells us something about the bouncing ball, but there may be no perspective that tells us an important piece of information — the axis along which the ball bounces. The question, then, is how we can extract from the data this important piece of information. Principal Component Analysis (PCA) is a tool for just this type of analysis.

Table 33.7. SAT data
State \(1\) \(2\) \(3\) \(4\) \(5\) \(6\) \(7\) \(8\) \(9\) \(10\)
EBRW \(595\) \(540\) \(522\) \(508\) \(565\) \(512\) \(643\) \(574\) \(534\) \(539\)
Math \(571\) \(536\) \(493\) \(493\) \(554\) \(501\) \(655\) \(566\) \(534\) \(539\)

We will use an example to illustrate important concepts we will need. To realistically apply PCA we will have much more data than this, but for now we will restrict ourselves to only two variables so that we can visualize our results. Table 33.7 presents information from ten states on two attributes related to the SAT — Evidence-Based Reading and Writing (EBRW) score and Math score. The SAT is made up of three sections: Reading, Writing and Language (also just called Writing), and Math. The EBRW score is calculated by combining the Reading and Writing section scores — both the Math and EBRW are scored on a scale of 200-800.

Each attribute (EBRW score and Math score) gives a vector whose entries are the values of that attribute for the ten states; here the data are the average scores from participating students in each state. In this example we have two attribute vectors:

\begin{align*} \vx_1 \amp = [595 \ 540 \ 522 \ 508 \ 565 \ 512 \ 643 \ 574 \ 534 \ 539]^{\tr} \text{ and }\\ \vx_2 \amp = [571 \ 536 \ 493 \ 493 \ 554 \ 501 \ 655 \ 566 \ 534 \ 539]^{\tr}\text{.} \end{align*}

These vectors form the rows of a \(2 \times 10\) matrix

\begin{equation*} X_0 = \left[ \begin{array}{c} \vx_1^{\tr} \\ \vx_2^{\tr} \end{array} \right] = \left[ \begin{array}{cccccccccc} 595\amp 540\amp 522\amp 508\amp 565\amp 512\amp 643\amp 574\amp 534\amp 539 \\ 571\amp 536\amp 493\amp 493\amp 554\amp 501\amp 655\amp 566\amp 534\amp 539 \end{array} \right] \end{equation*}

that makes up our data set. A plot of the data is shown at left in Figure 33.8, where the EBRW score is along the horizontal axis and the math score is along the vertical axis.

Figure 33.8. Two views of the data set (EBRW horizontal, math vertical).

The question we want to answer is, how do we represent our data set so that the most important features in the data set are revealed?

Project Activity 33.6.

Before analyzing a data set there is often some initial preparation that needs to be made. Issues that have to be dealt with include the problem that attributes might have different units and scales. For example, in a data set with attributes about people, height could be measured in inches while weight is measured in pounds. It is difficult to compare variables when they are on different scales. Another issue to consider is that some attributes are independent of the others (height, for example, does not depend on hair color), while some are interrelated (body mass index depends on height and weight). To simplify our work, we will not address these types of problems. The only preparation we will do with our data is to center it.

(a)

An important piece of information about a one-dimensional data set \(\vx = [x_1 \ x_2 \ x_3 \ \cdots \ x_n]^{\tr}\) is the sample average or mean

\begin{equation*} \overline{\vx} = \frac{1}{n}\sum_{i=1}^n x_i\text{.} \end{equation*}

Calculate the means \(\overline{\vx_1}\) and \(\overline{\vx_2}\) for our SAT data from the matrix \(X_0\text{.}\)

(b)

We can use these sample means to center our data at the origin by translating the data so that each row of our data matrix has mean \(0\text{.}\) We do this by subtracting the mean for that row vector from each component of the vector. Determine the matrix \(X\) that contains the centered data for our SAT data set from matrix \(X_0\text{.}\)
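One way to carry out these two parts is sketched below using NumPy (a choice of technology not specified in the activity): compute the mean of each row of \(X_0\) and subtract it from that row.

```python
# Center the SAT data: subtract each row's mean from the entries of that row.
import numpy as np

X0 = np.array([
    [595, 540, 522, 508, 565, 512, 643, 574, 534, 539],   # EBRW scores
    [571, 536, 493, 493, 554, 501, 655, 566, 534, 539],   # Math scores
], dtype=float)

row_means = X0.mean(axis=1, keepdims=True)   # sample means of x_1 and x_2
X = X0 - row_means                           # centered data; each row now has mean 0

print(row_means.ravel())                     # [553.2, 544.2]
print(X)
```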

A plot of the centered data for our SAT data is shown at right in Figure 33.8. Later we will see why centering the data is useful — it will more easily allow us to project onto subspaces. The goal of PCA is to find a matrix \(P\) so that \(PX = Y\text{,}\) and \(Y\) is suitably transformed to identify the important aspects of the data. We will discuss what the important aspects are shortly. Before we do so, we need to discuss a way to compare the one dimensional data vectors \(\vx_1\) and \(\vx_2\text{.}\)

Project Activity 33.7.

To compare the two one dimensional data vectors, we need to consider variance and covariance.

(a)

It is often useful to know how spread out a data set is, something the average doesn't tell us. For example, the data sets \([1 \ 2 \ 3]^{\tr}\) and \([-2 \ 0 \ 8]^{\tr}\) both have averages of \(2\text{,}\) but the data in \([-2 \ 0 \ 8]^{\tr}\) is more spread out. Variance provides one measure of how spread out a one-dimensional data set \(\vx = [x_1 \ x_2 \ x_3 \ \cdots \ x_n]^{\tr}\) is. Variance is defined as

\begin{equation*} \var(\vx) = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \overline{\vx} \right)^2\text{.} \end{equation*}

The variance provides a measure of how far from the average the data is spread (see the note at the end of this section on why we divide by \(n-1\) rather than \(n\)). Determine the variances of the two data vectors \(\vx_1\) and \(\vx_2\text{.}\) Which is more spread out?
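The sketch below (plain Python, one possible choice of technology) illustrates the variance formula on the two small data sets mentioned above before it is applied to \(\vx_1\) and \(\vx_2\text{.}\)

```python
# Sample variance with the 1/(n-1) factor, applied to the two example data sets.
def variance(x):
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

print(variance([1, 2, 3]))    # 1.0
print(variance([-2, 0, 8]))   # 28.0 -- the second data set is far more spread out
```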

(b)

In general, we will have more than one-dimensional data, as in our SAT data set. It will be helpful to have a way to compare one-dimensional data sets to try to capture the idea of variance for different data sets — how much the data in two different data sets varies from the mean with respect to each other. One such measure is covariance — essentially the average of all corresponding products of deviations from the means. We define the covariance of two data vectors \(\vx = [x_1 \ x_2 \ \cdots \ x_n]^{\tr}\) and \(\vy = [y_1 \ y_2 \ \cdots \ y_n]^{\tr}\) as

\begin{equation*} \cov(\vx,\vy) = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i-\overline{\vx}\right)\left(y_i-\overline{\vy}\right)\text{.} \end{equation*}

Determine all covariances

\begin{equation*} \cov(\vx_1,\vx_1), \ \cov(\vx_1,\vx_2), \ \cov(\vx_2,\vx_1), \ \text{ and } \ \cov(\vx_2,\vx_2)\text{.} \end{equation*}

How are \(\cov(\vx_1,\vx_2)\) and \(\cov(\vx_2,\vx_1)\) related? How are \(\cov(\vx_1,\vx_1)\) and \(\cov(\vx_2,\vx_2)\) related to variances?

(c)

What is most important about covariance is its sign. Suppose \(\vy = [y_1 \ y_2 \ \ldots \ y_n]^{\tr}\text{,}\) \(\vz = [z_1 \ z_2 \ \ldots \ z_n]^{\tr}\) and \(\cov(\vy,\vz) > 0\text{.}\) Then if \(y_i\) is larger than \(y_j\) it is likely that \(z_i\) is also larger than \(z_j\text{.}\) For example, if \(\vy\) is a vector that records a person's height from age \(2\) to \(10\) and \(\vz\) records that same person's weight in the same years, we might expect that when \(y_i\) increases so does \(z_i\text{.}\) Similarly, if \(\cov(\vy,\vz) \lt 0\text{,}\) then as one data set increases, the other decreases. As an example, if \(\vy\) records the number of hours a student spends playing video games each semester and \(\vz\) gives the student's GPA for each semester, then we might expect that \(z_i\) decreases as \(y_i\) increases. When \(\cov(\vy,\vz) = 0\text{,}\) then \(\vy\) and \(\vz\) are said to be uncorrelated or independent of each other. For our example \(\vx_1\) and \(\vx_2\text{,}\) what does \(\cov(\vx_1,\vx_2)\) tell us about the relationship between \(\vx_1\) and \(\vx_2\text{?}\) Why should we expect this from the context?

(d)

The covariance gives us information about the relationships between the attributes. So instead of working with the original data, we work with the covariance data. If we have \(m\) data vectors \(\vx_1\text{,}\) \(\vx_2\text{,}\) \(\ldots\text{,}\) \(\vx_m\) in \(\R^n\text{,}\) the covariance matrix \(C = [c_{ij}]\) satisfies \(c_{ij} = \cov(\vx_i, \vx_j)\text{.}\) Calculate the covariance matrix for our SAT data. Then explain why \(C = \frac{1}{n-1}XX^{\tr}\text{.}\)
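A sketch of this computation using NumPy (an assumed choice of technology) follows; it forms the centered matrix \(X\text{,}\) computes \(\frac{1}{n-1}XX^{\tr}\text{,}\) and compares the result with NumPy's built-in covariance routine.

```python
# Covariance matrix of the SAT data: C = (1/(n-1)) X X^T for the centered data X.
import numpy as np

X0 = np.array([
    [595, 540, 522, 508, 565, 512, 643, 574, 534, 539],
    [571, 536, 493, 493, 554, 501, 655, 566, 534, 539],
], dtype=float)
X = X0 - X0.mean(axis=1, keepdims=True)   # centered data

n = X.shape[1]                            # number of observations (10 states)
C = (X @ X.T) / (n - 1)                   # entries are cov(x_i, x_j)

print(np.round(C, 2))
print(np.round(np.cov(X0), 2))            # np.cov also uses the 1/(n-1) convention
```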

Recall that the goal of PCA is to find a matrix \(P\) such that \(PX = Y\) where \(P\) transforms the data set to a coordinate system in which the important aspects of the data are revealed. We are now in a position to discuss what that means.

An ideal view of our data would be one in which we can see the direction of greatest variance and one that minimizes redundancy. With redundant data the variables are not independent — that is, covariance is nonzero. So we would like the covariances to all be zero (or as close to zero as possible) to remove redundancy in our data. That means that we would like a covariance matrix in which the non-diagonal entries are all zero. This will be possible if the covariance matrix is diagonalizable.

Project Activity 33.8.

Consider the covariance matrix \(C = \left[ \begin{array}{cc} 1760.18\amp 1967.62\\ 1967.62\amp 2319.29 \end{array} \right]\text{.}\) Explain why we can find a matrix \(P\) with determinant 1 whose columns are unit vectors that diagonalizes \(C\text{.}\) Then find such a matrix. Use technology as appropriate.
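One way to carry out this computation is sketched below with NumPy (an assumed tool): since \(C\) is symmetric, numpy.linalg.eigh returns an orthonormal set of eigenvectors, and flipping the sign of one column, if necessary, makes the determinant equal to 1.

```python
# Orthogonally diagonalize the symmetric covariance matrix C.
import numpy as np

C = np.array([[1760.18, 1967.62],
              [1967.62, 2319.29]])

eigenvalues, Q = np.linalg.eigh(C)   # columns of Q are unit eigenvectors of C
if np.linalg.det(Q) < 0:             # flip one column if needed so that det(Q) = 1
    Q[:, 0] = -Q[:, 0]

print(eigenvalues)                   # eigh lists the eigenvalues in increasing order
print(Q)
print(np.round(Q.T @ C @ Q, 2))      # Q^T C Q is (numerically) diagonal
```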

We want to diagonalize \(XX^{\tr}\) as \(PXX^{\tr}P^{-1}\text{,}\) so the matrix \(P\) that serves our purposes is the one whose rows are the eigenvectors of \(XX^{\tr}\text{.}\) To understand why this matrix is the one we want, recall that we want to have \(PX = Y\text{,}\) and we want to diagonalize \(XX^{\tr}\) so that the covariance matrix of \(Y\) is diagonal. In this situation we will have (recalling that \(P^{-1}=P^{\tr}\))

\begin{equation*} \frac{1}{n-1}YY^{\tr} = \frac{1}{n-1}(PX)(PX)^{\tr} = \frac{1}{n-1}P\left(XX^{\tr}\right)P^{\tr} = P\left(\frac{1}{n-1}XX^{\tr}\right)P^{-1}\text{.} \end{equation*}

So the matrix \(P\) that we want is exactly the one that diagonalizes \(XX^{\tr}\text{.}\)

Project Activity 33.9.

There are two useful ways we can interpret the results of our work so far. An eigenvector of \(XX^{\tr}\) that corresponds to the largest (also called the dominant) eigenvalue \(\lambda_1\) is \([0.66 \ 0.76]^{\tr}\text{.}\) A plot of the centered data along with the eigenspace \(E_{\lambda_1}\) of \(XX^{\tr}\) spanned by \(\vv_1 = [0.66 \ 0.76]^{\tr}\) is shown at left in Figure 33.9. The eigenvector \(\vv_1\) is called the first principal component of \(X\text{.}\) Notice that this line \(E_{\lambda_1}\) indicates the direction of greatest variation in the data. That is, the sum of the squares of the differences between the data points and the mean is as large as possible in this direction. In other words, when we project the data points onto \(E_{\lambda_1}\text{,}\) as shown at right in Figure 33.9, the variation of the resulting points is larger than it is for any other line; that is, the data is most spread out in this direction.

(a)

There is another way we can interpret this result. If we drop a perpendicular from one of our data points to the space \(E_{\lambda_1}\) it creates a right triangle with sides of length \(a\text{,}\) \(b\text{,}\) and \(c\) as illustrated in the middle of Figure 33.9. Use this idea to explain why maximizing the variation also minimizes the sum of the squares of the distances from the data points to this line. As a result, we have projected our two-dimensional data onto the one-dimensional space that maximizes the variance of the data.

Figure 33.9. The principal component.
(b)

Recall that the matrix (to two decimal places) \(P = \left[ \begin{array}{rr} -0.66\amp -0.76\\ 0.76\amp -0.66 \end{array} \right]\text{,}\) whose rows are unit eigenvectors of \(XX^{\tr}\text{,}\) transforms the data set \(X\) to a new data set \(Y=PX\) whose covariance matrix is diagonal. Explain how the \(x\)-axis is related to the transformed data set \(Y\text{.}\)

Figure 33.10. Applying \(P\text{.}\)
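Following up on part (b), the sketch below (NumPy assumed) builds \(P\) from the unit eigenvectors of \(XX^{\tr}\) (up to sign, the matrix displayed above), computes \(Y = PX\text{,}\) and checks that the covariance matrix of \(Y\) is diagonal.

```python
# Form P from the eigenvectors of XX^T (largest eigenvalue first) and apply it to X.
import numpy as np

X0 = np.array([
    [595, 540, 522, 508, 565, 512, 643, 574, 534, 539],
    [571, 536, 493, 493, 554, 501, 655, 566, 534, 539],
], dtype=float)
X = X0 - X0.mean(axis=1, keepdims=True)

eigenvalues, Q = np.linalg.eigh(X @ X.T)   # ascending eigenvalue order
P = Q[:, ::-1].T                           # rows = unit eigenvectors, dominant first

Y = P @ X
print(np.round(P, 2))                      # agrees with the matrix above up to sign
print(np.round(np.cov(Y), 2))              # off-diagonal entries are (numerically) zero
```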

The result of Project Activity 33.9 is that we have reduced the problem from considering the data in a two-dimensional space to a one-dimensional space \(E_{\lambda_1}\) where the most important aspect of the data is revealed. Of course, we eliminate some of the characteristics of the data, but the most important aspect is still included and highlighted. This is one of the important uses of PCA, data dimension reduction, which allows us to reveal key relationships between the variables that might not be evident when looking at a large dataset.

The second eigenvector of \(XX^{\tr}\) also has meaning. A picture of the eigenspace \(E_{\lambda_2}\) corresponding to the smaller eigenvalue \(\lambda_2\) of \(XX^{\tr}\) is shown in Figure 33.11. The second eigenvector of \(XX^{\tr}\) is orthogonal to the first, and its direction tells us the direction of the second greatest amount of variance in the data, as can be seen in Figure 33.11.

Figure 33.11. The second principal component.

To summarize, the unit eigenvector for the largest eigenvalue of \(XX^{\tr}\) indicates the direction in which the data has the greatest variance. The direction of the unit eigenvector for the smaller eigenvalue shows the direction in which the data has the second largest variance. This direction is also perpendicular to the first (indicating \(0\) covariance). The directions of the eigenvectors are called the principal components of \(X\text{.}\) The eigenvector with the highest eigenvalue is the first principal component of the data set and the other eigenvectors are ordered by the eigenvalues, highest to lowest. The principal components provide a new coordinate system from which to view our data — one in which we can see the maximum variance and in which there is zero covariance.

Project Activity 33.10.

We can use the eigenvalues of \(XX^{\tr}\) to quantify the amount of variance that is accounted for by our projections. Notice that the points along the \(x\)-axis at right in Figure 33.10 are exactly the numbers in the first row of \(Y\text{.}\) These numbers provide the projections of the data in \(Y\) onto the \(x\)-axis — the axis along which the data has its greatest variance.

(a)

Calculate the variance of the data given by the first row of \(Y\text{.}\) This is the variance of the data in the direction of the eigenspace \(E_{\lambda_1}\text{.}\) How does the result compare to the entries of the covariance matrix for \(Y\text{?}\)

(b)

Repeat part (a) for the data along the second row of \(Y\text{.}\)

(c)

The total variance of the data set is the sum of the variances. Explain why the proportion of the total variance in the data that is accounted for in the direction of \(E_{\lambda_1}\) is

\begin{equation*} \frac{\lambda_1}{\lambda_1 + \lambda_2}\text{.} \end{equation*}

Then calculate this amount for the SAT data.
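The computation in this activity can be sketched as follows (NumPy assumed): the eigenvalues of \(XX^{\tr}\) give the variances of the rows of \(Y\) up to the common factor \(\frac{1}{n-1}\text{,}\) so their relative sizes give the proportion of the total variance in each principal direction.

```python
# Proportion of the total variance captured by each principal component.
import numpy as np

X0 = np.array([
    [595, 540, 522, 508, 565, 512, 643, 574, 534, 539],
    [571, 536, 493, 493, 554, 501, 655, 566, 534, 539],
], dtype=float)
X = X0 - X0.mean(axis=1, keepdims=True)

eigenvalues = np.linalg.eigvalsh(X @ X.T)[::-1]   # decreasing order
print(eigenvalues / eigenvalues.sum())            # lambda_1/(lambda_1 + lambda_2), etc.
```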

In general, PCA is most useful for larger data sets. The process is the same, and a computational sketch of these steps follows the list below.

  • Start with a set of data that forms the rows of an \(m \times n\) matrix. We center the data by subtracting the mean of each row from the entries of that row to create a centered data set in a matrix \(X\text{.}\)

  • The principal components of \(X\) are the eigenvectors of \(XX^{\tr}\text{,}\) ordered so that they correspond to the eigenvalues of \(XX^{\tr}\) in decreasing order.

  • Let \(P\) be the matrix whose rows are the principal components of \(X\text{,}\) ordered from highest to lowest. Then \(Y = PX\) is suitably transformed to identify the important aspects of the data.

  • If \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_n\) are the eigenvalues of \(XX^{\tr}\) in decreasing order, then the amount of variance in the data accounted for by the first \(r\) principal components is given by

    \begin{equation*} \frac{\lambda_1+\lambda_2 + \cdots + \lambda_r}{\lambda_1+\lambda_2 + \cdots + \lambda_n}\text{.} \end{equation*}
  • The first \(r\) rows of \(Y=PX\) provide the projection of the data set \(X\) onto an \(r\)-dimensional space spanned by the first \(r\) principal components of \(X\text{.}\)
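The steps above can be collected into a short general-purpose routine; the sketch below (NumPy assumed, with the hypothetical function name pca) takes a data matrix whose rows are the attributes and returns the principal components, the transformed data, and the proportion of variance for each component.

```python
# A minimal PCA sketch following the steps listed above.
import numpy as np

def pca(data):
    """Rows of `data` are attributes; columns are observations."""
    X = data - data.mean(axis=1, keepdims=True)    # center each row
    eigenvalues, Q = np.linalg.eigh(X @ X.T)       # symmetric eigenvalue problem
    order = np.argsort(eigenvalues)[::-1]          # sort eigenvalues in decreasing order
    eigenvalues = eigenvalues[order]
    P = Q[:, order].T                              # rows of P are the principal components
    Y = P @ X                                      # transformed data
    explained = eigenvalues / eigenvalues.sum()    # proportion of variance per component
    return P, Y, explained
```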

Project Activity 33.11.

Let us now consider a problem with more than two variables. We continue to keep the data set small so that we can conveniently operate with it. Table 33.12 presents additional information from ten states on four attributes related to the SAT — Participation rates, Evidence-Based Reading and Writing (EBRW) score, Math score, and average SAT score. Use technology as appropriate for this activity; one possible computational approach is sketched after part (e).

Table 33.12. SAT data
State \(1\) \(2\) \(3\) \(4\) \(5\) \(6\) \(7\) \(8\) \(9\) \(10\)
Rate \(6\) \(60\) \(97\) \(100\) \(64\) \(99\) \(4\) \(23\) \(79\) \(70\)
EBRW \(595\) \(540\) \(522\) \(508\) \(565\) \(512\) \(643\) \(574\) \(534\) \(539\)
Math \(571\) \(536\) \(493\) \(493\) \(554\) \(501\) \(655\) \(566\) \(534\) \(539\)
SAT \(1166\) \(1076\) \(1014\) \(1001\) \(1120\) \(1013\) \(1296\) \(1140\) \(1068\) \(1086\)
(a)

Determine the centered data matrix \(X\) for this data set.

(b)

Find the covariance matrix for this data set. Round to four decimal places.

(c)

Find the principal components of \(X\text{.}\) Include at least four decimal places accuracy.

(d)

How much variation is accounted for in the data by the first principal component? In other words, if we reduce this data to one dimension, how much of the variation do we retain? Explain.

(e)

How much variation is accounted for in the data by the first two principal components? In other words, if we reduce this data to two dimensions, how much of the variation do we retain? Explain.
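One possible computational approach to this activity, assuming NumPy, is sketched below; the four rows of the data matrix are the Rate, EBRW, Math, and SAT rows of Table 33.12.

```python
# Apply the PCA steps to the four-attribute SAT data from Table 33.12.
import numpy as np

data = np.array([
    [   6,   60,   97,  100,   64,   99,    4,   23,   79,   70],   # Rate
    [ 595,  540,  522,  508,  565,  512,  643,  574,  534,  539],   # EBRW
    [ 571,  536,  493,  493,  554,  501,  655,  566,  534,  539],   # Math
    [1166, 1076, 1014, 1001, 1120, 1013, 1296, 1140, 1068, 1086],   # SAT
], dtype=float)

X = data - data.mean(axis=1, keepdims=True)       # (a) centered data matrix
C = (X @ X.T) / (X.shape[1] - 1)                  # (b) covariance matrix

eigenvalues, Q = np.linalg.eigh(X @ X.T)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
P = Q[:, order].T                                 # (c) principal components as rows

explained = eigenvalues / eigenvalues.sum()
print(np.round(C, 4))
print(np.round(P, 4))
print(np.round(explained[0], 4))                  # (d) variance from the first component
print(np.round(explained[:2].sum(), 4))           # (e) variance from the first two
```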

We conclude with a comment. A reasonable question to ask is how we interpret the principal components. Let \(P\) be an orthogonal matrix such that \(PCP^{\tr}\) is the diagonal matrix with the eigenvalues of \(C\) along the diagonal, in decreasing order. We then have the new perspective \(Y = PX\) from which to view the data. The first principal component \(\vp_1\) (the first row of \(P\)) determines the new variable \(\vy_1 = [y_i]\) (the first row of \(Y\)) in the following manner. Let \(\vp_1 = [p_i]^{\tr}\) and let \(\vc_i\) represent the columns of \(X\) so that \(X = [\vc_1 \ \vc_2 \ \vc_3 \ \cdots \ \vc_n]\text{.}\) Recognizing that

\begin{equation*} PX = \left[ \begin{array}{c} \vp_1^{\tr} \\ \vp_2^{\tr} \\ \vdots \\ \vp_m^{\tr} \end{array} \right] [\vc_1 \ \vc_2 \ \vc_3 \ \cdots \ \vc_n]\text{,} \end{equation*}

we have that

\begin{equation*} y_i = \vp_1^{\tr} \vc_i\text{.} \end{equation*}

That is,

\begin{equation*} \vy_1 = [\vp_1^{\tr}\vc_1 \ \vp_1^{\tr} \vc_2 \ \vp_1^{\tr} \vc_3 \ \cdots \ \vp_{1}^{\tr} \vc_{n}]\text{.} \end{equation*}

So each \(y_i\) is a linear combination of the original variables (contained in the \(\vc_i\)) with weights from the first principal component. The other new variables are obtained in the same way from the remaining principal components. So even though the principal components may not have an easy interpretation in context, they are connected to the original data in this way. By reducing the data to a few important principal components — that is, visualizing the data in a subspace of small dimension — we can account for almost all of the variation in the data and relate that information back to the original data.

It might seem that we should divide by \(n\) instead of \(n-1\) in the variance, but it is generally accepted to do this for reasons we won't get into. Suffice it to say that if we are using a sample of the entire population, then dividing by \(n-1\) provides a variance whose square root is closer to the standard deviation than we would get if we divide by \(n\text{.}\) If we are calculating the variance of an entire population, then we would divide by \(n\text{.}\)