Section 29 The Singular Value Decomposition
Focus Questions
By the end of this section, you should be able to give precise and thorough answers to the questions listed below. You may want to keep these questions in mind to focus your thoughts as you complete the section.
What is the operator norm of a matrix and what does it tell us about the matrix?
What is a singular value decomposition of a matrix? Why is a singular value decomposition important?
How does a singular value decomposition relate fundamental subspaces connected to a matrix?
What is an outer product decomposition of a matrix and how is it useful?
Subsection Application: Search Engines and Semantics
Effective search engines search for more than just words. Language is complex and search engines must deal with the fact that there are often many ways to express a given concept (this is called synonymy: multiple words can have the same meaning), and that a single word can have multiple meanings (polysemy). As a consequence, a search on a word may provide irrelevant matches (e.g., searching for derivative could provide pages on mathematics or financial securities), or you might search for articles on cats while the paper you really want uses the word felines. A better search engine will not necessarily try to match terms, but instead retrieve information based on concept or intent. Latent Semantic Indexing (LSI) (or Latent Semantic Analysis), developed in the late 1980s, helps search engines determine concept and intent in order to provide more accurate and relevant results. LSI essentially works by providing the underlying (latent) relationships between words (semantics) that search engines need to provide context and understanding (indexing). LSI provides a mapping of both words and documents into a lower dimensional “concept” space, and performs the search in this new space. The mapping is provided by the singular value decomposition.
Subsection Introduction
The singular value decomposition (SVD) of a matrix is an important and useful matrix decomposition. Unlike other matrix decompositions, every matrix has a singular value decomposition. The SVD is used in a variety of applications including scientific computing, digital signal processing, image compression, principal component analysis, web searching through latent semantic indexing, and seismology. Recall that the eigenvector decomposition of an \(n \times n\) diagonalizable matrix \(M\) has the form \(P^{-1}MP\text{,}\) where the columns of the matrix \(P\) are \(n\) linearly independent eigenvectors of \(M\) and the diagonal entries of the diagonal matrix \(P^{-1}MP\) are the eigenvalues of \(M\text{.}\) The singular value decomposition does something similar for any matrix of any size. One of the keys to the SVD is that the matrix \(A^{\tr}A\) is symmetric for any matrix \(A\text{.}\)
Subsection The Operator Norm of a Matrix
Before we introduce the Singular Value Decomposition, let us work through some preliminaries to motivate the idea. The first is to provide an answer to the question “How ‘big’ is a matrix?” There are many ways to interpret and answer this question, but a substantial (and useful) answer should involve more than just the dimensions of the matrix. A good measure of the size of a matrix, which we will refer to as the norm of the matrix, should take into account the action of the linear transformation defined by the matrix on vectors. This then will lead to questions about how difficult or easy it is to solve a matrix equation \(A \vx = \vb\text{.}\)
If we want to incorporate the action of a matrix \(A\) into a calculation of the norm of \(A\text{,}\) we might think of measuring how much \(A\) can change a vector \(\vx\text{.}\) This could lead us to using \(||A\vx||\) as some sort of measure of a norm of \(A\text{.}\) However, since \(||A (c\vx)|| = |c| \ ||A\vx||\) for any scalar \(c\text{,}\) scaling \(\vx\) by a large scalar will produce a large norm, so this is not a viable definition of a norm. We could instead measure the relative effect that \(A\) has on a vector \(\vx\) as \(\ds \frac{||A\vx||}{||\vx||}\text{,}\) since this ratio does not change when \(\vx\) is multiplied by a scalar. The largest of all of these ratios would provide a good sense of how much \(A\) can change vectors. Thus, we define the operator norm of a matrix \(A\) as follows.
Definition 29.1.
The operator norm of a matrix \(A\) is
Due to the linearity of matrix multiplication, we can restrict ourselves to unit vectors for an equivalent definition of the operator norm of the matrix \(A\) as
Preview Activity 29.1.
(a)
Determine \(||A||\) if \(A\) is the zero matrix.
(b)
Determine \(||I_n||\text{,}\) where \(I_n\) is the \(n \times n\) identity matrix.
(c)
Let \(A = \left[ \begin{array}{cc} 1\amp 0\\0\amp 2 \end{array} \right]\text{.}\) Find \(||A||\text{.}\) Justify your answer. (Hint: \(x_1^2+4x_2^2 \leq 4(x_1^2 + x_2^2)\text{.}\))
(d)
If \(P\) is an orthogonal matrix, what is \(||P||\text{?}\) Why?
The definition of the operator norm tells us that the size of the action of an \(m \times n\) matrix is determined by its action on the unit sphere in \(\R^n\) (the unit sphere is the set of terminal points of unit vectors). Let us consider two examples.
Example 29.2.
Let \(A = \left[ \begin{array}{cc} 2\amp 1 \\ 2\amp 5 \end{array} \right]\text{.}\) We can draw a graph to see the action of \(A\) on the unit circle. A picture of the set \(\{A\vx \ : \ ||\vx|| = 1\}\) is shown in Figure 29.3.
It appears that \(A\) transforms the unit circle into an ellipse. To find \(||A||\) we want to maximize \(||A\vx||\) for \(\vx\) on the unit circle. This is the same as maximizing
Now \(A^{\tr}A = \left[ \begin{array}{cc} 8\amp 12 \\ 12\amp 26 \end{array} \right]\) is a symmetric matrix, so we can orthogonally diagonalize \(A^{\tr}A\text{.}\) The eigenvalues of \(A^{\tr}A\) are 32 and 2. Let \(P = [\vu_1 \ \vu_2]\text{,}\) where \(\vu_1=\left[ \frac{\sqrt{5}}{5} \ \frac{2\sqrt{5}}{5} \right]^{\tr}\) is a unit eigenvector of \(A^{\tr}A\) with eigenvalue 32 and \(\vu_2=\left[ -\frac{2\sqrt{5}}{5} \ \frac{\sqrt{5}}{5} \right]^{\tr}\) is a unit eigenvector of \(A^{\tr}A\) with eigenvalue 2. Then \(P\) is an orthogonal matrix such that \(P^{\tr}(A^{\tr}A)P = \left[ \begin{array}{cc} 32\amp 0 \\ 0\amp 2 \end{array} \right] = D\text{.}\) It follows that
Now \(P^{\tr}\) is orthogonal, so \(||P^{\tr}\vx|| = ||\vx||\) and \(P^{\tr}\) maps the unit circle to the unit circle. Moreover, if \(\vx\) is on the unit circle, then \(\vy = P\vx\) is also on the unit circle and \(P^{\tr}\vy = P^{\tr}P\vx = \vx\text{.}\) So every point \(\vx\) on the unit circle corresponds to a point \(P\vx\) on the unit circle. Thus, the forms \(\vx^{\tr} (A^{\tr}A) \vx\) and \((P^{\tr}\vx)^{\tr} D (P^{\tr}\vx)\) take on exactly the same values over all points on the unit circle. Now we just need to find the maximum value of \((P^{\tr}\vx)^{\tr} D (P^{\tr}\vx)\text{.}\) This turns out to be relatively easy since \(D\) is a diagonal matrix.
Let's simplify the notation. Let \(\vy = P^{\tr}\vx\text{.}\) Then our job is to maximize \(\vy^{\tr}D\vy\text{.}\) If \(\vy = [y_1 \ y_2]^{\tr}\text{,}\) then
We want to find the maximum value of this expression for \(\vy\) on the unit circle. Note that \(2y_2^2 \leq 32y_2^2\) and so
Since \([1 \ 0]^{\tr}\) is on the unit circle, the expression \(32y_1^2 + 2y_2^2\) attains the value 32 at some point on the unit circle, so 32 is the maximum value of \(\vy^{\tr} D \vy\) over all \(\vy\) on the unit circle. While we are at it, we can similarly find the minimum value of \(\vy^{\tr} D \vy\) for \(\vy\) on the unit circle. Since \(2y_1^2 \leq 32y_1^2\) we see that
Since the expression \(\vy^{\tr} D \vy\) attains the value 2 at \([0 \ 1]^{\tr}\) on the unit circle, we can see that \(\vy^{\tr} D \vy\) attains the minimum value of 2 on the unit circle.
Now we can return to the expression \(\vx^{\tr} (A^{\tr}A) \vx\text{.}\) Since \(\vy^{\tr} D \vy\) assumes the same values as \(\vx^{\tr} (A^{\tr}A) \vx\text{,}\) we can say that the maximum value of \(\vx^{\tr} (A^{\tr}A) \vx\) for \(\vx\) on the unit circle is 32 (and the minimum value is 2). Moreover, the quadratic form \((P^{\tr}\vx)^{\tr} D (P^{\tr}\vx)\) assumes its maximum value when \(P^{\tr}\vx = [1 \ 0]^{\tr}\) or \([-1 \ 0]^{\tr}\text{.}\) Thus, the form \(\vx^{\tr} (A^{\tr}A) \vx\) assumes its maximum value at the vector \(\vx = P[1 \ 0 ]^{\tr} = \vu_1\) or \(-\vu_1\text{.}\) Similarly, the quadratic form \(\vx^{\tr} (A^{\tr}A) \vx\) attains its minimum value at \(P[0 \ 1]^{\tr} = \vu_2\) or \(-\vu_2\text{.}\) We conclude that \(||A|| = \sqrt{32}\text{.}\)
Figure 29.4 shows the image of the unit circle under the action of \(A\) along with the vectors \(A\vu_1\) and \(A\vu_2\text{,}\) where \(\vu_1, \vu_2\) are the two unit eigenvectors of \(A^{\tr}A\text{.}\) The picture also supports the conclusion that \(||A\vx||\) assumes its maximum and minimum values over points \(\vx\) on the unit circle at \(\vu_1\) and \(\vu_2\text{.}\)
IMPORTANT NOTE 1.
What we have just argued is that the maximum value of \(||A\vx||\) for \(\vx\) on the unit sphere in \(\R^n\) is the square root of the largest eigenvalue of \(A^{\tr}A\) and occurs at a corresponding unit eigenvector.
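This is easy to check numerically. The following is a short computational sketch (in Python with NumPy; the code and variable names are ours, not part of the text) that verifies the claim for the matrix of Example 29.2: the largest eigenvalue of \(A^{\tr}A\) is 32, and a brute-force search over unit vectors gives the same maximum value \(\sqrt{32}\) for \(||A\vx||\text{.}\)

```python
# Sketch: check IMPORTANT NOTE 1 for the matrix A of Example 29.2.
import numpy as np

A = np.array([[2.0, 1.0],
              [2.0, 5.0]])

# Eigenvalues of the symmetric matrix A^T A (numpy.linalg.eigh returns them
# in increasing order).
eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
print(eigenvalues)                # approximately [ 2. 32.]
print(np.sqrt(eigenvalues[-1]))   # approximately 5.6569, that is, sqrt(32)

# Brute-force comparison: evaluate ||A x|| at many points x on the unit circle.
t = np.linspace(0, 2 * np.pi, 100000)
unit_vectors = np.vstack([np.cos(t), np.sin(t)])
print(np.linalg.norm(A @ unit_vectors, axis=0).max())   # again about 5.6569
```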
Example 29.5.
This same process works for matrices other than \(2 \times 2\) ones. For example, consider \(A = \left[ \begin{array}{rrr} -2\amp 8\amp 20 \\ 14\amp 19\amp 10 \end{array} \right]\text{.}\) In this case \(A\) maps \(\R^3\) to \(\R^2\text{.}\) The image of the unit sphere \(\{\vx \in \R^3 : ||\vx|| = 1\}\) under left multiplication by \(A\) is a filled ellipse as shown in Figure 29.6.
As with the previous example, the norm of \(A\) is the square root of the maximum value of \(\vx^{\tr} (A^{\tr}A) \vx\) and this maximum value is the dominant eigenvalue of \(A^{\tr}A = \left[ \begin{array}{ccc} 200\amp 250\amp 100\\ 250\amp 425\amp 350\\ 100\amp 350\amp 500 \end{array} \right]\text{.}\) The eigenvalues of \(A^{\tr}A\) are \(\lambda_1 = 900\text{,}\) \(\lambda_2 = 225\text{,}\) and \(\lambda_3 = 0\) with corresponding unit eigenvectors \(\vu_1=\left[ \frac{1}{3} \ \frac{2}{3} \ \frac{2}{3} \right]^{\tr}\text{,}\) \(\vu_2=\left[ -\frac{2}{3} \ -\frac{1}{3} \ \frac{2}{3} \right]^{\tr}\text{,}\) and \(\vu_3=\left[ \frac{2}{3} \ -\frac{2}{3} \ \frac{1}{3} \right]^{\tr}\text{.}\) So in this case we have \(||A|| = \sqrt{900} = 30\text{.}\) The transformation defined by matrix multiplication by \(A\) from \(\R^3\) to \(\R^2\) has a one-dimensional kernel which is spanned by the eigenvector corresponding to \(\lambda_3\text{.}\) The image of the transformation is 2-dimensional, and the image of the unit sphere is a filled ellipse where \(A\vu_1\) gives the major axis of the ellipse and \(A\vu_2\) gives the minor axis. Essentially, the square roots of the eigenvalues of \(A^{\tr}A\) tell us how \(A\) stretches the image space in each direction.
IMPORTANT NOTE 2.
We have just argued that the image of the unit \(n\)-sphere under the action of an \(m \times n\) matrix is an ellipsoid in \(\R^m\) stretched the greatest amount, \(\sqrt{\lambda_1}\text{,}\) in the direction of an eigenvector for the largest eigenvalue (\(\lambda_1\)) of \(A^{\tr}A\text{;}\) the next greatest amount, \(\sqrt{\lambda_2}\text{,}\) in the direction of a unit eigenvector for the second largest eigenvalue (\(\lambda_2\)) of \(A^{\tr}A\text{;}\) and so on.
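Here is a similar computational sketch (again NumPy; not part of the text) for the matrix of Example 29.5, confirming that \(||A\vu_i|| = \sqrt{\lambda_i}\) for each unit eigenvector \(\vu_i\) of \(A^{\tr}A\text{,}\) so the stretch factors are 30, 15, and 0.

```python
# Sketch: stretch factors sqrt(lambda_i) for the matrix A of Example 29.5.
import numpy as np

A = np.array([[-2.0,  8.0, 20.0],
              [14.0, 19.0, 10.0]])

# Eigenvalues come back in increasing order (0, 225, 900); reverse them.
eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
for lam, u in zip(eigenvalues[::-1], eigenvectors[:, ::-1].T):
    # ||A u|| should equal sqrt(lambda); the eigenvector for 0 spans the kernel.
    print(round(np.sqrt(max(lam, 0.0)), 4), round(np.linalg.norm(A @ u), 4))
```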
Activity 29.2.
Let \(A = \left[ \begin{array}{rc} 0\amp 5\\ 4\amp 3 \\ -2\amp 1 \end{array} \right]\text{.}\) Then \(A^{\tr}A = \left[ \begin{array}{cc} 20\amp 10\\ 10\amp 35 \end{array} \right]\text{.}\) The eigenvalues of \(A^{\tr}A\) are \(\lambda_1 = 40\) and \(\lambda_2 = 15\) with respective eigenvectors \(\vv_1 = \left[ \begin{array}{c} \frac{1}{2} \\ 1 \end{array} \right]\) and \(\vv_2 = \left[ \begin{array}{r} -2 \\ 1 \end{array} \right]\text{.}\)
(a)
Find \(||A||\text{.}\)
(b)
Find a unit vector \(\vx\) at which \(||A\vx||\) assumes its maximum value.
Subsection The SVD
The Singular Value Decomposition (SVD) is essentially a concise statement of what we saw in the previous subsection, and it works for any matrix. We will uncover the SVD in this subsection.
Preview Activity 29.3.
Let \(A = \left[ \begin{array}{ccc} 1\amp 1\amp 0\\ 0\amp 1\amp 1 \end{array} \right]\text{.}\) Since \(A\) is not square, we cannot diagonalize \(A\text{.}\) However, the matrix
is a symmetric matrix and can be orthogonally diagonalized. The eigenvalues of \(A^{\tr}A\) are 3, 1, and 0 with corresponding eigenvectors
respectively. Use appropriate technology to do the following.
(a)
Find an orthogonal matrix \(V = [\vv_1 \ \vv_2 \ \vv_3]\) that orthogonally diagonalizes \(A^{\tr}A\text{,}\) where
(b)
For \(i=1,2\text{,}\) let \(\vu_i = \frac{A \vv_i}{||A \vv_i||}\text{.}\) Find each \(\vu_i\text{.}\) Why don't we define \(\vu_3\) in this way?
(c)
Let \(U = [\vu_1 \ \vu_2]\text{.}\) What kind of matrix is \(U\text{?}\) Explain.
(d)
Calculate the matrix product \(U^{\tr}AV\text{.}\) What do you notice? How is this similar to the eigenvector decomposition of a matrix?
Preview Activity 29.3 contains the basic ideas behind the Singular Value Decomposition. Let \(A\) be an \(m \times n\) matrix with real entries. Note that \(A^{\tr}A\) is a symmetric \(n \times n\) matrix and, hence, it can be orthogonally diagonalized. Let \(V = [\vv_1 \ \vv_2 \ \vv_3 \ \cdots \ \vv_n ]\) be an \(n \times n\) orthogonal matrix whose columns form an orthonormal set of eigenvectors for \(A^{\tr}A\text{.}\) For each \(i\text{,}\) let \((A^{\tr}A)\vv_i = \lambda_i \vv_i\text{.}\) We know
Now notice that for each \(i\) we have
so \(\lambda_i \geq 0\text{.}\) Thus, the matrix \(A^{\tr}A\) has no negative eigenvalues. We can always arrange the eigenvectors and eigenvalues of \(A^{\tr}A\) so that
Also note that
if \(i \neq j\text{.}\) So the set \(\{A\vv_1, A\vv_2, \ldots, A\vv_n\}\) is an orthogonal set in \(\R^m\text{.}\) Each of the vectors \(A\vv_i\) is in \(\Col A\text{,}\) and so \(\{A\vv_1, A\vv_2, \ldots, A\vv_n\}\) is an orthogonal subset of \(\Col A\text{.}\) It is possible that \(A\vv_i = \vzero\) for some of the \(\vv_i\) (if \(A^{\tr}A\) has 0 as an eigenvalue). Let \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_r\) be the eigenvectors corresponding to the nonzero eigenvalues. Then the set
is a linearly independent set of nonzero orthogonal vectors in \(\Col A\text{.}\) Now we will show that \(\CB\) is a basis for \(\Col A\text{.}\) Let \(\vy\) be a vector in \(\Col A\text{.}\) Then \(\vy = A\vx\) for some vector \(\vx\) in \(\R^n\text{.}\) Recall that the vectors \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_n\) form an orthonormal basis of \(\R^n\text{,}\) so
for some scalars \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_n\text{.}\) Since \(A\vv_j = \vzero\) for \(r+1 \leq j \leq n\) we have
So \(\Span \ \CB = \Col A\) and \(\CB\) is an orthogonal basis for \(\Col A\text{.}\)
Now we are ready to find the Singular Value Decomposition of \(A\text{.}\) First we create an orthonormal basis \(\{\vu_1, \vu_2, \ldots, \vu_r\}\) for \(\Col A\) by normalizing the vectors \(A\vv_i\text{.}\) So we let
for \(i\) from 1 to \(r\text{.}\)
Remember from (29.1) that \(||A\vv_i||^2 = \lambda_i\text{,}\) so if we let \(\sigma_i = \sqrt{\lambda_i}\text{,}\) then we have
We ordered the \(\lambda_i\) so that \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\text{,}\) so we also have
The scalars \(\sigma_1\text{,}\) \(\sigma_2\text{,}\) \(\ldots\text{,}\) \(\sigma_n\) are called the singular values of \(A\text{.}\)
Definition 29.7.
Let \(A\) be an \(m \times n\) matrix. The singular values of \(A\) are the square roots of the eigenvalues of \(A^{\tr}A\text{.}\)
The vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_r\) are \(r\) orthonormal vectors in \(\R^m\text{.}\) We can extend the set \(\{\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_r\}\) to an orthonormal basis \(\CC = \{\vu_1, \vu_2, \ldots, \vu_r, \vu_{r+1}, \vu_{r+2}, \ldots, \vu_m\}\) of \(\R^m\text{.}\) Recall that \(A \vv_i = \sigma_i \vu_i\) for \(1 \leq i \leq r\) and \(A \vv_j = \vzero\) for \(r+1 \leq j \leq n\text{,}\) so
We can write the matrix \([\sigma_1 \vv_1 \ \sigma_2 \vv_2 \ \cdots \ \sigma_r \vv_r \ \vzero \ \vzero \ \cdots \ \vzero]\) in another way. Let \(\Sigma\) be the \(m\times n\) matrix defined as
Now
So if \(U = [\vu_1 \ \vu_2 \ \cdots \ \vu_m]\text{,}\) then
Since \(V\) is an orthogonal matrix, we have that
This is the Singular Value Decomposition of \(A\text{.}\)
Theorem 29.8. The Singular Value Decomposition.
Let \(A\) be an \(m \times n\) matrix of rank \(r\text{.}\) There exist an \(m \times m\) orthogonal matrix \(U\text{,}\) an \(n \times n\) orthogonal matrix \(V\text{,}\) and an \(m \times n\) matrix \(\Sigma\) whose first \(r\) diagonal entries are the singular values \(\sigma_1\text{,}\) \(\sigma_2\text{,}\) \(\ldots\text{,}\) \(\sigma_r\) and whose other entries are 0, such that
SVD Summary.
A Singular Value Decomposition of an \(m \times n\) matrix \(A\) of rank \(r\) can be found as follows.
-
Find an orthonormal basis \(\{\vv_1, \vv_2, \vv_3, \ldots, \vv_n\}\) of eigenvectors of \(A^{\tr}A\) such that \((A^{\tr}A) \vv_i = \lambda_i \vv_i\) for \(i\) from 1 to \(n\) with \(\lambda_1 \geq \lambda_{2} \geq \cdots \geq \lambda_n \geq 0\) and with the first \(r\) eigenvalues being positive. The vectors \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\vv_3\text{,}\) \(\ldots\text{,}\) \(\vv_n\) are the right singular vectors of \(A\text{.}\)
-
Let
\begin{equation*} V= [\vv_1 \ \vv_2 \ \vv_3 \ \cdots \ \vv_n ]\text{.} \end{equation*}Then \(V\) orthogonally diagonalizes \(A^{\tr}A\text{.}\)
-
The singular values of \(A\) are the numbers \(\sigma_i\text{,}\) where \(\sigma_i = \sqrt{\lambda_i} > 0\) for \(i\) from 1 to \(r\text{.}\) Let \(\Sigma\) be the \(m \times n\) matrix
\begin{equation*} \Sigma = \left[ \begin{array}{ccccc|c} \sigma_1\amp \amp \amp \amp 0\amp \\ \amp \sigma_2\amp \amp \amp \amp 0 \\ \amp \amp \sigma_3\amp \amp \amp \\ \amp \amp \amp \ddots \amp \amp \\ 0\amp \amp \amp \amp \sigma_r \\ \hline \amp \amp 0\amp \amp \amp 0 \end{array} \right] \end{equation*} For \(i\) from 1 to \(r\text{,}\) let \(\vu_i = \frac{A\vv_i}{||A\vv_i||}\text{.}\) Then the set \(\{\vu_1, \vu_2, \ldots, \vu_r\}\) forms an orthonormal basis of \(\Col A\text{.}\)
-
Extend the set \(\{\vu_1, \vu_2, \ldots, \vu_r\}\) to an orthonormal basis
\begin{equation*} \{\vu_1, \vu_2, \ldots, \vu_r, \vu_{r+1}, \vu_{r+2}, \ldots, \vu_m\} \end{equation*}of \(\R^m\text{.}\) Let
\begin{equation*} U = [\vu_1 \ \vu_2 \ \cdots \ \vu_m]\text{.} \end{equation*}The vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_m\) are the left singular vectors of \(A\text{.}\)
Then \(A = U \Sigma V^{\tr}\) is a singular value decomposition of \(A\text{.}\)
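The steps in this summary translate directly into a computation. The following is a minimal computational sketch (Python with NumPy; the code, the tolerance 1e-10, and the use of a QR factorization to extend to an orthonormal basis of \(\R^m\) are our choices, not part of the text) that carries out the summary for the matrix from Example 29.9 and then checks the result against a built-in SVD routine.

```python
# Sketch: build a singular value decomposition by following the SVD Summary.
import numpy as np

A = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 0.0]])
m, n = A.shape

# Step 1: orthonormal eigenvectors of A^T A, eigenvalues in decreasing order.
eigenvalues, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]

# Step 2: singular values sigma_i = sqrt(lambda_i) and the m x n matrix Sigma.
sigma = np.sqrt(np.clip(eigenvalues, 0.0, None))
r = int(np.sum(sigma > 1e-10))          # rank of A
Sigma = np.zeros((m, n))
Sigma[:r, :r] = np.diag(sigma[:r])

# Step 3: u_i = A v_i / ||A v_i|| for i = 1, ..., r, then (if r < m) extend to
# an orthonormal basis of R^m; here the extension comes from a QR factorization.
U_r = (A @ V[:, :r]) / sigma[:r]
Q, _ = np.linalg.qr(np.hstack([U_r, np.eye(m)]))
U = np.hstack([U_r, Q[:, r:m]])

print(np.allclose(A, U @ Sigma @ V.T))                        # True
print(np.allclose(sigma[:min(m, n)], np.linalg.svd(A)[1]))    # True
```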
Activity 29.4.
Let \(A = \left[ \begin{array}{rc} 0\amp 5\\ 4\amp 3 \\ -2\amp 1 \end{array} \right]\text{.}\) Then \(A^{\tr}A = \left[ \begin{array}{cc} 20\amp 10\\ 10\amp 35 \end{array} \right]\text{.}\) The eigenvalues of \(A^{\tr}A\) are \(\lambda_1 = 40\) and \(\lambda_2 = 15\) with respective eigenvectors \(\vw_1 = \left[ \begin{array}{c} 1 \\ 2 \end{array} \right]\) and \(\vw_2 = \left[ \begin{array}{r} -2 \\ 1 \end{array} \right]\text{.}\)
(a)
Find an orthonormal basis \(\{\vv_1, \vv_2, \vv_3, \ldots, \vv_n\}\) of eigenvectors of \(A^{\tr}A\text{.}\) What is \(n\text{?}\) Find the matrix \(V\) in a SVD for \(A\text{.}\)
(b)
Find the singular values of \(A\text{.}\) What is the rank \(r\) of \(A\text{?}\) Why?
(c)
What are the dimensions of the matrix \(\Sigma\) in a SVD of \(A\text{?}\) Find \(\Sigma\text{.}\)
(d)
Find the vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_r\text{.}\) If necessary, extend this set to an orthonormal basis
of \(\R^m\text{.}\)
(e)
Find the matrix \(U\) so that \(A = U \Sigma V^{\tr}\) is a SVD for \(A\text{.}\)
There is another way we can write this SVD of \(A\text{.}\) Let the \(m \times n\) matrix \(A\) have a singular value decomposition \(U \Sigma V^{\tr}\text{,}\) where
Since \(A = U \Sigma V^{\tr}\) we see that
This is called an outer product decomposition of \(A\) and tells us everything we learned above about the action of the matrix \(A\) as a linear transformation. Each of the products \(\vu_i\vv_i^{\tr}\) is a rank 1 matrix (see Exercise 9), and \(||A\vv_1|| = \sigma_1\) is the largest value \(A\) takes on the unit \(n\)-sphere, \(||A\vv_2|| = \sigma_2\) is the next largest dilation of the unit \(n\)-sphere, and so on. An outer product decomposition allows us to approximate \(A\) with smaller rank matrices. For example, the matrix \(\sigma_1 \vu_1\vv_1^{\tr}\) is the best rank 1 approximation to \(A\text{,}\) \(\sigma_1 \vu_1\vv_1^{\tr} + \sigma_2 \vu_2\vv_2^{\tr}\) is the best rank 2 approximation, and so on. This will be very useful in applications, as we will see in the next section.
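The following computational sketch (NumPy; not part of the text) builds the outer product decomposition for the matrix from Example 29.9 one rank 1 term at a time; after each term is added, the operator norm of the error \(A - A_k\) drops to the next singular value.

```python
# Sketch: outer product decomposition and best rank-k approximations.
import numpy as np

A = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 0.0]])

U, s, Vt = np.linalg.svd(A)       # s = [3, 2, 1], in decreasing order

A_k = np.zeros_like(A)
for k in range(len(s)):
    # Add the rank 1 piece sigma_{k+1} u_{k+1} v_{k+1}^T (0-based index k).
    A_k = A_k + s[k] * np.outer(U[:, k], Vt[k, :])
    # Operator norm of the error; it equals the next remaining singular value.
    print(k + 1, np.linalg.norm(A - A_k, ord=2))
```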
Subsection SVD and the Null, Column, and Row Spaces of a Matrix
We conclude this section with a short discussion of how a singular value decomposition relates fundamental subspaces of a matrix. We have seen that the vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_r\) in an SVD for an \(m \times n\) matrix \(A\) form a basis for \(\Col A\text{.}\) Recall also that \(A \vv_j = \vzero\) for \(r+1 \leq j \leq n\text{.}\) Since \(\dim(\Nul A) + \dim(\Col A) = n\text{,}\) it follows that the vectors \(\vv_{r+1}\text{,}\) \(\vv_{r+2}\text{,}\) \(\ldots\text{,}\) \(\vv_n\) form a basis for \(\Nul A\text{.}\) As you will show in the exercises, the set \(\{\vv_1, \vv_2, \ldots, \vv_r\}\) is a basis for \(\Row A\text{.}\) Thus, an SVD for a matrix \(A\) tells us about three fundamental vector spaces related to \(A\text{.}\)
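As a quick computational sketch (NumPy; not part of the text), all three bases can be read directly from a full SVD; the checks below use the matrix from Example 29.9.

```python
# Sketch: bases for Col A, Row A, and Nul A read off from a full SVD.
import numpy as np

A = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 0.0]])

U, s, Vt = np.linalg.svd(A)        # full SVD: U is 3 x 3, Vt is 4 x 4
r = int(np.sum(s > 1e-10))         # rank of A

col_basis = U[:, :r]               # u_1, ..., u_r: orthonormal basis of Col A
row_basis = Vt[:r, :].T            # v_1, ..., v_r: orthonormal basis of Row A
nul_basis = Vt[r:, :].T            # v_{r+1}, ..., v_n: orthonormal basis of Nul A

print(np.allclose(A @ nul_basis, 0))                  # True: A v_j = 0
print(np.allclose(A, A @ row_basis @ row_basis.T))    # True: rows of A lie in Row A
print(np.allclose(A, col_basis @ col_basis.T @ A))    # True: columns of A lie in Col A
```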
Subsection Examples
What follows are worked examples that use the concepts from this section.
Example 29.9.
Let \(A = \left[ \begin{array}{cccc} 2 \amp 0 \amp 0 \amp 0 \\ 0 \amp 2 \amp 1 \amp 0 \\ 0\amp 1 \amp 2 \amp 0 \end{array} \right]\text{.}\)
(a)
Find a singular value decomposition for \(A\text{.}\) You may use technology to find eigenvalues and eigenvectors of matrices.
Solution.
With \(A\) as given, we have \(A^{\tr}A = \left[ \begin{array}{cccc} 4 \amp 0 \amp 0 \amp 0 \\ 0 \amp 5 \amp 4 \amp 0 \\ 0 \amp 4 \amp 5 \amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.}\) Technology shows that the eigenvalues of \(A^{\tr}A\) are \(\lambda_1 = 9\text{,}\) \(\lambda_2 = 4\text{,}\) \(\lambda_3 = 1\text{,}\) and \(\lambda_4 = 0\) with corresponding orthonormal eigenvectors \(\vv_1 = \frac{1}{\sqrt{2}}[0 \ 1 \ 1 \ 0]^{\tr}\text{,}\) \(\vv_2 = [1 \ 0 \ 0 \ 0]^{\tr}\text{,}\) \(\vv_3 = \frac{1}{\sqrt{2}}[0 \ -1 \ 1 \ 0]^{\tr}\text{,}\) and \(\vv_4 = [0 \ 0 \ 0 \ 1]^{\tr}\text{.}\) This makes \(V = [\vv_1 \ \vv_2 \ \vv_3 \ \vv_4]\text{.}\) The singular values of \(A\) are \(\sigma_1 = \sqrt{9} = 3\text{,}\) \(\sigma_2 = \sqrt{4}= 2\text{,}\) \(\sigma_3 = \sqrt{1} = 1\text{,}\) and \(\sigma_4 = 0\text{,}\) so \(\Sigma\) is the \(3 \times 4\) matrix with the nonzero singular values along the diagonal and zeros everywhere else. Finally, we define the vectors \(\vu_i\) as \(\vu_i = \frac{1}{||A \vv_i||} A\vv_i\text{.}\) Again, technology gives us \(\vu_1 = \frac{1}{\sqrt{2}}[0 \ 1 \ 1]^{\tr}\text{,}\) \(\vu_2 = [1 \ 0 \ 0]^{\tr}\text{,}\) and \(\vu_3 = \frac{1}{\sqrt{2}}[0 \ -1 \ 1]^{\tr}\text{.}\) Thus, a singular value decomposition of \(A\) is \(U \Sigma V^{\tr}\text{,}\) where
(b)
Use the singular value decomposition to find a basis for \(\Col A\text{,}\) \(\Row A\text{,}\) and \(\Nul A\text{.}\)
Solution.
Recall that the right singular vectors of an \(m \times n\) matrix \(A\) of rank \(r\) form an orthonormal basis \(\{\vv_1, \vv_2, \vv_3, \ldots, \vv_n\}\) of eigenvectors of \(A^{\tr}A\) such that \((A^{\tr}A) \vv_i = \lambda_i \vv_i\) for \(i\) from 1 to \(n\) with \(\lambda_1 \geq \lambda_{2} \geq \cdots \geq \lambda_n \geq 0\text{.}\) These vectors are the columns of the matrix \(V = [\vv_1 \ \vv_2 \ \cdots \ \vv_n]\) in a singular value decomposition of \(A\text{.}\) For \(i\) from 1 to \(r\text{,}\) we let \(\vu_i = \frac{A\vv_i}{||A\vv_i||}\text{.}\) Then the set \(\{\vu_1, \vu_2, \ldots, \vu_r\}\) forms an orthonormal basis of \(\Col A\text{.}\) We extend this set \(\{\vu_1, \vu_2, \ldots, \vu_r\}\) to an orthonormal basis \(\{\vu_1, \vu_2, \ldots, \vu_r, \vu_{r+1}, \vu_{r+2}, \ldots, \vu_m\}\) of \(\R^m\text{.}\) Recall also that \(A \vv_j = \vzero\) for \(r+1 \leq j \leq n\text{.}\) Since \(\dim(\Nul A) + \dim(\Col A) = n\text{,}\) it follows that the vectors \(\vv_{r+1}\text{,}\) \(\vv_{r+2}\text{,}\) \(\ldots\text{,}\) \(\vv_n\) form a basis for \(\Nul A\text{.}\) Finally, the set \(\{\vv_1, \vv_2, \ldots, \vv_r\}\) is a basis for \(\Row A\text{.}\) So in our example, we have \(m = 3\text{,}\) \(n = 4\text{,}\) \(\vv_1 = \frac{1}{\sqrt{2}}[0 \ 1 \ 1 \ 0]^{\tr}\text{,}\) \(\vv_2 = [1 \ 0 \ 0 \ 0]^{\tr}\text{,}\) \(\vv_3 = \frac{1}{\sqrt{2}}[0 \ -1 \ 1 \ 0]^{\tr}\text{,}\) and \(\vv_4 = [0 \ 0 \ 0 \ 1]^{\tr}\text{.}\) Since the singular values of \(A\) are \(3\text{,}\) \(2\text{,}\) \(1\text{,}\) and \(0\text{,}\) it follows that \(r = \rank(A) = 3\text{.}\) We also have \(\vu_1 = \frac{1}{\sqrt{2}}[0 \ 1 \ 1]^{\tr}\text{,}\) \(\vu_2 = [1 \ 0 \ 0]^{\tr}\text{,}\) and \(\vu_3 = \frac{1}{\sqrt{2}}[ 0 \ -1 \ 1]^{\tr}\text{.}\) So
is a basis for \(\Row A\) and
is a basis for \(\Nul A\text{.}\) Finally, the set
is a basis for \(\Col A\text{.}\)
Example 29.10.
Let
The eigenvalues of \(A^{\tr}A\) are \(\lambda_1 = 144\text{,}\) \(\lambda_2 = 36\text{,}\) and \(\lambda_3=0\) with corresponding eigenvectors
In addition,
(a)
Find orthogonal matrices \(U\) and \(V\text{,}\) and the matrix \(\Sigma\text{,}\) so that \(U \Sigma V^{\tr}\) is a singular value decomposition of \(A\text{.}\)
Solution.
Normalizing the eigenvectors \(\vw_1\text{,}\) \(\vw_2\text{,}\) and \(\vw_3\) to unit eigenvectors \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\text{,}\) respectively, gives us an orthogonal matrix
Now \(A \vv_i = A \frac{\vw_i}{||\vw_i||} = \frac{1}{|| \vw_i ||} A \vw_i\text{,}\) so normalizing the vectors \(A \vv_1\) and \(A \vv_2\) gives us vectors
that are the first two columns of our matrix \(U\text{.}\) Given that \(U\) is a \(4 \times 4\) matrix, we need to find two other vectors orthogonal to \(\vu_1\) and \(\vu_2\) that will combine with \(\vu_1\) and \(\vu_2\) to form an orthogonal basis for \(\R^4\text{.}\) Letting \(\vz_1 = [1 \ 1 \ 1 \ 1]^{\tr}\text{,}\) \(\vz_2 = [1 \ -1 \ -1 \ 1]^{\tr}\text{,}\) \(\vz_3 = [1 \ 0 \ 0 \ 0]^{\tr}\text{,}\) and \(\vz_4 = [0 \ 1 \ 0 \ 1]^{\tr}\text{,}\) a computer algebra system shows that the reduced row echelon form of the matrix \([\vz_1 \ \vz_2 \ \vz_3 \ \vz_4]\) is \(I_4\text{,}\) so that vectors \(\vz_1\text{,}\) \(\vz_2\text{,}\) \(\vz_3\text{,}\) \(\vz_4\) are linearly independent. Letting \(\vw_1 = \vz_1\) and \(\vw_2 = \vz_2\text{,}\) the Gram-Schmidt process shows that the set \(\{\vw_1, \vw_2, \vw_3, \vw_4\}\) is an orthogonal basis for \(\R^4\text{,}\) where
and (using \([1 \ 0 \ 0 \ -1]^{\tr}\) for \(\vw_3\))
The set \(\{\vu_1, \vu_2, \vu_3, \vu_4\}\) where \(\vu_1 = \frac{1}{2}[1 \ 1 \ 1 \ 1]^{\tr}\text{,}\) \(\vu_2 = \frac{1}{2}[1 \ -1 \ -1 \ 1]^{\tr}\text{,}\) \(\vu_3 = \frac{1}{\sqrt{2}}[1 \ 0 \ 0 \ -1]^{\tr}\) and \(\vu_4 = \frac{1}{\sqrt{2}}[0 \ 1 \ -1 \ 0]^{\tr}\) is an orthonormal basis for \(\R^4\) and we can let
The singular values of \(A\) are \(\sigma_1 = \sqrt{\lambda_1} = 12\) and \(\sigma_2 = \sqrt{\lambda_2} = 6\text{,}\) and so
Therefore, a singular value decomposition of \(A\) is \(U \Sigma V^{\tr}\) or
(b)
Determine the best rank 1 approximation to \(A\text{.}\)
Solution.
The outer product decomposition of \(A\) is
So the best rank one approximation to \(A\) is
Note that the rows in this rank one approximation are the averages of the two distinct rows in the matrix \(A\text{,}\) which makes sense considering that this is the closest rank one matrix to \(A\text{.}\)
Subsection Summary
We learned about the singular value decomposition of a matrix.
-
The operator norm of an \(m \times n\) matrix \(A\) is
\begin{equation*} ||A|| = \max_{||\vx|| \neq 0} \frac{||A\vx||}{||\vx||} = \max_{||\vx||=1} ||A\vx||\text{.} \end{equation*}The operator norm tells us that the size of the action of an \(m \times n\) matrix is determined by its action on the unit sphere in \(\R^n\text{.}\)
-
A singular value decomposition of an \(m \times n\) matrix is of the form \(A = U \Sigma V^{\tr}\text{,}\) where
-
\(V= [\vv_1 \ \vv_2 \ \vv_3 \ \cdots \ \vv_n ]\) where \(\{\vv_1, \vv_2, \vv_3, \ldots, \vv_n\}\) is an orthonormal basis of eigenvectors of \(A^{\tr}A\) such that \((A^{\tr}A) \vv_i = \lambda_i \vv_i\) for \(i\) from 1 to \(n\) with \(\lambda_1 \geq \lambda_{2} \geq \cdots \geq \lambda_n \geq 0\text{,}\)
-
\(U = [\vu_1 \ \vu_2 \ \cdots \ \vu_m]\) where \(\vu_i = \frac{A\vv_i}{||A\vv_i||}\) for \(i\) from 1 to \(r\text{,}\) and this orthonormal basis of \(\Col A\) is extended to an orthonormal basis \(\{\vu_1, \vu_2, \ldots, \vu_r, \vu_{r+1}, \vu_{r+2}, \ldots, \vu_m\}\) of \(\R^m\text{,}\)
-
\(\Sigma = \left[ \begin{array}{ccccc|c} \sigma_1\amp \amp \amp \amp 0\amp \\ \amp \sigma_2\amp \amp \amp \amp 0 \\ \amp \amp \sigma_3\amp \amp \amp \\ \amp \amp \amp \ddots \amp \amp \\ 0\amp \amp \amp \amp \sigma_r \\ \hline \amp \amp 0\amp \amp \amp 0 \end{array} \right]\text{,}\) where \(\sigma_i = \sqrt{\lambda_i} > 0\) for \(i\) from 1 to \(r\text{.}\)
-
A singular value decomposition is important in that every matrix has a singular value decomposition, and the SVD has a variety of applications including scientific computing, digital signal processing, image compression, principal component analysis, web searching through latent semantic indexing, and seismology.
-
The vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_r\) in an SVD for an \(m \times n\) matrix \(A\) form a basis for \(\Col A\text{,}\) while the vectors \(\vv_{r+1}\text{,}\) \(\vv_{r+2}\text{,}\) \(\ldots\text{,}\) \(\vv_n\) form a basis for \(\Nul A\text{.}\) Also, the set \(\{\vv_1, \vv_2, \ldots, \vv_r\}\) is a basis for \(\Row A\text{.}\)
-
Let \(A\) have an SVD as in the second bullet. Decomposing \(A\) as
\begin{equation*} A =\sigma_1 \vu_1\vv_1^{\tr} + \sigma_2 \vu_2\vv_2^{\tr} + \sigma_3 \vu_3\vv_3^{\tr} + \cdots + \sigma_r \vu_r\vv_r^{\tr} \end{equation*}is an outer product decomposition of \(A\text{.}\) An outer product decomposition allows us to approximate \(A\) with smaller rank matrices. For example, the matrix \(\sigma_1 \vu_1\vv_1^{\tr}\) is the best rank 1 approximation to \(A\text{,}\) \(\sigma_1 \vu_1\vv_1^{\tr} + \sigma_2 \vu_2\vv_2^{\tr}\) is the best rank 2 approximation, and so on.
Exercises Exercises
1.
Find a singular value decomposition of the following matrices.
(a)
\(\left[ \begin{array}{cc} 1\amp 1\\0\amp 0 \end{array} \right]\)
(b)
\(\left[ \begin{array}{c} 1\\0\\1 \end{array} \right]\)
(c)
\(\left[ \begin{array}{ccc} 1\amp 1\amp 0\\1\amp 0\amp 1 \end{array} \right]\)
(d)
\(\left[ \begin{array}{cc} 1\amp 2\\2\amp 1\\3\amp 1\\1\amp 3 \end{array} \right]\)
(e)
\(\left[ \begin{array}{cccc} 2\amp 0\amp 0\amp 0\\0\amp 2\amp 1\amp 0\\0\amp 1\amp 2\amp 0 \end{array} \right]\)
2.
Let \(A\) be an \(m \times n\) matrix of rank \(r\) with singular value decomposition \(U \Sigma V^{\tr}\text{,}\) where \(U = [ \vu_1 \ \vu_2 \ \cdots \ \vu_m ]\) and \(V = [\vv_1 \ \vv_2 \ \cdots \ \vv_n]\text{.}\) We have seen that the set \(\{ \vu_1, \vu_2, \ldots, \vu_r\}\) is a basis for \(\Col A\text{,}\) and the vectors \(\vv_{r+1}\text{,}\) \(\vv_{r+2}\text{,}\) \(\ldots\text{,}\) \(\vv_n\) form a basis for \(\Nul A\text{.}\) In this exercise we examine the set \(\{\vv_1, \vv_2, \ldots, \vv_r\}\) and determine what this set tells us about \(\Row A\text{.}\)
(a)
Find a singular value decomposition for \(A^{\tr}\text{.}\)
Use the singular value decomposition \(U \Sigma V^{\tr}\) for \(A\text{.}\)
(b)
Explain why the result of (a) shows that the set \(\{\vv_1, \vv_2, \ldots, \vv_r\}\) is a basis for \(\Row A\text{.}\)
3.
Let \(A = \left[ \begin{array}{cc} 1\amp 1 \\ 2\amp 2 \\ 3\amp 3 \end{array} \right]\text{.}\)
(a)
Find the singular values of \(A\text{.}\)
(b)
Find a singular value decomposition of \(A\text{.}\)
(c)
Use a singular value decomposition to find orthonormal bases for the following:
(i)
\(\Nul A\)
(ii)
\(\Col A\)
(iii)
\(\Row A\)
4.
Let \(A\) have the singular value decomposition as in (29.2).
(a)
Show, using (29.2), that \(||A\vv_j|| = \sigma_j\text{.}\)
(b)
Explain why \(||A|| = \sigma_1\text{.}\)
The set \(\{\vv_1, \vv_2, \ldots, \vv_n\}\) is an orthonormal basis of \(\R^n\text{.}\) Use this to show that \(||A \vx||^2 \leq \sigma_1^2\) for any unit vector \(\vx\) in \(\R^n\text{.}\)
5.
Show that \(A\) and \(A^{\tr}\) have the same nonzero singular values. How are their singular value decompositions related?
Find the transpose of an SVD for \(A\text{.}\)
6.
The vectors \(\vv_i\) that form the columns of the matrix \(V\) in a singular value decomposition of a matrix \(A\) are eigenvectors of \(A^{\tr}A\text{.}\) In this exercise we investigate the vectors \(\vu_i\) that make up the columns of the matrix \(U\) in a singular value decomposition of a matrix \(A\) for each \(i\) between 1 and the rank of \(A\text{,}\) and their connection to the matrix \(AA^{\tr}\text{.}\)
(a)
Let \(A = \left[ \begin{array}{ccr} 1\amp 1\amp 0 \\ 0\amp 1\amp -1 \end{array} \right]\text{.}\) A singular value decomposition of \(A\) is \(U \Sigma V^{\tr}\text{,}\) where
(i)
Determine the rank \(r\) of \(A^{\tr}A\) and identify the vectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_r\text{.}\)
(ii)
Calculate \(AA^{\tr} \vu_i\) for each \(i\) between 1 and \(r\text{.}\) How is \(AA^{\tr} \vu_i\) related to \(\vu_i\text{?}\)
(b)
Now we examine the result of part (a) in general. Let \(A\) be an arbitrary matrix. Calculate \(AA^{\tr} \vu_i\) for \(1 \leq i \leq \rank(A)\) and determine specifically how \(AA^{\tr} \vu_i\) is related to \(\vu_i\text{.}\) What does this tell us about the vectors \(\vu_i\) and the matrix \(AA^{\tr}\text{?}\)
(c)
Now show in general that the columns of \(U\) are orthonormal eigenvectors for \(AA^{\tr}\text{.}\) (That is, what can we say about the vectors \(\vu_i\) if \(i > \rank(A)\text{?}\))
7.
If \(A\) is a symmetric matrix with eigenvalues \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0\text{,}\) what is \(||A||\text{?}\) Justify your answer.
8.
Let \(A\) be an \(n \times n\) symmetric matrix.
(a)
Show that if \(\vv\) is an eigenvector of \(A\) with eigenvalue \(\lambda\text{,}\) then \(\vv\) is an eigenvector for \(A^{\tr}A\text{.}\) What is the corresponding eigenvalue?
(b)
Show that if \(\vv\) is an eigenvector of \(A^{\tr}A\) with non-negative eigenvalue \(\lambda\text{,}\) then \(A\vv\) is an eigenvector of \(A^{\tr}A\text{.}\) What is the corresponding eigenvalue?
(c)
Suppose \(U \Sigma V^{\tr}\) is a singular value decomposition of \(A\text{.}\) Explain why \(V \Sigma V^{\tr}\) is also a singular value decomposition of \(A\text{.}\)
9.
Let \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_r\) and \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_r\) be the vectors found in a singular value decomposition of a matrix \(A\text{,}\) where \(r\) is the rank of \(A\text{.}\) Show that \(\vu_i\vv_i^{\tr}\) is a rank 1 matrix for each \(i\text{.}\)
Mimic Exercise 5 in Section 27.
10.
Is it possible for a matrix \(A\) to have a singular value decomposition \(U \Sigma V^{\tr}\) in which \(U = V\text{?}\) If no, explain why. If yes, determine for which matrices we can have \(U = V\text{.}\)
11.
Label each of the following statements as True or False. Provide justification for your response.
(a) True/False.
If \(\sigma\) is a singular value of a matrix \(A\text{,}\) then \(\sigma\) is an eigenvalue of \(A^{\tr}A\text{.}\)
(b) True/False.
A set of right singular vectors of a matrix \(A\) is also a set of left singular vectors of \(A^{\tr}\text{.}\)
(c) True/False.
The transpose of a singular value decomposition of a matrix \(A\) is a singular value decomposition for \(A^{\tr}\text{.}\)
(d) True/False.
Similar matrices have the same singular values.
(e) True/False.
If \(A\) is an \(n \times n\) matrix, then the singular values of \(A^2\) are the squares of the singular values of \(A\text{.}\)
(f) True/False.
The \(\Sigma\) matrix in an SVD of \(A\) is unique.
(g) True/False.
The matrices \(U, V\) in an SVD of \(A\) are unique.
(h) True/False.
If \(A\) is a positive definite matrix, then an orthogonal diagonalization of \(A\) is an SVD of \(A\text{.}\)
Subsection Project: Latent Semantic Indexing
As an elementary example to illustrate the idea behind Latent Semantic Indexing (LSI), consider the problem of creating a program to search a collection of documents for words, or words related to a given word. Document collections are usually very large, but we use a small example for illustrative purposes. A standard example that is given in several publications is the following. Suppose we have nine documents \(c_1\) through \(c_5\) (titles of documents about human-computer interaction) and \(m_1\) through \(m_4\) (titles of graph theory papers) that make up our library:
\(c_1\text{:}\) Human machine interface for ABC computer applications
\(c_2\text{:}\) A survey of user opinion of computer system response time
\(c_3\text{:}\) The EPS user interface management system
\(c_4\text{:}\) System and human system engineering testing of EPS
\(c_5\text{:}\) Relation of user perceived response time to error measurement
\(m_1\text{:}\) The generation of random, binary, ordered trees
\(m_2\text{:}\) The intersection graph of paths in trees
\(m_3\text{:}\) Graph minors IV: Widths of trees and well-quasi-ordering
\(m_4\text{:}\) Graph minors: A survey
To make a searchable database, one might start by creating a list of key terms that appear in the documents (generally removing common words such as “a”, “the”, “of”, etc., called stop words — these words contribute little, if any, context). In our documents we identify the key words that are shown in italics. (Note that we are just selecting key words to make our example manageable, not necessarily identifying the most important words.) Using the key words we create a term-document matrix. The term-document matrix is the matrix in which the terms form the rows and the documents the columns. If \(A = [a_{ij}]\) is the term-document matrix, then \(a_{ij}\) counts the number of times word \(i\) appears in document \(j\text{.}\) The term-document matrix \(A\) for our library is
One of our goals is to rate the pages in our library for relevance if we search for a query. For example, suppose we want to rate the pages for the query survey, computer. This query can be represented by the vector \(\vq= [0 \ 0 \ 1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 1 \ 0 \ 0 \ 0 ]^{\tr}\text{.}\)
Project Activity 29.5.
In a standard term-matching search with \(m \times n\) term-document matrix \(A\text{,}\) a query vector \(\vq\) would be matched with the terms to determine the number of matches. The matching counts the number of times each document agrees with the query.
(a)
Explain why this matching is accomplished by the matrix-vector product \(A^{\tr} \vq\text{.}\)
(b)
Let \(\vy = [y_1 \ y_2 \ \ldots \ y_n]^{\tr} = A^{\tr} \vq\text{.}\) Explain why \(y_i = \cos(\theta_i) ||\va_i|| ||\vq||\text{,}\) where \(\va_i\) is the \(i\)th column of \(A\) and \(\theta_i\) is the angle between \(\va_i\) and \(\vq\text{.}\)
(c)
We can use the cosine calculation from part (b) to compare matches to our query — the closer the cosine is to \(1\text{,}\) the better the match (dividing by the product of the norms is essentially converting all vectors to unit vectors for comparison purposes). This is often referred to as the cosine distance. Calculate the cosines of the \(\theta_i\) for our example of the query \(\vq= [0 \ 0 \ 1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 1 \ 0 \ 0 \ 0 ]^{\tr}\text{.}\) Order the documents from best to worst match for this query.
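The scoring described in this activity is a short computation. The sketch below (NumPy; the three-term, four-document matrix and the query are made up for illustration and are not the library above) computes the raw match counts \(A^{\tr}\vq\) and the cosines.

```python
# Sketch: raw term matching A^T q and cosine scores for a tiny made-up library.
import numpy as np

# Rows are three hypothetical terms, columns are four hypothetical documents.
A = np.array([[2.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])

q = np.array([1.0, 0.0, 0.0])      # query consisting of the first term only

raw_scores = A.T @ q               # term-match counts for each document
cosines = raw_scores / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))

print(raw_scores)   # [2. 0. 1. 0.]: document 2 uses only the second term, so
print(cosines)      # literal term matching misses it completely
```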
Though we were able to rate the documents in Project Activity 29.5 using the cosine distance, the result is less than satisfying. Documents \(c_3\text{,}\) \(c_4\text{,}\) and \(c_5\) are all related to computers but do not appear at all in our results. This is a problem with language searches — we don't want to compare just words, but we also need to compare the concepts the words represent. The fact that words can represent different things implies that a random choice of word by different authors can introduce noise into the word-concept relationship. To filter out this noise, we can apply the singular value decomposition to find a smaller set of concepts to better represent the relationships. Before we do so, we examine some useful properties of the term-document matrix.
Project Activity 29.6.
Let \(A = [ \va_1 \ \va_2 \ \cdots \ \va_9]\text{,}\) where \(\va_1\text{,}\) \(\va_2\text{,}\) \(\ldots\text{,}\) \(\va_9\) are the columns of \(A\text{.}\)
(a)
In Project Activity 29.5 you should have seen that \(b_{ij} = \va_i^{\tr} \va_j = \va_i \cdot \va_j\text{.}\) Assume for the moment that all of the entries in \(A\) are either \(0\) or \(1\text{.}\) Explain why in this case the dot product \(\va_i \cdot \va_j\) tells us how many terms documents \(i\) and \(j\) have in common. Also, the entries of \(A^{\tr}A\) are dot products of the columns of \(A\text{,}\) which record what is happening in each document, so \(A^{\tr}A\) describes document-document interactions. For these reasons, we call \(A^{\tr}A\) the document-document matrix.
(b)
Use appropriate technology to calculate the entries of the matrix \(C = [c_{ij}] = AA^{\tr}\text{.}\) This matrix is the term-term matrix. Assume for the moment that all of the entries in \(A\) are either \(0\) or \(1\text{.}\) Explain why if terms \(i\) and \(j\) occur together in \(k\) documents, then \(c_{ij} =k\text{.}\)
The nature of the term-term and document-document matrices makes it realistic to think about a SVD.
Project Activity 29.7.
To see why a singular value decomposition might be useful, suppose our term-document matrix \(A\) has singular value decomposition \(A = U \Sigma V^{\tr}\text{.}\) (Don't actually calculate the SVD yet).
(a)
Show that the document-document matrix \(A^{\tr}A\) satisfies \(A^{\tr}A = \left(V \Sigma^{\tr}\right) \left(V\Sigma^{\tr}\right)^{\tr}\text{.}\) This means that we can compare document \(i\) and document \(j\) using the dot product of row \(i\) and column \(j\) of the matrix product \(V\Sigma^{\tr}\text{.}\)
(b)
Show that the term-term matrix \(AA^{\tr}\) satisfies \(AA^{\tr} = \left(U \Sigma\right) \left(U\Sigma\right)^{\tr}\text{.}\) Thus we can compare term \(i\) and term \(j\) using the dot product of row \(i\) and column \(j\) of the matrix product \(U\Sigma\text{.}\) (Exercise 6 shows that the columns of \(U\) are orthogonal eigenvectors of \(AA^{\tr}\text{.}\))
As we will see, the connection of the matrices \(U\) and \(V\) to documents and terms that we saw in Project Activity 29.7 will be very useful when we use the SVD of the term-document matrix to reduce dimensions to a “concept” space. We will be able to interpret the rows of the matrices \(U\) and \(V\) as providing coordinates for terms and documents in this space.
Project Activity 29.8.
The singular value decomposition (SVD) allows us to produce new, improved term-document matrices. For this activity, use the term-document matrix \(A\) in Figure 29.11.
(a)
Use appropriate technology to find a singular value decomposition of \(A\) so that \(A = U \Sigma V^{\tr}\text{.}\) Print your entries to two decimal places (but keep as many as possible for computational purposes).
Each singular value tells us how important its semantic dimension is. If we remove the smaller singular values (the less important dimensions), we retain the important information but eliminate minor details and noise. We produce a new term-document matrix \(A_k\) by keeping the largest \(k\) of the singular values and discarding the rest. This gives us an approximation
using the outer product decomposition, where \(\sigma_1\text{,}\) \(\sigma_2\text{,}\) \(\ldots\text{,}\) \(\sigma_k\) are the \(k\) largest singular values of \(A\text{.}\) Note that if \(A\) is an \(m \times n\) matrix, letting \(U_k = [\vu_1 \ \vu_2 \ \cdots \ \vu_k]\) (an \(m \times k\) matrix), \(\Sigma_k\) the \(k \times k\) matrix with the first \(k\) singular values along the diagonal, and \(V_k^{\tr} = [\vv_1 \ \vv_2 \ \cdots \ \vv_k ]^{\tr}\) (a \(k \times n\) matrix), then we can also write \(A_k = U_k \Sigma_k V_k^{\tr}\text{.}\) This is sometimes referred to as a reduced SVD. Find \(U_2\text{,}\) \(\Sigma_2\text{,}\) and \(V_2^{\tr}\text{,}\) and find the new term-document matrix \(A_2\text{.}\)
Once we have our term-document matrix, there are three basic comparisons to make: comparing terms, comparing documents, and comparing terms and documents. Term-document matrices are usually very large, with dimension being the number of terms. By using a reduced SVD we can create a much smaller approximation. In our example, the matrix \(A_k\) in Project Activity 29.8 reduces our problem to a \(k\)-dimensional space. Intuitively, we can think of LSI as representing terms as averages of all of the documents in which they appear and documents as averages of all of the terms they contain. Through this process, LSI attempts to combine the surface information in our library into a deeper abstraction (the “concept” space) that captures the mutual relationships between terms and documents.
We now need to understand how we can represent documents and terms in this smaller space where \(A_k = U_k \Sigma_k V_k^{\tr}\text{.}\) Informally, we can consider the rows of \(U_k\) as representing the coordinates of each term in the lower dimensional concept space and the columns of \(V_k^{\tr}\) as the coordinates of the documents, while the entries of \(\Sigma_k\) tell us how important each semantic dimension is. The dot product of two row vectors of \(A_k\) indicates how terms compare across documents. This product is \(A_kA_k^{\tr}\text{.}\) Just as in Project Activity 29.7, we have \(A_kA_k^{\tr} = \left(U_k \Sigma_k\right) \left(U_k\Sigma_k\right)^{\tr}\text{.}\) In other words, if we consider the rows of \(U_k\Sigma_k\) as coordinates for terms, then the dot products of these rows give us term to term comparisons. (Note that multiplying \(U\) by \(\Sigma\) just stretches the rows of \(U\) by the singular values according to the importance of the concept represented by that singular value.) Similarly, the dot products between columns of \(A_k\) provide a comparison of documents. This comparison is given by \(A_k^{\tr}A_k = \left(V_k \Sigma_k^{\tr}\right) \left(V_k\Sigma_k^{\tr}\right)^{\tr}\) (again by Project Activity 29.7). So we can consider the rows of \(V_k\Sigma_k^{\tr}\) as providing coordinates for documents.
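The following sketch (NumPy; it reuses the small made-up term-document matrix from the earlier sketch rather than the library of Figure 29.11) forms a rank \(k\) reduced SVD and verifies that the rows of \(U_k\Sigma_k\) and \(V_k\Sigma_k\) reproduce the term-term and document-document comparisons.

```python
# Sketch: rank-k concept-space coordinates for terms and documents.
import numpy as np

A = np.array([[2.0, 0.0, 1.0, 0.0],   # made-up term-document matrix
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])

k = 2
U, s, Vt = np.linalg.svd(A)
U_k = U[:, :k]                  # m x k
Sigma_k = np.diag(s[:k])        # k x k
V_k = Vt[:k, :].T               # n x k

A_k = U_k @ Sigma_k @ V_k.T     # reduced (rank-k) term-document matrix

term_coords = U_k @ Sigma_k     # each row is a term as a point in R^k
doc_coords = V_k @ Sigma_k      # each row is a document as a point in R^k

# Term-term and document-document comparisons via dot products of these rows.
print(np.allclose(A_k @ A_k.T, term_coords @ term_coords.T))   # True
print(np.allclose(A_k.T @ A_k, doc_coords @ doc_coords.T))     # True
```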
Project Activity 29.9.
We have seen how to compare terms to terms and documents to documents. The matrix \(A_k\) itself compares terms to documents. Show that \(A_k = \left(U_k \Sigma_k^{1/2}\right) \left(V_k\Sigma_k^{1/2}\right)^{\tr}\text{,}\) where \(\Sigma_k^{1/2}\) is the diagonal matrix of the same size as \(\Sigma_k\) whose diagonal entries are the square roots of the corresponding diagonal entries in \(\Sigma_k\text{.}\) Thus, all useful comparisons of terms and documents can be made using the rows of the matrices \(U\) and \(V\text{,}\) scaled in some way by the singular values in \(\Sigma\text{.}\)
To work in this smaller concept space, it is important to be able to find appropriate comparisons to objects that appeared in the original search. For example, to complete the latent structure view of the system, we must also convert the original query to a representation within the new term-document system represented by \(A_k\text{.}\) This new representation is called a pseudo-document.
Project Activity 29.10.
For an original query \(\vq\text{,}\) we start with its term vector \(\va_{\vq}\) (a vector in the coordinate system determined by the columns of \(A\)) and find a representation \(\vv_{\vq}\) that we can use as a column of \(V^{\tr}\) in the document-document comparison matrix. If this representation was perfect, then it would take a real document in the original system given by \(A\) and produce the corresponding column of \(U\) if we used the full SVD. In other words, we would have \(\va_{\vq} = U \Sigma \vv_{\vq}^{\tr}\text{.}\)
(a)
Use the fact that \(A_k = U_k \Sigma_k V_k^{\tr}\) to show that \(V_k = A_k^{\tr} U_k \Sigma_k^{-1}\text{.}\) It follows that \(\vq\) is transformed into the query \(\vq_k = \vq^{\tr}U_k \Sigma_k^{-1}\text{.}\)
(b)
In our example, using \(k=2\text{,}\) the terms can now be represented as \(2\)-dimensional vectors (the rows of \(U_2\text{,}\) see Figure 29.12), or as points in the plane. More specifically, human is represented by the vector (to two decimal places) \([-0.22 \ -0.11]^{\tr}\text{,}\) interface by \([- 0.20 \ - 0.07]^{\tr}\text{,}\) etc. Similarly, the documents are represented by the rows of \(V_2\) (see Figure 29.13), so that the document \(c_1\) is represented by \([-0.20 \ -0.06]^{\tr}\text{,}\) \(c_2\) by \([-0.61 \ 0.17]^{\tr}\text{,}\) etc. From this perspective we can visualize these documents in the plane. Plot the documents and the query in the 2-dimensional concept space. Then calculate the cosine distances from the query to the documents in this space. Which documents now give the best three matches to the query? Compare the matches to your plot.
As we can see from Project Activity 29.10, the original query had no match at all with any documents except \(c_1\text{,}\) \(c_2\text{,}\) and \(m_4\text{.}\) In the new concept space, the query now has some connection to every document. So LSI has made semantic connections between the terms and documents that were not present in the original term-document matrix, which gives us better results for our search.
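To close the loop, here is a final sketch (NumPy; again with the made-up matrix from the earlier sketches rather than the actual library) of the query-folding step \(\vq_k = \vq^{\tr}U_k \Sigma_k^{-1}\) and the cosine comparison in the concept space.

```python
# Sketch: fold a query into the k-dimensional concept space and rank documents.
import numpy as np

A = np.array([[2.0, 0.0, 1.0, 0.0],   # made-up term-document matrix
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])
q = np.array([1.0, 0.0, 0.0])         # query on the first term only

k = 2
U, s, Vt = np.linalg.svd(A)
U_k, Sigma_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

q_k = q @ U_k @ np.linalg.inv(Sigma_k)     # pseudo-document for the query
doc_coords = V_k                           # documents: rows of V_k

# Cosine of the angle between the folded query and each document.
cosines = (doc_coords @ q_k) / (
    np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q_k))
print(cosines)   # the document that uses only the second term now matches too
```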