
Section 29 The Singular Value Decomposition

Subsection Application: Search Engines and Semantics

Effective search engines search for more than just words. Language is complex, and search engines must deal with the fact that there are often many ways to express a given concept (synonymy: multiple words can have the same meaning) and that a single word can have multiple meanings (polysemy). As a consequence, a search on a word may return irrelevant matches (e.g., searching for derivative could return pages on mathematics or on financial securities), or you might search for articles on cats while the paper you really want uses the word felines. A better search engine will not necessarily try to match terms, but will instead retrieve information based on concept or intent. Latent Semantic Indexing (LSI) (or Latent Semantic Analysis), developed in the late 1980s, helps search engines determine concept and intent in order to provide more accurate and relevant results. LSI works by uncovering underlying (latent) relationships between words (semantics) that search engines need to provide context and understanding (indexing). LSI provides a mapping of both words and documents into a lower dimensional “concept” space and performs the search in this new space. The mapping is provided by the singular value decomposition.

Subsection Introduction

The singular value decomposition (SVD) of a matrix is an important and useful matrix decomposition. Unlike other matrix decompositions, every matrix has a singular value decomposition. The SVD is used in a variety of applications including scientific computing, digital signal processing, image compression, principal component analysis, web searching through latent semantic indexing, and seismology. Recall that the eigenvector decomposition of an $n \times n$ diagonalizable matrix $M$ has the form $P^{-1}MP$, where the columns of the matrix $P$ are $n$ linearly independent eigenvectors of $M$ and the diagonal entries of the diagonal matrix $P^{-1}MP$ are the eigenvalues of $M$. The singular value decomposition does something similar for any matrix of any size. One of the keys to the SVD is that the matrix $A^TA$ is symmetric for any matrix $A$.

Subsection The Operator Norm of a Matrix

Before we introduce the Singular Value Decomposition, let us work through some preliminaries to motivate the idea. The first is to provide an answer to the question “How ‘big’ is a matrix?” There are many ways to interpret and answer this question, but a substantial (and useful) answer should involve more than just the dimensions of the matrix. A good measure of the size of a matrix, which we will refer to as the norm of the matrix, should take into account the action of the linear transformation defined by the matrix on vectors. This will then lead to questions about how difficult or easy it is to solve a matrix equation $Ax = b$.

If we want to incorporate the action of a matrix $A$ into a calculation of the norm of $A$, we might think of measuring how much $A$ can change a vector $x$. This could lead us to using $||Ax||$ as some sort of measure of a norm of $A$. However, since $||A(cx)|| = |c|\,||Ax||$ for any scalar $c$, scaling $x$ by a large scalar will produce a large norm, so this is not a viable definition of a norm. We could instead measure the relative effect that $A$ has on a vector $x$ as $\frac{||Ax||}{||x||}$, since this ratio does not change when $x$ is multiplied by a scalar. The largest of all of these ratios would provide a good sense of how much $A$ can change vectors. Thus, we define the operator norm of a matrix $A$ as follows.

Definition 29.1.

The operator norm  50  of a matrix A is

$$||A|| = \max_{||x|| \ne 0} \left\{ \frac{||Ax||}{||x||} \right\}.$$

Due to the linearity of matrix multiplication, we can restrict ourselves to unit vectors for an equivalent definition of the operator norm of the matrix A as

$$||A|| = \max_{||x|| = 1} \{ ||Ax|| \}.$$
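To make the definition concrete, here is a minimal numerical sketch (assuming Python with NumPy, which is not part of the text) that estimates the operator norm of the matrix from Example 29.2 below by sampling unit vectors, and compares the estimate with NumPy's exact operator 2-norm:

```python
import numpy as np

# Estimate ||A|| = max ||Ax|| over unit vectors x by sampling the unit circle.
# Sampling only gives an estimate; np.linalg.norm(A, 2) returns the exact
# operator 2-norm (the largest singular value of A).
A = np.array([[2.0, 1.0],
              [2.0, 5.0]])

angles = np.linspace(0, 2 * np.pi, 10000)
unit_vectors = np.column_stack((np.cos(angles), np.sin(angles)))
stretches = np.linalg.norm(unit_vectors @ A.T, axis=1)   # ||Ax|| for each sample

print(stretches.max())          # about 5.657 = sqrt(32)
print(np.linalg.norm(A, 2))     # exact value for comparison
```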

Preview Activity 29.1.

(a)

Determine ||A|| if A is the zero matrix.

(b)

Determine ||In||, where In is the n×n identity matrix.

(c)

Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$. Find $||A||$. Justify your answer. (Hint: $x_1^2 + 4x_2^2 \le 4(x_1^2 + x_2^2)$.)

(d)

If P is an orthogonal matrix, what is ||P||? Why?

The operator norm of a matrix tells us that the size of the action of an $m \times n$ matrix is determined by its action on the unit sphere in $\mathbb{R}^n$ (the unit sphere is the set of terminal points of unit vectors). Let us consider two examples.

Example 29.2.

Let $A = \begin{bmatrix} 2 & 1 \\ 2 & 5 \end{bmatrix}$. We can draw a graph to see the action of $A$ on the unit circle. A picture of the set $\{Ax : ||x|| = 1\}$ is shown in Figure 29.3.

Figure 29.3. The image of the unit circle under the action of A.

It appears that $A$ transforms the unit circle into an ellipse. To find $||A||$ we want to maximize $||Ax||$ for $x$ on the unit circle. This is the same as maximizing

$$||Ax||^2 = (Ax)^T(Ax) = x^TA^TAx.$$

Now $A^TA = \begin{bmatrix} 8 & 12 \\ 12 & 26 \end{bmatrix}$ is a symmetric matrix, so we can orthogonally diagonalize $A^TA$. The eigenvalues of $A^TA$ are 32 and 2. Let $P = [u_1\ u_2]$, where $u_1 = \left[\frac{\sqrt{5}}{5}\ \frac{2\sqrt{5}}{5}\right]^T$ is a unit eigenvector of $A^TA$ with eigenvalue 32 and $u_2 = \left[-\frac{2\sqrt{5}}{5}\ \frac{\sqrt{5}}{5}\right]^T$ is a unit eigenvector of $A^TA$ with eigenvalue 2. Then $P$ is an orthogonal matrix such that $P^T(A^TA)P = \begin{bmatrix} 32 & 0 \\ 0 & 2 \end{bmatrix} = D$. It follows that

$$x^T(A^TA)x = x^TPDP^Tx = (P^Tx)^TD(P^Tx).$$

Now $P^T$ is orthogonal, so $||P^Tx|| = ||x||$ and $P^T$ maps the unit circle to the unit circle. Moreover, if $x$ is on the unit circle, then $y = Px$ is also on the unit circle and $P^Ty = P^TPx = x$. So every point $x$ on the unit circle corresponds to a point $Px$ on the unit circle. Thus, the forms $x^T(A^TA)x$ and $(P^Tx)^TD(P^Tx)$ take on exactly the same values over all points on the unit circle. Now we just need to find the maximum value of $(P^Tx)^TD(P^Tx)$. This turns out to be relatively easy since $D$ is a diagonal matrix.

Let's simplify the notation. Let $y = P^Tx$. Then our job is to maximize $y^TDy$. If $y = [y_1\ y_2]^T$, then

$$y^TDy = 32y_1^2 + 2y_2^2.$$

We want to find the maximum value of this expression for $y$ on the unit circle. Note that $2y_2^2 \le 32y_2^2$ and so

$$32y_1^2 + 2y_2^2 \le 32y_1^2 + 32y_2^2 = 32(y_1^2 + y_2^2) = 32||y||^2 = 32.$$

Since $[1\ 0]^T$ is on the unit circle and the expression $32y_1^2 + 2y_2^2$ attains the value 32 at that point, 32 is the maximum value of $y^TDy$ over all $y$ on the unit circle. While we are at it, we can similarly find the minimum value of $y^TDy$ for $y$ on the unit circle. Since $2y_1^2 \le 32y_1^2$ we see that

$$32y_1^2 + 2y_2^2 \ge 2y_1^2 + 2y_2^2 = 2(y_1^2 + y_2^2) = 2||y||^2 = 2.$$

Since the expression $y^TDy$ attains the value 2 at $[0\ 1]^T$ on the unit circle, we can see that $y^TDy$ attains the minimum value of 2 on the unit circle.

Now we can return to the expression $x^T(A^TA)x$. Since $y^TDy$ assumes the same values as $x^T(A^TA)x$, we can say that the maximum value of $x^T(A^TA)x$ for $x$ on the unit circle is 32 (and the minimum value is 2). Moreover, the quadratic form $(P^Tx)^TD(P^Tx)$ assumes its maximum value when $P^Tx = [1\ 0]^T$ or $-[1\ 0]^T$. Thus, the form $x^T(A^TA)x$ assumes its maximum value at the vector $x = P[1\ 0]^T = u_1$ or $-u_1$. Similarly, the quadratic form $x^T(A^TA)x$ attains its minimum value at $x = P[0\ 1]^T = u_2$ or $-u_2$. We conclude that $||A|| = \sqrt{32}$.

Figure 29.4 shows the image of the unit circle under the action of $A$ and the vectors $Au_1$ and $Au_2$, where $u_1, u_2$ are the two unit eigenvectors of $A^TA$. The image also supports that $||Ax||$ assumes its maximum and minimum values over points on the unit circle at $\pm u_1$ and $\pm u_2$.

Figure 29.4. The image of the unit circle under the action of A, and the vectors Au1 and Au2

IMPORTANT NOTE 1.

What we have just argued is that the maximum value of $||Ax||$ for $x$ on the unit sphere in $\mathbb{R}^n$ is the square root of the largest eigenvalue of $A^TA$ and occurs at a corresponding unit eigenvector.
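This note translates directly into a computation. The following sketch (again assuming NumPy) finds $||A||$ for the matrix of Example 29.2 from the eigenvalues of $A^TA$:

```python
import numpy as np

# ||A|| as the square root of the largest eigenvalue of A^T A (Example 29.2).
A = np.array([[2.0, 1.0],
              [2.0, 5.0]])

# eigh is designed for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
print(eigenvalues)                   # [ 2. 32.]
print(np.sqrt(eigenvalues[-1]))      # 5.656... = sqrt(32) = ||A||
print(eigenvectors[:, -1])           # a unit eigenvector where ||Ax|| is maximized
```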

Example 29.5.

This same process works for matrices other than $2 \times 2$ ones. For example, consider $A = \begin{bmatrix} -2 & 8 & 20 \\ 14 & 19 & 10 \end{bmatrix}$. In this case $A$ maps $\mathbb{R}^3$ to $\mathbb{R}^2$. The image of the unit sphere $\{x \in \mathbb{R}^3 : ||x|| = 1\}$ under left multiplication by $A$ is a filled ellipse as shown in Figure 29.6.

Figure 29.6. The image of the unit sphere under the action of A, and the vectors Au1 and Au2

As with the previous example, the norm of $A$ is the square root of the maximum value of $x^T(A^TA)x$, and this maximum value is the dominant eigenvalue of $A^TA = \begin{bmatrix} 200 & 250 & 100 \\ 250 & 425 & 350 \\ 100 & 350 & 500 \end{bmatrix}$. The eigenvalues of $A^TA$ are $\lambda_1 = 900$, $\lambda_2 = 225$, and $\lambda_3 = 0$ with corresponding unit eigenvectors $u_1 = \left[\frac{1}{3}\ \frac{2}{3}\ \frac{2}{3}\right]^T$, $u_2 = \left[-\frac{2}{3}\ -\frac{1}{3}\ \frac{2}{3}\right]^T$, and $u_3 = \left[\frac{2}{3}\ -\frac{2}{3}\ \frac{1}{3}\right]^T$. So in this case we have $||A|| = \sqrt{900} = 30$. The transformation defined by matrix multiplication by $A$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ has a one-dimensional kernel, which is spanned by the eigenvector corresponding to $\lambda_3$. The image of the transformation is 2-dimensional, and the image of the unit sphere is a filled ellipse where $Au_1$ gives the major axis of the ellipse and $Au_2$ gives the minor axis. Essentially, the square roots of the eigenvalues of $A^TA$ tell us how $A$ stretches the image space in each direction.

IMPORTANT NOTE 2.

We have just argued that the image of the unit $n$-sphere under the action of an $m \times n$ matrix is an ellipsoid in $\mathbb{R}^m$ stretched the greatest amount, $\sqrt{\lambda_1}$, in the direction of an eigenvector for the largest eigenvalue ($\lambda_1$) of $A^TA$; the next greatest amount, $\sqrt{\lambda_2}$, in the direction of a unit eigenvector for the second largest eigenvalue ($\lambda_2$) of $A^TA$; and so on.

Activity 29.2.

Let $A = \begin{bmatrix} 0 & 5 \\ 4 & 3 \\ -2 & 1 \end{bmatrix}$. Then $A^TA = \begin{bmatrix} 20 & 10 \\ 10 & 35 \end{bmatrix}$. The eigenvalues of $A^TA$ are $\lambda_1 = 40$ and $\lambda_2 = 15$ with respective eigenvectors $v_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $v_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$.

(a)

Find ||A||.

(b)

Find a unit vector x at which ||Ax|| assumes its maximum value.

Subsection The SVD

The Singular Value Decomposition (SVD) is essentially a concise statement of what we saw in the previous section that works for any matrix. We will uncover the SVD in this section.

Preview Activity 29.3.

Let $A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$. Since $A$ is not square, we cannot diagonalize $A$. However, the matrix

$$A^TA = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}$$

is a symmetric matrix and can be orthogonally diagonalized. The eigenvalues of $A^TA$ are 3, 1, and 0 with corresponding eigenvectors

$$\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix},$$

respectively. Use appropriate technology to do the following.

(a)

Find an orthogonal matrix $V = [v_1\ v_2\ v_3]$ that orthogonally diagonalizes $A^TA$, where

$$V^T(A^TA)V = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
(b)

For $i = 1, 2$, let $u_i = \frac{Av_i}{||Av_i||}$. Find each $u_i$. Why don't we define $u_3$ in this way?

(c)

Let U=[u1 u2]. What kind of matrix is U? Explain.

(d)

Calculate the matrix product $U^TAV$. What do you notice? How is this similar to the eigenvector decomposition of a matrix?

Preview Activity 29.3 contains the basic ideas behind the Singular Value Decomposition. Let $A$ be an $m \times n$ matrix with real entries. Note that $A^TA$ is a symmetric $n \times n$ matrix and, hence, it can be orthogonally diagonalized. Let $V = [v_1\ v_2\ v_3\ \cdots\ v_n]$ be an $n \times n$ orthogonal matrix whose columns form an orthonormal set of eigenvectors for $A^TA$. For each $i$, let $(A^TA)v_i = \lambda_iv_i$. We know

$$V^T(A^TA)V = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$

Now notice that for each i we have

$$||Av_i||^2 = (Av_i)^T(Av_i) = v_i^T(A^TA)v_i = v_i^T\lambda_iv_i = \lambda_i||v_i||^2 = \lambda_i, \tag{29.1}$$

so $\lambda_i \ge 0$. Thus, the matrix $A^TA$ has no negative eigenvalues. We can always arrange the eigenvectors and eigenvalues of $A^TA$ so that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0.$$

Also note that

$$(Av_i) \cdot (Av_j) = (Av_i)^T(Av_j) = v_i^T(A^TA)v_j = v_i^T\lambda_j v_j = \lambda_j (v_i \cdot v_j) = 0$$

if $i \ne j$. So the set $\{Av_1, Av_2, \ldots, Av_n\}$ is an orthogonal set in $\mathbb{R}^m$. Each of the vectors $Av_i$ is in Col A, and so $\{Av_1, Av_2, \ldots, Av_n\}$ is an orthogonal subset of Col A. It is possible that $Av_i = 0$ for some of the $v_i$ (this happens if $A^TA$ has 0 as an eigenvalue). Let $v_1$, $v_2$, $\ldots$, $v_r$ be the eigenvectors corresponding to the nonzero eigenvalues. Then the set

$$B = \{Av_1, Av_2, \ldots, Av_r\}$$

is a linearly independent set of nonzero orthogonal vectors in Col A. Now we will show that $B$ is a basis for Col A. Let $y$ be a vector in Col A. Then $y = Ax$ for some vector $x$ in $\mathbb{R}^n$. Recall that the vectors $v_1$, $v_2$, $\ldots$, $v_n$ form an orthonormal basis of $\mathbb{R}^n$, so

$$x = x_1v_1 + x_2v_2 + \cdots + x_nv_n$$

for some scalars $x_1$, $x_2$, $\ldots$, $x_n$. Since $Av_j = 0$ for $r+1 \le j \le n$ we have

$$y = Ax = A(x_1v_1 + x_2v_2 + \cdots + x_nv_n) = x_1Av_1 + x_2Av_2 + \cdots + x_rAv_r + x_{r+1}Av_{r+1} + \cdots + x_nAv_n = x_1Av_1 + x_2Av_2 + \cdots + x_rAv_r.$$

So Span B=Col A and B is an orthogonal basis for Col A.

Now we are ready to find the Singular Value Decomposition of $A$. First we create an orthonormal basis $\{u_1, u_2, \ldots, u_r\}$ for Col A by normalizing the vectors $Av_i$. So we let

$$u_i = \frac{Av_i}{||Av_i||}$$

for i from 1 to r.

Remember from (29.1) that $||Av_i||^2 = \lambda_i$, so if we let $\sigma_i = \sqrt{\lambda_i}$, then we have

$$u_i = \frac{Av_i}{\sigma_i} \quad \text{and} \quad Av_i = \sigma_iu_i.$$

We ordered the $\lambda_i$ so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, so we also have

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.$$

The scalars $\sigma_1$, $\sigma_2$, $\ldots$, $\sigma_n$ are called the singular values of $A$.

Definition 29.7.

Let $A$ be an $m \times n$ matrix. The singular values of $A$ are the square roots of the eigenvalues of $A^TA$.

The vectors $u_1$, $u_2$, $\ldots$, $u_r$ are $r$ orthonormal vectors in $\mathbb{R}^m$. We can extend the set $\{u_1, u_2, \ldots, u_r\}$ to an orthonormal basis $C = \{u_1, u_2, \ldots, u_r, u_{r+1}, u_{r+2}, \ldots, u_m\}$ of $\mathbb{R}^m$. Recall that $Av_i = \sigma_iu_i$ for $1 \le i \le r$ and $Av_j = 0$ for $r+1 \le j \le n$, so

$$AV = A[v_1\ v_2\ \cdots\ v_n] = [Av_1\ Av_2\ \cdots\ Av_n] = [\sigma_1u_1\ \sigma_2u_2\ \cdots\ \sigma_ru_r\ 0\ 0\ \cdots\ 0].$$

We can write the matrix $[\sigma_1u_1\ \sigma_2u_2\ \cdots\ \sigma_ru_r\ 0\ 0\ \cdots\ 0]$ in another way. Let $\Sigma$ be the $m \times n$ matrix whose first $r$ diagonal entries are $\sigma_1$, $\sigma_2$, $\ldots$, $\sigma_r$ and all of whose other entries are 0:

$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}.$$

Now

$$[u_1\ u_2\ \cdots\ u_m]\Sigma = [\sigma_1u_1\ \sigma_2u_2\ \cdots\ \sigma_ru_r\ 0\ 0\ \cdots\ 0] = AV.$$

So if $U = [u_1\ u_2\ \cdots\ u_m]$, then

$$U\Sigma = AV.$$

Since V is an orthogonal matrix, we have that

$$U\Sigma V^T = AVV^T = A.$$

This is the Singular Value Decomposition of A.

SVD Summary.

A Singular Value Decomposition of an m×n matrix A of rank r can be found as follows.

  1. Find an orthonormal basis $\{v_1, v_2, v_3, \ldots, v_n\}$ of eigenvectors of $A^TA$ such that $(A^TA)v_i = \lambda_iv_i$ for $i$ from 1 to $n$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$, with the first $r$ eigenvalues being positive. The vectors $v_1$, $v_2$, $v_3$, $\ldots$, $v_n$ are the right singular vectors of $A$.

  2. Let

    $$V = [v_1\ v_2\ v_3\ \cdots\ v_n].$$

    Then $V$ orthogonally diagonalizes $A^TA$.

  3. The singular values of $A$ are the numbers $\sigma_i$, where $\sigma_i = \sqrt{\lambda_i} > 0$ for $i$ from 1 to $r$. Let $\Sigma$ be the $m \times n$ matrix

    $$\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}.$$
  4. For $i$ from 1 to $r$, let $u_i = \frac{Av_i}{||Av_i||}$. Then the set $\{u_1, u_2, \ldots, u_r\}$ forms an orthonormal basis of Col A.

  5. Extend the set {u1,u2,,ur} to an orthonormal basis

    $$\{u_1, u_2, \ldots, u_r, u_{r+1}, u_{r+2}, \ldots, u_m\}$$

    of Rm. Let

    $$U = [u_1\ u_2\ \cdots\ u_m].$$

    The vectors $u_1$, $u_2$, $\ldots$, $u_m$ are the left singular vectors of $A$.

  6. Then $A = U\Sigma V^T$ is a singular value decomposition of $A$. (A computational sketch of these steps follows below.)
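The steps above can be carried out numerically. The following sketch (assuming NumPy; in practice one would simply call np.linalg.svd) builds a singular value decomposition of the matrix from Preview Activity 29.3 by orthogonally diagonalizing $A^TA$:

```python
import numpy as np

# Construct an SVD following the summary steps, for A from Preview Activity 29.3.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
m, n = A.shape

# Steps 1-2: orthonormal eigenvectors of A^T A with eigenvalues in decreasing order.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Step 3: singular values and the m x n matrix Sigma.
tol = 1e-10
r = int(np.sum(lam > tol))            # rank of A
sigma = np.sqrt(lam[:r])
Sigma = np.zeros((m, n))
Sigma[:r, :r] = np.diag(sigma)

# Steps 4-5: u_i = A v_i / sigma_i; here r = m, so U needs no extension.
U = (A @ V[:, :r]) / sigma

# Step 6: verify A = U Sigma V^T.
print(np.allclose(A, U @ Sigma @ V.T))   # True
print(sigma)                             # [1.732..., 1.0] = [sqrt(3), 1]
```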

Activity 29.4.

Let $A = \begin{bmatrix} 0 & 5 \\ 4 & 3 \\ -2 & 1 \end{bmatrix}$. Then $A^TA = \begin{bmatrix} 20 & 10 \\ 10 & 35 \end{bmatrix}$. The eigenvalues of $A^TA$ are $\lambda_1 = 40$ and $\lambda_2 = 15$ with respective eigenvectors $w_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $w_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$.

(a)

Find an orthonormal basis $\{v_1, v_2, v_3, \ldots, v_n\}$ of eigenvectors of $A^TA$. What is $n$? Find the matrix $V$ in an SVD for $A$.

(b)

Find the singular values of A. What is the rank r of A? Why?

(c)

What are the dimensions of the matrix $\Sigma$ in an SVD of $A$? Find $\Sigma$.

(d)

Find the vectors $u_1$, $u_2$, $\ldots$, $u_r$. If necessary, extend this set to an orthonormal basis

$$\{u_1, u_2, \ldots, u_r, u_{r+1}, u_{r+2}, \ldots, u_m\}$$

of Rm.

(e)

Find the matrix $U$ so that $A = U\Sigma V^T$ is an SVD for $A$.

There is another way we can write this SVD of $A$. Let the $m \times n$ matrix $A$ have a singular value decomposition $U\Sigma V^T$, where

$$U = [u_1\ u_2\ \cdots\ u_m], \qquad \Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}, \quad \text{and} \quad V = [v_1\ v_2\ v_3\ \cdots\ v_n].$$

Since $A = U\Sigma V^T$ we see that

$$A = [u_1\ u_2\ u_3\ \cdots\ u_m]\,\Sigma\, \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \\ \vdots \\ v_n^T \end{bmatrix} = [\sigma_1u_1\ \sigma_2u_2\ \sigma_3u_3\ \cdots\ \sigma_ru_r\ 0\ \cdots\ 0] \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \\ \vdots \\ v_n^T \end{bmatrix} = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \sigma_3u_3v_3^T + \cdots + \sigma_ru_rv_r^T. \tag{29.2}$$

This is called an outer product decomposition of $A$ and tells us everything we learned above about the action of the matrix $A$ as a linear transformation. Each of the products $u_iv_i^T$ is a rank 1 matrix (see Exercise 9), and $||Av_1|| = \sigma_1$ is the largest value $A$ takes on the unit $n$-sphere, $||Av_2|| = \sigma_2$ is the next largest dilation of the unit $n$-sphere, and so on. An outer product decomposition allows us to approximate $A$ with smaller rank matrices. For example, the matrix $\sigma_1u_1v_1^T$ is the best rank 1 approximation to $A$, $\sigma_1u_1v_1^T + \sigma_2u_2v_2^T$ is the best rank 2 approximation, and so on. This will be very useful in applications, as we will see in the next section.
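A short sketch (assuming NumPy) illustrates the outer product decomposition and the low-rank approximations it provides, using the matrix from Example 29.10 below:

```python
import numpy as np

# Rebuild A from rank-1 pieces sigma_i * u_i v_i^T, as in (29.2), and truncate
# the sum to get best low-rank approximations. A is the matrix of Example 29.10.
A = np.array([[2.0, 5.0, 4.0],
              [6.0, 3.0, 0.0],
              [6.0, 3.0, 0.0],
              [2.0, 5.0, 4.0]])

U, sigma, Vt = np.linalg.svd(A)

def rank_k_approx(k):
    # Sum of the first k outer products: the best rank-k approximation of A.
    return sum(sigma[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

print(np.round(rank_k_approx(1), 2))       # every row is [4, 4, 2]
print(np.allclose(rank_k_approx(2), A))    # True: A itself has rank 2
```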

Subsection SVD and the Null, Column, and Row Spaces of a Matrix

We conclude this section with a short discussion of how a singular value decomposition relates fundamental subspaces of a matrix. We have seen that the vectors $u_1$, $u_2$, $\ldots$, $u_r$ in an SVD for an $m \times n$ matrix $A$ form a basis for Col A. Recall also that $Av_j = 0$ for $r+1 \le j \le n$. Since dim(Nul A) + dim(Col A) = $n$, it follows that the vectors $v_{r+1}$, $v_{r+2}$, $\ldots$, $v_n$ form a basis for Nul A. As you will show in the exercises, the set $\{v_1, v_2, \ldots, v_r\}$ is a basis for Row A. Thus, an SVD for a matrix $A$ tells us about three fundamental vector spaces related to $A$.
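As a computational sketch of this correspondence (assuming NumPy), the bases can be read directly off a computed SVD; here we use the 2×3 matrix from Example 29.5:

```python
import numpy as np

# Bases for Col A, Row A, and Nul A from an SVD (matrix of Example 29.5).
A = np.array([[-2.0, 8.0, 20.0],
              [14.0, 19.0, 10.0]])

U, sigma, Vt = np.linalg.svd(A)
r = int(np.sum(sigma > 1e-10))     # rank of A (here 2)

col_basis = U[:, :r]        # u_1, ..., u_r span Col A
row_basis = Vt[:r, :].T     # v_1, ..., v_r span Row A
nul_basis = Vt[r:, :].T     # v_{r+1}, ..., v_n span Nul A

print(np.round(sigma, 4))                  # [30. 15.]
print(np.allclose(A @ nul_basis, 0))       # True: A sends Nul A vectors to 0
```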

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 29.9.

Let $A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 0 \end{bmatrix}$.

(a)

Find a singular value decomposition for A. You may use technology to find eigenvalues and eigenvectors of matrices.

Solution.

With $A$ as given, we have $A^TA = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 5 & 4 & 0 \\ 0 & 4 & 5 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. Technology shows that the eigenvalues of $A^TA$ are $\lambda_1 = 9$, $\lambda_2 = 4$, $\lambda_3 = 1$, and $\lambda_4 = 0$ with corresponding orthonormal eigenvectors $v_1 = \frac{1}{\sqrt{2}}[0\ 1\ 1\ 0]^T$, $v_2 = [1\ 0\ 0\ 0]^T$, $v_3 = \frac{1}{\sqrt{2}}[0\ 1\ -1\ 0]^T$, and $v_4 = [0\ 0\ 0\ 1]^T$. This makes $V = [v_1\ v_2\ v_3\ v_4]$. The singular values of $A$ are $\sigma_1 = \sqrt{9} = 3$, $\sigma_2 = \sqrt{4} = 2$, $\sigma_3 = \sqrt{1} = 1$, and $\sigma_4 = 0$, so $\Sigma$ is the $3 \times 4$ matrix with the nonzero singular values along the diagonal and zeros everywhere else. Finally, we define the vectors $u_i$ as $u_i = \frac{1}{||Av_i||}Av_i$. Again, technology gives us $u_1 = \frac{1}{\sqrt{2}}[0\ 1\ 1]^T$, $u_2 = [1\ 0\ 0]^T$, and $u_3 = \frac{1}{\sqrt{2}}[0\ 1\ -1]^T$. Thus, a singular value decomposition of $A$ is $U\Sigma V^T$, where

$$U = \begin{bmatrix} 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad \text{and} \quad V = \begin{bmatrix} 0 & 1 & 0 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
(b)

Use the singular value decomposition to find a basis for Col A, Row A, and Nul A.

Solution.

Recall that the right singular vectors of an $m \times n$ matrix $A$ of rank $r$ form an orthonormal basis $\{v_1, v_2, v_3, \ldots, v_n\}$ of eigenvectors of $A^TA$ such that $(A^TA)v_i = \lambda_iv_i$ for $i$ from 1 to $n$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. These vectors are the columns of the matrix $V = [v_1\ v_2\ \cdots\ v_n]$ in a singular value decomposition of $A$. For $i$ from 1 to $r$, we let $u_i = \frac{Av_i}{||Av_i||}$. Then the set $\{u_1, u_2, \ldots, u_r\}$ forms an orthonormal basis of Col A. We extend this set to an orthonormal basis $\{u_1, u_2, \ldots, u_r, u_{r+1}, u_{r+2}, \ldots, u_m\}$ of $\mathbb{R}^m$. Recall also that $Av_j = 0$ for $r+1 \le j \le n$. Since dim(Nul A) + dim(Col A) = $n$, it follows that the vectors $v_{r+1}$, $v_{r+2}$, $\ldots$, $v_n$ form a basis for Nul A. Finally, the set $\{v_1, v_2, \ldots, v_r\}$ is a basis for Row A. So in our example, we have $m = 3$, $n = 4$, $v_1 = \frac{1}{\sqrt{2}}[0\ 1\ 1\ 0]^T$, $v_2 = [1\ 0\ 0\ 0]^T$, $v_3 = \frac{1}{\sqrt{2}}[0\ 1\ -1\ 0]^T$, and $v_4 = [0\ 0\ 0\ 1]^T$. Since the singular values of $A$ are 3, 2, 1, and 0, it follows that $r = \text{rank}(A) = 3$. We also have $u_1 = \frac{1}{\sqrt{2}}[0\ 1\ 1]^T$, $u_2 = [1\ 0\ 0]^T$, and $u_3 = \frac{1}{\sqrt{2}}[0\ 1\ -1]^T$. So

$$\left\{ \frac{1}{\sqrt{2}}[0\ 1\ 1\ 0]^T,\ [1\ 0\ 0\ 0]^T,\ \frac{1}{\sqrt{2}}[0\ 1\ -1\ 0]^T \right\}$$

is a basis for Row A and

$$\left\{ [0\ 0\ 0\ 1]^T \right\}$$

is a basis for Nul A. Finally, the set

$$\left\{ \frac{1}{\sqrt{2}}[0\ 1\ 1]^T,\ [1\ 0\ 0]^T,\ \frac{1}{\sqrt{2}}[0\ 1\ -1]^T \right\}$$

is a basis for Col A.
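A quick numerical check of this example (assuming NumPy) confirms both the singular values and the reconstruction:

```python
import numpy as np

# Verify Example 29.9: singular values 3, 2, 1 and A = U Sigma V^T.
A = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 0.0]])
s = 1 / np.sqrt(2)

U = np.array([[0, 1, 0],
              [s, 0, s],
              [s, 0, -s]])
Sigma = np.array([[3.0, 0, 0, 0],
                  [0, 2.0, 0, 0],
                  [0, 0, 1.0, 0]])
V = np.array([[0, 1, 0, 0],
              [s, 0, s, 0],
              [s, 0, -s, 0],
              [0, 0, 0, 1.0]])

print(np.linalg.svd(A, compute_uv=False))    # [3. 2. 1.]
print(np.allclose(U @ Sigma @ V.T, A))       # True
```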

Example 29.10.

Let

$$A = \begin{bmatrix} 2 & 5 & 4 \\ 6 & 3 & 0 \\ 6 & 3 & 0 \\ 2 & 5 & 4 \end{bmatrix}.$$

The eigenvalues of $A^TA$ are $\lambda_1 = 144$, $\lambda_2 = 36$, and $\lambda_3 = 0$ with corresponding eigenvectors

$$w_1 = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}, \quad w_2 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}, \quad \text{and} \quad w_3 = \begin{bmatrix} 1 \\ -2 \\ 2 \end{bmatrix}.$$

In addition,

$$Aw_1 = \begin{bmatrix} 18 \\ 18 \\ 18 \\ 18 \end{bmatrix} \quad \text{and} \quad Aw_2 = \begin{bmatrix} 9 \\ -9 \\ -9 \\ 9 \end{bmatrix}.$$
(a)

Find orthogonal matrices U and V, and the matrix Σ, so that UΣVT is a singular value decomposition of A.

Solution.

Normalizing the eigenvectors $w_1$, $w_2$, and $w_3$ to unit eigenvectors $v_1$, $v_2$, and $v_3$, respectively, gives us an orthogonal matrix

$$V = \begin{bmatrix} \frac{2}{3} & -\frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & -\frac{2}{3} \\ \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \end{bmatrix}.$$

Now $Av_i = A\frac{w_i}{||w_i||} = \frac{1}{||w_i||}Aw_i$, so normalizing the vectors $Av_1$ and $Av_2$ gives us vectors

$$u_1 = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad u_2 = \frac{1}{2}\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}$$

that are the first two columns of our matrix $U$. Given that $U$ is a $4 \times 4$ matrix, we need to find two other vectors orthogonal to $u_1$ and $u_2$ that will combine with $u_1$ and $u_2$ to form an orthogonal basis for $\mathbb{R}^4$. Letting $z_1 = [1\ 1\ 1\ 1]^T$, $z_2 = [1\ -1\ -1\ 1]^T$, $z_3 = [1\ 0\ 0\ 0]^T$, and $z_4 = [0\ 1\ 0\ 0]^T$, a computer algebra system shows that the reduced row echelon form of the matrix $[z_1\ z_2\ z_3\ z_4]$ is $I_4$, so the vectors $z_1$, $z_2$, $z_3$, $z_4$ are linearly independent. Letting $w_1 = z_1$ and $w_2 = z_2$, the Gram-Schmidt process shows that the set $\{w_1, w_2, w_3, w_4\}$ is an orthogonal basis for $\mathbb{R}^4$, where

$$w_3 = z_3 - \frac{z_3 \cdot w_1}{w_1 \cdot w_1}w_1 - \frac{z_3 \cdot w_2}{w_2 \cdot w_2}w_2 = [1\ 0\ 0\ 0]^T - \frac{1}{4}[1\ 1\ 1\ 1]^T - \frac{1}{4}[1\ -1\ -1\ 1]^T = \frac{1}{4}[2\ 0\ 0\ -2]^T$$

and (using $[1\ 0\ 0\ -1]^T$ for $w_3$)

$$w_4 = z_4 - \frac{z_4 \cdot w_1}{w_1 \cdot w_1}w_1 - \frac{z_4 \cdot w_2}{w_2 \cdot w_2}w_2 - \frac{z_4 \cdot w_3}{w_3 \cdot w_3}w_3 = [0\ 1\ 0\ 0]^T - \frac{1}{4}[1\ 1\ 1\ 1]^T + \frac{1}{4}[1\ -1\ -1\ 1]^T - 0 = \frac{1}{4}[0\ 2\ -2\ 0]^T.$$

The set $\{u_1, u_2, u_3, u_4\}$, where $u_1 = \frac{1}{2}[1\ 1\ 1\ 1]^T$, $u_2 = \frac{1}{2}[1\ -1\ -1\ 1]^T$, $u_3 = \frac{1}{\sqrt{2}}[1\ 0\ 0\ -1]^T$, and $u_4 = \frac{1}{\sqrt{2}}[0\ 1\ -1\ 0]^T$, is an orthonormal basis for $\mathbb{R}^4$ and we can let

$$U = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{2} & -\frac{1}{2} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{2} & -\frac{1}{2} & 0 & -\frac{1}{\sqrt{2}} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{\sqrt{2}} & 0 \end{bmatrix}.$$

The singular values of $A$ are $\sigma_1 = \sqrt{\lambda_1} = 12$ and $\sigma_2 = \sqrt{\lambda_2} = 6$, and so

$$\Sigma = \begin{bmatrix} 12 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Therefore, a singular value decomposition of $A$ is $A = U\Sigma V^T$:

$$A = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{2} & -\frac{1}{2} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{2} & -\frac{1}{2} & 0 & -\frac{1}{\sqrt{2}} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{\sqrt{2}} & 0 \end{bmatrix} \begin{bmatrix} 12 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \end{bmatrix}.$$
(b)

Determine the best rank 1 approximation to A.

Solution.

The outer product decomposition of $A$ is

$$A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T.$$

So the rank one approximation to A is

$$\sigma_1u_1v_1^T = 12\left(\frac{1}{2}\right)\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}\begin{bmatrix} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{bmatrix} = \begin{bmatrix} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 4 & 4 & 2 \\ 4 & 4 & 2 \end{bmatrix}.$$

Note that the rows in this rank one approximation are the averages of the two distinct rows in the matrix A, which makes sense considering that this is the closest rank one matrix to A.

Subsection Summary

We learned about the singular value decomposition of a matrix.

  • The operator norm of an m×n matrix A is

    $$||A|| = \max_{||x|| \ne 0} \frac{||Ax||}{||x||} = \max_{||x||=1} ||Ax||.$$

    The operator norm of a matrix tells us that the size of the action of an $m \times n$ matrix is determined by its action on the unit sphere in $\mathbb{R}^n$.

  • A singular value decomposition of an $m \times n$ matrix $A$ is of the form $A = U\Sigma V^T$, where

    • $V = [v_1\ v_2\ v_3\ \cdots\ v_n]$, where $\{v_1, v_2, v_3, \ldots, v_n\}$ is an orthonormal basis of eigenvectors of $A^TA$ such that $(A^TA)v_i = \lambda_iv_i$ for $i$ from 1 to $n$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$,

    • $U = [u_1\ u_2\ \cdots\ u_m]$, where $u_i = \frac{Av_i}{||Av_i||}$ for $i$ from 1 to $r$, and this orthonormal basis of Col A is extended to an orthonormal basis $\{u_1, u_2, \ldots, u_r, u_{r+1}, u_{r+2}, \ldots, u_m\}$ of $\mathbb{R}^m$,

    • $\Sigma$ is the $m \times n$ matrix whose first $r$ diagonal entries are $\sigma_i = \sqrt{\lambda_i} > 0$ for $i$ from 1 to $r$ and whose remaining entries are 0.

      A singular value decomposition is important in that every matrix has a singular value decomposition, and a singular value decomposition has a variety of applications including scientific computing and digital signal processing, image compression, principal component analysis, web searching through latent semantic indexing, and seismology.

  • The vectors $u_1$, $u_2$, $\ldots$, $u_r$ in an SVD for an $m \times n$ matrix $A$ form a basis for Col A, while the vectors $v_{r+1}$, $v_{r+2}$, $\ldots$, $v_n$ form a basis for Nul A. Also, the set $\{v_1, v_2, \ldots, v_r\}$ is a basis for Row A.

  • Let A have an SVD as in the second bullet. Decomposing A as

    $$A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \sigma_3u_3v_3^T + \cdots + \sigma_ru_rv_r^T$$

    is an outer product decomposition of A. An outer product decomposition allows us to approximate A with smaller rank matrices. For example, the matrix σ1u1v1T is the best rank 1 approximation to A, σ1u1v1T+σ2u2v2T is the best rank 2 approximation, and so on.

Exercises

1.

Find a singular value decomposition of the following matrices.

(e)

$$\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 0 \end{bmatrix}$$

2.

Let $A$ be an $m \times n$ matrix of rank $r$ with singular value decomposition $U\Sigma V^T$, where $U = [u_1\ u_2\ \cdots\ u_m]$ and $V = [v_1\ v_2\ \cdots\ v_n]$. We have seen that the set $\{u_1, u_2, \ldots, u_r\}$ is a basis for Col A, and the vectors $v_{r+1}$, $v_{r+2}$, $\ldots$, $v_n$ form a basis for Nul A. In this exercise we examine the set $\{v_1, v_2, \ldots, v_r\}$ and determine what this set tells us about Row A.

(a)

Find a singular value decomposition for AT.

Hint.

Use the singular value decomposition $U\Sigma V^T$ for $A$.

(b)

Explain why the result of (a) shows that the set {v1,v2,,vr} is a basis for Row A.

3.

Let $A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \\ 3 & 3 \end{bmatrix}$.

(a)

Find the singular values of A.

(b)

Find a singular value decomposition of A.

(c)

Use a singular value decomposition to find orthonormal bases for the following:

(ii)

Col A

(iii)

Row A

4.

Let A have the singular value decomposition as in (29.2).

(b)

Explain why $||A|| = \sigma_1$.

Hint.

The set $\{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis of $\mathbb{R}^n$. Use this to show that $||Ax||^2 \le \sigma_1^2$ for any unit vector $x$ in $\mathbb{R}^n$.

5.

Show that $A$ and $A^T$ have the same nonzero singular values. How are their singular value decompositions related?

Hint.

Find the transpose of an SVD for A.

6.

The vectors $v_i$ that form the columns of the matrix $V$ in a singular value decomposition of a matrix $A$ are eigenvectors of $A^TA$. In this exercise we investigate the vectors $u_i$ that make up the columns of the matrix $U$ in a singular value decomposition of a matrix $A$ for each $i$ between 1 and the rank of $A$, and their connection to the matrix $AA^T$.

(a)

Let $A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$. A singular value decomposition of $A$ is $U\Sigma V^T$, where

$$U = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad V = \begin{bmatrix} \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \end{bmatrix}.$$
(i)

Determine the rank $r$ of $A^TA$ and identify the vectors $u_1$, $u_2$, $\ldots$, $u_r$.

(ii)

Calculate $AA^Tu_i$ for each $i$ between 1 and $r$. How is $AA^Tu_i$ related to $u_i$?

(b)

Now we examine the result of part (a) in general. Let $A$ be an arbitrary matrix. Calculate $AA^Tu_i$ for $1 \le i \le \text{rank}(A)$ and determine specifically how $AA^Tu_i$ is related to $u_i$. What does this tell us about the vectors $u_i$ and the matrix $AA^T$?

(c)

Now show in general that the columns of $U$ are orthonormal eigenvectors for $AA^T$. (That is, what can we say about the vectors $u_i$ if $i > \text{rank}(A)$?)

7.

If $A$ is a symmetric matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$, what is $||A||$? Justify your answer.

8.

Let $A$ be an $n \times n$ symmetric matrix.

(a)

Show that if $v$ is an eigenvector of $A$ with eigenvalue $\lambda$, then $v$ is an eigenvector of $A^TA$. What is the corresponding eigenvalue?

(b)

Show that if $v$ is an eigenvector of $A^TA$ with non-negative eigenvalue $\lambda$, then $Av$ is an eigenvector of $A^TA$. What is the corresponding eigenvalue?

(c)

Suppose $U\Sigma V^T$ is a singular value decomposition of $A$. Explain why $V\Sigma V^T$ is also a singular value decomposition of $A$.

9.

Let $u_1$, $u_2$, $\ldots$, $u_r$ and $v_1$, $v_2$, $\ldots$, $v_r$ be the vectors found in a singular value decomposition of a matrix $A$, where $r$ is the rank of $A$. Show that $u_iv_i^T$ is a rank 1 matrix for each $i$.

10.

Is it possible for a matrix A to have a singular value decomposition UΣVT in which U=V? If no, explain why. If yes, determine for which matrices we can have U=V.

11.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

If $\sigma$ is a singular value of a matrix $A$, then $\sigma$ is an eigenvalue of $A^TA$.

(b) True/False.

A set of right singular vectors of a matrix $A$ is also a set of left singular vectors of $A^T$.

(c) True/False.

The transpose of a singular value decomposition of a matrix $A$ is a singular value decomposition for $A^T$.

(d) True/False.

Similar matrices have the same singular values.

(e) True/False.

If $A$ is an $n \times n$ matrix, then the singular values of $A^2$ are the squares of the singular values of $A$.

(f) True/False.

The Σ matrix in an SVD of A is unique.

(g) True/False.

The matrices U,V in an SVD of A are unique.

(h) True/False.

If A is a positive definite matrix, then an orthogonal diagonalization of A is an SVD of A.

Subsection Project: Latent Semantic Indexing

As an elementary example to illustrate the idea behind Latent Semantic Indexing (LSI), consider the problem of creating a program to search a collection of documents for words, or words related to a given word. Document collections are usually very large, but we use a small example for illustrative purposes. A standard example that is given in several publications 51  is the following. Suppose we have nine documents c1 through c5 (titles of documents about human-computer interaction) and m1 through m4 (titles of graph theory papers) that make up our library:

  • c1: Human machine interface for ABC computer applications

  • c2: A survey of user opinion of computer system response time

  • c3: The EPS user interface management system

  • c4: System and human system engineering testing of EPS

  • c5: Relation of user perceived response time to error measurement

  • m1: The generation of random, binary, ordered trees

  • m2: The intersection graph of paths in trees

  • m3: Graph minors IV: Widths of trees and well-quasi-ordering

  • m4: Graph minors: A survey

To make a searchable database, one might start by creating a list of key terms that appear in the documents (generally removing common words such as “a”, “the”, “of”, etc., called stop words; these words contribute little, if any, context). In our documents we identify the key words that are shown in italics. (Note that we are just selecting key words to make our example manageable, not necessarily identifying the most important words.) Using the key words we create a term-document matrix. The term-document matrix is the matrix in which the terms form the rows and the documents the columns. If $A = [a_{ij}]$ is the term-document matrix, then $a_{ij}$ counts the number of times word $i$ appears in document $j$. The term-document matrix $A$ for our library is

Figure 29.11. Term-document matrix for our library

One of our goals is to rate the pages in our library for relevance to a search query. For example, suppose we want to rate the pages for the query survey, computer. This query can be represented by the vector $q = [0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 1\ 0\ 0\ 0]^T$.

Project Activity 29.5.

In a standard term-matching search with m×n term-document matrix A, a query vector q would be matched with the terms to determine the number of matches. The matching counts the number of times each document agrees with the query.

(a)

Explain why this matching is accomplished by the matrix-vector product $A^Tq$.

(b)

Let $y = [y_1\ y_2\ \cdots\ y_n]^T = A^Tq$. Explain why $y_i = \cos(\theta_i)\,||a_i||\,||q||$, where $a_i$ is the $i$th column of $A$ and $\theta_i$ is the angle between $a_i$ and $q$.

(c)

We can use the cosine calculation from part (b) to compare matches to our query: the closer the cosine is to 1, the better the match (dividing by the product of the norms essentially converts all vectors to unit vectors for comparison purposes). This is often referred to as the cosine distance. Calculate the cosines of the $\theta_i$ for our example of the query $q = [0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 1\ 0\ 0\ 0]^T$. Order the documents from best to worst match for this query.
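A sketch of this computation follows (assuming NumPy, and assuming the 12 key terms of Figure 29.11 in the order human, interface, computer, user, system, response, time, EPS, survey, trees, graph, minors, as in the Deerwester et al. paper cited below; the matrix entries are an assumption reconstructed from that standard example):

```python
import numpy as np

# Term-matching by cosine distance against the assumed 12x9 term-document matrix
# (rows: human, interface, computer, user, system, response, time, EPS, survey,
#  trees, graph, minors; columns: c1-c5, m1-m4).
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],   # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],   # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],   # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],   # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],   # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],   # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],   # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],   # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],   # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],   # minors
], dtype=float)

q = np.zeros(12)
q[[2, 8]] = 1                      # query: computer, survey

y = A.T @ q                        # raw match counts per document
cosines = y / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))
print(np.round(cosines, 3))        # only c1, c2, and m4 are nonzero
```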

Though we were able to rate the documents in Project Activity 29.5 using the cosine distance, the result is less than satisfying. Documents c3, c4, and c5 are all related to computers but do not appear at all in our results. This is a problem with language searches: we don't want to compare just words; we also need to compare the concepts the words represent. The fact that words can represent different things implies that a random choice of words by different authors can introduce noise into the word-concept relationship. To filter out this noise, we can apply the singular value decomposition to find a smaller set of concepts that better represents the relationships. Before we do so, we examine some useful properties of the term-document matrix.

Project Activity 29.6.

Let $A = [a_1\ a_2\ \cdots\ a_9]$, where $a_1$, $a_2$, $\ldots$, $a_9$ are the columns of $A$.

(a)

In Project Activity 29.5 you should have seen that $b_{ij} = a_i^Ta_j = a_i \cdot a_j$. Assume for the moment that all of the entries in $A$ are either 0 or 1. Explain why, in this case, the dot product $a_i \cdot a_j$ tells us how many terms documents $i$ and $j$ have in common. Also, the matrix $A^TA$ takes dot products of the columns of $A$, which describe what happens in each document, and so captures document-document interactions. For these reasons, we call $A^TA$ the document-document matrix.

(b)

Use appropriate technology to calculate the entries of the matrix $C = [c_{ij}] = AA^T$. This matrix is the term-term matrix. Assume for the moment that all of the entries in $A$ are either 0 or 1. Explain why if terms $i$ and $j$ occur together in $k$ documents, then $c_{ij} = k$.

The nature of the term-term and document-document matrices makes it realistic to think about an SVD.

Project Activity 29.7.

To see why a singular value decomposition might be useful, suppose our term-document matrix $A$ has singular value decomposition $A = U\Sigma V^T$. (Don't actually calculate the SVD yet.)

(a)

Show that the document-document matrix $A^TA$ satisfies $A^TA = (V\Sigma^T)(V\Sigma^T)^T$. This means that we can compare document $i$ and document $j$ using the dot product of row $i$ and row $j$ of the matrix product $V\Sigma^T$.

(b)

Show that the term-term matrix $AA^T$ satisfies $AA^T = (U\Sigma)(U\Sigma)^T$. Thus we can compare term $i$ and term $j$ using the dot product of row $i$ and row $j$ of the matrix product $U\Sigma$. (Exercise 6 shows that the columns of $U$ are orthogonal eigenvectors of $AA^T$.)
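Both identities are easy to confirm numerically. The following sketch (assuming NumPy) checks them for a random matrix, since they depend only on $A = U\Sigma V^T$:

```python
import numpy as np

# Check A^T A = (V Sigma^T)(V Sigma^T)^T and A A^T = (U Sigma)(U Sigma)^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
V = Vt.T

print(np.allclose(A.T @ A, (V @ Sigma.T) @ (V @ Sigma.T).T))   # True
print(np.allclose(A @ A.T, (U @ Sigma) @ (U @ Sigma).T))       # True
```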

As we will see, the connection of the matrices U and V to documents and terms that we saw in Project Activity 29.7 will be very useful when we use the SVD of the term-document matrix to reduce dimensions to a “concept” space. We will be able to interpret the rows of the matrices U and V as providing coordinates for terms and documents in this space.

Project Activity 29.8.

The singular value decomposition (SVD) allows us to produce new, improved term-document matrices. For this activity, use the term-document matrix A in Figure 29.11.

(a)

Use appropriate technology to find a singular value decomposition of A so that A=UΣVT. Print your entries to two decimal places (but keep as many as possible for computational purposes).

Each singular value tells us how important its semantic dimension is. If we remove the smaller singular values (the less important dimensions), we retain the important information but eliminate minor details and noise. We produce a new term-document matrix Ak by keeping the largest k of the singular values and discarding the rest. This gives us an approximation

$$A_k = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \cdots + \sigma_ku_kv_k^T$$

using the outer product decomposition, where $\sigma_1$, $\sigma_2$, $\ldots$, $\sigma_k$ are the $k$ largest singular values of $A$. Note that if $A$ is an $m \times n$ matrix, then letting $U_k = [u_1\ u_2\ \cdots\ u_k]$ (an $m \times k$ matrix), $\Sigma_k$ the $k \times k$ diagonal matrix with the first $k$ singular values along the diagonal, and $V_k = [v_1\ v_2\ \cdots\ v_k]$ (an $n \times k$ matrix, so that $V_k^T$ is $k \times n$), we can also write $A_k = U_k\Sigma_kV_k^T$. This is sometimes referred to as a reduced SVD. Find $U_2$, $\Sigma_2$, and $V_2^T$, and find the new term-document matrix $A_2$.
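As a computational sketch (assuming NumPy and the same assumed term-document matrix as in the earlier sketch), the reduced SVD is just a truncation of the full one:

```python
import numpy as np

# Rank-k reduced SVD A_k = U_k Sigma_k V_k^T of the (assumed) term-document
# matrix, with k = 2.
A = np.array([
    [1,0,0,1,0,0,0,0,0], [1,0,1,0,0,0,0,0,0], [1,1,0,0,0,0,0,0,0],
    [0,1,1,0,1,0,0,0,0], [0,1,1,2,0,0,0,0,0], [0,1,0,0,1,0,0,0,0],
    [0,1,0,0,1,0,0,0,0], [0,0,1,1,0,0,0,0,0], [0,1,0,0,0,0,0,0,1],
    [0,0,0,0,0,1,1,1,0], [0,0,0,0,0,0,1,1,1], [0,0,0,0,0,0,0,1,1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A)
U_k, Sigma_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

A_k = U_k @ Sigma_k @ Vt_k          # the new rank-2 term-document matrix
print(np.round(A_k, 2))
```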

Once we have our term-document matrix, there are three basic comparisons to make: comparing terms, comparing documents, and comparing terms and documents. Term-document matrices are usually very large, with dimension being the number of terms. By using a reduced SVD we can create a much smaller approximation. In our example, the matrix Ak in Project Activity 29.8 reduces our problem to a k-dimensional space. Intuitively, we can think of LSI as representing terms as averages of all of the documents in which they appear and documents as averages of all of the terms they contain. Through this process, LSI attempts to combine the surface information in our library into a deeper abstraction (the “concept” space) that captures the mutual relationships between terms and documents.

We now need to understand how we can represent documents and terms in this smaller space where $A_k = U_k\Sigma_kV_k^T$. Informally, we can consider the rows of $U_k$ as representing the coordinates of each term in the lower dimensional concept space and the columns of $V_k^T$ as the coordinates of the documents, while the entries of $\Sigma_k$ tell us how important each semantic dimension is. The dot product of two row vectors of $A_k$ indicates how terms compare across documents. This product is $A_kA_k^T$. Just as in Project Activity 29.7, we have $A_kA_k^T = (U_k\Sigma_k)(U_k\Sigma_k)^T$. In other words, if we consider the rows of $U_k\Sigma_k$ as coordinates for terms, then the dot products of these rows give us term-to-term comparisons. (Note that multiplying $U_k$ by $\Sigma_k$ just stretches the rows of $U_k$ by the singular values according to the importance of the concept represented by each singular value.) Similarly, the dot products between columns of $A_k$ provide a comparison of documents. This comparison is given by $A_k^TA_k = (V_k\Sigma_k^T)(V_k\Sigma_k^T)^T$ (again by Project Activity 29.7). So we can consider the rows of $V_k\Sigma_k^T$ as providing coordinates for documents.

Project Activity 29.9.

We have seen how to compare terms to terms and documents to documents. The matrix $A_k$ itself compares terms to documents. Show that $A_k = (U_k\Sigma_k^{1/2})(V_k\Sigma_k^{1/2})^T$, where $\Sigma_k^{1/2}$ is the diagonal matrix of the same size as $\Sigma_k$ whose diagonal entries are the square roots of the corresponding diagonal entries in $\Sigma_k$. Thus, all useful comparisons of terms and documents can be made using the rows of the matrices $U$ and $V$, scaled in some way by the singular values in $\Sigma$.

To work in this smaller concept space, it is important to be able to find appropriate comparisons to objects that appeared in the original search. For example, to complete the latent structure view of the system, we must also convert the original query to a representation within the new term-document system represented by Ak. This new representation is called a pseudo-document.

Figure 29.12. Terms in the reduced concept space
Figure 29.13. Documents in the reduced concept space

Project Activity 29.10.

For an original query $q$, we start with its term vector $a_q$ (a vector in the coordinate system determined by the columns of $A$) and find a representation $v_q$ that we can use as a column of $V^T$ in the document-document comparison matrix. If this representation were perfect, then it would take a real document in the original system given by $A$ and produce the corresponding column of $U$ if we used the full SVD. In other words, we would have $a_q = U\Sigma v_q^T$.

(a)

Use the fact that $A_k = U_k\Sigma_kV_k^T$ to show that $V_k = A_k^TU_k\Sigma_k^{-1}$. It follows that $q$ is transformed into the query $q_k = q^TU_k\Sigma_k^{-1}$.

(b)

In our example, using $k = 2$, the terms can now be represented as 2-dimensional vectors (the rows of $U_2$, see Figure 29.12), or as points in the plane. More specifically, human is represented by the vector (to two decimal places) $[0.22\ -0.11]^T$, interface by $[0.20\ -0.07]^T$, etc. Similarly, the documents are represented by rows of $V_2$ (see Figure 29.13), so that the document c1 is represented by $[0.20\ -0.06]^T$, c2 by $[0.61\ 0.17]^T$, etc. From this perspective we can visualize these documents in the plane. Plot the documents and the query in the 2-dimensional concept space. Then calculate the cosine distances from the query to the documents in this space. Which documents now give the best three matches to the query? Compare the matches to your plot.
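A sketch of the whole pipeline (assuming NumPy and the same assumed term-document matrix and term ordering as in the earlier sketches) maps the query into the concept space and ranks the documents there:

```python
import numpy as np

# Map the query into the 2-dimensional concept space via q_k = q^T U_k Sigma_k^{-1}
# and rank documents by cosine distance there.
A = np.array([
    [1,0,0,1,0,0,0,0,0], [1,0,1,0,0,0,0,0,0], [1,1,0,0,0,0,0,0,0],
    [0,1,1,0,1,0,0,0,0], [0,1,1,2,0,0,0,0,0], [0,1,0,0,1,0,0,0,0],
    [0,1,0,0,1,0,0,0,0], [0,0,1,1,0,0,0,0,0], [0,1,0,0,0,0,0,0,1],
    [0,0,0,0,0,1,1,1,0], [0,0,0,0,0,0,1,1,1], [0,0,0,0,0,0,0,1,1],
], dtype=float)
q = np.zeros(12)
q[[2, 8]] = 1                            # query: computer, survey

k = 2
U, s, Vt = np.linalg.svd(A)
U_k, Sigma_k = U[:, :k], np.diag(s[:k])

q_k = q @ U_k @ np.linalg.inv(Sigma_k)   # query coordinates in concept space
docs = Vt[:k, :].T                       # rows: document coordinates (rows of V_2)

cosines = docs @ q_k / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_k))
for name, c in zip(["c1","c2","c3","c4","c5","m1","m2","m3","m4"], cosines):
    print(name, round(float(c), 3))      # every document now receives a score
```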

As we can see from Project Activity 29.10, the original query had no match at all with any documents except c1, c2, and m4. In the new concept space, the query has some connection to every document. So LSI has made semantic connections between the terms and documents that were not present in the original term-document matrix, which gives us better results for our search.

Technically this definition should be in terms of a supremum, but because the equivalent definition restricts the x's to a compact subset, the sup is achieved and we can use max.
e.g., Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, 41: 391–407; and Landauer, T. and Dumais, S. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychological Review, 1997, Vol. 104, No. 2, 211–240.