By the end of this section, you should be able to give precise and thorough answers to the questions listed below. You may want to keep these questions in mind to focus your thoughts as you complete the section.
What is the operator norm of a matrix and what does it tell us about the matrix?
What is a singular value decomposition of a matrix? Why is a singular value decomposition important?
How does a singular value decomposition relate fundamental subspaces connected to a matrix?
What is an outer product decomposition of a matrix and how is it useful?
Effective search engines search for more than just words. Language is complex and search engines must deal with the fact that there are often many ways to express a given concept (this is called synonymy: multiple words can have the same meaning), and that a single word can have multiple meanings (polysemy). As a consequence, a search on a word may provide irrelevant matches (e.g., searching for derivative could return pages on mathematics or on financial securities), or you might search for articles on cats when the paper you really want uses the word felines. A better search engine will not necessarily try to match terms, but instead retrieve information based on concept or intent. Latent Semantic Indexing (LSI) (also called Latent Semantic Analysis), developed in the late 1980s, helps search engines determine concept and intent in order to provide more accurate and relevant results. LSI essentially works by providing underlying (latent) relationships between words (semantics) that search engines need to provide context and understanding (indexing). LSI provides a mapping of both words and documents into a lower dimensional “concept” space, and performs the search in this new space. The mapping is provided by the singular value decomposition.
The singular value decomposition (SVD) of a matrix is an important and useful matrix decomposition. Unlike other matrix decompositions, every matrix has a singular value decomposition. The SVD is used in a variety of applications including scientific computing, digital signal processing, image compression, principal component analysis, web searching through latent semantic indexing, and seismology. Recall that the eigenvector decomposition of an $n \times n$ diagonalizable matrix $A$ has the form $A = PDP^{-1}$, where the columns of the matrix $P$ are $n$ linearly independent eigenvectors of $A$ and the diagonal entries of the diagonal matrix $D$ are the eigenvalues of $A$. The singular value decomposition does something similar for any matrix of any size. One of the keys to the SVD is that the matrix $A^TA$ is symmetric for any matrix $A$.
Before we introduce the Singular Value Decomposition, let us work through some preliminaries to motivate the idea. The first is to provide an answer to the question “How ‘big’ is a matrix?” There are many ways to interpret and answer this question, but a substantial (and useful) answer should involve more than just the dimensions of the matrix. A good measure of the size of a matrix, which we will refer to as the norm of the matrix, should take into account the action of the linear transformation defined by the matrix on vectors. This then will lead to questions about how difficult or easy it is to solve a matrix equation $A\mathbf{x} = \mathbf{b}$.
If we want to incorporate the action of a matrix $A$ into a calculation of the norm of $A$, we might think of measuring how much $A$ can change a vector $\mathbf{x}$. This could lead us to using $||A\mathbf{x}||$ as some sort of measure of a norm of $A$. However, since $||A(c\mathbf{x})|| = |c| \, ||A\mathbf{x}||$ for any scalar $c$, scaling a vector by a large scalar will produce a large norm, so this is not a viable definition of a norm. We could instead measure the relative effect that $A$ has on a vector $\mathbf{x}$ as $\frac{||A\mathbf{x}||}{||\mathbf{x}||}$, since this ratio does not change when $\mathbf{x}$ is multiplied by a nonzero scalar. The largest of all of these ratios would provide a good sense of how much $A$ can change vectors. Thus, we define the operator norm of a matrix $A$ as
$$||A|| = \max_{\mathbf{x} \neq \mathbf{0}} \frac{||A\mathbf{x}||}{||\mathbf{x}||}.$$
Due to the linearity of matrix multiplication, we can restrict ourselves to unit vectors for an equivalent definition of the operator norm of the matrix $A$ as
$$||A|| = \max_{||\mathbf{x}|| = 1} ||A\mathbf{x}||.$$
The operator norm of a matrix tells us that the size of the action of an $m \times n$ matrix $A$ can be determined by its action on the unit sphere in $\mathbb{R}^n$ (the unit sphere is the set of terminal points of unit vectors). Let us consider two examples.
Consider a $2 \times 2$ matrix $A$. We can draw a graph to see the action of $A$ on the unit circle. A picture of the set $\{A\mathbf{x} : ||\mathbf{x}|| = 1\}$ is shown in Figure 29.3.
Figure 29.3. The image of the unit circle under the action of $A$.
It appears that $A$ transforms the unit circle into an ellipse. To find $||A||$ we want to maximize $||A\mathbf{x}||$ for $\mathbf{x}$ on the unit circle. This is the same as maximizing
$$||A\mathbf{x}||^2 = (A\mathbf{x}) \cdot (A\mathbf{x}) = \mathbf{x}^T A^T A \mathbf{x}.$$
Now $A^TA$ is a symmetric matrix, so we can orthogonally diagonalize $A^TA$. The eigenvalues of $A^TA$ are 32 and 2. Let $P = [\mathbf{v}_1 \ \mathbf{v}_2]$, where $\mathbf{v}_1$ is a unit eigenvector of $A^TA$ with eigenvalue 32 and $\mathbf{v}_2$ is a unit eigenvector of $A^TA$ with eigenvalue 2. Then $P$ is an orthogonal matrix such that $P^T(A^TA)P = D = \begin{bmatrix} 32 & 0 \\ 0 & 2 \end{bmatrix}$. It follows that
$$\mathbf{x}^T A^T A \mathbf{x} = \mathbf{x}^T P D P^T \mathbf{x} = (P^T\mathbf{x})^T D (P^T\mathbf{x}).$$
Now $P$ is orthogonal, so $||P\mathbf{x}|| = ||\mathbf{x}||$ and $P$ maps the unit circle to the unit circle. Moreover, if $\mathbf{x}$ is on the unit circle, then $\mathbf{y} = P^T\mathbf{x}$ is also on the unit circle and $\mathbf{x} = P\mathbf{y}$. So every point $\mathbf{x}$ on the unit circle corresponds to a point $\mathbf{y}$ on the unit circle. Thus, the forms $\mathbf{x}^T A^T A \mathbf{x}$ and $\mathbf{y}^T D \mathbf{y}$ take on exactly the same values over all points on the unit circle. Now we just need to find the maximum value of $\mathbf{y}^T D \mathbf{y}$. This turns out to be relatively easy since $D$ is a diagonal matrix.
Let's simplify the notation. Let $\mathbf{y} = P^T\mathbf{x}$. Then our job is to maximize $\mathbf{y}^T D \mathbf{y}$. If $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$, then
$$\mathbf{y}^T D \mathbf{y} = 32y_1^2 + 2y_2^2.$$
We want to find the maximum value of this expression for $\mathbf{y}$ on the unit circle. Note that $y_1^2 + y_2^2 = 1$ and so
$$32y_1^2 + 2y_2^2 \le 32y_1^2 + 32y_2^2 = 32.$$
Since $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is on the unit circle, the expression $32y_1^2 + 2y_2^2$ attains the value 32 at some point on the unit circle, so 32 is the maximum value of $\mathbf{y}^T D \mathbf{y}$ over all $\mathbf{y}$ on the unit circle. While we are at it, we can similarly find the minimum value of $\mathbf{y}^T D \mathbf{y}$ for $\mathbf{y}$ on the unit circle. Since $y_1^2 + y_2^2 = 1$ we see that
$$32y_1^2 + 2y_2^2 \ge 2y_1^2 + 2y_2^2 = 2.$$
Since the expression attains the value 2 at $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ on the unit circle, we can see that $\mathbf{y}^T D \mathbf{y}$ attains the minimum value of 2 on the unit circle.
Now we can return to the expression $\mathbf{x}^T A^T A \mathbf{x} = ||A\mathbf{x}||^2$. Since $\mathbf{x}^T A^T A \mathbf{x}$ assumes the same values as $\mathbf{y}^T D \mathbf{y}$, we can say that the maximum value of $||A\mathbf{x}||^2$ for $\mathbf{x}$ on the unit circle is 32 (and the minimum value is 2). Moreover, the quadratic form $\mathbf{y}^T D \mathbf{y}$ assumes its maximum value when $\mathbf{y} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ or $\begin{bmatrix} -1 \\ 0 \end{bmatrix}$. Thus, the form $\mathbf{x}^T A^T A \mathbf{x}$ assumes its maximum value at the vector $\mathbf{x} = P\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \mathbf{v}_1$ or $-\mathbf{v}_1$. Similarly, the quadratic form attains its minimum value at $\mathbf{x} = \mathbf{v}_2$ or $-\mathbf{v}_2$. We conclude that $||A|| = \sqrt{32} = 4\sqrt{2}$.
Figure 29.4 shows the image of the unit circle under the action of $A$ and the images of $\mathbf{v}_1$ and $\mathbf{v}_2$, where $\mathbf{v}_1$ and $\mathbf{v}_2$ are the two unit eigenvectors of $A^TA$. The image also supports that $||A\mathbf{x}||$ assumes its maximum and minimum values for points on the unit circle at $\mathbf{v}_1$ and $\mathbf{v}_2$.
Figure 29.4. The image of the unit circle under the action of $A$, and the vectors $A\mathbf{v}_1$ and $A\mathbf{v}_2$.
What we have just argued is that the maximum value of $||A\mathbf{x}||$ for $\mathbf{x}$ on the unit sphere is the square root of the largest eigenvalue of $A^TA$ and occurs at a corresponding unit eigenvector.
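As a quick numerical sanity check of this fact, the short Python sketch below compares the square root of the largest eigenvalue of $A^TA$ with the operator norm computed directly; the matrix used here is an arbitrary stand-in, not the example matrix from the text.

```python
import numpy as np

# An arbitrary stand-in matrix (any matrix works for this check).
A = np.array([[3.0, 4.0],
              [1.0, 2.0]])

# Eigenvalues of the symmetric matrix A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)

# The operator norm is the square root of the largest eigenvalue of A^T A,
# which equals the largest singular value that np.linalg.norm(A, 2) returns.
print(np.sqrt(eigvals.max()))
print(np.linalg.norm(A, 2))
```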
This same process works for matrices other than $2 \times 2$ ones. For example, consider a $2 \times 3$ matrix $A$. In this case $A$ maps $\mathbb{R}^3$ to $\mathbb{R}^2$. The image of the unit sphere under left multiplication by $A$ is a filled ellipse as shown in Figure 29.6.
Figure 29.6. The image of the unit sphere under the action of $A$, and the vectors $A\mathbf{v}_1$ and $A\mathbf{v}_2$.
As with the previous example, the norm of $A$ is the square root of the maximum value of $\mathbf{x}^T A^T A \mathbf{x}$, and this maximum value is the dominant eigenvalue of $A^TA$. The eigenvalues of $A^TA$ are $\lambda_1$, $\lambda_2$, and $\lambda_3 = 0$ with corresponding unit eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. So in this case we have $||A|| = \sqrt{\lambda_1}$. The transformation defined by matrix multiplication by $A$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ has a one-dimensional kernel which is spanned by the eigenvector $\mathbf{v}_3$ corresponding to the eigenvalue 0. The image of the transformation is 2-dimensional, and the image of the unit sphere is a filled ellipse where $A\mathbf{v}_1$ gives the major axis of the ellipse and $A\mathbf{v}_2$ gives the minor axis. Essentially, the square roots of the eigenvalues of $A^TA$ tell us how $A$ stretches the image space in each direction.
We have just argued that the image of the unit sphere under the action of an $m \times n$ matrix $A$ is an ellipsoid in $\mathbb{R}^m$ stretched the greatest amount, $\sqrt{\lambda_1}$, in the direction of an eigenvector for the largest eigenvalue ($\lambda_1$) of $A^TA$; the next greatest amount, $\sqrt{\lambda_2}$, in the direction of a unit eigenvector for the second largest eigenvalue ($\lambda_2$) of $A^TA$; and so on.
The Singular Value Decomposition (SVD) is essentially a concise statement of what we saw in the previous section that works for any matrix. We will uncover the SVD in this section.
Preview Activity 29.3 contains the basic ideas behind the Singular Value Decomposition. Let $A$ be an $m \times n$ matrix with real entries. Note that $A^TA$ is a symmetric $n \times n$ matrix and, hence, it can be orthogonally diagonalized. Let $V = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n]$ be an orthogonal matrix whose columns form an orthonormal set of eigenvectors for $A^TA$. For each $i$, let $(A^TA)\mathbf{v}_i = \lambda_i \mathbf{v}_i$. We know
$$(A\mathbf{v}_i) \cdot (A\mathbf{v}_j) = \mathbf{v}_i^T A^T A \mathbf{v}_j = \lambda_j \, (\mathbf{v}_i \cdot \mathbf{v}_j) = 0$$
if $i \neq j$. So the set $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n\}$ is an orthogonal set in $\mathbb{R}^m$. Each of the vectors $A\mathbf{v}_i$ is in Col $A$, and so $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n\}$ is an orthogonal subset of Col $A$. It is possible that $A\mathbf{v}_i = \mathbf{0}$ for some of the $\mathbf{v}_i$ (if $A^TA$ has 0 as an eigenvalue). Let $\mathbf{v}_1$, $\mathbf{v}_2$, $\ldots$, $\mathbf{v}_r$ be the eigenvectors corresponding to the nonzero eigenvalues. Then the set
$$\mathcal{B} = \{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_r\}$$
is a linearly independent set of nonzero orthogonal vectors in Col $A$. Now we will show that $\mathcal{B}$ is a basis for Col $A$. Let $\mathbf{y}$ be a vector in Col $A$. Then $\mathbf{y} = A\mathbf{x}$ for some vector $\mathbf{x}$ in $\mathbb{R}^n$. Recall that the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\ldots$, $\mathbf{v}_n$ form an orthonormal basis of $\mathbb{R}^n$, so we can write $\mathbf{x} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ for some scalars $c_1$, $c_2$, $\ldots$, $c_n$. Since $A\mathbf{v}_i = \mathbf{0}$ for $i > r$, it follows that $\mathbf{y} = A\mathbf{x} = c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2 + \cdots + c_r A\mathbf{v}_r$, so $\mathcal{B}$ spans Col $A$ and is therefore a basis for Col $A$.
Let $A$ be an $m \times n$ matrix of rank $r$. There exist an $m \times m$ orthogonal matrix $U$, an $n \times n$ orthogonal matrix $V$, and an $m \times n$ matrix $\Sigma$ whose first $r$ diagonal entries are the singular values $\sigma_1$, $\sigma_2$, $\ldots$, $\sigma_r$ and whose other entries are 0, such that
$$A = U \Sigma V^T.$$
A Singular Value Decomposition of an $m \times n$ matrix $A$ of rank $r$ can be found as follows.
Find an orthonormal basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ of eigenvectors of $A^TA$ such that $(A^TA)\mathbf{v}_i = \lambda_i \mathbf{v}_i$ for $i$ from 1 to $n$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$, with the first $r$ eigenvalues being positive. The vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\ldots$, $\mathbf{v}_n$ are the right singular vectors of $A$.
Let
$$V = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n].$$
Then $V$ orthogonally diagonalizes $A^TA$.
The singular values of $A$ are the numbers $\sigma_i$, where $\sigma_i = \sqrt{\lambda_i}$ for $i$ from 1 to $n$. Let $\Sigma$ be the $m \times n$ matrix whose first $r$ diagonal entries are $\sigma_1$, $\sigma_2$, $\ldots$, $\sigma_r$ and whose other entries are 0.
For $i$ from 1 to $r$, let $\mathbf{u}_i = \frac{1}{\sigma_i} A\mathbf{v}_i$. Then the set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ forms an orthonormal basis of Col $A$.
Extend the set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ to an orthonormal basis
$$\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\}$$
of $\mathbb{R}^m$. Let
$$U = [\mathbf{u}_1 \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_m].$$
The vectors $\mathbf{u}_1$, $\mathbf{u}_2$, $\ldots$, $\mathbf{u}_m$ are the left singular vectors of $A$, and $A = U\Sigma V^T$ is a singular value decomposition of $A$.
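To make the construction concrete, here is a short Python sketch that carries out these steps numerically for a small stand-in matrix: it orthogonally diagonalizes $A^TA$ to get $V$ and the singular values, forms $\Sigma$, builds $\mathbf{u}_i = \frac{1}{\sigma_i}A\mathbf{v}_i$ for $i \le r$, and extends to an orthonormal basis of $\mathbb{R}^m$. (In practice one would simply call a library routine such as `np.linalg.svd`.)

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])            # stand-in 2x3 matrix of rank 2
m, n = A.shape

# Step 1: orthonormal eigenvectors of A^T A, eigenvalues in decreasing order.
lams, V = np.linalg.eigh(A.T @ A)          # eigh returns increasing order
order = np.argsort(lams)[::-1]
lams, V = lams[order], V[:, order]

# Step 2: singular values and the m x n matrix Sigma.
sigmas = np.sqrt(np.clip(lams, 0, None))
r = int(np.sum(sigmas > 1e-12))            # rank of A
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, sigmas[:min(m, n)])

# Step 3: left singular vectors u_i = (1/sigma_i) A v_i for i = 1, ..., r.
U = A @ V[:, :r] / sigmas[:r]

# Step 4: extend {u_1, ..., u_r} to an orthonormal basis of R^m by
# Gram-Schmidt against the standard basis vectors.
for e in np.eye(m):
    w = e - U @ (U.T @ e)
    if np.linalg.norm(w) > 1e-10:
        U = np.column_stack([U, w / np.linalg.norm(w)])

print(np.allclose(A, U @ Sigma @ V.T))     # True: A = U Sigma V^T
```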
The decomposition
$$A = \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2 \mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r \mathbf{u}_r\mathbf{v}_r^T$$
is called an outer product decomposition of $A$ and tells us everything we learned above about the action of the matrix $A$ as a linear transformation. Each of the products $\mathbf{u}_i\mathbf{v}_i^T$ is a rank 1 matrix (see Exercise 9), and $\sigma_1$ is the largest value $||A\mathbf{x}||$ takes on the unit sphere, $\sigma_2$ is the next largest dilation of the unit sphere, and so on. An outer product decomposition allows us to approximate $A$ with smaller rank matrices. For example, the matrix $\sigma_1\mathbf{u}_1\mathbf{v}_1^T$ is the best rank 1 approximation to $A$, $\sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T$ is the best rank 2 approximation, and so on. This will be very useful in applications, as we will see in the next section.
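As an illustration, the sketch below (with an arbitrary stand-in matrix) builds the rank 1 and rank 2 approximations by truncating the outer product sum and checks their ranks; `np.linalg.svd` supplies the singular values and singular vectors.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])            # an arbitrary stand-in matrix

U, s, Vt = np.linalg.svd(A)

def rank_k_approx(k):
    """Sum of the first k outer product terms sigma_i * u_i * v_i^T."""
    return sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

A1 = rank_k_approx(1)                       # best rank 1 approximation
A2 = rank_k_approx(2)                       # best rank 2 approximation
print(np.linalg.matrix_rank(A1), np.linalg.matrix_rank(A2))
print(np.linalg.norm(A - A1, 2), np.linalg.norm(A - A2, 2))   # errors shrink as k grows
```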
We conclude this section with a short discussion of how a singular value decomposition relates fundamental subspaces of a matrix. We have seen that the vectors $\mathbf{u}_1$, $\mathbf{u}_2$, $\ldots$, $\mathbf{u}_r$ in an SVD for an $m \times n$ matrix $A$ of rank $r$ form a basis for Col $A$. Recall also that $A\mathbf{v}_i = \mathbf{0}$ for $r+1 \le i \le n$. Since $\dim(\text{Nul } A) + \dim(\text{Col } A) = n$, it follows that the vectors $\mathbf{v}_{r+1}$, $\mathbf{v}_{r+2}$, $\ldots$, $\mathbf{v}_n$ form a basis for Nul $A$. As you will show in the exercises, the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a basis for Row $A$. Thus, an SVD for a matrix $A$ tells us about three fundamental vector spaces related to $A$.
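The following minimal sketch reads these three bases directly off a full SVD computed with `np.linalg.svd`; the matrix is again a stand-in.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])             # stand-in matrix of rank 2

U, s, Vt = np.linalg.svd(A)                  # full SVD
r = int(np.sum(s > 1e-10))                   # rank of A

col_basis  = U[:, :r]        # u_1, ..., u_r:      orthonormal basis of Col A
row_basis  = Vt[:r, :].T     # v_1, ..., v_r:      orthonormal basis of Row A
null_basis = Vt[r:, :].T     # v_{r+1}, ..., v_n:  orthonormal basis of Nul A

print(np.allclose(A @ null_basis, 0))        # True: these vectors lie in Nul A
```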
Find a singular value decomposition for . You may use technology to find eigenvalues and eigenvectors of matrices.
Solution.
With as given, we have . Technology shows that the eigenvalues of are ,,, and with corresponding orthonormal eigenvectors ,,, and . This makes . The singular values of are ,,, and , so is the matrix with the nonzero singular values along the diagonal and zeros everywhere else. Finally, we define the vectors as . Again, technology gives us ,, and . Thus, a singular value decomposition of is , where
Use the singular value decomposition to find a basis for Col $A$, Row $A$, and Nul $A$.
Solution.
Recall that the right singular vectors of an matrix of rank form an orthonormal basis of eigenvectors of such that for from 1 to with . These vectors are the columns of the matrix in a singular value decomposition of . For from 1 to , we let . Then the set forms an orthonormal basis of Col . We extend this set to an orthonormal basis of . Recall also that for . Since Nul Col , it follows that the vectors ,,, form a basis for Nul . Finally, the set is a basis for Row . So in our example, we have ,,,,, and . Since the singular values of are ,,, and , it follows that rank. We also have ,, and . So
Find orthogonal matrices $U$ and $V$, and the matrix $\Sigma$, so that $U\Sigma V^T$ is a singular value decomposition of $A$.
Solution.
Normalizing the eigenvectors ,, and to normal eigenvectors ,, and , respectively, gives us an orthogonal matrix
.
Now , so normalizing the vectors and gives us vectors
and
that are the first two columns of our matrix . Given that is a matrix, we need to find two other vectors orthogonal to and that will combine with and to form an orthogonal basis for . Letting ,,, and , a computer algebra system shows that the reduced row echelon form of the matrix is , so that vectors ,,, are linearly independent. Letting and , the Gram-Schmidt process shows that the set is an orthogonal basis for , where
and (using for )
.
The set where ,, and is an orthonormal basis for and we can let
.
The singular values of are and , and so
.
Therefore, a singular value decomposition of is of
Determine the best rank 1 approximation to . The outer product decomposition of is
.
So the rank one approximation to is
.
Note that the rows in this rank one approximation are the averages of the two distinct rows in the matrix , which makes sense considering that this is the closest rank one matrix to .
We learned about the singular value decomposition of a matrix.
The operator norm of an $m \times n$ matrix $A$ is
$$||A|| = \max_{||\mathbf{x}|| = 1} ||A\mathbf{x}||.$$
The operator norm of a matrix tells us that the size of the action of an $m \times n$ matrix $A$ can be determined by its action on the unit sphere in $\mathbb{R}^n$.
A singular value decomposition of an $m \times n$ matrix $A$ of rank $r$ is of the form $A = U\Sigma V^T$, where
$V = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n]$, where $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthonormal basis of eigenvectors of $A^TA$ such that $(A^TA)\mathbf{v}_i = \lambda_i\mathbf{v}_i$ for $i$ from 1 to $n$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$,
$U = [\mathbf{u}_1 \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_m]$, where $\mathbf{u}_i = \frac{1}{\sigma_i}A\mathbf{v}_i$ for $i$ from 1 to $r$, and this orthonormal basis of Col $A$ is extended to an orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\}$ of $\mathbb{R}^m$,
$\Sigma$ is the $m \times n$ matrix whose first $r$ diagonal entries are the singular values $\sigma_i = \sqrt{\lambda_i}$ for $i$ from 1 to $r$ and whose other entries are 0.
A singular value decomposition is important in that every matrix has a singular value decomposition, and a singular value decomposition has a variety of applications including scientific computing and digital signal processing, image compression, principal component analysis, web searching through latent semantic indexing, and seismology.
The vectors $\mathbf{u}_1$, $\mathbf{u}_2$, $\ldots$, $\mathbf{u}_r$ in an SVD for an $m \times n$ matrix $A$ of rank $r$ form a basis for Col $A$, while the vectors $\mathbf{v}_{r+1}$, $\mathbf{v}_{r+2}$, $\ldots$, $\mathbf{v}_n$ form a basis for Nul $A$. Also, the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a basis for Row $A$.
Let $A$ have an SVD as in the second bullet. Decomposing $A$ as
$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T$$
is an outer product decomposition of $A$. An outer product decomposition allows us to approximate $A$ with smaller rank matrices. For example, the matrix $\sigma_1\mathbf{u}_1\mathbf{v}_1^T$ is the best rank 1 approximation to $A$, $\sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T$ is the best rank 2 approximation, and so on.
Let $A$ be an $m \times n$ matrix of rank $r$ with singular value decomposition $U\Sigma V^T$, where $U = [\mathbf{u}_1 \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_m]$ and $V = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n]$. We have seen that the set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ is a basis for Col $A$, and the vectors $\mathbf{v}_{r+1}$, $\mathbf{v}_{r+2}$, $\ldots$, $\mathbf{v}_n$ form a basis for Nul $A$. In this exercise we examine the set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ and determine what this set tells us about Row $A$.
The vectors $\mathbf{v}_i$ that form the columns of the matrix $V$ in a singular value decomposition of a matrix $A$ are eigenvectors of $A^TA$. In this exercise we investigate the vectors $\mathbf{u}_i$ that make up the columns of the matrix $U$ in a singular value decomposition of a matrix $A$ for each $i$ between 1 and the rank of $A$, and their connection to the matrix $AA^T$.
Now we examine the result of part (a) in general. Let $A$ be an arbitrary $m \times n$ matrix. Calculate $AA^T\mathbf{u}_i$ for $i$ from 1 to rank($A$) and determine specifically how $AA^T\mathbf{u}_i$ is related to $\mathbf{u}_i$. What does this tell us about the vectors $\mathbf{u}_i$ and the matrix $AA^T$?
As an elementary example to illustrate the idea behind Latent Semantic Indexing (LSI), consider the problem of creating a program to search a collection of documents for words, or words related to a given word. Document collections are usually very large, but we use a small example for illustrative purposes. A standard example that is given in several publications is the following. Suppose we have nine documents $c_1$ through $c_5$ (titles of documents about human-computer interaction) and $m_1$ through $m_4$ (titles of graph theory papers) that make up our library:
$c_1$: Human machine interface for ABC computer applications
$c_2$: A survey of user opinion of computer system response time
$c_3$: The EPS user interface management system
$c_4$: System and human system engineering testing of EPS
$c_5$: Relation of user perceived response time to error measurement
$m_1$: The generation of random, binary, ordered trees
$m_2$: The intersection graph of paths in trees
$m_3$: Graph minors IV: Widths of trees and well-quasi-ordering
$m_4$: Graph minors: A survey
To make a searchable database, one might start by creating a list of key terms that appear in the documents (generally removing common words such as “a”, “the”, “of”, etc., called stop words; these words contribute little, if any, context). In our documents we identify the key words that are shown in italics. (Note that we are just selecting key words to make our example manageable, not necessarily identifying the most important words.) Using the key words we create a term-document matrix. The term-document matrix is the matrix in which the terms form the rows and the documents form the columns. If $A$ is the term-document matrix, then $a_{ij}$ counts the number of times word $i$ appears in document $j$. The term-document matrix for our library is shown in Figure 29.11.
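In code, building such a matrix is a simple counting exercise. The sketch below is a minimal illustration using a few of the titles and key terms above (not the full library), so the resulting matrix is smaller than the one in Figure 29.11.

```python
import numpy as np

# A few key terms and document titles from the example library (not all of them).
terms = ["human", "interface", "computer", "user", "system", "trees", "graph"]
docs = [
    "human machine interface for abc computer applications",          # c1
    "a survey of user opinion of computer system response time",      # c2
    "the intersection graph of paths in trees",                       # m2
]

# Term-document matrix: entry (i, j) counts how many times term i appears in document j.
A = np.array([[doc.split().count(term) for doc in docs] for term in terms])
print(A)
```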
One of our goals is to rate the pages in our library for relevance if we search for a query. For example, suppose we want to rate the pages for the query survey, computer. This query can be represented by the vector $\mathbf{q}$ whose entries are 1 in the rows corresponding to the terms survey and computer and 0 in all other rows.
In a standard term-matching search with term-document matrix $A$, a query vector $\mathbf{q}$ would be matched with the terms to determine the number of matches. The matching $\mathbf{q}^TA$ counts the number of times each document agrees with the query.
We can use the cosine calculation from part (b) to compare matches to our query: the closer the cosine is to 1, the better the match (dividing by the product of the norms essentially converts all vectors to unit vectors for comparison purposes). This is often referred to as the cosine distance. Calculate the cosines of the angles between the query vector $\mathbf{q}$ and each document column of $A$ for our example of the query survey, computer. Order the documents from best to worst match for this query.
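A sketch of the cosine computation is below; the small matrix and query are stand-ins, and for the activity the full term-document matrix and the query vector for survey, computer would be used instead.

```python
import numpy as np

def cosine_scores(A, q):
    """Cosine of the angle between the query q and each column (document) of A."""
    return (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))

# Stand-in term-document matrix (3 terms x 4 documents) and a query on the second term.
A = np.array([[1.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])
q = np.array([0.0, 1.0, 0.0])

scores = cosine_scores(A, q)
print(scores)
print(np.argsort(scores)[::-1])   # documents ordered from best to worst match
```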
Though we were able to rate the documents in Project Activity 29.5 using the cosine distance, the result is less than satisfying. Documents $c_3$, $c_4$, and $c_5$ are all related to computers but do not appear at all in our results. This is a problem with language searches: we don't want to compare just words; we also need to compare the concepts the words represent. The fact that words can represent different things implies that a random choice of word by different authors can introduce noise into the word-concept relationship. To filter out this noise, we can apply the singular value decomposition to find a smaller set of concepts to better represent the relationships. Before we do so, we examine some useful properties of the term-document matrix.
In Project Activity 29.5 you should have seen that the matching counts are given by dot products with the columns of $A$. Assume for the moment that all of the entries in $A$ are either 0 or 1. Explain why in this case the dot product of two columns of $A$ tells us how many terms documents $i$ and $j$ have in common. Also, the matrix $A^TA$ takes dot products of the columns of $A$, which refer to what's happening in each document, and so is looking at document-document interactions. For these reasons, we call $A^TA$ the document-document matrix.
Use appropriate technology to calculate the entries of the matrix $AA^T$. This matrix is the term-term matrix. Assume for the moment that all of the entries in $A$ are either 0 or 1. Explain why if terms $i$ and $j$ occur together in $k$ documents, then $\left(AA^T\right)_{ij} = k$.
To see why a singular value decomposition might be useful, suppose our term-document matrix $A$ has singular value decomposition $A = U\Sigma V^T$. (Don't actually calculate the SVD yet.)
Show that the document-document matrix $A^TA$ satisfies $A^TA = \left(V\Sigma^T\right)\left(V\Sigma^T\right)^T$. This means that we can compare document $i$ and document $j$ using the dot product of row $i$ and column $j$ of the matrix product $\left(V\Sigma^T\right)\left(V\Sigma^T\right)^T$.
Show that the term-term matrix $AA^T$ satisfies $AA^T = \left(U\Sigma\right)\left(U\Sigma\right)^T$. Thus we can compare term $i$ and term $j$ using the dot product of row $i$ and column $j$ of the matrix product $\left(U\Sigma\right)\left(U\Sigma\right)^T$. (Exercise 6 shows that the columns of $U$ are orthogonal eigenvectors of $AA^T$.)
As we will see, the connection of the matrices $U$ and $V$ to terms and documents that we saw in Project Activity 29.7 will be very useful when we use the SVD of the term-document matrix to reduce dimensions to a “concept” space. We will be able to interpret the rows of the matrices $U$ and $V$ as providing coordinates for terms and documents in this space.
The singular value decomposition (SVD) allows us to produce new, improved term-document matrices. For this activity, use the term-document matrix in Figure 29.11.
Use appropriate technology to find a singular value decomposition of $A$ so that $A = U\Sigma V^T$. Print your entries to two decimal places (but keep as many as possible for computational purposes).
Each singular value tells us how important its semantic dimension is. If we remove the smaller singular values (the less important dimensions), we retain the important information but eliminate minor details and noise. We produce a new term-document matrix $A_k$ by keeping the largest $k$ of the singular values and discarding the rest. This gives us the approximation
$$A_k = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T$$
using the outer product decomposition, where $\sigma_1$, $\sigma_2$, $\ldots$, $\sigma_k$ are the $k$ largest singular values of $A$. Note that if $A$ is an $m \times n$ matrix, letting $U_k = [\mathbf{u}_1 \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_k]$ (an $m \times k$ matrix), $\Sigma_k$ the $k \times k$ matrix with the first $k$ singular values along the diagonal, and $V_k = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_k]$ (an $n \times k$ matrix), then we can also write $A_k = U_k\Sigma_kV_k^T$. This is sometimes referred to as a reduced SVD. Find $U_k$, $\Sigma_k$, and $V_k$, and find the new term-document matrix $A_k$.
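A reduced SVD is easy to form from a full one by slicing; the sketch below does this for a stand-in matrix (for the project, $A$ would be the term-document matrix from Figure 29.11 and $k$ the number of retained concepts).

```python
import numpy as np

def reduced_svd(A, k):
    """Keep only the k largest singular values: returns U_k, Sigma_k, V_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# Stand-in matrix; replace with the term-document matrix for the activity.
rng = np.random.default_rng(0)
A = rng.integers(0, 3, size=(6, 5)).astype(float)

U_k, Sigma_k, V_k = reduced_svd(A, 2)
A_k = U_k @ Sigma_k @ V_k.T          # new term-document matrix of rank at most 2
print(np.round(A_k, 2))
```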
Once we have our term-document matrix, there are three basic comparisons to make: comparing terms, comparing documents, and comparing terms and documents. Term-document matrices are usually very large, with dimension being the number of terms. By using a reduced SVD we can create a much smaller approximation. In our example, the matrix $A_2$ in Project Activity 29.8 reduces our problem to a 2-dimensional space. Intuitively, we can think of LSI as representing terms as averages of all of the documents in which they appear and documents as averages of all of the terms they contain. Through this process, LSI attempts to combine the surface information in our library into a deeper abstraction (the “concept” space) that captures the mutual relationships between terms and documents.
We now need to understand how we can represent documents and terms in this smaller space where $A_k = U_k\Sigma_kV_k^T$. Informally, we can consider the rows of $U_k$ as representing the coordinates of each term in the lower dimensional concept space and the columns of $V_k^T$ as the coordinates of the documents, while the entries of $\Sigma_k$ tell us how important each semantic dimension is. The dot products of the row vectors of $A_k$ indicate how terms compare across documents. These products make up the matrix $A_kA_k^T$. Just as in Project Activity 29.7, we have $A_kA_k^T = \left(U_k\Sigma_k\right)\left(U_k\Sigma_k\right)^T$. In other words, if we consider the rows of $U_k\Sigma_k$ as coordinates for terms, then the dot products of these rows give us term to term comparisons. (Note that multiplying $U_k$ by $\Sigma_k$ just stretches the rows of $U_k$ by the singular values according to the importance of the concept represented by that singular value.) Similarly, the dot products between columns of $A_k$ provide a comparison of documents. This comparison is given by $A_k^TA_k = \left(V_k\Sigma_k\right)\left(V_k\Sigma_k\right)^T$ (again by Project Activity 29.7). So we can consider the rows of $V_k\Sigma_k$ as providing coordinates for documents.
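The identities above are easy to verify numerically. In the sketch below (with a stand-in matrix), the rows of $U_k\Sigma_k$ act as term coordinates and the rows of $V_k\Sigma_k$ as document coordinates, and their dot products reproduce $A_kA_k^T$ and $A_k^TA_k$.

```python
import numpy as np

# Stand-in term-document matrix and a 2-dimensional concept space.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, Sigma_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
A_k = U_k @ Sigma_k @ V_k.T

term_coords = U_k @ Sigma_k   # row i: concept-space coordinates of term i
doc_coords  = V_k @ Sigma_k   # row j: concept-space coordinates of document j

# Dot products of these rows give the term-term and document-document comparisons.
print(np.allclose(term_coords @ term_coords.T, A_k @ A_k.T))   # True
print(np.allclose(doc_coords @ doc_coords.T, A_k.T @ A_k))     # True
```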
We have seen how to compare terms to terms and documents to documents. The matrix $A_k$ itself compares terms to documents. Show that $A_k = \left(U_k\Sigma_k^{1/2}\right)\left(V_k\Sigma_k^{1/2}\right)^T$, where $\Sigma_k^{1/2}$ is the diagonal matrix of the same size as $\Sigma_k$ whose diagonal entries are the square roots of the corresponding diagonal entries in $\Sigma_k$. Thus, all useful comparisons of terms and documents can be made using the rows of the matrices $U_k$ and $V_k$, scaled in some way by the singular values in $\Sigma_k$.
To work in this smaller concept space, it is important to be able to find appropriate comparisons to objects that appeared in the original search. For example, to complete the latent structure view of the system, we must also convert the original query to a representation within the new term-document system represented by $A_k$. This new representation is called a pseudo-document.
For an original query $\mathbf{q}$, we start with its term vector (a vector in the coordinate system determined by the rows of $A$) and find a representation $\hat{\mathbf{q}}$ that we can use as a column of $V_k^T$ in the document-document comparison matrix. If this representation were perfect, then for a real document in the original system given by a column $\mathbf{a}_j$ of $A$ it would produce the corresponding column of $V^T$ if we used the full SVD. In other words, from $A = U\Sigma V^T$ we get $V^T = \Sigma^{-1}U^TA$ (when $\Sigma$ is invertible), which suggests defining the pseudo-document as $\hat{\mathbf{q}} = \Sigma_k^{-1}U_k^T\mathbf{q}$.
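Here is a minimal sketch of this pseudo-document computation, using the standard LSI formula $\hat{\mathbf{q}} = \Sigma_k^{-1}U_k^T\mathbf{q}$ with a stand-in matrix and query; one common choice, as discussed above, is then to compare $\hat{\mathbf{q}}$ against the rows of $V_k$ with the cosine distance.

```python
import numpy as np

# Stand-in term-document matrix, its reduced SVD, and a query in term space.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, Sigma_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

q = np.array([0.0, 1.0, 0.0, 1.0])        # query marking the 2nd and 4th terms

# Pseudo-document: the query's coordinates in the k-dimensional concept space.
q_hat = np.linalg.inv(Sigma_k) @ U_k.T @ q

# Cosine distance from the query to each document (rows of V_k) in concept space.
cos = (V_k @ q_hat) / (np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat))
print(np.argsort(cos)[::-1])               # documents ranked from best to worst match
```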
In our example, using $k = 2$, the terms can now be represented as 2-dimensional vectors (the rows of $U_2$, see Figure 29.12), or as points in the plane. More specifically, human is represented by the vector (to two decimal places) in the first row of Figure 29.12, interface by the vector in the second row, etc. Similarly, the documents are represented by columns of the matrix in Figure 29.13, so that the document $c_1$ is represented by the first column, $c_2$ by the second, etc. From this perspective we can visualize these documents in the plane. Plot the documents and the query in the 2-dimensional concept space. Then calculate the cosine distances from the query to the documents in this space. Which documents now give the best three matches to the query? Compare the matches to your plot.
As we can see from Project Activity 29.10, the original query had no match at all with any documents except $c_1$, $c_2$, and $m_4$. In the new concept space, the query now has some connection to every document. So LSI has made semantic connections between the terms and documents that were not present in the original term-document matrix, which gives us better results for our search.
Technically this definition should be in terms of a supremum, but because the equivalent definition restricts the $\mathbf{x}$'s to a compact subset, the sup is achieved and we can use max.
e.g., Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, 41: 391–407, and Landauer, T. K. and Dumais, S. T. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychological Review, 1997, Vol. 104, No. 2, 211–240.