
Section 27 Orthogonal Diagonalization

Subsection Application: The Multivariable Second Derivative Test

In single variable calculus, we learn that the second derivative can be used to classify a critical point (a point where the derivative of the function is 0) as a local maximum or minimum.

In the two-variable case we have an analogous test, which is usually seen in a multivariable calculus course.

A proof of this test for two-variable functions is based on Taylor polynomials and relies on symmetric matrices, eigenvalues, and quadratic forms. The steps of such a proof appear in the project at the end of this section.

Subsection Introduction

We have seen how to diagonalize a matrix: if we can find n linearly independent eigenvectors of an n×n matrix A and let P be the matrix whose columns are those eigenvectors, then P^{-1}AP is a diagonal matrix with the eigenvalues down the diagonal, in the same order in which the corresponding eigenvectors appear as columns of P. We will see that in certain cases we can take this one step further and create an orthogonal matrix with eigenvectors as columns to diagonalize a matrix. This is called orthogonal diagonalization. Orthogonal diagonalizability is useful in that it allows us to find a “convenient” coordinate system in which to interpret the results of certain matrix transformations. A set of orthonormal basis vectors for an orthogonally diagonalizable matrix A is called a set of principal axes for A. Orthogonal diagonalization will also play a crucial role in the singular value decomposition of a matrix, a decomposition that has been described by some as the “pinnacle” of linear algebra.
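For readers who want to experiment with diagonalization using technology, here is a minimal sketch in Python with numpy (the matrix A is just an illustrative choice, and numpy is an assumption of this sketch, not part of the text):

    import numpy as np

    # Diagonalize A by placing linearly independent eigenvectors
    # in the columns of P, then computing P^{-1} A P.
    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])
    evals, P = np.linalg.eig(A)      # columns of P are eigenvectors of A
    D = np.linalg.inv(P) @ A @ P     # P^{-1} A P
    print(np.round(D, 10))           # diagonal, eigenvalues in column order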

Definition 27.3.

An n×n matrix A is orthogonally diagonalizable if there is an orthogonal matrix P such that

P^TAP

is a diagonal matrix. We say that the matrix P orthogonally diagonalizes the matrix A.

Preview Activity 27.1.

(a)

For each matrix A whose eigenvalues and corresponding eigenvectors are given, find a matrix P such that P^{-1}AP is a diagonal matrix.

(i)

A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} with eigenvalues -1 and 3 and corresponding eigenvectors v_1 = [1 -1]^T and v_2 = [1 1]^T.

(ii)

A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} with eigenvalues 0 and 3 and corresponding eigenvectors v_1 = [2 -1]^T and v_2 = [1 1]^T.

(iii)

A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix} with eigenvalues 0, 1, and 3 and corresponding eigenvectors v_1 = [1 1 -1]^T, v_2 = [1 -1 0]^T, and v_3 = [1 1 2]^T.

(b)

Which matrices in part (a) seem to satisfy the orthogonal diagonalization requirement? Do you notice any common traits among these matrices?

Subsection Symmetric Matrices

As we saw in Preview Activity 27.1, matrices that are not symmetric need not be orthogonally diagonalizable, but the symmetric matrix examples are orthogonally diagonalizable. We explore that idea in this section.

If P is a matrix that orthogonally diagonalizes the matrix A, then P^TAP = D, where D is a diagonal matrix. Since D^T = D and A = PDP^T, we have

A = PDP^T = PD^TP^T = (P^T)^T D^T P^T = (PDP^T)^T = A^T.

Therefore, A^T = A, and matrices with this property are the only matrices that can be orthogonally diagonalized. Recall that any matrix A satisfying A^T = A is a symmetric matrix.

While we have just shown that the only matrices that can be orthogonally diagonalized are the symmetric matrices, the amazing thing about symmetric matrices is that every symmetric matrix can be orthogonally diagonalized. We will prove this shortly.
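Numerical software takes advantage of this fact. For instance, numpy's eigh routine is designed for symmetric matrices and returns an orthonormal set of eigenvectors; a small sketch (the matrix is an arbitrary symmetric example):

    import numpy as np

    # eigh is built for symmetric (Hermitian) matrices: it returns
    # eigenvalues in increasing order and orthonormal eigenvectors.
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    evals, P = np.linalg.eigh(A)
    print(np.allclose(P.T @ P, np.eye(2)))   # True: P is orthogonal
    print(np.round(P.T @ A @ P, 10))         # diagonal matrix diag(1, 3)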

Symmetric matrices have useful properties, a few of which are given in the following activity (we will use some of these properties later in this section).

Activity 27.2.

Let A be a symmetric n×n matrix and let x and y be vectors in R^n.

(a)

Show that x^TAy = (Ax)^Ty.

(b)

Show that (Ax)·y = x·(Ay).

(c)

Show that the eigenvalues of a 2×2 symmetric matrix A = \begin{bmatrix} a & b \\ b & c \end{bmatrix} are real.

Activity 27.2 (c) shows that a 2×2 symmetric matrix has real eigenvalues. This is a general result: the eigenvalues of any real symmetric matrix are real, as the following proof shows.

Proof.

Let A be an n×n symmetric matrix with real entries and let λ be an eigenvalue of A with eigenvector v. To show that λ is real, we will show that \overline{λ} = λ. We know

(27.1)   Av = λv.

Since A has real entries, we also know that \overline{λ} is an eigenvalue for A with eigenvector \overline{v}. Multiply both sides of (27.1) on the left by \overline{v}^T to obtain

(27.2)   \overline{v}^TAv = \overline{v}^Tλv = λ(\overline{v}^Tv).

Now

\overline{v}^TAv = (A^T\overline{v})^Tv = (A\overline{v})^Tv = (\overline{λ}\,\overline{v})^Tv = \overline{λ}(\overline{v}^Tv)

and equation (27.2) becomes

\overline{λ}(\overline{v}^Tv) = λ(\overline{v}^Tv).

Since v ≠ 0, we know \overline{v}^Tv ≠ 0, so λ = \overline{λ} and λ is real.
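A quick numerical illustration of this theorem (not a proof): symmetrizing a random real matrix always produces real eigenvalues. A sketch using numpy:

    import numpy as np

    # Symmetrize a random real matrix; its eigenvalues should be real.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    S = M + M.T                     # S is symmetric: S^T = S
    print(np.linalg.eigvals(S))     # all real (no imaginary parts)
    print(np.linalg.eigvalsh(S))    # eigvalsh exploits symmetry directly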

To orthogonally diagonalize a matrix, it must be the case that eigenvectors corresponding to different eigenvalues are orthogonal. This is an important property and it would be useful to know when it happens.

Activity 27.3.

Let A be a real symmetric matrix with eigenvalues λ_1 and λ_2 and corresponding eigenvectors v_1 and v_2, respectively.

(a)

Use the result of Activity 27.2 (b) to show that λ_1(v_1·v_2) = λ_2(v_1·v_2).

(b)

Explain why the result of part (a) shows that v_1 and v_2 are orthogonal if λ_1 ≠ λ_2.

Activity 27.3 proves the following theorem: for a real symmetric matrix, eigenvectors corresponding to distinct eigenvalues are orthogonal.

Recall that the only matrices that can be orthogonally diagonalized are the symmetric matrices. Now we show that every real symmetric matrix can be orthogonally diagonalized, which completely characterizes the matrices that are orthogonally diagonalizable. The proof of the following theorem proceeds by induction. A reader who has not yet encountered this technique of proof can safely skip the proof of this theorem without loss of continuity.

Proof.

Let A be a real n×n symmetric matrix. The proof proceeds by induction on n. If n = 1, then A is diagonal and orthogonally diagonalizable. So assume that any real (n-1)×(n-1) symmetric matrix is orthogonally diagonalizable, and assume that A is a real n×n symmetric matrix. The eigenvalues of A are real, as we demonstrated above. Let λ_1 be a real eigenvalue of A with corresponding unit eigenvector p_1. We can use the Gram-Schmidt process to extend {p_1} to an orthonormal basis {p_1, p_2, ..., p_n} for R^n. Let P_1 = [p_1 p_2 ⋯ p_n]. Then P_1 is an orthogonal matrix. Also,

P_1^{-1}AP_1 = P_1^TAP_1 = P_1^T[Ap_1 Ap_2 ⋯ Ap_n] = \begin{bmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_n^T \end{bmatrix}[λ_1p_1 Ap_2 ⋯ Ap_n] = \begin{bmatrix} p_1^Tλ_1p_1 & p_1^TAp_2 & \cdots & p_1^TAp_n \\ p_2^Tλ_1p_1 & p_2^TAp_2 & \cdots & p_2^TAp_n \\ \vdots & \vdots & & \vdots \\ p_n^Tλ_1p_1 & p_n^TAp_2 & \cdots & p_n^TAp_n \end{bmatrix} = \begin{bmatrix} λ_1 & p_1^TAp_2 & \cdots & p_1^TAp_n \\ 0 & p_2^TAp_2 & \cdots & p_2^TAp_n \\ \vdots & \vdots & & \vdots \\ 0 & p_n^TAp_2 & \cdots & p_n^TAp_n \end{bmatrix} = \begin{bmatrix} λ_1 & x^T \\ 0 & A_1 \end{bmatrix},

where x is an (n-1)×1 vector, 0 is the zero vector in R^{n-1}, and A_1 is an (n-1)×(n-1) matrix. Letting R = P_1^TAP_1, we have that

R^T = (P_1^TAP_1)^T = P_1^TA^TP_1 = P_1^TAP_1 = R,

so R is a symmetric matrix. Therefore, x = 0 and A_1 is a symmetric matrix. By our induction hypothesis, A_1 is orthogonally diagonalizable. That is, there exists an (n-1)×(n-1) orthogonal matrix Q such that Q^TA_1Q = D_1, where D_1 is a diagonal matrix. Now define P_2 by

P_2 = \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix},

where 0 is the zero vector in R^{n-1}. By construction, the columns of P_2 are orthonormal, so P_2 is an orthogonal matrix. Since P_1 is also an orthogonal matrix, the matrix P = P_1P_2 satisfies

P^T = (P_1P_2)^T = P_2^TP_1^T = P_2^{-1}P_1^{-1} = (P_1P_2)^{-1} = P^{-1},

and P is an orthogonal matrix. Finally,

P^TAP = (P_1P_2)^TA(P_1P_2) = P_2^T(P_1^TAP_1)P_2 = \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix}^T \begin{bmatrix} λ_1 & 0^T \\ 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix} = \begin{bmatrix} 1 & 0^T \\ 0 & Q^T \end{bmatrix} \begin{bmatrix} λ_1 & 0^T \\ 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix} = \begin{bmatrix} λ_1 & 0^T \\ 0 & Q^TA_1Q \end{bmatrix} = \begin{bmatrix} λ_1 & 0^T \\ 0 & D_1 \end{bmatrix}.

Therefore, PTAP is a diagonal matrix and P orthogonally diagonalizes A. This completes our proof.

The set of eigenvalues of a matrix A is called the spectrum of A, and we have just proved the Spectral Theorem for real symmetric matrices: every real symmetric matrix is orthogonally diagonalizable.

We have seen examples of the orthogonal diagonalization of n×n real symmetric matrices with n distinct eigenvalues, but how do we orthogonally diagonalize a symmetric matrix having eigenvalues of multiplicity greater than 1? The next activity shows us the process.

Activity 27.4.

Let A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}. The eigenvalues of A are 2 and 8, with eigenspaces of dimension 2 and dimension 1, respectively.

(a)

Explain why A can be orthogonally diagonalized.

(b)

Two linearly independent eigenvectors for A corresponding to the eigenvalue 2 are v_1 = [1 0 -1]^T and v_2 = [1 -1 0]^T. Note that v_1 and v_2 are not orthogonal, so they cannot both belong to an orthogonal basis of R^3 consisting of eigenvectors of A. Find a set {w_1, w_2} of orthogonal eigenvectors of A so that Span{w_1, w_2} = Span{v_1, v_2}.

(c)

The vector v_3 = [1 1 1]^T is an eigenvector for A corresponding to the eigenvalue 8. What can you say about the orthogonality relationship between w_1, w_2, and v_3?

(d)

Find a matrix P that orthogonally diagonalizes A. Verify your work.
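One possible way to verify work on this activity is with a short numpy computation. The sketch below (our own illustration, not part of the activity) carries out the Gram-Schmidt step on v_1 and v_2, normalizes, and checks that the resulting P orthogonally diagonalizes A:

    import numpy as np

    A = np.array([[4.0, 2.0, 2.0],
                  [2.0, 4.0, 2.0],
                  [2.0, 2.0, 4.0]])

    # Eigenvectors for the eigenvalue 2 (not orthogonal to each other).
    v1 = np.array([1.0, 0.0, -1.0])
    v2 = np.array([1.0, -1.0, 0.0])

    # One Gram-Schmidt step: w2 = v2 - proj_{w1} v2.
    w1 = v1
    w2 = v2 - (v2 @ w1) / (w1 @ w1) * w1

    # Eigenvector for the eigenvalue 8; automatically orthogonal to w1, w2.
    v3 = np.array([1.0, 1.0, 1.0])

    # Normalize and assemble P.
    P = np.column_stack([w1 / np.linalg.norm(w1),
                         w2 / np.linalg.norm(w2),
                         v3 / np.linalg.norm(v3)])
    print(np.round(P.T @ A @ P, 10))   # diag(2, 2, 8)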

Subsection The Spectral Decomposition of a Symmetric Matrix A

Let A be an n×n symmetric matrix with real entries. The Spectral Theorem tells us we can find an orthonormal basis {u_1, u_2, ..., u_n} of eigenvectors of A. Let Au_i = λ_iu_i for each 1 ≤ i ≤ n. If P = [u_1 u_2 u_3 ⋯ u_n], then we know that

P^TAP = P^{-1}AP = D,

where D is the n×n diagonal matrix

\begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{bmatrix}.

Since A = PDP^T, we see that

A = [u_1 u_2 u_3 ⋯ u_n] \begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \\ u_3^T \\ \vdots \\ u_n^T \end{bmatrix} = [λ_1u_1 λ_2u_2 λ_3u_3 ⋯ λ_nu_n] \begin{bmatrix} u_1^T \\ u_2^T \\ u_3^T \\ \vdots \\ u_n^T \end{bmatrix}

(27.3)   = λ_1u_1u_1^T + λ_2u_2u_2^T + λ_3u_3u_3^T + ⋯ + λ_nu_nu_n^T,

where the last product follows from Exercise 4. The expression in (27.3) is called a spectral decomposition of the matrix A. Let P_i = u_iu_i^T for each i. The matrices P_i satisfy several special conditions given in the next theorem (Theorem 27.8): each P_i is a symmetric rank 1 matrix, P_i^2 = P_i, P_iP_j = 0 when i ≠ j, P_iu_i = u_i, P_iu_j = 0 when i ≠ j, and P_iv = proj_{Span{u_i}} v for every v in R^n. The proofs are left to the exercises.

The consequence of Theorem 27.8 is that any symmetric matrix can be written as the sum of symmetric, rank 1 matrices. As we will see later, this kind of decomposition contains much information about the matrix product ATA for any matrix A.
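A numerical sketch of a spectral decomposition, using numpy and the matrix from Activity 27.4 as an example:

    import numpy as np

    # Build A = sum_i lambda_i u_i u_i^T from an orthonormal
    # eigenbasis returned by eigh.
    A = np.array([[4.0, 2.0, 2.0],
                  [2.0, 4.0, 2.0],
                  [2.0, 2.0, 4.0]])
    evals, U = np.linalg.eigh(A)                 # columns of U are orthonormal
    S = sum(lam * np.outer(u, u) for lam, u in zip(evals, U.T))
    print(np.allclose(S, A))                     # True: rank 1 terms rebuild A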

Activity 27.5.

Let A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}. Let λ_1 = 2, λ_2 = 2, and λ_3 = 8 be the eigenvalues of A. A basis for the eigenspace E_8 of A corresponding to the eigenvalue 8 is {[1 1 1]^T} and a basis for the eigenspace E_2 of A corresponding to the eigenvalue 2 is {[1 -1 0]^T, [1 0 -1]^T}. (Compare to Activity 27.4.)

(a)

Find orthonormal eigenvectors u_1, u_2, and u_3 of A corresponding to λ_1, λ_2, and λ_3, respectively.

(b)

Compute λ_1u_1u_1^T.

(c)

Compute λ_2u_2u_2^T.

(d)

Compute λ_3u_3u_3^T.

(e)

Verify that A = λ_1u_1u_1^T + λ_2u_2u_2^T + λ_3u_3u_3^T.

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 27.9.

For each of the following matrices A, determine if A is diagonalizable. If A is not diagonalizable, explain why. If A is diagonalizable, find a matrix P so that P1AP is a diagonal matrix. If the matrix is diagonalizable, is it orthogonally diagonalizable? If orthogonally diagonalizable, find an orthogonal matrix that diagonalizes A. Use appropriate technology to find eigenvalues and eigenvectors.

(a)

A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 2 \\ -1 & -1 & 0 \end{bmatrix}

Solution.

Recall that an n×n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors, and A is orthogonally diagonalizable if and only if A is symmetric. Since A is not symmetric, A is not orthogonally diagonalizable. Technology shows that the eigenvalues of A are 2 and 1, with bases for the corresponding eigenspaces {[1 -1 0]^T, [-2 0 1]^T} and {[0 1 -1]^T}. So A is diagonalizable, and if P = \begin{bmatrix} 1 & -2 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}, then

P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
(b)

A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}

Solution.

Since A is not symmetric, A is not orthogonally diagonalizable. Technology shows that the eigenvalues of A are 0 and 1 and bases for the corresponding eigenspaces are {[0 0 1]^T} and {[1 0 0]^T}. We cannot create a basis of R^3 consisting of eigenvectors of A, so A is not diagonalizable.

(c)

A = \begin{bmatrix} 4 & 2 & 1 \\ 2 & 7 & 2 \\ 1 & 2 & 4 \end{bmatrix}

Solution.

Since A is symmetric, A is orthogonally diagonalizable. Technology shows that the eigenvalues of A are 3 and 9, with bases for the eigenspaces {[1 0 -1]^T, [-2 1 0]^T} and {[1 2 1]^T}, respectively. To find an orthogonal matrix that diagonalizes A, we must find an orthonormal basis of R^3 consisting of eigenvectors of A. To do that, we use the Gram-Schmidt process to obtain an orthogonal basis for the eigenspace of A corresponding to the eigenvalue 3. Doing so gives an orthogonal basis {v_1, v_2}, where v_1 = [1 0 -1]^T and

v_2 = [-2 1 0]^T - \frac{[-2 1 0]^T·[1 0 -1]^T}{[1 0 -1]^T·[1 0 -1]^T}[1 0 -1]^T = [-2 1 0]^T + [1 0 -1]^T = [-1 1 -1]^T.

So an orthonormal basis for R^3 of eigenvectors of A is

{(1/√2)[1 0 -1]^T, (1/√3)[-1 1 -1]^T, (1/√6)[1 2 1]^T}.

Therefore, A is orthogonally diagonalizable, and if P is the matrix \begin{bmatrix} 1/√2 & -1/√3 & 1/√6 \\ 0 & 1/√3 & 2/√6 \\ -1/√2 & -1/√3 & 1/√6 \end{bmatrix}, then

P^{-1}AP = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 9 \end{bmatrix}.

Example 27.10.

Let A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}. Find an orthonormal basis for R^4 consisting of eigenvectors of A.

Solution.

Since A is symmetric, there is an orthogonal matrix P such that P^{-1}AP is diagonal. The columns of P will form an orthonormal basis for R^4. Using a cofactor expansion along the first row shows that

det(A - λI_4) = det\begin{bmatrix} -λ & 0 & 0 & 1 \\ 0 & -λ & 1 & 0 \\ 0 & 1 & -λ & 0 \\ 1 & 0 & 0 & -λ \end{bmatrix} = (λ^2 - 1)^2 = (λ+1)^2(λ-1)^2.

So the eigenvalues of A are 1 and -1. The reduced row echelon forms of A - I_4 and A + I_4 are, respectively,

\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}  and  \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.

Thus, a basis for the eigenspace E_1 of A is {[0 1 1 0]^T, [1 0 0 1]^T} and a basis for the eigenspace E_{-1} of A is {[0 1 -1 0]^T, [1 0 0 -1]^T}. The set {[0 1 1 0]^T, [1 0 0 1]^T, [0 1 -1 0]^T, [1 0 0 -1]^T} is an orthogonal set, so an orthonormal basis for R^4 consisting of eigenvectors of A is

{(1/√2)[0 1 1 0]^T, (1/√2)[1 0 0 1]^T, (1/√2)[0 1 -1 0]^T, (1/√2)[1 0 0 -1]^T}.
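As a check, the following numpy sketch verifies that these four vectors form an orthonormal eigenbasis for A:

    import numpy as np

    A = np.array([[0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0]])
    s = 1 / np.sqrt(2)
    P = np.column_stack([s * np.array([0.0, 1.0, 1.0, 0.0]),
                         s * np.array([1.0, 0.0, 0.0, 1.0]),
                         s * np.array([0.0, 1.0, -1.0, 0.0]),
                         s * np.array([1.0, 0.0, 0.0, -1.0])])
    print(np.allclose(P.T @ P, np.eye(4)))   # True: columns are orthonormal
    print(np.round(P.T @ A @ P, 10))         # diag(1, 1, -1, -1)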

Subsection Summary

  • An n×n matrix A is orthogonally diagonalizable if there is an orthogonal matrix P such that P^TAP is a diagonal matrix. Orthogonal diagonalizability is useful in that it allows us to find a “convenient” coordinate system in which to interpret the results of certain matrix transformations. Orthogonal diagonalization also plays a crucial role in the singular value decomposition of a matrix.

  • An n×n matrix A is symmetric if A^T = A. The symmetric matrices are exactly the matrices that can be orthogonally diagonalized.

  • The spectrum of a matrix is the set of eigenvalues of the matrix.

Exercises

1.

For each of the following matrices, find an orthogonal matrix P so that P^TAP is a diagonal matrix, or explain why no such matrix exists.

(a)

A = \begin{bmatrix} 3 & 4 \\ 4 & 3 \end{bmatrix}

(b)

A = \begin{bmatrix} 4 & 1 & 1 \\ 1 & 1 & 4 \\ 1 & 4 & 1 \end{bmatrix}

(c)

A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \\ 3 & 0 & 5 & 2 \end{bmatrix}

2.

For each of the following matrices A, find an orthonormal basis of eigenvectors of A. Then find a spectral decomposition of A.

(a)

A = \begin{bmatrix} 3 & 4 \\ 4 & 3 \end{bmatrix}

(b)

A = \begin{bmatrix} 4 & 1 & 1 \\ 1 & 1 & 4 \\ 1 & 4 & 1 \end{bmatrix}

(c)

A = \begin{bmatrix} 4 & 0 & 24 \\ 0 & 8 & 0 \\ 24 & 0 & -16 \end{bmatrix}

(d)

A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 2 \\ 0 & 2 & 3 \end{bmatrix}

3.

Find a non-diagonal 4×4 matrix with eigenvalues 2, 3 and 6 which can be orthogonally diagonalized.

4.

Let A = [a_{ij}] = [c_1 c_2 ⋯ c_m] be a k×m matrix with columns c_1, c_2, ..., c_m, and let B = [b_{ij}] = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} be an m×n matrix with rows r_1, r_2, ..., r_m. Show that

AB = [c_1 c_2 ⋯ c_m]\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} = c_1r_1 + c_2r_2 + ⋯ + c_mr_m.
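A numerical illustration of this identity (which is of course not a proof) can be produced with numpy; the matrices below are arbitrary:

    import numpy as np

    # AB equals the sum of the outer products (column i of A)(row i of B).
    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(2, 3)).astype(float)   # k x m
    B = rng.integers(-3, 4, size=(3, 4)).astype(float)   # m x n
    S = sum(np.outer(A[:, i], B[i, :]) for i in range(3))
    print(np.allclose(S, A @ B))                         # True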

5.

Let A be an n×n symmetric matrix with real entries and let {u_1, u_2, ..., u_n} be an orthonormal basis of eigenvectors of A. For each i, let P_i = u_iu_i^T. Prove Theorem 27.8 by verifying each of the following statements.

(a)

For each i, P_i is a symmetric matrix.

(b)

For each i, P_i is a rank 1 matrix.

(c)

For each i, P_i^2 = P_i.

(d)

If i ≠ j, then P_iP_j = 0.

(e)

For each i, P_iu_i = u_i.

(f)

If i ≠ j, then P_iu_j = 0.

(g)

If v is in R^n, show that

P_iv = proj_{Span{u_i}} v.

For this reason we call P_i an orthogonal projection matrix.
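The following small numpy sketch illustrates the projection property in part (g) for one particular unit vector (a numerical check, not a proof):

    import numpy as np

    u = np.array([1.0, 2.0, 2.0]) / 3.0        # a unit vector
    Pi = np.outer(u, u)                        # P_i = u u^T
    v = np.array([3.0, -1.0, 4.0])
    print(np.allclose(Pi @ v, (v @ u) * u))    # True: P_i v = (v . u) u
    print(np.allclose(Pi @ Pi, Pi))            # True: P_i^2 = P_i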

6.

Show that if M is an n×n matrix and (Mx)·y = x·(My) for every x, y in R^n, then M is a symmetric matrix.

Hint.

Try x = e_i and y = e_j.

7.

Let A be an n×n symmetric matrix and assume that A has an orthonormal basis {u_1, u_2, ..., u_n} of eigenvectors of A so that Au_i = λ_iu_i for each i. Let P_i = u_iu_i^T for each i. It is possible that not all of the eigenvalues of A are distinct. In this case, some of the eigenvalues will be repeated in the spectral decomposition of A. If we want only distinct eigenvalues to appear, we might do the following. Let μ_1, μ_2, ..., μ_k be the distinct eigenvalues of A. For each j between 1 and k, let Q_j be the sum of all of the P_i for which the corresponding eigenvalue of u_i is μ_j.

(a)

The eigenvalues of the matrix A = \begin{bmatrix} 0 & 2 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 2 & 3 \end{bmatrix} are -1 and 4. Find a basis for each eigenspace and determine each P_i. Then find k, μ_1, ..., μ_k, and each Q_j.

(b)

Show in general (not just for the specific example in part (a)) that the Q_j satisfy the same properties as the P_i. That is, verify the following.

(i)

A = μ_1Q_1 + μ_2Q_2 + ⋯ + μ_kQ_k

Hint.

Collect matrices with the same eigenvalues.

(ii)

Q_j is a symmetric matrix for each j.

Hint.

Use the fact that each P_i is a symmetric matrix.

(iii)

Q_j^2 = Q_j for each j.

Hint.

Use Theorem 27.8.

(iv)

Q_jQ_ℓ = 0 when j ≠ ℓ.

Hint.

Use Theorem 27.8.

(v)

If E_{μ_j} is the eigenspace for A corresponding to the eigenvalue μ_j, and if v is in R^n, then Q_jv = proj_{E_{μ_j}} v.

Hint.

Explain why the set of vectors u_i for which Au_i = μ_ju_i is an orthonormal basis for E_{μ_j}.

(c)

What is the rank of Qj? Verify your answer.

8.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

Every real symmetric matrix is diagonalizable.

(b) True/False.

If P is a matrix whose columns are eigenvectors of a symmetric matrix, then the columns of P are orthogonal.

(c) True/False.

If A is a symmetric matrix, then eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

(d) True/False.

If v1 and v2 are distinct eigenvectors of a symmetric matrix A, then v1 and v2 are orthogonal.

(e) True/False.

Any symmetric matrix can be written as a sum of symmetric rank 1 matrices.

(f) True/False.

If A is a matrix satisfying A^T = A, and u and v are vectors satisfying Au = 2u and Av = -2v, then u·v = 0.

(g) True/False.

If an n×n matrix A has n orthogonal eigenvectors, then A is a symmetric matrix.

(h) True/False.

If an n×n matrix A has n real eigenvalues (counted with multiplicity), then A is a symmetric matrix.

(i) True/False.

For each eigenvalue of a symmetric matrix, the algebraic multiplicity equals the geometric multiplicity.

(j) True/False.

If A is invertible and orthogonally diagonalizable, then so is A^{-1}.

(k) True/False.

If A,B are orthogonally diagonalizable n×n matrices, then so is AB.

Subsection Project: The Second Derivative Test for Functions of Two Variables

In this project we will verify the Second Derivative Test for functions of two variables. This test will involve Taylor polynomials and linear algebra. As a quick review, recall that the second order Taylor polynomial for a function f of a single variable x at x = a is

(27.4)   P_2(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2.

As with the linearization of a function, the second order Taylor polynomial is a good approximation to f around a; that is, f(x) ≈ P_2(x) for x close to a. If a is a critical number for f with f'(a) = 0, then

P_2(x) = f(a) + \frac{f''(a)}{2}(x-a)^2.

In this situation, if f''(a) < 0, then \frac{f''(a)}{2}(x-a)^2 ≤ 0 for x close to a, which makes P_2(x) ≤ f(a). This implies that f(x) ≈ P_2(x) ≤ f(a) for x close to a, which makes f(a) a relative maximum value for f. Similarly, if f''(a) > 0, then f(a) is a relative minimum.

We now need a Taylor polynomial for a function of two variables. The complication of the additional independent variable in the two variable case means that the Taylor polynomials will need to contain all of the possible monomials of the indicated degrees. Recall that the linearization (or tangent plane) to a function f=f(x,y) at a point (a,b) is given by

P_1(x,y) = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b).

Note that P_1(a,b) = f(a,b), (P_1)_x(a,b) = f_x(a,b), and (P_1)_y(a,b) = f_y(a,b). This makes P_1(x,y) the best linear approximation to f near the point (a,b). The polynomial P_1(x,y) is the first order Taylor polynomial for f at (a,b).

Similarly, the second order Taylor polynomial P2(x,y) centered at the point (a,b) for the function f is

P_2(x,y) = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b) + \frac{f_{xx}(a,b)}{2}(x-a)^2 + f_{xy}(a,b)(x-a)(y-b) + \frac{f_{yy}(a,b)}{2}(y-b)^2.

Project Activity 27.6.

To see that P2(x,y) is the best approximation for f near (a,b), we need to know that the first and second order partial derivatives of P2 agree with the corresponding partial derivatives of f at the point (a,b). Verify that this is true.

We can rewrite this second order Taylor polynomial using matrices and vectors so that we can apply techniques from linear algebra to analyze it. Note that

(27.5)   P_2(x,y) = f(a,b) + ∇f(a,b)^T\begin{bmatrix} x-a \\ y-b \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x-a \\ y-b \end{bmatrix}^T \begin{bmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{xy}(a,b) & f_{yy}(a,b) \end{bmatrix} \begin{bmatrix} x-a \\ y-b \end{bmatrix},

where ∇f(x,y) = \begin{bmatrix} f_x(x,y) \\ f_y(x,y) \end{bmatrix} is the gradient of f and H is the Hessian of f, where H(x,y) = \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{bmatrix}.
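Gradients and Hessians can also be computed symbolically. The sketch below uses Python's sympy library on a sample function of our own choosing (not the function from the project activities):

    import sympy as sp

    # Symbolic gradient and Hessian of a sample two-variable function.
    x, y = sp.symbols("x y")
    f = x**2 * y + y**3
    grad = sp.Matrix([f.diff(x), f.diff(y)])
    H = sp.hessian(f, (x, y))
    print(grad)   # Matrix([[2*x*y], [x**2 + 3*y**2]])
    print(H)      # Matrix([[2*y, 2*x], [2*x, 6*y]])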

Project Activity 27.7.

Use Equation (27.5) to compute P_2(x,y) for f(x,y) = x^4 + y^4 - 4xy + 1 at (a,b) = (2,3).

The important idea for us is that if (a,b) is a point at which f_x and f_y are zero, then ∇f is the zero vector and Equation (27.5) reduces to

(27.6)   P_2(x,y) = f(a,b) + \frac{1}{2}\begin{bmatrix} x-a \\ y-b \end{bmatrix}^T \begin{bmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{xy}(a,b) & f_{yy}(a,b) \end{bmatrix} \begin{bmatrix} x-a \\ y-b \end{bmatrix}.

To make the connection between the multivariable second derivative test and properties of the Hessian H(a,b) at a critical point of a function f at which ∇f = 0, we will need to connect the eigenvalues of a matrix to its determinant and trace.

Let A be an n×n matrix with eigenvalues λ_1, λ_2, ..., λ_n (not necessarily distinct). Exercise 1 in Section 18 shows that

(27.7)   det(A) = λ_1λ_2⋯λ_n.

In other words, the determinant of a matrix is equal to the product of the eigenvalues of the matrix. In addition, Exercise 9 in Section 19 shows that

(27.8)   trace(A) = λ_1 + λ_2 + ⋯ + λ_n

for a diagonalizable matrix, where trace(A) is the sum of the diagonal entries of A. Equation (27.8) is true for any square matrix, but we don't need the more general result for this project.

The fact that the Hessian is a symmetric matrix makes it orthogonally diagonalizable. We denote the eigenvalues of H(a,b) by λ_1 and λ_2. Thus there exists an orthogonal matrix P and a diagonal matrix D = \begin{bmatrix} λ_1 & 0 \\ 0 & λ_2 \end{bmatrix} such that P^TH(a,b)P = D, or H(a,b) = PDP^T. Equations (27.7) and (27.8) show that

λ_1λ_2 = f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2  and  λ_1 + λ_2 = f_{xx}(a,b) + f_{yy}(a,b).
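These two relations are easy to check numerically for any particular symmetric matrix; for example, with numpy:

    import numpy as np

    # Check Equations (27.7) and (27.8) on a sample symmetric 2x2 matrix.
    H = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    lam = np.linalg.eigvalsh(H)
    print(np.isclose(lam[0] * lam[1], np.linalg.det(H)))   # product = det
    print(np.isclose(lam[0] + lam[1], np.trace(H)))        # sum = trace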

Now we have the machinery to verify the Second Derivative Test for two-variable functions. We assume (a,b) is a point in the domain of a function f such that ∇f(a,b) = 0. First we consider the case where f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 < 0.

Project Activity 27.8.

Explain why if f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 < 0, then

\begin{bmatrix} x-a \\ y-b \end{bmatrix}^T H(a,b) \begin{bmatrix} x-a \\ y-b \end{bmatrix}

is indefinite. Explain why this implies that f is “saddle-shaped” near (a,b).

Hint.

Substitute w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = P^T\begin{bmatrix} x-a \\ y-b \end{bmatrix}. What does the graph of f look like in the w_1 and w_2 directions?

Now we examine the situation when f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 > 0.

Project Activity 27.9.

Assume that f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 > 0.

(a)

Explain why either both f_{xx}(a,b) and f_{yy}(a,b) are positive or both are negative.

(b)

If f_{xx}(a,b) > 0 and f_{yy}(a,b) > 0, explain why λ_1 and λ_2 must be positive.

(c)

Explain why, if f_{xx}(a,b) > 0 and f_{yy}(a,b) > 0, then f(a,b) is a local minimum value for f.

When f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 > 0 and either f_{xx}(a,b) or f_{yy}(a,b) is negative, a slight modification of the preceding argument shows that f has a local maximum at (a,b) (the details are left to the reader). Therefore, we have proved the Second Derivative Test for functions of two variables!
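The test translates directly into a short procedure. Below is a minimal Python sketch (the function name and inputs are our own choices, assuming the second partials at the critical point are already known):

    import numpy as np

    def classify(fxx, fxy, fyy):
        # Second derivative test at a critical point (where grad f = 0).
        H = np.array([[fxx, fxy],
                      [fxy, fyy]])
        d = np.linalg.det(H)            # = fxx*fyy - fxy^2 = lambda_1*lambda_2
        if d < 0:
            return "saddle point"       # eigenvalues of opposite signs
        if d > 0:
            return "local minimum" if fxx > 0 else "local maximum"
        return "test is inconclusive"   # d == 0

    # Example: f(x, y) = x^2 - y^2 at (0, 0) has fxx = 2, fxy = 0, fyy = -2.
    print(classify(2.0, 0.0, -2.0))     # saddle point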

Project Activity 27.10.

Use the Hessian to classify the local maxima, minima, and saddle points of f(x,y) = x^4 + y^4 - 4xy + 1. Draw a graph of f to illustrate.

Note: Many thanks to Professor Paul Fishback for sharing his activity on this topic. Much of this project comes from his activity.
Note: Under reasonable conditions (e.g., that f has continuous second order mixed partial derivatives in some open neighborhood containing (x,y)), we have that f_{xy}(x,y) = f_{yx}(x,y), so H(x,y) = \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{xy}(x,y) & f_{yy}(x,y) \end{bmatrix} is a symmetric matrix. We will only consider functions that satisfy these reasonable conditions.