
Section 27 Orthogonal Diagonalization

Subsection Application: The Multivariable Second Derivative Test

In single variable calculus, we learn that the second derivative can be used to classify a critical point (a point where the derivative of the function is 0) as a local maximum or minimum.

In the two-variable case we have an analogous test, which is usually seen in a multivariable calculus course.

A proof of this test for two-variable functions is based on Taylor polynomials and relies on symmetric matrices, eigenvalues, and quadratic forms. The steps of such a proof appear in the project at the end of this section.

Subsection Introduction

We have seen how to diagonalize a matrix: if we can find n linearly independent eigenvectors of an n×n matrix A and let P be the matrix whose columns are those eigenvectors, then P^{-1}AP is a diagonal matrix with the eigenvalues down the diagonal, in the same order in which the corresponding eigenvectors appear as columns of P. We will see that in certain cases we can take this one step further and create an orthogonal matrix with eigenvectors as columns to diagonalize a matrix. This is called orthogonal diagonalization. Orthogonal diagonalizability is useful in that it allows us to find a “convenient” coordinate system in which to interpret the results of certain matrix transformations. A set of orthonormal basis vectors for an orthogonally diagonalizable matrix A is called a set of principal axes for A. Orthogonal diagonalization will also play a crucial role in the singular value decomposition of a matrix, a decomposition that has been described by some as the “pinnacle” of linear algebra.
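For readers who want to experiment with diagonalization using technology, here is a minimal sketch in Python with numpy (the matrix A is just an illustrative choice, and numpy is an assumption of this sketch, not part of the text):

    import numpy as np

    # Diagonalize A by placing linearly independent eigenvectors
    # in the columns of P, then computing P^{-1} A P.
    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])
    evals, P = np.linalg.eig(A)      # columns of P are eigenvectors of A
    D = np.linalg.inv(P) @ A @ P     # P^{-1} A P
    print(np.round(D, 10))           # diagonal, eigenvalues in column order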

Definition 27.3.

An n×n matrix A is orthogonally diagonalizable if there is an orthogonal matrix P such that

P^TAP

is a diagonal matrix. We say that the matrix P orthogonally diagonalizes the matrix A.

Preview Activity 27.1.

(a)

For each matrix A whose eigenvalues and corresponding eigenvectors are given, find a matrix P such that P^{-1}AP is a diagonal matrix.

(i)

A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} with eigenvalues -1 and 3 and corresponding eigenvectors v_1 = [1 -1]^T and v_2 = [1 1]^T.

(ii)

A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} with eigenvalues 0 and 3 and corresponding eigenvectors v_1 = [2 -1]^T and v_2 = [1 1]^T.

(iii)

A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix} with eigenvalues 0, 1, and 3 and corresponding eigenvectors v_1 = [1 1 -1]^T, v_2 = [1 -1 0]^T, and v_3 = [1 1 2]^T.

(b)

Which matrices in part (a) seem to satisfy the orthogonal diagonalization requirement? Do you notice any common traits among these matrices?

Subsection Symmetric Matrices

As we saw in Preview Activity 27.1, matrices that are not symmetric need not be orthogonally diagonalizable, but the symmetric matrix examples are orthogonally diagonalizable. We explore that idea in this section.

If P is a matrix that orthogonally diagonalizes the matrix A, then P^TAP = D, where D is a diagonal matrix. Since D^T = D and A = PDP^T, we have

A = PDP^T = PD^TP^T = (P^T)^T D^T P^T = (PDP^T)^T = A^T.

Therefore, A^T = A, and matrices with this property are the only matrices that can be orthogonally diagonalized. Recall that any matrix A satisfying A^T = A is a symmetric matrix.

While we have just shown that the only matrices that can be orthogonally diagonalized are the symmetric matrices, the amazing thing about symmetric matrices is that every symmetric matrix can be orthogonally diagonalized. We will prove this shortly.
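Numerical software takes advantage of this fact. For instance, numpy's eigh routine is designed for symmetric matrices and returns an orthonormal set of eigenvectors; a small sketch (the matrix is an arbitrary symmetric example):

    import numpy as np

    # eigh is built for symmetric (Hermitian) matrices: it returns
    # eigenvalues in increasing order and orthonormal eigenvectors.
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    evals, P = np.linalg.eigh(A)
    print(np.allclose(P.T @ P, np.eye(2)))   # True: P is orthogonal
    print(np.round(P.T @ A @ P, 10))         # diagonal matrix diag(1, 3)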

Symmetric matrices have useful properties, a few of which are given in the following activity (we will use some of these properties later in this section).

Activity 27.2.

Let A be a symmetric n×n matrix and let x and y be vectors in R^n.

(a)

Show that x^TAy = (Ax)^Ty.

(b)

Show that (Ax)·y = x·(Ay).

(c)

Show that the eigenvalues of a 2×2 symmetric matrix A = \begin{bmatrix} a & b \\ b & c \end{bmatrix} are real.

Activity 27.2 (c) shows that a 2×2 symmetric matrix has real eigenvalues. This is a general result: the eigenvalues of any real symmetric matrix are real, as the following proof shows.

Proof.

Let A be an n×n symmetric matrix with real entries and let λ be an eigenvalue of A with eigenvector v. To show that λ is real, we will show that \overline{λ} = λ. We know

(27.1)   Av = λv.

Since A has real entries, we also know that \overline{λ} is an eigenvalue for A with eigenvector \overline{v}. Multiply both sides of (27.1) on the left by \overline{v}^T to obtain

(27.2)   \overline{v}^TAv = \overline{v}^Tλv = λ(\overline{v}^Tv).

Now

\overline{v}^TAv = (A^T\overline{v})^Tv = (A\overline{v})^Tv = (\overline{λ}\,\overline{v})^Tv = \overline{λ}(\overline{v}^Tv)

and equation (27.2) becomes

\overline{λ}(\overline{v}^Tv) = λ(\overline{v}^Tv).

Since v ≠ 0, we know \overline{v}^Tv ≠ 0, so λ = \overline{λ} and λ is real.
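A quick numerical illustration of this theorem (not a proof): symmetrizing a random real matrix always produces real eigenvalues. A sketch using numpy:

    import numpy as np

    # Symmetrize a random real matrix; its eigenvalues should be real.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    S = M + M.T                     # S is symmetric: S^T = S
    print(np.linalg.eigvals(S))     # all real (no imaginary parts)
    print(np.linalg.eigvalsh(S))    # eigvalsh exploits symmetry directly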

To orthogonally diagonalize a matrix, it must be the case that eigenvectors corresponding to different eigenvalues are orthogonal. This is an important property and it would be useful to know when it happens.

Activity 27.3.

Let A be a real symmetric matrix with eigenvalues λ_1 and λ_2 and corresponding eigenvectors v_1 and v_2, respectively.

(a)

Use the result of Activity 27.2 (b) to show that λ_1(v_1·v_2) = λ_2(v_1·v_2).

(b)

Explain why the result of part (a) shows that v_1 and v_2 are orthogonal if λ_1 ≠ λ_2.

Activity 27.3 proves the following theorem: for a real symmetric matrix, eigenvectors corresponding to distinct eigenvalues are orthogonal.

Recall that the only matrices that can be orthogonally diagonalized are the symmetric matrices. Now we show that every real symmetric matrix can be orthogonally diagonalized, which completely characterizes the matrices that are orthogonally diagonalizable. The proof of the following theorem proceeds by induction. A reader who has not yet encountered this technique of proof can safely skip the proof of this theorem without loss of continuity.

Proof.

Let A be a real n×n symmetric matrix. The proof proceeds by induction on n. If n = 1, then A is diagonal and orthogonally diagonalizable. So assume that any real (n-1)×(n-1) symmetric matrix is orthogonally diagonalizable, and assume that A is a real n×n symmetric matrix. The eigenvalues of A are real, as we demonstrated above. Let λ_1 be a real eigenvalue of A with corresponding unit eigenvector p_1. We can use the Gram-Schmidt process to extend {p_1} to an orthonormal basis {p_1, p_2, ..., p_n} for R^n. Let P_1 = [p_1 p_2 ⋯ p_n]. Then P_1 is an orthogonal matrix. Also,

P_1^{-1}AP_1 = P_1^TAP_1 = P_1^T[Ap_1 Ap_2 ⋯ Ap_n] = \begin{bmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_n^T \end{bmatrix}[λ_1p_1 Ap_2 ⋯ Ap_n] = \begin{bmatrix} p_1^Tλ_1p_1 & p_1^TAp_2 & \cdots & p_1^TAp_n \\ p_2^Tλ_1p_1 & p_2^TAp_2 & \cdots & p_2^TAp_n \\ \vdots & \vdots & & \vdots \\ p_n^Tλ_1p_1 & p_n^TAp_2 & \cdots & p_n^TAp_n \end{bmatrix} = \begin{bmatrix} λ_1 & p_1^TAp_2 & \cdots & p_1^TAp_n \\ 0 & p_2^TAp_2 & \cdots & p_2^TAp_n \\ \vdots & \vdots & & \vdots \\ 0 & p_n^TAp_2 & \cdots & p_n^TAp_n \end{bmatrix} = \begin{bmatrix} λ_1 & x^T \\ 0 & A_1 \end{bmatrix},

where x is an (n-1)×1 vector, 0 is the zero vector in R^{n-1}, and A_1 is an (n-1)×(n-1) matrix. Letting R = P_1^TAP_1, we have that

R^T = (P_1^TAP_1)^T = P_1^TA^TP_1 = P_1^TAP_1 = R,

so R is a symmetric matrix. Therefore, x = 0 and A_1 is a symmetric matrix. By our induction hypothesis, A_1 is orthogonally diagonalizable. That is, there exists an (n-1)×(n-1) orthogonal matrix Q such that Q^TA_1Q = D_1, where D_1 is a diagonal matrix. Now define P_2 by

P_2 = \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix},

where 0 is the zero vector in R^{n-1}. By construction, the columns of P_2 are orthonormal, so P_2 is an orthogonal matrix. Since P_1 is also an orthogonal matrix, the matrix P = P_1P_2 satisfies

P^T = (P_1P_2)^T = P_2^TP_1^T = P_2^{-1}P_1^{-1} = (P_1P_2)^{-1} = P^{-1},

and P is an orthogonal matrix. Finally,

P^TAP = (P_1P_2)^TA(P_1P_2) = P_2^T(P_1^TAP_1)P_2 = \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix}^T \begin{bmatrix} λ_1 & 0^T \\ 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix} = \begin{bmatrix} 1 & 0^T \\ 0 & Q^T \end{bmatrix} \begin{bmatrix} λ_1 & 0^T \\ 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & 0^T \\ 0 & Q \end{bmatrix} = \begin{bmatrix} λ_1 & 0^T \\ 0 & Q^TA_1Q \end{bmatrix} = \begin{bmatrix} λ_1 & 0^T \\ 0 & D_1 \end{bmatrix}.

Therefore, PTAP is a diagonal matrix and P orthogonally diagonalizes A. This completes our proof.

The set of eigenvalues of a matrix A is called the spectrum of A, and we have just proved the Spectral Theorem for real symmetric matrices: every real symmetric matrix is orthogonally diagonalizable.

We have seen examples of the orthogonal diagonalization of n×n real symmetric matrices with n distinct eigenvalues, but how do we orthogonally diagonalize a symmetric matrix having eigenvalues of multiplicity greater than 1? The next activity shows us the process.

Activity 27.4.

Let A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}. The eigenvalues of A are 2 and 8, with eigenspaces of dimension 2 and dimension 1, respectively.

(a)

Explain why A can be orthogonally diagonalized.

(b)

Two linearly independent eigenvectors for A corresponding to the eigenvalue 2 are v_1 = [1 0 -1]^T and v_2 = [1 -1 0]^T. Note that v_1 and v_2 are not orthogonal, so they cannot both belong to an orthogonal basis of R^3 consisting of eigenvectors of A. Find a set {w_1, w_2} of orthogonal eigenvectors of A so that Span{w_1, w_2} = Span{v_1, v_2}.

(c)

The vector v_3 = [1 1 1]^T is an eigenvector for A corresponding to the eigenvalue 8. What can you say about the orthogonality relationship between w_1, w_2, and v_3?

(d)

Find a matrix P that orthogonally diagonalizes A. Verify your work.
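One possible way to verify work on this activity is with a short numpy computation. The sketch below (our own illustration, not part of the activity) carries out the Gram-Schmidt step on v_1 and v_2, normalizes, and checks that the resulting P orthogonally diagonalizes A:

    import numpy as np

    A = np.array([[4.0, 2.0, 2.0],
                  [2.0, 4.0, 2.0],
                  [2.0, 2.0, 4.0]])

    # Eigenvectors for the eigenvalue 2 (not orthogonal to each other).
    v1 = np.array([1.0, 0.0, -1.0])
    v2 = np.array([1.0, -1.0, 0.0])

    # One Gram-Schmidt step: w2 = v2 - proj_{w1} v2.
    w1 = v1
    w2 = v2 - (v2 @ w1) / (w1 @ w1) * w1

    # Eigenvector for the eigenvalue 8; automatically orthogonal to w1, w2.
    v3 = np.array([1.0, 1.0, 1.0])

    # Normalize and assemble P.
    P = np.column_stack([w1 / np.linalg.norm(w1),
                         w2 / np.linalg.norm(w2),
                         v3 / np.linalg.norm(v3)])
    print(np.round(P.T @ A @ P, 10))   # diag(2, 2, 8)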

Subsection The Spectral Decomposition of a Symmetric Matrix A

Let A be an n×n symmetric matrix with real entries. The Spectral Theorem tells us we can find an orthonormal basis {u_1, u_2, ..., u_n} of eigenvectors of A. Let Au_i = λ_iu_i for each 1 ≤ i ≤ n. If P = [u_1 u_2 u_3 ⋯ u_n], then we know that

P^TAP = P^{-1}AP = D,

where D is the n×n diagonal matrix

\begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{bmatrix}.

Since A = PDP^T, we see that

A = [u_1 u_2 u_3 ⋯ u_n] \begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \\ u_3^T \\ \vdots \\ u_n^T \end{bmatrix} = [λ_1u_1 λ_2u_2 λ_3u_3 ⋯ λ_nu_n] \begin{bmatrix} u_1^T \\ u_2^T \\ u_3^T \\ \vdots \\ u_n^T \end{bmatrix}

(27.3)   = λ_1u_1u_1^T + λ_2u_2u_2^T + λ_3u_3u_3^T + ⋯ + λ_nu_nu_n^T,

where the last product follows from Exercise 4. The expression in (27.3) is called a spectral decomposition of the matrix A. Let P_i = u_iu_i^T for each i. The matrices P_i satisfy several special conditions given in the next theorem (Theorem 27.8): each P_i is a symmetric rank 1 matrix, P_i^2 = P_i, P_iP_j = 0 when i ≠ j, P_iu_i = u_i, P_iu_j = 0 when i ≠ j, and P_iv = proj_{Span{u_i}} v for every v in R^n. The proofs are left to the exercises.

The consequence of Theorem 27.8 is that any symmetric matrix can be written as the sum of symmetric, rank 1 matrices. As we will see later, this kind of decomposition contains much information about the matrix product ATA for any matrix A.
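A numerical sketch of a spectral decomposition, using numpy and the matrix from Activity 27.4 as an example:

    import numpy as np

    # Build A = sum_i lambda_i u_i u_i^T from an orthonormal
    # eigenbasis returned by eigh.
    A = np.array([[4.0, 2.0, 2.0],
                  [2.0, 4.0, 2.0],
                  [2.0, 2.0, 4.0]])
    evals, U = np.linalg.eigh(A)                 # columns of U are orthonormal
    S = sum(lam * np.outer(u, u) for lam, u in zip(evals, U.T))
    print(np.allclose(S, A))                     # True: rank 1 terms rebuild A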

Activity 27.5.

Let A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}. Let λ_1 = 2, λ_2 = 2, and λ_3 = 8 be the eigenvalues of A. A basis for the eigenspace E_8 of A corresponding to the eigenvalue 8 is {[1 1 1]^T} and a basis for the eigenspace E_2 of A corresponding to the eigenvalue 2 is {[1 -1 0]^T, [1 0 -1]^T}. (Compare to Activity 27.4.)

(a)

Find orthonormal eigenvectors u_1, u_2, and u_3 of A corresponding to λ_1, λ_2, and λ_3, respectively.

(b)

Compute λ_1u_1u_1^T.

(c)

Compute λ_2u_2u_2^T.

(d)

Compute λ_3u_3u_3^T.

(e)

Verify that A = λ_1u_1u_1^T + λ_2u_2u_2^T + λ_3u_3u_3^T.

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 27.9.

For each of the following matrices A, determine if A is diagonalizable. If A is not diagonalizable, explain why. If A is diagonalizable, find a matrix P so that P1AP is a diagonal matrix. If the matrix is diagonalizable, is it orthogonally diagonalizable? If orthogonally diagonalizable, find an orthogonal matrix that diagonalizes A. Use appropriate technology to find eigenvalues and eigenvectors.

(a)

A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 2 \\ -1 & -1 & 0 \end{bmatrix}

Solution.

Recall that an n×n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors, and A is orthogonally diagonalizable if and only if A is symmetric. Since A is not symmetric, A is not orthogonally diagonalizable. Technology shows that the eigenvalues of A are 2 and 1, with bases for the corresponding eigenspaces {[1 -1 0]^T, [-2 0 1]^T} and {[0 1 -1]^T}. So A is diagonalizable, and if P = \begin{bmatrix} 1 & -2 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}, then

P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
(b)

A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}

Solution.

Since A is not symmetric, A is not orthogonally diagonalizable. Technology shows that the eigenvalues of A are 0 and 1 and bases for the corresponding eigenspaces are {[0 0 1]^T} and {[1 0 0]^T}. We cannot create a basis of R^3 consisting of eigenvectors of A, so A is not diagonalizable.

(c)

A = \begin{bmatrix} 4 & 2 & 1 \\ 2 & 7 & 2 \\ 1 & 2 & 4 \end{bmatrix}

Solution.

Since A is symmetric, A is orthogonally diagonalizable. Technology shows that the eigenvalues of A are 3 and 9, with bases for the eigenspaces {[1 0 -1]^T, [-2 1 0]^T} and {[1 2 1]^T}, respectively. To find an orthogonal matrix that diagonalizes A, we must find an orthonormal basis of R^3 consisting of eigenvectors of A. To do that, we use the Gram-Schmidt process to obtain an orthogonal basis for the eigenspace of A corresponding to the eigenvalue 3. Doing so gives an orthogonal basis {v_1, v_2}, where v_1 = [1 0 -1]^T and

v_2 = [-2 1 0]^T - \frac{[-2 1 0]^T·[1 0 -1]^T}{[1 0 -1]^T·[1 0 -1]^T}[1 0 -1]^T = [-2 1 0]^T + [1 0 -1]^T = [-1 1 -1]^T.

So an orthonormal basis for R^3 of eigenvectors of A is

{(1/√2)[1 0 -1]^T, (1/√3)[-1 1 -1]^T, (1/√6)[1 2 1]^T}.

Therefore, A is orthogonally diagonalizable, and if P is the matrix \begin{bmatrix} 1/√2 & -1/√3 & 1/√6 \\ 0 & 1/√3 & 2/√6 \\ -1/√2 & -1/√3 & 1/√6 \end{bmatrix}, then

P^{-1}AP = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 9 \end{bmatrix}.

Example 27.10.

Let A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}. Find an orthonormal basis for R^4 consisting of eigenvectors of A.

Solution.

Since A is symmetric, there is an orthogonal matrix P such that P^{-1}AP is diagonal. The columns of P will form an orthonormal basis for R^4. Using a cofactor expansion along the first row shows that

det(A - λI_4) = det\begin{bmatrix} -λ & 0 & 0 & 1 \\ 0 & -λ & 1 & 0 \\ 0 & 1 & -λ & 0 \\ 1 & 0 & 0 & -λ \end{bmatrix} = (λ^2 - 1)^2 = (λ+1)^2(λ-1)^2.

So the eigenvalues of A are 1 and -1. The reduced row echelon forms of A - I_4 and A + I_4 are, respectively,

\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}  and  \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.

Thus, a basis for the eigenspace E_1 of A is {[0 1 1 0]^T, [1 0 0 1]^T} and a basis for the eigenspace E_{-1} of A is {[0 1 -1 0]^T, [1 0 0 -1]^T}. The set {[0 1 1 0]^T, [1 0 0 1]^T, [0 1 -1 0]^T, [1 0 0 -1]^T} is an orthogonal set, so an orthonormal basis for R^4 consisting of eigenvectors of A is

{(1/√2)[0 1 1 0]^T, (1/√2)[1 0 0 1]^T, (1/√2)[0 1 -1 0]^T, (1/√2)[1 0 0 -1]^T}.
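As a check, the following numpy sketch verifies that these four vectors form an orthonormal eigenbasis for A:

    import numpy as np

    A = np.array([[0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0]])
    s = 1 / np.sqrt(2)
    P = np.column_stack([s * np.array([0.0, 1.0, 1.0, 0.0]),
                         s * np.array([1.0, 0.0, 0.0, 1.0]),
                         s * np.array([0.0, 1.0, -1.0, 0.0]),
                         s * np.array([1.0, 0.0, 0.0, -1.0])])
    print(np.allclose(P.T @ P, np.eye(4)))   # True: columns are orthonormal
    print(np.round(P.T @ A @ P, 10))         # diag(1, 1, -1, -1)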

Subsection Summary

  • An n×n matrix A is orthogonally diagonalizable if there is an orthogonal matrix P such that P^TAP is a diagonal matrix. Orthogonal diagonalizability is useful in that it allows us to find a “convenient” coordinate system in which to interpret the results of certain matrix transformations. Orthogonal diagonalization also plays a crucial role in the singular value decomposition of a matrix.

  • An n×n matrix A is symmetric if A^T = A. The symmetric matrices are exactly the matrices that can be orthogonally diagonalized.

  • The spectrum of a matrix is the set of eigenvalues of the matrix.

Exercises

1.

For each of the following matrices, find an orthogonal matrix P so that P^TAP is a diagonal matrix, or explain why no such matrix exists.

(a)

A = \begin{bmatrix} 3 & 4 \\ 4 & 3 \end{bmatrix}

(b)

A = \begin{bmatrix} 4 & 1 & 1 \\ 1 & 1 & 4 \\ 1 & 4 & 1 \end{bmatrix}

(c)

A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \\ 3 & 0 & 5 & 2 \end{bmatrix}

2.

For each of the following matrices A, find an orthonormal basis of eigenvectors of A. Then find a spectral decomposition of A.

(a)

A = \begin{bmatrix} 3 & 4 \\ 4 & 3 \end{bmatrix}

(b)

A = \begin{bmatrix} 4 & 1 & 1 \\ 1 & 1 & 4 \\ 1 & 4 & 1 \end{bmatrix}

(c)

A = \begin{bmatrix} 4 & 0 & 24 \\ 0 & 8 & 0 \\ 24 & 0 & -16 \end{bmatrix}

(d)

A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 2 \\ 0 & 2 & 3 \end{bmatrix}

3.

Find a non-diagonal 4×4 matrix with eigenvalues 2, 3 and 6 which can be orthogonally diagonalized.

4.

Let A = [a_{ij}] = [c_1 c_2 ⋯ c_m] be a k×m matrix with columns c_1, c_2, ..., c_m, and let B = [b_{ij}] = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} be an m×n matrix with rows r_1, r_2, ..., r_m. Show that

AB = [c_1 c_2 ⋯ c_m]\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} = c_1r_1 + c_2r_2 + ⋯ + c_mr_m.
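A numerical illustration of this identity (which is of course not a proof) can be produced with numpy; the matrices below are arbitrary:

    import numpy as np

    # AB equals the sum of the outer products (column i of A)(row i of B).
    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(2, 3)).astype(float)   # k x m
    B = rng.integers(-3, 4, size=(3, 4)).astype(float)   # m x n
    S = sum(np.outer(A[:, i], B[i, :]) for i in range(3))
    print(np.allclose(S, A @ B))                         # True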

5.

Let A be an n×n symmetric matrix with real entries and let {u_1, u_2, ..., u_n} be an orthonormal basis of eigenvectors of A. For each i, let P_i = u_iu_i^T. Prove Theorem 27.8 by verifying each of the following statements.

(a)

For each i, P_i is a symmetric matrix.

(b)

For each i, P_i is a rank 1 matrix.

(c)

For each i, P_i^2 = P_i.

(d)

If i ≠ j, then P_iP_j = 0.

(e)

For each i, P_iu_i = u_i.

(f)

If i ≠ j, then P_iu_j = 0.

(g)

If v is in R^n, show that

P_iv = proj_{Span{u_i}} v.

For this reason we call P_i an orthogonal projection matrix.
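The following small numpy sketch illustrates the projection property in part (g) for one particular unit vector (a numerical check, not a proof):

    import numpy as np

    u = np.array([1.0, 2.0, 2.0]) / 3.0        # a unit vector
    Pi = np.outer(u, u)                        # P_i = u u^T
    v = np.array([3.0, -1.0, 4.0])
    print(np.allclose(Pi @ v, (v @ u) * u))    # True: P_i v = (v . u) u
    print(np.allclose(Pi @ Pi, Pi))            # True: P_i^2 = P_i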

6.

Show that if M is an n×n matrix and (Mx)·y = x·(My) for every x, y in R^n, then M is a symmetric matrix.

Hint.

Try x = e_i and y = e_j.

7.

Let A be an n×n symmetric matrix and assume that A has an orthonormal basis {u_1, u_2, ..., u_n} of eigenvectors of A so that Au_i = λ_iu_i for each i. Let P_i = u_iu_i^T for each i. It is possible that not all of the eigenvalues of A are distinct. In this case, some of the eigenvalues will be repeated in the spectral decomposition of A. If we want only distinct eigenvalues to appear, we might do the following. Let μ_1, μ_2, ..., μ_k be the distinct eigenvalues of A. For each j between 1 and k, let Q_j be the sum of all of the P_i for which the corresponding eigenvalue of u_i is μ_j.

(a)

The eigenvalues of the matrix A = \begin{bmatrix} 0 & 2 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 2 & 3 \end{bmatrix} are -1 and 4. Find a basis for each eigenspace and determine each P_i. Then find k, μ_1, ..., μ_k, and each Q_j.

(b)

Show in general (not just for the specific example in part (a)) that the Q_j satisfy the same properties as the P_i. That is, verify the following.

(i)

A = μ_1Q_1 + μ_2Q_2 + ⋯ + μ_kQ_k

Hint.

Collect matrices with the same eigenvalues.

(ii)

Q_j is a symmetric matrix for each j.

Hint.

Use the fact that each P_i is a symmetric matrix.

(iii)

Q_j^2 = Q_j for each j.

Hint.

Use Theorem 27.8.

(iv)

Q_jQ_ℓ = 0 when j ≠ ℓ.

Hint.

Use Theorem 27.8.

(v)

If E_{μ_j} is the eigenspace for A corresponding to the eigenvalue μ_j, and if v is in R^n, then Q_jv = proj_{E_{μ_j}} v.

Hint.

Explain why the set of vectors u_i for which Au_i = μ_ju_i is an orthonormal basis for E_{μ_j}.

(c)

What is the rank of Qj? Verify your answer.

8.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

Every real symmetric matrix is diagonalizable.

(b) True/False.

If P is a matrix whose columns are eigenvectors of a symmetric matrix, then the columns of P are orthogonal.

(c) True/False.

If A is a symmetric matrix, then eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

(d) True/False.

If v1 and v2 are distinct eigenvectors of a symmetric matrix A, then v1 and v2 are orthogonal.

(e) True/False.

Any symmetric matrix can be written as a sum of symmetric rank 1 matrices.

(f) True/False.

If A is a matrix satisfying A^T = A, and u and v are vectors satisfying Au = 2u and Av = -2v, then u·v = 0.

(g) True/False.

If an n×n matrix A has n orthogonal eigenvectors, then A is a symmetric matrix.

(h) True/False.

If an n×n matrix A has n real eigenvalues (counted with multiplicity), then A is a symmetric matrix.

(i) True/False.

For each eigenvalue of a symmetric matrix, the algebraic multiplicity equals the geometric multiplicity.

(j) True/False.

If A is invertible and orthogonally diagonalizable, then so is A^{-1}.

(k) True/False.

If A,B are orthogonally diagonalizable n×n matrices, then so is AB.

Subsection Project: The Second Derivative Test for Functions of Two Variables

In this project we will verify the Second Derivative Test for functions of two variables. This test will involve Taylor polynomials and linear algebra. As a quick review, recall that the second order Taylor polynomial for a function f of a single variable x at x = a is

(27.4)   P_2(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2.

As with the linearization of a function, the second order Taylor polynomial is a good approximation to f around a; that is, f(x) ≈ P_2(x) for x close to a. If a is a critical number for f with f'(a) = 0, then

P_2(x) = f(a) + \frac{f''(a)}{2}(x-a)^2.

In this situation, if f''(a) < 0, then \frac{f''(a)}{2}(x-a)^2 ≤ 0 for x close to a, which makes P_2(x) ≤ f(a). This implies that f(x) ≈ P_2(x) ≤ f(a) for x close to a, which makes f(a) a relative maximum value for f. Similarly, if f''(a) > 0, then f(a) is a relative minimum.

We now need a Taylor polynomial for a function of two variables. The complication of the additional independent variable in the two variable case means that the Taylor polynomials will need to contain all of the possible monomials of the indicated degrees. Recall that the linearization (or tangent plane) to a function f=f(x,y) at a point (a,b) is given by

P_1(x,y) = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b).

Note that P_1(a,b) = f(a,b), (P_1)_x(a,b) = f_x(a,b), and (P_1)_y(a,b) = f_y(a,b). This makes P_1(x,y) the best linear approximation to f near the point (a,b). The polynomial P_1(x,y) is the first order Taylor polynomial for f at (a,b).

Similarly, the second order Taylor polynomial P2(x,y) centered at the point (a,b) for the function f is

P_2(x,y) = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b) + \frac{f_{xx}(a,b)}{2}(x-a)^2 + f_{xy}(a,b)(x-a)(y-b) + \frac{f_{yy}(a,b)}{2}(y-b)^2.

Project Activity 27.6.

To see that P2(x,y) is the best approximation for f near (a,b), we need to know that the first and second order partial derivatives of P2 agree with the corresponding partial derivatives of f at the point (a,b). Verify that this is true.

We can rewrite this second order Taylor polynomial using matrices and vectors so that we can apply techniques from linear algebra to analyze it. Note that

(27.5)   P_2(x,y) = f(a,b) + ∇f(a,b)^T\begin{bmatrix} x-a \\ y-b \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x-a \\ y-b \end{bmatrix}^T \begin{bmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{xy}(a,b) & f_{yy}(a,b) \end{bmatrix} \begin{bmatrix} x-a \\ y-b \end{bmatrix},

where ∇f(x,y) = \begin{bmatrix} f_x(x,y) \\ f_y(x,y) \end{bmatrix} is the gradient of f and H is the Hessian of f, where H(x,y) = \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{bmatrix}.
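Gradients and Hessians can also be computed symbolically. The sketch below uses Python's sympy library on a sample function of our own choosing (not the function from the project activities):

    import sympy as sp

    # Symbolic gradient and Hessian of a sample two-variable function.
    x, y = sp.symbols("x y")
    f = x**2 * y + y**3
    grad = sp.Matrix([f.diff(x), f.diff(y)])
    H = sp.hessian(f, (x, y))
    print(grad)   # Matrix([[2*x*y], [x**2 + 3*y**2]])
    print(H)      # Matrix([[2*y, 2*x], [2*x, 6*y]])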

Project Activity 27.7.

Use Equation (27.5) to compute P_2(x,y) for f(x,y) = x^4 + y^4 - 4xy + 1 at (a,b) = (2,3).

The important idea for us is that if (a,b) is a point at which f_x and f_y are zero, then ∇f is the zero vector and Equation (27.5) reduces to

(27.6)   P_2(x,y) = f(a,b) + \frac{1}{2}\begin{bmatrix} x-a \\ y-b \end{bmatrix}^T \begin{bmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{xy}(a,b) & f_{yy}(a,b) \end{bmatrix} \begin{bmatrix} x-a \\ y-b \end{bmatrix}.

To make the connection between the multivariable second derivative test and properties of the Hessian H(a,b) at a critical point of a function f at which ∇f = 0, we will need to connect the eigenvalues of a matrix to its determinant and trace.

Let A be an n×n matrix with eigenvalues λ_1, λ_2, ..., λ_n (not necessarily distinct). Exercise 1 in Section 18 shows that

(27.7)   det(A) = λ_1λ_2⋯λ_n.

In other words, the determinant of a matrix is equal to the product of the eigenvalues of the matrix. In addition, Exercise 9 in Section 19 shows that

(27.8)   trace(A) = λ_1 + λ_2 + ⋯ + λ_n

for a diagonalizable matrix, where trace(A) is the sum of the diagonal entries of A. Equation (27.8) is true for any square matrix, but we don't need the more general result for this project.

The fact that the Hessian is a symmetric matrix makes it orthogonally diagonalizable. We denote the eigenvalues of H(a,b) by λ_1 and λ_2. Thus there exists an orthogonal matrix P and a diagonal matrix D = \begin{bmatrix} λ_1 & 0 \\ 0 & λ_2 \end{bmatrix} such that P^TH(a,b)P = D, or H(a,b) = PDP^T. Equations (27.7) and (27.8) show that

λ_1λ_2 = f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2  and  λ_1 + λ_2 = f_{xx}(a,b) + f_{yy}(a,b).
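These two relations are easy to check numerically for any particular symmetric matrix; for example, with numpy:

    import numpy as np

    # Check Equations (27.7) and (27.8) on a sample symmetric 2x2 matrix.
    H = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    lam = np.linalg.eigvalsh(H)
    print(np.isclose(lam[0] * lam[1], np.linalg.det(H)))   # product = det
    print(np.isclose(lam[0] + lam[1], np.trace(H)))        # sum = trace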

Now we have the machinery to verify the Second Derivative Test for two-variable functions. We assume (a,b) is a point in the domain of a function f such that ∇f(a,b) = 0. First we consider the case where f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 < 0.

Project Activity 27.8.

Explain why if f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 < 0, then

\begin{bmatrix} x-a \\ y-b \end{bmatrix}^T H(a,b) \begin{bmatrix} x-a \\ y-b \end{bmatrix}

is indefinite. Explain why this implies that f is “saddle-shaped” near (a,b).

Hint.

Substitute w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = P^T\begin{bmatrix} x-a \\ y-b \end{bmatrix}. What does the graph of f look like in the w_1 and w_2 directions?

Now we examine the situation when f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 > 0.

Project Activity 27.9.

Assume that f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 > 0.

(a)

Explain why either both f_{xx}(a,b) and f_{yy}(a,b) are positive or both are negative.

(b)

If f_{xx}(a,b) > 0 and f_{yy}(a,b) > 0, explain why λ_1 and λ_2 must be positive.

(c)

Explain why, if f_{xx}(a,b) > 0 and f_{yy}(a,b) > 0, then f(a,b) is a local minimum value for f.

When f_{xx}(a,b)f_{yy}(a,b) - f_{xy}(a,b)^2 > 0 and either f_{xx}(a,b) or f_{yy}(a,b) is negative, a slight modification of the preceding argument shows that f has a local maximum at (a,b) (the details are left to the reader). Therefore, we have proved the Second Derivative Test for functions of two variables!
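The test translates directly into a short procedure. Below is a minimal Python sketch (the function name and inputs are our own choices, assuming the second partials at the critical point are already known):

    import numpy as np

    def classify(fxx, fxy, fyy):
        # Second derivative test at a critical point (where grad f = 0).
        H = np.array([[fxx, fxy],
                      [fxy, fyy]])
        d = np.linalg.det(H)            # = fxx*fyy - fxy^2 = lambda_1*lambda_2
        if d < 0:
            return "saddle point"       # eigenvalues of opposite signs
        if d > 0:
            return "local minimum" if fxx > 0 else "local maximum"
        return "test is inconclusive"   # d == 0

    # Example: f(x, y) = x^2 - y^2 at (0, 0) has fxx = 2, fxy = 0, fyy = -2.
    print(classify(2.0, 0.0, -2.0))     # saddle point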

Project Activity 27.10.

Use the Hessian to classify the local maxima, minima, and saddle points of f(x,y) = x^4 + y^4 - 4xy + 1. Draw a graph of f to illustrate.

Note: Many thanks to Professor Paul Fishback for sharing his activity on this topic. Much of this project comes from his activity.
Note: Under reasonable conditions (e.g., that f has continuous second order mixed partial derivatives in some open neighborhood containing (x,y)), we have that f_{xy}(x,y) = f_{yx}(x,y), so H(x,y) = \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{xy}(x,y) & f_{yy}(x,y) \end{bmatrix} is a symmetric matrix. We will only consider functions that satisfy these reasonable conditions.