
Section 27 Orthogonal Diagonalization

Subsection Application: The Multivariable Second Derivative Test

In single variable calculus, we learn that the second derivative can be used to classify a critical point at which the derivative of a function is 0 as a local maximum or a local minimum.

In the two-variable case we have an analogous test, which is usually seen in a multivariable calculus course.

A proof of this test for two-variable functions is based on Taylor polynomials and relies on symmetric matrices, eigenvalues, and quadratic forms. The steps of the proof appear in the project at the end of this section.

Subsection Introduction

We have seen how to diagonalize a matrix: if we can find \(n\) linearly independent eigenvectors of an \(n\times n\) matrix \(A\) and let \(P\) be the matrix whose columns are those eigenvectors, then \(P^{-1}AP\) is a diagonal matrix whose diagonal entries are the eigenvalues, listed in the order corresponding to the eigenvectors in the columns of \(P\text{.}\) We will see that in certain cases we can take this one step further and create an orthogonal matrix with eigenvectors as columns to diagonalize a matrix. This is called orthogonal diagonalization. Orthogonal diagonalizability is useful in that it allows us to find a “convenient” coordinate system in which to interpret the results of certain matrix transformations. A set of orthonormal basis vectors for an orthogonally diagonalizable matrix \(A\) is called a set of principal axes for \(A\text{.}\) Orthogonal diagonalization will also play a crucial role in the singular value decomposition of a matrix, a decomposition that has been described by some as the “pinnacle” of linear algebra.

Definition 27.3.

An \(n \times n\) matrix \(A\) is orthogonally diagonalizable if there is an orthogonal matrix \(P\) such that

\begin{equation*} P^{\tr}AP \end{equation*}

is a diagonal matrix. We say that the matrix \(P\) orthogonally diagonalizes the matrix \(A\text{.}\)
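
If software is available, the two conditions in this definition are easy to check numerically: \(P^{\tr}P\) should be the identity matrix and \(P^{\tr}AP\) should be diagonal. The following short Python sketch (assuming NumPy is available; the matrix \(A\) and candidate \(P\) are illustrative choices, not taken from the activities) performs both checks.

import numpy as np

# An illustrative symmetric matrix (not one from the activities)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# Candidate P: columns are unit eigenvectors of A for eigenvalues 1 and 3
P = np.array([[1.0, 1.0],
              [-1.0, 1.0]]) / np.sqrt(2)

print(np.allclose(P.T @ P, np.eye(2)))   # True, so P is orthogonal
print(np.round(P.T @ A @ P, 10))         # [[1. 0.] [0. 3.]], a diagonal matrix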

Preview Activity 27.1.

(a)

For each matrix \(A\) whose eigenvalues and corresponding eigenvectors are given, find a matrix \(P\) such that \(P^{-1}AP\) is a diagonal matrix.

(i)

\(A = \left[ \begin{array}{cc} 1\amp 2 \\ 2\amp 1 \end{array} \right]\) with eigenvalues \(-1\) and 3 and corresponding eigenvectors \(\vv_1 = \left[ \begin{array}{r} -1 \\ 1 \end{array} \right]\) and \(\vv_2 = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right]\text{.}\)

(ii)

\(A = \left[ \begin{array}{cc} 1\amp 2 \\ 1\amp 2 \end{array} \right]\) with eigenvalues \(0\) and \(3\) and corresponding eigenvectors \(\vv_1 = \left[ \begin{array}{r} -2 \\ 1 \end{array} \right]\) and \(\vv_2 = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right]\text{.}\)

(iii)

\(A = \left[ \begin{array}{ccc} 1\amp 0\amp 1 \\ 0\amp 1\amp 1 \\ 1\amp 1\amp 2 \end{array} \right]\) with eigenvalues \(0\text{,}\) \(1\text{,}\) and \(3\) and corresponding eigenvectors \(\vv_1 = \left[ \begin{array}{r} -1 \\ -1 \\ 1 \end{array} \right]\text{,}\) \(\vv_2 = \left[ \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \right]\text{,}\) and \(\vv_3 = \left[ \begin{array}{c} 1 \\ 1 \\ 2 \end{array} \right]\text{.}\)

(b)

Which matrices in part (a) seem to satisfy the orthogonal diagonalization requirement? Do you notice any common traits among these matrices?

Subsection Symmetric Matrices

As we saw in Preview Activity 27.1, matrices that are not symmetric need not be orthogonally diagonalizable, but the symmetric matrix examples are orthogonally diagonalizable. We explore that idea in this section.

If \(P\) is a matrix that orthogonally diagonalizes the matrix \(A\text{,}\) then \(P^{\tr}AP = D\text{,}\) where \(D\) is a diagonal matrix. Since \(D^{\tr} = D\) and \(A = PDP^{\tr}\text{,}\) we have

\begin{align*} A \amp = PDP^{\tr}\\ \amp = PD^{\tr}P^{\tr}\\ \amp = \left(P^{\tr}\right)^{\tr}D^{\tr}P^{\tr}\\ \amp = \left(PDP^{\tr}\right)^{\tr}\\ \amp = A^{\tr}\text{.} \end{align*}

Therefore, \(A^{\tr} = A\) and matrices with this property are the only matrices that can be orthogonally diagonalized. Recall that any matrix \(A\) satisfying \(A^{\tr} = A\) is a symmetric matrix.

While we have just shown that the only matrices that can be orthogonally diagonalized are the symmetric matrices, the amazing thing about symmetric matrices is that every symmetric matrix can be orthogonally diagonalized. We will prove this shortly.
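
For readers working with software, one way to see this result in action is with NumPy's eigh routine, which is designed for symmetric matrices and returns real eigenvalues together with orthonormal eigenvectors. The sketch below builds an arbitrary symmetric matrix and confirms that the matrix of eigenvectors orthogonally diagonalizes it; the random construction is an illustrative choice.

import numpy as np

# Build an arbitrary symmetric matrix: (M + M^T)/2 is always symmetric
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2

# eigh returns real eigenvalues and a matrix P whose columns are
# orthonormal eigenvectors of the symmetric matrix A
eigenvalues, P = np.linalg.eigh(A)

print(np.allclose(P.T @ P, np.eye(4)))                  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))   # True: P^T A P is diagonal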

Symmetric matrices have useful properties, a few of which are given in the following activity (we will use some of these properties later in this section).

Activity 27.2.

Let \(A\) be a symmetric \(n \times n\) matrix and let \(\vx\) and \(\vy\) be vectors in \(\R^n\text{.}\)

(a)

Show that \(\vx^{\tr} A \vy = (A\vx)^{\tr} \vy\text{.}\)

(b)

Show that \((A\vx) \cdot \vy = \vx \cdot (A\vy)\text{.}\)

(c)

Show that the eigenvalues of a \(2 \times 2\) symmetric matrix \(A = \left[ \begin{array}{cc} a\amp b\\b\amp c \end{array} \right]\) are real.

Activity 27.2 (c) shows that a \(2 \times 2\) symmetric matrix has real eigenvalues. This is a general result about real symmetric matrices: every eigenvalue of a real symmetric matrix is real, as we now prove.

Proof.

Let \(A\) be an \(n\times n\) symmetric matrix with real entries and let \(\lambda\) be an eigenvalue of \(A\) with eigenvector \(\vv\text{.}\) To show that \(\lambda\) is real, we will show that \(\overline{\lambda} = \lambda\text{.}\) We know

\begin{equation} A \vv = \lambda \vv\text{.}\tag{27.1} \end{equation}

Since \(A\) has real entries, we also know that \(\overline{\lambda}\) is an eigenvalue for \(A\) with eigenvector \(\overline{\vv}\text{.}\) Multiply both sides of (27.1) on the left by \(\overline{\vv}^{\tr}\) to obtain

\begin{equation} \overline{\vv}^{\tr} A \vv = \overline{\vv}^{\tr} \lambda \vv = \lambda \left(\overline{\vv}^{\tr} \vv \right)\text{.}\tag{27.2} \end{equation}

Now

\begin{equation*} \overline{\vv}^{\tr} A \vv = (A\overline{\vv})^{\tr} \vv = (\overline{\lambda} \ \overline{\vv})^{\tr} \vv = \overline{\lambda} \left(\overline{\vv}^{\tr} \vv \right) \end{equation*}

and equation (27.2) becomes

\begin{equation*} \overline{\lambda} \left(\overline{\vv}^{\tr} \vv \right) = \lambda \left(\overline{\vv}^{\tr} \vv \right)\text{.} \end{equation*}

Since \(\vv \neq \vzero\text{,}\) the quantity \(\overline{\vv}^{\tr} \vv\) is a positive real number (it is the sum of the squares of the moduli of the entries of \(\vv\)), so we may cancel it to conclude that \(\overline{\lambda} = \lambda\) and \(\lambda\) is real.

To orthogonally diagonalize a matrix, it must be the case that eigenvectors corresponding to different eigenvalues are orthogonal. This is an important property and it would be useful to know when it happens.

Activity 27.3.

Let \(A\) be a real symmetric matrix with eigenvalues \(\lambda_1\) and \(\lambda_2\) and corresponding eigenvectors \(\vv_1\) and \(\vv_2\text{,}\) respectively.

(a)

Use Activity 27.2 (b) to show that \(\lambda_1 \vv_1\cdot \vv_2 = \lambda_2 \vv_1 \cdot \vv_2\text{.}\)

(b)

Explain why the result of part (a) shows that \(\vv_1\) and \(\vv_2\) are orthogonal if \(\lambda_1\neq \lambda_2\text{.}\)

Activity 27.3 proves the following result: if \(A\) is a real symmetric matrix, then eigenvectors of \(A\) corresponding to distinct eigenvalues are orthogonal.

Recall that the only matrices that can be orthogonally diagonalized are the symmetric matrices. Now we show that every real symmetric matrix can be orthogonally diagonalized, which completely characterizes the matrices that are orthogonally diagonalizable. The proof of the following theorem proceeds by induction. A reader who has not yet encountered this technique of proof can safely skip the proof of this theorem without loss of continuity.

Proof.

Let \(A\) be a real \(n \times n\) symmetric matrix. The proof proceeds by induction on \(n\text{.}\) If \(n = 1\text{,}\) then \(A\) is diagonal and orthogonally diagonalizable. So assume that any real \((n-1) \times (n-1)\) symmetric matrix is orthogonally diagonalizable, and let \(A\) be a real \(n \times n\) symmetric matrix. As we showed above, the eigenvalues of \(A\) are real. Let \(\lambda_1\) be a real eigenvalue of \(A\) with corresponding unit eigenvector \(\vp_1\text{.}\) We can use the Gram-Schmidt process to extend \(\{\vp_1\}\) to an orthonormal basis \(\{\vp_1, \vp_2, \ldots, \vp_n\}\) for \(\R^n\text{.}\) Let \(P_1 = [\vp_1 \ \vp_2 \ \ldots \ \vp_n]\text{.}\) Then \(P_1\) is an orthogonal matrix. Also,

\begin{align*} P_1^{-1}AP_1 \amp = P_1^{\tr}AP_1\\ \amp = P_1^{\tr} [A\vp_1 \ A\vp_2 \ \ldots \ A\vp_n]\\ \amp = \left[ \begin{array}{c} \vp_1^{\tr}\\ \vp_2^{\tr}\\ \vdots\\ \vp_n^{\tr} \end{array} \right] [\lambda_1 \vp_1 \ A\vp_2 \ \ldots \ A\vp_n]\\ \amp = \left[ \begin{array}{ccccc} \vp_1^{\tr} \lambda_1 \vp_1 \amp \vp_1^{\tr} A\vp_2 \amp \vp_1^{\tr} A\vp_3 \amp \cdots \amp \vp_1^{\tr} A \vp_n\\ \vp_2^{\tr} \lambda_1 \vp_1 \amp \vp_2^{\tr} A\vp_2 \amp \vp_2^{\tr} A\vp_3 \amp \cdots \amp \vp_2^{\tr} A \vp_n\\ \amp \amp \vdots \amp \amp\\ \vp_n^{\tr} \lambda_1 \vp_1 \amp \vp_n^{\tr} A\vp_2 \amp \vp_n^{\tr} A\vp_3 \amp \cdots \amp \vp_n^{\tr} A \vp_n \end{array} \right]\\ \amp = \left[ \begin{array}{ccccc} \lambda_1\amp \vp_1^{\tr} A\vp_2 \amp \vp_1^{\tr} A\vp_3 \amp \cdots \amp \vp_1^{\tr} A \vp_n\\ 0 \amp \vp_2^{\tr} A\vp_2 \amp \vp_2^{\tr} A\vp_3 \amp \cdots \amp \vp_2^{\tr} A \vp_n\\ \amp \amp \vdots \amp \amp\\ 0 \amp \vp_n^{\tr} A\vp_2 \amp \vp_n^{\tr} A\vp_3 \amp \cdots \amp \vp_n^{\tr} A \vp_n \end{array} \right]\\ \amp = \left[ \begin{array}{cc} \lambda_1\amp \vx^{\tr}\\ \vzero \amp A_1 \end{array} \right] \end{align*}

where \(\vx\) is an \((n-1)\times 1\) vector, \(\vzero\) is the zero vector in \(\R^{n-1}\text{,}\) and \(A_1\) is an \((n-1) \times (n-1)\) matrix. Letting \(R = P_1^{\tr}AP_1\text{,}\) we have that

\begin{equation*} R^{\tr} = \left(P_1^{\tr}AP_1\right)^{\tr} = P_1^{\tr}A^{\tr}P_1 = P_1^{\tr}AP_1 = R\text{,} \end{equation*}

so \(R\) is a symmetric matrix. Therefore, \(\vx = \vzero\) and \(A_1\) is a symmetric matrix. By our induction hypothesis, \(A_1\) is orthogonally diagonalizable. That is, there exists an \((n-1) \times (n-1)\) orthogonal matrix \(Q\) such that \(Q^{\tr}A_1Q = D_1\text{,}\) where \(D_1\) is a diagonal matrix. Now define \(P_2\) by

\begin{equation*} P_2 = \left[ \begin{array}{cc} 1\amp \vzero^{\tr} \\ \vzero \amp Q \end{array} \right]\text{,} \end{equation*}

where \(\vzero\) is the zero vector in \(\R^{n-1}\text{.}\) By construction, the columns of \(P_2\) are orthonormal, so \(P_2\) is an orthogonal matrix. Let \(P = P_1P_2\text{.}\) Since \(P_1\) is also an orthogonal matrix,

\begin{equation*} P^{\tr} = (P_1P_2)^{\tr} = P_2^{\tr}P_1^{\tr} = P_2^{-1}P_1^{-1} = (P_1P_2)^{-1} = P^{-1} \end{equation*}

and \(P\) is an orthogonal matrix. Finally,

\begin{align*} P^{\tr}AP \amp = (P_1P_2)^{\tr}A(P_1P_2)\\ \amp = P_2^{\tr}\left(P_1^{\tr}AP_1\right)P_2\\ \amp = \left[ \begin{array}{cc} 1\amp \vzero^{\tr}\\ \vzero \amp Q \end{array} \right]^{\tr} \left[ \begin{array}{cc} \lambda_1\amp \vx^{\tr}\\ 0 \amp A_1 \end{array} \right] \left[ \begin{array}{cc} 1\amp \vzero^{\tr}\\ \vzero \amp Q \end{array} \right]\\ \amp = \left[ \begin{array}{cc} 1\amp \vzero^{\tr}\\ \vzero \amp Q^{\tr} \end{array} \right] \left[ \begin{array}{cc} \lambda_1\amp \vx^{\tr}\\ 0 \amp A_1 \end{array} \right]\left[ \begin{array}{cc} 1\amp \vzero^{\tr}\\ \vzero \amp Q \end{array} \right]\\ \amp = \left[ \begin{array}{cc} \lambda_1\amp \vzero^{\tr}\\ \vzero \amp Q^{\tr}A_1Q \end{array} \right]\\ \amp = \left[ \begin{array}{cc} \lambda_1\amp \vzero^{\tr}\\ \vzero \amp D_1 \end{array} \right]\text{.} \end{align*}

Therefore, \(P^{\tr}AP\) is a diagonal matrix and \(P\) orthogonally diagonalizes \(A\text{.}\) This completes our proof.

The set of eigenvalues of a matrix \(A\) is called the spectrum of \(A\text{,}\) and the result we have just proved is known as the Spectral Theorem for real symmetric matrices.

So any real symmetric matrix is orthogonally diagonalizable. We have seen examples of the orthogonal diagonalization of \(n \times n\) real symmetric matrices with \(n\) distinct eigenvalues, but how do we orthogonally diagonalize a symmetric matrix having eigenvalues of multiplicity greater than 1? The next activity shows us the process.

Activity 27.4.

Let \(A = \left[ \begin{array}{ccc} 4\amp 2\amp 2 \\ 2\amp 4\amp 2 \\ 2\amp 2\amp 4 \end{array} \right]\text{.}\) The eigenvalues of \(A\) are 2 and 8, with eigenspaces of dimension 2 and 1, respectively.

(a)

Explain why \(A\) can be orthogonally diagonalized.

(b)

Two linearly independent eigenvectors for \(A\) corresponding to the eigenvalue 2 are \(\vv_1 = \left[ \begin{array}{r} -1 \\ 0 \\ 1 \end{array} \right]\) and \(\vv_2 = \left[ \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \right]\text{.}\) Note that \(\vv_1\) and \(\vv_2\) are not orthogonal, so they cannot both belong to an orthogonal basis of \(\R^3\) consisting of eigenvectors of \(A\text{.}\) So find a set \(\{\vw_1, \vw_2\}\) of orthogonal eigenvectors of \(A\) so that \(\Span\{\vw_1, \vw_2\} = \Span\{\vv_1, \vv_2\}\text{.}\)

(c)

The vector \(\vv_3=\left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right]\) is an eigenvector for \(A\) corresponding to the eigenvalue 8. What can you say about the orthogonality relationship between the \(\vw_i\) and \(\vv_3\text{?}\)

(d)

Find a matrix \(P\) that orthogonally diagonalizes \(A\text{.}\) Verify your work.

Subsection The Spectral Decomposition of a Symmetric Matrix \(A\)

Let \(A\) be an \(n \times n\) symmetric matrix with real entries. The Spectral Theorem tells us we can find an orthonormal basis \(\{\vu_1, \vu_2, \ldots, \vu_n\}\) of eigenvectors of \(A\text{.}\) Let \(A \vu_i = \lambda_i \vu_i\) for each \(1 \leq i \leq n\text{.}\) If \(P = [ \vu_1 \ \vu_2 \ \vu_3 \ \cdots \ \vu_n]\text{,}\) then we know that

\begin{equation*} P^{\tr}AP = P^{-1}AP = D\text{,} \end{equation*}

where \(D\) is the \(n \times n\) diagonal matrix

\begin{equation*} \left[ \begin{array}{cccccc} \lambda_1\amp 0\amp 0\amp \cdots\amp 0\amp 0 \\ 0\amp \lambda_2\amp 0\amp \cdots\amp 0\amp 0 \\ \vdots\amp \vdots\amp \vdots\amp \amp \vdots\amp \vdots \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp \lambda_n \end{array} \right]\text{.} \end{equation*}

Since \(A = PDP^{\tr}\) we see that

\begin{align} A \amp = [ \vu_1 \ \vu_2 \ \vu_3 \ \cdots \ \vu_n] \left[ \begin{array}{cccccc} \lambda_1\amp 0\amp 0\amp \cdots\amp 0\amp 0\notag\\ 0\amp \lambda_2\amp 0\amp \cdots\amp 0\amp 0\notag\\ \vdots\amp \vdots\amp \vdots\amp \amp \vdots\amp \vdots\notag\\ 0\amp 0\amp 0\amp \cdots\amp 0\amp \lambda_n \end{array} \right] \left[ \begin{array}{c} \vu_1^{\tr}\notag\\ \vu_2^{\tr}\notag\\ \vu_3^{\tr}\notag\\ \vdots\notag\\ \vu_n^{\tr} \end{array} \right]\notag\\ \amp = [ \lambda_1\vu_1 \ \lambda_2\vu_2 \ \lambda_3\vu_3 \ \cdots \ \lambda_n\vu_n] \left[ \begin{array}{c} \vu_1^{\tr}\notag\\ \vu_2^{\tr}\notag\\ \vu_3^{\tr}\notag\\ \vdots\notag\\ \vu_n^{\tr} \end{array} \right]\notag\\ \amp = \lambda_1 \vu_1\vu_1^{\tr} + \lambda_2 \vu_2\vu_2^{\tr} + \lambda_3 \vu_3\vu_3^{\tr} + \cdots + \lambda_n \vu_n\vu_n^{\tr}\text{,}\tag{27.3} \end{align}

where the last product follows from Exercise 4. The expression in (27.3) is called a spectral decomposition of the matrix \(A\text{.}\) Let \(P_i = \vu_i\vu_i^{\tr}\) for each \(i\text{.}\) The matrices \(P_i\) satisfy several special conditions given in the next theorem. The proofs are left to the exercises.

The consequence of Theorem 27.8 is that any symmetric matrix can be written as the sum of symmetric, rank 1 matrices. As we will see later, this kind of decomposition contains much information about the matrix product \(A^{\tr}A\) for any matrix \(A\text{.}\)
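
A spectral decomposition is also easy to compute and verify numerically. The following Python sketch (assuming NumPy; the symmetric matrix is an illustrative choice, not one from the activities) assembles the rank 1 terms in (27.3) and checks that they sum to \(A\text{.}\)

import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 3.0]])        # an illustrative symmetric matrix

eigenvalues, U = np.linalg.eigh(A)     # columns of U are orthonormal eigenvectors

# Assemble lambda_1 u_1 u_1^T + lambda_2 u_2 u_2^T + lambda_3 u_3 u_3^T
S = sum(lam * np.outer(u, u) for lam, u in zip(eigenvalues, U.T))
print(np.allclose(S, A))               # True: the rank 1 terms sum to A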

Activity 27.5.

Let \(A = \left[ \begin{array}{ccc} 4\amp 2\amp 2 \\ 2\amp 4\amp 2 \\ 2\amp 2\amp 4 \end{array} \right]\text{.}\) Let \(\lambda_1 = 2\text{,}\) \(\lambda_2 = 2\text{,}\) and \(\lambda_3 = 8\) be the eigenvalues of \(A\text{.}\) A basis for the eigenspace \(E_8\) of \(A\) corresponding to the eigenvalue 8 is \(\{[1 \ 1\ 1]^{\tr}\}\) and a basis for the eigenspace \(E_2\) of \(A\) corresponding to the eigenvalue 2 is \(\{[1 \ -1\ 0]^{\tr}, [1 \ 0 \ -1]^{\tr}\}\text{.}\) (Compare to Activity 27.4.)

(a)

Find orthonormal eigenvectors \(\vu_1\text{,}\) \(\vu_2\text{,}\) and \(\vu_3\) of \(A\) corresponding to \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) and \(\lambda_3\text{,}\) respectively.

(b)

Compute \(\lambda_1 \vu_1\vu_1^{\tr}\text{.}\)

(c)

Compute \(\lambda_2 \vu_2\vu_2^{\tr}\text{.}\)

(d)

Compute \(\lambda_3 \vu_3\vu_3^{\tr}\text{.}\)

(e)

Verify that \(A = \lambda_1 \vu_1\vu_1^{\tr} + \lambda_2 \vu_2\vu_2^{\tr} + \lambda_3 \vu_3\vu_3^{\tr}\text{.}\)

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 27.9.

For each of the following matrices \(A\text{,}\) determine if \(A\) is diagonalizable. If \(A\) is not diagonalizable, explain why. If \(A\) is diagonalizable, find a matrix \(P\) so that \(P^{-1}AP\) is a diagonal matrix. If the matrix is diagonalizable, is it orthogonally diagonalizable? If orthogonally diagonalizable, find an orthogonal matrix that diagonalizes \(A\text{.}\) Use appropriate technology to find eigenvalues and eigenvectors.

(a)

\(A = \left[ \begin{array}{rrc} 2 \amp 0 \amp 0 \\ -1 \amp 3 \amp 2 \\ 1 \amp -1 \amp 0 \end{array} \right]\)

Solution.

Recall that an \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors, and \(A\) is orthogonally diagonalizable if and only if \(A\) is symmetric. Since \(A\) is not symmetric, \(A\) is not orthogonally diagonalizable. Technology shows that the eigenvalues of \(A\) are \(2\) and \(1\) and bases for the corresponding eigenspaces are \(\{ [1 \ 1\ 0]^{\tr}, [2 \ 0 \ 1]^{\tr} \}\) and \(\{[0 \ -1 \ 1]^{\tr}\}\text{.}\) So \(A\) is diagonalizable and if \(P = \left[ \begin{array}{rcr} 1\amp 2\amp 0\\1\amp 0\amp -1\\0\amp 1\amp 1 \end{array} \right]\text{,}\) then

\begin{equation*} P^{-1}AP = \left[ \begin{array}{ccc} 2\amp 0\amp 0\\0\amp 2\amp 0\\0\amp 0\amp 1 \end{array} \right]\text{.} \end{equation*}
(b)

\(A = \left[ \begin{array}{ccc} 1 \amp 1 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 0 \end{array} \right]\)

Solution.

Since \(A\) is not symmetric, \(A\) is not orthogonally diagonalizable. Technology shows that the eigenvalues of \(A\) are \(0\) and \(1\) and bases for the corresponding eigenspaces are \(\{[0 \ 0 \ 1]^{\tr}\}\) and \(\{ [1 \ 0\ 0]^{\tr} \}\text{.}\) We cannot create a basis of \(\R^3\) consisting of eigenvectors of \(A\text{,}\) so \(A\) is not diagonalizable.

(c)

\(A = \left[ \begin{array}{ccc} 4 \amp 2 \amp 1 \\ 2 \amp 7 \amp 2 \\ 1 \amp 2 \amp 4 \end{array} \right]\)

Solution.

Since \(A\) is symmetric, \(A\) is orthogonally diagonalizable. Technology shows that the eigenvalues of \(A\) are \(3\) and \(9\text{,}\) and bases for the corresponding eigenspaces are \(\{[-1 \ 0 \ 1]^{\tr}, [-2 \ 1 \ 0]^{\tr}\}\) and \(\{ [1 \ 2 \ 1]^{\tr} \}\text{,}\) respectively. To find an orthogonal matrix that diagonalizes \(A\text{,}\) we must find an orthonormal basis of \(\R^3\) consisting of eigenvectors of \(A\text{.}\) To do that, we use the Gram-Schmidt process to obtain an orthogonal basis for the eigenspace of \(A\) corresponding to the eigenvalue \(3\text{.}\) Doing so gives an orthogonal basis \(\{\vv_1, \vv_2\}\text{,}\) where \(\vv_1 = [-1 \ 0 \ 1]^{\tr}\) and

\begin{align*} \vv_2 \amp = [-2 \ 1 \ 0]^{\tr} - \frac{ [-2 \ 1 \ 0]^{\tr} \cdot [-1 \ 0 \ 1]^{\tr}}{[-1 \ 0 \ 1]^{\tr} \cdot [-1 \ 0 \ 1]^{\tr}} [-1 \ 0 \ 1]^{\tr}\\ \amp = [-2 \ 1 \ 0]^{\tr} - [-1 \ 0 \ 1]^{\tr}\\ \amp = [ -1 \ 1 \ -1]^{\tr}\text{.} \end{align*}

So an orthonormal basis for \(\R^3\) of eigenvectors of \(A\) is

\begin{equation*} \left\{\frac{1}{\sqrt{2}} [-1 \ 0 \ 1]^{\tr}, \frac{1}{\sqrt{3}}[ -1 \ 1 \ -1]^{\tr}, \frac{1}{\sqrt{6}}[1 \ 2 \ 1]^{\tr} \right\}\text{.} \end{equation*}

Therefore, \(A\) is orthogonally diagonalizable and if \(P\) is the matrix \(\left[{ \begin{array}{rrc} -\frac{1}{\sqrt{2}}\amp -\frac{1}{\sqrt{3}}\amp \frac{1}{\sqrt{6}}\\0\amp \frac{1}{\sqrt{3}}\amp \frac{2}{\sqrt{6}}\\\frac{1}{\sqrt{2}}\amp -\frac{1}{\sqrt{3}}\amp \frac{1}{\sqrt{6}} \end{array} } \right]\text{,}\) then

\begin{equation*} P^{-1}AP = \left[ \begin{array}{ccc} 3\amp 0\amp 0\\0\amp 3\amp 0\\0\amp 0\amp 9 \end{array} \right]\text{.} \end{equation*}
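
If appropriate technology is available, this computation can be verified numerically. The Python sketch below (assuming NumPy) rebuilds the matrix \(P\) found above and checks that it orthogonally diagonalizes \(A\text{.}\)

import numpy as np

A = np.array([[4.0, 2.0, 1.0],
              [2.0, 7.0, 2.0],
              [1.0, 2.0, 4.0]])
P = np.column_stack([np.array([-1.0, 0.0, 1.0]) / np.sqrt(2),
                     np.array([-1.0, 1.0, -1.0]) / np.sqrt(3),
                     np.array([1.0, 2.0, 1.0]) / np.sqrt(6)])

print(np.allclose(P.T @ P, np.eye(3)))   # True: the columns are orthonormal
print(np.round(P.T @ A @ P, 10))         # diag(3, 3, 9), as computed above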

Example 27.10.

Let \(A = \left[ \begin{array}{cccc} 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 1\amp 0 \\ 0\amp 1\amp 0\amp 0 \\ 1\amp 0\amp 0\amp 0 \end{array} \right]\text{.}\) Find an orthonormal basis for \(\R^4\) consisting of eigenvectors of \(A\text{.}\)

Solution.

Since \(A\) is symmetric, there is an orthogonal matrix \(P\) such that \(P^{-1}AP\) is diagonal. The columns of \(P\) will form an orthonormal basis for \(\R^4\text{.}\) Using a cofactor expansion along the first row shows that

\begin{align*} \det(A-\lambda I_4) \amp = \det\left(\left[ \begin{array}{rrrr} -\lambda\amp 0\amp 0\amp 1\\ 0\amp -\lambda\amp 1\amp 0\\ 0\amp 1\amp -\lambda\amp 0\\ 1\amp 0\amp 0\amp -\lambda \end{array} \right] \right)\\ \amp = \left(\lambda^2-1\right)^2\\ \amp = (\lambda+1)^2(\lambda-1)^2\text{.} \end{align*}

So the eigenvalues of \(A\) are \(1\) and \(-1\text{.}\) The reduced row echelon forms of \(A-I_4\) and \(A+I_4\) are, respectively,

\begin{equation*} \left[ \begin{array}{ccrr} 1\amp 0\amp 0\amp -1 \\ 0\amp 1\amp -1\amp 0 \\0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right] \ \text{ and } \ \left[ \begin{array}{cccc} 1\amp 0\amp 0\amp 1 \\ 0\amp 1\amp 1\amp 0 \\0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

Thus, a basis for the eigenspace \(E_{1}\) of \(A\) is \(\{[0 \ 1 \ 1 \ 0]^{\tr}, [1 \ 0 \ 0 \ 1]^{\tr}\}\) and a basis for the eigenspace \(E_{-1}\) of \(A\) is \(\{[0 \ 1 \ -1 \ 0]^{\tr}, [1 \ 0 \ 0 \ -1]^{\tr}\}\text{.}\) The set \(\{[0 \ 1 \ 1 \ 0]^{\tr}, [1 \ 0 \ 0 \ 1]^{\tr}, [0 \ 1 \ -1 \ 0]^{\tr}, [1 \ 0 \ 0 \ -1]^{\tr}\}\) is an orthogonal set, so an orthonormal basis for \(\R^4\) consisting of eigenvectors of \(A\) is

\begin{equation*} \left\{\frac{1}{\sqrt{2}} [0 \ 1 \ 1 \ 0]^{\tr}, \frac{1}{\sqrt{2}}[1 \ 0 \ 0 \ 1]^{\tr}, \frac{1}{\sqrt{2}}[0 \ 1 \ -1 \ 0]^{\tr}, \frac{1}{\sqrt{2}}[1 \ 0 \ 0 \ -1]^{\tr}\right\}\text{.} \end{equation*}

Subsection Summary

  • An \(n \times n\) matrix \(A\) is orthogonally diagonalizable if there is an orthogonal matrix \(P\) such that \(P^{\tr}AP\) is a diagonal matrix. Orthogonal diagonalizability is useful in that it allows us to find a “convenient” coordinate system in which to interpret the results of certain matrix transformations. Orthogonal diagonalization also plays a crucial role in the singular value decomposition of a matrix.

  • An \(n \times n\) matrix \(A\) is symmetric if \(A^{\tr} = A\text{.}\) The symmetric matrices are exactly the matrices that can be orthogonally diagonalized.

  • The spectrum of a matrix is the set of eigenvalues of the matrix.

Exercises Exercises

1.

For each of the following matrices, find an orthogonal matrix \(P\) so that \(P^{\tr}AP\) is a diagonal matrix, or explain why no such matrix exists.

(a)

\(A = \left[ \begin{array}{rr} 3\amp -4 \\ -4\amp -3 \end{array} \right]\)

(b)

\(A = \left[ \begin{array}{ccc} 4\amp 1\amp 1 \\ 1\amp 1\amp 4 \\ 1\amp 4\amp 1 \end{array} \right]\)

(c)

\(A = \left[ \begin{array}{cccc} 1\amp 2\amp 0\amp 0 \\ 0\amp 1\amp 2\amp 1 \\ 1\amp 1\amp 1\amp 1 \\ 3\amp 0\amp 5\amp 2 \end{array} \right]\)

2.

For each of the following matrices find an orthonormal basis of eigenvectors of \(A\text{.}\) Then find a spectral decomposition of \(A\text{.}\)

(a)

\(A = \left[ \begin{array}{rr} 3\amp -4 \\ -4\amp -3 \end{array} \right]\)

(b)

\(A = \left[ \begin{array}{ccc} 4\amp 1\amp 1 \\ 1\amp 1\amp 4 \\ 1\amp 4\amp 1 \end{array} \right]\)

(c)

\(A = \left[ \begin{array}{rrr} -4\amp 0\amp -24 \\ 0\amp -8\amp 0 \\ -24\amp 0\amp 16 \end{array} \right]\)

(d)

\(A = \left[ \begin{array}{ccr} 1\amp 0\amp 0 \\ 0\amp 0\amp 2 \\ 0\amp 2\amp -3 \end{array} \right]\)

3.

Find a non-diagonal \(4 \times 4\) matrix with eigenvalues 2, 3, and 6 that can be orthogonally diagonalized.

4.

Let \(A = [a_{ij}] = [ \vc_1 \ \vc_2 \ \cdots \ \vc_m]\) be a \(k \times m\) matrix with columns \(\vc_1\text{,}\) \(\vc_2\text{,}\) \(\ldots\text{,}\) \(\vc_m\text{,}\) and let \(B = [b_{ij}] = \left[ \begin{array}{c} \vr_1 \\ \vr_2 \\ \vdots \\ \vr_m \end{array} \right]\) be an \(m \times n\) matrix with rows \(\vr_1\text{,}\) \(\vr_2\text{,}\) \(\ldots\text{,}\) \(\vr_m\text{.}\) Show that

\begin{equation*} AB = [ \vc_1 \ \vc_2 \ \cdots \ \vc_m]\left[\begin{array}{c} \vr_1 \\ \vr_2 \\ \vdots \\ \vr_m \end{array} \right] = \vc_1\vr_1 + \vc_2\vr_2 + \cdots + \vc_m \vr_m\text{.} \end{equation*}

5.

Let \(A\) be an \(n \times n\) symmetric matrix with real entries and let \(\{\vu_1, \vu_2, \ldots, \vu_n\}\) be an orthonormal basis of eigenvectors of \(A\text{.}\) For each \(i\text{,}\) let \(P_i = \vu_i\vu_i^{\tr}\text{.}\) Prove Theorem 27.8 — that is, verify each of the following statements.

(a)

For each \(i\text{,}\) \(P_i\) is a symmetric matrix.

(b)

For each \(i\text{,}\) \(P_i\) is a rank 1 matrix.

(c)

For each \(i\text{,}\) \(P_i^2 = P_i\text{.}\)

(d)

If \(i \neq j\text{,}\) then \(P_iP_j = 0\text{.}\)

(e)

For each \(i\text{,}\) \(P_i \vu_i = \vu_i\text{.}\)

(f)

If \(i \neq j\text{,}\) then \(P_i \vu_j = \vzero\text{.}\)

(g)

If \(\vv\) is in \(\R^n\text{,}\) show that

\begin{equation*} P_i \vv = \proj_{\Span\{\vu_i\} } \vv\text{.} \end{equation*}

For this reason we call \(P_i\) an orthogonal projection matrix.

6.

Show that if \(M\) is an \(n \times n\) matrix and \((M\vx) \cdot \vy = \vx \cdot (M\vy)\) for every \(\vx, \vy\) in \(\R^n\text{,}\) then \(M\) is a symmetric matrix.

Hint.

Try \(\vx = \textbf{e}_i\) and \(\vy = \textbf{e}_j\text{.}\)

7.

Let \(A\) be an \(n \times n\) symmetric matrix and let \(\{\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_n\}\) be an orthonormal basis of eigenvectors of \(A\) so that \(A \vu_i = \lambda_i \vu_i\) for each \(i\text{.}\) Let \(P_i = \vu_i\vu_i^{\tr}\) for each \(i\text{.}\) It is possible that not all of the eigenvalues of \(A\) are distinct. In this case, some of the eigenvalues will be repeated in the spectral decomposition of \(A\text{.}\) If we want only distinct eigenvalues to appear, we might do the following. Let \(\mu_1\text{,}\) \(\mu_2\text{,}\) \(\ldots\text{,}\) \(\mu_k\) be the distinct eigenvalues of \(A\text{.}\) For each \(j\) between 1 and \(k\text{,}\) let \(Q_j\) be the sum of all of the \(P_i\) whose corresponding eigenvalue \(\lambda_i\) equals \(\mu_j\text{.}\)

(a)

The eigenvalues for the matrix \(A = \left[ \begin{array}{cccc} 0\amp 2\amp 0\amp 0 \\ 2\amp 3\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 2 \\ 0\amp 0\amp 2\amp 3 \end{array} \right]\) are \(-1\) and \(4\text{.}\) Find a basis for each eigenspace and determine each \(P_i\text{.}\) Then find \(k\text{,}\) \(\mu_1\text{,}\) \(\ldots\text{,}\) \(\mu_k\text{,}\) and each \(Q_j\text{.}\)

(b)

Show in general (not just for the specific example in part (a)) that the \(Q_j\) satisfy the same properties as the \(P_i\text{.}\) That is, verify the following.

(i)

\(A = \mu_1 Q_1 + \mu_2 Q_2 + \cdots + \mu_k Q_k\)

Hint.

Collect matrices with the same eigenvalues.

(ii)

\(Q_j\) is a symmetric matrix for each \(j\)

Hint.

Use the fact that each \(P_i\) is a symmetric matrix.

(iii)

\(Q_j^2 = Q_j\) for each \(j\)

Hint.

Use Theorem 27.8.

(iv)

\(Q_j Q_{\ell} = 0\) when \(j \neq \ell\)

Hint.

Use Theorem 27.8.

(v)

if \(E_{\mu_j}\) is the eigenspace for \(A\) corresponding to the eigenvalue \(\mu_j\text{,}\) and if \(\vv\) is in \(\R^n\text{,}\) then \(Q_j \vv = \proj_{E_{\mu_j}} \vv\text{.}\)

Hint.

Explain why \(\{\vu_{1_j}\text{,}\) \(\vu_{2_j}\text{,}\) \(\ldots\text{,}\) \(\vu_{m_j}\}\) is an orthonormal basis for \(E_{\mu_j}\text{.}\)

(c)

What is the rank of \(Q_j\text{?}\) Verify your answer.

8.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

Every real symmetric matrix is diagonalizable.

(b) True/False.

If \(P\) is a matrix whose columns are eigenvectors of a symmetric matrix, then the columns of \(P\) are orthogonal.

(c) True/False.

If \(A\) is a symmetric matrix, then eigenvectors of \(A\) corresponding to distinct eigenvalues are orthogonal.

(d) True/False.

If \(\vv_1\) and \(\vv_2\) are distinct eigenvectors of a symmetric matrix \(A\text{,}\) then \(\vv_1\) and \(\vv_2\) are orthogonal.

(e) True/False.

Any symmetric matrix can be written as a sum of symmetric rank 1 matrices.

(f) True/False.

If \(A\) is a matrix satisfying \(A^{\tr} = A\text{,}\) and \(\vu\) and \(\vv\) are vectors satisfying \(A \vu = 2 \vu\) and \(A \vv = -2 \vv\text{,}\) then \(\vu \cdot \vv = 0\text{.}\)

(g) True/False.

If an \(n\times n\) matrix \(A\) has \(n\) orthogonal eigenvectors, then \(A\) is a symmetric matrix.

(h) True/False.

If an \(n\times n\) matrix \(A\) has \(n\) real eigenvalues (counted with multiplicity), then \(A\) is a symmetric matrix.

(i) True/False.

For each eigenvalue of a symmetric matrix, the algebraic multiplicity equals the geometric multiplicity.

(j) True/False.

If \(A\) is invertible and orthogonally diagonalizable, then so is \(A^{-1}\text{.}\)

(k) True/False.

If \(A, B\) are orthogonally diagonalizable \(n\times n\) matrices, then so is \(AB\text{.}\)

Subsection Project: The Second Derivative Test for Functions of Two Variables

In this project we will verify the Second Derivative Test for functions of two variables. This test will involve Taylor polynomials and linear algebra. As a quick review, recall that the second order Taylor polynomial for a function \(f\) of a single variable \(x\) at \(x = a\) is

\begin{equation} P_2(x) = f(a)+f'(a)(x-a)+\frac{f''(a)}{2}(x-a)^2\text{.}\tag{27.4} \end{equation}

As with the linearization of a function, the second order Taylor polynomial is a good approximation to \(f\) around \(a\) — that is \(f(x) \approx P_2(x)\) for \(x\) close to \(a\text{.}\) If \(a\) is a critical number for \(f\) with \(f'(a) = 0\text{,}\) then

\begin{equation*} P_2(x) = f(a) + \frac{f''(a)}{2}(x-a)^2\text{.} \end{equation*}

In this situation, if \(f''(a) \lt 0\text{,}\) then \(\frac{f''(a)}{2}(x-a)^2 \leq 0\) for \(x\) close to \(a\text{,}\) which makes \(P_2(x) \leq f(a)\text{.}\) This implies that \(f(x) \approx P_2(x) \leq f(a)\) for \(x\) close to \(a\text{,}\) which makes \(f(a)\) a relative maximum value for \(f\text{.}\) Similarly, if \(f''(a) > 0\text{,}\) then \(f(a)\) is a relative minimum.
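
If a computer algebra system is available, this single-variable reasoning is easy to illustrate. The following Python sketch (assuming SymPy; the function and critical number are illustrative choices, not taken from the text) evaluates \(f''(a)\) and builds the Taylor polynomial in (27.4).

import sympy as sp

x = sp.symbols('x')
f = x**4 - 2*x**2              # an illustrative function, not from the text
a = 1                          # a critical number: f'(1) = 0

fp, fpp = sp.diff(f, x), sp.diff(f, x, 2)
print(fp.subs(x, a), fpp.subs(x, a))   # 0 and 8; f''(1) > 0, so f(1) is a local minimum

# The second order Taylor polynomial (27.4) at x = a
P2 = f.subs(x, a) + fp.subs(x, a)*(x - a) + sp.Rational(1, 2)*fpp.subs(x, a)*(x - a)**2
print(P2)                      # 4*(x - 1)**2 - 1, a parabola opening upward from f(1) = -1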

We now need a Taylor polynomial for a function of two variables. The complication of the additional independent variable in the two variable case means that the Taylor polynomials will need to contain all of the possible monomials of the indicated degrees. Recall that the linearization (or tangent plane) to a function \(f = f(x,y)\) at a point \((a,b)\) is given by

\begin{equation*} P_1(x,y) = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b)\text{.} \end{equation*}

Note that \(P_1(a,b) = f(a,b)\text{,}\) \(\frac{\partial P_1}{\partial x}(a,b) = f_x(a,b)\text{,}\) and \(\frac{\partial P_1}{\partial y}(a,b) = f_y(a,b)\text{.}\) This makes \(P_1(x,y)\) the best linear approximation to \(f\) near the point \((a,b)\text{.}\) The polynomial \(P_1(x,y)\) is the first order Taylor polynomial for \(f\) at \((a,b)\text{.}\)

Similarly, the second order Taylor polynomial \(P_2(x,y)\) centered at the point \((a,b)\) for the function \(f\) is

\begin{align*} P_2(x,y) = f(a,b) \amp + f_x(a,b)(x-a) + f_y(a,b)(y-b) + \frac{f_{xx}(a,b)}{2}(x-a)^2\\ \amp + f_{xy}(a,b)(x-a)(y-b) + \frac{f_{yy}(a,b)}{2}(y-b)^2\text{.} \end{align*}

Project Activity 27.6.

To see that \(P_2(x,y)\) is the best approximation for \(f\) near \((a,b)\text{,}\) we need to know that the first and second order partial derivatives of \(P_2\) agree with the corresponding partial derivatives of \(f\) at the point \((a,b)\text{.}\) Verify that this is true.

We can rewrite this second order Taylor polynomial using matrices and vectors so that we can apply techniques from linear algebra to analyze it. Note that

\begin{align} P_2(x,y) \amp = f(a,b) + \nabla f(a,b)^{\tr} \left[ \begin{array}{c} x-a\notag\\ y-b \end{array} \right]\notag\\ \amp \qquad + \frac{1}{2}\left[ \begin{array}{c} x-a\notag\\ y-b \end{array} \right]^{\tr} \left[ \begin{array}{cc} f_{xx}(a,b)\amp f_{xy}(a,b)\notag\\ f_{xy}(a,b)\amp f_{yy}(a,b) \end{array} \right] \left[ \begin{array}{c} x-a\notag\\ y-b \end{array} \right]\text{,}\tag{27.5} \end{align}

where \(\nabla f(x,y) = \left[ \begin{array}{c} f_x(x,y)\\f_y(x,y) \end{array} \right]\) is the gradient of \(f\) and \(H\) is the Hessian of \(f\text{,}\) where \(H(x,y) = \left[ \begin{array}{cc} f_{xx}(x,y)\amp f_{xy}(x,y) \\ f_{yx}(x,y)\amp f_{yy}(x,y) \end{array} \right]\text{.}\)
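
A computer algebra system can generate the gradient and Hessian in (27.5) automatically. The short Python sketch below (assuming SymPy; the function is an illustrative choice, not the one in the project activities) does so and confirms that the Hessian is symmetric.

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + x*y + y**2          # an illustrative function, not the project's f

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])   # the gradient of f
H = sp.hessian(f, (x, y))                          # the Hessian of f
print(grad.T)                  # Matrix([[2*x + y, x + 2*y]])
print(H)                       # Matrix([[2, 1], [1, 2]]), symmetric since f_xy = f_yx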

Project Activity 27.7.

Use Equation (27.5) to compute \(P_2(x,y)\) for \(f(x,y)=x^4+y^4-4xy+1\) at \((a, b)=(2,3)\text{.}\)

The important idea for us is that if \((a, b)\) is a point at which \(f_x\) and \(f_y\) are zero, then \(\nabla f\) is the zero vector and Equation (27.5) reduces to

\begin{equation} P_2(x,y) = f(a,b) + \frac{1}{2}\left[ \begin{array}{c} x-a\\y-b \end{array} \right]^{\tr} \left[ \begin{array}{cc} f_{xx}(a,b)\amp f_{xy}(a,b) \\ f_{xy}(a,b)\amp f_{yy}(a,b) \end{array} \right] \left[ \begin{array}{c} x-a\\y-b \end{array} \right]\text{.}\tag{27.6} \end{equation}

To make the connection between the multivariable second derivative test and properties of the Hessian, \(H(a,b)\text{,}\) at a critical point of a function \(f\) at which \(\nabla f = \vzero\text{,}\) we will need to connect the eigenvalues of a matrix to the determinant and the trace.

Let \(A\) be an \(n \times n\) matrix with eigenvalues \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_n\) (not necessarily distinct). Exercise 1 in Section 18 shows that

\begin{equation} \det(A) = \lambda_1 \lambda_2 \cdots \lambda_n\text{.}\tag{27.7} \end{equation}

In other words, the determinant of a matrix is equal to the product of the eigenvalues of the matrix. In addition, Exercise 9 in Section 19 shows that

\begin{equation} \trace(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n\text{.}\tag{27.8} \end{equation}

for a diagonalizable matrix, where \(\trace(A)\) is the sum of the diagonal entries of \(A\text{.}\) Equation (27.8) is true for any square matrix, but we don't need the more general result for this project.
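
Both facts are easy to confirm numerically for a particular matrix. The following Python sketch (assuming NumPy; the matrix is an illustrative choice) compares the product and sum of the eigenvalues with the determinant and trace.

import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # an illustrative symmetric matrix
eigenvalues = np.linalg.eigvalsh(H)      # array([1., 3.])

print(np.isclose(eigenvalues.prod(), np.linalg.det(H)))   # True, equation (27.7)
print(np.isclose(eigenvalues.sum(), np.trace(H)))         # True, equation (27.8)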

The fact that the Hessian is a symmetric matrix makes it orthogonally diagonalizable. We denote the eigenvalues of \(H(a,b)\) as \(\lambda_1\) and \(\lambda_2\text{.}\) Thus there exists an orthogonal matrix \(P\) and a diagonal matrix \(D = \left[ \begin{array}{cc} \lambda_1\amp 0 \\ 0\amp \lambda_2 \end{array} \right]\) such that \(P^{\tr}H(a,b)P=D\text{,}\) or \(H(a,b) = PDP^{\tr}\text{.}\) Equations (27.7) and (27.8) show that

\begin{equation*} \lambda_1\lambda_2 = f_{xx}(a,b)f_{yy}(a,b)-f_{xy}(a,b)^2 \ \text{ and } \ \lambda_1 + \lambda_2 = f_{xx}(a,b) + f_{yy}(a,b)\text{.} \end{equation*}

Now we have the machinery to verify the Second Derivative Test for Two-Variable Functions. We assume \((a,b)\) is a point in the domain of a function \(f\) so that \(\nabla f(a,b) = \vzero\text{.}\) First we consider the case where \(f_{xx}(a,b)f_{yy}(a,b)-f_{xy}(a,b)^2\lt 0\text{.}\)

Project Activity 27.8.

Explain why if \(f_{xx}(a,b)f_{yy}(a,b)-f_{xy}(a,b)^2\lt 0\text{,}\) then

\begin{equation*} \left[ \begin{array}{c} x-a \\ y-b \end{array} \right]^{\tr} H(a,b) \left[ \begin{array}{c} x-a \\ y-b \end{array} \right] \end{equation*}

is indefinite. Explain why this implies that \(f\) is “saddle-shaped” near \((a,b)\text{.}\)

Hint.

Substitute \(\vw = \left[ \begin{array}{c} w_1\\w_2 \end{array} \right] = P^{\tr}\left[ \begin{array}{c} x-a \\ y-b \end{array} \right]\text{.}\) What does the graph of \(f\) look like in the \(w_1\) and \(w_2\) directions?

Now we examine the situation when \(f_{xx}(a,b)f_{yy}(a,b)-f_{xy}(a,b)^2>0\text{.}\)

Project Activity 27.9.

Assume that \(f_{xx}(a,b)f_{yy}(a,b)-f_{xy}(a,b)^2>0\text{.}\)

(a)

Explain why either both \(f_{xx}(a,b)\) and \(f_{yy}(a,b)\) are positive or both are negative.

(b)

If \(f_{xx}(a,b)>0\) and \(f_{yy}(a,b)>0\text{,}\) explain why \(\lambda_1\) and \(\lambda_2\) must be positive.

(c)

Explain why, if \(f_{xx}(a,b)>0\) and \(f_{yy}(a,b)>0\text{,}\) then \(f(a,b)\) is a local minimum value for \(f\text{.}\)

When \(f_{xx}(a,b)f_{yy}(a,b)-f_{xy}(a,b)^2>0\) and either \(f_{xx}(a,b)\) or \(f_{yy}(a,b)\) is negative, a slight modification of the preceding argument leads to the fact that \(f\) has a local maximum at \((a,b)\) (the details are left to the reader). Therefore, we have proved the Second Derivative Test for functions of two variables!
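
If a computer algebra system is available, the test we have just verified can be automated. The Python sketch below (assuming SymPy; the function is an illustrative choice, not the one in the next activity) finds the critical points of a function, evaluates the Hessian at each one, and applies the Second Derivative Test.

import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x + y**2          # an illustrative function, not the project's f

critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
H = sp.hessian(f, (x, y))

for pt in critical_points:
    Hpt = H.subs(pt)
    d = Hpt.det()                          # f_xx f_yy - f_xy^2 at the critical point
    if d < 0:
        label = "saddle point"
    elif d > 0 and Hpt[0, 0] > 0:
        label = "local minimum"
    elif d > 0 and Hpt[0, 0] < 0:
        label = "local maximum"
    else:
        label = "test inconclusive"
    print(pt, label)           # {x: -1, y: 0} saddle point; {x: 1, y: 0} local minimum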

Project Activity 27.10.

Use the Hessian to classify the local maxima, minima, and saddle points of \(f(x,y)=x^4+y^4-4xy+1\text{.}\) Draw a graph of \(f\) to illustrate.

Many thanks to Professor Paul Fishback for sharing his activity on this topic. Much of this project comes from his activity.
Note that under reasonable conditions (e.g., that \(f\) has continuous second order mixed partial derivatives in some open neighborhood containing \((x,y)\)) we have that \(f_{xy}(x,y) = f_{yx}(x,y)\) and \(H(x,y) = \left[ \begin{array}{cc} f_{xx}(x,y)\amp f_{xy}(x,y) \\ f_{xy}(x,y)\amp f_{yy}(x,y) \end{array} \right]\) is a symmetric matrix. We will only consider functions that satisfy these reasonable conditions.