
Section 18 The Characteristic Equation

Subsection Application: Modeling the Second Law of Thermodynamics

Pour cream into your cup of coffee and the cream spreads out; straighten up your room and it soon becomes messy again; when gasoline is mixed with air in a car's cylinders, it explodes if a spark is introduced. In each of these cases a transition from a low entropy state (your room is straightened up) to a higher entropy state (a messy, disorganized room) occurs. This can be described by entropy, a measure of the disorder of a system. Low entropy states are ordered (like ice cubes) and high entropy states are disordered (like water vapor). It is a fundamental property of nature (as described by the second law of thermodynamics) that the entropy of a system cannot decrease. In other words, in the absence of any external intervention, things never become more organized.

The Ehrenfest model (named after Paul and Tatiana Ehrenfest, who introduced it in “Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem,” Physikalische Zeitschrift, vol. 8 (1907), pp. 311-314) is a Markov process proposed to explain the statistical interpretation of the second law of thermodynamics using the diffusion of gas molecules. This process can be modeled as a problem of balls and bins, as we will do later in this section. The characteristic polynomial of the transition matrix will help us find the eigenvalues and allow us to analyze our model.

Subsection Introduction

We have seen that the eigenvalues of an \(n \times n\) matrix \(A\) are the scalars \(\lambda\) so that \(A - \lambda I_n\) has a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not invertible, we can also say that \(\lambda\) is an eigenvalue of \(A\) if

\begin{equation} \det(A - \lambda I_n) = 0\text{.}\tag{18.1} \end{equation}

This equation is called the characteristic equation of \(A\text{.}\) It provides us an algebraic way to find eigenvalues, which can then be used in finding eigenvectors corresponding to each eigenvalue. Suppose we want to find the eigenvalues of \(A=\left[ \begin{array}{cc} 1 \amp 1 \\ 1\amp 3 \end{array} \right]\text{.}\) Note that

\begin{equation*} A- \lambda I_2 = \left[ \begin{array}{cc} 1-\lambda \amp 1 \\ 1\amp 3-\lambda \end{array} \right]\,\text{,} \end{equation*}

with determinant \((1-\lambda)(3-\lambda)-1=\lambda^2-4\lambda+2\text{.}\) Hence, the eigenvalues \(\lambda_1, \lambda_2\) are the solutions of the characteristic equation \(\lambda^2-4\lambda+2=0\text{.}\) Using the quadratic formula, we find that \(\lambda_1=2+\sqrt{2}\) and \(\lambda_2=2-\sqrt{2}\) are the eigenvalues.
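Although the text does not prescribe any technology, a quick numerical check of this computation is easy in Python with numpy (a sketch of ours; numpy.linalg.eigvals computes eigenvalues numerically rather than via the characteristic equation, but the results agree):

import numpy as np

# The matrix from the example above.
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])

# Eigenvalues computed numerically by numpy.
eigenvalues = np.sort(np.linalg.eigvals(A))

# The roots of lambda^2 - 4*lambda + 2 = 0 from the quadratic formula.
expected = np.sort([2 - np.sqrt(2), 2 + np.sqrt(2)])

print(eigenvalues)                         # [0.5857... 3.4142...]
print(np.allclose(eigenvalues, expected))  # True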

In this activity, our goal will be to use the characteristic equation to obtain information about eigenvalues and eigenvectors of a matrix with real entries.

Preview Activity 18.1.

(a)

For each of the following parts, use the characteristic equation to determine the eigenvalues of \(A\text{.}\) Then, for each eigenvalue \(\lambda\text{,}\) find a basis of the corresponding eigenspace, i.e., \(\Nul(A-\lambda I)\text{.}\) You might want to recall how to find a basis for the null space of a matrix from Section 13. Also, make sure that each eigenvalue candidate \(\lambda\) yields nonzero eigenvectors in \(\Nul(A-\lambda I)\text{;}\) otherwise \(\lambda\) is not an eigenvalue.

(i)

\(A=\left[ \begin{array}{cr} 2 \amp 0 \\ 0 \amp -3 \end{array} \right]\)

(ii)

\(A=\left[ \begin{array}{cc} 1 \amp 2 \\ 0 \amp 1 \end{array} \right]\)

(iii)

\(A=\left[ \begin{array}{cc} 1 \amp 4 \\2 \amp 3 \end{array} \right]\)

(b)

Use your eigenvalue and eigenvector calculations from the above problem as guidance to answer the following questions about a matrix with real entries.

(i)

At most how many eigenvalues can a \(2\times 2\) matrix have? Is it possible to have no eigenvalues? Is it possible to have only one eigenvalue? Explain.

(ii)

If a matrix is an upper-triangular matrix (i.e., all entries below the diagonal are 0's, as in the first two matrices of the previous problem), what can you say about its eigenvalues? Explain.

(iii)

How many linearly independent eigenvectors can be found for a \(2\times 2\) matrix? Is it possible to have a matrix without 2 linearly independent eigenvectors? Explain.

(c)

Using the characteristic equation, determine which matrices have \(0\) as an eigenvalue. What does having \(0\) as an eigenvalue tell us about such a matrix?

Subsection The Characteristic Equation

Until now, we have been given eigenvalues or eigenvectors of a matrix and have determined eigenvectors and eigenvalues from the known information. In this section we use determinants to find (or approximate) the eigenvalues of a matrix; from there we can find (or approximate) the corresponding eigenvectors. The tool we will use is the characteristic equation of a square matrix, a polynomial equation whose roots are the eigenvalues of the matrix. It gives us an algebraic way of finding the eigenvalues of a square matrix.

We have seen that the eigenvalues of a square matrix \(A\) are the scalars \(\lambda\) so that \(A - \lambda I\) has a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not invertible, we can also say that \(\lambda\) is an eigenvalue of \(A\) if

\begin{equation} \det(A - \lambda I) = 0\text{.}\tag{18.2} \end{equation}

Note that if \(A\) is an \(n \times n\) matrix, then \(\det(A - \lambda I)\) is a polynomial of degree \(n\text{.}\) Furthermore, if \(A\) has real entries, the polynomial has real coefficients. This polynomial and equation (18.2) are given special names.

Definition 18.1.

Let \(A\) be an \(n \times n\) matrix. The characteristic polynomial of \(A\) is the polynomial

\begin{equation*} \det(A-\lambda I_n)\text{,} \end{equation*}

where \(I_n\) is the \(n \times n\) identity matrix. The characteristic equation of \(A\) is the equation

\begin{equation*} \det(A-\lambda I_n) = 0\text{.} \end{equation*}

So the characteristic equation of \(A\) gives us an algebraic way of finding the eigenvalues of \(A\text{.}\)
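As a brief illustration of Definition 18.1 (ours, not part of the text's development), the characteristic polynomial can be computed symbolically with Python's sympy library. Note that sympy's charpoly returns \(\det(\lambda I_n - A)\text{,}\) which differs from \(\det(A - \lambda I_n)\) only by a factor of \((-1)^n\) and so has the same roots:

import sympy as sp

lam = sp.symbols('lambda')

# The 2x2 matrix from the introduction.
A = sp.Matrix([[1, 1],
               [1, 3]])

# charpoly computes det(lambda*I - A); same roots as det(A - lambda*I).
p = A.charpoly(lam).as_expr()
print(p)                            # lambda**2 - 4*lambda + 2
print(sp.solve(sp.Eq(p, 0), lam))   # [2 - sqrt(2), 2 + sqrt(2)]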

Activity 18.2.

(a)

Find the characteristic polynomial of the matrix \(A = \left[ \begin{array}{crc} 3\amp -2\amp 5 \\ 1\amp 0\amp 7 \\ 0\amp 0\amp 1 \end{array} \right]\text{,}\) and use the characteristic polynomial to find all of the eigenvalues of \(A\text{.}\)

(b)

Verify that 1 and 2 are the only eigenvalues of the matrix \(\left[ \begin{array}{cccc} 1\amp 0\amp 0\amp 1\\ 1\amp 2\amp 0\amp 0 \\ 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 1 \end{array} \right]\text{.}\)

As we argued in Preview Activity 18.1, a \(2 \times 2\) matrix can have at most 2 eigenvalues. For an \(n \times n\) matrix, the characteristic polynomial will be a degree \(n\) polynomial, and we know from algebra that a degree \(n\) polynomial can have at most \(n\) roots. Since an eigenvalue of a matrix is a root of the characteristic polynomial of that matrix, we can conclude that an \(n \times n\) matrix can have at most \(n\) distinct eigenvalues. Activity 18.2 (b) shows that a \(4 \times 4\) matrix may have fewer than \(4\) eigenvalues, however. Note that one of these eigenvalues, the eigenvalue 1, appears three times as a root of the characteristic polynomial of the matrix. The number of times an eigenvalue appears as a root of the characteristic polynomial is called the (algebraic) multiplicity of the eigenvalue. More formally:

Definition 18.2.

The (algebraic) multiplicity of an eigenvalue \(\lambda\) of a matrix \(A\) is the largest integer \(m\) so that \((x-\lambda)^m\) divides the characteristic polynomial of \(A\text{.}\)

Thus, in Activity 18.2 (b) the eigenvalue 1 has multiplicity 3 and the eigenvalue 2 has multiplicity 1. Notice that if we count the eigenvalues of an \(n \times n\) matrix with their multiplicities, allowing complex eigenvalues, the total will always be \(n\text{.}\)
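To see Definition 18.2 in action, here is a short sympy sketch (ours) for the matrix of Activity 18.2 (b); the exponent of each linear factor of the characteristic polynomial is the algebraic multiplicity of the corresponding eigenvalue:

import sympy as sp

lam = sp.symbols('lambda')

A = sp.Matrix([[1, 0, 0, 1],
               [1, 2, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])

# Factored characteristic polynomial; the exponents are the
# algebraic multiplicities.
print(sp.factor(A.charpoly(lam).as_expr()))
# (lambda - 2)*(lambda - 1)**3   (ordering of factors may vary)

# eigenvals returns {eigenvalue: algebraic multiplicity}.
print(A.eigenvals())   # {2: 1, 1: 3}   (ordering may vary)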

If \(A\) is a matrix with real entries, then the characteristic polynomial will have real coefficients. The Fundamental Theorem of Algebra tells us that this polynomial has exactly \(n\) roots counted with multiplicity, but some of those roots may be complex, so \(A\) may have complex eigenvalues. Since the characteristic polynomial has real coefficients, any complex roots occur in conjugate pairs, i.e., if \(\lambda_1=a+ib\) is an eigenvalue of \(A\text{,}\) then \(\lambda_2=a-ib\) is another eigenvalue of \(A\text{.}\) Furthermore, if \(n\) is odd, then since the complex eigenvalues come in conjugate pairs, \(A\) must have at least one real eigenvalue.
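For example (an illustration of ours), a \(90^\circ\) rotation of the plane has characteristic polynomial \(\lambda^2 + 1\text{,}\) so its eigenvalues are the conjugate pair \(\pm i\text{:}\)

import numpy as np

# A 90-degree rotation: no real eigenvectors, so the eigenvalues
# of this real matrix must be complex, and they come as a
# conjugate pair.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

print(np.linalg.eigvals(R))   # [0.+1.j 0.-1.j]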

We now summarize the information we have so far about the eigenvalues of an \(n\times n\) real matrix \(A\text{:}\)

  • The eigenvalues of \(A\) are the roots of the characteristic polynomial \(\det(A - \lambda I_n)\text{,}\) a degree \(n\) polynomial with real coefficients.

  • \(A\) has at most \(n\) distinct eigenvalues. Counted with their multiplicities, and allowing complex eigenvalues, \(A\) has exactly \(n\) eigenvalues.

  • Any complex eigenvalues of \(A\) occur in conjugate pairs, so if \(n\) is odd, then \(A\) has at least one real eigenvalue.

Subsection Eigenspaces, A Geometric Example

Recall that for each eigenvalue \(\lambda\) of an \(n \times n\) matrix \(A\text{,}\) the eigenspace of \(A\) corresponding to the eigenvalue \(\lambda\) is \(\Nul (A - \lambda I_n)\text{.}\) These eigenspaces can tell us important information about the matrix transformation defined by \(A\text{.}\) For example, consider the matrix transformation \(T\) from \(\R^3\) to \(\R^3\) defined by \(T(\vx) = A \vx\text{,}\) where

\begin{equation*} A = \left[ \begin{array}{ccc} 1\amp 0\amp 1\\0\amp 1\amp 1\\0\amp 0\amp 2 \end{array} \right]\text{.} \end{equation*}

We are interested in understanding what this matrix transformation does to vectors in \(\R^3\text{.}\) First we note that \(A\) has eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 2\text{,}\) with \(\lambda_1\) having multiplicity \(2\text{.}\) There is a pair \(\vv_1 = \left[ \begin{array}{c} 1\\0\\0 \end{array} \right]\) and \(\vv_2 = \left[ \begin{array}{c} 0\\1\\0 \end{array} \right]\) of linearly independent eigenvectors for \(A\) corresponding to the eigenvalue \(\lambda_1\) and an eigenvector \(\vv_3=\left[ \begin{array}{c} 1\\1\\1 \end{array} \right]\) for \(A\) corresponding to the eigenvalue \(\lambda_2\text{.}\) Note that the vectors \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\) are linearly independent (recall that eigenvectors corresponding to different eigenvalues are always linearly independent). So any vector \(\vb\) in \(\R^3\) can be written uniquely as a linear combination of \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\text{.}\) Let's now consider the action of the matrix transformation \(T\) on a linear combination of \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\text{.}\) Note that

\begin{align} T(c_1\vv_1 + c_2 \vv_2 + c_3 \vv_3) \amp = c_1T(\vv_1) + c_2T(\vv_2) + c_3 T(\vv_3)\notag\\ \amp = c_1 \lambda_1 \vv_1 + c_2 \lambda_1 \vv_2 + c_3 \lambda_2 \vv_3\notag\\ \amp = (1)(c_1\vv_1 + c_2 \vv_2) + (2)c_3 \vv_3\text{.}\tag{18.3} \end{align}

Equation (18.3) illustrates that it is most convenient to view the action of \(T\) in the coordinate system where \(\Span \{\vv_1\}\) serves as the \(x\)-axis, \(\Span \{\vv_2\}\) serves as the \(y\)-axis, and \(\Span \{\vv_3\}\) as the \(z\)-axis. In this case, we can visualize that when we apply the transformation \(T\) to a vector \(\vb = c_1 \vv_1 + c_2 \vv_2 + c_3 \vv_3\) in \(\R^3\) the result is an output vector that is unchanged in the \(\vv_1\)-\(\vv_2\) plane and scaled by a factor of \(2\) in the \(\vv_3\) direction. For example, consider the box whose sides are determined by the vectors \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\) as shown in Figure 18.4. The transformation \(T\) stretches this box by a factor of \(2\) in the \(\vv_3\) direction and leaves everything else alone, as illustrated in Figure 18.4. So the entire plane \(\Span \{\vv_1, \vv_2\}\) is unchanged by \(T\text{,}\) but the line \(\Span \{\vv_3\}\) is scaled by \(2\text{.}\) In this situation, the eigenvalues and eigenvectors provide the most convenient perspective through which to visualize the action of the transformation \(T\text{.}\)

Figure 18.4. A box and a transformed box.
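A quick numerical sketch (ours, with arbitrarily chosen coefficients \(c_1\text{,}\) \(c_2\text{,}\) \(c_3\)) confirms the coordinate-system description in Equation (18.3):

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])

v1 = np.array([1.0, 0.0, 0.0])   # eigenvalue 1
v2 = np.array([0.0, 1.0, 0.0])   # eigenvalue 1
v3 = np.array([1.0, 1.0, 1.0])   # eigenvalue 2

c1, c2, c3 = 3.0, -2.0, 5.0      # arbitrary coefficients
b = c1 * v1 + c2 * v2 + c3 * v3

# A*b leaves the v1 and v2 components alone and doubles the
# v3 component, exactly as in Equation (18.3).
print(np.allclose(A @ b, c1 * v1 + c2 * v2 + 2 * c3 * v3))   # True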

This geometric perspective illustrates how each eigenvalue and the corresponding eigenspace of \(A\) tells us something important about \(A\text{.}\) So it behooves us to learn a little more about eigenspaces.

Subsection Dimensions of Eigenspaces

There is a connection between the dimension of the eigenspace of a matrix corresponding to an eigenvalue and the multiplicity of that eigenvalue as a root of the characteristic polynomial. Recall that the dimension of a subspace of \(\R^n\) is the number of vectors in a basis for the subspace. We investigate the connection between dimension and multiplicity in the next activity.

Activity 18.3.

(a)

Find the dimension of the eigenspace for each eigenvalue of matrix \(A = \left[ \begin{array}{crc} 3\amp -2\amp 5 \\ 1\amp 0\amp 7 \\ 0\amp 0\amp 1 \end{array} \right]\) from Activity 18.2 (a).

(b)

Find the dimension of the eigenspace for each eigenvalue of matrix \(A=\left[ \begin{array}{cccc} 1\amp 0\amp 0\amp 1\\ 1\amp 2\amp 0\amp 0 \\ 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 1 \end{array} \right]\) from Activity 18.2 (b).

(c)

Consider now a \(3\times 3\) matrix with 3 distinct eigenvalues \(\lambda_1, \lambda_2, \lambda_3\text{.}\)

(i)

Recall that a polynomial of degree three can have at most three distinct roots. What does that say about the multiplicities of \(\lambda_1, \lambda_2, \lambda_3\text{?}\)

(ii)

Use the fact that eigenvectors corresponding to distinct eigenvalues are linearly independent to find the dimensions of the eigenspaces for \(\lambda_1, \lambda_2, \lambda_3\text{.}\)

The examples in Activity 18.3 all provide instances of the principle that the dimension of an eigenspace corresponding to an eigenvalue \(\lambda\) cannot exceed the multiplicity of \(\lambda\text{.}\) Specifically, if \(\lambda\) is an eigenvalue of \(A\) with algebraic multiplicity \(m\text{,}\) and \(E_{\lambda}\) denotes the eigenspace of \(A\) corresponding to \(\lambda\text{,}\) then \(1 \leq \dim(E_{\lambda}) \leq m\text{.}\)

The examples we have seen raise another important point. The matrix \(A = \left[ \begin{array}{ccc} 1\amp 0\amp 1\\0\amp 1\amp 1\\0\amp 0\amp 2 \end{array} \right]\) from our geometric example has two eigenvalues \(1\) and \(2\text{,}\) with the eigenvalue 1 having multiplicity 2. If we let \(E_{\lambda}\) represent the eigenspace of \(A\) corresponding to the eigenvalue \(\lambda\text{,}\) then \(\dim(E_1)=2\) and \(\dim(E_2) = 1\text{.}\) If we change this matrix slightly to the matrix \(B = \left[ \begin{array}{crc} 2\amp 0\amp 1 \\ 0\amp 1\amp 1 \\ 0\amp 0\amp 1 \end{array} \right]\) we see that \(B\) also has two eigenvalues \(1\) and \(2\text{,}\) with the eigenvalue 1 having multiplicity 2. However, in this case we have \(\dim(E_1) = 1\) (like the example from Activity 18.2 (a) and Activity 18.3 (a)). In this case the vector \(\vv_1 = [1 \ 0 \ 0]^{\tr}\) forms a basis for \(E_2\) and the vector \(\vv_2 = [0 \ 1 \ 0]^{\tr}\) forms a basis for \(E_1\text{.}\) We can visualize the action of \(B\) on the square formed by \(\vv_1\) and \(\vv_2\) in the \(xy\)-plane as a scaling by 2 in the \(\vv_1\) direction as shown in Figure 18.6, but since we do not have a third linearly independent eigenvector, the action of \(B\) in the direction of \([0 \ 0 \ 1]^{\tr}\) is not so clear.

Figure 18.6. A box and a transformed box.

So the action of a matrix transformation can be more easily visualized if the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue. This geometric perspective leads us to define the geometric multiplicity of an eigenvalue.

Definition 18.7.

The geometric multiplicity of an eigenvalue of an \(n \times n\) matrix \(A\) is the dimension of the corresponding eigenspace \(\Nul (A-\lambda I_n)\text{.}\)
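Here is one way (a sketch of ours) to compute both multiplicities with sympy for the matrix \(B\) discussed above: the algebraic multiplicities come from eigenvals, and the geometric multiplicity of \(\lambda\) is the number of vectors in a basis for \(\Nul(B - \lambda I_3)\text{:}\)

import sympy as sp

# The matrix B from the discussion above.
B = sp.Matrix([[2, 0, 1],
               [0, 1, 1],
               [0, 0, 1]])

# Algebraic multiplicities: {eigenvalue: multiplicity}.
print(B.eigenvals())   # {2: 1, 1: 2}   (ordering may vary)

# Geometric multiplicity = dim Nul(B - lambda*I) = number of
# basis vectors returned by nullspace().
for lam in (1, 2):
    basis = (B - lam * sp.eye(3)).nullspace()
    print(lam, len(basis))
# 1 1   <-- geometric multiplicity 1 < algebraic multiplicity 2
# 2 1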

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 18.8.

Let \(A = \left[ \begin{array}{rcr} -1\amp 0\amp -2 \\ 2\amp 1\amp 2 \\ 0\amp 0\amp 1 \end{array} \right]\text{.}\)

(a)

Find the characteristic polynomial of \(A\text{.}\)

Solution.

The characteristic polynomial of \(A\) is

\begin{align*} p(\lambda) \amp = \det(A - \lambda I_3)\\ \amp = \det\left( \left[ \begin{array}{ccc} -1-\lambda\amp 0\amp -2\\ 2\amp 1-\lambda\amp 2\\ 0\amp 0\amp 1-\lambda \end{array}\right] \right)\\ \amp = (-1-\lambda)(1-\lambda)(1-\lambda)\text{.} \end{align*}
(b)

Factor the characteristic polynomial and find the eigenvalues of \(A\text{.}\)

Solution.

The eigenvalues of \(A\) are the solutions to the characteristic equation. Since

\begin{equation*} p(\lambda) = (-1-\lambda)(1-\lambda)(1-\lambda) = 0 \end{equation*}

implies \(\lambda = -1\) or \(\lambda = 1\text{,}\) the eigenvalues of \(A\) are \(1\) and \(-1\text{.}\)

(c)

Find a basis for each eigenspace of \(A\text{.}\)

Solution.

To find a basis for the eigenspace of \(A\) corresponding to the eigenvalue \(1\text{,}\) we find a basis for \(\Nul (A-I_3)\text{.}\) The reduced row echelon form of \(A - I_ 3 = \left[ \begin{array}{rcr} -2\amp 0\amp -2 \\ 2\amp 0\amp 2 \\ 0\amp 0\amp 0 \end{array} \right]\) is \(\left[ \begin{array}{ccc} 1\amp 0\amp 1 \\ 0\amp 0\amp 0 \\ 0\amp 0\amp 0 \end{array} \right]\text{.}\) If \(\vx = \left[ \begin{array}{c} x_1\\x_2\\x_3 \end{array} \right]\text{,}\) then \((A-I_3)\vx = \vzero\) has general solution

\begin{equation*} \vx = \left[ \begin{array}{c} x_1\\x_2\\x_3 \end{array} \right] = \left[ \begin{array}{r} -x_3\\x_2\\x_3 \end{array} \right] = x_2 \left[ \begin{array}{c} 0\\1\\0 \end{array} \right] + x_3\left[ \begin{array}{r} -1\\0\\1 \end{array} \right]\text{.} \end{equation*}

Therefore, \(\{[0 \ 1 \ 0]^{\tr}, [-1 \ 0 \ 1]^{\tr}\}\) is a basis for the eigenspace of \(A\) corresponding to the eigenvalue \(1\text{.}\) To find a basis for the eigenspace of \(A\) corresponding to the eigenvalue \(-1\text{,}\) we find a basis for \(\Nul (A+I_3)\text{.}\) The reduced row echelon form of \(A + I_ 3 = \left[ \begin{array}{ccr} 0\amp 0\amp -2 \\ 2\amp 2\amp 2 \\ 0\amp 0\amp 2 \end{array} \right]\) is \(\left[ \begin{array}{ccc} 1\amp 1\amp 0 \\ 0\amp 0\amp 1 \\ 0\amp 0\amp 0 \end{array} \right]\text{.}\) If \(\vx = \left[ \begin{array}{c} x_1\\x_2\\x_3 \end{array} \right]\text{,}\) then \((A+I_3)\vx = \vzero\) has general solution

\begin{equation*} \vx = \left[ \begin{array}{c} x_1\\x_2\\x_3 \end{array} \right] = \left[ \begin{array}{r} -x_2\\x_2\\0 \end{array} \right] = x_2 \left[ \begin{array}{r} -1\\1\\0 \end{array} \right]\text{.} \end{equation*}

Therefore, a basis for the eigenspace of \(A\) corresponding to the eigenvalue \(-1\) is \(\{[-1 \ 1 \ 0]^{\tr}\}\text{.}\)

(d)

Is it possible to find a basis for \(\R^3\) consisting of eigenvectors of \(A\text{?}\) Explain.

Solution.

Let \(\vv_1 = [0 \ 1 \ 0]^{\tr}\text{,}\) \(\vv_2 = [-1 \ 0 \ 1]^{\tr}\text{,}\) and \(\vv_3 = [-1 \ 1 \ 0]^{\tr}\text{.}\) Since eigenvectors corresponding to different eigenvalues are linearly independent, and since neither \(\vv_1\) nor \(\vv_2\) is a scalar multiple of the other, we can conclude that the set \(\{\vv_1, \vv_2, \vv_3\}\) is a linearly independent set with \(3 = \dim(\R^3)\) vectors. Therefore, \(\{\vv_1, \vv_2, \vv_3\}\) is a basis for \(\R^3\) consisting of eigenvectors of \(A\text{.}\)
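A numerical check of parts (c) and (d) (our sketch):

import numpy as np

A = np.array([[-1.0, 0.0, -2.0],
              [ 2.0, 1.0,  2.0],
              [ 0.0, 0.0,  1.0]])

# The basis eigenvectors found above, paired with their eigenvalues.
pairs = [(np.array([ 0.0, 1.0, 0.0]),  1.0),
         (np.array([-1.0, 0.0, 1.0]),  1.0),
         (np.array([-1.0, 1.0, 0.0]), -1.0)]

# Check A v = lambda v for each pair.
for v, lam in pairs:
    print(np.allclose(A @ v, lam * v))   # True, True, True

# The three vectors form a basis of R^3 exactly when the matrix
# with these columns is invertible (nonzero determinant).
P = np.column_stack([v for v, _ in pairs])
print(np.linalg.det(P))   # approximately -1 (nonzero)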

Example 18.9.

Find a \(3 \times 3\) matrix \(A\) that has an eigenvector \(\vv_1 = [1 \ 0 \ 1]^{\tr}\) with corresponding eigenvalue \(\lambda_1 = 2\text{,}\) an eigenvector \(\vv_2 = [0 \ 2 \ -3]^{\tr}\) with corresponding eigenvalue \(\lambda_2 = -3\text{,}\) and an eigenvector \(\vv_3 = [-4 \ 0 \ 5]^{\tr}\) with corresponding eigenvalue \(\lambda_3 = 5\text{.}\) Explain your process.

Solution.

We are looking for a \(3 \times 3\) matrix \(A\) such that \(A \vv_1 = 2 \vv_1\text{,}\) \(A \vv_2 = -3 \vv_2\) and \(A \vv_3 = 5 \vv_3\text{.}\) Since \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\) are eigenvectors corresponding to different eigenvalues, \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\) are linearly independent. So the matrix \([\vv_1 \ \vv_2 \ \vv_3]\) is invertible. It follows that

\begin{align*} A[\vv_1 \ \vv_2 \ \vv_3] \amp = [A\vv_1 \ A\vv_2 \ A\vv_3]\\ A \left[\begin{array}{rcr} 1\amp 0\amp -4\\ 0\amp 2\amp 0\\ 1\amp -3\amp 5 \end{array} \right] \amp = [2\vv_1 \ -3\vv_2 \ 5\vv_3]\\ A \left[\begin{array}{crr} 1\amp 0\amp -4\\ 0\amp 2\amp 0\\ 1\amp -3\amp 5 \end{array} \right] \amp = \left[ \begin{array}{crr} 2\amp 0\amp -20\\ 0\amp -6\amp 0\\ 2\amp 9\amp 25 \end{array} \right]\\ A \amp = \left[ \begin{array}{crr} 2\amp 0\amp -20\\ 0\amp -6\amp 0\\ 2\amp 9\amp 25 \end{array} \right] \left[\begin{array}{crr} 1\amp 0\amp -4\\ 0\amp 2\amp 0\\ 1\amp -3\amp 5 \end{array} \right]^{-1}\\ A \amp = \left[ \begin{array}{crr} 2\amp 0\amp -20\\ 0\amp -6\amp 0\\ 2\amp 9\amp 25 \end{array} \right] \left[ \begin{array}{rcc} \frac{5}{9}\amp \frac{2}{3}\amp \frac{4}{9}\\ 0\amp \frac{1}{2}\amp 0\\ -\frac{1}{9}\amp \frac{1}{6}\amp \frac{1}{9} \end{array} \right]\\ A \amp = \left[ \begin{array}{rrr} \frac{10}{3}\amp -2\amp -\frac{4}{3}\\ 0\amp -3\amp 0\\ -\frac{5}{3}\amp 10\amp \frac{11}{3} \end{array} \right]\text{.} \end{align*}
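The construction above can be reproduced numerically: with \(P = [\vv_1 \ \vv_2 \ \vv_3]\) and \(D\) the diagonal matrix of the eigenvalues, the equation \(AP = PD\) gives \(A = PDP^{-1}\text{.}\) A sketch of ours with numpy:

import numpy as np

# Columns of P are the required eigenvectors; D holds the eigenvalues.
P = np.array([[1.0,  0.0, -4.0],
              [0.0,  2.0,  0.0],
              [1.0, -3.0,  5.0]])
D = np.diag([2.0, -3.0, 5.0])

# A P = P D, so A = P D P^{-1}.
A = P @ D @ np.linalg.inv(P)
print(np.round(A, 4))
# [[ 3.3333 -2.     -1.3333]
#  [ 0.     -3.      0.    ]
#  [-1.6667 10.      3.6667]]

# Verify each eigenpair A v = lambda v.
for v, lam in zip(P.T, np.diag(D)):
    print(np.allclose(A @ v, lam * v))   # True, True, True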

Subsection Summary

In this section we studied the characteristic polynomial of a matrix and how it can be used to find the eigenvalues of the matrix.

  • If \(A\) is an \(n \times n\) matrix, the characteristic polynomial of \(A\) is the polynomial

    \begin{equation*} \det(A-\lambda I_n)\text{,} \end{equation*}

    where \(I_n\) is the \(n \times n\) identity matrix.

  • If \(A\) is an \(n \times n\) matrix, the characteristic equation of \(A\) is the equation

    \begin{equation*} \det(A-\lambda I_n) = 0\text{.} \end{equation*}
  • The characteristic equation of a square matrix provides us an algebraic method to find the eigenvalues of the matrix.

  • The eigenvalues of an upper or lower-triangular matrix are the entries on the diagonal.

  • There are at most \(n\) eigenvalues of an \(n\times n\) matrix.

  • For a real matrix \(A\text{,}\) if an eigenvalue \(\lambda\) of \(A\) is complex, then the complex conjugate of \(\lambda\) is also an eigenvalue.

  • The algebraic multiplicity of an eigenvalue \(\lambda\) is the multiplicity of \(\lambda\) as a root of the characteristic equation.

  • The dimension of the eigenspace corresponding to an eigenvalue \(\lambda\) is less than or equal to the algebraic multiplicity of \(\lambda\text{.}\)

Exercises Exercises

1.

There is a useful relationship between the determinant and eigenvalues of a matrix \(A\) that we explore in this exercise.

(a)

Let \(B = \left[ \begin{array}{cc} 2\amp 3\\8\amp 4 \end{array} \right]\text{.}\) Find the determinant of \(B\) and the eigenvalues of \(B\text{,}\) and compare \(\det(B)\) to the eigenvalues of \(B\text{.}\)

(b)

Let \(A\) be an \(n \times n\) matrix. In this part of the exercise we argue the general case illustrated in the previous part — that \(\det(A)\) is the product of the eigenvalues of \(A\text{.}\) Let \(p(\lambda) = \det(A - \lambda I_n)\) be the characteristic polynomial of \(A\text{.}\)

(i)

Let \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_n\) be the eigenvalues of \(A\) (note that these eigenvalues may not all be distinct). Recall that if \(r\) is a root of a polynomial \(q(x)\text{,}\) then \((x-r)\) is a factor of \(q(x)\text{.}\) Use this idea to explain why

\begin{equation*} p(\lambda) = (-1)^{n} (\lambda-\lambda_1)(\lambda- \lambda_2) \cdots (\lambda - \lambda_n)\text{.} \end{equation*}
(ii)

Explain why \(p(0) = \lambda_1 \lambda_2 \cdots \lambda_n\text{.}\)

(iii)

Why is \(p(0)\) also equal to \(\det(A)\text{?}\) Explain how we have shown that \(\det(A)\) is the product of the eigenvalues of \(A\text{.}\)

2.

Find the eigenvalues of the following matrices. For each eigenvalue, determine its algebraic and geometric multiplicity.

(a)

\(A=\left[ \begin{array}{ccc} 1\amp 1\amp 1\\1\amp 1\amp 1\\1\amp 1\amp 1 \end{array} \right]\)

(b)

\(A=\left[ \begin{array}{ccc} 2\amp 0\amp 3\\0\amp 1\amp 0\\0\amp 1\amp 2 \end{array} \right]\)

3.

Let \(A\) be an \(n \times n\) matrix. Use the characteristic equation to explain why \(A\) and \(A^\tr\) have the same eigenvalues.

4.

Find three \(3 \times 3\) matrices whose eigenvalues are 2 and 3, and for which the dimensions of the eigenspaces for \(\lambda=2\) and \(\lambda=3\) are different.

5.

Suppose \(A\) is an \(n\times n\) matrix and \(B\) is an invertible \(n\times n\) matrix. Explain why the characteristic polynomial of \(A\) is the same as the characteristic polynomial of \(BAB^{-1}\text{,}\) and hence, as a result, the eigenvalues of \(A\) and \(BAB^{-1}\) are the same.

6.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

If the determinant of a \(2 \times 2\) matrix \(A\) is positive, then \(A\) has two distinct real eigenvalues.

(b) True/False.

If two \(2 \times 2\) matrices have the same eigenvalues, then they have the same eigenvectors.

(c) True/False.

The characteristic polynomial of an \(n \times n\) matrix has degree \(n\text{.}\)

(d) True/False.

If \(R\) is the reduced row echelon form of an \(n \times n\) matrix \(A\text{,}\) then \(A\) and \(R\) have the same eigenvalues.

(e) True/False.

If \(R\) is the reduced row echelon form of an \(n \times n\) matrix \(A\text{,}\) and \(\vv\) is an eigenvector of \(A\text{,}\) then \(\vv\) is an eigenvector of \(R\text{.}\)

(f) True/False.

Let \(A\) and \(B\) be \(n \times n\) matrices with characteristic polynomials \(p_A(\lambda)\) and \(p_B(\lambda)\text{,}\) respectively. If \(A \neq B\text{,}\) then \(p_A(\lambda) \neq p_B(\lambda)\text{.}\)

(g) True/False.

Every matrix has at least one eigenvalue.

(h) True/False.

Suppose \(A\) is a \(3 \times 3\) matrix with three distinct eigenvalues. Then any three eigenvectors, one for each eigenvalue, will form a basis of \(\R^3\text{.}\)

(i) True/False.

If an eigenvalue \(\lambda\) is repeated 3 times among the eigenvalues of a matrix, then there are at most 3 linearly independent eigenvectors corresponding to \(\lambda\text{.}\)

Subsection Project: The Ehrenfest Model

To realistically model the diffusion of gas molecules we would need to consider a system with a large number of balls as substitutes for the gas molecules. However, the main idea can be seen in a model with a much smaller number of balls, as we will do now. Suppose we have two bins that contain a total of \(4\) balls between them. Label the bins as Bin 1 and Bin 2. In this case we can think of entropy as the number of different possible ways the balls can be arranged in the system. For example, there is only \(1\) way for all of the balls to be in Bin 1 (low entropy), but there are \(4\) ways that we can have one ball in Bin 1 (choose any one of the four different balls, which can be distinguished from each other) and \(3\) balls in Bin 2 (higher entropy). The highest entropy state has the balls equally distributed between the bins (with \(6\) different ways to do this).

We assume that there is a way for balls to move from one bin to the other (like having gas molecules pass through a permeable membrane). A way to think about this is that we select a ball (the balls are labeled 1 through 4, so they are distinguishable) and move that ball from its current bin to the other bin. Consider a “move” to be any instance when a ball changes bins. A state is any configuration of balls in the bins at a given time, and the state changes when a ball is chosen at random and moved to the other bin. The possible states are to have 0 balls in Bin 1 and 4 balls in Bin 2 (State 0, entropy 1), 1 ball in Bin 1 and 3 in Bin 2 (State 1, entropy 4), 2 balls in each bin (State 2, entropy 6), 3 balls in Bin 1 and 1 ball in Bin 2 (State 3, entropy 4), and 4 balls in Bin 1 and 0 balls in Bin 2 (State 4, entropy 1). These states are shown in Figure 18.10.

Figure 18.10. States
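The entropy counts \(1, 4, 6, 4, 1\) are binomial coefficients: the number of ways to choose which \(i\) of the \(4\) labeled balls sit in Bin 1. A one-line check (ours):

from math import comb

# Number of arrangements with i of the 4 labeled balls in Bin 1.
print([comb(4, i) for i in range(5)])   # [1, 4, 6, 4, 1]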

Project Activity 18.4.

To model the system of balls in bins we need to understand how the system can transform from one state to another. It suffices to count the number of balls in Bin 1 (since the remaining balls will be in Bin 2). Even though the balls are labeled, our count only cares about how many balls are in each bin. Let \(\vx_0 = [x_0, x_1, x_2, x_3, x_4]^{\tr}\text{,}\) where \(x_i\) is the probability that Bin 1 contains \(i\) balls, and let \(\vx_1 = \left[ x_0^1, x_1^1, x_2^1, x_3^1, x_4^1 \right]^{\tr}\text{,}\) where \(x_i^1\) is the probability that Bin 1 contains \(i\) balls after the first move. We will call the vectors \(\vx_0\) and \(\vx_1\) probability distributions of balls in bins. Note that since all four balls have to be placed in some bin, the sum of the entries in our probability distribution vectors must be \(1\text{.}\) Recall that a move is an instance when a ball changes bins. We want to understand how \(\vx_1\) is obtained from \(\vx_0\text{.}\) In other words, we want to determine the probability that Bin 1 contains 0, 1, 2, 3, or 4 balls after one ball changes bins, given the initial probability distribution \(\vx_0\text{.}\)

We begin by analyzing the ways that a state can change. For example,

  • Suppose there are \(0\) balls in Bin 1. (In our probability distribution \(\vx_0\text{,}\) this happens with probability \(x_0\text{.}\)) Then there are four balls in Bin 2. The only way for a ball to change bins is if one of the four balls moves from Bin 2 to Bin 1, putting us in State 1. Regardless of which ball moves, we will always be put in State 1, so this happens with a probability of \(1\text{.}\) In other words, if the probability that Bin 1 contains \(0\) balls is \(x_0\text{,}\) then there is a probability of \((1)x_0\) that Bin 1 will contain 1 ball after the move.

  • Suppose we have 1 ball in Bin 1 (and so 3 balls in Bin 2). On the next move, the ball that changes bins is chosen at random from the four balls, so the ball in Bin 1 is chosen with probability \(\frac{1}{4}\) and one of the balls in Bin 2 is chosen with probability \(\frac{3}{4}\text{.}\)

    • If the ball in Bin 1 moves, that move puts us in State \(0\text{.}\) In other words, if the probability that Bin 1 contains 1 ball is \(x_1\text{,}\) then there is a probability of \(\frac{1}{4}x_1\) that Bin 1 will contain \(0\) balls after a move.

    • If one of the \(3\) balls in Bin 2 moves (which happens with probability \(\frac{3}{4}\)), that move puts us in State 2. In other words, if the probability that Bin 1 contains 1 ball is \(x_1\text{,}\) then there is a probability of \(\frac{3}{4}x_1\) that Bin 1 will contain \(2\) balls after a move.

(a)

Complete this analysis to explain the probabilities if there are \(2\text{,}\) \(3\text{,}\) or \(4\) balls in Bin 1.

(b)

Explain how the results of part (a) show that

\begin{alignat*}{6} {}x_0^1 \amp = \amp {0}x_0 \amp {+} \amp {\frac{1}{4}}x_1 \amp {+} \amp {0}x_2 \amp {+} \amp {0}x_3 \amp {+} \amp {0}x_4\\ {}x_1^1 \amp = \amp {1}x_0 \amp {+} \amp {0}x_1 \amp {+} \amp {\frac{1}{2}}x_2 \amp {+} \amp {0}x_3 \amp {+} \amp {0}x_4\\ {}x_2^1 \amp = \amp {0}x_0 \amp {+} \amp {\frac{3}{4}}x_1 \amp {+} \amp {0}x_2 \amp {+} \amp {\frac{3}{4}}x_3 \amp {+} \amp {0}x_4\\ {}x_3^1 \amp = \amp {0}x_0 \amp {+} \amp {0}x_1 \amp {+} \amp {\frac{1}{2}}x_2 \amp {+} \amp {0}x_3 \amp {+} \amp {1}x_4\\ {}x_4^1 \amp = \amp {0}x_0 \amp {+} \amp {0}x_1 \amp {+} \amp {0}x_2 \amp {+} \amp {\frac{1}{4}}x_3 \amp {+} \amp {0}x_4 \end{alignat*}

The system we developed in Project Activity 18.4 has matrix form

\begin{equation*} \vx_1 = T \vx_0\text{,} \end{equation*}

where \(T\) is the transition matrix

\begin{equation*} T = \left[ \begin{array}{ccccc} 0 \amp \frac{1}{4} \amp 0 \amp 0 \amp 0 \\ 1 \amp 0 \amp \frac{1}{2} \amp 0 \amp 0 \\ 0 \amp \frac{3}{4} \amp 0 \amp \frac{3}{4} \amp 0 \\ 0 \amp 0 \amp \frac{1}{2} \amp 0 \amp 1 \\ 0 \amp 0 \amp 0 \amp \frac{1}{4} \amp 0 \end{array} \right]\text{.} \end{equation*}

Subsequent moves give probability distribution vectors

\begin{align*} \vx_2 \amp = T\vx_1\\ \vx_3 \amp = T\vx_2\\ \vdots \amp \vdots\\ \vx_k \amp = T\vx_{k-1} \text{.} \end{align*}

This is an example of a Markov process (see Definition 9.4). There are several questions we can ask about this model. For example, what is the long-term behavior of this system, and how does this model relate to entropy? That is, given an initial probability distribution vector \(\vx_0\text{,}\) the system will have probability distribution vectors \(\vx_1\text{,}\) \(\vx_2\text{,}\) \(\ldots\) after subsequent moves. What happens to the vectors \(\vx_k\) as \(k\) goes to infinity, and what does this tell us about entropy? To answer these questions, we will first explore the sequence \(\{\vx_k\}\) numerically, and then use the eigenvalues and eigenvectors of \(T\) to analyze the sequence \(\{\vx_k\}\text{.}\)
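One possible way (a sketch of ours) to carry out the numerical experiments in the next activity, which leaves the choice of technology open:

import numpy as np

# The transition matrix T defined above.
T = np.array([[0, 1/4, 0,   0,   0],
              [1, 0,   1/2, 0,   0],
              [0, 3/4, 0,   3/4, 0],
              [0, 0,   1/2, 0,   1],
              [0, 0,   0,   1/4, 0]])

# An initial probability distribution (here, all four balls in Bin 2).
x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

# Iterate x_k = T x_{k-1}; the printed iterates reveal the
# long-term behavior asked about in Project Activity 18.5.
for k in range(1, 21):
    x = T @ x
    print(k, np.round(x, 4))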

Project Activity 18.5.

Use appropriate technology to do the following.

(a)

Suppose we begin with a probability distribution vector \(\vx_0 = [1 \ 0 \ 0 \ 0 \ 0]^{\tr}\text{.}\) Calculate vectors \(\vx_k\) for enough values of \(k\) so that you can identify the long term behavior of the sequence. Describe this behavior.

(b)

Repeat part (a) with

(i)

\(\vx_0 = \left[0 \ \frac{1}{2} \ \frac{1}{2} \ 0 \ 0\right]^{\tr}\)

(ii)

\(\vx_0 = \left[0 \ \frac{1}{3} \ \frac{1}{3} \ 0 \ \frac{1}{3}\right]^{\tr}\)

(iii)

\(\vx_0 = \left[\frac{1}{5} \ \frac{1}{5} \ \frac{1}{5} \ \frac{1}{5} \ \frac{1}{5}\right]^{\tr}\)

In what follows, we investigate the behavior of the sequence \(\{\vx_k\}\) that we uncovered in Project Activity 18.5.

Project Activity 18.6.

We use the characteristic polynomial to find the eigenvalues of \(T\text{.}\)

(a)

Find the characteristic polynomial of \(T\text{.}\) Factor the characteristic polynomial into a product of linear polynomials to show that the eigenvalues of \(T\) are \(0\text{,}\) \(1\text{,}\) \(-1\text{,}\) \(\frac{1}{2}\) and \(-\frac{1}{2}\text{.}\)

(b)

As we will see a bit later, certain eigenvectors for \(T\) will describe the end behavior of the sequence \(\{\vx_k\}\text{.}\) Find eigenvectors for \(T\) corresponding to the eigenvalues \(1\) and \(-1\text{.}\) Explain how the eigenvector for \(T\) corresponding to the eigenvalue \(1\) explains the behavior of one of the sequences we saw in Project Activity 18.5. (Any eigenvector of \(T\) with eigenvalue \(1\) is called an equilibrium or steady state vector.)

Now we can analyze the behavior of the sequence \(\{\vx_k\}\text{.}\)

Project Activity 18.7.

To make the notation easier, we will let \(\vv_1\) be an eigenvector of \(T\) corresponding to the eigenvalue \(0\text{,}\) \(\vv_2\) an eigenvector of \(T\) corresponding to the eigenvalue \(1\text{,}\) \(\vv_3\) an eigenvector of \(T\) corresponding to the eigenvalue \(-1\text{,}\) \(\vv_4\) an eigenvector of \(T\) corresponding to the eigenvalue \(\frac{1}{2}\text{,}\) and \(\vv_5\) an eigenvector of \(T\) corresponding to the eigenvalue \(-\frac{1}{2}\text{.}\)

(a)

Explain why \(\{\vv_1, \vv_2, \vv_3, \vv_4, \vv_5\}\) is a basis of \(\R^5\text{.}\)

(b)

Let \(\vx_0\) be any initial probability distribution vector. Explain why we can write \(\vx_0\) as

\begin{equation*} \vx_0 = a_1 \vv_1 + a_2 \vv_2 + a_3 \vv_3 + a_4 \vv_4 + a_5 \vv_5 = \sum_{i=1}^5 a_i \vv_i \end{equation*}

for some scalars \(a_1\text{,}\) \(a_2\text{,}\) \(a_3\text{,}\) \(a_4\text{,}\) and \(a_5\text{.}\)

We can now use the eigenvalues and eigenvectors of \(T\) to write the vectors \(\vx_k\) in a convenient form. Let \(\lambda_1 = 0\text{,}\) \(\lambda_2=1\text{,}\) \(\lambda_3=-1\text{,}\) \(\lambda_4=\frac{1}{2}\text{,}\) and \(\lambda_5=-\frac{1}{2}\text{.}\) Notice that

\begin{align*} \vx_1 \amp = T \vx_0\\ \amp = T(a_1 \vv_1 + a_2 \vv_2 + a_3 \vv_3 + a_4 \vv_4 + a_5 \vv_5)\\ \amp = a_1 T\vv_1 + a_2 T\vv_2 + a_3 T\vv_3 + a_4 T\vv_4 + a_5 T\vv_5\\ \amp = a_1\lambda_1 \vv_1 + a_2\lambda_2 \vv_2 + a_3 \lambda_3 \vv_3 + a_4 \lambda_4 \vv_4 + a_5 \lambda_5 \vv_5\\ \amp = \sum_{i=1}^5 a_i \lambda_i \vv_i\text{.} \end{align*}

Similarly

\begin{equation*} \vx_2 = T \vx_1 = T\left(\sum_{i=1}^5 a_i \lambda_i\vv_i\right) = \sum_{i=1}^5 a_i \lambda_i T\vv_i = \sum_{i=1}^5 a_i\lambda_i^2 \vv_i\text{.} \end{equation*}

We can continue in this manner to ultimately show that for each positive integer \(k\) we have

\begin{equation} \vx_k = \sum_{i=1}^5 a_i\lambda_i^k \vv_i\tag{18.4} \end{equation}

when \(\vx_0 = \sum_{i=1}^5 a_i \vv_i\text{.}\)
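Equation (18.4) is easy to test numerically (a sketch of ours): write \(\vx_0\) in the eigenbasis by solving a linear system for the coefficients \(a_i\text{,}\) then compare \(\sum_{i=1}^5 a_i \lambda_i^k \vv_i\) with \(k\) direct applications of \(T\text{.}\) The eigenvectors below are the ones listed in Project Activity 18.8.

import numpy as np

T = np.array([[0, 1/4, 0,   0,   0],
              [1, 0,   1/2, 0,   0],
              [0, 3/4, 0,   3/4, 0],
              [0, 0,   1/2, 0,   1],
              [0, 0,   0,   1/4, 0]])

# Columns of V are eigenvectors v1, ..., v5; lams holds the
# corresponding eigenvalues 0, 1, -1, 1/2, -1/2.
V = np.array([[ 1, 1,  1, -1, -1],
              [ 0, 4, -4, -2,  2],
              [-2, 6,  6,  0,  0],
              [ 0, 4, -4,  2, -2],
              [ 1, 1,  1,  1,  1]], dtype=float)
lams = np.array([0.0, 1.0, -1.0, 0.5, -0.5])

x0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
a = np.linalg.solve(V, x0)      # coefficients a_i of x0 in the eigenbasis

k = 7
xk_eigen = V @ (a * lams**k)    # right-hand side of Equation (18.4)
xk_direct = np.linalg.matrix_power(T, k) @ x0
print(np.allclose(xk_eigen, xk_direct))   # True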

Project Activity 18.8.

Recall that we are interested in understanding the behavior of the sequence \(\{\vx_k\}\) as \(k\) goes to infinity.

(a)

Equation (18.4) shows that we need to know \(\lim_{k \to \infty} \lambda_i^k\) for each \(i\) in order to analyze \(\lim_{k \to \infty} \vx_k\text{.}\) Calculate or describe these limits.

(b)

Use the result of part (a), Equation (18.4), and Project Activity 18.6 (b) to explain why the sequence \(\{\vx_k\}\) is either eventually fixed or oscillates between two states. Compare to the results from Project Activity 18.5. How are these results related to entropy? You may use the following facts (which are verified numerically in the sketch after this list):

  • \(\vv_1 = [1 \ 0 \ -2 \ 0 \ 1]^{\tr}\) is an eigenvector for \(T\) corresponding to the eigenvalue \(0\text{,}\)

  • \(\vv_2 = [1 \ 4 \ 6 \ 4 \ 1]^{\tr}\) is an eigenvector for \(T\) corresponding to the eigenvalue \(1\text{,}\)

  • \(\vv_3 = [1 \ -4 \ 6 \ -4 \ 1]^{\tr}\) is an eigenvector for \(T\) corresponding to the eigenvalue \(-1\text{,}\)

  • \(\vv_4 = [-1 \ -2 \ 0\ 2 \ 1]^{\tr}\) is an eigenvector for \(T\) corresponding to the eigenvalue \(\frac{1}{2}\text{,}\)

  • \(\vv_5 = [-1 \ 2 \ 0 \ -2 \ 1]^{\tr}\) is an eigenvector for \(T\) corresponding to the eigenvalue \(-\frac{1}{2}\text{.}\)
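As noted above, these facts are easy to check numerically (our sketch):

import numpy as np

T = np.array([[0, 1/4, 0,   0,   0],
              [1, 0,   1/2, 0,   0],
              [0, 3/4, 0,   3/4, 0],
              [0, 0,   1/2, 0,   1],
              [0, 0,   0,   1/4, 0]])

# The five eigenvector/eigenvalue pairs listed above.
pairs = [(np.array([ 1,  0, -2,  0, 1], dtype=float),  0.0),
         (np.array([ 1,  4,  6,  4, 1], dtype=float),  1.0),
         (np.array([ 1, -4,  6, -4, 1], dtype=float), -1.0),
         (np.array([-1, -2,  0,  2, 1], dtype=float),  0.5),
         (np.array([-1,  2,  0, -2, 1], dtype=float), -0.5)]

# Check T v = lambda v for each listed pair.
for v, lam in pairs:
    print(lam, np.allclose(T @ v, lam * v))   # all True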
