
Section 40 The Jordan Canonical Form

Subsection Application: The Bailey Model of an Epidemic

The COVID-19 epidemic has generated many mathematical and statistical models to try to understand the spread of the virus. In 1950 Norman Bailey proposed a simple stochastic model of the spread of an epidemic. The solution to the model involves matrix exponentials and the Jordan canonical form is a useful tool for calculating matrix exponentials.

Subsection Introduction

We have seen several different matrix factorizations so far: the eigenvalue decomposition (Section 19), the singular value decomposition (Section 29), the QR factorization (Section 25), and the LU factorization (Section 22). In this section, we investigate the Jordan canonical form, which in a sense generalizes the eigenvalue decomposition. For matrices with an eigenvalue decomposition, the geometric multiplicity of each eigenvalue (the dimension of the corresponding eigenspace) must equal its algebraic multiplicity (the number of times the eigenvalue occurs as a root of the characteristic polynomial of the matrix). We know that not every matrix has an eigenvalue decomposition. However, every square matrix has a Jordan canonical form, in which generalized eigenvectors and a block diagonal structure approximate the behavior of an eigenvalue decomposition. At the end of the section we provide a complete proof of the existence of the Jordan canonical form.

Subsection When an Eigenvalue Decomposition Does Not Exist

Recall that an eigenvalue decomposition of a matrix exists if and only if the algebraic multiplicity equals the geometric multiplicity for each eigenvalue. In this case, the whole space \(\R^n\) decomposes into eigenspaces corresponding to the eigenvalues and the matrix acts as scalar multiplication within each eigenspace. We consider some cases where such a decomposition exists and some where it does not, to notice the differences between the two cases and to think of ways to improve the situation.

Preview Activity 40.1.

(a)

All of the matrices below have only one eigenvalue, \(\lambda = 2\text{,}\) and characteristic polynomial \((\lambda-2)^n\text{,}\) where \(n\) is the size of the matrix. Therefore, in each case the algebraic multiplicity of this eigenvalue is the size of the matrix. Find the geometric multiplicity of this eigenvalue (i.e. the dimension of \(\Nul(A-\lambda I)\)) in each case below to see how the behavior differs from case to case.

(i)

\(A = \left[ \begin{array}{cc} 2\amp 0 \\ 0\amp 2 \end{array} \right]\)

(ii)

\(B = \left[ \begin{array}{cc} 2\amp 1 \\ 0\amp 2 \end{array} \right]\)

(iii)

\(C = \left[ \begin{array}{ccc} 2\amp 1\amp 0 \\ 0\amp 2\amp 1 \\ 0\amp 0\amp 2 \end{array} \right]\)

(iv)

\(D = \left[ \begin{array}{ccc} 2\amp 1\amp 0 \\ 0\amp 2\amp 0 \\ 0\amp 0\amp 2 \end{array} \right]\)

(v)

\(E = \left[ \begin{array}{ccc} 2\amp 0\amp 0 \\ 0\amp 2\amp 0 \\ 0\amp 0\amp 2 \end{array} \right]\)

(vi)

\(F = \left[ \begin{array}{cccc} 2\amp 1\amp 0\amp 0 \\ 0\amp 2\amp 0\amp 0 \\ 0\amp 0\amp 2\amp 1\\ 0\amp 0\amp 0\amp 2 \end{array} \right]\)

(vii)

\(G = \left[ \begin{array}{cccc} 2\amp 1\amp 0\amp 0 \\ 0\amp 2\amp 1\amp 0 \\ 0\amp 0\amp 2\amp 1\\ 0\amp 0\amp 0\amp 2 \end{array} \right]\)

(viii)

\(H = \left[ \begin{array}{cccc} 2\amp 1\amp 0\amp 0 \\ 0\amp 2\amp 0\amp 0 \\ 0\amp 0\amp 2\amp 0\\ 0\amp 0\amp 0\amp 2 \end{array} \right]\)

(ix)

\(J = \left[ \begin{array}{cccc} 2\amp 1\amp 0\amp 0 \\ 0\amp 2\amp 1\amp 0 \\ 0\amp 0\amp 2\amp 0\\ 0\amp 0\amp 0\amp 2 \end{array} \right]\)

(b)

In the examples above, the only matrices for which the algebraic and geometric multiplicities of the eigenvalue 2 are equal are the diagonal matrices, which obviously have eigenvalue decompositions. The presence of ones above the diagonal destroys this property. However, the positioning of the ones is also strategic. By letting the ones above the diagonal determine how to split the matrix into diagonal blocks, we can categorize the matrices. For example, the matrix \(C\) in part (iii) has one big \(3\times 3\) block with twos on the diagonal and ones above, while the matrix \(D\) in part (iv) has a \(2\times 2\) block at the top left and another \(1\times 1\) block at the bottom right. Determine the blocks for all matrices above, and identify the relationship between the number of blocks and the geometric multiplicity.

(c)

For this last problem, we will focus on the matrix \(B\) above. We know that \(\vv_1 = \left[ \begin{array}{c} 1 \\ 0 \end{array} \right]\) is an eigenvector. We do not have a second linearly independent eigenvector, i.e. a solution to \((B-2I_2)\vx=\vzero\) that is linearly independent from \(\vv_1\text{.}\) However, since \((B-2I_2)^2=0\text{,}\) we know that \((B-2I_2)(B-2I_2)\vx = \vzero\) for any \(\vx\text{,}\) which implies that \((B-2I_2)\vx\) always lies inside the eigenspace corresponding to \(\lambda=2\text{.}\) Find \((B-2I_2)\left[ \begin{array}{c} 0 \\ 1 \end{array} \right]\) to verify that this works in this particular case.

Subsection Generalized Eigenvectors and the Jordan Canonical Form

If an \(n \times n\) matrix \(A\) has \(n\) linearly independent eigenvectors, then \(A\) is similar to a diagonal matrix with the eigenvalues along the diagonal. You discovered in Preview Activity 40.1 that a matrix without enough linearly independent eigenvectors can still be similar to a matrix that is close to diagonal. We now investigate whether this generalized representation can be achieved for all matrices.

Activity 40.2.

Let \(M = \left[ \begin{array}{crc} 3 \amp -1 \amp 0 \\ 4 \amp 7 \amp 0 \\ 0\amp 0\amp 1 \end{array} \right]\text{.}\) The characteristic polynomial of \(M\) is \((\lambda-1)(\lambda - 5)^2\text{,}\) so the eigenvalues of \(M\) are \(1\) and \(5\text{.}\) For \(\lambda=1\text{,}\) the eigenspace is one dimensional and the vector \(\vv_1 = \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right]\) is an eigenvector. For \(\lambda=5\text{,}\) although the algebraic multiplicity is 2, the corresponding eigenspace is only one dimensional and \(\vv_0 = \left[ \begin{array}{r} -1\\2\\0 \end{array} \right]\) is an eigenvector. We cannot diagonalize the matrix \(M\) because we do not have a second linearly independent eigenvector for the eigenvalue 5, which would give us the three linearly independent eigenvectors we need. Using the same idea as in the last problem of Preview Activity 40.1, we can find another linearly independent vector to give us a matrix close to a diagonal matrix.

(a)

Find a vector \(\vv_3\) that is in \(\Nul (M-5I_3)^2\) but not in \(\Nul (M-5I_3)\text{.}\) Then calculate \((M-5I_3)\vv_3\text{.}\) How is \(\vv_2 = (M-5I_3)\vv_3\) related to \(\vv_0\text{?}\)

(b)

Let \(C = [\vv_1 \ \vv_2 \ \vv_3]\) and calculate the product \(C^{-1}MC\) for our example matrix \(M\text{.}\)

(c)

Describe the effect of \((M-I_3)\) and \((M-5I_3)^2\) on each of the column vectors of \(C\) and explain why this justifies that \((M-I_3)(M-5I_3)^2\) is the 0 matrix.

Activity 40.2 shows that even if a matrix does not have a full complement of linearly independent eigenvectors, we can still almost diagonalize the matrix. If \(A\) is an \(n \times n\) matrix that does not have \(n\) linearly independent eigenvectors (such matrices are called defective), the matrix \(A\) is still similar to a matrix that is “close” to diagonal. Activity 40.2 shows this for the case of a \(3 \times 3\) matrix \(M\) with two eigenvalues, \(\lambda_1\) and \(\lambda_2\text{,}\) with \(\lambda_2\) having algebraic multiplicity two, and each eigenvalue having a one-dimensional eigenspace. Let \(\vv_1\) be an eigenvector corresponding to \(\lambda_1\text{.}\) Because \(\lambda_2\) also has a one-dimensional eigenspace, the matrix \(M\) has only two linearly independent eigenvectors. Nevertheless, as you saw in Activity 40.2, we can find a vector \(\vv_3\) that is in \(\Nul (M - \lambda_2 I)^2\) but not in \(\Nul (M-\lambda_2 I)\text{.}\) From this, it follows that the vector \(\vv_2 = (M - \lambda_2 I) \vv_3\) is an eigenvector of \(M\) with eigenvalue \(\lambda_2\text{.}\) In this case we have \(M \vv_3 = \lambda_2 \vv_3 + \vv_2\) and

\begin{align*} M [\vv_1 \ \vv_2 \ \vv_3] \amp = [M\vv_1 \ M\vv_2 \ M\vv_3]\\ \amp = [\lambda_1 \vv_1 \, \ \, \lambda_2 \vv_2 \, \ \, \lambda_2 \vv_3+\vv_2 ]\\ \amp = [\vv_1 \ \vv_2 \ \vv_3] \left[ \begin{array}{ccc} \lambda_1 \amp 0 \amp 0\\ 0 \amp \lambda_2 \amp 1\\ 0 \amp 0 \amp \lambda_2 \end{array} \right]\text{.} \end{align*}

The lack of two linearly independent eigenvectors for this eigenvalue of algebraic multiplicity two ensures that \(M\) is not similar to a diagonal matrix, but \(M\) is similar to a matrix with a diagonal block of the form \(\left[ \begin{array}{cc} \lambda \amp 1 \\ 0 \amp \lambda \end{array} \right]\text{.}\) So even though the matrix \(M\) is not diagonalizable, we can find an invertible matrix \(C = [\vv_1 \ \vv_2 \ \vv_3]\) so that \(C^{-1}MC\) is almost diagonal.
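
For readers following along with technology, the computation of Activity 40.2 can be checked with a short sketch in Python using the SymPy library (an illustration, not part of the text's development; the vectors are those found in the activity):

import sympy as sp

# The matrix M from Activity 40.2
M = sp.Matrix([[3, -1, 0], [4, 7, 0], [0, 0, 1]])
v1 = sp.Matrix([0, 0, 1])        # an eigenvector for lambda = 1
v3 = sp.Matrix([1, 0, 0])        # in Nul (M - 5I)^2 but not in Nul (M - 5I)
v2 = (M - 5 * sp.eye(3)) * v3    # an eigenvector for lambda = 5
C = sp.Matrix.hstack(v1, v2, v3)
print(C.inv() * M * C)           # Matrix([[1, 0, 0], [0, 5, 1], [0, 0, 5]])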

In general, suppose we have an eigenvalue \(\lambda\) of a matrix \(A\) with algebraic multiplicity two but with a one-dimensional eigenspace. Then the eigenvalue \(\lambda\) is deficient — that is \(\dim(E_{\lambda})\) is strictly less than the algebraic multiplicity of \(\lambda\text{,}\) i.e. \(E_{\lambda}\) does not contain enough linearly independent eigenvectors for \(\lambda\text{.}\) But as we saw above, we may be able to find a vector \(\vv_2\) that is in \(\Nul (A - \lambda I)^2\) but not in \(\Nul (A - \lambda I)\text{.}\) When we let \(\vv_1 = (A-\lambda I) \vv_2\text{,}\) we then also have

\begin{equation} \vzero = (A-\lambda I)^2\vv_2 = (A-\lambda I)\vv_1\tag{40.1} \end{equation}

as we argued in Preview Activity 40.1, and so \(\vv_1\) is an eigenvector for \(A\) with eigenvalue \(\lambda\text{.}\) It is vectors that satisfy an equation like (40.1) that drive the Jordan canonical form. These vectors are similar to eigenvectors and are called generalized eigenvectors.

Definition 40.1.

Let \(A\) be an \(n \times n\) matrix with eigenvalue \(\lambda\text{.}\) A generalized eigenvector of \(A\) corresponding to the eigenvalue \(\lambda\) is a non-zero vector \(\vx\) satisfying

\begin{equation*} (A - \lambda I_n)^m \vx = \vzero \end{equation*}

for some positive integer \(m\text{.}\)

In other words, a generalized eigenvector of an \(n \times n\) matrix \(A\) corresponding to the eigenvalue \(\lambda\) is a nonzero vector in \(\Nul (A - \lambda I_n)^m\) for some \(m\text{.}\) Note that every eigenvector of \(A\) is a generalized eigenvector (with \(m=1\)). In Activity 40.2, \(M\) is a \(3 \times 3\) matrix with eigenvalue \(\lambda = 5\) having algebraic multiplicity 2 and geometric multiplicity 1. We were able to see that \(M\) is similar to a matrix of the form \(\left[ \begin{array}{ccc} 1 \amp 0\amp 0 \\ 0 \amp 5 \amp 1 \\ 0\amp 0\amp 5 \end{array} \right]\) because of the existence of a generalized eigenvector for the eigenvalue \(5\text{.}\)

The example in Activity 40.2 presents the basic idea behind how we can find a “simple” matrix that is similar to any square matrix, even if that matrix is not diagonalizable. The key is to find generalized eigenvectors for eigenvalues whose algebraic multiplicities exceed their geometric multiplicities. One way to do this is indicated in Preview Activity 40.1 and in the next activity.

Activity 40.3.

Let

\begin{equation*} A = \left[ \begin{array}{ccr} 5\amp 1\amp -4\\4\amp 3\amp -5\\3\amp 1\amp -2 \end{array} \right]\text{.} \end{equation*}

The matrix \(A\) has \(\lambda = 2\) as its only eigenvalue, and the geometric multiplicity of \(\lambda\) as an eigenvalue is 1. For this activity you may use the fact that the reduced row echelon forms of \(A-2I\text{,}\) \((A-2I)^2\text{,}\) and \((A-2I)^3\) are, respectively,

\begin{equation*} \left[ \begin{array}{ccr} 1\amp 0\amp -1\\0\amp 1\amp -1\\0\amp 0\amp 0 \end{array} \right], \ \left[ \begin{array}{ccr} 1\amp 0\amp -1\\0\amp 0\amp 0\\0\amp 0\amp 0 \end{array} \right], \ \left[ \begin{array}{ccc} 0\amp 0\amp 0\\0\amp 0\amp 0\\0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}
(a)

To begin, we look for a vector \(\vv_3\) that is in \(\Nul (A-2I_3)^3\) that is not in \(\Nul (A-2I_3)^2\text{.}\) Find such a vector.

(b)

Let \(\vv_2 = (A-2I_3)\vv_3\text{.}\) Show that \(\vv_2\) is in \(\Nul (A-2I_3)^2\) but is not in \(\Nul (A-2I_3)\text{.}\)

(c)

Let \(\vv_1 = (A-2I_3)\vv_2\text{.}\) Show that \(\vv_1\) is an eigenvector of \(A\) with eigenvalue \(2\text{.}\) That is, \(\vv_1\) is in \(\Nul (A-2I_3)\text{.}\)

(d)

Let \(C = [\vv_1 \ \vv_2 \ \vv_3]\text{.}\) Calculate the matrix product \(C^{-1}AC\text{.}\) What do you notice?

It is the equations \((A- 2I)\vv_{i+1} = \vv_i\) from Activity 40.3 that give us this simple form \(\left[ \begin{array}{ccc} 2\amp 1\amp 0 \\ 0\amp 2\amp 1 \\ 0\amp 0\amp 2 \end{array} \right]\text{.}\) To better understand why, notice that the equations imply that \(A\vv_{i+1} = 2\vv_{i+1} + \vv_i\text{.}\) So if \(C = [\vv_1 \ \vv_2 \ \vv_3]\text{,}\) then

\begin{align*} AC \amp = [A\vv_1 \ A\vv_2 \ A\vv_3]\\ \amp = [2 \vv_1 \ 2 \vv_2 + \vv_1 \ 2 \vv_3 + \vv_2]\\ \amp = [\vv_1 \ \vv_2 \ \vv_3] \left[ \begin{array}{ccc} 2\amp 1\amp 0\\ 0\amp 2\amp 1\\ 0\amp 0\amp 2 \end{array} \right]\text{.} \end{align*}

This method will provide us with the Jordan canonical form. The major reason that this method always works is contained in the following theorem, whose proof follows from the proof of the existence of the Jordan canonical form (presented later).

To find generalized eigenvectors, then, we find a value of \(p\) so that \(\dim(\Nul(A-\lambda I_n)^p) = m\text{,}\) where \(m\) is the algebraic multiplicity of \(\lambda\text{,}\) and then find a vector \(\vv_p\) that is in \(\Nul(A-\lambda I_n)^p\) but not in \(\Nul(A-\lambda I_n)^{p-1}\text{.}\) Successive multiplications by \(A - \lambda I_n\) provide a sequence of generalized eigenvectors. The sequence

\begin{equation*} \vv_p \underset{A - \lambda I_n}{\rightarrow} \vv_{p-1} \underset{A - \lambda I_n}{\rightarrow} \cdots \ \underset{A - \lambda I_n}{\rightarrow} \vv_1 \underset{A - \lambda I_n}{\rightarrow} \vzero \end{equation*}

is called a generalized eigenvector chain.
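
As an illustration (a SymPy sketch, not part of the text's development), the following code builds such a chain for the matrix of Activity 40.3, whose only eigenvalue is \(\lambda = 2\text{,}\) by computing the null spaces of the powers of \(A - \lambda I\) and multiplying down the chain:

import sympy as sp

A = sp.Matrix([[5, 1, -4], [4, 3, -5], [3, 1, -2]])  # from Activity 40.3
M = A - 2 * sp.eye(3)                                # lambda = 2

# dim Nul M^j reaches the algebraic multiplicity 3 at p = 3
print([len((M**j).nullspace()) for j in (1, 2, 3)])  # [1, 2, 3]

v3 = sp.Matrix([1, 0, 0])  # in Nul M^3 (all of R^3) but not in Nul M^2
v2 = M * v3                # in Nul M^2 but not in Nul M
v1 = M * v2                # an eigenvector: M * v1 = 0
C = sp.Matrix.hstack(v1, v2, v3)
print(C.inv() * A * C)     # Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]])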

Activity 40.4.

Let

\begin{equation*} A = \left[ \begin{array}{rrrc} 0\amp 0\amp 0\amp 2 \\ -6\amp 0\amp -2\amp 10 \\ -1\amp -1\amp 1\amp 3 \\ -3\amp -1\amp -1\amp 7 \end{array} \right]\text{.} \end{equation*}

The only eigenvalue of \(A\) is \(\lambda = 2\) and \(\lambda\) has geometric multiplicity 2. The vectors \([0 \ -1 \ 1 \ 0]^{\tr}\) and \([1 \ 2\ 0 \ 1]^{\tr}\) are eigenvectors for \(A\text{.}\) The reduced row echelon forms for \(A - \lambda I_4\text{,}\) \((A - \lambda I_4)^2\text{,}\) \((A - \lambda I_4)^3\) are, respectively,

\begin{equation*} \left[ \begin{array}{cccr} 1\amp 0\amp 0\amp -1 \\ 0\amp 1\amp 1\amp -2 \\ 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right], \ \left[ \begin{array}{cccr} 1\amp 1\amp 1\amp -3 \\ 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right], \ \text{ and } \ \left[ \begin{array}{cccc} 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}
(b)

Find a vector \(\vv_3\) in \(\Nul (A - \lambda I_4)^3\) that is not in \(\Nul (A - \lambda I_4)^2\text{.}\)

(c)

Now let \(\vv_2 = (A-\lambda I_4) \vv_3\) and \(\vv_1 = (A-\lambda I_4) \vv_2\text{.}\) What special property does \(\vv_1\) have?

(d)

Find a fourth vector \(\vv_0\) so that \(\{\vv_0, \vv_1, \vv_2, \vv_3\}\) is a basis of \(\R^4\) consisting of generalized eigenvectors of \(A\text{.}\) Let \(C = [\vv_0 \ \vv_1 \ \vv_2 \ \vv_3]\text{.}\) Calculate the product \(C^{-1}AC\text{.}\) What do you see?

The previous activities illustrate the general idea for almost diagonalizing an arbitrary square matrix. First let \(A\) be an \(n \times n\) matrix with an eigenvalue \(\lambda\) of algebraic multiplicity \(n\) and geometric multiplicity \(k\text{.}\) If \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_k\) are linearly independent eigenvectors for \(A\text{,}\) then we can extend the set as we did in Activity 40.4 above with generalized eigenvectors to a basis \(\{\vv_1, \vv_2, \ldots, \vv_k, \vv_{k+1}, \ldots, \vv_n\}\) of \(\R^n\text{.}\) The matrix \(C = [\vv_1 \ \vv_2 \ \cdots \ \vv_n]\) has the property that \(C^{-1}AC\) is almost diagonal. By almost, we mean that \(C^{-1}AC\) has block matrices along the diagonal that look like

\begin{equation} \left[ \begin{array}{ccccccc} \lambda\amp 1\amp 0 \amp \cdots\amp 0\amp 0\amp 0 \\ 0\amp \lambda\amp 1\amp \cdots\amp 0\amp 0\amp 0 \\ 0\amp 0\amp \lambda\amp \cdots\amp 0\amp 0\amp 0 \\ \amp \amp \amp \ddots\amp \ddots\amp \amp \\ 0\amp 0\amp 0\amp \cdots\amp \lambda\amp 1\amp 0 \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp \lambda\amp 1 \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp 0\amp \lambda \end{array} \right]\text{.}\tag{40.2} \end{equation}

All other entries of \(C^{-1}AC\) are zero. A matrix of the form (40.2) is called a Jordan block.

If \(A\) is an \(n \times n\) matrix with eigenvalues \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_k\text{,}\) we repeat this process with every eigenvalue of \(A\) to construct an invertible matrix \(C\) so that \(C^{-1}AC\) is of the form

\begin{equation} \left[ \begin{array}{cccc} J_1\amp 0\amp \cdots\amp 0 \\ 0\amp J_2\amp \cdots\amp 0 \\ \vdots\amp \vdots\amp \ddots\amp \vdots \\ 0\amp 0\amp \cdots\amp J_t \end{array} \right]\text{,}\tag{40.3} \end{equation}

where each matrix \(J_i\) is a Jordan block (note that a \(1 \times 1\) Jordan block is allowable). The form in (40.3) is called the Jordan canonical form or Jordan normal form of the matrix \(A\text{.}\) Later in this section we will prove the following theorem.

Another example may help illustrate the process.

Activity 40.5.

Let \(A = \left[ \begin{array}{rrcccc} 4\amp -1\amp 1\amp 0\amp 0\amp 0 \\ 0\amp 3\amp 1\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 3\amp 1\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 3\amp 1\amp 0 \\ -1\amp 1\amp 0\amp 0\amp 4\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 4 \end{array} \right]\text{.}\) The eigenvalues of \(A\) are \(3\) and \(4\text{,}\) both with algebraic multiplicity 3. A basis for the eigenspace \(E_3\) corresponding to the eigenvalue \(3\) is \(\{[1 \ 1 \ 0 \ 0 \ 0 \ 0]^{\tr}\}\) and a basis for the eigenspace \(E_4\) corresponding to the eigenvalue \(4\) is \(\{[1 \ 0 \ 0 \ 0 \ 0 \ 1]^{\tr}, [1 \ 1 \ 1 \ 1 \ 1 \ 0]^{\tr}\}\text{.}\) In this activity we find a Jordan canonical form of \(A\text{.}\)

(a)

Assume that the reduced row echelon forms of \(A -3 I_6\text{,}\) \((A-3I_6)^2\text{,}\) and \((A-3I_6)^3\) are, respectively,

\begin{equation*} \left[ \begin{array}{crcccc} 1\amp -1\amp 0\amp 0\amp 0\amp 0\\ 0\amp 0\amp 1\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 1\amp 0\amp 0\\ 0\amp 0\amp 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right], \ \left[ \begin{array}{crcccc} 1\amp -1\amp 0\amp 0\amp 0\amp 0\\ 0\amp 0\amp 0\amp 1\amp 0\amp 0\\ 0\amp 0\amp 0\amp 0\amp 1\amp 0\\ 0\amp 0\amp 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right], \ \text{ and } \ \left[ \begin{array}{crcccc} 1\amp -1\amp 0\amp 0\amp 0\amp 0\\ 0\amp 0\amp 0\amp 0\amp 1\amp 0\\ 0\amp 0\amp 0\amp 0\amp 0\amp 1\\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

Find a vector \(\vv_3\) that is in \(\Nul (A-3I_6)^3\) but not in \(\Nul (A-3I_6)^2\text{.}\) Then let \(\vv_2 = (A-3I_6)\vv_3\) and \(\vv_1 = (A-3I_6)\vv_2\text{.}\) Notice that we obtain a string of three generalized eigenvectors.

(b)

Assume that the reduced row echelon forms of \(A -4 I_6\) and \((A-4I_6)^2\) are, respectively,

\begin{equation*} \left[ \begin{array}{ccccrr} 1\amp 0\amp 0\amp 0\amp -1\amp -1\\ 0\amp 1\amp 0\amp 0\amp -1\amp 0\\ 0\amp 0\amp 1\amp 0\amp -1\amp 0\\ 0\amp 0\amp 0\amp 1\amp -1\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right] \ \text{ and } \ \left[ \begin{array}{cccrcr} 1\amp 0\amp 0\amp -4\amp 3\amp -1\\ 0\amp 1\amp 0\amp -3\amp 2\amp 0 \\ 0\amp 0\amp 1\amp -2\amp 1\amp 0\\ 0\amp 0\amp 0\amp 0\amp 0\amp 0\\ 0\amp 0\amp 0\amp 0\amp 0\amp 0\\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

Find a vector \(\vv_5\) that is in \(\Nul (A-4I_6)^2\) but not in \(\Nul (A-4I_6)\text{.}\) Then let \(\vv_4 = (A-4I_6)\vv_5\text{.}\) Notice that we obtain a string of two generalized eigenvectors.

(c)

Find a generalized eigenvector \(\vv_6\) for \(A\) such that \(\{\vv_1, \vv_2, \vv_3, \vv_4, \vv_5, \vv_6\}\) is a basis for \(\R^6\text{.}\) Let \(C = [\vv_1 \ \vv_2 \ \vv_3 \ \vv_4 \ \vv_5 \ \vv_6]\text{.}\) Calculate \(J=C^{-1}AC\text{.}\) Make sure that \(J\) is a matrix in Jordan canonical form.

(d)

How does the matrix \(J\) tell us about the eigenvalues of \(A\) and their algebraic multiplicities?

(e)

How many Jordan blocks are there in \(J\) for the eigenvalue \(3\text{?}\) How many Jordan blocks are there in \(J\) for the eigenvalue \(4\text{?}\) How do these numbers compare to the geometric multiplicities of \(3\) and \(4\) as eigenvalues of \(A\text{?}\)

The previous activities highlight some of the information that a Jordan canonical form tells us about a matrix. Assuming that \(C^{-1}AC = J\text{,}\) where \(J\) is in Jordan canonical form, we can say the following.

  • Since similar matrices have the same eigenvalues, the eigenvalues of \(J\text{,}\) and therefore of \(A\text{,}\) are the diagonal entries of \(J\text{.}\) Moreover, the number of times a diagonal entry appears in \(J\) is the algebraic multiplicity of the eigenvalue. This is also the sum of the sizes of all Jordan blocks corresponding to \(\lambda\text{.}\)

  • Given an eigenvalue \(\lambda\text{,}\) its geometric multiplicity is the number of Jordan blocks corresponding to \(\lambda\text{.}\)

  • Each chain of generalized eigenvectors leads to a Jordan block for the corresponding eigenvalue. The number of Jordan blocks corresponding to \(\lambda\) of size at least \(j\) is \(s_j = \dim\left(\Nul (A - \lambda I)^j\right) - \dim\left(\Nul (A - \lambda I)^{j-1}\right)\text{.}\) Thus, the number of Jordan blocks of size exactly \(j\) (computed in the sketch following this list) is

    \begin{equation*} s_j-s_{j+1} = 2 \dim\left(\Nul (A - \lambda I )^j\right) - \dim\left(\Nul (A - \lambda I )^{j+1}\right) - \dim\left(\Nul (A - \lambda I)^{j-1}\right)\text{.} \end{equation*}
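
Here is the sketch referenced above (Python with SymPy, an illustration only), which computes these block counts for the matrix of Activity 40.4, whose only eigenvalue is \(\lambda = 2\text{:}\)

import sympy as sp

A = sp.Matrix([
    [ 0,  0,  0,  2],
    [-6,  0, -2, 10],
    [-1, -1,  1,  3],
    [-3, -1, -1,  7]])
N = A - 2 * sp.eye(4)

d = [0] + [4 - (N**j).rank() for j in range(1, 5)]  # d[j] = dim Nul N^j
s = [d[j] - d[j - 1] for j in range(1, 5)]          # s_j = # blocks of size >= j
print(s)                                    # [2, 1, 1, 0]
print([s[j] - s[j + 1] for j in range(3)])  # blocks of size exactly 1, 2, 3

The output [1, 0, 1] says that there is one \(1 \times 1\) block and one \(3 \times 3\) block, consistent with the geometric multiplicity 2 and the chains found in Activity 40.4.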

One interesting consequence of the existence of the Jordan canonical form is the famous Cayley-Hamilton Theorem.

The proof of the Cayley-Hamilton Theorem follows from Exercise 22, which shows that every upper triangular matrix satisfies its characteristic polynomial. If \(A\) is a square matrix, then there exists a matrix \(C\) such that \(C^{-1}AC = T\text{,}\) where \(T\) is in Jordan canonical form (in particular, \(T\) is upper triangular). So \(A\) is similar to \(T\text{.}\) If \(p(x)\) is the characteristic polynomial of \(A\text{,}\) Activity 19.5 in Section 19 tells us that \(p(x)\) is also the characteristic polynomial of \(T\text{.}\) Therefore, \(p(T) = 0\text{.}\) Then Exercise 14 in Section 19 shows that \(p(A) = Cp(T)C^{-1} = 0\text{,}\) so \(A\) satisfies its characteristic polynomial.
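
A quick numerical illustration (a SymPy sketch; since the polynomial must be applied to matrix powers, we evaluate \(p(A)\) with Horner's method):

import sympy as sp

A = sp.Matrix([[3, -1, 0], [4, 7, 0], [0, 0, 1]])  # M from Activity 40.2
p = A.charpoly()         # characteristic polynomial (lambda - 1)(lambda - 5)^2
pA = sp.zeros(3, 3)
for c in p.all_coeffs():        # Horner's method with matrix powers
    pA = pA * A + c * sp.eye(3)
print(pA)                       # the zero matrix, as the theorem predicts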

Subsection Geometry of Matrix Transformations using the Jordan Canonical Form

Figure 40.5. The image of the matrix transformation \(T(\vx) = \frac{1}{3} { \left[ \begin{array}{rr} 8\amp -1 \\ -2 \amp 7 \end{array} \right] }\vx\text{.}\)

Recall that we can visualize the action of a matrix transformation defined by a diagonalizable matrix by using a change of basis. For example, let \(T(\vx) = A\vx\text{,}\) where \(A = \frac{1}{3}\left[ \begin{array}{rr} 8\amp -1\\-2\amp 7 \end{array} \right]\text{.}\) The eigenvalues of \(A\) are \(\lambda_1 = 3\) and \(\lambda_2 = 2\) with corresponding eigenvectors \(\vv_1 = [-1 \ 1]^{\tr}\) and \(\vv_2 = [1 \ 2]^{\tr}\text{.}\) So \(A\) is diagonalizable by the matrix \(P = \left[ \begin{array}{rc} -1\amp 1\\1\amp 2 \end{array} \right]\text{,}\) with \(P^{-1}AP = D = \left[ \begin{array}{cc} 3\amp 0\\0\amp 2 \end{array} \right]\text{.}\) Note that \(T(\vx) = PDP^{-1} \vx\text{.}\) Now \(P^{-1}\) is a change of basis matrix from the standard basis to the basis \(\CB = \{\vv_1, \vv_2\}\text{,}\) and \(D\) stretches space in the direction of \(\vv_1\) by a factor of 3 and stretches space in the direction of \(\vv_2\) by a factor of 2, and then \(P\) changes basis back to the standard basis. This is illustrated in Figure 40.5.
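
A minimal SymPy check of this factorization (illustrative only):

import sympy as sp

A = sp.Rational(1, 3) * sp.Matrix([[8, -1], [-2, 7]])
P = sp.Matrix([[-1, 1], [1, 2]])
print(P.inv() * A * P)   # Matrix([[3, 0], [0, 2]])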

In general, if an \(n \times n\) matrix \(A\) is diagonalizable, then there is a basis \(\CB = \{\vv_1, \vv_2, \ldots, \vv_n\}\) of \(\R^n\) consisting of eigenvectors of \(A\text{.}\) Assume that \(A \vv_i = \lambda_i \vv_i\) for each \(i\text{.}\) Letting \(P = [\vv_1 \ \vv_2 \ \ldots \ \vv_n]\) we know that

\begin{equation*} P^{-1}AP = D\text{,} \end{equation*}

where \(D\) is the diagonal matrix with \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_n\) down the diagonal. If \(T\) is the matrix transformation defined by \(T(\vx) = A\vx\text{,}\) then

\begin{equation*} T(\vx) = PDP^{-1} \vx\text{.} \end{equation*}

Now \(P^{-1}\) is a change of basis matrix from the standard basis to the basis \(\CB\text{,}\) and \(D\) stretches or contracts space in the direction of \(\vv_i\) by the factor \(\lambda_i\text{,}\) and then \(P\) changes basis back to the standard basis. In this way we can visualize the action of the matrix transformation using the basis \(\CB\text{.}\) If \(A\) is not a diagonalizable matrix, we can use the Jordan canonical form to understand the action of the transformation defined by \(A\text{.}\) We start by analyzing shears.

Figure 40.6. Left: A shear in the \(x\)-direction. Right: A shear in the direction of the line \(y=x\text{.}\)

Activity 40.6.

(a)

Recall from Section 7 that a matrix transformation \(T\) defined by \(T(\vx) = A\vx\text{,}\) where \(A\) is of the form \(\left[ \begin{array}{cc} 1\amp a\\0\amp 1 \end{array} \right]\) performs a shear in the \(x\) direction, as illustrated at left in Figure 40.6. That is, while \(T(\ve_1) = \ve_1\text{,}\) it is the case that \(T(\ve_2) = \ve_2 + [a \ 0]^{\tr}\text{.}\) In other words, \(T(\ve_2)-\ve_2 = [a \ 0]^{\tr}\) is in \(\Span \{\ve_1\}\text{.}\) But we can say something more. Show that if \(\vx = [x_1 \ x_2]^{\tr}\) is not in \(\Span \{\ve_1\}\text{,}\) then

\begin{equation*} T(\vx) = \vx + x_2[a \ 0]^{\tr}\text{.} \end{equation*}

The result is that if \(\vx\) is not in \(\Span \{\ve_1\}\text{,}\) then \(T(\vx) - \vx\) is in \(\Span \{\ve_1\}\text{.}\) This leads us to a general definition of a shear.

Definition 40.7.

A matrix transformation \(T\) is a shear in the direction of the line \(\ell\) (through the origin) in \(\R^2\) if

  1. \(T(\vx) = \vx\) for all \(\vx\) in \(\ell\) and

  2. \(T(\vx) - \vx\) is in \(\ell\) for all \(\vx\) not in \(\ell\text{.}\)

(b)

Let \(S(\vx) = M\vx\text{,}\) where \(M = \left[ \begin{array}{cr} 3\amp -2\\2\amp -1 \end{array} \right]\text{.}\) Also let \(\vv_1 = [1 \ 1]^{\tr}\) and \(\vv_2 = [1 \ 0]^{\tr}\text{.}\)

(i)

Let \(\vx = \left[ \begin{array}{c} t\\t \end{array} \right]\) for some scalar \(t\text{.}\) Show that \(S(\vx) = \vx\text{.}\) How is this related to the eigenvalues of \(M\text{?}\)

(ii)

Let \(\vx\) be any vector not in \(\Span\{ [1 \ 1]^{\tr}\}\text{.}\) Show that \(S(\vx) - \vx\) is in \(\Span\{ [1 \ 1]^{\tr}\}\text{.}\)

(iii)

Explain why \(S\) is a shear and how \(S\) is related to the image at right in Figure 40.6.

As we did with diagonalizable matrices, we can understand a general matrix transformation of the form \(T(\vx) = A\vx\) by using a Jordan canonical form of \(A\text{.}\) In this context, we will encounter matrices of the form \(B = \left[ \begin{array}{cc} c\amp a\\0\amp c \end{array} \right] = \left[ \begin{array}{cc} c\amp 0 \\ 0\amp c \end{array} \right] \left[ \begin{array}{cc} 1\amp \frac{a}{c} \\ 0\amp 1 \end{array} \right]\) for some positive constant \(c\text{.}\) If \(S(\vx) = B\vx\text{,}\) then \(S\) performs a shear in the direction of \(\ve_1\) and then an expansion or contraction in all directions by a factor of \(c\text{.}\) We illustrate with an example.

Activity 40.7.

Let \(T(\vx) = A\vx\text{,}\) where \(A = \left[ \begin{array}{cr} 3\amp -1\\1\amp 1 \end{array} \right]\text{.}\) The only eigenvalue of \(A\) is \(\lambda = 2\text{,}\) and this eigenvalue has algebraic multiplicity 2 and geometric multiplicity 1. The vector \(\vv_1 = [1 \ 1]^{\tr}\) is an eigenvector for \(A\) with eigenvalue \(2\text{,}\) and \(\vv_2 = [1 \ 0]^{\tr}\) satisfies \((A - 2I_2)\vv_2 = \vv_1\text{.}\) Let \(C = \left[ \begin{array}{cc} 1\amp 1\\1\amp 0 \end{array} \right]\text{.}\)

(a)

Explain why \(T(\vx) = CJC^{-1}\vx\text{,}\) where \(J = \left[ \begin{array}{cc} 2\amp 1\\0\amp 2 \end{array} \right]\text{.}\)

(b)

The matrix \(C\) is a change of basis matrix \(\underset{\CB \leftarrow \CS}{P}\) from some basis \(\CS\) to another basis \(\CB\text{.}\) Specifically identify \(\CS\) and \(\CB\text{.}\)

(c)

If we begin with an arbitrary vector \(\vx\text{,}\) then \([\vx]_{\CS} = \vx\text{.}\) How is \(C \vx\) related to \(\CB\text{?}\)

(d)

Describe in detail what \(J\) does to a vector in the \(\CB\) coordinate system.

Hint.

\(J = \left[ \begin{array}{cc} 2\amp 0\\0\amp 2 \end{array} \right] \left[ \begin{array}{cc} 1\amp \frac{1}{2}\\0\amp 1 \end{array} \right]\text{.}\)

(e)

Put this all together to describe the action of \(T\) as illustrated in Figure 40.8. The word shear should appear in your explanation.

Figure 40.8. Change of basis, a shear, and scaling.

Activity 40.7 provides the essential ideas to understand the geometry of a general linear transformation using the Jordan canonical form. Let \(T(\vx) = A\vx\) with \(A \vv_1 = \lambda \vv_1\) and \(A \vv_2 = \lambda \vv_2 + \vv_1\text{.}\) Then \(T\) maps \(\Span \{\vv_1\}\) to itself, scaling by a factor of \(\lambda\text{,}\) and \(T\) maps \(\vv_2\) to the vector \(\lambda \vv_2 + \vv_1\text{.}\) The matrix \(C\) performs a change of basis from the standard basis to the basis \(\{\vv_1, \vv_2\}\text{,}\) then the matrix \(\left[ \begin{array}{cc} \lambda\amp 1\\0\amp \lambda \end{array} \right]\) performs an expansion or contraction by a factor of \(\lambda\) in all directions and a shear in the direction of \(\vv_1\text{.}\)
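
A short SymPy sketch confirms both the factorization \(A = CJC^{-1}\) from Activity 40.7 and the decomposition of \(J\) into a scaling and a shear:

import sympy as sp

A = sp.Matrix([[3, -1], [1, 1]])   # from Activity 40.7
C = sp.Matrix([[1, 1], [1, 0]])
J = sp.Matrix([[2, 1], [0, 2]])
print(C * J * C.inv() == A)        # True
# J factors as a uniform scaling times a shear
print(sp.diag(2, 2) * sp.Matrix([[1, sp.Rational(1, 2)], [0, 1]]) == J)  # True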

Subsection Proof of the Existence of the Jordan Canonical Form

While we have constructed an algorithm to find a Jordan canonical form of a square matrix, we haven't yet addressed the question of whether every square matrix has a Jordan canonical form. We do that in this section.

Consider that any vector \([a \ b]^{\tr}\) in \(\R^2\) can be written as the sum \([a \ 0]^{\tr} + [0 \ b]^{\tr}\text{.}\) The vector \([a \ 0]^{\tr}\) is a vector in the subspace \(W_1 = \{ [x \ 0]^{\tr} | x \in \R\} = \Span \{[1 \ 0]^{\tr}\}\) and the vector \([0 \ b]^{\tr}\) is in the subspace \(W_2 = \{[0 \ y]^{\tr} | y \in \R\} = \Span\{[0 \ 1]^{\tr}\}\text{.}\) Also notice that \(W_1 \cap W_2 = \{\vzero\}\text{.}\) We can extend this idea to \(\R^3\text{,}\) where any vector \([a \ b \ c]^{\tr}\) can be written as \([a \ 0 \ 0]^{\tr} + [0 \ b \ 0]^{\tr} + [0 \ 0 \ c]^{\tr}\text{,}\) with \([a \ 0 \ 0]^{\tr}\) in \(W_1 = \Span\{[1 \ 0 \ 0]^{\tr}\}\text{,}\) \([0 \ b \ 0]^{\tr}\) in \(W_2 = \Span\{[0 \ 1 \ 0]^{\tr}\}\text{,}\) and \([0 \ 0 \ c]^{\tr}\) in \(W_3 = \Span\{[0 \ 0 \ 1]^{\tr}\}\text{.}\) In this situation we write \(\R^2 = W_1 \oplus W_2\) and \(\R^3 = W_1 \oplus W_2 \oplus W_3\) and say that \(\R^2\) is the direct sum of \(W_1\) and \(W_2\text{,}\) while \(\R^3\) is the direct sum of \(W_1\text{,}\) \(W_2\text{,}\) and \(W_3\text{.}\)

Definition 40.9.

A vector space \(V\) is a direct sum of subspaces \(V_1\text{,}\) \(V_2\text{,}\) \(\ldots\text{,}\) \(V_m\) if every vector \(\vv\) in \(V\) can be written uniquely as a sum

\begin{equation*} \vv = \vv_1+\vv_2+\vv_3+\cdots + \vv_m\text{,} \end{equation*}

with \(\vv_i \in V_i\) for each \(i\text{.}\)

If \(V\) is a direct sum of subspaces \(V_1\text{,}\) \(V_2\text{,}\) \(\ldots\text{,}\) \(V_m\text{,}\) then we write

\begin{equation*} V = V_1 \oplus V_2 \oplus \cdots \oplus V_m\text{.} \end{equation*}

Some useful facts about direct sums are given in the following theorem. The proofs are left for the exercises.

Subsection Nilpotent Matrices and Invariant Subspaces

We will prove the existence of the Jordan canonical form in two steps. In the next subsection, Lemma 40.14 will show that the underlying space decomposes into a direct sum of invariant subspaces, giving every linear transformation a block diagonal matrix representation, and Lemma 40.15 will provide the specific Jordan canonical form. Before we proceed to the lemmas, there are two concepts we need to introduce: nilpotent matrices and invariant subspaces. We don't need these concepts beyond our proof, so we won't spend a lot of time on them.

Activity 40.8.

Let \(A = \left[ \begin{array}{rr} 1\amp 1\\-1\amp -1 \end{array} \right]\) and \(B = \left[ \begin{array}{rcr} 2\amp 1\amp -3 \\ -2\amp 1\amp 1 \\ 2\amp 1\amp -3 \end{array} \right]\text{.}\)

(a)

Calculate the positive integer powers of \(A\) and \(B\text{.}\) What do you notice?

(b)

Compare the eigenvalues of \(A\) to the eigenvalues of \(B\text{.}\) What do you notice?

Activity 40.8 shows that there are some matrices whose powers eventually become the zero matrix, and that there might be some connection to the eigenvalues of these matrices. Such matrices are given a special name.

Definition 40.11.

A square matrix \(A\) is nilpotent if \(A^m =0\) for some positive integer \(m\text{.}\) Correspondingly, a linear transformation \(T\) from an \(n\)-dimensional vector space \(V\) to \(V\) is nilpotent if \(T^m = 0\) for some positive integer \(m\text{.}\)

Nilpotent matrices are the essential obstacle to the diagonalization process. If \(A\) is a nilpotent matrix, the smallest positive integer \(m\) such that \(A^m = 0\) is called the index of \(A\text{.}\)
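
The index is easy to find with technology by computing successive powers. Here is a small sketch (Python with SymPy; the helper name is ours, introduced only for illustration) applied to the matrices of Activity 40.8:

import sympy as sp

A = sp.Matrix([[1, 1], [-1, -1]])
B = sp.Matrix([[2, 1, -3], [-2, 1, 1], [2, 1, -3]])

def nilpotency_index(M, max_power=10):
    # smallest m with M^m = 0, or None if none found up to max_power
    for m in range(1, max_power + 1):
        if (M**m).is_zero_matrix:
            return m
    return None

print(nilpotency_index(A))  # 2
print(nilpotency_index(B))  # 3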

A characterization of nilpotent matrices is given in the following theorem.

The proof is left to the exercises.

We have seen that if \(T: V \to V\) is a linear transformation from a vector space to itself, and if \(\lambda\) is an eigenvalue of \(T\) with eigenvector \(\vv\text{,}\) then \(T(\vv) = \lambda \vv\text{.}\) In other words, \(T\) maps every vector in \(W = \Span\{\vv\}\) to a vector in \(W\text{.}\) When this happens we say that \(W\) is invariant under the transformation \(T\text{.}\)

Definition 40.13.

A subspace \(W\) of a vector space \(V\) is invariant under a linear transformation \(T: V \to V\) if \(T(\vw) \in W\) whenever \(\vw\) is in \(W\text{.}\)

So, for example, every eigenspace of a transformation is invariant under the transformation. Other spaces that are always invariant are \(V\) and \(\{\vzero\}\text{.}\)

Activity 40.9.

Let \(V\) be a vector space and let \(T : V \to V\) be a linear transformation.

(a)

Let \(V = \R^2\) and \(T\) the linear transformation defined by \(T([x \ y]^{\tr}) = [x-y \ y-x]^{\tr}\text{.}\) Find two invariant subspaces besides \(V\) or \(\{\vzero\}\) for \(T\text{.}\)

(b)

Recall that \(\Ker(T) = \{\vv \in V : T(\vv) = \vzero\}\text{.}\) Is \(\Ker(T)\) invariant under \(T\text{?}\) Explain.

(c)

Recall that \(\Range(T) = \{\vw \in V : \vw = T(\vv) \text{ for some } \vv \in V\}\text{.}\) Is \(\Range(T)\) invariant under \(T\text{?}\) Explain.

Subsection The Jordan Canonical Form

We are now ready to prove the existence of the Jordan canonical form.

An example might help illustrate the lemma.

Activity 40.10.

Let \(T:\pol_4 \to \pol_4\) be defined by

\begin{align*} T\left(a_0+a_1t+a_2t^2+a_3t^3+a_4t^4\right) = (2a_0+a_1) \amp + (a_1-a_2)t + (a_0+a_1)t^2\\ \amp \qquad + (-a_0-a_1+a_2+2a_3-a_4)t^3 + (2a_4)t^4\text{.} \end{align*}

The matrix of \(T\) with respect to the standard basis \(\CS = \{1,t,t^2,t^3,t^4\}\) is

\begin{equation*} A = [T]_{\CS} = \left[ \begin{array}{rrrcc} 2\amp 1\amp 0\amp 0\amp 0\\0\amp 1\amp -1\amp 0\amp 0 \\ 1\amp 1\amp 0\amp 0\amp 0 \\ -1\amp -1\amp 1\amp 2\amp -1 \\ 0\amp 0\amp 0\amp 0\amp 2 \end{array} \right]\text{.} \end{equation*}

The eigenvalues of \(A\) (and \(T\)) are \(2\) and \(1\text{,}\) and the algebraic multiplicity of the eigenvalue \(2\) is 2 while its geometric multiplicity is 1, and the algebraic multiplicity of the eigenvalue \(1\) is 3 while its geometric multiplicity is also 1.

For every positive integer \(m\text{,}\) we can find \(\Ker(T-\lambda I)^m\) using \(\Nul (A- \lambda I)^m\text{.}\)

(a)

Technology shows that \(\dim(\Nul (A-I)) = 1\text{,}\) \(\dim(\Nul (A-I)^2) = 2\text{,}\) and \(\dim(\Nul (A-I)^3) = 3\text{.}\) A basis for \(\Nul (A-I)^3\) is \(\CC_1 = \left\{ [-1 \ 1 \ 0 \ 0 \ 0]^{\tr}, [1 \ 0 \ 1 \ 0 \ 0]^{\tr}, [1 \ 0 \ 0 \ 1 \ 0]^{\tr}\right\}\text{.}\) Find a basis \(\CB_1\) for \(\Ker\left((T-I)^3\right)\text{.}\)

(b)

Technology also shows that \(\dim(\Nul (A-2I)) = 1\) and \(\dim(\Nul (A-2I)^2) = 2\text{.}\) A basis for \(\Nul (A-2I)^2\) is \(\CC_2 = \left\{ [0 \ 0 \ 0 \ 1 \ 0]^{\tr}, [0 \ 0 \ 0 \ 0 \ 1]^{\tr}\right\}\text{.}\) Find a basis \(\CB_2\) for \(\Ker\left((T-2I)^2\right)\text{.}\)

(c)

Identify the \(\lambda_i\) and \(s_i\) in Lemma 40.14. Let \(\CB = \CB_1 \cup \CB_2\text{.}\) Find the matrix \([T]_{\CB}\text{.}\)

Since each \(\Ker(T-\lambda_i I)^{s_i}\) in Activity 40.10 is \(T\) invariant, \(T\) maps vectors in \(\Ker(T-\lambda_i I)^{s_i}\) back into \(\Ker(T-\lambda_i I)^{s_i}\text{.}\) So restricting \(T\) to each \(\Ker(T-\lambda_i I)^{s_i}\) provides a matrix \(B_i\text{,}\) and with respect to the union of the bases of these spaces, \(T\) has the matrix

\begin{equation*} \left[ \begin{array}{cccc} B_1 \amp 0 \amp \cdots \amp 0 \\ 0 \amp B_2 \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp \cdots \amp B_r \end{array} \right]\text{,} \end{equation*}

where the \(B_i\) are square matrices corresponding to the eigenvalues of \(T\text{.}\) These blocks are determined by the restriction of \(T\) to the spaces \(\Ker(T-\lambda_i I)^{s_i}\) with respect to the bases we found. To obtain the Jordan canonical form, we need to know that we can always choose these bases to create the correct block matrices. Lemma 40.15 will provide those details.

Proof of Lemma 40.14.

Choose a \(\lambda_i\) and, for convenience, label it \(\lambda\text{.}\) For each positive integer \(j\text{,}\) let \(W_j = \Ker(T - \lambda I)^j\text{.}\) If \((T-\lambda I)^j\vx = \vzero\text{,}\) then

\begin{equation*} (T-\lambda I)^{j+1}(\vx) = (T-\lambda I) (T-\lambda I)^j(\vx) = (T-\lambda I)(\vzero) = \vzero\text{,} \end{equation*}

so \(W_j \subseteq W_{j+1}\text{.}\) Thus we have the containments

\begin{equation*} W_1 \subseteq W_2 \subseteq W_3 \subseteq \cdots \subseteq W_k \subseteq \cdots\text{.} \end{equation*}

Now \(V\) is finite dimensional, so this sequence must reach equality at some integer \(t\text{.}\) That is, \(W_t = W_{t+1} = \cdots\text{.}\) Let \(s\) be the smallest positive integer for which this happens.

We plan to show that \(V = \Ker(T - \lambda I)^s \oplus \Range(T - \lambda I)^s\text{.}\) We begin by demonstrating that \(\Ker(T - \lambda I)^s\cap \Range(T - \lambda I)^s = \{\vzero\}\text{.}\) Let \(\vv \in \Ker(T - \lambda I)^s \cap \Range(T - \lambda I)^s\text{.}\) Then \((T - \lambda I)^s(\vv) = \vzero\) and there exists \(\vu \in V\) such that \((T-\lambda I)^s(\vu) = \vv\text{.}\) It follows that

\begin{equation*} (T-\lambda I)^{2s}(\vu) = (T-\lambda I)^s(T-\lambda I)^{s}(\vu) = (T-\lambda I)^s(\vv) = \vzero\text{.} \end{equation*}

But \(\Ker(T - \lambda I)^{2s} = \Ker(T- \lambda I)^s\text{,}\) so \(\vu\) is also in \(\Ker(T - \lambda I)^{s}\) and

\begin{equation*} \vzero = (T-\lambda I)^{s}(\vu) = \vv\text{.} \end{equation*}

We conclude that \(\Ker(T - \lambda I)^s \cap \Range(T - \lambda I)^s = \{\vzero\}\text{.}\)

Now we will show that \(V = \Ker(T - \lambda I)^s \oplus \Range(T - \lambda I)^s\text{.}\) Let \(\vz = \vz_1 + \vz_2\) with \(\vz_1 \in \Ker(T - \lambda I)^s\) and \(\vz_2 \in \Range(T - \lambda I)^s\text{.}\) First we will show that \(\vz\) is uniquely represented in this way. Suppose \(\vz = \vz'_1+\vz'_2\) with \(\vz'_1 \in \Ker(T - \lambda I)^s\) and \(\vz'_2 \in \Range(T - \lambda I)^s\text{.}\) Then

\begin{equation*} \vz_1+\vz_2 = \vz'_1+\vz'_2 \end{equation*}

and

\begin{equation*} \vz_1-\vz'_1 = \vz'_2-\vz_2\text{.} \end{equation*}

But \(\Ker(T - \lambda I)^s \cap \Range(T - \lambda I)^s = \{\vzero\}\text{,}\) so \(\vz_1-\vz'_1 = \vzero\) and \(\vz'_2-\vz_2=\vzero\) which means \(\vz_1=\vz'_1\) and \(\vz_2 = \vz'_2\text{.}\) Now let \(Z = \Ker(T - \lambda I)^s \oplus \Range(T - \lambda I)^s\text{.}\) We then know that

\begin{equation*} \dim(Z) = \dim\left(\Ker(T - \lambda I)^s\right) + \dim\left(\Range(T - \lambda I)^s\right)\text{.} \end{equation*}

Also, the Rank-Nullity Theorem shows that

\begin{equation*} \dim(V) = \dim\left(\Ker(T - \lambda I)^s\right) + \dim\left(\Range(T - \lambda I)^s\right)\text{.} \end{equation*}

So \(Z\) is a subspace of \(V\) with \(\dim(Z) = \dim(V)\text{.}\) We conclude that \(Z = V\) and that

\begin{equation*} V = \Ker(T - \lambda I)^s \oplus \Range(T - \lambda I)^s\text{.} \end{equation*}

Next we demonstrate that \(\Ker(T - \lambda I)^s\) and \(\Range(T - \lambda I)^s\) are invariant under \(T\text{.}\) Note that

\begin{equation*} T(T - \lambda I) = T^2 - \lambda T = (T-\lambda I)T\text{,} \end{equation*}

and \(T\) commutes with \((T - \lambda I)\text{.}\) By induction, \(T\) commutes with \((T - \lambda I)^s\text{.}\) Suppose that \(\vv \in \Ker(T - \lambda I)^s\text{.}\) Then

\begin{equation*} (T - \lambda I)^sT(\vv) = T(T - \lambda I)^s(\vv) = T(\vzero) = \vzero\text{.} \end{equation*}

So \(T(\vv) \in \Ker(T - \lambda I)^s\text{.}\) Similarly, suppose that \(\vv \in \Range(T - \lambda I)^s\text{.}\) Then there is a \(\vu \in V\) such that \((T - \lambda I)^s(\vu) = \vv\text{.}\) Then

\begin{equation*} T(\vv) = T\left((T - \lambda I)^s(\vu)\right) = \left(T(T - \lambda I)^s\right)(\vu) = \left((T - \lambda I)^sT\right)(\vu) = (T - \lambda I)^s(T(\vu))\text{,} \end{equation*}

and \(T(\vv) \in \Range(T - \lambda I)^s\text{.}\)

We conclude our proof by induction on the number \(r\) of eigenvalues of \(T\text{.}\) Suppose that \(r=1\) and so \(T\) has exactly one eigenvalue \(\lambda\text{.}\) Then \(T-\lambda I\) has only zero as an eigenvalue (otherwise, there is \(\mu \neq 0\) such that \((T - \lambda I) - \mu I = T - (\lambda+\mu)I\) has a nontrivial kernel, which makes \(\lambda+\mu\) an eigenvalue of \(T\)). In this situation, \(T - \lambda I\) is nilpotent and so \((T - \lambda I)^t = 0\) for some positive integer \(t\text{.}\) If \(s\) is the smallest such power, then \(V = \Ker(T - \lambda I)^s\) and \(\Range(T - \lambda I)^s = \{\vzero\}\text{.}\) So every vector in \(V\) is in \(\Ker(T - \lambda I)^s\) and

\begin{equation*} V = \Ker(T - \lambda I)^s \oplus \Range(T - \lambda I)^s = \Ker(T - \lambda I)^s\text{.} \end{equation*}

Thus, the statement is true when \(r=1\text{.}\) Assume that the statement is true for linear transformations with fewer than \(r\) eigenvalues. Now assume that \(T\) has distinct eigenvalues \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_r\text{.}\) By our previous work, we know that

\begin{equation*} V = \Ker(T - \lambda_1 I)^{s_1} \oplus \Range(T - \lambda_1 I)^{s_1} \end{equation*}

for some positive integer \(s_1\text{.}\) Let \(V_1 = \Range(T - \lambda_1 I)^{s_1}\text{.}\) Since \(\Range(T - \lambda_1 I)^{s_1}\) is \(T\) invariant, we know that \(T\) maps \(V_1\) to \(V_1\text{.}\) The eigenvalues of \(T\) on \(V_1\) are \(\lambda_2\text{,}\) \(\lambda_3\text{,}\) \(\ldots\text{,}\) \(\lambda_r\text{.}\) By our induction hypothesis, we have

\begin{equation*} V_1 = \Ker(T-\lambda_2 I)^{s_2} \oplus \Ker(T-\lambda_3 I)^{s_3} \oplus \cdots \oplus \Ker(T-\lambda_r I)^{s_r}\text{,} \end{equation*}

for some positive integers \(s_2\text{,}\) \(s_3\text{,}\) \(\ldots\text{,}\) \(s_r\text{,}\) which makes

\begin{equation*} V = \Ker(T-\lambda_1 I)^{s_1} \oplus \Ker(T-\lambda_2 I)^{s_2} \oplus \Ker(T-\lambda_3 I)^{s_3} \oplus \cdots \oplus \Ker(T-\lambda_r I)^{s_r}\text{.} \end{equation*}

Lemma 40.14 tells us that \(T\) has a block diagonal matrix representation, with one block for each eigenvalue. To obtain a Jordan canonical form, we need to identify the correct bases for the summands of \(V\text{.}\) (The lemma is due to Mark Wildon, from “A short proof of the existence of Jordan normal form”.)

Notice the similarity of Lemma 40.15 to chains of generalized eigenvectors. An example might help illustrate Lemma 40.15.

Activity 40.11.

Let \(T: \pol_5 \to \pol_5\) be defined by

\begin{equation*} T(a_0+a_1t+a_2t^2+a_3t^3+a_4t^4+a_5t^5) = (-a_1-a_4+a_5)t + (-a_0-a_1+a_3-a_4+a_5)t^2 + (a_1+a_4)t^4\text{.} \end{equation*}

Let \(\CS = \{1,t,t^2,t^3,t^4,t^5\}\) be the standard basis for \(\pol_5\text{.}\) Then

\begin{equation*} A = [T]_{\CS} = \left[ \begin{array}{rrccrc} 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp -1\amp 0\amp 0\amp -1\amp 1 \\ -1\amp -1\amp 0\amp 1\amp -1\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 1\amp 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

Technology shows that the only eigenvalue of \(A\) is \(0\) and that the geometric multiplicity of \(0\) is \(3\text{.}\) Since \(0\) is the only eigenvalue of \(A\text{,}\) we know that \(A\) (and \(T\)) is nilpotent. Using technology we find that the reduced row echelon forms of \(A\) and \(A^2\) are, respectively,

\begin{equation*} \left[ \begin{array}{cccrcc} 1\amp 0\amp 0\amp -1\amp 0\amp 0 \\ 0\amp 1\amp 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right], \ \left[ \begin{array}{cccrcc} 0\amp 0\amp 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right]\text{,} \end{equation*}

while \(A^3 = 0\text{.}\) We see that \(\dim(\Ker(T^3)) = \dim(\Nul A^3) = 6\) while \(\dim(\Ker(T^2)) = \dim(\Nul A^2) = 5\text{.}\)

(a)

Notice that the vector \(\vv_1 = [0 \ 0 \ 0 \ 0 \ 0 \ 1]^{\tr}\) is in \(\Nul A^3\) but not in \(\Nul A^2\text{.}\) Use this vector to construct one chain \(u_1\text{,}\) \(T(u_1)\text{,}\) and \(T^2(u_1)\) of generalized eigenvectors starting with a vector \(u_1\) that is in \(\Ker(T^3)\) but not in \(\Ker(T^2)\text{.}\) What can we say about the vector \(T^2(u_1)\) in relation to eigenvectors of \(T\text{?}\)

(b)

We know two other eigenvectors of \(T\text{,}\) so we need another chain of generalized eigenvectors to provide a basis of \(\pol_5\) of generalized eigenvectors. Use the fact that \(\vv_2 = [1 \ 0 \ 0 \ 0 \ 0 \ 0]^{\tr}\) is in \(\Nul A^2\) but not in \(\Nul A\) to find another generalized eigenvector \(u_2\) in \(\Ker(T^2)\) that is not in \(\Ker(T)\text{.}\) Then create a chain \(u_2\) and \(T(u_2)\) of generalized eigenvectors. What is true about \(T(u_2)\) in relation to eigenvectors of \(T\text{?}\)

(c)

Let \(u_3 = 1+t^3\) be a third eigenvector of \(T\text{.}\) Explain why \(\{u_1, T(u_1), T^2(u_1), u_2, T(u_2), u_3\}\) is a basis of \(\pol_5\text{.}\) Identify the values of \(k\) and the \(a_i\) in Lemma 40.15.

Notice that if we let \(C = [[T^2(\vu_1)]_{\CS} \ [T(\vu_1)]_{\CS} \ [\vu_1]_{\CS} \ [T(\vu_2)]_{\CS} \ [\vu_2]_{\CS} \ [\vu_3]_{\CS}]\) using the vectors from Activity 40.11, we should find that

\begin{equation*} C^{-1}AC = \left[ \begin{array}{cccccc} 0\amp 1\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 1\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 0 \end{array} \right]\text{,} \end{equation*}

and this basis provides a matrix that produces a Jordan canonical form of \(A\text{.}\)
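
This computation can be verified with a short SymPy sketch using the coordinate vectors found in Activity 40.11 (an illustration; the columns of \(C\) list each chain from the eigenvector up):

import sympy as sp

A = sp.Matrix([
    [ 0,  0, 0, 0,  0, 0],
    [ 0, -1, 0, 0, -1, 1],
    [-1, -1, 0, 1, -1, 1],
    [ 0,  0, 0, 0,  0, 0],
    [ 0,  1, 0, 0,  1, 0],
    [ 0,  0, 0, 0,  0, 0]])         # the matrix [T]_S from Activity 40.11

u1 = sp.Matrix([0, 0, 0, 0, 0, 1])  # in Nul A^3 but not in Nul A^2
u2 = sp.Matrix([1, 0, 0, 0, 0, 0])  # in Nul A^2 but not in Nul A
u3 = sp.Matrix([1, 0, 0, 1, 0, 0])  # the eigenvector 1 + t^3

C = sp.Matrix.hstack(A**2 * u1, A * u1, u1, A * u2, u2, u3)
print(C.inv() * A * C)              # the nilpotent Jordan form displayed above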

Lemma 40.15 provides the sequences of generalized eigenvectors that we need to make the block matrices in Jordan canonical form. This works as follows. Start, for example, with \(\vu_1\text{,}\) \(T(\vu_1)\text{,}\) \(\ldots\text{,}\) \(T^{a_1-1}(\vu_1)\text{.}\) Since \(T^s = 0\text{,}\) we know that \(T\) is nilpotent and so has only \(0\) as an eigenvalue. If \(T\) has a nonzero eigenvalue \(\lambda\text{,}\) we replace \(T\) with \(T - \lambda I\text{.}\)

Let \(\CB = \{T^{a_1-1}(\vu_1), T^{a_1-2}(\vu_1), \ldots, T^2(\vu_1), T(\vu_1), \vu_1\}\text{.}\) Then

\begin{align*} [T(T^{a_1-1}(\vu_1))]_{\CB} \amp = [T^{a_1}(\vu_1)]_{\CB} = [0 \ 0 \ 0 \ \ \ldots \ 0 \ 0]^{\tr}\\ [T(T^{a_1-2}(\vu_1))]_{\CB} \amp = [T^{a_1-1}(\vu_1)]_{\CB} = [1 \ 0 \ 0 \ \ \ldots \ 0 \ 0]^{\tr}\\ [T(T^{a_1-3}(\vu_1))]_{\CB} \amp = [T^{a_1-2}(\vu_1)]_{\CB} = [0 \ 1 \ 0 \ \ \ldots \ 0 \ 0]^{\tr}\\ \amp \vdots\\ [T(T^2(\vu_1))]_{\CB} \amp = [T^3(\vu_1)]_{\CB} = [0 \ 0 \ 0 \ \ \ldots \ 0 \ 1 \ 0 \ 0 \ 0]^{\tr}\\ [T(T(\vu_1))]_{\CB} \amp = [T^2(\vu_1)]_{\CB} = [0 \ 0 \ 0 \ \ \ldots \ 0 \ 0 \ 1 \ 0 \ 0]^{\tr}\\ [T(\vu_1)]_{\CB} \amp = [T(\vu_1)]_{\CB} = [0 \ 0 \ 0 \ \ \ldots \ 0 \ 0 \ 0 \ 1 \ 0]^{\tr}\text{.} \end{align*}

This makes

\begin{equation*} [T]_{\CB} = \left[ \begin{array}{cccccccc} 0\amp 1\amp 0\amp 0\amp \cdots \amp 0 \amp 0\amp 0 \\ 0\amp 0\amp 1\amp 0\amp \cdots \amp 0\amp 0\amp 0 \\ \amp \amp \ddots\amp \ddots \amp \amp \amp \amp \\ \amp \amp \amp \amp \vdots \amp \amp \amp \\ 0\amp 0\amp 0\amp 0\amp \cdots \amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 0\amp \cdots \amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0\amp \cdots \amp 0\amp 0\amp 0 \end{array} \right]\text{,} \end{equation*}

which gives one Jordan block.

Proof of Lemma 40.15.

If \(T(\vx) = \vzero\) for all \(\vx \in V\text{,}\) then we can choose \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_{\dim(V)}\) to form any basis of \(V\) and all \(a_i = 1\text{.}\) So we can assume that \(T\) is a nonzero transformation. Also, we claim that \(T(V)\) is a proper subset of \(V\) (recall that \(T(V)\) is the same as \(\Range(T)\)). If not, \(V = T(V) = T^2(V) = \cdots = T^m(V)\) for any positive integer \(m\text{.}\) But this contradicts the fact that \(T^s = 0\text{.}\)

We proceed by induction on \(\dim(V)\text{.}\) If \(\dim(V) = 1\text{,}\) since \(T(V)\) is a proper subset of \(V\) the only possibility is that \(T(V) = \{\vzero\}\text{.}\) We have already discussed this case. For the inductive step, assume that the lemma is true for any vector space of positive dimension less than \(\dim(V)\text{.}\) Our assumption that \(T\) is a nonzero transformation allows us to conclude that \(\{\vzero\} \subset T(V) \subset V\text{.}\) Thus, \(1 \leq \dim(T(V)) \lt \dim(V)\text{.}\) We apply the inductive hypothesis to the transformation \(T: T(V) \to T(V)\) to find integers \(b_1\text{,}\) \(b_2\text{,}\) \(\ldots\text{,}\) \(b_{\ell}\) and vectors \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_{\ell}\) in \(T(V)\) such that the vectors

\begin{equation} \vv_1, T(\vv_1), \ldots, T^{b_1-1}(\vv_1), \ldots, \vv_{\ell}, T(\vv_{\ell}), \ldots, T^{b_{\ell}-1}(\vv_{\ell})\tag{40.4} \end{equation}

form a basis for \(T(V)\) and \(T^{b_i}(\vv_i) = \vzero\) for \(1 \leq i \leq \ell\text{.}\)

Now \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_{\ell}\) are in \(T(V)\text{,}\) so there exist \(\vu_1\text{,}\) \(\vu_2\text{,}\) \(\ldots\text{,}\) \(\vu_{\ell}\) in \(V\) such that \(T(\vu_i) = \vv_i\) for each \(1 \leq i \leq \ell\text{.}\) This implies that \(T^j(\vu_i) = T^{j-1}(T(\vu_i)) = T^{j-1}(\vv_i)\) for all \(i\) and all positive integers \(j\text{.}\) The vectors \(T^{b_1-1}(\vv_1)\text{,}\) \(T^{b_2-1}(\vv_2)\text{,}\) \(\ldots\text{,}\) \(T^{b_{\ell}-1}(\vv_{\ell})\) are linearly independent and \(T\left(T^{b_i-1}(\vv_i)\right) = T^{b_i}(\vv_i) = 0\) for \(1 \leq i \leq \ell\text{,}\) so the vectors \(T^{b_1-1}(\vv_1)\text{,}\) \(T^{b_2-1}(\vv_2)\text{,}\) \(\ldots\text{,}\) \(T^{b_{\ell}-1}(\vv_{\ell})\) are all in \(\Ker(T)\text{.}\) Extend the set \(\left\{T^{b_1-1}(\vv_1), T^{b_2-1}(\vv_2), \ldots, T^{b_{\ell}-1}(\vv_{\ell})\right\}\) to a basis of \(\Ker(T)\) with the vectors \(\vw_1\text{,}\) \(\vw_2\text{,}\) \(\ldots\text{,}\) \(\vw_m\) for \(m = \dim(\Ker(T)) - \ell\text{.}\) That is, the set

\begin{equation} \left\{T^{b_1-1}(\vv_1), T^{b_2-1}(\vv_2), \ldots, T^{b_{\ell}-1}(\vv_{\ell}), \vw_1, \vw_2, \ldots, \vw_m\right\}\tag{40.5} \end{equation}

is a basis for \(\Ker(T)\text{.}\) We will now show that the vectors

\begin{equation} \vu_1, T(\vu_1), \ldots, T^{b_1}(\vu_1), \ldots, \vu_{\ell}, T(\vu_{\ell}), \ldots, T^{b_{\ell}}(\vu_{\ell}), \vw_1, \ldots, \vw_m\tag{40.6} \end{equation}

form a basis for \(V\text{.}\) To demonstrate linear independence, suppose that

\begin{align} c_{1,0}\vu_1\amp +c_{1,1}T(\vu_1)+ \cdots + c_{1,b_1}T^{b_1}(\vu_1) + \cdots\notag\\ \amp + c_{\ell,0}\vu_{\ell} + c_{\ell,1}T(\vu_{\ell})+ \cdots + c_{\ell,b_{\ell}}T^{b_{\ell}}(\vu_{\ell}) + d_1\vw_1+ \cdots + d_m\vw_m = \vzero\tag{40.7} \end{align}

for some scalars \(c_{i,j}\) and \(d_k\text{.}\) Apply \(T\) to this linear combination to obtain the vector equation

\begin{align*} c_{1,0}T(\vu_1)\amp +c_{1,1}T^2(\vu_1)+ \cdots + c_{1,b_1}T^{b_1+1}(\vu_1) + \cdots\\ \amp + c_{\ell,0}T(\vu_{\ell}) + c_{\ell,1}T^2(\vu_{\ell})+ \cdots + c_{\ell,b_{\ell}}T^{b_{\ell}+1}(\vu_{\ell}) + d_1T(\vw_1)+ \cdots \\ \amp + d_mT(\vw_m) = \vzero\text{.} \end{align*}

Using the relationship \(T^j(\vu_i) = T^{j-1}(\vv_i)\) gives us the equation

\begin{align*} c_{1,0}\vv_1\amp +c_{1,1}T(\vv_1)+ \cdots + c_{1,b_1}T^{b_1}(\vv_1) + \cdots\\ \amp + c_{\ell,0}\vv_{\ell} + c_{\ell,1}T(\vv_{\ell})+ \cdots + c_{\ell,b_{\ell}}T^{b_{\ell}}(\vv_{\ell}) + d_1T(\vw_1)+ \cdots + d_mT(\vw_m) = \vzero\text{.} \end{align*}

Recall that \(T^{b_i}(\vv_i) = 0\) and that \(\vw_1\text{,}\) \(\vw_2\text{,}\) \(\ldots\text{,}\) \(\vw_m\) are in \(\Ker(T)\) to obtain the equation

\begin{align*} c_{1,0}\vv_1 \amp +c_{1,1}T(\vv_1)+ \cdots + c_{1,b_1-1}T^{b_1-1}(\vv_1) + \cdots + c_{\ell,0}\vv_{\ell} + c_{\ell,1}T(\vv_{\ell})+ \cdots\\ \amp + c_{\ell,b_{\ell}-1}T^{b_{\ell}-1}(\vv_{\ell}) = \vzero\text{.} \end{align*}

But this final equation is a linear combination of the basis elements in (40.4) of \(T(V)\text{,}\) and so the scalars are all \(0\text{.}\) Replacing these scalars with \(0\) in (40.7) results in

\begin{equation*} c_{1,b_1}T^{b_1}(\vu_1) + \cdots + c_{\ell,b_{\ell}}T^{b_{\ell}}(\vu_{\ell}) + d_1\vw_1+ \cdots + d_m\vw_m = 0\text{.} \end{equation*}

But this is a linear combination of vectors in a basis for \(\Ker(T)\) and so all of the scalars are also \(0\text{.}\) Hence, the vectors \(\vu_1\text{,}\) \(T(\vu_1)\text{,}\) \(\ldots\text{,}\) \(T^{b_1}(\vu_1)\text{,}\) \(\ldots\text{,}\) \(\vu_{\ell}\text{,}\) \(T(\vu_{\ell})\text{,}\) \(\ldots\text{,}\) \(T^{b_{\ell}}(\vu_{\ell})\text{,}\) \(\vw_1\text{,}\) …, \(\vw_m\) are linearly independent.

The Rank-Nullity Theorem tells us that \(\dim(V) = \dim(T(V)) + \dim(\Ker(T))\text{.}\) The vectors in (40.5) form a basis for \(\Ker(T)\text{,}\) and so \(\dim(\Ker(T)) = \ell + m\text{.}\) The vectors in (40.4) form a basis for \(T(V)\text{,}\) so \(\dim(T(V)) = b_1+b_2+ \cdots + b_{\ell}\text{.}\) Thus,

\begin{equation*} \dim(V) = \ell+m + b_1+b_2+ \cdots + b_{\ell} = m + (b_1+1) + (b_2+1) + \cdots + (b_{\ell}+1)\text{.} \end{equation*}

But this is exactly the number of vectors in our claimed basis (40.6). This verifies Lemma 40.15 with \(k=\ell+m\text{,}\) \(a_i=b_i+1\) for \(1 \leq i \leq \ell\text{,}\) and \(\vu_{j+\ell} = \vw_j\) and \(a_{j+\ell} = 1\) for \(1 \leq j \leq m\text{.}\)

We return to Activity 40.10 to illustrate the use of Lemma 40.15.

Example 40.16.

We work with the transformation \(T:\pol_4 \to \pol_4\) defined by

\begin{align*} T\left(a_0+a_1t+a_2t^2+a_3t^3+a_4t^4\right) = (2a_0+a_1) \amp + (a_1-a_2)t + (a_0+a_1)t^2\\ \amp + (-a_0-a_1+a_2+2a_3-a_4)t^3 \\ \amp + (2a_4)t^4\text{.} \end{align*}

Recall from Activity 40.10 that

\begin{equation*} \pol_4 = \Ker(T-I)^{3} \oplus \Ker(T-2 I)^{2}\text{.} \end{equation*}

and a basis for \(\Ker(T-I)^3\) is \(\CB_1 = \{p_1(t), p_2(t), p_3(t)\}\) with \(p_1(t) = -1+t\text{,}\) \(p_2(t) = 1+t^2\text{,}\) and \(p_3(t)= 1+t^3\text{.}\) Since

\begin{equation*} (T-I)(p_1(t)) = 0, \ (T-I)(p_2(t)) = -p_1(t), \ \text{ and } \ (T-I)(p_3(t)) = p_2(t) \end{equation*}

we see that \(T-I\) maps \(\Ker(T-I)^3\) to \(\Ker(T-I)^3\text{.}\) The matrix of \(T-I\) with respect to \(\CB_1\) is

\begin{equation*} [T-I]_{\CB_1} = \left[ \begin{array}{crc} 0\amp -1\amp 0 \\ 0\amp 0\amp 1 \\ 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

The reduced row echelon form of \([T-I]_{\CB_1}\) and \([T-I]_{\CB_1}^2\) are

\begin{equation*} \left[ \begin{array}{ccc} 0\amp 1\amp 0 \\ 0\amp 0\amp 1 \\ 0\amp 0\amp 0 \end{array} \right] \ \text{ and } \ \left[ \begin{array}{ccr} 0\amp 0\amp 1 \\ 0\amp 0\amp 0 \\ 0\amp 0\amp 0 \end{array} \right]\text{,} \end{equation*}

while \([T-I]_{\CB_1}^3 = 0\text{,}\) so \((T-I)^3 = 0\) on \(\Ker(T-I)^3\text{.}\) We apply Lemma 40.15 to \(T-I : \Ker(T-I)^3 \to \Ker(T-I)^3\text{.}\) We choose \(u_1\) to be a vector in \(\Ker(T-I)^3\) that is not in \(\Ker(T-I)^2\text{.}\) One such vector has \([u_1]_{\CB_1} = [0 \ 0 \ 1]^{\tr}\text{,}\) or \(u_1=1+t^3\text{.}\) We then let \(u_2 = (T-I)(u_1) = 1+t^2\) and \(u_3 = (T-I)(u_2) = 1-t\text{.}\) This gives us the basis \(\{1-t, 1+t^2, 1+t^3\}\) for \(\Ker(T-I)^3\text{.}\)

We can also apply Lemma 40.15 to \(T-2I : \Ker(T-2I)^2 \to \Ker(T-2I)^2\text{,}\) using the basis \(\CB_2 = \{p_4(t), p_5(t)\}\) of \(\Ker(T-2I)^2\text{,}\) where \(p_4(t) = t^3\) and \(p_5(t) = t^4\text{.}\) Since

\begin{equation*} (T-2I)(p_4(t)) = 0 \ \text{ and } \ (T-2I)(p_5(t)) = -p_4(t)\text{,} \end{equation*}

we have that

\begin{equation*} [T-2I]_{\CB_2} = \left[ \begin{array}{cr} 0\amp -1 \\ 0\amp 0 \end{array} \right]\text{.} \end{equation*}

It follows that \((T-2I)^2 = 0\text{.}\) Selecting \(u_4 = t^4\) and letting \(u_5 = (T-2I)(u_4) = -p_4(t)\text{,}\) we obtain the basis \(\{-t^3, t^4\}\) for \(\Ker(T-2I)^2\text{.}\)

Let \(q_1(t) = 1-t\text{,}\) \(q_2(t) = 1+t^2\text{,}\) \(q_3(t) = 1+t^3\text{,}\) \(q_4(t) = -t^3\text{,}\) and \(q_5(t) = t^4\text{,}\) and let \(\CC = \{q_1(t), q_2(t), q_3(t), q_4(t), q_5(t)\}\text{.}\) Since

\begin{align*} T(q_1(t)) \amp = q_1(t)\\ T(q_2(t)) \amp = q_1(t)+q_2(t)\\ T(q_3(t)) \amp = q_2(t)+q_3(t)\\ T(q_4(t)) \amp = 2q_4(t)\\ T(q_5(t)) \amp = q_4(t)+2q_5(t) \end{align*}

it follows that

\begin{equation*} [T]_{\CC} = \left[ \begin{array}{ccccc} 1\amp 1\amp 0\amp 0\amp 0 \\ 0\amp 1\amp 1\amp 0\amp 0 \\ 0\amp 0\amp 1\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 2\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 2 \end{array} \right]\text{,} \end{equation*}

and we have found a basis of \(\pol_4\) with respect to which the matrix of \(T\) is a Jordan canonical form.
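As an independent check on this example, we can compute a Jordan canonical form of the matrix of \(T\) relative to the standard basis \(\{1, t, t^2, t^3, t^4\}\) with software. Below is a minimal sketch in Python, assuming the sympy library is available; sympy may order the Jordan blocks differently than we did.

    import sympy as sp

    # Matrix of T relative to the standard basis {1, t, t^2, t^3, t^4},
    # with rows read off from the coefficient formulas that define T.
    M = sp.Matrix([
        [ 2,  1,  0, 0,  0],   # 2a_0 + a_1
        [ 0,  1, -1, 0,  0],   # a_1 - a_2
        [ 1,  1,  0, 0,  0],   # a_0 + a_1
        [-1, -1,  1, 2, -1],   # -a_0 - a_1 + a_2 + 2a_3 - a_4
        [ 0,  0,  0, 0,  2],   # 2a_4
    ])
    C, J = M.jordan_form()   # M = C * J * C**(-1)
    sp.pprint(J)             # a 3x3 block for eigenvalue 1 and a 2x2 block for eigenvalue 2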

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 40.17.

Find a Jordan form \(J\) for each of the following matrices. Find a matrix \(C\) such that \(C^{-1}AC = J\text{.}\)

(a)

\(A = \left[ \begin{array}{cc} 1\amp 1\\1\amp 1 \end{array} \right]\)

Solution.

The eigenvalues of \(A\) are \(2\) and \(0\) with corresponding eigenvectors \([1 \ 1]^{\tr}\) and \([-1 \ 1]^{\tr}\text{.}\) Since we have a basis for \(\R^2\) consisting of eigenvectors for \(A\text{,}\) we know that \(A\) is diagonalizable. Moreover, the Jordan canonical form of \(A\) is \(J = \left[ \begin{array}{cc} 2\amp 0\\0\amp 0 \end{array} \right]\) and \(C^{-1}AC = J\text{,}\) where \(C = \left[ \begin{array}{cr} 1\amp -1\\1\amp 1 \end{array} \right]\text{.}\)

(b)

\(A = \left[ \begin{array}{ccc} 0\amp 1\amp 1\\0\amp 0\amp 0 \\ 0\amp 0\amp 0 \end{array} \right]\)

Solution.

Since \(A\) is upper triangular, its eigenvalues are the diagonal entries. So the only eigenvalue of \(A\) is 0, and technology shows that this eigenvalue has geometric multiplicity 2. An eigenvector for \(A\) is \(\vv_1 = [1 \ 0 \ 0]^{\tr}\text{.}\) A vector \(\vv_2\) that satisfies \(A \vv_2 = \vv_1\) is \(\vv_2 = [0 \ 0 \ 1]^{\tr}\text{.}\) Letting \(C = \left[ \begin{array}{ccr} 1\amp 0\amp 0\\0\amp 0\amp -1\\0\amp 1\amp 1 \end{array} \right]\) gives us

\begin{equation*} C^{-1}AC = \left[ \begin{array}{ccc} 0\amp 1\amp 0 \\ 0\amp 0\amp 0 \\ 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}
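A quick numerical check of this computation, assuming Python with numpy:

    import numpy as np

    A = np.array([[0, 1, 1],
                  [0, 0, 0],
                  [0, 0, 0]])
    C = np.array([[1, 0,  0],
                  [0, 0, -1],
                  [0, 1,  1]])
    J = np.linalg.inv(C) @ A @ C
    print(np.round(J))   # [[0. 1. 0.], [0. 0. 0.], [0. 0. 0.]]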
(c)

\(A = \left[ \begin{array}{cccc} 1\amp 1\amp 1\amp 1\\0\amp 1\amp 0\amp 0\\0\amp 0\amp 2\amp 2 \\ 0\amp 0\amp 0\amp 2 \end{array} \right]\)

Solution.

Again, \(A\) is upper triangular, so the eigenvalues of \(A\) are \(2\) and \(1\text{,}\) both of algebraic multiplicity 2 and geometric multiplicity 1. Technology shows that the reduced row echelon forms of \(A - 2I_4\) and \((A-2I_4)^2\) are

\begin{equation*} \left[ \begin{array}{ccrc} 1\amp 0\amp -1\amp 0 \\ 0\amp 1\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0 \end{array} \right] \ \text{ and } \ \left[ \begin{array}{ccrc} 1\amp 0\amp -1\amp 1 \\ 0\amp 1\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

Now \(\vv_2 = [-1 \ 0 \ 0 \ 1]^{\tr}\) is in \(\Nul (A-2I_4)^2\text{,}\) and

\begin{equation*} \vv_1 = (A-2I_4)\vv_2 = [2 \ 0 \ 2 \ 0]^{\tr}\text{.} \end{equation*}

Notice that \(\vv_1\) is an eigenvector of \(A\) with eigenvalue \(2\text{.}\) Technology also shows that the reduced row echelon forms of \(A-I_4\) and \((A-I_4)^2\) are

\begin{equation*} \left[ \begin{array}{cccc} 0\amp 1\amp 0\amp 0 \\ 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0 \end{array} \right] \ \text{ and } \ \left[ \begin{array}{cccc} 0\amp 0\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 1 \\ 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0 \end{array} \right]\text{.} \end{equation*}

Now \(\vv_4 = [0 \ 1 \ 0 \ 0]^{\tr}\) is in \(\Nul (A-I_4)^2\text{,}\) and

\begin{equation*} \vv_3 = (A-I_4)\vv_4 = [1 \ 0 \ 0 \ 0]^{\tr}\text{.} \end{equation*}

Notice that \(\vv_3\) is an eigenvector of \(A\) with eigenvalue \(1\text{.}\) Letting \(C = \left[ \begin{array}{crcc} 2\amp -1\amp 1\amp 0 \\ 0\amp 0\amp 0\amp 1 \\ 2\amp 0\amp 0\amp 0 \\ 0\amp 1\amp 0\amp 0 \end{array} \right]\) gives us

\begin{equation*} C^{-1}AC = \left[ \begin{array}{cccc} 2\amp 1\amp 0\amp 0 \\ 0\amp 2\amp 0\amp 0 \\ 0\amp 0\amp 1\amp 1 \\ 0\amp 0\amp 0\amp 1 \end{array} \right]\text{.} \end{equation*}
Figure 40.18. The effect of the transformation \(T\text{.}\)

Example 40.19.

Let \(T(\vx) = A\vx\text{,}\) where \(A = \left[ \begin{array}{crc} 0\amp 1\amp 1 \\ 0\amp 1\amp 2 \\ 1\amp -1\amp 2 \end{array} \right]\text{.}\) Assume that \(P^{-1}AP = \left[ \begin{array}{ccc} 1\amp 1\amp 0 \\ 0\amp 1\amp 1 \\ 0\amp 0\amp 1 \end{array} \right]\text{,}\) where \(P = \left[ \begin{array}{ccc} 4\amp 0\amp 1 \\ 4\amp 2\amp 0 \\ 0\amp 2\amp 1 \end{array} \right]\text{.}\) Find a specific coordinate system in which it is possible to succinctly describe the action of \(T\text{,}\) then describe the action of \(T\) on \(\R^3\) in as much detail as possible.

Solution.

First note that \(1\) is the only eigenvalue of \(A\text{.}\) Since \(A\) does not have \(0\) as an eigenvalue, it follows that \(A\) is invertible and so \(T\) is both one-to-one and onto. Let \(\vv_1 = [4 \ 4 \ 0]^{\tr}\text{,}\) \(\vv_2 = [0 \ 2 \ 2]^{\tr}\text{,}\) and \(\vv_3 = [1 \ 0 \ 1]^{\tr}\text{.}\) We have that

\begin{align*} (A-I_3)\vv_3 \amp = \vv_2\\ (A-I_3)\vv_2 \amp = \vv_1\\ (A-I_3) \vv_1 \amp = \vzero\text{,} \end{align*}

and so

\begin{align*} T(\vv_3) \amp = A \vv_3 = \vv_3+\vv_2\\ T(\vv_2) \amp = A \vv_2 = \vv_2+\vv_1\\ T(\vv_1) \amp = A\vv_1 = \vv_1\text{.} \end{align*}

If we consider the coordinate system in \(\R^3\) defined by the basis \(\{\vv_1, \vv_2, \vv_3\}\) as shown in blue in Figure 40.18, the fact that \(T(\vv_1) = \vv_1\) shows that \(T\) fixes all vectors in \(\Span\{\vv_1\}\text{.}\) That \(T(\vv_2) = \vv_2+\vv_1\) tells us that \(T\) maps \(\Span\{\vv_2\}\) onto \(\Span\{\vv_1+\vv_2\}\text{,}\) and \(T(\vv_3) = \vv_3+\vv_2\) shows that \(T\) maps \(\Span\{\vv_3\}\) onto \(\Span\{\vv_2+\vv_3\}\text{.}\) So \(T\) sends the box defined by \(\vv_1\text{,}\) \(\vv_2\text{,}\) and \(\vv_3\) onto the box defined by \(\vv_1\text{,}\) \(\vv_1+\vv_2\text{,}\) and \(\vv_2+\vv_3\) (in red in Figure 40.18). So the action of \(T\) is conveniently viewed in the coordinate system determined by the columns of a matrix \(P\) that converts \(A\) into its Jordan canonical form.

Subsection Summary

  • Any square matrix \(A\) is similar to a Jordan canonical form

    \begin{equation*} \left[ \begin{array}{cccc} J_1\amp 0\amp \cdots\amp 0 \\ 0\amp J_2\amp \cdots\amp 0 \\ \vdots\amp \vdots\amp \ddots\amp \vdots \\ 0\amp 0\amp \cdots\amp J_t \end{array} \right]\text{,} \end{equation*}

    where each matrix \(J_i\) is a Jordan block of the form

    \begin{equation*} \left[ \begin{array}{ccccccc} \lambda\amp 1\amp 0 \amp \cdots\amp 0\amp 0\amp 0 \\ 0\amp \lambda\amp 1\amp \cdots\amp 0\amp 0\amp 0 \\ 0\amp 0\amp \lambda\amp \cdots\amp 0\amp 0\amp 0 \\ \amp \amp \amp \ddots\amp \ddots\amp \amp \\ 0\amp 0\amp 0\amp \cdots\amp \lambda\amp 1\amp 0 \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp \lambda\amp 1 \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp 0\amp \lambda \end{array} \right]\text{,} \end{equation*}

    with \(\lambda\) an eigenvalue of \(A\text{.}\)

  • A generalized eigenvector of an \(n \times n\) matrix \(A\) corresponding to an eigenvalue \(\lambda\) of \(A\) is a non-zero vector \(\vx\) satisfying

    \begin{equation*} (A - \lambda I_n)^m \vx = \vzero \end{equation*}

    for some positive integer \(m\text{.}\) If \(A\) is an \(n \times n\) matrix, then we can find a basis of \(\R^n\) consisting of generalized eigenvectors \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_n\) of \(A\) so that the matrix \(C = [\vv_1 \ \vv_2 \ \cdots \ \vv_n]\) has the property that \(C^{-1}AC\) is a Jordan canonical form.

  • A vector space \(V\) is a direct sum of subspaces \(V_1\text{,}\) \(V_2\text{,}\) \(\ldots\text{,}\) \(V_m\) if every vector \(\vv\) in \(V\) can be written uniquely as a sum

    \begin{equation*} \vv = \vv_1+\vv_2+\vv_3+\cdots + \vv_m\text{,} \end{equation*}

    with \(\vv_i \in V_i\) for each \(i\text{.}\)

  • A square matrix \(A\) is nilpotent if and only if \(0\) is the only eigenvalue of \(A\text{.}\) The Jordan form of a matrix \(A\) can always be written in the form \(D + N\text{,}\) where \(D\) is a diagonal matrix and \(N\) is a nilpotent matrix.

  • A subspace \(W\) of a vector space \(V\) is invariant under a linear transformation \(T: V \to V\) if \(T(\vw) \in W\) whenever \(\vw\) is in \(W\text{.}\)

Exercises Exercises

1.

Find a Jordan canonical form for each of the following matrices.

(a)

\(A=\left[ \begin{array}{rcr} 10\amp 6\amp 1\\2\amp 12\amp -1\\-4\amp 12\amp 14 \end{array} \right]\)

(b)

\(B=\left[ \begin{array}{rcc} 8\amp 4\amp 1\\0\amp 9\amp 0\\-1\amp 4\amp 10 \end{array} \right]\)

(c)

\(C=\left[ \begin{array}{crc} 6\amp 0\amp 0\\2\amp 4\amp 2\\8\amp -8\amp 14 \end{array} \right]\)

(d)

\(D=\left[ \begin{array}{rrrr} 3\amp 3\amp -2\amp 1\\-2\amp -3\amp 3\amp -2\\0\amp -1\amp 2\amp -1\\5\amp 8\amp -6\amp 4 \end{array} \right]\)

2.

Let \(a \neq 0\text{.}\) Show that the Jordan canonical form of \(\left[ \begin{array}{cccc} 4\amp 1\amp 0\amp 0 \\ 0\amp 4\amp a\amp 0 \\ 0\amp 0\amp 4\amp 0 \\ 0\amp 0\amp 0\amp 4 \end{array} \right]\) is independent of the value of \(a\text{.}\)

3.

Show that the Jordan canonical form of \(\left[ \begin{array}{cccc} 2\amp 1\amp 0\amp 0 \\ 0\amp 2\amp 1\amp 0 \\ 0\amp 0\amp 2\amp a \\ 0\amp 0\amp 0\amp 4 \end{array} \right]\) is independent of the value of \(a\text{.}\)

4.

Let \(A\) and \(B\) be similar matrices with \(B = Q^{-1}AQ\text{.}\) Let \(U = C_1^{-1}AC_1\) be the Jordan canonical form of \(A\text{.}\) Show that \(U = C_2^{-1}BC_2\) is also the Jordan canonical form of \(B\) with \(C_2 = Q^{-1}C_1\text{.}\)

5.

Find all of the Jordan canonical forms for \(2 \times 2\) matrices and \(3 \times 3\) matrices.

6.

Find the Jordan canonical form of \(\left[ \begin{array}{crrcr} 1\amp 2\amp 2\amp 1\amp -2\\0\amp -1\amp 0\amp 0\amp 0 \\ 0\amp 0\amp -1\amp 1\amp 2 \\ 0\amp 2\amp 0\amp 1\amp 0 \\ 0\amp 1\amp 0\amp 1\amp 1 \end{array} \right]\text{.}\)

7.

For the matrix \(A\text{,}\) find a matrix \(C\) so that \(J = C^{-1}AC\) is the Jordan canonical form of \(A\text{,}\) where first: the diagonal entries do not increase as we move down the matrix \(J\) and, second: the Jordan blocks do not increase in size as we move down the matrix \(J\text{.}\)

\begin{equation*} A = \left[ \begin{array}{rrrrrrrrrr} 3\amp -1\amp 1\amp -1\amp 1\amp -1\amp 1\amp -1\amp 1\amp -1 \\ 1\amp 1\amp 1\amp 1\amp -1\amp 1\amp -1\amp 1\amp -1\amp 1 \\ 1\amp -1\amp 3\amp 1\amp -1\amp 1\amp -1\amp 1\amp -1\amp 1\\ 1\amp -1\amp 1\amp 1\amp 1\amp 1\amp -1\amp 1\amp -1\amp 1 \\ 0\amp 0\amp 0\amp 0\amp 2\amp 2\amp 0\amp 0\amp 0\amp 0 \\ 0\amp 0\amp 0\amp 0\amp 0\amp 2\amp 2\amp 0\amp 0\amp 0\\ 0\amp 0\amp 0\amp 0\amp 0\amp 0\amp 2\amp 2\amp 0\amp 0\\ -1\amp 1\amp -1\amp 1\amp -1\amp 1\amp -1\amp 3\amp 1\amp -1\\ 2\amp -2\amp 2\amp -2\amp 2\amp -2\amp 2\amp -2\amp 4\amp 0\\ 1\amp -1\amp 1\amp -1\amp 1\amp -1\amp 1\amp -1\amp 1\amp 3 \end{array} \right]\text{.} \end{equation*}

8.

A polynomial in two variables is an object of the form

\begin{equation*} p(s,t) = \sum a_{ij}s^it^j \end{equation*}

where \(i\) and \(j\) are integers greater than or equal to 0. For example,

\begin{equation*} q(s,t) = 3 + 2st + 4s^2t - 10st^3+5s^2t^2 \end{equation*}

is a polynomial in two variables. The degree of a monomial \(s^it^j\) is \(i+j\text{.}\) The degree of a polynomial \(p(s,t)\) is the largest degree of any monomial summand of \(p(s,t)\text{.}\) So the degree of the example polynomial \(q(s,t)\) is 4. Two polynomials in \(V\) are equal if they have the same coefficients on like terms. We add two polynomials in the variables \(s\) and \(t\) by adding coefficients of like terms. Scalar multiplication is done by multiplying each coefficient by the given scalar. Let \(V\) be the set of all polynomials of two variables of degree less than or equal to 2. You may assume that \(V\) is a vector space under this addition and multiplication by scalars and has the standard additive identity and additive inverses. Define \(F: V \to V\) and \(G: V \to V\) by

\begin{equation*} F(p(s,t)) = s\frac{\partial p}{\partial s}(s,t) + \frac{\partial p}{\partial t}(s,t) \end{equation*}

and

\begin{equation*} G(p(s,t)) = \frac{\partial p}{\partial s}(s,t) + t\frac{\partial p}{\partial t}(s,t)\text{.} \end{equation*}

That is,

\begin{align*} F(a_0+a_1s+a_2t+a_3st +a_4s^2+a_5t^2) \amp = s(a_1+a_3t+2a_4s) + (a_2+a_3s+2a_5t)\\ \amp = a_2 + (a_1+a_3)s + 2a_5t + a_3st + 2a_4s^2 \end{align*}

and

\begin{align*} G(a_0+a_1s+a_2t+a_3st +a_4s^2+a_5t^2) \amp = (a_1+a_3t+2a_4s) + t(a_2+a_3s+2a_5t)\\ \amp = a_1 + 2a_4s + (a_2+a_3)t + a_3st + 2a_5t^2\text{.} \end{align*}

You may assume that \(F\) and \(G\) are linear transformations.

(a)

Explain why \(\CB = \{1,s,t,st,s^2,t^2\}\) is a basis for \(V\text{.}\)

(b)

Explain why \(F\) and \(G\) are not diagonalizable.

(c)

Show that \([F]_{\CB}\) and \([G]_{\CB}\) have the same Jordan canonical form. (So different transformations can have the same Jordan canonical form.)

(d)

Find ordered bases \(\CC_F = \{f_1, f_2, f_3, f_4, f_5, f_6\}\) and \(\CC_G = \{g_1, g_2, g_3, g_4, g_5, g_6\}\) of \(V\) for which \([F]_{\CC_F}\) and \([G]_{\CC_G}\) are the Jordan canonical form from (c).

(e)

Recall that a linear transformation can be defined by its action on a basis. So define \(H : V \to V\) satisfying \(H(g_i) = f_i\) for \(i\) from 1 to 6. Show that \(H\) is invertible and that \(G = H^{-1}FH\text{.}\) This is the essential argument to show that any two linear transformations with the same matrix with respect to some bases are similar.

Hint.

What matrix is \([H]_{\CC_G}^{\CC_F}\text{?}\)

9.

Let

\begin{equation*} \vv_p \underset{A - \lambda I}{\rightarrow} \vv_{p-1} \underset{A - \lambda I}{\rightarrow} \cdots \ \underset{A - \lambda I}{\rightarrow} \vv_1 \underset{A - \lambda I}{\rightarrow} \vzero \end{equation*}

be a chain of generalized eigenvectors for a matrix \(A\) corresponding to the eigenvalue \(\lambda\text{.}\) In this problem we show that the set \(\{\vv_1, \vv_2, \ldots, \vv_p\}\) is linearly independent.

(a)

Explain why \(\vv_k\) is in \(\Nul (A-\lambda I)^k\) for any \(k\) between \(1\) and \(p\text{.}\)

Hint.

Show that \(\vv_{1} = (A-\lambda I)^{k-1} \vv_k\) for each \(k\) from \(2\) to \(p\text{.}\)

(b)

Consider the equation

\begin{equation} x_1 \vv_1 + x_2 \vv_2 + \cdots + x_p \vv_p = \vzero\tag{40.8} \end{equation}

for scalars \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_p\text{.}\)

(i)

Multiply both sides of (40.8) on the left by \((A-\lambda I)^{p-1}\text{.}\) Explain why we can then conclude that \(x_p = 0\text{.}\)

(ii)

Rewrite (40.8) using the result from part i. Explain how we can then demonstrate that \(x_{p-1} = 0\text{.}\)

(iii)

Describe how we can use the process in parts i. and ii. to show that \(x_1 = x_2 = \cdots = x_p = 0\text{.}\) What does this tell us about the set \(\{\vv_1, \vv_2, \ldots, \vv_p\}\text{?}\)

10.

Find, if possible, a matrix transformation \(T:\R^3 \to \R^3\) for which there is a one-dimensional invariant subspace but no two-dimensional invariant subspace. If not possible, explain why.

11.

Let \(A = \left[ \begin{array}{cr} 2\amp -1\\1\amp 0 \end{array} \right]\text{.}\) It is the case that \(A\) has a single eigenvalue of algebraic multiplicity 2 and geometric multiplicity 1.

(a)

Find a matrix \(C\) whose columns are generalized eigenvectors for \(A\) so that \(C^{-1}AC=J\) is in Jordan canonical form.

(b)

Determine the entries of \(J^k\) and then find the entries of \(A^k\) for any positive integer \(k\text{.}\)

12.

Let \(\vv_1\text{,}\) \(\vv_2\text{,}\) \(\ldots\text{,}\) \(\vv_r\) be eigenvectors of an \(n \times n\) matrix \(A\text{,}\) and let \(W = \Span\{\vv_1, \vv_2, \ldots, \vv_r\}\text{.}\) Show that \(W\) is invariant under the matrix transformation \(T\) defined by \(T(\vx) = A\vx\text{.}\)

13.

Let \(A\) be an \(n \times n\) matrix with eigenvalue \(\lambda\text{.}\) Let \(S:\R^n \to \R^n\) be defined by \(S(\vx) = B\vx\) for some matrix \(B\text{.}\) Show that if \(B\) commutes with \(A\text{,}\) then the eigenspace \(E_\lambda\) of \(A\) corresponding to the eigenvalue \(\lambda\) is invariant under \(S\text{.}\)

Hint.

Show that \((A - \lambda I)S(\vx) = \vzero\) for any \(\vx\) in \(E_\lambda\text{.}\)

14.

Determine which of the following matrices are nilpotent. Justify your answers. For each nilpotent matrix, find its index.

(a)

\(\left[ \begin{array}{rc} -2\amp 1\\-4\amp 2 \end{array} \right]\)

(b)

\(\left[ \begin{array}{cc} 0\amp 1\\1\amp 0 \end{array} \right]\)

(c)

\(\left[ \begin{array}{rrc} -1\amp 1\amp 0\\0\amp -1\amp 1 \\1\amp -3\amp 2 \end{array} \right]\)

(d)

\(\left[ \begin{array}{cc} xy\amp x^2\\-y^2\amp -xy \end{array} \right]\) for any real numbers \(x\) and \(y\)

15.

Find two different nonzero \(3 \times 3\) nilpotent matrices whose index is \(2\text{.}\) If no such matrices exist, explain why.

16.

Find, if possible, \(4 \times 4\) matrices whose indices are \(1\text{,}\) \(2\text{,}\) \(3\text{,}\) and \(4\text{.}\) If not possible, explain why.

17.

Let \(V\) be an \(n\)-dimensional vector space and let \(W\) be a subspace of \(V\text{.}\) Show that \(V = W \oplus W^{\perp}\text{.}\)

Hint.

Use a projection onto a subspace.

18.

Let \(W\) be a subspace of an \(n\)-dimensional vector space \(V\) that is invariant under a transformation \(T: V \to V\text{.}\)

(a)

Show by example that \(W^{\perp}\) need not be invariant under \(T\text{.}\)

(b)

Show that if \(T\) is an isometry, then \(W^{\perp}\) is invariant under \(T\text{.}\)

19.

Let \(T: \pol_2 \to \pol_2\) be defined by \(T(p(t)) = p(t) - p'(t)\text{,}\) where \(p'(t)\) is the derivative of \(p(t)\text{.}\) That is, if \(p(t) = a_0+a_1t+a_2t^2\text{,}\) then \(p'(t) = a_1 + 2a_2t\text{.}\) Find a basis \(\CB\) for \(\pol_2\) in which the matrix \([T]_{\CB}\) is in Jordan canonical form.

20.

Let \(A\) be an \(n \times n\) nilpotent matrix with index \(m\text{.}\) Since \(A^{m-1}\) is not zero, there is a vector \(\vv \in \R^n\) such that \(A^{m-1} \vv \neq \vzero\text{.}\) Prove that the vectors \(\vv\text{,}\) \(A\vv\text{,}\) \(A^2\vv\text{,}\) \(\ldots\text{,}\) \(A^{m-1}\vv\) are linearly independent.

21.

(a)

Let \(A = \left[ \begin{array}{rcr} 2\amp 1\amp -3 \\ -2\amp 1\amp 1 \\ 2\amp 1\amp -3 \end{array} \right]\text{.}\)

(i)

Show that \(A\) is nilpotent.

Hint.

Calculate powers of \(A\text{.}\)

(ii)

Calculate the matrix product \((I_3-A)\left(I_3+A+A^2\right)\text{.}\) What do you notice and what does this tell us about \((I_3-A)\text{?}\)

(b)

If \(A\) is an \(n \times n\) nilpotent matrix, show that \(I - A\) is nonsingular, where \(I\) is the \(n \times n\) identity matrix.

22.

In this exercise we show that every upper triangular matrix satisfies its characteristic polynomial.

(a)

To illustrate how this will work, consider a \(3 \times 3\) example. Let \(T = \left[ \begin{array}{ccc} \lambda_1\amp a\amp b \\ 0\amp \lambda_2\amp c \\ 0\amp 0\amp \lambda_3 \end{array} \right]\text{.}\)

(i)

What is the characteristic polynomial \(p(x)\) of \(T\text{?}\)

(ii)

Consider the matrices \(S_1 = T-\lambda_1 I_3\text{,}\) \(S_2 = T-\lambda_2 I_3\text{,}\) and \(S_3 = T-\lambda_3 I_3\text{.}\) Show that the first column of \(S_1\) is \(\vzero\text{,}\) the first two columns of \(S_1S_2\) are \(\vzero\text{,}\) and that every column of \(S_1S_2S_3\) is \(\vzero\text{.}\) Conclude that \(T\) satisfies its characteristic polynomial.

(b)

Now we consider the general case. Suppose \(T\) is an \(n \times n\) upper triangular matrix with diagonal entries \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_n\) in order. For \(i\) from \(1\) to \(n\) let \(S_i = T - \lambda_i I_n\text{.}\) Show that for each \(k\) from 1 to \(n\text{,}\) the first \(k\) columns of \(S_1S_2 \cdots S_k\) are equal to \(\vzero\text{.}\) Then explain how this demonstrates that \(T\) satisfies its characteristic polynomial.

23.

Prove Theorem 40.12 that a square matrix \(A\) is nilpotent if and only if 0 is the only eigenvalue of \(A\text{.}\)

Hint 1.

For one direction, use the Cayley-Hamilton Theorem.

Hint 2.

If \(A\) is nilpotent and \(\vx\) is an eigenvector of \(A\text{,}\) what is \(A^m \vx\) for every positive integer \(m\text{?}\) If \(0\) is the only eigenvalue of \(A\text{,}\) what is the characteristic polynomial of \(A\text{?}\)

24.

Let \(V\) be a vector space that is a direct sum \(V = V_1 \oplus V_2 \oplus \cdots \oplus V_m\) for some positive integer \(m\text{.}\) Prove the following.

(a)

\(V_i \cap V_j = \{\vzero\}\) whenever \(i \neq j\text{.}\)

(b)

If \(V\) is finite dimensional, and if \(\CB_i\) is a basis for \(V_i\text{,}\) then the set \(\CB = \cup_{i=1}^m \CB_i\) is a basis for \(V\text{.}\)

(c)

\(\dim(V) = \dim(V_1) + \dim(V_2) + \cdots + \dim(V_m)\text{.}\)

25.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

If \(A\) is an \(n \times n\) nilpotent matrix, then the index of \(A\) is \(n\text{.}\)

(b) True/False.

The Jordan canonical form of a matrix is unique.

(c) True/False.

Every nilpotent matrix is singular.

(d) True/False.

Eigenvectors of a linear transformation \(T\) are also generalized eigenvectors of \(T\text{.}\)

(e) True/False.

It is possible for a generalized eigenvector of a matrix \(A\) to correspond to a scalar that is not an eigenvalue for \(A\text{.}\)

(f) True/False.

The vectors in a cycle of generalized eigenvectors of a matrix are linearly independent.

(g) True/False.

A Jordan canonical form of a diagonal matrix \(A\) is \(A\text{.}\)

(h) True/False.

Let \(V\) be a finite-dimensional vector space and let \(T\) be a linear transformation from \(V\) to \(V\text{.}\) Let \(J\) be a Jordan canonical form for \(T\text{.}\) If \(\CB\) is a basis of \(V\text{,}\) then a Jordan canonical form of \([T]_{\CB}\) is \(J\text{.}\)

(i) True/False.

Matrices with the same Jordan canonical form are similar.

(j) True/False.

Let \(V\) be a finite-dimensional vector space and let \(T\) be a linear transformation from \(V\) to \(V\text{.}\) If \(\CB\) is an ordered basis for \(V\text{,}\) then \(T\) is nilpotent if and only if \([T]_{\CB}\) is nilpotent.

(k) True/False.

If \(T\) is a linear transformation whose only eigenvalue is 0, then \(T\) is the zero transformation.

(l) True/False.

Let \(V\) be a finite-dimensional vector space and let \(T\) be a linear transformation from \(V\) to \(V\text{.}\) Then \(V = T(V) \oplus \Ker(T)\text{.}\)

(m) True/False.

Let \(V\) be a finite-dimensional vector space and let \(T\) be a linear transformation from \(V\) to \(V\text{.}\) If \(\CB\) is an ordered basis for \(V\) of generalized eigenvectors of \(T\text{,}\) then \([T]_{\CB}\) is a Jordan canonical form for \(T\text{.}\)

Subsection Project: Modeling an Epidemic

The COVID-19 epidemic has generated many mathematical and statistical models to try to understand the spread of the virus. In this project we examine a simple stochastic model of the spread of an epidemic proposed by Norman Bailey in 1950. This is a model of a relatively mild epidemic in which no one dies from the disease. Of course, mathematicians build on simple models to form more complicated and realistic ones, but this is a good, and accessible, starting point. Bailey writes about the difficulties in the stochastic analysis of epidemics. For example, the overall epidemic “can often be broken down into smaller epidemics occurring in separate regional subdivisions”, and these regional epidemics “are not necessarily in phase and often interact with each other”. This is behavior that has been clearly evident in the COVID-19 epidemic in the US. Even within a single district, “it is obvious that a given infectious individual has not the same chance of infecting each inhabitant. He will probably be in close contact with a small number of people only, perhaps of the order of 10-50, depending on the nature of his activities.” But then the epidemic for the whole district will “be built up from epidemics taking place in several relatively small groups of associates and acquaintances.” So we can see in this analysis that an epidemic can spread from small, localized areas.

Bailey begins by considering a community of \(n\) persons who are susceptible to a disease, and supposes introducing a single infected individual into the community. Bailey makes the following assumptions: “We shall assume that the infection spreads by contact between the members of the community, and that it is not sufficiently serious for cases to be withdrawn from circulation by isolation or death; also that no case becomes clear of infection during the course of the main part of the epidemic.”

To see how Bailey's model is constructed, we make some assumptions. We split the population at any time into two groups, those infected with the disease, and those susceptible, i.e., not currently infected. We assume that once an individual is infected, that individual is always infected. A person catches the disease by interacting with an infected individual. That is, if a susceptible individual meets an infected individual, the chance that the susceptible person contracts the disease is \(\beta\) (the infection rate). For example, there may be a \(5\%\) chance that an encounter between a susceptible individual and an infected individual results in the susceptible individual contracting the disease. (Of course, there are many variables involved in such an interaction — if no one is wearing masks and the interaction involves close contact, the infection rate would be higher than if everyone practices social distancing. For the sake of simplicity, we assume one infection rate for the entire population.) With this simple model we assume homogeneous mixing in the population. That is, it is equally likely that any one individual will interact with any other in a given time frame. Let \(y\) be the number of susceptible individuals in the population at time \(s\text{.}\) Then the number of infected individuals is the total population minus \(y\text{,}\) or \(n+1-y\) (recall that we introduced an infected individual into the population). So the rate of change of the number of susceptible individuals in the population at time \(s\) is \(-\beta y(n+1-y)\) (since the \(y\) susceptible individuals interact with the \(n+1-y\) infected individuals). That is, \(\frac{dy}{ds} = -\beta y (n+1-y)\text{.}\) Bailey makes the substitution of \(t\) for \(\beta s\) to simplify the equation. This substitution makes \(ds = \frac{1}{\beta} dt\text{,}\) which produces the differential equation \(\beta \frac{dy}{dt} = -\beta y (n+1-y)\text{.}\) The \(\beta\)s cancel, producing the differential equation

\begin{equation} \frac{dy}{dt} = -y(n+1-y)\tag{40.9} \end{equation}

with \(y(0) = n\text{.}\)

The above analysis is just to provide some background. Bailey's stochastic model deals with probabilities instead of actual numbers, so we now take that approach. For \(r\) from \(0\) to \(n\text{,}\) let \(p_r(t)\) be the probability that there are \(r\) susceptible individuals still uninfected at time \(t\text{.}\) Similar to the case described above, if there are \(r\) susceptible individuals, any one of them can become infected by interacting with the \(n-r+1\) infected individuals. With that in mind, Bailey's model of the spread of the disease is the system

\begin{equation*} \begin{cases}\frac{d p_r(t)}{dt} = (r+1)(n-r)p_{r+1}(t) - r(n-r+1)p_r(t) \amp \text{ for } 0 \leq r \leq n-1\\ \frac{dp_n(t)}{dt} = -np_n(t). \amp \end{cases} \end{equation*}

That is,

\begin{align*} \frac{d p_0(t)}{dt} \amp = np_1(t)\\ \frac{d p_1(t)}{dt} \amp = 2(n-1)p_2(t) - np_1(t)\\ \frac{d p_2(t)}{dt} \amp = 3(n-2)p_3(t) - 2(n-1)p_2(t)\\ \vdots \amp\\ \frac{d p_{n-1}(t)}{dt} \amp = np_n(t) - (n-1)(2)p_{n-1}(t)\\ \frac{dp_n(t)}{dt} \amp = -np_n(t)\text{.} \end{align*}

Since we start with \(n\) susceptible individuals, at time \(0\) we have \(p_r(0) = 0\) if \(0 \leq r \leq n-1\) and \(p_n(0)=1\text{.}\)

This system can be written in matrix form. Let \(P(t) = \left[ \begin{array}{c} p_0(t)\\p_1(t) \\ \vdots \\ p_n(t) \end{array} \right]\text{.}\) Assuming that we can differentiate a vector function component-wise, our system becomes

\begin{equation*} \frac{dP(t)}{dt} = A P(t) \end{equation*}

with initial condition \(P(0) = [0 \ 0 \ \ldots \ 0 \ 1]^{\tr}\text{,}\) where

\begin{equation*} A=\left[ \begin{array}{ccccccccc} 0\amp n\amp 0\amp 0\amp 0\amp \cdots \amp 0\amp 0\amp 0 \\ 0\amp -n\amp 2(n-1)\amp 0 \amp 0\amp \cdots \amp 0\amp 0\amp 0 \\ 0\amp 0\amp -2(n-1)\amp 3(n-2) \amp 0 \amp \cdots \amp 0\amp 0\amp 0 \\ \amp \amp \amp \amp \vdots\amp \amp \amp \amp \\ 0\amp 0\amp \cdots \amp -r(n-r+1) \amp (r+1)(n-r)\amp 0 \amp \cdots \amp 0\amp 0 \\ \amp \amp \amp \amp \vdots\amp \amp \amp \amp \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp 0 \amp 0\amp -2(n-1)\amp n \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp 0\amp 0\amp 0\amp -n \end{array} \right]\text{.} \end{equation*}
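To experiment with this system for a particular population size, we can build \(A\) directly from the coefficients above. A minimal sketch in Python with numpy (the function name bailey_matrix is ours):

    import numpy as np

    def bailey_matrix(n):
        # (n+1) x (n+1) coefficient matrix of Bailey's system dP/dt = A P
        A = np.zeros((n + 1, n + 1))
        for r in range(n + 1):
            A[r, r] = -r * (n - r + 1)            # coefficient of p_r in dp_r/dt
            if r < n:
                A[r, r + 1] = (r + 1) * (n - r)   # coefficient of p_{r+1} in dp_r/dt
        return A

Since \(A\) is upper triangular, its eigenvalues are the diagonal entries \(-r(n-r+1)\) for \(r = 0, 1, \ldots, n\text{.}\)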

The solution to this system involves a matrix exponential \(e^{At}\text{.}\) The matrix exponential acts much like our familiar exponential function in that

\begin{equation*} \frac{d}{dt}e^{At} = Ae^{At}\text{.} \end{equation*}

With this in mind it is not difficult to see that Bailey's system has solution \(P(t)= e^{At}P(0)\text{.}\) To truly understand this solution, we need to make sense of the matrix exponential \(e^{At}\text{.}\)

We can make sense of the matrix exponential by utilizing the Taylor series of the exponential function centered at the origin. From calculus we know that

\begin{equation} e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots + \frac{1}{n!}x^n + \cdots = \sum_{k\geq 0} \frac{1}{k!}x^k\text{.}\tag{40.10} \end{equation}

Since powers of square matrices are defined, the matrix exponential \(e^M\) of a square matrix \(M\) is then

\begin{equation} e^M = I_n + M + \frac{1}{2!}M^2 + \frac{1}{3!}M^3 + \cdots + \frac{1}{n!}M^n + \cdots = \sum_{k\geq 0} \frac{1}{k!}M^k\text{.}\tag{40.11} \end{equation}

Just as in calculus, the series for \(e^M\) converges for any square matrix \(M\text{.}\)
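To make (40.11) concrete, here is a naive Python sketch that simply truncates the series. (In practice a library routine such as scipy.linalg.expm would be used; the truncation is only for illustration.)

    import numpy as np

    def expm_series(M, terms=30):
        # approximate e^M by truncating the power series (40.11)
        total = np.eye(M.shape[0])
        term = np.eye(M.shape[0])
        for k in range(1, terms):
            term = term @ M / k     # term is now M^k / k!
            total = total + term
        return total

    N = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    print(expm_series(N))   # [[1. 1.], [0. 1.]], since N^2 = 0 gives e^N = I + N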

Project Activity 40.12.

If \(M\) is diagonalizable, then \(e^M\) can be found fairly easily. Assume that there is an invertible matrix \(P\) such that \(P^{-1}MP = D\text{,}\) where \(D = \left[ \begin{array}{ccccc} \lambda_1\amp 0 \amp 0\amp \cdots \amp 0 \\ 0 \amp \lambda_2\amp 0 \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0\amp 0 \amp \cdots \amp \lambda_n \end{array} \right]\) is a diagonal matrix.

(a)

Use (40.11) to explain why

\begin{equation*} e^M = Pe^DP^{-1}\text{.} \end{equation*}
(b)

Now show that

\begin{equation*} e^M = P\left[ \begin{array}{ccccc} e^{\lambda_1}\amp 0 \amp 0\amp \cdots \amp 0 \\ 0 \amp e^{\lambda_2}\amp 0 \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0\amp 0 \amp \cdots \amp e^{\lambda_n} \end{array} \right]P^{-1}\text{.} \end{equation*}
Hint.

What is \(D^k\) for any positive integer \(k\text{?}\) Then add corresponding components and compare to (40.10).

As we will see, the matrix \(A\) in Bailey's model is not diagonalizable, so we need to understand how to compute the matrix exponential in this new situation. In these cases we can utilize the Jordan canonical form. Of course, the computations are more complicated. We first illustrate with the \(2 \times 2\) case.

Project Activity 40.13.

Assume \(A\) is a \(2 \times 2\) matrix with Jordan canonical form \(J = \left[ \begin{array}{cc} \lambda\amp 1\\0\amp \lambda \end{array} \right]\text{,}\) where \(C^{-1}AC = J\text{.}\) The same argument as above shows that

\begin{equation*} e^A = Ce^JC^{-1} \end{equation*}

and so we only have to be able to find \(e^J\text{.}\)

(a)

We can write \(J\) in the form \(J = D+N\text{,}\) where \(D\) is a diagonal matrix and \(N\) is a nilpotent matrix. Find \(D\) and \(N\) in this case and explain why \(N\) is a nilpotent matrix.

(b)

Find the index of \(N\text{.}\) That is, find the smallest positive power \(s\) of \(N\) such that \(N^s = 0\text{.}\) Then use (40.11) to find \(e^N\text{.}\)

(c)

Assume that the matrix exponential satisfies the property that \(e^{R+S} = e^Re^S\) whenever \(R\) and \(S\) are \(n \times n\) matrices that commute (as the matrices \(D\) and \(N\) here do). Use this property to explain why

\begin{equation*} e^J = \left[ \begin{array}{cc} e^{\lambda}\amp e^{\lambda}\\0\amp e^{\lambda} \end{array} \right]\text{.} \end{equation*}
(d)

Use the previous information to calculate \(e^M\) where \(M = \left[ \begin{array}{cr} 3 \amp -1 \\ 4 \amp 7 \end{array} \right]\text{.}\)

Now we turn to the general case of finding the matrix exponential \(e^M\) when \(M\) is not diagonalizable. Suppose \(C\) is an invertible matrix and \(C^{-1}MC = J\text{,}\) where \(J\) is a Jordan canonical form of \(M\text{.}\) Then we have

\begin{equation*} e^M = e^{CJC^{-1}} = C e^J C^{-1}\text{.} \end{equation*}

We know that \(J\) can be written in the form \(D+N\text{,}\) where \(D\) is a diagonal matrix and \(N\) is an upper triangular matrix with zeros on the diagonal and some ones along the superdiagonal. Since \(D\) and \(N\) commute, it follows that

\begin{equation*} e^M = Ce^De^NC^{-1}\text{.} \end{equation*}

We know that if

\begin{equation*} D = \left[ \begin{array}{cccccc} \lambda_1\amp 0\amp 0\amp \cdots\amp 0\amp 0 \\ 0\amp \lambda_2\amp 0\amp \cdots\amp 0\amp 0 \\ \amp \amp \amp \ddots\amp \amp \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp \lambda_n \end{array} \right]\text{,} \end{equation*}

then

\begin{equation*} e^D = \left[ \begin{array}{cccccc} e^{\lambda_1}\amp 0\amp 0\amp \cdots\amp 0\amp 0 \\ 0\amp e^{\lambda_2}\amp 0\amp \cdots\amp 0\amp 0 \\ \amp \amp \amp \ddots\amp \amp \\ 0\amp 0\amp 0\amp \cdots\amp 0\amp e^{\lambda_n} \end{array} \right]\text{.} \end{equation*}

So finding \(e^M\) boils down to determining \(e^N\text{.}\)

Now \(N\) has only zero as an eigenvalue, so \(N\) is nilpotent. If \(k\) is the index of \(N\text{,}\) then

\begin{equation*} e^N = I + N + \frac{1}{2}N^2 + \frac{1}{3!}N^3 + \cdots + \frac{1}{(k-1)!}N^{k-1}\text{.} \end{equation*}
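Putting these pieces together gives a recipe for computing \(e^M\) exactly. The following Python sketch, assuming the sympy library (the function name expm_via_jordan is ours), computes \(e^M = Ce^De^NC^{-1}\text{;}\) the loop computing \(e^N\) terminates because \(N\) is nilpotent.

    import sympy as sp

    def expm_via_jordan(M):
        # compute e^M exactly as C e^D e^N C^(-1) from a Jordan decomposition
        C, J = M.jordan_form()                       # M = C * J * C**(-1)
        D = sp.diag(*[J[i, i] for i in range(J.rows)])
        N = J - D                                    # nilpotent part of J
        eD = sp.diag(*[sp.exp(J[i, i]) for i in range(J.rows)])
        eN = sp.zeros(J.rows, J.rows)
        term = sp.eye(J.rows)
        for k in range(1, J.rows + 1):               # N^(J.rows) = 0, so the series is finite
            eN = eN + term
            term = term * N / k
        return C * eD * eN * C**(-1)

Note that \(e^De^N = e^{D+N}\) here because \(D\) and \(N\) commute: on each Jordan block, \(D\) is a scalar multiple of the identity.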

Project Activity 40.14.

Let us apply this analysis to a specific case of Bailey's model, with \(n = 5\text{.}\)

(a)

Find the entries of \(A\) when \(n = 5\text{.}\)

(b)

Let \(J\) be a Jordan canonical form for \(A\text{.}\) Explain why the solution to Bailey's model has the form

\begin{equation*} P(t)= e^{At}P(0) = Ce^{Dt}e^{Nt}C^{-1}P(0)\text{,} \end{equation*}

where \(C\) is an invertible matrix, \(Dt\) is a diagonal matrix, and \(Nt\) is a nilpotent matrix.

(c)

Find a Jordan canonical form \(J\) for \(A\text{.}\)

(d)

Find a diagonal matrix \(D\) and a nilpotent matrix \(N\) so that \(J = D+N\text{.}\)

(e)

Find \(e^{Dt}\) and \(e^{Nt}\text{.}\) Then show that

\begin{equation*} P(t) = \left[ \begin{array}{c} 1+\frac{172}{3}e^{-5t}-80te^{-5t}+\frac{125}{3}e^{-8t}-200te^{-8t}-100e^{-9t} \\ -\frac{220}{3}e^{-5t}+80te^{-5t}-\frac {320}{3}e^{-8t}+320te^{-8t} +180e^{-9t} \\ 10e^{-5t}+80e^{-8t}-120te^{-8t}-90e^{-9t} \\ \frac{10}{3}e^{-5t}-\frac{40}{3}e^{-8t}+10e^{-9t} \\ \frac{5}{3}e^{-5t}- \frac{5}{3}e^{-8t} \\ e^{-5t} \end{array} \right]\text{.} \end{equation*}
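One way to sanity-check this formula is to compare it numerically against \(e^{At}P(0)\) computed by a library routine. A sketch assuming Python with numpy and scipy:

    import numpy as np
    from scipy.linalg import expm

    n = 5
    A = np.zeros((n + 1, n + 1))
    for r in range(n + 1):
        A[r, r] = -r * (n - r + 1)
        if r < n:
            A[r, r + 1] = (r + 1) * (n - r)

    P0 = np.zeros(n + 1)
    P0[n] = 1.0                        # p_5(0) = 1 and all other p_r(0) = 0

    t = 0.3
    P = expm(A * t) @ P0
    # the first component of the closed-form solution above
    p0 = (1 + 172/3 * np.exp(-5*t) - 80*t * np.exp(-5*t)
          + 125/3 * np.exp(-8*t) - 200*t * np.exp(-8*t) - 100 * np.exp(-9*t))
    print(P[0], p0)                    # these two values should agree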

One use for mathematical models like Bailey's is to make predictions that can help set policies. Recall that we made the substitution of \(t\) for \(\beta s\) in our original equation in order to make the equations dimensionless, where \(\beta\) is the infection rate — the rate at which people catch the disease. Let us replace \(t\) with \(\beta s\) in our solution and analyze the effect of changing the value of \(\beta\text{.}\)

Project Activity 40.15.

If \(\beta = 1\text{,}\) then the disease is easily transmitted from person to person. If \(\beta\) can be made smaller, then the disease is not so easily transmitted. We continue to work with the case where \(n=5\text{.}\) Plot the curves \(p_0(t)\) through \(p_5(t)\) on the same set of axes for each of the following values of \(\beta\text{:}\) (a) \(\beta = 1\text{,}\) (b) \(\beta = 0.5\text{,}\) (c) \(\beta = 0.25\text{.}\)

Explain what you see and how this might be related to the phrase “flattening the curve” used during the COVID-19 pandemic of 2020.
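One possible scaffold for producing these plots, assuming Python with numpy, scipy, and matplotlib (we replace \(t\) by \(\beta s\) as described above and plot against \(s\)):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.linalg import expm

    n = 5
    A = np.zeros((n + 1, n + 1))
    for r in range(n + 1):
        A[r, r] = -r * (n - r + 1)
        if r < n:
            A[r, r + 1] = (r + 1) * (n - r)
    P0 = np.zeros(n + 1)
    P0[n] = 1.0

    s = np.linspace(0, 2, 200)
    fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
    for ax, beta in zip(axes, (1.0, 0.5, 0.25)):
        probs = np.array([expm(A * beta * si) @ P0 for si in s])
        for r in range(n + 1):
            ax.plot(s, probs[:, r], label=f"p_{r}")
        ax.set_title(f"beta = {beta}")
        ax.set_xlabel("s")
    axes[0].legend()
    plt.show()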

In general, the matrix \(A\) has eigenvalues of algebraic multiplicity 1 and 2. When \(n\) is even, \(n+1\) is odd and the eigenvalues of \(A\) will occur in pairs except for the single eigenvalue \(0\) of multiplicity 1. When \(n\) is odd, \(n+1\) is even and we have two eigenvalues of multiplicity 1: \(0\) and the eigenvalue \(-r(n-r+1)\) when \(r = \frac{n+1}{2}\text{,}\) at which \(-r(n-r+1) = -\frac{(n+1)^2}{4}\text{.}\) It can be shown, although we won't do it here, that every eigenvalue of algebraic multiplicity 2 has geometric multiplicity 1. This information completely determines the Jordan canonical form of \(A\text{.}\)

Bailey, N. T. J. (1950). A simple stochastic epidemic. Biometrika 37.
The word stochastic refers to quantities that have a random pattern that can be analyzed statistically, but generally cannot be predicted precisely.