
Section 40 The Jordan Canonical Form

Subsection Application: The Bailey Model of an Epidemic

The COVID-19 epidemic has generated many mathematical and statistical models that try to understand the spread of the virus. In 1950, Norman Bailey proposed a simple stochastic model of the spread of an epidemic. The solution to the model involves matrix exponentials, and the Jordan canonical form is a useful tool for calculating them.

Subsection Introduction

We have seen several different matrix factorizations so far: the eigenvalue decomposition (Section 19), the singular value decomposition (Section 29), the QR factorization (Section 25), and the LU factorization (Section 22). In this section, we investigate the Jordan canonical form, which in a sense generalizes the eigenvalue decomposition. For matrices with an eigenvalue decomposition, the geometric multiplicity of each eigenvalue (the dimension of the corresponding eigenspace) must equal its algebraic multiplicity (the number of times the eigenvalue occurs as a root of the characteristic polynomial of the matrix). We know that not every matrix has an eigenvalue decomposition. However, every square matrix has a Jordan canonical form, in which we use generalized eigenvectors and a block diagonal structure to approximate the behavior of an eigenvalue decomposition. At the end of the section we provide a complete proof of the existence of the Jordan canonical form.

Subsection When an Eigenvalue Decomposition Does Not Exist

Recall that an eigenvalue decomposition of a matrix exists if and only if the algebraic multiplicity equals the geometric multiplicity for each eigenvalue. In this case, the whole space R^n decomposes into eigenspaces corresponding to the eigenvalues, and the matrix acts as scalar multiplication on each eigenspace. We consider some cases where such a decomposition exists and some where it does not, to notice the differences between the two cases and to think of ways to improve the situation.

Preview Activity 40.1.

(a)

All of the matrices below have only one eigenvalue, λ = 2, and characteristic polynomial (λ - 2)^n, where n is the size of the matrix. Therefore, in each case, the algebraic multiplicity of this eigenvalue is the size of the matrix. Find the geometric multiplicity of this eigenvalue (i.e., the dimension of Nul(A - λI)) in each case below to see how the behavior differs from case to case.

(iii)

C = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}

(iv)

D = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}

(v)

E = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}

(vi)

F = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}

(vii)

G = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}

(viii)

H = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}

(ix)

J = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}
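Part (a) can be checked numerically: the geometric multiplicity of λ = 2 is dim Nul(A - 2I) = n - rank(A - 2I). A minimal sketch, assuming NumPy is available (the dictionary keys are simply the matrix labels used above):

```python
import numpy as np

# Geometric multiplicity of eigenvalue 2 is dim Nul(A - 2I) = n - rank(A - 2I).
matrices = {
    "C": [[2, 1, 0], [0, 2, 1], [0, 0, 2]],
    "D": [[2, 1, 0], [0, 2, 0], [0, 0, 2]],
    "E": [[2, 0, 0], [0, 2, 0], [0, 0, 2]],
    "F": [[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 2, 1], [0, 0, 0, 2]],
    "G": [[2, 1, 0, 0], [0, 2, 1, 0], [0, 0, 2, 1], [0, 0, 0, 2]],
    "H": [[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]],
    "J": [[2, 1, 0, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 2]],
}

geo_mult = {}
for name, entries in matrices.items():
    A = np.array(entries, dtype=float)
    n = A.shape[0]
    geo_mult[name] = int(n - np.linalg.matrix_rank(A - 2 * np.eye(n)))
```

In each case, the computed geometric multiplicity equals the number of diagonal blocks cut out by the ones above the diagonal, which is the pattern part (b) asks you to identify.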

(b)

In the examples above, the only matrices for which the algebraic and geometric multiplicities of the eigenvalue 2 are equal are the diagonal matrices, which obviously have eigenvalue decompositions. The presence of ones above the diagonal destroys this property. However, the positioning of the ones is also strategic. By letting the ones above the diagonal determine how to split the matrix into diagonal blocks, we can categorize the matrices. For example, the matrix C has one big 3×3 block with twos on the diagonal and ones above, while the matrix D has a 2×2 block at the top left and another 1×1 block at the bottom right. Determine the blocks for all of the matrices above, and identify the relationship between the number of blocks and the geometric multiplicity.

(c)

For this last problem, we will focus on the matrix B above. We know that v1 = [1 0]^T is an eigenvector. We do not have a second linearly independent eigenvector, i.e., a second independent solution to (B - 2I_2)x = 0. However, since (B - 2I_2)^2 = 0, we know that (B - 2I_2)((B - 2I_2)x) = 0 for any x, which implies that (B - 2I_2)x always lies inside the eigenspace corresponding to λ = 2. Find (B - 2I_2)[0 1]^T to verify that this works in this particular case.

Subsection Generalized Eigenvectors and the Jordan Canonical Form

If an n×n matrix A has n linearly independent eigenvectors, then A is similar to a diagonal matrix with the eigenvalues along the diagonal. You discovered in Preview Activity 40.1 that a matrix without enough linearly independent eigenvectors can still be similar to a matrix that is close to diagonal. We now investigate whether this generalized representation can be achieved for all matrices.

Activity 40.2.

Let M = \begin{bmatrix} 3 & 1 & 0 \\ -4 & 7 & 0 \\ 0 & 0 & 1 \end{bmatrix}. The characteristic polynomial of M is (λ - 1)(λ - 5)^2, so the eigenvalues of M are 1 and 5. For λ = 1, the eigenspace is one dimensional and the vector v1 = [0 0 1]^T is an eigenvector. For λ = 5, although the algebraic multiplicity is 2, the corresponding eigenspace is only one dimensional and v0 = [1 2 0]^T is an eigenvector. We cannot diagonalize the matrix M because we do not have a second linearly independent eigenvector for the eigenvalue 5, which would give us the three linearly independent eigenvectors we need. Using the same idea as in the last problem of Preview Activity 40.1, we can find another linearly independent vector that gives us a matrix close to a diagonal matrix.

(a)

Find a vector v3 that is in Nul (M - 5I_3)^2 but not in Nul (M - 5I_3). Then calculate (M - 5I_3)v3. How is v2 = (M - 5I_3)v3 related to v0?

(b)

Let C = [v1 v2 v3] and calculate the product C^{-1}MC for our example matrix M.

(c)

Describe the effect of (M - I_3) and (M - 5I_3)^2 on each of the column vectors of C and explain why this justifies that (M - I_3)(M - 5I_3)^2 is the zero matrix.
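The construction in this activity can be checked numerically. A sketch, assuming NumPy is available, with the sign conventions for M used above and the specific choice v3 = [1 0 0]^T (which lies in Nul (M - 5I_3)^2 but not in Nul (M - 5I_3)):

```python
import numpy as np

M = np.array([[ 3, 1, 0],
              [-4, 7, 0],
              [ 0, 0, 1]], dtype=float)

v1 = np.array([0.0, 0.0, 1.0])        # eigenvector for eigenvalue 1
v3 = np.array([1.0, 0.0, 0.0])        # in Nul (M - 5I)^2 but not in Nul (M - 5I)
v2 = (M - 5 * np.eye(3)) @ v3         # = [-2, -4, 0], a multiple of v0 = [1, 2, 0]^T

C = np.column_stack([v1, v2, v3])
J = np.linalg.inv(C) @ M @ C          # should be almost diagonal
```

Here J comes out with 1 and 5 on the diagonal and a single 1 above the diagonal in the 5-block, the almost diagonal form discussed below.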

Activity 40.2 shows that even if a matrix does not have a full complement of linearly independent eigenvectors, we can still almost diagonalize the matrix. If A is an n×n matrix that does not have n linearly independent eigenvectors (such matrices are called defective), the matrix A is still similar to a matrix that is "close" to diagonal. Activity 40.2 shows this for the case of a 3×3 matrix M with two eigenvalues λ1, λ2, with λ2 having algebraic multiplicity two, and each eigenvalue having a one-dimensional eigenspace. Let v1 be an eigenvector corresponding to λ1. Because λ2 also has a one-dimensional eigenspace, the matrix M has only two linearly independent eigenvectors. Nevertheless, as you saw in Activity 40.2, we can find a vector v3 that is in Nul (M - λ2I)^2 but not in Nul (M - λ2I). From this, it follows that the vector v2 = (M - λ2I)v3 is an eigenvector of M with eigenvalue λ2. In this case we have Mv3 = λ2v3 + v2 and

M[v_1 \ v_2 \ v_3] = [Mv_1 \ Mv_2 \ Mv_3] = [\lambda_1 v_1 \ \lambda_2 v_2 \ \lambda_2 v_3 + v_2] = [v_1 \ v_2 \ v_3]\begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 1 \\ 0 & 0 & \lambda_2 \end{bmatrix}.

The lack of two linearly independent eigenvectors for this eigenvalue of algebraic multiplicity two ensures that M is not similar to a diagonal matrix, but M is similar to a matrix with a diagonal block of the form \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}. So even though the matrix M is not diagonalizable, we can find an invertible matrix C = [v1 v2 v3] so that C^{-1}MC is almost diagonal.

In general, suppose we have an eigenvalue λ of a matrix A with algebraic multiplicity two but with a one-dimensional eigenspace. Then the eigenvalue λ is deficient: dim(E_λ) is strictly less than the algebraic multiplicity of λ, i.e., E_λ does not contain enough linearly independent eigenvectors for λ. But as we saw above, we may be able to find a vector v2 that is in Nul (A - λI)^2 but not in Nul (A - λI). When we let v1 = (A - λI)v2, we then also have

(40.1)   0 = (A - λI)^2 v_2 = (A - λI)v_1

as we argued in Preview Activity 40.1, and so v1 is an eigenvector for A with eigenvalue λ. It is vectors that satisfy an equation like (40.1) that drive the Jordan canonical form. These vectors are similar to eigenvectors and are called generalized eigenvectors.

Definition 40.1.

Let A be an n×n matrix with eigenvalue λ. A generalized eigenvector of A corresponding to the eigenvalue λ is a non-zero vector x satisfying

(A - λI_n)^m x = 0

for some positive integer m.

In other words, a generalized eigenvector of an n×n matrix A corresponding to the eigenvalue λ is a nonzero vector in Nul (A - λI_n)^m for some m. Note that every eigenvector of A is a generalized eigenvector (with m = 1). In Activity 40.2, M is a 3×3 matrix with eigenvalue λ = 5 having algebraic multiplicity 2 and geometric multiplicity 1. We were able to see that M is similar to a matrix of the form \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{bmatrix} because of the existence of a generalized eigenvector for the eigenvalue 5.

The example in Activity 40.2 presents the basic idea behind how we can find a "simple" matrix that is similar to any square matrix, even if that matrix is not diagonalizable. The key is to find generalized eigenvectors for eigenvalues whose algebraic multiplicities exceed their geometric multiplicities. One way to do this is indicated in that activity and in the next activity.

Activity 40.3.

Let

A = \begin{bmatrix} 5 & 1 & 4 \\ 4 & 3 & 5 \\ -3 & -1 & -2 \end{bmatrix}.

The matrix A has λ = 2 as its only eigenvalue, and the geometric multiplicity of λ as an eigenvalue is 1. For this activity you may use the fact that the reduced row echelon forms of A - 2I, (A - 2I)^2, and (A - 2I)^3 are, respectively,

\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
(a)

To begin, we look for a vector v3 that is in Nul (A - 2I_3)^3 but not in Nul (A - 2I_3)^2. Find such a vector.

(b)

Let v2 = (A - 2I_3)v3. Show that v2 is in Nul (A - 2I_3)^2 but is not in Nul (A - 2I_3).

(c)

Let v1 = (A - 2I_3)v2. Show that v1 is an eigenvector of A with eigenvalue 2. That is, show that v1 is in Nul (A - 2I_3).

(d)

Let C = [v1 v2 v3]. Calculate the matrix product C^{-1}AC. What do you notice?
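The chain construction in this activity can be sketched numerically, assuming NumPy is available and using the entries of A as reconstructed above, with v3 = [1 0 0]^T as a concrete choice:

```python
import numpy as np

A = np.array([[ 5,  1,  4],
              [ 4,  3,  5],
              [-3, -1, -2]], dtype=float)
N = A - 2 * np.eye(3)

v3 = np.array([1.0, 0.0, 0.0])   # in Nul N^3 = R^3, but N^2 v3 != 0
v2 = N @ v3                      # in Nul N^2 but not in Nul N
v1 = N @ v2                      # an eigenvector: N v1 = 0

C = np.column_stack([v1, v2, v3])
J = np.linalg.inv(C) @ A @ C     # a single 3x3 Jordan block for eigenvalue 2
```

The product C^{-1}AC is the 3×3 matrix with twos on the diagonal and ones directly above, matching the discussion that follows.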

It is the equations (A - 2I)v_{i+1} = v_i from Activity 40.3 that give us the simple form \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}. To better understand why, notice that the equations imply that Av_{i+1} = 2v_{i+1} + v_i. So if C = [v1 v2 v3], then

AC = [Av_1 \ Av_2 \ Av_3] = [2v_1 \ 2v_2 + v_1 \ 2v_3 + v_2] = [v_1 \ v_2 \ v_3]\begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}.

This method will provide us with the Jordan canonical form. The major reason that this method always works is contained in the following theorem whose proof follows from the proof of the existence of the Jordan canonical form (presented later).

To find generalized eigenvectors, then, we find a value of p so that dim(Nul (A - λI_n)^p) = m and then find a vector v_p that is in Nul (A - λI_n)^p but not in Nul (A - λI_n)^{p-1}. Successive multiplications by A - λI_n provide a sequence of generalized eigenvectors. The sequence

v_p \xrightarrow{A - \lambda I_n} v_{p-1} \xrightarrow{A - \lambda I_n} \cdots \xrightarrow{A - \lambda I_n} v_1 \xrightarrow{A - \lambda I_n} 0

is called a generalized eigenvector chain.

Activity 40.4.

Let

A = \begin{bmatrix} 0 & 0 & 0 & 2 \\ -6 & 0 & 2 & 10 \\ 1 & 1 & 1 & -3 \\ -3 & -1 & 1 & 7 \end{bmatrix}.

The only eigenvalue of A is λ = 2, and λ has geometric multiplicity 2. The vectors [0 1 1 0]^T and [1 2 0 1]^T are eigenvectors for A. The reduced row echelon forms of A - λI_4, (A - λI_4)^2, and (A - λI_4)^3 are, respectively,

\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & -1 & -3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \text{ and } \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
(b)

Find a vector v3 in Nul (A - λI_4)^3 that is not in Nul (A - λI_4)^2.

(c)

Now let v2 = (A - λI_4)v3 and v1 = (A - λI_4)v2. What special property does v1 have?

(d)

Find a fourth vector v0 so that {v0, v1, v2, v3} is a basis of R^4 consisting of generalized eigenvectors of A. Let C = [v0 v1 v2 v3]. Calculate the product C^{-1}AC. What do you see?
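A numerical sketch of this activity, assuming NumPy is available and using the entries of A as reconstructed in this section; the specific seed vector v3 = [1 0 0 0]^T and the second eigenvector v0 are concrete choices, not the only ones:

```python
import numpy as np

A = np.array([[ 0,  0, 0,  2],
              [-6,  0, 2, 10],
              [ 1,  1, 1, -3],
              [-3, -1, 1,  7]], dtype=float)
N = A - 2 * np.eye(4)

v3 = np.array([1.0, 0.0, 0.0, 0.0])  # in Nul N^3 but not in Nul N^2
v2 = N @ v3
v1 = N @ v2                          # a multiple of the eigenvector [1, 2, 0, 1]^T
v0 = np.array([0.0, 1.0, 1.0, 0.0])  # a second, independent eigenvector

C = np.column_stack([v0, v1, v2, v3])
J = np.linalg.inv(C) @ A @ C         # a 1x1 block and a 3x3 block for eigenvalue 2
```

The two eigenvectors give two Jordan blocks: the geometric multiplicity 2 appears as the number of blocks, while the algebraic multiplicity 4 is the sum of their sizes.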

The previous activities illustrate the general idea for almost diagonalizing an arbitrary square matrix. First let A be an n×n matrix with an eigenvalue λ of algebraic multiplicity n and geometric multiplicity k. If v1, v2, …, vk are linearly independent eigenvectors for A, then we can extend the set, as we did in Activity 40.4 above, with generalized eigenvectors to a basis {v1, v2, …, vk, v_{k+1}, …, vn} of R^n. The matrix C = [v1 v2 ⋯ vn] has the property that C^{-1}AC is almost diagonal. By almost, we mean that C^{-1}AC has block matrices along the diagonal that look like

(40.2)   \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{bmatrix}

and has zeros everywhere else. A matrix of the form (40.2) is called a Jordan block.

If A is an n×n matrix with eigenvalues λ1, λ2, …, λk, we repeat this process with every eigenvalue of A to construct an invertible matrix C so that C^{-1}AC is of the form

(40.3)   \begin{bmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_t \end{bmatrix},

where each matrix J_i is a Jordan block (note that a 1×1 Jordan block is allowed). The form in (40.3) is called the Jordan canonical form or Jordan normal form of the matrix A. Later in this section we will prove the following theorem.
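For concrete matrices, a computer algebra system can produce this form directly. A sketch assuming SymPy is available, applied to the matrix M of Activity 40.2; SymPy's Matrix.jordan_form returns an invertible P and a Jordan form J with M = PJP^{-1}:

```python
from sympy import Matrix

M = Matrix([[ 3, 1, 0],
            [-4, 7, 0],
            [ 0, 0, 1]])

P, J = M.jordan_form()   # M = P * J * P**(-1)
```

The diagonal of J lists the eigenvalues 1, 5, 5 with their algebraic multiplicities, and the single 1 on the superdiagonal records the one 2×2 block for the deficient eigenvalue 5.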

Another example may help illustrate the process.

Activity 40.5.

Let A = \begin{bmatrix} 4 & -1 & 1 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 3 & 1 & 0 & 0 \\ 0 & 0 & 0 & 3 & -1 & 0 \\ 1 & -1 & 0 & 0 & 4 & -1 \\ 0 & 0 & 0 & 0 & 0 & 4 \end{bmatrix}. The eigenvalues of A are 3 and 4, both with algebraic multiplicity 3. A basis for the eigenspace E_3 corresponding to the eigenvalue 3 is {[1 1 0 0 0 0]^T} and a basis for the eigenspace E_4 corresponding to the eigenvalue 4 is {[1 0 0 0 0 1]^T, [1 1 1 1 -1 0]^T}. In this activity we find a Jordan canonical form of A.

(a)

Assume that the reduced row echelon forms of A3I6, (A3I6)2, and (A3I6)3 are, respectively,

\begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \text{ and } \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Find a vector v3 that is in Nul (A - 3I_6)^3 but not in Nul (A - 3I_6)^2. Then let v2 = (A - 3I_6)v3 and v1 = (A - 3I_6)v2. Notice that we obtain a string of three generalized eigenvectors.

(b)

Assume that the reduced row echelon forms of A4I6 and (A4I6)2 are, respectively,

\begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 & -4 & -3 & -1 \\ 0 & 1 & 0 & -3 & -2 & 0 \\ 0 & 0 & 1 & -2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Find a vector v5 that is in Nul (A - 4I_6)^2 but not in Nul (A - 4I_6). Then let v4 = (A - 4I_6)v5. Notice that we obtain a string of two generalized eigenvectors.

(c)

Find a generalized eigenvector v6 for A such that {v1, v2, v3, v4, v5, v6} is a basis for R^6. Let C = [v1 v2 v3 v4 v5 v6]. Calculate J = C^{-1}AC. Make sure that J is a matrix in Jordan canonical form.

(d)

What does the matrix J tell us about the eigenvalues of A and their algebraic multiplicities?

(e)

How many Jordan blocks are there in J for the eigenvalue 3? How many Jordan blocks are there in J for the eigenvalue 4? How do these numbers compare to the geometric multiplicities of 3 and 4 as eigenvalues of A?

The previous activities highlight some of the information that a Jordan canonical form tells us about a matrix. Assuming that C^{-1}AC = J, where J is in Jordan canonical form, we can say the following.

  • Since similar matrices have the same eigenvalues, the eigenvalues of J, and therefore of A, are the diagonal entries of J. Moreover, the number of times a diagonal entry λ appears in J is the algebraic multiplicity of λ as an eigenvalue. This is also the sum of the sizes of all Jordan blocks corresponding to λ.

  • Given an eigenvalue λ, its geometric multiplicity is the number of Jordan blocks corresponding to λ.

  • Each chain of generalized eigenvectors leads to a Jordan block for the corresponding eigenvalue. The number of Jordan blocks corresponding to λ of size at least j is s_j = dim(Nul (A - λI)^j) - dim(Nul (A - λI)^{j-1}). Thus, the number of Jordan blocks of size exactly j is

    s_j - s_{j+1} = 2 dim(Nul (A - λI)^j) - dim(Nul (A - λI)^{j+1}) - dim(Nul (A - λI)^{j-1}).
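The block-counting rule in the last bullet can be packaged as a short function. A sketch assuming NumPy is available; the function name jordan_block_counts is ours, and the 6×6 test matrix uses the entries of the Activity 40.5 matrix as reconstructed in this section:

```python
import numpy as np

def jordan_block_counts(A, lam):
    """Count Jordan blocks of each exact size for eigenvalue lam, using
    s_j = dim Nul (A - lam I)^j - dim Nul (A - lam I)^(j-1)."""
    n = A.shape[0]
    N = A - lam * np.eye(n)
    dims = [0]                           # nullity of N^0 = I is 0
    power = np.eye(n)
    for _ in range(n + 1):
        power = power @ N
        dims.append(int(n - np.linalg.matrix_rank(power)))
    s = [dims[j] - dims[j - 1] for j in range(1, n + 2)]
    return {j: s[j - 1] - s[j] for j in range(1, n + 1) if s[j - 1] - s[j] > 0}

A = np.array([[4, -1, 1, 0,  0,  0],
              [0,  3, 1, 0,  0,  0],
              [0,  0, 3, 1,  0,  0],
              [0,  0, 0, 3, -1,  0],
              [1, -1, 0, 0,  4, -1],
              [0,  0, 0, 0,  0,  4]], dtype=float)

counts3 = jordan_block_counts(A, 3)   # expect one 3x3 block
counts4 = jordan_block_counts(A, 4)   # expect one 1x1 and one 2x2 block
```

This recovers exactly the block structure found in Activity 40.5: one block for the eigenvalue 3 and two blocks for the eigenvalue 4, matching the geometric multiplicities 1 and 2.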

One interesting consequence of the existence of the Jordan canonical form is the famous Cayley-Hamilton Theorem.

The proof of the Cayley-Hamilton Theorem follows from Exercise 22, which shows that every upper triangular matrix satisfies its characteristic polynomial. If A is a square matrix, then there exists a matrix C such that C^{-1}AC = T, where T is in Jordan canonical form (in particular, T is upper triangular). So A is similar to T. If p(x) is the characteristic polynomial of A, Activity 19.5 in Section 19 tells us that p(x) is also the characteristic polynomial of T. Therefore, p(T) = 0. Then Exercise 14 in Section 19 shows that p(A) = Cp(T)C^{-1} = 0, and so A satisfies its characteristic polynomial.
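As a quick numerical illustration of the Cayley-Hamilton Theorem (a sketch assuming NumPy is available): the matrix from Activity 40.3 has characteristic polynomial (x - 2)^3 = x^3 - 6x^2 + 12x - 8, and substituting the matrix for x gives the zero matrix.

```python
import numpy as np

# The matrix from Activity 40.3; its characteristic polynomial is
# p(x) = (x - 2)^3 = x^3 - 6x^2 + 12x - 8.
A = np.array([[ 5,  1,  4],
              [ 4,  3,  5],
              [-3, -1, -2]])

I = np.eye(3, dtype=int)
p_of_A = A @ A @ A - 6 * (A @ A) + 12 * A - 8 * I   # should be the zero matrix
```

Since the entries are integers, the computation is exact, and p(A) comes out identically zero.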

Subsection Geometry of Matrix Transformations using the Jordan Canonical Form

Figure 40.5. The image of the matrix transformation T(x) = \frac{1}{3}\begin{bmatrix} 8 & 1 \\ 2 & 7 \end{bmatrix} x.

Recall that we can visualize the action of a matrix transformation defined by a diagonalizable matrix by using a change of basis. For example, let T(x) = Ax, where A = \frac{1}{3}\begin{bmatrix} 8 & 1 \\ 2 & 7 \end{bmatrix}. The eigenvalues of A are λ1 = 3 and λ2 = 2 with corresponding eigenvectors v1 = [1 1]^T and v2 = [1 -2]^T. So A is diagonalizable by the matrix P = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}, with P^{-1}AP = D = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}. Note that T(x) = PDP^{-1}x. Now P^{-1} is a change of basis matrix from the standard basis to the basis B = {v1, v2}, D stretches space in the direction of v1 by a factor of 3 and stretches space in the direction of v2 by a factor of 2, and then P changes basis back to the standard basis. This is illustrated in Figure 40.5.
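This factorization is easy to confirm numerically (a sketch assuming NumPy, with the entries of A and P as reconstructed above):

```python
import numpy as np

A = np.array([[8, 1], [2, 7]]) / 3
P = np.array([[1, 1], [1, -2]], dtype=float)   # columns are v1 and v2
D = np.diag([3.0, 2.0])

reconstructed = P @ D @ np.linalg.inv(P)       # should reproduce A
```

The eigenvector equations can be checked directly as well: A maps [1, 1]^T to [3, 3]^T and [1, -2]^T to [2, -4]^T.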

In general, if an n×n matrix A is diagonalizable, then there is a basis B = {v1, v2, …, vn} of R^n consisting of eigenvectors of A. Assume that Av_i = λ_i v_i for each i. Letting P = [v1 v2 ⋯ vn], we know that

P^{-1}AP = D,

where D is the diagonal matrix with λ1, λ2, …, λn down the diagonal. If T is the matrix transformation defined by T(x) = Ax, then

T(x) = PDP^{-1}x.

Now P^{-1} is a change of basis matrix from the standard basis to the basis B, and D stretches or contracts space in the direction of v_i by the factor λ_i, and then P changes basis back to the standard basis. In this way we can visualize the action of the matrix transformation using the basis B. If A is not a diagonalizable matrix, we can use the Jordan canonical form to understand the action of the transformation defined by A. We start by analyzing shears.

Figure 40.6. Left: A shear in the x-direction. Right: A shear in the direction of the line y=x.

Activity 40.6.

(a)

Recall from Section 7 that a matrix transformation T defined by T(x) = Ax, where A is of the form \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}, performs a shear in the x direction, as illustrated at left in Figure 40.6. That is, while T(e1) = e1, it is the case that T(e2) = e2 + [a 0]^T. In other words, T(e2) - e2 = [a 0]^T is in Span{e1}. But we can say something more. Show that if x = [x1 x2]^T is not in Span{e1}, then

T(x) = x + x_2 [a \ 0]^T.

The result is that if x is not in Span{e1}, then T(x) - x is in Span{e1}. This leads us to a general definition of a shear.

Definition 40.7.

A matrix transformation T is a shear in the direction of the line ℓ (through the origin) in R^2 if

  1. T(x) = x for all x in ℓ, and

  2. T(x) - x is in ℓ for all x not in ℓ.

(b)

Let S(x) = Mx, where M = \begin{bmatrix} 3 & -2 \\ 2 & -1 \end{bmatrix}. Also let v1 = [1 1]^T and v2 = [1 0]^T.

(i)

Let x = [t \ t]^T for some scalar t. Show that S(x) = x. How is this related to the eigenvalues of M?

(ii)

Let x be any vector not in Span{[1 1]^T}. Show that S(x) - x is in Span{[1 1]^T}.

(iii)

Explain why S is a shear and how S is related to the image at right in Figure 40.6.

As we did with diagonalizable matrices, we can understand a general matrix transformation of the form T(x) = Ax by using a Jordan canonical form of A. In this context, we will encounter matrices of the form B = \begin{bmatrix} c & a \\ 0 & c \end{bmatrix} = \begin{bmatrix} c & 0 \\ 0 & c \end{bmatrix}\begin{bmatrix} 1 & a/c \\ 0 & 1 \end{bmatrix} for some positive constant c. If S(x) = Bx, then S performs a shear in the direction of e1 and then an expansion or contraction in all directions by a factor of c. We illustrate with an example.

Activity 40.7.

Let T(x) = Ax, where A = \begin{bmatrix} 3 & 1 \\ -1 & 1 \end{bmatrix}. The only eigenvalue of A is λ = 2, and this eigenvalue has algebraic multiplicity 2 and geometric multiplicity 1. The vector v1 = [1 -1]^T is an eigenvector for A with eigenvalue 2, and v2 = [1 0]^T satisfies (A - 2I_2)v2 = v1. Let C = \begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix}.

(a)

Explain why T(x) = CJC^{-1}x, where J = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}.

(b)

The matrix C^{-1} is a change of basis matrix P_{B←S} from some basis S to another basis B. Specifically identify S and B.

(c)

If we begin with an arbitrary vector x, then [x]_S = x. How is C^{-1}x related to B?

(d)

Describe in detail what J does to a vector in the B coordinate system.

Hint.

J = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix}.

(e)

Put this all together to describe the action of T as illustrated in Figure 40.8. The word shear should appear in your explanation.

Figure 40.8. Change of basis, a shear, and scaling.
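The decomposition used in Activity 40.7 can be verified numerically (a sketch assuming NumPy, with the entries of A and C as reconstructed above): the similarity A = CJC^{-1} and the factorization of J into a uniform scaling followed by a shear.

```python
import numpy as np

A = np.array([[ 3, 1],
              [-1, 1]], dtype=float)
C = np.array([[ 1, 1],
              [-1, 0]], dtype=float)   # columns v1 = [1,-1]^T and v2 = [1,0]^T
J = np.array([[2, 1],
              [0, 2]], dtype=float)

scale = np.diag([2.0, 2.0])            # expansion by a factor of 2 in all directions
shear = np.array([[1.0, 0.5],
                  [0.0, 1.0]])         # shear in the first coordinate direction
```

So T is the composition: change to the {v1, v2} coordinate system, shear, scale by 2, and change back, exactly as described in the activity.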

Activity 40.7 provides the essential ideas to understand the geometry of a general linear transformation using the Jordan canonical form. Let T(x) = Ax with Av1 = λv1 and Av2 = λv2 + v1. Then T maps Span{v1} to Span{v1}, scaling by a factor of λ, and T maps Span{v2} to the line containing the terminal point of v1 in the direction of v2. The matrix C^{-1} performs a change of basis from the standard basis to the basis {v1, v2}, then the matrix \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} from the example above performs an expansion by a factor of 2 in all directions and a shear in the direction of v1.

Subsection Proof of the Existence of the Jordan Canonical Form

While we have constructed an algorithm to find a Jordan canonical form of a square matrix, we haven't yet addressed the question of whether every square matrix has a Jordan canonical form. We do that in this section.

Consider that any vector [a b]^T in R^2 can be written as the sum [a 0]^T + [0 b]^T. The vector [a 0]^T is a vector in the subspace W1 = {[x 0]^T | x ∈ R} = Span{[1 0]^T} and the vector [0 b]^T is in the subspace W2 = {[0 y]^T | y ∈ R} = Span{[0 1]^T}. Also notice that W1 ∩ W2 = {0}. We can extend this idea to R^3, where any vector [a b c]^T can be written as [a 0 0]^T + [0 b 0]^T + [0 0 c]^T, with [a 0 0]^T in W1 = Span{[1 0 0]^T}, [0 b 0]^T in W2 = Span{[0 1 0]^T}, and [0 0 c]^T in W3 = Span{[0 0 1]^T}. In this situation we write R^2 = W1 ⊕ W2 and R^3 = W1 ⊕ W2 ⊕ W3 and say that R^2 is the direct sum of W1 and W2, while R^3 is the direct sum of W1, W2, and W3.

Definition 40.9.

A vector space V is a direct sum of subspaces V_1, V_2, …, V_m if every vector v in V can be written uniquely as a sum

v = v_1 + v_2 + \cdots + v_m,

with v_i ∈ V_i for each i.

If V is a direct sum of subspaces V_1, V_2, …, V_m, then we write

V = V_1 ⊕ V_2 ⊕ \cdots ⊕ V_m.

Some useful facts about direct sums are given in the following theorem. The proofs are left for the exercises.

Subsection Nilpotent Matrices and Invariant Subspaces

We will prove the existence of the Jordan canonical form in two steps. In the next subsection Lemma 40.14 will show that every linear transformation can be diagonalized in some form, and Lemma 40.15 will provide the specific Jordan canonical form. Before we proceed to the lemmas, there are two concepts we need to introduce — nilpotent matrices and invariant subspaces. We don't need these concepts beyond our proof, so we won't spend a lot of time on them.

Activity 40.8.

Let A = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} and B = \begin{bmatrix} 2 & 1 & 3 \\ -2 & 1 & -1 \\ -2 & -1 & -3 \end{bmatrix}.

(a)

Calculate the positive integer powers of A and B. What do you notice?

(b)

Compare the eigenvalues of A to the eigenvalues of B. What do you notice?
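The computations in this activity are easy to replicate (a sketch assuming NumPy, with the signs of A and B as reconstructed in this section): A squares to the zero matrix immediately, while B needs three multiplications.

```python
import numpy as np

A = np.array([[1, -1],
              [1, -1]])
B = np.array([[ 2,  1,  3],
              [-2,  1, -1],
              [-2, -1, -3]])

A2 = A @ A            # already the zero matrix
B2 = B @ B            # not yet zero
B3 = B2 @ B           # zero: B is nilpotent of index 3
```

In the language introduced below, A is nilpotent of index 2 and B is nilpotent of index 3, and both matrices have 0 as their only eigenvalue.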

Activity 40.8 shows that there are some matrices whose powers eventually become the zero matrix, and that there might be some connection to the eigenvalues of these matrices. Such matrices are given a special name.

Definition 40.11.

A square matrix A is nilpotent if A^m = 0 for some positive integer m. Correspondingly, a linear transformation T from an n-dimensional vector space V to V is nilpotent if T^m = 0 for some positive integer m.

Nilpotent matrices are the essential obstacle to the diagonalization process. If A is a nilpotent matrix, the smallest positive integer m such that A^m = 0 is called the index of A.

A characterization of nilpotent matrices is given in the following theorem.

The proof is left to the exercises.

We have seen that if T: V → V is a linear transformation from a vector space to itself, and if λ is an eigenvalue of T with eigenvector v, then T(v) = λv. In other words, T maps every vector in W = Span{v} to a vector in W. When this happens we say that W is invariant under the transformation T.

Definition 40.13.

A subspace W of a vector space V is invariant under a linear transformation T: V → V if T(w) ∈ W whenever w is in W.

So, for example, every eigenspace of a transformation is invariant under the transformation. Other spaces that are always invariant are V and {0}.

Activity 40.9.

Let V be a vector space and let T: V → V be a linear transformation.

(a)

Let V = R^2 and let T be the linear transformation defined by T([x y]^T) = [x - y \ y - x]^T. Find two invariant subspaces for T besides V and {0}.

(b)

Recall that Ker(T) = {v ∈ V : T(v) = 0}. Is Ker(T) invariant under T? Explain.

(c)

Recall that Range(T) = {w ∈ V : w = T(v) for some v ∈ V}. Is Range(T) invariant under T? Explain.

Subsection The Jordan Canonical Form

We are now ready to prove the existence of the Jordan canonical form.

An example might help illustrate the lemma.

Activity 40.10.

Let T: P_4 → P_4 be defined by

T(a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4) = (2a_0 + a_1) + (a_1 - a_2)t + (a_0 + a_1)t^2 + (-a_0 - a_1 + a_2 + 2a_3 - a_4)t^3 + (2a_4)t^4.

The matrix of T with respect to the standard basis S={1,t,t2,t3,t4} is

A = [T]_S = \begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ -1 & -1 & 1 & 2 & -1 \\ 0 & 0 & 0 & 0 & 2 \end{bmatrix}.

The eigenvalues of A (and T) are 2 and 1, and the algebraic multiplicity of the eigenvalue 2 is 2 while its geometric multiplicity is 1, and the algebraic multiplicity of the eigenvalue 1 is 3 while its geometric multiplicity is also 1.

For every t, we can find Ker(T - λI)^t using Nul (A - λI)^t.

(a)

Technology shows that dim(Nul (A - I)) = 1, dim(Nul (A - I)^2) = 2, and dim(Nul (A - I)^3) = 3. A basis for Nul (A - I)^3 is C1 = {[1 -1 0 0 0]^T, [1 0 1 0 0]^T, [1 0 0 1 0]^T}. Find a basis B1 for Ker((T - I)^3).

(b)

Technology also shows that dim(Nul (A - 2I)) = 1 and dim(Nul (A - 2I)^2) = 2. A basis for Nul (A - 2I)^2 is C2 = {[0 0 0 1 0]^T, [0 0 0 0 1]^T}. Find a basis B2 for Ker((T - 2I)^2).

Since each Ker(T - λ_iI)^{s_i} in Activity 40.10 is T invariant, T maps vectors in Ker(T - λ_iI)^{s_i} back into Ker(T - λ_iI)^{s_i}. So restricting T to each Ker(T - λ_iI)^{s_i} provides a matrix B_i, and applying T across the decomposition of the space produces the matrix

\begin{bmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_r \end{bmatrix},

where the B_i are square matrices corresponding to the eigenvalues of T. These blocks are determined by the restriction of T to the spaces Ker(T - λ_iI)^{s_i} with respect to the chosen bases. To obtain the Jordan canonical form, we need to know that we can always choose these bases to create the correct block matrices. Lemma 40.15 will provide those details.

Proof of Lemma 40.14.

Choose a λ_i and, for convenience, label it λ. For each positive integer j, let W_j = Ker(T - λI)^j. If (T - λI)^j x = 0, then

(T - λI)^{j+1}(x) = (T - λI)(T - λI)^j(x) = (T - λI)(0) = 0,

so W_j ⊆ W_{j+1}. Thus we have the containments

W_1 ⊆ W_2 ⊆ W_3 ⊆ \cdots ⊆ W_k ⊆ \cdots.

Now V is finite dimensional, so this sequence must reach equality at some integer t. That is, W_t = W_{t+1} = \cdots. Let s be the smallest positive integer for which this happens.

We plan to show that V = Ker(T - λI)^s ⊕ Range(T - λI)^s. We begin by demonstrating that Ker(T - λI)^s ∩ Range(T - λI)^s = {0}. Let v ∈ Ker(T - λI)^s ∩ Range(T - λI)^s. Then (T - λI)^s(v) = 0 and there exists u ∈ V such that (T - λI)^s(u) = v. It follows that

(T - λI)^{2s}(u) = (T - λI)^s (T - λI)^s(u) = (T - λI)^s(v) = 0.

But Ker(T - λI)^{2s} = Ker(T - λI)^s, so

(T - λI)^s(u) = 0, \quad \text{and thus} \quad v = (T - λI)^s(u) = 0.

We conclude that Ker(T - λI)^s ∩ Range(T - λI)^s = {0}.

Now we will show that V = Ker(T - λI)^s ⊕ Range(T - λI)^s. Suppose z = z_1 + z_2 with z_1 ∈ Ker(T - λI)^s and z_2 ∈ Range(T - λI)^s. First we will show that z is uniquely represented in this way. Suppose also that z = z_1' + z_2' with z_1' ∈ Ker(T - λI)^s and z_2' ∈ Range(T - λI)^s. Then

z_1 + z_2 = z_1' + z_2'

and

z_1 - z_1' = z_2' - z_2.

But Ker(T - λI)^s ∩ Range(T - λI)^s = {0}, so z_1 - z_1' = 0 and z_2' - z_2 = 0, which means z_1 = z_1' and z_2 = z_2'. Now let Z = Ker(T - λI)^s ⊕ Range(T - λI)^s. We then know that

dim(Z) = dim(Ker(T - λI)^s) + dim(Range(T - λI)^s).

Also, the Rank-Nullity Theorem shows that

dim(V) = dim(Ker(T - λI)^s) + dim(Range(T - λI)^s).

So Z is a subspace of V with dim(Z)=dim(V). We conclude that Z=V and that

V = Ker(T - λI)^s ⊕ Range(T - λI)^s.

Next we demonstrate that Ker(T - λI)^s and Range(T - λI)^s are invariant under T. Note that

T(T - λI) = T^2 - λT = (T - λI)T,

so T commutes with T - λI. By induction, T commutes with (T - λI)^s. Suppose that v ∈ Ker(T - λI)^s. Then

(T - λI)^s(T(v)) = T((T - λI)^s(v)) = T(0) = 0.

So T(v) ∈ Ker(T - λI)^s. Similarly, suppose that v ∈ Range(T - λI)^s. Then there is a u ∈ V such that (T - λI)^s(u) = v. Then

T(v) = T((T - λI)^s(u)) = (T(T - λI)^s)(u) = ((T - λI)^s T)(u) = (T - λI)^s(T(u)),

and so T(v) ∈ Range(T - λI)^s.

We conclude our proof by induction on the number r of eigenvalues of T. Suppose that r = 1, so that T has exactly one eigenvalue λ. Then T - λI has only zero as an eigenvalue (otherwise, there is μ ≠ 0 such that (T - λI) - μI = T - (λ + μ)I has a nontrivial kernel, which makes λ + μ an eigenvalue of T). In this situation, T - λI is nilpotent and so (T - λI)^t = 0 for some positive integer t. If s is the smallest such power, then V = Ker(T - λI)^s and Range(T - λI)^s = {0}. So every vector in V is in Ker(T - λI)^s and

V = Ker(T - λI)^s ⊕ Range(T - λI)^s = Ker(T - λI)^s.

Thus, the statement is true when r = 1. Assume that the statement is true for linear transformations with fewer than r eigenvalues. Now assume that T has distinct eigenvalues λ_1, λ_2, …, λ_r. By our previous work, we know that

V = Ker(T - λ_1I)^{s_1} ⊕ Range(T - λ_1I)^{s_1}

for some positive integer s_1. Let V_1 = Range(T - λ_1I)^{s_1}. Since Range(T - λ_1I)^{s_1} is T invariant, we know that T maps V_1 to V_1. The eigenvalues of T on V_1 are λ_2, λ_3, …, λ_r. By our induction hypothesis, we have

V_1 = Ker(T - λ_2I)^{s_2} ⊕ Ker(T - λ_3I)^{s_3} ⊕ \cdots ⊕ Ker(T - λ_rI)^{s_r},

for some positive integers s_2, s_3, …, s_r, which makes

V = Ker(T - λ_1I)^{s_1} ⊕ Ker(T - λ_2I)^{s_2} ⊕ Ker(T - λ_3I)^{s_3} ⊕ \cdots ⊕ Ker(T - λ_rI)^{s_r}.

Lemma 40.14 tells us that T has a block diagonal form with block matrices down the diagonal. To obtain a Jordan canonical form, we need to identify the correct bases for the summands of V. (This lemma is due to Mark Wildon, from "A Short Proof of the Existence of Jordan Normal Form".)

Notice the similarity of Lemma 40.15 to chains of generalized eigenvectors. An example might help illustrate Lemma 40.15.

Activity 40.11.

Let T: P_5 → P_5 be defined by

T(a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4 + a_5t^5) = (-a_1 - a_4 + a_5)t + (-a_0 - a_1 + a_3 - a_4 + a_5)t^2 + (a_1 + a_4)t^4.

Let S={1,t,t2,t3,t4,t5} be the standard basis for P5. Then

A = [T]_S = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & -1 & 1 \\ -1 & -1 & 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Technology shows that the only eigenvalue of A is 0 and that the geometric multiplicity of 0 is 3. Since 0 is the only eigenvalue of A, we know that A (and T) is nilpotent. Using technology we find that the reduced row echelon forms of A and A^2 are, respectively,

\begin{bmatrix} 1 & 0 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},

while A^3 = 0. We see that dim(Ker(T^3)) = dim(Nul A^3) = 6, while dim(Ker(T^2)) = dim(Nul A^2) = 5.

(a)

Notice that the vector v1 = [0 0 0 0 0 1]^T is in Nul A^3 but not in Nul A^2. Use this vector to construct one chain u1, T(u1), T^2(u1) of generalized eigenvectors starting with a vector u1 that is in Ker(T^3) but not in Ker(T^2). What can we say about the vector T^2(u1) in relation to eigenvectors of T?

(b)

We know two other eigenvectors of T, so we need another chain of generalized eigenvectors to provide a basis of P_5 consisting of generalized eigenvectors. Use the fact that v2 = [1 0 0 0 0 0]^T is in Nul A^2 but not in Nul A to find another generalized eigenvector u2 in Ker(T^2) that is not in Ker(T). Then create a chain u2, T(u2) of generalized eigenvectors. What is true about T(u2) in relation to eigenvectors of T?

(c)

Let u3 = 1 + t^3 be a third eigenvector of T. Explain why {u1, T(u1), T^2(u1), u2, T(u2), u3} is a basis of P_5. Identify the values of k and the a_i in Lemma 40.12.

Notice that if we let C = [[T^2(u1)]_S \ [T(u1)]_S \ [u1]_S \ [T(u2)]_S \ [u2]_S \ [u3]_S] using the vectors from Activity 40.11, we should find that

C^{-1}AC = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},

and this basis provides a matrix that produces a Jordan canonical form of A.
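The nilpotency of A and the first generalized eigenvector chain can be checked numerically (a sketch assuming NumPy, with the entries of A as reconstructed in this section):

```python
import numpy as np

# Matrix of T from Activity 40.11, with the signs as reconstructed above.
A = np.array([[ 0,  0, 0, 0,  0, 0],
              [ 0, -1, 0, 0, -1, 1],
              [-1, -1, 0, 1, -1, 1],
              [ 0,  0, 0, 0,  0, 0],
              [ 0,  1, 0, 0,  1, 0],
              [ 0,  0, 0, 0,  0, 0]])

u1 = np.array([0, 0, 0, 0, 0, 1])       # in Nul A^3 but not in Nul A^2
chain = [u1, A @ u1, A @ (A @ u1)]      # u1, T(u1), T^2(u1)
```

The last vector in the chain is a nonzero vector killed by A, i.e., an eigenvector for the eigenvalue 0, which is exactly the head of the 3×3 Jordan block above.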

Lemma 40.15 provides the sequences of generalized eigenvectors that we need to make the block matrices in Jordan canonical form. This works as follows. Start, for example, with u1, T(u1), …, T^{a_1-1}(u1). Since T^s = 0, we know that T is nilpotent and so has only 0 as an eigenvalue. If T has a nonzero eigenvalue λ, we replace T with T - λI.

Let B = {T^{a_1-1}(u_1), T^{a_1-2}(u_1), …, T^2(u_1), T(u_1), u_1}. Then

[T(T^{a_1-1}(u_1))]_B = [T^{a_1}(u_1)]_B = [0 \ 0 \ 0 \ \cdots \ 0 \ 0]^T
[T(T^{a_1-2}(u_1))]_B = [T^{a_1-1}(u_1)]_B = [1 \ 0 \ 0 \ \cdots \ 0 \ 0]^T
[T(T^{a_1-3}(u_1))]_B = [T^{a_1-2}(u_1)]_B = [0 \ 1 \ 0 \ \cdots \ 0 \ 0]^T
  ⋮
[T(T^2(u_1))]_B = [T^3(u_1)]_B = [0 \ 0 \ 0 \ \cdots \ 0 \ 1 \ 0 \ 0 \ 0]^T
[T(T(u_1))]_B = [T^2(u_1)]_B = [0 \ 0 \ 0 \ \cdots \ 0 \ 0 \ 1 \ 0 \ 0]^T
[T(u_1)]_B = [0 \ 0 \ 0 \ \cdots \ 0 \ 0 \ 0 \ 1 \ 0]^T.

This makes

[T]_B = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix},

which gives one Jordan block.

Proof of Lemma 40.15.

If T(x) = 0 for all x ∈ V, then we can choose u_1, u_2, …, u_{dim(V)} to form any basis of V and take all a_i = 1. So we can assume that T is a nonzero transformation. Also, we claim that T(V) is a proper subset of V (recall that T(V) is the same as Range(T)). If not, then V = T(V) = T^2(V) = \cdots = T^m(V) for every positive integer m. But this contradicts the fact that T^s = 0.

We proceed by induction on dim(V). If dim(V) = 1, since T(V) is a proper subset of V the only possibility is that T(V) = {0}. We have already discussed this case. For the inductive step, assume that the lemma is true for any vector space of positive dimension less than dim(V). Our assumption that T is a nonzero transformation allows us to conclude that {0} ≠ T(V) ⊊ V. Thus, 1 ≤ dim(T(V)) < dim(V). We apply the inductive hypothesis to the transformation T: T(V) → T(V) to find integers b_1, b_2, …, b_ℓ and vectors v_1, v_2, …, v_ℓ in T(V) such that the vectors

(40.4)   v_1, T(v_1), …, T^{b_1-1}(v_1), …, v_ℓ, T(v_ℓ), …, T^{b_ℓ-1}(v_ℓ)

form a basis for T(V) and T^{b_i}(v_i) = 0 for 1 ≤ i ≤ ℓ.

Now v_1, v_2, …, v_ℓ are in T(V), so there exist u_1, u_2, …, u_ℓ in V such that T(u_i) = v_i for each 1 ≤ i ≤ ℓ. This implies that T^j(u_i) = T^{j-1}(T(u_i)) = T^{j-1}(v_i) for all i and all positive integers j. The vectors T^{b_1-1}(v_1), T^{b_2-1}(v_2), …, T^{b_ℓ-1}(v_ℓ) are linearly independent, and T(T^{b_i-1}(v_i)) = T^{b_i}(v_i) = 0 for 1 ≤ i ≤ ℓ, so the vectors T^{b_1-1}(v_1), T^{b_2-1}(v_2), …, T^{b_ℓ-1}(v_ℓ) are all in Ker(T). Extend the set {T^{b_1-1}(v_1), T^{b_2-1}(v_2), …, T^{b_ℓ-1}(v_ℓ)} to a basis of Ker(T) with vectors w_1, w_2, …, w_m, where ℓ + m = dim(Ker(T)). That is, the set

(40.5)   {T^{b_1-1}(v_1), T^{b_2-1}(v_2), …, T^{b_ℓ-1}(v_ℓ), w_1, w_2, …, w_m}

is a basis for Ker(T). We will now show that the vectors

(40.6) u_1, T(u_1), ..., T^{b_1}(u_1), ..., u_ℓ, T(u_ℓ), ..., T^{b_ℓ}(u_ℓ), w_1, ..., w_m

form a basis for V. To demonstrate linear independence, suppose that

(40.7) c_{1,0}u_1 + c_{1,1}T(u_1) + ⋯ + c_{1,b_1}T^{b_1}(u_1) + ⋯ + c_{ℓ,0}u_ℓ + c_{ℓ,1}T(u_ℓ) + ⋯ + c_{ℓ,b_ℓ}T^{b_ℓ}(u_ℓ) + d_1w_1 + ⋯ + d_mw_m = 0

for some scalars ci,j and dk. Apply T to this linear combination to obtain the vector equation

c_{1,0}T(u_1) + c_{1,1}T^2(u_1) + ⋯ + c_{1,b_1}T^{b_1+1}(u_1) + ⋯ + c_{ℓ,0}T(u_ℓ) + c_{ℓ,1}T^2(u_ℓ) + ⋯ + c_{ℓ,b_ℓ}T^{b_ℓ+1}(u_ℓ) + d_1T(w_1) + ⋯ + d_mT(w_m) = 0.

Using the relationship T^j(u_i)=T^{j-1}(v_i) gives us the equation

c_{1,0}v_1 + c_{1,1}T(v_1) + ⋯ + c_{1,b_1}T^{b_1}(v_1) + ⋯ + c_{ℓ,0}v_ℓ + c_{ℓ,1}T(v_ℓ) + ⋯ + c_{ℓ,b_ℓ}T^{b_ℓ}(v_ℓ) + d_1T(w_1) + ⋯ + d_mT(w_m) = 0.

Recall that T^{b_i}(v_i)=0 and that w_1, w_2, ..., w_m are in Ker(T) to obtain the equation

c_{1,0}v_1 + c_{1,1}T(v_1) + ⋯ + c_{1,b_1-1}T^{b_1-1}(v_1) + ⋯ + c_{ℓ,0}v_ℓ + c_{ℓ,1}T(v_ℓ) + ⋯ + c_{ℓ,b_ℓ-1}T^{b_ℓ-1}(v_ℓ) = 0.

But this final equation is a linear combination of the basis elements in (40.4) of T(V), and so the scalars are all 0. Replacing these scalars with 0 in (40.7) results in

c_{1,b_1}T^{b_1}(u_1) + ⋯ + c_{ℓ,b_ℓ}T^{b_ℓ}(u_ℓ) + d_1w_1 + ⋯ + d_mw_m = 0.

But this is a linear combination of vectors in a basis for Ker(T) and so all of these scalars are also 0. Hence, the vectors u_1, T(u_1), ..., T^{b_1}(u_1), ..., u_ℓ, T(u_ℓ), ..., T^{b_ℓ}(u_ℓ), w_1, ..., w_m are linearly independent.

The Rank-Nullity Theorem tells us that dim(V)=dim(T(V))+dim(Ker(T)). The vectors in (40.5) form a basis for Ker(T), and so dim(Ker(T))=ℓ+m. The vectors in (40.4) form a basis for T(V), so dim(T(V))=b_1+b_2+⋯+b_ℓ. Thus,

dim(V) = ℓ + m + b_1 + b_2 + ⋯ + b_ℓ = m + (b_1+1) + (b_2+1) + ⋯ + (b_ℓ+1).

But this is exactly the number of vectors in our claimed basis (40.6). This verifies Lemma 40.15 with k=ℓ+m, a_i=b_i+1 for 1≤i≤ℓ, u_{ℓ+j}=w_j and a_{ℓ+j}=1 for 1≤j≤m.

We return to Activity 40.10 to illustrate the use of Lemma 40.15.

Example 40.16.

We work with the transformation T:P_4→P_4 defined by

T(a0+a1t+a2t2+a3t3+a4t4)=(2a0+a1)+(a1a2)t+(a0+a1)t2+(a0a1+a2+2a3a4)t3+(2a4)t4.

Recall from Activity 40.10 that

P_4 = Ker(T-I)^3 ⊕ Ker(T-2I)^2

and a basis for Ker(T-I)^3 is B_1={p_1(t), p_2(t), p_3(t)} with p_1(t)=1+t, p_2(t)=1+t^2, and p_3(t)=1+t^3. Since

(T-I)(p_1(t))=0,  (T-I)(p_2(t))=p_1(t),  and  (T-I)(p_3(t))=p_2(t)

we see that T-I maps Ker(T-I)^3 to Ker(T-I)^3. The matrix of T-I with respect to B_1 is

[T-I]_{B_1} = [0 1 0; 0 0 1; 0 0 0].

The reduced row echelon forms of [T-I]_{B_1} and [T-I]_{B_1}^2 are

[0 1 0; 0 0 1; 0 0 0]  and  [0 0 1; 0 0 0; 0 0 0],

while [T-I]_{B_1}^3=0. This makes (T-I)^3=0. We apply Lemma 40.15 to T-I:Ker(T-I)^3→Ker(T-I)^3. We choose u_1 to be a vector in Ker(T-I)^3 that is not in Ker(T-I)^2. One such vector has [u_1]_{B_1}=[0 0 1]^T, or u_1=1+t^3. We then let u_2=(T-I)(u_1)=1+t^2 and u_3=(T-I)(u_2)=1-t. This gives us the basis {1-t, 1+t^2, 1+t^3} for Ker(T-I)^3.

We can also apply Lemma 40.15 to T-2I:Ker(T-2I)^2→Ker(T-2I)^2. Since

(T-2I)(p_4(t))=0  and  (T-2I)(p_5(t))=p_4(t),

we have that

[T-2I]_{B_2} = [0 1; 0 0].

It follows that (T-2I)^2=0 on this subspace. Selecting u_4=t^4 and letting u_5=(T-2I)(u_4)=p_4(t), we obtain the basis {t^3, t^4} for Ker(T-2I)^2.

Let q_1(t)=1-t, q_2(t)=1+t^2, q_3(t)=1+t^3, q_4(t)=t^3, and q_5(t)=t^4, and let C={q_1(t), q_2(t), q_3(t), q_4(t), q_5(t)}. Since

T(q_1(t)) = q_1(t)
T(q_2(t)) = q_1(t) + q_2(t)
T(q_3(t)) = q_2(t) + q_3(t)
T(q_4(t)) = 2q_4(t)
T(q_5(t)) = q_4(t) + 2q_5(t)

it follows that

[T]_C = [1 1 0 0 0; 0 1 1 0 0; 0 0 1 0 0; 0 0 0 2 1; 0 0 0 0 2],

and we have found a basis for P_4 with respect to which the matrix of T is in Jordan canonical form.
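The structure found in this example can be double-checked numerically. The sketch below (assuming NumPy; the matrix [T]_C is copied from the display above) verifies that (T-I)^3(T-2I)^2 vanishes and that the two kernels have dimensions 3 and 2, as the direct sum decomposition requires:

```python
import numpy as np

# The matrix of T with respect to the basis C found above.
J = np.array([[1., 1., 0., 0., 0.],
              [0., 1., 1., 0., 0.],
              [0., 0., 1., 0., 0.],
              [0., 0., 0., 2., 1.],
              [0., 0., 0., 0., 2.]])
I = np.eye(5)
P3 = np.linalg.matrix_power(J - I, 3)
Q2 = np.linalg.matrix_power(J - 2 * I, 2)

# (T-I)^3 (T-2I)^2 = 0, reflecting P4 = Ker(T-I)^3 (+) Ker(T-2I)^2.
assert np.allclose(P3 @ Q2, 0)

# Nullities: dim Ker(T-I)^3 = 3 and dim Ker(T-2I)^2 = 2.
assert np.linalg.matrix_rank(P3) == 2   # 5 - 3
assert np.linalg.matrix_rank(Q2) == 3   # 5 - 2
```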

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 40.17.

Find a Jordan form J for each of the following matrices. Find a matrix C such that C^{-1}AC=J.

(a)

A = [1 1; 1 1]

Solution.

The eigenvalues of A are 2 and 0 with corresponding eigenvectors [1 1]^T and [1 -1]^T. Since we have a basis for R^2 consisting of eigenvectors of A, we know that A is diagonalizable. Moreover, a Jordan canonical form of A is J=[2 0; 0 0] and C^{-1}AC=J, where C=[1 1; 1 -1].

(b)

A = [0 1 1; 0 0 0; 0 0 0]

Solution.

Since A is upper triangular, its eigenvalues are the diagonal entries. So the only eigenvalue of A is 0, and technology shows that this eigenvalue has geometric multiplicity 2. An eigenvector for A is v_1=[1 0 0]^T. A vector v_2 that satisfies Av_2=v_1 is v_2=[0 0 1]^T, and a second eigenvector of A independent of v_1 is [0 1 -1]^T. Letting C=[1 0 0; 0 0 1; 0 1 -1], whose columns are v_1, v_2, and this second eigenvector, gives us

C^{-1}AC = [0 1 0; 0 0 0; 0 0 0].
(c)

A = [1 1 1 1; 0 1 0 0; 0 0 2 2; 0 0 0 2]

Solution.

Again, A is upper triangular, so the eigenvalues of A are 2 and 1, both of algebraic multiplicity 2 and geometric multiplicity 1. Technology shows that the reduced row echelon forms of A-2I_4 and (A-2I_4)^2 are

[1 0 -1 0; 0 1 0 0; 0 0 0 1; 0 0 0 0]  and  [1 0 -1 1; 0 1 0 0; 0 0 0 0; 0 0 0 0].

Now v_2=[-1 0 0 1]^T is in Nul (A-2I_4)^2, and

v_1 = (A-2I_4)v_2 = [2 0 2 0]^T.

Notice that v_1 is an eigenvector of A with eigenvalue 2. Technology also shows that the reduced row echelon forms of A-I_4 and (A-I_4)^2 are

[0 1 0 0; 0 0 1 0; 0 0 0 1; 0 0 0 0]  and  [0 0 1 0; 0 0 0 1; 0 0 0 0; 0 0 0 0].

Now v_4=[0 1 0 0]^T is in Nul (A-I_4)^2, and

v_3 = (A-I_4)v_4 = [1 0 0 0]^T.

Notice that v_3 is an eigenvector of A with eigenvalue 1. Letting C=[2 -1 1 0; 0 0 0 1; 2 0 0 0; 0 1 0 0] gives us

C^{-1}AC = [2 1 0 0; 0 2 0 0; 0 0 1 1; 0 0 0 1].
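A quick numerical check of this computation (a sketch assuming NumPy; note that the sign in v_2=[-1 0 0 1]^T matters, since v_2 must solve (A-2I_4)^2 v = 0):

```python
import numpy as np

A = np.array([[1., 1., 1., 1.],
              [0., 1., 0., 0.],
              [0., 0., 2., 2.],
              [0., 0., 0., 2.]])

# Chains of generalized eigenvectors.
v2 = np.array([-1., 0., 0., 1.])       # in Nul (A - 2I)^2
v1 = (A - 2 * np.eye(4)) @ v2          # eigenvector for lambda = 2
v4 = np.array([0., 1., 0., 0.])        # in Nul (A - I)^2
v3 = (A - np.eye(4)) @ v4              # eigenvector for lambda = 1

C = np.column_stack([v1, v2, v3, v4])
J = np.linalg.inv(C) @ A @ C

expected = np.array([[2., 1., 0., 0.],
                     [0., 2., 0., 0.],
                     [0., 0., 1., 1.],
                     [0., 0., 0., 1.]])
assert np.allclose(J, expected)
```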
Figure 40.18. The effect of the transformation T.

Example 40.19.

Let T(x)=Ax, where A=[011012112]. Assume that P1AP=[110011001], where P=[104024120]. Find a specific coordinate system in which it is possible to succinctly describe the action of T, then describe the action of T on R3 in as much detail as possible.

Solution.

First note that 1 is the only eigenvalue of A. Since A does not have 0 as an eigenvalue, it follows that A is invertible and so T is both one-to-one and onto. Let v1=[4 4 0]T, v2=[0 2 2]T, and v3=[1 0 1]T. We have that

(AI3)v3=v2(AI3)v2=v1(AI3)v1=0.

and so

T(v3)=Av3=v3+v2T(v2)=Av2=v2+v1T(v1)=Av1=v1.

If we consider the coordinate system in R^3 defined by the basis {v_1, v_2, v_3} as shown in blue in Figure 40.18, the fact that T(v_1)=v_1 shows that T fixes all vectors in Span{v_1}. That T(v_2)=v_2+v_1 tells us that T maps Span{v_2} onto Span{v_1+v_2}, and T(v_3)=v_3+v_2 shows that T maps Span{v_3} onto Span{v_2+v_3}. So T sends the box defined by v_1, v_2, and v_3 onto the box defined by v_1, v_1+v_2, and v_2+v_3 (in red in Figure 40.18). So the action of T is conveniently viewed in the coordinate system determined by the columns of a matrix P that converts A into its Jordan canonical form.

Subsection Summary

  • Any square matrix A is similar to a Jordan canonical form

    [J_1 0 ⋯ 0; 0 J_2 ⋯ 0; ⋮ ⋮ ⋱ ⋮; 0 0 ⋯ J_t],

    where each matrix Ji is a Jordan block of the form

    [λ 1 0 ⋯ 0 0; 0 λ 1 ⋯ 0 0; ⋮ ⋱ ⋱ ⋮; 0 0 0 ⋯ λ 1; 0 0 0 ⋯ 0 λ],

    where λ is an eigenvalue of A.

  • A generalized eigenvector of an n×n matrix A corresponding to an eigenvalue λ of A is a non-zero vector x satisfying

    (A-λI_n)^m x = 0

    for some positive integer m. If A is an n×n matrix, then we can find a basis of R^n consisting of generalized eigenvectors v_1, v_2, ..., v_n of A so that the matrix C=[v_1 v_2 ⋯ v_n] has the property that C^{-1}AC is a Jordan canonical form.

  • A vector space V is a direct sum of subspaces V1, V2, , Vm if every vector v in V can be written uniquely as a sum

    v = v_1 + v_2 + ⋯ + v_m,

    with viVi for each i.

  • A square matrix A is nilpotent if and only if 0 is the only eigenvalue of A. The Jordan form of a matrix A can always be written in the form D+N, where D is a diagonal matrix and N is a nilpotent matrix.

  • A subspace W of a vector space V is invariant under a linear transformation T:VV if T(w)W whenever w is in W.
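The nilpotency bullet above can be illustrated with a short computation (a sketch assuming NumPy; the matrix J is a made-up Jordan form, not from the text):

```python
import numpy as np

# A made-up matrix already in Jordan canonical form.
J = np.array([[2., 1., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 3., 0.],
              [0., 0., 0., 3.]])

# Split J = D + N with D diagonal and N nilpotent.
D = np.diag(np.diag(J))
N = J - D
assert np.allclose(np.linalg.matrix_power(N, 2), 0)   # N is nilpotent

# N has 0 as its only eigenvalue, consistent with the
# characterization of nilpotent matrices.
assert np.allclose(np.linalg.eigvals(N), 0)

# For a Jordan form the two pieces commute, since the ones in N
# sit inside blocks on which D is constant.
assert np.allclose(D @ N, N @ D)
```

The commuting of D and N is what makes the D + N split useful for matrix exponentials later in this section.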

Exercises Exercises

1.

Find a Jordan canonical form for each of the following matrices.

(a)

A=[1061212141214]

(b)

B=[8410901410]

(c)

C=[6002428814]

(d)

D=[3321233201215864]

2.

Let a≠0. Show that the Jordan canonical form of [4 1 0 0; 0 4 a 0; 0 0 4 0; 0 0 0 4] is independent of the value of a.

3.

Show that the Jordan canonical form of [2 1 0 0; 0 2 1 0; 0 0 2 a; 0 0 0 4] is independent of the value of a.

4.

Let A and B be similar matrices with B=Q1AQ. Let U=C11AC1 be the Jordan canonical form of A. Show that U=C21BC2 is also the Jordan canonical form of B with C2=Q1C1.

5.

Find all of the Jordan canonical forms for 2×2 matrices and 3×3 matrices.

6.

Find the Jordan canonical form of [1221201000001120201001011].

7.

For the matrix A, find a matrix C so that J=C^{-1}AC is the Jordan canonical form of A in which, first, the diagonal entries do not increase as we move down the matrix J and, second, the Jordan blocks do not increase in size as we move down the matrix J.

A=[3111111111111111111111311111111111111111000022000000000220000000002200111111131122222222401111111113].

8.

A polynomial in two variables is an object of the form

p(s,t) = Σ a_{ij} s^i t^j

where i and j are integers greater than or equal to 0. For example,

q(s,t) = 3 + 2st + 4s^2t - 10st^3 + 5s^2t^2

is a polynomial in two variables. The degree of a monomial sitj is i+j. The degree of a polynomial p(s,t) is the largest degree of any monomial summand of p(s,t). So the degree of the example polynomial q(s,t) is 4. Two polynomials in V are equal if they have the same coefficients on like terms. We add two polynomials in the variables s and t by adding coefficients of like terms. Scalar multiplication is done by multiplying each coefficient by the given scalar. Let V be the set of all polynomials of two variables of degree less than or equal to 2. You may assume that V is a vector space under this addition and multiplication by scalars and has the standard additive identity and additive inverses. Define F:VV and G:VV by

F(p(s,t)) = s·p_s(s,t) + p_t(s,t)

and

G(p(s,t)) = p_s(s,t) + t·p_t(s,t).

That is,

F(a0+a1s+a2t+a3st+a4s2+a5t2)=s(a1+a3t+2a4s)+(a2+a3s+2a5t)=a2+(a1+a3)s+2a5t+a3st+2a4s2

and

G(a0+a1s+a2t+a3st+a4s2+a5t2)=(a1+a3t+2a4s)+t(a2+a3s+2a5t)=a1+2a4s+(a2+a3)t+a3st+2a5t2.

You may assume that F and G are linear transformations.

(a)

Explain why B={1,s,t,st,s2,t2} is a basis for V.

(b)

Explain why F and G are not diagonalizable.

(c)

Show that [F]B and [G]B have the same Jordan canonical form. (So different transformations can have the same Jordan canonical form.)

(d)

Find ordered bases CF={f1,f2,f3,f4,f5,f6} and CG={g1,g2,g3,g4,g5,g6} of V for which [F]CF and [G]CG are the Jordan canonical form from (c).

(e)

Recall that a linear transformation can be defined by its action on a basis. So define H:VV satisfying H(gi)=fi for i from 1 to 6. Show that H is invertible and that G=H1FH. This is the essential argument to show that any two linear transformations with the same matrix with respect to some bases are similar.

Hint.

What matrix is [H]CGCF?

9.

Let

v_p --(A-λI)--> v_{p-1} --(A-λI)--> ⋯ --(A-λI)--> v_1 --(A-λI)--> 0

be a chain of generalized eigenvectors for a matrix A corresponding to the eigenvalue λ. In this problem we show that the set {v_1, v_2, ..., v_p} is linearly independent.

(a)

Explain why v_k is in Nul (A-λI)^k for any k between 1 and p.

Hint.

Show that v_1=(A-λI)^{k-1}v_k for each k from 2 to p.

(b)

Consider the equation

(40.8) x_1v_1 + x_2v_2 + ⋯ + x_pv_p = 0

for scalars x_1, x_2, ..., x_p.

(i)

Multiply both sides of (40.8) on the left by (A-λI)^{p-1}. Explain why we can then conclude that x_p=0.

(ii)

Rewrite (40.8) using the result from part i. Explain how we can then demonstrate that x_{p-1}=0.

(iii)

Describe how we can use the process in parts i. and ii. to show that x_1=x_2=⋯=x_p=0. What does this tell us about the set {v_1, v_2, ..., v_p}?

10.

Find, if possible, a matrix transformation T:R3R3 for which there is a one-dimensional invariant subspace but no two-dimensional invariant subspace. If not possible, explain why.

11.

Let A=[2 1; -1 0]. It is the case that A has a single eigenvalue of algebraic multiplicity 2 and geometric multiplicity 1.

(a)

Find a matrix C whose columns are generalized eigenvectors for A so that C1AC=J is in Jordan canonical form.

(b)

Determine the entries of J^k and then find the entries of A^k for any positive integer k.

12.

Let v1, v2, , vr be eigenvectors of an n×n matrix A, and let W=Span{v1,v2,,vr}. Show that W is invariant under the matrix transformation T defined by T(x)=Ax.

13.

Let A be an n×n matrix with eigenvalue λ. Let S:RnRn be defined by S(x)=Bx for some matrix B. Show that if B commutes with A, then the eigenspace Eλ of A corresponding to the eigenvalue λ is invariant under S.

Hint.

Show that (A-λI)S(x)=0.

14.

Determine which of the following matrices are nilpotent. Justify your answers. For each nilpotent matrix, find its index.

(a)

[2142]

(c)

[110011132]

(d)

[xy x^2; -y^2 -xy] for any real numbers x and y

15.

Find two different nonzero 3×3 nilpotent matrices whose index is 2. If no such matrices exist, explain why.

16.

Find, if possible, 4×4 matrices whose indices are 1, 2, 3, and 4. If not possible, explain why.

17.

Let V be an n-dimensional vector space and let W be a subspace of V. Show that V=W⊕W^⊥.

Hint.

Use a projection onto a subspace.

18.

Let W be a subspace of an n-dimensional vector space V that is invariant under a transformation T:VV.

(a)

Show by example that W^⊥ need not be invariant under T.

(b)

Show that if T is an isometry, then W^⊥ is invariant under T.

19.

Let T:P_2→P_2 be defined by T(p(t))=p(t)-p′(t), where p′(t) is the derivative of p(t). That is, if p(t)=a_0+a_1t+a_2t^2, then p′(t)=a_1+2a_2t. Find a basis B for P_2 in which the matrix [T]_B is in Jordan canonical form.

20.

Let A be an n×n nilpotent matrix with index m. Since A^{m-1} is not zero, there is a vector v∈R^n such that A^{m-1}v≠0. Prove that the vectors v, Av, A^2v, ..., A^{m-1}v are linearly independent.

21.

(a)

Let A=[213211213].

(i)

Show that A is nilpotent.

Hint.

Calculate powers of A.

(ii)

Calculate the matrix product (I_3-A)(I_3+A+A^2). What do you notice and what does this tell us about I_3-A?

(b)

If A is an n×n nilpotent matrix, show that I-A is nonsingular, where I is the n×n identity matrix.

22.

In this exercise we show that every upper triangular matrix satisfies its characteristic polynomial.

(a)

To illustrate how this will work, consider a 3×3 example. Let T=[λ_1 a b; 0 λ_2 c; 0 0 λ_3].

(i)

What is the characteristic polynomial p(x) of T?

(ii)

Consider the matrices S_1=T-λ_1I_3, S_2=T-λ_2I_3, and S_3=T-λ_3I_3. Show that the first column of S_1 is 0, that the first two columns of S_1S_2 are 0, and that every column of S_1S_2S_3 is 0. Conclude that T satisfies its characteristic polynomial.

(b)

Now we consider the general case. Suppose T is an n×n upper triangular matrix with diagonal entries λ_1, λ_2, ..., λ_n in order. For i from 1 to n let S_i=T-λ_iI_n. Show that for each k from 1 to n, the first k columns of S_1S_2⋯S_k are equal to 0. Then explain how this demonstrates that T satisfies its characteristic polynomial.

23.

Prove Theorem 40.12 that a square matrix A is nilpotent if and only if 0 is the only eigenvalue of A.

Hint 1.

For one direction, use the Cayley-Hamilton Theorem.

Hint 2.

If A is nilpotent and x is an eigenvector of A, what is A^m x for every positive integer m? If 0 is the only eigenvalue of A, what is the characteristic polynomial of A?

24.

Let V be a vector space that is a direct sum V=V_1⊕V_2⊕⋯⊕V_m for some positive integer m. Prove the following.

(a)

V_i∩V_j={0} whenever i≠j.

(b)

If V is finite dimensional, and if B_i is a basis for V_i, then the set B=B_1∪B_2∪⋯∪B_m is a basis for V.

(c)

dim(V) = dim(V_1) + dim(V_2) + ⋯ + dim(V_m).

25.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

If A is an n×n nilpotent matrix, then the index of A is n.

(b) True/False.

The Jordan canonical form of a matrix is unique.

(c) True/False.

Every nilpotent matrix is singular.

(d) True/False.

Eigenvectors of a linear transformation T are also generalized eigenvectors of T.

(e) True/False.

It is possible for a generalized eigenvector of a matrix A to correspond to a scalar that is not an eigenvalue for A.

(f) True/False.

The vectors in a cycle of generalized eigenvectors of a matrix are linearly independent.

(g) True/False.

A Jordan canonical form of a diagonal matrix A is A.

(h) True/False.

Let V be a finite-dimensional vector space and let T be a linear transformation from V to V. Let J be a Jordan canonical form for T. If B is a basis of V, then a Jordan canonical form of [T]B is J.

(i) True/False.

Matrices with the same Jordan canonical form are similar.

(j) True/False.

Let V be a finite-dimensional vector space and let T be a linear transformation from V to V. If B is an ordered basis for V, then T is nilpotent if and only if [T]B is nilpotent.

(k) True/False.

If T is a linear transformation whose only eigenvalue is 0, then T is the zero transformation.

(l) True/False.

Let V be a finite-dimensional vector space and let T be a linear transformation from V to V. Then V=T(V)⊕Ker(T).

(m) True/False.

Let V be a finite-dimensional vector space and let T be a linear transformation from V to V. If B is an ordered basis for V of generalized eigenvectors of T, then [T]B is a Jordan canonical form for T.

Subsection Project: Modeling an Epidemic

The COVID-19 epidemic has generated many mathematical and statistical models to try to understand the spread of the virus. In this project we examine a simple stochastic model of the spread of an epidemic proposed by Norman Bailey in 1950. 64  This is a model of a relatively mild epidemic in which no one dies from the disease. Of course, mathematicians build on simple models to form more complicated and realistic ones, but this is a good, and accessible, starting point. Bailey writes about the difficulties in the stochastic 65  analysis of epidemics. For example, the overall epidemic “can often be broken down into smaller epidemics occurring in separate regional subdivisions” and these regional epidemics “are not necessarily in phase and often interact with each other”. This is behavior that has been clearly evident in the COVID-19 epidemic in the US. Even within a single district, “it is obvious that a given infectious individual has not the same chance of infecting each inhabitant. He will probably be in close contact with a small number of people only, perhaps of the order of 10-50, depending on the nature of his activities.” But then the epidemic for the whole district will “be built up from epidemics taking place in several relatively small groups of associates and acquaintances.” So we can see in this analysis that an epidemic can spread from small, localized areas.

Bailey begins by considering a community of n persons who are susceptible to a disease, and supposes introducing a single infected individual into the community. Bailey makes the following assumptions: “We shall assume that the infection spreads by contact between the members of the community, and that it is not sufficiently serious for cases to be withdrawn from circulation by isolation or death; also that no case becomes clear of infection during the course of the main part of the epidemic.”

To see how Bailey's model is constructed, we make some assumptions. We split the population at any time into two groups, those infected with the disease, and those susceptible, i.e., not currently infected. We assume that once an individual is infected, that individual is always infected. A person catches the disease by interacting with an infected individual. That is, if a susceptible individual meets an infected individual, the chance that the susceptible person contracts the disease is β (the infection rate). For example, there may be a 5% chance that an encounter between a susceptible individual and an infected individual results in the susceptible individual contracting the disease. (Of course, there are many variables involved in such an interaction — if no one is wearing masks and the interaction involves close contact, the infection rate would be higher than if everyone practices social distancing. For the sake of simplicity, we assume one infection rate for the entire population.) With this simple model we assume homogeneous mixing in the population. That is, it is equally likely that any one individual will interact with any other in a given time frame. Let y be the number of susceptible individuals in the population at time s. Then the number of infected individuals is the total population minus y, or n+1-y (recall that we introduced an infected individual into the population). So the change in the number of susceptible individuals in the population at time s is -βy(n+1-y) (since the y susceptible individuals interact with the n+1-y infected individuals). That is, dy/ds=-βy(n+1-y). Bailey makes the substitution of t for βs to simplify the equation. This substitution makes ds=(1/β)dt, which produces the differential equation β(dy/dt)=-βy(n+1-y). The βs cancel, producing the differential equation

(40.9) dy/dt = -y(n+1-y)

with y(0)=n.

The above analysis is just to provide some background. Bailey's stochastic model deals with probabilities instead of actual numbers, so we now take that approach. For r from 0 to n, let p_r(t) be the probability that there are r susceptible individuals still uninfected at time t. Similar to the case described above, if there are r susceptible individuals, any one of them can become infected by interacting with the n-r+1 infected individuals. With that in mind, Bailey's model of the spread of the disease is the system

dp_r(t)/dt = (r+1)(n-r)p_{r+1}(t) - r(n-r+1)p_r(t)  for 0≤r≤n-1
dp_n(t)/dt = -np_n(t).

That is,

dp_0(t)/dt = np_1(t)
dp_1(t)/dt = 2(n-1)p_2(t) - np_1(t)
dp_2(t)/dt = 3(n-2)p_3(t) - 2(n-1)p_2(t)
  ⋮
dp_{n-1}(t)/dt = np_n(t) - 2(n-1)p_{n-1}(t)
dp_n(t)/dt = -np_n(t).

Since we start with n susceptible individuals, at time 0 we have p_r(0)=0 for 0≤r≤n-1 and p_n(0)=1.

This system can be written in matrix form. Let P(t)=[p_0(t) p_1(t) ⋯ p_n(t)]^T. Assuming that we can differentiate a vector function component-wise, our system becomes

dP(t)dt=AP(t)

with initial condition P(0)=[0 0 ⋯ 0 1]^T, where

A = [ 0    n       0        0       ⋯       0        0
      0   -n       2(n-1)   0       ⋯       0        0
      0    0      -2(n-1)   3(n-2)  ⋯       0        0
      ⋮                     ⋱       ⋱                ⋮
      0    0       0        ⋯     -2(n-1)   n
      0    0       0        ⋯       0      -n ],

the (n+1)×(n+1) upper bidiagonal matrix whose row r (for r=0, 1, ..., n) has diagonal entry -r(n-r+1) and superdiagonal entry (r+1)(n-r).
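The pattern of entries in A is easy to generate programmatically. Here is a sketch (assuming NumPy; the function name build_bailey_matrix is ours, not from Bailey) that builds A for any n; note that each column of A sums to 0, which reflects the fact that the probabilities p_r(t) always sum to 1:

```python
import numpy as np

def build_bailey_matrix(n):
    """(n+1) x (n+1) coefficient matrix of Bailey's system dP/dt = AP.

    Row r encodes dp_r/dt = (r+1)(n-r) p_{r+1} - r(n-r+1) p_r.
    """
    A = np.zeros((n + 1, n + 1))
    for r in range(n + 1):
        A[r, r] = -r * (n - r + 1)
        if r < n:
            A[r, r + 1] = (r + 1) * (n - r)
    return A

A = build_bailey_matrix(5)
# Diagonal entries -r(n-r+1) for n = 5: 0, -5, -8, -9, -8, -5.
assert np.allclose(np.diag(A), [0, -5, -8, -9, -8, -5])
# Each column sums to 0, so the total probability is conserved.
assert np.allclose(A.sum(axis=0), 0)
```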

The solution to this system involves a matrix exponential eAt. The matrix exponential acts much like our familiar exponential function in that

(d/dt)e^{At} = Ae^{At}.

With this in mind it is not difficult to see that Bailey's system has solution P(t)=eAtP(0). To truly understand this solution, we need to make sense of the matrix exponential eAt.

We can make sense of the matrix exponential by utilizing the Taylor series of the exponential function centered at the origin. From calculus we know that

(40.10) e^x = 1 + x + (1/2!)x^2 + (1/3!)x^3 + ⋯ + (1/n!)x^n + ⋯ = Σ_{k≥0} (1/k!)x^k.

Since powers of square matrices are defined, the matrix exponential eM of a square matrix M is then

(40.11) e^M = I_n + M + (1/2!)M^2 + (1/3!)M^3 + ⋯ + (1/n!)M^n + ⋯ = Σ_{k≥0} (1/k!)M^k.

Just as in calculus, this series converges for any square matrix M, so e^M is defined for every square matrix.

Project Activity 40.12.

If M is diagonalizable, then e^M can be found fairly easily. Assume that there is an invertible matrix P such that P^{-1}MP=D, where D = [λ_1 0 ⋯ 0; 0 λ_2 ⋯ 0; ⋮ ⋱ ⋮; 0 0 ⋯ λ_n] is a diagonal matrix.

(b)

Now show that

e^M = P[e^{λ_1} 0 ⋯ 0; 0 e^{λ_2} ⋯ 0; ⋮ ⋱ ⋮; 0 0 ⋯ e^{λ_n}]P^{-1}.
Hint.

What is Dk for any positive integer k? Then add corresponding components and compare to (40.10).

As we will see, the matrix A in Bailey's model is not diagonalizable, so we need to learn how to consider the matrix exponential in this new situation. In these cases we can utilize the Jordan canonical form. Of course, the computations are more complicated. We first illustrate with the 2×2 case.

Project Activity 40.13.

Assume A is a 2×2 matrix with Jordan canonical form J=[λ 1; 0 λ], where C^{-1}AC=J. The same argument as above shows that

e^A = Ce^JC^{-1}

and so we only have to be able to find eJ.

(a)

We can write J in the form J=D+N, where D is a diagonal matrix and N is a nilpotent matrix. Find D and N in this case and explain why N is a nilpotent matrix.

(b)

Find the index of N. That is, find the smallest positive power s of N such that N^s=0. Then use (40.11) to find e^N.

(c)

Assume that the matrix exponential satisfies the standard property of exponential functions that e^{R+S}=e^Re^S whenever the n×n matrices R and S commute (as D and N do here). Use this property to explain why

e^J = [e^λ e^λ; 0 e^λ].
(d)

Use the previous information to calculate e^M where M=[3 1; -4 7].

Now we turn to the general case of finding the matrix exponential eM when M is not diagonalizable. Suppose C is an invertible matrix and C1MC=J, where J is a Jordan canonical form of M. Then we have

e^M = e^{CJC^{-1}} = Ce^JC^{-1}.

We know that J can be written in the form D+N, where D is a diagonal matrix and N is an upper triangular matrix with zeros on the diagonal and some ones along the superdiagonal. It follows that

e^M = Ce^De^NC^{-1}.

We know that if

D = [λ_1 0 ⋯ 0; 0 λ_2 ⋯ 0; ⋮ ⋱ ⋮; 0 0 ⋯ λ_n],

then

e^D = [e^{λ_1} 0 ⋯ 0; 0 e^{λ_2} ⋯ 0; ⋮ ⋱ ⋮; 0 0 ⋯ e^{λ_n}].

So finding eM boils down to determining eN.

Now N has only zero as an eigenvalue, so N is nilpotent. If k is the index of N, then

e^N = I + N + (1/2!)N^2 + (1/3!)N^3 + ⋯ + (1/(k-1)!)N^{k-1}.

Project Activity 40.14.

Let us apply this analysis to a specific case of Bailey's model, with n=5.

(a)

Find the entries of A when n=5.

(b)

Let J be a Jordan canonical form for A. Explain why the solution to Bailey's model has the form

P(t) = e^{At}P(0) = Ce^{Dt}e^{Nt}C^{-1}P(0),

where C is an invertible matrix, Dt is a diagonal matrix, and Nt is a nilpotent matrix.

(c)

Find a Jordan canonical form J for A.

(d)

Find a diagonal matrix D and a nilpotent matrix N so that J=D+N.

(e)

Find eDt and eNt. Then show that

P(t)=[1+1723e5t80te5t+1253e8t200te8t100e9t2203e5t+80te5t3203e8t+320te8t+180e9t10e5t+80e8t120te8t90e9t103e5t403e8t+10e9t53e5t53e8te5t].

One use for mathematical models like Bailey's is to make predictions that can help set policies. Recall that we made the substitution of t for βs in our original equation in order to make the equations dimensionless, where β is the infection rate — the rate at which people catch the disease. Let us replace t with βs in our solution and analyze the effect of changing the value of β.

Project Activity 40.15.

If β=1, then the disease is easily transmitted from person to person. If β can be made smaller, then the disease is not so easily transmitted. We continue to work with the case where n=5. Plot the curves p0(t) through p5(t) on the same set of axes for the following values of β:

(a) β=1,  (b) β=0.5,  (c) β=0.25.

Explain what you see and how this might be related to the phrase “flattening the curve” used during the COVID-19 pandemic of 2020.

In general, the matrix A has eigenvalues of algebraic multiplicity 1 and 2. When n is even, n+1 is odd and the eigenvalues of A occur in pairs except for the single eigenvalue 0 of multiplicity 1. When n is odd, n+1 is even and we have two eigenvalues of multiplicity 1: 0 and the eigenvalue -r(n-r+1) for r=(n+1)/2, at which r(n-r+1)=(n+1)^2/4. It can be shown, although we won't do it here, that every eigenvalue of algebraic multiplicity 2 has geometric multiplicity 1. This information completely determines the Jordan canonical form of A.
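Since A is upper triangular, its eigenvalues are its diagonal entries -r(n-r+1), so the multiplicity pattern just described can be checked in a few lines of Python (the function name diagonal_eigenvalues is ours):

```python
from collections import Counter

def diagonal_eigenvalues(n):
    """Eigenvalues of Bailey's (triangular) matrix A: -r(n-r+1), r = 0..n."""
    return [-r * (n - r + 1) for r in range(n + 1)]

# n even: 0 has multiplicity 1 and every other eigenvalue is repeated.
counts4 = Counter(diagonal_eigenvalues(4))
assert counts4[0] == 1
assert all(c == 2 for val, c in counts4.items() if val != 0)

# n odd: two eigenvalues of multiplicity 1, namely 0 and -(n+1)^2/4.
counts5 = Counter(diagonal_eigenvalues(5))
assert counts5[0] == 1 and counts5[-9] == 1     # -(5+1)^2/4 = -9
assert all(c == 2 for val, c in counts5.items() if val not in (0, -9))
```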

Bailey, N. T. J. (1950) A simple stochastic epidemic. Biometrika 37
The word stochastic refers to quantities that have a random pattern that can be analyzed statistically, but generally cannot be predicted precisely.