By the end of this section, you should be able to give precise and thorough answers to the questions listed below. You may want to keep these questions in mind to focus your thoughts as you complete the section.
What is the characteristic polynomial of a matrix?
What is the characteristic equation of a matrix?
How and why is the characteristic equation of a matrix useful?
How many different eigenvalues can an $n \times n$ matrix have?
How large can the dimension of the eigenspace corresponding to an eigenvalue be?
Pour cream into your cup of coffee and the cream spreads out; straighten up your room and it soon becomes messy again; when gasoline is mixed with air in a car's cylinders, it explodes if a spark is introduced. In each of these cases a transition from an ordered, low-entropy state (your room is straightened up) to a disordered, higher-entropy state (a messy, disorganized room) occurs. This behavior is described by entropy, a measure of the disorder in a system. Low entropy is organized (like ice cubes) and high entropy is not (like water vapor). It is a fundamental property of nature (as described by the second law of thermodynamics) that the entropy of an isolated system cannot decrease. In other words, in the absence of any external intervention, things never become more organized.
The Ehrenfest model [1] is a Markov process proposed to explain the statistical interpretation of the second law of thermodynamics using the diffusion of gas molecules. This process can be modeled as a problem of balls and bins, as we will do later in this section. The characteristic polynomial of the transition matrix will help us find the eigenvalues and allow us to analyze our model.
We have seen that the eigenvalues of an $n \times n$ matrix $A$ are the scalars $\lambda$ so that $A - \lambda I_n$ has a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not invertible, we can also say that $\lambda$ is an eigenvalue of $A$ if

$$\det(A - \lambda I_n) = 0.$$
This equation is called the characteristic equation of $A$. It provides an algebraic way to find eigenvalues, which can then be used in finding the eigenvectors corresponding to each eigenvalue. Suppose, for example, that we want to find the eigenvalues of a $2 \times 2$ matrix $A$. Note that $A - \lambda I_2$ is again a $2 \times 2$ matrix, so its determinant $\det(A - \lambda I_2)$ is a quadratic polynomial in $\lambda$. Hence, the eigenvalues of $A$ are the solutions of the quadratic characteristic equation $\det(A - \lambda I_2) = 0$, which we can find using the quadratic formula.
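To see this process in practice, here is a minimal SymPy sketch that forms $\det(A - \lambda I)$ and solves the characteristic equation; the matrix below is a hypothetical $2 \times 2$ example, not the matrix from the discussion above.

```python
# A minimal sketch: form det(A - lambda I) and solve the characteristic
# equation. The matrix A is a hypothetical 2x2 example.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2],
               [2, 1]])

char_poly = sp.expand((A - lam * sp.eye(2)).det())
print(char_poly)                              # lambda**2 - 2*lambda - 3
print(sp.solve(sp.Eq(char_poly, 0), lam))     # eigenvalues: [-1, 3]
```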
In this activity, our goal will be to use the characteristic equation to obtain information about eigenvalues and eigenvectors of a $2 \times 2$ matrix with real entries.
For each of the following parts, use the characteristic equation to determine the eigenvalues of $A$. Then, for each eigenvalue $\lambda$, find a basis of the corresponding eigenspace, i.e., $\operatorname{Nul}(A - \lambda I)$. You might want to recall how to find a basis for the null space of a matrix from Section 13. Also, make sure that your eigenvalue candidate $\lambda$ yields nonzero eigenvectors in $\operatorname{Nul}(A - \lambda I)$, for otherwise $\lambda$ will not be an eigenvalue.
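If you want to check your hand computations, the following sketch finds a basis for an eigenspace with SymPy; the matrix and the eigenvalue are hypothetical placeholders, not the ones from this activity.

```python
# Find a basis for Nul(A - lam I) with SymPy. The matrix A and the
# eigenvalue lam below are hypothetical placeholders.
import sympy as sp

A = sp.Matrix([[2, 1],
               [0, 2]])
lam = 2
print((A - lam * sp.eye(2)).nullspace())   # [Matrix([[1], [0]])]
```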
Use your eigenvalue and eigenvector calculations from the above problem as guidance to answer the following questions about a $2 \times 2$ matrix with real entries.
If a $2 \times 2$ matrix is an upper-triangular matrix (i.e., all entries below the diagonal are 0's, as in the first two matrices of the previous problem), what can you say about its eigenvalues? Explain.
How many linearly independent eigenvectors can be found for a $2 \times 2$ matrix? Is it possible to have a $2 \times 2$ matrix without 2 linearly independent eigenvectors? Explain.
Until now, we have been given eigenvalues or eigenvectors of a matrix and determined the remaining eigenvectors and eigenvalues from the known information. In this section we use determinants to find (or approximate) the eigenvalues of a matrix. From there we can find (or approximate) the corresponding eigenvectors. The tool we will use is a polynomial equation of a square matrix, the characteristic equation, whose roots are the eigenvalues of the matrix. The characteristic equation will thus provide an algebraic way of finding the eigenvalues of a square matrix.
We have seen that the eigenvalues of a square matrix $A$ are the scalars $\lambda$ so that $A - \lambda I$ has a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not invertible, we can also say that $\lambda$ is an eigenvalue of $A$ if

$$\det(A - \lambda I) = 0. \tag{18.2}$$
Note that if $A$ is an $n \times n$ matrix, then $p(\lambda) = \det(A - \lambda I_n)$ is a polynomial of degree $n$. Furthermore, if $A$ has real entries, the polynomial has real coefficients. This polynomial, and the equation (18.2), are given special names.
As we argued in Preview Activity 18.1, a $2 \times 2$ matrix can have at most 2 eigenvalues. For an $n \times n$ matrix, the characteristic polynomial will be a degree $n$ polynomial, and we know from algebra that a degree $n$ polynomial can have at most $n$ roots. Since an eigenvalue of a matrix is a root of the characteristic polynomial of that matrix, we can conclude that an $n \times n$ matrix can have at most $n$ distinct eigenvalues. Activity 18.2 (b) shows that an $n \times n$ matrix may have fewer than $n$ distinct eigenvalues, however. Note that one of these eigenvalues, the eigenvalue 1, appears three times as a root of the characteristic polynomial of the matrix. The number of times an eigenvalue appears as a root of the characteristic polynomial is called the (algebraic) multiplicity of the eigenvalue. More formally:
Thus, in Activity 18.2 (b) the eigenvalue 1 has multiplicity 3 and the eigenvalue 2 has multiplicity 1. Notice that if we count the eigenvalues of an $n \times n$ matrix with their multiplicities, the total will always be $n$.
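The following SymPy sketch illustrates how algebraic multiplicities appear when the characteristic polynomial is factored; the $4 \times 4$ matrix is a hypothetical example chosen to have the multiplicities just described, not the matrix from Activity 18.2 (b).

```python
# A sketch of reading off algebraic multiplicities by factoring the
# characteristic polynomial. The 4x4 matrix is a hypothetical example
# with eigenvalue 1 of multiplicity 3 and eigenvalue 2 of multiplicity 1.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 1, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 2]])
p = (A - lam * sp.eye(4)).det()
print(sp.factor(p))          # (lambda - 2)*(lambda - 1)**3
print(A.eigenvals())         # {1: 3, 2: 1}  (eigenvalue: multiplicity)
```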
If $A$ is a matrix with real entries, then the characteristic polynomial will have real coefficients. It is possible for the characteristic polynomial to have complex roots, and hence for the matrix $A$ to have complex eigenvalues. The Fundamental Theorem of Algebra tells us that a degree $n$ polynomial has exactly $n$ roots in the complex numbers, counted with multiplicity; because the characteristic polynomial of a real matrix has real coefficients, any complex eigenvalues will appear in conjugate pairs, i.e., if $\lambda = a + bi$ is an eigenvalue of $A$, then $\overline{\lambda} = a - bi$ is another eigenvalue of $A$. Furthermore, if the characteristic polynomial has odd degree, then since the complex eigenvalues come in conjugate pairs, $A$ must have at least one real eigenvalue.
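For instance, a $90^\circ$ rotation matrix is a real matrix whose eigenvalues form the conjugate pair $\pm i$, as the following NumPy sketch confirms.

```python
# A sketch illustrating complex conjugate eigenvalues of a real matrix,
# using a 90-degree rotation matrix as a hypothetical example.
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.linalg.eigvals(A))   # the conjugate pair i and -i
```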
Recall that for each eigenvalue $\lambda$ of an $n \times n$ matrix $A$, the eigenspace of $A$ corresponding to the eigenvalue $\lambda$ is $\operatorname{Nul}(A - \lambda I_n)$. These eigenspaces can tell us important information about the matrix transformation defined by $A$. For example, consider the matrix transformation $T$ from $\mathbb{R}^3$ to $\mathbb{R}^3$ defined by $T(\mathbf{x}) = A\mathbf{x}$ for a suitable $3 \times 3$ matrix $A$.
We are interested in understanding what this matrix transformation does to vectors in $\mathbb{R}^3$. First we note that $A$ has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 2$, with $\lambda_1$ having multiplicity 2. There is a pair $\mathbf{v}_1$ and $\mathbf{v}_2$ of linearly independent eigenvectors for $A$ corresponding to the eigenvalue $\lambda_1$, and an eigenvector $\mathbf{v}_3$ for $A$ corresponding to the eigenvalue $\lambda_2$. Note that the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are linearly independent (recall that eigenvectors corresponding to different eigenvalues are always linearly independent). So any vector $\mathbf{b}$ in $\mathbb{R}^3$ can be written uniquely as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Let's now consider the action of the matrix transformation $T$ on such a linear combination. Note that

$$T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + 2c_3\mathbf{v}_3. \tag{18.3}$$
Equation (18.3) illustrates that it is most convenient to view the action of $T$ in the coordinate system where $\operatorname{Span}\{\mathbf{v}_1\}$ serves as the $x$-axis, $\operatorname{Span}\{\mathbf{v}_2\}$ as the $y$-axis, and $\operatorname{Span}\{\mathbf{v}_3\}$ as the $z$-axis. In this case, we can visualize that when we apply the transformation $T$ to a vector in $\mathbb{R}^3$ the result is an output vector that is unchanged in the $xy$-plane and scaled by a factor of 2 in the $z$ direction. For example, consider the box whose sides are determined by the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ as shown in Figure 18.4. The transformation $T$ stretches this box by a factor of 2 in the $\mathbf{v}_3$ direction and leaves everything else alone, as illustrated in Figure 18.4. So the entire plane $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ is unchanged by $T$, but $\operatorname{Span}\{\mathbf{v}_3\}$ is scaled by a factor of 2. In this situation, the eigenvalues and eigenvectors provide the most convenient perspective through which to visualize the action of the transformation $T$.
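The following NumPy sketch mimics this geometric example with a hypothetical stand-in matrix whose eigenvalues are $1$, $1$, and $2$ and whose eigenvectors are the standard basis vectors; it confirms the computation in Equation (18.3).

```python
# A sketch of the eigenbasis viewpoint above, with a hypothetical matrix A
# having eigenvalues 1, 1, 2 and the standard basis vectors as eigenvectors.
import numpy as np

A = np.diag([1.0, 1.0, 2.0])      # hypothetical stand-in for the text's A
v1, v2, v3 = np.eye(3)            # eigenvectors: e1, e2 (lambda=1), e3 (lambda=2)
c1, c2, c3 = 4.0, -1.0, 5.0

b = c1 * v1 + c2 * v2 + c3 * v3
print(A @ b)                              # [ 4. -1. 10.]
print(c1 * v1 + c2 * v2 + 2 * c3 * v3)    # same vector, via Equation (18.3)
```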
This geometric perspective illustrates how each eigenvalue and the corresponding eigenspace of $A$ tells us something important about $A$. So it behooves us to learn a little more about eigenspaces.
There is a connection between the dimension of the eigenspace of a matrix corresponding to an eigenvalue and the multiplicity of that eigenvalue as a root of the characteristic polynomial. Recall that the dimension of a subspace of $\mathbb{R}^n$ is the number of vectors in a basis for that subspace. We investigate the connection between dimension and multiplicity in the next activity.
The examples in Activity 18.3 all provide instances of the principle that the dimension of an eigenspace corresponding to an eigenvalue $\lambda$ cannot exceed the multiplicity of $\lambda$. Specifically:
The examples we have seen raise another important point. The matrix $A$ from our geometric example has two eigenvalues, 1 and 2, with the eigenvalue 1 having multiplicity 2. If we let $E_\lambda$ represent the eigenspace of $A$ corresponding to the eigenvalue $\lambda$, then $\dim(E_1) = 2$ and $\dim(E_2) = 1$. If we change this matrix slightly, to a matrix $B$ that also has the two eigenvalues 1 and 2 with the eigenvalue 1 having multiplicity 2, it can happen that $\dim(E_1) = 1$ (like the examples from Activity 18.2 (a) and Activity 18.3 (a)). In this case a single vector forms a basis for $E_1$ and a single vector forms a basis for $E_2$. We can visualize the action of $B$ on the square formed by these two eigenvectors as a scaling by 2 in the direction of the eigenvector for the eigenvalue 2, as shown in Figure 18.6, but since we do not have a third linearly independent eigenvector, the action of $B$ in a third, independent direction is not so clear.
So the action of a matrix transformation can be more easily visualized if the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue. This geometric perspective leads us to define the geometric multiplicity of an eigenvalue.
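The contrast just described can be checked computationally. In the following SymPy sketch, the two hypothetical $3 \times 3$ matrices both have eigenvalue 1 with algebraic multiplicity 2, but the dimension of the corresponding eigenspace (the geometric multiplicity) differs.

```python
# A sketch contrasting algebraic and geometric multiplicity, using
# hypothetical 3x3 matrices in the spirit of the discussion above.
import sympy as sp

A = sp.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 2]])   # dim(E_1) = 2
B = sp.Matrix([[1, 1, 0], [0, 1, 0], [0, 0, 2]])   # dim(E_1) = 1

for M in (A, B):
    print(M.eigenvals())                        # {1: 2, 2: 1} for both
    print(len((M - sp.eye(3)).nullspace()))     # geometric multiplicity of 1
```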
To find a basis for the eigenspace of $A$ corresponding to the eigenvalue $\lambda_1$, we find a basis for $\operatorname{Nul}(A - \lambda_1 I)$. We compute the reduced row echelon form of $A - \lambda_1 I$ and use it to write the general solution of $(A - \lambda_1 I)\mathbf{x} = \mathbf{0}$ in terms of the free variables. The vectors that appear in this general solution form a basis for the eigenspace of $A$ corresponding to the eigenvalue $\lambda_1$. To find a basis for the eigenspace of $A$ corresponding to the eigenvalue $\lambda_2$, we repeat the process: we compute the reduced row echelon form of $A - \lambda_2 I$, write the general solution of $(A - \lambda_2 I)\mathbf{x} = \mathbf{0}$, and read off a basis for $\operatorname{Nul}(A - \lambda_2 I)$.
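The same two-step computation can be carried out with SymPy; the $3 \times 3$ matrix and its eigenvalues below are hypothetical placeholders for the ones in this example.

```python
# A sketch of the row-reduction procedure above, with a hypothetical
# 3x3 matrix whose eigenvalues are 1 and 2.
import sympy as sp

A = sp.Matrix([[1, 1, 0],
               [0, 1, 0],
               [0, 0, 2]])   # hypothetical example

for lam in (1, 2):
    M = A - lam * sp.eye(3)
    print(lam, M.rref())           # reduced row echelon form + pivot columns
    print(lam, M.nullspace())      # basis read off from the general solution
```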
Is it possible to find a basis for $\mathbb{R}^3$ consisting of eigenvectors of $A$? Explain.
Solution.
Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be the basis vectors we found for the eigenspace corresponding to $\lambda_1$, and let $\mathbf{v}_3$ be the basis vector for the eigenspace corresponding to $\lambda_2$. Since eigenvectors corresponding to different eigenvalues are linearly independent, and since neither $\mathbf{v}_1$ nor $\mathbf{v}_2$ is a scalar multiple of the other, we can conclude that the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set with 3 vectors. Therefore, $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$ consisting of eigenvectors of $A$.
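One way to verify such a claim numerically: three vectors form a basis of $\mathbb{R}^3$ exactly when the matrix with those vectors as columns is invertible. The vectors in this sketch are hypothetical placeholders.

```python
# Check that three eigenvectors form a basis of R^3 via an invertibility
# test. The vectors below are hypothetical placeholders.
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([0.0, 0.0, 1.0])

P = np.column_stack([v1, v2, v3])
print(np.linalg.det(P) != 0)    # True: {v1, v2, v3} is a basis of R^3
```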
Find a $3 \times 3$ matrix $A$ that has an eigenvector $\mathbf{v}_1$ with corresponding eigenvalue $\lambda_1$, an eigenvector $\mathbf{v}_2$ with corresponding eigenvalue $\lambda_2$, and an eigenvector $\mathbf{v}_3$ with corresponding eigenvalue $\lambda_3$. Explain your process.
Solution.
We are looking for a $3 \times 3$ matrix $A$ such that $A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$, $A\mathbf{v}_2 = \lambda_2\mathbf{v}_2$, and $A\mathbf{v}_3 = \lambda_3\mathbf{v}_3$. Since $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are eigenvectors corresponding to different eigenvalues, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are linearly independent. So the matrix $P = [\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3]$ is invertible. It follows that

$$A = P \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} P^{-1}.$$
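Here is a NumPy sketch of this construction with hypothetical eigenvectors and eigenvalues: build $A = PDP^{-1}$ and check that the columns of $P$ are indeed eigenvectors of $A$.

```python
# Build A = P D P^{-1}, where the columns of P are the desired
# eigenvectors and D holds the desired eigenvalues (all hypothetical).
import numpy as np

P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])       # columns: v1, v2, v3
D = np.diag([2.0, 3.0, -1.0])          # lambda_1, lambda_2, lambda_3

A = P @ D @ np.linalg.inv(P)
print(A @ P[:, 0], 2.0 * P[:, 0])      # A v1 equals 2 v1
```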
Let $A$ be an $n \times n$ matrix. In this part of the exercise we argue the general case illustrated in the previous part: that $\det(A)$ is the product of the eigenvalues of $A$. Let $p(\lambda) = \det(A - \lambda I)$ be the characteristic polynomial of $A$.
Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $A$ (note that these eigenvalues may not all be distinct). Recall that if $r$ is a root of a polynomial $q(t)$, then $t - r$ is a factor of $q(t)$. Use this idea to explain why

$$p(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda).$$
Suppose $A$ is an $n \times n$ matrix and $B$ is an invertible $n \times n$ matrix. Explain why the characteristic polynomial of $A$ is the same as the characteristic polynomial of $BAB^{-1}$, and hence, as a result, that the eigenvalues of $A$ and $BAB^{-1}$ are the same.
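A numerical sanity check of both facts, using a random matrix (illustrative only, not a proof):

```python
# Check det(A) = product of eigenvalues, and that A and B A B^{-1}
# share eigenvalues, for a random 4x4 matrix.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))        # invertible with probability 1

eigs_A = np.linalg.eigvals(A)
print(np.isclose(np.linalg.det(A), np.prod(eigs_A).real))   # True

C = B @ A @ np.linalg.inv(B)
print(np.allclose(np.sort_complex(eigs_A),
                  np.sort_complex(np.linalg.eigvals(C))))   # True
```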
To realistically model the diffusion of gas molecules we would need to consider a system with a large number of balls as substitutes for the gas molecules. However, the main idea can be seen in a model with a much smaller number of balls, as we will do now. Suppose we have two bins that contain a total of 4 balls between them. Label the bins as Bin 1 and Bin 2. In this case we can think of entropy as the number of different possible ways the balls can be arranged in the system. For example, there is only 1 way for all of the balls to be in Bin 1 (low entropy), but there are 4 ways that we can have one ball in Bin 1 (choose any one of the four different balls, which can be distinguished from each other) and 3 balls in Bin 2 (higher entropy). The highest entropy state has the balls equally distributed between the bins (with 6 different ways to do this).
We assume that there is a way for balls to move from one bin to the other (like having gas molecules pass through a permeable membrane). A way to think about this is that we select a ball at random (from ball 1 to ball 4; the balls are distinguishable) and move that ball from its current bin to the other bin. Consider a "move" to be any instance when a ball changes bins. A state is any configuration of balls in the bins at a given time, and the state changes when a ball is chosen at random and moved to the other bin. The possible states are to have 0 balls in Bin 1 and 4 balls in Bin 2 (State 0, entropy 1), 1 ball in Bin 1 and 3 in Bin 2 (State 1, entropy 4), 2 balls in each bin (State 2, entropy 6), 3 balls in Bin 1 and 1 ball in Bin 2 (State 3, entropy 4), and 4 balls in Bin 1 and 0 balls in Bin 2 (State 4, entropy 1). These states are shown in Figure 18.10.
To model the system of balls in bins we need to understand how the system can transform from one state to another. It suffices to count the number of balls in Bin 1 (since the remaining balls will be in Bin 2). Even though the balls are labeled, our count only cares about how many balls are in each bin. Let $\mathbf{x}_0 = [x_0 \ x_1 \ x_2 \ x_3 \ x_4]^T$, where $x_i$ is the probability that Bin 1 contains $i$ balls, and let $\mathbf{x}_1 = [y_0 \ y_1 \ y_2 \ y_3 \ y_4]^T$, where $y_i$ is the probability that Bin 1 contains $i$ balls after the first move. We will call the vectors $\mathbf{x}_0$ and $\mathbf{x}_1$ probability distributions of balls in bins. Note that since all four balls have to be placed in some bin, the sum of the entries in our probability distribution vectors must be 1. Recall that a move is an instance when a ball changes bins. We want to understand how $\mathbf{x}_1$ is obtained from $\mathbf{x}_0$. In other words, we want to figure out the probability that Bin 1 contains 0, 1, 2, 3, or 4 balls after one ball changes bins if our initial probability distribution of balls in bins is $\mathbf{x}_0$.
We begin by analyzing the ways that a state can change. For example,
Suppose there are 0 balls in Bin 1. (In our probability distribution $\mathbf{x}_0$, this happens with probability $x_0$.) Then there are four balls in Bin 2. The only way for a ball to change bins is for one of the four balls to move from Bin 2 to Bin 1, putting us in State 1. Regardless of which ball moves, we will always be put in State 1, so this happens with probability 1. In other words, if the probability that Bin 1 contains 0 balls is $x_0$, then there is a probability of $x_0$ that Bin 1 will contain 1 ball after the move.
Suppose we have 1 ball in Bin 1. (In our probability distribution $\mathbf{x}_0$, this happens with probability $x_1$.) The ball that moves is chosen at random from the four balls, so each of the four balls is selected with probability $\frac{1}{4}$; in particular, the probability that the ball in Bin 1 is the one that moves is $\frac{1}{4}$.
If the ball in Bin 1 moves, that move puts us in State 0. In other words, if the probability that Bin 1 contains 1 ball is $x_1$, then there is a probability of $\frac{1}{4}x_1$ that Bin 1 will contain 0 balls after a move.
If any of the three balls in Bin 2 moves (each moves with probability $\frac{1}{4}$), that move puts us in State 2. In other words, if the probability that Bin 1 contains 1 ball is $x_1$, then there is a probability of $\frac{3}{4}x_1$ that Bin 1 will contain 2 balls after a move.
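Carrying this counting argument through all five states gives the transition matrix $T$ of the process: from State $j$ (that is, $j$ balls in Bin 1), a move goes to State $j-1$ with probability $\frac{j}{4}$ and to State $j+1$ with probability $\frac{4-j}{4}$. The following NumPy sketch builds $T$ from that rule; the cases worked out above appear in columns 0 and 1.

```python
# Build the 5x5 transition matrix implied by the counting argument above.
# Column j holds the probabilities of moving from State j to each state.
import numpy as np

n = 4                                   # total number of balls
T = np.zeros((n + 1, n + 1))
for j in range(n + 1):
    if j > 0:
        T[j - 1, j] = j / n             # a ball leaves Bin 1
    if j < n:
        T[j + 1, j] = (n - j) / n       # a ball enters Bin 1
print(T)                                # each column sums to 1
```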
This process is an example of a Markov process (see Definition 9.4). There are several questions we can ask about this model. For example, what is the long-term behavior of this system, and how does this model relate to entropy? That is, given an initial probability distribution vector $\mathbf{x}_0$, the system will have probability distribution vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots$ after subsequent moves. What happens to the vectors $\mathbf{x}_k$ as $k$ goes to infinity, and what does this tell us about entropy? To answer these questions, we will first explore the sequence $\{\mathbf{x}_k\}$ numerically, and then use the eigenvalues and eigenvectors of the transition matrix $T$ to analyze the sequence $\{\mathbf{x}_k\}$.
Suppose we begin with an initial probability distribution vector $\mathbf{x}_0$. Calculate the vectors $\mathbf{x}_k$ for enough values of $k$ so that you can identify the long-term behavior of the sequence. Describe this behavior.
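As one illustration, the following sketch iterates $\mathbf{x}_{k+1} = T\mathbf{x}_k$ starting from the hypothetical initial distribution $\mathbf{x}_0 = [1 \ 0 \ 0 \ 0 \ 0]^T$ (all four balls in Bin 2); this particular starting vector makes the sequence alternate between even and odd states.

```python
# Iterate the Markov process numerically from a hypothetical starting
# distribution x0 = [1, 0, 0, 0, 0]^T (Bin 1 empty with probability 1).
import numpy as np

n = 4
T = np.zeros((n + 1, n + 1))
for j in range(n + 1):
    if j > 0:
        T[j - 1, j] = j / n
    if j < n:
        T[j + 1, j] = (n - j) / n

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
for k in range(1, 9):
    x = T @ x
    print(k, x)    # the distributions settle into an oscillation
                   # between two limiting vectors
```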
Find the characteristic polynomial of $T$. Factor the characteristic polynomial into a product of linear polynomials to show that the eigenvalues of $T$ are $1$, $\frac{1}{2}$, $0$, $-\frac{1}{2}$, and $-1$.
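This computation can be checked in exact arithmetic with SymPy, assuming the transition matrix built in the earlier sketch:

```python
# Factor the characteristic polynomial of the 5x5 transition matrix T.
import sympy as sp

lam = sp.symbols('lambda')
T = sp.Matrix([[0, sp.Rational(1, 4), 0, 0, 0],
               [1, 0, sp.Rational(1, 2), 0, 0],
               [0, sp.Rational(3, 4), 0, sp.Rational(3, 4), 0],
               [0, 0, sp.Rational(1, 2), 0, 1],
               [0, 0, 0, sp.Rational(1, 4), 0]])
p = (T - lam * sp.eye(5)).det()
print(sp.factor(p))            # -lambda(lambda-1)(lambda+1)(2lambda-1)(2lambda+1)/4
print(sorted(T.eigenvals()))   # [-1, -1/2, 0, 1/2, 1]
```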
As we will see a bit later, certain eigenvectors for $T$ will describe the end behavior of the sequence $\{\mathbf{x}_k\}$. Find eigenvectors for $T$ corresponding to the eigenvalues $1$ and $-1$. Explain how the eigenvector for $T$ corresponding to the eigenvalue $1$ explains the behavior of one of the sequences we saw in Project Activity 18.5. (Any eigenvector of $T$ with eigenvalue $1$ is called an equilibrium or steady state vector.)
To make the notation easier, we will let $\mathbf{v}_1$ be an eigenvector of $T$ corresponding to the eigenvalue $1$, $\mathbf{v}_2$ an eigenvector of $T$ corresponding to the eigenvalue $\frac{1}{2}$, $\mathbf{v}_3$ an eigenvector of $T$ corresponding to the eigenvalue $0$, $\mathbf{v}_4$ an eigenvector of $T$ corresponding to the eigenvalue $-\frac{1}{2}$, and $\mathbf{v}_5$ an eigenvector of $T$ corresponding to the eigenvalue $-1$.
Use the result of part (a), Equation (18.4), and Project Activity 18.6 (b) to explain why the sequence $\{\mathbf{x}_k\}$ is either eventually fixed or oscillates between two states. Compare to the results from Project Activity 18.5. How are these results related to entropy? You may use the facts listed below (checked numerically in the sketch following the list):
$\mathbf{v}_1 = [1 \ 4 \ 6 \ 4 \ 1]^T$ is an eigenvector for $T$ corresponding to the eigenvalue $1$,
$\mathbf{v}_2 = [1 \ 2 \ 0 \ -2 \ -1]^T$ is an eigenvector for $T$ corresponding to the eigenvalue $\frac{1}{2}$,
$\mathbf{v}_3 = [1 \ 0 \ -2 \ 0 \ 1]^T$ is an eigenvector for $T$ corresponding to the eigenvalue $0$,
$\mathbf{v}_4 = [1 \ -2 \ 0 \ 2 \ -1]^T$ is an eigenvector for $T$ corresponding to the eigenvalue $-\frac{1}{2}$,
$\mathbf{v}_5 = [1 \ -4 \ 6 \ -4 \ 1]^T$ is an eigenvector for $T$ corresponding to the eigenvalue $-1$.
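The following NumPy sketch checks each eigenvector fact above against the transition matrix $T$ built earlier; it also suggests why only the eigenvalues $1$ and $-1$ matter in the long run, since $\left(\frac{1}{2}\right)^k$, $0^k$, and $\left(-\frac{1}{2}\right)^k$ all tend to $0$ as $k$ grows.

```python
# Verify the five eigenvector facts above against the transition matrix T.
import numpy as np

n = 4
T = np.zeros((n + 1, n + 1))
for j in range(n + 1):
    if j > 0:
        T[j - 1, j] = j / n             # a ball leaves Bin 1
    if j < n:
        T[j + 1, j] = (n - j) / n       # a ball enters Bin 1

eigenpairs = [(1.0,  [1, 4, 6, 4, 1]),
              (0.5,  [1, 2, 0, -2, -1]),
              (0.0,  [1, 0, -2, 0, 1]),
              (-0.5, [1, -2, 0, 2, -1]),
              (-1.0, [1, -4, 6, -4, 1])]
for lam, v in eigenpairs:
    v = np.array(v, dtype=float)
    print(lam, np.allclose(T @ v, lam * v))   # True for every pair
```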
[1] Named after Paul and Tatiana Ehrenfest, who introduced it in "Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem," Physikalische Zeitschrift, vol. 8 (1907), pp. 311-314.