
Section 21 Complex Eigenvalues

Subsection Application: The Gershgorin Disk Theorem

We have now seen different methods for calculating or approximating the eigenvalues of a matrix. The algebraic method using the characteristic polynomial can provide exact values, but only in cases where the size of the matrix is small. Methods like the power method allow us to approximate eigenvalues in many, but not all, cases. These approximation techniques can be made more efficient if we have some idea of where the eigenvalues are. The Gershgorin Disk Theorem is a useful tool that can quickly provide bounds on the location of eigenvalues using elementary calculations. For example, using the Gershgorin Disk Theorem we can quickly tell that the real parts of the eigenvalues of the matrix

\begin{equation*} \left[ \begin{array}{ccr} 3\amp 1\amp -1 \\ 0\amp -1+i\amp i \\ 2\amp 1\amp -2i \end{array} \right] \end{equation*}

lie between \(-3\) and 5 and the imaginary parts lie between \(-5\) and 2. Even more, we can say that the eigenvalues lie in the disks (called Gershgorin disks) shown in Figure 21.1.

Figure 21.1. Gershgorin disks.

We will learn more details about the Gershgorin Disk Theorem at the end of this section.
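Although we will not develop the theorem until the end of this section, the disk centers and radii can already be computed directly from the entries of the matrix. The following minimal sketch (assuming NumPy is available; the helper name gershgorin_disks is our own, not a standard routine) computes the disks for the matrix above and lists the numerically computed eigenvalues for comparison.

```python
import numpy as np

def gershgorin_disks(A):
    """Return (center, radius) pairs, one disk per row of the square matrix A."""
    A = np.asarray(A, dtype=complex)
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

A = np.array([[3, 1, -1],
              [0, -1 + 1j, 1j],
              [2, 1, -2j]])

for center, radius in gershgorin_disks(A):
    print(f"disk centered at {complex(center)} with radius {radius:g}")

# Every eigenvalue lies in at least one of the disks.
print("eigenvalues:", np.round(np.linalg.eigvals(A), 4))
```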

Subsection Introduction

So far we have worked with real matrices whose eigenvalues are all real. However, the characteristic polynomial of a matrix with real entries can have complex roots. In this section we investigate the properties of these complex roots and their corresponding eigenvectors, how these complex eigenvectors are found, and the geometric interpretation of the transformations defined by matrices with complex eigenvalues. Although we can consider matrices that have complex numbers as entries, we will restrict ourselves to matrices with real entries.

Preview Activity 21.1.

Let \(A = \left[ \begin{array}{rc} 2\amp 4 \\ -2\amp 2 \end{array} \right]\text{.}\)

(a)

Find the characteristic polynomial of \(A\text{.}\)

(b)

Find the eigenvalues of \(A\text{.}\) You should get two complex numbers. How are these complex numbers related?

(c)

Find an eigenvector corresponding to each eigenvalue of \(A\text{.}\) You should obtain vectors with complex entries.

Subsection Complex Eigenvalues

As you noticed in Preview Activity 21.1, the complex roots of the characteristic equation of a real matrix \(A\) come in complex conjugate pairs. This should come as no surprise since we know through our use of the quadratic formula that complex roots of (real) quadratic polynomials come in complex conjugate pairs. More generally, if \(p(x)=a_0+a_1x+a_2x^2 + \cdots + a_nx^n\) is a polynomial with real coefficients and \(z\) is a root of this polynomial, meaning \(p(z)=0\text{,}\) then

\begin{equation*} 0=\overline{p(z)} = \overline{a_0+a_1z+a_2z^2 + \cdots + a_nz^n} = a_0 + a_1 \overline{z} + a_2 \overline{z}^2 + \cdots + a_n \overline{z}^n = p(\overline{z})\,\text{.} \end{equation*}

Therefore, \(\overline{z}\) is also a root of \(p(x)\text{.}\)

Activity 21.2.

Let \(A=\left[ \begin{array}{cr} 0\amp -1 \\ 1\amp 0 \end{array} \right]\text{.}\)

(a)

The matrix transformation \(T:\R^2 \to \R^2\) defined by \(T(\vx)=A\vx\) is a rotation transformation. What is the angle of rotation?

(b)

Find the eigenvalues of \(A\text{.}\) For each eigenvalue, find an eigenvector.

In Preview Activity 21.1 and in Activity 21.2, you found that if \(\vv\) is an eigenvector of \(A\) corresponding to \(\lambda\text{,}\) then the vector \(\overline{\vv}\) obtained by taking the complex conjugate of each entry in \(\vv\) is an eigenvector of \(A\) corresponding to \(\overline{\lambda}\text{.}\) Specifically, if \(\vv=\vu+i\vw\text{,}\) where both \(\vu\) and \(\vw\) are real vectors, is an eigenvector of \(A\text{,}\) then so is \(\overline{\vv} = \vu-i\vw\text{.}\) We can justify this property using matrix algebra as follows:

\begin{equation*} A\overline{\vv} = \overline{A}\overline{\vv} = \overline{A\vv} = \overline{\lambda \vv} = \overline{\lambda} \overline{\vv} \,\text{.} \end{equation*}

In the first equality, we used the fact that \(A\) is a real matrix, so \(\overline{A}=A\text{.}\) In all the other equalities, we used the properties of the conjugation operation in complex numbers.
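This conjugation argument is easy to test numerically. Here is a minimal sketch (assuming NumPy; note that numpy.linalg.eig may scale eigenvectors by an arbitrary complex constant, so we check the defining equation rather than comparing vectors entrywise) using the matrix from Preview Activity 21.1.

```python
import numpy as np

A = np.array([[2.0, 4.0],
              [-2.0, 2.0]])

lam, V = np.linalg.eig(A)
print(lam)  # the conjugate pair 2 + 2.8284i and 2 - 2.8284i, in some order

# If (lam, v) is an eigenpair of the real matrix A, so is (conj(lam), conj(v)).
v = V[:, 0]
print(np.allclose(A @ np.conj(v), np.conj(lam[0]) * np.conj(v)))  # True
```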

Subsection Rotation and Scaling Matrices

Recall that a rotation matrix is of the form

\begin{equation*} R_\theta= \left[ \begin{array}{cr} \cos(\theta)\amp -\sin(\theta) \\ \sin(\theta)\amp \cos(\theta) \end{array} \right] \end{equation*}

where the rotation is counterclockwise about the origin by an angle of \(\theta\) radians. In Activity 21.2, we considered the rotation matrix with angle \(\pi/2\) in the counterclockwise direction. We will soon see that rotation matrices play an important role in the geometry of a matrix transformation for a matrix that has complex eigenvalues. In what follows, we will restrict ourselves to the \(2 \times 2\) case, but similar arguments can be made in higher dimensions.

Activity 21.3.

Let \(A=\left[ \begin{array}{rc} 1\amp 1 \\ -1\amp 1 \end{array} \right]\text{.}\)

(a)

Explain why \(A\) is not a rotation matrix.

(b)

Although \(A\) is not a rotation matrix, there is a rotation matrix \(B\) inside \(A\text{.}\) To find the matrix \(B\text{,}\) factor out \(\sqrt{2}\) from all entries of \(A\text{.}\) In other words, write \(A\) as a product of two matrices in the form

\begin{equation*} A = \left[ \begin{array}{cc} \sqrt{2} \amp 0 \\ 0\amp \sqrt{2} \end{array} \right] B \;\text{.} \end{equation*}
(c)

The \(B\) matrix is a rotation matrix with an appropriate \(\theta\text{.}\) Find this \(\theta\text{.}\)

(d)

If we think about the product of two matrices as applying one transformation after another, describe the effect of the matrix transformation defined by \(A\) geometrically.

More generally, if we have a matrix \(A\) of the form \(A=\left[ \begin{array}{cr} a\amp -b \\ b\amp a \end{array} \right]\text{,}\) then

\begin{equation*} A = \left[ \begin{array}{cc} \sqrt{a^2+b^2} \amp 0 \\ 0\amp \sqrt{a^2+b^2} \end{array} \right] \left[ \begin{array}{cc} \frac{a}{\sqrt{a^2+b^2}}\amp \frac{-b}{\sqrt{a^2+b^2}} \\ \frac{b}{\sqrt{a^2+b^2}}\amp \frac{a}{\sqrt{a^2+b^2}} \end{array} \right] \,\text{.} \end{equation*}

The first matrix in the decomposition is a scaling matrix with a scaling factor of \(s=\sqrt{a^2+b^2}\text{.}\) So if \(s>1\text{,}\) the transformation stretches vectors, and if \(s\lt 1\text{,}\) the transformation shrinks vectors. The second matrix in the decomposition is a rotation matrix with angle \(\theta\) such that \(\cos(\theta)=\frac{a}{\sqrt{a^2+b^2}}\) and \(\sin(\theta)=\frac{b}{\sqrt{a^2+b^2}}\text{.}\) This angle is also the angle between the positive \(x\)-axis and the vector \(\vv=\left[ \begin{array}{c} a\\ b \end{array} \right]\text{.}\) We will refer to the matrices of the form \(\left[ \begin{array}{cr} a\amp -b \\ b\amp a \end{array} \right]\) as rotation-scaling matrices.
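As a concrete illustration of this decomposition, the sketch below (assuming NumPy; split_rotation_scaling is an illustrative name of our own) recovers the scaling factor \(s\) and the angle \(\theta\) from a rotation-scaling matrix.

```python
import numpy as np

def split_rotation_scaling(M):
    """Split M = [[a, -b], [b, a]] into a scaling factor s and an angle theta."""
    a, b = M[0, 0], M[1, 0]
    s = np.hypot(a, b)         # sqrt(a^2 + b^2)
    theta = np.arctan2(b, a)   # cos(theta) = a/s and sin(theta) = b/s
    return s, theta

M = np.array([[1.0, 1.0],
              [-1.0, 1.0]])    # here a = 1 and b = -1
s, theta = split_rotation_scaling(M)
print(s, np.degrees(theta))    # sqrt(2) and -45 degrees (a clockwise rotation)
```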

Subsection Matrices with Complex Eigenvalues

Now we will investigate how a general \(2\times 2\) matrix with complex eigenvalues can be seen to be similar (in both the linear algebra and the colloquial sense) to a rotation-scaling matrix.

Activity 21.4.

Let \(B=\left[ \begin{array}{cr} 1\amp -5\\2\amp 3 \end{array} \right]\text{.}\) The eigenvalues of \(B\) are \(2\pm 3i\text{.}\) An eigenvector for the eigenvalue \(2-3i\) is \(\vv=\left[ \begin{array}{c} -5 \\ 1-3i \end{array} \right]\text{.}\) We will use this eigenvector to show that \(B\) is similar to a rotation-scaling matrix.

(a)

Any complex vector \(\vv\) can be written as \(\vv=\vu+i\vw\) where both \(\vu\) and \(\vw\) are real vectors. What are these real vectors \(\vu\) and \(\vw\) for the eigenvector \(\vv\) above?

(b)

Let \(P=[ \vu \ \vw ]\) be the matrix whose first column is the real part of \(\vv\) and whose second column is the imaginary part of \(\vv\) (without the \(i\)). Find \(R=P^{-1}BP\text{.}\)

(c)

Express \(R\) as a product of a rotation and a scaling matrix. What is the factor of scaling? What is the rotation angle?

In Activity 21.4, we saw that the matrix \(B\) with complex eigenvalues \(2\pm 3i\) is similar to a rotation-scaling matrix. Specifically \(R=P^{-1}BP\text{,}\) where the columns of \(P\) are the real and imaginary parts of an eigenvector of \(B\text{,}\) is the rotation-scaling matrix with a factor of scaling by \(\sqrt{2^2+3^2}\) and a rotation by angle \(\theta=\arccos(\frac{2}{\sqrt{2^2+3^2}})\text{.}\)
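The similarity computed in Activity 21.4 can be verified numerically. In this minimal sketch (assuming NumPy) we build \(P\) from the given eigenvector and confirm that \(P^{-1}BP\) is the expected rotation-scaling matrix.

```python
import numpy as np

B = np.array([[1.0, -5.0],
              [2.0, 3.0]])

v = np.array([-5.0, 1.0 - 3.0j])       # eigenvector for the eigenvalue 2 - 3i
P = np.column_stack([v.real, v.imag])  # columns are u and w

R = np.linalg.solve(P, B @ P)          # R = P^{-1} B P
print(np.round(R, 10))                 # [[2, -3], [3, 2]], so a = 2 and b = 3
print(np.hypot(2.0, 3.0))              # scaling factor sqrt(13) ~ 3.6056
```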

Does a similar decomposition result hold for a general \(2\times 2\) matrix with complex eigenvalues? We investigate this question in the next activity.

Activity 21.5.

Let \(A\) be a \(2\times 2\) matrix with complex eigenvalue \(\lambda=a-bi\text{,}\) \(b\neq 0\text{,}\) and corresponding complex eigenvector \(\vv=\vu+i\vw\text{.}\)

(a)

Explain why \(A\vv = A\vu+iA\vw\text{.}\)

(b)

Explain why \(\lambda\vv = (a\vu+b\vw)+i (a\vw-b\vu)\text{.}\)

(c)

Use the previous two results to explain why

  • \(A\vu=a\vu+b\vw\) and

  • \(A\vw = a\vw - b\vu\text{.}\)

(d)

Let \(P=[ \vu \ \vw ]\text{.}\) We will now show that \(AP=PR\) where \(R=\left[ \begin{array}{cr} a\amp -b \\ b\amp a \end{array} \right]\text{.}\)

(i)

Without any calculation, explain why

\begin{equation*} AP = [ A\vu \ A\vw ]\text{.} \end{equation*}
(ii)

Recall that if \(M\) is an \(m \times n\) matrix and \(\vx\) is an \(n \times 1\) vector, then the matrix product \(M\vx\) is a linear combination of the columns of \(M\) with weights the corresponding entries of the vector \(\vx\text{.}\) Use this idea to show that

\begin{equation*} PR = [ a\vu + b\vw \ -b\vu + a\vw ]\text{.} \end{equation*}
(iii)

Now explain why \(AP = PR\text{.}\)

(iv)

Assume for the moment that \(P\) is an invertible matrix. Show that \(A = PRP^{-1}\text{.}\)

Your work in Activity 21.5 shows that any real \(2\times 2\) matrix \(A\) with complex eigenvalue \(\lambda = a-bi\text{,}\) \(b \neq 0\text{,}\) is similar to a rotation-scaling matrix with a factor of scaling by \(\sqrt{a^2+b^2}\) and a rotation by angle \(\theta=\arccos(\frac{a}{\sqrt{a^2+b^2}})\) if \(b> 0\text{,}\) and \(\theta=-\arccos(\frac{a}{\sqrt{a^2+b^2}})\) if \(b\lt 0\text{.}\) Geometrically, this means that every \(2 \times 2\) real matrix with complex eigenvalues is just a scaled rotation (\(R\)) with respect to the basis \(\B\) formed by \(\vu\) and \(\vw\) from the complex eigenvector \(\vv\text{.}\) Multiplying by \(P^{-1}\) and \(P\) simply provides the change of basis from the standard basis to the basis \(\B\text{,}\) as we will see in detail when we learn about linear transformations.

The one fact that we have not yet addressed is why the matrix \(P = [ \vu \ \vw ]\) is invertible. We do that now to complete the argument.

Let \(A\) be a real \(2 \times 2\) matrix with \(A \vv = \lambda \vv\text{,}\) where \(\lambda = a-bi\text{,}\) \(b \neq 0\text{,}\) and \(\vv=\vu+i\vw\) (where \(\vu\) and \(\vw\) are in \(\R^2\)) with \(\vw \neq \vzero\text{.}\) From Activity 21.5 we know that

\begin{equation*} A\vu = a\vu + b\vw \ \text{ and } \ A\vw = a\vw - b \vu\text{.} \end{equation*}

To show that \(\vu\) and \(\vw\) are linearly independent, we need to show that no nontrivial linear combination of \(\vu\) and \(\vw\) can be the zero vector. Suppose

\begin{equation*} x_1\vu + x_2\vw = \vzero \end{equation*}

for some scalars \(x_1\) and \(x_2\text{.}\) We will show that \(x_1 = x_2 = 0\text{.}\) Assume to the contrary that one of \(x_1, x_2\) is not zero. First, assume \(x_1 \neq 0\text{.}\) Then \(\vu = -\frac{x_2}{x_1} \vw\text{.}\) Let \(c = -\frac{x_2}{x_1}\text{.}\) From this we have

\begin{align*} A \vu \amp = A(c\vw)\\ A \vu \amp = cA \vw\\ a\vu + b\vw \amp = c(a \vw - b \vu)\\ (a+cb)\vu \amp = (ca-b)\vw\\ (a+cb)(c\vw) \amp = (ca-b)\vw\text{.} \end{align*}

Since \(\vw \neq \vzero\) we must have \((a+cb)c = ca-b\text{.}\) A little algebra shows that \((c^2+1)b = 0\text{.}\) Since \(b \neq 0\text{,}\) we conclude that \(c^2+1=0\text{,}\) which is impossible for a real constant \(c\text{.}\) Therefore, we cannot have \(x_1 \neq 0\text{.}\) A similar argument (left to the reader) shows that \(x_2 = 0\text{.}\) Thus we can conclude that \(\vu\) and \(\vw\) are linearly independent.
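The entire construction can be bundled into a short routine. The following sketch (assuming NumPy; rotation_scaling_form is our own illustrative name) selects the eigenvalue \(a-bi\) with \(b>0\text{,}\) builds \(P\) from the corresponding eigenvector, and checks that \(A = PRP^{-1}\text{.}\)

```python
import numpy as np

def rotation_scaling_form(A):
    """For a real 2x2 matrix A with complex eigenvalues, return (P, R) with A = P R P^(-1)."""
    lam, V = np.linalg.eig(A)
    k = np.argmin(lam.imag)            # the eigenvalue a - bi with b > 0
    a, b = lam[k].real, -lam[k].imag
    v = V[:, k]
    P = np.column_stack([v.real, v.imag])
    R = np.array([[a, -b], [b, a]])
    return P, R

A = np.array([[1.0, -5.0],
              [2.0, 3.0]])
P, R = rotation_scaling_form(A)
print(np.allclose(A, P @ R @ np.linalg.inv(P)))  # True
```

Note that numpy.linalg.eig may return any complex scalar multiple of the eigenvector used in Activity 21.4; the matrix \(P\) changes accordingly, but \(R\) does not, since the argument above used only \(A\vv = \lambda\vv\text{.}\)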

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 21.3.

Let \(A = \left[ \begin{array}{rcr} 0\amp 1\amp 0 \\ -1\amp 0\amp -1 \\ 1\amp 1\amp 1 \end{array} \right]\text{.}\)

(a)

Without doing any computations, explain why not all of the eigenvalues of \(A\) can be complex.

Solution.

Since the complex eigenvalues of a real matrix occur in conjugate pairs, the eigenvalues of \(A\) with nonzero imaginary parts come in pairs. Since \(A\) has 3 eigenvalues counted with multiplicity, at most two of them can have nonzero imaginary parts. So at least one eigenvalue of \(A\) must be real.

(b)

Find all of the eigenvalues of \(A\text{.}\)

Solution.

For this matrix \(A\) we have \(A - \lambda I_3 = \left[ \begin{array}{rrc} -\lambda\amp 1\amp 0 \\ -1\amp -\lambda\amp -1 \\ 1\amp 1\amp -\lambda+1 \end{array} \right]\text{.}\) Using a cofactor expansion along the first row gives us

\begin{align*} \det(A - \lambda I_3) \amp = (-\lambda)\left((-\lambda)(1-\lambda)+1\right) - \left((-1)(1-\lambda)+1\right)\\ \amp = -\lambda^3 + \lambda^2 - \lambda +1 - \lambda -1\\ \amp = -\lambda^3 + \lambda^2 - 2\lambda\\ \amp = -\lambda(\lambda^2 -\lambda + 2)\text{.} \end{align*}

The roots of the characteristic polynomial are \(\lambda = 0\) and

\begin{equation*} \lambda = \frac{1 \pm \sqrt{1-4(2)}}{2} = \frac{1}{2}(1 \pm \sqrt{7}i)\text{.} \end{equation*}
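A quick numerical cross-check of these roots (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, -1.0],
              [1.0, 1.0, 1.0]])

print(np.round(np.linalg.eigvals(A), 4))
# expect 0 and (1 +/- sqrt(7) i)/2, i.e. 0.5 +/- 1.3229i, in some order
```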

Example 21.4.

Let \(A = \left[ \begin{array}{rc} 1\amp 2\\-1\amp 3 \end{array} \right]\text{.}\) Find a rotation scaling matrix \(R\) that is similar to \(A\text{.}\) Identify the rotation and scaling factor.

Solution.

The eigenvalues of \(A\) are the roots of the characteristic polynomial

\begin{align*} p(\lambda) \amp = \det(A - \lambda I_2)\\ \amp = \det\left(\left[ \begin{array}{rc} 1-\lambda\amp 2\\ -1\amp 3-\lambda \end{array} \right]\right)\\ \amp = (1-\lambda)(3-\lambda) + 2\\ \amp = \lambda^2 -4\lambda + 5\text{.} \end{align*}

The quadratic formula shows that the roots of \(p(\lambda)\) are

\begin{equation*} \frac{4 \pm \sqrt{-4}}{2} = 2 \pm i\text{.} \end{equation*}

To find an eigenvector for \(A\) with eigenvalue \(2-i\text{,}\) we row reduce

\begin{equation*} A - (2-i) I_2 = \left[ \begin{array}{cc} -1+i\amp 2\\-1\amp 1+i \end{array} \right] \end{equation*}

to

\begin{equation*} \left[ \begin{array}{cc} 1\amp -i-1\\0\amp 0 \end{array} \right]\text{.} \end{equation*}

An eigenvector for \(A\) with eigenvalue \(2-i\) is then

\begin{equation*} [ 1+i \ \ 1]^{\tr} = [1 \ 1]^{\tr} + i[1 \ 0]^{\tr}\text{.} \end{equation*}

Letting \(P = \left[ \begin{array}{cc} 1\amp 1\\1\amp 0 \end{array} \right]\text{,}\) we have

\begin{equation*} R = P^{-1}AP = \left[ \begin{array}{cr} 2\amp -1\\1\amp 2 \end{array} \right]\text{.} \end{equation*}

The scaling factor is \(\sqrt{\det(R)} = \sqrt{5}\text{,}\) and the angle \(\theta\) of rotation satisfies \(\cos(\theta) = \frac{2}{\sqrt{5}}\) and \(\sin(\theta) = \frac{1}{\sqrt{5}}\text{.}\) This makes \(\theta = \arctan\left(\frac{1}{2}\right) \approx 0.4636\) radians, or approximately \(26.5651^{\circ}\text{,}\) counterclockwise.
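The rotation and scaling can also be confirmed numerically (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [-1.0, 3.0]])
P = np.array([[1.0, 1.0],
              [1.0, 0.0]])

R = np.linalg.solve(P, A @ P)        # R = P^{-1} A P
print(np.round(R, 10))               # [[2, -1], [1, 2]]
print(np.hypot(2.0, 1.0))            # scaling factor sqrt(5) ~ 2.2361
print(np.degrees(np.arctan2(1, 2)))  # rotation angle ~ 26.565 degrees
```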

Subsection Summary

  • For a real matrix, complex eigenvalues appear in conjugate pairs. Specifically, if \(\lambda=a+ib\) is an eigenvalue of a real matrix \(A\text{,}\) then \(\overline{\lambda}=a-ib\) is also an eigenvalue of \(A\text{.}\)

  • For a real matrix, if \(\vv\) is an eigenvector corresponding to \(\lambda\text{,}\) then the vector \(\overline{\vv}\) obtained by taking the complex conjugate of each entry in \(\vv\) is an eigenvector corresponding to \(\overline{\lambda}\text{.}\)

  • The rotation-scaling matrix \(A=\left[ \begin{array}{cr} a\amp -b \\ b\amp a \end{array} \right]\) can be written as

    \begin{equation*} \left[ \begin{array}{cc} \sqrt{a^2+b^2} \amp 0 \\ 0\amp \sqrt{a^2+b^2} \end{array} \right] \left[ \begin{array}{cc} \frac{a}{\sqrt{a^2+b^2}}\amp \frac{-b}{\sqrt{a^2+b^2}} \\ \frac{b}{\sqrt{a^2+b^2}}\amp \frac{a}{\sqrt{a^2+b^2}} \end{array} \right] \,\text{.} \end{equation*}

    This decomposition geometrically means that the transformation corresponding to \(A\) can be viewed as a rotation by angle \(\theta=\arccos\left(\frac{a}{\sqrt{a^2+b^2}}\right)\) if \(b\geq 0\text{,}\) or \(\theta=-\arccos\left(\frac{a}{\sqrt{a^2+b^2}}\right)\) if \(b\lt 0\text{,}\) followed by a scaling by factor \(\sqrt{a^2+b^2}\text{.}\)

  • If \(A\) is a real \(2\times 2\) matrix with complex eigenvalue \(a-bi\) and corresponding eigenvector \(\vv=\vu+i\vw\text{,}\) then \(A\) is similar to the rotation-scaling matrix \(R=\left[ \begin{array}{cr} a\amp -b \\ b\amp a \end{array} \right]\text{.}\) More specifically,

    \begin{equation*} A=PRP^{-1} \; , \text{ where } P=[ \vu \ \vw ]\,\text{.} \end{equation*}

Exercises Exercises

1.

Find eigenvalues and eigenvectors of each of the following matrices.

(a)

\(\left[ \begin{array}{rc} 2\amp 4 \\ -2\amp 2 \end{array} \right]\)

(b)

\(\left[ \begin{array}{rc} 3\amp 2 \\ -1\amp 1 \end{array} \right]\)

(c)

\(\left[ \begin{array}{cr} 1\amp -2 \\ 4\amp -3 \end{array} \right]\)

2.

Find a rotation-scaling matrix where the rotation angle is \(\theta=3\pi/4\) and scaling factor is less than 1.

3.

Determine which rotation-scaling matrices have determinant equal to 1. Be as specific as possible.

4.

Determine the rotation-scaling matrix inside the matrix \(\left[ \begin{array}{rc} 2\amp 4 \\ -2\amp 2 \end{array} \right]\text{.}\)

5.

Find a real \(2\times 2\) matrix with eigenvalue \(1+2i\text{.}\)

6.

Find a real \(2\times 2\) matrix with eigenvalue \(-1+2i\) that is not a rotation-scaling matrix.

7.

We have seen how to find the characteristic polynomial of an \(n \times n\) matrix. In this exercise we consider the reverse question. That is, given a polynomial \(p(\lambda)\) of degree \(n\text{,}\) can we find an \(n \times n\) matrix whose characteristic polynomial is \(p(\lambda)\text{?}\)

(a)

Find the characteristic polynomial of the \(2 \times 2\) matrix \(C = \left[ \begin{array}{cc} 0\amp -a_0\\1\amp -a_1 \end{array} \right]\text{.}\) Use this result to find a real-valued matrix whose eigenvalues are \(1+i\) and \(1-i\text{.}\)

(b)

Repeat part (a) by showing that \(p(\lambda) = -\left(\lambda^3+a_2\lambda^2+a_1\lambda+a_0\right)\) is the characteristic polynomial of the \(3 \times 3\) matrix \(C = \left[ \begin{array}{ccc} 0\amp 0\amp -a_0\\1\amp 0\amp -a_1\\0\amp 1\amp -a_2 \end{array} \right]\text{.}\)

(c)

We can generalize this argument. Prove, using mathematical induction, that the polynomial

\begin{equation*} p(\lambda) =(-1)^n\left( \lambda^n + a_{n-1}\lambda^{n-1} + a_{n-2}\lambda^{n-2} + \cdots + a_1 \lambda + a_0\right) \end{equation*}

is the characteristic polynomial of the matrix

\begin{equation*} C = \left[ \begin{array}{cccccc} 0\amp 0\amp 0\amp \cdots\amp 0\amp -a_{0}\\ 1\amp 0\amp 0\amp \cdots\amp 0\amp -a_{1} \\ 0\amp 1\amp 0\amp \cdots\amp 0\amp -a_2 \\ \vdots \amp \vdots \amp \vdots\amp \ddots \amp \vdots \amp \vdots \\ 0 \amp 0 \amp 0\amp \cdots \amp 1 \amp -a_{n-1} \end{array} \right]\text{.} \end{equation*}

The matrix \(C\) is called the companion matrix for \(p(\lambda)\text{.}\)
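As a sanity check on this construction, the sketch below (assuming NumPy; the helper name companion is our own) builds the companion matrix of a cubic with known roots, chosen differently from the polynomials in this exercise, and confirms that its eigenvalues are exactly those roots.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0,
    where coeffs = [a_0, a_1, ..., a_{n-1}]."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)        # ones on the subdiagonal
    C[:, -1] = -np.asarray(coeffs)    # last column holds -a_0, ..., -a_{n-1}
    return C

# p(x) = x^3 - 2x^2 - 5x + 6 = (x - 1)(x + 2)(x - 3)
C = companion([6.0, -5.0, -2.0])
print(np.round(np.linalg.eigvals(C), 6))  # expect 1, -2, 3 in some order
```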

8.

Label each of the following statements as True or False. Provide justification for your response.

(a) True/False.

If \(3-4i\) is an eigenvalue of a real matrix, then so is \(3+4i\text{.}\)

(b) True/False.

If \(2+3i\) is an eigenvalue of a \(3\times 3\) real matrix \(A\text{,}\) then \(A\) has three distinct eigenvalues.

(c) True/False.

Every \(2\times 2\) real matrix with complex eigenvalues is a rotation-scaling matrix.

(d) True/False.

Every square matrix with real entries has real number eigenvalues.

(e) True/False.

If \(A\) is a \(2\times 2\) matrix with complex eigenvalues similar to a rotation-scaling matrix \(R\text{,}\) the eigenvalues of \(A\) and \(R\) are the same.

(f) True/False.

If \(A\) is a real matrix with complex eigenvalues, all eigenvectors of \(A\) must be non-real.

Subsection Project: Understanding the Gershgorin Disk Theorem

To understand the Gershgorin Disk Theorem, we need to recall how to visualize a complex number in the plane. Recall that a complex number \(z\) is a number of the form \(z = a+bi\) where \(a\) and \(b\) are real numbers and \(i^2 = -1\text{.}\) The number \(a\) is the real part of \(z\text{,}\) denoted as \(\text{ Re } (z)\text{,}\) and \(b\) is the imaginary part of \(z\text{,}\) denoted \(\text{ Im } (z)\text{.}\) The set of all complex numbers is denoted \(\C\text{.}\) We define addition and multiplication on \(\C\) as follows. For \(a+bi, c+di \in \C\text{,}\)

\begin{equation*} (a+bi) + (c+di) = (a+c) + (b+d)i \ \ \ \ \ \text{ and } \ \ \ \ \ (a+bi)(c+di) = (ac-bd) + (ad+bc)i\text{.} \end{equation*}

Note that the product is what we would expect if we “expanded” the product in the normal way and used the fact that \(i^2=-1\text{.}\) The set of complex numbers forms a field — that is, \(\C\) satisfies all of the same properties as \(\R\) as stated in Theorem 4.3.

We can visualize the complex number \(a+bi\) in the plane as the point \((a,b)\text{.}\) Here we are viewing the horizontal axis as the real axis and the vertical axis as the imaginary axis. The length (or magnitude) of the complex number \(z = a+bi\text{,}\) which we denote as \(|z|\text{,}\) is the distance from the origin to \(z\text{.}\) So by the Pythagorean Theorem we have \(|a+bi| = \sqrt{a^2+b^2}\text{.}\) Note that the magnitude of \(z = a+bi\) can be written as a complex product

\begin{equation*} |z| = \sqrt{(a+bi)(a-bi)}\text{.} \end{equation*}

The complex number \(a-bi\) is called the complex conjugate of \(z=a+bi\) and is denoted as \(\overline{z}\text{.}\) A few important properties of complex numbers and their conjugates are the following. Let \(z = a+bi\) and \(w = c+di\) be complex numbers. Then

  • \(\overline{z+w} = \overline{(a+c) + (b+d)i} = (a+c)-(b+d)i = (a-bi) + (c-di) = \overline{z} + \overline{w}\text{,}\)

  • \(\overline{zw} = \overline{(ac-bd) + (ad+bc)i} = (ac-bd)-(ad+bc)i = (a-bi)(c-di) = \overline{z} \overline{w}\text{,}\)

  • \(\overline{\overline{z}} = z\text{,}\)

  • \(|z| = \sqrt{a^2+b^2} \geq \sqrt{a^2} = |a| = |\text{ Re } (z)|\text{,}\)

  • \(|z| = \sqrt{a^2+b^2} \geq \sqrt{b^2} = |b| = |\text{ Im } (z)|\text{,}\)

  • \(\left| \overline{z} \right| = |z|\text{,}\)

  • \(|z| = 0\) if and only if \(z = 0\text{,}\)

  • If \(p(x)\) is a polynomial with real coefficients and the complex number \(z\) satisfies \(p(z) = 0\text{,}\) then \(p\left(\overline{z}\right) = 0\) as well.

Using these facts we can show that the triangle inequality is true for complex numbers. That is,

\begin{equation*} |z+w| \leq |z| + |w|\text{.} \end{equation*}

To see why, notice that

\begin{align*} |z+w|^2 \amp = (z+w)(\overline{z+w})\\ \amp = (z+w)(\overline{z} + \overline{w})\\ \amp = z\overline{z} + z\overline{w} + w \overline{z} + w \overline{w}\\ \amp = z\overline{z} + z\overline{w} + \overline{z\overline{w}} + w \overline{w}\\ \amp = |z|^2 + 2\text{ Re } (z\overline{w}) + |w|^2\\ \amp \leq |z|^2 + 2|z\overline{w}| + |w|^2\\ \amp = |z|^2 + 2|z| |w| + |w|^2\\ \amp = (|z|+|w|)^2\text{.} \end{align*}

Since \(|z+w|\text{,}\) \(|z|\text{,}\) and \(|w|\) are all non-negative, taking square roots of both sides gives us \(|z+w| \leq |z| + |w|\) as desired. We can extend this triangle inequality to any number of complex numbers. That is, if \(z_1\text{,}\) \(z_2\text{,}\) \(\ldots\text{,}\) \(z_k\) are complex numbers, then

\begin{equation} |z_1+z_2+ \cdots + z_k| \leq |z_1| + |z_2| + \cdots + |z_k|\text{.}\tag{21.1} \end{equation}

We can prove Equation (21.1) by mathematical induction. We have already done the \(k=2\) case and so we assume that Equation (21.1) is true for any sum of \(k\) complex numbers. Now suppose that \(z_1\text{,}\) \(z_2\text{,}\) \(\ldots\text{,}\) \(z_k\text{,}\) \(z_{k+1}\) are complex numbers. Then

\begin{align*} |z_1+z_2+ \cdots + z_k + z_{k+1}| \amp = |(z_1+z_2+ \cdots + z_k) + z_{k+1}|\\ \amp \leq |z_1+z_2+ \cdots + z_k| + |z_{k+1}|\\ \amp \leq (|z_1| + |z_2| + \cdots + |z_k|) + |z_{k+1}|\\ \amp = |z_1| + |z_2| + \cdots + |z_k| + |z_{k+1}|\text{.} \end{align*}
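Python's built-in complex arithmetic makes it easy to spot-check this inequality on random data; the following sketch is only a sanity check, not part of the proof.

```python
import random

random.seed(1)
zs = [complex(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(6)]

lhs = abs(sum(zs))             # |z_1 + z_2 + ... + z_k|
rhs = sum(abs(z) for z in zs)  # |z_1| + |z_2| + ... + |z_k|
print(lhs <= rhs)              # True for every choice of the z's
```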

To prove the Gershgorin Disk Theorem, we will use the Levy-Desplanques Theorem, which gives conditions that guarantee that a matrix is invertible. We illustrate with an example in the following activity.

Project Activity 21.6.

Let \(A = \left[ \begin{array}{rc} 3\amp 2 \\ -1\amp 4 \end{array} \right]\text{.}\) Since \(\det(A) \neq 0\text{,}\) we know that \(A\) is an invertible matrix. Let us assume for a moment that we don't know that \(A\) is invertible and try to determine if 0 is an eigenvalue of \(A\text{.}\) In other words, we want to know if there is a nonzero vector \(\vv\) so that \(A \vv = \vzero\text{.}\) Assuming the existence of such a vector \(\vv = [v_1 \ v_2]^{\tr}\text{,}\) for \(A \vv\) to be \(\vzero\) it must be the case that

\begin{equation*} 3v_1 + 2v_2 = 0\ \text{ and } \ -v_1 + 4v_2 = 0\text{.} \end{equation*}

Since the vector \(\vv\) is not the zero vector, at least one of \(v_1\text{,}\) \(v_2\) is not zero. Note that if one of \(v_1\text{,}\) \(v_2\) is zero, then so is the other. So we can assume that \(v_1\) and \(v_2\) are nonzero.

(a)

Use the fact that \(3v_1+2v_2 = 0\) to show that \(|v_2| > |v_1|\text{.}\)

(b)

Use the fact that \(-v_1 + 4v_2 = 0\) to show that \(|v_1| > |v_2|\text{.}\) What conclusion can we draw about whether 0 is an eigenvalue of \(A\text{?}\) Why does this mean that \(A\) is invertible?

What makes the arguments work in Project Activity 21.6 is that \(|3| > |2|\) and \(|4| > |-1|\text{.}\) This argument can be extended to larger matrices, as described in the following theorem.

Theorem 21.5. The Levy-Desplanques Theorem.

Let \(A = [a_{ij}]\) be an \(n \times n\) matrix, with real or complex entries, such that \(|a_{ii}| > \sum_{j \neq i} |a_{ij}|\) for each \(i\text{.}\) Then \(A\) is invertible.

Proof.

Let \(A = [a_{ij}]\) be an \(n \times n\) matrix satisfying \(|a_{ii}| > \sum_{j \neq i} |a_{ij}|\) for all \(i\text{.}\) Let us assume that \(A\) is not invertible, that is, that there is a vector \(\vv \neq \vzero\) such that \(A \vv = \vzero\text{.}\) Let \(\vv = [v_1 \ v_2 \ \cdots \ v_n]^{\tr}\) and let \(t\) be between 1 and \(n\) so that \(|v_t| \geq |v_i|\) for all \(i\text{.}\) That is, choose \(v_t\) to be the component of \(\vv\) with the largest absolute value.

Expanding the product \(A\vv\) using the row-column product along the \(t\)th row shows that

\begin{equation*} a_{t1}v_1 + a_{t2}v_2 + \cdots + a_{tn}v_n = 0\text{.} \end{equation*}

Solving for the \(a_{tt}\) term gives us

\begin{equation*} a_{tt}v_t = -(a_{t1}v_1 + a_{t2}v_2 + \cdots + a_{t(t-1)}v_{t-1}+a_{t(t+1)}v_{t+1} + \cdots + a_{tn}v_n)\text{.} \end{equation*}

Then

\begin{align*} |a_{tt}| |v_t| \amp = |-(a_{t1}v_1 + a_{t2}v_2 + \cdots + a_{t(t-1)}v_{t-1}+a_{t(t+1)}v_{t+1} + \cdots + a_{tn}v_n)|\\ \amp = |a_{t1}v_1 + a_{t2}v_2 + \cdots + a_{t(t-1)}v_{t-1}+a_{t(t+1)}v_{t+1} + \cdots + a_{tn}v_n|\\ \amp \leq |a_{t1}| |v_1| + |a_{t2}| |v_2| + \cdots + |a_{t(t-1)}| |v_{t-1}| + |a_{t(t+1)}| |v_{t+1}| + \cdots + |a_{tn}| |v_n|\\ \amp \leq |a_{t1}| |v_t| + |a_{t2}| |v_t| + \cdots + |a_{t(t-1)}| |v_{t}| + |a_{t(t+1)}| |v_{t}| + \cdots + |a_{tn}| |v_t|\\ \amp = (|a_{t1}| + |a_{t2}| + \cdots + |a_{t(t-1)}| + |a_{t(t+1)}| + \cdots + |a_{tn}|) |v_t|\text{.} \end{align*}

Since \(|v_t| \neq 0\text{,}\) we cancel the \(|v_t|\) term to conclude that

\begin{equation*} |a_{tt}| \leq |a_{t1}| + |a_{t2}| + \cdots + |a_{t(t-1)}| + |a_{t(t+1)}| + \cdots + |a_{tn}|\text{.} \end{equation*}

But this contradicts the condition that \(|a_{ii}| > \sum_{j \neq i} |a_{ij}|\) for all \(i\text{.}\) We conclude that 0 is not an eigenvalue for \(A\) and \(A\) is invertible.

Any matrix \(A = [a_{ij}]\) satisfying the condition of the Levy-Desplanques Theorem is given a special name.

Definition 21.6.

A square matrix \(A = [a_{ij}]\) is strictly diagonally dominant if \(|a_{ii}| > \sum_{j \neq i} |a_{ij}|\) for all \(i\text{.}\)

So any strictly diagonally dominant matrix is invertible. A quick glance can show that a matrix is strictly diagonally dominant. For example, since \(|3| > |1| + |-1|\text{,}\) \(|12| > |5| +|6|\text{,}\) and \(|-8| > |-2| + |4|\text{,}\) the matrix

\begin{equation*} A = \left[ \begin{array}{rcr} 3\amp 1\amp -1 \\ 5\amp 12\amp 6 \\ -2\amp 4\amp -8 \end{array} \right] \end{equation*}

is strictly diagonally dominant and therefore invertible. However, just because a matrix is not strictly diagonally dominant, it does not follow that the matrix is non-invertible. For example, the matrix \(B = \left[ \begin{array}{cc} 1\amp 2 \\ 0\amp 1 \end{array} \right]\) is invertible, but not strictly diagonally dominant.
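The dominance test itself is a one-comparison-per-row computation. Here is a minimal sketch (assuming NumPy; is_strictly_diagonally_dominant is our own name for the check).

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Check that |a_ii| > sum of |a_ij| over j != i, for every row i."""
    A = np.asarray(A, dtype=complex)
    diag = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - diag
    return bool(np.all(diag > off))

A = np.array([[3, 1, -1],
              [5, 12, 6],
              [-2, 4, -8]])
B = np.array([[1, 2],
              [0, 1]])
print(is_strictly_diagonally_dominant(A))  # True, so A is invertible
print(is_strictly_diagonally_dominant(B))  # False, yet B is invertible anyway
```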

Now we can address the Gershgorin Disk Theorem.

Theorem 21.7. The Gershgorin Disk Theorem.

Let \(A = [a_{ij}]\) be an \(n \times n\) matrix with real or complex entries. Then every eigenvalue of \(A\) lies in one of the disks \(\{z \in \C : |z - a_{ii}| \leq r_i\}\) in the complex plane, where \(r_i = \sum_{j \neq i} |a_{ij}|\text{.}\)

Activity 21.7.

Let \(A\) be an arbitrary \(n \times n\) matrix and assume that \(\lambda\) is an eigenvalue of \(A\text{.}\)

(a)

Explain why the matrix \(A - \lambda I\) is singular.

(b)

What does the Levy-Desplanques Theorem tell us about the matrix \(A - \lambda I\text{?}\)

(c)

Explain how we can conclude the Gershgorin Disk Theorem.

Based on this theorem, we define a Gershgorin disk to be \(D(a_{ii}, r_i)\text{,}\) the disk in the complex plane centered at \(a_{ii}\) with radius \(r_i = \sum_{j \neq i} |a_{ij}|\text{.}\)

(d)

Use the Gershgorin Disk Theorem to give estimates on the locations of the eigenvalues of the matrix \(A = \left[ \begin{array}{rr} -1\amp 2 \\ -3\amp 2 \end{array} \right]\text{.}\)
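For part (d), the disks and the eigenvalues can be compared numerically with a sketch like the following (assuming NumPy).

```python
import numpy as np

A = np.array([[-1.0, 2.0],
              [-3.0, 2.0]])

centers = np.diag(A).astype(complex)
radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
for c, r in zip(centers, radii):
    print(f"disk centered at {complex(c)} with radius {r:g}")  # D(-1, 2) and D(2, 3)

lam = np.linalg.eigvals(A)
print(np.round(lam, 4))  # 0.5 +/- 1.9365i
# Each eigenvalue lies in at least one Gershgorin disk:
print([bool(np.min(np.abs(ev - centers) - radii) <= 0) for ev in lam])
```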

The Gershgorin Disk Theorem has a consequence that gives additional information about the eigenvalues if some of the Gershgorin disks do not overlap.

Theorem 21.8.

Let \(A\) be an \(n \times n\) matrix. If \(S\) is a union of \(m\) Gershgorin disks of \(A\) that does not intersect any other Gershgorin disk of \(A\text{,}\) then \(S\) contains exactly \(m\) eigenvalues of \(A\text{,}\) counted with multiplicity.

Proof.

Most proofs of this theorem require some results from topology. For that reason, we will not present a completely rigorous proof but rather give the highlights. Let \(A = [a_{ij}]\) be an \(n \times n\) matrix. Let \(D_i\) be a collection of Gershgorin disks of \(A\) for \(1 \leq i \leq m\) such that \(S = \cup_{1 \leq i \leq m} D_i\) does not intersect any other Gershgorin disk of \(A\text{,}\) and let \(S'\) be the union of the Gershgorin disks of \(A\) that are different from the \(D_i\text{.}\) Note that \(S \cap S' = \emptyset\text{.}\) Let \(C\) be the matrix whose \(i\)th column is \(a_{ii}\ve_i\text{,}\) that is \(C\) is the diagonal matrix whose diagonal entries are the corresponding diagonal entries of \(A\text{.}\) Note that the eigenvalues of \(C\) are \(a_{ii}\) and the Gershgorin disks of \(C\) are just the points \(a_{ii}\text{.}\) So our theorem is true for this matrix \(C\text{.}\) To prove the result, we build a continuum of matrices from \(C\) to \(A\) as follows: let \(B = A-C\) (so that \(B\) is the matrix whose off-diagonal entries are those of \(A\) and whose diagonal entries are 0), and let \(A(t) = tB + C\) for \(t\) in the interval \([0,1]\text{.}\) Note that \(A(1) = A\text{.}\) Since the diagonal entries of \(A(t)\) are the same as those of \(A\text{,}\) the Gershgorin disks of \(A(t)\) have the same centers as the corresponding Gershgorin disks of \(A\text{,}\) while the radii of the Gershgorin disks of \(A(t)\) are those of \(A\) but scaled by \(t\text{.}\) So the Gershgorin disks of \(A(t)\) increase from points (the \(a_{ii}\)) to the Gershgorin disks of \(A\) as \(t\) increases from 0 to 1. While the centers of the disks all remain fixed, it is important to recognize that the eigenvalues of \(A(t)\) move as \(t\) changes. An illustration of this is shown in Figure 21.9 with the eigenvalues as the black points and the changing Gershgorin disks dashed in magenta, using the matrix \(\left[ \begin{array}{cc} i\amp \frac{1}{2} \\ 1\amp -2+i \end{array} \right]\text{.}\) We can learn about how the eigenvalues move with the characteristic polynomial.

Figure 21.9. How eigenvalues move.

Let \(p(t,x)\) be the characteristic polynomial of \(A(t)\text{.}\) Note that these characteristic polynomials are functions of both \(t\) and \(x\text{.}\) Since polynomials are continuous functions, their roots (the eigenvalues of \(A(t)\)) are continuous for \(t \in [0,1]\) as well. Let \(\lambda(t)\) be an eigenvalue of \(A(t)\text{.}\) Note that \(\lambda(1)\) is an eigenvalue of \(A\text{,}\) and \(\lambda(0)\) is one of the \(a_{ii}\) and is therefore in \(S\text{.}\) We will argue that \(\lambda(t)\) is in \(S\) for every value of \(t\) in \([0,1]\text{.}\) Let \(r_i\) be the radius of \(D_i\) and let \(D(t)_i\) be the Gershgorin disk of \(A(t)\) with the same center as \(D_i\) and radius \(r(t)_i = tr_i\text{.}\) Let \(S(t) = \cup_{1 \leq i \leq m} D(t)_i\text{.}\) Since \(r(t)_i \leq r_i\text{,}\) it follows that \(D(t)_i \subseteq D_i\) and so \(S(t) \cap S' = \emptyset\) as well. From topology, we know that since the disks \(D_i\) are closed, the union \(S\) of these disks is also closed. Similarly, \(S(t)\) and \(S'\) are closed. Thus, \(\lambda(t)\) varies continuously within a closed set disjoint from \(S'\text{,}\) and so it cannot leave that set. Thus, \(\lambda(t)\) is in \(S\) for every value of \(t\) in \([0,1]\text{.}\)