Section 28 Quadratic Forms and the Principal Axis Theorem
Focus Questions
By the end of this section, you should be able to give precise and thorough answers to the questions listed below. You may want to keep these questions in mind to focus your thoughts as you complete the section.
What is a quadratic form on \(\R^n\text{?}\)
What does the Principal Axis Theorem tell us about quadratic forms?
Subsection Application: The Tennis Racket Effect
Try an experiment with a tennis racket (or a squash racket, or a ping pong paddle). Let us define a 3D coordinate system with the center of the racket as the origin and the head of the racket lying in the \(xy\)-plane. We let \(\vu_1\) be the vector in the direction of the handle and \(\vu_2\) the perpendicular direction (still lying in the plane defined by the head) as illustrated in Figure 28.1. We then let \(\vu_3\) be a vector perpendicular to the plane of the head. Hold the racket by the handle and spin it to make one rotation around the \(\vu_1\) axis. This is pretty easy. It is also not difficult to throw the racket so that it rotates around the \(\vu_3\) axis. Now toss the racket into the air to make one complete rotation around the axis of the vector \(\vu_2\) and catch the handle. Repeat this several times. You should notice that in most instances, the racket will also have made a half rotation around the \(\vu_1\) axis so that the other face of the racket now points up. This is quite different from the rotations around the \(\vu_1\) and \(\vu_3\) axes. A good video that illustrates this behavior can be seen at youtube.com/watch?v=4dqCQqI-Gis.
This effect is a result in classical mechanics that describes the rotational movement of a rigid body in space, called the tennis racket effect (or the Dzhanibekov effect, after the Russian cosmonaut Vladimir Dzhanibekov who discovered the theorem's consequences while in zero gravity in space — you can see an illustration of this in the video at youtube.com/watch?v=L2o9eBl_Gzw
). The result is simple to see in practice, but it is difficult to understand intuitively why the behavior around the intermediate axis is different. There is a story of a student who asked the famous physicist Richard Feynman whether there is any intuitive way to understand the result; Feynman supposedly went into deep thought for about 10 or 15 seconds and answered, “no.” As we will see later in this section, we can understand this effect using the principal axes of a rigid body.
Subsection Introduction
We are familiar with quadratic equations in algebra. Examples of quadratic equations include \(x^2=1\text{,}\) \(x^2+y^2=1\text{,}\) and \(x^2+xy+y^2=3\text{.}\) We don't, however, have to restrict ourselves to two variables. A quadratic equation in \(n\) variables is any equation in which the sum of the exponents in each monomial term is 2. So a quadratic equation in the variables \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_n\) is an equation of the form
\begin{equation*} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}x_ix_j = c \end{equation*}
for some constants \(a_{ij}\) and \(c\text{.}\) In matrix notation the expression on the left of this equation has the form
\begin{equation*} \vx^{\tr} A \vx\text{,} \end{equation*}
where \(\vx = \left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right]\) and \(A\) is the \(n \times n\) matrix \(A = [a_{ij}]\text{.}\) For example, if \(A= \left[ \begin{array}{rcr} 1\amp 3\amp -2 \\ -1\amp 1\amp 2 \\ 0\amp 2\amp -2 \end{array} \right]\text{,}\) then we get the quadratic expression \(x_1^2+3x_1x_2-2x_1x_3-x_2x_1+x_2^2+2x_2x_3+2x_3x_2-2x_3^2\text{.}\) We should note here that the terms involving \(x_ix_j\) and \(x_jx_i\) are repeated in our sum, but
\begin{equation*} a_{ij}x_ix_j + a_{ji}x_jx_i = (a_{ij} + a_{ji})x_ix_j \end{equation*}
and so we could replace \(a_{ij}\) and \(a_{ji}\) both with \(\left(\frac{a_{ij} + a_{ji}}{2}\right)\) without changing the quadratic form. With this alteration in mind, we can then assume that \(A\) is a symmetric matrix. So in the previous example, the symmetric matrix \(A'= \left[ \begin{array}{rcr} 1\amp 1\amp -1 \\ 1\amp 1\amp 2 \\ -1\amp 2\amp -2 \end{array} \right]\) gives the same quadratic expression. This leads to the following definition.
Definition 28.2.
A quadratic form on \(\R^n\) is a function \(Q\) defined by
\begin{equation*} Q(\vx) = \vx^{\tr} A \vx \end{equation*}
for some \(n \times n\) symmetric matrix \(A\text{.}\)
As we show in Exercise 7, the symmetric matrix \(A\) is uniquely determined by the quadratic form, so we call \(A\) the matrix of the quadratic form. It is these quadratic forms that we will study in this section.
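The following short computation (a sketch using Python and numpy, not part of the text) checks the symmetrization step numerically: replacing \(A\) by its symmetric part \(\frac{1}{2}\left(A + A^{\tr}\right)\) leaves the value of the quadratic expression unchanged.

```python
import numpy as np

# The (non-symmetric) matrix A from the example above.
A = np.array([[1.0, 3.0, -2.0],
              [-1.0, 1.0, 2.0],
              [0.0, 2.0, -2.0]])

# Its symmetric part; this reproduces the matrix A' in the text.
A_sym = (A + A.T) / 2

# Spot-check that x^T A x = x^T A' x at several random points.
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(3)
    assert np.isclose(x @ A @ x, x @ A_sym @ x)

print(A_sym)   # [[ 1.  1. -1.] [ 1.  1.  2.] [-1.  2. -2.]]
```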
Preview Activity 28.1.
(a)
To get a little more comfortable with quadratic forms, write each of the following quadratic forms in matrix form, explicitly identifying the vector \(\vx\) and the symmetric matrix \(A\) of the quadratic form.
(i)
\(3x_1^2-2x_2^2 +4x_1x_2+x_2x_3\)
(ii)
\(x_1x_4 + 4x_2x_3 - x_2^2 + 10x_1x_5\)
(b)
Some quadratic forms form equations in \(\R^2\) that are very familiar: \(x^2+y^2=1\) is an equation of a circle, \(2x^2+3y^2=2\) is an equation of an ellipse, and \(x^2-y^2=1\) is an equation of a hyperbola. Of course, these do not represent all of the quadratic forms in \(\R^2\) — some contain cross-product terms. We can recognize the equations above because they contain no cross-product terms (terms involving \(xy\)). We can more easily recognize the quadratic forms that contain cross-product terms if we can somehow rewrite the forms in a different format with no cross-product terms. We illustrate how this can be done with the quadratic form \(Q\) defined by \(Q(\vx) = x^2-xy+y^2\text{.}\)
(i)
Write \(Q(\vx)\) in the form \(\vx^{\tr} A \vx\text{,}\) where \(A\) is a \(2 \times 2\) symmetric matrix.
(ii)
Since \(A\) is a symmetric matrix we can orthogonally diagonalize \(A\text{.}\) Given that the eigenvalues of \(A\) are \(\frac{3}{2}\) and \(\frac{1}{2}\) with corresponding eigenvectors \(\left[ \begin{array}{r} -1 \\ 1 \end{array} \right]\) and \(\left[ \begin{array}{c} 1 \\ 1 \end{array} \right]\text{,}\) respectively, find a matrix \(P\) that orthogonally diagonalizes \(A\text{.}\)
(iii)
Define \(\vy = \left[ \begin{array}{c} w \\ z \end{array} \right]\) to satisfy \(\vx = P\vy\text{.}\) Substitute for \(\vx\) in the quadratic form \(Q(\vx)\) to write the quadratic form in terms of \(w\) and \(z\text{.}\) What kind of graph does the quadratic equation \(Q(\vx) = 1\) have?
Subsection Equations Involving Quadratic Forms in \(\R^2\)
When we consider equations of the form \(Q(\vx) = d\text{,}\) where \(Q\) is a quadratic form in \(\R^2\) and \(d\) is a constant, we wind up with old friends like \(x^2+y^2=1\text{,}\) \(2x^2+3y^2=2\text{,}\) or \(x^2-y^2=1\text{.}\) As we saw in Preview Activity 28.1, these equations are relatively easy to recognize. However, when we have cross-product terms, like in \(x^2-xy+y^2=1\text{,}\) it is not so easy to identify the curve the equation represents. If there were a way to eliminate the cross-product term \(xy\) from this form, we might more easily recognize its graph. The discussion in this section will focus on quadratic forms in \(\R^2\text{,}\) but we will see later that the arguments work in any number of dimensions. While working in \(\R^2\) we will use the standard variables \(x\) and \(y\) instead of \(x_1\) and \(x_2\text{.}\)
In general, an equation of the form \(Q(\vx) = d\text{,}\) where \(Q\) is a quadratic form in \(\R^2\) defined by a matrix \(A = \left[ \begin{array}{cc} a\amp b/2\\b/2\amp c \end{array} \right]\) and \(d\) is a constant, looks like
\begin{equation*} ax^2 + bxy + cy^2 = d\text{.} \end{equation*}
The graph of an equation like this is either an ellipse (a circle is a special case of an ellipse), a hyperbola, two non-intersecting lines, a point, or the empty set (see Exercise 5). The quadratic forms do not involve linear terms, so we don't consider the cases of parabolas. One way to see into which category one of these quadratic form equations falls is to write the equation in standard form.
The standard forms for quadratic equations in \(\R^2\) are as follows, where \(a\) and \(b\) are nonzero constants and \(h\) and \(k\) are any constants.
- Lines:
\(\ds ax^2 = 1\) or \(\ds ay^2=1\) (\(a > 0\))
- Ellipse:
\(\ds \frac{(x-h)^2}{a^2} + \frac{(y-k)^2}{b^2} = 1\)
- Hyperbola:
\(\ds \frac{(x-h)^2}{a^2} - \frac{(y-k)^2}{b^2} = 1\) or \(\ds \frac{(y-k)^2}{b^2} - \frac{(x-h)^2}{a^2} = 1\)
Preview Activity 28.1 contains the main tool that we need to convert a quadratic form into one of these standard forms. By this we mean that if we have a quadratic form \(Q\) in the variables \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_n\text{,}\) we want to find variables \(y_1\text{,}\) \(y_2\text{,}\) \(\ldots\text{,}\) \(y_n\) in terms of \(x_1\text{,}\) \(x_2\text{,}\) \(\ldots\text{,}\) \(x_n\) so that when written in terms of the variables \(y_1\text{,}\) \(y_2\text{,}\) \(\ldots\text{,}\) \(y_n\) the quadratic form \(Q\) contains no cross terms. In other words, we want to find a vector \(\vy = \left[ \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]\) so that \(Q(\vx) = \vy^{\tr} D \vy\text{,}\) where \(D\) is a diagonal matrix. Since every real symmetric matrix is orthogonally diagonalizable, we will always be able to find a matrix \(P\) that orthogonally diagonalizes \(A\text{.}\) The details are as follows.
Let \(Q\) be the quadratic form defined by \(Q(\vx) = \vx^{\tr} A \vx\text{,}\) where \(A\) is an \(n \times n\) symmetric matrix. As in Preview Activity 28.1, the fact that \(A\) is symmetric means that we can find an orthogonal matrix \(P = [ \vp_1 \ \vp_2 \ \vp_3 \ \cdots \ \vp_n ]\) whose columns are orthonormal eigenvectors of \(A\) corresponding to eigenvalues \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_n\text{,}\) respectively. Letting \(\vy = P^{\tr} \vx\) gives us \(\vx = P\vy\) and
\begin{equation*} Q(\vx) = \vx^{\tr} A \vx = (P\vy)^{\tr} A (P\vy) = \vy^{\tr} \left(P^{\tr} A P\right) \vy = \vy^{\tr} D \vy\text{,} \end{equation*}
where \(D\) is the diagonal matrix whose \(i\)th diagonal entry is \(\lambda_i\text{.}\)
Moreover, the set \(\B = \{\vp_1, \vp_2, \ldots, \vp_n\}\) is an orthonormal basis for \(\R^n\) and so defines a coordinate system for \(\R^n\text{.}\) Note that if \(\vy = [y_1 \ y_2 \ \cdots \ y_n]^{\tr}\text{,}\) then
\begin{equation*} \vx = P\vy = y_1\vp_1 + y_2\vp_2 + \cdots + y_n\vp_n\text{.} \end{equation*}
Thus, the coordinate vector of \(\vx\) with respect to \(\B\) is \(\vy\text{,}\) or \([\vx]_{\B} = \vy\text{.}\) We summarize in Theorem 28.3.
Theorem 28.3. Principal Axis Theorem.
Let \(A\) be an \(n \times n\) symmetric matrix. There is an orthogonal change of variables \(\vx = P\vy\) so that the quadratic form \(Q\) defined by \(Q(\vx) = \vx^{\tr} A \vx\) is transformed into the quadratic form \(\vy^{\tr} D \vy\) where \(D\) is a diagonal matrix.
The columns of the orthogonal matrix \(P\) in the Principal Axis Theorem form an orthonormal basis for \(\R^n\) and are called the principal axes for the quadratic form \(Q\text{.}\) Also, the coordinate vector of \(\vx\) with respect to this basis is \(\vy\text{.}\)
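If you want to check a change of variables numerically, a sketch like the following (using numpy; the setup reproduces the form \(x^2 - xy + y^2\) from Preview Activity 28.1) confirms that \(Q(\vx) = \vy^{\tr} D \vy\) once \(\vy = P^{\tr}\vx\text{.}\)

```python
import numpy as np

# Symmetric matrix of Q(x) = x^2 - xy + y^2.
A = np.array([[1.0, -0.5],
              [-0.5, 1.0]])

# eigh returns eigenvalues in ascending order and an orthogonal matrix
# whose columns are corresponding orthonormal eigenvectors.
evals, P = np.linalg.eigh(A)
D = np.diag(evals)

# Verify Q(x) = y^T D y for a random point, where y = P^T x.
rng = np.random.default_rng(1)
x = rng.standard_normal(2)
y = P.T @ x
assert np.isclose(x @ A @ x, y @ D @ y)

print(evals)   # [0.5 1.5], the eigenvalues from Preview Activity 28.1
```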
Activity 28.2.
Let \(Q\) be the quadratic form defined by \(Q(\vx) = 2x^2 + 4xy + 5y^2 = \vx^{\tr} A \vx\text{,}\) where \(\vx = \left[ \begin{array}{c} x \\ y \end{array} \right]\) and \(A = \left[ \begin{array}{cc}2\amp 2\\2\amp 5 \end{array} \right]\text{.}\)
(a)
The eigenvalues of \(A\) are \(\lambda_1 = 6\) and \(\lambda_2=1\) with corresponding eigenvectors \(\vv_1 = [1 \ 2]^{\tr}\) and \(\vv_2 = [-2 \ 1]^{\tr}\text{,}\) respectively. Find an orthogonal matrix \(P\) with determinant 1 that diagonalizes \(A\text{.}\) Is \(P\) unique? Explain. Is there a matrix whose determinant is not 1 that orthogonally diagonalizes \(A\text{?}\) Explain.
(b)
Use the matrix \(P\) to write the quadratic form without the cross-product.
(c)
We can view \(P\) as a change of basis matrix from the coordinate system defined by \(\vy = P^{\tr} \vx\) to the standard coordinate system. In other words, in the standard \(xy\) coordinate system, the quadratic form is written as \(\vx^{\tr}A \vx\text{,}\) but in the new coordinate system defined by \(\vy\) the quadratic form is written as \((P\vy)^{\tr}A(P\vy)\text{.}\) As a change of basis matrix, \(P\) performs a rotation. See if you can recall what we learned about rotation matrices and determine the angle of rotation \(P\) defines. Plot the graph of the quadratic equation \(Q(\vx) = 1\) in the new coordinate system and identify this angle on the graph. Interpret the result.
Subsection Classifying Quadratic Forms
If we draw graphs of equations of the type \(z=Q(\vx)\text{,}\) where \(Q\) is a quadratic form, we can see that a quadratic form whose matrix does not have 0 as an eigenvalue can take on all positive values (except at \(\vx=\vzero\)) as shown at left in Figure 28.4, both positive and negative values as depicted in the center of Figure 28.4, or all negative values (except at \(\vx=\vzero\)) as shown at right in Figure 28.4. We can see when these cases happen by analyzing the eigenvalues of the matrix that defines the quadratic form. Let \(A\) be a \(2 \times 2\) symmetric matrix with eigenvalues \(\lambda_1\) and \(\lambda_2\text{,}\) and let \(P\) be a matrix that orthogonally diagonalizes \(A\) so that \(P^{\tr}AP = D = \left[ \begin{array}{cc} \lambda_1 \amp 0 \\ 0 \amp \lambda_2 \end{array} \right]\text{.}\) If we let \(\vy = \left[ \begin{array}{c} w\\z \end{array} \right] = P^{\tr}\vx\text{,}\) then
\begin{equation*} Q(\vx) = \vx^{\tr}A\vx = \vy^{\tr}D\vy = \lambda_1 w^2 + \lambda_2 z^2\text{.} \end{equation*}
Then \(Q(\vx) \geq 0\) if all of the eigenvalues of \(A\) are positive (with \(Q(\vx) > 0\) when \(\vx \neq \vzero\)) and \(Q(\vx) \leq 0\) if all of the eigenvalues of \(A\) are negative (with \(Q(\vx) \lt 0\) when \(\vx \neq \vzero\)). If one eigenvalue of \(A\) is positive and the other negative, then \(Q(\vx)\) will take on both positive and negative values. As a result, we classify symmetric matrices (and their corresponding quadratic forms) according to these behaviors.
Definition 28.5.
A symmetric matrix \(A\) (and its associated quadratic form \(Q\)) is
positive definite if \(\vx^{\tr}A\vx > 0\) for all \(\vx \neq \vzero\text{,}\)
positive semidefinite if \(\vx^{\tr}A\vx \geq 0\) for all \(\vx\text{,}\)
negative definite if \(\vx^{\tr}A\vx \lt 0\) for all \(\vx \neq \vzero\text{,}\)
negative semidefinite if \(\vx^{\tr}A\vx \leq 0\) for all \(\vx\text{,}\)
indefinite if \(\vx^{\tr}A\vx\) takes on both positive and negative values.
For example, the quadratic form \(Q(\vx) = x^2+y^2\) at left in Figure 28.4 is positive definite (with repeated eigenvalue 1), the quadratic form \(Q(\vx) = -(x^2+y^2)\) at right in Figure 28.4 is negative definite (repeated eigenvalue \(-1\)), and the hyperbolic paraboloid \(Q(\vx) = x^2-y^2\) in the center of Figure 28.4 is indefinite (eigenvalues \(1\) and \(-1\)).
So we have argued that a quadratic form \(Q(\vx) = \vx^{\tr} A \vx\) is positive definite if \(A\) has all positive eigenvalues, negative definite if \(A\) has all negative eigenvalues, and indefinite if \(A\) has both positive and negative eigenvalues. Similarly, the quadratic form is positive semidefinite if \(A\) has all nonnegative eigenvalues and negative semidefinite if \(A\) has all nonpositive eigenvalues. Positive definite matrices are important, as we discuss in the next section.
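As a sketch of how one might apply these eigenvalue criteria computationally (the function name and the tolerance are our own illustrative choices, not from the text):

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    evals = np.linalg.eigvalsh(A)
    if np.all(evals > tol):
        return "positive definite"
    if np.all(evals < -tol):
        return "negative definite"
    if np.all(evals >= -tol):
        return "positive semidefinite"
    if np.all(evals <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[1.0, 0.0], [0.0, 1.0]])))    # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # indefinite
print(classify(np.array([[-1.0, 0.0], [0.0, -1.0]])))  # negative definite
```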
Subsection Inner Products
We used the dot product to define lengths of vectors, to measure angles between vectors, and to define orthogonality in \(\R^n\text{.}\) We can generalize the notion of orthogonality by using different types of products called inner products that behave like the dot product.
Preview Activity 28.3.
Define a mapping from \(\R^2 \times \R^2\) to \(\R\) by
for \(\vu = [u_1 \ u_2]^{\tr}\) and \(\vv = [v_1 \ v_2]^{\tr}\) in \(\R^2\text{.}\) (The brackets \(\langle \ \rangle\) provide a shorthand way of representing the function.)
(a)
Calculate \(\langle [1 \ 2]^{\tr}, [3 \ -4]^{\tr} \rangle\text{.}\)
(b)
If \(\vu\) and \(\vv\) are in \(\R^2\text{,}\) is it true that
\begin{equation*} \langle \vu , \vv \rangle = \langle \vv , \vu \rangle\text{?} \end{equation*}
Verify your answer.
(c)
If \(\vu\text{,}\) \(\vv\text{,}\) and \(\vw\) are in \(\R^2\text{,}\) is it true that
\begin{equation*} \langle \vu + \vv , \vw \rangle = \langle \vu , \vw \rangle + \langle \vv , \vw \rangle\text{?} \end{equation*}
Verify your answer.
(d)
If \(\vu\) and \(\vv\) are in \(\R^2\) and \(c\) is a scalar, is it true that
\begin{equation*} \langle c\vu , \vv \rangle = c\langle \vu , \vv \rangle\text{?} \end{equation*}
Verify your answer.
(e)
If \(\vu\) and \(\vv\) are in \(\R^2\text{,}\) must it be the case that \(\langle \vu , \vu \rangle \geq 0\text{?}\) When is \(\langle \vu , \vu \rangle = 0\text{?}\)
(f)
There is a matrix \(A\) such that \(\langle \vu, \vv \rangle = \vu^{\tr} A \vv\text{.}\) Find this matrix \(A\text{.}\)
Preview Activity 28.3 illustrates that there are functions from \(\R^n \times \R^n\) to \(\R\) other than the dot product that satisfy many of the properties in Theorem 23.6. Such functions allow us to broaden our ideas of what right angles look like. These functions are called inner products.
Definition 28.6.
An inner product \(\langle \ , \ \rangle\) on \(\R^n\) is a mapping from \(\R^n \times \R^n \to \R\) satisfying
\(\langle \vu , \vv \rangle = \langle \vv , \vu \rangle\) for all \(\vu\) and \(\vv\) in \(\R^n\text{,}\)
\(\langle \vu + \vv , \vw \rangle = \langle \vu , \vw \rangle + \langle \vv , \vw \rangle\) for all \(\vu\text{,}\) \(\vv\text{,}\) and \(\vw\) in \(\R^n\text{,}\)
\(\langle c\vu , \vv \rangle = c\langle \vu , \vv \rangle\) for all \(\vu\text{,}\) \(\vv\) in \(\R^n\) and all scalars \(c\text{,}\)
\(\langle \vu , \vu \rangle \geq 0\) for all \(\vu\) in \(\R^n\) and \(\langle \vu , \vu \rangle = 0\) if and only if \(\vu = \vzero\text{.}\)
The dot product and the example in Preview Activity 28.3 provide two examples of inner products. The examples below provide two other important inner products on \(\R^n\text{.}\)
-
If \(a_1\text{,}\) \(a_2\text{,}\) \(\ldots\text{,}\) \(a_n\) are positive scalars, then
\begin{equation*} \langle [u_1 \ u_2 \ \cdots \ u_n]^{\tr}, [v_1 \ v_2 \ \cdots \ v_n]^{\tr} \rangle = a_1u_1v_1+a_2u_2v_2+ \cdots + a_nu_nv_n \end{equation*}defines an inner product on \(\R^n\text{.}\)
-
Every invertible \(n \times n\) matrix \(A\) defines an inner product on \(\R^n\) by
\begin{equation*} \langle \vu, \vv \rangle = (A\vu) \cdot (A\vv)\text{.} \end{equation*}
As Exercise 8 will demonstrate, every inner product on \(\R^n\) can be written in the form \(\langle \vu, \vv \rangle = \vu^{\tr} A \vv\) for some special type of matrix \(A\text{.}\)
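A small numerical sketch of the second example above (the particular matrix \(A\) and the helper name inner are our own illustrative choices) spot-checks the four inner product properties for \(\langle \vu, \vv \rangle = (A\vu) \cdot (A\vv)\text{:}\)

```python
import numpy as np

# An invertible matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

def inner(u, v):
    """The inner product <u, v> = (Au) . (Av)."""
    return (A @ u) @ (A @ v)

u = np.array([1.0, 2.0])
v = np.array([3.0, -4.0])
w = np.array([0.5, 1.0])
c = 2.5

assert np.isclose(inner(u, v), inner(v, u))                    # symmetry
assert np.isclose(inner(u + v, w), inner(u, w) + inner(v, w))  # additivity
assert np.isclose(inner(c * u, v), c * inner(u, v))            # homogeneity
assert inner(u, u) > 0                                         # positivity

# Note that inner(u, v) = u^T (A^T A) v, and A^T A is symmetric and
# positive definite, consistent with Exercise 8.
```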
Activity 28.4.
Let \(A\) be a symmetric \(n \times n\) matrix, and define \(\langle \ , \ \rangle : \R^n \times \R^n \to \R\) by
\begin{equation} \langle \vu, \vv \rangle = \vu^{\tr} A \vv\text{.} \tag{28.1} \end{equation}
(a)
Explain why it is necessary for \(A\) to be positive definite in order for (28.1) to define an inner product on \(\R^n\text{.}\)
(b)
Show that (28.1) defines an inner product on \(\R^n\) if \(A\) is positive definite.
(c)
Let \(\langle \ , \ \rangle\) be the mapping from \(\R^2\times \R^2\to \R\) defined by
Find a matrix \(A\) so that \(\langle \vx, \vy \rangle = \vx^{\tr} A \vy\) and explain why \(\langle \ , \ \rangle\) defines an inner product.
Subsection Examples
What follows are worked examples that use the concepts from this section.
Example 28.7.
Write each given quadratic equation in a coordinate system in which it has no cross-product terms.
(a)
\(8x^2-4xy+5y^2 = 1\)
Solution.
We write the quadratic form \(Q(x,y)=8x^2-4xy+5y^2\) as \(\vx^{\tr} A \vx\text{,}\) where \(\vx = \left[ \begin{array}{c} x\\y \end{array} \right]\) and \(A = \left[ \begin{array}{rr} 8\amp -2\\-2\amp 5 \end{array} \right]\text{.}\) The eigenvalues for \(A\) are \(9\) and \(4\text{,}\) and bases for the corresponding eigenspaces \(E_9\) and \(E_{4}\) are \(\{[-2 \ 1]^{\tr}\}\) and \(\{[1 \ 2]^{\tr}\}\text{,}\) respectively. An orthogonal matrix \(P\) that orthogonally diagonalizes \(A\) is
\begin{equation*} P = \frac{1}{\sqrt{5}}\left[ \begin{array}{rr} -2\amp 1\\1\amp 2 \end{array} \right]\text{.} \end{equation*}
If \(\vy = [u \ v]^{\tr}\) and we let \(\vx = P\vy\text{,}\) then we can rewrite the quadratic equation \(8x^2-4xy+5y^2 = 1\) as
\begin{equation*} 9u^2 + 4v^2 = 1\text{.} \end{equation*}
So the quadratic equation \(8x^2-4xy+5y^2=1\) is an ellipse.
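A quick numerical check of this computation (a sketch, not part of the solution) confirms that the change of variables diagonalizes the form:

```python
import numpy as np

A = np.array([[8.0, -2.0],
              [-2.0, 5.0]])

# Columns are the normalized eigenvectors for eigenvalues 9 and 4.
P = np.column_stack((np.array([-2.0, 1.0]), np.array([1.0, 2.0]))) / np.sqrt(5)

# P^T A P should be diag(9, 4), so the curve 9u^2 + 4v^2 = 1 is an ellipse.
print(np.round(P.T @ A @ P, 10))
```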
(b)
\(x^2+4xy+y^2=1\)
Solution.
We write the quadratic form \(Q(x,y)=x^2+4xy+y^2\) as \(\vx^{\tr} A \vx\text{,}\) where \(\vx = \left[ \begin{array}{c} x\\y \end{array} \right]\) and \(A = \left[ \begin{array}{cc} 1\amp 2\\2\amp 1 \end{array} \right]\text{.}\) The eigenvalues for \(A\) are \(3\) and \(-1\text{,}\) and bases for the corresponding eigenspaces \(E_3\) and \(E_{-1}\) are \(\{[1 \ 1]^{\tr}\}\) and \(\{[-1 \ 1]^{\tr}\}\text{,}\) respectively. An orthogonal matrix \(P\) that orthogonally diagonalizes \(A\) is
\begin{equation*} P = \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1\amp -1\\1\amp 1 \end{array} \right]\text{.} \end{equation*}
If \(\vy = [u \ v]^{\tr}\) and we let \(\vx = P\vy\text{,}\) then we can rewrite the quadratic equation \(x^2+4xy+y^2 = 1\) as
\begin{equation*} 3u^2 - v^2 = 1\text{.} \end{equation*}
So the quadratic equation \(x^2+4xy+y^2=1\) is a hyperbola.
(c)
\(4x^2+4y^2+4z^2+4xy+4xz+4yz-3=0\)
Solution.
We write the quadratic form \(Q(x,y,z)=4x^2+4y^2+4z^2+4xy+4xz+4yz\) as \(\vx^{\tr} A \vx\text{,}\) where \(\vx = \left[ \begin{array}{c} x\\y\\z \end{array} \right]\) and \(A = \left[ \begin{array}{ccc} 4\amp 2\amp 2\\2\amp 4\amp 2\\2\amp 2\amp 4 \end{array} \right]\text{.}\) The eigenvalues for \(A\) are \(2\) and \(8\text{,}\) and bases for the corresponding eigenspaces \(E_2\) and \(E_8\) are \(\{[-1 \ 0 \ 1]^{\tr}, [-1 \ 1\ 0]^{\tr}\}\) and \(\{[1 \ 1\ 1]^{\tr}\}\text{,}\) respectively. Applying the Gram-Schmidt process to the basis for \(E_2\) gives us an orthogonal basis \(\{\vw_1, \vw_2\}\) of \(E_2\text{,}\) where \(\vw_1 = [-1 \ 0 \ 1]^{\tr}\) and
\begin{equation*} \vw_2 = [-1 \ 1 \ 0]^{\tr} - \frac{1}{2}[-1 \ 0 \ 1]^{\tr} = \frac{1}{2}[-1 \ 2 \ -1]^{\tr}\text{.} \end{equation*}
An orthogonal matrix \(P\) that orthogonally diagonalizes \(A\) is
\begin{equation*} P = \left[ \begin{array}{rrr} -\frac{1}{\sqrt{2}}\amp -\frac{1}{\sqrt{6}}\amp \frac{1}{\sqrt{3}}\\ 0\amp \frac{2}{\sqrt{6}}\amp \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{2}}\amp -\frac{1}{\sqrt{6}}\amp \frac{1}{\sqrt{3}} \end{array} \right]\text{.} \end{equation*}
If \(\vy = [u \ v \ w]^{\tr}\) and we let \(\vx = P\vy\text{,}\) then we can rewrite the quadratic equation \(4x^2+4y^2+4z^2+4xy+4xz+4yz = 3\) as
\begin{equation*} 2u^2 + 2v^2 + 8w^2 = 3\text{.} \end{equation*}
So the quadratic equation \(4x^2+4y^2+4z^2+4xy+4xz+4yz-3 = 0\) is an ellipsoid.
Example 28.8.
Let \(A\) and \(B\) be positive definite matrices, and let \(C = \left[ \begin{array}{rr}5\amp -3\\-3\amp 3 \end{array} \right]\text{.}\)
(a)
Must \(A\) be invertible? Justify your answer.
Solution.
Since \(A\) has all positive eigenvalues and \(\det(A)\) is the product of the eigenvalues of \(A\text{,}\) we have \(\det(A) > 0\text{.}\) Thus, \(A\) is invertible.
(b)
Must \(A^{-1}\) be positive definite? Justify your answer.
Solution.
The fact that \(A\) is positive definite means that \(A\) is also symmetric. Recall that \(\left(A^{-1}\right)^{\tr} = \left(A^{\tr}\right)^{-1}\text{.}\) Since \(A\) is symmetric, it follows that \(\left(A^{-1}\right)^{\tr} = A^{-1}\) and \(A^{-1}\) is symmetric. The eigenvalues of \(A^{-1}\) are the reciprocals of the eigenvalues of \(A\text{.}\) Since the eigenvalues of \(A\) are all positive, so are the eigenvalues of \(A^{-1}\text{.}\) Thus, \(A^{-1}\) is positive definite.
(c)
Must \(A^{2}\) be positive definite? Justify your answer.
Solution.
Notice that
\begin{equation*} \left(A^2\right)^{\tr} = (AA)^{\tr} = A^{\tr}A^{\tr} = AA = A^2\text{,} \end{equation*}
so \(A^2\) is symmetric. The eigenvalues of \(A^2\) are the squares of the eigenvalues of \(A\text{.}\) Since no eigenvalue of \(A\) is \(0\text{,}\) the eigenvalues of \(A^2\) are all positive and \(A^2\) is positive definite.
(d)
Must \(A+B\) be positive definite? Justify your answer.
Solution.
We know that \(A\) and \(B\) are symmetric, and
\begin{equation*} (A+B)^{\tr} = A^{\tr} + B^{\tr} = A + B\text{,} \end{equation*}
so \(A+B\) is symmetric. Also, the fact that \(\vx^{\tr}A\vx > 0\) and \(\vx^{\tr}B\vx > 0\) for all \(\vx \neq \vzero\) implies that
\begin{equation*} \vx^{\tr}(A+B)\vx = \vx^{\tr}A\vx + \vx^{\tr}B\vx > 0 \end{equation*}
for all \(\vx \neq \vzero\text{.}\) Thus, \(A+B\) is positive definite.
(e)
Is \(C\) positive definite? Justify your answer.
Solution.
The matrix \(C\) is symmetric and
\begin{equation*} \det(C - \lambda I_2) = (5-\lambda)(3-\lambda) - 9 = \lambda^2 - 8\lambda + 6\text{.} \end{equation*}
So the eigenvalues of \(C\) are \(\lambda = \frac{8 \pm \sqrt{64-24}}{2}\text{,}\) that is, \(4 + \sqrt{10}\) and \(4 - \sqrt{10} \approx 0.8\text{.}\) Since the eigenvalues of \(C\) are both positive, \(C\) is positive definite.
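As a numerical spot-check (a sketch only), the computed eigenvalues of \(C\) agree with \(4 \pm \sqrt{10}\text{:}\)

```python
import numpy as np

C = np.array([[5.0, -3.0],
              [-3.0, 3.0]])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
print(np.linalg.eigvalsh(C))             # approx [0.8377, 7.1623]
print(4 - np.sqrt(10), 4 + np.sqrt(10))  # the same values
```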
Subsection Summary
-
A quadratic form on \(\R^n\) is a function \(Q\) defined by
\begin{equation*} Q(\vx) = \vx^{\tr} A \vx \end{equation*}for some \(n \times n\) symmetric matrix \(A\text{.}\)
- The Principal Axis Theorem tells us that there is a change of variables \(\vx = P\vy\) that will remove the cross-product terms from a quadratic form and allow us to identify the form and determine the principal axes for the form.
Exercises Exercises
1.
Find the matrix for each quadratic form.
(a)
\(x_1^2 - 2x_1x_2 + 4x_2^2\) if \(\vx\) is in \(\R^2\)
(b)
\(10x_1^2 + 4x_1x_3 + 2x_2x_3 + x_3^2\) if \(\vx\) is in \(\R^3\)
(c)
\(2x_1x_2 + 2x_1x_3 - x_1x_4 + 5x_2^2 + 4x_3x_4 + 8x_4^2\) if \(\vx\) is in \(\R^4\)
2.
For each quadratic form, identify the matrix \(A\) of the form, find a matrix \(P\) that orthogonally diagonalizes \(A\text{,}\) and make a change of variable that transforms the quadratic form into one with no cross-product terms.
(a)
\(x_1^2+2x_1x_2+x_2^2\)
(b)
\(-2x_1^2+2x_1x_2+4x_1x_3-2x_2^2-4x_2x_3-x_3^2\)
(c)
\(11x_1^2-12x_1x_2-12x_1x_3-12x_1x_4-x_2^2-2x_3x_4\)
3.
One topic in multivariable calculus is constrained optimization. We can use the techniques of this section to solve certain types of constrained optimization problems involving quadratic forms. As an example, we will find the maximum and minimum values of the quadratic form defined by the matrix \(\left[ \begin{array}{cc} 2\amp 1\\1\amp 2 \end{array} \right]\) on the unit circle.
(a)
First we determine some bounds on the values of a quadratic form. Let \(Q\) be the quadratic form defined by the \(n \times n\) real symmetric matrix \(A\text{.}\) Let \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\) be the eigenvalues of \(A\text{,}\) and let \(P\) be a matrix that orthogonally diagonalizes \(A\text{,}\) with \(P^{\tr}AP = D\) as the matrix with diagonal entries \(\lambda_1\text{,}\) \(\lambda_2\text{,}\) \(\ldots\text{,}\) \(\lambda_n\) in order. Let \(\vy = [y_1 \ y_2 \ \cdots \ y_n]^{\tr} = P^{\tr} [x_1 \ x_2 \ \cdots \ x_n]^{\tr}\text{.}\)
(i)
Show that
\begin{equation*} Q(\vx) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2\text{.} \end{equation*}
(ii)
Use the fact that \(\lambda_1 \geq \lambda_i\) for each \(i\) and the fact that \(P\) (and \(P^{\tr}\)) is an orthogonal matrix to show that
\begin{equation*} Q(\vx) \leq \lambda_1 ||\vx||^2\text{.} \end{equation*}
Substitute in part i.
(iii)
Now show that \(Q(\vx) \geq \lambda_n ||\vx||^2\text{.}\)
Make an argument similar to part ii.
(b)
Use the result of part (a) to find the maximum and minimum values of the quadratic form defined by the matrix \(\left[ \begin{array}{cc} 2\amp 1\\1\amp 2 \end{array} \right]\) on the unit circle.
4.
In this exercise we characterize the symmetric, positive definite, \(2 \times 2\) matrices with real entries in terms of the entries of the matrices. Let \(A = \left[ \begin{array}{cc} a\amp b/2\\b/2\amp c \end{array} \right]\) for some real numbers \(a\text{,}\) \(b\text{,}\) and \(c\text{.}\)
(a)
Assume that \(A\) is positive definite.
(i)
Show that \(a\) must be positive.
(ii)
Use the fact that the eigenvalues of \(A\) must be positive to show that \(4ac > b^2\text{.}\) We conclude that if \(A\) is positive definite, then \(a> 0\) and \(4ac > b^2\text{.}\)
(b)
Now show that if \(a > 0\) and \(4ac > b^2\text{,}\) then \(A\) is positive definite. This will complete our characterization of positive definite \(2 \times 2\) matrices.
5.
In this exercise we determine the form of the graph of the equation
\begin{equation} \vx^{\tr}A\vx = 1\text{,} \tag{28.2} \end{equation}
where \(A\) is a symmetric \(2 \times 2\) matrix. Let \(P\) be a matrix that orthogonally diagonalizes \(A\) and let \(\vy = P^{\tr}\vx\text{.}\)
(a)
Substitute \(\vy\) for \(P^{\tr}\vx\) in the equation \(\vx^{\tr}A\vx = 1\text{.}\) What form does the resulting equation have (write this form in terms of the eigenvalues of \(A\))?
(b)
What kind of graph does the equation (28.2) have if \(A\) is positive definite? Why?
(c)
What kind of graph does the equation (28.2) have if \(A\) has both positive and negative eigenvalues? Why?
(d)
What kind of graph does the equation (28.2) have if one eigenvalue of \(A\) is zero and the other non-zero? Why?
6.
Let \(A = [a_{ij}]\) be a symmetric \(n \times n\) matrix.
(a)
Show that \(\ve_i^{\tr}A\ve_j = a_{ij}\text{,}\) where \(\ve_i\) is the \(i\)th standard unit vector for \(\R^n\text{.}\) (This result will be useful in Exercise 7.)
(b)
Let \(\vu\) be a unit eigenvector of \(A\) with eigenvalue \(\lambda\text{.}\) Find \(\vu^{\tr}A\vu\) in terms of \(\lambda\text{.}\)
7.
Suppose \(A\) and \(B\) are symmetric \(n \times n\) matrices, and let \(Q_A(\vx) = \vx^{\tr}A\vx\) and \(Q_B(\vx) = \vx^{\tr} B\vx\text{.}\) If \(Q_A(\vx) = Q_B(\vx)\) for all \(\vx\) in \(\R^n\text{,}\) show that \(A=B\text{.}\) Thus, quadratic forms are uniquely determined by their symmetric matrices.
Use Exercise 6 (a) to compare \(Q_A(\ve_i)\) and \(Q_B(\ve_i)\text{,}\) then compare \(Q_A(\ve_i+\ve_j)\) to \(Q_B(\ve_i + \ve_j)\) for \(i \neq j\text{.}\)
8.
In this exercise we analyze all inner products on \(\R^n\text{.}\) Let \(\langle \ , \ \rangle\) be an inner product on \(\R^n\text{.}\) Let \(\vx = [x_1 \ x_2 \ \ldots \ x_n]^{\tr}\) and \(\vy = [y_1 \ y_2 \ \ldots \ y_n]^{\tr}\) be arbitrary vectors in \(\R^n\text{.}\) Then
\begin{equation*} \vx = x_1\ve_1 + x_2\ve_2 + \cdots + x_n\ve_n \ \text{ and } \ \vy = y_1\ve_1 + y_2\ve_2 + \cdots + y_n\ve_n\text{,} \end{equation*}
where \(\ve_i\) is the \(i\)th standard vector in \(\R^n\text{.}\)
(a)
Explain why
\begin{equation} \langle \vx, \vy \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j \langle \ve_i, \ve_j \rangle\text{.} \tag{28.3} \end{equation}
(b)
Calculate the matrix product
\begin{equation*} \vx^{\tr} \left[ \begin{array}{cccc} \langle \ve_1, \ve_1 \rangle \amp \langle \ve_1, \ve_2 \rangle \amp \cdots \amp \langle \ve_1, \ve_n \rangle \\ \langle \ve_2, \ve_1 \rangle \amp \langle \ve_2, \ve_2 \rangle \amp \cdots \amp \langle \ve_2, \ve_n \rangle \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ \langle \ve_n, \ve_1 \rangle \amp \langle \ve_n, \ve_2 \rangle \amp \cdots \amp \langle \ve_n, \ve_n \rangle \end{array} \right] \vy \end{equation*}
and compare to (28.3). What do you notice?
(c)
Explain why any inner product on \(\R^n\) is of the form \(\vx^{\tr} A \vy\) for some symmetric, positive definite matrix \(A\text{.}\)
9.
Exercise 8 shows that any inner product \(\langle \, \rangle\) on \(\R^n\) has the form \(\langle \vu, \vv \rangle = \vu^{\tr} A \vv\) for some symmetric, positive definite matrix \(A\text{.}\) Find the matrix \(A\) for which \(\vu \cdot \vv = \vu^{\tr} A \vv\) for all \(\vu\) and \(\vv\) in \(\R^n\text{.}\)
10.
Let \(\langle \ , \ \rangle\) be an inner product on \(\R^n\text{,}\) let \(\vu\text{,}\) \(\vv\text{,}\) and \(\vw\) be vectors in \(\R^n\text{,}\) and let \(c\) be a scalar. Verify the following properties.
(a)
\(\langle \vzero , \vv \rangle = \langle \vv , \vzero \rangle = 0\)
(b)
\(\langle \vu , c\vv \rangle = c\langle \vu , \vv \rangle\)
(c)
\(\langle \vv+\vw , \vu \rangle = \langle \vv , \vu \rangle + \langle \vw , \vu \rangle\)
(d)
\(\langle \vu - \vv, \vw \rangle = \langle \vw, \vu - \vv \rangle= \langle \vu , \vw \rangle - \langle \vv , \vw \rangle= \langle \vw, \vu \rangle - \langle \vw, \vv \rangle\)
11.
We extend the notions of length and orthogonality in \(\R^n\) with respect to an inner product as follows. If \(\langle \, \rangle\) is an inner product on \(\R^n\text{,}\) we define the length of a vector \(\vu\) with respect to the inner product to be \(||\vu|| = \sqrt{\langle \vu, \vu \rangle}\text{.}\) We can also define two vectors \(\vu\) and \(\vv\) to be orthogonal if \(\langle \vu, \vv \rangle = 0\text{.}\) In this exercise we verify the Pythagorean Theorem with respect to an inner product.
The Pythagorean Theorem states that if \(a\) and \(b\) are the lengths of the legs of a right triangle whose hypotenuse has length \(c\text{,}\) then \(a^2+b^2=c^2\text{.}\) If we think of the legs as defining vectors \(\vu\) and \(\vv\text{,}\) then the hypotenuse is the vector \(\vu+\vv\) and we can restate the Pythagorean Theorem as
\begin{equation*} ||\vu||^2 + ||\vv||^2 = ||\vu + \vv||^2\text{.} \end{equation*}
In this exercise we show that this result holds in any dimension and for any inner product. Use an arbitrary inner product \(\langle \ , \ \rangle\text{.}\)
(a)
Let \(\vu\) and \(\vv\) be orthogonal vectors in \(\R^n\text{.}\) Show that \(||\vu+\vv||^2 = ||\vu||^2+||\vv||^2\text{.}\)
Expand \(||\vu+\vv||^2\) using the inner product.
(b)
Must it be true that if \(\vu\) and \(\vv\) are vectors in \(\R^n\) with \(||\vu+\vv||^2 = ||\vu||^2+||\vv||^2\text{,}\) then \(\vu\) and \(\vv\) are orthogonal? If not, provide a counterexample. If true, verify the statement.
Expand \(||\vu+\vv||^2\) using the inner product.
12.
The Cauchy-Schwarz inequality,
\begin{equation} |\langle \vu, \vv \rangle| \leq ||\vu|| \, ||\vv|| \tag{28.4} \end{equation}
for any vectors \(\vu\) and \(\vv\) in \(\R^n\text{,}\) is considered one of the most important inequalities in mathematics. We verify the Cauchy-Schwarz inequality for an arbitrary inner product \(\langle \ , \ \rangle\) in this exercise. Let \(\vu\) and \(\vv\) be vectors in \(\R^n\text{.}\)
(a)
Explain why the inequality (28.4) is true if either \(\vu\) or \(\vv\) is the zero vector. As a consequence, we assume that \(\vu\) and \(\vv\) are nonzero vectors for the remainder of this exercise.
(b)
Let \(\vw = \proj_{\vv} \vu = \frac{\langle \vu, \vv \rangle }{||\vv||^2} \vv\) and let \(\vz = \vu - \vw\text{.}\) We know that \(\langle \vw, \vz \rangle = 0\text{.}\) Use Exercise 11 of this section to show that
\begin{equation*} ||\vu||^2 \geq ||\vw||^2\text{.} \end{equation*}
(c)
Now show that \(||\vw||^2 = \frac{\langle \vu, \vv \rangle^2}{||\vv||^2}\text{.}\)
(d)
Combine parts (b) and (c) to explain why equation (28.4) is true.
13.
Let \(\vu\) and \(\vv\) be vectors in \(\R^n\text{.}\) Then \(\vu\text{,}\) \(\vv\text{,}\) and \(\vu+\vv\) form a triangle. We should then expect that the length of any one side of the triangle is smaller than the sum of the lengths of the other sides (since the straight line distance is the shortest distance between two points). In other words, we expect that
\begin{equation} ||\vu + \vv|| \leq ||\vu|| + ||\vv||\text{.} \tag{28.5} \end{equation}
Equation (28.5) is called the Triangle Inequality. Use the Cauchy-Schwarz inequality (Exercise 12) to prove the triangle inequality for any inner product on \(\R^n\text{.}\)
Expand \(||\vu+\vv||^2\) using the inner product.
14.
Label each of the following statements as True or False. Provide justification for your response.
(a) True/False.
If \(Q\) is a quadratic form, then there is exactly one matrix \(A\) such that \(Q(\vx) = \vx^{\tr}A\vx\text{.}\)
(b) True/False.
The matrix of a quadratic form is unique.
(c) True/False.
If the matrix of a quadratic form is a diagonal matrix, then the quadratic form has no cross-product terms.
(d) True/False.
The eigenvectors of the symmetric matrix \(A\) form the principal axes of the quadratic form \(\vx^{\tr}A\vx\text{.}\)
(e) True/False.
The principal axes of a quadratic form are orthogonal.
(f) True/False.
If \(a\) and \(c\) are positive, then the quadratic equation \(ax^2 + bxy + cy^2 = 1\) defines an ellipse.
(g) True/False.
If the entries of a symmetric matrix \(A\) are all positive, then the quadratic form \(\vx^{\tr}A\vx\) is positive definite.
(h) True/False.
If a quadratic form \(\vx^{\tr}A \vx\) defined by a symmetric matrix \(A\) is positive definite, then the entries of \(A\) are all non-negative.
(i) True/False.
If a quadratic form \(Q(\vx)\) on \(\R^2\) is positive definite, then the graph of \(z=Q(\vx)\) is a paraboloid opening upward.
(j) True/False.
If a quadratic form \(Q(\vx)\) on \(\R^2\) is negative definite, then the graph of \(z=Q(\vx)\) is a paraboloid opening downward.
(k) True/False.
If a quadratic form \(Q(\vx)\) on \(\R^2\) is indefinite, then there is a nonzero vector \(\vx\) such that \(Q(\vx) = 0\text{.}\)
(l) True/False.
If \(Q(\vx)\) is positive definite, then so is the quadratic form \(aQ(\vx)\) for \(a>0\text{.}\)
(m) True/False.
If \(Q(\vx)=\vx^{\tr} A\vx\) is indefinite, then at least one of the eigenvalues of \(A\) is negative and at least one is positive.
(n) True/False.
If \(n \times n\) symmetric matrices \(A\) and \(B\) define positive definite quadratic forms, then so does \(A+B\text{.}\)
(o) True/False.
If an invertible symmetric matrix \(A\) defines a positive definite quadratic form, then so does \(A^{-1}\text{.}\)
Subsection Project: The Tennis Racket Theorem
If a particle of mass \(m\) and velocity \(v\) is moving in a straight line, its kinetic energy \(KE\) is given by \(KE = \frac{1}{2}mv^2\text{.}\) If, instead, the particle rotates around an axis with angular velocity \(\omega\) (in radians per unit of time), its linear velocity is \(v = r \omega\text{,}\) where \(r\) is the radius of the particle's circular path. Substituting into the kinetic energy formula shows that the kinetic energy of the rotating particle is then \(KE = \frac{1}{2}\left(mr^2\right) \omega^2\text{.}\) The quantity \(mr^2\) is called the moment of inertia of the particle and is denoted by \(I\text{.}\) So \(KE = \frac{1}{2}I\omega^2\) for a rotating particle. Notice that the larger the value of \(r\text{,}\) the larger the inertia. You can imagine this with a figure skater. When a skater spins along their major axis with their arms outstretched, the speed at which they rotate is lower than when they bring their arms into their bodies. The moment of inertia for rotational motion plays a role similar to the mass in linear motion. Essentially, the inertia tells us how resistant the particle is to rotation.
To understand the tennis racket effect, we are interested in rigid bodies as they move through space. Any rigid body in three space has three principal axes about which it likes to spin. These axes are at right angles to each other and pass through the center of mass. Think of enclosing the object in an ellipsoid: the longest axis is the primary axis, the middle axis is the intermediate axis, and the shortest axis is the third axis. As a rigid body moves through space, it rotates around these axes and there is inertia along each axis. Just like with a tennis racket, if you were to imagine an axle along any of the principal axes and spin the object around that axle, it will either rotate happily with no odd behavior like flipping, or it won't. The former behavior is that of a stable axis and the latter that of an unstable axis. The Tennis Racket Theorem is a statement about the rotation of the body. Essentially, the Tennis Racket Theorem states that the rotation of a rigid object around its primary and third principal axes is stable, while rotation around its intermediate axis is not. To understand why this is so, we need to return to moments of inertia.
Assume that we have a rigid body moving through space. Euler's (rotation) equations describe the rotation of a rigid body with respect to the body's principal axes of inertia. Assume that \(I_1\text{,}\) \(I_2\text{,}\) and \(I_3\) are the moments of inertia around the primary, intermediate, and third principal axes with \(I_1 > I_2 > I_3\text{.}\) Also assume that \(\omega_1\text{,}\) \(\omega_2\text{,}\) and \(\omega_3\) are the components of the angular velocity along each axis. When there is no torque applied, using principal orthogonal coordinates, Euler's equations tell us that
\begin{align} I_1\dot{\omega}_1 \amp = (I_2-I_3)\omega_2\omega_3 \tag{28.6}\\ I_2\dot{\omega}_2 \amp = (I_3-I_1)\omega_3\omega_1 \tag{28.7}\\ I_3\dot{\omega}_3 \amp = (I_1-I_2)\omega_1\omega_2\text{.} \tag{28.8} \end{align}
(The dots indicate a derivative with respect to time, which is common notation in physics.) We will use Euler's equations to understand the Tennis Racket Theorem.
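Before working through the algebra, it can help to see the effect numerically. The following sketch (our own illustration; the moments of inertia, the time span, and the perturbation size are arbitrary choices, not from the text) integrates Euler's equations and shows that a spin that starts almost exactly around the intermediate axis does not stay there:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative moments of inertia with I1 > I2 > I3.
I1, I2, I3 = 3.0, 2.0, 1.0

def euler(t, w):
    """Right-hand side of Euler's equations (28.6)-(28.8) with no torque."""
    w1, w2, w3 = w
    return [(I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3]

eps = 1e-3  # a tiny perturbation of the initial angular velocity

# Spin almost exactly about the intermediate axis.
sol = solve_ivp(euler, (0.0, 100.0), [eps, 1.0, eps], max_step=0.01)
print(np.max(np.abs(sol.y[0])))  # grows to order 1: the spin flips (unstable)

# Rerunning with [1.0, eps, eps] (spin about the primary axis) keeps the
# other two components small, matching the stability argument below.
```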
Project Activity 28.5.
To start, we consider rotation around the first principal axis. Our goal is to show that rotation around this axis is stable. That is, small perturbations in angular velocity will have only small effects on the rotation of the object. So we assume that \(\omega_2\) and \(\omega_3\) are small. In general, the product of two small quantities will be much smaller, so (28.6) implies that \(\dot{\omega}_1\) must be very small. So we can disregard \(\dot{\omega}_1\) in our calculations.
(a)
Differentiate (28.7) with respect to time to explain why
\begin{equation*} I_2\ddot{\omega}_2 \approx (I_3-I_1)\dot{\omega}_3\omega_1\text{.} \end{equation*}
(b)
Substitute for \(\dot{\omega}_3\) from (28.8) to show that \(\omega_2\) is an approximate solution to
\begin{equation} \ddot{\omega}_2 = -k\omega_2 \tag{28.9} \end{equation}
for some positive constant \(k\text{.}\)
(c)
The equation (28.9) is a differential equation because it is an equation that involves derivatives of a function. Show by differentiating twice that, if
\begin{equation} \omega_2(t) = A\cos\left(\sqrt{k}\,t\right) + B\sin\left(\sqrt{k}\,t\right) \tag{28.10} \end{equation}
(where \(A\) and \(B\) are any scalars), then \(\omega_2\) is a solution to (28.9). (In fact, \(\omega_2\) is the general solution to (28.9), which is verified in just about any course in differential equations.)
Equation (28.10) shows that \(\omega_2\) is bounded, so that any slight perturbations in angular velocity have a limited effect on \(\omega_2\text{.}\) A similar argument can be made for \(\omega_3\text{.}\) This implies that the rotation around the first principal axis is stable: slight changes in angular velocity have limited effects on the rotations around the other axes.
We can make a similar argument for rotation around the third principal axis.
Project Activity 28.6.
In this activity, repeat the process from Project Activity 28.5 to show that rotation around the third principal axis is stable. So assume that \(\omega_1\) and \(\omega_2\) are small, which by (28.8) implies that \(\dot{\omega}_3\) must be very small and can be disregarded in calculations.
The question remains as to why rotation around the second principal axis is different.
Project Activity 28.7.
Now assume that \(\omega_1\) and \(\omega_3\) are small. Thus, \(\dot{\omega}_2\) is very small by (28.7), and we consider \(\dot{\omega}_2\) to be negligible.
(a)
Differentiate (28.6) to show that
\begin{equation*} I_1\ddot{\omega}_1 \approx (I_2-I_3)\omega_2\dot{\omega}_3\text{.} \end{equation*}
(b)
Substitute for \(\dot{\omega}_3\) from (28.8) to show that \(\omega_1\) is an approximate solution to
\begin{equation} \ddot{\omega}_1 = k\omega_1 \tag{28.11} \end{equation}
for some positive scalar \(k\text{.}\)
(c)
The fact that the constant multiplier in (28.11) is positive instead of negative as in (28.9) completely changes the type of solution. Show that
\begin{equation*} \omega_1(t) = Ae^{\sqrt{k}\,t} + Be^{-\sqrt{k}\,t} \end{equation*}
(where \(A\) and \(B\) are any scalars) is a solution to (28.11) (and, in fact, is the general solution). Explain why this shows that rotation around the second principal axis is not stable.