By the end of this section, you should be able to give precise and thorough answers to the questions listed below. You may want to keep these questions in mind to focus your thoughts as you complete the section.
What is the condition number of a matrix and what does it tell us about the matrix?
What is the pseudoinverse of a matrix?
Why are pseudoinverses useful?
How does the pseudoinverse of a matrix allow us to find least squares solutions to linear systems?
You are probably familiar with the Global Positioning System (GPS). The system allows anyone with the appropriate software to accurately determine their location at any time. The applications are almost endless, including getting real-time driving directions while in your car, guiding missiles, and providing distances on golf courses.
The GPS is a worldwide radio-navigation system owned by the US government and operated by the US Air Force. GPS is one of four global navigation satellite systems. At least twenty-four GPS satellites orbit the Earth at an altitude of approximately 11,000 nautical miles. The satellites are placed so that at any time at least four of them can be accessed by a GPS receiver. Each satellite carries an atomic clock to relay a time stamp along with its position in space. There are five ground stations to coordinate and ensure that the system is working properly.
The system works by triangulation, but there is also error involved in the measurements that go into determining position. Later in this section we will see how the method of least squares can be used to determine the receiver's position.
A singular value decomposition has many applications, and in this section we discuss how a singular value decomposition can be used in image compression, to determine how sensitive a matrix can be to rounding errors in the process of row reduction, and to solve least squares problems.
The digital age has brought many new opportunities for the collection, analysis, and dissemination of information. Along with these opportunities come new difficulties as well. All of this digital information must be stored in some way and be retrievable in an efficient manner. A singular value decomposition of digitally stored information can be used to compress the information or clean up corrupted information. In this section we will see how a singular value decomposition can be used in image compression. While a singular value decomposition is normally used with very large matrices, we will restrict ourselves to small examples so that we can more clearly see how a singular value decomposition is applied.
The outer product decomposition of $A$ writes $A$ as a sum of rank 1 matrices
$$A = \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2 \mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r \mathbf{u}_r\mathbf{v}_r^T$$
(the summands $\sigma_i \mathbf{u}_i\mathbf{v}_i^T$). Each summand contains some information about the matrix $A$. Since $\sigma_1$ is the largest of the singular values, it is reasonable to expect that the summand $\sigma_1 \mathbf{u}_1\mathbf{v}_1^T$ contains the most information about $A$ among all of the summands. To get a measure of how much information $\sigma_1 \mathbf{u}_1\mathbf{v}_1^T$ contains of $A$, we can think of an $m \times n$ matrix as simply a long vector in $\mathbb{R}^{mn}$ where we have folded the data into a rectangular array (we will see later why taking the norm of a matrix as the norm of this vector in $\mathbb{R}^{mn}$ makes sense, but for now, just use this definition). If we are interested in determining the error in approximating an image by a compressed image, it makes sense to use the standard norm in $\mathbb{R}^{mn}$ to determine length and distance, which is really just the Frobenius norm that comes from the Frobenius inner product defined by
$$\langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij}, \qquad (30.1)$$
where $A = [a_{ij}]$ and $B = [b_{ij}]$ are $m \times n$ matrices. (That (30.1) defines an inner product on the set of all $m \times n$ matrices is left to discuss in a later section.) So in this section all norms for matrices will refer to the Frobenius norm. Rather than computing the distance between $A$ and its compressed approximation to measure the error, we are more interested in the relative error, which is that distance divided by $\|A\|$.
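To make these definitions concrete, here is a minimal numerical sketch (using NumPy; the two matrices are arbitrary examples, not ones from this section) showing that the Frobenius norm of a matrix is just the usual norm of the matrix unfolded into a long vector.

```python
import numpy as np

# Two arbitrary 2x3 matrices, used only to illustrate the definitions.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[0.0, 1.0, 0.0],
              [2.0, 0.0, 3.0]])

# Frobenius inner product (30.1): sum of products of corresponding entries.
frob_inner = np.sum(A * B)

# Frobenius norm of A computed three equivalent ways.
norm_from_inner = np.sqrt(np.sum(A * A))        # sqrt(<A, A>)
norm_as_vector = np.linalg.norm(A.reshape(-1))  # norm of A unfolded into a vector in R^(mn)
norm_builtin = np.linalg.norm(A, 'fro')         # NumPy's built-in Frobenius norm

print(frob_inner, norm_from_inner, norm_as_vector, norm_builtin)
```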
The first step in compressing an image is to digitize the image. There are many ways to do this and we will consider one of the simplest ways and only work with gray-scale images, with the scale from 0 (black) to 255 (white). A digital image can be created by taking a small grid of squares (called pixels) and coloring each pixel with some shade of gray. The resolution of this grid is a measure of how many pixels are used per square inch. As an example, consider the 16 by 16 pixel picture of a flower shown in Figure 30.1.
To store this image pixel by pixel would require $16 \times 16 = 256$ units of storage space (1 for each pixel). If we let $A$ be the $16 \times 16$ matrix whose $(i,j)$th entry is the scale of the $(i,j)$th pixel, then $A$ is the matrix
Notice that some of these singular values are very small compared to others. As in Preview Activity 30.1, the terms with the largest singular values contain most of the information about the matrix. Thus, we shouldn't lose much information if we eliminate the small singular values. In this particular example, the last 4 singular values are significantly smaller than the rest. If we let
This small example illustrates the general idea. Suppose we had a satellite image that was $1000 \times 1000$ pixels and we let $A$ represent this image. If we have a singular value decomposition of this image $A$, say
$$A = \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2 \mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r \mathbf{u}_r\mathbf{v}_r^T.$$
If the rank of $A$ is large, it is likely that many of the singular values will be very small. If we only keep the first $k$ of the singular values, we can approximate $A$ by
$$A_k = \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2 \mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k \mathbf{u}_k\mathbf{v}_k^T$$
and store the image with only the vectors $\sigma_1\mathbf{u}_1$, $\sigma_2\mathbf{u}_2$, $\ldots$, $\sigma_k\mathbf{u}_k$, $\mathbf{v}_1$, $\mathbf{v}_2$, $\ldots$, $\mathbf{v}_k$. For example, if we only need 10 of the singular values of a satellite image ($k = 10$), then we can store the satellite image with only 20 vectors in $\mathbb{R}^{1000}$ or with $20 \times 1000 = 20000$ numbers instead of $1000 \times 1000 = 1000000$ numbers.
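The bookkeeping above is easy to experiment with. The following sketch (assuming NumPy; the matrix is random data standing in for an image, so the approximation error will not be as small as it would be for a real photograph) forms the rank 10 approximation of a 1000 by 1000 matrix and compares the storage counts and the relative error in the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((1000, 1000))    # stand-in for a 1000 x 1000 grayscale image
k = 10                          # number of singular values to keep

U, s, Vt = np.linalg.svd(A)     # A = U @ diag(s) @ Vt, singular values in decreasing order

# Rank k approximation: the sum of the first k rank 1 summands sigma_i u_i v_i^T.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

full_storage = A.size                               # 1,000,000 numbers, one per pixel
compressed_storage = k * (A.shape[0] + A.shape[1])  # 20 vectors in R^1000, i.e. 20,000 numbers

relative_error = np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro')
print(full_storage, compressed_storage, relative_error)
```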
In the context where a matrix represents an image, the operator aspect of the matrix is irrelevant; we are only interested in the matrix as a holder of information. In this situation, we think of an $m \times n$ matrix as simply a long vector in $\mathbb{R}^{mn}$ where we have folded the data into a rectangular array. If we are interested in determining the error in approximating an image by a compressed image, it makes sense to use the standard norm in $\mathbb{R}^{mn}$ to determine length and distance. This leads to what is called the Frobenius norm of a matrix. The Frobenius norm of an $m \times n$ matrix $A = [a_{ij}]$ is
$$\|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2}.$$
Now if $A = U\Sigma V^T$ is a singular value decomposition of $A$ with non-zero singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$, then
$$A = \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2 \mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r \mathbf{u}_r\mathbf{v}_r^T, \qquad (30.3)$$
where the $\mathbf{u}_i$ are the columns of $U$ and the $\mathbf{v}_i$ the columns of $V$. Each of the products $\mathbf{u}_i\mathbf{v}_i^T$ is an $m \times n$ matrix. Since the columns of $\mathbf{u}_i\mathbf{v}_i^T$ are all scalar multiples of $\mathbf{u}_i$, the matrix $\mathbf{u}_i\mathbf{v}_i^T$ is a rank 1 matrix. So (30.3) expresses $A$ as a sum of rank 1 matrices. Moreover, if we let and be vectors and let and be vectors with and , then
We call $A_k$ the rank $k$ approximation to $A$. Notice that the outer product expansion in (30.5) is in fact a singular value decomposition for $A_k$. The error in approximating $A$ with $A_k$ is
A singular value decomposition for a matrix $A$ can tell us a lot about how difficult it is to accurately solve a system $A\mathbf{x} = \mathbf{b}$. Solutions to systems of linear equations can be very sensitive to rounding, as the next exercise demonstrates.
Notice that a simple rounding in a single entry of the coefficient matrix led to a significantly different solution. If there are rounding errors at any stage of the Gaussian elimination process, they can be compounded by further row operations. This is an important problem since computers can only approximate irrational numbers with rational numbers and so rounding can be critical. Finding ways of dealing with these kinds of errors is an area of ongoing research in numerical linear algebra. This problem is given a name.
A matrix that is not ill-conditioned is said to be well-conditioned. Since small changes in entries of ill-conditioned matrices can lead to large errors in computations, it is an important problem in linear algebra to have a way to measure how ill-conditioned a matrix is. This idea will ultimately lead us to the condition number of a matrix.
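As a small numerical illustration of ill-conditioning (a sketch only; the matrix below is a made-up nearly singular example, not the matrix from Activity 30.4), note how rounding a single entry in the third decimal place produces a very different solution.

```python
import numpy as np

b = np.array([2.0, 2.0005])

A = np.array([[1.0, 1.0],
              [1.0, 1.0005]])         # nearly singular: the rows are almost identical
A_rounded = np.array([[1.0, 1.0],
                      [1.0, 1.001]])  # the (2,2) entry rounded to three decimal places

x = np.linalg.solve(A, b)                  # [1.0, 1.0]
x_rounded = np.linalg.solve(A_rounded, b)  # [1.5, 0.5]

# A change of 0.0005 in one entry of the coefficient matrix moved the solution
# by a distance comparable to the size of the solution itself.
print(x, x_rounded)
```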
Suppose we want to solve the system $A\mathbf{x} = \mathbf{b}$, where $A$ is an invertible matrix. Activity 30.4 illustrates that if $A$ is really close to being singular, then small changes in the entries of $A$ can have significant effects on the solution to the system. So the system can be very hard to solve accurately if $A$ is close to singular. It is important to have a sense of how "good" we can expect any calculated solution to be. Suppose we think we solve the system $A\mathbf{x} = \mathbf{b}$ but, through rounding error in our calculations, get a solution $\mathbf{x}'$ so that $A\mathbf{x}' = \mathbf{b}'$, where $\mathbf{b}'$ is not exactly $\mathbf{b}$. Let $\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}'$ be the error in our calculated solution and $\Delta\mathbf{b} = \mathbf{b} - \mathbf{b}'$ the difference between $\mathbf{b}$ and $\mathbf{b}'$. We would like to know how large the error $\|\Delta\mathbf{x}\|$ can be. But this isn't exactly the right question. We could scale everything to make $\|\Delta\mathbf{x}\|$ as large as we want. What we really need is a measure of the relative error $\frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|}$, or how big the error is compared to $\|\mathbf{x}\|$ itself. More specifically, we want to know how large the relative error in $\Delta\mathbf{x}$ is compared to the relative error in $\Delta\mathbf{b}$. In other words, we want to know how good the relative error $\frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}$ in $\Delta\mathbf{b}$ is as a predictor of the relative error $\frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|}$ in $\Delta\mathbf{x}$ (we may have some control over the relative error in $\Delta\mathbf{b}$, perhaps by keeping more significant digits). So we want to know if there is a best constant $C$ such that
$$\frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq C\,\frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}.$$
We return for a moment to the operator norm of a matrix. This is an appropriate norm to use here since we are considering $A$ to be a transformation. Recall that if $A$ is an $m \times n$ matrix, we defined the operator norm of $A$ to be
$$\|A\| = \max_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\|.$$
How does a singular value decomposition tell us about the condition number of a matrix? Recall that the maximum value of $\|A\mathbf{x}\|$ for $\mathbf{x}$ on the unit sphere in $\mathbb{R}^n$ is $\sigma_1$, the largest singular value of $A$. So $\|A\| = \sigma_1$. If $A$ is an invertible $n \times n$ matrix and $A = U\Sigma V^T$ is a singular value decomposition for $A$, then
$$A^{-1} = \left(U\Sigma V^T\right)^{-1} = V\Sigma^{-1}U^T,$$
and the largest diagonal entry of $\Sigma^{-1}$ is $\frac{1}{\sigma_n}$, where $\sigma_n$ is the smallest singular value of $A$. It follows that $\|A^{-1}\| = \frac{1}{\sigma_n}$, so the condition number of $A$ is $\|A\|\,\|A^{-1}\| = \frac{\sigma_1}{\sigma_n}$.
Let $A$ be the coefficient matrix from Activity 30.4. A computer algebra system gives the singular values of $A$ as 2.00025003124999934 and 0.000249968750000509660. What is the condition number of $A$? What does that tell us about $A$? Does this seem reasonable given the result of Activity 30.4?
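If you have the singular values in hand, the computation is a single division. The sketch below simply uses the two values quoted above; if the matrix itself were available, np.linalg.cond would return the same ratio.

```python
import numpy as np

# Singular values reported by the computer algebra system.
sigma = np.array([2.00025003124999934, 0.000249968750000509660])

# Condition number = largest singular value / smallest singular value.
condition_number = sigma.max() / sigma.min()
print(condition_number)   # on the order of 8 * 10^3, so relative errors in b
                          # can be magnified by a factor of several thousand in x
```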
Not every matrix is invertible, so we cannot always solve a matrix equation $A\mathbf{x} = \mathbf{b}$. However, every matrix has a pseudoinverse $A^+$ that acts something like an inverse. Even when we can't solve a matrix equation $A\mathbf{x} = \mathbf{b}$ because $\mathbf{b}$ isn't in Col $A$, we can use the pseudoinverse of $A$ to "solve" the equation $A\mathbf{x} = \mathbf{b}$ with the "solution" $A^+\mathbf{b}$. While not an exact solution, $A^+\mathbf{b}$ turns out to be the best approximation to a solution in the least squares sense. We will use the singular value decomposition to find the pseudoinverse of a matrix.
Recall that one property of invertible matrices is that the inverse of a product of invertible matrices is the product of the inverses in the reverse order. If $A = U\Sigma V^T$ were invertible, then $A^{-1}$ would be $\left(V^T\right)^{-1}\Sigma^{-1}U^{-1} = V\Sigma^{-1}U^T$. Even though $U$ and $V$ are invertible, the matrix $\Sigma$ is not. But $\Sigma$ does contain non-zero eigenvalues that have reciprocals, so consider the matrix $\Sigma^+$ obtained by replacing each non-zero diagonal entry of $\Sigma^T$ with its reciprocal. Calculate the products $\Sigma\Sigma^+$ and $\Sigma^+\Sigma$. How are the results similar to those obtained with a matrix inverse?
The only matrix in the singular value decomposition $U\Sigma V^T$ of $A$ that is not invertible is $\Sigma$. But the matrix $\Sigma^+$ acts somewhat like an inverse of $\Sigma$, so let us define $A^+$ as $A^+ = V\Sigma^+U^T$. Now we explore a few properties of the matrix $A^+$.
Only some square matrices have inverses. However, every matrix has a pseudoinverse. A pseudoinverse $A^+$ of a matrix $A$ provides something like an inverse when a matrix doesn't have an inverse. Pseudoinverses are useful to approximate solutions to linear systems. If $A$ is invertible, then the equation $A\mathbf{x} = \mathbf{b}$ has the solution $\mathbf{x} = A^{-1}\mathbf{b}$, but when $A$ is not invertible and $\mathbf{b}$ is not in Col $A$, then the equation $A\mathbf{x} = \mathbf{b}$ has no solution. In the invertible case of an $n \times n$ matrix $A$, there is a matrix $A^{-1}$ so that $AA^{-1} = A^{-1}A = I_n$. This also implies that $AA^{-1}A = A$ and $A^{-1}AA^{-1} = A^{-1}$. To mimic this situation when $A$ is not invertible, we search for a matrix $A^+$ (a pseudoinverse of $A$) so that $AA^+A = A$ and $A^+AA^+ = A^+$, as we saw in Preview Activity 30.7. Then it turns out that $A^+$ acts something like an inverse for $A$. In this case, we approximate the solution to $A\mathbf{x} = \mathbf{b}$ by $A^+\mathbf{b}$, and we will see that the vector $AA^+\mathbf{b}$ turns out to be the vector in Col $A$ that is closest to $\mathbf{b}$ in the least squares sense.
A reasonable question to ask is how we can find a pseudoinverse of a matrix $A$. A singular value decomposition provides an answer to this question. If $A$ is an invertible $n \times n$ matrix, then 0 is not an eigenvalue of $A$. As a result, in the singular value decomposition $U\Sigma V^T$ of $A$, the matrix $\Sigma$ is an invertible matrix (note that $U$, $\Sigma$, and $V$ are all $n \times n$ matrices in this case). So
$$A^{-1} = \left(U\Sigma V^T\right)^{-1} = V\Sigma^{-1}U^T.$$
To understand in general how a pseudoinverse is found, let $A$ be an $m \times n$ matrix with $m \neq n$, or an $n \times n$ matrix with rank less than $n$. In these cases $A$ does not have an inverse. But as in Preview Activity 30.7, a singular value decomposition provides a pseudoinverse for $A$. Let $U\Sigma V^T$ be a singular value decomposition of an $m \times n$ matrix $A$ of rank $r$, with $\Sigma$ having the non-zero singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$ of $A$ as its first $r$ diagonal entries and zeros everywhere else. Letting $\Sigma^+$ be the $n \times m$ matrix whose first $r$ diagonal entries are $\frac{1}{\sigma_1}, \frac{1}{\sigma_2}, \ldots, \frac{1}{\sigma_r}$ and whose other entries are all 0, the pseudoinverse of $A$ is $A^+ = V\Sigma^+U^T$.
Pseudoinverses satisfy several properties that are similar to those of inverses. For example, we had an example in Preview Activity 30.7 where $AA^+A = A$ and $A^+AA^+ = A^+$. That $A^+$ always satisfies these properties is the subject of the next activity.
Activity 30.9 shows that $A^+$ satisfies properties that are similar to those of an inverse of $A$. In fact, $A^+$ satisfies several other properties (that together can be used as defining properties) as stated in the next theorem. The conditions of Theorem 30.6 are called the Penrose or Moore-Penrose conditions. Verification of the remaining parts of this theorem is left for the exercises.
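A numerical sketch of the construction may help (the 3 by 2 matrix below is an arbitrary example, and the variable names are mine): build $\Sigma^+$ by transposing $\Sigma$ and inverting its non-zero diagonal entries, form the pseudoinverse from the singular value decomposition, and check the first two Moore-Penrose conditions. NumPy's np.linalg.pinv computes the same matrix.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])        # arbitrary 3 x 2 example

U, s, Vt = np.linalg.svd(A)       # A = U @ Sigma @ Vt with Sigma of shape 3 x 2

# Sigma^+ is 2 x 3: reciprocals of the non-zero singular values on the diagonal.
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
Sigma_plus[:len(s), :len(s)] = np.diag(1.0 / s)   # here every singular value is non-zero

A_plus = Vt.T @ Sigma_plus @ U.T                  # the pseudoinverse A^+ = V Sigma^+ U^T

print(np.allclose(A @ A_plus @ A, A))             # A A^+ A = A
print(np.allclose(A_plus @ A @ A_plus, A_plus))   # A^+ A A^+ = A^+
print(np.allclose(A_plus, np.linalg.pinv(A)))     # matches NumPy's pseudoinverse
```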
The pseudoinverse of a matrix is also connected to least squares solutions of linear systems, as we encountered in Section 24. Recall from Section 24 that if the columns of $A$ are linearly independent, then the least squares solution to $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = \left(A^TA\right)^{-1}A^T\mathbf{b}$. In this section we will see how to use a pseudoinverse to solve a least squares problem, and verify that if the columns of $A$ are linearly independent, then $\left(A^TA\right)^{-1}A^T$ is in fact the pseudoinverse of $A$.
and hence the vector $AA^+\mathbf{b}$ is the vector in Col $A$ that minimizes the distance to $\mathbf{b}$. Thus, $A^+\mathbf{b}$ is in actuality the least squares approximation to a solution of $A\mathbf{x} = \mathbf{b}$. So a singular value decomposition allows us to construct the pseudoinverse of a matrix and then directly solve the least squares problem.
Having to calculate eigenvalues and eigenvectors for a matrix to produce a singular value decomposition in order to find a pseudoinverse can be computationally intense. As we demonstrate in this activity, the process is easier if the columns of $A$ are linearly independent. More specifically, we will prove the following theorem.
Since the columns of $A$ are linearly independent, it follows that $A^TA$ is invertible (see Activity 26.4). So the eigenvalues of $A^TA$ are all non-zero. Let $\sigma_1$, $\sigma_2$, $\ldots$, $\sigma_n$ be the singular values of $A$. How is $\Sigma^+$ related to $\Sigma$, and what do $\Sigma^+\Sigma$ and $\Sigma^T\Sigma$ look like?
Let us now investigate the form of the invertible matrix $A^TA$ (note that neither $A$ nor $A^T$ is necessarily invertible). If a singular value decomposition of $A$ is $U\Sigma V^T$, show that
Therefore, to calculate $A^+$ and solve a least squares problem, Theorem 30.7 shows that as long as the columns of $A$ are linearly independent, we can avoid using a singular value decomposition of $A$ in finding $A^+ = \left(A^TA\right)^{-1}A^T$.
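A quick check of this shortcut (a sketch assuming NumPy, with an arbitrary matrix whose columns are linearly independent): the matrix $(A^TA)^{-1}A^T$ agrees with the pseudoinverse computed from a singular value decomposition, and applying it to $\mathbf{b}$ produces the least squares solution.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # columns are linearly independent
b = np.array([1.0, 2.0, 2.0])

# Pseudoinverse via the shortcut of Theorem 30.7: no SVD required.
A_plus_shortcut = np.linalg.inv(A.T @ A) @ A.T

# Pseudoinverse via a singular value decomposition (what np.linalg.pinv uses).
A_plus_svd = np.linalg.pinv(A)

print(np.allclose(A_plus_shortcut, A_plus_svd))   # True

x_star = A_plus_shortcut @ b                      # least squares solution to Ax = b
print(x_star)                                     # same as np.linalg.lstsq(A, b, rcond=None)[0]
```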
Find orthogonal matrices $U$ and $V$, and the matrix $\Sigma$, so that $U\Sigma V^T$ is a singular value decomposition of $A$.
Solution.
Normalizing the eigenvectors ,, and to normal eigenvectors ,, and , respectively, gives us an orthogonal matrix
.
Now , so normalizing the vectors and gives us vectors
and
that are the first two columns of our matrix . Given that is a matrix, we need to find two other vectors orthogonal to and that will combine with and to form an orthogonal basis for . Letting ,,, and , a computer algebra system shows that the reduced row echelon form of the matrix is , so that vectors ,,, are linearly independent. Letting and , the Gram-Schmidt process shows that the set is an orthogonal basis for , where and (using for ) . The set where ,, and is an orthonormal basis for and we can let
.
The singular values of are and , and so
.
Therefore, a singular value decomposition of $A$ is
have a solution? If so, find the solution. If not, find the best approximation you can to a solution to this matrix equation.
Solution.
Augmenting $A$ with $\mathbf{b}$ and row reducing shows that
,
so $\mathbf{b}$ is not in Col $A$ and the equation $A\mathbf{x} = \mathbf{b}$ has no solution. However, the best approximation to a solution to $A\mathbf{x} = \mathbf{b}$ is found using the pseudoinverse of $A$. That best solution is
Use technology to approximate a singular value decomposition of $A$ (round to four decimal places). Use this SVD to approximate the pseudoinverse of $A$. Then use this pseudoinverse to approximate the least squares linear approximation to the system.
Solution.
Technology shows that a singular value decomposition of $A$ is approximately $A \approx U\Sigma V^T$, where
.
Thus, with $A^+ = V\Sigma^+U^T$, we have that the pseudoinverse of $A$ is approximately
.
So our least squares linear approximation is found by
.
This makes our least squares linear approximation to be (to four decimal places)
The condition number of an invertible $n \times n$ matrix $A$ is the number $\|A\|\,\|A^{-1}\| = \frac{\sigma_1}{\sigma_n}$, where $\sigma_1$ and $\sigma_n$ are the largest and smallest singular values of $A$. The condition number provides a measure of how well the relative error in a calculated value $\mathbf{b}'$ predicts the relative error in $\mathbf{x}'$ when we are trying to solve a system $A\mathbf{x} = \mathbf{b}$.
A pseudoinverse $A^+$ of a matrix $A$ can be found through a singular value decomposition. Let $U\Sigma V^T$ be a singular value decomposition of an $m \times n$ matrix $A$ of rank $r$, with $\Sigma$ having the non-zero singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$ of $A$ as its first $r$ diagonal entries and zeros everywhere else. If $\Sigma^+$ is the $n \times m$ matrix obtained by taking $\Sigma^T$ and replacing each non-zero diagonal entry $\sigma_i$ with $\frac{1}{\sigma_i}$, then $A^+ = V\Sigma^+U^T$.
A pseudoinverse $A^+$ of a matrix $A$ acts like an inverse for $A$. So if we can't solve a matrix equation $A\mathbf{x} = \mathbf{b}$ because $\mathbf{b}$ isn't in Col $A$, we can use the pseudoinverse of $A$ to "solve" the equation $A\mathbf{x} = \mathbf{b}$ with the "solution" $A^+\mathbf{b}$. While not an exact solution, $A^+\mathbf{b}$ turns out to be the best approximation to a solution in the least squares sense.
How close can a matrix be to being non-invertible? We explore that idea in this exercise. Let be the upper triangular matrix with 1s along the diagonal and with every other entry being .
Use the pattern established in parts (i.) and (ii.) to find a non-trivial solution to the equation for an arbitrary value of . Be sure to verify that you have a solution. Is invertible?
Explain why is not an invertible matrix. Notice that and differ by a single entry, and that is invertible and is not. Let us examine how close is to . Calculate ? What happens to as goes to infinity? How close can an invertible matrix be to becoming non-invertible?
Exercise 8 in Section 29 shows if is a singular value decomposition for a symmetric matrix , then so is . Recall that for any positive integer . We can exploit this idea to define to be the matrix
,
where is the matrix whose diagonal entries are the square roots of the corresponding entries of . Let . Calculate and show that .
In this exercise we show that the pseudoinverse of a matrix is the unique matrix that satisfies the Moore-Penrose conditions. Let $A$ be an $m \times n$ matrix with singular value decomposition $U\Sigma V^T$ and pseudoinverse $A^+ = V\Sigma^+U^T$. To show that $A^+$ is the unique matrix that satisfies the Moore-Penrose conditions, suppose that there is another matrix that also satisfies the Moore-Penrose conditions.
In all of the examples that we have done finding a singular value decomposition of a matrix, it has been the case (though we haven't mentioned it) that if $A$ is an $m \times n$ matrix, then $\operatorname{rank}(A) = \operatorname{rank}\left(A^TA\right)$. Prove this result.
In this project we discuss some of the details about how the GPS works. The idea is based on intersections of spheres. To build a basic understanding of the system, we begin with a 2-dimensional example.
Suppose that there are three base stations ,, and in that can send and receive signals from your mobile phone. Assume that is located at point , at point , and at point . Also assume that your mobile phone location is point . Based on the time that it takes to receive the signals from the three base stations, it can be determined that your distance to base station is km, your distance to base station is km, and your distance to base station is km using a coordinate system with measurements in kilometers based on a reference point chosen to be . Due to limitations on the measurement equipment, these measurements all contain some unknown error which we will denote as . The goal is to determine your location in based on this information.
If the distance readings were accurate, then the point would lie on the circle centered at of radius . The distance from to base station can be represented in two different ways: km and . However, there is some error in the measurements (due to the receiver clock and satellite clocks not being synchronized), so we really have
If all measurements were accurate, your position would be at the intersection of the circles centered at with radius km, centered at with radius km, and centered at with radius km as shown in Figure 30.12. Even though the figure might seem to imply it, because of the error in the measurements the three circles do not intersect in one point. So instead, we want to find the best estimate of a point of intersection that we can. The system of equations (30.11), (30.12), and (30.13) is non-linear and can be difficult to solve, if it even has a solution. To approximate a solution, we can linearize the system. To do this, show that if we subtract corresponding sides of equation (30.11) from (30.12) and expand both sides, we can obtain the linear equation
We have reduced our system of three non-linear equations to the system
of two linear equations in the unknowns ,, and . Use technology to find a pseudoinverse of the coefficient matrix of this system. Use the pseudoinverse to find the least squares solution to this system. Does your solution correspond to an approximate point of intersection of the three circles?
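The following sketch runs the whole procedure on made-up data, assuming NumPy. The station coordinates, the receiver position, the error value, and the sign convention (that each measured distance overstates the true distance by a common amount e) are all assumptions for illustration; they are not the values or notation from this activity.

```python
import numpy as np

# Hypothetical data (not the values from this project activity).
stations = np.array([[0.0, 0.0],    # coordinates (a_i, b_i) of base stations B1, B2, B3
                     [6.0, 0.0],
                     [0.0, 8.0]])
true_position = np.array([2.0, 3.0])
e = 0.25                            # assumed common measurement error

# Measured distances d_i = (true distance) + e.
d = np.linalg.norm(stations - true_position, axis=1) + e

# Linearize: expand (x - a_i)^2 + (y - b_i)^2 = (d_i - e)^2 and subtract the
# first equation from the other two.  The quadratic terms x^2, y^2, e^2 cancel,
# leaving two linear equations in the three unknowns x, y, e.
a1, b1, d1 = stations[0, 0], stations[0, 1], d[0]
M = np.zeros((2, 3))
rhs = np.zeros(2)
for row, i in enumerate((1, 2)):
    ai, bi, di = stations[i, 0], stations[i, 1], d[i]
    M[row] = [2 * (a1 - ai), 2 * (b1 - bi), 2 * (di - d1)]
    rhs[row] = di**2 - d1**2 + a1**2 - ai**2 + b1**2 - bi**2

# With more unknowns than equations, the pseudoinverse returns the minimum norm
# least squares solution, so the recovered position is only an approximation.
estimate = np.linalg.pinv(M) @ rhs
print(estimate)   # (x, y, e) estimate; the x, y part lands within about half a
                  # kilometer of the true position (2, 3)
```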
Project Activity 30.11 provides the basic idea behind GPS. Suppose you receive a signal from a GPS satellite. The transmission from satellite provides four pieces of information: a location and a time stamp according to the satellite's atomic clock. The time stamp allows the calculation of the distance between you and the satellite. The transmission travel time is calculated by subtracting the current time on the GPS receiver from the satellite's time stamp. Distance is then found by multiplying the transmission travel time by the rate, which is the speed of light km/s. So distance is found as , where is the time at the receiver. This signal places your location on a sphere of that radius centered at the satellite. If you receive a signal at the same time from two satellites, then your position is at the intersection of two spheres. As can be seen at left in Figure 30.13, that intersection is a circle. So your position has been narrowed quite a bit. Now if you receive simultaneous signals from three satellites, your position is narrowed to the intersection of three spheres, or two points as shown at right in Figure 30.13. So if we could receive perfect information from three satellites, then your location would be exactly determined.
There is a problem with the above analysis: calculating the distances. These distances are determined by the time it takes for the signal to travel from the satellite to the GPS receiver. The times are measured by the clocks in the satellites and the clocks in the receivers. Since the GPS receiver clock is unlikely to be perfectly synchronized with the satellite clock, the distance calculations are not perfect. In addition, the rate at which the signal travels can change as the signal moves through the ionosphere and the troposphere. As a result, the calculated distance measurements are not exact, and are referred to as pseudoranges. In our calculations we need to factor in the error related to the time discrepancy and other factors. We will incorporate these errors into our measure of and treat as an unknown. (Of course, this is all more complicated than is presented here, but this provides the general idea.)
To ensure accuracy, the GPS uses signals from four satellites. Assume a satellite is positioned at point at a distance from the GPS receiver located at point . The distance can also be measured in two ways: as
Again, we are treating as an unknown, so this equation has the four unknowns ,,, and . Using signals from four satellites produces the system of equations
The system of equations (30.14), (30.15), (30.16), and (30.17) is a non-linear system and is difficult to solve, if it even has a solution. We want a method that will provide at least an approximate solution as well as apply if we use more than four satellites. We choose a reference node (say ) and make calculations relative to that node as we did in Project Activity 30.11.
First square both sides of the equations (30.14), (30.15), (30.16), and (30.17) to remove the roots. Then subtract corresponding sides of the new first equation (involving ) from the new second equation (involving ) to show that we can obtain the linear equation
,
where . (Note that the unknowns are ,,, and ; all other quantities are known.)
Use the result of part (a) to write a linear system that can be obtained by subtracting the first equation from the third and fourth equations as well.
The linearizations from part (b) determine a system of linear equations. Identify ,, and . Then explain how we can approximate a best solution to this system in the least squares sense.
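As a sketch of how this last step could be carried out numerically (assuming NumPy; the function below uses my own variable names, and the same assumed sign convention as the 2-D sketch in Project Activity 30.11; the satellite data would have to be supplied), the linearized equations can be assembled into a matrix and handed to the pseudoinverse.

```python
import numpy as np

def gps_estimate(sat_pos, pseudoranges):
    """Estimate (x, y, z, d) from four satellites.

    sat_pos:      4 x 3 array of satellite positions (x_i, y_i, z_i)
    pseudoranges: length-4 array of measured distances p_i,
                  assumed to equal (true distance) + d for a common error d
    """
    x1, y1, z1 = sat_pos[0]
    p1 = pseudoranges[0]
    M = np.zeros((3, 4))
    rhs = np.zeros(3)
    for row, i in enumerate((1, 2, 3)):
        xi, yi, zi = sat_pos[i]
        pi = pseudoranges[i]
        # Square (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 = (p_i - d)^2 and
        # subtract the first equation; the quadratic terms cancel.
        M[row] = [2 * (x1 - xi), 2 * (y1 - yi), 2 * (z1 - zi), 2 * (pi - p1)]
        rhs[row] = (pi**2 - p1**2 + x1**2 - xi**2
                    + y1**2 - yi**2 + z1**2 - zi**2)
    # Three equations in four unknowns: the pseudoinverse gives the
    # minimum norm least squares solution (x, y, z, d).
    return np.linalg.pinv(M) @ rhs
```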
We conclude this project with a final note. At times a GPS receiver may only be able to receive signals from three satellites. In these situations, the receiver can substitute the surface of the Earth as a fourth sphere and continue the computation.
For example, as stated in http://www2.imm.dtu.dk/~pch/Projekter/tsvd.html, "The SVD [singular value decomposition] has also applications in digital signal processing, e.g., as a method for noise reduction. The central idea is to let a matrix $A$ represent the noisy signal, compute the SVD, and then discard small singular values of $A$. It can be shown that the small singular values mainly represent the noise, and thus the rank-$k$ matrix $A_k$ represents a filtered signal with less noise."
Theorem 30.6 is often given as the definition of a pseudoinverse.
The signals travel in radio waves, which are electromagnetic waves, and travel at the speed of light. Also, this is the speed of light in a vacuum, but the atmosphere is not too dense, so we assume this value.