Section 8 Matrix Operations
Focus Questions
By the end of this section, you should be able to give precise and thorough answers to the questions listed below. You may want to keep these questions in mind to focus your thoughts as you complete the section.
Under what conditions can we add two matrices and how is the matrix sum defined?
Under what conditions can we multiply a matrix by a scalar and how is a scalar multiple of a matrix defined?
Under what conditions can we multiply two matrices and how is the matrix product defined?
What properties do matrix addition, scalar multiplication of matrices and matrix multiplication satisfy? Are these properties similar to properties that are satisfied by vector operations?
What are two properties that make matrix multiplication fundamentally different than our standard product of real numbers?
What is the interpretation of matrix multiplication from the perspective of linear transformations?
How is the transpose of a matrix defined?
Subsection Application: Algorithms for Matrix Multiplication
Matrix multiplication is widely used in applications ranging from scientific computing and pattern recognition to counting paths in graphs. As a consequence, much work is being done in developing efficient algorithms for matrix multiplication.
We will see that a matrix product can be calculated through the row-column method. Recall that the product of two $2 \times 2$ matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ is given by
\[ AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}. \]
This product involves eight scalar multiplications and some scalar additions. As we will see, multiplication is more computationally expensive than addition, so we will focus on multiplication. In 1969, the German mathematician Volker Strassen showed that the product of two $2 \times 2$ matrices can be calculated using only seven multiplications. While this is not much of an improvement in the $2 \times 2$ case, the Strassen algorithm can be applied to larger matrices, using matrix partitions (which allow for parallel computation), and its publication led to additional research on faster algorithms for matrix multiplication. More details are provided later in this section.
Subsection Introduction
A vector is a list of numbers in a specified order and a matrix is an ordered array of objects. In fact, a vector can be thought of as a matrix with one column. Vectors and matrices are so alike in this way that it would seem natural that we can define operations on matrices just as we did with vectors.
Recall that a matrix is made of rows and columns: the entries reading from left to right form the rows of the matrix and the entries reading from top to bottom form the columns. The number of rows and columns of a matrix is called the size of the matrix, so an $m \times n$ matrix has $m$ rows and $n$ columns. If we label the entry in the $i$th row and $j$th column of a matrix $A$ as $a_{ij}$, then we write $A = [a_{ij}]$.
We can generalize the operations of addition and scalar multiplication on vectors to matrices similarly. Given two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ of the same size, we define the sum $A + B$ by
\[ A + B = [a_{ij} + b_{ij}] \]
when the sizes of the matrices $A$ and $B$ match. In other words, for matrices of the same size, matrix addition is defined by adding corresponding entries in the matrices. For example,
We define the scalar multiple of a matrix $A = [a_{ij}]$ by a scalar $c$ to be the matrix $cA$ defined by
\[ cA = [ca_{ij}]. \]
This means that we multiply each entry of the matrix $A$ by the scalar $c$. As an example,
Even though we did not have a multiplication operation on vectors, we had a matrix-vector product, which is a special case of a matrix-matrix product, since a vector is a matrix with one column. However, generalizing the matrix-vector product to a matrix-matrix product is not immediate, since it is not obvious what to do with the other columns. We will consider this question in this section.
Note that all of the matrix operations can be performed on a calculator. After entering each matrix in the calculator, just use the $+$ and $\times$ operations to find the result of the matrix operation. (Just for fun, try using $\div$ with matrices to see if it will work.)
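If you prefer to experiment on a computer, the following short sketch (written in Python with NumPy, which is not required by this text) illustrates entrywise addition and scalar multiplication; the matrices are chosen only for illustration.

```python
import numpy as np

# Two matrices of the same size (chosen only for illustration)
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[0, -1, 2],
              [3,  1, 0]])

# Matrix addition: corresponding entries are added
print(A + B)      # [[1 1 5], [7 6 6]]

# Scalar multiplication: every entry is multiplied by the scalar
print(3 * A)      # [[ 3  6  9], [12 15 18]]
```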
Preview Activity 8.1.
(a)
Pick three pairs of matrices of varying sizes which can be added. For each pair:
(i)
Find the matrices $A + B$ and $B + A$.
(ii)
How are the two matrices $A + B$ and $B + A$ related? What does this tell us about matrix addition?
(b)
Let and Determine the entries of the matrix
(c)
Now we turn to multiplication of matrices. Our first goal is to find out what conditions we need on the sizes of matrices $A$ and $B$ for the matrix-matrix product $AB$ to be defined, and what the size of the resulting product matrix is. We know the condition and the size of the result in the special case of $B$ being a vector, i.e., a matrix with one column. So our conjectures for the general case should match what we know in the special case. In each part of this problem, use any appropriate tool (e.g., your calculator, Maple, Mathematica, WolframAlpha) to determine the matrix product $AB$, if it exists. If you obtain a product, write it down and explain how its size is related to the sizes of $A$ and $B$. If you receive an error, write down the error and guess why the error occurred and/or what it means.
(i)
(ii)
(iii)
(iv)
(v)
Make a guess for the condition on the sizes of two matrices $A$ and $B$ for which the product $AB$ is defined. How is the size of the product matrix related to the sizes of $A$ and $B$?
(d)
The final matrix products, when defined, in problem 8.1.c might seem unrelated to the individual matrices at first. In this problem, we will uncover this relationship using our knowledge of the matrix-vector product. Let and
(i)
Calculate $AB$ using any tool.
(ii)
Using the matrix-vector product, calculate $A\mathbf{b}_1$, where $\mathbf{b}_1$ is the first column of $B$, then $A\mathbf{b}_2$, where $\mathbf{b}_2$ is the second column of $B$, and then $A\mathbf{b}_3$, where $\mathbf{b}_3$ is the third column of $B$. Do you notice these output vectors within $AB$?
(iii)
Describe as best you can a definition of $AB$ using the matrix-vector product.
Subsection Properties of Matrix Addition and Multiplication by Scalars
Just as we were able to define an algebra of vectors with addition and multiplication by scalars, we can define an algebra of matrices. We will see that the properties of these operations on matrices are immediate generalizations of the properties of the operations on vectors. We will then see how the matrix product arises through the connection of matrices to linear transformations. Finally, we define the transpose of a matrix. The transpose of a matrix will be useful in applications such as graph theory and least-squares fitting of curves, as well as in advanced topics such as inner product spaces and the dual space of a vector space.
We learned in Preview Activity 8.1 that we can add two matrices of the same size together by adding corresponding entries and we can multiply any matrix by a scalar by multiplying each entry of the matrix by that scalar. More generally, if $A = [a_{ij}]$ and $B = [b_{ij}]$ are $m \times n$ matrices and $c$ is any scalar, then
\[ A + B = [a_{ij} + b_{ij}] \quad \text{and} \quad cA = [ca_{ij}]. \]
As we have done each time we have introduced a new operation, we ask what properties the operation has. For example, you determined in Preview Activity 8.1 that addition of matrices is a commutative operation. More specifically, $A + B = B + A$ for every two $m \times n$ matrices $A$ and $B$. We can use similar arguments to verify the following properties of matrix addition and multiplication by scalars. Notice that these properties are very similar to the properties of addition and scalar multiplication of vectors we discussed earlier. This should come as no surprise, since $n$-dimensional vectors are $n \times 1$ matrices. In a strange twist, we will see that matrices themselves can be considered as vectors when we discuss vector spaces in a later section.
Theorem 8.1.
Let $A$, $B$, and $C$ be $m \times n$ matrices and let $a$ and $b$ be scalars. Then
$A + B = B + A$ (this property tells us that matrix addition is commutative)
$(A + B) + C = A + (B + C)$ (this property tells us that matrix addition is associative)
The $m \times n$ matrix $0$ whose entries are all 0 has the property that $A + 0 = A$. The matrix $0$ is called the zero matrix. (It is generally clear from the context what the size of the zero matrix is.)
The scalar multiple $(-1)A$ of the matrix $A$ has the property that $A + (-1)A = 0$. The matrix $(-1)A = -A$ is called the additive inverse of the matrix $A$.
$(a + b)A = aA + bA$ (this property tells us that scalar multiplication of matrices distributes over scalar addition)
$a(A + B) = aA + aB$ (this property tells us that scalar multiplication of matrices distributes over matrix addition)
Later on, we will see that these properties define the set of all matrices as a vector space. These properties just say that, regarding addition and multiplication by scalars, we can manipulate matrices just as we do real numbers. Note, however, we have not yet defined an operation of multiplication on matrices. That is the topic for the next section.
Subsection A Matrix Product
As we saw in Preview Activity 8.1, a matrix-matrix product can be found in a way which makes use of and also generalizes the matrix-vector product.
Definition 8.2.
The matrix product $AB$ of a $k \times m$ matrix $A$ and an $m \times n$ matrix $B$ with columns $\mathbf{b}_1$, $\mathbf{b}_2$, $\ldots$, $\mathbf{b}_n$ is the $k \times n$ matrix
\[ AB = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n]. \]
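Before turning to the motivation for this definition, here is a small computational sketch of it (Python/NumPy, with matrices chosen only for illustration): each column of the product is the matrix-vector product of $A$ with the corresponding column of $B$.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [0, 1]])        # a 3x2 matrix
B = np.array([[1, 0, 2],
              [2, 1, 1]])     # a 2x3 matrix

# Build AB one column at a time: column j of AB is A @ (column j of B)
columns = [A @ B[:, j] for j in range(B.shape[1])]
AB = np.column_stack(columns)

print(AB)
print(np.array_equal(AB, A @ B))   # True: agrees with the built-in product
```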
We now consider the motivation behind this definition by thinking about the matrix transformations corresponding to each of the matrices $A$ and $B$. Recall that left multiplication by a $k \times m$ matrix $A$ defines a transformation $T$ from $\mathbb{R}^m$ to $\mathbb{R}^k$ by $T(\mathbf{x}) = A\mathbf{x}$. The domain of $T$ is $\mathbb{R}^m$ because the number of components of $\mathbf{x}$ has to match the number of entries in each row of $A$ in order for the matrix-vector product $A\mathbf{x}$ to be defined. Similarly, an $m \times n$ matrix $B$ defines a transformation $S$ from $\mathbb{R}^n$ to $\mathbb{R}^m$. Since transformations are functions, we can compose them as long as the output vectors of the inside transformation lie in the domain of the outside transformation. Therefore, if $S$ is the inside transformation and $T$ is the outside transformation, the composition $T \circ S$ is defined. So a natural question to ask is: if we are given
\[ T(\mathbf{x}) = A\mathbf{x} \quad \text{and} \quad S(\mathbf{x}) = B\mathbf{x}, \]
is there a matrix that represents the transformation defined by $T \circ S$? We investigate this question in the next activity in the special case of a specific matrix $A$ and a specific matrix $B$.
Activity 8.2.
In this activity, we look for the meaning of the matrix product $AB$ from a transformation perspective. Let $T$ and $S$ be the matrix transformations defined by $T(\mathbf{x}) = A\mathbf{x}$ and $S(\mathbf{x}) = B\mathbf{x}$,
where
(a)
What are the domains and codomains of $S$ and $T$? Why is the composite transformation $T \circ S$ defined? What is the domain of $T \circ S$? What is the codomain of $T \circ S$? (Recall that $T \circ S$ is defined by $(T \circ S)(\mathbf{x}) = T(S(\mathbf{x}))$, i.e., we substitute the output $S(\mathbf{x})$ as the input into the transformation $T$.)
(b)
Let $\mathbf{x}$ be a vector in the domain of $S$. Determine the components of $S(\mathbf{x})$.
(c)
Find the components of $T(S(\mathbf{x}))$.
(d)
Find a matrix $C$ so that $T(S(\mathbf{x})) = C\mathbf{x}$.
(e)
Use the definition of composition of transformations and the definitions of the $S$ and $T$ transformations to explain why it is reasonable to define $AB$ to be the matrix $C$. Does the matrix $C$ agree with the product $AB$ you found in Preview Activity 8.1 using technology?
We now consider this result in the general case of a $k \times m$ matrix $A$ and an $m \times n$ matrix $B$, where $A$ and $B$ define matrix transformations $T$ and $S$, respectively. In other words, $T$ and $S$ are matrix transformations defined by $T(\mathbf{x}) = A\mathbf{x}$ and $S(\mathbf{x}) = B\mathbf{x}$. The domain of $S$ is $\mathbb{R}^n$ and the codomain of $S$ is $\mathbb{R}^m$. The domain of $T$ is $\mathbb{R}^m$ and the codomain of $T$ is $\mathbb{R}^k$. The composition $T \circ S$ is defined because the output vectors of $S$ are in $\mathbb{R}^m$ and they lie in the domain of $T$. The domain of $T \circ S$ is the same as the domain of $S$, since the input vectors first go through the transformation $S$. The codomain of $T \circ S$ is the same as the codomain of $T$, since the final output vectors are produced by applying the transformation $T$.
Let us see how we can obtain the matrix corresponding to the transformation $T \circ S$. Let $B = [\mathbf{b}_1 \ \mathbf{b}_2 \ \cdots \ \mathbf{b}_n]$, where $\mathbf{b}_j$ is the $j$th column of $B$, and let $\mathbf{x} = [x_1 \ x_2 \ \cdots \ x_n]^T$. Recall that the matrix-vector product $B\mathbf{x}$ is the linear combination of the columns of $B$ with the corresponding weights from $\mathbf{x}$. So
\[ S(\mathbf{x}) = B\mathbf{x} = x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_n\mathbf{b}_n. \tag{8.1} \]
Note that each of the vectors $\mathbf{b}_j$ is in $\mathbb{R}^m$, since $B$ is an $m \times n$ matrix. Therefore, each of these vectors can be multiplied by the matrix $A$ and we can evaluate $T(B\mathbf{x})$. Therefore, $T(S(\mathbf{x}))$ is defined and
\[ T(S(\mathbf{x})) = T(B\mathbf{x}) = A(B\mathbf{x}) = A(x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_n\mathbf{b}_n). \tag{8.2} \]
The properties of matrix-vector products show that
\[ A(x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_n\mathbf{b}_n) = x_1 A\mathbf{b}_1 + x_2 A\mathbf{b}_2 + \cdots + x_n A\mathbf{b}_n. \tag{8.3} \]
This expression is a linear combination of the vectors $A\mathbf{b}_j$ with the scalars $x_j$ being the weights. Therefore, if we let $C$ be the matrix with columns $A\mathbf{b}_1$, $A\mathbf{b}_2$, $\ldots$, $A\mathbf{b}_n$, that is
\[ C = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n], \]
then
\[ x_1 A\mathbf{b}_1 + x_2 A\mathbf{b}_2 + \cdots + x_n A\mathbf{b}_n = C\mathbf{x} \]
by definition of the matrix-vector product. Combining equations (8.1), (8.2), and (8.3) shows that
\[ T(S(\mathbf{x})) = C\mathbf{x}, \]
where
\[ C = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n]. \]
Also note that since $(T \circ S)(\mathbf{x}) = T(S(\mathbf{x}))$ and $T(S(\mathbf{x})) = C\mathbf{x}$, we find
\[ (T \circ S)(\mathbf{x}) = C\mathbf{x}. \tag{8.4} \]
Since the matrix representing the transformation $T \circ S$ is the matrix
\[ [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n], \]
where $\mathbf{b}_1$, $\mathbf{b}_2$, $\ldots$, $\mathbf{b}_n$ are the columns of the matrix $B$, it is natural to define $AB$ to be this matrix in light of equation (8.4).
Matrix multiplication has some properties that are unfamiliar to us as the next activity illustrates.
Activity 8.3.
(a)
Find the indicated products (by hand or using a calculator).
(b)
Is matrix multiplication commutative? Explain.
(c)
Is there an identity element for matrix multiplication? In other words, is there a matrix $I$ for which $AI = IA = A$ for any matrix $A$? Explain.
(d)
If $a$ and $b$ are real numbers with $ab = 0$, then we know that either $a = 0$ or $b = 0$. Is this same property true with matrix multiplication? Explain.
(e)
If $a$, $b$, and $c$ are real numbers with $ac = bc$ and $c \neq 0$, we know that $a = b$. Is this same property true with matrix multiplication? Explain.
As we saw in Activity 8.3, there are matrices $A$ and $B$ for which $AB \neq BA$. On the other hand, there are matrices for which $AB = BA$. For example, this equality will always hold for a square matrix $A$ and $B = I_n$, if $I_n$ is the identity matrix of the same size. It also holds if $B = A$. If the equality $AB = BA$ holds, we say that the matrices $A$ and $B$ commute. So the identity matrix commutes with all square matrices of the same size, and every matrix $A$ commutes with $A^k$ for any power $k$.
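The phenomena from Activity 8.3 are easy to see with small examples. The sketch below (Python/NumPy; the matrices are assumptions chosen only for illustration) shows a pair of matrices whose products in the two orders differ, and two nonzero matrices whose product is the zero matrix.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[1, 0],
              [1, 1]])

print(A @ B)    # [[2 1], [1 1]]
print(B @ A)    # [[1 1], [1 2]] -- so AB != BA in general

# Two nonzero matrices whose product is the zero matrix
C = np.array([[0, 1],
              [0, 0]])
D = np.array([[1, 0],
              [0, 0]])
print(C @ D)    # [[0 0], [0 0]]
```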
There is an alternative method of calculating a matrix product that we will often use, which we illustrate in the next activity. This alternate version depends on the product of a row matrix with a vector. Suppose $A = [a_1 \ a_2 \ \cdots \ a_n]$ is a $1 \times n$ matrix and $\mathbf{x} = [x_1 \ x_2 \ \cdots \ x_n]^T$ is an $n \times 1$ vector. Then the product $A\mathbf{x}$ is the $1 \times 1$ vector
\[ A\mathbf{x} = [a_1x_1 + a_2x_2 + \cdots + a_nx_n]. \]
In this situation, we usually identify the $1 \times 1$ matrix with its scalar entry and write
\[ A\mathbf{x} = a_1x_1 + a_2x_2 + \cdots + a_nx_n. \tag{8.5} \]
The product in (8.5) is called the scalar or dot product of $A$ with $\mathbf{x}$.
Activity 8.4.
Let and
Let $\mathbf{a}_i$ be the $i$th row of $A$ and $\mathbf{b}_j$ the $j$th column of $B$; for example, $\mathbf{a}_1$ denotes the first row of $A$ and $\mathbf{b}_2$ the second column of $B$.
Calculate the entries of the matrix $C = [c_{ij}]$, where
\[ c_{ij} = \mathbf{a}_i \cdot \mathbf{b}_j \]
and $\mathbf{a}_i \cdot \mathbf{b}_j$ refers to the scalar product of row $i$ of $A$ with column $j$ of $B$. Compare your result with the result of $AB$ calculated via the product of $A$ with the columns of $B$.
Activity 8.4 shows that there is an alternate way to calculate a matrix product. To see how this works in general, let $A = [a_{ij}]$ be a $k \times m$ matrix and $B = [b_{ij}]$ an $m \times n$ matrix. We know that
\[ AB = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n], \]
where $\mathbf{b}_1$, $\mathbf{b}_2$, $\ldots$, $\mathbf{b}_n$ are the columns of $B$.
Now let $\mathbf{a}_1$, $\mathbf{a}_2$, $\ldots$, $\mathbf{a}_k$ be the rows of $A$, so that
\[ A = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_k \end{bmatrix}. \]
First we argue that if $\mathbf{x} = [x_1 \ x_2 \ \cdots \ x_m]^T$, then
\[ A\mathbf{x} = \begin{bmatrix} \mathbf{a}_1 \cdot \mathbf{x} \\ \mathbf{a}_2 \cdot \mathbf{x} \\ \vdots \\ \mathbf{a}_k \cdot \mathbf{x} \end{bmatrix}. \]
This is the scalar product (or dot product) definition of the matrix-vector product.
To show that this definition gives the same result as the linear combination definition of the matrix-vector product, we first let $A = [\mathbf{c}_1 \ \mathbf{c}_2 \ \cdots \ \mathbf{c}_m]$, where $\mathbf{c}_1$, $\mathbf{c}_2$, $\ldots$, $\mathbf{c}_m$ are the columns of $A$. By our linear combination definition of the matrix-vector product, we obtain
\[ A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_m\mathbf{c}_m. \]
The $i$th entry of this linear combination is $a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{im}x_m$, which is exactly the scalar product $\mathbf{a}_i \cdot \mathbf{x}$.
Therefore, the above work shows that both linear combination and scalar product definitions give the same matrix-vector product.
Applying this to the matrix product $AB$ defined in terms of the matrix-vector product, we see that
\[ AB = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n] = \begin{bmatrix} \mathbf{a}_1 \cdot \mathbf{b}_1 & \mathbf{a}_1 \cdot \mathbf{b}_2 & \cdots & \mathbf{a}_1 \cdot \mathbf{b}_n \\ \mathbf{a}_2 \cdot \mathbf{b}_1 & \mathbf{a}_2 \cdot \mathbf{b}_2 & \cdots & \mathbf{a}_2 \cdot \mathbf{b}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{a}_k \cdot \mathbf{b}_1 & \mathbf{a}_k \cdot \mathbf{b}_2 & \cdots & \mathbf{a}_k \cdot \mathbf{b}_n \end{bmatrix}. \]
So the $ij$th entry of the matrix product $AB$ is found by taking the scalar product of the $i$th row of $A$ with the $j$th column of $B$. In other words,
\[ (AB)_{ij} = \mathbf{a}_i \cdot \mathbf{b}_j, \]
where $\mathbf{a}_i$ is the $i$th row of $A$ and $\mathbf{b}_j$ is the $j$th column of $B$.
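A sketch of this row-column method (Python/NumPy; the matrices are assumptions chosen only for illustration): each entry of the product is a dot product of a row of $A$ with a column of $B$.

```python
import numpy as np

A = np.array([[1, 2],
              [0, 3]])        # 2x2
B = np.array([[4, 1, 0],
              [2, 5, 1]])     # 2x3

m, n = A.shape[0], B.shape[1]
AB = np.zeros((m, n), dtype=int)
for i in range(m):
    for j in range(n):
        # entry (i, j) is the dot product of row i of A with column j of B
        AB[i, j] = np.dot(A[i, :], B[:, j])

print(AB)
print(np.array_equal(AB, A @ B))   # True: agrees with the built-in product
```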
Subsection Properties of Matrix Multiplication
Activity 8.3 shows that we must be very careful not to assume that matrix multiplication behaves like multiplication of real numbers. However, matrix multiplication does satisfy some familiar properties. For example, we now have an addition and a multiplication of matrices under certain conditions, so we might ask if matrix multiplication distributes over matrix addition. To answer this question, we take two arbitrary $m \times n$ matrices $A$ and $B$ and an arbitrary matrix $C$ for which the relevant products are defined. Then
Similar arguments can be used to show the following properties of matrix multiplication.
Theorem 8.3.
Let $A$, $B$, and $C$ be matrices of the appropriate sizes for all sums and products to be defined and let $a$ be a scalar. Then
$(AB)C = A(BC)$ (this property tells us that matrix multiplication is associative)
$(A + B)C = AC + BC$ (this property tells us that matrix multiplication on the right distributes over matrix addition)
$A(B + C) = AB + AC$ (this property tells us that matrix multiplication on the left distributes over matrix addition)
There is a square matrix $I_n$ with the property that $AI_n = A$ or $I_nA = A$ for whichever product is defined.
We verified the second part of this theorem above and will assume that all of the properties of this theorem hold. The matrix $I_n$ introduced in Theorem 8.3 is called the (multiplicative) identity matrix. We usually omit the word multiplicative and refer to $I_n$ simply as the identity matrix. This does not cause any confusion, since we refer to the additive identity matrix simply as the zero matrix.
Definition 8.4.
Let $n$ be a positive integer. The $n \times n$ identity matrix $I_n$ is the matrix $I_n = [a_{ij}]$, where $a_{ii} = 1$ for each $i$ and $a_{ij} = 0$ if $i \neq j$.
We also write the matrix $I_n$ as
\[ I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. \]
The matrix $I_n$ has the property that for any $n \times n$ matrix $A$,
\[ I_nA = AI_n = A, \]
so $I_n$ is a multiplicative identity in the set of all $n \times n$ matrices. More generally, for an $m \times n$ matrix $A$,
\[ I_mA = AI_n = A. \]
Subsection The Transpose of a Matrix
One additional operation on matrices is the transpose. The transpose of a matrix occurs in many useful formulas in linear algebra and in applications of linear algebra.
Definition 8.5.
The transpose of an $m \times n$ matrix $A = [a_{ij}]$ is the $n \times m$ matrix $A^T$ whose $ij$th entry is $a_{ji}$.
Written out, the transpose of the $m \times n$ matrix
\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \]
is the $n \times m$ matrix
\[ A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}. \]
In other words, the transpose of a matrix $A$ is the matrix $A^T$ whose rows are the columns of $A$. Alternatively, the transpose of $A$ is the matrix whose columns are the rows of $A$. We can also view the transpose of $A$ as the reflection of $A$ across its main diagonal, where the diagonal of a matrix $A = [a_{ij}]$ consists of the entries of the form $a_{ii}$.
Activity 8.5.
(a)
Find the transpose of each of the indicated matrices.
(b)
Find the transpose of the new matrix for each part above. What can you conjecture based on your results?
(c)
There are certain special types of matrices that are given names.
Definition 8.6.
Let $A$ be a square matrix whose $ij$th entry is $a_{ij}$.
The matrix $A$ is a diagonal matrix if $a_{ij} = 0$ whenever $i \neq j$.
The matrix $A$ is a symmetric matrix if $A^T = A$.
The matrix $A$ is upper triangular if $a_{ij} = 0$ whenever $i > j$.
The matrix $A$ is lower triangular if $a_{ij} = 0$ whenever $i < j$.
(i)
Find an example of a diagonal matrix $A$. What can you say about $A^T$?
(ii)
Find an example of a non-diagonal symmetric matrix $A$. If $A^T = A$, must $A$ be a square matrix?
(iii)
Find an example of an upper triangular matrix $A$. What kind of a matrix is $A^T$?
We will see later that diagonal matrices are important in that their powers are easy to calculate. Symmetric matrices arise frequently in applications such as in graph theory as adjacency matrices and in quantum mechanics as observables, and have many useful properties including being diagonalizable and having real eigenvalues, as we will also see later.
Subsection Properties of the Matrix Transpose
As with every other operation, we want to understand what properties the matrix transpose has. Properties of transposes are shown in the following theorem.
Theorem 8.7.
Let $A$ and $B$ be matrices of the appropriate sizes and let $a$ be a scalar. Then
The one property that might seem strange is the third one. To understand this property, suppose $A$ is a $k \times m$ matrix and $B$ an $m \times n$ matrix, so that the product $AB$ is defined. We will argue that $(AB)^T = B^TA^T$ by comparing the $ij$th entry of each side.
First notice that the $ij$th entry of $(AB)^T$ is the $ji$th entry of $AB$. The $ji$th entry of $AB$ is found by taking the scalar product of the $j$th row of $A$ with the $i$th column of $B$. Thus, the $ij$th entry of $(AB)^T$ is the scalar product of the $j$th row of $A$ with the $i$th column of $B$.
The $ij$th entry of $B^TA^T$ is the scalar product of the $i$th row of $B^T$ with the $j$th column of $A^T$. But the $i$th row of $B^T$ is the $i$th column of $B$, and the $j$th column of $A^T$ is the $j$th row of $A$. So the $ij$th entry of $B^TA^T$ is the scalar product of the $j$th row of $A$ with the $i$th column of $B$.
Since the two matrices $(AB)^T$ and $B^TA^T$ have the same size and the same corresponding entries, they are the same matrix.
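This property is also easy to check numerically, as in the sketch below (Python/NumPy; the matrices are assumptions chosen only for illustration).

```python
import numpy as np

A = np.array([[1, 2, 0],
              [3, 1, 4]])     # 2x3
B = np.array([[1, 1],
              [0, 2],
              [5, 1]])        # 3x2

lhs = (A @ B).T               # transpose of the product
rhs = B.T @ A.T               # product of the transposes, in reverse order

print(lhs)
print(rhs)
print(np.array_equal(lhs, rhs))   # True
```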
Subsection Examples
What follows are worked examples that use the concepts from this section.
Example 8.8.
Let
Determine the results of the following operations, if defined. If not defined, explain why.
(a)
Solution.
Since is a matrix and is a matrix, the number of columns of equals the number of rows of and the matrix product is defined. Recall that if where are the columns of then Recall also that is the linear combination of the columns of with weights from so
and
So Alternatively, if then the matrix product is the matrix whose entry is Using this method we have
Now
so
(b)
Solution.
Since is a matrix but is the number of columns of is not equal to the number of rows of We conclude that is not defined.
(c)
Solution.
Since is a matrix and is the number of columns of is equal to the number of rows of Thus, the quantity is defined. First we calculate using the dot product of the rows of with the columns of Letting and where and are the rows of and and are the columns of we have
Now
so If and where and are the rows of and and are the columns of then
Now
so
(d)
Solution.
Since and are both matrices, their sum is defined and is a matrix. Because is matrix, the number of columns of is equal to the number of rows of Thus, the quantity is defined and, using the row-column method of matrix multiplication as earlier,
(e)
Solution.
Since is a matrix and is the number of columns of is equal to the number of rows of Thus, is defined and
(f)
Solution.
The fact that is a matrix means that is a matrix. Since is also a matrix, the sum is defined. The transpose of any matrix is also defined, so is defined and
Example 8.9.
(a)
Determine the matrix sum $A + B$. Then use this sum to calculate $(A + B)^2$.
Solution.
Adding corresponding terms shows that Squaring this sum yields the result
(b)
Now calculate $(A + B)^2$ in a different way. Use the fact that matrix multiplication distributes over matrix addition to expand $(A + B)(A + B)$ (like foiling) into a sum of matrix products. Then calculate each summand and add to find $(A + B)^2$. You should obtain the same result as part (a). If not, what could be wrong?
Solution.
Expanding $(A + B)^2$ (remember that matrix multiplication is not commutative) gives us
\[ (A + B)^2 = (A + B)(A + B) = A^2 + AB + BA + B^2, \]
which evaluates to the same matrix as in part (a). If instead you obtained a different matrix, you likely made the mistake of equating $(A + B)^2$ with $A^2 + 2AB + B^2$. These two matrices are not equal in general, because we cannot say that $AB$ is equal to $BA$.
Subsection Summary
In this section we defined a matrix sum, scalar multiples of matrices, the matrix product, and the transpose of a matrix.
The sum of two $m \times n$ matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ is the $m \times n$ matrix $A + B$ whose $ij$th entry is $a_{ij} + b_{ij}$.
If $A = [a_{ij}]$ is an $m \times n$ matrix, the scalar multiple $cA$ of $A$ by the scalar $c$ is the $m \times n$ matrix whose $ij$th entry is $ca_{ij}$.
If $A$ is a $k \times m$ matrix and $B$ is an $m \times n$ matrix with columns $\mathbf{b}_1$, $\mathbf{b}_2$, $\ldots$, $\mathbf{b}_n$, then the matrix product $AB$ of the matrices $A$ and $B$ is the $k \times n$ matrix
\[ AB = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n]. \]
The matrix product is defined in this way so that the matrix of a composite $T \circ S$ of linear transformations is the product of the matrices of $T$ and $S$.
An alternate way of calculating the product of a $k \times m$ matrix $A$ with rows $\mathbf{a}_1$, $\mathbf{a}_2$, $\ldots$, $\mathbf{a}_k$ and an $m \times n$ matrix $B$ with columns $\mathbf{b}_1$, $\mathbf{b}_2$, $\ldots$, $\mathbf{b}_n$ is that the product $AB$ is the matrix whose $ij$th entry is $\mathbf{a}_i \cdot \mathbf{b}_j$.
Matrix multiplication does not behave like the standard multiplication on real numbers. For example, a product of two nonzero matrices can equal the zero matrix, and there is no cancellation law for matrix multiplication.
The transpose of an $m \times n$ matrix $A = [a_{ij}]$ is the $n \times m$ matrix $A^T$ whose $ij$th entry is $a_{ji}$.
Exercises Exercises
1.
Calculate for each of the following matrix pairs by hand in two ways.
(a)
(b)
2.
For each of the following matrices $A$, find all matrices $B$ which commute with the given matrix $A$. (Two matrices $A$ and $B$ commute with each other if $AB = BA$.)
(a)
(b)
(c)
3.
Find all possible matrices, if any, satisfying each of the following matrix equations.
(a)
(b)
(c)
4.
For each of the following matrices, compute Use your results to conjecture a formula for Interpret your answer geometrically using the transformation interpretation.
(a)
(b)
(c)
5.
If for an unknown matrix and vector, determine an expression for ...,
6.
If and find an expression for in terms of and
7.
A matrix $A$ is a nilpotent matrix if $A^m = 0$, i.e., $A^m$ is the zero matrix, for some positive integer $m$. Explain why the matrices
are nilpotent matrices.
8.
Suppose is an matrix for which Show that there is a matrix for which where is the identity matrix of size
9.
Let $A$, $B$, and $C$ be $m \times n$ matrices and let $a$ and $b$ be scalars. Verify Theorem 8.1. That is, show that
(a)
(b)
(c)
The $m \times n$ matrix $0$ whose entries are all 0 has the property that $A + 0 = A$.
(d)
The scalar multiple $(-1)A$ of the matrix $A$ has the property that $A + (-1)A = 0$.
(e)
(f)
(g)
(h)
10.
Let $A$, $B$, and $C$ be matrices of the appropriate sizes for all sums and products to be defined and let $a$ be a scalar. Verify the remaining parts of Theorem 8.3. That is, show that
(a)
(b)
(c)
There is a square matrix $I_n$ with the property that $AI_n = A$ or $I_nA = A$ for whichever product is defined.
(d)
11.
Let $A$ and $B$ be matrices of the appropriate sizes, and let $a$ be a scalar. Verify the remaining parts of Theorem 8.7. That is, show that
(a)
(b)
(c)
12.
The matrix exponential is an important tool in solving differential equations. Recall from calculus that the Taylor series expansion for $e^x$ centered at $x = 0$ is
\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots, \]
and that this Taylor series converges to $e^x$ for every real number $x$. We extend this idea to define the matrix exponential $e^A$ for any square matrix $A$ with real entries as
\[ e^A = \sum_{n=0}^{\infty} \frac{1}{n!}A^n = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots. \]
We explore this idea with an example. Let
(a)
Calculate Explain why for any positive integer
(b)
Show that is equal to
(c)
Explain why
13.
Show that if and are rotation matrices, then is also a rotation matrix.
14.
Label each of the following statements as True or False. Provide justification for your response. Throughout, assume that matrices are of the appropriate sizes so that any matrix sums or products are defined.
(a) True/False.
For any three matrices with implies
(b) True/False.
For any three matrices with implies
(c) True/False.
If is the zero matrix, then itself is the zero matrix.
(d) True/False.
If for every matrix then is the identity matrix
(e) True/False.
If the matrix products $AB$ and $BA$ are both defined, then $A$ and $B$ are both square matrices of the same size.
(f) True/False.
If is a solution for (i.e., that ) and is a solution for then is a solution for
(g) True/False.
If is an matrix with two equal columns, then the matrix has two equal columns for every matrix.
(h) True/False.
If $AB = 0$, then $A = 0$ or $B = 0$.
Subsection Project: Strassen's Algorithm and Partitioned Matrices
Strassen's algorithm is an algorithm for matrix multiplication that can be more efficient than the standard row-column method. To understand this method, we begin with the $2 \times 2$ case, which will highlight the essential ideas.
Project Activity 8.6.
We first work with the $2 \times 2$ case.
(a)
(i)
Calculate the matrix product
(ii)
Rather than using eight multiplications to calculate $AB$, Strassen came up with the idea of using the following seven products:
Calculate through for the given matrices and Then calculate the quantities
What do you notice?
(b)
Now we repeat part (a) in general. Suppose we want to calculate the matrix product $AB$ for arbitrary $2 \times 2$ matrices $A$ and $B$. Let
Show that
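The algebra in Project Activity 8.6 can also be checked numerically. The sketch below (Python/NumPy, not part of the text) implements one common published form of Strassen's seven products for $2 \times 2$ matrices; the labels $m_1$ through $m_7$ and the way they are combined are assumptions that may differ from the labeling used in this activity.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using seven scalar multiplications.

    Follows one common statement of Strassen's algorithm; the labeling
    may differ from the one used in the activity.
    """
    a11, a12, a21, a22 = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    b11, b12, b21, b22 = B[0, 0], B[0, 1], B[1, 0], B[1, 1]

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(strassen_2x2(A, B))    # agrees with the row-column product below
print(A @ B)
```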
The next step is to understand how Strassen's algorithm can be applied to larger matrices. This involves the idea of partitioned (or block) matrices. Recall that the matrix-matrix product of the $k \times m$ matrix $A$ and the $m \times n$ matrix $B$ is defined as
\[ AB = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_n], \]
where $\mathbf{b}_1$, $\mathbf{b}_2$, $\ldots$, $\mathbf{b}_n$ are the columns of $B$.
In this process, we think of $B$ as being partitioned into columns. We can expand on this idea to partition both $A$ and $B$ when calculating a matrix-matrix product.
Project Activity 8.7.
We illustrate the idea of partitioned matrices with an example. Let We can partition into smaller matrices
which are indicated by the vertical and horizontal lines. As a shorthand, we can describe this partition of $A$ as
where the $A_{ij}$ are the indicated submatrices. The submatrices $A_{ij}$ are called blocks. If $B$ is a matrix such that $AB$ is defined, then $B$ must have five rows. As an example, $AB$ is defined for the matrix $B$ considered below. The partition of $A$ breaks $A$ up into blocks with three and two columns, respectively. So if we partition $B$ into blocks with three and two rows, then we can use the blocks to calculate the matrix product $AB$. For example, partition $B$ as
Show that
An advantage to using partitioned matrices is that computations with them can be done in parallel, which lessens the time it takes to do the work. In general, we can multiply partitioned matrices as though the submatrices are scalars. That is, if $A$ is partitioned into blocks $A_{it}$ and $B$ is partitioned conformably into blocks $B_{tj}$ (so that the block column sizes of $A$ match the block row sizes of $B$), then $AB = [C_{ij}]$,
where
\[ C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + \cdots = \sum_{t} A_{it}B_{tj}, \]
provided that all the submatrix products are defined.
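A sketch of block multiplication (Python/NumPy; the sizes and the particular partition are assumptions made only for the example): the blocks are multiplied and added as if they were scalars, and the result agrees with the ordinary product.

```python
import numpy as np

# A is 4x5, partitioned into a 2x2 grid of blocks; B is 5x3, partitioned
# conformably (block column sizes of A match block row sizes of B).
A = np.arange(20).reshape(4, 5)
B = np.arange(15).reshape(5, 3)

A11, A12 = A[:2, :3], A[:2, 3:]
A21, A22 = A[2:, :3], A[2:, 3:]
B1, B2 = B[:3, :], B[3:, :]

# Multiply the blocks as though they were scalars
top = A11 @ B1 + A12 @ B2
bottom = A21 @ B1 + A22 @ B2
AB_blocks = np.vstack([top, bottom])

print(np.array_equal(AB_blocks, A @ B))   # True
```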
Now we can apply Strassen's algorithm to larger matrices using partitions. This method is sometimes referred to as divide and conquer.
Project Activity 8.8.
Let $A$ and $B$ be two $n \times n$ matrices. If $n$ is not a power of $2$, then pad the rows and columns of $A$ and $B$ with zeros to make them of size $2^k \times 2^k$ for some integer $k$. (From a practical perspective, we might instead just use unequal block sizes.) Let $n = 2^k$. Partition $A$ and $B$ as
where each submatrix is of size $2^{k-1} \times 2^{k-1}$. Now we use the Strassen algorithm just as in the $2 \times 2$ case, treating the submatrices as if they were scalars (with the additional constraints of making sure that the dimensions match up so that products are defined, and ensuring we multiply in the correct order). Letting
then the same algebra as in Project Activity 8.6 shows that
Apply Strassen's algorithm to calculate the matrix product $AB$, where
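For readers who want to experiment, here is a hedged sketch (Python/NumPy, not part of the text) of the divide-and-conquer version of Strassen's algorithm for matrices whose size is a power of 2; the recursion treats the blocks exactly as the scalars were treated above, using the same published form of the seven products as in the earlier sketch.

```python
import numpy as np

def strassen(A, B):
    """Recursive Strassen multiplication for n x n matrices, n a power of 2.

    A sketch only: no padding of general sizes and no switch to the
    standard algorithm for small blocks, both of which a practical
    implementation would include.
    """
    n = A.shape[0]
    if n == 1:
        return A * B

    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    top = np.hstack([M1 + M4 - M5 + M7, M3 + M5])
    bottom = np.hstack([M2 + M4, M1 - M2 + M3 + M6])
    return np.vstack([top, bottom])

A = np.random.randint(-5, 5, (4, 4))
B = np.random.randint(-5, 5, (4, 4))
print(np.array_equal(strassen(A, B), A @ B))   # True
```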
While Strassen's algorithm can be more efficient, it does not always speed up the process. We investigate this in the next activity.
Project Activity 8.9.
We introduce a little notation to help us describe the efficiency of our calculations. We won't be formal with this notation; rather, we will work with it in an informal way. Big O (the letter "O") notation is used to describe the complexity of an algorithm. Generally speaking, in computer science big O notation can be used to describe the run time of an algorithm, the space used by the algorithm, or the number of computations required. The letter "O" is used because the behavior described is also called the order. Big O measures the asymptotic time of an algorithm, not its exact time. For example, if the number of steps an algorithm takes is a polynomial in $n$ whose leading term is a constant multiple of $n^3$, then we say that the algorithm grows at the order of $n^3$ (we ignore the constants and the smaller power terms, since they become insignificant as $n$ increases) and we describe its growth as $O(n^3)$. To measure the efficiency of an algorithm to determine a matrix product, we will measure the number of operations it takes to calculate the product.
(a)
Suppose $A$ and $B$ are $n \times n$ matrices. Explain why the operation of addition (that is, calculating $A + B$) is $O(n^2)$.
(b)
Suppose $A$ and $B$ are $n \times n$ matrices. How many multiplications are required to calculate the matrix product $AB$? Explain.
(c)
The standard algorithm for calculating a matrix product of two $n \times n$ matrices requires $n^3$ multiplications and a number of additions. Since additions are much less costly in terms of operations, the standard matrix product is $O(n^3)$. We won't show it here, but using Strassen's algorithm on a product of two $n \times n$ matrices is $O(n^w)$, where $w = \log_2 7 \approx 2.8074$. That means that Strassen's algorithm applied to an $n \times n$ matrix (where $n$ is a power of $2$) requires approximately $n^{2.8074}$ multiplications. We use this to analyze situations to determine when Strassen's algorithm is computationally more efficient than the standard algorithm; a short computational comparison also appears after this activity.
(i)
Suppose $A$ and $B$ are matrices. Determine the number of multiplications required to calculate the matrix product $AB$ using the standard matrix product. Then determine the approximate number of multiplications required to calculate the matrix product $AB$ using Strassen's algorithm. Which is more efficient? (Remember, we can only apply Strassen's algorithm to square matrices whose sizes are powers of $2$.)
(ii)
Repeat part i. with matrices. Which method is more efficient?
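The sketch below (plain Python, not part of the text) compares the idealized multiplication counts used in this activity, $n^3 = 8^k$ for the standard algorithm versus $7^k$ for Strassen's algorithm on matrices of size $n = 2^k$; any crossover it suggests reflects only these idealized counts, not real-world constants or addition costs.

```python
# Compare idealized multiplication counts: n^3 (standard) versus
# 7^k (Strassen) for n = 2^k.
for k in range(1, 12):
    n = 2 ** k
    standard_count = n ** 3      # = 8^k scalar multiplications
    strassen_count = 7 ** k
    print(f"n = {n:5d}: standard = {standard_count:15d}, "
          f"Strassen = {strassen_count:15d}")
```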
As a final note, Strassen's algorithm is approximately $O(n^{2.8074})$. As of 2018, the best algorithm for matrix multiplication, developed by Virginia Williams at Stanford University, is approximately $O(n^{2.373})$.

Strassen, Volker, Gaussian Elimination is not Optimal, Numer. Math. 13, pp. 354-356, 1969.
V. V. Williams, Multiplying matrices in $O(n^{2.373})$ time, Stanford University, 2014.