Notes on Matrix Algebra - Part 2
Math 5718 - Spring 2001
We have done a fairly exhaustive study of systems of linear equations, with a particular focus on when solutions exist and whether or not they are unique. There is still much more to be learned about this problem. In this set of notes, we will explore the fundamental subspaces associated with any matrix, their orthogonality relations, and the question of inverses. 1 Definitions We need some definitions to get started. First, recall that the transpose of a matrix A, denoted AT, is the matrix whose columns are the rows of A; that is, the (i,j) element of A is the (j,i) element of AT. A special case is that of a vector: if x is an $n \times 1$ column vector, then xT is a $1 \times n$ row vector. An important result concerns the transpose of the product of two matrices: recall that (AB)T = BTAT and for a matrix-vector product, (Ax)T=xTAT. We must also introduce the notion of orthogonality. Let $x=(x_1,\ldots,x_n)^T$ and $y=(y_1,\ldots,y_n)^T$ be column vectors in ${\bf R}^n$. Two such vectors are said to be orthogonal if their scalar product (or inner product) is zero:

\begin{displaymath}x \cdot y = x^Ty = \sum_{k=1}^n x_ky_k = 0.
\end{displaymath}

Note that in ${\bf R}^2$ and ${\bf R}^3$, orthogonal vectors are geometrically perpendicular. A set of vectors $\{u_i\}$ is orthogonal if every vector in the set is orthogonal to every other vector. Orthogonality is a stronger condition than linear independence; that is, an orthogonal set is linearly independent (for example, (1,0) and (0,1)), but not conversely (for example, (1,1) and (0,1)). Two subspaces, U and V, are orthogonal if every vector in U is orthogonal to every vector in V; we write $U \perp V$. For example, the subspace $U=\{(x,0,0):x \in {\bf R}\}$ is orthogonal to the subspace $V=\{(0,0,z):z \in {\bf R}\}$. Finally, all vectors in ${\bf R}^n$ that are orthogonal to a subspace V comprise the orthogonal complement of V, denoted $V^\perp$. For example, for the subspaces $U=\{(x,y,0):x,y \in {\bf R}\}$ and $V=\{(0,0,z):z \in {\bf R}\}$, we have $U=V^\perp$ and $V=U^\perp$. The Four Fundamental Subspaces Associated with any matrix A are four fundamental subspaces that obey elegant othogonality relationships. The subspaces are the null space of A, the row space of A, the column space of A, and the null space of AT. We have already encountered two of these subspaces; we will now consider them all in detail. Null space of A. Recall that null A is comprised of solutions of the homogeneous system Ax=0. Let U be the result of reducing A to row echelon form. Because elementary row operations do not change the solution of Ax=0, null A = null U. As we saw in the last set of notes, U has r pivots that correspond to the basic variables. The remaining n-r variables are free variables. The system Ux=0 can be solved for the basic variables in terms of the free variables to give a basis of null A. Thus, dim null A = n-r. Row space of A. The row space of A is the subspace spanned by the rows of A. Elementary row operations do not change the row space of A (because the rows of U are linear combinations of the rows of A). Therefore, the row space of U and the row space of A are identical. It is easy to find the row space of U and a basis for it. Recall that U has r nonzero rows, one for each pivot element (as shown in the row echelon matrix below). Furthermore these nonzero rows are linearly independent. This fact follows because each row has a different number of leading zeros. If the rows are denoted vi, then the linear combination $a_1v_1+\cdots a_{r}v_{r}$ vanishes only if, first, a1=0, then a2=0, and so forth, until all of the ai's must be zero. Thus, the r nonzero rows of U form a basis for the row space of A, and we have that the dimension of the row space of A is r, which is the rank of A.

\begin{displaymath}\left(
\begin{array}{cccccc}
& \downarrow & \downarrow && \...
... 0 & x &
* \\ & 0 & 0 & 0 & 0 & 0 \\
\end{array}
\right),
\end{displaymath}

Column space of A. We have already encountered the column space of A, also called the range of A; it is the subspace spanned by the columns of A. The goal is to find a basis for the column space of A. Unfortunately, we cannot use the row echelon form U quite as easily as we did for the row space. The reason is that elementary row operations do change the column space of a matrix; so the column space of A is not the same as the column space of U. But there is still information in U. First, note that r columns of U contain a pivot element (as indicated in the row echelon matrix above). Furthermore, these r columns are linearly independent. This fact can be shown as we did for the nonzero rows of U. The r pivot columns of U have a different number of trailing zeros. Letting the pivot columns of U be vi, then the linear combination $a_1v_1+\cdots a_{r}v_{r}$ vanishes only if, first, ar=0, then ar-1=0, and so forth, until all of the ai's must be zero. Thus, the r pivot columns of U form a basis for the column space of U - but not for the column space of A. Here is how we find the column space of A from the column space of U. It is a subtle argument that requires some thought: We know that the systems Ax=0 and Ux=0 have the same solutions. As we have already observed, a solution of Ax=0 gives the weights in a linear combination of the columns of A. Therefore, the system Ax=0 describes a linear dependence among the columns of A. Because the same x is a solution of Ux=0, there is a corresponding linear dependence among the columns of U. It follows that if a set of columns of U is linearly independent, then the corresponding set of columns of A is linearly independent (and vice versa). Therefore, if the r pivot columns of U are linearly independent (as we have shown), then the same r columns are A are also linearly independent. Therefore, we have a method for finding a basis for the column space of A: we look for the pivot columns of U and take the same columns of A as the basis for the column space. Notice that a surprising result appears from these arguments. We have shown that
dimension of the row space = dimension of the column space = r (the rank of A).
Said differently, given any matrix, the number of linearly independent rows equals the number of linearly independent columns. Null space of AT. The fourth subspace is not so obvious, but as you will see, it completes the picture perfectly. Notice that if A is an $m \times n$ matrix, then null A and the row space of A are subspaces of ${\bf R}^n$, while the column space (or range) of A is a subspace of ${\bf R}^m$. The last subspace, null AT, is also a subspace of ${\bf R}^m$. This space is often called the left null space of A, as its elements satisfy ATy=0, or equivalently, yTA=0. The dimension of null AT can be found by applying the subspace dimension theorem (Theorem 3.4 of the text) to AT. Recall that AT is an $n \times m$ matrix; it maps ${\bf R}^m$ into ${\bf R}^n$. Therefore,

\begin{displaymath}m = \mbox{ dim range }A^T + \mbox{ dim null }A^T.
\end{displaymath}

But note that

\begin{displaymath}\mbox{ dim range }A^T = \mbox{ dim column space }A^T = \mbox{ dim row space }A=r.
\end{displaymath}

Putting these two observations together, we have that $\mbox{ dim null }A^T = m-r$. Finding a basis for null AT is not such an urgent problem. If nothing else, one could reduce AT to row echelon form and solve ATx=0 in terms of the m-r free variables (exactly as we found a basis for null A). Here is a summary of the four fundamental subspaces (sometimes called the Fundamental Theorem of Linear Algebra): Orthogonality Relations for the Subspaces. Knowing the dimensions of the four subspaces is important, but equally important is the orthogonality relations between them. We will prove that Notice that because the row space of A is the range of AT, these two statements are symmetric: the second result is the first result applied to AT. Let's prove the first statement. Suppose $w \in \mbox{null }A$ and $v \in \mbox{ row space } A = \mbox{ range }A^T$. This implies that v=ATx for some $x \in {\bf R}^m$. We first show that w and v are orthogonal. Note that

wTv = wT(ATx) = (wTAT)x = (Aw)Tx=0;

the last step was made because Aw=0. Therefore, every vector in $ \mbox{null }A$ is orthogonal to every vector in $\mbox{range
}A^T$. Thus, $ \mbox{null }A$ is orthogonal to $\mbox{range
}A^T$. A more visual way to see this relationship is to write the product Aw=0 as

\begin{displaymath}\left(
\begin{array}{cc}
\mbox{row 1}& \cdots \\ \mbox{row ...
...egin{array}{c}
0\\ 0\\ \vdots \\ 0\\
\end{array}
\right).
\end{displaymath}

This product can be interpreted to mean that the scalar product of w with each row of A is zero. Thus, $w \in \mbox{null }A$ is orthogonal to the rows of A, or $ \mbox{null }A$ is orthogonal to the row space of A. It might seem that we are finished, but this does not establish that $ \mbox{null }A$ is the orthogonal complement of $\mbox{range
}A^T$. There could be vectors orthogonal to $\mbox{range
}A^T$ that are not in $ \mbox{null }A$, and vice versa. So there is a little more work to be done. First, suppose a vector outside the null space were orthogonal to the rows of A. Looking at the product Aw above, we see this orthogonality forces Aw=0, which means $w \in \mbox{null }A$. So a vector outside $ \mbox{null }A$ cannot be orthogonal to the row space of A. What about a vector v outside the row space of A, but orthogonal to $ \mbox{null }A$? This vector is not a linear combination of the rows of A, so imagine attaching it to A as the last row of a new matrix A'. Then A' has the same number of columns as A and it has the same null space as A, but its row space has a greater dimension than the row space of A. This change violates the subspace dimension theorem. Therefore, we conclude that all vectors orthogonal to the row space of A are in $ \mbox{null }A$ and all vectors orthogonal to $ \mbox{null }A$ are in the row space of A. This implies that $ \mbox{null }A$ and the row space of A are orthogonal complements. Replacing A by AT, we can also prove that $\mbox{null }A^T$ and the range of A are orthogonal complements. It now follows that the domain space, ${\bf R}^n$, and the target space, ${\bf R}^m$, can be represented as the following direct sums: This subspace picture gives a useful interpretation of how a linear map works. Suppose A maps ${\bf R}^n \rightarrow {\bf R}^m$ and $x \in {\bf R}^n$. Then x can be decomposed (uniquely) into two parts:

\begin{displaymath}x = x_n + x_r \quad \mbox{where} \quad x_n \in \mbox{null
}A,\mbox{ and } x_r \in \mbox{ row space }A.
\end{displaymath}

Now let A work on x; we have

\begin{displaymath}Ax = A(x_n+x_r) = Ax_n+Ax_r = 0 + Ax_r = Ax_r \in \mbox{ range }A.
\end{displaymath}

We see that A maps its null space to zero and its row space onto its column space. Because the row space and the column space of A have the same dimension (r= rank of A), A is an injective and surjective (hence invertible) map from the row space to the column space. Example. Let's exercise all of these ideas on an $3 \times
4$ matrix. Let

\begin{displaymath}A=\left(
\begin{array}{cccc}
1&2&0&1\\ 0&1&1&0\\ 1&2&0&1\\ ...
...{cccc}
1&2&0&1\\ 0&1&1&0\\ 0&0&0&0\\
\end{array}
\right).
\end{displaymath}

We see immediately that there are r=2 pivots; so the rank of A is 2, and A has two linearly independent rows and columns. A basis of the row space is $\{(1,2,0,1)^T,(0,1,1,0)^T\}$ (the nonzero rows of U), confirming that the row space has dimension 2. A basis for the null space can be found by solving Ux=0. Notice that there are r=2 basic variables (x1 and x2) and n-r=2 free variables (x3 and x4), so we know that null A has dimension 2. The equations of Ux=0 are

\begin{eqnarray*}x_1+ 2x_2+x_4 &=&0 \\ x_2+x_3 &=& 0.\\
\end{eqnarray*}


Solving for x1 and x2 in terms of x3 and x4, we have that

\begin{eqnarray*}x_1 &=& 2x_3-x_4 \\ x_2&=&= -x_3.\\
\end{eqnarray*}


Thus, a basis for null A is $\{(2,-1,1,0)^T,(-1,0,0,1)^T\}$ (found by setting the free variables, x3 and x4, to 0 and 1 alternately). To find a basis for the column space of A, note that the first and second columns of U are the pivot columns and are linearly independent. Therefore, the first and second columns of A, $\{(1,0,1)^T,(2,1,2)^T\}$, are linearly independent and form a basis for the columns space of A. Finally, we already know that the dimension of null AT is m-r=1. To find a basis, we could start with AT and reduce it to row echelon form:

\begin{displaymath}A^T=\left(
\begin{array}{ccc}
1&0&1\\ 2&1&2\\ 0&1&0\\ 1&0&1...
...ccc}
1&0&1\\ 0&1&0\\ 0&0&0\\ 0&0&0\\
\end{array}
\right).
\end{displaymath}

By solving Ux=0, it follows that a basis for null AT is $\{(1,0,-1)^T\}$. The orthogonality of these bases and subspaces can be checked. For the row space of A and null A, it can be verified that

\begin{displaymath}\{(1,2,0,1)^T,(0,1,1,0)^T\} \perp \{(2,-1,1,0)^T,(-1,0,0,1)^T\}.
\end{displaymath}

For the column space of A and null AT, it can be verified that

\begin{displaymath}\{(1,0,1)^T,(2,1,2)^T\} \perp \{(1,0,-1)^T\}.
\end{displaymath}

$\clubsuit$
The Inverse Story We have already encountered the idea of the inverse of a linear map. Recall that, by definition, if $T:V \rightarrow W$ is invertible (has an inverse), then $T^{-1}:W \rightarrow V$ is the unique map such that T T-1= T-1T = I. We also proved that a map is invertible if and only if it is injective and surjective. All of these ideas carry over to the case of matrices, but there is more that we can say. Let's begin with two definitions. Suppose that A is an $m \times n$ matrix. The left inverse of A is an $n \times m$ matrix B such that BA=In, where In denotes the $n \times n$ identity matrix. The right inverse of A is an $n \times m$ matrix C such that AC=Im. A matrix many have none, one, or both of these one-sided inverses. We will determine the conditions under which these various cases arise. We can say one thing at the outset: If a matrix has both a right and left inverse, then they must be equal, as B=B(AC)=(BA)C=C. We will begin with the case that the range $A = {\bf R}^m$; then A is surjective and $r=\mbox{ rank }A=m \le n$. This means that every vector b can be expressed as a linear combination of the columns of A; hence, Ax=b has at least one solution for all $b
\in {\bf R}^m$. In particular, if we let ei be the usual ith unit vector of ${\bf R}^m$, then the system Ax=ei has at least one solution; call it xi. If we solve this system for $i=1,\ldots,m$, then we have solutions xi that we can represent in matrix form as

\begin{displaymath}A \underbrace{ \left(
\begin{array}{cccc}
x_1 & x_2 &\cdots...
...ts&\vdots&\vdots\\ &&&\\ &&&\\
\end{array}
\right)}_{I_m}.
\end{displaymath}

In other words, the m vectors xi (of length n) comprise the columns of a matrix C such that AC=Im. This $n \times m$ matrix C is a right inverse of A (perhaps one of many). We have shown that if $r=m\le n$, then a right inverse of A exists. The solution of the system Ax=b is x=Cb, because Ax=A(Cb) = (AC)b=b. Now suppose that the columns of A are linearly independent; then $r=\mbox{ rank }A=n \le m$, and A is injective. This means that when Ax=b has a solution, it is unique. Here is a way to demonstrate that a left inverse exists. Because $n \le m$, we can add m-n more linearly independent columns to A to form a new matrix $\hat{A}$. This matrix has m=n linearly independent rows and columns and is invertible; thus, $\hat{A}^{-1}\hat{A}=I_n$. We can now discard the last m-n columns of $\hat{A}$ (to recover A) and the last m-n rows of $\hat{A}^{-1}$ to give us a matrix B. You can check that the resulting product is simply BA=In. The matrix B, obtained by discarding the last m-n rows of $\hat{A}^{-1}$, is the left inverse of A. When a solution of Ax=b exists, it is given by x=Bb, because x = (BA)x=B(Ax)=Bb. Good Thought Question: Explain why a left inverse cannot exist in the case that r=m <n and why a right inverse cannot exist in the case that r=n<m. In the special case that $ r=\mbox{ rank }A=n = m$, we have a square matrix whose rows and columns are linearly independent. Now, the conclusions of both previous cases apply: A is invertible and said to be non-singular; its left inverse equals its right inverse, which we denote A-1; and the system Ax=b has a unique solution for all $b
\in {\bf R}^m$. The following table summarizes what we have established about inverses; as usual, A is $m \times n$, $r = \mbox{ rank }A$, and In is the $n \times n$ identity matrix.
  Inverse A? Ax=b Ax=0
r=m<n AC=Im (right) A surjective Nonunique solutions for all b Nonunique solutions
r=n<m BA=In (left) A injective Unique solution for some b Only x=0
r=n=m A-1A=AA-1= In A bijective (both) Unique solution for all b Only x=0