5.1 A quick review of some basic concepts in matrix algebra
A matrix is a rectangular array of numbers (real numbers in our case) with \(n\) rows and \(m\) columns. Its main application for us will be in solving systems of linear equations in statistics.
An example of a \(2\times 2\) matrix:
## a 2x2 matrix:
(m1 <- matrix(1:4, 2, 2))
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
An important operation on matrices is the transpose: the rows become columns and the columns become rows. If \(X\) is a matrix, \(X^T\) is its transpose. R has a function t() that transposes a matrix:
## transpose of a matrix:
t(m1)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
5.1.1 Matrix addition, subtraction, and multiplication
Matrix addition and subtraction are easy: they are cell-wise operations. An example shows how this works:
m1 + m1
## [,1] [,2]
## [1,] 2 6
## [2,] 4 8
m1 - m1
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
Matrix multiplication is defined as follows. Think of this simple situation first, where you have a vector of values:
(m2 <- rnorm(5))
## [1] -0.17650 -0.99922 -1.24649 -0.75233 -0.07495
## this is a vector:
str(m2)
## num [1:5] -0.177 -0.999 -1.246 -0.752 -0.075
If you want to find out the sum of the squares of these values (\(\sum_{i=1}^{n} x_i^2\)), then you can multiply each value \(x_i\) in m2 with itself and sum up the results. This is called the dot product of two vectors (here, m2 with itself):
\(x_1 \cdot x_1+x_2 \cdot x_2+x_3 \cdot x_3+x_4 \cdot x_4+x_5 \cdot x_5\)
Using R:
(sum(m2 * m2))
## [1] 3.155
An important observation we will need later is that if two vectors are orthogonal, i.e., at 90 degrees to one another, then their dot product is 0. A simple example is two vectors on the Cartesian plane that go along the x and y axes:
v1 <- c(0, 1)
v2 <- c(1, 0)
sum(v1 * v2)
## [1] 0
Matrix multiplication does exactly the above dot product operation (for arbitrarily large matrices). In R, matrix multiplication is carried out using a special operator, %*% (the * operator we have been using performs element-wise and scalar multiplication, not matrix multiplication). Notice that we have first converted the vector with \(5\) cells into a \(5\times 1\) matrix:
t(matrix(m2))
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.1765 -0.9992 -1.246 -0.7523 -0.07495
matrix(m2)
## [,1]
## [1,] -0.17650
## [2,] -0.99922
## [3,] -1.24649
## [4,] -0.75233
## [5,] -0.07495
t(matrix(m2)) %*% matrix(m2)
## [,1]
## [1,] 3.155
Note that two matrices can only be multiplied if they are conformable: the number of columns of the first matrix has to be the same as the number of rows of the second matrix. So, if you have a \(1\times 5\) matrix, you can multiply it with a \(5\times 1\) matrix, and the end result of the multiplication will be a \(1\times 1\) matrix (i.e., a scalar value). In the above example, the vector m2, converted to a matrix, is \(5\times 1\) in dimensions; you cannot multiply a \(5\times 1\) with a \(5\times 1\) matrix, but the transpose of m2, a \(1\times 5\) matrix, can be multiplied with the \(5\times 1\) matrix.
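To see conformability being enforced by R, we can try to multiply two \(5\times 1\) matrices; this fails with a non-conformable arguments error (the call is wrapped in try() here so that the error message is displayed without halting the session):
## a 5x1 matrix cannot be multiplied with another 5x1 matrix:
try(matrix(m2) %*% matrix(m2))
## Error in matrix(m2) %*% matrix(m2) : non-conformable arguments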
More generally, if you have an \(n\times m\) matrix and multiply it with an \(m\times p\) matrix, you will get an \(n\times p\) matrix; the operation we carry out is the dot product, repeatedly applied to each row of the first matrix and each column of the second matrix. For example:
X1 <- matrix(rnorm(4), ncol = 2)
X2 <- matrix(rnorm(4), ncol = 2)
## matrix multiplication:
X1 %*% X2
## [,1] [,2]
## [1,] 1.680 -1.677
## [2,] 2.655 -2.610
## is a series of dot products of vectors:
sum(X1[1, ] * X2[, 1])
## [1] 1.68
sum(X1[1, ] * X2[, 2])
## [1] -1.677
sum(X1[2, ] * X2[, 1])
## [1] 2.655
sum(X1[2, ] * X2[, 2])
## [1] -2.61
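As a quick illustrative check of this dimension rule (X3 and X4 are throwaway names used only here), multiplying a \(2\times 3\) matrix with a \(3\times 4\) matrix yields a \(2\times 4\) matrix:
## X3 is a 2x3 matrix, X4 is a 3x4 matrix:
X3 <- matrix(rnorm(6), nrow = 2)
X4 <- matrix(rnorm(12), nrow = 3)
## the dimensions of the product:
dim(X3 %*% X4)
## [1] 2 4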
Scalar-matrix multiplication is easy: multiplying a matrix \(M\) with a scalar value just amounts to multiplying the scalar with each cell of the matrix. The order of multiplication doesn’t matter:
5 * m2
## [1] -0.8825 -4.9961 -6.2324 -3.7617 -0.3748
m2 * 5
## [1] -0.8825 -4.9961 -6.2324 -3.7617 -0.3748
5.1.2 Diagonal matrix and identity matrix
A diagonal matrix is a square matrix that has zeros in its off-diagonals:
diag(c(1, 2, 4))
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 2 0
## [3,] 0 0 4
An identity matrix is a diagonal matrix with 1’s along its diagonal:
diag(rep(1, 3))
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
For a \(3\times 3\) identity matrix, we can write \(I_3\), and for an \(n\times n\) identity matrix, \(I_n\).
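As an aside, calling diag() with a single integer n is a convenient shortcut for creating \(I_n\):
## shortcut for a 3x3 identity matrix:
diag(3)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1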
Multiplying an identity matrix with any (conformable) matrix gives that matrix back:
(I <- diag(c(1, 1)))
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
(m3 <- matrix(c(6, 7, 8, 9), 2, 2))
## [,1] [,2]
## [1,] 6 8
## [2,] 7 9
(I %*% m3)
## [,1] [,2]
## [1,] 6 8
## [2,] 7 9
## the matrix does not have to be square:
(m4 <- matrix(c(2, 3, 6, 7, 8, 9), 2, 3))
## [,1] [,2] [,3]
## [1,] 2 6 8
## [2,] 3 7 9
(I %*% m4)
## [,1] [,2] [,3]
## [1,] 2 6 8
## [2,] 3 7 9
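To multiply m4 by an identity matrix on the right instead, the identity has to be \(3\times 3\) to be conformable; the result is again m4:
## right-multiplication by a conformable identity matrix:
m4 %*% diag(3)
## [,1] [,2] [,3]
## [1,] 2 6 8
## [2,] 3 7 9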
5.1.3 Powers of matrices
If \(A\) is a square \(n\times n\) matrix, then we write \(AA\) as \(A^2\), and so on. If \(A\) is diagonal, then \(AA\) is just the diagonal matrix with the diagonal elements of \(A\) squared:
m <- diag(c(1, 2, 3))
m %*% m
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 4 0
## [3,] 0 0 9
For all positive integers \(m\), \(I_n^m = I_n\).
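A quick sanity check of this fact in R, using \(I_3\) (I3 is just a throwaway name):
I3 <- diag(3)
## multiplying I_3 by itself gives back I_3:
identical(I3 %*% I3, I3)
## [1] TRUE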
5.1.4 Inverse of a matrix
If \(A\) and \(B\) are square matrices of order \(n\times n\), and the following relation is satisfied:
\[\begin{equation} AB = BA= I_n \end{equation}\]
then \(B\) is the (unique) inverse of \(A\), and we write \(B=A^{-1}\). The inverse is analogous to division and allows us to solve matrix equations: \(AB=C\) can be solved for \(A\) by post-multiplying both sides by \(B^{-1}\), which gives \(ABB^{-1}=AI=A=CB^{-1}\).
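To make this concrete, here is a small illustrative sketch with arbitrary invertible matrices: given \(C = AB\), post-multiplying \(C\) by \(B^{-1}\) recovers \(A\):
A <- matrix(c(2, 1, 1, 3), 2, 2)
B <- matrix(c(1, 0, 2, 1), 2, 2)
C <- A %*% B
## post-multiplying C by the inverse of B gives back A:
C %*% solve(B)
## [,1] [,2]
## [1,] 2 1
## [2,] 1 3
Incidentally, solve() with two arguments, as in solve(A, b), solves the linear system \(Ax=b\) directly, without explicitly forming the inverse.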
Pre-multiplying a matrix like \(A\) above by its inverse \(A^{-1}\) gives an identity matrix (the R function solve() computes the inverse of a matrix):
(m5 <- matrix(c(2, 3, 4, 5), 2, 2))
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
(round(solve(m5) %*% m5))
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
If a matrix is not invertible, we say that it is singular.
A useful fact about non-singular matrices is the following relationship: if \(A\) and \(B\) are non-singular matrices, then \((AB)^{-1} = B^{-1}A^{-1}\). Let’s try out an example:
m6 <- matrix(c(6, 7, 8, 9), 2, 2)
solve(m6)
## [,1] [,2]
## [1,] -4.5 4
## [2,] 3.5 -3
## (m5m6)^{-1}:
solve(m5 %*% m6)
## [,1] [,2]
## [1,] 17.25 -13
## [2,] -13.25 10
## m6^{-1}m5^{-1}:
solve(m6) %*% solve(m5)
## [,1] [,2]
## [1,] 17.25 -13
## [2,] -13.25 10
5.1.5 Linear independence, and rank
An important concept in matrix algebra is linear independence. Consider a \(3\times 3\) matrix. The rows \(r_1, r_2, r_3\) are linearly dependent if scalars \(\alpha, \beta, \gamma\), not all zero, exist such that \(\alpha r_1+ \beta r_2+ \gamma r_3 = (0,0,0)\).
If the rows or columns of a matrix A are linearly dependent, then the matrix is singular, i.e., it is not invertible.
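For contrast, here is an illustrative singular matrix (m8 is a name introduced just for this example): its third row is the sum of its first two rows, so the rows are linearly dependent, and solve() fails:
## third row = first row + second row:
(m8 <- matrix(c(1, 2, 3, 4, 5, 6, 5, 7, 9), 3, 3, byrow = TRUE))
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 5 7 9
## not invertible:
try(solve(m8))
## Error in solve.default(m8) : Lapack routine dgesv: system is exactly singular
The matrix m7 below, in contrast, has linearly independent rows and is invertible: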
(m7 <- matrix(c(1, 1, 3, 1, -1, 1, -2, 2, 4), 3, 3, byrow = TRUE))
## [,1] [,2] [,3]
## [1,] 1 1 3
## [2,] 1 -1 1
## [3,] -2 2 4
## invertible:
solve(m7)
## [,1] [,2] [,3]
## [1,] 0.5 -0.1667 -0.3333
## [2,] 0.5 -0.8333 -0.1667
## [3,] 0.0 0.3333 0.1667
Linear independence leads us to the concept of the rank of a matrix. The column rank of a matrix is the maximum number of linearly independent columns in the matrix. The row rank is the maximum number of linearly independent rows. Column rank is always equal to row rank, so we can just call it rank. Check the rank of matrix m7:
## rankMatrix() comes from the Matrix package:
library(Matrix)
rankMatrix(m7)
## [1] 3
## attr(,"method")
## [1] "tolNorm2"
## attr(,"useGrad")
## [1] FALSE
## attr(,"tol")
## [1] 6.661e-16
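By contrast, the singular matrix m8 shown earlier does not have full rank: only two of its three rows are linearly independent, so its rank is 2 (c() is used here just to drop the attributes from the printed result):
## rank of the singular matrix m8:
c(rankMatrix(m8))
## [1] 2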
The above is the minimum vocabulary we need to understand the linear model in its matrix form. Let us now turn to the matrix formulation of the linear model.