Least Squares Solution and Pseudo-Inverse

BGHiggins/UCDavis/ECH256/Jan_2012

Introduction

In this notebook we will explore how to solve linear systems of equations given by

A·x = b        (1)

In particular we will be interested in the case when A is not a square matrix, and instead has m rows and n columns, where m ≠ n. Here is an example when m = 5, n = 3:

A = ⎛ A11  ⋯  A13 ⎞
    ⎜ A21  ⋯  A23 ⎟
    ⎜  ⋮   ⋱   ⋮  ⎟
    ⎜ A41  ⋯  A43 ⎟
    ⎝ A51  ⋯  A53 ⎠

If we have more rows than columns (m > n), then we have more equations than unknowns, and the system is sometimes referred to as overdetermined. If m < n, the opposite is true and now we have more unknowns than equations. This would be the case if m = 3 and n = 5:

A = ⎛ A11  A12  ⋯  ⋯  A15 ⎞
    ⎜  ⋮    ⋮   ⋱  ⋯   ⋮  ⎟
    ⎝ A31  A32  ⋯  ⋯  A35 ⎠

The system of equations is then sometimes referred to as under-determined. If m > n, then A·Aᵀ is an m×m matrix whereas Aᵀ·A is an n×n matrix. We will give precise mathematical descriptions for these terms shortly. The topics that we will discuss in these notes are:

(i) Rank of a matrix
(ii) Range and null-space of a matrix
(iii) Eigenvalues and eigenvectors of a matrix
(iv) Singular value decomposition (SVD) of a matrix
(v) Least squares solution of a system of equations
(vi) Pseudo-inverse of a matrix

As we will show below, the above topics are all crucial to gaining a clear understanding of what it means for Eqn. (1) to have a "solution". We will also show how to use Mathematica's built-in functions to do various computations when we are faced with equations that are not "square".

A Simple Example

In the following example we will do all the calculations by hand. In this way we can fully appreciate the mathematics that follows. We begin with the following system of linear equations (written in matrix notation or vector notation):


ECh256LeastSquaresSolution.nb

⎛ 1  3   2 ⎞ ⎛ x ⎞   ⎛ -1 ⎞
⎝ 2  1  -1 ⎠ ⎜ y ⎟ = ⎝  3 ⎠    ⇔    A·x = b        (2)
             ⎝ z ⎠

Let us write these equations out:

x + 3 y + 2 z = -1
2 x + y - z = 3        (3)

Next we solve the system to find x and y in terms of z. To do this we multiply the first equation by 2 and then subtract the second equation to eliminate x to get

5 y + 5 z = -5    ⇒    y = -z - 1        (4)

Substituting this result into the first equation to eliminate y we get

x = z + 2        (5)

As a check, substituting this result into the second equation again gives y = -z - 1. Thus for each value of z we have a solution for x and y. Clearly we have an infinite number of possibilities. We can write the solution as

⎛ x ⎞   ⎛  2 ⎞     ⎛  1 ⎞
⎜ y ⎟ = ⎜ -1 ⎟ + ξ ⎜ -1 ⎟        (6)
⎝ z ⎠   ⎝  0 ⎠     ⎝  1 ⎠

where ξ is a parameter that takes on all values along the real line (assuming that x, y and z are real variables). The solution as represented by Eqn. (6) can be written in vector notation as

x = x_p + ξ ν        (7)

where x_p is called a particular solution to A·x = b, and ν is called a null vector that satisfies A·ν = 0. Clearly any null vector multiplied by a constant ξ added to a solution is also a solution. Let us now compute the magnitude of our solution given by Eqn. (7):

‖x‖ = √(x² + y² + z²) = √((2 + ξ)² + (-1 - ξ)² + ξ²)        (8)

Next we ask, what is the value of ξ that gives a vector that has the smallest magnitude? We can find this value by simply solving

d‖x‖/dξ = 0    ⇒    (1/2) (3 ξ² + 6 ξ + 5)^(-1/2) (6 ξ + 6) = 0    ⇒    ξ = -1        (9)

Substituting this value into Eqn. (6) we get

       ⎛  1 ⎞
x_ML = ⎜  0 ⎟        (10)
       ⎝ -1 ⎠

This is called the minimum length solution for Eqn. (2). Note also that the null vector ν is orthogonal to x_ML:

x_ML · ν = 0        (11)

In the rest of these notes we formalize these ideas in terms of a general theory for solving systems of linear equations that are in general not square. That is, there are either more equations than unknowns or more unknowns than equations.
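Although these notes use Mathematica, the minimum length solution above is easy to check numerically. Here is a sketch in Python/NumPy: for a consistent system, `numpy.linalg.pinv` (the pseudo-inverse discussed later in these notes) returns exactly this minimum length solution.

```python
import numpy as np

# The 2x3 system of Eqn. (2): A.x = b
A = np.array([[1.0, 3.0, 2.0],
              [2.0, 1.0, -1.0]])
b = np.array([-1.0, 3.0])

# For a consistent underdetermined system, the pseudo-inverse gives
# the minimum length (minimum norm) solution, Eqn. (10).
x_ml = np.linalg.pinv(A) @ b
print(x_ml)            # approximately (1, 0, -1)

# The null vector nu = (1, -1, 1) of Eqn. (6) satisfies A.nu = 0,
# and x_ml is orthogonal to it, Eqn. (11).
nu = np.array([1.0, -1.0, 1.0])
print(A @ nu)          # approximately (0, 0)
print(x_ml @ nu)       # approximately 0
```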


Rank of a Matrix

In this section we want to be precise about what we mean by the rank of a matrix. The column rank of a matrix is the dimension of its column space: in plain words this is the number of linearly independent columns of the matrix. Likewise, the row rank of a matrix is the number of linearly independent rows of a matrix. Row rank is always equal to column rank; thus one normally uses the term rank of a matrix to identify the number of linearly independent rows or columns of a matrix. The usual way to determine the rank of a matrix is to find the row rank by transforming the matrix by elementary row operations into row reduced echelon form (RREF). The number of non-zero rows is then the rank r of the matrix. Consider the following 2×4 matrix

A = ⎛ 1  5   3  -8 ⎞ ;
    ⎝ 2  8  -2   1 ⎠

We can use RowReduce to put the matrix in row reduced echelon form (RREF) from which we can determine the rank of the matrix by inspection: count the number of non-zero rows.

RowReduce[A] // MatrixForm

⎛ 1  0  -17   69/2 ⎞
⎝ 0  1    4  -17/2 ⎠

In RREF form A has 2 non-zero rows and thus we can conclude that the rank of A is r = 2. Since A has 4 columns, this means that not all the columns are linearly independent. For example, column 1 is a linear combination of columns 2 and 4. We can check out this possibility by solving the following system of equations:

⎛ 1 ⎞      ⎛ 5 ⎞      ⎛ -8 ⎞
⎝ 2 ⎠ = c1 ⎝ 8 ⎠ + c2 ⎝  1 ⎠

where c1 and c2 are non-zero constants. We can use Solve to find these constants:

Solve[{5 c1 - 8 c2 == 1, 8 c1 + c2 == 2}, {c1, c2}]

{{c1 → 17/69, c2 → 2/69}}

Thus

⎛ 1 ⎞   17 ⎛ 5 ⎞    2 ⎛ -8 ⎞
⎝ 2 ⎠ = ── ⎝ 8 ⎠ + ── ⎝  1 ⎠
        69         69
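For readers without Mathematica, the same rank and coefficient computations can be sketched in Python/NumPy. Note that `numpy.linalg.matrix_rank` works from the SVD rather than row reduction, but it returns the same rank.

```python
import numpy as np

A = np.array([[1.0, 5.0, 3.0, -8.0],
              [2.0, 8.0, -2.0, 1.0]])

# matrix_rank computes the rank from the SVD
r = np.linalg.matrix_rank(A)
print(r)               # 2, matching the RowReduce result

# Write column 1 as c1*(column 2) + c2*(column 4):
# solve the 2x2 system [col2 col4].(c1, c2) = col1
M = np.column_stack((A[:, 1], A[:, 3]))
c = np.linalg.solve(M, A[:, 0])
print(c)               # approximately (17/69, 2/69)
```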

The matrix A is said to be of full rank if r = min(m, n). Thus if m = 5 and n = 3, as shown below,

A = ⎛ A11  A12  A13 ⎞
    ⎜ A21  A22  A23 ⎟
    ⎜ A31  A32  A33 ⎟
    ⎜ A41  A42  A43 ⎟
    ⎝ A51  A52  A53 ⎠

then for matrix A to have full rank, it must have 3 linearly independent columns. Of course, this also means that the 5 rows of A are not linearly independent; only 3 are. If n ≥ m, then for the matrix to have full rank it must have m linearly independent rows. In the following example m = 2, n = 4:

A = ⎛ 1  5   3  -8 ⎞ ;
    ⎝ 2  8  -2   1 ⎠

Thus A has full rank if r = min(2, 4) = 2. This is the case as the previous calculation showed. Of course, the columns of A are not linearly independent. Matrix A is also referred to as having full row rank. As we noted earlier the following matrices are defined: Aᵀ·A and A·Aᵀ. These are square matrices but their determinants are not the same. In fact

Det(A·Aᵀ) ≠ 0, where A·Aᵀ is a 2×2 system
Det(Aᵀ·A) = 0, where Aᵀ·A is a 4×4 system        (12)

A matrix is said to be rank deficient if

r < min(m, n)        (13)

Here is an example of a rank deficient matrix.

A = ⎛  1   0  -1   2 ⎞
    ⎜  1   1   1  -1 ⎟
    ⎜  0  -1  -2   3 ⎟ ;
    ⎜  5   2  -1   4 ⎟
    ⎝ -1   2   5  -8 ⎠

In this case m = 5 and n = 4. The rank is found using RowReduce:

RowReduce[A] // MatrixForm

⎛ 1  0  -1   2 ⎞
⎜ 0  1   2  -3 ⎟
⎜ 0  0   0   0 ⎟
⎜ 0  0   0   0 ⎟
⎝ 0  0   0   0 ⎠

Thus the rank of A is r = 2. Since r < min(m, n), we say A is rank deficient. Furthermore, even though A·Aᵀ and Aᵀ·A are defined, the following is true:

Det(A·Aᵀ) = 0, where A·Aᵀ is a 5×5 system
Det(Aᵀ·A) = 0, where Aᵀ·A is a 4×4 system        (14)

The implications of Eqn. (12) and Eqn. (14) will be discussed below in determining a solution to A·x = b.
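As a numerical check (a Python/NumPy sketch, not part of the original notebook), we can confirm both the rank deficiency of this matrix and the singular Gram matrices of Eqn. (14):

```python
import numpy as np

A = np.array([[ 1.0,  0.0, -1.0,  2.0],
              [ 1.0,  1.0,  1.0, -1.0],
              [ 0.0, -1.0, -2.0,  3.0],
              [ 5.0,  2.0, -1.0,  4.0],
              [-1.0,  2.0,  5.0, -8.0]])

r = np.linalg.matrix_rank(A)
print(r)                          # 2 < min(5, 4), so A is rank deficient

# Both Gram matrices are singular, as stated in Eqn. (14);
# numerically their determinants are zero to rounding error.
print(np.linalg.det(A @ A.T))     # ~ 0  (5x5)
print(np.linalg.det(A.T @ A))     # ~ 0  (4x4)
```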

Least Squares Solution: Overdetermined Systems

Suppose we have the following linear system of equations

A·x = b        (15)

where A is an m×n matrix (m > n) of rank n. This means we have more equations than unknowns. And since m > n, we say the system is overdetermined. Let us suppose that the matrix in Eqn. (15) has full column rank. Then the solution to Eqn. (15) can also be interpreted in terms of the range of A, denoted by R(A). In particular, we want to find a suitable x such that b lies in the range of A. But since b is an m×1 vector, whereas R(A) has dimension n (or less if the rank of A is less than n), it means we are trying to express b as a linear combination of vectors that span the column space of A:


⎛ b1 ⎞      ⎛ A11 ⎞      ⎛ A12 ⎞           ⎛ A1n ⎞
⎜ b2 ⎟ = x1 ⎜ A21 ⎟ + x2 ⎜ A22 ⎟ + ⋯ + xn ⎜ A2n ⎟
⎜  ⋮ ⎟      ⎜  ⋮  ⎟      ⎜  ⋮  ⎟           ⎜  ⋮  ⎟
⎝ bm ⎠      ⎝ Am1 ⎠      ⎝ Am2 ⎠           ⎝ Amn ⎠

Such a construction is true only for very special choices of b. Nevertheless, we can always seek a vector x̂ such that the residual

r = b - A·x̂        (16)

is as small as possible. One measure of smallness of r is to choose x̂ such that the sum of squares of the residual, S, is as small as possible:

S = min(rᵀ·r)        (17)

Then the vector x̂ is called the least squares solution of the overdetermined linear system (15). We prove this result next. The expression for S is

S = (b - A·x̂)ᵀ·(b - A·x̂) = ‖A·x̂ - b‖²        (18)

Expanding the RHS of (18) we get

S = bᵀ·b - x̂ᵀ·(Aᵀ·b) - (bᵀ·A)·x̂ + x̂ᵀ·(Aᵀ·A)·x̂        (19)

To minimize S with respect to x̂, we compute the derivatives ∂S/∂x̂j = 0, j = 1, 2, …, n. The result is

(Aᵀ·A)·x̂ = Aᵀ·b        (20)

Recall that the matrix A has rank n, and therefore Aᵀ also has rank n. One can then prove that Aᵀ·A is also of rank n (Noble, p. 139, 1969). Since Aᵀ·A is an n×n matrix, it therefore has full rank and its inverse exists. Thus the least squares solution to Eqn. (15) is

x̂ = (Aᵀ·A)⁻¹·Aᵀ·b        (21)

Note that A·Aᵀ is an m×m matrix but its inverse does not exist!
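Equation (21) can be exercised on a small example. The following Python/NumPy sketch uses an illustrative matrix and right-hand side (made up for this demonstration, not taken from these notes) and compares the normal-equations solution with NumPy's built-in least squares routine:

```python
import numpy as np

# Illustrative overdetermined system (m=4 > n=2, full column rank);
# this matrix and right-hand side are made up for the demonstration.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Least squares solution from the normal equations, Eqns. (20)-(21)
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's built-in least squares routine gives the same answer
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_hat)           # the two solutions agree
print(x_lstsq)
```

In practice one solves the normal equations (or, better, uses a QR or SVD based routine such as `lstsq`) rather than explicitly forming the inverse in Eqn. (21).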

Example 1: Solution of linear system when matrix A has full rank

Consider the following system of linear equations

A·x = b        (22)

where A is a 5×3 matrix given by

A = ⎛  1   1  1 ⎞
    ⎜  2  -1  2 ⎟
    ⎜ -1   4  3 ⎟ ;
    ⎜  4   2  1 ⎟
    ⎝  3  -3  4 ⎠

For this system of equations we have more equations than unknowns. If we use RowReduce to determine the row echelon form we can deduce by inspection that the rank of A is r = 3:


RowReduce[A] // MatrixForm

⎛ 1  0  0 ⎞
⎜ 0  1  0 ⎟
⎜ 0  0  1 ⎟
⎜ 0  0  0 ⎟
⎝ 0  0  0 ⎠

Since r = n, the matrix A has full rank. By this we mean that the rank of A is equal to the number of columns of A. This means that the dimension of the null space is zero, and we confirm this with the function NullSpace:

NullSpace[A]

{}
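The same full-rank, empty-null-space check can be sketched in Python/NumPy (the tolerance 1e-12 below is an illustrative choice for deciding when a singular value counts as zero):

```python
import numpy as np

# The 5x3 matrix of Example 1
A = np.array([[ 1.0,  1.0, 1.0],
              [ 2.0, -1.0, 2.0],
              [-1.0,  4.0, 3.0],
              [ 4.0,  2.0, 1.0],
              [ 3.0, -3.0, 4.0]])

r = np.linalg.matrix_rank(A)
print(r)               # 3 = n, so A has full column rank

# All singular values are well away from zero, so the null space
# contains only the zero vector (the 1e-12 tolerance is illustrative).
s = np.linalg.svd(A, compute_uv=False)
print(s.min() > 1e-12) # True: NullSpace[A] is empty
```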