1.4.2 Characterization of Least Squares Solutions
Let $y \in \mathbb{R}^m$ be a vector of observations that is related to a parameter vector $c \in \mathbb{R}^n$ by the linear relation
$$ y = Ac + \epsilon, \qquad A \in \mathbb{R}^{m \times n}, \tag{1.4.2} $$
where $A$ is a known matrix of full column rank and $\epsilon \in \mathbb{R}^m$ is a vector of random errors. We assume here that $\epsilon_i$, $i = 1:m$, has zero mean and variance $\sigma^2$, and that $\epsilon_i$ and $\epsilon_j$ are uncorrelated for $i \neq j$, i.e.,
$$ E(\epsilon_i) = 0, \qquad E(\epsilon_i \epsilon_j) = \sigma^2 \delta_{ij}, \qquad i, j = 1:m. $$
The parameter vector $c$ is to be estimated in terms of the known quantities $A$ and $y$. Let $f^T c$, with $f \in \mathbb{R}^n$ a fixed vector, be a linear functional of the parameter $c$ in (1.4.2). We say that $\theta = \theta(A, y)$ is an unbiased linear estimator of $f^T c$ if $E(\theta) = f^T c$. It is a best linear unbiased estimator if $\theta$ has the smallest variance among all such estimators. The following theorem$^{15}$ places the method of least squares on a sound theoretical basis.
$^{15}$This theorem is originally due to C. F. Gauss (1821). His contribution was somewhat neglected until rediscovered by the Russian mathematician A. A. Markov in 1912.
Theorem 1.4.1 (Gauss–Markov Theorem).
Consider a linear model (1.4.2), where $\epsilon$ is an uncorrelated random vector with zero mean and covariance matrix $V = \sigma^2 I$. Then the best linear unbiased estimator of any linear functional $f^T c$ is $f^T \hat{c}$, where
$$ \hat{c} = (A^T A)^{-1} A^T y $$
is the least squares estimator, which minimizes $\|y - Ac\|_2^2$. Furthermore, the covariance matrix of the least squares estimate $\hat{c}$ equals
$$ \sigma^2 (A^T A)^{-1}, $$
and
$$ s^2 = \frac{1}{m - n}\, \|y - A\hat{c}\|_2^2 $$
is an unbiased estimate of $\sigma^2$, i.e., $E(s^2) = \sigma^2$.
Proof. See Zelen [389, pp. 560–561].
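To make the quantities in Theorem 1.4.1 concrete, here is a minimal NumPy sketch (not from the text) that generates synthetic data from an assumed model $y = Ac + \epsilon$ with a chosen "true" parameter vector, computes the least squares estimate $\hat{c}$, the unbiased variance estimate $s^2$, and the estimated covariance matrix $s^2 (A^T A)^{-1}$. All sizes, data, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and "true" parameters (assumptions, not from the text).
m, n = 50, 3
sigma = 0.1
c_true = np.array([1.0, -2.0, 0.5])

A = rng.standard_normal((m, n))          # known full-column-rank matrix
eps = sigma * rng.standard_normal(m)     # uncorrelated errors, zero mean, variance sigma^2
y = A @ c_true + eps                     # observations, model (1.4.2)

# Least squares estimate c_hat = (A^T A)^{-1} A^T y, computed via lstsq.
c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Unbiased estimate of sigma^2:  s^2 = ||y - A c_hat||_2^2 / (m - n).
r = y - A @ c_hat
s2 = (r @ r) / (m - n)

# Estimated covariance matrix of c_hat:  s^2 (A^T A)^{-1}.
cov_c_hat = s2 * np.linalg.inv(A.T @ A)

print("c_hat =", c_hat)
print("s^2   =", s2, " (true sigma^2 =", sigma**2, ")")
print("diag of estimated covariance:", np.diag(cov_c_hat))
```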
The set of all least squares solutions can also be characterized geometrically. For this purpose we introduce two fundamental subspaces of $\mathbb{R}^m$, the range of $A$ and the null space of $A^T$, defined by
$$ \mathcal{R}(A) = \{z \in \mathbb{R}^m \mid z = Ax,\ x \in \mathbb{R}^n\}, \qquad \mathcal{N}(A^T) = \{y \in \mathbb{R}^m \mid A^T y = 0\}. \tag{1.4.7} $$
If $z \in \mathcal{R}(A)$ and $y \in \mathcal{N}(A^T)$, then $z^T y = x^T A^T y = 0$, which shows that $\mathcal{N}(A^T)$ is the orthogonal complement of $\mathcal{R}(A)$.
By the Gauss–Markov theorem any least squares solution to an overdetermined linear system $Ax = b$ satisfies the normal equations
$$ A^T A x = A^T b. \tag{1.4.8} $$
The normal equations are always consistent, since the right-hand side satisfies $A^T b \in \mathcal{R}(A^T) = \mathcal{R}(A^T A)$. Therefore, a least squares solution always exists, although it may not be unique.
Theorem 1.4.2.
The vector $x$ minimizes $\|b - Ax\|_2$ if and only if the residual vector $r = b - Ax$ is orthogonal to $\mathcal{R}(A)$ or, equivalently,
$$ A^T(b - Ax) = 0. \tag{1.4.9} $$

Proof. Let $x$ be a vector for which $A^T(b - Ax) = 0$. For any $y \in \mathbb{R}^n$ it holds that $b - Ay = (b - Ax) + A(x - y)$. Squaring this and using (1.4.9), we obtain
$$ \|b - Ay\|_2^2 = \|b - Ax\|_2^2 + \|A(x - y)\|_2^2 \geq \|b - Ax\|_2^2, $$
where equality holds only if $A(x - y) = 0$.
Now assume that $A^T(b - Ax) = z \neq 0$, and let $\tilde{x} = x + \epsilon z$ for a scalar $\epsilon > 0$. Then $b - A\tilde{x} = (b - Ax) - \epsilon A z$, and
$$ \|b - A\tilde{x}\|_2^2 = \|b - Ax\|_2^2 - 2\epsilon\,(Az)^T(b - Ax) + \epsilon^2 \|Az\|_2^2 = \|b - Ax\|_2^2 - 2\epsilon\,\|z\|_2^2 + \epsilon^2 \|Az\|_2^2, $$
since $(Az)^T(b - Ax) = z^T A^T(b - Ax) = z^T z$. For sufficiently small $\epsilon > 0$ the right-hand side is smaller than $\|b - Ax\|_2^2$, so $x$ does not minimize $\|b - Ax\|_2$.
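The optimality condition (1.4.9) is easy to check numerically. The following sketch (with randomly generated $A$ and $b$; an illustration only, not part of the text) verifies that the solution returned by `numpy.linalg.lstsq` satisfies $A^T(b - Ax) \approx 0$ and that perturbing $x$ only increases $\|b - Ax\|_2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative overdetermined system (assumed sizes and data).
m, n = 30, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x

# Condition (1.4.9): the residual is orthogonal to R(A), i.e. A^T r = 0.
print("||A^T r|| =", np.linalg.norm(A.T @ r))   # tiny, at rounding level

# Any other choice y gives a residual at least as large (Theorem 1.4.2).
for _ in range(3):
    y = x + 0.1 * rng.standard_normal(n)
    print(np.linalg.norm(b - A @ y) >= np.linalg.norm(r))   # True
```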
From Theorem 1.4.2 it follows that a least squares solution $x$ decomposes the right-hand side $b$ into two orthogonal components
$$ b = Ax + r, \qquad r = b - Ax \in \mathcal{N}(A^T), \qquad Ax \in \mathcal{R}(A). \tag{1.4.10} $$
This geometric interpretation is illustrated in Figure 1.4.1. Note that although the solution $x$ to the least squares problem may not be unique, the decomposition (1.4.10) always is unique.
Figure 1.4.1. Geometric characterization of the least squares solution.

We now give a necessary and sufficient condition for the least squares solution to be unique.
Theorem 1.4.3.
The matrix $A^T A$ is positive definite, and hence nonsingular, if and only if the columns of $A$ are linearly independent, that is, when $\operatorname{rank}(A) = n$. In this case the least squares solution $x$ is unique and given by
$$ x = (A^T A)^{-1} A^T b. \tag{1.4.11} $$

Proof. If the columns of $A$ are linearly independent, then $x \neq 0$ implies $Ax \neq 0$. Therefore $x \neq 0$ also implies $x^T A^T A x = \|Ax\|_2^2 > 0$, and hence $A^T A$ is positive definite. On the other hand, if the columns are linearly dependent, then for some $x_0 \neq 0$ we have $Ax_0 = 0$. Then $x_0^T A^T A x_0 = 0$, and therefore $A^T A$ is not positive definite. When $A^T A$ is positive definite it is also nonsingular and (1.4.11) follows.
When $A$ has full column rank, $A^T A$ is symmetric positive definite and the normal equations can be solved by computing the Cholesky factorization $A^T A = R^T R$, where $R$ is upper triangular. The normal equations $R^T R x = A^T b$ then decompose into the two triangular systems
$$ R^T z = A^T b, \qquad Rx = z. $$
The first system is lower triangular, and $z$ is computed by forward substitution. Then $x$ is computed from the second, upper triangular system by back substitution. For many practical problems this method of normal equations is an adequate solution method, although its numerical stability is not the best.
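As an illustration, here is a short SciPy sketch of the method of normal equations just described: it forms $A^T A$, computes its Cholesky factor, and solves the two triangular systems by forward and back substitution. The matrix and right-hand side are made-up examples; `scipy.linalg.cholesky` and `scipy.linalg.solve_triangular` are standard SciPy routines.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(2)

# Illustrative full-column-rank problem (assumed data).
m, n = 20, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Form the normal equations A^T A x = A^T b.
AtA = A.T @ A
Atb = A.T @ b

# Cholesky factorization A^T A = R^T R, with R upper triangular.
R = cholesky(AtA, lower=False)

# Forward substitution for R^T z = A^T b, then back substitution for R x = z.
z = solve_triangular(R.T, Atb, lower=True)
x = solve_triangular(R, z, lower=False)

# Agrees (to rounding) with a solver based on orthogonalization.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))
```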
Example 1.4.1.
The comet Tentax, discovered in 1968, is supposed to move within the solar system. The following observations of its position in a certain polar coordinate system have been made: distances $r_i$ and angles $\phi_i$, $i = 1:m$. By Kepler's first law the comet should move in a plane orbit of elliptic or hyperbolic form, if the perturbations from planets are neglected. Then the coordinates satisfy
$$ r = p/(1 - e \cos \phi), $$
where $p$ is a parameter and $e$ the eccentricity. We want to estimate $p$ and $e$ by the method of least squares from the given observations.
We first note that if the relationship is rewritten as
$$ 1/p - (e/p)\cos\phi = 1/r, $$
it becomes linear in the parameters $x_1 = 1/p$ and $x_2 = e/p$. We then get the linear system $Ax = b$, where the $i$th row of $A$ is $(1,\ -\cos\phi_i)$ and $b_i = 1/r_i$. The least squares solution is $x = (0.6886,\ 0.4839)^T$, giving $p = 1/x_1 = 1.4522$ and finally $e = p\,x_2 = 0.7027$.
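A sketch of this computation in NumPy follows. Since the observation table is not reproduced above, the arrays `phi_deg` and `r_obs` below are placeholder values chosen only to show the mechanics of setting up the linearized system and recovering $p$ and $e$; they are not the data of Example 1.4.1.

```python
import numpy as np

# Placeholder observations (NOT the data of Example 1.4.1): angles in degrees
# and distances r, chosen only to illustrate the computation.
phi_deg = np.array([45.0, 70.0, 90.0, 110.0, 135.0])
r_obs   = np.array([2.50, 1.90, 1.50, 1.20, 1.00])

phi = np.deg2rad(phi_deg)

# Linearized model: 1/p - (e/p) cos(phi) = 1/r, i.e. A x = b with
# x_1 = 1/p, x_2 = e/p, rows (1, -cos(phi_i)) and b_i = 1/r_i.
A = np.column_stack([np.ones_like(phi), -np.cos(phi)])
b = 1.0 / r_obs

x, *_ = np.linalg.lstsq(A, b, rcond=None)

p = 1.0 / x[0]     # orbit parameter
e = p * x[1]       # eccentricity
print("p =", p, " e =", e)
```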
By (1.4.10), if $x$ is a least squares solution, then $Ax$ is the orthogonal projection of $b$ onto $\mathcal{R}(A)$. Thus orthogonal projections play a central role in least squares problems. In general, a matrix $P_1 \in \mathbb{R}^{m \times m}$ is called a projector onto a subspace $S \subset \mathbb{R}^m$ if and only if it holds that
$$ P_1 v = v \quad \forall v \in S, \qquad P_1^2 = P_1. $$
An arbitrary vector $v \in \mathbb{R}^m$ can then be decomposed as
$$ v = P_1 v + P_2 v \equiv v_1 + v_2, \qquad P_2 = I - P_1. $$
In particular, if $P_1$ is symmetric, $P_1^T = P_1$, we have $P_1^T P_2 v = P_1 (I - P_1) v = 0$ for all $v \in \mathbb{R}^m$, and it follows that $P_1^T P_2 = 0$. Hence $v_1^T v_2 = v^T P_1^T P_2 v = 0$ for all $v \in \mathbb{R}^m$, i.e., $v_2 \perp v_1$. In this case $P_1$ is the orthogonal projector onto $S$, and $P_2 = I - P_1$ is the orthogonal projector onto $S^\perp$.
In the full column rank case, $\operatorname{rank}(A) = n$, of the least squares problem, the residual $r = b - Ax$ can be written $r = b - P_{\mathcal{R}(A)} b$, where
$$ P_{\mathcal{R}(A)} = A (A^T A)^{-1} A^T \tag{1.4.13} $$
is the orthogonal projector onto $\mathcal{R}(A)$.
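As a small illustration (with an assumed random matrix, not from the text), the sketch below forms $P_{\mathcal{R}(A)} = A(A^T A)^{-1} A^T$ explicitly and checks the projector properties $P^2 = P$ and $P^T = P$, as well as the identity $Ax = Pb$ for the least squares solution $x$. Forming $P$ explicitly is done only for illustration; in practice one applies the projection without building the $m \times m$ matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative full-column-rank matrix and right-hand side (assumed data).
m, n = 15, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Orthogonal projector onto R(A):  P = A (A^T A)^{-1} A^T   (eq. (1.4.13)).
P = A @ np.linalg.solve(A.T @ A, A.T)

print(np.allclose(P @ P, P))      # idempotent: P^2 = P
print(np.allclose(P.T, P))        # symmetric:  P^T = P

# For the least squares solution x, A x is the projection of b onto R(A).
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(A @ x, P @ b))                     # True
print(np.allclose(b - A @ x, (np.eye(m) - P) @ b))   # residual = (I - P) b
```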
If $\operatorname{rank}(A) < n$, then $A$ has a nontrivial null space. If $\hat{x}$ is any vector that minimizes $\|b - Ax\|_2$, then the set of all least squares solutions is
$$ S = \{x = \hat{x} + y \mid y \in \mathcal{N}(A)\}. \tag{1.4.14} $$
In this set there is a unique solution of minimum norm, characterized by $x \perp \mathcal{N}(A)$, which is called the pseudoinverse solution.
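To illustrate the rank-deficient case, the sketch below (with a deliberately constructed rank-deficient matrix; all data are assumptions for illustration) compares two least squares solutions: the minimum-norm (pseudoinverse) solution computed by `numpy.linalg.pinv`, and another solution obtained by adding a null-space vector. Both have the same residual, but only the pseudoinverse solution is orthogonal to $\mathcal{N}(A)$ and has minimum norm.

```python
import numpy as np

rng = np.random.default_rng(4)

# Rank-deficient example: the third column is the sum of the first two,
# so rank(A) = 2 < n = 3 and N(A) is spanned by (1, 1, -1)^T.
m = 10
B = rng.standard_normal((m, 2))
A = np.column_stack([B[:, 0], B[:, 1], B[:, 0] + B[:, 1]])
b = rng.standard_normal(m)

# Minimum-norm least squares (pseudoinverse) solution.
x_min = np.linalg.pinv(A) @ b

# Any other least squares solution: add a null-space vector y in N(A).
y = np.array([1.0, 1.0, -1.0])
x_other = x_min + 0.7 * y

# Same residual norm for both (both are least squares solutions)...
print(np.allclose(np.linalg.norm(b - A @ x_min),
                  np.linalg.norm(b - A @ x_other)))
# ...but the pseudoinverse solution is orthogonal to N(A) and has smaller norm.
print(abs(x_min @ y) < 1e-10)
print(np.linalg.norm(x_min) < np.linalg.norm(x_other))
```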