Preliminary draft © 2008 Cambridge UP
18 Matrix decompositions and latent semantic indexing
Exercise 18.5
Verify that the SVD of the matrix in Equation (18.12) is

\[
U = \begin{pmatrix} -0.816 & 0.000 \\ -0.408 & -0.707 \\ -0.408 & 0.707 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} 1.732 & 0.000 \\ 0.000 & 1.000 \end{pmatrix} \quad \text{and} \quad
V^T = \begin{pmatrix} -0.707 & -0.707 \\ 0.707 & -0.707 \end{pmatrix}, \tag{18.13}
\]

by verifying all of the properties in the statement of Theorem 18.3.
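The properties to be verified (orthonormal columns of $U$ and $V$, non-negative singular values in decreasing order) can also be checked numerically. A minimal NumPy sketch over the matrices displayed above; the loose tolerance allows for the three-decimal rounding of the entries:

```python
import numpy as np

U = np.array([[-0.816,  0.000],
              [-0.408, -0.707],
              [-0.408,  0.707]])
Sigma = np.array([[1.732, 0.000],
                  [0.000, 1.000]])
Vt = np.array([[-0.707, -0.707],
               [ 0.707, -0.707]])

# Columns of U are orthonormal: U^T U should be the 2x2 identity.
print(np.allclose(U.T @ U, np.eye(2), atol=1e-2))   # True
# Rows of V^T are orthonormal: V^T (V^T)^T should be the identity.
print(np.allclose(Vt @ Vt.T, np.eye(2), atol=1e-2))  # True
# Singular values are non-negative and in decreasing order.
print(Sigma[0, 0] >= Sigma[1, 1] >= 0)               # True
```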
Exercise 18.6
Suppose that C is a term-document incidence matrix. What do the entries of $C^TC$ represent?
Exercise 18.7
Let

\[
C = \begin{pmatrix} 0 & 2 & 1 \\ 0 & 3 & 0 \\ 2 & 1 & 0 \end{pmatrix} \tag{18.14}
\]

be a term-document matrix whose entries are term frequencies; thus term 1 occurs 2 times in document 2 and once in document 3. Compute $CC^T$; observe that its entries are largest where two terms have their most frequent occurrences together in the same document.
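The computation asked for here is a single matrix product; a NumPy sketch using the term-document matrix from (18.14):

```python
import numpy as np

# Term-document matrix from (18.14): rows are terms, columns are documents.
C = np.array([[0, 2, 1],
              [0, 3, 0],
              [2, 1, 0]])

# Entry (i, j) of C C^T is the dot product of the term-i and term-j rows:
# it is large when two terms co-occur frequently across documents.
print(C @ C.T)
# [[5 6 2]
#  [6 9 3]
#  [2 3 5]]
```

The largest off-diagonal entry (6) pairs terms 1 and 2, whose co-occurrence is dominated by document 2, where both have their highest frequencies.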
18.3 Low-rank approximations
We next state a matrix approximation problem that at first seems to have little to do with information retrieval. We describe a solution to this matrix
problem using singular-value decompositions, then develop its application to information retrieval.
Given an $M \times N$ matrix $C$ and a positive integer $k$, we wish to find an $M \times N$ matrix $C_k$ of rank at most $k$, so as to minimize the *Frobenius norm* of the matrix difference $X = C - C_k$, defined to be

\[
\|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} X_{ij}^2}. \tag{18.15}
\]
Thus, the Frobenius norm of $X$ measures the discrepancy between $C_k$ and $C$; our goal is to find a matrix $C_k$ that minimizes this discrepancy, while constraining $C_k$ to have rank at most $k$. If $r$ is the rank of $C$, clearly $C_r = C$ and the Frobenius norm of the discrepancy is zero in this case. When $k$ is far smaller than $r$, we refer to $C_k$ as a *low-rank approximation*.
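Equation (18.15) is a sum of squared entries under a square root, and can be computed directly from the definition; a minimal sketch (the matrix $X$ is an arbitrary example, not from the text):

```python
import numpy as np

X = np.array([[1.0, -2.0],
              [0.0,  2.0]])

# Frobenius norm per (18.15): square root of the sum of squared entries.
frob = np.sqrt((X ** 2).sum())
print(frob)                       # 3.0
# Same value via NumPy's built-in matrix norm.
print(np.linalg.norm(X, 'fro'))   # 3.0
```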
The singular value decomposition can be used to solve the low-rank matrix approximation problem. We then derive from it an application to approximating term-document matrices. We invoke the following three-step procedure to this end:
Figure 18.2  Illustration of low-rank approximation using the singular-value decomposition, depicting $C_k = U \Sigma_k V^T$. The dashed boxes indicate the matrix entries affected by "zeroing out" the smallest singular values.
1. Given $C$, construct its SVD in the form shown in (18.9); thus, $C = U\Sigma V^T$.

2. Derive from $\Sigma$ the matrix $\Sigma_k$ formed by replacing by zeros the $r-k$ smallest singular values on the diagonal of $\Sigma$.

3. Compute and output $C_k = U\Sigma_k V^T$ as the rank-$k$ approximation to $C$.

The rank of $C_k$ is at most $k$: this follows from the fact that $\Sigma_k$ has at most $k$ non-zero values. Next, we recall the intuition of Example 18.1: the effect of small eigenvalues on matrix products is small. Thus, it seems plausible that replacing these small eigenvalues by zero will not substantially alter the product, leaving it "close" to $C$. The following theorem due to Eckart and Young tells us that, in fact, this procedure yields the matrix of rank $k$ with the lowest possible Frobenius error.
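The three-step procedure above can be sketched directly in NumPy (the matrix $C$ here is an arbitrary example):

```python
import numpy as np

def low_rank_approx(C, k):
    """Rank-k approximation of C via the truncated SVD (steps 1-3 above)."""
    # Step 1: construct the SVD, C = U Sigma V^T.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Step 2: replace by zeros all but the k largest singular values.
    s_k = s.copy()
    s_k[k:] = 0.0
    # Step 3: recombine to obtain C_k = U Sigma_k V^T.
    return U @ np.diag(s_k) @ Vt

C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
C1 = low_rank_approx(C, 1)
print(np.linalg.matrix_rank(C1))  # 1
```

Since $\Sigma_k$ has at most $k$ non-zero diagonal entries, the returned matrix has rank at most $k$, as `matrix_rank` confirms.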
Theorem 18.4.

\[
\min_{Z \,\mid\, \operatorname{rank}(Z)=k} \|C - Z\|_F = \|C - C_k\|_F = \sigma_{k+1}. \tag{18.16}
\]
Recalling that the singular values are in decreasing order $\sigma_1 \geq \sigma_2 \geq \cdots$, we learn from Theorem 18.4 that $C_k$ is the best rank-$k$ approximation to $C$, incurring an error (measured by the Frobenius norm of $C - C_k$) equal to $\sigma_{k+1}$. Thus the larger $k$ is, the smaller this error (and in particular, for $k = r$, the error is zero since $\Sigma_r = \Sigma$; provided $r < M, N$, then $\sigma_{r+1} = 0$ and thus $C_r = C$).
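The error bound of Theorem 18.4 can be checked numerically. A sketch; the matrix is an arbitrary rank-2 example with $k = r - 1$, so exactly one singular value is discarded:

```python
import numpy as np

# An arbitrary example matrix of rank 2, with singular values 3 and 1.
C = np.array([[3.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 1
s_k = s.copy()
s_k[k:] = 0.0           # zero out the r - k smallest singular values
C_k = U @ np.diag(s_k) @ Vt

# The Frobenius error ||C - C_k||_F equals the first discarded
# singular value, sigma_{k+1} (index k with 0-based indexing).
err = np.linalg.norm(C - C_k, 'fro')
print(np.isclose(err, s[k]))  # True
```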
To derive further insight into why the process of truncating the smallest $r-k$ singular values in $\Sigma$ helps generate a rank-$k$ approximation of low error, we examine the form of $C_k$:

\[
C_k = U\Sigma_k V^T \tag{18.17}
\]
\[
= U \,\operatorname{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)\, V^T \tag{18.18}
\]
\[
= \sum_{i=1}^{k} \sigma_i \vec{u}_i \vec{v}_i^{\,T}, \tag{18.19}
\]
where $\vec{u}_i$ and $\vec{v}_i$ are the $i$th columns of $U$ and $V$, respectively. Thus, $\vec{u}_i \vec{v}_i^{\,T}$ is a rank-1 matrix, so that we have just expressed $C_k$ as the sum of $k$ rank-1 matrices each weighted by a singular value. As $i$ increases, the contribution of the rank-1 matrix $\vec{u}_i \vec{v}_i^{\,T}$ is weighted by a sequence of shrinking singular values $\sigma_i$.
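Equation (18.19) can be verified directly: summing the $k$ weighted outer products reproduces the matrix product $U\Sigma_k V^T$. A NumPy sketch (the matrix is an arbitrary example):

```python
import numpy as np

C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 1

# C_k as the sum of k rank-1 matrices sigma_i * u_i * v_i^T, per (18.19).
# Row i of Vt is the i-th column of V.
C_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

# The same matrix via the product U Sigma_k V^T of (18.17).
s_k = s.copy()
s_k[k:] = 0.0
C_k_prod = U @ np.diag(s_k) @ Vt

print(np.allclose(C_k_sum, C_k_prod))  # True
```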
Exercise 18.8
Compute a rank 1 approximation $C_1$ to the matrix $C$ in Example 18.12, using the SVD as in Exercise 18.13. What is the Frobenius norm of the error of this approximation?
Exercise 18.9
Consider now the computation in Exercise 18.8. Following the schematic in Figure 18.2, notice that for a rank 1 approximation we have $\sigma_1$ being a scalar. Denote by $U_1$ the first column of $U$ and by $V_1$ the first column of $V$. Show that the rank-1 approximation to $C$ can then be written as $U_1 \sigma_1 V_1^T = \sigma_1 U_1 V_1^T$.
Exercise 18.10
Exercise 18.9 can be generalized to rank $k$ approximations: we let $U'_k$ and $V'_k$ denote the "reduced" matrices formed by retaining only the first $k$ columns of $U$ and $V$, respectively. Thus $U'_k$ is an $M \times k$ matrix while $V'^T_k$ is a $k \times N$ matrix. Then, we have

\[
C_k = U'_k \Sigma'_k V'^T_k, \tag{18.20}
\]

where $\Sigma'_k$ is the square $k \times k$ submatrix of $\Sigma_k$ with the singular values $\sigma_1, \ldots, \sigma_k$ on the diagonal. The primary advantage of using (18.20) is to eliminate a lot of redundant columns of zeros in $U$ and $V$, thereby explicitly eliminating multiplication by columns that do not affect the low-rank approximation; this version of the SVD is sometimes known as the reduced SVD or truncated SVD and is a computationally simpler representation from which to compute the low-rank approximation.
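The equivalence between the reduced form (18.20) and the full product $U\Sigma_k V^T$ can be checked numerically; a sketch (the matrix is an arbitrary example):

```python
import numpy as np

C = np.array([[0.0, 2.0, 1.0],
              [0.0, 3.0, 0.0],
              [2.0, 1.0, 0.0]])
U, s, Vt = np.linalg.svd(C)
k = 2

# Reduced factors per (18.20): first k columns of U and V, and the
# square k x k diagonal matrix of the k largest singular values.
U_k = U[:, :k]            # M x k
Sigma_k = np.diag(s[:k])  # k x k
Vt_k = Vt[:k, :]          # k x N
C_k_reduced = U_k @ Sigma_k @ Vt_k

# Full form: zero out the trailing singular values in the M x N product.
s_full = s.copy()
s_full[k:] = 0.0
C_k_full = U @ np.diag(s_full) @ Vt

print(np.allclose(C_k_reduced, C_k_full))  # True
```

The reduced product touches only $Mk + k^2 + kN$ entries rather than the full factors, which is the computational saving the text describes.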
For the matrix $C$ in Example 18.3, write down both $\Sigma_2$ and $\Sigma'_2$.
18.4 Latent semantic indexing