A for j := 1 to n assign jth sample to cluster S update S with respect to the new clustering ; A
S. Busygin et al. Computers Operations Research 35 2008 2964 – 2987 2973
but considers all the submatrices formed by them. The algorithm is based on algebraic properties of the matrix of residues.
For a given clustering of features F
1
, F
2
, . . . , F
q
, introduce a feature cluster indicator matrix F = f
ik m×q
such that f
ik
= |F
k
|
−12
if i ∈ F
k
and f
ik
= 0 otherwise. Also, for a given clustering of samples S
1
, S
2
, . . . , S
r
, introduce a sample cluster indicator matrix S = s
j k n×r
such that s
j k
= |S
k
|
−12
if j ∈ S
k
and s
j k
= 0 otherwise. Notice that these matrices are orthonormal, that is, all columns are orthogonal to each other and have unit length. Now,
let H = h
ij m×n
be the residue matrix. There are two choices for h
ij
definition. It may be defined similar to 4: h
ij
= a
ij
−
r ik
−
c j ℓ
+
kℓ
, 10
where i ∈ F
ℓ
, j ∈ S
k
,
r
and
c
are defined as in 2 and 3, and
kℓ
is the average of the co-cluster S
k
, F
ℓ
:
kℓ
=
i∈F
ℓ
j ∈S
k
a
ij
|F
ℓ
S
k
| .
Alternatively, h
ij
may be defined just as the difference between a
ij
and the co-cluster average: h
ij
= a
ij
−
kℓ
. 11
By direct algebraic manipulations it can be shown that H = A − F F
T
ASS
T
12 in case of 11 and
H = I − F F
T
AI − SS
T
13 in case of 10.
The method tries to minimize H
2
using an iterative process such that on each iteration a current co-clustering is updated so that H
2
, at least, does not increase. The authors point out that finding the global minimum for H
2
over all possible co-clusterings would lead to an NP-hard problem. There are two types of clustering updates used: batch
when all samples or features may be moved between clusters at one time and incremental one sample or one feature is moved at a time. In case of 11 the batch algorithm works as follows:
Algorithm 1 Coclus
H 1
.
Input: data matrix A
, number of sample clusters r, number of feature clusters q.
Output: clustering indicators S and F
.