BMAA3-2-27. 333KB Jun 04 2011 12:03:33 AM

Bulletin of Mathematical Analysis and Applications
ISSN: 1821-1291, URL: http://www.bmathaa.org
Volume 3 Issue 2(2011), Pages 269-277.

STRASSENโ€™S MATRIX MULTIPLICATION ALGORITHM FOR
MATRICES OF ARBITRARY ORDER

(COMMUNICATED BY MARTIN HERMANN)

IVO HEDTKE
Abstract. The well known algorithm of Volker Strassen for matrix multiplication can only be used for (๐‘š2๐‘˜ ร— ๐‘š2๐‘˜ ) matrices. For arbitrary (๐‘› ร— ๐‘›)
matrices one has to add zero rows and columns to the given matrices to use
Strassenโ€™s algorithm. Strassen gave a strategy of how to set ๐‘š and ๐‘˜ for
arbitrary ๐‘› to ensure ๐‘› โ‰ค ๐‘š2๐‘˜ . In this paper we study the number ๐‘‘ of additional zero rows and columns and the in๏ฌ‚uence on the number of ๏ฌ‚ops used
by the algorithm in the worst case (๐‘‘ = ๐‘›/16), best case (๐‘‘ = 1) and in the
average case (๐‘‘ โ‰ˆ ๐‘›/48). The aim of this work is to give a detailed analysis
of the number of additional zero rows and columns and the additional work
caused by Strassenโ€™s bad parameters. Strassen used the parameters ๐‘š and
๐‘˜ to show that his matrix multiplication algorithm needs less than 4.7๐‘›log2 7
๏ฌ‚ops. We can show in this paper, that these parameters cause an additional
work of approximately 20 % in the worst case in comparison to the optimal

strategy for the worst case. This is the main reason for the search for better
parameters.

1. Introduction
In his paper โ€œGaussian Elimination is not Optimal โ€ ([14]) Volker Strassen
developed a recursive algorithm (we will call it ๐’ฎ) for multiplication of square
matrices of order ๐‘š2๐‘˜ . The algorithm itself is described below. Further details can
be found in [8, p. 31].
Before we start with our analysis of the parameters of Strassenโ€™s algorithm
we will have a short look on the history of fast matrix multiplication. The naive
algorithm for matrix multiplication is an ๐’ช(๐‘›3 ) algorithm. In 1969 Strassen
showed that there is an ๐’ช(๐‘›2.81 ) algorithm for this problem. Shmuel Winograd
optimized Strassenโ€™s algorithm. While the Strassen-Winograd algorithm is a
variant that is always implemented (for example in the famous GEMMW package),
there are faster ones (in theory) that are impractical to implement. The fastest
known algorithm, devised in 1987 by Don Coppersmith and Winograd, runs in
1991 Mathematics Subject Classi๏ฌcation. Primary 65F30; Secondary 68Q17.
Key words and phrases. Fast Matrix Multiplication, Strassen Algorithm.
c
โƒ2011

Universiteti i Prishtineฬˆs, Prishtineฬˆ, Kosoveฬˆ.
The author was supported by the Studienstiftung des Deutschen Volkes.
Submitted February 2, 2011. Published March 2, 2011.
269

270

I. HEDTKE

๐’ช(๐‘›2.38 ) time. There is also an interesting group-theoretic approach to fast matrix
multiplication from Henry Cohn and Christopher Umans, see [4], [5] and [13].
Most researchers believe that an optimal algorithm with ๐’ช(๐‘›2 ) runtime exists, since
then no further progress was made in ๏ฌnding one.
Because modern architectures have complex memory hierarchies and increasing
parallelism, performance has become a complex tradeo๏ฌ€, not just a simple matter
of counting ๏ฌ‚ops (in this article one ๏ฌ‚op means one ๏ฌ‚oating-point operation, that
means one addition is a ๏ฌ‚op and one multiplication is one ๏ฌ‚op, too). Algorithms
which make use of this technology were described by Paolo Dโ€™Alberto and
Alexandru Nicolau in [1]. An also well known method is Tiling: The normal
algorithm can be speeded up by a factor of two by using a six loop implementation

that blocks submatrices so that the data passes through the L1 Cache only once.
1.1. The algorithm. Let ๐ด and ๐ต be (๐‘š2๐‘˜ ร—๐‘š2๐‘˜ ) matrices. To compute ๐ถ := ๐ด๐ต
let
]
]
[
[
๐ต11 ๐ต12
๐ด11 ๐ด12
,
and ๐ต =
๐ด=
๐ต21 ๐ต22
๐ด21 ๐ด22
where ๐ด๐‘–๐‘— and ๐ต๐‘–๐‘— are matrices of order ๐‘š2๐‘˜โˆ’1 . With the following auxiliary
matrices
๐ป1 := (๐ด11 + ๐ด22 )(๐ต11 + ๐ต22 )

๐ป2 := (๐ด21 + ๐ด22 )๐ต11


๐ป3 := ๐ด11 (๐ต12 โˆ’ ๐ต22 )

๐ป4 := ๐ด22 (๐ต21 โˆ’ ๐ต11 )

๐ป5 := (๐ด11 + ๐ด12 )๐ต22

๐ป6 := (๐ด21 โˆ’ ๐ด11 )(๐ต11 + ๐ต12 )

๐ป7 := (๐ด12 โˆ’ ๐ด22 )(๐ต21 + ๐ต22 )
we get
๐ถ=

[

๐ป 1 + ๐ป4 โˆ’ ๐ป5 + ๐ป 7
๐ป2 + ๐ป4

]
๐ป3 + ๐ป 5
.

๐ป1 + ๐ป3 โˆ’ ๐ป 2 + ๐ป6

This leads to recursive computation. In the last step of the recursion the products
of the (๐‘š ร— ๐‘š) matrices are computed with the naive algorithm (straight forward
implementation with three for-loops, we will call it ๐’ฉ ).
1.2. Properties of the algorithm. The algorithm ๐’ฎ needs (see [14])
๐น๐’ฎ (๐‘š, ๐‘˜) := 7๐‘˜ ๐‘š2 (2๐‘š + 5) โˆ’ 4๐‘˜ 6๐‘š2
๏ฌ‚ops to compute the product of two square matrices of order ๐‘š2๐‘˜ . The naive
algorithm ๐’ฉ needs ๐‘›3 multiplications and ๐‘›3 โˆ’๐‘›2 additions to compute the product
of two (๐‘› ร— ๐‘›) matrices. Therefore ๐น๐’ฉ (๐‘›) = 2๐‘›3 โˆ’ ๐‘›2 . In the case ๐‘› = 2๐‘ the
algorithm ๐’ฎ is better than ๐’ฉ if ๐น๐’ฎ (1, ๐‘) < ๐น๐’ฉ (2๐‘ ), which is the case i๏ฌ€ ๐‘ โฉพ 10.
But if we use algorithm ๐’ฎ only for matrices of order at least 210 = 1024, we get a
new problem:
๐‘
๐‘
Lemma 1.1. The algorithm ๐’ฎ needs 17
3 (7 โˆ’ 4 ) units of memory (we write โ€œuomโ€
in short) (number of floats or doubles) to compute the product of two (2๐‘ ร— 2๐‘ )
matrices.


Proof. Let ๐‘€ (๐‘›) be the number of uom used by ๐’ฎ to compute the product of
matrices of order ๐‘›. The matrices ๐ด๐‘–๐‘— , ๐ต๐‘–๐‘— and ๐ปโ„“ need 15(๐‘›/2)2 uom. During the
computation of the auxiliary matrices ๐ปโ„“ we need 7๐‘€ (๐‘›/2) uom and 2(๐‘›/2)2 uom

STRASSENโ€™S ALGORITHM FOR MATRICES OF ARBITRARY SIZE

271

as input arguments for the recursive calls of ๐’ฎ. Therefore we get ๐‘€ (๐‘›) = 7๐‘€ (๐‘›/2)+
๐‘
๐‘
โ–ก
(17/4)๐‘›2 . Together with ๐‘€ (1) = 0 this yields to ๐‘€ (2๐‘ ) = 17
3 (7 โˆ’ 4 ).
As an example, if we compute the product of two (210 ร—210 ) matrices (represented
10
as double arrays) with ๐’ฎ we need 8 โ‹… 17
โˆ’ 410 ) bytes, i.e. 12.76 gigabytes of
3 (7
memory. That is an enormous amount of RAM for such a problem instance. Brice

Boyer et al. ([3]) solved this problem with fully in-place schedules of StrassenWinogradโ€™s algorithm (see the following paragraph), if the input matrices can be
overwritten.
Shmuel Winograd optimized Strassenโ€™s algorithm. The Strassen-Winograd algorithm (described in [11]) needs only 15 additions and subtractions, whereas
๐’ฎ needs 18. Winograd had also shown (see [15]), that the minimum number of
multiplications required to multiply 2 ร— 2 matrices is 7. Furthermore, Robert
Probert ([12]) showed that 15 additive operations are necessary and su๏ฌƒcient to
multiply two 2 ร— 2 matrices with 7 multiplications.
Because of the bad properties of ๐’ฎ with full recursion and large matrices, one
can study the idea to use only one step of recursion. If ๐‘› is even and we use one step
of recursion of ๐’ฎ (for the remaining products we use ๐’ฉ ) the ratio of this operation
count to that required by ๐’ฉ is (see [9])
7๐‘›3 + 11๐‘›2
8๐‘›3 โˆ’ 4๐‘›2

๐‘›โ†’โˆž

โˆ’โˆ’โˆ’โˆ’โ†’

7
.

8

Therefore the multiplication of two su๏ฌƒciently large matrices using ๐’ฎ costs approximately 12.5 % less than using ๐’ฉ .
Using the technique of stopping the recursion in the Strassen-Winograd algorithm early, there are well known implementations, as for example
โˆ™ on the Cray-2 from David Bailey ([2]),
โˆ™ GEMMW from Douglas et al. ([7]) and
โˆ™ a routine in the IBM ESSL library routine ([10]).
1.3. The aim of this work. Strassenโ€™s algorithm can only be used for (๐‘š2๐‘˜ ร—
๐‘š2๐‘˜ ) matrices. For arbitrary (๐‘›ร—๐‘›) matrices one has to add zero rows and columns
to the given matrices (see the next section) to use Strassenโ€™s algorithm. Strassen
gave a strategy of how to set ๐‘š and ๐‘˜ for arbitrary ๐‘› to ensure ๐‘› โ‰ค ๐‘š2๐‘˜ . In this
paper we study the number ๐‘‘ of additional zero rows and columns and the in๏ฌ‚uence
on the number of ๏ฌ‚ops used by the algorithm in the worst case, best case and in
the average case.
It is known ([11]), that these parameters are not optimal. We only study the
number ๐‘‘ and the additional work caused by the bad parameters of Strassen.
We give no better strategy of how to set ๐‘š and ๐‘˜, and we do not analyze other
strategies than the one from Strassen.
2. Strassenโ€™s parameter for matrices of arbitrary order
Algorithm ๐’ฎ uses recursions to multiply matrices of order ๐‘š2๐‘˜ . If ๐‘˜ = 0 then ๐’ฎ

coincides with the naive algorithm ๐’ฉ . So we will only consider the case where
๐‘˜ > 0. To use ๐’ฎ for arbitrary (๐‘› ร— ๐‘›) matrices ๐ด and ๐ต (that means for arbitrary
หœ which are both (หœ
๐‘›) we have to embed them into matrices ๐ดหœ and ๐ต
๐‘›ร—๐‘›
หœ ) matrices
๐‘˜
with ๐‘›
หœ := ๐‘š2 โฉพ ๐‘›. We do this by adding โ„“ := ๐‘›
หœ โˆ’ ๐‘› zero rows and colums to ๐ด

272

I. HEDTKE

and ๐ต. This results in
หœ=
๐ดหœ๐ต

[


๐ด
0โ„“ร—๐‘›

0๐‘›ร—โ„“
0โ„“ร—โ„“

][

]
0๐‘›ร—โ„“
หœ
=: ๐ถ,
0โ„“ร—โ„“

๐ต
0โ„“ร—๐‘›

where 0๐‘˜ร—๐‘— denotes the (๐‘˜ ร— ๐‘—) zero matrix. If we delete the last โ„“ columns and
rows of ๐ถหœ we get the result ๐ถ = ๐ด๐ต.

We now focus on how to ๏ฌnd ๐‘š and ๐‘˜ for arbitrary ๐‘› with ๐‘› โฉฝ ๐‘š2๐‘˜ . An optimal
but purely theoretical choice is
(๐‘šโˆ— , ๐‘˜ โˆ— ) = arg min{๐น๐’ฎ (๐‘š, ๐‘˜) : (๐‘š, ๐‘˜) โˆˆ โ„• ร— โ„•0 , ๐‘› โฉฝ ๐‘š2๐‘˜ }.
Further methods of ๏ฌnding ๐‘š and ๐‘˜ can be found in [11]. We choose another way.
According to Strassenโ€™s proof of the main result of [14], we de๏ฌne
๐‘˜ := โŒŠlog2 ๐‘›โŒ‹ โˆ’ 4

๐‘š := โŒŠ๐‘›2โˆ’๐‘˜ โŒ‹ + 1,

and

(2.1)

where โŒŠ๐‘ฅโŒ‹ denotes the largest integer not greater than ๐‘ฅ. We de๏ฌne ๐‘›
หœ := ๐‘š2๐‘˜ and
study the relationship between ๐‘› and ๐‘›
หœ . The results are:
โˆ™ worst case: ๐‘›
หœ โฉฝ (17/16)๐‘›,
โˆ™ best case: ๐‘›
หœ โฉพ ๐‘› + 1 and
โˆ™ average case: ๐‘›
หœ โ‰ˆ (49/48)๐‘›.
2.1. Worst case analysis.
Theorem 2.1. Let ๐‘› โˆˆ โ„• with ๐‘› โฉพ 16. For the parameters (2.1) and ๐‘š2๐‘˜ = ๐‘›
หœ we
have
17
๐‘›.
๐‘›
หœโ‰ค
16
If ๐‘› is a power of two, we have ๐‘›
หœ=

17
16 ๐‘›.

Proof. For ๏ฌxed ๐‘› there is exactly one ๐›ผ โˆˆ โ„• with 2๐›ผ โ‰ค ๐‘› < 2๐›ผ+1 . We de๏ฌne
๐ผ ๐›ผ := {2๐›ผ , . . . , 2๐›ผ+1 โˆ’ 1}. Because of (2.1) for each ๐‘› โˆˆ ๐ผ ๐›ผ the value of ๐‘˜ is
๐‘˜ = โŒŠlog2 ๐‘›โŒ‹ โˆ’ 4 = log2 2๐›ผ โˆ’ 4 = ๐›ผ โˆ’ 4.
Possible values for ๐‘š are
โŒ‹
โŒŠ
โŒ‹
โŒŠ
1
16
๐‘š = ๐‘› ๐›ผโˆ’4 + 1 = ๐‘› ๐›ผ + 1 =: ๐‘š(๐‘›).
2
2
๐‘š(๐‘›) is increasing in ๐‘› and ๐‘š(2๐›ผ ) = 17 and ๐‘š(2๐›ผ+1 ) = 33. Therefore we have
๐‘š โˆˆ {17, . . . , 32}. For each ๐‘› โˆˆ ๐ผ ๐›ผ one of the following inequalities holds:
(๐ผ1๐›ผ )
(๐ผ2๐›ผ )

2๐›ผ = 16 โ‹… 2๐›ผโˆ’4
17 โ‹… 2๐›ผโˆ’4

โ‰ค
โ‰ค

๐‘›
๐‘›
..
.

<
<

17 โ‹… 2๐›ผโˆ’4
18 โ‹… 2๐›ผโˆ’4

๐›ผ
(๐ผ16
)
31 โ‹… 2๐›ผโˆ’4 โ‰ค ๐‘› < 32 โ‹… 2๐›ผโˆ’4 = 2๐›ผ+1 .
โŠŽ16
Note that ๐ผ ๐›ผ = ๐‘—=1 ๐ผ๐‘—๐›ผ . It follows, that for all ๐‘› โˆˆ โ„• there exists exactly one ๐›ผ
with ๐‘› โˆˆ ๐ผ ๐›ผ and for all ๐‘› โˆˆ ๐ผ ๐›ผ there is exactly one ๐‘— with ๐‘› โˆˆ ๐ผ๐‘—๐›ผ .
Note that for all ๐‘› โˆˆ ๐ผ๐‘—๐›ผ we have ๐‘˜ = ๐›ผ โˆ’ 4 and ๐‘š(๐‘›) = ๐‘— + 16. If we only focus
on ๐ผ๐‘—๐›ผ the di๏ฌ€erence ๐‘›
หœ โˆ’ ๐‘› has its maximum at the lower end of ๐ผ๐‘—๐›ผ (หœ
๐‘› is constant

STRASSENโ€™S ALGORITHM FOR MATRICES OF ARBITRARY SIZE

273

๐น๐’ฎ (2๐‘— , 10 โˆ’ ๐‘—)
2 ร— 109

1 ร— 109

๐‘—
0

1

2

3

4

5

6

7

8

9

10

Figure 1. Di๏ฌ€erent parameters (๐‘š = 2๐‘— , ๐‘˜ = 10 โˆ’ ๐‘—, ๐‘› = ๐‘š2๐‘˜ )
to apply in the Strassen algorithm for matrices of order 210 .
หœ and the
and ๐‘› has its minimum at the lower end of ๐ผ๐‘—๐›ผ ). On ๐ผ๐‘—๐›ผ the value of ๐‘›
minimum of ๐‘› are
๐›ผโˆ’4
๐‘›
หœ๐›ผ
๐‘— := (16 + ๐‘—) โ‹… 2

and

๐›ผโˆ’4
๐‘›๐›ผ
.
๐‘— := (15 + ๐‘—) โ‹… 2

๐›ผ
Therefore the di๏ฌ€erence ๐‘‘๐›ผ
หœ๐›ผ
๐‘— := ๐‘›
๐‘— โˆ’ ๐‘›๐‘— is constant:
๐›ผโˆ’4
๐‘‘๐›ผ
โˆ’ (15 + ๐‘—) โ‹… 2๐›ผโˆ’4 = 2๐›ผโˆ’4
๐‘— = (16 + ๐‘—) โ‹… 2

for all ๐‘—.

To set this in relation with ๐‘› we study
๐‘Ÿ๐‘—๐›ผ :=

๐›ผ
๐‘›
หœ๐›ผ
๐‘›๐›ผ
๐‘‘๐›ผ
2๐›ผโˆ’4
๐‘—
๐‘— + ๐‘‘๐‘—
๐‘—
=
=
1
+
=
1
+
.
๐‘›๐›ผ
๐‘›๐›ผ
๐‘›๐›ผ
๐‘›๐›ผ
๐‘—
๐‘—
๐‘—
๐‘—

๐›ผ
๐›ผโˆ’4
Finally ๐‘Ÿ๐‘—๐›ผ is maximal, i๏ฌ€ ๐‘›๐›ผ
= 2๐›ผ .
๐‘— is minimal, which is the case for ๐‘›1 = 16 โ‹… 2
17
With ๐‘Ÿ1๐›ผ = 17/16 we get ๐‘›
หœ โ‰ค 16
๐‘›, which completes the proof.
โ–ก

Now we want to use the result above to take a look at the number of ๏ฌ‚ops we
need for ๐’ฎ in the worst case. The worst case is ๐‘› = 2๐‘ for any 4 โ‰ค ๐‘ โˆˆ โ„•. An
optimal decomposition (in the sense of minimizing the number (หœ
๐‘› โˆ’ ๐‘›) of zero rows
and columns we add to the given matrices) is ๐‘š = 2๐‘— and ๐‘˜ = ๐‘ โˆ’ ๐‘—, because
๐‘š2๐‘˜ = 2๐‘— 2๐‘โˆ’๐‘— = 2๐‘ = ๐‘›. Note that these parameters ๐‘š and ๐‘˜ have nothing to do
with equation (2.1). Lets have a look on the in๏ฌ‚uence of ๐‘—:
Lemma 2.2. Let ๐‘› = 2๐‘ . In the decomposition ๐‘› = ๐‘š2๐‘˜ we use ๐‘š = 2๐‘— and
๐‘˜ = ๐‘ โˆ’ ๐‘—. Then ๐‘“ (๐‘—) := ๐น๐’ฎ (2๐‘— , ๐‘ โˆ’ ๐‘—) has its minimum at ๐‘— = 3.
Proof. We have ๐‘“ (๐‘—) = 2 โ‹… 7๐‘ (8/7)๐‘— + 5 โ‹… 7๐‘ (4/7)๐‘— โˆ’ 4๐‘ 6. Thus
๐‘“ (๐‘— + 1) โˆ’ ๐‘“ (๐‘—)
= [2 โ‹… 7๐‘ (8/7)๐‘—+1 + 5 โ‹… 7๐‘ (4/7)๐‘—+1 โˆ’ 4๐‘ 6] โˆ’ [2 โ‹… 7๐‘ (8/7)๐‘— + 5 โ‹… 7๐‘ (4/7)๐‘— โˆ’ 4๐‘ 6]
= 2 โ‹… 7๐‘ (8/7)๐‘— (8/7 โˆ’ 1) + 5 โ‹… 7๐‘ (4/7)๐‘— (4/7 โˆ’ 1)
= 2 โ‹… 7๐‘ (8/7)๐‘— โ‹… 1/7 โˆ’ 15 โ‹… 7๐‘ (4/7)๐‘— โ‹… 1/7 = 2(4/7)๐‘— 7๐‘โˆ’1 (2๐‘— โˆ’ 7.5).
Therefore, ๐‘“ (๐‘—) is a minimum if ๐‘— = min{๐‘– : 2๐‘– โˆ’ 7.5 > 0} = 3.

โ–ก

274

I. HEDTKE

๐‘“1 (๐‘)

๐‘“2 (๐‘)

1.21

1.008

1.20

1.007

1.19

1.006
๐‘

๐‘

4 6 8 10 12 14 16 18 20

4 6 8 10 12 14 16 18 20

Figure 2. Comparison of Strassenโ€™s parameters (๐‘“1 , ๐‘š = 17,
๐‘˜ = ๐‘ โˆ’ 4), obviously better parameters (๐‘“2 , ๐‘š = 16, ๐‘˜ = ๐‘ โˆ’ 4)
and the optimal parameters (lemma, ๐‘š = 8, ๐‘˜ = ๐‘ โˆ’ 3) for the
worst case ๐‘› = 2๐‘ . Limits of ๐‘“๐‘— are dashed.

Figure 1 shows that it is not optimal to use ๐’ฎ with full recursion in the example
๐‘ = 10. Now we study the worst case ๐‘› = 2๐‘ and di๏ฌ€erent sets of parameters ๐‘š
and ๐‘˜ for ๐น๐’ฎ (๐‘š, ๐‘˜):
(1) If we use equation (2.1), we get the original parameters of Strassen: ๐‘˜ =
๐‘ โˆ’ 4 and ๐‘š = 17. Therefore we de๏ฌne
๐น1 (๐‘) := ๐น๐’ฎ (17, ๐‘ โˆ’ 4) = 7๐‘

2
39 โ‹… 172
๐‘ 6 โ‹… 17
โˆ’
4
.
74
44

(2) Obviously ๐‘š = 16 would be a better choice, because with this we get
๐‘š2๐‘˜ = ๐‘› (we avoid the additional zero rows and columns). Now we de๏ฌne
๐น2 (๐‘) := ๐น๐’ฎ (16, ๐‘ โˆ’ 4) = 7๐‘

37 โ‹… 162
โˆ’ 4๐‘ 6.
74

(3) Finally we use the lemma above. With ๐‘š = 8 = 23 and ๐‘˜ = ๐‘ โˆ’ 3 we get
๐น3 (๐‘) := ๐น๐’ฎ (8, ๐‘ โˆ’ 3) = 7๐‘

3 โ‹… 64
โˆ’ 4๐‘ 6.
49

Now we analyze ๐น๐‘– relative to each other. Therefore we de๏ฌne ๐‘“๐‘— := ๐น๐‘— /๐น3
(๐‘— = 1, 2). So we have ๐‘“๐‘— : {4, 5, . . .} โ†’ โ„, which is monotonously decreasing in ๐‘.
With
๐‘“1 (๐‘) =

289(4๐‘ โ‹… 2401 โˆ’ 7๐‘ โ‹… 1664)
12544(4๐‘ โ‹… 49 โˆ’ 7๐‘ โ‹… 32)

and

๐‘“2 (๐‘) =

4๐‘ โ‹… 7203 โˆ’ 7๐‘ โ‹… 4736
147(4๐‘ โ‹… 49 โˆ’ 7๐‘ โ‹… 32)

we get
๐‘“1 (4) = 3179/2624 โ‰ˆ 1.2115
๐‘“2 (4) = 124/123 โ‰ˆ 1.00813

lim ๐‘“1 (๐‘) = 3757/3136 โ‰ˆ 1.1980

๐‘โ†’โˆž

lim ๐‘“2 (๐‘) = 148/147 โ‰ˆ 1.00680.

๐‘โ†’โˆž

Figure 2 shows the graphs of the functions ๐‘“๐‘— . In conclusion, in the worst case the
parameters of Strassen need approx. 20 % more ๏ฌ‚ops than the optimal parameters
of the lemma.

STRASSENโ€™S ALGORITHM FOR MATRICES OF ARBITRARY SIZE

275

2.2. Best case analysis.
Theorem 2.3. Let ๐‘› โˆˆ โ„• with ๐‘› โฉพ 16. For the parameters (2.1) and ๐‘š2๐‘˜ = ๐‘›
หœ we
have
๐‘›+1โ‰ค๐‘›
หœ.
๐‘

If ๐‘› = 2 โ„“ โˆ’ 1, โ„“ โˆˆ {16, . . . , 31}, we have ๐‘›
หœ = ๐‘› + 1.
Proof. Like in Theorem 2.1 we have for each ๐ผ๐‘—๐›ผ a constant value for ๐‘›
หœ๐›ผ
๐‘— namely
๐›ผโˆ’4
2
(16 + ๐‘—). Therefore ๐‘› < ๐‘›
หœ holds. The di๏ฌ€erence ๐‘›
หœ โˆ’ ๐‘› has its minimum at the
หœ โˆ’ ๐‘› = 2๐›ผโˆ’4 (16 + ๐‘—) โˆ’ (2๐›ผโˆ’4 (16 + ๐‘—) โˆ’ 1) = 1.
upper end of ๐ผ๐‘—๐›ผ . There we have ๐‘›
This shows ๐‘› + 1 โ‰ค ๐‘›
หœ.
โ–ก
Let us focus on the ๏ฌ‚ops we need for ๐’ฎ, again. Lets have a look at the example
๐‘› = 2๐‘ โˆ’ 1. The original parameters (see equation (2.1)) for ๐’ฎ are ๐‘˜ = ๐‘ โˆ’ 5 and
๐‘š = 32. Accordingly we de๏ฌne ๐น (๐‘) := ๐น๐’ฎ (32, ๐‘ โˆ’ 5). Because 2๐‘ โˆ’ 1 โ‰ค 2๐‘ we can
add 1 zero row and column and use the lemma from the worst case. Now we get
the parameters ๐‘š = 8 and ๐‘˜ = ๐‘ โˆ’ 3 and de๏ฌne ๐นหœ (๐‘) := ๐น๐’ฎ (8, ๐‘ โˆ’ 3). To analyze
๐น and ๐นหœ relative to each other we have
4๐‘ โ‹… 16807 โˆ’ 7๐‘ โ‹… 11776
๐น (๐‘)
.
= ๐‘
๐‘Ÿ(๐‘) :=
4 โ‹… 16807 โˆ’ 7๐‘ โ‹… 10976
๐นหœ (๐‘)
Note that ๐‘Ÿ : {5, 6, . . .} โ†’ โ„ is monotonously decreasing in ๐‘ and has its maximum
at ๐‘ = 5. We get
๐‘Ÿ(5) = 336/311 โ‰ˆ 1.08039

lim ๐‘Ÿ(๐‘) = 11776/10976 โ‰ˆ 1.07289.

๐‘โ†’โˆž

Therefore we can say: In the best case the parameters of Strassen are approx.
8 % worse than the optimal parameters from the lemma in the worst case.
2.3. Average case analysis. With ๐”ผหœ
๐‘› we denote the expected value of ๐‘›
หœ . We
search for a relationship like ๐‘›
หœ โ‰ˆ ๐›พ๐‘› for ๐›พ โˆˆ โ„. That means ๐”ผ[หœ
๐‘›/๐‘›] = ๐›พ.
Theorem 2.4. For the parameters (2.1) of Strassen ๐‘š2๐‘˜ = ๐‘›
หœ we have
49
๐‘›.
๐”ผหœ
๐‘›=
48
๐›ผ
Proof. First we focus only on ๐ผ ๐›ผ . We write ๐”ผ๐›ผ := ๐”ผโˆฃ๐ผ ๐›ผ and ๐”ผ๐›ผ
๐‘— := ๐”ผโˆฃ๐ผ๐‘— for the
๐›ผ
๐›ผ
expected value on ๐ผ and ๐ผ๐‘— , resp. We have
16

๐”ผ๐›ผ ๐‘›
หœ=

16

16

1 โˆ‘ ๐›ผ
1 โˆ‘ ๐›ผ
1 โˆ‘
๐”ผ๐‘— ๐‘›
หœ=
๐‘›
หœ๐‘— =
(๐‘— + 16)2๐›ผโˆ’4 = 2๐›ผโˆ’5 โ‹… 49.
16 ๐‘—=1
16 ๐‘—=1
16 ๐‘—=1

Together with ๐”ผ๐›ผ ๐‘› = 21 [(2๐›ผ+1 โˆ’ 1) + 2๐›ผ ] = 2๐›ผ + 2๐›ผโˆ’1 โˆ’ 1/2, we get
[ ]
๐”ผ๐›ผ ๐‘›
หœ
2๐›ผโˆ’5 โ‹… 49
๐‘›
หœ
= ๐›ผ = ๐›ผ
.
๐œŒ(๐›ผ) := ๐”ผ๐›ผ
๐‘›
๐”ผ ๐‘›
2 + 2๐›ผโˆ’1 โˆ’ 1/2
โŠŽ๐‘˜
Now we want to calculate ๐”ผ๐‘˜ := ๐”ผโˆฃ๐‘ˆ (๐‘˜) [หœ
๐‘›/๐‘›], where ๐‘ˆ (๐‘˜) := ๐‘—=0 ๐ผ 4+๐‘— by using the
values ๐œŒ(๐‘—). Because of โˆฃ๐ผ 5 โˆฃ = 2โˆฃ๐ผ 4 โˆฃ and โˆฃ๐ผ 4 โˆช๐ผ 5 โˆฃ = 3โˆฃ๐ผ 4 โˆฃ we have ๐”ผ1 = 31 ๐œŒ(4)+ 32 ๐œŒ(5).
With the same argument we get
๐”ผ๐‘˜ =

4+๐‘˜
โˆ‘
๐‘—=4

๐›ฝ๐‘— ๐œŒ(๐‘—)

where

๐›ฝ๐‘— =

2๐‘—โˆ’4
.
โˆ’1

2๐‘˜+1

276

I. HEDTKE

Finally we have
( 4+๐‘˜
)
[ ]
โˆ‘
๐‘›
หœ
๐”ผ
= lim ๐”ผ๐‘˜ = lim
๐›ฝ๐‘— ๐œŒ(๐‘—)
๐‘˜โ†’โˆž
๐‘˜โ†’โˆž
๐‘›
๐‘—=4
(
)
4+๐‘˜
โˆ‘
49
49
22๐‘—โˆ’9
=
,
= lim
๐‘˜โ†’โˆž
2๐‘˜+1 โˆ’ 1 ๐‘—=4 2๐‘— + 2๐‘—โˆ’1 โˆ’ 1/2
48

what we intended to show.

Compared to the worst case (หœ
๐‘›โ‰ค
1
1 + 1/48 = 1 + 13 โ‹… 16
.

17
16 ๐‘›,

โ–ก

17/16 = 1 + 1/16), note that 49/48 =

3. Conclusion
Strassen used the parameters ๐‘š and ๐‘˜ in the form (2.1) to show that his matrix
multiplication algorithm needs less than 4.7๐‘›log2 7 ๏ฌ‚ops. We could show in this
paper, that these parameters cause an additional work of approx. 20 % in the worst
case in comparison to the optimal strategy for the worst case. This is the main
reason for the search for better parameters, like in [11].
References
[1] P. Dโ€™Alberto and A. Nicolau, Adaptive Winogradโ€™s Matrix Multiplications, ACM Trans.
Math. Softw. 36, 1, Article 3 (March 2009).
[2] D. H. Bailey, Extra High Speed Matrix Multiplication on the Cray-2, SIAM J. Sci. Stat.
Comput., Vol. 9, Co. 3, 603โ€“607, 1988.
[3] B. Boyer, J.-G. Dumas, C. Pernet, and W. Zhou, Memory e๏ฌƒcient scheduling of StrassenWinogradโ€™s matrix multiplication algorithm, International Symposium on Symbolic and Algebraic Computation 2009, Seฬoul.
[4] H. Cohn and C. Umans, A Group-theoretic Approach to Fast Matrix Multiplication, Proceedings of the 44th Annual Symposium on Foundations of Computer Science, 11-14 October
2003, Cambridge, MA, IEEE Computer Society, pp. 438-449.
[5] H. Cohn, R. Kleinberg, B. Szegedy and C. Umans, Group-theoretic Algorithms for Matrix
Multiplication, Proceedings of the 46th Annual Symposium on Foundations of Computer
Science, 23-25 October 2005, Pittsburgh, PA, IEEE Computer Society, pp. 379-388.
[6] D. Coppersmith and S. Winograd, Matrix Multiplication via Arithmetic Progressions, STOC
โ€™87: Proceedings of the nineteenth annual ACM symposium on Theory of Computing, 1987.
[7] C. C. Douglas, M. Heroux, G. Slishman and R. M. Smith, GEMMW: A Portable Level
3 BLAS Winograd Variant of Strassenโ€™s Matrixโ€“Matrix Multiply Algorithm, J. of Comp.
Physics, 110:1โ€“10, 1994.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations, Third ed., The Johns Hopkins
University Press, Baltimore, MD, 1996.
[9] S. Huss-Lederman, E. M. Jacobson, J. R. Johnson, A. Tsao, and T. Turnbull, Implementation
of Strassenโ€™s Algorithm for Matrix Multiplication, 0-89791-854-1, 1996 IEEE
[10] IBM Engineering and Scienti๏ฌc Subroutine Library Guide and Reference, 1992. Order No.
SC23-0526.
[11] P. C. Fischer and R. L. Probert, E๏ฌƒcient Procedures for using Matrix Algorithms, Automata,
Languages and Programming โ€“ 2nd Colloquium, University of Saarbruฬˆcken, Lecture Notes
in Computer Science, 1974.
[12] R. L. Probert, On the additive complexity of matrix multiplication, SIAM J. Compu, 5:187โ€“
203, 1976.
[13] S. Robinson, Toward an Optimal Algorithm for Matrix Multiplication, SIAM News, Volume
38, Number 9, November 2005.
[14] V. Strassen, Gaussian Elimination is not Optimal, Numer. Math., 13:354โ€“356, 1969.
[15] S. Winograd, On Multiplication of 2 ร— 2 Matrices, Linear Algebra and its Applications,
4:381-388, 1971.

STRASSENโ€™S ALGORITHM FOR MATRICES OF ARBITRARY SIZE

Mathematical Institute, University of Jena, D-07737 Jena, Germany
E-mail address: Ivo.Hedtke@uni-jena.de

277

Dokumen yang terkait

AN ALIS IS YU RID IS PUT USAN BE B AS DAL AM P E RKAR A TIND AK P IDA NA P E NY E RTA AN M E L AK U K A N P R AK T IK K E DO K T E RA N YA NG M E N G A K IB ATK AN M ATINYA P AS IE N ( PUT USA N N O MOR: 9 0/PID.B /2011/ PN.MD O)

0 82 16

ANALISIS FAKTOR YANGMEMPENGARUHI FERTILITAS PASANGAN USIA SUBUR DI DESA SEMBORO KECAMATAN SEMBORO KABUPATEN JEMBER TAHUN 2011

2 53 20

EFEKTIVITAS PENDIDIKAN KESEHATAN TENTANG PERTOLONGAN PERTAMA PADA KECELAKAAN (P3K) TERHADAP SIKAP MASYARAKAT DALAM PENANGANAN KORBAN KECELAKAAN LALU LINTAS (Studi Di Wilayah RT 05 RW 04 Kelurahan Sukun Kota Malang)

45 393 31

FAKTOR ร‚โ€“ FAKTOR YANG MEMPENGARUHI PENYERAPAN TENAGA KERJA INDUSTRI PENGOLAHAN BESAR DAN MENENGAH PADA TINGKAT KABUPATEN / KOTA DI JAWA TIMUR TAHUN 2006 - 2011

1 35 26

A DISCOURSE ANALYSIS ON โ€œSPA: REGAIN BALANCE OF YOUR INNER AND OUTER BEAUTYโ€ IN THE JAKARTA POST ON 4 MARCH 2011

9 161 13

Pengaruh kualitas aktiva produktif dan non performing financing terhadap return on asset perbankan syariah (Studi Pada 3 Bank Umum Syariah Tahun 2011 โ€“ 2014)

6 101 0

Pengaruh pemahaman fiqh muamalat mahasiswa terhadap keputusan membeli produk fashion palsu (study pada mahasiswa angkatan 2011 & 2012 prodi muamalat fakultas syariah dan hukum UIN Syarif Hidayatullah Jakarta)

0 22 0

Pendidikan Agama Islam Untuk Kelas 3 SD Kelas 3 Suyanto Suyoto 2011

4 108 178

ANALISIS NOTA KESEPAHAMAN ANTARA BANK INDONESIA, POLRI, DAN KEJAKSAAN REPUBLIK INDONESIA TAHUN 2011 SEBAGAI MEKANISME PERCEPATAN PENANGANAN TINDAK PIDANA PERBANKAN KHUSUSNYA BANK INDONESIA SEBAGAI PIHAK PELAPOR

1 17 40

KOORDINASI OTORITAS JASA KEUANGAN (OJK) DENGAN LEMBAGA PENJAMIN SIMPANAN (LPS) DAN BANK INDONESIA (BI) DALAM UPAYA PENANGANAN BANK BERMASALAH BERDASARKAN UNDANG-UNDANG RI NOMOR 21 TAHUN 2011 TENTANG OTORITAS JASA KEUANGAN

3 32 52