Matrix multiplication over word-size modular rings using Bini’s approximate formula
Brice Boyer, Jean-Guillaume Dumas
JNCF, November
Motivations/Goals
– Matrix multiplication over word-size Z/pZ is a critical building block in exact linear algebra (matrix multiplication over Z, over GF(q), Chinese remaindering, …).
– Perform faster matrix multiplication over Z/pZ (using fewer products).
Outline
1. Introduction
2. Approximate formula to Exact formula
3. Implementation and Timings
Bini’s approximate formula
Facts
– Multiplication of 3 × 2 and 2 × 2 matrices (denoted (3, 2, 2)) using 10 products.
– Also (by duality): (2, 3, 2) and (2, 2, 3) multiplications.
– Computes C_ε = A × B + ε·D(ε).
– Complexity of n^ω with ω ≈ 2.780 for (12, 12, 12) (compared to Strassen’s ω ≈ 2.807).
– One call to Bini’s algorithm saves roughly 4.5% of the operations (vs. Strassen’s) on 3 000 × 3 000 matrices.
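The two exponents quoted in the facts can be checked with a short computation. This is a minimal sketch, assuming the (12, 12, 12) scheme is obtained by composing the (3, 2, 2) formula with its two duals, which gives 10 × 10 × 10 = 1000 products; the variable names are illustrative only.

```python
import math

# Composing (3,2,2), (2,3,2) and (2,2,3), each with 10 products,
# gives a (12,12,12) scheme with 10**3 = 1000 products (assumption stated above).
products_bini = 10 ** 3
size_bini = 3 * 2 * 2                      # 12

# Exponent of a recursive (n,n,n) scheme with r products: log_n(r).
omega_bini = math.log(products_bini) / math.log(size_bini)
omega_strassen = math.log(7) / math.log(2)  # (2,2,2) with 7 products

print(f"Bini (12,12,12): omega = {omega_bini:.3f}")
print(f"Strassen:        omega = {omega_strassen:.3f}")
```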
Bini’s approximate formula
Algorithm
S1 ← A11 + A22
T1 ← B22 + ε·B11
T2 ← B21 + B22
S3 ← A32 + ε·A31
T3 ← B11 + ε·B21
S4 ← A22 + ε·A12
T4 ← B21 − ε·B11
S5 ← A11 + ε·A12
T5 ← B22 + ε·B12
S6 ← A21 + A32
T6 ← B11 + ε·B22
T7 ← B11 + B12
S9 ← A21 + ε·A31
T9 ← B12 − ε·B22

P0 ← A11 × B22
P1 ← S1 × T1
P2 ← A22 × T2
P3 ← S3 × T3
P4 ← S4 × T4
P5 ← S5 × T5
P6 ← S6 × T6
P7 ← A21 × T7
P8 ← A32 × B11
P9 ← S9 × T9

C11 ← (P1 − P2 + P4 − P0)/ε
C12 ← (P5 − P0)/ε
C21 ← P4 − P3 + P6
C22 ← P1 − P5 + P9
C31 ← (P3 − P8)/ε
C32 ← (P6 − P7 + P9 − P8)/ε
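The formula can be tried on scalars. The following is a minimal sketch, not the authors' implementation: the function name `bini_322`, the use of `Fraction` for exact arithmetic and the sample matrices are all assumptions made for illustration. It shows that the result differs from A × B only by a multiple of ε.

```python
from fractions import Fraction

def bini_322(A, B, eps):
    """Approximate 3x2 by 2x2 product with 10 multiplications (scalar entries).

    Returns C_eps = A*B + eps*D(eps).
    """
    (A11, A12), (A21, A22), (A31, A32) = A
    (B11, B12), (B21, B22) = B

    # Pre-additions on A (S) and on B (T).
    S1, T1 = A11 + A22, B22 + eps * B11
    T2 = B21 + B22
    S3, T3 = A32 + eps * A31, B11 + eps * B21
    S4, T4 = A22 + eps * A12, B21 - eps * B11
    S5, T5 = A11 + eps * A12, B22 + eps * B12
    S6, T6 = A21 + A32, B11 + eps * B22
    T7 = B11 + B12
    S9, T9 = A21 + eps * A31, B12 - eps * B22

    # The 10 products.
    P0, P1, P2, P3, P4 = A11 * B22, S1 * T1, A22 * T2, S3 * T3, S4 * T4
    P5, P6, P7, P8, P9 = S5 * T5, S6 * T6, A21 * T7, A32 * B11, S9 * T9

    # Post-additions; four of the six outputs are divided by eps.
    C11 = (P1 - P2 + P4 - P0) / eps
    C12 = (P5 - P0) / eps
    C21 = P4 - P3 + P6
    C22 = P1 - P5 + P9
    C31 = (P3 - P8) / eps
    C32 = (P6 - P7 + P9 - P8) / eps
    return [[C11, C12], [C21, C22], [C31, C32]]

# Illustrative data (hypothetical values).
A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
eps = Fraction(1, 10**6)

C_eps = bini_322(A, B, eps)
exact = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(3)]
# Largest deviation from the true product; it is proportional to eps.
max_error = max(abs(C_eps[i][j] - exact[i][j])
                for i in range(3) for j in range(2))
```

With exact rational arithmetic the deviation shrinks linearly as ε does, which is the sense in which the formula is "approximate".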
Bini’s approximate formula
Dependencies (figure). Symmetries!
Bini’s approximate formula
Scheduling of the Algorithm

 #   operation                var   |  #   operation                var
 1   C11 := A11 * B22         P0    |  19  Y   := B12 - e.B22       T9
 2   X   := A11 + e.A12       S5    |  20  C32 := X * Y             P9
 3   Y   := e.B12 + B22       T5    |  21  C22 := C22 + C32         C22
 4   C22 := X * Y             P5    |  22  X   := A21 + A32         S6
 5   C12 := (C22 - C11)/e     C12   |  23  Y   := B11 + e.B22       T6
 6   Y   := B21 + B22         T2    |  24  C31 := X * Y             P6
 7   C31 := A22 * Y           P2    |  25  C21 := C21 + C31         C21
 8   C11 := C11 + C31         C11   |  26  C32 := C32 + C31         C32
 9   X   := A11 + A22         S1    |  27  Y   := B11 + B12         T7
 10  Y   := e.B11 + B22       T1    |  28  C31 := A21 * Y           P7
 11  C21 := X * Y             P1    |  29  C32 := C32 - C31         C32
 12  C22 := C21 - C22         C22   |  30  X   := e.A31 + A32       S3
 13  C11 := C21 - C11         C11   |  31  Y   := B11 + e.B21       T3
 14  X   := e.A12 + A22       S4    |  32  C31 := X * Y             P3
 15  Y   := B21 - e.B11       T4    |  33  C21 := C21 - C31         C21
 16  C21 := X * Y             P4    |  34  Y   := A32 * B11         P8
 17  C11 := (C21 + C11)/e     C11   |  35  C31 := (C31 - Y)/e       C31
 18  X   := A21 + e.A31       S9    |  36  C32 := (C32 - Y)/e       C32

– Only 2 temporaries! (X and Y)
– Easy to make it in-place while overwriting the left or right operand.

Brice Boyer, Jean-Guillaume Dumas, Clément Pernet, and Wei Zhou. Memory efficient scheduling of Strassen-Winograd’s matrix multiplication algorithm. In Proceedings of the Internat. Symp. Symbolic Algebraic Comput., ISSAC, New York, NY, USA. ACM.
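The 36-step schedule can be transcribed directly to check that two temporaries suffice. This is a sketch on scalar entries under assumptions: the function name `bini_322_scheduled`, the `Fraction` arithmetic and the test data are illustrative, and the expected values are the error terms obtained by expanding the formula by hand for this data.

```python
from fractions import Fraction

def bini_322_scheduled(A, B, e):
    """Bini's (3,2,2) formula following the 36-step schedule.

    Only X and Y are used as temporaries; every other value lives in C.
    """
    (A11, A12), (A21, A22), (A31, A32) = A
    (B11, B12), (B21, B22) = B

    C11 = A11 * B22            # 1   P0
    X = A11 + e * A12          # 2
    Y = e * B12 + B22          # 3
    C22 = X * Y                # 4   P5
    C12 = (C22 - C11) / e      # 5   C12 done
    Y = B21 + B22              # 6
    C31 = A22 * Y              # 7   P2
    C11 = C11 + C31            # 8
    X = A11 + A22              # 9
    Y = e * B11 + B22          # 10
    C21 = X * Y                # 11  P1
    C22 = C21 - C22            # 12
    C11 = C21 - C11            # 13
    X = e * A12 + A22          # 14
    Y = B21 - e * B11          # 15
    C21 = X * Y                # 16  P4
    C11 = (C21 + C11) / e      # 17  C11 done
    X = A21 + e * A31          # 18
    Y = B12 - e * B22          # 19
    C32 = X * Y                # 20  P9
    C22 = C22 + C32            # 21  C22 done
    X = A21 + A32              # 22
    Y = B11 + e * B22          # 23
    C31 = X * Y                # 24  P6
    C21 = C21 + C31            # 25
    C32 = C32 + C31            # 26
    Y = B11 + B12              # 27
    C31 = A21 * Y              # 28  P7
    C32 = C32 - C31            # 29
    X = e * A31 + A32          # 30
    Y = B11 + e * B21          # 31
    C31 = X * Y                # 32  P3
    C21 = C21 - C31            # 33  C21 done
    Y = A32 * B11              # 34  P8
    C31 = (C31 - Y) / e        # 35  C31 done
    C32 = (C32 - Y) / e        # 36  C32 done
    return [[C11, C12], [C21, C22], [C31, C32]]

# Illustrative data (hypothetical values).
A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
e = Fraction(1, 1000)
C = bini_322_scheduled(A, B, e)
```

Each C block is used as scratch space until its final value is written, which is why no storage beyond X and Y is needed in this transcription.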
Getting an exact algorithm
– Let d = deg_ε(ε·D(ε)).
– Find d + 1 scalars α_i and d + 1 pairwise distinct scalars ε_i.
– Make sure Σ_{i=1}^{d+1} α_i = 1 and, for j = 1, …, d, that Σ_{i=1}^{d+1} α_i ε_i^j = 0.
– Then Σ_{i=1}^{d+1} α_i C_{ε_i} = A × B.

D. Bini. Relations between exact and approximate bilinear algorithms. Applications. Calcolo.
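The conditions on the α_i are met by the Lagrange basis polynomials evaluated at 0, which gives one concrete way to pick them. The sketch below is an illustration under assumptions: it works over the rationals with `Fraction` rather than over Z/pZ, it uses a single scalar-level call (for which expanding the formula gives d = 2), and the names `bini_322` and `weights_at_zero` are hypothetical.

```python
from fractions import Fraction

def bini_322(A, B, eps):
    """Compact scalar version of Bini's (3,2,2) approximate formula."""
    (A11, A12), (A21, A22), (A31, A32) = A
    (B11, B12), (B21, B22) = B
    P0 = A11 * B22
    P1 = (A11 + A22) * (B22 + eps * B11)
    P2 = A22 * (B21 + B22)
    P3 = (A32 + eps * A31) * (B11 + eps * B21)
    P4 = (A22 + eps * A12) * (B21 - eps * B11)
    P5 = (A11 + eps * A12) * (B22 + eps * B12)
    P6 = (A21 + A32) * (B11 + eps * B22)
    P7 = A21 * (B11 + B12)
    P8 = A32 * B11
    P9 = (A21 + eps * A31) * (B12 - eps * B22)
    return [[(P1 - P2 + P4 - P0) / eps, (P5 - P0) / eps],
            [P4 - P3 + P6, P1 - P5 + P9],
            [(P3 - P8) / eps, (P6 - P7 + P9 - P8) / eps]]

def weights_at_zero(points):
    """alpha_i = L_i(0) for the Lagrange basis on pairwise distinct points.

    These satisfy sum(alpha_i) = 1 and sum(alpha_i * eps_i**j) = 0
    for j = 1, ..., len(points) - 1.
    """
    alphas = []
    for i, ei in enumerate(points):
        a = Fraction(1)
        for k, ek in enumerate(points):
            if k != i:
                a *= Fraction(-ek) / (ei - ek)
        alphas.append(a)
    return alphas

# Illustrative data (hypothetical values).
A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
d = 2                                    # degree of eps*D(eps) for one call
points = [Fraction(k) for k in range(1, d + 2)]   # d + 1 distinct scalars
alphas = weights_at_zero(points)

# Exact product as a weighted sum of d + 1 approximate products.
approx = [bini_322(A, B, eps_i) for eps_i in points]
exact = [[sum(a * C[i][j] for a, C in zip(alphas, approx))
          for j in range(2)] for i in range(3)]
```

Over Z/pZ the same recipe would need the ε_i and their pairwise differences to be invertible modulo p; that case is not covered by this sketch.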
Using Only One Call
Ideas
– Only “one recursive” call.
– For a double: ε = 2^−27, ε² ≈ 0.
– Modulo p: ε = p = 0, ε² = 0.
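One possible reading of the double-precision idea is sketched below: run a single call in floating point with ε = 2^−27, round each entry to the nearest integer, then reduce modulo p. This is an assumption-laden illustration, not the authors' implementation; the names `bini_322` and `matmul_mod_p_one_call` are hypothetical, and the rounding step is only valid when the entries are small enough for the accumulated error to stay below 1/2.

```python
def bini_322(A, B, eps):
    """Compact scalar version of Bini's (3,2,2) approximate formula."""
    (A11, A12), (A21, A22), (A31, A32) = A
    (B11, B12), (B21, B22) = B
    P0 = A11 * B22
    P1 = (A11 + A22) * (B22 + eps * B11)
    P2 = A22 * (B21 + B22)
    P3 = (A32 + eps * A31) * (B11 + eps * B21)
    P4 = (A22 + eps * A12) * (B21 - eps * B11)
    P5 = (A11 + eps * A12) * (B22 + eps * B12)
    P6 = (A21 + A32) * (B11 + eps * B22)
    P7 = A21 * (B11 + B12)
    P8 = A32 * B11
    P9 = (A21 + eps * A31) * (B12 - eps * B22)
    return [[(P1 - P2 + P4 - P0) / eps, (P5 - P0) / eps],
            [P4 - P3 + P6, P1 - P5 + P9],
            [(P3 - P8) / eps, (P6 - P7 + P9 - P8) / eps]]

def matmul_mod_p_one_call(A, B, p, eps=2.0 ** -27):
    """One approximate call in doubles, then rounding and reduction mod p.

    Assumes small non-negative integer entries, so that the eps-sized
    perturbation and the floating-point rounding stay far below 1/2.
    """
    Af = [[float(x) for x in row] for row in A]
    Bf = [[float(x) for x in row] for row in B]
    C_eps = bini_322(Af, Bf, eps)
    # The eps**2 terms are of order 2**-54 and are absorbed by rounding.
    return [[int(round(c)) % p for c in row] for row in C_eps]

# Illustrative data (hypothetical values), entries already reduced mod p.
p = 17
A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
C = matmul_mod_p_one_call(A, B, p)
expected = [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(3)]
```

The choice ε = 2^−27 puts ε² near the limit of the 53-bit significand of a double, which matches the "ε² ≈ 0" remark on the slide.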