Fast Matrix Product Algorithms: From Theory To Practice
Thomas Sibut-Pinote (Inria, École Polytechnique, France)
Éric Schost (University of Waterloo, ON, Canada)
November 2nd, 2015
Outline: Introduction and Definitions; The τ-theorem; Pan's aggregation tables and the τ-theorem; Software Implementation; Conclusion
Motivation
- The complexity of matrix multiplication governs the complexity of linear algebra;
- $\omega = \inf\{\theta \mid \text{it takes } O(n^\theta) \text{ operations to multiply in } M_n(K)\} \in [2, 3]$;
- Strassen '69: $\omega < 2.81$ (used in practice);
- Le Gall '14: $\omega < 2.3728639$ (theoretical).
Can we bridge the gap a little?
Problem Statement
Let $\langle m, n, p\rangle$ denote the bilinear map
$$M_{m,n}(K) \times M_{n,p}(K) \to M_{m,p}(K), \qquad (A, B) \mapsto A \cdot B.$$
Goal: determine the arithmetic complexity of $\langle m, n, p\rangle$.
Known: the naive algorithm uses $mnp$ multiplications (and $mp(n-1)$ additions):
$$\forall i \in \{1, \dots, m\},\ \forall j \in \{1, \dots, p\}, \qquad [AB]_{i,j} = \sum_{k=1}^{n} a_{i,k}\, b_{k,j}.$$
Can we do better?
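The triple sum above can be sketched in Python as follows. This is only an illustration (the name `naive_matmul` is ours, not from the talk); matrices are lists of rows, and the function also counts scalar multiplications to confirm the $mnp$ figure.

```python
def naive_matmul(A, B):
    """Naive product of an m x n matrix A by an n x p matrix B (lists of rows).

    Returns (C, mults), where mults counts the scalar multiplications performed.
    """
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    C = [[0] * p for _ in range(m)]
    mults = 0
    for i in range(m):
        for j in range(p):
            s = 0
            for k in range(n):
                s += A[i][k] * B[k][j]  # one scalar multiplication per (i, j, k)
                mults += 1
            C[i][j] = s
    return C, mults


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
C, mults = naive_matmul(A, B)
print(C, mults)  # [[58, 64], [139, 154]] 12
```

Here $m = p = 2$ and $n = 3$, so exactly $mnp = 12$ products are computed.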
Strassen's Algorithm
Strassen's algorithm computes $\langle 2, 2, 2\rangle$ with 7 multiplications (instead of $2 \cdot 2 \cdot 2 = 8$):
$$\begin{aligned}
\alpha_1 &= a_{1,2} - a_{2,2}, & \beta_1 &= b_{2,1} + b_{2,2}, & p_1 &= \alpha_1 \beta_1,\\
\alpha_2 &= a_{2,1} - a_{1,1}, & \beta_2 &= b_{1,2} + b_{1,1}, & p_2 &= \alpha_2 \beta_2,\\
\alpha_3 &= a_{1,1}, & \beta_3 &= b_{1,2} - b_{2,2}, & p_3 &= \alpha_3 \beta_3,\\
\alpha_4 &= a_{2,2}, & \beta_4 &= b_{2,1} - b_{1,1}, & p_4 &= \alpha_4 \beta_4,\\
\alpha_5 &= a_{2,1} + a_{2,2}, & \beta_5 &= b_{1,1}, & p_5 &= \alpha_5 \beta_5,\\
\alpha_6 &= a_{1,2} + a_{1,1}, & \beta_6 &= b_{2,2}, & p_6 &= \alpha_6 \beta_6,\\
\alpha_7 &= a_{1,1} + a_{2,2}, & \beta_7 &= b_{1,1} + b_{2,2}, & p_7 &= \alpha_7 \beta_7,
\end{aligned}$$
$$C = \begin{pmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{pmatrix} = AB, \qquad
\begin{aligned}
c_{1,1} &= p_1 + p_4 - p_6 + p_7, & c_{1,2} &= p_3 + p_6,\\
c_{2,1} &= p_4 + p_5, & c_{2,2} &= p_2 + p_3 - p_5 + p_7.
\end{aligned}$$
Observe: $C = p_1\gamma_1 + p_2\gamma_2 + p_3\gamma_3 + p_4\gamma_4 + p_5\gamma_5 + p_6\gamma_6 + p_7\gamma_7$, where $(E_{i,j})$ is the canonical basis and
$$\gamma_1 = E_{1,1},\quad \gamma_2 = E_{2,2},\quad \gamma_3 = E_{1,2} + E_{2,2},\quad \gamma_4 = E_{1,1} + E_{2,1},\quad \gamma_5 = E_{2,1} - E_{2,2},\quad \gamma_6 = E_{1,2} - E_{1,1},\quad \gamma_7 = E_{1,1} + E_{2,2}.$$
Tensor notation: $\sum_{i=1}^{7} \alpha_i \otimes \beta_i \otimes \gamma_i$.
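The seven products $p_1, \dots, p_7$ can be checked numerically with a short Python transcription. This is a sketch (the name `strassen_2x2` is ours); the output combinations are the ones that give $C = AB$. Since every product keeps the $A$ factor on the left, the same formulas remain valid for noncommutative entries such as matrix blocks.

```python
def strassen_2x2(A, B):
    """Strassen's 2 x 2 product with 7 multiplications (A, B are lists of rows).

    A[0][1] plays the role of a_{1,2}, and so on; entries only need +, - and *.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p1 = (a12 - a22) * (b21 + b22)
    p2 = (a21 - a11) * (b12 + b11)
    p3 = a11 * (b12 - b22)
    p4 = a22 * (b21 - b11)
    p5 = (a21 + a22) * b11
    p6 = (a12 + a11) * b22
    p7 = (a11 + a22) * (b11 + b22)
    # Recombine the products into the four entries of C = AB.
    c11 = p1 + p4 - p6 + p7
    c12 = p3 + p6
    c21 = p4 + p5
    c22 = p2 + p3 - p5 + p7
    return [[c11, c12], [c21, c22]]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```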
Tensors and algorithms
General tensor notation, identified with the bilinear map:
$$\langle m, n, p\rangle = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{n} a_{i,k} \otimes b_{k,j} \otimes c_{i,j}.$$
Representing $\langle m, n, p\rangle$ as $\sum_{i=1}^{r} \alpha_i \otimes \beta_i \otimes \gamma_i$ gives an algorithm using $r$ multiplications.
Example: the elementary tensor $(a_{1,2} + a_{3,5}) \otimes b_{2,4} \otimes (c_{1,4} + c_{2,4})$ reads as the algorithm
    tmp ← (a_{1,2} + a_{3,5}) · b_{2,4}
    c_{1,4} ← c_{1,4} + tmp
    c_{2,4} ← c_{2,4} + tmp
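Reading a decomposition as an algorithm can be made concrete with a small interpreter. In this sketch (all names are ours), each factor $\alpha_i$, $\beta_i$, $\gamma_i$ is a dict mapping a 1-based index pair to a coefficient, so `{(1, 2): 1, (3, 5): 1}` stands for $a_{1,2} + a_{3,5}$; exactly one multiplication of linear forms is performed per term.

```python
from collections import defaultdict


def run_decomposition(decomp, A, B):
    """Evaluate sum_i alpha_i (x) beta_i (x) gamma_i as an algorithm.

    A and B are dicts {(i, j): value}; the result C is a dict {(i, j): value}.
    """
    C = defaultdict(int)
    for alpha, beta, gamma in decomp:
        lhs = sum(coeff * A[idx] for idx, coeff in alpha.items())  # linear form in a
        rhs = sum(coeff * B[idx] for idx, coeff in beta.items())   # linear form in b
        tmp = lhs * rhs                                             # the only product
        for idx, coeff in gamma.items():
            C[idx] += coeff * tmp                                   # accumulate into c
    return dict(C)


def naive_decomposition(m, n, p):
    """Terms a_{i,k} (x) b_{k,j} (x) c_{i,j} of <m, n, p>: mnp of them."""
    return [({(i, k): 1}, {(k, j): 1}, {(i, j): 1})
            for i in range(1, m + 1)
            for j in range(1, p + 1)
            for k in range(1, n + 1)]


# The elementary tensor (a_{1,2} + a_{3,5}) (x) b_{2,4} (x) (c_{1,4} + c_{2,4}):
elem = [({(1, 2): 1, (3, 5): 1}, {(2, 4): 1}, {(1, 4): 1, (2, 4): 1})]
print(run_decomposition(elem, {(1, 2): 2, (3, 5): 3}, {(2, 4): 10}))  # {(1, 4): 50, (2, 4): 50}
```

Feeding `naive_decomposition(m, n, p)` to the same interpreter recovers the naive algorithm; a shorter list of terms for the same tensor would give a faster algorithm.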
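Composition
$t \otimes t'$ computes the composition of two tensors. To multiply $A$ of size $(mm', nn')$ by $B$ of size $(nn', pp')$, decompose $A$ and $B$ into blocks:
$$A = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{1,1} & \cdots & B_{1,p} \\ \vdots & & \vdots \\ B_{n,1} & \cdots & B_{n,p} \end{pmatrix},$$
where each $A_{i,j}$ has size $(m', n')$ and each $B_{j,k}$ has size $(n', p')$. The algorithm for $t$ is run on the blocks, and each of its block products is computed with the algorithm for $t'$; if they use $r$ and $r'$ multiplications, the composition uses $rr'$.
If $t = \langle m, n, p\rangle$ and $t' = \langle m', n', p'\rangle$, then $t \otimes t' \simeq \langle mm', nn', pp'\rangle$.
Also set $t^{\otimes k} = \underbrace{t \otimes t \otimes \cdots \otimes t}_{k \text{ times}} \simeq \langle m^k, n^k, p^k\rangle$.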
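Taking $t = \langle 2, 2, 2\rangle$ with Strassen's 7-product algorithm, $t^{\otimes k} \simeq \langle 2^k, 2^k, 2^k\rangle$ costs $7^k$ multiplications instead of $8^k$. Below is a minimal recursive sketch in pure Python; the helper names (`add`, `sub`, `split`, `join`, `strassen`) are ours, sizes are assumed to be powers of two, and the recursion goes down to $1 \times 1$ blocks, whereas practical implementations usually switch to the naive product below some cutoff size.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(X):
    """Split a 2h x 2h matrix into its four h x h blocks X11, X12, X21, X22."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four h x h blocks into a 2h x 2h matrix."""
    return [r1 + r2 for r1, r2 in zip(C11, C12)] + [r1 + r2 for r1, r2 in zip(C21, C22)]


def strassen(A, B, counter):
    """Multiply 2^k x 2^k matrices by applying <2,2,2> recursively (t^{(x)k}).

    counter[0] accumulates the number of scalar multiplications (7^k in total).
    """
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Same seven products as the 2 x 2 case, now on blocks (computed recursively).
    p1 = strassen(sub(a12, a22), add(b21, b22), counter)
    p2 = strassen(sub(a21, a11), add(b12, b11), counter)
    p3 = strassen(a11, sub(b12, b22), counter)
    p4 = strassen(a22, sub(b21, b11), counter)
    p5 = strassen(add(a21, a22), b11, counter)
    p6 = strassen(add(a12, a11), b22, counter)
    p7 = strassen(add(a11, a22), add(b11, b22), counter)
    c11 = add(sub(add(p1, p4), p6), p7)
    c12 = add(p3, p6)
    c21 = add(p4, p5)
    c22 = add(sub(add(p2, p3), p5), p7)
    return join(c11, c12, c21, c22)


n = 8  # 2^3
A = [[i * n + j for j in range(n)] for i in range(n)]
B = [[(i + 2 * j) % 5 for j in range(n)] for i in range(n)]
counter = [0]
C = strassen(A, B, counter)
print(counter[0])  # 343, i.e. 7**3, versus 8**3 = 512 for the naive algorithm
```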
Direct Sum of Tensors
$t \oplus t'$ computes two independent matrix products in parallel.
We write $s \odot t = \underbrace{t \oplus t \oplus \cdots \oplus t}_{s \text{ times}}$.
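At the level of decompositions, a direct sum just keeps the variables of the summands apart and concatenates the terms, which shows $R(t \oplus t') \le R(t) + R(t')$ and $R(s \odot t) \le s\,R(t)$. A sketch using our own dict-of-coefficients representation (the name `direct_sum` is hypothetical): index $(i, j)$ of summand $s$ is renamed $(s, i, j)$.

```python
def direct_sum(*decomps):
    """Decomposition of t_0 (+) t_1 (+) ... from decompositions of each t_s.

    Each term is a triple (alpha, beta, gamma) of dicts {(i, j): coeff};
    summand s gets its own copy of the variables via the tag (s, i, j).
    """
    return [tuple({(s,) + idx: coeff for idx, coeff in factor.items()} for factor in term)
            for s, decomp in enumerate(decomps)
            for term in decomp]


t = [({(1, 1): 1}, {(1, 1): 1}, {(1, 1): 1})]  # <1, 1, 1>, one multiplication
three = direct_sum(*([t] * 3))                 # 3 (.) <1, 1, 1>
print(len(three))  # 3
```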
Rank and ω
Definition (Rank of a Tensor $t$)
$$R(t) := \min\Big\{ r \;\Big|\; t \text{ can be written as } \sum_{i=1}^{r} x_i \otimes y_i \otimes z_i \Big\}$$
$R(\langle m, n, p\rangle)$ is the minimal number of multiplications in a bilinear algorithm for $\langle m, n, p\rangle$.
Definition (Linear Algebra Exponent)
$$\omega := \inf\{\tau \mid \text{there exists an algorithm to multiply } n \times n \text{ matrices in } O(n^\tau) \text{ additions and multiplications}\} \quad (\in [2, 3])$$
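The two definitions are linked by a classical consequence: if $R(\langle m, n, p\rangle) \le r$, then $\omega \le 3 \log r / \log(mnp)$, obtained from the tensor powers of $\langle m, n, p\rangle \otimes \langle n, p, m\rangle \otimes \langle p, m, n\rangle \simeq \langle mnp, mnp, mnp\rangle$. A small sketch evaluating this bound (the function name is ours):

```python
import math


def omega_upper_bound(m, n, p, r):
    """Upper bound on omega implied by R(<m, n, p>) <= r.

    Uses omega <= 3 log(r) / log(mnp); for m = n = p this is log_n(r).
    """
    if m * n * p <= 1:
        raise ValueError("need mnp > 1")
    return 3 * math.log(r) / math.log(m * n * p)


print(omega_upper_bound(2, 2, 2, 7))  # approximately 2.807 (log2 7): Strassen
print(omega_upper_bound(2, 2, 2, 8))  # approximately 3: the naive algorithm
```

With Strassen's rank-7 decomposition this recovers $\omega < 2.81$, while the naive $mnp$ terms only give the trivial bound 3.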