Fast Matrix Product Algorithms: From Theory To Practice

Thomas Sibut-Pinote (Inria, École Polytechnique, France)
Éric Schost (University of Waterloo, ON, Canada)
November 2nd, 2015

Outline: Introduction and Definitions; The $\tau$-theorem; Pan’s aggregation tables and the $\tau$-theorem; Software Implementation; Conclusion

Motivation
Complexity of matrix product ⇒ complexity of linear algebra.
$\omega = \inf \{ \theta \mid \text{it takes } n^{\theta} \text{ operations to multiply in } M_n(K) \} \in [2, 3]$.
Strassen ’69: $\omega < 2.81$ (used in practice).
Le Gall ’14: $\omega < 2.3728639$ (theoretical).
Can we bridge the gap a little?

Problem Statement
Let $\langle m, n, p \rangle$ denote the bilinear map
$$M_{m,n}(K) \times M_{n,p}(K) \longrightarrow M_{m,p}(K), \qquad (A, B) \longmapsto A \cdot B.$$
Goal: determine the arithmetic complexity of $\langle m, n, p \rangle$.
Known: the naive algorithm uses $mnp$ multiplications:
$$\forall i \in \{1, \dots, m\},\ \forall j \in \{1, \dots, p\}, \qquad [AB]_{i,j} = \sum_{k=1}^{n} a_{i,k}\, b_{k,j}.$$
Can we do better?
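As a baseline for what follows, here is a minimal sketch of the naive algorithm with an explicit multiplication counter. The function name `naive_matmul` and the list-of-rows representation are illustrative choices, not something taken from the slides.

```python
def naive_matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows).

    Returns (C, mults), where mults counts the scalar multiplications used.
    """
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0] * p for _ in range(m)]
    mults = 0
    for i in range(m):
        for j in range(p):
            for k in range(n):
                # [AB]_{i,j} = sum over k of a_{i,k} * b_{k,j}
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [0, 1], [2, 2]]      # 3 x 2
C, mults = naive_matmul(A, B)     # mults == 2 * 3 * 2
```

The counter makes the $mnp$ cost visible: for this $2 \times 3$ by $3 \times 2$ product it ends at 12.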

Strassen’s Algorithm
Strassen’s algorithm: $\langle 2, 2, 2 \rangle$ in 7 multiplications (instead of $2 \cdot 2 \cdot 2 = 8$):
$$
\begin{aligned}
\alpha_1 &= a_{1,2} - a_{2,2}, & \beta_1 &= b_{2,1} + b_{2,2}, & p_1 &= \alpha_1 \beta_1 \\
\alpha_2 &= a_{2,1} - a_{1,1}, & \beta_2 &= b_{1,2} + b_{1,1}, & p_2 &= \alpha_2 \beta_2 \\
\alpha_3 &= a_{1,1}, & \beta_3 &= b_{1,2} - b_{2,2}, & p_3 &= \alpha_3 \beta_3 \\
\alpha_4 &= a_{2,2}, & \beta_4 &= b_{2,1} - b_{1,1}, & p_4 &= \alpha_4 \beta_4 \\
\alpha_5 &= a_{2,1} + a_{2,2}, & \beta_5 &= b_{1,1}, & p_5 &= \alpha_5 \beta_5 \\
\alpha_6 &= a_{1,2} + a_{1,1}, & \beta_6 &= b_{2,2}, & p_6 &= \alpha_6 \beta_6 \\
\alpha_7 &= a_{1,1} + a_{2,2}, & \beta_7 &= b_{1,1} + b_{2,2}, & p_7 &= \alpha_7 \beta_7
\end{aligned}
$$
The entries of $C = AB$ are then
$$
\begin{aligned}
c_{1,1} &= p_1 + p_4 - p_6 + p_7, & c_{1,2} &= p_3 + p_6, \\
c_{2,1} &= p_4 + p_5, & c_{2,2} &= p_2 + p_3 - p_5 + p_7,
\end{aligned}
\qquad
C = \begin{pmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{pmatrix}.
$$
Observe:
$$C = p_1 \gamma_1 + p_2 \gamma_2 + p_3 \gamma_3 + p_4 \gamma_4 + p_5 \gamma_5 + p_6 \gamma_6 + p_7 \gamma_7,$$
where
$$
\gamma_1 = E_{1,1},\quad \gamma_2 = E_{2,2},\quad \gamma_3 = E_{1,2} + E_{2,2},\quad \gamma_4 = E_{1,1} + E_{2,1},\quad \gamma_5 = E_{2,1} - E_{2,2},\quad \gamma_6 = E_{1,2} - E_{1,1},\quad \gamma_7 = E_{1,1} + E_{2,2},
$$
and $(E_{i,j})$ is the canonical basis of $M_2(K)$.
Tensor notation: $\sum_{i=1}^{7} \alpha_i \otimes \beta_i \otimes \gamma_i$.
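A scheme like this is easy to get wrong by a sign or an index, so it helps to run it against the ordinary product. The sketch below hard-codes the seven products $p_1, \dots, p_7$ listed above and recombines them so that the result equals $AB$ entry by entry; the function names and the random test are illustrative additions, not part of the slides.

```python
import random


def strassen_2x2(A, B):
    """Multiply two 2 x 2 matrices with 7 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p1 = (a12 - a22) * (b21 + b22)
    p2 = (a21 - a11) * (b12 + b11)
    p3 = a11 * (b12 - b22)
    p4 = a22 * (b21 - b11)
    p5 = (a21 + a22) * b11
    p6 = (a12 + a11) * b22
    p7 = (a11 + a22) * (b11 + b22)
    # Recombination chosen so that the output is exactly A * B.
    return [[p1 + p4 - p6 + p7, p3 + p6],
            [p4 + p5, p2 + p3 - p5 + p7]]


def naive_2x2(A, B):
    """Reference product with 8 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Compare both methods on random integer matrices.
random.seed(0)
all_match = True
for _ in range(1000):
    A = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    B = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    if strassen_2x2(A, B) != naive_2x2(A, B):
        all_match = False
```

Each product keeps its $A$-factor on the left and its $B$-factor on the right, so the same formulas remain valid when the entries are themselves matrix blocks.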

Tensors and algorithms
General tensor notation identified with a bilinear map:
$$\langle m, n, p \rangle = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{n} a_{i,k} \otimes b_{k,j} \otimes c_{i,j}.$$
Representing $\langle m, n, p \rangle$ as $\sum_{i=1}^{r} \alpha_i \otimes \beta_i \otimes \gamma_i$ gives an algorithm.
Example: The elementary tensor $(a_{1,2} + a_{3,5}) \otimes b_{2,4} \otimes (c_{1,4} + c_{2,4})$ reads as the algorithm
tmp ← $(a_{1,2} + a_{3,5}) \cdot b_{2,4}$
$c_{1,4}$ ← tmp
$c_{2,4}$ ← tmp
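The reading of a tensor as an algorithm can be made mechanical. The following sketch assumes a hypothetical encoding in which each elementary tensor is a triple of dictionaries mapping 1-based index pairs to coefficients; `apply_tensor` and `naive_tensor` are illustrative names. When several terms are summed, contributions to the same output entry are accumulated.

```python
def apply_tensor(terms, A, B, m, p):
    """Evaluate sum_i alpha_i (x) beta_i (x) gamma_i on matrices A and B.

    Each term (alpha, beta, gamma) costs one multiplication between a
    linear form in the entries of A and a linear form in the entries of B.
    """
    C = [[0] * p for _ in range(m)]
    for alpha, beta, gamma in terms:
        left = sum(c * A[i - 1][k - 1] for (i, k), c in alpha.items())
        right = sum(c * B[k - 1][j - 1] for (k, j), c in beta.items())
        tmp = left * right
        for (i, j), c in gamma.items():
            C[i - 1][j - 1] += c * tmp
    return C


def naive_tensor(m, n, p):
    """The mnp-term decomposition sum a_{i,k} (x) b_{k,j} (x) c_{i,j}."""
    return [({(i, k): 1}, {(k, j): 1}, {(i, j): 1})
            for i in range(1, m + 1)
            for j in range(1, p + 1)
            for k in range(1, n + 1)]


# Arbitrary test data: A is 3 x 5, B is 5 x 4 (entries indexed from 1).
A = [[10 * i + k for k in range(1, 6)] for i in range(1, 4)]
B = [[k * j for j in range(1, 5)] for k in range(1, 6)]

# The elementary tensor (a_{1,2} + a_{3,5}) (x) b_{2,4} (x) (c_{1,4} + c_{2,4}).
example = [({(1, 2): 1, (3, 5): 1}, {(2, 4): 1}, {(1, 4): 1, (2, 4): 1})]
C_example = apply_tensor(example, A, B, 3, 4)

# The full naive tensor reproduces the ordinary product.
C_full = apply_tensor(naive_tensor(3, 5, 4), A, B, 3, 4)
```

With this data $a_{1,2} = 12$, $a_{3,5} = 35$ and $b_{2,4} = 8$, so the example writes $47 \cdot 8 = 376$ into $c_{1,4}$ and $c_{2,4}$ and leaves every other entry at zero.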

Composition
$t \otimes t'$: computes the composition of two tensors.
To multiply $A$ of size $(mm', nn')$ by $B$ of size $(nn', pp')$, decompose $A$ and $B$ into blocks:
$$
A = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{pmatrix},
\qquad
B = \begin{pmatrix} B_{1,1} & \cdots & B_{1,p} \\ \vdots & & \vdots \\ B_{n,1} & \cdots & B_{n,p} \end{pmatrix},
$$
where $A_{i,j}$ is of size $(m', n')$ and $B_{j,k}$ is of size $(n', p')$.
If $t = \langle m, n, p \rangle$ and $t' = \langle m', n', p' \rangle$: $t \otimes t' \simeq \langle mm', nn', pp' \rangle$.
Also set $t^{\otimes k} = \underbrace{t \otimes t \otimes \cdots \otimes t}_{k \text{ times}} \simeq \langle m^k, n^k, p^k \rangle$.
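For $t = \langle 2, 2, 2 \rangle$ with the 7-multiplication scheme, composing $t$ with itself $k$ times amounts to applying that scheme recursively to blocks. The sketch below is one possible way to do this for $2^k \times 2^k$ matrices stored as lists of rows; the helper names are illustrative, and it counts scalar multiplications so the $7^k$ cost can be observed.

```python
def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def split(M):
    """Split a square matrix of even size into four equal blocks."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Return (A * B, scalar multiplication count) for 2^k x 2^k inputs."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]], 1
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # The seven block products: A-part on the left, B-part on the right.
    pairs = [(sub(a12, a22), add(b21, b22)),
             (sub(a21, a11), add(b12, b11)),
             (a11, sub(b12, b22)),
             (a22, sub(b21, b11)),
             (add(a21, a22), b11),
             (add(a12, a11), b22),
             (add(a11, a22), add(b11, b22))]
    results = [strassen(X, Y) for X, Y in pairs]
    p1, p2, p3, p4, p5, p6, p7 = [r[0] for r in results]
    count = sum(r[1] for r in results)
    c11 = add(sub(add(p1, p4), p6), p7)
    c12 = add(p3, p6)
    c21 = add(p4, p5)
    c22 = add(sub(add(p2, p3), p5), p7)
    top = [r + s for r, s in zip(c11, c12)]
    bottom = [r + s for r, s in zip(c21, c22)]
    return top + bottom, count


# An 8 x 8 example, i.e. k = 3 levels of recursion.
N = 8
A = [[i + 2 * j for j in range(N)] for i in range(N)]
B = [[i * j - 3 for j in range(N)] for i in range(N)]
C, count = strassen(A, B)
```

For $N = 8$ the counter ends at $7^3 = 343$, compared with $8^3 = 512$ for the naive method.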

Direct Sum of Tensors
$t \oplus t'$: computes two independent matrix products in parallel.
We write $s \odot t$ for $\underbrace{t \oplus t \oplus \cdots \oplus t}_{s \text{ times}}$.

Rank and ω
Definition (Rank of a Tensor $t$)
$$R(t) := \min \left\{ r \;\middle|\; t \text{ can be written as } \sum_{i=1}^{r} x_i \otimes y_i \otimes z_i \right\}$$
$R(\langle m, n, p \rangle)$ is the minimal number of multiplications for $\langle m, n, p \rangle$.
Definition (Linear Algebra Exponent)
$$\omega := \inf \{ \tau \mid \text{there exists an algorithm to multiply } n \times n \text{ matrices in } O(n^{\tau}) \text{ additions and multiplications} \} \in [2, 3]$$
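The two definitions are linked by a standard fact that this slide does not spell out: if $R(\langle n, n, n \rangle) \le r$, then applying the rank-$r$ algorithm recursively to blocks gives $\omega \le \log_n r$. A small sketch of the resulting arithmetic, with `exponent_bound` as an illustrative name:

```python
import math


def exponent_bound(n, r):
    """Upper bound log_n(r) on omega implied by R(<n, n, n>) <= r."""
    return math.log(r) / math.log(n)


strassen_bound = exponent_bound(2, 7)  # log_2(7), roughly 2.807
naive_bound = exponent_bound(2, 8)     # log_2(8) = 3, the trivial bound
```

With $n = 2$ and $r = 7$ this recovers Strassen’s bound $\omega < 2.81$.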
