Sparse Computations and Multi-BSP

Albert-Jan Yzelman
Parallel Computing & Big Data, Huawei Technologies France
October 11, 2016
BSP

BSP machine = { sequential processor } + interconnect.

The machine is described entirely by (p, g, L):
- strobing synchronisation,
- homogeneous processing,
- uniform full-duplex network.
BSP

BSP algorithm:
- strobing barriers,
- full-overlap h-relations,
- bottlenecks: max_s { sent_s, recv_s } and the work balance.

L. G. Valiant, A bridging model for parallel computation, CACM, 1990.
BSP

BSP cost:

T_p = max_s w_s^(0) + L + max{ max_s w_s^(1) + L, g max_s h_s^(1) + L } + ...

Separation of computation vs. communication; separation of algorithm vs. hardware.

L. G. Valiant, A bridging model for parallel computation, CACM, 1990.
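As an illustration (not from the original slides), a minimal C++ sketch of how such a cost expression can be evaluated, assuming the per-superstep work w[k][s] and h-relations h[k][s] are given as plain arrays; the overlap flag mimics the full-overlap variant by taking the maximum of the compute and communication terms instead of their sum.

    #include <algorithm>
    #include <vector>

    // Evaluate a BSP cost expression over a sequence of supersteps.
    // w[k][s]: work of processor s in superstep k (flops);
    // h[k][s]: max{sent_s, recv_s} of processor s in superstep k (words).
    double bsp_cost(const std::vector<std::vector<double>>& w,
                    const std::vector<std::vector<double>>& h,
                    double g, double L, bool overlap) {
        double T = 0.0;
        for (std::size_t k = 0; k < w.size(); ++k) {
            const double w_max = *std::max_element(w[k].begin(), w[k].end());
            const double h_max = *std::max_element(h[k].begin(), h[k].end());
            // Classical form: w + g*h + L per superstep; full-overlap form: max{w, g*h} + L.
            T += (overlap ? std::max(w_max, g * h_max) : w_max + g * h_max) + L;
        }
        return T;
    }

For example, a single superstep with w = {4, 6}, h = {2, 1}, g = 1 and L = 3 costs 6 + 2 + 3 = 11 in the classical form and max{6, 2} + 3 = 9 with full overlap.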
Immortal algorithms

The BSP paradigm allows the design of immortal algorithms:
- given a problem to compute,
- given a BSP computer (p, g, l),
- find the BSP algorithm that attains provably minimal cost.

E.g., fast Fourier transforms, matrix–matrix multiplication.

Thinking in Sync: the Bulk-Synchronous Parallel approach to large-scale computing. Bisseling and Yzelman, ACM Hot Topic '16.
http://www.computingreviews.com/hottopic/hottopic_essay.cfm?htname=BSP
BSP sparse matrix–vector multiplication

Variables A_s, x_s, y_s are local versions of the global variables A, x, y, distributed according to π_A, π_x, π_y.

1: for j | ∃ a_ij ≠ 0 ∈ A_s and π_x(j) ≠ s do
2:     get x_{π_x(j), j}
3: sync   { execute fan-out }
4: y_s = A_s x_s   { local multiplication stage }
5: for i | ∃ a_ij ∈ A_s and π_y(i) ≠ s do
6:     send (i, y_{s,i}) to π_y(i)
7: sync   { execute fan-in }
8: for all (i, α) received do
9:     add α to y_{s,i}

Rob H. Bisseling, "Parallel Scientific Computation", Oxford University Press, 2004.
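As a concrete illustration of the three supersteps, a small sequential C++ simulation (an editorial sketch, not code from the talk); the triplet structure, the per-processor maps, and the distributions pi_x, pi_y are assumptions made for the example only.

    #include <cstdio>
    #include <map>
    #include <vector>

    // Illustrative triplet: nonzero a_ij owned by processor s = pi_A(i, j).
    struct Nonzero { int i, j, s; double a; };

    // Sequential simulation of the three BSP SpMV supersteps on p processors.
    // pi_x[j] owns x_j, pi_y[i] owns y_i. Returns the owners' view of y = A x.
    std::map<int, double> bsp_spmv(int p, const std::vector<Nonzero>& A,
                                   const std::vector<double>& x,
                                   const std::vector<int>& pi_x,
                                   const std::vector<int>& pi_y) {
        std::vector<std::map<int, double>> x_loc(p), y_loc(p);
        long h_fanout = 0, h_fanin = 0;

        // Superstep 1, fan-out: get x_{pi_x(j),j} whenever a_ij is in A_s.
        for (const Nonzero& nz : A)
            if (!x_loc[nz.s].count(nz.j)) {
                x_loc[nz.s][nz.j] = x[nz.j];
                if (pi_x[nz.j] != nz.s) ++h_fanout;   // only the remote get costs communication
            }

        // Superstep 2: local multiplication y_s = A_s x_s.
        for (const Nonzero& nz : A)
            y_loc[nz.s][nz.i] += nz.a * x_loc[nz.s][nz.j];

        // Superstep 3, fan-in: send (i, y_{s,i}) to pi_y(i) and add it there.
        std::map<int, double> y;
        for (int s = 0; s < p; ++s)
            for (const auto& [i, alpha] : y_loc[s]) {
                if (pi_y[i] != s) ++h_fanin;          // only the remote send costs communication
                y[i] += alpha;
            }

        std::printf("fan-out volume %ld, fan-in volume %ld\n", h_fanout, h_fanin);
        return y;
    }

    int main() {
        // Tiny example on p = 2: A = [[4, 1], [0, 3]], row/column i owned by processor i.
        std::vector<Nonzero> A = { {0, 0, 0, 4.0}, {0, 1, 0, 1.0}, {1, 1, 1, 3.0} };
        auto y = bsp_spmv(2, A, {1.0, 2.0}, {0, 1}, {0, 1});
        for (auto [i, v] : y) std::printf("y[%d] = %g\n", i, v);
    }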
BSP sparse matrix–vector multiplication

Suppose π_A assigns every nonzero a_ij ∈ A to processor π_A(i, j). If
1. π_y(i) ∈ { s | ∃ a_ij ∈ A, π_A(i, j) = s } and
2. π_x(j) ∈ { s | ∃ a_ij ∈ A, π_A(i, j) = s },
then
- fan-out communication scatters Σ_j (λ_j^col − 1) elements from x,
- fan-in communication gathers Σ_i (λ_i^row − 1) elements from y,
where λ_i^row = |{ s | ∃ a_ij ∈ A_s }| and λ_j^col = |{ s | ∃ a_ij ∈ A_s }|.

Minimising the λ − 1 metric minimises total communication volume.
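The λ − 1 metric is easy to compute from a nonzero-to-processor assignment. A short sketch (editorial, using an assumed triplet representation) that counts the fan-out and fan-in volumes under conditions 1 and 2 above:

    #include <set>
    #include <vector>

    struct Nonzero { int i, j, s; };   // nonzero a_ij assigned to processor s = pi_A(i, j)

    // Total communication volume of the BSP SpMV under the lambda-1 metric:
    // fan-out moves sum_j (lambda_col_j - 1) entries of x,
    // fan-in  moves sum_i (lambda_row_i - 1) entries of y.
    long communication_volume(int m, int n, const std::vector<Nonzero>& A) {
        std::vector<std::set<int>> row_owners(m), col_owners(n);
        for (const Nonzero& nz : A) {
            row_owners[nz.i].insert(nz.s);     // lambda_row_i = |{ s : a_ij in A_s }|
            col_owners[nz.j].insert(nz.s);     // lambda_col_j = |{ s : a_ij in A_s }|
        }
        long volume = 0;
        for (const auto& owners : col_owners)
            if (!owners.empty()) volume += owners.size() - 1;   // fan-out of x_j
        for (const auto& owners : row_owners)
            if (!owners.empty()) volume += owners.size() - 1;   // fan-in of y_i
        return volume;
    }

A partitioner minimising this quantity therefore minimises the total SpMV communication volume.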
BSP sparse matrix–vector multiplication

Partitioning combined with reordering illustrates clear separators:

(Figure: the partitioned and reordered matrix, with its row and column blocks labelled 1–4.)

Group nonzeroes a_ij for which π_A(i) = π_A(j), permute rows i with λ_i > 1 in between, apply recursive bipartitioning.
BSP sparse matrix–vector multiplication

When partitioning in both dimensions:
BSP sparse matrix–vector multiplication

Classical worst-case bounds (in flops):

Block:   2 nz(A)/p (1 + ε) + n/p (√p − 1)(2g + 1) + 2l.
Row 1D:  2 nz(A)/p (1 + ε) + g h_fan-out + l.
Col 1D:  2 nz(A)/p (1 + ε) + max_s recv_s^fan-in + g h_fan-in + l.
Full 2D: 2 nz(A)/p (1 + ε) + max_s recv_s^fan-in + g (h_fan-out + h_fan-in) + 2l.

Memory overhead (buffers):

Θ( Σ_i (λ_i^row − 1) + Σ_j (λ_j^col − 1) ) = O( Σ_{λ ∈ λ^row ∪ λ^col} p · 1_{λ > 1} ).

Depending on the higher-level algorithm:
- fan-in latency can be hidden behind other kernels,
- fan-out latency can be hidden as well.
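These bounds can be transcribed directly into code to compare distributions on a concrete matrix. The sketch below is illustrative; h_out, h_in and r_in stand for h_fan-out, h_fan-in and max_s recv_s^fan-in, which are data-dependent quantities that must be measured or bounded separately.

    #include <cmath>

    // Direct transcriptions of the four worst-case SpMV bounds above (in flops).
    // nz = nz(A); eps absorbs the load imbalance of the local multiplications.
    struct Bounds { double block, row1d, col1d, full2d; };

    Bounds spmv_bounds(double nz, double n, double p, double eps,
                       double g, double l, double h_out, double h_in, double r_in) {
        const double work = 2.0 * nz / p * (1.0 + eps);
        Bounds b;
        b.block  = work + n / p * (std::sqrt(p) - 1.0) * (2.0 * g + 1.0) + 2.0 * l;
        b.row1d  = work + g * h_out + l;
        b.col1d  = work + r_in + g * h_in + l;
        b.full2d = work + r_in + g * (h_out + h_in) + 2.0 * l;
        return b;
    }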
Multi-BSP

Multi-BSP computer = p (subcomputers or processors) + M bytes of local memory + an interconnect.

A total of 4L parameters: (p_0, g_0, l_0, M_0, ..., p_{L−1}, g_{L−1}, l_{L−1}, M_{L−1}).

Advantages: memory-aware, non-uniform!
Disadvantages: (likely) harder to prove optimality.

L. G. Valiant, A bridging model for multi-core computing, CACM, 2011.
Multi-BSP

An example with L = 3 quadlets (p, g, l, M):

C = (2, g_0, l_0, M_0) (4, g_1, l_1, M_1) (8, g_2, l_2, M_2)

Each quadlet runs its own BSP SPMD program.
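One way to represent such a machine in code (an illustrative sketch; the numeric g, l, M values below are placeholders, not measured parameters):

    #include <cstdio>
    #include <vector>

    // One Multi-BSP level: the quadlet (p, g, l, M).
    struct Level { int p; double g, l; long M; };

    // Total number of leaf processors: the product of all p_k.
    long leaves(const std::vector<Level>& machine) {
        long n = 1;
        for (const Level& lvl : machine) n *= lvl.p;
        return n;
    }

    int main() {
        // The L = 3 example from the slide: (2, g0, l0, M0) (4, g1, l1, M1) (8, g2, l2, M2).
        std::vector<Level> C = { {2, 1.0, 10.0,   1L << 30},
                                 {4, 2.0, 100.0,  1L << 24},
                                 {8, 4.0, 1000.0, 1L << 16} };
        std::printf("leaf processors: %ld\n", leaves(C));   // 2 * 4 * 8 = 64
    }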
Multi-BSP SpMV multiplication

SPMD-style Multi-BSP SpMV multiplication:
- define process 0 at level −1 as the Multi-BSP root,
- let process s at level k have parent t at level k − 1,
- define (A_{−1,0}, x_{−1,0}, y_{−1,0}) = (A, x, y), the original input.
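One possible indexing of the process tree implied by this setup (an editorial sketch under assumptions of my own; the slide does not fix a numbering):

    #include <vector>

    // Process s at level k has parent s / p[k] at level k - 1; process 0 at level -1
    // is the root that owns the original (A, x, y).
    struct ProcId { int level; long s; };

    ProcId parent(const ProcId& proc, const std::vector<int>& p) {
        return { proc.level - 1, proc.s / p[proc.level] };
    }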