A Sparse Tensor Format and a Benchmark Suite
Jiajia Li, Pacific Northwest National Laboratory
January 25, 2019 @ MIT
Figure sources: “A brief survey of tensors” by Berton Earnshaw and NVIDIA Tensor Cores
HiCOO: Hierarchical Storage of Sparse Tensors
Jiajia Li 1,2, Jimeng Sun 1, Richard Vuduc 1
1 Georgia Institute of Technology, 2 Pacific Northwest National Laboratory
SUNLAB
Code: https://github.com/hpcgarage/ParTI (v1.0.0)
Figure sources: “A brief survey of tensors” by Berton Earnshaw and NVIDIA Tensor Cores
Challenges
• Compactness: a space-efficient data structure
• Mode-genericity: efficient traversals of the data structure for computations
The concept of “mode-genericity” is inherited from [Baskaran et al. 2012].
[Baskaran et al. 2012] M. Baskaran et al., “Efficient and scalable computations with sparse tensors,” HPEC 2012.
Baseline Sparse Tensor Formats in This Work
• COO: coordinate format [Bader et al. 2006]
• CSF: compressed sparse fibers, an extension of CSR [Smith et al. 2015]
• F-COO: flagged COO format [Liu et al. 2017]
[Figure: an I x J x K sparse tensor stored as (a) COO, (b) CSF, and (c) F-COO. COO is mode-generic; CSF and F-COO are mode-specific and prefer different representations for different modes.]
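For reference, a minimal C sketch of the COO baseline (illustrative, not the ParTI structs): one index array per mode plus the values, which every kernel traverses in the same mode-generic way.

```c
#include <stdint.h>

/* Illustrative third-order COO tensor: parallel index arrays plus values.
 * Kernels in any mode loop over the nnz entries in the same way. */
typedef struct {
    uint64_t  nnz;          /* number of nonzero entries */
    uint32_t *i, *j, *k;    /* mode-1/2/3 coordinates of each nonzero */
    double   *val;          /* nonzero values */
} coo3_t;
```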
Mode-Specific Tensor Formats
Three CSF/F-COO representations are required/preferred for three kernels.
[Figure: the mode-1, mode-2, and mode-3 kernels of a tensor decomposition, each fed by its own representation (CSF-1, CSF-2, CSF-3) of the same tensor.]
Mode-Specific Tensor Formats
Three CSF/F-COO representations are required/preferred for three kernels.
[Figure: with only CSF-1 available, the mode-1 kernel of a tensor decomposition is served, while performance drops for the mode-2 and mode-3 kernels.]
Mode Orientation
[Table: for the mode-1, mode-2, and mode-3 kernels of a tensor decomposition, a mode-1-oriented mode-specific format (CSF/F-COO) is compared against the mode-generic formats Coordinate (COO) and HiCOO; each cell is marked efficient or inefficient.]
HiCOO Format
• Store a sparse tensor in units of small sparse blocks.
[Figure: an I x J x K sparse tensor stored in COO (i, j, k, val) and in HiCOO, where the nonzeros are grouped into blocks B1–B4 of size 2*2*2 indexed by (bptr, bi, bj, bk), with per-nonzero element indices (ei, ej, ek) and values.]
HiCOO extends the Compressed Sparse Blocks (CSB) format for matrices [Buluc et al., SPAA 2009].
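The blocked layout above can be held as a few flat arrays. A minimal sketch in C for a third-order tensor; the struct and field names are illustrative, not the ParTI implementation.

```c
#include <stdint.h>

/* Illustrative HiCOO layout for a third-order tensor. Each nonzero block keeps
 * 32-bit block indices and a pointer into the nonzero range; the nonzeros
 * themselves keep only 8-bit in-block offsets and their values. */
typedef struct {
    uint64_t  nnz;            /* number of nonzeros */
    uint64_t  nnb;            /* number of nonzero blocks */
    uint64_t *bptr;           /* nnb+1 entries: start of each block's nonzero range */
    uint32_t *bi, *bj, *bk;   /* per-block indices (32-bit) */
    uint8_t  *ei, *ej, *ek;   /* per-nonzero in-block offsets (8-bit) */
    double   *val;            /* nonzero values */
} hicoo3_t;
```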
HiCOO Format
• Store a sparse tensor in units of small sparse blocks.
• Shorten the bit-length of element indices: block indices (bi, bj, bk) use 32 bits, element indices (ei, ej, ek) use 8 bits.
[Figure: the same COO vs. HiCOO layout as before, with block size 2*2*2.]
A global index is recovered as i = bi * B + ei.
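A minimal C sketch of that index recovery (the function name is illustrative, not the ParTI API):

```c
#include <stdint.h>

/* Recover a global mode index from a block index and an 8-bit in-block offset,
 * following the slide's formula i = bi * B + ei. With a power-of-two block
 * size (the slide uses B = 2) this is just a shift plus the low bits. */
static inline uint32_t hicoo_global_index(uint32_t bi, uint8_t ei, uint32_t B)
{
    return bi * B + ei;   /* equivalently (bi << log2(B)) | ei for power-of-two B */
}
```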
HiCOO Format
• Store a sparse tensor in units of small sparse blocks.
• Shorten the bit-length of element indices (32-bit block indices, 8-bit element indices).
• Compress the number of block indices.
[Figure: the same COO vs. HiCOO layout as before.]
HiCOO Format
• Store a sparse tensor in units of small sparse blocks.
• Shorten the bit-length of element indices (32-bit block indices, 8-bit element indices).
• Compress the number of block indices.
Index storage (in bits):
• COO indices: nnz * 3 * 32
• HiCOO indices: nnz * 3 * 8 + nnb * (3 * 32 + 32)
where i = bi * B + ei, nnz is the number of nonzeros, and nnb is the number of nonzero blocks.
[Figure: the same COO vs. HiCOO layout as before.]
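The two index-storage formulas above translate directly into code; a small sketch that evaluates them, in bits, for a given nnz and nnb (the function names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* Index storage in bits for a third-order tensor, following the slide:
 * COO keeps three 32-bit indices per nonzero; HiCOO keeps three 8-bit element
 * indices per nonzero plus three 32-bit block indices and a 32-bit block
 * pointer per nonzero block. */
static uint64_t coo_index_bits(uint64_t nnz)                 { return nnz * 3 * 32; }
static uint64_t hicoo_index_bits(uint64_t nnz, uint64_t nnb) { return nnz * 3 * 8 + nnb * (3 * 32 + 32); }

int main(void)
{
    uint64_t nnz = 8, nnb = 4;   /* the toy tensor on the slide: 8 nonzeros in 4 blocks */
    printf("COO:   %llu bits\n", (unsigned long long)coo_index_bits(nnz));
    printf("HiCOO: %llu bits\n", (unsigned long long)hicoo_index_bits(nnz, nnb));
    return 0;
}
```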
HiCOO Format
• Store a sparse tensor in units of small sparse blocks.
• Shorten the bit-length of element indices (32-bit block indices, 8-bit element indices).
• Compress the number of block indices.
• Works for arbitrary-order sparse tensors.
For a sparse tensor, HiCOO reduces its storage and memory footprint; for matrices, it gives better data locality.
[Figure: the same COO vs. HiCOO layout as before.]
Platform and Dataset
Platform: an Intel Xeon E7-4850 v3 system with 56 physical cores, compiled with icc 18.0.2 and parallelized with OpenMP.
Dataset: FROSTT [Smith et al. 2017], HaTen2 [Jeon et al. 2015], and healthcare data [Perros et al. 2017].
Multicore CP-ALS
HiCOO outperforms COO by 6.2× and CSF by 2.1× on average.
[Figure: speedup over CSF vs. compression ratio relative to CSF (higher is better on both axes) for 3D and 4D tensors (choa, crime, darpa, fb-m, fb-s, nips, nell1, nell2, flickr, deli, deli4d, enron), comparing HiCOO, CSF-1, and COO.]
Following Work
• HiCOO for other tensor operations and Tucker decomposition
• HiCOO-based MTTKRP/CPD on GPUs and distributed systems
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
Jiajia Li 1, Yuchen Ma 2, Xiaolong Wu 3, Ang Li 1, Kevin Barker 1
1 Pacific Northwest National Laboratory, 2 Hangzhou Dianzi University, 3 Virginia Tech
Code: https://gitlab.com/tensorworld/pasta
Figure sources: “A brief survey of tensors” by Berton Earnshaw and NVIDIA Tensor Cores
PASTA Workloads
Workloads: TEW (tensor element-wise), TS (tensor-scalar), TTV (tensor-times-vector), TTM (tensor-times-matrix), and MTTKRP (matricized tensor-times-Khatri-Rao product).
Platforms: single-core CPUs and multi-core CPUs.
Data structures/algorithms: COO.
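As a concrete example of one workload on the single-core CPU row, a sketch of a mode-3 TTV over a COO tensor (illustrative, not the PASTA code; the output is kept as a dense I x J array here):

```c
#include <stdint.h>

/* Mode-3 TTV on a COO tensor: Y(i,j) = sum_k X(i,j,k) * v(k).
 * Y is a dense I x J array (row-major), assumed zero-initialized by the caller. */
void ttv_mode3_coo(uint64_t nnz, const uint32_t *i, const uint32_t *j,
                   const uint32_t *k, const double *val,
                   const double *v, double *Y, uint64_t J)
{
    for (uint64_t x = 0; x < nnz; ++x)
        Y[(uint64_t)i[x] * J + j[x]] += val[x] * v[k[x]];
}
```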
PASTA Workloads
• Supports tensors of arbitrary shape and nonuniform nonzero pattern.
[Table: the same workloads (TEW, TS, TTV, TTM, MTTKRP), platforms (single-core and multi-core CPUs), and COO data structure as before.]
PASTA Workloads
• Each workload is parallelized with a natural strategy: over nonzero partitions, over nonzeros, over nonzero fibers, or over nonzeros with atomics (the per-kernel mapping is shown in the slide's table).
[Table: the same workloads, platforms, and COO data structure as before, annotated with the parallelization strategy used for each kernel.]
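A sketch of the "parallelize nonzeros with atomics" strategy applied to a mode-1 COO MTTKRP with OpenMP (illustrative, not the PASTA implementation):

```c
#include <stdint.h>
#include <omp.h>

/* Mode-1 COO MTTKRP: for each nonzero (i,j,k,v),
 *   M[i][r] += v * B[j][r] * C[k][r]  for r = 0..R-1.
 * Threads split the nonzeros; atomics guard the shared output rows of M.
 * B is J x R, C is K x R, M is I x R, all dense and row-major. */
void mttkrp_mode1_coo(uint64_t nnz, const uint32_t *i, const uint32_t *j,
                      const uint32_t *k, const double *val,
                      const double *B, const double *C, double *M, int R)
{
    #pragma omp parallel for schedule(static)
    for (int64_t x = 0; x < (int64_t)nnz; ++x) {
        const double  v  = val[x];
        const double *Bj = B + (uint64_t)j[x] * R;
        const double *Ck = C + (uint64_t)k[x] * R;
        double       *Mi = M + (uint64_t)i[x] * R;
        for (int r = 0; r < R; ++r) {
            #pragma omp atomic
            Mi[r] += v * Bj[r] * Ck[r];
        }
    }
}
```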
Memory-Bound Workloads
Following Work
• Include HiCOO, CSF, and other formats
• Support GPUs, and FPGAs in the longer term
Other Recent Work
• A dynamic sparse tensor structure for tensor contraction
  Collaborators: Sriram Krishnamoorthy (PNNL)
  Application: quantum chemistry, NWChemEx
• Hybrid formats and nonzero partitioning strategies
  Collaborators: Israt Nisa (OSU), P. (Saday) Sadayappan (OSU), Sriram Krishnamoorthy (PNNL)
Acknowledgement