Sparse tensors are a natural way of representing real-world data

[Figure: a sparse tensor relating customers (Peter, Lilly, Paul, Billy, Hilde, Bob, Sam, Mary, …), products (The Iliad, Dubliners, Monitor, Sweater, Laptop, Candide, Jacket, Kindle, …), and attributes (Durable, Poor Quality, …); only a few entries are nonzero.]

Dense storage: 107 exabytes
Sparse storage: 13 gigabytes

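The gap between the two storage figures comes from what each layout scales with: a dense array stores every entry of the full index space, while a coordinate (COO) list stores only the nonzeros plus their indices. A minimal sketch, assuming 8-byte double values and 8-byte indices; `dense_bytes` and `coo_bytes` are hypothetical helpers, and the example sizes below are illustrative, not the data behind the slide's numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes for a dense tensor: one 8-byte double per entry of the full index
// space. long double is used because dense sizes can exceed 2^64 bytes
// (107 exabytes already does).
long double dense_bytes(const std::vector<std::uint64_t>& dims) {
  long double entries = 1;
  for (std::uint64_t d : dims) entries *= d;  // product of dimension sizes
  return entries * sizeof(double);
}

// Bytes for the same tensor in COO: for each nonzero, one 8-byte index per
// dimension plus one 8-byte double value.
long double coo_bytes(std::size_t order, std::uint64_t nnz) {
  assert(order > 0);
  return static_cast<long double>(nnz) *
         (order * sizeof(std::int64_t) + sizeof(double));
}
```

For a hypothetical 1,000,000 × 100,000 × 100 tensor with 10^8 nonzeros, `dense_bytes` gives 8 × 10^13 bytes (80 TB) while `coo_bytes` gives 3.2 × 10^9 bytes (3.2 GB): the dense size follows the product of the dimensions, the sparse size follows the nonzero count.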
There exist many different formats for storing tensors

DNS, CSB, CSR, DCSR, BCSR, COO, USS, ELL, BCOO, CSC, DIA, BDIA, DCSC, LIL, SELL, SKY, BELL, LNK, MSR, DOK, BND, VBR, JAD

Different formats suit different uses, such as efficient insertions, structured stencils, and unstructured mesh simulations.

Applications must work with tensors in different formats for performance

[Figure: timelines (time on the horizontal axis) for three strategies]
Only COO: construct tensor T in COO, then compute with T in COO.
Only DIA: construct tensor T in DIA, then compute with T in DIA.
Hybrid: construct tensor T in COO, convert COO → DIA, then compute with T in DIA.

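The hybrid strategy hinges on a direct COO → DIA conversion. A minimal sketch, assuming a square N × N matrix given as three parallel COO arrays (`rows`, `cols`, `vals`) and the DIA layout used in this deck: a `perm` array of occupied diagonal offsets k = j − i, and a `vals` array holding K diagonals of N slots each, indexed by row. The `Dia` struct and `coo_to_dia` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DIA storage for an N x N matrix: perm[d] is the offset (j - i) of the d-th
// stored diagonal; vals[d * N + i] holds the entry in row i of that diagonal.
struct Dia {
  int N = 0;
  std::vector<int> perm;
  std::vector<double> vals;
};

Dia coo_to_dia(int N, const std::vector<int>& rows,
               const std::vector<int>& cols, const std::vector<double>& vals) {
  assert(rows.size() == cols.size() && cols.size() == vals.size());
  // Pass 1: mark which of the 2N - 1 possible diagonals hold a nonzero.
  std::vector<bool> nz(2 * N - 1, false);
  for (std::size_t p = 0; p < rows.size(); p++) {
    assert(rows[p] >= 0 && rows[p] < N && cols[p] >= 0 && cols[p] < N);
    nz[cols[p] - rows[p] + N - 1] = true;
  }

  // Record occupied offsets in increasing order, plus the inverse map
  // from offset to diagonal slot.
  Dia B;
  B.N = N;
  std::vector<int> rperm(2 * N - 1, -1);
  for (int k = -N + 1; k < N; k++) {
    if (nz[k + N - 1]) {
      rperm[k + N - 1] = static_cast<int>(B.perm.size());
      B.perm.push_back(k);
    }
  }

  // Pass 2: scatter each nonzero into its diagonal; unused slots stay 0.
  B.vals.assign(B.perm.size() * N, 0.0);
  for (std::size_t p = 0; p < rows.size(); p++) {
    int d = rperm[cols[p] - rows[p] + N - 1];
    B.vals[static_cast<std::size_t>(d) * N + rows[p]] = vals[p];
  }
  return B;
}
```

The routine makes two passes over the nonzeros (one to find occupied diagonals, one to scatter) and builds no intermediate format.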
Manually implementing support for efficient conversion between all combinations of formats is infeasible

[Figure: every format (COO, BCSR, ELL, BND, DIA, JAD, SKY, CSR, …) needs its own conversion routine to every other format.]

Each routine is a separate hand-written loop nest. For example, the CSR → ELL, CSR → DIA, and COO → CSR routines shown on the slide:

```cpp
#include <algorithm>

// CSR -> ELL: pad every row to K slots (K = longest row);
// slot k of row i is stored at k * N + i.
void csr_to_ell(int N, const int* A_pos, const int* A_crd, const double* A_vals,
                int& K, int*& B_crd, double*& B_vals) {
  K = 0;
  for (int i = 0; i < N; i++) {
    int ncols = A_pos[i+1] - A_pos[i];
    K = std::max(K, ncols);
  }
  B_crd = new int[K * N]();
  B_vals = new double[K * N]();
  for (int i = 0; i < N; i++) {
    int count = 0;
    for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
      int j = A_crd[pA2];
      int k = count++;
      int pB2 = k * N + i;
      B_crd[pB2] = j;
      B_vals[pB2] = A_vals[pA2];
    }
  }
}

// CSR -> DIA: find the occupied diagonals (offset k = j - i), then scatter.
void csr_to_dia(int N, const int* A_pos, const int* A_crd, const double* A_vals,
                int& K, int*& B_perm, double*& B_vals) {
  bool* nz = new bool[2 * N - 1]();
  for (int i = 0; i < N; i++) {
    for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
      int j = A_crd[pA2];
      int k = j - i;
      nz[k + N - 1] = true;
    }
  }
  B_perm = new int[2 * N - 1];
  K = 0;
  for (int i = -N + 1; i < N; i++) {
    if (nz[i + N - 1]) B_perm[K++] = i;
  }
  B_vals = new double[K * N]();
  int* B_rperm = new int[2 * N - 1];  // offset -> diagonal slot
  for (int i = 0; i < K; i++) {
    B_rperm[B_perm[i] + N - 1] = i;
  }
  for (int i = 0; i < N; i++) {
    for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
      int j = A_crd[pA2];
      int k = j - i;
      int pB1 = B_rperm[k + N - 1];
      int pB2 = pB1 * N + i;
      B_vals[pB2] = A_vals[pA2];
    }
  }
  delete[] nz;
  delete[] B_rperm;
}

// COO -> CSR: count nonzeros per row, prefix-sum into B_pos, scatter, then
// shift B_pos back by one row (the scatter advanced each row start to its end).
// A_pos[0]..A_pos[1] bounds the COO nonzeros.
void coo_to_csr(int N, const int* A_pos, const int* A1_crd, const int* A2_crd,
                const double* A_vals, int*& B_pos, int*& B_crd, double*& B_vals) {
  int* count = new int[N]();
  for (int pA1 = A_pos[0]; pA1 < A_pos[1]; pA1++) {
    int i = A1_crd[pA1];
    count[i]++;
  }
  B_pos = new int[N + 1];
  B_pos[0] = 0;
  for (int i = 0; i < N; i++) {
    B_pos[i + 1] = B_pos[i] + count[i];
  }
  B_crd = new int[B_pos[N]];
  B_vals = new double[B_pos[N]];
  for (int pA1 = A_pos[0]; pA1 < A_pos[1]; pA1++) {
    int i = A1_crd[pA1];
    int j = A2_crd[pA1];
    int pB2 = B_pos[i]++;
    B_crd[pB2] = j;
    B_vals[pB2] = A_vals[pA1];
  }
  for (int i = 0; i < N; i++) {
    B_pos[N - i] = B_pos[N - i - 1];
  }
  B_pos[0] = 0;
  delete[] count;
}
```

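A quick count shows why: with one hand-written routine for each ordered pair of formats, the number of routines grows quadratically in the number of formats n. For the 23 formats named earlier (DNS through JAD):

```latex
n(n-1)\big|_{n=23} = 23 \times 22 = 506 \text{ conversion routines}
```
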
Hand-optimized libraries support efficient conversion for only a few combinations of formats

[Figure: conversions among COO, BCSR, ELL, BND, DIA, JAD, SKY, … are supported only to and from CSR.]

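Routing everything through one hub format such as CSR shrinks the routine count, at a cost. Assuming a library converted every other format both into and out of CSR, n formats would need

```latex
2(n-1)\big|_{n=23} = 2 \times 22 = 44 \text{ routines},
```

but a conversion between two non-hub formats, such as COO → DIA, then runs two routines and builds an intermediate CSR copy of the tensor.
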
Inefficient conversion eliminates the benefit of using different formats

[Figure: timelines (time on the horizontal axis) for three strategies]
Only COO: construct tensor T in COO, then compute with T in COO.
Only DIA: construct tensor T in DIA, then compute with T in DIA.
Hybrid with libraries: construct tensor T in COO, convert COO → CSR, convert CSR → DIA, then compute with T in DIA.

Automatic Generation of Efficient Sparse Tensor Format Conversion Routines
Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe

A compiler can generate efficient conversion routines from standalone specifications for each tensor format

[Figure: each format (COO, BCSR, ELL, BND, DIA, JAD, SKY, CSR, …) is specified once, and conversion routines between any pair are generated from those specifications.]

Our technique generates efficient code

[Figure: bar chart of normalized time (0 to 5) for COO → CSR, CSR → CSC, CSR → DIA, CSC → DIA, and COO → DIA, comparing this work with SPARSKIT and Intel MKL.]

Being able to generate efficient conversion routines lets users exploit different formats for performance

[Figure: timelines (time on the horizontal axis) for four strategies]
Only COO: construct tensor T in COO, then compute with T in COO.
Only DIA: construct tensor T in DIA, then compute with T in DIA.
Hybrid with libraries: construct tensor T in COO, convert COO → CSR, convert CSR → DIA, then compute with T in DIA.
Hybrid with our approach: construct tensor T in COO, convert COO → DIA directly, then compute with T in DIA.

Coordinate Remappings
Attribute Queries

Different tensor formats arrange nonzeros in memory in different ways

Example matrix (4 × 6, nonzeros A through J):

    A 0 B 0 0 0
    C D 0 0 0 0
    0 E F 0 G 0
    0 0 H 0 0 J

CSR:
    pos  = [0, 2, 4, 7, 9]
    crd  = [0, 2, 0, 1, 1, 2, 4, 2, 5]
    vals = [A, B, C, D, E, F, G, H, J]

DIA (N = 4 rows, M = 6 columns, K = 3 diagonals):
    perm = [-1, 0, 2]
    vals = [C, E, H], [A, D, F], [B, G, J]   (one group per diagonal, in perm order)

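The DIA arrays can be read back into coordinates by inverting the remapping k = j − i: slot i of stored diagonal d holds entry (i, i + perm[d]). A minimal sketch, assuming a diagonal-major layout with N slots per diagonal and treating a slot as padding when its column falls outside [0, M) or its value is zero (so explicitly stored zeros are dropped). `Triple` and `dia_to_triples` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// One nonzero as (row, column, value).
using Triple = std::tuple<int, int, double>;

// Walk DIA storage for an N x M matrix and emit its nonzeros.
// perm[d] is the offset (j - i) of stored diagonal d, and vals[d * N + i]
// is the slot for row i on that diagonal.
std::vector<Triple> dia_to_triples(int N, int M, const std::vector<int>& perm,
                                   const std::vector<double>& vals) {
  assert(vals.size() == perm.size() * static_cast<std::size_t>(N));
  std::vector<Triple> out;
  for (std::size_t d = 0; d < perm.size(); d++) {
    for (int i = 0; i < N; i++) {
      int j = i + perm[d];  // invert the remapping k = j - i
      double v = vals[d * N + i];
      if (j < 0 || j >= M || v == 0.0) continue;  // padding slot
      out.emplace_back(i, j, v);
    }
  }
  return out;
}
```

Encoding A through J as 1 through 9 and padding the example's DIA vals to 4 slots per diagonal gives `{0, 3, 5, 8,  1, 4, 6, 0,  2, 0, 7, 9}`; walking it recovers the nine nonzeros, ordered by diagonal rather than by row as in CSR.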