A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization


  1. A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization
     Shaden Smith, George Karypis
     University of Minnesota, Department of Computer Science & Engineering
     shaden@cs.umn.edu · http://cs.umn.edu/~splatt/

  2. Table of Contents
     1. Preliminaries
     2. Related Work: Coarse- and Fine-Grained Algorithms
     3. A Medium-Grained Algorithm
     4. Experiments
     5. Conclusions

  3. Tensor Introduction
     Tensors are the generalization of matrices to three or more dimensions.
     A tensor has m dimensions (or modes) and is of size I₁ × ... × I_m.
     ◮ We will stick to m = 3 in this talk and call the dimensions I, J, K.
     [Figure: a 3-mode tensor with modes patients × procedures × diagnoses]
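As a concrete illustration of the data structure involved, here is a minimal sketch of a 3-mode sparse tensor stored in coordinate (COO) form, which is how distributed CPD codes typically hold the non-zeros. The sizes and values are made up.

```python
import numpy as np

# Hypothetical 3-mode sparse tensor in coordinate (COO) form.
# Mode names follow the slide's example: patients x procedures x diagnoses.
I, J, K = 5, 4, 3                    # made-up mode lengths
inds = np.array([[0, 1, 2],          # one (i, j, k) triple per non-zero
                 [3, 0, 1],
                 [4, 2, 0]])
vals = np.array([1.0, 2.0, 0.5])     # the non-zero values X(i, j, k)
```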

  4. Canonical Polyadic Decomposition (CPD)
     We compute matrices A, B, C, each with F columns.
     ◮ F is assumed to be small, on the order of 10 or 50.
     [Figure: X approximated as a sum of F rank-one tensors]
     Usually computed via alternating least squares (ALS).
     As a result, computations are mode-centric.
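To make the decomposition concrete, a small NumPy sketch: a rank-F CPD approximates each entry X(i, j, k) by Σ_f A(i, f) B(j, f) C(k, f), i.e., a sum of F outer products. The sizes here are illustrative.

```python
import numpy as np

I, J, K, F = 5, 4, 3, 2              # illustrative sizes; rank F
rng = np.random.default_rng(0)
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))

# X_hat(i, j, k) = sum_f A(i, f) * B(j, f) * C(k, f): a sum of F
# rank-one (outer-product) tensors.
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
```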

  5. CPD-ALS
     Algorithm 1: CPD-ALS
     1: while not converged do
     2:   Aᵀ = (CᵀC ∗ BᵀB)⁻¹ (X₍₁₎ (C ⊙ B))ᵀ
     3:   Bᵀ = (CᵀC ∗ AᵀA)⁻¹ (X₍₂₎ (C ⊙ A))ᵀ
     4:   Cᵀ = (BᵀB ∗ AᵀA)⁻¹ (X₍₃₎ (B ⊙ A))ᵀ
     5: end while
     Here ∗ is the Hadamard (elementwise) product, ⊙ is the Khatri-Rao product, and X₍ₙ₎ is the mode-n matricization of X.
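A minimal dense NumPy sketch of Algorithm 1, assuming the standard Kolda-style matricization ordering; a real distributed solver such as SPLATT replaces the dense unfolding with a sparse MTTKRP.

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product; rows ordered with U's index slowest.
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cpd_als(X, F, iters=50, seed=0):
    # Dense stand-in for Algorithm 1; X is an I x J x K NumPy array.
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))
    for _ in range(iters):
        # A^T = (C^T C * B^T B)^{-1} (X_(1) (C kr B))^T, with * = Hadamard.
        A = np.linalg.solve((C.T @ C) * (B.T @ B),
                            (X.transpose(0, 2, 1).reshape(I, -1)
                             @ khatri_rao(C, B)).T).T
        B = np.linalg.solve((C.T @ C) * (A.T @ A),
                            (X.transpose(1, 2, 0).reshape(J, -1)
                             @ khatri_rao(C, A)).T).T
        C = np.linalg.solve((B.T @ B) * (A.T @ A),
                            (X.transpose(2, 1, 0).reshape(K, -1)
                             @ khatri_rao(B, A)).T).T
    return A, B, C
```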

  6. A Closer Look...
     Algorithm 2: One mode of CPD-ALS
     1: Â ← X₍₁₎ (C ⊙ B)              ⊲ O(F · nnz(X))
     2: LLᵀ ← Cholesky(CᵀC ∗ BᵀB)     ⊲ O(F³)
     3: Aᵀ = (LLᵀ)⁻¹ Âᵀ               ⊲ O(I F²)
     4: Compute AᵀA                    ⊲ O(I F²)
     Step 1 is the most expensive and the focus of this talk.
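The same mode update annotated with the slide's cost breakdown, using an explicit Cholesky factorization via SciPy. This is a sketch, not SPLATT's implementation; the Gram matrix is assumed positive definite (full-rank factors).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def khatri_rao(U, V):
    # Column-wise Kronecker product; rows ordered with U's index slowest.
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def update_mode_one(X, B, C):
    I = X.shape[0]
    # Step 1: MTTKRP -- O(F * nnz(X)) for a sparse X (dense stand-in here).
    A_hat = X.transpose(0, 2, 1).reshape(I, -1) @ khatri_rao(C, B)
    # Step 2: Cholesky of the F x F Gram matrix -- O(F^3).
    factor = cho_factor((C.T @ C) * (B.T @ B))
    # Step 3: two triangular solves for A^T = (L L^T)^{-1} A_hat^T -- O(I F^2).
    A = cho_solve(factor, A_hat.T).T
    # Step 4: refresh A^T A for the remaining mode updates -- O(I F^2).
    return A, A.T @ A
```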

  7. Matricized Tensor Times Khatri-Rao Product (MTTKRP)
     [Figure: a non-zero X(i, j, k) pulls row j of B and row k of C into row i of Â]
     Â(i, :) ← Â(i, :) + X(i, j, k) [B(j, :) ∗ C(k, :)]
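The non-zero-at-a-time formulation above translates directly into code. A sketch of the sparse MTTKRP kernel over COO non-zeros (names illustrative):

```python
import numpy as np

def mttkrp(inds, vals, B, C, I):
    # For each non-zero X(i, j, k): scale the Hadamard product of rows
    # B(j, :) and C(k, :) by the value and accumulate into row i of A_hat.
    A_hat = np.zeros((I, B.shape[1]))
    for (i, j, k), v in zip(inds, vals):
        A_hat[i, :] += v * (B[j, :] * C[k, :])
    return A_hat
```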

  8. MTTKRP Communication
     [Figure: two non-zeros in slice i, at columns j₁ and j₂, each requiring a different row of B]
     Â(i, :) ← Â(i, :) + X(i, j₁, k) [B(j₁, :) ∗ C(k, :)]
     Â(i, :) ← Â(i, :) + X(i, j₂, k) [B(j₂, :) ∗ C(k, :)]

  9. Section: Related Work: Coarse- and Fine-Grained Algorithms

  10. Coarse-Grained Decomposition
      [Figure: the tensor split into slabs of contiguous slices, aligned with block rows of A, B, and C]
      [Choi & Vishwanathan 2014, Shin & Kang 2014]
      Processes own complete slices of X and the aligned factor rows.
      I/p rows are communicated to p − 1 processes after each update.
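For illustration, a sketch of how a coarse-grained decomposition might assign ownership along one mode: contiguous blocks of roughly I/p slices per process. Uniform boundaries are assumed here; a real partitioner would balance non-zeros instead.

```python
def slice_owner(i, I, p):
    # Contiguous blocks of roughly I/p slices per process; the owner of
    # slice i also owns row i of the aligned factor matrix.
    return min(i * p // I, p - 1)

print(slice_owner(7, I=100, p=4))   # slice 7 of 100 -> process 0
```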

  11. Fine-Grained Decomposition
      [Kaya & Uçar 2015]
      Most flexible: non-zeros are individually assigned to processes.
      Two communication steps:
      1. Aggregate partial computations after the MTTKRP.
      2. Exchange new factor values.
      Factors can be assigned to minimize communication.

  12. Finding a Fine-Grained Decomposition
      Some options:
      ◮ Random assignment
      ◮ Hypergraph partitioning
      ◮ Multi-constraint hypergraph partitioning
      In practice: the hypergraph model
      ◮ nnz(X) vertices and I + J + K hyperedges
      ◮ Tight approximation of communication and load balance
      ◮ The distribution of factors must also be considered: in practice a greedy solution works well.
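A sketch of constructing that hypergraph model: one vertex per non-zero and one hyperedge per slice index in each mode, i.e., I + J + K hyperedges in total. An external partitioner such as hMETIS or PaToH would then split the vertices; the function below only builds the connectivity and is illustrative.

```python
from collections import defaultdict

def build_hyperedges(inds):
    # Vertices are non-zero ids; hyperedge (mode, index) connects every
    # non-zero whose coordinate in `mode` equals `index`, giving
    # I + J + K hyperedges for a 3-mode tensor.
    edges = defaultdict(list)
    for v, nz in enumerate(inds):        # nz = (i, j, k)
        for mode, idx in enumerate(nz):
            edges[(mode, idx)].append(v)
    return edges
```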

  13. Section: A Medium-Grained Algorithm

  14. Medium-Grained Decomposition
      [Figure: a 3D grid of partitions; A is split into A₁, A₂; B into B₁, B₂, B₃; C into C₁, C₂]
      Distribute non-zeros over a grid of p = q × r × s partitions.
      The r × s processes in each layer divide each of A₁, ..., A_q.
      Two communication steps, like fine-grained:
      ◮ O(I/p) rows communicated to r × s processes.
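To illustrate the grid mapping, a sketch that locates the grid cell owning a non-zero (i, j, k) given per-mode chunk boundaries. The boundaries here are made up; the paper computes them greedily for load balance.

```python
import numpy as np

def grid_owner(nz, splits):
    # In a q x r x s grid, non-zero (i, j, k) lands in the process whose
    # 1D chunk contains each of its indices; `splits` holds the chunk
    # boundaries per mode.
    return tuple(np.searchsorted(s, idx, side='right') - 1
                 for idx, s in zip(nz, splits))

splits = [np.array([0, 3, 6]),        # q = 2 chunks over mode I (I = 6)
          np.array([0, 2, 4, 6]),     # r = 3 chunks over mode J
          np.array([0, 6])]           # s = 1 chunk over mode K
print(grid_owner((4, 5, 1), splits))  # -> (1, 2, 0)
```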

  15. Medium-Grained Decomposition
      [Figure: the non-zeros in grid cell X(2, 3, 1) only touch factor blocks A₂, B₃, and C₁]
      Each process owns roughly I/p rows of each factor.
      Like before, a greedy algorithm works well.

  16. Finding a Medium-Grained Decomposition
      Greedy algorithm:
      1. Apply a random relabeling to the modes of X.
      2. Choose a decomposition dimension (algorithm in the paper).
      3. Compute 1D partitionings of each mode, greedily chosen with a load-balance objective (see the sketch below).
      4. Intersect!
      5. Distribute factors with the objective of reducing communication.
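A sketch of the 1D partitioning in step 3: given per-slice non-zero counts along one mode, greedily place chunk boundaries so each chunk holds roughly an equal share of the non-zeros. This is an illustration under that interpretation, not the paper's exact algorithm.

```python
import numpy as np

def greedy_1d_partition(slice_nnz, chunks):
    # slice_nnz[i] = number of non-zeros in slice i of this mode.
    # Place a boundary each time the running count crosses the next
    # multiple of the per-chunk target.
    target = slice_nnz.sum() / chunks
    splits, running = [0], 0
    for idx, cnt in enumerate(slice_nnz):
        running += cnt
        if running >= target * len(splits) and len(splits) < chunks:
            splits.append(idx + 1)
    splits.append(len(slice_nnz))
    return np.array(splits)

print(greedy_1d_partition(np.array([5, 1, 1, 5, 2, 2]), 2))  # -> [0 4 6]
```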

  17. Section: Experiments

  18. Datasets
      Dataset     I       J      K      nnz
      Netflix     480K    18K    2K     100M
      Delicious   532K    17M    3M     140M
      NELL        3M      2M     25M    143M
      Amazon      5M      18M    2M     1.7B
      Random1     20M     20M    20M    1.0B
      Random2     50M     5M     5M     1.0B

  19. Load Balance
      Table: Load imbalance with 64 and 128 processes.
                   coarse        medium        fine
      Dataset      64    128     64    128     64    128
      Netflix      1.03  1.18    1.00  1.00    1.00  1.00
      Delicious    1.21  1.41    1.01  1.06    1.00  1.05
      NELL         1.12  1.29    1.01  1.01    1.00  1.00
      Amazon       2.17  3.86    1.08  1.08    -     -

  20. Communication Volume
      [Figure: average (left) and maximum (right) communication volume of the medium- and fine-grained decompositions, relative to coarse-grained, on Netflix, Delicious, NELL, Amazon, Random1, and Random2]

  21. Strong Scaling: Netflix
      [Figure: time per iteration (log scale) vs. number of cores (8 to 1024) for DFacTo, coarse, medium, fine, and ideal scaling]

  22. Strong Scaling: Amazon
      [Figure: time per iteration (log scale) vs. number of cores (64 to 1024) for DFacTo, coarse, medium, and ideal scaling]

  23. Section: Conclusions

  24. Wrapping Up...
      Medium-grained decompositions are a good middle ground:
      ◮ 1.5× to 5× faster than fine-grained decompositions with hypergraph partitioning.
      ◮ DMS is 40× to 80× faster than DFacTo, the fastest publicly available software.
      http://cs.umn.edu/~splatt/

  25. Choosing the Shape of the Decomposition
      Objective: find q, r, s such that q × r × s = p.
      Tensor modes are often very skewed (480K Netflix users vs. 2K days).
      ◮ We want to assign processes proportionally.
      ◮ 1D decompositions actually work well for many tensors.
      Algorithm (see the sketch below):
      1. Start with a 1 × 1 × 1 shape.
      2. Compute the prime factorization of p.
      3. For each prime factor f, starting from the largest, multiply the most imbalanced mode by f.
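A sketch of this shape-selection heuristic, where "most imbalanced" is interpreted (as an assumption) as the mode with the most slices per currently assigned process.

```python
def choose_grid(p, dims):
    # Give each prime factor of p, largest first, to the mode that
    # currently has the largest dims[t] / shape[t] ratio.
    def prime_factors(n):
        f, out = 2, []
        while f * f <= n:
            while n % f == 0:
                out.append(f)
                n //= f
            f += 1
        if n > 1:
            out.append(n)
        return sorted(out, reverse=True)

    shape = [1, 1, 1]
    for f in prime_factors(p):
        m = max(range(3), key=lambda t: dims[t] / shape[t])
        shape[m] *= f
    return shape

# Netflix-like skew (hypothetical p): every factor goes to the long mode,
# recovering the slide's point that 1D decompositions often work well.
print(choose_grid(16, (480_000, 18_000, 2_000)))  # -> [16, 1, 1]
```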
