Shared Memory Parallelization of MTTKRP for Dense Tensors

BLIS Retreat 2017, September 18th
Koby Hayashi, Grey Ballard, Yujie Jiang, Michael Tobia
hayakb13@, ballard@, jiany14@, tobiamj@wfu.edu


  1. Shared Memory Parallelization of MTTKRP for Dense Tensors. BLIS Retreat 2017, September 18th. Koby Hayashi, Grey Ballard, Yujie Jiang, Michael Tobia. hayakb13@, ballard@, jiany14@, tobiamj@wfu.edu

  2. Neuroimaging Application. Tensor: Time by Subjects by Voxel Correlation Matrix. Test: Rest → Activity → Recovery. Subjects: Control, MDD, SAD, COMO.

  3. Quick Introduction to Tensors. Tensors are multidimensional arrays; an N-dimensional tensor is said to be N-way or order-N: 2-way, 3-way, 4-way, 5-way, and so on.
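As a concrete illustration (a NumPy sketch, not from the slides), the order of a tensor is just the number of dimensions of the array:

```python
import numpy as np

# An order-3 (3-way) tensor is a 3-dimensional array of size J0 x J1 x J2.
Y = np.arange(24.0).reshape(2, 3, 4)

print(Y.ndim)   # the order (number of modes) of the tensor: 3
print(Y.shape)  # the mode sizes: (2, 3, 4)
```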

  4. CP Decomposition. Canonical Polyadic decomposition (CP): decomposes a tensor into a sum of rank-1 tensors, Y ≈ Σ_{d=0}^{D−1} v_d ∘ w_d ∘ x_d (entrywise, Y_{jkl} ≈ Σ_d v_{jd} w_{kd} x_{ld}), written compactly as Y ≈ ⟦V, W, X⟧.
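A minimal NumPy sketch of this definition (the factor matrices V, W, X are random here, purely for illustration): summing D rank-1 outer products gives the same tensor as a single three-way contraction.

```python
import numpy as np

rng = np.random.default_rng(0)
J, K, L, D = 4, 5, 6, 3
V, W, X = rng.random((J, D)), rng.random((K, D)), rng.random((L, D))

# Y = sum over d of the rank-1 outer products v_d ∘ w_d ∘ x_d.
Y = np.zeros((J, K, L))
for d in range(D):
    Y += np.einsum('j,k,l->jkl', V[:, d], W[:, d], X[:, d])

# The same [[V, W, X]] construction as one contraction.
Y2 = np.einsum('jd,kd,ld->jkl', V, W, X)
assert np.allclose(Y, Y2)
```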

  5. CP via Alternating Least Squares

  6. Hadamard Product. Element-wise matrix product, denoted *: D = B * C, with D_{jk} = B_{jk} · C_{jk}, where D, B, and C are all J × K. In ALS it appears as the chain (V_0ᵀV_0) * … * (V_{o−1}ᵀV_{o−1}) * (V_{o+1}ᵀV_{o+1}) * … * (V_{O−1}ᵀV_{O−1}).
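A NumPy sketch of both uses (random matrices, for illustration only): the element-wise product itself, and the Hadamard chain of Gram matrices that arises in ALS.

```python
import numpy as np

rng = np.random.default_rng(1)
J, K = 3, 4
B, C = rng.random((J, K)), rng.random((J, K))

# Hadamard (element-wise) product: D_jk = B_jk * C_jk, all matrices J x K.
D = B * C
assert np.isclose(D[1, 2], B[1, 2] * C[1, 2])

# The ALS chain: (V_0^T V_0) * (V_1^T V_1) * ..., each Gram matrix is K x K.
Vs = [rng.random((5, K)), rng.random((6, K)), rng.random((7, K))]
G = np.ones((K, K))
for Vn in Vs:
    G *= Vn.T @ Vn  # Hadamard-accumulate each Gram matrix
```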

  7. Khatri-Rao Product. Khatri-Rao Product (KRP), denoted ⊙: for B of size J_B × D and C of size J_C × D, L = B ⊙ C is (J_B · J_C) × D. Column-wise Kronecker product: L(:, j) = B(:, j) ⊗ C(:, j). Equivalently, Hadamard product of rows: L(s_C + s_B · J_C, :) = B(s_B, :) * C(s_C, :). In ALS the KRP appears in Y_(o) (V_{O−1} ⊙ … ⊙ V_{o+1} ⊙ V_{o−1} ⊙ … ⊙ V_0).
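A NumPy sketch of the two equivalent views of the KRP (random inputs, for illustration): columns are Kronecker products, rows are Hadamard products of rows.

```python
import numpy as np

rng = np.random.default_rng(2)
JB, JC, D = 3, 4, 2
B, C = rng.random((JB, D)), rng.random((JC, D))

# Column-wise Kronecker product: L(:, j) = B(:, j) kron C(:, j).
L = np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(D)])
assert L.shape == (JB * JC, D)

# Row view: L(sC + sB*JC, :) = B(sB, :) * C(sC, :).
sB, sC = 1, 3
assert np.allclose(L[sC + sB * JC], B[sB] * C[sC])
```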

  8. Tensor Fibers. A mode-o fiber fixes every index except the o-th: o = 0: Y(:, k, l); o = 1: Y(j, :, l); o = 2: Y(j, k, :).
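In NumPy terms (a small sketch, not from the slides), a fiber is what slicing with a single `:` extracts:

```python
import numpy as np

Y = np.arange(24.0).reshape(2, 3, 4)  # a 3-way tensor, J0 x J1 x J2

# A mode-o fiber fixes all indices except the o-th:
f0 = Y[:, 1, 2]  # o = 0: Y(:, k, l), length J0
f1 = Y[0, :, 2]  # o = 1: Y(j, :, l), length J1
f2 = Y[0, 1, :]  # o = 2: Y(j, k, :), length J2
```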

  9. Unfolding Tensors. The o-th mode matricization of an O-way tensor Y of size J_0 × J_1 × … × J_{O−1} is denoted Y_(o) and is J_o × J_{≠o}, where J_{≠o} = ∏_{l ≠ o} J_l. More generally, Y_(n:o) denotes a matricization whose row modes are {n, n+1, …, o}, giving ∏_{l=n}^{o} J_l rows. With this notation, N = Y_(o) (V_{O−1} ⊙ … ⊙ V_{o+1} ⊙ V_{o−1} ⊙ … ⊙ V_0).
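A NumPy sketch of mode-o matricization (one of several consistent column-ordering conventions; the slides do not pin one down):

```python
import numpy as np

def unfold(Y, o):
    """Mode-o matricization Y_(o): J_o x (product of the remaining mode sizes).

    Columns are ordered with later modes varying fastest (row-major layout).
    """
    return np.moveaxis(Y, o, 0).reshape(Y.shape[o], -1)

Y = np.arange(24.0).reshape(2, 3, 4)
Y1 = unfold(Y, 1)
assert Y1.shape == (3, 8)
# Every entry of Y appears exactly once in the unfolding.
assert np.isclose(Y1.sum(), Y.sum())
```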

  10. Matricized Tensor Times Khatri-Rao Product. MTTKRP: N = Y_(o) (V_0 ⊙ … ⊙ V_{o−1} ⊙ V_{o+1} ⊙ … ⊙ V_{O−1}). Naïve algorithm: 1. Form K = V_0 ⊙ … ⊙ V_{o−1} ⊙ V_{o+1} ⊙ … ⊙ V_{O−1}. 2. Permute Y to form Y_(o). 3. Call DGEMM. 1-Step and 2-Step MTTKRP: avoid permuting Y and efficiently form the KRP. 1-Step: K = V_{O−1} ⊙ … ⊙ V_{o+1} ⊙ V_{o−1} ⊙ … ⊙ V_0. 2-Step: L_W = V_0 ⊙ … ⊙ V_{o−1} and L_S = V_{o+1} ⊙ … ⊙ V_{O−1}. Either way, utilize the BLAS.
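The naïve algorithm can be sketched directly in NumPy (this is an illustration of the three steps, not the authors' implementation; the `@` call stands in for DGEMM):

```python
from functools import reduce
import numpy as np

def khatri_rao(mats):
    # Column-wise Kronecker product of matrices sharing a column count D.
    D = mats[0].shape[1]
    return np.column_stack(
        [reduce(np.kron, [M[:, j] for M in mats]) for j in range(D)])

def mttkrp_naive(Y, V, o):
    # 1. Form K, the KRP of every factor matrix except V_o.
    K = khatri_rao([V[l] for l in range(len(V)) if l != o])
    # 2. "Permute" Y into Y_(o) (row-major unfolding matches K's row order).
    Yo = np.moveaxis(Y, o, 0).reshape(Y.shape[o], -1)
    # 3. One large matrix multiply (the DGEMM call).
    return Yo @ K

rng = np.random.default_rng(3)
shape, D = (3, 4, 5), 2
V = [rng.random((J, D)) for J in shape]
Y = rng.random(shape)

N = mttkrp_naive(Y, V, 1)
ref = np.einsum('jkl,jd,ld->kd', Y, V[0], V[2])
assert np.allclose(N, ref)
```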

  11. Computing the KRP. Consider L = B ⊙ C ⊙ D. Each row of L is a Hadamard product of rows: L(k, :) = B(b, :) * C(c, :) * D(d, :) for the appropriate indices b, c, d. A partial row product such as B(0, :) * C(0, :) multiplies every row of D, so it can be computed once and reused rather than recomputed.
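The reuse idea can be sketched as follows (a NumPy illustration under the assumption, consistent with the slide, that partial row products are shared across all rows of the last factor):

```python
from functools import reduce
import numpy as np

rng = np.random.default_rng(4)
JB, JC, JD, R = 2, 3, 4, 2
B, C, D = rng.random((JB, R)), rng.random((JC, R)), rng.random((JD, R))

# Reuse: each partial product B(b,:) * C(c,:) is formed once and then
# multiplied against all JD rows of D, instead of being recomputed JD times.
L = np.empty((JB * JC * JD, R))
for b in range(JB):
    for c in range(JC):
        partial = B[b] * C[c]            # computed once...
        base = (b * JC + c) * JD
        L[base:base + JD] = partial * D  # ...reused for every row of D

# Matches the direct column-wise Kronecker definition of B ⊙ C ⊙ D.
Lref = np.column_stack(
    [reduce(np.kron, [B[:, j], C[:, j], D[:, j]]) for j in range(R)])
assert np.allclose(L, Lref)
```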

  12. Timings for KRPs of the naïve and reuse algorithms.

  13. 1-Step MTTKRP. Avoid permuting tensor entries; cast the computation as a matrix multiplication. Key observation: the o-th mode matricization of a tensor can be obtained by chunking the tensor into contiguous submatrices of equal size.
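The chunking observation can be demonstrated in NumPy (a sketch, not the authors' code): for a row-major tensor, the modes left of o group into contiguous J_o × (right-modes) blocks, and placing those blocks side by side reproduces the mode-o unfolding with no data movement.

```python
import numpy as np

rng = np.random.default_rng(6)
J, o = (2, 3, 4), 1
Y = rng.random(J)

# In row-major storage, Y is prod(J[:o]) contiguous blocks, each a
# J_o x prod(J[o+1:]) submatrix of equal size; no permutation is needed.
P = int(np.prod(J[:o]))
Q = int(np.prod(J[o + 1:]))
blocks = Y.reshape(P, J[o], Q)      # a view: zero copies
Yo = np.hstack(list(blocks))        # J_o x (P*Q), the mode-o unfolding

assert np.allclose(Yo, np.moveaxis(Y, o, 0).reshape(J[o], -1))
```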

  14. Parallel 1-Step MTTKRP. Form L_W. Form L_S(k, :). Form L(k, :). MatMul. Reduce.

  15. 2-Step MTTKRP. First compute a partial MTTKRP: 1. Compute L_W and L_S. 2. 𝓛 ← Y_(0:o−1)ᵀ · L_W, where 𝓛 is J_o × … × J_{O−1} × D. Second compute a series of ___?___ operations: a. Tensor Times Vector (TTVs) b. Tensor Times Matrix (TTMs) c. Quasi-Tensor Times Matrix (q-TTMs).
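The two steps can be sketched in NumPy for a 4-way tensor with o = 2 (an illustration under that assumption, not the authors' implementation; the remaining contraction is written as a single TTV-like einsum with V_3):

```python
import numpy as np

rng = np.random.default_rng(5)
J, D, o = (2, 3, 4, 5), 2, 2
V = [rng.random((Jl, D)) for Jl in J]
Y = rng.random(J)

# Step 1: partial MTTKRP. L_W = V_0 ⊙ V_1 hits the left modes in one GEMM:
# script-L <- Y_(0:1)^T @ L_W, of size J_2 x J_3 x D.
LW = np.column_stack([np.kron(V[0][:, d], V[1][:, d]) for d in range(D)])
Yleft = Y.reshape(J[0] * J[1], J[2] * J[3])  # Y_(0:1): no permutation needed
scrL = (Yleft.T @ LW).reshape(J[2], J[3], D)

# Step 2: contract script-L against the remaining factor(s), column by
# column, a series of tensor-times-vector (TTV) operations.
N = np.einsum('kld,ld->kd', scrL, V[3])

# Check against a direct MTTKRP for mode o = 2.
ref = np.einsum('jklm,jd,kd,md->ld', Y, V[0], V[1], V[3])
assert np.allclose(N, ref)
```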

  16. 2-Step MTTKRP: 𝓛. First compute a partial MTTKRP (figure).

  17. 2-Step MTTKRP: 𝓛. Second compute a series of TTVs (figure).

  18. Parallel 2-Step MTTKRP. Call parallel BLAS. WOW!!!

  19. 60 × 60 × 60 × 60 × 60

  20. Per-iteration time of a CP decomposition via ALS. MATLAB used the Tensor Toolbox cp_als function, version 2.6. [1]

  21. Findings. Two interesting networks: positive affect and negative affect. Tobia M., Hayashi K., Ballard G., Gotlib I. Dynamic Functional Connectivity and Individual Differences in Emotions During Social Stress, to appear in Human Brain Mapping.

  22. References
Tamara G. Kolda and Brett W. Bader. 2009. Tensor Decompositions and Applications. SIAM Rev. 51, 3 (September 2009), 455–500. https://doi.org/10.1137/07070111X
Jiajia Li, Jee Choi, Ioakeim Perros, Jimeng Sun, and Richard Vuduc. 2017. Model-Driven Sparse CP Decomposition for Higher-Order Tensors. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1048–10. https://doi.org/10.1109/IPDPS.2017.80
Shaden Smith, Niranjay Ravindran, Nicholas D. Sidiropoulos, and George Karypis. 2015. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication. In Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS '15). IEEE Computer Society, Washington, DC, USA, 61–70. https://doi.org/10.1109/IPDPS.2015.27
D.C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T.E.J. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S.W. Curtiss, S. Della Penna, D. Feinberg, M.F. Glasser, N. Harel, A.C. Heath, L. Larson-Prior, D. Marcus, G. Michalareas, S. Moeller, R. Oostenveld, S.E. Petersen, F. Prior, B.L. Schlaggar, S.M. Smith, A.Z. Snyder, J. Xu, and E. Yacoub. 2012. The Human Connectome Project: a data acquisition perspective. Neuroimage 62, 4 (2012), 2222–2231. https://doi.org/10.1016/j.neuroimage.2012.02.018
Anh-Huy Phan, Petr Tichavsky, and Andrzej Cichocki. 2013. Fast Alternating LS Algorithms for High Order CANDECOMP/PARAFAC Tensor Factorizations. IEEE Transactions on Signal Processing 61, 19 (Oct 2013), 4834–4846. https://doi.org/10.1109/TSP.2013.2269903

  23. End Thanks for listening
