Shared Memory Parallelization of MTTKRP for Dense Tensors
BLIS Retreat 2017, September 18th
Koby Hayashi, Grey Ballard, Yujie Jiang, Michael Tobia
{hayakb13, ballard, jiany14, tobiamj}@wfu.edu
Neuroimaging Application
Tensor: Time × Subjects × Voxel Correlation Matrix
Test: Rest × Activity × Recovery
Subjects: Control, MDD, SAD, COMO
Quick Introduction to Tensors
Tensors are multidimensional arrays; an N-dimensional tensor is said to be N-way or order-N.
[Figure: examples of 2-way, 3-way, 4-way, and 5-way tensors]
CP Decomposition
Canonical Polyadic Decomposition (CP): decomposes a tensor into a sum of rank-1 tensors.
X ≈ Σ_{r=0}^{R−1} u_r ∘ v_r ∘ w_r = ⟦U, V, W⟧
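As a concrete illustration (not from the talk), here is a minimal NumPy sketch of the CP model above: reassembling a 3-way tensor from its factor matrices, where each column triple contributes one rank-1 outer product. The helper name `cp_reconstruct` is hypothetical.

```python
import numpy as np

def cp_reconstruct(U, V, W):
    # Sum of outer products: X[i,j,k] = sum_r U[i,r] * V[j,r] * W[k,r]
    # which is exactly X = [[U, V, W]] from the slide.
    return np.einsum('ir,jr,kr->ijk', U, V, W)

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
U = rng.standard_normal((I, R))
V = rng.standard_normal((J, R))
W = rng.standard_normal((K, R))
X = cp_reconstruct(U, V, W)   # a rank-(at most R) 4x5x6 tensor
```

A single `einsum` call performs all R outer products and the sum at once.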
CP via Alternating Least Squares
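The slide names the standard fitting procedure; a minimal sketch of one CP-ALS sweep for a 3-way tensor follows, assuming the naïve MTTKRP (unfold, form the full KRP, multiply) and normal-equation solves via the Hadamard product of Gram matrices. Helper names (`khatri_rao`, `als_sweep`) are hypothetical, not from the talk.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product
    return np.column_stack([np.kron(B[:, r], C[:, r]) for r in range(B.shape[1])])

def als_sweep(X, U, V, W):
    # Update each factor in turn: M = X_(n) * KRP, solved against the
    # Hadamard product of the other factors' Gram matrices.
    M = X.reshape(X.shape[0], -1) @ khatri_rao(V, W)
    U = M @ np.linalg.pinv((V.T @ V) * (W.T @ W))
    M = np.moveaxis(X, 1, 0).reshape(X.shape[1], -1) @ khatri_rao(U, W)
    V = M @ np.linalg.pinv((U.T @ U) * (W.T @ W))
    M = np.moveaxis(X, 2, 0).reshape(X.shape[2], -1) @ khatri_rao(U, V)
    W = M @ np.linalg.pinv((U.T @ U) * (V.T @ V))
    return U, V, W

# On an exactly rank-R tensor, a sweep starting from the true factors
# leaves the fit exact.
rng = np.random.default_rng(1)
I, J, K, R = 4, 3, 5, 2
U0, V0, W0 = (rng.standard_normal((d, R)) for d in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', U0, V0, W0)
U1, V1, W1 = als_sweep(X, U0, V0, W0)
```

Each least-squares solve needs only an R × R inverse because the KRP's Gram matrix factors into a Hadamard product, as the next slide notes.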
Hadamard Product
Element-wise matrix product, denoted *:
D = B * C,  D_{ij} = B_{ij} · C_{ij}
In ALS it forms the R × R Gram matrix
V = (U_0^T U_0) * … * (U_{n−1}^T U_{n−1}) * (U_{n+1}^T U_{n+1}) * … * (U_{N−1}^T U_{N−1})
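A short sketch of the Gram-matrix formula above (the helper name `gram_hadamard` is hypothetical): accumulate the Hadamard product of U_m^T U_m over every mode except n.

```python
import numpy as np

def gram_hadamard(factors, n):
    # V = hadamard product over m != n of (U_m^T U_m), an R x R matrix
    R = factors[0].shape[1]
    V = np.ones((R, R))
    for m, U in enumerate(factors):
        if m != n:
            V *= U.T @ U   # element-wise accumulate each Gram matrix
    return V

rng = np.random.default_rng(2)
factors = [rng.standard_normal((d, 3)) for d in (4, 5, 6)]
V1 = gram_hadamard(factors, 1)   # skips mode 1's factor
```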
Khatri-Rao Product
Khatri-Rao Product (KRP): K = B ⊙ C, where B is I_B × R and C is I_C × R, so K is (I_B · I_C) × R.
Column-wise Kronecker product: K(:, r) = B(:, r) ⊗ C(:, r)
Or Hadamard product of rows: K(i_C + i_B · I_C, :) = B(i_B, :) * C(i_C, :)
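The two equivalent definitions on this slide can be checked directly in NumPy; the helper `khatri_rao` is a hypothetical name for a straightforward column-wise implementation.

```python
import numpy as np

def khatri_rao(B, C):
    # Definition 1: K(:, r) = B(:, r) kron C(:, r)
    return np.column_stack([np.kron(B[:, r], C[:, r]) for r in range(B.shape[1])])

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 2))   # I_B = 4
C = rng.standard_normal((3, 2))   # I_C = 3
K = khatri_rao(B, C)              # (I_B * I_C) x R = 12 x 2

# Definition 2: row i_C + i_B * I_C of K is a Hadamard product of rows
row = B[1, :] * C[2, :]           # should equal K[1*3 + 2, :]
```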
Tensor Fibers
Mode-n fibers of a 3-way tensor X:
n = 0: X(:, j, k);  n = 1: X(i, :, k);  n = 2: X(i, j, :)
Unfolding Tensors
The n-th mode matricization of an N-way tensor X that is I_0 × I_1 × … × I_{N−1} is denoted X_(n) and is I_n × I_{≠n}, where I_{≠n} = ∏_{m≠n} I_m.
X_(i:j) denotes a matricization where {i, i+1, …, j} are the row modes, so its row dimension is ∏_{m ∈ {i,…,j}} I_m.
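Under one common convention (not necessarily the talk's: NumPy's row-major layout, with the remaining modes kept in order), the mode-n unfolding is just a transpose-to-front plus a reshape. The helper `unfold` is hypothetical.

```python
import numpy as np

def unfold(X, n):
    # Move mode n to the front, then flatten the remaining modes into columns.
    # Result is I_n x (product of the other I_m).
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

X = np.arange(24).reshape(2, 3, 4)   # I_0=2, I_1=3, I_2=4
X1 = unfold(X, 1)                    # 3 x 8
```

With this ordering, entry X[i, j, k] lands at X1[j, i·4 + k].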
Matricized Tensor Times Khatri-Rao Product
M = X_(n) (U_0 ⊙ … ⊙ U_{n−1} ⊙ U_{n+1} ⊙ … ⊙ U_{N−1})
Naïve algorithm:
1. Permute X to form X_(n)
2. Form K = (U_0 ⊙ … ⊙ U_{n−1} ⊙ U_{n+1} ⊙ … ⊙ U_{N−1})
3. Call DGEMM
1-Step and 2-Step MTTKRP:
1. Avoid permuting X
2. Efficiently form the KRP
   - 1-Step: K = (U_{N−1} ⊙ … ⊙ U_{n+1} ⊙ U_{n−1} ⊙ … ⊙ U_0)
   - 2-Step: K_L = (U_0 ⊙ … ⊙ U_{n−1}), K_R = (U_{n+1} ⊙ … ⊙ U_{N−1})
3. Utilize BLAS
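A minimal sketch of the naïve algorithm above (helper names are hypothetical; the "DGEMM" is NumPy's `@`): unfold, form the full KRP of all factors except mode n, then one matrix multiply.

```python
import numpy as np

def khatri_rao_list(mats):
    # KRP of a list of matrices, column by column via chained kron
    R = mats[0].shape[1]
    cols = []
    for r in range(R):
        v = mats[0][:, r]
        for M in mats[1:]:
            v = np.kron(v, M[:, r])
        cols.append(v)
    return np.column_stack(cols)

def mttkrp_naive(X, factors, n):
    # Step 1: permute/unfold; Step 2: form K; Step 3: GEMM
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    K = khatri_rao_list([factors[m] for m in range(len(factors)) if m != n])
    return Xn @ K

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 4, 5))
U, V, W = (rng.standard_normal((d, 2)) for d in (3, 4, 5))
M1 = mttkrp_naive(X, [U, V, W], 1)   # 4 x 2
```

The cost the talk attacks is visible here: the explicit permute/copy of X and the memory to materialize K.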
Computing the KRP
Consider K = B ⊙ C ⊙ D.
Each row of K is a Hadamard product of one row from each factor:
K(i_D + i_C · I_D + i_B · I_C · I_D, :) = B(i_B, :) * C(i_C, :) * D(i_D, :)
e.g., K(0, :) = B(0, :) * C(0, :) * D(0, :)
Timings for KRPs of naΓ―ve and reuse algorithms.
1-Step MTTKRP
Avoid permuting tensor entries; cast the computation as matrix multiplication.
Key observation: the n-th mode matricization of a tensor can be obtained by chunking the tensor into contiguous submatrices of equal size.
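The key observation can be demonstrated under a row-major storage assumption (one possible convention; the talk's layout may differ): the flat tensor already consists of contiguous I_n × I_{>n} submatrices, and placing them side by side reproduces the mode-n unfolding without moving any individual entry.

```python
import numpy as np

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # I_0=2, I_1=3, I_2=4; unfold mode n=1
left, rows, right = 2, 3, 4                  # modes before n, I_n, modes after n

# Reshaping (a no-copy view) exposes 'left' contiguous rows x right blocks.
blocks = X.reshape(left, rows, right)

# Concatenating the blocks column-wise gives the mode-1 unfolding --
# so a GEMM per block (or one batched GEMM) suffices; no permutation needed.
Xn_blocks = np.concatenate([blocks[b] for b in range(left)], axis=1)  # 3 x 8
```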
Parallel 1-Step MTTKRP
1. Form K_L
2. Form K_R(p, :)
3. Form K(p, :)
4. MatMul
5. Reduce
2-Step MTTKRP
First compute a Partial MTTKRP:
1. Compute K_L and K_R
2. T ← X_(0:n−1)^T K_L, where T is I_n × … × I_{N−1} × R
Second compute a series of ___?___ operations.
a. Tensor Times Vector (TTVs)
b. Tensor Times Matrix (TTMs)
c. Quasi-Tensor Times Matrix (q-TTMs)
2-Step MTTKRP: T
First compute a Partial MTTKRP: T = X_(0:n−1)^T K_L
[Figure: the partial MTTKRP as a single matrix multiplication]
2-Step MTTKRP: T
Second compute a series of TTVs.
[Figure: TTVs applied to blocks of T, producing M(:, r) column by column]
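Putting both steps together, here is a minimal sketch for a 4-way tensor with n = 1, under a row-major convention (the helper name `mttkrp_2step` and the specific blocking are illustrative assumptions, not the talk's implementation): step 1 is one GEMM against K_L = U_0; step 2 contracts the trailing modes of T with columns of the remaining factors, a series of TTVs.

```python
import numpy as np

def mttkrp_2step(X, U0, U2, U3):
    I, J, K, L = X.shape
    R = U0.shape[1]
    # Step 1 (partial MTTKRP, one GEMM):
    # T[j,k,l,r] = sum_i X[i,j,k,l] * U0[i,r]
    T = (X.reshape(I, -1).T @ U0).reshape(J, K, L, R)
    # Step 2 (series of TTVs): for each column r, contract mode l with
    # U3(:, r), then mode k with U2(:, r).
    M = np.empty((J, R))
    for r in range(R):
        Tkr = T[:, :, :, r] @ U3[:, r]   # J x K after the mode-l TTV
        M[:, r] = Tkr @ U2[:, r]         # length-J after the mode-k TTV
    return M

rng = np.random.default_rng(5)
X = rng.standard_normal((2, 3, 4, 5))
U0, U2, U3 = (rng.standard_normal((d, 2)) for d in (2, 4, 5))
M = mttkrp_2step(X, U0, U2, U3)   # 3 x 2
```

The GEMM in step 1 does the bulk of the flops; the TTVs touch only the much smaller intermediate T.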
Parallel 2-Step MTTKRP
Call parallel BLAS. WOW!!!
60×60×60×60×60
Per-iteration time of a CP decomposition via ALS. MATLAB used the Tensor Toolbox cp_als function, version 2.6. [1]
Findings
Two interesting networks:
• Positive affect
• Negative affect
Tobia M., Hayashi K., Ballard G., Gotlib I. Dynamic Functional Connectivity and Individual Differences in Emotions During Social Stress, to appear in Human Brain Mapping.
References
Tamara G. Kolda and Brett W. Bader. 2009. Tensor Decompositions and Applications. SIAM Rev. 51, 3 (September 2009), 455–500. https://doi.org/10.1137/07070111X
Jiajia Li, Jee Choi, Ioakeim Perros, Jimeng Sun, and Richard Vuduc. 2017. Model-Driven Sparse CP Decomposition for Higher-Order Tensors. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1048–10. https://doi.org/10.1109/IPDPS.2017.80
Shaden Smith, Niranjay Ravindran, Nicholas D. Sidiropoulos, and George Karypis. 2015. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication. In Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS '15). IEEE Computer Society, Washington, DC, USA, 61–70. https://doi.org/10.1109/IPDPS.2015.27
D.C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T.E.J. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S.W. Curtiss, S. Della Penna, D. Feinberg, M.F. Glasser, N. Harel, A.C. Heath, L. Larson-Prior, D. Marcus, G. Michalareas, S. Moeller, R. Oostenveld, S.E. Petersen, F. Prior, B.L. Schlaggar, S.M. Smith, A.Z. Snyder, J. Xu, and E. Yacoub. 2012. The Human Connectome Project: a data acquisition perspective. Neuroimage 62, 4 (2012), 2222–2231. https://doi.org/10.1016/j.neuroimage.2012.02.018
Anh-Huy Phan, Petr Tichavsky, and Andrzej Cichocki. 2013. Fast Alternating LS Algorithms for High Order CANDECOMP/PARAFAC Tensor Factorizations. IEEE Transactions on Signal Processing 61, 19 (Oct 2013), 4834–4846. https://doi.org/10.1109/TSP.2013.2269903
End Thanks for listening