Singular Value Decomposition for High-dimensional Tensor Data (PowerPoint PPT presentation by Anru Zhang)


  1. Singular Value Decomposition for High-dimensional Tensor Data
Anru Zhang, Department of Statistics, University of Wisconsin-Madison

  2. Introduction
• Tensors are arrays with multiple directions:
$$A \in \mathbb{R}^{p_1 \times \cdots \times p_d}, \quad A = (A_{i_1 \cdots i_d}), \quad 1 \le i_k \le p_k, \ k = 1, \ldots, d.$$
• Tensors of order three or higher are called high-order tensors.

  3. Importance of High-Order Methods: More High-Order Data Are Emerging
• Brain imaging
• Microbiome studies
• Matrix-valued time series

  4. High Order Enables Solutions for Harder Problems: High-order Interaction Pursuits
• Model (Hao, Z., Cheng, 2018), with main effects, pairwise interactions, and triple-wise interactions:
$$y_i = \beta_0 + \underbrace{\sum_{j} X_{ij}\beta_j}_{\text{main effect}} + \underbrace{\sum_{j,k} \gamma_{jk} X_{ij} X_{ik}}_{\text{pairwise interaction}} + \underbrace{\sum_{j,k,l} \eta_{jkl} X_{ij} X_{ik} X_{il}}_{\text{triple-wise interaction}} + \varepsilon_i, \quad i = 1, \ldots, n.$$
• Collecting all coefficients into a tensor $B$, this can be rewritten as $y_i = \langle B, \mathbf{X}_i \rangle + \varepsilon_i$ (see the sketch below).
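As a quick complement (my own illustration, not from the talk), the following numpy sketch checks on synthetic data that the triple-wise interaction term equals the tensor inner product between the coefficient tensor $\eta$ and the rank-one tensor $x \circ x \circ x$, which is what makes the compact form $y_i = \langle B, \mathbf{X}_i \rangle + \varepsilon_i$ possible:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
x = rng.normal(size=p)            # one observation's covariate vector
eta = rng.normal(size=(p, p, p))  # triple-wise coefficient tensor

# Triple-wise interaction term written as an explicit triple sum ...
direct = sum(eta[j, k, l] * x[j] * x[k] * x[l]
             for j in range(p) for k in range(p) for l in range(p))

# ... equals the tensor inner product <eta, x ∘ x ∘ x>
outer = np.einsum('j,k,l->jkl', x, x, x)  # rank-one tensor x ∘ x ∘ x
inner = np.sum(eta * outer)

assert np.isclose(direct, inner)
```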

  5. High Order Enables Solutions for Harder Problems: Estimation of Mixture Models
• A mixture model incorporates subpopulations in an overall population.
• Examples:
◮ Gaussian mixture models (Lindsay & Basak, 1993; Hsu & Kakade, 2013)
◮ Topic modeling (Arora et al., 2013)
◮ Hidden Markov processes (Anandkumar, Hsu, & Kakade, 2012)
◮ Independent component analysis (Miettinen et al., 2015)
◮ Additive index models (Balasubramanian, Fan & Yang, 2018)
◮ Mixture regression models (De Veaux, 1989; Jordan & Jacobs, 1994)
◮ ...
• Method of Moments (MoM):
◮ First moment → vector;
◮ Second moment → matrix;
◮ Higher-order moments → high-order tensors (see the sketch below).
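To make the MoM correspondence concrete, here is a small hypothetical numpy sketch (the component means, weights, and sample size are made up) that computes the empirical first, second, and third moments of a two-component Gaussian location mixture; the order of the moment dictates the order of the resulting array:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10_000, 4
# Hypothetical two-component location mixture with equal weights
mu = np.array([[2.0, 0.0, 0.0, 0.0],
               [0.0, 2.0, 0.0, 0.0]])
labels = rng.integers(0, 2, size=n)
X = mu[labels] + rng.normal(size=(n, p))

M1 = X.mean(axis=0)                           # first moment: a vector
M2 = np.einsum('ni,nj->ij', X, X) / n         # second moment: a matrix
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / n  # third moment: an order-3 tensor
print(M1.shape, M2.shape, M3.shape)           # (4,) (4, 4) (4, 4, 4)
```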

  6. High Order is ...
• High order is more charming!
• High order is harder! Tensor problems are far more than extensions of matrix problems:
◮ More structures
◮ High dimensionality
◮ Computational difficulty
◮ Many concepts are not well defined or are NP-hard to compute

  7. High Order Casts New Problems and Challenges
• Tensor completion
• Tensor SVD
• Tensor regression
• Biclustering/triclustering
• ...

  8. In this talk, we focus on tensor SVD.

  9. Part I: Tensor SVD: Statistical and Computational Limits

  10. SVD and PCA
• Singular value decomposition (SVD) is one of the most important tools in multivariate analysis.
• Goal: find the underlying low-rank structure in the data matrix.
• Closely related to principal component analysis (PCA): find the one or multiple directions that explain most of the variance.

  11. Tensor SVD
• We propose a general framework for tensor SVD: $Y = X + Z$, where
◮ $Y \in \mathbb{R}^{p_1 \times p_2 \times p_3}$ is the observation;
◮ $Z$ is the noise of small amplitude;
◮ $X$ is a low-rank tensor.
• We wish to recover the high-dimensional low-rank structure $X$.
→ Unfortunately, there is no uniform definition of tensor rank.

  12. Tensor Rank Has No Uniform Definition
• Canonical polyadic (CP) rank:
$$r_{\mathrm{cp}} = \min r \quad \text{s.t.} \quad X = \sum_{i=1}^{r} \lambda_i \cdot u_i \circ v_i \circ w_i.$$
• Tucker rank: $X = S \times_1 U_1 \times_2 U_2 \times_3 U_3$, with core $S \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and $U_k \in \mathbb{R}^{p_k \times r_k}$. The smallest possible $(r_1, r_2, r_3)$ is the Tucker rank of $X$. (Both forms are constructed in the sketch below.)
• See Kolda and Bader (2009) for a comprehensive survey.
Picture source: Guoxu Zhou's website, http://www.bsp.brain.riken.jp/~zhougx/tensor.html
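For concreteness, a minimal numpy sketch (dimensions and ranks chosen arbitrarily) building a tensor in each of the two forms; `np.einsum` expresses both the sum of rank-one outer products (CP) and the core-times-loadings contraction (Tucker):

```python
import numpy as np

rng = np.random.default_rng(2)
p1, p2, p3 = 10, 12, 14

# CP form: X = sum_i lambda_i * u_i ∘ v_i ∘ w_i  (CP rank <= r)
r = 3
lam = rng.normal(size=r)
U, V, W = (rng.normal(size=(p, r)) for p in (p1, p2, p3))
X_cp = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

# Tucker form: X = S ×1 U1 ×2 U2 ×3 U3 with core S and loadings Uk
r1, r2, r3 = 3, 4, 5
S = rng.normal(size=(r1, r2, r3))
U1 = rng.normal(size=(p1, r1))
U2 = rng.normal(size=(p2, r2))
U3 = rng.normal(size=(p3, r3))
X_tucker = np.einsum('abc,ia,jb,kc->ijk', S, U1, U2, U3)

print(X_cp.shape, X_tucker.shape)  # (10, 12, 14) for both
```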

  13. Model
• Observations: $Y \in \mathbb{R}^{p_1 \times p_2 \times p_3}$,
$$Y = X + Z = S \times_1 U_1 \times_2 U_2 \times_3 U_3 + Z, \qquad Z_{i_1 i_2 i_3} \overset{iid}{\sim} N(0, \sigma^2), \quad S \in \mathbb{R}^{r_1 \times r_2 \times r_3}, \quad U_k \in \mathbb{O}_{p_k, r_k}.$$
• Goal: estimate $U_1, U_2, U_3$, and the original tensor $X$. (A synthetic instance of this model is generated in the sketch below.)
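A hedged sketch of how one might simulate this model in numpy (the core scaling 20.0 is an arbitrary choice to control the signal strength, not a value from the talk): orthonormal loadings are drawn via QR, and iid Gaussian noise is added to the low-rank signal.

```python
import numpy as np

rng = np.random.default_rng(3)
p1, p2, p3 = 30, 30, 30
r1, r2, r3 = 2, 2, 2
sigma = 1.0

# Orthonormal loadings Uk ∈ O_{pk, rk}, obtained via QR of Gaussian matrices
U1, _ = np.linalg.qr(rng.normal(size=(p1, r1)))
U2, _ = np.linalg.qr(rng.normal(size=(p2, r2)))
U3, _ = np.linalg.qr(rng.normal(size=(p3, r3)))
S = 20.0 * rng.normal(size=(r1, r2, r3))  # core; the factor sets the SNR

X = np.einsum('abc,ia,jb,kc->ijk', S, U1, U2, U3)  # low-rank signal
Z = sigma * rng.normal(size=(p1, p2, p3))          # iid N(0, sigma^2) noise
Y = X + Z                                          # observation
```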

  14. Straightforward Idea 1: Higher-Order SVD (HOSVD)
• Since $U_k$ spans the column subspace of $M_k(X)$, let
$$\hat{U}_k = \mathrm{SVD}_{r_k}(M_k(Y)), \quad k = 1, 2, 3,$$
i.e. the leading $r_k$ left singular vectors of the matrix collecting all mode-$k$ fibers. (See the sketch below.)
Note: $\mathrm{SVD}_r(\cdot)$ denotes the first $r$ left singular vectors of a given matrix.
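A minimal numpy implementation of this idea (helper names `unfold`, `svd_r`, and `hosvd` are mine); the mode-$k$ matricization stacks all mode-$k$ fibers as columns, and the exact column ordering is irrelevant for the left singular subspace:

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization M_k(T): mode k indexes rows, all other modes columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def svd_r(M, r):
    """Leading r left singular vectors of M (the SVD_r(.) on the slide)."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]

def hosvd(Y, ranks):
    """Higher-order SVD: U_k = SVD_{r_k}(M_k(Y)) for each mode."""
    return [svd_r(unfold(Y, k), r) for k, r in enumerate(ranks)]
```

Calling `hosvd(Y, (r1, r2, r3))` on data generated from the model above returns initial estimates of $U_1, U_2, U_3$.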

  15. Straightforward Idea 1: Higher-Order SVD (HOSVD)
(De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl. 2000a)
• Advantage: easy to implement and analyze.
• Disadvantage: performs sub-optimally. Reason: simply unfolding the tensor fails to utilize the tensor structure!

  16. Straightforward Idea 2: Maximum Likelihood Estimator
• Maximum likelihood estimator:
$$(\hat{S}^{\mathrm{mle}}, \hat{U}_1^{\mathrm{mle}}, \hat{U}_2^{\mathrm{mle}}, \hat{U}_3^{\mathrm{mle}}) = \underset{U_1, U_2, U_3, S}{\arg\min} \ \| Y - S \times_1 U_1 \times_2 U_2 \times_3 U_3 \|_F^2.$$
• Equivalently, $\hat{U}_1^{\mathrm{mle}}, \hat{U}_2^{\mathrm{mle}}, \hat{U}_3^{\mathrm{mle}}$ can be calculated via
$$\max \ \| Y \times_1 V_1^\top \times_2 V_2^\top \times_3 V_3^\top \|_F^2 \quad \text{subject to} \quad V_k \in \mathbb{O}_{p_k, r_k}, \ k = 1, 2, 3.$$
• Advantage: achieves statistical optimality (will be shown later).
• Disadvantage:
◮ Non-convex, computationally intractable.
◮ NP-hard to approximate even when $r = 1$ (Hillar and Lim, 2013).
(The objective above is evaluated in the sketch below.)
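For intuition, a small sketch of the non-convex objective being maximized (the function name is mine); the single `einsum` contraction applies all three mode products $\times_k V_k^\top$ at once:

```python
import numpy as np

def mle_objective(Y, V1, V2, V3):
    """The non-convex objective ||Y x1 V1' x2 V2' x3 V3'||_F^2, to be
    maximized over orthonormal V1, V2, V3; maximizing it is equivalent
    to the least-squares / maximum-likelihood fit above."""
    core = np.einsum('ijk,ia,jb,kc->abc', Y, V1, V2, V3)  # Y ×1 V1ᵀ ×2 V2ᵀ ×3 V3ᵀ
    return np.sum(core ** 2)
```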

  17. Phase Transition in Tensor SVD
• The difficulty is driven by the signal-to-noise ratio (SNR):
$$\lambda = \min_{k=1,2,3} \sigma_{r_k}(M_k(X)) = \text{least non-zero singular value of } M_k(X), \qquad \sigma = \mathrm{SD}(Z) = \text{noise level}.$$
• Suppose $p_1 \asymp p_2 \asymp p_3 \asymp p$. Three phases:
$$\lambda/\sigma \ge C p^{3/4} \ \text{(strong SNR case)}, \qquad \lambda/\sigma < c p^{1/2} \ \text{(weak SNR case)}, \qquad p^{1/2} \ll \lambda/\sigma \ll p^{3/4} \ \text{(moderate SNR case)}.$$
(The sketch below computes $\lambda$ and classifies the regime.)
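A possible numpy sketch for computing $\lambda$ and classifying the regime; note that $C$ and $c$ are unspecified constants in the theory, so the defaults of 1.0 below are only placeholders:

```python
import numpy as np

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def snr_regime(X, ranks, sigma, C=1.0, c=1.0):
    """Classify the SNR phase from lambda = min_k sigma_{r_k}(M_k(X))."""
    # sigma_{r_k}(M_k(X)): the r_k-th largest singular value of each unfolding
    lam = min(np.linalg.svd(unfold(X, k), compute_uv=False)[r - 1]
              for k, r in enumerate(ranks))
    p = max(X.shape)
    snr = lam / sigma
    if snr >= C * p ** 0.75:
        return 'strong'
    if snr < c * p ** 0.5:
        return 'weak'
    return 'moderate'
```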

  18. Strong SNR Case: Methodology
• When $\lambda/\sigma \ge C p^{3/4}$, apply higher-order orthogonal iteration (HOOI). (De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl. 2000b)
• (Step 1. Spectral initialization)
$$\hat{U}_k^{(0)} = \mathrm{SVD}_{r_k}(M_k(Y)), \quad k = 1, 2, 3.$$
• (Step 2. Power iterations) Repeat: let $t = t + 1$ and calculate
$$\hat{U}_1^{(t)} = \mathrm{SVD}_{r_1}\Big( M_1\big( Y \times_2 (\hat{U}_2^{(t-1)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top \big) \Big),$$
$$\hat{U}_2^{(t)} = \mathrm{SVD}_{r_2}\Big( M_2\big( Y \times_1 (\hat{U}_1^{(t)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top \big) \Big),$$
$$\hat{U}_3^{(t)} = \mathrm{SVD}_{r_3}\Big( M_3\big( Y \times_1 (\hat{U}_1^{(t)})^\top \times_2 (\hat{U}_2^{(t)})^\top \big) \Big),$$
until $t = t_{\max}$ or convergence. (A full implementation sketch follows below.)
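Below is one way the two steps might be coded in numpy (a sketch under my own naming; the convergence test via the projector distance $\|\hat U \hat U^\top - \hat U_{\mathrm{old}} \hat U_{\mathrm{old}}^\top\|_F$ is one reasonable choice among several):

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization: mode k indexes rows, all other modes columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def svd_r(M, r):
    """Leading r left singular vectors of M."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]

def hooi(Y, ranks, t_max=50, tol=1e-8):
    """Higher-order orthogonal iteration: HOSVD warm start + power iterations."""
    # Step 1: spectral initialization (HOSVD)
    U = [svd_r(unfold(Y, k), r) for k, r in enumerate(ranks)]
    for _ in range(t_max):
        U_old = [u.copy() for u in U]
        # Step 2: update each mode with the other two modes projected out,
        # e.g. U_1 <- SVD_{r1}( M_1(Y ×2 U_2ᵀ ×3 U_3ᵀ) ), then cyclically
        for k in range(3):
            proj = Y
            for m in range(3):
                if m != k:
                    # mode-m product with U[m]ᵀ: contract mode m down to r_m
                    proj = np.moveaxis(
                        np.tensordot(U[m].T, proj, axes=(1, m)), 0, m)
            U[k] = svd_r(unfold(proj, k), ranks[k])
        # stop when every subspace projector has stabilized
        if all(np.linalg.norm(u @ u.T - v @ v.T) < tol
               for u, v in zip(U, U_old)):
            break
    # Low-rank estimate: project Y onto the three estimated subspaces
    core = np.einsum('ijk,ia,jb,kc->abc', Y, U[0], U[1], U[2])
    X_hat = np.einsum('abc,ia,jb,kc->ijk', core, U[0], U[1], U[2])
    return U, X_hat
```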

  19. Interpretation
1. Spectral initialization provides a "warm start."
2. Power iteration refines the initializations. Given $\hat{U}_1^{(t-1)}, \hat{U}_2^{(t-1)}, \hat{U}_3^{(t-1)}$, denoise $Y$ via
$$Y \times_2 (\hat{U}_2^{(t-1)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top.$$
◮ The mode-1 singular subspace is preserved;
◮ the noise can be greatly reduced.
Thus, we update
$$\hat{U}_1^{(t)} = \mathrm{SVD}_{r_1}\Big( M_1\big( Y \times_2 (\hat{U}_2^{(t-1)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top \big) \Big).$$

  20. Higher-Order Orthogonal Iteration (HOOI)
(De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl. 2000b)
[Figure: schematic illustration of the HOOI algorithm.]

  21. Strong SNR Case: Theoretical Analysis
Theorem (Upper Bound). Suppose $\lambda/\sigma \ge C p^{3/4}$ and other regularity conditions hold. Then after at most $O\big(\log(p/\lambda) \vee 1\big)$ iterations:
• (Recovery of $U_1, U_2, U_3$)
$$\mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \hat{U}_k - U_k O \big\|_F \le \frac{C \sqrt{p_k r_k}}{\lambda/\sigma}, \quad k = 1, 2, 3;$$
• (Recovery of $X$)
$$\sup_{X \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \big\| \hat{X} - X \big\|_F^2 \le C (p_1 r_1 + p_2 r_2 + p_3 r_3) \sigma^2, \qquad \sup_{X \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \frac{\| \hat{X} - X \|_F^2}{\| X \|_F^2} \le \frac{C (p_1 + p_2 + p_3) \sigma^2}{\lambda^2}.$$
(The aligned subspace error $\min_O \|\hat{U}_k - U_k O\|_F$ can be evaluated exactly, as in the sketch below.)
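The loss $\min_{O} \|\hat U_k - U_k O\|_F$ in the theorem has a closed form via the orthogonal Procrustes solution; a short numpy sketch (the function name is mine):

```python
import numpy as np

def subspace_error(U_hat, U):
    """min over orthogonal O of ||U_hat - U O||_F, via orthogonal Procrustes:
    the minimizer is O* = A B' where U' U_hat = A D B' is an SVD."""
    A, _, Bt = np.linalg.svd(U.T @ U_hat)
    O = A @ Bt
    return np.linalg.norm(U_hat - U @ O)
```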

  22. Strong SNR Case: Lower Bound
Define the following class of low-rank tensors with signal strength $\lambda$:
$$\mathcal{F}_{p,r}(\lambda) = \big\{ X \in \mathbb{R}^{p_1 \times p_2 \times p_3} : \mathrm{rank}(X) = (r_1, r_2, r_3), \ \sigma_{r_k}(M_k(X)) \ge \lambda \big\}.$$
Theorem (Lower Bound).
• (Recovery of $U_1, U_2, U_3$)
$$\inf_{\tilde{U}_k} \sup_{X \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \tilde{U}_k - U_k O \big\|_F \ge \frac{c \sqrt{p_k r_k}}{\lambda/\sigma}, \quad k = 1, 2, 3.$$
• (Recovery of $X$)
$$\inf_{\hat{X}} \sup_{X \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \big\| \hat{X} - X \big\|_F^2 \ge c (p_1 r_1 + p_2 r_2 + p_3 r_3) \sigma^2, \qquad \inf_{\hat{X}} \sup_{X \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \frac{\| \hat{X} - X \|_F^2}{\| X \|_F^2} \ge \frac{c (p_1 + p_2 + p_3) \sigma^2}{\lambda^2}.$$
