A randomized block sampling approach to the canonical polyadic decomposition of large-scale tensors
Nico Vervliet
Joint work with Lieven De Lathauwer
SIAM AN17, July 13, 2017
Classification of hazardous gases using e-noses
Classify 900 experiments containing 72 time series with 26 000 samples each.
[Figure: data tensor with axes Time, Sensor and Experiment]
Overview
◮ Decomposing large-scale tensors
◮ Randomized block sampling
◮ Experimental results
◮ Chemo-sensing application
Canonical polyadic decomposition
◮ Sum of $R$ rank-1 terms
[Figure: third-order tensor $\mathcal{T}$ drawn as $a_1 \otimes b_1 \otimes c_1 + \cdots + a_R \otimes b_R \otimes c_R$]
◮ Mathematically, for a general $N$th-order tensor $\mathcal{T}$:
$$\mathcal{T} = \sum_{r=1}^{R} a_r^{(1)} \otimes a_r^{(2)} \otimes \cdots \otimes a_r^{(N)} = \llbracket A^{(1)}, A^{(2)}, \ldots, A^{(N)} \rrbracket$$
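As an illustration of the sum-of-rank-1-terms definition, the following minimal sketch builds a tensor from its factor matrices. The function name `cpd_to_tensor` and the example sizes are hypothetical, and real-valued factors are assumed.

```python
import numpy as np

def cpd_to_tensor(factors):
    """Build a tensor from factor matrices A^(n), each of shape (I_n, R)."""
    R = factors[0].shape[1]
    shape = tuple(A.shape[0] for A in factors)
    T = np.zeros(shape)
    for r in range(R):
        # Outer product a_r^(1) (x) a_r^(2) (x) ... (x) a_r^(N) of the r-th columns
        term = factors[0][:, r]
        for A in factors[1:]:
            term = np.multiply.outer(term, A[:, r])
        T += term
    return T

# Hypothetical example: a rank-2 tensor of size 4 x 3 x 5
rng = np.random.default_rng(0)
factors = [rng.standard_normal((I, 2)) for I in (4, 3, 5)]
T = cpd_to_tensor(factors)
```

The explicit loop over rank-1 terms mirrors the formula; for third-order tensors the same result can be obtained with a single `np.einsum('ir,jr,kr->ijk', ...)` call.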
Computing a CPD
◮ Optimization problem:
$$\min_{A^{(1)}, A^{(2)}, \ldots, A^{(N)}} \; \frac{1}{2} \left\| \mathcal{T} - \llbracket A^{(1)}, A^{(2)}, \ldots, A^{(N)} \rrbracket \right\|_F^2$$
◮ Algorithms
  ◮ Alternating least squares
  ◮ CPOPT [Acar et al. 2011a]
  ◮ (Damped) Gauss–Newton [Phan et al. 2013]
  ◮ (Inexact) nonlinear least squares [Sorber et al. 2013]
Curse of dimensionality
◮ Suppose an $N$th-order tensor $\mathcal{T} \in \mathbb{C}^{I \times I \times \cdots \times I}$, then
  ◮ number of entries: $I^N$
  ◮ memory and time complexity: $O(I^N)$
  ◮ number of variables: $NIR$
Example [Vervliet et al. 2014]
Ninth-order tensor with $I = 100$ and rank $R = 5$:
◮ number of entries: $10^{18}$
◮ number of variables: 4500
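The arithmetic in the example can be checked directly. The helper names below are hypothetical and only restate the two counts from the slide.

```python
def num_entries(I, N):
    """Entries of an N-th order tensor with all dimensions equal to I."""
    return I ** N

def num_cpd_variables(I, N, R):
    """Entries of the N factor matrices of size I x R in a rank-R CPD."""
    return N * I * R

entries = num_entries(100, 9)             # 100^9 = 10^18
variables = num_cpd_variables(100, 9, 5)  # 9 * 100 * 5 = 4500
print(entries, variables)
```

The gap between the two numbers is what makes it plausible to recover the factors from a small part of the tensor.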
How to handle large tensors?
◮ Use incomplete tensors [Acar et al. 2011b; Vervliet et al. 2014; Vervliet et al. 2016a]
◮ Exploit sparsity [Kang et al. 2012; Papalexakis et al. 2012; Bader and Kolda 2007]
◮ Compress the tensor [Sidiropoulos et al. 2014; Oseledets and Tyrtyshnikov 2010; Vervliet et al. 2016b]
◮ Decompose subtensors and combine results [Papalexakis et al. 2012; Phan and Cichocki 2011]
◮ Parallel [Liavas and Sidiropoulos 2015] + many of the above
Randomized block sampling CPD: idea
[Figure: cycle on the tensor and its rank-1 terms with four labelled steps: Take sample, Initialization, Compute step, Update]
Randomized block sampling CPD: algorithm
input : Data $\mathcal{T}$ and initial guess $A^{(n)}$, $n = 1, \ldots, N$
output: $A^{(n)}$, $n = 1, \ldots, N$ such that $\mathcal{T} \approx \llbracket A^{(1)}, \ldots, A^{(N)} \rrbracket$
while $k < K$ and not converged do
    Create sample $\mathcal{T}_s$ and corresponding $A_s^{(n)}$, $n = 1, \ldots, N$
    Let $\bar{A}_s^{(n)}$ be the result of 1 iteration in a restricted CPD algorithm on $\mathcal{T}_s$ with initial guess $A_s^{(n)}$, $n = 1, \ldots, N$ and restriction $\Delta$
    Update the affected variables $A^{(n)}$ using $\bar{A}_s^{(n)}$, $n = 1, \ldots, N$
    $k \leftarrow k + 1$
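The loop above can be sketched in NumPy as follows. This is a simplified illustration under several assumptions, not the Tensorlab implementation: real-valued data, a damped ALS sweep as the restricted step, a fixed damping factor `alpha` instead of a decaying restriction, a fresh uniform draw of indices per iteration instead of the permutation-based sampling, and no convergence test. All function names are hypothetical.

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product of matrices with R columns each."""
    R = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = (out[:, None, :] * M[None, :, :]).reshape(-1, R)
    return out

def unfold(T, n):
    """Mode-n unfolding; remaining modes are flattened in C order."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def restricted_als_sweep(Ts, As, alpha):
    """One ALS sweep on block Ts, damped by alpha in (0, 1] (assumed restriction)."""
    As = [A.copy() for A in As]
    for n in range(Ts.ndim):
        V = khatri_rao([As[m] for m in range(Ts.ndim) if m != n])
        W = V.T @ V
        A_ls = unfold(Ts, n) @ V @ np.linalg.pinv(W)  # least-squares update
        As[n] = (1 - alpha) * As[n] + alpha * A_ls
    return As

def cpd_rbs_sketch(T, As, block, K=200, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    As = [A.copy() for A in As]
    for k in range(K):
        # Create sample: random row indices per mode (simplified sampling)
        idx = [rng.choice(I, size=B, replace=False) for I, B in zip(T.shape, block)]
        Ts = T[np.ix_(*idx)]
        As_s = [A[i] for A, i in zip(As, idx)]
        # One iteration of the restricted algorithm on the block
        As_new = restricted_als_sweep(Ts, As_s, alpha)
        # Update only the affected rows of the full factor matrices
        for A, i, A_new in zip(As, idx, As_new):
            A[i] = A_new
    return As

# Sanity check: the exact factors of a noiseless tensor stay a fixed point
rng = np.random.default_rng(1)
A_true = [rng.standard_normal((8, 2)) for _ in range(3)]
T = np.einsum('ir,jr,kr->ijk', *A_true)
A_fit = cpd_rbs_sketch(T, A_true, block=(4, 4, 4), K=50)
```

Each iteration touches only a block of the tensor and the matching rows of the factor matrices, so the per-iteration cost depends on the block size rather than on the full tensor size.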
Ingredient 1: randomized block sampling
For a 6 × 6 tensor and block size 3 × 2, index sets at three successive steps:
◮ Step 1: $I_1 = \{3, 1, 2, 6, 5, 4\}$, $I_2 = \{1, 2, 4, 6, 3, 5\}$
◮ Step 2: $I_1 = \{3, 1, 2, 6, 5, 4\}$, $I_2 = \{1, 2, 4, 6, 3, 5\}$
◮ Step 3: $I_1 = \{6, 1, 4, 2, 5, 3\}$, $I_2 = \{1, 2, 4, 6, 3, 5\}$
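One plausible reading of the three steps (the highlighting on the slides is lost) is that each mode keeps a random permutation of its indices, consecutive chunks of that permutation define the blocks, and a mode is reshuffled once its permutation is used up. The sketch below implements that reading; the class name `BlockSampler` is hypothetical and indices are 0-based.

```python
import random

class BlockSampler:
    """Draw blocks from per-mode random permutations (assumed scheme)."""

    def __init__(self, dims, block, seed=None):
        self.dims = dims
        self.block = block
        self.rng = random.Random(seed)
        self.perms = [self._shuffle(I) for I in dims]
        self.pos = [0] * len(dims)

    def _shuffle(self, I):
        perm = list(range(I))
        self.rng.shuffle(perm)
        return perm

    def next_block(self):
        idx = []
        for n, (I, B) in enumerate(zip(self.dims, self.block)):
            if self.pos[n] + B > I:
                # Permutation of mode n is exhausted: draw a new one
                self.perms[n] = self._shuffle(I)
                self.pos[n] = 0
            idx.append(self.perms[n][self.pos[n]:self.pos[n] + B])
            self.pos[n] += B
        return idx

sampler = BlockSampler(dims=(6, 6), block=(3, 2), seed=0)
b1, b2, b3 = (sampler.next_block() for _ in range(3))
```

With a 6 × 6 tensor and 3 × 2 blocks, the first mode is reshuffled before the third block while the second mode is not, which matches the change of $I_1$ at step 3. Within one pass over a permutation, every index of a mode is visited exactly once.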
Ingredient 2: restricted CPD algorithm
◮ ALS variant
$$A_{k+1}^{(n)} = (1 - \alpha) A_k^{(n)} + \alpha \, T_{(n)} \bar{V}^{(n)} \left( \bar{W}^{(n)} \right)^{-1}$$
Enforce restriction by $\alpha = \Delta_k$.
◮ NLS variant
$$\min_{p_k} \; \frac{1}{2} \left\| \operatorname{vec}(\mathcal{F}(x_k)) - J_k p_k \right\|^2 \quad \text{s.t.} \quad \| p_k \| \le \Delta_k$$
in which $\mathcal{F} = \mathcal{T} - \llbracket A^{(1)}, \ldots, A^{(N)} \rrbracket$
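The slide does not define $\bar{V}^{(n)}$ and $\bar{W}^{(n)}$. Assuming they are the usual ALS quantities for real-valued data, computed from the sampled factor matrices $A_s^{(m)}$, the update can be unpacked as follows ($\odot$ is the Khatri–Rao product and $\ast$ the Hadamard product):

```latex
\begin{aligned}
\bar{V}^{(n)} &= \bigodot_{m \neq n} A_s^{(m)}, \\
\bar{W}^{(n)} &= \bigl(\bar{V}^{(n)}\bigr)^{T} \bar{V}^{(n)}
              = \mathop{\ast}_{m \neq n} \bigl(A_s^{(m)}\bigr)^{T} A_s^{(m)}, \\
\alpha = 1 &: \quad A_{k+1}^{(n)} = T_{(n)} \bar{V}^{(n)} \bigl(\bar{W}^{(n)}\bigr)^{-1}
              \quad \text{(plain least-squares update)}, \\
\alpha \to 0 &: \quad A_{k+1}^{(n)} \to A_k^{(n)}
              \quad \text{(no movement)}.
\end{aligned}
```

Under this reading, $\alpha = \Delta_k$ interpolates between the previous iterate and the least-squares solution on the block, so a smaller restriction means a shorter step.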
Ingredient 3: restriction
Use restriction of form
$$\Delta_k = \begin{cases} \Delta_0 & \text{if } k < K_{\text{search}} \\ \hat{\Delta}_0 \cdot \alpha^{(k - K_{\text{search}})/Q} & \text{if } k \ge K_{\text{search}} \end{cases}$$
[Figure: $\Delta_k$ versus iteration $k$ for $k = 0, \ldots, 200$, logarithmic vertical axis]
Example (Selecting $Q$)
For a 100 × 100 × 100 tensor and block size 25 × 25 × 25, $Q = 4$
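The piecewise schedule translates directly into a small function. The name `restriction` and the numeric values in the usage lines are hypothetical and chosen only to make the decay easy to follow.

```python
def restriction(k, delta0, delta0_hat, alpha, K_search, Q):
    """Restriction Delta_k: constant during the search phase, then geometric decay."""
    if k < K_search:
        return delta0
    return delta0_hat * alpha ** ((k - K_search) / Q)

# Hypothetical settings: the restriction is multiplied by alpha every Q iterations
d_search = restriction(0, delta0=1.0, delta0_hat=0.5, alpha=0.5, K_search=10, Q=4)
d_start = restriction(10, delta0=1.0, delta0_hat=0.5, alpha=0.5, K_search=10, Q=4)
d_later = restriction(14, delta0=1.0, delta0_hat=0.5, alpha=0.5, K_search=10, Q=4)
```

Dividing the exponent by $Q$ means the restriction shrinks by a factor $\alpha$ once per $Q$ iterations rather than once per iteration.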
Ingredient 4: A stopping criterion
◮ Function evaluation: $f_{\text{val}} = 0.5 \left\| \mathcal{T} - \llbracket A^{(1)}, \ldots, A^{(N)} \rrbracket \right\|^2$
[Figure: $f_{\text{val}}$ and CPD error versus iteration $k$, logarithmic vertical axis]
◮ Step size
Intermezzo: Cramér–Rao bound
◮ Uncertainty of an estimate
[Figure: normal distribution with 68% of the mass between $-\sigma$ and $\sigma$; horizontal axis from $-3\sigma$ to $3\sigma$]
◮ $\text{CRB} \le \sigma^2$
◮ $C = \tau^2 (J^H J)^{-1}$
Ingredient 4: Cramér–Rao bound based stopping criterion
◮ Experimental bound
  ◮ Use estimates $A_k^{(n)}$
  ◮ Use $f_{\text{val}}$ to estimate noise $\tau$
◮ Stopping criterion:
$$D_{\text{CRB}} = \frac{1}{R \sum_n I_n} \sum_{n=1}^{N} \sum_{i=1}^{I_n} \sum_{r=1}^{R} \frac{\left| A_k^{(n)}(i, r) - A_{k - K_{\text{CRB}}}^{(n)}(i, r) \right|}{\sqrt{C^{(n)}(i, r)}} \le \gamma$$
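A minimal sketch of the criterion, assuming it averages the absolute change of each factor entry over the last $K_{\text{CRB}}$ iterations, scaled by the square root of the corresponding bound $C^{(n)}(i, r)$. The function name `d_crb` and the toy inputs are hypothetical; computing the bounds themselves is not shown.

```python
import numpy as np

def d_crb(A_now, A_prev, C):
    """Mean absolute change of the factor entries relative to sqrt of the CRB.

    A_now, A_prev: lists of factor matrices at iterations k and k - K_CRB.
    C: list of arrays with the per-entry bounds C^(n)(i, r) (assumed given).
    """
    R = A_now[0].shape[1]
    total_rows = sum(A.shape[0] for A in A_now)
    total = sum(np.sum(np.abs(a - b) / np.sqrt(c))
                for a, b, c in zip(A_now, A_prev, C))
    return total / (R * total_rows)

# Toy input: every entry changed by 1 and every bound equals 4 (std. dev. 2)
A_prev = [np.zeros((2, 2)), np.zeros((3, 2))]
A_now = [np.ones((2, 2)), np.ones((3, 2))]
C = [np.full((2, 2), 4.0), np.full((3, 2), 4.0)]
d = d_crb(A_now, A_prev, C)
stop = d <= 1.0  # hypothetical threshold gamma = 1
```

In the toy input each entry moves by half its standard deviation, so the average is 0.5: the iterates change less than the uncertainty of the estimate, which is the situation the criterion is meant to detect.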
Unrestricted phase vs restricted phase
[Figure: CPD error versus iteration $k$, with regions 1, 2 and 3 marked]
◮ Unrestricted phase (1 + 2): converge to a neighborhood of an optimum
◮ Restricted phase (3): pull iterates towards optimum
Assumptions
◮ CPD of rank $R$ exists
◮ SNR is high enough
◮ Most block dimensions $> R$
Experiment overview
◮ Experiments
  ◮ Comparison ALS vs NLS (see paper)
  ◮ Influence of block size
  ◮ Influence of step size (see paper)
◮ Performance
  ◮ 50 Monte Carlo experiments
  ◮ CPD error: $\max_n \left\| A_0^{(n)} - A_{\text{res}}^{(n)} \right\| / \left\| A_0^{(n)} \right\|$
◮ cpd_rbs in Tensorlab 3.0 [Vervliet et al. 2016c]
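The CPD error measure can be sketched as below. Two assumptions are made: the norm is the Frobenius norm, and the permutation and scaling indeterminacies of the CPD have already been resolved so that the estimated factors are directly comparable to the true ones. The function name `cpd_error` is hypothetical.

```python
import numpy as np

def cpd_error(A_true, A_est):
    """Largest relative factor-matrix error over all modes (Frobenius norm assumed)."""
    return max(np.linalg.norm(A0 - Ar) / np.linalg.norm(A0)
               for A0, Ar in zip(A_true, A_est))

# Toy input: first factor exact, second factor off by 10% in every entry
A_true = [np.eye(2), np.ones((3, 2))]
A_est = [np.eye(2), 1.1 * np.ones((3, 2))]
err = cpd_error(A_true, A_est)
```

Taking the maximum over the modes means a single badly estimated factor matrix dominates the reported error.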