The decomposition of a third-order tensor in R block terms of rank-(L,L,1):
Model, Algorithms, Uniqueness, Estimation of R and L

Dimitri Nion & Lieven De Lathauwer
K.U. Leuven, Kortrijk campus, Belgium
E-mails: Dimitri.Nion@kuleuven-kortrijk.be, Lieven.DeLathauwer@kuleuven-kortrijk.be

TRICAP 2009, Núria, Spain, June 14th-19th, 2009
Introduction

Tensor decompositions are powerful multilinear algebra tools that generalize matrix decompositions.

Motivation: a growing number of applications involve the manipulation of multi-way data, rather than two-way data.

Key research axes:
- Development of new models/decompositions
- Development of algorithms to compute the decompositions
- Uniqueness of tensor decompositions
- Use of these tools in new applications, or in existing applications where the multi-way nature of the data has been ignored until now
- Tensor decompositions under constraints (e.g., imposing non-negativity or specific algebraic structures)
From matrix SVD to tensor HOSVD

Matrix SVD:
$$Y = U D V^H = d_{11}\, u_1 v_1^H + \dots + d_{RR}\, u_R v_R^H$$

Tensor HOSVD (third-order case):
$$y_{ijk} = \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} u_{il}\, v_{jm}\, w_{kn}\, h_{lmn}, \qquad \mathcal{Y} = \mathcal{H} \times_1 U \times_2 V \times_3 W$$

- One unitary matrix (U, V, W) per mode.
- The core tensor H is the representation of Y in the reduced spaces.
- We may have L ≠ M ≠ N.
- H is in general NOT diagonal (difference with the matrix SVD).
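To make this concrete, here is a minimal HOSVD sketch in NumPy (illustrative, not code from the talk): the mode matrices are taken as the left singular vectors of the three mode unfoldings, and with the full core the tensor is reconstructed exactly.

```python
# Minimal HOSVD sketch. The mode matrices are the left singular vectors of
# the three mode unfoldings; the core H is in general dense, not diagonal,
# unlike the matrix SVD.
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the mode-`mode` index becomes the row index."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0] for n in range(3)]
    # Core: H = T x_1 U^H x_2 V^H x_3 W^H
    H = np.einsum('ijk,il,jm,kn->lmn', T,
                  U[0].conj(), U[1].conj(), U[2].conj())
    return U, H

# With the full (untruncated) core, the reconstruction is exact.
T = np.random.randn(4, 5, 6)
U, H = hosvd(T)
T_rec = np.einsum('lmn,il,jm,kn->ijk', H, U[0], U[1], U[2])
assert np.allclose(T, T_rec)
```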
From matrix SVD to PARAFAC

Matrix SVD:
$$Y = U D V^H = d_{11}\, u_1 v_1^H + \dots + d_{RR}\, u_R v_R^H$$

PARAFAC decomposition: the core tensor H is diagonal (h_{ijk} = 1 if i = j = k, else h_{ijk} = 0).

Sum of R rank-1 tensors:
$$\mathcal{Y} = a_1 \circ b_1 \circ c_1 + \dots + a_R \circ b_R \circ c_R$$

Equivalently, Y is a set of K matrices of the form Y(:,:,k) = A diag(C(k,:)) B^T.
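A quick numerical illustration of the equivalence between the rank-1 sum and the slice formula (a sketch with assumed small dimensions, not from the slides):

```python
# PARAFAC slice formula check: Y(:,:,k) = A diag(C(k,:)) B^T gives the same
# tensor as the sum of R rank-1 terms. Dimensions are illustrative.
import numpy as np

I, J, K, R = 4, 5, 6, 3
A, B, C = (np.random.randn(n, R) for n in (I, J, K))

# Sum of R rank-1 terms: y_ijk = sum_r a_ir b_jr c_kr
Y = np.einsum('ir,jr,kr->ijk', A, B, C)

for k in range(K):
    assert np.allclose(Y[:, :, k], A @ np.diag(C[k, :]) @ B.T)
```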
From PARAFAC/HOSVD to Block Component Decompositions (BCD) [De Lathauwer and Nion]

BCD in rank-(L_r, L_r, 1) terms:
$$\mathcal{Y} = (A_1 B_1^T) \circ c_1 + \dots + (A_R B_R^T) \circ c_R$$
where A_r is I × L_r, B_r is J × L_r and c_r is a vector of length K.

BCD in rank-(L_r, M_r, ·) terms:
$$\mathcal{Y} = \mathcal{H}_1 \times_1 A_1 \times_2 B_1 + \dots + \mathcal{H}_R \times_1 A_R \times_2 B_R$$
where A_r is I × L_r, B_r is J × M_r and the core H_r is L_r × M_r × K.

BCD in rank-(L_r, M_r, N_r) terms:
$$\mathcal{Y} = \mathcal{H}_1 \times_1 A_1 \times_2 B_1 \times_3 C_1 + \dots + \mathcal{H}_R \times_1 A_R \times_2 B_R \times_3 C_R$$
where the core H_r is L_r × M_r × N_r.
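For reference, here is how a tensor following the first model can be synthesized numerically (an illustrative sketch; sizes and block ranks are arbitrary choices):

```python
# Synthesize a tensor following the BCD-(L_r, L_r, 1) model:
# Y = sum_r (A_r B_r^T) o c_r, with R = 2 block terms.
import numpy as np

I, J, K = 6, 7, 8
Ls = [2, 3]                                   # block ranks L_r
A_blocks = [np.random.randn(I, L) for L in Ls]
B_blocks = [np.random.randn(J, L) for L in Ls]
c_cols = [np.random.randn(K) for _ in Ls]

Y = sum(np.einsum('ij,k->ijk', Ar @ Br.T, cr)
        for Ar, Br, cr in zip(A_blocks, B_blocks, c_cols))
```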
Content of this talk

BCD-(L_r, L_r, 1):
$$\mathcal{Y} = (A_1 B_1^T) \circ c_1 + \dots + (A_R B_R^T) \circ c_R$$

- Model ambiguities
- Algorithms
- Uniqueness
- Estimation of the parameters L_r (r = 1, ..., R) and R
- An application in telecommunications
BCD-(L_r, L_r, 1): Model ambiguities

$$\mathcal{Y} = \left[(A_1 F_1)(F_1^{-1} B_1^T)\right] \circ c_1 + \dots + \left[(A_R F_R)(F_R^{-1} B_R^T)\right] \circ c_R$$

Unknown matrices:
A = [A_1, ..., A_R] (I × ΣL_r),  B = [B_1, ..., B_R] (J × ΣL_r),  C = [c_1, ..., c_R] (K × R).

The BCD-(L_r, L_r, 1) is said to be essentially unique if the only ambiguities are:
- arbitrary permutation of the R blocks of A and B and of the R corresponding columns of C;
- post-multiplication of each block of A and B by an arbitrary non-singular matrix, and arbitrary scaling of each column of C.

Equivalently: A and B are estimated up to multiplication by a block-wise permuted block-diagonal matrix, and C up to a permuted diagonal matrix. (A numerical check of these ambiguities is sketched below.)
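```python
# Numerical check of the block ambiguity (illustrative sketch): replacing
# A_r -> A_r F and B_r -> B_r F^{-T} leaves the term (A_r B_r^T) o c_r
# unchanged, since (A_r F)(B_r F^{-T})^T = A_r F F^{-1} B_r^T = A_r B_r^T.
import numpy as np

I, J, K, L = 6, 7, 8, 3
Ar, Br, cr = np.random.randn(I, L), np.random.randn(J, L), np.random.randn(K)
F = np.random.randn(L, L)                     # arbitrary non-singular matrix

term = np.einsum('ij,k->ijk', Ar @ Br.T, cr)
term_tf = np.einsum('ij,k->ijk',
                    (Ar @ F) @ (Br @ np.linalg.inv(F).T).T, cr)
assert np.allclose(term, term_tf)
```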
BCD-(L_r, L_r, 1): Algorithms

Usual approach: estimate A, B and C by minimizing
$$\Phi = \left\| \mathcal{Y} - \sum_{r=1}^{R} (A_r B_r^T) \circ c_r \right\|_F^2$$

The model is fitted for a given choice of the parameters {L_r, R}.

Exploit the algebraic structure of the matrix unfoldings:
- Y_{K×JI}: each row holds one vectorized frontal slice Y(:,:,k);
- Y_{J×IK}: concatenation of the transposed frontal slices;
- Y_{I×KJ}: concatenation of the frontal slices.

(A sketch of these unfoldings in code follows below.)
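```python
# One common implementation of the three unfoldings (the exact ordering of
# the concatenated slices is a convention and may differ from the slides').
import numpy as np

I, J, K = 4, 5, 6
Y = np.random.randn(I, J, K)

Y_KxJI = np.moveaxis(Y, 2, 0).reshape(K, -1)   # rows indexed by k
Y_JxIK = np.moveaxis(Y, 1, 0).reshape(J, -1)   # rows indexed by j
Y_IxKJ = Y.reshape(I, -1)                      # rows indexed by i
```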
BCD-(L_r, L_r, 1): ALS Algorithm

The three unfoldings admit bilinear factorizations, which yield three conditional least-squares subproblems:
$$Y_{K \times JI} = C \cdot Z_1(B, A), \qquad \Phi_1 = \| Y_{K \times JI} - C \cdot Z_1(B, A) \|_F^2$$
$$Y_{J \times IK} = B \cdot Z_2(A, C), \qquad \Phi_2 = \| Y_{J \times IK} - B \cdot Z_2(A, C) \|_F^2$$
$$Y_{I \times KJ} = A \cdot Z_3(C, B), \qquad \Phi_3 = \| Y_{I \times KJ} - A \cdot Z_3(C, B) \|_F^2$$

Z_1, Z_2 and Z_3 are each built from two matrices only and have a block-wise Khatri-Rao product structure.

Initialisation: Â^(0), B̂^(0), k = 1
while Φ(k-1) - Φ(k) > ε (e.g. ε = 10^{-6}):
    Ĉ^(k) = Y_{K×JI} · [Z_1(B̂^(k-1), Â^(k-1))]^†
    B̂^(k) = Y_{J×IK} · [Z_2(Â^(k-1), Ĉ^(k))]^†
    Â^(k) = Y_{I×KJ} · [Z_3(Ĉ^(k), B̂^(k))]^†
    k ← k + 1
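For concreteness, here is a compact ALS sketch for the case where all block ranks are equal to L (my own illustrative implementation with one particular unfolding convention, not the authors' code; the design matrices built in each update play the role of Z_1, Z_2, Z_3 above):

```python
# Compact ALS sketch for BCD-(L, L, 1) with all L_r = L. Each conditional
# update is a linear least-squares problem.
import numpy as np

def bcd_lll1_als(Y, R, L, n_iter=500, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = Y.shape
    A = rng.standard_normal((I, R * L))   # A = [A_1, ..., A_R]
    B = rng.standard_normal((J, R * L))   # B = [B_1, ..., B_R]
    C = rng.standard_normal((K, R))       # C = [c_1, ..., c_R]
    blk = lambda M, r: M[:, r * L:(r + 1) * L]

    def cost(A, B, C):
        Yh = sum(np.einsum('ij,k->ijk', blk(A, r) @ blk(B, r).T, C[:, r])
                 for r in range(R))
        return np.linalg.norm(Y - Yh) ** 2

    phi_prev = np.inf
    for _ in range(n_iter):
        # update C: the vectorized terms (A_r B_r^T) form an (IJ x R) basis
        Z1 = np.stack([(blk(A, r) @ blk(B, r).T).ravel() for r in range(R)],
                      axis=1)
        C = np.linalg.lstsq(Z1, Y.reshape(I * J, K), rcond=None)[0].T
        # update B: row (r*L + l) of N is vec of outer(a_{r,l}, c_r)
        N = np.stack([np.outer(blk(A, r)[:, l], C[:, r]).ravel()
                      for r in range(R) for l in range(L)])
        B = np.linalg.lstsq(N.T, np.moveaxis(Y, 1, 0).reshape(J, -1).T,
                            rcond=None)[0].T
        # update A: row (r*L + l) of M is vec of outer(b_{r,l}, c_r)
        M = np.stack([np.outer(blk(B, r)[:, l], C[:, r]).ravel()
                      for r in range(R) for l in range(L)])
        A = np.linalg.lstsq(M.T, Y.reshape(I, -1).T, rcond=None)[0].T
        phi = cost(A, B, C)
        if phi_prev - phi < tol * phi_prev:   # relative-decrease stopping rule
            break
        phi_prev = phi
    return A, B, C
```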
ALS algorithm: problem of swamps

Observation: ALS is fast in many problems, but sometimes a long swamp is encountered before convergence (27000 iterations in the example shown!).

Long swamps typically occur when:
- the loading matrices of the decomposition (i.e., the objective matrices) are ill-conditioned;
- the updated matrices become ill-conditioned (impact of initialization);
- one of the R tensor components of the decomposition has a much higher norm than the R-1 others (e.g., the "near-far" effect in telecommunications).
Improvement 1 of ALS: Line Search

Purpose: reduce the length of swamps.
Principle: at each iteration, interpolate A, B and C from their estimates at the two previous iterations and use the interpolated matrices as input to the ALS update.

1. Line search (search directions):
$$C^{(new)} = C^{(k-2)} + \rho\,(C^{(k-1)} - C^{(k-2)})$$
$$B^{(new)} = B^{(k-2)} + \rho\,(B^{(k-1)} - B^{(k-2)})$$
$$A^{(new)} = A^{(k-2)} + \rho\,(A^{(k-1)} - A^{(k-2)})$$
The choice of ρ is crucial; ρ = 1 annihilates the LS step (i.e., we get standard ALS).

2. Then the ALS update:
$$\hat{C}^{(k)} = Y_{K \times JI} \cdot [Z_1(B^{(new)}, A^{(new)})]^\dagger$$
$$\hat{B}^{(k)} = Y_{J \times IK} \cdot [Z_2(A^{(new)}, \hat{C}^{(k)})]^\dagger$$
$$\hat{A}^{(k)} = Y_{I \times KJ} \cdot [Z_3(\hat{C}^{(k)}, \hat{B}^{(k)})]^\dagger$$
k ← k + 1
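A small sketch of the extrapolation step (illustrative; the validation logic shown in comments follows the LSB variant described on the next slide):

```python
# Line-search extrapolation: shift each factor along the direction defined
# by its two previous iterates, then run the ALS update from there.
# rho = 1 recovers standard ALS.
def extrapolate(X_km2, X_km1, rho):
    """X_new = X^(k-2) + rho * (X^(k-1) - X^(k-2))."""
    return X_km2 + rho * (X_km1 - X_km2)

# Hypothetical use inside the loop, with LSB-style validation of the step
# (cost(...) is the fit error Phi):
# A_new = extrapolate(A_km2, A_km1, rho)      # likewise for B_new, C_new
# if cost(A_new, B_new, C_new) < cost(A_km1, B_km1, C_km1):
#     A_km1, B_km1, C_km1 = A_new, B_new, C_new
```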
Improvement 1 of ALS: Line Search (choice of ρ)

- [Harshman, 1970] "LSH": choose ρ = 1.25.
- [Bro, 1997] "LSB": choose ρ = k^{1/3} and validate the LS step only if the fit decreases.
- [Rajih, Comon, 2005] "Enhanced Line Search (ELS)": for REAL tensors, Φ(A^{(new)}, B^{(new)}, C^{(new)}) = Φ(ρ) is a 6th-order polynomial in ρ. The optimal ρ is the root of the derivative that minimizes Φ(ρ).
- [Nion, De Lathauwer, 2006] "Enhanced Line Search with Complex Step (ELSCS)": for COMPLEX tensors, look for the optimal ρ = m e^{iθ}. We have Φ(A^{(new)}, B^{(new)}, C^{(new)}) = Φ(m, θ). Alternate updates of m and θ:
  - update m: solve ∂Φ(m, θ)/∂m = 0 for fixed θ, a 5th-order polynomial in m;
  - update θ: solve ∂Φ(m, θ)/∂θ = 0 for fixed m, a 6th-order polynomial in t = tan(θ/2).
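One simple way to realize ELS in the real case follows directly from the degree-6 property stated above: sample Φ at 7 values of ρ, recover the polynomial by exact interpolation, and minimize over its stationary points. This is an illustrative sketch, not the authors' implementation; `cost(A, B, C)` is a hypothetical helper returning the squared Frobenius fit error.

```python
# ELS sketch for REAL tensors: along the search direction the model is cubic
# in rho, so Phi(rho) is a polynomial of degree 6, determined exactly by 7
# samples.
import numpy as np

def els_rho(factors_km2, factors_km1, cost, rhos=np.arange(7.0)):
    dirs = [x1 - x2 for x1, x2 in zip(factors_km1, factors_km2)]
    phis = [cost(*(x2 + r * d for x2, d in zip(factors_km2, dirs)))
            for r in rhos]
    p = np.polyfit(rhos, phis, 6)            # exact fit: 7 samples, degree 6
    crit = np.roots(np.polyder(p))           # stationary points of Phi(rho)
    crit = crit[np.abs(crit.imag) < 1e-10].real
    return crit[np.argmin(np.polyval(p, crit))]
```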
Improvement 1 of ALS: Line Search (performance)

[Figure: convergence curves for an "easy" problem (about 2000 ALS iterations) and a "difficult" problem (about 27000 ALS iterations).]

ELS yields a large reduction of the number of iterations at a very low additional complexity w.r.t. standard ALS.
Improvement 2 of ALS: Dimensionality reduction

STEP 1: HOSVD of Y (compression to the reduced spaces).
STEP 2: BCD of the small core tensor (fitted in the compressed space).
STEP 3: Come back to the original space + a few refinement iterations in the original space.

Compression yields a large reduction of the cost per iteration, since the model is fitted in the compressed space. (A sketch combining the HOSVD and ALS routines above follows below.)
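```python
# Compress-then-fit sketch (illustrative), reusing the `hosvd` and
# `bcd_lll1_als` sketches above. The truncation sizes `dims` are a user
# choice; real tensors are assumed. A few refinement ALS iterations in the
# original space (end of step 3) are omitted here.
import numpy as np

def bcd_compressed(Y, R, L, dims):
    U, _ = hosvd(Y)                                  # step 1: HOSVD of Y
    Ut = [u[:, :d] for u, d in zip(U, dims)]         # truncated mode bases
    Hc = np.einsum('ijk,il,jm,kn->lmn', Y, *Ut)      # small core tensor
    Ac, Bc, Cc = bcd_lll1_als(Hc, R, L)              # step 2: BCD of the core
    return Ut[0] @ Ac, Ut[1] @ Bc, Ut[2] @ Cc        # step 3: decompress
```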
Improvement 3 of ALS: Good initialization

[Figure: comparison of ALS and ALS+ELS with three random initializations.]

Instead of using random initializations, could we use the observed tensor itself? YES. For the BCD-(L,L,1), if A and B are full column rank (so I and J have to be large enough), there is an easy way to find a good initialization, in the same spirit as the Direct Trilinear Decomposition (DTLD) used to initialize PARAFAC (not detailed in this talk).