
An Introduction to Tensor-Based Independent Component Analysis - PowerPoint PPT Presentation



  1. An Introduction to Tensor-Based Independent Component Analysis
     Lieven De Lathauwer, K.U.Leuven, Belgium
     Lieven.DeLathauwer@kuleuven-kortrijk.be

  2. Overview
     • Problem definition
     • Higher-order statistics
     • Basic ICA equations
     • Specific prewhitening-based multilinear algorithms
     • Application
     • Higher-order-only schemes
     • Variants for coloured sources
     • Dimensionality reduction
     • Conclusions

  3. Independent Component Analysis (ICA)
     Model: Y = M X + N, with Y (P × 1), M (P × R), X (R × 1), N (P × 1)
     [Figure: three source signals x1, x2, x3 are mixed, and estimates x̂1, x̂2, x̂3 are recovered from the mixtures]

  4. Model: Y = M X + N, with Y (P × 1), M (P × R), X (R × 1), N (P × 1)
     Assumptions:
     • the columns of M are linearly independent
     • the components of X are statistically independent
     Goal: identification of M and/or reconstruction of X while observing only Y
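As a small numerical sketch of this model (NumPy; the dimensions, the uniform source distribution, and the noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P, R, n = 4, 2, 10_000            # sensors, sources, samples (illustrative)

# Statistically independent, non-Gaussian sources (uniform, zero mean, unit variance)
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(R, n))
M = rng.standard_normal((P, R))   # random columns are linearly independent w.p. 1
N = 0.01 * rng.standard_normal((P, n))   # small additive sensor noise

Y = M @ X + N                     # observations: only Y is available to ICA
```

Only `Y` would be passed to an ICA algorithm; `M` and `X` are kept here so later checks can compare against the ground truth.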

  5. Independent Component Analysis (ICA)
     Disciplines: statistics, neural networks, information theory, linear and multilinear algebra, ...
     Indeterminacies: ordering and scaling of the columns (Y = M X)
     Uncorrelated vs independent (zero-mean variables):
     • X, Y are uncorrelated iff E{XY} = 0
     • X, Y are independent iff p_XY(x, y) = p_X(x) p_Y(y)
     Statistical independence implies that the variables are uncorrelated, plus additional conditions on the higher-order statistics (HOS).

  6. Algebraic tools:

     Condition            Identification       Tool
     X_i uncorrelated     column space of M    matrix EVD/SVD
     X_i independent      M itself             tensor EVD/SVD

     Web site: http://www.tsi.enst.fr/icacentral/index.html (mailing list, data sets, software)

  7. Applications
     • Speech and audio
     • Image processing: feature extraction, image reconstruction, video
     • Telecommunications: OFDM, CDMA, ...
     • Biomedical applications: functional Magnetic Resonance Imaging, electromyogram, electro-encephalogram, (fetal) electrocardiogram, mammography, pulse oximetry, (fetal) magnetocardiogram, ...
     • Other applications: text classification, vibratory signals generated by termites (!), electron energy loss spectra, astrophysics, ...

  8. HOS definitions
     Moments and cumulants of a random variable:

     Moments              Cumulants
     m_1^X = E{X}         c_1^X = E{X}                        ("mean" m_X)
     m_2^X = E{X^2}       c_2^X = E{(X - m_X)^2}              ("variance" σ_X^2)
     m_3^X = E{X^3}       c_3^X = E{(X - m_X)^3}
     m_4^X = E{X^4}       c_4^X = E{(X - m_X)^4} - 3 σ_X^4
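The univariate cumulant formulas above are straightforward to estimate from samples; a minimal sketch (the helper name `cumulants_1d` and the sample size are my own):

```python
import numpy as np

def cumulants_1d(x):
    """First four cumulants of a 1-D sample, matching the table above."""
    m = x.mean()
    d = x - m
    c2 = (d**2).mean()                    # variance
    c3 = (d**3).mean()
    c4 = (d**4).mean() - 3.0 * c2**2      # 4th cumulant (unnormalised kurtosis)
    return m, c2, c3, c4

rng = np.random.default_rng(1)
c1, c2, c3, c4 = cumulants_1d(rng.standard_normal(200_000))
# For a standard Gaussian sample: c2 is close to 1, and c1, c3, c4 close to 0.
```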

  9. Characteristic functions
     First characteristic function:
       Φ_x(ω) := E{e^{jωx}} = ∫_{-∞}^{+∞} p_x(x) e^{jωx} dx
     It generates the moments:
       Φ_x(ω) = Σ_{k=0}^{∞} m_k^X (jω)^k / k!        (m_0 = 1)
     Second characteristic function:
       Ψ_x(ω) := ln Φ_x(ω)
     It generates the cumulants:
       Ψ_x(ω) = Σ_{k=1}^{∞} c_k^X (jω)^k / k!

 10. Moments and cumulants of a set of random variables:
     Moments:
       (M_x^(N))_{i1 i2 ... iN} := Mom(x_{i1}, x_{i2}, ..., x_{iN}) = E{x_{i1} x_{i2} ... x_{iN}}
     Cumulants:
       (c_x)_i              := Cum(x_i) = E{x_i}
       (C_x)_{i1 i2}        := Cum(x_{i1}, x_{i2}) = E{x_{i1} x_{i2}}
       (C_x^(3))_{i1 i2 i3} := Cum(x_{i1}, x_{i2}, x_{i3}) = E{x_{i1} x_{i2} x_{i3}}
       (C_x^(4))_{i1 i2 i3 i4} := Cum(x_{i1}, x_{i2}, x_{i3}, x_{i4})
            = E{x_{i1} x_{i2} x_{i3} x_{i4}} - E{x_{i1} x_{i2}} E{x_{i3} x_{i4}}
              - E{x_{i1} x_{i3}} E{x_{i2} x_{i4}} - E{x_{i1} x_{i4}} E{x_{i2} x_{i3}}
     For orders ≥ 2, the variables are first centred: x_i ← x_i - E{x_i}.
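The fourth-order cumulant formula above translates directly into a sample estimator; a sketch (the function name and the use of plain sample averages are my own choices):

```python
import numpy as np

def cum4_tensor(X):
    """Sample estimate of the 4th-order cumulant tensor of X (variables x samples),
    following the formula above; variables are centred first, as the slide notes."""
    X = X - X.mean(axis=1, keepdims=True)
    n = X.shape[1]
    M4 = np.einsum('it,jt,kt,lt->ijkl', X, X, X, X) / n   # 4th-order moment tensor
    C2 = X @ X.T / n                                      # covariance
    return (M4
            - np.einsum('ij,kl->ijkl', C2, C2)
            - np.einsum('ik,jl->ijkl', C2, C2)
            - np.einsum('il,jk->ijkl', C2, C2))
```

For independent unit-variance uniform sources the diagonal entries should be near c_4 = a^4/5 - 3(a^2/3)^2 = -1.2 (with a = √3) and the off-diagonal entries near zero.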

 11. Multivariate case, e.g. moments:
     [Figure: R_X = E{X X^T} visualised as a matrix, M_3^X = E{X ∘ X ∘ X} as a third-order tensor]

 12. Order 1: m_X   := E{X}              → vector
     Order 2: R_X   := E{X X^T}          → matrix
     Order 3: M_3^X := E{X ∘ X ∘ X}      → 3rd-order tensor
     Order 4: M_4^X := E{X ∘ X ∘ X ∘ X}  → 4th-order tensor

 13. HOS example
     Gaussian distribution, p_x(x) = 1/(√(2π) σ) exp(-x^2 / (2σ^2)):

       n    m_x^(n)    c_x^(n)
       1    0          0
       2    σ^2        σ^2
       3    0          0
       4    3σ^4       0

     Uniform distribution, p_x(x) = 1/(2a) for x ∈ [-a, +a]:

       n    m_x^(n)    c_x^(n)
       1    0          0
       2    a^2/3      a^2/3
       3    0          0
       4    a^4/5      -2a^4/15

 14. ICA: basic equations
     Model: Y = M X
     Second order:
       C_2^Y = E{Y Y^T} = M · C_2^X · M^T = C_2^X •_1 M •_2 M
     Uncorrelated sources: C_2^X is diagonal ("diagonalization by congruence"):
       C_2^Y = σ_1^2 M_1 M_1^T + σ_2^2 M_2 M_2^T + ... + σ_R^2 M_R M_R^T

 15. Higher order:
       C_4^Y = C_4^X •_1 M •_2 M •_3 M •_4 M
     Independent sources: C_4^X is diagonal ("CANDECOMP/PARAFAC"):
       C_4^Y = λ_1 M_1 ∘ M_1 ∘ M_1 ∘ M_1 + λ_2 M_2 ∘ M_2 ∘ M_2 ∘ M_2 + ... + λ_R M_R ∘ M_R ∘ M_R ∘ M_R
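The multilinear product and its rank-1 (CANDECOMP/PARAFAC) expansion can be cross-checked numerically; a sketch (the 2 × 2 mixing matrix is illustrative, and λ_r = -1.2 is the 4th cumulant of a unit-variance uniform source):

```python
import numpy as np

def multilinear4(C, M):
    """Multilinear product C •_1 M •_2 M •_3 M •_4 M of a 4th-order tensor C."""
    return np.einsum('abcd,ia,jb,kc,ld->ijkl', C, M, M, M, M)

R = 2
C4x = np.zeros((R,) * 4)
for r in range(R):
    C4x[r, r, r, r] = -1.2           # diagonal source cumulant (independent sources)
M = np.array([[1.0, 0.5], [0.3, 1.0]])   # illustrative mixing matrix

C4y = multilinear4(C4x, M)           # cumulant of the mixtures

# The same tensor as a sum of R symmetric rank-1 terms (CANDECOMP/PARAFAC):
C4y_cp = sum(-1.2 * np.einsum('i,j,k,l->ijkl', M[:, r], M[:, r], M[:, r], M[:, r])
             for r in range(R))
```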

 16. Prewhitening-based computation
     Model: Y = M X
     Second order (unit-variance sources):
       C_2^Y = E{Y Y^T} = M · C_2^X · M^T = M · I · M^T = M · M^T = (M · Q) · (M · Q)^T
     so a "square root" of C_2^Y (via EVD, Cholesky, ...) determines M only up to an orthogonal factor Q.
     Remark (PCA): with the SVD M = U · S · V^T:
       C_2^Y = (U S) · (U S)^T = U · S^2 · U^T

 17. Prewhitening-based computation (2)
     Matrix factorization: M = T · Q
     Second order:
       C_2^Y = C_2^X •_1 M •_2 M = T · T^T
     Observed r.v.: Y = M X;  whitened r.v.: Z = T^{-1} Y = Q X
     Higher order (ICA):
       C_4^Y = C_4^X •_1 M •_2 M •_3 M •_4 M
       ⇒ C_4^Z = C_4^X •_1 Q •_2 Q •_3 Q •_4 Q
     "Multilinear symmetric EVD" = CANDECOMP/PARAFAC with orthogonality and symmetry constraints.
     The source cumulant is theoretically diagonal, but an arbitrary symmetric tensor cannot be diagonalized ⇒ different solution strategies.
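A numerical sketch of this prewhitening step, using the EVD square root (the mixing matrix and sample size are made up; `Tfac` plays the role of T):

```python
import numpy as np

rng = np.random.default_rng(3)
R, n = 2, 50_000
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(R, n))   # unit-variance sources
M = np.array([[1.0, 0.5], [0.3, 1.0]])                  # illustrative mixing matrix
Y = M @ X

# Square root T of the covariance, C_2^Y = T T^T, computed via the EVD
C2 = Y @ Y.T / n
w, U = np.linalg.eigh(C2)
Tfac = U * np.sqrt(w)            # T = U diag(sqrt(w)), so T T^T = C2

Z = np.linalg.solve(Tfac, Y)     # whitened r.v. Z = T^{-1} Y, cov(Z) = I
Q = np.linalg.solve(Tfac, M)     # remaining unknown Q = T^{-1} M is (near-)orthogonal
```

After this step only the orthogonal factor Q remains to be found, which is what the higher-order algorithms on the following slides do.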

 18. PCA versus ICA
     ICA = higher-order fine-tuning of PCA:

     PCA                     ICA
     2nd-order               higher-order
     matrix EVD              tensor EVD
     uncorrelated sources    independent sources
     column space of M       M itself
     always possible         depends on context

     Computational cost: cumulant estimation and diagonalization.

 19. Illustration
     [Figure: two observed mixture signals, 400 samples each]

 20. [Figure: two sources estimated with PCA (top) versus two sources estimated with ICA (bottom), 400 samples each]

 21. Algorithm 1: maximal diagonality
     [Figure: one Jacobi step C^(k) → C^(k+1), applying a rotation Q in every mode to concentrate the energy on the diagonal]

 22. • Maximize the energy on the diagonal by Jacobi iteration
     • Determination of the optimal rotation angle:

       order 3, real:     roots of a polynomial of degree 2
       order 3, complex:  roots of a polynomial of degree 3
       order 4, real:     roots of a polynomial of degree 4
       order 4, complex:  -

     [Comon '94, De Lathauwer '01]

 23. Algorithm 2: maximal diagonality
     [Figure: one Jacobi step C^(k) → C^(k+1)]
     • The trace of a higher-order tensor is not rotation invariant
     • Maximize the sum of the diagonal entries by Jacobi iteration
     • Determination of the optimal rotation angle:

       order 4, real:     roots of a polynomial of degree 2
       order 4, complex:  roots of a polynomial of degree 3

     [Comon, Moreau '97]

 24. Algorithm 3: simultaneous EVD
     [Figure: the matrix slices of the whitened cumulant C^Z are simultaneously diagonalized by a single orthogonal matrix Q]
     • Maximize the energy on the diagonals by Jacobi iteration
     • Determination of the optimal rotation angle:

       real:     roots of a polynomial of degree 2
       complex:  roots of a polynomial of degree 3

     [Cardoso '94 (JADE)]
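A Jacobi-type simultaneous diagonalization in the spirit of JADE can be sketched as follows (real symmetric case only; the closed-form rotation angle is the Cardoso-Souloumiac formula, while the function name and sweep limit are my own):

```python
import numpy as np

def joint_diagonalize(mats, sweeps=30, tol=1e-12):
    """Jacobi-type approximate joint diagonalization of real symmetric matrices.
    Returns Q and the rotated matrices Q^T A Q (nearly diagonal)."""
    A = np.array(mats, dtype=float)      # shape (K, n, n)
    n = A.shape[1]
    Q = np.eye(n)
    for _ in range(sweeps):
        changed = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Closed-form optimal angle for the (p, q) plane rotation
                g1 = A[:, p, p] - A[:, q, q]
                g2 = A[:, p, q] + A[:, q, p]
                ton = (g1 * g1 - g2 * g2).sum()
                toff = 2.0 * (g1 * g2).sum()
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) > tol:
                    changed = True
                    J = np.eye(n)
                    J[p, p] = J[q, q] = c
                    J[p, q], J[q, p] = -s, s
                    A = np.einsum('ji,kjl,lm->kim', J, A, J)   # A_k <- J^T A_k J
                    Q = Q @ J
        if not changed:
            break
    return Q, A
```

For matrices that share one orthogonal diagonalizer, as the slices of the whitened cumulant theoretically do, a few sweeps drive the off-diagonal energy to numerical zero.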

 25. Application: fetal electrocardiogram extraction
     [Figure: eight abdominal and thoracic recordings, 5 s]

 26. ICA results for FECG extraction
     [Figure: the eight independent components, 5 s]

 27. A variant for coloured sources
     Condition: the sources are mutually uncorrelated, but individually correlated in time.
     Basic equations:
       C_2^Y(0) = E{Y(t) Y(t)^T} = M · C_2^X(0) · M^T
                = σ_1^2 M_1 M_1^T + σ_2^2 M_2 M_2^T + ... + σ_R^2 M_R M_R^T

 28. C_2^Y(τ) = E{Y(t) Y(t + τ)^T} = M · C_2^X(τ) · M^T
     Variants: nonstationary sources, time-frequency representations, Hessian of the second characteristic function, ...
     [Belouchrani et al. '97 (SOBI)], [De Lathauwer and Castaing '08] (overcomplete)
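The lagged covariances that SOBI-type methods work with are easy to estimate; a sketch (the symmetrisation step and the helper name are my own):

```python
import numpy as np

def lagged_cov(Y, tau):
    """Symmetrised estimate of the lagged covariance C_2^Y(tau) = E{Y(t) Y(t+tau)^T}."""
    Y = Y - Y.mean(axis=1, keepdims=True)
    n = Y.shape[1] - tau
    C = Y[:, :n] @ Y[:, tau:].T / n
    return 0.5 * (C + C.T)     # symmetrise, as is common in SOBI-type methods
```

For mutually uncorrelated but individually coloured sources, C_2^X(τ) is (approximately) diagonal at every lag τ, which is the structure these methods exploit.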

 29. Large mixtures: more sensors than sources
     Applications: EEG, MEG, NMR, hyper-spectral image processing, data analysis, ...
     Prewhitening-based algorithms:
       Y = M X   (P ≫ R), with Y (P × 1), M (P × R), X (R × 1)
       SVD: M = U · S · V^T, with U (P × R), S (R × R), V^T (R × R)
       Z = S^{-1} · U^T · Y = V^T · X   (R × 1)
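A sketch of this dimensionality-reducing prewhitening via a truncated SVD (the sizes are made up, and the SVD is taken of the data matrix as a stand-in for an estimate of M's column space):

```python
import numpy as np

rng = np.random.default_rng(6)
P, R, n = 20, 3, 20_000                  # many more sensors than sources
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(R, n))   # unit-variance sources
M = rng.standard_normal((P, R))
Y = M @ X                                # tall, rank-R data

# Economy SVD of the data matrix; truncating to rank R reduces the
# dimension from P to R and whitens in one step.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
Ur, sr = U[:, :R], s[:R] / np.sqrt(n)    # rank-R truncation, scaled singular values
Z = (Ur / sr).T @ Y                      # Z = S^{-1} U^T Y, R-dimensional and white
```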

 30. Large mixtures: more sensors than sources (2)
     Algorithms without prewhitening: best multilinear rank approximation.
     [Figure: Tucker decomposition of an I_1 × I_2 × I_3 tensor A into a core tensor S multiplied by factor matrices U^(1), U^(2), U^(3)]
     Tucker decomposition: [Tucker '64], [De Lathauwer '00]
