

  1. Robust nonnegative matrix factorisation with the β-divergence and applications in imaging. Cédric Févotte, Institut de Recherche en Informatique de Toulouse. Image-Signal Department seminar, GIPSA-lab, July 2020.

  2. Outline: Generalities (matrix factorisation models; nonnegative matrix factorisation (NMF)). Optimisation for NMF (measures of fit; majorisation-minimisation). Applications in imaging (hyperspectral unmixing in remote sensing; factor analysis in dynamic PET).

  3. Matrix factorisation models. Data often available in matrix form. [Figure: a data matrix of coefficients, with features as rows and samples as columns.]

  4. Matrix factorisation models. Data often available in matrix form. [Figure: a ratings matrix, with movies as rows and users as columns.]

  5. Matrix factorisation models. Data often available in matrix form. [Figure: a word-count matrix, with words as rows and text documents as columns.]

  6. Matrix factorisation models. Data often available in matrix form. [Figure: a matrix of Fourier coefficients, with frequencies as rows and time as columns.]

  7. Matrix factorisation models: dictionary learning, low-rank approximation, factor analysis, latent semantic analysis. [Figure: data X ≈ dictionary W × activations H.]


  9. Matrix factorisation models for dimensionality reduction (coding, low-dimensional embedding).

  10. Matrix factorisation models for unmixing (source separation, latent topic discovery).

  11. Matrix factorisation models for interpolation (collaborative filtering, image inpainting).

  12. Nonnegative matrix factorisation. [Figure: V (F features × N samples) ≈ W (F × K patterns) × H (K × N activations).]
◮ data V and factors W, H have nonnegative entries.
◮ nonnegativity of W ensures interpretability of the dictionary, because patterns w_k and samples v_n belong to the same space.
◮ nonnegativity of H tends to produce part-based representations, because subtractive combinations are forbidden.
Early work by (Paatero and Tapper, 1994), landmark Nature paper by (Lee and Seung, 1999).
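
A minimal NumPy sketch of the model and its dimensions (the sizes F, N, K and the random factors are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 100, 500, 10        # features, samples, number of patterns (illustrative sizes)

W = rng.random((F, K))        # dictionary: one nonnegative pattern per column
H = rng.random((K, N))        # activations: nonnegative encoding of each sample
V = W @ H                     # F x N data matrix generated by the model

print(V.shape, V.min() >= 0)  # (100, 500) True -- nonnegative factors give nonnegative data
```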

  13. NMF for latent semantic analysis (Lee and Seung, 1999; Hofmann, 1999). [Figure: an encyclopedia entry for 'Constitution of the United States' is encoded as v_n ≈ W h_n; dictionary columns group semantically related words (e.g. court, government, council, senate, congress; flowers, leaves, plant, perennial; disease, symptoms, glands, infection), and the entry most strongly activates the politics-related patterns: president (148), congress (124), power (120), united (104), constitution (81), amendment (71), government (57), law (49). Reproduced from (Lee and Seung, 1999).]

  14. NMF for audio spectral unmixing (Smaragdis and Brown, 2003). [Figure: the spectrogram of an input music passage (frequency up to 20 kHz, about 3 s long) decomposed into 4 components, each with a spectral pattern and a temporal activation. Reproduced from (Smaragdis, 2013).]

  15. NMF for hyperspectral unmixing (Berry, Browne, Langville, Pauca, and Plemmons, 2007). [Figure reproduced from (Bioucas-Dias et al., 2012).]

  16. Outline: Generalities (matrix factorisation models; nonnegative matrix factorisation (NMF)). Optimisation for NMF (measures of fit; majorisation-minimisation). Applications in imaging (hyperspectral unmixing in remote sensing; factor analysis in dynamic PET).

  17. NMF as a constrained minimisation problem. Minimise a measure of fit between V and WH, subject to nonnegativity:
min_{W,H ≥ 0} D(V|WH), where D(V|WH) = Σ_{fn} d([V]_{fn} | [WH]_{fn})
and d(x|y) is a scalar cost function, e.g.,
◮ squared Euclidean distance (Paatero and Tapper, 1994; Lee and Seung, 2001)
◮ Kullback-Leibler divergence (Lee and Seung, 1999; Finesso and Spreij, 2006)
◮ Itakura-Saito divergence (Févotte, Bertin, and Durrieu, 2009)
◮ α-divergence (Cichocki et al., 2008)
◮ β-divergence (Cichocki et al., 2006; Févotte and Idier, 2011)
◮ Bregman divergences (Dhillon and Sra, 2005)
◮ and more in (Yang and Oja, 2011)
Regularisation terms often added to D(V|WH) for sparsity, smoothness, dynamics, etc. Nonconvex problem.
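
The objective is just an entrywise sum; a sketch (names nmf_objective and d_euc are illustrative, and the 1/2 factor matches the β = 2 case of the β-divergence introduced two slides below):

```python
import numpy as np

def nmf_objective(V, W, H, d):
    """Sum the scalar cost d(x|y) over all entries of V against the model WH."""
    return np.sum(d(V, W @ H))

# Squared Euclidean scalar cost d(x|y) = 0.5 * (x - y)^2.
d_euc = lambda x, y: 0.5 * (x - y) ** 2

rng = np.random.default_rng(0)
V, W, H = rng.random((5, 8)), rng.random((5, 3)), rng.random((3, 8))
print(nmf_objective(V, W, H, d_euc))
```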

  18. Probabilistic models.
◮ Let V ∼ p(V|WH) such that E[V|WH] = WH and p(V|WH) = Π_{fn} p(v_{fn} | [WH]_{fn});
◮ then the following correspondences apply, with D(V|WH) = −log p(V|WH) + cst:

data support            | distribution/noise    | divergence         | examples
real-valued             | additive Gaussian     | squared Euclidean  | many
integer                 | multinomial⋆          | weighted KL        | word counts
integer                 | Poisson               | generalised KL     | photon counts
nonnegative             | multiplicative Gamma  | Itakura-Saito      | spectrogram
nonnegative (generally) | Tweedie               | β-divergence       | generalises above models

⋆ conditional independence over f does not apply.
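
A quick numerical check of the Poisson row (a sketch with illustrative values): the Poisson negative log-likelihood and the generalised KL divergence differ only by a term that does not depend on the model value y = [WH]_{fn}.

```python
import numpy as np
from scipy.stats import poisson

def d_kl(x, y):
    """Generalised KL divergence d_1(x|y) = x log(x/y) + (y - x)."""
    return x * np.log(x / y) + (y - x)

x = 7                                    # an observed (photon) count, illustrative value
ys = np.array([2.0, 5.0, 7.0, 12.0])     # candidate model values y

nll = -poisson.logpmf(x, ys)             # Poisson negative log-likelihood as a function of y
print(nll - d_kl(x, ys))                 # identical entries: the gap is constant in y
```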

  19. The β-divergence. A popular measure of fit in NMF (Basu et al., 1998; Cichocki and Amari, 2010):
d_β(x|y) = (1/(β(β−1))) (x^β + (β−1) y^β − β x y^(β−1))   for β ∈ ℝ\{0, 1},
d_1(x|y) = x log(x/y) + (y − x),
d_0(x|y) = x/y − log(x/y) − 1.
Special cases:
◮ squared Euclidean distance (β = 2)
◮ generalised Kullback-Leibler (KL) divergence (β = 1)
◮ Itakura-Saito (IS) divergence (β = 0)
Properties:
◮ Homogeneity: d_β(λx|λy) = λ^β d_β(x|y)
◮ d_β(x|y) is a convex function of y for 1 ≤ β ≤ 2
◮ Bregman divergence
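
A direct transcription of the definition (a sketch; the function name is illustrative and inputs are assumed strictly positive where logarithms appear):

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Elementwise beta-divergence d_beta(x|y); assumes x, y > 0 where logs appear."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if beta == 1:                                    # generalised Kullback-Leibler
        return x * np.log(x / y) + (y - x)
    if beta == 0:                                    # Itakura-Saito
        return x / y - np.log(x / y) - 1
    return (x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1)) / (beta * (beta - 1))

print(beta_divergence(1.0, 3.0, 2))   # beta = 2: 0.5 * (1 - 3)**2 = 2.0
print(beta_divergence(2.0, 6.0, 0))   # homogeneity: equals beta_divergence(1.0, 3.0, 0) since beta = 0
```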

  20.-24. The β-divergence. [Figure built up over five slides: d_β(x=1|y) plotted as a function of y ∈ (0, 5] for β = 2 (Euclidean), β = 1 (KL), β = 0 (IS), β = −1 and β = 3; all curves vanish at y = x = 1.]

  25. Common NMF algorithm design.
◮ Block-coordinate update of H given W^(i−1) and of W given H^(i).
◮ Updates of W and H equivalent by transposition: V ≈ WH ⇔ V^T ≈ H^T W^T.
◮ Objective function separable in the columns of H or the rows of W: D(V|WH) = Σ_n D(v_n | W h_n).
◮ Essentially left with nonnegative linear regression: min_{h ≥ 0} C(h) := D(v|Wh).
Numerous references in the image restoration literature, e.g., (Richardson, 1972; Lucy, 1974; Daube-Witherspoon and Muehllehner, 1986; De Pierro, 1993).
Block-descent algorithm, nonconvex problem, initialisation is an issue.
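
For the squared Euclidean cost, this per-column subproblem is exactly nonnegative least squares; a minimal sketch using SciPy's nnls (Euclidean case only, not the general β-divergence, with illustrative sizes):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
F, K = 20, 4
W = rng.random((F, K))
v = W @ rng.random(K)            # one column of V, generated by the model

h, residual = nnls(W, v)         # solves min_{h >= 0} ||v - W h||_2
print(np.round(h, 3), residual)
```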

  26.-30. Majorisation-minimisation (MM). Build G(h|h̃) such that G(h|h̃) ≥ C(h) and G(h̃|h̃) = C(h̃). Optimise (iteratively) G(h|h̃) instead of C(h). [Figure built up over five slides: a one-dimensional objective C(h) and the auxiliary functions G(h|h^(0)), G(h|h^(1)), G(h|h^(2)) and G(h|h*); minimising each auxiliary function yields the next iterate, producing the monotonically improving sequence h^(0), h^(1), h^(2), h^(3), which converges to a fixed point h*.]

  31. Majorisation-minimisation (MM).
◮ Finding a good & workable local majorisation is the crucial point.
◮ Treating convex and concave terms separately with Jensen and tangent inequalities usually works. E.g., for the IS divergence:
C_IS(h) = Σ_f [ v_f / (Σ_k w_fk h_k) ] + Σ_f log(Σ_k w_fk h_k) + cst,
where the first (convex) term can be majorised with Jensen's inequality and the second (concave) term with its tangent.

  32. Majorisation-minimisation (MM).
◮ Finding a good & workable local majorisation is the crucial point.
◮ Treating convex and concave terms separately with Jensen and tangent inequalities usually works (as in the IS example on the previous slide).
◮ In most cases, leads to nonnegativity-preserving multiplicative algorithms:
h_k = h̃_k ( ∇⁻_{h_k} C(h̃) / ∇⁺_{h_k} C(h̃) )^γ
◮ ∇_{h_k} C(h) = ∇⁺_{h_k} C(h) − ∇⁻_{h_k} C(h), and the two summands are nonnegative.
◮ if ∇_{h_k} C(h̃) > 0, the ratio of summands is < 1 and h_k goes left (decreases).
◮ γ is a divergence-specific scalar exponent.
◮ Details in (Févotte and Idier, 2011; Yang and Oja, 2011; Zhao and Tan, 2018).
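
A compact NumPy sketch of the resulting multiplicative updates for the β-divergence (the exponent rule follows Févotte and Idier, 2011; the function names, eps safeguard and random initialisation are illustrative choices, not from the slides):

```python
import numpy as np

def mm_exponent(beta):
    """Divergence-specific exponent gamma(beta) of the MM multiplicative update."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0

def mu_beta_nmf(V, K, beta=1.0, n_iter=200, eps=1e-12, seed=0):
    """Alternating multiplicative updates decreasing D_beta(V | WH)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W, H = rng.random((F, K)) + eps, rng.random((K, N)) + eps
    g = mm_exponent(beta)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= ((W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)) ** g
        WH = W @ H + eps
        W *= (((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)) ** g
    return W, H

# Usage: rank-5 factorisation of a random nonnegative matrix with the KL cost (beta = 1).
rng = np.random.default_rng(1)
V = rng.random((30, 40))
W, H = mu_beta_nmf(V, K=5, beta=1.0)
print(np.abs(V - W @ H).mean())
```

In practice, scikit-learn's sklearn.decomposition.NMF exposes the same family of updates via solver='mu' and its beta_loss parameter, which may be a more convenient starting point.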
