Robust nonnegative matrix factorisation with the β-divergence and applications in imaging
Cédric Févotte, Institut de Recherche en Informatique de Toulouse
Imaging & Machine Learning, Institut Henri Poincaré, April 2019
Outline
Generalities: Matrix factorisation models; Nonnegative matrix factorisation (NMF)
Optimisation for NMF: Measures of fit; Majorisation-minimisation
Applications in imaging: Hyperspectral unmixing in remote sensing; Factor analysis in dynamic PET
Matrix factorisation models
Data often available in matrix form:
[Figure: a features × samples matrix of coefficients]
[Figure: a movies × users matrix of ratings]
[Figure: a words × text documents matrix of word counts]
[Figure: a frequencies × time matrix of Fourier coefficients (spectrogram)]
Matrix factorisation models
dictionary learning ≈ low-rank approximation ≈ factor analysis ≈ latent semantic analysis
data X ≈ dictionary W × activations H
Matrix factorisation models for dimensionality reduction (coding, low-dimensional embedding)
Matrix factorisation models for unmixing (source separation, latent topic discovery)
Matrix factorisation models for interpolation (collaborative filtering, image inpainting)
Nonnegative matrix factorisation
V ≈ WH, with V of size F × N (F features, N samples), W of size F × K (K patterns) and H of size K × N.
◮ data V and factors W, H have nonnegative entries.
◮ nonnegativity of W ensures interpretability of the dictionary, because patterns w_k and samples v_n belong to the same space.
◮ nonnegativity of H tends to produce part-based representations, because subtractive combinations are forbidden.
Early work by (Paatero and Tapper, 1994), landmark Nature paper by (Lee and Seung, 1999)
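A minimal NumPy sketch of the model and its shapes; the sizes F, N, K and the random factors below are purely illustrative, not fitted to any data.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 100, 500, 10        # features, samples, patterns (illustrative sizes)

W = rng.random((F, K))        # dictionary: K nonnegative patterns of length F
H = rng.random((K, N))        # activations: nonnegative coefficients
V = W @ H                     # noiseless data generated by the model, F x N

# Each sample v_n is an additive (nonnegative) combination of the K patterns.
assert np.allclose(V[:, 0], W @ H[:, 0]) and (V >= 0).all()
```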
NMF for latent semantic analysis (Lee and Seung, 1999; Hofmann, 1999)
[Figure reproduced from (Lee and Seung, 1999): the encyclopedia entry 'Constitution of the United States' is approximated as v_n ≈ W h_n; the learned patterns in W group semantically related words (court, government, council, senate, congress, ...; flowers, leaves, plant, perennial, ...; disease, glands, symptoms, skin, infection, ...) and h_n holds the entry's nonnegative loadings on each pattern: president (148), congress (124), power (120), united (104), constitution (81), amendment (71), government (57), law (49).]
NMF for audio spectral unmixing (Smaragdis and Brown, 2003)
[Figure reproduced from (Smaragdis, 2013): the spectrogram of an input music passage (frequency in Hz vs. time in seconds) is decomposed into a few spectral components and their activations in time.]
NMF for hyperspectral unmixing (Berry, Browne, Langville, Pauca, and Plemmons, 2007)
[Figure reproduced from (Bioucas-Dias et al., 2012)]
NMF as a constrained minimisation problem
Minimise a measure of fit between V and WH, subject to nonnegativity:
$$\min_{W, H \geq 0} D(V \mid WH) = \sum_{fn} d\big([V]_{fn} \mid [WH]_{fn}\big),$$
where d(x | y) is a scalar cost function, e.g.,
◮ squared Euclidean distance (Paatero and Tapper, 1994; Lee and Seung, 2001)
◮ Kullback-Leibler divergence (Lee and Seung, 1999; Finesso and Spreij, 2006)
◮ Itakura-Saito divergence (Févotte, Bertin, and Durrieu, 2009)
◮ α-divergence (Cichocki et al., 2008)
◮ β-divergence (Cichocki et al., 2006; Févotte and Idier, 2011)
◮ Bregman divergences (Dhillon and Sra, 2005)
◮ and more in (Yang and Oja, 2011)
Regularisation terms often added to D(V | WH) for sparsity, smoothness, dynamics, etc. Nonconvex problem.
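Several of these measures of fit are available in off-the-shelf solvers; for instance, scikit-learn's NMF exposes the Euclidean, KL and IS costs through its beta_loss parameter. A small sketch with arbitrary data and settings:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((100, 500))          # nonnegative data matrix (illustrative)

# beta_loss selects the measure of fit: 'frobenius' (squared Euclidean),
# 'kullback-leibler' (generalised KL) or 'itakura-saito'; the multiplicative
# update solver ('mu') is required for the non-Euclidean costs.
model = NMF(n_components=10, solver='mu', beta_loss='kullback-leibler',
            init='nndsvda', max_iter=500)
W = model.fit_transform(V)          # F x K dictionary
H = model.components_               # K x N activations
print(model.reconstruction_err_)    # divergence between V and WH after fitting
```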
Probabilistic models
◮ Let V ∼ p(V | WH) such that
  ◮ E[V | WH] = WH
  ◮ p(V | WH) = ∏_{fn} p(v_{fn} | [WH]_{fn})
◮ then the following correspondences apply, with D(V | WH) = −log p(V | WH) + cst:

| data support            | distribution/noise   | divergence        | examples                 |
| real-valued             | additive Gaussian    | squared Euclidean | many                     |
| integer                 | multinomial⋆         | weighted KL       | word counts              |
| integer                 | Poisson              | generalised KL    | photon counts            |
| nonnegative             | multiplicative Gamma | Itakura-Saito     | spectrogram              |
| nonnegative (generally) | Tweedie              | β-divergence      | generalises above models |

⋆ conditional independence over f does not apply
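As a quick sanity check of one of these correspondences (not from the slides): for Poisson observations, the negative log-likelihood differs from the generalised KL divergence only by a term that depends on the data, so minimising one over WH is equivalent to minimising the other.

```python
import numpy as np
from scipy.stats import poisson

def d_kl(x, y):
    # generalised KL divergence for x > 0, y > 0
    return x * np.log(x / y) - x + y

x = 7                                  # observed photon count (illustrative)
for y in [2.0, 5.0, 7.0, 12.0]:        # candidate values of [WH]_fn
    nll = -poisson.logpmf(x, y)
    print(y, nll - d_kl(x, y))         # same constant, log(x!) - x log x + x, for every y
```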
The β-divergence
A popular measure of fit in NMF (Basu et al., 1998; Cichocki and Amari, 2010):
$$d_\beta(x \mid y) \stackrel{\mathrm{def}}{=} \begin{cases} \dfrac{1}{\beta(\beta-1)}\left( x^\beta + (\beta-1)\, y^\beta - \beta\, x\, y^{\beta-1} \right) & \beta \in \mathbb{R} \setminus \{0, 1\} \\[2mm] x \log \dfrac{x}{y} + (y - x) & \beta = 1 \\[2mm] \dfrac{x}{y} - \log \dfrac{x}{y} - 1 & \beta = 0 \end{cases}$$
Special cases:
◮ squared Euclidean distance (β = 2)
◮ generalised Kullback-Leibler (KL) divergence (β = 1)
◮ Itakura-Saito (IS) divergence (β = 0)
Properties:
◮ Homogeneity: d_β(λx | λy) = λ^β d_β(x | y)
◮ d_β(x | y) is a convex function of y for 1 ≤ β ≤ 2
◮ Bregman divergence
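A direct elementwise transcription of this definition (illustrative, with no numerical safeguards); note that β = 2 yields ½(x − y)², i.e. the squared Euclidean cost up to a factor ½.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x|y), for x, y > 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if beta == 1:                                   # generalised KL
        return np.sum(x * np.log(x / y) + (y - x))
    if beta == 0:                                   # Itakura-Saito
        return np.sum(x / y - np.log(x / y) - 1)
    return np.sum((x**beta + (beta - 1) * y**beta
                   - beta * x * y**(beta - 1)) / (beta * (beta - 1)))

x, y = np.array([1.0, 2.0]), np.array([1.5, 1.0])
assert np.isclose(beta_divergence(x, y, 2), 0.5 * np.sum((x - y) ** 2))
```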
The β-divergence
[Figure: d_β(x = 1 | y) plotted as a function of y for β = 2 (Euc), β = 1 (KL), β = 0 (IS), β = −1 and β = 3; every curve is zero and minimal at y = x = 1.]
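The curves summarised above can be reproduced with a few lines of matplotlib (a sketch reusing an elementwise version of the definition):

```python
import numpy as np
import matplotlib.pyplot as plt

def d_beta(x, y, beta):
    # elementwise beta-divergence, x and y > 0
    if beta == 1:
        return x * np.log(x / y) + (y - x)
    if beta == 0:
        return x / y - np.log(x / y) - 1
    return (x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1)) / (beta * (beta - 1))

y = np.linspace(0.05, 5, 500)
for beta, label in [(2, 'β = 2 (Euc)'), (1, 'β = 1 (KL)'), (0, 'β = 0 (IS)'),
                    (-1, 'β = -1'), (3, 'β = 3')]:
    plt.plot(y, d_beta(1.0, y, beta), label=label)
plt.ylim(0, 1)
plt.xlabel('y')
plt.title('d(x=1|y)')
plt.legend()
plt.show()
```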
Common NMF algorithm design
◮ Block-coordinate update of H given W^(i−1) and W given H^(i).
◮ Updates of W and H equivalent by transposition: V ≈ WH ⇔ V^T ≈ H^T W^T
◮ Objective function separable in the columns of H (or the rows of W):
$$D(V \mid WH) = \sum_n D(v_n \mid W h_n)$$
◮ Essentially left with nonnegative linear regression:
$$\min_{h \geq 0} C(h) \stackrel{\mathrm{def}}{=} D(v \mid Wh)$$
Numerous references in the image restoration literature, e.g., (Richardson, 1972; Lucy, 1974; Daube-Witherspoon and Muehllehner, 1986; De Pierro, 1993)
Block-descent algorithm, nonconvex problem, initialisation is an issue.
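For the squared Euclidean cost, each column subproblem is an ordinary nonnegative least-squares fit, so the block-coordinate scheme can be sketched with SciPy's NNLS solver (a naive, purely illustrative implementation; dedicated NMF solvers are far more efficient):

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(V, K, n_iter=50, seed=0):
    """Alternating nonnegative least squares for the squared Euclidean cost."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W, H = rng.random((F, K)), rng.random((K, N))
    for _ in range(n_iter):
        for n in range(N):                 # update H given W, one column at a time:
            H[:, n], _ = nnls(W, V[:, n])  # min_{h >= 0} ||v_n - W h||^2
        for f in range(F):                 # update W given H, via transposition:
            W[f, :], _ = nnls(H.T, V[f, :])
    return W, H
```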
Majorisation-minimisation (MM)
Build G(h | h̃) such that G(h | h̃) ≥ C(h) for all h and G(h̃ | h̃) = C(h̃). Optimise (iteratively) G(h | h̃) instead of C(h).
[Figure: at each iteration, the auxiliary function G(h | h^(i)) lies above the objective C(h) and touches it at the current iterate h^(i); minimising it yields h^(i+1), so the iterates h^(0), h^(1), h^(2), h^(3), ... monotonically decrease C(h) and approach a (local) minimiser h*.]
Majorisation-minimisation (MM)
◮ Finding a good & workable local majorisation is the crucial point.
◮ Treating convex and concave terms separately with Jensen and tangent inequalities usually works. E.g., for the Itakura-Saito cost,
$$C_{\mathrm{IS}}(h) = \sum_f \frac{v_f}{\sum_k w_{fk} h_k} + \sum_f \log\Big( \sum_k w_{fk} h_k \Big) + \mathrm{cst},$$
the convex first term is majorised with Jensen's inequality and the concave second term with its tangent.
◮ In most cases, this leads to nonnegativity-preserving multiplicative algorithms:
$$h_k = \tilde{h}_k \left( \frac{\nabla^{-}_{h_k} C(\tilde{h})}{\nabla^{+}_{h_k} C(\tilde{h})} \right)^{\gamma}$$
◮ ∇_{h_k} C(h) = ∇^{+}_{h_k} C(h) − ∇^{−}_{h_k} C(h), where the two summands are nonnegative.
◮ if ∇_{h_k} C(h̃) > 0, the ratio of the summands is < 1 and h_k decreases (moves left).
◮ γ is a divergence-specific scalar exponent.
◮ Details in (Févotte and Idier, 2011; Yang and Oja, 2011; Zhao and Tan, 2018)
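As a concrete instance, here is a compact sketch of the resulting multiplicative updates for β-NMF, with the exponent γ(β) of (Févotte and Idier, 2011); this is illustrative code with no convergence checks or numerical safeguards.

```python
import numpy as np

def gamma_exponent(beta):
    # Exponent ensuring monotone descent of the beta-divergence (Fevotte & Idier, 2011).
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0

def mu_beta_nmf(V, K, beta=1.0, n_iter=200, seed=0):
    """Multiplicative-update NMF for the beta-divergence (minimal sketch)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 1e-3
    H = rng.random((K, N)) + 1e-3
    g = gamma_exponent(beta)
    for _ in range(n_iter):
        WH = W @ H
        H *= ((W.T @ (V * WH ** (beta - 2))) / (W.T @ WH ** (beta - 1))) ** g
        WH = W @ H
        W *= (((V * WH ** (beta - 2)) @ H.T) / (WH ** (beta - 1) @ H.T)) ** g
    return W, H
```

With β = 2 (where γ = 1) these expressions reduce to the classical Lee-Seung multiplicative updates for the squared Euclidean cost.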