Nonnegative matrix factorization and applications in audio signal processing
Cédric Févotte, Laboratoire Lagrange, Nice
Machine Learning Crash Course, Genova, June 2015
Outline
Generalities
  Matrix factorisation models
  Nonnegative matrix factorisation
  Majorisation-minimisation algorithms
Audio examples
  Piano toy example
  Audio restoration
  Audio bandwidth extension
  Multichannel IS-NMF
Matrix factorisation models
Data often available in matrix form.
[Figure: a features × samples matrix of coefficients.]
[Figure: a movies × users matrix of ratings.]
[Figure: a words × documents matrix of word counts.]
[Figure: a frequencies × time matrix of Fourier coefficients.]
Matrix factorisation models, a.k.a. dictionary learning, low-rank approximation, factor analysis, latent semantic analysis.
[Figure: data X ≈ dictionary W × activations H.]
Matrix factorisation models for dimensionality reduction (coding, low-dimensional embedding).
Matrix factorisation models for unmixing (source separation, latent topic discovery).
Matrix factorisation models for interpolation (collaborative filtering, image inpainting).
Nonnegative matrix factorisation
V ≈ WH, where V is F × N (F features, N samples), W is F × K (the dictionary of K patterns) and H is K × N (the activations).
◮ data V and factors W, H have nonnegative entries.
◮ nonnegativity of W ensures interpretability of the dictionary, because patterns w_k and samples v_n belong to the same space.
◮ nonnegativity of H tends to produce part-based representations, because subtractive combinations are forbidden.
Early work by Paatero and Tapper (1994); landmark Nature paper by Lee and Seung (1999).
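As a hedged, minimal sketch (not part of the original slides), the factorisation V ≈ WH can be computed with an off-the-shelf solver; the data matrix, the rank K and the solver settings below are illustrative assumptions only.

```python
# Minimal sketch: nonnegative factorisation V ~= W H with scikit-learn.
# V, K and the solver options are placeholders, not the author's setup.
import numpy as np
from sklearn.decomposition import NMF

F, N, K = 100, 500, 10
V = np.abs(np.random.randn(F, N))        # any F x N nonnegative data matrix

model = NMF(n_components=K, init="nndsvd", max_iter=500)
W = model.fit_transform(V)               # F x K nonnegative dictionary (patterns)
H = model.components_                    # K x N nonnegative activations

print(np.linalg.norm(V - W @ H, "fro"))  # measure of fit (Frobenius norm)
```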
49 images among 2429 from MIT’s CBCL face dataset
PCA dictionary with K = 25. Red pixels indicate negative values.
NMF dictionary with K = 25. Experiment reproduced from (Lee and Seung, 1999).
NMF for latent semantic analysis (Lee and Seung, 1999; Hofmann, 1999)
[Figure: learned semantic features (columns of W) are lists of related words (e.g. court, government, council, supreme, ...; president, senate, congress, elected, ...; flowers, leaves, plant, perennial, ...; disease, behaviour, glands, symptoms, ...). The encyclopedia entry 'Constitution of the United States', with word counts president (148), congress (124), power (120), united (104), constitution (81), amendment (71), government (57), law (49), is modelled as v_n ≈ W h_n. Reproduced from (Lee and Seung, 1999).]
NMF for hyperspectral unmixing (Berry, Browne, Langville, Pauca, and Plemmons, 2007). Figure reproduced from (Bioucas-Dias et al., 2012).
NMF for audio spectral unmixing (Smaragdis and Brown, 2003)
[Figure: spectrogram of an input music passage (frequency in Hz vs time in sec) decomposed into 4 components, each with a spectral template and a temporal activation. Reproduced from (Smaragdis, 2013).]
Outline
Generalities
  Matrix factorisation models
  Nonnegative matrix factorisation
  Majorisation-minimisation algorithms
Audio examples
  Piano toy example
  Audio restoration
  Audio bandwidth extension
  Multichannel IS-NMF
NMF as a constrained minimisation problem
Minimise a measure of fit between V and WH, subject to nonnegativity:
    min_{W,H ≥ 0} D(V | WH),   with D(V | WH) = ∑_{f,n} d([V]_{fn} | [WH]_{fn}),
where d(x | y) is a scalar cost function, e.g.,
◮ squared Euclidean distance (Paatero and Tapper, 1994; Lee and Seung, 2001)
◮ Kullback-Leibler divergence (Lee and Seung, 1999; Finesso and Spreij, 2006)
◮ Itakura-Saito divergence (Févotte, Bertin, and Durrieu, 2009)
◮ α-divergence (Cichocki et al., 2008)
◮ β-divergence (Cichocki et al., 2006; Févotte and Idier, 2011)
◮ Bregman divergences (Dhillon and Sra, 2005)
◮ and more in (Yang and Oja, 2011)
Regularisation terms are often added to D(V | WH) for sparsity, smoothness, dynamics, etc.
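For concreteness, here is a small sketch (not from the slides) of the scalar β-divergence d_β(x | y), which recovers three of the costs listed above: Itakura-Saito (β = 0), generalised Kullback-Leibler (β = 1) and half the squared Euclidean distance (β = 2).

```python
# Scalar beta-divergence; summing it over all entries gives D(V | WH).
import numpy as np

def beta_divergence(x, y, beta):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if beta == 0:                          # Itakura-Saito
        return np.sum(x / y - np.log(x / y) - 1)
    if beta == 1:                          # generalised Kullback-Leibler
        return np.sum(x * np.log(x / y) - x + y)
    # generic case; beta = 2 gives half the squared Euclidean distance
    return np.sum((x**beta + (beta - 1) * y**beta
                   - beta * x * y**(beta - 1)) / (beta * (beta - 1)))

# Example: D(V | WH) = beta_divergence(V, W @ H, beta)
```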
Common NMF algorithm design
◮ Block-coordinate update of H given W^(i−1), then of W given H^(i).
◮ Updates of W and H equivalent by transposition: V ≈ WH ⇔ V^T ≈ H^T W^T.
◮ Objective function separable in the columns of H or the rows of W:
    D(V | WH) = ∑_n D(v_n | W h_n)
◮ Essentially left with nonnegative linear regression:
    min_{h ≥ 0} C(h), with C(h) := D(v | Wh)
Numerous references in the image restoration literature, e.g., (Richardson, 1972; Lucy, 1974; Daube-Witherspoon and Muehllehner, 1986; De Pierro, 1993).
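As an illustration of the last point (a sketch under assumptions, not the author's code): with the squared Euclidean cost, each column update is exactly a nonnegative least-squares problem, and the transposition trick lets the same routine update W.

```python
# Block-coordinate NMF updates for the squared Euclidean cost, one column of H
# (or one row of W) at a time, using SciPy's nonnegative least-squares solver.
import numpy as np
from scipy.optimize import nnls

def update_H(V, W, H):
    for n in range(V.shape[1]):          # min_{h_n >= 0} ||v_n - W h_n||^2
        H[:, n], _ = nnls(W, V[:, n])
    return H

def update_W(V, W, H):
    # transposition trick: V^T ~= H^T W^T, so reuse the same column solver
    update_H(V.T, H.T, W.T)
    return W
```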
Majorisation-minimisation (MM)
Build an auxiliary function G(h | h̃) such that G(h | h̃) ≥ C(h) for all h and G(h̃ | h̃) = C(h̃). Optimise (iteratively) G(h | h̃) instead of C(h): at each iteration set h^(i+1) = arg min_h G(h | h^(i)), which guarantees C(h^(i+1)) ≤ C(h^(i)).
[Figure: the objective function C(h) together with the successive auxiliary functions G(h | h^(0)), G(h | h^(1)), G(h | h^(2)), ..., each lying above C(h) and touching it at the current iterate; the minimisers h^(1), h^(2), h^(3), ... approach a limit point h*.]
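A minimal sketch of the generic MM loop (not from the slides); `surrogate_argmin` is a hypothetical routine returning arg min_h G(h | h̃) for whichever auxiliary function has been constructed.

```python
# Generic majorisation-minimisation loop: the objective C is guaranteed to be
# non-increasing because C(h_new) <= G(h_new | h) <= G(h | h) = C(h).
def mm_minimise(C, surrogate_argmin, h0, n_iter=100):
    h, history = h0, [C(h0)]
    for _ in range(n_iter):
        h = surrogate_argmin(h)          # minimise G(. | h)
        history.append(C(h))             # monotonically non-increasing
    return h, history
```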
Majorisation-minimisation (MM)
◮ Finding a good & workable local majorisation is the crucial point.
◮ For most of the divergences mentioned, Jensen and tangent inequalities are usually enough.
◮ In many cases, this leads to multiplicative algorithms such that
    h_k = h̃_k ( ∇⁻_{h_k} C(h̃) / ∇⁺_{h_k} C(h̃) )^γ
  where
  ◮ ∇_{h_k} C(h) = ∇⁺_{h_k} C(h) − ∇⁻_{h_k} C(h) and the two summands are nonnegative,
  ◮ γ is a divergence-specific scalar exponent.
◮ More details about MM in (Lee and Seung, 2001; Févotte and Idier, 2011; Yang and Oja, 2011).
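As a hedged sketch (not the author's exact implementation), the multiplicative updates for β-divergence NMF can be written from this split of the gradient into nonnegative parts, with the MM exponent γ(β) of Févotte and Idier (2011); the initialisation and iteration count below are arbitrary.

```python
# Multiplicative beta-NMF updates: each factor is multiplied by
# (grad- / grad+)^gamma, so nonnegativity is preserved automatically.
import numpy as np

def gamma(beta):
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0                            # beta in [1, 2]

def beta_nmf(V, K, beta=1.0, n_iter=200, eps=1e-12):
    F, N = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((F, K)) + eps, rng.random((K, N)) + eps
    g = gamma(beta)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= ((W.T @ (WH**(beta - 2) * V)) / (W.T @ WH**(beta - 1)))**g
        WH = W @ H + eps
        W *= (((WH**(beta - 2) * V) @ H.T) / (WH**(beta - 1) @ H.T))**g
    return W, H
```

For β = 2, 1 and 0 these updates reduce to the standard Euclidean, KL and Itakura-Saito multiplicative rules, respectively.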
How to choose the right measure of fit?
◮ The squared Euclidean distance is a common default choice.
◮ It underlies a Gaussian additive noise model such that v_{fn} = [WH]_{fn} + ε_{fn}, which can generate negative values, not very natural for nonnegative data.
◮ Many other options. Select the right divergence (for a specific problem) by
  ◮ comparing performances, given ground-truth data;
  ◮ assessing the ability to predict missing/unseen data (interpolation, cross-validation);
  ◮ probabilistic modelling: D(V | WH) = − log p(V | WH) + cst.
How to choose the right measure of fit?
◮ Let V ∼ p(V | WH) such that E[V | WH] = WH;
◮ then the following correspondences apply, with D(V | WH) = − log p(V | WH) + cst:

data support            | distribution/noise    | divergence        | examples
real-valued             | additive Gaussian     | squared Euclidean | many
integer                 | multinomial           | Kullback-Leibler  | word counts
integer                 | Poisson               | generalised KL    | photon counts
nonnegative             | multiplicative Gamma  | Itakura-Saito     | spectral data
generally nonnegative   | Tweedie               | β-divergence      | generalises above models
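A quick numerical check of one row of the table (a sketch, not from the slides): the Poisson negative log-likelihood and the generalised KL divergence differ only by a term that does not depend on the model WH, so minimising either is equivalent.

```python
# For Poisson data, -log p(v | lam) = d_KL(v | lam) + cst(v): the difference
# is constant across different model values lam.
import numpy as np
from scipy.special import gammaln

v = np.array([3.0, 7.0, 1.0, 0.0])                    # count data
for lam in (np.array([2.0, 6.0, 1.5, 0.5]),
            np.array([4.0, 5.0, 0.7, 1.2])):
    nll = np.sum(lam - v * np.log(lam) + gammaln(v + 1))
    mask = v > 0
    dkl = np.sum(lam - v) + np.sum(v[mask] * np.log(v[mask] / lam[mask]))
    print(nll - dkl)                                   # same constant both times
```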
Outline
Generalities
  Matrix factorisation models
  Nonnegative matrix factorisation
  Majorisation-minimisation algorithms
Audio examples
  Piano toy example
  Audio restoration
  Audio bandwidth extension
  Multichannel IS-NMF
Piano toy example
[Figure: musical score of the played piano sequence (MIDI numbers: 61, 65, 68, 72); three representations of the data.]
Piano toy example
IS-NMF on the power spectrogram with K = 8.
[Figure: for each component k = 1, ..., 8, the dictionary column of W, the row of activation coefficients of H, and the reconstructed component.]
Pitch estimates: 65.0, 68.0, 61.0, 72.0 (true values: 61, 65, 68, 72).
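A hedged sketch of how such an experiment could be set up (the file name, STFT parameters and iteration budget are placeholders, not the author's exact settings): compute the power spectrogram and factorise it with the Itakura-Saito cost and multiplicative updates.

```python
# IS-NMF of a power spectrogram with K = 8 components.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft
from sklearn.decomposition import NMF

sr, x = wavfile.read("piano_excerpt.wav")     # hypothetical mono recording
f, t, X = stft(x.astype(float), fs=sr, nperseg=1024)
V = np.abs(X)**2 + 1e-10                      # power spectrogram, F x N

model = NMF(n_components=8, beta_loss="itakura-saito", solver="mu",
            init="random", max_iter=500, random_state=0)
W = model.fit_transform(V)                    # spectral templates (notes, noise)
H = model.components_                         # temporal activations
```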