An introduction to Nonnegative Matrix Factorisation

Slim ESSID (Telecom ParisTech)

TPT - UPS – June 2015
Credits

Some illustrations, slides and demos are reproduced courtesy of:
• A. Ozerov,
• C. Févotte,
• N. Seichepine,
• R. Hennequin,
• F. Vallet,
• A. Liutkus.
Outline

◮ Introduction
◮ NMF models
◮ Algorithms for solving NMF
◮ Applications
◮ Conclusion
Introduction / Motivation
Explaining data by factorisation

General formulation: the data matrix V (F × N) is approximated by the product
W (F × K) × H (K × N), so that each column satisfies

    v_n ≈ Σ_{k=1}^K h_kn w_k

Illustration by C. Févotte
Introduction / Motivation
Explaining data by factorisation

General formulation: V (F × N) ≈ W (F × K) × H (K × N), where:
• V is the data matrix;
• W holds the "explanatory variables": "regressors", "basis", "dictionary", "patterns", "topics";
• H holds the "activation coefficients" or "expansion coefficients".

Illustration by C. Févotte
Introduction / Motivation
Data is often nonnegative by nature 1

• pixel intensities;
• amplitude spectra;
• occurrence counts;
• food or energy consumption;
• user scores;
• stock market values;
• ...

For the sake of interpretability of the results, optimal processing of
nonnegative data may call for processing under nonnegativity constraints.

1 Slide adapted from (Févotte, 2012).
Introduction / Motivation
The Nonnegative Matrix Factorisation model

NMF provides an unsupervised linear representation of the data:

    V ≈ WH,  with
    W = [w_fk] s.t. w_fk ≥ 0, and
    H = [h_kn] s.t. h_kn ≥ 0.

Illustration by N. Seichepine
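The model above can be illustrated numerically. A minimal sketch with synthetic nonnegative factors (the shapes F, N, K and the random data are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 6, 8, 3
W = rng.random((F, K))   # nonnegative dictionary, F x K
H = rng.random((K, N))   # nonnegative activations, K x N
V = W @ H                # the model: V = WH is itself nonnegative

# each column v_n is a nonnegative combination of the basis vectors w_k
n = 2
v_n = sum(H[k, n] * W[:, k] for k in range(K))
assert np.allclose(v_n, V[:, n])
assert (V >= 0).all()
```

In practice V is given and W, H are estimated; this sketch only checks the algebra of the representation.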
Introduction / Motivation
Explaining face images by NMF 2

Image example: 49 images among 2429 from MIT's CBCL face dataset.

2 Slide adapted from (Févotte, 2012).
Introduction / Motivation
Explaining face images by NMF: method

V ≈ WH: the vectorised images (columns of V) are approximated by facial
features (columns of W), weighted by the importance of each feature in
each image (coefficients of H).
Introduction / Motivation
NMF outputs: image example

Illustration by C. Févotte
Notations I

• V: the F × N data matrix:
  − F features (rows),
  − N observations/examples/feature vectors (columns);
• v_n = (v_1n, ..., v_Fn)^T: the n-th feature vector observation among a
  collection of N observations v_1, ..., v_N; v_n is a column vector in
  R_+^F (its transpose v_n^T is a row vector);
• W: the F × K dictionary matrix:
  − w_fk is one of its coefficients,
  − w_k a dictionary/basis vector among K elements.
Notations II

• H: the K × N activation/expansion matrix:
  − h_n: the column vector of activation coefficients for observation v_n:

        v_n ≈ Σ_{k=1}^K h_kn w_k ;

  − h_k: the row vector of activation coefficients relating to basis
    vector w_k.
Outline

◮ Introduction
◮ NMF models
  – Cost functions
  – Weighted NMF schemes
◮ Algorithms for solving NMF
◮ Applications
◮ Conclusion
NMF models / Cost functions
NMF optimization criteria

The NMF approximation V ≈ WH is usually obtained through:

    min_{W,H ≥ 0} D(V | WH),

where D(V | V̂) is a separable matrix divergence:

    D(V | V̂) = Σ_{f=1}^F Σ_{n=1}^N d(v_fn | v̂_fn),

and d(x|y), defined for all x, y ≥ 0, is a scalar divergence such that:
• d(x|y) is continuous in x and y;
• d(x|y) ≥ 0 for all x, y ≥ 0;
• d(x|y) = 0 if and only if x = y.
NMF models / Cost functions
Popular (scalar) divergences

Euclidean (EUC) distance (Lee and Seung, 1999):

    d_EUC(x|y) = (x − y)²

Kullback-Leibler (KL) divergence (Lee and Seung, 1999):

    d_KL(x|y) = x log(x/y) − x + y

Itakura-Saito (IS) divergence (Févotte et al., 2009):

    d_IS(x|y) = x/y − log(x/y) − 1
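The three scalar divergences above can be sketched directly in NumPy (elementwise, assuming x, y > 0 so the ratios and logarithms are defined):

```python
import numpy as np

def d_euc(x, y):
    # Euclidean (squared) distance
    return (x - y) ** 2

def d_kl(x, y):
    # (generalised) Kullback-Leibler divergence
    return x * np.log(x / y) - x + y

def d_is(x, y):
    # Itakura-Saito divergence
    return x / y - np.log(x / y) - 1

# sanity checks of the divergence axioms on a sample point
x, y = 2.0, 3.0
for d in (d_euc, d_kl, d_is):
    assert d(x, y) > 0               # positive when x != y
    assert np.isclose(d(x, x), 0.0)  # zero if and only if x == y
```

Note that d_KL and d_IS are asymmetric in (x, y), unlike the Euclidean distance.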
NMF models / Cost functions
Convexity properties

Divergence d(x|y)   EUC   KL    IS
Convex in x         yes   yes   yes
Convex in y         yes   yes   no
NMF models / Cost functions
Scale invariance properties 3

    d_EUC(λx | λy) = λ² d_EUC(x|y)
    d_KL(λx | λy)  = λ d_KL(x|y)
    d_IS(λx | λy)  = d_IS(x|y)

The IS divergence is scale-invariant → it provides higher accuracy in the
representation of data with large dynamic range (e.g. audio spectra).

3 Slide adapted from (Févotte, 2012).
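The three scaling identities can be verified numerically (the values of x, y and λ below are arbitrary positives):

```python
import numpy as np

def d_euc(x, y): return (x - y) ** 2
def d_kl(x, y):  return x * np.log(x / y) - x + y
def d_is(x, y):  return x / y - np.log(x / y) - 1

x, y, lam = 2.0, 5.0, 7.0
assert np.isclose(d_euc(lam * x, lam * y), lam**2 * d_euc(x, y))  # quadratic in λ
assert np.isclose(d_kl(lam * x, lam * y),  lam * d_kl(x, y))      # linear in λ
assert np.isclose(d_is(lam * x, lam * y),  d_is(x, y))            # scale-invariant
```

This is why, with the IS cost, low-energy and high-energy entries of an audio spectrogram contribute comparably to the fit.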
NMF models / Weighted NMF schemes
Weighted NMF

Conventional NMF optimization criterion:

    min_{W,H ≥ 0} Σ_{f=1}^F Σ_{n=1}^N d(v_fn | v̂_fn).

Weighted NMF optimization criterion:

    min_{W,H ≥ 0} Σ_{f=1}^F Σ_{n=1}^N b_fn d(v_fn | v̂_fn),

where the b_fn (f = 1, ..., F; n = 1, ..., N) are nonnegative weights
representing the contribution of data point v_fn to NMF learning.
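As a sketch of the weighted criterion, a binary mask B zeroes out the contribution of selected entries; here with the EUC cost on hypothetical random data (masking 30% of entries, as in the partial-observation setting of the next slide):

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 6, 8, 3
V = rng.random((F, N))
W = rng.random((F, K))
H = rng.random((K, N))
B = (rng.random((F, N)) > 0.3).astype(float)  # 1 = observed, 0 = missing

V_hat = W @ H
weighted_cost = np.sum(B * (V - V_hat) ** 2)

# equivalent to the plain EUC cost restricted to the observed entries
plain = sum((V[f, n] - V_hat[f, n]) ** 2
            for f in range(F) for n in range(N) if B[f, n] == 1)
assert np.isclose(weighted_cost, plain)
```

With a general nonnegative B (not just 0/1), the same expression down-weights rather than discards entries.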
NMF models / Weighted NMF schemes
Weighted NMF application example I

Learning from partial observations (e.g., for image inpainting as in
(Mairal et al., 2010)):
• observed value: b_fn = 1;
• missing value: b_fn = 0.
NMF models / Weighted NMF schemes
Weighted NMF application example II

Face feature extraction (example and figure from (Blondel et al., 2008)):
data V and weights B = {b_fn}, comparing image-centered weights with
face-centered weights.
Outline

◮ Introduction
◮ NMF models
◮ Algorithms for solving NMF
  – Preliminaries
  – Difficulties in NMF
  – Multiplicative update rules
◮ Applications
◮ Conclusion
Algorithms for solving NMF / Preliminaries
Optimization problem

An efficient solution of the NMF optimization problem

    min_{W,H ≥ 0} D(V | WH)  ⇔  min_θ C(θ),  C(θ) := D(V | WH),

where θ := {W, H} denotes the NMF parameters, must cope with the
following difficulties:
• the nonnegativity constraints must be taken into account;
• the solution is not unique...
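The multiplicative update rules announced in the outline handle the nonnegativity constraint by construction: updates multiply the current iterate by a nonnegative ratio. As a sketch, here is the classical Lee-Seung update for the EUC cost (a small epsilon is added to avoid division by zero; iteration counts and shapes are illustrative):

```python
import numpy as np

def nmf_mu_euc(V, K, n_iter=2000, eps=1e-12, seed=0):
    """Multiplicative updates minimising ||V - WH||_F^2 (Lee & Seung)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K))
    H = rng.random((K, N))
    for _ in range(n_iter):
        # each update is elementwise multiplication by a nonnegative ratio,
        # so W >= 0 and H >= 0 are preserved at every iteration
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 8))  # exactly rank-3 nonnegative data
W, H = nmf_mu_euc(V, K=3)
assert (W >= 0).all() and (H >= 0).all()
assert np.linalg.norm(V - W @ H) / np.linalg.norm(V) < 0.05
```

Each update can be read as a gradient-descent step with a data-dependent step size chosen so that the iterate stays nonnegative.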
Algorithms for solving NMF / Difficulties in NMF
NMF is ill-posed: the solution is not unique

Given V = WH with W ≥ 0, H ≥ 0, any invertible matrix Q such that:
• WQ ≥ 0,
• Q⁻¹H ≥ 0,
provides an alternative factorisation V = W̃H̃ = (WQ)(Q⁻¹H).

In particular, Q can be any nonnegative generalised permutation matrix;
e.g., in R³:

        [ 0  0  2 ]
    Q = [ 0  3  0 ]
        [ 1  0  0 ]

This case is not so problematic: it merely accounts for scaling and
permutation of the basis vectors w_k.
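A quick numerical check of this non-uniqueness, using the R³ example of Q above (W and H are hypothetical random nonnegative factors):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((5, 3))
H = rng.random((3, 7))

Q = np.array([[0., 0., 2.],
              [0., 3., 0.],
              [1., 0., 0.]])
# inverse of a nonnegative generalised permutation matrix is again one
Q_inv = np.array([[0.,  0.,  1.],
                  [0., 1/3,  0.],
                  [0.5, 0.,  0.]])
assert np.allclose(Q @ Q_inv, np.eye(3))

W_tilde = W @ Q          # columns of W permuted and rescaled, still >= 0
H_tilde = Q_inv @ H      # rows of H permuted and inversely rescaled, still >= 0
assert (W_tilde >= 0).all() and (H_tilde >= 0).all()
assert np.allclose(W @ H, W_tilde @ H_tilde)  # same factorisation of V
```

Both (W, H) and (W̃, H̃) explain V exactly, which is why scaling/permutation ambiguity is considered benign.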
Algorithms for solving NMF / Difficulties in NMF
Geometric interpretation and ill-posedness

NMF assumes the data is well described by a simplicial convex cone C_W
generated by the columns of W:

    C_W = { Σ_{k=1}^K λ_k w_k ; λ_k ≥ 0 }

Problem: which C_W? Several such cones can enclose the same data points v_i.
→ Need to impose constraints on the set of possible solutions to select
the most "useful" ones.