New variants of Nonnegative Matrix Factorization for sparsity improvement and maximum biclique finding Nicolas Gillis nicolas.gillis@uclouvain.be In collaboration with François Glineur UCL/CORE (Center for Operations Research and Econometrics) UCL/INMA (Department of Mathematical Engineering) March 3, 2009 Seminar at CESAME CESAME Nonnegative Matrix Factorization 1
Outline
1. Introduction to Nonnegative Matrix Factorization
◮ Motivations and applications
◮ Some algorithms
2. Rank-one update and Nonnegative Factorization
◮ Nonnegative Factorization
◮ Complexity and the maximum edge biclique problem
3. Greedy with Underapproximations
◮ For sparse approximations
◮ Algorithm based on Lagrangian relaxation
Why low-rank matrix approximations? Given a matrix $M \in \mathbb{R}^{m \times n}$ and a factorization rank $r$, we would like to find $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{r \times n}$ such that $M \approx UV$, i.e., M is approximated by a matrix of rank at most r. → Dimensionality reduction for noise filtering, compression, interpretation, classification, ...
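To make the dimensionality-reduction and compression point concrete, here is a minimal NumPy sketch with hypothetical sizes m, n, r: the product UV has rank at most r, and storing U and V takes r(m + n) numbers instead of mn.

```python
import numpy as np

# Hypothetical sizes, chosen for illustration only.
m, n, r = 100, 80, 5
rng = np.random.default_rng(0)

U = rng.standard_normal((m, r))  # m x r factor
V = rng.standard_normal((r, n))  # r x n factor
M_r = U @ V                      # m x n matrix of rank at most r

rank = np.linalg.matrix_rank(M_r)
dense_storage = m * n            # entries needed to store M_r directly
factored_storage = r * (m + n)   # entries needed to store U and V instead

print(rank, dense_storage, factored_storage)
```

Here the factored form needs 900 numbers instead of 8000; the saving grows as r becomes small relative to m and n.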
Matrix approximation and optimization If we want to minimize the sum of squares of the error, i.e.
$$\min_{U,V} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2,$$
the matrix factorization problem is an unconstrained optimization problem. This is a well-studied problem that can be solved efficiently: it corresponds to finding the principal components of the data matrix (PCA), and it can be solved by truncating the singular value decomposition (SVD).
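A minimal sketch of the truncated-SVD solution, assuming NumPy and a hypothetical random data matrix. By the Eckart-Young theorem, keeping the r leading singular triplets gives a best rank-r approximation in the Frobenius norm, and the resulting error equals the norm of the discarded singular values. The helper name `truncated_svd` is introduced here for illustration.

```python
import numpy as np

def truncated_svd(M, r):
    """Best rank-r approximation of M in the Frobenius norm, returned as U, V."""
    W, s, Zt = np.linalg.svd(M, full_matrices=False)
    U = W[:, :r] * s[:r]   # r leading left singular vectors, scaled by singular values
    V = Zt[:r, :]          # r leading right singular vectors, stored as rows
    return U, V, s

rng = np.random.default_rng(1)
M = rng.random((30, 20))   # hypothetical data matrix
r = 3
U, V, s = truncated_svd(M, r)

err = np.linalg.norm(M - U @ V, "fro")
# The optimal error is the Euclidean norm of the discarded singular values.
print(err, np.sqrt(np.sum(s[r:] ** 2)))
```

Splitting the singular values into U is an arbitrary choice; any split between the two factors gives the same product UV.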
Matrix factorization, a linear model If each column of M is an element of a dataset,
$$\underbrace{M_{:j}}_{\text{elements of the data}} \approx \sum_{k=1}^{r} \underbrace{U_{:k}}_{\text{basis elements}} \underbrace{V_{kj}}_{\text{weights}},$$
the columns of M are decomposed into linear combinations of the columns of U, which then form a basis for these elements. Example:
$$M = \begin{pmatrix} 2 & 3 & 2 \\ 2 & 1 & 1 \\ 1 & 5 & 0 \end{pmatrix} \approx \begin{pmatrix} 1.5 & -0.8 \\ 0.7 & -0.9 \\ 1.9 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2.3 & 0.6 \\ -1 & 0.7 & -1 \end{pmatrix} = \begin{pmatrix} 2.3 & 2.9 & 1.7 \\ 1.6 & 1 & 1.3 \\ 0.9 & 5.1 & 0.1 \end{pmatrix} = UV$$
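The example can be checked numerically. This NumPy sketch recomputes UV (the exact products are rounded to one decimal, as on the slide) and confirms that column j of UV is the combination of the columns of U weighted by column j of V.

```python
import numpy as np

M = np.array([[2, 3, 2],
              [2, 1, 1],
              [1, 5, 0]], dtype=float)
U = np.array([[1.5, -0.8],
              [0.7, -0.9],
              [1.9,  1.0]])
V = np.array([[ 1.0, 2.3,  0.6],
              [-1.0, 0.7, -1.0]])

UV = U @ V
print(np.round(UV, 1))

# Column j of UV is a linear combination of the columns of U,
# with coefficients taken from column j of V.
j = 1
col = sum(U[:, k] * V[k, j] for k in range(U.shape[1]))
print(np.allclose(col, UV[:, j]))
```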
Nonnegativity In many applications, data are nonnegative, often due to physical considerations, e.g.
⋄ images are described by pixel intensities;
⋄ texts are represented by vectors of word counts;
⋄ spectra correspond to power intensities.
For interpretation purposes, one can impose nonnegativity constraints on the factor U so that the basis elements belong to the same space as the original data. Moreover, to force the reconstruction from the basis elements to be additive, one can require the weights V to be nonnegative as well.
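Adding these two constraints to the least-squares problem gives the nonnegative matrix factorization (NMF) problem, written here in the notation used so far:

$$\min_{U \in \mathbb{R}^{m \times r},\; V \in \mathbb{R}^{r \times n}} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2 \quad \text{such that} \quad U \geq 0,\; V \geq 0.$$

With both constraints, each column $M_{:j} \approx \sum_{k=1}^{r} U_{:k} V_{kj}$ is approximated by a sum of nonnegative basis elements with nonnegative weights, so basis elements can only be added, never cancelled against each other.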
Image Processing Each column of M represents a face through its pixel intensities, so M is a nonnegative matrix.
Image Processing For an unconstrained decomposition: Figure: Gray: positive entries; Red: negative entries. The basis elements have negative entries and cannot easily be interpreted as facial features.
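Why the unconstrained basis has negative entries can be seen from a short argument, sketched here with NumPy on a hypothetical strictly positive matrix. By Perron-Frobenius, the leading singular vector of a positive matrix can be chosen positive; every other singular vector is orthogonal to it, so it must contain entries of both signs.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.random((50, 40)) + 0.1   # hypothetical strictly positive data matrix

W, s, Zt = np.linalg.svd(M, full_matrices=False)
u1, u2 = W[:, 0], W[:, 1]

# numpy may return the leading singular vector with either sign; fix it.
u1 = u1 if u1.sum() > 0 else -u1
print(np.all(u1 > 0))

# u2 is orthogonal to the positive vector u1, hence has mixed signs
# and cannot be read as a nonnegative "part" of the data.
print(np.any(u2 > 0) and np.any(u2 < 0))
```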
Image Processing U ≥ 0 constrains the basis elements to be nonnegative. Moreover, V ≥ 0 imposes an additive reconstruction. The basis elements extract facial features such as eyes, nose and lips.
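As an illustration of how U ≥ 0 and V ≥ 0 can be enforced in practice, here is a sketch of the multiplicative updates of Lee and Seung, one standard NMF algorithm (not necessarily one of the algorithms presented in this talk). The function name `nmf_mu`, the iteration count and the small `eps` guarding against division by zero are assumptions made for this example.

```python
import numpy as np

def nmf_mu(M, r, iters=200, eps=1e-10, seed=0):
    """Hypothetical helper: Lee-Seung multiplicative updates for
    min ||M - UV||_F^2 subject to U >= 0, V >= 0."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, r))   # nonnegative random initialization
    V = rng.random((r, n))
    errors = [np.linalg.norm(M - U @ V, "fro")]
    for _ in range(iters):
        # Each update multiplies entrywise by a nonnegative ratio,
        # so U and V stay nonnegative throughout.
        V *= (U.T @ M) / (U.T @ U @ V + eps)
        U *= (M @ V.T) / (U @ V @ V.T + eps)
        errors.append(np.linalg.norm(M - U @ V, "fro"))
    return U, V, errors

rng = np.random.default_rng(3)
M = rng.random((40, 30))   # hypothetical nonnegative data
U, V, errors = nmf_mu(M, r=4)
print(errors[0], errors[-1])
```

The multiplicative form is simple to implement but can converge slowly; it is shown here only because it makes the nonnegativity constraints explicit.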
Image Processing NMF allows a parts-based representation of the data.
Text Mining M(i, j) is the frequency of word i in text j, i.e., the columns of M represent the word frequencies in each text.
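A minimal sketch of building such a word-by-text matrix with raw counts, assuming a hypothetical toy corpus; a real pipeline would typically also handle tokenization, stop words and weighting.

```python
import numpy as np
from collections import Counter

# Hypothetical toy corpus, one string per text.
texts = [
    "bank profit company profit",
    "run jump score run",
    "company bank score",
]
vocab = sorted({w for t in texts for w in t.split()})
index = {w: i for i, w in enumerate(vocab)}

M = np.zeros((len(vocab), len(texts)))
for j, t in enumerate(texts):
    for w, c in Counter(t.split()).items():
        M[index[w], j] = c   # M(i, j) = number of occurrences of word i in text j

print(vocab)
print(M)
```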
Text Mining ⋄ Basis elements allow one to recover the different topics: ◮ Basis element 1: profit, company, bank, ... → Economy ◮ Basis element 2: run, jump, score, ... → Sport ⋄ Weights allow each text to be assigned to its corresponding class.
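The interpretation step can be sketched as follows, using hypothetical nonnegative factors U and V for a six-word vocabulary (in practice they would come from an NMF run). Each topic is read off the largest entries of a column of U, and each text is assigned to the basis element with the largest weight in its column of V.

```python
import numpy as np

vocab = ["bank", "company", "jump", "profit", "run", "score"]
# Hypothetical nonnegative factors with r = 2.
U = np.array([[0.9, 0.0],   # bank
              [0.8, 0.1],   # company
              [0.0, 0.7],   # jump
              [1.0, 0.0],   # profit
              [0.0, 1.0],   # run
              [0.1, 0.6]])  # score
V = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 0.4]])

# Topic k: the three words with the largest entries in U[:, k].
topics = [[vocab[i] for i in np.argsort(U[:, k])[::-1][:3]]
          for k in range(U.shape[1])]
print(topics)

# Text j is assigned to the basis element with the largest weight V[k, j].
labels = np.argmax(V, axis=0)
print(labels)
```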
Spectral Data Analysis More than 15,000 objects of various types are in orbit (military and commercial satellites, debris, ...). This creates a need for space object database mining, object identification, clustering, classification, ...