Data Sciences – CentraleSupelec
Advanced Machine Learning
Course VI - Nonnegative matrix factorization
Emilie Chouzenoux
Center for Visual Computing, CentraleSupelec
emilie.chouzenoux@centralesupelec.fr
Motivation

Matrix factorization: Given a set of data entries x_j ∈ R^p, 1 ≤ j ≤ n, and a dimension r < min(p, n), we search for r basis elements w_k, 1 ≤ k ≤ r, such that
$$x_j \approx \sum_{k=1}^{r} w_k \, h_j(k)$$
with some weights h_j ∈ R^r.

Equivalent form: X ≈ WH, with
◮ X ∈ R^{p×n} s.t. X(:, j) = x_j for 1 ≤ j ≤ n,
◮ W ∈ R^{p×r} s.t. W(:, k) = w_k for 1 ≤ k ≤ r,
◮ H ∈ R^{r×n} s.t. H(:, j) = h_j for 1 ≤ j ≤ n.
Motivation

X ≈ WH ⇒ low-rank approximation / linear dimensionality reduction.

Two key aspects:
1. Which loss function to assess the quality of the approximation? Typical examples: Frobenius norm, KL divergence, logistic, Itakura-Saito.
2. Which assumptions on the structure of the factors W and H? Typical examples: independence, sparsity, normalization, nonnegativity.

NMF: find (W, H) s.t. X ≈ WH, W ≥ 0, H ≥ 0.
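As a quick numerical illustration (not part of the original slides), the sketch below factorizes a small synthetic nonnegative matrix with scikit-learn; the library choice, the rank r = 2, and all numerical values are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Synthetic nonnegative data: X (p x n) generated from a rank-2 model plus a small perturbation
p, n, r = 6, 40, 2
X = rng.random((p, r)) @ rng.random((r, n)) + 0.01 * rng.random((p, n))

# NMF with the Frobenius loss: X ~ W H, with W >= 0 of size p x r and H >= 0 of size r x n
model = NMF(n_components=r, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(X)   # basis elements w_k in the columns of W
H = model.components_        # weights h_j in the columns of H

print(W.shape, H.shape)      # (6, 2) (2, 40)
print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```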
Example: Facial feature extraction

Decomposition of the CBCL face database [Lee and Seung, 1999].
⇒ Some of the features look like parts of a nose or an eye. A face is then decomposed as a certain weight of a certain nose type, a certain amount of some eye type, etc.
Example: Spectral unmixing

Decomposition of the Urban hyperspectral image [Ma et al., 2014].
⇒ NMF is able to compute the spectral signatures of the endmembers and, simultaneously, the abundance of each endmember in each pixel.
Example: Topic modeling in text mining

Goal: Decompose a term-document matrix, where each column represents a document and each entry represents the weight of a certain word in that document (e.g., term frequency - inverse document frequency). The ordering of the words within the documents is not taken into account (= bag-of-words).

Topic decomposition model [Blei, 2012]
⇒ The NMF decomposition of the term-document matrix yields components that can be interpreted as "topics", and decomposes each document into a weighted sum of topics.
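A minimal topic-modeling sketch in the spirit of this slide, assuming scikit-learn is available; the toy corpus, the choice of two topics, and the tf-idf settings are made-up for the example. Note that scikit-learn places documents in rows, i.e., it factorizes the transpose of the term-document matrix described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Toy corpus; each document becomes one row of the tf-idf matrix
# (the transpose of the term-document convention used on the slide)
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stock markets fell sharply today",
    "investors worry about market volatility",
]

tfidf = TfidfVectorizer(stop_words="english")
A = tfidf.fit_transform(docs)        # documents x terms, nonnegative

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
doc_topic = nmf.fit_transform(A)     # each document as a weighted sum of topics
topic_term = nmf.components_         # each topic as a nonnegative combination of terms

terms = tfidf.get_feature_names_out()
for k, topic in enumerate(topic_term):
    top = topic.argsort()[::-1][:3]
    print(f"topic {k}:", [terms[i] for i in top])
```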
White board
Multiplicative algorithms for NMF

Challenges: NMF is NP-hard and ill-posed. Most algorithms are only guaranteed to converge to a stationary point, and may be sensitive to initialization.

We present here a popular class of methods introduced in [Lee and Seung, 1999], relying on simple multiplicative updates. (Assumption: X ≥ 0.)

∗ Frobenius norm: ‖X − WH‖²_F
$$W \leftarrow W \circ \frac{X H^\top}{W H H^\top}, \qquad H \leftarrow H \circ \frac{W^\top X}{W^\top W H}$$
(◦ and the fraction bars denote entrywise operations).

∗ KL divergence: KL(X, WH)
$$W_{ik} \leftarrow W_{ik}\,\frac{\sum_{\ell=1}^{n} H_{k\ell} X_{i\ell} / [WH]_{i\ell}}{\sum_{\ell=1}^{n} H_{k\ell}}, \qquad H_{kj} \leftarrow H_{kj}\,\frac{\sum_{i=1}^{p} W_{ik} X_{ij} / [WH]_{ij}}{\sum_{i=1}^{p} W_{ik}}$$
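Below is a minimal NumPy sketch of the Frobenius multiplicative updates stated above; the random initialization, the fixed iteration count, and the small constant eps guarding against division by zero are implementation choices, not part of the slide.

```python
import numpy as np

def nmf_mu_frobenius(X, r, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for min ||X - WH||_F^2 s.t. W, H >= 0."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    W = rng.random((p, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # H <- H o (W^T X) / (W^T W H)
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # W <- W o (X H^T) / (W H H^T)
    return W, H

# Quick check on random nonnegative data
X = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
W, H = nmf_mu_frobenius(X, r=5)
print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

Each update keeps the entries nonnegative as long as the initialization is nonnegative, which is the appeal of the multiplicative form.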
Sketch of proof

The multiplicative schemes rely on the use of separable surrogate functions, majorizing the loss w.r.t. W and H, respectively:

∗ Frobenius norm: For every (X, W, H, H̄) ≥ 0 and 1 ≤ j ≤ n,
$$\|W h_j - x_j\|_2^2 \;\le\; \sum_{i=1}^{p} \sum_{k=1}^{r} \frac{W_{ik}\,\bar H_{kj}}{[W \bar h_j]_i} \left( X_{ij} - \frac{[W \bar h_j]_i}{\bar H_{kj}}\, H_{kj} \right)^{\!2}$$

∗ KL divergence: For every (X, W, H, H̄) ≥ 0 and 1 ≤ j ≤ n,
$$\mathrm{KL}(x_j, W h_j) \;\le\; \sum_{i=1}^{p} \left( X_{ij}\log X_{ij} - X_{ij} + [W h_j]_i - \sum_{k=1}^{r} \frac{W_{ik}\,\bar H_{kj}}{[W \bar h_j]_i}\, X_{ij} \log\!\left( \frac{[W \bar h_j]_i}{\bar H_{kj}}\, H_{kj} \right) \right)$$
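As a reading aid (not on the original slide), minimizing the Frobenius surrogate coordinate-wise recovers the multiplicative update: the surrogate is separable in the entries of h_j, and setting its partial derivative with respect to H_kj to zero gives

$$-2\left[ (W^\top x_j)_k - \frac{H_{kj}}{\bar H_{kj}}\, (W^\top W \bar h_j)_k \right] = 0 \;\Longrightarrow\; H_{kj} = \bar H_{kj}\, \frac{(W^\top X)_{kj}}{(W^\top W \bar H)_{kj}},$$

which is exactly the multiplicative update for H; the update for W follows by symmetry. Both surrogates hold with equality at H = H̄, which guarantees a monotone decrease of the loss.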
White board
White board
Weighted NMF

∗ Weighted Frobenius norm: ‖Σ ◦ (X − WH)‖²_F
$$W \leftarrow W \circ \frac{(\Sigma \circ X)\, H^\top}{(\Sigma \circ (WH))\, H^\top}, \qquad H \leftarrow H \circ \frac{W^\top (\Sigma \circ X)}{W^\top (\Sigma \circ (WH))}$$

∗ Weighted KL divergence: KL(X, Diag(p) WH Diag(q))
$$W_{ik} \leftarrow W_{ik}\,\frac{\sum_{\ell=1}^{n} H_{k\ell} X_{i\ell} / (p_i [WH]_{i\ell})}{\sum_{\ell=1}^{n} q_\ell H_{k\ell}}, \qquad H_{kj} \leftarrow H_{kj}\,\frac{\sum_{i=1}^{p} W_{ik} X_{ij} / (q_j [WH]_{ij})}{\sum_{i=1}^{p} p_i W_{ik}}$$

⇒ A typical application is matrix completion to predict unobserved data, for instance in user-rating matrices. In that case, binary weights are used, signaling the positions of the available entries in X.
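A possible NumPy sketch of the weighted Frobenius updates, used here for a toy matrix-completion task with a binary mask Σ (1 = observed entry); the mask density, eps, and the random initialization are illustrative assumptions.

```python
import numpy as np

def nmf_mu_weighted(X, Sigma, r, n_iter=300, eps=1e-10, seed=0):
    """Multiplicative updates for the weighted loss ||Sigma o (X - WH)||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    W = rng.random((p, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        W *= ((Sigma * X) @ H.T) / ((Sigma * (W @ H)) @ H.T + eps)
        H *= (W.T @ (Sigma * X)) / (W.T @ (Sigma * (W @ H)) + eps)
    return W, H

# Toy completion example: observe 60% of the entries of a rank-3 nonnegative matrix
rng = np.random.default_rng(2)
X_true = rng.random((25, 3)) @ rng.random((3, 15))
Sigma = (rng.random(X_true.shape) < 0.6).astype(float)   # binary weights
W, H = nmf_mu_weighted(Sigma * X_true, Sigma, r=3)
print("mean abs. error on unobserved entries:",
      np.abs((W @ H - X_true)[Sigma == 0]).mean())
```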
White board
Regularized NMF

∗ Regularized Frobenius norm:
$$\frac{1}{2}\|X - WH\|_F^2 + \lambda \|H\|_1 + \frac{\mu}{2}\|H\|_F^2 + \frac{\nu}{2}\|W\|_F^2$$
$$W \leftarrow W \circ \frac{X H^\top}{W (H H^\top + \nu I_r)}, \qquad H \leftarrow H \circ \frac{W^\top X - \lambda\, \mathbf{1}_{r \times n}}{(W^\top W + \mu I_r)\, H}$$

⇒ The ambiguity due to rescaling of (W, H) and to rotation is removed by the penalty terms.
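The regularized updates translate almost line by line into NumPy; the sketch below is an illustration under stated assumptions (arbitrary λ, µ, ν, and a clipping of the H numerator at zero, which is a safeguard not written on the slide).

```python
import numpy as np

def nmf_mu_regularized(X, r, lam=0.1, mu=0.1, nu=0.1, n_iter=300, eps=1e-10, seed=0):
    """Multiplicative updates for
       0.5 * ||X - WH||_F^2 + lam * ||H||_1 + (mu/2) * ||H||_F^2 + (nu/2) * ||W||_F^2."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    W = rng.random((p, r))
    H = rng.random((r, n))
    I_r = np.eye(r)
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ (H @ H.T + nu * I_r) + eps)
        # np.maximum(..., 0) keeps the numerator nonnegative (implementation safeguard)
        H *= np.maximum(W.T @ X - lam, 0.0) / ((W.T @ W + mu * I_r) @ H + eps)
    return W, H
```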
White board
Other NMF algorithms

Multiplicative updates (MU) are simple to implement, but they can be slow to converge and are sensitive to initialization. Other strategies are listed below (for the least-squares case):

◮ Alternating Least Squares: first compute the unconstrained solution w.r.t. W or H, then project it onto the nonnegative orthant. Easy to implement, but oscillations can arise (no convergence guarantee). Rather powerful for initialization purposes.

◮ Alternating Nonnegative Least Squares: solve the constrained problem exactly, w.r.t. W and H, in an alternating manner, using an inner solver (e.g., projected gradient, quasi-Newton, active set). Expensive. Useful as a refinement step after a cheap MU.

◮ Hierarchical Alternating Least Squares: exact coordinate descent method, updating one column of W (resp. one row of H) at a time. Simple to implement, with performance similar to MU. A sketch of the column update is given below.
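As a minimal sketch of the HALS idea mentioned above: one sweep over the columns of W for the least-squares loss, where each column is the exact nonnegative least-squares minimizer with the other columns fixed (the function name and the eps safeguard are mine, not from the slides).

```python
import numpy as np

def hals_sweep_W(X, W, H, eps=1e-10):
    """One HALS sweep over the columns of W for min ||X - WH||_F^2 s.t. W >= 0."""
    XHt = X @ H.T    # p x r
    HHt = H @ H.T    # r x r
    for k in range(W.shape[1]):
        # Exact nonnegative least-squares solution for column k, other columns fixed
        numer = XHt[:, k] - W @ HHt[:, k] + W[:, k] * HHt[k, k]
        W[:, k] = np.maximum(numer / (HHt[k, k] + eps), 0.0)
    return W
```

The symmetric sweep over the rows of H is obtained by applying the same routine to the transposed problem, e.g. hals_sweep_W(X.T, H.T, W.T); alternating the two sweeps gives one full HALS iteration.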