Test of Time Award

Online Dictionary Learning for Sparse Coding
Online Learning for Matrix Factorization and Sparse Coding

Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro
International Conference on Machine Learning, 2019
[Photos of coauthors: Francis Bach, Jean Ponce, Guillermo Sapiro]
What are these papers about?

They deal with matrix factorization, X ≈ D A (X is m × n, D is m × p, A is p × n), when one factor is sparse, or the other one, or both; or when a factor is not only sparse but admits a particular structure; or when a factor admits a particular structure (e.g., piecewise constant) but is not sparse.
What are these papers about?

In these papers, data matrices have many columns (n → +∞), or an infinite number of columns, or columns that are streamed online.
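The streaming setting above is what the online algorithm of these papers addresses. Below is a minimal numpy sketch in the spirit of that algorithm, not the authors' code (function names and parameters are illustrative): each incoming column is sparse-coded against the current dictionary, two small sufficient-statistics matrices are accumulated, and the dictionary columns are refreshed by block-coordinate descent, so memory stays constant no matter how many columns stream in.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator (prox of the l1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(x, D, lam, n_iter=100):
    """Solve the lasso subproblem min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    a = np.zeros(D.shape[1])
    step = 1.0 / max(np.linalg.norm(D, 2) ** 2, 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        a = soft_threshold(a - step * (D.T @ (D @ a - x)), step * lam)
    return a

def online_dictionary_learning(stream, m, p, lam=0.1, seed=0):
    """Process columns one at a time, keeping only p x p and m x p statistics."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((m, p))
    D /= np.linalg.norm(D, axis=0)       # start with unit-norm columns
    A = np.zeros((p, p))                 # accumulates alpha alpha^T
    B = np.zeros((m, p))                 # accumulates x alpha^T
    for x in stream:
        alpha = sparse_code(x, D, lam)
        A += np.outer(alpha, alpha)
        B += np.outer(x, alpha)
        # block-coordinate update of each dictionary column
        for j in range(p):
            if A[j, j] > 1e-10:
                u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)  # project onto unit ball
    return D
```

The key point is that `A` and `B` summarize all past columns, so the per-column cost does not grow with the number of columns seen.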
Formulation(s)

X = [x_1, x_2, ..., x_n] is a data matrix. We may call D = [d_1, ..., d_p] a dictionary. A = [α_1, ..., α_n] carries the decomposition coefficients of X onto D.

Interpretation as signal/data decomposition:

X \approx DA \iff \forall i,\ x_i \approx D\alpha_i = \sum_{j=1}^{p} \alpha_i[j]\, d_j.

Generic formulation:

\min_{D \in \mathcal{D}} \frac{1}{n} \sum_{i=1}^{n} L(x_i, D) \quad \text{with} \quad L(x, D) \triangleq \min_{\alpha \in \mathcal{A}} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda \psi(\alpha).

Stochastic case:

\min_{D \in \mathcal{D}} \mathbb{E}_x[L(x, D)].
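The generic formulation can be evaluated directly. Here is a hedged numpy sketch (illustrative names, not code from the papers) that computes L(x, D) for ψ = ‖·‖₁ by solving the inner lasso problem with ISTA, then averages over columns to get the empirical objective.

```python
import numpy as np

def sparse_code(x, D, lam, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    a = np.zeros(D.shape[1])
    step = 1.0 / max(np.linalg.norm(D.T @ D, 2), 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))               # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return a

def loss(x, D, lam):
    """L(x, D) = min_a 0.5*||x - D a||^2 + lam*||a||_1 (approximately)."""
    a = sparse_code(x, D, lam)
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

# Empirical objective (1/n) sum_i L(x_i, D) on random data.
rng = np.random.default_rng(0)
m, p, n = 10, 15, 50
D = rng.standard_normal((m, p))
D /= np.linalg.norm(D, axis=0)           # columns of D in the unit ball
X = rng.standard_normal((m, n))
emp = np.mean([loss(X[:, i], D, lam=0.1) for i in range(n)])
```

Since α = 0 is feasible, L(x, D) is always at most 0.5‖x‖², which gives a quick sanity check on the inner solver.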
Formulation(s)

\min_{D \in \mathcal{D}} \mathbb{E}_x[L(x, D)] \quad \text{with} \quad L(x, D) \triangleq \min_{\alpha \in \mathcal{A}} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda \psi(\alpha).

Which formulations does it cover?

  non-negative matrix factorization:  D = R_+^{m×p},                          A = R_+^p,  ψ = 0
  sparse coding:                      D = {D : ∀j, ‖d_j‖ ≤ 1},               A = R^p,    ψ = ‖·‖_1
  non-negative sparse coding:         D = {D : ∀j, ‖d_j‖ ≤ 1},               A = R_+^p,  ψ = ‖·‖_1
  structured sparse coding:           D = {D : ∀j, ‖d_j‖ ≤ 1},               A = R^p,    ψ = ‖·‖_1 + Ω(·)
  ≈ sparse PCA:                       D = {D : ∀j, ‖d_j‖_2² + ‖d_j‖_1 ≤ 1},  A = R^p,    ψ = ‖·‖_1
  ... and more.

[Paatero and Tapper, '94], [Olshausen and Field, '96], [Hoyer, 2002], [Mairal et al., 2011], [Zou et al., 2004].
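The non-negative rows of the table require only a one-line change to the inner solver: when A = R_+^p, the proximal step for λ·Σ_j α_j combined with the non-negativity constraint is a shifted clipping at zero. A minimal numpy sketch under that assumption (illustrative names, not code from the papers):

```python
import numpy as np

def nonneg_sparse_code(x, D, lam, n_iter=200):
    """Solve min_{a >= 0} 0.5*||x - D a||^2 + lam*sum(a) via projected ISTA.

    For a >= 0 the l1 penalty is just lam*sum(a), so the prox step is
    a gradient step shifted by lam and clipped at zero.
    """
    a = np.zeros(D.shape[1])
    step = 1.0 / max(np.linalg.norm(D, 2) ** 2, 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = np.maximum(a - step * (grad + lam), 0.0)    # shrink, project onto R_+
    return a
```

The other rows differ only in the constraint set on D (e.g., the elastic-net-like ball for sparse PCA) and in the penalty ψ, which is what makes the generic formulation convenient.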
The sparse coding context

Sparse coding was introduced by Olshausen and Field, '96. It was the first time (together with ICA, see [Bell and Sejnowski, '97]) that a simple unsupervised learning principle led to various sorts of "Gabor-like" filters when trained on natural image patches.
The sparse coding context

Remember that we can play with various structured sparsity-inducing penalties: [Jenatton et al., 2010], [Kavukcuoglu et al., 2009], [Mairal et al., 2011], [Hyvärinen and Hoyer, 2001].
Sparsity and simplicity principles

1921: Wrinch and Jeffreys' simplicity principle.
1952: Markowitz's portfolio selection.
1960's and 70's: best subset selection in statistics.
1990's: the wavelet era in signal processing.
1994-1996: the Lasso (Tibshirani) and Basis pursuit (Chen and Donoho).
1996: Olshausen and Field's dictionary learning method.