Sparse Coding and Dictionary Learning for Image Analysis Part I: - PowerPoint PPT Presentation

Sparse Coding and Dictionary Learning for Image Analysis Part I: Optimization for Sparse Coding Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro ICCV’09 tutorial, Kyoto, 28th September 2009 Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 1/41

What is a Sparse Linear Model? Let x in R m be a signal. Let D = [ d 1 , . . . , d p ] ∈ R m × p be a set of normalized “basis vectors”. We call it dictionary . D is “adapted” to x if it can represent it with a few basis vectors—that is, there exists a sparse vector α in R p such that x ≈ D α . We call α the sparse code .   α [1]     α [2]    d 1   ≈ · · ·  x d 2 d p .    .  .   α [ p ] � �� x ∈ R m D ∈ R m × p � �� α ∈ R p , sparse Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 2/41

The Sparse Decomposition Problem 1 2 || x − D α || 2 min + λψ ( α ) 2 α ∈ R p � �� sparsity-inducing data fitting term regularization ψ induces sparsity in α . It can be △ the ℓ 0 “pseudo-norm”. || α || 0 = # { i s.t. α [ i ] � = 0 } (NP-hard) = � p △ the ℓ 1 norm. || α || 1 i =1 | α [ i ] | (convex) . . . This is a selection problem. Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 3/41

Finding your way in the sparse coding literature. . . . . . is not easy. The literature is vast, redundant, sometimes confusing and many papers are claiming victory. . . The main class of methods are greedy procedures [Mallat and Zhang, 1993], [Weisberg, 1980] homotopy [Osborne et al., 2000], [Efron et al., 2004], [Markowitz, 1956] soft-thresholding based methods [Fu, 1998], [Daubechies et al., 2004], [Friedman et al., 2007], [Nesterov, 2007], [Beck and Teboulle, 2009], . . . reweighted- ℓ 2 methods [Daubechies et al., 2009],. . . active-set methods [Roth and Fischer, 2008]. . . . Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 4/41

1 Greedy Algorithms 2 Homotopy and LARS 3 Soft-thresholding based optimization Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 5/41

Matching Pursuit || 2 α ∈ R p || x − D α min 2 s.t. || α || 0 ≤ L � �� r 1: α ← 0 2: r ← x (residual). 3: while || α || 0 < L do Select the atom with maximum correlation with the residual 4: | d T ˆ ı ← arg max i r | i =1 ,..., p Update the residual and the coefficients 5: ı ] + d T α [ˆ ı ] ← α [ˆ ı r ˆ r − ( d T ← ı r ) d ˆ r ˆ ı 6: end while Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 6/41

Matching Pursuit α = (0 , 0 , 0) y d 2 d 1 r z d 3 x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 7/41

Matching Pursuit α = (0 , 0 , 0) y d 2 d 1 r z d 3 < r , d 3 > d 3 x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 8/41

Matching Pursuit α = (0 , 0 , 0) y d 2 d 1 r r − < r , d 3 > d 3 z d 3 x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 9/41

Matching Pursuit α = (0 , 0 , 0 . 75) y d 2 d 1 z r d 3 x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 10/41

Matching Pursuit α = (0 , 0 . 24 , 0 . 75) y d 2 d 1 z d 3 r x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 13/41

Orthogonal Matching Pursuit α ∈ R p || x − D α || 2 min 2 s.t. || α || 0 ≤ L 1: Γ = ∅ . 2: for iter = 1 , . . . , L do Select the atom which most reduces the objective 3: � � α ′ || x − D Γ ∪{ i } α ′ || 2 ˆ ı ← arg min min 2 i ∈ Γ C Update the active set: Γ ← Γ ∪ { ˆ ı } . 4: Update the residual (orthogonal projection) 5: Γ D Γ ) − 1 D T r ← ( I − D Γ ( D T Γ ) x . Update the coefficients 6: Γ D Γ ) − 1 D T α Γ ← ( D T Γ x . 7: end for Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 17/41

Orthogonal Matching Pursuit α = (0 , 0 , 0) Γ = ∅ y d 2 d 1 x z d 3 x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 18/41

Orthogonal Matching Pursuit α = (0 , 0 , 0 . 75) Γ = { 3 } y d 2 d 1 π 2 xr z π 1 d 3 π 3 x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 19/41

Orthogonal Matching Pursuit α = (0 , 0 . 29 , 0 . 63) Γ = { 3 , 2 } y d 2 d 1 xr z π 32 π 31 d 3 x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 20/41

Orthogonal Matching Pursuit Contrary to MP, an atom can only be selected one time with OMP. It is, however, more difficult to implement efficiently. The keys for a good implementation in the case of a large number of signals are Precompute the Gram matrix G = D T D once in for all, Maintain the computation of D T r for each signal, Γ D Γ ) − 1 for each signal. Maintain a Cholesky decomposition of ( D T The total complexity for decomposing n L -sparse signals of size m with a dictionary of size p is O ( p 2 m ) + O ( nL 3 ) + O ( n ( pm + pL 2 )) = O ( np ( m + L 2 )) � �� D T r Gram matrix Cholesky It is also possible to use the matrix inversion lemma instead of a Cholesky decomposition (same complexity, but less numerical stability) Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 21/41

Example with the software SPAMS Software available at http://www.di.ens.fr/willow/SPAMS/ >> I=double(imread(’data/lena.png’))/255; >> %extract all patches of I >> X=im2col(I,[8 8],’sliding’); >> %load a dictionary of size 64 x 256 >> D=load(’dict.mat’); >> >> %set the sparsity parameter L to 10 >> param.L=10; >> alpha=mexOMP(X,D,param); On a 8-cores 2.83Ghz machine: 230000 signals processed per second! Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 22/41

Why does the ℓ 1 -norm induce sparsity? Analysis of the norms in 1D ψ ( α ) = | α | ψ ( α ) = α 2 ψ ′ ( α ) = 2 α ψ ′ ψ ′ − ( α ) = − 1 , + ( α ) = 1 The gradient of the ℓ 2 -norm vanishes when α get close to 0. On its differentiable part, the norm of the gradient of the ℓ 1 -norm is constant. Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 23/41

Why does the ℓ 1 -norm induce sparsity? Exemple: quadratic problem in 1D 1 2( x − α ) 2 + λ | α | min α ∈ R Piecewise quadratic function with a kink at zero. Derivative at 0 + : g + = − x + λ and 0 − : g − = − x − λ . Optimality conditions. α is optimal iff: | α | > 0 and ( x − α ) + λ sign( α ) = 0 α = 0 and g + ≥ 0 and g − ≤ 0 The solution is a soft-thresholding : α ⋆ = sign( x )( | x | − λ ) + . Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 24/41

Why does the ℓ 1 -norm induce sparsity? Physical illustration E 1 = 0 E 1 = 0 x 0 Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 25/41

Why does the ℓ 1 -norm induce sparsity? Physical illustration E 1 = k 1 2 ( x 0 − x ) 2 E 1 = k 1 2 ( x 0 − x ) 2 E 2 = mgx E 2 = k 2 2 x 2 x x Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 26/41

Why does the ℓ 1 -norm induce sparsity? Physical illustration E 1 = k 1 2 ( x 0 − x ) 2 E 1 = k 1 2 ( x 0 − x ) 2 E 2 = k 2 2 x 2 x E 2 = mgx x = 0 !! Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 27/41

Why does the ℓ 1 -norm induce sparsity? The geometric explanation x y y x z general quadratic problem: coupled soft-thresholding. Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro Optimization for Sparse Coding 28/41

Sparse Coding and Dictionary Learning for Image Analysis Part I: - PowerPoint PPT Presentation

Sparse Coding and Dictionary Learning for Image Analysis Part I: Optimization for Sparse Coding Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro ICCV09 tutorial, Kyoto, 28th September 2009 Francis Bach, Julien Mairal, Jean Ponce

Sparse Coding and Dictionary Learning for Image Analysis Part II: Dictionary Learning for signal

Sparse Coding and Dictionary Learning for Image Analysis Part IV: New sparse models Francis

Test of Time Award Online Dictionary Learning for Sparse Coding Julien Mairal, Francis Bach, Jean

Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: Kraft Inequality; Source Coding

Learning ancestral atom of structured dictionary via sparse coding Bernoulli Society Satellite

Image and Video Coding: Video Coding Extensions Screen Content Coding Screen Content Coding

Bag of Pursuits and Neural Gas for Improved Sparse Coding Manifold Learning with Sparse Coding

Sparse dictionary learning in the presence of noise & outliers Rmi Gribonval INRIA Rennes

VIDEO SIGNALS Lossless coding g LOSSLESS CODING LOSSLESS CODING The goal of lossless image

Image and Video Coding: Encoder Control D D = - R d R Problem Statement / Scope of Image

The Dictionary ADT The dictionary ADT models a searchable collection findElement(k): if the

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

ADVANCED MULTIMEDIA ADVANCED MULTIMEDIA CODING CODING Fernando Pereira Instituto Superior

Dynamical systems Expanding maps on the circle. Coding Jana Rodriguez Hertz ICTP 2018 coding

Image and Video Coding: Hybrid Video Coding s n 1 [ x , y ] s n [ x , y ] m k = ( m x , m

Image Restoration Image Enhancement and Image Restoration both deal with improving images. Image

Outline for Today Course Overview Goals Administrative details CSE 143 Workload

t Prr Pr

Speech Processing 15-492/18-492 Multilinguality SPICE: making it easier Dealing with all

Federal Pharmacy Senior Leaders Vision: Next Three - Five Years CAPT Brian LaPlant, US Public

RinohType A Document Processor inspired by LaTeX Brecht Machiels EuroPython 2015 About the

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Weapons of mass prediction Leonardo Egidi a (joint work with Jonah Gabry b , in preparation for

Learning Systems Research at the Intersection of Machine Learning & Data Systems Joseph E.

Sparse Coding and Dictionary Learning for Image Analysis Part I: - PowerPoint PPT Presentation

Sparse Coding and Dictionary Learning for Image Analysis Part I: Optimization for Sparse Coding Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro ICCV09 tutorial, Kyoto, 28th September 2009 Francis Bach, Julien Mairal, Jean Ponce

Sparse Coding and Dictionary Learning for Image Analysis Part II: Dictionary Learning for signal

Sparse Coding and Dictionary Learning for Image Analysis Part IV: New sparse models Francis

Test of Time Award Online Dictionary Learning for Sparse Coding Julien Mairal, Francis Bach, Jean

Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: Kraft Inequality; Source Coding

Learning ancestral atom of structured dictionary via sparse coding Bernoulli Society Satellite

Image and Video Coding: Video Coding Extensions Screen Content Coding Screen Content Coding

Bag of Pursuits and Neural Gas for Improved Sparse Coding Manifold Learning with Sparse Coding

Sparse dictionary learning in the presence of noise &amp; outliers Rmi Gribonval INRIA Rennes

VIDEO SIGNALS Lossless coding g LOSSLESS CODING LOSSLESS CODING The goal of lossless image

Image and Video Coding: Encoder Control D D = - R d R Problem Statement / Scope of Image

The Dictionary ADT The dictionary ADT models a searchable collection findElement(k): if the

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

ADVANCED MULTIMEDIA ADVANCED MULTIMEDIA CODING CODING Fernando Pereira Instituto Superior

Dynamical systems Expanding maps on the circle. Coding Jana Rodriguez Hertz ICTP 2018 coding

Image and Video Coding: Hybrid Video Coding s n 1 [ x , y ] s n [ x , y ] m k = ( m x , m

Image Restoration Image Enhancement and Image Restoration both deal with improving images. Image

Outline for Today Course Overview Goals Administrative details CSE 143 Workload

t Prr Pr

Speech Processing 15-492/18-492 Multilinguality SPICE: making it easier Dealing with *all*

Federal Pharmacy Senior Leaders Vision: Next Three - Five Years CAPT Brian LaPlant, US Public

RinohType A Document Processor inspired by LaTeX Brecht Machiels EuroPython 2015 About the

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Weapons of mass prediction Leonardo Egidi a (joint work with Jonah Gabry b , in preparation for

Learning Systems Research at the Intersection of Machine Learning &amp; Data Systems Joseph E.

Sparse dictionary learning in the presence of noise & outliers Rmi Gribonval INRIA Rennes

Speech Processing 15-492/18-492 Multilinguality SPICE: making it easier Dealing with all

Learning Systems Research at the Intersection of Machine Learning & Data Systems Joseph E.