Sparse dictionary learning in the presence of noise & outliers Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France remi.gribonval@inria.fr
Overview • Context: sparse signal processing • Dictionary learning • Statistical guarantees • Flavor of the proof • Conclusion
Sparse signal processing
Sparse Signal / Image Processing + Compression, Source Localization, Separation, Compressed Sensing ...
Typical Sparse Models • Audio: time-frequency representations (MP3) [figure: analysis/synthesis of an audio signal; black = zero coefficients] • Images: wavelet transform (JPEG2000) [figure: analysis/synthesis of an image; white = zero coefficients]
Mathematical expression • Signal / image = high-dimensional vector x ∈ R^d • Model = linear combination of atoms from a dictionary (e.g. time-frequency atoms, wavelets): x ≈ Σ_k z_k d_k = Dz (Mallat & Zhang 93) • Sparsity = small L0 (quasi-)norm: ‖z‖_0 = card{k : z_k ≠ 0}
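As a toy illustration of the synthesis model and the ℓ0 count (not from the slides; all dimensions and values below are arbitrary), a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 8, 20                        # signal dimension, number of atoms
D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)      # unit-norm atoms d_k

z = np.zeros(K)
z[[3, 11, 17]] = [1.5, -0.7, 2.0]   # only 3 active atoms
x = D @ z                           # x = sum_k z_k d_k = D z

l0 = np.count_nonzero(z)            # ||z||_0 = card{k : z_k != 0} = 3
print(x.shape, l0)
```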
CoSparse models and inverse problems [figure: observation domain]
Acoustic Imaging • Ground truth: laser vibrometry (direct optical measures, sequential, 2000 measures) • Nearfield Acoustic Holography (indirect acoustic measures, 120 microphones at a time, 120 x 16 = 1920 measures, Tikhonov regularization) echange.inria.fr
Compressive Nearfield Acoustic Holography • One shot with 120 microphones • Sparse regularization echange.inria.fr
Dictionary learning small-project.eu with K. Schnass, F. Bach, R. Jenatton
Sparse Atomic Decompositions: x ≈ Dz [figure: signal / image ≈ (overcomplete) dictionary of atoms × sparse coefficients]
Data Deluge + Jungle • Sparsity: historically for signals & images; bottleneck = large-scale algorithms • New “exotic” or composite data: graph data (social networks, brain connectivity), hyperspectral and satellite imaging, spherical geometry (cosmology, HRTF / 3D audio), vector-valued data (diffusion tensors); bottleneck = dictionary/operator design/learning
A quest for the perfect sparse model • Patch extraction from a training database yields training patches x_n = D z_n, 1 ≤ n ≤ N, with unknown dictionary D and unknown sparse coefficients z_n • Learned dictionary D̂: edge-like atoms [Olshausen & Field 96, Aharon et al 06, Mairal et al 09, ...] or shifts of edge-like motifs [Blumensath 05, Jost et al 05, ...]
Dictionary Learning = Sparse Matrix Factorization: X ≈ D Z, where X = [x_1, ..., x_N] is d × N, D is d × K, and Z = [z_1, ..., z_N] is K × N with s-sparse columns
Many approaches • Independent component analysis [see e.g. the book by Comon & Jutten 2011] • Convex [Bach et al., 2008; Bradley and Bagnell, 2009] • Submodular [Krause and Cevher, 2010] • Bayesian [Zhou et al., 2009] • Non-convex matrix factorization [Olshausen and Field, 1997; Pearlmutter & Zibulevsky 2001; Aharon et al. 2006; Lee et al., 2007; Mairal et al., 2010; ... and many other authors]
Sparse coding objective function • Given one training sample: Basis Pursuit / LASSO, f_{x_n}(D) = min_{z_n} (1/2) ‖x_n − D z_n‖_2^2 + λ ‖z_n‖_1 • Given N training samples: F_X(D) = (1/N) Σ_{n=1}^N f_{x_n}(D) ∝ min_Z (1/2) ‖X − D Z‖_F^2 + λ ‖Z‖_1
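As a rough sketch (not part of the talk) of how f_{x_n}(D) can be evaluated numerically, here is a plain ISTA iteration for the LASSO; the function names and iteration counts are illustrative, and a dedicated solver such as SPAMS would be used in practice:

```python
import numpy as np

def soft_threshold(v, t):
    # Entrywise soft-thresholding, the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_coding_cost(x, D, lam, n_iter=500):
    """Approximate f_x(D) = min_z 0.5*||x - D z||_2^2 + lam*||z||_1 via ISTA."""
    z = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of 0.5*||x - D z||^2
        z = soft_threshold(z - grad / L, lam / L)
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))
```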
Learning = constrained minimization: D̂ = arg min_{D ∈ 𝒟} F_X(D) ✓ Online learning with the SPAMS library (Mairal et al.) ✓ Constraint = dictionary with unit-norm columns: 𝒟 = { D = [d_1, ..., d_K] : ‖d_k‖_2 = 1 for all k }
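For intuition, a minimal batch sketch of such a constrained minimization (a simple MOD-style alternation with column renormalization, not the online SPAMS algorithm cited on the slide; all parameters are illustrative):

```python
import numpy as np

def normalize_columns(D):
    # Project each atom back onto the unit sphere (the constraint set).
    return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)

def learn_dictionary(X, K, lam, n_outer=50, n_inner=200, seed=0):
    """Alternate between sparse coding (ISTA on Z) and a dictionary update."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    D = normalize_columns(rng.standard_normal((d, K)))
    Z = np.zeros((K, N))
    for _ in range(n_outer):
        # Sparse coding step: ISTA applied to all training samples at once.
        L = np.linalg.norm(D, 2) ** 2
        for _ in range(n_inner):
            V = Z - D.T @ (D @ Z - X) / L
            Z = np.sign(V) * np.maximum(np.abs(V) - lam / L, 0.0)
        # Dictionary step: least-squares update, then renormalize the atoms.
        G = Z @ Z.T + 1e-8 * np.eye(K)
        D = normalize_columns(X @ Z.T @ np.linalg.inv(G))
    return D, Z
```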
Empirical findings
Numerical example (2D): X = D_0 Z_0 with N = 1000 Bernoulli-Gaussian training samples; F_X(D_{θ_0, θ_1}) is plotted over the two angles (θ_0, θ_1) parameterizing the dictionary [figure: contour plot of F_X over (θ_0, θ_1)] Empirical observations: a) Global minima match the angles of the original basis (the symmetry reflects the permutation ambiguity); b) There is no other local minimum.
Sparsity vs coherence (2D): empirical probability that the ground truth is a local minimum / the global minimum, as a function of the sparsity level p and the coherence µ = |cos(θ_1 − θ_0)|, with N = 1000 Bernoulli-Gaussian training samples [figure: phase diagrams from weakly sparse / coherent to sparse / incoherent; no spurious local minima in the sparse, incoherent regime] Rule of thumb, perfect recovery if: a) Incoherence µ < 1 − p; b) Enough training samples (N large enough)
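A small sketch (illustrative, not from the talk) of how the coherence µ in the rule of thumb can be computed for a general dictionary; in the 2D example it reduces to |cos(θ_1 − θ_0)|:

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct unit-norm atoms."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)        # Gram matrix of the normalized atoms
    np.fill_diagonal(G, 0.0)     # discard the trivial self-correlations
    return G.max()

# Two atoms at angles theta0 and theta1: coherence |cos(theta1 - theta0)|.
theta0, theta1 = 0.3, 1.2
D = np.array([[np.cos(theta0), np.cos(theta1)],
              [np.sin(theta0), np.sin(theta1)]])
print(mutual_coherence(D), abs(np.cos(theta1 - theta0)))
```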
Empirical findings • Stable & robust dictionary identification ✓ Global minima often match ground truth ✓ Often, there is no spurious local minimum • Role of parameters? ✓ sparsity of Z? ✓ incoherence of D? ✓ noise level? ✓ presence / nature of outliers? ✓ sample complexity (number of training samples)?
Theoretical guarantees
Theoretical guarantees • Excess risk analysis (~ machine learning) [Maurer and Pontil, 2010; Vainsencher et al., 2010; Mehta and Gray, 2012]: control the excess risk F_X(D̂) − min_D E_X F_X(D) • Identifiability analysis (~ signal processing) [Independent Component Analysis, e.g. the book by Comon & Jutten 2011]: control the identification error ‖D̂ − D_0‖_F ✓ Array processing perspective: dictionary ~ directions of arrival, identification ~ source localization ✓ Neural coding perspective: dictionaries ~ receptive fields
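Since the factorization is only identifiable up to permutation and sign of the atoms, the error ‖D̂ − D_0‖_F is typically computed after matching atoms. A sketch of one possible convention (Hungarian matching on absolute correlations; the cited works may use a different one):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dictionary_recovery_error(D_hat, D0):
    """Frobenius error between D_hat and D0 after optimal permutation/sign match."""
    C = np.abs(D_hat.T @ D0)               # atom-to-atom absolute correlations
    row, col = linear_sum_assignment(-C)   # best one-to-one atom assignment
    signs = np.sign(np.sum(D_hat[:, row] * D0[:, col], axis=0))
    D_aligned = D_hat[:, row] * signs      # flip signs to best match D0
    return np.linalg.norm(D_aligned - D0[:, col])
```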
Theoretical guarantees: overview of signal model assumptions
                         [G. & Schnass 2010]   [Geng & al 2011]   [Jenatton, Bach & G.]
overcomplete (d < K)     no                    yes                yes
outliers                 yes                   no                 yes
noise                    no                    no                 yes
cost function            min_{D,Z} ‖Z‖_1 s.t. D Z = X (noiseless settings); min_D F_X(D) (noisy setting)
Sparse Signal Model • Random support J ⊂ [1, K], |J| = s • Sub-Gaussian i.i.d. coefficients, bounded below: P(|z_i| < z̲) = 0 for some z̲ > 0 • Sub-Gaussian additive noise ε: x = Σ_{i ∈ J} z_i d_i + ε = D_J z_J + ε
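A sketch of one way to sample from this model (the particular distributions below, a folded Gaussian shifted by z̲ and Gaussian noise, are just convenient instances of the sub-Gaussian assumptions, not the exact model of the paper):

```python
import numpy as np

def draw_training_signal(D0, s, z_lower=0.5, noise_std=0.05, rng=None):
    """Draw x = D0[:, J] @ z_J + eps with |J| = s and |z_i| >= z_lower."""
    rng = np.random.default_rng() if rng is None else rng
    d, K = D0.shape
    J = rng.choice(K, size=s, replace=False)     # random support of size s
    g = rng.standard_normal(s)
    z_J = np.sign(g) * (z_lower + np.abs(g))     # coefficients bounded away from zero
    eps = noise_std * rng.standard_normal(d)     # Gaussian, hence sub-Gaussian, noise
    return D0[:, J] @ z_J + eps
```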
Local stability & robustness • Theorem 1: local stability [Jenatton, Bach & G. 2012] ✓ Assumptions: D_0 overcomplete incoherent dictionary with s µ(D_0) ≪ 1; s-sparse sub-Gaussian coefficient model (no outliers) ✓ Conclusion: with high probability there exists a local minimum D̂ of F_X(D) such that ‖D̂ − D_0‖_F ≤ C √(s d K³ · log N / N) • Theorem 2: robustness to noise ✓ technical assumption: bounded coefficient model • Theorem 3: robustness to outliers
Learning Guarantees vs Empirical Findings • Sample complexity: relative error vs. number N of training signals, with the predicted slope (d × d Hadamard dictionary in dimension d, d = 8, 16, 32, random and oracle initializations) • Robustness to noise: relative error vs. noise level (d × 2d Hadamard-Dirac dictionary in dimension d, d = 8, 16, 32) [figure: two log-log plots]
Flavor of the proof
Characterizing local minima (1) • Noiseless setting: the minimum is exactly at the ground truth; F_X(D) − F_X(D_0) is zero at D_0 and is controlled via one-sided directional derivatives • Noisy setting: the minimum is close to the ground truth; F_X(D) − F_X(D_0) is lower-bounded at radius r around D_0 [figures: F_X(D) − F_X(D_0) as a function of D near the ground truth D_0]
Controlling the cost function • Problem: F_X(D) is a sum of complicated functions f_{x_n}(D) = min_{z_n} (1/2) ‖x_n − D z_n‖_2^2 + λ ‖z_n‖_1 • Solution: simplified expression when sparse recovery holds (adaptation from [Fuchs, 2005; Zhao and Yu, 2006; Wainwright, 2009]): for x = D_0 z_0 + ε, f_x(D) = φ_x(D | sign(z_0)) ✓ Approximate cost function Φ_X(D) ≈ F_X(D)
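The simplified expression rests on the classical closed form of the LASSO solution once its support and sign pattern are known. The sketch below (illustrative; it is not the exact Φ_X of the paper) evaluates the per-sample cost under the assumption that the solution has support J with signs sign(z_{0,J}):

```python
import numpy as np

def phi_x(D, x, J, signs, lam):
    """Per-sample cost assuming the LASSO solution is supported on J with given signs.

    Standard closed form: z_J = (D_J^T D_J)^{-1} (D_J^T x - lam * signs),
    valid only when this vector indeed has the prescribed signs (to be checked).
    """
    DJ = D[:, J]
    z_J = np.linalg.solve(DJ.T @ DJ, DJ.T @ x - lam * np.asarray(signs, float))
    residual = x - DJ @ z_J
    return 0.5 * residual @ residual + lam * np.sum(np.abs(z_J))
```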