Sparse dictionary learning in the presence of noise & outliers Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France remi.gribonval@inria.fr
Overview • Context: sparse signal processing • Dictionary learning • Statistical guarantees • Flavor of the proof • Conclusion
Sparse signal processing
Sparse Signal / Image Processing + Compression, Source Localization, Separation, Compressed Sensing ...
Typical Sparse Models • Audio: time-frequency representations (MP3) [figure: analysis/synthesis of an audio signal; black = zero coefficients] • Images: wavelet transform (JPEG2000) [figure: analysis/synthesis of an image; white = zero coefficients]
Mathematical expression • Signal / image = high-dimensional vector x ∈ R^d • Model = linear combination of atoms from a dictionary (e.g. time-frequency atoms, wavelets): x ≈ Σ_k z_k d_k = Dz (Mallat & Zhang 93) • Sparsity = small L0 (quasi-)norm: ‖z‖_0 = card{k : z_k ≠ 0}
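As a toy illustration of the synthesis model and the ℓ0 count (not from the slides; all dimensions and values below are arbitrary), a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 8, 20                        # signal dimension, number of atoms
D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)      # unit-norm atoms d_k

z = np.zeros(K)
z[[3, 11, 17]] = [1.5, -0.7, 2.0]   # only 3 active atoms
x = D @ z                           # x = sum_k z_k d_k = D z

l0 = np.count_nonzero(z)            # ||z||_0 = card{k : z_k != 0} = 3
print(x.shape, l0)
```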
CoSparse models and inverse problems [figure: observation domain]
Acoustic Imaging • Ground truth: laser vibrometry (direct optical measures, sequential, 2000 measures) • Nearfield Acoustic Holography (indirect acoustic measures, 120 microphones at a time, 120 x 16 = 1920 measures, Tikhonov regularization) echange.inria.fr
Compressive Nearfield Acoustic Holography • One shot with 120 microphones • Sparse regularization echange.inria.fr
Dictionary learning small-project.eu with K. Schnass, F. Bach, R. Jenatton
Sparse Atomic Decompositions: x ≈ Dz [figure: signal / image ≈ (overcomplete) dictionary of atoms × sparse coefficients]
Data Deluge + Jungle • Sparsity: historically for signals & images; bottleneck = large-scale algorithms • New “exotic” or composite data: graph data (social networks, brain connectivity), hyperspectral and satellite imaging, spherical geometry (cosmology, HRTF / 3D audio), vector-valued data (diffusion tensors); bottleneck = dictionary/operator design/learning
A quest for the perfect sparse model • Patch extraction from a training database yields training patches x_n = D z_n, 1 ≤ n ≤ N, with unknown dictionary D and unknown sparse coefficients z_n • Learned dictionary D̂: edge-like atoms [Olshausen & Field 96, Aharon et al 06, Mairal et al 09, ...] or shifts of edge-like motifs [Blumensath 05, Jost et al 05, ...]
Dictionary Learning = Sparse Matrix Factorization: X ≈ D Z, where X = [x_1, ..., x_N] is d × N, D is d × K, and Z = [z_1, ..., z_N] is K × N with s-sparse columns
Many approaches • Independent component analysis [see e.g. the book by Comon & Jutten 2011] • Convex [Bach et al., 2008; Bradley and Bagnell, 2009] • Submodular [Krause and Cevher, 2010] • Bayesian [Zhou et al., 2009] • Non-convex matrix factorization [Olshausen and Field, 1997; Pearlmutter & Zibulevsky 2001; Aharon et al. 2006; Lee et al., 2007; Mairal et al., 2010; ... and many other authors]
Sparse coding objective function • Given one training sample: Basis Pursuit / LASSO, f_{x_n}(D) = min_{z_n} (1/2) ‖x_n − D z_n‖_2^2 + λ ‖z_n‖_1 • Given N training samples: F_X(D) = (1/N) Σ_{n=1}^N f_{x_n}(D) ∝ min_Z (1/2) ‖X − D Z‖_F^2 + λ ‖Z‖_1
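As a rough sketch (not part of the talk) of how f_{x_n}(D) can be evaluated numerically, here is a plain ISTA iteration for the LASSO; the function names and iteration counts are illustrative, and a dedicated solver such as SPAMS would be used in practice:

```python
import numpy as np

def soft_threshold(v, t):
    # Entrywise soft-thresholding, the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_coding_cost(x, D, lam, n_iter=500):
    """Approximate f_x(D) = min_z 0.5*||x - D z||_2^2 + lam*||z||_1 via ISTA."""
    z = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of 0.5*||x - D z||^2
        z = soft_threshold(z - grad / L, lam / L)
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))
```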
Learning = constrained minimization: D̂ = arg min_{D ∈ 𝒟} F_X(D) ✓ Online learning with the SPAMS library (Mairal et al.) ✓ Constraint = dictionary with unit-norm columns: 𝒟 = { D = [d_1, ..., d_K] : ‖d_k‖_2 = 1 for all k }
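For intuition, a minimal batch sketch of such a constrained minimization (a simple MOD-style alternation with column renormalization, not the online SPAMS algorithm cited on the slide; all parameters are illustrative):

```python
import numpy as np

def normalize_columns(D):
    # Project each atom back onto the unit sphere (the constraint set).
    return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)

def learn_dictionary(X, K, lam, n_outer=50, n_inner=200, seed=0):
    """Alternate between sparse coding (ISTA on Z) and a dictionary update."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    D = normalize_columns(rng.standard_normal((d, K)))
    Z = np.zeros((K, N))
    for _ in range(n_outer):
        # Sparse coding step: ISTA applied to all training samples at once.
        L = np.linalg.norm(D, 2) ** 2
        for _ in range(n_inner):
            V = Z - D.T @ (D @ Z - X) / L
            Z = np.sign(V) * np.maximum(np.abs(V) - lam / L, 0.0)
        # Dictionary step: least-squares update, then renormalize the atoms.
        G = Z @ Z.T + 1e-8 * np.eye(K)
        D = normalize_columns(X @ Z.T @ np.linalg.inv(G))
    return D, Z
```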
Empirical findings
Numerical example (2D): X = D_0 Z_0 with N = 1000 Bernoulli-Gaussian training samples; F_X(D_{θ_0, θ_1}) is plotted over the two angles (θ_0, θ_1) parameterizing the dictionary [figure: contour plot of F_X over (θ_0, θ_1)] Empirical observations: a) Global minima match the angles of the original basis (the symmetry reflects the permutation ambiguity); b) There is no other local minimum.
Sparsity vs coherence (2D): empirical probability that the ground truth is a local minimum / the global minimum, as a function of the sparsity level p and the coherence µ = |cos(θ_1 − θ_0)|, with N = 1000 Bernoulli-Gaussian training samples [figure: phase diagrams from weakly sparse / coherent to sparse / incoherent; no spurious local minima in the sparse, incoherent regime] Rule of thumb, perfect recovery if: a) Incoherence µ < 1 − p; b) Enough training samples (N large enough)
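A small sketch (illustrative, not from the talk) of how the coherence µ in the rule of thumb can be computed for a general dictionary; in the 2D example it reduces to |cos(θ_1 − θ_0)|:

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct unit-norm atoms."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)        # Gram matrix of the normalized atoms
    np.fill_diagonal(G, 0.0)     # discard the trivial self-correlations
    return G.max()

# Two atoms at angles theta0 and theta1: coherence |cos(theta1 - theta0)|.
theta0, theta1 = 0.3, 1.2
D = np.array([[np.cos(theta0), np.cos(theta1)],
              [np.sin(theta0), np.sin(theta1)]])
print(mutual_coherence(D), abs(np.cos(theta1 - theta0)))
```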
Empirical findings • Stable & robust dictionary identification ✓ Global minima often match ground truth ✓ Often, there is no spurious local minimum • Role of parameters? ✓ sparsity of Z? ✓ incoherence of D? ✓ noise level? ✓ presence / nature of outliers? ✓ sample complexity (number of training samples)?
Theoretical guarantees
Theoretical guarantees • Excess risk analysis (~ machine learning) [Maurer and Pontil, 2010; Vainsencher et al., 2010; Mehta and Gray, 2012]: control the excess risk F_X(D̂) − min_D E_X F_X(D) • Identifiability analysis (~ signal processing) [Independent Component Analysis, e.g. the book by Comon & Jutten 2011]: control the identification error ‖D̂ − D_0‖_F ✓ Array processing perspective: dictionary ~ directions of arrival, identification ~ source localization ✓ Neural coding perspective: dictionaries ~ receptive fields
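Since the factorization is only identifiable up to permutation and sign of the atoms, the error ‖D̂ − D_0‖_F is typically computed after matching atoms. A sketch of one possible convention (Hungarian matching on absolute correlations; the cited works may use a different one):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dictionary_recovery_error(D_hat, D0):
    """Frobenius error between D_hat and D0 after optimal permutation/sign match."""
    C = np.abs(D_hat.T @ D0)               # atom-to-atom absolute correlations
    row, col = linear_sum_assignment(-C)   # best one-to-one atom assignment
    signs = np.sign(np.sum(D_hat[:, row] * D0[:, col], axis=0))
    D_aligned = D_hat[:, row] * signs      # flip signs to best match D0
    return np.linalg.norm(D_aligned - D0[:, col])
```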
Theoretical guarantees: overview of signal model assumptions
                         [G. & Schnass 2010]   [Geng & al 2011]   [Jenatton, Bach & G.]
overcomplete (d < K)     no                    yes                yes
outliers                 yes                   no                 yes
noise                    no                    no                 yes
cost function            min_{D,Z} ‖Z‖_1 s.t. D Z = X (noiseless settings); min_D F_X(D) (noisy setting)
Sparse Signal Model • Random support J ⊂ [1, K], |J| = s • Sub-Gaussian i.i.d. coefficients, bounded below: P(|z_i| < z̲) = 0 for some z̲ > 0 • Sub-Gaussian additive noise ε: x = Σ_{i ∈ J} z_i d_i + ε = D_J z_J + ε
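A sketch of one way to sample from this model (the particular distributions below, a folded Gaussian shifted by z̲ and Gaussian noise, are just convenient instances of the sub-Gaussian assumptions, not the exact model of the paper):

```python
import numpy as np

def draw_training_signal(D0, s, z_lower=0.5, noise_std=0.05, rng=None):
    """Draw x = D0[:, J] @ z_J + eps with |J| = s and |z_i| >= z_lower."""
    rng = np.random.default_rng() if rng is None else rng
    d, K = D0.shape
    J = rng.choice(K, size=s, replace=False)     # random support of size s
    g = rng.standard_normal(s)
    z_J = np.sign(g) * (z_lower + np.abs(g))     # coefficients bounded away from zero
    eps = noise_std * rng.standard_normal(d)     # Gaussian, hence sub-Gaussian, noise
    return D0[:, J] @ z_J + eps
```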
Local stability & robustness • Theorem 1: local stability [Jenatton, Bach & G. 2012] ✓ Assumptions: D_0 overcomplete incoherent dictionary with s µ(D_0) ≪ 1; s-sparse sub-Gaussian coefficient model (no outliers) ✓ Conclusion: with high probability there exists a local minimum D̂ of F_X(D) such that ‖D̂ − D_0‖_F ≤ C √(s d K³ · log N / N) • Theorem 2: robustness to noise ✓ technical assumption: bounded coefficient model • Theorem 3: robustness to outliers
Learning Guarantees vs Empirical Findings • Sample complexity: relative error vs. number N of training signals, with the predicted slope (d × d Hadamard dictionary in dimension d, d = 8, 16, 32, random and oracle initializations) • Robustness to noise: relative error vs. noise level (d × 2d Hadamard-Dirac dictionary in dimension d, d = 8, 16, 32) [figure: two log-log plots]
Flavor of the proof
Characterizing local minima (1) • Noiseless setting: the minimum is exactly at the ground truth; F_X(D) − F_X(D_0) is zero at D_0 and is controlled via one-sided directional derivatives • Noisy setting: the minimum is close to the ground truth; F_X(D) − F_X(D_0) is lower-bounded at radius r around D_0 [figures: F_X(D) − F_X(D_0) as a function of D near the ground truth D_0]
Controlling the cost function • Problem: F_X(D) is a sum of complicated functions f_{x_n}(D) = min_{z_n} (1/2) ‖x_n − D z_n‖_2^2 + λ ‖z_n‖_1 • Solution: simplified expression when sparse recovery holds (adaptation from [Fuchs, 2005; Zhao and Yu, 2006; Wainwright, 2009]): for x = D_0 z_0 + ε, f_x(D) = φ_x(D | sign(z_0)) ✓ Approximate cost function Φ_X(D) ≈ F_X(D)
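The simplified expression rests on the classical closed form of the LASSO solution once its support and sign pattern are known. The sketch below (illustrative; it is not the exact Φ_X of the paper) evaluates the per-sample cost under the assumption that the solution has support J with signs sign(z_{0,J}):

```python
import numpy as np

def phi_x(D, x, J, signs, lam):
    """Per-sample cost assuming the LASSO solution is supported on J with given signs.

    Standard closed form: z_J = (D_J^T D_J)^{-1} (D_J^T x - lam * signs),
    valid only when this vector indeed has the prescribed signs (to be checked).
    """
    DJ = D[:, J]
    z_J = np.linalg.solve(DJ.T @ DJ, DJ.T @ x - lam * np.asarray(signs, float))
    residual = x - DJ @ z_J
    return 0.5 * residual @ residual + lam * np.sum(np.abs(z_J))
```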