Learning = Constrained Minimization
$\hat{D} = \arg\min_{D \in \mathcal{D}} F_X(D)$, where $F_X(D) = \min_Z \tfrac{1}{2}\|X - DZ\|_F^2 + \Phi(Z)$
• Without a constraint set $\mathcal{D}$: degenerate solution $D \to \infty$, $Z \to 0$
• Typical constraint = unit-norm columns: $\mathcal{D} = \{ D = [d_1, \ldots, d_K] : \forall k,\ \|d_k\|_2 = 1 \}$
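The degeneracy is easy to see for an absolutely homogeneous penalty such as the l1 norm: rescaling (D, Z) to (cD, Z/c) leaves the data-fit term unchanged while driving the penalty to zero. A minimal NumPy sketch of the usual remedy, projecting the atoms back onto the unit sphere after each update (the helper name is illustrative, not from the slides):

```python
import numpy as np

def normalize_columns(D, eps=1e-12):
    """Project a dictionary onto the unit-norm constraint set
    D = {D : ||d_k||_2 = 1 for all k} by rescaling each atom."""
    return D / np.maximum(np.linalg.norm(D, axis=0), eps)

# Degeneracy without the constraint (for an l1-type penalty Phi):
# replacing (D, Z) by (c*D, Z/c) leaves ||X - D Z||_F unchanged while
# Phi(Z/c) -> 0 as c -> infinity, so the penalty can be made arbitrarily small.
```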
A versatile matrix factorization framework
• Sparse coding (typically d < K)
  ✦ penalty: L1 norm
  ✦ constraint: unit-norm dictionary
• K-means clustering
  ✦ penalty: indicator function of the canonical basis vectors
  ✦ constraint: none
• NMF (non-negative matrix factorization) (d > K)
  ✦ penalty: indicator function of non-negative coefficients
  ✦ constraint: unit-norm non-negative dictionary
• PCA (typically d > K)
  ✦ penalty: none
  ✦ constraint: dictionary with orthonormal columns
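To make the versatility claim concrete for one row of this list: with the K-means penalty (each column of Z forced to be a canonical basis vector) and no constraint on D, the alternating scheme of the following slides reduces to the familiar assign-then-average iteration. A minimal sketch under those assumptions (function name illustrative):

```python
import numpy as np

def kmeans_as_matrix_factorization(X, K, n_iter=20, seed=0):
    """K-means seen as X ~ D Z where each column of Z is constrained to be a
    canonical basis vector (the indicator penalty above) and D is unconstrained."""
    d, N = X.shape
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(N, size=K, replace=False)].astype(float).copy()
    for _ in range(n_iter):
        # coefficient update: the indicator penalty forces a hard assignment to the closest atom
        dist2 = ((X[:, None, :] - D[:, :, None]) ** 2).sum(axis=0)   # shape (K, N)
        labels = dist2.argmin(axis=0)
        # dictionary update: least squares given Z reduces to per-cluster means
        for k in range(K):
            if np.any(labels == k):
                D[:, k] = X[:, labels == k].mean(axis=1)
    return D, labels
```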
Algorithms for penalized matrix factorization
Principle: Alternate Optimization
• Global objective: $\min_{D, Z} \tfrac{1}{2}\|X - DZ\|_F^2 + \Phi(Z)$
• Alternate two steps (a minimal end-to-end sketch follows)
  ✓ Update coefficients given the current dictionary $D$: $\min_{z_i} \tfrac{1}{2}\|x_i - D z_i\|_2^2 + \phi(z_i)$
  ✓ Update the dictionary given the current coefficients $Z$: $\min_{D} \tfrac{1}{2}\|X - DZ\|_F^2$
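A minimal, self-contained sketch of this alternating scheme, with illustrative choices rather than the author's code: ISTA for the l1-penalized coefficient step, plain least squares for the dictionary step, and unit-norm projection for the constraint.

```python
import numpy as np

def alternate_dictionary_learning(X, K, lam=0.1, n_outer=30, n_ista=50, seed=0):
    d, N = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
    Z = np.zeros((K, N))
    for _ in range(n_outer):
        # --- coefficient update: min_Z 0.5*||X - DZ||_F^2 + lam*||Z||_1 (ISTA) ---
        L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
        for _ in range(n_ista):
            G = D.T @ (D @ Z - X)                 # gradient of the data-fit term
            Z = Z - G / L
            Z = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)   # soft threshold
        # --- dictionary update: min_D 0.5*||X - DZ||_F^2 (least squares) ---
        D = np.linalg.lstsq(Z.T, X.T, rcond=None)[0].T
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)           # re-project onto unit norm
    return D, Z
```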
Coefficient Update = Sparse Coding
• Objective: $\min_{z_i} \tfrac{1}{2}\|x_i - D z_i\|_2^2 + \phi(z_i)$
• Two strategies
  ✓ Batch: for all training samples i at each iteration
  ✓ Online: for one (randomly selected) training sample i
• Implementation: any sparse coding algorithm
  ✓ L1 minimization, (Orthogonal) Matching Pursuit, ... (a minimal OMP sketch follows)
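For the greedy option, a minimal Orthogonal Matching Pursuit sketch (the target sparsity s is a user parameter; unit-norm atoms are assumed):

```python
import numpy as np

def omp(x, D, s):
    """Minimal Orthogonal Matching Pursuit: greedily pick s atoms,
    re-fitting the coefficients on the selected support by least squares."""
    residual = x.copy()
    support = []
    z = np.zeros(D.shape[1])
    for _ in range(s):
        k = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if k not in support:
            support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    z[support] = coeffs
    return z
```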
Dictionary Update
• Objective: $\min_{D} \tfrac{1}{2}\|X - DZ\|_F^2$
• Main approaches (a MOD sketch follows)
  ✓ Method of Optimal Directions (MOD) [Engan et al., 1999]: $\hat{D} = \arg\min_D \|X - DZ\|_F^2 = X \cdot \mathrm{pinv}(Z)$
  ✓ K-SVD: atom-by-atom update via SVD (PCA) [Aharon et al., 2006]
    ✦ the corresponding coefficients are jointly updated
  ✓ Online L1: stochastic gradient [Engan et al., 2007; Mairal et al., 2010]
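The MOD update is essentially a one-liner. A sketch, with the usual renormalization of the atoms afterwards (the renormalization enforces the unit-norm constraint and is not part of the least-squares formula itself):

```python
import numpy as np

def mod_update(X, Z, eps=1e-12):
    """Method of Optimal Directions: closed-form dictionary update
    D = X pinv(Z), followed by renormalization of the atoms."""
    D = X @ np.linalg.pinv(Z)
    return D / np.maximum(np.linalg.norm(D, axis=0), eps)
```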
... but also
• Related «learning» matrix factorizations
  ✓ Non-negativity (NMF): multiplicative updates [Lee & Seung, 1999]
  ✓ Known rows up to gains (blind calibration), $D = \mathrm{diag}(g)\, D_0$
    ✦ convex formulation [G. & al, 2012; Bilen & al, 2013]
  ✓ Known rows up to permutation (cable chaos), $D = \Pi D_0$
    ✦ branch & bound [Emiya & al, 2014]
• (Approximate) Message Passing [e.g. Krzakala & al, 2013]
Analytic vs Learned Dictionaries
Learning Fast Transforms (Ph.D. of Luc Le Magoarou)
Analytic vs Learned Dictionaries

  Dictionary                          Adaptation to Training Data    Computational Complexity
  Analytic (Fourier, wavelets, ...)   No                             Low
  Learned                             Yes                            High

Best of both worlds?
Sparse-KSVD
• Principle: constrained dictionary learning
  ✓ choose a reference (fast) dictionary $D_0$ (strong prior!)
  ✓ learn with the constraint $D = D_0 S$, where $S$ is sparse
• Resulting double-sparse factorization problem, with two unknown factors: $X \approx D_0 S Z$
• [R. Rubinstein, M. Zibulevsky & M. Elad, "Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation," IEEE TSP, vol. 58, no. 3, pp. 1553–1564, 2010]
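To see why the double-sparse structure pays off: applying D = D0 S costs one sparse multiply plus one multiply by the base dictionary, and only the sparse S needs to be stored. A small sketch assuming an orthonormal DCT base dictionary and a random sparse S, both of which are illustrative placeholders rather than the learned factors of the paper:

```python
import numpy as np
from scipy.fft import dct
from scipy import sparse

d, K, s_atom = 64, 128, 6                      # illustrative sizes; s_atom nonzeros per atom
D0 = dct(np.eye(d), axis=0, norm="ortho")      # orthonormal DCT matrix used as the base dictionary
rng = np.random.default_rng(0)
rows = rng.integers(0, d, size=K * s_atom)
cols = np.repeat(np.arange(K), s_atom)
S = sparse.csc_matrix((rng.standard_normal(K * s_atom), (rows, cols)), shape=(d, K))

z = rng.standard_normal(K)
x_fast = D0 @ (S @ z)                          # apply D = D0 S without ever forming it densely
x_dense = (D0 @ S.toarray()) @ z
assert np.allclose(x_fast, x_dense)
print("stored nonzeros in S:", S.nnz, "vs dense dictionary entries:", d * K)
```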
Speed = Factorizable Structure
• Fourier: FFT, butterfly algorithm
• Wavelets: FWT, tree of filter banks
• Hadamard: Fast Hadamard Transform
Learning Fast Transforms = Chasing Butterflies
• Class of dictionaries of the form $D = \prod_{j=1}^{M} S_j$, a product of sparse factors
  ✓ covers standard fast transforms
  ✓ more flexible, better adaptation to training data
  ✓ benefits:
    ✦ Speed: inverse problems and more
    ✦ Storage: compression
    ✦ Statistical significance / sample complexity: denoising
• Learning:
  ✓ Nonconvex optimization algorithm: PALM
    ✦ guaranteed convergence to a stationary point
  ✓ Hierarchical strategy
Example 1: Reverse-Engineering the Fast Hadamard Transform
• Hadamard dictionary, reference factorization: $2 n \log_2 n$ nonzero entries instead of $n^2$
• Learned factorization: different from the reference, but just as fast ($2 n \log_2 n$ instead of $n^2$); tested up to n = 1024
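For reference, the known butterfly factorization that the learning algorithm rediscovers in a different but equally fast form (this sketch checks the factorization itself; it is not the learning algorithm): the Sylvester Hadamard matrix of size n = 2^m factors exactly into log2(n) sparse factors with 2 nonzeros per row, i.e. 2n log2(n) nonzeros in total instead of n^2.

```python
import numpy as np
from functools import reduce
from scipy.linalg import hadamard

m = 5
n = 2 ** m
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
# Reference butterfly factorization: H_{2^m} = prod_{i=0}^{m-1} (I_{2^i} kron H2 kron I_{2^{m-1-i}})
factors = [np.kron(np.kron(np.eye(2 ** i), H2), np.eye(2 ** (m - 1 - i))) for i in range(m)]
assert np.allclose(reduce(np.matmul, factors), hadamard(n))
nnz = sum(np.count_nonzero(F) for F in factors)      # 2 nonzeros per row in each factor
print(f"{nnz} nonzeros in the factors vs {n * n} entries in the dense matrix")
```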
Example 2: Image Denoising with Learned Fast Transform
• Patch-based dictionary learning (n = 8×8 pixels)
• Comparison using the toolbox from small-project.eu
• Learned dictionaries: $O(n^2)$ for the dense dictionary vs $O(n \log_2 n)$ for the learned fast transform
Comparison with Sparse KSVD (KSVDS)
$D = D_0 S$, very close to $D_0$ = DCT
Statistical guarantees
Theoretical Guarantees ?
• Given N training samples in X: $\hat{D}_N \in \arg\min_D F_X(D)$
• Applications: compression, denoising, calibration, source localization, neural coding, inverse problems, ...
• Two complementary viewpoints:
  ✓ Excess risk analysis (~Machine Learning)
    ✦ no «ground truth dictionary»
    ✦ goal = performance generalization: $\mathbb{E}\, F_X(\hat{D}_N) \le \min_{D} \mathbb{E}\, F_X(D) + \eta_N$
    ✦ question: «how many training samples?»
    ✦ [Maurer & Pontil, 2010; Vainsencher & al., 2010; Mehta & Gray, 2012; G. & al, 2013]
  ✓ Identifiability analysis (~Signal Processing)
    ✦ ground truth model: $x = D_0 z + \varepsilon$
    ✦ goal = dictionary estimation: control $\|\hat{D}_N - D_0\|_F$
    ✦ question: what recovery conditions?
    ✦ [Independent Component Analysis, e.g. the book by Comon & Jutten, 2011]
Theorem: Excess Risk Control
• Assume:
  ✓ X obtained from N i.i.d. draws, bounded: $\mathbb{P}(\|x\|_2 \le 1) = 1$
  ✓ Penalty function $\phi(z)$:
    ✦ non-negative, with minimum at zero
    ✦ lower semi-continuous
    ✦ coercive
  ✓ Constraint set $\mathcal{D}$ with (upper box-counting) dimension $h$
    ✦ typically $h = dK$ (d = signal dimension, K = number of atoms)
• Then: with probability at least $1 - 2e^{-x}$ on X,
  $\mathbb{E}\, F_X(\hat{D}_N) \le \min_D \mathbb{E}\, F_X(D) + \eta_N$, with $\eta_N \le C \sqrt{\dfrac{(h + x)\log N}{N}}$
• [G. & al, Sample Complexity of Dictionary Learning and Other Matrix Factorizations, 2013, arXiv/HAL]
A word about the proof
• Classical approach based on three ingredients
  ✓ Concentration of $F_X(D)$ around its mean $\mathbb{E}_x f_x(D)$
  ✓ Lipschitz behaviour of $D \mapsto F_X(D)$
    ➡ main technical contribution, under assumptions on the penalty
  ✓ Union bound using covering numbers
• High dimensional scaling $d \to \infty$
  ✓ Dimension-dependent bound: $O\!\left(\sqrt{\dfrac{dK \log N}{N}}\right)$
  ✓ With Rademacher complexities & Slepian's Lemma, one can recover known dimension-independent bounds
  ✓ E.g., for PCA: $O\!\left(\sqrt{\dfrac{K^2}{N}}\right)$
Versatility of the Sample Complexity Results
• General penalty functions
  ✦ l1 norm / mixed norms / lp quasi-norms
  ✦ ... but also non-coercive penalties (with an additional RIP on the constraint set): s-sparse constraint, non-negativity
• General constraint sets
  ✦ unit norm / sparse / shift-invariant / tensor product / tight frame ...
  ✦ «complexity» captured by the box-counting dimension
• «Distribution free»
  ✦ bounded samples: $\mathbb{P}(\|x\|_2 \le 1) = 1$
  ✦ ... but also sub-Gaussian: $\mathbb{P}(\|x\|_2 \ge At) \le \exp(-t),\ t \ge 1$
• Selected covered examples: PCA / NMF / K-means / sparse PCA
Identifiability analysis? Empirical findings
Numerical Example (2D)
• Training data: $X = D_0 Z_0$, N = 1000 Bernoulli-Gaussian training samples (2D scatter plot of the samples)
• Candidate dictionaries $D_{\theta_0, \theta_1}$ parameterized by the angles $\theta_0, \theta_1$ of the two atoms
• Plotted criterion: $F_X(D_{\theta_0,\theta_1}) = \|D_{\theta_0,\theta_1}^{-1} X\|_1$ as a function of $(\theta_0, \theta_1)$
• Symmetry = permutation ambiguity
• Empirical observations (a sketch reproducing the experiment follows):
  a) Global minima match the angles of the original basis
  b) There is no other local minimum.
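A minimal sketch reproducing this experiment under the stated Bernoulli-Gaussian model (parameter values are illustrative): generate X = D0 Z0, then evaluate the l1 criterion on a grid of angles and locate its minimizers.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 0.3                                   # illustrative sparsity level p
theta_true = np.array([0.0, np.pi / 2])            # ground-truth atom angles (canonical basis)

def basis(t0, t1):
    return np.array([[np.cos(t0), np.cos(t1)],
                     [np.sin(t0), np.sin(t1)]])

# Bernoulli-Gaussian coefficients and training data X = D0 Z0
Z0 = rng.standard_normal((2, N)) * (rng.random((2, N)) < p)
X = basis(*theta_true) @ Z0

# l1 criterion ||D^{-1} X||_1 on a grid of angles (theta0, theta1)
thetas = np.linspace(0, np.pi, 90, endpoint=False)
F = np.full((len(thetas), len(thetas)), np.inf)
for i, t0 in enumerate(thetas):
    for j, t1 in enumerate(thetas):
        D = basis(t0, t1)
        if abs(np.linalg.det(D)) > 1e-3:           # skip near-singular bases
            F[i, j] = np.abs(np.linalg.solve(D, X)).sum()

i_best, j_best = np.unravel_index(np.argmin(F), F.shape)
# Expected to match theta_true up to permutation (and sign, handled by the [0, pi) range)
print("recovered angles:", thetas[i_best], thetas[j_best])
```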
Sparsity vs Coherence (2D)
• Setup: N = 1000 Bernoulli-Gaussian training samples; sparsity parameter p ∈ [0, 1] (from sparse to weakly sparse) vs coherence $\mu = |\cos(\theta_1 - \theta_0)| \in [0, 1]$ (from incoherent to coherent)
• Measured: empirical probability of success, both for «ground truth = local min» and «ground truth = global min»; in the favourable region there is no spurious local minimum
• Rule of thumb: perfect recovery if
  a) Incoherence: $\mu < 1 - p$
  b) Enough training samples (N large enough)
Empirical Findings
• Stable & robust dictionary identification
  ✓ Global minima often match the ground truth
  ✓ Often, there is no spurious local minimum
• Role of the parameters?
  ✓ sparsity level?
  ✓ incoherence of D?
  ✓ noise level?
  ✓ presence / nature of outliers?
  ✓ sample complexity (number of training samples)?
Identifiability Analysis: Overview

  Signal model            [G. & Schnass 2010]            [Geng & al 2011]               [Jenatton, Bach & G.]
  overcomplete (d < K)    no                             yes                            yes
  outliers                yes                            no                             yes
  noise                   no                             no                             yes
  cost function           min_{D,Z} ‖Z‖_1 s.t. DZ = X    min_{D,Z} ‖Z‖_1 s.t. DZ = X    min_D F_X(D), φ(z) = λ‖z‖_1

See also: [Spielman & al 2012; Agarwal & al 2013/2014; Arora & al 2013/2014; Schnass 2013; Schnass 2014]
Theoretical Guarantees? (recap)
• Excess risk analysis («how many training samples?»): addressed above.
• Next, the identifiability analysis: under the ground truth model $x = D_0 z + \varepsilon$, what recovery conditions control $\|\hat{D}_N - D_0\|_F$?
«Ground Truth» = Sparse Signal Model
$x = \sum_{i \in J} z_i d_i + \varepsilon = D_J z_J + \varepsilon$
• Random support $J \subset [1, K]$, $\#J = s$
• Coefficient vector bounded, and bounded away from zero: $\mathbb{P}(\|z_J\|_2 > M_z) = 0$, $\mathbb{P}(\min_{j \in J} |z_j| < \underline{z}) = 0$
• Bounded & white noise: $\mathbb{P}(\|\varepsilon\|_2 > M_\varepsilon) = 0$
  ✓ (+ second moment assumptions)
• NB: $z$ is not required to have i.i.d. entries
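A sketch of a generator for this signal model, handy for identifiability experiments; the bounds and the way they are enforced here are illustrative choices, and the function name is not from the slides:

```python
import numpy as np

def sample_sparse_signal(D0, s, M_z=1.0, z_min=0.05, M_eps=0.01, rng=None):
    """Draw x = D0[:, J] z_J + eps with random s-sparse support J, coefficients
    bounded above (||z_J||_2 <= M_z) and below (min |z_j| >= z_min), bounded noise.
    Assumes z_min <= M_z / sqrt(s) so that both coefficient bounds can hold."""
    rng = np.random.default_rng() if rng is None else rng
    d, K = D0.shape
    J = rng.choice(K, size=s, replace=False)                          # random support, #J = s
    signs = rng.choice([-1.0, 1.0], size=s)
    z_J = signs * rng.uniform(z_min, M_z / np.sqrt(s), size=s)        # bounded above and below
    eps = rng.standard_normal(d)
    eps *= rng.uniform(0.0, M_eps) / max(np.linalg.norm(eps), 1e-12)  # ||eps||_2 <= M_eps
    return D0[:, J] @ z_J + eps, J, z_J
```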
Theorem: Robust Local Identifiability [Jenatton, Bach & G. 2012]
• Assume:
  ✦ dictionary with small coherence $\mu(D_0) = \max_{i \ne j} |\langle d_i, d_j \rangle| \in [0, 1]$
  ✦ s-sparse coefficient model (no outliers, no noise), with $s \lesssim \dfrac{1}{\mu(D_0)\, |||D_0|||_2}$ (where $|||\cdot|||_2$ denotes the operator norm)
• Then: consider $F_X(D) = \min_Z \tfrac{1}{2}\|X - DZ\|_F^2 + \lambda \|Z\|_{1,1}$
  ✓ for any small enough $\lambda$, with high probability on $X$, there is a local minimum $\hat{D}$ of $F_X(D)$ such that $\|\hat{D} - D_0\|_F \le O(\lambda\, s\, \mu\, |||D_0|||_2)$
• + stability to noise
• + finite sample results
• + robustness to outliers
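A small helper for checking the assumptions on a candidate D0 (illustrative sketch; the constant hidden in the ≲ is not made explicit, so the returned budget is only an order of magnitude):

```python
import numpy as np

def coherence_and_sparsity_budget(D0):
    """Coherence mu(D0) = max_{i != j} |<d_i, d_j>| after normalizing the atoms,
    plus the rough sparsity budget 1 / (mu(D0) * |||D0|||_2) from the assumption."""
    D = D0 / np.linalg.norm(D0, axis=0)
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    mu = float(G.max())
    op_norm = np.linalg.norm(D, 2)          # spectral (operator) norm |||D|||_2
    budget = np.inf if mu == 0 else 1.0 / (mu * op_norm)
    return mu, budget
```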