Proximal Identification and Applications
  1. Proximal Identification and Applications

Jérôme MALICK, CNRS, Lab. J. Kuntzmann, Grenoble (France)
Workshop Optimization for Machine Learning, Luminy, March 2020
Talk based on material from joint work with G. Peyré, J. Fadili, G. Garrigos, F. Iutzeler, D. Grishchenko

  2.–5. Example of stability

    min_{x ∈ R^d}  (1/2) ‖Ax − y‖² + λ ‖x‖₁        (LASSO)

Stability: the support of optimal solutions is stable under small perturbations.
Illustration on an instance with d = 2. [Figure: level sets of the LASSO objective for a sequence of slightly perturbed instances; the minimizer moves but its support does not change.]
More generally: [Lewis ’02], sensitivity analysis of partly-smooth functions (recall Clarice's talk this morning).
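The support stability above can be checked numerically. The sketch below is not from the talk; the solver (a plain proximal-gradient/ISTA loop), the instance, and the tolerances are all illustrative choices. It solves the LASSO twice, with the data vector slightly perturbed, and compares the supports of the two solutions.

```python
import numpy as np

def ista(A, y, lam, n_iter=5000):
    """Basic proximal-gradient (ISTA) for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L             # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

def supp(x, tol=1e-8):
    return set(np.flatnonzero(np.abs(x) > tol))

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
x_true = np.zeros(8); x_true[[1, 4]] = [2.0, -1.5]          # sparse ground truth
y = A @ x_true + 0.01 * rng.standard_normal(20)

x0 = ista(A, y, lam=1.0)                                     # unperturbed problem
x1 = ista(A, y + 1e-3 * rng.standard_normal(20), lam=1.0)    # small perturbation of y
print(supp(x0) == supp(x1))   # supports coincide on a generic, non-degenerate instance
```

On a degenerate instance (or with too large a perturbation) the supports may of course differ; stability is a local, generic property.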

  6.–7. Example of identification

    min_{x ∈ R^d}  (1/2) ‖Ax − y‖² + λ ‖x‖₁        (LASSO)

Identification: (proximal-gradient) algorithms produce iterates that eventually have the same support as the optimal solution.
[Figure: runs of two proximal-gradient algorithms (Proximal Gradient and Accelerated Proximal Gradient) on the same instance with d = 2; both sequences of iterates reach the support of x⋆ after finitely many steps.]
Well studied, see e.g. [Bertsekas ’76], [Wright ’96], [Lewis, Drusvyatskiy ’13]...
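Finite identification can also be observed directly: the soft-thresholding step of proximal gradient sets coordinates exactly to zero, so the iterates' support stabilizes after finitely many iterations. A minimal sketch (my own instance and tolerances, not the talk's) that records the support along the run:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10); x_true[[0, 3, 7]] = [1.5, -2.0, 1.0]
y = A @ x_true + 0.01 * rng.standard_normal(30)
lam, L = 1.0, np.linalg.norm(A, 2) ** 2

x = np.zeros(10)
supports = []
for _ in range(3000):                      # plain proximal-gradient (ISTA)
    z = x - A.T @ (A @ x - y) / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    supports.append(tuple(np.flatnonzero(np.abs(x) > 1e-10)))

final = supports[-1]
# first iteration from which the support never changes again
k_id = next(k for k in range(len(supports)) if all(s == final for s in supports[k:]))
print(f"support {final} identified at iteration {k_id} and kept until the end")
```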

  8. Outline

1. General stability of regularized problems
2. Enlarged identification of proximal algorithms
3. Application: communication-efficient federated learning
4. Application: model consistency for regularized least-squares

  10.–11. General stability of regularized problems: Stability or sensitivity analysis

Parameterized composite optimization problem (smooth + nonsmooth):

    min_{x ∈ R^d}  F(x, p) + R(x)

Typically, the nonsmooth R traps solutions in low-dimensional manifolds: x⋆(p) ∈ M.
Stability: for p ∼ p₀, the optimal solutions lie on the same manifold.
Studied in e.g. [Hare, Lewis ’10], [Vaiter et al ’15], [Liang et al ’16]...
Example 1: R = ‖·‖₁, with supp(x⋆(p)) = supp(x⋆(p₀)) for p ∼ p₀.
Example 2: R = ι_{B∞} (indicator function), i.e. projection onto the ℓ∞ ball.
Many examples in machine learning...

  12.–14. General stability of regularized problems: Structure of nonsmooth regularizers

Many of the regularizers used in machine learning or image processing have a strong primal-dual structure ("mirror-stratifiable" [Fadili, M., Peyré ’18])... that can be exploited to get (enlarged) stability/identification results.

Examples (with the associated unit ball and the low-dimensional manifold M_x containing x):
- R = ‖·‖₁ (and ‖·‖∞ or other polyhedral gauges): M_x = {z : supp(z) = supp(x)}
- R(X) = Σ_i |σ_i(X)| = ‖σ(X)‖₁, the nuclear norm (aka trace norm): M_X = {Z : rank(Z) = rank(X)}
- R(x) = Σ_{b ∈ B} ‖x_b‖₂, group-ℓ₁ (e.g. R(x) = ‖x_{1,2}‖ + |x₃|): M_x = {0} × {0} × R
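For each of these regularizers the low-dimensional manifold is exactly where the proximal operator sends nearby points, by thresholding small components to exact zeros. The closed forms below are standard; the test values are arbitrary illustrations.

```python
import numpy as np

def prox_l1(x, t):
    """Soft-thresholding: prox of t*||.||_1; small entries are set exactly to 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_group_l1(x, groups, t):
    """Block soft-thresholding: prox of t*sum_b ||x_b||_2; whole groups vanish."""
    out = np.zeros_like(x)
    for b in groups:
        nrm = np.linalg.norm(x[b])
        if nrm > t:
            out[b] = (1.0 - t / nrm) * x[b]
    return out

def prox_nuclear(X, t):
    """Singular-value soft-thresholding: prox of t*(nuclear norm); output is low rank."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

x = np.array([3.0, -0.2, 0.1, 0.2])
print(prox_l1(x, 0.5))                            # entries 1, 2, 3 become exactly 0
print(prox_group_l1(x, [[0, 1], [2, 3]], 0.5))    # the whole second group becomes 0
X = np.outer([1.0, 2.0], [1.0, -1.0]) + 0.05 * np.eye(2)
print(np.linalg.matrix_rank(prox_nuclear(X, 0.5)))  # rank drops from 2 to 1
```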

  15.–16. General stability of regularized problems: Recall on stratifications

A stratification of a set D ⊂ R^d is a (finite) partition M = {M_i}_{i∈I} with D = ∪_{i∈I} M_i, whose so-called "strata" (e.g. smooth/affine manifolds) fit nicely:

    M ∩ cl(M′) ≠ ∅   ⟹   M ⊂ cl(M′)

This relation induces a (partial) ordering M ≼ M′.

Example: B∞, the unit ℓ∞-ball in R², admits a stratification with 9 (affine) strata. [Figure: the square B∞ with strata M₁, M₂, M₃, M₄ labeled; here M₁ ≼ M₂ ≼ M₄ and M₁ ≼ M₃ ≼ M₄.]
Other examples: "tame" sets (recall Edouard's talk).
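The 9 strata of B∞ in R² (1 open square, 4 open edges, 4 vertices) can be enumerated by recording, per coordinate, whether the point sits at +1, −1, or strictly inside. A small illustrative classifier; the encoding is my own, not from the talk, and assumes the input lies in the ball:

```python
import numpy as np

def stratum(x, tol=1e-12):
    """Stratum of the unit l_inf ball containing x (x assumed inside the ball):
    per coordinate, +1 / -1 if it sits on that face of the ball, 0 otherwise."""
    return tuple(int(np.sign(xi)) if abs(abs(xi) - 1.0) <= tol else 0 for xi in x)

# In R^2 the labels range over {-1, 0, +1}^2: 9 strata, matching the slide.
print(stratum([0.3, -0.2]))   # (0, 0)  : the interior, dimension 2
print(stratum([1.0, 0.5]))    # (1, 0)  : an open edge, dimension 1
print(stratum([1.0, -1.0]))   # (1, -1) : a vertex, dimension 0
```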

  17. General stability of regularized problems: Mirror-stratifiable regularizations

A (primal) stratification M = {M_i}_{i∈I} and a (dual) stratification M* = {M*_i}_{i∈I} in one-to-one decreasing correspondence through the transfer operator

    J_R(S) = ∪_{x∈S} ri(∂R(x))

Simple example: R = ι_{B∞}, with conjugate R* = ‖·‖₁. Then the strata correspond as

    J_R(M_i)   = ∪_{x∈M_i}  ri ∂R(x)  = ri N_{B∞}(x) = M*_i
    J_{R*}(M*_i) = ∪_{x∈M*_i} ri ∂R*(x) = ri ∂‖x‖₁    = M_i

[Figure: the strata M₁, ..., M₄ of B∞ and the dual strata M*₁, ..., M*₄, exchanged by J_R and J_{R*}.]

  18. General stability of regularized problems: Enlarged stability result

Theorem (Fadili, M., Peyré ’18). For the composite optimization problem (smooth + nonsmooth)

    min_{x ∈ R^d}  F(x, p) + R(x)

satisfying mild assumptions (unique minimizer x⋆(p₀) at p₀, and objective uniformly level-bounded in x), if R is mirror-stratifiable, then for p ∼ p₀,

    M_{x⋆(p₀)}  ≼  M_{x⋆(p)}  ≼  J_{R*}(M*_{u⋆(p₀)})

If R = ‖·‖₁, then supp(x⋆(p₀)) ⊆ supp(x⋆(p)) ⊆ {i : |u⋆(p₀)_i| = 1}.
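For R = ‖·‖₁, the two ends of the sandwich can be probed numerically: the optimality conditions of the LASSO give a dual certificate u⋆(p₀) = Aᵀ(y₀ − Ax⋆(p₀))/λ with |u⋆_i| ≤ 1 everywhere and |u⋆_i| = 1 on the support. The sketch below is an illustrative check, not from the talk; the solver, instance, and tolerances are my own choices.

```python
import numpy as np

def ista(A, y, lam, n_iter=8000):
    """Basic proximal-gradient (ISTA) for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

def supp(x, tol=1e-8):
    return set(np.flatnonzero(np.abs(x) > tol))

rng = np.random.default_rng(2)
A = rng.standard_normal((25, 6))
y0 = A @ np.array([2.0, 0.0, 0.0, -1.0, 0.0, 0.0]) + 0.01 * rng.standard_normal(25)
lam = 1.0

x_star = ista(A, y0, lam)
u_star = A.T @ (y0 - A @ x_star) / lam       # dual certificate: |u_i| <= 1, = 1 on supp
x_pert = ista(A, y0 + 1e-3 * rng.standard_normal(25), lam)

inner = supp(x_star)                                     # supp(x*(p0))
middle = supp(x_pert)                                    # supp(x*(p))
outer = set(np.flatnonzero(np.abs(u_star) > 1 - 1e-6))   # {i : |u*(p0)_i| = 1}
print(inner <= middle <= outer)      # the sandwich of the theorem
```

Generically the three sets coincide; the outer set strictly grows only in (near-)degenerate situations where the certificate saturates off the support.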
