Sparsity in Learning
Y. Grandvalet, Heudiasyc, CNRS & Université de Technologie de Compiègne


  1. Sparsity in Learning
     Y. Grandvalet, Heudiasyc, CNRS & Université de Technologie de Compiègne

  2. Outline
     Statistical Learning / Parsimony / Variable Space / Example Space / Conclusions
     Statistical Learning:
     ● Regression
     ● Classification
     ● Clustering
     (Running footer on each slide: Statlearn'11, Sparsity in Learning, Y. Grandvalet.)

  3. Statistical Learning
     Generalize from examples: given a training sample $\{(x_i, y_i)\}_{i=1}^n$, adjust $\hat f \in \mathcal{F}$ such that $\hat f(x_i) \simeq y_i$.
     Choose $\mathcal{F}$ neither too small nor too large, so that $\hat f$ reaches a trade-off between fit and smoothness.
     [Figure: training examples in the $(X_1, X_2)$ plane.]

  4. Learning Algorithm
     Structural Risk Minimization, a three-step process to choose $\mathcal{F}$ and $\hat f$:
     1. Define a nested family of models $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \dots \subset \mathcal{F}_\lambda \subset \dots \subset \mathcal{F}_L$
     2. Fit to data: $\hat f_\lambda = \operatorname{Argmin}_{f \in \mathcal{F}_\lambda} R_{\mathrm{emp}}(f)$, for $\lambda = 1, \dots, L$
     3. Select the model $\mathcal{F}_{\hat\lambda}$ by estimating the expected loss of $\hat f_\lambda$
     Choosing $\mathcal{F}$ amounts to choosing a parameter.
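The three steps can be sketched on a toy problem. This is a minimal illustration, not the speaker's experiment: the polynomial family, the data, and the use of a held-out split to estimate the expected loss are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a smooth target plus noise (invented for illustration).
x = np.linspace(-1, 1, 60)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(x.size)
x_tr, y_tr = x[::2], y[::2]    # training half
x_va, y_va = x[1::2], y[1::2]  # held-out half, standing in for the bound

def risk(coef, xs, ys):
    """Mean squared loss of a polynomial model."""
    return np.mean((np.polyval(coef, xs) - ys) ** 2)

# Step 1: nested family F_1 ⊂ F_2 ⊂ ... (polynomials of increasing degree).
degrees = range(1, 12)
# Step 2: empirical risk minimization inside each F_lambda (least squares).
models = {d: np.polyfit(x_tr, y_tr, d) for d in degrees}
# Step 3: select the model minimizing the estimated expected loss.
best = min(degrees, key=lambda d: risk(models[d], x_va, y_va))
print("selected degree:", best)
```

Selecting the degree is exactly "choosing a parameter": each value of the degree indexes one model class in the nested family.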

  5. Structural Risk Minimization
     [Figure: upper bound on $R(\hat f)$ and empirical risk $R_{\mathrm{emp}}(\hat f)$ plotted over the nested models $\mathcal{F}_1, \dots, \mathcal{F}_{\hat\lambda}, \dots, \mathcal{F}_L$ and their minimizers $\hat f_1, \dots, \hat f_{\hat\lambda}, \dots, \hat f_L$.]
     3. Minimize the estimated risk $\hat R(\hat f_\lambda)$ over $\lambda$.

  6. Structural Risk Minimization
     Approximation/estimation trade-off, for the expected risk $R(f) = \mathbb{E}_{XY}\left[\ell(f(X), Y)\right]$.
     [Figure: level curves of $R(f)$ around the target $f^*$, with nested models $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2$.]

  7. Parsimonious use of data
     We consider the data table
     $X = \begin{pmatrix} x_1^t \\ \vdots \\ x_i^t \\ \vdots \\ x_n^t \end{pmatrix} = \begin{pmatrix} X^1 & \dots & X^j & \dots & X^d \end{pmatrix}$
     This table can be reduced
     1. in rows ⇒ suppress some examples: compression ⇒ loss function
     2. in columns ⇒ suppress variables: Occam's razor ⇒ model selection
     3. in rows and columns
     4. in rank (PCA, PLS, ...)
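The fourth reduction, in rank, can be sketched with a truncated SVD, which is the computation underlying PCA. The data table below is invented: it is built to be close to rank 2 so the reduction loses almost nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

# A data table X with n = 100 examples (rows) and d = 8 variables (columns),
# constructed as rank 2 plus small noise (an assumption for the demo).
U = rng.standard_normal((100, 2))
V = rng.standard_normal((2, 8))
X = U @ V + 0.01 * rng.standard_normal((100, 8))

# Reduction in rank: keep only the top-r singular directions.
r = 2
u, s, vt = np.linalg.svd(X, full_matrices=False)
X_r = u[:, :r] * s[:r] @ vt[:r, :]

rel_err = np.linalg.norm(X - X_r) / np.linalg.norm(X)
print(f"relative error of the rank-{r} approximation: {rel_err:.3f}")
```

Rows and columns are all kept, but the table is described by $r(n + d)$ numbers instead of $nd$.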

  8. Why ignore some variables, since the Bayes error can only decrease with more variables?
     ● A means to implement Structural Risk Minimization
       ❍ Penalize to stabilize
       ❍ Parsimony is sometimes a "reasonable prior"
     ● Computational efficiency
       ❍ Iteratively solve problems of increasing size
       ❍ Exact regularization paths
       ❍ Fast evaluation
     ● Interpretability (with caution)
       ❍ Understanding the underlying phenomenon
       ❍ Acceptability

  9. Three categories of methods
     1. "Filter" approach
        ❍ Variables are "filtered" by a criterion (Fisher, Wilks, mutual information)
        ❍ Learning proceeds after this treatment
        ❍ No feedback from the learner
     2. "Wrapper" approach
        ❍ Heuristic search over subsets of variables
        ❍ Subset selection is driven by the learning algorithm's performance
     3. "Embedded" approach
        ❍ The variable selection mechanism is incorporated in the learning algorithm
        ❍ All variables are processed during learning; some will not influence the solution

  10. Embedded Subset Selection
      (Subsection outline: Embedded, LASSO, Geometric Insights, Examples, Ball crafting, Coop-Lasso.)
      For linear models
      $f(x; \beta) = \beta_0 + \sum_{j=1}^d \beta_j x^j$,
      subset selection aims at solving
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\|\beta\|_0 \le d' < d$,
      where $d'$ is the number of desired variables. This is an NP-hard problem.
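The NP-hardness shows up concretely in exhaustive search: solving the $\ell_0$-constrained problem exactly means trying every support of size $d'$, i.e. $\binom{d}{d'}$ least-squares fits, which is exponential in $d$. A minimal sketch with squared loss on invented data (the function name `best_subset` and the data are assumptions):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Invented linear data: only 2 of d = 8 variables are truly active.
n, d = 50, 8
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[[1, 5]] = [2.0, -1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def best_subset(X, y, d_prime):
    """Exact L0-constrained least squares by exhaustive search:
    C(d, d') supports to try, hence exponential cost in d."""
    best_rss, best_support = np.inf, None
    for support in combinations(range(X.shape[1]), d_prime):
        XS = X[:, support]
        b, *_ = np.linalg.lstsq(XS, y, rcond=None)
        rss = np.sum((y - XS @ b) ** 2)
        if rss < best_rss:
            best_rss, best_support = rss, support
    return best_support

print("selected support:", best_subset(X, y, 2))
```

For $d = 8$ this is 28 fits; for $d = 10{,}000$ genes and $d' = 20$ it is astronomically many, which is why the relaxations on the next slides matter.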

  11. Relaxation: Soft-thresholding
      Relax "hard" subset selection:
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\|\beta\|_p \le c$.
      The solution is sparse for $p \le 1$; the optimization problem is convex (if $\ell$ is convex) for $p \ge 1$.
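The soft-thresholding operator named in the slide title is the proximal operator of the $\ell_1$ penalty ($p = 1$), and it is where exact zeros come from. A minimal sketch (the input vector is invented):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink every coordinate toward 0
    by t, and set coordinates with |z_j| <= t exactly to 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([3.0, -0.5, 1.2, -2.0])
print(soft_threshold(z, 1.0))
```

Coordinates whose magnitude falls below the threshold are zeroed exactly, not merely made small; this is the mechanism behind the sparsity of the $p \le 1$ relaxation.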

  12. Sparsity/Convexity Trade-off
      [Figure: constrained least-squares solutions in the $(\beta_1, \beta_2)$ plane, compared with the OLS solution $\beta^{OLS}$, for three penalties: $\sum_{j=1}^d |\beta_j|^2$ giving the ridge solution $\beta^{RR}$ (weight decay), $\sum_{j=1}^d |\beta_j|$ giving the LASSO solution $\beta^{L}$, and $\sum_{j=1}^d |\beta_j|^{1/2}$ giving $\beta^{L_{1/2}}$.]
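The contrast in the figure can be checked numerically. In the special case of an orthonormal design, both penalized estimators have closed forms in terms of the OLS solution: ridge rescales every coefficient, the LASSO soft-thresholds them. The coefficient vector below is invented; the closed forms are standard, but the orthonormal-design assumption is mine.

```python
import numpy as np

# Invented OLS solution; lam is the penalty level.
beta_ols = np.array([3.0, 0.4, -1.5, 0.1])
lam = 0.5

# Orthonormal-design closed forms:
beta_ridge = beta_ols / (1.0 + lam)                                 # shrinks, never exactly 0
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)  # thresholds to 0

print("ridge:", beta_ridge)
print("lasso:", beta_lasso)
```

Ridge keeps all four coefficients; the LASSO sets the two small ones exactly to zero, which is the sparsity side of the trade-off. Convexity is what $p = 1/2$ gives up to sparsify even harder.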

  13. Geometric Insight on Adaptivity: Variational formulation
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\sum_{j=1}^d |\beta_j| \le c$
      $\Leftrightarrow$
      $\min_{\beta, s} \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\sum_{j=1}^d \frac{\beta_j^2}{s_j} \le c^2$, $\sum_{j=1}^d s_j \le 1$, $s_j \ge 0$, $j = 1, \dots, d$.
      The LASSO is thus an adaptive ridge penalty.
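The equivalence rests on one identity: minimizing the weighted ridge penalty $\sum_j \beta_j^2 / s_j$ over weights $s_j \ge 0$ with $\sum_j s_j \le 1$ gives $(\sum_j |\beta_j|)^2$, attained at $s_j \propto |\beta_j|$ (Cauchy-Schwarz). A quick numerical check on an invented $\beta$:

```python
import numpy as np

beta = np.array([2.0, -1.0, 0.5])  # invented coefficient vector

# Optimal adaptive weights: proportional to |beta_j|, summing to 1.
s = np.abs(beta) / np.sum(np.abs(beta))

# Minimized weighted-ridge penalty vs squared L1 norm: they coincide.
penalty = np.sum(beta**2 / s)
print(penalty, np.sum(np.abs(beta))**2)
```

So the $\ell_1$ ball is a ridge ball whose metric adapts to $\beta$: variables attracting small weight $s_j$ are penalized infinitely hard and driven to zero.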

  14. Geometric Insight on Sparsity: Constrained Optimization
      $\max_{\beta_1, \beta_2} L(\beta_1, \beta_2) - \lambda\, \Omega(\beta_1, \beta_2) \;\Leftrightarrow\; \max_{\beta_1, \beta_2} L(\beta_1, \beta_2)$ s.t. $\Omega(\beta_1, \beta_2) \le c$
      [Figure: level curves of $L(\beta_1, \beta_2)$ and the constraint set in the $(\beta_1, \beta_2)$ plane.]

  15. Geometric Insight on Sparsity: Supporting Hyperplane
      A hyperplane supports a set iff
      ● the set is contained in one of its half-spaces, and
      ● the set has at least one point on the hyperplane.
      [Figure: supporting hyperplanes at boundary points of several constraint sets in the $(\beta_1, \beta_2)$ plane.]
      There are supporting hyperplanes at all boundary points of convex sets: they generalize tangents.

  16. Geometric Insight on Sparsity: Dual Cone
      The dual cone generalizes normals.
      [Figure: dual cones at boundary points of several constraint sets in the $(\beta_1, \beta_2)$ plane.]
      The shape of the dual cones determines the sparsity pattern.

  17. Expression Recognition: Logistic Regression
      [Figure: features selected for classifying six facial expressions (Surprise, Anger, Sadness, Happiness, Fear, Disgust).]

  18. Prediction of Response to Chemotherapy: Logistic Regression
      [Figure: estimated coefficient magnitudes $|\hat\beta_j|$ plotted over probe sets/genes.]
      No coherent pattern.

  19. Ball crafting: Group sparsity
      (ridge, lasso, group-lasso, hierarchies, coop-lasso)
      ● Additive models (Grandvalet & Canu, 1999; Bakin, 1999)
        ❍ Adaptive metric ⇒ 1 or 2 hyper-parameters (compared to d)
        ❍ Ease of implementation, interpretability
      ● Multiple/Composite Kernel Learning (Lanckriet et al., 2004; Szafranski et al., 2010)
        ❍ Adaptive metric: "learn the kernel" ⇒ 1 hyper-parameter
        ❍ CKL takes into account a group structure on kernels
      ● Sign-coherent groups
        ❍ Multi-task learning for pathway inference (Chiquet et al., 2010)
        ❍ Prediction from cooperative features (Chiquet et al., 2011)

  20. Group-Lasso
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\sum_{k=1}^K \Big( \sum_{j \in \mathcal{G}_k} \beta_j^2 \Big)^{1/2} \le c$,
      where $\{\mathcal{G}_k\}_{k=1}^K$ forms a partition of $\{1, \dots, d\}$.
      The solution is sparse groupwise; there is no sign-coherence within groups.
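Groupwise sparsity also has a proximal-operator picture, analogous to soft-thresholding for the plain LASSO: each group is shrunk by its Euclidean norm and dropped entirely when that norm falls below the threshold. A minimal sketch (the function name `group_soft_threshold`, the vector, and the partition are invented):

```python
import numpy as np

def group_soft_threshold(beta, groups, t):
    """Block-wise prox of the group-lasso penalty: scale each group by
    (1 - t / ||beta_g||) when ||beta_g|| > t, otherwise zero the whole group."""
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * beta[g]
    return out

beta = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]  # a partition of the variable indices
print(group_soft_threshold(beta, groups, 1.0))
```

Entire groups are selected or discarded together, but within a surviving group the coefficients keep whatever signs they had, which is the lack of sign-coherence the slide points out and the motivation for the coop-lasso variant.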
