Sparsity in Learning
Y. Grandvalet
Heudiasyc, CNRS & Université de Technologie de Compiègne
Outline: Statistical Learning, Parsimony, Variable Space, Example Space, Conclusions

Statistical Learning
● Regression
● Classification
● Clustering

Statlearn'11, Sparsity in Learning, Y. Grandvalet
Statistical Learning

Generalize from examples: given a training sample $\{(x_i, y_i)\}_{i=1}^n$, adjust $\hat f \in \mathcal{F}$ such that $\hat f(x_i) \simeq y_i$.

Choose $\mathcal{F}$ neither too small nor too large, so that $\hat f$ reaches a trade-off between fit and smoothness.

[Figure: a smooth decision boundary separating two classes of examples in the $(X_1, X_2)$ plane]
Learning Algorithm

Structural Risk Minimization, a three-step process to choose $\mathcal{F}$ and $\hat f$:
1. Define a nested family of models $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots \subset \mathcal{F}_\lambda \subset \ldots \subset \mathcal{F}_L$
2. Fit to data: $\hat f_\lambda = \operatorname{Argmin}_{f \in \mathcal{F}_\lambda} R_{\mathrm{emp}}(f)$, for $\lambda = 1, \ldots, L$
3. Select the model $\mathcal{F}_{\hat\lambda}$ by estimating the expected loss of $\hat f_\lambda$

Choosing $\mathcal{F}$ amounts to choosing a parameter.
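The three steps above can be sketched numerically. This is a minimal illustration, not the procedure used in the talk: it assumes squared loss, takes polynomials of increasing degree as the nested family, and replaces the risk bound of step 3 by a held-out validation estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(x) + noise, split into training / validation halves.
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.3 * rng.standard_normal(200)
x_tr, y_tr, x_va, y_va = x[:100], y[:100], x[100:], y[100:]

def risk(coefs, xs, ys):
    # Squared-error empirical risk of a polynomial model.
    return np.mean((np.polyval(coefs, xs) - ys) ** 2)

# Step 1: nested family F_1 ⊂ F_2 ⊂ ... (model index λ = polynomial degree).
# Step 2: empirical risk minimization inside each F_λ (least-squares fit).
models = {deg: np.polyfit(x_tr, y_tr, deg) for deg in range(1, 12)}

# Step 3: select the model minimizing an estimate of the expected loss
# (a held-out validation risk stands in for the upper bound on R(f̂_λ)).
best_deg = min(models, key=lambda d: risk(models[d], x_va, y_va))
print(best_deg, risk(models[best_deg], x_va, y_va))
```

Training risk alone would keep decreasing with the degree; the validation estimate is what arbitrates the fit/smoothness trade-off.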
Structural Risk Minimization

[Figure: empirical risk $R_{\mathrm{emp}}(\hat f_\lambda)$ and an upper bound on $R(\hat f_\lambda)$ plotted across the nested models $\mathcal{F}_1, \ldots, \mathcal{F}_\lambda, \ldots, \mathcal{F}_L$, with minimizers $\hat f_1, \ldots, \hat f_\lambda, \ldots, \hat f_L$]

3. Minimize $\widehat R(\hat f_\lambda)$
Structural Risk Minimization
Approximation/estimation trade-off

[Figure: level curves of the risk $R(f) = \mathbb{E}_{XY}(\ell(f(X), Y))$ around the target $f^*$, with nested models $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2$]
Parsimonious Use of Data

We consider the data table
$$X = \begin{pmatrix} X_1 & \ldots & X_j & \ldots & X_d \end{pmatrix} = \begin{pmatrix} x_1^t \\ \vdots \\ x_i^t \\ \vdots \\ x_n^t \end{pmatrix}$$

This table can be reduced
1. in rows ⇒ suppress some examples: compression ⇒ loss function
2. in columns ⇒ suppress variables: Occam's razor ⇒ model selection
3. in rows and columns
4. in rank (PCA, PLS, ...)
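Reduction in rank (item 4) can be sketched with a truncated SVD, which underlies PCA-style compression of the table. A minimal NumPy sketch on synthetic data of my own choosing, not an example from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 50 x 8 data table that is nearly rank 2: two latent factors plus noise.
X = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 8))
X += 0.01 * rng.standard_normal(X.shape)

# Rank reduction: keep only the k leading singular directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = (U[:, :k] * s[:k]) @ Vt[:k]   # best rank-k approximation in Frobenius norm

rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(rel_err)  # small: almost all of the table survives at rank 2
```

The same table is thus summarized by $k(n + d)$ numbers instead of $nd$, the rank counterpart of suppressing rows or columns.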
Why Ignore Some Variables...

...since the Bayes error can only decrease with more variables?
● A means to implement Structural Risk Minimization
  ❍ Penalize to stabilize
  ❍ Parsimony is sometimes a "reasonable prior"
● Computational efficiency
  ❍ Iteratively solve problems of increasing size
  ❍ Exact regularization paths
  ❍ Fast evaluation
● Interpretability (with caution)
  ❍ Understanding the underlying phenomenon
  ❍ Acceptability
Three Categories of Methods

1. "Filter" approach
  ❍ Variables "filtered" by a criterion (Fisher, Wilks, mutual information)
  ❍ Learning proceeds after this treatment: no feedback from the learner
2. "Wrapper" approach
  ❍ Heuristic search over subsets of variables
  ❍ Subset selection is driven by the learning algorithm's performance
3. "Embedded" approach
  ❍ The variable selection mechanism is incorporated in the learning algorithm
  ❍ All variables are processed during learning; some will not influence the solution
Embedded Subset Selection

For linear models
$$f(x; \beta) = \beta_0 + \sum_{j=1}^d \beta_j x_j \,,$$
subset selection aims at solving the problem
$$\min_\beta \ \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i) \quad \text{s.t.} \quad \|\beta\|_0 \leq d' < d \,,$$
where $d'$ is the number of desired variables.

NP-hard problem
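The combinatorial nature of the $\|\beta\|_0$ constraint can be made concrete by brute force: enumerate every subset of $d'$ columns and fit each by least squares. A minimal sketch on synthetic data (squared loss, no intercept, names of my own choosing); the $\binom{d}{d'}$ enumeration is exactly what becomes intractable as $d$ grows.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy regression: only 2 of d = 8 variables actually matter.
n, d, d_prime = 100, 8, 2
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[[1, 4]] = [2.0, -3.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def subset_rss(S):
    # Least-squares fit restricted to the columns in S, and its residual sum of squares.
    cols = list(S)
    b = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
    return np.sum((X[:, cols] @ b - y) ** 2)

# Exhaustive search over all C(d, d') subsets: this exponential cost is why
# the ||beta||_0 <= d' problem is NP-hard in general.
best = min(itertools.combinations(range(d), d_prime), key=subset_rss)
print(best)  # the subset with the smallest residual
```

With $d = 8$ this is 28 fits; with $d = 10^4$ genes and $d' = 20$ it is astronomically many, which motivates the relaxations on the next slides.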
Relaxation
Soft-thresholding

Relax "hard" subset selection:
$$\min_\beta \ \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i) \quad \text{s.t.} \quad \|\beta\|_p \leq c \,.$$

Sparse solution for $p \leq 1$
Convex optimization problem (if $\ell$ convex) for $p \geq 1$
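For $p = 1$ (the LASSO case), soft-thresholding appears directly in the proximal-gradient (ISTA) algorithm for the penalized form of the problem. A minimal sketch assuming squared loss; the data and function names are illustrative, not from the talk.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: shrinks coordinates toward 0,
    # producing exact zeros -- the source of sparsity for p = 1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    # ISTA for (1/2n) ||y - X beta||^2 + lam * ||beta||_1,
    # the penalized counterpart of the constrained p = 1 problem.
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n       # Lipschitz constant of the gradient
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n     # gradient step on the smooth loss
        beta = soft_threshold(beta - grad / L, lam / L)  # proximal step
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
y = X @ np.array([2.0, 0, 0, -3.0, 0, 0, 0, 0]) + 0.1 * rng.standard_normal(100)
beta = lasso_ista(X, y, lam=0.1)
print(np.round(beta, 2))  # exact zeros on the irrelevant coordinates
```

Because the problem is convex for $p = 1$, the iteration converges to a global minimum; for $p < 1$ sparsity is stronger but convexity, and this guarantee, are lost.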
Sparsity – Convexity Trade-off

[Figure: constrained solutions in the $(\beta_1, \beta_2)$ plane, compared with the unconstrained solution $\beta^{OLS}$, for three penalties: $\sum_{j=1}^d |\beta_j|^2$ (ridge / weight decay, solution $\beta^{RR}$), $\sum_{j=1}^d |\beta_j|$ (LASSO, solution $\beta^{L}$), and $\sum_{j=1}^d |\beta_j|^{1/2}$ (solution $\beta^{L_{1/2}}$)]
Geometric Insight on Adaptivity
Variational formulation

$$\min_\beta \ \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i) \ \text{ s.t. } \ \sum_{j=1}^d |\beta_j| \leq c
\quad \Longleftrightarrow \quad
\min_{\beta, s} \ \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i) \ \text{ s.t. } \ \sum_{j=1}^d \frac{\beta_j^2}{s_j} \leq c^2, \ \ \sum_{j=1}^d s_j \leq 1, \ \ s_j \geq 0, \ j = 1, \ldots, d$$

Adaptive ridge penalty
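This variational view suggests an alternating algorithm: with $\beta$ fixed, the optimal $s_j$ is proportional to $|\beta_j|$, and with $s$ fixed, the $\beta$-step is a weighted ridge regression. A minimal sketch assuming squared loss and the penalized form (using $\min_{s>0} (b^2/s + s) = 2|b|$, so the alternation minimizes the $\ell_1$-penalized risk); the data and the stabilizing `eps` are illustrative choices of mine, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
y = X @ np.array([2.0, 0, 0, -3.0, 0, 0, 0, 0]) + 0.1 * rng.standard_normal(100)

# Adaptive-ridge view of the lasso: alternate
#   s-step:    closed-form minimizer s_j = |beta_j| (+ eps for stability),
#   beta-step: ridge with per-coordinate penalties lam / s_j,
# which together minimize (1/2n)||y - X beta||^2 + lam * ||beta||_1.
lam, eps = 0.1, 1e-8
n, d = X.shape
beta = np.ones(d)
for _ in range(200):
    s = np.abs(beta) + eps                      # s-step
    A = X.T @ X / n + lam * np.diag(1.0 / s)    # beta-step: weighted ridge system
    beta = np.linalg.solve(A, X.T @ y / n)

print(np.round(beta, 2))  # near-zero entries on the irrelevant coordinates
```

Coordinates with small $|\beta_j|$ receive ever larger ridge penalties $\lambda/s_j$ and are driven toward zero: adaptivity of the metric is what produces sparsity here.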
Geometric Insight on Sparsity
Constrained Optimization

$$\max_{\beta_1, \beta_2} \ L(\beta_1, \beta_2) - \lambda \, \Omega(\beta_1, \beta_2)
\quad \Longleftrightarrow \quad
\max_{\beta_1, \beta_2} \ L(\beta_1, \beta_2) \ \text{ s.t. } \ \Omega(\beta_1, \beta_2) \leq c$$

[Figure: level curves of $L(\beta_1, \beta_2)$ meeting the boundary of the constraint set $\Omega(\beta_1, \beta_2) \leq c$]
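The penalized/constrained equivalence can be checked numerically in the smooth ridge case, where the penalized solution has a closed form: solve the penalized problem for some $\lambda$, set $c = \Omega(\hat\beta)$, and verify that no feasible point does better. A minimal sketch (stated for loss minimization rather than the likelihood maximization of the slide, which is the same statement up to a sign); the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

# Penalized form with Omega(beta) = ||beta||^2 (ridge): closed-form solution of
#   min_beta ||y - X beta||^2 + lam * ||beta||^2.
lam = 0.5
beta_pen = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# The same point solves the constrained form with c = Omega(beta_pen):
# for any feasible b, loss(b) >= loss(beta_pen) + lam * (c - ||b||^2) >= loss(beta_pen).
c = np.sum(beta_pen ** 2)
loss_pen = np.sum((X @ beta_pen - y) ** 2)
for _ in range(1000):
    b = rng.standard_normal(4)
    b *= np.sqrt(c) / np.linalg.norm(b) * rng.uniform()  # random feasible point
    assert np.sum((X @ b - y) ** 2) >= loss_pen - 1e-9

print("constrained optimum attained at the penalized solution")
```

The random feasible points are only a sanity check; the comment above the loop is the actual one-line proof of the equivalence for this convex case.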
Geometric Insight on Sparsity
Supporting Hyperplane

A hyperplane supports a set iff
● the set is contained in one half-space, and
● the set has at least one point on the hyperplane.

[Figure: supporting hyperplanes at various points of convex sets in the $(\beta_1, \beta_2)$ plane]

There are supporting hyperplanes at all boundary points of convex sets: they generalize tangents.
Geometric Insight on Sparsity
Dual Cone

The dual cone generalizes normals.

[Figure: dual cones at various points of constraint sets in the $(\beta_1, \beta_2)$ plane]

Shape of dual cones ⇒ sparsity pattern
Expression Recognition
Logistic Regression

[Figure: sparsity patterns of logistic regression coefficients over face images, for the expressions Surprise, Anger, Sadness, Happiness, Fear, and Disgust]
Prediction of Response to Chemotherapy
Logistic Regression

[Figure: magnitudes $|\beta_j|$ of the selected coefficients across probe sets/genes]

No coherent pattern
Ball Crafting
Group sparsity: ridge, lasso, group-lasso, hierarchies, coop-lasso

● Additive models (Grandvalet & Canu, 1999; Bakin, 1999)
  ❍ Adaptive metric ⇒ 1 or 2 hyper-parameters (compared to d)
  ❍ Ease of implementation, interpretability
● Multiple/Composite Kernel Learning (Lanckriet et al., 2004; Szafranski et al., 2010)
  ❍ Adaptive metric: "learn the kernel" ⇒ 1 hyper-parameter
  ❍ CKL takes into account a group structure on kernels
● Sign-coherent groups
  ❍ Multi-task learning for pathway inference (Chiquet et al., 2010)
  ❍ Prediction from cooperative features (Chiquet et al., 2011)
Group-Lasso

$$\min_\beta \ \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i) \quad \text{s.t.} \quad \sum_{k=1}^K \Big( \sum_{j \in \mathcal{G}_k} \beta_j^2 \Big)^{1/2} \leq c \,,$$
where $\{\mathcal{G}_k\}_{k=1}^K$ forms a partition of $\{1, \ldots, d\}$.

Groupwise sparse solution
No sign-coherence
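The group-lasso constraint admits a blockwise soft-thresholding proximal operator: each group's coefficient vector is shrunk as a whole and zeroed jointly, which is what makes the solution groupwise sparse. A minimal proximal-gradient sketch assuming squared loss, with synthetic data and group structure of my own choosing.

```python
import numpy as np

def group_soft_threshold(beta, groups, t):
    # Proximal operator of t * sum_k ||beta_{G_k}||_2:
    # each group is shrunk as a block; small groups are zeroed jointly.
    out = beta.copy()
    for G in groups:
        norm = np.linalg.norm(beta[G])
        out[G] = 0.0 if norm <= t else (1 - t / norm) * beta[G]
    return out

def group_lasso_ista(X, y, groups, lam, n_iter=2000):
    # Proximal gradient for (1/2n)||y - X beta||^2 + lam * sum_k ||beta_{G_k}||_2.
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = group_soft_threshold(beta - grad / L, groups, lam / L)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = X @ np.array([2.0, -1.0, 0, 0, 0, 0]) + 0.1 * rng.standard_normal(100)
groups = [[0, 1], [2, 3], [4, 5]]          # a partition of {1, ..., d}
beta = group_lasso_ista(X, y, groups, lam=0.2)
print(np.round(beta, 2))  # first group active, other groups zeroed jointly
```

Note that within an active group the signs of the coefficients are unconstrained: this is the absence of sign-coherence that the coop-lasso is designed to address.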