Sparsity in Learning
Y. Grandvalet, Heudiasyc, CNRS & Université de Technologie de Compiègne


  1. Sparsity in Learning
     Y. Grandvalet, Heudiasyc, CNRS & Université de Technologie de Compiègne

  2. Outline
     Statistical Learning / Parsimony / Variable Space / Example Space / Conclusions
     Statistical Learning:
     ● Regression
     ● Classification
     ● Clustering
     (Running footer on each slide: Statlearn'11, Sparsity in Learning, Y. Grandvalet.)

  3. Statistical Learning
     Generalize from examples: given a training sample $\{(x_i, y_i)\}_{i=1}^n$, adjust $\hat f \in \mathcal{F}$ such that $\hat f(x_i) \simeq y_i$.
     Choose $\mathcal{F}$ neither too small nor too large, so that $\hat f$ reaches a trade-off between fit and smoothness.
     [Figure: training examples in the $(X_1, X_2)$ plane.]

  4. Learning Algorithm
     Structural Risk Minimization, a three-step process to choose $\mathcal{F}$ and $\hat f$:
     1. Define a nested family of models $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \dots \subset \mathcal{F}_\lambda \subset \dots \subset \mathcal{F}_L$
     2. Fit to data: $\hat f_\lambda = \operatorname{Argmin}_{f \in \mathcal{F}_\lambda} R_{\mathrm{emp}}(f)$, for $\lambda = 1, \dots, L$
     3. Select the model $\mathcal{F}_{\hat\lambda}$ by estimating the expected loss of $\hat f_\lambda$
     Choosing $\mathcal{F}$ amounts to choosing a parameter.
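The three steps can be sketched on a toy problem. This is a minimal illustration, not the speaker's experiment: the polynomial family, the data, and the use of a held-out split to estimate the expected loss are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a smooth target plus noise (invented for illustration).
x = np.linspace(-1, 1, 60)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(x.size)
x_tr, y_tr = x[::2], y[::2]    # training half
x_va, y_va = x[1::2], y[1::2]  # held-out half, standing in for the bound

def risk(coef, xs, ys):
    """Mean squared loss of a polynomial model."""
    return np.mean((np.polyval(coef, xs) - ys) ** 2)

# Step 1: nested family F_1 ⊂ F_2 ⊂ ... (polynomials of increasing degree).
degrees = range(1, 12)
# Step 2: empirical risk minimization inside each F_lambda (least squares).
models = {d: np.polyfit(x_tr, y_tr, d) for d in degrees}
# Step 3: select the model minimizing the estimated expected loss.
best = min(degrees, key=lambda d: risk(models[d], x_va, y_va))
print("selected degree:", best)
```

Selecting the degree is exactly "choosing a parameter": each value of the degree indexes one model class in the nested family.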

  5. Structural Risk Minimization
     [Figure: upper bound on $R(\hat f)$ and empirical risk $R_{\mathrm{emp}}(\hat f)$ plotted over the nested models $\mathcal{F}_1, \dots, \mathcal{F}_{\hat\lambda}, \dots, \mathcal{F}_L$ and their minimizers $\hat f_1, \dots, \hat f_{\hat\lambda}, \dots, \hat f_L$.]
     3. Minimize the estimated risk $\hat R(\hat f_\lambda)$ over $\lambda$.

  6. Structural Risk Minimization
     Approximation/estimation trade-off, for the expected risk $R(f) = \mathbb{E}_{XY}\left[\ell(f(X), Y)\right]$.
     [Figure: level curves of $R(f)$ around the target $f^*$, with nested models $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2$.]

  7. Parsimonious use of data
     We consider the data table
     $X = \begin{pmatrix} x_1^t \\ \vdots \\ x_i^t \\ \vdots \\ x_n^t \end{pmatrix} = \begin{pmatrix} X^1 & \dots & X^j & \dots & X^d \end{pmatrix}$
     This table can be reduced
     1. in rows ⇒ suppress some examples: compression ⇒ loss function
     2. in columns ⇒ suppress variables: Occam's razor ⇒ model selection
     3. in rows and columns
     4. in rank (PCA, PLS, ...)
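The fourth reduction, in rank, can be sketched with a truncated SVD, which is the computation underlying PCA. The data table below is invented: it is built to be close to rank 2 so the reduction loses almost nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

# A data table X with n = 100 examples (rows) and d = 8 variables (columns),
# constructed as rank 2 plus small noise (an assumption for the demo).
U = rng.standard_normal((100, 2))
V = rng.standard_normal((2, 8))
X = U @ V + 0.01 * rng.standard_normal((100, 8))

# Reduction in rank: keep only the top-r singular directions.
r = 2
u, s, vt = np.linalg.svd(X, full_matrices=False)
X_r = u[:, :r] * s[:r] @ vt[:r, :]

rel_err = np.linalg.norm(X - X_r) / np.linalg.norm(X)
print(f"relative error of the rank-{r} approximation: {rel_err:.3f}")
```

Rows and columns are all kept, but the table is described by $r(n + d)$ numbers instead of $nd$.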

  8. Why ignore some variables, since the Bayes error can only decrease with more variables?
     ● A means to implement Structural Risk Minimization
       ❍ Penalize to stabilize
       ❍ Parsimony is sometimes a "reasonable prior"
     ● Computational efficiency
       ❍ Iteratively solve problems of increasing size
       ❍ Exact regularization paths
       ❍ Fast evaluation
     ● Interpretability (with caution)
       ❍ Understanding the underlying phenomenon
       ❍ Acceptability

  9. Three categories of methods
     1. "Filter" approach
        ❍ Variables are "filtered" by a criterion (Fisher, Wilks, mutual information)
        ❍ Learning proceeds after this treatment
        ❍ No feedback from the learner
     2. "Wrapper" approach
        ❍ Heuristic search over subsets of variables
        ❍ Subset selection is driven by the learning algorithm's performance
     3. "Embedded" approach
        ❍ The variable selection mechanism is incorporated in the learning algorithm
        ❍ All variables are processed during learning; some will not influence the solution

  10. Embedded Subset Selection
      (Subsection outline: Embedded, LASSO, Geometric Insights, Examples, Ball crafting, Coop-Lasso.)
      For linear models
      $f(x; \beta) = \beta_0 + \sum_{j=1}^d \beta_j x^j$,
      subset selection aims at solving
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\|\beta\|_0 \le d' < d$,
      where $d'$ is the number of desired variables. This is an NP-hard problem.
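The NP-hardness shows up concretely in exhaustive search: solving the $\ell_0$-constrained problem exactly means trying every support of size $d'$, i.e. $\binom{d}{d'}$ least-squares fits, which is exponential in $d$. A minimal sketch with squared loss on invented data (the function name `best_subset` and the data are assumptions):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Invented linear data: only 2 of d = 8 variables are truly active.
n, d = 50, 8
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[[1, 5]] = [2.0, -1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def best_subset(X, y, d_prime):
    """Exact L0-constrained least squares by exhaustive search:
    C(d, d') supports to try, hence exponential cost in d."""
    best_rss, best_support = np.inf, None
    for support in combinations(range(X.shape[1]), d_prime):
        XS = X[:, support]
        b, *_ = np.linalg.lstsq(XS, y, rcond=None)
        rss = np.sum((y - XS @ b) ** 2)
        if rss < best_rss:
            best_rss, best_support = rss, support
    return best_support

print("selected support:", best_subset(X, y, 2))
```

For $d = 8$ this is 28 fits; for $d = 10{,}000$ genes and $d' = 20$ it is astronomically many, which is why the relaxations on the next slides matter.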

  11. Relaxation: Soft-thresholding
      Relax "hard" subset selection:
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\|\beta\|_p \le c$.
      The solution is sparse for $p \le 1$; the optimization problem is convex (if $\ell$ is convex) for $p \ge 1$.
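The soft-thresholding operator named in the slide title is the proximal operator of the $\ell_1$ penalty ($p = 1$), and it is where exact zeros come from. A minimal sketch (the input vector is invented):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink every coordinate toward 0
    by t, and set coordinates with |z_j| <= t exactly to 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([3.0, -0.5, 1.2, -2.0])
print(soft_threshold(z, 1.0))
```

Coordinates whose magnitude falls below the threshold are zeroed exactly, not merely made small; this is the mechanism behind the sparsity of the $p \le 1$ relaxation.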

  12. Sparsity/Convexity Trade-off
      [Figure: constrained least-squares solutions in the $(\beta_1, \beta_2)$ plane, compared with the OLS solution $\beta^{OLS}$, for three penalties: $\sum_{j=1}^d |\beta_j|^2$ giving the ridge solution $\beta^{RR}$ (weight decay), $\sum_{j=1}^d |\beta_j|$ giving the LASSO solution $\beta^{L}$, and $\sum_{j=1}^d |\beta_j|^{1/2}$ giving $\beta^{L_{1/2}}$.]
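The contrast in the figure can be checked numerically. In the special case of an orthonormal design, both penalized estimators have closed forms in terms of the OLS solution: ridge rescales every coefficient, the LASSO soft-thresholds them. The coefficient vector below is invented; the closed forms are standard, but the orthonormal-design assumption is mine.

```python
import numpy as np

# Invented OLS solution; lam is the penalty level.
beta_ols = np.array([3.0, 0.4, -1.5, 0.1])
lam = 0.5

# Orthonormal-design closed forms:
beta_ridge = beta_ols / (1.0 + lam)                                 # shrinks, never exactly 0
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)  # thresholds to 0

print("ridge:", beta_ridge)
print("lasso:", beta_lasso)
```

Ridge keeps all four coefficients; the LASSO sets the two small ones exactly to zero, which is the sparsity side of the trade-off. Convexity is what $p = 1/2$ gives up to sparsify even harder.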

  13. Geometric Insight on Adaptivity: Variational formulation
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\sum_{j=1}^d |\beta_j| \le c$
      $\Leftrightarrow$
      $\min_{\beta, s} \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\sum_{j=1}^d \frac{\beta_j^2}{s_j} \le c^2$, $\sum_{j=1}^d s_j \le 1$, $s_j \ge 0$, $j = 1, \dots, d$.
      The LASSO is thus an adaptive ridge penalty.
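The equivalence rests on one identity: minimizing the weighted ridge penalty $\sum_j \beta_j^2 / s_j$ over weights $s_j \ge 0$ with $\sum_j s_j \le 1$ gives $(\sum_j |\beta_j|)^2$, attained at $s_j \propto |\beta_j|$ (Cauchy-Schwarz). A quick numerical check on an invented $\beta$:

```python
import numpy as np

beta = np.array([2.0, -1.0, 0.5])  # invented coefficient vector

# Optimal adaptive weights: proportional to |beta_j|, summing to 1.
s = np.abs(beta) / np.sum(np.abs(beta))

# Minimized weighted-ridge penalty vs squared L1 norm: they coincide.
penalty = np.sum(beta**2 / s)
print(penalty, np.sum(np.abs(beta))**2)
```

So the $\ell_1$ ball is a ridge ball whose metric adapts to $\beta$: variables attracting small weight $s_j$ are penalized infinitely hard and driven to zero.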

  14. Geometric Insight on Sparsity: Constrained Optimization
      $\max_{\beta_1, \beta_2} L(\beta_1, \beta_2) - \lambda\, \Omega(\beta_1, \beta_2) \;\Leftrightarrow\; \max_{\beta_1, \beta_2} L(\beta_1, \beta_2)$ s.t. $\Omega(\beta_1, \beta_2) \le c$
      [Figure: level curves of $L(\beta_1, \beta_2)$ and the constraint set in the $(\beta_1, \beta_2)$ plane.]

  15. Geometric Insight on Sparsity: Supporting Hyperplane
      A hyperplane supports a set iff
      ● the set is contained in one of its half-spaces, and
      ● the set has at least one point on the hyperplane.
      [Figure: supporting hyperplanes at boundary points of several constraint sets in the $(\beta_1, \beta_2)$ plane.]
      There are supporting hyperplanes at all boundary points of convex sets: they generalize tangents.

  16. Geometric Insight on Sparsity: Dual Cone
      The dual cone generalizes normals.
      [Figure: dual cones at boundary points of several constraint sets in the $(\beta_1, \beta_2)$ plane.]
      The shape of the dual cones determines the sparsity pattern.

  17. Expression Recognition: Logistic Regression
      [Figure: features selected for classifying six facial expressions (Surprise, Anger, Sadness, Happiness, Fear, Disgust).]

  18. Prediction of Response to Chemotherapy: Logistic Regression
      [Figure: estimated coefficient magnitudes $|\hat\beta_j|$ plotted over probe sets/genes.]
      No coherent pattern.

  19. Ball crafting: Group sparsity
      (ridge, lasso, group-lasso, hierarchies, coop-lasso)
      ● Additive models (Grandvalet & Canu, 1999; Bakin, 1999)
        ❍ Adaptive metric ⇒ 1 or 2 hyper-parameters (compared to d)
        ❍ Ease of implementation, interpretability
      ● Multiple/Composite Kernel Learning (Lanckriet et al., 2004; Szafranski et al., 2010)
        ❍ Adaptive metric: "learn the kernel" ⇒ 1 hyper-parameter
        ❍ CKL takes into account a group structure on kernels
      ● Sign-coherent groups
        ❍ Multi-task learning for pathway inference (Chiquet et al., 2010)
        ❍ Prediction from cooperative features (Chiquet et al., 2011)

  20. Group-Lasso
      $\min_\beta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \beta), y_i)$ s.t. $\sum_{k=1}^K \Big( \sum_{j \in \mathcal{G}_k} \beta_j^2 \Big)^{1/2} \le c$,
      where $\{\mathcal{G}_k\}_{k=1}^K$ forms a partition of $\{1, \dots, d\}$.
      The solution is sparse groupwise; there is no sign-coherence within groups.
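Groupwise sparsity also has a proximal-operator picture, analogous to soft-thresholding for the plain LASSO: each group is shrunk by its Euclidean norm and dropped entirely when that norm falls below the threshold. A minimal sketch (the function name `group_soft_threshold`, the vector, and the partition are invented):

```python
import numpy as np

def group_soft_threshold(beta, groups, t):
    """Block-wise prox of the group-lasso penalty: scale each group by
    (1 - t / ||beta_g||) when ||beta_g|| > t, otherwise zero the whole group."""
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * beta[g]
    return out

beta = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]  # a partition of the variable indices
print(group_soft_threshold(beta, groups, 1.0))
```

Entire groups are selected or discarded together, but within a surviving group the coefficients keep whatever signs they had, which is the lack of sign-coherence the slide points out and the motivation for the coop-lasso variant.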
