
RegML2017@SIMULA Oslo, Class 7: Structured sparsity. Lorenzo Rosasco



  1. RegML2017@SIMULA Oslo, Class 7: Structured sparsity. Lorenzo Rosasco (UNIGE-MIT-IIT), May 4, 2017

  2. Exploiting structure
     The building blocks of a function can be more structured than single variables.

  3. Sparsity
     Variables are divided into non-overlapping groups.

  4. Group sparsity
     ◮ $f(x) = \sum_{j=1}^{d} w_j x_j$
     ◮ $w = (w_1, \ldots, w_d)$, with the coordinates partitioned into blocks $w(1), \ldots, w(G)$
     ◮ each group $\mathcal{G}_g$ has size $|\mathcal{G}_g|$, so $w(g) \in \mathbb{R}^{|\mathcal{G}_g|}$

  5. Group sparsity regularization
     Regularization exploiting structure:
     $$R_{\mathrm{group}}(w) = \sum_{g=1}^{G} \|w(g)\| = \sum_{g=1}^{G} \sqrt{\sum_{j=1}^{|\mathcal{G}_g|} (w(g))_j^2}$$

  6. Group sparsity regularization
     Regularization exploiting structure:
     $$R_{\mathrm{group}}(w) = \sum_{g=1}^{G} \|w(g)\| = \sum_{g=1}^{G} \sqrt{\sum_{j=1}^{|\mathcal{G}_g|} (w(g))_j^2}$$
     Compare to
     $$\sum_{g=1}^{G} \|w(g)\|^2 = \sum_{g=1}^{G} \sum_{j=1}^{|\mathcal{G}_g|} (w(g))_j^2$$

  7. Group sparsity regularization
     Regularization exploiting structure:
     $$R_{\mathrm{group}}(w) = \sum_{g=1}^{G} \|w(g)\| = \sum_{g=1}^{G} \sqrt{\sum_{j=1}^{|\mathcal{G}_g|} (w(g))_j^2}$$
     Compare to
     $$\sum_{g=1}^{G} \|w(g)\|^2 = \sum_{g=1}^{G} \sum_{j=1}^{|\mathcal{G}_g|} (w(g))_j^2$$
     or
     $$\sum_{g=1}^{G} \|w(g)\|_1 = \sum_{g=1}^{G} \sum_{j=1}^{|\mathcal{G}_g|} |(w(g))_j|$$

  8. ℓ1-ℓ2 norm
     We take the ℓ2 norm of each group, forming the vector $(\|w(1)\|, \ldots, \|w(G)\|)$, and then the ℓ1 norm of this vector:
     $$\sum_{g=1}^{G} \|w(g)\|$$
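
As a concrete illustration, here is a minimal NumPy sketch of the ℓ1-ℓ2 norm and of the two comparators from the previous slides, assuming the groups are given as lists of coordinate indices. The names `group_l1l2_norm` and `groups` are illustrative, not from the slides.

```python
import numpy as np

def group_l1l2_norm(w, groups):
    """l1-l2 norm: l2 norm within each group, then l1 norm across groups."""
    return sum(np.linalg.norm(w[g]) for g in groups)

# d = 6 variables split into G = 3 non-overlapping groups
w = np.array([1.0, -2.0, 0.0, 0.0, 3.0, 4.0])
groups = [[0, 1], [2, 3], [4, 5]]

print(group_l1l2_norm(w, groups))   # sum_g ||w(g)||      (group penalty)
print((w ** 2).sum())               # sum_g ||w(g)||^2 = ||w||^2  (ridge)
print(np.abs(w).sum())              # sum_g ||w(g)||_1 = ||w||_1  (lasso)
```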

  9. Group Lasso
     $$\min_{w} \frac{1}{n}\|\hat X w - \hat y\|^2 + \lambda \sum_{g=1}^{G} \|w(g)\|$$
     ◮ reduces to the Lasso if groups have cardinality one

  10. Computations
      $$\min_{w} \frac{1}{n}\|\hat X w - \hat y\|^2 + \lambda \underbrace{\sum_{g=1}^{G} \|w(g)\|}_{\text{non-differentiable}}$$
      Convex, non-smooth, but with composite structure:
      $$w_{t+1} = \mathrm{Prox}_{\gamma\lambda R_{\mathrm{group}}}\!\left(w_t - \gamma\,\frac{2}{n}\,\hat X^\top(\hat X w_t - \hat y)\right)$$

  11. Block thresholding
      It can be shown that
      $$\mathrm{Prox}_{\lambda R_{\mathrm{group}}}(w) = \left(\mathrm{Prox}_{\lambda\|\cdot\|}(w(1)), \ldots, \mathrm{Prox}_{\lambda\|\cdot\|}(w(G))\right)$$
      $$\left(\mathrm{Prox}_{\lambda\|\cdot\|}(w(g))\right)_j =
        \begin{cases}
          (w(g))_j - \lambda \dfrac{(w(g))_j}{\|w(g)\|} & \|w(g)\| > \lambda \\[1ex]
          0 & \|w(g)\| \leq \lambda
        \end{cases}$$
      ◮ Entire groups of coefficients are set to zero!
      ◮ Reduces to soft thresholding if groups have cardinality one
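
To make the computation concrete, here is a minimal NumPy sketch of block thresholding and of the resulting proximal gradient iteration for the group Lasso with non-overlapping groups. The function names (`prox_group`, `group_lasso_ista`), the toy data, and the step-size choice are illustrative, not part of the original slides.

```python
import numpy as np

def prox_group(w, thresh, groups):
    """Block thresholding: shrink each group's l2 norm by `thresh`,
    setting the whole group to zero when its norm is below the threshold."""
    out = w.copy()
    for g in groups:
        norm_g = np.linalg.norm(w[g])
        out[g] = 0.0 if norm_g <= thresh else (1.0 - thresh / norm_g) * w[g]
    return out

def group_lasso_ista(X, y, groups, lam, gamma, n_iter=1000):
    """Proximal gradient for (1/n)||Xw - y||^2 + lam * sum_g ||w(g)||."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)          # gradient of the smooth part
        w = prox_group(w - gamma * grad, gamma * lam, groups)
    return w

# Toy problem where only the first group is active
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
y = X @ np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(50)
groups = [[0, 1], [2, 3], [4, 5]]
gamma = 50 / (2 * np.linalg.norm(X, 2) ** 2)          # 1 / Lipschitz constant of the gradient
print(group_lasso_ista(X, y, groups, lam=0.1, gamma=gamma))
```

For a large enough λ the inactive groups come out exactly zero, which is the block-sparsity effect described on the slide.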

  12. Other norms
      ℓ1-ℓp norms:
      $$R(w) = \sum_{g=1}^{G} \|w(g)\|_p = \sum_{g=1}^{G} \left(\sum_{j=1}^{|\mathcal{G}_g|} |(w(g))_j|^p\right)^{1/p}$$

  13. Overlapping groups
      Variables are divided into possibly overlapping groups.

  14. Regularization with overlapping groups
      Group Lasso:
      $$R_{\mathrm{GL}}(w) = \sum_{g=1}^{G} \|w(g)\|$$

  15. Regularization with overlapping groups
      Group Lasso:
      $$R_{\mathrm{GL}}(w) = \sum_{g=1}^{G} \|w(g)\|$$
      → The selected variables are a union of group complements.

  16. Regularization with overlapping groups
      Let $\bar w(g) \in \mathbb{R}^d$ be equal to $w(g)$ on group $\mathcal{G}_g$ and zero otherwise.

  17. Regularization with overlapping groups
      Let $\bar w(g) \in \mathbb{R}^d$ be equal to $w(g)$ on group $\mathcal{G}_g$ and zero otherwise.
      Group Lasso with overlap:
      $$R_{\mathrm{GLO}}(w) = \inf\left\{ \sum_{g=1}^{G} \|w(g)\| \;\middle|\; w(1), \ldots, w(G)\ \text{s.t.}\ w = \sum_{g=1}^{G} \bar w(g) \right\}$$

  18. Regularization with overlapping groups
      Let $\bar w(g) \in \mathbb{R}^d$ be equal to $w(g)$ on group $\mathcal{G}_g$ and zero otherwise.
      Group Lasso with overlap:
      $$R_{\mathrm{GLO}}(w) = \inf\left\{ \sum_{g=1}^{G} \|w(g)\| \;\middle|\; w(1), \ldots, w(G)\ \text{s.t.}\ w = \sum_{g=1}^{G} \bar w(g) \right\}$$
      ◮ There are multiple ways to write $w = \sum_{g=1}^{G} \bar w(g)$

  19. Regularization with overlapping groups
      Let $\bar w(g) \in \mathbb{R}^d$ be equal to $w(g)$ on group $\mathcal{G}_g$ and zero otherwise.
      Group Lasso with overlap:
      $$R_{\mathrm{GLO}}(w) = \inf\left\{ \sum_{g=1}^{G} \|w(g)\| \;\middle|\; w(1), \ldots, w(G)\ \text{s.t.}\ w = \sum_{g=1}^{G} \bar w(g) \right\}$$
      ◮ There are multiple ways to write $w = \sum_{g=1}^{G} \bar w(g)$
      ◮ The selected variables are unions of groups!

  20. An equivalence
      It holds that
      $$\min_{w} \frac{1}{n}\|\hat X w - \hat y\|^2 + \lambda R_{\mathrm{GLO}}(w)
        \quad\Longleftrightarrow\quad
        \min_{\tilde w} \frac{1}{n}\|\tilde X \tilde w - \hat y\|^2 + \lambda \sum_{g=1}^{G} \|w(g)\|$$
      ◮ $\tilde X$ is the matrix obtained by replicating columns/variables
      ◮ $\tilde w = (w(1), \ldots, w(G))$, a vector with (non-overlapping!) groups
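
A minimal sketch of the replication trick, assuming overlapping groups given as index lists; `replicate_columns` and `recover_w` are illustrative names. The enlarged problem can then be handed to any solver for the non-overlapping group Lasso (for instance the proximal gradient sketch above), and a solution in $\tilde w$ is mapped back through $w = \sum_g \bar w(g)$.

```python
import numpy as np

def replicate_columns(X, groups):
    """Build X_tilde by stacking the columns of each (possibly overlapping) group,
    together with the resulting non-overlapping group structure for w_tilde."""
    blocks, new_groups, start = [], [], 0
    for g in groups:
        blocks.append(X[:, g])
        new_groups.append(list(range(start, start + len(g))))
        start += len(g)
    return np.hstack(blocks), new_groups

def recover_w(w_tilde, groups, d):
    """Map a solution of the enlarged problem back: w = sum_g wbar(g)."""
    w = np.zeros(d)
    start = 0
    for g in groups:
        w[g] += w_tilde[start:start + len(g)]
        start += len(g)
    return w

# Two overlapping groups on d = 5 variables (variable 2 belongs to both)
groups = [[0, 1, 2], [2, 3, 4]]
X = np.random.default_rng(0).standard_normal((40, 5))
X_tilde, tilde_groups = replicate_columns(X, groups)
print(X_tilde.shape, tilde_groups)   # (40, 6), [[0, 1, 2], [3, 4, 5]]
# w_tilde = <non-overlapping group Lasso on (X_tilde, y, tilde_groups)>
# w = recover_w(w_tilde, groups, d=5)
```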

  21. An equivalence (cont.)
      Indeed,
      $$\min_{w} \frac{1}{n}\|\hat X w - \hat y\|^2 + \lambda \inf_{\substack{w(1),\ldots,w(G)\\ \text{s.t.}\ \sum_{g=1}^{G}\bar w(g) = w}} \sum_{g=1}^{G} \|w(g)\|$$
      $$= \inf_{\substack{w(1),\ldots,w(G)\\ \text{s.t.}\ \sum_{g=1}^{G}\bar w(g) = w}} \frac{1}{n}\|\hat X w - \hat y\|^2 + \lambda \sum_{g=1}^{G} \|w(g)\|$$
      $$= \inf_{w(1),\ldots,w(G)} \frac{1}{n}\Big\|\hat X \Big(\sum_{g=1}^{G}\bar w(g)\Big) - \hat y\Big\|^2 + \lambda \sum_{g=1}^{G} \|w(g)\|$$
      $$= \inf_{w(1),\ldots,w(G)} \frac{1}{n}\Big\|\sum_{g=1}^{G} \hat X_{|\mathcal{G}_g} w(g) - \hat y\Big\|^2 + \lambda \sum_{g=1}^{G} \|w(g)\|$$
      $$= \min_{\tilde w} \frac{1}{n}\|\tilde X \tilde w - \hat y\|^2 + \lambda \sum_{g=1}^{G} \|w(g)\|,$$
      where $\hat X_{|\mathcal{G}_g}$ denotes the restriction of $\hat X$ to the columns in $\mathcal{G}_g$.

  22. Computations
      ◮ One can use block thresholding with replicated variables ⇒ potentially wasteful
      ◮ The proximal operator for $R_{\mathrm{GLO}}$ can be computed efficiently, but not in closed form

  23. More structure
      Structured overlapping groups:
      ◮ trees
      ◮ DAGs
      ◮ ...
      Structure can be exploited in computations...

  24. Beyond linear models
      Consider a dictionary made by the union of distinct dictionaries:
      $$f(x) = \sum_{g=1}^{G} \underbrace{\Phi_g(x)^\top w(g)}_{=\, f_g(x)},$$
      where each dictionary defines a feature map
      $$\Phi_g(x) = (\varphi^g_1(x), \ldots, \varphi^g_{p_g}(x)).$$
      Easy extension with the usual change of variable...
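
A minimal sketch of that change of variable for a union of dictionaries: stack the feature maps side by side and keep one group per dictionary, so that the group Lasso selects whole dictionaries. The two feature maps used here (the raw variables and their elementwise squares) are just an illustrative choice.

```python
import numpy as np

def phi_linear(X):
    """First dictionary: the variables themselves."""
    return X

def phi_square(X):
    """Second dictionary: elementwise squares (illustrative)."""
    return X ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))

# f(x) = sum_g Phi_g(x)^T w(g): concatenate the feature maps, one group per map
feature_blocks = [phi_linear(X), phi_square(X)]
Phi = np.hstack(feature_blocks)
groups, start = [], 0
for block in feature_blocks:
    groups.append(list(range(start, start + block.shape[1])))
    start += block.shape[1]
# Running the group Lasso on (Phi, groups) selects entire dictionaries.
```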

  25. Representer theorems
      Let
      $$f(x) = \sum_{g=1}^{G} x^\top \bar w(g) = \sum_{g=1}^{G} \bar x(g)^\top \bar w(g) = \sum_{g=1}^{G} f_g(x).$$

  26. Representer theorems
      Let
      $$f(x) = \sum_{g=1}^{G} x^\top \bar w(g) = \sum_{g=1}^{G} \bar x(g)^\top \bar w(g) = \sum_{g=1}^{G} f_g(x).$$
      Idea: show that
      $$\bar w(g) = \sum_{i=1}^{n} \bar x(g)_i\, c(g)_i,$$
      i.e.
      $$f_g(x) = \bar x(g)^\top \sum_{i=1}^{n} \bar x(g)_i\, c(g)_i = \sum_{i=1}^{n} \underbrace{\bar x(g)^\top \bar x(g)_i}_{\Phi_g(x)^\top \Phi_g(x_i) = K_g(x, x_i)} c(g)_i.$$

  27. Representer theorems
      Let
      $$f(x) = \sum_{g=1}^{G} x^\top \bar w(g) = \sum_{g=1}^{G} \bar x(g)^\top \bar w(g) = \sum_{g=1}^{G} f_g(x).$$
      Idea: show that
      $$\bar w(g) = \sum_{i=1}^{n} \bar x(g)_i\, c(g)_i,$$
      i.e.
      $$f_g(x) = \bar x(g)^\top \sum_{i=1}^{n} \bar x(g)_i\, c(g)_i = \sum_{i=1}^{n} \underbrace{\bar x(g)^\top \bar x(g)_i}_{\Phi_g(x)^\top \Phi_g(x_i) = K_g(x, x_i)} c(g)_i.$$
      Note that in this case
      $$\|f_g\|^2 = \|w(g)\|^2 = c(g)^\top \underbrace{\hat X(g)\hat X(g)^\top}_{\hat K(g)} c(g).$$

  28. Coefficients update
      $$c_{t+1} = \mathrm{Prox}_{\gamma\lambda R_{\mathrm{group}}}\!\left(c_t - \gamma(\hat K c_t - \hat y)\right),$$
      where $\hat K = (\hat K(1), \ldots, \hat K(G))$ and $c_t = (c_t(1), \ldots, c_t(G))$.
      Block thresholding: it can be shown that
      $$\left(\mathrm{Prox}_{\lambda\|\cdot\|}(c(g))\right)_j =
        \begin{cases}
          (c(g))_j - \lambda \dfrac{(c(g))_j}{\underbrace{\sqrt{c(g)^\top \hat K(g)\, c(g)}}_{\|f_g\|}} & \|f_g\| > \lambda \\[2ex]
          0 & \|f_g\| \leq \lambda
        \end{cases}$$

  29. Non-parametric sparsity
      $$f(x) = \sum_{g=1}^{G} f_g(x)$$
      $$f_g(x) = \sum_{i=1}^{n} \bar x(g)^\top \bar x(g)_i\,(c(g))_i \;\mapsto\; f_g(x) = \sum_{i=1}^{n} K_g(x, x_i)\,(c(g))_i$$
      with $(K_1, \ldots, K_G)$ a family of kernels, and
      $$\sum_{g=1}^{G} \|w(g)\| \;\Longrightarrow\; \sum_{g=1}^{G} \|f_g\|_{K_g}$$
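
A minimal sketch of the kernelized representation $f_g(x) = \sum_i K_g(x, x_i)(c(g))_i$, using a family of Gaussian kernels of different widths. The coefficients `c` are left generic here (they would come from an iterative scheme such as the coefficient update above), and all names, widths, and data are illustrative.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """K_g(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); one kernel per width sigma."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 3))
X_test = rng.standard_normal((5, 3))
sigmas = [0.5, 1.0, 2.0]                         # the family (K_1, ..., K_G)
c = [rng.standard_normal(20) for _ in sigmas]    # placeholder coefficients c(g)

# f(x) = sum_g f_g(x),  with  f_g(x) = sum_i K_g(x, x_i) (c(g))_i
f_test = sum(gaussian_kernel(X_test, X_train, s) @ c_g
             for s, c_g in zip(sigmas, c))
print(f_test)
```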

  30. ℓ1 MKL
      $$\inf_{\substack{w(1),\ldots,w(G)\\ \text{s.t.}\ \sum_{g=1}^{G}\bar w(g) = w}} \frac{1}{n}\|\hat X w - \hat y\|^2 + \lambda \sum_{g=1}^{G} \|w(g)\|$$
      $$\Downarrow$$
      $$\min_{\substack{f_1,\ldots,f_G\\ \text{s.t.}\ \sum_{g=1}^{G} f_g = f}} \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2 + \lambda \sum_{g=1}^{G} \|f_g\|_{K_g}$$

  31. ℓ2 MKL
      $$\sum_{g=1}^{G} \|w(g)\|^2 \;\Longrightarrow\; \sum_{g=1}^{G} \|f_g\|^2_{K_g}$$
      Corresponds to using the kernel
      $$K(x, x') = \sum_{g=1}^{G} K_g(x, x')$$
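
Since ℓ2 MKL corresponds to using the sum kernel, it can be run as plain kernel ridge regression on $K = \sum_g K_g$, whose solution is $c = (K + \lambda n I)^{-1}\hat y$ for the objective $\frac{1}{n}\sum_i (y_i - f(x_i))^2 + \lambda\|f\|^2_K$. A minimal sketch, again with Gaussian kernels of a few widths; the widths, λ, and data are illustrative.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n = 40
X = rng.standard_normal((n, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
sigmas = [0.5, 1.0, 2.0]
lam = 0.1

K = sum(gaussian_kernel(X, X, s) for s in sigmas)     # K = sum_g K_g
c = np.linalg.solve(K + lam * n * np.eye(n), y)       # kernel ridge regression
y_fit = K @ c                                         # fitted values f(x_i)
```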

  32. ℓ1 or ℓ2 MKL?
      ◮ ℓ2 is *much* faster
      ◮ ℓ1 can be useful if only a few kernels are relevant

  33. Why MKL?
      ◮ Data fusion: different features
      ◮ Model selection, e.g. Gaussian kernels with different widths
      ◮ Richer models: many kernels!

  34. MKL & kernel learning
      It can be shown that
      $$\min_{\substack{f_1,\ldots,f_G\\ \text{s.t.}\ \sum_{g=1}^{G} f_g = f}} \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2 + \lambda \sum_{g=1}^{G} \|f_g\|_{K_g}
        \quad\Longleftrightarrow\quad
        \min_{K \in \mathcal{K}}\ \min_{f \in \mathcal{H}_K} \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2 + \lambda \|f\|^2_K$$
      where $\mathcal{K} = \{ K \mid K = \sum_{g} \alpha_g K_g,\ \alpha_g \geq 0 \}$.
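
One way to read the right-hand problem: for each candidate kernel $K = \sum_g \alpha_g K_g$ the inner minimization is just kernel ridge regression, so the outer problem searches over the weights $\alpha_g \geq 0$. Below is a minimal brute-force sketch over a small grid of weights; the grid, data, and λ are illustrative, and real MKL solvers optimize the weights rather than enumerating them.

```python
import numpy as np
from itertools import product

def gaussian_kernel(X1, X2, sigma):
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n = 40
X = rng.standard_normal((n, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
Ks = [gaussian_kernel(X, X, s) for s in (0.5, 1.0, 2.0)]
lam = 0.1

best = None
for alphas in product([0.0, 0.5, 1.0], repeat=len(Ks)):    # crude grid over alpha_g >= 0
    if sum(alphas) == 0:
        continue
    K = sum(a * Kg for a, Kg in zip(alphas, Ks))
    c = np.linalg.solve(K + lam * n * np.eye(n), y)         # inner problem: kernel ridge
    obj = ((y - K @ c) ** 2).mean() + lam * c @ K @ c       # (1/n)||y - f||^2 + lam ||f||_K^2
    if best is None or obj < best[0]:
        best = (obj, alphas)
print(best)
```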

  35. Sparsity beyond vectors
      Recall multi-variable regression: $(x_i, y_i)_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}^T$, and
      $$f(x) = x^\top \underbrace{W}_{d \times T}$$
      $$\min_{W} \|\hat X W - \hat Y\|^2_F + \lambda\, \mathrm{Tr}(W A W^\top)$$

  36. Sparse regularization
      ◮ We have seen
      $$\mathrm{Tr}(WW^\top) = \sum_{j=1}^{d}\sum_{t=1}^{T} (W_{j,t})^2$$
      ◮ We could now consider
      $$\sum_{j=1}^{d}\sum_{t=1}^{T} |W_{j,t}|$$
      ◮ ...
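
A small numerical check of the two matrix penalties, with an arbitrary $d \times T$ coefficient matrix:

```python
import numpy as np

W = np.array([[1.0, 0.0, -2.0],
              [0.0, 3.0,  0.0]])            # a d x T matrix (here 2 x 3)

print(np.trace(W @ W.T))                    # Tr(W W^T) = sum_{j,t} W_{j,t}^2
print((W ** 2).sum())                       # the same quantity, computed entrywise
print(np.abs(W).sum())                      # sum_{j,t} |W_{j,t}|: the sparse alternative
```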
