Duality in vv-RKHSs with Infinite Dimensional Outputs: Application to Robust Losses
Pierre Laforgue, Alex Lambert, Luc Brogat-Motte, Florence d'Alché-Buc
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Outline
Motivations
A duality theory for general OVKs
Robust losses as convolutions
Experiments
Conclusion
Motivation 1: structured prediction by surrogate approach
Kernel trick in the input space, kernel trick in the output space [Cortes '05, Geurts '06, Brouard '11, Kadri '13, Brouard '16]: Input Output Kernel Regression (IOKR).
\[
\hat h = \operatorname*{argmin}_{h \in \mathcal{H}_{\mathcal{K}}} \; \frac{1}{2n} \sum_{i=1}^{n} \big\| \varphi(y_i) - h(x_i) \big\|_{\mathcal{F}_{\mathcal{Y}}}^2 + \frac{\Lambda}{2} \| h \|_{\mathcal{H}_{\mathcal{K}}}^2,
\qquad
g(x) = \operatorname*{argmin}_{y \in \mathcal{Y}} \; \big\| \varphi(y) - \hat h(x) \big\|_{\mathcal{F}_{\mathcal{Y}}}^2.
\]
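To fix ideas, here is a minimal numpy sketch of ridge-IOKR: train in the output feature space via the kernel trick, then decode with a pre-image search. The Gaussian kernels, the candidate-set decoding, and all names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def gauss(sigma):
    """Illustrative Gaussian kernel (an assumed choice, not from the paper)."""
    return lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2)
                               / (2 * sigma ** 2))

def gram(k, U, V):
    return np.array([[k(u, v) for v in V] for u in U])

def fit_iokr(X, lam, k_x=gauss(1.0)):
    """Ridge-IOKR training: h(x) = sum_i c_i(x) phi(y_i), with weights
    c(x) = (K_x + n*lam*I)^{-1} k_x(x, .) -- phi(y_i) is never computed."""
    n = len(X)
    K_x = gram(k_x, X, X)
    return np.linalg.inv(K_x + n * lam * np.eye(n))

def decode(x_new, X, Y, Y_candidates, M, k_x=gauss(1.0), k_y=gauss(1.0)):
    """Pre-image step g(x) = argmin_y ||phi(y) - h(x)||^2, expanded with the
    output kernel trick so that only k_y evaluations are needed."""
    c = M @ gram(k_x, [x_new], X)[0]   # combination weights c(x)
    scores = [k_y(y, y) - 2 * c @ gram(k_y, Y, [y])[:, 0] for y in Y_candidates]
    return Y_candidates[int(np.argmin(scores))]
```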
Motivation 2: function to function regression
[Figure: sample EMG curves (millivolts) and lip acceleration curves (m/s²), both over 0–0.6 seconds.]
\[
\min_{h \in \mathcal{H}_{\mathcal{K}}} \; \frac{1}{2n} \sum_{i=1}^{n} \big\| y_i - h(x_i) \big\|_{L^2}^2 + \frac{\Lambda}{2} \| h \|_{\mathcal{H}_{\mathcal{K}}}^2 \qquad \text{[Kadri et al., 2016]}
\]
And many more! e.g. structured data autoencoding [Laforgue et al., 2019]:
\[
\min_{h_1, h_2 \in \mathcal{H}^1_{\mathcal{K}} \times \mathcal{H}^2_{\mathcal{K}}} \; \frac{1}{2n} \sum_{i=1}^{n} \big\| \varphi(x_i) - h_2 \circ h_1\big(\varphi(x_i)\big) \big\|_{\mathcal{F}_{\mathcal{X}}}^2 + \Lambda \, \mathrm{Reg}(h_1, h_2).
\]
Purpose of this work
Question: is it possible to extend the previous approaches to different (ideally robust) loss functions?
First answer: yes, extensions exist for maximum-margin regression [Brouard et al., 2016] and for ε-insensitive loss functions with matrix-valued kernels [Sangnier et al., 2017].
What about general Operator-Valued Kernels (OVKs)? What about other types of loss functions?
A duality theory for general OVKs
Learning in vector-valued RKHSs (vv-RKHSs)
• K : X × X → L(Y) is an operator-valued kernel: K(x, x') = K(x', x)*, and Σ_{i,j} ⟨y_i, K(x_i, x_j) y_j⟩_Y ≥ 0 for all finite families
• Unique vv-RKHS H_K ⊂ F(X, Y), with H_K = Span{ K(·, x) y : (x, y) ∈ X × Y }
• Ex: decomposable OVK K(x, x') = k(x, x') A, with k a scalar kernel and A p.s.d. on Y
• For {(x_i, y_i)}_{i=1}^n ∈ (X × Y)^n with Y a Hilbert space, we want to find
\[
\hat h \in \operatorname*{argmin}_{h \in \mathcal{H}_{\mathcal{K}}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(h(x_i), y_i\big) + \frac{\Lambda}{2} \| h \|_{\mathcal{H}_{\mathcal{K}}}^2.
\]
Representer Theorem [Micchelli and Pontil, 2005]: there exist (α̂_i)_{i=1}^n ∈ Y^n (infinite dimensional!) such that
\[
\hat h(x) = \sum_{i=1}^{n} \mathcal{K}(x, x_i)\, \hat\alpha_i.
\]
When ℓ(·, ·) = ½‖· − ·‖²_Y and K = k · I_Y: α̂_i = Σ_j A_ij y_j, with A = (K + nΛ I_n)⁻¹ and K the scalar Gram matrix.
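To make the closed form concrete, here is a minimal numpy sketch for this decomposable case K = k · I_Y, assuming finite-dimensional outputs stacked as rows of a matrix (function names are illustrative):

```python
import numpy as np

def fit_vv_ridge(K, Y, lam):
    """Squared loss + K(x, x') = k(x, x') I_Y: the representer coefficients
    are alpha_i = sum_j A_ij y_j with A = (K + n*lam*I_n)^{-1}.
    K: (n, n) scalar Gram matrix; Y: (n, d) outputs stacked row-wise."""
    n = K.shape[0]
    A = np.linalg.inv(K + n * lam * np.eye(n))
    return A @ Y                      # (n, d): the alpha_i, stacked

def predict(k_new, alpha):
    """h(x) = sum_i k(x, x_i) alpha_i, with k_new = (k(x, x_1), ..., k(x, x_n))."""
    return k_new @ alpha
```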
Applying duality
\[
\hat h = \operatorname*{argmin}_{h \in \mathcal{H}_{\mathcal{K}}} \; \frac{1}{n} \sum_{i=1}^{n} \ell_i\big(h(x_i)\big) + \frac{\Lambda}{2} \| h \|_{\mathcal{H}_{\mathcal{K}}}^2
\quad \text{is given by} \quad
\hat h = \frac{1}{\Lambda n} \sum_{i=1}^{n} \mathcal{K}(\cdot, x_i)\, \hat\alpha_i,
\]
with (α̂_i)_{i=1}^n ∈ Y^n the solutions to the dual problem:
\[
\min_{(\alpha_i)_{i=1}^n \in \mathcal{Y}^n} \; \sum_{i=1}^{n} \ell_i^{\star}(-\alpha_i) + \frac{1}{2 \Lambda n} \sum_{i,j=1}^{n} \langle \alpha_i, \mathcal{K}(x_i, x_j)\, \alpha_j \rangle_{\mathcal{Y}},
\]
with f⋆ : α ∈ Y ↦ sup_{y ∈ Y} ⟨α, y⟩_Y − f(y) the Fenchel–Legendre (FL) transform of f.
• 1st limitation: the FL transform ℓ⋆ needs to be computable (→ assumption)
• 2nd limitation: the dual variables (α_i)_{i=1}^n are still infinite dimensional!
If the span of the outputs, Span{y_j : j ≤ n}, is invariant by K, i.e. for all (x, x'), y ∈ Span{y_j} ⇒ K(x, x') y ∈ Span{y_j}, then α̂_i ∈ Span{y_j} → possible reparametrization: α̂_i = Σ_j ω̂_ij y_j.
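A one-line computation (immediate from this reparametrization, spelled out here for clarity) shows why the dual becomes finite dimensional: substituting α_i = Σ_k ω_ik y_k into the quadratic term gives
\[
\sum_{i,j} \langle \alpha_i, \mathcal{K}(x_i, x_j)\, \alpha_j \rangle_{\mathcal{Y}}
= \sum_{i,j,k,l} \omega_{ik}\, \omega_{jl}\, \underbrace{\langle y_k, \mathcal{K}(x_i, x_j)\, y_l \rangle_{\mathcal{Y}}}_{M_{ijkl}},
\]
so the objective only involves the n⁴ real numbers M_ijkl and the n² unknowns ω_ij — exactly what the double representer theorem below formalizes.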
The double representer theorem (1/2)
Assume that the OVK K and the loss ℓ satisfy the appropriate assumptions (see paper for details; verified by standard kernels and losses). Then
\[
\hat h = \operatorname*{argmin}_{h \in \mathcal{H}_{\mathcal{K}}} \; \frac{1}{n} \sum_{i} \ell\big(h(x_i), y_i\big) + \frac{\Lambda}{2} \| h \|_{\mathcal{H}_{\mathcal{K}}}^2
\quad \text{is given by} \quad
\hat h = \frac{1}{\Lambda n} \sum_{i,j=1}^{n} \mathcal{K}(\cdot, x_i)\, \hat\omega_{ij}\, y_j,
\]
with Ω̂ = [ω̂_ij] ∈ R^{n×n} the solution to the finite dimensional problem
\[
\min_{\Omega \in \mathbb{R}^{n \times n}} \; \sum_{i=1}^{n} L_i\big(\Omega_{i:},\, K^Y\big) + \frac{1}{2 \Lambda n} \operatorname{Tr}\big( \tilde M^{\top} (\Omega \otimes \Omega) \big),
\]
with M̃ the n² × n² matrix rewriting of the tensor M s.t. M_ijkl = ⟨y_k, K(x_i, x_j) y_l⟩_Y.
The double representer theorem (2/2)
If K further satisfies K(x, x') = Σ_t k_t(x, x') A_t, then the tensor M simplifies to M_ijkl = Σ_t [K^X_t]_ij [K^Y_t]_kl and the problem rewrites
\[
\min_{\Omega \in \mathbb{R}^{n \times n}} \; \sum_{i=1}^{n} L_i\big(\Omega_{i:},\, K^Y\big) + \frac{1}{2 \Lambda n} \sum_{t=1}^{T} \operatorname{Tr}\big( K^X_t\, \Omega\, K^Y_t\, \Omega^{\top} \big).
\]
Rmk. Only the n⁴ tensor ⟨y_k, K(x_i, x_j) y_l⟩_Y is needed to learn OVK machines.
Rmk. Simplifies to two n × n matrices, M_ijkl = K^X_ij K^Y_kl, if K is decomposable.
How to apply the duality approach?
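As an illustration, for the squared loss ℓ_i = ½‖· − y_i‖²_Y (whose conjugate gives ℓ_i⋆(−α_i) = ½‖α_i‖²_Y − ⟨α_i, y_i⟩_Y, a standard computation) and a decomposable kernel K = k · I_Y, every term of the finite-dimensional objective reduces to Gram-matrix algebra. A minimal numpy sketch, with illustrative names:

```python
import numpy as np

def dual_objective_ridge(Omega, K_X, K_Y, lam):
    """Finite-dimensional dual objective, squared loss + K(x,x') = k(x,x') I_Y.
    With alpha_i = sum_k Omega[i,k] y_k:
      sum_i l_i*(-alpha_i) = 0.5*Tr(Omega K_Y Omega^T) - Tr(Omega K_Y),
      coupling term        = Tr(K_X Omega K_Y Omega^T) / (2*lam*n)."""
    n = K_X.shape[0]
    OKY = Omega @ K_Y
    fenchel = 0.5 * np.trace(OKY @ Omega.T) - np.trace(OKY)
    coupling = np.trace(K_X @ OKY @ Omega.T) / (2 * lam * n)
    return fenchel + coupling
```

Setting the gradient to zero gives Ω̂ = (I_n + K^X/(Λn))⁻¹, so that (1/(Λn)) Ω̂ = (K^X + nΛ I_n)⁻¹, recovering the closed form of the representer-theorem slide.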
Robust losses as convolutions
Infimal convolution and Fenchel–Legendre transforms
Infimal-convolution operator □ between proper lower semicontinuous functions [Bauschke et al., 2011]:
\[
(f \,\square\, g)(x) = \inf_y \; f(y) + g(x - y).
\]
Relation to the FL transform: (f □ g)⋆ = f⋆ + g⋆.
Ex: ε-insensitive losses. Let ℓ : Y → R be a convex loss with unique minimum at 0, and ε > 0. The ε-insensitive version of ℓ, denoted ℓ_ε, is defined by
\[
\ell_{\epsilon}(y) = (\ell \,\square\, \chi_{B_{\epsilon}})(y) =
\begin{cases}
\ell(0) & \text{if } \|y\|_{\mathcal{Y}} \le \epsilon, \\
\inf_{\|d\|_{\mathcal{Y}} \le 1} \ell(y - \epsilon d) & \text{otherwise},
\end{cases}
\]
and has FL transform
\[
\ell_{\epsilon}^{\star}(y) = (\ell \,\square\, \chi_{B_{\epsilon}})^{\star}(y) = \ell^{\star}(y) + \epsilon \|y\|_{\mathcal{Y}}.
\]
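A worked instance (a standard computation, added here to connect with the dual problems below): the ridge loss ℓ = ½‖·‖²_Y is self-conjugate, ℓ⋆ = ½‖·‖²_Y, so
\[
\ell_{\epsilon}^{\star}(\alpha) = \tfrac{1}{2} \|\alpha\|_{\mathcal{Y}}^2 + \epsilon \|\alpha\|_{\mathcal{Y}}.
\]
Summed over the dual variables α_i = Σ_j ω_ij y_j, the terms ε‖α_i‖_Y are exactly what becomes the group penalty ε‖W‖_{2,1} in problem (D1) below.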
Interesting loss functions: sparsity and robustness
• ε-Ridge: ½‖·‖² □ χ_{B_ε} (sparsity)
• ε-SVR: ‖·‖ □ χ_{B_ε} (sparsity, robustness)
• κ-Huber: κ‖·‖ □ ½‖·‖² (robustness)
[Figure: 1-D profiles and 2-D surfaces of the three losses, each compared to its non-smoothed counterpart (½‖x‖², ‖x‖, Huber loss).]
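The three infimal convolutions admit simple closed forms (a quick check from the definitions above; function names are illustrative):

```python
import numpy as np

def eps_ridge(y, eps):
    """(0.5||.||^2 □ χ_Bε)(y) = 0.5 * max(||y|| - eps, 0)^2"""
    return 0.5 * max(np.linalg.norm(y) - eps, 0.0) ** 2

def eps_svr(y, eps):
    """(||.|| □ χ_Bε)(y) = max(||y|| - eps, 0)"""
    return max(np.linalg.norm(y) - eps, 0.0)

def huber(y, kappa):
    """(κ||.|| □ 0.5||.||^2)(y): quadratic near 0, linear tails (robustness)."""
    r = np.linalg.norm(y)
    return 0.5 * r ** 2 if r <= kappa else kappa * r - 0.5 * kappa ** 2
```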
Specific dual problems
For the ε-ridge, ε-SVR and κ-Huber, it holds Ω̂ = Ŵ V⁻¹, with Ŵ the solution to the following finite dimensional dual problems (‖·‖_{2,1} and ‖·‖_{2,∞} denote the row-wise mixed norms):
\[
(D1) \quad \min_{W \in \mathbb{R}^{n \times n}} \; \tfrac{1}{2} \| AW - B \|_{\mathrm{Fro}}^2 + \epsilon \| W \|_{2,1},
\]
\[
(D2) \quad \min_{W \in \mathbb{R}^{n \times n}} \; \tfrac{1}{2} \| AW - B \|_{\mathrm{Fro}}^2 + \epsilon \| W \|_{2,1}, \quad \text{s.t. } \| W \|_{2,\infty} \le 1,
\]
\[
(D3) \quad \min_{W \in \mathbb{R}^{n \times n}} \; \tfrac{1}{2} \| AW - B \|_{\mathrm{Fro}}^2, \quad \text{s.t. } \| W \|_{2,\infty} \le \kappa,
\]
with V, A, B such that: VVᵀ = K^Y, AᵀA = K^X/(Λn) + I_n (or AᵀA = K^X/(Λn) for the ε-SVR), and AᵀB = V.
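These are group-lasso-type problems; (D1), for instance, can be solved by proximal gradient with row-wise soft-thresholding. A minimal sketch, assuming A and B have been precomputed as above (solver choice and names are ours, not from the slides):

```python
import numpy as np

def solve_d1(A, B, eps, n_iter=500):
    """Proximal gradient on (D1): 0.5*||AW - B||_Fro^2 + eps*||W||_{2,1}."""
    W = np.zeros((A.shape[1], B.shape[1]))
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1/L with L = ||A^T A||_2
    for _ in range(n_iter):
        G = W - step * A.T @ (A @ W - B)            # gradient step
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        W = np.maximum(1.0 - step * eps / np.maximum(norms, 1e-12), 0.0) * G
    return W                                        # row-wise soft-thresholding

# (D2) and (D3) only differ by an additional projection of each row onto the
# ball of radius 1 (resp. κ); for (D3) the shrinkage disappears entirely.
```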
Experiments
Surrogate approaches for structured prediction
• Experiments on the YEAST dataset
• Empirically, ε-SV-IOKR outperforms ridge-IOKR for a wide range of ε
• Promotes sparsity and acts as a regularizer
Figure 1: test MSEs (ε-SVR vs. KRR) and sparsity (% of null components) w.r.t. Λ, for several values of ε.
Robust function-to-function regression
Task from [Kadri et al., 2016]: predict lip acceleration from EMG signals.
• Dataset augmented with outliers; model learned with the Huber loss
• Improvement over ridge regression (κ = +∞) for every output approximation size m (see paper for the approximation)
Figure 2: LOO generalization error w.r.t. κ, for m ∈ {4, 5, 6, 7, 15}.
Conclusion