Frank-Wolfe Splitting via Augmented Lagrangian Method


  1. Frank-Wolfe Splitting via Augmented Lagrangian Method
     Fabian Pedregosa (2), Simon Lacoste-Julien (1), Gauthier Gidel (1)
     (1) MILA, DIRO, Université de Montréal; (2) UC Berkeley & ETH Zurich
     April 2018
     Slide footer: Gauthier Gidel, FW Splitting via ALM, April 2018

  2. Why Frank-Wolfe is wonderful.
     ◮ Constrained optimization algorithm: $\min_{x \in \mathcal{C}} f(x)$, with $f$ convex and $\mathcal{C}$ convex and compact.
     ◮ Interesting for highly structured constraint sets:
        Permutahedron: [Lancia and Serafini, 2018], [Evangelopoulos et al., 2017]
        Alignment constraint: [Alayrac et al., 2016]

  5. Why Frank-Wolfe is wonderful.
     ◮ Constrained optimization algorithm: $\min_{x \in \mathcal{C}} f(x)$, with $f$ convex and $\mathcal{C}$ convex and compact.
     ◮ Interesting when projection is not practical. (Figure: projection vs. linear minimization oracle.)
     ◮ When projection is practical, it is better to use the projected gradient method.

  6. Why Frank-Wolfe sometimes is not enough.
     ◮ FW requires a linear minimization oracle (LMO) over the constraint set: $\mathrm{LMO}(d) := \arg\min_{x \in \mathcal{C}} \langle d, x \rangle$.
     ◮ Intersection of constraint sets: $\mathcal{C}_1 \cap \mathcal{C}_2$.
     ◮ $\mathrm{LMO}_{\mathcal{C}_1 \cap \mathcal{C}_2}(d)$ may be too expensive.
     ◮ FW-AL just requires $\mathrm{LMO}_{\mathcal{C}_1}(d)$ and $\mathrm{LMO}_{\mathcal{C}_2}(d)$.
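
To make the LMO definition above concrete, here is a minimal sketch of two closed-form oracles. The choice of sets (an $\ell_1$ ball and the probability simplex) and the function names `lmo_l1_ball` and `lmo_simplex` are assumptions made for illustration; the slides only define the oracle abstractly.

```python
def lmo_l1_ball(d, radius=1.0):
    """Return a minimizer of <d, x> over {x : ||x||_1 <= radius}."""
    # A linear function over the l1 ball is minimized at a signed vertex:
    # put all the mass on the coordinate with the largest |d_i|.
    i = max(range(len(d)), key=lambda j: abs(d[j]))
    x = [0.0] * len(d)
    if d[i] != 0:
        x[i] = -radius if d[i] > 0 else radius
    return x


def lmo_simplex(d):
    """Return a minimizer of <d, x> over the probability simplex."""
    # The minimum is attained at the vertex with the smallest d_i.
    i = min(range(len(d)), key=lambda j: d[j])
    x = [0.0] * len(d)
    x[i] = 1.0
    return x


d = [0.5, -2.0, 1.0]
print(lmo_l1_ball(d))  # [0.0, 1.0, 0.0]
print(lmo_simplex(d))  # [0.0, 1.0, 0.0]
```

Each oracle costs one pass over `d`. No comparably simple rule is available for the intersection of two such sets in general, which is the situation FW-AL targets.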

  9. Simultaneously sparse and low rank matrix recovery
     Proposed by Richard et al. [2012]: $\min_{S \succeq 0,\ \|S\|_1 \le \beta_1,\ \|S\|_* \le \beta_2} \|S - \hat{\Sigma}\|_2^2$.
     ◮ Sparsity constraint: $\mathcal{C}_1 := \{ S \succeq 0,\ \|S\|_1 \le \beta_1 \}$. $\mathrm{LMO}_{\mathcal{C}_1}(D)$ = largest coefficient of the matrix: $O(d^2)$.
     ◮ Low rank constraint: $\mathcal{C}_2 := \{ S \succeq 0,\ \|S\|_* \le \beta_2 \}$. $\mathrm{LMO}_{\mathcal{C}_2}(D)$ = largest eigenvector: $O(d^2 / \sqrt{\epsilon})$.
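
As a hedged illustration of the second oracle, the sketch below computes an LMO over $\{ S \succeq 0,\ \|S\|_* \le \beta \}$ for a symmetric matrix $D$: the minimizer is $\beta\, v v^\top$, where $v$ is a unit eigenvector for the smallest eigenvalue of $D$, or the zero matrix when that eigenvalue is nonnegative. Plain power iteration on a shifted matrix is swapped in here for simplicity; the slides do not say which eigensolver the $O(d^2/\sqrt{\epsilon})$ cost refers to. The name `lmo_psd_trace_ball`, the Gershgorin shift, the fixed seed, and the iteration count are all assumptions.

```python
import math
import random


def lmo_psd_trace_ball(D, beta, iters=200, seed=0):
    """Approximate a minimizer of <D, S> over {S PSD, trace(S) <= beta}."""
    n = len(D)
    # Gershgorin bound: every eigenvalue of the symmetric matrix D lies in [-c, c].
    c = max(sum(abs(v) for v in row) for row in D)
    # Random unit start vector, so it is almost surely not orthogonal
    # to the eigenvector we are looking for.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    v = [t / norm for t in v]
    # Power iteration on B = c*I - D: its top eigenvector is the
    # eigenvector of D with the smallest eigenvalue.
    for _ in range(iters):
        w = [c * v[i] - sum(D[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    # Rayleigh quotient: estimate of the smallest eigenvalue of D.
    lam_min = sum(v[i] * D[i][j] * v[j] for i in range(n) for j in range(n))
    if lam_min >= 0.0:
        return [[0.0] * n for _ in range(n)]  # the zero matrix is optimal
    return [[beta * v[i] * v[j] for j in range(n)] for i in range(n)]


S = lmo_psd_trace_ball([[2.0, 0.0], [0.0, -1.0]], beta=2.0)
print(S)  # close to [[0, 0], [0, 2]]
```

Power iteration converges slowly when the two smallest eigenvalues of $D$ are close, so a production implementation would more likely call a Lanczos-type routine from a linear algebra library.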

  12. Multiple sequence alignment
      Proposed by Yen et al. [2016a]: $\min_{W \in \mathcal{A} \cap \mathcal{P}} \langle W, D \rangle$.
      ◮ $W$: alignment of the sequences. $D$: cost matrix.
      ◮ $\mathcal{A}$: alignment constraint. Each alignment with the consensus sequence is valid.
      ◮ $\mathcal{P}$: consensus constraint. Alignments are consistent with each other.

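  16. Structured SVM
      Proposed by Yen et al. [2016b]. Dual problem:
      $\min_{\alpha_f \in \Delta^{|\mathcal{Y}_f|}} \ \frac{1}{2} \sum_{F \in \mathcal{T}} \|A_F \alpha\|_2^2 - \sum_{j \in \mathcal{V}} \delta_j^\top \alpha_j$
      s.t. $M_{fi}\, \alpha_f = \alpha_i$, $f \in F$, $F \in \mathcal{T}$, $i \in \mathcal{N}(f)$.
      ◮ $\mathcal{V}$: variables. $\mathcal{T}$: factor templates. $\mathcal{N}(f)$: neighbors of $f$.
      ◮ Consistency constraint: $M_{11} x^{(1)} = \alpha_1$, $M_{12} x^{(1)} = \alpha_2$, ...
      (Figure: diagram with nodes $x^{(1)}, x^{(2)}$ and $\alpha_1, \alpha_2, \alpha_3$.)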

  19. General Formulation
      $\underset{x^{(1)}, \ldots, x^{(K)}}{\text{minimize}} \ f(x^{(1)}, \ldots, x^{(K)})$ s.t. $x^{(k)} \in \mathcal{C}_k$, $k \in [K]$, and $\sum_{k=1}^{K} A_k x^{(k)} = 0$.
      ◮ $f$ is convex and smooth (gradient Lipschitz).
      ◮ $\mathcal{C}_k$, $k \in \{1, \ldots, K\}$, are convex compact.
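
As a worked instance of this formulation, minimizing a smooth convex function $g$ over an intersection $\mathcal{C}_1 \cap \mathcal{C}_2$ can be written with $K = 2$ by duplicating the variable. The symbol $g$ and the specific choice $A_1 = I$, $A_2 = -I$ are introduced here for illustration:

```latex
\min_{x \in \mathcal{C}_1 \cap \mathcal{C}_2} g(x)
\quad \Longleftrightarrow \quad
\min_{x^{(1)} \in \mathcal{C}_1,\ x^{(2)} \in \mathcal{C}_2} g\big(x^{(1)}\big)
\quad \text{s.t.} \quad x^{(1)} - x^{(2)} = 0,
\qquad
f\big(x^{(1)}, x^{(2)}\big) := g\big(x^{(1)}\big), \quad A_1 = I, \quad A_2 = -I .
```

The constraint sets now appear separately, so only $\mathrm{LMO}_{\mathcal{C}_1}$ and $\mathrm{LMO}_{\mathcal{C}_2}$ are needed, at the price of a linear coupling constraint.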

  20. Augmented Lagrangian Method
      ◮ Augmented Lagrangian trick to get rid of $\sum_{k=1}^{K} A_k x^{(k)} = 0$.
      ◮ Take $M$ s.t. $Mx = 0 \Leftrightarrow \sum_{k=1}^{K} A_k x^{(k)} = 0$, and the functions
         $\mathcal{L}(x, y) := f(x) + \langle y, Mx \rangle + \frac{\lambda}{2} \|Mx\|^2$,
         $p(x) := \max_{y \in \mathbb{R}^d} \mathcal{L}(x, y) = f(x)$ if $Mx = 0$, and $+\infty$ otherwise.
      ◮ Augmented Lagrangian formulation of our problem:
         $\underset{x}{\text{minimize}} \ \max_{y \in \mathbb{R}^d} \mathcal{L}(x, y)$ s.t. $x \in \mathcal{X} := \mathcal{C}_1 \times \cdots \times \mathcal{C}_K$.
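
The following sketch evaluates $\mathcal{L}(x, y)$ on a toy instance to show why the inner maximization over $y$ enforces $Mx = 0$. The instance (a scalar constraint $x_1 - x_2 = 0$ with $f(x) = \frac{1}{2}(x_1 - 1)^2$) and the helper names `matvec` and `augmented_lagrangian` are assumptions for illustration only.

```python
def matvec(M, x):
    """Dense matrix-vector product on plain lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def augmented_lagrangian(f, M, x, y, lam):
    """L(x, y) = f(x) + <y, Mx> + (lam / 2) * ||Mx||^2."""
    r = matvec(M, x)  # constraint residual Mx
    inner = sum(y_i * r_i for y_i, r_i in zip(y, r))
    return f(x) + inner + 0.5 * lam * sum(r_i * r_i for r_i in r)


# Toy instance: x = (x1, x2), constraint x1 - x2 = 0.
M = [[1.0, -1.0]]
f = lambda x: 0.5 * (x[0] - 1.0) ** 2

feasible, infeasible = [0.5, 0.5], [0.5, 0.0]
for t in (1.0, 10.0, 100.0):
    # At a feasible point L equals f(x) for every y; at an infeasible
    # point it grows without bound as y scales along the residual.
    print(t,
          augmented_lagrangian(f, M, feasible, [t], 1.0),
          augmented_lagrangian(f, M, infeasible, [t], 1.0))
```

At the feasible point the printed value stays at $f(x)$ regardless of $y$, while at the infeasible point it increases linearly in $y$, matching the two cases in the definition of $p(x)$.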

  23. FW-AL algorithm
      $\underset{x}{\text{minimize}} \ \max_{y \in \mathbb{R}^d} \mathcal{L}(x, y)$ s.t. $x \in \mathcal{X} := \mathcal{C}_1 \times \cdots \times \mathcal{C}_K$.
      ◮ Standard AL method:
         $x_{t+1} = \arg\min_{x \in \mathcal{X}} \mathcal{L}(x, y_t)$ (argmin step),
         $y_{t+1} = y_t + \eta_t M x_{t+1}$ (gradient ascent step).
      ◮ Replace arg min steps by FW steps. FW-AL:
         $x_{t+1} = \mathrm{FW}(x_t; \mathcal{L}(\cdot, y_t))$ (Frank-Wolfe step),
         $y_{t+1} = y_t + \eta_t M x_{t+1}$ (gradient ascent step).
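
The two-line FW-AL update above can be sketched on a one-dimensional toy problem. Everything specific here is an assumption for illustration and not taken from the slides: the instance (minimize $\frac{1}{2} x_1^2$ with $x_1 \in [0, 2]$, $x_2 \in [1, 3]$, $x_1 - x_2 = 0$, whose solution is $x_1 = x_2 = 1$), the constant dual step `eta`, the penalty `lam`, and the "short step" rule used for the Frank-Wolfe step size.

```python
def lmo_interval(g, lo, hi):
    """LMO over [lo, hi]: minimize g * s over s in the interval."""
    return lo if g > 0 else hi


def fw_al(n_iters=2000, lam=1.0, eta=0.1):
    # Toy instance: f(x) = 0.5 * x1**2, C1 = [0, 2], C2 = [1, 3], M = [1, -1].
    x1, x2, y = 0.0, 3.0, 0.0
    lip = 1.0 + 2.0 * lam  # smoothness bound for L(., y): L_f + lam * ||M||^2
    for _ in range(n_iters):
        r = x1 - x2  # constraint residual Mx
        # Gradient of L(x, y) = f(x) + y * Mx + (lam / 2) * (Mx)**2 in x.
        g1 = x1 + y + lam * r
        g2 = -y - lam * r
        # The LMO over C1 x C2 splits into one LMO call per set.
        s1 = lmo_interval(g1, 0.0, 2.0)
        s2 = lmo_interval(g2, 1.0, 3.0)
        d1, d2 = s1 - x1, s2 - x2
        gap = -(g1 * d1 + g2 * d2)  # Frank-Wolfe gap, nonnegative
        sq = d1 * d1 + d2 * d2
        gamma = min(1.0, gap / (lip * sq)) if sq > 0 else 0.0
        x1, x2 = x1 + gamma * d1, x2 + gamma * d2  # Frank-Wolfe step
        y += eta * (x1 - x2)  # gradient ascent step on the dual variable
    return x1, x2, y


x1, x2, y = fw_al()
print(x1, x2, abs(x1 - x2))
```

Each iteration performs a single Frank-Wolfe step rather than solving the inner argmin exactly, and it only ever calls the oracle of each set separately. The iterates stay in $\mathcal{C}_1 \times \mathcal{C}_2$ because every update is a convex combination, while the coupling constraint is only satisfied in the limit.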
