From Nesterov's Estimate Sequence To Riemannian Acceleration

From Nesterov’s Estimate Sequence To Riemannian Acceleration Kwangjun Ahn, Suvrit Sra COLT 2020 arXiv: https://arxiv.org/abs/2001.08876

Riemannian Optimization?
• (Euclidean) Optimization: $\min_{x \in \mathbb{R}^n} f(x)$, where $f: \mathbb{R}^n \to \mathbb{R}$
• Riemannian Optimization: $\min_{x \in M} f(x)$, where $f: M \to \mathbb{R}$ and $M$ = a Riemannian manifold

Accel. Gradient Method! • Yurii Nesterov, '80s
Accel. Gradient Descent: for $t = 0, 1, 2, \dots$
$x_{t+1} = y_t + \alpha_{t+1} (z_t - y_t)$
$y_{t+1} = x_{t+1} - \gamma_{t+1} \nabla f(x_{t+1})$
$z_{t+1} = x_{t+1} + \beta_{t+1} (z_t - x_{t+1}) - \eta_{t+1} \nabla f(x_{t+1})$
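The three-sequence update above can be sketched in plain Python. The slide does not state the parameter values, so this sketch assumes the standard constant-parameter choice from Nesterov's estimate sequence for a $\mu$-strongly convex, $L$-smooth $f$: $\gamma = 1/L$, $\alpha = \kappa/(1+\kappa)$, $\beta = 1 - \kappa$, $\eta = \kappa/\mu$ with $\kappa = \sqrt{\mu/L}$. The quadratic test problem and the function names are illustrative only.

```python
import math

def agd(grad, x0, mu, L, steps):
    """Three-sequence accelerated gradient descent with constant parameters."""
    kappa = math.sqrt(mu / L)        # assumed parameter choice, not from the slide
    alpha = kappa / (1.0 + kappa)    # coupling weight used to form x_{t+1}
    gamma = 1.0 / L                  # gradient step size for y_{t+1}
    beta = 1.0 - kappa               # pull of z_t toward x_{t+1}
    eta = kappa / mu                 # gradient step size for z_{t+1}
    y, z = list(x0), list(x0)
    for _ in range(steps):
        x = [yi + alpha * (zi - yi) for yi, zi in zip(y, z)]
        g = grad(x)
        y = [xi - gamma * gi for xi, gi in zip(x, g)]
        z = [xi + beta * (zi - xi) - eta * gi for xi, zi, gi in zip(x, z, g)]
    return y

def gd(grad, x0, L, steps):
    """Plain gradient descent with step size 1/L, for comparison."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return x

# Toy problem: f(x) = 0.5 * sum(lam_i * x_i^2), minimized at x* = 0 with f(x*) = 0.
lams = [1.0 + 99.0 * i / 19 for i in range(20)]   # Hessian eigenvalues in [1, 100]

def f(x):
    return 0.5 * sum(l * xi * xi for l, xi in zip(lams, x))

def grad_f(x):
    return [l * xi for l, xi in zip(lams, x)]

x0 = [1.0] * 20
f_agd = f(agd(grad_f, x0, mu=1.0, L=100.0, steps=200))
f_gd = f(gd(grad_f, x0, L=100.0, steps=200))
print(f_agd, f_gd)
```

On this ill-conditioned quadratic the accelerated iterate reaches a much smaller objective value than gradient descent after the same number of steps.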

Accel. Gradient Method: Theory • Yurii Nesterov, '80s
Nesterov showed: for $\mu \preceq \nabla^2 f(x) \preceq L$,
$f(y_t) - f(x^*) = O\big((1 - \sqrt{\mu/L})^t\big)$.
For an $\epsilon$-approx. solution, we only need $t = O\big(\sqrt{L/\mu} \, \log(1/\epsilon)\big)$.
C.f. Gradient Descent: for $\mu \preceq \nabla^2 f(x) \preceq L$,
$f(x_t) - f(x^*) = O\big((1 - \mu/L)^t\big)$.
For an $\epsilon$-approx. solution, we need $O\big((L/\mu) \log(1/\epsilon)\big)$ many iterations.
Acceleration! (and indeed optimal for this class!)
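To make the gap concrete, the sketch below counts how many iterations a contraction factor of $1 - \mu/L$ versus $1 - \sqrt{\mu/L}$ needs to shrink an error by $\epsilon$. The constants hidden in the $O(\cdot)$ bounds are ignored, and the values of $\mu$, $L$, and $\epsilon$ are hypothetical.

```python
import math

def iterations_for(contraction, eps):
    """Smallest integer t with (1 - contraction)**t <= eps."""
    return math.ceil(math.log(eps) / math.log(1.0 - contraction))

mu, L, eps = 1.0, 1.0e4, 1.0e-6              # hypothetical problem constants
t_gd = iterations_for(mu / L, eps)             # gradient descent rate 1 - mu/L
t_agd = iterations_for(math.sqrt(mu / L), eps) # accelerated rate 1 - sqrt(mu/L)
print(t_gd, t_agd)
```

With condition number $L/\mu = 10^4$ the two counts differ by roughly a factor of $\sqrt{L/\mu} = 100$.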

Natural Question..
Could we develop such a landmark result for curved spaces (Riem. manifolds)?
Turns out to be a challenging question:
• Liu et al. '17 (NIPS) reduces the task to solving nonlinear equations. Not clear whether these equations are even feasible or tractably solvable.
• Alimisis et al. '20 (AISTATS): continuous dynamics approach. Not clear whether the discretization yields accel.
• Most concrete result: Zhang-Sra '18 (COLT) proposed an alg. guaranteed to accel. locally. Global accel.? Open!

Challenge!
• Nesterov's analysis is called the Estimate Sequence technique.
• Nesterov's analysis relies on linear structure! Not clear if it generalizes to a non-linear space like Riem. manifolds.
• Nesterov's analysis entails non-trivial algebraic tricks! Hard to understand; its scope has puzzled researchers for years.

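Riemannian Accel. GD
(Euclidean) Accel. Gradient Descent:
$x_{t+1} = y_t + \alpha_{t+1} (z_t - y_t)$
$y_{t+1} = x_{t+1} - \gamma_{t+1} \nabla f(x_{t+1})$
$z_{t+1} = x_{t+1} + \beta_{t+1} (z_t - x_{t+1}) - \eta_{t+1} \nabla f(x_{t+1})$
Riemannian Accel. Gradient Descent:
$x_{t+1} = \mathrm{Exp}_{y_t}\big(\alpha_{t+1} \cdot \mathrm{Exp}_{y_t}^{-1}(z_t)\big)$
$y_{t+1} = \mathrm{Exp}_{x_{t+1}}\big(-\gamma_{t+1} \cdot \nabla f(x_{t+1})\big)$
$z_{t+1} = \mathrm{Exp}_{x_{t+1}}\big(\beta_{t+1} \cdot \mathrm{Exp}_{x_{t+1}}^{-1}(z_t) - \eta_{t+1} \nabla f(x_{t+1})\big)$
Space is curved, which causes "distortion".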

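1. How does this affect the convergence rate?
• Non-linear recursive relation: $\frac{\xi_{t+1} (\xi_{t+1} - \mu/L)}{1 - \xi_{t+1}} = \frac{1}{\delta} \xi_t^2$
• The more severe the distortion gets, the slower the convergence rate becomes!
• No matter how severe the distortion, Riem. AGD is always faster than RGD!
• To achieve full accel., i.e. $\sqrt{\mu/L}$, we need to bring $\delta$ down to 1!
[Figure: the curves $\tau(u) = \frac{u (u - \mu/L)}{1 - u}$ and $\theta(u) = u^2$, $\frac{1}{2} u^2$, $\frac{1}{5} u^2$ (the last labeled $\delta = 5$), with axis marks at $\mu/L$, $\sqrt{\mu/L}$, and 1.]
How do we control/estimate the distortion?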
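The claims on this slide can be checked with a short worked derivation. It assumes the recursion reads $\xi_{t+1}(\xi_{t+1} - \mu/L)/(1 - \xi_{t+1}) = \xi_t^2/\delta$ with a constant distortion $\delta \ge 1$, and looks at its fixed point $\xi^\ast$; the symbol $\xi^\ast$ is introduced here for illustration.

```latex
% Fixed point: set \xi_{t+1} = \xi_t = \xi^\ast > 0 and divide both sides by \xi^\ast.
\[
  \delta \left( \xi^\ast - \frac{\mu}{L} \right) = \xi^\ast (1 - \xi^\ast)
  \quad\Longleftrightarrow\quad
  (\xi^\ast)^2 + (\delta - 1)\, \xi^\ast - \delta \frac{\mu}{L} = 0 ,
\]
\[
  \xi^\ast(\delta) = \frac{-(\delta - 1) + \sqrt{(\delta - 1)^2 + 4 \delta \mu / L}}{2} .
\]
% No distortion recovers the fully accelerated rate:
\[
  \xi^\ast(1) = \sqrt{\mu / L} .
\]
% The quadratic is negative at \xi = \mu/L whenever \mu < L:
\[
  \left( \frac{\mu}{L} \right)^2 + (\delta - 1) \frac{\mu}{L} - \delta \frac{\mu}{L}
  = \frac{\mu}{L} \left( \frac{\mu}{L} - 1 \right) < 0
  \quad\Longrightarrow\quad
  \xi^\ast(\delta) > \frac{\mu}{L} ,
\]
% and the fixed point decreases toward the gradient-descent rate as distortion grows:
\[
  \lim_{\delta \to \infty} \xi^\ast(\delta) = \frac{\mu}{L} .
\]
```

Under this reading, larger $\delta$ gives a smaller $\xi^\ast$ and hence a slower rate $1 - \xi^\ast$, but the rate never degrades to that of plain gradient descent.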

Global Accel. for Riem. Case!
Thm. Given $\xi_0 > 0$, find $\xi_{t+1} \in (2\mu\Delta, 1)$ such that
$\frac{\xi_{t+1} (\xi_{t+1} - 2\mu\Delta)}{1 - \xi_{t+1}} = \frac{1}{\delta_{t+1}} \xi_t^2$,
where $\delta_{t+1} = T(d(x_t, z_t))$, the magnitude of metric distortion at iteration $t$, for some computable function $T$. Then
$f(y_{t+1}) - f(x^*) = O\big((1 - \xi_1)(1 - \xi_2) \cdots (1 - \xi_{t+1})\big)$
s.t.
(1) $\xi_t > \mu/L$ for all $t$: strictly faster than nonaccel. GD!
(2) $\xi_t$ quickly converges to $\sqrt{\mu/L}$: quickly achieves full acceleration!
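The recursion in the theorem can be simulated directly. The sketch below solves the quadratic $\xi^2 + (c - a)\xi - c = 0$ with $a = 2\mu\Delta$ and $c = \xi_t^2/\delta_{t+1}$ for its root in $(a, 1)$, and accumulates the product $(1 - \xi_1) \cdots (1 - \xi_{t+1})$. The function $T$ is not specified on the slide, so the distortion sequence `deltas` (decaying to 1) is purely hypothetical, as are the value `a = 0.01` and the starting point.

```python
import math

def next_xi(xi_t, delta_next, a):
    """Root in (a, 1) of xi * (xi - a) / (1 - xi) = xi_t**2 / delta_next."""
    c = xi_t * xi_t / delta_next
    b = c - a
    # Positive root of xi^2 + b * xi - c = 0.
    return 0.5 * (-b + math.sqrt(b * b + 4.0 * c))

def run(xi0, a, deltas):
    """Return the sequence xi_1, xi_2, ... and the product of (1 - xi_s)."""
    xis, bound, xi = [], 1.0, xi0
    for d in deltas:
        xi = next_xi(xi, d, a)
        xis.append(xi)
        bound *= 1.0 - xi
    return xis, bound

a = 0.01                                              # stands in for 2 * mu * Delta
deltas = [1.0 + 4.0 * 0.5 ** t for t in range(300)]   # hypothetical distortion sequence
xis, bound = run(0.5, a, deltas)
print(xis[-1], math.sqrt(a), bound)
```

In this toy run every $\xi_t$ stays above $a$, and once the assumed distortion has decayed to 1 the sequence settles at $\sqrt{a}$, which mirrors the two properties stated in the theorem.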

Open problem
Obtaining acceleration in the non-strongly convex case?
Remarks
★ Using a strongly convex perturbation can be done.
★ But it costs an extra factor $O(\log 1/\epsilon)$.
★ More crucially, our current proof needs to ensure all iterates remain within a set of specific size to be able to ensure acceleration. Removing this limitation is valuable.
