

1. Regularized Nonlinear Acceleration

Alexandre d'Aspremont, CNRS & D.I., École Normale Supérieure.
With Damien Scieur & Francis Bach.
Support from ERC SIPA and ITN SpaRTaN.
Huatulco, January 2018.

2. Introduction

Generic convex optimization problem:
$$\min_{x \in \mathbb{R}^n} f(x)$$

3. Introduction

Algorithms produce a sequence of iterates. We only keep the last (or best) one...

4. Introduction

Aitken's $\Delta^2$ [Aitken, 1927]. Given a sequence $\{s_k\}_{k=1,\ldots} \subset \mathbb{R}^N$ with limit $s^*$, suppose
$$s_{k+1} - s^* = a\,(s_k - s^*), \qquad k = 1, \ldots$$
We can compute $a$ using
$$s_{k+1} - s_k = a\,(s_k - s_{k-1}) \;\Rightarrow\; a = \frac{s_{k+1} - s_k}{s_k - s_{k-1}}$$
and get the limit $s^*$ by solving
$$s_{k+1} - s^* = \frac{s_{k+1} - s_k}{s_k - s_{k-1}}\,(s_k - s^*),$$
which yields
$$s^* = \frac{s_{k-1}\,s_{k+1} - s_k^2}{s_{k+1} - 2 s_k + s_{k-1}}.$$
This is Aitken's $\Delta^2$ and allows us to compute $s^*$ from $\{s_{k+1}, s_k, s_{k-1}\}$.
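As a quick illustration, here is a minimal NumPy sketch of the $\Delta^2$ formula above (the function name aitken_delta2 is ours, not from the talk):

```python
import numpy as np

def aitken_delta2(s):
    """Aitken's Delta^2: extrapolate a scalar sequence s_1, s_2, ...

    Returns the extrapolated value computed from each consecutive
    triple (s_{k-1}, s_k, s_{k+1}), assuming nonzero denominators.
    """
    s = np.asarray(s, dtype=float)
    num = s[:-2] * s[2:] - s[1:-1] ** 2      # s_{k-1} s_{k+1} - s_k^2
    den = s[2:] - 2.0 * s[1:-1] + s[:-2]     # s_{k+1} - 2 s_k + s_{k-1}
    return num / den
```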

5. Introduction

Convergence acceleration. Consider
$$s_k = \sum_{i=0}^{k} \frac{(-1)^i}{2i+1} \;\xrightarrow{k\to\infty}\; \frac{\pi}{4} = 0.785398\ldots$$
We have:

 k   (-1)^k/(2k+1)     s_k       Delta^2
 0     1.0            1.00000       -
 1    -0.33333        0.66667       -
 2     0.2            0.86667     0.79167
 3    -0.14286        0.72381     0.78333
 4     0.11111        0.83492     0.78631
 5    -0.090909       0.74401     0.78492
 6     0.076923       0.82093     0.78568
 7    -0.066667       0.75427     0.78522
 8     0.058824       0.81309     0.78552
 9    -0.052632       0.76046     0.78531
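The table is easy to reproduce; a short self-contained sketch (numbers match the table up to rounding):

```python
import numpy as np

# Partial sums s_k of the Leibniz series for pi/4.
s = np.cumsum([(-1) ** i / (2 * i + 1) for i in range(10)])

# Aitken's Delta^2 on each consecutive triple (s_{k-1}, s_k, s_{k+1}).
acc = (s[:-2] * s[2:] - s[1:-1] ** 2) / (s[2:] - 2 * s[1:-1] + s[:-2])

print(s[-1])    # 0.76046...: the raw partial sum is still far from pi/4
print(acc[-1])  # 0.78531...: the extrapolated value is much closer to 0.785398...
```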

6. Introduction

Convergence acceleration.

- Similar results apply to sequences satisfying
  $$\sum_{i=0}^{k} a_i\,(s_{n+i} - s^*) = 0$$
  using Aitken's ideas recursively.
- This produces Wynn's $\varepsilon$-algorithm [Wynn, 1956].
- See [Brezinski, 1977] for a survey on acceleration and extrapolation.
- Directly related to the Levinson-Durbin algorithm on AR processes.
- Vector case: focus on Minimal Polynomial Extrapolation [Sidi et al., 1986].

Overall: a simple postprocessing step.

7. Outline

- Introduction
- Minimal Polynomial Extrapolation
- Regularized MPE
- Numerical results

8. Minimal Polynomial Extrapolation

Quadratic example. Minimize $f(x) = \frac{1}{2}\|Bx - b\|_2^2$ using the basic gradient algorithm, with
$$x_{k+1} := x_k - \frac{1}{L}\,(B^T B x_k - b).$$
Since $B^T B x^* = b$, we get
$$x_{k+1} - x^* = \underbrace{\left(I - \frac{1}{L} B^T B\right)}_{A}\,(x_k - x^*).$$
This means $x_{k+1} - x^*$ follows a vector autoregressive process.
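A small numerical sanity check of this linear (vector autoregressive) structure; the setup below (random B, with b chosen so that the fixed point is a known x*) is ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
B = rng.standard_normal((n, n))
H = B.T @ B                        # plays the role of B^T B above
xstar = rng.standard_normal(n)
b = H @ xstar                      # so that B^T B x* = b, as on the slide
L = np.linalg.eigvalsh(H).max()    # gradient Lipschitz constant
A = np.eye(n) - H / L              # iteration matrix

x = np.zeros(n)
for _ in range(5):
    x_next = x - (H @ x - b) / L                         # gradient step
    assert np.allclose(x_next - xstar, A @ (x - xstar))  # VAR structure holds
    x = x_next
```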

9. Minimal Polynomial Extrapolation

We have
$$\sum_{i=0}^{k} c_i\,(x_i - x^*) = \sum_{i=0}^{k} c_i\,A^i\,(x_0 - x^*)$$
and setting $\mathbf{1}^T c = 1$ yields
$$\sum_{i=0}^{k} c_i\,x_i - x^* = p(A)\,(x_0 - x^*), \qquad \text{where } p(v) = \sum_{i=0}^{k} c_i\,v^i.$$

- Setting $c$ such that $p(A)(x_0 - x^*) = 0$, we would have $x^* = \sum_{i=0}^{k} c_i\,x_i$.
- Get the limit by averaging iterates (using weights depending on $x_k$).
- We typically do not observe $A$ (or $x^*$).
- How do we extract $c$ from the iterates $x_k$?

10. Minimal Polynomial Extrapolation

We have
$$x_k - x_{k-1} = (x_k - x^*) - (x_{k-1} - x^*) = (A - I)\,A^{k-1}\,(x_0 - x^*),$$
hence if $p(A)(x_0 - x^*) = 0$, we must have
$$\sum_{i=1}^{k} c_i\,(x_i - x_{i-1}) = (A - I)\,p(A)\,(x_0 - x^*) = 0,$$
so if $(A - I)$ is nonsingular, the coefficient vector $c$ solves the linear system
$$\sum_{i=1}^{k} c_i\,(x_i - x_{i-1}) = 0, \qquad \sum_{i=1}^{k} c_i = 1,$$
and $p(\cdot)$ is the minimal polynomial of $A$ w.r.t. $(x_0 - x^*)$.

11. Approximate Minimal Polynomial Extrapolation

Approximate MPE.

- For $k$ smaller than the degree of the minimal polynomial, we find $c$ that minimizes the residual
  $$\|(A - I)\,p(A)\,(x_0 - x^*)\|_2 = \Big\|\sum_{i=1}^{k} c_i\,(x_i - x_{i-1})\Big\|_2.$$
- Setting $U \in \mathbb{R}^{n \times (k+1)}$, with $U_i = x_{i+1} - x_i$, this means solving
  $$c^* \in \operatorname*{argmin}_{\mathbf{1}^T c = 1} \|Uc\|_2 \qquad \text{(AMPE)}$$
  in the variable $c \in \mathbb{R}^{k+1}$ (sketched in code below).
- Also known as the Eddy-Mešina method [Mešina, 1977, Eddy, 1979], or Reduced Rank Extrapolation with arbitrary $k$ (see [Smith et al., 1987, §10]). Very similar to Anderson acceleration, GMRES, etc.
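In code, (AMPE) is a small equality-constrained least-squares problem whose optimality conditions give $c \propto (U^TU)^{-1}\mathbf{1}$. A minimal sketch via the normal equations (assuming $U^TU$ is nonsingular, which in practice it barely is; the function name is ours):

```python
import numpy as np

def ampe_extrapolate(X):
    """AMPE: X is n x (k+2), columns are iterates x_0, ..., x_{k+1}.

    Solves min_c ||U c||_2  s.t.  1^T c = 1, with U_i = x_{i+1} - x_i,
    then returns the extrapolated point sum_i c_i x_i.
    """
    U = np.diff(X, axis=1)                        # n x (k+1) difference matrix
    z = np.linalg.solve(U.T @ U, np.ones(U.shape[1]))
    c = z / z.sum()                               # enforce 1^T c = 1
    return X[:, :-1] @ c
```

On the quadratic example above, feeding a window of gradient iterates to this routine typically recovers x* to high accuracy once k approaches the degree of the minimal polynomial.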

12. Uniform Bound

Chebyshev polynomials. Crude bound on $\|Uc^*\|_2$ using Chebyshev polynomials, to bound the error as a function of $k$, with
$$\Big\|\sum_{i=0}^{k} c^*_i\,x_i - x^*\Big\|_2 = \Big\|(I - A)^{-1} \sum_{i=0}^{k} c^*_i\,U_i\Big\|_2 \le \|(I - A)^{-1}\|_2\;\|p(A)\,(x_1 - x_0)\|_2.$$

- We have
  $$\|p(A)\,(x_1 - x_0)\|_2 \le \|p(A)\|_2\,\|x_1 - x_0\|_2 = \max_{i=1,\ldots,n} |p(\lambda_i)|\;\|x_1 - x_0\|_2,$$
  where $0 \le \lambda_i \le \sigma$ are the eigenvalues of $A$.
- It suffices to find $p(\cdot) \in \mathbb{R}_k[x]$ solving
  $$\inf_{\{p \in \mathbb{R}_k[x]\,:\,p(1) = 1\}} \; \sup_{v \in [0,\sigma]} |p(v)|.$$

Explicit solution using modified Chebyshev polynomials.

13. Uniform Bound using Chebyshev Polynomials

[Figure: modified Chebyshev polynomials T_3(x, σ) and T_5(x, σ) for x ∈ [0, 1] and σ = 0.85. The maximum value of T_k on [0, σ] decreases geometrically fast as k grows.]
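A quick numerical check of that geometric decrease, using the rescaled Chebyshev polynomial $C_k((2x-\sigma)/\sigma)/C_k((2-\sigma)/\sigma)$, which solves the normalized min-max problem on the previous slide (a sketch; shifted_cheb_max is our name):

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def shifted_cheb_max(k, sigma):
    """sup over [0, sigma] of the optimal degree-k polynomial with p(1) = 1."""
    ck = Chebyshev.basis(k)
    # |C_k| <= 1 on [-1, 1], and [0, sigma] maps onto [-1, 1] under
    # x -> (2x - sigma)/sigma, so the sup equals 1/|C_k((2 - sigma)/sigma)|.
    return 1.0 / abs(ck((2.0 - sigma) / sigma))

for k in [1, 3, 5, 7]:
    print(k, shifted_cheb_max(k, 0.85))   # 0.739, 0.171, 0.0336, ...: geometric decay
```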

14. Approximate Minimal Polynomial Extrapolation

AMPE convergence.

Proposition. Let $A$ be symmetric, $0 \preceq A \preceq \sigma I$ with $\sigma < 1$, and let $c^*$ be the solution of (AMPE). Then
$$\Big\|\sum_{i=0}^{k} c^*_i\,x_i - x^*\Big\|_2 \le \kappa(A - I)\,\frac{2\zeta^k}{1 + \zeta^{2k}}\,\|x_0 - x^*\|_2 \qquad (1)$$
where $\kappa(A - I)$ is the condition number of the matrix $A - I$ and $\zeta$ is given by
$$\zeta = \frac{1 - \sqrt{1 - \sigma}}{1 + \sqrt{1 - \sigma}} < \sigma. \qquad (2)$$
See also [Nemirovskiy and Polyak, 1984]. For the gradient method, $\sigma = 1 - \mu/L$, so
$$\Big\|\sum_{i=0}^{k} c^*_i\,x_i - x^*\Big\|_2 \le \kappa(A - I)\left(\frac{1 - \sqrt{\mu/L}}{1 + \sqrt{\mu/L}}\right)^{k} \|x_0 - x^*\|_2.$$
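To see what the bound buys, compare the geometric factor $\sigma^k$ of the plain gradient method with the factor $2\zeta^k/(1+\zeta^{2k})$ from (1); the value $\sigma = 0.99$ below is just an illustrative choice of ours:

```python
import numpy as np

sigma = 0.99                                   # example value of 1 - mu/L
zeta = (1 - np.sqrt(1 - sigma)) / (1 + np.sqrt(1 - sigma))
for k in [5, 10, 20, 40]:
    # plain gradient factor vs. the AMPE bound's factor from (1)
    print(k, sigma ** k, 2 * zeta ** k / (1 + zeta ** (2 * k)))
```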

15. Approximate Minimal Polynomial Extrapolation

AMPE versus Nesterov, conjugate gradient.

- Key difference with conjugate gradient: we do not observe $A$...
- Chebyshev polynomials satisfy a two-step recurrence. For quadratic minimization using the gradient method:
  $$z_{k-1} = y_{k-1} - \frac{1}{L}\,(B y_{k-1} - b), \qquad y_k = \frac{\alpha_{k-1}}{\alpha_k}\left(\frac{2 z_{k-1}}{\sigma} - y_{k-1}\right) - \frac{\alpha_{k-2}}{\alpha_k}\,y_{k-2},$$
  where $\alpha_k = \frac{2 - \sigma}{\sigma}\,\alpha_{k-1} - \alpha_{k-2}$.
- Nesterov's acceleration recursively computes a similar polynomial with
  $$z_{k-1} = y_{k-1} - \frac{1}{L}\,(B y_{k-1} - b), \qquad y_k = z_{k-1} + \beta_k\,(z_{k-1} - z_{k-2}),$$
  see also [Hardt, 2013]. A runnable sketch follows.
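A minimal runnable sketch of the Nesterov-style recursion on a quadratic, using the constant momentum $\beta = (\sqrt{L} - \sqrt{\mu})/(\sqrt{L} + \sqrt{\mu})$ commonly used in the strongly convex case (the test problem and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
M = rng.standard_normal((n, n))
B = M @ M.T / n + 0.1 * np.eye(n)       # symmetric positive definite
b = rng.standard_normal(n)
xstar = np.linalg.solve(B, b)           # minimizer of 0.5 y'By - b'y

lams = np.linalg.eigvalsh(B)
L, mu = lams.max(), lams.min()
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))

y, z_old = np.zeros(n), np.zeros(n)
for k in range(200):
    z = y - (B @ y - b) / L             # gradient step, as on the slide
    y = z + beta * (z - z_old)          # momentum extrapolation
    z_old = z

print(np.linalg.norm(z - xstar))        # far smaller than after 200 plain gradient steps
```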

16. Approximate Minimal Polynomial Extrapolation

Accelerating optimization algorithms. For gradient descent, we have
$$\tilde{x}_{k+1} := \tilde{x}_k - \frac{1}{L}\,\nabla f(\tilde{x}_k).$$

- This means $\tilde{x}_{k+1} - x^* = A\,(\tilde{x}_k - x^*) + O(\|\tilde{x}_k - x^*\|_2^2)$, where $A = I - \frac{1}{L}\nabla^2 f(x^*)$, meaning that $\|A\|_2 \le 1 - \mu/L$ whenever $\mu I \preceq \nabla^2 f(x) \preceq L I$.
- The approximation error is a sum of three terms:
  $$\Big\|\sum_{i=0}^{k} \tilde{c}_i\,\tilde{x}_i - x^*\Big\|_2 \le \underbrace{\Big\|\sum_{i=0}^{k} c_i\,x_i - x^*\Big\|_2}_{\text{AMPE}} + \underbrace{\Big\|\sum_{i=0}^{k} (\tilde{c}_i - c_i)\,x_i\Big\|_2}_{\text{Stability}} + \underbrace{\Big\|\sum_{i=0}^{k} \tilde{c}_i\,(\tilde{x}_i - x_i)\Big\|_2}_{\text{Nonlinearity}}$$

Stability is key here.

17. Approximate Minimal Polynomial Extrapolation

Stability.

- The iterations span a Krylov subspace
  $$\mathcal{K}_k = \operatorname{span}\left\{U_0, AU_0, \ldots, A^{k-1}U_0\right\},$$
  so the matrix $U$ in AMPE is a Krylov matrix.
- Similar to the Hankel or Toeplitz case: $U^TU$ has a condition number typically growing exponentially with dimension [Tyrtyshnikov, 1994] (illustrated below).
- In fact, the Hankel, Toeplitz and Krylov problems are directly connected, hence the link with Levinson-Durbin [Heinig and Rost, 2011].
- For generic optimization problems, eigenvalues are perturbed by deviations from the linear model, which can make the situation even worse.

Be wise, regularize...
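The exploding conditioning is easy to observe numerically; a small sketch with a random test matrix of our own construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
M = rng.standard_normal((n, n))
H = M @ M.T / n + 0.01 * np.eye(n)
A = np.eye(n) - H / np.linalg.eigvalsh(H).max()   # gradient iteration matrix
u0 = rng.standard_normal(n)

# Krylov matrix [u0, A u0, ..., A^{k-1} u0]: cond(U^T U) blows up with k.
U = np.column_stack([np.linalg.matrix_power(A, i) @ u0 for i in range(10)])
for k in range(2, 11):
    G = U[:, :k].T @ U[:, :k]
    print(k, np.linalg.cond(G))       # grows roughly exponentially in k
```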

18. Outline

- Introduction
- Minimal Polynomial Extrapolation
- Regularized MPE
- Numerical results

19. Regularized Minimal Polynomial Extrapolation

Regularized AMPE. Add a regularization term to AMPE.

- Regularized formulation of problem (AMPE):
  $$\begin{array}{ll} \text{minimize} & c^T (U^T U + \lambda I)\,c \\ \text{subject to} & \mathbf{1}^T c = 1 \end{array} \qquad \text{(RMPE)}$$
- Solution given by a linear system of size $k + 1$:
  $$c^*_\lambda = \frac{(U^T U + \lambda I)^{-1}\mathbf{1}}{\mathbf{1}^T (U^T U + \lambda I)^{-1}\mathbf{1}}. \qquad (3)$$
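Putting it together, a minimal sketch of the RMPE step using the closed form (3); rmpe_extrapolate and the default λ are ours, not the talk's reference implementation:

```python
import numpy as np

def rmpe_extrapolate(X, lam=1e-8):
    """RMPE: X is n x (k+2) with iterates x_0, ..., x_{k+1} as columns.

    Solves min_c c^T (U^T U + lam I) c  s.t.  1^T c = 1 via (3),
    then returns the extrapolated point sum_i c_i x_i.
    """
    U = np.diff(X, axis=1)                       # U_i = x_{i+1} - x_i
    G = U.T @ U + lam * np.eye(U.shape[1])
    z = np.linalg.solve(G, np.ones(U.shape[1]))  # (U^T U + lam I)^{-1} 1
    c = z / z.sum()                              # normalize so 1^T c = 1
    return X[:, :-1] @ c
```

Compared with the AMPE sketch earlier, the only change is the λI term, which keeps the Gram system well conditioned at the cost of slightly biasing the weights.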
