Geometry of First-order Methods: Trajectory and Adaptive Acceleration

Geometry of First-order Methods: Trajectory and Adaptive Acceleration. Clarice Poon, University of Bath. Joint work with Jingwei Liang, University of Cambridge. Outline: Introduction; Trajectory of first-order methods; Adaptive acceleration via linear prediction; Relation with previous work; Numerical experiments; Conclusions.


  1. Example: Douglas–Rachford splitting. Feasibility problem in $\mathbb{R}^2$: let $T_1, T_2 \subset \mathbb{R}^2$ be two subspaces such that $T_1 \cap T_2 \neq \emptyset$; find $x \in \mathbb{R}^2$ such that $x \in T_1 \cap T_2$. Inertial Douglas–Rachford: $\bar z_k = z_k + a(z_k - z_{k-1}) + b(z_{k-1} - z_{k-2})$, $z_{k+1} = F_{\mathrm{DR}}(\bar z_k)$. 1-step inertial: $a = 0.3$. 2-step inertial: $a = 0.6$, $b = -0.3$. [Figure: error over 1000 iterations for normal DR, 1-step inertial DR and 2-step inertial DR.] NB: 1-step inertial will always worsen the rate!
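To make the setup concrete, here is a minimal sketch of this experiment (the angle between the two subspaces, the starting point and all function names are illustrative choices, not taken from the slides):

```python
import numpy as np

# Two 1-D subspaces (lines through the origin) of R^2 and their orthogonal projectors.
u1 = np.array([1.0, 0.0])
u2 = np.array([np.cos(0.3), np.sin(0.3)])        # assumed angle of 0.3 rad between T1 and T2
P1 = lambda z: u1 * (u1 @ z)
P2 = lambda z: u2 * (u2 @ z)

def F_dr(z):
    """One Douglas--Rachford step for the feasibility problem x in T1 ∩ T2
    (standard form, with projections as the proximity operators)."""
    x = P1(z)
    return z + P2(2 * x - z) - x

def inertial_dr(a, b, z0, iters=1000):
    """z̄_k = z_k + a (z_k - z_{k-1}) + b (z_{k-1} - z_{k-2});  z_{k+1} = F_DR(z̄_k)."""
    zs = [z0.copy(), z0.copy(), z0.copy()]
    errs = []
    for _ in range(iters):
        z_bar = zs[-1] + a * (zs[-1] - zs[-2]) + b * (zs[-2] - zs[-3])
        zs.append(F_dr(z_bar))
        errs.append(np.linalg.norm(zs[-1]))      # here T1 ∩ T2 = {0}, so this is the error
    return np.array(errs)

z0 = np.array([2.0, 1.0])
err_normal = inertial_dr(0.0, 0.0, z0)           # normal DR
err_1step  = inertial_dr(0.3, 0.0, z0)           # 1-step inertial, a = 0.3
err_2step  = inertial_dr(0.6, -0.3, z0)          # 2-step inertial, a = 0.6, b = -0.3
```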


  7. Problems. Nesterov/FISTA achieve the worst-case optimal convergence rate. The performance of inertial schemes in general is not clear: no rate improvements. Generalisation of the inertial technique to first-order methods, or to fixed-point iterations in general, is achievable: guaranteed sequence convergence [Alvarez & Attouch '01], but NO acceleration guarantees unless stronger assumptions are imposed, e.g. strong convexity or Lipschitz smoothness. For a given method, e.g. Douglas–Rachford, the outcome of inertial/SOR is problem- and parameter-dependent. A general acceleration framework with acceleration guarantees is missing!

  8. Outline: Introduction; Trajectory of first-order methods; Adaptive acceleration via linear prediction; Relation with previous work; Numerical experiments; Conclusions.

  9. Why the difference? Trivial answer: because they are different problems and different methods... What happens when one problem can be solved by different methods? [Figures: trajectory of $\{x_k\}_{k\in\mathbb{N}}$ for gradient descent; trajectory of $\{z_k\}_{k\in\mathbb{N}}$ for normal DR, 1-step inertial DR and 2-step inertial DR.] Structure of the non-smooth optimisation problem $\Longrightarrow$ first-order method (FoM) $\Longrightarrow$ geometry/trajectory of the generated sequence. However, FoM are non-linear in general...

  21. Partial smoothness. Partly smooth function [Lewis '03]: $R$ is partly smooth at $x$ relative to a set $\mathcal{M}_x$ containing $x$ if $\partial R(x) \neq \emptyset$ and: Smoothness: $\mathcal{M}_x$ is a $C^2$-manifold and $R|_{\mathcal{M}_x}$ is $C^2$ near $x$. Sharpness: the tangent space $T_{\mathcal{M}_x}(x)$ equals $T_x := \mathrm{par}(\partial R(x))^{\perp}$. Continuity: $\partial R : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is continuous along $\mathcal{M}_x$ near $x$. Here $\mathrm{par}(C)$ is the subspace parallel to $C$, where $C \subset \mathbb{R}^n$ is a non-empty convex set. Examples: $\ell_1$, $\ell_{1,2}$, $\ell_\infty$ norms; nuclear norm; total variation. [Figure: graph of a partly smooth function $f(x)$ with its manifold $\mathcal{M}$.]
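As a concrete instance of the definition (a standard fact, not spelled out on the slide), the $\ell_1$ norm is partly smooth at $x$ relative to the manifold of vectors sharing its support:

```latex
% R(x) = \|x\|_1 at x with support S = \operatorname{supp}(x); manifold of fixed support:
\mathcal{M}_x = \{\, u \in \mathbb{R}^n : \operatorname{supp}(u) = S \,\}, \qquad
R|_{\mathcal{M}_x}(u) = \textstyle\sum_{i \in S} \operatorname{sign}(x_i)\, u_i \ \text{ near } x
  \ \text{(linear, hence } C^2\text{)}.
% Sharpness: \partial R(x) = \{ v : v_i = \operatorname{sign}(x_i)\ (i\in S),\ |v_i| \le 1\ (i\notin S) \},
% so \mathrm{par}(\partial R(x)) = \{ v : v_S = 0 \}, and
T_x = \mathrm{par}\big(\partial R(x)\big)^{\perp}
    = \{\, v \in \mathbb{R}^n : v_i = 0 \ \text{for } i \notin S \,\}
    = T_{\mathcal{M}_x}(x).
```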

  22. Trajectory of first-order methods. Framework for analysing the local trajectory of a FoM (local convergence previously studied in [Liang et al '16]): first-order method (non-linear) ⇓ convergence & non-degeneracy: finite identification of $\mathcal{M}$ ⇓ local linearisation along $\mathcal{M}$: matrix $M$ (linear) ⇓ spectral properties of $M$ ⇓ local trajectory.


  27. Trajectory of first-order methods. Local linearisation of the FoM: $z_{k+1} - z_k = M(z_k - z_{k-1}) + o(\|z_k - z_{k-1}\|)$. FB: $M$ is similar to a symmetric matrix with real eigenvalues in $\left]-1, 1\right]$: straight-line trajectory. DR: if both functions are locally polyhedral, $M$ is a normal matrix with complex eigenvalues of the form $\cos(\theta)e^{\pm i\theta}$: logarithmic spiral trajectory. PD: if both functions are locally polyhedral, $M$ is, up to an orthogonal transform, a block-diagonal matrix composed of circular and elliptical rotations: elliptical spiral trajectory. NB: for DR/ADMM, if one term is locally $C^2$-smooth, a straight-line trajectory can be obtained under proper parameters.
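As an illustration of these trajectory types (a toy simulation of the local model above, not the authors' experiments), iterating $z_{k+1} - z_k = M(z_k - z_{k-1})$ with a DR-type matrix $M = \cos(\theta)R(\theta)$ (rotation by $\theta$ scaled by $\cos\theta$) traces a logarithmic spiral, while a symmetric $M$ with real eigenvalues gives an eventually straight line:

```python
import numpy as np

theta = 0.3
# DR-type local linearisation: normal matrix with eigenvalues cos(theta) * exp(±i*theta).
M_spiral = np.cos(theta) * np.array([[np.cos(theta), -np.sin(theta)],
                                     [np.sin(theta),  np.cos(theta)]])
# FB-type local linearisation: symmetric matrix with real eigenvalues in ]-1, 1].
M_line = np.diag([0.9, 0.3])

def local_model(M, z0, z1, iters=200):
    """Iterate the local model  z_{k+1} = z_k + M (z_k - z_{k-1})."""
    zs = [np.asarray(z0, float), np.asarray(z1, float)]
    for _ in range(iters):
        zs.append(zs[-1] + M @ (zs[-1] - zs[-2]))
    return np.array(zs)

traj_spiral = local_model(M_spiral, [1.0, 0.0], [0.9, 0.1])  # spirals into its limit point
traj_line   = local_model(M_line,   [1.0, 0.0], [0.9, 0.1])  # tail aligns with the dominant eigenvector
```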

  28. Trajectory of first-order methods. [Figure: numerical illustration over the iterations, one panel per method (Forward–Backward, Douglas–Rachford/ADMM, Primal–Dual), plotting the quantities $1 - \cos(\theta_k)$, $\cos(\psi) - \cos(\theta_k)$ and $\cos(\theta_k)$.]

  29. Failure of inertial. Consider the LASSO for a random Gaussian matrix $A \in \mathbb{R}^{m\times n}$ with $m < n$: $\min_{x \in \mathbb{R}^n} \|x\|_1 + \tfrac12\|Ax - f\|_2^2$. Solving using DR with $\gamma = 0.9/\|A\|^2$. [Figure: convergence and trajectory of the iterates.] Eventual trajectory: straight line when $\gamma < \|A\|^{-2}$.

  30. Failure of inertial. Consider the LASSO for a random Gaussian matrix $A \in \mathbb{R}^{m\times n}$ with $m < n$: $\min_{x \in \mathbb{R}^n} \|x\|_1 + \tfrac12\|Ax - f\|_2^2$. Solving using DR with $\gamma = 10/\|A\|^2$. [Figure: convergence and trajectory of the iterates.] Eventual trajectory: the linearisation matrix may have a complex leading eigenvalue if $\gamma \geq \|A\|^{-2}$.
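A minimal sketch of this experiment (one standard splitting, $g = \|\cdot\|_1$ with soft-thresholding prox and $h = \tfrac12\|A\cdot - f\|_2^2$ with a linear-solve prox; dimensions, seed and function names are illustrative assumptions, not the authors' code):

```python
import numpy as np

def dr_lasso(A, f, gamma, iters=2000):
    """Douglas--Rachford on  min_x ||x||_1 + 0.5 ||Ax - f||^2  (a sketch)."""
    m, n = A.shape
    H = np.eye(n) + gamma * (A.T @ A)                  # prox of the quadratic: (I + γ AᵀA) x = v + γ Aᵀf
    Atf = A.T @ f
    prox_h = lambda v: np.linalg.solve(H, v + gamma * Atf)
    prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)   # soft-thresholding

    z = np.zeros(n)
    errs = []
    for _ in range(iters):
        x = prox_h(z)
        y = prox_g(2 * x - z)
        z = z + y - x                                  # DR update of the governing variable z_k
        errs.append(np.linalg.norm(y - x))             # fixed-point residual as a convergence proxy
    return x, np.array(errs)

rng = np.random.default_rng(0)
m, n = 40, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.where(rng.random(n) < 0.1, rng.standard_normal(n), 0.0)   # sparse ground truth
f = A @ x0
nrmA2 = np.linalg.norm(A, 2) ** 2
x_line,   _ = dr_lasso(A, f, gamma=0.9 / nrmA2)    # straight-line regime (γ < 1/||A||²)
x_spiral, _ = dr_lasso(A, f, gamma=10.0 / nrmA2)   # spiralling regime (γ ≥ 1/||A||²)
```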

  31. Outline: Introduction; Trajectory of first-order methods; Adaptive acceleration via linear prediction; Relation with previous work; Numerical experiments; Conclusions.


  39. Linear prediction: illustration. Idea: given past points $\{z_{k-j}\}_{j=0}^{q+1}$, how to predict $z_{k+1}$? Define $v_{k-j} := z_{k-j} - z_{k-j-1}$ for $j = 0, \dots, q$. Fit the past directions $v_{k-1}, \dots, v_{k-q}$ to the latest direction $v_k$: $c_k := \mathrm{argmin}_{c \in \mathbb{R}^q} \|\sum_{j=1}^q c_j v_{k-j} - v_k\|$. Let $\bar z_{k,1} := z_k + \sum_{j=1}^q c_{k,j} v_{k-j+1}$. Repeat on $\{z_{k-j}\}_{j=0}^{q} \cup \{\bar z_{k,1}\}$, and so on. The $s$-step extrapolation is $\bar z_{k,s} = z_k + E_{s,q,k}$, where $E_{s,q,k} = \sum_{j=1}^q \hat c_j v_{k-j+1}$, $\hat c := \big(\sum_{j=1}^s H(c_k)^j\big)(:,1)$, and $H(c_k)$ is the $q\times q$ matrix whose first column is $c_k$ and whose remaining columns are $\big[\begin{smallmatrix}\mathrm{Id}_{q-1}\\ 0_{1,q-1}\end{smallmatrix}\big]$.

  40. Adaptive acceleration for FoM (A²FoM). Given a first-order method $z_{k+1} = F(z_k)$. A²FoM via linear prediction: let $s \geq 1$, $q \geq 1$ be integers; let $z_0 \in \mathbb{R}^n$, $\bar z_0 = z_0$, and set $D = 0 \in \mathbb{R}^{n\times(q+1)}$. For $k \geq 1$: $z_k = F(\bar z_{k-1})$, $v_k = z_k - z_{k-1}$, $D = [v_k, D(:, 1\!:\!q)]$. If $\mathrm{mod}(k, q+2) = 0$: compute $c$ and $H_c$; if $\rho(H_c) < 1$, set $\bar z_k = z_k + V_k \big(\sum_{i=1}^s H_c^i\big)(:,1)$; else $\bar z_k = z_k$. If $\mathrm{mod}(k, q+2) \neq 0$: $\bar z_k = z_k$.
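A compact sketch of the scheme above (my own transcription into code, with illustrative defaults; $F$ is any fixed-point map, e.g. one DR step):

```python
import numpy as np

def a2fom(F, z0, q=5, s=20, iters=500):
    """Adaptive acceleration via linear prediction, following the scheme above
    (a sketch; names and defaults are illustrative, not the authors' code)."""
    n = z0.size
    z_prev = z0.copy()
    z_bar = z0.copy()
    V = np.zeros((n, q + 1))                            # difference vectors v_k, v_{k-1}, ..., v_{k-q}
    for k in range(1, iters + 1):
        z = F(z_bar)
        V = np.column_stack([z - z_prev, V[:, :q]])     # shift in the newest direction v_k
        z_prev = z
        z_bar = z                                       # default: no extrapolation
        if k % (q + 2) == 0 and k > q + 1:
            # Fit past directions to the latest one: c = argmin ||[v_{k-1},...,v_{k-q}] c - v_k||.
            c = np.linalg.lstsq(V[:, 1:], V[:, 0], rcond=None)[0]
            # Companion-type matrix H(c): first column c, remaining columns [Id_{q-1}; 0].
            H = np.zeros((q, q))
            H[:, 0] = c
            H[:q - 1, 1:] = np.eye(q - 1)
            if np.max(np.abs(np.linalg.eigvals(H))) < 1:          # only extrapolate if rho(H_c) < 1
                S = sum(np.linalg.matrix_power(H, i) for i in range(1, s + 1))
                z_bar = z + V[:, :q] @ S[:, 0]                    # s-step linear prediction
    return z
```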


  43. Adaptive acceleration for FoM (A²FoM). Simplification: if $\rho(H_c) < 1$, the Neumann series converges, $\sum_{i=0}^{+\infty} H_c^i = (\mathrm{Id} - H_c)^{-1}$. For the finite sum, $\sum_{i=1}^s H_c^i = (H_c - H_c^{s+1})(\mathrm{Id} - H_c)^{-1}$. When $s = +\infty$, $\sum_{i=1}^{+\infty} H_c^i = H_c(\mathrm{Id} - H_c)^{-1} = (\mathrm{Id} - H_c)^{-1} - \mathrm{Id}$.
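The finite-sum formula is the usual geometric-series telescoping; spelled out:

```latex
(\mathrm{Id}-H_c)\sum_{i=1}^{s}H_c^{\,i}
  \;=\; \sum_{i=1}^{s}H_c^{\,i} - \sum_{i=2}^{s+1}H_c^{\,i}
  \;=\; H_c - H_c^{\,s+1}
\quad\Longrightarrow\quad
\sum_{i=1}^{s}H_c^{\,i} \;=\; \big(H_c - H_c^{\,s+1}\big)\,(\mathrm{Id}-H_c)^{-1},
```

and since $\rho(H_c) < 1$ implies $H_c^{\,s+1} \to 0$, letting $s \to +\infty$ recovers $H_c(\mathrm{Id}-H_c)^{-1} = (\mathrm{Id}-H_c)^{-1} - \mathrm{Id}$.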


  48. Adaptive acceleration for FoM (A²FoM). Remarks: We extrapolate every $(q+2)$ iterations. The linear prediction is applied only when $\rho(H_c) < 1$. Extra memory cost: $n \times (q+1)$ (the matrix of difference vectors); usually $q \leq 10$. Extra computation cost: $q^2 n$, from the pseudo-inverse $V_{k-1}^{+}$. Global convergence can be obtained by treating the extrapolation as a perturbation error [Alvarez & Attouch '01], i.e. $z_{k+1} = F(z_k + \epsilon_k)$. Weighted LP: $\bar z_k = z_k + a_k V_k \big(\sum_{i=1}^s H_c^i\big)(:,1)$, with $a_k$ updated online.


  52. Example: Douglas–Rachford, continued. [Figures: trajectories of normal DR and of linear prediction (LP) with $s = 4$, $s = 25$ and $s = +\infty$.]

  53. Outline: Introduction; Trajectory of first-order methods; Adaptive acceleration via linear prediction; Relation with previous work; Numerical experiments; Conclusions.


  55. Convergence acceleration. Given a sequence $\{z_k\}_{k\in\mathbb{N}}$ which converges to $z^\star$, can we generate another sequence $\{\bar z_k\}_{k\in\mathbb{N}}$ such that $\|\bar z_k - z^\star\| = o(\|z_k - z^\star\|)$? This is called convergence acceleration and is well established in numerical analysis: 1927, Aitken's $\Delta^2$-process; 1965, Anderson acceleration; 1970s, vector extrapolation techniques such as minimal polynomial extrapolation (MPE) and reduced rank extrapolation (RRE) [Sidi '17]; now, regularised non-linear acceleration (RNA), a regularised version of RRE introduced by [Scieur, d'Aspremont, Bach '16].


  58. Vector extrapolation techniques. Polynomial extrapolation [Cabay & Jackson '76]: consider $z_{k+1} = Mz_k + d$ with $\rho(M) < 1$, so that $z_k \to z^\star$. Then $z_k - z^\star = M(z_{k-1} - z^\star) = M^k(z_0 - z^\star)$. If $P(\lambda) = \sum_{j=0}^q c_j \lambda^j$ is the minimal polynomial of $M$ w.r.t. $z_0 - z^\star$, that is $P(M)(z_0 - z^\star) = \sum_{j=0}^q c_j M^j (z_0 - z^\star) = 0$, then $z^\star = \big(\sum_{j=0}^q c_j z_j\big) \big/ \big(\sum_j c_j\big)$. The coefficients $c$ can be computed without knowledge of $z^\star$: $c_q = 1$ and $V_q\, c(0\!:\!q-1) = -v_{q+1}$, where $V_q = [v_1 \,|\, v_2 \,|\, \cdots \,|\, v_q]$ and $v_j = z_j - z_{j-1}$.
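A small numerical check of this identity (toy linear iteration; matrix, dimensions and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
q = n                                          # the minimal polynomial has degree at most n
G = rng.standard_normal((n, n))
M = 0.5 * G / np.linalg.norm(G, 2)             # spectral norm 0.5, so rho(M) < 1
d = rng.standard_normal(n)
z_star = np.linalg.solve(np.eye(n) - M, d)     # fixed point of z -> M z + d

# Generate z_0, ..., z_{q+1} and the differences v_j = z_j - z_{j-1}.
Z = [rng.standard_normal(n)]
for _ in range(q + 1):
    Z.append(M @ Z[-1] + d)
V = np.column_stack([Z[j] - Z[j - 1] for j in range(1, q + 2)])   # v_1, ..., v_{q+1}

# Solve V_q c(0:q-1) = -v_{q+1} with c_q = 1, then form the weighted combination.
c = np.append(np.linalg.lstsq(V[:, :q], -V[:, q], rcond=None)[0], 1.0)
z_bar = sum(c[j] * Z[j] for j in range(q + 1)) / c.sum()
print(np.linalg.norm(z_bar - z_star))          # ~ machine precision: z_bar recovers z_star
```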


  61. Vector extrapolation techniques. Vector Extrapolation Methods with Applications (SIAM, 2017) by Avram Sidi. Given a sequence generated by $z_k = F(z_{k-1})$. Minimal polynomial extrapolation (MPE), starting from $z_0 = \bar z$: S.1 Generate points $\{z_j\}_{j=0}^{q+1}$ and let $v_j = z_j - z_{j-1}$. S.2 Let $c \in \mathbb{R}^{q+1}$ be such that $c_q = 1$ and $V_q\, c(0\!:\!q-1) = -v_{q+1}$, where $V_q = [v_1 \,|\, \cdots \,|\, v_q]$; for $j \in [0, q]$, set $\tilde c_j := c_j / \big(\sum_{i=0}^q c_i\big)$. S.3 $\bar z := \sum_{j=0}^q \tilde c_j z_j$. Reduced rank extrapolation (RRE) [Anderson '65; Kaniel & Stein '74; Eddy '79; Mešina '77]: replace step S.2 by $\tilde c \in \mathrm{argmin}_c \|V_{q+1} c\|$ subject to $\mathbf{1}^T c = 1$. NB: LP is equivalent to MPE with S.3 replaced by $\bar z := \sum_{j=0}^q \tilde c_j z_{j+1}$.
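A minimal sketch of the RRE coefficient solve (the equality-constrained least squares is solved via its KKT system; the small ridge term is my own safeguard, not part of the method as stated):

```python
import numpy as np

def rre_point(Z):
    """Reduced rank extrapolation from points Z = [z_0, ..., z_{q+1}] (a sketch).

    Solves  min_c ||V_{q+1} c||  s.t.  1^T c = 1,  where V_{q+1} stacks the
    differences v_j = z_j - z_{j-1}, and returns the combination sum_j c_j z_j.
    """
    Z = np.asarray(Z, dtype=float)              # shape (q+2, n)
    V = np.diff(Z, axis=0).T                    # columns v_1, ..., v_{q+1}
    G = V.T @ V                                 # Gram matrix of the differences
    ones = np.ones(G.shape[0])
    w = np.linalg.solve(G + 1e-12 * np.eye(G.shape[0]), ones)   # c ∝ G^{-1} 1 (KKT)
    c = w / w.sum()
    return c @ Z[:-1]                           # weighted combination of z_0, ..., z_q
```

With Z generated by q+2 steps of a fixed-point map, `rre_point(Z)` gives the extrapolated iterate; MPE differs only in how the coefficients are computed (step S.2).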


  65. Regularised non-linear acceleration (RNA). [Scieur, d'Aspremont, Bach '16] proposed a regularised version of RRE for the case $z_{k+1} - z^\star = A(z_k - z^\star) + O(\|z_k - z^\star\|^2)$, where $A$ is symmetric with $0 \preceq A \preceq \sigma\,\mathrm{Id}$, $\sigma < 1$. To deal with the possible ill-conditioning of $V_q$, regularise with $\lambda > 0$: $\tilde c \in \mathrm{Argmin}_c\, c^T(V_q^T V_q + \lambda\,\mathrm{Id})c$ subject to $\mathbf{1}^T c = 1$. In practice, a grid search on the objective is used to find the best $\lambda \in [\lambda_{\min}, \lambda_{\max}]$. The angle between $z_k - z_{k-1}$ and $z_{k+1} - z_k$ converges to zero; intuitively, this is the regime where standard inertial works well...
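A sketch of the regularised coefficient solve with a grid search over $\lambda$; the selection criterion below (fixed-point residual at the extrapolated point) is my assumption, since the slide only says a grid search is used:

```python
import numpy as np

def rna_point(Z, F, lambdas=np.logspace(-12, -2, 11)):
    """Regularised non-linear acceleration (sketch): for each lambda, solve
    min_c c^T (V^T V + lambda*Id) c  s.t.  1^T c = 1, and keep the best candidate."""
    Z = np.asarray(Z, dtype=float)              # rows z_0, ..., z_{q+1}
    V = np.diff(Z, axis=0).T                    # difference vectors as columns
    G = V.T @ V
    ones = np.ones(G.shape[0])
    best_res, best_z = np.inf, None
    for lam in lambdas:
        w = np.linalg.solve(G + lam * np.eye(G.shape[0]), ones)
        c = w / w.sum()                         # normalised KKT solution
        z_cand = c @ Z[:-1]                     # candidate extrapolation
        res = np.linalg.norm(F(z_cand) - z_cand)        # assumed grid-search criterion
        if res < best_res:
            best_res, best_z = res, z_cand
    return best_z
```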


  70. Acceleration guarantees. We have local acceleration guarantees thanks to results on MPE and RRE [Sidi '98]: when $z_{k+1} - z_k = M(z_k - z_{k-1})$, $\|\bar z_{k,s} - z^\star\| \leq \|z_{k+s} - z^\star\| + B\,\epsilon_k$, where $\epsilon_k = \|V_{k-1} c - v_k\|$ and $B := \sum_{\ell=1}^s \|M^\ell\| \,\big|\sum_{i=0}^{s-\ell} (H_c^i)_{(1,1)}\big|$. Asymptotic bound ($k \to \infty$): $\epsilon_k = O(|\lambda_{q+1}|^k)$, where $\lambda_{q+1}$ is the $(q+1)$-th largest eigenvalue; without extrapolation, we just have $O(|\lambda_1|^k)$. Non-asymptotic bound: if $\Sigma(M) \subset [\alpha, \beta]$ with $-1 < \alpha < \beta < 1$, then $B\,\epsilon_k \leq K \beta^{k-q} \big(\tfrac{\sqrt{\eta}-1}{\sqrt{\eta}+1}\big)^q$, where $\eta = \tfrac{1-\alpha}{1-\beta}$. For PD and DR with polyhedral functions: guaranteed acceleration with $q = 2$.


  72. Our contributions. We tackle the non-smoothness of the methods using partial smoothness and give insight into why vector extrapolation techniques work. Our acceleration is derived via the sequence trajectory; there is only a minor difference in the final form.
