Robust Control for Analysis and Design of Large-Scale Optimization Algorithms

Laurent Lessard, University of Wisconsin–Madison
Joint work with Ben Recht and Andy Packard

LCCC Workshop on Large-Scale and Distributed Optimization
Lund University, June 15, 2017
1. Many algorithms can be viewed as dynamical systems with feedback (control systems!): algorithm convergence ⟺ system stability.
2. By solving a small convex program, we can recover state-of-the-art convergence results for these algorithms, automatically and efficiently.
3. The ultimate goal: to move from analysis to design.
Unconstrained optimization:

minimize_{x ∈ R^N}  f(x)

• need algorithms that are fast and simple
• currently favored family: first-order methods
Gradient method:
x_{k+1} = x_k − α ∇f(x_k)

Heavy ball method:
x_{k+1} = x_k − α ∇f(x_k) + β (x_k − x_{k−1})

Nesterov's accelerated method:
y_k = x_k + β (x_k − x_{k−1})
x_{k+1} = y_k − α ∇f(y_k)

[Figures: error vs. iteration count for each method, and the iterates x_0, x_1, ... plotted over the contours of a quadratic f(x)]
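To make the three update rules concrete, here is a minimal sketch (not from the slides): all three methods applied to an illustrative quadratic with m = 1 and L = 10; the step size α and momentum β are arbitrary example values, not tuned.

```python
import numpy as np

# Illustrative quadratic: f(x) = 0.5 x^T Q x, with m = 1, L = 10
Q = np.diag([1.0, 10.0])
grad = lambda x: Q @ x
alpha, beta = 0.1, 0.5   # example step and momentum parameters

def gradient_method(x0, iters=80):
    x = x0.copy()
    for _ in range(iters):
        x = x - alpha * grad(x)
    return x

def heavy_ball(x0, iters=80):
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x

def nesterov(x0, iters=80):
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)
        x, x_prev = y - alpha * grad(y), x
    return x

x0 = np.array([1.0, 1.0])
for method in (gradient_method, heavy_ball, nesterov):
    print(method.__name__, np.linalg.norm(method(x0)))  # distance to x* = 0
```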
Robust algorithm selection

G ∈ 𝒢: algorithm we're going to use
f ∈ 𝒮: function we'd like to minimize

G_opt = argmin_{G ∈ 𝒢} max_{f ∈ 𝒮} cost(f, G)

Similar problem for a finite number of iterations:
• Drori, Teboulle (2012)
• Taylor, Hendrickx, Glineur (2016)
Gradient method:
x_{k+1} = x_k − α ∇f(x_k)

Heavy ball method:
x_{k+1} = x_k − α ∇f(x_k) + β (x_k − x_{k−1})

Nesterov's accelerated method:
x_{k+1} = x_k − α ∇f(x_k + β (x_k − x_{k−1})) + β (x_k − x_{k−1})

Analytically solvable case — quadratic functions:
f(x) = (1/2) x^T Q x − p^T x,  f ∈ 𝒮 with the constraint mI ⪯ Q ⪯ LI
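Why the quadratic case is analytically solvable (a standard argument, spelled out here for reference): when f is quadratic, ∇f(x) = Qx − p is linear, so each method becomes a linear recursion whose worst-case convergence rate is the spectral radius of a closed-loop matrix, maximized over mI ⪯ Q ⪯ LI. A sketch for heavy ball, using the usual quadratic-optimal tuning as an illustrative choice:

```python
import numpy as np

# For quadratic f(x) = 0.5 x^T Q x - p^T x, heavy ball is the linear
# recursion [x_{k+1}; x_k] = T [x_k; x_{k-1}]; its rate is the
# spectral radius of T.
def heavy_ball_rate(Q, alpha, beta):
    n = Q.shape[0]
    I = np.eye(n)
    T = np.block([[(1 + beta) * I - alpha * Q, -beta * I],
                  [I, np.zeros((n, n))]])
    return max(abs(np.linalg.eigvals(T)))

m, L = 1.0, 10.0
# standard heavy-ball tuning for quadratics (illustrative)
alpha = 4.0 / (np.sqrt(L) + np.sqrt(m)) ** 2
beta = ((np.sqrt(L / m) - 1) / (np.sqrt(L / m) + 1)) ** 2
print(heavy_ball_rate(np.diag([m, L]), alpha, beta))
# ≈ (sqrt(L/m) - 1) / (sqrt(L/m) + 1)
```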
Convergence rate on quadratic functions

[Figures: convergence rate ρ and iterations to convergence vs. condition ratio L/m, for the gradient, Nesterov, and heavy ball methods on quadratics]

Convergence rate: ‖x_k − x⋆‖ ≤ C ρ^k ‖x_0 − x⋆‖

Iterations to convergence ∝ 1/(−log ρ), since reaching ρ^k ≤ ε requires k ≥ log(1/ε)/(−log ρ).
Robust algorithm selection

G ∈ 𝒢: algorithm we're going to use
f ∈ 𝒮: function we'd like to minimize

G_opt = argmin_{G ∈ 𝒢} max_{f ∈ 𝒮} cost(f, G)

1. mathematical representation for 𝒢
2. mathematical representation for 𝒮
3. main robustness result
Dynamical system interpretation

Heavy ball: x_{k+1} = x_k − α ∇f(x_k) + β (x_k − x_{k−1})

Define u_k := ∇f(x_k) and p_k := x_{k−1}.

Algorithm (linear, known, decoupled):
[x_{k+1}; p_{k+1}] = [(1+β)I, −βI; I, 0] [x_k; p_k] + [−αI; 0] u_k
y_k = [I, 0] [x_k; p_k]

Function (nonlinear, uncertain, coupled):
u_k = ∇f(y_k)
Dynamical system interpretation

Heavy ball: x_{k+1} = x_k − α ∇f(x_k) + β (x_k − x_{k−1})

Define u_k := ∇f(x_k) and p_k := x_{k−1}.

Algorithm (linear, known, decoupled) — identical dynamics in each coordinate i = 1, ..., N:
[(x_{k+1})_i; (p_{k+1})_i] = [1+β, −β; 1, 0] [(x_k)_i; (p_k)_i] + [−α; 0] (u_k)_i
(y_k)_i = [1, 0] [(x_k)_i; (p_k)_i]

Function (nonlinear, uncertain, coupled):
u_k = ∇f(y_k)
G:
ξ_{k+1} = A ξ_k + B u_k
y_k = C ξ_k

∇f:
u_k = ∇f(y_k)

State-space matrices [A, B; C, 0] for each method:

Gradient:    A = 1,                B = −α,       C = 1
Heavy ball:  A = [1+β, −β; 1, 0],  B = [−α; 0],  C = [1, 0]
Nesterov:    A = [1+β, −β; 1, 0],  B = [−α; 0],  C = [1+β, −β]
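A minimal sketch of this representation (function names and test gradient are mine, not from the talk): build (A, B, C) for each method and simulate the feedback loop ξ_{k+1} = Aξ_k + Bu_k, y_k = Cξ_k, u_k = ∇f(y_k).

```python
import numpy as np

def method_matrices(name, alpha, beta=0.0):
    """(A, B, C) from the table above; the state is xi_k = (x_k, x_{k-1})
    for the two-state methods."""
    if name == "gradient":
        return np.array([[1.0]]), np.array([[-alpha]]), np.array([[1.0]])
    A = np.array([[1 + beta, -beta], [1.0, 0.0]])
    B = np.array([[-alpha], [0.0]])
    C = {"heavy_ball": np.array([[1.0, 0.0]]),
         "nesterov":   np.array([[1 + beta, -beta]])}[name]
    return A, B, C

def run(A, B, C, grad, xi0, iters=100):
    """Simulate the loop: xi_{k+1} = A xi_k + B u_k, y_k = C xi_k,
    u_k = grad(y_k). One coordinate suffices since dynamics decouple."""
    xi = xi0.copy()
    for _ in range(iters):
        y = C @ xi
        xi = A @ xi + B @ grad(y)
    return xi

A, B, C = method_matrices("nesterov", alpha=0.1, beta=0.5)
print(run(A, B, C, grad=lambda y: 5.0 * y, xi0=np.ones((2, 1))))
```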
Representing function classes

[Figure: three sketches of the graph of ∇f(x) vs. x]

∇f(x): linear  ⊂  sector bounded + slope restricted  ⊂  sector bounded
f(x): quadratic  ⊂  strongly convex + Lipschitz gradients  ⊂  radially quasiconvex
Representing function classes: express what we know about ∇f as quadratic constraints on (y, u).

∇f is a passive function: y_k^T u_k ≥ 0

[Figure: sector-bounded nonlinearity, u_k vs. y_k]
Representing function classes: express as quadratic constraints on (y, u).

∇f is sector-bounded on [m, L]:

[y_k; u_k]^T [−2mL, m+L; m+L, −2] [y_k; u_k] ≥ 0

[Figure: the graph of ∇f confined between lines of slope m and L]
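Not stated on the slide, but expanding the quadratic form shows it is exactly the sector condition (taking x⋆ = 0 so that ∇f(0) = 0):

[y_k; u_k]^T [−2mL, m+L; m+L, −2] [y_k; u_k] = −2mL y_k² + 2(m+L) y_k u_k − 2 u_k² = 2 (u_k − m y_k)(L y_k − u_k) ≥ 0,

which holds precisely when u_k = ∇f(y_k) lies between the lines of slope m and L.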
Representing function classes: express as quadratic constraints on (y, u).

∇f is sector-bounded + slope-restricted: the constraint on (y_k, u_k) depends on the history (y_0, ..., y_{k−1}, u_0, ..., u_{k−1}).

[Figure: the graph of ∇f confined to the [m, L] sector, with slope also restricted]
Introduce extra dynamics

[Block diagram: the signals (y, u) are filtered through dynamics Ψ with state ζ, producing output z]

• Design the dynamics Ψ and a multiplier matrix M.
• Instead of using a pointwise constraint q(u_k, y_k), use z_k^T M z_k.
• Systematic way of doing this for strong convexity via Zames–Falb multipliers (1968).
• General theory: Integral Quadratic Constraints (Megretski & Rantzer, 1997)
Putting it together

G: the state-space model (A, B, C) from the table above (gradient, heavy ball, or Nesterov), in feedback with ∇f.

What we know about f — quadratic ⊂ strongly convex ⊂ quasiconvex — is encoded by the choice of (Ψ, M).
Main result

Problem data:
• G (the algorithm)
• Ψ (what we know about f)

Auxiliary quantities:
• Compute the (Â, B̂, Ĉ, D̂) matrices from (G, Ψ).
• Choose a candidate rate 0 < ρ < 1.

If there exists P ≻ 0 such that

[Â^T P Â − ρ² P, Â^T P B̂; B̂^T P Â, B̂^T P B̂] + [Ĉ D̂]^T M [Ĉ D̂] ⪯ 0

then ‖x_k − x⋆‖ ≤ √cond(P) · ρ^k ‖x_0 − x⋆‖ for all k.

The size of the LMI does not grow with the problem dimension! e.g., P ∈ S^{3×3}, LMI ∈ S^{4×4}.
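A minimal sketch of how one might check this LMI numerically (my code, not the authors'; it assumes cvxpy with an SDP-capable solver such as SCS, uses the static sector IQC — so Â = A, B̂ = B, and z_k simply stacks y_k and u_k — and treats the scalar per-coordinate model, which suffices since the dynamics decouple):

```python
import numpy as np
import cvxpy as cp

def certifies_rate(A, B, C, m, L, rho):
    """Feasibility of the main-result LMI for a candidate rate rho,
    with the static sector IQC for m-strongly convex f having
    L-Lipschitz gradients."""
    n = A.shape[0]
    P = cp.Variable((n, n), symmetric=True)
    lam = cp.Variable(nonneg=True)  # nonnegative IQC multiplier
    M = np.array([[-2 * m * L, m + L], [m + L, -2.0]])
    CD = np.block([[C, np.zeros((1, 1))],      # z_k = (y_k, u_k)
                   [np.zeros((1, n)), np.eye(1)]])
    lmi = cp.bmat([[A.T @ P @ A - rho**2 * P, A.T @ P @ B],
                   [B.T @ P @ A, B.T @ P @ B]]) + lam * (CD.T @ M @ CD)
    lmi = 0.5 * (lmi + lmi.T)  # symmetrize so the solver accepts the constraint
    prob = cp.Problem(cp.Minimize(0),
                      [P >> 1e-6 * np.eye(n), lmi << 0])
    prob.solve(solver=cp.SCS)
    return prob.status in ("optimal", "optimal_inaccurate")

# Bisect for the best certified rate: gradient method with alpha = 1/L
m, L = 1.0, 10.0
A, B, C = np.array([[1.0]]), np.array([[-1.0 / L]]), np.array([[1.0]])
lo, hi = 0.0, 1.0
while hi - lo > 1e-3:
    mid = 0.5 * (lo + hi)
    if certifies_rate(A, B, C, m, L, mid):
        hi = mid
    else:
        lo = mid
print(hi)  # expect roughly 1 - m/L = 0.9
```

Richer function classes would replace this static (Ψ, M) pair with Zames–Falb dynamics, enlarging (Â, B̂, Ĉ, D̂) but leaving the LMI small.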
Main results: analytic and numerical
Gradient method

x_{k+1} = x_k − α ∇f(x_k)

[Figures: convergence rate and iterations to convergence vs. condition ratio L/m; the gradient bound (all functions) plotted against the quadratic-case Nesterov and heavy ball curves]

Analytic solution! Same rate for: quadratics, strongly convex, or quasiconvex functions.
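For reference, the analytic solution in question is the standard one (not spelled out on the slide):

ρ = max{ |1 − αm|, |1 − αL| },

which is minimized at α = 2/(m + L), giving ρ = (L − m)/(L + m) = (κ − 1)/(κ + 1) with κ = L/m.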
Nesterov's method

x_{k+1} = x_k − α ∇f(x_k + β (x_k − x_{k−1})) + β (x_k − x_{k−1})

[Figures: rate bounds and iterations to convergence vs. condition ratio L/m, comparing the IQC bounds (quasiconvex, strongly convex) with Nesterov's own bounds (strongly convex, quadratic) and heavy ball (quadratic)]

• Cannot certify stability for quasiconvex functions
• IQC bound improves upon the best known bound!
Heavy ball method

x_{k+1} = x_k − α ∇f(x_k) + β (x_k − x_{k−1})

[Figures: rate bounds and iterations to convergence vs. condition ratio L/m; IQC bounds compared with the quadratic-case Nesterov and heavy ball rates]

• Cannot certify stability for quasiconvex functions
• Cannot certify stability for strongly convex functions
The heavy ball method is not stable!

Counterexample:

f(x) = { (25/2) x²,             x < 1
       { (1/2) x² + 24x − 12,   1 ≤ x < 2
       { (25/2) x² − 24x + 36,  x ≥ 2

and start the heavy ball iteration at x_0 = x_1 ∈ [3.07, 3.46].

• L/m = 25
• heavy ball iterations converge to a limit cycle
• simple counterexample to the Aizerman (1949) and Kalman (1957) conjectures

[Figure: plot of f(x) with the limit-cycle iterates]
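A quick numerical check of the counterexample (a sketch; I am assuming the standard quadratic-optimal heavy-ball tuning α = 4/(√L + √m)² = 1/9 and β = ((√κ − 1)/(√κ + 1))² = 4/9 for κ = 25, since the slide does not restate the parameters):

```python
import numpy as np

def grad_f(x):
    # Gradient of the piecewise quadratic above (continuous; m = 1, L = 25)
    if x < 1:
        return 25.0 * x
    elif x < 2:
        return x + 24.0
    else:
        return 25.0 * x - 24.0

m, L = 1.0, 25.0
kappa = L / m
alpha = 4.0 / (np.sqrt(L) + np.sqrt(m)) ** 2               # = 1/9
beta = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2  # = 4/9

x_prev = x = 3.3   # any start in [3.07, 3.46]
history = []
for k in range(500):
    x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x
    history.append(x)

print(history[-6:])  # iterates settle into a cycle, not at x* = 0
```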
Uncharted territory: noise robustness and algorithm design
Noise robustness

[Block diagram: G in feedback with ∇f, with an uncertain block Δ_δ inserted between w and u]

The Δ_δ block is uncertain — multiplicative noise:
‖u_k − w_k‖ ≤ δ ‖w_k‖,  where w_k = ∇f(y_k)

How does an algorithm perform in the presence of noise?
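One way to read the noise model (a sketch with my own design choices: the perturbation direction is random, and the oracle may return any point in the δ-ball around the true gradient):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(grad, y, delta):
    """Multiplicative-noise oracle: returns u with ||u - w|| <= delta ||w||,
    where w = grad(y)."""
    w = grad(y)
    d = rng.standard_normal(w.shape)
    d *= delta * np.linalg.norm(w) / max(np.linalg.norm(d), 1e-12)
    return w + rng.uniform() * d   # a point in the delta-ball around w

# Gradient descent on a quadratic with 20% relative gradient error
Q = np.diag([1.0, 10.0])
grad = lambda x: Q @ x
x = np.ones(2)
for _ in range(200):
    x = x - 0.1 * noisy_grad(grad, x, delta=0.2)
print(np.linalg.norm(x))
```

Because the error is relative (it scales with ‖w_k‖ and so vanishes as ∇f → 0), one would expect gradient descent to still converge for small δ, only at a degraded rate — which is what a noise-robustness analysis would quantify.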