
Convex Optimization: Modeling and Algorithms

Lieven Vandenberghe
Electrical Engineering Department, UC Los Angeles

Tutorial lectures, 21st Machine Learning Summer School
Kyoto, August 29–30, 2012

Introduction


  1. Examples (conjugate functions)

     • convex quadratic function (Q ≻ 0):

         f(x) = (1/2) x^T Q x,    f*(y) = (1/2) y^T Q^{-1} y

     • negative entropy:

         f(x) = Σ_{i=1}^n x_i log x_i,    f*(y) = Σ_{i=1}^n e^{y_i − 1}

     • norm:

         f(x) = ‖x‖,    f*(y) = 0 if ‖y‖_* ≤ 1, +∞ otherwise

     • indicator function (C convex):

         f(x) = I_C(x) = 0 if x ∈ C, +∞ otherwise;    f*(y) = sup_{x ∈ C} y^T x

  2. Convex optimization problems

     • linear programming
     • quadratic programming
     • geometric programming
     • second-order cone programming
     • semidefinite programming

  3. Convex optimization problem

         minimize    f_0(x)
         subject to  f_i(x) ≤ 0,  i = 1, ..., m
                     Ax = b

     f_0, f_1, ..., f_m are convex functions

     • feasible set is convex
     • locally optimal points are globally optimal
     • tractable, in theory and practice

  4. Linear program (LP)

         minimize    c^T x + d
         subject to  Gx ≤ h
                     Ax = b

     • inequality is componentwise vector inequality
     • convex problem with affine objective and constraint functions
     • feasible set is a polyhedron

     (figure: polyhedron P with optimal point x⋆ and objective direction −c)
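
     A minimal sketch of this LP in CVXPY (the modeling package discussed
     later in these slides); the problem data and the box bounds that keep
     the problem bounded are placeholder assumptions, not from the lecture.

         import cvxpy as cp
         import numpy as np

         np.random.seed(0)
         m, n, p = 8, 3, 1
         c = np.random.randn(n)
         G = np.random.randn(m, n)
         A = np.random.randn(p, n)
         x0 = 0.5 * np.random.rand(n)   # a point we force to be feasible
         h = G @ x0 + 1.0               # so G x0 < h strictly
         b = A @ x0                     # so A x0 = b

         x = cp.Variable(n)
         prob = cp.Problem(cp.Minimize(c @ x),
                           [G @ x <= h, A @ x == b, x >= -1, x <= 1])
         prob.solve()
         print(prob.value, x.value)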

  5. Piecewise-linear minimization

         minimize  f(x) = max_{i=1,...,m} (a_i^T x + b_i)

     (figure: piecewise-linear f(x) as the maximum of the affine functions a_i^T x + b_i)

     equivalent linear program

         minimize    t
         subject to  a_i^T x + b_i ≤ t,  i = 1, ..., m

     an LP with variables x, t ∈ R

  6. ℓ₁-norm and ℓ∞-norm minimization

     ℓ₁-norm approximation (‖y‖₁ = Σ_k |y_k|) and equivalent LP:

         minimize  ‖Ax − b‖₁

         minimize    Σ_i y_i
         subject to  −y ≤ Ax − b ≤ y

     ℓ∞-norm approximation (‖y‖∞ = max_k |y_k|) and equivalent LP:

         minimize  ‖Ax − b‖∞

         minimize    y
         subject to  −y·1 ≤ Ax − b ≤ y·1

     (1 is the vector of ones)
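
     A sketch comparing the direct ℓ₁ formulation with the explicit LP
     epigraph form, assuming random placeholder data A, b:

         import cvxpy as cp
         import numpy as np

         np.random.seed(0)
         m, n = 20, 5
         A, b = np.random.randn(m, n), np.random.randn(m)

         # direct form
         x = cp.Variable(n)
         p1 = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 1)))
         p1.solve()

         # equivalent LP with auxiliary variable y
         x2, y = cp.Variable(n), cp.Variable(m)
         p2 = cp.Problem(cp.Minimize(cp.sum(y)),
                         [-y <= A @ x2 - b, A @ x2 - b <= y])
         p2.solve()
         print(p1.value, p2.value)   # optimal values agree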

  7. Example: histograms of residuals Ax − b (with A 200 × 80) for

         x_ls = argmin ‖Ax − b‖₂,    x_ℓ₁ = argmin ‖Ax − b‖₁

     (figure: histograms of (Ax_ls − b)_k and (Ax_ℓ₁ − b)_k)

     the ℓ₁-norm residual distribution is wider, with a high peak at zero

  8. Robust regression

     (figure: data and fitted lines f(t) over t ∈ [−10, 10])

     • 42 points t_i, y_i (circles), including two outliers
     • function f(t) = α + βt fitted using the 2-norm (dashed) and the 1-norm

  9. Linear discrimination

     • given a set of points {x_1, ..., x_N} with binary labels s_i ∈ {−1, 1}
     • find hyperplane a^T x + b = 0 that strictly separates the two classes:

         a^T x_i + b > 0  if s_i = 1
         a^T x_i + b < 0  if s_i = −1

     homogeneous in a, b, hence equivalent to the linear inequalities (in a, b)

         s_i (a^T x_i + b) ≥ 1,  i = 1, ..., N

  10. Approximate linear separation of non-separable sets

          minimize  Σ_{i=1}^N max{0, 1 − s_i (a^T x_i + b)}

      • a piecewise-linear minimization problem in a, b; equivalent to an LP
      • can be interpreted as a heuristic for minimizing the number of
        misclassified points
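
      A minimal CVXPY sketch of this hinge-loss problem; the labeled points
      (X, s) are synthetic placeholders:

          import cvxpy as cp
          import numpy as np

          np.random.seed(0)
          N = 50
          X = np.vstack([np.random.randn(N // 2, 2) + 2,
                         np.random.randn(N // 2, 2) - 2])
          s = np.hstack([np.ones(N // 2), -np.ones(N // 2)])

          a, b = cp.Variable(2), cp.Variable()
          margins = cp.multiply(s, X @ a + b)
          loss = cp.sum(cp.pos(1 - margins))  # sum_i max{0, 1 - s_i(a^T x_i + b)}
          cp.Problem(cp.Minimize(loss)).solve()
          print(a.value, b.value)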

  11. Quadratic program (QP)

          minimize    (1/2) x^T P x + q^T x + r
          subject to  Gx ≤ h

      • P ∈ S^n_+, so objective is convex quadratic
      • minimize a convex quadratic function over a polyhedron

      (figure: polyhedron P with optimal point x⋆ and −∇f_0(x⋆))

  12. Linear program with random cost

          minimize    c^T x
          subject to  Gx ≤ h

      • c is a random vector with mean c̄ and covariance Σ
      • hence c^T x is a random variable with mean c̄^T x and variance x^T Σ x

      expected cost–variance trade-off

          minimize    E c^T x + γ var(c^T x) = c̄^T x + γ x^T Σ x
          subject to  Gx ≤ h

      γ > 0 is a risk-aversion parameter
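
      A sketch of the cost–variance trade-off QP; the data (c̄, Σ, box
      constraints) and γ = 0.5 are assumed placeholders:

          import cvxpy as cp
          import numpy as np

          np.random.seed(0)
          n = 4
          cbar = np.random.randn(n)
          F = np.random.randn(n, n)
          Sigma = F @ F.T + np.eye(n)     # positive definite covariance
          G = np.vstack([np.eye(n), -np.eye(n)])
          h = np.ones(2 * n)              # box constraint |x_i| <= 1
          gamma = 0.5

          x = cp.Variable(n)
          objective = cbar @ x + gamma * cp.quad_form(x, Sigma)
          cp.Problem(cp.Minimize(objective), [G @ x <= h]).solve()
          print(x.value)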

  13. Robust linear discrimination

          H_1  = {z | a^T z + b = 1}
          H_{−1} = {z | a^T z + b = −1}

      distance between the hyperplanes is 2/‖a‖₂, so to separate two sets of
      points by maximum margin,

          minimize    ‖a‖₂² = a^T a
          subject to  s_i (a^T x_i + b) ≥ 1,  i = 1, ..., N

      a quadratic program in a, b

  14. Support vector classifier

          minimize  γ ‖a‖₂² + Σ_{i=1}^N max{0, 1 − s_i (a^T x_i + b)}

      (figure: classifiers obtained for γ = 0 and γ = 10)

      equivalent to a quadratic program
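
      A sketch of this support vector classifier QP on synthetic labeled
      data; γ = 10 is one of the values shown on the slide:

          import cvxpy as cp
          import numpy as np

          np.random.seed(1)
          N = 60
          X = np.vstack([np.random.randn(N // 2, 2) + 1.5,
                         np.random.randn(N // 2, 2) - 1.5])
          s = np.hstack([np.ones(N // 2), -np.ones(N // 2)])
          gamma = 10.0

          a, b = cp.Variable(2), cp.Variable()
          hinge = cp.sum(cp.pos(1 - cp.multiply(s, X @ a + b)))
          cp.Problem(cp.Minimize(gamma * cp.sum_squares(a) + hinge)).solve()
          print(a.value, b.value)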

  15. Kernel formulation

          minimize  f(Xa) + ‖a‖₂²

      • variables a ∈ R^n
      • X ∈ R^{N×n} with N ≤ n and rank N

      change of variables

          y = Xa,    a = X^T (XX^T)^{-1} y

      • a is the minimum-norm solution of Xa = y
      • gives a convex problem with N variables y:

          minimize  f(y) + y^T Q^{-1} y

      Q = XX^T is the kernel matrix

  16. Total variation signal reconstruction

          minimize  ‖x̂ − x_cor‖₂² + γ φ(x̂)

      • x_cor = x + v is a corrupted version of unknown signal x, with noise v
      • variable x̂ (reconstructed signal) is an estimate of x
      • φ : R^n → R is a quadratic or total variation smoothing penalty

          φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,
          φ_tv(x̂)   = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|
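
      A sketch of the two penalties on a synthetic piecewise-constant
      signal; the signal, noise level, and γ = 5 are assumed, not from the
      lecture:

          import cvxpy as cp
          import numpy as np

          np.random.seed(0)
          n = 500
          x = np.zeros(n)
          x[100:300], x[300:] = 1.0, -0.5        # piecewise-constant signal
          x_cor = x + 0.2 * np.random.randn(n)   # corrupted version

          xhat = cp.Variable(n)
          gamma = 5.0
          quad = cp.sum_squares(cp.diff(xhat))   # phi_quad
          tv = cp.norm(cp.diff(xhat), 1)         # phi_tv

          for phi in (quad, tv):
              cp.Problem(cp.Minimize(cp.sum_squares(xhat - x_cor)
                                     + gamma * phi)).solve()
              print(np.round(np.linalg.norm(xhat.value - x), 3))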

  17. Example: x_cor, and reconstruction with quadratic and total variation smoothing

      (figure: x_cor, the quadratic reconstruction, and the t.v. reconstruction,
      plotted over i = 0, ..., 2000)

      • quadratic smoothing smooths out noise and sharp transitions in the signal
      • total variation smoothing preserves sharp transitions in the signal

  18. Geometric programming

      posynomial function

          f(x) = Σ_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} · · · x_n^{a_{nk}},
          dom f = R^n_{++}

      with c_k > 0

      geometric program (GP)

          minimize    f_0(x)
          subject to  f_i(x) ≤ 1,  i = 1, ..., m

      with f_i posynomial

  19. Geometric program in convex form

      change variables to y_i = log x_i, and take the logarithm of cost and constraints

      geometric program in convex form:

          minimize    log( Σ_{k=1}^K exp(a_{0k}^T y + b_{0k}) )
          subject to  log( Σ_{k=1}^K exp(a_{ik}^T y + b_{ik}) ) ≤ 0,  i = 1, ..., m

      b_{ik} = log c_{ik}

  20. Second-order cone program (SOCP)

          minimize    f^T x
          subject to  ‖A_i x + b_i‖₂ ≤ c_i^T x + d_i,  i = 1, ..., m

      • ‖·‖₂ is the Euclidean norm ‖y‖₂ = (y_1² + · · · + y_n²)^{1/2}
      • constraints are nonlinear, nondifferentiable, convex

      constraints are inequalities w.r.t. the second-order cone:

          { y | (y_1² + · · · + y_{p−1}²)^{1/2} ≤ y_p }

      (figure: second-order cone in R³)
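
      A minimal SOCP sketch with one cone constraint; the data are random
      placeholders and the extra norm ball is an assumption that keeps the
      problem bounded:

          import cvxpy as cp
          import numpy as np

          np.random.seed(0)
          n = 3
          f = np.random.randn(n)
          A, b = np.random.randn(4, n), np.random.randn(4)
          c, d = np.random.randn(n), 10.0   # d large enough that x = 0 is feasible

          x = cp.Variable(n)
          prob = cp.Problem(cp.Minimize(f @ x),
                            [cp.norm(A @ x + b, 2) <= c @ x + d,
                             cp.norm(x, 2) <= 2])
          prob.solve()
          print(prob.value)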

  21. Robust linear program (stochastic)

          minimize    c^T x
          subject to  prob(a_i^T x ≤ b_i) ≥ η,  i = 1, ..., m

      • a_i random and normally distributed with mean ā_i, covariance Σ_i
      • we require that x satisfies each constraint with probability exceeding η

      (figure: feasible sets for η = 10%, η = 50%, η = 90%)

  22. SOCP formulation

      the 'chance constraint' prob(a_i^T x ≤ b_i) ≥ η is equivalent to the constraint

          ā_i^T x + Φ^{-1}(η) ‖Σ_i^{1/2} x‖₂ ≤ b_i

      Φ is the (unit) normal cumulative distribution function

      (figure: Φ(t) with the quantile Φ^{-1}(η) marked)

      robust LP is a second-order cone program for η ≥ 0.5
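
      A sketch of one chance constraint written as the SOC constraint above;
      ā_i, Σ_i, b_i, η, and c are placeholders, and scipy supplies the
      normal quantile Φ^{-1}(η):

          import cvxpy as cp
          import numpy as np
          from scipy.stats import norm
          from scipy.linalg import sqrtm

          np.random.seed(0)
          n = 3
          c = np.random.randn(n)
          abar = np.random.randn(n)
          F = np.random.randn(n, n)
          Sigma = F @ F.T + 0.1 * np.eye(n)
          b_i, eta = 1.0, 0.9

          x = cp.Variable(n)
          S_half = np.real(sqrtm(Sigma))
          chance = abar @ x + norm.ppf(eta) * cp.norm(S_half @ x, 2) <= b_i
          cp.Problem(cp.Minimize(c @ x),
                     [chance, cp.norm(x, 2) <= 1]).solve()
          print(x.value)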

  23. Robust linear program (deterministic)

          minimize    c^T x
          subject to  a_i^T x ≤ b_i for all a_i ∈ E_i,  i = 1, ..., m

      • a_i uncertain but bounded by ellipsoid E_i = { ā_i + P_i u | ‖u‖₂ ≤ 1 }
      • we require that x satisfies each constraint for all possible a_i

      SOCP formulation

          minimize    c^T x
          subject to  ā_i^T x + ‖P_i^T x‖₂ ≤ b_i,  i = 1, ..., m

      follows from

          sup_{‖u‖₂ ≤ 1} (ā_i + P_i u)^T x = ā_i^T x + ‖P_i^T x‖₂

  24. Examples of second-order cone constraints

      convex quadratic constraint (A = LL^T positive definite)

          x^T A x + 2 b^T x + c ≤ 0
              ⇔  ‖L^T x + L^{-1} b‖₂ ≤ (b^T A^{-1} b − c)^{1/2}

      extends to positive semidefinite singular A

      hyperbolic constraint

          x^T x ≤ yz,  y, z ≥ 0
              ⇔  ‖(2x, y − z)‖₂ ≤ y + z,  y, z ≥ 0

  25. Examples of SOC-representable constraints

      positive powers

          x^{1.5} ≤ t, x ≥ 0   ⇔   ∃z : x² ≤ tz,  z² ≤ x,  x, z ≥ 0

      • the two hyperbolic constraints can be converted to SOC constraints
      • extends to powers x^p for rational p ≥ 1

      negative powers

          x^{−3} ≤ t, x > 0   ⇔   ∃z : 1 ≤ xz,  z² ≤ tx,  x, z ≥ 0

      • the two hyperbolic constraints on the r.h.s. can be converted to SOC
        constraints
      • extends to powers x^p for rational p < 0

  26. Semidefinite program (SDP)

          minimize    c^T x
          subject to  x_1 A_1 + x_2 A_2 + · · · + x_n A_n ⪯ B

      • A_1, A_2, ..., A_n, B are symmetric matrices
      • inequality X ⪯ Y means Y − X is positive semidefinite, i.e.,

          z^T (Y − X) z = Σ_{i,j} (Y_{ij} − X_{ij}) z_i z_j ≥ 0  for all z

      • includes many nonlinear constraints as special cases

  27. Geometry

      (figure: the cone of (x, y, z) with [x  y; y  z] ⪰ 0)

      • a nonpolyhedral convex cone
      • feasible set of a semidefinite program is the intersection of the
        positive semidefinite cone in high dimension with planes

  28. Examples

          A(x) = A_0 + x_1 A_1 + · · · + x_m A_m    (A_i ∈ S^n)

      eigenvalue minimization, and equivalent SDP:

          minimize  λ_max(A(x))

          minimize    t
          subject to  A(x) ⪯ tI

      matrix-fractional function, and equivalent SDP:

          minimize    b^T A(x)^{-1} b
          subject to  A(x) ⪰ 0

          minimize    t
          subject to  [ A(x)  b ]
                      [ b^T   t ] ⪰ 0
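
      A sketch of eigenvalue minimization in CVXPY, both directly and via
      the LMI epigraph form; the symmetric data matrices are random
      placeholders:

          import cvxpy as cp
          import numpy as np

          np.random.seed(0)
          n, m = 4, 2
          def sym(M): return (M + M.T) / 2
          A0 = sym(np.random.randn(n, n))
          As = [sym(np.random.randn(n, n)) for _ in range(m)]

          x = cp.Variable(m)
          Ax = A0 + sum(x[i] * As[i] for i in range(m))

          # direct form using lambda_max ...
          p1 = cp.Problem(cp.Minimize(cp.lambda_max(Ax)))
          p1.solve()

          # ... and the explicit LMI epigraph form A(x) <= t I
          t = cp.Variable()
          p2 = cp.Problem(cp.Minimize(t), [Ax << t * np.eye(n)])
          p2.solve()
          print(p1.value, p2.value)   # should agree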

  29. Matrix norm minimization

          A(x) = A_0 + x_1 A_1 + x_2 A_2 + · · · + x_n A_n    (A_i ∈ R^{p×q})

      matrix norm approximation (‖X‖₂ = max_k σ_k(X)), and equivalent SDP:

          minimize  ‖A(x)‖₂

          minimize    t
          subject to  [ tI      A(x) ]
                      [ A(x)^T  tI   ] ⪰ 0

      nuclear norm approximation (‖X‖_* = Σ_k σ_k(X)), and equivalent SDP:

          minimize  ‖A(x)‖_*

          minimize    (tr U + tr V)/2
          subject to  [ U       A(x) ]
                      [ A(x)^T  V    ] ⪰ 0

  30. Semidefinite relaxation

      semidefinite programming is often used

      • to find good bounds for nonconvex polynomial problems, via relaxation
      • as a heuristic for finding good suboptimal points

      example: Boolean least-squares

          minimize    ‖Ax − b‖₂²
          subject to  x_i² = 1,  i = 1, ..., n

      • basic problem in digital communications
      • could check all 2^n possible values of x ∈ {−1, 1}^n ...
      • an NP-hard problem, and very hard in general

  31. Lifting

      Boolean least-squares problem

          minimize    x^T A^T A x − 2 b^T A x + b^T b
          subject to  x_i² = 1,  i = 1, ..., n

      reformulation: introduce new variable Y = xx^T

          minimize    tr(A^T A Y) − 2 b^T A x + b^T b
          subject to  Y = xx^T
                      diag(Y) = 1

      • cost function and second constraint are linear (in the variables Y, x)
      • first constraint is nonlinear and nonconvex

      ... still a very hard problem

  32. Relaxation

      replace Y = xx^T with the weaker constraint Y ⪰ xx^T to obtain the relaxation

          minimize    tr(A^T A Y) − 2 b^T A x + b^T b
          subject to  Y ⪰ xx^T
                      diag(Y) = 1

      • convex; can be solved as a semidefinite program:

          Y ⪰ xx^T   ⇔   [ Y    x ]
                          [ x^T  1 ] ⪰ 0

      • optimal value gives a lower bound for the Boolean LS problem
      • if Y = xx^T at the optimum, we have solved the exact problem
      • otherwise, can use randomized rounding:
        generate z from N(x, Y − xx^T) and take x = sign(z)
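
      A hedged sketch of this relaxation with randomized rounding, using the
      Schur-complement form above; A, b, the size n = 8, the jitter on the
      sampling covariance, and the 100 rounding samples are all assumptions:

          import cvxpy as cp
          import numpy as np

          np.random.seed(0)
          n = 8
          A = np.random.randn(n, n)
          b = np.random.randn(n)

          x = cp.Variable(n)
          Y = cp.Variable((n, n), symmetric=True)
          M = cp.bmat([[Y, cp.reshape(x, (n, 1))],
                       [cp.reshape(x, (1, n)), np.ones((1, 1))]])
          prob = cp.Problem(
              cp.Minimize(cp.trace(A.T @ A @ Y) - 2 * b @ A @ x + b @ b),
              [M >> 0, cp.diag(Y) == 1])
          prob.solve()   # optimal value is a lower bound on the Boolean LS value

          # randomized rounding: sample z ~ N(x, Y - x x^T), take sign(z)
          cov = Y.value - np.outer(x.value, x.value) + 1e-9 * np.eye(n)
          best = min(
              np.linalg.norm(
                  A @ np.sign(np.random.multivariate_normal(x.value, cov)) - b)**2
              for _ in range(100))
          print(prob.value, best)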

  33. Example

      (figure: histogram of ‖Ax − b‖₂ / (SDP bound) for the randomized
      solutions, with the SDP bound and the LS solution marked)

      • n = 100: feasible set has 2^100 ≈ 10^30 points
      • histogram of 1000 randomized solutions from the SDP relaxation

  34. Overview

      1. Basic theory and convex modeling
         • convex sets and functions
         • common problem classes and applications
      2. Interior-point methods for conic optimization
         • conic optimization
         • barrier methods
         • symmetric primal-dual methods
      3. First-order methods
         • (proximal) gradient algorithms
         • dual techniques and multiplier methods

  35. Conic optimization

      • definitions and examples
      • modeling
      • duality

  36. Generalized (conic) inequalities

      conic inequality: a constraint x ∈ K with K a convex cone in R^m

      we require that K is a proper cone:

      • closed
      • pointed: does not contain a line (equivalently, K ∩ (−K) = {0})
      • with nonempty interior: int K ≠ ∅ (equivalently, K + (−K) = R^m)

      notation

          x ⪰_K y  ⇔  x − y ∈ K,        x ≻_K y  ⇔  x − y ∈ int K

      the subscript in ⪰_K is omitted if K is clear from the context

  37. Cone linear program

          minimize    c^T x
          subject to  Ax ⪯_K b

      if K is the nonnegative orthant, this is a (regular) linear program

      widely used in recent literature on convex optimization

      • modeling: a small number of 'primitive' cones is sufficient to express
        most convex constraints that arise in practice
      • algorithms: a convenient problem format when extending interior-point
        algorithms for linear programming to convex optimization

  38. Norm cone

          K = { (x, y) ∈ R^{m−1} × R | ‖x‖ ≤ y }

      (figure: norm cone in R³)

      for the Euclidean norm this is the second-order cone (notation: Q^m)

  39. Second-order cone program

          minimize    c^T x
          subject to  ‖B_{k0} x + d_{k0}‖₂ ≤ B_{k1} x + d_{k1},  k = 1, ..., r

      cone LP formulation: express constraints as Ax ⪯_K b, with

          K = Q^{m_1} × · · · × Q^{m_r}

          A = [ −B_{10}; −B_{11}; ...; −B_{r0}; −B_{r1} ],
          b = [ d_{10}; d_{11}; ...; d_{r0}; d_{r1} ]

      (assuming B_{k0}, d_{k0} have m_k − 1 rows)

  40. Vector notation for symmetric matrices

      • vectorized symmetric matrix: for U ∈ S^p

          vec(U) = ( U_{11}, √2·U_{21}, ..., √2·U_{p1},
                     U_{22}, √2·U_{32}, ..., √2·U_{p2}, ..., U_{pp} )

      • inverse operation: for u = (u_1, u_2, ..., u_n) ∈ R^n with n = p(p+1)/2

          mat(u) = (1/√2) [ √2·u_1   u_2           · · ·  u_p
                            u_2      √2·u_{p+1}    · · ·  u_{2p−1}
                            ...
                            u_p      u_{2p−1}      · · ·  √2·u_{p(p+1)/2} ]

      the coefficients √2 are added so that standard inner products are preserved:

          tr(UV) = vec(U)^T vec(V),    u^T v = tr(mat(u) mat(v))
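
      A NumPy sketch of these scaled vec/mat maps; the column-by-column
      ordering of the lower triangle follows the slide's convention:

          import numpy as np

          def vec(U):
              # stack the lower triangle column by column,
              # scaling off-diagonal entries by sqrt(2)
              p = U.shape[0]
              out = []
              for j in range(p):
                  out.append(U[j, j])
                  out.extend(np.sqrt(2) * U[j + 1:, j])
              return np.array(out)

          def mat(u, p):
              # inverse of vec: rebuild the symmetric matrix
              U = np.zeros((p, p))
              k = 0
              for j in range(p):
                  U[j, j] = u[k]; k += 1
                  U[j + 1:, j] = u[k:k + p - 1 - j] / np.sqrt(2)
                  k += p - 1 - j
              return U + np.tril(U, -1).T

          # check that inner products are preserved: tr(UV) = vec(U)^T vec(V)
          rng = np.random.default_rng(0)
          U = rng.standard_normal((4, 4)); U = U + U.T
          V = rng.standard_normal((4, 4)); V = V + V.T
          print(np.allclose(np.trace(U @ V), vec(U) @ vec(V)),
                np.allclose(mat(vec(U), 4), U))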

  41. Positive semidefinite cone

          S^p = { vec(X) | X ∈ S^p_+ } = { x ∈ R^{p(p+1)/2} | mat(x) ⪰ 0 }

      (figure: the cone S² = { (x, y, z) | [x  y/√2; y/√2  z] ⪰ 0 })

  42. Semidefinite program

          minimize    c^T x
          subject to  x_1 A_{11} + x_2 A_{12} + · · · + x_n A_{1n} ⪯ B_1
                      ...
                      x_1 A_{r1} + x_2 A_{r2} + · · · + x_n A_{rn} ⪯ B_r

      r linear matrix inequalities of order p_1, ..., p_r

      cone LP formulation: express constraints as Ax ⪯_K b, with

          K = S^{p_1} × S^{p_2} × · · · × S^{p_r}

          A = [ vec(A_{11})  vec(A_{12})  · · ·  vec(A_{1n})
                vec(A_{21})  vec(A_{22})  · · ·  vec(A_{2n})
                ...
                vec(A_{r1})  vec(A_{r2})  · · ·  vec(A_{rn}) ],

          b = [ vec(B_1); vec(B_2); ...; vec(B_r) ]

  43. Exponential cone

      the epigraph of the perspective of exp x is a non-proper cone

          K = { (x, y, z) ∈ R³ | y e^{x/y} ≤ z, y > 0 }

      the exponential cone is

          K_exp = cl K = K ∪ { (x, 0, z) | x ≤ 0, z ≥ 0 }

      (figure: the exponential cone)

  44. Geometric program

          minimize    c^T x
          subject to  log Σ_{k=1}^{n_i} exp(a_{ik}^T x + b_{ik}) ≤ 0,  i = 1, ..., r

      cone LP formulation

          minimize    c^T x
          subject to  ( a_{ik}^T x + b_{ik}, 1, z_{ik} ) ∈ K_exp,
                          k = 1, ..., n_i,  i = 1, ..., r
                      Σ_{k=1}^{n_i} z_{ik} ≤ 1,  i = 1, ..., r

  45. Power cone

      definition: for α = (α_1, α_2, ..., α_m) > 0 with Σ_{i=1}^m α_i = 1,

          K_α = { (x, y) ∈ R^m_+ × R | |y| ≤ x_1^{α_1} · · · x_m^{α_m} }

      (figure: examples for m = 2, with α = (1/2, 1/2), α = (2/3, 1/3),
      α = (3/4, 1/4))

  46. Outline

      • definition and examples
      • modeling
      • duality

  47. Modeling software

      modeling packages for convex optimization

      • CVX, YALMIP (MATLAB)
      • CVXPY, CVXMOD (Python)

      assist the user in formulating convex problems, by automating two tasks:

      • verifying convexity from convex calculus rules
      • transforming the problem into the input format required by standard solvers

      related packages

      general-purpose optimization modeling: AMPL, GAMS

  48. CVX example

          minimize    ‖Ax − b‖₁
          subject to  0 ≤ x_k ≤ 1,  k = 1, ..., n

      MATLAB code

          cvx_begin
              variable x(3);
              minimize(norm(A*x - b, 1))
              subject to
                  x >= 0;
                  x <= 1;
          cvx_end

      • between cvx_begin and cvx_end, x is a CVX variable
      • after execution, x is a MATLAB variable with the optimal solution
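
      For comparison, a sketch of the same problem in CVXPY (one of the
      Python packages named on the previous slide); A and b are placeholder
      data:

          import cvxpy as cp
          import numpy as np

          np.random.seed(0)
          A, b = np.random.randn(5, 3), np.random.randn(5)

          x = cp.Variable(3)
          prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 1)),
                            [x >= 0, x <= 1])
          prob.solve()
          print(x.value)   # after solving, x.value holds the optimal solution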

  49. Modeling and conic optimization

      convex modeling systems (CVX, YALMIP, CVXPY, CVXMOD, ...)

      • convert problems stated in standard mathematical notation to cone LPs
      • in principle, any convex problem can be represented as a cone LP
      • in practice, a small set of primitive cones is used (R^n_+, Q^p, S^p)
      • choice of cones is limited by available algorithms and solvers (see later)

      modeling systems implement a set of rules for expressing constraints

          f(x) ≤ t

      as conic inequalities for the implemented cones

  50. Examples of second-order cone representable functions

      • convex quadratic

          f(x) = x^T P x + q^T x + r    (P ⪰ 0)

      • quadratic-over-linear function

          f(x, y) = x^T x / y  with dom f = R^n × R_+    (assume 0/0 = 0)

      • convex powers with rational exponent

          f(x) = |x|^α,    f(x) = x^β for x > 0, +∞ for x ≤ 0

        for rational α ≥ 1 and β ≤ 0

      • p-norm f(x) = ‖x‖_p for rational p ≥ 1

  51. Examples of SD cone representable functions

      • matrix-fractional function

          f(X, y) = y^T X^{-1} y  with dom f = { (X, y) ∈ S^n_+ × R^n | y ∈ R(X) }

      • maximum eigenvalue of a symmetric matrix

      • maximum singular value f(X) = ‖X‖₂ = σ_1(X):

          ‖X‖₂ ≤ t   ⇔   [ tI   X  ]
                          [ X^T  tI ] ⪰ 0

      • nuclear norm f(X) = ‖X‖_* = Σ_i σ_i(X):

          ‖X‖_* ≤ t   ⇔   ∃ U, V :  [ U    X ]
                                     [ X^T  V ] ⪰ 0,   (1/2)(tr U + tr V) ≤ t

  52. Functions representable with exponential and power cone

      exponential cone

      • exponential and logarithm
      • entropy f(x) = x log x

      power cone

      • increasing power of absolute value: f(x) = |x|^p with p ≥ 1
      • decreasing power: f(x) = x^q with q ≤ 0 and domain R_{++}
      • p-norm: f(x) = ‖x‖_p with p ≥ 1

  53. Outline

      • definition and examples
      • modeling
      • duality

  54. Linear programming duality

      primal and dual LP

          (P)  minimize    c^T x        (D)  maximize    −b^T z
               subject to  Ax ≤ b            subject to  A^T z + c = 0
                                                         z ≥ 0

      • primal optimal value is p⋆ (+∞ if infeasible, −∞ if unbounded below)
      • dual optimal value is d⋆ (−∞ if infeasible, +∞ if unbounded above)

      duality theorem

      • weak duality: p⋆ ≥ d⋆, with no exception
      • strong duality: p⋆ = d⋆ if primal or dual is feasible
      • if p⋆ = d⋆ is finite, then primal and dual optima are attained

  55. Dual cone

      definition

          K* = { y | x^T y ≥ 0 for all x ∈ K }

      K* is a proper cone if K is a proper cone

      dual inequality: x ⪰_* y means x ⪰_{K*} y for generic proper cone K

      note: the dual cone depends on the choice of inner product:

          H^{-1} K* is the dual cone for the inner product ⟨x, y⟩ = x^T H y

  56. Examples

      • R^p_+, Q^p, S^p are self-dual: K = K*

      • dual of a norm cone is the norm cone of the dual norm

      • dual of the exponential cone:

          K*_exp = { (u, v, w) ∈ R_− × R × R_+ | −u log(−u/w) + u − v ≤ 0 }

        (with 0 log(0/w) = 0 if w ≥ 0)

      • dual of the power cone:

          K*_α = { (u, v) ∈ R^m_+ × R | |v| ≤ (u_1/α_1)^{α_1} · · · (u_m/α_m)^{α_m} }

  57. Primal and dual cone LP

      primal problem (optimal value p⋆)

          minimize    c^T x
          subject to  Ax ⪯ b

      dual problem (optimal value d⋆)

          maximize    −b^T z
          subject to  A^T z + c = 0
                      z ⪰_* 0

      weak duality: p⋆ ≥ d⋆ (without exception)

  58. Strong duality

          p⋆ = d⋆  if primal or dual is strictly feasible

      • slightly weaker than LP duality (which only requires feasibility)
      • can have d⋆ < p⋆ with finite p⋆ and d⋆

      other implications of strict feasibility

      • if primal is strictly feasible, then dual optimum is attained (if d⋆ is finite)
      • if dual is strictly feasible, then primal optimum is attained (if p⋆ is finite)

  59. Optimality conditions

          primal:  minimize    c^T x        dual:  maximize    −b^T z
                   subject to  Ax + s = b          subject to  A^T z + c = 0
                               s ⪰ 0                           z ⪰_* 0

      optimality conditions

          [ 0 ]   [ 0    A^T ] [ x ]   [ c ]
          [ s ] = [ −A   0   ] [ z ] + [ b ]

          s ⪰ 0,   z ⪰_* 0,   z^T s = 0

      duality gap: the inner product of (x, z) and (0, s) gives

          z^T s = c^T x + b^T z

  60. Barrier methods

      • barrier method for linear programming
      • normal barriers
      • barrier method for conic optimization

  61. History

      • 1960s: Sequentially Unconstrained Minimization Technique (SUMT)

        solves the nonlinear convex optimization problem

            minimize    f_0(x)
            subject to  f_i(x) ≤ 0,  i = 1, ..., m

        via a sequence of unconstrained minimization problems

            minimize  t f_0(x) − Σ_{i=1}^m log(−f_i(x))

      • 1980s: LP barrier methods with polynomial worst-case complexity
      • 1990s: barrier methods for non-polyhedral cone LPs

  62. Logarithmic barrier function for linear inequalities

      • barrier for the nonnegative orthant R^m_+:

            φ(s) = −Σ_{i=1}^m log s_i

      • barrier for the inequalities Ax ≤ b:

            ψ(x) = φ(b − Ax) = −Σ_{i=1}^m log(b_i − a_i^T x)

        convex, ψ(x) → ∞ at boundary of dom ψ = { x | Ax < b }

      gradient and Hessian

          ∇ψ(x) = −A^T ∇φ(s),    ∇²ψ(x) = A^T ∇²φ(s) A

      with s = b − Ax and

          ∇φ(s) = −( 1/s_1, ..., 1/s_m ),    ∇²φ(s) = diag( 1/s_1², ..., 1/s_m² )

  63. Central path for linear program

          minimize    c^T x
          subject to  Ax ≤ b

      central path: minimizers x⋆(t) of

          f_t(x) = t c^T x + φ(b − Ax)

      (figure: central path x⋆(t) approaching x⋆, with objective direction c)

      t is a positive parameter

      optimality conditions: x = x⋆(t) satisfies

          ∇f_t(x) = t c − A^T ∇φ(s) = 0,    s = b − Ax

  64. Central path and duality

      dual feasible point on central path

      • for x = x⋆(t) and s = b − Ax,

            z⋆(t) = −(1/t) ∇φ(s) = ( 1/(t s_1), 1/(t s_2), ..., 1/(t s_m) )

        z = z⋆(t) is strictly dual feasible: c + A^T z = 0 and z > 0

      • can be corrected to account for inexact centering of x ≈ x⋆(t)

      duality gap between x = x⋆(t) and z = z⋆(t) is

          c^T x + b^T z = s^T z = m/t

      gives a bound on suboptimality: c^T x⋆(t) − p⋆ ≤ m/t

  65. Barrier method

      starting with t > 0, strictly feasible x

      • make one or more Newton steps to (approximately) minimize f_t:

            x⁺ = x − α ∇²f_t(x)^{-1} ∇f_t(x)

        step size α is fixed or from line search

      • increase t and repeat until c^T x − p⋆ ≤ ε

      complexity: with proper initialization, step size, update scheme for t,

          #Newton steps = O( √m log(1/ε) )

      result follows from the convergence analysis of Newton's method for f_t
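
      A hedged NumPy sketch of this barrier method for the LP
      minimize c^T x subject to Ax ≤ b, combining the Newton step above with
      the damped step-size rule of slide 72 and the duality-gap bound m/t of
      slide 64; the update factor μ = 10 and the tiny example are assumed
      placeholder choices:

          import numpy as np

          def barrier_lp(c, A, b, x, t=1.0, mu=10.0, eps=1e-6):
              # x must be strictly feasible (Ax < b);
              # returns an eps-suboptimal point
              m = A.shape[0]
              while m / t > eps:                 # duality-gap bound m/t
                  while True:                    # centering: damped Newton on f_t
                      s = b - A @ x
                      grad = t * c + A.T @ (1 / s)        # = t c - A^T grad phi(s)
                      hess = A.T @ ((1 / s**2)[:, None] * A)
                      dx = -np.linalg.solve(hess, grad)
                      lam = np.sqrt(-grad @ dx)           # Newton decrement
                      if lam**2 <= 1e-8:
                          break
                      # damped step keeps x strictly feasible (slide 72 rule)
                      alpha = 1.0 if lam < 0.25 else 1.0 / (1.0 + lam)
                      x = x + alpha * dx
                  t *= mu
              return x

          # tiny example: minimize x1 + x2 over the box 0 <= x <= 1
          A = np.vstack([np.eye(2), -np.eye(2)])
          b = np.array([1.0, 1.0, 0.0, 0.0])
          print(barrier_lp(np.ones(2), A, b, x=np.array([0.5, 0.5])))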

  66. Outline

      • barrier method for linear programming
      • normal barriers
      • barrier method for conic optimization

  67. Normal barrier for proper cone

      φ is a θ-normal barrier for the proper cone K if it is

      • a barrier: smooth, convex, domain int K, blows up at the boundary of K
      • logarithmically homogeneous with parameter θ:

            φ(tx) = φ(x) − θ log t,   ∀ x ∈ int K, t > 0

      • self-concordant: the restriction g(α) = φ(x + αv) to any line satisfies

            g′′′(α) ≤ 2 g′′(α)^{3/2}

      (Nesterov and Nemirovski, 1994)

  68. Examples

      nonnegative orthant: K = R^m_+

          φ(x) = −Σ_{i=1}^m log x_i    (θ = m)

      second-order cone: K = Q^p = { (x, y) ∈ R^{p−1} × R | ‖x‖₂ ≤ y }

          φ(x, y) = −log(y² − x^T x)    (θ = 2)

      semidefinite cone: K = S^m = { x ∈ R^{m(m+1)/2} | mat(x) ⪰ 0 }

          φ(x) = −log det mat(x)    (θ = m)

  69. Examples (continued)

      exponential cone: K_exp = cl{ (x, y, z) ∈ R³ | y e^{x/y} ≤ z, y > 0 }

          φ(x, y, z) = −log( y log(z/y) − x ) − log z − log y    (θ = 3)

      power cone: K = { (x_1, x_2, y) ∈ R_+ × R_+ × R | |y| ≤ x_1^{α_1} x_2^{α_2} }

          φ(x, y) = −log( x_1^{2α_1} x_2^{2α_2} − y² ) − log x_1 − log x_2    (θ = 4)

  70. Central path

      conic LP (with inequality with respect to proper cone K)

          minimize    c^T x
          subject to  Ax ⪯ b

      barrier for the feasible set

          φ(b − Ax)

      where φ is a θ-normal barrier for K

      central path: set of minimizers x⋆(t) (with t > 0) of

          f_t(x) = t c^T x + φ(b − Ax)

  71. Newton step

      centering problem

          minimize  f_t(x) = t c^T x + φ(b − Ax)

      Newton step at x

          Δx = −∇²f_t(x)^{-1} ∇f_t(x)

      Newton decrement

          λ_t(x) = ( Δx^T ∇²f_t(x) Δx )^{1/2} = ( −∇f_t(x)^T Δx )^{1/2}

      useful as a measure of proximity of x to x⋆(t)

  72. Damped Newton method

          minimize  f_t(x) = t c^T x + φ(b − Ax)

      algorithm (with parameters ε ∈ (0, 1/2), η ∈ (0, 1/4])

      select a starting point x ∈ dom f_t; repeat:

      1. compute Newton step Δx and Newton decrement λ_t(x)
      2. if λ_t(x)² ≤ ε, return x
      3. otherwise, set x := x + αΔx with

             α = 1/(1 + λ_t(x))  if λ_t(x) ≥ η,      α = 1  if λ_t(x) < η

      • stopping criterion λ_t(x)² ≤ ε implies f_t(x) − inf f_t(x) ≤ ε
      • alternatively, can use a backtracking line search

  73. Convergence results for damped Newton method

      • damped Newton phase: f_t decreases by at least a positive constant γ

            f_t(x⁺) − f_t(x) ≤ −γ   if λ_t(x) ≥ η

        where γ = η − log(1 + η)

      • quadratic convergence phase: λ_t rapidly decreases to zero

            2 λ_t(x⁺) ≤ (2 λ_t(x))²   if λ_t(x) < η

        implies λ_t(x⁺) ≤ 2η² < η

      conclusion: the number of Newton iterations is bounded by

          ( f_t(x⁽⁰⁾) − inf f_t(x) ) / γ + log₂ log₂(1/ε)

  74. Outline

      • barrier method for linear programming
      • normal barriers
      • barrier method for conic optimization
