An Efficient Affine-Scaling Algorithm for Hyperbolic Programming
Jim Renegar, joint work with Mutiara Sondjaja
Let E be a Euclidean space. A homogeneous polynomial p : E → R is hyperbolic if there is a vector e ∈ E such that for all x ∈ E, the univariate polynomial t ↦ p(x + te) has only real roots. We say "p is hyperbolic in direction e."

Example: E = S^{n×n} (symmetric matrices), p(X) = det(X), e = I (the identity matrix). Then p(X + tI) is the characteristic polynomial of −X; all of its roots are real because symmetric matrices have only real eigenvalues.

The hyperbolicity cone Λ++ is the connected component of { x : p(x) ≠ 0 } containing e. For the example, Λ++ = S^{n×n}_{++}, the cone of positive-definite matrices. The convexity of this particular cone is true of hyperbolicity cones in general . . .
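The example can be checked numerically. In the sketch below, the roots of t ↦ det(X + tI) are exactly the negatives of the eigenvalues of X, so it suffices to confirm that a generic eigensolver (given no symmetry hint) returns purely real eigenvalues for a symmetric X. The setup is ours, for illustration only.

```python
import numpy as np

# Sketch: p(X) = det(X) is hyperbolic in direction e = I on symmetric
# matrices.  The roots of t -> det(X + t*I) are the negatives of the
# eigenvalues of X, so hyperbolicity amounts to those being real.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
X = (M + M.T) / 2                  # a random symmetric matrix

eigs = np.linalg.eigvals(X)        # generic eigensolver, symmetry not assumed
assert np.allclose(eigs.imag, 0)   # every root of det(X + t*I) is real
```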
Thm (Gårding, 1959): Λ++ is a convex cone.

A hyperbolic program is an optimization problem of the form

  HP:  min ⟨c, x⟩  s.t.  Ax = b,  x ∈ Λ+ (the closure of Λ++)

Güler (1997) introduced hyperbolic programming, motivated largely by the realization that f(x) = −ln p(x) is a self-concordant barrier function: "O(√n) iterations to halve the duality gap," where n is the degree of p.

Güler showed the barrier functions f(x) = −ln p(x) possess many of the nice properties of X ↦ −ln det(X), although hyperbolicity cones in general are not symmetric (i.e., self-scaled).
There are natural ways to "relax" HP to hyperbolic programs for lower-degree polynomials. For example, to obtain a relaxation of SDP . . .

Fix n, and for 1 ≤ k ≤ n let

  σ_k(λ_1, . . . , λ_n) := Σ_{j_1 < · · · < j_k} λ_{j_1} · · · λ_{j_k}

– the elementary symmetric polynomial of degree k.

Then X ↦ σ_k(λ(X)) is a hyperbolic polynomial of degree k in direction e = I, and its hyperbolicity cone contains S^{n×n}_{++}. These polynomials can be evaluated efficiently via the FFT.

Perhaps relaxing SDPs in this and related ways will allow larger SDPs to be approximately solved efficiently. The relaxations easily generalize to all hyperbolic programs.
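As an illustration of how σ_k(λ(X)) can be computed without enumerating subsets: det(tI − X) = Σ_k (−1)^k σ_k(λ(X)) t^{n−k}, so the values fall out of the characteristic-polynomial coefficients. This is a simpler route than the FFT-based evaluation mentioned above; the sketch and its function names are ours.

```python
import itertools
import numpy as np

def sigma_direct(lams, k):
    # elementary symmetric polynomial, straight from the definition
    return sum(np.prod(c) for c in itertools.combinations(lams, k))

def sigma_charpoly(X, k):
    # det(tI - X) = sum_k (-1)^k * sigma_k(lambda(X)) * t^(n-k), so sigma_k
    # is (up to sign) the k-th coefficient of the characteristic polynomial
    c = np.poly(np.linalg.eigvalsh(X))
    return (-1) ** k * c[k]

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
X = (M + M.T) / 2
for k in range(1, 5):
    assert np.isclose(sigma_direct(np.linalg.eigvalsh(X), k),
                      sigma_charpoly(X, k))
```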
Recall HP: min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+, with barrier function f(x) = −ln p(x), its gradient g(x), and its Hessian H(x), which is positive-definite for all x ∈ Λ++.

"Local inner product at e ∈ Λ++": ⟨u, v⟩_e := ⟨u, H(e)v⟩
– the induced norm: ‖v‖_e = √⟨v, v⟩_e
– "Dikin ellipsoids": B̄_e(e, r) = { x : ‖x − e‖_e ≤ r }

The gist of the original affine-scaling method due to Dikin is simply: given a strictly feasible point e for HP and an appropriate value r > 0, move from e to the optimal solution e+ for

  min ⟨c, x⟩  s.t.  Ax = b,  x ∈ B̄_e(e, r)

Dikin focused on linear programming and chose r = 1 (giving the largest Dikin ellipsoids contained in R^n_+). See also: Vanderbei, Meketon and Freedman (1986).
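Dikin's step has a closed form for linear programming. The following is a minimal sketch of one affine-scaling step for min c·x, Ax = b, x ≥ 0, where the Dikin ellipsoid at e is { x : ‖D⁻¹(x − e)‖ ≤ r } with D = diag(e); it is our own toy implementation (not the hyperbolic-programming algorithm of this talk), and it ignores degeneracy.

```python
import numpy as np

def dikin_step(A, b, c, e, r=0.95):
    """One affine-scaling step for LP  min c.x  s.t. Ax = b, x >= 0,
    from strictly feasible e.  Sketch only: no degeneracy handling."""
    D = np.diag(e)                          # scaling by the current iterate
    AD = A @ D
    g = D @ c                               # objective in scaled variables
    # project g onto the null space of AD (via least squares, for stability)
    y, *_ = np.linalg.lstsq(AD.T, g, rcond=None)
    p = g - AD.T @ y
    if np.linalg.norm(p) < 1e-12:
        return e                            # no descent direction remains
    # move distance r along -p inside the scaled unit ball, then unscale
    return e - r * (D @ p) / np.linalg.norm(p)

# toy LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0   (optimum at (1, 0))
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])
for _ in range(50):
    x = dikin_step(A, b, c, x)
```

With r < 1 each iterate stays strictly positive, since every coordinate changes by at most a factor 1 ± r in the scaled metric.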
In the mid-1980s, there was considerable effort to prove that Dikin's affine-scaling method runs in polynomial time (perhaps with a choice r < 1). The efforts mostly ceased when, in 1986, Shub and Megiddo showed that the "infinitesimal version" of the algorithm can come near all vertices of a Klee–Minty cube.

Nevertheless, several algorithms with a spirit similar to Dikin's method have been shown to halve the duality gap in polynomial time:
– Monteiro, Adler and Resende (1990): LP and convex QP
– Jansen, Roos and Terlaky (1996): LP; (1997): PSD LCP problems
  (these use "scaling points" and "V-space")
– Sturm and Zhang (1996): SDP
– Chua (2007): symmetric cone programming
  (these use ellipsoidal cones rather than ellipsoids)

These algorithms are primal-dual methods and rely heavily on the cones being self-scaled. Our framework shares strong connections with the one developed by Chek Beng Chua, to whom we are indebted.
For e ∈ Λ++ and 0 < α < √n, let

  K_e(α) := { x : ⟨e, x⟩_e ≥ α ‖x‖_e }

– this happens to be the smallest cone containing the Dikin ellipsoid B̄_e(e, √(n − α²)).

Keep in mind that the cone grows in size as α decreases.

HP is replaced by

  QP_e(α):  min ⟨c, x⟩  s.t.  Ax = b,  x ∈ K_e(α)

Definition: Swath(α) = { e ∈ Λ++ : Ae = b and QP_e(α) has an optimal solution }

Prop: Swath(0) = Central Path.

Thus, α can be regarded as a measure of the proximity of points in Swath(α) to the central path.
Let x_e(α) = optimal solution of QP_e(α) (assuming e ∈ Swath(α))
– the main work in computing x_e(α) lies in solving a system of linear equations.

We assume 0 < α < 1, in which case Λ+ ⊆ K_e(α)
– thus QP_e(α) is a relaxation of HP
– hence the optimal value of HP is ≥ ⟨c, x_e(α)⟩.

Current iterate: e ∈ Λ++. The next iterate will be e′, a convex combination of e and x_e(α):

  e′ = (1/(1+t)) (e + t x_e(α))

The choice of t is made through duality . . .
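The inclusion Λ+ ⊆ K_e(α) for 0 < α < 1 can be sanity-checked in the semidefinite example with e = I: there H(I) is the identity map, so ⟨I, X⟩_I = trace(X) and ‖X‖_I = ‖X‖_F, and membership in K_I(α) reads trace(X) ≥ α‖X‖_F, which holds for every positive-semidefinite X since (Σ λ_i)² ≥ Σ λ_i² when all λ_i ≥ 0. A numerical sketch of that check:

```python
import numpy as np

# Sanity check (sketch): in the SDP example with e = I, membership of a
# PSD matrix X in K_I(alpha) reads  trace(X) >= alpha * ||X||_F,
# which should hold for every alpha < 1.
rng = np.random.default_rng(2)
alpha = 0.9
for _ in range(100):
    M = rng.standard_normal((6, 6))
    X = M @ M.T                      # a random positive-semidefinite matrix
    assert np.trace(X) >= alpha * np.linalg.norm(X, "fro")
```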
Let x_e = optimal solution of QP_e(α) (assuming e ∈ Swath(α)).

The dual problems:

  HP*:       max b^T y  s.t.  A*y + s = c,  s ∈ Λ+*
  QP_e(α)*:  max b^T y  s.t.  A*y + s = c,  s ∈ K_e(α)*

First-order optimality conditions for x_e yield an optimal solution (y_e, s_e) for QP_e(α)*. Because Λ+ ⊆ K_e(α), we have K_e(α)* ⊆ Λ+*, and hence (y_e, s_e) is feasible for HP*.

Primal-dual feasible pair: e for HP, (y_e, s_e) for HP*
Duality gap: gap_e := ⟨c, e⟩ − b^T y_e
Recall: Swath(α) = { e ∈ Λ++ : Ae = b and QP_e(α) has an optimal solution }, x_e = optimal solution of QP_e(α), and e for HP, (y_e, s_e) for HP* form a primal-dual feasible pair.

Current iterate: e ∈ Λ++. The next iterate will be a convex combination of e and x_e:

  e(t) = (1/(1+t)) (e + t x_e)

We want t to be large so as to improve the primal objective value, but we also want e(t) ∈ Swath(α).

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:
• e(t) ∈ Λ++
• s_e ∈ int(K_{e(t)}(α)*)
– consequently, e(t) is strictly feasible for QP_{e(t)}(α) and (y_e, s_e) is strictly feasible for QP_{e(t)}(α)*
– hence, e(t) ∈ Swath(α)
The same choice of t also ensures that:
• t ≥ (1/2) α / ‖x_e‖_e
– and thus ensures good improvement in the primal objective value if, say, ‖x_e‖_e ≤ √n
Moreover, the choice of t ensures that:
• s_e ∈ K_{e(t)}(β)*, where β = α √((1+α)/2)
– which implies s_e is "deep within" K_{e(t)}(α)*
– and hence (y_e, s_e) is "very strongly" feasible for QP_{e(t)}(α)*
Summary: with e(t) = (1/(1+t))(e + t x_e) and t the minimizer of a particular quadratic polynomial, we ensure that:
1. There is "good" improvement in the primal objective value if ‖x_e‖_e ≤ √n.
2. (y_e, s_e) is "very strongly" feasible for QP_{e(t)}(α)*.

Sequence of iterates: e_0, e_1, e_2, . . .
– write x_i and (y_i, s_i) rather than x_{e_i} and (y_{e_i}, s_{e_i})

If i > 0, then:
1. ‖x_i‖_{e_i} ≤ √n  ⇒  ⟨c, e_{i+1}⟩ ≪ ⟨c, e_i⟩
2. (y_{i−1}, s_{i−1}) is "very strongly" feasible for QP_{e_i}(α)*.
3. On the other hand, we show (‖x_i‖_{e_i} ≥ √n) ∧ (2.)  ⇒  b^T y_i ≥ b^T y_{i−1}.

In this manner we establish the Main Theorem . . .