CS257 Linear and Convex Optimization, Lecture 7
Bo Jiang
John Hopcroft Center for Computer Science, Shanghai Jiao Tong University
October 19, 2020
Recap: Convex Optimization Problem

$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i = 1, \dots, m; \quad h_i(x) = 0,\ i = 1, \dots, k$$

1. $f, g_i$ are convex functions
2. $h_i$ are affine functions, i.e. $h_i(x) = a_i^T x - b_i$

Domain. $D = \operatorname{dom} f \cap \left(\bigcap_{i=1}^m \operatorname{dom} g_i\right)$
Feasible set. $X = \{x \in D : g_i(x) \le 0,\ 1 \le i \le m;\ h_i(x) = 0,\ 1 \le i \le k\}$
Optimal value. $f^* = \inf_{x \in X} f(x)$
Optimal point. $x^* \in X$ and $f(x^*) = f^*$, i.e. $f(x^*) \le f(x)$ for all $x \in X$

First-order optimality condition: $\nabla f(x^*)^T (x - x^*) \ge 0$ for all $x \in X$.
Recap: LP

General form: $\min_x c^T x$ s.t. $Bx \le d$, $Ax = b$
Standard form: $\min_x c^T x$ s.t. $Ax = b$, $x \ge 0$
Inequality form: $\min_x c^T x$ s.t. $Ax \le b$

Conversion to equivalent problems:
• introducing slack variables
• eliminating equality constraints
• epigraph form
• representing a variable by two nonnegative variables, $x = x^+ - x^-$
Recap: Geometry of LP

Example: $\min -x_1 - 3x_2$ s.t. $x_1 + x_2 \le 6$, $-x_1 + 2x_2 \le 8$, $x_1, x_2 \ge 0$; general form $\min_x c^T x$ s.t. $Ax \le b$.

[Figure: the feasible polyhedron in the $(x_1, x_2)$-plane with objective direction $-c$; the optimum is attained at the vertex $x^* = (4/3, 14/3)$.]

• optimization of a linear function over a polyhedron
• graphical solution of simple LPs
Contents
1. Some Canonical Problem Forms
   1.1 QP and QCQP
   1.2 Geometric Program
Quadratic Program (QP)

$$\min_x \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad Bx \le d,\ Ax = b$$

A QP is convex iff $Q \succeq O$.

[Figure: level sets of a convex quadratic objective over a polyhedral feasible set; at the optimum $x^*$, $-\nabla f(x^*)$ is normal to the active constraint.]
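As an illustration, a small convex QP can be solved numerically. The sketch below uses cvxpy; the problem data $Q$, $c$, $B$, $d$ are made up for the example and are not from the lecture.

```python
import numpy as np
import cvxpy as cp

# Hypothetical data for min (1/2) x^T Q x + c^T x  s.t.  Bx <= d, x >= 0
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # Q > O, so the QP is convex
c = np.array([-1.0, -1.0])
B = np.array([[1.0, 1.0], [-1.0, 2.0]])
d = np.array([6.0, 8.0])

x = cp.Variable(2)
objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
prob = cp.Problem(objective, [B @ x <= d, x >= 0])
prob.solve()
print(x.value, prob.value)
```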
Quadratically Constrained Quadratic Program (QCQP)

$$\min_x \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad \tfrac{1}{2} x^T Q_i x + c_i^T x + d_i \le 0,\ i = 1, \dots, m, \quad Ax = b$$

A QCQP is convex if $Q \succeq O$ and $Q_i \succeq O$ for all $i$.

[Figure: a convex quadratic objective minimized over an ellipsoidal feasible set $X$; at the optimum $x^*$, $-\nabla f(x^*)$ is normal to the boundary of $X$.]
Example: Linear Least Squares Regression

Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, find $w \in \mathbb{R}^p$ solving
$$\min_w \|y - Xw\|_2^2$$
• convex QP with objective $f(w) = w^T X^T X w - 2 y^T X w + y^T y$

Geometrically, we are looking for the orthogonal projection $\hat{y} = Xw^*$ of $y$ onto the column space of $X$.

[Figure: $y$, its projection $\hat{y} = Xw^*$, and the residual $y - Xw^*$ perpendicular to the column space of $X$.]
Example: Linear Least Squares Regression (cont'd)

By the first-order optimality condition, $w^*$ is optimal iff $\nabla f(w^*) = 0$, i.e. $w^*$ is a solution of the normal equation
$$X^T X w = X^T y$$

Case I. $X$ has full column rank, i.e. $\operatorname{rank} X = p$.
• $X^T X \succ O$
• unique solution $w^* = (X^T X)^{-1} X^T y$

Note. In this case the objective $f(w)$ is strictly convex and coercive:
$$f(w) \ge \lambda_{\min}(X^T X)\,\|w\|^2 - 2\|y^T X\| \cdot \|w\| + \|y\|^2$$
Example: Linear Least Squares Regression (cont'd)

Example. Solve $\min_w \|y - Xw\|_2^2$ with
$$X = \begin{pmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad y = \begin{pmatrix} 3 \\ 2 \\ 2 \end{pmatrix}$$

Solution. The normal equation is $X^T X w = X^T y$ with
$$X^T X = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}, \quad X^T y = (6, 2)^T$$
Since $X$ has full column rank, $w^* = (X^T X)^{-1} X^T y = (1.5, 2)^T$.
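A quick numerical check of this example, as a minimal NumPy sketch; `np.linalg.lstsq` solves the same least squares problem directly.

```python
import numpy as np

X = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
y = np.array([3.0, 2.0, 2.0])

# Solve the normal equation X^T X w = X^T y (valid since X has full column rank)
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)                          # [1.5 2. ]

# Cross-check with NumPy's least squares routine
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_star, w_lstsq))    # True
```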
Example: Linear Least Squares Regression (cont'd)

Case II. $\operatorname{rank} X = r < p$. WLOG assume the first $r$ columns are linearly independent, i.e. $X = (X_1, X_2)$ where $X_1 \in \mathbb{R}^{n \times r}$ and $\operatorname{rank} X_1 = r$.

Claim. There is a solution $w^*$ whose last $p - r$ components are 0.
• $X$ and $X_1$ have the same column space.
• If $w_1^*$ solves $\min_{w_1 \in \mathbb{R}^r} \|y - X_1 w_1\|$, then $w^* = \binom{w_1^*}{0}$ solves $\min_{w \in \mathbb{R}^p} \|y - Xw\|$.
• $w_1^* = (X_1^T X_1)^{-1} X_1^T y$

Question. Is the solution unique in this case?
A. No: $\operatorname{rank} X < p \Rightarrow \exists\, w_0 \ne 0$ s.t. $Xw_0 = 0 \Rightarrow w^* + w_0$ is also a solution.
Example: Linear Least Squares Regression (cont'd)

Example. Solve $\min_w \|y - Xw\|_2^2$ with
$$X = \begin{pmatrix} 2 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix}, \quad y = \begin{pmatrix} 3 \\ 2 \\ 2 \end{pmatrix}$$

Solution. Note $\operatorname{rank} X = 2 < 3$.
• Let $X_1 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$
• By the previous example, $w_1^* = (X_1^T X_1)^{-1} X_1^T y = (1.5, 2)^T$ is a solution to $\min_{w_1 \in \mathbb{R}^2} \|y - X_1 w_1\|_2$.
• $w^* = (1.5, 2, 0)^T$ is a solution to $\min_{w \in \mathbb{R}^3} \|y - Xw\|_2$.
Example: Linear Least Squares Regression (cont'd)

Example (cont'd). The normal equation of the original problem is $X^T X w = X^T y$ with
$$X^T X = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 1 & -1 \\ 4 & -1 & 5 \end{pmatrix}, \quad X^T y = (6, 2, 4)^T$$
• $X^T X$ is not invertible, so we cannot use the formula $w^* = (X^T X)^{-1} X^T y$.¹
• The solution $w^* = (1.5, 2, 0)^T$ satisfies the normal equation.
• The normal equation has infinitely many solutions, given by $w = (1.5, 2, 0)^T + \alpha(-1, 1, 1)^T$, $\alpha \in \mathbb{R}$. All of them are solutions to the least squares problem.

¹ The formula still applies if we replace the inverse by the so-called pseudoinverse of $X^T X$.
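The rank-deficient case can also be checked numerically. The sketch below uses NumPy's pseudoinverse, which picks the minimum-norm solution among the infinitely many, so it differs from $(1.5, 2, 0)^T$ by a multiple of $(-1, 1, 1)^T$ while giving the same residual.

```python
import numpy as np

X = np.array([[2.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.0, 0.0, 0.0]])
y = np.array([3.0, 2.0, 2.0])

# Minimum-norm least squares solution via the pseudoinverse
w_pinv = np.linalg.pinv(X) @ y
print(w_pinv)                          # about [ 1.667  1.833 -0.167]

# Every point w = (1.5, 2, 0) + alpha * (-1, 1, 1) solves the normal
# equation; all give the same residual norm
w_star = np.array([1.5, 2.0, 0.0])
for w in (w_star, w_pinv, w_star + 2.0 * np.array([-1.0, 1.0, 1.0])):
    print(np.linalg.norm(y - X @ w))   # identical residual norms (2.0)
```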
General Unconstrained QP

Minimize a quadratic function with $Q \in \mathbb{R}^{n \times n}$, $Q \succeq O$:
$$\min_x f(x) = \tfrac{1}{2} x^T Q x + b^T x + c$$
By the first-order condition, a solution satisfies $\nabla f(x) = Qx + b = 0$.

Case I. $Q \succ O$. There is a unique solution $x^* = -Q^{-1} b$.

Example. $n = 2$, $Q = \operatorname{diag}\{1, 1\}$, $b = (1, 0)^T$, $c = 0$:
$$f(x) = \tfrac{1}{2}(x_1, x_2)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + (1, 0)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \tfrac{1}{2}x_1^2 + \tfrac{1}{2}x_2^2 + x_1$$
The first-order condition becomes
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
which yields the unique optimal solution $x^* = (-1, 0)$.
General Unconstrained QP (cont'd)

Case II. $\det Q = 0$ and $b \notin$ column space of $Q$. There is no solution, and $f^* = -\infty$.

Example. $n = 2$, $Q = \operatorname{diag}\{0, 1\}$, $b = (1, 0)^T$, $c = 0$:
$$f(x) = \tfrac{1}{2}(x_1, x_2)\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + (1, 0)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \tfrac{1}{2}x_2^2 + x_1$$
The first-order condition becomes
$$\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
which has no solution. Indeed, $f(x) = \tfrac{1}{2}x_2^2 + x_1$ is unbounded below (let $x_1 \to -\infty$), so $f^* = -\infty$.
General Unconstrained QP (cont'd)

Case III. $\det Q = 0$ and $b \in$ column space of $Q$. There are infinitely many solutions.

Example. $n = 2$, $Q = \operatorname{diag}\{1, 0\}$, $b = (1, 0)^T$, $c = 0$:
$$f(x) = \tfrac{1}{2}(x_1, x_2)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + (1, 0)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \tfrac{1}{2}x_1^2 + x_1$$
The first-order condition becomes
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
which has infinitely many solutions of the form $x = (-1, x_2)$ for any $x_2 \in \mathbb{R}$, as $f$ is independent of $x_2$.
General Unconstrained QP (cont'd)

For the general case ($Q$ non-diagonal):
• Diagonalize $Q$ by an orthogonal matrix $U$, so $Q = U \Lambda U^T$ where $\Lambda$ is diagonal.
• Let $x = Uy$ and $\tilde{b} = U^T b$. Then
$$f(x) = \tfrac{1}{2} y^T U^T Q U y + b^T U y + c = \tfrac{1}{2} y^T \Lambda y + \tilde{b}^T y + c =: g(y)$$
In expanded form,
$$g(y) = \sum_{i=1}^n \left( \tfrac{1}{2}\lambda_i y_i^2 + \tilde{b}_i y_i \right) + c$$
• Minimizing $f(x)$ is equivalent to minimizing $g(y)$, and each term $\tfrac{1}{2}\lambda_i y_i^2 + \tilde{b}_i y_i$ can be minimized independently.

Exercise. Convince yourself that the previous three cases carry over to the non-diagonal case. (A numerical sketch of this case analysis follows below.)
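A sketch (my own construction, not from the lecture) implementing the three-case analysis via NumPy's eigendecomposition; `tol` is an arbitrary tolerance for deciding which eigenvalues count as zero.

```python
import numpy as np

def min_unconstrained_qp(Q, b, tol=1e-10):
    """Analyze min (1/2) x^T Q x + b^T x for symmetric Q >= O.

    Returns ('unique' | 'unbounded' | 'non-unique', a minimizer or None).
    """
    lam, U = np.linalg.eigh(Q)          # Q = U diag(lam) U^T
    b_tilde = U.T @ b
    zero = lam <= tol                   # (near-)zero eigenvalues
    if np.any(zero & (np.abs(b_tilde) > tol)):
        return 'unbounded', None        # Case II: b not in col(Q)
    y = np.zeros_like(b_tilde)
    y[~zero] = -b_tilde[~zero] / lam[~zero]  # minimize each (1/2) lam_i y_i^2 + b_i y_i
    x = U @ y                           # free coordinates set to 0: one minimizer
    if np.any(zero):
        return 'non-unique', x          # Case III: add anything in null(Q)
    return 'unique', x                  # Case I: x = -Q^{-1} b

# The three diagonal examples above:
print(min_unconstrained_qp(np.diag([1.0, 1.0]), np.array([1.0, 0.0])))  # unique, [-1, 0]
print(min_unconstrained_qp(np.diag([0.0, 1.0]), np.array([1.0, 0.0])))  # unbounded
print(min_unconstrained_qp(np.diag([1.0, 0.0]), np.array([1.0, 0.0])))  # non-unique, [-1, 0]
```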
Example: Lasso

Lasso (Least Absolute Shrinkage and Selection Operator). Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $t > 0$,
$$\min_w \|y - Xw\|_2^2 \quad \text{s.t.} \quad \|w\|_1 \le t$$

[Figure: $y$ projected onto the column space of $X$ at $\hat{y} = Xw^*$.]

• convex problem? yes
• QP? no, but it can be converted to a QP
• optimal solution exists? yes
  ◮ compact feasible set
• optimal solution unique?
  ◮ yes if $n \ge p$ and $X$ has full column rank ($X^T X \succ O$, strictly convex objective)
  ◮ no in general, e.g. if $p > n$ and $t$ is large enough for an unconstrained optimum to be feasible
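A sketch of the constrained lasso in cvxpy (random data for illustration only). The $\ell_1$ ball can also be expressed with linear constraints via $w = w^+ - w^-$, which is what makes the conversion to a QP work.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, t = 20, 5, 1.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

w = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)),
                  [cp.norm1(w) <= t])  # the l1-ball constraint ||w||_1 <= t
prob.solve()
print(w.value, prob.value)
```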
Example: Ridge Regression

Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $t > 0$,
$$\min_w \|y - Xw\|_2^2 \quad \text{s.t.} \quad \|w\|_2^2 \le t$$

[Figure: $y$ projected onto the column space of $X$ at $\hat{y} = Xw^*$.]

• convex problem? yes
• QCQP? yes
• optimal solution exists? yes
  ◮ compact feasible set
• optimal solution unique?
  ◮ yes if $n \ge p$ and $X$ has full column rank ($X^T X \succ O$, strictly convex objective)
  ◮ no in general
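The same pattern works for the ridge constraint (again a sketch with made-up data; only the constraint changes relative to the lasso snippet):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, t = 20, 5, 1.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

w = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)),
                  [cp.sum_squares(w) <= t])  # quadratic constraint ||w||_2^2 <= t
prob.solve()
print(w.value, prob.value)
```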
Example: SVM

Linearly separable case:
$$\min_{w, b} \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1,\ i = 1, \dots, m$$

Soft margin SVM:
$$\min_{w, b, \xi} \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^m \xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i,\ i = 1, \dots, m, \quad \xi \ge 0$$

Equivalent unconstrained form:
$$\min_{w, b} \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^m \left(1 - y_i b - y_i w^T x_i\right)_+$$
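A sketch of the unconstrained hinge form in cvxpy (the toy data and the value of $C$ are made up for illustration; `cp.pos` takes the positive part $(\cdot)_+$):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, d, C = 40, 2, 1.0
# Two linearly separated point clouds with labels +1 / -1
X = np.vstack([rng.standard_normal((m // 2, d)) + 2,
               rng.standard_normal((m // 2, d)) - 2])
y = np.hstack([np.ones(m // 2), -np.ones(m // 2)])

w = cp.Variable(d)
b = cp.Variable()
hinge = cp.pos(1 - cp.multiply(y, X @ w + b))   # (1 - y_i (w^T x_i + b))_+
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(hinge)))
prob.solve()
print(w.value, b.value)
```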
Geometric Program

A monomial is a function $f : \mathbb{R}^n_{++} \to \mathbb{R}$ of the form
$$f(x) = \gamma x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$$
where $\gamma > 0$, $a_1, \dots, a_n \in \mathbb{R}$, and $\mathbb{R}^n_{++} = \{x \in \mathbb{R}^n : x > 0\}$.

A posynomial is a sum of monomials,
$$f(x) = \sum_{k=1}^p \gamma_k x_1^{a_{k1}} x_2^{a_{k2}} \cdots x_n^{a_{kn}}$$

A geometric program (GP) is an optimization problem of the form
$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) \le 1,\ i = 1, \dots, m, \quad h_j(x) = 1,\ j = 1, \dots, r$$
where $f$ and $g_1, \dots, g_m$ are posynomials and $h_1, \dots, h_r$ are monomials. The constraint $x > 0$ is implicit.
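As an illustration (a sketch assuming cvxpy with a conic solver installed; the tiny problem instance is made up), cvxpy can handle GPs directly via `prob.solve(gp=True)`:

```python
import cvxpy as cp

# Toy GP: minimize the posynomial x + y subject to the monomial
# equality x*y = 1; by the AM-GM inequality the optimum is x = y = 1.
x = cp.Variable(pos=True)   # GP variables must be declared positive
y = cp.Variable(pos=True)
prob = cp.Problem(cp.Minimize(x + y), [x * y == 1])
prob.solve(gp=True)
print(prob.value, x.value, y.value)   # about 2.0, 1.0, 1.0
```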