CS/ECE/ISyE 524: Introduction to Optimization, Spring 2017–18

16. Review of convex optimization

• Convex sets and functions
• Convex programming models
• Network flow problems
• Least squares problems
• Regularization and tradeoffs
• Duality

Laurent Lessard (www.laurentlessard.com)
Convex sets

A set C ⊆ R^n is convex if for all x, y ∈ C and all 0 ≤ α ≤ 1, we have α x + (1 − α) y ∈ C.

• every line segment joining two points of the set must be contained in the set
• the set can include its boundary or not
• the set can be bounded or unbounded

[Figure: a convex set C, where every segment between x and y stays inside, next to a nonconvex set, where some segment leaves the set]
Examples 1. Polyhedron

• A linear inequality a_i^T x ≤ b_i defines a halfspace.
• An intersection of halfspaces forms a polyhedron: Ax ≤ b.

[Figure: a halfspace in 3D and a polyhedron in 3D]
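As a quick worked check of the definition from the previous slide, a halfspace is convex: for x, y with a_i^T x ≤ b_i and a_i^T y ≤ b_i, and 0 ≤ α ≤ 1,

$$a_i^T\bigl(\alpha x + (1-\alpha)y\bigr) = \alpha\, a_i^T x + (1-\alpha)\, a_i^T y \;\le\; \alpha b_i + (1-\alpha) b_i = b_i.$$

Since any intersection of convex sets is convex, every polyhedron is convex as well.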
Examples 2. Ellipsoid

• A quadratic form looks like x^T Q x.
• If Q ≻ 0 (positive definite; all eigenvalues positive), then the set of x satisfying x^T Q x ≤ b is an ellipsoid.

[Figure: an ellipsoid]
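A quick numerical sketch of checking Q ≻ 0 (the matrix Q below is invented for illustration):

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])

eigvals = np.linalg.eigvalsh(Q)   # eigvalsh is the right routine for symmetric matrices
print(eigvals)                    # both eigenvalues are positive...
print(bool(np.all(eigvals > 0)))  # ...so Q is positive definite and {x : x^T Q x <= b} is an ellipsoid
```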
Examples 3. Second-order cone constraint

• The set of points satisfying ‖Ax + b‖ ≤ c^T x + d is called a second-order cone constraint.
• Example: robust linear programming, with constraints a_i^T x + ρ‖x‖ ≤ b_i.

[Figure: robust LP constraints a_i^T x + ρ‖x‖ ≤ b_i; the second-order cone ‖x‖ ≤ y]
Convex functions

A function f : D → R is a convex function if:
1. the domain D ⊆ R^n is a convex set
2. for all x, y ∈ D and 0 ≤ α ≤ 1, the function f satisfies
   f(α x + (1 − α) y) ≤ α f(x) + (1 − α) f(y)

• any line segment joining two points on the graph of f lies above the graph
• f is continuous (on the interior of its domain), but not necessarily smooth
• f is concave if −f is convex

[Figure: a convex function next to a nonconvex function, each with a chord drawn between two points on its graph]
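A worked instance of the inequality: for f(x) = x² on R, expanding the square gives

$$\alpha x^2 + (1-\alpha) y^2 - \bigl(\alpha x + (1-\alpha) y\bigr)^2 = \alpha(1-\alpha)(x-y)^2 \;\ge\; 0,$$

so f(x) = x² satisfies the definition for every x, y and every 0 ≤ α ≤ 1.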
Convex programs

minimize_{x ∈ D}  f_0(x)
subject to:       f_i(x) ≤ 0  for i = 1, …, m
                  h_j(x) = 0  for j = 1, …, r

• the domain is the set D
• the cost function is f_0
• the inequality constraints are the f_i for i = 1, …, m
• the equality constraints are the h_j for j = 1, …, r
• feasible set: the x ∈ D satisfying all constraints

A model is convex if D is a convex set, all the f_i are convex functions, and the h_j are affine functions (linear + constant).
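A minimal sketch of this standard form in Python using cvxpy; the data A, b and the particular choices of f_0, f_i, h_j are invented for illustration:

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
A = np.random.randn(3, 2)
b = np.ones(3)

x = cp.Variable(2)
cost = cp.sum_squares(x)        # f_0: a convex cost
ineq = [A @ x - b <= 0]         # f_i(x) <= 0, here affine (hence convex)
eq   = [cp.sum(x) == 1]         # h_j(x) = 0, affine as required

prob = cp.Problem(cp.Minimize(cost), ineq + eq)
prob.solve()
print(prob.status, x.value)
```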
Examples 1. Linear program (LP)

• cost is affine
• all constraints are affine
• can be a maximization or a minimization

Important properties
• feasible set is a polyhedron
• the problem can have an optimal solution, be infeasible, or be unbounded
• if an optimal solution exists, one occurs at a vertex

[Figure: an LP feasible polygon with the optimal vertex highlighted]
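A toy LP in cvxpy (data invented); the solver returns the optimal vertex:

```python
import cvxpy as cp

x = cp.Variable(2)
constraints = [x >= 0, x[0] + 2*x[1] <= 2, 2*x[0] + x[1] <= 2]
prob = cp.Problem(cp.Maximize(x[0] + x[1]), constraints)
prob.solve()
print(x.value)   # optimum lands at the vertex (2/3, 2/3), with value 4/3
```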
Examples 2. Convex quadratic program (QP)

• cost is a convex quadratic
• all constraints are affine
• must be a minimization

Important properties
• feasible set is a polyhedron
• optimal point occurs on the boundary or in the interior

[Figure: a QP feasible polygon with elliptical cost contours]
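A toy convex QP in cvxpy (data invented); here the box constraints are active, so the optimum sits on the boundary:

```python
import cvxpy as cp
import numpy as np

P = np.array([[2.0, 0.0],
              [0.0, 1.0]])           # P is positive definite, so the cost is convex
q = np.array([-2.0, -2.0])

x = cp.Variable(2)
cost = 0.5 * cp.quad_form(x, P) + q @ x
prob = cp.Problem(cp.Minimize(cost), [x >= 0, x <= 0.8])
prob.solve()
print(x.value)   # the unconstrained minimizer (1, 2) gets clipped to (0.8, 0.8)
```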
Examples 3. Convex quadratically constrained QP (QCQP)

• cost is a convex quadratic
• inequality constraints are convex quadratics
• equality constraints are affine

Important properties
• feasible set is an intersection of ellipsoids
• optimal point occurs on the boundary or in the interior

[Figure: a feasible region formed by intersecting ellipses]
Examples 4. Second-order cone program (SOCP)

• cost is affine
• inequality constraints are second-order cone constraints
• equality constraints are affine

Important properties
• feasible set is convex
• optimal point occurs on the boundary or in the interior
Hierarchy of complexity

From simplest to most complicated:
1. linear program
2. convex quadratic program
3. convex quadratically constrained quadratic program
4. second-order cone program
5. semidefinite program
6. general convex program

Important notes
• "more complicated" means strictly more general: e.g. every LP is an SOCP (by setting appropriate parameters to zero; see the worked special case below), but a general SOCP cannot be expressed as an LP.
• in general: strive for the simplest model possible
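A worked instance of the first note: taking A = 0 and b = 0 in the second-order cone constraint from earlier,

$$\|0\cdot x + 0\| \le c^T x + d \quad\Longleftrightarrow\quad c^T x + d \ge 0,$$

so every linear inequality, and hence every LP, is a special case of an SOCP. The converse fails because a genuine cone constraint such as ‖x‖ ≤ y is not polyhedral.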
Network flow problems

[Figure: a directed graph on nodes 1–8]

• Each edge (i, j) ∈ E has a flow x_ij ≥ 0.
• Each edge has a transportation cost c_ij.
• Each node i ∈ N is: a source if b_i > 0, a sink if b_i < 0, or a relay if b_i = 0. The net flow leaving node i must equal b_i.
• Find the flow that minimizes total transportation cost while satisfying the demand at each node.
Network flow problems

• Capacity constraints: p_ij ≤ x_ij ≤ q_ij for all (i, j) ∈ E.
• Balance constraints: ∑_{j : (i,j) ∈ E} x_ij − ∑_{j : (j,i) ∈ E} x_ji = b_i for all i ∈ N.
• Minimize total cost: ∑_{(i,j) ∈ E} c_ij x_ij.

We assume ∑_{i ∈ N} b_i = 0 (a balanced problem). Otherwise, add a dummy node connected by zero-cost edges to absorb the imbalance.
Network flow problems

Expanded form: the balance constraints stack into Ax = b, where A is the incidence matrix of the graph (one row per node, one column per edge: +1 where the edge leaves the node, −1 where it enters) and x = (x₁₃, x₂₃, x₂₄, x₃₅, x₃₆, x₄₅, x₅₆, x₅₇, x₆₇, x₆₈, x₇₈):

$$
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & -1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & -1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & -1
\end{bmatrix} x =
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \\ b_8 \end{bmatrix}
$$

A runnable sketch built on this incidence matrix follows.
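A minimal min-cost flow sketch in Python with cvxpy; the edge list matches the incidence matrix above, while the costs, supplies, and capacities are invented for illustration:

```python
import cvxpy as cp
import numpy as np

edges = [(1,3), (2,3), (2,4), (3,5), (3,6), (4,5),
         (5,6), (5,7), (6,7), (6,8), (7,8)]
n, m = 8, len(edges)

A = np.zeros((n, m))                 # incidence matrix of the graph
for k, (i, j) in enumerate(edges):
    A[i-1, k] = 1                    # edge k leaves node i
    A[j-1, k] = -1                   # edge k enters node j

b = np.array([1, 1, 0, 0, 0, 0, -1, -1])   # sources at nodes 1, 2; sinks at 7, 8
c = np.ones(m)                              # unit cost on every edge (invented)

x = cp.Variable(m)
prob = cp.Problem(cp.Minimize(c @ x),
                  [A @ x == b, x >= 0, x <= 1])   # balance and capacity constraints
prob.solve()
print(np.round(x.value, 3))
```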
Integer solutions

minimize_x  c^T x
subject to: Ax = b
            p ≤ x ≤ q

• If A is a totally unimodular matrix, then whenever the demands b_i and the capacities p_ij, q_ij are integers, there is an optimal solution whose flows x_ij are all integers (every vertex of the feasible set is integral).
• All incidence matrices are totally unimodular.
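A sketch of why total unimodularity yields integral vertices (the standard argument, stated informally): a vertex of {x : Ax = b, p ≤ x ≤ q} solves a square nonsingular subsystem A_B x_B = b̃ with an integer right-hand side, so by Cramer's rule

$$x_B = A_B^{-1}\,\tilde b = \frac{\operatorname{adj}(A_B)\,\tilde b}{\det A_B},$$

and since det A_B = ±1 for a totally unimodular A while adj(A_B) and b̃ are integer, x_B is integer.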
Examples

• Transportation problem: every node is a source or a sink.
• Assignment problem: a transportation problem where each source has supply 1 and each sink has demand 1.
• Transshipment problem: like a transportation problem, but it also has relay nodes (warehouses).
• Shortest path problem: single source, single sink, and the edge costs are the path lengths.
• Max-flow problem: single source, single sink. Add a feedback edge from the sink back to the source with cost −1 (all other costs zero) and minimize the total cost; minimizing then maximizes the flow on the feedback edge.
Least squares

• We want to solve Ax = b where A ∈ R^{m×n}.
• Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
• The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.

Standard form:
    minimize_x ‖Ax − b‖²

It is an unconstrained convex QP.
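A minimal numpy sketch (random data for illustration): np.linalg.lstsq solves the standard form directly, and the normal equations A^T A x = A^T b give the same answer when A has full column rank:

```python
import numpy as np

np.random.seed(1)
A = np.random.randn(20, 3)          # m = 20 > n = 3: overdetermined
b = np.random.randn(20)

x_hat, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Equivalent closed form via the normal equations:
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x_hat, x_ne))     # True
```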
Example: curve-fitting

• We are given noisy data points (x_i, y_i).
• We suspect they are related by y = px² + qx + r.
• Find the p, q, r that best agree with the data.

Writing all the equations y_i ≈ p x_i² + q x_i + r and stacking them:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \approx
\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}
\begin{bmatrix} p \\ q \\ r \end{bmatrix}
$$

• Also called regression.
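A minimal numpy sketch of this fit; the noisy data below are generated for illustration:

```python
import numpy as np

np.random.seed(2)
xs = np.linspace(-2, 2, 30)
ys = 1.5 * xs**2 - 0.7 * xs + 0.3 + 0.1 * np.random.randn(30)   # noisy quadratic

M = np.column_stack([xs**2, xs, np.ones_like(xs)])   # the stacked matrix from the slide
p, q, r = np.linalg.lstsq(M, ys, rcond=None)[0]
print(p, q, r)   # close to the true coefficients (1.5, -0.7, 0.3)
```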
Regularization

Regularization: an additional penalty term added to the cost function to encourage a solution with desirable properties.

Regularized least squares:
    minimize_x ‖Ax − b‖² + λ R(x)

• R(x) is the regularizer (penalty function)
• λ is the regularization parameter
• The model has different names depending on R(x).
Examples

minimize_x ‖Ax − b‖² + λ R(x)

1. If R(x) = ‖x‖² = x₁² + x₂² + ⋯ + xₙ², it is called L2 regularization, Tikhonov regularization, or ridge regression, depending on the application. It has the effect of smoothing the solution.

2. If R(x) = ‖x‖₁ = |x₁| + |x₂| + ⋯ + |xₙ|, it is called L1 regularization or LASSO. It has the effect of sparsifying the solution (x̂ will have few nonzero entries).

3. If R(x) = ‖x‖_∞ = max{|x₁|, |x₂|, …, |xₙ|}, it is called L∞ regularization, and it has the effect of equalizing the solution (it makes most components equal in magnitude).
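A sketch contrasting the first two regularizers in cvxpy (random data; λ = 1 chosen arbitrarily). The LASSO solution comes back sparse while the ridge solution does not:

```python
import cvxpy as cp
import numpy as np

np.random.seed(3)
A = np.random.randn(30, 10)
b = np.random.randn(30)
lam = 1.0

x = cp.Variable(10)

ridge = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.sum_squares(x)))
ridge.solve()
x_ridge = x.value

lasso = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x)))
lasso.solve()
x_lasso = x.value

print(np.sum(np.abs(x_ridge) > 1e-4), "nonzeros (ridge)")
print(np.sum(np.abs(x_lasso) > 1e-4), "nonzeros (lasso)")
```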
Tradeoffs

• Suppose J₁ = ‖Ax − b‖² and J₂ = ‖Cx − d‖².
• We would like to make both J₁ and J₂ small.
• A sensible approach: solve the optimization problem
      minimize_x J₁ + λ J₂
  where λ > 0 is a (fixed) tradeoff parameter.
• Then tune λ to explore the possible results.
  ◦ As λ → 0, we place more weight on J₁.
  ◦ As λ → ∞, we place more weight on J₂.
Pareto curve

[Figure: the tradeoff curve in the (J₁, J₂) plane: points above the curve are feasible but strictly suboptimal, points below it are infeasible, and the curve itself consists of the Pareto-optimal points, traced from λ → 0 at one end to λ → ∞ at the other]

• At a Pareto-optimal point, J₁ can only be improved at the expense of J₂, and vice versa. Sweeping λ traces out the curve, as in the sketch below.
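A sketch of tracing the Pareto curve by sweeping λ (problem data invented for illustration):

```python
import cvxpy as cp
import numpy as np

np.random.seed(4)
A, b = np.random.randn(15, 5), np.random.randn(15)
C, d = np.random.randn(15, 5), np.random.randn(15)

x = cp.Variable(5)
J1 = cp.sum_squares(A @ x - b)
J2 = cp.sum_squares(C @ x - d)

for lam in np.logspace(-3, 3, 7):
    cp.Problem(cp.Minimize(J1 + lam * J2)).solve()
    print(f"lambda={lam:8.3f}  J1={J1.value:7.3f}  J2={J2.value:7.3f}")
# As lambda grows, J2 decreases and J1 increases: each (J1, J2) pair is a
# Pareto-optimal point on the curve.
```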
Example: Min-norm least squares

Underdetermined case: A ∈ R^{m×n} is a wide matrix (m ≤ n), so Ax = b has infinitely many solutions.

• Look to make both ‖Ax − b‖² and ‖x‖² small:
      minimize_x ‖Ax − b‖² + λ‖x‖²
• In the limit λ → ∞, we get x = 0.
• In the limit λ → 0, we get the min-norm solution:
      minimize_x ‖x‖²
      subject to: Ax = b
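A numpy sketch (random wide A for illustration): the pseudoinverse returns the min-norm solution, and adding any nullspace direction gives another solution with strictly larger norm:

```python
import numpy as np

np.random.seed(5)
A = np.random.randn(3, 6)            # m = 3 < n = 6: underdetermined
b = np.random.randn(3)

x_mn = np.linalg.pinv(A) @ b         # min-norm solution, A^T (A A^T)^{-1} b
print(np.allclose(A @ x_mn, b))      # True: it solves Ax = b

# Any nullspace direction of A can be added without changing Ax:
_, _, Vt = np.linalg.svd(A)
n_vec = Vt[-1]                       # a unit nullspace vector (rank A = 3 < 6)
x_other = x_mn + n_vec
print(np.allclose(A @ x_other, b))   # True as well
print(np.linalg.norm(x_mn) < np.linalg.norm(x_other))  # True: min-norm wins
```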
Duality

Intuition: duality is all about finding bounds on the optimal value.

• If the primal problem is a minimization, every feasible point of the primal gives an upper bound on the optimal value.
• The dual problem is a maximization: every feasible point of the dual gives a lower bound on the optimal value.
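A small LP sketch in cvxpy (data invented) illustrating these bounds; cvxpy exposes the optimal dual multipliers via constraint.dual_value:

```python
import cvxpy as cp

x = cp.Variable(2, nonneg=True)
constraints = [x[0] + 2*x[1] >= 2, 3*x[0] + x[1] >= 3]
prob = cp.Problem(cp.Minimize(2*x[0] + 3*x[1]), constraints)
prob.solve()

print("primal optimal value:", prob.value)               # 3.4 at x = (0.8, 0.6)
y1, y2 = (c.dual_value for c in constraints)
print("dual objective 2*y1 + 3*y2:", 2*y1 + 3*y2)        # should also be 3.4:
# any dual-feasible multipliers give a lower bound (weak duality), and for a
# feasible bounded LP the best lower bound matches the primal optimum
# (strong duality).
```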