16. Review of convex optimization


  1. CS/ECE/ISyE 524 Introduction to Optimization, Spring 2017–18
16. Review of convex optimization
• Convex sets and functions
• Convex programming models
• Network flow problems
• Least squares problems
• Regularization and tradeoffs
• Duality
Laurent Lessard (www.laurentlessard.com)

  2. Convex sets
A set C ⊆ ℝⁿ is convex if for all x, y ∈ C and all 0 ≤ α ≤ 1, we have αx + (1 − α)y ∈ C.
• every line segment joining points of the set must be contained in the set
• the set can include its boundary or not
• the set can be bounded or unbounded
[Figure: a convex set vs. a nonconvex set, each showing a segment between points x and y.]

  3. Examples 1. Polyhedron
• A linear inequality aᵢᵀx ≤ bᵢ is a halfspace.
• Intersections of halfspaces form a polyhedron: Ax ≤ b.
[Figures: a halfspace in 3D; a polyhedron in 3D.]

  4. Examples 2. Ellipsoid
• A quadratic form looks like xᵀQx.
• If Q ≻ 0 (positive definite; all eigenvalues positive), then the set of x satisfying xᵀQx ≤ b is an ellipsoid.
[Figure: an ellipsoid in 3D.]
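This definition is easy to probe numerically. A minimal sketch in NumPy (the matrix Q and the test point are invented for illustration):

    import numpy as np

    # Verify Q is positive definite by checking its eigenvalues, then test
    # whether a point lies inside the ellipsoid {x : x'Qx <= b}.
    Q = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    b = 1.0

    eigenvalues = np.linalg.eigvalsh(Q)   # Q is symmetric, so eigvalsh applies
    assert np.all(eigenvalues > 0), "Q is not positive definite"

    x = np.array([0.3, -0.4])
    print("eigenvalues:", eigenvalues, " x in ellipsoid:", x @ Q @ x <= b)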

  5. Examples 3. Second-order cone constraint
• The set of points satisfying ‖Ax + b‖ ≤ cᵀx + d is called a second-order cone constraint.
• Example: robust linear programming, with constraints aᵢᵀx + ρ‖x‖ ≤ bᵢ.
[Figures: the feasible set of a robust LP; the second-order cone ‖x‖ ≤ y.]

  6. Convex functions
A function f : D → ℝ is a convex function if:
1. the domain D ⊆ ℝⁿ is a convex set, and
2. for all x, y ∈ D and 0 ≤ α ≤ 1, the function f satisfies f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).
• any line segment joining points of f lies above f
• f is continuous, but not necessarily smooth
• f is concave if −f is convex
[Figures: a convex function vs. a nonconvex function, each with a chord drawn between two points on the graph.]
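The defining inequality can be checked by sampling. A minimal sketch, using f(x) = x² as the test function (our choice, not a function from the slides):

    import numpy as np

    # Sample the convexity inequality
    #   f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)
    # at random points; it should hold at every sample when f is convex.
    f = lambda t: t**2

    rng = np.random.default_rng(0)
    for _ in range(1000):
        x, y = rng.uniform(-5, 5, size=2)
        a = rng.uniform(0, 1)
        assert f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) + 1e-12
    print("convexity inequality held at all sampled points")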

  7. Convex programs

    minimize_{x ∈ D}  f₀(x)
    subject to: fᵢ(x) ≤ 0 for i = 1, …, m
                hⱼ(x) = 0 for j = 1, …, r

• the domain is the set D
• the cost function is f₀
• the inequality constraints are the fᵢ for i = 1, …, m
• the equality constraints are the hⱼ for j = 1, …, r
• feasible set: the x ∈ D satisfying all constraints

A model is convex if D is a convex set, all the fᵢ are convex functions, and the hⱼ are affine functions (linear + constant).

  8. Examples 1. Linear program (LP)
• cost is affine
• all constraints are affine
• can be a maximization or a minimization
Important properties:
• feasible set is a polyhedron
• can be optimal, infeasible, or unbounded
• an optimal point occurs at a vertex
[Figure: a polyhedral feasible set with the optimum at a vertex.]
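A sketch of an LP solve in SciPy (the problem data are invented; the slides do not prescribe a solver):

    import numpy as np
    from scipy.optimize import linprog

    # Solve: minimize c'x subject to Ax <= b, x >= 0.
    c = np.array([-1.0, -2.0])        # linprog minimizes; this maximizes x1 + 2*x2
    A = np.array([[1.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([4.0, 6.0])

    res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
    print(res.x, res.fun)             # the optimum lands at a vertex of the polyhedron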

  9. Examples 2. Convex quadratic program (QP)
• cost is a convex quadratic
• all constraints are affine
• must be a minimization
Important properties:
• feasible set is a polyhedron
• the optimal point occurs on the boundary or in the interior
[Figure: level sets of a quadratic cost over a polyhedral feasible set.]
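A sketch of a small QP using CVXPY (the modeling layer is our choice, not the slides'; all data invented):

    import cvxpy as cp
    import numpy as np

    # Minimize a convex quadratic cost over affine constraints.
    Q = np.array([[2.0, 0.0],
                  [0.0, 1.0]])        # positive semidefinite, so the QP is convex
    q = np.array([-1.0, -1.0])

    x = cp.Variable(2)
    objective = cp.Minimize(cp.quad_form(x, Q) + q @ x)
    constraints = [x >= 0, cp.sum(x) <= 1]
    cp.Problem(objective, constraints).solve()
    print(x.value)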

  10. Examples 3. Convex quadratically constrained QP (QCQP)
• cost is a convex quadratic
• inequality constraints are convex quadratics
• equality constraints are affine
Important properties:
• feasible set is an intersection of ellipsoids
• the optimal point occurs on the boundary or in the interior
[Figure: a feasible set formed by intersecting ellipsoids.]
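A sketch of a tiny QCQP in the same CVXPY style (invented data): quadratic cost with one ellipsoidal constraint.

    import cvxpy as cp
    import numpy as np

    # Find the point of the unit ball closest to (2, 0).
    x = cp.Variable(2)
    objective = cp.Minimize(cp.sum_squares(x - np.array([2.0, 0.0])))
    constraints = [cp.sum_squares(x) <= 1.0]   # convex quadratic inequality
    cp.Problem(objective, constraints).solve()
    print(x.value)                             # lands on the boundary, near (1, 0)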

  11. Examples 4. Second-order cone program (SOCP)
• cost is affine
• inequality constraints are second-order cone constraints
• equality constraints are affine
Important properties:
• feasible set is convex
• the optimal point occurs on the boundary or in the interior
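A sketch of an SOCP in CVXPY (invented data): affine cost with one constraint of the form ‖Ax + b‖ ≤ cᵀx + d.

    import cvxpy as cp
    import numpy as np

    A = np.eye(2)
    b = np.zeros(2)
    c = np.array([0.0, 1.0])
    d = 1.0

    x = cp.Variable(2)
    constraints = [cp.norm(A @ x + b, 2) <= c @ x + d]   # second-order cone
    cp.Problem(cp.Minimize(np.ones(2) @ x), constraints).solve()
    print(x.value)                                       # roughly (-1, 0)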

  12. Hierarchy of complexity
From simplest to most complicated:
1. linear program
2. convex quadratic program
3. convex quadratically constrained quadratic program
4. second-order cone program
5. semidefinite program
6. general convex program
Important notes:
• "more complicated" means strictly more expressive: e.g. every LP is an SOCP (by setting appropriate variables to zero), but a general SOCP cannot be expressed as an LP.
• in general: strive for the simplest model possible

  13. Network flow problems
[Figure: a directed graph on nodes 1–8.]
• Each edge (i, j) ∈ E has a flow xᵢⱼ ≥ 0.
• Each edge has a transportation cost cᵢⱼ.
• Each node i ∈ N is: a source if bᵢ > 0, a sink if bᵢ < 0, or a relay if bᵢ = 0. The net flow out of node i must equal bᵢ.
• Find the flow that minimizes total transportation cost while satisfying demand at each node.

  14. Network flow problems
[Figure: the same directed graph on nodes 1–8.]
• Capacity constraints: pᵢⱼ ≤ xᵢⱼ ≤ qᵢⱼ for all (i, j) ∈ E.
• Balance constraints: Σⱼ xᵢⱼ − Σⱼ xⱼᵢ = bᵢ for all i ∈ N (flow out minus flow in).
• Minimize total cost: Σ_{(i,j)∈E} cᵢⱼ xᵢⱼ.
We assume Σ_{i∈N} bᵢ = 0 (balanced graph). Otherwise, add a dummy node with zero-cost edges to balance the graph.

  15. Network flow problems
[Figure: the same directed graph on nodes 1–8.]
Expanded form: Ax = b, where A is the incidence matrix of the graph (one column per edge: +1 at the tail node, −1 at the head node), x = (x₁₃, x₂₃, x₂₄, x₃₅, x₃₆, x₄₅, x₅₆, x₅₇, x₆₇, x₆₈, x₇₈), and b = (b₁, …, b₈):

        x13 x23 x24 x35 x36 x45 x56 x57 x67 x68 x78
    1 [  1   0   0   0   0   0   0   0   0   0   0 ]       [ b1 ]
    2 [  0   1   1   0   0   0   0   0   0   0   0 ]       [ b2 ]
    3 [ -1  -1   0   1   1   0   0   0   0   0   0 ]       [ b3 ]
    4 [  0   0  -1   0   0   1   0   0   0   0   0 ]  x =  [ b4 ]
    5 [  0   0   0  -1   0  -1   1   1   0   0   0 ]       [ b5 ]
    6 [  0   0   0   0  -1   0  -1   0   1   1   0 ]       [ b6 ]
    7 [  0   0   0   0   0   0   0  -1  -1   0   1 ]       [ b7 ]
    8 [  0   0   0   0   0   0   0   0   0  -1  -1 ]       [ b8 ]
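As a sketch, the incidence matrix can be built directly from the edge list and the flow problem handed to an LP solver. Only the edge list below comes from the slide; the costs, supplies, and capacities are invented:

    import numpy as np
    from scipy.optimize import linprog

    edges = [(1,3), (2,3), (2,4), (3,5), (3,6), (4,5),
             (5,6), (5,7), (6,7), (6,8), (7,8)]
    n_nodes = 8

    # Incidence matrix: +1 at the tail of each edge, -1 at the head.
    A = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        A[i-1, k] = 1.0
        A[j-1, k] = -1.0

    c = np.ones(len(edges))                          # edge costs (made up)
    b = np.array([1.0, 1.0, 0, 0, 0, 0, 0, -2.0])    # sources 1, 2; sink 8
    assert b.sum() == 0                              # balanced graph

    res = linprog(c, A_eq=A, b_eq=b, bounds=[(0.0, 10.0)] * len(edges))
    print(res.x)    # integral: A is totally unimodular (see next slide)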

  16. Integer solutions

    minimize_x  cᵀx
    subject to: Ax = b
                p ≤ x ≤ q

• If A is a totally unimodular matrix, then whenever the demands bᵢ and the capacities qᵢⱼ are integers, the optimal flows xᵢⱼ are integers.
• All incidence matrices are totally unimodular.

  17. Examples
• Transportation problem: each node is a source or a sink.
• Assignment problem: a transportation problem where each source has supply 1 and each sink has demand 1.
• Transshipment problem: like a transportation problem, but it also has relay nodes (warehouses).
• Shortest path problem: single source, single sink, and the edge costs are the path lengths.
• Max-flow problem: single source, single sink. Add a feedback edge with cost −1 and minimize the cost.

  18. Least squares
• We want to solve Ax = b where A ∈ ℝ^{m×n}.
• Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
• The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
Standard form:

    minimize_x  ‖Ax − b‖²

It's an unconstrained convex QP.
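A sketch with NumPy (random data for illustration); np.linalg.lstsq returns the minimizer of ‖Ax − b‖²:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 3))    # m = 20 equations, n = 3 unknowns
    b = rng.standard_normal(20)

    x_hat, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print(x_hat)

    # Equivalently, x_hat solves the normal equations A'A x = A'b:
    print(np.linalg.solve(A.T @ A, A.T @ b))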

  19. Example: curve-fitting
• We are given noisy data points (xᵢ, yᵢ).
• We suspect they are related by y = px² + qx + r.
• Find the p, q, r that best agree with the data.
Writing all the equations:

    y₁ ≈ px₁² + qx₁ + r
    y₂ ≈ px₂² + qx₂ + r
      ⋮
    yₘ ≈ pxₘ² + qxₘ + r

or, in matrix form:

    [ x₁²  x₁  1 ]             [ y₁ ]
    [ x₂²  x₂  1 ]   [ p ]     [ y₂ ]
    [  ⋮    ⋮   ⋮ ]   [ q ]  ≈  [  ⋮ ]
    [ xₘ²  xₘ  1 ]   [ r ]     [ yₘ ]

• Also called regression.
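This is a least-squares problem in (p, q, r). A sketch on synthetic noisy data:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(-2, 2, 30)
    y = 1.5*x**2 - 0.7*x + 2.0 + 0.1*rng.standard_normal(x.size)

    # Stack the columns [x^2, x, 1] and solve for (p, q, r).
    M = np.column_stack([x**2, x, np.ones_like(x)])
    (p, q, r), *_ = np.linalg.lstsq(M, y, rcond=None)
    print(p, q, r)    # should be close to 1.5, -0.7, 2.0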

  20. Regularization
Regularization: an additional penalty term added to the cost function to encourage a solution with desirable properties.
Regularized least squares:

    minimize_x  ‖Ax − b‖² + λR(x)

• R(x) is the regularizer (penalty function)
• λ is the regularization parameter
• The model has different names depending on R(x).

  21. Examples

    minimize_x  ‖Ax − b‖² + λR(x)

1. If R(x) = ‖x‖² = x₁² + x₂² + ⋯ + xₙ², it is called L2 regularization, Tikhonov regularization, or ridge regression depending on the application. It has the effect of smoothing the solution.
2. If R(x) = ‖x‖₁ = |x₁| + |x₂| + ⋯ + |xₙ|, it is called L1 regularization or LASSO. It has the effect of sparsifying the solution (x̂ will have few nonzero entries).
3. If R(x) = ‖x‖∞ = max{|x₁|, |x₂|, …, |xₙ|}, it is called L∞ regularization and it has the effect of equalizing the solution (makes most components equal).
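All three variants fit in a few lines of CVXPY (a sketch; random data and an arbitrary λ):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((30, 10))
    b = rng.standard_normal(30)
    lam = 1.0

    for name, R in [("ridge (L2)", lambda x: cp.sum_squares(x)),
                    ("LASSO (L1)", lambda x: cp.norm(x, 1)),
                    ("L-infinity", lambda x: cp.norm(x, "inf"))]:
        x = cp.Variable(10)
        cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * R(x))).solve()
        print(name, np.round(x.value, 3))   # LASSO's x is sparse; L-inf's is equalized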

  22. Tradeoffs
• Suppose J₁ = ‖Ax − b‖² and J₂ = ‖Cx − d‖².
• We would like to make both J₁ and J₂ small.
• A sensible approach: solve the optimization problem

    minimize_x  J₁ + λJ₂

  where λ > 0 is a (fixed) tradeoff parameter.
• Then tune λ to explore possible results:
  - when λ → 0, we place more weight on J₁
  - when λ → ∞, we place more weight on J₂
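Sweeping λ traces out the tradeoff numerically; each solve gives one point on the Pareto curve of the next slide. A sketch (all data invented):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(3)
    A, b = rng.standard_normal((15, 5)), rng.standard_normal(15)
    C, d = rng.standard_normal((15, 5)), rng.standard_normal(15)

    x = cp.Variable(5)
    lam = cp.Parameter(nonneg=True)
    J1, J2 = cp.sum_squares(A @ x - b), cp.sum_squares(C @ x - d)
    problem = cp.Problem(cp.Minimize(J1 + lam * J2))

    for val in [0.01, 0.1, 1.0, 10.0, 100.0]:
        lam.value = val
        problem.solve()
        print(f"lambda={val:7.2f}  J1={J1.value:.3f}  J2={J2.value:.3f}")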

  23. Pareto curve
[Figure: the Pareto curve of Pareto-optimal points in the (J₁, J₂) plane, with λ → 0 at one end and λ → ∞ at the other; points above the curve are feasible but strictly suboptimal, points below it are infeasible.]
• Pareto-optimal points can only improve in J₁ at the expense of J₂, or vice versa.

  24. Example: Min-norm least squares
Underdetermined case: A ∈ ℝ^{m×n} is a wide matrix (m ≤ n), so Ax = b has infinitely many solutions.
• Look to make both ‖Ax − b‖² and ‖x‖² small:

    minimize_x  ‖Ax − b‖² + λ‖x‖²

• In the limit λ → ∞, we get x = 0.
• In the limit λ → 0, we get the min-norm solution:

    minimize_x  ‖x‖²
    subject to: Ax = b
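For wide matrices, NumPy's least-squares routine and the pseudoinverse both return exactly this min-norm solution (a sketch on random data):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((3, 8))     # m = 3 < n = 8: underdetermined
    b = rng.standard_normal(3)

    x_min, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(A @ x_min, b))                   # an exact solution...
    print(np.allclose(x_min, np.linalg.pinv(A) @ b))   # ...of minimum norm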

  25. Duality
Intuition: duality is all about finding bounds on the optimal value.
• If the primal problem is a minimization, all feasible points of the primal give upper bounds on the optimal value.
• The dual problem is a maximization. All feasible points of the dual give lower bounds on the optimal value.
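A sketch of this bounding picture on a small LP (invented data). Take the primal as min cᵀx s.t. Ax ≥ b, x ≥ 0; its dual is max bᵀy s.t. Aᵀy ≤ c, y ≥ 0:

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([2.0, 3.0])
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([3.0, 4.0])

    # linprog only handles <=, so flip the primal's >= constraints.
    primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
    # The dual maximizes b'y, i.e. minimizes -b'y.
    dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

    # Any dual-feasible y lower-bounds any primal-feasible x's cost;
    # at the optimum the two bounds meet (here both equal 7).
    print(primal.fun, -dual.fun)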
