CSCI 1951-G Optimization Methods in Finance, Part 09: Interior Point Methods


  1. CSCI 1951-G – Optimization Methods in Finance Part 09: Interior Point Methods March 23, 2018 1 / 35

  2. This material is covered in S. Boyd and L. Vandenberghe’s book Convex Optimization https://web.stanford.edu/~boyd/cvxbook/ . Some of the material and figures are taken from it. 2 / 35

  3. Context • Two weeks ago: unconstrained problems, solved with descent methods • Last week: linearly constrained problems, solved with Newton’s method • This week: inequality constrained problems, solved with interior point methods 3 / 35

  4. Inequality constrained minimization problems: min f_0(x) s.t. f_i(x) ≤ 0, i = 1, ..., m, and Ax = b, with f_0, ..., f_m convex and twice continuously differentiable, A ∈ R^{p×n}, rank(A) = p < n. Assume: • an optimal solution x* exists, with obj. value p*; • the problem is strictly feasible (i.e., the feasible region has interior points), so Slater’s condition holds: there exist λ* and ν* that, together with x*, satisfy the KKT conditions. 4 / 35
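
To make the problem class concrete, here is a minimal Python sketch of one such instance; the objective, constraints, and the point x0 are illustrative choices, not from the slides.

```python
import numpy as np

# A small instance of this problem class (illustrative only):
# min f_0(x) = x_1^2 + x_2^2
# s.t. f_1(x) = x_1 - 1 <= 0,  f_2(x) = -x_2 - 1 <= 0,  x_1 + x_2 = 1.

def f0(x):
    return x @ x                       # convex, twice continuously differentiable

fis = [lambda x: x[0] - 1.0,           # f_1(x) <= 0
       lambda x: -x[1] - 1.0]          # f_2(x) <= 0

A = np.array([[1.0, 1.0]])             # p = 1 < n = 2, rank(A) = 1
b = np.array([1.0])

# Strict feasibility (Slater): a point with Ax = b and every f_i(x) < 0.
x0 = np.array([0.5, 0.5])
assert np.allclose(A @ x0, b) and all(fi(x0) < 0 for fi in fis)
```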

  5. Hierarchy of algorithms. Transforming a constrained problem into an unconstrained one: always possible, but has drawbacks. Solving the constrained problem directly leverages problem structure. What is the constrained problem class that is easiest to solve? Quadratic problems with linear equality constraints (LCQPs): they only require solving ... a system of linear equations. How did we solve generic problems with linear equality constraints? With Newton’s method, which solves a sequence of ... LCQPs! We will solve inequality constrained problems with interior point methods, which solve a sequence of ... linearly constrained problems! 5 / 35

  6. Problem Transformation. Goal: approximate the Inequality Constrained Problem (ICP) with an Equality Constrained Problem (ECP) solvable with Newton’s method. We start by transforming the ICP into an equivalent ECP. From: min f_0(x) s.t. f_i(x) ≤ 0, i = 1, ..., m, Ax = b. To: min g(x) s.t. Ax = b, where g(x) = f_0(x) + Σ_{i=1}^m I_−(f_i(x)) and I_−(u) = 0 for u ≤ 0, ∞ for u > 0. So we just use Newton’s method and we are done. The End. Nope. 6 / 35

  7. Logarithmic barrier. min f_0(x) + Σ_{i=1}^m I_−(f_i(x)) s.t. Ax = b. The obj. function is in general not differentiable: we can’t use Newton’s method. We want to approximate I_−(u) with a differentiable function: Î_−(u) = −(1/t) log(−u), with domain −R_{++}, where t > 0 is a parameter. 7 / 35

  8. Logarithmic barrier. Î_−(u) is convex and differentiable. [Figure 11.1: the dashed lines show the function I_−(u), and the solid curves show Î_−(u) = −(1/t) log(−u), for t = 0.5, 1, 2. The curve for t = 2 gives the best approximation.] 8 / 35
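
A quick numeric companion to Figure 11.1, comparing the indicator with its barrier approximation for the same t values (a minimal sketch; the sample points are arbitrary):

```python
import numpy as np

def I_minus(u):
    """Exact indicator: 0 if u <= 0, +infinity otherwise."""
    return np.where(u <= 0, 0.0, np.inf)

def I_hat(u, t):
    """Barrier approximation -(1/t) * log(-u), defined for u < 0."""
    return -np.log(-u) / t

u = np.array([-3.0, -2.0, -1.0, -0.5, -0.1])
print("exact:", I_minus(u))
for t in (0.5, 1.0, 2.0):
    print(f"t = {t}:", np.round(I_hat(u, t), 3))
# As t grows, I_hat gets closer to 0 on u < 0 (the exact indicator value),
# while still blowing up to +infinity as u -> 0, matching Figure 11.1.
```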

  9. Logarithmic barrier. min f_0(x) − (1/t) Σ_{i=1}^m log(−f_i(x)) s.t. Ax = b. The objective function is convex and differentiable: we can use Newton’s method. φ(x) = −Σ_{i=1}^m log(−f_i(x)) is called the logarithmic barrier for the problem. 9 / 35

  10. Example: Inequality form linear programming. min c^T x s.t. Ax ≤ b. The logarithmic barrier for this problem is φ(x) = −Σ_{i=1}^m log(b_i − a_i^T x), where the a_i are the rows of A. 10 / 35
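
As a sketch, this barrier and its gradient ∇φ(x) = Σ_i a_i / (b_i − a_i^T x) can be coded directly; the box instance below is an illustrative choice:

```python
import numpy as np

def lp_barrier(x, A, b):
    """phi(x) = -sum_i log(b_i - a_i^T x); +inf outside the strict interior."""
    s = b - A @ x                      # slack vector; must be > 0
    return -np.sum(np.log(s)) if np.all(s > 0) else np.inf

def lp_barrier_grad(x, A, b):
    """grad phi(x) = sum_i a_i / (b_i - a_i^T x)."""
    s = b - A @ x
    return A.T @ (1.0 / s)

# Illustrative instance: the box -1 <= x_j <= 1, written as Ax <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x = np.array([0.2, -0.3])
print(lp_barrier(x, A, b), lp_barrier_grad(x, A, b))
```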

  11. How to choose t? min f_0(x) + (1/t) φ(x) s.t. Ax = b is an approximation of the original problem. How does the quality of the approximation change with t? As t grows, (1/t) φ(x) tends to Σ_{i=1}^m I_−(f_i(x)), so the approximation quality increases. So let’s just use a large t? Nope. 11 / 35

  12. Why not (immediately) use a large t? What’s the intuition behind Newton’s method? Replace the obj. function with its 2nd-order Taylor approximation at x: f(x + v) ≈ f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v. When does this approximation (and Newton’s method) work well? When the Hessian changes slowly. Is that the case for the barrier function? 12 / 35

  13. Back to the example. min c^T x s.t. Ax ≤ b, with φ(x) = −Σ_{i=1}^m log(b_i − a_i^T x) and ∇²φ(x) = Σ_{i=1}^m a_i a_i^T / (b_i − a_i^T x)². The Hessian changes fast as x gets close to the boundary of the feasible region. 13 / 35
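
A short sketch, reusing the same illustrative box instance, showing how ∇²φ blows up as x approaches the boundary:

```python
import numpy as np

def lp_barrier_hess(x, A, b):
    """hess phi(x) = sum_i a_i a_i^T / (b_i - a_i^T x)^2 = A^T diag(1/s^2) A."""
    s = b - A @ x
    return A.T @ (A / s[:, None] ** 2)

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
# Move x toward the boundary facet x_1 = 1 and watch the Hessian grow:
for d in (0.5, 0.1, 0.01, 0.001):
    H = lp_barrier_hess(np.array([1.0 - d, 0.0]), A, b)
    print(f"distance {d}: ||hess|| = {np.linalg.norm(H, 2):.1e}")
# The norm grows like 1/d^2, so the quadratic model behind Newton's method
# is only trustworthy in a tiny neighborhood when x is near the boundary.
```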

  14. Why not (immediately) use a large t? The Hessian of the function f_0 + (1/t) φ varies rapidly near the boundary of the feasible set. This makes directly using a large t inefficient. Instead, we will solve a sequence of problems of the form min f_0(x) + (1/t) φ(x) s.t. Ax = b for increasing values of t. We start each Newton minimization at the solution of the problem for the previous value of t. 14 / 35

  15. The central path. Slight rewrite: min t f_0(x) + φ(x) s.t. Ax = b. Assume it has a unique solution x*(t) for each t > 0. Central path: {x*(t) : t > 0} (made of central points). 15 / 35

  16. The central path. Necessary and sufficient conditions for x*(t): • strict feasibility: Ax*(t) = b, f_i(x*(t)) < 0, i = 1, ..., m; • zero of the Lagrangian (centrality condition): there exists ν̂ such that 0 = t ∇f_0(x*(t)) + ∇φ(x*(t)) + A^T ν̂ = t ∇f_0(x*(t)) + Σ_{i=1}^m (1 / (−f_i(x*(t)))) ∇f_i(x*(t)) + A^T ν̂. 16 / 35

  17. Back to the example. min c^T x s.t. Ax ≤ b, with φ(x) = −Σ_{i=1}^m log(b_i − a_i^T x). Centrality condition (there are no equality constraints here, so the A^T ν̂ term drops): 0 = t ∇f_0(x*(t)) + ∇φ(x*(t)) = tc + Σ_{i=1}^m a_i / (b_i − a_i^T x). 17 / 35

  18. Back to the example. 0 = tc + Σ_{i=1}^m a_i / (b_i − a_i^T x). [Figure 11.2: central path for an LP with n = 2 and m = 6. The dashed curves show three contour lines of the logarithmic barrier function φ. The central path converges to the optimal point x* as t → ∞. Also shown is the point on the central path with t = 10. The optimality condition at this point can be verified geometrically: the line c^T x = c^T x*(10) is tangent to the contour line of φ through x*(10).] 18 / 35
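
To see the centrality condition in action, here is a minimal sketch that computes a central point x*(t) by running Newton's method on t·c^T x + φ(x); the instance, iteration cap, and backtracking scheme are illustrative choices:

```python
import numpy as np

def central_point(c, A, b, t, x, iters=50):
    """Newton's method on t * c^T x + phi(x) for the LP min c^T x s.t. Ax <= b.
    x must be strictly feasible (Ax < b); backtracking keeps it so."""
    for _ in range(iters):
        s = b - A @ x
        grad = t * c + A.T @ (1.0 / s)
        if np.linalg.norm(grad) < 1e-9:
            break
        hess = A.T @ (A / s[:, None] ** 2)
        v = np.linalg.solve(hess, -grad)            # Newton step
        step = 1.0
        while np.any(b - A @ (x + step * v) <= 0):  # stay strictly feasible
            step *= 0.5
        x = x + step * v
    return x

# Illustrative LP: minimize c^T x over the box -1 <= x_j <= 1.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x10 = central_point(c, A, b, t=10.0, x=np.zeros(2))
# Verify the centrality condition 0 = t*c + sum_i a_i / (b_i - a_i^T x):
print(10.0 * c + A.T @ (1.0 / (b - A @ x10)))       # approximately [0, 0]
```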

  19. Dual point from the central path. Every central point x*(t) yields a dual feasible point (λ*(t), ν*(t)), and thus a ... lower bound on the optimal obj. value p*: λ_i*(t) = −1 / (t f_i(x*(t))), i = 1, ..., m, and ν*(t) = ν̂ / t. The proof gives us a lot of information. 19 / 35

  20. Proof. • λ_i*(t) > 0 because f_i(x*(t)) < 0. • Rewrite the centrality condition: 0 = t ∇f_0(x*(t)) + Σ_{i=1}^m (1 / (−f_i(x*(t)))) ∇f_i(x*(t)) + A^T ν̂; dividing by t, 0 = ∇f_0(x*(t)) + Σ_{i=1}^m λ_i*(t) ∇f_i(x*(t)) + A^T ν*(t). • The above equals ∂L/∂x (x*(t), λ*(t), ν*(t)) = 0, i.e., x*(t) ... minimizes the Lagrangian at (λ*(t), ν*(t)). 20 / 35

  21. Proof. Let’s look at the dual function: g(λ*(t), ν*(t)) = f_0(x*(t)) + Σ_{i=1}^m λ_i*(t) f_i(x*(t)) + ν*(t)^T (A x*(t) − b). It holds that g(λ*(t), ν*(t)) = f_0(x*(t)) − m/t. So f_0(x*(t)) − p* ≤ m/t, i.e., x*(t) is no more than m/t-suboptimal! x*(t) converges to x* as t → ∞. 21 / 35
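
Continuing the same sketch, the dual point and the m/t gap can be checked numerically; this reuses the central_point helper and the box instance from the sketch after slide 18:

```python
import numpy as np

# Reuses central_point, c, A, b from the sketch after slide 18.
m = A.shape[0]
for t in (1.0, 10.0, 100.0):
    xt = central_point(c, A, b, t=t, x=np.zeros(2))
    lam = 1.0 / (t * (b - A @ xt))  # lambda_i*(t) = -1/(t f_i), f_i = a_i^T x - b_i
    # Dual feasibility A^T lam + c = 0 holds by centrality, so the LP dual
    # function value is g(lam) = -b^T lam, and the duality gap is:
    gap = c @ xt + b @ lam
    print(f"t = {t:5.1f}: f0 = {c @ xt:8.4f}, gap = {gap:.6f}, m/t = {m / t:.6f}")
# The gap equals m/t (up to Newton tolerance): x*(t) is m/t-suboptimal.
```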

  22. The barrier method. To get an ε-approximation we could just set t = m/ε and solve min (m/ε) f_0(x) + φ(x) s.t. Ax = b. This method does not scale well with the size of the problem and with ε. Barrier method: compute x*(t) for an increasing sequence of values of t until t ≥ m/ε. 22 / 35

  23. The barrier method. Input: strictly feasible x = x^(0), t = t^(0) > 0, µ > 1, ε > 0. Repeat: 1. Centering step: compute x*(t) by minimizing t f_0 + φ subject to Ax = b, starting at x. 2. Update: x ← x*(t). 3. Stopping criterion: quit if m/t < ε. 4. Increase t: t ← µt. What can we ask about this algorithm? 23 / 35
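
A minimal sketch of this loop for the LP case, again reusing the illustrative central_point helper from the sketch after slide 18; the default t^(0), µ, and ε values are arbitrary choices:

```python
import numpy as np

def barrier_method(c, A, b, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Barrier method for min c^T x s.t. Ax <= b, from strictly feasible x0."""
    x, t, m = x0, t0, A.shape[0]
    while True:
        x = central_point(c, A, b, t, x)  # 1. centering step, warm-started at x
        if m / t < eps:                   # 3. stop: x is at most m/t < eps suboptimal
            return x                      #    (2. the update x <- x*(t) is above)
        t *= mu                           # 4. increase t

x_opt = barrier_method(c, A, b, x0=np.zeros(2))
print(x_opt)                              # close to [-1, -1] for the box instance
```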

  24. The barrier method. What can we ask about this algorithm? 1. How many iterations does it take to converge? 2. Do we need to optimally solve the centering step? 3. What is a good value for µ? 4. How to choose t^(0)? 24 / 35

  25. Convergence. • The algorithm stops when m/t < ε. • t starts at t^(0). • t increases to µt at each iteration. How do we compute the number of iterations needed? We must find the smallest i such that m/ε < t^(0) µ^i. It holds: i = ⌈ log(m / (ε t^(0))) / log µ ⌉. Is there anything important that this analysis does not tell us? It does not tell us whether, as t grows, the centering step becomes more difficult. (It does not.) 25 / 35
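
The closed form can be sanity-checked against a direct simulation of the t-updates; the values of m, ε, t^(0), µ below are illustrative:

```python
import math

# Illustrative values: m = 50 constraints, eps = 1e-6, t0 = 1, mu = 10.
m, eps, t0, mu = 50, 1e-6, 1.0, 10.0
print(math.ceil(math.log(m / (eps * t0)) / math.log(mu)))  # closed form: 8

# Cross-check by simulating the t-updates of the barrier method:
t, n = t0, 0
while m / t >= eps:
    t, n = mu * t, n + 1
print(n)                                                   # also 8
```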

  26. [Figure 11.8: average number of Newton steps required to solve 100 randomly generated LPs of different dimensions, with n = 2m; the x-axis ranges over m from 10^1 to 10^3, the y-axis over the number of Newton iterations. Error bars show the standard deviation around the average value for each value of m. The growth in the number of Newton steps required, as the problem dimensions range over a 100:1 ratio, is very small.] 26 / 35

  27. The barrier method. What can we ask about this algorithm? 1. How many iterations does it take to converge? 2. Do we need to optimally solve the centering step? 3. What is a good value for µ? 4. How to choose t^(0)? 27 / 35
