CSCI 1951-G Optimization Methods in Finance, Part 09: Interior Point Methods


  1. CSCI 1951-G – Optimization Methods in Finance Part 09: Interior Point Methods March 23, 2018 1 / 35

  2. This material is covered in S. Boyd and L. Vandenberghe’s book Convex Optimization https://web.stanford.edu/~boyd/cvxbook/ . Some of the material and figures are taken from it. 2 / 35

  3. Context • Two weeks ago: unconstrained problems, solved with descent methods • Last week: linearly constrained problems, solved with Newton’s method • This week: inequality constrained problems, solved with interior point methods 3 / 35

  4. Inequality constrained minimization problems: min f_0(x) s.t. f_i(x) ≤ 0, i = 1, ..., m, and Ax = b, with f_0, ..., f_m convex and twice continuously differentiable, A ∈ R^{p×n}, rank(A) = p < n. Assume: • an optimal solution x* exists, with obj. value p*; • the problem is strictly feasible (i.e., the feasible region has interior points), so Slater’s condition holds: there exist λ* and ν* that, together with x*, satisfy the KKT conditions. 4 / 35
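
To make the problem class concrete, here is a minimal Python sketch of one such instance; the objective, constraints, and the point x0 are illustrative choices, not from the slides.

```python
import numpy as np

# A small instance of this problem class (illustrative only):
# min f_0(x) = x_1^2 + x_2^2
# s.t. f_1(x) = x_1 - 1 <= 0,  f_2(x) = -x_2 - 1 <= 0,  x_1 + x_2 = 1.

def f0(x):
    return x @ x                       # convex, twice continuously differentiable

fis = [lambda x: x[0] - 1.0,           # f_1(x) <= 0
       lambda x: -x[1] - 1.0]          # f_2(x) <= 0

A = np.array([[1.0, 1.0]])             # p = 1 < n = 2, rank(A) = 1
b = np.array([1.0])

# Strict feasibility (Slater): a point with Ax = b and every f_i(x) < 0.
x0 = np.array([0.5, 0.5])
assert np.allclose(A @ x0, b) and all(fi(x0) < 0 for fi in fis)
```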

  5. Hierarchy of algorithms. Transforming a constrained problem into an unconstrained one: always possible, but has drawbacks. Solving the constrained problem directly leverages problem structure. What is the constrained problem class that is easiest to solve? Quadratic problems with linear equality constraints (LCQPs): they only require solving ... a system of linear equations. How did we solve generic problems with linear equality constraints? With Newton’s method, which solves a sequence of ... LCQPs! We will solve inequality constrained problems with interior point methods, which solve a sequence of ... linearly constrained problems! 5 / 35

  6. Problem Transformation. Goal: approximate the Inequality Constrained Problem (ICP) with an Equality Constrained Problem (ECP) solvable with Newton’s method. We start by transforming the ICP into an equivalent ECP. From: min f_0(x) s.t. f_i(x) ≤ 0, i = 1, ..., m, Ax = b. To: min g(x) s.t. Ax = b, where g(x) = f_0(x) + Σ_{i=1}^m I_−(f_i(x)) and I_−(u) = 0 for u ≤ 0, ∞ for u > 0. So we just use Newton’s method and we are done. The End. Nope. 6 / 35

  7. Logarithmic barrier. min f_0(x) + Σ_{i=1}^m I_−(f_i(x)) s.t. Ax = b. The obj. function is in general not differentiable: we can’t use Newton’s method. We want to approximate I_−(u) with a differentiable function: Î_−(u) = −(1/t) log(−u), with domain −R_{++}, where t > 0 is a parameter. 7 / 35

  8. Logarithmic barrier. Î_−(u) is convex and differentiable. [Figure 11.1: the dashed lines show the function I_−(u), and the solid curves show Î_−(u) = −(1/t) log(−u), for t = 0.5, 1, 2. The curve for t = 2 gives the best approximation.] 8 / 35
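
A quick numeric companion to Figure 11.1, comparing the indicator with its barrier approximation for the same t values (a minimal sketch; the sample points are arbitrary):

```python
import numpy as np

def I_minus(u):
    """Exact indicator: 0 if u <= 0, +infinity otherwise."""
    return np.where(u <= 0, 0.0, np.inf)

def I_hat(u, t):
    """Barrier approximation -(1/t) * log(-u), defined for u < 0."""
    return -np.log(-u) / t

u = np.array([-3.0, -2.0, -1.0, -0.5, -0.1])
print("exact:", I_minus(u))
for t in (0.5, 1.0, 2.0):
    print(f"t = {t}:", np.round(I_hat(u, t), 3))
# As t grows, I_hat gets closer to 0 on u < 0 (the exact indicator value),
# while still blowing up to +infinity as u -> 0, matching Figure 11.1.
```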

  9. Logarithmic barrier. min f_0(x) − (1/t) Σ_{i=1}^m log(−f_i(x)) s.t. Ax = b. The objective function is convex and differentiable: we can use Newton’s method. φ(x) = −Σ_{i=1}^m log(−f_i(x)) is called the logarithmic barrier for the problem. 9 / 35

  10. Example: Inequality form linear programming. min c^T x s.t. Ax ≤ b. The logarithmic barrier for this problem is φ(x) = −Σ_{i=1}^m log(b_i − a_i^T x), where the a_i are the rows of A. 10 / 35
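
As a sketch, this barrier and its gradient ∇φ(x) = Σ_i a_i / (b_i − a_i^T x) can be coded directly; the box instance below is an illustrative choice:

```python
import numpy as np

def lp_barrier(x, A, b):
    """phi(x) = -sum_i log(b_i - a_i^T x); +inf outside the strict interior."""
    s = b - A @ x                      # slack vector; must be > 0
    return -np.sum(np.log(s)) if np.all(s > 0) else np.inf

def lp_barrier_grad(x, A, b):
    """grad phi(x) = sum_i a_i / (b_i - a_i^T x)."""
    s = b - A @ x
    return A.T @ (1.0 / s)

# Illustrative instance: the box -1 <= x_j <= 1, written as Ax <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x = np.array([0.2, -0.3])
print(lp_barrier(x, A, b), lp_barrier_grad(x, A, b))
```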

  11. How to choose t? min f_0(x) + (1/t) φ(x) s.t. Ax = b is an approximation of the original problem. How does the quality of the approximation change with t? As t grows, (1/t) φ(x) tends to Σ_{i=1}^m I_−(f_i(x)), so the approximation quality increases. So let’s just use a large t? Nope. 11 / 35

  12. Why not (immediately) use a large t? What’s the intuition behind Newton’s method? Replace the obj. function with its 2nd-order Taylor approximation at x: f(x + v) ≈ f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v. When does this approximation (and Newton’s method) work well? When the Hessian changes slowly. Is that the case for the barrier function? 12 / 35

  13. Back to the example. min c^T x s.t. Ax ≤ b, with φ(x) = −Σ_{i=1}^m log(b_i − a_i^T x) and ∇²φ(x) = Σ_{i=1}^m a_i a_i^T / (b_i − a_i^T x)². The Hessian changes fast as x gets close to the boundary of the feasible region. 13 / 35
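
A short sketch, reusing the same illustrative box instance, showing how ∇²φ blows up as x approaches the boundary:

```python
import numpy as np

def lp_barrier_hess(x, A, b):
    """hess phi(x) = sum_i a_i a_i^T / (b_i - a_i^T x)^2 = A^T diag(1/s^2) A."""
    s = b - A @ x
    return A.T @ (A / s[:, None] ** 2)

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
# Move x toward the boundary facet x_1 = 1 and watch the Hessian grow:
for d in (0.5, 0.1, 0.01, 0.001):
    H = lp_barrier_hess(np.array([1.0 - d, 0.0]), A, b)
    print(f"distance {d}: ||hess|| = {np.linalg.norm(H, 2):.1e}")
# The norm grows like 1/d^2, so the quadratic model behind Newton's method
# is only trustworthy in a tiny neighborhood when x is near the boundary.
```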

  14. Why not (immediately) use a large t? The Hessian of the function f_0 + (1/t) φ varies rapidly near the boundary of the feasible set. This makes directly using a large t inefficient. Instead, we will solve a sequence of problems of the form min f_0(x) + (1/t) φ(x) s.t. Ax = b for increasing values of t. We start each Newton minimization at the solution of the problem for the previous value of t. 14 / 35

  15. The central path. Slight rewrite: min t f_0(x) + φ(x) s.t. Ax = b. Assume it has a unique solution x*(t) for each t > 0. Central path: {x*(t) : t > 0} (made of central points). 15 / 35

  16. The central path. Necessary and sufficient conditions for x*(t): • strict feasibility: Ax*(t) = b, f_i(x*(t)) < 0, i = 1, ..., m; • zero of the Lagrangian (centrality condition): there exists ν̂ such that 0 = t ∇f_0(x*(t)) + ∇φ(x*(t)) + A^T ν̂ = t ∇f_0(x*(t)) + Σ_{i=1}^m (1 / (−f_i(x*(t)))) ∇f_i(x*(t)) + A^T ν̂. 16 / 35

  17. Back to the example. min c^T x s.t. Ax ≤ b, with φ(x) = −Σ_{i=1}^m log(b_i − a_i^T x). Centrality condition (there are no equality constraints here, so the A^T ν̂ term drops): 0 = t ∇f_0(x*(t)) + ∇φ(x*(t)) = tc + Σ_{i=1}^m a_i / (b_i − a_i^T x). 17 / 35

  18. Back to the example. 0 = tc + Σ_{i=1}^m a_i / (b_i − a_i^T x). [Figure 11.2: central path for an LP with n = 2 and m = 6. The dashed curves show three contour lines of the logarithmic barrier function φ. The central path converges to the optimal point x* as t → ∞. Also shown is the point on the central path with t = 10. The optimality condition at this point can be verified geometrically: the line c^T x = c^T x*(10) is tangent to the contour line of φ through x*(10).] 18 / 35
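
To see the centrality condition in action, here is a minimal sketch that computes a central point x*(t) by running Newton's method on t·c^T x + φ(x); the instance, iteration cap, and backtracking scheme are illustrative choices:

```python
import numpy as np

def central_point(c, A, b, t, x, iters=50):
    """Newton's method on t * c^T x + phi(x) for the LP min c^T x s.t. Ax <= b.
    x must be strictly feasible (Ax < b); backtracking keeps it so."""
    for _ in range(iters):
        s = b - A @ x
        grad = t * c + A.T @ (1.0 / s)
        if np.linalg.norm(grad) < 1e-9:
            break
        hess = A.T @ (A / s[:, None] ** 2)
        v = np.linalg.solve(hess, -grad)            # Newton step
        step = 1.0
        while np.any(b - A @ (x + step * v) <= 0):  # stay strictly feasible
            step *= 0.5
        x = x + step * v
    return x

# Illustrative LP: minimize c^T x over the box -1 <= x_j <= 1.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x10 = central_point(c, A, b, t=10.0, x=np.zeros(2))
# Verify the centrality condition 0 = t*c + sum_i a_i / (b_i - a_i^T x):
print(10.0 * c + A.T @ (1.0 / (b - A @ x10)))       # approximately [0, 0]
```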

  19. Dual point from the central path. Every central point x*(t) yields a dual feasible point (λ*(t), ν*(t)), and thus a ... lower bound on the optimal obj. value p*: λ_i*(t) = −1 / (t f_i(x*(t))), i = 1, ..., m, and ν*(t) = ν̂ / t. The proof gives us a lot of information. 19 / 35

  20. Proof. • λ_i*(t) > 0 because f_i(x*(t)) < 0. • Rewrite the centrality condition: 0 = t ∇f_0(x*(t)) + Σ_{i=1}^m (1 / (−f_i(x*(t)))) ∇f_i(x*(t)) + A^T ν̂; dividing by t, 0 = ∇f_0(x*(t)) + Σ_{i=1}^m λ_i*(t) ∇f_i(x*(t)) + A^T ν*(t). • The above equals ∂L/∂x (x*(t), λ*(t), ν*(t)) = 0, i.e., x*(t) ... minimizes the Lagrangian at (λ*(t), ν*(t)). 20 / 35

  21. Proof. Let’s look at the dual function: g(λ*(t), ν*(t)) = f_0(x*(t)) + Σ_{i=1}^m λ_i*(t) f_i(x*(t)) + ν*(t)^T (A x*(t) − b). It holds that g(λ*(t), ν*(t)) = f_0(x*(t)) − m/t. So f_0(x*(t)) − p* ≤ m/t, i.e., x*(t) is no more than m/t-suboptimal! x*(t) converges to x* as t → ∞. 21 / 35
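
Continuing the same sketch, the dual point and the m/t gap can be checked numerically; this reuses the central_point helper and the box instance from the sketch after slide 18:

```python
import numpy as np

# Reuses central_point, c, A, b from the sketch after slide 18.
m = A.shape[0]
for t in (1.0, 10.0, 100.0):
    xt = central_point(c, A, b, t=t, x=np.zeros(2))
    lam = 1.0 / (t * (b - A @ xt))  # lambda_i*(t) = -1/(t f_i), f_i = a_i^T x - b_i
    # Dual feasibility A^T lam + c = 0 holds by centrality, so the LP dual
    # function value is g(lam) = -b^T lam, and the duality gap is:
    gap = c @ xt + b @ lam
    print(f"t = {t:5.1f}: f0 = {c @ xt:8.4f}, gap = {gap:.6f}, m/t = {m / t:.6f}")
# The gap equals m/t (up to Newton tolerance): x*(t) is m/t-suboptimal.
```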

  22. The barrier method. To get an ε-approximation we could just set t = m/ε and solve min (m/ε) f_0(x) + φ(x) s.t. Ax = b. This method does not scale well with the size of the problem and with ε. Barrier method: compute x*(t) for an increasing sequence of values of t until t ≥ m/ε. 22 / 35

  23. The barrier method. Input: strictly feasible x = x^(0), t = t^(0) > 0, µ > 1, ε > 0. Repeat: 1. Centering step: compute x*(t) by minimizing t f_0 + φ subject to Ax = b, starting at x. 2. Update: x ← x*(t). 3. Stopping criterion: quit if m/t < ε. 4. Increase t: t ← µt. What can we ask about this algorithm? 23 / 35
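
A minimal sketch of this loop for the LP case, again reusing the illustrative central_point helper from the sketch after slide 18; the default t^(0), µ, and ε values are arbitrary choices:

```python
import numpy as np

def barrier_method(c, A, b, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Barrier method for min c^T x s.t. Ax <= b, from strictly feasible x0."""
    x, t, m = x0, t0, A.shape[0]
    while True:
        x = central_point(c, A, b, t, x)  # 1. centering step, warm-started at x
        if m / t < eps:                   # 3. stop: x is at most m/t < eps suboptimal
            return x                      #    (2. the update x <- x*(t) is above)
        t *= mu                           # 4. increase t

x_opt = barrier_method(c, A, b, x0=np.zeros(2))
print(x_opt)                              # close to [-1, -1] for the box instance
```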

  24. The barrier method. What can we ask about this algorithm? 1. How many iterations does it take to converge? 2. Do we need to optimally solve the centering step? 3. What is a good value for µ? 4. How to choose t^(0)? 24 / 35

  25. Convergence. • The algorithm stops when m/t < ε. • t starts at t^(0). • t increases to µt at each iteration. How do we compute the number of iterations needed? We must find the smallest i such that m/ε < t^(0) µ^i. It holds: i = ⌈ log(m / (ε t^(0))) / log µ ⌉. Is there anything important that this analysis does not tell us? It does not tell us whether, as t grows, the centering step becomes more difficult. (It does not.) 25 / 35
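
The closed form can be sanity-checked against a direct simulation of the t-updates; the values of m, ε, t^(0), µ below are illustrative:

```python
import math

# Illustrative values: m = 50 constraints, eps = 1e-6, t0 = 1, mu = 10.
m, eps, t0, mu = 50, 1e-6, 1.0, 10.0
print(math.ceil(math.log(m / (eps * t0)) / math.log(mu)))  # closed form: 8

# Cross-check by simulating the t-updates of the barrier method:
t, n = t0, 0
while m / t >= eps:
    t, n = mu * t, n + 1
print(n)                                                   # also 8
```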

  26. [Figure 11.8: average number of Newton steps required to solve 100 randomly generated LPs of different dimensions, with n = 2m; the x-axis ranges over m from 10^1 to 10^3, the y-axis over the number of Newton iterations. Error bars show the standard deviation around the average value for each value of m. The growth in the number of Newton steps required, as the problem dimensions range over a 100:1 ratio, is very small.] 26 / 35

  27. The barrier method. What can we ask about this algorithm? 1. How many iterations does it take to converge? 2. Do we need to optimally solve the centering step? 3. What is a good value for µ? 4. How to choose t^(0)? 27 / 35
