Inverse Optimization and Equilibrium with Applications in Finance and Statistics

Jong-Shi Pang
Department of Industrial and Enterprise Systems Engineering
University of Illinois at Urbana-Champaign

Presented at the SPECIAL SEMESTER on Stochastics with Emphasis on Finance
RICAM, Linz, Austria
Monday, October 27, 2008, 10:50–11:40 AM
Contents of Presentation

• A preface
• What is inverse optimization?
• 3 applications
  — cross-validated support-vector regression
  — optimal mixing in statistics
  — implied volatility of American options
• What is the methodology?
  — Focusing on concepts and ideas; omitting technical details.
Preface

Up to now, most inverse problems in mathematics involve the inversion of partial-differential equations (PDEs)—the forward models—in the presence of observed and/or experimental data. They lead to optimization problems with PDE constraints.

In contrast, the kind of inverse problems we are interested in involves optimization or equilibrium problems as the forward models, and requires the solution of finite-dimensional optimization problems with algebraic inequalities together with certain complementarity constraints.

The latter inverse problems require theories and methods of contemporary optimization and variational inequalities, where the inequalities provide the key challenges. Inequalities lead to non-smoothness, multi-valuedness, and disjunctions, which are atypical characteristics in modern computational mathematics.
Forward versus Inverse Optimization

Optimization pertains to the computation of the maximum or minimum value of an objective function in the presence of constraints, which are, for all practical purposes, expressed in terms of a finite number of algebraic equations and/or inequalities defined by a finite number of decision variables.

Traditional optimization is a forward process; namely, input data are fed into the optimization model, yielding a resolution of the problem—the model solution.

Inverse Optimization attempts to build improved optimization models for the goal of better generalization, by choosing the model parameters so that the model solution optimizes a secondary objective, e.g., reproducing an observed solution, either exactly or as closely as possible.
An illustration: Inverse convex quadratic programming

Given a set Ω and an outer objective function θ, find (x, Q, A, b, c):

  minimize_{(x, Q, A, b, c)}   θ(x, Q, A, b, c)
  subject to   (x, Q, A, b, c) ∈ Ω
  and   x ∈ argmin_{x′}  ½ (x′)ᵀQx′ + cᵀx′   subject to   Ax′ ≤ b.

3 salient features:
• for each (Q, A, b, c) there is a lower-level quadratic program (the argmin problem above)
• for which an optimal solution x is sought such that the upper-level constraint (x, Q, A, b, c) ∈ Ω is satisfied
• a tuple (x, Q, A, b, c) with the above properties is sought to minimize the upper-level objective function θ(x, Q, A, b, c).
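To make the forward/inverse split concrete, here is a minimal Python sketch, not from the talk, that evaluates the pipeline for one fixed parameter tuple: the lower-level QP is solved (with cvxpy, assumed available) and the induced solution x is fed into an outer objective θ. The outer search over (Q, A, b, c), which is the hard part, is left abstract; x_obs and θ below are hypothetical.

```python
import cvxpy as cp
import numpy as np

def lower_level_qp(Q, A, b, c):
    """Solve min_x  0.5 x'Qx + c'x  subject to  Ax <= b  (Q assumed symmetric PSD)."""
    x = cp.Variable(Q.shape[0])
    cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x), [A @ x <= b]).solve()
    return x.value

def upper_level_objective(theta, Q, A, b, c):
    """Evaluate the outer objective at the lower-level solution induced by (Q, A, b, c)."""
    return theta(lower_level_qp(Q, A, b, c), Q, A, b, c)

# Hypothetical outer objective: reproduce an observed solution x_obs as closely as possible.
x_obs = np.array([1.0, 0.0])
theta = lambda x, Q, A, b, c: np.linalg.norm(x - x_obs)

Q, c = np.eye(2), np.array([-1.0, 0.0])
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
print(upper_level_objective(theta, Q, A, b, c))
```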
Bilevel support-vector regression

Given a finite set of in-sample data points { (x_i, y_i) }_{i=1}^n, fit a hyperplane y = xᵀw + b by solving the convex quadratic program for (w, b):

  minimize_{(w, b)}   C Σ_{i=1}^n max( |wᵀx_i + b − y_i| − ε, 0 ) + ½ wᵀw

for given (C, ε) > 0. Let (w(C, ε), b(C, ε)) be optimal.

The inverse problem is to choose (C, ε) to minimize an error on a set of out-of-sample data, such as

  minimize_{(C, ε)}   Σ_{j=n+1}^{n+k} | w(C, ε)ᵀ x_j + b(C, ε) − y_j |.

This extends to the statistical methodology of cross validation, including leave-one-out validation.
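As a mental model only, and not the LPCC machinery developed on the following slides: the same inverse problem can be approximated by a naive grid search over (C, ε), training on the in-sample points and scoring on the out-of-sample ones. The sketch assumes scikit-learn is available; the data are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_in, y_in = rng.normal(size=(40, 3)), rng.normal(size=40)     # in-sample data
X_out, y_out = rng.normal(size=(10, 3)), rng.normal(size=10)   # out-of-sample data

best = None
for C in [0.1, 1.0, 10.0]:
    for eps in [0.01, 0.1, 1.0]:
        model = SVR(kernel="linear", C=C, epsilon=eps).fit(X_in, y_in)   # inner problem
        err = np.abs(model.predict(X_out) - y_out).sum()                 # outer objective
        if best is None or err < best[0]:
            best = (err, C, eps)

print("selected (C, eps):", best[1:], "out-of-sample error:", best[0])
```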
The (inner-level) SVM quadratic program

  minimize_{(w, b, e)}   C Σ_{i=1}^n e_i + ½ wᵀw
  subject to, for all i = 1, ..., n:
     e_i ≥ wᵀx_i + b − y_i − ε,
     e_i ≥ −wᵀx_i − b + y_i − ε,   and   e_i ≥ 0,

and its Karush-Kuhn-Tucker optimality conditions:

  0 ≤ λ_i⁺  ⊥  e_i − wᵀx_i − b + y_i + ε ≥ 0
  0 ≤ λ_i⁻  ⊥  e_i + wᵀx_i + b − y_i + ε ≥ 0       i = 1, ..., n
  0 ≤ e_i   ⊥  C − λ_i⁺ − λ_i⁻ ≥ 0

  0 = w + Σ_{i=1}^n ( λ_i⁺ − λ_i⁻ ) x_i   and   0 = Σ_{i=1}^n ( λ_i⁺ − λ_i⁻ ),

where ⊥ denotes the complementary slackness condition; thus 0 ≤ a ⊥ b ≥ 0 if and only if [ a = 0 ≤ b ] or [ a ≥ 0 = b ].
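For reference, a minimal cvxpy sketch (not from the slides) of this slack-variable QP: for fixed (C, ε) it returns the same (w, b) as the hinge-loss formulation on the previous slide, and its optimal primal-dual pairs are exactly the solutions of the KKT system above. The data below are synthetic.

```python
import cvxpy as cp
import numpy as np

def inner_svr(X, y, C, eps):
    """Solve the slack-variable SVR QP: min C*sum(e) + 0.5||w||^2 subject to the three e-constraints."""
    n, d = X.shape
    w, b, e = cp.Variable(d), cp.Variable(), cp.Variable(n)
    constraints = [e >= X @ w + b - y - eps,
                   e >= -(X @ w + b - y) - eps,
                   e >= 0]
    cp.Problem(cp.Minimize(C * cp.sum(e) + 0.5 * cp.sum_squares(w)), constraints).solve()
    return w.value, b.value

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -2.0]) + 0.3 * rng.normal(size=30)
w, b = inner_svr(X, y, C=1.0, eps=0.1)
print(w, b)
```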
The bilevel SVM problem

  minimize_{(C, ε, e, w, b)}   Σ_{j=n+1}^{n+k} e_j
  subject to   e_j ≥ | wᵀx_j + b − y_j |,   j = n+1, ..., n+k,
  and the inner SVM KKT conditions.

• An instance of a linear program with linear complementarity constraints, abbreviated as an LPCC; i.e., a linear program except for the disjunctive complementarity slackness constraints.
• As such, it is a nonconvex optimization problem, albeit of a very special kind.
• In this application, the inverse process is to optimize an out-of-sample error based on an in-sample training set of data.
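An aside, not from the slides: one standard way to expose, and to compute with, the disjunctive complementarity constraints of an LPCC is a big-M reformulation that introduces one binary variable per complementarity pair; it is valid whenever a bound M > 0 on both members of the pair is known:

  0 ≤ a ⊥ b ≥ 0   becomes   0 ≤ a ≤ M z,   0 ≤ b ≤ M (1 − z),   z ∈ {0, 1},

which turns the LPCC into a mixed-integer linear program.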
Extension: Cross-validated support-vector regression

T : a positive integer (the number of folds)
Ω = ∪_{t=1}^T Ω_t : a partitioning of the data into disjoint subgroups
N_t : index set of the data in Ω_t, with complement N̄_t (the indices not in Ω_t).

The fold-t training subproblem:

  minimize_{(w^t, b^t) ∈ ℝ^{n+1}}   ( C / |N̄_t| ) Σ_{i ∈ N̄_t} max( |(w^t)ᵀx_i + b^t − y_i| − ε, 0 ) + ½ ‖w^t‖²
  subject to   −w ≤ w^t ≤ w,   for feature selection,

yielding the fold-t loss, which depends on the choice of (C, ε, w):

  Σ_{i ∈ N_t} | (w^t)ᵀ x_i + b^t − y_i |.
Cross-validated support-vector regression (cont.)

Given C^ub > C^lb ≥ 0, ε^ub > ε^lb ≥ 0, and w^ub > w^lb,

  minimize_{C, ε, w, {(w^t, b^t)}_{t=1}^T}   Σ_{t=1}^T ( 1 / |N_t| ) Σ_{i ∈ N_t} | (w^t)ᵀ x_i + b^t − y_i |
  subject to   C^lb ≤ C ≤ C^ub,   ε^lb ≤ ε ≤ ε^ub,   w^lb ≤ w ≤ w^ub,
  and, for t = 1, ..., T,

     (w^t, b^t) ∈ argmin_{(w^t, b^t)}   ( C / |N̄_t| ) Σ_{i ∈ N̄_t} max( |(w^t)ᵀx_i + b^t − y_i| − ε, 0 ) + ½ ‖w^t‖²
                   subject to   −w ≤ w^t ≤ w.

• same (C, ε, w) across all folds
• can easily accommodate other convex loss functions and constraints
• extension to parameterized kernel selection
• other tasks, such as classification and semi-supervised learning, can be similarly handled.
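As a point of comparison, and not the bilevel approach of the talk: standard T-fold cross validation over a finite grid of (C, ε) values optimizes the same averaged out-of-fold error, but only over the grid and without the feature-selection box −w ≤ w^t ≤ w. A sketch with scikit-learn (assumed available) on synthetic data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)   # synthetic data

T = 5  # number of folds
search = GridSearchCV(
    SVR(kernel="linear"),
    param_grid={"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1, 1.0]},
    scoring="neg_mean_absolute_error",   # averaged out-of-fold absolute error
    cv=T,
)
search.fit(X, y)
print("selected (C, epsilon):", search.best_params_)
```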
A bilevel maximum-likelihood approach to target classification

Problem. Given data points for 2 target classes, identified by the columns in the 2 matrices:

  X_I ≜ [ X_{I,1}, ···, X_{I,d} ] ∈ ℝ^{n_1 × d},   for target class I
  X_II ≜ [ X_{II,1}, ···, X_{II,d} ] ∈ ℝ^{n_2 × d},   for target class II,

determine a statistical model to classify future data as type I or II.

Our approach is as follows:

• Aggregate the data via a common set of weights w ∈ W ⊆ ℝ^d, obtaining the aggregated data X_{I,II} w.

• Apply an m-term Gaussian mixture model

     Ψ(x, μ, σ, p) ≜ Σ_{i=1}^m p_i ( 1 / (√(2π) σ_i) ) exp( −½ ( (x − μ_i) / σ_i )² )

  to the aggregated data X_{I,II} w.
• Determine the mixing coefficients p_i via a log-likelihood maximization:

     p^{I,II} ∈ argmax_p   Σ_{j=1}^{n_1, n_2} log Ψ( (X_{I,II})_{j•} w, μ, σ, p )
                subject to   Σ_{i=1}^m p_i = 1,   p ≥ 0.

• The overall process chooses the parameters (p^{I,II}, w, μ, σ) by maximizing a measure of separation between the two classes based on the given data X_{I,II}:

     maximize_{p^{I,II}, w, μ, σ}   θ( p^{I,II}, w, μ, σ )
     subject to   w ∈ W
     and   p^{I,II} ∈ argmax_p   Σ_{j=1}^{n_1, n_2} log Ψ( (X_{I,II})_{j•} w, μ, σ, p )
                     subject to   Σ_{i=1}^m p_i = 1,   p ≥ 0.
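The slides do not prescribe an algorithm for the inner problem. One standard way to solve it, sketched below for fixed means μ and standard deviations σ, is the EM update for mixture weights, which converges to a maximizer of the log-likelihood (concave in p) over the simplex. All data and parameter values are synthetic placeholders standing in for the aggregated scalars X_{I,II} w.

```python
import numpy as np
from scipy.stats import norm

def fit_mixing_weights(x, mu, sigma, iters=200):
    """Maximize sum_j log( sum_i p_i N(x_j; mu_i, sigma_i^2) ) over the probability simplex."""
    m = len(mu)
    # component densities phi[j, i] = N(x_j; mu_i, sigma_i^2)
    phi = norm.pdf(x[:, None], loc=mu[None, :], scale=sigma[None, :])
    p = np.full(m, 1.0 / m)
    for _ in range(iters):
        resp = p * phi                          # unnormalized responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        p = resp.mean(axis=0)                   # EM update of the mixing weights
    return p

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-1.0, 0.5, 50), rng.normal(2.0, 1.0, 50)])
p = fit_mixing_weights(x, mu=np.array([-1.0, 2.0]), sigma=np.array([0.5, 1.0]))
print("estimated mixing weights:", p)
```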
Pricing American Options: the vanilla Black-Scholes model

Consider the forward pricing of an American put/call option of an underlying asset whose (random) price pattern S(t) satisfies the stochastic differential equation

  dS = ( μ S − D(S, t) ) dt + σ(S, t) S dW,

where
  μ        : drift of the price process
  r        : prevalent interest rate, assumed constant
  D(S, t)  : dividend rate of the asset
  σ(S, t)  : non-constant volatility of the asset
  dW       : standard Wiener process with mean zero and variance dt.

Let the Black-Scholes operator be denoted by

  L_BS ≜ ∂/∂t + ½ σ²(S, t) S² ∂²/∂S² + ( r S − D(S, t) ) ∂/∂S − r.
The forward pricing model

The American option price V(S, t) satisfies the partial-differential linear complementarity system: for (S, t) ∈ (0, ∞) × [0, T],

  0 ≤ V(S, t) − Λ(S, t)  ⊥  L_BS(V) ≤ 0,

plus boundary conditions at the terminal time t = T and at the extreme asset values S = 0, ∞, where
  T        : time of expiry of the option
  Λ(S, t)  : payoff function of the option at expiry.

The complementarity expresses the early exercise feature of an American option.
The discretized complementarity problem

Discretizing time and asset values, we obtain a finite-dimensional linear complementarity problem, parameterized by the asset volatilities:

  0 ≤ V − Λ  ⊥  q(σ) + M(σ) V ≥ 0,

where V ≜ { V(m δS, n δt) } is the vector of approximated option prices at times t = n δt and asset values S = m δS. With suitable discretization, M(σ) is a strictly row diagonally dominant, albeit not always symmetric, matrix for fixed σ.

Extensions to multiple-state problems, such as options on several assets, models with stochastic volatilities and interest rates, as well as some exotic options.
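A minimal sketch of one classical way to solve such a discretized LCP: projected SOR (PSOR), which uses only the positive diagonal of M and is well suited to the strictly row diagonally dominant matrices described above. M, q, and Λ below are synthetic placeholders, not an actual option discretization.

```python
import numpy as np

def psor(M, q, lam, omega=1.0, tol=1e-10, max_iter=10_000):
    """Projected SOR for the LCP:  V >= lam,  q + M V >= 0,  (V - lam)'(q + M V) = 0.

    Assumes M has positive diagonal entries; omega = 1 gives projected Gauss-Seidel,
    omega in (1, 2) gives over-relaxation.
    """
    V = lam.astype(float).copy()
    for _ in range(max_iter):
        V_old = V.copy()
        for i in range(len(V)):
            r = q[i] + M[i] @ V                         # residual of row i at the current iterate
            V[i] = max(lam[i], V[i] - omega * r / M[i, i])
        if np.max(np.abs(V - V_old)) < tol:
            break
    return V

# Tiny synthetic instance (not a real option discretization), just to show the call.
M = np.diag(np.full(5, 4.0)) + np.diag(np.full(4, -1.0), 1) + np.diag(np.full(4, -1.0), -1)
q = np.array([-1.0, 0.5, -2.0, 1.0, -0.5])
lam = np.zeros(5)
print(psor(M, q, lam))
```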