Computational complexity of stochastic programs



  1. Computational complexity of stochastic programs A. Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA East Coast Optimization Meeting 2019

  2. Consider the optimization problem

     $$\min_{x \in X} \big\{ f(x) = \mathbb{E}[F(x,\xi)] \big\},$$

     where $X \subset \mathbb{R}^n$, $F : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ and $\xi$ is an $m$-dimensional random vector.

     In the case of two-stage linear stochastic programming with recourse, $X = \{x \in \mathbb{R}^n_+ : Ax = b\}$ and $F(x,\xi)$ is the first-stage cost $c^\top x$ plus the optimal value of the second-stage problem

     $$\min_{y \in \mathbb{R}^m} q^\top y \quad \text{subject to} \quad Tx + Wy = h,\ y \ge 0,$$

     with $\xi$ formed from the random components of $q, T, W, h$.

     For fixed $x \in X$ the expectation $\mathbb{E}[F(x,\xi)]$ is given by the integral

     $$\mathbb{E}[F(x,\xi)] = \int F(x,z)\, dP(z),$$

     where $P$ is the probability distribution of $\xi$.

  3. A standard approach to solving such stochastic programs is to discretize the distribution $P$, i.e., to construct scenarios $\xi_k$, $k = 1, \dots, K$, with assigned probabilities $p_k > 0$, and hence to approximate $\mathbb{E}[F(x,\xi)]$ by $\sum_{k=1}^K p_k F(x,\xi_k)$.

     In the two-stage linear case this leads to the linear program

     $$\begin{array}{ll}
     \min\limits_{x,\, y_1, \dots, y_K} & c^\top x + \sum_{k=1}^K p_k\, q_k^\top y_k \\
     \text{s.t.} & T_k x + W_k y_k = h_k, \quad k = 1, \dots, K, \\
     & Ax = b, \quad x \ge 0, \quad y_k \ge 0, \quad k = 1, \dots, K.
     \end{array}$$

     In order to have an accurate approximation of the 'true' distribution $P$, the number $K$ of required scenarios typically grows exponentially with the dimension $m$.
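To get a feel for this exponential growth, the following minimal sketch assumes (purely for illustration, this is not a construction from the slides) that each of the $m$ random components is discretized independently into the same number of points, so that the full grid has $K = r^m$ scenarios. The function name `grid_scenarios` and the choice $r = 3$ are hypothetical.

```python
def grid_scenarios(points_per_component: int, m: int) -> int:
    """Number of scenarios K when each of m components is discretized
    independently into the same number of points (full product grid)."""
    return points_per_component ** m

# Even a coarse 3-point discretization per component explodes with m.
for m in (1, 5, 10, 20, 50):
    print(f"m = {m:2d}: K = {grid_scenarios(3, m)}")
```

Since the linear program above carries one copy of the second-stage variables $y_k$ per scenario, its size grows at the same rate as $K$.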

  4. Computational complexity of solving two-stage linear stochastic programs (deterministic point of view): computing approximate solutions, to a sufficiently high accuracy, of linear two-stage stochastic programs with fixed recourse is #P-hard even if the random problem data is governed by independent uniform distributions (Dyer and Stougie, 2006; Hanasusanto, Kuhn and Wiesemann, 2016).

     Sample complexity of solving stochastic programs. Generate a sample $\xi^j$, $j = 1, \dots, N$, of the random vector $\xi$ and approximate the expectation $\mathbb{E}[F(x,\xi)]$ by the respective sample average. This leads to the following so-called Sample Average Approximation (SAA) of the 'true' problem:

     $$\min_{x \in X} \Big\{ \hat{f}_N(x) = \frac{1}{N} \sum_{j=1}^N F(x,\xi^j) \Big\}.$$
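The SAA construction can be sketched on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the slides: a one-dimensional newsvendor-type cost `F`, the cost parameters `C`, `B`, `H`, the uniform demand distribution, and the feasible set $X = [0, \infty)$.

```python
import random

# Hypothetical cost parameters (not from the slides): unit order cost C,
# shortage penalty B, holding penalty H.
C, B, H = 1.0, 3.0, 0.5

def F(x, xi):
    """Illustrative newsvendor-type cost, convex and piecewise linear in x."""
    return C * x + B * max(xi - x, 0.0) + H * max(x - xi, 0.0)

def saa_objective(x, sample):
    """Sample average f_hat_N(x) = (1/N) * sum_j F(x, xi^j)."""
    return sum(F(x, xi) for xi in sample) / len(sample)

def solve_saa(sample):
    """Minimize f_hat_N over X = [0, inf).

    f_hat_N is convex and piecewise linear with kinks only at the sample
    points, so a minimizer is found among 0 and the sample points.
    """
    candidates = [0.0] + list(sample)
    x_hat = min(candidates, key=lambda x: saa_objective(x, sample))
    return x_hat, saa_objective(x_hat, sample)

rng = random.Random(0)
sample = [rng.uniform(0.0, 100.0) for _ in range(500)]  # xi^1, ..., xi^N
x_hat, v_hat = solve_saa(sample)
print(f"SAA solution {x_hat:.2f}, SAA optimal value {v_hat:.2f}")
```

For this toy cost the true solution is the $(B - C)/(B + H)$ quantile of the demand distribution, roughly 57.1 for the uniform distribution on $[0, 100]$, so the SAA solution can be compared against it directly.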

  5. Slow convergence of the sample average $\hat{f}_N(x)$ to the expectation $f(x)$. By the Central Limit Theorem, for fixed $x$ the error is

     $$\hat{f}_N(x) - f(x) = O_p(N^{-1/2}).$$

     Let $\hat{v}_N$ be the optimal value of the SAA problem, and let $v^0$ and $S^0$ be the optimal value and the set of optimal solutions of the true problem. Then under mild regularity conditions

     $$\hat{v}_N = \min_{x \in S^0} \hat{f}_N(x) + o_p(N^{-1/2}).$$

     In particular, if $S^0 = \{x^0\}$, then $N^{1/2}[\hat{v}_N - v^0] \Rightarrow \mathcal{N}(0, \sigma^2(x^0))$ (Shapiro, 1991).

  6. Large Deviations type bounds. Suppose that: $\varepsilon > \delta \ge 0$; the set $X$ is of finite diameter $D$; there is a constant $\sigma > 0$ such that

     $$M_{x',x}(t) \le \exp\{\sigma^2 t^2 / 2\}, \quad t \in \mathbb{R},\ x', x \in X,$$

     where $M_{x',x}(t)$ is the moment generating function of the random variable $F(x',\xi) - F(x,\xi) - \mathbb{E}[F(x',\xi) - F(x,\xi)]$; and there exists $\kappa(\xi)$ such that its moment generating function is finite valued in a neighborhood of zero and

     $$\big| F(x',\xi) - F(x,\xi) \big| \le \kappa(\xi)\, \| x' - x \|, \quad x', x \in X \text{ and a.e. } \xi.$$

     Then for $L = \mathbb{E}[\kappa(\xi)]$ and sample size

     $$N \ge \frac{8\sigma^2}{(\varepsilon - \delta)^2} \left[ n \log\left( \frac{O(1)\, D L}{(\varepsilon - \delta)^2} \right) + \log\left( \frac{2}{\alpha} \right) \right],$$

     we are guaranteed that $\Pr\big( \hat{S}^\delta_N \subset S^\varepsilon \big) \ge 1 - \alpha$. Here $\hat{S}^\delta_N$ and $S^\varepsilon$ are the sets of $\delta$-optimal and $\varepsilon$-optimal solutions of the SAA and true problems, respectively.

  7. Stochastic Approximation (SA) approach. Suppose that the problem is convex, i.e., the feasible set $X$ is convex and $F(\cdot,\xi)$ is convex for a.e. $\xi$. Classical SA algorithm:

     $$x_{j+1} = \Pi_X\big( x_j - \gamma_j G(x_j, \xi^j) \big),$$

     where $G(x,\xi) \in \partial_x F(x,\xi)$ is a calculated (sub)gradient, $\Pi_X$ is the orthogonal (Euclidean) projection onto $X$, and $\gamma_j = \theta / j$.

     Theoretical bound (assuming $f(\cdot)$ is strongly convex and differentiable):

     $$\mathbb{E}[f(x_j) - v^0] = O(j^{-1}),$$

     for an optimal choice of the constant $\theta$ (recall that $v^0$ is the optimal value of the true problem). This algorithm is very sensitive to the choice of $\theta$.
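The recursion and its sensitivity to $\theta$ can be sketched on a one-dimensional toy problem. The problem itself is an assumption chosen for illustration, not an example from the slides: $F(x,\xi) = \tfrac12 (x - \xi)^2$ with $\xi \sim \mathcal{N}(1,1)$ and $X = [-5, 5]$, so that $G(x,\xi) = x - \xi$ and the minimizer of $f$ is $x = 1$.

```python
import random

def project(x, lo, hi):
    """Euclidean projection onto the interval X = [lo, hi]."""
    return min(max(x, lo), hi)

def classical_sa(theta, n_iter, x1=5.0, lo=-5.0, hi=5.0, seed=0):
    """Classical SA with step sizes gamma_j = theta / j.

    Toy problem (an assumption for illustration): F(x, xi) = 0.5 * (x - xi)**2
    with xi ~ Normal(1, 1), so G(x, xi) = x - xi and the minimizer of f is 1.
    """
    rng = random.Random(seed)
    x = x1
    for j in range(1, n_iter + 1):
        xi = rng.gauss(1.0, 1.0)       # draw xi^j
        g = x - xi                     # gradient G(x_j, xi^j)
        x = project(x - (theta / j) * g, lo, hi)
    return x

x_good = classical_sa(theta=1.0, n_iter=10_000)
x_slow = classical_sa(theta=0.05, n_iter=10_000)
print(f"theta = 1.00: final iterate {x_good:.3f}")
print(f"theta = 0.05: final iterate {x_slow:.3f}")
```

With $\theta = 1$ the iterates reduce to a running sample mean and settle near 1, whereas with the underestimated $\theta = 0.05$ the initial error decays only like $j^{-\theta}$ and the iterate is still far from the minimizer after 10,000 steps.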

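  8. Robust SA approach (B. Polyak, 1990; Nemirovski). Constant step size variant: the sample size (number of iterations) $N$ is fixed in advance, the step size is $\gamma_j \equiv \gamma$, $j = 1, \dots, N$, and the output is the average

     $$\tilde{x}_N = \frac{1}{N} \sum_{j=1}^N x_j.$$

     Theoretical bound:

     $$\mathbb{E}[f(\tilde{x}_N) - v^0] \le \frac{D_X^2}{2\gamma N} + \frac{\gamma M^2}{2},$$

     where $D_X = \max_{x \in X} \| x - x_1 \|_2$ and $M^2 = \max_{x \in X} \mathbb{E}\, \| G(x,\xi) \|_2^2$.

     For the optimal (up to a factor $\theta$) step size $\gamma = \dfrac{\theta D_X}{M \sqrt{N}}$ we have

     $$\mathbb{E}[f(\tilde{x}_N) - v^0] \le \frac{D_X M}{2\theta \sqrt{N}} + \frac{\theta D_X M}{2\sqrt{N}} \le \frac{\kappa D_X M}{\sqrt{N}},$$

     where $\kappa = \max\{\theta, \theta^{-1}\}$. By the Markov inequality it follows that

     $$\Pr\big\{ f(\tilde{x}_N) - v^0 > \varepsilon \big\} \le \frac{\kappa D_X M}{\varepsilon \sqrt{N}},$$

     and hence, requiring the right-hand side to be at most $\alpha$, we arrive at the sample size estimate

     $$N \ge \frac{\kappa^2 D_X^2 M^2}{\varepsilon^2 \alpha^2}.$$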
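A minimal sketch of the constant-step variant with iterate averaging, together with the resulting sample size estimate. The toy problem is again an assumption for illustration only: $F(x,\xi) = \tfrac12 (x - \xi)^2$, $\xi \sim \mathcal{N}(1,1)$, $X = [-5, 5]$, for which $\mathbb{E}|G(x,\xi)|^2 = (x-1)^2 + 1$. The function names are hypothetical.

```python
import math
import random

def robust_sa(theta, n_iter, x1=5.0, lo=-5.0, hi=5.0, seed=0):
    """Constant-step SA with iterate averaging on a toy problem (assumed):
    F(x, xi) = 0.5 * (x - xi)**2, xi ~ Normal(1, 1), X = [lo, hi]."""
    rng = random.Random(seed)
    d_x = max(abs(lo - x1), abs(hi - x1))      # D_X = max_{x in X} |x - x_1|
    # M^2 = max_x E|x - xi|^2 = max_x (x - 1)^2 + 1 for this toy problem
    m = math.sqrt(max((lo - 1.0) ** 2, (hi - 1.0) ** 2) + 1.0)
    gamma = theta * d_x / (m * math.sqrt(n_iter))
    x, running_sum = x1, 0.0
    for _ in range(n_iter):
        running_sum += x                       # accumulates x_1, ..., x_N
        xi = rng.gauss(1.0, 1.0)
        x = min(max(x - gamma * (x - xi), lo), hi)   # projected step
    return running_sum / n_iter                # averaged iterate x_tilde_N

def robust_sa_sample_size(kappa, d_x, m, eps, alpha):
    """Smallest integer N with N >= kappa^2 D_X^2 M^2 / (eps^2 alpha^2)."""
    return math.ceil((kappa * d_x * m / (eps * alpha)) ** 2)

for theta in (0.2, 1.0, 5.0):
    print(f"theta = {theta}: averaged iterate {robust_sa(theta, 10_000):.3f}")
```

In this sketch the averaged iterate stays close to the minimizer $x = 1$ across a 25-fold range of $\theta$, which is the practical point of the robust variant. The price is visible in `robust_sa_sample_size`: the estimate scales like $\varepsilon^{-2} \alpha^{-2}$.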

  9. Multistage stochastic programming. Let $\xi_t$ be a random (stochastic) process. Denote by $\xi_{[t]} := (\xi_1, \dots, \xi_t)$ the history of the process $\xi_t$ up to time $t$. The values of the decision vector $x_t$, chosen at stage $t$, may depend on the information $\xi_{[t]}$ available up to time $t$, but not on the future observations. The decision process has the form

     $$\text{decision}(x_0) \rightsquigarrow \text{observation}(\xi_1) \rightsquigarrow \text{decision}(x_1) \rightsquigarrow \cdots \rightsquigarrow \text{observation}(\xi_T) \rightsquigarrow \text{decision}(x_T).$$

     Risk neutral $T$-stage stochastic programming problem:

     $$\begin{array}{ll}
     \min\limits_{x_1,\, x_2(\cdot), \dots, x_T(\cdot)} & \mathbb{E}\Big[ F_1(x_1) + F_2\big(x_2(\xi_{[2]}), \xi_2\big) + \cdots + F_T\big(x_T(\xi_{[T]}), \xi_T\big) \Big] \\
     \text{s.t.} & x_1 \in \mathcal{X}_1, \quad x_t(\xi_{[t]}) \in \mathcal{X}_t\big(x_{t-1}(\xi_{[t-1]}), \xi_t\big), \quad t = 2, \dots, T.
     \end{array}$$

     In the linear case, $F_t(x_t, \xi_t) := c_t^\top x_t$ and

     $$\mathcal{X}_t(x_{t-1}, \xi_t) := \{ x_t : B_t x_{t-1} + A_t x_t = b_t,\ x_t \ge 0 \}, \quad t = 2, \dots, T.$$

  10. Optimization is performed over feasible policies (also called decision rules). A policy is a sequence of (measurable) functions $x_t = x_t(\xi_{[t]})$, $t = 1, \dots, T$. Each $x_t(\xi_{[t]})$ is a function of the data process up to time $t$; this ensures the nonanticipative property of the considered policy. If the number of realizations (scenarios) of the process $\xi_t$ is finite, then the above (linear) problem can be written as one large (linear) programming problem.

  11. Dynamic programming equations. Going recursively backwards in time, at stage $T$ consider

      $$Q_T(x_{T-1}, \xi_T) := \inf_{x_T \in \mathcal{X}_T(x_{T-1}, \xi_T)} F_T(x_T, \xi_T).$$

      At stages $t = T-1, \dots, 2$, consider

      $$Q_t(x_{t-1}, \xi_{[t]}) := \inf_{x_t \in \mathcal{X}_t(x_{t-1}, \xi_t)} \Big\{ F_t(x_t, \xi_t) + \underbrace{\mathbb{E}\big[ Q_{t+1}(x_t, \xi_{[t+1]}) \,\big|\, \xi_{[t]} \big]}_{\mathcal{Q}_{t+1}(x_t,\, \xi_{[t]})} \Big\}.$$

      At the first stage solve:

      $$\min_{x_1 \in \mathcal{X}_1} F_1(x_1) + \mathbb{E}[Q_2(x_1, \xi_2)].$$

      If the random process is stagewise independent, i.e., $\xi_{t+1}$ is independent of $\xi_{[t]}$, then $\mathcal{Q}_{t+1}(x_t) = \mathbb{E}[Q_{t+1}(x_t, \xi_{t+1})]$ does not depend on $\xi_{[t]}$.

  12. For example, suppose that the problem is linear and only the right-hand side vectors $b_t$ are random and can be modeled as a (first order) autoregressive process

      $$b_t = \mu + \Phi b_{t-1} + \varepsilon_t,$$

      where $\mu$ and $\Phi$ are a (deterministic) vector and regression matrix, respectively, and the error process $\varepsilon_t$, $t = 1, \dots, T$, is stagewise independent. The corresponding feasibility constraints can be written in terms of $x_t$ and $b_t$ as

      $$B_t x_{t-1} + A_t x_t \le b_t, \qquad \Phi b_{t-1} - b_t + \mu + \varepsilon_t = 0.$$

      That is, in terms of the decision variables $(x_t, b_t)$ this becomes a linear multistage stochastic programming problem governed by the stagewise independent random process $\varepsilon_1, \dots, \varepsilon_T$.
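The autoregressive recursion above can be sketched as follows. The function `ar1_path`, the list-based vector representation, and the numerical values of `mu`, `phi` and the Gaussian noise are all hypothetical choices for illustration; the only structure taken from the slide is $b_t = \mu + \Phi b_{t-1} + \varepsilon_t$ with independently drawn $\varepsilon_t$.

```python
import random

def ar1_path(mu, phi, b0, eps_path):
    """Generate b_t = mu + Phi b_{t-1} + eps_t for a given noise path.

    mu, b0 and each eps_t are lists of length d; phi is a d x d list of rows.
    """
    d = len(mu)
    path, b_prev = [], list(b0)
    for eps in eps_path:
        b_t = [mu[i] + sum(phi[i][k] * b_prev[k] for k in range(d)) + eps[i]
               for i in range(d)]
        path.append(b_t)
        b_prev = b_t
    return path

# Stagewise independent noise: each eps_t is drawn independently of the past.
rng = random.Random(0)
T, d = 5, 2
eps_path = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(T)]
mu = [1.0, 2.0]
phi = [[0.5, 0.0], [0.1, 0.3]]
for t, b_t in enumerate(ar1_path(mu, phi, [0.0, 0.0], eps_path), start=1):
    print(t, [round(v, 3) for v in b_t])
```

The process $b_t$ is not stagewise independent, but the only randomness fed into the recursion is the independent noise path, which is why treating $b_t$ as part of the state restores stagewise independence.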

  13. Discretization by Monte Carlo sampling. Random samples $\xi_t^j = (c_t^j, B_t^j, A_t^j, b_t^j)$, $j = 1, \dots, N_t$, of the respective $\xi_t$, $t = 2, \dots, T$, are generated independently of each other, and the corresponding scenario tree is constructed by connecting every ancestor node at stage $t-1$ with the same set of children nodes $\xi_t^1, \dots, \xi_t^{N_t}$. In that way the stagewise independence is preserved in the generated scenario tree. We refer to the constructed problem as the Sample Average Approximation (SAA) problem.

      The total number of scenarios of the SAA problem is given by the product $N = \prod_{t=2}^T N_t$ and quickly becomes astronomically large as the number of stages increases, even for moderate values of the sample sizes $N_t$.
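The product formula is easy to evaluate directly. The sketch below uses hypothetical values ($T = 10$ stages with $N_t = 100$ for every $t$) and also reports the number of distinct sampled data points, which is only the sum of the $N_t$ because every node at stage $t-1$ shares the same children.

```python
import math

def total_scenarios(stage_sample_sizes):
    """N = product of N_t over t = 2, ..., T (number of root-to-leaf paths)."""
    return math.prod(stage_sample_sizes)

def distinct_samples(stage_sample_sizes):
    """Number of sampled realizations actually generated: sum of the N_t."""
    return sum(stage_sample_sizes)

sizes = [100] * 9   # hypothetical: T = 10 stages, N_t = 100 for t = 2, ..., 10
print("scenarios:", total_scenarios(sizes))
print("distinct samples:", distinct_samples(sizes))
```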

  14. For $T = 3$, under certain regularity conditions, for $\varepsilon > 0$ and $\alpha \in (0,1)$, and sample sizes $N_1$ and $N_2$ satisfying

      $$O(1) \left[ \left( \frac{D_1 L_1}{\varepsilon} \right)^{n_1} \exp\left\{ -\frac{O(1) N_1 \varepsilon^2}{\sigma_1^2} \right\} + \left( \frac{D_2 L_2}{\varepsilon} \right)^{n_2} \exp\left\{ -\frac{O(1) N_2 \varepsilon^2}{\sigma_2^2} \right\} \right] \le \alpha,$$

      we have that any first-stage $\varepsilon/2$-optimal solution of the SAA problem is an $\varepsilon$-optimal first-stage solution of the true problem with probability at least $1 - \alpha$.

      In particular, suppose that $N_1 = N_2$ and take $L := \max\{L_1, L_2\}$, $D := \max\{D_1, D_2\}$, $\sigma^2 := \max\{\sigma_1^2, \sigma_2^2\}$ and $n := \max\{n_1, n_2\}$. Then the required sample size $N_1 = N_2$ satisfies

      $$N_1 \ge \frac{O(1)\, \sigma^2}{\varepsilon^2} \left[ n \log\left( \frac{O(1)\, D L}{\varepsilon} \right) + \log\left( \frac{1}{\alpha} \right) \right],$$

      with total number of scenarios $N = N_1^2$ (Shapiro, 2006).
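To illustrate how the per-stage estimate translates into the total number of scenarios, the sketch below evaluates the bound numerically. The two generic $O(1)$ constants are unspecified in the slides; setting them to 1 through the hypothetical parameters `c1` and `c2` is an assumption made only so that the formula can be evaluated, and the resulting numbers indicate scaling rather than actual requirements.

```python
import math

def three_stage_sample_size(n, D, L, sigma, eps, alpha, c1=1.0, c2=1.0):
    """Per-stage sample size N_1 = N_2 and total scenario count N = N_1**2.

    c1 and c2 stand in for the unspecified O(1) constants (assumed to be 1).
    """
    bound = (c1 * sigma ** 2 / eps ** 2) * (
        n * math.log(c2 * D * L / eps) + math.log(1.0 / alpha)
    )
    n1 = max(math.ceil(bound), 1)   # at least one sample per stage
    return n1, n1 * n1

for eps in (0.1, 0.05, 0.01):
    n1, total = three_stage_sample_size(n=10, D=1.0, L=1.0, sigma=1.0,
                                        eps=eps, alpha=0.01)
    print(f"eps = {eps}: N_1 = {n1}, total scenarios N = {total}")
```

Because the total is the square of the per-stage size, the roughly $\varepsilon^{-2}$ growth of $N_1$ becomes roughly $\varepsilon^{-4}$ growth of $N$ for three stages.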
