Extending SDDP-style Algorithms for Multistage Stochastic Programming
Dave Morton, Industrial Engineering & Management Sciences, Northwestern University
Joint work with Oscar Dowson, Daniel Duque, and Bernardo Pagnoncelli
Collaborators
Hydroelectric Power Itaipu (14 GW)
Yuba, Bear and South Feather Hydrological Basin
SDDP: Stochastic Dual Dynamic Programming
SLP-T

$$z^* = \min_{x_1 \ge 0} \; c_1 x_1 + \mathbb{E}_{\xi_2 \mid \xi_1}\!\left[ V_2(x_1, \xi_2) \right] \quad \text{s.t. } A_1 x_1 = B_1 x_0 + b_1,$$

where, for $t = 2, \dots, T$,

$$V_t(x_{t-1}, \xi_t) = \min_{x_t \ge 0} \; c_t x_t + \mathbb{E}_{\xi_{t+1} \mid \xi_1, \dots, \xi_t}\!\left[ V_{t+1}(x_t, \xi_{t+1}) \right] \quad \text{s.t. } A_t x_t = B_t x_{t-1} + b_t,$$

and where $V_{T+1} \equiv 0$. Each $V_t(\cdot, \xi_t)$ is piecewise linear and convex.
SLP-T Assumptions for SDDP
• Relatively complete recourse, finite optimal solution
• ξ_t = (A_t, B_t, b_t, c_t) is inter-stage independent
• Or, (A_t, B_t, c_t) is inter-stage independent and b_t satisfies, e.g.,
  – b_t = Ψ(b_{t-1}) + ε_t with ε_t inter-stage independent; or,
  – b_t = Ψ(b_{t-1}) · ε_t with ε_t inter-stage independent
• Sample space: Ω_t = Σ_2 × Σ_3 × ··· × Σ_t with |Σ_t| modest
• T may be large
What Does “Solution” Mean?
A solution is a policy.
SDDP
[Figure: (a) Forward Pass; (b) Backward Pass.]
SDDP Master Programs

$$\begin{aligned} \min_{x_t,\, \theta_t} \;\; & c_t x_t + \theta_t \\ \text{s.t. } & A_t x_t = B_t x_{t-1} + b_t \\ & -G_t^k x_t + \theta_t \ge g_t^k, \quad k = 1, 2, \dots, K \\ & x_t \ge 0 \end{aligned}$$
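For concreteness, here is a minimal sketch of solving one such stage ("master") LP with SciPy, assuming the stage data A_t, B_t, b_t, c_t and the accumulated cuts (G_k, g_k) are given as NumPy arrays. The function name, argument layout, and the θ lower bound are illustrative assumptions, not part of the talk or of any particular SDDP implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_stage(A_t, B_t, b_t, c_t, x_prev, cuts, theta_lb=-1e6):
    """Solve  min c_t'x + theta  s.t.  A_t x = B_t x_prev + b_t,
    -G_k x + theta >= g_k for each cut (G_k, g_k), and x >= 0."""
    n = len(c_t)
    obj = np.append(c_t, 1.0)                                 # decision vector (x_t, theta)
    A_eq = np.hstack([A_t, np.zeros((A_t.shape[0], 1))])      # theta absent from flow constraints
    b_eq = B_t @ x_prev + b_t
    if cuts:                                                  # -G_k x + theta >= g_k  <=>  G_k x - theta <= -g_k
        A_ub = np.array([np.append(G_k, -1.0) for G_k, g_k in cuts])
        b_ub = np.array([-g_k for G_k, g_k in cuts])
    else:
        A_ub, b_ub = None, None
    bounds = [(0, None)] * n + [(theta_lb, None)]             # x_t >= 0, theta bounded below
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    # res.eqlin.marginals holds duals of the equality rows (HiGHS methods);
    # the backward pass uses these to build a new cut at x_prev.
    return res.x[:n], res.x[n], res.eqlin.marginals
```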
Partially Observable Multistage Stochastic Programming
Or, an alternative to DRO when you don't really know the distribution
An apology: not talking about Wasserstein-based DRO for SLP-T via an SDDP algorithm (with Daniel Duque)
Policy Graphs (Dowson)
A policy graph for SLP-3 with inter-stage independence: a chain of nodes 1 → 2 → 3.
It unfolds to a scenario tree: 1 → {2H, 2L} → {3HH, 3HL, 3LH, 3LL}.
Policy Graphs
[Figures: a Markov-switching model; random transitions.]
Inventory Example

[Policy graph: from the root R, transition with probability 1/2 to D_A or to D_B; each D_i leads to H_i with probability 1, and each H_i returns to D_i with probability ρ.]

Demand model A: P(ω = 1) = 0.2, P(ω = 2) = 0.8
Demand model B: P(ω = 1) = 0.8, P(ω = 2) = 0.2

$$D_i:\quad D_i(x) = \min_{u,\, x' \ge 0} \; u + \mathbb{E}_\omega\!\left[ H_i(x', \omega) \right] \quad \text{s.t. } x' = x + u$$

$$H_i:\quad H_i(x, \omega) = \min_{u,\, x' \ge 0} \; 2u + x' + \rho D_i(x') \quad \text{s.t. } x' = x + u - \omega$$
Policy Graphs
Each node i: incoming state x, noise ω ∈ Ω_i, control u = π_i(x, ω), outgoing state x′ = T_i(x, u, ω), one-step cost C_i(x, u, ω).
A policy graph:
• G = (R, N, E, Φ)
• ω_j ∈ Ω_j: node-wise independent noise
• feasible controls: u ∈ U_i(x, ω)
• transition function: x′ = T_i(x, u, ω)
• one-step cost function: C_i(x, u, ω)
Policy Graphs

$$\min_\pi \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}\!\left[ V_i(x_R, \omega) \right] \tag{1}$$

where

$$\begin{aligned} V_i(x, \omega) = \min_{\bar{x},\, u,\, x'} \;\; & C_i(\bar{x}, u, \omega) + \mathbb{E}_{j \in i^+;\, \varphi \in \Omega_j}\!\left[ V_j(x', \varphi) \right] \\ \text{s.t. } & \bar{x} = x \\ & u \in U_i(\bar{x}, \omega) \\ & x' = T_i(\bar{x}, u, \omega) \end{aligned} \tag{2}$$

Goal: Find π_i(x, ω) that solves (1) for each i ∈ N, x, and ω.
(A1) N is finite
(A2) Ω_i is finite and ω_i is node-wise independent for all i ∈ N
(A3) Excluding the cost-to-go term, subproblem (2) is an LP
(A4) Subproblem (2) has a finite optimal solution
(A5) A leaf node is hit with probability 1 (or graph G is acyclic)
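Once a policy π is in hand, the objective in (1) can be estimated by Monte Carlo simulation of the graph. Below is a hedged sketch of one forward pass on an acyclic policy graph; the node attributes (children, child_probs, noises, noise_probs, cost, transition) and the policy lookup are hypothetical names for illustration, not the SDDP.jl interface.

```python
import random

def forward_pass(graph, policy, x_root):
    """Simulate one path: sample a child, sample its noise, apply the policy,
    accumulate cost, and move to the outgoing state x' = T_i(x, u, omega)."""
    x, node, total_cost = x_root, graph.root, 0.0
    while node.children:                               # stop at a leaf (acyclic case, assumption (A5))
        node = random.choices(node.children, weights=node.child_probs)[0]
        omega = random.choices(node.noises, weights=node.noise_probs)[0]
        u = policy[node.name](x, omega)                # u = pi_i(x, omega)
        total_cost += node.cost(x, u, omega)           # C_i(x, u, omega)
        x = node.transition(x, u, omega)               # x' = T_i(x, u, omega)
    return total_cost
```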
Policy Graphs with Partial Observability
Extend the policy graph to G = (R, N, E, Φ, A), where A partitions N:

$$A \cap A' = \emptyset \ \text{for}\ A \ne A', \qquad \bigcup_{A \in \mathcal{A}} A = \mathcal{N}$$

We know the current ambiguity set A, but not which node.
Full observability: A = {{i} : i ∈ N}, i.e., every ambiguity set has |A| = 1.
But we could have |A| = 2, where we know the stage but not the node.
Updates to the Belief State
For the inventory policy graph above: A = {A_1, A_2}, with A_1 = {D_A, D_B} and A_2 = {H_A, H_B}.

By Bayes' rule,

$$P\{\text{Node} = k \mid \omega, A\} = \frac{\mathbb{1}_{k \in A} \cdot P\{\omega \mid \text{Node} = k\}\, P\{\text{Node} = k\}}{P\{\omega\}}$$

which gives the belief update

$$b_k \leftarrow \frac{\left[\mathbb{1}_{k \in A} \cdot P(\omega \in \Omega_k)\right] \sum_{i \in \mathcal{N}} b_i \phi_{ik}}{\sum_{j \in A} \left( \sum_{i \in \mathcal{N}} b_i \phi_{ij} \right) P(\omega \in \Omega_j)},
\qquad \text{or compactly} \qquad
b \leftarrow B(b, \omega) = \frac{D_\omega A \Phi^\top b}{\sum_{j \in A} \left( \sum_{i \in \mathcal{N}} b_i \phi_{ij} \right) P(\omega \in \Omega_j)}$$
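The update can be coded directly. Below is a minimal sketch, assuming Phi holds the transition probabilities φ_ij, A is the index set of the current ambiguity set, and lik[k] = P(ω ∈ Ω_k) is the likelihood of the observed noise under node k; all names are illustrative.

```python
import numpy as np

def belief_update(b, Phi, A, lik):
    """Bayesian update b <- B(b, omega), restricted to the ambiguity set A."""
    prior = b @ Phi               # prior over the next node: sum_i b_i * phi_ik
    post = np.zeros_like(prior)
    post[A] = prior[A] * lik[A]   # zero outside A, weight by the likelihood of omega
    return post / post.sum()      # normalize so the posterior belief sums to one
```

In the inventory example, A would index {D_A, D_B} before demand is observed and {H_A, H_B} after.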
Policy Graphs with Partial Observability
Each node i: incoming state and belief (x, b), noise ω ∈ Ω_i, belief update b ← B(b, ω), control u = π_i(x, ω, b), outgoing state x′ = T_i(x, u, ω) with belief b, one-step cost C_i(x, u, ω).
• All nodes in an ambiguity set have the same C_i, T_i, and U_i
• Children i^+, transition probabilities φ_ij, and even Ω_i may differ
Policy Graphs with Partial Observability

$$\min_\pi \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}\!\left[ V_i\big(x_R, B_i(b_R, \omega), \omega\big) \right] \tag{3}$$

where

$$\begin{aligned} V_i(x, b, \omega) = \min_{\bar{x},\, u,\, x'} \;\; & C_i(\bar{x}, u, \omega) + V(x', b) \\ \text{s.t. } & \bar{x} = x \\ & u \in U_i(\bar{x}, \omega) \\ & x' = T_i(\bar{x}, u, \omega) \end{aligned}$$

and where

$$V(x', b) = \sum_{j \in \mathcal{N}} \sum_{k \in \mathcal{N}} b_j \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k\big(x', B_k(b, \varphi), \varphi\big)$$

Goal: Find π_A(x, b, ω) that solves (3) for each A ∈ A, x, b, and ω.
Saddle Property of the Cost-to-go Function

$$\begin{aligned} V_i(x, b, \omega) = \min_{\bar{x},\, u,\, x'} \;\; & C_i(\bar{x}, u, \omega) + V(x', b) \\ \text{s.t. } & \bar{x} = x \\ & u \in U_i(\bar{x}, \omega) \\ & x' = T_i(\bar{x}, u, \omega) \end{aligned}$$

where

$$V(x', b) = \sum_{j \in \mathcal{N}} \sum_{k \in \mathcal{N}} b_j \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k\big(x', B_k(b, \varphi), \varphi\big)$$

Assume (A1)–(A5) with G acyclic.
Lemma 1. Fix i, b, ω. Then V_i(x, b, ω) is piecewise linear and convex in x.
Lemma 2. Fix x′. Then V(x′, b) is piecewise linear and concave in b.
Theorem 1. V(x′, b) is a piecewise linear saddle function: convex in x′ for fixed b and concave in b for fixed x′.
Linear Interpolation: Towards an SDDP Algorithm
[Figure: piecewise linear interpolation of V(b) over belief points b̄^1 = 0, b̄^2, b̄^3, b̄^4, b̄^5 = 1.]

$$\begin{aligned} V(b) = \max_{\gamma \ge 0} \;\; & \sum_{k=1}^K \gamma_k V(\bar{b}^k) \\ \text{s.t. } & \sum_{k=1}^K \gamma_k = 1 \\ & \sum_{k=1}^K \gamma_k \bar{b}^k = b \end{aligned}$$
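This interpolation is itself a small LP. Below is a minimal sketch of evaluating it, assuming b_pts stacks the belief points b̄^k as rows and v_pts holds the sampled values V(b̄^k); the names and the use of SciPy are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def interpolated_value(b, b_pts, v_pts):
    """max_{gamma >= 0} sum_k gamma_k V(b_k)  s.t.  sum_k gamma_k b_k = b,  sum_k gamma_k = 1."""
    K = len(v_pts)
    A_eq = np.vstack([b_pts.T, np.ones((1, K))])   # rows: sum_k gamma_k b_k = b  and  sum_k gamma_k = 1
    b_eq = np.append(np.atleast_1d(b), 1.0)
    res = linprog(-np.asarray(v_pts), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * K, method="highs")
    return -res.fun                                # flip sign: linprog minimizes
```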
Saddle Function with Interpolated Cuts
[Figure: the saddle function V(x′, b) plotted over x′ and b.]
Computing Cuts for What?

$$\begin{aligned} V_i(x, b, \omega) = \min_{\bar{x},\, u,\, x'} \;\; & C_i(\bar{x}, u, \omega) + V_A(x', b) \\ \text{s.t. } & \bar{x} = x \\ & u \in U_i(\bar{x}, \omega) \\ & x' = T_i(\bar{x}, u, \omega) \end{aligned}$$

where

$$V_A(x', b) = \sum_{j \in A} \sum_{k \in j^+} b_j \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k\big(x', B_k(b, \varphi), \varphi\big)$$
SDDP Master Program

$$\begin{aligned} V_i^K(x, b, \omega) = \min_{\bar{x},\, u,\, x',\, \theta}\, \max_{\gamma \ge 0} \;\; & C_i(\bar{x}, u, \omega) + \sum_{k=1}^K \gamma_k \theta_k \\ \text{s.t. } & \bar{x} = x && [\lambda] \\ & u \in U_i(\bar{x}, \omega) \\ & x' = T_i(\bar{x}, u, \omega) \\ & \sum_{k=1}^K \gamma_k \bar{b}^k = b && [\mu] \\ & \sum_{k=1}^K \gamma_k = 1 && [\nu] \\ & \theta_k \ge G^k x' + g^k, \quad k = 1, \dots, K \end{aligned}$$
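Taking the LP dual of the inner maximization over γ, with μ and ν the multipliers on the two equality constraints, collapses the min–max into the single-level LP on the next slide. A sketch of that step:

```latex
% LP dual of the inner maximization over gamma; substituting
% theta_k >= G^k x' + g^k then yields the constraints
% mu' b^k + nu >= G^k x' + g^k of the single-level master program.
\max_{\gamma \ge 0} \left\{ \sum_{k=1}^{K} \gamma_k \theta_k \;:\;
    \sum_{k=1}^{K} \gamma_k \bar b^{\,k} = b \;[\mu], \;\;
    \sum_{k=1}^{K} \gamma_k = 1 \;[\nu] \right\}
\;=\;
\min_{\mu,\nu} \left\{ \mu^\top b + \nu \;:\;
    \mu^\top \bar b^{\,k} + \nu \ge \theta_k, \;\; k = 1,\dots,K \right\}
```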
SDDP Master Program

$$\begin{aligned} V_i^K(x, b, \omega) = \min_{\bar{x},\, u,\, x',\, \mu,\, \nu} \;\; & C_i(\bar{x}, u, \omega) + \mu^\top b + \nu \\ \text{s.t. } & \bar{x} = x && [\lambda] \\ & u \in U_i(\bar{x}, \omega) \\ & x' = T_i(\bar{x}, u, \omega) \\ & \mu^\top \bar{b}^k + \nu \ge G^k x' + g^k, \quad k = 1, \dots, K \end{aligned}$$

Theorem 2. Assume (A1)–(A5) with G acyclic. Let the sample paths of the “obvious” SDDP algorithm be generated independently at each iteration. Then the algorithm converges to an optimal policy almost surely in a finite number of iterations.
Inventory Example (recap)
Inventory Example: Train Four Policies
1. Fully observable: distribution known upon departing R
2. Partially observable: ambiguity partition {D_A, D_B}, {H_A, H_B}
3. Risk-neutral average demand: demand equally likely to be 1 or 2
4. DRO average demand: modified χ² method with radius 0.25
Inventory Example: Train Four Policies
• 2,000 out-of-sample costs over 50 periods; quartiles shown; ρ = 0.9
[Figure: box plots of discounted and undiscounted simulated cost ($) for the fully observable, partially observable, risk-neutral average-demand, and DRO average-demand policies.]
Inventory Example
One sample path of the partially observable policy.
[Figure: (a) belief in model A, (b) first-stage buy (units), and (c) inventory (units), each over periods 1–12.]
Concluding Thoughts
• Partially observable multistage stochastic programs
  – Saddle-cut SDDP algorithm
  – SDDP.jl (Dowson and Kapelevich)
• Related saddle-function work in stochastic programming
  – Baucke et al. (2018): risk measures
  – Downward et al. (2018): stage-wise dependent objective coefficients
• Closely related ideas are well known in POMDPs
  – Contextual, multi-model, concurrent MDPs
  – We allow continuous state and action spaces via convexity
• Countably infinite LPs for the cyclic case
• We did not handle decision-dependent learning
  – b ← B(b, ω) versus b ← B(b, ω, u)
Concluding Thoughts
Paper: http://www.optimization-online.org/DB_HTML/2019/03/7141.html