  1. Extending SDDP-style Algorithms for Multistage Stochastic Programming Dave Morton Industrial Engineering & Management Sciences Northwestern University Joint work with: Oscar Dowson, Daniel Duque, and Bernardo Pagnoncelli

  2. Collaborators

  3. Hydroelectric Power Itaipu (14 GW)

  4. Yuba, Bear and South Feather Hydrological Basin

  5. SDDP: Stochastic Dual Dynamic Programming

  6. SLP-$T$
  $$z^* = \min_{x_1 \ge 0} \; c_1 x_1 + \mathbb{E}_{\xi_2 \mid \xi_1}\!\left[ V_2(x_1, \xi_2) \right] \quad \text{s.t. } A_1 x_1 = B_1 x_0 + b_1,$$
  where for $t = 2, \ldots, T$,
  $$V_t(x_{t-1}, \xi_t) = \min_{x_t \ge 0} \; c_t x_t + \mathbb{E}_{\xi_{t+1} \mid \xi_1, \ldots, \xi_t}\!\left[ V_{t+1}(x_t, \xi_{t+1}) \right] \quad \text{s.t. } A_t x_t = B_t x_{t-1} + b_t,$$
  and where $V_{T+1} \equiv 0$. Each $V_t(\cdot, \xi_t)$ is piecewise linear and convex.

  7. SLP-$T$: Assumptions for SDDP
  • Relatively complete recourse; finite optimal solution
  • $\xi_t = (A_t, B_t, b_t, c_t)$ is inter-stage independent
  • Or, $(A_t, B_t, c_t)$ is inter-stage independent and $b_t$ satisfies, e.g.,
    – $b_t = \Psi(b_{t-1}) + \varepsilon_t$ with $\varepsilon_t$ inter-stage independent; or,
    – $b_t = \Psi(b_{t-1}) \cdot \varepsilon_t$ with $\varepsilon_t$ inter-stage independent
  • Sample space: $\Omega_t = \Sigma_2 \times \Sigma_3 \times \cdots \times \Sigma_t$ with $|\Sigma_t|$ modest
  • $T$ may be large

  8. What Does “Solution” Mean? A solution is a policy

  9. SDDP [figure: sampled scenario-tree traversals, (a) Forward Pass, (b) Backward Pass]

  10. SDDP Master Programs
  $$\min_{x_t, \theta_t} \; c_t x_t + \theta_t \quad \text{s.t. } A_t x_t = B_t x_{t-1} + b_t, \quad -G_t^k x_t + \theta_t \ge g_t^k, \; k = 1, 2, \ldots, K, \quad x_t \ge 0$$
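
  The forward/backward mechanics in the figure, together with the cut-augmented master above, fit in a few dozen lines. Below is a minimal, self-contained sketch on a toy multistage inventory problem; the data, the helper `solve_stage`, and the use of scipy's HiGHS-backed `linprog` (whose `eqlin.marginals` supply the duals that become cut slopes) are my own illustrative choices, not the talk's models or implementation.

```python
import numpy as np
from scipy.optimize import linprog

T = 4                                    # number of stages
DEMANDS = [1.0, 2.0, 3.0]                # equally likely, stage-wise independent
C_ORDER, C_HOLD, X0 = 2.0, 1.0, 1.0
cuts = {t: [] for t in range(1, T + 2)}  # cuts[t]: list of (slope a, intercept g) with theta >= a*x + g

def solve_stage(t, x_in, demand):
    """Stage-t master over z = (u, x', theta): min C_ORDER*u + C_HOLD*x' + theta
    s.t. x' - u = x_in - demand and a_k*x' - theta <= -g_k for each cut k.
    Returns the optimal value, the outgoing state x', and the dual of the state equation."""
    obj = [C_ORDER, C_HOLD, 1.0]
    A_ub = [[0.0, a, -1.0] for (a, g) in cuts[t + 1]]
    b_ub = [-g for (a, g) in cuts[t + 1]]
    res = linprog(obj, A_ub=A_ub or None, b_ub=b_ub or None,
                  A_eq=[[-1.0, 1.0, 0.0]], b_eq=[x_in - demand],
                  bounds=[(0, None)] * 3, method="highs")
    return res.fun, res.x[1], res.eqlin.marginals[0]

rng = np.random.default_rng(0)
for iteration in range(30):
    # (a) Forward pass: sample one demand path and record the visited states.
    states = [X0]
    for t in range(1, T + 1):
        _, x_out, _ = solve_stage(t, states[-1], rng.choice(DEMANDS))
        states.append(x_out)
    # (b) Backward pass: add one averaged cut per stage at the visited state.
    for t in range(T, 1, -1):
        sols = [solve_stage(t, states[t - 1], d) for d in DEMANDS]
        slope = np.mean([lam for (_, _, lam) in sols])
        value = np.mean([v for (v, _, _) in sols])
        cuts[t].append((slope, value - slope * states[t - 1]))  # E[V_t](x) >= slope*x + g

lower_bound = np.mean([solve_stage(1, X0, d)[0] for d in DEMANDS])
print(f"lower bound on expected cost after training: {lower_bound:.2f}")
```

  Each backward-pass cut has exactly the form $-G_t^k x_t + \theta_t \ge g_t^k$ of the master program above, with the slope taken from the averaged dual of the state-linking constraint.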

  11. Partially Observable Multistage Stochastic Programming
  Or, an alternative to DRO when you don't really know the distribution.
  An apology: not talking about Wasserstein-based DRO for SLP-$T$ via an SDDP algorithm (with Daniel Duque).

  12. Policy Graphs (Dowson)
  A policy graph for SLP-3 with inter-stage independence [diagram: chain of nodes 1 → 2 → 3] unfolds to a scenario tree [diagram: nodes 1; 2H, 2L; 3HH, 3HL, 3LH, 3LL].

  13. Policy Graphs
  A Markov-switching model [diagram]; random transitions [diagram].

  14. Inventory Example
  [policy graph: root R branches to demand-model nodes D_A and D_B; each D_i leads to H_i, which returns to D_i with probability ρ]
  Demand model A: $P(\omega = 1) = 0.2$, $P(\omega = 2) = 0.8$. Demand model B: $P(\omega = 1) = 0.8$, $P(\omega = 2) = 0.2$.
  $$D_i: \quad D_i(x) = \min_{u, x' \ge 0} \; u + \mathbb{E}_\omega\!\left[ H_i(x', \omega) \right] \quad \text{s.t. } x' = x + u$$
  $$H_i: \quad H_i(x, \omega) = \min_{u, x' \ge 0} \; 2u + x' + \rho D_i(x') \quad \text{s.t. } x' = x + u - \omega$$
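
  Because the graph is cyclic, $D_i$ and $H_i$ are defined as a fixed point rather than by backward induction; with a discount, that fixed point can be approximated by value iteration on a grid. The sketch below does this: $\rho = 0.9$ is borrowed from the experiments later in the talk, while the grid ranges, the interpolation, and the function names are my own illustrative choices.

```python
import numpy as np

rho = 0.9
prob = {"A": {1: 0.2, 2: 0.8}, "B": {1: 0.8, 2: 0.2}}          # the two demand models
xs = np.linspace(0.0, 4.0, 81)                                  # inventory grid
us = np.linspace(0.0, 4.0, 81)                                  # order-quantity grid
D = {m: np.zeros_like(xs) for m in prob}                        # D_i(x) on the grid
H = {m: {w: np.zeros_like(xs) for w in (1, 2)} for m in prob}   # H_i(x, omega) on the grid

def interp(vals, x):
    """Piecewise linear interpolation of a gridded function, clipped to the grid range."""
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, vals)

for _ in range(200):                                            # Bellman-style sweeps
    for m in prob:
        # D_i(x) = min_u  u + E_omega[ H_i(x + u, omega) ]
        EH = sum(prob[m][w] * np.array([interp(H[m][w], xs + u) for u in us]) for w in (1, 2))
        D[m] = np.min(us[:, None] + EH, axis=0)
        # H_i(x, omega) = min_{u : x + u - omega >= 0}  2u + x' + rho*D_i(x'),  x' = x + u - omega
        for w in (1, 2):
            rows = []
            for u in us:
                xprime = xs + u - w
                row = 2 * u + xprime + rho * interp(D[m], xprime)
                row[xprime < 0] = np.inf                        # infeasible: would need x' < 0
                rows.append(row)
            H[m][w] = np.min(rows, axis=0)

print("D_A(0) =", round(D["A"][0], 2), "  D_B(0) =", round(D["B"][0], 2))
```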

  15. Policy Graphs
  Each node $i$: given incoming state $x$ and noise $\omega \in \Omega_i$, apply control $u = \pi_i(x, \omega)$, incur cost $C_i(x, u, \omega)$, and move to outgoing state $x' = T_i(x, u, \omega)$.
  A policy graph:
  • $G = (R, \mathcal{N}, \mathcal{E}, \Phi)$
  • $\omega_j \in \Omega_j$: node-wise independent noise
  • feasible controls: $u \in U_i(x, \omega)$
  • transition function: $x' = T_i(x, u, \omega)$
  • one-step cost function: $C_i(x, u, \omega)$
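
  One concrete (and purely hypothetical) way to hold these ingredients in code; the field names mirror the slide's notation, but the data structure is mine, not SDDP.jl's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Node:
    noise: Dict[float, float]                            # Omega_i: noise value -> probability
    children: List[Tuple[str, float]]                    # outgoing edges (j, phi_ij)
    controls: Callable[[float, float], List[float]]      # U_i(x, omega): feasible controls
    transition: Callable[[float, float, float], float]   # T_i(x, u, omega) -> x'
    cost: Callable[[float, float, float], float]         # C_i(x, u, omega): one-step cost

# Example: an "H_A" node of the inventory example (demand model A, return probability rho = 0.9).
H_A = Node(
    noise={1.0: 0.2, 2.0: 0.8},
    children=[("D_A", 0.9)],
    controls=lambda x, w: [u for u in range(5) if x + u - w >= 0],
    transition=lambda x, u, w: x + u - w,
    cost=lambda x, u, w: 2 * u + (x + u - w),
)
print(H_A.cost(0.0, 2, 1.0))   # order 2 units, observe demand 1: cost 2*2 + 1 = 5.0
```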

  16. Policy Graphs
  $$\min_{\pi} \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}\!\left[ V_i(x_R, \omega) \right] \qquad (1)$$
  where
  $$V_i(x, \omega) = \min_{\bar{x}, u, x'} \; C_i(\bar{x}, u, \omega) + \mathbb{E}_{j \in i^+;\, \varphi \in \Omega_j}\!\left[ V_j(x', \varphi) \right] \quad \text{s.t. } \bar{x} = x, \;\; u \in U_i(\bar{x}, \omega), \;\; x' = T_i(\bar{x}, u, \omega) \qquad (2)$$
  Goal: find $\pi_i(x, \omega)$ that solves (1) for each $i \in \mathcal{N}$, $x$, and $\omega$.
  (A1) $\mathcal{N}$ is finite
  (A2) $\Omega_i$ is finite and $\omega_i$ is node-wise independent $\forall i \in \mathcal{N}$
  (A3) Excluding the cost-to-go term, subproblem (2) is an LP
  (A4) Subproblem (2) has a finite optimal solution
  (A5) A leaf node is hit with probability 1 (or graph $G$ is acyclic)

  17. Policy Graphs with Partial Observability
  Extend the policy graph to $G = (R, \mathcal{N}, \mathcal{E}, \Phi, \mathcal{A})$, where $\mathcal{A}$ partitions $\mathcal{N}$: $A \cap A' = \emptyset$ for $A \ne A'$, and $\bigcup_{A \in \mathcal{A}} A = \mathcal{N}$.
  We know the current ambiguity set $A$, but not which node we are in.
  Full observability: $\mathcal{A} = \{\{i\} : i \in \mathcal{N}\}$, i.e., $|A| = 1$.
  But we could have $|A| = 2$, where we know the stage but not the node.

  18. Updates to the Belief State
  For the inventory example, $\mathcal{A} = \{A_1, A_2\}$, with $A_1 = \{D_A, D_B\}$ and $A_2 = \{H_A, H_B\}$.
  $$P\{\text{Node} = k \mid \omega, A\} = \frac{\mathbf{1}_{k \in A} \cdot P\{\omega \mid \text{Node} = k\}\, P\{\text{Node} = k\}}{P\{\omega\}}$$
  $$b_k \leftarrow \frac{\left[ \mathbf{1}_{k \in A} \cdot P(\omega \in \Omega_k) \right] \sum_{i \in \mathcal{N}} b_i \phi_{ik}}{\sum_{j \in A} P(\omega \in \Omega_j) \sum_{i \in \mathcal{N}} b_i \phi_{ij}}, \qquad b \leftarrow B(b, \omega) = \frac{D^A_\omega\, \Phi^\top b}{\sum_{j \in A} P(\omega \in \Omega_j) \sum_{i \in \mathcal{N}} b_i \phi_{ij}}$$
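
  The update is ordinary Bayes' rule: push the belief through the transition matrix $\Phi$, weight by the likelihood of the observed noise, and renormalize within the current ambiguity set. A small sketch with illustrative (not talk-supplied) data for the inventory example:

```python
import numpy as np

def update_belief(b, Phi, likelihood, in_ambiguity_set):
    """b: belief over nodes; Phi[i, j] = phi_ij; likelihood[k] = P(omega | node k);
    in_ambiguity_set[k] = 1 if node k is in the observed ambiguity set A, else 0."""
    prior = Phi.T @ b                                    # sum_i b_i * phi_ik
    posterior = in_ambiguity_set * likelihood * prior    # numerator of Bayes' rule
    return posterior / posterior.sum()                   # renormalize over A

# Inventory example, nodes ordered (D_A, D_B, H_A, H_B); ambiguity set A_2 = {H_A, H_B}.
Phi = np.array([[0.0, 0.0, 1.0, 0.0],    # D_A -> H_A
                [0.0, 0.0, 0.0, 1.0],    # D_B -> H_B
                [1.0, 0.0, 0.0, 0.0],    # H_A -> D_A (ignoring the 1 - rho exit, for illustration)
                [0.0, 1.0, 0.0, 0.0]])   # H_B -> D_B
b = np.array([0.5, 0.5, 0.0, 0.0])               # start unsure which demand model holds
likelihood = np.array([0.0, 0.0, 0.8, 0.2])      # P(omega = 2 | node) under models A and B
print(update_belief(b, Phi, likelihood, np.array([0.0, 0.0, 1.0, 1.0])))
# -> [0.  0.  0.8 0.2]: observing demand 2 shifts belief toward demand model A
```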

  19. Policy Graphs with Partial Observability
  Each node now carries the belief state: given incoming $(x, b)$ and noise $\omega \in \Omega_i$, update $b \leftarrow B(b, \omega)$, apply control $u = \pi_i(x, \omega, b)$, incur cost $C_i(x, u, \omega)$, and pass on $(x', b)$ with $x' = T_i(x, u, \omega)$.
  • All nodes in an ambiguity set have the same $C_i$, $T_i$, and $U_i$
  • Children $i^+$, transition probabilities $\phi_{ij}$, and even $\Omega_i$ may differ

  20. Policy Graphs with Partial Observability
  $$\min_{\pi} \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}\!\left[ V_i(x_R, B_i(b_R, \omega), \omega) \right] \qquad (3)$$
  where
  $$V_i(x, b, \omega) = \min_{\bar{x}, u, x'} \; C_i(\bar{x}, u, \omega) + V(x', b) \quad \text{s.t. } \bar{x} = x, \;\; u \in U_i(\bar{x}, \omega), \;\; x' = T_i(\bar{x}, u, \omega)$$
  and where
  $$V(x', b) = \sum_{j \in \mathcal{N}} \sum_{k \in \mathcal{N}} b_j \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi)$$
  Goal: find $\pi_A(x, b, \omega)$ that solves (3) for each $A \in \mathcal{A}$, $x$, $b$, and $\omega$.

  21. Saddle Property of the Cost-to-go Function
  $$V_i(x, b, \omega) = \min_{\bar{x}, u, x'} \; C_i(\bar{x}, u, \omega) + V(x', b) \quad \text{s.t. } \bar{x} = x, \;\; u \in U_i(\bar{x}, \omega), \;\; x' = T_i(\bar{x}, u, \omega)$$
  where
  $$V(x', b) = \sum_{j \in \mathcal{N}} \sum_{k \in \mathcal{N}} b_j \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi)$$
  Assume (A1)-(A5) with $G$ acyclic.
  Lemma 1. Fix $i$, $b$, $\omega$. Then $V_i(x, b, \omega)$ is piecewise linear and convex in $x$.
  Lemma 2. Fix $x'$. Then $V(x', b)$ is piecewise linear and concave in $b$.
  Theorem 1. $V(x', b)$ is a piecewise linear saddle function: convex in $x'$ for fixed $b$ and concave in $b$ for fixed $x'$.

  22. Linear Interpolation: Towards an SDDP Algorithm
  [figure: piecewise linear interpolation of $V(b)$ at belief grid points $\bar{b}_1 = 0, \bar{b}_2, \bar{b}_3, \bar{b}_4, \bar{b}_5 = 1$]
  $$V(b) = \max_{\gamma \ge 0} \; \sum_{k=1}^{K} \gamma_k V(\bar{b}_k) \quad \text{s.t. } \sum_{k=1}^{K} \gamma_k = 1, \quad \sum_{k=1}^{K} \gamma_k \bar{b}_k = b$$
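
  The interpolation itself is a small LP: maximize the interpolated value over convex weights $\gamma$ that reproduce the query belief $b$. A sketch with made-up grid values (`b_pts` and `V_pts` are illustrative, not from the talk):

```python
import numpy as np
from scipy.optimize import linprog

b_pts = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # belief grid points b_1, ..., b_K
V_pts = np.array([4.0, 5.5, 6.0, 5.5, 4.0])     # sampled values V(b_k), concave in b

def V_interp(b):
    """max_{gamma >= 0} sum_k gamma_k V(b_k)  s.t.  sum_k gamma_k b_k = b,  sum_k gamma_k = 1."""
    K = len(b_pts)
    res = linprog(-V_pts,                                  # linprog minimizes, so negate
                  A_eq=np.vstack([b_pts, np.ones(K)]),
                  b_eq=[b, 1.0],
                  bounds=[(0, None)] * K, method="highs")
    return -res.fun

print(V_interp(0.6))   # 5.8: lies on the chord between the grid points at 0.5 and 0.75
```

  Because the weights are maximized, the result is the upper concave envelope of the sampled points, which coincides with the piecewise linear interpolation when the samples come from a concave $V$.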

  23. Saddle Function with Interpolated Cuts [figure: $V(x', b)$ as a function of $x'$ and $b$]

  24. Computing Cuts for What?
  $$V_i(x, b, \omega) = \min_{\bar{x}, u, x'} \; C_i(\bar{x}, u, \omega) + V_A(x', b) \quad \text{s.t. } \bar{x} = x, \;\; u \in U_i(\bar{x}, \omega), \;\; x' = T_i(\bar{x}, u, \omega)$$
  where
  $$V_A(x', b) = \sum_{j \in A} \sum_{k \in j^+} b_j \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi)$$

  25. SDDP Master Program
  $$V_i^K(x, b, \omega) = \min_{\bar{x}, u, x', \theta} \; \max_{\gamma \ge 0} \; C_i(\bar{x}, u, \omega) + \sum_{k=1}^{K} \gamma_k \theta_k$$
  $$\text{s.t. } \bar{x} = x \;\; [\lambda], \quad u \in U_i(\bar{x}, \omega), \quad x' = T_i(\bar{x}, u, \omega), \quad \sum_{k=1}^{K} \gamma_k b_k = b \;\; [\mu], \quad \sum_{k=1}^{K} \gamma_k = 1 \;\; [\nu], \quad \theta_k \ge G_k x' + g_k, \; k = 1, \ldots, K$$

  26. SDDP Master Program
  $$V_i^K(x, b, \omega) = \min_{\bar{x}, u, x', \mu, \nu} \; C_i(\bar{x}, u, \omega) + \mu^\top b + \nu$$
  $$\text{s.t. } \bar{x} = x \;\; [\lambda], \quad u \in U_i(\bar{x}, \omega), \quad x' = T_i(\bar{x}, u, \omega), \quad \mu^\top b_k + \nu \ge G_k x' + g_k, \; k = 1, \ldots, K$$
  Theorem 2. Assume (A1)-(A5) with $G$ acyclic. Let the sample paths of the "obvious" SDDP algorithm be generated independently at each iteration. Then the algorithm converges to an optimal policy almost surely in a finite number of iterations.
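
  Dualizing the inner maximization over $\gamma$ replaces the interpolation variables with $(\mu, \nu)$, so the master stays a single LP with one constraint per saddle cut. A toy instantiation (all numbers, and the single-constraint stage dynamics, are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Saddle cuts: at belief sample b_k, the cut reads  mu'b_k + nu >= G_k*x' + g_k  (made-up data).
b_samples = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
G = [1.0, 0.8, 0.6]
g = [2.0, 2.5, 3.0]

def solve_master(x_in, demand, b):
    """Variables z = (u, x', mu_1, mu_2, nu); stage cost 2u + x'; cost-to-go mu'b + nu."""
    obj = np.concatenate([[2.0, 1.0], b, [1.0]])
    A_eq = [[-1.0, 1.0, 0.0, 0.0, 0.0]]            # x' - u = x_in - demand
    b_eq = [x_in - demand]
    A_ub = [[0.0, G[k], -b_samples[k][0], -b_samples[k][1], -1.0] for k in range(3)]
    b_ub = [-g[k] for k in range(3)]               # G_k*x' - mu'b_k - nu <= -g_k
    bounds = [(0, None), (0, None), (None, None), (None, None), (None, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.fun, res.x

value, z = solve_master(x_in=1.0, demand=2.0, b=np.array([0.7, 0.3]))
print(round(value, 3), z.round(3))   # the query belief must lie in the convex hull of b_samples
```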

  27. Inventory Example
  [policy graph: root R branches to demand-model nodes D_A and D_B; each D_i leads to H_i, which returns to D_i with probability ρ]
  Demand model A: $P(\omega = 1) = 0.2$, $P(\omega = 2) = 0.8$. Demand model B: $P(\omega = 1) = 0.8$, $P(\omega = 2) = 0.2$.
  $$D_i: \quad D_i(x) = \min_{u, x' \ge 0} \; u + \mathbb{E}_\omega\!\left[ H_i(x', \omega) \right] \quad \text{s.t. } x' = x + u$$
  $$H_i: \quad H_i(x, \omega) = \min_{u, x' \ge 0} \; 2u + x' + \rho D_i(x') \quad \text{s.t. } x' = x + u - \omega$$

  28. Inventory Example: Train Four Policies
  1. fully observable: distribution known upon departing R
  2. partially observable: ambiguity partition {D_A, D_B}, {H_A, H_B}
  3. risk-neutral average demand: demand equally likely to be 1 or 2
  4. DRO average demand: modified χ² method with radius 0.25

  29. Inventory Example: Train Four Policies
  • 2000 out-of-sample costs over 50 periods; quartiles; ρ = 0.9
  [figure: box plots of discounted and undiscounted simulated cost ($) for the fully observable, partially observable, risk-neutral average-demand, and DRO average-demand policies]

  30. Inventory Example: One Sample Path of the Partially Observable Policy
  [figure: (a) belief in model A, (b) first-stage buy (units), (c) inventory (units), over 12 periods]

  31. Concluding Thoughts
  • Partially observable multistage stochastic programs
    – Saddle-cut SDDP algorithm
    – SDDP.jl (Dowson and Kapelevich)
  • Related saddle-function work in stochastic programming
    – Baucke et al. (2018): risk measures
    – Downward et al. (2018): stage-wise dependent objective coefficients
  • Closely related ideas are well known in POMDPs
    – Contextual, multi-model, concurrent MDPs
    – We allow continuous state and action spaces via convexity
  • Countably infinite LPs for the cyclic case
  • We did not handle decision-dependent learning
    – $b \leftarrow B(b, \omega)$ versus $b \leftarrow B(b, \omega, u)$

  32. Concluding Thoughts http://www.optimization-online.org/DB_HTML/2019/03/7141.html
