

  1. A Series of Lectures on Approximate Dynamic Programming
Dimitri P. Bertsekas
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
Lucca, Italy, June 2017

  2. Our Aim
Discuss optimization by Dynamic Programming (DP) and the use of approximations.
Purpose: computational tractability in a broad variety of practical contexts.

  3. The Scope of these Lectures
After an introduction to exact DP, we will focus on approximate DP for optimal control under stochastic uncertainty.
The subject is broad, with a rich variety of theory/math, algorithms, and applications.
Applications come from a vast array of areas: control/robotics/planning, operations research, economics, artificial intelligence, and beyond ...
We will concentrate on control of discrete-time systems with a finite number of stages (a finite horizon), and the expected value criterion.
We will focus mostly on algorithms ... less on theory and modeling.
We will not cover:
- Infinite horizon problems
- Imperfect state information and minimax/game problems
- Simulation-based methods: reinforcement learning, neuro-dynamic programming (a series of video lectures on the latter can be found at the author's web site)
Reference: the lectures will follow Chapters 1 and 6 of the author's book "Dynamic Programming and Optimal Control," Vol. I, Athena Scientific, 2017.

  4. Lectures Plan
Exact DP:
- The basic problem formulation
- Some examples
- The DP algorithm for finite horizon problems with perfect state information
- Computational limitations; motivation for approximate DP
Approximate DP - I:
- Approximation in value space; limited lookahead
- Parametric cost approximation, including neural networks
- Q-factor approximation, model-free approximate DP
- Problem approximation
Approximate DP - II:
- Simulation-based on-line approximation; rollout and Monte Carlo tree search
- Applications in backgammon and AlphaGo
- Approximation in policy space

  5. First Lecture: EXACT DYNAMIC PROGRAMMING

  6. Outline
1. Basic Problem
2. Some Examples
3. The DP Algorithm
4. Approximation Ideas

  7. Basic Problem Structure for DP
Discrete-time system:
$$x_{k+1} = f_k(x_k, u_k, w_k), \quad k = 0, 1, \ldots, N-1$$
- $x_k$: state; summarizes past information that is relevant for future optimization at time $k$
- $u_k$: control; decision to be selected at time $k$ from a given set $U_k(x_k)$
- $w_k$: disturbance; random parameter with distribution $P(w_k \mid x_k, u_k)$
- For deterministic problems there is no $w_k$
Cost function that is additive over time:
$$E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \Big\}$$
Perfect state information: the control $u_k$ is applied with (exact) knowledge of the state $x_k$.
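A minimal Python sketch (not from the lecture) of how this problem structure might be represented; the class name FiniteHorizonProblem and its attributes are illustrative assumptions.

```python
class FiniteHorizonProblem:
    """Hypothetical container for a finite-horizon stochastic control problem:
    x_{k+1} = f_k(x_k, u_k, w_k), cost g_N(x_N) + sum_{k=0}^{N-1} g_k(x_k, u_k, w_k)."""

    def __init__(self, N, f, g, g_N, controls, sample_w):
        self.N = N                # horizon (number of stages)
        self.f = f                # f(k, x, u, w) -> next state x_{k+1}
        self.g = g                # g(k, x, u, w) -> stage cost g_k
        self.g_N = g_N            # g_N(x) -> terminal cost
        self.controls = controls  # controls(k, x) -> iterable over U_k(x_k)
        self.sample_w = sample_w  # sample_w(k, x, u) -> a random disturbance w_k
```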

  8. Optimization over Feedback Policies
[Figure: feedback loop; the controller applies $u_k = \mu_k(x_k)$ to the system $x_{k+1} = f_k(x_k, u_k, w_k)$, which is driven by the disturbance $w_k$]
Feedback policies: rules that specify the control to apply at each possible state $x_k$ that can occur.
Major distinction: we minimize over sequences of functions of the state, $\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}$ with $u_k = \mu_k(x_k) \in U_k(x_k)$ - not over sequences of controls $\{u_0, u_1, \ldots, u_{N-1}\}$.
Cost of a policy $\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}$ starting at initial state $x_0$:
$$J_\pi(x_0) = E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k\big(x_k, \mu_k(x_k), w_k\big) \Big\}$$
Optimal cost function: $J^*(x_0) = \min_\pi J_\pi(x_0)$
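To make the policy-versus-control-sequence distinction concrete, here is a hedged sketch of estimating $J_\pi(x_0)$ by Monte Carlo simulation; it reuses the hypothetical FiniteHorizonProblem container sketched above, and the function name is an assumption.

```python
def estimate_policy_cost(problem, x0, policy, num_runs=10000):
    """Monte Carlo estimate of J_pi(x0) for a feedback policy u_k = policy(k, x_k)."""
    total = 0.0
    for _ in range(num_runs):
        x, cost = x0, 0.0
        for k in range(problem.N):
            u = policy(k, x)               # u_k = mu_k(x_k): the control depends on the state
            w = problem.sample_w(k, x, u)  # random disturbance w_k
            cost += problem.g(k, x, u, w)  # accumulate the stage cost g_k
            x = problem.f(k, x, u, w)      # move to x_{k+1}
        total += cost + problem.g_N(x)     # add the terminal cost g_N(x_N)
    return total / num_runs
```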

  9. Scope of DP
Any optimization (deterministic, stochastic, minimax, etc.) involving a sequence of decisions fits the framework.
A continuous-state example: linear-quadratic optimal control
Linear discrete-time system:
$$x_{k+1} = A x_k + B u_k + w_k, \quad k = 0, \ldots, N-1$$
- $x_k \in \Re^n$: the state at time $k$
- $u_k \in \Re^m$: the control at time $k$ (no constraints in the classical version)
- $w_k \in \Re^n$: the disturbance at time $k$ ($w_0, \ldots, w_{N-1}$ are independent random variables with given distribution)
Quadratic cost function:
$$E\Big\{ x_N' Q x_N + \sum_{k=0}^{N-1} \big( x_k' Q x_k + u_k' R u_k \big) \Big\}$$
where $Q$ and $R$ are positive definite symmetric matrices.
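A small numerical sketch of such a linear-quadratic instance in Python/NumPy; the particular matrices A, B, Q, R, the horizon, and the Gaussian disturbance below are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

# Illustrative 2-dimensional linear-quadratic instance (values chosen arbitrarily)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)   # state cost weight (positive definite, symmetric)
R = np.eye(1)   # control cost weight (positive definite, symmetric)
N = 20          # horizon

def step(x, u, w):
    """Linear system dynamics: x_{k+1} = A x_k + B u_k + w_k."""
    return A @ x + B @ u + w

def stage_cost(x, u):
    """Quadratic stage cost: x_k' Q x_k + u_k' R u_k."""
    return float(x @ Q @ x + u @ R @ u)

# One simulated step under an arbitrary control, with a Gaussian disturbance
rng = np.random.default_rng(0)
x, u = np.array([1.0, 0.0]), np.array([-0.5])
x_next = step(x, u, 0.01 * rng.standard_normal(2))
print(stage_cost(x, u), x_next)
```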

  10. Discrete-State Deterministic Scheduling Example
[Figure: graph of partial schedules, from the empty schedule (initial state) through A, C, AB, AC, CA, CD to ABC, ACB, ACD, CAB, CAD, CDA, with transition costs on the arcs]
Find the optimal sequence of operations A, B, C, D (A must precede B and C must precede D).
DP problem formulation
- States: partial schedules; Controls: stage 0, 1, and 2 decisions
- DP idea: break down the problem into smaller pieces (tail subproblems)
- Start from the last decision and go backwards

  11. Scheduling Example Algorithm I
[Figure: the same partial-schedule graph, with the stage 2 subproblems highlighted and the optimal costs-to-go recorded at the stage 2 states]
Solve the stage 2 subproblems (using the terminal costs).
At each state of stage 2, we record the optimal cost-to-go and the optimal decision.

  12. Scheduling Example Algorithm II
[Figure: the same graph, with the stage 1 subproblems highlighted and the optimal costs-to-go recorded at the stage 1 states]
Solve the stage 1 subproblems (using the solution of the stage 2 subproblems).
At each state of stage 1, we record the optimal cost-to-go and the optimal decision.

  13. Scheduling Example Algorithm III
[Figure: the same graph, with the stage 0 subproblem (the entire problem) highlighted and the optimal cost recorded at the initial state]
Solve the stage 0 subproblem (using the solution of the stage 1 subproblems).
The stage 0 subproblem is the entire problem.
The optimal value of the stage 0 subproblem is the optimal cost $J^*(\text{initial state})$.
Construct the optimal sequence going forward.
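A compact Python sketch of this backward solution of tail subproblems for the scheduling example, written as a memoized recursion over partial schedules; the transition costs are placeholders, since the numbers on the slide's graph are not reproduced here, and the function names are illustrative.

```python
from functools import lru_cache

OPS = ("A", "B", "C", "D")
PRECEDENCE = {"B": "A", "D": "C"}   # A must precede B, C must precede D

def cost(schedule, op):
    """Cost of performing `op` after the given partial schedule.
    Placeholder value: the actual arc costs from the slide are not reproduced."""
    return 1.0

def feasible_ops(schedule):
    """Operations that may legally extend a partial schedule."""
    done = set(schedule)
    return [op for op in OPS
            if op not in done and (op not in PRECEDENCE or PRECEDENCE[op] in done)]

@lru_cache(maxsize=None)
def cost_to_go(schedule):
    """Optimal cost of completing `schedule` (the tail subproblem starting there)."""
    if len(schedule) == len(OPS):
        return 0.0   # a complete schedule incurs no further cost
    return min(cost(schedule, op) + cost_to_go(schedule + (op,))
               for op in feasible_ops(schedule))

# The optimal cost J*(initial state) is the cost-to-go of the empty schedule;
# the optimal sequence is then constructed going forward by re-taking the argmin.
print(cost_to_go(()))
```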

  14. Principle of Optimality
Let $\pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_{N-1}^*\}$ be an optimal policy.
Consider the "tail subproblem" whereby we are at $x_k$ at time $k$ and wish to minimize the "cost-to-go" from time $k$ to time $N$,
$$E\Big\{ g_N(x_N) + \sum_{m=k}^{N-1} g_m\big(x_m, \mu_m(x_m), w_m\big) \Big\}$$
Consider the "tail" $\{\mu_k^*, \mu_{k+1}^*, \ldots, \mu_{N-1}^*\}$ of the optimal policy.
[Figure: the tail subproblem starting at $x_k$ on the time axis from $k$ to $N$]
THE TAIL OF AN OPTIMAL POLICY IS OPTIMAL FOR THE TAIL SUBPROBLEM
DP algorithm
- Start with the last tail (stage $N-1$) subproblems
- Solve the stage $k$ tail subproblems, using the optimal costs-to-go of the stage $k+1$ tail subproblems
- The optimal value of the stage 0 subproblem is the optimal cost $J^*(\text{initial state})$
- In the process, construct the optimal policy

  15. Formal Statement of the DP Algorithm
Computes, for all $k$ and states $x_k$, $J_k(x_k)$: the optimal cost of the tail problem that starts at $x_k$.
Go backwards, $k = N-1, \ldots, 0$, using
$$J_N(x_N) = g_N(x_N)$$
$$J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}$$
Interpretation: to solve a tail problem that starts at state $x_k$, minimize the ($k$th-stage cost + optimal cost of the tail problem that starts at state $x_{k+1}$).
Notes:
- $J_0(x_0) = J^*(x_0)$: the cost generated at the last step is equal to the optimal cost
- Let $\mu_k^*(x_k)$ minimize the right side above for each $x_k$ and $k$. Then the policy $\pi^* = \{\mu_0^*, \ldots, \mu_{N-1}^*\}$ is optimal
- Proof by induction
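A generic Python sketch of this backward recursion for a problem with finitely many states, controls, and disturbance values; all function and variable names here are assumptions made for illustration, not notation from the lecture.

```python
def dp_backward(N, states, controls, outcomes, terminal_cost):
    """Backward DP recursion:
        J_N(x) = g_N(x)
        J_k(x) = min over u in U_k(x) of E_{w_k}[ g_k(x,u,w) + J_{k+1}(f_k(x,u,w)) ]

    states(k)         -> iterable of all possible states x_k at stage k
    controls(k, x)    -> iterable over U_k(x)
    outcomes(k, x, u) -> list of (probability, stage_cost, next_state) triples,
                         i.e. the distribution over w_k expressed through g_k and f_k
    terminal_cost(x)  -> g_N(x)
    """
    J = {x: terminal_cost(x) for x in states(N)}        # J_N = g_N
    policy = {}
    for k in range(N - 1, -1, -1):
        J_next, J = J, {}
        for x in states(k):
            best_u, best_val = None, float("inf")
            for u in controls(k, x):
                # Expected (k-th stage cost + optimal cost-to-go from x_{k+1})
                val = sum(p * (g + J_next[x_next])
                          for p, g, x_next in outcomes(k, x, u))
                if val < best_val:
                    best_u, best_val = u, val
            J[x] = best_val                              # J_k(x_k)
            policy[(k, x)] = best_u                      # mu*_k(x_k)
    return J, policy   # J holds J_0 over the stage-0 states; policy is optimal
```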

  16. Practical Difficulties of DP
The curse of dimensionality (too many values of $x_k$):
- In continuous-state problems: discretization is needed; the computation grows exponentially with the dimensions of the state and control spaces
- In naturally discrete/combinatorial problems: quick explosion of the number of states as the search space increases
- Length of the horizon (what if it is infinite?)
The curse of modeling; we may not know exactly $f_k$ and $P(w_k \mid x_k, u_k)$:
- It is often hard to construct an accurate math model of the problem
- Sometimes a simulator of the system is easier to construct than a model
The problem data may not be known well in advance:
- A family of problems may be addressed; the data of the problem to be solved is given with little advance notice
- The problem data may change as the system is controlled - need for on-line replanning and fast solution
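As a back-of-the-envelope illustration of the curse of dimensionality (the grid resolution and dimensions below are arbitrary choices), discretizing each of $d$ continuous state dimensions into $m$ levels yields $m^d$ states per stage:

```python
m = 100   # grid points per state dimension (arbitrary choice)
for d in (1, 2, 4, 8):
    # The tabulated cost-to-go J_k must be stored and minimized over m**d states
    print(f"dimension {d}: {float(m) ** d:.3e} states per stage")
```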
