Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes
Introduction

• We have seen in the two previous lectures (5, 6) that for stage decision problems with quadratic costs and linear models we can compute analytically the costs-to-go of the DP algorithm and the optimal policy.

• However, linear models with quadratic costs are one of the few cases where this happens. In fact, we typically cannot run DP analytically. Example of slide 11, lecture 5:

Cost: $\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2)$, with $g_2(x_2) = e^{x_2}$    Model: $x_{k+1} = x_k + u_k$

• In the next three lectures we will discuss three alternative methods and their pros and cons:
  • Discretization (lecture 7)
  • Static optimization (lectures 7 and 8)
  • Approximate dynamic programming (lecture 9)
Outline

• Discretization
• Introduction to static optimization
Discretization

Stage decision problems  →  (state and input discretization)  →  Discrete optimization problems

• Stage decision problems can be approximated by a process called discretization (also referred to as sampling or quantization).
Example

$\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2)$,   $x_{k+1} = x_k + u_k$,   $k \in \{0, 1\}$,   $g_2(x_2) = e^{x_2}$

When $g_2(x_2) = x_2^2$ we obtained (see lecture 5) the optimal policy and the optimal costs-to-go

$u_0 = -\tfrac{3}{5} x_0$,   $u_1 = -\tfrac{1}{2} x_1$,   $J_0(x_0) = \tfrac{8}{5} x_0^2$,   $J_1(x_1) = \tfrac{3}{2} x_1^2$

We will now recover these functions using an alternative method (discretization) and then apply it to the problem with a non-quadratic terminal cost.
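For reference, these expressions follow from the two-step DP recursion of lecture 5:

$$
\begin{aligned}
J_2(x_2) &= x_2^2,\\
J_1(x_1) &= \min_{u_1}\; x_1^2 + u_1^2 + (x_1+u_1)^2 = \tfrac{3}{2}x_1^2, \quad u_1 = -\tfrac{1}{2}x_1 \;\; (\text{from } 2u_1 + 2(x_1+u_1) = 0),\\
J_0(x_0) &= \min_{u_0}\; x_0^2 + u_0^2 + \tfrac{3}{2}(x_0+u_0)^2 = \tfrac{8}{5}x_0^2, \quad u_0 = -\tfrac{3}{5}x_0 \;\; (\text{from } 2u_0 + 3(x_0+u_0) = 0).
\end{aligned}
$$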
Example: discretization

Stage-decision problem:   $\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2)$,   $x_{k+1} = x_k + u_k$,   $k \in \{0, 1\}$

Discretization:   $x_k = \delta \bar{x}_k$,   $u_k = \delta \bar{u}_k$,   $\bar{x}_k \in \{-N, -N+1, \dots, N\}$,   $\bar{u}_k \in \{-M, -M+1, \dots, M\}$

Discrete optimization problem:   $\delta^2 \left( \sum_{k=0}^{1} \bar{x}_k^2 + \bar{u}_k^2 \right) + g_2(\delta \bar{x}_2)$,   $\bar{x}_{k+1} = \bar{x}_k + \bar{u}_k$,   $k \in \{0, 1\}$

A sketch of the resulting DP over the grid follows below.
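A minimal sketch of how this discretized DP could be implemented, using the grid parameters of the next slide (N = 100, M = 50, δ = 0.05) as defaults; the function name and implementation details are illustrative, not from the lecture:

```python
import numpy as np

def discretized_dp(g2, N=100, M=50, delta=0.05):
    """DP for the discretized two-stage problem
        min  sum_{k=0}^{1} (x_k^2 + u_k^2) + g2(x_2),   x_{k+1} = x_k + u_k,
    with x_k = delta*xbar_k, u_k = delta*ubar_k,
    xbar_k in {-N,...,N}, ubar_k in {-M,...,M}."""
    ub = np.arange(-M, M + 1)                  # integer input grid
    x = delta * np.arange(-N, N + 1)           # physical state grid (index i <-> xbar = i - N)
    J = {2: g2(x)}                             # terminal cost-to-go on the grid
    mu = {}
    for k in (1, 0):
        Jk, muk = np.empty_like(x), np.empty_like(x)
        for i in range(x.size):
            jnext = i + ub                     # grid index of xbar_k + ubar_k
            ok = (jnext >= 0) & (jnext < x.size)   # discard transitions leaving the grid
            u = delta * ub[ok]
            cost = x[i] ** 2 + u ** 2 + J[k + 1][jnext[ok]]
            best = np.argmin(cost)
            Jk[i], muk[i] = cost[best], u[best]
        J[k], mu[k] = Jk, muk
    return x, J, mu

# Quadratic terminal cost: the tables approximate J0 = (8/5) x^2, J1 = (3/2) x^2
x, J, mu = discretized_dp(lambda x2: x2 ** 2)
# Exponential terminal cost of the running example
x, J, mu = discretized_dp(np.exp)
```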
Results

When $g_2(x_2) = x_2^2$ we indeed recover (approximately) the optimal policy and costs-to-go.

[Figure: discretization (N = 100, M = 50, δ = 0.05) versus the optimal solution, plotting $J_0(x_0) = \tfrac{8}{5}x_0^2$, $J_1(x_1) = \tfrac{3}{2}x_1^2$, $J_2(x_2) = x_2^2$, and the policies $u_0 = -\tfrac{3}{5}x_0$, $u_1 = -\tfrac{1}{2}x_1$; the discretized curves closely match the optimal ones.]
Results

This encourages us to think that the optimal policy when $g_2(x_2) = e^{x_2}$ can also be (approximately) obtained with the same method.

[Figure: discretized costs-to-go $J_0(x_0)$, $J_1(x_1)$, $J_2(x_2)$ and policies $\mu_0(x_0)$, $\mu_1(x_1)$ computed with N = 100, M = 50, δ = 0.05.]

This statement can be formalised (see LaValle's book) but we do not pursue this here.
Results

Note that the cost if the initial state is $x_0 = 1$ is approximately $2.64$, and corresponds to an initial control $u_0 = -0.7$ leading to state $x_1 = 0.3$, and in turn to the control $u_1 = -0.45$.

[Figure: zoom of $J_0(x_0)$ near $x_0 = 1$ and of the policies $\mu_0(x_0)$, $\mu_1(x_1)$ around the optimal trajectory.]

We will (approximately) obtain these values later with a different method! A quick brute-force check of these numbers is sketched below.
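Since the horizon is only two stages, the values above can be verified by enumerating all input pairs on the same grid; a minimal sketch:

```python
import numpy as np

# Brute-force check over the same input grid (delta = 0.05, M = 50): with
# g2(x2) = exp(x2) and x0 = 1, the best two-move sequence costs about 2.64,
# attained with u0 = -0.7 (so x1 = 0.3) and u1 = -0.45.
delta, M = 0.05, 50
u = delta * np.arange(-M, M + 1)
u0, u1 = np.meshgrid(u, u, indexing="ij")
x0 = 1.0
x1 = x0 + u0
x2 = x1 + u1
cost = x0 ** 2 + u0 ** 2 + x1 ** 2 + u1 ** 2 + np.exp(x2)
i, j = np.unravel_index(np.argmin(cost), cost.shape)
print(cost[i, j], u[i], u[j])   # -> approx 2.64, -0.7, -0.45
```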
Discussion

• We discuss next how to extend the discretization method to the general case where the state belongs to $\mathbb{R}^n$, with the help of the following example.

• Consider the following toy problem: compute the force to move a unitary mass 1 meter along a flat surface, from rest to rest, in 1 second, with minimum energy:

$\min \int_0^1 u(t)^2 \, dt$,   $\ddot{z}(t) = u(t)$,   $z(0) = 0$, $z(1) = 1$, $\dot{z}(0) = \dot{z}(1) = 0$

• Later in the course we will learn the tools to find an optimal solution to this problem, which is $u(t) = 6 - 12t$, resulting in $v(t) = 6t - 6t^2$ and $z(t) = 3t^2 - 2t^3$ (a quick numerical check follows below).

• To convert from continuous to discrete time we also need temporal discretization.
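A short sanity check of the claimed optimum (not a derivation, just verifying the boundary conditions and the cost value, which works out to 12):

```python
import numpy as np

# u(t) = 6 - 12t integrates to v(t) = 6t - 6t^2 and z(t) = 3t^2 - 2t^3,
# which satisfy z(0) = 0, z(1) = 1, v(0) = v(1) = 0; the energy
# int_0^1 u(t)^2 dt evaluates to 12.
t = np.linspace(0.0, 1.0, 100001)
u = 6 - 12 * t
v = 6 * t - 6 * t ** 2
z = 3 * t ** 2 - 2 * t ** 3
print(z[0], z[-1], v[0], v[-1])     # 0.0  1.0  0.0  0.0
print(np.mean(u ** 2))              # approx 12 (Riemann estimate of the energy)
```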
Discretization

Continuous-time optimal control problems  →  (temporal discretization)  →  Stage decision problems  →  (state and input discretization)  →  Discrete optimization problems
Outline

• Discretization
• Digital control and temporal discretization
• State and input discretization
• Application: minimum energy control of a vehicle
• Introduction to static optimization
Digital control

[Diagram: digital control loop. The physical system $\dot{x}(t) = f_c(t, x(t), u(t))$ is driven through actuators by the D/A-converted actuation $u(t) = u_k$, $t \in [t_k, t_{k+1})$; its state $x(t)$ is measured through sensors and A/D-converted into the sampled state $x_k = x(t_k)$, $t_k := k\tau$, where $\tau$ is the sampling period. A digital algorithm implements the control law $u_k = \mu_k(x_k)$.]

Discretization: the system "seen by the controller" is $x_{k+1} = f_k(x_k, u_k)$. A simulation sketch of this loop is given below.
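A minimal sketch of this sampled-data loop; the function name, gains, and the inner Euler integration are illustrative assumptions, not from the lecture:

```python
import numpy as np

def simulate_digital_loop(f_c, mu, x0, tau=0.1, K=50, substeps=100):
    """Simulate the loop on this slide: the plant xdot = f_c(t, x, u) evolves in
    continuous time (approximated here by small Euler steps), the input is held
    constant between sampling instants t_k = k*tau (zero-order hold), and the
    control law mu only sees the sampled state x_k = x(t_k)."""
    x = np.asarray(x0, dtype=float)
    xs, us = [x.copy()], []
    h = tau / substeps                       # inner integration step
    for k in range(K):
        u_k = mu(k, x)                       # decision based on the sample x_k
        for i in range(substeps):            # plant evolves between samples
            t = k * tau + i * h
            x = x + h * f_c(t, x, u_k)
        xs.append(x.copy())
        us.append(u_k)
    return np.array(xs), np.array(us)

# Example: double integrator with an illustrative state feedback
xs, us = simulate_digital_loop(lambda t, x, u: np.array([x[1], u]),
                               lambda k, x: -x[0] - 1.5 * x[1],
                               x0=[1.0, 0.0])
```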
Temporal discretization

• Solve the differential equation $\dot{x}(t) = f_c(t, x(t), u(t))$ in $t \in [t_k, t_{k+1})$ for an initial condition $x(t_k) = x_k$ and a constant control input $u(t) = u_k$, and evaluate at $t = t_{k+1}$:

$x_{k+1} = x(t_{k+1}) = f(x_k, u_k)$

• For example, if $\dot{x}(t) = Ax(t) + Bu(t)$, use the variation of constants formula

$x(t) = e^{A(t - t_k)} x(t_k) + \int_{t_k}^{t} e^{A(t-s)} B u(s) \, ds$

to conclude that $x_{k+1} = A_d x_k + B_d u_k$, where $A_d = e^{A\tau}$ and $B_d = \int_0^\tau e^{As} \, ds \, B$. These matrices can be computed numerically, as sketched below.
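A minimal sketch for computing $A_d$ and $B_d$, using the standard trick of exponentiating an augmented matrix (the function name is illustrative):

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(A, B, tau):
    """Exact zero-order-hold discretization: A_d = e^{A tau} and
    B_d = (int_0^tau e^{As} ds) B, obtained from one matrix exponential
    of the augmented matrix [[A, B], [0, 0]]."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    E = expm(M * tau)
    return E[:n, :n], E[:n, n:]

# Double integrator of the next slide: recovers A_d = [[1, tau], [0, 1]]
# and B_d = [tau^2/2, tau]
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Ad, Bd = zoh_discretize(A, B, tau=0.2)
```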
Double integrator

Model:

$\begin{bmatrix} \dot{z}(t) \\ \dot{v}(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z(t) \\ v(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t)$,   $u(t) = u_k$, $t \in [t_k, t_{k+1})$   →   $\begin{bmatrix} z_{k+1} \\ v_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} \begin{bmatrix} z_k \\ v_k \end{bmatrix} + \begin{bmatrix} \frac{\tau^2}{2} \\ \tau \end{bmatrix} u_k$

with $z_k = z(t_k)$, $v_k = v(t_k)$, $t_{k+1} - t_k = \tau$, $K\tau = 1$.

Cost:

$\int_0^1 u(t)^2 \, dt = \sum_{k=0}^{K-1} \int_{t_k}^{t_{k+1}} u_k^2 \, dt = \tau \sum_{k=0}^{K-1} u_k^2$

Initial and terminal conditions: $z(0) = 0$, $z(1) = 1$, $\dot{z}(0) = \dot{z}(1) = 0$, i.e., $z_0 = v_0 = v_K = 0$, $z_K = 1$.

Optimal solution (can be obtained by LQR):

$u_k = -R^{-1} B^\top ((A^\top)^{-1})^{k+1} \lambda_0$,   where   $\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^\top (A^\top)^{-1} \\ 0 & (A^\top)^{-1} \end{bmatrix}^K$,   $\lambda_0 = M_{12}^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

A numerical sketch of this formula is given below.
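A sketch of evaluating this closed form numerically. The block matrix follows the standard costate argument (with no state cost, $\lambda_{k+1} = (A^\top)^{-1}\lambda_k$ and $u_k = -R^{-1}B^\top\lambda_{k+1}$); the choice $\tau = 0.05$ and the variable names are illustrative:

```python
import numpy as np

# Stacking (x_k, lambda_k) gives the block matrix above; x_0 = 0 and
# x_K = [1, 0]^T then determine lambda_0 through M12.
tau = 0.05
K = round(1.0 / tau)
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau ** 2 / 2], [tau]])
R = np.array([[tau]])                             # cost is tau * sum_k u_k^2
AiT = np.linalg.inv(A.T)
H = np.block([[A, -B @ np.linalg.inv(R) @ B.T @ AiT],
              [np.zeros((2, 2)), AiT]])
M12 = np.linalg.matrix_power(H, K)[:2, 2:]
lam = np.linalg.solve(M12, np.array([1.0, 0.0]))  # lambda_0
u = []
for k in range(K):
    lam = AiT @ lam                               # lambda_{k+1}
    u.append((-np.linalg.inv(R) @ B.T @ lam).item())
# u[k] tracks the continuous-time optimum u(t) = 6 - 12t near t = k*tau
```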
Solution

[Figure: discrete-time optimal inputs $u_k$, velocities $v_k$, and positions $z_k$ (markers) compared with the continuous-time optimum $u(t)$, $v(t)$, $z(t)$, for $\tau = 0.2$ (top row) and $\tau = 0.1$ (bottom row).]
Solution

[Figure: the same comparison for $\tau = 0.05$ (top row) and $\tau = 0.025$ (bottom row); as $\tau$ decreases, the discrete-time solution approaches the continuous-time one.]
Discussion on temporal discretization

• Temporal discretization is also useful in other contexts and we will exploit this later in the course.

• Dynamic model: for non-linear systems exact discretization may be hard, so a numerical method (e.g. forward Euler) is typically used:

$x_{k+1} = x_k + \tau f(t_k, x_k, u_k)$

• Cost function: in the problem formulation, the state and input variables may already be penalized at the sampling times,

$\sum_{k=0}^{h-1} g_k(x(t_k), u(t_k)) + g_h(x(T)) = \sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h)$

• The cost may also result from discretizing a continuous-time cost function,

$\int_0^T g_c(t, x(t), u(t)) \, dt = \sum_{k=0}^{h-1} \underbrace{\int_{t_k}^{t_{k+1}} g_c(t, x(t), u(t)) \, dt}_{g_k(x_k, u_k)}$

• We can also use the approximation $\int_0^T g_c(t, x(t), u(t)) \, dt \approx \sum_{k=0}^{h-1} \tau \, g_c(k\tau, x_k, u_k)$.

Both approximations are sketched in code below.
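A minimal sketch of the two approximations above (function names are illustrative):

```python
def euler_step(f, t_k, x_k, u_k, tau):
    """Forward-Euler discretization of the dynamics:
    x_{k+1} = x_k + tau * f(t_k, x_k, u_k)."""
    return x_k + tau * f(t_k, x_k, u_k)

def riemann_cost(g_c, xs, us, tau):
    """Riemann-sum approximation of the running cost:
    int_0^T g_c(t, x(t), u(t)) dt  ~=  sum_k tau * g_c(k*tau, x_k, u_k)."""
    return sum(tau * g_c(k * tau, x_k, u_k)
               for k, (x_k, u_k) in enumerate(zip(xs, us)))
```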