Reduction of continuous-time control to discrete-time control
A. Jean-Marie
OCOQS Meeting, 24 January 2012

Outline
1. Problem statement
2. The basic model
3. Uniformization
4. Event model
5. Application

Problem statement
Consider some continuous-time, discrete-event, infinite-horizon control problem.
The standard way to analyze such problems is to reduce them to a discrete-time problem using some embedding of a discrete-time process into the continuous-time one.
The optimal policy is deduced from the solution of the discrete-time problem.

Problem statement (ctd)
There are various ways to place the observation points:
- jump instants,
- controllable event instants,
- uniformization instants.
They may result in different value functions.
Question: Is there a way to "play" with the embedding process in order to obtain structural properties of the optimal policy?

A basic continuous-time control model
As a starting point, consider:
- a continuous-time, piecewise-constant process $\{X(t);\ t \ge 0\}$ over some discrete state space $\mathcal{X}$;
- a sequence of decision instants $\{T_n;\ n \in \mathbb{N}\}$, endogenous;
- a finite set of actions $\mathcal{A}$;
- at a decision point $t$, given the current state $x = X(t)$, a feasible set of actions $\mathcal{A}_x \subset \mathcal{A}$.
Assuming that action $a \in \mathcal{A}_x$ is applied:
- a reward $r(x, a, y)$ is obtained;
- the state jumps to a random state $T_a(x)$ with distribution $P_{xay} = P(T_a(x) = y)$;
- given $y$, the next decision point is at $t + \tau$, where $\tau$ has an exponential distribution with parameter $\lambda_y$.
Between decision points, a reward is accumulated at rate $\ell(X(t))$, piecewise constant by assumption.

Basic model (ctd.)
Reward criterion: expected total discounted reward. Given $X(0) = x$,
\[
J(x) = \mathbb{E}\left[ \int_0^\infty e^{-\alpha t}\, \ell(X(t))\, dt
 + \sum_{n=1}^{\infty} e^{-\alpha T_n}\, r\big(X(T_n^-), A(T_n), X(T_n^+)\big) \right].
\]
The goal is to find the optimal feedback control $d : \mathcal{X} \to \mathcal{A}$ (with the constraint that $d(x) \in \mathcal{A}_x$ for all $x$) to maximize $J$.
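The reward criterion can be illustrated by simulating one path of the model and accumulating its discounted reward. The sketch below is only an illustration under stated assumptions: the two-state, two-action instance, the fixed policy, the truncation at a finite `horizon`, and all names (`sample_discounted_reward`, `lam`, `ell`, `P`, `r`) are hypothetical and not taken from the talk.

```python
import numpy as np

# Hypothetical toy instance (not from the slides): 2 states, 2 actions.
alpha = 0.1                                   # discount rate
lam = np.array([1.0, 2.0])                    # lam[x]: rate of the next decision point in state x
ell = np.array([1.0, 0.0])                    # ell[x]: running reward rate
P = np.array([[[0.9, 0.1], [0.2, 0.8]],       # P[x, a, y] = P(T_a(x) = y)
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[[0.0, 0.0], [-0.5, -0.5]],     # r[x, a, y]: lump reward at a jump
              [[0.0, 0.0], [1.0, 1.0]]])

def sample_discounted_reward(x0, policy, alpha, lam, ell, P, r, horizon, rng):
    """Discounted reward of one simulated path, truncated at time `horizon`."""
    t, x, total = 0.0, x0, 0.0
    while True:
        # next decision point: exponential sojourn with the rate of the current state
        t_next = min(t + rng.exponential(1.0 / lam[x]), horizon)
        # running reward on [t, t_next): integral of exp(-alpha s) * ell[x] ds
        total += ell[x] * (np.exp(-alpha * t) - np.exp(-alpha * t_next)) / alpha
        if t_next >= horizon:
            return total
        a = policy[x]                          # feedback control d(x)
        y = rng.choice(len(lam), p=P[x, a])    # post-jump state T_a(x)
        total += np.exp(-alpha * t_next) * r[x, a, y]
        t, x = t_next, y

rng = np.random.default_rng(0)
policy = [0, 1]                                # an arbitrary fixed feedback policy
samples = [sample_discounted_reward(0, policy, alpha, lam, ell, P, r, 60.0, rng)
           for _ in range(200)]
estimate = float(np.mean(samples))             # Monte Carlo estimate for this policy
```

The estimate is for one fixed policy only; maximizing over policies is what the Bellman equations address.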

Basic embedding
Features of this model:
- control is instantaneous and localized in time;
- evolution is strictly Markovian;
- immediate generalization to semi-Markov decision/transition instants.
Two possibilities for the observation of the process:
- just before a transition/control: → $V^-(x)$
- just after a transition/control: → $V^+(x)$
Question: What is their relation with $J(x)$?

Direct Bellman equations
Conditioning on $T_1$, the first decision point, we get:
\[
\begin{aligned}
V^+(x) &= \frac{1}{\alpha + \lambda_x}\big(\ell(x) + \lambda_x V^-(x)\big) \\
V^-(x) &= \max_{a \in \mathcal{A}_x} \sum_y P_{xay}\big(r(x,a,y) + V^+(y)\big) \\
       &= \max_{a \in \mathcal{A}_x} \mathbb{E}\big[r(x,a,T_a(x)) + V^+(T_a(x))\big].
\end{aligned}
\]
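One way to solve this pair of equations numerically is to alternate between them until the values stop changing. The sketch below assumes a hypothetical two-state, two-action instance in which every action is feasible in every state; the function name `solve_direct` and the data are illustrative only.

```python
import numpy as np

# Hypothetical toy instance (not from the slides): 2 states, 2 actions, all feasible.
alpha = 0.1                                   # discount rate
lam = np.array([1.0, 2.0])                    # lambda_x
ell = np.array([1.0, 0.0])                    # running reward l(x)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],       # P[x, a, y]
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[[0.0, 0.0], [-0.5, -0.5]],     # r[x, a, y]
              [[0.0, 0.0], [1.0, 1.0]]])

def q_values(v_plus):
    """Q[x, a] = sum_y P[x, a, y] * (r[x, a, y] + V+(y))."""
    return np.einsum("xay,xay->xa", P, r + v_plus[None, None, :])

def solve_direct(tol=1e-12, max_iter=100_000):
    """Alternate V- = max_a Q and V+ = (l + lambda V-) / (alpha + lambda)."""
    v_plus = np.zeros(len(lam))
    for _ in range(max_iter):
        v_minus = q_values(v_plus).max(axis=1)
        new_v_plus = (ell + lam * v_minus) / (alpha + lam)
        done = np.max(np.abs(new_v_plus - v_plus)) < tol
        v_plus = new_v_plus
        if done:
            break
    q = q_values(v_plus)
    return v_plus, q.max(axis=1), q.argmax(axis=1)

v_plus, v_minus, policy = solve_direct()
```

The iteration converges here because each sweep shrinks differences by at most $\max_x \lambda_x / (\alpha + \lambda_x) < 1$, which holds for this toy instance with $\alpha > 0$.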

Basic functional equations
Eliminating $V^+$ or $V^-$ leads to two forms of Bellman's equation:
\[
\begin{aligned}
V^+(x) &= \frac{1}{\alpha + \lambda_x}\Big[\ell(x) + \lambda_x \max_{a \in \mathcal{A}_x} \sum_y P_{xay}\big(r(x,a,y) + V^+(y)\big)\Big] \\
V^-(x) &= \max_{a \in \mathcal{A}_x} \sum_y P_{xay}\Big[r(x,a,y) + \frac{1}{\alpha + \lambda_y}\big(\ell(y) + \lambda_y V^-(y)\big)\Big].
\end{aligned}
\]

Uniformization à la carte
For each state $x$, define $\nu_x \ge \lambda_x$ and introduce a new, uncontrollable transition point after $\tau \sim \mathrm{Exp}(\nu_x)$.
Extend the state space to $\mathcal{X} \times \{r, u\}$, where r = regular event and u = uniformization event.
Table of rewards and transition probabilities:

x'      a   y'      r(x',a,y')   P_{x'ay'}
(x,r)   a   (y,r)   r(x,a,y)     P_{xay} · λ_y/ν_y
(x,r)   a   (y,u)   r(x,a,y)     P_{xay} · (ν_y − λ_y)/ν_y
(x,u)   ∗   (x,r)   0            λ_x/ν_x
(x,u)   ∗   (x,u)   0            (ν_x − λ_x)/ν_x

Running reward: $\ell(x, e) = \ell(x)$; transition rate: $\lambda_{(x,e)} = \nu_x$.
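The table can be turned into arrays on the extended state space. The sketch below is one possible encoding, not the talk's: it assumes that a uniformization event leaves the state $x$ unchanged (so the rows for $(x,u)$ use $\lambda_x/\nu_x$), stores $(x, e)$ at index `2*x + e` with `e = 0` for r and `e = 1` for u, and uses a hypothetical toy instance.

```python
import numpy as np

def uniformize(lam, nu, P, r):
    """Build P_u[(x,e), a, (y,e')] and r_u on the extended space X x {r, u}.

    Extended state (x, e) is stored at index 2*x + e, e = 0 (regular), 1 (uniformization).
    """
    assert np.all(nu >= lam), "uniformization needs nu_x >= lambda_x"
    n, m, _ = P.shape
    p_reg = lam / nu                          # P(next event in a state is regular)
    P_u = np.zeros((2 * n, m, 2 * n))
    r_u = np.zeros((2 * n, m, 2 * n))
    for x in range(n):
        for a in range(m):
            for y in range(n):
                # regular event in x: jump to y, then draw the type of the next event in y
                P_u[2 * x, a, 2 * y] = P[x, a, y] * p_reg[y]
                P_u[2 * x, a, 2 * y + 1] = P[x, a, y] * (1.0 - p_reg[y])
                r_u[2 * x, a, 2 * y] = r[x, a, y]
                r_u[2 * x, a, 2 * y + 1] = r[x, a, y]
            # uniformization event in x: state unchanged, zero reward, action ignored
            P_u[2 * x + 1, a, 2 * x] = p_reg[x]
            P_u[2 * x + 1, a, 2 * x + 1] = 1.0 - p_reg[x]
    return P_u, r_u

# Hypothetical toy instance (not from the slides).
lam = np.array([1.0, 2.0])
nu = np.array([3.0, 3.0])                     # any choice with nu_x >= lambda_x
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[[0.0, 0.0], [-0.5, -0.5]],
              [[0.0, 0.0], [1.0, 1.0]]])
P_u, r_u = uniformize(lam, nu, P, r)
```

Each row `P_u[i, a, :]` sums to one, which is a quick sanity check on the table.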

Relationships
Lemma
Let $V(\cdot)$ be the direct value function and $V_u(\cdot,\cdot)$ be the uniformized value function. Then:
\[
\begin{aligned}
V_u^-(x, r) &= V^-(x) \\
V_u^-(x, u) &= V^+(x) \\
V_u^+(x, r) &= \frac{1}{\alpha + \nu_x}\big(\ell(x) + \nu_x V^-(x)\big) \\
V_u^+(x, u) &= \frac{1}{\alpha + \nu_x}\big(\ell(x) + \nu_x V^+(x)\big).
\end{aligned}
\]
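As a consistency check (a sketch, not the talk's proof), the second identity can be recovered from the others. It assumes that a uniformization event in $x$ leaves the state unchanged and that the next event in $x$ is regular with probability $\lambda_x/\nu_x$, and it takes $V_u^-(x,r) = V^-(x)$ and $V_u^+(x,e) = \big(\ell(x) + \nu_x V_u^-(x,e)\big)/(\alpha+\nu_x)$ as given.

```latex
\begin{align*}
V_u^-(x,u)
 &= \frac{\lambda_x}{\nu_x}\, V_u^+(x,r) + \frac{\nu_x-\lambda_x}{\nu_x}\, V_u^+(x,u) \\
 &= \frac{1}{\alpha+\nu_x}\Big(\ell(x) + \lambda_x V^-(x) + (\nu_x-\lambda_x)\, V_u^-(x,u)\Big), \\
\text{hence}\quad (\alpha+\lambda_x)\, V_u^-(x,u) &= \ell(x) + \lambda_x V^-(x), \\
V_u^-(x,u) &= \frac{1}{\alpha+\lambda_x}\big(\ell(x) + \lambda_x V^-(x)\big) = V^+(x).
\end{align*}
```

The last equality is the first direct Bellman equation.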

Interpretations
No uniformization ($\nu_x = \lambda_x$):
\[
\begin{aligned}
V_u^+(x, r) &= \frac{1}{\alpha + \lambda_x}\big(\ell(x) + \lambda_x V^-(x)\big) = V^+(x) \\
V_u^+(x, u) &= \mathbb{E}\left[\int_0^{T_1} e^{-\alpha u}\, \ell(x)\, du + e^{-\alpha T_1} V^+(x)\right].
\end{aligned}
\]
Hyper-frequent uniformization ($\nu_x \to \infty$):
\[
\begin{aligned}
\lim_{\nu_x \to \infty} V_u^+(x, r) &= V^-(x) = V_u^-(x, r) \\
\lim_{\nu_x \to \infty} V_u^+(x, u) &= V^+(x) = V_u^-(x, u).
\end{aligned}
\]
No discounting ($\alpha \to 0$):
\[
\begin{aligned}
V_u^+(x, r) &\sim \frac{\ell(x)}{\nu_x} + V^-(x) \\
V_u^+(x, u) &\sim \frac{\ell(x)}{\nu_x} + V^+(x).
\end{aligned}
\]

Bellman equations for the uniformized process
Lemma
The basic value functions $V^+$ and $V^-$ satisfy:
\[
\begin{aligned}
V^+(x) &= \frac{1}{\alpha + \nu_x}\Big[\ell(x) + (\nu_x - \lambda_x) V^+(x)
        + \lambda_x \max_{a \in \mathcal{A}_x} \sum_y P_{xay}\big(r(x,a,y) + V^+(y)\big)\Big] \\
V^-(x) &= \frac{1}{\alpha + \nu_x}\Big[(\nu_x - \lambda_x) V^-(x)
        + (\alpha + \lambda_x) \max_{a \in \mathcal{A}_x} \sum_y P_{xay}\Big(r(x,a,y)
        + \frac{1}{\alpha + \lambda_y}\big(\ell(y) + \lambda_y V^-(y)\big)\Big)\Big].
\end{aligned}
\]
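The lemma says that the choice of $\nu_x$ changes the equation but not its solution $V^+$. A small numerical sketch can make this concrete; it assumes a hypothetical two-state, two-action instance with all actions feasible, and the name `solve_v_plus` is illustrative only.

```python
import numpy as np

# Hypothetical toy instance (not from the slides): 2 states, 2 actions, all feasible.
alpha = 0.1
lam = np.array([1.0, 2.0])
ell = np.array([1.0, 0.0])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],       # P[x, a, y]
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[[0.0, 0.0], [-0.5, -0.5]],     # r[x, a, y]
              [[0.0, 0.0], [1.0, 1.0]]])

def solve_v_plus(nu, tol=1e-12, max_iter=1_000_000):
    """Fixed-point iteration on the uniformized equation for V+.

    With nu equal to lam this is the direct equation for V+.
    """
    v = np.zeros(len(lam))
    for _ in range(max_iter):
        # best[x] = max_a sum_y P[x, a, y] * (r[x, a, y] + v[y])
        best = np.einsum("xay,xay->xa", P, r + v[None, None, :]).max(axis=1)
        new_v = (ell + (nu - lam) * v + lam * best) / (alpha + nu)
        if np.max(np.abs(new_v - v)) < tol:
            return new_v
        v = new_v
    return v

v_direct = solve_v_plus(lam)                              # no uniformization
v_const = solve_v_plus(np.full_like(lam, lam.max()))      # one common rate for all states
v_fast = solve_v_plus(10.0 * lam)                         # much more frequent observation
```

In this instance the three solutions agree up to the iteration tolerance, while larger $\nu_x$ needs more sweeps because the per-sweep shrink factor moves closer to one.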

The event model
If transitions have several "types", the strictly Markovian model requires extending the state space: $x = (s, e)$, with $s$ the actual system state and $e$ the event type. We get:
\[
\begin{aligned}
V^+(s, e) &= \frac{1}{\alpha + \lambda_{s,e}}\Big[\ell(s, e) + \lambda_{s,e} \max_{a \in \mathcal{A}_{s,e}} \sum_{s', e'} P\big((s,e); a; (s',e')\big)
           \Big(r\big((s,e), a, (s',e')\big) + V^+(s', e')\Big)\Big] \\
V^-(s, e) &= \max_{a \in \mathcal{A}_{s,e}} \sum_{s', e'} P\big((s,e); a; (s',e')\big)
           \Big[r\big((s,e), a, (s',e')\big) + \frac{\ell(s', e') + \lambda_{s',e'} V^-(s', e')}{\alpha + \lambda_{s',e'}}\Big].
\end{aligned}
\]

The event model
Question: Under which conditions is it possible to "get rid" of the event part in the state representation?
Is it possible that
\[
V^+(s, e) = V^+(s) \quad \forall e\,?
\]
