

1. Rich Behavioral Models: Illustration on Journey Planning. Mickael Randour, F.R.S.-FNRS & UMONS – Université de Mons, Belgium. March 14, 2019. Workshop – Theory and Algorithms in Graph and Stochastic Games.

2. The talk in one slide
   Strategy synthesis for Markov Decision Processes (MDPs): finding good controllers for systems interacting with a stochastic environment.
   Good? Performance is evaluated through payoff functions. The usual problem is to optimize the expected performance or the probability of achieving a given performance level.
   This is not sufficient for many practical applications, hence several extensions, more expressive but also more complex.
   Aim of this survey talk: give a flavor of classical questions and extensions (rich behavioral models), illustrated on the stochastic shortest path (SSP).

3. Outline
   1 Context, MDPs, strategies
   2 Classical stochastic shortest path problems
   3 Good expectation under acceptable worst-case
   4 Percentile queries in multi-dimensional MDPs
   5 Conclusion

4. Outline recap; next: 1 Context, MDPs, strategies.

5. Multi-criteria quantitative synthesis
   Verification and synthesis: a reactive system to control, an interacting environment, a specification to enforce.
   Model of the (discrete) interaction? Antagonistic environment: 2-player game on a graph. Stochastic environment: MDP.
   Quantitative specifications, for example: reach a state s before x time units (shortest path); minimize the average response time (mean-payoff).
   Focus on multi-criteria quantitative models, to reason about trade-offs and interplays.

6. Strategy (policy) synthesis for MDPs
   [Diagram: informal descriptions of the system and of the environment are modeled as a Markov Decision Process (MDP); the specification is modeled as a winning objective. If a winning strategy exists, that strategy is the controller; if not, empower the system capabilities or weaken the specification requirements.]
   Three questions:
   1 How complex is it to decide if a winning strategy exists?
   2 How complex does such a strategy need to be? Simpler is better.
   3 Can we synthesize one efficiently?

7. Markov decision processes
   MDP D = (S, s_init, A, δ, w):
   - finite sets of states S and actions A,
   - probabilistic transition function δ : S × A → D(S),
   - weight function w : A → Z.
   Run (or play): ρ = s_1 a_1 ... a_{n-1} s_n ... such that δ(s_i, a_i, s_{i+1}) > 0 for all i ≥ 1.
   Set of runs R(D); set of histories (finite runs) H(D).
   Strategy σ : H(D) → D(A) such that, for all h ending in s, Supp(σ(h)) ⊆ A(s).
   [Figure: a four-state example MDP with states s_1, ..., s_4, actions a_1 (weight 2), a_2 (weight -1), b_3 (weight 3), a_3 (weight 0), a_4 (weight 1), and transition probabilities 0.7/0.3 and 0.9/0.1.]
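To make the definition concrete, here is a minimal Python sketch of one possible encoding of an MDP as defined above. The dictionary-based representation, the class name and the toy instance are illustrative choices, not code from the talk.

```python
# Minimal sketch: one possible encoding of an MDP D = (S, s_init, A, delta, w).
from dataclasses import dataclass

@dataclass
class MDP:
    states: set    # S
    s_init: str    # initial state
    actions: dict  # A(s): state -> set of available actions
    delta: dict    # (state, action) -> {successor: probability}
    w: dict        # action -> integer weight

    def check(self):
        # Every enabled (state, action) pair must carry a probability
        # distribution over states.
        for s in self.states:
            for a in self.actions.get(s, set()):
                dist = self.delta[(s, a)]
                assert abs(sum(dist.values()) - 1.0) < 1e-9
                assert all(s2 in self.states for s2 in dist)

# Tiny illustrative instance (not the MDP of the slide): one state `s` whose
# single action `a` loops with probability 0.5 and reaches `t` with probability 0.5.
toy = MDP(states={"s", "t"}, s_init="s",
          actions={"s": {"a"}, "t": set()},
          delta={("s", "a"): {"s": 0.5, "t": 0.5}},
          w={"a": 1})
toy.check()
```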

8. Markov decision processes
   Sample pure memoryless strategy σ on the example MDP of the previous slide.
   Sample run: ρ = s_1 a_1 s_2 a_2 s_1 a_1 s_2 a_2 (s_3 a_3 s_4 a_4)^ω.
   Other possible run: ρ' = s_1 a_1 s_2 a_2 (s_3 a_3 s_4 a_4)^ω.
   Strategies may use finite or infinite memory, and randomness.
   Payoff functions map runs to numerical values:
   - truncated sum up to T = {s_3}: TS^T(ρ) = 2, TS^T(ρ') = 1,
   - mean-payoff: MP(ρ) = MP(ρ') = 1/2,
   - many more.
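A small Python sketch of the two payoff functions mentioned here, evaluated on a finite prefix of the run ρ'. The weights (a_1 = 2, a_2 = -1, a_3 = 0, a_4 = 1, b_3 = 3) are those of the example MDP; the list encoding of runs is an illustrative choice, and since the mean-payoff is defined as a limit over the infinite run, the sketch only averages over a finite prefix.

```python
# Payoff functions on (finite prefixes of) runs, written as alternating
# state/action lists. Weights follow the example MDP: a1=2, a2=-1, a3=0, a4=1.
w = {"a1": 2, "a2": -1, "a3": 0, "a4": 1, "b3": 3}

def truncated_sum(run, target):
    """Sum of action weights until the first visit of `target`; None if never reached."""
    total = 0
    for i in range(0, len(run) - 1, 2):          # run = [s, a, s, a, s, ...]
        state, action = run[i], run[i + 1]
        if state in target:
            return total
        total += w[action]
    return total if run[-1] in target else None  # None stands for +infinity

def mean_payoff_prefix(run):
    """Average weight over a finite prefix (the true MP is the limit over the infinite run)."""
    weights = [w[a] for a in run[1::2]]
    return sum(weights) / len(weights)

# Prefix of rho' = s1 a1 s2 a2 (s3 a3 s4 a4)^omega:
rho_prefix = ["s1", "a1", "s2", "a2", "s3", "a3", "s4", "a4", "s3"]
print(truncated_sum(rho_prefix, {"s3"}))   # 1, as on the slide
print(mean_payoff_prefix(rho_prefix[4:]))  # 0.5 over one loop s3 a3 s4 a4
```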

9. Markov chains
   Once a strategy σ is fixed, the process becomes fully stochastic: a Markov chain (MC) M.
   Its state space is the product of the MDP and the memory of σ.
   An event E ⊆ R(M) has a probability P_M(E); a measurable function f : R(M) → R ∪ {∞} has an expected value E_M(f).

10. Aim of this survey
    Compare different types of quantitative specifications for MDPs, w.r.t. the complexity of the decision problem and w.r.t. the complexity of winning strategies.
    Recent extensions share a common philosophy: a framework for the synthesis of strategies with richer performance guarantees.
    Our work deals with many different payoff functions; this talk focuses on the shortest path problem: not the most involved technically, but with natural applications, and useful to understand the practical interest of each variant.
    Joint work with R. Berthon, V. Bruyère, E. Filiot, J.-F. Raskin, O. Sankur [BFRR17, RRS17, RRS15, BCH+16, Ran16, BRR17].

11. Outline recap; next: 2 Classical stochastic shortest path problems.

12. Stochastic shortest path
    Shortest path problem for weighted graphs: given a state s ∈ S and a target set T ⊆ S, find a path from s to a state t ∈ T that minimizes the sum of the weights along its edges. PTIME algorithms exist (Dijkstra, Bellman-Ford, etc.) [CGR96].
    For the SSP we focus on MDPs with strictly positive weights.
    Truncated sum payoff function, for ρ = s_1 a_1 s_2 a_2 ... and target set T:
    TS^T(ρ) = Σ_{j=1}^{n-1} w(a_j) if s_n is the first visit of T, and TS^T(ρ) = ∞ if T is never reached.
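As a reminder of the classical, non-stochastic case mentioned here, a short Python sketch of Dijkstra's algorithm; the adjacency-dictionary encoding and the function name are illustrative choices, not part of the talk.

```python
# Classical shortest path: Dijkstra's algorithm on a graph given as
# {node: [(neighbour, weight), ...]} with non-negative edge weights.
import heapq

def dijkstra(graph, source, targets):
    """Length of a shortest path from `source` to any node in `targets` (None if unreachable)."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in targets:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, weight in graph.get(u, []):
            if d + weight < dist.get(v, float("inf")):
                dist[v] = d + weight
                heapq.heappush(queue, (d + weight, v))
    return None
```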

13. Planning a journey in an uncertain environment
    [Figure: journey-planning MDP. From home, three options: take the bike (45 minutes, straight to work); take the car (1 minute), which leads to light traffic with probability 0.2, medium traffic with probability 0.7 or heavy traffic with probability 0.1, after which driving takes 20, 30 or 70 minutes respectively; or go to the railway station (2 minutes) and wait for the train (3 minutes), which arrives with probability 0.9 (then relax for 35 minutes to work) and is delayed with probability 0.1 (wait again, or go back home, 2 minutes).]
    Each action takes time; the target is work.
    What kind of strategies are we looking for when the environment is stochastic?
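A sketch, in Python, of how this journey-planning MDP could be encoded and simulated. The dictionary encoding, the state and action names, and the helper below are illustrative; the durations and probabilities are those read off the slide.

```python
# Journey-planning MDP: (state, action) -> (duration in minutes, successor distribution).
import random

journey = {
    ("home", "bike"):       (45, {"work": 1.0}),
    ("home", "car"):        (1,  {"light": 0.2, "medium": 0.7, "heavy": 0.1}),
    ("home", "railway"):    (2,  {"waiting": 1.0}),
    ("waiting", "wait"):    (3,  {"train": 0.9, "waiting": 0.1}),
    ("waiting", "go back"): (2,  {"home": 1.0}),
    ("train", "relax"):     (35, {"work": 1.0}),
    ("light", "drive"):     (20, {"work": 1.0}),
    ("medium", "drive"):    (30, {"work": 1.0}),
    ("heavy", "drive"):     (70, {"work": 1.0}),
}
TARGET = {"work"}

def simulate(strategy, start="home"):
    """Total travel time of one random journey under a pure memoryless strategy."""
    state, total = start, 0
    while state not in TARGET:
        duration, dist = journey[(state, strategy[state])]
        total += duration
        state = random.choices(list(dist), weights=list(dist.values()))[0]
    return total

take_the_car = {"home": "car", "light": "drive", "medium": "drive", "heavy": "drive"}
print(simulate(take_the_car))  # 21, 31 or 71 minutes, depending on traffic
```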

14. SSP-E: minimizing the expected length to target
    SSP-E problem: given an MDP D = (S, s_init, A, δ, w), a target set T and a threshold ℓ ∈ Q, decide whether there exists σ such that E^σ_D(TS^T) ≤ ℓ.
    Theorem [BT91]: the SSP-E problem can be decided in polynomial time. Optimal pure memoryless strategies always exist and can be constructed in polynomial time.

15. SSP-E: illustration
    [Same journey-planning MDP as before.]
    Pure memoryless strategies suffice.
    Taking the car is optimal: E^σ_D(TS^T) = 1 + 0.2·20 + 0.7·30 + 0.1·70 = 33.

16. SSP-E: PTIME algorithm
    1 Graph analysis (linear time): states not connected to T get value ∞ and are removed; states in T get value 0.
    2 Linear programming (LP, polynomial time). One variable x_s for each s ∈ S \ T:
      maximize Σ_{s ∈ S \ T} x_s
      subject to x_s ≤ w(a) + Σ_{s' ∈ S \ T} δ(s, a, s') · x_{s'}, for all s ∈ S \ T and all a ∈ A(s).

17. SSP-E: PTIME algorithm
    Steps 1 (graph analysis) and 2 (LP) as on the previous slide.
    Optimal solution v: v_s is the expected cost from s to T under an optimal strategy.
    Optimal pure memoryless strategy σ_v: σ_v(s) = argmin_{a ∈ A(s)} ( w(a) + Σ_{s' ∈ S \ T} δ(s, a, s') · v_{s'} ).
    Playing optimally = locally optimizing present + future.
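A sketch of this LP for the journey-planning example, solved with scipy.optimize.linprog. It reuses the illustrative dictionary encoding introduced above (repeated here so the snippet runs on its own); the variable names are my own, not the talk's.

```python
# SSP-E linear program for the journey-planning MDP, solved with scipy.
from scipy.optimize import linprog

journey = {
    ("home", "bike"):       (45, {"work": 1.0}),
    ("home", "car"):        (1,  {"light": 0.2, "medium": 0.7, "heavy": 0.1}),
    ("home", "railway"):    (2,  {"waiting": 1.0}),
    ("waiting", "wait"):    (3,  {"train": 0.9, "waiting": 0.1}),
    ("waiting", "go back"): (2,  {"home": 1.0}),
    ("train", "relax"):     (35, {"work": 1.0}),
    ("light", "drive"):     (20, {"work": 1.0}),
    ("medium", "drive"):    (30, {"work": 1.0}),
    ("heavy", "drive"):     (70, {"work": 1.0}),
}
TARGET = {"work"}
states = sorted({s for s, _ in journey} - TARGET)
idx = {s: i for i, s in enumerate(states)}

# One constraint per enabled (state, action):
#   x_s - sum_{s' not in T} delta(s, a, s') * x_{s'} <= w(a).
A_ub, b_ub = [], []
for (s, a), (weight, dist) in journey.items():
    row = [0.0] * len(states)
    row[idx[s]] += 1.0
    for s2, p in dist.items():
        if s2 not in TARGET:
            row[idx[s2]] -= p
    A_ub.append(row)
    b_ub.append(weight)

# Maximizing sum x_s is the same as minimizing -sum x_s.
res = linprog(c=[-1.0] * len(states), A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
v = dict(zip(states, res.x))
print(v["home"])  # ~33.0: expected time to work under an optimal strategy

# Optimal pure memoryless strategy: argmin_a [ w(a) + sum delta(s,a,s') * v_{s'} ].
sigma = {}
for s in states:
    _, sigma[s] = min((weight + sum(p * v.get(s2, 0.0) for s2, p in dist.items()), a)
                      for (s1, a), (weight, dist) in journey.items() if s1 == s)
print(sigma["home"])  # 'car'
```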

18. SSP-E: PTIME algorithm
    Steps 1 (graph analysis) and 2 (LP) as before.
    In practice, value and strategy iteration algorithms are often used: best performance in most cases but exponential in the worst case; fixed-point algorithms based on successive solution improvements [BT91, dA99, HM14].
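A compact sketch of value iteration for the SSP-E objective, assuming the same illustrative (state, action) -> (weight, distribution) dictionary encoding as the sketches above; the stopping criterion and names are my own choices.

```python
# Value iteration for SSP-E (illustrative): repeatedly apply the update
#   x_s <- min_a [ w(a) + sum_{s'} delta(s, a, s') * x_{s'} ]
# until the values stabilise. Assumes every state can reach the target.
def value_iteration(mdp, target, eps=1e-9):
    values = {s: 0.0 for s, _ in mdp}   # target states implicitly have value 0
    while True:
        change = 0.0
        for s in list(values):
            best = min(weight + sum(p * (0.0 if s2 in target else values[s2])
                                    for s2, p in dist.items())
                       for (s1, a), (weight, dist) in mdp.items() if s1 == s)
            change = max(change, abs(best - values[s]))
            values[s] = best
        if change < eps:
            return values

# print(value_iteration(journey, TARGET)["home"])  # ~33.0, matching the LP
```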

19. Traveling without taking too many risks
    [Same journey-planning MDP as before.]
    Minimizing the expected time to destination makes sense if we travel often and being late is not a problem.
    With the car, however, in 10% of the cases the journey takes 71 minutes.
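To illustrate the risk hidden behind the expectation, a tiny Monte Carlo sketch of the "take the car" strategy; the snippet is self-contained, with the same illustrative numbers as above.

```python
# Empirical distribution of the journey time under the "take the car" strategy:
# 1 minute to start, then 20/30/70 minutes of driving with probabilities 0.2/0.7/0.1.
import random
from collections import Counter

times = [1 + random.choices([20, 30, 70], weights=[0.2, 0.7, 0.1])[0]
         for _ in range(100_000)]
print(sum(times) / len(times))   # close to the expectation 33
print(Counter(times))            # about 10% of the journeys take 71 minutes
```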
