A Correctness Result for Synthesizing Plans With Loops in - PowerPoint PPT Presentation

A Correctness Result for Synthesizing Plans With Loops in Stochastic Domains
Laszlo Treszkai & Vaishak Belle, University of Edinburgh

Finite State Controllers
• FSCs, such as plans with loops, are powerful and compact representations of action selection, widely used in robotics, video games and logistics
• Examples: cleaning a table (with an arbitrary number of objects), chopping down a tree of unknown thickness
• Lots of work on algorithms for synthesis (e.g., AND/OR bounded search, abstraction)

What if the actions are noisy?
An agent stands on the handrail of a bridge: on one side is the sidewalk, on the other side the river, and the goal is n steps away. With every step taken on the handrail, the agent has a 0.1 probability of falling into the river (an absorbing state) and a 0.9 probability of moving forward one step. However, the agent can deterministically get onto the sidewalk, where moving forward is also deterministic.
Clearly, moving solely on the handrail satisfies the goal with $\Pr = 0.9^n$. But it can do better.
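The arithmetic of the bridge example can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the sidewalk policy is modelled simply as "step down, then walk", which the slide describes as deterministic.

```python
def handrail_success_probability(n: int, p_forward: float = 0.9) -> float:
    """Probability of reaching the goal using only noisy handrail steps.

    Each of the n steps succeeds independently with probability p_forward;
    a single failure (falling into the river) is absorbing.
    """
    return p_forward ** n


def sidewalk_success_probability(n: int) -> float:
    """Getting onto the sidewalk and walking forward are both deterministic."""
    return 1.0


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, handrail_success_probability(n), sidewalk_success_probability(n))
```

The handrail-only probability decays geometrically in n, while the sidewalk policy stays at 1 regardless of the distance.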

[Figure: bridge example, with edge labels Pr = 1 and Pr = 0.9]

How to handle noise?
Lots of approaches exist for FSCs, but many of them are either approximate or do not properly handle non-terminating traces (e.g., they assume failure cannot happen infinitely many times).
Theorem: The AND-OR search algorithm fails if there is at least one history that cannot be extended into a goal history.
Theorem: The AND-OR search algorithm fails if there is at least one looping history.

Planning Problem
$$\mathrm{LGT} \doteq \sum_{\{h \,\mid\, h \text{ is a goal history}\}} \Pr(h)$$
Given a planning problem $\mathcal{Q} = \langle S, A, O, \Delta, \Omega, s_0, G \rangle$, an integer N, and LGT* ∈ (0,1), find a finite-state controller with at most N states such that LGT ≥ LGT* for $\mathcal{Q}$. (Here, only $\Delta$ is stochastic.)
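To make the sum over goal histories concrete, consider a hypothetical one-action retry loop that is not taken from the slides: each attempt reaches the goal with probability `p_goal`, leaves the state unchanged with probability `p_stay`, and otherwise fails irrecoverably. A goal history that succeeds on attempt k then has probability `p_stay**(k-1) * p_goal`, and LGT is the sum of these terms over all k. The sketch below compares a truncated sum with the closed form of the geometric series.

```python
def lgt_partial_sum(p_goal: float, p_stay: float, max_len: int) -> float:
    """Sum the likelihoods of all goal histories with at most max_len attempts."""
    return sum(p_stay ** (k - 1) * p_goal for k in range(1, max_len + 1))


def lgt_closed_form(p_goal: float, p_stay: float) -> float:
    """Limit of the geometric series; requires p_stay < 1."""
    return p_goal / (1.0 - p_stay)


if __name__ == "__main__":
    # Hypothetical numbers: 50% success, 30% "try again", 20% absorbing failure.
    print(lgt_partial_sum(0.5, 0.3, 5))
    print(lgt_closed_form(0.5, 0.3))
```

The example also shows why looping histories matter: the goal histories are infinitely many, so LGT cannot be computed by enumerating them one by one.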

Theorem
Given a planning problem $\mathcal{Q}$, an integer N, and LGT* ∈ (0, 1), the search algorithm PANDOR is sound and complete: every FSC $\mathcal{D}$ returned is N-bounded and has LGT ≥ LGT*, and if there exists an N-bounded controller with LGT ≥ LGT*, then one such FSC will be found.

How? Consider the AND-OR algorithm
• Initially, the algorithm starts with the empty controller $\mathcal{D}$, at the initial controller state $q_0$ and the initial state $s_0$
• The AND function enumerates the outcomes of an action from a given combined state and history, and calls OR to synthesize a controller that is correct for every outcome
• The OR function enumerates the extensions of a controller for the current controller state and observation, and thus selects a next action for the current observation, and then calls AND to test for correctness recursively on the outcomes of the chosen action

The extension: key idea
• Maintain an upper and lower bound for the LGT of the current controller
• Whenever a failing run is encountered, the upper bound is decreased by the likelihood of this run; similarly, a goal run increases the lower bound on LGT
• When the lower bound exceeds the desired correctness likelihood (i.e., LGT*), the current controller is guaranteed to be “good enough”, and the algorithm returns with success
• When the upper bound is lower than LGT*, none of the extensions of the controller is sufficiently good, and we revert the program state to the point of the last non-deterministic choice point
• Need to carefully keep track of looping histories (involved)
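The bound bookkeeping can be illustrated on the bridge example for a single fixed controller, "always step along the handrail". The sketch below is an assumption-laden simplification rather than PANDOR itself: it evaluates one controller instead of searching over extensions, it has no looping histories to track, and the function name and return format are hypothetical. It treats reaching LGT* exactly as success, matching the condition LGT ≥ LGT*.

```python
from typing import Tuple


def check_handrail_controller(n: int, lgt_target: float,
                              p_forward: float = 0.9) -> Tuple[str, float, float]:
    """Walk the run tree of the handrail-only controller, tightening LGT bounds.

    Returns (verdict, lower_bound, upper_bound), where verdict is "accept"
    or "reject".
    """
    lower, upper = 0.0, 1.0
    p_history = 1.0  # likelihood of the history in which the agent has not fallen yet
    for _ in range(n):
        # Failing run: the agent falls on this step (absorbing state),
        # so the upper bound drops by the likelihood of that run.
        upper -= p_history * (1.0 - p_forward)
        if upper < lgt_target:
            return "reject", lower, upper  # no way to reach the target any more
        p_history *= p_forward
    # Goal run: all n steps succeeded, so the lower bound rises by its likelihood.
    lower += p_history
    if lower >= lgt_target:
        return "accept", lower, upper
    return "reject", lower, upper


if __name__ == "__main__":
    print(check_handrail_controller(3, 0.7))
    print(check_handrail_controller(3, 0.8))
```

With n = 3 the single goal run has likelihood 0.9^3 = 0.729, so a target of 0.7 is met, while a target of 0.8 is ruled out as soon as the accumulated failure likelihood pushes the upper bound below it.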

github.com/treszkai/pandor

Conclusions
• New theoretical results on a generic technique for synthesizing FSCs in stochastic environments, allowing for highly granular specifications on termination and goal satisfaction
• Builds on the generic AND-OR bounded search, a generic technique for deterministic environments
• Proved the soundness and completeness of that synthesis algorithm
