WiseMove? A research platform that mimics our autonomous driving

WiseMove: A Framework to Investigate Safe Deep Reinforcement Learning for Autonomous Driving
Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards
University of Waterloo
September 14th, 2019

1 WiseMove?
‣ A research platform that mimics our autonomous driving stack.
‣ Objective: investigate the safety and performance of motion planners trained using deep reinforcement learning.
‣ Features:
  ✓ Hierarchical Decision Making
  ✓ Runtime Verification
  ✓ Reinforcement Learning / Monte Carlo Tree Search (MCTS)

2 Motion Planning Architecture in 100 km Public Drive (2018)
‣ Motion Planner (abstracted): no learning component.
‣ Input: measurements, perceptions, etc.
‣ Behaviour Planner: produces the high-level decision.
‣ Local Planner: produces the reference trajectories.

3 WiseMove Architecture
‣ Deep models are trained by deep reinforcement learning.
‣ Motion Planner (w/o MCTS): Deep Model for Decision Making → Option (high-level decision) → Deep Model for Trajectory Generation → reference trajectories
‣ Road Scenario: supplies measurements, perceptions, etc.
[Figure: road scenario with the ego vehicle, an intersection, and stop regions marked with STOP signs]

4 WiseMove Architecture
Motion Planner (w/o MCTS): Deep Model for Decision Making → Option → Deep Model for Trajectory Generation

Option
‣ Five Options: KeepLane, Stop, Wait, Follow, ChangeLane
‣ Components:
  ✓ speed limit, target lane
  ✓ time-out (e.g., 1 sec.)
  ✓ preconditions, e.g., in an option ‘Wait’,
    G((has_stopped_in_stop_region and in_stop_region) U highest_priority)

Road Scenario
‣ Two “two-lane and one-way” roads
‣ All-way stop implemented by the stop region
‣ 0~5 other vehicles
[Figure: road scenario with the ego vehicle, an intersection, and stop regions marked with STOP signs]
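To make the Option components concrete, here is a minimal sketch of how an Option could be represented. It is not taken from the WiseMove code base: the `Option` class, its field names, the `State` type, the numeric speed limits and lane indices, and the state-level `can_start` check are all assumptions made for illustration. In particular, `can_start` only approximates the ‘Wait’ precondition, which the slide states as an LTL-like formula over a whole trace.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical state type: named boolean flags describing the current situation.
State = Dict[str, bool]


@dataclass(frozen=True)
class Option:
    """A high-level manoeuvre with the components listed on the slide."""
    name: str
    speed_limit: float      # placeholder value, assumed to be in m/s
    target_lane: int        # assumed lane index
    timeout_s: float = 1.0  # the slide's example time-out of 1 sec.
    # Simplified state-level entry check (an assumption); the slide gives
    # the actual precondition as an LTL-like formula.
    can_start: Optional[Callable[[State], bool]] = None

    def is_available(self, state: State) -> bool:
        """An option without an entry check is always available."""
        return self.can_start is None or self.can_start(state)


def stopped_in_stop_region(state: State) -> bool:
    """True when the ego vehicle is in the stop region and has stopped there."""
    return state.get("in_stop_region", False) and state.get(
        "has_stopped_in_stop_region", False
    )


# The five Options named on the slide; all numbers are placeholders.
OPTIONS = [
    Option("KeepLane", speed_limit=10.0, target_lane=0),
    Option("Stop", speed_limit=10.0, target_lane=0),
    Option("Wait", speed_limit=0.0, target_lane=0, can_start=stopped_in_stop_region),
    Option("Follow", speed_limit=10.0, target_lane=0),
    Option("ChangeLane", speed_limit=10.0, target_lane=1),
]
```

With this layout, a decision maker can filter `OPTIONS` by `is_available(state)` before choosing among them.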

5 WiseMove Architecture
Motion Planner (w/o MCTS): Deep Model for Decision Making → Deep Model for Trajectory Generation

Runtime Verifier
‣ Checks LTL-like strings until violated.
  ✓ preconditions, e.g., in an option ‘Wait’,
    G((has_stopped_in_stop_region and in_stop_region) U highest_priority)
  ✓ traffic rules, e.g., in a stop region,
    G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))

Road Scenario
‣ An episode ends when:
  ✓ Ego reaches the right end of the road,
  ✓ a traffic rule is violated, or
  ✓ a collision happens.
[Figure: road scenario with the ego vehicle, an intersection, and stop regions marked with STOP signs]
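The stop-region traffic rule above can be monitored on a finite prefix of the trace, which is one way to read "checks until violated". The sketch below is an illustration only, not WiseMove's verifier: the function names, the `State` type, and the choice to treat a still-pending `U` obligation as "not yet violated" are assumptions.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]          # hypothetical: one dict of propositions per time step
Pred = Callable[[State], bool]


def until_violated(trace: List[State], start: int, a: Pred, b: Pred) -> bool:
    """True if `a U b` is already falsified from position `start` on this prefix."""
    for state in trace[start:]:
        if b(state):
            return False   # obligation discharged
        if not a(state):
            return True    # `a` failed before `b` ever held
    return False           # still pending: not (yet) a violation


def globally_until_violated(trace: List[State], trigger: Pred, a: Pred, b: Pred) -> bool:
    """True if G(trigger => (a U b)) is falsified on this prefix."""
    return any(
        trigger(trace[i]) and until_violated(trace, i, a, b)
        for i in range(len(trace))
    )


def in_stop_region(s: State) -> bool:
    return s["in_stop_region"]


def has_stopped(s: State) -> bool:
    return s["has_stopped_in_stop_region"]


def stop_rule_violated(trace: List[State]) -> bool:
    """Monitor for G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))."""
    return globally_until_violated(trace, in_stop_region, in_stop_region, has_stopped)


def step(inside: bool, stopped: bool) -> State:
    return {"in_stop_region": inside, "has_stopped_in_stop_region": stopped}


# Ego enters the stop region, stops, then leaves: no violation.
ok_trace = [step(False, False), step(True, False), step(True, True), step(False, False)]
# Ego drives through the stop region without stopping: violation.
bad_trace = [step(False, False), step(True, False), step(False, False)]
```

Calling `stop_rule_violated` after every simulation step on the trace so far would end the episode as soon as the rule is broken. The scan is quadratic in the trace length, which is acceptable for a sketch but not tuned for long episodes.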

6 WiseMove Architecture
Motion Planner (w/o MCTS): Deep Model for Decision Making → Option (high-level decision) → Deep Model for Trajectory Generation

Deep Model for Decision Making
‣ Choose the ‘best’ Option.
  Input: a state representation
  Output: the learnt ‘best’ Option
‣ Act upon the termination of the current Option (“Next Option?”).

7 WiseMove Architecture
Motion Planner (w/o MCTS): Deep Model for Decision Making → Option (high-level decision) → Deep Model for Trajectory Generation

Deep Model for Trajectory Generation
‣ A deep model is stored for each Option.
  Input: a state representation (simplified)
  Output: reference trajectories, given an Option
‣ Trajectories are generated with a simplified vehicle model.

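8 WiseMove Architecture
Motion Planner (w/o MCTS): Deep Model for Decision Making → Option → Deep Model for Trajectory Generation
[Diagram: over time, the Deep Model for Decision Making answers “Next Option?” with a sequence of Options (KeepLane, Follow, Wait, Stop); the Deep Model for Trajectory Generation turns the current Option into a reference trajectory, which is sent to the road scenario]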

9 Training & Testing Low-level Deep Models
‣ Five Deep Models, one for each Option.
‣ Each model
  ✓ outputs continuous control commands generating the trajectories
  ✓ was trained by reinforcement learning (DDPG) with
    ✓ 20 sec. timeout
    ✓ (additional) preconditions and, if necessary, traffic rules.
[Diagram: Option → Deep Model for Trajectory Generation → reference trajectory]

10 After 100,000 training steps …
[Panels: KeepLane, Stop, Follow, Wait]

11 After 100,000 training steps …
[Table: mean (std) % success after 100,000 training steps, averaged over 100 trials of 100 episodes, for KeepLane, Stop, Follow, Wait]
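The "mean (std) % success, averaged over 100 trials of 100 episodes" statistic can be computed as sketched below. This is a hedged illustration: `success_stats` and `run_episode` are hypothetical names, and the use of the sample standard deviation (rather than the population one) is an assumption, since the slide does not say which was used.

```python
import statistics
from typing import Callable, Tuple


def success_stats(
    run_episode: Callable[[], bool],
    trials: int = 100,
    episodes: int = 100,
) -> Tuple[float, float]:
    """Return (mean, std) of the per-trial success rate in percent.

    `run_episode` is a hypothetical callable that runs one test episode
    and returns True on success. `trials` must be at least 2 for the
    sample standard deviation to be defined.
    """
    rates = []
    for _ in range(trials):
        successes = sum(run_episode() for _ in range(episodes))
        rates.append(100.0 * successes / episodes)
    return statistics.mean(rates), statistics.stdev(rates)
```

Each trial contributes one success percentage, so the reported standard deviation describes the spread between trials, not between individual episodes.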

12 After 1,000,000 training steps …
[Panels: KeepLane, Stop, Follow, Wait]

13 Training & Testing High-level Deep Model
‣ Each low-level deep model is trained a priori for 1,000,000 steps.
‣ One deep model, trained by reinforcement learning (DQN), outputs an Option.
‣ 1 sec. time-out for each option; 20 sec. time-out for an entire episode.
[Diagram: Motion Planner (w/o MCTS). Over time, the Deep Model for Decision Making answers “Next Option?” with a sequence of Options (KeepLane, Follow, Wait, Stop); the Deep Model for Trajectory Generation turns the current Option into a reference trajectory, which is sent to the road scenario]
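The interplay of the two time-outs can be sketched as a two-level control loop: the high-level model picks an Option, the low-level model for that Option runs until the Option terminates or its 1 sec. time-out expires, and the whole episode is cut off after 20 sec. Everything below is an assumption made for illustration (the function signature, the 0.1 s step size, and the stub callables); it is not the WiseMove implementation.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]  # hypothetical state representation


def run_episode(
    choose_option: Callable[[State], str],        # stands in for the high-level (DQN) model
    step: Callable[[str, State], State],          # stands in for the low-level model plus simulator
    option_done: Callable[[str, State], bool],    # Option-specific termination
    episode_done: Callable[[State], bool],        # goal reached, rule violated, or collision
    state: State,
    dt: float = 0.1,                # assumed simulation step in seconds
    option_timeout: float = 1.0,    # 1 sec. time-out for each option
    episode_timeout: float = 20.0,  # 20 sec. time-out for an entire episode
) -> List[Tuple[float, str]]:
    """Run one episode and return the (start time, Option) schedule."""
    # Count integer ticks to avoid floating-point drift in the time-outs.
    max_option_ticks = round(option_timeout / dt)
    max_episode_ticks = round(episode_timeout / dt)
    schedule: List[Tuple[float, str]] = []
    tick = 0
    while tick < max_episode_ticks and not episode_done(state):
        option = choose_option(state)  # "Next Option?"
        schedule.append((tick * dt, option))
        for _ in range(max_option_ticks):
            state = step(option, state)
            tick += 1
            if (
                tick >= max_episode_ticks
                or option_done(option, state)
                or episode_done(state)
            ):
                break
    return schedule
```

A new Option is requested only when the inner loop exits, which mirrors "act upon the termination of the current Option".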

14 Training & Testing High-level Deep Model
[Table: overall performance after 200,000 training steps, averaged over 1000 episodes]

15 With MCTS over Options …
‣ Traverse until the leaf node, with exploration & exploitation.
‣ Simulate!
‣ Backpropagate!
[Diagram: search tree rooted at the current state, with edges labelled by Options (KeepLane, Stop, Wait, Follow, ChangeLane); one branch ends in ⊥]
[Table: overall performance, averaged over 1000 episodes]
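The three phases named above (traverse, simulate, backpropagate) can be sketched as a small Monte Carlo tree search over Options. This is a generic textbook-style sketch, not the WiseMove implementation: the UCB1 selection rule, the exploration constant, the deterministic `model` callable, the depth limit, and the toy model in the usage lines are all assumptions.

```python
import math
import random
from typing import Callable, Dict, Optional, Tuple

OPTIONS = ["KeepLane", "Stop", "Wait", "Follow", "ChangeLane"]
# Hypothetical deterministic model: (state, option) -> (next_state, reward, done)
Model = Callable[[int, str], Tuple[int, float, bool]]


class Node:
    def __init__(self, state: int, parent: Optional["Node"] = None,
                 reward: float = 0.0, done: bool = False) -> None:
        self.state, self.parent = state, parent
        self.reward, self.done = reward, done
        self.children: Dict[str, "Node"] = {}
        self.visits = 0
        self.value = 0.0  # sum of returns backed up through this node


def ucb1(parent: Node, child: Node, c: float = 1.4) -> float:
    """Mean return (exploitation) plus a visit-count bonus (exploration)."""
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)


def mcts(root_state: int, model: Model, n_iter: int = 200,
         max_depth: int = 5, seed: int = 0) -> str:
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node, total, depth = root, 0.0, 0
        # 1. Traverse until a leaf node, balancing exploration and exploitation.
        while (not node.done and depth < max_depth
               and len(node.children) == len(OPTIONS)):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(parent, ch))
            total += node.reward
            depth += 1
        # Expand one untried Option at the leaf.
        if not node.done and depth < max_depth:
            option = rng.choice([o for o in OPTIONS if o not in node.children])
            state, reward, done = model(node.state, option)
            node.children[option] = Node(state, node, reward, done)
            node = node.children[option]
            total += reward
            depth += 1
        # 2. Simulate with uniformly random Options up to the depth limit.
        state, done = node.state, node.done
        while not done and depth < max_depth:
            state, reward, done = model(state, rng.choice(OPTIONS))
            total += reward
            depth += 1
        # 3. Backpropagate the return along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += total
            node = node.parent
    # Recommend the most visited Option at the root.
    return max(root.children, key=lambda o: root.children[o].visits)


def toy_model(state: int, option: str) -> Tuple[int, float, bool]:
    """Toy model: only KeepLane makes progress; anything else ends the episode."""
    if option == "KeepLane":
        return state + 1, 1.0, False
    return state, 0.0, True


best = mcts(0, toy_model)
```

In the toy model every Option other than KeepLane terminates with zero reward, so the search concentrates its visits on KeepLane and recommends it.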

16 Concluding Remarks
‣ Features: Options / Reinforcement Learning / Runtime Verification / Monte Carlo Tree Search (MCTS)
‣ The results are reproducible using the publicly available code at git.uwaterloo.ca/wise-lab/wise-move/
‣ Future work:
  ✓ Comparisons of RL and hand-coded motion planners.
  ✓ Different scenarios, realistic vehicle dynamics, etc.
  ✓ Simulation-to-Real

Thank you for your attention! Q & A

Acknowledgment
This work is supported by the Japan Science and Technology Agency (JST) ERATO project JPMJER1603: HASUO Metamathematics for Systems Design, and by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant: Model-Based Synthesis and Safety Assurance of Intelligent Controllers for Autonomous Vehicles.
