PLATO: Policy Learning using Adaptive Trajectory Optimization / Gregory Kahn et al., ICRA 2017 / Presenter: SeungWoon Kim
Probabilistic 3D Sound Source Mapping using a Moving Microphone Array / IROS 2016 1. SLAM: estimate the hardware's location in the 3D map 2. Sound localization: detect the directions of arriving sound 3. Particle filter: estimate the region where the detected directions converge 4. Sound source region detection
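Step 3 above (the particle filter over converging sound directions) is the most algorithmic part of this recap. Below is a minimal, generic sketch of how bearing-only measurements from a moving array can be fused into a source-region estimate; it is only an illustration under assumed parameters (N_PARTICLES, KAPPA, a von Mises-Fisher-style likelihood), not the paper's implementation.

```python
# Minimal, generic particle-filter sketch for fusing direction-of-arrival
# (bearing-only) measurements from a moving microphone array into a 3D
# sound-source region estimate. Illustration only; all names and constants
# are assumptions, not the IROS 2016 paper's code.
import numpy as np

N_PARTICLES = 2000
KAPPA = 20.0          # concentration of the direction-measurement model

rng = np.random.default_rng(0)
particles = rng.uniform(-5.0, 5.0, size=(N_PARTICLES, 3))   # candidate source positions [m]
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

def update(array_pos, measured_dir):
    """Re-weight and resample particles given one unit direction measurement."""
    global particles, weights
    diff = particles - array_pos                       # vectors array -> candidate source
    pred_dir = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    # von Mises-Fisher-style likelihood: high weight when the predicted
    # direction agrees with the measured direction of arrival.
    weights *= np.exp(KAPPA * pred_dir @ measured_dir)
    weights /= weights.sum()
    # Resampling keeps particles in the region where the direction rays
    # from successive array poses converge.
    idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
    particles = particles[idx] + rng.normal(0.0, 0.05, size=particles.shape)
    weights[:] = 1.0 / N_PARTICLES

# Example: two poses of the moving array both "hear" a source near (2, 1, 0).
for pos in [np.array([0.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0])]:
    true_dir = np.array([2.0, 1.0, 0.0]) - pos
    update(pos, true_dir / np.linalg.norm(true_dir))
print("estimated source region center:", particles.mean(axis=0))
```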
Contents □ Motivation □ Background □ Main Contribution □ Results □ Discussion □ Summary and Q&A
Motivation (1) □ Policy search (via optimization or RL) is used in many robotic tasks ○ Manipulation ○ Self-driving vehicles (image sources: https://am.is.tuebingen.mpg.de/uploads/research_project/image/45/unmounting_wheel.jpg, http://iranjavan.net/wp-content/uploads/2016/08/wdd2.jpg)
Motivation (2) □ What is policy search? ○ A strategy for finding optimal controls for robots and autonomous systems ○ A strategy that combines perception and control □ Two obstacles when using RL in the real world ○ RL is difficult to apply to large non-linear function approximators ○ A partially trained policy can take unreasonable and even unsafe actions → Selecting an appropriate learning method is important!
Background □ Method comparison ○ DAgger - Selects between the teacher and the current policy during training with some probability ○ MPC-guided policy search (MPC-GPS) - Seeks to minimize the KL-divergence between the teacher and policy distributions * KL-divergence is a measure (but not a metric) of the non-symmetric difference between two probability distributions
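For reference, the KL-divergence mentioned above is defined for discrete distributions P and Q as follows; the non-symmetry noted on the slide is visible directly in the formula.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)},
\qquad
D_{\mathrm{KL}}(P \,\|\, Q) \;\neq\; D_{\mathrm{KL}}(Q \,\|\, P)\ \text{in general}
```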
Main Idea (1) □ PLATO ○ Trains neural network policies using an adaptive MPC teacher ○ Teacher: adaptive MPC (Model Predictive Control) * MPC is a traditional optimal control algorithm ○ Algorithm - At each step, the adaptive MPC teacher chooses its action by optimizing the task cost plus a KL-divergence term that keeps it close to the current learner policy - The learner policy is then trained with supervised learning to match the teacher's locally optimal actions
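A minimal, self-contained sketch of the training loop described above, on a toy 1-D point mass instead of a quadrotor. The "MPC" is collapsed to a fixed state-feedback gain, and the KL-regularized teacher is reduced to its scalar-Gaussian closed form (a precision-weighted average of the cost-optimal action and the learner's action); all names and constants (LAMBDA, K_STAR, the dynamics) are assumed simplifications, not the authors' code.

```python
# Toy sketch of the PLATO-style training loop on a 1-D point mass.
# The adaptive teacher acts on the TRUE state and stays close to the learner
# via a KL-style penalty, while the learner is trained with plain supervised
# learning on noisy OBSERVATIONS.
import numpy as np

rng = np.random.default_rng(0)
DT, EPISODES, HORIZON = 0.1, 20, 50
LAMBDA = 2.0                      # weight of the KL-style penalty toward the learner
K_STAR = np.array([2.0, 2.8])     # stand-in for MPC: a fixed stabilizing state-feedback gain

theta = np.zeros(2)               # linear learner policy: u = -theta . o
data_obs, data_act = [], []

for ep in range(EPISODES):
    x = np.array([rng.uniform(-2, 2), 0.0])          # true state: [position, velocity]
    for t in range(HORIZON):
        o = x + rng.normal(0.0, 0.1, size=2)         # noisy observation seen by the learner
        u_star = float(-K_STAR @ x)                  # cost-only "MPC" action (uses true state)
        u_pol = float(-theta @ o)                    # current learner action (uses observation)
        # KL-regularized teacher: for scalar Gaussians with unit precision this
        # reduces to a precision-weighted average, so training stays safe while
        # the visited states gradually match the learner's own distribution.
        u_exec = (u_star + LAMBDA * u_pol) / (1.0 + LAMBDA)
        data_obs.append(o)
        data_act.append(u_star)                      # supervised label = cost-only teacher action
        # Simple point-mass dynamics with a little process noise.
        x = np.array([x[0] + DT * x[1], x[1] + DT * u_exec]) + rng.normal(0, 0.01, 2)
    # Standard supervised learning: least-squares fit of the learner to the teacher labels.
    O, U = np.asarray(data_obs), np.asarray(data_act)
    theta = -np.linalg.lstsq(O, U, rcond=None)[0]

print("learned gain:", theta, "teacher gain:", K_STAR)
```

Note the asymmetry emphasized on the next slide: the teacher's labels come from the true state x, while the learner is fit only to the noisy observations o.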
Main Idea (2) □ The advantages of this approach ○ The teacher can exploit the true state, while the policy is trained only on the observations ○ We can choose a teacher that remains safe and stable, avoiding dangerous actions during training ○ We can train the final policy with standard, robust supervised learning algorithms
Results (1)
Results (2) □ Approach ○ Task: a series of simulated quadrotor navigation tasks (with laser, camera) ○ Comparison methods - DAgger - Coaching algorithm - MPC-GPS - Standard supervised learning ○ Environments: winding canyon with randomized turns, dense forest of cylindrical trees - Canyon: changes direction by up to π/4 radians every 0.5 m - Forest: composed of 0.5 m radius cylinders with an average spacing of 2.5 m
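To make the environment parameters above concrete, here is a tiny sketch that generates a random canyon centerline and a cylinder forest with roughly the stated statistics. It only illustrates the geometry under assumed sampling choices (uniform turn angles, a jittered grid for the forest) and is not the authors' environment code.

```python
# Illustrative generators for the two evaluation environments described above.
# Assumed sampling choices (uniform turns, jittered-grid forest), not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def canyon_centerline(n_segments=200, step=0.5, max_turn=np.pi / 4):
    """Centerline that changes heading by up to max_turn radians every `step` meters."""
    heading, pos, points = 0.0, np.zeros(2), [np.zeros(2)]
    for _ in range(n_segments):
        heading += rng.uniform(-max_turn, max_turn)
        pos = pos + step * np.array([np.cos(heading), np.sin(heading)])
        points.append(pos)
    return np.array(points)

def cylinder_forest(size=50.0, spacing=2.5, radius=0.5, jitter=0.8):
    """Cylinders of given radius on a jittered grid with ~`spacing` m average spacing."""
    grid = np.arange(0.0, size, spacing)
    xx, yy = np.meshgrid(grid, grid)
    centers = np.stack([xx.ravel(), yy.ravel()], axis=1)
    centers += rng.uniform(-jitter, jitter, size=centers.shape)
    return centers, radius

line = canyon_centerline()
trees, r = cylinder_forest()
print("canyon length ~", 0.5 * (len(line) - 1), "m;", len(trees), "trees of radius", r, "m")
```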
Results (3)
Results (4) □ Evaluation (focusing on PLATO) ○ Learns effective policies faster and converges to a better solution than the other methods ○ Experiences less than one crash per episode ○ Successfully learns policies, outperforming prior methods and minimizing the number of crashes
Results (5)
Discussion □ Advantages ○ Benefits from the robustness of MPC * minimizes catastrophic failures at training time ○ Uses a different set of observations than MPC * the policy can be trained directly on raw input from onboard sensors, forcing it to perform both perception and control □ Disadvantages ○ Difficult to apply in most real-world scenarios * the teacher requires full state knowledge during training □ Outlook ○ Possibility of acquiring real-world neural network policies that directly use rich sensory inputs ○ Apply PLATO to real physical platforms
Summary and Q&A □ Any questions?