Control of a Quadrotor with Reinforcement Learning
Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter
Robotic Systems Lab, ETH Zurich
Presented by Nicole McNabb, University of Waterloo
June 27, 2018
1 / 15
Overview
1. Introduction
2. The Method
3. Empirical Results
4. Summary and Future Work
2 / 15
Introduction
What is a quadrotor?
A quadrotor is an aerial vehicle lifted and steered by four independently driven rotors.
Figure: Quadrotor [1]
3 / 15
Introduction
What is a quadrotor?
High-level goal: train the quadrotor to perform tasks under varying initializations.
This is a policy optimization problem.
Figure: Quadrotor [1]
4 / 15
Introduction
Related Approaches

Deep Deterministic Policy Gradient (DDPG)
- Actor-critic architecture
- Off-policy, model-free
- Deterministic
- Insufficient exploration
- Very slow (if any) convergence

Trust Region Policy Optimization (TRPO)
- Actor-critic architecture
- On-policy, model-free
- Stochastic
- Computationally intensive
- Slow, unreliable convergence

5 / 15
Introduction
A New Approach
Goal: a deterministic policy with
- Fast and stable convergence
- Model-free training
- Extensive exploration
Solution: a method combining the actor-critic architecture with an on-policy deterministic policy gradient algorithm and a new exploration strategy.
6 / 15
The Method
Setup: Continuous State-Action Space
State space: 18-D, modeling
- Orientation as a rotation matrix (9-D)
- Position (3-D)
- Linear velocity of the system (3-D)
- Angular velocity of the system (3-D)
Action space: 4-D, dictating the thrust of each of the four rotors.
7 / 15
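To make the setup concrete, here is a minimal sketch of the 18-D state and 4-D action as flat vectors. The stacking order, the helper name, and the hover example are illustrative assumptions, not taken from [2].

```python
import numpy as np

def make_state(R, p, v, omega):
    """Stack an 18-D state vector: flattened 3x3 rotation matrix (9),
    position (3), linear velocity (3), angular velocity (3)."""
    assert R.shape == (3, 3)
    return np.concatenate([R.ravel(), p, v, omega])  # shape (18,)

# Hypothetical hover state: identity orientation, at rest 1 m above ground.
s = make_state(np.eye(3), np.array([0.0, 0.0, 1.0]), np.zeros(3), np.zeros(3))

# 4-D action: one thrust command per rotor (values/scaling are illustrative).
a = np.array([0.6, 0.6, 0.6, 0.6])
```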
The Method
Exploration
Figure: Exploration Strategy [2]
8 / 15
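The figure itself is not reproduced here. As a rough illustration of on-policy exploration with a deterministic policy, consistent with the branching scheme depicted in [2], the sketch below rolls out one noise-free trajectory under the current policy and branches short rollouts from states along it by perturbing a single action. The env/policy interfaces, the noise model, and all parameters are assumptions.

```python
import numpy as np

def explore(env, policy, T=200, branch_len=20, n_branch=4, sigma=0.1, rng=None):
    """Roll out one noise-free on-policy trajectory, then branch short
    noisy rollouts from randomly chosen states along it.
    `env.reset()`, `env.step(s, a) -> (s_next, r)`, and `policy(s)` are
    hypothetical interfaces."""
    rng = rng or np.random.default_rng()
    main, branches = [], []

    s = env.reset()
    for _ in range(T):
        a = policy(s)                     # deterministic action
        s_next, r = env.step(s, a)
        main.append((s, a, r))
        s = s_next

    for _ in range(n_branch):
        t = rng.integers(len(main))       # pick a junction state on the main path
        s = main[t][0]
        a = policy(s) + sigma * rng.standard_normal(4)  # perturb the first action
        traj = []
        for _ in range(branch_len):
            s_next, r = env.step(s, a)
            traj.append((s, a, r))
            s = s_next
            a = policy(s)                 # follow the policy after the perturbation
        branches.append(traj)

    return main, branches
```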
The Method
Network Training
Figure: Value Network [2]
Figure: Policy Network [2]
Value function training: approximate with Monte-Carlo samples obtained from the current trajectory.
Policy optimization: same idea as TRPO, replacing the KL-divergence with a Mahalanobis metric.
9 / 15
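For concreteness, a minimal sketch of the Monte-Carlo targets used to fit the value network. The discount factor and the bootstrap handling for truncated rollouts are assumptions; the regression itself (network, optimizer) is left abstract.

```python
import numpy as np

def mc_returns(rewards, gamma=0.99, bootstrap=0.0):
    """Discounted Monte-Carlo return targets for one trajectory.
    `bootstrap` is V(s_T) for a truncated rollout (0 if terminal)."""
    G, out = bootstrap, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return np.array(out[::-1])

# Usage: fit the value network by regression onto these targets, e.g.
# loss = mean((V(s_t) - G_t)^2) over the states of the trajectory.
```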
The Method
Learning Algorithm
Algorithm 1: Policy optimization
1: Input: initial value function approximation, initial policy
2: for j = 1, 2, . . . do
3:   Perform exploration, taking actions under the current policy
4:   Compute MC estimates from the current trajectory
5:   Do an approximate value function update
6:   Do a policy gradient update
7: end for
10 / 15
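Putting the steps together, a compact sketch of the loop in Algorithm 1, reusing the hypothetical explore() and mc_returns() helpers above. The value_fn.fit and policy.update interfaces are stand-ins; the actual policy update in [2] is a natural-gradient step under the Mahalanobis metric mentioned earlier.

```python
def train(env, policy, value_fn, n_iters=1000):
    """Sketch of Algorithm 1; `value_fn.fit` and `policy.update` are
    hypothetical interfaces."""
    for j in range(n_iters):
        main, branches = explore(env, policy)                 # step 3: exploration
        trajs = [main] + branches
        targets = [mc_returns([r for (_, _, r) in t]) for t in trajs]  # step 4
        value_fn.fit(trajs, targets)                          # step 5: value update
        policy.update(trajs, value_fn)                        # step 6: policy gradient
```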
Empirical Results
- Training done in simulation
- Testing on two main tasks, performed on a real quadrotor
11 / 15
Summary and Future Work
Summary
Primary contributions:
- A new deterministic, model-free method for training a neural-network control policy for a quadrotor
- Stable and reliable performance on hard tasks, even under harsh initial conditions
12 / 15
Summary and Future Work
Future Research
- Compare the method against PPO as well
- Introduce a more accurate model of the system into the simulation
- Train an RNN to adapt to model errors automatically
13 / 15
Summary and Future Work
References
[1] Crazyflie 2.0. https://www.seeedstudio.com/Crazyflie-2.0-p-2103.html
[2] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. "Control of a Quadrotor with Reinforcement Learning." IEEE Robotics and Automation Letters, June 2017.
14 / 15
Summary and Future Work
Questions?
15 / 15