  1. Reinforcement Learning-Based End-to-End Parking for Automatic Parking System
  CS885 – Reinforcement Learning
  Paper by: P. Zhang, L. Xiong, Z. Yu, P. Fang, S. Yan, J. Yao, and Y. Zhou (Sensors 2019)
  Presented by: Neel Bhatt

  2. Context and Motivation
   High-density urban parking facilities can benefit from an automated parking system (APS):
   Increased parking safety
   Enhanced utilization rate and convenience
   BS ISO 16787:2016 stipulates that the parking inclination angle be confined within ±3°
   This paper focuses on a DDPG-based end-to-end automated parking algorithm

  3. Related Work
  Path Planning
   Consists of predefined trajectory functions: B-splines, η³-splines, Reeds-Shepp curves
   Involves geometric numerical optimization of the curve parameters subject to vehicle non-holonomic constraints
  Path Tracking
   Often accomplished through feedforward control using a 2-DOF vehicle dynamics model
   Proportional-Integral-Derivative (PID) control
   Sliding Mode Control (SMC)
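As a point of reference for the path-tracking methods above, a minimal PID steering controller regulating lateral offset to a reference path might look like the sketch below; the gains, time step, and saturation limit are illustrative assumptions, not values from the paper.

```python
import numpy as np

class PIDSteering:
    """Minimal PID path-tracking sketch: lateral offset -> steering angle."""

    def __init__(self, kp=1.2, ki=0.05, kd=0.3, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt  # illustrative gains
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, lateral_error):
        """lateral_error: signed distance (m) from the reference path."""
        self.integral += lateral_error * self.dt
        derivative = (lateral_error - self.prev_error) / self.dt
        self.prev_error = lateral_error
        delta = self.kp * lateral_error + self.ki * self.integral + self.kd * derivative
        # Saturate to an assumed +/-0.6 rad steering limit
        return float(np.clip(delta, -0.6, 0.6))

controller = PIDSteering()
print(controller.step(0.25))  # vehicle 0.25 m left of the reference path
```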

  4. Problem Background and MDP Formulation
   The features of the parking spot include T- and L-shaped markings
   In an end-to-end scheme, these features are identified and represented internally
   In this paper, a separate vision-based detection module (with tracking) is used

  5. Problem Background and MDP Formulation
   The state, s, consists of features corresponding to the coordinates of the 4 corners of the desired parking spot
   The action, a, is drawn from the continuous space of steering angles provided by the APS
   The state transition function, T, is unknown and not modelled explicitly
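A minimal sketch of how such a state and action could be encoded, assuming the 4 corner points are flattened into an 8-dimensional vector and the steering angle is saturated at an illustrative ±30° (neither detail is specified on the slide):

```python
import numpy as np

# State s: the 4 corner coordinates of the target spot, flattened to
# [x1, y1, x2, y2, x3, y3, x4, y4].
def make_state(corners):
    corners = np.asarray(corners, dtype=np.float32)
    assert corners.shape == (4, 2), "expected 4 (x, y) corner points"
    return corners.flatten()

# Action a: one continuous steering angle; the bound is an assumption.
MAX_STEER = np.deg2rad(30.0)

def clip_action(a):
    return float(np.clip(a, -MAX_STEER, MAX_STEER))

s = make_state([(1.0, 2.0), (3.0, 2.0), (3.0, 5.0), (1.0, 5.0)])
print(s.shape, clip_action(0.8))  # (8,) and a saturated steering angle
```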

  6. Problem Background and MDP Formulation
   The reward, r, is formulated as: r = R_cp + R_l + R_d
   R_cp: deviation from the center of the parking spot and attitude error
   R_l: line pressing penalty, R_l = −10
   R_d: lateral bias penalty, R_d = −10
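The slide gives the two flat −10 penalties but not the exact expression for R_cp; the sketch below uses a simple negative weighted error as a stand-in for R_cp, with hypothetical weights w1 and w2:

```python
def reward(center_dev, attitude_err_deg, pressing_line, lateral_bias,
           w1=1.0, w2=1.0):
    # R_cp: centre deviation plus attitude error; the exact form is not on
    # the slide, so a negative weighted sum stands in (w1, w2 hypothetical).
    r_cp = -(w1 * abs(center_dev) + w2 * abs(attitude_err_deg))
    r_l = -10.0 if pressing_line else 0.0   # line-pressing penalty (slide)
    r_d = -10.0 if lateral_bias else 0.0    # lateral-bias penalty (slide)
    return r_cp + r_l + r_d

print(reward(center_dev=0.2, attitude_err_deg=1.5,
             pressing_line=False, lateral_bias=False))
```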

  7. Deep Deterministic Policy Gradient (DDPG)
   DDPG is a model-free, off-policy actor-critic algorithm based on the deterministic policy gradient (DPG)
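Being off-policy, DDPG can learn from a replay buffer of stored transitions while exploring with noise added to the deterministic action; a skeletal interaction loop under assumed env/actor/train_step interfaces (none of which are from the paper) might look like:

```python
import random
from collections import deque

buffer = deque(maxlen=100_000)  # replay buffer of (s, a, r, s', done)

def ddpg_interaction_loop(env, actor, train_step, episodes=200, noise_std=0.1):
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Deterministic policy plus Gaussian exploration noise
            a = actor(s) + random.gauss(0.0, noise_std)
            s_next, r, done = env.step(a)
            buffer.append((s, a, r, s_next, done))
            if len(buffer) >= 64:
                batch = random.sample(buffer, 64)  # decorrelated minibatch
                train_step(batch)                  # critic + actor updates
            s = s_next
```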

  8. DDPG – Training Process
   Note that the action features are included as network inputs
   A target Q network is updated based on the hyperparameter τ < 1
   The temporal difference between the target and the Q network is used to perform gradient updates
   The parameters of the Q network are updated by minimizing the MSE loss function, as in DQN
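A minimal PyTorch sketch of the critic update just described: the TD target is built from the target networks, the Q network is fit by MSE as in DQN, and the target parameters track the online ones through a soft update with τ < 1 (the γ and τ values are illustrative):

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.005  # discount and soft-update rate (illustrative)

def critic_update(critic, critic_target, actor_target, critic_opt, batch):
    s, a, r, s_next, done = batch  # tensors sampled from the replay buffer
    with torch.no_grad():
        # TD target from the *target* networks: y = r + gamma * Q'(s', pi'(s'))
        q_next = critic_target(s_next, actor_target(s_next))
        y = r + GAMMA * (1.0 - done) * q_next
    loss = F.mse_loss(critic(s, a), y)  # MSE between TD target and Q(s, a)
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()

def soft_update(target, source, tau=TAU):
    # theta' <- tau * theta + (1 - tau) * theta', with tau < 1
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)
```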

  9. DDPG – Training Process
   The actor is trained using the DPG theorem
   A target π network is updated based on the hyperparameter τ < 1
   The presence of the Q-function gradient over actions points to using this gradient as an error signal to update the actor parameters
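The corresponding actor step, as a sketch: ascending the deterministic policy gradient is equivalent to minimizing −Q(s, π(s)), so the critic's gradient over actions flows back into the actor parameters (interfaces as in the critic sketch above):

```python
def actor_update(actor, critic, actor_opt, s):
    # DPG theorem: chain the gradient of Q w.r.t. the action through the
    # deterministic policy, i.e. maximize Q(s, pi(s)).
    loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
```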

  10. Network Architecture
  [Critic and actor network diagrams]
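The slide's diagrams are not reproduced; a generic PyTorch actor-critic pair consistent with the earlier description (8-D corner-feature state in, bounded steering angle out, and the action concatenated into the critic's input) could look like the following, with all layer sizes assumed:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the 8-D corner-feature state to a bounded steering angle."""
    def __init__(self, state_dim=8, hidden=64, max_steer=0.52):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # bound the raw output to [-1, 1]
        )
        self.max_steer = max_steer  # assumed ~30 deg steering limit, in rad

    def forward(self, s):
        return self.max_steer * self.net(s)

class Critic(nn.Module):
    """Q(s, a): the action is concatenated with the state at the input."""
    def __init__(self, state_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```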

  11. Overall Scheme
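At deployment the trained actor closes the loop with the vision-based detection module from slide 4; a sketch of that pipeline, with camera, detector, and vehicle as hypothetical placeholder interfaces:

```python
import numpy as np

def parking_control_loop(camera, detector, actor, vehicle, max_steps=500):
    """End-to-end pipeline sketch: image -> corner features -> steering."""
    for _ in range(max_steps):
        image = camera.read()
        corners = detector.detect_and_track(image)  # 4 corners of the spot
        if corners is None:
            continue  # spot temporarily lost; tracking bridges short gaps
        state = np.asarray(corners, dtype=np.float32).flatten()  # 8-D state
        steering = actor(state)      # trained DDPG policy, one steering angle
        vehicle.apply_steering(steering)
        if vehicle.parked():
            break
```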

  12. Experimental Evaluation – 60°
   Initial approach angles: 60°, 45°, and 30°
   Attitude inclination error at 60°: −0.747°
   Path planning and tracking approaches such as PID and SMC show > 3° attitude error

  13. Experimental Evaluation – 45° and 30°
   The attitude error remains < 1° for initial attitude angles of 45° and 30°

  14. Discussion and Critique
   Significant improvement in inclination error
   Path planning vs. RL-generated path: tracking issues
   Tracking cannot be customized in unseen scenarios
   Cases where the approach angle is 90°
   Is the claim of the approach being "end-to-end" valid?
   DDPG can learn policies end-to-end, according to the original DDPG paper
   Future directions: inverse RL to mitigate sub-optimal reward convergence due to the handcrafted reward scheme
