  1. Query-Efficient Imitation Learning for End-to-End Simulated Driving
     Jiakai Zhang, Kyunghyun Cho
     New York University

  2. Overview
     - Introduction
       • End-to-end learning for self-driving
       • Related work
     - Learning method
       • Convolutional neural network
       • Imitation learning using SafeDAgger
     - Experiment
       • Setup
       • Results
     - Conclusion and future work

  3. Introduction
     - End-to-end learning for self-driving
       • Sensory input: images from a front-facing camera
       • Control signals: steering and brake

  4. Introduction
     - Related work
       • Supervised learning
         • ALVINN [Pomerleau 1989]
         • DeepDriving [Chen et al. 2015]
         • End-to-end learning for self-driving cars [Bojarski et al. 2016]
       • Imitation learning
         • DAgger [Ross, Gordon, and Bagnell 2010]
         • SafeDAgger [Zhang and Cho 2017]

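  5. DAgger algorithm
     - Initialize: dataset $D_0$, policy $\hat{\pi}_1$
     - Iteration $i$:
       • Drive with the mixed policy $\pi_i = \beta_i \pi^* + (1 - \beta_i)\hat{\pi}_i$ and label the visited states with the reference policy $\pi^*$ to obtain dataset $D'$
       • Aggregate: $D_i = D' \cup D_{i-1}$
       • Train the next policy $\hat{\pi}_{i+1}$ on $D_i$
     - Return: best policy $\hat{\pi}_i$
     - Disadvantages:
       • Queries the reference policy constantly
       • Safety risk to the environment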
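
The DAgger loop above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' code: the 1-D lateral-offset dynamics, the linear reference policy `expert`, the least-squares learner `fit_linear`, and the geometric schedule beta_i = p^(i-1) are all hypothetical choices for illustration. Because the learner can represent this reference policy exactly, the interesting output is the label count: every state visited in every iteration is labelled by querying the reference policy, which is the first disadvantage listed on the slide.

```python
import random

def expert(s):
    # Hypothetical reference policy pi*: steer back toward the lane centre (s = 0).
    return -0.5 * s

def fit_linear(data):
    # Hypothetical learner: least-squares fit of a = w * s on (state, label) pairs.
    num = sum(s * a for s, a in data)
    den = sum(s * s for s, _ in data) or 1.0
    w = num / den
    return lambda s: w * s

def rollout(policy, rng, steps=50):
    # Toy 1-D lateral-offset dynamics: s_{t+1} = s_t + a_t + noise.
    s, states = rng.uniform(-1.0, 1.0), []
    for _ in range(steps):
        states.append(s)
        s = s + policy(s) + rng.gauss(0.0, 0.1)
    return states

def imitation_error(policy, states):
    # Mean squared deviation from the reference policy on held-out states.
    return sum((policy(s) - expert(s)) ** 2 for s in states) / len(states)

def dagger(iterations=5, p=0.5, seed=0):
    rng = random.Random(seed)
    data = [(s, expert(s)) for s in rollout(expert, rng)]  # D_0
    learner = fit_linear(data)                              # pi_hat_1
    policies = [learner]
    for i in range(1, iterations + 1):
        beta = p ** (i - 1)  # assumed schedule: beta_1 = 1, then geometric decay

        def mixed(s, learner=learner, beta=beta):
            # pi_i = beta_i pi* + (1 - beta_i) pi_hat_i, realised as a per-step random switch
            return expert(s) if rng.random() < beta else learner(s)

        # D': every visited state is labelled by querying pi*
        d_new = [(s, expert(s)) for s in rollout(mixed, rng)]
        data = d_new + data         # D_i = D' U D_{i-1}
        learner = fit_linear(data)  # train pi_hat_{i+1} on D_i
        policies.append(learner)
    val = rollout(expert, rng)
    best = min(policies, key=lambda pi: imitation_error(pi, val))
    return best, len(data)

best, n_labels = dagger()
```

With 5 iterations of 50 steps plus the initial 50-step dataset, the loop collects 300 reference-policy labels, one per visited state.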

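  6. SafeDAgger algorithm
     - Initialize: dataset $D_0$, policy $\hat{\pi}_1$, safety classifier $c_1$
     - Iteration $i$:
       • Drive with $\pi_i = \beta_i \pi^* + (1 - \beta_i)\hat{\pi}_i$; states that the safety classifier $c_i$ marks as not safe form dataset $D'$
       • Aggregate: $D_i = D' \cup D_{i-1}$
       • Train the next policy and safety classifier on $D_i$
     - Return: best policy $\hat{\pi}_i$ and safety classifier $c_i$
     - Advantages:
       • Query-efficient: the reference policy is queried only on states marked not safe
       • Safety feature: the reference policy takes over when the primary policy is likely to deviate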
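
A SafeDAgger-style loop can be sketched in the same kind of toy setting. All names here are hypothetical and this is not the authors' implementation: the reference policy is made nonlinear so that a linear primary policy deviates from it at large offsets, the threshold `TAU` is an assumed value, and `fit_safety` is a deliberately crude stand-in (a learned cutoff on |s|) for the neural safety classifier on the slides. For simplicity the sketch omits the beta_i mixing and lets the safety classifier alone decide who drives. The key difference from DAgger is in `safe_rollout`: the primary policy drives while the classifier says safe, and the reference policy is queried, and takes over, only in states marked not safe.

```python
import random

TAU = 0.001  # hypothetical deviation threshold

def expert(s):
    # Hypothetical reference policy pi*: nonlinear steering back toward s = 0.
    return -0.5 * s - 0.1 * s ** 3

def fit_linear(data):
    # Primary policy: least-squares fit of a = w * s (deliberately too simple).
    num = sum(s * a for s, a in data)
    den = sum(s * s for s, _ in data) or 1.0
    w = num / den
    return lambda s: w * s

def fit_safety(policy, data):
    # Crude stand-in classifier: 1 (safe) when |s| is below a cutoff, where the
    # cutoff is the smallest |s| whose stored label deviates by more than TAU.
    unsafe = [abs(s) for s, a in data if (policy(s) - a) ** 2 > TAU]
    cutoff = min(unsafe) if unsafe else float("inf")
    return lambda s: 1 if abs(s) < cutoff else 0

def safe_rollout(policy, c_safe, rng, steps=50):
    # Safety strategy: primary policy drives while c_safe == 1; otherwise the
    # reference policy takes over and only those states are queried and labelled.
    s, queried = rng.uniform(-2.0, 2.0), []
    for _ in range(steps):
        if c_safe(s) == 1:
            a = policy(s)
        else:
            a = expert(s)
            queried.append((s, a))
        s = s + a + rng.gauss(0.0, 0.1)
    return queried

def safedagger(iterations=5, steps=50, seed=0):
    rng = random.Random(seed)
    data, s = [], rng.uniform(-2.0, 2.0)
    for _ in range(steps):  # D_0 from a reference-policy rollout
        a = expert(s)
        data.append((s, a))
        s = s + a + rng.gauss(0.0, 0.1)
    policy = fit_linear(data)
    c_safe = fit_safety(policy, data)
    for _ in range(iterations):
        d_new = safe_rollout(policy, c_safe, rng, steps)  # D': unsafe states only
        data = d_new + data                               # D_i = D' U D_{i-1}
        policy = fit_linear(data)
        c_safe = fit_safety(policy, data)
    return policy, c_safe, len(data)

policy, c_safe, n_labels = safedagger()
```

Because steps near the lane centre are typically classified as safe, the returned label count stays below the 50 x 6 = 300 labels that querying every visited state would require.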

  7. Safety classifier
     - Safety classifier
       • Define the deviation of the primary policy from the reference policy
       • Define the optimal safety classifier from this deviation
     - Learning the safety classifier
       • Minimize a binary cross-entropy loss
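
As an illustration, the sketch below assumes the deviation is the squared difference between the primary and reference actions, and that the optimal safety label is 1 (safe) when that deviation is within a hypothetical threshold `tau` and 0 (not safe) otherwise. The safety classifier is reduced to a one-feature logistic model fitted by gradient descent on the mean binary cross-entropy; the feature, data, and hyperparameters are illustrative only, not taken from the slides.

```python
import math

def deviation(primary_action, reference_action):
    # Assumed form: squared difference between the two policies' actions.
    return (primary_action - reference_action) ** 2

def optimal_label(primary_action, reference_action, tau):
    # Optimal safety label: 1 (safe) if the deviation stays within tau, else 0.
    return 1 if deviation(primary_action, reference_action) <= tau else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y, eps=1e-12):
    # Binary cross-entropy for one example; p is clipped away from 0 and 1.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def train_safety_classifier(xs, ys, lr=0.5, epochs=2000):
    # Logistic model c_safe(x) = sigmoid(w * x + b), fitted by full-batch gradient
    # descent on the mean BCE (the BCE gradient w.r.t. the logit is p - y).
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical data: feature x = |lateral offset|; the primary policy always
# steers straight (0.0) while the reference policy steers by -0.5 * x.
xs = [i / 10 for i in range(20)]
ys = [optimal_label(0.0, -0.5 * x, tau=0.05) for x in xs]
w, b = train_safety_classifier(xs, ys)
mean_loss = sum(bce(sigmoid(w * x + b), y) for x, y in zip(xs, ys)) / len(xs)
```

With `tau = 0.05`, offsets up to 0.4 are labelled safe and the rest not safe, and the fitted classifier predicts safe near zero offset and not safe at the largest offsets.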

  8. Experiment – Setup
     - TORCS: open-source racing game
     [Figure: training tracks and test tracks]

  9. Experiment – Model
     - Input image: 3x160x72
     - Convolutional layers: 64 filters of 3x3, x4
     - Max pooling: 2x2
     - Convolutional layers: 128 filters of 5x5, x2, producing the feature map
     - Primary policy: fully connected layers x2
     - Safety classifier: fully connected layer
     - Outputs: control signals, environment variables, safety value
     - Optimization algorithm: stochastic gradient descent
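
The slide gives layer sizes but not padding, stride, or exactly where pooling is applied, so the shape arithmetic below is a sketch under those assumptions: stride 1 and "same" padding (k // 2) for every convolution, a single 2x2 max pool with stride 2 between the two convolution groups, and the input read as 3 channels, 72 rows, and 160 columns. It traces the feature-map shape and counts convolution parameters (weights plus biases).

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution or pooling window.
    return (size + 2 * padding - kernel) // stride + 1

# (kind, output channels, kernel size) in the order listed on the slide.
LAYERS = [("conv", 64, 3)] * 4 + [("pool", None, 2)] + [("conv", 128, 5)] * 2

def trace(channels=3, height=72, width=160):
    params = 0
    for kind, out_channels, k in LAYERS:
        if kind == "conv":
            pad = k // 2  # assumed "same" padding, stride 1
            height = conv_out(height, k, 1, pad)
            width = conv_out(width, k, 1, pad)
            params += out_channels * (channels * k * k + 1)  # weights + biases
            channels = out_channels
        else:
            # assumed k x k max pool with stride k (no parameters)
            height = conv_out(height, k, k)
            width = conv_out(width, k, k)
    return (channels, height, width), params

shape, n_params = trace()
```

Under these assumptions the feature map is 128x36x80 and the six convolutional layers hold 727,232 parameters; different padding or pooling choices would change both numbers.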

  10. Results
      [Figure: example safe frames and unsafe frames]

  11. Results
      - Evaluation on test tracks
        1. Mean squared error of steering angle
        2. Damage per lap
        3. Number of laps
        4. Portion of time driven by the reference policy
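
The four evaluation metrics can be computed from a per-frame driving log. The `Frame` record and `summarize` function below are hypothetical (not a TORCS API): the MSE compares the primary policy's steering angle with the reference policy's, and the portion of time driven by the reference policy is taken as the fraction of frames where the safety classifier outputs 0, i.e. where the reference policy takes over.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    steer_primary: float    # steering angle output by the primary policy
    steer_reference: float  # steering angle the reference policy would output
    c_safe: int             # safety classifier output: 1 = safe, 0 = not safe

def summarize(frames, total_damage, laps):
    n = len(frames)
    mse = sum((f.steer_primary - f.steer_reference) ** 2 for f in frames) / n
    reference_share = sum(1 for f in frames if f.c_safe == 0) / n
    # Hypothetical convention: report infinite damage per lap if no lap was completed.
    damage_per_lap = total_damage / laps if laps else float("inf")
    return {
        "mse_steering": mse,
        "damage_per_lap": damage_per_lap,
        "laps": laps,
        "reference_share": reference_share,
    }

frames = [Frame(0.1, 0.0, 1), Frame(-0.2, -0.2, 1), Frame(0.3, 0.0, 0), Frame(0.0, 0.1, 1)]
stats = summarize(frames, total_damage=120.0, laps=3)
```

For these four made-up frames the steering MSE is about 0.0275, the damage per lap is 40.0, and the reference policy drives one frame in four.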

  12. Results: mean squared error of steering angle
      [Plot: MSE (steering angle) vs. # of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  13. Results: damage per lap
      [Plot: damage per lap vs. # of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  14. Results: number of laps
      [Plot: average # of laps vs. # of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  15. Results: portion of time driven by the reference policy
      [Plot: % of time with $c_{\text{safe}} = 0$ vs. # of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  16. Demo

  17. Conclusion
      - Proposed the SafeDAgger algorithm
        • Query-efficient
        • Safety feature
      - End-to-end simulated driving
        • Trained a convolutional neural network to drive in TORCS with traffic
      - Future work
        • Evaluate SafeDAgger in the real world
        • Learn to use temporal information
