DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning
Lex Fridman (fridman@mit.edu)
GTC 2017, May 11
Americans spend 8 billion hours stuck in traffic every year.
Goal: Deep Learning for Everyone
Accessible and fun: seconds to start, an eternity* to master.
http://cars.mit.edu or search for: “DeepTraffic”
* estimated time to discover the globally optimal solution
Goal: Deep Learning for Everyone
• To Play: (screenshot)
• To Win: (screenshot)
Machine Learning from Human and Machine: from Memorization to Understanding
http://cars.mit.edu/deeptesla
Naturalistic Driving Data
• Teslas instrumented: 18
• Hours of data: 6,000+
• Distance traveled: 140,000+ miles
• Video frames: 2+ billion
• Autopilot: ~12%
Naturalistic Driving Data
http://cars.mit.edu/deeptesla
• Localization and Mapping: Where am I?
• Scene Understanding: Where/who/what/why of everyone else?
• Movement Planning: How do I get from A to B?
• Driver State: What’s the driver up to?
• Communication: How do I convey intent to the driver and to the world?
Autonomous Driving: A Hierarchical View
Paden, B., Čáp, M., Yong, S.Z., Yershov, D., Frazzoli, E. “A Survey of Motion Planning and Control Techniques for Self-Driving Urban Vehicles.” IEEE Transactions on Intelligent Vehicles 1.1 (2016): 33-55.
Applying Deep Reinforcement Learning to Micro-Traffic Simulation
Reference: http://www.traffic-simulation.de
Formulating Driving as a Reinforcement Learning Problem
How do we formalize and learn driving?
Philosophical Motivation for Reinforcement Learning
• Takeaway from supervised learning: neural networks are great at memorization and not (yet) great at reasoning.
• Hope for reinforcement learning: brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
(Deep) Reinforcement Learning
Pros:
• Cheap: very little human annotation is needed.
• Robust: can learn to act under uncertainty.
• General: can (seemingly) deal with (huge) raw sensory input.
• Promising: our current best framework for achieving “intelligence”.
Cons:
• Constrained by formalism: have to formally define the state space, the action space, the reward, and the simulated environment.
• Huge data: have to be able to simulate (in software or hardware) or have a lot of real-world examples.
Agent and Environment
At each step the agent:
• Executes action
• Receives observation (new state)
• Receives reward
The environment:
• Receives action
• Emits observation (new state)
• Emits reward
References: [80]
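A minimal sketch of this agent–environment loop; the `Env`/`Agent` classes, their method names, and the toy reward rule here are illustrative assumptions, not from the slides:

```python
import random

class Env:
    """Toy environment: state is a step counter, episode ends after 10 steps."""
    def reset(self):
        self.t = 0
        return 0  # initial observation (state)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # arbitrary reward rule
        done = self.t >= 10                   # terminal state reached
        return self.t, reward, done           # emit new state and reward

class Agent:
    def act(self, state):
        return random.choice([0, 1])  # placeholder policy

env, agent = Env(), Agent()
state, done, total_reward = env.reset(), False, 0.0
while not done:
    action = agent.act(state)               # agent executes action
    state, reward, done = env.step(action)  # agent receives new state + reward
    total_reward += reward
print(total_reward)
```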
Markov Decision Process
$s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n$
(state $s$, action $a$, reward $r$; $s_n$ is the terminal state)
References: [84]
Major Components of an RL Agent
An RL agent may include one or more of these components:
• Policy: agent’s behavior function
• Value function: how good is each state and/or action
• Model: agent’s representation of the environment
$s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n$
(state $s$, action $a$, reward $r$; $s_n$ is the terminal state)
Robot in a Room
Actions: UP, DOWN, LEFT, RIGHT
When moving UP: 80% move UP, 10% move LEFT, 10% move RIGHT (sketched in code below)
• Reward +1 at [4,3], -1 at [4,2]
• Reward -0.04 for each step
• What’s the strategy to achieve max reward?
• What if the actions were deterministic?
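A minimal sketch of this gridworld’s stochastic transition rule, assuming the classic 4×3 layout of this problem (1-indexed [x, y] coordinates, with a wall at [2, 2]); the helper names are hypothetical:

```python
import random

REWARDS = {(4, 3): +1.0, (4, 2): -1.0}  # terminal rewards from the slide
STEP_REWARD = -0.04                     # per-step reward from the slide
MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# Perpendicular "slip" directions for each intended action.
SLIPS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
         "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

def step(state, action):
    """80% move as intended, 10% slip to each perpendicular side."""
    roll = random.random()
    if roll < 0.8:
        actual = action
    elif roll < 0.9:
        actual = SLIPS[action][0]
    else:
        actual = SLIPS[action][1]
    nx, ny = state[0] + MOVES[actual][0], state[1] + MOVES[actual][1]
    # Bumping into the wall or the grid border leaves the state unchanged.
    if not (1 <= nx <= 4 and 1 <= ny <= 3) or (nx, ny) == (2, 2):
        nx, ny = state
    reward = REWARDS.get((nx, ny), STEP_REWARD)
    done = (nx, ny) in REWARDS
    return (nx, ny), reward, done

print(step((1, 1), "UP"))  # e.g. ((1, 2), -0.04, False)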
Is this a solution?
• Only if actions are deterministic
• Not in this case (actions are stochastic)
• Solution/policy: a mapping from each state to an action
Optimal policy (shown for per-step rewards of -2, -0.1, -0.04, -0.01, and +0.01)
As the step penalty shrinks, the optimal policy shifts from rushing toward the nearest terminal state (even the -1 state, when each step costs -2) to taking longer, safer routes; with a positive step reward (+0.01) the agent avoids terminating at all.
Value Function
• Future reward:
$R = r_1 + r_2 + r_3 + \cdots + r_n$
$R_t = r_t + r_{t+1} + r_{t+2} + \cdots + r_n$
• Discounted future reward (environment is stochastic):
$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^{n-t} r_n = r_t + \gamma (r_{t+1} + \gamma (r_{t+2} + \cdots)) = r_t + \gamma R_{t+1}$
• A good strategy for an agent is to always choose an action that maximizes the (discounted) future reward
References: [84]
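A quick numeric sketch of the recursive identity $R_t = r_t + \gamma R_{t+1}$; the reward values and $\gamma = 0.9$ below are made up for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Compute R_t = r_t + gamma * R_{t+1} by folding from the final step back."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

# Example: three steps of reward.
print(discounted_return([1.0, 0.0, 2.0]))  # 1.0 + 0.9*(0.0 + 0.9*2.0) = 2.62
```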
Q-Learning
• State-action value function: $Q(s,a)$
• Expected return when starting in $s$, performing $a$, and following the policy thereafter
• Q-Learning: use any policy to estimate $Q$ that maximizes future reward (sketched below):
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$
($\alpha$: learning rate, $\gamma$: discount factor, $r_{t+1}$: reward, $s_t$: old state, $s_{t+1}$: new state)
• $Q$ directly approximates $Q^*$ (Bellman optimality equation)
• Independent of the policy being followed
• Only requirement: keep updating each $(s,a)$ pair
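A minimal sketch of this tabular update rule; the action set and hyperparameter values are assumptions for illustration:

```python
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)], initialized to 0
ALPHA, GAMMA = 0.1, 0.9  # learning rate, discount factor
ACTIONS = [0, 1, 2, 3]

def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```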
Exploration vs Exploitation
• Key ingredient of reinforcement learning
• A deterministic/greedy policy won’t explore all actions
  • We don’t know anything about the environment at the beginning
  • Need to try all actions to find the optimal one
• Maintain exploration: use soft policies instead: $\pi(s,a) > 0$ for all $(s,a)$
• ε-greedy policy (sketched below):
  • With probability 1-ε perform the optimal/greedy action
  • With probability ε perform a random action
  • Will keep exploring the environment
  • Slowly move it towards a greedy policy: ε → 0
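A minimal sketch of ε-greedy action selection with decay toward the greedy policy; the decay schedule and constants are illustrative assumptions:

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)] table, as in the update sketch above

def epsilon_greedy(s, actions, epsilon):
    """With probability epsilon explore a random action; otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(actions)             # explore
    return max(actions, key=lambda a: Q[(s, a)])  # exploit

# Slowly anneal epsilon toward 0 so the policy becomes greedy over time.
epsilon, min_epsilon, decay = 1.0, 0.05, 0.999
for episode in range(10_000):
    epsilon = max(min_epsilon, epsilon * decay)
    # ... run one episode, choosing actions with epsilon_greedy(...)
```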
Q-Learning: Value Iteration

        A1   A2   A3   A4
  S1    +1   +2   -1    0
  S2    +2    0   +1   -2
  S3    -1   +1    0   -2
  S4    -2    0   +1   +1

References: [84]
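Given such a table, the greedy policy simply picks the highest-valued action in each state (row); a tiny sketch using the values above:

```python
# Q table from the slide: rows are states S1..S4, columns are actions A1..A4.
Q = {
    "S1": {"A1": +1, "A2": +2, "A3": -1, "A4": 0},
    "S2": {"A1": +2, "A2": 0, "A3": +1, "A4": -2},
    "S3": {"A1": -1, "A2": +1, "A3": 0, "A4": -2},
    "S4": {"A1": -2, "A2": 0, "A3": +1, "A4": +1},
}

# Greedy policy: in each state, take the action with the highest Q value.
policy = {s: max(actions, key=actions.get) for s, actions in Q.items()}
print(policy)  # {'S1': 'A2', 'S2': 'A1', 'S3': 'A2', 'S4': 'A3'} (S4 ties A3/A4)
```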