CS 188: Artificial Intelligence
Advanced Applications: Robotics
Pieter Abbeel – UC Berkeley
A few slides from Sebastian Thrun, Dan Klein

So Far: Mostly Foundational Methods
Advanced Applications

[DEMO: Race, Short] Autonomous Vehicles
Autonomous vehicle slides adapted from Sebastian Thrun
[DEMO: GC Bad, Good] Grand Challenge: Barstow, CA, to Primm, NV
§ 150-mile off-road robot race across the Mojave desert
§ Natural and manmade hazards
§ No driver, no remote control
§ No dynamic passing

An Autonomous Car
[Figure: sensor and actuator layout: lasers, camera, radar, GPS, GPS compass, IMU, E-stop, computers, control screen, steering motor]
Actions: Steering Control
[Figure: the car's steering angle and velocity relative to a reference trajectory; the error is the offset from that trajectory]

[DEMO: LIDAR] Sensors: Laser Readings
Readings: No Obstacles
[Figure: successive laser scan lines 1, 2, 3 sweeping flat ground]

Readings: Obstacles
[Figure: an obstacle appears as a large height difference ΔZ between nearby laser returns]
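As a rough illustration of the ΔZ idea above (a minimal sketch under assumptions, not the race vehicle's code; the function name, array layout, and the 0.15 m threshold are made up for illustration):

import numpy as np

def detect_obstacles(z, dz_threshold=0.15):
    """Flag scan points whose local height change exceeds a threshold.

    z            -- 1-D array of terrain heights (meters) along one laser scan line
    dz_threshold -- assumed value; an obstacle produces a large Delta-Z between
                    nearby returns, while flat ground does not
    Returns a boolean array: True where an obstacle is suspected.
    """
    dz = np.abs(np.diff(z))                 # height change between adjacent returns
    flags = np.zeros(len(z), dtype=bool)
    flags[1:] = dz > dz_threshold           # raw, per-scan decision (noisy)
    return flags

A raw per-scan decision like this is exactly the kind of noisy measurement that the HMM inference on the next slide is there to clean up.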
Probabilistic Error Model
[Figure: an HMM over vehicle poses: hidden states x_t, x_{t+1}, x_{t+2} in a chain, each with GPS and IMU measurements z_t, z_{t+1}, z_{t+2} as observations]

HMMs for Detection
§ Raw measurements: 12.6% false positives
§ HMM inference: 0.02% false positives
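A minimal sketch of how temporal (HMM) filtering can cut false positives like this. The two-state obstacle/free model, the probabilities, and the function name below are illustrative assumptions, not the race vehicle's actual model; each map cell is treated as a tiny HMM whose noisy per-scan flags are the observations.

def hmm_filter(flags, p_stay=0.99, p_hit=0.8, p_false=0.1, prior=0.05):
    """Forward-algorithm belief that a cell contains an obstacle.

    flags   -- list of noisy per-scan booleans for one map cell
    p_stay  -- P(state unchanged between scans)   (assumed)
    p_hit   -- P(flag=True | obstacle)            (assumed)
    p_false -- P(flag=True | free)                (assumed)
    prior   -- initial P(obstacle)                (assumed)
    Returns the filtered P(obstacle) after each scan.
    """
    b = prior
    beliefs = []
    for f in flags:
        # Time-elapse update: the state may flip with probability 1 - p_stay.
        b = b * p_stay + (1.0 - b) * (1.0 - p_stay)
        # Observation update (Bayes' rule).
        like_obst = p_hit if f else (1.0 - p_hit)
        like_free = p_false if f else (1.0 - p_false)
        b = (like_obst * b) / (like_obst * b + like_free * (1.0 - b))
        beliefs.append(b)
    return beliefs

# A single spurious flag barely moves the belief; a consistent run of flags does.
print(hmm_filter([False, True, False, False]))
print(hmm_filter([True, True, True, True]))

Declaring an obstacle only when the filtered belief is high is the flavor of inference behind the drop from 12.6% to 0.02% false positives.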
Motivating Example
§ How do we execute a task like this? [demo: autorotate / tictoc]

Autonomous Helicopter Flight
§ Key challenges:
  § Track helicopter position and orientation during flight
  § Decide on control inputs to send to the helicopter
Autonomous Helicopter Setup
[Figure: on-board inertial measurement unit (IMU), position sensing, and the link for sending controls out to the helicopter]

HMM for Tracking the Helicopter
§ State: s = (x, y, z, φ, θ, ψ, ẋ, ẏ, ż, φ̇, θ̇, ψ̇)
§ Measurements: 3-D coordinates from vision, 3-axis magnetometer, 3-axis gyro, 3-axis accelerometer
§ Transitions (dynamics): [time-elapse update]
  s_{t+1} = f(s_t, a_t) + w_t   [f encodes helicopter dynamics; w is a probabilistic noise model]
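A minimal sketch of the two HMM updates for such a tracker, written as a particle filter (one standard choice, not necessarily the implementation used here); the function names, the 12-dimensional layout, and the noise handling are assumptions, and f and likelihood are stand-ins for the real dynamics and sensor models.

import numpy as np

def time_elapse(particles, a_t, f, noise_std):
    """One HMM time-elapse update for a particle-based belief over s.

    particles -- (N, 12) array, each row a sample of the state s
    a_t       -- current control input
    f         -- deterministic dynamics model: s_{t+1} = f(s_t, a_t)  (stand-in)
    noise_std -- (12,) standard deviations of the process noise w_t   (assumed)
    """
    nxt = np.array([f(s, a_t) for s in particles])
    return nxt + np.random.randn(*nxt.shape) * noise_std

def observation_update(particles, z_t, likelihood):
    """Reweight and resample particles given measurement z_t.

    likelihood(z_t, s) should return P(z_t | s); here it is an assumed,
    user-supplied model of the vision/magnetometer/gyro/accelerometer sensors.
    """
    w = np.array([likelihood(z_t, s) for s in particles])
    w /= w.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    return particles[idx]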
Helicopter MDP
§ State: s = (x, y, z, φ, θ, ψ, ẋ, ẏ, ż, φ̇, θ̇, ψ̇)
§ Actions (control inputs):
  § a_lon: main rotor longitudinal cyclic pitch control (affects pitch rate)
  § a_lat: main rotor latitudinal cyclic pitch control (affects roll rate)
  § a_coll: main rotor collective pitch (affects main rotor thrust)
  § a_rud: tail rotor collective pitch (affects tail rotor thrust)
§ Transitions (dynamics): s_{t+1} = f(s_t, a_t) + w_t
  [f encodes helicopter dynamics; w is a probabilistic noise model]
§ Can we solve the MDP yet?

Problem: What's the Reward? [demo: hover]
§ Rewards for hovering: (a sketch follows this slide)
§ Rewards for "Tic-Toc"?
  § Problem: what's the target trajectory?
  § Just write it down by hand? [demo: bad]
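A minimal sketch of what a hand-written hover reward can look like (an illustrative assumption, not the reward used in these experiments): quadratic penalties on deviation from the hover target and on any velocity.

import numpy as np

def hover_reward(s, s_target, w_pos=1.0, w_vel=0.1):
    """Quadratic hover reward: 0 at a perfect, motionless hover, negative otherwise.

    s, s_target  -- 12-D state vectors (x, y, z, roll, pitch, yaw + their derivatives)
    w_pos, w_vel -- assumed weights on pose error and on velocity
    """
    pose_err = s[:6] - s_target[:6]      # position and orientation error
    vel = s[6:]                          # linear and angular velocities
    return -(w_pos * np.dot(pose_err, pose_err) + w_vel * np.dot(vel, vel))

A reward like this is easy to write down for hovering; for aerobatics such as the tic-toc, even the target trajectory is hard to specify by hand, which motivates the apprenticeship approach on the next slides.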
[demo: unaligned] Helicopter Apprenticeship?

Probabilistic Alignment using a Bayes' Net
[Figure: the hidden intended trajectory, several expert demonstrations, and the time indices mapping each demonstration onto it]
§ The intended trajectory satisfies the dynamics.
§ Each expert trajectory is a noisy observation of one of the hidden states.
§ But we don't know exactly which one. [Coates, Abbeel & Ng, 2008]
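The model of Coates, Abbeel & Ng (2008) infers the intended trajectory and the unknown time indices jointly in a Bayes' net. As a heavily simplified sketch of just the time-index part (all names and the cost function are assumptions), here is a dynamic-programming, monotone alignment of one demonstration to a current estimate of the intended trajectory:

import numpy as np

def align(demo, intended):
    """Monotone alignment of demo frames to hidden time indices of 'intended'.

    demo, intended -- (T_d, D) and (T_i, D) arrays of states
    Returns, for each demo frame, the index of the intended-trajectory state it
    is treated as a (noisy) observation of. Simplified sketch: squared-error
    emission cost; the index may stay or advance by 1 or 2 at each step.
    """
    Td, Ti = len(demo), len(intended)
    cost = np.full((Td, Ti), np.inf)
    back = np.zeros((Td, Ti), dtype=int)
    emit = lambda t, i: np.sum((demo[t] - intended[i]) ** 2)
    cost[0, 0] = emit(0, 0)
    for t in range(1, Td):
        for i in range(Ti):
            for step in (0, 1, 2):                  # allowed index advances
                j = i - step
                if j >= 0 and cost[t - 1, j] + emit(t, i) < cost[t, i]:
                    cost[t, i] = cost[t - 1, j] + emit(t, i)
                    back[t, i] = j
    # Trace back the best monotone alignment ending anywhere in 'intended'.
    idx = [int(np.argmin(cost[-1]))]
    for t in range(Td - 1, 0, -1):
        idx.append(back[t, idx[-1]])
    return idx[::-1]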
[demo: alignment] Alignment of Samples
§ Result: the inferred sequence is much cleaner!

[demo: airshow] Final Behavior
Quadruped
§ Low-level control problem: moving a foot into a new location → search, where the successor function ~ moving the motors
§ High-level control problem: where should we place the feet?
§ Reward function R(s) = w · f(s)   [25 features]   [Kolter, Abbeel & Ng, 2008]

Apprenticeship Learning
§ Goal: learn the reward function from expert demonstrations
§ Assume the reward is linear in the features: R(s) = w · f(s)
§ Get expert demonstrations
§ Guess an initial policy
§ Repeat (a sketch follows this slide):
  § Find weights w which make the expert look better than the policies found so far
  § Solve the MDP for the new weights w to obtain the next policy
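A minimal sketch of that loop, using the projection variant of apprenticeship learning (Abbeel & Ng, 2004) rather than the exact quadruped implementation; solve_mdp and feature_expectations are stand-ins supplied by the caller.

import numpy as np

def apprenticeship_learning(mu_expert, solve_mdp, feature_expectations,
                            init_policy, eps=1e-3, max_iters=50):
    """Projection variant of apprenticeship learning.

    mu_expert            -- expert's feature expectations, shape (25,)
    solve_mdp(w)         -- stand-in: returns an optimal policy for reward w . f(s)
    feature_expectations -- stand-in: returns a policy's feature expectations
    Returns the final reward weights w and the list of policies tried.
    """
    policies = [init_policy]
    mu = feature_expectations(init_policy)
    mu_bar = mu
    for _ in range(max_iters):
        w = mu_expert - mu_bar              # weights under which the expert looks best
        if np.linalg.norm(w) < eps:         # expert is (nearly) matched: stop
            break
        policy = solve_mdp(w)               # best policy under the current reward
        policies.append(policy)
        mu = feature_expectations(policy)
        # Project mu_bar toward the expert along the new policy's direction.
        d = mu - mu_bar
        mu_bar = mu_bar + (d @ (mu_expert - mu_bar)) / (d @ d) * d
    return w, policies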
[Videos: the quadruped without learning vs. with the learned reward function]