Visual Scene Understanding for Autonomous Driving


SLIDE 1

Visual Scene Understanding for Autonomous Driving

Raquel Urtasun

University of Toronto

Oct 3, 2014

SLIDE 2

Autonomous Driving

State of the art:
  • Localization, path planning, obstacle avoidance

SLIDE 3

Autonomous Driving

State of the art:
  • Localization, path planning, obstacle avoidance
  • Heavy usage of Velodyne and detailed (recorded) maps

SLIDE 4

Autonomous Driving

[Figure: 3D laser scanner]

State of the art:
  • Localization, path planning, obstacle avoidance
  • Heavy usage of Velodyne and detailed (recorded) maps
Goal: autonomous driving with cheap sensors and little prior knowledge

SLIDE 5

Autonomous Driving

[Figure: 3D laser scanner]

State of the art:
  • Localization, path planning, obstacle avoidance
  • Heavy usage of Velodyne and detailed (recorded) maps
Goal: autonomous driving with cheap sensors and little prior knowledge
Problems for computer vision:
  • Stereo, optical flow, visual odometry, structure-from-motion

SLIDE 6

Autonomous Driving

State of the art:
  • Localization, path planning, obstacle avoidance
  • Heavy usage of Velodyne and detailed (recorded) maps
Goal: autonomous driving with cheap sensors and little prior knowledge
Problems for computer vision:
  • Stereo, optical flow, visual odometry, structure-from-motion
  • Object detection, recognition and tracking

SLIDE 7

Autonomous Driving

State of the art:
  • Localization, path planning, obstacle avoidance
  • Heavy usage of Velodyne and detailed (recorded) maps
Goal: autonomous driving with cheap sensors and little prior knowledge
Problems for computer vision:
  • Stereo, optical flow, visual odometry, structure-from-motion
  • Object detection, recognition and tracking
  • 3D scene understanding

SLIDE 9

Benchmarks: KITTI Data Collection

  • Two stereo rigs (1392 × 512 px, 54 cm baseline, 90° opening angle)
  • Velodyne laser scanner, GPS+IMU localization
  • 6 hours of recordings at 10 frames per second!
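As a rough sense of scale (a back-of-the-envelope sketch derived only from the rig parameters above, not stated on the slide): a 90° horizontal opening over 1392 px gives a pinhole focal length of about 696 px, and depth follows from disparity as Z = f·B/d.

```python
# Back-of-the-envelope stereo depth from the rig parameters on this slide.
# Assumption (not stated on the slide): a simple pinhole camera model.
import math

width_px = 1392          # image width
fov_deg = 90.0           # horizontal opening angle
baseline_m = 0.54        # 54 cm stereo baseline

# Pinhole focal length in pixels: f = (w/2) / tan(FOV/2)
f_px = (width_px / 2.0) / math.tan(math.radians(fov_deg / 2.0))

def depth_from_disparity(d_px: float) -> float:
    """Depth in meters for a disparity of d_px pixels: Z = f * B / d."""
    return f_px * baseline_m / d_px

print(f"f ≈ {f_px:.0f} px")                               # ≈ 696 px
print(f"Z(d=1 px)  ≈ {depth_from_disparity(1):.0f} m")    # ≈ 376 m
print(f"Z(d=10 px) ≈ {depth_from_disparity(10):.1f} m")   # ≈ 37.6 m
```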

SLIDE 10

The KITTI Vision Benchmark Suite

SLIDE 11

First Difficulty: Sensor Calibration

[Figure: recording platform with the rigid-body transformations T linking GPS/IMU, Velodyne, and the cameras]

  • Camera calibration [Geiger et al., ICRA 2012]
  • Velodyne ↔ Camera registration
  • GPS+IMU ↔ Velodyne registration
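Registration here means estimating rigid-body transforms that chain together. A minimal sketch of projecting a Velodyne point into a camera (the matrices and intrinsics below are placeholders, not the actual KITTI calibration values):

```python
# Chaining rigid-body calibration transforms: a minimal sketch.
# All matrices below are placeholder values, not KITTI's calibration files.
import numpy as np

T_cam_velo = np.eye(4)   # Velodyne frame -> camera frame (from registration)
T_velo_gps = np.eye(4)   # GPS/IMU frame  -> Velodyne frame (from registration)
K = np.array([[696.0,   0.0, 696.0],   # hypothetical pinhole intrinsics
              [  0.0, 696.0, 256.0],
              [  0.0,   0.0,   1.0]])

def project_velo_point(p_velo):
    """Project a 3D Velodyne point (x, y, z) into pixel coordinates."""
    p_h = np.append(p_velo, 1.0)       # homogeneous coordinates
    p_cam = (T_cam_velo @ p_h)[:3]     # into the camera frame
    uvw = K @ p_cam                    # pinhole projection
    return uvw[:2] / uvw[2]            # normalize by depth

# Chaining registrations gives the GPS/IMU -> camera transform directly:
T_cam_gps = T_cam_velo @ T_velo_gps

print(project_velo_point(np.array([0.5, 0.2, 10.0])))
```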

SLIDE 12

Second Difficulty: Object Annotation

  • 3D object labels: annotators (undergrad students from KIT working for months)
  • Occlusion labels: Mechanical Turk

SLIDE 13

One more Difficulty: Evaluation

More than 200 submissions, 8000 downloads since CVPR 2012!

SLIDE 14

An autonomous system has to sense the environment

SLIDE 15

3D Reconstruction

Goal: given 2 cameras mounted on top of the car, reconstruct the environment in 3D.
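For orientation, a generic off-the-shelf baseline for this task (not the method on the following slides) is semi-global block matching on a rectified pair, e.g. with OpenCV; the filenames below are hypothetical:

```python
# Generic stereo baseline for illustration only; the next slides present
# a joint slanted-plane model, not this block matcher.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # disparity search range in pixels (multiple of 16)
    blockSize=5,          # matching window size
)
# compute() returns fixed-point disparities scaled by 16
disparity = matcher.compute(left, right).astype("float32") / 16.0
```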

SLIDE 16

Joint Stereo, Flow, Occlusion and Segmentation

Slanted-plane MRF with explicit occlusion handling, which also computes an over-segmentation of the image into superpixels.

MRF over continuous variables (slanted planes) and discrete variables (boundary labels, superpixel assignments, outliers).

[Figure: slanted-plane MRF. Each segment is a slanted 3D plane (continuous variable); boundary segments take one of 3 states: occlusion, hinge, coplanar (discrete variable); superpixels from UCM [Arbelaez et al., 2011] and SLIC [Achanta et al., 2010]]

Energy that looks at shape, compatibility and boundary length
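The slanted-plane idea itself is compact: each superpixel carries plane parameters (a, b, c) so that disparity is linear within the segment. A sketch of the plane model and a crude illustration of the three boundary states (variable names are ours, not the paper's):

```python
# Slanted-plane disparity: each superpixel models disparity as a linear
# function of pixel position. Illustrative sketch with our own names.
import numpy as np

def plane_disparity(plane, x, y):
    """d(x, y) = a*x + b*y + c for plane = (a, b, c)."""
    a, b, c = plane
    return a * x + b * y + c

def boundary_label(plane_i, plane_j, xs, ys, tol=1.0):
    """Crude illustration of the 3 boundary states along shared pixels (xs, ys):
    coplanar if the two planes agree everywhere on the boundary, hinge if they
    meet only at the fold, occlusion if they disagree along the whole boundary."""
    di = plane_disparity(np.asarray(plane_i), xs, ys)
    dj = plane_disparity(np.asarray(plane_j), xs, ys)
    if np.all(np.abs(di - dj) < tol):
        return "coplanar"
    if np.abs(di - dj).min() < tol:
        return "hinge"
    return "occlusion"
```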

SLIDE 17

Comparison to the State-of-the-art on KITTI

Error > 3 pixels (non-occluded) on the KITTI Stereo and Flow leaderboards:

  • Stereo (errors ranging from 4.97% down to 2.83%): Ours (Joint), Ours (Stereo), VC-SF [Vogel et al., 2014], PCBP-SS [Yamaguchi et al., 2013], StereoSLIC [Yamaguchi et al., 2013], PR-Sf+E [Vogel et al., 2013], PCBP [Yamaguchi et al., 2012], PR-Sceneflow [Vogel et al., 2013], AARBM [Einecke et al., 2014], wSGM [Spangenberg et al., 2013]
  • Flow (errors ranging from 6.52% down to 2.72%): Ours (Joint), Ours (Flow), VC-SF [Vogel et al., 2014], PR-Sf+E [Vogel et al., 2013], PCBP-Flow [Yamaguchi et al., 2013], MotionSLIC [Yamaguchi et al., 2013], PR-Sceneflow [Vogel et al., 2013], NLTGV-SC [Ranftl et al., 2014], TGV2ADCSIFT [Braux-Zin et al., 2013], BTF-ILLUM [Demetz et al., 2014]

Runtime on 1 core @ 3.5 GHz for the average resolution of 1237.1 × 374.1 pixels:

  Total runtime: Joint 26.3 sec | Stereo-only 4.8 sec | Flow-only 11.0 sec

SLIDE 18

Results on KITTI

[K. Yamaguchi, D. McAllester and R. Urtasun, ECCV 2014]

[Figure: disparity image and flow image; boundary labels: occlusion, hinge, coplanar]

SLIDE 19

An autonomous system has to understand the scene in 3D

SLIDE 20

3D Scene Understanding

Goal: infer from a short (≈10 s) video sequence:
  • Geometric properties, e.g., street orientation
  • Topological properties, e.g., number of intersecting streets
  • Semantic activities, e.g., traffic situations at an intersection
  • 3D objects, e.g., cars

SLIDE 21

Geometric Model

[Figure: the seven model topologies and the geometric parameters of the intersection]

SLIDE 22

Static and Dynamic Observations

Observations:
  • 3D tracklets: generated in 3D from 2D detections by exploiting the orientation as well as the size of the bounding boxes

SLIDE 23

Static and Dynamic Observations

Observations:
  • 3D tracklets: generated in 3D from 2D detections by exploiting the orientation as well as the size of the bounding boxes
  • Segmentation of the scene into semantic labels
  • Lines that follow the dominant orientations in the scene (i.e., reasoning about vanishing points)

SLIDE 25

Static and Dynamic Observations

Observations:
  • 3D tracklets: generated in 3D from 2D detections by exploiting the orientation as well as the size of the bounding boxes
  • Segmentation of the scene into semantic labels
  • Lines that follow the dominant orientations in the scene (i.e., reasoning about vanishing points)

Representation: we reason about the dynamics in bird's-eye view and about the static scene in the image.
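One simple way to lift a 2D detection into the bird's-eye view (an illustrative sketch, not the paper's exact procedure; camera height and intrinsics below are hypothetical) is to back-project the bottom of the bounding box onto the ground plane:

```python
# Lifting a 2D detection to bird's-eye view via the ground plane.
# Illustrative only; the camera parameters below are hypothetical.
import numpy as np

f = 696.0              # focal length in pixels (hypothetical)
cu, cv = 696.0, 256.0  # principal point (hypothetical)
cam_height = 1.65      # camera height above ground in meters (hypothetical)

def bottom_center_to_ground(u: float, v: float):
    """Intersect the viewing ray through the box's bottom-center pixel (u, v)
    with the ground plane y = cam_height (camera looking along +z)."""
    ray = np.array([(u - cu) / f, (v - cv) / f, 1.0])  # depth-normalized direction
    scale = cam_height / ray[1]                        # distance to the ground plane
    X = ray * scale
    return X[0], X[2]   # lateral offset and forward distance (bird's-eye coords)

x, z = bottom_center_to_ground(900.0, 400.0)
print(f"object at x={x:.1f} m, z={z:.1f} m in bird's-eye view")
```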

SLIDE 26

Why high-order semantics?

Certain behaviors are not possible given the traffic "patterns"

SLIDE 27

Why high-order semantics?

Certain behaviors are not possible given the traffic "patterns". We learned those patterns from data; below, examples of traffic patterns learned from data for 4-way intersections.

[Figure: learned traffic patterns 1-11 for 4-way intersections; the arrows represent our concept of a lane]

SLIDE 29

Joint Model

Let a be the traffic pattern, and ln the lane associated with tracklet tn. The road parameters are R = {θ, r, c, w, α}. The joint distribution is

    p(E, R) = p(R) · Σ_a ∏_{n=1}^{N} Σ_{ln} p(tn, ln, a | R) · p(vf | R) p(vc | R) · p(S | R)
               prior        vehicle tracklets                   vanishing points     semantic labels

with E the image evidence.
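As a sketch of how this factorization is evaluated (a toy interface of our own; the real likelihood terms are the ones defined on the following slides):

```python
# Evaluating the joint factorization p(E, R) in log space: a toy sketch
# with placeholder likelihood functions, not the paper's actual terms.
import numpy as np

def log_joint(R, tracklets, patterns, lanes_of,
              log_prior, log_p_track, log_p_vp, log_p_sem):
    """log p(E, R) = log p(R)
                     + log Σ_a ∏_n Σ_ln p(tn, ln, a | R)
                     + log p(vf|R) p(vc|R) + log p(S|R)."""
    per_pattern = []
    for a in patterns:
        # Lanes are marginalized independently per tracklet given pattern a.
        s = 0.0
        for t in tracklets:
            lane_scores = [log_p_track(t, ln, a, R) for ln in lanes_of(a)]
            s += np.logaddexp.reduce(lane_scores)    # Σ over lanes, in log space
        per_pattern.append(s)
    tracklet_term = np.logaddexp.reduce(per_pattern)  # Σ over patterns
    return log_prior(R) + tracklet_term + log_p_vp(R) + log_p_sem(R)
```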

SLIDE 30

Vanishing Points and Segmentation Likelihoods

    p(E, R) = p(R) · Σ_a ∏_{n=1}^{N} Σ_{ln} p(tn, ln, a | R) · p(vf | R) p(vc | R) · p(S | R)
               prior        vehicle tracklets                   vanishing points     semantic labels

  • Make the geometry agree with the vanishing points
  • Make the geometry agree with the segmentation
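A sketch of what "agree with the vanishing points" can mean (an illustrative score, not the paper's p(vf | R)): streets oriented at angle θ should predict a vanishing point near the one detected from the dominant line directions.

```python
# Illustrative vanishing-point agreement score, not the paper's likelihood.
import numpy as np

def vp_from_street_angle(theta, f=696.0, cu=696.0):
    """u-coordinate of the vanishing point of a ground direction at angle
    theta (0 = straight ahead) for a forward-looking pinhole camera."""
    return cu + f * np.tan(theta)

def log_vp_likelihood(theta, u_detected, sigma=20.0):
    """Gaussian agreement between predicted and detected vanishing point."""
    r = vp_from_street_angle(theta) - u_detected
    return -0.5 * (r / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

print(log_vp_likelihood(theta=0.05, u_detected=730.0))
```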

SLIDE 31

Full Graphical Model

The joint distribution is

    p(E, R) = p(R) · Σ_a ∏_{n=1}^{N} Σ_{ln} p(tn, ln, a | R) · p(vf | R) p(vc | R) · p(S | R)
               prior        vehicle tracklets                   vanishing points     semantic labels

with E the image evidence, R the intersection variables, ln the lane index and a the traffic pattern. The vehicle tracklets are a little more complicated than described so far.

SLIDE 32

Tracklet model

[Figure: graphical model of a tracklet, unrolled over frames 1…i; the lane l, traffic pattern a and road parameters connect to per-frame tracklet states, with N tracklets, vanishing points V and semantic labels S]

We reason about:
  • Parked cars: in which spot?
  • Moving vehicles: in which lane, and where in the lane are they?
  • The traffic situation (i.e., the traffic pattern)
Our tracklet formulation p(tn, ln, a | R) combines an HMM with a dynamical system with constraints.
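For reference, a minimal sketch of the HMM ingredient (the textbook forward algorithm over discrete states such as lane or spot assignments; the dynamical-system constraints of the actual model are omitted):

```python
# Generic HMM forward pass over discrete states (e.g., lane assignments).
# This is the textbook algorithm, not the paper's full constrained model.
import numpy as np

def forward_log_likelihood(log_init, log_trans, log_obs):
    """log p(observations) for an HMM.
    log_init:  (S,)   log initial state probabilities
    log_trans: (S, S) log transition matrix, rows = from-state
    log_obs:   (T, S) log observation likelihood per frame and state
    """
    alpha = log_init + log_obs[0]
    for t in range(1, len(log_obs)):
        # alpha[j] = log Σ_i exp(alpha[i] + log_trans[i, j]) + log_obs[t, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_obs[t]
    return np.logaddexp.reduce(alpha)
```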

SLIDE 33

Results: Geometry and Tracklet Estimation

Inference is done via Metropolis-Hastings sampling
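For concreteness, a generic Metropolis-Hastings loop over the road parameters R (a sketch; the Gaussian random-walk proposal is our placeholder choice, and log_target stands in for the log of p(E, R) above):

```python
# Generic Metropolis-Hastings sampler over road parameters R.
# The proposal below is a placeholder, not necessarily the paper's.
import numpy as np

def metropolis_hastings(log_target, R0, n_samples=10000, step=0.1,
                        rng=np.random.default_rng(0)):
    R = np.asarray(R0, dtype=float)
    log_p = log_target(R)
    samples = []
    for _ in range(n_samples):
        R_prop = R + step * rng.standard_normal(R.shape)  # random-walk proposal
        log_p_prop = log_target(R_prop)
        if np.log(rng.uniform()) < log_p_prop - log_p:    # accept/reject
            R, log_p = R_prop, log_p_prop
        samples.append(R.copy())
    return np.array(samples)
```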

Table: Geometry estimation

               Location        Orientation     Overlap         Pattern error
  Method       3-arm   4-arm   3-arm   4-arm   3-arm   4-arm   3-arm   4-arm
  [Geiger11]   4.3 m   5.4 m   3.3°    8.0°    58.7%   56.0%   –       –
  Ours         5.7 m   4.9 m   2.4°    4.3°    61.5%   61.3%   18.2%   19.4%

Table: Tracklet accuracy

               T-L error (all)   T-L error (>10 m)
  Method       3-arm   4-arm     3-arm   4-arm
  [Geiger11]   46.7%   49.9%     17.9%   30.1%
  Ours         15.2%   30.1%      3.6%   14.0%

SLIDE 34

Semantic Scene Understanding

[H. Zhang, A. Geiger and R. Urtasun, ICCV 2013]

SLIDE 35

An autonomous system has to self-localize

SLIDE 36

Motivation

  • Localization is crucial for autonomous systems
  • GPS has limitations in terms of reliability and availability
  • Place recognition techniques use image features or depth maps and a database of previously collected images (e.g., the Google car)
  • We develop an inexpensive technique for localizing to within 3 m in previously unseen regions

SLIDE 37

Humans as an inspiration

  • Humans are able to use a map, combined with visual input and exploration, to localize effectively
  • Detailed, community-developed maps are freely available (OpenStreetMap)
  • How can we exploit maps, combined with visual cues, to localize a vehicle?

SLIDE 38

Probabilistic Localization using Visual Odometry

  • Visual odometry provides a strong source of information for localization
  • But it has issues: over short time periods it can be noisy and highly ambiguous; over long time periods it drifts when integrated
  • We adopt a probabilistic approach to represent and maintain this uncertainty

[Geiger et al., IV 2011]

SLIDE 39

Probabilistic Localization using Visual Odometry

Maps can be considered as a graph:
  • Nodes of the graph represent street segments
  • Edges represent intersections and allowed transitions between these segments
Position is defined by the current street, the distance travelled d, and the orientation θ on that street.
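A minimal sketch of this state representation on a map graph (the street names and numbers below are made up; real maps would come from OpenStreetMap):

```python
# Map-as-graph state for localization: a minimal sketch with made-up data.
from dataclasses import dataclass

# Nodes: street segments. Edges: allowed transitions at intersections.
segments = {
    "king_st_e": {"length_m": 420.0},
    "yonge_st_n": {"length_m": 610.0},
}
transitions = {
    "king_st_e": ["yonge_st_n"],   # at the end of King St. E
    "yonge_st_n": ["king_st_e"],   # one may turn onto the other street
}

@dataclass
class State:
    street: str   # current street segment u
    d: float      # distance travelled along the segment
    theta: float  # orientation relative to the segment

def advance(state: State, delta_d: float):
    """Move along the segment; branch into successors at its end."""
    seg = segments[state.street]
    if state.d + delta_d <= seg["length_m"]:
        return [State(state.street, state.d + delta_d, state.theta)]
    leftover = state.d + delta_d - seg["length_m"]
    return [State(nxt, leftover, 0.0) for nxt in transitions[state.street]]
```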
SLIDE 40

Probabilistic Localization using Visual Odometry

The complete state includes:
  • ut, the current street segment
  • st = (dt, θt, dt−1, θt−1), the current and previous position and orientation on the street segment

Odometry observations are y1:t = (y1, · · · , yt). Localization is formulated as posterior inference:

    p(ut, st | y1:t) ∝ p(yt | ut, st) · Σ_{ut−1} ∫ p(ut | ut−1, st−1) p(st | ut, ut−1, st−1) p(ut−1, st−1 | y1:t−1) dst−1
                        likelihood                street transition     pose transition        previous posterior
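A sketch of this recursion with the integral over st−1 discretized into pose bins (a toy histogram filter; the actual system maintains a more careful mixed discrete/continuous representation):

```python
# Toy histogram filter implementing the localization recursion.
# Streets are discrete; the pose on each street is discretized into B bins.
# A simplification of the paper's mixed discrete/continuous filter.
import numpy as np

def filter_step(prev_post, street_trans, pose_trans, likelihood):
    """One step of p(u_t, s_t | y_1:t).
    prev_post:    dict street -> (B,) posterior over pose bins at t-1
    street_trans: dict street -> dict street -> prob, p(u_t | u_{t-1})
    pose_trans:   (B, B) array p(s_t | s_{t-1}) (assumed shared across streets)
    likelihood:   dict street -> (B,) p(y_t | u_t, s_t)
    """
    pred = {u: np.zeros_like(b) for u, b in prev_post.items()}
    for u_prev, belief in prev_post.items():      # sum over u_{t-1}
        moved = pose_trans @ belief               # integrate over s_{t-1}
        for u, p_street in street_trans[u_prev].items():
            pred[u] += p_street * moved           # street transition
    post = {u: likelihood[u] * pred[u] for u in pred}   # weight by likelihood
    z = sum(b.sum() for b in post.values())             # normalize
    return {u: b / z for u, b in post.items()}
```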

SLIDE 41

Results

[M. Brubaker, A. Geiger and R. Urtasun, CVPR 2013; best-paper runner-up award]


SLIDE 42

Ambiguous Sequences

SLIDE 43

Quantitative Experiments

  Average                         Stereo Odometry   Monocular Odometry   Map Projection
  Position Error                  3.1 m             18.4 m               1.4 m
  Heading Error                   1.3°              3.6°                 –
  Localization Time               36 s              62 s                 –
  Initial Map Size (km of road)   50.0              10.0                 2.0

SLIDE 44

Acknowledgements

Marcus Brubaker, Andreas Geiger, Tamir Hazan, Philip Lenz, David McAllester, Jian Peng, Alex Schwing, Christoph Stiller, Koichiro Yamaguchi, Hongyi Zhang

SLIDE 45

Conclusions

Autonomous systems should:
  • Sense the environment: stereo, flow, layout estimation
  • Recognize the 3D world: detection, segmentation
  • Interact with it
We can do fairly complex reasoning with cheap sensors (i.e., 1 or 2 cameras).

Near future:
  • Close the loop between localization and semantics: use of maps
  • Learning deep structured models
  • Online memory/computation-bounded tracking
  • Real-time: HW accelerators
