Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation
Carlos Guindel, David Martín and José María Armingol
Intelligent Systems Laboratory (LSI) · Universidad Carlos III de Madrid
Sevilla · 23 November 2017

Agenda
• Introduction
• Obstacle detection
• Scene modeling
• Results
• Conclusion
Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation · C. Guindel et al. · ROBOT 2017

Introduction
Automated vehicles
• Highly dynamic, semi-structured environments
• They have to handle complex situations
Obstacle detection
• Convolutional Neural Networks: feature learning; the new paradigm in computer vision
• Classification: a basic requirement; an accurate estimation of the class is essential
• Vision-based approaches: close-to-market assemblies; rich data source for driving tasks

IVVI 2.0 project
Intelligent Vehicle based on Visual Information 2.0
• Multi-layer lidar scanner
• Trinocular stereo camera
• Computer with GPU
• Side-looking cameras
+info: uc3m.es/islab

System overview
• Two main branches intended to run in parallel
• Obstacle detection: features are extracted exclusively from the left stereo image
• Scene modeling: stereo-based 3D reconstruction & flat-ground assumption

Faster R-CNN framework
• Convolutional features are computed only once per image
• A Region Proposal Network (RPN) generates proposals with respect to a fixed set of anchors
• Conv. features in these regions are pooled for classification
• Parameters are learned through a multi-task loss
S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, 2016.
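To make the "fixed set of anchors" concrete, here is a minimal sketch of how such a set could be laid out over a feature map. The function name `make_anchors`, the stride, and the scale and aspect-ratio values are illustrative assumptions; the slides do not state the anchor configuration used in this work.

```python
import math

def make_anchors(stride, feat_h, feat_w,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) tuples in image pixels.

    One set of len(scales) * len(ratios) anchors is placed at every
    feature-map cell. All parameter values are illustrative assumptions.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell, expressed in image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays at scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

The RPN then scores each anchor for objectness and regresses a box offset relative to it, so the number of candidate boxes is fixed by the feature-map size rather than by the image content.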

Viewpoint estimation
• The Faster R-CNN framework was modified to introduce viewpoint inference
C. Guindel, D. Martin, and J. M. Armingol, “Joint object detection and viewpoint estimation using CNN features,” in Proc. of the IEEE International Conference on Vehicular Electronics and Safety (ICVES), 2017, pp. 145–150.

Discrete viewpoint inference
• Every object is assigned a bin: $N_b$ angle bins $\Theta_i, \ldots, \Theta_{N_b}$, with $N_b = 8$
• Training: $\theta_i^0 \rightarrow \Theta_i$
• Inference gives a categorical distribution
• Inference output: $r \in \Delta^{N_b - 1}$
• Final estimation: $\Theta_{i^*} \rightarrow \hat{\theta}$
[Figure: elements of $r$]
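The binning scheme can be sketched as follows. This is a minimal illustration that assumes the $N_b = 8$ bins split $[-\pi, \pi)$ uniformly and that the final estimate is the centre of the most probable bin; the slides do not spell out either choice, and the function names are hypothetical.

```python
import math

N_B = 8  # number of angle bins, as stated on the slide

def angle_to_bin(theta, n_bins=N_B):
    """Training side: map an angle (radians) to a bin index.

    Assumes uniform bins over [-pi, pi).
    """
    wrapped = (theta + math.pi) % (2 * math.pi)  # shift into [0, 2*pi)
    return min(int(wrapped / (2 * math.pi / n_bins)), n_bins - 1)

def softmax(scores):
    """Turn raw scores into a categorical distribution r on the simplex."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_angle(scores, n_bins=N_B):
    """Inference side: pick the most probable bin, return (angle, r).

    Returning the bin centre is an assumption made for illustration.
    """
    r = softmax(scores)
    best = max(range(n_bins), key=lambda i: r[i])
    width = 2 * math.pi / n_bins
    return -math.pi + (best + 0.5) * width, r
```

With eight bins the quantization step is 45 degrees, so a bin-centre estimate of this kind is off by at most 22.5 degrees when the correct bin is selected.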

Joint detection and viewpoint estimation
[Figure: network diagram. Feature map → Proposal → Fixed-size feature vector → Fully connected (FC) layers → three output branches (B. box regression, Class, Viewpoint), with FC layer and Softmax blocks]
• Viewpoint branch output: $N_b \cdot K$ elements ($N_b$ angle bins · $K$ classes)
• Only $N_b \cdot K \cdot 4096$ new weights
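The size of the added viewpoint branch follows directly from the figures on the slide. The sketch below works through the arithmetic and shows how the $N_b$ scores of one class could be read out of the $N_b \cdot K$ output vector. The value `K = 4` and the class-major memory layout are assumptions for illustration only; neither is given in the slides.

```python
N_B = 8        # angle bins (from the slides)
K = 4          # number of classes: hypothetical value for illustration
FC_DIM = 4096  # length of the FC feature vector (from the slide)

def new_viewpoint_weights(n_bins=N_B, n_classes=K, fc_dim=FC_DIM):
    """Weights of one FC layer mapping fc_dim features to
    n_bins * n_classes outputs (bias terms not counted)."""
    return n_bins * n_classes * fc_dim

def bins_for_class(viewpoint_scores, class_index, n_bins=N_B):
    """Slice out the n_bins scores of one class.

    Assumes a class-major layout: [class 0 bins, class 1 bins, ...].
    """
    start = class_index * n_bins
    return viewpoint_scores[start:start + n_bins]
```

Under these assumed values the extra branch adds 8 · 4 · 4096 = 131,072 weights.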

Loss function and training
• Unweighted multi-task loss with five components
• Faster R-CNN loss:
  • Logistic loss for RPN objectness
  • Smooth-L1 loss for RPN b.box regression
  • Logistic loss for class
  • Smooth-L1 loss for b.box regression
• Logistic loss for viewpoint estimation
• Annotations on the viewpoint term: $N_b$ angle bins of the ground-truth class; ground-truth angle bin; normalized on the batch
Joint Object Detection and Viewpoint Estimation using CNN features · Carlos Guindel · ICVES 2017
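The five-term loss can be sketched for a single training sample as below. This is a simplified illustration, not the authors' training code: the function names and argument layout are hypothetical, probabilities are assumed to be already normalized, and the batch normalization mentioned on the slide is left out.

```python
import math

def smooth_l1(pred, target):
    """Smooth-L1 loss summed over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def log_loss(probs, true_index):
    """Logistic (cross-entropy) loss for one sample."""
    return -math.log(probs[true_index])

def multitask_loss(rpn_obj_probs, rpn_obj_label, rpn_box, rpn_box_gt,
                   cls_probs, cls_label, box, box_gt,
                   view_probs, view_bin):
    """Unweighted sum of the five components listed on the slide.

    view_probs holds the probabilities over the N_b bins of the
    ground-truth class only; view_bin is the ground-truth angle bin.
    """
    terms = [
        log_loss(rpn_obj_probs, rpn_obj_label),  # RPN objectness
        smooth_l1(rpn_box, rpn_box_gt),          # RPN b.box regression
        log_loss(cls_probs, cls_label),          # class
        smooth_l1(box, box_gt),                  # b.box regression
        log_loss(view_probs, view_bin),          # viewpoint
    ]
    return sum(terms)
```

Because the sum is unweighted, no extra balancing hyperparameter is introduced when the viewpoint term is added to the four Faster R-CNN terms.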

Scene modeling
3D reconstruction
• Left image, Right image → Disparity → Disparity map → P.C. generator → 3D point cloud

Scene modeling
3D reconstruction: disparity
• SGM stereo matching
• Suitable for environments with lack of texture, illumination changes, etc.

Scene modeling
3D reconstruction: P.C. generator
• Pin-hole + disparity
• We build an XYZRGB cloud from the left image and the disparity map
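The pin-hole plus disparity step can be sketched with the standard rectified-stereo relations $Z = f B / d$, $X = (u - c_x) Z / f$ and $Y = (v - c_y) Z / f$. The function name and the plain nested-list data layout below are assumptions for illustration; the calibration values would come from the actual stereo rig.

```python
def disparity_to_xyzrgb(disparity, left_rgb, f, cx, cy, baseline):
    """Build a list of (X, Y, Z, r, g, b) points in the left camera frame.

    disparity: rows of disparities in pixels (values <= 0 mean invalid)
    left_rgb:  rows of (r, g, b) tuples from the left image
    f, cx, cy: focal length and principal point in pixels
    baseline:  distance between the two cameras in metres
    """
    cloud = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no stereo match for this pixel
            z = f * baseline / d   # depth from disparity
            x = (u - cx) * z / f   # back-project through the pin-hole model
            y = (v - cy) * z / f
            cloud.append((x, y, z) + tuple(left_rgb[v][u]))
    return cloud
```

Depth is inversely proportional to disparity, so distant points (small disparity) carry larger depth uncertainty than nearby ones.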

Scene modeling
Extrinsic parameters auto-calibration
• 3D point cloud → Plane segmentation → Plane model → Calibration from plane → Camera-to-world calibration

Scene modeling
Extrinsic parameters auto-calibration
• Voxel grid downsampling: the cloud from the 3D reconstruction pipeline is downsampled (grid size: 20 cm)
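Voxel grid downsampling can be sketched as follows: every point is assigned to a cubic cell of the given size and each occupied cell is replaced by the centroid of its points. This is a plain-Python illustration of the general technique with the 20 cm grid size from the slide, not the library routine used by the authors.

```python
import math
from collections import defaultdict

def voxel_downsample(points, leaf=0.20):
    """Replace all points falling in the same leaf-sized voxel by their centroid.

    points: iterable of (x, y, z) tuples in metres
    leaf:   voxel edge length in metres (20 cm on the slide)
    """
    voxels = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / leaf) for c in p)
        voxels[key].append(p)
    return [tuple(sum(c) / len(pts) for c in zip(*pts))
            for pts in voxels.values()]
```

Downsampling bounds the point density, which keeps the cost of the later plane fitting roughly independent of image resolution.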

Scene modeling
Extrinsic parameters auto-calibration
• …Pass-through filters: vertical axis 0-2 m, depth axis 0-20 m
• Planar segmentation: RANSAC with a 10 cm threshold and a small angular tolerance
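The two steps on this slide can be sketched in plain Python as below. The pass-through filter keeps points inside a range along one axis, and the RANSAC loop fits a plane with the 10 cm inlier threshold from the slide. The iteration count, the fixed random seed and the function names are assumptions, and the angular tolerance mentioned on the slide is omitted for brevity.

```python
import math
import random

def pass_through(points, axis, lo, hi):
    """Keep the points whose coordinate on the given axis lies in [lo, hi]."""
    return [p for p in points if lo <= p[axis] <= hi]

def plane_from_points(p1, p2, p3):
    """Plane (a, b, c, d) with unit normal through three points, or None."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-9:
        return None  # the three points are (nearly) collinear
    a, b, c = (comp / norm for comp in n)
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d

def ransac_plane(points, threshold=0.10, iterations=200, seed=0):
    """Return (model, inliers) for the plane supported by the most points."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        a, b, c, d = model
        # Point-to-plane distance test with the 10 cm threshold.
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Restricting the cloud to 0-2 m vertically and 0-20 m in depth before fitting makes it more likely that the dominant plane is the road surface rather than a wall or a vehicle side.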

Scene modeling
Extrinsic parameters auto-calibration
• Plane: pitch, roll
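One way pitch and roll could be recovered from a fitted ground plane is sketched below. The conventions are assumptions, since the slide only names the two angles: the plane $ax + by + cz + d = 0$ is expressed in a camera frame with x right, y down and z forward, the normal is oriented upwards, and it is decomposed as $n = (\sin r \cos p,\ -\cos r \cos p,\ \sin p)$. The camera height is added as a by-product.

```python
import math

def extrinsics_from_plane(a, b, c, d):
    """Return (pitch, roll, height) from ground-plane coefficients.

    Assumed camera frame: x right, y down, z forward. Signs and axis
    conventions are illustrative, not taken from the slides.
    """
    norm = math.sqrt(a * a + b * b + c * c)
    a, b, c, d = a / norm, b / norm, c / norm, d / norm
    if b > 0:
        # Flip so the normal points upwards (towards -y).
        a, b, c, d = -a, -b, -c, -d
    pitch = math.asin(max(-1.0, min(1.0, c)))  # tilt about the x axis
    roll = math.atan2(a, -b)                   # tilt about the z axis
    height = abs(d)                            # camera-to-ground distance
    return pitch, roll, height
```

Under the flat-ground assumption, these three values are enough to express camera-frame points in a ground-aligned frame, up to the heading and the horizontal position of the origin.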

Object localization
• Filtered cloud: organized point cloud without points belonging to the ground or too close to the camera
• Object ROI for localization: 11 central rows of the object’s bounding box
• Coordinates on the world’s XY plane for every point $\mathbf{x} = (x, y, z)$
• Top-down view: $\mathbf{x}_{obj} = \operatorname{median}(\mathbf{x})$, $\theta_{obj} = \alpha - \operatorname{atan2}(y_{obj} - x_{obj})$
[Figure: top-down view with world x and y axes, object position $(x_{obj}, y_{obj})$, angle $\theta$ and yaw]
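The localization step can be sketched as below. Two points are assumptions here: the median is taken per coordinate, and the bearing of the object is computed as `atan2(y_obj, x_obj)`, because the argument list of atan2 is not fully legible in the extracted slide text. Function names are hypothetical.

```python
import math
from statistics import median

def central_rows(box_top, box_bottom, n_rows=11):
    """Image rows forming the n_rows central rows of a bounding box."""
    mid = (box_top + box_bottom) // 2
    half = n_rows // 2
    return list(range(max(box_top, mid - half),
                      min(box_bottom, mid + half) + 1))

def locate_object(points_xyz, alpha):
    """Return (x_obj, y_obj, yaw) from the world-frame points of the ROI.

    points_xyz: (x, y, z) points left after ground and near-range filtering
    alpha:      viewpoint angle estimated by the network, in radians
    The yaw formula assumes atan2(y_obj, x_obj) as the object bearing.
    """
    x_obj = median(p[0] for p in points_xyz)
    y_obj = median(p[1] for p in points_xyz)
    yaw = alpha - math.atan2(y_obj, x_obj)
    return x_obj, y_obj, yaw
```

Using the median rather than the mean makes the position estimate less sensitive to background points that fall inside the bounding box.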

Results: Detection and viewpoint estimation
• KITTI Object Detection Benchmark
  • 5,576 images for training and 2,065 for validation
  • Labels for class and orientation available
• Evaluation metric
  • Average Orientation Similarity (AOS)
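For reference, AOS as defined by the KITTI benchmark averages, over 11 recall levels, the best orientation similarity reached at or above each level, where the similarity of a set of detections is the mean of $(1 + \cos \Delta\theta) / 2$ over the detections, with false positives contributing zero. The sketch below illustrates that definition only; the function names and input layout are assumptions and this is not the official evaluation code.

```python
import math

def orientation_similarity(angle_errors, is_true_positive):
    """Mean of (1 + cos(error)) / 2 over all detections at one recall level.

    Detections flagged as false positives contribute zero.
    """
    terms = [(1.0 + math.cos(err)) / 2.0 if tp else 0.0
             for err, tp in zip(angle_errors, is_true_positive)]
    return sum(terms) / len(terms)

def average_orientation_similarity(recall_similarity_pairs):
    """11-point interpolated average over recall levels 0.0, 0.1, ..., 1.0.

    recall_similarity_pairs: list of (recall, similarity) tuples.
    """
    total = 0.0
    for k in range(11):
        level = k / 10
        candidates = [s for rec, s in recall_similarity_pairs
                      if rec >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11
```

Since each similarity term is at most 1 and false positives score 0, AOS is upper-bounded by the detection average precision computed on the same detections.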

Results: Detection and viewpoint estimation
• Two different architectures: ZF (lightweight) and VGG 16-layer (more complex)
• Three different scales (height in pixels): 375, 500, 625 (ms)
• Top-performing comparable method in the KITTI ranking: 88.43 · 66.28 · 63.41 · N.A. · N.A. · N.A. · 2 sec.