Geometry-Aware Deep Visual Learning Katerina Fragkiadaki
How this talk fits the workshop
• We will discuss new neural architectures for video understanding and for feature learning without human annotations
• We will still use SGD to train the models
What is the goal of computer vision?
Label image pixels, detect and segment objects (image from Bruno Olshausen)
Label image pixels, detect and segment objects. K. He et al., Mask R-CNN, 2017
Registration against known HD maps, 3D object detection, 3D motion forecasting
Image Understanding as Inverse Graphics
A reasonable answer: the goal of computer vision is task-specific
Internet Vision: photos taken by people (and uploaded on the Internet). Mobile (Embodied) Computer Vision: photos taken by a NAO robot during a robot soccer game. Our detectors may not work very well here… Do we have more suitable models for this domain?
Why Embodied Computer Vision Matters
1. Agents that move around in the world, perceive it, and accomplish tasks are (close to) the goal of AI research
2. It may be the key to unsupervised visual feature learning
"We must perceive in order to move, but we must also move in order to perceive." J.J. Gibson, The Ecological Approach to Visual Perception, 1979
Internet and Mobile Perception have developed independently, and each has made great progress
• Internet vision has trained great deep nets for image labelling and object detection + segmentation
• Mobile computer vision has produced great SLAM (Simultaneous Localization and Mapping) methods
Image Understanding as Inverse Graphics? Should we be engineering a different model for every domain?
Image Understanding as Inverse Graphics. Larry Roberts, Machine Perception of Three-Dimensional Solids, MIT, 1965. [Figure: blocks world; input image → image gradient → computed 3D model, rendered from a new viewpoint]
Image Understanding as Inverse Graphics. David Marr, 1982
3D models are impossible and unnecessary. [Figure: steering angle]
"Internal world models which are complete representations of the external environment, besides being impossible to obtain, are not at all necessary for agents to act in a competent manner."
"…(1) eventually computer vision will catch up and provide such world models -- I don't believe this based on the biological evidence presented below, or (2) complete objective models of reality are unrealistic and hence the methods of Artificial Intelligence that rely on such models are unrealistic."
"Intelligence without Reason", Rodney Brooks, IJCAI 1991
25 years later: the iRobot vacuum cleaner is building a map! (Rodney Brooks co-founded iRobot in 1990)
To 3D or not to 3D?
And if to 3D, what 3D representation to use? depth map • surface normals • 3D mesh • 3D point cloud • 3D voxel occupancy
This talk: To 3D, using 3D feature tensors of shape H × W × D × C (3 spatial dimensions, 1 feature dimension)
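As a concrete picture of the data structure, a minimal PyTorch sketch (the tensor sizes here are assumptions for illustration, not the talk's exact settings):

```python
import torch

# A batched 3D feature memory in PyTorch layout (B, C, D, H, W):
# three spatial axes (depth, height, width) plus one channel axis.
memory = torch.zeros(1, 32, 64, 64, 64)
```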
Geometry-Aware Recurrent Networks
1. Hidden state: a 4D deep feature tensor, akin to a 3D map of the scene (a feature map, as opposed to a point-cloud map)
2. Egomotion-stabilized hidden-state updates, driven by the camera motion (R, t)
2D recurrent networks (LSTMs, ConvLSTMs, …). [Figure: hidden states h_t → h_{t+1} → h_{t+2}, each updated from CNN features of the current frame]
4D latent state. [Figure: hidden states h_t → h_{t+1} → h_{t+2}; each update consumes CNN features of the current frame and is stabilized by the egomotion (R, t)]
Geometry-Aware Recurrent Networks (GRNNs) H × W × D × C
GRNNs
• A set of differentiable neural modules that learn to go from 2D to 3D and back
• They bake many SLAM ideas into the neural modules (one full step is sketched in code below)
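At a high level, one GRNN step composes the modules described on the following slides. A hedged sketch with assumed names; `unproject` and `memory_update` are spelled out in the code further below:

```python
# One GRNN step as a composition of the modules sketched below.
# All function names here are assumptions for illustration.
def grnn_step(memory, image_feat2d, R_rel, K, step):
    feat3d = unproject(image_feat2d, K)                  # lift 2D features to 3D
    memory = memory_update(memory, feat3d, R_rel, step)  # egomotion-stabilized fuse
    return memory
```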
Unprojection (2D to 3D)
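A minimal PyTorch sketch of one common way to implement differentiable unprojection: each voxel center is projected into the image with the pinhole intrinsics, and the 2D feature at that pixel is bilinearly sampled into the voxel. The grid extents, depth range, and function name are illustrative assumptions, not the talk's exact implementation:

```python
import torch
import torch.nn.functional as F

def unproject(feat2d, K, grid_min=-1.0, grid_max=1.0, D=32, H=32, W=32):
    """Lift a 2D feature map (B, C, Hi, Wi) into a 3D grid (B, C, D, H, W).
    K is a 3x3 pinhole intrinsics matrix (assumed known)."""
    B, C, Hi, Wi = feat2d.shape
    device = feat2d.device
    # Voxel centers in camera coordinates (depth range is illustrative).
    zs = torch.linspace(1.0, 4.0, D, device=device)
    ys = torch.linspace(grid_min, grid_max, H, device=device)
    xs = torch.linspace(grid_min, grid_max, W, device=device)
    z, y, x = torch.meshgrid(zs, ys, xs, indexing="ij")   # each (D, H, W)
    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    u = 2.0 * u / (Wi - 1) - 1.0
    v = 2.0 * v / (Hi - 1) - 1.0
    grid = torch.stack([u, v], dim=-1)                    # (D, H, W, 2)
    grid = grid.view(1, D * H, W, 2).expand(B, -1, -1, -1)
    # Bilinear sampling copies each pixel's feature along its viewing ray.
    feat3d = F.grid_sample(feat2d, grid, align_corners=True)
    return feat3d.view(B, C, D, H, W)
```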
Rotation. [Figure: rotating the 3D feature tensor by azimuth and elevation]
Egomotion-stabilized memory update. [Figure: the new view is unprojected and rotated by the relative rotation R (equivalently, by −R into the memory's reference frame), then fused with the 3D feature memory via cross-convolution, giving the hidden-state update h_t → h_{t+1}]
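A hedged sketch of the egomotion-stabilized update, assuming a pure rotation between views (the azimuth/elevation setup above); the learned fusion (the cross-convolution) is replaced here by a simple running average, and axis-order conventions of `affine_grid` are glossed over:

```python
import torch
import torch.nn.functional as F

def rotate_memory(memory, R):
    """Resample a 3D feature tensor (B, C, D, H, W) under a rotation R (3x3).
    Inverse warping: each output voxel looks up where it came from, so the
    sampling grid uses R^{-1} = R^T for a pure rotation."""
    B = memory.shape[0]
    theta = torch.zeros(B, 3, 4, device=memory.device)
    theta[:, :, :3] = R.transpose(-1, -2)        # inverse of the rotation
    grid = F.affine_grid(theta, list(memory.shape), align_corners=True)
    return F.grid_sample(memory, grid, align_corners=True)

def memory_update(memory, feat3d, R_rel, step):
    """One egomotion-stabilized update (sketch): rotate the freshly
    unprojected features into the memory's fixed reference frame, then
    fuse with a running average (a learned gate would also work)."""
    aligned = rotate_memory(feat3d, R_rel)
    return (step * memory + aligned) / (step + 1)
```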
Projection (3D to 2D). [Figure: sampling the 3D feature tensor along each pixel's viewing ray at depths d]
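A hedged sketch of the projection module, the inverse of unprojection above: march each pixel's viewing ray through the grid, trilinearly sample the memory at every depth d, and pool over the depth axis. Max-pooling over depth is one simple choice; a learned aggregation may be used instead:

```python
import torch
import torch.nn.functional as F

def project(feat3d, K, Hi=64, Wi=64, grid_min=-1.0, grid_max=1.0):
    """Project a 3D feature tensor (B, C, D, H, W) to a 2D map (B, C, Hi, Wi).
    K is the 3x3 pinhole intrinsics; extents match the unproject sketch."""
    B, C, D, H, W = feat3d.shape
    device = feat3d.device
    zs = torch.linspace(1.0, 4.0, D, device=device)           # depth samples
    vs = torch.arange(Hi, device=device, dtype=torch.float32)
    us = torch.arange(Wi, device=device, dtype=torch.float32)
    z, v, u = torch.meshgrid(zs, vs, us, indexing="ij")       # (D, Hi, Wi)
    # Back-project pixels to camera-space points along each ray.
    x = (u - K[0, 2]) / K[0, 0] * z
    y = (v - K[1, 2]) / K[1, 1] * z
    # Normalize to the grid's [-1, 1] sampling coordinates (x, y, z order).
    xn = 2.0 * (x - grid_min) / (grid_max - grid_min) - 1.0
    yn = 2.0 * (y - grid_min) / (grid_max - grid_min) - 1.0
    zn = 2.0 * (z - zs[0]) / (zs[-1] - zs[0]) - 1.0
    grid = torch.stack([xn, yn, zn], dim=-1)                  # (D, Hi, Wi, 3)
    grid = grid.unsqueeze(0).expand(B, -1, -1, -1, -1)
    rays = F.grid_sample(feat3d, grid, align_corners=True)    # (B, C, D, Hi, Wi)
    return rays.max(dim=2).values                             # pool over depth
```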
Training GRNNs
1. Self-supervised, by predicting the images the agent will see from novel viewpoints
2. Supervised, for 3D object detection
Image generation for view prediction. [Figure: rotate the 3D feature memory to the query view, project to 2D, and decode with an image generator]
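A hedged sketch of the self-supervised objective: align the memory to the query camera, project, decode, and regress the pixels the agent actually saw there. `rotate_memory` and `project` are the sketches above; `decoder` is an assumed CNN decoder, and the plain L1 loss is an assumption (a perceptual or adversarial loss could be used instead):

```python
import torch.nn.functional as F

def view_prediction_loss(memory, R_query, K, target_img, decoder):
    rotated = rotate_memory(memory, R_query)   # align memory to the query view
    feat2d = project(rotated, K)               # 3D -> 2D feature map
    pred_img = decoder(feat2d)                 # CNN decoder to RGB
    return F.l1_loss(pred_img, target_img)     # regress the held-out view
```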
[Figure: view prediction from 3 input views, comparing a 2D RNN [1] with the GRNN]
[1] Neural Scene Representation and Rendering, DeepMind, Science, 2018
[Figure: view prediction from 3 input views, 2D RNN [1] vs. GRNN, testing on scenes with more objects than at training time]
View prediction: geometry-aware RNN vs. 2D RNN [1]. [Figure: qualitative comparison]
3D Object Detection: a 3D region proposal network (RPN), a 3D version of Mask R-CNN
Results - 3D object detection
3D object detection. [Figure: input views with ground-truth vs. predicted segmentations and boxes, shown in front view and bird's-eye view] Object detections learn to persist in time; they do not switch on and off from frame to frame.
GRNNs: differentiable SLAM for better space-aware deep feature learning
• A generative model of scenes, with a 3D bottleneck, when trained from view prediction
• Generalize better than 2D models
What's next? • Use GRNNs for tracking, dynamics learning, and as a perceptual front-end for RL and robot learning
Thank you! Fish Tung, Ziyan Wang, Ricson Cheng
• Learning Spatial Common Sense with Geometry-Aware Recurrent Networks, F. Tung, R. Cheng, K. Fragkiadaki, arXiv
• Geometry-Aware Recurrent Neural Networks for Active Visual Recognition, R. Cheng, Z. Wang, K. Fragkiadaki, NIPS 2018