Anticipating the Unseen and Unheard for Embodied Perception Kristen Grauman University of Texas at Austin Facebook AI Research
Visual recognition: significant recent progress. Drivers: big labeled datasets, deep learning, GPU technology. [Chart: ImageNet top-5 error (%) over time]
The Web photo perceptual experience: a “disembodied,” well-curated moment in time. Datasets: BSD (2001), Caltech 101 (2004), Caltech 256 (2006), LabelMe (2007), PASCAL (2007-12), ImageNet (2009), SUN (2010), MS COCO (2014), Places (2014), Visual Genome (2016)
Egocentric perceptual experience: a tangle of relevant and irrelevant multi-sensory information
Big picture goal: embodied perception. Status quo: learning and inference with “disembodied” snapshots. On the horizon: visual learning in the context of action, motion, and multi-sensory observations.
Anticipating the unseen and unheard. Towards embodied perception: look-around policies, affordance learning, audio-visual learning.
Active perception: from learning representations to learning policies. Bajcsy 1985, Aloimonos 1988, Ballard 1991, Wilkes 1992, Dickinson 1997, Schiele & Crowley 1998, Tsotsos 2001, Denzler 2002, Soatto 2009, Ramanathan 2011, Borotschnig 2011, …
End-to-end active recognition. Main idea: deep reinforcement learning approach that anticipates visual changes as a function of egomotion. [Diagram: perception, action selection, and evidence fusion loop with competing hypotheses such as mug?, bowl?, pan?] Jayaraman and Grauman, ECCV 2016, PAMI 2018
End-to-end active recognition. [Video: predicted label evolves over glimpses T=1, 2, 3.] [Jayaraman and Grauman, ECCV 2016, PAMI 2018]
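A minimal PyTorch sketch (not the authors' code) of the perception / evidence fusion / action selection loop above: the agent encodes each glimpse, fuses evidence in a recurrent state, samples the next egomotion from a learned policy, and classifies after a small budget of views. The module sizes, the action set, and the get_view environment hook are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ActiveRecognizer(nn.Module):
    """Perception -> evidence fusion -> action selection, repeated for a small budget."""
    def __init__(self, n_classes=10, n_actions=8, feat_dim=128):
        super().__init__()
        self.feat_dim = feat_dim
        self.perception = nn.Sequential(                  # encodes one 32x32 glimpse
            nn.Flatten(), nn.Linear(32 * 32, feat_dim), nn.ReLU())
        self.fusion = nn.GRUCell(feat_dim, feat_dim)      # evidence fusion across glimpses
        self.action_head = nn.Linear(feat_dim, n_actions)  # egomotion policy
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, get_view, budget=3):
        h = torch.zeros(1, self.feat_dim)
        action = torch.zeros(1, dtype=torch.long)         # start from a default motion
        for _ in range(budget):
            view = get_view(action)                       # hypothetical environment hook
            h = self.fusion(self.perception(view), h)
            action = torch.distributions.Categorical(     # sample the next egomotion;
                logits=self.action_head(h)).sample()      # training would use a policy gradient
        return self.classifier(h)                         # label prediction after the glimpses

# toy usage: random 32x32 "views" stand in for renders of the object/scene
agent = ActiveRecognizer()
logits = agent(lambda a: torch.rand(1, 32, 32))
```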
Goal: learn to “look around.” Unlike reconnaissance, search and rescue, or recognition, where the task is predefined, here the task unfolds dynamically. Can we learn look-around policies for visual agents that are curiosity-driven, exploratory, and generic?
Key idea: active observation completion. Completion objective: learn a policy for efficiently inferring (pixels of) all yet-unseen portions of the environment. The agent must choose where to look before looking there. Jayaraman and Grauman, CVPR 2018
Completing unseen views: an encoder-decoder model infers unseen viewpoints; the output viewgrid’s “supervision” is the actual 360° scene. Jayaraman and Grauman, CVPR 2018; Ramakrishnan & Grauman, ECCV 2018
Actively selecting observations: encoder, decoder, and actor (belief model visualized), with a reward for fast completion. Non-myopic: trained to target a budget of observation time. Jayaraman and Grauman, CVPR 2018; Ramakrishnan & Grauman, ECCV 2018
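A hedged sketch of the observation-completion objective: an encoder accumulates the agent-chosen glimpses, a decoder predicts the full viewgrid, and the negative per-pixel error serves as the completion reward driving the look-around actor. The viewgrid layout (32 views of 32x32 pixels) and module sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewgridCompleter(nn.Module):
    def __init__(self, n_views=32, view_pixels=32 * 32, hidden=256):
        super().__init__()
        self.encoder = nn.GRUCell(view_pixels, hidden)            # accumulate observed glimpses
        self.decoder = nn.Linear(hidden, n_views * view_pixels)   # predict the full viewgrid

    def forward(self, glimpses):
        h = torch.zeros(glimpses[0].size(0), self.encoder.hidden_size)
        for g in glimpses:                                         # agent-chosen observations
            h = self.encoder(g.flatten(1), h)
        return self.decoder(h)

def completion_reward(model, glimpses, true_viewgrid):
    """Negative per-pixel error of the completed viewgrid (higher = better completion)."""
    pred = model(glimpses)
    return -F.mse_loss(pred, true_viewgrid.flatten(1))

# toy usage: two 32x32 glimpses, supervision is the full grid of 32 views
model = ViewgridCompleter()
glimpses = [torch.rand(1, 32, 32) for _ in range(2)]
reward = completion_reward(model, glimpses, torch.rand(1, 32 * 32 * 32))
```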
Two scenarios
Active “look around” results. [Plots: per-pixel MSE (x1000) vs. time on ModelNet (seen classes), ModelNet (unseen classes), and SUN360, comparing 1-view, random, large-action, large-action+, peek-saliency*, and ours.] Learned active look-around policy: quickly grasp the environment independent of a specific task. Jayaraman and Grauman, CVPR 2018. *Saliency: Harel et al., Graph-Based Visual Saliency, NIPS 2007
Active “look around” results
Active “look around” Agent’s mental model for 360 scene evolves with actively accumulated glimpses Jayaraman and Grauman, CVPR 2018; Ramakrishnan & Grauman, ECCV 2018
Active “look around” Agent’s mental model for 3D object evolves with actively accumulated glimpses Jayaraman and Grauman, CVPR 2018; Ramakrishnan & Grauman, ECCV 2018
Look-around policy transfer. [Diagram: unsupervised side with look-around policy, look-around encoder, and decoder; supervised side with task-specific policy, task-specific encoder, and predictor (e.g., “beach”).] Plug the observation completion policy in for a new task.
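The transfer recipe above, sketched under assumed interfaces: the unsupervised look-around policy and encoder are frozen, and only a lightweight task head is trained for the new task (e.g., active recognition), so the exploratory behavior carries over unchanged.

```python
import torch.nn as nn

def build_transfer_model(lookaround_policy, encoder, feat_dim, n_classes):
    """Freeze the unsupervised exploratory modules; train only a new task head."""
    for module in (lookaround_policy, encoder):
        for p in module.parameters():
            p.requires_grad = False         # keep look-around behavior and features fixed
    return nn.Linear(feat_dim, n_classes)   # task-specific head (e.g. scene/object labels)

# toy usage with placeholder modules standing in for the pretrained policy/encoder
head = build_transfer_model(nn.Linear(128, 8), nn.Linear(1024, 128), 128, 26)
```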
Look-around policy transfer: SUN 360 scenes and ModelNet objects. Plugging the observation completion policy in for the active recognition task, the unsupervised exploratory policy approaches supervised task-specific policy accuracy. Jayaraman and Grauman, CVPR 2018
Look-around policy transfer: multiple perception tasks. Ramakrishnan et al. 2019
Look-around policy transfer: agent navigates a 3D environment leveraging active exploration.
Extreme relative pose from RGB-D scans. Input: pair of RGB-D scans with little or no overlap. Output: rigid transformation (R, t) relating them. Approach: alternate between completion and matching. Yang et al., CVPR 2019
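The matching stage ultimately needs a rigid transform from point correspondences; below is a self-contained Procrustes/Kabsch solver for that step. The completion and matching networks themselves are not reproduced here, and this is not the authors' implementation.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares (R, t) aligning points P (Nx3) onto Q (Nx3)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)                # 3x3 covariance of centered point sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t

# example: recover a known rotation/translation from synthetic correspondences
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0])
R_est, t_est = rigid_transform(P, Q)
```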
Extreme relative pose from RGB-D scans: qualitative comparison (GT, ours, 4PCS). Outperforms existing methods on SUNCG / Matterport / ScanNet, particularly in the small-overlap case (10% to 50%). Yang et al., CVPR 2019
360° video: a “look around” problem for people. Control by mouse: where to look, when?
AutoCam: automatically select FOV and viewing direction (input: 360° video; output: NFOV video). [Su & Grauman, ACCV 2016, CVPR 2017]
Anticipating the unseen and unheard. Towards embodied perception: look-around policies, affordance learning, audio-visual learning.
Object interaction: turn on, increase height, move lamp, replace lightbulb. Embodied perception system, object manipulation.
What actions does an object afford? Adjustable, toggle-able, replaceable, movable. Embodied perception system, object manipulation.
Current approaches: affordance as semantic segmentation. Label “holdable” regions; captures annotators’ expectations of what is important. Sawatzky et al. (CVPR 17), Nguyen et al. (IROS 17), Roy et al. (ECCV 16), Myers et al. (ICRA 15), …
…but real human behavior is complex
How to learn object affordances? Manually curated affordances vs. real human interactions? Sawatzky et al. (CVPR 17), Nguyen et al. (IROS 17), Roy et al. (ECCV 16), Myers et al. (ICRA 15), …
Our idea: Learn directly by watching people (video). [Nagarajan et al. 2019]
Learning affordances from video. [Diagram: object at rest (t=0) feeds an anticipation network; an action LSTM aggregates state over t=0…T; a classifier predicts the action, e.g. “open.”] [Nagarajan et al. 2019]
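A minimal sketch of the video-side model suggested by the diagram above: per-frame features are aggregated by an LSTM and a classifier predicts the demonstrated action (e.g., “open”), so only action labels on video supervise learning. The frame encoder, feature sizes, and action vocabulary are assumptions.

```python
import torch
import torch.nn as nn

class VideoAffordanceModel(nn.Module):
    def __init__(self, feat_dim=128, n_actions=20):
        super().__init__()
        self.frame_encoder = nn.Sequential(               # toy per-frame encoder
            nn.Flatten(), nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU())
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)  # aggregate frames t=0..T
        self.action_classifier = nn.Linear(feat_dim, n_actions)

    def forward(self, clip):                               # clip: B x T x 3 x 64 x 64
        B, T = clip.shape[:2]
        f = self.frame_encoder(clip.reshape(B * T, -1)).reshape(B, T, -1)
        _, (h, _) = self.lstm(f)
        return self.action_classifier(h[-1])               # weak supervision: action label only

# toy usage: two clips of eight frames each
model = VideoAffordanceModel()
logits = model(torch.rand(2, 8, 3, 64, 64))
```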
Extracting interaction hotspot maps: hypothesize an action (e.g., a = “pullable”), take the anticipation network’s activations and gradients through the classifier, and apply activation mapping to identify the responsible spatial regions, yielding a hotspot map for “pullable.” [Nagarajan et al. 2019]
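A hedged, Grad-CAM-style sketch of the hotspot readout: backpropagate the score of the hypothesized action through the classifier to the spatial feature activations and keep the positive, channel-weighted response as the hotspot map. The toy backbone and classifier below are placeholders, not the paper's anticipation network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def hotspot_map(backbone, classifier, image, action_idx):
    feats = backbone(image)                        # B x C x H x W spatial features
    feats.retain_grad()
    score = classifier(feats.mean(dim=(2, 3)))[:, action_idx]  # hypothesized action score
    score.sum().backward()
    weights = feats.grad.mean(dim=(2, 3), keepdim=True)        # per-channel importance
    cam = F.relu((weights * feats).sum(dim=1))                  # B x H x W hotspot map
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)   # normalize per image

# usage with toy modules (shapes and the action index are placeholders)
backbone = nn.Conv2d(3, 16, 3, padding=1)
classifier = nn.Linear(16, 5)
cam = hotspot_map(backbone, classifier, torch.randn(1, 3, 64, 64), action_idx=2)
```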
Wait, is this just action recognition? No: the hotspot anticipation model maps an object at rest to its potential for interaction. [Comparison: action recognition + Grad-CAM vs. ours]
Evaluating interaction hotspots. Datasets: OPRA (Fang et al., CVPR 18), EPIC Kitchens (Damen et al., ECCV 18), MS COCO (Lin et al., ECCV 14). Train on video datasets, generate heatmaps on novel images, even from unseen categories.
Results: interaction hotspots. Given a static image of an object at rest, infer affordance regions (OPRA and EPIC data, weakly vs. strongly supervised). Up to 24% increase vs. weakly supervised methods. [Nagarajan et al. 2019]
Results: interaction hotspots
Results: hotspots for recognition. Better low-shot object recognition by anticipating object function.
Anticipating the unseen and unheard. Towards embodied perception: look-around policies, affordance learning, audio-visual learning.
Listening to learn (woof, meow, clatter, ring). Goal: a repertoire of objects and their sounds. Challenge: a single audio channel mixes the sounds of multiple objects.
Learning to separate object sounds. Our idea: leverage visual objects to learn from unlabeled video with multiple audio sources; disentangle per-object sound models (violin, dog, cat) from unlabeled video, then apply them to separate simultaneous sounds in novel videos. [Gao, Feris, & Grauman, ECCV 2018]
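A simplified sketch of the separation step, assuming per-object spectral bases have already been learned from unlabeled video guided by detected visual objects: the mixture spectrogram is decomposed over the stacked bases with fixed-basis NMF updates, and each object's track is reassembled from its own bases. This stands in for, and is not, the full method of the paper.

```python
import numpy as np

def separate(mixture, bases_per_object, n_iters=200, eps=1e-8):
    """mixture: F x T magnitude spectrogram; bases_per_object: list of F x K nonnegative arrays."""
    W = np.concatenate(bases_per_object, axis=1)        # stacked bases of all detected objects
    H = np.random.rand(W.shape[1], mixture.shape[1])    # activations to estimate
    for _ in range(n_iters):                            # multiplicative NMF updates with W fixed
        H *= (W.T @ mixture) / (W.T @ (W @ H) + eps)
    sources, k = [], 0
    for Wo in bases_per_object:                         # rebuild each object's spectrogram
        Ko = Wo.shape[1]
        sources.append(Wo @ H[k:k + Ko])
        k += Ko
    return sources

# toy usage: a two-object mixture built from random nonnegative bases
F_bins, T_frames = 64, 100
bases = [np.random.rand(F_bins, 4) for _ in range(2)]  # e.g. violin, dog
mix = bases[0] @ np.random.rand(4, T_frames) + bases[1] @ np.random.rand(4, T_frames)
violin_spec, dog_spec = separate(mix, bases)
```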
Results: audio-visual source separation. Train on 100,000 unlabeled multi-source video clips, then separate audio for novel video. Dataset: AudioSet [Gemmeke et al. 2017]. [Gao et al., ECCV 2018]
Results: audio-visual source separation. Train on 100,000 unlabeled multi-source video clips, then separate audio for novel video. [Gao et al., ECCV 2018]
Spatial effects in audio: spatial effects are absent in monaural audio. Cues for spatial hearing: interaural time difference (ITD), interaural level difference (ILD), spectral detail (from pinna reflections). Image credit: Michael Mandel
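For concreteness, a small numeric illustration of the first two cues, computed from a stereo pair of waveforms; the sign and normalization conventions here are assumptions for illustration only.

```python
import numpy as np

def itd_ild(left, right, sr=16000):
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)     # delay in samples (sign per numpy's convention)
    itd = lag / sr                               # interaural time difference, seconds
    ild = 20 * np.log10((np.abs(left).mean() + 1e-12) /
                        (np.abs(right).mean() + 1e-12))  # interaural level difference, dB
    return itd, ild

# toy usage: right channel is a delayed, attenuated copy of the left
rng = np.random.default_rng(0)
sr = 16000
left = rng.normal(size=sr)
right = 0.7 * np.roll(left, 8)
print(itd_ild(left, right, sr))
```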
Our idea: 2.5D visual sound. “Lift” mono audio to spatial (binaural) audio via visual cues. [Gao & Grauman, CVPR 2019]
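A hedged sketch of the “lifting” idea: a network conditioned on the video frame predicts the left/right difference signal, which is combined with the mono track to form the two binaural channels (mono ≈ (L+R)/2, difference ≈ L−R). The architecture and feature sizes are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class MonoToBinaural(nn.Module):
    def __init__(self, audio_dim=512, visual_dim=128):
        super().__init__()
        self.visual_encoder = nn.Sequential(               # encodes where sources are in the frame
            nn.Flatten(), nn.Linear(3 * 32 * 32, visual_dim), nn.ReLU())
        self.predictor = nn.Linear(audio_dim + visual_dim, audio_dim)  # predicts L-R difference

    def forward(self, mono, frame):
        v = self.visual_encoder(frame)
        diff = self.predictor(torch.cat([mono, v], dim=1))
        left = mono + 0.5 * diff                            # L = mono + (L-R)/2
        right = mono - 0.5 * diff                           # R = mono - (L-R)/2
        return left, right

# toy usage: a mono feature vector plus one video frame
model = MonoToBinaural()
left, right = model(torch.rand(1, 512), torch.rand(1, 3, 32, 32))
```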