Rich representations for learning visual recognition

Jitendra Malik, University of California at Berkeley

Detection can be very fast
• On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006)
• Comparable to synaptic delay in the retina, LGN, V1, V2, V4, IT pathway.
• Doesn’t rule out feedback but shows feedforward only is very powerful
• Detection and categorization are practically simultaneous (Grill-Spector & Kanwisher, 2005)

Rolls et al. (2000)

Some opinions…
• A hierarchical, mostly feedforward network is the right model; the question is how to train it
• Unsupervised, sparsity-encouraging techniques are promising for lower layers
• But the success of this approach at the higher stages has not yet been demonstrated

Insights from child development
• Trying to learn object recognition from bounding boxes is like trying to learn language from a list of sentences.
• The development of visual recognition, like language acquisition, benefits from supportive “scaffolding”
• Grouping and tracking can play an important role by helping solve the correspondence problem. In a machine vision system, we can “cheat” by supplying keypoint correspondences

Detecting and Segmenting People
Where are they? What are they wearing? What are they doing?
Jitendra Malik, UC Berkeley
This is joint work with L. Bourdev, S. Maji and T. Brox.

Trying to extract stick figures is hard (and unnecessary!)
• Generalized cylinders (Marr & Nishihara, Binford)
• Pictorial Structures (Felzenszwalb & Huttenlocher)

All the wrong limbs…

High-Level Computer Vision

High-Level Computer Vision
• Object Recognition
Labels: person, van, person, dog

High-Level Computer Vision
• Object Recognition
• Semantic Segmentation
Labels: person, van, person, dog

High-Level Computer Vision
• Object Recognition
• Semantic Segmentation
• Pose Estimation
Labels: “Facing the camera”; “In a back view”; “Facing back, head to the right”

High-Level Computer Vision
• Object Recognition
• Semantic Segmentation
• Pose Estimation
• Action Recognition
Labels: “Walking away”; “talking”

High-Level Computer Vision
• Object Recognition
• Semantic Segmentation
• Pose Estimation
• Action Recognition
• Attribute Classification
Labels: “blue GMC van”; “Man with glasses and a coat”; “elderly white man with a baseball hat”; “Entlebucher mountain dog”

High-Level Computer Vision
• Object Recognition
• Semantic Segmentation
• Pose Estimation
• Action Recognition
• Attribute Classification
“A blue GMC van parked, in a back view”
“A man with glasses and a coat, facing back, walking away”
“An elderly man with a hat and glasses, facing the camera and talking”
“An Entlebucher mountain dog sitting in a bag”

Person Detection is Challenging
• Clothing
• Occlusion
• No silhouette
• Accessories
• Articulation
• Viewpoint
• Wrinkles

How can we make the problem harder?
• Solution: Severely limit the supervision

The best approach in such a setup?
Figure labels: Part 2 fires on left torso… but sometimes on ½ of the head. Part 5 fires on one leg… or both legs. Learned part location penalty.
• Divide-and-conquer: One global template + five parts
• Positions and appearance of parts trained jointly (Latent SVM)
• Mixture of models for various poses (standing, sitting, etc.)
• Parts are not well localized and have large appearance variations
[Felzenszwalb et al. PAMI 2010]
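The "global template + parts" idea on this slide can be illustrated with a minimal scoring sketch. It is not the implementation from Felzenszwalb et al.; it assumes a simplified isotropic quadratic location penalty, and the function name `part_model_score`, its parameters, and the dictionary-based response maps are all hypothetical. The part locations act as latent variables: for a fixed root placement, each part is placed wherever its filter response minus the location penalty is largest.

```python
def part_model_score(root_score, part_responses, anchors, deform_weights, root_pos):
    """Score one root placement of a root-plus-parts model (illustrative only).

    root_score:      response of the global template at root_pos
    part_responses:  one dict per part, mapping (x, y) -> part filter response
    anchors:         one (dx, dy) per part, the ideal offset from the root
    deform_weights:  one non-negative weight per part for the quadratic
                     location penalty (an assumed, simplified penalty form)
    root_pos:        (x, y) position of the root template
    """
    rx, ry = root_pos
    total = root_score
    placements = []
    for responses, (ax, ay), w in zip(part_responses, anchors, deform_weights):
        best_val, best_loc = float("-inf"), None
        for (x, y), resp in responses.items():
            # Displacement of this candidate location from the part's anchor.
            dx, dy = x - (rx + ax), y - (ry + ay)
            val = resp - w * (dx * dx + dy * dy)
            if val > best_val:
                best_val, best_loc = val, (x, y)
        total += best_val
        placements.append(best_loc)
    return total, placements


# One part with two candidate locations: a weaker response at the anchor
# and a stronger response far from it.
responses = [{(2, 2): 3.0, (5, 5): 4.0}]
score, where = part_model_score(1.0, responses, [(2, 2)], [0.5], (0, 0))
```

With a non-zero penalty weight the part stays at its anchor; with a zero weight it jumps to the strongest response wherever that occurs, which is one way to picture the "parts are not well localized" observation.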

Radical idea: What if, instead, we try to make the problem easier?
Keypoint labels: Nose, Right Shoulder, Left Shoulder, Right Elbow, Left Elbow
[Bourdev and Malik, ICCV 2009]

Can we build upon the success of faces and pedestrians?
• Both do template matching
• Capture salient and common patterns
• Are these the only two salient & common patterns?
• But how are we going to create the training set?

Agenda
• Poselets
• Training a poselet
• Selecting a good set of poselets
• Improving poselets with context
• Detection with poselets
• Segmentation
• Attributes
• Action Recognition

Examples of poselets
Patches are often far visually, but they are close semantically
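One way to make "close semantically" concrete is to compare patches by their annotated keypoints rather than by their pixels. The sketch below is an assumption for illustration, not the distance used in the poselet papers: it removes only translation and scale (no rotation, no visibility term), and the names `normalize` and `config_distance` are hypothetical. It expects at least two shared keypoints that are not all at the same location.

```python
import math


def normalize(keypoints):
    """Center keypoints at their mean and rescale them to unit RMS radius."""
    names = sorted(keypoints)
    cx = sum(keypoints[n][0] for n in names) / len(names)
    cy = sum(keypoints[n][1] for n in names) / len(names)
    centered = {n: (keypoints[n][0] - cx, keypoints[n][1] - cy) for n in names}
    rms = math.sqrt(sum(x * x + y * y for x, y in centered.values()) / len(names))
    return {n: (x / rms, y / rms) for n, (x, y) in centered.items()}


def config_distance(kp_a, kp_b):
    """Mean squared distance between shared keypoints after normalization."""
    shared = set(kp_a) & set(kp_b)
    a = normalize({n: kp_a[n] for n in shared})
    b = normalize({n: kp_b[n] for n in shared})
    return sum(
        (a[n][0] - b[n][0]) ** 2 + (a[n][1] - b[n][1]) ** 2 for n in shared
    ) / len(shared)


# Same pose at a different image position and scale.
pose_a = {"nose": (0, 0), "left_shoulder": (-1, 2), "right_shoulder": (1, 2)}
pose_b = {"nose": (10, 20), "left_shoulder": (7, 26), "right_shoulder": (13, 26)}
# A different configuration: shoulders above the nose.
pose_c = {"nose": (0, 0), "left_shoulder": (-1, -2), "right_shoulder": (1, -2)}

same_pose = config_distance(pose_a, pose_b)
different_pose = config_distance(pose_a, pose_c)
```

Under this measure `pose_a` and `pose_b` are nearly identical even though the underlying image patches could look very different, while `pose_c` is far from both.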

Agenda
• Poselets
• Training a poselet
• Selecting a good set of poselets
• Improving poselets with context
• Detection with poselets
• Segmentation
• Attributes
• Action Recognition

How do we train a poselet for a given pose configuration?
