The Three R s of Vision Recognition Reconstruction Reorganization - PowerPoint PPT Presentation

The Three R ’ s of Vision Recognition Reconstruction Reorganization Jitendra Malik UC Berkeley

Recognition, Reconstruction & Reorganization Recognition Reconstruction Reorganization

Fifty years of computer vision 1963-2013 • 1960s: Beginnings in artificial intelligence, image processing and pattern recognition • 1970s: Foundational work on image formation: Horn, Koenderink, Longuet- Higgins … • 1980s: Vision as applied mathematics: geometry, multi-scale analysis, probabilistic modeling, control theory, optimization • 1990s: Geometric analysis largely completed, vision meets graphics, statistical learning approaches resurface • 2000s: Significant advances in visual recognition, range of practical applications

Different aspects of vision • Perception: study the “ laws of seeing ” -predict what a human would perceive in an image. • Neuroscience: understand the mechanisms in the retina and the brain • Function: how laws of optics, and the statistics of the world we live in, make certain interpretations of an image more likely to be valid The match between human and computer vision is strongest at the level of function, but since typically the results of computer vision are meant to be conveyed to humans makes it useful to be consistent with human perception. Neuroscience is a source of ideas but being bio-mimetic is not a requirement.

The Three R ’ s of Vision Recognition Reconstruction Reorganization

The Three R ’ s of Vision Recognition Reconstruction Reorganization Each of the 6 directed arcs in this diagram is a useful direction of information flow

Review • Reconstruction – Feature matching + multiple view geometry has led to city scale point cloud reconstructions • Recognition – 2D problems such as handwriting recognition, face detection successfully fielded in applications. – Partial progress on 3d object category recognition • Reorganization – Progress on bottom-up segmentation hitting diminishing returns – Semantic segmentation is the key problem now

Image-based Modeling • Façade (1996) Debevec, Taylor & Malik – Acquire photographs – Recover geometry (explicit or implicit) – Texture map

Campus Model of UC Berkeley Campanile + 40 Buildings (Debevec et al, 1997)

Arc de Triomphe

The Taj Mahal Taj Mahal modeled from one photograph by G. Borshukov

State of the Art in Reconstruction • Multiple photographs • Range Sensors Kinect (PrimeSense) Velodyne Lidar Agarwal et al (2010) Frahm et al, (2010) Semantic Segmentation is needed to make this more useful…

Shape, Albedo, and Illumination from Shading Jonathan Barron Jitendra Malik UC Berkeley

Forward Optics Far Near shape / depth

Forward Optics Far Near shape / depth illumination

Forward Optics Far Near shape / depth illumination log-shading image of Z and L

Forward Optics Far Near shape / depth illumination log-shading image of Z and L log-albedo / log-reflectance

Forward Optics Far Near shape / depth illumination log-shading image of Z and L log-albedo / log-reflectance Lambertian reflectance in log-intensity

Shape, Albedo, and Illumination from Shading SAIFS ( “ safes ” ) Far ? ? ? Near shape / depth illumination log-shading image of Z and L ? log-albedo / log-reflectance Lambertian reflectance in log-intensity

Problem Formulation : Known Lighting ? ? ? “ Find the most likely explanation (shape Z and log-albedo A) that together exactly reconstructs log-image I , given rendering engine S () and known illumination L . ”

What do we know about reflectance ? 1) Piecewise smooth (variation is small and sparse) 2) Palette is small (distribution is low-entropy) 3) Some colors are common (maximize likelihood under density model)

Reflectance : Absolute Color

What do we know about shapes ? 1) Piecewise smooth (variation in mean curvature is small and sparse) 2) Face outward at the occluding contour 3) Tend to be fronto-parallel (slant tends to be small)

Evaluation : Real World Images

Recognition helps reconstruction Blanz & Vetter (1999) Geometric Context (Hoiem, Efros, Hebert) for outdoor scenes; recent work on rooms (CMU, UIUC) is another example

The Three R ’ s of Vision Recognition Reconstruction Reorganization

Caltech-101 [Fei-Fei et al. 04] • 102 classes, 31-300 images/class

Caltech 101 classification results (even better by combining cues..)

Texton Histogram Model for Recognition (Leung & Malik, 1999) cf. Bag of Words Rough Plastic Pebbles Plaster-b Terrycloth ICCV '99, Corfu, Greece

Lazebnik, Schmid & Ponce (2006) They proposed using vector-quantized SIFT descriptors as “ words ”

PASCAL Visual Object Challenge (Everingham et al)

A good building block is a linear SVM trained on HOG features (Dalal & Triggs)

AP=0.23

Problems with current recognition approaches • Performance is quite poor compared to that at 2d recognition tasks and the needs of many applications. • Pose Estimation / Localization of parts or keypoints is even worse. We can ’ t isolate decent stick figures from radiance images, making use of depth data necessary. • Progress has slowed down. Variations of HOG/Deformable part models dominate.

PCA Results on APs of 20 VOC classes 1 BROOKES(11) UoCTTI(10) 0.8 UoCTTI(09) UoCTTI(11) 0.6 MISSOURI(11) 0.4 UMNECUIUC(10) NLPR(10) NLPR(11) Principal Component 2 NUS(10) 0.2 NUS(11) UCLA(10) 0 CORNELL(11) OXFORD(11) OXFORD(09) -0.2 UCLA(11) BONN_FGT(10) -0.4 BONN_SVR(10) NECUIUC(09) -0.6 UVA(11) -0.8 UVA(10) FGBG(10) -1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 Principal Component 1

Next steps in recognition • Richer features than SIFT/HOG (deep learning ?) • Incorporate the “ shape bias ” known from child development literature to improve generalization – This requires monocular computation of shape, as once posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours • Top down templates should predict keypoint locations and image support, not just information about category • Recognition and figure-ground inference need to co- evolve. Occlusion is signal, not noise.

High-Level Computer Vision

High-Level Computer Vision Object Recognition person van person dog

High-Level Computer Vision Object Recognition person van Semantic Segmentation person dog

High-Level Computer Vision Object Recognition Facing the camera Semantic Segmentation Pose Estimation In a back view Facing back, head to the right

High-Level Computer Vision Walking away Object Recognition talking Semantic Segmentation Pose Estimation Action Recognition

High-Level Computer Vision Object Recognition blue GMC van Semantic Segmentation Pose Estimation Action Recognition Man with elderly white glasses and a Attribute Classification man with a coat baseball hat Entlebucher mountain dog

High-Level Computer Vision “ A blue GMC van Object Recognition parked, in a back view ” Semantic Segmentation Pose Estimation Action Recognition “ A man with glasses Attribute Classification “ An elderly man with a and a coat, facing back, hat and glasses, facing walking away ” the camera and talking ” “ An entlebucher mountain dog sitting in a bag ”

Trying to extract stick figures is hard (and unnecessary!) Generalized cylinders (Marr & Nishihara, Binford) Pictorial Structures (Felszenswalb & Huttenlocher)

All the wrong limbs…

Motivation

Face Detection Carnegie Mellon University various images submitted to the CMU on-lin http://www.vasc.ri.cmu.edu/cgi-bin/demos/findface.cgi

Examples of poselets (Bourdev & Malik , 2009) Patches are often far visually , but they are close semantically

How do we train a poselet for a given pose configuration?

Finding Correspondences Given part of a human How do we find a similar pose pose configuration in the training set?

Finding Correspondences Left Shoulder Left Hip We use keypoints to annotate the joints, eyes, nose, etc. of people

Finding Correspondences Residual Error

Training poselet classifiers Residual 0.15 0.20 0.10 0.85 0.15 0.35 Error: 1. Given a seed patch 2. Find the closest patch for every other person 3. Sort them by residual error 4. Threshold them

Male or female?

How do we train attribute classifiers “ in the wild ” ?  Effective prediction requires inferring the pose and camera view  Pose reconstruction is itself a hard problem, but we don ’ t need perfect solution.  We train attribute classifiers for each poselet  Poselets implicitly decompose the pose

Gender classifier per poselet is much easier to train

Is male

Has long hair

Wears a hat

Wears glasses

Wears long pants

Wears long sleeves

The Three R s of Vision Recognition Reconstruction Reorganization - PowerPoint PPT Presentation

The Three R s of Vision Recognition Reconstruction Reorganization Jitendra Malik UC Berkeley Recognition, Reconstruction & Reorganization Recognition Reconstruction Reorganization Fifty years of computer vision 1963-2013

Computer Vision Computer Vision How does vision work? What is vision for? Ela Claridge

Branding Presentation VISION Mevushal VISION Muscat of Alexandria & Viognier VISION

Vision Services Vision Services & & Vision Therapy Vision Therapy February 2, 2007

Vision Our National Church partners .. Vision Our National Network partners Vision Getting

HIM Without Walls Realizing Our Vision! Realizing Our Vision Realize Our Vision Realizing Our

MACBETH Revision Day Slides ACT 1 SCENE 1 Note the significance of the number three: three

OEBB 2019-20 Open Enrollment: Vision Moda Health Vision Moda will continue offering three

J J R R Our Vision . . . Our Vision . . . Our Vision . . . Our Vision . . . TO BE THE BEST

Post- -trauma vision trauma vision Post Post- -trauma vision trauma vision Post syndrome

2017 Humana Vision 130 LOOK Whats NEW! NEW RETAIL FRAME BENEFIT 2 Humana Vision 100

Vision What is the Vision? The American Fork Canyon Vision (Vision) will ho- Few places in the

Building Our Vision St. Andrews Vision and Mission Our Vision: Our Vision: The Tree of Life is

FLITTER FLITTER The Foldable Litter Pink B Our Vision Our Vision Our Vision Our Vision A

FOCUS AREAS FOCUS AREAS FOCUS AREAS FOCUS AREAS Our Our Vision Vision Our Our Vision

So What Has So, What Has So, What Has So What Has Vision Done For Vision Done For Vision Done

Analog night vision devices April, 2020 ANALOG NIGHT VISION DEVICES Night vision devices

Extending Decision Trees for Web Categorisation Multiparadigm Inductive Programming group

MATTER BISPECTRUM BEYOND HORNDESKI ( ) based on 1801. 07885 SH , T. Kobayashi, S.

Getting there from here: Governing climate change through experimentation? Matthew Hoffmann

Local coderivatives and approximation of Hodge Laplace problems Ragnar Winther Department of

OPM FLOW IN MSO4SC AND OTHER STORIES Atgeirr Fl Rasmussen About SINTEF Vision: technology for

BEHAVIORAL APPROACH TO CORPORATE GOVERNANCE? DEPARTMENT OF MANAGEMENT MIRIAM FLICKINGER AARHUS

The universe evolution and modified gravity: an overview Sergei D. Odintsov Consejo Superior de

Finite Volume discretization of two phase Darcy flows with discontinuous capillary pressures R.

The Three R s of Vision Recognition Reconstruction Reorganization - PowerPoint PPT Presentation

The Three R s of Vision Recognition Reconstruction Reorganization Jitendra Malik UC Berkeley Recognition, Reconstruction & Reorganization Recognition Reconstruction Reorganization Fifty years of computer vision 1963-2013

Computer Vision Computer Vision How does vision work? What is vision for? Ela Claridge

Branding Presentation VISION Mevushal VISION Muscat of Alexandria &amp; Viognier VISION

Vision Services Vision Services &amp; &amp; Vision Therapy Vision Therapy February 2, 2007

Vision Our National Church partners .. Vision Our National Network partners Vision Getting

HIM Without Walls Realizing Our Vision! Realizing Our Vision Realize Our Vision Realizing Our

MACBETH Revision Day Slides ACT 1 SCENE 1 Note the significance of the number three: three

OEBB 2019-20 Open Enrollment: Vision Moda Health Vision Moda will continue offering three

J J R R Our Vision . . . Our Vision . . . Our Vision . . . Our Vision . . . TO BE THE BEST

Post- -trauma vision trauma vision Post Post- -trauma vision trauma vision Post syndrome

2017 Humana Vision 130 LOOK Whats NEW! NEW RETAIL FRAME BENEFIT 2 Humana Vision 100

Vision What is the Vision? The American Fork Canyon Vision (Vision) will ho- Few places in the

Building Our Vision St. Andrews Vision and Mission Our Vision: Our Vision: The Tree of Life is

FLITTER FLITTER The Foldable Litter Pink B Our Vision Our Vision Our Vision Our Vision A

FOCUS AREAS FOCUS AREAS FOCUS AREAS FOCUS AREAS Our Our Vision Vision Our Our Vision

So What Has So, What Has So, What Has So What Has Vision Done For Vision Done For Vision Done

Analog night vision devices April, 2020 ANALOG NIGHT VISION DEVICES Night vision devices

Extending Decision Trees for Web Categorisation Multiparadigm Inductive Programming group

MATTER BISPECTRUM BEYOND HORNDESKI ( ) based on 1801. 07885 SH , T. Kobayashi, S.

Getting there from here: Governing climate change through experimentation? Matthew Hoffmann

Local coderivatives and approximation of Hodge Laplace problems Ragnar Winther Department of

OPM FLOW IN MSO4SC AND OTHER STORIES Atgeirr Fl Rasmussen About SINTEF Vision: technology for

BEHAVIORAL APPROACH TO CORPORATE GOVERNANCE? DEPARTMENT OF MANAGEMENT MIRIAM FLICKINGER AARHUS

The universe evolution and modified gravity: an overview Sergei D. Odintsov Consejo Superior de

Finite Volume discretization of two phase Darcy flows with discontinuous capillary pressures R.

Branding Presentation VISION Mevushal VISION Muscat of Alexandria & Viognier VISION

Vision Services Vision Services & & Vision Therapy Vision Therapy February 2, 2007