Generative Hierarchical Models for Image Analysis Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang)
I. Image modeling
II. Data likelihood
III. Priors: content/context sensitivity
I. Image modeling
   • Red herrings?
   • Bayesian (generative) image models
II. Data likelihood
III. Priors: content/context sensitivity
Practical vision problems: What is the end-product of processing?
   machine vision: machine analysis
   human vision: "The more you look, the more you see"
Learning Theory: Pure learning
   $y$ (image) → black box → $x$ (label): Tree / No Tree
Given $(y_k, x_k),\ k = 1, 2, \dots, N$, produce a black box so that, as $N \to \infty$, black box → OPTIMAL classifier.
Performance of stressed biological systems: Super-rapid response… In this circumstance: machine vision achieves biological performance
I. Image modeling
   • Red herrings?
   • Bayesian (generative) image models
II. Data likelihood
III. Priors: content/context sensitivity
I. Bayesian (generative) image models

Prior
   $\mathcal{I}$ = set of possible "interpretations" or "parses"
   $x \in \mathcal{I}$ = a particular interpretation
   $P(x)$ = probability model on $\mathcal{I}$
      * very structured and constrained
      * organizing principles: hierarchy and reusability (Amit, Buhmann, Felzenszwalb, Mumford, Poggio, Yuille, Zhu, etc.)
      * non-Markovian (context/content sensitive)

Data likelihood
   $y$ = image
   $P(y \mid x)$ = conditional probability model

Posterior
   $P(x \mid y) \propto P(y \mid x)\, P(x)$
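As a toy illustration of this posterior structure, here is a minimal Python sketch over a hypothetical three-element interpretation set; the prior table and the Gaussian-style likelihood are invented placeholders, not the structured hierarchical models the slide refers to:

import numpy as np

# Toy interpretation set I: each x is a candidate "parse" of the image.
interpretations = ["face", "eye", "background"]

def prior(x):
    # P(x): a hypothetical table standing in for a structured, hierarchical prior.
    return {"face": 0.1, "eye": 0.3, "background": 0.6}[x]

def likelihood(y, x):
    # P(y | x): a placeholder conditional data model (y is a scalar summary here).
    means = {"face": 0.8, "eye": 0.5, "background": 0.2}
    return float(np.exp(-0.5 * (y - means[x]) ** 2))

def posterior(y):
    # P(x | y) ∝ P(y | x) P(x)
    scores = np.array([likelihood(y, x) * prior(x) for x in interpretations])
    return dict(zip(interpretations, scores / scores.sum()))

print(posterior(0.75))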
I. Image modeling
II. Data likelihood
   • Feature distributions and data distributions
   • Conditional modeling
   • Examples: learning templates
III. Priors: content/context sensitivity
Feature distributions and data distributions
   $y_s$ = pixel intensity at $s \in S$
   $y = \{y_s\}_{s \in S}$ = image patch
Given a category (e.g. edge, corner, eye, face, (eye, pose), …), model the patch through a feature model:
   $f(y)$ = "feature" ($f$ for short), e.g. variance of patch, histogram of gradients, SIFT features, template correlation
   $P_F(f) = P_F(f; \theta)$ = probability model ($P_F$ for short)
Problem: given $y^1, \dots, y^N$, samples of eye patches, learn $f$ and $\theta$.
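A minimal sketch of two of the example features named above, patch variance and template correlation; the standardization details are my assumptions:

import numpy as np

def patch_variance(y):
    # f(y) = variance of the patch intensities.
    return float(np.var(y))

def template_correlation(T, y):
    # f(y) = c_T(y): normalized correlation between template T and patch y.
    T0 = (T - T.mean()) / (T.std() + 1e-12)
    y0 = (y - y.mean()) / (y.std() + 1e-12)
    return float((T0 * y0).mean())

rng = np.random.default_rng(0)
T = rng.random((15, 15))                        # a 15x15 template
y = T + 0.3 * rng.standard_normal((15, 15))     # a noisy patch
print(patch_variance(y), template_correlation(T, y))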
Use maximum likelihood… but what is the likelihood?
Tempting to PRETEND that the data is $f(y^1), \dots, f(y^N)$:
   $L(f(y^1),\dots,f(y^N);\, f, \theta) = \prod_{k=1}^{N} P_F(f(y^k))$
Caution: this is different from $P_Y(y) = \frac{1}{Z} P_F(f(y))$.
BUT the data is $y^1, \dots, y^N$, and
   $P_Y(y) = P_F(f(y))\, P_Y(y \mid F = f(y))$
   $L(y^1,\dots,y^N;\, f, \theta) = \prod_{k=1}^{N} P_F(f(y^k))\, P_Y(y^k \mid F = f(y^k))$
The first is fine for estimating $P_F$ (i.e. $\theta$), but not fine for estimating the feature itself (i.e. $f$): roughly, dropping the conditional term rewards degenerate features, e.g. an $f$ that is nearly constant across all patches, under which $P_F(f(y^k))$ can be made arbitrarily large.
I. Image modeling
II. Data likelihood
   • Feature distributions and data distributions
   • Conditional modeling
   • Examples: learning templates
III. Priors: content/context sensitivity
Conditional modeling
For any category $g$ (e.g. "eye") and feature $f$:
   $P_Y^g(y) = P_F^g(f(y))\, P_Y^g(y \mid F = f(y))$
Easy to model $P_F^g(f)$; hard to model $P_Y^g(y \mid F = f)$.
Proposal: start with a "null" or "background" distribution $P_Y^o(y)$ and choose $P_Y^g(y)$:
   1. consistent with $P_F^g(f)$, and
   2. otherwise "as close as possible" to $P_Y^o(y)$
Conditional modeling: a perturbation of the null distribution
Specifically, given $P_F^g(f)$ and a null distribution $P_Y^o(y)$, choose
   $P_Y^g = \arg\min_{P_Y :\, F(Y) \text{ has } P_F^g \text{ distribution}}\ D(P_Y \,\|\, P_Y^o)$
   $\Rightarrow\ P_Y^g(y) = P_F^g(f(y))\, P_Y^o(y \mid F = f(y))$
(where $D(P \,\|\, \tilde P) = \int P(y) \log \frac{P(y)}{\tilde P(y)}\, dy$ is the K-L divergence)
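One way to see why the constrained KL minimizer factors this way (the chain-rule decomposition below is a standard identity, filled in here rather than shown on the slide): by the chain rule for relative entropy,

$$D(P_Y \,\|\, P_Y^o) \;=\; D(P_F \,\|\, P_F^o) \;+\; \mathbb{E}_{F \sim P_F}\!\left[ D\big(P_Y(\cdot \mid F) \,\|\, P_Y^o(\cdot \mid F)\big) \right].$$

The constraint pins $P_F = P_F^g$, so the first term is fixed; the second term is nonnegative and vanishes exactly when $P_Y(y \mid F = f(y)) = P_Y^o(y \mid F = f(y))$, which yields $P_Y^g(y) = P_F^g(f(y))\, P_Y^o(y \mid F = f(y))$.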
Estimation
Given $y^1, \dots, y^N$, $P_F^g(f) = P_F^g(f; \theta)$, and
   $P_Y^g(y) = P_F^g(f(y))\, P_Y^o(y \mid F = f(y))$,
estimate $f$ and $\theta$:
   $\arg\max_{f,\theta} L(y^1,\dots,y^N;\, f, \theta) \;=\; \arg\max_{f,\theta} \prod_{k=1}^{N} \frac{P_F^g(f(y^k))}{P_F^o(f(y^k))}$
(since $P_Y^o(y^k \mid F = f(y^k)) = P_Y^o(y^k) / P_F^o(f(y^k))$, and the factor $P_Y^o(y^k)$ is free of $f$ and $\theta$)
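A minimal numerical sketch of this likelihood-ratio estimation for a fixed feature, using the exponential correlation model that appears on the later slides ($P^g_C(c) \propto \lambda e^{-\lambda(1-c)}$, treated as exact on $c \le 1$) against a Gaussian null; the synthetic data and the grid search are illustrative assumptions:

import numpy as np

def log_likelihood_ratio(c, lam, sigma_o):
    # sum_k log [ P_C^g(c_k) / P_C^o(c_k) ] over training correlations c.
    # P_C^g(c) = lam * exp(-lam * (1 - c))  (mass concentrated near c = 1),
    # P_C^o(c) = N(0, sigma_o^2)            (null correlation distribution).
    log_g = np.log(lam) - lam * (1.0 - c)
    log_o = -0.5 * np.log(2 * np.pi * sigma_o**2) - 0.5 * (c / sigma_o) ** 2
    return float(np.sum(log_g - log_o))

rng = np.random.default_rng(1)
c_train = 1.0 - rng.exponential(0.1, size=500)   # synthetic correlations near 1
sigma_o = 1.0 / np.sqrt(225)                     # null variance ≈ 1/|S|, |S| = 15x15
lams = np.linspace(1.0, 50.0, 200)
lam_hat = max(lams, key=lambda l: log_likelihood_ratio(c_train, l, sigma_o))
print("estimated lambda ≈", lam_hat)             # ≈ 1 / mean(1 - c) = 10

Note the null term does not depend on $\lambda$, so it leaves this one-parameter fit unchanged; it matters once the feature (here, the template) is also being estimated.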
In fact, for an arbitrary mixture (e.g. over poses, templates, vector quanta, …):
   $P_{F_m}^g(f) = P_{F_m}^g(f; \theta_m), \quad m = 1, 2, \dots, M$
   $P_Y^g(y) = \sum_{m=1}^{M} \epsilon_m\, P_{F_m}^g(f_m(y))\, P_Y^o(y \mid F_m = f_m(y))$
   $\arg\max_{\epsilon_1,\dots,\epsilon_M,\, f_1,\dots,f_M,\, \theta_1,\dots,\theta_M} L(y^1,\dots,y^N;\, \epsilon_1,\dots,\epsilon_M,\, f_1,\dots,f_M,\, \theta_1,\dots,\theta_M)$
   $\qquad = \arg\max_{\epsilon_1,\dots,\epsilon_M,\, f_1,\dots,f_M,\, \theta_1,\dots,\theta_M} \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{P_{F_m}^g(f_m(y^k))}{P_{F_m}^o(f_m(y^k))}$
I. Image modeling
II. Data likelihood
   • Feature distributions and data distributions
   • Conditional modeling
   • Examples: learning templates
III. Priors: content/context sensitivity
Example: learning eye templates
   $y_s$ = pixel intensity at $s \in S$; $y = \{y_s\}_{s \in S}$ = image patch
Take $f(y) = c_T(y) = \mathrm{corr}(T, y)$, the normalized correlation with a template $T$, and model eyes as a mixture:
   $P_Y^e(y) = \sum_{m=1}^{M} \epsilon_m\, P_{C_{T_m}}^e(c_{T_m}(y))\, P_Y^o(y \mid C_{T_m} = c_{T_m}(y))$
with exponential feature distributions, $P_{C_{T_m}}^e(c) \propto e^{-\lambda_m (1 - c)}$, so that
   $P_Y^e(y) = \sum_{m=1}^{M} \epsilon_m\, e^{-\lambda_m (1 - c_{T_m}(y))}\, P_Y^o(y \mid C_{T_m} = c_{T_m}(y))$ (up to normalization)
Null distribution $P_Y^o$: for estimation, only $P_{C_T}^o(c)$ matters…
   $P_Y^o$ = iid pixels: $P_{C_T}^o(c) \approx N(0,\, 1/|S|)$
   $P_Y^o$ = random image patches: $P_{C_T}^o(c) \approx N(0,\, 10/|S|)$
   $P_Y^o$ = random smooth image patches: $P_{C_T}^o(c) \approx N(0,\, 15/|S|)$
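A quick Monte Carlo check of the iid case above (the 15x15 patch size is an assumed choice): correlate a fixed template with iid-noise patches and compare the empirical variance to $1/|S|$:

import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((15, 15))                # fixed template, |S| = 225
T0 = (T - T.mean()) / T.std()

def corr(T0, y):
    y0 = (y - y.mean()) / (y.std() + 1e-12)
    return float((T0 * y0).mean())

# Correlations of the template with 20000 iid-noise patches.
cs = np.array([corr(T0, rng.standard_normal((15, 15))) for _ in range(20000)])
print("empirical variance:", cs.var(), "  1/|S|:", 1.0 / 225)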
Example: learning eye templates
Examples of faces from the FERET database.
With $N = 500$, compute
   $\arg\max_{\epsilon_1,\dots,\epsilon_M,\, \lambda_1,\dots,\lambda_M,\, T_1,\dots,T_M} L(y^1,\dots,y^N \mid \epsilon_1,\dots,\epsilon_M,\, \lambda_1,\dots,\lambda_M,\, T_1,\dots,T_M)$
   $\qquad = \arg\max_{\epsilon_1,\dots,\epsilon_M,\, \lambda_1,\dots,\lambda_M,\, T_1,\dots,T_M} \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{P_{C_{T_m}}^o(c_{T_m}(y^k))}$
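The later slides describe maximizing this mixture objective with EM. A minimal EM sketch, holding the templates (hence the correlation matrix) fixed and updating only $\epsilon_m$ and $\lambda_m$; the closed-form updates are standard weighted maximum-likelihood steps:

import numpy as np

def em_mixture(C, sigma_o, n_iter=50):
    # C[k, m] = correlation of training patch k with (fixed) template m.
    # Component feature model: P_m^g(c) = lam[m] * exp(-lam[m] * (1 - c));
    # null: P^o(c) = N(0, sigma_o^2). EM climbs
    #   prod_k sum_m eps[m] * P_m^g(C[k, m]) / P^o(C[k, m]).
    N, M = C.shape
    eps = np.full(M, 1.0 / M)
    lam = np.full(M, 5.0)
    log_o = -0.5 * np.log(2 * np.pi * sigma_o**2) - 0.5 * (C / sigma_o) ** 2
    for _ in range(n_iter):
        # E-step: responsibilities r[k, m] ∝ eps[m] * P_m^g / P^o.
        log_r = np.log(eps) + np.log(lam) - lam * (1.0 - C) - log_o
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted MLEs (the null factor is free of eps and lam).
        eps = r.mean(axis=0)
        lam = r.sum(axis=0) / np.maximum((r * (1.0 - C)).sum(axis=0), 1e-12)
    return eps, lam

A full implementation would also re-estimate each template $T_m$ inside the M-step (e.g. a responsibility-weighted average of the standardized patches) and recompute $C$; that is the part the "EM iterations" on the next slide are doing, and this sketch omits it.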
Example: learning eye templates, mixing over position, scale, and template
   samples from training set / learned templates
   top to bottom: EM iterations
Example: learning (right) eye templates
What if we forget all this nonsense and just maximize
   $\prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, e^{-\lambda_m (1 - c_{T_m}(y^k))}$
instead of
   $\prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{P_{C_{T_m}}^o(c_{T_m}(y^k))}$ ?
How good are the templates? A classification experiment…
Classify East Asian vs. South Asian faces, mixing over 4 scales and 8 templates.
   East Asian: (L) examples of training images, (M) progression of EM, (R) trained templates
   South Asian: (L) examples of training images, (M) progression of EM, (R) trained templates
Classification rate: 97%
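The decision rule for this experiment is presumably a comparison of the two class-conditional mixture scores; here is a sketch under that assumption (the function names and the tie-breaking are mine):

import numpy as np

def mixture_log_score(C, eps, lam, sigma_o):
    # C[m] = correlation of the test image with template m of one class.
    log_o = -0.5 * np.log(2 * np.pi * sigma_o**2) - 0.5 * (C / sigma_o) ** 2
    log_terms = np.log(eps) + np.log(lam) - lam * (1.0 - C) - log_o
    m = log_terms.max()
    return float(m + np.log(np.exp(log_terms - m).sum()))  # log sum_m eps_m g_m / o

def classify(C_east, C_south, params_east, params_south, sigma_o):
    # Pick the class whose trained mixture explains the image better.
    s_e = mixture_log_score(C_east, *params_east, sigma_o)
    s_s = mixture_log_score(C_south, *params_south, sigma_o)
    return "East Asian" if s_e >= s_s else "South Asian"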
Other examples: noses
   16 templates; multiple scales, shifts, and rotations
   samples from training set / learned templates
Other examples: mixture of noses and mouths
   samples from training set / 32 learned templates (half noses, half mouths)
Other examples: train on 58 faces, half with glasses and half without
   samples from training set / 32 learned templates / 6 learned templates
Other examples: train on 58 faces, half with glasses and half without
   6 learned templates; a random eight of the 58 faces
   rows 2 to 5, top to bottom: templates ordered by posterior likelihood
Other examples: train on 58 faces, half with glasses and half without
   top row: the six learned templates
   rows 2 to 5, top to bottom: training images ordered by correlation
Other examples: train on random patches ("sparse representation")
   500 random 15x15 training patches from random internet images → 24 10x10 templates
Other examples: coarse representation
Use $f(y) = \mathrm{Corr}(T, D(y))$, where $D$ = downconvert.
(Go the other way for super-resolution: $f(y) = \mathrm{Corr}(D(T), y)$?)
   training of 8 low-res (10x10) templates
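A minimal sketch of the coarse feature $f(y) = \mathrm{Corr}(T, D(y))$, with block averaging assumed for the downconversion $D$:

import numpy as np

def downconvert(y, factor):
    # D(y): block-average downsampling by an integer factor.
    h, w = y.shape
    y = y[:h - h % factor, :w - w % factor]
    return y.reshape(y.shape[0] // factor, factor,
                     y.shape[1] // factor, factor).mean(axis=(1, 3))

def coarse_corr(T_lowres, y, factor):
    # f(y) = Corr(T, D(y)) with a low-resolution template T.
    d = downconvert(y, factor)
    T0 = (T_lowres - T_lowres.mean()) / (T_lowres.std() + 1e-12)
    d0 = (d - d.mean()) / (d.std() + 1e-12)
    return float((T0 * d0).mean())

rng = np.random.default_rng(3)
print(coarse_corr(rng.random((10, 10)), rng.random((30, 30)), factor=3))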
Grenander: "pattern synthesis = pattern analysis"
(approximate) sampling…
   32 samples from the mixture model, with $P_Y^o$ = white noise
(approximate) sampling…
   32 samples from the mixture model, with $P_Y^o$ = Caltech 101
(approximate) sampling…
   32 samples from the mixture model, with $P_Y^o$ = population of smooth image patches
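A minimal sketch of approximate sampling in the white-noise-null case (the first of the three slides above): draw a component, draw a correlation from the tilted exponential, and blend a noise patch with the template to hit that correlation. This blending construction is an assumed sampler, not necessarily the authors' procedure:

import numpy as np

def sample_patch(templates, eps, lam, rng):
    # 1) choose a mixture component m with probability eps[m]
    m = rng.choice(len(templates), p=eps)
    T = templates[m]
    T0 = (T - T.mean()) / (T.std() + 1e-12)
    # 2) draw a target correlation: 1 - c ~ Exponential(lam[m]), clipped to [-1, 1]
    c = np.clip(1.0 - rng.exponential(1.0 / lam[m]), -1.0, 1.0)
    # 3) blend a standardized white-noise patch (the null sample) with the
    #    template; for nearly orthogonal T0 and y0, the blend below has
    #    correlation ≈ c with T0.
    y0 = rng.standard_normal(T.shape)
    y0 = (y0 - y0.mean()) / (y0.std() + 1e-12)
    return c * T0 + np.sqrt(1.0 - c**2) * y0

rng = np.random.default_rng(4)
templates = [rng.random((15, 15)) for _ in range(4)]
patch = sample_patch(templates, np.full(4, 0.25), np.full(4, 20.0), rng)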
I. Image modeling
II. Data likelihood
III. Priors: content/context sensitivity
   • Hierarchical models and the Markov dilemma
   • Conditional modeling
   • Examples: detecting faces and reading license plates