Applied Bayesian Nonparametrics
5. Spatial Models via Gaussian Processes, not MRFs
Tutorial at CVPR 2012
Erik Sudderth, Brown University
NIPS 2008: E. Sudderth & M. Jordan, "Shared Segmentation of Natural Scenes using Dependent Pitman-Yor Processes."
CVPR 2012: S. Ghosh & E. Sudderth, "Nonparametric Learning for Layered Segmentation of Natural Images."
Human Image Segmentation
BNP Image Segmentation
Segmentation as Partitioning
• How many regions does this image contain?
• What are the sizes of these regions?
Why Bayesian Nonparametrics?
• Huge variability in segmentations across images
• Want multiple interpretations, ranked by probability
BNP Image Segmentation
• Model: dependent Pitman-Yor processes, with spatial coupling via Gaussian processes
• Inference: stochastic search & expectation propagation
• Learning: conditional covariance calibration
• Results: multiple segmentations of natural images
Feature Extraction
• Partition the image into ~1,000 superpixels
• Compute texture and color features:
  - Texton histograms (vector quantization of a 13-channel filter bank)
  - Hue-Saturation-Value (HSV) color histograms
• Around 100 bins for each histogram
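As a small illustration of the color-histogram step, here is a hedged numpy sketch of per-superpixel HSV histograms. The joint three-channel quantizer (5 bins per channel, about 125 codewords, near the slide's ~100 bins) and the `hsv`/`labels` inputs are assumptions for illustration, not the tutorial's exact pipeline.

```python
import numpy as np

def superpixel_histograms(hsv, labels, bins_per_channel=5):
    """Quantize HSV jointly and accumulate one normalized histogram per
    superpixel. hsv: H x W x 3 floats in [0, 1]; labels: H x W int map."""
    q = np.clip((hsv * bins_per_channel).astype(int), 0, bins_per_channel - 1)
    codes = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]

    n_super, n_codes = labels.max() + 1, bins_per_channel ** 3
    hists = np.zeros((n_super, n_codes))
    np.add.at(hists, (labels.ravel(), codes.ravel()), 1.0)  # scatter-add counts
    return hists / np.maximum(hists.sum(axis=1, keepdims=True), 1.0)
```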
Pitman-Yor Mixture Model
PY segment size prior:
$$\pi_k = v_k \prod_{\ell=1}^{k-1} (1 - v_\ell), \qquad v_k \sim \mathrm{Beta}(1 - a,\, b + ka)$$
Assign features to segments: $z_i \sim \mathrm{Mult}(\pi)$
Observed features $x_i$ (color & texture), with visual segment appearance models:
$$x_i^c \sim \mathrm{Mult}(\theta^c_{z_i}) \ \text{(color)}, \qquad x_i^s \sim \mathrm{Mult}(\theta^s_{z_i}) \ \text{(texture)}$$
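A minimal sketch of sampling the truncated PY stick-breaking weights defined above; the truncation level `K` and folding the leftover stick into the final weight are implementation choices, not part of the model.

```python
import numpy as np

def py_stick_breaking(a, b, K, rng=None):
    """Sample a length-K truncation of PY segment sizes:
    pi_k = v_k * prod_{l<k} (1 - v_l), with v_k ~ Beta(1-a, b+ka)."""
    rng = rng or np.random.default_rng(0)
    v = rng.beta(1.0 - a, b + a * np.arange(1, K + 1))
    pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    pi[-1] += 1.0 - pi.sum()   # fold the leftover stick into the last weight
    return pi
```

For a = 0 this reduces to the usual DP stick-breaking construction with concentration b.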
Dependent DP & PY Mixtures
Some dependent prior with DP/PY-like marginals: kernel, logistic, or probit stick-breaking processes, order-based DDPs, …
Assign features to segments using site-specific weights: $z_i \sim \mathrm{Mult}(\pi_i)$
Observed features $x_i$ (color & texture), with visual segment appearance models:
$$x_i^c \sim \mathrm{Mult}(\theta^c_{z_i}) \ \text{(color)}, \qquad x_i^s \sim \mathrm{Mult}(\theta^s_{z_i}) \ \text{(texture)}$$
Example: Logistic of Gaussians
• Pass a set of Gaussian processes through a softmax to get probabilities of independent segment assignments
  (Fernandez & Green, 2002; Woolrich & Behrens, 2006; Blei & Lafferty, 2006; Figueiredo et al., 2005, 2007)
• Nonparametric analogs have similar properties
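A hedged sketch of this construction on a small set of sites: K independent GP samples pushed through a softmax. The squared-exponential kernel and its length scale are illustrative choices.

```python
import numpy as np

def softmax_of_gaussians(coords, K=4, length=0.2, rng=None):
    """coords: N x 2 locations in [0,1]^2; returns N x K segment probabilities."""
    rng = rng or np.random.default_rng(0)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    C = np.exp(-d2 / (2 * length ** 2)) + 1e-8 * np.eye(len(coords))
    u = np.linalg.cholesky(C) @ rng.standard_normal((len(coords), K))  # K GP draws
    e = np.exp(u - u.max(axis=1, keepdims=True))                       # stable softmax
    return e / e.sum(axis=1, keepdims=True)
```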
Discrete Markov Random Fields
Ising and Potts models
Previous applications:
• Interactive foreground segmentation (GrabCut: Rother, Kolmogorov, & Blake, 2004)
• Supervised training for known categories (Verbeek & Triggs, 2007)
… but learning is challenging, and there has been little success at unsupervised segmentation.
Region Classification with Markov Field Aspect Models
Verbeek & Triggs, CVPR 2007
Accuracy with local features alone: 74%; with MRF spatial coupling: 78%
10-State Potts Samples States sorted by size: largest in blue, smallest in red
1996 IEEE DSP Workshop
[Figure: number of edges on which states take the same value, and giant cluster size, versus edge strength, for Potts samples compared against natural images.]
Even within the very noisy phase transition region, samples lack the size distribution and spatial coherence of real image segments.
Geman & Geman, 1984
128 x 128 grid, 8-nearest-neighbor edges, K = 5 states, Potts potentials
[Samples shown after 200 iterations and after 10,000 iterations]
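To make this sampling experiment concrete, here is a minimal single-site Gibbs sampler for a K-state Potts model with 8-nearest-neighbor edges. It defaults to a smaller grid and fewer sweeps than the slide's 128 x 128, 10,000-iteration runs, since pure Python is slow; the coupling strength `beta` is a free parameter.

```python
import numpy as np

def potts_gibbs(size=64, K=5, beta=1.0, sweeps=50, rng=None):
    """Single-site Gibbs sweeps: p(z_ij = k | rest) is proportional to
    exp(beta * number of neighbors in state k)."""
    rng = rng or np.random.default_rng(0)
    z = rng.integers(K, size=(size, size))
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for _ in range(sweeps):
        for i in range(size):
            for j in range(size):
                counts = np.zeros(K)
                for di, dj in nbrs:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < size and 0 <= nj < size:
                        counts[z[ni, nj]] += 1       # matching-neighbor counts
                p = np.exp(beta * (counts - counts.max()))
                z[i, j] = rng.choice(K, p=p / p.sum())
    return z
```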
Product of Potts and DP?
Orbanz & Buhmann, 2006
Combines Potts potentials with a DP bias on the partition.
Spatially Dependent Pitman-Yor
• Cut random surfaces (samples from a GP) with thresholds (as in level set methods)
• Assign each pixel to the first surface which exceeds its threshold (as in layered models)
• Retains Pitman-Yor marginals while jointly modeling rich spatial dependencies (as in copula models)
Duan, Guindani, & Gelfand, Generalized Spatial DP, 2007
Stick-Breaking Revisited
Multinomial sampler: draw $z_i \sim \mathrm{Mult}(\pi)$ directly from the stick-breaking weights, which partition $[0, 1]$.
Sequential binary sampler: for $k = 1, 2, \ldots$, flip $b_k \sim \mathrm{Ber}(v_k)$ and set $z_i$ to the first $k$ with $b_k = 1$.
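A minimal sketch of the two samplers side by side; both draw a label with probabilities given by the stick-breaking weights, and the second needs only the sticks `v`.

```python
import numpy as np

def multinomial_sample(pi, rng):
    """Draw a segment label directly from the stick-breaking weights pi."""
    return rng.choice(len(pi), p=pi)

def sequential_binary_sample(v, rng):
    """Walk the sticks: take segment k on the first Bernoulli(v_k) success."""
    for k, vk in enumerate(v):
        if rng.random() < vk:
            return k
    return len(v)   # leftover stick at a finite truncation
```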
PY Gaussian Thresholds
Replace each Bernoulli stick test with a thresholded Gaussian: because the normal CDF $\Phi$ satisfies $P(u_k < \Phi^{-1}(v_k)) = v_k$ for $u_k \sim \mathcal{N}(0, 1)$, the label distribution is unchanged.
Gaussian sampler: $u_k \sim \mathcal{N}(0, 1)$.
Sequential binary sampler: $z_i = \min\{k : u_k < \Phi^{-1}(v_k)\}$.
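The same sequential sampler with each Bernoulli coin replaced by a Gaussian threshold test; `scipy.stats.norm.ppf` computes the inverse normal CDF $\Phi^{-1}$.

```python
import numpy as np
from scipy.stats import norm

def gaussian_threshold_sample(v, rng):
    """Take segment k the first time u_k ~ N(0,1) falls below Phi^{-1}(v_k);
    each test succeeds with probability exactly v_k."""
    for k, vk in enumerate(v):
        if rng.standard_normal() < norm.ppf(vk):
            return k
    return len(v)   # leftover stick at a finite truncation
```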
Spatially Dependent Pitman-Yor
Non-Markov Gaussian processes: one random surface $u_k \sim \mathcal{GP}(0, C)$ per layer.
PY prior on segment sizes: $v_k \sim \mathrm{Beta}(1 - a, b + ka)$.
Feature assignments by normal-CDF thresholding: $z_i = \min\{k : u_k(i) < \Phi^{-1}(v_k)\}$.
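A hedged one-dimensional sketch of the full construction: correlated GP surfaces replace the independent $u_k$, so nearby sites make correlated threshold decisions and form contiguous segments. The squared-exponential kernel, its length scale, and the truncation level are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def spatial_py_sample(N=200, K=20, a=0.1, b=1.0, length=0.1, rng=None):
    """Sample layered segment assignments z on N sites of a 1-D strip."""
    rng = rng or np.random.default_rng(0)
    x = np.linspace(0, 1, N)
    C = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * length ** 2))
    L = np.linalg.cholesky(C + 1e-8 * np.eye(N))
    v = rng.beta(1.0 - a, b + a * np.arange(1, K + 1))   # PY sticks
    u = L @ rng.standard_normal((N, K))                  # one GP surface per layer
    below = u < norm.ppf(v)[None, :]                     # per-layer threshold tests
    return np.where(below.any(axis=1), below.argmax(axis=1), K)
```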
Preservation of PY Marginals
Why ordered layer assignments? Because the random thresholds $\Phi^{-1}(v_k)$ inherit the stick size prior, each site's marginal distribution over segments remains exactly Pitman-Yor.
Samples from PY Spatial Prior
Comparison: Potts Markov random field
Outline
• Model: dependent Pitman-Yor processes, with spatial coupling via Gaussian processes
• Inference: stochastic search & expectation propagation
• Learning: conditional covariance calibration
• Results: multiple segmentations of natural images
Mean Field for Dependent PY
• Factorized Gaussian posteriors over the layer assignment surfaces
• Sufficient statistics allow closed-form coordinate ascent updates
Mean Field for Dependent PY
Updating layered partitions:
• Evaluation of beta normalization constants
• Jointly optimize each layer's threshold and Gaussian assignment surface, fixing all other layers, via backtracking conjugate gradient with line search
Reducing local optima:
• Place the factorized posterior on eigenfunctions of the Gaussian process, not on single features
Robustness and Initialization
[Figure: log-likelihood bounds versus iteration, for many random initializations of mean field variational inference on a single image.]
Alternative: Inference by Search
• Marginalize layer support functions via expectation propagation (EP): approximate but very accurate
• Consider hard assignments of superpixels to layers (partitions)
• Integrate likelihood parameters analytically (conjugacy)
• No need for a finite, conservative model truncation!
Maximization-Expectation
EM algorithm:
• E-step: marginalize latent variables (approximately)
• M-step: maximize the likelihood bound over model parameters
ME algorithm (Kurihara & Welling, 2009):
• M-step: maximize the likelihood over latent assignments
• E-step: marginalize random parameters (exactly)
Why Maximization-Expectation?
• Parameter marginalization allows Bayesian "model selection"
• Hard assignments allow efficient algorithms and data structures
• Hard assignments are consistent with clustering objectives
• No need for finite truncation of nonparametric models
A small worked sketch of the ME idea follows below.
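A hedged sketch of the ME idea for a conjugate Dirichlet-multinomial mixture (a simple stand-in for the paper's appearance model, not its exact likelihood): items are hard-assigned, and clusters are scored by the marginal likelihood with parameters integrated out.

```python
import numpy as np
from scipy.special import gammaln

def dm_log_marginal(counts, alpha=1.0):
    """log p(cluster data | alpha), with theta integrated out by conjugacy."""
    V, n = counts.size, counts.sum()
    return (gammaln(alpha * V) - gammaln(alpha * V + n)
            + (gammaln(alpha + counts) - gammaln(alpha)).sum())

def me_sweep(X, z, K, alpha=1.0):
    """One M-step sweep over hard assignments. X: items x vocab count matrix
    (numpy int array); z: numpy int labels. Each item moves to the cluster
    whose marginal likelihood it raises most; the E-step is the exact integral."""
    for i in range(len(X)):
        scores = np.empty(K)
        for k in range(K):
            rest = X[z == k].sum(axis=0) - (X[i] if z[i] == k else 0)
            scores[k] = dm_log_marginal(rest + X[i], alpha) - dm_log_marginal(rest, alpha)
        z[i] = int(scores.argmax())
    return z
```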
Discrete Search Moves
Stochastic proposals, accepted if and only if they improve our EP estimate of the marginal likelihood (see the loop sketched below):
• Merge: combine a pair of regions into a single region
• Split: break a single region into a pair of regions (a few proposals, for diversity)
• Shift: sequentially move single superpixels to the most probable region
• Permute: swap the positions of two layers in the order
Marginalization of continuous variables simplifies these moves!
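A skeleton of the accept-if-improved search loop; `score` and the `proposals` list stand in for the paper's EP marginal-likelihood estimate and its merge/split/shift/permute moves, which are not reproduced here.

```python
import numpy as np

def stochastic_search(partition, score, proposals, n_iters=1000, rng=None):
    """Greedy stochastic search: keep a proposed partition only if its
    (approximate) marginal likelihood score improves on the current one."""
    rng = rng or np.random.default_rng(0)
    best = score(partition)
    for _ in range(n_iters):
        move = proposals[rng.integers(len(proposals))]
        candidate = move(partition, rng)   # e.g. merge, split, shift, permute
        s = score(candidate)
        if s > best:
            partition, best = candidate, s
    return partition, best
```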
Inferring Ordered Layers
[Two candidate layer orders, each shown front, middle, back: Order A and Order B.]
• Which is preferred by a diagonal covariance? Order B
• Which is preferred by a spatial covariance? Order A
Inference Across Initializations
[Figure: best and worst segmentations across initializations, for mean field variational inference versus EP stochastic search.]
BSDS: Spatial PY Inference
[Figure: segmentations from Spatial PY (MF) and Spatial PY (EP).]
Outline
• Model: dependent Pitman-Yor processes, with spatial coupling via Gaussian processes
• Inference: stochastic search & expectation propagation
• Learning: conditional covariance calibration
• Results: multiple segmentations of natural images
Covariance Kernels
• Thresholds determine segment size: Pitman-Yor prior
• Covariance determines segment shape: the probability that features at two locations fall in the same segment
Roughly independent image cues:
• Color and texture histograms within each region: model generatively via a multinomial likelihood (Dirichlet prior)
• Pixel locations and intervening contour cues: model conditionally via the GP covariance function (Berkeley Pb, "probability of boundary", detector)
Learning from Human Segments
• Data is unavailable to learn models of all the categories we're interested in: we want to discover new categories!
• Use logistic regression, with a basis expansion of image cues, to learn binary "are we in the same segment?" predictors (sketched below):
  - Generative: distance only
  - Conditional: distance, intervening contours, …
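A hedged sketch of such a predictor using scikit-learn; the particular basis expansion (distance, maximum intervening Pb, their squares, and a cross term) is an assumption for illustration, and `X`, `y` would be built from superpixel pairs in human-segmented training images.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(dist, max_pb):
    """Basis expansion of two pairwise cues: superpixel distance and the
    maximum Pb boundary strength on the path between the pair."""
    return np.array([dist, dist ** 2, max_pb, max_pb ** 2, dist * max_pb])

# Hypothetical training data: one row per superpixel pair, y = 1 if a human
# segmentation places the pair in the same segment.
# clf = LogisticRegression(max_iter=1000).fit(X, y)
# p_same = clf.predict_proba(X_pairs)[:, 1]   # same-segment probabilities
```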
From Probability to Correlation
There is an injective mapping between covariance and the probability that two superpixels are in the same segment; a numerical inversion is sketched below.
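A simplified numeric sketch of that inversion, reduced to a single threshold layer (so "same segment" means the two Gaussians fall on the same side of the threshold): estimate the probability by Monte Carlo, then bisect on the correlation, in which the probability is increasing. The full layered case in the paper is more involved.

```python
import numpy as np

def same_side_prob(rho, thresh=0.0, n=100_000, rng=None):
    """P(two standard normals with correlation rho land on the same side
    of `thresh`), estimated by Monte Carlo with a fixed seed."""
    rng = rng or np.random.default_rng(0)
    a = rng.standard_normal(n)
    b = rho * a + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    return (((a < thresh) & (b < thresh)) | ((a >= thresh) & (b >= thresh))).mean()

def rho_for_probability(target, thresh=0.0):
    """Invert the monotone probability-to-correlation map by bisection."""
    lo, hi = -0.999, 0.999
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if same_side_prob(mid, thresh) < target else (lo, mid)
    return 0.5 * (lo + hi)
```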
Low-Rank Covariance Projection
• The pseudo-covariance constructed by considering each superpixel pair independently may not be positive definite
• A projected gradient method finds a low-rank (factor analysis), unit-diagonal covariance close to the target estimates; a simple alternating heuristic is sketched below
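A rough alternating heuristic in the spirit of that projection, standing in for the paper's projected gradient method: truncate to the top eigenpairs, then rescale to a unit diagonal, and repeat. The rank and iteration count are assumptions.

```python
import numpy as np

def project_low_rank_unit_diag(C, rank=10, iters=50):
    """Find a PSD, rank-limited matrix with unit diagonal near C."""
    A = C.copy()
    for _ in range(iters):
        w, V = np.linalg.eigh(A)
        w = np.clip(w, 0.0, None)        # enforce positive semidefiniteness
        top = np.argsort(w)[-rank:]      # keep the `rank` largest eigenpairs
        A = (V[:, top] * w[top]) @ V[:, top].T
        d = np.sqrt(np.clip(np.diag(A), 1e-8, None))
        A = A / d[:, None] / d[None, :]  # renormalize to unit diagonal
    return A
```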
Prediction of Test Partitions
[Figure: heuristic versus learned image partition probabilities; the Rand index measures partition overlap.]
Comparing Spatial PY Models
[Figure: input image, segmentations with the learned PY covariance, and with the heuristic PY covariance.]