
Nonparametric Bayesian Models -- Learning and Reasoning in Open Possible Worlds
Eric Xing (epxing@cs.cmu.edu), Machine Learning Dept. / Language Technology Inst. / Computer Science Dept., Carnegie Mellon University


  1. Title: Nonparametric Bayesian Models -- Learning and Reasoning in Open Possible Worlds. Eric Xing (epxing@cs.cmu.edu), Machine Learning Dept. / Language Technology Inst. / Computer Science Dept., Carnegie Mellon University. VLPR09 @ Beijing, China, 8/6/2009.

  Outline:
  - Motivation and challenge
  - Dirichlet Process and infinite mixture: formulation; approximate inference algorithm; example: population clustering
  - Hierarchical Dirichlet Process and multi-task clustering: formulation; transformed DP and HDP; kernel stick-breaking process; application: joint image segmentation
  - Dynamic Dirichlet Process: hidden Markov DP; temporal DPM; application: evolutionary clustering of documents
  - Summary

  2. Clustering / Image Segmentation

  How to segment images?
  - Manual segmentation (very expensive)
  - Algorithmic segmentation: K-means, statistical mixture models, spectral clustering

  Problems with most existing algorithms:
  - They ignore spatial information
  - They perform segmentation one image at a time
  - They need the number of segments specified a priori

  3. Discover Object Categories
  - Discover what objects are present in a collection of images, in an unsupervised way
  - Find those same objects in novel images
  - Determine which local image features correspond to which objects, thereby segmenting the image

  Learn and Recognize Natural Scene Categories

  4. Object Recognition and Tracking
  [Figure: tracked object attribute vectors over time, e.g. (1.9, 9.0, 2.1), (1.8, 7.4, 2.3), (1.9, 6.1, 2.2) for one track and (0.7, 5.1, 3.2), (0.6, 5.9, 3.2), (0.9, 5.8, 3.1) for another, at t = 1, 2, 3.]

  The Evolution of Science
  [Figure: research circles (Phy, Bio, CS), research topics, and PNAS papers evolving from 1900 to 2000.]

  5. A Classical Approach
  - Clustering as mixture modeling, followed by "model selection"

  Partially Observed, Open and Evolving Possible Worlds
  - Unbounded # of objects/trajectories
  - Changing attributes
  - Birth/death, merge/split
  - Relational ambiguity
  - The parametric paradigm: a finite, structurally unambiguous entity space Ξ*, with an event model p({φ_k}), a motion model p(φ_{t+1} | φ_t), and a sensor model p(x | {φ_k}) over the observation space
  - How to open it up?

  6. Model Selection vs. Posterior Inference
  - Model selection:
    - "intelligent" guess: ???
    - cross validation: data-hungry
    - information theoretic (Parsimony, Ockham's Razor): choose K = argmin_K KL( f(·) || g(· | θ̂_ML, K) ), e.g. AIC, TIC, MDL
    - Bayes factor: need to compute the data likelihood
  - Posterior inference: we want to handle uncertainty over model complexity explicitly:

      p(M | D) ∝ p(D | M) p(M),   M ≡ {θ, K}

  - We favor a distribution that does not constrain M to a "closed" space!

  Two "Recent" Developments
  - First-order probabilistic languages (FOPLs)
    - Examples: PRM, BLOG, ...
    - Lift graphical models to an "open" world (# of r.v.s, relations, indices, lifespans, ...)
    - Focus on complete, consistent operating rules for instantiating possible worlds, and a formal language for expressing such rules
    - An operational way of defining distributions over possible worlds, via sampling methods
  - Bayesian nonparametrics
    - Examples: Dirichlet processes, stick-breaking processes, ...
    - From finite, to infinite mixtures, to more complex constructions (hierarchies, spatial/temporal sequences, ...)
    - Focus on the laws and behaviors of both the generative formalisms and the resulting distributions
    - Often offer explicit expressions of the distributions and expose their structure, which motivates various approximation schemes

  7. Outline (recap, highlighting the next section: Dirichlet Process and infinite mixture -- formulation; approximate inference algorithm; example: population clustering)

  Clustering
  - How to label them?
  - How many clusters???

  8. Random Partition of Probability Space
  [Figure: a probability space partitioned into regions {φ_k, π_k}, k = 1, ..., 6; each (event, p(event)) region has a centroid φ and mass π; image elements are pairs (x, θ).]

  Stick-breaking Process

      G(·) = Σ_{k=1}^∞ π_k δ_{θ_k}(·)
      θ_k ~ G_0                                        (location)
      π_k = β_k Π_{j=1}^{k-1} (1 - β_j),  Σ_{k=1}^∞ π_k = 1    (mass)
      β_k ~ Beta(1, α)

  [Figure: the unit stick broken successively: β_1 = 0.4 gives π_1 = 0.4, leaving 0.6; β_2 = 0.5 gives π_2 = 0.3, leaving 0.3; β_3 = 0.8 gives π_3 = 0.24; ...]
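  The stick-breaking construction above can be simulated directly by truncating the infinite sum at a large K. A minimal sketch in NumPy, assuming an illustrative base measure G_0 = N(0, 1) and α = 2.0 (the function name and truncation level are choices for this example, not part of the slides):

  ```python
  import numpy as np

  def stick_breaking(alpha, base_sampler, trunc=1000, seed=None):
      """Truncated stick-breaking draw from a DP(alpha, G0).

      Returns atom locations theta_k ~ G0 and weights
      pi_k = beta_k * prod_{j<k} (1 - beta_j), with beta_k ~ Beta(1, alpha).
      """
      rng = np.random.default_rng(seed)
      betas = rng.beta(1.0, alpha, size=trunc)
      # length of stick remaining before the k-th break
      remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
      weights = betas * remaining
      atoms = base_sampler(trunc, rng)
      return atoms, weights

  # Example: base measure G0 = N(0, 1), concentration alpha = 2.0
  atoms, weights = stick_breaking(
      alpha=2.0, base_sampler=lambda n, rng: rng.normal(0.0, 1.0, n), seed=0)
  print(weights[:5], weights.sum())  # weights sum to (nearly) 1
  ```

  Smaller α concentrates mass on the first few atoms (fewer effective clusters); larger α spreads it out.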

  9. DP -- a Pólya Urn Process

      G ~ DP(α, G_0)

  Marginal (integrating out G):

      φ_i | φ_1, ..., φ_{i-1} ~ Σ_{k=1}^K  n_k / (i - 1 + α) · δ_{φ*_k}(·)  +  α / (i - 1 + α) · G_0(·)

  e.g., after 5 draws with counts (2, 3): p = 2/(5+α) and p = 3/(5+α) for the two existing colors, and p = α/(5+α) for a fresh draw from G_0.

  - Self-reinforcing property
  - Exchangeable partition of samples

  Clustering and DP Mixture
  - We can associate mixture components with colors in the Pólya urn model and thereby define a clustering of the data.

  10. Chinese Restaurant Process
  [Figure: customers seated sequentially. Customer 1 sits at table 1 with probability 1; customer 2 joins table 1 with probability 1/(1+α) or a new table with probability α/(1+α); customer 3 sees probabilities 1/(2+α), 1/(2+α), α/(2+α); and in general:]

      P(c_i = k | c_{-i}) = m_k / (i - 1 + α)     (existing table k with m_k customers)
                          = α / (i - 1 + α)       (new table)

  Dirichlet Process
  - A CDF G over possible worlds of random partitions follows a Dirichlet Process if, for any measurable finite partition (φ_1, φ_2, ..., φ_m):

      ( G(φ_1), G(φ_2), ..., G(φ_m) ) ~ Dirichlet( α G_0(φ_1), ..., α G_0(φ_m) )

    where G_0 is the base measure and α is the scale parameter.
  - Thus a Dirichlet Process defines a distribution over distributions.
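  The seating rule above translates directly into a simulator. A minimal sketch (function name and the choice n = 100, α = 1.0 are illustrative, not from the slides):

  ```python
  import numpy as np

  def chinese_restaurant_process(n, alpha, seed=None):
      """Sample a random partition of n customers via the CRP.

      Customer i joins existing table k with probability m_k / (i - 1 + alpha)
      and opens a new table with probability alpha / (i - 1 + alpha),
      matching the Polya-urn marginal of a DP(alpha, G0).
      """
      rng = np.random.default_rng(seed)
      tables = []        # tables[k] = number of customers at table k
      assignments = []
      for i in range(n):
          probs = np.array(tables + [alpha], dtype=float)
          probs /= i + alpha          # i customers already seated
          k = rng.choice(len(probs), p=probs)
          if k == len(tables):
              tables.append(1)        # open a new table
          else:
              tables[k] += 1
          assignments.append(int(k))
      return assignments, tables

  assignments, tables = chinese_restaurant_process(100, alpha=1.0, seed=0)
  print(len(tables), tables)
  ```

  The number of occupied tables grows only logarithmically with n, which is the sense in which the DP lets the data, rather than a fixed K, determine the number of clusters.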

  11. Graphical Model Representations of DP
  [Figure: two plate diagrams. Left, the Pólya urn construction: G_0, α → G → θ_i → x_i, plate over i = 1..N. Right, the stick-breaking construction: α → π, G_0 → θ_k (k = 1..∞); π → y_i → x_i, plate over i = 1..N.]

  Example: DP-haplotyper [Xing et al., 2004]
  - Clustering human populations
  [Figure: graphical model with DP prior G ~ DP(α, G_0) over infinite mixture components A (population haplotypes), and a likelihood model for individual haplotypes H_n1, H_n2 and genotypes G_n, plate over n = 1..N.]
  - Inference: Markov chain Monte Carlo (MCMC)
    - Gibbs sampling
    - Metropolis-Hastings

  12. Inheritance and Observation Models
  - Single-locus mutation model (ancestral pool → haplotypes), A_i → H_i:

      P(h_t | a_t, θ) = θ                     if h_t = a_t
                      = (1 - θ) / (|B| - 1)   if h_t ≠ a_t

    i.e., h_t = a_t with probability θ.
  - Noisy observation model (haplotypes → genotype), H_i1, H_i2 → G_i:

      P(g_t | h_t,1, h_t,2):  g_t = h_t,1 ⊕ h_t,2 with probability λ.

  MCMC for Haplotype Inference
  - Gibbs sampling for exploring the posterior distribution under the proposed model
  - Integrate out parameters such as θ and λ, and sample c_{i_e}, a_k, and h_{i_e}:

      p(c_{i_e} = k | c_{-i_e}, h, a)  ∝  p(c_{i_e} = k | c_{-i_e}) · p(h_{i_e} | a_k)
            Posterior                  ∝  Prior (Pólya urn)  ×  Likelihood

  - Gibbs sampling algorithm: draw samples of each random variable to be sampled, given the values of all the remaining variables.
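  The "CRP prior × likelihood" conditional above is the core of collapsed Gibbs sampling for any DP mixture. A minimal sketch on a stand-in model (a DP mixture of 1-D Gaussians with known variance σ² and a conjugate N(0, τ²) prior on the means, integrated out, instead of the haplotype likelihood; all names and hyperparameter values here are illustrative):

  ```python
  import numpy as np

  def log_normal(x, mu, var):
      """Log density of N(mu, var) at x."""
      return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

  def gibbs_sweep(x, z, alpha, rng, sigma2=1.0, tau2=10.0):
      """One collapsed Gibbs sweep over cluster indicators z for a DP
      mixture x_i ~ N(mu_k, sigma2), mu_k ~ N(0, tau2), means integrated out.
      Each conditional is  p(z_i = k | z_-i, x)  proportional to
      (CRP prior) x (posterior predictive of cluster k)."""
      for i in range(len(x)):
          z[i] = -1                                 # remove x_i from its cluster
          labels = sorted(set(z.tolist()) - {-1})
          log_p, choices = [], []
          for k in labels:
              members = x[z == k]
              m = len(members)
              # posterior of mu_k given its members: N(mu_hat, v)
              v = 1.0 / (1.0 / tau2 + m / sigma2)
              mu_hat = v * members.sum() / sigma2
              log_p.append(np.log(m) + log_normal(x[i], mu_hat, v + sigma2))
              choices.append(k)
          # open a new cluster: prior predictive N(0, tau2 + sigma2)
          log_p.append(np.log(alpha) + log_normal(x[i], 0.0, tau2 + sigma2))
          choices.append(max(labels, default=-1) + 1)
          p = np.exp(np.array(log_p) - max(log_p))  # normalize in log space
          z[i] = choices[rng.choice(len(p), p=p / p.sum())]
      return z

  rng = np.random.default_rng(0)
  x = np.concatenate([rng.normal(-5, 1, 30), rng.normal(5, 1, 30)])
  z = np.zeros(len(x), dtype=int)
  for _ in range(20):
      z = gibbs_sweep(x, z, alpha=1.0, rng=rng)
  print("occupied clusters:", len(set(z.tolist())))
  ```

  The haplotype sampler on the next slide follows the same pattern, with the Gaussian predictive replaced by the mutation model P(h | a, θ) summed over ancestral states.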

  13. MCMC for Haplotype Inference
  1. Sample c_{ie(j)} from its conditional distribution [formula shown as an image in the original slide]
  2. Sample a_k from its conditional distribution [formula shown as an image in the original slide]
  3. Sample h_{ie(j)} from its conditional distribution [formula shown as an image in the original slide]
  - For the DP scale parameter α: a vague inverse-Gamma prior

  Convergence of Ancestral Inference
  [Figure: convergence plots.]
