Hierarchical Dirichlet Processes: Sharing Clusters Among Related Groups

Dongruo Zhou, Difan Zou, Yaodong Yu
University of Virginia
12/15/2017
Outline

1 Model Introduction
    General Problem Setting
    Dirichlet Process
    Hierarchical Dirichlet Process
2 Inference
    Posterior Sampling
3 Experiments
    Document Modeling
    Multiple Corpora
4 Questions
Mixture Model

We are interested in problems where the observations are organized into groups and are assumed exchangeable both within each group and across groups. Let $j$ index the groups and $i$ index the observations within each group. Then

$$\theta_{ji} \mid G_j \sim G_j \quad \text{for each } j, i,$$
$$x_{ji} \mid \theta_{ji} \sim F(\theta_{ji}) \quad \text{for each } j, i,$$

where $\theta_{ji}$ is the factor variable, $F(\theta_{ji})$ is the distribution of $x_{ji}$ given $\theta_{ji}$, and $G_j$ is the prior distribution for the factors $\theta_{ji}$.
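The two sampling statements above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: each $G_j$ is taken to be a finite discrete distribution over three shared atoms, and $F(\theta)$ is taken to be a normal distribution centered at $\theta$. The function name `sample_group` and all numeric values are hypothetical.

```python
import random

def sample_group(weights, atoms, n_obs, noise_sd, rng):
    """Draw n_obs observations for one group j.

    Assumptions (illustration only): G_j is a finite discrete distribution
    given by `atoms` and `weights`; F(theta) is Normal(theta, noise_sd).
    """
    # theta_ji | G_j ~ G_j
    thetas = rng.choices(atoms, weights=weights, k=n_obs)
    # x_ji | theta_ji ~ F(theta_ji)
    xs = [rng.gauss(theta, noise_sd) for theta in thetas]
    return thetas, xs

rng = random.Random(0)
atoms = [-5.0, 0.0, 5.0]                       # factors shared by all groups
group_weights = [[0.7, 0.2, 0.1],              # G_1
                 [0.1, 0.2, 0.7]]              # G_2
data = [sample_group(w, atoms, 4, 1.0, rng) for w in group_weights]
```

Both groups reuse the same atoms but weight them differently, which is the kind of sharing the hierarchical model is meant to capture.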
Definition

A Dirichlet process $\mathrm{DP}(\alpha_0, G_0)$ is defined to be the distribution of a random probability measure $G$ over a measurable space $(\Theta, \mathcal{B})$. We say $G_j \sim \mathrm{DP}(\alpha_0, G_0)$ if, for any finite measurable partition $(A_1, \dots, A_r)$ of $\Theta$,

$$(G_j(A_1), \dots, G_j(A_r)) \sim \mathrm{Dir}(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_r)),$$

where $y \sim \mathrm{Dir}(\beta_i, 1 \le i \le r)$ iff

$$p\Big(y_i = x_i,\ 1 \le i \le r,\ \sum_{i=1}^{r} x_i = 1\Big) \propto \prod_{i=1}^{r} x_i^{\beta_i - 1}.$$

In short, a Dirichlet process is a distribution over distributions.
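The partition property can be made concrete with a small sketch. It assumes, purely for illustration, that $\Theta$ is the real line, that $G_0$ is the standard normal distribution, and that the partition consists of intervals defined by a few cut points. The Dirichlet draw is obtained by normalizing independent Gamma draws, which is a standard construction. The helper names are hypothetical.

```python
import math
import random

def normal_cdf(x):
    """CDF of the standard normal (the assumed base measure G_0)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_partition_masses(alpha0, cut_points, rng):
    """Draw (G(A_1), ..., G(A_r)) for G ~ DP(alpha0, G_0).

    The sets A_i are the intervals of the real line between consecutive
    cut points. Only the masses of these sets are sampled, not G itself.
    """
    edges = [-math.inf] + sorted(cut_points) + [math.inf]
    # G_0(A_i) for each interval
    base = [normal_cdf(b) - normal_cdf(a) for a, b in zip(edges, edges[1:])]
    # Dir(alpha0 * G_0(A_1), ..., alpha0 * G_0(A_r)) via normalized Gammas
    gammas = [rng.gammavariate(alpha0 * g, 1.0) for g in base]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
masses = sample_partition_masses(alpha0=2.0, cut_points=[-1.0, 0.0, 1.0], rng=rng)
```

Each call returns one random vector of interval masses; repeated calls give different vectors whose average approaches the $G_0$ masses of the intervals.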
Direct view from Chinese restaurant process

It is hard to describe $G_j$ directly from the formal definition. Can we describe $\theta_i \sim G_j$ directly from $\alpha_0$ and $G_0$, without using $G_j$?

Chinese restaurant process: suppose $\theta_1, \theta_2, \dots$ are conditionally independent given $G_j$. Then

$$\theta_i \mid \theta_1, \dots, \theta_{i-1}, \alpha_0, G_0 \sim \sum_{l=1}^{i-1} \frac{1}{i-1+\alpha_0}\, \delta_{\theta_l} + \frac{\alpha_0}{i-1+\alpha_0}\, G_0.$$

With probability $\frac{i-1}{i-1+\alpha_0}$, $\theta_i$ takes one of the existing values $\theta_1, \dots, \theta_{i-1}$; with probability $\frac{\alpha_0}{i-1+\alpha_0}$, $\theta_i$ is a new draw from $G_0$.
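The predictive rule above translates directly into a sequential sampler. The sketch below assumes a standard normal $G_0$ for the example call; the function name `crp_sample` and the parameter `base_draw` are hypothetical.

```python
import random

def crp_sample(n, alpha0, base_draw, rng):
    """Draw theta_1, ..., theta_n from the Chinese restaurant process rule."""
    thetas = []
    for i in range(1, n + 1):
        if rng.random() < alpha0 / (i - 1 + alpha0):
            # New value from G_0, probability alpha0 / (i - 1 + alpha0).
            thetas.append(base_draw(rng))
        else:
            # Copy a uniformly chosen earlier theta_l, so each one has
            # overall probability 1 / (i - 1 + alpha0).
            thetas.append(rng.choice(thetas))
    return thetas

rng = random.Random(1)
thetas = crp_sample(50, 1.0, lambda r: r.gauss(0.0, 1.0), rng)
n_clusters = len(set(thetas))   # number of distinct values among the draws
```

Because earlier values are copied, the 50 draws collapse onto a much smaller set of distinct values, which is the clustering effect of the Dirichlet process.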
Definition

What if we place another distribution on $G_0$? We let $G_0$ itself be drawn from a Dirichlet process $\mathrm{DP}(\gamma, H)$, and let the $G_j$ be conditionally independent given $G_0$, each with distribution $\mathrm{DP}(\alpha_0, G_0)$, namely

$$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H),$$
$$G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0).$$
Definition

[Figure: graphical models. One panel shows the nodes $\alpha_0$, $G_0$, $G$, $\theta_i$, $x_i$; the other shows the nodes $H$, $\gamma$, $G_0$, $\alpha_0$, $G_j$, $\theta_{ji}$, $x_{ji}$.]
Interpretation of HDP as Chinese restaurant process

[Figure: three restaurants with customers $\theta_{ji}$ seated at tables $\psi_{jt}$, each table serving a dish $\phi_k$. Restaurant 1: $\psi_{11} = \phi_1$, $\psi_{12} = \phi_2$, $\psi_{13} = \phi_1$. Restaurant 2: $\psi_{21} = \phi_3$, $\psi_{22} = \phi_1$, $\psi_{23} = \phi_3$, $\psi_{24} = \phi_1$. Restaurant 3: $\psi_{31} = \phi_1$, $\psi_{32} = \phi_2$.]
Interpretation of HDP as Chinese restaurant process

From the previous definition of the Chinese restaurant process, we have

$$\theta_{ji} \mid \theta_{j1}, \dots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{l=1}^{i-1} \frac{1}{i-1+\alpha_0}\, \delta_{\theta_{jl}} + \frac{\alpha_0}{i-1+\alpha_0}\, G_0,$$

which can also be written as

$$\theta_{ji} \mid \theta_{j1}, \dots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0}\, \delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0}\, G_0,$$

where the $\psi_{jt}$ are the distinct values appearing in $\theta_{j1}, \dots, \theta_{j,i-1}$, $m_{j\cdot}$ is the number of distinct values $\psi_{jt}$, and $n_{jt\cdot}$ is the number of times $\psi_{jt}$ appears in $\theta_{j1}, \dots, \theta_{j,i-1}$.
Interpretation of HDP as Chinese restaurant process

Integrating out $G_0$, we finally have

$$\psi_{jt} \mid \psi_{11}, \dots, \psi_{21}, \dots, \psi_{j1}, \dots, \psi_{j,t-1}, \gamma, H \sim \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma}\, \delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot}+\gamma}\, H,$$

where the $\phi_k$ are the distinct values that appear before $\psi_{jt}$, $K$ is the number of such distinct values, $m_{\cdot k}$ is the number of times $\phi_k$ appears before $\psi_{jt}$, and $m_{\cdot\cdot} = \sum_{k=1}^{K} m_{\cdot k}$.
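The two levels of the metaphor (customers choose tables, tables choose dishes) can be simulated together. The following is a minimal sketch, assuming a standard normal $H$ in the example call; the function name `crf_sample` and the variable names are hypothetical and only mirror the counts $n_{jt\cdot}$ and $m_{\cdot k}$ used above.

```python
import random

def crf_sample(group_sizes, alpha0, gamma, base_draw, rng):
    """Simulate theta_ji for every group with both G_j and G_0 integrated out."""
    dishes = []   # phi_k, shared by all restaurants
    m = []        # m[k] = number of tables (over all restaurants) serving dish k
    groups = []
    for n_j in group_sizes:
        table_dish = []    # dish index k_jt of each table in restaurant j
        table_count = []   # n_jt. = customers at each table in restaurant j
        thetas = []
        for _ in range(n_j):
            # Existing table t with weight n_jt., new table with weight alpha0.
            weights = table_count + [alpha0]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_count):
                # New table: existing dish k with weight m[k], new dish with gamma.
                dish_weights = m + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dishes):
                    dishes.append(base_draw(rng))   # phi_k ~ H
                    m.append(0)
                m[k] += 1
                table_dish.append(k)
                table_count.append(0)
            table_count[t] += 1
            thetas.append(dishes[table_dish[t]])
        groups.append(thetas)
    return groups, dishes, m

rng = random.Random(0)
groups, dishes, m = crf_sample([8, 8, 6], 1.0, 1.0, lambda r: r.gauss(0.0, 1.0), rng)
```

Because the dish list is global, a value first drawn in one restaurant can reappear in another, which is how clusters are shared among groups.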
Posterior Sampling

Observations: $x_{ji} \sim F(\theta_{ji})$.

Factors $\theta_{ji} \sim G_j$:

$$\theta_{ji} \mid \theta_{j1}, \dots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0}\, \delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0}\, G_0.$$

Random variables $\psi_{jt} \sim G_0$:

$$\psi_{jt} \mid \psi_{11}, \dots, \psi_{21}, \dots, \psi_{j1}, \dots, \psi_{j,t-1}, \gamma, H \sim \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma}\, \delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot}+\gamma}\, H.$$
Posterior Sampling in the Chinese Restaurant Franchise

Purpose: sample $\theta_{ji}$ and $\psi_{jt}$ given the observations $\mathbf{x}$.

Simplification: we sample the indices $t_{ji}$ and $k_{jt}$ rather than $\theta_{ji}$ and $\psi_{jt}$.

We first give the conditional density of $x_{ji}$ under component $k$ (with parameter $\phi_k$), given all data items except $x_{ji}$:

$$f_k^{-x_{ji}}(x_{ji}) = \frac{\int f(x_{ji} \mid \phi_k) \prod_{j'i' \neq ji} f(x_{j'i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k}{\int \prod_{j'i' \neq ji} f(x_{j'i'} \mid \phi_k)\, h(\phi_k)\, d\phi_k},$$

where $h(\phi_k)$ denotes the density function of $H$, and the products run over the data items other than $x_{ji}$ that are currently associated with component $k$.
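When $H$ is conjugate to $F$, the ratio of integrals above has a closed form. As a sketch, assume (this choice is not stated on the slide) that each observation is a single word, $F$ is a categorical distribution over a vocabulary of size $V$, and $H$ is a symmetric Dirichlet with parameter $\beta$. Then the conditional density reduces to the familiar smoothed count ratio $(n_{kw} + \beta) / (n_{k\cdot} + V\beta)$, where the counts exclude $x_{ji}$. The function name `predictive_density` is hypothetical.

```python
from collections import Counter

def predictive_density(word, component_counts, vocab_size, beta):
    """f_k^{-x_ji}(word) under a categorical F and a symmetric Dirichlet H.

    component_counts holds the word counts of the data items currently
    assigned to component k, with x_ji itself already removed.
    """
    total = sum(component_counts.values())
    return (component_counts.get(word, 0) + beta) / (total + vocab_size * beta)

counts_k = Counter({"gene": 3, "cell": 1})
p_existing = predictive_density("gene", counts_k, vocab_size=5, beta=0.5)
# A brand-new component has no data, so the density is uniform: 1 / V.
p_new = predictive_density("gene", Counter(), vocab_size=5, beta=0.5)
```

With empty counts the same function gives the density under a new component, $f_{k^{\text{new}}}^{-x_{ji}}(x_{ji})$, which is the prior predictive under $H$.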
Sampling t

If $t_{ji}$ takes on a particular previously used value $t$, we have $p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}) \propto n_{jt\cdot}^{-ji}$. The posterior probability $p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}, \mathbf{x})$ then satisfies

$$p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}, \mathbf{x}) \propto p(x_{ji} \mid t_{ji} = t, \mathbf{t}^{-ji}, \mathbf{k}) \cdot p(t_{ji} = t \mid \mathbf{t}^{-ji}, \mathbf{k}) \propto n_{jt\cdot}^{-ji}\, f_{k_{jt}}^{-x_{ji}}(x_{ji}).$$

If $t_{ji}$ takes on a new value $t^{\text{new}}$, we have $p(t_{ji} = t^{\text{new}} \mid \mathbf{t}^{-ji}, \mathbf{k}) \propto \alpha_0$. Thus

$$p(t_{ji} = t^{\text{new}} \mid \mathbf{t}^{-ji}, \mathbf{k}, \mathbf{x}) \propto \alpha_0\, p(x_{ji} \mid t_{ji} = t^{\text{new}}, \mathbf{t}^{-ji}, \mathbf{k}),$$

$$p(x_{ji} \mid t_{ji} = t^{\text{new}}, \mathbf{t}^{-ji}, \mathbf{k}) = \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma}\, f_k^{-x_{ji}}(x_{ji}) + \frac{\gamma}{m_{\cdot\cdot}+\gamma}\, f_{k^{\text{new}}}^{-x_{ji}}(x_{ji}).$$
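The table update amounts to building one unnormalized weight per existing table plus one for a new table, then normalizing. The sketch below assumes the conditional densities have already been evaluated at $x_{ji}$ and are passed in as plain numbers; the function name `table_posterior` and the argument names are hypothetical.

```python
def table_posterior(n_jt, k_jt, m_k, f_existing, f_new, alpha0, gamma):
    """Normalized probabilities for t_ji: existing tables first, new table last.

    n_jt[t]       : n_jt.^{-ji}, customers at table t excluding x_ji
    k_jt[t]       : dish index served at table t
    m_k[k]        : m_.k, number of tables serving dish k
    f_existing[k] : f_k^{-x_ji}(x_ji) for each existing dish k
    f_new         : f_{k new}^{-x_ji}(x_ji)
    """
    m_total = sum(m_k)
    # Likelihood of x_ji at a new table: mixture over existing and new dishes.
    new_lik = (sum(m * f for m, f in zip(m_k, f_existing)) + gamma * f_new) / (m_total + gamma)
    weights = [n * f_existing[k] for n, k in zip(n_jt, k_jt)]
    weights.append(alpha0 * new_lik)
    z = sum(weights)
    return [w / z for w in weights]

table_probs = table_posterior(
    n_jt=[2, 1], k_jt=[0, 1], m_k=[1, 1],
    f_existing=[0.5, 0.1], f_new=0.2, alpha0=1.0, gamma=1.0,
)
```

In a Gibbs sweep, an index drawn from these probabilities becomes the new value of $t_{ji}$; if the last entry is chosen, a dish must then be sampled for the new table.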
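Sampling k

Following the previous slide, if the sampled value of $t_{ji}$ is $t^{\text{new}}$, the new table needs a dish, and we have

$$p(k_{jt^{\text{new}}} = k \mid \mathbf{t}, \mathbf{k}^{-jt^{\text{new}}}) \propto \begin{cases} m_{\cdot k}\, f_k^{-x_{ji}}(x_{ji}) & \text{if } k \text{ is previously used}, \\ \gamma\, f_{k^{\text{new}}}^{-x_{ji}}(x_{ji}) & \text{if } k = k^{\text{new}}. \end{cases}$$

If $t_{ji} = t$ is an existing table, the dish $k_{jt}$ of that table is sampled using all the data seated there:

$$p(k_{jt} = k \mid \mathbf{t}, \mathbf{k}^{-jt}) \propto \begin{cases} m_{\cdot k}^{-jt}\, f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & \text{if } k \text{ is previously used}, \\ \gamma\, f_{k^{\text{new}}}^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & \text{if } k = k^{\text{new}}, \end{cases}$$

where $\mathbf{x}_{jt} = (x_{ji} : \text{all } i \text{ with } t_{ji} = t)$ and $m_{\cdot k}^{-jt}$ is the count $m_{\cdot k}$ with table $t$ of group $j$ excluded.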
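Both cases have the same shape: one weight per existing dish and one for a new dish. A minimal sketch follows, again assuming the conditional densities are supplied as precomputed numbers (evaluated at $x_{ji}$ for a new table, or at the whole block $\mathbf{x}_{jt}$ for an existing table). The function name `dish_posterior` is hypothetical.

```python
def dish_posterior(m_k, f_existing, f_new, gamma):
    """Normalized probabilities for a table's dish: existing dishes first, new dish last.

    m_k[k]        : number of tables serving dish k (excluding the table
                    being resampled, when it already exists)
    f_existing[k] : conditional density of the table's data under dish k
    f_new         : conditional density of the table's data under a new dish
    """
    weights = [m * f for m, f in zip(m_k, f_existing)]
    weights.append(gamma * f_new)
    z = sum(weights)
    return [w / z for w in weights]

dish_probs = dish_posterior(m_k=[2, 1], f_existing=[0.3, 0.1], f_new=0.2, gamma=1.0)
```

Popular dishes (large $m_{\cdot k}$) that also explain the table's data well receive the most weight, while $\gamma$ controls how readily a new dish is created.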
Document Modeling

Dataset: corpus of nematode biology abstracts [1], 5,838 abstracts in total.

Data processing: remove standard stop words and words appearing fewer than 10 times. This leaves 476,441 words in total and a vocabulary of 5,699 words.

Representation: each document is represented as a "bag of words".

[1] Available at http://elegans.swmed.edu/wli/cgcbib.
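The preprocessing steps listed above could look roughly like the sketch below. It is not the authors' pipeline: whitespace tokenization, lowercasing, the tiny stop-word set, and the toy documents are all assumptions made for illustration, and the function name `build_bags` is hypothetical. The slide's threshold corresponds to `min_count=10`; the example uses 2 so that the toy corpus is not emptied.

```python
from collections import Counter

def build_bags(documents, stop_words, min_count):
    """Return (vocabulary, per-document bags of words).

    Stop words are dropped first; then any word whose total count over the
    corpus is below min_count is dropped as well.
    """
    tokenized = [
        [w for w in doc.lower().split() if w not in stop_words]
        for doc in documents
    ]
    totals = Counter(w for doc in tokenized for w in doc)
    vocab = sorted(w for w, c in totals.items() if c >= min_count)
    keep = set(vocab)
    bags = [Counter(w for w in doc if w in keep) for doc in tokenized]
    return vocab, bags

docs = ["the worm gene", "gene expression in the worm", "a rare mutation"]
vocab, bags = build_bags(docs, stop_words={"the", "in", "a"}, min_count=2)
```

Each bag records only word counts, so word order within a document is discarded, which matches the exchangeability assumption within each group.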