LOCAL and GLOBAL INDEPENDENCE of PARAMETERS in DISCRETE BAYESIAN GRAPHICAL MODELS
Jacek Wesołowski (GUS & Politechnika Warszawska, Warszawa), with H. Massam (York Univ., Toronto)
XLII Konferencja "STATYSTYKA MATEMATYCZNA", Będlewo, Nov. 28 - Dec. 2, 2016
Plan
1. Introduction
2. Markovian structure imposed by DAGs
3. Local and global independence vs. HD law
Discrete model
Let $X = (X_v,\, v \in V)$ be a random vector assuming values in $\mathcal{I} = \times_{v \in V} \mathcal{I}_v$, where $\#(\mathcal{I}_v) < \infty$, $v \in V$. We write $p(i) := P_X(i) = P(X = i)$, $i \in \mathcal{I}$. Let $X_1, \ldots, X_n$ be iid with distribution $P_X$. Let
$$M_i = \sum_{j=1}^{n} I(X_j = i), \quad i \in \mathcal{I}.$$
Then $M = (M_i,\, i \in \mathcal{I})$ has a multinomial distribution, i.e.
$$P(M = m) = \binom{n}{m} \prod_{i \in \mathcal{I}} p(i)^{m_i}, \qquad m = (m_i,\, i \in \mathcal{I}), \quad \sum_{i \in \mathcal{I}} m_i = n.$$
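As a small illustration of the count vector $M$, the following sketch tabulates cell counts from a list of observations. The function name `cell_counts`, the two binary variables and the four sample points are made up for the example; they are not taken from the talk.

```python
from collections import Counter
from itertools import product

def cell_counts(samples, levels):
    """Return M = (M_i, i in I) as a dict mapping each cell i to its count."""
    cells = list(product(*levels))        # I is the product of the sets I_v
    counts = Counter(samples)
    return {i: counts.get(i, 0) for i in cells}

levels = [(0, 1), (0, 1)]                   # two binary variables (illustrative)
samples = [(0, 0), (0, 1), (0, 0), (1, 1)]  # n = 4 made-up observations
M = cell_counts(samples, levels)
```

Cells that never occur are kept with count zero, so that `M` is indexed by all of $\mathcal{I}$ and its entries sum to $n$.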
Dirichlet law as an a priori distribution
The Bayesian approach means that one imposes some distribution on $\pi = (p(i),\, i \in \mathcal{I})$. Since the only restrictions on $\pi$ are $p(i) \ge 0$, $i \in \mathcal{I}$, and $\sum_{i \in \mathcal{I}} p(i) = 1$, we need a probability measure supported on a unit simplex of proper dimension.
A random vector $(Y_1, \ldots, Y_r)$ has a (classical) Dirichlet distribution $D(\alpha_i,\, i = 1, \ldots, r)$ if the density of the distribution of $(Y_1, \ldots, Y_{r-1})$ has the form
$$f(y_1, \ldots, y_{r-1}) = \frac{\Gamma(\alpha)}{\prod_{i=1}^{r} \Gamma(\alpha_i)} \prod_{i=1}^{r} y_i^{\alpha_i - 1}\, I_{T_r}(y),$$
where $\alpha = \sum_{i=1}^{r} \alpha_i$ and $y_r = 1 - y_1 - \ldots - y_{r-1}$.
Dirichlet conjugacy and moments
If $\pi = (p(i),\, i \in \mathcal{I})$ has a Dirichlet distribution $D(\alpha_i,\, i \in \mathcal{I})$, then the a posteriori law is also Dirichlet:
$$\pi \mid M \sim D(\alpha_i + M_i,\, i \in \mathcal{I}).$$
Exercise: Prove conjugacy of the Dirichlet law using only the form of its joint moments
$$E \prod_{i \in \mathcal{I}} p(i)^{r_i} = \frac{\prod_{i \in \mathcal{I}} (\alpha_i)^{r_i}}{(\alpha)^{r}},$$
where $r = \sum_{i \in \mathcal{I}} r_i$ and $(a)^{s} = \frac{\Gamma(a+s)}{\Gamma(a)}$. Note that in this case the moments uniquely determine the distribution.
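The exercise can be sanity-checked numerically before it is proved. The sketch below evaluates the moment formula and compares the Bayes ratio $E\prod p(i)^{r_i+m_i} / E\prod p(i)^{m_i}$ with the moment of $D(\alpha_i + m_i)$. The helper names `rising` and `dirichlet_moment` and all numerical values are illustrative assumptions; this is a check for one choice of parameters, not a proof.

```python
from math import lgamma, exp, isclose

def rising(a, s):
    """(a)^s = Gamma(a + s) / Gamma(a), computed on the log scale."""
    return exp(lgamma(a + s) - lgamma(a))

def dirichlet_moment(alpha, r):
    """E prod_i p(i)^{r_i} for pi ~ D(alpha), via the moment formula."""
    num = 1.0
    for a, ri in zip(alpha, r):
        num *= rising(a, ri)
    return num / rising(sum(alpha), sum(r))

alpha = [1.5, 2.0, 0.5]   # illustrative prior parameters
m = [3, 0, 2]             # illustrative observed counts
r = [1, 2, 0]             # exponents of the moment being checked

# Bayes rule: E(prod p^r | M = m) = E prod p^(r+m) / E prod p^m
lhs = (dirichlet_moment(alpha, [ri + mi for ri, mi in zip(r, m)])
       / dirichlet_moment(alpha, m))
# Moment of the claimed posterior D(alpha + m)
rhs = dirichlet_moment([a + mi for a, mi in zip(alpha, m)], r)
```

Here `lhs` and `rhs` agree up to floating-point error, which is what conjugacy predicts.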
Example
Let $X = (X_1, X_2, X_3)$ assume values in $\mathcal{I} = \{0, 1\}^3$. Obviously,
$$P(X = i) = P(X_1 = i_1)\, P(X_2 = i_2 \mid X_1 = i_1)\, P(X_3 = i_3 \mid X_1 = i_1, X_2 = i_2).$$
This differs from the Markov structure imposed by
$$P(X = i) = P(X_1 = i_1)\, P(X_2 = i_2 \mid X_1 = i_1)\, P(X_3 = i_3 \mid X_2 = i_2),$$
associated with the ordered graph $1 \to 2 \to 3$. The latter structure is equivalent to
$$p(000)\, p(101) = p(100)\, p(001) \qquad (1)$$
and
$$p(010)\, p(111) = p(110)\, p(011). \qquad (2)$$
Example, cont.
Conditions (1) and (2) are equivalent to each of the Markov structures imposed by the two other ordered graphs (with skeleton $1 - 2 - 3$):
$1 \leftarrow 2 \leftarrow 3$, i.e. $P(X = i) = P(X_1 = i_1 \mid X_2 = i_2)\, P(X_2 = i_2 \mid X_3 = i_3)\, P(X_3 = i_3)$;
$1 \leftarrow 2 \to 3$, i.e. $P(X = i) = P(X_1 = i_1 \mid X_2 = i_2)\, P(X_2 = i_2)\, P(X_3 = i_3 \mid X_2 = i_2)$.
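Conditions (1) and (2) can be checked numerically for a distribution built from the chain $1 \to 2 \to 3$. The conditional tables below are arbitrary illustrative numbers, not values from the talk.

```python
from itertools import product
from math import isclose

p1 = {0: 0.3, 1: 0.7}                                    # P(X1 = i1)
p2_given_1 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}  # P(X2 = i2 | X1 = i1)
p3_given_2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # P(X3 = i3 | X2 = i2)

# Joint law of the chain 1 -> 2 -> 3, keyed by (i1, i2, i3)
p = {(a, b, c): p1[a] * p2_given_1[a][b] * p3_given_2[b][c]
     for a, b, c in product((0, 1), repeat=3)}

# Condition (1): p(000) p(101) = p(100) p(001)
cond1 = isclose(p[0, 0, 0] * p[1, 0, 1], p[1, 0, 0] * p[0, 0, 1])
# Condition (2): p(010) p(111) = p(110) p(011)
cond2 = isclose(p[0, 1, 0] * p[1, 1, 1], p[1, 1, 0] * p[0, 1, 1])
```

Both flags come out true, reflecting that $X_1$ and $X_3$ are conditionally independent given $X_2 = 0$ and given $X_2 = 1$.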
Example, cont.: prior on $\pi$
We seek a convenient prior on $\pi$, which is a probability measure on the (5-dimensional) manifold in $[0, \infty)^8$ described by the equations
$$x_1 + \ldots + x_8 = 1, \qquad x_1 x_2 = x_3 x_4, \qquad x_5 x_6 = x_7 x_8.$$
Some Dirichlet-like distribution would be fine!
Example, cont. - one more ordered graph
The graph $1 \to 2 \leftarrow 3$ introduces a different Markov structure:
$$P(X = i) = P(X_1 = i_1)\, P(X_2 = i_2 \mid X_1 = i_1, X_3 = i_3)\, P(X_3 = i_3).$$
Equivalently,
$$(p(000) + p(010))\,(p(101) + p(111)) = (p(100) + p(110))\,(p(001) + p(011)).$$
So we seek a probability measure on a (6-dimensional) manifold in $[0, \infty)^8$ defined by
$$x_1 + \ldots + x_8 = 1, \qquad (x_1 + x_5)(x_2 + x_6) = (x_3 + x_7)(x_4 + x_8).$$
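The difference between the two structures can be seen on a concrete distribution. The sketch below builds a joint law from the graph $1 \to 2 \leftarrow 3$ with arbitrary illustrative tables, then tests the marginal condition above and the chain condition (1).

```python
from itertools import product
from math import isclose

p1 = {0: 0.3, 1: 0.7}              # P(X1 = i1)
p3 = {0: 0.4, 1: 0.6}              # P(X3 = i3)
# P(X2 = 1 | X1 = a, X3 = c), keyed by (a, c); illustrative values
q1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.2}

# Joint law of the graph 1 -> 2 <- 3, keyed by (i1, i2, i3)
p = {(a, b, c): p1[a] * p3[c] * (q1[a, c] if b == 1 else 1 - q1[a, c])
     for a, b, c in product((0, 1), repeat=3)}

def p13(a, c):
    """Marginal P(X1 = a, X3 = c), summing out X2."""
    return p[a, 0, c] + p[a, 1, c]

# Condition for 1 -> 2 <- 3: X1 and X3 are marginally independent
collider_ok = isclose(p13(0, 0) * p13(1, 1), p13(1, 0) * p13(0, 1))
# Chain condition (1), which need not hold here
chain_ok = isclose(p[0, 0, 0] * p[1, 0, 1], p[1, 0, 0] * p[0, 0, 1])
```

For these numbers `collider_ok` is true while `chain_ok` is false, so this $\pi$ lies on the 6-dimensional manifold but not on the 5-dimensional one.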
DAG
For a graph $G = (V, E)$ define a DAG (directed acyclic graph) with skeleton $G$ by changing all unordered edges in $E$ into arrows in an acyclic way. A DAG can be identified with a parent function $p : V \to 2^V$ defined by
$$p(v) = \{w \in V : w \to v\}, \quad v \in V,$$
and having the "acyclicity" property
$$\forall\, k \ge 1 \quad \{v\} \cap p^k(v) = \emptyset.$$
We will also use another function, $q : V \to 2^V$, defined by
$$q(v) = \{v\} \cup p(v), \quad v \in V.$$
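A parent function is easy to represent as a dictionary from vertices to sets. The sketch below is one possible encoding (the names `is_acyclic` and `q` and the two example graphs are illustrative); it reads $p^k(v)$ as the set of vertices reached from $v$ by $k$ steps against the arrows, which is an interpretation of the notation on the slide.

```python
def is_acyclic(parents):
    """Check that v is never among its own ancestors, i.e. v not in p^k(v), k >= 1."""
    for v in parents:
        frontier = set(parents[v])      # p^1(v)
        seen = set()
        while frontier:
            if v in frontier:
                return False            # v is its own ancestor: a directed cycle
            seen |= frontier
            # parents of the current level, skipping vertices already visited
            frontier = {w for u in frontier for w in parents[u]} - seen
    return True

def q(parents, v):
    """q(v) = {v} united with p(v)."""
    return {v} | set(parents[v])

chain = {1: set(), 2: {1}, 3: {2}}      # 1 -> 2 -> 3
cycle = {1: {3}, 2: {1}, 3: {2}}        # 1 -> 2 -> 3 -> 1, not a DAG
```

With these inputs `is_acyclic(chain)` holds, `is_acyclic(cycle)` fails, and `q(chain, 3)` is the set `{2, 3}`.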
$p$-Markov model
Let $p$ be a DAG with a chordal skeleton $G = (V, E)$. $X$ (or $\pi = (p(i),\, i \in \mathcal{I})$) is called $p$-Markov iff
$$p(i) = P(X = i) = \prod_{v \in V} p^{v|p(v)}_{i_v | i_{p(v)}} \quad \forall\, i \in \mathcal{I},$$
where
$$p^{v|p(v)}_{i_v | i_{p(v)}} := P(X_v = i_v \mid X_{p(v)} = i_{p(v)}).$$
Note that
$$p^{v|p(v)}_{m|n} = \frac{p^{q(v)}((n, m))}{p^{p(v)}(n)}, \quad m \in \mathcal{I}_v,\ n \in \mathcal{I}_{p(v)},$$
where $p^{A}(n) = \sum_{j \in \mathcal{I}_{V \setminus A}} p((j, n)) = P(X_A = n)$, $n \in \mathcal{I}_A$, $A \subset V$.
Moral DAGs
A DAG $p$ with chordal skeleton $G = (V, E)$ is moral if for every $v \in V$ the subgraph induced in $G$ by $p(v) \subset V$ is complete.
$\pi$ is $p'$-Markov for a moral DAG $p'$ with a chordal skeleton $G$ iff $\pi$ is $p$-Markov with respect to any moral DAG $p$ with the same skeleton $G$.
The family of DAGs with skeleton $1 - 2 - 3$ splits into:
moral DAGs $1 \to 2 \to 3$, $1 \leftarrow 2 \leftarrow 3$, $1 \leftarrow 2 \to 3$;
an immoral DAG $1 \to 2 \leftarrow 3$.
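The definition of morality translates directly into a check that every two parents of a vertex are joined by an edge of the skeleton. The sketch below applies it to the four DAGs with skeleton $1 - 2 - 3$; the dictionary encoding of parent functions and the names `skeleton` and `is_moral` are assumptions made for the example.

```python
from itertools import combinations

def skeleton(parents):
    """Undirected edges of the DAG, as a set of two-element frozensets."""
    return {frozenset((w, v)) for v, pa in parents.items() for w in pa}

def is_moral(parents):
    """True if, for every v, the parent set p(v) induces a complete subgraph."""
    edges = skeleton(parents)
    return all(frozenset(pair) in edges
               for pa in parents.values()
               for pair in combinations(sorted(pa), 2))

dags = {
    "1->2->3": {1: set(), 2: {1}, 3: {2}},
    "1<-2<-3": {1: {2}, 2: {3}, 3: set()},
    "1<-2->3": {1: {2}, 2: set(), 3: {2}},
    "1->2<-3": {1: set(), 2: {1, 3}, 3: set()},
}
moral = {name: is_moral(pa) for name, pa in dags.items()}
```

Only the last graph fails the check, because its vertex 2 has the two non-adjacent parents 1 and 3.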
Cliques and separators
Let $G = (V, E)$ be a chordal graph. Any induced maximal complete subgraph is called a clique. Denote by $\mathcal{C}$ the set of cliques of $G$.
A perfect ordering of cliques is a numbering $C_1, \ldots, C_K$ of the elements of $\mathcal{C}$ such that
$$\forall\, j = 2, \ldots, K \quad \exists\, i < j : \quad S_j := C_j \cap \bigcup_{l=1}^{j-1} C_l \subset C_i.$$
$\mathcal{S} = \{(S_1 = \emptyset),\ S_j,\ j = 2, \ldots, K\}$ is called the set of separators.
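The perfect-ordering condition can be tested mechanically. The sketch below computes the separators $S_j$ for a given numbering and checks that each is contained in some earlier clique; the function names and the path graph $1 - 2 - 3 - 4$ used as input are illustrative choices.

```python
def separators(cliques):
    """S_1 is empty; S_j is C_j intersected with the union of C_1, ..., C_{j-1}."""
    seps = [set()]
    for j in range(1, len(cliques)):
        earlier = set().union(*cliques[:j])
        seps.append(set(cliques[j]) & earlier)
    return seps

def is_perfect(cliques):
    """True if every S_j (j >= 2) is contained in some earlier clique C_i."""
    seps = separators(cliques)
    return all(any(seps[j] <= set(cliques[i]) for i in range(j))
               for j in range(1, len(cliques)))

# Cliques of the path 1 - 2 - 3 - 4 in two different numberings
good = [{1, 2}, {2, 3}, {3, 4}]
bad = [{1, 2}, {3, 4}, {2, 3}]
```

The numbering `good` is perfect with separators $\emptyset, \{2\}, \{3\}$, whereas in `bad` the last separator $\{2, 3\}$ is not contained in any earlier clique.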
$G$-Markov model
For a chordal $G = (V, E)$ we say that $X$ (or $\pi$) is $G$-Markov if
$$p(i) = \frac{\prod_{C \in \mathcal{C}} p^{C}(i_C)}{\prod_{S \in \mathcal{S}} p^{S}(i_S)}, \quad i \in \mathcal{I},$$
where $p^{A}(i_A) = P(X_A = i_A)$ and $X_A = (X_v,\, v \in A)$ for $A \subset V$.
Equivalently, $X$ (or $\pi$) is $p$-Markov for (any) moral DAG $p$ with skeleton $G$, i.e.
$$p(i) = \prod_{v \in V} p^{v|p(v)}_{i_v | i_{p(v)}}, \quad i \in \mathcal{I}.$$
Equivalently,
$$X_w \perp X_v \mid X_{V \setminus \{w, v\}} \quad \text{whenever } \{w, v\} \notin E.$$
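Dawid & Lauritzen, Ann. Statist. (1993)
Assume that $\pi$ is $G$-Markov, where $G = (V, E)$ is a chordal graph. We say that $\pi$ has a hyper-Dirichlet distribution, $HD(\alpha^{C}_{m},\, m \in \mathcal{I}_C,\, C \in \mathcal{C})$, iff its moments are
$$E \prod_{i \in \mathcal{I}} p(i)^{r_i} = \frac{\prod_{C \in \mathcal{C}} \prod_{m \in \mathcal{I}_C} (\alpha^{C}_{m})^{r^{C}_{m}}}{\prod_{S \in \mathcal{S}} \prod_{n \in \mathcal{I}_S} (\alpha^{S}_{n})^{r^{S}_{n}}},$$
where for $\mathcal{S} \ni S \subset C \in \mathcal{C}$
$$\alpha^{S}_{n} = \sum_{m \in \mathcal{I}_{C \setminus S}} \alpha^{C}_{(m, n)}, \quad n \in \mathcal{I}_S,$$
and
$$r^{A}_{m} = \sum_{n \in \mathcal{I}_{V \setminus A}} r_{(m, n)}, \quad m \in \mathcal{I}_A.$$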
HD distribution
Equivalently, for any moral DAG $p$ (with skeleton $G$), in the decomposition
$$p(i) = \prod_{v \in V} p^{v|p(v)}_{i_v | i_{p(v)}}, \quad i \in \mathcal{I},$$
the vectors of conditional probabilities
$$(p^{v|p(v)}_{i_v | i_{p(v)}},\, i_v \in \mathcal{I}_v), \quad i_{p(v)} \in \mathcal{I}_{p(v)},\ v \in V,$$
are independent and have classical Dirichlet distributions $D(\alpha^{v|p(v)}_{i_v | i_{p(v)}},\, i_v \in \mathcal{I}_v)$.
Then for all $C \in \mathcal{C}$ and all $i_C \in \mathcal{I}_C$
$$\alpha^{C}_{i_C} = \alpha^{v|p(v)}_{i_v | i_{p(v)}} \quad \text{whenever } C = \{v\} \cup p(v) = q(v).$$
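The agreement between the moment definition and the DAG description can be checked numerically for the skeleton $1 - 2 - 3$ with binary variables. The sketch below is only an illustration under stated assumptions: the hyperparameters `a12`, `a23` and the exponents `r` are made up (chosen consistent on the separator $\{2\}$), the helper names are hypothetical, and the empty separator $S_1 = \emptyset$ is read as contributing $(\alpha)^{r}$ with $\alpha$ and $r$ the total sums.

```python
from itertools import product
from math import lgamma, exp, isclose, prod

def rising(a, s):
    """(a)^s = Gamma(a + s) / Gamma(a)."""
    return exp(lgamma(a + s) - lgamma(a))

def dirichlet_moment(alpha, r):
    """Joint moment of a classical Dirichlet D(alpha) with exponents r."""
    return prod(rising(a, s) for a, s in zip(alpha, r)) / rising(sum(alpha), sum(r))

def marg(table, keep):
    """Sum a dict keyed by tuples over all coordinates not listed in keep."""
    out = {}
    for key, val in table.items():
        sub = tuple(key[k] for k in keep)
        out[sub] = out.get(sub, 0) + val
    return out

# Illustrative hyperparameters on cliques {1,2} and {2,3}, consistent on {2}
a12 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 1.5}   # keys (i1, i2)
a23 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 3.0, (1, 1): 0.5}   # keys (i2, i3)
r = {i: 0 for i in product((0, 1), repeat=3)}                # keys (i1, i2, i3)
r.update({(0, 0, 0): 2, (1, 0, 1): 1, (0, 1, 1): 3})

r12, r23, r2, r1 = marg(r, (0, 1)), marg(r, (1, 2)), marg(r, (1,)), marg(r, (0,))
a2, a1 = marg(a12, (1,)), marg(a12, (0,))

# Moment formula: cliques in the numerator, separators {2} and the empty set below
hd = (prod(rising(a12[k], r12[k]) for k in a12)
      * prod(rising(a23[k], r23[k]) for k in a23)
      / (prod(rising(a2[k], r2[k]) for k in a2)
         * rising(sum(a12.values()), sum(r.values()))))

# DAG 1 -> 2 -> 3: independent Dirichlet vectors p^1, p^{2|1}, p^{3|2}
dag = dirichlet_moment([a1[(x,)] for x in (0, 1)], [r1[(x,)] for x in (0, 1)])
for x in (0, 1):
    dag *= dirichlet_moment([a12[x, y] for y in (0, 1)], [r12[x, y] for y in (0, 1)])
for y in (0, 1):
    dag *= dirichlet_moment([a23[y, z] for z in (0, 1)], [r23[y, z] for z in (0, 1)])
```

The two numbers `hd` and `dag` coincide up to rounding, since the normalizing factors of the conditional Dirichlet vectors cancel against the marginal terms.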
Multinomial mixture
Let $X_1, \ldots, X_m$ be observations on $X$ and
$$M = \Big(M_i = \sum_{k=1}^{m} I(X_k = i),\ i \in \mathcal{I}\Big).$$
The conditional law of $M$ given $\pi$ is a multinomial distribution with parameters $m$ and $\pi = (p(i),\, i \in \mathcal{I})$.
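HD as a conjugate prior law
Theorem. If the a priori law of $\pi$ is $HD(\alpha^{C}_{m},\, m \in \mathcal{I}_C,\, C \in \mathcal{C})$, then the posterior law of $\pi \mid M$ is also hyper-Dirichlet,
$$HD(\alpha^{C}_{m} + M^{C}_{m},\, m \in \mathcal{I}_C,\, C \in \mathcal{C}),$$
where
$$M^{C}_{m} = \sum_{n \in \mathcal{I}_{V \setminus C}} M_{(m, n)}, \quad m \in \mathcal{I}_C.$$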
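Proof
The generalized Bayes rule reads
$$E\Big(\prod_{i \in \mathcal{I}} p(i)^{r_i} \,\Big|\, M = m\Big) = \frac{E\Big[\prod_{i \in \mathcal{I}} p(i)^{r_i}\, \binom{|m|}{m} \prod_{i \in \mathcal{I}} p(i)^{m_i}\Big]}{E\Big[\binom{|m|}{m} \prod_{i \in \mathcal{I}} p(i)^{m_i}\Big]},$$
where $|m| = \sum_{i \in \mathcal{I}} m_i$ is the sample size; the multinomial coefficient cancels. Apply the moment formula for the HD distribution in the numerator and the denominator:
$$E\Big(\prod_{i \in \mathcal{I}} p(i)^{r_i} \,\Big|\, M = m\Big) = \prod_{C \in \mathcal{C}} \prod_{j \in \mathcal{I}_C} \frac{(\alpha^{C}_{j})^{r^{C}_{j} + m^{C}_{j}}}{(\alpha^{C}_{j})^{m^{C}_{j}}}\ \prod_{S \in \mathcal{S}} \prod_{n \in \mathcal{I}_S} \frac{(\alpha^{S}_{n})^{m^{S}_{n}}}{(\alpha^{S}_{n})^{r^{S}_{n} + m^{S}_{n}}},$$
where $m^{A}_{n} = \sum_{j \in \mathcal{I}_{V \setminus A}} m_{(n, j)}$, $n \in \mathcal{I}_A$, $A \subset V$.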
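Proof, cont.
Since
$$\frac{(a)^{b+c}}{(a)^{b}} = (a + b)^{c},$$
the last formula gives
$$E\Big(\prod_{i \in \mathcal{I}} p(i)^{r_i} \,\Big|\, M = m\Big) = \frac{\prod_{C \in \mathcal{C}} \prod_{j \in \mathcal{I}_C} (\alpha^{C}_{j} + m^{C}_{j})^{r^{C}_{j}}}{\prod_{S \in \mathcal{S}} \prod_{n \in \mathcal{I}_S} (\alpha^{S}_{n} + m^{S}_{n})^{r^{S}_{n}}}. \qquad \Box$$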
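The identity used in this step follows in one line from the definition $(a)^{s} = \Gamma(a+s)/\Gamma(a)$ given with the Dirichlet moments; the short derivation is spelled out here for completeness.

```latex
\frac{(a)^{b+c}}{(a)^{b}}
  = \frac{\Gamma(a+b+c)}{\Gamma(a)} \cdot \frac{\Gamma(a)}{\Gamma(a+b)}
  = \frac{\Gamma(a+b+c)}{\Gamma(a+b)}
  = (a+b)^{c}.
```

Applied with $a = \alpha^{C}_{j}$, $b = m^{C}_{j}$, $c = r^{C}_{j}$ for the cliques, and analogously for the separators, it turns each ratio into a single factor with the updated parameter.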
$p$-Dirichlet and $\mathcal{P}$-Dirichlet distributions
Let $p$ be a moral DAG with a chordal skeleton $G = (V, E)$. A $G$-Markov random vector $\pi$ has a $p$-Dirichlet law if the random vectors
$$(p^{v|p(v)}_{m|n},\, m \in \mathcal{I}_v), \quad n \in \mathcal{I}_{p(v)},\ v \in V,$$
have (classical) Dirichlet laws and are independent.
Let $\mathcal{P}$ be a family of moral DAGs with a chordal skeleton $G = (V, E)$. We say that a $G$-Markov $\pi$ has a $\mathcal{P}$-Dirichlet distribution if it has a $p$-Dirichlet law for every $p \in \mathcal{P}$.
HD as a special $\mathcal{P}$-Dirichlet law
Let $\mathcal{P}$ be the family of all moral DAGs with the chordal skeleton $G$. If a $G$-Markov $\pi$ has a $\mathcal{P}$-Dirichlet distribution, then $\pi$ has an HD distribution.
Question: Can we have a similar description of the HD law through a smaller family $\mathcal{P}$?
$p$-perfect ordering of cliques
Let $p$ be a moral DAG with a (chordal) skeleton $G = (V, E)$. A perfect ordering of cliques $o = (C_1, \ldots, C_K)$ is called $p$-perfect (notation: $o_p$) if
$$\forall\, \ell = 1, \ldots, K \quad \exists\, v \in C_\ell \setminus S_\ell : \quad S_\ell = p(v).$$
Lemma. For any moral DAG $p$ there exists a $p$-perfect ordering of cliques.
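The defining condition can be checked directly for a given numbering of cliques and a given parent function. The sketch below assumes that the numbering passed in is already a perfect ordering and only tests the additional requirement $S_\ell = p(v)$; the name `is_p_perfect` and the example DAGs on the skeleton $1 - 2 - 3$ are illustrative.

```python
def is_p_perfect(cliques, parents):
    """For every l, look for v in C_l minus S_l whose parent set equals S_l."""
    seen = set()                        # union of the cliques visited so far
    for C in cliques:
        S = set(C) & seen               # separator S_l (empty for the first clique)
        if not any(set(parents[v]) == S for v in set(C) - S):
            return False
        seen |= set(C)
    return True

forward = {1: set(), 2: {1}, 3: {2}}    # 1 -> 2 -> 3
backward = {1: {2}, 2: {3}, 3: set()}   # 1 <- 2 <- 3

order_a = [{1, 2}, {2, 3}]
order_b = [{2, 3}, {1, 2}]
```

For the DAG $1 \to 2 \to 3$ the numbering `order_a` is $p$-perfect and `order_b` is not, while for $1 \leftarrow 2 \leftarrow 3$ the roles are reversed; in both cases a $p$-perfect ordering exists, as the Lemma asserts.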