Population Markov Chain Monte Carlo and Genetic Networks
Fujun Ye
MSc in Artificial Intelligence
Supervised by Dirk Husmeier
Outline
• Introduction
• MCMCMC
• MCMCMC for missing values
• Result evaluation (complete data)
• Result evaluation (missing values)
• Summary
Introduction
• Genetic networks
• Clustering and differential equations
• Bayesian networks
• MCMC
Genetic Network
[Figure: a small genetic network with genes A, B and F; arrows marked + denote activation and arrows marked − denote repression.]
Clustering
Differential Equations
• Advantage: provide a detailed, mechanistic understanding of the biological system
• Shortcomings: require more data than are usually available, and the data are noisy
Inferring Bayesian Networks from Expression Data
• A Bayesian network factorises the joint distribution over the nodes:
P(X_1, X_2, ..., X_n) = \prod_{i=1}^{n} P(X_i | Pa_G(X_i))
• Example (nodes A, B, C, D, E):
P(a, b, c, d, e) = P(e | d) P(d | b, c) P(c | a) P(b | a) P(a)
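The factorisation above can be sketched directly in code. A minimal illustration; the two-node network and all CPT values below are invented for the example:

```python
from itertools import product

def joint_prob(assignment, parents, cpd):
    # P(x_1..x_n) = prod_i P(x_i | Pa(x_i)): multiply each node's CPT
    # entry for its own value given its parents' values.
    p = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[q] for q in pa)
        p *= cpd[node][key][assignment[node]]
    return p

# Invented two-node network A -> B with binary states.
parents = {"A": (), "B": ("A",)}
cpd = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
}

# Sanity check: the joint sums to 1 over all assignments.
total = sum(joint_prob({"A": a, "B": b}, parents, cpd)
            for a, b in product((0, 1), repeat=2))
```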
Problems
• The number of different network structures grows super-exponentially with the number of nodes
[Figure: two sketches of the posterior P(M | D) over structures M.] Where the data set is large, the optimal structure M' is well defined; where the data set is small, there are many networks which can explain the data fairly well.
MCMC
• MCMC samples networks from their posterior distribution
P(M_k | D) = P(D | M_k) P(M_k) / \sum_i P(D | M_i) P(M_i)
• The posterior probability of a feature f is then calculated from the samples:
P(f | D) = \sum_i P(M_i | D) P(f | M_i) / \sum_i P(M_i | D)
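With structures M_1, ..., M_T drawn from P(M | D), the sums above reduce to the fraction of sampled networks containing the feature. A minimal sketch; the sampled edge sets below are invented:

```python
def feature_posterior(samples, feature):
    # Monte Carlo estimate of P(f | D): the fraction of sampled
    # network structures (here, edge sets) that contain feature f.
    return sum(feature in m for m in samples) / len(samples)

# Invented example: four sampled edge sets; the edge A -> B
# appears in three of the four samples.
samples = [{("A", "B"), ("B", "C")}, {("A", "B")}, {("A", "C")}, {("A", "B")}]
p_edge = feature_posterior(samples, ("A", "B"))
```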
• Coincidence dependence
[Figure: three example networks over nodes 1–6, showing how different structures can induce coincidentally similar dependencies in the data.]
• Escape from local optima using traditional MCMC
[Figure: a multimodal posterior P(M | D) over structures M, in which a traditional MCMC chain gets trapped in a local optimum.]
• Small step size versus big step size
[Figure: the same posterior P(M | D) over M, comparing small and big proposal step sizes.]
Problems
• Huge search space and coincidence dependence: prescreening is important!
• Local optima: the traversal operator is important!
• Fixed step size: a varied step size is more reasonable
MCMCMC
• Metropolis-coupled Markov chain Monte Carlo (MCMCMC)
• Pre-processing method
• Traversal operators
• Algorithm
• MCMCMC for missing values
MCMCMC
[Figure: three coupled chains sampling the same posterior at different temperatures, 1 = T_1 < T_2 < T_3; the hotter chains see a flattened distribution.]
• For each chain, a local move from M to M' is accepted with probability
A(M, M') = min(1, [ P(D | M') P(M') / ( P(D | M) P(M) ) ]^{1/T} · Q(M | M') / Q(M' | M))
• Chain swap: exchange the networks of chains i and k,
S_a = (M_1, ..., M_i, ..., M_k, ..., M_m) at temperatures (T_1, ..., T_i, ..., T_k, ..., T_m)
S_b = (M_1, ..., M_k, ..., M_i, ..., M_m) at the same temperatures
• Swap acceptance probability
P_a = min{ 1, ( [P(D | M_k) P(M_k)]^{1/T_i} [P(D | M_i) P(M_i)]^{1/T_k} ) / ( [P(D | M_i) P(M_i)]^{1/T_i} [P(D | M_k) P(M_k)]^{1/T_k} ) }
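The swap acceptance probability collapses neatly in log space; a sketch (the function name and interface are assumptions, not thesis code):

```python
import math

def swap_accept_prob(log_post_i, log_post_k, T_i, T_k):
    # Swap acceptance for chains i and k: the fraction above collapses
    # in log space to min(1, exp((1/T_i - 1/T_k) * (L_k - L_i))),
    # where L_* = log[P(D|M_*) P(M_*)] is each chain's log posterior.
    log_ratio = (1.0 / T_i - 1.0 / T_k) * (log_post_k - log_post_i)
    return math.exp(min(0.0, log_ratio))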
Pre-processing Method
• The marginal likelihood integrates out the parameters:
P(D | M) = \int P(D | M, θ) P(θ | M) dθ
• Penalise complex models. With Dirichlet hyperparameters α_{n v π_n}, where α_{n π_n} = \sum_v α_{n v π_n} (and counts N_{n π_n} = \sum_v N_{n v π_n} analogously), the marginal likelihood is
p(D | M) = \prod_n \prod_{π_n} [ Γ(α_{n π_n}) / Γ(α_{n π_n} + N_{n π_n}) ] \prod_v [ Γ(α_{n v π_n} + N_{n v π_n}) / Γ(α_{n v π_n}) ]
The log likelihood is
log p(D | M) = \sum_n score(n, π_n, D)
where
score(n, π_n, D) = \sum_{π_n} [ log Γ(α_{n π_n}) − log Γ(α_{n π_n} + N_{n π_n}) ] + \sum_{π_n} \sum_v [ log Γ(α_{n v π_n} + N_{n v π_n}) − log Γ(α_{n v π_n}) ]
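The local score can be evaluated with log-gamma functions; a minimal sketch assuming a uniform Dirichlet pseudo-count alpha per cell (the thesis may use other BDe hyperparameters):

```python
from math import lgamma

def local_score(counts, alpha):
    # score(n, pi_n, D) for one node: counts[j][v] = N_{n v pi} for
    # parent configuration j and node value v; alpha is a uniform
    # Dirichlet pseudo-count per cell (an assumption here).
    s = 0.0
    for row in counts:
        a_j = alpha * len(row)          # alpha_{n pi} = sum_v alpha_{n v pi}
        s += lgamma(a_j) - lgamma(a_j + sum(row))
        for n_v in row:
            s += lgamma(alpha + n_v) - lgamma(alpha)
    return s
```

As a check, one observation of a binary node with alpha = 1 gives log(1/2), the marginal likelihood of a single symmetric coin flip.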
• Impose a maximum fan-in
• Enumerate all possible parent configurations for each node and delete the low-scoring ones
• Keep at most C parent configurations for each node and cardinality; the threshold is set as
θ = λ (scoresh − scoresl) / m + scoresl / m
[Figure: the distribution of score(n, π, D) across parent configurations π. When the data are quite sparse and noisy, the best configuration π' barely stands out; after the pre-screening method, only the high-scoring configurations remain.]
Traversal Operators
• Importance sampling: sample a parent configuration for node i from
P(π_{n_i} = π_j) = ( score(i, π_j, D) + C ) / \sum_{k=1}^{n} ( score(i, π_k, D) + C )
This gives the proposal ratio
Q(M_old | M_new) / Q(M_new | M_old) = [ ( score(i, π_{k_old}, D) + C ) / ( \sum_{k ≠ k_old} score(i, π_k, D) + (n − 1) C ) ] / [ ( score(i, π_{k_new}, D) + C ) / ( \sum_{k ≠ k_new} score(i, π_k, D) + (n − 1) C ) ]
and the likelihood ratio
P(D | M_new) / P(D | M_old) = exp( score(i, π_{k_new}, D) − score(i, π_{k_old}, D) )
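The importance-sampling proposal is a weighted draw over parent configurations; a minimal sketch (it assumes C is large enough that every smoothed score is positive, and the toy scores below are invented):

```python
import random

_rng = random.Random(1)

def sample_parent_config(scores, C, rng=_rng):
    # Draw configuration index j with probability
    # (score_j + C) / sum_k (score_k + C), as in the proposal above.
    # Assumes score_j + C > 0 for every j.
    weights = [s + C for s in scores]
    r = rng.random() * sum(weights)
    for j, w in enumerate(weights):
        r -= w
        if r <= 0.0:
            return j
    return len(scores) - 1
```

With scores [1.0, 3.0] and C = 0 the second configuration is proposed three times as often as the first.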
• DIN sampling: applied when the proposed network is loopy.
[Figure: a three-node example. The proposed model introduces a cycle among nodes 1, 2 and 3; DIN sampling repairs it in two steps (Step 1, Step 2) to give a legal network.]
The acceptance probability for a DIN move is
A(M_new, M_old) = min(1, [ P(D | M_new) P(M_new) Q(M_old | M_new) ] / [ P(D | M_old) P(M_old) Q(M_new | M_old) ])
with likelihood ratio
P(D | M_new) / P(D | M_old) = exp( \sum_{j=1}^{n} score(n_j, π_new(n_j), D) − \sum_{j=1}^{n} score(n_j, π_old(n_j), D) )
and proposal ratio
Q(M_old | M_new) / Q(M_new | M_old) = [ ( score(i, π_{k_old}, D) + C ) / ( \sum_{k ≠ k_old} score(i, π_k, D) + (n − 1) C ) ] / [ ( score(i, π_{k_new}, D) + C ) / ( \sum_{k ≠ k_new} score(i, π_k, D) + (n − 1) C ) ]
I simply use an approximation for the proposal ratio, since the exact proposal probability is quite time-consuming to calculate.
[Figure: comparison of the DIN proposal with traditional MCMC.]
Algorithm
• Initialisation
• Each iteration:
  • Move a step for every chain
  • Chain swap
• Keep the first (T = 1) chain only
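The steps above can be sketched generically. The interfaces (log_post, propose, init) and the symmetric proposal are assumptions for illustration, not the thesis code:

```python
import math
import random

def mc3(log_post, propose, init, temps, iters, seed=0):
    # Metropolis-coupled MCMC sketch:
    #   log_post(state) -> log[P(D|M) P(M)]
    #   propose(state, rng) -> candidate state (assumed symmetric)
    #   init() -> starting state
    rng = random.Random(seed)
    chains = [init() for _ in temps]
    samples = []
    for _ in range(iters):
        # 1) One Metropolis move per chain, tempered by 1/T.
        for c, T in enumerate(temps):
            cand = propose(chains[c], rng)
            log_a = (log_post(cand) - log_post(chains[c])) / T
            if math.log(rng.random()) < log_a:
                chains[c] = cand
        # 2) Attempt one swap between two randomly chosen chains.
        i, k = rng.sample(range(len(temps)), 2)
        log_a = (1.0 / temps[i] - 1.0 / temps[k]) * (
            log_post(chains[k]) - log_post(chains[i]))
        if math.log(rng.random()) < log_a:
            chains[i], chains[k] = chains[k], chains[i]
        # 3) Keep only the first (T = 1) chain's state.
        samples.append(chains[0])
    return samples
```

On a toy two-state target the cold chain quickly settles on the high-posterior state even when started in the wrong one.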
[Figure: flow chart of the algorithm. Each of the m chains M_1, ..., M_m (temperatures T >= 1, with T = 1 for the first chain) proposes a new network M_i' by importance sampling; illegal (loopy) proposals are repaired by DIN sampling. A move is accepted with probability A(M_i', M_i), and a chain swap S -> S' with probability P_a(S', S).]
MCMCMC for Missing Values
[Figure: an example data matrix over variables X1–X5 with missing entries marked "?", together with the corresponding completed data set in which the missing values have been imputed.]
[Figure: flow chart of the missing-values algorithm. Each chain i now carries a pair (M_i, D_i). As before, structure moves proposed by importance sampling (with DIN sampling for illegal, loopy proposals) are accepted with probability A(M_i', M_i | D_i), and chain swaps with probability P_a(S', S); in addition, the missing part of the data D_mi is resampled and accepted with probability A(D_i', D_i | G_i) given the observed data.]
• Proposal method before burn-in (add-one-smoothed counts):
Q(v | M, n) = (N_{n v} + 1) / \sum_v (N_{n v} + 1)
Q(v | M, n, m) = (N_{n v π_n m} + 1) / \sum_v (N_{n v π_n m} + 1)
Q(v_n, v_mis | M, n, m) = (N_{n v π_{n,mis} m} + 1) / \sum_{v_n, v_mis} (N_{n v π_{n,mis} m} + 1)
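The first of these proposals is just a draw from the add-one-smoothed frequencies of the values observed for that node; a minimal sketch (the parent-conditioned variants refine the counts in the same way):

```python
import random

_rng = random.Random(2)

def propose_missing_value(counts, rng=_rng):
    # Fill one missing entry of node n by drawing value v with
    # probability (N_nv + 1) / sum_v' (N_nv' + 1), where counts[v]
    # is the number of times value v was observed for this node.
    weights = [c + 1 for c in counts]
    r = rng.random() * sum(weights)
    for v, w in enumerate(weights):
        r -= w
        if r <= 0.0:
            return v
    return len(counts) - 1
```

With counts [9, 0] the first value is proposed with probability 10/11, so the second value remains reachable despite never having been observed.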
[Figure: an example data matrix over X1–X5 with missing entries marked "?", and one possible completion proposed for it.]
Acceptance probability
Accept(MissVal', MissVal) = min(1, [ Q(MissVal | MissVal') P(D' | M) ] / [ Q(MissVal' | MissVal) P(D | M) ])
• After burn-in, the proposals for the missing values are
Q(MissVal' | MissVal) = \prod_i (N_{i,new} + 1) / \sum_{j ∈ Ω(c_mis,i)} (N_{ij} + 1)
Q(MissVal | MissVal') = \prod_i (N'_{i,old} + 1) / \sum_{j ∈ Ω(c_mis,i)} (N'_{ij} + 1)
Acceptance probability
Accept(MissVal', MissVal) = min(1, [ Q(MissVal | MissVal') P(D' | M) ] / [ Q(MissVal' | MissVal) P(D | M) ])
Result Evaluation (complete data)
• ROC curve
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
complementary specificity = fp / (tn + fp) = 1 − specificity
where tp is the number of true positive edges, fn the number of false negative edges, fp the number of false positive edges, and tn the number of true negative edges.
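These quantities can be computed directly from edge sets; a minimal sketch (the three-node example below is invented):

```python
def edge_confusion(true_edges, predicted_edges, all_pairs):
    # ROC quantities for structure recovery: compare a predicted edge
    # set against the true network over all candidate directed pairs.
    tp = len(true_edges & predicted_edges)              # true positives
    fn = len(true_edges - predicted_edges)              # false negatives
    fp = len(predicted_edges - true_edges)              # false positives
    tn = len(all_pairs - true_edges - predicted_edges)  # true negatives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, 1.0 - specificity

# Invented example: three nodes, six candidate directed edges.
nodes = "ABC"
all_pairs = {(a, b) for a in nodes for b in nodes if a != b}
true_edges = {("A", "B"), ("B", "C")}
predicted = {("A", "B"), ("A", "C")}
result = edge_confusion(true_edges, predicted, all_pairs)
```

Varying the threshold on posterior edge probabilities and plotting sensitivity against complementary specificity traces out the ROC curve.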
• Model Genetic Network
• MCMCMC against order MCMC
• MCMCMC against structure MCMC
• MCMCMC against population MCMC
• Temperatures = [1, 1, 3, 9, 30]
• Keep at most 10 parent configurations for each node and cardinality
• 60000 iterations: 30000 burn-in, keeping the last 30000 samples
• Alarm Network
• Arabidopsis data
Result Evaluation (missing values)
• Model Genetic Network
• Before burn-in: 30000 burn-in, 30000 iterations
• After burn-in: 40000 iterations
• Temperatures = [1, 1, 3, 9, 12]
The ROC curves for noise = 0.2, data = 200, with different missing rates
• Temperatures = [1, 1, 3, 9, 12]
• 30000 burn-in and 30000 iterations
• Every 10 steps, keep one sample (before-burn-in algorithm)
• B cell Lymphoma data