  1. Population Markov Chain Monte Carlo and Genetic Networks Fujun Ye MSc in Artificial Intelligence Supervised by Dirk Husmeier

  2. Outline
     • Introduction
     • MCMCMC
     • MCMCMC for missing values
     • Result Evaluation (complete data)
     • Result Evaluation (missing values)
     • Summary

  3. Introduction
     • Genetic Network
     • Clustering and Differential Equations
     • Bayesian Network
     • MCMC

  4. Genetic Network
     [diagram: example genetic network with genes A, B, F, their products a, b, f, and activating (+) / repressing (−) interactions]

  5. Clustering

  6. Differential Equations
     • Advantage: provide a detailed understanding of the biological system
     • Shortcomings: shortage of data; noisy data

  7. Inferring a Bayesian Network From Expression Data
     • Bayesian Network [diagram: DAG over nodes A, B, C, D, E]

       P(X_1, X_2, ..., X_n) = ∏_{i=1}^{n} P(X_i | Pa_G(X_i))

       P(a, b, c, d, e) = P(e|d) P(d|b, c) P(c|a) P(b|a) P(a)
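As a minimal sketch of this factorisation, the five-node example P(a,b,c,d,e) = P(e|d) P(d|b,c) P(c|a) P(b|a) P(a) can be evaluated directly from its conditional probability tables. All numbers in the tables below are made-up illustrative values, not from the talk:

```python
# Factorisation of the five-node example network; the CPT entries
# below are illustrative assumptions, chosen only so each row sums to 1.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_a = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}
p_d_given_bc = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.4, 1: 0.6},
                (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}}
p_e_given_d = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

def joint(a, b, c, d, e):
    """P(a,b,c,d,e) = P(e|d) P(d|b,c) P(c|a) P(b|a) P(a)."""
    return (p_e_given_d[d][e] * p_d_given_bc[(b, c)][d]
            * p_c_given_a[a][c] * p_b_given_a[a][b] * p_a[a])

# Sanity check: the joint must sum to one over all 2^5 configurations.
total = sum(joint(a, b, c, d, e)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)
            for d in (0, 1) for e in (0, 1))
```

Because the DAG encodes the conditional independencies, the joint over five binary variables needs only 1 + 2 + 2 + 4 + 2 local parameters instead of 2^5 − 1.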

  8. Problems
     • The number of different network structures grows super-exponentially with the number of nodes

  9. [plots: posterior P(M|D) over model space M]
     Where the data set is large, the optimal structure M′ is well defined. Where the data set is small, there are many networks which can explain the data fairly well.

  10. MCMC
     • MCMC samples networks from their posterior distribution

       P(M_k | D) = P(D|M_k) P(M_k) / Σ_i P(D|M_i) P(M_i)

     • Calculate the posterior probability of a feature

       P(f | D) = Σ_i P(M_i|D) P(f|M_i) / Σ_i P(M_i|D)
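For a set of networks sampled by MCMC, the feature posterior above reduces to the fraction of samples containing the feature. A minimal sketch, where a "network" is a set of directed edges, the feature f is "edge A→B is present" (so P(f|M_i) is an indicator), and the four sampled structures are hypothetical:

```python
# Estimate P(f|D) as the average of P(f|M_i) over MCMC samples.
# For an edge-presence feature, P(f|M_i) is 1 if the edge is in M_i.
def feature_posterior(sampled_networks, feature_edge):
    hits = sum(1 for edges in sampled_networks if feature_edge in edges)
    return hits / len(sampled_networks)

# Hypothetical chain output: four sampled network structures.
samples = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("B", "C")},
    {("A", "B"), ("A", "C")},
]
prob = feature_posterior(samples, ("A", "B"))  # 3 of 4 samples contain A->B
```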

  11. • Coincidental dependence
     [diagram: networks over nodes 1–6 illustrating dependencies that arise by coincidence]

  12. • Escaping from local optima using traditional MCMC
     [plot: multimodal posterior P(M|D) over model space M]

  13. • Small step size versus big step size
     [plot: posterior P(M|D) over model space M]

  14. Problems
     • Huge search space and coincidental dependence: pre-screening is important!
     • Local optima: the traversal operator is important!
     • Fixed step size: a varied step size is more reasonable

  15. MCMCMC
     • Metropolis-Coupled Markov Chain Monte Carlo (MCMCMC)
     • Pre-processing method
     • Traversal operators
     • Algorithm
     • MCMCMC for missing values

  16. MCMCMC
     Run several chains in parallel at temperatures 1 = T_1 < T_2 < T_3 < …

  17. • For each chain, move a step based on

       A(M, M′) = min( 1, [ P(D|M′) P(M′) Q(M|M′) / ( P(D|M) P(M) Q(M′|M) ) ]^(1/T) )

     • Chain swap: from the state

       S_a = ( M_1 @ T_1, ..., M_i @ T_i, ..., M_k @ T_k, ..., M_M @ T_M )

       propose S_b, in which M_i and M_k exchange temperatures.

       Acceptance probability:

       P_a = min( 1, [P(D|M_k) P(M_k)]^(1/T_i) [P(D|M_i) P(M_i)]^(1/T_k)
                     / ( [P(D|M_i) P(M_i)]^(1/T_i) [P(D|M_k) P(M_k)]^(1/T_k) ) )
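In log space the swap acceptance probability collapses to a single exponent: P_a = min(1, exp((1/T_i − 1/T_k)(L_k − L_i))), where L_i = log(P(D|M_i)P(M_i)). A sketch with illustrative temperatures and log-posterior values (not from the talk):

```python
import math

def swap_accept_prob(log_post, temps, i, k):
    """min(1, exp((1/T_i - 1/T_k) * (log_post_k - log_post_i)))."""
    delta = (1.0 / temps[i] - 1.0 / temps[k]) * (log_post[k] - log_post[i])
    return min(1.0, math.exp(delta))

# Hypothetical three-chain state: unnormalised log posteriors per chain.
temps = [1.0, 3.0, 9.0]
log_post = [-100.0, -90.0, -80.0]

# A hotter chain holding a better model is always swapped down:
p = swap_accept_prob(log_post, temps, 0, 1)
```

Because only the difference of log posteriors enters, the intractable normalising constant of P(M|D) cancels, which is what makes the swap move cheap.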

  18. Pre-processing method

       P(D|M) = ∫ P(D|M, θ) P(θ|M) dθ

     • Penalize complex models:

       α_{n v_n π_n} = α / ( |v_n| · |π_n| ),   α_{n π_n} = Σ_{v_n} α_{n v_n π_n}

       p(D|M) = ∏_n ∏_{π_n} [ Γ(α_{n π_n}) / Γ(α_{n π_n} + N_{n π_n}) ]
                ∏_{v_n} [ Γ(α_{n v_n π_n} + N_{n v_n π_n}) / Γ(α_{n v_n π_n}) ]

  19. The log likelihood is

       log p(D|M) = Σ_n score(n, π_n, D)

     where

       score(n, π_n, D) = Σ_{π_n} [ log Γ(α_{n π_n}) − log Γ(α_{n π_n} + N_{n π_n}) ]
                        + Σ_{π_n} Σ_{v_n} [ log Γ(α_{n v_n π_n} + N_{n v_n π_n}) − log Γ(α_{n v_n π_n}) ]
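This family score is computed stably with log-gamma functions. A sketch, assuming the uniform pseudocount split α/(|v_n|·|π_n|) from the previous slide; the count table and α value are illustrative:

```python
from math import lgamma

def family_score(counts, alpha=1.0):
    """Log marginal-likelihood contribution of one node's family.

    counts[j][k] = N_{n v_n pi_n}: how often the child takes value k
    under the j-th parent configuration.  alpha is the total pseudocount,
    split uniformly over the (parent config, child value) cells.
    """
    q = len(counts)            # number of parent configurations
    r = len(counts[0])         # number of child values
    a_jk = alpha / (q * r)     # alpha_{n v_n pi_n}
    a_j = alpha / q            # alpha_{n pi_n} = sum over v_n of a_jk
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score

# Hypothetical example: binary child, one binary parent (2 configurations).
s = family_score([[8, 2], [1, 9]], alpha=1.0)
```

A near-deterministic dependence scores higher than a uniform one, which is exactly the property the pre-screening step exploits.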

  20. • Use some maximum fan-in
     • Find all possible parent configurations for each node and delete the low-score ones
     • Keep C parent configurations for each node and cardinality. The threshold is set as

       θ = λ · (score_high − score_low) / m + score_low
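The pruning above can be sketched as follows. The candidate parents, the `toy_score` function, and the fan-in/keep values are all hypothetical; only the overall shape (enumerate configurations up to a max fan-in, keep the top C per cardinality) follows the slide:

```python
from itertools import combinations

def prescreen(candidate_parents, score_fn, max_fan_in=2, keep=3):
    """Keep the top-`keep` parent configurations per cardinality."""
    kept = {}
    for card in range(max_fan_in + 1):
        configs = list(combinations(candidate_parents, card))
        configs.sort(key=score_fn, reverse=True)   # best score first
        kept[card] = configs[:keep]
    return kept

# Hypothetical scoring function: configurations containing "A" score well,
# and larger configurations are penalised.
def toy_score(config):
    return (2.0 if "A" in config else 0.0) - 0.5 * len(config)

kept = prescreen(["A", "B", "C", "D"], toy_score, max_fan_in=2, keep=2)
```

The MCMC sampler then proposes moves only within these surviving configurations, which shrinks the search space dramatically.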

  21. [plots: score(n, π, D) over parent configurations π, π′; left: when the data are quite sparse and noisy; right: after using the pre-screening method]

  22. Traversal operators
     • Importance sampling: sample a parent configuration for a node

       P(node_i = π_j) = ( score(i, π_j, D) + C ) / ( Σ_{k=1, k≠k_old}^{n} score(i, π_k, D) + (n−1) C )

       Q(M_old | M_new) / Q(M_new | M_old)
         = [ ( score(i, π_{k_old}, D) + C ) / ( Σ_{k≠k_new} score(i, π_k, D) + (n−1) C ) ]
         / [ ( score(i, π_{k_new}, D) + C ) / ( Σ_{k≠k_old} score(i, π_k, D) + (n−1) C ) ]

       P(D | M_new) / P(D | M_old) = exp( score(i, π_{k_new}, D) − score(i, π_{k_old}, D) )
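A sketch of this proposal step: draw a replacement parent configuration for a node, excluding the current one, with probability increasing in its score. Note this sketch uses softmax (exponential) weights on the log scores, consistent with the likelihood ratio exp(score_new − score_old) on the slide, rather than the slide's linear shifted-score weights; the score values are illustrative:

```python
import math
import random

def sample_parent_config(log_scores, current, rng=random):
    """Draw an index != current, with probability proportional to
    exp(log_score), using a max-shift for numerical stability."""
    candidates = [j for j in range(len(log_scores)) if j != current]
    m = max(log_scores[j] for j in candidates)
    weights = [math.exp(log_scores[j] - m) for j in candidates]
    total = sum(weights)
    u = rng.random() * total
    acc = 0.0
    for j, w in zip(candidates, weights):
        acc += w
        if u <= acc:
            return j
    return candidates[-1]

rng = random.Random(0)
log_scores = [-50.0, -10.0, -60.0, -12.0]   # hypothetical family scores
draws = [sample_parent_config(log_scores, current=0, rng=rng)
         for _ in range(1000)]
```

High-scoring configurations dominate the draws, so the chain rarely wastes proposals on families the data clearly rule out.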

  23. • DIN sampling: if the new network is loopy
     [diagrams over nodes 1, 2, 3: the old model, the (loopy) new model, and the two repair steps]

  24. A(M_new, M_old) = min( 1, P(D|M_new) P(M_new) Q(M_old|M_new) / ( P(D|M_old) P(M_old) Q(M_new|M_old) ) )

       P(D|M_new) / P(D|M_old)
         = exp( Σ_{j=1}^{n} score(n_j, π_n(n_j), D) + score(n_i, π_n(n_i), D) )
         / exp( Σ_{j=1}^{n} score(n_j, π_o(n_j), D) + score(n_i, π_o(n_i), D) )

       Q(M_old | M_new) / Q(M_new | M_old)
         = [ ( score(i, π_{k_old}, D) + C ) / ( Σ_{k≠k_new} score(i, π_k, D) + (n−1) C ) ]
         / [ ( score(i, π_{k_new}, D) + C ) / ( Σ_{k≠k_old} score(i, π_k, D) + (n−1) C ) ]

     I simply use an approximation, since it is quite time-consuming to calculate the exact proposal probability.

  25. [plots: DIN proposal versus traditional MCMC]

  26. Algorithm
     • Initialization
     • Each iteration:
       • Move a step for every chain
       • Chain swap
     • Keep the samples of the first chain

  27. [flow chart: each of the m chains (M_1 at T = 1, ..., M_m at T ≥ 1) proposes a move by importance sampling and accepts it with A(M_i′, M_i); illegal (loopy) proposals M_i′ are repaired by DIN sampling; the chains then attempt a swap S → S′ with probability P_a(S′, S)]

  28. MCMCMC for missing values
     [example: data tables over variables X1–X5 and instances I1, I2, ..., with '?' marking missing values]

  29. [flow chart: as on slide 27, but each chain i now carries a pair (M_i, D_i); model moves are accepted with A(M_i′, M_i | D_i), missing values D_mi are resampled from the observed data and accepted with A(D_i′, D_i | G_i) at T = 1, followed by the chain swap with probability P_a(S′, S)]

  30. • Proposal method before burn-in

       Q(v_n | M, n) = ( N_{v_n} + 1 ) / Σ_{v_n} ( N_{v_n} + 1 )

       Q(v_n | M, n, m) = ( N_{v_n π_n^m} + 1 ) / Σ_{v_n} ( N_{v_n π_n^m} + 1 )

       Q(v_n, v_{π_n,mis} | M, n, m) = ( N_{v_n v_{π_n,mis}} + 1 ) / Σ_{v_n, v_{π_n,mis}} ( N_{v_n v_{π_n,mis}} + 1 )
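The first of these proposals is simply Laplace-smoothed counting over the observed entries of a node's column. A minimal sketch, assuming the (N + 1)/Σ(N + 1) form on the slide; the observed column and domain are made-up:

```python
from collections import Counter

def missing_value_proposal(observed_column, domain):
    """Propose a missing value with probability (N_v + 1) / sum(N_v + 1),
    where N_v counts occurrences of value v among the observed entries."""
    counts = Counter(observed_column)
    weights = {v: counts[v] + 1 for v in domain}
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}

# Hypothetical observed entries of one node's column (missing ones omitted):
q = missing_value_proposal([1, 2, 2, 1, 2], domain=[1, 2])
# q[1] = (2 + 1) / 7, q[2] = (3 + 1) / 7
```

The "+1" smoothing guarantees every value in the domain keeps nonzero proposal probability, so the chain can never get stuck on an unobserved value.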

  31. [example: a data table over X1–X5 with missing entries '?' being filled in]

  32. Acceptance probability

       Accept(MissVal′, MissVal) = min( 1, Q(MissVal | MissVal′) P(D′|M) / ( Q(MissVal′ | MissVal) P(D|M) ) )

  33. • After burn-in

       Q(MissVal′) = ∏_i ( N_{MissVal_i′, new} + 1 ) / Σ_{j ∈ Ω(i)} ( N_{c_mis, ij} + 1 )

       Q(MissVal | MissVal′) = ∏_i ( N_{MissVal_i, old} + 1 ) / Σ_{j ∈ Ω(i)} ( N′_{c_mis, ij} + 1 )

     Acceptance probability:

       Accept(MissVal′, MissVal) = min( 1, Q(MissVal | MissVal′) P(D′|M) / ( Q(MissVal′ | MissVal) P(D|M) ) )

  34. Result Evaluation (complete data)
     • ROC curve

       sensitivity = tp / (tp + fn)
       specificity = tn / (tn + fp)
       complementary specificity = fp / (tn + fp) = 1 − specificity

     tp is the number of true positive edges, fn the number of false negative edges, fp the number of false positive edges, and tn the number of true negative edges.
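These edge-recovery metrics are a direct computation from the four counts; one (sensitivity, complementary specificity) pair gives one point on the ROC curve. The counts below are illustrative assumptions:

```python
def roc_point(tp, fn, fp, tn):
    """Return one ROC point (sensitivity, complementary specificity)
    from edge counts: tp/fn/fp/tn as defined on the slide."""
    sensitivity = tp / (tp + fn)
    comp_specificity = fp / (tn + fp)   # = 1 - specificity
    return sensitivity, comp_specificity

# Hypothetical edge counts for one recovered network:
sens, comp_spec = roc_point(tp=8, fn=2, fp=5, tn=85)
```

Sweeping the posterior-probability threshold on the sampled edge features traces out the full curve.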

  35. • Model Genetic Network

  36. • MCMCMC against order MCMC
     • MCMCMC against structure MCMC
     • MCMCMC against Population MCMC

  37. • Temperatures = [1, 1, 3, 9, 30]
     • Keep at most 10 parent configurations for each node and cardinality.
     • 60000 iterations: 30000 burn-in, keeping the last 30000 samples.

  38. • Alarm Network

  39. • Arabidopsis data

  40. Result Evaluation (missing values)
     • Model Genetic Network
     • Before burn-in (30000 burn-in, 30000 iterations)
     • After burn-in (40000 iterations)
     • Temperatures = [1, 1, 3, 9, 12]

  41. The ROC curve for noise = 0.2, data = 200, with different missing rates
     • Temperatures = [1, 1, 3, 9, 12]
     • Use 30000 burn-in and 30000 iterations.
     • Keep one sample every 10 steps. (before-burn-in algorithm)

  42. • B-cell Lymphoma data
