Probabilistic Graphical Models for Cellular Pathways

Florian Markowetz
florian.markowetz@molgen.mpg.de

Max Planck Institute for Molecular Genetics
Computational Diagnostics Group
Berlin, Germany

IPM workshop Tehran, 2005 April

Cellular networks

Figure from http://array.mbb.yale.edu/yeast/transcription/

Florian Markowetz, Probabilistic Graphical Models for Cellular Pathways, 2005 April 1

Modelling networks

High-throughput assays can probe cells at a genome-wide scale. Very prominent: microarrays that measure mRNA transcript quantities.

Need to use probabilistic models, which account for
• measurement noise,
• variability in the biological system, and
• aspects of the system not captured by the model.

Clustering by coexpression

Assumption: Coexpression ∼ coregulation

If genes show the same expression profiles they follow the same regulatory regimes [7, 25].

[Figure: clustered expression heat map; rows labelled by probe-set IDs (e.g. 36773_f_at), columns labelled by sample IDs (e.g. 19017).]

Correlation graphs

An expression profile is a random vector $X = (X_1, \ldots, X_p)$.

Correlation graph: Depict genes as vertices of a graph and draw an edge $(i, j)$ iff the correlation coefficient $\rho_{ij} \neq 0$.

Advantage: This representation of the marginal dependence structure is easy to interpret and can be accurately estimated even if $p \gg N$, i.e. when there are far more genes than samples.

Application: Stuart et al. [28] build a graph from coexpression across multiple organisms.
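
A minimal sketch of building a correlation graph from an expression matrix, assuming rows are samples and columns are genes. A sample correlation is essentially never exactly zero, so the sketch draws an edge when the absolute correlation exceeds a cutoff. The function name `correlation_graph`, the cutoff value and the toy data are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def correlation_graph(expr, cutoff=0.5):
    """Return the edge list (i, j), i < j, with |r_ij| > cutoff.

    expr:   array of shape (N samples, p genes).
    cutoff: hypothetical threshold standing in for the test rho_ij != 0.
    """
    r = np.corrcoef(expr, rowvar=False)      # p x p sample correlation matrix
    p = r.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(r[i, j]) > cutoff]

# Toy data: gene 1 follows gene 0 closely, gene 2 is unrelated noise.
rng = np.random.default_rng(0)
g0 = rng.normal(size=200)
g1 = g0 + 0.1 * rng.normal(size=200)
g2 = rng.normal(size=200)
edges = correlation_graph(np.column_stack([g0, g1, g2]))
print(edges)
```

In practice the cutoff would more likely come from a significance test on each correlation than from a fixed number.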

Problems of correlation-based approaches

We cannot distinguish direct from indirect dependencies!

Three reasons why X, Y, and Z are highly correlated:

[Figure: three example graphs over X, Y and Z, one of which contains an additional node H.]

As a cure: search for correlations which cannot be explained by other variables.
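
The following sketch illustrates the problem and the proposed cure on simulated data. It assumes one hypothetical scenario, a chain in which X influences Y and Y influences Z with no direct link between X and Z. The noise levels and the helper `partial_corr_given` are made up for illustration. X and Z come out strongly correlated, but the correlation largely disappears once Y is regressed out of both.

```python
import numpy as np

def partial_corr_given(x, z, y):
    """Correlation of x and z after linearly regressing y out of both."""
    x, y, z = (v - v.mean() for v in (x, y, z))
    rx = x - (x @ y) / (y @ y) * y           # residual of x given y
    rz = z - (z @ y) / (y @ y) * y           # residual of z given y
    return (rx @ rz) / np.sqrt((rx @ rx) * (rz @ rz))

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)             # X influences Y
z = y + 0.5 * rng.normal(size=n)             # Y influences Z, no direct X-Z link

marginal = np.corrcoef(x, z)[0, 1]
partial = partial_corr_given(x, z, y)
print(f"corr(X, Z) = {marginal:.3f}, corr(X, Z | Y) = {partial:.3f}")
```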

Overview

1. Gaussian graphical models
   - conditional independence
   - partial correlations
2. Bayesian networks
   - d-separation
   - PC algorithm
   - equivalence of networks
3. Bayesian structure learning
   - marginal likelihood
   - search strategies

Part I. Gaussian graphical models

Conditional independence

Let $X, Y, Z$ be random variables with joint distribution $P$. $X$ is conditionally independent of $Y$ given $Z$, written $X \perp\!\!\!\perp Y \mid Z$, iff

$$P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z) \cdot P(Y = y \mid Z = z)$$

or, equivalently (wherever $P(Y = y, Z = z) > 0$),

$$P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z).$$
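
As a small numerical check of this definition, the sketch below builds a joint distribution over three binary variables from made-up tables P(z), P(x | z) and P(y | z), so that X and Y are conditionally independent given Z by construction. All numbers and the helper `is_cond_independent` are hypothetical. It also checks that the same X and Y are not independent marginally.

```python
import numpy as np

# Hypothetical binary variables: P(z), P(x | z), P(y | z).
p_z = np.array([0.3, 0.7])
p_x_given_z = np.array([[0.9, 0.1],   # row z=0: P(x=0 | z), P(x=1 | z)
                        [0.2, 0.8]])  # row z=1
p_y_given_z = np.array([[0.6, 0.4],
                        [0.5, 0.5]])

# Joint P(x, y, z), indexed joint[x, y, z].
joint = np.einsum("z,zx,zy->xyz", p_z, p_x_given_z, p_y_given_z)

def is_cond_independent(joint, tol=1e-12):
    """Check P(x, y | z) == P(x | z) * P(y | z) for every x, y, z."""
    pz = joint.sum(axis=(0, 1))                 # P(z)
    pxy_z = joint / pz                          # P(x, y | z)
    px_z = pxy_z.sum(axis=1, keepdims=True)     # P(x | z), shape (x, 1, z)
    py_z = pxy_z.sum(axis=0, keepdims=True)     # P(y | z), shape (1, y, z)
    return bool(np.allclose(pxy_z, px_z * py_z, atol=tol))

# Marginal check: does P(x, y) factorise into P(x) * P(y)?
p_xy = joint.sum(axis=2)
marginally_independent = bool(
    np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))))

print(is_cond_independent(joint), marginally_independent)
```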

Conditional independence: interpretation

Interpret random variables as abstract pieces of knowledge obtained from, say, reading books [16]. Then $X \perp\!\!\!\perp Y \mid Z$ means

Knowing Z, reading Y is irrelevant for reading X.

If I already know Z, then Y offers me no new information to understand X.

Conditional independence in Gaussian models

• Consider a random vector $X = (X_1, \ldots, X_p)$.
• Assume that $X \sim N(\mu, \Sigma)$, where $\Sigma$ is regular (invertible).
• Let $K = \Sigma^{-1}$ be the concentration matrix of the distribution (aka precision matrix).

Then it holds for $i, j \in \{1, \ldots, p\}$ with $i \neq j$ that

$$X_i \perp\!\!\!\perp X_j \mid X_{\text{rest}} \iff k_{ij} = 0,$$

where $\text{rest} = \{1, \ldots, p\} \setminus \{i, j\}$ [16].
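
A short numerical illustration of this statement, using a hypothetical covariance matrix with entries $\Sigma_{ij} = 0.5^{|i-j|}$ (not taken from the slides). Every pair of variables is marginally correlated, yet the concentration matrix has zeros for all pairs that are not neighbours. The sketch also rescales K into partial correlations using the standard identity $-k_{ij} / \sqrt{k_{ii} k_{jj}}$, which the slides do not state explicitly.

```python
import numpy as np

p, rho = 4, 0.5
idx = np.arange(p)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # Sigma_ij = rho^|i-j|

K = np.linalg.inv(sigma)                              # concentration matrix

# Partial correlation of X_i and X_j given the rest: -k_ij / sqrt(k_ii k_jj)
d = np.sqrt(np.diag(K))
partial = -K / np.outer(d, d)
np.fill_diagonal(partial, 1.0)

print(np.round(sigma, 3))    # all marginal correlations are non-zero
print(np.round(K, 3))        # entries with |i - j| > 1 vanish
print(np.round(partial, 3))
```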

Gaussian graphical models (GGM)

Given a random vector $X = (X_1, \ldots, X_p)$, a Gaussian graphical model [16, 6] is an undirected graph on vertex set $V$, with $|V| = p$.

Each vertex $i \in V$ corresponds to a random variable $X_i$ of $X$. Draw an edge between vertices $i$ and $j$ if and only if $k_{ij} \neq 0$.

Note: In correlation graphs we modeled via $\Sigma$; in GGMs we use $K = \Sigma^{-1}$.

Example of a GGM

[Figure: undirected graph on the vertices 1, 2, 3, 4.]

Missing edges indicate independencies, $X_i \perp\!\!\!\perp X_j \mid X_{\text{rest}}$:

$$X_1 \perp\!\!\!\perp X_4 \mid \{X_2, X_3\}$$
$$X_2 \perp\!\!\!\perp X_3 \mid \{X_1, X_4\}$$
$$X_2 \perp\!\!\!\perp X_4 \mid \{X_1, X_3\}$$
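
To connect this example to the concentration matrix, the sketch below reads the edge set off a 4 x 4 matrix K whose zero pattern matches the three independencies listed above. The numerical entries of K are invented for illustration, and the sketch assumes that the three listed pairs are the only missing edges. The helper `ggm_edges` is likewise hypothetical.

```python
import numpy as np

def ggm_edges(K, tol=1e-10):
    """Edges (i, j) with 1-based vertex labels where k_ij != 0 up to tol."""
    p = K.shape[0]
    return [(i + 1, j + 1) for i in range(p) for j in range(i + 1, p)
            if abs(K[i, j]) > tol]

# Invented concentration matrix with k_14 = k_23 = k_24 = 0.
K = np.array([[ 2.0, -0.5, -0.5,  0.0],
              [-0.5,  2.0,  0.0,  0.0],
              [-0.5,  0.0,  2.0, -0.5],
              [ 0.0,  0.0, -0.5,  2.0]])

# A valid concentration matrix must be positive definite.
assert np.all(np.linalg.eigvalsh(K) > 0)

edges = ggm_edges(K)
print(edges)   # [(1, 2), (1, 3), (3, 4)]
```

The corresponding covariance matrix `np.linalg.inv(K)` has no zero entries, so a correlation graph built from it would connect all four vertices.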

Estimation from data

Likelihood (density of a zero-mean observation $x$):

$$n(x; K) = (2\pi)^{-p/2} \, |K|^{1/2} \exp\left(-\tfrac{1}{2} x^T K x\right)$$

Test the null hypothesis $k_{ij} = 0$ versus the alternative $k_{ij} \neq 0$.
• The null hypothesis constrains the precision matrix $K$,
• the alternative leaves $K$ unconstrained.

The likelihood ratio test statistic is asymptotically $\chi^2$ distributed [16].
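
A minimal sketch of such a test for a single edge, under assumptions that go beyond the slide: there are more samples than variables (n > p), the unconstrained model is fitted by the sample covariance, and the test statistic for removing one edge is computed as $-n \log(1 - r_{ij \cdot \text{rest}}^2)$ from the sample partial correlation, compared against a $\chi^2$ distribution with one degree of freedom. The function name `edge_lrt` and the simulated data are hypothetical.

```python
import math
import numpy as np

def edge_lrt(X, i, j):
    """Likelihood ratio test of k_ij = 0 against an unconstrained K.

    X: (n, p) data matrix with n > p. Returns (deviance, p_value).
    """
    n = X.shape[0]
    S = np.cov(X, rowvar=False, bias=True)        # ML estimate of Sigma
    K = np.linalg.inv(S)                          # ML estimate of K
    r = -K[i, j] / math.sqrt(K[i, i] * K[j, j])   # sample partial correlation
    deviance = -n * math.log(1.0 - r * r)
    p_value = math.erfc(math.sqrt(deviance / 2))  # chi^2 tail, 1 d.o.f.
    return deviance, p_value

# Simulated data in which X_0 and X_2 are independent given X_1.
idx = np.arange(3)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
rng = np.random.default_rng(2)
X = rng.multivariate_normal(np.zeros(3), sigma, size=5000)

print(edge_lrt(X, 0, 1))   # true edge: large deviance, tiny p-value
print(edge_lrt(X, 0, 2))   # absent edge: deviance on the chi^2_1 scale
```

When p is larger than n the sample covariance is singular and this direct inversion is no longer possible.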