Simulation of molecular regulatory networks with graphical models

Inma Tur (1), Robert Castelo (1), Alberto Roverato (2)
inma.tur@upf.edu, robert.castelo@upf.edu, alberto.roverato@unibo.it
(1) Universitat Pompeu Fabra, Barcelona, Spain
(2) Università di Bologna, Bologna, Italy

useR! 2013 - UCLM, Albacete, Spain - July 10-12 2013
Motivation - Genomics data

High-throughput genomics technologies produce high-dimensional ($p \gg n$) multivariate data sets of continuous and discrete random variables.

[Figure panels: gene expression data; genetical genomics data; network of molecular regulatory interactions]
Motivation - Graphical Markov Models (GMM)

Gaussian GMMs: $X_V \sim N_p(\mu, \Sigma)$
[Graph on vertices 1, 2, 3, 4]

> library(qpgraph)
> set.seed(12345)
> gmm <- rUGgmm(dRegularGraphParam())
> round(solve(gmm$sigma), digits=1)
     1    2    3    4
1  9.5 -3.4 -7.2  0.0
2 -3.4  5.9  0.0 -2.3
3 -7.2  0.0  8.2  0.9
4  0.0 -2.3  0.9  2.3

Homogeneous mixed GMMs: $X_V \sim N_p(\mu(i), \Sigma(i))$ with $\Sigma(i) \equiv \Sigma$
[Graph on vertices I1, Y1, Y2, Y3]

> library(qpgraph)
> set.seed(12345)
> gmm <- rHMgmm(dRegularMarkedGraphParam())
> round(solve(gmm$sigma), digits=1)
     Y1   Y2   Y3
Y1 11.0  0.0 -7.2
Y2  0.0  1.2 -1.6
Y3 -7.2 -1.6  8.2
> gmm$mean()
         Y1        Y2        Y3
1 0.4720734 0.9669291 0.7242007
2 1.4720734 1.9669291 1.7934027
Motivation - Graphical Markov Models (GMM)

- Testing GMM estimation procedures on simulated data is a fundamental step to verify properties such as correctness or asymptotic behavior.
- The R/Bioconductor package qpgraph implements algorithms to simulate Gaussian GMMs, homogeneous mixed GMMs, and data from them.
- Simulating Gaussian GMMs requires simulating covariance matrices whose inverse matches a zero-pattern defined by the missing edges in a graph.
- Simulating homogeneous mixed GMMs requires simulating conditional covariance matrices whose inverse matches a zero-pattern defined by a graph, and conditional mean vectors satisfying additive effects.

The rest of this talk is based on the vignette entitled "Simulating molecular regulatory networks using qpgraph" from the qpgraph package.
Outline

1. Simulation of graphs for GMMs
2. Simulation of undirected Gaussian Graphical Markov Models
3. Simulation of Homogeneous Mixed Graphical Markov Models
4. Simulation of eQTL models of experimental crosses
5. Conclusions
Simulation of graphs for GMMs

Consider undirected labeled graphs $G = (V, E)$ where

- $V = \{1, \dots, p\}$ is the vertex set indexing a vector of random variables (r.v.'s) $X_V = \{X_1, \dots, X_p\}$ with joint probability distribution $P$.
- $E \subseteq (V \times V)$ is the edge set, such that $(i, j) \notin E \Rightarrow X_i \perp\!\!\!\perp X_j \mid X_V \setminus \{X_i, X_j\}$ holds in $P$ (pairwise Markov property w.r.t. $G$).

Let $A, B, S \subseteq V$. $S$ separates $A$ from $B$ in $G$ ($A \perp_G B \mid S$) if every path between a vertex in $A$ and a vertex in $B$ intersects $S$. $G$ should be such that $A \perp_G B \mid S \Rightarrow X_A \perp\!\!\!\perp X_B \mid X_S$ holds in $P$ (global Markov property w.r.t. $G$).

In the context of GMMs we want to simulate graphs in which we can control separation and sparseness.
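Separation in $G$ can be checked mechanically with a reachability search that starts in $A$ and never enters $S$. The base-R sketch below is illustrative only: the function name `separates` and the 0/1 adjacency-matrix encoding are assumptions made here and are not part of qpgraph.

```r
# TRUE if S separates A from B in the undirected graph with
# 0/1 adjacency matrix adj (hypothetical helper, not a qpgraph function).
separates <- function(adj, A, B, S) {
  reached <- setdiff(A, S)             # start from A, never entering S
  frontier <- reached
  while (length(frontier) > 0) {
    # neighbours of the current frontier
    nbrs <- which(colSums(adj[frontier, , drop = FALSE]) > 0)
    frontier <- setdiff(nbrs, c(reached, S))
    reached <- c(reached, frontier)
  }
  length(intersect(reached, B)) == 0   # no vertex of B was reached
}

# Four-cycle with edges 1-2, 1-3, 2-4, 3-4
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 1, 2, 3), c(2, 3, 4, 4))] <- 1
adj <- adj + t(adj)

separates(adj, A = 1, B = 4, S = c(2, 3))  # TRUE: both paths are blocked
separates(adj, A = 1, B = 4, S = 2)        # FALSE: path 1-3-4 remains
```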
Simulation of graphs for GMMs

qpgraph simulates undirected graphs according to

1. the type of graph
   - pure: a single type of vertices
   - marked: two subsets of vertices (associated with discrete and continuous random variables)
2. the model used to simulate the random graph
   - Erdős-Rényi: edges occur with equal probability, or graphs are chosen uniformly at random with a given number of vertices and edges.
   - d-regular: vertices have a constant degree $d$ (Harary, 1969). It follows that graph density is a linear function of $d$: $D = d / (p - 1)$.

This is implemented in the function rgraphBAM(n, param, ...), which returns objects of the class graphBAM defined in the graph package.
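The density formula follows from counting edges: a d-regular graph on $p$ vertices has $pd/2$ edges, out of $p(p-1)/2$ possible ones.

```latex
D = \frac{|E|}{\binom{p}{2}} = \frac{pd/2}{p(p-1)/2} = \frac{d}{p-1}
```

For example, $p = 12$ and $d = 3$ give $D = 3/11 \approx 0.27$.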
Simulation of graphs for GMMs

Input parameters in rgraphBAM() are defined by S4 classes:

- graphParam: p, labels
- markedGraphParam: pI, pY, Ilabels, Ylabels
- erGraphParam: m [# edges], prob [Pr(edge)]
- dRegularGraphParam: d [vertex degree], exclude
- erMarkedGraphParam and dRegularMarkedGraphParam: the marked counterparts of the two classes above
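To make the two Erdős-Rényi parameters concrete, here is a minimal base-R sketch of what `prob` and `m` mean as sampling schemes. It does not call qpgraph and is not claimed to match its implementation; the variable names are illustrative.

```r
set.seed(1)
p <- 6
pairs <- t(combn(p, 2))   # rows are the p(p-1)/2 candidate edges (i < j)

# Variant 1: each edge is present independently with probability prob
prob <- 0.3
edges_prob <- pairs[runif(nrow(pairs)) < prob, , drop = FALSE]

# Variant 2: choose uniformly at random among graphs with exactly m edges
m <- 4
edges_m <- pairs[sample(nrow(pairs), m), , drop = FALSE]
```

In the first variant the number of edges is random; in the second it is fixed at `m`.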
Simulation of graphs for GMMs

d-regular graphs are simulated with the Steger and Wormald (1999) algorithm:

> library(qpgraph)
> set.seed(1234)
> g <- rgraphBAM(dRegularMarkedGraphParam(pI=2, pY=10, d=3))
> plot(g, lwd=3)

[Plot of the simulated graph on vertices I1, I2, Y1, ..., Y10]

plot() is overloaded to use the functionality from the Rgraphviz package.
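The idea behind this kind of algorithm can be sketched in base R: repeatedly join two non-adjacent vertices, chosen with probability proportional to the product of their remaining degrees, and restart if the process gets stuck before every vertex reaches degree $d$. This is a simplified illustration written for this text, not qpgraph's code; the name `r_dregular` and the `max_tries` argument are hypothetical.

```r
# Sketch of a Steger-Wormald style sampler for a d-regular graph on p vertices.
r_dregular <- function(p, d, max_tries = 100) {
  stopifnot(d < p, (p * d) %% 2 == 0)           # necessary for existence
  for (attempt in seq_len(max_tries)) {
    adj <- matrix(0L, p, p)
    repeat {
      resid <- d - rowSums(adj)                 # remaining degree per vertex
      w <- outer(resid, resid) * (adj == 0)     # weight of each candidate pair
      w[lower.tri(w, diag = TRUE)] <- 0         # keep i < j only, no self-loops
      if (sum(w) == 0) break                    # no suitable pair left
      k <- sample(length(w), 1, prob = as.vector(w))
      i <- (k - 1) %% p + 1                     # row index of the sampled cell
      j <- (k - 1) %/% p + 1                    # column index of the sampled cell
      adj[i, j] <- adj[j, i] <- 1L
    }
    if (all(rowSums(adj) == d)) return(adj)     # success: every degree equals d
  }
  stop("no d-regular graph found in max_tries attempts")
}

set.seed(1234)
a <- r_dregular(p = 12, d = 3)
table(rowSums(a))   # all 12 vertices have degree 3
```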
Simulation of undirected Gaussian GMMs

Undirected Gaussian GMMs are multivariate models on continuous r.v.'s $X_V = \{X_1, \dots, X_p\}$ determined by an undirected graph $G = (V, E)$ with $V = \{1, \dots, p\}$ and $E \subseteq V \times V$ such that

$X_V \sim N_p(\mu, \Sigma)$ where $\{\Sigma^{-1}\}_{ij} = 0$ for $i \neq j$ and $(i, j) \notin G$.

Therefore, to simulate an undirected Gaussian GMM we need to build a matrix $\Sigma$ such that

1. $\Sigma$ is positive definite ($\Sigma \in S^+$),
2. the off-diagonal cells of the scaled $\Sigma$ corresponding to the present edges in $G$ match a given marginal correlation $\rho$,
3. the zero pattern of $\Sigma^{-1}$ matches the missing edges in $G$.

This is not straightforward, since directly setting off-diagonal cells to zero in some initial $\Gamma \in S^+$ will not typically lead to a positive definite matrix.
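A small numerical example shows why zeroing cells by hand is unsafe. The matrix below is chosen here for illustration: it is positive definite, but it loses that property once a single pair of off-diagonal cells is set to zero.

```r
# Positive definiteness check via the smallest eigenvalue
is_pd <- function(M) min(eigen(M, symmetric = TRUE)$values) > 0

Gamma <- matrix(0.9, 3, 3)
diag(Gamma) <- 1                       # eigenvalues 2.8, 0.1, 0.1

Gamma0 <- Gamma
Gamma0[1, 3] <- Gamma0[3, 1] <- 0      # "remove" the edge 1-3 by hand

is_pd(Gamma)    # TRUE
is_pd(Gamma0)   # FALSE: smallest eigenvalue is 1 - 0.9 * sqrt(2) < 0
```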
Simulation of undirected Gaussian GMMs

Let $\Gamma_G$ be an incomplete matrix with elements $\{\gamma_{ij}\}$ for $i = j$ or $(i, j) \in G$.

[Graph $G$ on vertices 1, 2, 3, 4]

$$\Gamma_G = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} & * \\ \gamma_{21} & \gamma_{22} & * & \gamma_{24} \\ \gamma_{31} & * & \gamma_{33} & \gamma_{34} \\ * & \gamma_{42} & \gamma_{43} & \gamma_{44} \end{pmatrix}$$

$\Gamma$ is a positive completion of $\Gamma_G$ if $\Gamma \in S^+$ and $\{\Gamma^{-1}\}_{ij} = 0$ for $i \neq j$, $(i, j) \notin G$.

Draw $\Gamma_G$ from a Wishart distribution $W_p(\Lambda, p)$, with $\Lambda = \Delta R \Delta$, $\Delta = \mathrm{diag}(\{\sqrt{1/p}\}_p)$ and $R = \{R_{ij}\}_{p \times p}$ where $R_{ij} = 1$ for $i = j$ and $R_{ij} = \rho$ for $i \neq j$.

It is required that $\Lambda \in S^+$, and this happens if and only if $-1/(p-1) < \rho < 1$.

Finally, to obtain $\Sigma \equiv \Gamma$ from $\Gamma_G$, qpgraph uses the regression algorithm by Hastie, Tibshirani and Friedman (2009, pg. 634) as the matrix completion algorithm.

See functions qpRndWishart(), qpHTF() and qpG2Sigma() for further details.
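The condition on $\rho$ follows from the eigenvalues of $R$, which are $1 + (p-1)\rho$ and $1 - \rho$. The base-R sketch below checks this numerically and draws one Wishart matrix with `stats::rWishart`. It is an illustration under assumptions: the second argument of $W_p(\Lambda, p)$ is read as the degrees of freedom, the helper names are made up here, and qpRndWishart() may differ in its details.

```r
p <- 4

# Lambda = Delta R Delta with constant off-diagonal correlation rho
make_Lambda <- function(rho, p) {
  R <- matrix(rho, p, p)
  diag(R) <- 1
  Delta <- diag(sqrt(1 / p), p)
  Delta %*% R %*% Delta
}
is_pd <- function(M) min(eigen(M, symmetric = TRUE)$values) > 0

is_pd(make_Lambda(0.75, p))   # TRUE: rho lies inside (-1/3, 1)
is_pd(make_Lambda(-0.4, p))   # FALSE: rho is below -1/(p-1) = -1/3

# One draw from the Wishart distribution with p degrees of freedom (assumed)
set.seed(123)
Gamma <- rWishart(1, df = p, Sigma = make_Lambda(0.75, p))[, , 1]

# Incomplete matrix: keep the diagonal and the cells of present edges only
adj <- matrix(0, p, p)
adj[cbind(c(1, 1, 2, 3), c(2, 3, 4, 4))] <- 1   # edges 1-2, 1-3, 2-4, 3-4
adj <- adj + t(adj)
Gamma_G <- Gamma
Gamma_G[adj == 0 & row(adj) != col(adj)] <- NA  # cells to be completed
```

The `NA` cells of `Gamma_G` are the ones a matrix completion algorithm has to fill in so that the inverse has zeros in exactly those positions.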
Simulation of undirected Gaussian GMMs

rUGgmm() simulates undirected Gaussian GMMs taking as input either a graphParam or a graphBAM object. It returns an S4 object of class UGgmm.

[Graph on vertices 1, 2, 3, 4]

> set.seed(12345)
> gmm <- rUGgmm(n=1, g=dRegularGraphParam(p=4, d=2), rho=0.75)
> class(gmm)
[1] "UGgmm"
attr(,"package")
[1] "qpgraph"
> names(gmm) ## it behaves pretty much like a 'list' object
[1] "X"     "p"     "g"     "mean"  "sigma"
> round(solve(gmm$sigma), digits=1)
      1    2     3    4
1  20.9 -6.6 -13.6  0.0
2  -6.6 10.6   0.0 -4.3
3 -13.6  0.0  12.7  0.9
4   0.0 -4.3   0.9  4.3
> gmm2 <- rUGgmm(n=1, g=gmm$g) ## fix 'g' simulate covariance only
> round(solve(gmm2$sigma), digits=1)
     1    2    3    4
1  7.2 -1.3 -3.2  0.0
2 -1.3  1.2  0.0 -0.5
3 -3.2  0.0  3.4 -0.5
4  0.0 -0.5 -0.5  1.5
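The printed inverse covariance matrix can be checked against the requested graph without qpgraph. The sketch below retypes the rounded values from the first output above and verifies that its zero pattern is that of a 2-regular graph; it also derives partial correlations from the rounded values, which are therefore approximate.

```r
# Rounded inverse covariance (precision) matrix copied from the output above
K <- matrix(c( 20.9, -6.6, -13.6,  0.0,
               -6.6, 10.6,   0.0, -4.3,
              -13.6,  0.0,  12.7,  0.9,
                0.0, -4.3,   0.9,  4.3),
            nrow = 4, byrow = TRUE,
            dimnames = list(as.character(1:4), as.character(1:4)))

# Edges are the non-zero off-diagonal cells of the precision matrix
adj <- (K != 0) & (row(K) != col(K))
rowSums(adj)          # degree of each vertex; d = 2 was requested

# Partial correlations: minus the scaled off-diagonal precision entries
pcor <- -cov2cor(K)
diag(pcor) <- 1
```

The zero cells (1,4) and (2,3) correspond to the two missing edges, so the associated partial correlations are exactly zero.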