Recent developments in R packages for graphical models (revised 22 February 2013)
Søren Højsgaard, Aalborg University, Denmark (sorenh@math.aau.dk)
Computing for Graphical Models, Royal Statistical Society, London, 16 December 2011
Contents
1 Outline
2 Graphs for graphical models
3 Bayesian networks – the gRain package
  3.1 Specification of conditional probability tables
  3.2 Brute force computations of conditional probabilities
  3.3 Bayesian Network (BN) basics
  3.4 The chest clinic narrative
  3.5 Findings and queries
  3.6 The chest clinic
  3.7 Queries – getting beliefs
  3.8 Setting findings – and getting updated beliefs
  3.9 Probability of a finding
4 Under the hood of gRain – and on the way to gRim
  4.1 Dependence graph
  4.2 Reading conditional independencies – global Markov property
  4.3 Dependence graph for chest clinic example
  4.4 Graphs and cliques
  4.5 Decomposable graphs
  4.6 The key computations with BNs: message passing
  4.7 Triangulation
  4.8 Fundamental operations in gRain
  4.9 Summary of the BN part
5 Contingency tables
  5.1 Log–linear models
  5.2 Graphical models
  5.3 Decomposable models
  5.4 ML estimation in decomposable models
6 Testing for conditional independence
  6.1 What is a CI-test – stratification
  6.2 Monte Carlo tests*
7 Log–linear models using the gRim package
  7.1 Plotting the dependence graph
  7.2 Model specification shortcuts
  7.3 Altering graphical models
  7.4 Model comparison
  7.5 Decomposable models – deleting edges
  7.6 Decomposable models – adding edges
  7.7 Test for adding and deleting edges
  7.8 Model search in log–linear models using gRim
8 From graph and data to network
9 Prediction
10 Other things in gRim
11 Built for speed, comfort or safety?
12 Winding up
13 Book: Graphical Models with R
1 Outline
• Introduce the gRain package (GRAphical Independence Networks) for Bayesian networks
• Conditional independence restrictions, dependence graphs, message passing
• Log–linear, graphical and decomposable models for contingency tables
• Introduce the gRim package (GRaphical Independence Models)
• Convert a decomposable model to a Bayesian network
• The gRbase package and some graph algorithms
2 Graphs for graphical models

Three different representations of graphs:

form    <- ~a:b + b:c
ug.NEL  <- ug(form, result="NEL")     # graphNEL (DEFAULT)
ug.MAT  <- ug(form, result="matrix")  # dense adjacency matrix
ug.SMAT <- ug(form, result="Matrix")  # sparse dgCMatrix
ug.IG   <- ug(form, result="igraph")  # igraph object

• The packages graph and RBGL are based on the graphNEL (node–edge list) representation.
• The igraph package has a similar internal representation.
ug.NEL
A graphNEL graph with undirected edges
Number of Nodes = 3
Number of Edges = 2

nodes(ug.NEL)
[1] "a" "b" "c"

str(edges(ug.NEL))
List of 3
 $ a: chr "b"
 $ b: chr [1:2] "c" "a"
 $ c: chr "b"
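Since ug.NEL is an ordinary graphNEL object, it can be manipulated with the usual functions from the graph package; a small sketch (the edits below are purely illustrative):

library(graph)
ug.NEL2 <- addEdge("a", "c", ug.NEL)      # add the edge a -- c
edges(ug.NEL2)
ug.NEL3 <- removeEdge("a", "b", ug.NEL2)  # remove the edge a -- b
edges(ug.NEL3)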
ug.MAT
  a b c
a 0 1 0
b 1 0 1
c 0 1 0

ug.SMAT
3 x 3 sparse Matrix of class "dgCMatrix"
  a b c
a . 1 .
b 1 . 1
c . 1 .
ug.IG
IGRAPH UNW- 3 2 --
+ attr: name (v/c), label (v/c), weight (e/n)

V(ug.IG)
Vertex sequence:
[1] "a" "b" "c"

E(ug.IG)
Edge sequence:
[1] b -- a
[2] c -- b
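The three objects carry the same information, and gRbase registers coercion methods between the representations; assuming these as() coercions are available in the installed version, one can convert back and forth:

as(ug.NEL, "matrix")    # graphNEL -> dense adjacency matrix
as(ug.MAT, "graphNEL")  # matrix   -> graphNEL
as(ug.NEL, "igraph")    # graphNEL -> igraph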
• There are plot methods for graphNEL and igraph objects.
• Graph rendering leaves something to be desired...

plot(ug.NEL)

[plot of the undirected graph with nodes a, b, c]
3 Bayesian networks – the gRain package

Consider the following narrative: Having flu (F) may cause an elevated body temperature (T) (fever). An elevated body temperature may cause a headache (H).

This narrative can be illustrated by a directed acyclic graph (DAG). In the formula below, F has no parents, T has parent F, and H has parent T:

plot(dag(~F + T:F + H:T))

[plot of the DAG F -> T -> H]
• We have a universe consisting of the variables $V = \{F, T, H\}$, which all have a finite state space (here all are binary).
• Corresponding to $V$ there is a random vector $X = X_V = (X_F, X_T, X_H)$, where $x = x_V = (x_F, x_T, x_H)$ denotes a specific configuration.
• For $A \subset V$ we have the random vector $X_A = (X_v : v \in A)$, where a configuration is denoted $x_A$.
• We define a joint pmf for $X$ as

  $p_X(x) = p_{X_F}(x_F)\, p_{X_T|X_F}(x_T \mid x_F)\, p_{X_H|X_T}(x_H \mid x_T)$   (1)

• We allow a simpler notation: let $A$ and $B$ be disjoint subsets of $V$. We may then use any of the forms

  $p_{X_A|X_B}(x_A \mid x_B) = p_{A|B}(x_A \mid x_B) = p(x_A \mid x_B) = p(A \mid B)$

  Hence (1) may be written $p(V) = p(F)\, p(T \mid F)\, p(H \mid T)$.
• Notice: by repeated use of the definition of conditional probability we have

  $p(V) \equiv p(F, T, H) = p(F)\, p(T \mid F)\, p(H \mid T, F)$

• So the fact that in $p(V) = p(F)\, p(T \mid F)\, p(H \mid T)$ we have $p(H \mid T)$ rather than $p(H \mid T, F)$ reflects the model assumption that if temperature (fever status) is known, then knowledge about flu provides no additional information about headache.
• We say that headache is conditionally independent of flu given temperature.
• Given a finding or evidence that a person has headache, we may now calculate e.g. the probability of having flu, i.e. $p(F = \text{yes} \mid H = \text{yes})$.
• In this small example we can compute everything in a brute force way using the table operation functions from gRbase.
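The query can be written out directly from the factorization: the conditional probability of flu given headache follows by marginalizing out $T$ and renormalizing, which is exactly what the brute-force computation below does numerically:

$$p(F \mid H = \text{yes}) = \frac{\sum_{T} p(F)\, p(T \mid F)\, p(H = \text{yes} \mid T)}{\sum_{F, T} p(F)\, p(T \mid F)\, p(H = \text{yes} \mid T)}$$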
3.1 Specification of conditional probability tables

We may specify p(F), p(T|F) and p(H|T) as tables with parray() (using array() is an alternative), where "yes" is coded as 1 and "no" as 2.

p.F <- parray("F", levels=2, values=c(.01, .99))
F
  F1   F2
0.01 0.99

p.TgF <- parray(c("T","F"), levels=c(2,2), values=c(.95,.05, .01,.99))
   F
T      F1   F2
  T1 0.95 0.01
  T2 0.05 0.99

p.HgT <- parray(c("H","T"), levels=c(2,2), values=c(.8,.2, .1,.9))
   T
H     T1  T2
  H1 0.8 0.1
  H2 0.2 0.9
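As a quick sanity check (not part of the original specification), each conditional table should sum to one over the child variable for every level of the conditioning variable; since the child is the first dimension of each array as specified above, base R's apply() can verify this:

apply(p.TgF, 2, sum)   # sums over T for each level of F; both values should be 1
apply(p.HgT, 2, sum)   # sums over H for each level of T; both values should be 1
sum(p.F)               # p(F) itself should sum to 1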
3.2 Brute force computations of conditional probabilities

The functions tableMult(), tableDiv() and tableMargin() are useful.

1) First calculate the joint distribution:

p.V <- tableMult(tableMult(p.F, p.TgF), p.HgT)
head(as.data.frame.table(p.V))
   H  T  F    Freq
1 H1 T1 F1 0.00760
2 H2 T1 F1 0.00190
3 H1 T2 F1 0.00005
4 H2 T2 F1 0.00045
5 H1 T1 F2 0.00792
6 H2 T1 F2 0.00198

2) Then calculate the marginal distribution of (F, T):

p.FT <- tableMargin(p.V, margin=c("F", "T"))
as.data.frame.table(p.FT)
   F  T   Freq
1 F1 T1 0.0095
2 F2 T1 0.0099
3 F1 T2 0.0005
4 F2 T2 0.9801

3) Then calculate the conditional distribution of F given T:

p.T   <- tableMargin(p.FT, margin="T")
p.FgT <- tableDiv(p.FT, p.T)
p.FgT
    F
T              F1        F2
  T1 0.4896907216 0.5103093
  T2 0.0005098919 0.9994901

So p(F = yes | T = yes) ≈ 0.49.

However, this scheme is computationally prohibitive in large networks: with 80 variables, each with 10 levels, the total state space is $10^{80}$ – the estimated number of atoms in the universe...
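The query from the narrative, p(F = yes | H = yes), can be obtained with the same kind of table operations; a sketch along the same lines, where p.FH, p.H and p.FgH are names introduced here purely for illustration:

p.FH  <- tableMargin(p.V, margin=c("F", "H"))   # marginal distribution of (F, H)
p.H   <- tableMargin(p.FH, margin="H")          # marginal distribution of H
p.FgH <- tableDiv(p.FH, p.H)                    # conditional distribution p(F | H)
p.FgH
# p(F = yes | H = yes) comes out at about 0.067, so a headache alone is
# fairly weak evidence of flu.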