Probabilistic Graphical Models: Bayesian Networks
Li Xiong
Slide credits: Page (Wisconsin) CS760; Zhu (Wisconsin) KDD '12 tutorial
Outline
• Graphical models
• Bayesian networks - definition
• Bayesian networks - inference
• Bayesian networks - learning
Overview: The envelope quiz
There are two envelopes: one holds a red ball (worth $100) and a black ball; the other holds two black balls. You randomly pick an envelope and randomly take out a ball. It is black. At this point, you are given the option to switch envelopes. Should you?
Overview: The envelope quiz
Random variables: $E \in \{1, 0\}$ (1 = the envelope with the red ball), $B \in \{r, b\}$ (the color of the drawn ball).
$P(E = 1) = P(E = 0) = 1/2$; $P(B = r \mid E = 1) = 1/2$, $P(B = r \mid E = 0) = 0$.
We ask: $P(E = 1 \mid B = b)$. By Bayes' rule,
$$P(E = 1 \mid B = b) = \frac{P(B = b \mid E = 1)\, P(E = 1)}{P(B = b)} = \frac{1/2 \times 1/2}{3/4} = \frac{1}{3}$$
Since $1/3 < 1/2$, the other envelope is now more likely to hold the red ball, so you should switch.
The graphical model: $E \to B$.
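A quick numeric check of this computation (a minimal sketch; the variable names are mine):

```python
# Numeric check of the envelope quiz via Bayes' rule.
p_e1 = 0.5                   # P(E = 1): prior on the red-ball envelope
p_b_given_e1 = 0.5           # P(B = b | E = 1): one of its two balls is black
p_b_given_e0 = 1.0           # P(B = b | E = 0): both of its balls are black

p_b = p_b_given_e1 * p_e1 + p_b_given_e0 * (1 - p_e1)   # P(B = b) = 3/4
posterior = p_b_given_e1 * p_e1 / p_b                   # P(E = 1 | B = b)
print(posterior)  # 0.3333... < 1/2, so switching is better
```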
Overview: Reasoning with uncertainty
• A set of random variables $x_1, \ldots, x_n$; e.g. $x_n \equiv y$ is the class label and $(x_1, \ldots, x_{n-1})$ is a feature vector.
• Inference: given the joint distribution $p(x_1, \ldots, x_n)$, compute $p(X_Q \mid X_E)$, where $X_Q \cup X_E \subseteq \{x_1, \ldots, x_n\}$. E.g. $Q = \{n\}$, $E = \{1, \ldots, n-1\}$: by the definition of conditional probability,
$$p(x_n \mid x_1, \ldots, x_{n-1}) = \frac{p(x_1, \ldots, x_{n-1}, x_n)}{\sum_v p(x_1, \ldots, x_{n-1}, x_n = v)}$$
• Learning: estimate $p(x_1, \ldots, x_n)$ from training data $X^{(1)}, \ldots, X^{(N)}$, where $X^{(i)} = (x_1^{(i)}, \ldots, x_n^{(i)})$.
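To make the definition concrete, here is a sketch that stores a toy 3-variable joint as an explicit table and conditions on evidence exactly by the formula above; the uniform values and the helper name `conditional` are illustrative choices, not from the slides:

```python
from itertools import product

# Toy joint p(x1, x2, x3) over binary variables, stored naively as a table
# with one entry per assignment (the exponential object discussed next).
joint = {bits: 1 / 8 for bits in product([0, 1], repeat=3)}

def conditional(joint, query_index, evidence):
    """p(x_query | evidence) by the definition: marginal over the normalizer."""
    num, z = {}, 0.0
    for assign, p in joint.items():
        if all(assign[i] == v for i, v in evidence.items()):
            num[assign[query_index]] = num.get(assign[query_index], 0.0) + p
            z += p
    return {v: p / z for v, p in num.items()}

# Q = {3}, E = {1, 2}: p(x3 | x1 = 1, x2 = 0)
print(conditional(joint, query_index=2, evidence={0: 1, 1: 0}))
```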
Overview: It is difficult to reason with uncertainty
• Joint distribution $p(x_1, \ldots, x_n)$: naive storage is exponential ($2^n$ entries for $n$ binary r.v.s) and hard to interpret (which conditional independences hold?).
• Inference $p(X_Q \mid X_E)$: often can't afford to do it by brute force.
• If $p(x_1, \ldots, x_n)$ is not given, it must be estimated from data: often can't afford to do that by brute force either.
• Graphical models: efficient representation, inference, and learning on $p(x_1, \ldots, x_n)$, exactly or approximately.
Definitions: What graphical models are not
• Graphical modeling is the study of probabilistic models; just because a diagram has nodes and edges does not make it a graphical model.
• These are not graphical models: a neural network, a decision tree, a network flow diagram, an HMM template.
Graphical Models
• Bayesian networks – directed
• Markov networks – undirected
Outline
• Graphical models
• Bayesian networks - definition
• Bayesian networks - inference
• Bayesian networks - learning
Bayesian Networks: Intuition
• A graphical representation of a joint probability distribution: nodes are random variables; directed edges between nodes reflect dependence.
• Some informal examples (shown as small DAGs on the original slide): Smoking At Sensor → Fire Alarm; Understood Material → Exam Grade and Understood Material → Assignment Grade.
Bayesian networks
• a BN consists of a Directed Acyclic Graph (DAG) and a set of conditional probability distributions
• in the DAG
  – each node denotes a random variable
  – each edge from X to Y represents that X directly influences Y
  – formally: each variable X is independent of its non-descendants given its parents
• each node X has a conditional probability distribution (CPD) representing $P(X \mid Parents(X))$
Definitions: Conditional independence (directed graphical models)
• Two r.v.s A, B are independent if $P(A, B) = P(A)\, P(B)$, or equivalently $P(A \mid B) = P(A)$.
• Two r.v.s A, B are conditionally independent given C if $P(A, B \mid C) = P(A \mid C)\, P(B \mid C)$, or equivalently $P(A \mid B, C) = P(A \mid C)$.
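A small sketch of how the first definition can be checked numerically on a joint table; the helper name and the toy numbers are mine, and the conditional version applies the same test within each slice C = c:

```python
# Check P(A, B) = P(A) P(B) on a joint table {(a, b): prob}.
def independent(table, tol=1e-9):
    p_a, p_b = {}, {}
    for (a, b), p in table.items():        # marginalize out each variable
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return all(abs(p - p_a[a] * p_b[b]) < tol for (a, b), p in table.items())

# A joint built as an outer product of P(A = 1) = 0.3 and P(B = 1) = 0.6:
table = {(a, b): (0.3 if a else 0.7) * (0.6 if b else 0.4)
         for a in (0, 1) for b in (0, 1)}
print(independent(table))  # True
```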
Bayesian networks
• using the chain rule, any joint probability distribution can be expressed as
$$P(X_1, \ldots, X_n) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$$
• a BN provides a compact representation of a joint probability distribution:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$$
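As a worked instance of these two formulas (the chain structure below is an illustrative assumption, not from the slides): for three variables the chain rule gives
$$P(X_1, X_2, X_3) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2)$$
and if the DAG is the chain $X_1 \to X_2 \to X_3$, then $Parents(X_3) = \{X_2\}$, so the last factor shrinks to $P(X_3 \mid X_2)$: the BN saves parameters exactly where edges are absent.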
Bayesian network example
• Consider the following 5 binary random variables:
  – B = a burglary occurs at your house
  – E = an earthquake occurs at your house
  – A = the alarm goes off
  – J = John calls to report the alarm
  – M = Mary calls to report the alarm
• Suppose we want to answer queries like: what is P(B | M, J)?
Bayesian network example P ( B ) P ( E ) t f t f 0.001 0.999 0.001 0.999 Burglary Earthquake P ( A | B, E ) B E t f t t 0.95 0.05 Alarm t f 0.94 0.06 f t 0.29 0.71 f f 0.001 0.999 JohnCalls MaryCalls P ( J |A ) P ( M |A ) A t f A t f t 0.9 0.1 t 0.7 0.3 f 0.05 0.95 f 0.01 0.99
Bayesian networks
The structure Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls factors the joint as
$$P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)$$
• a standard table representation of the joint distribution for the alarm example has $2^5 = 32$ parameters
• the BN representation of this distribution has 20 parameters (2 + 2 + 8 + 4 + 4 entries across the five CPTs)
Bayesian networks
• consider a case with 10 binary random variables
• How many parameters does a BN with the graph structure on the original slide have? Summing the per-node CPT sizes shown there: 2 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + 8 + 4 = 42
• How many parameters does the standard table representation of the joint distribution have? $2^{10} = 1024$
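A sketch of the counting rule this slide uses, assuming binary variables and counting both entries of each CPT row as the slide does (only half of them are independent parameters); the parent counts below are a hypothetical assignment consistent with the slide's per-node numbers:

```python
# Parameter count of a BN over binary variables: a node with k parents needs
# a CPT with 2**k rows and 2 entries per row.
def bn_param_count(parents_per_node):
    return sum(2 ** k * 2 for k in parents_per_node)

# 2 + 4*8 + 8 = 42, matching the slide's total
print(bn_param_count([0, 1, 1, 1, 1, 1, 1, 1, 2, 1]))  # 42
print(2 ** 10)  # 1024 entries in the full joint table
```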
Advantages of the Bayesian network representation
• Captures independence and conditional independence where they exist
• Encodes the relevant portion of the full joint among variables where dependencies exist
• Uses a graphical representation which lends insight into the complexity of inference
Outline
• Graphical models
• Bayesian networks - definition
• Bayesian networks - inference
  – Exact inference
  – Approximate inference
• Bayesian networks - learning
  – Parameter learning
  – Network learning
The inference task in Bayesian networks
Given: values for some variables in the network (the evidence variables), and a set of query variables
Do: compute the posterior distribution over the query variables
• variables that are neither evidence variables nor query variables are hidden ("other") variables
• the BN representation is flexible enough that any set of variables can serve as the evidence and any set as the query
Recall: Naïve Bayesian classifier
• Derive the maximum posterior:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$
• Independence assumption:
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$
• This corresponds to a simplified network: the class node C is the single parent of every feature node $x_k$.
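A toy sketch of this classifier; the classes, words, and all probabilities below are made up for illustration:

```python
import math

# Naive Bayes: P(C_i | X) ∝ P(C_i) * prod_k P(x_k | C_i).
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {"spam": {"offer": 0.7, "meeting": 0.1},
              "ham":  {"offer": 0.1, "meeting": 0.6}}

def posterior(features):
    # Work in log space for numerical stability, then normalize.
    scores = {c: math.log(p) + sum(math.log(likelihood[c][f]) for f in features)
              for c, p in prior.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

print(posterior(["offer", "meeting"]))  # {'spam': 0.4375, 'ham': 0.5625}
```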
Inference: Exact inference by enumeration
Let $X = (X_Q, X_E, X_O)$ denote the query, evidence, and other variables. We want to infer $P(X_Q \mid X_E)$. By definition,
$$P(X_Q \mid X_E) = \frac{P(X_Q, X_E)}{P(X_E)} = \frac{\sum_{X_O} P(X_Q, X_E, X_O)}{\sum_{X_Q, X_O} P(X_Q, X_E, X_O)}$$
Inference by enumeration example
• let a denote A = true, and ¬a denote A = false
• suppose we're given the query P(b | j, m): "the probability the house is being burglarized given that John and Mary both called"
• from the graph structure we can first compute:
$$P(b, j, m) = \sum_{e} \sum_{a} P(b)\, P(e)\, P(a \mid b, e)\, P(j \mid a)\, P(m \mid a)$$
where the sums range over the possible values of the E and A variables (e, ¬e and a, ¬a)
Inference by enumeration example
$$P(b, j, m) = \sum_{e} \sum_{a} P(b)\, P(e)\, P(a \mid b, e)\, P(j \mid a)\, P(m \mid a) = P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\, P(j \mid a)\, P(m \mid a)$$
Plugging in the CPT entries, with the four terms ordered (e, a), (e, ¬a), (¬e, a), (¬e, ¬a):
$$= 0.001 \times (0.001 \times 0.95 \times 0.9 \times 0.7 \;+\; 0.001 \times 0.05 \times 0.05 \times 0.01 \;+\; 0.999 \times 0.94 \times 0.9 \times 0.7 \;+\; 0.999 \times 0.06 \times 0.05 \times 0.01) \approx 0.00059$$
Inference by enumeration example
• now do the equivalent calculation for P(¬b, j, m)
• and determine P(b | j, m):
$$P(b \mid j, m) = \frac{P(b, j, m)}{P(j, m)} = \frac{P(b, j, m)}{P(b, j, m) + P(\neg b, j, m)}$$
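Putting the whole query together, here is a self-contained enumeration sketch for P(b | j, m), with the CPTs restated from the tables above:

```python
from itertools import product

# CPTs from the alarm network; each maps parent values to P(child = t).
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}
P_M = {True: 0.7, False: 0.01}

def pr(p_true, val):
    """P(var = val) given P(var = True) = p_true."""
    return p_true if val else 1 - p_true

def p_bjm(b):
    """P(B = b, j, m): sum the joint over the hidden variables E and A."""
    return sum(pr(0.001, b) * pr(0.001, e) * pr(P_A[(b, e)], a) * P_J[a] * P_M[a]
               for e, a in product([True, False], repeat=2))

numerator = p_bjm(True)
print(numerator / (numerator + p_bjm(False)))  # P(b | j, m) ≈ 0.31 with these CPTs
```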
Inference: Exact inference by enumeration (cont.)
As above, with $X = (X_Q, X_E, X_O)$ for query, evidence, and other variables,
$$P(X_Q \mid X_E) = \frac{\sum_{X_O} P(X_Q, X_E, X_O)}{\sum_{X_Q, X_O} P(X_Q, X_E, X_O)}$$
Computational issue: this sums an exponential number of terms. With $k$ variables in $X_O$, each taking $r$ values, there are $r^k$ terms.
Outline
• Graphical models
• Bayesian networks - definition
• Bayesian networks - inference
  – Exact inference
  – Approximate inference
• Bayesian networks - learning
  – Parameter learning
  – Network learning
Approximate (Monte Carlo) inference in Bayes nets
• Basic idea: repeatedly generate data samples according to the distribution represented by the Bayes net
• Estimate the probability $P(X_Q \mid X_E)$ from the counts in those samples
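A minimal rejection-sampling sketch of this idea on the alarm network (CPTs restated from above; the sample size and seed are arbitrary choices):

```python
import random

random.seed(0)  # arbitrary seed for reproducibility

P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}
P_M = {True: 0.7, False: 0.01}

def sample_net():
    """Ancestral sampling: draw each node given its already-sampled parents."""
    b = random.random() < 0.001
    e = random.random() < 0.001
    a = random.random() < P_A[(b, e)]
    j = random.random() < P_J[a]
    m = random.random() < P_M[a]
    return b, e, a, j, m

# Rejection sampling: keep only samples consistent with the evidence (j, m).
hits = kept = 0
for _ in range(2_000_000):
    b, e, a, j, m = sample_net()
    if j and m:
        kept += 1
        hits += b
print(hits / kept)  # noisy estimate of P(b | j, m); few samples survive rejection
```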