

  1. Probabilistic Graphical Models: Bayesian Networks. Li Xiong. Slide credits: Page (Wisconsin) CS760, Zhu (Wisconsin) KDD ’12 tutorial

  2. Outline
     - Graphical models
     - Bayesian networks - definition
     - Bayesian networks - inference
     - Bayesian networks - learning

  3. Overview: the envelope quiz
     - There are two envelopes: one has a red ball ($100) and a black ball; the other has two black balls.
     - You randomly picked an envelope and randomly took out a ball - it was black.
     - At this point, you are given the option to switch envelopes. Should you?

  4. Overview: the envelope quiz
     Random variables: E ∈ {1, 0} (E = 1: the envelope with the red ball), B ∈ {r, b} (ball color)
     P(E = 1) = P(E = 0) = 1/2
     P(B = r | E = 1) = 1/2,  P(B = r | E = 0) = 0
     We ask: P(E = 1 | B = b)

  5. Overview: the envelope quiz
     Random variables: E ∈ {1, 0}, B ∈ {r, b}
     P(E = 1) = P(E = 0) = 1/2
     P(B = r | E = 1) = 1/2,  P(B = r | E = 0) = 0
     We ask: P(E = 1 | B = b). By Bayes' rule:
     P(E = 1 | B = b) = P(B = b | E = 1) P(E = 1) / P(B = b) = (1/2 × 1/2) / (3/4) = 1/3
     So you should switch: given the black draw, the other envelope holds the red ball with probability 2/3.
     The graphical model: E → B
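     A minimal Python sketch of this calculation (toy code illustrating Bayes' rule, using only the numbers above):

     ```python
     # Verify P(E=1 | B=b) for the envelope quiz by Bayes' rule.
     p_e1 = 0.5              # prior: picked the envelope with the red ball
     p_b_given_e1 = 0.5      # chance of drawing black from that envelope
     p_b_given_e0 = 1.0      # the other envelope has only black balls

     p_b = p_b_given_e1 * p_e1 + p_b_given_e0 * (1 - p_e1)   # = 3/4
     posterior = p_b_given_e1 * p_e1 / p_b                   # = 1/3
     print(posterior)  # 0.333... -> switching wins with probability 2/3
     ```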

  6. Overview: reasoning with uncertainty
     - A set of random variables x_1, ..., x_n; e.g. x_n ≡ y is the class label and (x_1, ..., x_{n-1}) a feature vector.
     - Inference: given the joint distribution p(x_1, ..., x_n), compute p(X_Q | X_E), where X_Q ∪ X_E ⊆ {x_1, ..., x_n}.
       E.g. Q = {n}, E = {1, ..., n-1}; by the definition of conditional probability:
       p(x_n | x_1, ..., x_{n-1}) = p(x_1, ..., x_{n-1}, x_n) / Σ_v p(x_1, ..., x_{n-1}, x_n = v)
     - Learning: estimate p(x_1, ..., x_n) from training data X^(1), ..., X^(N), where X^(i) = (x_1^(i), ..., x_n^(i)).

  7. Overview: it is difficult to reason with uncertainty
     - The joint distribution p(x_1, ..., x_n): naive storage is exponential (2^n entries for binary r.v.s), and it is hard to interpret (conditional independence structure is not visible).
     - Inference p(X_Q | X_E): often can't afford to do it by brute force.
     - If p(x_1, ..., x_n) is not given, estimate it from data: often can't afford to do it by brute force.
     - Graphical models: efficient representation, inference, and learning on p(x_1, ..., x_n), exactly or approximately.

  8. Definitions: graphical-model-nots
     Graphical models are probabilistic models; just because a diagram has nodes and edges doesn't mean it's a graphical model. These are not graphical models: a neural network, a decision tree, a network flow diagram, an HMM template.

  9. Graphical Models
     - Bayesian networks – directed
     - Markov networks – undirected

  10. Outline
      - Graphical models
      - Bayesian networks - definition
      - Bayesian networks - inference
      - Bayesian networks - learning

  11. Bayesian Networks: Intuition
      - A graphical representation for a joint probability distribution:
        - Nodes are random variables.
        - Directed edges between nodes reflect dependence.
      - Some informal examples. (Figure: small example networks with nodes such as Fire, Smoking At Sensor, Alarm; Understood Material, Exam Grade, Assignment Grade.)

  12. Bayesian networks
      - A BN consists of a Directed Acyclic Graph (DAG) and a set of conditional probability distributions.
      - In the DAG:
        - each node denotes a random variable
        - each edge from X to Y represents that X directly influences Y
        - formally: each variable X is independent of its non-descendants given its parents
      - Each node X has a conditional probability distribution (CPD) representing P(X | Parents(X)).

  13. Definitions: conditional independence
      Two r.v.s A, B are independent if P(A, B) = P(A) P(B), or equivalently P(A | B) = P(A).
      Two r.v.s A, B are conditionally independent given C if P(A, B | C) = P(A | C) P(B | C), or equivalently P(A | B, C) = P(A | C).
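      A small Python sketch (toy numbers, not from the deck) that builds a joint satisfying A ⊥ B | C and verifies the definition directly from the table:

      ```python
      from itertools import product

      # Build p(a, b, c) = p(c) p(a|c) p(b|c), then check from the joint
      # table alone that P(A, B | C) = P(A | C) P(B | C).
      p_c = [0.6, 0.4]        # P(C = c)
      p_a1_c = [0.9, 0.2]     # P(A = 1 | C = c)
      p_b1_c = [0.3, 0.7]     # P(B = 1 | C = c)

      def bern(p1, x):        # P(X = x) when P(X = 1) = p1
          return p1 if x == 1 else 1 - p1

      joint = {(a, b, c): p_c[c] * bern(p_a1_c[c], a) * bern(p_b1_c[c], b)
               for a, b, c in product((0, 1), repeat=3)}

      for c in (0, 1):
          pc = sum(v for (a, b, cc), v in joint.items() if cc == c)
          for a, b in product((0, 1), repeat=2):
              p_ab = joint[a, b, c] / pc
              p_a = sum(joint[a, bb, c] for bb in (0, 1)) / pc
              p_b = sum(joint[aa, b, c] for aa in (0, 1)) / pc
              assert abs(p_ab - p_a * p_b) < 1e-12
      print("A and B are conditionally independent given C")
      ```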

  14. Bayesian networks
      - Using the chain rule, a joint probability distribution can be expressed as
        P(X_1, ..., X_n) = P(X_1) ∏_{i=2}^{n} P(X_i | X_1, ..., X_{i-1})
      - A BN provides a compact representation of a joint probability distribution:
        P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))

  15. Bayesian network example
      - Consider the following 5 binary random variables:
        B = a burglary occurs at your house
        E = an earthquake occurs at your house
        A = the alarm goes off
        J = John calls to report the alarm
        M = Mary calls to report the alarm
      - Suppose we want to answer queries like: what is P(B | M, J)?

  16. Bayesian network example
      (Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls)

      P(B):                 P(E):
        t      f              t      f
        0.001  0.999          0.001  0.999

      P(A | B, E):
        B  E    t      f
        t  t    0.95   0.05
        t  f    0.94   0.06
        f  t    0.29   0.71
        f  f    0.001  0.999

      P(J | A):             P(M | A):
        A    t     f          A    t     f
        t    0.9   0.1        t    0.7   0.3
        f    0.05  0.95       f    0.01  0.99

  17. Bayesian networks
      For the alarm network:
      P(B, E, A, J, M) = P(B) · P(E) · P(A | B, E) · P(J | A) · P(M | A)
      - A standard representation of the joint distribution for the alarm example has 2^5 = 32 parameters.
      - The BN representation of this distribution has 20 parameters (2 + 2 + 8 + 4 + 4, counting every CPT entry).
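      To make the factorization concrete, a minimal Python sketch (code is an illustration; the CPT values are from slide 16) that evaluates one entry of the joint as a product of local factors:

      ```python
      # CPTs of the alarm network, storing P(X = True | parents).
      p_b = {True: 0.001, False: 0.999}
      p_e = {True: 0.001, False: 0.999}
      p_a = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
      p_j = {True: 0.9, False: 0.05}
      p_m = {True: 0.7, False: 0.01}

      def given(p_true, value):
          """P(X = value) from P(X = True)."""
          return p_true if value else 1.0 - p_true

      def joint(b, e, a, j, m):
          # P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
          return (p_b[b] * p_e[e]
                  * given(p_a[b, e], a)
                  * given(p_j[a], j)
                  * given(p_m[a], m))

      # e.g. P(b, ¬e, a, j, m) = 0.001 * 0.999 * 0.94 * 0.9 * 0.7
      print(joint(True, False, True, True, True))
      ```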

  18. Bayesian networks
      - Consider a case with 10 binary random variables.
      - How many parameters does a BN with the given graph structure have? Counting every CPT entry, a node with k parents has 2^(k+1) entries; with one root node (2 entries), eight nodes with one parent (4 each), and one node with two parents (8), the total is 2 + 8×4 + 8 = 42.
      - How many parameters does the standard table representation of the joint distribution have? 2^10 = 1024.
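      The same count in a short Python sketch (the per-node parent counts are inferred from the slide's numbers, which sum to 42):

      ```python
      # Parameter count for a BN over binary variables: a node with k
      # parents has a CPT with 2**k rows of 2 entries each.
      num_parents = [0, 1, 1, 1, 1, 1, 1, 1, 2, 1]  # assumed structure
      bn_params = sum(2 ** (k + 1) for k in num_parents)
      full_joint = 2 ** len(num_parents)
      print(bn_params, full_joint)  # 42 1024
      ```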

  19. Advantages of the Bayesian network representation
      - Captures independence and conditional independence where they exist.
      - Encodes the relevant portion of the full joint among variables where dependencies exist.
      - Uses a graphical representation which lends insight into the complexity of inference.

  20. Bayesian Networks
      - Graphical models
      - Bayesian networks - definition
      - Bayesian networks - inference
        - Exact inference
        - Approximate inference
      - Bayesian networks - learning
        - Parameter learning
        - Network learning

  21. The inference task in Bayesian networks
      Given: values for some variables in the network (evidence), and a set of query variables
      Do: compute the posterior distribution over the query variables
      - Variables that are neither evidence variables nor query variables are the "other" (hidden) variables.
      - The BN representation is flexible enough that any set can be the evidence variables and any set can be the query variables.

  22. Recall the naïve Bayesian classifier
      - Derive the maximum posterior: P(C_i | X) = P(X | C_i) P(C_i) / P(X)
      - Independence assumption: P(X | C_i) = ∏_{k=1}^{n} P(x_k | C_i) = P(x_1 | C_i) × P(x_2 | C_i) × ... × P(x_n | C_i)
      - This corresponds to a simplified network: the class node is the single parent of every feature node.
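      As an illustration (toy numbers, not from the deck), the posterior is the prior times the product of per-feature likelihoods, normalized by the evidence:

      ```python
      import math

      # Hypothetical two-class problem with three observed feature values.
      priors = {"c1": 0.6, "c2": 0.4}
      likelihoods = {"c1": [0.5, 0.1, 0.9],   # P(x_k | c1), one per feature
                     "c2": [0.2, 0.7, 0.4]}   # P(x_k | c2)

      scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
      z = sum(scores.values())                # P(X), the evidence
      posterior = {c: s / z for c, s in scores.items()}
      print(posterior)                        # pick the argmax class
      ```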

  23. Inference: exact inference by enumeration
      Let X = (X_Q, X_E, X_O) for query, evidence, and other variables. Infer P(X_Q | X_E).
      By definition:
      P(X_Q | X_E) = P(X_Q, X_E) / P(X_E) = Σ_{X_O} P(X_Q, X_E, X_O) / Σ_{X_Q, X_O} P(X_Q, X_E, X_O)

  24. Inference by enumeration example
      - Let a denote A = true, and ¬a denote A = false.
      - Suppose we're given the query P(b | j, m): "probability the house is being burglarized given that John and Mary both called".
      - From the graph structure we can first compute:
        P(b, j, m) = Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
        (sum over possible values for the E and A variables: e, ¬e, a, ¬a)

  25. Inference by enumeration example
      P(b, j, m) = Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
                 = P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
                 = 0.001 × ( 0.001 × 0.95 × 0.9 × 0.7      (e, a)
                           + 0.001 × 0.05 × 0.05 × 0.01    (e, ¬a)
                           + 0.999 × 0.94 × 0.9 × 0.7      (¬e, a)
                           + 0.999 × 0.06 × 0.05 × 0.01 )  (¬e, ¬a)

  26. Inference by enumeration example
      - Now do the equivalent calculation for P(¬b, j, m).
      - Then determine P(b | j, m):
        P(b | j, m) = P(b, j, m) / P(j, m) = P(b, j, m) / (P(b, j, m) + P(¬b, j, m))
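      Putting slides 24-26 together, a self-contained Python sketch of enumeration (the CPT values are those on slide 16; the code itself is an illustration):

      ```python
      from itertools import product

      # Alarm-network CPTs, storing P(X = True | parents).
      p_b = {True: 0.001, False: 0.999}
      p_e = {True: 0.001, False: 0.999}
      p_a = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
      p_j = {True: 0.9, False: 0.05}
      p_m = {True: 0.7, False: 0.01}

      def given(p_true, value):
          return p_true if value else 1.0 - p_true

      def p_with_evidence(b, j=True, m=True):
          """P(B = b, j, m), summing out the other variables E and A."""
          return sum(p_b[b] * p_e[e] * given(p_a[b, e], a)
                     * given(p_j[a], j) * given(p_m[a], m)
                     for e, a in product((True, False), repeat=2))

      pb_jm = p_with_evidence(True)
      pnb_jm = p_with_evidence(False)
      print(pb_jm / (pb_jm + pnb_jm))  # P(b | j, m) ≈ 0.31 with these numbers
      ```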

  27. Inference: exact inference by enumeration
      Let X = (X_Q, X_E, X_O) for query, evidence, and other variables. Infer P(X_Q | X_E).
      By definition:
      P(X_Q | X_E) = P(X_Q, X_E) / P(X_E) = Σ_{X_O} P(X_Q, X_E, X_O) / Σ_{X_Q, X_O} P(X_Q, X_E, X_O)
      Computational issue: this sums an exponential number of terms - with k variables in X_O each taking r values, there are r^k terms.

  28. Bayesian Networks
      - Graphical models
      - Bayesian networks - definition
      - Bayesian networks - inference
        - Exact inference
        - Approximate inference
      - Bayesian networks - learning
        - Parameter learning
        - Network learning

  29. Approximate (Monte Carlo) inference in Bayes nets
      - Basic idea: repeatedly generate data samples according to the distribution represented by the Bayes net.
      - Use the samples to estimate the probability P(X_Q | X_E).
      (Figure: the alarm network, B and E as parents of A, with A the parent of J and M.)
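      A minimal sketch of one such Monte Carlo method, rejection sampling (an illustration of the basic idea, not an algorithm given in the deck), again for P(b | j, m) in the alarm network:

      ```python
      import random

      # Alarm-network CPTs, storing P(X = True | parents).
      p_b, p_e = 0.001, 0.001
      p_a = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
      p_j = {True: 0.9, False: 0.05}
      p_m = {True: 0.7, False: 0.01}

      def sample_once(rng):
          # Sample each node given its parents, in topological order.
          b = rng.random() < p_b
          e = rng.random() < p_e
          a = rng.random() < p_a[b, e]
          j = rng.random() < p_j[a]
          m = rng.random() < p_m[a]
          return b, j, m

      rng = random.Random(0)
      kept = hits = 0
      for _ in range(5_000_000):
          b, j, m = sample_once(rng)
          if j and m:             # rejection step: keep only samples
              kept += 1           # consistent with the evidence
              hits += b
      print(hits / kept)          # estimate of P(b | j, m), ≈ 0.31
      ```

      Note that rejection sampling is wasteful when the evidence is rare (here P(j, m) is small, so most samples are discarded); methods such as likelihood weighting or Gibbs sampling are commonly used instead.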
