  1. 15-780: Probabilistic Graphical Models J. Zico Kolter February 22-24, 2016 1

  2. Outline Introduction Probability background Probabilistic graphical models Probabilistic inference MAP Inference 2

  3. Outline Introduction Probability background Probabilistic graphical models Probabilistic inference MAP Inference 3

  4. Probabilistic reasoning Thus far, most of the problems we have encountered in the course have been deterministic (e.g., assigning an exact set of values to variables, searching where we can deterministically transition between states, optimizing deterministic costs, etc.). Many tasks in the real world involve reasoning about and making decisions using probabilities. This was a fundamental shift in AI work that occurred during the 80s/90s. 4

  5. Example: topic modeling [Figure from (Blei, 2011): four topics labeled “Genetics”, “Evolution”, “Disease”, “Computers”, each with its list of top words, plus a bar plot of Probability over Topics for one article.] For documents in a large collection of text, model p(Word | Topic), p(Topic). The figure shows topics and top words learned automatically from reading 17,000 Science articles. 5

  6. Example: image segmentation Figure from (Nowozin and Lampert, 2012) shows an image segmentation problem, original image on left, where the goal is to separate foreground from background. The middle figure shows a segmentation where each pixel is individually classified as belonging to foreground or background. The right figure shows a segmentation inferred from a probability model over all pixels jointly (encoding the probability that neighboring pixels tend to belong to the same group). 6

  7. Example: modeling protein networks In cellular modeling, can we automatically determine how the presence or absence of some proteins affects other proteins? Figure from (Sachs et al., 2005) shows an automatically inferred protein probability network, which captured most of the known interactions using data-driven methods (far less manual effort than previous methods). 7

  8. Probabilistic graphical models A common theme in the past several examples is that each relied on a probabilistic model defined over hundreds, thousands, or potentially millions of different quantities. “Traditional” joint probability models would not be able to tractably represent and reason over such distributions. A key advance in AI has been the development of probabilistic models that exploit notions of independence to compactly model and answer probability queries about such distributions. 8

  9. Outline Introduction Probability background Probabilistic graphical models Probabilistic inference MAP Inference 9

  10. Random variables A random variable (informally) is a variable whose value is not initially known. Instead, the variable can take on different values (and it must take on exactly one of these values), each with an associated probability: Weather ∈ {sunny, rainy, cloudy, snowy}, p(Weather = sunny) = 0.3, p(Weather = rainy) = 0.2, ... In this course we’ll deal almost exclusively with discrete random variables (taking on values from some finite set). 10

  11. Notation for random variables We’ll use upper case letters X_i to denote random variables. Important: for a random variable X_i taking values {0, 1, 2}, p(X_i) = (0.1, 0.4, 0.5) represents a tuple of the probabilities for each value that X_i can take. Conversely p(x_i) (for x_i a specific value in {0, 1, 2}), or sometimes p(X_i = x_i), refers to a number (the corresponding entry in the p(X_i) vector). 11

  12. Given two random variables X_1 with values {0, 1, 2} and X_2 with values {0, 1}: - p(X_1, X_2) refers to the entire joint distribution, i.e., it is a tuple with 6 elements (one for each setting of the variables) - p(x_1, x_2) is a number indicating the probability that X_1 = x_1 and X_2 = x_2 - p(X_1, x_2) is a tuple with 3 elements, the probabilities for all values of X_1 and the specific value x_2 (note: this is not a probability distribution, it will not sum to one) 12
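
The distinction between tuples like p(X_1) and numbers like p(x_1) maps directly onto array indexing. A minimal sketch in Python/NumPy (the probability values here are made up for illustration, not from the slides):

    import numpy as np

    # p(X1): a vector with one entry per value of X1 (values 0, 1, 2)
    p_X1 = np.array([0.1, 0.4, 0.5])

    # p(x1) for x1 = 2: a single number, the corresponding entry of p(X1)
    print(p_X1[2])              # 0.5

    # p(X1, X2): the entire joint distribution, a 3 x 2 table of 6 numbers
    p_X1X2 = np.array([[0.05, 0.05],
                       [0.20, 0.20],
                       [0.30, 0.20]])

    # p(X1, x2) for x2 = 0: a slice of 3 numbers; note it does not sum to one
    print(p_X1X2[:, 0])         # [0.05 0.2  0.3 ], sums to 0.55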

  13. Basic rules of probability Marginalization: for any random variables X_1, X_2, p(X_1) = Σ_{x_2} p(X_1, x_2) = Σ_{x_2} p(X_1 | x_2) p(x_2). Conditional probability: the conditional probability p(X_1 | X_2) is defined as p(X_1 | X_2) = p(X_1, X_2) / p(X_2). Chain rule: for any X_1, ..., X_n, p(X_1, ..., X_n) = ∏_{i=1}^{n} p(X_i | X_1, ..., X_{i-1}). 13
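
Each of these rules can be checked numerically from a joint table. A small sketch (again with a made-up 3 x 2 joint, assuming NumPy):

    import numpy as np

    # A made-up joint p(X1, X2) with X1 in {0, 1, 2} and X2 in {0, 1}
    p_joint = np.array([[0.05, 0.05],
                        [0.20, 0.20],
                        [0.30, 0.20]])

    # Marginalization: p(X1) = sum over x2 of p(X1, x2), and similarly p(X2)
    p_X1 = p_joint.sum(axis=1)       # [0.1, 0.4, 0.5]
    p_X2 = p_joint.sum(axis=0)       # [0.55, 0.45]

    # Conditional probability: p(X1 | X2) = p(X1, X2) / p(X2)
    p_X1_given_X2 = p_joint / p_X2   # each column now sums to one

    # Chain rule (two-variable case): p(X1, X2) = p(X1 | X2) p(X2)
    assert np.allclose(p_joint, p_X1_given_X2 * p_X2)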

  14. Bayes’s rule: Using the definition of conditional probability, p(X_1, X_2) = p(X_1 | X_2) p(X_2) = p(X_2 | X_1) p(X_1), so p(X_1 | X_2) = p(X_2 | X_1) p(X_1) / p(X_2) = p(X_2 | X_1) p(X_1) / Σ_{x_1} p(X_2 | x_1) p(x_1). An example: I want to know if I have come down with a rare strain of flu (occurring in only 1/10,000 people). There is an accurate test for the flu (if I have the flu, it will tell me I have it 99% of the time, and if I do not have it, it will tell me I do not have it 99% of the time). I go to the doctor and test positive. What is the probability I have this flu? 14
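
The slide leaves the flu example as a question; plugging its numbers into Bayes’s rule gives the (perhaps surprising) answer. A quick sketch:

    # Prior: the rare flu strain occurs in 1 in 10,000 people
    p_flu = 1e-4

    # Test accuracy: 99% true positive rate, 99% true negative rate
    p_pos_given_flu = 0.99
    p_pos_given_no_flu = 0.01

    # Bayes's rule: p(flu | positive) = p(positive | flu) p(flu) / p(positive)
    p_pos = p_pos_given_flu * p_flu + p_pos_given_no_flu * (1 - p_flu)
    p_flu_given_pos = p_pos_given_flu * p_flu / p_pos

    print(p_flu_given_pos)   # about 0.0098, i.e., still under 1%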

  15. Independence Two random variables X_1 and X_2 are said to be (marginally, or unconditionally) independent, written X_1 ⊥ X_2, if the joint distribution is given by the product of the marginal distributions: p(X_1, X_2) = p(X_1) p(X_2) ⇔ p(X_1 | X_2) = p(X_1). Two random variables X_1, X_2 are conditionally independent given X_3, written X_1 ⊥ X_2 | X_3, if p(X_1, X_2 | X_3) = p(X_1 | X_3) p(X_2 | X_3) ⇔ p(X_1 | X_2, X_3) = p(X_1 | X_3). Marginal independence does not imply conditional independence or vice versa. 15
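
Independence can likewise be verified entry by entry from a joint table. A small sketch with made-up numbers (a joint built as an outer product, so X_1 and X_2 are independent by construction):

    import numpy as np

    p_X1 = np.array([0.1, 0.4, 0.5])
    p_X2 = np.array([0.55, 0.45])
    p_joint = np.outer(p_X1, p_X2)       # p(X1, X2) = p(X1) p(X2)

    # Marginal independence: the joint equals the product of the marginals
    print(np.allclose(p_joint, p_X1[:, None] * p_X2[None, :]))   # True

    # Equivalently, p(X1 | X2) = p(X1) for every value of X2
    p_X1_given_X2 = p_joint / p_joint.sum(axis=0)
    print(np.allclose(p_X1_given_X2, p_X1[:, None]))             # True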

  16. Outline Introduction Probability background Probabilistic graphical models Probabilistic inference MAP Inference 16

  17. High dimensional distributions Probabilistic graphical models (PGMs) are about representing probability distributions over random variables p(X) ≡ p(X_1, ..., X_n), where for the remainder of this lecture each X_i ∈ {0, 1} (i.e., X ∈ {0, 1}^n). Naively, since there are 2^n possible assignments to X_1, ..., X_n, can represent this distribution completely using 2^n − 1 numbers, but this quickly becomes intractable for large n. PGMs are methods to represent these distributions more compactly, by exploiting conditional independence. 17

  18. Bayesian networks A Bayesian network is defined by: 1. A directed acyclic graph (DAG) G = (V = {X_1, ..., X_n}, E) 2. A set of conditional probability tables p(X_i | Parents(X_i)). Defines the joint probability distribution p(X) = ∏_{i=1}^{n} p(X_i | Parents(X_i)). Equivalently, each node is conditionally independent of all non-descendants given its parents. 18
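
A minimal sketch of this definition in code (the data structures and names here are my own, not from the lecture): store the parent sets and CPTs, and compute the joint as the product of the conditionals.

    def joint_probability(parents, cpts, x):
        """p(x) = prod_i p(x_i | Parents(x_i)) for a complete assignment x.

        parents: dict mapping each variable to a tuple of its parents
        cpts:    dict mapping each variable to a dict from parent-value
                 tuples to p(variable = 1 | parent values); all variables
                 are assumed binary with values in {0, 1}
        x:       dict mapping every variable to its value in {0, 1}
        """
        p = 1.0
        for var, pa in parents.items():
            p_one = cpts[var][tuple(x[u] for u in pa)]
            p *= p_one if x[var] == 1 else 1.0 - p_one
        return p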

  19. Bayes net example [Network diagram: Burglary? (X_1) and Earthquake? (X_2) both point to Alarm? (X_3), which in turn points to MaryCalls? and JohnCalls? (X_4 and X_5).] 19

  20. Bayes net example [Same network diagram as above, now annotated with conditional probability tables.] p(X_1 = 1) = 0.001; p(X_2 = 1) = 0.002; p(X_3 = 1 | X_1, X_2): 0.001 for (X_1, X_2) = (0, 0), 0.29 for (0, 1), 0.94 for (1, 0), 0.95 for (1, 1); p(X_4 = 1 | X_3): 0.05 for X_3 = 0, 0.9 for X_3 = 1; p(X_5 = 1 | X_3): 0.01 for X_3 = 0, 0.7 for X_3 = 1. 19

  21. Bayes net example [Same network diagram and conditional probability tables as above.] Can write distribution as p(X) = p(X_1) p(X_2 | X_1) p(X_3 | X_1:2) p(X_4 | X_1:3) p(X_5 | X_1:4) = p(X_1) p(X_2) p(X_3 | X_1, X_2) p(X_4 | X_3) p(X_5 | X_3). 19
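
Plugging the CPT numbers from the slide into this factorization, e.g. for the single assignment X_1 = 1, X_2 = 0, X_3 = 1, X_4 = 1, X_5 = 1 (the choice of assignment is just an illustration):

    # CPT entries read off the slide (probabilities that each variable is 1)
    p_X1 = 0.001                                  # p(X1 = 1)
    p_X2 = 0.002                                  # p(X2 = 1)
    p_X3_given = {(0, 0): 0.001, (0, 1): 0.29,    # p(X3 = 1 | X1, X2)
                  (1, 0): 0.94,  (1, 1): 0.95}
    p_X4_given = {0: 0.05, 1: 0.9}                # p(X4 = 1 | X3)
    p_X5_given = {0: 0.01, 1: 0.7}                # p(X5 = 1 | X3)

    # p(X1=1, X2=0, X3=1, X4=1, X5=1) via the Bayes net factorization
    p = p_X1 * (1 - p_X2) * p_X3_given[(1, 0)] * p_X4_given[1] * p_X5_given[1]
    print(p)   # about 5.9e-4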

  22. Independence in Bayes nets [Same alarm network diagram.] Which of the following independences hold? X_4 ⊥ X_5? X_4 ⊥ X_5 | X_3? X_1 ⊥ X_2? X_1 ⊥ X_2 | X_3? X_1 ⊥ X_2 | X_5? 20

  23. Conditional independence in Bayesian networks is characterized by a test called d-separation. Two variables X_i and X_j are conditionally independent given a set of variables X_I if and only if, for all trails connecting X_i and X_j in the graph, at least one of the following holds: 1. The trail contains a chain of nodes X_u → X_v → X_w with X_v ∈ X_I 2. The trail contains a fork X_u ← X_v → X_w with X_v ∈ X_I 3. The trail contains a collider X_u → X_v ← X_w where neither X_v nor its descendants are in X_I. For computing d-separation: (R. Shachter, “Bayes-Ball: The Rational Pastime,” 1998) 21
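
The d-separation test itself is mechanical enough to implement. Below is a sketch of one standard way to do it, in the reachability ("Bayes-Ball") style the slide cites; the code and its details are my own, not the lecture's. Applied to the alarm network, it answers the queries from the "Independence in Bayes nets" slide.

    from collections import defaultdict

    def d_separated(parents, x, y, given):
        """True if x and y are d-separated given the set `given` in the DAG
        described by `parents` (a dict: node -> tuple of parent nodes)."""
        given = set(given)
        children = defaultdict(set)
        for node, ps in parents.items():
            for p in ps:
                children[p].add(node)

        # Nodes that are observed or have an observed descendant
        # (these are the colliders that become "active").
        anc_of_given = set()
        frontier = list(given)
        while frontier:
            node = frontier.pop()
            if node not in anc_of_given:
                anc_of_given.add(node)
                frontier.extend(parents.get(node, ()))

        # Search over (node, direction) pairs along active trails from x.
        reachable, visited = set(), set()
        frontier = [(x, "up")]
        while frontier:
            node, direction = frontier.pop()
            if (node, direction) in visited:
                continue
            visited.add((node, direction))
            if node not in given:
                reachable.add(node)
            if direction == "up" and node not in given:
                # Arrived from a child: may continue to parents and children.
                frontier += [(p, "up") for p in parents.get(node, ())]
                frontier += [(c, "down") for c in children[node]]
            elif direction == "down":
                if node not in given:
                    # Chain through an unobserved node: continue to children.
                    frontier += [(c, "down") for c in children[node]]
                if node in anc_of_given:
                    # Collider with node (or a descendant) observed: active.
                    frontier += [(p, "up") for p in parents.get(node, ())]
        return y not in reachable

    # The alarm network from the earlier slides.
    alarm = {"X1": (), "X2": (), "X3": ("X1", "X2"),
             "X4": ("X3",), "X5": ("X3",)}

    # The queries from the "Independence in Bayes nets" slide.
    print(d_separated(alarm, "X4", "X5", []))        # False
    print(d_separated(alarm, "X4", "X5", ["X3"]))    # True
    print(d_separated(alarm, "X1", "X2", []))        # True
    print(d_separated(alarm, "X1", "X2", ["X3"]))    # False
    print(d_separated(alarm, "X1", "X2", ["X5"]))    # False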
