  1. Artificial Intelligence: Methods and Applications Lecture 8: Review of Probability Theory Juan Carlos Nieves Sánchez November 28, 2014

  2. Outline
     • Probability axioms
     • Independence
     • Bayes’ rule
     • Inference using full joint distributions

  3. What is probability theory?
     Probability theory deals with mathematical models of random phenomena. We often use models of randomness to model uncertainty. Uncertainty can have different causes:
     • Laziness: it is too difficult or computationally expensive to get to a certain answer.
     • Theoretical ignorance: we don’t know all the rules that influence the processes we are studying.
     • Practical ignorance: we know the rules in principle, but we don’t have all the data to apply them.

  4. Random experiments
     Mathematical models of randomness are based on the concept of random experiments. Such experiments should have two important properties:
     1. The experiment must be repeatable.
     2. Future outcomes cannot be exactly predicted based on previous outcomes, even if we can control all aspects of the experiment.
     Examples:
     • Coin tossing
     • Genetics

  5. Deterministic vs. random models
     Deterministic models often give a macroscopic view of random phenomena. They describe an average behavior but ignore local random variations.
     Examples:
     • Water molecules in a river.
     • Gas molecules in a heated container.
     Lesson to be learned: model at the right level of detail!

  6. Random Variables
     The basic element of probability is the random variable. We can think of a random variable as an event with some degree of uncertainty as to whether that event occurs. Each random variable has a domain of values it can take on. There are two types of random variables:
     1. Discrete random variables.
     2. Continuous random variables.

  7. Examples of Random Variables
     A discrete random variable takes its value from a finite set of values. For example:
     • P(DrinkSize=Small) = 0.1
     • P(DrinkSize=Medium) = 0.2
     • P(DrinkSize=Large) = 0.7
     Continuous random variables take values from the real numbers, e.g., from the interval [0, 1].
     Note: We will mainly be dealing with discrete random variables.

  8. Probability
     Given a random variable A, P(A) denotes the fraction of possible worlds in which A is true.
     [Figure: the event space of P(A), i.e., all possible worlds, divided into the worlds in which A is true and the worlds in which A is false.]

  9. Key observation
     Consider a random experiment for which outcome A sometimes occurs and sometimes doesn’t occur.
     • Repeat the experiment a large number of times and note, for each repetition, whether A occurs or not.
     • Let f_n(A) be the number of times A occurred in the first n experiments.
     • Let r_n(A) = f_n(A) / n be the relative frequency of A in the first n experiments.
     Key observation: As n → ∞, the relative frequency r_n(A) converges to a real number.
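The key observation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the experiment is a fair coin toss, the event is "heads", and the function name `relative_frequency` and the fixed seed are choices made here for illustration, not part of the lecture.

```python
import random

def relative_frequency(n, seed=0):
    """Relative frequency of the event 'heads' in the first n simulated tosses."""
    rng = random.Random(seed)  # fixed seed so that a run is reproducible
    # Count how often the event occurred (assumption: a fair coin, p = 0.5).
    occurrences = sum(1 for _ in range(n) if rng.random() < 0.5)
    return occurrences / n

# The relative frequency should settle near 0.5 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency(n))
```

For small n the value can be far from 0.5; for large n it stays close to it, which is the behavior the slide describes.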

 10. Intuitions about probability
     I. Since 0 ≤ f_n(A) ≤ n we have 0 ≤ r_n(A) ≤ 1. Thus the probability of A should be in [0, 1].
     II. f_n(∅) = 0 and f_n(Everything) = n. Thus the probability of ∅ should be 0 and the probability of Everything should be 1.
     III. Let B be Everything except A. Then f_n(A) + f_n(B) = n and r_n(A) + r_n(B) = 1. Thus the probability of A plus the probability of B should be 1.
     IV. Let A ⊆ B. Then r_n(A) ≤ r_n(B) and thus the probability of A should be no bigger than that of B.
     V. Let A ∩ B = ∅ and C = A ∪ B. Then r_n(C) = r_n(A) + r_n(B). Thus the probability of C should be the probability of A plus the probability of B.
     VI. Let C = A ∪ B. Then f_n(C) ≤ f_n(A) + f_n(B) and r_n(C) ≤ r_n(A) + r_n(B). Thus the probability of C should be at most the sum of the probabilities of A and B.
     VII. Let C = A ∪ B and D = A ∩ B. Then f_n(C) = f_n(A) + f_n(B) − f_n(D) and thus the probability of C should be the probability of A plus the probability of B minus the probability of D.
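The counting arguments behind these intuitions can be checked on recorded outcomes. The sketch below assumes a hypothetical experiment of 1000 simulated die rolls; the events `even` and `high` and the helper `count` are illustrative names, not taken from the lecture.

```python
import random

rng = random.Random(1)
rolls = [rng.randint(1, 6) for _ in range(1000)]  # hypothetical experiment: 1000 die rolls

def count(event):
    """Number of repetitions in which the event (a set of outcomes) occurred."""
    return sum(1 for r in rolls if r in event)

even = {2, 4, 6}      # event: the roll is even
high = {4, 5, 6}      # event: the roll is greater than 3
union = even | high   # either event occurred
both = even & high    # both events occurred

n = len(rolls)
# Intuition VII: the count of the union equals the sum of the counts minus the overlap.
print(count(union), count(even) + count(high) - count(both))
# Intuition II: the impossible event never occurs, the certain event always occurs.
print(count(set()), count({1, 2, 3, 4, 5, 6}) == n)
```

Because the identities hold for the raw counts in every run, they also hold for the relative frequencies after dividing by n.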

 11. The probability space
     A probability space is a tuple (Ω, F, P) where:
     • Ω is the sample space or set of all elementary events
     • F is the set of events (for our purposes, we can consider F to be the set of all subsets of Ω)
     • P is the probability function
     Note: We often use logical formulas to describe events: Sunny ∧ ¬Freezing

 12. Kolmogorov’s axioms
     Kolmogorov formulated three axioms that the probability function P must satisfy. The rest of probability theory can be built from these axioms.
     1. A1: For any event A, there is a nonnegative real number P(A).
     2. A2: P(Ω) = 1, where Ω is the sample space.
     3. A3: Let A_1, A_2, ... be a collection of pairwise disjoint events. Let A be their union. Then P(A) = P(A_1) + P(A_2) + ...
     These axioms are often called Kolmogorov’s axioms in honor of the Russian mathematician Andrei Kolmogorov.

 13. Kolmogorov’s axioms
     Kolmogorov’s axioms state which properties a probability function has to satisfy; however, they do not say how to calculate the probabilities of events.
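One common way to assign probabilities, when all elementary events are assumed equally likely, is to divide the size of an event by the size of the sample space. The sketch below checks the three axioms for that choice on a single die roll. The die example, the uniform assignment, and the names `omega`, `P`, and `all_events` are assumptions made here for illustration; A3 is checked only for pairs of disjoint events.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # assumed sample space: one roll of a six-sided die

def P(event):
    """Uniform probability: each elementary event carries the same weight."""
    return Fraction(len(event & omega), len(omega))

def all_events(space):
    """All subsets of the sample space (the set of events for a finite space)."""
    items = sorted(space)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

events = all_events(omega)

a1 = all(P(e) >= 0 for e in events)   # A1: nonnegativity
a2 = P(omega) == 1                    # A2: the whole sample space has probability 1
a3 = all(P(a | b) == P(a) + P(b)      # A3: additivity for disjoint events (pairs only)
         for a in events for b in events if not (a & b))
print(a1, a2, a3)
```

Other assignments, such as a loaded die, satisfy the same axioms, which is why the axioms alone do not determine the numbers.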

 14. Flipping coins
     Consider the random experiment of flipping a coin two times, one after the other.

 15. Drawing from an urn
     Consider the random experiment of drawing two balls, one after the other, from an urn that contains a red (R), a blue (B), and a green (G) ball.

 16. Independent events
     The difference between the two examples is that in the first one, the outcomes of the two steps are independent, while in the second they are not.
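The difference can be checked by enumerating both sample spaces and testing whether P(A ∧ B) = P(A) · P(B). This is a sketch under assumptions not stated on the slides: the coin is fair, every outcome is equally likely, and the first ball is not put back into the urn. The helper names `prob` and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

def prob(space, event):
    """Probability of an event (a predicate on outcomes), all outcomes equally likely."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def independent(space, a, b):
    """True if P(a and b) equals P(a) * P(b)."""
    return prob(space, lambda w: a(w) and b(w)) == prob(space, a) * prob(space, b)

coins = list(product("HT", repeat=2))   # (first flip, second flip): 4 outcomes
urn = list(permutations("RBG", 2))      # (first ball, second ball), no replacement: 6 outcomes

# Coins: "first flip is heads" vs. "second flip is heads".
coin_result = independent(coins, lambda w: w[0] == "H", lambda w: w[1] == "H")
# Urn: "first ball is red" vs. "second ball is blue".
urn_result = independent(urn, lambda w: w[0] == "R", lambda w: w[1] == "B")
print(coin_result, urn_result)
```

For the coins the product test succeeds (1/4 = 1/2 · 1/2); for the urn it fails (1/6 ≠ 1/3 · 1/3), because removing the first ball changes what can be drawn second.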

 17. Conditional probability

 18. Flipping coins
     What is the probability of the second throw resulting in a head given that the first one results in a head?

 19. Drawing from an urn
     What is the probability of the second ball being blue given that the first one is red?
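Both questions can be answered by counting, using the standard definition P(A | B) = P(A ∧ B) / P(B). The sketch assumes a fair coin, equally likely outcomes, and drawing without replacement; the helper name `conditional` is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

def conditional(space, a, b):
    """P(a | b) = P(a and b) / P(b), with all outcomes in the space equally likely."""
    n_b = sum(1 for w in space if b(w))
    n_ab = sum(1 for w in space if a(w) and b(w))
    return Fraction(n_ab, n_b)

coins = list(product("HT", repeat=2))   # (first flip, second flip)
urn = list(permutations("RBG", 2))      # (first ball, second ball), no replacement

# Second throw is a head, given that the first throw is a head.
p_coin = conditional(coins, lambda w: w[1] == "H", lambda w: w[0] == "H")
# Second ball is blue, given that the first ball is red.
p_urn = conditional(urn, lambda w: w[1] == "B", lambda w: w[0] == "R")
print(p_coin, p_urn)
```

Under these assumptions both answers are 1/2. For the coins this matches the unconditional probability of a head, whereas for the urn the unconditional probability of the second ball being blue is 1/3, so knowing the first ball changes the answer.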

 20. The product rule
     If we rewrite the definition of conditional probability, we get the product rule.
     Conditional probability: P(A | B) = P(A ∧ B) / P(B), provided P(B) > 0
     Product rule: P(A ∧ B) = P(A | B) P(B)

 21. Bayes’ rule

 22. Bayes’ rule example
     Meningitis causes stiff necks with probability 0.5. The prior probability of having meningitis is 0.00002. The prior probability of having a stiff neck is 0.05. What is the probability of having meningitis given that you have a stiff neck?
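The question can be answered with Bayes’ rule in its standard form, P(A | B) = P(B | A) P(A) / P(B). The sketch below plugs in the three numbers from the slide; the variable names are illustrative.

```python
# Numbers taken from the slide.
p_stiff_given_men = 0.5   # P(stiff neck | meningitis)
p_men = 0.00002           # prior P(meningitis)
p_stiff = 0.05            # prior P(stiff neck)

# Bayes' rule: P(meningitis | stiff neck) = P(stiff | men) * P(men) / P(stiff)
p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)
```

The result is about 0.0002, i.e., roughly 1 in 5000 people with a stiff neck would have meningitis under these numbers. The posterior stays small because the prior is so low.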

 23. When is Bayes’ Rule Useful?
     • Sometimes it’s easier to get P(X | Y) than P(Y | X).
     • Information is typically available in the form P(effect | cause) rather than P(cause | effect).
     • P(effect | cause) quantifies the relationship in the causal direction, whereas P(cause | effect) describes the diagnostic direction.
     • For example, P(symptom | disease) is easy to measure empirically, but obtaining P(disease | symptom) is harder.

 24. How is Bayes’ Rule Used
     In machine learning, we use Bayes’ rule in the following way:
     posterior probability = (likelihood of the data × prior probability) / probability of the data
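A minimal sketch of this use of Bayes’ rule follows: given a prior over hypotheses and the likelihood of the observed data under each hypothesis, it returns the posterior. The two coin hypotheses, their numbers, and the function name `posterior` are hypothetical and chosen only for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(h | data) for each hypothesis h, given P(h) and P(data | h)."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(unnormalized.values())  # P(data), summed over all hypotheses
    return {h: v / evidence for h, v in unnormalized.items()}

# Hypothetical example: is a coin fair or biased towards heads (P(heads) = 0.9)?
# Observed data: three heads in a row.
priors = {"fair": 0.9, "biased": 0.1}
likelihoods = {"fair": 0.5 ** 3, "biased": 0.9 ** 3}

post = posterior(priors, likelihoods)
print(post)
```

With these made-up numbers the data shift belief towards the biased coin (from 0.1 to about 0.39), but the fair coin remains more probable because of its strong prior.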

 25. Probability distributions
     For random variables with finite domains, the probability distribution simply defines the probability of the variable taking on each of the different values. For instance, 𝐏(Weather) gives one probability for each state of the weather.
     • The bold 𝐏 indicates that the result is a vector of numbers representing the probabilities of each individual state of weather, where we assume a predefined ordering.
     • Because a probability distribution represents a normalized frequency distribution, the probabilities must sum to 1.
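As a small illustration, the DrinkSize probabilities from slide 7 can be stored as a distribution and checked for normalization. The dictionary layout, the chosen ordering, and the helper name `is_distribution` are assumptions of this sketch.

```python
import math

# Values taken from the DrinkSize example on slide 7.
drink_size = {"Small": 0.1, "Medium": 0.2, "Large": 0.7}

def is_distribution(dist, tol=1e-9):
    """True if all entries are nonnegative and they sum to 1 (up to rounding)."""
    total = sum(dist.values())
    return all(p >= 0 for p in dist.values()) and math.isclose(total, 1.0, abs_tol=tol)

ordering = ["Small", "Medium", "Large"]       # the assumed predefined ordering
vector = [drink_size[v] for v in ordering]    # the distribution written as a vector
print(vector, is_distribution(drink_size))
```

The tolerance is needed because floating-point sums such as 0.1 + 0.2 + 0.7 are not guaranteed to equal 1 exactly.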

 26. P notation and Conditional Distributions

 27. Possible worlds and full joint distributions

 28. Full Joint Probability Distributions

     Toothache  Cavity  Catch  Probability
     false      false   false  0.576
     false      false   true   0.144
     false      true    false  0.008
     false      true    true   0.072
     true       false   false  0.064
     true       false   true   0.016
     true       true    false  0.012
     true       true    true   0.108

     The last row means P(Toothache = true, Cavity = true, Catch = true) = 0.108

 29. Joint Probability Distribution
     A full joint probability distribution is very powerful: it can be used to answer any probabilistic query involving the three random variables.
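A sketch of how such queries can be answered from the table on slide 28: sum the entries of all possible worlds in which the queried event holds, and divide two such sums for a conditional query. The dictionary layout and the helper name `prob` are assumptions of this sketch; the numbers are the ones from the table.

```python
# Full joint distribution from slide 28, keyed by (Toothache, Cavity, Catch).
joint = {
    (False, False, False): 0.576,
    (False, False, True): 0.144,
    (False, True, False): 0.008,
    (False, True, True): 0.072,
    (True, False, False): 0.064,
    (True, False, True): 0.016,
    (True, True, False): 0.012,
    (True, True, True): 0.108,
}

def prob(event):
    """Sum the probabilities of all possible worlds in which the event holds."""
    return sum(p for world, p in joint.items() if event(*world))

# Marginal probabilities: sum out the other two variables.
p_cavity = prob(lambda toothache, cavity, catch: cavity)
p_toothache = prob(lambda toothache, cavity, catch: toothache)
# Conditional query: P(Cavity | Toothache) = P(Cavity and Toothache) / P(Toothache).
p_cavity_given_toothache = prob(lambda t, c, k: c and t) / p_toothache

print(round(p_cavity, 3), round(p_toothache, 3), round(p_cavity_given_toothache, 3))
```

With the table above this gives P(Cavity) = 0.2, P(Toothache) = 0.2, and P(Cavity | Toothache) = 0.6. The approach needs one table entry per possible world, so the table doubles in size with every additional Boolean variable.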
