statistical reasoning



  1. Statistical Reasoning (Chapter 8)
     • Probability and Bayes' Theorem
     • Certainty Factors and Rule-Based Systems
     • Bayesian Networks
     • Dempster-Shafer Theory
     • Fuzzy Logic

  2. Probability and Bayes' Theorem
     • $P(H_i \mid E)$ = the probability that hypothesis $H_i$ is true given evidence $E$
     • $P(E \mid H_i)$ = the probability that we will observe evidence $E$ given that hypothesis $H_i$ is true
     • $P(H_i)$ = the a priori probability that hypothesis $H_i$ is true in the absence of any specific evidence. These probabilities are called prior probabilities, or priors
     • $k$ = the number of possible hypotheses

     Bayes' theorem then states that

     $$P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{\sum_{n=1}^{k} P(E \mid H_n)\, P(H_n)}$$
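
The theorem can be tried out numerically. The following is a minimal sketch, not part of the original slides: the function name `posterior` and the three-hypothesis numbers are made up for illustration, and the hypotheses are assumed to be mutually exclusive and exhaustive so that the denominator sums over all of them.

```python
from typing import List, Sequence


def posterior(priors: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Return [P(H_1|E), ..., P(H_k|E)] given P(H_i) and P(E|H_i)."""
    if len(priors) != len(likelihoods):
        raise ValueError("need exactly one likelihood per hypothesis")
    # Numerators: P(E|H_i) * P(H_i) for each hypothesis.
    joint = [lik * pri for pri, lik in zip(priors, likelihoods)]
    # Denominator: the sum of the numerators over all k hypotheses.
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical values, chosen only to illustrate the arithmetic.
priors = [0.7, 0.2, 0.1]        # P(H_1), P(H_2), P(H_3)
likelihoods = [0.1, 0.5, 0.9]   # P(E|H_1), P(E|H_2), P(E|H_3)
post = posterior(priors, likelihoods)
print([round(p, 3) for p in post])  # [0.269, 0.385, 0.346]
```

Note how the hypothesis with the largest prior ends up with the smallest posterior, because the evidence is unlikely under it.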

  3. In real-life problems we have several pieces of evidence that are not independent.
     Example:
     • S: patient has spots
     • M: patient has measles
     • F: patient has high fever

     Spots and fever are not independent events, so we cannot just sum their effects. We need to represent explicitly the conditional probability that arises from their conjunction.
     In general, given a prior body of evidence $e$ and some new observation $E$, we need to compute

     $$P(H \mid E, e) = P(H \mid E) \cdot \frac{P(e \mid E, H)}{P(e \mid E)}$$

     The size of the set of joint probabilities required to compute this function grows as $2^n$ if there are $n$ different propositions being considered.
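
The update formula follows from the definition of conditional probability and the chain rule. The steps below are a short derivation sketch, assuming only that $P(E, e) > 0$; they are not taken from the slides.

```latex
\begin{align*}
P(H \mid E, e)
  &= \frac{P(H, E, e)}{P(E, e)} \\
  &= \frac{P(e \mid E, H)\, P(H \mid E)\, P(E)}{P(e \mid E)\, P(E)} \\
  &= P(H \mid E) \cdot \frac{P(e \mid E, H)}{P(e \mid E)}
\end{align*}
```

The factor $P(e \mid E, H)$ is where the dependence between pieces of evidence enters: it has to be known for every combination of evidence, which is the source of the $2^n$ growth.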

  4. Bayes' theorem is intractable for several reasons:
     • The knowledge acquisition problem is insurmountable
     • The space that would be required to store all the probabilities is too large
     • The time required to compute the probabilities is too large

  5. Certainty Factors and Rule-Based Systems
     A practical compromise on a pure Bayesian system was pioneered in the MYCIN system.
     Example:
     If: (1) the stain of the organism is gram-positive, and
         (2) the morphology of the organism is coccus, and
         (3) the growth conformation of the organism is clumps,
     then there is suggestive evidence (0.7) that the identity of the organism is staphylococcus.

  6. Basic Definitions
     • $MB[h, e]$: a measure (between 0 and 1) of belief in hypothesis $h$ given the evidence $e$
     • $MD[h, e]$: a measure (between 0 and 1) of disbelief in hypothesis $h$ given the evidence $e$
     • $CF[h, e]$: the certainty factor, defined as $CF[h, e] = MB[h, e] - MD[h, e]$
     • Since any particular piece of evidence either supports or denies a hypothesis, a single number suffices to define both MB and MD, and thus the CF

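  7. Combination of Multiple Pieces of Evidence
     [Figure: observations A and B each bear on the same hypothesis C]

     $$MB[h, s_1 \wedge s_2] = \begin{cases} 0 & \text{if } MD[h, s_1 \wedge s_2] = 1 \\ MB[h, s_1] + MB[h, s_2]\,(1 - MB[h, s_1]) & \text{otherwise} \end{cases}$$

     $$MD[h, s_1 \wedge s_2] = \begin{cases} 0 & \text{if } MB[h, s_1 \wedge s_2] = 1 \\ MD[h, s_1] + MD[h, s_2]\,(1 - MD[h, s_1]) & \text{otherwise} \end{cases}$$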
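
The two combination formulas share one pattern: the second observation closes a fraction of whatever gap to 1 the first one left. The sketch below is one possible reading, not MYCIN's actual code. The function names and the input values are hypothetical, and the boundary cases are applied to the accumulated values, which is an assumption about how to resolve the mutually referring "if MD = 1" and "if MB = 1" conditions.

```python
from typing import Tuple


def accumulate(x1: float, x2: float) -> float:
    """x1 + x2 * (1 - x1): the second item covers part of the remaining gap to 1."""
    return x1 + x2 * (1.0 - x1)


def combine_evidence(mb1: float, md1: float, mb2: float, md2: float) -> Tuple[float, float, float]:
    """Return (MB, MD, CF) for hypothesis h given observations s1 and s2."""
    mb = accumulate(mb1, mb2)
    md = accumulate(md1, md2)
    if mb == 1.0 and md == 1.0:
        # Assumption: certain belief together with certain disbelief is an error.
        raise ValueError("certain belief and certain disbelief contradict each other")
    if md == 1.0:
        mb = 0.0   # certain disbelief wipes out belief
    elif mb == 1.0:
        md = 0.0   # certain belief wipes out disbelief
    return mb, md, mb - md   # CF = MB - MD


# Hypothetical inputs: s1 supports h with 0.3, s2 supports h with 0.2, no disbelief.
mb, md, cf = combine_evidence(0.3, 0.0, 0.2, 0.0)
print(round(mb, 2), round(md, 2), round(cf, 2))  # 0.44 0.0 0.44
```

The combined belief 0.44 is larger than either input but smaller than their sum, and the result does not depend on the order in which the two observations arrive.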

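  8. Combination of Multiple Pieces of Evidence (continued)
     • A and B: $MB[h_1 \wedge h_2, e] = \min(MB[h_1, e], MB[h_2, e])$
     • A or B: $MB[h_1 \vee h_2, e] = \max(MB[h_1, e], MB[h_2, e])$
     • Chained rules (A leads to B, B leads to C): $MB[h, s] = MB'[h, s] \cdot \max(0, CF[s, e])$, where $MB'[h, s]$ is the belief in $h$ when $s$ is known with certainty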
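
These three rules are one-liners in code. The sketch below uses hypothetical function names and input values; `mb_if_certain` stands for the belief in h that the rule would give if its premise s were known with certainty.

```python
def mb_and(mb_h1: float, mb_h2: float) -> float:
    """Belief in a conjunction: as strong as its weakest part."""
    return min(mb_h1, mb_h2)


def mb_or(mb_h1: float, mb_h2: float) -> float:
    """Belief in a disjunction: as strong as its strongest part."""
    return max(mb_h1, mb_h2)


def mb_chained(mb_if_certain: float, cf_s_given_e: float) -> float:
    """Belief in h when the premise s is itself uncertain.

    The rule strength is scaled by CF[s, e]; a negative CF (s is
    disbelieved) contributes nothing rather than negative belief.
    """
    return mb_if_certain * max(0.0, cf_s_given_e)


# Hypothetical values for illustration.
print(mb_and(0.6, 0.9))                   # 0.6
print(mb_or(0.6, 0.9))                    # 0.9
print(round(mb_chained(0.7, 0.5), 2))     # 0.35
print(mb_chained(0.7, -0.4))              # 0.0
```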

  9. Advantages
     • The approach makes strong independence assumptions that make it relatively easy to use
     • The approach can serve as the basis of practical application programs
     • It appears to mimic the way people manipulate certainties

     Disadvantages
     • The independence assumption is dangerous if rules are not written carefully
     • No solid theoretical basis

  10. Dempster-Shafer Theory
     • This approach considers sets of propositions and assigns to each of them an interval: [Belief, Plausibility]
     • Belief (Bel) measures the strength of the evidence in favor of a set of propositions. It ranges from 0 (no evidence) to 1 (certainty)
     • Plausibility (Pl) measures the extent to which evidence in favor of $\neg s$ leaves room for belief in $s$. It also ranges from 0 to 1 and is defined as $Pl(s) = 1 - Bel(\neg s)$
     • $\Theta$ is an exhaustive universe of mutually exclusive hypotheses (the frame of discernment)

  11. • $m(p)$ measures the amount of belief that is currently assigned to exactly the set $p$ of hypotheses
     • If $\Theta$ contains $n$ elements, then there are $2^n$ subsets of $\Theta$. We must assign $m$ so that the sum of all the $m$ values assigned to the subsets of $\Theta$ is 1
     • Although dealing with $2^n$ values may appear intractable, it usually turns out that many of the subsets will never need to be considered

     Suppose we are given two belief functions $m_1$ and $m_2$. Let $X$ be the set of subsets of $\Theta$ to which $m_1$ assigns a nonzero value and let $Y$ be the corresponding set for $m_2$. We define the combination $m_3$ of $m_1$ and $m_2$ to be:

     $$m_3(Z) = \frac{\sum_{X \cap Y = Z} m_1(X)\, m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)}$$
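
The [Belief, Plausibility] interval can be read off a mass assignment m. The sketch below rests on one assumption that the slides do not state: Bel(s) is taken to be the sum of m(p) over all subsets p of s, which is the usual definition. Pl(s) = 1 - Bel(¬s) is used as given. The function names are hypothetical, and the sample masses are the post-fever assignment from the allergy/flu/cold/pneumonia example.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def belief(m: Mass, s: FrozenSet[str]) -> float:
    """Bel(s): total mass committed to subsets of s (assumed standard definition)."""
    return sum(value for p, value in m.items() if p <= s)


def plausibility(m: Mass, s: FrozenSet[str], theta: FrozenSet[str]) -> float:
    """Pl(s) = 1 - Bel(not s), where not s is the complement of s in theta."""
    return 1.0 - belief(m, theta - s)


theta = frozenset("AFCP")                    # allergy, flu, cold, pneumonia
m1 = {frozenset("FCP"): 0.6, theta: 0.4}     # belief after observing fever

allergy = frozenset("A")
interval = (belief(m1, allergy), round(plausibility(m1, allergy, theta), 3))
print(interval)  # (0, 0.4)
```

No mass is committed to allergy itself, so its belief is 0, but only 0.6 of the mass rules it out, which leaves a plausibility of 0.4.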

  12. Example
     Assume that Θ = {A, F, C, P}, where A: allergy, F: flu, C: cold, P: pneumonia.
     Our measure of belief before observing any symptom is: m(Θ) = 1.0

     Suppose that m1 corresponds to our belief after observing fever:
     m1({F,C,P}) = 0.6
     m1(Θ) = 0.4

     Suppose that m2 corresponds to our belief after observing a runny nose:
     m2({A,F,C}) = 0.8
     m2(Θ) = 0.2

     Computing the combination m3:

     |                   | m2({A,F,C}) = 0.8  | m2(Θ) = 0.2        |
     |-------------------|--------------------|--------------------|
     | m1({F,C,P}) = 0.6 | m3({F,C}) = 0.48   | m3({F,C,P}) = 0.12 |
     | m1(Θ) = 0.4       | m3({A,F,C}) = 0.32 | m3(Θ) = 0.08       |

  13. Now let m4 correspond to our belief given just the evidence that the problem goes away when the patient goes on a trip:
     m4({A}) = 0.9
     m4(Θ) = 0.1

     Combining m3 with m4:

     |                    | m4({A}) = 0.9 | m4(Θ) = 0.1    |
     |--------------------|---------------|----------------|
     | m3({F,C}) = 0.48   | ∅: 0.432      | {F,C}: 0.048   |
     | m3({A,F,C}) = 0.32 | {A}: 0.288    | {A,F,C}: 0.032 |
     | m3({F,C,P}) = 0.12 | ∅: 0.108      | {F,C,P}: 0.012 |
     | m3(Θ) = 0.08       | {A}: 0.072    | Θ: 0.008       |

     But there is now a total belief of 0.54 associated with ∅; only 0.46 is associated with outcomes that are in fact possible. So we need to divide the remaining values by the factor 1 - 0.54 = 0.46. Then m5 is:
     m5({F,C}) = 0.104
     m5({A,F,C}) = 0.070
     m5({F,C,P}) = 0.026
     m5({A}) = 0.783
     m5(Θ) = 0.017
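
The whole worked example can be reproduced in a few lines. This is a minimal sketch of the combination rule, assuming mass functions are stored as dictionaries from frozensets of hypothesis letters to masses; the function name `combine` and that representation are choices made here, not something the slides prescribe.

```python
from collections import defaultdict
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with the Dempster-Shafer combination rule."""
    raw = defaultdict(float)
    conflict = 0.0
    for x, mass_x in m1.items():
        for y, mass_y in m2.items():
            z = x & y                      # set intersection
            if z:
                raw[z] += mass_x * mass_y
            else:
                conflict += mass_x * mass_y  # mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("the two mass functions are in total conflict")
    # Rescale so that the masses on the possible outcomes sum to 1.
    return {z: mass / (1.0 - conflict) for z, mass in raw.items()}


theta = frozenset("AFCP")                    # allergy, flu, cold, pneumonia
m1 = {frozenset("FCP"): 0.6, theta: 0.4}     # first symptom
m2 = {frozenset("AFC"): 0.8, theta: 0.2}     # second symptom
m4 = {frozenset("A"): 0.9, theta: 0.1}       # problem goes away on a trip

m3 = combine(m1, m2)   # no conflict: {F,C} 0.48, {A,F,C} 0.32, {F,C,P} 0.12, Θ 0.08
m5 = combine(m3, m4)   # conflict 0.54: {A} 0.783, {F,C} 0.104, {A,F,C} 0.070, ...

for subset, mass in sorted(m5.items(), key=lambda item: -item[1]):
    print("{" + ",".join(sorted(subset)) + "}", round(mass, 3))
```

The first combination has no conflict, so nothing is rescaled; in the second, 0.54 of the mass lands on the empty set and the remaining values are divided by 0.46.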
