  1. Applications of Bayesian networks. Jiří Vomlel, Laboratory for Intelligent Systems, University of Economics, Prague, and Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic. This presentation is available from http://www.utia.cas.cz/vomlel/

  2. Contents: • Bayesian networks as a model for reasoning with uncertainty • Building probabilistic models • Building “good” strategies using the models • Application 1: Adaptive testing • Application 2: Decision-theoretic troubleshooting

  3. Independence If two discrete random variables X and Y are independent, the probability of a joint occurrence of their values equals the product of the individual probabilities: P ( X = x , Y = y ) = P ( X = x ) · P ( Y = y ) . Equivalently, P ( X = x | Y = y ) = P ( X = x ) : learning the value of Y does not influence your belief about X. Example: two_coins.net
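A minimal sketch of this check in plain Python; the joint table is an assumed stand-in for two_coins.net (two fair, independent coins):

```python
from itertools import product

# Joint distribution of two coin tosses X and Y (illustrative numbers).
P = {("heads", "heads"): 0.25, ("heads", "tails"): 0.25,
     ("tails", "heads"): 0.25, ("tails", "tails"): 0.25}

# Marginals obtained by summing the joint over the other variable.
P_X = {x: sum(p for (xi, _), p in P.items() if xi == x) for x in ("heads", "tails")}
P_Y = {y: sum(p for (_, yi), p in P.items() if yi == y) for y in ("heads", "tails")}

# Independence: every joint entry equals the product of the marginals.
independent = all(abs(P[(x, y)] - P_X[x] * P_Y[y]) < 1e-12
                  for x, y in product(P_X, P_Y))
print(independent)  # True
```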

  4. Conditional independence Two variables X and Y are conditionally independent given Z if the conditional probability of their joint occurrence given the value of Z equals the product of the conditional probabilities: P ( X = x , Y = y | Z = z ) = P ( X = x | Z = z ) · P ( Y = y | Z = z ) . • Learning the value of Z may influence your belief about X and about Y, • but once you know the value of Z, learning the value of Y does not influence your belief about X: P ( X = x | Y = y , Z = z ) = P ( X = x | Z = z ) . Example: two_biased_coins.net
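A small sketch of the biased-coins situation; the prior and bias values are assumptions, not the contents of two_biased_coins.net:

```python
# Z chooses one of two coins; X and Y are two tosses of the chosen coin.
P_Z = {"coin1": 0.5, "coin2": 0.5}
bias = {"coin1": 0.9, "coin2": 0.1}       # P(toss = heads | Z), assumed values

def p_toss(outcome, z):
    return bias[z] if outcome == "heads" else 1.0 - bias[z]

# Marginally, X and Y are dependent: observing Y = heads shifts belief
# toward the heads-biased coin and hence raises P(X = heads).
p_y = sum(P_Z[z] * p_toss("heads", z) for z in P_Z)        # P(Y = h) = 0.5
p_xy = sum(P_Z[z] * p_toss("heads", z) ** 2 for z in P_Z)  # P(X = h, Y = h) = 0.41
print(p_xy / p_y)   # P(X = h | Y = h) = 0.82, not equal to P(X = h) = 0.5

# Given Z, the tosses are independent by construction:
# P(X = h | Y = h, Z = z) = P(X = h | Z = z) = bias[z].
```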

  5. Pearl on Conditional independence (Pearl, 1988, p. 44) • Conditional independence is not a grace of nature for which we must wait passively, but rather a psychological necessity which we satisfy actively by organizing our knowledge in a specific way. • An important tool in such organization is the identification of intermediate variables that induce conditional independence among observables; if they are not in our vocabulary, we create them. In medical diagnosis when some symptoms directly influence one another, the medical profession invents a name for that interaction (e.g., “syndrome”, “complication,” “pathological state”) and treats it as a new auxiliary variable that induces conditional independence; • dependency between any two interacting systems is fully attributed to the dependencies of each on the auxiliary variable.

  6. Building up complex networks • Relationships among many variables are modeled in terms of important relationships among smaller subsets of variables. Example: Wet grass on Holmes’ lawn can be caused either by rain or by his sprinkler. By the chain rule and the conditional independencies of the domain: P ( Holmes , Watson , Rain , Sprinkler ) = P ( Holm | Wat , Rn , Sprnk ) · P ( Wat | Rn , Sprnk ) · P ( Rn | Sprnk ) · P ( Sprnk ) = P ( Holm | Rn , Sprnk ) · P ( Wat | Rn ) · P ( Rn ) · P ( Sprnk ) Example: wet_grass.net
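A sketch of how this factorization supports inference; the CPT numbers and function names are illustrative assumptions, not the values in wet_grass.net:

```python
from itertools import product

P_Rain = {True: 0.2, False: 0.8}
P_Sprinkler = {True: 0.1, False: 0.9}

def p_watson(wet, rain):                 # P(Watson | Rain)
    p = 0.95 if rain else 0.05
    return p if wet else 1.0 - p

def p_holmes(wet, rain, sprinkler):      # P(Holmes | Rain, Sprinkler)
    p = 0.99 if (rain or sprinkler) else 0.02
    return p if wet else 1.0 - p

def joint(h, w, r, s):
    # P(H, W, R, S) = P(H | R, S) · P(W | R) · P(R) · P(S)
    return p_holmes(h, r, s) * p_watson(w, r) * P_Rain[r] * P_Sprinkler[s]

# P(Rain | Holmes' grass is wet), by summing out the hidden variables.
num = sum(joint(True, w, True, s) for w, s in product((True, False), repeat=2))
den = sum(joint(True, w, r, s) for w, r, s in product((True, False), repeat=3))
print(num / den)   # posterior belief in rain given the observation
```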

  7. Building up complex Bayesian networks • Acyclic directed graphs (DAGs): • Nodes correspond to variables. • Directed edges represent explicit dependence relationships. • The absence of an edge means no explicit dependence, although there can still be dependence mediated through other variables. Example: asia.net

  8. Building Bayesian network models: three basic approaches • Discussions with domain experts: expert knowledge is used to obtain the structure and parameters of the model. • A dataset of records is collected and a machine learning method is used to construct a model and estimate its parameters. • A combination of the previous two: e.g., experts help with the structure, and data are used to estimate the parameters.

  9. Typical tasks solved using Bayesian networks Bayesian networks are used: • to model and explain a domain, • to update beliefs about the states of certain variables when other variables have been observed, i.e., to compute conditional probability distributions such as P ( X 23 | X 17 = yes , X 54 = no ) , • to find the most probable configurations of variables, • to support decision making under uncertainty, • to find good strategies for solving tasks in a domain with uncertainty.

  10. Example of a strategy [Figure: a strategy drawn as a sequence of yes/no questions: X 1 : 1/5 < 2/5 ? , X 2 : 1/5 < 1/4 ? , X 3 : 1/4 < 2/5 ? , with a branch for each answer.] X 3 is a more difficult question than X 2 , which in turn is more difficult than X 1 .

  11. Building strategies using the models For each terminal node ℓ ∈ L ( s ) of a strategy s we have defined: • the steps performed to reach that node, together with their outcomes; this is called the collected evidence e ℓ . • Using the probabilistic model of the domain we can compute the probability of reaching that terminal node, P ( e ℓ ) . During the process of collecting evidence e we update the probability of reaching a terminal node, which corresponds to the conditional probability P ( e ℓ | e ) , where e is the evidence collected so far.

  12. Building strategies using the models For all terminal nodes ℓ ∈ L ( s ) of a strategy s we have also defined: • an evaluation function f : ∪_{s ∈ S} L ( s ) → R . For each strategy we can compute: • the expected value of the strategy: E f ( s ) = ∑_{ℓ ∈ L ( s )} P ( e ℓ ) · f ( e ℓ ) The goal: • find a strategy that maximizes (or minimizes) its expected value.
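A minimal sketch of the expected-value computation, with made-up leaf probabilities and values:

```python
# Terminal nodes of a hypothetical strategy: the probability P(e_l) of
# reaching each leaf and the value f(e_l) assigned to it.
leaves = [(0.3, 1.0), (0.5, 0.4), (0.2, 0.0)]

# E f(s) = sum over leaves of P(e_l) * f(e_l)
expected_value = sum(p * f for p, f in leaves)
print(expected_value)   # 0.3*1.0 + 0.5*0.4 + 0.2*0.0 = 0.5
```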

  13. Using entropy as an information measure “The lower the entropy of a probability distribution, the more we know.” [Plot: entropy of a binary distribution as a function of the probability of one outcome; it is 0 at probabilities 0 and 1 and peaks at 0.5.] H ( P ( X )) = − ∑_x P ( X = x ) · log P ( X = x )
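A small sketch of the entropy computation (base-2 logarithm assumed):

```python
import math

def entropy(dist):
    # H(P(X)) = -sum_x P(X = x) * log P(X = x); terms with P = 0 are
    # dropped, since 0 · log 0 is taken as 0.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0.0)

print(entropy({"yes": 0.5, "no": 0.5}))   # 1.0: maximal uncertainty
print(entropy({"yes": 0.9, "no": 0.1}))   # ~0.469: we know more
print(entropy({"yes": 1.0, "no": 0.0}))   # 0.0: complete certainty
```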

  14. [Figure: a test drawn as a tree of questions X 1 , X 2 , X 3 branching on their answers.] Entropy in node n : H ( e n ) = H ( P ( S | e n )) . Expected entropy at the end of test t : E H ( t ) = ∑_{ℓ ∈ L ( t )} P ( e ℓ ) · H ( e ℓ )

  15. T ... the set of all possible tests. A test t ⋆ is optimal iff t ⋆ = arg min_{t ∈ T} E H ( t ) . A test t is myopically optimal iff each question X ⋆ of t minimizes the expected value of entropy after the question is answered: X ⋆ = arg min_{X ∈ X} E H ( t ↓ X ) , i.e., it works as if the test finished after the selected question X ⋆ .
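A sketch of myopically optimal question selection; the model object and its methods (outcomes, prob, skill_entropy) are hypothetical stand-ins for inference in the student model:

```python
def expected_entropy_after(question, evidence, model):
    # E H(t down-arrow X): average the skill entropy over the possible
    # answers, weighted by their predictive probabilities.
    total = 0.0
    for answer in model.outcomes(question):
        p = model.prob(question, answer, evidence)        # P(X = a | e)
        total += p * model.skill_entropy({**evidence, question: answer})
    return total

def myopic_next_question(questions, evidence, model):
    # Pick the question minimizing expected entropy of the skills.
    return min(questions,
               key=lambda q: expected_entropy_after(q, evidence, model))
```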

  16. Application 1: Adaptive test of basic operations with fractions Examples of tasks:
  T 1 : ( 3/4 · 5/6 ) − 1/8 = 15/24 − 1/8 = 5/8 − 1/8 = 4/8 = 1/2
  T 2 : 1/6 + 1/12 = 2/12 + 1/12 = 3/12 = 1/4
  T 3 : 1/4 · 1 1/2 = 1/4 · 3/2 = 3/8
  T 4 : ( 1/2 · 1/3 ) + ( 1/4 · 2/3 ) = 1/6 + 2/12 = 1/6 + 1/6 = 1/3

  17. Elementary and operational skills
  CP Comparison (common numerator or denominator): 1/2 > 1/3 , 2/3 > 1/3
  AD Addition (common denominator): 1/7 + 2/7 = (1 + 2)/7 = 3/7
  SB Subtraction (common denominator): 2/5 − 1/5 = (2 − 1)/5 = 1/5
  MT Multiplication: 1/2 · 3/5 = 3/10
  CD Common denominator: ( 1/2 , 2/3 ) = ( 3/6 , 4/6 )
  CL Cancelling out: 4/6 = (2 · 2)/(2 · 3) = 2/3
  CIM Conversion to mixed numbers: 7/2 = (3 · 2 + 1)/2 = 3 1/2
  CMI Conversion to improper fractions: 3 1/2 = (3 · 2 + 1)/2 = 7/2

  18. Misconceptions (label, description, occurrence)
  MAD: a/b + c/d = (a + c)/(b + d) , 14.8%
  MSB: a/b − c/d = (a − c)/(b − d) , 9.4%
  MMT1: a/b · c/b = (a · c)/b , 14.1%
  MMT2: a/b · c/b = (a + c)/(b · b) , 8.1%
  MMT3: a/b · c/d = (a · d)/(b · c) , 15.4%
  MMT4: a/b · c/d = (a · c)/(b + d) , 8.1%
  MC: a b/c = (a · b)/c , 4.0%

  19. Student model [Figure: Bayesian network over skill nodes HV1, ACL, ACMI, ACIM, ACD, CP, MT, CL, CMI, CIM, CD, AD, SB and misconception nodes MMT1, MMT2, MMT3, MMT4, MC, MAD, MSB.]

  20. Evidence model for task T 1 ( 3/4 · 5/6 ) − 1/8 = 15/24 − 1/8 = 5/8 − 1/8 = 4/8 = 1/2 T 1 ⇔ MT & CL & ACL & SB & ¬ MMT 3 & ¬ MMT 4 & ¬ MSB [Figure: nodes ACL, CL, MT, SB, MMT3, MMT4, MSB are parents of T1; the observed answer X1 depends on T1 through P ( X1 | T1 ).]
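The deterministic part of this relation is easy to sketch; the state dict and its keys are assumed to mirror the skill and misconception nodes above:

```python
# T1 is solved correctly iff the required skills are present and the
# interfering misconceptions are absent.
def t1_correct(state):
    return (state["MT"] and state["CL"] and state["ACL"] and state["SB"]
            and not state["MMT3"] and not state["MMT4"] and not state["MSB"])

print(t1_correct({"MT": True, "CL": True, "ACL": True, "SB": True,
                  "MMT3": False, "MMT4": False, "MSB": False}))  # True
```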

  21. Skill Prediction Quality [Plot: quality of skill predictions (scale 74 to 92) versus number of answered questions (0 to 20) for four question orders: adaptive, average, descending, and ascending.]

  22. Total entropy of probability of skills [Plot: entropy on skills (scale 4 to 12) versus number of answered questions (0 to 20) for adaptive, average, descending, and ascending question orders.]

  23. Application 2: Troubleshooting

  24. Application 2: Troubleshooting - Light print problem [Figure: Bayesian network with a problem node F linked to faults F 1 , ..., F 4 , actions A 1 , A 2 , A 3 , and question Q 1 .] • Problems: F 1 Distribution problem, F 2 Defective toner, F 3 Corrupted dataflow, and F 4 Wrong driver setting. • Actions: A 1 Remove, shake and reseat the toner, A 2 Try another toner, and A 3 Cycle power. • Questions: Q 1 Is the configuration page printed light?

  25. Troubleshooting strategy [Figure: a strategy tree. First ask Q 1 ; on Q 1 = no perform A 1 and, if that fails, A 2 ; on Q 1 = yes perform A 2 and, if that fails, A 1 .] The task is to find a strategy s ∈ S minimising the expected cost of repair E CR ( s ) = ∑_{ℓ ∈ L ( s )} P ( e ℓ ) · ( t ( e ℓ ) + c ( e ℓ ) ) .

  26. Expected cost of repair for a given strategy
  E CR ( s ) = P ( Q 1 = no , A 1 = yes ) · ( c Q 1 + c A 1 )
  + P ( Q 1 = no , A 1 = no , A 2 = yes ) · ( c Q 1 + c A 1 + c A 2 )
  + P ( Q 1 = no , A 1 = no , A 2 = no ) · ( c Q 1 + c A 1 + c A 2 + c CS )
  + P ( Q 1 = yes , A 2 = yes ) · ( c Q 1 + c A 2 )
  + P ( Q 1 = yes , A 2 = no , A 1 = yes ) · ( c Q 1 + c A 2 + c A 1 )
  + P ( Q 1 = yes , A 2 = no , A 1 = no ) · ( c Q 1 + c A 2 + c A 1 + c CS ) ,
  where c CS is the cost of calling service. Demo: light_print_problem
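A sketch of this computation as a sum over terminal nodes; all probabilities and costs are illustrative assumptions, not values from the demo model:

```python
# Costs of the individual steps; "CS" stands for calling service.
c = {"Q1": 1.0, "A1": 5.0, "A2": 15.0, "CS": 100.0}

# Terminal nodes of the strategy above: probability of reaching the
# leaf and the steps paid for along the path to it.
leaves = [
    (0.30, ("Q1", "A1")),
    (0.10, ("Q1", "A1", "A2")),
    (0.05, ("Q1", "A1", "A2", "CS")),
    (0.35, ("Q1", "A2")),
    (0.15, ("Q1", "A2", "A1")),
    (0.05, ("Q1", "A2", "A1", "CS")),
]
assert abs(sum(p for p, _ in leaves) - 1.0) < 1e-12  # leaves partition S

ecr = sum(p * sum(c[step] for step in path) for p, path in leaves)
print(ecr)   # E CR(s) for the assumed probabilities and costs
```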
