Some applications of Bayesian networks

Jiří Vomlel
Institute of Information Theory and Automation
Academy of Sciences of the Czech Republic

This presentation is available at http://www.utia.cas.cz/vomlel/
Contents

• Brief introduction to Bayesian networks
• Typical tasks that can be solved using Bayesian networks
• 1: Medical diagnosis (a very simple example)
• 2: Decision making maximizing expected utility (another simple example)
• 3: Adaptive testing (a case study)
• 4: Decision-theoretic troubleshooting (a commercial product)
Bayesian network

• a directed acyclic graph G = (V, E)
• each node i ∈ V corresponds to a random variable X_i with a finite set of mutually exclusive states
• pa(i) denotes the set of parents of node i in graph G
• to each node i ∈ V corresponds a conditional probability table P(X_i | (X_j)_{j ∈ pa(i)})
• the DAG implies conditional independence relations between the variables (X_i)_{i ∈ V}
• d-separation (Pearl, 1986) can be used to read the CI relations from the DAG
Assume an ordering of the variables X_i, i ∈ V, such that if j ∈ pa(i) then j < i. Using the chain rule we have:

P((X_i)_{i ∈ V}) = ∏_{i ∈ V} P(X_i | X_{i−1}, ..., X_1)

From the DAG we can read the conditional independence relations

X_i ⊥⊥ X_k | (X_j)_{j ∈ pa(i)}   for i ∈ V, k < i, and k ∉ pa(i).

Using these conditional independence relations we get

P((X_i)_{i ∈ V}) = ∏_{i ∈ V} P(X_i | (X_j)_{j ∈ pa(i)}).

This is the joint probability distribution represented by the Bayesian network.
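The factorized joint distribution can be evaluated directly as a product of CPT entries. A minimal sketch, assuming a toy chain X1 → X2 → X3 with binary states and made-up probabilities (none of the numbers come from the slides):

```python
import itertools

# Parents of each node; the order below is topological (parents first).
parents = {"X1": [], "X2": ["X1"], "X3": ["X2"]}
order = ["X1", "X2", "X3"]

# CPTs: (own state, tuple of parent states) -> probability. Illustrative numbers.
cpt = {
    "X1": {(0, ()): 0.6, (1, ()): 0.4},
    "X2": {(0, (0,)): 0.7, (1, (0,)): 0.3, (0, (1,)): 0.2, (1, (1,)): 0.8},
    "X3": {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.5, (1, (1,)): 0.5},
}

def joint(assignment):
    """P(x1, x2, x3) as the product prod_i P(x_i | pa(i))."""
    p = 1.0
    for v in order:
        pa_states = tuple(assignment[u] for u in parents[v])
        p *= cpt[v][(assignment[v], pa_states)]
    return p

# Sanity check: the factorized joint must sum to one over all configurations.
total = sum(joint(dict(zip(order, states)))
            for states in itertools.product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```

Because every CPT row is normalized, the product automatically defines a proper joint distribution.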
Example:

[Figure: a DAG over X_1, ..., X_9 with tables P(X_1), P(X_2), P(X_3 | X_1), P(X_4 | X_2), P(X_5 | X_1), P(X_6 | X_3, X_4), P(X_7 | X_5), P(X_8 | X_7, X_6), P(X_9 | X_6).]

P(X_1, ..., X_9)
= P(X_9 | X_8, ..., X_1) · P(X_8 | X_7, ..., X_1) · ... · P(X_2 | X_1) · P(X_1)
= P(X_9 | X_6) · P(X_8 | X_7, X_6) · P(X_7 | X_5) · P(X_6 | X_4, X_3) · P(X_5 | X_1) · P(X_4 | X_2) · P(X_3 | X_1) · P(X_2) · P(X_1)
Typical use of Bayesian networks

• to model and explain a domain
• to update beliefs about the states of certain variables when some other variables are observed, i.e., computing conditional probability distributions such as P(X_23 | X_17 = yes, X_54 = no)
• to find the most probable configurations of variables
• to support decision making under uncertainty
• to find good strategies for solving tasks in a domain with uncertainty
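For a small network, belief updating can be done by brute force: P(query | evidence) = P(query, evidence) / P(evidence), summing the joint over the unobserved variables. A sketch on an invented two-node network (Smoker → Bronchitis; the probabilities are illustrative, not taken from any slide):

```python
# P(Smoker) and P(Bronchitis | Smoker); all numbers are made up.
p_smoker = {"yes": 0.3, "no": 0.7}
p_bronch = {
    "yes": {"yes": 0.6, "no": 0.4},  # P(Bronchitis | Smoker = yes)
    "no":  {"yes": 0.3, "no": 0.7},  # P(Bronchitis | Smoker = no)
}

def joint(s, b):
    """P(Smoker = s, Bronchitis = b) via the chain rule."""
    return p_smoker[s] * p_bronch[s][b]

# Belief update: P(Smoker = yes | Bronchitis = yes)
num = joint("yes", "yes")
den = sum(joint(s, "yes") for s in ("yes", "no"))
posterior = num / den
print(round(posterior, 4))  # 0.4615
```

Real tools such as Hugin use junction-tree propagation instead of enumeration, which this sketch does not attempt to show.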
Simplified diagnostic example

We have a patient. Possible diagnoses: tuberculosis, lung cancer, bronchitis.
We don’t know anything about the patient. ... Patient is a smoker.
Patient is a smoker. ... and he complains about dyspnoea.
Patient is a smoker and complains about dyspnoea. ... and his X-ray is positive.
Patient is a smoker and complains about dyspnoea and his X-ray is positive. ... and he visited Asia recently.
Application 2: Decision making

The goal: maximize expected utility

Hugin example: mildew4.net
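The principle behind the mildew example is to pick the decision d that maximizes EU(d) = ∑_s P(s | d) · U(s, d). The actual model lives in the Hugin .net file; the decision names, probabilities, and utilities below are invented purely to illustrate the computation:

```python
# P(outcome | decision); all numbers here are made up for illustration.
p_outcome = {
    "treat":    {"healthy": 0.9, "diseased": 0.1},
    "no_treat": {"healthy": 0.6, "diseased": 0.4},
}
# U(outcome, decision): e.g. harvest value minus treatment cost (invented).
utility = {
    ("healthy", "treat"): 90,  ("diseased", "treat"): 30,
    ("healthy", "no_treat"): 100, ("diseased", "no_treat"): 20,
}

def expected_utility(d):
    """EU(d) = sum over outcomes of P(outcome | d) * U(outcome, d)."""
    return sum(p * utility[(o, d)] for o, p in p_outcome[d].items())

best = max(p_outcome, key=expected_utility)
print(best, expected_utility(best))  # treat 84.0
```

With these numbers, treating is preferred even though a healthy untreated crop is worth more, because treatment makes the good outcome much more likely.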
Fixed and Adaptive Test Strategies

[Figure: a fixed test strategy (questions Q1, ..., Q10 asked in one fixed sequence) versus an adaptive test strategy (a tree that branches on wrong/correct answers to select the next question).]
For all nodes n of a strategy s we have defined:

• evidence e_n, i.e., the outcomes of the steps performed to get to node n,
• the probability P(e_n) of getting to node n, and
• a utility f(e_n), a real number.

Let L(s) be the set of terminal nodes of strategy s. The expected utility of a strategy is

E f(s) = ∑_{ℓ ∈ L(s)} P(e_ℓ) · f(e_ℓ).

[Figure: a small strategy tree over steps X1, X2, X3.]
A strategy s⋆ is optimal iff it maximizes the expected utility.

A strategy s is myopically optimal iff each step of strategy s is selected so that it maximizes the expected utility after the selected step is performed (one-step look-ahead).
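The expected-utility formula above can be evaluated by one recursive pass over the strategy tree, carrying the probability of reaching each node. A sketch with an invented two-question strategy (branch probabilities and leaf utilities are illustrative):

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    utility: float          # f(e_l) at a terminal node

@dataclass
class Node:
    step: str               # the question/action taken at this node
    branches: List[Tuple[float, "Tree"]]  # (P(outcome | evidence), subtree)

Tree = Union[Leaf, Node]

def expected_utility(tree: Tree, p: float = 1.0) -> float:
    """Sum of P(e_l) * f(e_l) over all terminal nodes below `tree`."""
    if isinstance(tree, Leaf):
        return p * tree.utility
    return sum(expected_utility(sub, p * q) for q, sub in tree.branches)

# Invented strategy: ask Q1; on a correct answer, ask Q2 as well.
s = Node("Q1", [
    (0.4, Leaf(0.0)),                # Q1 wrong
    (0.6, Node("Q2", [               # Q1 correct
        (0.3, Leaf(0.5)),            # Q2 wrong
        (0.7, Leaf(1.0)),            # Q2 correct
    ])),
])
print(round(expected_utility(s), 6))  # 0.51
```

A myopically optimal strategy would be built greedily with this same evaluator: at each node, try each available next step and keep the one with the highest one-step expected utility.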
Application 3: Adaptive test of basic operations with fractions

Examples of tasks:

T1: (3/4 · 5/6) − 1/8 = 15/24 − 1/8 = 5/8 − 1/8 = 4/8 = 1/2

T2: 1/6 + 1/12 = 2/12 + 1/12 = 3/12 = 1/4

T3: 1/4 · 1 1/2 = 1/4 · 3/2 = 3/8

T4: (1/2 · 1/3) + (1/4 · 2/3) = 1/6 + 2/12 = 1/6 + 1/6 = 2/6 = 1/3
Elementary and operational skills

CP   Comparison (common numerator or denominator):  1/2 > 1/3,  2/3 > 1/3
AD   Addition (common denominator):                 1/7 + 2/7 = (1+2)/7 = 3/7
SB   Subtraction (common denominator):              2/5 − 1/5 = (2−1)/5 = 1/5
MT   Multiplication:                                1/2 · 3/5 = 3/10
CD   Common denominator:                            (1/2, 2/3) = (3/6, 4/6)
CL   Cancelling out:                                4/6 = (2·2)/(2·3) = 2/3
CIM  Conversion to mixed numbers:                   7/2 = (3·2+1)/2 = 3 1/2
CMI  Conversion to improper fractions:              3 1/2 = (3·2+1)/2 = 7/2
Misconceptions

Label   Description                    Occurrence
MAD     a/b + c/d = (a+c)/(b+d)        14.8%
MSB     a/b − c/d = (a−c)/(b−d)         9.4%
MMT1    a/b · c/b = (a·c)/b            14.1%
MMT2    a/b · c/b = (a+c)/(b·b)         8.1%
MMT3    a/b · c/d = (a·d)/(b·c)        15.4%
MMT4    a/b · c/d = (a·c)/(b+d)         8.1%
MC      a/(b/c) = (a·b)/c               4.0%
Student model

[Figure: the student model, a Bayesian network with hidden variables HV1 and HV2 above skill nodes ACMI, ACIM, ACL, ACD, AD, SB, CMI, CIM, CL, CD, MT, CP and misconception nodes MAD, MSB, MC, MMT1, MMT2, MMT3, MMT4.]
Evidence model for task T1

(3/4 · 5/6) − 1/8 = 15/24 − 1/8 = 5/8 − 1/8 = 4/8 = 1/2

T1 ⇔ MT & CL & ACL & SB & ¬MMT3 & ¬MMT4 & ¬MSB

[Figure: the evidence model, node T1 with parents MT, CL, ACL, SB, MMT3, MMT4, MSB and a child X1, the observed answer, with table P(X1 | T1).]

Hugin: model-hv-2.net
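Under the conjunction above, a student solves T1 exactly when all required skills are present and none of the harmful misconceptions is. If, purely for illustration, we assume the skills and misconceptions are independent with invented marginals (the real student model introduces dependencies), P(T1) is a simple product; a noisy answer X1 can then be modeled with invented slip/guess rates:

```python
# Invented marginal probabilities of having each skill / misconception.
p_has = {
    "MT": 0.9, "CL": 0.8, "ACL": 0.7, "SB": 0.95,
    "MMT3": 0.15, "MMT4": 0.08, "MSB": 0.09,
}
needed = ["MT", "CL", "ACL", "SB"]       # must be present
harmful = ["MMT3", "MMT4", "MSB"]        # must be absent

# P(T1) = prod P(skill) * prod (1 - P(misconception)), assuming independence.
p_t1 = 1.0
for s in needed:
    p_t1 *= p_has[s]
for m in harmful:
    p_t1 *= 1.0 - p_has[m]

# Observed answer X1 as a noisy copy of T1; slip/guess rates are invented
# stand-ins for the slides' table P(X1 | T1).
slip, guess = 0.05, 0.1
p_correct_answer = p_t1 * (1 - slip) + (1 - p_t1) * guess
print(round(p_t1, 4), round(p_correct_answer, 4))
```

The point of the sketch is the structure: a deterministic AND over skills, softened by the observation model P(X1 | T1).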
Using information gain as the utility function

"The lower the entropy of a probability distribution, the more we know."

H(P(X)) = − ∑_x P(X = x) · log P(X = x)

[Plot: entropy of a binary distribution as a function of the probability of one state, 0 at probabilities 0 and 1, with its maximum at 0.5.]

Information gain in a node n of a strategy:

IG(e_n) = H(P(S)) − H(P(S | e_n))
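The two formulas above are short to implement. A sketch, with my own choices of base-2 logarithm and illustrative distributions (S stands for the skill variable):

```python
import math

def entropy(dist):
    """Shannon entropy in bits, -sum p*log2(p); 0*log(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

prior = [0.5, 0.5]        # P(S): maximally uncertain before any answers
posterior = [0.9, 0.1]    # P(S | e_n): illustrative beliefs after answers

# IG(e_n) = H(P(S)) - H(P(S | e_n)): how much the evidence reduced entropy.
ig = entropy(prior) - entropy(posterior)
print(round(entropy(prior), 4), round(entropy(posterior), 4), round(ig, 4))
```

In the adaptive test, the next question is the one whose expected answer maximizes this gain.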
Skill Prediction Quality

[Plot: quality of skill predictions (vertical axis, roughly 74–92) against the number of answered questions (0–20), for four question orders: adaptive, average, descending, and ascending.]
Application 4: Troubleshooting

The Dezide Advisor customized to a specific portal, seen from the user's perspective through a web browser.
Application 4: Troubleshooting - Light print problem

[Figure: the troubleshooting model, a Bayesian network with a Problem node, a fault node F with states F1, ..., F4, actions A1, ..., A3, and question Q1.]

• Problems: F1 Distribution problem, F2 Defective toner, F3 Corrupted dataflow, and F4 Wrong driver setting.
• Actions: A1 Remove, shake and reseat toner, A2 Try another toner, and A3 Cycle power.
• Questions: Q1 Is the configuration page printed light?
Troubleshooting strategy

[Figure: a strategy tree, ask question Q1 first; if Q1 = no, perform action A1 and then, if the problem persists, A2; if Q1 = yes, perform A2 and then A1.]

The task is to find a strategy s ∈ S minimising the expected cost of repair

ECR(s) = ∑_{ℓ ∈ L(s)} P(e_ℓ) · ( t(e_ℓ) + c(e_ℓ) ).
Expected cost of repair for a given strategy

ECR(s) = P(Q1 = no, A1 = yes) · (c_Q1 + c_A1)
       + P(Q1 = no, A1 = no, A2 = yes) · (c_Q1 + c_A1 + c_A2)
       + P(Q1 = no, A1 = no, A2 = no) · (c_Q1 + c_A1 + c_A2 + c_CS)
       + P(Q1 = yes, A2 = yes) · (c_Q1 + c_A2)
       + P(Q1 = yes, A2 = no, A1 = yes) · (c_Q1 + c_A2 + c_A1)
       + P(Q1 = yes, A2 = no, A1 = no) · (c_Q1 + c_A2 + c_A1 + c_CS)

Demo: www.dezide.com Products/Demo/"Try out expert mode"
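This six-term sum is easy to compute once the probability of reaching each terminal node is known. A sketch with invented probabilities and costs (c_CS stands for the cost of calling service; A1ok/A1fail mark whether an action solved the problem):

```python
# Invented step costs: question Q1, actions A1 and A2, and calling service.
c_Q1, c_A1, c_A2, c_CS = 2.0, 5.0, 10.0, 50.0

# Invented probabilities of the six terminal nodes; they must sum to 1.
p = {
    ("no",  "A1ok"):             0.30,
    ("no",  "A1fail", "A2ok"):   0.10,
    ("no",  "A1fail", "A2fail"): 0.05,
    ("yes", "A2ok"):             0.35,
    ("yes", "A2fail", "A1ok"):   0.15,
    ("yes", "A2fail", "A1fail"): 0.05,
}

# ECR(s): each leaf contributes its reach probability times accumulated cost.
ecr = (
    p[("no",  "A1ok")]             * (c_Q1 + c_A1)
  + p[("no",  "A1fail", "A2ok")]   * (c_Q1 + c_A1 + c_A2)
  + p[("no",  "A1fail", "A2fail")] * (c_Q1 + c_A1 + c_A2 + c_CS)
  + p[("yes", "A2ok")]             * (c_Q1 + c_A2)
  + p[("yes", "A2fail", "A1ok")]   * (c_Q1 + c_A2 + c_A1)
  + p[("yes", "A2fail", "A1fail")] * (c_Q1 + c_A2 + c_A1 + c_CS)
)
print(round(ecr, 2))  # 17.25
```

Comparing ECR across candidate strategies is how the troubleshooter ranks them.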
Commercial applications of Bayesian networks in educational testing and troubleshooting

• Hugin Expert A/S. Software product: Hugin, a Bayesian network tool. http://www.hugin.com/

• Educational Testing Service (ETS), the world's largest private educational testing organization. Research unit doing research on adaptive tests using Bayesian networks: http://www.ets.org/research/

• SACSO Project: Systems for Automatic Customer Support Operations, a research project of Hewlett Packard and Aalborg University. The troubleshooter is offered as DezisionWorks by Dezide Ltd. http://www.dezide.com/