Introduction to Bayesian Belief Nets
Russ Greiner
Dep’t of Computing Science
Alberta Ingenuity Centre for Machine Learning
University of Alberta
http://www.cs.ualberta.ca/~greiner/bn.html
Motivation
• Gates says [LATimes, 28/Oct/96]: Microsoft’s competitive advantage is its expertise in “Bayesian networks”
• Current Products
  - Microsoft Pregnancy and Child Care (MSN)
  - Answer Wizard (Office, …)
  - Print Troubleshooter
  - Excel Workbook Troubleshooter
  - Office 95 Setup Media Troubleshooter
  - Windows NT 4.0 Video Troubleshooter
  - Word Mail Merge Troubleshooter
Motivation (II)
• US Army: SAIP (Battalion Detection from SAR, IR, …; Gulf War)
• NASA: Vista (DSS for Space Shuttle)
• GE: Gems (real-time monitor for utility generators)
• Intel: (infer possible processing problems from end-of-line tests on semiconductor chips)
• KIC:
  - medical: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, home-based health evaluations
  - DSS for capital equipment: locomotives, gas-turbine engines, office equipment
Motivation (III)
• Lymph-node pathology diagnosis
• Manufacturing control
• Software diagnosis
• Information retrieval
• Types of tasks
  - Classification/Regression
  - Sensor Fusion
  - Prediction/Forecasting
Outline
• Existing uses of Belief Nets (BNs)
• What is a BN?
• Specific Examples of BNs
• Contrast with Rules, Neural Nets, …
• Possible applications of BNs
• Challenges
  - How to reason efficiently
  - How to learn BNs
[Figure: the clinical encounter. Symptoms (chief complaint, history, …); Signs (physical exam, test results, …); Plan (diagnosis, treatment, …)]
Objectives: Decision Support System
• Determine
  - which tests to perform
  - which repair to suggest
  based on costs, sensitivity/specificity, …
• Use all sources of information
  - symbolic (discrete observations, history, …)
  - signal (from sensors)
• Handle partial information
• Adapt to track fault distribution
Underlying Task
• Situation: Given observations $\{O_1 = v_1, \ldots, O_k = v_k\}$ (symptoms, history, test results, …), what is the best DIAGNOSIS $Dx_i$ for the patient?
• Approach 1: Use a set of rules $obs_1$ & … & $obs_m$ → $Dx_i$
  but… need a rule for each situation:
  - for each diagnosis $Dx_r$
  - for each set of possible values $v_j$ for $O_j$
  - for each subset of obs. $\{O_{x1}, O_{x2}, \ldots\} \subset \{O_j\}$
  Can’t use “If Temp>100 & BP = High & Cough = Yes → DiseaseX” if we only know Temp and BP
• Seldom completely certain
Underlying Task
• Situation: Given observations $\{O_1 = v_1, \ldots, O_k = v_k\}$ (symptoms, history, test results, …), what is the best DIAGNOSIS $Dx_i$ for the patient?
• Approach 2: Compute probabilities of $Dx_i$ given the observations $\{obs_j\}$:
  $P(Dx = u \mid O_1 = v_1, \ldots, O_k = v_k)$
• Challenge: How to express probabilities?
How to deal with Probabilities
• Sufficient: “atomic events”:
  $P(Dx = u, O_1 = v_1, \ldots, O_k = v_k, \ldots, O_N = v_N)$ for all $2^{1+N}$ values $u \in \{T, F\}$, $v_j \in \{T, F\}$
  $P(Dx=T, O_1=T, O_2=T, \ldots, O_N=T) = 0.03$
  $P(Dx=T, O_1=T, O_2=T, \ldots, O_N=F) = 0.4$
  …
  $P(Dx=T, O_1=F, O_2=F, \ldots, O_N=T) = 0$
  …
  $P(Dx=F, O_1=F, O_2=F, \ldots, O_N=F) = 0.01$
• Then:
  Marginalize:
  $$P(Dx = u, O_1 = v_1, \ldots, O_7 = v_7) = \sum_{v_8, \ldots, v_N} P(Dx = u, O_1 = v_1, \ldots, O_7 = v_7, \ldots, O_N = v_N)$$
  Conditionalize:
  $$P(Dx = u \mid O_1 = v_1, \ldots, O_7 = v_7) = \frac{P(Dx = u, O_1 = v_1, \ldots, O_7 = v_7)}{P(O_1 = v_1, \ldots, O_7 = v_7)}$$
• But… even with binary Dx and 20 binary obs.’s ⇒ $2^{21}$, i.e. > 2,097,000 numbers!
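To make the two operations concrete, here is a minimal sketch of the atomic-event approach, assuming a toy model with Dx plus only N = 3 binary observations. The joint values are random placeholders (not from the slide), and the function names `make_joint`, `marginal` and `conditional` are our own, hypothetical helpers.

```python
import itertools
import random

def make_joint(n_obs, seed=0):
    """Hypothetical full joint over (Dx, O_1, ..., O_n), all binary: 2**(1 + n) atomic events.

    Position 0 of each key is Dx; positions 1..n are O_1..O_n. The values are
    random placeholders that sum to 1; they are NOT taken from the slide.
    """
    rng = random.Random(seed)
    keys = list(itertools.product([True, False], repeat=1 + n_obs))
    weights = [rng.random() for _ in keys]
    total = sum(weights)
    return {key: w / total for key, w in zip(keys, weights)}

def marginal(joint, evidence):
    """Marginalize: add up every atomic event consistent with evidence {position: value}."""
    return sum(p for key, p in joint.items()
               if all(key[i] == v for i, v in evidence.items()))

def conditional(joint, dx_value, obs_evidence):
    """Conditionalize: P(Dx = dx_value | obs) = P(Dx = dx_value, obs) / P(obs)."""
    numerator = marginal(joint, {0: dx_value, **obs_evidence})
    return numerator / marginal(joint, obs_evidence)

joint = make_joint(n_obs=3)
print(len(joint))                                     # 16 atomic events for N = 3
print(conditional(joint, True, {1: True, 2: False}))  # P(Dx=T | O_1=T, O_2=F), O_3 summed out
print(2 ** (1 + 20))                                  # 2097152 entries for 20 binary obs.
```

Every query touches every matching entry of the table, which is exactly why this representation stops scaling long before N = 20.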
Problems with “Atomic Events”
• Representation is not intuitive
  ⇒ Should make “connections” explicit; use “local information”:
  P(Jaundice | Hepatitis), P(LightDim | BadBattery), …
• Too many numbers: $O(2^N)$
  - Hard to store
  - Hard to use [must add $2^r$ values to marginalize $r$ variables]
  - Hard to learn [takes $O(2^N)$ samples to learn $2^N$ parameters]
  ⇒ Include only necessary “connections”
⇒ Belief Nets
[Figure: Hepatitis? Evidence: Jaundiced; +BloodTest; not Jaundiced, but +BloodTest]
Encoding Causal Links
• Simple Belief Net (arcs H → B, H → J, B → J):

  H:  P(H=1)  P(H=0)
      0.05    0.95

  B:  h   P(B=1 | H=h)   P(B=0 | H=h)
      1   0.95           0.05
      0   0.03           0.97

  J:  h   b   P(J=1 | h, b)   P(J=0 | h, b)
      1   1   0.8             0.2
      1   0   0.8             0.2
      0   1   0.3             0.7
      0   0   0.3             0.7

• Node ~ Variable
• Link ~ “Causal dependency”
• “CPTable” ~ P(child | parents)
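The three CPTables above can be held as plain dictionaries. This sketch (the names `bern` and `joint` are our own, hypothetical helpers) multiplies one entry from each table to recover any atomic event, which shows that the net alone defines the full joint.

```python
# CPTs from the slide: each entry stores P(child = 1 | parents); P(child = 0 | ...) = 1 - that.
P_H1 = 0.05
P_B1_given_H = {1: 0.95, 0: 0.03}
P_J1_given_HB = {(1, 1): 0.8, (1, 0): 0.8, (0, 1): 0.3, (0, 0): 0.3}

def bern(p_one, value):
    """Probability that a binary variable takes `value`, given P(var = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(h, b, j):
    """P(H=h, B=b, J=j) = P(h) * P(b | h) * P(j | h, b)."""
    return bern(P_H1, h) * bern(P_B1_given_H[h], b) * bern(P_J1_given_HB[(h, b)], j)

total = sum(joint(h, b, j) for h in (0, 1) for b in (0, 1) for j in (0, 1))
print(round(total, 10))          # 1.0
print(round(joint(1, 1, 1), 6))  # 0.038 = 0.05 * 0.95 * 0.8
```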
Encoding Causal Links
  H:  P(H=1) = 0.05

  B:  h   P(B=1 | H=h)
      1   0.95
      0   0.03

  J:  h   b   P(J=1 | h, b)
      1   1   0.8
      1   0   0.8
      0   1   0.3
      0   0   0.3

• P(J | H, B=0) = P(J | H, B=1) ∀ J, H!
  ⇒ P(J | H, B) = P(J | H)
• J is INDEPENDENT of B, once we know H
• Don’t need B → J arc!
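The test “P(J | H, B=0) = P(J | H, B=1) for every H” can be checked mechanically on the four-row table. A minimal sketch, with hypothetical helper names `parent_is_irrelevant` and `drop_parent`: it groups rows that differ only in the candidate parent and, if all groups agree, collapses the table.

```python
# P(J = 1 | H = h, B = b) from the slide's four-row CPTable, keyed by (h, b).
P_J1_given_HB = {(1, 1): 0.8, (1, 0): 0.8, (0, 1): 0.3, (0, 0): 0.3}

def parent_is_irrelevant(cpt, drop_index, tol=1e-12):
    """True if P(child = 1 | parents) never changes when only parent `drop_index` changes."""
    groups = {}
    for parents, p in cpt.items():
        rest = parents[:drop_index] + parents[drop_index + 1:]
        groups.setdefault(rest, []).append(p)
    return all(max(ps) - min(ps) <= tol for ps in groups.values())

def drop_parent(cpt, drop_index):
    """Collapse the CPT onto the remaining parents (only valid if that parent is irrelevant)."""
    reduced = {}
    for parents, p in cpt.items():
        rest = parents[:drop_index] + parents[drop_index + 1:]
        key = rest[0] if len(rest) == 1 else rest
        reduced[key] = p
    return reduced

print(parent_is_irrelevant(P_J1_given_HB, 1))  # True: B (index 1) can be dropped
print(parent_is_irrelevant(P_J1_given_HB, 0))  # False: H still matters
print(drop_parent(P_J1_given_HB, 1))           # {1: 0.8, 0: 0.3}
```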
Encoding Causal Links
  H:  P(H=1) = 0.05

  B:  h   P(B=1 | H=h)
      1   0.95
      0   0.03

  J:  h   P(J=1 | h)
      1   0.8
      0   0.3

• P(J | H, B=0) = P(J | H, B=1) ∀ J, H!
  ⇒ P(J | H, B) = P(J | H)
• J is INDEPENDENT of B, once we know H
• Don’t need B → J arc!
Sufficient Belief Net
  H:  P(H=1) = 0.05

  B:  h   P(B=1 | H=h)
      1   0.95
      0   0.03

  J:  h   P(J=1 | h)
      1   0.8
      0   0.3

• Requires: P(H=1), P(B=1 | H=h) and P(J=1 | H=h) known, for h ∈ {0, 1}
  (Only 5 parameters, not 7)
• Hence:
  $$P(H=1 \mid B=1, J=0) = \frac{1}{\alpha}\, P(H=1)\, P(B=1 \mid H=1)\, P(J=0 \mid B=1, H=1) = \frac{1}{\alpha}\, P(H=1)\, P(B=1 \mid H=1)\, P(J=0 \mid H=1)$$
  where $\alpha = P(B=1, J=0)$
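Plugging the five parameters into that formula is a two-term computation. A minimal sketch (the function name `posterior_H` is our own) that computes the unnormalized score for each value of h and divides by their sum, which is exactly α = P(B=1, J=0):

```python
P_H1 = 0.05
P_B1_given_H = {1: 0.95, 0: 0.03}
P_J1_given_H = {1: 0.8, 0: 0.3}   # B -> J arc already removed

def bern(p_one, value):
    """P(var = value) for a binary variable with P(var = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def posterior_H(b, j):
    """P(H = 1 | B = b, J = j), normalizing over h in {0, 1}."""
    unnorm = {h: bern(P_H1, h) * bern(P_B1_given_H[h], b) * bern(P_J1_given_H[h], j)
              for h in (0, 1)}
    alpha = sum(unnorm.values())   # alpha = P(B = b, J = j)
    return unnorm[1] / alpha

print(round(posterior_H(b=1, j=0), 4))  # 0.3226
```

Here the numerator is 0.05 · 0.95 · 0.2 = 0.0095 and α = 0.0095 + 0.95 · 0.03 · 0.7 = 0.02945, so a positive blood test lifts belief in Hepatitis from 0.05 to about 0.32 even without jaundice.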
“Factoring” (network: H → B, H → J)
• B does depend on J:
  If J=1, then it is likely that H=1 ⇒ B=1
• but… ONLY THROUGH H:
  - If we know H=1, then it is likely that B=1
  - … it doesn’t matter whether J=1 or J=0!
  ⇒ P(J=0 | B=1, H=1) = P(J=0 | H=1)
• N.b., B and J ARE correlated a priori: P(J | B) ≠ P(J)
  GIVEN H, they become uncorrelated: P(J | B, H) = P(J | H)
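Both claims can be checked numerically on the Sufficient Belief Net’s numbers by summing the factored joint. In this sketch, `joint` and `prob` are our own helper names; `prob` is brute-force enumeration, fine for three binary nodes.

```python
from itertools import product

P_H1 = 0.05
P_B1_given_H = {1: 0.95, 0: 0.03}
P_J1_given_H = {1: 0.8, 0: 0.3}

def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one

def joint(h, b, j):
    """Factored joint for the net H -> B, H -> J."""
    return bern(P_H1, h) * bern(P_B1_given_H[h], b) * bern(P_J1_given_H[h], j)

def prob(**fixed):
    """Sum the joint over all (h, b, j) consistent with the fixed values."""
    return sum(joint(h, b, j) for h, b, j in product((0, 1), repeat=3)
               if all({"h": h, "b": b, "j": j}[k] == v for k, v in fixed.items()))

p_j = prob(j=1)
p_j_given_b = prob(j=1, b=1) / prob(b=1)
p_j_given_bh = prob(j=1, b=1, h=1) / prob(b=1, h=1)
p_j_given_h = prob(j=1, h=1) / prob(h=1)

print(round(p_j, 4), round(p_j_given_b, 4))           # 0.325 0.6125  (correlated a priori)
print(round(p_j_given_bh, 4), round(p_j_given_h, 4))  # 0.8 0.8       (independent given H)
```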
Factored Distribution
• Symptoms independent, given Disease:
  P(B | J) ≠ P(B) but P(B | J, H) = P(B | H)
  (H = Hepatitis, J = Jaundice, B = (positive) Blood test; network H → J, H → B)
• ReadingAbility and ShoeSize are dependent:
  P(ReadAbility | ShoeSize) ≠ P(ReadAbility)
  but become independent, given Age:
  P(ReadAbility | ShoeSize, Age) = P(ReadAbility | Age)
  (network: Age → ShoeSize, Age → Reading)
“Naïve Bayes”
• Classification Task: Given $\{O_1 = v_1, \ldots, O_n = v_n\}$, find $h_i$ that maximizes $P(H = h_i \mid O_1 = v_1, \ldots, O_n = v_n)$
• Given (network: H → $O_1$, $O_2$, …, $O_n$):
  - $P(H = h_i)$
  - $P(O_j = v_j \mid H = h_i)$
  - Independent: $P(O_j \mid H, O_k, \ldots) = P(O_j \mid H)$
• Then
  $$P(H = h_i \mid O_1 = v_1, \ldots, O_n = v_n) = \frac{1}{\alpha}\, P(H = h_i) \prod_j P(O_j = v_j \mid H = h_i)$$
• Find $\arg\max_{h_i}$
Naïve Bayes (con’t)
(network: H → $O_1$, $O_2$, …, $O_n$)
  $$P(H = h_i \mid O_1 = v_1, \ldots, O_n = v_n) = \frac{1}{\alpha}\, P(H = h_i) \prod_j P(O_j = v_j \mid H = h_i)$$
• Normalizing term:
  $$\alpha = P(O_1 = v_1, \ldots, O_n = v_n) = \sum_i P(H = h_i) \prod_j P(O_j = v_j \mid H = h_i)$$
  (No need to compute, as it is the same for all $h_i$)
• Easy to use for Classification
• Can use even if some $v_j$s are not specified
• If k Dx’s and n $O_i$s, requires only k priors, n · k pairwise-conditionals
  (Not $2^{n+k}$… relatively easy to learn)

  Parameters needed (binary H, n binary observations):
  n     1+2n    $2^{n+1} - 1$
  10    21      2,047
  30    61      2,147,483,647
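A minimal Naïve Bayes classifier following the formulas above, assuming binary observations. For illustration it reuses the Hepatitis numbers from the Sufficient Belief Net (J and B treated as $O_1$, $O_2$); the function names are our own. Unobserved $O_j$ are simply left out of the product, which is what “can use even if some $v_j$s are not specified” amounts to.

```python
def naive_bayes_posterior(prior, likelihood, observed):
    """Return {h: P(H = h | observed)} under the Naive Bayes assumption.

    prior:      {h: P(H = h)}
    likelihood: {obs_name: {h: P(obs = 1 | H = h)}}, binary observations
    observed:   {obs_name: 0 or 1}; unobserved O_j are left out, which sums
                them out because each factor sums to 1 over its values.
    """
    scores = {}
    for h, p_h in prior.items():
        score = p_h
        for name, value in observed.items():
            p_one = likelihood[name][h]
            score *= p_one if value == 1 else 1.0 - p_one
        scores[h] = score
    alpha = sum(scores.values())  # alpha = P(observed); identical for every h
    return {h: s / alpha for h, s in scores.items()}

def classify(prior, likelihood, observed):
    """argmax_h P(H = h | observed); dividing by alpha does not change the winner."""
    posterior = naive_bayes_posterior(prior, likelihood, observed)
    return max(posterior, key=posterior.get)

prior = {1: 0.05, 0: 0.95}                                    # Hepatitis
likelihood = {"J": {1: 0.8, 0: 0.3}, "B": {1: 0.95, 0: 0.03}}

print(classify(prior, likelihood, {"J": 1, "B": 1}))  # 1
print(classify(prior, likelihood, {"J": 1}))          # 0 (B unobserved)

# Parameter counts from the table: binary H, n binary observations.
for n in (10, 30):
    print(n, 1 + 2 * n, 2 ** (n + 1) - 1)
```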
Bigger Networks
  GeneticPH (G):    P(G=1) = 0.20
  LiverTrauma (LT): P(LT=1) = 0.32

  Hepatitis (H):    g   lt   P(H=1 | g, lt)
                    1   1    0.82
                    1   0    0.10
                    0   1    0.45
                    0   0    0.04

  Jaundice (J):     h   P(J=1 | h)
                    1   0.8
                    0   0.3

  Bloodtest (B):    h   P(B=1 | h)
                    1   0.98
                    0   0.01

• Intuition: Show CAUSAL connections:
  - GeneticPH CAUSES Hepatitis; Hepatitis CAUSES Jaundice
  - If GeneticPH, then expect Jaundice: GeneticPH ⇒ Hepatitis ⇒ Jaundice
  - But only via Hepatitis: GeneticPH and not Hepatitis ⇏ Jaundice
  P(J | G) ≠ P(J) but P(J | G, H) = P(J | H)
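The claim P(J | G) ≠ P(J) but P(J | G, H) = P(J | H) can be verified by inference by enumeration over the five-node net. This sketch assumes, as the slide layout suggests, that 0.20 is the prior for GeneticPH and 0.32 the prior for LiverTrauma; `joint`, `prob` and `cond` are our own helper names.

```python
from itertools import product

# CPTs read off the slide (1 = true). Assumed: P(G=1) = 0.20, P(LT=1) = 0.32.
P_G1 = 0.20
P_LT1 = 0.32
P_H1 = {(1, 1): 0.82, (1, 0): 0.10, (0, 1): 0.45, (0, 0): 0.04}  # keyed by (g, lt)
P_J1 = {1: 0.8, 0: 0.3}
P_B1 = {1: 0.98, 0: 0.01}

def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one

def joint(g, lt, h, j, b):
    """Product of one CPT entry per node."""
    return (bern(P_G1, g) * bern(P_LT1, lt) * bern(P_H1[(g, lt)], h)
            * bern(P_J1[h], j) * bern(P_B1[h], b))

NAMES = ("g", "lt", "h", "j", "b")

def prob(**fixed):
    """Inference by enumeration: sum the joint over all assignments matching `fixed`."""
    total = 0.0
    for values in product((0, 1), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(*values)
    return total

def cond(query, given):
    """P(query | given), both given as dicts of fixed values."""
    return prob(**query, **given) / prob(**given)

print(round(prob(j=1), 5))                         # 0.40152
print(round(cond({"j": 1}, {"g": 1}), 5))          # 0.4652
print(round(cond({"j": 1}, {"g": 1, "h": 0}), 5))  # 0.3
print(round(cond({"j": 1}, {"h": 0}), 5))          # 0.3
```

Enumeration is exponential in the number of variables; it is used here only because the net is tiny.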
Belief Nets
• DAG structure
• Each node ≡ Variable v
• v depends (only) on its parents + conditional prob: $P(v_i \mid parent_i = \langle 0, 1, \ldots \rangle)$
• v is INDEPENDENT of its non-descendants, given assignments to its parents
[Figure: D, I → H → J, B]
  Given H = 1,
  - D has no influence on J
  - J has no influence on B
  - etc.
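The local independence statement is what licenses the chain-rule factorization of the joint into one CPT entry per node. Written out for the figure, assuming D and I are the parents of H, and J and B its children:

$$P(v_1, \ldots, v_n) = \prod_{i=1}^{n} P\big(v_i \mid \mathrm{parents}(v_i)\big)$$

$$P(D, I, H, J, B) = P(D)\, P(I)\, P(H \mid D, I)\, P(J \mid H)\, P(B \mid H)$$

With all five variables binary, this needs $1 + 1 + 4 + 2 + 2 = 10$ numbers, versus $2^5 - 1 = 31$ for the atomic-event table.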
Less Trivial Situations
• N.b., $obs_1$ is not always independent of $obs_2$ given H
• E.g., FamilyHistoryDepression ‘causes’ MotherSuicide and Depression
• MotherSuicide causes Depression (w/ or w/o F.H.Depression)

  FHD:  P(FHD=1) = 0.001

  MS:   f   P(MS=1 | FHD=f)
        1   0.10
        0   0.03

  D:    f   m   P(D=1 | FHD=f, MS=m)
        1   1   0.97
        1   0   0.90
        0   1   0.08
        0   0   0.04

• Here, P(D | MS, FHD) ≠ P(D | FHD)!
• Can be done using a Belief Network, but need to specify:
  P(FHD)            1 parameter
  P(MS | FHD)       2 parameters
  P(D | MS, FHD)    4 parameters
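The 1 + 2 + 4 count follows a general rule: each node needs (number of its values − 1) free numbers per configuration of its parents. A minimal sketch (function names are our own) that applies it to this net and to the Hepatitis net:

```python
def cpt_parameter_count(parents, cardinality):
    """Free parameters in a belief net: (|values| - 1) per parent configuration, summed over nodes.

    parents:     {node: [parent, ...]}
    cardinality: {node: number of values}
    """
    total = 0
    for node, pars in parents.items():
        configs = 1
        for p in pars:
            configs *= cardinality[p]
        total += (cardinality[node] - 1) * configs
    return total

def full_joint_count(cardinality):
    """Free parameters in the atomic-event table: product of cardinalities, minus 1."""
    size = 1
    for k in cardinality.values():
        size *= k
    return size - 1

depression = {"FHD": [], "MS": ["FHD"], "D": ["FHD", "MS"]}
binary = {name: 2 for name in depression}
print(cpt_parameter_count(depression, binary))  # 7 = 1 + 2 + 4
print(full_joint_count(binary))                 # 7

hepatitis = {"H": [], "B": ["H"], "J": ["H"]}
print(cpt_parameter_count(hepatitis, {n: 2 for n in hepatitis}))  # 5
```

Because every pair of nodes here is linked, the net needs as many numbers as the full joint table (7); the savings appear only when arcs can be dropped, as with the 5-parameter Hepatitis net.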
Example: Car Diagnosis
MammoNet
ALARM: A Logical Alarm Reduction Mechanism
• 8 diagnoses, 16 findings, …
Troop Detection
ARCO1: Forecasting Oil Prices
Forecasting Potato Production
Warning System
Recommend
More recommend