Pearl's computational architecture
In Pearl's algorithm the graph of a Bayesian network is used as a computational architecture:
• each node in the graph is an autonomous object;
• each object has a local memory that stores the assessment functions of the associated node;
• each object has a local processor that can perform (simple) probabilistic computations;
• each arc in the graph is a (bi-directional) communication channel, through which connected objects can send each other messages.
A computational architecture
(Figures: a sequence of slides illustrating the architecture on a small example network, counting the messages passed between the nodes.)
Understanding Pearl: single arc (1)
Consider Bayesian network B with the graph V1 → V2 and assessment functions γ(V1) and γ(V2 | V1). Let Pr be the joint distribution defined by B. We consider the situation without evidence.
• Can node V1 compute the probabilities Pr(V1)? If so, how?
• Can node V2 compute the probabilities Pr(V2)? If so, how?
Understanding Pearl: single arc (2)
Consider Bayesian network B with the graph V1 → V2, where V1 holds γ(v1), γ(¬v1) and V2 holds γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1). Let Pr be the joint distribution defined by B. We consider the situation without evidence.
• Node V1 can determine the probabilities for its own values:
  Pr(v1) = γ(v1), Pr(¬v1) = γ(¬v1)
• Node V2 cannot determine Pr(V2) on its own, but it does know all four conditional probabilities: Pr(V2 | V1) = γ(V2 | V1).
  V2 can compute its probabilities given information from V1:
  Pr(v2) = Pr(v2 | v1) · Pr(v1) + Pr(v2 | ¬v1) · Pr(¬v1)
  Pr(¬v2) = Pr(¬v2 | v1) · Pr(v1) + Pr(¬v2 | ¬v1) · Pr(¬v1)
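The two computations on this slide can be put in a few lines of code. The γ values below are illustrative assumptions only, since the slide leaves them symbolic.

```python
# Hypothetical assessment functions for the arc V1 -> V2 (values chosen for illustration).
gamma_v1 = {True: 0.6, False: 0.4}                    # gamma(v1), gamma(~v1)
gamma_v2_given_v1 = {True: {True: 0.3, False: 0.7},   # gamma(v2|v1), gamma(~v2|v1)
                     False: {True: 0.8, False: 0.2}}  # gamma(v2|~v1), gamma(~v2|~v1)

# Node V1 knows Pr(V1) directly from its own assessment function.
pr_v1 = dict(gamma_v1)

# Node V2 needs Pr(V1) from V1; it then marginalises V1 out.
pr_v2 = {v2: sum(gamma_v2_given_v1[v1][v2] * pr_v1[v1] for v1 in (True, False))
         for v2 in (True, False)}

print(pr_v1)  # {True: 0.6, False: 0.4}
print(pr_v2)  # Pr(v2) = 0.3*0.6 + 0.8*0.4 = 0.5, Pr(~v2) = 0.5
```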
Understanding Pearl: directed path (1)
Consider Bayesian network B with the graph V1 → V2 → V3 and assessment functions γ(V1), γ(V2 | V1), γ(V3 | V2). We consider the situation without evidence.
• Can node V1 compute the probabilities Pr(V1)?
• Can node V2 compute the probabilities Pr(V2)?
• Can node V3 compute the probabilities Pr(V3)? If so, how?
Understanding Pearl: directed path (2)
Consider Bayesian network B with the graph V1 → V2 → V3, where V1 holds γ(v1), γ(¬v1), V2 holds γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1), and V3 holds γ(v3 | v2), γ(¬v3 | v2), γ(v3 | ¬v2), γ(¬v3 | ¬v2). We consider the situation without evidence.
Given information from V1, node V2 can compute Pr(v2) and Pr(¬v2). Node V2 then sends node V3 the required information; node V3 computes:
Pr(v3) = Pr(v3 | v2) · Pr(v2) + Pr(v3 | ¬v2) · Pr(¬v2) = γ(v3 | v2) · Pr(v2) + γ(v3 | ¬v2) · Pr(¬v2)
Pr(¬v3) = γ(¬v3 | v2) · Pr(v2) + γ(¬v3 | ¬v2) · Pr(¬v2)
Introduction to causal parameters
Reconsider Bayesian network B (graph V1 → V2) without observations. Node V1 sends a message π^{V1}_{V2} enabling V2 to compute the probabilities for its values.
This message is a function π^{V1}_{V2} : {v1, ¬v1} → [0, 1] that attaches a number to each value of V1, such that
Σ_{c_{V1}} π^{V1}_{V2}(c_{V1}) = 1
The function π^{V1}_{V2} is called the causal parameter from V1 to V2.
Causal parameters: an example
Consider the following Bayesian network (chain V1 → V2 → V3) without observations:
γ(v1) = 0.7, γ(¬v1) = 0.3
γ(v2 | v1) = 0.2, γ(¬v2 | v1) = 0.8; γ(v2 | ¬v1) = 0.5, γ(¬v2 | ¬v1) = 0.5
γ(v3 | v2) = 0.6, γ(¬v3 | v2) = 0.4; γ(v3 | ¬v2) = 0.1, γ(¬v3 | ¬v2) = 0.9
Node V1:
• receives no messages
• computes and sends to V2: causal parameter π^{V1}_{V2} with π^{V1}_{V2}(v1) = γ(v1) = 0.7 and π^{V1}_{V2}(¬v1) = 0.3
Node V1 computes Pr(V1): Pr(v1) = π^{V1}_{V2}(v1) = 0.7; Pr(¬v1) = 0.3
Causal parameters: an example (cntd)
Node V2:
• receives causal parameter π^{V1}_{V2} from V1
• computes and sends to V3: causal parameter π^{V2}_{V3} with
  π^{V2}_{V3}(v2) = Pr(v2 | v1) · Pr(v1) + Pr(v2 | ¬v1) · Pr(¬v1) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = 0.2 · 0.7 + 0.5 · 0.3 = 0.29
  π^{V2}_{V3}(¬v2) = 0.8 · 0.7 + 0.5 · 0.3 = 0.71
Node V2 computes Pr(V2): Pr(v2) = π^{V2}_{V3}(v2) = 0.29; Pr(¬v2) = 0.71
Causal parameters: an example (cntd)
Node V3:
• receives causal parameter π^{V2}_{V3} from V2
• sends no messages
Node V3 computes Pr(V3):
Pr(v3) = γ(v3 | v2) · π^{V2}_{V3}(v2) + γ(v3 | ¬v2) · π^{V2}_{V3}(¬v2) = 0.6 · 0.29 + 0.1 · 0.71 = 0.245
Pr(¬v3) = 0.4 · 0.29 + 0.9 · 0.71 = 0.755 □
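The π-propagation of this three-node example can be reproduced with a short sketch; the dictionaries encode the assessment functions shown above, and each step marginalises the parent out using the incoming causal parameter.

```python
gamma_v1 = {True: 0.7, False: 0.3}
gamma_v2 = {True: {True: 0.2, False: 0.8},   # gamma(V2 | v1)
            False: {True: 0.5, False: 0.5}}  # gamma(V2 | ~v1)
gamma_v3 = {True: {True: 0.6, False: 0.4},   # gamma(V3 | v2)
            False: {True: 0.1, False: 0.9}}  # gamma(V3 | ~v2)

def pi_message(gamma_child, pi_parent):
    """Causal parameter sent downstream: marginalise the parent out."""
    return {c: sum(gamma_child[p][c] * pi_parent[p] for p in (True, False))
            for c in (True, False)}

pi_12 = dict(gamma_v1)               # pi^{V1}_{V2} = gamma(V1) = (0.7, 0.3)
pi_23 = pi_message(gamma_v2, pi_12)  # (0.29, 0.71)
pr_v3 = pi_message(gamma_v3, pi_23)  # (0.245, 0.755)
print(pi_12, pi_23, pr_v3)
```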
Understanding Pearl: simple chains
Consider Bayesian networks B with the following graphs over V1, V2, V3:
• a diverging connection V1 ← V2 → V3, with γ(v2), γ(¬v2) for V2, γ(v1 | v2), γ(¬v1 | v2), γ(v1 | ¬v2), γ(¬v1 | ¬v2) for V1, and γ(v3 | v2), γ(¬v3 | v2), γ(v3 | ¬v2), γ(¬v3 | ¬v2) for V3;
• a converging connection V1 → V2 ← V3, with γ(v1), γ(¬v1) for V1, γ(v3), γ(¬v3) for V3, and γ(v2 | v1 ∧ v3), γ(v2 | v1 ∧ ¬v3), γ(v2 | ¬v1 ∧ v3), γ(v2 | ¬v1 ∧ ¬v3) for V2;
• ...
We consider the situation without observations. In each of the above networks, can nodes V1, V2, and V3 compute the probabilities Pr(V1), Pr(V2), and Pr(V3), respectively? And if so, how?
Understanding Pearl with evidence (1)
Consider Bayesian network B (graph V1 → V2) with evidence V1 = true (v1).
Node V1 updates its probabilities and its causal parameter:
π^{V1}_{V2}(v1) = Pr_{v1}(v1) = Pr(v1 | v1) = 1
π^{V1}_{V2}(¬v1) = Pr_{v1}(¬v1) = 0
Given the updated information from V1, node V2 updates the probabilities for its own values:
Pr_{v1}(v2) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = γ(v2 | v1)
Pr_{v1}(¬v2) = γ(¬v2 | v1) · π^{V1}_{V2}(v1) + γ(¬v2 | ¬v1) · π^{V1}_{V2}(¬v1) = γ(¬v2 | v1)
Note that the function γ for V1 remains unchanged!
Understanding Pearl with evidence (2a)
Consider Bayesian network B with the graph V1 → V2, where V1 holds γ(v1), γ(¬v1) and V2 holds γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1).
Suppose we have evidence V2 = true for node V2.
• Can node V1 compute the probabilities Pr_{v2}(V1)? If so, how?
• Can node V2 compute the probabilities Pr_{v2}(V2)? If so, how?
Understanding Pearl with evidence (2b)
Consider Bayesian network B (graph V1 → V2) with evidence V2 = true.
Node V1 cannot update its probabilities using its own knowledge; it requires information from V2! What information does V1 require?
Consider the following properties:
Pr_{v2}(v1) = Pr(v2 | v1) · Pr(v1) / Pr(v2) ∝ Pr(v2 | v1) · Pr(v1)
Pr_{v2}(¬v1) = Pr(v2 | ¬v1) · Pr(¬v1) / Pr(v2) ∝ Pr(v2 | ¬v1) · Pr(¬v1)
Introduction to diagnostic parameters
Reconsider Bayesian network B (graph V1 → V2): node V2 sends a message λ^{V1}_{V2} enabling V1 to update the probabilities for its values.
This message is a function λ^{V1}_{V2} : {v1, ¬v1} → [0, 1] that attaches a number to each value of V1. The message basically tells V1 what node V2 knows about V1; in general:
Σ_{c_{V1}} λ^{V1}_{V2}(c_{V1}) ≠ 1
The function λ^{V1}_{V2} is called the diagnostic parameter from V2 to V1.
Diagnostic parameters: an example
Consider the following Bayesian network B (graph V1 → V2) with evidence V2 = true:
γ(v1) = 0.8, γ(¬v1) = 0.2
γ(v2 | v1) = 0.4, γ(¬v2 | v1) = 0.6; γ(v2 | ¬v1) = 0.9, γ(¬v2 | ¬v1) = 0.1
Node V2:
• computes and sends to V1: diagnostic parameter λ^{V1}_{V2} with
  λ^{V1}_{V2}(v1) = Pr(v2 | v1) = γ(v2 | v1) = 0.4
  λ^{V1}_{V2}(¬v1) = γ(v2 | ¬v1) = 0.9
Note that Σ_{c_{V1}} λ^{V1}_{V2}(c_{V1}) = 1.3 > 1!
Diagnostic parameters: an example (cntd)
Node V1 receives from V2 the diagnostic parameter λ^{V1}_{V2}. Node V1 computes:
Pr_{v2}(v1) = α · Pr(v2 | v1) · Pr(v1) = α · λ^{V1}_{V2}(v1) · γ(v1) = α · 0.4 · 0.8 = α · 0.32
Pr_{v2}(¬v1) = α · λ^{V1}_{V2}(¬v1) · γ(¬v1) = α · 0.9 · 0.2 = α · 0.18
Node V1 now normalises its probabilities using Pr_{v2}(v1) + Pr_{v2}(¬v1) = 1:
α · 0.32 + α · 0.18 = 1  ⟹  α = 2
resulting in Pr_{v2}(v1) = 0.64, Pr_{v2}(¬v1) = 0.36 □
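The same upward computation as a sketch, reproducing the numbers of this example: the diagnostic parameter is combined with the prior and then normalised.

```python
gamma_v1 = {True: 0.8, False: 0.2}
gamma_v2 = {True: {True: 0.4, False: 0.6},   # gamma(v2|v1), gamma(~v2|v1)
            False: {True: 0.9, False: 0.1}}  # gamma(v2|~v1), gamma(~v2|~v1)

# lambda^{V1}_{V2}(V1) = Pr(v2 | V1): one number per value of V1 (need not sum to 1).
lam_12 = {v1: gamma_v2[v1][True] for v1 in (True, False)}         # {True: 0.4, False: 0.9}

# Data fusion at V1: Pr_v2(V1) is proportional to lambda * gamma; then normalise.
unnorm = {v1: lam_12[v1] * gamma_v1[v1] for v1 in (True, False)}  # 0.32 and 0.18
alpha = 1 / sum(unnorm.values())                                  # alpha = 2
posterior = {v1: alpha * p for v1, p in unnorm.items()}
print(posterior)  # {True: 0.64, False: 0.36}
```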
Understanding Pearl: directed path with evidence
Consider Bayesian network B with the graph V1 → V2 → V3, where V1 holds γ(v1), γ(¬v1), V2 holds γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1), and V3 holds γ(v3 | v2), γ(¬v3 | v2), γ(v3 | ¬v2), γ(¬v3 | ¬v2).
Suppose we have evidence V3 = true for node V3.
• Can node V1 compute the probabilities Pr_{v3}(V1)?
• Can node V2 compute the probabilities Pr_{v3}(V2)? If so, how?
• Can node V3 compute the probabilities Pr_{v3}(V3)?
What if node V1, node V2, or both have evidence instead?
Pearl on directed paths – An example (1)
Consider Bayesian network B (chain V1 → V2 → V3) with evidence V3 = true.
Node V1:
• receives diagnostic parameter λ^{V1}_{V2}(V1)
• computes and sends to V2: causal parameter π^{V1}_{V2}(V1) = γ(V1)
Node V1 computes:
Pr_{v3}(v1) = α · Pr(v3 | v1) · Pr(v1) = α · λ^{V1}_{V2}(v1) · γ(v1)
Pr_{v3}(¬v1) = α · Pr(v3 | ¬v1) · Pr(¬v1) = α · λ^{V1}_{V2}(¬v1) · γ(¬v1)
Pearl on directed paths – An example (2)
Node V2:
• receives causal parameter π^{V1}_{V2}(V1)
• receives diagnostic parameter λ^{V2}_{V3}(V2)
• computes and sends to V3: π^{V2}_{V3}(V2)
Node V2 also computes and sends to V1: diagnostic parameter λ^{V1}_{V2}(V1) with
λ^{V1}_{V2}(v1) = Pr(v3 | v1) = Pr(v3 | v2) · Pr(v2 | v1) + Pr(v3 | ¬v2) · Pr(¬v2 | v1) = λ^{V2}_{V3}(v2) · γ(v2 | v1) + λ^{V2}_{V3}(¬v2) · γ(¬v2 | v1)
λ^{V1}_{V2}(¬v1) = Pr(v3 | ¬v1) = ...
The node then computes Pr_{v3}(V2) ... How?
Pearl on directed paths – An example (3)
Node V3:
• receives causal parameter π^{V2}_{V3}(V2)
• computes and sends to V2: diagnostic parameter λ^{V2}_{V3}(V2) with
  λ^{V2}_{V3}(v2) = Pr(v3 | v2) = γ(v3 | v2)
  λ^{V2}_{V3}(¬v2) = Pr(v3 | ¬v2) = γ(v3 | ¬v2)
• computes Pr_{v3}(V3) □
Understanding Pearl: simple chain with evidence
Consider again the Bayesian networks B with the chain-shaped graphs of the previous exercise (the diverging connection V1 ← V2 → V3, the converging connection V1 → V2 ← V3, ...).
Suppose we have evidence V3 = true for V3. Answer the following questions for each network above: can nodes V1, V2, and V3 compute the probabilities Pr_{v3}(V1), Pr_{v3}(V2), and Pr_{v3}(V3), respectively? And if so, how?
The parameters as messages
Consider the graph of a Bayesian network as a computational architecture. The separate causal and diagnostic parameters can be considered messages that are passed between objects through communication channels: along an arc (Vj, Vi), node Vj sends π^{Vj}_{Vi} downward and node Vi sends λ^{Vj}_{Vi} upward; likewise, along an arc (Vi, Vk), node Vi sends π^{Vi}_{Vk} downward and node Vk sends λ^{Vi}_{Vk} upward.
Pearl's algorithm (high-level)
Let B = (G, Γ) be a Bayesian network with G = (V_G, A_G); let Pr be the joint distribution defined by B.
For each Vi ∈ V_G do
  await messages from parents (if any) and compute π(Vi)
  await messages from children (if any) and compute λ(Vi)
  compute and send messages π^{Vi}_{Vij}(Vi) to all children Vij
  compute and send messages λ^{Vjk}_{Vi}(Vjk) to all parents Vjk
  compute Pr(Vi | c_E) for evidence c_E (if any)
In the prior network, message passing starts at 'root' nodes; upon processing evidence, message passing is initiated at the observed nodes.
Notation: partial configurations
Definition: A random variable Vj ∈ V is called instantiated if evidence Vj = true or Vj = false is obtained; otherwise Vj is called uninstantiated. Let E ⊆ V be the subset of instantiated variables. The obtained configuration c_E is called a partial configuration of V, written c̃_V.
Example: Consider V = {V1, V2, V3}. If no evidence is obtained (E = ∅), then c̃_V = T(rue). If evidence V2 = false is obtained, then c̃_V = ¬v2.
Note: with c̃_V we can refer to evidence without specifying E.
Singly connected graphs (SCGs)
Definition: A directed graph G is called singly connected if the underlying (undirected) graph of G is acyclic.
Example: (Figure: a singly connected graph containing a node Vi.)
Lemma: Let G be a singly connected graph. Each graph that is obtained from G by removing an arc is not connected.
Definition: A (directed) tree is a singly connected graph in which each node has at most one incoming arc.
Notation: lowergraphs and uppergraphs
Definition: Let G = (V_G, A_G) be a singly connected graph and let G_{(Vi,Vj)} be the subgraph of G after removing the arc (Vi, Vj) ∈ A_G:
G_{(Vi,Vj)} = (V_G, A_G \ {(Vi, Vj)})
Now consider a node Vi ∈ V_G:
• For each node Vj ∈ ρ(Vi), let G^+_{(Vj,Vi)} be the component of G_{(Vj,Vi)} that contains Vj; G^+_{(Vj,Vi)} is called an uppergraph of Vi.
• For each node Vk ∈ σ(Vi), let G^−_{(Vi,Vk)} be the component of G_{(Vi,Vk)} that contains Vk; G^−_{(Vi,Vk)} is called a lowergraph of Vi.
An example
Consider a graph with arcs (V1, V0), (V2, V0), (V0, V3), (V0, V4). Node V0 has:
– two uppergraphs, G^+_{(V1,V0)} and G^+_{(V2,V0)}
– two lowergraphs, G^−_{(V0,V3)} and G^−_{(V0,V4)}
For this graph we have, for example, that
I(V_{G^+_{(V1,V0)}}, {V0}, V_{G^−_{(V0,V3)}})
I(V_{G^−_{(V0,V3)}}, {V0}, V_{G^−_{(V0,V4)}})
I(V_{G^+_{(V1,V0)}}, ∅, V_{G^+_{(V2,V0)}})
Computing probabilities in singly connected graphs
Lemma: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G), V_G = V = {V1, ..., Vn}, n ≥ 1; let Pr be the joint distribution defined by B.
For Vi ∈ V, let V^+_i = ∪_{Vj ∈ ρ(Vi)} V_{G^+_{(Vj,Vi)}} and V^−_i = V \ V^+_i. Then
Pr(Vi | c̃_V) = α · Pr(c̃_{V^−_i} | Vi) · Pr(Vi | c̃_{V^+_i})
where c̃_V = c̃_{V^−_i} ∧ c̃_{V^+_i} and α is a normalisation constant.
Computing probabilities in singly connected graphs
Proof:
Pr(Vi | c̃_V) = Pr(Vi | c̃_{V^−_i} ∧ c̃_{V^+_i})
= Pr(c̃_{V^−_i} | Vi) · Pr(c̃_{V^+_i} | Vi) · Pr(Vi) / Pr(c̃_{V^−_i} ∧ c̃_{V^+_i})
= Pr(c̃_{V^−_i} | Vi) · Pr(Vi | c̃_{V^+_i}) · Pr(c̃_{V^+_i}) / Pr(c̃_{V^−_i} ∧ c̃_{V^+_i})
= α · Pr(c̃_{V^−_i} | Vi) · Pr(Vi | c̃_{V^+_i})
where α = 1 / Pr(c̃_{V^−_i} | c̃_{V^+_i}). □
Compound parameters: definition
Definition: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G); let Pr be the joint distribution defined by B. For Vi ∈ V_G, let V^+_i and V^−_i be as before;
• the function π : {vi, ¬vi} → [0, 1] for node Vi is defined by π(Vi) = Pr(Vi | c̃_{V^+_i}) and is called the compound causal parameter for Vi;
• the function λ : {vi, ¬vi} → [0, 1] for node Vi is defined by λ(Vi) = Pr(c̃_{V^−_i} | Vi) and is called the compound diagnostic parameter for Vi.
Computing probabilities in singly connected graphs
Lemma ('Data Fusion'): Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G); let Pr be the joint distribution defined by B. Then for each Vi ∈ V_G:
Pr(Vi | c̃_{V_G}) = α · π(Vi) · λ(Vi)
with compound causal parameter π, compound diagnostic parameter λ, and normalisation constant α.
Proof: Follows directly from the previous lemma and the definitions of the compound parameters. □
The separate parameters defined
Definition: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G); let Pr be the joint distribution defined by B. Let Vi ∈ V_G be a node with child Vk ∈ σ(Vi) and parent Vj ∈ ρ(Vi);
• the function π^{Vi}_{Vk} : {vi, ¬vi} → [0, 1] is defined by π^{Vi}_{Vk}(Vi) = Pr(Vi | c̃_{V_{G^+_{(Vi,Vk)}}}) and is called the causal parameter from Vi to Vk;
• the function λ^{Vj}_{Vi} : {vj, ¬vj} → [0, 1] is defined by λ^{Vj}_{Vi}(Vj) = Pr(c̃_{V_{G^−_{(Vj,Vi)}}} | Vj) and is called the diagnostic parameter from Vi to Vj.
(Figure: the node set V(G^+_{(Vi,Vk)}) above the arc (Vi, Vk) and the node set V(G^−_{(Vj,Vi)}) below the arc (Vj, Vi).)
Separate parameters in directed trees
(Figure: in a directed tree the set V(G^+_{(Vi,Vk)}) coincides with V^+_k and the set V(G^−_{(Vj,Vi)}) coincides with V^−_i.)
Computing compound causal parameters in singly connected graphs
Lemma: Let B = (G, Γ) be as before. Consider a node Vi ∈ V_G and its parents ρ(Vi) = {Vi1, ..., Vim}, m ≥ 1. Then
π(Vi) = Σ_{c_{ρ(Vi)}} γ(Vi | c_{ρ(Vi)}) · Π_{j=1,...,m} π^{Vij}_{Vi}(c_{Vij})
where c_{ρ(Vi)} = ∧_{j=1,...,m} c_{Vij}.
Note that each c_{Vij} used in the product should be consistent with the c_{ρ(Vi)} from the summand!
(Figure: the uppergraphs V(G^+_{(Vi1,Vi)}), ..., V(G^+_{(Vim,Vi)}) of the parents Vi1, ..., Vim together constitute V^+_i.)
Computing compound causal parameters in singly connected graphs
Proof: Let Pr be the joint distribution defined by B. Then
π(Vi) = Pr(Vi | c̃_{V^+_i})   (definition)
= Pr(Vi | c̃_{V_{G^+_{(Vi1,Vi)}}} ∧ ... ∧ c̃_{V_{G^+_{(Vim,Vi)}}})
= Σ_{c_{ρ(Vi)}} Pr(Vi | c_{ρ(Vi)} ∧ c̃_{V_{G^+_{(Vi1,Vi)}}} ∧ ... ∧ c̃_{V_{G^+_{(Vim,Vi)}}}) · Pr(c_{ρ(Vi)} | c̃_{V_{G^+_{(Vi1,Vi)}}} ∧ ... ∧ c̃_{V_{G^+_{(Vim,Vi)}}})
= Σ_{c_{ρ(Vi)}} Pr(Vi | c_{ρ(Vi)}) · Π_{j=1,...,m} Pr(c_{Vij} | c̃_{V_{G^+_{(Vij,Vi)}}})
= Σ_{c_{ρ(Vi)}} γ(Vi | c_{ρ(Vi)}) · Π_{j=1,...,m} π^{Vij}_{Vi}(c_{Vij})
where c_{ρ(Vi)} = ∧_{j=1,...,m} c_{Vij}. □
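A sketch of the lemma's formula, assuming CPTs are stored as dictionaries keyed by tuples of parent values. The example call uses the assessment functions of the singly connected example later in this section, so the expected result is (0.71, 0.29).

```python
from itertools import product

def compound_pi(cpt, parent_pis):
    """Compound causal parameter pi(Vi): sum over all parent configurations of
    gamma(Vi | c_rho(Vi)) times the product of the incoming pi-messages.
    `cpt` maps a tuple of parent values to {True: p, False: p};
    `parent_pis` holds one pi-message (a dict over True/False) per parent."""
    pi = {True: 0.0, False: 0.0}
    for config in product((True, False), repeat=len(parent_pis)):
        weight = 1.0
        for value, msg in zip(config, parent_pis):
            weight *= msg[value]                 # messages consistent with this configuration
        for v in (True, False):
            pi[v] += cpt[config][v] * weight
    return pi

cpt_v1 = {(True, True):  {True: 0.8, False: 0.2},
          (True, False): {True: 0.5, False: 0.5},
          (False, True): {True: 0.9, False: 0.1},
          (False, False): {True: 0.6, False: 0.4}}
print(compound_pi(cpt_v1, [{True: 0.1, False: 0.9}, {True: 0.4, False: 0.6}]))
# {True: 0.71, False: 0.29}
```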
Computing π in directed trees
Lemma: Let B = (G, Γ) be a Bayesian network with directed tree G. Consider a node Vi ∈ V_G and its parent ρ(Vi) = {Vj}. Then
π(Vi) = Σ_{c_{Vj}} γ(Vi | c_{Vj}) · π^{Vj}_{Vi}(c_{Vj})
Proof: See the proof for the general case where G is a singly connected graph, taking into account that Vi now has only a single parent Vj. □
Computing causal parameters in singly connected graphs
Lemma: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G). Consider an uninstantiated node Vi ∈ V_G with m ≥ 1 children σ(Vi) = {Vi1, ..., Vim}. Then
π^{Vi}_{Vij}(Vi) = α · π(Vi) · Π_{k=1,...,m, k≠j} λ^{Vi}_{Vik}(Vi)
where α is a normalisation constant.
Computing causal parameters in singly connected graphs
Proof: Let Pr be the joint distribution defined by B. Then
π^{Vi}_{Vij}(Vi) = Pr(Vi | c̃_{V_{G^+_{(Vi,Vij)}}})   (definition)
= α′ · Pr(c̃_{V_{G^+_{(Vi,Vij)}}} | Vi) · Pr(Vi)
= α′ · Pr(c̃_{V^+_i} ∧ (∧_{k≠j} c̃_{V_{G^−_{(Vi,Vik)}}}) | Vi) · Pr(Vi)
= α′ · Pr(c̃_{V^+_i} | Vi) · Π_{k≠j} Pr(c̃_{V_{G^−_{(Vi,Vik)}}} | Vi) · Pr(Vi)
= α · Pr(Vi | c̃_{V^+_i}) · Π_{k≠j} Pr(c̃_{V_{G^−_{(Vi,Vik)}}} | Vi)
= α · π(Vi) · Π_{k≠j} λ^{Vi}_{Vik}(Vi) □
Computing compound diagnostic parameters in singly connected graphs
Lemma: Let B = (G, Γ) be as before. Consider an uninstantiated node Vi ∈ V_G with m ≥ 1 children σ(Vi) = {Vi1, ..., Vim}. Then
λ(Vi) = Π_{j=1,...,m} λ^{Vi}_{Vij}(Vi)
Computing compound diagnostic parameters in singly connected graphs
Proof: Let Pr be the joint distribution defined by B. Then
λ(Vi) = Pr(c̃_{V^−_i} | Vi)   (definition)
= Pr(c̃_{V_{G^−_{(Vi,Vi1)}}} ∧ ... ∧ c̃_{V_{G^−_{(Vi,Vim)}}} | Vi)
= Pr(c̃_{V_{G^−_{(Vi,Vi1)}}} | Vi) · ... · Pr(c̃_{V_{G^−_{(Vi,Vim)}}} | Vi)
= λ^{Vi}_{Vi1}(Vi) · ... · λ^{Vi}_{Vim}(Vi)
= Π_{j=1,...,m} λ^{Vi}_{Vij}(Vi) □
Computing diagnostic parameters in singly connected graphs
Lemma: Let B = (G, Γ) be as before. Consider a node Vi ∈ V_G with n ≥ 1 parents ρ(Vi) = {Vj1, ..., Vjn}. Then
λ^{Vjk}_{Vi}(Vjk) = α · Σ_{c_{Vi}} λ(c_{Vi}) · Σ_{x = c_{ρ(Vi) \ {Vjk}}} ( γ(c_{Vi} | x ∧ Vjk) · Π_{l=1,...,n, l≠k} π^{Vjl}_{Vi}(c_{Vjl}) )
where α is a normalisation constant.
Note that each c_{Vjl} used in the product should be consistent with the x from the summand!
Proof: see syllabus. □
Computing separate λ's in directed trees
Lemma: Let B = (G, Γ) be a Bayesian network with directed tree G. Consider a node Vi ∈ V_G and its parent ρ(Vi) = {Vj}. Then
λ^{Vj}_{Vi}(Vj) = Σ_{c_{Vi}} λ(c_{Vi}) · γ(c_{Vi} | Vj)
Computing separate λ's in directed trees
Proof: Let Pr be the joint distribution defined by B. Then
λ^{Vj}_{Vi}(Vj) = Pr(c̃_{V^−_i} | Vj)   (definition)
= Pr(c̃_{V^−_i} | vi ∧ Vj) · Pr(vi | Vj) + Pr(c̃_{V^−_i} | ¬vi ∧ Vj) · Pr(¬vi | Vj)
= Pr(c̃_{V^−_i} | vi) · Pr(vi | Vj) + Pr(c̃_{V^−_i} | ¬vi) · Pr(¬vi | Vj)
= λ(vi) · γ(vi | Vj) + λ(¬vi) · γ(¬vi | Vj)
= Σ_{c_{Vi}} λ(c_{Vi}) · γ(c_{Vi} | Vj) □
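A sketch of this directed-tree rule. The CPT below is the one for V3 given V2 from the tree example that follows; a compound λ of (1, 0) corresponds to V3 being observed true, and the resulting message is exactly γ(v3 | V2).

```python
def lambda_message_to_parent(compound_lambda, cpt_child_given_parent):
    """lambda^{Vj}_{Vi}(Vj) = sum over the values c of Vi of lambda(c) * gamma(c | Vj),
    for a node Vi with a single parent Vj (the directed-tree case)."""
    return {vj: sum(compound_lambda[c] * cpt_child_given_parent[vj][c]
                    for c in (True, False))
            for vj in (True, False)}

gamma_v3_given_v2 = {True: {True: 0.2, False: 0.8},    # gamma(V3 | v2)
                     False: {True: 0.3, False: 0.7}}   # gamma(V3 | ~v2)
print(lambda_message_to_parent({True: 1.0, False: 0.0}, gamma_v3_given_v2))
# {True: 0.2, False: 0.3} -- gamma(v3 | v2) and gamma(v3 | ~v2)
```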
Pearl's algorithm: detailed computation rules for inference
For Vi ∈ V_G with ρ(Vi) = {Vj1, ..., Vjn} and σ(Vi) = {Vi1, ..., Vim}:
Pr(Vi | c̃_V) = α · π(Vi) · λ(Vi)
π(Vi) = Σ_{c_{ρ(Vi)}} γ(Vi | c_{ρ(Vi)}) · Π_{k=1,...,n} π^{Vjk}_{Vi}(c_{Vjk})
λ(Vi) = Π_{j=1,...,m} λ^{Vi}_{Vij}(Vi)
π^{Vi}_{Vij}(Vi) = α′ · π(Vi) · Π_{k=1,...,m, k≠j} λ^{Vi}_{Vik}(Vi)
λ^{Vjk}_{Vi}(Vjk) = α″ · Σ_{c_{Vi}} λ(c_{Vi}) · Σ_{x = c_{ρ(Vi) \ {Vjk}}} ( γ(c_{Vi} | x ∧ Vjk) · Π_{l=1,...,n, l≠k} π^{Vjl}_{Vi}(c_{Vjl}) )
with normalisation constants α, α′, and α″.
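These rules can be checked against brute-force enumeration on a small case. The sketch below uses the V1 → V2 → V3 subchain of the tree example that follows, with evidence V3 = true, and confirms that α · π(V2) · λ(V2) equals the posterior obtained by conditioning the joint distribution directly.

```python
from itertools import product

g1 = {True: 0.7, False: 0.3}
g2 = {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}}  # gamma(V2 | V1)
g3 = {True: {True: 0.2, False: 0.8}, False: {True: 0.3, False: 0.7}}  # gamma(V3 | V2)

# Message-based posterior for V2: pi from V1, lambda from the observed child V3.
pi_v2  = {v2: sum(g2[v1][v2] * g1[v1] for v1 in (True, False)) for v2 in (True, False)}
lam_v2 = {v2: g3[v2][True] for v2 in (True, False)}               # Pr(v3 | V2)
unnorm = {v2: pi_v2[v2] * lam_v2[v2] for v2 in (True, False)}
pearl  = {v2: p / sum(unnorm.values()) for v2, p in unnorm.items()}

# Brute-force check: enumerate the joint distribution and condition on v3.
joint = {c: g1[c[0]] * g2[c[0]][c[1]] * g3[c[1]][c[2]]
         for c in product((True, False), repeat=3)}               # c = (v1, v2, v3)
pv3 = sum(p for c, p in joint.items() if c[2])
brute = {v2: sum(p for c, p in joint.items() if c[2] and c[1] == v2) / pv3
         for v2 in (True, False)}
print(pearl, brute)  # the two dictionaries agree
```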
Special cases: roots
Let B = (G, Γ) be a Bayesian network with singly connected graph G; let Pr be the joint distribution defined by B.
• Consider a node W ∈ V_G with ρ(W) = ∅. The compound causal parameter π : {w, ¬w} → [0, 1] for W is defined by
π(W) = Pr(W | c̃_{W^+})   (definition)
= Pr(W | T)   (W^+ = ∅)
= Pr(W) = γ(W)
Special cases: leaves
Let B = (G, Γ) and Pr be as before.
• Consider a node V with σ(V) = ∅. The compound diagnostic parameter λ : {v, ¬v} → [0, 1] for V is defined as follows:
  • if node V is uninstantiated, then
    λ(V) = Pr(c̃_{V^−} | V)   (definition)
    = Pr(T | V)   (V^− = {V}, V uninstantiated)
    = 1
  • if node V is instantiated, then
    λ(V) = Pr(c̃_{V^−} | V)   (definition)
    = Pr(c̃_V | V)   (σ(V) = ∅)
    = 1 for c_V = c̃_V, and 0 for c_V ≠ c̃_V
Special cases: uninstantiated (sub)graphs
"a useful property"
• Consider a node V ∈ V_G and assume that c̃_{V_G} = T(rue). The compound diagnostic parameter λ : {v, ¬v} → [0, 1] for V is defined as follows:
λ(V) = Pr(c̃_{V^−} | V)   (definition)
= Pr(T | V)   (c̃_{V_G} = T)
= 1
From the above it is clear that this property also holds for any node V for which c̃_{V^−} = T.
Pearl's algorithm: a tree example
Consider Bayesian network B = (G, Γ) with tree-shaped graph V1 → V2, V1 → V5, V2 → V3, V2 → V4, and assessment functions
γ(v1) = 0.7
γ(v2 | v1) = 0.5, γ(v2 | ¬v1) = 0.4
γ(v5 | v1) = 0.1, γ(v5 | ¬v1) = 0.8
γ(v3 | v2) = 0.2, γ(v3 | ¬v2) = 0.3
γ(v4 | v2) = 0.8, γ(v4 | ¬v2) = 0
Let Pr be the joint distribution defined by B.
Assignment: compute Pr(Vi), i = 1, ..., 5.
Start: Pr(Vi) = α · π(Vi) · λ(Vi), i = 1, ..., 5. λ(Vi) = 1 for all Vi. Why? As a result, no normalisation is required and Pr(Vi) = π(Vi).
An example (2)
π(V1) = γ(V1). Why?
Node V1 computes:
Pr(v1) = π(v1) = γ(v1) = 0.7
Pr(¬v1) = π(¬v1) = γ(¬v1) = 0.3
Node V1 computes for node V2: π^{V1}_{V2}(V1) = π(V1) (why?)
An example (3)
Node V2 computes:
Pr(v2) = π(v2) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = γ(v2 | v1) · π(v1) + γ(v2 | ¬v1) · π(¬v1) = 0.5 · 0.7 + 0.4 · 0.3 = 0.47
Pr(¬v2) = π(¬v2) = 0.5 · 0.7 + 0.6 · 0.3 = 0.53
An example (4)
Node V2 computes for node V3: π^{V2}_{V3}(V2) = π(V2)
Are all causal parameters sent by a node equal to its compound causal parameter?
An example (5)
Node V3 computes:
Pr(v3) = π(v3) = γ(v3 | v2) · π^{V2}_{V3}(v2) + γ(v3 | ¬v2) · π^{V2}_{V3}(¬v2) = γ(v3 | v2) · π(v2) + γ(v3 | ¬v2) · π(¬v2) = 0.2 · 0.47 + 0.3 · 0.53 = 0.253
Pr(¬v3) = π(¬v3) = 0.8 · 0.47 + 0.7 · 0.53 = 0.747
An example (6)
In a similar way, we find that
Pr(v4) = 0.376, Pr(¬v4) = 0.624
Pr(v5) = 0.310, Pr(¬v5) = 0.690 □
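For reference, a sketch that reproduces all five prior marginals of this tree example by pure π-propagation down the tree.

```python
g = {
    'V1': {True: 0.7, False: 0.3},
    'V2': {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}},  # parent V1
    'V5': {True: {True: 0.1, False: 0.9}, False: {True: 0.8, False: 0.2}},  # parent V1
    'V3': {True: {True: 0.2, False: 0.8}, False: {True: 0.3, False: 0.7}},  # parent V2
    'V4': {True: {True: 0.8, False: 0.2}, False: {True: 0.0, False: 1.0}},  # parent V2
}

def propagate(cpt, parent_pi):
    return {c: sum(cpt[p][c] * parent_pi[p] for p in (True, False)) for c in (True, False)}

pr = {'V1': dict(g['V1'])}
pr['V2'] = propagate(g['V2'], pr['V1'])  # (0.47, 0.53)
pr['V5'] = propagate(g['V5'], pr['V1'])  # (0.31, 0.69)
pr['V3'] = propagate(g['V3'], pr['V2'])  # (0.253, 0.747)
pr['V4'] = propagate(g['V4'], pr['V2'])  # (0.376, 0.624)
for node, dist in pr.items():
    print(node, round(dist[True], 3), round(dist[False], 3))
```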
Pearl's algorithm: a singly connected example
Consider Bayesian network B = (G, Γ) with graph V2 → V1 ← V3 and assessment functions
γ(v2) = 0.1, γ(¬v2) = 0.9
γ(v3) = 0.4, γ(¬v3) = 0.6
γ(v1 | v2 ∧ v3) = 0.8, γ(¬v1 | v2 ∧ v3) = 0.2
γ(v1 | ¬v2 ∧ v3) = 0.9, γ(¬v1 | ¬v2 ∧ v3) = 0.1
γ(v1 | v2 ∧ ¬v3) = 0.5, γ(¬v1 | v2 ∧ ¬v3) = 0.5
γ(v1 | ¬v2 ∧ ¬v3) = 0.6, γ(¬v1 | ¬v2 ∧ ¬v3) = 0.4
Let Pr be the joint distribution defined by B.
Assignment: compute Pr(V1) = α · π(V1) · λ(V1). λ(V1) = 1, so no normalisation is required.
An example (2)
Node V1 computes:
Pr(v1) = π(v1) = γ(v1 | v2 ∧ v3) · π^{V2}_{V1}(v2) · π^{V3}_{V1}(v3)
  + γ(v1 | ¬v2 ∧ v3) · π^{V2}_{V1}(¬v2) · π^{V3}_{V1}(v3)
  + γ(v1 | v2 ∧ ¬v3) · π^{V2}_{V1}(v2) · π^{V3}_{V1}(¬v3)
  + γ(v1 | ¬v2 ∧ ¬v3) · π^{V2}_{V1}(¬v2) · π^{V3}_{V1}(¬v3)
= 0.8 · 0.1 · 0.4 + 0.9 · 0.9 · 0.4 + 0.5 · 0.1 · 0.6 + 0.6 · 0.9 · 0.6 = 0.71
Pr(¬v1) = 0.29 □
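A direct check of this number, summing over the joint distribution instead of using messages:

```python
from itertools import product

g2 = {True: 0.1, False: 0.9}   # prior of V2
g3 = {True: 0.4, False: 0.6}   # prior of V3
g1 = {(True, True): 0.8, (False, True): 0.9,
      (True, False): 0.5, (False, False): 0.6}   # Pr(v1 | V2, V3)

pr_v1 = sum(g1[(v2, v3)] * g2[v2] * g3[v3]
            for v2, v3 in product((True, False), repeat=2))
print(round(pr_v1, 2))  # 0.71
```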
Instantiated nodes
Let B = (G, Γ) be a Bayesian network with singly connected graph G; let Pr be as before.
• Consider an instantiated node V ∈ V_G for which evidence V = true is obtained. For the compound diagnostic parameter λ : {v, ¬v} → [0, 1] for V we have that
λ(v) = Pr(c̃_{V^−} | v)   (definition)
= Pr(c̃_{V^− \ {V}} ∧ v | v) = ??   (unless σ(V) = ∅, in which case λ(v) = 1)
λ(¬v) = Pr(c̃_{V^−} | ¬v)   (definition)
= Pr(c̃_{V^− \ {V}} ∧ v | ¬v) = 0
The case with evidence V = false is similar.
Entering evidence
Consider a fragment of graph G of a Bayesian network containing a node V. Suppose evidence is obtained for node V. Entering evidence is modelled by extending G with a 'dummy' child D for V. The dummy node sends the diagnostic parameter λ^V_D to V with
λ^V_D(v) = 1, λ^V_D(¬v) = 0   for evidence V = true
λ^V_D(v) = 0, λ^V_D(¬v) = 1   for evidence V = false
Entering evidence: a tree example
Let Pr and B be as before (the tree example with V1 → V2, V1 → V5, V2 → V3, V2 → V4). Evidence V1 = false is entered.
Assignment: compute Pr_{¬v1}(Vi).
Start: Pr_{¬v1}(Vi) = α · π(Vi) · λ(Vi), i = 1, ..., 5. For i = 2, ..., 5, we have that λ(Vi) = 1. Why? For those nodes we thus have Pr_{¬v1}(Vi) = π(Vi).
An example with evidence V1 = false (2)
Node V1 now computes:
Pr_{¬v1}(v1) = α · π(v1) · λ(v1) = 0
Pr_{¬v1}(¬v1) = α · π(¬v1) · λ(¬v1) = α · 0.3
Normalisation gives: Pr_{¬v1}(v1) = 0, Pr_{¬v1}(¬v1) = 1
Node V1 computes for node V2: π^{V1}_{V2}(V1) = α · π(V1) · λ^{V1}_{V5}(V1) · λ^{V1}_{D}(V1) = ?
An example with evidence V1 = false (3)
Node V2 computes:
Pr_{¬v1}(v2) = π(v2) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = 0.5 · 0 + 0.4 · 1 = 0.4
Pr_{¬v1}(¬v2) = π(¬v2) = 0.5 · 0 + 0.6 · 1 = 0.6
Node V2 computes for node V3: π^{V2}_{V3}(V2) = π(V2). Why?
An example with evidence V1 = false (4)
Node V3 computes:
Pr_{¬v1}(v3) = π(v3) = γ(v3 | v2) · π^{V2}_{V3}(v2) + γ(v3 | ¬v2) · π^{V2}_{V3}(¬v2) = γ(v3 | v2) · π(v2) + γ(v3 | ¬v2) · π(¬v2) = 0.2 · 0.4 + 0.3 · 0.6 = 0.26
Pr_{¬v1}(¬v3) = 0.8 · 0.4 + 0.7 · 0.6 = 0.74
An example with evidence V1 = false (5)
In a similar way, we find that
Pr_{¬v1}(v4) = 0.32, Pr_{¬v1}(¬v4) = 0.68
Pr_{¬v1}(v5) = 0.80, Pr_{¬v1}(¬v5) = 0.20 □
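Since the evidence sits at the root here, the posteriors follow from forward propagation alone: the evidence clamps the root's outgoing π-message to (0, 1). The sketch below reproduces the numbers above.

```python
g2 = {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}}  # gamma(V2 | V1)
g5 = {True: {True: 0.1, False: 0.9}, False: {True: 0.8, False: 0.2}}  # gamma(V5 | V1)
g3 = {True: {True: 0.2, False: 0.8}, False: {True: 0.3, False: 0.7}}  # gamma(V3 | V2)
g4 = {True: {True: 0.8, False: 0.2}, False: {True: 0.0, False: 1.0}}  # gamma(V4 | V2)

def propagate(cpt, parent_pi):
    return {c: sum(cpt[p][c] * parent_pi[p] for p in (True, False)) for c in (True, False)}

pi_root = {True: 0.0, False: 1.0}   # evidence V1 = false
pr_v2 = propagate(g2, pi_root)      # (0.4, 0.6)
pr_v5 = propagate(g5, pi_root)      # (0.8, 0.2)
pr_v3 = propagate(g3, pr_v2)        # (0.26, 0.74)
pr_v4 = propagate(g4, pr_v2)        # (0.32, 0.68)
print(pr_v2, pr_v5, pr_v3, pr_v4)
```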
Another piece of evidence: tree example
Let Pr and B be as before. The additional evidence V3 = true is entered.
Assignment: compute Pr_{¬v1,v3}(Vi).
Start: Pr_{¬v1,v3}(Vi) = α · π(Vi) · λ(Vi), i = 1, ..., 5.
Which parameters can be re-used and which should be updated?
Another example (2)
For i = 4, 5, we have that λ(Vi) = 1. For those two nodes we thus have Pr_{¬v1,v3}(Vi) = π(Vi).
The probabilities for V1 remain unchanged: Pr_{¬v1,v3}(v1) = 0, Pr_{¬v1,v3}(¬v1) = 1
The probabilities for node V5 remain unchanged. Why? Therefore
Pr_{¬v1,v3}(v5) = Pr_{¬v1}(v5) = 0.8, Pr_{¬v1,v3}(¬v5) = 0.2
Another example (3)
Node V3 computes:
Pr_{¬v1,v3}(v3) = α · π(v3) · λ(v3) = α · π(v3) = α · 0.26. Why?
Pr_{¬v1,v3}(¬v3) = α · π(¬v3) · λ(¬v3) = 0
After normalisation: Pr_{¬v1,v3}(v3) = 1, Pr_{¬v1,v3}(¬v3) = 0
Node V3 computes for node V2: λ^{V2}_{V3}(V2) = Σ_{c_{V3}} λ(c_{V3}) · γ(c_{V3} | V2)
Another example (4)
Node V2 computes:
Pr_{¬v1,v3}(v2) = α · π(v2) · λ(v2) = α · π(v2) · λ^{V2}_{V3}(v2) · λ^{V2}_{V4}(v2) = α · π(v2) · γ(v3 | v2) = α · 0.4 · 0.2 = α · 0.08
Pr_{¬v1,v3}(¬v2) = α · π(¬v2) · λ(¬v2) = α · π(¬v2) · λ^{V2}_{V3}(¬v2) · λ^{V2}_{V4}(¬v2) = α · π(¬v2) · γ(v3 | ¬v2) = α · 0.6 · 0.3 = α · 0.18
Normalisation results in: Pr_{¬v1,v3}(v2) = 0.31, Pr_{¬v1,v3}(¬v2) = 0.69
Another example (5)
Node V2 computes for node V4: π^{V2}_{V4}(V2) = α · π(V2) · λ^{V2}_{V3}(V2) ⟹ 0.31 and 0.69
Node V4 computes:
Pr_{¬v1,v3}(v4) = π(v4) = γ(v4 | v2) · π^{V2}_{V4}(v2) + γ(v4 | ¬v2) · π^{V2}_{V4}(¬v2) = γ(v4 | v2) · π^{V2}_{V4}(v2) + 0 = 0.8 · 0.31 = 0.248
Pr_{¬v1,v3}(¬v4) = 0.2 · 0.31 + 1.0 · 0.69 = 0.752 □
Entering evidence: a singly connected example
Let Pr and B be as before (the example with graph V2 → V1 ← V3). Evidence V1 = true is entered.
Assignment: compute Pr_{v1}(V2) = α · π(V2) · λ(V2).
An example with evidence V1 = true (2)
Node V1 computes for node V2:
λ^{V2}_{V1}(v2) = λ(v1) · [γ(v1 | v2 ∧ v3) · π^{V3}_{V1}(v3) + γ(v1 | v2 ∧ ¬v3) · π^{V3}_{V1}(¬v3)]
  + λ(¬v1) · [γ(¬v1 | v2 ∧ v3) · π^{V3}_{V1}(v3) + γ(¬v1 | v2 ∧ ¬v3) · π^{V3}_{V1}(¬v3)]
= 0.8 · 0.4 + 0.5 · 0.6 = 0.62
λ^{V2}_{V1}(¬v2) = 0.9 · 0.4 + 0.6 · 0.6 = 0.72
An example with evidence V1 = true (3)
Node V2 computes:
Pr_{v1}(v2) = α · π(v2) · λ(v2) = α · γ(v2) · λ^{V2}_{V1}(v2) = α · 0.1 · 0.62 = 0.062 α
Pr_{v1}(¬v2) = α · 0.9 · 0.72 = 0.648 α
Normalisation gives: Pr_{v1}(v2) ≈ 0.087, Pr_{v1}(¬v2) ≈ 0.913 □
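A sketch reproducing this computation: the λ-message from V1 to V2 marginalises the other parent V3 out (weighted by its π-message), and data fusion at V2 then gives the posterior.

```python
g2 = {True: 0.1, False: 0.9}   # prior of V2, i.e. its pi-message to V1
g3 = {True: 0.4, False: 0.6}   # prior of V3, i.e. its pi-message to V1
g1 = {(True, True): 0.8, (False, True): 0.9,
      (True, False): 0.5, (False, False): 0.6}   # Pr(v1 | V2, V3)

# Diagnostic message from V1 to V2 for evidence v1: marginalise V3 out.
lam_21 = {v2: sum(g1[(v2, v3)] * g3[v3] for v3 in (True, False))
          for v2 in (True, False)}                          # {True: 0.62, False: 0.72}

unnorm = {v2: g2[v2] * lam_21[v2] for v2 in (True, False)}  # 0.062 and 0.648
posterior = {v2: p / sum(unnorm.values()) for v2, p in unnorm.items()}
print({k: round(v, 3) for k, v in posterior.items()})       # {True: 0.087, False: 0.913}
```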
The message passing
Initially, the Bayesian network is in a stable situation. Once evidence is entered into the network, this stability is disturbed. (Figure: the observed node initiates λ and π messages.)
The message passing, continued
Evidence initiates message passing throughout the entire network. When each node in the network has been visited by the message-passing algorithm, the network returns to a new stable situation.
Pearl: some complexity issues
Consider a Bayesian network B with singly connected digraph G with n ≥ 1 nodes. Suppose that node V has O(n) parents ρ(V) = {W1, ..., Wp} and O(n) children σ(V) = {Z1, ..., Zs}.
• Computing the compound causal parameter requires at most O(2^n) time:
π(V) = Σ_{c_{ρ(V)}} γ(V | c_{ρ(V)}) · Π_{i=1,...,p} π^{Wi}_{V}(c_{Wi})
Complexity issues (2)
• Computing the compound diagnostic parameter requires at most O(n) time:
λ(V) = Π_{j=1,...,s} λ^{V}_{Zj}(V)
A node can therefore compute the probabilities for its values in at most O(2^n) time.