Probabilistic Inference in BN2T Models by Weighted Model Counting


  1. Probabilistic Inference in BN2T Models by Weighted Model Counting. Jirka Vomlel, Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, http://www.utia.cz/vomlel. Aalborg, Denmark, November 21, 2013.

  2. A medical example: Carcinoid Heart Disease (CHD), van Gerven (2003)
  • Eleven CHD risk factors X_1, ..., X_11 measured at patient admission to the clinic: diarrhea, hepatic metastases, etc.
  • The dependent variable Y has two values: 0 if CHD does not develop, or 1 if it does.
  • The conditional probability P(Y | X_1, ..., X_11) is modeled by a noisy threshold model with ℓ = 6.
  • The threshold model (without noise) implies that CHD develops iff at least 6 risk factors are positive.
  • The noise on the inputs allows a non-zero probability of no CHD even if at least 6 risk factors are positive.

  3. BN2T: a Bayesian Network with 2 Layers Consisting of Noisy Threshold Models
  [Figure: risk factors X_1, X_2, X_3, X_4 in the top layer, each with edges to the bottom-layer nodes Y_1 and Y_2.]
  Y_j takes value 1 iff at least ℓ out of its k parents X_i take value 1. In what follows, assume ℓ = 2.

  4. For the deterministic threshold (ℓ = 2), the value of p = P(Y_1 = 1 | X_1, X_2, X_3) is:

     X_1  X_2  X_3 | p
      0    0    0  | 0
      0    0    1  | 0
      0    1    0  | 0
      0    1    1  | 1
      1    0    0  | 0
      1    0    1  | 1
      1    1    0  | 1
      1    1    1  | 1
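
A minimal sketch reproducing this table in Python (the helper name threshold_p is ours, not from the talk):

```python
from itertools import product

def threshold_p(xs, ell):
    """Deterministic threshold: P(Y = 1 | X) is 1 iff at least ell inputs are 1."""
    return 1 if sum(xs) >= ell else 0

# Reproduce the CPT above for k = 3 parents and ell = 2.
for xs in product([0, 1], repeat=3):
    print(*xs, '->', threshold_p(xs, ell=2))
```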

  5. For the noisy threshold (ℓ = 2), the value of p′ = P(Y_1 = 1 | X_1, X_2, X_3) is:

     X_1  X_2  X_3 | p′
      0    0    0  | 0
      0    0    1  | 0
      0    1    0  | 0
      0    1    1  | (1 − p_2)(1 − p_3)
      1    0    0  | 0
      1    0    1  | (1 − p_1)(1 − p_3)
      1    1    0  | (1 − p_1)(1 − p_2)
      1    1    1  | (1 − p_1)(1 − p_2)(1 − p_3) + p_1(1 − p_2)(1 − p_3) + (1 − p_1)p_2(1 − p_3) + (1 − p_1)(1 − p_2)p_3

     where p_1, p_2, p_3 are the inhibitory probabilities.
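
The table entries can be checked by summing over the noise outcomes explicitly; a brute-force sketch (the function name noisy_threshold_p is ours, and the enumeration is exponential in k, so it only illustrates the definition):

```python
from itertools import product

def noisy_threshold_p(xs, inhib, ell):
    """P(Y = 1 | X) for the noisy threshold: a positive input X_i is
    inhibited (forced to 0) independently with probability inhib[i];
    Y = 1 iff at least ell inputs survive."""
    total = 0.0
    for xp in product([0, 1], repeat=len(xs)):    # all noise outcomes X'
        prob = 1.0
        for x, xpi, p in zip(xs, xp, inhib):
            if x == 0:
                prob *= 1.0 if xpi == 0 else 0.0  # a 0 input stays 0
            else:
                prob *= p if xpi == 0 else 1.0 - p
        if sum(xp) >= ell:
            total += prob
    return total

# X = (1, 1, 0) with ell = 2 gives (1 - p1)(1 - p2), as in the table:
p1, p2, p3 = 0.1, 0.2, 0.3
print(noisy_threshold_p((1, 1, 0), (p1, p2, p3), ell=2))  # 0.72 = 0.9 * 0.8
```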

  6. The joint probability of the Bayesian network is

     P(X_1, ..., X_n, Y_1, ..., Y_m) = ∏_{i=1}^{n} P(X_i) · ∏_{j=1}^{m} P(Y_j | pa(Y_j)).
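
A sketch of evaluating this factorization for one full configuration. The container choices (a tuple of priors P(X_i = 1), one shared CPD function for all Y_j) are our own, not from the talk:

```python
def bn2t_joint(x, y, prior, parent_sets, cpd):
    """P(X_1..X_n, Y_1..Y_m) = prod_i P(X_i) * prod_j P(Y_j | pa(Y_j))."""
    p = 1.0
    for xi, pr in zip(x, prior):             # prior[i] = P(X_i = 1)
        p *= pr if xi else 1.0 - pr
    for yj, pa in zip(y, parent_sets):       # pa = indices of Y_j's parents
        py1 = cpd(tuple(x[i] for i in pa))   # P(Y_j = 1 | pa(Y_j))
        p *= py1 if yj else 1.0 - py1
    return p

# Tiny example with a deterministic 2-of-k threshold as the shared CPD:
cpd = lambda pa: 1.0 if sum(pa) >= 2 else 0.0
print(bn2t_joint(x=(1, 0, 1, 1), y=(1, 1),
                 prior=(0.2, 0.5, 0.4, 0.7),
                 parent_sets=[(0, 1, 2), (1, 2, 3)], cpd=cpd))  # 0.028
```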

  7. Noisy-threshold with explicit deterministic and noisy parts
  [Figure: each input X_i has a noisy copy X′_i; the X′_i are the parents of Y.]

     P(X′_i | X_i) = ( 1    p_i   )
                     ( 0  1 − p_i )

     (columns indexed by X_i ∈ {0, 1}, rows by X′_i ∈ {0, 1}), and

     P(Y | X′_1, X′_2, X′_3) = ( (1 1; 1 0)  (1 0; 0 0) )
                               ( (0 0; 0 1)  (0 1; 1 1) )

     where we visualize the CPT as a tensor using nested matrices: outer rows index Y, outer columns X′_1, inner rows X′_2, and inner columns X′_3.
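
A small numpy sketch (our own illustration) that builds this 2 × 2 × 2 × 2 tensor for ℓ = 2 and prints it in the nested-matrix layout of the slide:

```python
import numpy as np

# Build the 2 x 2 x 2 x 2 CPT tensor T[y, x1, x2, x3] = P(Y = y | X')
# for the deterministic threshold with ell = 2.
ell = 2
T = np.zeros((2, 2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            y = int(x1 + x2 + x3 >= ell)
            T[y, x1, x2, x3] = 1.0

# Nested-matrix view: T[0] and T[1] are the two outer rows (Y = 0, Y = 1);
# within each, the first index is X'_1, then X'_2 (rows) and X'_3 (columns).
print(T[0])  # [[[1 1] [1 0]]  [[1 0] [0 0]]]
print(T[1])  # [[[0 0] [0 1]]  [[0 1] [1 1]]]
```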

  8. Tensor of the threshold (for ℓ = 1) as a 4D cube and its decomposition
  [Figure: the 0/1 CPT tensor over Y, X′_1, X′_2, X′_3 drawn as a cube of nested 2 × 2 slices and written as a sum of rank-1 tensors, i.e. outer products of vectors with entries 1, 0, and −1.]
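
The exact rank-1 terms of the figure could not be recovered from the source; as an illustration of the same idea, here is one simple rank-2 decomposition of the ℓ = 1 slice P(Y = 1 | X′), namely the all-ones tensor minus the rank-1 indicator of "all inputs are 0":

```python
import numpy as np

# P(Y = 1 | X') for the ell = 1 threshold (logical OR) over three inputs.
T1 = np.array([[[int(x1 + x2 + x3 >= 1) for x3 in (0, 1)]
                for x2 in (0, 1)] for x1 in (0, 1)])

# Rank-2 CP decomposition: all-ones tensor minus the outer product
# e0 (x) e0 (x) e0, which is 1 exactly at the all-zeros input.
ones = np.ones(2)
e0 = np.array([1.0, 0.0])
approx = (np.einsum('i,j,k->ijk', ones, ones, ones)
          - np.einsum('i,j,k->ijk', e0, e0, e0))
assert np.allclose(T1, approx)
```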

  9. CP tensor decomposition of the threshold CPT (Vomlel and Tichavský, 2012)
  [Figure: X′_1, X′_2, ..., X′_k connected to Y through a single auxiliary variable Y′.]

     P(Y = 1 | X′_1, X′_2, X′_3) = ∑_{Y′} ∏_{i=1}^{3} ψ(X′_i, Y′),

     where ψ(X′_i, Y′) is a fixed 2 × 3 real-valued matrix (one row per value of X′_i, one column per state of the auxiliary variable Y′), identical for i = 1, 2, 3. Instead of one array with 2^k entries we get k arrays with 2k entries!
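
A sketch of why the decomposition pays off: with factors ψ, evaluating P(Y = 1 | x′) is a small sum-product instead of a lookup in a 2^k table. Since the talk's ψ values for ℓ = 2 are not recoverable here, this uses hand-made rank-2 factors for ℓ = 1 (our own choice, verified against the threshold below):

```python
from itertools import product
import numpy as np

# CP factors for the ell = 1 threshold with k = 3 (our own illustration;
# the talk's psi values for ell = 2 are not recoverable from the source).
# psi[x, y'] has rows x' in {0, 1} and one column per CP component y'.
psi = np.array([[1.0, -1.0],
                [1.0,  0.0]])

def p_y1(xs):
    """P(Y = 1 | x') = sum over y' of prod_i psi[x_i, y']."""
    terms = np.ones(psi.shape[1])
    for x in xs:
        terms *= psi[x]      # fold in one 2-entry row per input
    return terms.sum()

# Agrees with the 1-out-of-3 threshold on all 2^3 inputs:
for xs in product([0, 1], repeat=3):
    assert p_y1(xs) == (1 if sum(xs) >= 1 else 0)
```

The point carries over to larger k: the k small factor matrices are combined in O(k · R) time, with R the number of states of Y′, instead of touching a 2^k-entry table.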

  10. Probabilistic inference by weighted model counting (WMC)
  • The basic idea of WMC is to encode a Bayesian network as a conjunctive normal form (CNF) formula,
  • associate weights with literals according to the CPTs of the Bayesian network, and
  • compute the probability of evidence as the sum of the weights of all logical models consistent with that evidence.
  • The weight of a logical model is the product of the weights of all its literals.
  • Efficient WMC solvers exploiting advanced techniques such as clause learning and component caching can be used, e.g. Cachet.
  • If the Bayesian network exhibits a lot of determinism, this is much more efficient than standard techniques. (A brute-force sketch of the semantics follows below.)
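
A brute-force sketch of the WMC semantics, as promised above (real solvers such as Cachet search far more cleverly; the one-variable encoding below is a deliberately tiny toy, not the full Chavira and Darwiche encoding):

```python
from itertools import product

def wmc(n_vars, clauses, weight):
    """Brute-force weighted model counting.
    clauses: list of clauses, each a list of literals (+v or -v, 1-based).
    weight[lit]: weight of a literal; the weight of a model is the
    product of the weights of the literals it makes true."""
    total = 0.0
    for bits in product([False, True], repeat=n_vars):
        sat = all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
        if sat:
            w = 1.0
            for v in range(1, n_vars + 1):
                w *= weight[v] if bits[v - 1] else weight[-v]
            total += w
    return total

# Toy network: one binary variable A with P(A = 1) = 0.3, encoded with
# indicator variable 1; the literal weights play the role of the CPT.
print(wmc(1, clauses=[[1, -1]], weight={1: 0.3, -1: 0.7}))  # 1.0
# Evidence A = 1 is asserted by adding the unit clause [1]:
print(wmc(1, clauses=[[1]], weight={1: 0.3, -1: 0.7}))      # 0.3
```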

  11. Encoding the transformed BN2T as a CNF using the Chavira and Darwiche (2008) encoding
  Clauses for the indicator (λ) and parameter (θ) logical variables:
  • ⊕_{x ∈ X_i} λ_x: "the states of X_i are mutually exclusive",
  • ⊕_{y ∈ Y′_j} λ_y: "the states of Y′_j are mutually exclusive".
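
The ⊕ ("exactly one") constraints compile to standard CNF clauses: one at-least-one clause plus pairwise at-most-one clauses. A small generator sketch (our own helper; the full Chavira and Darwiche encoding also adds parameter clauses for the θ variables, not shown here):

```python
def exactly_one(literals):
    """CNF clauses asserting that exactly one of the given indicator
    literals is true (the 'states are mutually exclusive' constraint):
    one at-least-one clause plus pairwise at-most-one clauses."""
    clauses = [list(literals)]                            # at least one state
    for a in range(len(literals)):
        for b in range(a + 1, len(literals)):
            clauses.append([-literals[a], -literals[b]])  # not both states
    return clauses

# Indicators lambda_x for a three-state variable X_i, numbered 1..3:
print(exactly_one([1, 2, 3]))
# [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```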
