Probabilistic Inference in BN2T Models by Weighted Model Counting
Jirka Vomlel
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic
http://www.utia.cz/vomlel
Aalborg, Denmark, November 21, 2013
A medical example: Carcinoid Heart Disease (CHD), van Gerven (2003)
• Eleven CHD risk factors X_1, ..., X_11 measured at patient admission to the clinic: diarrhea, hepatic metastases, etc.
• The dependent variable Y has two values: 0 if CHD does not develop, or 1 if it does.
• The conditional probability P(Y | X_1, ..., X_11) is modeled by a noisy threshold model with ℓ = 6.
• The threshold model (without noise) implies that CHD develops if at least 6 risk factors are positive.
• The noise on the inputs allows a non-zero probability of no CHD even if at least 6 risk factors are positive (see the computational sketch below).
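A minimal sketch of how P(Y = 1 | X) can be computed for such a noisy threshold model, by dynamic programming over the number of "surviving" (non-inhibited) positive inputs. The inhibition probabilities q below are made-up placeholders; the real values come from van Gerven (2003).

```python
def noisy_threshold(x, q, ell):
    """P(Y=1 | X=x) when each positive input X_i survives inhibition
    with probability 1 - q[i], and Y = 1 iff at least `ell` inputs
    survive. Computed as a Poisson-binomial tail probability."""
    # dist[s] = probability that exactly s inputs have survived so far
    dist = [1.0]
    for xi, qi in zip(x, q):
        p_survive = (1.0 - qi) if xi == 1 else 0.0
        new = [0.0] * (len(dist) + 1)
        for s, ps in enumerate(dist):
            new[s] += ps * (1.0 - p_survive)      # input i inhibited or negative
            new[s + 1] += ps * p_survive          # input i survives
        dist = new
    return sum(dist[ell:])

q = [0.1] * 11   # hypothetical inhibition probabilities, one per risk factor
print(noisy_threshold([1] * 11, q, ell=6))   # all eleven risk factors positive
```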
BN2T - Bayesian Network with 2 Layers Consisting of Noisy Threshold Models

[Figure: two-layer network with parents X_1, X_2, X_3, X_4 and children Y_1, Y_2.]

Y_j takes value 1 iff at least ℓ out of its k parents X_i take value 1. Assume ℓ = 2.

For the deterministic threshold, the value of p = P(Y_1 = 1 | X_1, X_2, X_3) is

X_1 X_2 X_3 | p
 0   0   0  | 0
 0   0   1  | 0
 0   1   0  | 0
 0   1   1  | 1
 1   0   0  | 0
 1   0   1  | 1
 1   1   0  | 1
 1   1   1  | 1
BN2T - Bayesian Network with 2 Layers Consisting of Noisy Threshold Models

For the noisy threshold, the value of p' = P(Y_1 = 1 | X_1, X_2, X_3) is

X_1 X_2 X_3 | p | p'
 0   0   0  | 0 | 0
 0   0   1  | 0 | 0
 0   1   0  | 0 | 0
 0   1   1  | 1 | (1 − p_2)(1 − p_3)
 1   0   0  | 0 | 0
 1   0   1  | 1 | (1 − p_1)(1 − p_3)
 1   1   0  | 1 | (1 − p_1)(1 − p_2)
 1   1   1  | 1 | (1 − p_1)(1 − p_2)(1 − p_3) + p_1(1 − p_2)(1 − p_3) + (1 − p_1)p_2(1 − p_3) + (1 − p_1)(1 − p_2)p_3

where p_1, p_2, p_3 are inhibitory probabilities.
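A brute-force check of the p' column above (k = 3, ℓ = 2), summing over all configurations of the noisy copies X'. The numeric values of p_1, p_2, p_3 are placeholders chosen for illustration.

```python
from itertools import product

p = [0.2, 0.3, 0.4]  # hypothetical inhibitory probabilities p1, p2, p3

def p_prime(x, p, ell=2):
    total = 0.0
    for xp in product([0, 1], repeat=len(x)):
        # P(X'_i = 1 | X_i = 1) = 1 - p_i, and X'_i = 0 whenever X_i = 0
        w = 1.0
        for xi, xpi, pi in zip(x, xp, p):
            if xi == 1:
                w *= (1 - pi) if xpi == 1 else pi
            elif xpi == 1:
                w = 0.0
        if sum(xp) >= ell:   # threshold fires on the noisy copies
            total += w
    return total

for x in product([0, 1], repeat=3):
    print(x, round(p_prime(x, p), 6))
# e.g. x = (0, 1, 1) prints (1 - p2) * (1 - p3) = 0.42
```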
BN2T - Bayesian Network with 2 Layers Consisting of Noisy Threshold Models

The joint probability of the Bayesian network is

P(X_1, ..., X_n, Y_1, ..., Y_m) = ∏_{i=1}^{n} P(X_i) · ∏_{j=1}^{m} P(Y_j | pa(Y_j)).
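A self-contained sketch of this factorization for a tiny BN2T. The structure, priors, and inhibition probabilities are invented for illustration; each CPT P(Y_j | pa(Y_j)) is the noisy threshold computed by summing over the noisy copies.

```python
from itertools import product

prior = [0.5, 0.5, 0.5]        # hypothetical P(X_i = 1)
parents = [[0, 1], [1, 2]]     # pa(Y_1) and pa(Y_2)
q = [0.2, 0.3, 0.4]            # hypothetical inhibition probabilities
ELL = 2

def cpt(y, xs, qs):
    """P(Y_j = y | parents xs) under the noisy threshold with ELL."""
    p1 = 0.0
    for xp in product([0, 1], repeat=len(xs)):
        w = 1.0
        for xi, xpi, qi in zip(xs, xp, qs):
            w *= ((1 - qi) if xpi else qi) if xi else (0.0 if xpi else 1.0)
        if sum(xp) >= ELL:
            p1 += w
    return p1 if y == 1 else 1.0 - p1

def joint(x, y):
    """P(X = x, Y = y) = prod_i P(X_i) * prod_j P(Y_j | pa(Y_j))."""
    w = 1.0
    for xi, pi in zip(x, prior):
        w *= pi if xi else 1 - pi
    for j, yj in enumerate(y):
        xs = [x[i] for i in parents[j]]
        qs = [q[i] for i in parents[j]]
        w *= cpt(yj, xs, qs)
    return w

# marginal P(Y_1 = 1): sum the joint over all remaining variables
print(sum(joint(x, y)
          for x in product([0, 1], repeat=3)
          for y in product([0, 1], repeat=2) if y[0] == 1))
```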
Noisy-threshold with explicit deterministic and noisy parts

[Figure: parents X_1, ..., X_k, their noisy copies X'_1, ..., X'_k, and the child Y.]

P(X'_i | X_i) = ( 1   p_i     )
                ( 0   1 − p_i )

with rows indexed by X'_i ∈ {0, 1} and columns by X_i ∈ {0, 1}.

P(Y | X'_1, X'_2, X'_3), shown as a 2×2 matrix over (X'_1, X'_2) whose entries are 2×2 matrices over (Y, X'_3):

            X'_2 = 0     X'_2 = 1
X'_1 = 0:   ( 1 1 )      ( 1 0 )
            ( 0 0 )      ( 0 1 )

X'_1 = 1:   ( 1 0 )      ( 0 0 )
            ( 0 1 )      ( 1 1 )

where we visualize the CPT as a tensor using nested matrices.
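A sketch of building the explicit deterministic and noisy parts as numpy arrays and contracting out the X' layer to recover the noisy-threshold CPT P(Y | X_1, ..., X_k). The values of k, ℓ, and the inhibition probabilities are illustrative choices.

```python
import numpy as np
from itertools import product

k, ell = 3, 2
p = [0.2, 0.3, 0.4]  # hypothetical inhibitory probabilities

# Noisy part P(X'_i | X_i): rows X'_i in {0,1}, columns X_i in {0,1}
noise = [np.array([[1.0, pi], [0.0, 1.0 - pi]]) for pi in p]

# Deterministic part: T[x'_1, x'_2, x'_3, y] = [y == (sum of x' >= ell)]
T = np.zeros((2,) * k + (2,))
for xp in product([0, 1], repeat=k):
    T[xp + (int(sum(xp) >= ell),)] = 1.0

# Contract: P(Y | X) = sum over X' of T(X', Y) * prod_i P(X'_i | X_i)
cpt = np.einsum('abcy,aA,bB,cC->ABCy', T, *noise)
print(cpt[0, 1, 1, 1])   # P(Y=1 | X=(0,1,1)) = (1-p2)(1-p3) = 0.42
```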
Tensor of the threshold (for ℓ = 1) as 4D cube and its decomposition

[Figure: the 2×2×2×2 threshold tensor over (X'_1, X'_2, X'_3, Y) drawn as a cube, decomposed into a sum of two rank-one tensors: the Y = 1 slice equals the all-ones tensor plus (−1) times the indicator of X'_1 = X'_2 = X'_3 = 0.]
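A short sketch verifying the decomposition pictured above for ℓ = 1 and k = 3: the Y = 1 slice of the threshold tensor (the OR of the inputs) is the all-ones tensor minus a rank-one indicator tensor.

```python
import numpy as np
from itertools import product

k = 3
slice_y1 = np.zeros((2,) * k)
for xp in product([0, 1], repeat=k):
    slice_y1[xp] = float(sum(xp) >= 1)   # OR of the inputs

ones = np.ones((2,) * k)                         # rank-one: 1 x 1 x 1
e0 = np.array([1.0, 0.0])
indicator = np.einsum('a,b,c->abc', e0, e0, e0)  # rank-one: indicator of all zeros
assert np.allclose(slice_y1, ones - indicator)
print("rank-2 decomposition verified")
```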
CP tensor decomposition of the threshold CPT (Vomlel, Tichavský, 2012)

[Figure: X'_1, X'_2, ..., X'_k connected to a hidden variable Y'.]

P(Y = 1 | X'_1, ..., X'_k) = ∑_{Y'} ∏_{i=1}^{k} ψ(X'_i, Y').

For ℓ = 1 and k = 3, one valid factor matrix (rows X'_i ∈ {0, 1}, columns the two values of Y'), consistent with the cube decomposition above, is

ψ(X'_i, Y') = ( 1  −1 )
              ( 1   0 )

for i = 1, 2, 3.

Instead of an array with 2^k entries we get k arrays with at most 2k entries each!
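A quick check that the CP form above reproduces the ℓ = 1 threshold for k = 3, using the factor matrix ψ stated above (one valid choice; the scaled factors in Vomlel and Tichavský (2012) are equivalent up to rescaling of the rank-one terms).

```python
import numpy as np
from itertools import product

psi = np.array([[1.0, -1.0],
                [1.0,  0.0]])   # rows X'_i in {0,1}, columns Y' in {1,2}

for xp in product([0, 1], repeat=3):
    # P(Y=1 | X') = sum over Y' of the product of psi(X'_i, Y')
    val = sum(np.prod([psi[xi, yp] for xi in xp]) for yp in range(2))
    assert val == float(sum(xp) >= 1)
print("CP decomposition reproduces the ell = 1 threshold")
```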
Probabilistic inference by weighted model counting (WMC)
• The basic idea of WMC is to encode a Bayesian network as a formula in conjunctive normal form (CNF),
• associate weights with literals according to the CPTs of the Bayesian network, and
• compute the probability of evidence as the sum of the weights of all logical models consistent with that evidence.
• The weight of a logical model is the product of the weights of all its literals.
• Efficient WMC solvers exploiting advanced techniques such as clause learning and component caching can be used, e.g. Cachet.
• If the Bayesian network exhibits a lot of determinism, this is much more efficient than standard techniques. A toy enumeration-based counter is sketched below.
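A toy weighted model counter by exhaustive enumeration, only to make the definitions above concrete; real solvers such as Cachet use clause learning and component caching instead of enumeration. The CNF and the literal weights are invented for illustration.

```python
from itertools import product

# Clauses over variables 0..2; a literal is (variable, is_positive).
cnf = [[(0, True), (1, False)],    # x0 or not x1
       [(1, True), (2, True)]]     # x1 or x2
weight = {(0, True): 0.3, (0, False): 0.7,
          (1, True): 0.6, (1, False): 0.4,
          (2, True): 0.5, (2, False): 0.5}

def wmc(cnf, weight, n_vars):
    total = 0.0
    for assignment in product([False, True], repeat=n_vars):
        # keep only logical models, i.e. assignments satisfying every clause
        if all(any(assignment[v] == pos for v, pos in clause) for clause in cnf):
            w = 1.0
            for v, val in enumerate(assignment):
                w *= weight[(v, val)]   # model weight = product of literal weights
            total += w
    return total

print(wmc(cnf, weight, 3))
```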
Encoding the transformed BN2T as a CNF using the Chavira and Darwiche (2008) encoding

Clauses for indicator (λ) and parameter (θ) logical variables:

⊕_{x ∈ X_i} λ_x      "states of X_i are mutually exclusive"
⊕_{y ∈ Y'_j} λ_y     "states of Y'_j are mutually exclusive"

A sketch of these exactly-one clauses follows below.
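A sketch of the mutual-exclusivity part of the encoding: for a variable with states s_1, ..., s_n, emit one "at least one" clause and pairwise "at most one" clauses over the indicator variables λ_s. This is a standard exactly-one encoding; the variable names below are hypothetical.

```python
from itertools import combinations

def exactly_one(indicators):
    clauses = [list(indicators)]               # lambda_1 or ... or lambda_n
    for a, b in combinations(indicators, 2):
        clauses.append([f"-{a}", f"-{b}"])     # not (lambda_a and lambda_b)
    return clauses

# indicator variables for the two states of a binary variable X_0
for clause in exactly_one(["lam_x0_0", "lam_x0_1"]):
    print(clause)
```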