Computationally efficient probabilistic inference with noisy threshold models based on a CP tensor decomposition
Jirka Vomlel and Petr Tichavský
Institute of Information Theory and Automation (ÚTIA)
Academy of Sciences of the Czech Republic
Contents
• Motivation
• Noisy threshold models
• CP-decomposition of conditional probability tables
• Experiments
• Conclusions
Quick Medical Reference - Decision Theoretic (QMR-DT), Miller et al. (1986) and Shwe et al. (1991)
• 570 diseases in the first level
• 4075 observations in the second level
• all variables are binary
• conditional probability tables are noisy-or models
[Figure: two-level network with disease nodes $X_1, \dots, X_6$ and observation nodes $Y_1, Y_2$]
Definition (The inference task) Given a subset of observations (e.g., $Y_1$ and $Y_2$), compute the probabilities of the diseases (e.g., $P(X_i \mid Y_1 = y_1, Y_2 = y_2)$, $i = 1, \dots, 6$).
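A brute-force reference for the inference task can be sketched in Python, assuming a small network of six diseases and two observations with noisy-or tables and no leak term. The priors `prior`, the link probabilities `links`, and the helper names `noisy_or` and `posterior` are hypothetical illustrations, not QMR-DT parameters. Enumerating all $2^n$ disease configurations is feasible only for tiny $n$, which is exactly why a model of QMR-DT's size needs a more efficient method.

```python
import itertools
import numpy as np

def noisy_or(x, pi):
    """P(Y = 1 | X = x) for a noisy-or table without leak; pi[i] is the
    probability that an active parent X_i alone switches Y on."""
    return 1.0 - np.prod([1.0 - p for xi, p in zip(x, pi) if xi == 1])

def posterior(prior, links, evidence):
    """P(X_i = 1 | evidence) for every disease X_i, by full enumeration.

    prior[i]:    P(X_i = 1)
    links[j]:    noisy-or parameters of observation Y_j, one per disease
    evidence[j]: observed value y_j of Y_j
    """
    n = len(prior)
    norm = 0.0
    marg = np.zeros(n)
    for x in itertools.product((0, 1), repeat=n):   # all 2**n configurations
        p = np.prod([q if xi else 1.0 - q for xi, q in zip(x, prior)])
        for pi, y in zip(links, evidence):
            p_on = noisy_or(x, pi)
            p *= p_on if y == 1 else 1.0 - p_on
        norm += p                       # accumulates P(evidence)
        marg += p * np.array(x)         # accumulates P(X_i = 1, evidence)
    return marg / norm

# Hypothetical parameters: 6 diseases, 2 observations.
prior = [0.01, 0.02, 0.05, 0.01, 0.03, 0.02]
links = [[0.8, 0.6, 0.0, 0.3, 0.0, 0.0],    # Y_1 depends on X_1, X_2, X_4
         [0.0, 0.5, 0.7, 0.0, 0.4, 0.9]]    # Y_2 depends on X_2, X_3, X_5, X_6
post = posterior(prior, links, evidence=[1, 0])
print(post.round(4))
```

Because negative noisy-or evidence factorizes over the parents ($P(Y = 0 \mid x) = \prod_{i: x_i = 1} (1 - \pi_i)$), the posteriors of $X_3$ and $X_6$, which are linked only to the observation $Y_2 = 0$, have simple closed forms that make the sketch easy to check.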
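Noisy threshold - a generalization of noisy-or
[Figure: each parent $X_i$ has a noisy copy $X'_i$, $i = 1, \dots, k$, and $X'_1, \dots, X'_k$ are the parents of $Y$]
$Y$ takes value 1 if at least $\ell$ of the $k$ variables $X'_1, \dots, X'_k$ take value 1:
$$P(Y = 1 \mid X'_1 = x'_1, \dots, X'_k = x'_k) = \begin{cases} 1 & \text{if } x'_1 + \dots + x'_k \ge \ell, \\ 0 & \text{otherwise.} \end{cases}$$
Noise: for $i = 1, \dots, k$
$$P(X'_i = 1 \mid X_i = x_i) = \begin{cases} 0 & \text{if } x_i = 0, \\ \pi_i & \text{otherwise.} \end{cases}$$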
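The noisy threshold table $P(Y = 1 \mid X = x)$ follows from the two definitions of this slide by summing out the noisy copies $X'_i$. A minimal Python sketch under that reading; the function name `noisy_threshold_cpt` is hypothetical, and the $O(4^k)$ double enumeration is chosen for clarity rather than speed.

```python
import itertools
import numpy as np

def noisy_threshold_cpt(pi, ell):
    """Return the 2 x ... x 2 array P(Y = 1 | X = x) of a noisy threshold model.

    pi[i]: P(X'_i = 1 | X_i = 1); P(X'_i = 1 | X_i = 0) is 0.
    ell:   Y = 1 iff at least ell of the X'_i equal 1.
    """
    k = len(pi)
    cpt = np.zeros((2,) * k)
    for x in itertools.product((0, 1), repeat=k):
        total = 0.0
        # Sum over all configurations of the hidden noisy copies X'_1..X'_k.
        for xp in itertools.product((0, 1), repeat=k):
            p = 1.0
            for xi, xpi, q in zip(x, xp, pi):
                p_on = q if xi == 1 else 0.0        # P(X'_i = 1 | X_i = xi)
                p *= p_on if xpi == 1 else 1.0 - p_on
            if sum(xp) >= ell:                      # deterministic threshold
                total += p
        cpt[x] = total
    return cpt

cpt = noisy_threshold_cpt([0.9, 0.7, 0.6, 0.8], ell=2)
print(cpt[1, 1, 0, 0])   # equals pi_1 * pi_2: only X'_1 and X'_2 can be on
```

With `ell=1` the result coincides with the noisy-or table $1 - \prod_{i: x_i = 1} (1 - \pi_i)$, which is the sense in which the noisy threshold generalizes noisy-or.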
An example for $k = 4$, $\ell = 1$, and $\pi_i = 1$, $i = 1, \dots, k$, i.e., for the deterministic OR function: $P(Y = 1 \mid X_1 = x_1, \dots, X_4 = x_4)$ is the $2 \times 2 \times 2 \times 2$ tensor whose entries are all 1 except the entry at $(0, 0, 0, 0)$, which is 0. It is the all-ones tensor minus the tensor with a single 1 at $(0, 0, 0, 0)$:
$$P(Y = 1 \mid X_1 = x_1, \dots, X_4 = x_4) = (1, 1) \otimes (1, 1) \otimes (1, 1) \otimes (1, 1) - (1, 0) \otimes (1, 0) \otimes (1, 0) \otimes (1, 0) = (1, 1)^{\otimes k} - (1, 0)^{\otimes k}.$$
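The identity for the deterministic OR can be checked numerically. A short NumPy sketch, where `tensor_power` is a hypothetical helper that forms $a^{\otimes k}$ by repeated `np.multiply.outer`:

```python
from functools import reduce
import numpy as np

def tensor_power(a, k):
    """k-fold outer (tensor) product a (x) a (x) ... (x) a of a length-2 vector."""
    return reduce(np.multiply.outer, [np.asarray(a, dtype=float)] * k)

k = 4
# Deterministic OR: the entry is 1 unless all parents are 0.
or_tensor = np.ones((2,) * k)
or_tensor[(0,) * k] = 0.0

decomposed = tensor_power([1, 1], k) - tensor_power([1, 0], k)
print(np.array_equal(or_tensor, decomposed))   # True
```

The same two terms work for every $k$, so the OR table never needs more than two rank-one components.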
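Compilation of the threshold model for $\ell = 1$ - the standard approach (Lauritzen and Spiegelhalter (1988), Jensen et al. (1990), Shafer and Shenoy (1990))
[Figure: $Y$ with parents $X_1, \dots, X_4$, and the same family after compilation]
The total table size is $2^5 = 32$, since $Y$ and its four parents end up in a single clique of five binary variables.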
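To illustrate what the standard approach stores, the sketch below builds the joint table of the single clique $\{X_1, \dots, X_4, Y\}$ for a noisy-or family and marginalizes it. It assumes the $X_i$ are independent root variables with hypothetical priors `prior` and noisy-or parameters `pi`; `clique_table` is a hypothetical name. The table has $2^{k+1}$ entries, so it grows exponentially with the number of parents.

```python
import itertools
import numpy as np

def clique_table(prior, pi):
    """Joint table over the clique {X_1, ..., X_k, Y} for a noisy-or (ell = 1) family.

    Axis i is X_i and the last axis is Y. The X_i are assumed to be independent
    root variables with P(X_i = 1) = prior[i]; pi[i] are noisy-or parameters.
    """
    k = len(prior)
    table = np.zeros((2,) * (k + 1))
    for x in itertools.product((0, 1), repeat=k):
        p_x = np.prod([q if xi else 1.0 - q for xi, q in zip(x, prior)])
        p_y1 = 1.0 - np.prod([1.0 - p for xi, p in zip(x, pi) if xi])
        table[x + (1,)] = p_x * p_y1
        table[x + (0,)] = p_x * (1.0 - p_y1)
    return table

prior = [0.1, 0.2, 0.3, 0.4]   # hypothetical P(X_i = 1)
pi = [0.9, 0.8, 0.7, 0.6]      # hypothetical noisy-or parameters
T = clique_table(prior, pi)
print(T.size)                  # 32 entries, i.e. 2**5
p_y1 = T[..., 1].sum()         # P(Y = 1) by marginalizing the clique table
print(p_y1)
```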
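Compilation of the threshold model for $\ell = 1$ - after the suggested decomposition (Díez and Galán (2002), Vomlel (2002), Savický and Vomlel (2007))
[Figure: $Y$ with parents $X_1, \dots, X_4$, and the decomposed model with an auxiliary variable $B$ linked to $Y$ and to each $X_i$]
The total table size is $5 \cdot 2^2 = 20$: five tables over two binary variables, $(B, X_i)$ for $i = 1, \dots, 4$ and $(B, Y)$.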
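The decomposed model can be sketched in the same way. Assuming the rank-2 decomposition $(1, 1)^{\otimes k} - (1, 0)^{\otimes k}$ of the OR tensor, with the noise $\pi_i$ folded into the $(B, X_i)$ tables (so the $b = 1$ row becomes $(1, 1 - \pi_i)$), the auxiliary variable $B$ has two states and every table is $2 \times 2$. The names `decomposed_factors`, `cpt_entry`, and `p_y_decomposed` and the parameter values are hypothetical. The $(B, Y)$ table contains a negative entry, so it is a computational device rather than a probability table.

```python
import numpy as np

def decomposed_factors(pi):
    """Tables of the ell = 1 model after adding a binary auxiliary variable B.

    phi[i][b, x_i] links B with X_i; psi[b, y] links B with Y.
    Row b = 0 comes from the (1, 1) term, row b = 1 from the (1, 0) term,
    with the noise P(X'_i | X_i) already summed out.
    """
    phi = [np.array([[1.0, 1.0],
                     [1.0, 1.0 - p]]) for p in pi]
    psi = np.array([[0.0, 1.0],    # b = 0: weight +1 on Y = 1
                    [1.0, -1.0]])  # b = 1: weight -1 on Y = 1, +1 on Y = 0
    return phi, psi

def cpt_entry(pi, x):
    """P(Y = 1 | X = x) recovered as sum_b psi[b, 1] * prod_i phi_i[b, x_i]."""
    phi, psi = decomposed_factors(pi)
    return sum(psi[b, 1] * np.prod([f[b, xi] for f, xi in zip(phi, x)])
               for b in (0, 1))

def p_y_decomposed(prior, pi):
    """(P(Y = 0), P(Y = 1)) for independent X_i with P(X_i = 1) = prior[i]."""
    phi, psi = decomposed_factors(pi)
    m = np.ones(2)                       # message over the two states of B
    for q, f in zip(prior, phi):
        m *= f @ np.array([1.0 - q, q])  # sum out X_i locally
    return m @ psi

prior = [0.1, 0.2, 0.3, 0.4]   # hypothetical P(X_i = 1)
pi = [0.9, 0.8, 0.7, 0.6]      # hypothetical noisy-or parameters
phi, psi = decomposed_factors(pi)
print(sum(f.size for f in phi) + psi.size)   # 20 = 5 * 2**2
print(p_y_decomposed(prior, pi))
```

Each $X_i$ is eliminated with one small matrix-vector product, so the work grows linearly in the number of parents instead of exponentially.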
Decomposition of $T(\ell, k)$ into a sum of tensor products
• $P(Y = 1 \mid X = x)$ can be viewed as a tensor $T(\ell, k)$.
• All dimensions of $T(\ell, k)$ are equal to 2.
• $T(\ell, k)$ is symmetric, i.e., invariant under any permutation of its $k$ indices.
Definition (Symmetric rank) The symmetric rank (srank) of $T(\ell, k)$ is the minimum number $r$ such that
$$T(\ell, k) = \sum_{i=1}^{r} b_i \cdot a_i^{\otimes k},$$
where for $i = 1, \dots, r$:
• $b_i \in \mathbb{R}$ and
• $a_i$ are real-valued vectors of length 2.
• This decomposition is called the Canonical Polyadic (CP) or CANDECOMP-PARAFAC (CP) decomposition, or a tensor rank-one decomposition.
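The definition can be made concrete with two small NumPy helpers, both hypothetical, assuming $T(\ell, k)$ denotes the deterministic threshold tensor ($\pi_i = 1$): `threshold_tensor` builds $T(\ell, k)$ (entry 1 iff $x_1 + \dots + x_k \ge \ell$) and `from_symmetric_cp` evaluates $\sum_i b_i \, a_i^{\otimes k}$. The sketch checks the symmetry of $T(\ell, k)$ and that $b = (1, -1)$, $a = ((1, 1), (1, 0))$ reproduces $T(1, k)$. Such a check confirms that a given decomposition is valid, i.e., an upper bound on the srank, not that it is minimal.

```python
import itertools
from functools import reduce
import numpy as np

def threshold_tensor(ell, k):
    """T(ell, k): the 2 x ... x 2 tensor with entry 1 iff x_1 + ... + x_k >= ell."""
    t = np.zeros((2,) * k)
    for x in itertools.product((0, 1), repeat=k):
        t[x] = float(sum(x) >= ell)
    return t

def from_symmetric_cp(b, a, k):
    """Evaluate sum_i b_i * a_i^{(x)k} for weights b and length-2 vectors a."""
    return sum(bi * reduce(np.multiply.outer, [np.asarray(ai, dtype=float)] * k)
               for bi, ai in zip(b, a))

def is_symmetric(t):
    """True if t is unchanged by every permutation of its axes."""
    return all(np.array_equal(t, np.transpose(t, perm))
               for perm in itertools.permutations(range(t.ndim)))

k, ell = 4, 2
T = threshold_tensor(ell, k)
print(is_symmetric(T))                                    # True
# Two-term decomposition of the OR tensor T(1, k).
print(np.allclose(from_symmetric_cp([1, -1], [(1, 1), (1, 0)], k),
                  threshold_tensor(1, k)))                # True
```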
Theoretical results
Results in the proceedings:
• $\mathrm{srank}(T(0, k)) = 1$.
• $\mathrm{srank}(T(k, k)) = 1$.
• $\mathrm{srank}(T(1, k)) = 2$.
• $\mathrm{srank}(T(k-1, k)) = k$.
• $\mathrm{srank}(T(\ell, k)) \le k$ for $\ell = 3, \dots, k-2$.
• An algorithm for the CP-decomposition of $T(\ell, k)$ into $k$ factors.
• For the noisy threshold model, the above values are upper bounds.
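The two rank-one cases can be verified directly: $T(0, k) = (1, 1)^{\otimes k}$ and $T(k, k) = (0, 1)^{\otimes k}$. For general $\ell$, a simple construction already gives an exact decomposition with $k + 1$ terms. It is not the $k$-factor algorithm of the proceedings, which is not reproduced here. Every entry of $T(\ell, k)$ depends only on the weight $w = x_1 + \dots + x_k$, and $(1, \beta)^{\otimes k}$ equals $\beta^w$ there, so choosing $k + 1$ distinct nodes $\beta_i$ and solving a Vandermonde system for the $b_i$ suffices. The helper names and the node choice $\beta_i \in [-1, 1]$ are assumptions of this sketch.

```python
import itertools
from functools import reduce
import numpy as np

def threshold_tensor(ell, k):
    """T(ell, k): entry 1 iff x_1 + ... + x_k >= ell, else 0."""
    t = np.zeros((2,) * k)
    for x in itertools.product((0, 1), repeat=k):
        t[x] = float(sum(x) >= ell)
    return t

def from_symmetric_cp(b, a, k):
    """Evaluate sum_i b_i * a_i^{(x)k}."""
    return sum(bi * reduce(np.multiply.outer, [np.asarray(ai, dtype=float)] * k)
               for bi, ai in zip(b, a))

def vandermonde_cp(ell, k, betas=None):
    """A (k+1)-term symmetric CP decomposition of T(ell, k) with a_i = (1, beta_i).

    Solves sum_i b_i * beta_i**w = [w >= ell] for w = 0..k.
    """
    if betas is None:
        betas = np.linspace(-1.0, 1.0, k + 1)          # any k+1 distinct nodes
    V = np.vander(betas, k + 1, increasing=True).T     # V[w, i] = beta_i**w
    target = np.array([float(w >= ell) for w in range(k + 1)])
    b = np.linalg.solve(V, target)
    return b, [(1.0, beta) for beta in betas]

k = 5
# Rank-one cases: T(0, k) = (1,1)^{(x)k} and T(k, k) = (0,1)^{(x)k}.
print(np.allclose(from_symmetric_cp([1.0], [(1, 1)], k), threshold_tensor(0, k)))
print(np.allclose(from_symmetric_cp([1.0], [(0, 1)], k), threshold_tensor(k, k)))
for ell in range(k + 1):
    b, a = vandermonde_cp(ell, k)
    print(ell, len(b), np.allclose(from_symmetric_cp(b, a, k), threshold_tensor(ell, k)))
```

This only shows $\mathrm{srank}(T(\ell, k)) \le k + 1$; the results listed above are sharper, and for large $k$ the Vandermonde system with equispaced nodes becomes ill-conditioned.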