Full Bayesian Network Classifiers by Jiang Su and Harry Zhang
Presented by Flemming Jensen, November 2008

Purpose: To introduce the full Bayesian network classifier (FBC).

Introduction: Bayesian networks are often used for the classification problem, where a …



Structure Learning Algorithm

Algorithm FBC-Structure(S, X)
1. B = empty.
2. Partition the training data S into |C| subsets S_c by the class value c.
3. For each training data set S_c:
   - Compute the mutual information M(X_i; X_j) and the dependency threshold φ(X_i, X_j) between each pair of variables X_i and X_j.
   - Compute W(X_i) for each variable X_i.
   - For each variable X_i in X:
     - Add all the variables X_j with W(X_j) > W(X_i) to the parent set Π_{X_i} of X_i.
     - Add arcs from all the variables X_j in Π_{X_i} to X_i.
   - Add the resulting network B_c to B.
4. Return B.
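The algorithm above can be sketched in Python. This is a minimal illustration of the control flow only; the `mutual_information` and `threshold` callables and the data layout are assumptions of mine, not the authors' implementation:

```python
from collections import defaultdict

def fbc_structure(S, variables, mutual_information, threshold):
    """Sketch of FBC-Structure: one network (set of arcs) per class value.

    S                  -- list of (class_value, instance) pairs
    variables          -- list of feature variable names
    mutual_information -- function (S_c, Xi, Xj) -> M(Xi; Xj) on subset S_c
    threshold          -- function (S_c, Xi, Xj) -> phi(Xi, Xj)
    """
    B = {}
    # Step 2: partition the training data by class value.
    subsets = defaultdict(list)
    for c, instance in S:
        subsets[c].append(instance)

    # Step 3: build one network per class subset.
    for c, S_c in subsets.items():
        # Total influence W(Xi): sum of MI over pairs above the threshold.
        W = {}
        for Xi in variables:
            W[Xi] = sum(mutual_information(S_c, Xi, Xj)
                        for Xj in variables
                        if Xj != Xi
                        and mutual_information(S_c, Xi, Xj)
                            > threshold(S_c, Xi, Xj))
        # Parents of Xi are all variables with strictly higher total influence;
        # represent the network B_c as a list of (parent, child) arcs.
        arcs = [(Xj, Xi) for Xi in variables for Xj in variables
                if W[Xj] > W[Xi]]
        B[c] = arcs
    return B
```

With the worked example's MI and threshold values plugged in, this yields the arcs B → A, B → D, A → D for the class c1 subset, matching the network derived later in the slides.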

Example - Structure Learning Algorithm

Example using 1000 labeled instances, where C is the class variable and A, B, and D are feature variables.

 C   A   B   D    #  |  C   A   B   D    #
 c1  a1  b1  d1   11 |  c2  a1  b1  d1   36
 c1  a1  b1  d2    5 |  c2  a1  b1  d2   36
 c1  a1  b2  d1    7 |  c2  a1  b2  d1  259
 c1  a1  b2  d2   17 |  c2  a1  b2  d2   29
 c1  a2  b1  d1  227 |  c2  a2  b1  d1   96
 c1  a2  b1  d2   97 |  c2  a2  b1  d2   96
 c1  a2  b2  d1   11 |  c2  a2  b2  d1   43
 c1  a2  b2  d2   25 |  c2  a2  b2  d2    5

(400 instances have C = c1 and 600 have C = c2.)


Example - Structure Learning Algorithm (continued)

The 400 data instances where C = c1:

   #   C   A   B   D
  11   c1  a1  b1  d1
   5   c1  a1  b1  d2
   7   c1  a1  b2  d1
  17   c1  a1  b2  d2
 227   c1  a2  b1  d1
  97   c1  a2  b1  d2
  11   c1  a2  b2  d1
  25   c1  a2  b2  d2

P(A, B):
        b1                    b2
 a1   (11+5)/400   = 0.04   (7+17)/400  = 0.06
 a2   (227+97)/400 = 0.81   (11+25)/400 = 0.09
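As a quick sanity check of the table arithmetic, the joint P(A, B) can be recomputed from the counts (a throwaway sketch with the example's counts hard-coded; not part of the FBC algorithm itself):

```python
# Counts for the 400 instances where C = c1, keyed by (a, b, d)
# (values copied from the table above).
counts = {
    ('a1', 'b1', 'd1'): 11,  ('a1', 'b1', 'd2'): 5,
    ('a1', 'b2', 'd1'): 7,   ('a1', 'b2', 'd2'): 17,
    ('a2', 'b1', 'd1'): 227, ('a2', 'b1', 'd2'): 97,
    ('a2', 'b2', 'd1'): 11,  ('a2', 'b2', 'd2'): 25,
}
N = sum(counts.values())   # 400

# P(A, B): sum the counts over d, then normalize by N.
sums = {}
for (a, b, d), n in counts.items():
    sums[(a, b)] = sums.get((a, b), 0) + n
p_ab = {ab: n / N for ab, n in sums.items()}

print(p_ab[('a2', 'b1')])   # 0.81
```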


Example - Structure Learning Algorithm (continued)

P(A, B):
      b1     b2
 a1   0.04   0.06
 a2   0.81   0.09

P(A)P(B), from the marginals P(a1) = 0.10, P(a2) = 0.90, P(b1) = 0.85, P(b2) = 0.15:

      b1      b2
 a1   0.085   0.015
 a2   0.765   0.135

Mutual information:

 M(X; Y) = Σ_{x ∈ X, y ∈ Y} P(x, y) · log( P(x, y) / (P(x) P(y)) )

 M(A; B) = 0.04 · log(0.04/0.085) + 0.81 · log(0.81/0.765)
         + 0.06 · log(0.06/0.015) + 0.09 · log(0.09/0.135)
         = 0.027
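The computation above is easy to reproduce. Note that the slide's numbers come out with base-10 logarithms; that convention is my inference from the arithmetic, not something stated explicitly:

```python
from math import log10

# Joint P(A, B) and marginals for the C = c1 subset of the example.
p_joint = {('a1', 'b1'): 0.04, ('a1', 'b2'): 0.06,
           ('a2', 'b1'): 0.81, ('a2', 'b2'): 0.09}
p_a = {'a1': 0.10, 'a2': 0.90}
p_b = {'b1': 0.85, 'b2': 0.15}

# M(A; B) = sum over (a, b) of P(a,b) * log[ P(a,b) / (P(a)P(b)) ]
m_ab = sum(p * log10(p / (p_a[a] * p_b[b]))
           for (a, b), p in p_joint.items())

print(round(m_ab, 3))  # 0.027
```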


Example - Structure Learning Algorithm (continued)

Mutual information:
 M(A; B) = 0.027
 M(A; D) = 0.004
 M(B; D) = 0.018

Dependency threshold:
 φ(X_i, X_j) = (log N / (2N)) · T_ij

 φ(A, B) = φ(A, D) = φ(B, D) = (4 · log 400) / 800 = 0.013

Total influence (summing only over the pairs whose mutual information exceeds the threshold):
 W(X_i) = Σ_{j ≠ i, M(X_i; X_j) > φ(X_i, X_j)} M(X_i; X_j)

 W(A) = M(A; B) = 0.027
 W(B) = M(A; B) + M(B; D) = 0.045
 W(D) = M(B; D) = 0.018

(M(A; D) = 0.004 falls below the threshold 0.013, so it contributes to neither W(A) nor W(D).)
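The threshold and total-influence arithmetic can be checked in a few lines (values hard-coded from the example; the base-10 logarithm is inferred from the result 0.013):

```python
from math import log10

N = 400       # instances in the subset S_c1
T_ij = 4      # value of T_ij used on the slide for each variable pair
phi = T_ij * log10(N) / (2 * N)   # dependency threshold, ~0.013

# Pairwise mutual information values from the example.
mi = {('A', 'B'): 0.027, ('A', 'D'): 0.004, ('B', 'D'): 0.018}

def W(x):
    """Total influence of x: sum of MI over pairs containing x that beat phi."""
    return sum(m for pair, m in mi.items() if x in pair and m > phi)

print(round(phi, 3), round(W('A'), 3), round(W('B'), 3), round(W('D'), 3))
# 0.013 0.027 0.045 0.018
```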


Example - Structure Learning Algorithm (continued)

We now construct a full Bayesian network with the variable order given by the total influence values:

 W(A) = 0.027, W(B) = 0.045, W(D) = 0.018, so W(B) > W(A) > W(D).

[Figure: the resulting network over B, A, and D, with arcs B → A, B → D, and A → D.]

We now have the full Bayesian network B_c1, which is the part of the multinet that corresponds to C = c1. We should now repeat the process to construct B_c2 and thereby complete the FBC structure learning.


CPT-tree Learning

We now need to learn a CPT-tree for each variable in the full BN.

A traditional decision-tree learning algorithm, such as C4.5, could be used to learn the CPT-trees. However, since its time complexity is typically O(n² · N), the resulting FBC learning algorithm would have a complexity of O(n³ · N).

Instead, a fast decision-tree learning algorithm is proposed. It uses the mutual information to determine a fixed ordering of variables from root to leaves. This predefined variable ordering makes the algorithm faster than traditional decision-tree learning algorithms.


CPT-tree Learning Algorithm

Algorithm Fast-CPT-Tree(Π_{X_i}, S)
1. Create an empty tree T.
2. If (S is pure or empty) or (Π_{X_i} is empty), return T.
3. qualified = False.
4. While (qualified == False) and (Π_{X_i} is not empty):
   - Choose the variable X_j with the highest M(X_j; X_i).
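The steps shown above can be sketched as follows. The excerpt breaks off mid-algorithm, so the `qualifies` test and the drop-a-variable-on-failure behaviour are hypothetical placeholders of mine, and the actual tree construction is omitted entirely:

```python
def fast_cpt_tree(parents, S, mutual_info, qualifies):
    """Sketch of the Fast-CPT-Tree steps shown above (the excerpt stops
    mid-algorithm, so this always returns the empty tree).

    parents     -- candidate parent set Pi_Xi, as a list of variable names
    S           -- training instances reaching this node (dicts with 'class')
    mutual_info -- function Xj -> M(Xj; Xi) for the target variable Xi
    qualifies   -- hypothetical placeholder for the test, not shown in the
                   excerpt, deciding whether the chosen variable is kept
    """
    T = {}                                    # 1. create an empty tree T
    is_pure = len({inst['class'] for inst in S}) <= 1
    if (is_pure or not S) or not parents:     # 2. base cases: return T
        return T
    qualified = False                         # 3.
    remaining = list(parents)
    while not qualified and remaining:        # 4.
        # Choose the candidate with the highest mutual information.
        X_j = max(remaining, key=mutual_info)
        qualified = qualifies(X_j, S)
        remaining.remove(X_j)  # assumption: unqualified variables are dropped
    # (the remainder of the algorithm is not shown in the excerpt)
    return T
```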
