

1. Bayesian Network Classifiers
Pedro Larrañaga
Computational Intelligence Group, Artificial Intelligence Department, Universidad Politécnica de Madrid
Bayesian Networks: From Theory to Practice
International Black Sea University Autumn School on Machine Learning, 3-11 October 2019, Tbilisi, Georgia

2. Outline
1 Introduction
2 Discrete Predictors
3 Validation of Supervised Classifiers
4 Summary

3. Outline (repeated as section divider: 1 Introduction)

4. Supervised classification

X_1 ... X_n | C
x_1^(1) ... x_n^(1) | c^(1)          → (x^(1), c^(1))
x_1^(2) ... x_n^(2) | c^(2)          → (x^(2), c^(2))
...
x_1^(N) ... x_n^(N) | c^(N)          → (x^(N), c^(N))
x_1^(N+1) ... x_n^(N+1) | ???

5. Application domains
Supervised pattern recognition:
- Decision support systems for diagnosis and prognosis
- Loan decisions
- Spam detection
- Prediction of sport results
- Handwriting character recognition
- Weather forecasting
- Prediction of the secondary structure of proteins
- ...

6. Optical character recognition
[Figure: Handwriting character recognition]

7. Weather forecast
[Figure: Meteorology]

8. Computational biology
[Figure: Prediction of the secondary structure of proteins]

9. Paradigms for supervised classification
Statistical and machine learning:
- Bayesian networks (Pearl, 1988)
- Classification trees (Quinlan, 1986; Breiman et al., 1984)
- Classifier systems (Holland, 1975)
- k-NN classifiers (Cover and Hart, 1967; Dasarathy, 1991)
- Discriminant analysis (Fisher, 1936)
- Logistic regression (Hosmer and Lemeshow, 1989)
- Neural networks (McCulloch and Pitts, 1943)
- Rule induction (Clark and Niblett, 1989; Cohen, 1995; Holte, 1993)
- Support vector machines (Cristianini and Shawe-Taylor, 2000)

10. Bayesian network based classifiers
Hierarchy of classifiers:
- Naïve Bayes (NB) (Minsky, 1961)
- Seminaïve Bayes (Pazzani, 1997)
- Tree augmented naïve Bayes (TAN) (Friedman et al., 1997)
- k-dependence Bayesian classifier (k-DB) (Sahami, 1996)
- Markov blanket (Sierra and Larrañaga, 1998)
- Bayesian multinets (Kontkanen et al., 2000)

11. Outline (repeated as section divider: 2 Discrete Predictors)

12. Introduction. Fundamentals
- Cost matrix: cost(r, s), with r the predicted class and s the true class, r, s = 1, ..., r_0
- Minimization of the total expected cost (Bayes rule):
  γ(x) = arg min_c Σ_{k=1}^{r_0} cost(c, k) P(C = k | x_1, ..., x_n)
- In the case of a 0/1 loss function:
  γ(x) = arg max_c P(c | x_1, ..., x_n)
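The two decision rules above can be checked numerically. The sketch below uses a hypothetical 3-class cost matrix and an assumed posterior vector (both illustrative, not from the slides) and verifies that with a 0/1 loss the minimum-expected-cost rule reduces to the arg max of the posterior:

```python
import numpy as np

# Hypothetical cost matrix: cost[r, s] = cost of predicting class r
# when the true class is s (asymmetric, so cost-sensitive and MAP differ).
cost = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 1.0],
                 [5.0, 1.0, 0.0]])

# Assumed posterior P(C = k | x_1, ..., x_n) for one instance.
posterior = np.array([0.5, 0.3, 0.2])

# Bayes rule: gamma(x) = arg min_c sum_k cost(c, k) P(C = k | x).
expected_cost = cost @ posterior          # [1.3, 0.7, 2.8]
bayes_class = int(np.argmin(expected_cost))   # class 1: hedges against the 5.0 costs

# With a 0/1 loss, expected cost is 1 - P(C = c | x), so the rule
# becomes the usual arg max of the posterior (class 0 here).
zero_one = 1.0 - np.eye(3)
map_class = int(np.argmin(zero_one @ posterior))
assert map_class == int(np.argmax(posterior))
```

Note that the cost-sensitive prediction (class 1) differs from the 0/1-loss prediction (class 0): the large cost of confusing classes 0 and 2 pushes the decision toward the middle class.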

13. Generative versus discriminative classifiers
Generative classifiers:
- P(c | x_1, ..., x_n) is obtained indirectly:
  P(c | x_1, ..., x_n) ∝ P(c, x_1, ..., x_n) = P(c) P(x_1, ..., x_n | c)
- Parameters are estimated from the joint log-likelihood of the sample (x^(1), c^(1)), ..., (x^(N), c^(N)):
  L = Σ_{j=1}^N log P(x^(j), c^(j))
- Examples: discriminant analysis, naïve Bayes

14. Generative versus discriminative classifiers
Discriminative classifiers:
- Model P(c | x_1, ..., x_n) directly
- Parameters are estimated from the conditional log-likelihood of the sample:
  L = Σ_{j=1}^N log P(c^(j) | x^(j))
- Example: logistic regression
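The two objectives can be made concrete on a toy dataset (the data and the single binary predictor below are assumed for illustration). The sketch fits a generative model P(c) P(x | c) by maximum likelihood and then evaluates both the joint and the conditional log-likelihood; the joint objective additionally pays for modelling the marginal of x:

```python
import math

# Toy binary sample (assumed): pairs (x, c) with one binary predictor.
data = [(0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (0, 1)]
N = len(data)

# Maximum-likelihood estimates of the generative model P(c) P(x | c).
p_c = {c: sum(1 for _, cc in data if cc == c) / N for c in (0, 1)}
p_x_given_c = {
    (x, c): sum(1 for xx, cc in data if xx == x and cc == c)
            / sum(1 for _, cc in data if cc == c)
    for x in (0, 1) for c in (0, 1)
}

# Joint log-likelihood: L = sum_j log P(x^(j), c^(j))  (generative objective).
joint_ll = sum(math.log(p_c[c] * p_x_given_c[(x, c)]) for x, c in data)

# Conditional log-likelihood: L = sum_j log P(c^(j) | x^(j))  (discriminative objective).
def posterior(c, x):
    num = p_c[c] * p_x_given_c[(x, c)]
    den = sum(p_c[k] * p_x_given_c[(x, k)] for k in (0, 1))
    return num / den

cond_ll = sum(math.log(posterior(c, x)) for x, c in data)
```

Since log P(x, c) = log P(c | x) + log P(x), the joint log-likelihood is the conditional one plus the log-likelihood of the x-marginal, so here joint_ll < cond_ll.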

15. From the classical diagnosis problem to the naïve Bayes
Classical diagnosis problem: multiple diseases

X_1 ... X_n | Y_1 ... Y_m
x_1^(1) ... x_n^(1) | y_1^(1) ... y_m^(1)    → (x^(1), y^(1))
x_1^(2) ... x_n^(2) | y_1^(2) ... y_m^(2)    → (x^(2), y^(2))
...
x_1^(N) ... x_n^(N) | y_1^(N) ... y_m^(N)    → (x^(N), y^(N))

Table: Classical diagnosis problem

16. From the classical diagnosis problem to the naïve Bayes
Classical diagnosis problem: multiple diseases
(y*_1, ..., y*_m) = arg max_{(y_1, ..., y_m)} P(Y_1 = y_1, ..., Y_m = y_m | X_1 = x_1, ..., X_n = x_n)
P(Y_1 = y_1, ..., Y_m = y_m | X_1 = x_1, ..., X_n = x_n) ∝ P(Y_1 = y_1, ..., Y_m = y_m) P(X_1 = x_1, ..., X_n = x_n | Y_1 = y_1, ..., Y_m = y_m)
Number of parameters (binary variables): 2^m − 1 + (2^n − 1) 2^m
- m = 3, n = 10: ≈ 8 · 10^3 parameters
- m = 5, n = 20: ≈ 33 · 10^6 parameters
- m = 10, n = 50: ≈ 11 · 10^17 parameters

17. From the classical diagnosis problem to the naïve Bayes
Single disease
c* = arg max_c P(C = c | X_1 = x_1, ..., X_n = x_n)
P(C = c | X_1 = x_1, ..., X_n = x_n) ∝ P(C = c) P(X_1 = x_1, ..., X_n = x_n | C = c)
Number of parameters: (r_0 − 1) + r_0 (2^n − 1)
- r_0 = 3, n = 10: ≈ 3 · 10^3 parameters
- r_0 = 5, n = 20: ≈ 5 · 10^6 parameters
- r_0 = 10, n = 50: ≈ 11 · 10^15 parameters

18. From the classical diagnosis problem to the naïve Bayes
Single disease and symptoms conditionally independent given the disease
c* = arg max_c P(C = c | X_1 = x_1, ..., X_n = x_n)
   = arg max_c P(C = c) Π_{i=1}^n P(X_i = x_i | C = c)
Number of parameters: (r_0 − 1) + r_0 n
- r_0 = 3, n = 10: 32 parameters
- r_0 = 5, n = 20: 104 parameters
- r_0 = 10, n = 50: 509 parameters
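The parameter counts on slides 16-18 follow directly from the three formulas, assuming binary predictors (and binary diseases Y_i in the multi-disease case). A small sketch reproduces them:

```python
def params_multi_disease(m, n):
    # P(Y_1, ..., Y_m): 2^m - 1 free parameters, plus a full conditional
    # table P(X_1, ..., X_n | y) with 2^n - 1 parameters per configuration y.
    return (2 ** m - 1) + (2 ** n - 1) * 2 ** m

def params_single_disease(r0, n):
    # P(C): r_0 - 1 parameters, plus a full joint table per class value.
    return (r0 - 1) + r0 * (2 ** n - 1)

def params_naive_bayes(r0, n):
    # P(C) plus one Bernoulli parameter P(X_i = 1 | c) per predictor and class.
    return (r0 - 1) + r0 * n

print(params_multi_disease(3, 10))   # 8191  (~8 * 10^3)
print(params_single_disease(3, 10))  # 3071  (~3 * 10^3)
print(params_naive_bayes(3, 10))     # 32
```

The factorized model grows linearly in n while the full-table models grow exponentially, which is the whole argument for the naïve Bayes assumption.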

19. Naïve Bayes as a probabilistic graphical model
Naïve Bayes (Minsky, 1961): predictor variables conditionally independent given C
c* = arg max_c P(C = c) Π_{i=1}^n P(X_i = x_i | C = c)
[Figure: Structure of a naïve Bayes]
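The classification rule above is short enough to implement directly. The sketch below is a minimal discrete naïve Bayes with Laplace smoothing; the function names and data layout are ours, not the slides', and log-space is used to avoid underflow with many predictors:

```python
import math
from collections import defaultdict

def fit(data, n_values, classes):
    """Fit a discrete naive Bayes. data: list of (x, c) pairs, x a tuple of
    discrete predictor values; n_values[i]: cardinality of predictor X_i."""
    class_counts = defaultdict(int)
    cond_counts = defaultdict(int)   # (i, x_i, c) -> count
    for x, c in data:
        class_counts[c] += 1
        for i, xi in enumerate(x):
            cond_counts[(i, xi, c)] += 1
    N = len(data)
    # Laplace-smoothed prior and conditionals.
    prior = {c: (class_counts[c] + 1) / (N + len(classes)) for c in classes}

    def log_posterior(x, c):
        # log P(c) + sum_i log P(x_i | c), i.e. the naive Bayes score.
        lp = math.log(prior[c])
        for i, xi in enumerate(x):
            lp += math.log((cond_counts[(i, xi, c)] + 1)
                           / (class_counts[c] + n_values[i]))
        return lp

    return log_posterior

def predict(log_posterior, x, classes):
    # c* = arg max_c P(c) * prod_i P(x_i | c), computed in log space.
    return max(classes, key=lambda c: log_posterior(x, c))
```

A usage example: fitting on four labelled instances with two binary predictors and classifying a new one, e.g. `predict(fit(data, [2, 2], (0, 1)), (0, 0), (0, 1))`.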

20. Naïve Bayes (Minsky, 1961)
Pattern recognition versus machine learning:
- Long tradition in the pattern recognition community: Minsky (1961), van Woerkom and Brodman (1961), Warner et al. (1961), Bailey (1964), Boyle et al. (1966), Maron (1961), Duda and Hart (1973)
- Introduced in the machine learning field by Cestnik et al. (1987)
- Known under different names:
  - idiot Bayes: Ohmann et al. (1988)
  - naïve Bayes: Kononenko (1990)
  - simple Bayes: Gammerman and Thatcher (1991)
  - independent Bayes: Todd and Stamper (1994)

21. Naïve Bayes (Minsky, 1961)
Theoretical results:
- Minsky (1961): the decision surfaces of a naïve Bayes classifier with binary predictor variables are hyperplanes
- Peot (1996): generalization of the previous result to nominal (non-binary) predictor variables
- Duda and Hart (1973): for ordinal predictor variables, the decision surfaces are polynomials
- Domingos and Pazzani (1997): although the estimate of P(c | x_1, ..., x_n) is not well calibrated, naïve Bayes can obtain competitive accuracies
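Minsky's hyperplane result can be illustrated numerically: with binary predictors, the naïve Bayes log-odds log[P(c_1 | x) / P(c_2 | x)] is an affine function w·x + b of x, so the decision boundary log-odds = 0 is a hyperplane. The prior and conditional probabilities below are assumed values for a two-predictor example:

```python
import math

# Assumed two-class naive Bayes parameters with two binary predictors.
prior = {1: 0.6, 2: 0.4}
theta = {1: [0.8, 0.3], 2: [0.4, 0.7]}   # theta[c][i] = P(X_i = 1 | C = c)

def log_odds(x):
    # log [P(C=1) prod_i P(x_i|1)] - log [P(C=2) prod_i P(x_i|2)]
    lo = math.log(prior[1] / prior[2])
    for i, xi in enumerate(x):
        p1 = theta[1][i] if xi else 1 - theta[1][i]
        p2 = theta[2][i] if xi else 1 - theta[2][i]
        lo += math.log(p1 / p2)
    return lo

# The same quantity written explicitly as an affine function w.x + b:
b = math.log(prior[1] / prior[2]) + sum(
    math.log((1 - theta[1][i]) / (1 - theta[2][i])) for i in range(2))
w = [math.log(theta[1][i] / theta[2][i])
     - math.log((1 - theta[1][i]) / (1 - theta[2][i])) for i in range(2)]

# Linearity holds at every vertex of the binary cube.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert abs(log_odds(x) - (b + sum(w[i] * x[i] for i in range(2)))) < 1e-12
```

Each binary predictor contributes one weight w_i (its log likelihood ratio), which is why the surface is linear; with non-binary ordinal predictors the per-variable terms are no longer affine in x_i, matching the Duda and Hart (1973) result above.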
