CS 730/730W/830: Intro AI
Naive Bayes and Boosting

1 handout: slides
asst 5 milestone was due

Wheeler Ruml (UNH), Lecture 22, CS 730
Supervised Learning: Summary So Far

learning as function approximation

k-NN: distance function (any attributes), any labels
Neural network: numeric attributes, numeric or binary labels
Regression: incremental training with LMS
3-Layer ANN: train with BackProp
Inductive Logic Programming: logical concepts
Decision Trees: easier with discrete attributes and labels
Naive Bayes
■ Bayes’ Theorem
■ The NB Model
■ The NB Classifier
■ Break
Boosting
Bayes’ Theorem

P(H | D) = P(H) P(D | H) / P(D)

Example: P(H) = 0.0001, P(D | H) = 0.99, P(D) = 0.01, P(H | D) = ?

If you don’t have P(D), sometimes it helps to note that
P(D) = P(D | H) P(H) + P(D | ¬H) P(¬H)
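For reference, substituting the slide’s values (the slide leaves the answer blank for in-class discussion):

P(H | D) = (0.0001 × 0.99) / 0.01 = 0.0099 ≈ 1%

so even evidence with a 99% hit rate leaves the hypothesis unlikely when the prior is this small.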
A Naive Bayesian Model

Bayes’ Theorem:
P(H | D) = P(H) P(D | H) / P(D)

naive model:
P(D | H) = P(x_1, ..., x_n | H) = ∏_i P(x_i | H)

attributes independent, given class

P(H | x_1, ..., x_n) = α P(H) ∏_i P(x_i | H)
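The step to the last line is just Bayes’ Theorem with D = (x_1, ..., x_n); the constant α stands in for the normalizer 1 / P(x_1, ..., x_n):

P(H | x_1, ..., x_n) = P(H) P(x_1, ..., x_n | H) / P(x_1, ..., x_n) = α P(H) ∏_i P(x_i | H)

In practice α never has to be modeled explicitly: it is recovered by normalizing the scores of all classes so they sum to 1.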
The ‘Naive Bayes’ Classifier

P(H | x_1, ..., x_n) = α P(H) ∏_i P(x_i | H)

attributes independent, given class

maximum a posteriori = pick the H with the highest posterior
maximum likelihood = ignore the prior
watch for sparse data when learning!
learning as density estimation
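A minimal Python sketch of this classifier for discrete attributes; the data layout and the use of add-one (Laplace) smoothing to guard against sparse counts are illustrative assumptions, not part of the slide.

from collections import Counter, defaultdict
import math

class NaiveBayes:
    """Naive Bayes for discrete attributes with add-one (Laplace) smoothing."""

    def fit(self, examples, labels):
        self.n = len(labels)
        self.classes = set(labels)
        self.class_count = Counter(labels)        # counts for the prior P(H)
        self.value_count = defaultdict(Counter)   # (class, attribute index) -> value counts
        self.domain = defaultdict(set)            # attribute index -> values seen
        for x, y in zip(examples, labels):
            for i, v in enumerate(x):
                self.value_count[(y, i)][v] += 1
                self.domain[i].add(v)

    def predict(self, x):
        # maximum a posteriori: argmax over classes of log P(H) + sum_i log P(x_i | H)
        def score(y):
            s = math.log(self.class_count[y] / self.n)
            for i, v in enumerate(x):
                counts = self.value_count[(y, i)]
                # add-one smoothing avoids zero probabilities for unseen (class, value) pairs
                s += math.log((counts[v] + 1) / (self.class_count[y] + len(self.domain[i])))
            return s
        return max(self.classes, key=score)

Working in log space keeps the product of many small probabilities from underflowing, and since we only compare scores across classes, the normalizer α never has to be computed.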
Break
■ asst 5
■ exam 2
■ projects
Boosting
■ Ensembles
■ AdaBoost
■ Behavior
■ Summary
■ EOLQs
Ensemble Learning

committees, ensembles
weak vs. strong learners
reduce variance, expand hypothesis space (e.g., half-spaces)
AdaBoost

N examples, T rounds, L a weak learner on weighted examples

p ← uniform distribution over the N examples
for t = 1 to T do
    h_t ← call L with weights p
    ε_t ← h_t’s weighted misclassification probability
    if ε_t = 0, return h_t
    α_t ← (1/2) ln((1 − ε_t) / ε_t)
    for each example i
        if h_t(i) is correct, p_i ← p_i · e^(−α_t)
        else, p_i ← p_i · e^(α_t)
    normalize p to sum to 1
return the h_t weighted by the α_t

to classify, choose the label with the highest sum of weighted votes
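A compact Python rendering of the loop above (a sketch, not the course’s reference code): it assumes the weak learner L takes (examples, labels, weights) and returns a callable hypothesis h(x); that interface is an assumption for illustration.

import math

def adaboost(examples, labels, L, T):
    """AdaBoost with T rounds of the weak learner L over weighted examples."""
    N = len(examples)
    p = [1.0 / N] * N                      # uniform distribution over the N examples
    hypotheses, alphas = [], []
    for _ in range(T):
        h = L(examples, labels, p)         # weak hypothesis trained under weights p
        # weighted misclassification probability
        eps = sum(w for w, x, y in zip(p, examples, labels) if h(x) != y)
        if eps == 0:
            return h                       # a perfect hypothesis: use it alone
        alpha = 0.5 * math.log((1 - eps) / eps)
        hypotheses.append(h)
        alphas.append(alpha)
        # shrink weights of correctly classified examples, grow the rest
        p = [w * math.exp(-alpha if h(x) == y else alpha)
             for w, x, y in zip(p, examples, labels)]
        total = sum(p)
        p = [w / total for w in p]         # normalize p to sum to 1

    def classify(x):
        # choose the label with the highest sum of alpha-weighted votes
        votes = {}
        for h, a in zip(hypotheses, alphas):
            votes[h(x)] = votes.get(h(x), 0.0) + a
        return max(votes, key=votes.get)
    return classify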
Boosting Function

[figure on the original slide; not recoverable from this extraction]
Behavior

doesn’t overfit (keeps maximizing the margin even when training error is zero)
outliers get high weight, so they can be inspected

problems:
■ not enough data
■ hypothesis class too small
■ boosting: learner too weak or too strong
Supervised Learning: Summary

k-NN: distance function (any attributes), any labels
Neural network: numeric attributes, numeric or binary labels
Perceptron: equivalent to linear regression
3-Layer ANN: BackProp learning
Decision Trees: easier with discrete attributes and labels
Inductive Logic Programming: logical concepts
Naive Bayes: easier with discrete attributes and labels
Boosting: general wrapper to improve performance

Didn’t cover: RBFs, EBL, SVMs
EOLQs

■ What question didn’t you get to ask today?
■ What’s still confusing?
■ What would you like to hear more about?

Please write down your most pressing question about AI and put it in the box on your way out. Thanks!