Applied Machine Learning: Naive Bayes


  1. Applied Machine Learning: Naive Bayes. Siamak Ravanbakhsh, COMP 551 (Winter 2020).

  2. Learning objectives: generative vs. discriminative classifiers; the Naive Bayes classifier assumption; different design choices.

  3. Discriminative vs generative classification. So far we modeled the conditional distribution p(y ∣ x) directly: a discriminative classifier. A generative classifier instead learns the joint distribution p(y, x) = p(y) p(x ∣ y) and classifies with Bayes rule:
         p(y = c ∣ x) = p(c) p(x ∣ c) / p(x)
     Here p(c) is the prior class probability (the frequency of observing this label); p(x ∣ c) is the likelihood of the input features given the class label (the input features for each label come from a different distribution); p(x) = ∑_{c'=1}^{C} p(x, c') is the marginal probability of the input (the evidence); and p(y = c ∣ x) is the posterior probability of a given class. How do we classify a new input x?
     [figure: class-conditional densities p(x ∣ y = 0) and p(x ∣ y = 1) over x; image: https://rpsychologist.com]
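
A minimal sketch (not from the slides) of how a generative classifier applies Bayes rule: combine a prior p(c) with class-conditional likelihoods p(x ∣ c), normalize by the evidence, and pick the most probable class. The Gaussian class-conditionals, the 70/30 prior, and the function names are illustrative assumptions.

    import numpy as np
    from scipy.stats import norm

    def posterior(x, prior, likelihood_fns):
        # unnormalized joint p(x, c) = p(c) * p(x | c) for every class c
        joint = np.array([prior[c] * likelihood_fns[c](x) for c in range(len(prior))])
        evidence = joint.sum()            # p(x) = sum over classes c' of p(x, c')
        return joint / evidence           # Bayes rule: posterior p(y = c | x)

    prior = np.array([0.7, 0.3])          # p(y = 0), p(y = 1): assumed class frequencies
    likelihoods = [norm(0.0, 1.0).pdf,    # p(x | y = 0): assumed Gaussian
                   norm(2.0, 1.0).pdf]    # p(x | y = 1): assumed Gaussian

    print(posterior(1.0, prior, likelihoods))           # posterior over both classes
    print(posterior(1.0, prior, likelihoods).argmax())  # predicted class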

  4. Example: Bayes rule for classification. Does the patient have cancer? y ∈ {yes, no}; the test result x ∈ {−, +} is a single binary feature.
         p(c ∣ x) = p(c) p(x ∣ c) / p(x)
     Prior: 1% of the population has cancer, so p(yes) = .01.
     Likelihood: the true-positive rate of the test is 90%, p(+ ∣ yes) = .9; the false-positive rate is 5%, p(+ ∣ no) = .05.
     Evidence: p(+) = p(yes) p(+ ∣ yes) + p(no) p(+ ∣ no) = .01 × .9 + .99 × .05 = .0585.
     Posterior: p(yes ∣ +) = .01 × .9 / .0585 ≈ .15.
     In a generative classifier, the likelihood and the prior class probabilities are learned from data.
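
A quick numerical check of the example above, using the rates stated on the slide (the variable names are mine):

    # Numbers from the slide: prior 1%, true-positive rate 90%, false-positive rate 5%.
    p_yes = 0.01                # p(cancer)
    p_pos_given_yes = 0.9       # p(+ | cancer)
    p_pos_given_no = 0.05       # p(+ | no cancer)

    p_pos = p_yes * p_pos_given_yes + (1 - p_yes) * p_pos_given_no  # evidence p(+)
    p_yes_given_pos = p_yes * p_pos_given_yes / p_pos               # posterior p(yes | +)

    print(p_pos)            # 0.0585
    print(p_yes_given_pos)  # about 0.154: a positive test still leaves cancer fairly unlikely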

  5. Generative classification. Recall the Bayes-rule decomposition p(y = c ∣ x) = p(c) p(x ∣ c) / p(x): the prior class probability p(c) is the frequency of observing the label, the likelihood p(x ∣ c) models the input features given the class label (the input features for each label come from a different distribution), the evidence p(x) = ∑_{c'=1}^{C} p(x, c') is the marginal probability of the input, and p(y = c ∣ x) is the posterior probability of a given class. Some generative classifiers: Gaussian Discriminant Analysis, where the likelihood is a multivariate Gaussian; Naive Bayes, which uses a decomposed likelihood. (image: https://rpsychologist.com)
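
The slide mentions Gaussian Discriminant Analysis only by name; the sketch below shows one plausible way to fit its class-conditional Gaussians and predict with the log of Bayes rule. The per-class full covariance and all function names are my assumptions, not the course's reference implementation.

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_gda(X, y, n_classes):
        # per-class mean and covariance for the Gaussian likelihood p(x | y = c)
        params = [(X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
                  for c in range(n_classes)]
        prior = np.bincount(y, minlength=n_classes) / len(y)   # p(c) = label frequency
        return prior, params

    def predict_gda(x, prior, params):
        # unnormalized log-posterior: log p(c) + log p(x | c); the evidence cancels in argmax
        scores = [np.log(prior[c]) + multivariate_normal(mean=m, cov=S).logpdf(x)
                  for c, (m, S) in enumerate(params)]
        return int(np.argmax(scores))

    # toy usage on synthetic 2-D data
    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 2.0])
    y = np.array([0] * 50 + [1] * 50)
    prior, params = fit_gda(X, y, n_classes=2)
    print(predict_gda(np.array([1.8, 1.9]), prior, params))   # most likely class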

  6. Naive Bayes: model. The assumption about the likelihood is that it factorizes over the D input features:
         p(x ∣ y) = ∏_{d=1}^{D} p(x_d ∣ y)
     When is this assumption correct? When the features are conditionally independent given the label, x_i ⊥ x_j ∣ y: knowing the label, the value of one input feature gives us no information about the other input features. Compare with the chain rule of probability (true for any distribution):
         p(x ∣ y) = p(x_1 ∣ y) p(x_2 ∣ y, x_1) p(x_3 ∣ y, x_1, x_2) ⋯ p(x_D ∣ y, x_1, …, x_{D−1})
     Under the conditional independence assumption x_1, x_2 give no extra information about x_3, so p(x_3 ∣ y, x_1, x_2) = p(x_3 ∣ y), and likewise every factor collapses to p(x_d ∣ y).
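
To make the factorization concrete, here is a small sketch of the Naive Bayes likelihood for binary features, assuming Bernoulli per-feature distributions with parameters theta[c, d]; the parameter values and names are illustrative, and how the parameters are estimated is the subject of the next item.

    import numpy as np

    def log_likelihood(x, theta_c):
        # log p(x | y = c) = sum_d log p(x_d | y = c) for binary x and Bernoulli parameters theta_c
        return np.sum(x * np.log(theta_c) + (1 - x) * np.log(1 - theta_c))

    theta = np.array([[0.8, 0.1, 0.3],   # p(x_d = 1 | y = 0), one entry per feature d
                      [0.2, 0.7, 0.6]])  # p(x_d = 1 | y = 1)
    x = np.array([1, 0, 1])
    print([log_likelihood(x, theta[c]) for c in range(2)])  # one value per class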

  7. Naive Bayes: objective. Given the training dataset D = {(x^(1), y^(1)), …, (x^(N), y^(N))}, maximize the joint log-likelihood (contrast with logistic regression, which maximizes the conditional likelihood):
         ℓ(w, u) = ∑_n log p_{u,w}(x^(n), y^(n))
                 = ∑_n [ log p_u(y^(n)) + log p_w(x^(n) ∣ y^(n)) ]
                 = ∑_n log p_u(y^(n)) + ∑_n log p_w(x^(n) ∣ y^(n))
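
Because the joint log-likelihood splits into a prior term and a likelihood term, maximizing it decouples into simple counts and averages. The sketch below assumes binary (Bernoulli) features; the clipping constant is my addition to keep the logs finite and is not part of the slide's objective.

    import numpy as np

    def fit_bernoulli_nb(X, y, n_classes, eps=1e-9):
        # maximizing sum_n log p(y) gives class frequencies;
        # maximizing sum_n log p(x | y) gives per-class feature means (Bernoulli features)
        prior = np.bincount(y, minlength=n_classes) / len(y)
        theta = np.vstack([X[y == c].mean(axis=0) for c in range(n_classes)])
        theta = np.clip(theta, eps, 1 - eps)   # assumption: keep log(theta) finite
        return prior, theta

    def predict(X, prior, theta):
        # log p(y = c) + sum_d log p(x_d | y = c), then pick the largest class score
        log_joint = (np.log(prior)[None, :]
                     + X @ np.log(theta).T
                     + (1 - X) @ np.log(1 - theta).T)
        return log_joint.argmax(axis=1)

    # toy usage on random binary data
    X = np.random.binomial(1, 0.3, size=(100, 5))
    y = np.random.binomial(1, 0.5, size=100)
    prior, theta = fit_bernoulli_nb(X, y, n_classes=2)
    print(predict(X[:5], prior, theta))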
