Generative Models and Naïve Bayes
Ke Chen
Reading: [14.3, EA], [3.5, KPM], [1.5.4, CMB]
COMP24111 Machine Learning
Outline
• Background and Probability Basics
• Probabilistic Classification Principle
  – Probabilistic discriminative models
  – Generative models and their application to classification
  – MAP and converting generative into discriminative
• Naïve Bayes – a generative model
  – Principle and Algorithms (discrete vs. continuous)
  – Example: Play Tennis
• Zero Conditional Probability and Treatment
• Summary
Background
• There are three methodologies:
  a) Model a classification rule directly
     Examples: k-NN, linear classifier, SVM, neural nets, …
  b) Model the probability of class memberships given input data
     Examples: logistic regression, probabilistic neural nets (softmax), …
  c) Make a probabilistic model of data within each class
     Examples: naïve Bayes, model-based classifiers, …
• An important ML taxonomy for learning models:
  – probabilistic models vs. non-probabilistic models
  – discriminative models vs. generative models
Background
• Based on this taxonomy, we can see the essence of different learning models (classifiers) more clearly:
  – Discriminative, probabilistic: logistic regression, probabilistic neural nets, …
  – Discriminative, non-probabilistic: k-NN, linear classifier, SVM, neural networks, …
  – Generative, probabilistic: naïve Bayes, model-based classifiers (e.g., GMM), …
  – Generative, non-probabilistic: N.A. (?)
Probability Basics
• Prior, conditional and joint probability for random variables
  – Prior probability: P(x)
  – Conditional probability: P(x_1 | x_2), P(x_2 | x_1)
  – Joint probability: x = (x_1, x_2), P(x) = P(x_1, x_2)
  – Relationship: P(x_1, x_2) = P(x_2 | x_1) P(x_1) = P(x_1 | x_2) P(x_2)
  – Independence: P(x_2 | x_1) = P(x_2), P(x_1 | x_2) = P(x_1), P(x_1, x_2) = P(x_1) P(x_2)
• Bayesian Rule
    P(c | x) = P(x | c) P(c) / P(x),  i.e.  Posterior = (Likelihood × Prior) / Evidence
  – The posterior P(c | x) is the discriminative quantity; the likelihood P(x | c) is the generative quantity.
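A quick numeric illustration of the Bayesian rule; the priors and likelihoods below are made-up values chosen only for this example:

```python
# Bayes rule with two classes c1, c2 and a single observation x (illustrative numbers).
prior = {"c1": 0.6, "c2": 0.4}        # P(c)
likelihood = {"c1": 0.2, "c2": 0.5}   # P(x | c)

evidence = sum(likelihood[c] * prior[c] for c in prior)               # P(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}   # P(c | x)
print(posterior)   # {'c1': 0.375, 'c2': 0.625}
```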
Probabilistic Classification Principle
• Establishing a probabilistic model for classification
  – Discriminative model: P(c | x), c = c_1, …, c_L, x = (x_1, …, x_n)
  [Figure: a single discriminative probabilistic classifier takes x = (x_1, x_2, …, x_n) as input and outputs P(c_1 | x), P(c_2 | x), …, P(c_L | x)]
• To train a discriminative classifier, regardless of its probabilistic or non-probabilistic nature, all training examples of the different classes must be used jointly to build up a single discriminative classifier.
• A probabilistic classifier outputs L probabilities, one for each of the L class labels, whereas a non-probabilistic classifier outputs a single label.
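As a concrete illustration of a discriminative probabilistic classifier, the sketch below trains logistic regression on all classes jointly and returns one probability per class label; it assumes scikit-learn is available and uses toy data invented for this example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: examples of all L = 3 classes are used jointly to train ONE classifier.
X = np.array([[0.1, 1.0], [0.3, 0.8], [2.0, 0.2], [1.8, 0.4], [1.0, 2.0], [1.2, 1.9]])
y = np.array([0, 0, 1, 1, 2, 2])

clf = LogisticRegression().fit(X, y)

# One probability per class label: P(c_1 | x), P(c_2 | x), P(c_3 | x).
print(clf.predict_proba([[1.0, 1.0]]))   # three numbers summing to 1 (exact values depend on the fit)
```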
Probabilistic Classification Principle
• Establishing a probabilistic model for classification (cont.)
  – Generative model (must be probabilistic): P(x | c), c = c_1, …, c_L, x = (x_1, …, x_n)
  [Figure: L separate generative probabilistic models, one for each class (Class 1, …, Class L); each takes x = (x_1, x_2, …, x_n) as input, giving the outputs P(x | c_1), …, P(x | c_L)]
• L probabilistic models have to be trained independently
• Each is trained on only the examples of the same label
• Output L probabilities for a given input with the L models
• "Generative" means that such a model can produce data subject to the learned distribution via sampling.
Probabilistic Classification Principle
• Maximum A Posteriori (MAP) classification rule
  – For an input x, find the largest of the L probabilities output by a discriminative probabilistic classifier, P(c_1 | x), …, P(c_L | x)
  – Assign x to the label c* if P(c* | x) is the largest
• Generative classification with the MAP rule
  – Apply the Bayesian rule to convert the L likelihoods into posterior probabilities:
      P(c_i | x) = P(x | c_i) P(c_i) / P(x) ∝ P(x | c_i) P(c_i),  for i = 1, 2, …, L
    (P(x) is a common factor for all L probabilities, so it can be dropped)
  – Then apply the MAP rule to assign a label
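A minimal sketch of the MAP rule for a generative classifier, assuming the class-conditional likelihoods and priors are already available; the function and argument names are placeholders, not part of the lecture:

```python
# MAP decision with a generative model: pick the class maximising P(x | c_i) * P(c_i).
def map_classify(x, classes, likelihood, prior):
    """likelihood(x, c) plays the role of P(x | c_i) and prior[c] of P(c_i);
    the evidence P(x) is a common factor over classes, so it can be dropped."""
    scores = {c: likelihood(x, c) * prior[c] for c in classes}
    return max(scores, key=scores.get)   # the label c* with the largest posterior score
```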
Naïve Bayes
• Bayes classification
    P(c | x) ∝ P(x | c) P(c) = P(x_1, …, x_n | c) P(c),  for c = c_1, …, c_L
  Difficulty: learning the joint probability P(x_1, …, x_n | c) is infeasible!
• Naïve Bayes classification
  – Assume all input features are class-conditionally independent! Applying this independence assumption:
      P(x_1, x_2, …, x_n | c) = P(x_1 | x_2, …, x_n, c) P(x_2, …, x_n | c)
                              = P(x_1 | c) P(x_2, …, x_n | c)
                              = P(x_1 | c) P(x_2 | c) … P(x_n | c)
  – Apply the MAP classification rule: assign x' = (a_1, a_2, …, a_n) to c* if
      [P(a_1 | c*) … P(a_n | c*)] P(c*) > [P(a_1 | c) … P(a_n | c)] P(c),  for all c ≠ c*, c = c_1, …, c_L
    where the two bracketed products are the estimates of P(a_1, …, a_n | c*) and P(a_1, …, a_n | c) respectively.
Naïve Bayes
• Algorithm: Discrete-valued Features
  – Learning Phase: for each target value c_i (c_i = c_1, …, c_L)
      P̂(c_i) ← estimate P(c_i) with examples in S;
      for every feature value x_jk of each feature x_j (j = 1, …, F; k = 1, …, N_j)
        P̂(x_j = x_jk | c_i) ← estimate P(x_jk | c_i) with examples in S;
  – Test Phase: given an unknown instance x' = (a'_1, …, a'_n), assign the label c* if
      [P̂(a'_1 | c*) … P̂(a'_n | c*)] P̂(c*) > [P̂(a'_1 | c_i) … P̂(a'_n | c_i)] P̂(c_i),  for all c_i ≠ c*, c_i = c_1, …, c_L
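The two phases above can be written compactly; the following is a minimal sketch for discrete features using plain frequency counting (no smoothing), not the exact course implementation:

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_tuple, label) pairs from the training set S."""
    n = len(examples)
    class_counts = Counter(label for _, label in examples)
    prior = {c: class_counts[c] / n for c in class_counts}            # estimates of P(c_i)

    value_counts = defaultdict(Counter)          # (feature index j, class c) -> counts of values
    for x, c in examples:
        for j, value in enumerate(x):
            value_counts[(j, c)][value] += 1
    cond = {key: {v: cnt / sum(counts.values()) for v, cnt in counts.items()}
            for key, counts in value_counts.items()}                  # estimates of P(x_j = x_jk | c_i)
    return prior, cond

def predict_nb(x, prior, cond):
    # MAP rule: argmax over c of P^(c) * prod_j P^(a_j | c).
    # A feature value never seen with class c gets probability 0 here -- the
    # zero conditional probability problem listed in the outline.
    scores = {}
    for c in prior:
        score = prior[c]
        for j, value in enumerate(x):
            score *= cond.get((j, c), {}).get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get)
```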
Example
• Example: Play Tennis
  [Figure: the Play Tennis training data — 14 examples described by Outlook, Temperature, Humidity and Wind, each labelled Play=Yes or Play=No]
Example
• Learning Phase

  Outlook    Play=Yes  Play=No      Temperature  Play=Yes  Play=No
  Sunny      2/9       3/5          Hot          2/9       2/5
  Overcast   4/9       0/5          Mild         4/9       2/5
  Rain       3/9       2/5          Cool         3/9       1/5

  Humidity   Play=Yes  Play=No      Wind         Play=Yes  Play=No
  High       3/9       4/5          Strong       3/9       3/5
  Normal     6/9       1/5          Weak         6/9       2/5

  P(Play=Yes) = 9/14    P(Play=No) = 5/14
Example
• Test Phase
  – Given a new instance, predict its label:
      x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
  – Look up the tables obtained in the learning phase:
      P(Outlook=Sunny | Play=Yes) = 2/9        P(Outlook=Sunny | Play=No) = 3/5
      P(Temperature=Cool | Play=Yes) = 3/9     P(Temperature=Cool | Play=No) = 1/5
      P(Humidity=High | Play=Yes) = 3/9        P(Humidity=High | Play=No) = 4/5
      P(Wind=Strong | Play=Yes) = 3/9          P(Wind=Strong | Play=No) = 3/5
      P(Play=Yes) = 9/14                       P(Play=No) = 5/14
  – Decision making with the MAP rule:
      P(Yes | x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
      P(No | x')  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
    Since P(Yes | x') < P(No | x'), we label x' as "No".
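The decision above can be reproduced directly from the looked-up table entries; a small check that just multiplies the probabilities shown:

```python
# Unnormalised posterior scores for x' = (Sunny, Cool, High, Strong).
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ≈ 0.0053
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ≈ 0.0206
print("Yes" if p_yes > p_no else "No")           # prints "No"
```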
Naïve Bayes
• Algorithm: Continuous-valued Features
  – A continuous-valued feature can take innumerable values, so look-up tables of value counts no longer apply
  – The conditional probability is often modelled with the normal (Gaussian) distribution:
      P̂(x_j | c_i) = 1 / (√(2π) σ_ji) · exp( −(x_j − μ_ji)² / (2 σ_ji²) )
      μ_ji : mean (average) of the values of feature x_j over the examples for which c = c_i
      σ_ji : standard deviation of the values of feature x_j over the examples for which c = c_i
  – Learning Phase: for X = (X_1, …, X_F), C = c_1, …, c_L
    Output: F × L normal distributions and P(C = c_i), i = 1, …, L
  – Test Phase: given an unknown instance X' = (a'_1, …, a'_n)
    • Instead of looking up tables, calculate the conditional probabilities using the normal distributions obtained in the learning phase
    • Apply the MAP rule to assign a label (the same as in the discrete case)
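A minimal sketch of the continuous-valued case, fitting one normal distribution per (feature, class) pair and applying the MAP rule; it assumes NumPy, non-zero standard deviations, and few features (a real implementation would work with log-probabilities to avoid underflow):

```python
import numpy as np

def train_gaussian_nb(X, y):
    """X: (N, F) array of continuous features, y: (N,) array of class labels."""
    classes = np.unique(y)
    prior = {c: float(np.mean(y == c)) for c in classes}              # P(C = c_i)
    params = {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0))      # (mu_ji, sigma_ji) per feature
              for c in classes}
    return prior, params

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def predict_gaussian_nb(x, prior, params):
    # MAP rule: argmax over c of P(c) * prod_j N(x_j; mu_jc, sigma_jc)
    scores = {c: prior[c] * float(np.prod(gaussian_pdf(x, mu, sigma)))
              for c, (mu, sigma) in params.items()}
    return max(scores, key=scores.get)
```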