Generative and Discriminative Learning
Machine Learning
What we saw most of the semester
• A fixed, unknown distribution D over X × Y
  – X: Instance space, Y: label space (e.g., {+1, -1})
• Given a dataset S = {(x_i, y_i)}
• Learning
  – Identify a hypothesis space H, define a loss function L(h, x, y)
  – Minimize average loss over training data (plus regularization)
• The guarantee
  – If we find an algorithm that minimizes loss on the observed data
  – Then, learning theory guarantees good future behavior (as a function of H)

Aside: Is this different from assuming a distribution over X and a fixed oracle function f?
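The recipe above (pick H, pick a loss, minimize average training loss plus a regularizer) can be made concrete with a small sketch. This is only an illustration of the setup, not something from the slides; the hinge loss, the L2 regularizer, and all hyperparameter values are assumptions chosen for the example.

import numpy as np

def erm_train(X, y, reg=0.1, lr=0.01, epochs=200):
    """Minimize average hinge loss over S = {(x_i, y_i)} plus an L2 regularizer.

    X: (n, d) array of instances; y: labels in {+1, -1}.
    Hypothesis space H: linear predictors h(x) = sign(w . x).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)                      # y_i * (w . x_i) for every example
        active = margins < 1                       # examples with nonzero hinge loss
        # Subgradient of (1/n) * sum_i max(0, 1 - y_i w.x_i) + (reg/2) * ||w||^2
        grad = -(y[active, None] * X[active]).sum(axis=0) / n + reg * w
        w -= lr * grad
    return w

# Usage sketch: w = erm_train(X_train, y_train); y_hat = np.sign(X_test @ w)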
Discriminative models
Goal: learn directly how to make predictions
• Look at many (positive/negative) examples
• Discover regularities in the data
• Use these to construct a prediction policy
• Assumptions come in the form of the hypothesis class

Bottom line: approximating h: X → Y is estimating the conditional probability P(Y | X)
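A standard instance of this (not spelled out on the slide) is logistic regression, which estimates the conditional directly as P(Y = +1 | x) = 1 / (1 + exp(-w · x)), with the weights w fit to the labeled examples.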
Generative models
• Explicitly model how instances in each category are generated, by modeling the joint probability of X and Y, that is P(X, Y)
• That is, learn P(X | Y) and P(Y)
• We did this for naïve Bayes
  – Naïve Bayes is a generative model
• Predict P(Y | X) using the Bayes rule
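Spelling out the last bullet (a standard Bayes rule step, not shown on the slide): P(Y | X) = P(X | Y) P(Y) / P(X). Since P(X) does not depend on the label, prediction reduces to the argmax over y of P(X | y) P(y), which uses exactly the two pieces the generative model learns.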
Example: Generative story of naïve Bayes
[Figure: a directed graphical model with the label Y at the root, drawn from P(Y), and feature nodes X_1, X_2, X_3, ..., X_d as its children, each drawn from P(X_i | Y)]
• First sample a label Y from P(Y)
• Given the label, sample the features independently from the conditional distributions P(X_1 | Y), ..., P(X_d | Y)
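This generative story can be run forward as a sampler. The sketch below is only an illustration: the Bernoulli features, d = 3, and all parameter values are made-up assumptions, not part of the slides.

import numpy as np

rng = np.random.default_rng(0)

# Assumed toy parameters for a Bernoulli naïve Bayes with d = 3 features.
p_y = {+1: 0.6, -1: 0.4}                    # P(Y)
p_x_given_y = {+1: [0.9, 0.2, 0.7],         # P(X_i = 1 | Y = +1)
               -1: [0.1, 0.8, 0.3]}         # P(X_i = 1 | Y = -1)

def sample_example():
    # Step 1: sample the label from P(Y).
    y = int(rng.choice([+1, -1], p=[p_y[+1], p_y[-1]]))
    # Step 2: given the label, sample each feature independently from P(X_i | Y).
    x = rng.binomial(1, p_x_given_y[y])
    return x, y

# Repeating this draws a dataset S from the joint distribution P(X, Y).
S = [sample_example() for _ in range(5)]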
Generative vs Discriminative models
• Generative models – learn P(x, y)
  – Use the capacity of the model to characterize how the data is generated (both inputs and outputs)
  – E.g.: Naïve Bayes, Hidden Markov Model
• Discriminative models – learn P(y | x)
  – Use model capacity to characterize the decision boundary only
  – E.g.: Logistic Regression, conditional models (several names), most neural models

A generative model tries to characterize the distribution of the inputs; a discriminative model doesn't care.
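The contrast can be seen in a short sketch (a hedged illustration using scikit-learn; the toy data and the specific model choices are assumptions, not part of the slides): both models are fit on the same labeled sample, but they spend their capacity differently.

import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative: estimates P(x | y) and P(y)
from sklearn.linear_model import LogisticRegression   # discriminative: estimates P(y | x) directly

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs in 2D, one per label (made up for the example).
X = np.vstack([rng.normal(+1.0, 1.0, size=(100, 2)),
               rng.normal(-1.0, 1.0, size=(100, 2))])
y = np.array([+1] * 100 + [-1] * 100)

gen = GaussianNB().fit(X, y)             # capacity spent modeling the inputs within each class
disc = LogisticRegression().fit(X, y)    # capacity spent only on the decision boundary

# Both expose P(y | x) for prediction; only the generative model also gives a
# handle on the input distribution, via its per-class means/variances and priors.
print(gen.predict_proba(X[:2]))
print(disc.predict_proba(X[:2]))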