Discriminative vs. Generative Learning CS 760@UW-Madison
Goals for the lecture
You should understand the following concepts:
• the relationship between logistic regression and Naïve Bayes
• the relationship between discriminative and generative learning
• when discriminative/generative is likely to learn more accurate models
Review
Discriminative vs. Generative
Discriminative approach:
• a hypothesis $h \in \mathcal{H}$ directly predicts the label from the features: $y = h(x)$, or more generally $p(y \mid x) = h(x)$
• then define a loss function $L(h)$ and find the hypothesis with minimum loss
Generative approach:
• a hypothesis $h \in \mathcal{H}$ specifies a generative story for how the data was created: $p(x, y) = h(x, y)$
• then pick a hypothesis by maximum likelihood estimation (MLE) or maximum a posteriori (MAP) estimation
Summary: generative approach
• Step 1: specify the joint data distribution (the generative story)
• Step 2: use MLE or MAP for training
• Step 3: use Bayes' rule for inference on test instances
• Example: Naïve Bayes (conditional independence)
$$p(x, y) = p(y)\, p(x \mid y) = p(y) \prod_j p(x_j \mid y)$$
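As a concrete illustration of these three steps, below is a minimal Naïve Bayes sketch for binary features; the toy data and variable names are invented for this example.

```python
import numpy as np

# Toy training data: 6 instances, 3 binary features, binary label (hypothetical).
X = np.array([[1, 0, 1], [1, 1, 1], [0, 1, 1],
              [0, 1, 0], [0, 0, 1], [1, 0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])

# Steps 1 + 2: generative story p(x, y) = p(y) * prod_j p(x_j | y), estimated by MLE (counting).
p_y1 = y.mean()                              # p(Y = 1)
p_xj_given_y1 = X[y == 1].mean(axis=0)       # p(x_j = 1 | Y = 1), one entry per feature
p_xj_given_y0 = X[y == 0].mean(axis=0)       # p(x_j = 1 | Y = 0)

def predict_proba(x):
    """Step 3: Bayes' rule on a test instance x (vector of 0/1 features)."""
    lik1 = np.prod(np.where(x == 1, p_xj_given_y1, 1 - p_xj_given_y1))
    lik0 = np.prod(np.where(x == 1, p_xj_given_y0, 1 - p_xj_given_y0))
    joint1, joint0 = p_y1 * lik1, (1 - p_y1) * lik0
    return joint1 / (joint1 + joint0)        # p(Y = 1 | x)

print(predict_proba(np.array([1, 0, 1])))
```

In practice one would also add Laplace smoothing to the counts so that no conditional probability is estimated as exactly zero.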
Summary: discriminative approach
• Step 1: specify the hypothesis class
• Step 2: specify the loss
• Step 3: design an optimization algorithm for training
How to design the hypotheses and the loss? One option is to derive them from a generative approach:
• Step 0: specify $p(x \mid y)$ and $p(y)$
• Step 1: compute the hypotheses $p(y \mid x)$ using Bayes' rule
• Step 2: use conditional MLE to derive the negative log-likelihood loss (or use MAP to derive the loss)
• Step 3: design an optimization algorithm for training
• Example: logistic regression (a minimal sketch of the recipe follows)
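For concreteness, here is a minimal sketch of the discriminative recipe for binary logistic regression: the hypothesis class is $p(y = 1 \mid x) = \sigma(w^\top x + b)$, the loss is the average negative log-likelihood, and the optimizer is plain gradient descent. The learning rate and iteration count are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Step 1: hypothesis p(y = 1 | x) = sigmoid(w^T x + b).
    Step 2: loss = average negative log-likelihood over the training set.
    Step 3: optimize by full-batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)           # predicted p(y = 1 | x) for each instance
        grad_w = X.T @ (p - y) / n       # gradient of the NLL with respect to w
        grad_b = (p - y).mean()          # gradient with respect to the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```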
Logistic regression
• Suppose the class-conditional densities $p(x \mid y)$ are normal with identity covariance:
$$p(x \mid y) = p(x \mid Y = y) = N(x \mid \mu_y, I) = \frac{1}{(2\pi)^{d/2}} \exp\left(-\tfrac{1}{2}\lVert x - \mu_y \rVert^2\right)$$
• Then the conditional probability by Bayes' rule is
$$p(Y = y \mid x) = \frac{p(x \mid Y = y)\, p(Y = y)}{\sum_{l} p(x \mid Y = l)\, p(Y = l)} = \frac{\exp(a_y)}{\sum_{l} \exp(a_l)}$$
where
$$a_l := \ln\big(p(x \mid Y = l)\, p(Y = l)\big) = -\tfrac{1}{2} x^\top x + w_l^\top x + b_l$$
with
$$w_l = \mu_l, \qquad b_l = -\tfrac{1}{2}\mu_l^\top \mu_l + \ln p(Y = l) + \ln\frac{1}{(2\pi)^{d/2}}$$
Logistic regression
• Suppose, as before, the class-conditional densities are normal with identity covariance:
$$p(x \mid y) = p(x \mid Y = y) = N(x \mid \mu_y, I) = \frac{1}{(2\pi)^{d/2}} \exp\left(-\tfrac{1}{2}\lVert x - \mu_y \rVert^2\right)$$
• The term $-\tfrac{1}{2} x^\top x$ is the same for every class, so it cancels between numerator and denominator, and we have
$$p(Y = y \mid x) = \frac{\exp(a_y)}{\sum_{l} \exp(a_l)}, \qquad a_l := w_l^\top x + b_l$$
where
$$w_l = \mu_l, \qquad b_l = -\tfrac{1}{2}\mu_l^\top \mu_l + \ln p(Y = l) + \ln\frac{1}{(2\pi)^{d/2}}$$
Logistic regression: summary
• Suppose the class-conditional densities $p(x \mid y)$ are normal with identity covariance:
$$p(x \mid y) = p(x \mid Y = y) = N(x \mid \mu_y, I) = \frac{1}{(2\pi)^{d/2}} \exp\left(-\tfrac{1}{2}\lVert x - \mu_y \rVert^2\right)$$
• Then
$$p(Y = y \mid x) = \frac{\exp(w_y^\top x + b_y)}{\sum_{l} \exp(w_l^\top x + b_l)}$$
which is the hypothesis class for multiclass logistic regression.
• Training: find parameters $\{w_l, b_l\}$ that minimize the negative log-likelihood loss
$$-\frac{1}{n} \sum_{k=1}^{n} \log p\big(y = y^{(k)} \mid x^{(k)}\big)$$
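The derivation can be sanity-checked numerically: with identity covariance, the posterior computed from the Gaussian class-conditionals matches the softmax of the linear scores $w_k^\top x + b_k$ with $w_k = \mu_k$ and $b_k = -\tfrac{1}{2}\mu_k^\top \mu_k + \ln p(Y = k)$. The dimensions, class means, and priors below are made up for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 4, 3                               # feature dimension, number of classes (arbitrary)
mus = rng.normal(size=(K, d))             # class means mu_k (hypothetical)
priors = np.array([0.5, 0.3, 0.2])        # p(Y = k) (hypothetical)
x = rng.normal(size=d)                    # a test point

# Posterior via Bayes' rule with N(x | mu_k, I) class-conditionals.
log_joint = -0.5 * ((x - mus) ** 2).sum(axis=1) + np.log(priors)   # (2*pi)^{-d/2} cancels
post_bayes = np.exp(log_joint) / np.exp(log_joint).sum()

# Posterior via softmax of linear scores (the -1/2 x^T x term cancels across classes).
w, b = mus, -0.5 * (mus ** 2).sum(axis=1) + np.log(priors)
scores = w @ x + b
post_softmax = np.exp(scores) / np.exp(scores).sum()

print(np.allclose(post_bayes, post_softmax))   # True
```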
Naïve Bayes vs. Logistic Regression
Connecting Naïve Bayes and logistic regression
• Interesting observation: logistic regression was derived from the generative story
$$p(x \mid y) = p(x \mid Y = y) = N(x \mid \mu_y, I) = \frac{1}{(2\pi)^{d/2}} \exp\left(-\tfrac{1}{2}\lVert x - \mu_y \rVert^2\right) = \prod_j \frac{1}{(2\pi)^{1/2}} \exp\left(-\tfrac{1}{2}(x_j - \mu_{yj})^2\right)$$
which factorizes over the features and is therefore a special case of Naïve Bayes (a quick numeric check of this factorization follows below)!
• Is the general Naïve Bayes assumption enough to get logistic regression, instead of this more specific normal-distribution assumption?
• Yes, with an additional linearity assumption.
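A quick numeric check of the factorization claimed above: the spherical Gaussian density equals the product of per-coordinate univariate normal densities. The mean and the test point are arbitrary, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu = np.array([0.5, -1.0, 2.0])     # hypothetical class mean
x = np.array([0.2, 0.3, 1.5])       # hypothetical instance

joint = multivariate_normal.pdf(x, mean=mu, cov=np.eye(3))    # N(x | mu, I)
product = np.prod(norm.pdf(x, loc=mu, scale=1.0))             # prod_j N(x_j | mu_j, 1)
print(np.isclose(joint, product))   # True
```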
Naïve Bayes revisited
Consider Naïve Bayes for a binary classification task:
$$P(Y = 1 \mid x_1, \ldots, x_n) = \frac{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1)}{P(x_1, \ldots, x_n)}$$
Expanding the denominator:
$$= \frac{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1)}{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1) + P(Y = 0) \prod_{i=1}^{n} P(x_i \mid Y = 0)}$$
Dividing everything by the numerator:
$$= \frac{1}{1 + \dfrac{P(Y = 0) \prod_{i=1}^{n} P(x_i \mid Y = 0)}{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1)}}$$
Naïve Bayes revisited
$$P(Y = 1 \mid x_1, \ldots, x_n) = \frac{1}{1 + \dfrac{P(Y = 0) \prod_{i=1}^{n} P(x_i \mid Y = 0)}{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1)}}$$
Applying $\exp(\ln a) = a$:
$$= \frac{1}{1 + \exp\left(\ln \dfrac{P(Y = 0) \prod_{i=1}^{n} P(x_i \mid Y = 0)}{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1)}\right)}$$
Applying $\ln(a/b) = -\ln(b/a)$:
$$= \frac{1}{1 + \exp\left(-\ln \dfrac{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1)}{P(Y = 0) \prod_{i=1}^{n} P(x_i \mid Y = 0)}\right)}$$
Naïve Bayes revisited
$$P(Y = 1 \mid x_1, \ldots, x_n) = \frac{1}{1 + \exp\left(-\ln \dfrac{P(Y = 1) \prod_{i=1}^{n} P(x_i \mid Y = 1)}{P(Y = 0) \prod_{i=1}^{n} P(x_i \mid Y = 0)}\right)}$$
Converting the log of products into a sum of logs:
$$P(Y = 1 \mid x_1, \ldots, x_n) = \frac{1}{1 + \exp\left(-\ln\dfrac{P(Y = 1)}{P(Y = 0)} - \sum_{i=1}^{n} \ln\dfrac{P(x_i \mid Y = 1)}{P(x_i \mid Y = 0)}\right)}$$
Does this look familiar?
Naïve Bayes vs. logistic regression
Naïve Bayes (the generative counterpart of logistic regression):
$$P(Y = 1 \mid x_1, \ldots, x_n) = \frac{1}{1 + \exp\left(-\ln\dfrac{P(Y = 1)}{P(Y = 0)} - \sum_{i=1}^{n} \ln\dfrac{P(x_i \mid Y = 1)}{P(x_i \mid Y = 0)}\right)}$$
Logistic regression (the discriminative counterpart of Naïve Bayes):
$$f(x) = \frac{1}{1 + \exp\left(-w_0 - \sum_{i=1}^{n} w_i x_i\right)}$$
Linearity assumption: the log-ratio is linear in $x$.
Summary: if we begin with a Naïve Bayes generative story and derive a discriminative approach from it (assuming linearity), we get logistic regression!
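The rearrangement can be verified numerically: computing $P(Y = 1 \mid x)$ directly by Bayes' rule and as a sigmoid of the log prior ratio plus the summed per-feature log ratios gives the same value. The conditional probabilities below are hypothetical.

```python
import numpy as np

# Hypothetical Naive Bayes parameters for 3 binary features.
p_y1 = 0.4
p_xi1_y1 = np.array([0.8, 0.3, 0.6])    # P(x_i = 1 | Y = 1)
p_xi1_y0 = np.array([0.2, 0.5, 0.7])    # P(x_i = 1 | Y = 0)
x = np.array([1, 0, 1])

lik1 = np.where(x == 1, p_xi1_y1, 1 - p_xi1_y1)   # P(x_i | Y = 1), per feature
lik0 = np.where(x == 1, p_xi1_y0, 1 - p_xi1_y0)   # P(x_i | Y = 0), per feature

# Direct Bayes' rule.
direct = p_y1 * lik1.prod() / (p_y1 * lik1.prod() + (1 - p_y1) * lik0.prod())

# Sigmoid of the log prior ratio plus the summed per-feature log ratios.
log_odds = np.log(p_y1 / (1 - p_y1)) + np.sum(np.log(lik1 / lik0))
via_sigmoid = 1.0 / (1.0 + np.exp(-log_odds))

print(np.isclose(direct, via_sigmoid))   # True
```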
Naïve Bayes vs. logistic regression
Starting from conditional independence (the Naïve Bayes assumption):
• the generative approach gives the Naïve Bayes method
• the discriminative approach (+ linearity assumption) gives logistic regression
Logistic regression as a neural net
[Figure: a single-layer network with one-hot input units (Color=red, Color=blue, Size=big, Size=small) feeding the output $Y$; the bias is $\ln\frac{P(Y=1)}{P(Y=0)}$ and each edge weight is a log ratio such as $\ln\frac{P(\text{red} \mid Y=1)}{P(\text{red} \mid Y=0)}$ or $\ln\frac{P(\text{blue} \mid Y=1)}{P(\text{blue} \mid Y=0)}$.]
The connection gives an interpretation of the weights in logistic regression: the weights correspond to log ratios.
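Under this interpretation, the weights for one-hot features such as Color=red can be written down directly as Naïve Bayes log ratios; the conditional probabilities below are hypothetical, and the helper `predict` is only for illustration.

```python
import numpy as np

# Hypothetical Naive Bayes parameters for the features in the figure.
p_y1, p_y0 = 0.5, 0.5
cond = {                         # (P(value | Y = 1), P(value | Y = 0))
    'Color=red':  (0.7, 0.3),
    'Color=blue': (0.3, 0.7),
    'Size=big':   (0.6, 0.2),
    'Size=small': (0.4, 0.8),
}

# Bias and weights of the equivalent logistic regression are log ratios.
w0 = np.log(p_y1 / p_y0)
w = {name: np.log(p1 / p0) for name, (p1, p0) in cond.items()}

def predict(active):
    """P(Y = 1 | x) for an instance given its active one-hot features,
    e.g. {'Color=red', 'Size=big'}."""
    z = w0 + sum(w[name] for name in active)
    return 1.0 / (1.0 + np.exp(-z))

print(predict({'Color=red', 'Size=big'}))   # matches the Naive Bayes posterior
```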
Which is better?
Naïve Bayes vs. logistic regression
• They have the same functional form, and thus the same hypothesis space bias (recall our discussion of inductive bias).
• Do they learn the same models? In general, no. They use different methods to estimate the model parameters: Naïve Bayes uses MLE to learn the parameters $p(x_j \mid y)$, whereas logistic regression minimizes the conditional negative log-likelihood loss to learn the weights $w_j$ (see the sketch below).
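One way to see this is to fit both models on the same data and compare the predicted probabilities. A rough sketch using scikit-learn (assuming it is installed), with randomly generated binary data:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5))              # 200 instances, 5 binary features
y = ((X[:, 0] & X[:, 1]) | X[:, 2]).astype(int)    # an arbitrary labeling rule

nb = BernoulliNB().fit(X, y)              # MLE (with smoothing) of p(x_j | y) and p(y)
lr = LogisticRegression().fit(X, y)       # minimizes the conditional NLL over the weights

# Same functional form, but different parameter estimates -> different probabilities.
print(nb.predict_proba(X[:5])[:, 1])
print(lr.predict_proba(X[:5])[:, 1])
```

Even though both models produce probabilities of the same functional form, the estimated parameters, and hence the predicted probabilities, generally differ.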