  1. Generative and Discriminative Learning (Machine Learning)

  2. What we saw most of the semester
     • A fixed, unknown distribution D over X × Y
       – X: instance space, Y: label space (e.g., {+1, -1})
     • Given a dataset S = {(x_i, y_i)}
     • Learning
       – Identify a hypothesis space H, define a loss function L(h, x, y)
       – Minimize average loss over training data (plus regularization)
     • The guarantee
       – If we find an algorithm that minimizes loss on the observed data
       – Then, learning theory guarantees good future behavior (as a function of H)

  3. What we saw most of the semester
     • A fixed, unknown distribution D over X × Y
       – X: instance space, Y: label space (e.g., {+1, -1})
     • Given a dataset S = {(x_i, y_i)}
     • Learning
       – Identify a hypothesis space H, define a loss function L(h, x, y)
       – Minimize average loss over training data (plus regularization)
     • The guarantee
       – If we find an algorithm that minimizes loss on the observed data
       – Then, learning theory guarantees good future behavior (as a function of H)
     Aside: Is this different from assuming a distribution over X and a fixed oracle function f?
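To make the loss-minimization recipe concrete, here is a minimal sketch of regularized empirical risk minimization for a linear hypothesis class with the logistic loss. The dataset, step size, and regularization strength are made-up placeholders, not anything specified in the slides.

```python
import numpy as np

def learn_linear_classifier(X, y, reg=0.1, lr=0.1, epochs=200):
    """Minimize average logistic loss over S = {(x_i, y_i)} plus an L2
    regularization term, over the hypothesis space of linear classifiers
    h_w(x) = sign(w . x).  Labels y are assumed to be in {+1, -1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)                 # y_i * (w . x_i) for each example
        # Gradient of (1/n) sum_i log(1 + exp(-margin_i)) + (reg/2) ||w||^2
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + reg * w
        w -= lr * grad
    return w

# Toy usage on synthetic data (placeholder, not from the slides)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X @ rng.normal(size=5))
w = learn_linear_classifier(X, y)
print("training error:", np.mean(np.sign(X @ w) != y))
```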


  7. Discriminative models
     Goal: learn directly how to make predictions
     • Look at many (positive/negative) examples
     • Discover regularities in the data
     • Use these to construct a prediction policy
     • Assumptions come in the form of the hypothesis class
     Bottom line: approximating h: X → Y amounts to estimating the conditional probability P(Y | X)
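As a concrete illustration of learning a prediction policy, i.e. estimating P(Y | X) directly, here is a small sketch using scikit-learn's logistic regression. The synthetic two-blob data and all settings are illustrative assumptions, not something from the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: two Gaussian blobs with labels in {+1, -1} (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, size=(200, 2)),
               rng.normal(-1.0, 1.0, size=(200, 2))])
y = np.array([+1] * 200 + [-1] * 200)

# Discriminative learning: fit P(y | x) directly, without modeling P(x)
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))  # columns are P(y = -1 | x), P(y = +1 | x)
print(clf.predict(X[:3]))        # hard predictions: argmax over P(y | x)
```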

  8. Generative models
     • Explicitly model how instances in each category are generated, by modeling the joint probability of X and Y, that is P(X, Y)
     • That is, learn P(X | Y) and P(Y)
     • We did this for naïve Bayes
       – Naïve Bayes is a generative model
     • Predict P(Y | X) using Bayes' rule
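Spelled out in code, the generative recipe is: estimate P(Y) and P(X | Y) from counts, then combine them with Bayes' rule at prediction time. The sketch below does this for binary features with Laplace smoothing; the smoothing constant and the tiny example dataset are assumptions made for illustration.

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Generative estimation for binary features: learn P(y) and P(x_j = 1 | y)
    by smoothed counting (alpha is Laplace smoothing, an illustrative choice)."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}                    # P(y)
    likelihoods = {c: (X[y == c].sum(axis=0) + alpha)
                      / ((y == c).sum() + 2 * alpha)                  # P(x_j = 1 | y)
                   for c in classes}
    return classes, priors, likelihoods

def predict_proba(x, classes, priors, likelihoods):
    """Bayes' rule: P(y | x) is proportional to P(y) * prod_j P(x_j | y)."""
    scores = np.array([priors[c] *
                       np.prod(np.where(x == 1, likelihoods[c], 1 - likelihoods[c]))
                       for c in classes])
    return scores / scores.sum()  # normalize over the label space

# Tiny illustrative dataset with three binary features
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 0, 0]])
y = np.array([+1, +1, -1, -1])
print(predict_proba(np.array([1, 1, 0]), *fit_naive_bayes(X, y)))
```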

  9. Example: Generative story of naïve Bayes (slides 9-15 build up the figure)
     [Figure: naïve Bayes as a directed model, with a label node Y drawn from P(Y) and feature nodes X_1, X_2, X_3, ..., X_d, each drawn from the conditional distribution P(X_i | Y)]
     • First sample a label Y from P(Y)
     • Given the label, sample the features independently from the conditional distributions P(X_i | Y)
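The generative story above translates directly into sampling code: draw the label from P(Y), then draw each feature independently from P(X_i | Y). The prior and per-feature probabilities below are made-up numbers used only to illustrate the sampling process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for a naïve Bayes model with three binary features
p_y = {+1: 0.6, -1: 0.4}                            # P(Y)
p_x_given_y = {+1: np.array([0.9, 0.7, 0.2]),       # P(X_i = 1 | Y = +1)
               -1: np.array([0.1, 0.4, 0.8])}       # P(X_i = 1 | Y = -1)

def sample_example():
    # First sample a label from P(Y)
    y = rng.choice([+1, -1], p=[p_y[+1], p_y[-1]])
    # Given the label, sample each feature independently from P(X_i | Y)
    x = (rng.random(3) < p_x_given_y[y]).astype(int)
    return x, y

dataset = [sample_example() for _ in range(5)]
print(dataset)
```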

  16. Generative vs Discriminative models
      • Generative models
        – learn P(x, y)
        – Use the capacity of the model to characterize how the data is generated (both inputs and outputs)
        – E.g.: Naïve Bayes, Hidden Markov Model
      • Discriminative models
        – learn P(y | x)
        – Use the capacity of the model to characterize the decision boundary only
        – E.g.: Logistic Regression, Conditional models (several names)


  20. Generative vs Discriminative models
      • Generative models
        – learn P(x, y)
        – Use the capacity of the model to characterize how the data is generated (both inputs and outputs)
        – E.g.: Naïve Bayes, Hidden Markov Model
      • Discriminative models
        – learn P(y | x)
        – Use model capacity to characterize the decision boundary only
        – E.g.: Logistic Regression, Conditional models (several names), most neural models
      A generative model tries to characterize the distribution of the inputs; a discriminative model doesn't care
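To see the contrast in code, the sketch below fits a generative model (Gaussian naïve Bayes) and a discriminative model (logistic regression) to the same synthetic data. Both expose P(y | x) for prediction, but only the generative model also keeps an explicit model of how the inputs are distributed. The data, model choices, and attribute names are illustrative assumptions (scikit-learn classes, recent versions).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic data: two Gaussian blobs with labels in {+1, -1} (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, size=(300, 2)),
               rng.normal(-1.0, 1.0, size=(300, 2))])
y = np.array([+1] * 300 + [-1] * 300)

# Generative: models P(x | y) with class-conditional Gaussians plus P(y),
# and applies Bayes' rule to obtain P(y | x)
gen = GaussianNB().fit(X, y)

# Discriminative: models P(y | x) / the decision boundary directly
disc = LogisticRegression().fit(X, y)

x_new = np.array([[0.25, -0.1]])
print(gen.predict_proba(x_new), disc.predict_proba(x_new))

# Only the generative model characterizes the input distribution: in recent
# scikit-learn versions, gen.theta_ and gen.var_ hold the per-class feature
# means and variances, which could even be used to sample new inputs.
print(gen.theta_)
```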
