

  1. Linear Models (Machine Learning)

  2. Checkpoint: The bigger picture
     • Supervised learning: instances, concepts, and hypotheses
     • Specific learners
       – Decision trees
     • General ML ideas
       – Features as high-dimensional vectors
       – Overfitting
     [Figure: labeled data goes into a learning algorithm, which outputs a hypothesis/model h; a new example is given to h, which outputs a prediction.]
     Questions?


  6. Lecture outline
     • Linear models
     • What functions do linear classifiers express?

  7. Where are we?
     • Linear models
       – Introduction: Why linear classifiers and regressors?
       – Geometry of linear classifiers
       – A notational simplification
       – Learning linear classifiers: The lay of the land
     • What functions do linear classifiers express?


  9. Is learning possible at all?
     • There are 2^16 = 65536 possible Boolean functions over 4 inputs
       – Why? There are 2^4 = 16 possible input combinations, and a Boolean function assigns an output to each of them. Each way to fill these 16 output slots is a different function, giving 2^16 functions.
     • We have seen only 7 outputs
     • We cannot know what the rest are without seeing them
       – Think of an adversary filling in the labels every time you make a guess at the function
     How could we possibly learn anything?
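     A minimal sketch of this counting argument in Python (not from the slides; the variable names are illustrative):

       from itertools import product

       # All 2^4 = 16 possible inputs over four Boolean variables.
       rows = list(product([0, 1], repeat=4))
       print(len(rows))               # 16 rows in the truth table

       # A Boolean function fills in one output bit per row,
       # so there are 2^16 possible functions.
       print(2 ** len(rows))          # 65536

       # After observing labels for 7 rows, every way of filling the
       # remaining 9 rows is still consistent with the data we have seen.
       print(2 ** (len(rows) - 7))    # 512 consistent functions remain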

  10. Solution: Restrict the search space
      • A hypothesis space is the set of possible functions we consider
        – We were looking at the space of all Boolean functions
        – Instead, choose a hypothesis space that is smaller than the space of all functions:
          • Only simple conjunctions (with four variables, there are only 16 conjunctions without negations)
          • Simple disjunctions
          • m-of-n rules: fix a set of n variables; at least m of them must be true
          • Linear functions!
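      A quick Python check of the count above, assuming a "conjunction without negations" means any subset of the four variables ANDed together, with the empty subset read as the always-true conjunction:

        from itertools import combinations

        variables = ["x1", "x2", "x3", "x4"]

        # Each conjunction without negations picks a subset of the
        # variables to AND together, so there are 2^4 = 16 of them.
        conjunctions = [
            subset
            for r in range(len(variables) + 1)
            for subset in combinations(variables, r)
        ]
        print(len(conjunctions))  # 16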

  15. Which is the better classifier?
      Suppose this is our training set and we have to separate the blue circles from the red triangles.
      [Figure: blue and red points in the plane, separated by a wiggly curve A and by a straight line B.]
      Think about overfitting: which curve runs the risk of overfitting?
      Simplicity versus accuracy

  18. Similar argument for regression
      [Figure: F(x) plotted against x, with training points fit both by a curve A and by a straight line B.]
      Linear regression might make smaller errors on new points.
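      A small numeric illustration of this point (Python with NumPy; the data and polynomial degrees are invented for the sketch): a high-degree polynomial fit to a handful of noisy points typically drives the training error near zero while the test error grows, whereas a straight line stays close on both.

        import numpy as np

        rng = np.random.default_rng(0)

        # Noisy samples from an underlying linear trend F(x) = 2x.
        x_train = np.linspace(0, 1, 8)
        y_train = 2 * x_train + 0.3 * rng.standard_normal(8)
        x_test = np.linspace(0.05, 0.95, 50)
        y_test = 2 * x_test + 0.3 * rng.standard_normal(50)

        for degree in (1, 7):
            coeffs = np.polyfit(x_train, y_train, degree)
            train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
            test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
            print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")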

  19. Recall: Regression vs. classification
      • Linear regression is about predicting real-valued outputs
      • Linear classification is about predicting a discrete class label
        – +1 or -1
        – SPAM or NOT-SPAM
        – Or more than two categories


  23. Linear classifiers: An example
      Suppose we want to determine whether a robot arm is defective or not using two measurements:
      1. The maximum distance the arm can reach, d
      2. The maximum angle it can rotate, a
      Suppose we use a linear decision rule that predicts defective if 2d + 0.01a ≥ 7. We can apply this rule if we have the two measurements.
      For example: for a certain arm, if d = 3 and a = 200, then 2d + 0.01a = 8 ≥ 7, so the arm would be labeled as defective.
      This rule is an example of a linear classifier: features are weighted and added up, and the sum is checked against a threshold.
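      A minimal sketch of this decision rule in Python (the function name and default arguments are illustrative, not from the slides):

        def is_defective(d, a, weights=(2.0, 0.01), threshold=7.0):
            # Weight the two measurements, add them up, and compare
            # the sum against the threshold, as on the slide.
            score = weights[0] * d + weights[1] * a
            return score >= threshold

        # The arm from the example: 2*3 + 0.01*200 = 8 >= 7.
        print(is_defective(3, 200))  # True -> labeled defective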

  24. Linear Classifiers
      Inputs are d-dimensional feature vectors, denoted by x.
      Output is a label y ∈ {−1, +1}.
      Linear threshold units classify an example x using parameters w (a d-dimensional vector) and b (a real number) according to the following classification rule:

        Output = sign(wᵀx + b) = sign(∑ᵢ wᵢxᵢ + b)

        if wᵀx + b ≥ 0, then predict y = +1
        if wᵀx + b < 0, then predict y = −1

      b is called the bias term.

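      A minimal sketch of a linear threshold unit in Python with NumPy (the helper name is illustrative). Note that the robot-arm rule above fits this form with w = (2, 0.01) and b = −7, since 2d + 0.01a ≥ 7 is the same test as wᵀx + b ≥ 0:

        import numpy as np

        def predict(w, b, x):
            # sign(w^T x + b), with a score of exactly 0 mapped to +1,
            # matching the rule on the slide.
            return 1 if np.dot(w, x) + b >= 0 else -1

        # The robot arm again, rewritten in the w^T x + b form.
        print(predict(np.array([2.0, 0.01]), -7.0, np.array([3.0, 200.0])))  # 1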

  33. The geometry of a linear classifier
      An illustration in two dimensions: sgn(b + w1 x1 + w2 x2)
      We only care about the sign, not the magnitude.
      [Figure: + and − points in the (x1, x2) plane, separated by the decision boundary b + w1 x1 + w2 x2 = 0; the weight vector [w1 w2] is perpendicular to the boundary.]
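      A quick numeric check of the "sign, not magnitude" point (Python with NumPy, invented data): scaling w and b by any positive constant changes every score but no prediction, so the classifier, and hence its decision boundary, is unchanged.

        import numpy as np

        rng = np.random.default_rng(1)
        w, b = np.array([1.0, -2.0]), 0.5
        points = rng.standard_normal((5, 2))

        labels = np.sign(points @ w + b)
        # Multiply both w and b by a positive constant: the scores
        # grow tenfold, but their signs (the predictions) do not move.
        scaled_labels = np.sign(points @ (10 * w) + 10 * b)
        print(np.array_equal(labels, scaled_labels))  # True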
