  1. Online Learning (Machine Learning 1). Some slides based on lectures from Dan Roth, Avrim Blum, and others.

  2.–7. Big picture (one diagram, built up incrementally across slides 2–7)
  – Linear models (last lecture): Perceptron, Winnow, Support Vector Machines, …
  – How good is a learning algorithm? PAC, Online learning, Empirical Risk Minimization, …

  8. Mistake bound learning
  • The mistake bound model
  • A proof-of-concept mistake bound algorithm: the Halving algorithm
  • Examples
  • Representations and ease of learning

  9. Coming up…
  • Mistake-driven learning
  • Learning algorithms for learning a linear function over the feature space
    – Perceptron (with many variants)
    – A general gradient descent view
  • Issues to watch out for:
    – The importance of representation
    – The complexity of learning
    – More about features

  11.–14. Motivation
  Consider a learning problem in a very high-dimensional space {y₁, y₂, ⋯, y₁₀₀₀₀₀₀} (one million attributes), and assume that the function space is very sparse: the function of interest depends on only a small number of attributes, e.g.,
      g = y₂ ∧ y₃ ∧ y₄ ∧ y₅ ∧ y₁₀₀
  Running example: "Middle Eastern deserts are known for their sweetness" (the classic context-sensitive spelling decision, deserts vs. desserts, made over a huge but sparse feature space).
  • Can we develop an algorithm that depends only weakly on the dimensionality and mostly on the number of relevant attributes?
  • How should we represent the hypothesis?
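To make the sparsity concrete, here is a minimal Python sketch; the set-based representation and all names are illustrative assumptions, not from the slides. The target consults only five of a million attributes, so we would like a learner whose mistakes scale with those five, not with the full dimensionality.

```python
# Sparse target over n = 1,000,000 Boolean attributes: g looks at only 5 of them.
# Examples are represented as the set of attribute indices that are "on" (true),
# which keeps working with a million-dimensional space cheap.

RELEVANT = {2, 3, 4, 5, 100}  # from the slide: g = y2 ∧ y3 ∧ y4 ∧ y5 ∧ y100

def g(active: set) -> bool:
    """Conjunction target: true iff every relevant attribute is on."""
    return RELEVANT <= active

print(g({2, 3, 4, 5, 100, 77}))  # True: all five relevant attributes are on
print(g({2, 3, 4, 100}))         # False: y5 is off
print(g(set(range(1, 99))))      # False: y100 is off, despite 98 active attributes
```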

  15.–16. An illustration of mistake driven learning
  The learner maintains a current hypothesis hᵢ; for each incoming example x, it emits the prediction hᵢ(x).
  Loop forever:
  1. Receive example x
  2. Make a prediction using the current hypothesis: hᵢ(x)
  3. Receive the true label for x
  4. If hᵢ(x) is not correct, update hᵢ to hᵢ₊₁
  We only need to define how prediction and update behave. Can such a simple scheme work? How do we quantify what "work" means?
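This loop is easy to phrase as code. Below is a minimal Python sketch of the protocol, assuming a hypothetical learner object exposing predict(x) and update(x, y); everything algorithm-specific (Perceptron, Winnow, …) lives in those two methods.

```python
from typing import Iterable, Tuple

def run_mistake_driven(learner, stream: Iterable[Tuple[object, int]]) -> int:
    """Drive the mistake-driven protocol; returns how many mistakes were made.
    `learner.predict` / `learner.update` are an assumed interface."""
    mistakes = 0
    for x, y in stream:               # 1. receive example x
        y_hat = learner.predict(x)    # 2. predict with the current hypothesis h_i
        if y_hat != y:                # 3.-4. receive the true label; on a mistake,
            mistakes += 1
            learner.update(x, y)      #      update h_i to h_{i+1}
    return mistakes
```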

  17.–22. Mistake bound algorithms
  • Setting:
    – Instance space: 𝒳 (dimensionality n)
    – Target f: 𝒳 → {0, 1}, with f ∈ C, the concept class (parameterized by n)
  • Learning protocol:
    – The learner is given x ∈ 𝒳, chosen arbitrarily (perhaps even adversarially)
    – The learner predicts h(x) and is then given f(x) ⟵ the feedback
  • Performance: the learner makes a mistake when h(x) ≠ f(x)
    – M_A(f, S): the number of mistakes algorithm A makes on a sequence S of examples for the target function f
    – M_A(C) = max_{f ∈ C, S} M_A(f, S): the maximum possible number of mistakes made by A for any target function in C and any sequence S of examples
  • Algorithm A is a mistake bound algorithm for the concept class C if M_A(C) is polynomial in the dimensionality n
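As a concrete instance of these definitions, here is a sketch of the classic elimination algorithm for monotone conjunctions (the target class from the motivation slides). It is not introduced on these slides, but it makes M_A(C) tangible: each mistake permanently removes at least one attribute from the hypothesis, so M_A(C) ≤ n, polynomial in the dimensionality.

```python
def learn_monotone_conjunction(n: int, stream):
    """Mistake-bound learner for monotone conjunctions over n Boolean attributes.
    The hypothesis starts as y1 ∧ y2 ∧ ... ∧ yn and only ever shrinks. It always
    contains the target's attributes, so it never errs on negative examples;
    every mistake (a false negative) removes >= 1 attribute, giving <= n mistakes."""
    hyp = set(range(1, n + 1))       # conjunction of all n attributes
    mistakes = 0
    for active, label in stream:     # `active` = indices of attributes that are 1
        pred = 1 if hyp <= active else 0
        if pred != label:            # can only be a false negative
            mistakes += 1
            hyp &= active            # drop attributes that are off in this positive
    return hyp, mistakes
```

For the target g = y₂ ∧ y₃ ∧ y₄ ∧ y₅ ∧ y₁₀₀ from the motivation slides, this bound is at most n mistakes; Winnow, previewed in the big picture, improves on this with a bound that grows with the number of relevant attributes and only logarithmically with n.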

  23. Learnability in the mistake bound model
  • Algorithm A is a mistake bound algorithm for the concept class C if M_A(C) is polynomial in the dimensionality n
    – That is, the maximum number of mistakes it makes on any sequence of inputs (perhaps even an adversarially chosen one) is polynomial in the dimensionality
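The outline's promised proof of concept, the Halving algorithm, fits this definition for any finite concept class. Here is a minimal sketch under the assumption that the class is small enough to enumerate explicitly, with concepts as callables: predicting with the majority of still-consistent concepts guarantees that every mistake at least halves the version space, so the number of mistakes is at most log₂|C|.

```python
def halving(concepts, stream):
    """Halving algorithm over an explicitly enumerated finite concept class.
    Predict by majority vote of the concepts still consistent with the past;
    a mistake means the majority was wrong, so the surviving version space
    is at most half as large: mistakes <= log2(len(concepts))."""
    version_space = list(concepts)
    mistakes = 0
    for x, y in stream:
        votes = sum(c(x) for c in version_space)
        y_hat = 1 if 2 * votes > len(version_space) else 0  # majority (ties -> 0)
        if y_hat != y:
            mistakes += 1
        version_space = [c for c in version_space if c(x) == y]  # keep consistent
    return mistakes
```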
