

1. Learning From Data, Lecture 14: Three Learning Principles (Occam's Razor, Sampling Bias, Data Snooping). M. Magdon-Ismail, CSCI 4100/6100

2. recap: Validation and Cross Validation
Validation: split D (N points) into D_train (N − K points) and D_val (K points); train on D_train to get g⁻, and use D_val to compute the estimate E_val(g⁻).
Cross Validation (leave one out): for n = 1, …, N, train g_n on D_n, the data set with (x_n, y_n) removed; compute the error e_n of g_n on the left-out point (x_n, y_n); take the average of e_1, …, e_N to get E_cv.
Model Selection: use such estimates to choose among models H_1, H_2, …, H_M with learned candidates g_1, g_2, …, g_M.
[Diagrams of the validation and cross-validation pipelines.] (Creator: Malik Magdon-Ismail, 58 slides)
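The leave-one-out procedure recapped above can be sketched in a few lines. This is a minimal illustration; the squared-error measure and the least-squares linear learner are illustrative choices, not part of the slides:

```python
import numpy as np

def loo_cv_error(X, y, fit, predict):
    """Leave-one-out CV: train g_n on D with (x_n, y_n) removed,
    score g_n on the left-out point, and average the N errors e_n."""
    N = len(y)
    errors = []
    for n in range(N):
        mask = np.arange(N) != n                  # D_n = D minus point n
        g_n = fit(X[mask], y[mask])
        e_n = (predict(g_n, X[n:n + 1])[0] - y[n]) ** 2  # squared error
        errors.append(e_n)
    return np.mean(errors)                        # E_cv

# Illustrative learner: least-squares linear regression with a bias term.
def fit(X, y):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict(w, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ w

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 1))
y = 2 * X[:, 0] + 0.1 * rng.standard_normal(20)   # nearly linear target
print(loo_cv_error(X, y, fit, predict))           # small E_cv expected
```

Because each g_n is trained on N − 1 points, E_cv is an almost-unbiased estimate of out-of-sample error, at the cost of N training runs.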

3. We Will Discuss . . .
• Occam's Razor: pick a model carefully
• Sampling Bias: generate the data carefully
• Data Snooping: handle the data carefully

4. Occam's Razor

5. Occam's Razor: use a 'razor' to 'trim down' "an explanation of the data to make it as simple as possible, but no simpler." Attributed to William of Occam (14th century), and often mistakenly to Einstein.

6. Simpler is Better: the simplest model that fits the data is also the most plausible. . . . or, beware of using complex models to fit data.

7. What is Simpler?
simple hypothesis h, complexity Ω(h):
• low order polynomial
• hypothesis with small weights
• easily described hypothesis
simple hypothesis set H, complexity Ω(H):
• H with small d_vc
• small number of hypotheses
• low entropy set
The equivalence: a hypothesis set with simple hypotheses must be small.
We had a glimpse of this: soft order constraint λ (smaller H) ↔ minimize E_aug (favors simpler h).


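The soft-order-constraint ↔ augmented-error correspondence can be made concrete with the closed-form minimizer of E_aug(w) = E_in(w) + (λ/N) wᵀw for squared error. The data below is synthetic and purely illustrative:

```python
import numpy as np

def minimize_eaug(Z, y, lam):
    """Minimize E_aug(w) = (1/N)||Zw - y||^2 + (lam/N) w'w.
    Closed form: w_reg = (Z'Z + lam I)^{-1} Z'y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# Synthetic data (illustrative, not from the slides).
rng = np.random.default_rng(1)
Z = rng.standard_normal((30, 5))
y = Z @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(30)

# Larger lam <-> tighter soft order constraint (smaller effective H)
#            <-> smaller weights (simpler h): ||w_reg|| shrinks as lam grows.
for lam in (0.0, 1.0, 100.0):
    print(lam, np.linalg.norm(minimize_eaug(Z, y, lam)))
```

The printed weight norms decrease as λ grows, which is exactly the "smaller H favors simpler h" equivalence on the slide.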

10. Why is Simpler Better
Mathematically: simplicity curtails the ability to fit noise, the VC dimension is small, and so on. More intuitively: simpler is better because you will be more "surprised" when you fit the data. If something unlikely happens, it is very significant when it happens.
Detective Gregory: "Is there any other point to which you would wish to draw my attention?"
Sherlock Holmes: "To the curious incident of the dog in the night-time."
Detective Gregory: "The dog did nothing in the night-time."
Sherlock Holmes: "That was the curious incident."
– Silver Blaze, Sir Arthur Conan Doyle

11. A Scientific Experiment
Axiom. If an experiment has no chance of falsifying a hypothesis, then the result of that experiment provides no evidence one way or the other for the hypothesis.
[Plots: resistivity ρ versus temperature T for Scientists 1, 2 and 3, labelled "no evidence", "very convincing", and "some evidence?".]
Who provides most evidence for the hypothesis "ρ is linear in T"?

15. Scientist 2 Versus Scientist 3: who provides most evidence?

16. Scientist 1 Versus Scientist 3: who provides most evidence?

17. Axiom of Non-Falsifiability: if an experiment has no chance of falsifying a hypothesis, then its result provides no evidence one way or the other for the hypothesis.

18. Falsification and m_H(N)
If H shatters x_1, …, x_N:
– Don't be surprised if you fit the data.
– You can't falsify "H is a good set of candidate hypotheses for f".
If H does not shatter x_1, …, x_N, and the target values are uniformly distributed (each of the 2^N labelings equally likely), then, since H can produce at most m_H(N) of those labelings,
P[falsification] ≥ 1 − m_H(N) / 2^N.
A good fit is surprising with simple H, hence significant: you could have falsified "H is a good set of candidate hypotheses for f", but did not. The data must have a chance to win.


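The falsification bound can be checked by direct enumeration for a simple hypothesis set. The choice of positive rays h(x) = sign(x − a), which realize exactly m_H(N) = N + 1 of the 2^N labelings of N sorted points, is an assumption for illustration, not from the slides:

```python
from itertools import product

def is_positive_ray(labels):
    # On sorted points, a positive ray h(x) = sign(x - a) produces only
    # labelings of the form -1, ..., -1, +1, ..., +1.
    seen_pos = False
    for y in labels:
        if y == +1:
            seen_pos = True
        elif seen_pos:            # a -1 after a +1: not realizable
            return False
    return True

def falsification_prob(N):
    # Fraction of the 2^N equally likely labelings that no positive
    # ray can fit, i.e. P[falsification] for random targets.
    realizable = sum(is_positive_ray(labels)
                     for labels in product((-1, +1), repeat=N))
    return 1 - realizable / 2 ** N

N = 10
print(falsification_prob(N))       # equals 1 - (N+1)/2^N for this H
print(1 - (N + 1) / 2 ** N)        # bound with m_H(N) = N + 1
```

For positive rays the bound holds with equality, since the growth function counts exactly the realizable labelings; for large N the falsification probability is overwhelmingly close to 1, so a good fit by this simple H is highly significant.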
