

SLIDE 1

Machine Learning

Computational Learning Theory: Occam's Razor

Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others

SLIDE 2

This lecture: Computational Learning Theory

  • The Theory of Generalization
  • Probably Approximately Correct (PAC) learning
  • Positive and negative learnability results
  • Agnostic Learning
  • Shattering and the VC dimension


SLIDE 3

Where are we?

  • The Theory of Generalization
  • Probably Approximately Correct (PAC) learning
  • Positive and negative learnability results
  • Agnostic Learning
  • Shattering and the VC dimension


SLIDE 4

This section

  • 1. Define the PAC model of learning
  • 2. Make formal connections to the principle of Occam’s razor


SLIDE 5

This section

  ✓ Define the PAC model of learning

  • 2. Make formal connections to the principle of Occam's razor

SLIDE 6

Occam's Razor

Named after William of Occam (AD 1300s)

Prefer simpler explanations over more complex ones

"Numquam ponenda est pluralitas sine necessitate"
(Never posit plurality without necessity.)

Historically, a widely prevalent idea across different schools of philosophy

SLIDES 7-8

Why would a consistent learner fail?

Consistent learner: Suppose we have a learner that produces a hypothesis that is consistent with a training set…
…but the training set is not a representative sample of the instance space.
Then the hypothesis we learned could be bad even if it is consistent with the entire training set.

We can try to:
  1. quantify the probability of such a bad situation occurring, and
  2. then ask what it will take for this probability to be low.

SLIDES 9-15

Towards formalizing Occam's Razor

Claim: The probability that there is a hypothesis h ∈ H that:
  1. is consistent with m examples, and
  2. has err_D(h) > ε
is less than |H|(1 − ε)^m.

(Such an h is the bad case we worry about: consistent with the training set, yet with true error greater than ε.)

Proof: Let h be such a bad hypothesis, i.e., one with error greater than ε.
  • The probability that h is consistent with one example is Pr_{x ~ D}[f(x) = h(x)] < 1 − ε.
  • The training set consists of m examples drawn independently, so the probability that h is consistent with all m examples is less than (1 − ε)^m.
  • The probability that some bad hypothesis in H is consistent with m examples is therefore less than |H|(1 − ε)^m, by the union bound.

Union bound: for a set of events, the probability that at least one of them happens is at most the sum of the probabilities of the individual events.
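To make the claim concrete, here is a small Monte Carlo sketch in Python. Everything in it is an illustrative assumption rather than something from the slides: a toy instance space of 4 boolean features under a uniform distribution D, the monotone conjunctions over those features as H, one particular target concept f, and the chosen ε, m, and trial count. It estimates the probability of the "consistent yet bad" event and compares it with the bound |H|(1 − ε)^m.

```python
# Monte Carlo check of the claim: Pr[some h in H is consistent with m examples
# yet has err_D(h) > eps] should stay below |H| * (1 - eps)^m.
import itertools
import random

d = 4                                             # number of boolean features (toy choice)
X = list(itertools.product([0, 1], repeat=d))     # instance space; D is uniform over X

# Hypothesis space H: all monotone conjunctions, each represented by the set of
# feature indices it requires to be 1.
H = [S for r in range(d + 1) for S in itertools.combinations(range(d), r)]

def h(S, x):
    return int(all(x[i] == 1 for i in S))

target = (0, 2)                                   # assumed target concept f = x0 AND x2
def f(x):
    return h(target, x)

def true_error(S):                                # err_D under the uniform distribution
    return sum(h(S, x) != f(x) for x in X) / len(X)

eps, m, trials = 0.2, 20, 5000
bad_event = 0
for _ in range(trials):
    sample = [random.choice(X) for _ in range(m)]           # m i.i.d. draws from D
    # Does some hypothesis with error > eps survive as consistent with the sample?
    if any(true_error(S) > eps and all(h(S, x) == f(x) for x in sample) for S in H):
        bad_event += 1

print("empirical Pr[consistent yet bad] ~", bad_event / trials)
print("bound |H| * (1 - eps)^m         =", len(H) * (1 - eps) ** m)
```

For these values the bound is about 0.18, and the empirical frequency typically comes out far below it, which is expected since the union bound is loose.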

SLIDES 16-26

Occam's Razor

The probability that there is a hypothesis h ∈ H that:
  1. is consistent with m examples, and
  2. has err_D(h) > ε
is less than |H|(1 − ε)^m.

This situation is a bad one. Let us try to see what we need to do to ensure that it is rare.

We want to make this probability small, say smaller than δ:

  |H|(1 − ε)^m < δ

If δ is small, then the probability that there is a consistent yet bad hypothesis is also small (because of this inequality).

Taking logs:

  ln|H| + m ln(1 − ε) < ln δ

We know that e^(−x) = 1 − x + x²/2! − x³/3! + … ≥ 1 − x, so ln(1 − ε) ≤ −ε. Substituting gives a safer (more demanding) condition:

  ln|H| − mε < ln δ

That is, if

  m > (1/ε)(ln|H| + ln(1/δ))

is true, then ln|H| − mε < ln δ holds, and the bad situation (a consistent yet bad hypothesis) is improbable: its probability is below δ.
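A quick numeric sanity check of the approximation step, with assumed values: since (1 − ε)^m ≤ e^(−εm), any m that makes |H|·e^(−εm) fall below δ also makes the original quantity |H|(1 − ε)^m fall below δ.

```python
# Sanity check of the step (1 - eps)^m <= e^(-eps * m), using assumed values.
import math

H_size, eps, delta = 1000, 0.1, 0.01
m = math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)   # the derived bound: m = 116

exact   = H_size * (1 - eps) ** m        # the probability bound we actually want below delta
relaxed = H_size * math.exp(-eps * m)    # the easier-to-solve upper bound on it

print(m, exact, relaxed, delta)          # 116, ~0.0049, ~0.0091, 0.01  (exact <= relaxed <= delta)
```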

SLIDES 27-33

Occam's Razor

Let H be any hypothesis space. With probability 1 − δ, a hypothesis h ∈ H that is consistent with a training set of size m will have an error < ε on future examples if

  m > (1/ε)(ln|H| + ln(1/δ))

This is called Occam's Razor because it expresses a preference towards smaller hypothesis spaces: it shows when a hypothesis that is consistent with m examples generalizes well (i.e., has error < ε). Complicated/larger hypothesis spaces are not necessarily bad, but simpler ones are unlikely to fool us by being consistent with many examples!

Reading the bound:
  1. Expecting lower error increases the sample complexity (i.e., more examples are needed for the guarantee).
  2. A larger hypothesis space makes learning harder (i.e., higher sample complexity).
  3. Wanting higher confidence in the classifier we produce also increases the sample complexity.
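The three observations can be read straight off the bound numerically. The sketch below uses illustrative values only; |H| = 3^n is the usual count for conjunctions over n boolean variables (each variable positive, negated, or absent).

```python
# How the sample-size requirement m > (1/eps) * (ln|H| + ln(1/delta)) reacts to each knob.
import math

def m_bound(H_size, eps, delta):
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

print(m_bound(3 ** 10, eps=0.10, delta=0.05))    # 140   baseline
print(m_bound(3 ** 10, eps=0.01, delta=0.05))    # 1399  (1) lower error -> more examples
print(m_bound(3 ** 20, eps=0.10, delta=0.05))    # 250   (2) larger H -> more examples
print(m_bound(3 ** 10, eps=0.10, delta=0.001))   # 179   (3) higher confidence -> more examples
```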

SLIDE 34

Consistent Learners and Occam's Razor

From the definition, we get the following general scheme for PAC learning, given a set of m training examples:

  • Find some h ∈ H that is consistent with all m examples.
    – If m is large enough, a consistent hypothesis must be close enough to f.
    – Check that m does not have to be too large (i.e., polynomial in the relevant parameters): we showed that the "closeness" guarantee requires
        m > (1/ε)(ln|H| + ln(1/δ))
  • Show that the consistent hypothesis h ∈ H can be computed efficiently.
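As a concrete illustration of this two-step scheme, here is a minimal Python sketch for a finite hypothesis space given as a list of callables; the helper names and the numbers in the example are assumptions for illustration. Note that the brute-force search in find_consistent is only practical for small H; the last bullet above asks for something better than plain enumeration.

```python
# Minimal sketch of the consistent-learner scheme over a finite hypothesis space.
import math

def find_consistent(H, data):
    """Step 1: return some h in H that agrees with every (x, y) pair in data, or None."""
    for h in H:
        if all(h(x) == y for x, y in data):
            return h
    return None

def occam_error_guarantee(H_size, m, delta):
    """Step 2: with probability at least 1 - delta, a consistent h has error at most this value."""
    return (math.log(H_size) + math.log(1 / delta)) / m

# e.g., with |H| = 3^10 and m = 1000 consistent training examples, delta = 0.05:
print(occam_error_guarantee(3 ** 10, m=1000, delta=0.05))   # ~0.014
```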

SLIDE 35

Exercises

  • 1. We have seen the decision tree learning algorithm. Suppose our problem has n binary features. What is the size of the hypothesis space?
  • 2. Are decision trees efficiently PAC learnable?
  • 3. Are conjunctions PAC learnable? Can you think of a PAC algorithm for monotone conjunctions?