
Computational Learning Theory: Probably Approximately Correct (PAC) Learning



  1. Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning 1 Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others

  2. Computational Learning Theory
     • The Theory of Generalization
     • Probably Approximately Correct (PAC) learning
     • Positive and negative learnability results
     • Agnostic Learning
     • Shattering and the VC dimension

  3. Where are we?
     • The Theory of Generalization
     • Probably Approximately Correct (PAC) learning
     • Positive and negative learnability results
     • Agnostic Learning
     • Shattering and the VC dimension

  4. This section
     1. Define the PAC model of learning
     2. Make formal connections to the principle of Occam’s razor


  6. Recall: The setup
     • Instance Space: 𝑋, the set of examples
     • Concept Space: 𝐶, the set of possible target functions; 𝑓 ∈ 𝐶 is the hidden target function
       – E.g.: all 𝑛-conjunctions; all 𝑛-dimensional linear functions, …
     • Hypothesis Space: 𝐻, the set of possible hypotheses
       – This is the set that the learning algorithm explores
     • Training instances: 𝑆 × {−1, 1}, positive and negative examples of the target concept (𝑆 is a finite subset of 𝑋)
       – Training instances are generated by a fixed unknown probability distribution 𝐷 over 𝑋
     • What we want: a hypothesis ℎ ∈ 𝐻 such that ℎ(𝑥) = 𝑓(𝑥)
       – Evaluate ℎ on subsequent examples 𝑥 ∈ 𝑋 drawn according to 𝐷
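To make this setup concrete, here is a minimal Python sketch (an illustration, not from the slides): it takes 𝑋 = {0,1}^n as the instance space, monotone conjunctions as the concept class, one hidden conjunction as 𝑓, a fixed product distribution as 𝐷, and a labeled sample 𝑆 drawn from it. The names, the choice of concept class, and the distribution are all assumptions made for illustration.

    import random

    random.seed(0)
    n = 10                                    # instance length

    # Hidden target concept f: one monotone conjunction over the n bits
    # (an illustrative choice of concept class), conjoining bits 1, 4 and 7.
    relevant_bits = {1, 4, 7}
    def f(x):
        return +1 if all(x[i] == 1 for i in relevant_bits) else -1

    # Fixed but unknown distribution D over X = {0,1}^n: each bit is 1 with
    # probability 0.7, independently. The learner never gets to see this.
    def draw_x():
        return tuple(1 if random.random() < 0.7 else 0 for _ in range(n))

    # Training set S: m examples drawn i.i.d. from D and labeled by f.
    m = 200
    S = [(x, f(x)) for x in (draw_x() for _ in range(m))]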

  7. Formulating the theory of prediction
     All the notation we have so far on one slide. In the general case, we have
     – 𝑋: instance space; 𝑌: output space = {+1, −1}
     – 𝐷: an unknown distribution over 𝑋
     – 𝑓: an unknown target function 𝑋 → 𝑌, taken from a concept class 𝐶
     – ℎ: a hypothesis function 𝑋 → 𝑌 that the learning algorithm selects from a hypothesis class 𝐻
     – 𝑆: a set of 𝑚 training examples drawn from 𝐷, labeled with 𝑓
     – err_𝐷(ℎ): the true error of any hypothesis ℎ
     – err_𝑆(ℎ): the empirical error (training error, observed error) of ℎ
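Continuing the illustrative sketch above (again with hypothetical names): err_𝑆(ℎ) is the fraction of training examples that ℎ gets wrong, while err_𝐷(ℎ) = Pr_𝐷[𝑓(𝑥) ≠ ℎ(𝑥)] is defined over the whole distribution and, since 𝐷 is unknown, can only be approximated by drawing fresh examples.

    # A deliberately imperfect hypothesis h from H: a conjunction that checks
    # only two of the three bits that the target f actually depends on.
    def h(x):
        return +1 if x[1] == 1 and x[4] == 1 else -1

    # Empirical (training / observed) error err_S(h): fraction of mistakes on S.
    def empirical_error(h, S):
        return sum(1 for x, y in S if h(x) != y) / len(S)

    # Monte Carlo approximation of the true error err_D(h) = Pr_D[f(x) != h(x)].
    def true_error_estimate(h, f, draw_x, trials=100_000):
        mistakes = 0
        for _ in range(trials):
            x = draw_x()
            if h(x) != f(x):
                mistakes += 1
        return mistakes / trials

    print(empirical_error(h, S))              # err_S(h) on the sample above
    print(true_error_estimate(h, f, draw_x))  # close to 0.7 * 0.7 * 0.3 = 0.147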

  8. Theoretical questions
     • Can we describe or bound the true error (err_𝐷) given the empirical error (err_𝑆)?
     • Is a concept class 𝐶 learnable?
     • Is it possible to learn 𝐶 using only the functions in 𝐻, under the supervised learning protocol?
     • How many examples does an algorithm need to guarantee good performance?

  9. Expectations of learning
     • We cannot expect a learner to learn a concept exactly
       – There will generally be multiple concepts consistent with the available data (which represent a small fraction of the available instance space)
       – Unseen examples could potentially have any label
       – Let’s “agree” to misclassify uncommon examples that do not show up in the training set
     • We cannot always expect to learn a close approximation to the target concept
       – Sometimes (hopefully only rarely) the training set will not be representative (will contain uncommon examples)


  11. Expectations of learning
     • We cannot expect a learner to learn a concept exactly
       – There will generally be multiple concepts consistent with the available data (which represent a small fraction of the available instance space)
       – Unseen examples could potentially have any label
       – Let’s “agree” to misclassify uncommon examples that do not show up in the training set
     • We cannot always expect to learn a close approximation to the target concept
       – Sometimes (hopefully only rarely) the training set will not be representative (will contain uncommon examples)
     The only realistic expectation of a good learner is that with high probability it will learn a close approximation to the target concept
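The observation that there will generally be multiple concepts consistent with the available data can be seen directly in the toy setup sketched earlier; the following illustrative snippet counts how many monotone conjunctions agree with the hidden target on a small prefix of the sample 𝑆.

    from itertools import combinations

    # Count the monotone conjunctions that label a small prefix of S exactly
    # as the hidden target f did: the data alone does not pin f down.
    S_small = S[:5]

    def conjunction(bits):
        return lambda x: +1 if all(x[i] == 1 for i in bits) else -1

    consistent = sum(
        1
        for k in range(n + 1)
        for bits in combinations(range(n), k)
        if all(conjunction(bits)(x) == y for x, y in S_small)
    )
    print(consistent)  # with only a handful of examples, typically far more than 1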

  12. Probably approximately correctness
     The only realistic expectation of a good learner is that with high probability it will learn a close approximation to the target concept
     • In Probably Approximately Correct (PAC) learning, one requires that
       – given small parameters 𝜖 and 𝛿,
       – with probability at least 1 − 𝛿, a learner produces a hypothesis with error at most 𝜖
     • The only reason we can hope for this is the consistent distribution assumption
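Written out in the notation of slide 7 (the display itself does not appear on the slide), the PAC requirement is that, over the random draw of a training set 𝑆 of size 𝑚, the hypothesis ℎ_𝑆 produced by the learner satisfies

    \Pr_{S \sim D^m}\big[\, \mathrm{err}_D(h_S) \le \epsilon \,\big] \;\ge\; 1 - \delta

“Approximately correct” is the inner condition err_𝐷(ℎ_𝑆) ≤ 𝜖; “probably” is the outer guarantee that it holds with probability at least 1 − 𝛿.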


  15. PAC Learnability
     Consider a concept class 𝐶 defined over an instance space 𝑋 (containing instances of length 𝑛), and a learner 𝐿 using a hypothesis space 𝐻.
     The concept class 𝐶 is PAC learnable by 𝐿 using 𝐻 if for all 𝑓 ∈ 𝐶, for all distributions 𝐷 over 𝑋, and fixed 0 < 𝜖, 𝛿 < 1, given 𝑚 examples sampled independently according to 𝐷, with probability at least (1 − 𝛿) the algorithm 𝐿 produces a hypothesis ℎ ∈ 𝐻 that has error at most 𝜖, where 𝑚 is polynomial in 1/𝜖, 1/𝛿, 𝑛, and size(𝐻).
     (Recall that err_𝐷(ℎ) = Pr_𝐷[𝑓(𝑥) ≠ ℎ(𝑥)].)
     The concept class 𝐶 is efficiently learnable if 𝐿 can produce the hypothesis in time that is polynomial in 1/𝜖, 1/𝛿, 𝑛, and size(𝐻).
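As an illustration of what “𝑚 polynomial in 1/𝜖, 1/𝛿, 𝑛 and size(𝐻)” can look like, and as a preview of the Occam’s razor connection announced on slide 4 (the bound is quoted here, not derived on these slides): for a finite hypothesis class 𝐻 and a learner that outputs any hypothesis consistent with the training set, a sample of size

    m \;\ge\; \frac{1}{\epsilon}\left( \ln|H| + \ln\frac{1}{\delta} \right)

suffices for the PAC guarantee.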

