  1. Foundations of Machine Learning: Learning with Finite Hypothesis Sets

  2. Motivation
Some computational learning questions:
• What can be learned efficiently?
• What is inherently hard to learn?
• A general model of learning?
Complexity:
• Computational complexity: time and space.
• Sample complexity: amount of training data needed to learn successfully.
• Mistake bounds: number of mistakes before learning successfully.

  3. This lecture
• PAC Model
• Sample complexity, finite H, consistent case
• Sample complexity, finite H, inconsistent case

  4. Definitions and Notation
• $X$: set of all possible instances or examples, e.g., the set of all men and women characterized by their height and weight.
• $c : X \to \{0, 1\}$: the target concept to learn; it can be identified with its support $\{x \in X : c(x) = 1\}$.
• $C$: concept class, a set of target concepts $c$.
• $D$: target distribution, a fixed probability distribution over $X$. Training and test examples are drawn according to $D$.

  5. Definitions and Notation
• $S$: training sample.
• $H$: set of concept hypotheses, e.g., the set of all linear classifiers.
• The learning algorithm receives the sample $S$ and selects a hypothesis $h_S$ from $H$ approximating $c$.

  6. Errors
• True error or generalization error of $h$ with respect to the target concept $c$ and distribution $D$:
  $R(h) = \Pr_{x \sim D}[h(x) \neq c(x)] = \mathbb{E}_{x \sim D}[1_{h(x) \neq c(x)}]$.
• Empirical error: average error of $h$ on the training sample $S$ drawn according to distribution $D$,
  $\widehat{R}_S(h) = \Pr_{x \sim \widehat{D}}[h(x) \neq c(x)] = \mathbb{E}_{x \sim \widehat{D}}[1_{h(x) \neq c(x)}] = \frac{1}{m} \sum_{i=1}^{m} 1_{h(x_i) \neq c(x_i)}$.
• Note: $\mathbb{E}_{S \sim D^m}[\widehat{R}_S(h)] = R(h)$.
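
To make the two error definitions concrete, here is a minimal Python sketch (not part of the original slides). The instance space, distribution, target concept, and hypothesis are illustrative assumptions: $X = [0,1]$, $D$ uniform, threshold functions for $c$ and $h$. It computes the empirical error on a sample $S$ and estimates the generalization error by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: X = [0, 1], D = uniform,
# target concept c(x) = 1{x >= 0.5}, hypothesis h(x) = 1{x >= 0.6}.
c = lambda x: (x >= 0.5).astype(int)
h = lambda x: (x >= 0.6).astype(int)

m = 100
S = rng.uniform(0, 1, size=m)              # training sample S ~ D^m

# Empirical error: average disagreement of h with c on S.
emp_error = np.mean(h(S) != c(S))

# Generalization error R(h) = Pr_{x~D}[h(x) != c(x)],
# estimated with a large independent Monte Carlo sample (true value here: 0.1).
test = rng.uniform(0, 1, size=1_000_000)
gen_error = np.mean(h(test) != c(test))

print(f"empirical error: {emp_error:.3f}, generalization error: {gen_error:.3f}")
```

Averaging the empirical error over many independent samples $S$ would recover the note above, $\mathbb{E}_{S \sim D^m}[\widehat{R}_S(h)] = R(h)$.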

  7. PAC Model (Valiant, 1984)
PAC learning: Probably Approximately Correct learning.
Definition: a concept class $C$ is PAC-learnable if there exists a learning algorithm $L$ such that:
• for all $c \in C$, $\epsilon > 0$, $\delta > 0$, and all distributions $D$,
  $\Pr_{S \sim D^m}[R(h_S) \leq \epsilon] \geq 1 - \delta$,
• for samples $S$ of size $m = \mathrm{poly}(1/\epsilon, 1/\delta)$ for a fixed polynomial.

  8. Remarks
• The concept class $C$ is known to the algorithm.
• Distribution-free model: no assumption on $D$.
• Both training and test examples are drawn $\sim D$.
• Probably: confidence $1 - \delta$.
• Approximately correct: accuracy $1 - \epsilon$.
• Efficient PAC-learning: $L$ runs in time $\mathrm{poly}(1/\epsilon, 1/\delta)$.
• What about the cost of the representation of $c \in C$?

  9. PAC Model - New Definition
Computational representation:
• cost for $x \in X$ in $O(n)$.
• cost for $c \in C$ in $O(\mathrm{size}(c))$.
Extension: running time $O(\mathrm{poly}(1/\epsilon, 1/\delta)) \to O(\mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c)))$.

  10. Example - Rectangle Learning
Problem: learn an unknown axis-aligned rectangle $R$ using as small a labeled sample as possible.
Hypothesis: rectangle $R'$. In general, there may be false positive and false negative points.
[Figure: target rectangle $R$ and hypothesis rectangle $R'$.]

  11. Example - Rectangle Learning
Simple method: choose the tightest consistent rectangle $R'$ for a large enough sample. How large a sample? Is this class PAC-learnable?
What is the probability that $R(R') > \epsilon$?
[Figure: target rectangle $R$ and tightest consistent rectangle $R'$.]

  12. Example - Rectangle Learning
Fix $\epsilon > 0$ and assume $\Pr_D[R] > \epsilon$ (otherwise the result is trivial).
Let $r_1, r_2, r_3, r_4$ be the four smallest rectangles along the sides of $R$ such that $\Pr_D[r_i] \geq \frac{\epsilon}{4}$. Writing $R = [l, r] \times [b, t]$, e.g., $r_4 = [l, s_4] \times [b, t]$ with
$s_4 = \inf\{s : \Pr_D\big[[l, s] \times [b, t]\big] \geq \tfrac{\epsilon}{4}\}$, so that $\Pr_D\big[[l, s_4[ \times [b, t]\big] < \tfrac{\epsilon}{4}$.
[Figure: $R$, the hypothesis $R'$, and the four boundary strips $r_1, r_2, r_3, r_4$.]

  13. Example - Rectangle Learning
Errors can only occur in $R - R'$. Thus (geometry), $R(R') > \epsilon \Rightarrow R'$ misses at least one region $r_i$. Therefore,
$\Pr[R(R') > \epsilon] \leq \Pr\big[\cup_{i=1}^{4} \{R' \text{ misses } r_i\}\big] \leq \sum_{i=1}^{4} \Pr[\{R' \text{ misses } r_i\}] \leq 4\big(1 - \tfrac{\epsilon}{4}\big)^m \leq 4 e^{-m\epsilon/4}$.
[Figure: $R$, $R'$, and the strips $r_1, r_2, r_3, r_4$.]

  14. Example - Rectangle Learning
Set $\delta > 0$ to match the upper bound:
$4 e^{-m\epsilon/4} \leq \delta \Leftrightarrow m \geq \tfrac{4}{\epsilon} \log \tfrac{4}{\delta}$.
Then, for $m \geq \tfrac{4}{\epsilon} \log \tfrac{4}{\delta}$, with probability at least $1 - \delta$, $R(R') \leq \epsilon$.
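
The analysis above can be checked empirically. The following Python sketch (an illustration added here, not from the slides) implements the tightest-consistent-rectangle learner and estimates the failure probability $\Pr[R(R') > \epsilon]$; the uniform distribution on $[0,1]^2$, the particular target rectangle, and the number of runs are assumptions made for the simulation. With $m \geq \tfrac{4}{\epsilon} \log \tfrac{4}{\delta}$ points, the observed failure rate should stay below $\delta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: D = uniform on [0,1]^2, target R = [0.2, 0.7] x [0.3, 0.9].
L, Rr, B, T = 0.2, 0.7, 0.3, 0.9
in_R = lambda X: ((X[:, 0] >= L) & (X[:, 0] <= Rr) &
                  (X[:, 1] >= B) & (X[:, 1] <= T))

eps, delta = 0.1, 0.05
m = int(np.ceil(4 / eps * np.log(4 / delta)))   # sample size from the bound (= 176)

def true_error(rect):
    """R(R'): mass of R minus R' (the tightest R' is always contained in R)."""
    if rect is None:                             # no positive point seen: R' is empty
        return (Rr - L) * (T - B)
    l, r, b, t = rect
    return (Rr - L) * (T - B) - (r - l) * (t - b)

failures, runs = 0, 2000
for _ in range(runs):
    X = rng.uniform(0, 1, size=(m, 2))
    pos = X[in_R(X)]
    # Tightest consistent rectangle: bounding box of the positive points.
    rect = None if len(pos) == 0 else (pos[:, 0].min(), pos[:, 0].max(),
                                       pos[:, 1].min(), pos[:, 1].max())
    failures += true_error(rect) > eps

print(f"m = {m}, empirical Pr[R(R') > eps] = {failures / runs:.3f}  (bound: delta = {delta})")
```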

  15. Notes
• Infinite hypothesis set, but simple proof.
• Does this proof readily apply to other similar concept classes?
• Geometric properties are key in this proof and in general non-trivial to extend to other classes, e.g., non-concentric circles (see HW2, 2006).
• Need for a more general proof and more general results.

  16. This lecture
• PAC Model
• Sample complexity, finite H, consistent case
• Sample complexity, finite H, inconsistent case

  17. Learning Bound for Finite H - Consistent Case
Theorem: let $H$ be a finite set of functions from $X$ to $\{0, 1\}$ and $L$ an algorithm that, for any target concept $c \in H$ and sample $S$, returns a consistent hypothesis $h_S$: $\widehat{R}_S(h_S) = 0$. Then, for any $\delta > 0$, with probability at least $1 - \delta$,
$R(h_S) \leq \frac{1}{m}\left(\log|H| + \log\frac{1}{\delta}\right)$.

  18. Learning Bound for Finite H - Consistent Case
Proof: for any $\epsilon > 0$, define $H_\epsilon = \{h \in H : R(h) > \epsilon\}$. Then,
$\Pr\big[\exists h \in H_\epsilon : \widehat{R}_S(h) = 0\big]
= \Pr\big[\widehat{R}_S(h_1) = 0 \vee \cdots \vee \widehat{R}_S(h_{|H_\epsilon|}) = 0\big]
\leq \sum_{h \in H_\epsilon} \Pr\big[\widehat{R}_S(h) = 0\big] \quad (\text{union bound})
\leq \sum_{h \in H_\epsilon} (1 - \epsilon)^m \leq |H|(1 - \epsilon)^m \leq |H| e^{-m\epsilon}.$
Setting the right-hand side equal to $\delta$ and solving for $\epsilon$ gives the stated bound.
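
A small helper (illustrative, not from the slides) that turns the theorem around in both directions: given $|H|$, $m$, $\delta$ it evaluates the error bound, and given $|H|$, $\epsilon$, $\delta$ it returns the sample size implied by setting $|H| e^{-m\epsilon} \leq \delta$. The numbers in the example are arbitrary.

```python
import math

def consistent_error_bound(H_size: int, m: int, delta: float) -> float:
    """Bound on R(h_S) for a consistent h_S: (log|H| + log(1/delta)) / m."""
    return (math.log(H_size) + math.log(1 / delta)) / m

def consistent_sample_complexity(H_size: int, eps: float, delta: float) -> int:
    """Smallest m with |H| * exp(-m * eps) <= delta."""
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

# Example: |H| = 2**20, delta = 0.01.
print(consistent_error_bound(2**20, 10_000, 0.01))      # ~ 0.0018 with m = 10000
print(consistent_sample_complexity(2**20, 0.01, 0.01))  # m >= 1847 for eps = 0.01
```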

  19. Remarks
• The algorithm can be ERM if the problem is realizable.
• The error bound is linear in $1/m$ and only logarithmic in $1/\delta$.
• $\log_2|H|$ is the number of bits used for the representation of $H$.
• The bound is loose for large $|H|$.
• Uninformative for infinite $|H|$.

  20. Conjunctions of Boolean Literals
Example for $n = 6$.
Algorithm: start with $x_1 \wedge \bar{x}_1 \wedge \cdots \wedge x_n \wedge \bar{x}_n$ and rule out literals incompatible with positive examples.

  0 1 1 0 1 1   +
  0 1 1 1 1 1   +
  0 0 1 1 0 1   -
  0 1 1 1 1 1   +
  1 0 0 1 1 0   -
  0 1 0 0 1 1   +

Result: 0 1 ? ? 1 1, i.e., $\bar{x}_1 \wedge x_2 \wedge x_5 \wedge x_6$.
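
A direct Python sketch of this elimination algorithm (added here for illustration), run on the $n = 6$ dataset transcribed from the slide: start with all $2n$ literals and drop every literal falsified by a positive example; negative examples are ignored.

```python
def learn_conjunction(examples):
    """examples: list of (bits, label), bits a tuple of 0/1, label '+' or '-'.
    Returns the surviving literals as (index, value) pairs, where value 1
    stands for x_i and value 0 for its negation."""
    n = len(examples[0][0])
    literals = {(i, v) for i in range(n) for v in (0, 1)}   # x_1, !x_1, ..., x_n, !x_n
    for bits, label in examples:
        if label == '+':
            # A positive example rules out every literal it contradicts.
            literals -= {(i, 1 - bits[i]) for i in range(n)}
    return literals

data = [((0, 1, 1, 0, 1, 1), '+'),
        ((0, 1, 1, 1, 1, 1), '+'),
        ((0, 0, 1, 1, 0, 1), '-'),
        ((0, 1, 1, 1, 1, 1), '+'),
        ((1, 0, 0, 1, 1, 0), '-'),
        ((0, 1, 0, 0, 1, 1), '+')]

print(sorted(learn_conjunction(data)))
# [(0, 0), (1, 1), (4, 1), (5, 1)]: !x_1 AND x_2 AND x_5 AND x_6, as on the slide.
```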

  21. Conjunctions of Boolean Literals
Problem: learning the class $C_n$ of conjunctions of boolean literals with at most $n$ variables (e.g., for $n = 3$, $x_1 \wedge \bar{x}_2 \wedge x_3$).
Algorithm: choose $h$ consistent with $S$.
• Since $|H| = |C_n| = 3^n$, sample complexity: $m \geq \frac{1}{\epsilon}\left((\log 3)\, n + \log\frac{1}{\delta}\right)$. For $\delta = .02$, $\epsilon = .1$, $n = 10$: $m \geq 149$.
• Computational complexity: polynomial, since the algorithmic cost per training example is in $O(n)$.
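
The figure quoted on the slide can be reproduced directly; a quick check of the sample-complexity formula with the slide's numbers:

```python
import math

eps, delta, n = 0.1, 0.02, 10
m = math.ceil((n * math.log(3) + math.log(1 / delta)) / eps)
print(m)   # 149
```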

  22. This lecture
• PAC Model
• Sample complexity, finite H, consistent case
• Sample complexity, finite H, inconsistent case

  23. Inconsistent Case
• No $h \in H$ is a consistent hypothesis.
• The typical case in practice: difficult problems, complex concept class.
• But inconsistent hypotheses with a small number of errors on the training set can be useful.
• Need a more powerful tool: Hoeffding's inequality.

  24. Hoeffding's Inequality
Corollary: for any $\epsilon > 0$ and any hypothesis $h : X \to \{0, 1\}$, the following inequalities hold:
$\Pr\big[\widehat{R}(h) - R(h) \geq \epsilon\big] \leq e^{-2m\epsilon^2}$
$\Pr\big[R(h) - \widehat{R}(h) \geq \epsilon\big] \leq e^{-2m\epsilon^2}$.
Combining these one-sided inequalities yields
$\Pr\big[|R(h) - \widehat{R}(h)| \geq \epsilon\big] \leq 2 e^{-2m\epsilon^2}$.
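
A quick Monte Carlo check of the two-sided bound for a single fixed hypothesis (an illustration added here, not from the slides): the empirical error $\widehat{R}(h)$ is the mean of $m$ i.i.d. Bernoulli($R(h)$) error indicators, and the observed deviation probability should stay below $2 e^{-2m\epsilon^2}$. The values of $R(h)$, $m$, and $\epsilon$ are arbitrary assumptions for the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

R_h = 0.3            # assumed true error R(h) of a fixed hypothesis
m, eps = 200, 0.05
trials = 100_000

# Each row is one training sample; each entry is the 0/1 loss of h on one point.
losses = rng.binomial(1, R_h, size=(trials, m))
emp_errors = losses.mean(axis=1)                 # \hat{R}(h) for each sample

observed = np.mean(np.abs(emp_errors - R_h) >= eps)
bound = 2 * np.exp(-2 * m * eps**2)

print(f"observed Pr[|R - R_hat| >= eps] = {observed:.4f}, Hoeffding bound = {bound:.4f}")
```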

  25. Application to Learning Algorithm?
• Can we apply that bound to the hypothesis $h_S$ returned by our learning algorithm when training on sample $S$?
• No, because $h_S$ is not a fixed hypothesis: it depends on the training sample (see the sketch below).
• Note also that $\mathbb{E}\big[\widehat{R}_S(h_S)\big]$ is not a simple quantity such as $R(h_S)$.
• Instead, we need a bound that holds simultaneously for all hypotheses $h \in H$, a uniform convergence bound.
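
The point about $h_S$ not being fixed can be seen in a small simulation (added here for illustration). In an assumed toy setting where every hypothesis has the same true error, selecting the empirically best one after seeing the sample makes its empirical error systematically underestimate its true error, which is exactly what a uniform convergence bound must account for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setting: |H| = 1000 hypotheses, all with true error 0.5
# (e.g., random labelings against a balanced target).
H_size, m, trials = 1000, 50, 200
true_error = 0.5

gaps = []
for _ in range(trials):
    losses = rng.binomial(1, true_error, size=(H_size, m))  # 0/1 losses on S
    emp = losses.mean(axis=1)
    h_S = emp.argmin()                  # pick the empirically best hypothesis
    gaps.append(true_error - emp[h_S])  # R(h_S) - \hat{R}_S(h_S)

print(f"average optimistic gap for the selected hypothesis: {np.mean(gaps):.3f}")
# For a single fixed hypothesis this gap averages 0; after selection it is clearly
# positive, so the fixed-h Hoeffding bound cannot be applied to h_S directly.
```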

  26. Generalization Bound - Finite H
Theorem: let $H$ be a finite hypothesis set. Then, for any $\delta > 0$, with probability at least $1 - \delta$,
$\forall h \in H, \quad R(h) \leq \widehat{R}_S(h) + \sqrt{\frac{\log|H| + \log\frac{2}{\delta}}{2m}}$.
Proof: by the union bound,
$\Pr\Big[\max_{h \in H} \big|R(h) - \widehat{R}_S(h)\big| > \epsilon\Big]
= \Pr\Big[\big|R(h_1) - \widehat{R}_S(h_1)\big| > \epsilon \vee \cdots \vee \big|R(h_{|H|}) - \widehat{R}_S(h_{|H|})\big| > \epsilon\Big]
\leq \sum_{h \in H} \Pr\Big[\big|R(h) - \widehat{R}_S(h)\big| > \epsilon\Big]
\leq 2|H| \exp(-2m\epsilon^2).$
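
The bound itself is easy to evaluate; a small helper (illustrative, arbitrary example numbers) that returns the additive term $\sqrt{(\log|H| + \log\tfrac{2}{\delta})/(2m)}$, which can be compared with the $O(1/m)$ rate of the consistent case above.

```python
import math

def finite_H_bound(H_size: int, m: int, delta: float) -> float:
    """Additive term of the inconsistent-case bound:
    R(h) <= R_S(h) + sqrt((log|H| + log(2/delta)) / (2m))."""
    return math.sqrt((math.log(H_size) + math.log(2 / delta)) / (2 * m))

# Example: |H| = 2**20, delta = 0.01, m = 10000.
print(finite_H_bound(2**20, 10_000, 0.01))   # ~ 0.031, vs ~ 0.002 in the consistent case
```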

  27. Remarks
• Thus, for a finite hypothesis set, with high probability,
  $\forall h \in H, \quad R(h) \leq \widehat{R}_S(h) + O\left(\sqrt{\frac{\log|H|}{m}}\right)$.
• Error bound in $O(1/\sqrt{m})$ (quadratically worse than the consistent case).
• $\log_2|H|$ can be interpreted as the number of bits needed to encode $H$.
• Occam's Razor principle (theologian William of Occam): "plurality should not be posited without necessity".
