Computational Learning Theory

[Read Chapter 7]
[Suggested exercises: 7.1, 7.2, 7.5, 7.8]

• Computational learning theory
• Setting 1: learner poses queries to teacher
• Setting 2: teacher chooses examples
• Setting 3: randomly generated instances, labeled by teacher
• Probably approximately correct (PAC) learning
• Vapnik-Chervonenkis dimension
• Mistake bounds

Lecture slides for textbook Machine Learning, © Tom M. Mitchell, McGraw Hill, 1997
Computational Learning Theory

What general laws constrain inductive learning?

We seek theory to relate:
• Probability of successful learning
• Number of training examples
• Complexity of hypothesis space
• Accuracy to which target concept is approximated
• Manner in which training examples are presented
Prototypical Concept Learning Task

• Given:
  – Instances X: possible days, each described by the attributes
    Sky, AirTemp, Humidity, Wind, Water, Forecast
  – Target function c: EnjoySport : X → {0, 1}
  – Hypotheses H: conjunctions of literals, e.g.
    ⟨?, Cold, High, ?, ?, ?⟩
  – Training examples D: positive and negative examples of the target function
    ⟨x₁, c(x₁)⟩, ..., ⟨xₘ, c(xₘ)⟩
• Determine:
  – A hypothesis h in H such that h(x) = c(x) for all x in D?
  – A hypothesis h in H such that h(x) = c(x) for all x in X?
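A minimal Python sketch of the hypothesis representation on this slide, assuming hypotheses are tuples of per-attribute constraints ("?" matches anything, a specific value matches only itself); the helper names (matches, consistent) are illustrative, not from the textbook.

    # Attributes of an EnjoySport instance, in order (from the slide above).
    ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

    def matches(hypothesis, instance):
        # The conjunction classifies the instance positive iff every constraint is satisfied.
        return all(h == "?" or h == x for h, x in zip(hypothesis, instance))

    def consistent(hypothesis, examples):
        # Check h(x) = c(x) for every training example (x, c(x)) in D.
        return all(matches(hypothesis, x) == label for x, label in examples)

    # Example: h = <?, Cold, High, ?, ?, ?>
    h = ("?", "Cold", "High", "?", "?", "?")
    x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
    print(matches(h, x))                # True
    print(consistent(h, [(x, True)]))   # True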
Sample Complexity

How many training examples are sufficient to learn the target concept?

1. If learner proposes instances, as queries to teacher
   • learner proposes instance x, teacher provides c(x)
2. If teacher (who knows c) provides training examples
   • teacher provides sequence of examples of form ⟨x, c(x)⟩
3. If some random process (e.g., nature) proposes instances
   • instance x generated randomly, teacher provides c(x)
Sample Complexity: 1

Learner proposes instance x, teacher provides c(x)
(assume c is in learner's hypothesis space H)

Optimal query strategy: play 20 questions
• pick instance x such that half of the hypotheses in VS classify x positive,
  half classify x negative
• when this is possible, need ⌈log₂ |H|⌉ queries to learn c
• when not possible, need even more
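A minimal sketch of this "20 questions" query strategy in Python, assuming a small finite hypothesis space given explicitly as predicates and an oracle for c; all function names here are illustrative.

    import math

    def best_query(version_space, instances):
        # Pick the instance that splits the remaining hypotheses most evenly.
        def imbalance(x):
            positives = sum(1 for h in version_space if h(x))
            return abs(positives - len(version_space) / 2)
        return min(instances, key=imbalance)

    def learn_by_queries(hypotheses, instances, oracle):
        # Assumes c is among `hypotheses` and that some instance distinguishes
        # any two remaining hypotheses, so the loop terminates.
        vs = list(hypotheses)
        queries = 0
        while len(vs) > 1:
            x = best_query(vs, instances)
            label = oracle(x)                       # teacher provides c(x)
            vs = [h for h in vs if h(x) == label]   # keep only consistent hypotheses
            queries += 1
        return vs[0], queries

    # With perfect halving, about ceil(log2(|H|)) queries are needed:
    print(math.ceil(math.log2(973)))   # 10 queries when |H| = 973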
Sample Complexity: 2

Teacher (who knows c) provides training examples
(assume c is in learner's hypothesis space H)

Optimal teaching strategy: depends on H used by learner

Consider the case H = conjunctions of up to n boolean literals and their
negations, e.g., (AirTemp = Warm) ∧ (Wind = Strong), where AirTemp, Wind, ...
each have 2 possible values.

• if n possible boolean attributes in H, n + 1 examples suffice
• why?
Sample Complexity: 3

Given:
• set of instances X
• set of hypotheses H
• set of possible target concepts C
• training instances generated by a fixed, unknown probability distribution 𝒟 over X

Learner observes a sequence D of training examples of form ⟨x, c(x)⟩, for some
target concept c ∈ C
• instances x are drawn from distribution 𝒟
• teacher provides target value c(x) for each

Learner must output a hypothesis h estimating c
• h is evaluated by its performance on subsequent instances drawn according to 𝒟

Note: randomly drawn instances, noise-free classifications
True Error of a Hypothesis

Definition: The true error (denoted error_𝒟(h)) of hypothesis h with respect to
target concept c and distribution 𝒟 is the probability that h will misclassify
an instance drawn at random according to 𝒟.

    error_𝒟(h) ≡ Pr_{x∈𝒟}[c(x) ≠ h(x)]

[Figure: instance space X, showing the regions where target concept c and
hypothesis h disagree]
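A minimal sketch of estimating this true error by Monte Carlo sampling, assuming we can draw instances from 𝒟 and query both c and h; the names below are illustrative, and the sampled estimate only approximates the probability defined above.

    import random

    def estimate_true_error(h, c, draw_instance, n_samples=100_000):
        # Estimate Pr_{x ~ D}[c(x) != h(x)] by counting disagreements on sampled instances.
        disagreements = sum(c(x) != h(x) for x in (draw_instance() for _ in range(n_samples)))
        return disagreements / n_samples

    # Toy example: instances are integers drawn uniformly from 0..99;
    # c labels x positive iff x < 50, h iff x < 60, so they disagree on 10% of X.
    c = lambda x: x < 50
    h = lambda x: x < 60
    draw = lambda: random.randrange(100)
    print(estimate_true_error(h, c, draw))   # close to 0.10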
Two Notions of Error

Training error of hypothesis h with respect to target concept c
• How often h(x) ≠ c(x) over training instances

True error of hypothesis h with respect to c
• How often h(x) ≠ c(x) over future random instances

Our concern:
• Can we bound the true error of h given the training error of h?
• First consider when training error of h is zero (i.e., h ∈ VS_{H,D})
Exhausting the Version Space

(r = training error, error = true error)

Definition: The version space VS_{H,D} is said to be ε-exhausted with respect to
c and 𝒟 if every hypothesis h in VS_{H,D} has error less than ε with respect to
c and 𝒟:

    (∀h ∈ VS_{H,D})  error_𝒟(h) < ε

[Figure: hypothesis space H; hypotheses with training error r = 0 lie inside
VS_{H,D} (true errors .1 and .2 in the figure), while hypotheses with r > 0 lie
outside]
How Many Examples Will ε-Exhaust the VS?

Theorem [Haussler, 1988]: If the hypothesis space H is finite, and D is a
sequence of m ≥ 1 independent random examples of some target concept c, then
for any 0 ≤ ε ≤ 1, the probability that the version space with respect to H and
D is not ε-exhausted (with respect to c) is less than

    |H| e^(−εm)

Interesting! This bounds the probability that any consistent learner will
output a hypothesis h with error(h) ≥ ε.

If we want this probability to be below δ

    |H| e^(−εm) ≤ δ

then

    m ≥ (1/ε)(ln |H| + ln(1/δ))
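A minimal sketch of this sample-complexity bound as a Python function; the function name is illustrative, and the result is rounded up to a whole number of examples.

    import math

    def sample_complexity(h_size, epsilon, delta):
        # m >= (1/epsilon) * (ln|H| + ln(1/delta))   (Haussler, 1988 bound)
        return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

    # e.g. |H| = 973, epsilon = 0.1, delta = 0.05, as in the EnjoySport slide below:
    print(sample_complexity(973, 0.1, 0.05))   # 99 (98.8 rounded up)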
Learning Conjunctions of Boolean Literals

How many examples are sufficient to assure with probability at least (1 − δ)
that every h in VS_{H,D} satisfies error_𝒟(h) ≤ ε?

Use our theorem:

    m ≥ (1/ε)(ln |H| + ln(1/δ))

Suppose H contains conjunctions of constraints on up to n boolean attributes
(i.e., n boolean literals). Then |H| = 3^n, and

    m ≥ (1/ε)(ln 3^n + ln(1/δ))

or

    m ≥ (1/ε)(n ln 3 + ln(1/δ))
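The same bound specialized to conjunctions of boolean literals, as a small sketch; again the function name is illustrative and n = 10 is just an example value.

    import math

    def sample_complexity_conjunctions(n, epsilon, delta):
        # |H| = 3^n, so m >= (1/epsilon) * (n ln 3 + ln(1/delta))
        return math.ceil((n * math.log(3) + math.log(1.0 / delta)) / epsilon)

    print(sample_complexity_conjunctions(10, 0.1, 0.05))   # 140 examples for n = 10 attributes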
How About EnjoySport?

    m ≥ (1/ε)(ln |H| + ln(1/δ))

If H is as given in EnjoySport, then |H| = 973, and

    m ≥ (1/ε)(ln 973 + ln(1/δ))

... if we want to assure that with probability 95%, VS contains only hypotheses
with error_𝒟(h) ≤ .1, then it is sufficient to have m examples, where

    m ≥ (1/.1)(ln 973 + ln(1/.05))
    m ≥ 10(ln 973 + ln 20)
    m ≥ 10(6.88 + 3.00)
    m ≥ 98.8
PAC Learning

Consider a class C of possible target concepts defined over a set of instances
X of length n, and a learner L using hypothesis space H.

Definition: C is PAC-learnable by L using H if for all c ∈ C, distributions 𝒟
over X, ε such that 0 < ε < 1/2, and δ such that 0 < δ < 1/2, learner L will
with probability at least (1 − δ) output a hypothesis h ∈ H such that
error_𝒟(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c).