The PAC Learning Framework
Guoqing Zheng
January 20, 2015
Intro

Questions about learning:
- What can be learned efficiently? (And what is hard to learn?)
- How many examples are needed?
- Is there a general model of learning?

Answer: ⇒ the Probably Approximately Correct (PAC) learning framework
Learning theory basics

Notation:
- $X$: input space; $Y$: label space, here $Y = \{0, 1\}$
- Concept $c : X \to Y$; concept class $C$
- Hypothesis $h$; hypothesis set $H$ ($H$ may or may not be the same as $C$)
- $S = (X_1, \ldots, X_m)$: a sample of $m$ i.i.d. examples from an unknown but fixed distribution $D$

Task of learning: use $S$ to select a hypothesis $h_S \in H$ that has small generalization error w.r.t. $c$.
Generalization error and empirical error

Generalization error:
$$R(h) = \mathbb{E}_{X \sim D}\left[\mathbf{1}(h(X) \neq c(X))\right] \quad (1)$$

Empirical error:
$$\hat{R}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}(h(X_i) \neq c(X_i)) \quad (2)$$

Relationship:
$$\mathbb{E}_{S \sim D^m}\left[\hat{R}(h)\right] = R(h) \quad (3)$$
PAC learning

A concept class $C$ is PAC-learnable if there exists an algorithm $A$ such that, for every distribution $D$ on $X$, every target concept $c \in C$, and any $\epsilon > 0$ and $\delta > 0$, after observing $m \geq \mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$ examples, $A$ returns a hypothesis $h_S$ with
$$\mathbb{P}_{S \sim D^m}\big[\underbrace{R(h_S) \leq \epsilon}_{\text{Approximately}}\big] \geq \underbrace{1 - \delta}_{\text{Probably}} \quad (4)$$

$C$ is further efficiently PAC-learnable if $A$ also runs in $\mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$ time.
Example: Learning axis-aligned rectangles

Figure: $R$ is the target rectangle, $R'$ is the constructed rectangle.

Proof that the target class is PAC-learnable:
- Construct $R_S = R'$ as the tightest rectangle containing the positive points;
- Denote by $P(R)$ the probability that a point randomly drawn from $D$ falls within $R$;
- Since $R_S \subseteq R$, error can only happen for points falling inside $R$ (but outside $R_S$);
- If $P(R) \leq \epsilon$, then $R(R_S) \leq P(R) \leq \epsilon$ always, so $\mathbb{P}(R(R_S) > \epsilon) = 0 \leq \delta$ for any $\delta > 0$.
Example: Learning axis-aligned rectangles (contd.)

Figure: $R$ is the target rectangle, $R'$ is the constructed rectangle.

Now assume $P(R) > \epsilon$. Construct four strips $r_1, r_2, r_3, r_4$ along the sides of $R$ such that $P(r_i) = \epsilon/4$ for $i = 1, 2, 3, 4$.
- If $R_S$ meets all four regions, then $R(R_S) \leq \epsilon$;
- Contrapositively, if $R(R_S) > \epsilon$, then $R_S$ must miss at least one of the four regions.
Example: Learning axis-aligned rectangles (contd.)

Hence,
$$\mathbb{P}(R(R_S) > \epsilon) \leq \mathbb{P}\left(\bigcup_{i=1}^{4} \{R_S \cap r_i = \emptyset\}\right) \quad (5)$$
$$\leq \sum_{i=1}^{4} \mathbb{P}\left(\{R_S \cap r_i = \emptyset\}\right) \quad (6)$$
$$\leq 4(1 - \epsilon/4)^m \quad (7)$$
$$\leq 4\exp(-m\epsilon/4) \quad \text{(because } 1 - x \leq e^{-x}\text{)} \quad (8)$$

Setting $4\exp(-m\epsilon/4) \leq \delta \Leftrightarrow m \geq \frac{4}{\epsilon}\log\frac{4}{\delta}$. So for any $\epsilon > 0$ and $\delta > 0$, when $m \geq \frac{4}{\epsilon}\log\frac{4}{\delta}$, $\mathbb{P}(R(R_S) > \epsilon) \leq \delta$. Also, the representation cost for a point and for a rectangle is constant. Hence the concept class of axis-aligned rectangles is PAC-learnable.
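The sample-complexity bound above can be sanity-checked empirically. Below is a minimal Python sketch (not part of the original slides), assuming a hypothetical target rectangle $R = [0.2, 0.8]^2$ and $D$ uniform on the unit square, so that the true error of the tightest rectangle $R_S \subseteq R$ is simply $\mathrm{area}(R) - \mathrm{area}(R_S)$:

```python
import math
import random

random.seed(0)

def tightest_rectangle(points):
    """R_S: the tightest axis-aligned rectangle containing the given points."""
    xs = [x for x, y in points]
    ys = [y for x, y in points]
    return (min(xs), max(xs), min(ys), max(ys))

# Hypothetical target R = [0.2, 0.8] x [0.2, 0.8]; D = uniform on [0,1]^2,
# so P(R) = 0.36 and the true error of R_S is area(R) - area(R_S).
R = (0.2, 0.8, 0.2, 0.8)
eps, delta = 0.1, 0.1
m = math.ceil(4 / eps * math.log(4 / delta))  # sample-complexity bound

trials, failures = 200, 0
for _ in range(trials):
    sample = [(random.random(), random.random()) for _ in range(m)]
    positives = [(x, y) for x, y in sample
                 if R[0] <= x <= R[1] and R[2] <= y <= R[3]]
    if not positives:
        failures += 1  # empty R_S errs on all of R, and P(R) = 0.36 > eps
        continue
    x0, x1, y0, y1 = tightest_rectangle(positives)
    true_error = 0.36 - (x1 - x0) * (y1 - y0)
    if true_error > eps:
        failures += 1

assert failures / trials <= delta
```

With $\epsilon = \delta = 0.1$ the bound gives $m = 148$; the observed failure rate should come out well below $\delta$, reflecting how loose the union bound is.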
Generalization bounds for finite H (consistent case)

For finite $H$ and a consistent hypothesis $h_S$ (i.e., $\hat{R}(h_S) = 0$), PAC-learnable if
$$m \geq \frac{1}{\epsilon}\left(\log|H| + \log\frac{1}{\delta}\right) \quad (9)$$

Proof:
$$\mathbb{P}\left[\exists h \in H : \hat{R}(h) = 0 \wedge R(h) > \epsilon\right] \quad (10)$$
$$= \mathbb{P}\left[(\hat{R}(h_1) = 0 \wedge R(h_1) > \epsilon) \vee (\hat{R}(h_2) = 0 \wedge R(h_2) > \epsilon) \vee \ldots\right] \quad (11\text{--}12)$$
$$\leq \sum_{h \in H} \mathbb{P}\left[\hat{R}(h) = 0 \wedge R(h) > \epsilon\right] \quad (13)$$
$$\leq \sum_{h \in H} \mathbb{P}\left[\hat{R}(h) = 0 \mid R(h) > \epsilon\right] \quad (14)$$
$$\leq \sum_{h \in H} (1 - \epsilon)^m = |H|(1 - \epsilon)^m \quad (15)$$
$$\leq |H| e^{-m\epsilon} \quad \text{(because } 1 - x \leq e^{-x}\text{)} \quad (16)$$
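Bound (9) comes from solving $|H| e^{-m\epsilon} \leq \delta$ for $m$. A small numeric check (illustrative values of $|H|$, $\epsilon$, $\delta$, not from the slides):

```python
import math

def consistent_sample_complexity(h_size, eps, delta):
    """Smallest integer m satisfying bound (9), so |H| * exp(-m*eps) <= delta."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# Example: |H| = 2^20, eps = 0.05, delta = 0.01.
m = consistent_sample_complexity(2 ** 20, 0.05, 0.01)
assert 2 ** 20 * math.exp(-m * 0.05) <= 0.01
```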
Hoeffding's inequality

Markov's inequality: For $X \geq 0$ and any $\epsilon > 0$,
$$\mathbb{P}(X \geq \epsilon) \leq \epsilon^{-1}\,\mathbb{E}(X) \quad (17)$$
because
$$\mathbb{E}(X) = \int_0^\infty x\,p(x)\,dx \geq \int_\epsilon^\infty x\,p(x)\,dx \geq \int_\epsilon^\infty \epsilon\,p(x)\,dx \quad (18)$$
$$= \epsilon \int_\epsilon^\infty p(x)\,dx = \epsilon\,\mathbb{P}(X \geq \epsilon) \quad (19)$$

Chernoff bounding technique: For any $X$ and any $\epsilon > 0$, $t > 0$,
$$\mathbb{P}(X \geq \epsilon) = \mathbb{P}(e^{tX} \geq e^{t\epsilon}) \leq e^{-t\epsilon}\,\mathbb{E}(e^{tX}) \quad (20)$$
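A quick Monte-Carlo sanity check of Markov's inequality (an illustrative sketch; the exponential distribution here is just an arbitrary choice of nonnegative variable):

```python
import random
import statistics

random.seed(0)
# X ~ Exp(1): nonnegative with E(X) = 1.
xs = [random.expovariate(1.0) for _ in range(100_000)]

eps = 3.0
empirical_tail = sum(x >= eps for x in xs) / len(xs)  # close to e^{-3} ~ 0.05
markov_bound = statistics.mean(xs) / eps              # close to 1/3

assert empirical_tail <= markov_bound
```

The gap between the empirical tail (~0.05) and the bound (~1/3) shows Markov's inequality is crude; the Chernoff technique tightens it by applying Markov to $e^{tX}$ and optimizing over $t$.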
Hoeffding's lemma

Hoeffding's lemma: Suppose $\mathbb{E}(X) = 0$ and $a \leq X \leq b$. Then for any $t > 0$,
$$\mathbb{E}(e^{tX}) \leq e^{\frac{t^2(b-a)^2}{8}} \quad (21)$$

Proof: By convexity of $e^{tx}$,
$$e^{tX} = e^{t\left(\frac{b-X}{b-a}a + \frac{X-a}{b-a}b\right)} \leq \frac{b-X}{b-a}\,e^{ta} + \frac{X-a}{b-a}\,e^{tb} \quad (22)$$
$$\Rightarrow \mathbb{E}(e^{tX}) \leq \frac{b}{b-a}\,e^{ta} + \frac{-a}{b-a}\,e^{tb} \equiv e^{g(u)} \quad (23)$$
where $u = t(b-a)$, $\gamma \equiv \frac{-a}{b-a}$, and $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$.
Hoeffding's lemma (contd.)

For $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$, we can verify:
- $g(0) = 0$;
- $g'(u) = -\gamma + \frac{\gamma e^u}{1 - \gamma + \gamma e^u}$, hence $g'(0) = 0$;
- Further,
$$g''(u) = \frac{(1-\gamma)\gamma e^u}{(1 - \gamma + \gamma e^u)^2} \leq \frac{(1-\gamma)\gamma e^u}{4(1-\gamma)\gamma e^u} = \frac{1}{4} \quad (24)$$
(because $(a+b)^2 \geq 4ab$).

By Taylor's theorem, $\exists\, \xi \in (0, u)$ s.t.
$$g(u) = g(0) + u\,g'(0) + \frac{u^2}{2}g''(\xi) = \frac{u^2}{2}g''(\xi) \leq \frac{u^2}{8} = \frac{t^2(b-a)^2}{8} \quad (25)$$
$$\Rightarrow \mathbb{E}(e^{tX}) \leq e^{g(u)} \leq e^{\frac{t^2(b-a)^2}{8}}. \quad (26)$$
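The lemma can be checked on a concrete case: a Rademacher variable $X$ uniform on $\{-1, +1\}$ has $\mathbb{E}(X) = 0$, $a = -1$, $b = 1$, and moment generating function $\mathbb{E}(e^{tX}) = \cosh(t)$, so (21) reduces to $\cosh(t) \leq e^{t^2/2}$:

```python
import math

# X uniform on {-1, +1}: E(X) = 0, (b - a)^2 = 4, so the bound is e^{t^2/2}.
for t in [0.1, 0.5, 1.0, 2.0, 5.0]:
    mgf = math.cosh(t)                  # E[e^{tX}] = (e^t + e^{-t}) / 2
    bound = math.exp(t * t * 4 / 8)
    assert mgf <= bound
```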
Hoeffding's inequality

Hoeffding's inequality: Let $S = \{X_1, X_2, \ldots, X_m\}$ be a sample of $m$ independent variables with $X_i \in [a, b]$ and common mean $\mu$. Let $\overline{X}_m$ be the sample mean. For any $\epsilon > 0$,
$$\mathbb{P}_{S \sim D^m}\left(\overline{X}_m - \mu \geq \epsilon\right) \leq \exp\left(-2m\epsilon^2/(b-a)^2\right) \quad (27)$$
$$\mathbb{P}_{S \sim D^m}\left(\overline{X}_m - \mu \leq -\epsilon\right) \leq \exp\left(-2m\epsilon^2/(b-a)^2\right) \quad (28)$$

Proof: For any $t > 0$,
$$\mathbb{P}_{S \sim D^m}\left(\overline{X}_m - \mu \geq \epsilon\right) = \mathbb{P}_{S \sim D^m}\left(\sum_{i=1}^{m} X_i - m\mu \geq m\epsilon\right) \quad (29)$$
$$\leq e^{-tm\epsilon}\,\mathbb{E}\left[e^{t\left(\sum_{i=1}^{m} X_i - m\mu\right)}\right] = e^{-tm\epsilon}\prod_{i=1}^{m}\mathbb{E}\left[e^{t(X_i - \mu)}\right] \quad (30)$$
$$\leq e^{-tm\epsilon}\prod_{i=1}^{m} e^{\frac{t^2(b-a)^2}{8}} = e^{-tm\epsilon + t^2 m(b-a)^2/8} \leq e^{-2m\epsilon^2/(b-a)^2} \quad (31)$$
where the last step takes the minimizing $t = 4\epsilon/(b-a)^2$. The other side is similar.
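A Monte-Carlo check of bound (27) for fair coin flips ($X_i \in \{0, 1\}$, $\mu = 1/2$, so $(b - a) = 1$); the sample size and trial count here are illustrative choices:

```python
import math
import random

random.seed(0)

m, eps, trials = 200, 0.1, 2000
mu = 0.5
violations = 0
for _ in range(trials):
    # Sample mean of m fair coin flips.
    xbar = sum(random.random() < mu for _ in range(m)) / m
    if xbar - mu >= eps:
        violations += 1

bound = math.exp(-2 * m * eps ** 2)   # (b - a) = 1 for {0,1}-valued flips
assert violations / trials <= bound
```

The observed violation rate (roughly $0.003$ for these parameters) sits comfortably under the bound $e^{-4} \approx 0.018$.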
Generalization bounds for finite H (inconsistent case)

By Hoeffding's inequality (the 0-1 loss takes values in $[0, 1]$, so $(b - a) = 1$):
$$\mathbb{P}_{S \sim D^m}\left(|\hat{R}(h) - R(h)| \geq \epsilon\right) \leq 2\exp(-2m\epsilon^2) \quad (32)$$

For a fixed $h$, w.p. $\geq 1 - \delta$:
$$R(h) \leq \hat{R}(h) + \sqrt{\frac{\log\frac{2}{\delta}}{2m}} \quad (33)$$

For finite $H$ and the inconsistent case, $\forall h \in H$, w.p. $\geq 1 - \delta$:
$$R(h) \leq \hat{R}(h) + \sqrt{\frac{\log|H| + \log\frac{2}{\delta}}{2m}} \quad (34)$$
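The complexity term in (34) is easy to explore numerically; this small sketch (illustrative values, not from the slides) confirms the expected $1/\sqrt{m}$ decay:

```python
import math

def gap(h_size, delta, m):
    """Uniform-convergence term sqrt((log|H| + log(2/delta)) / (2m)) from (34)."""
    return math.sqrt((math.log(h_size) + math.log(2 / delta)) / (2 * m))

# Doubling the sample size shrinks the gap by exactly a factor of 1/sqrt(2).
ratio = gap(1000, 0.05, 2000) / gap(1000, 0.05, 1000)
assert abs(ratio - 1 / math.sqrt(2)) < 1e-12
```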
Generalization bounds for finite H (inconsistent case)

Proof:
$$\mathbb{P}\left(\exists h \in H, |\hat{R}(h) - R(h)| > \epsilon\right) \quad (35)$$
$$= \mathbb{P}\left(|\hat{R}(h_1) - R(h_1)| > \epsilon \vee \ldots \vee |\hat{R}(h_{|H|}) - R(h_{|H|})| > \epsilon\right) \quad (36)$$
$$\leq \sum_{h \in H} \mathbb{P}\left(|\hat{R}(h) - R(h)| > \epsilon\right) \quad (37)$$
$$\leq 2|H|\exp(-2m\epsilon^2) \quad (38)$$

Setting the RHS to $\delta$ completes the proof.
Generalities

Agnostic (non-realizable) PAC learning: for all distributions $D$ over $X \times Y$, for any $\epsilon > 0$ and $\delta > 0$, and sample size $m \geq \mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$, the following holds:
$$\mathbb{P}_{S \sim D^m}\left[R(h_S) - \min_{h \in H} R(h) \leq \epsilon\right] \geq 1 - \delta \quad (39)$$

Bayes hypothesis: a hypothesis $h$ such that
$$R(h) = R^* \equiv \inf_h R(h) \quad (40)$$
Note: the Bayes hypothesis may or may not be in $H$.

Estimation and approximation errors (with $h^* = \operatorname{argmin}_{h \in H} R(h)$):
$$R(h) - R^* = \underbrace{(R(h) - R(h^*))}_{\text{estimation}} + \underbrace{(R(h^*) - R^*)}_{\text{approximation}} \quad (41)$$
Generalities (contd.)

Estimation error can sometimes be bounded in terms of the generalization error from PAC. For example, for $h_S^{\mathrm{ERM}}$, the hypothesis returned by empirical risk minimization,
$$R(h_S^{\mathrm{ERM}}) - R(h^*) = R(h_S^{\mathrm{ERM}}) - \hat{R}(h_S^{\mathrm{ERM}}) + \hat{R}(h_S^{\mathrm{ERM}}) - R(h^*)$$
$$\leq R(h_S^{\mathrm{ERM}}) - \hat{R}(h_S^{\mathrm{ERM}}) + \hat{R}(h^*) - R(h^*)$$
$$\leq 2\sup_{h \in H} |R(h) - \hat{R}(h)| \quad (42)$$
where the first inequality uses $\hat{R}(h_S^{\mathrm{ERM}}) \leq \hat{R}(h^*)$, by definition of ERM.
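As an illustration (an entirely hypothetical setup, not from the slides), ERM over a small finite class of threshold classifiers on $[0, 1]$ returns a hypothesis whose true error is small:

```python
import random

random.seed(0)

# Hypothetical setup: finite class H of threshold classifiers h_t(x) = 1[x >= t]
# on a grid, D uniform on [0,1], and a target threshold 0.37 that falls off the
# grid (so the best h* in H still has a small nonzero error).
thresholds = [i / 20 for i in range(21)]
target_t = 0.37

sample = [(x, int(x >= target_t))
          for x in (random.random() for _ in range(500))]

def emp_err(t):
    """Empirical 0-1 error of the threshold classifier h_t on the sample."""
    return sum(int(x >= t) != y for x, y in sample) / len(sample)

h_erm = min(thresholds, key=emp_err)   # empirical risk minimization over H

# Under uniform D, the true error is the mass of the disagreement region.
true_err = abs(h_erm - target_t)
assert true_err <= 0.1
```

Here the ERM threshold lands next to the target on the grid, consistent with (42): its excess risk is controlled by the worst-case deviation between empirical and true errors over the (small, finite) class.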
Questions? Thanks!