

  1. The PAC Learning Framework
  Guoqing Zheng
  January 20, 2015

  2. Intro
  Questions about learning:
  - What can be learned efficiently? (What is hard to learn?)
  - How many examples are needed?
  - Is there a general model of learning?
  Answer: ⇒ the Probably Approximately Correct (PAC) learning framework

  3. Learning theory basics
  Notation:
  - $X$: input space; $Y$: label space, $Y = \{0, 1\}$
  - Concept $c : X \to Y$; concept class $C$
  - Hypothesis $h$; hypothesis set $H$ ($H$ may or may not be the same as $C$)
  - $S = (X_1, \ldots, X_m)$: a sample of $m$ i.i.d. examples drawn from an unknown but fixed distribution $D$
  Task of learning: use $S$ to select a hypothesis $h_S \in H$ that has small generalization error with respect to $c$.

  4. Generalization error and empirical error
  Generalization error:
  $$R(h) = \mathbb{E}_{X \sim D}[I(h(X) \neq c(X))] \quad (1)$$
  Empirical error:
  $$\hat{R}(h) = \frac{1}{m} \sum_{i=1}^{m} I(h(X_i) \neq c(X_i)) \quad (2)$$
  Relationship:
  $$\mathbb{E}_{S \sim D^m}\big[\hat{R}(h)\big] = R(h) \quad (3)$$
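  As a quick illustration of (1)–(3), here is a minimal Python sketch (not from the slides; the distribution $D$, concept $c$, and hypothesis $h$ are invented for illustration) checking that the empirical error is an unbiased estimate of the generalization error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative choices, not from the slides):
# D is uniform on [0, 1], the target concept c labels x >= 0.5 as 1,
# and the hypothesis h uses a slightly wrong threshold of 0.6.
c = lambda x: (x >= 0.5).astype(int)
h = lambda x: (x >= 0.6).astype(int)

# True generalization error R(h) = P(h(X) != c(X)) = 0.1 here,
# since h and c disagree exactly on [0.5, 0.6).
m, trials = 100, 10_000
emp_errors = []
for _ in range(trials):
    X = rng.uniform(0, 1, size=m)             # sample S of m i.i.d. points
    emp_errors.append(np.mean(h(X) != c(X)))  # empirical error, eq. (2)

# Eq. (3): averaging empirical errors over samples recovers R(h).
print(np.mean(emp_errors))  # close to 0.1
```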

  5. PAC learning
  A concept class $C$ is PAC-learnable if there exists an algorithm $A$ such that for all distributions $D$ on $X$, for any target concept $c \in C$, and for any $\epsilon > 0$ and $\delta > 0$, after observing $m \geq \mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$ examples, it returns a hypothesis $h_S$ with
  $$\underbrace{P_{S \sim D^m}\big[\underbrace{R(h_S) \leq \epsilon}_{\text{Approximately}}\big] \geq 1 - \delta}_{\text{Probably}} \quad (4)$$
  $C$ is further efficiently PAC-learnable if, in addition, $A$ runs in $\mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$ time.
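  Condition (4) can be probed empirically for a given learner by training on many independent samples and measuring how often $R(h_S) \leq \epsilon$. A minimal sketch; the helper name and the placeholder error data are my own invention:

```python
import numpy as np

def pac_check(gen_errors, eps, delta):
    """Empirically check condition (4): the fraction of runs with
    generalization error at most eps should be at least 1 - delta.
    `gen_errors` holds R(h_S) from many independent samples S.
    (Illustrative helper; not from the slides.)"""
    frac_ok = np.mean(np.asarray(gen_errors) <= eps)
    return frac_ok >= 1 - delta

# Example with placeholder run-by-run errors, mostly well below 0.1:
rng = np.random.default_rng(1)
errors = rng.beta(1, 40, size=1000)
print(pac_check(errors, eps=0.1, delta=0.05))  # True for this data
```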

  6. Example: Learning axis-aligned rectangles
  Figure: R is the target rectangle, R' is the constructed rectangle.
  Proof that the target class is PAC-learnable:
  - Construct $R_S = R'$ as the tightest rectangle containing the positive points (see the code sketch below);
  - Denote by $P(R)$ the probability that a point randomly drawn from $D$ falls within $R$;
  - Errors can only occur on points falling inside $R$ (since $R_S \subseteq R$, there are no false positives);
  - If $P(R) \leq \epsilon$, then $R(R_S) \leq P(R) \leq \epsilon$ always, so $P(R(R_S) > \epsilon) = 0 \leq \delta$ for any $\delta > 0$.
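  A minimal sketch of the tightest-rectangle learner; the slide only describes the construction, so the function names and array layout here are my own:

```python
import numpy as np

def fit_tightest_rectangle(X, y):
    """Return the smallest axis-aligned rectangle containing all
    positive points, as (xmin, xmax, ymin, ymax).
    X: (m, 2) array of points; y: (m,) array of 0/1 labels."""
    pos = X[y == 1]
    if len(pos) == 0:
        return None  # no positives observed: predict all-negative
    return (pos[:, 0].min(), pos[:, 0].max(),
            pos[:, 1].min(), pos[:, 1].max())

def predict(rect, X):
    """Label points inside the rectangle 1, outside 0."""
    if rect is None:
        return np.zeros(len(X), dtype=int)
    xmin, xmax, ymin, ymax = rect
    return ((X[:, 0] >= xmin) & (X[:, 0] <= xmax) &
            (X[:, 1] >= ymin) & (X[:, 1] <= ymax)).astype(int)
```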

  7. Example: Learning axis-aligned rectangles (contd.)
  Figure: R is the target rectangle, R' is the constructed rectangle.
  Now assume $P(R) > \epsilon$. Construct four strips $r_1, r_2, r_3, r_4$ along the sides of $R$ such that $P(r_i) = \epsilon/4$ for $i = 1, 2, 3, 4$.
  If $R_S$ meets all four regions, then $R(R_S) \leq \epsilon$, since the error region $R \setminus R_S$ is then contained in $\bigcup_i r_i$, whose total mass is at most $\epsilon$. Contrapositively, if $R(R_S) > \epsilon$, then $R_S$ must miss at least one of the four regions.

  8. Example: Learning axis-aligned rectangles (contd.)
  Hence,
  $$P(R(R_S) > \epsilon) \leq P\Big(\bigcup_{i=1}^{4} \{R_S \cap r_i = \emptyset\}\Big) \quad (5)$$
  $$\leq \sum_{i=1}^{4} P(\{R_S \cap r_i = \emptyset\}) \quad (6)$$
  $$\leq 4(1 - \epsilon/4)^m \quad (7)$$
  $$\leq 4\exp(-m\epsilon/4) \quad \text{(because } 1 - x \leq e^{-x}\text{)} \quad (8)$$
  Let $4\exp(-m\epsilon/4) \leq \delta \Leftrightarrow m \geq \frac{4}{\epsilon}\log\frac{4}{\delta}$. So for any $\epsilon > 0$ and $\delta > 0$, when $m \geq \frac{4}{\epsilon}\log\frac{4}{\delta}$, we have $P(R(R_S) > \epsilon) \leq \delta$. Also, the representation cost for a point and for a rectangle is constant. Hence the concept class of axis-aligned rectangles is PAC-learnable.
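  The derived sample complexity is easy to evaluate numerically; a small helper (a direct transcription of the bound, with the function name my own):

```python
import numpy as np

def rectangle_sample_bound(eps, delta):
    """Sample size from the slide's bound: m >= (4/eps) * log(4/delta)."""
    return int(np.ceil(4 / eps * np.log(4 / delta)))

print(rectangle_sample_bound(0.1, 0.05))  # 176 examples suffice
```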

  9. Generalization bounds for finite H (consistent case)
  For finite $H$ and a consistent hypothesis $h_S$ (i.e., $\hat{R}(h_S) = 0$), PAC learning is guaranteed if
  $$m \geq \frac{1}{\epsilon}\Big(\log|H| + \log\frac{1}{\delta}\Big) \quad (9)$$
  Proof:
  $$P\big(\exists h \in H : \hat{R}(h) = 0 \wedge R(h) > \epsilon\big) \quad (10)$$
  $$= P\big((h_1 \in H, \hat{R}(h_1) = 0 \wedge R(h_1) > \epsilon) \vee (h_2 \in H, \hat{R}(h_2) = 0 \wedge R(h_2) > \epsilon) \vee \ldots\big) \quad (11\text{–}12)$$
  $$\leq \sum_{h \in H} P\big(\hat{R}(h) = 0 \wedge R(h) > \epsilon\big) \quad (13)$$
  $$\leq \sum_{h \in H} P\big(\hat{R}(h) = 0 \mid R(h) > \epsilon\big) \quad (14)$$
  $$\leq \sum_{h \in H} (1 - \epsilon)^m = |H|(1 - \epsilon)^m \quad (15)$$
  $$\leq |H| e^{-m\epsilon} \quad \text{(because } 1 - x \leq e^{-x}\text{)} \quad (16)$$
  Setting $|H|e^{-m\epsilon} \leq \delta$ and solving for $m$ yields (9).
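  Bound (9) is likewise straightforward to evaluate; a small sketch (function name my own):

```python
import numpy as np

def consistent_case_bound(H_size, eps, delta):
    """Sample complexity (9) for finite H, consistent case:
    m >= (1/eps) * (log|H| + log(1/delta))."""
    return int(np.ceil((np.log(H_size) + np.log(1 / delta)) / eps))

# E.g. |H| = 2**20 hypotheses, eps = 0.05, delta = 0.01:
print(consistent_case_bound(2**20, 0.05, 0.01))  # 370
```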

  10. Hoeffding's inequality
  Markov's inequality: For $X \geq 0$ and any $\epsilon > 0$,
  $$P(X \geq \epsilon) \leq \epsilon^{-1}\,\mathbb{E}(X) \quad (17)$$
  because
  $$\mathbb{E}(X) = \int_0^\infty x\,p(x)\,dx \geq \int_\epsilon^\infty x\,p(x)\,dx \geq \int_\epsilon^\infty \epsilon\,p(x)\,dx \quad (18)$$
  $$= \epsilon \int_\epsilon^\infty p(x)\,dx = \epsilon\,P(X \geq \epsilon) \quad (19)$$
  Chernoff bounding technique: For any $X$ and any $\epsilon > 0$, $t > 0$,
  $$P(X \geq \epsilon) = P(e^{tX} \geq e^{t\epsilon}) \leq e^{-t\epsilon}\,\mathbb{E}(e^{tX}) \quad (20)$$
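  A quick Monte Carlo sanity check of Markov's inequality (17); the exponential distribution is an arbitrary nonnegative example of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)

X = rng.exponential(scale=1.0, size=1_000_000)  # nonnegative, E(X) = 1
for eps in (1.0, 2.0, 4.0):
    tail = np.mean(X >= eps)   # estimated P(X >= eps)
    bound = X.mean() / eps     # Markov bound E(X)/eps
    print(f"eps={eps}: P(X>=eps) ~ {tail:.4f} <= {bound:.4f}")
```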

  11. Hoeffding's lemma
  Hoeffding's lemma: Suppose $\mathbb{E}(X) = 0$ and $a \leq X \leq b$. Then for any $t > 0$,
  $$\mathbb{E}(e^{tX}) \leq e^{t^2(b-a)^2/8} \quad (21)$$
  Proof: By convexity of $x \mapsto e^{tx}$,
  $$e^{tX} \leq \frac{b - X}{b - a}\, e^{ta} + \frac{X - a}{b - a}\, e^{tb} \quad (22)$$
  $$\Rightarrow \mathbb{E}(e^{tX}) \leq \frac{b}{b - a}\, e^{ta} + \frac{-a}{b - a}\, e^{tb} \equiv e^{g(u)} \quad (23)$$
  where $u = t(b - a)$, $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$, and $\gamma \equiv \frac{-a}{b - a}$.

  12. Hoeffding's lemma (contd.)
  For $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$, we can verify:
  - $g(0) = 0$;
  - $g'(u) = -\gamma + \frac{\gamma e^u}{\gamma e^u + 1 - \gamma}$, hence $g'(0) = 0$;
  - Further,
  $$g''(u) = \frac{(1 - \gamma)\gamma e^u}{(\gamma e^u + 1 - \gamma)^2} \leq \frac{(1 - \gamma)\gamma e^u}{4(1 - \gamma)\gamma e^u} = \frac{1}{4} \quad (24)$$
  (because $(a + b)^2 \geq 4ab$).
  By Taylor's theorem, $\exists\, \xi \in (0, u)$ s.t.
  $$g(u) = g(0) + u g'(0) + \frac{u^2}{2} g''(\xi) = \frac{u^2}{2} g''(\xi) \leq \frac{u^2}{8} = \frac{t^2(b-a)^2}{8} \quad (25)$$
  $$\Rightarrow \mathbb{E}(e^{tX}) \leq e^{g(u)} \leq e^{t^2(b-a)^2/8}. \quad (26)$$
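  A numeric spot check of the lemma (21) on a zero-mean two-point distribution of my own choosing:

```python
import numpy as np

# X takes values a = -1 and b = 2 with P(X=-1) = 2/3, P(X=2) = 1/3,
# so E(X) = 0 and X is bounded in [a, b].
a, b = -1.0, 2.0
vals = np.array([a, b])
probs = np.array([2/3, 1/3])
assert abs(np.dot(vals, probs)) < 1e-12  # zero mean

for t in (0.1, 0.5, 1.0):
    mgf = np.dot(probs, np.exp(t * vals))    # E(e^{tX}) exactly
    bound = np.exp(t**2 * (b - a)**2 / 8)    # the lemma's bound
    print(f"t={t}: E(e^tX)={mgf:.4f} <= {bound:.4f}")
```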

  13. Hoeffding's inequality
  Hoeffding's inequality: Let $S = \{X_1, X_2, \ldots, X_m\}$ be a sample of $m$ independent variables with $X_i \in [a, b]$ and common mean $\mu$. Let $\bar{X}_m$ be the sample mean. For any $\epsilon > 0$,
  $$P_{S \sim D^m}\big(\bar{X}_m - \mu \geq \epsilon\big) \leq \exp\big(-2m\epsilon^2/(b-a)^2\big) \quad (27)$$
  $$P_{S \sim D^m}\big(\bar{X}_m - \mu \leq -\epsilon\big) \leq \exp\big(-2m\epsilon^2/(b-a)^2\big) \quad (28)$$
  Proof: For any $t > 0$,
  $$P_{S \sim D^m}\big(\bar{X}_m - \mu \geq \epsilon\big) = P_{S \sim D^m}\Big(\sum_{i=1}^m X_i - m\mu \geq m\epsilon\Big) \quad (29)$$
  $$\leq e^{-tm\epsilon}\,\mathbb{E}\Big[e^{t(\sum_{i=1}^m X_i - m\mu)}\Big] = e^{-tm\epsilon} \prod_{i=1}^m \mathbb{E}\big[e^{t(X_i - \mu)}\big] \quad (30)$$
  $$\leq e^{-tm\epsilon} \prod_{i=1}^m e^{t^2(b-a)^2/8} = e^{-tm\epsilon + t^2 m (b-a)^2/8} \leq e^{-2m\epsilon^2/(b-a)^2} \quad (31)$$
  where the last step optimizes over $t$ (at $t = 4\epsilon/(b-a)^2$). The other side is similar.
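  A simulation comparing the empirical tail of the sample mean to the bound (27); the Bernoulli(0.5) setup is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)

# Bernoulli(0.5) variables, so [a, b] = [0, 1] and mu = 0.5.
m, trials, eps = 100, 100_000, 0.1
means = rng.binomial(m, 0.5, size=trials) / m  # sample mean per trial
tail = np.mean(means - 0.5 >= eps)             # P(X̄_m - mu >= eps)
bound = np.exp(-2 * m * eps**2)                # (b - a)^2 = 1 here
print(f"empirical {tail:.4f} <= bound {bound:.4f}")
```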

  14. Generalization bounds for finite H (inconsistent case)
  By Hoeffding's inequality (errors lie in $[0, 1]$, so $(b-a)^2 = 1$):
  $$P_{S \sim D^m}\big(|\hat{R}(h) - R(h)| \geq \epsilon\big) \leq 2\exp(-2m\epsilon^2) \quad (32)$$
  For a fixed $h$, with probability $\geq 1 - \delta$,
  $$R(h) \leq \hat{R}(h) + \sqrt{\frac{\log\frac{2}{\delta}}{2m}} \quad (33)$$
  For finite $H$ in the inconsistent case, $\forall h \in H$, with probability $\geq 1 - \delta$,
  $$R(h) \leq \hat{R}(h) + \sqrt{\frac{\log|H| + \log\frac{2}{\delta}}{2m}} \quad (34)$$
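  The uniform-convergence term in (34) is easy to compute; a small sketch (function name my own):

```python
import numpy as np

def inconsistent_case_gap(H_size, m, delta):
    """Uniform-convergence term from (34):
    sqrt((log|H| + log(2/delta)) / (2m))."""
    return np.sqrt((np.log(H_size) + np.log(2 / delta)) / (2 * m))

# With |H| = 1000 hypotheses, m = 5000 examples, delta = 0.05, every
# h in H simultaneously satisfies R(h) <= R̂(h) + this gap:
print(inconsistent_case_gap(1000, 5000, 0.05))  # ~ 0.033
```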

  15. Generalization bounds for finite H (inconsistent case)
  Proof:
  $$P\big(\exists h \in H : |\hat{R}(h) - R(h)| > \epsilon\big) \quad (35)$$
  $$= P\big(|\hat{R}(h_1) - R(h_1)| > \epsilon \vee \ldots \vee |\hat{R}(h_{|H|}) - R(h_{|H|})| > \epsilon\big) \quad (36)$$
  $$\leq \sum_{h \in H} P\big(|\hat{R}(h) - R(h)| > \epsilon\big) \quad (37)$$
  $$\leq 2|H|\exp(-2m\epsilon^2) \quad (38)$$
  Setting the RHS to $\delta$ and solving for $\epsilon$ completes the proof.

  16. Generalities
  Agnostic (non-realizable) PAC-learning: for all distributions $D$ over $X \times Y$, for any $\epsilon > 0$ and $\delta > 0$, and for sample size $m \geq \mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$, the following holds:
  $$P_{S \sim D^m}\big(R(h_S) - \min_{h \in H} R(h) \leq \epsilon\big) \geq 1 - \delta \quad (39)$$
  Bayes hypothesis: a hypothesis $h$ such that
  $$R(h) = R^* \equiv \inf_h R(h) \quad (40)$$
  Note: the Bayes hypothesis may or may not be in $H$.
  Estimation and approximation errors (with $h^* = \arg\min_{h \in H} R(h)$):
  $$R(h) - R^* = \underbrace{(R(h) - R(h^*))}_{\text{estimation}} + \underbrace{(R(h^*) - R^*)}_{\text{approximation}} \quad (41)$$

  17. Generalities (contd.)
  The estimation error can sometimes be bounded in terms of the generalization bounds from PAC analysis. For example, for $h_S^{\mathrm{ERM}}$, the hypothesis returned by empirical risk minimization,
  $$R(h_S^{\mathrm{ERM}}) - R(h^*) = R(h_S^{\mathrm{ERM}}) - \hat{R}(h_S^{\mathrm{ERM}}) + \hat{R}(h_S^{\mathrm{ERM}}) - R(h^*)$$
  $$\leq R(h_S^{\mathrm{ERM}}) - \hat{R}(h_S^{\mathrm{ERM}}) + \hat{R}(h^*) - R(h^*)$$
  $$\leq 2\sup_{h \in H} |R(h) - \hat{R}(h)| \quad (42)$$
  (the first inequality uses $\hat{R}(h_S^{\mathrm{ERM}}) \leq \hat{R}(h^*)$, by definition of ERM).
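  A toy simulation verifying (42) for a finite class of threshold classifiers; all modeling choices here (the class, the distribution, the target) are my own:

```python
import numpy as np

rng = np.random.default_rng(4)

# Finite H of thresholds h_t(x) = 1[x >= t], X uniform on [0, 1],
# target c(x) = 1[x >= 0.5]. Then R(h_t) = |t - 0.5| in closed form.
thresholds = np.linspace(0, 1, 21)    # |H| = 21
true_risk = np.abs(thresholds - 0.5)  # R(h_t) under uniform D

m = 200
X = rng.uniform(0, 1, size=m)
y = (X >= 0.5).astype(int)

# Empirical risk of every h in H on the sample S.
emp_risk = np.array([np.mean((X >= t).astype(int) != y)
                     for t in thresholds])

erm = np.argmin(emp_risk)                            # index of h_S^ERM
estimation = true_risk[erm] - true_risk.min()        # R(h_ERM) - R(h*)
uniform_gap = np.max(np.abs(true_risk - emp_risk))   # sup_h |R - R̂|
print(f"{estimation:.4f} <= {2 * uniform_gap:.4f}")  # (42) holds
```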

  18. Questions? Thanks!
