The PAC Learning Framework
Guoqing Zheng
January 20, 2015
Intro

Questions about learning:
- What can be learned efficiently? (And what is hard to learn?)
- How many examples are needed?
- Is there a general model of learning?

Answer: ⇒ the Probably Approximately Correct (PAC) learning framework
Learning theory basics

Notation:
- $X$: input space; $Y$: label space, here $Y = \{0, 1\}$
- Concept $c : X \to Y$; concept class $C$
- Hypothesis $h$; hypothesis set $H$ ($H$ may or may not be the same as $C$)
- $S = (X_1, \ldots, X_m)$: a sample of $m$ i.i.d. examples from an unknown but fixed distribution $D$

Task of learning: use $S$ to select a hypothesis $h_S \in H$ that has small generalization error w.r.t. $c$.
Generalization error and empirical error

Generalization error:
$$R(h) = \mathbb{E}_{X \sim D}\left[\mathbf{1}(h(X) \neq c(X))\right] \quad (1)$$

Empirical error:
$$\hat{R}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}(h(X_i) \neq c(X_i)) \quad (2)$$

Relationship:
$$\mathbb{E}_{S \sim D^m}\left[\hat{R}(h)\right] = R(h) \quad (3)$$
PAC learning

A concept class $C$ is PAC-learnable if there exists an algorithm $A$ such that, for every distribution $D$ on $X$, every target concept $c \in C$, and any $\epsilon > 0$ and $\delta > 0$, after observing $m \geq \mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$ examples, $A$ returns a hypothesis $h_S$ with
$$\mathbb{P}_{S \sim D^m}\big[\underbrace{R(h_S) \leq \epsilon}_{\text{Approximately}}\big] \geq \underbrace{1 - \delta}_{\text{Probably}} \quad (4)$$

$C$ is further efficiently PAC-learnable if $A$ also runs in $\mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$ time.
Example: Learning axis-aligned rectangles

Figure: $R$ is the target rectangle, $R'$ is the constructed rectangle.

Proof that the target class is PAC-learnable:
- Construct $R_S = R'$ as the tightest rectangle containing the positive points;
- Denote by $P(R)$ the probability that a point randomly drawn from $D$ falls within $R$;
- Since $R_S \subseteq R$, error can only happen for points falling inside $R$ (but outside $R_S$);
- If $P(R) \leq \epsilon$, then $R(R_S) \leq P(R) \leq \epsilon$ always, so $\mathbb{P}(R(R_S) > \epsilon) = 0 \leq \delta$ for any $\delta > 0$.
Example: Learning axis-aligned rectangles (contd.)

Figure: $R$ is the target rectangle, $R'$ is the constructed rectangle.

Now assume $P(R) > \epsilon$. Construct four strips $r_1, r_2, r_3, r_4$ along the sides of $R$ such that $P(r_i) = \epsilon/4$ for $i = 1, 2, 3, 4$.
- If $R_S$ meets all four regions, then $R(R_S) \leq \epsilon$;
- Contrapositively, if $R(R_S) > \epsilon$, then $R_S$ must miss at least one of the four regions.
Example: Learning axis-aligned rectangles (contd.)

Hence,
$$\mathbb{P}(R(R_S) > \epsilon) \leq \mathbb{P}\left(\bigcup_{i=1}^{4} \{R_S \cap r_i = \emptyset\}\right) \quad (5)$$
$$\leq \sum_{i=1}^{4} \mathbb{P}\left(\{R_S \cap r_i = \emptyset\}\right) \quad (6)$$
$$\leq 4(1 - \epsilon/4)^m \quad (7)$$
$$\leq 4\exp(-m\epsilon/4) \quad \text{(because } 1 - x \leq e^{-x}\text{)} \quad (8)$$

Setting $4\exp(-m\epsilon/4) \leq \delta \Leftrightarrow m \geq \frac{4}{\epsilon}\log\frac{4}{\delta}$. So for any $\epsilon > 0$ and $\delta > 0$, when $m \geq \frac{4}{\epsilon}\log\frac{4}{\delta}$, $\mathbb{P}(R(R_S) > \epsilon) \leq \delta$. Also, the representation cost for a point and for a rectangle is constant. Hence the concept class of axis-aligned rectangles is PAC-learnable.
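The sample-complexity bound above can be sanity-checked empirically. Below is a minimal Python sketch (not part of the original slides), assuming a hypothetical target rectangle $R = [0.2, 0.8]^2$ and $D$ uniform on the unit square, so that the true error of the tightest rectangle $R_S \subseteq R$ is simply $\mathrm{area}(R) - \mathrm{area}(R_S)$:

```python
import math
import random

random.seed(0)

def tightest_rectangle(points):
    """R_S: the tightest axis-aligned rectangle containing the given points."""
    xs = [x for x, y in points]
    ys = [y for x, y in points]
    return (min(xs), max(xs), min(ys), max(ys))

# Hypothetical target R = [0.2, 0.8] x [0.2, 0.8]; D = uniform on [0,1]^2,
# so P(R) = 0.36 and the true error of R_S is area(R) - area(R_S).
R = (0.2, 0.8, 0.2, 0.8)
eps, delta = 0.1, 0.1
m = math.ceil(4 / eps * math.log(4 / delta))  # sample-complexity bound

trials, failures = 200, 0
for _ in range(trials):
    sample = [(random.random(), random.random()) for _ in range(m)]
    positives = [(x, y) for x, y in sample
                 if R[0] <= x <= R[1] and R[2] <= y <= R[3]]
    if not positives:
        failures += 1  # empty R_S errs on all of R, and P(R) = 0.36 > eps
        continue
    x0, x1, y0, y1 = tightest_rectangle(positives)
    true_error = 0.36 - (x1 - x0) * (y1 - y0)
    if true_error > eps:
        failures += 1

assert failures / trials <= delta
```

With $\epsilon = \delta = 0.1$ the bound gives $m = 148$; the observed failure rate should come out well below $\delta$, reflecting how loose the union bound is.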
Generalization bounds for finite H (consistent case)

For finite $H$ and a consistent hypothesis $h_S$ (i.e., $\hat{R}(h_S) = 0$), PAC-learnable if
$$m \geq \frac{1}{\epsilon}\left(\log|H| + \log\frac{1}{\delta}\right) \quad (9)$$

Proof:
$$\mathbb{P}\left[\exists h \in H : \hat{R}(h) = 0 \wedge R(h) > \epsilon\right] \quad (10)$$
$$= \mathbb{P}\left[(\hat{R}(h_1) = 0 \wedge R(h_1) > \epsilon) \vee (\hat{R}(h_2) = 0 \wedge R(h_2) > \epsilon) \vee \ldots\right] \quad (11\text{--}12)$$
$$\leq \sum_{h \in H} \mathbb{P}\left[\hat{R}(h) = 0 \wedge R(h) > \epsilon\right] \quad (13)$$
$$\leq \sum_{h \in H} \mathbb{P}\left[\hat{R}(h) = 0 \mid R(h) > \epsilon\right] \quad (14)$$
$$\leq \sum_{h \in H} (1 - \epsilon)^m = |H|(1 - \epsilon)^m \quad (15)$$
$$\leq |H| e^{-m\epsilon} \quad \text{(because } 1 - x \leq e^{-x}\text{)} \quad (16)$$
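Bound (9) comes from solving $|H| e^{-m\epsilon} \leq \delta$ for $m$. A small numeric check (illustrative values of $|H|$, $\epsilon$, $\delta$, not from the slides):

```python
import math

def consistent_sample_complexity(h_size, eps, delta):
    """Smallest integer m satisfying bound (9), so |H| * exp(-m*eps) <= delta."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# Example: |H| = 2^20, eps = 0.05, delta = 0.01.
m = consistent_sample_complexity(2 ** 20, 0.05, 0.01)
assert 2 ** 20 * math.exp(-m * 0.05) <= 0.01
```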
Hoeffding's inequality

Markov's inequality: For $X \geq 0$ and any $\epsilon > 0$,
$$\mathbb{P}(X \geq \epsilon) \leq \epsilon^{-1}\,\mathbb{E}(X) \quad (17)$$
because
$$\mathbb{E}(X) = \int_0^\infty x\,p(x)\,dx \geq \int_\epsilon^\infty x\,p(x)\,dx \geq \int_\epsilon^\infty \epsilon\,p(x)\,dx \quad (18)$$
$$= \epsilon \int_\epsilon^\infty p(x)\,dx = \epsilon\,\mathbb{P}(X \geq \epsilon) \quad (19)$$

Chernoff bounding technique: For any $X$ and any $\epsilon > 0$, $t > 0$,
$$\mathbb{P}(X \geq \epsilon) = \mathbb{P}(e^{tX} \geq e^{t\epsilon}) \leq e^{-t\epsilon}\,\mathbb{E}(e^{tX}) \quad (20)$$
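A quick Monte-Carlo sanity check of Markov's inequality (an illustrative sketch; the exponential distribution here is just an arbitrary choice of nonnegative variable):

```python
import random
import statistics

random.seed(0)
# X ~ Exp(1): nonnegative with E(X) = 1.
xs = [random.expovariate(1.0) for _ in range(100_000)]

eps = 3.0
empirical_tail = sum(x >= eps for x in xs) / len(xs)  # close to e^{-3} ~ 0.05
markov_bound = statistics.mean(xs) / eps              # close to 1/3

assert empirical_tail <= markov_bound
```

The gap between the empirical tail (~0.05) and the bound (~1/3) shows Markov's inequality is crude; the Chernoff technique tightens it by applying Markov to $e^{tX}$ and optimizing over $t$.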
Hoeffding's lemma

Hoeffding's lemma: Suppose $\mathbb{E}(X) = 0$ and $a \leq X \leq b$. Then for any $t > 0$,
$$\mathbb{E}(e^{tX}) \leq e^{\frac{t^2(b-a)^2}{8}} \quad (21)$$

Proof: By convexity of $e^{tx}$,
$$e^{tX} = e^{t\left(\frac{b-X}{b-a}a + \frac{X-a}{b-a}b\right)} \leq \frac{b-X}{b-a}\,e^{ta} + \frac{X-a}{b-a}\,e^{tb} \quad (22)$$
$$\Rightarrow \mathbb{E}(e^{tX}) \leq \frac{b}{b-a}\,e^{ta} + \frac{-a}{b-a}\,e^{tb} \equiv e^{g(u)} \quad (23)$$
where $u = t(b-a)$, $\gamma \equiv \frac{-a}{b-a}$, and $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$.
Hoeffding's lemma (contd.)

For $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$, we can verify:
- $g(0) = 0$;
- $g'(u) = -\gamma + \frac{\gamma e^u}{1 - \gamma + \gamma e^u}$, hence $g'(0) = 0$;
- Further,
$$g''(u) = \frac{(1-\gamma)\gamma e^u}{(1 - \gamma + \gamma e^u)^2} \leq \frac{(1-\gamma)\gamma e^u}{4(1-\gamma)\gamma e^u} = \frac{1}{4} \quad (24)$$
(because $(a+b)^2 \geq 4ab$).

By Taylor's theorem, $\exists\, \xi \in (0, u)$ s.t.
$$g(u) = g(0) + u\,g'(0) + \frac{u^2}{2}g''(\xi) = \frac{u^2}{2}g''(\xi) \leq \frac{u^2}{8} = \frac{t^2(b-a)^2}{8} \quad (25)$$
$$\Rightarrow \mathbb{E}(e^{tX}) \leq e^{g(u)} \leq e^{\frac{t^2(b-a)^2}{8}}. \quad (26)$$
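The lemma can be checked on a concrete case: a Rademacher variable $X$ uniform on $\{-1, +1\}$ has $\mathbb{E}(X) = 0$, $a = -1$, $b = 1$, and moment generating function $\mathbb{E}(e^{tX}) = \cosh(t)$, so (21) reduces to $\cosh(t) \leq e^{t^2/2}$:

```python
import math

# X uniform on {-1, +1}: E(X) = 0, (b - a)^2 = 4, so the bound is e^{t^2/2}.
for t in [0.1, 0.5, 1.0, 2.0, 5.0]:
    mgf = math.cosh(t)                  # E[e^{tX}] = (e^t + e^{-t}) / 2
    bound = math.exp(t * t * 4 / 8)
    assert mgf <= bound
```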
Hoeffding's inequality

Hoeffding's inequality: Let $S = \{X_1, X_2, \ldots, X_m\}$ be a sample of $m$ independent variables with $X_i \in [a, b]$ and common mean $\mu$. Let $\overline{X}_m$ be the sample mean. For any $\epsilon > 0$,
$$\mathbb{P}_{S \sim D^m}\left(\overline{X}_m - \mu \geq \epsilon\right) \leq \exp\left(-2m\epsilon^2/(b-a)^2\right) \quad (27)$$
$$\mathbb{P}_{S \sim D^m}\left(\overline{X}_m - \mu \leq -\epsilon\right) \leq \exp\left(-2m\epsilon^2/(b-a)^2\right) \quad (28)$$

Proof: For any $t > 0$,
$$\mathbb{P}_{S \sim D^m}\left(\overline{X}_m - \mu \geq \epsilon\right) = \mathbb{P}_{S \sim D^m}\left(\sum_{i=1}^{m} X_i - m\mu \geq m\epsilon\right) \quad (29)$$
$$\leq e^{-tm\epsilon}\,\mathbb{E}\left[e^{t\left(\sum_{i=1}^{m} X_i - m\mu\right)}\right] = e^{-tm\epsilon}\prod_{i=1}^{m}\mathbb{E}\left[e^{t(X_i - \mu)}\right] \quad (30)$$
$$\leq e^{-tm\epsilon}\prod_{i=1}^{m} e^{\frac{t^2(b-a)^2}{8}} = e^{-tm\epsilon + t^2 m(b-a)^2/8} \leq e^{-2m\epsilon^2/(b-a)^2} \quad (31)$$
where the last step takes the minimizing $t = 4\epsilon/(b-a)^2$. The other side is similar.
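A Monte-Carlo check of bound (27) for fair coin flips ($X_i \in \{0, 1\}$, $\mu = 1/2$, so $(b - a) = 1$); the sample size and trial count here are illustrative choices:

```python
import math
import random

random.seed(0)

m, eps, trials = 200, 0.1, 2000
mu = 0.5
violations = 0
for _ in range(trials):
    # Sample mean of m fair coin flips.
    xbar = sum(random.random() < mu for _ in range(m)) / m
    if xbar - mu >= eps:
        violations += 1

bound = math.exp(-2 * m * eps ** 2)   # (b - a) = 1 for {0,1}-valued flips
assert violations / trials <= bound
```

The observed violation rate (roughly $0.003$ for these parameters) sits comfortably under the bound $e^{-4} \approx 0.018$.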
Generalization bounds for finite H (inconsistent case)

By Hoeffding's inequality (the 0-1 loss takes values in $[0, 1]$, so $(b - a) = 1$):
$$\mathbb{P}_{S \sim D^m}\left(|\hat{R}(h) - R(h)| \geq \epsilon\right) \leq 2\exp(-2m\epsilon^2) \quad (32)$$

For a fixed $h$, w.p. $\geq 1 - \delta$:
$$R(h) \leq \hat{R}(h) + \sqrt{\frac{\log\frac{2}{\delta}}{2m}} \quad (33)$$

For finite $H$ and the inconsistent case, $\forall h \in H$, w.p. $\geq 1 - \delta$:
$$R(h) \leq \hat{R}(h) + \sqrt{\frac{\log|H| + \log\frac{2}{\delta}}{2m}} \quad (34)$$
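The complexity term in (34) is easy to explore numerically; this small sketch (illustrative values, not from the slides) confirms the expected $1/\sqrt{m}$ decay:

```python
import math

def gap(h_size, delta, m):
    """Uniform-convergence term sqrt((log|H| + log(2/delta)) / (2m)) from (34)."""
    return math.sqrt((math.log(h_size) + math.log(2 / delta)) / (2 * m))

# Doubling the sample size shrinks the gap by exactly a factor of 1/sqrt(2).
ratio = gap(1000, 0.05, 2000) / gap(1000, 0.05, 1000)
assert abs(ratio - 1 / math.sqrt(2)) < 1e-12
```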
Generalization bounds for finite H (inconsistent case)

Proof:
$$\mathbb{P}\left(\exists h \in H, |\hat{R}(h) - R(h)| > \epsilon\right) \quad (35)$$
$$= \mathbb{P}\left(|\hat{R}(h_1) - R(h_1)| > \epsilon \vee \ldots \vee |\hat{R}(h_{|H|}) - R(h_{|H|})| > \epsilon\right) \quad (36)$$
$$\leq \sum_{h \in H} \mathbb{P}\left(|\hat{R}(h) - R(h)| > \epsilon\right) \quad (37)$$
$$\leq 2|H|\exp(-2m\epsilon^2) \quad (38)$$

Setting the RHS to $\delta$ completes the proof.
Generalities

Agnostic (non-realizable) PAC learning: for all distributions $D$ over $X \times Y$, for any $\epsilon > 0$ and $\delta > 0$, and sample size $m \geq \mathrm{poly}(1/\epsilon, 1/\delta, n, \mathrm{size}(c))$, the following holds:
$$\mathbb{P}_{S \sim D^m}\left[R(h_S) - \min_{h \in H} R(h) \leq \epsilon\right] \geq 1 - \delta \quad (39)$$

Bayes hypothesis: a hypothesis $h$ such that
$$R(h) = R^* \equiv \inf_h R(h) \quad (40)$$
Note: the Bayes hypothesis may or may not be in $H$.

Estimation and approximation errors (with $h^* = \operatorname{argmin}_{h \in H} R(h)$):
$$R(h) - R^* = \underbrace{(R(h) - R(h^*))}_{\text{estimation}} + \underbrace{(R(h^*) - R^*)}_{\text{approximation}} \quad (41)$$
Generalities (contd.)

Estimation error can sometimes be bounded in terms of the generalization error from PAC. For example, for $h_S^{\mathrm{ERM}}$, the hypothesis returned by empirical risk minimization,
$$R(h_S^{\mathrm{ERM}}) - R(h^*) = R(h_S^{\mathrm{ERM}}) - \hat{R}(h_S^{\mathrm{ERM}}) + \hat{R}(h_S^{\mathrm{ERM}}) - R(h^*)$$
$$\leq R(h_S^{\mathrm{ERM}}) - \hat{R}(h_S^{\mathrm{ERM}}) + \hat{R}(h^*) - R(h^*)$$
$$\leq 2\sup_{h \in H} |R(h) - \hat{R}(h)| \quad (42)$$
where the first inequality uses $\hat{R}(h_S^{\mathrm{ERM}}) \leq \hat{R}(h^*)$, by definition of ERM.
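As an illustration (an entirely hypothetical setup, not from the slides), ERM over a small finite class of threshold classifiers on $[0, 1]$ returns a hypothesis whose true error is small:

```python
import random

random.seed(0)

# Hypothetical setup: finite class H of threshold classifiers h_t(x) = 1[x >= t]
# on a grid, D uniform on [0,1], and a target threshold 0.37 that falls off the
# grid (so the best h* in H still has a small nonzero error).
thresholds = [i / 20 for i in range(21)]
target_t = 0.37

sample = [(x, int(x >= target_t))
          for x in (random.random() for _ in range(500))]

def emp_err(t):
    """Empirical 0-1 error of the threshold classifier h_t on the sample."""
    return sum(int(x >= t) != y for x, y in sample) / len(sample)

h_erm = min(thresholds, key=emp_err)   # empirical risk minimization over H

# Under uniform D, the true error is the mass of the disagreement region.
true_err = abs(h_erm - target_t)
assert true_err <= 0.1
```

Here the ERM threshold lands next to the target on the grid, consistent with (42): its excess risk is controlled by the worst-case deviation between empirical and true errors over the (small, finite) class.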
Questions? Thanks!