  1. 10-601 Introduction to Machine Learning, Machine Learning Department, School of Computer Science, Carnegie Mellon University. PAC Learning + Midterm Review. Matt Gormley, Lecture 15, March 7, 2018. 1

  2. ML Big Picture
     Learning Paradigms (what data is available and when? what form of prediction?): supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, active learning, imitation learning, domain adaptation, online learning, density estimation, recommender systems, feature learning, manifold learning, dimensionality reduction, ensemble learning, distant supervision, hyperparameter optimization
     Problem Formulation (what is the structure of our output prediction?): boolean → Binary Classification; categorical → Multiclass Classification; ordinal → Ordinal Classification; real → Regression; ordering → Ranking; multiple discrete → Structured Prediction; multiple continuous → (e.g. dynamical systems); both discrete & continuous → (e.g. mixed graphical models)
     Theoretical Foundations (what principles guide learning?): probabilistic, information theoretic, evolutionary search, ML as optimization
     Application Areas (key challenges?): NLP, Speech, Computer Vision, Robotics, Medicine, Search
     Facets of Building ML Systems (how to build systems that are robust, efficient, adaptive, effective?): 1. Data prep; 2. Model selection; 3. Training (optimization / search); 4. Hyperparameter tuning on validation data; 5. (Blind) Assessment on test data
     Big Ideas in ML (which are the ideas driving development of the field?): inductive bias, generalization / overfitting, bias-variance decomposition, generative vs. discriminative, deep nets, graphical models, PAC learning, distant rewards 2

  3. LEARNING THEORY 3

  4. Questions For Today
     1. Given a classifier with zero training error, what can we say about generalization error? (Sample Complexity, Realizable Case)
     2. Given a classifier with low training error, what can we say about generalization error? (Sample Complexity, Agnostic Case)
     3. Is there a theoretical justification for regularization to avoid overfitting? (Structural Risk Minimization) 4

  5. PAC/SLT models for Supervised Learning
     Data Source: distribution D on X
     Expert / Oracle: labels each example with the target concept c* : X → Y
     Learning Algorithm: receives labeled examples (x1, c*(x1)), …, (xm, c*(xm)) and outputs a hypothesis h : X → Y
     [Figure: an example decision-tree hypothesis, splitting on x1 > 5 and x6 > 2 with leaves labeled +1 / -1]
     Slide from Nina Balcan 6

  6. Two Types of Error
     True Error (aka. expected risk)
     Train Error (aka. empirical risk) 7
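     The error definitions themselves are not captured in the extracted text; as a hedged reconstruction using the notation of the PAC/SLT slide (distribution D, target concept c*, sample of size m), the standard definitions are:

     $$R(h) = \Pr_{x \sim D}\left[ c^*(x) \neq h(x) \right] \qquad \text{(true error, aka. expected risk)}$$

     $$\hat{R}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\left[ c^*(x_i) \neq h(x_i) \right] \qquad \text{(train error, aka. empirical risk)}$$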

  7. PAC / SLT Model 8

  8. Three Hypotheses of Interest 9
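     The body of this slide is not captured in the extracted text; as a hedged sketch consistent with the standard treatment, the three hypotheses usually compared here are the target concept, the best hypothesis in H, and the empirical risk minimizer:

     $$c^{*} \;(\text{the target concept}), \qquad h^{*} = \arg\min_{h \in H} R(h), \qquad \hat{h} = \arg\min_{h \in H} \hat{R}(h)$$

     The gap between R(ĥ) and R(h*) is what the agnostic-case bounds control, while the gap between R(h*) and zero reflects how expressive H is.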

  9. PAC LEARNING 10

  10. Probably Approximately Correct (PAC) Learning Whiteboard: – PAC Criterion – Meaning of “Probably Approximately Correct” – PAC Learnable – Consistent Learner – Sample Complexity 11
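     As a hedged sketch of the whiteboard content (standard definitions, not transcribed from the lecture): the PAC criterion requires that, with probability at least 1 − δ over the draw of the training sample ("probably"), the chosen hypothesis has true error at most ε ("approximately correct"):

     $$\Pr\left[ R(\hat{h}) \leq \epsilon \right] \geq 1 - \delta$$

     A hypothesis class H is PAC learnable if for every ε, δ > 0 this criterion can be met with a number of samples polynomial in 1/ε and 1/δ; a consistent learner is one that outputs a hypothesis with zero training error whenever such a hypothesis exists in H.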

  11. Generalization and Overfitting Whiteboard: – Realizable vs. Agnostic Cases – Finite vs. Infinite Hypothesis Spaces 12

  12. PAC Learning 13

  13. SAMPLE COMPLEXITY RESULTS 14

  14. Sample Complexity Results: Four Cases we care about… We’ll start with the finite case… Realizable / Agnostic 15

  15. Sample Complexity Results Four Cases we care about… Realizable Agnostic 16
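     The bounds in the table are not captured in the extracted text; the standard finite-|H| results (a reconstruction, consistent with the observations on the later "Sample Complexity Results" slide) are:

     Realizable case: with probability at least 1 − δ, every h ∈ H with zero training error satisfies R(h) ≤ ε, provided

     $$m \geq \frac{1}{\epsilon} \left[ \ln |H| + \ln \frac{1}{\delta} \right]$$

     Agnostic case: with probability at least 1 − δ, every h ∈ H satisfies |R(h) − \hat{R}(h)| ≤ ε, provided

     $$m \geq \frac{1}{2 \epsilon^2} \left[ \ln |H| + \ln \frac{2}{\delta} \right]$$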

  16. Example: Conjunctions. In-Class Quiz: Suppose H = the class of conjunctions over x in {0,1}^M. If M = 10, ε = 0.1, δ = 0.01, how many examples suffice? Realizable / Agnostic 17
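     A minimal Python sketch of the quiz computation, using the realizable-case bound above and assuming |H| = 3^M for conjunctions over M boolean variables (each variable can appear un-negated, negated, or not at all); the closed form for |H| is an assumption, not stated on the slide:

     ```python
     import math

     def realizable_sample_complexity(log_H, epsilon, delta):
         """Realizable-case bound: m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
         return math.ceil((log_H + math.log(1.0 / delta)) / epsilon)

     # In-class quiz numbers: M = 10, epsilon = 0.1, delta = 0.01
     M, epsilon, delta = 10, 0.1, 0.01
     log_H = M * math.log(3)   # assumption: |H| = 3^M conjunctions over M boolean variables
     print(realizable_sample_complexity(log_H, epsilon, delta))   # -> 156
     ```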

  17. Sample Complexity Results Four Cases we care about… Realizable Agnostic 18

  18. Sample Complexity Results: Four Cases we care about…
     Realizable:
     1. Bound is inversely linear in epsilon (e.g. halving the error requires double the examples)
     2. Bound is only logarithmic in |H| (e.g. quadrupling the hypothesis space only requires double the examples)
     Agnostic:
     1. Bound is inversely quadratic in epsilon (e.g. halving the error requires 4x the examples)
     2. Bound is only logarithmic in |H| (i.e. same as Realizable case) 19

  19. Generalization and Overfitting Whiteboard: – Sample Complexity Bounds (Agnostic Case) – Corollary (Agnostic Case) – Empirical Risk Minimization – Structural Risk Minimization – Motivation for Regularization 22
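     A hedged sketch of how the agnostic-case bound motivates Structural Risk Minimization: solving the bound above for ε gives, with probability at least 1 − δ, simultaneously for all h ∈ H,

     $$R(h) \leq \hat{R}(h) + \sqrt{\frac{\ln |H| + \ln \frac{2}{\delta}}{2 m}}$$

     so choosing h by minimizing training error plus a complexity penalty, rather than training error alone, directly controls an upper bound on true error. This is the theoretical justification for regularization asked about in Question 3.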

  20. Sample Complexity Results Four Cases we care about… Realizable Agnostic We need a new definition of “complexity” for a Hypothesis space for these results (see VC Dimension ) 23

  21. Sample Complexity Results Four Cases we care about… Realizable Agnostic 24

  22. VC DIMENSION 25

  23. What if H is infinite?
     E.g., linear separators in R^d
     E.g., thresholds on the real line (threshold w)
     E.g., intervals on the real line [a, b]
     [Figures: example labelings of + and - points for each case] 26

  24. Shattering, VC-dimension
     Definition: H[S] is the set of splittings of dataset S using concepts from H. H shatters S if |H[S]| = 2^{|S|}.
     A set of points S is shattered by H if there are hypotheses in H that split S in all of the 2^{|S|} possible ways; i.e., all possible ways of classifying points in S are achievable using concepts in H.
     Definition: VC-dimension (Vapnik-Chervonenkis dimension)
     The VC-dimension of a hypothesis space H is the cardinality of the largest set S that can be shattered by H. If arbitrarily large finite sets can be shattered by H, then VCdim(H) = ∞ 27

  25. Shattering, VC-dimension
     Definition: VC-dimension (Vapnik-Chervonenkis dimension)
     The VC-dimension of a hypothesis space H is the cardinality of the largest set S that can be shattered by H. If arbitrarily large finite sets can be shattered by H, then VCdim(H) = ∞
     To show that the VC-dimension is d:
     – there exists a set of d points that can be shattered
     – there is no set of d+1 points that can be shattered
     Fact: If H is finite, then VCdim(H) ≤ log2(|H|). 28

  26. Shattering, VC-dimension
     If the VC-dimension is d, that means there exists a set of d points that can be shattered, but there is no set of d+1 points that can be shattered.
     E.g., H = thresholds on the real line: VCdim(H) = 1
     E.g., H = intervals on the real line: VCdim(H) = 2
     [Figures: example labelings of points on the real line for each case] 29
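     A small Python sketch (not from the deck; helper names are illustrative) that brute-force checks whether a particular set of points on the real line can be shattered by the threshold class and by the interval class, matching the VCdim(H) = 1 and VCdim(H) = 2 claims above:

     ```python
     from itertools import product

     def shatters(points, achievable_labelings):
         """H shatters S iff all 2^|S| labelings of S are achievable by some h in H."""
         all_labelings = set(product([0, 1], repeat=len(points)))
         return all_labelings <= achievable_labelings(points)

     def threshold_labelings(points):
         # h_w(x) = 1 iff x > w; thresholds at each point and one below the minimum
         # cover every distinct labeling this class can produce on these points
         candidates = [min(points) - 1.0] + sorted(points)
         return {tuple(int(x > w) for x in points) for w in candidates}

     def interval_labelings(points):
         # h_{a,b}(x) = 1 iff a <= x <= b; endpoints drawn from the points themselves suffice
         labelings = {tuple(0 for _ in points)}   # empty interval labels all points 0
         for a in points:
             for b in points:
                 labelings.add(tuple(int(a <= x <= b) for x in points))
         return labelings

     print(shatters([0.0], threshold_labelings))            # True:  thresholds shatter 1 point
     print(shatters([0.0, 1.0], threshold_labelings))       # False: labeling (+, -) is unachievable
     print(shatters([0.0, 1.0], interval_labelings))        # True:  intervals shatter 2 points
     print(shatters([0.0, 1.0, 2.0], interval_labelings))   # False: labeling (+, -, +) is unachievable
     ```

     For the "no set of d+1 points can be shattered" direction, a single failing example is not a proof; the brute-force check just illustrates why these particular sets fail.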

  27. Shattering, VC-dimension
     If the VC-dimension is d, that means there exists a set of d points that can be shattered, but there is no set of d+1 points that can be shattered.
     E.g., H = union of k intervals on the real line: VCdim(H) = 2k
     VCdim(H) ≥ 2k: a sample of size 2k can be shattered (treat each pair of points as a separate case of intervals)
     VCdim(H) < 2k + 1: a sample of size 2k + 1 with alternating labels + - + - + … cannot be shattered 30

  28. Shattering, VC-dimension
     E.g., H = linear separators in R^2: VCdim(H) ≥ 3 (three non-collinear points can be labeled in all 2^3 possible ways by a linear separator) 31

  29. Shattering, VC-dimension
     E.g., H = linear separators in R^2: VCdim(H) < 4
     Case 1: one point inside the triangle formed by the others. Cannot label the inside point as positive and the outside points as negative.
     Case 2: all points on the boundary (convex hull). Cannot label two diagonally as positive and the other two as negative.
     Fact: VCdim of linear separators in R^d is d + 1. 32

  30. Sample Complexity Results Four Cases we care about… Realizable Agnostic 34
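     The infinite-|H| entries of the table are not captured in the extracted text; in their standard big-O form (a reconstruction, constants omitted), with ln|H| replaced by the VC-dimension:

     Realizable case:

     $$m = O\!\left( \frac{1}{\epsilon} \left[ \mathrm{VCdim}(H) \log \frac{1}{\epsilon} + \log \frac{1}{\delta} \right] \right)$$

     Agnostic case:

     $$m = O\!\left( \frac{1}{\epsilon^2} \left[ \mathrm{VCdim}(H) + \log \frac{1}{\delta} \right] \right)$$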

  31. Questions For Today
     1. Given a classifier with zero training error, what can we say about generalization error? (Sample Complexity, Realizable Case)
     2. Given a classifier with low training error, what can we say about generalization error? (Sample Complexity, Agnostic Case)
     3. Is there a theoretical justification for regularization to avoid overfitting? (Structural Risk Minimization) 39

  32. Learning Theory Objectives
     You should be able to…
     • Identify the properties of a learning setting and assumptions required to ensure low generalization error
     • Distinguish true error, train error, test error
     • Define PAC and explain what it means to be approximately correct and what occurs with high probability
     • Apply sample complexity bounds to real-world learning examples
     • Distinguish between a large sample and a finite sample analysis
     • Theoretically motivate regularization 40

  33. Outline • Midterm Exam Logistics • Sample Questions • Classification and Regression: The Big Picture • Q&A 41

  34. MIDTERM EXAM LOGISTICS 42

  35. Midterm Exam
     • Time / Location
       – Time: Evening Exam, Thu, March 22, 6:30pm – 8:30pm
       – Room: We will contact each student individually with your room assignment. The rooms are not based on section.
       – Seats: There will be assigned seats. Please arrive early.
       – Please watch Piazza carefully for announcements regarding room / seat assignments.
     • Logistics
       – Format of questions: multiple choice; true / false (with justification); derivations; short answers; interpreting figures; implementing algorithms on paper
       – No electronic devices
       – You are allowed to bring one 8½ x 11 sheet of notes (front and back) 43

  36. Midterm Exam
     • How to Prepare
       – Attend the midterm review lecture (right now!)
       – Review the prior year’s exam and solutions (we’ll post them)
       – Review this year’s homework problems
       – Consider whether you have achieved the “learning objectives” for each lecture / section 44
