  1. Introduction to Machine Learning CMU-10701 8. Stochastic Convergence Barnabás Póczos

  2. Motivation

  3. What have we seen so far? Several algorithms that seem to work fine on training datasets:
  • Linear regression
  • Naïve Bayes classifier
  • Perceptron classifier
  • Support Vector Machines for regression and classification
  How good are these algorithms on unknown test sets? How many training samples do we need to achieve a small error? What is the smallest possible error we can achieve? ⇒ Learning theory. To answer these questions, we will need a few powerful tools.

  4. Basic Estimation Theory 4

  5. Rolling a Die: Estimation of the Parameters θ_1, θ_2, …, θ_6. [Figure: MLE estimates of the six face probabilities after 12, 24, 60, and 120 rolls.] Does the MLE estimate converge to the right value? How fast does it converge?

  6. Rolling a Die: Calculating the Empirical Average. Does the empirical average converge to the true mean? How fast does it converge?

  7. Rolling a Die: Calculating the Empirical Average. [Figure: 5 sample traces of the running empirical average.] How fast do they converge? (A small simulation of this figure follows below.)
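
A minimal sketch of the experiment behind this figure, assuming a fair six-sided die (the use of NumPy/Matplotlib is my choice, not the slides'):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, traces = 1000, 5

for _ in range(traces):
    rolls = rng.integers(1, 7, size=n)                 # i.i.d. fair die rolls
    running_mean = np.cumsum(rolls) / np.arange(1, n + 1)
    plt.plot(running_mean)

plt.axhline(3.5, linestyle="--", label="true mean = 3.5")
plt.xlabel("number of rolls n")
plt.ylabel("empirical average")
plt.legend()
plt.show()
```

Each trace wanders early on and then settles near 3.5, which is exactly the behavior the law of large numbers (later in this lecture) formalizes.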

  8. Key Questions
  • Do empirical averages converge?
  • Does the MLE converge in the die-rolling problem?
  • What do we mean by convergence?
  • What is the rate of convergence?
  I want to know the coin parameter θ ∈ [0,1] within ε = 0.1 error, with probability at least 1 − δ = 0.95. How many flips do I need? (A worked bound follows below.)
  Applications:
  • drug testing (Does this drug modify the average blood pressure?)
  • user interface design (We will see this later)
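
The slides pose this question without answering it here. A worked answer using the Chebyshev bound derived later in this deck, where θ̂_n is the empirical frequency of heads in n flips:

```latex
% Var(\hat\theta_n) = \theta(1-\theta)/n \le 1/(4n), so by Chebyshev's inequality:
\Pr\!\left(|\hat\theta_n - \theta| \ge \varepsilon\right)
  \le \frac{\theta(1-\theta)}{n\varepsilon^2}
  \le \frac{1}{4n\varepsilon^2} \le \delta
\quad\Longrightarrow\quad
n \ge \frac{1}{4\varepsilon^2\delta} = \frac{1}{4\,(0.1)^2\,(0.05)} = 500.
```

So 500 flips suffice by this (loose) bound; sharper exponential tail bounds such as Hoeffding's inequality give a considerably smaller sample size.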

  9. Outline
  Theory:
  • Stochastic convergence:
    – Weak convergence = convergence in distribution
    – Convergence in probability
    – Strong convergence (almost sure)
    – Convergence in L_p norm
  • Limit theorems:
    – Law of large numbers
    – Central limit theorem
  • Tail bounds:
    – Markov, Chebyshev

  10. Stochastic Convergence: Definitions and Properties

  11. Convergence of vectors

  12. Convergence in Distribution = Weak Convergence = Convergence in Law. Let {Z, Z_1, Z_2, …} be a sequence of random variables. Notation: Z_n →d Z. Definition: Z_n →d Z iff lim_{n→∞} F_{Z_n}(t) = F_Z(t) at every point t where the distribution function F_Z is continuous. This is the "weakest" of the convergence notions discussed here.

  13. Convergence in Distribution = Weak Convergence = Convergence in Law. Only the distribution functions converge! (NOT the values of the random variables themselves.) [Figure: a CDF that jumps from 0 to 1 at a point a.]

  14. Convergence in Distribution = Weak Convergence = Convergence in Law. Continuity is important! Example: let Z_n be uniform on [0, 1/n] and let Z be the constant 0. Then Z_n →d Z: for t < 0, F_{Z_n}(t) = 0 = F_Z(t), and for t > 0, F_{Z_n}(t) = 1 = F_Z(t) once n > 1/t. At t = 0, however, F_{Z_n}(0) = 0 for all n while F_Z(0) = 1; this does not matter, because t = 0 is a discontinuity point of F_Z. In this example the limit Z is discrete, not random (constant 0), although Z_n is a continuous random variable.
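
A quick numerical sanity check of this example (a sketch; the Uniform[0, 1/n] choice follows the reconstruction above):

```python
import numpy as np

rng = np.random.default_rng(0)
t = 0.05                                        # a continuity point of F_Z (any t > 0 works)

for n in [1, 10, 100, 1000]:
    z_n = rng.uniform(0, 1 / n, size=100_000)   # Z_n ~ Uniform[0, 1/n]
    print(n, (z_n <= t).mean())                 # empirical F_{Z_n}(t) -> F_Z(t) = 1
```

At t = 0 the same experiment would print 0.0 forever even though F_Z(0) = 1, which is why the definition excludes discontinuity points.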

  15. Convergence in Distribution = Weak Convergence = Convergence in Law: Properties. Z_n and Z can still be independent even if their distributions are the same! Scheffé's theorem: convergence of the probability density functions ⇒ convergence in distribution. Example: the Central Limit Theorem.
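
A minimal CLT illustration (my example, not the slides': standardized sums of i.i.d. uniform variables approach the standard normal in distribution):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, reps = 500, 20_000

# Sums of n i.i.d. Uniform[0,1] variables: mean n/2, variance n/12.
x = rng.uniform(0.0, 1.0, size=(reps, n))
s = (x.sum(axis=1) - n * 0.5) / sqrt(n / 12)    # standardized sums

for t in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    phi = 0.5 * (1.0 + erf(t / sqrt(2)))        # standard normal CDF Phi(t)
    print(f"t={t:+.1f}  empirical={(s <= t).mean():.4f}  Phi(t)={phi:.4f}")
```

The empirical CDF of the standardized sums matches Phi(t) to a few decimal places already for moderate n, even though each summand is far from Gaussian.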

  16. Convergence in Probability. Notation: Z_n →P Z. Definition: Z_n →P Z iff for every ε > 0, lim_{n→∞} Pr(|Z_n − Z| > ε) = 0. Unlike convergence in distribution, this indeed measures how far the values Z_n(ω) and Z(ω) are from each other.

  17. Almost Sure Convergence. Notation: Z_n →a.s. Z. Definition: Z_n →a.s. Z iff Pr({ω : lim_{n→∞} Z_n(ω) = Z(ω)}) = 1.

  18. Convergence in p-th Mean (L_p norm). Notation: Z_n →L_p Z. Definition: for p ≥ 1, Z_n →L_p Z iff lim_{n→∞} E[|Z_n − Z|^p] = 0. Properties: convergence in L_p implies convergence in probability (apply Markov's inequality to |Z_n − Z|^p); for p ≥ q ≥ 1, convergence in L_p implies convergence in L_q.

  19. Counter Examples

  20. Further Readings on Stochastic Convergence
  • http://en.wikipedia.org/wiki/Convergence_of_random_variables
  • Patrick Billingsley: Probability and Measure
  • Patrick Billingsley: Convergence of Probability Measures

  21. Finite Sample Tail Bounds: Useful tools!

  22. Markov's Inequality. If X is any nonnegative random variable and a > 0, then Pr(X ≥ a) ≤ E[X]/a. Proof: decompose the expectation: E[X] = E[X · 1{X ≥ a}] + E[X · 1{X < a}] ≥ a · Pr(X ≥ a). Corollary: Chebyshev's inequality.

  23. Chebyshev's Inequality. If X is any random variable with finite variance and a > 0, then Pr(|X − E[X]| ≥ a) ≤ Var(X)/a². Here Var(X) is the variance of X, defined as Var(X) = E[(X − E[X])²]. Proof: apply Markov's inequality to the nonnegative random variable (X − E[X])² with threshold a²: Pr(|X − E[X]| ≥ a) = Pr((X − E[X])² ≥ a²) ≤ E[(X − E[X])²]/a² = Var(X)/a².
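
A quick numerical check of both inequalities (a sketch; the Exponential(1) test distribution, with E[X] = 1 and Var(X) = 1, is my choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # nonnegative, E[X] = 1, Var(X) = 1
a = 3.0

markov_bound = x.mean() / a                      # Pr(X >= a) <= E[X]/a
cheby_bound = x.var() / a**2                     # Pr(|X - E[X]| >= a) <= Var(X)/a^2

print("P(X >= a)        =", (x >= a).mean(), "<=", markov_bound)
print("P(|X - EX| >= a) =", (np.abs(x - x.mean()) >= a).mean(), "<=", cheby_bound)
```

Both bounds hold with plenty of slack here; tail bounds trade tightness for generality, since they use only one or two moments of the distribution.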

  24. Generalizations of Chebyshev's Inequality. Chebyshev: Pr(|X − E[X]| ≥ a) ≤ Var(X)/a². This is equivalent to: Pr(E[X] − a < X < E[X] + a) ≥ 1 − Var(X)/a². Sharper two-sided variants exist for the symmetric case (X has a symmetric distribution) and the asymmetric case (X has an asymmetric distribution). There are lots of other generalizations, for example for multivariate X.

  25. Higher Moments? Markov: Pr(X ≥ a) ≤ E[X]/a for nonnegative X. Chebyshev: Pr(|X − E[X]| ≥ a) ≤ Var(X)/a². Higher moments: Pr(|X − E[X]| ≥ a) ≤ E[|X − E[X]|^n]/a^n, where n ≥ 1. Other functions instead of polynomials? The exp function: for any t > 0, Pr(X ≥ a) = Pr(e^{tX} ≥ e^{ta}) ≤ E[e^{tX}]/e^{ta}. Proof: apply Markov's inequality to the nonnegative random variable e^{tX}. (A comparison of these bounds follows below.)
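
To see why the exponential-moment trick helps, here is a sketch comparing the three bounds on the tail of an Exponential(1) variable (the closed forms are mine, derived from E[X] = 1, Var(X) = 1, and E[e^{tX}] = 1/(1 − t) for t < 1):

```python
import numpy as np

# Tail Pr(X >= a) for X ~ Exponential(1); the true value is exp(-a).
# Chebyshev is applied via Pr(X >= a) <= Pr(|X - 1| >= a - 1) for a > 1.
# The exp-moment bound e^{-ta}/(1 - t) is minimized at t = 1 - 1/a.
for a in [3.0, 5.0, 10.0]:
    markov = 1.0 / a
    chebyshev = 1.0 / (a - 1.0) ** 2
    t = 1.0 - 1.0 / a
    exp_bound = np.exp(-t * a) / (1.0 - t)      # equals a * exp(1 - a)
    print(f"a={a:>4}: true={np.exp(-a):.2e}  Markov={markov:.2e}  "
          f"Chebyshev={chebyshev:.2e}  exp-moment={exp_bound:.2e}")
```

The polynomial bounds decay like 1/a and 1/a², while the optimized exponential-moment bound decays like e^{−a}, so it eventually dominates; this is the idea behind Chernoff-style bounds.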

  26. Law of Large Numbers

  27. Do empirical averages converge? Chebyshev's inequality is good enough to study the question: Do the empirical averages converge to the true mean? Answer: Yes, they do. (Law of large numbers)

  28. Law of Large Numbers. Let X_1, X_2, … be i.i.d. random variables with mean μ, and let X̄_n = (1/n) Σ_{i=1}^n X_i denote the empirical average. Weak Law of Large Numbers: X̄_n →P μ (convergence in probability). Strong Law of Large Numbers: X̄_n →a.s. μ (almost sure convergence).

  29. Weak Law of Large Numbers. Proof I: Assume finite variance σ². (Not very important, but it simplifies the proof.) Since the X_i are i.i.d., E[X̄_n] = μ and Var(X̄_n) = σ²/n. Therefore, by Chebyshev's inequality, for any ε > 0: Pr(|X̄_n − μ| < ε) ≥ 1 − σ²/(n ε²). As n approaches infinity, this expression approaches 1, hence X̄_n →P μ.
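
A sketch that estimates the deviation probability in this proof by Monte Carlo and compares it with the Chebyshev bound (die rolls again; σ² = 35/12 is the variance of a single fair die roll):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, var, eps, reps = 3.5, 35 / 12, 0.25, 10_000   # fair die: E[X] = 3.5, Var(X) = 35/12

for n in [10, 100, 1000]:
    means = rng.integers(1, 7, size=(reps, n)).mean(axis=1)
    p_dev = (np.abs(means - mu) >= eps).mean()     # Monte Carlo deviation probability
    print(f"n={n:>5}: P(|mean - mu| >= eps) ~ {p_dev:.4f}  "
          f"Chebyshev bound = {var / (n * eps**2):.4f}")
```

The true deviation probability falls much faster than σ²/(n ε²); Chebyshev is loose but already enough to prove the weak law.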

  30. What we have learned today
  Theory:
  • Stochastic convergence:
    – Weak convergence = convergence in distribution
    – Convergence in probability
    – Strong convergence (almost sure)
    – Convergence in L_p norm
  • Limit theorems:
    – Law of large numbers
    – Central limit theorem
  • Tail bounds:
    – Markov, Chebyshev

  31. Thanks for your attention!
