Statistics for Machine Learning Prof. Seungchul Lee Industrial AI - PowerPoint PPT Presentation


  1. Statistics for Machine Learning, Prof. Seungchul Lee, Industrial AI Lab.

  2. Statistics and Probability (figure: data and model, linked by statistics and probability)

  3. Populations and Samples
     • A population includes all the elements from a set of data
     • A parameter is a quantity computed from a population
       – mean, μ
       – variance, σ²
     • A sample is a subset of the population
       – one or more observations
     • A statistic is a quantity computed from a sample
       – sample mean, x̄
       – sample variance, s²
       – sample correlation, S_xy
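
The distinction between parameters and statistics can be sketched with Python's standard library. The population below (10,000 Gaussian values), the sample size of 100, and the seed are all made up for illustration, and the n − 1 denominator for the sample variance is the usual convention rather than something the slide states.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

# Hypothetical population: 10,000 values. Any finite data set would do.
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Parameters are computed from the whole population.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population, mu)  # divides by the population size

# A sample is a subset of the population.
sample = rng.sample(population, 100)

# Statistics are computed from the sample only.
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample, x_bar)        # divides by (sample size - 1)

print(f"population: mean={mu:.2f}, variance={sigma2:.2f}")
print(f"sample:     mean={x_bar:.2f}, variance={s2:.2f}")
```

The sample values land near the population values but do not equal them, which is the gap that inference has to account for.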

  4. How to Generate Random Numbers
     • Data are sampled from a population, process, or generative model
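
As a minimal sketch, the standard `random` module can play the role of the generative model. The three models below (uniform, standard normal, fair die) and the seed are illustrative choices, not taken from the slides.

```python
import random

rng = random.Random(42)  # seeded generator; the seed value is arbitrary

# Three hypothetical generative models for the "population/process".
uniform_draws = [rng.random() for _ in range(5)]        # U(0, 1)
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(5)]  # N(0, 1)
die_draws = [rng.randint(1, 6) for _ in range(5)]       # fair die, faces 1..6

print(uniform_draws)
print(normal_draws)
print(die_draws)
```

Re-running with the same seed reproduces the same numbers; changing the seed gives a different set of realizations from the same models.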

  5. Histogram
     • Graphical representation of a data distribution ⇒ gives a rough sense of the density of the data
     • (figure: histogram; vertical axis counts/freq, horizontal axis bin)
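
A histogram is just a count per bin. The sketch below assumes equal-width bins on a fixed interval; the helper name `histogram`, the bin edges, and the Gaussian test data are illustrative.

```python
import random

def histogram(data, n_bins, lo, hi):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in data:
        if lo <= x <= hi:
            # Clamp so that x == hi lands in the last bin.
            k = min(int((x - lo) / width), n_bins - 1)
            counts[k] += 1
    return counts

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
counts = histogram(data, n_bins=8, lo=-4.0, hi=4.0)

# Text rendering: one '#' per 10 observations, one row per unit-width bin.
for k, c in enumerate(counts):
    left = -4.0 + k * 1.0
    print(f"[{left:+.0f}, {left + 1:+.0f}) {'#' * (c // 10)}")
```

Values outside the chosen interval are dropped here, so the counts may sum to slightly less than the number of observations.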

  6. Inference
     • The true population or process is modeled probabilistically
     • Sampling supplies us with realizations from the probability model
     • We compute something, but recognize that we could just as easily have gotten a different set of realizations

  7. Inference

  8. Inference
     • We want to infer the characteristics of the true probability model from our one sample.
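
The point that a different set of realizations was just as likely can be illustrated by repeating the sampling. The model N(mean = 3, sd = 2), the sample size of 30, and the helper `draw_sample` are assumptions made for this sketch.

```python
import random
import statistics

rng = random.Random(7)

def draw_sample(size):
    """One set of realizations from a hypothetical model, N(mean=3, sd=2)."""
    return [rng.gauss(3.0, 2.0) for _ in range(size)]

# Five independent samples of the same size give five different estimates
# of the same underlying mean.
estimates = [statistics.fmean(draw_sample(30)) for _ in range(5)]
print([round(e, 3) for e in estimates])
```

In practice only one of these samples is available, so the spread among the estimates is what an inference procedure has to quantify.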

  9. The Law of Large Numbers
     • The sample mean converges to the population mean as the sample size gets large
     • True for any probability density function with a finite mean
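
A quick numerical check of this convergence, assuming an exponential population with rate 2 (so the population mean is 0.5). The distribution, seed, and sample sizes are arbitrary choices for the sketch.

```python
import random

rng = random.Random(3)

def sample_mean(size):
    """Mean of `size` draws from an exponential distribution with rate 2 (mean 0.5)."""
    return sum(rng.expovariate(2.0) for _ in range(size)) / size

for size in (10, 100, 10_000, 100_000):
    print(f"size={size:>7}: sample mean = {sample_mean(size):.4f}")
```

The printed means should drift toward 0.5 as the size grows, although any single small sample can still be noticeably off.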

  10. Sample Mean and Sample Size
     • Sample mean and sample variance

  11. The Central Limit Theorem
     • The sample mean (not the individual samples) is approximately normally distributed as the sample size m → ∞
     • More samples provide more confidence (or less uncertainty)
     • Note: this holds regardless of the population's distribution, provided its variance is finite
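
One way to probe this numerically is to draw many sample means from a non-Gaussian population and compare them with the Gaussian coverage figures of roughly 68% and 95%. The U(0, 1) population, 50 observations per sample, and 5,000 repetitions are assumptions of this sketch.

```python
import math
import random

rng = random.Random(11)

def sample_mean(size):
    """Mean of `size` draws from U(0, 1), a decidedly non-Gaussian population."""
    return sum(rng.random() for _ in range(size)) / size

size = 50  # observations per sample
means = [sample_mean(size) for _ in range(5_000)]

# For U(0, 1): population mean 1/2, population variance 1/12,
# so the sample mean has standard deviation sqrt(1 / (12 * size)).
expected_sd = math.sqrt(1 / 12 / size)

# A Gaussian puts about 68% of its mass within one standard deviation
# of the mean and about 95% within two.
within_1 = sum(abs(v - 0.5) <= expected_sd for v in means) / len(means)
within_2 = sum(abs(v - 0.5) <= 2 * expected_sd for v in means) / len(means)
print(f"within 1 sd: {within_1:.3f}, within 2 sd: {within_2:.3f}")
```

The individual draws stay uniform no matter how many are taken; only the distribution of their mean takes on the bell shape.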

  12. Uniform Distribution: x ~ U(0, 1)

  13. Sample Size

  14. Variance Gets Smaller as m Gets Larger
     • The distribution of the sample mean appears approximately Gaussian
     • Numerical demonstration that the sample mean follows a Gaussian distribution
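
The shrinking variance can be checked against the standard result that the variance of the sample mean equals the population variance divided by the sample size. The sketch assumes a U(0, 1) population (variance 1/12); the helper name, the sizes, and the 2,000 repetitions are illustrative.

```python
import random
import statistics

rng = random.Random(5)

def variance_of_sample_mean(size, repeats=2_000):
    """Empirical variance of the mean of `size` draws from U(0, 1)."""
    means = [
        sum(rng.random() for _ in range(size)) / size
        for _ in range(repeats)
    ]
    return statistics.pvariance(means)

for size in (5, 20, 80):
    observed = variance_of_sample_mean(size)
    theory = (1 / 12) / size  # population variance of U(0, 1) divided by size
    print(f"size={size:>3}: observed={observed:.5f}, theory={theory:.5f}")
```

Quadrupling the sample size should cut the variance of the sample mean to about a quarter, which halves its standard deviation.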

  15. Multivariate Statistics
     • m observations x_1, x_2, ⋯, x_m

  16. Correlation of Two Random Variables
     • Correlation
       – Strength of the linear relationship between two variables, x and y

  17. Correlation of Two Random Variables
     • Assume

  18. Correlation Coefficient
     • +1 → close to a straight line with positive slope
     • −1 → close to a straight line with negative slope
     • Indicates how close the data are to a straight line, but
       – gives no information about the magnitude of the slope
       – says nothing about causality
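
These properties can be checked directly. The sketch below assumes the coefficient meant here is the usual Pearson sample correlation; the helper `pearson_r` and the toy data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation coefficient of two equal-length sequences."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]

r_steep = pearson_r(xs, [2 * x for x in xs])         # steep rising line
r_shallow = pearson_r(xs, [0.1 * x + 7 for x in xs]) # shallow rising line
r_falling = pearson_r(xs, [-3 * x for x in xs])      # falling line
r_curve = pearson_r(xs, [(x - 3) ** 2 for x in xs])  # symmetric curve, not a line

print(r_steep, r_shallow, r_falling, r_curve)
```

The steep and shallow lines give essentially the same coefficient, so the value reflects how line-like the data are and not how steep the line is. The symmetric curve gives a coefficient near zero even though y is completely determined by x.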

  19. Correlation Coefficient

  20. Correlation Coefficient

  21. Correlation Coefficient Plot
     • Plots correlation coefficients among pairs of variables
     • http://rpsychologist.com/d3/correlation/

  22. Covariance Matrix
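
The slide gives only the title, so the following is a minimal sketch of one common definition: entry (j, k) of the matrix is the sample covariance between variable j and variable k. The helper `covariance_matrix`, the (count − 1) denominator, and the toy data are assumptions.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of observations given as rows of equal length.

    Entry [j][k] is the covariance between variable j and variable k,
    using the (count - 1) denominator (an assumed convention).
    """
    count = len(rows)
    dim = len(rows[0])
    means = [sum(row[j] for row in rows) / count for j in range(dim)]
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in rows) / (count - 1)
            for k in range(dim)
        ]
        for j in range(dim)
    ]

# Three made-up variables observed four times; the third is minus the first.
data = [
    (1.0, 2.0, -1.0),
    (2.0, 1.0, -2.0),
    (3.0, 4.0, -3.0),
    (4.0, 3.0, -4.0),
]
cov = covariance_matrix(data)
for row in cov:
    print([round(v, 3) for v in row])
```

The diagonal holds each variable's variance and the matrix is symmetric. Dividing entry (j, k) by the square root of the product of the two matching diagonal entries gives the correlation coefficient for that pair.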
