
Business Statistics CONTENTS Sampling The central limit theorem - PowerPoint PPT Presentation



  1. SAMPLING, THE CLT, AND THE STANDARD ERROR Business Statistics

  2. CONTENTS ▪ Sampling ▪ The central limit theorem ▪ Point and interval estimates for $\mu$ ▪ Confidence intervals for $\mu$ ▪ Old exam question ▪ Further study

  3. SAMPLING Suppose you're a scissors manufacturer in the UK ▪ What proportion of your production should be left-handed? ▪ Three strategies: ▪ look at Wikipedia ("Studies suggest that 70–90% of the world population is right-handed.[4][5]") ▪ ask all persons in the UK (~63 million) ▪ ask a sample of persons (100?) in the UK

  4. SAMPLING Sampling is the process of collecting data about a sample (a subset of the population), with the aim of representing the entire population ▪ Arguments for sampling: probing the entire population may be ▪ too costly ▪ too time-consuming ▪ too dangerous ▪ too destructive ▪ etc. ▪ Arguments against sampling: ▪ limited accuracy → confidence intervals (later in this course) ▪ not representative → design of experiments (not in this course)

  5. SAMPLING A sample should be representative ▪ e.g., don't ask people at Schiphol whether they're afraid of flying. A sample should be large enough ▪ cf. the "$n$" law later on. Choice in sampling ▪ with replacement or without replacement ▪ this has consequences for the probability model
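The two sampling schemes can be illustrated with Python's standard `random` module, where `random.sample` draws without replacement and `random.choices` draws with replacement. A minimal sketch; the population of ten IDs is a hypothetical stand-in. With replacement, the draws are independent and identically distributed (the setting the CLT below assumes); without replacement, each draw changes what is left in the population.

```python
import random

# Hypothetical population of 10 unit IDs (illustration only).
population = list(range(10))
rng = random.Random(42)  # fixed seed so the run is reproducible

# Without replacement: each unit can appear at most once.
without = rng.sample(population, k=5)

# With replacement: the same unit may be drawn several times.
with_repl = rng.choices(population, k=5)

# With replacement, the sample may even be larger than the population.
big = rng.choices(population, k=20)

print("without replacement:", without)
print("with replacement:   ", with_repl)

try:
    rng.sample(population, k=20)
except ValueError:
    print("without replacement you cannot draw more units than the population has")
```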

  6. SAMPLING
  Population                               | Sample
  unknown                                  | known
  we would like to know                    | irrelevant
  parameter                                | statistic
  mostly Greek letters ($\pi$, $\sigma$)   | mostly Roman letters ($p$, $s$)
  some deviating notations ($N$)           | some deviating notations ($\bar{x}$, $n$)

  7. THE CENTRAL LIMIT THEOREM ▪ Let $X_1, X_2, \ldots, X_n$ be a random sample from a population $X$ with mean $\mu_X$ and variance $\sigma_X^2$ (capital $X$, because it is a random variable!) ▪ e.g., body heights of $n$ persons ▪ waiting times of $n$ customers ▪ failure rates of $n$ cars, ... ▪ Then, for $n$ sufficiently large, the mean $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$ (capital $\bar{X}$, because this is also a random variable!) 1. is normally distributed 2. with mean $\mu_{\bar{X}} = \mu_X$ 3. and variance $\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$

  8. THE CENTRAL LIMIT THEOREM So for large $n$: $\bar{X} \sim N\left(\mu_{\bar{X}} = \mu_X,\ \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}\right)$ ▪ or for short $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ ▪ This holds regardless of the distribution of $X$ (as long as $X$ has a finite variance)! ▪ so that's why the normal distribution is called "normal" ▪ this fact is called the central limit theorem (CLT) ▪ it is one of the most important results of statistics ▪ it holds for "sufficiently large" $n$
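The claim that $\bar{X}$ is approximately $N(\mu_X, \sigma_X^2/n)$ for large $n$, whatever the population, can be checked by simulation. A minimal Python sketch, assuming a uniform population on $[0, 1]$ (so $\mu_X = 0.5$ and $\sigma_X^2 = 1/12$) and $n = 30$; the helper `simulate_sample_means` is a hypothetical name introduced here. The mean and variance of the simulated $\bar{x}$ values should be close to the CLT values, and about 68.3% of them should fall within one standard deviation of $\mu_X$, as for a normal distribution.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from `n` draws of `draw(rng)`."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(1)
n, reps = 30, 20_000
# Assumed population: X ~ Uniform(0, 1), so mu_X = 0.5 and sigma_X^2 = 1/12.
means = simulate_sample_means(lambda r: r.random(), n, reps, rng)

mu_X, var_X = 0.5, 1 / 12
print("mean of X-bar:    ", statistics.fmean(means), "(CLT:", mu_X, ")")
print("variance of X-bar:", statistics.pvariance(means), "(CLT:", var_X / n, ")")

# Rough normality check: about 68.3% of a normal distribution lies within one sd.
sd_xbar = (var_X / n) ** 0.5
share = sum(abs(m - mu_X) <= sd_xbar for m in means) / reps
print("share within 1 sd:", share, "(normal: about 0.683)")
```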

  9. THE CENTRAL LIMIT THEOREM The CLT for a fair die [Figure: distribution of $\bar{X}$ for $n = 1$, $n = 2$, $n = 5$, $n = 20$]

  10. THE CENTRAL LIMIT THEOREM The CLT for a loaded (unfair) die [Figure: distribution of $\bar{X}$ for $n = 1$, $n = 2$, $n = 5$, $n = 20$]
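The die figures can be reproduced exactly rather than by simulation: convolving the face probabilities $n$ times gives the distribution of $X_1 + \cdots + X_n$, and dividing each outcome by $n$ gives the distribution of $\bar{X}$. A sketch in Python with exact fractions; the loaded-die probabilities below are an assumption, since the slides do not state them. For both dice the mean of $\bar{X}$ stays at $\mu_X$ while its variance shrinks as $\sigma_X^2/n$.

```python
from fractions import Fraction

def sum_distribution(face_probs, n):
    """Exact distribution of X_1 + ... + X_n for n independent rolls (dict: sum -> prob)."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for s, p in dist.items():
            for face, q in face_probs.items():
                new[s + face] = new.get(s + face, 0) + p * q
        dist = new
    return dist

def mean_distribution(face_probs, n):
    """Exact distribution of X-bar = (X_1 + ... + X_n) / n."""
    return {Fraction(s, n): p for s, p in sum_distribution(face_probs, n).items()}

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

fair = {face: Fraction(1, 6) for face in range(1, 7)}
# Hypothetical loaded die: a six comes up half the time (not taken from the slides).
loaded = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
          4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}

for name, die in [("fair", fair), ("loaded", loaded)]:
    for n in (1, 2, 5, 20):
        mu, var = mean_and_variance(mean_distribution(die, n))
        print(f"{name:6} n={n:2}: mean={float(mu):.4f}  variance={float(var):.4f}")
```

The full dictionaries returned by `mean_distribution` are what the figures plot; their bell shape becomes visible from about $n = 5$ onwards, more slowly for the asymmetric loaded die.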

  11. EXERCISE 1 We roll a die 100 times. The outcomes are $X_1, X_2, \ldots, X_{100}$. How is $\bar{X}$ distributed?
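Assuming the die in Exercise 1 is fair, one way to check an answer is to compute $\mu_X$ and $\sigma_X^2$ for a single roll and plug them into the CLT. This is a sketch of the arithmetic with exact fractions, not an official solution.

```python
from fractions import Fraction

# Assumption: the die is fair, so each face 1..6 has probability 1/6.
faces = range(1, 7)
mu_X = Fraction(sum(faces), 6)                               # E[X] = 7/2
var_X = Fraction(sum(f * f for f in faces), 6) - mu_X ** 2   # E[X^2] - E[X]^2 = 35/12

n = 100
mu_Xbar = mu_X            # CLT: the mean of X-bar equals mu_X
var_Xbar = var_X / n      # CLT: the variance of X-bar is sigma_X^2 / n

print(f"X-bar ~ N({mu_Xbar}, {var_Xbar}) = N({float(mu_Xbar)}, {float(var_Xbar):.5f})")
```

Under this assumption the result is $\bar{X} \sim N(3.5,\ 35/1200) = N(3.5,\ 7/240)$, i.e. a variance of about 0.0292.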

  12. THE CENTRAL LIMIT THEOREM A "proof" of the theorem (for normal populations) ▪ Recall the additive property of the normal distribution: ▪ if $X_1 \sim N(\mu_X, \sigma_X^2)$ and $X_2 \sim N(\mu_X, \sigma_X^2)$, then $X_1 + X_2 \sim N(2\mu_X, 2\sigma_X^2)$ (provided $X_1$ and $X_2$ are independent) ▪ Also recall that if $X \sim N(\mu_X, \sigma_X^2)$, then $aX \sim N(a\mu_X, a^2\sigma_X^2)$ ▪ So, if $X_1 + X_2 \sim N(2\mu_X, 2\sigma_X^2)$, then (taking $a = \frac{1}{2}$) $\frac{X_1 + X_2}{2} \sim N\left(\mu_X, \frac{\sigma_X^2}{2}\right)$ ▪ and more generally: for independent $X_1, \ldots, X_n$, $X_1 + \cdots + X_n \sim N(n\mu_X, n\sigma_X^2)$, so (taking $a = \frac{1}{n}$) $\frac{X_1 + \cdots + X_n}{n} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ ▪ or equivalently: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ ▪ This proof works for normal populations and all $n$, but the CLT is valid for all populations and "large" $n$ (You don't need to reproduce such proofs, but following them may help.)
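For a normal population the argument above holds for every $n$, including very small ones. A quick simulation sketch, assuming a population $N(10, 3^2)$ and samples of only $n = 4$: the variance of $\bar{X}$ should be close to $9/4 = 2.25$, and about 95% of the sample means should fall within $1.96\,\sigma_{\bar{X}}$ of $\mu_X$, exactly as for a normal distribution.

```python
import random
import statistics

rng = random.Random(7)
mu_X, sigma_X = 10.0, 3.0   # assumed normal population N(10, 3^2)
n, reps = 4, 20_000         # deliberately small sample size

means = [statistics.fmean(rng.gauss(mu_X, sigma_X) for _ in range(n)) for _ in range(reps)]

sd_xbar = sigma_X / n ** 0.5
print("mean of X-bar:    ", statistics.fmean(means), "(theory:", mu_X, ")")
print("variance of X-bar:", statistics.pvariance(means), "(theory:", sigma_X ** 2 / n, ")")

# For a normal X-bar, 95% of the mass lies within 1.96 standard deviations of the mean.
share = sum(abs(m - mu_X) <= 1.96 * sd_xbar for m in means) / reps
print("share within 1.96 sd:", share, "(normal: 0.95)")
```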

  13. THE CENTRAL LIMIT THEOREM Some consequences of the CLT ▪ $\bar{X}$ is an estimator of $\mu_X$ ▪ and $\bar{x}$ is the best estimate of $\mu_X$ ▪ $\bar{X}$ will be a better estimator for large $n$ ▪ because $\sigma_{\bar{X}} = \sigma_X/\sqrt{n}$ decreases as $n$ grows ▪ we can use the distribution of $\bar{X}$ to construct a confidence interval for $\mu$

  14. THE CENTRAL LIMIT THEOREM The CLT holds for $n$ "sufficiently" large ▪ More specifically: ▪ if $X$ is normally distributed, the CLT holds for all sample sizes $n$ ▪ if the distribution of $X$ is fairly symmetric without extreme outliers, the CLT gives a pretty good approximation of the distribution of $\bar{X}$ for sample sizes $n \geq 15$ ▪ for any distribution of $X$ and a sample size $n \geq 30$, the CLT gives a pretty good approximation of the distribution of $\bar{X}$

  15. THE CENTRAL LIMIT THEOREM The effect of asymmetry vs. sample size
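The effect of asymmetry can be quantified with the skewness of $\bar{X}$: for independent draws from a population with skewness $\gamma$, the mean of $n$ draws has skewness $\gamma/\sqrt{n}$. A simulation sketch, assuming an exponential population (skewness 2), shows how slowly the asymmetry fades, which is why skewed populations need larger $n$ than fairly symmetric ones. The `skewness` helper is a hypothetical name introduced here.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness m3 / m2**1.5 (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

rng = random.Random(3)
reps = 20_000
skew = {}
for n in (1, 5, 15, 30):
    # Assumed skewed population: X ~ Exponential(rate 1), whose skewness is 2.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    skew[n] = skewness(means)
    print(f"n={n:2}: skewness of X-bar = {skew[n]:.3f}  (theory 2/sqrt(n) = {2 / n ** 0.5:.3f})")
```

Even at $n = 30$ the theoretical skewness is still about 0.37, so for strongly skewed populations the "$n \geq 30$" rule of thumb gives a usable but not perfect approximation.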

  16. POINT AND INTERVAL ESTIMATES FOR ๐œˆ A statistic is a function of the (randomly sampled) data โ–ช important example: the statistic เดค ๐‘Œ 1 โ–ช defined by เดค ๐‘œ ๐‘œ ฯƒ ๐‘—=1 ๐‘Œ = ๐‘Œ ๐‘— 1 ๐‘œ โ–ช in a concrete case, าง ๐‘ฆ ๐‘— is the best possible ๐‘œ ฯƒ ๐‘—=1 ๐‘ฆ = estimate of the parameter ๐œˆ โ–ช so the sample mean าง ๐‘ฆ is the best possible estimate of the population mean ๐œˆ โ–ช because it is just one value, it is a point estimate

  17. POINT AND INTERVAL ESTIMATES FOR $\mu$ Due to sampling variation, $\bar{x}$ will be different in each sample ▪ and there will be a distribution of $\bar{x}$-values, the distribution of $\bar{X}$ ▪ the true value of $\mu$ may be different from the value of $\bar{x}$ obtained ▪ however, keep in mind that the value of $\bar{x}$ obtained cannot be "too" wrong ▪ we know that $\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$, so it follows that a specific value $\bar{x}$ lies within $[\mu_{\bar{X}} - 1.96\sigma_{\bar{X}},\ \mu_{\bar{X}} + 1.96\sigma_{\bar{X}}]$ with 95% probability
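The 95% statement can be checked by simulation. A sketch, assuming the fair-die population of Exercise 1 ($\mu_X = 3.5$, $\sigma_X^2 = 35/12$, $n = 100$): the fixed interval $[\mu_{\bar{X}} - 1.96\sigma_{\bar{X}},\ \mu_{\bar{X}} + 1.96\sigma_{\bar{X}}]$ should contain roughly 95% of the simulated $\bar{x}$ values, even though a single die roll is far from normal.

```python
import random
import statistics

rng = random.Random(11)
n, reps = 100, 10_000
mu_X, var_X = 3.5, 35 / 12           # fair die (assumed), as in Exercise 1
sd_xbar = (var_X / n) ** 0.5         # sigma of X-bar
low, high = mu_X - 1.96 * sd_xbar, mu_X + 1.96 * sd_xbar

# Each x-bar is the mean of n = 100 simulated rolls.
xbars = [statistics.fmean(rng.choices(range(1, 7), k=n)) for _ in range(reps)]
share = sum(low <= x <= high for x in xbars) / reps
print(f"interval [{low:.3f}, {high:.3f}] contains {share:.1%} of the x-bar values")
```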

  18. POINT AND INTERVAL ESTIMATES FOR $\mu$ Conversely, the interval $[\bar{x} - 1.96\sigma_{\bar{X}},\ \bar{x} + 1.96\sigma_{\bar{X}}]$ contains the population value $\mu_{\bar{X}}$ with 95% probability (both statements describe the same event, $|\bar{x} - \mu_{\bar{X}}| \leq 1.96\sigma_{\bar{X}}$) ▪ and because $\mu_{\bar{X}} = \mu_X$, the interval $[\bar{x} - 1.96\sigma_{\bar{X}},\ \bar{x} + 1.96\sigma_{\bar{X}}]$ contains the population value $\mu_X$ with 95% probability ▪ this is an interval estimate for $\mu_X$ ▪ we say that $[\bar{x} - 1.96\sigma_{\bar{X}},\ \bar{x} + 1.96\sigma_{\bar{X}}]$ is a 95% confidence interval for $\mu_X$
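Here the interval moves from sample to sample while $\mu_X$ stays fixed; the 95% describes how often this procedure captures $\mu_X$ over repeated samples, while any single computed interval either contains $\mu_X$ or it does not. A coverage simulation sketch, assuming a population $N(50, 8^2)$ with known $\sigma_X$ and $n = 16$:

```python
import random
import statistics

rng = random.Random(5)
mu_X, sigma_X = 50.0, 8.0      # assumed population N(50, 8^2); mu_X is "unknown" to the analyst
n, reps = 16, 10_000
se = sigma_X / n ** 0.5        # sigma of X-bar, assumed known

hits = 0
for _ in range(reps):
    xbar = statistics.fmean(rng.gauss(mu_X, sigma_X) for _ in range(n))
    # Does this sample's interval [xbar - 1.96 se, xbar + 1.96 se] contain mu_X?
    if xbar - 1.96 * se <= mu_X <= xbar + 1.96 * se:
        hits += 1
coverage = hits / reps
print(f"{coverage:.1%} of the intervals contain mu_X")
```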

  19. POINT AND INTERVAL ESTIMATES FOR ๐œˆ So: โ–ช we estimate ๐œˆ ๐‘Œ by าง ๐‘ฆ โ–ช and we know with 95% probability that าง ๐‘ฆ โˆ’ 1.96๐œ เดค ๐‘Œ โ‰ค ๐œˆ ๐‘Œ โ‰ค าง ๐‘ฆ + 1.96๐œ เดค ๐‘Œ ๐œ ๐‘Œ โ–ช the quantity ๐œ เดค ๐‘œ is the standard error of the ๐‘Œ = distribution of the mean เดค ๐‘Œ โ–ช it is so important that we give it a special name: the standard error of the mean โ–ช sometimes (unfortunately!) abbreviated as the standard error

  20. EXERCISE 2 We sample ( ๐‘œ = 25 ) from a normal population ๐‘Œ with 2 = 4 . We find าง unknown ๐œˆ ๐‘Œ and known ๐œ ๐‘Œ ๐‘ฆ = 3 . a. Give a point estimate for ๐œˆ ๐‘Œ . b. Find the standard error of the mean, ๐‘ก เดค ๐‘Œ . b. Give a 95% -confidence interval for ๐œˆ ๐‘Œ .

  21. CONCEPTS AND SYMBOLS ▪ Carefully distinguish: ▪ $\mu_X$ (a value, often unknown) ▪ $\bar{x}$ (a value from observations) ▪ $\bar{X}$ (a distribution, not a value) ▪ and its two parameters $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}^2$ (both are values, often unknown) ▪ and the CLT claims that $\mu_{\bar{X}} = \mu_X$ and $\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$ ▪ Later on, we will follow a similar logic, e.g. ▪ $\sigma_X^2$ ▪ $s_X^2$ ▪ $S_X^2$ ▪ and its two parameters

  22. OLD EXAM QUESTION 23 March 2015, Q1h

  23. FURTHER STUDY Doane & Seward 5/E 8.1-8.3 Tutorial exercises week 2 sampling distribution central limit theorem standard error
