business statistics



  1. $\mu$: ESTIMATES, CONFIDENCE INTERVALS, AND TESTS
     Business Statistics

  2. CONTENTS
     ▪ Estimating parameters
     ▪ The sampling distribution
     ▪ Confidence intervals for $\mu$
     ▪ Hypothesis tests for $\mu$
     ▪ The $t$-distribution
     ▪ Comparison of $z$ and $t$
     ▪ Old exam question
     ▪ Further study

  3. ESTIMATING PARAMETERS
     Central task in inferential statistics
     ▪ Estimation
         ▪ estimating a parameter (population value) from a sample
     ▪ Example
         ▪ what proportion of cars in Amsterdam is electric?
         ▪ population value: $\pi$
         ▪ sample of size $n = 200$ cars yields 26 electric cars
         ▪ so, $p = \frac{26}{200} = 0.13$
         ▪ this suggests $\pi \approx 0.13$

  4. ESTIMATING PARAMETERS
     Terminology
     ▪ Parameter
         ▪ a characteristic descriptive of the population
         ▪ e.g., $\mu$, $\pi$, $\sigma$ (or $\sigma^2$)
     ▪ Estimator
         ▪ a statistic derived from a sample to infer the value of a population parameter
         ▪ e.g., $\bar{X}$, $P$, $S$ (or $S^2$)
     ▪ Estimate
         ▪ the value of the estimator in a particular sample
         ▪ e.g., $\bar{x}$, $p$, $s$ (or $s^2$)

  5. ESTIMATING PARAMETERS

  6. ESTIMATING PARAMETERS

     | Population parameter | Estimator | Estimate |
     | Mean $\mu$ | $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
     | Standard deviation $\sigma$ | $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$ | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ |
     | Proportion $\pi$ | $P = \frac{X}{n}$ | $p = \frac{x}{n}$ |
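The estimator formulas in the table can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the five values in `x` are invented purely for illustration, while the proportion reuses the sample of 26 electric cars out of 200 from the earlier example.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample values, invented for illustration only.
x = [2.0, 2.1, 1.9, 2.3, 2.2]
n = len(x)

# Estimate of the mean: x_bar = (1/n) * sum of x_i
x_bar = sum(x) / n

# Estimate of the standard deviation: note the divisor n - 1, not n
s = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

# Estimate of a proportion: p = x / n (26 electric cars in a sample of 200)
p = 26 / 200

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, p = {p:.2f}")

# The standard library functions follow the same definitions.
assert abs(x_bar - mean(x)) < 1e-12
assert abs(s - stdev(x)) < 1e-12
```

Note that `statistics.stdev` divides by $n - 1$ (the sample formula), whereas `statistics.pstdev` divides by $n$ (the population formula).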

  7. ESTIMATING PARAMETERS
     ▪ Another example (Amsterdam, 2015):
         ▪ what is the mean price of a glass of beer?
         ▪ population value: $\mu$
         ▪ sample of size $n = 64$ glasses of beer yields $\bar{x} = €2.06$
         ▪ this suggests that $\mu \approx €2.06$
     ▪ But suppose we had taken a different sample
         ▪ again with sample size $n = 64$
         ▪ but now perhaps yielding $\bar{x} = €2.13$
         ▪ then we would estimate $\mu \approx €2.13$
     ▪ Obviously there is sampling variation
         ▪ so a distribution of $\bar{x}$-values (the sampling distribution of $\bar{X}$)
     ▪ Solution: point estimates and confidence intervals

  8. THE SAMPLING DISTRIBUTION
     ▪ Example
     ▪ Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}
     ▪ The population parameters are:
         ▪ $\mu = 1.5$
         ▪ $\sigma = 1.118$

  9. THE SAMPLING DISTRIBUTION
     ▪ Sample $n = 2$ values and calculate $\bar{x}$
     ▪ Do this for all possible samples of size $n = 2$
     ▪ You will get a distribution of $\bar{x}$-values: the distribution of $\bar{X}$
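The enumeration described on this slide is small enough to carry out in full. The sketch below assumes sampling with replacement and treats samples as ordered pairs, which gives 16 possible samples; the slides do not state the sampling scheme, so that choice is an assumption.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [0, 1, 2, 3]
n = 2

# All ordered samples of size n, drawn with replacement (assumed scheme).
samples = list(product(population, repeat=n))
sample_means = [sum(sample) / n for sample in samples]

# Sampling distribution of the sample mean: value -> probability
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(f"x_bar = {value:.1f}  probability = {distribution[value] / len(samples):.4f}")

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation (divisor N)

print(f"mean of sample means = {mean(sample_means):.3f} (population mean {mu})")
print(f"sd of sample means   = {pstdev(sample_means):.4f} (sigma / sqrt(n) = {sigma / sqrt(n):.4f})")
```

Under this scheme the sample means centre on the population mean, and their standard deviation equals $\sigma/\sqrt{n}$.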

  10. THE SAMPLING DISTRIBUTION
      ▪ We will study the variability of the estimate of a population parameter obtained from a sample statistic
      ▪ We will do so by studying how the sample statistic varies when you draw a different sample
      ▪ Example:
          ▪ GMAT score of MBA students
          ▪ $N = 2637$
          ▪ $\mu = 520.78$
          ▪ $\sigma = 86.60$

  11. THE SAMPLING DISTRIBUTION
      ▪ Consider eight random samples, each of size $n = 5$
          ▪ the sample means ($\bar{x}_1 = 504.0$, $\bar{x}_2 = 576.0$, …, $\bar{x}_8 = 582$) tend to be close to the population mean ($\mu = 520.78$)
          ▪ sometimes a bit lower, sometimes a bit higher

  12. THE SAMPLING DISTRIBUTION
      ▪ The dot plots show that the sample means ($\bar{x}_1, \ldots, \bar{x}_8$) have much less variation than the individual data points ($x_1, \ldots, x_{2637}$)

  13. THE SAMPLING DISTRIBUTION
      ▪ An estimator is a random variable since samples vary
          ▪ so we write it as a capital letter, e.g., $X$, $\bar{X}$, $S$, etc.
      ▪ The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of (a fixed) size $n$ is taken
          ▪ so we write $X \sim N(\mu, \sigma)$, etc.

  14. THE SAMPLING DISTRIBUTION
      ▪ The sampling distribution of $\bar{X}$
          ▪ for a population with $\mu = \mu_X$ and $\sigma^2 = \sigma_X^2$
      ▪ If the CLT holds: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ (3 things: shape, mean, dispersion)
      ▪ So, the statistic $\bar{X}$
          ▪ is normally distributed
          ▪ has mean $\mu_X$
          ▪ and has standard deviation $\frac{\sigma_X}{\sqrt{n}}$
      ▪ Fortunately, the CLT holds pretty often

  15. THE SAMPLING DISTRIBUTION
      ▪ The standard deviation of the distribution of sample means $\bar{X}$
          ▪ is given by $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$
          ▪ has a special name: standard error of the mean
          ▪ is often abbreviated as the standard error (SE); that's a bit confusing, because we will meet more standard errors later on
          ▪ decreases with increasing sample size
          ▪ but only according to the “law of diminishing returns” ($1/\sqrt{n}$)
          ▪ is often calculated by software (SPSS, etc.)
          ▪ is the basis for confidence intervals and hypothesis tests (see later)
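The $1/\sqrt{n}$ behaviour of the standard error can be illustrated with a small simulation. This is a sketch under stated assumptions: the GMAT population is treated as normal with the $\mu$ and $\sigma$ given earlier, and the seed, the number of replications, and the sample sizes 5, 20, and 80 are arbitrary choices, not values from the slides.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(1)  # arbitrary seed, only for reproducibility

MU, SIGMA = 520.78, 86.60  # GMAT population values from the slides
REPS = 5000                # simulated samples per sample size (arbitrary)

def simulated_se(n: int) -> float:
    """Standard deviation of REPS sample means, each computed from n draws."""
    means = []
    for _ in range(REPS):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        means.append(sum(sample) / n)
    return pstdev(means)

results = {}
for n in (5, 20, 80):
    theoretical = SIGMA / sqrt(n)
    results[n] = (theoretical, simulated_se(n))
    print(f"n = {n:2d}  sigma/sqrt(n) = {theoretical:6.2f}  simulated = {results[n][1]:6.2f}")
```

Each fourfold increase in $n$ only halves the standard error, which is the diminishing return the slide refers to.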

  16. EXERCISE 1 What is the meaning of the standard error?

  17. CONFIDENCE INTERVALS FOR $\mu$
      ▪ A sample mean $\bar{x}$ is a point estimate of the population mean $\mu$
          ▪ it is the best possible estimate of $\mu$
          ▪ but it will probably not be completely right
      ▪ A confidence interval (CI) for the mean is a range of possible values for $\mu$: $\mu_{\text{lower}} \le \mu \le \mu_{\text{upper}}$
          ▪ such that the interval $\mathrm{CI}_{\mu} = [\mu_{\text{lower}}, \mu_{\text{upper}}]$ contains the true value ($\mu$) with a certain probability (e.g., 95%)
      Note: to simplify notation, we will drop the “$X$” from $\mu_X$ now, and write just $\mu$

  18. CONFIDENCE INTERVALS FOR $\mu$
      ▪ From the CLT it follows that under certain conditions:
          ▪ the distribution of $\bar{X}$ is normal
          ▪ the best estimate of $\mu$ is $\bar{x}$
          ▪ the standard deviation of $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$
      ▪ This implies that:
          ▪ with probability 2.5%, $\bar{X} < \mu - 1.96\frac{\sigma}{\sqrt{n}} \Rightarrow \mu > \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$
          ▪ with probability 2.5%, $\bar{X} > \mu + 1.96\frac{\sigma}{\sqrt{n}} \Rightarrow \mu < \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}$
          ▪ so with probability 95%, $\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$
      ▪ So, if we find a sample mean $\bar{x}$, we can construct the following 95% confidence interval for $\mu$:
        $\mathrm{CI}_{\mu,0.95} = \left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$

  19. CONFIDENCE INTERVALS FOR $\mu$
      Three notations for a confidence interval for $\mu$
      ▪ $\left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$
      ▪ $\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
      ▪ $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$

  20. CONFIDENCE INTERVALS FOR $\mu$
      Example
      ▪ Population
          ▪ $\mu = 520.78$ (unknown)
          ▪ $\sigma = 86.60$ (known)
          ▪ normally distributed (assumed)
      ▪ Sample
          ▪ $n = 5$ (chosen)
          ▪ $\bar{x} = 504.0$ (estimated)
      ▪ Calculation
          ▪ standard error of mean: $\frac{86.60}{\sqrt{5}} = 38.73$
          ▪ $1.96 \times 38.73 = 75.91$
          ▪ $\mathrm{CI}_{\mu,0.95} = [428.09, 579.91]$
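The calculation on this slide can be reproduced step by step. The sketch below uses only the numbers given in the example ($\sigma = 86.60$, $n = 5$, $\bar{x} = 504.0$, factor 1.96); the variable names are illustrative.

```python
from math import sqrt

sigma = 86.60   # population standard deviation (known)
n = 5           # sample size
x_bar = 504.0   # observed sample mean
z = 1.96        # factor for a 95% confidence level

standard_error = sigma / sqrt(n)   # sigma / sqrt(n)
margin = z * standard_error        # half-width of the interval
ci_lower = x_bar - margin
ci_upper = x_bar + margin

print(f"standard error = {standard_error:.2f}")
print(f"margin         = {margin:.2f}")
print(f"95% CI         = [{ci_lower:.2f}, {ci_upper:.2f}]")
```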

  21. EXERCISE 2 Write the confidence interval $[428.09, 579.91]$ in two alternative ways.

  22. CONFIDENCE INTERVALS FOR $\mu$
      ▪ The factor 1.96 is of course related to the 95% probability
      ▪ Other confidence levels:
          ▪ where $z_{\alpha/2}$ is such that $P(Z > z_{\alpha/2}) = \alpha/2$ if $Z$ is drawn from a standard normal ($z$-) distribution
      ▪ General form of a $(1 - \alpha) \times 100\%$ confidence interval of the mean:
        $\mathrm{CI}_{\mu,1-\alpha} = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
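The general form can be sketched as a small function. This is an illustration rather than the course's own method: the function name `z_confidence_interval` and its parameters are hypothetical, and the critical value $z_{\alpha/2}$ comes from `statistics.NormalDist().inv_cdf` in the Python standard library. The inputs are the GMAT numbers from the earlier example.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) of a CI for the mean when sigma is known."""
    alpha = 1 - confidence
    # z_{alpha/2}: the point with upper-tail probability alpha/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

intervals = {}
for level in (0.90, 0.95, 0.99):
    intervals[level] = z_confidence_interval(504.0, 86.60, 5, confidence=level)
    lower, upper = intervals[level]
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%}: z = {z:.3f}, CI = [{lower:.2f}, {upper:.2f}]")
```

The 95% interval agrees with the hand calculation to two decimals; the tiny difference in later digits arises because 1.96 is itself a rounded critical value.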

  23. CONFIDENCE INTERVALS FOR $\mu$

  24. CONFIDENCE INTERVALS FOR $\mu$
      ▪ Trade-off
          ▪ narrow CI ⇔ low confidence level
          ▪ wide CI ⇔ high confidence level
      ▪ Choice of confidence level depends on application
          ▪ more precision required for a refinery than for a dairy farm

  25. CONFIDENCE INTERVALS FOR $\mu$
      ▪ A confidence interval either does or does not contain $\mu$
      ▪ The confidence level quantifies the risk
      ▪ Out of 100 confidence intervals at the 95% level, approximately 95 will contain $\mu$, while approximately 5 will not
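The long-run interpretation of the confidence level can be checked by simulation. The sketch below is an assumption-laden illustration: it draws repeated samples of size 5 from a normal population with the GMAT values of $\mu$ and $\sigma$, builds a 95% interval with known $\sigma$ each time, and counts how often the interval contains $\mu$. The seed and the number of replications are arbitrary.

```python
import random
from math import sqrt

random.seed(2)  # arbitrary seed, only for reproducibility

MU, SIGMA, N = 520.78, 86.60, 5   # population values and sample size from the slides
REPS = 10_000                     # number of simulated intervals (arbitrary)

margin = 1.96 * SIGMA / sqrt(N)   # half-width, identical for every sample since sigma is known

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # Does this particular interval contain the true mean?
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1

coverage = hits / REPS
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

Any single interval either contains $\mu$ or does not; the 95% describes the procedure over many samples.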

  26. HYPOTHESIS TESTS FOR $\mu$
      ▪ We can use the standard error to perform a hypothesis test
          ▪ recall that $\mathrm{CI}_{\mu,0.95} = [428.09, 579.91]$
      ▪ Suppose we hypothesize $\mu = 550$
      ▪ The value 550 is inside the 95% confidence interval for $\mu$
          ▪ therefore the sample statistic and its confidence interval do not suggest that the hypothesis ($\mu = 550$) is wrong
          ▪ and we will not reject the hypothesis
          ▪ notice that we didn’t say that $\mu = 550$; we only said that we can’t reject it (at a 5% significance level)

  27. HYPOTHESIS TESTS FOR $\mu$
      ▪ Another example: suppose we hypothesize that $\mu = 600$
      ▪ The value 600 is outside the confidence interval for $\mu$
          ▪ finding a confidence interval not containing $\mu$ happens only in 5% of the cases
          ▪ so we conclude that $\mu \ne 600$ (at a 5% significance level)
          ▪ therefore the sample statistic and its confidence interval suggest that the hypothesis ($\mu = 600$) is wrong
          ▪ and we will reject the hypothesis
      Much more on hypothesis tests later on!
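The decision rule used in these two examples amounts to checking whether the hypothesized value lies inside the interval. A minimal sketch, with a hypothetical helper name `reject_at_5_percent`, applied to the interval $[428.09, 579.91]$ from the earlier example:

```python
def reject_at_5_percent(mu_0, ci_95):
    """Reject the hypothesis mu = mu_0 if mu_0 lies outside the 95% CI."""
    lower, upper = ci_95
    return not (lower <= mu_0 <= upper)

ci_95 = (428.09, 579.91)  # 95% confidence interval from the earlier example

decisions = {}
for mu_0 in (550, 600):
    decisions[mu_0] = reject_at_5_percent(mu_0, ci_95)
    verdict = "reject" if decisions[mu_0] else "do not reject"
    print(f"hypothesis mu = {mu_0}: {verdict} at the 5% significance level")
```

Not rejecting $\mu = 550$ is not the same as showing that $\mu = 550$; many other values inside the interval would not be rejected either.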

  28. THE $t$-DISTRIBUTION
      ▪ A closer look at $\mathrm{CI}_{\mu,0.95} = \left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$
      ▪ Given a sample mean $\bar{x}$, you can find a 95% confidence interval for the population mean $\mu$
      ▪ Sounds great when you don’t know $\mu$ ...
      ▪ ... but it assumes you do know $\sigma$!
      ▪ There are many situations in which you don’t know $\mu$ and you also don’t know $\sigma$
      ▪ So what to do?

  29. THE $t$-DISTRIBUTION
      ▪ A simple strategy
      ▪ If the population standard deviation $\sigma$ is unknown, we can estimate it with the sample standard deviation $s$
      ▪ Then we use $\pm 1.96\frac{s}{\sqrt{n}}$ instead of $\pm 1.96\frac{\sigma}{\sqrt{n}}$
      ▪ But we pay a price for that
      ▪ The reason is that $s$ is itself an estimate of $\sigma$, and therefore uncertain
      ▪ The price we pay is that the factor “1.96” must be somewhat larger
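Why the factor 1.96 is too small when $s$ replaces $\sigma$ can be seen in a simulation. The sketch below assumes a normal population with the GMAT values of $\mu$ and $\sigma$ and samples of size $n = 5$; the seed and number of replications are arbitrary. It compares how often $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ and $\bar{x} \pm 1.96\,s/\sqrt{n}$ contain $\mu$.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(3)  # arbitrary seed, only for reproducibility

MU, SIGMA, N = 520.78, 86.60, 5   # population values and sample size from the slides
REPS = 10_000                     # number of simulated samples (arbitrary)

hits_sigma = 0  # intervals built with the known sigma
hits_s = 0      # intervals built with the sample standard deviation s

for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    if abs(x_bar - MU) <= 1.96 * SIGMA / sqrt(N):
        hits_sigma += 1
    if abs(x_bar - MU) <= 1.96 * s / sqrt(N):
        hits_s += 1

coverage_sigma = hits_sigma / REPS
coverage_s = hits_s / REPS
print(f"coverage with known sigma: {coverage_sigma:.3f}")
print(f"coverage with s and 1.96:  {coverage_s:.3f}")
```

With such a small sample, the interval based on $s$ and the factor 1.96 covers $\mu$ noticeably less often than 95% of the time, which is why a larger factor is needed.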
