$\mu$: ESTIMATES, CONFIDENCE INTERVALS, AND TESTS
Business Statistics
CONTENTS
▪ Estimating parameters
▪ The sampling distribution
▪ Confidence intervals for $\mu$
▪ Hypothesis tests for $\mu$
▪ The $t$-distribution
▪ Comparison of $z$ and $t$
▪ Old exam question
▪ Further study
ESTIMATING PARAMETERS
Central task in inferential statistics
▪ Estimation
  ▪ estimating a parameter (population value) from a sample
▪ Example
  ▪ what proportion of cars in Amsterdam is electric?
  ▪ population value: $\pi$
  ▪ a sample of size $n = 200$ cars yields 26 electric cars
  ▪ so, $p = \frac{26}{200} = 0.13$
  ▪ this suggests $\pi \approx 0.13$
ESTIMATING PARAMETERS
Terminology
▪ Parameter
  ▪ a characteristic that describes the population
  ▪ e.g., $\mu$, $\pi$, $\sigma$ (or $\sigma^2$)
▪ Estimator
  ▪ a statistic derived from a sample to infer the value of a population parameter
  ▪ e.g., $\bar{X}$, $P$, $S$ (or $S^2$)
▪ Estimate
  ▪ the value of the estimator in a particular sample
  ▪ e.g., $\bar{x}$, $p$, $s$ (or $s^2$)
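ESTIMATING PARAMETERS

| | Population parameter | Estimator | Estimate |
|---|---|---|---|
| Mean | $\mu$ | $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
| Standard deviation | $\sigma$ | $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$ | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ |
| Proportion | $\pi$ | $P = \frac{X}{n}$ | $p = \frac{x}{n}$ |

In the proportion row, $X$ and $x$ denote the number of "successes" in the sample (e.g., $x = 26$ electric cars), not individual observations.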
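The formulas in the table translate directly into code. The sketch below is illustrative only: the function names and the list of beer prices are hypothetical (made up, not data from the slides); only the car numbers (26 out of 200) come from the example above. It also checks the hand-written $s$ against Python's `statistics.stdev`, which uses the same $n - 1$ denominator.

```python
import math
import statistics

def sample_mean(xs):
    """Estimate of mu: x-bar = (1/n) * sum of the x_i."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Estimate of sigma: s, using the n - 1 denominator."""
    n = len(xs)
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def sample_proportion(successes, n):
    """Estimate of pi: p = x / n, with x the number of successes."""
    return successes / n

# Hypothetical beer prices in euros, made up for illustration only
prices = [2.10, 1.95, 2.20, 2.00, 2.05]
print("x-bar:", sample_mean(prices))
print("s:    ", sample_sd(prices))
print("s via statistics.stdev:", statistics.stdev(prices))  # same n - 1 formula
print("p:    ", sample_proportion(26, 200))  # car example: 26 of 200 cars electric
```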
ESTIMATING PARAMETERS
▪ Another example (Amsterdam, 2015):
  ▪ what is the mean price of a glass of beer?
  ▪ population value: $\mu$
  ▪ a sample of size $n = 64$ glasses of beer yields $\bar{x} = €2.06$
  ▪ this suggests that $\mu \approx €2.06$
▪ But suppose we had taken a different sample
  ▪ again with sample size $n = 64$
  ▪ but now perhaps yielding $\bar{x} = €2.13$
  ▪ then we would estimate $\mu \approx €2.13$
▪ Obviously there is sampling variation
  ▪ so there is a distribution of $\bar{x}$-values (the sampling distribution of $\bar{X}$)
▪ Solution: point estimates and confidence intervals
THE SAMPLING DISTRIBUTION
▪ Example
  ▪ Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}
  ▪ The population parameters are:
    ▪ $\mu = 1.5$
    ▪ $\sigma = 1.118$
THE SAMPLING DISTRIBUTION
▪ Sample $n = 2$ values and calculate $\bar{x}$
▪ Do this for all possible samples of size $n = 2$
▪ You will get a distribution of $\bar{x}$-values: the distribution of $\bar{X}$
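The enumeration described above can be sketched in a few lines. One assumption is made here that the slide does not state: the two values are drawn with replacement, so there are $4^2 = 16$ equally likely ordered samples. Under that assumption the mean of all $\bar{x}$-values equals $\mu = 1.5$ and their standard deviation equals $\sigma/\sqrt{2}$.

```python
import itertools
import math
import statistics
from collections import Counter

population = [0, 1, 2, 3]
n = 2

# Population parameters (population SD divides by N, hence pstdev)
mu = statistics.fmean(population)       # 1.5
sigma = statistics.pstdev(population)   # about 1.118

# Assumption: ordered samples drawn WITH replacement, 4**2 = 16 in total
samples = list(itertools.product(population, repeat=n))
xbars = [statistics.fmean(s) for s in samples]

# Frequency table of the x-bar values: the sampling distribution of X-bar
dist = Counter(xbars)
for value in sorted(dist):
    print(f"x-bar = {value:.1f}: {dist[value]}/{len(samples)}")

print("mean of x-bar values:", statistics.fmean(xbars))   # equals mu
print("SD of x-bar values:  ", statistics.pstdev(xbars))  # equals sigma / sqrt(n)
print("sigma / sqrt(n):     ", sigma / math.sqrt(n))
```

The $\bar{x}$-values pile up around 1.5 (four of the 16 samples give exactly 1.5), even though the population itself is flat.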
THE SAMPLING DISTRIBUTION
▪ We will study the variability of a sample statistic used to estimate a population parameter
▪ We will do so by studying how the sample statistic varies when you draw a different sample
▪ Example:
  ▪ GMAT scores of MBA students
  ▪ $N = 2637$ (population size)
  ▪ $\mu = 520.78$
  ▪ $\sigma = 86.60$
THE SAMPLING DISTRIBUTION
▪ Consider eight random samples, each of size $n = 5$
  ▪ the sample means ($\bar{x}_1 = 504.0$, $\bar{x}_2 = 576.0$, …, $\bar{x}_8 = 582$) tend to be close to the population mean ($\mu = 520.78$)
  ▪ sometimes a bit lower, sometimes a bit higher
THE SAMPLING DISTRIBUTION
▪ The dot plots show that the sample means ($\bar{x}_1, \ldots, \bar{x}_8$) have much less variation than the individual data points ($x_1, \ldots, x_{2637}$)
THE SAMPLING DISTRIBUTION
▪ An estimator is a random variable, since samples vary
  ▪ so we write it with a capital letter, e.g., $X$, $\bar{X}$, $P$, $S$, etc.
▪ The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of (a fixed) size $n$ is taken
  ▪ so we write, e.g., $X \sim N(\mu, \sigma^2)$
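THE SAMPLING DISTRIBUTION
▪ The sampling distribution of $\bar{X}$
  ▪ for a population with $\mu = \mu_X$ and $\sigma^2 = \sigma_X^2$
▪ If the CLT holds: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$ (3 things: shape, mean, dispersion)
▪ So, the statistic $\bar{X}$
  ▪ is normally distributed
  ▪ has mean $\mu_X$
  ▪ and has standard deviation $\frac{\sigma_X}{\sqrt{n}}$
▪ Fortunately, this holds in many practical situations: exactly when the population itself is normal, and approximately for a large enough $n$ (the CLT)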
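A quick simulation can make the three claims (shape, mean, dispersion) tangible. This is a sketch under illustrative assumptions: it reuses the discrete uniform population {0, 1, 2, 3} from the earlier example, draws with replacement, and picks $n = 30$, 10,000 replications and a fixed seed purely for demonstration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

population = [0, 1, 2, 3]           # discrete uniform population from the earlier slide
mu = statistics.fmean(population)   # mu_X = 1.5
sigma = statistics.pstdev(population)

n = 30          # sample size (illustrative choice)
reps = 10_000   # number of simulated samples (illustrative choice)

# Draw many samples (with replacement) and record each sample mean
xbars = [statistics.fmean(random.choices(population, k=n)) for _ in range(reps)]

print("mean of sample means:", statistics.fmean(xbars), "vs mu_X =", mu)
print("SD of sample means:  ", statistics.stdev(xbars),
      "vs sigma_X / sqrt(n) =", sigma / math.sqrt(n))
```

A histogram of `xbars` would also look roughly bell-shaped, although the population is flat.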
THE SAMPLING DISTRIBUTION
▪ The standard deviation of the distribution of sample means $\bar{X}$
  ▪ is given by $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$
  ▪ has a special name: standard error of the mean
  ▪ is often abbreviated as the standard error (SE); that's a bit confusing, because we will meet more standard errors later on
  ▪ decreases with increasing sample size
  ▪ but only according to the "law of diminishing returns" ($1/\sqrt{n}$)
  ▪ is often calculated by software (SPSS, etc.)
  ▪ is the basis for confidence intervals and hypothesis tests (see later)
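The "law of diminishing returns" follows from the $\sqrt{n}$ in the denominator: to halve the standard error you need four times as many observations. The sketch below uses $\sigma = 86.60$ from the GMAT example; the sample sizes are arbitrary illustrative choices.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 86.60  # population SD of the GMAT scores in the example

# Each step quadruples n, which only halves the SE
for n in (5, 20, 80, 320):
    print(f"n = {n:3d}: SE = {standard_error(sigma, n):.2f}")
```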
EXERCISE 1
What is the meaning of the standard error?
CONFIDENCE INTERVALS FOR $\mu$
▪ A sample mean $\bar{x}$ is a point estimate of the population mean $\mu$
  ▪ it is our best single-number estimate of $\mu$
  ▪ but it will probably not be completely right
▪ To simplify notation, we will drop the "$X$" from $\mu_X$ now, and write just $\mu$
▪ A confidence interval (CI) for the mean is a range of possible values for $\mu$: $\mu_{\text{lower}} \le \mu \le \mu_{\text{upper}}$
  ▪ such that intervals $\text{CI}_{\mu} = [\mu_{\text{lower}}, \mu_{\text{upper}}]$ constructed this way contain the true value ($\mu$) with a certain probability (e.g., 95%)
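CONFIDENCE INTERVALS FOR $\mu$
▪ From the CLT it follows that, under certain conditions:
  ▪ the distribution of $\bar{X}$ is normal
  ▪ the mean of $\bar{X}$ is $\mu$, so the best estimate of $\mu$ is $\bar{x}$
  ▪ the standard deviation of $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$
▪ This implies that:
  ▪ with probability 2.5%, $\bar{X} < \mu - 1.96\,\frac{\sigma}{\sqrt{n}} \Leftrightarrow \mu > \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$
  ▪ with probability 2.5%, $\bar{X} > \mu + 1.96\,\frac{\sigma}{\sqrt{n}} \Leftrightarrow \mu < \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}}$
  ▪ so with probability 95%, $\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$
▪ So, if we find a sample mean $\bar{x}$, we can construct the following 95% confidence interval for $\mu$:
  $\text{CI}_{\mu,0.95} = \left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$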
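CONFIDENCE INTERVALS FOR $\mu$
Three notations for a confidence interval for $\mu$:
▪ $\left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$
▪ $\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$
▪ $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$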
CONFIDENCE INTERVALS FOR $\mu$
Example
▪ Population
  ▪ $\mu = 520.78$ (unknown)
  ▪ $\sigma = 86.60$ (known)
  ▪ normally distributed (assumed)
▪ Sample
  ▪ $n = 5$ (chosen)
  ▪ $\bar{x} = 504.0$ (estimated)
▪ Calculation
  ▪ standard error of the mean: $\frac{86.60}{\sqrt{5}} = 38.73$
  ▪ $1.96 \times 38.73 = 75.91$
  ▪ $\text{CI}_{\mu,0.95} = [428.09, 579.91]$
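The calculation above can be reproduced with a small helper. The function name `z_interval` is hypothetical; it simply implements $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ and assumes, as the example does, that $\sigma$ is known.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """95% CI for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)   # standard error of the mean
    margin = z * se             # half-width of the interval
    return xbar - margin, xbar + margin

lower, upper = z_interval(xbar=504.0, sigma=86.60, n=5)
print(f"[{lower:.2f}, {upper:.2f}]")  # prints [428.09, 579.91]
```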
EXERCISE 2
Write the confidence interval $[428.09, 579.91]$ in two alternative ways.
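CONFIDENCE INTERVALS FOR $\mu$
▪ The factor 1.96 is of course related to the 95% probability
▪ Other confidence levels use other factors, e.g., 1.645 for 90% and 2.576 for 99%
  ▪ in general the factor is $z_{\alpha/2}$, where $z_{\alpha/2}$ is such that $P(Z > z_{\alpha/2}) = \alpha/2$ if $Z$ is drawn from a standard normal distribution $N(0, 1)$
▪ General form of a $(1 - \alpha) \times 100\%$ confidence interval of the mean:
  $\text{CI}_{\mu,1-\alpha} = \left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$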
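Instead of looking up $z_{\alpha/2}$ in a table, it can be computed from the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `z_critical` and `ci_known_sigma` are hypothetical. Applied to the GMAT example ($\bar{x} = 504.0$, $\sigma = 86.60$, $n = 5$), it also shows that a higher confidence level gives a wider interval.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the value with P(Z > z) = alpha/2 for Z ~ N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """(1 - alpha) x 100% CI for mu, assuming sigma is known."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# GMAT example: the interval widens as the confidence level rises
for level in (0.90, 0.95, 0.99):
    lo, hi = ci_known_sigma(504.0, 86.60, 5, level)
    print(f"{level:.0%}: z = {z_critical(level):.3f}, CI = [{lo:.2f}, {hi:.2f}]")
```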
CONFIDENCE INTERVALS FOR $\mu$
▪ Trade-off (for a given sample size $n$)
  ▪ narrow CI $\Leftrightarrow$ low confidence level
  ▪ wide CI $\Leftrightarrow$ high confidence level
▪ Choice of confidence level depends on the application
  ▪ e.g., more precision is required for a refinery than for a dairy farm
CONFIDENCE INTERVALS FOR $\mu$
▪ A particular confidence interval either does or does not contain $\mu$
▪ The confidence level quantifies the risk of the procedure
▪ Out of 100 independently constructed 95% confidence intervals, approximately 95 will contain $\mu$, while approximately 5 will not
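The "95 out of 100" interpretation can be checked by simulation. This sketch assumes a normal population with the GMAT values $\mu = 520.78$ and $\sigma = 86.60$ (the normality is the same assumption as in the worked example), samples of size $n = 5$, and an arbitrary seed and number of repetitions. It counts how many of the intervals $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ contain $\mu$.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

mu, sigma = 520.78, 86.60   # GMAT population values from the example
n, reps = 5, 10_000
se = sigma / math.sqrt(n)   # sigma is treated as known

hits = 0
for _ in range(reps):
    # Assumption: the population is normal, as in the worked example
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    if xbar - 1.96 * se <= mu <= xbar + 1.96 * se:
        hits += 1  # this interval happens to contain the true mu

coverage = hits / reps
print(f"{hits} of {reps} intervals contain mu ({coverage:.1%})")
```

Each individual interval either contains $\mu$ or not; only the long-run fraction is close to 95%.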
HYPOTHESIS TESTS FOR $\mu$
▪ We can use the standard error to perform a hypothesis test
  ▪ recall that $\text{CI}_{\mu,0.95} = [428.09, 579.91]$
▪ Suppose we hypothesize $\mu = 550$
▪ The value 550 is inside the 95% confidence interval for $\mu$
  ▪ therefore the sample statistic and confidence interval do not suggest that the hypothesis ($\mu = 550$) is wrong
  ▪ and we will not reject the hypothesis
  ▪ notice that we didn't say that $\mu = 550$; we only said that we can't reject it (at a 5% significance level)
HYPOTHESIS TESTS FOR $\mu$
▪ Another example: suppose we hypothesize that $\mu = 600$
▪ The value 600 is outside the confidence interval for $\mu$
  ▪ finding a confidence interval not containing $\mu$ happens in only 5% of the cases
  ▪ so we conclude that $\mu \neq 600$ (at a 5% significance level)
  ▪ therefore the sample statistic and confidence interval suggest that the hypothesis ($\mu = 600$) is wrong
  ▪ and we will reject the hypothesis
Much more on hypothesis tests later on!
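The decision rule used in these two examples (reject the hypothesized value if it falls outside the 95% confidence interval) fits in one small function. The name `reject_via_ci` is hypothetical, and the sketch covers only this CI-based, two-sided rule at the 5% level, not hypothesis testing in general.

```python
def reject_via_ci(mu0, ci):
    """Reject H0: mu = mu0 (5% level, two-sided) iff mu0 lies outside the 95% CI."""
    lower, upper = ci
    return not (lower <= mu0 <= upper)

ci = (428.09, 579.91)  # 95% CI from the GMAT example

for mu0 in (550, 600):
    decision = "reject" if reject_via_ci(mu0, ci) else "do not reject"
    print(f"H0: mu = {mu0} -> {decision}")
```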
THE $t$-DISTRIBUTION
▪ A closer look at $\text{CI}_{\mu,0.95} = \left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$
▪ Given a sample mean $\bar{x}$, you can find a 95% confidence interval for the population mean $\mu$
▪ Sounds great when you don't know $\mu$ ...
▪ ... but it assumes you do know $\sigma$!
▪ There are many situations in which you don't know $\mu$ and you also don't know $\sigma$
▪ So what to do?
THE $t$-DISTRIBUTION
▪ A simple strategy
  ▪ If the population standard deviation $\sigma$ is unknown, we can estimate it with the sample standard deviation $s$
  ▪ Then we use $\pm 1.96\,\frac{s}{\sqrt{n}}$ instead of $\pm 1.96\,\frac{\sigma}{\sqrt{n}}$
▪ But we pay a price for that
  ▪ The reason is that $s$ is itself an estimate of $\sigma$, and therefore uncertain
  ▪ The price we pay is that the factor "1.96" must be somewhat larger
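Why the factor must grow can be seen by simulation. This sketch assumes a normal population with the GMAT values ($\mu = 520.78$, $\sigma = 86.60$) and a small sample size $n = 5$; the seed and number of repetitions are arbitrary. It plugs $s$ into $\bar{x} \pm 1.96\,s/\sqrt{n}$ and counts how often the interval contains $\mu$. The coverage comes out clearly below 95% (around 88% for $n = 5$), which is exactly the price mentioned above.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the run is reproducible

mu, sigma = 520.78, 86.60   # assumed normal population (GMAT values from the example)
n, reps = 5, 20_000

hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)        # estimate of sigma (n - 1 denominator)
    margin = 1.96 * s / math.sqrt(n)    # plug-in: s replaces the unknown sigma
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

coverage = hits / reps
print(f"coverage of x-bar +/- 1.96 s / sqrt(n) with n = {n}: {coverage:.1%}")
```

For larger $n$, $s$ is a more precise estimate of $\sigma$ and the coverage moves back toward 95%.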