Precision, Accuracy, Standard Error & the Central Limit Theorem
“He uses statistics as a drunken man uses lamp posts, for support rather than illumination.” – Andrew Lang (Scottish poet)
Statistical Vocabulary Review
• Descriptive statistics – numerical/graphical summary of data
• Inferential statistics – make conclusions about a population, or predict or control the values of variables, based on sample data
• Distribution – the probability associated with each possible value of a variable; a.k.a. probability function
• Parameter – a population characteristic; unknown, so it must be estimated from a sample (e.g. the mean of a population)
• Statistic – an estimate of a parameter (e.g. the mean of a sample)
• Error – the difference between an observed (or calculated) value and its true (or expected) value
• Degrees of freedom – the number of values in the final calculation of a statistic that are free to vary
More Terminology
• Precision – a measure of how close measured/estimated values are to each other
• Accuracy – a measure of how close an estimator is expected to be to the true value of a parameter
• Bias – how far the average statistic lies from the parameter it is estimating; the systematic error that arises when measuring or estimating a parameter
• Error – the difference between an observed (or calculated) value and its true (or expected) value
Errors from chance will cancel each other out in the long run, BUT those from bias will not (see the simulation sketch below).
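To make that last point concrete, here is a minimal Python sketch (not part of the original slides) that simulates repeated measurements of a true value with random noise only, and with an added systematic bias. The true value, noise level, and bias size are illustrative assumptions; averaging more measurements shrinks the chance error but leaves the bias untouched.

```python
import numpy as np

rng = np.random.default_rng(42)

true_value = 10.0   # assumed "true" parameter value (illustrative)
noise_sd = 2.0      # random (chance) error: mean zero
bias = 1.5          # systematic error: constant offset

for n in (10, 100, 10_000):
    noisy = true_value + rng.normal(0, noise_sd, size=n)          # chance error only
    biased = true_value + bias + rng.normal(0, noise_sd, size=n)  # chance error + bias
    print(f"n={n:>6}:  mean (chance only) = {noisy.mean():6.3f}, "
          f"mean (with bias) = {biased.mean():6.3f}")

# As n grows, the chance-only mean converges to 10.0,
# while the biased mean converges to 11.5: the bias does not cancel out.
```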
[Figure: target diagrams showing the combinations of high/low bias and high/low precision, and the resulting accuracy and error]
Standard error
“How confident are we in our statistic?”
Standard error – the standard deviation of a statistic
Standard error of the mean – reflects the spread of the means you would get from repeatedly resampling:
SE of the mean = s / √n, where s = sample standard deviation and n = sample size
Small values: the sample is more likely to be representative of the overall population
Large values: the sample is less likely to adequately represent the overall population
For parameter estimation, less precision is reflected in a larger standard error.
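As an illustration, here is a minimal Python sketch (not from the original slides) that computes the standard error of the mean for a small made-up sample using the formula above; the data values are invented purely for demonstration.

```python
import numpy as np

sample = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7])  # made-up data

s = sample.std(ddof=1)       # sample standard deviation (n - 1 in the denominator)
n = sample.size
se_mean = s / np.sqrt(n)     # standard error of the mean: s / sqrt(n)

print(f"mean = {sample.mean():.3f}, s = {s:.3f}, n = {n}, SE of mean = {se_mean:.3f}")
# A larger n (or a smaller s) gives a smaller standard error,
# i.e. a more precise estimate of the population mean.
```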
The Central Limit Theorem
“Sample means tend to cluster around the central population value.”
Therefore: when the sample size is large, you can assume that x̄ is close to the value of μ.
• With a small sample size, there is a greater chance of getting a mean that is far from the true population mean.
[Figure: normal distribution vs. t-distribution (the sampling distribution)]
The Central Limit Theorem
Prove it to yourself: www.tinyurl.com/clt-simulator
1. Click Begin on the left
2. Select Uniform distribution as the parent population
3. For the first measurement, select Mean with N = 2 (only 2 samples are used to generate each mean)
4. For the second measurement, select Mean with N = 25 (25 samples are used to generate each mean)
5. Repeatedly click Animated to watch points be randomly selected and the mean of each sample be generated and plotted in the distributions below
6. Click 5, 10,000, or 100,000 to run many samplings at once (the number of times you sample the population)
7. How does the resulting distribution of the means change? (A code sketch reproducing this simulation follows below.)
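If you prefer to reproduce the demo offline, here is a minimal Python sketch (not part of the original slides) of the same idea: draw many samples from a uniform parent population, compute the mean of each sample for N = 2 and N = 25, and compare the resulting sampling distributions. The number of repeats and the plotting details are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_repeats = 100_000                      # how many times we sample the population

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
for ax, n in zip(axes, (2, 25)):
    # Each row is one sample of size n drawn from a Uniform(0, 1) parent population
    samples = rng.uniform(0, 1, size=(n_repeats, n))
    means = samples.mean(axis=1)         # sampling distribution of the mean
    ax.hist(means, bins=60, density=True)
    ax.set_title(f"Means of samples with N = {n}\n(SD of means = {means.std():.3f})")
    ax.set_xlabel("sample mean")

plt.tight_layout()
plt.show()
# With N = 25 the distribution of means is much narrower and closer to normal
# than with N = 2, even though the parent population is uniform.
```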