Sampling
Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven
Table of Contents
1. Sampling Distributions
2. Central Limit Theorem
3. Binomial Distribution
Sampling Distributions
Parameters and Statistics
As we begin to use sample data to draw conclusions about a wider population, we must be clear about whether a number describes a sample or a population.
A parameter is a number that describes some characteristic of the population. In statistical practice, the value of a parameter is not known because we cannot examine the entire population.
A statistic is a number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data. We often use a statistic to estimate an unknown parameter.
Remember s and p: statistics come from samples and parameters come from populations.
We write µ (the Greek letter mu) for the population mean and σ for the population standard deviation. We write x̄ (x-bar) for the sample mean and s for the sample standard deviation.
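To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a hypothetical population (the integers 1 to 100) that we can list in full, which is exactly what real statistical practice does not allow. Parameters are computed from the whole population; statistics are computed from one SRS of size 10.

```python
import random
import statistics

# Hypothetical population: the integers 1..100. In practice the full
# population is NOT available; it is listed here only for illustration.
population = list(range(1, 101))

# Parameters describe the population.
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population SD (divides by N)

# Statistics describe a sample: here one SRS of size 10.
random.seed(1)
sample = random.sample(population, 10)
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample SD (divides by n - 1)

print(f"parameters: mu = {mu}, sigma = {sigma:.3f}")
print(f"statistics: x-bar = {x_bar}, s = {s:.3f}")
```

Note that `pstdev` (population formula) is used for σ and `stdev` (divisor n − 1) for s.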
Statistical Estimation
The process of statistical inference involves using information from a sample to draw conclusions about a wider population.
Different random samples yield different statistics. We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference.
We can think of a statistic as a random variable because it takes numerical values that describe the outcomes of the random sampling process.
[Figure: collect data from a representative sample of the population, then make an inference about the population.]
Sampling Variability
Different random samples yield different statistics. This basic fact is called sampling variability: the value of a statistic varies in repeated random sampling.
To make sense of sampling variability, we ask, "What would happen if we took many samples?"
[Figure: many separate samples drawn from the same population.]
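One way to ask "what would happen if we took many samples?" is to simulate it. The sketch below assumes a hypothetical population of 10,000 whole numbers between 0 and 99, draws five separate SRSs of size 25, and records x̄ for each. The five values of x̄ differ from one another, which is sampling variability.

```python
import random
import statistics

# Hypothetical population of 10,000 values (generated with a fixed seed).
random.seed(2)
population = [random.randint(0, 99) for _ in range(10_000)]

# Five separate SRSs of size 25; compute x-bar for each one.
means = []
for _ in range(5):
    sample = random.sample(population, 25)
    means.append(statistics.mean(sample))

print("population mean:", statistics.mean(population))
print("sample means:   ", [round(m, 2) for m in means])
```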
Sampling Distributions
The law of large numbers assures us that if we measure enough subjects, the statistic x̄ will eventually get very close to the unknown parameter µ.
If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we'd have a sampling distribution.
The population distribution of a variable is the distribution of values of the variable among all individuals in the population.
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
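For a small enough population, "every one of the possible samples" can be listed exactly. The sketch below assumes a hypothetical population {2, 4, 6, 8} and SRSs of size 2 (without replacement), enumerates all 6 samples with `itertools.combinations`, and tabulates the exact sampling distribution of x̄. `Fraction` keeps the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
import statistics

# Hypothetical tiny population, small enough to list every possible SRS.
population = [2, 4, 6, 8]
n = 2

# Every SRS of size n (without replacement) and the sample mean of each.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of x-bar: each possible value and how often it occurs.
sampling_distribution = Counter(sample_means)
for value, count in sorted(sampling_distribution.items()):
    print(f"x-bar = {value}: {count}/{len(samples)}")

mu = Fraction(sum(population), len(population))
print("population mean:", mu)
print("mean of all sample means:", statistics.mean(sample_means))
```

The mean of all six sample means equals µ = 5 exactly, anticipating the next slide's statement that x̄ has no systematic tendency to fall above or below µ.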
Mean and Standard Deviation of a Sample Mean
Mean of the sampling distribution of a sample mean: There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus the mean of the sampling distribution of x̄ equals µ, and x̄ is an unbiased estimator of the population mean µ.
Standard deviation of the sampling distribution of a sample mean: The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n. Averages are less variable than individual observations.
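The √n factor can be checked by simulation. This sketch assumes a hypothetical right-skewed population of 100 units (mostly 1s and 2s, a few 10s), draws 20,000 samples of size n = 25 with replacement (to mimic a very large population), and compares the mean and standard deviation of the simulated x̄ values with µ and σ/√n.

```python
import math
import random
import statistics

# Hypothetical right-skewed population of 100 units.
population = [1] * 50 + [2] * 25 + [3] * 15 + [10] * 10
mu = statistics.mean(population)       # 2.45
sigma = statistics.pstdev(population)  # about 2.617

n = 25          # sample size
reps = 20_000   # number of simulated samples

random.seed(3)
# Sampling with replacement approximates drawing from a very large population.
means = [statistics.fmean(random.choices(population, k=n)) for _ in range(reps)]

print(f"mu = {mu:.4f}, mean of x-bars = {statistics.fmean(means):.4f}")
print(f"sigma/sqrt(n) = {sigma / math.sqrt(n):.4f}, "
      f"SD of x-bars = {statistics.pstdev(means):.4f}")
```

The simulated values should agree with µ and σ/√n up to simulation noise, even though the population is skewed.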
The Sampling Distribution of a Sample Mean
When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean µ and is less spread out than the population distribution. Here are the facts.
The Sampling Distribution of Sample Means
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then the mean of the sampling distribution of x̄ is $\mu_{\bar{x}} = \mu$, and the standard deviation of the sampling distribution of x̄ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
Note: These facts about the mean and standard deviation of x̄ are true no matter what shape the population distribution has.
If individual observations have the $N(\mu, \sigma)$ distribution, then the sample mean of an SRS of size n has the $N(\mu, \sigma/\sqrt{n})$ distribution regardless of the sample size n.
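When the population is itself normal, the sampling distribution of x̄ is exactly $N(\mu, \sigma/\sqrt{n})$, so probabilities about x̄ follow directly. The sketch below assumes a hypothetical N(100, 15) population and SRSs of size n = 9 (so x̄ is N(100, 5)), and uses Python's `statistics.NormalDist` (Python 3.8+) to compare a single observation with a sample mean.

```python
import math
from statistics import NormalDist

# Hypothetical normal population: mu = 100, sigma = 15; SRS of size n = 9.
mu, sigma, n = 100, 15, 9

individual = NormalDist(mu, sigma)                  # one observation: N(100, 15)
sample_mean = NormalDist(mu, sigma / math.sqrt(n))  # x-bar: exactly N(100, 5)

p_one = 1 - individual.cdf(110)    # P(X > 110)
p_mean = 1 - sample_mean.cdf(110)  # P(x-bar > 110)
print(f"P(X > 110)     = {p_one:.4f}")
print(f"P(x-bar > 110) = {p_mean:.4f}")
```

A single observation exceeds 110 about a quarter of the time, while the mean of 9 observations does so only about 2% of the time, since 110 is 2 standard errors above µ.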
Central Limit Theorem
Central Limit Theorem
"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the 'law of frequency of error' [the normal distribution]. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, and the greater the anarchy, the more perfect is its sway. It is the supreme law of Unreason." – Francis Galton
In the previous slide, the sampling distribution of X̄ is depicted as:
1. having mean µ, i.e., X̄ is unbiased;
2. having standard deviation σ/√n;
3. having a normal distribution.
The first two depictions are always true, regardless of sample size or population distribution. The Central Limit Theorem (below) says the third depiction is approximately true, regardless of population distribution, for large sample sizes n. As Francis Galton said, the averaged effects of random acts from a large mob form a familiar pattern.
Theorem (Central Limit Theorem, CLT). Consider a random sample of size n from a population with mean µ and standard deviation σ. For large n, the sampling distribution of X̄ is approximately $N(\mu, \sigma/\sqrt{n})$.
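A quick way to see the CLT at work is to simulate sample means from a population that is far from normal. This sketch assumes a hypothetical exponential population with µ = σ = 1 (strongly right-skewed), takes 10,000 samples of size n = 40, and checks the fraction of x̄ values within 1.96 standard errors of µ. If x̄ is approximately $N(\mu, \sigma/\sqrt{n})$, that fraction should be close to 0.95.

```python
import math
import random
import statistics

# Hypothetical strongly right-skewed population: exponential with mean 1 (sigma = 1).
mu, sigma = 1.0, 1.0
n, reps = 40, 10_000

random.seed(4)
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# Under the normal approximation, about 95% of the simulated means
# should fall within 1.96 standard errors of mu.
se = sigma / math.sqrt(n)
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(f"fraction within mu +/- 1.96*SE: {inside:.3f}")
```

Re-running with a smaller n (say 2) shows the approximation getting worse, which is why the theorem is stated for large n.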
Example
Based on service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a strongly right-skewed distribution whose most likely outcomes are close to 0. The mean time is µ = 1 hour and the standard deviation is σ = 1 hour. Your company will service an SRS of 70 air conditioners. You have budgeted 1.1 hours per unit. Will this be enough?
The sampling distribution of the mean time spent working on the 70 units has mean and standard deviation
$\mu_{\bar{x}} = \mu = 1$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1/\sqrt{70} \approx 0.12$.
By the central limit theorem, the sampling distribution of the mean time spent working is approximately $N(1, 0.12)$ because $n = 70 \ge 30$. Then
$z = \frac{1.1 - 1}{0.12} = 0.83$, so
$P(\bar{x} > 1.1) = P(Z > 0.83) = 1 - 0.7967 = 0.2033$.
If you budget 1.1 hours per unit, there is about a 20% chance the technicians will not complete the work within the budgeted time.
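The same calculation can be reproduced with Python's `statistics.NormalDist` (Python 3.8+). The sketch follows the example's route (standard error rounded to 0.12, z rounded to two decimals, standard normal lookup) and, for comparison, repeats it without intermediate rounding. Both numbers rest on the CLT's normal approximation; the small gap between them comes only from rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 70   # values from the example; SRS of 70 units
budget = 1.1                  # budgeted hours per unit

# Example's route: rounded standard error, z rounded to 2 decimals, table lookup.
se_rounded = 0.12
z = round((budget - mu) / se_rounded, 2)   # 0.83
p_table = 1 - NormalDist().cdf(z)          # about 0.2033

# Same normal approximation without intermediate rounding.
se = sigma / math.sqrt(n)                  # about 0.1195
p_unrounded = 1 - NormalDist(mu, se).cdf(budget)

print(f"z = {z}, P(x-bar > 1.1) ~ {p_table:.4f} (rounded), "
      f"{p_unrounded:.4f} (unrounded)")
```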
A Few More Facts
Any linear combination of independent Normal random variables is also Normally distributed.
More generally, the central limit theorem notes that the distribution of a sum or average of many small independent random quantities is close to Normal.
Finally, the central limit theorem also applies to discrete random variables.
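To illustrate the CLT for a discrete variable, this sketch computes the exact distribution of the sum of 30 fair die rolls by repeated convolution (`dice_sum_pmf` is a hypothetical helper written for this example) and compares P(sum ≤ 100) with the normal approximation $N(3.5n, \sqrt{n \cdot 35/12})$. Because the sum only takes whole-number values, the sketch adds the standard +0.5 continuity correction, a refinement not discussed on the slide.

```python
import math
from statistics import NormalDist

def dice_sum_pmf(n):
    """Exact pmf of the sum of n fair six-sided dice (hypothetical helper)."""
    pmf = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + prob / 6
        pmf = new
    return pmf

n = 30
pmf = dice_sum_pmf(n)
exact = sum(p for total, p in pmf.items() if total <= 100)

# One die has mean 3.5 and variance 35/12, so the sum has mean 3.5n and
# variance 35n/12. The +0.5 is a continuity correction for a discrete variable.
approx = NormalDist(3.5 * n, math.sqrt(n * 35 / 12)).cdf(100.5)
print(f"P(sum <= 100): exact {exact:.4f}, normal approximation {approx:.4f}")
```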
Binomial Distribution
Definition (Bernoulli Distribution, X ∼ BIN(1, p)). Model: X = number of heads after tossing a coin once, where the probability of heads on the toss is p.
Definition (Binomial Distribution, X ∼ BIN(n, p)). Model: X = number of heads after tossing a coin n times, where the tosses are independent and the probability of heads on each toss is p.
Theorem. If X ∼ BIN(n, p) and j is an integer with 0 ≤ j ≤ n, then
$P(X = j) = \binom{n}{j} p^j (1-p)^{n-j}$.
Furthermore,
$\mu_X = np$, $\sigma_X^2 = np(1-p)$, and $\sigma_X = \sqrt{np(1-p)}$.
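The theorem's formulas can be checked numerically. This sketch implements the binomial probability formula with `math.comb` (Python 3.8+; `binomial_pmf` is an illustrative helper, not from the slides), uses illustrative values n = 10 and p = 0.3, and confirms that the probabilities sum to 1 and that the mean and variance computed from them match np and np(1 − p).

```python
from math import comb

def binomial_pmf(j, n, p):
    """P(X = j) for X ~ BIN(n, p), using the formula from the theorem."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)

n, p = 10, 0.3   # illustrative values
probs = [binomial_pmf(j, n, p) for j in range(n + 1)]

# Mean and variance computed directly from the pmf.
mean = sum(j * q for j, q in enumerate(probs))
var = sum((j - mean) ** 2 * q for j, q in enumerate(probs))

print(f"total probability = {sum(probs):.6f}")
print(f"mean = {mean:.4f} (np = {n * p:.4f})")
print(f"variance = {var:.4f} (np(1-p) = {n * p * (1 - p):.4f})")
```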