samples and statistics



  1. ST 435/535 Statistical Methods for Quality and Productivity Improvement / Statistical Process Control. Inferences About Process Quality: Statistics and Sampling Distributions. Samples and Statistics. “The objective of statistical inference is to draw conclusions or make decisions about a population, based on a sample selected from the population.” Inference is simplest when the sample is a random sample from the population: the sample values $X_1, X_2, \ldots, X_n$ are statistically independent and all have the same distribution. That is not possible when sampling without replacement from a finite population of size $N$; in that case, a random sample is one that is drawn in such a way that all $\binom{N}{n}$ possible samples have the same probability of being chosen.
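The finite-population definition can be made concrete by enumeration. The sketch below is only an illustration: the five-item population and the sample size are made up, and `all_samples` is a hypothetical helper name.

```python
import itertools
import math
from fractions import Fraction


def all_samples(population, n):
    """Every size-n subset that sampling without replacement can produce."""
    return list(itertools.combinations(population, n))


population = ["a", "b", "c", "d", "e"]  # hypothetical finite population, N = 5
n = 2

samples = all_samples(population, n)

# The number of possible samples is the binomial coefficient C(N, n).
num_samples = len(samples)

# Under random sampling, each possible sample has the same probability.
prob_each = Fraction(1, math.comb(len(population), n))

print(num_samples, prob_each)
```

A call such as `random.sample(population, n)` from the standard library draws one of these subsets without replacement.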

  2. It is not always possible or desirable to use a random sample. For example, the successive values plotted in a control chart are rarely independent, because they are influenced by slow-changing properties of the system. When we know, or suspect, that the sample was not a random sample, we should use appropriate methods.

  3. Statistic. A statistic is a quantity that can be calculated from only the values in a sample. Examples of statistics:
     Sample mean:
     $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i;$$
     Sample standard deviation:
     $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2};$$
     A quantity like $\bar{x} - \mu$ is not a statistic, because to calculate it we must know the value of the population parameter $\mu$.
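As a small illustration of the two statistics defined above, the following sketch computes them from the sample values alone. The data are made up, and the function names are hypothetical.

```python
import math
import statistics


def sample_mean(xs):
    """x-bar = (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """s = sqrt( sum (x_i - x-bar)^2 / (n - 1) )."""
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))


data = [9.8, 10.1, 10.4, 9.9, 10.3]  # made-up measurements

xbar = sample_mean(data)
s = sample_sd(data)

# The standard library uses the same n - 1 divisor in statistics.stdev.
print(xbar, s, statistics.stdev(data))
```

Neither function takes $\mu$ or $\sigma$ as an argument, which is exactly what makes these quantities statistics.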

  4. Sampling distribution. A statistic computed from a random sample is itself a random variable, and has its own probability distribution. The distribution of a statistic of a random sample is called its sampling distribution, to emphasize that we are dealing with a statistic and not a single observation.

  5. Sampling from a normal distribution. Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a normal population with mean $\mu$ and variance $\sigma^2$. That is, $X_1, X_2, \ldots, X_n$ are independent, and each is distributed as $N(\mu, \sigma^2)$. Then the sampling distribution of the sample mean $\bar{X}$ is $N(\mu, \sigma^2/n)$, or equivalently
     $$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
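The equivalence of the two forms can be checked numerically. This is a minimal sketch with hypothetical values for $\mu$, $\sigma$ and $n$; it computes a tail probability for $\bar{X}$ once from $N(\mu, \sigma^2/n)$ directly and once through the standardized $Z$.

```python
import math
from statistics import NormalDist


def sample_mean_distribution(mu, sigma, n):
    """Sampling distribution of the mean: N(mu, sigma^2 / n)."""
    return NormalDist(mu, sigma / math.sqrt(n))


def z_score(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))


mu, sigma, n = 50.0, 2.0, 16  # hypothetical population values and sample size

# P(X-bar > 51), computed two equivalent ways.
p_direct = 1 - sample_mean_distribution(mu, sigma, n).cdf(51.0)
p_via_z = 1 - NormalDist().cdf(z_score(51.0, mu, sigma, n))

print(p_direct, p_via_z)
```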

  6. The sampling distribution of the sample variance is a scaled chi-square distribution:
     $$\chi^2 = \frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
     The $\chi^2$ distribution with $\nu$ degrees of freedom, here $n - 1$, is the Gamma distribution with shape parameter $r = \nu/2$ and rate parameter $\lambda = 1/2$.
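The Gamma connection can be written out as a short worked equation. It assumes the usual shape-rate form of the Gamma density, which the slides do not state explicitly. Substituting $r = \nu/2$ and $\lambda = 1/2$ gives the chi-square density, and the Gamma mean and variance then give the mean and variance of $S^2$ for a normal population.

```latex
\begin{align*}
f(x; r, \lambda) &= \frac{\lambda^{r}}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0
  && \text{(Gamma density, shape $r$, rate $\lambda$)} \\
f(x; \nu/2, 1/2) &= \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1} e^{-x/2}
  && \text{($\chi^2_\nu$ density)} \\
\mathrm{E}(\chi^2_\nu) &= \frac{r}{\lambda} = \nu, \qquad
\mathrm{Var}(\chi^2_\nu) = \frac{r}{\lambda^{2}} = 2\nu \\
\mathrm{E}(S^2) &= \frac{\sigma^2}{n-1}\,(n-1) = \sigma^2, \qquad
\mathrm{Var}(S^2) = \frac{\sigma^4}{(n-1)^2}\, 2(n-1) = \frac{2\sigma^4}{n-1}
\end{align*}
```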

  7. These sampling distributions are used to derive confidence intervals for $\mu$ and $\sigma^2$, respectively. However, the confidence interval for $\mu$ requires that we know the value of $\sigma$; this is rarely the case. When $\sigma$ is unknown, we use a third sampling result: the sampling distribution of
     $$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
     is Student’s $t$-distribution with $n - 1$ degrees of freedom.

  8. Sampling from a Bernoulli distribution. Recall the notion of a sequence of independent trials, each resulting in success or failure, used to introduce the binomial distribution. Let $X_i$ be the indicator of success at the $i$th trial:
     $$X_i = \begin{cases} 1 & \text{if the $i$th trial is a success;} \\ 0 & \text{if the $i$th trial is a failure.} \end{cases}$$
     Each $X_i$ follows the Bernoulli distribution with parameter $p = P(X_i = 1)$.

  9. The number of successes in $n$ trials is $X = X_1 + X_2 + \cdots + X_n$, which follows the binomial distribution with parameters $n$ and $p$. The sample mean $\bar{X} = X/n = \hat{p}$ also has a discrete distribution, most easily described in terms of the distribution of $X$; in particular $\mathrm{E}(\bar{X}) = p$ and $\mathrm{Var}(\bar{X}) = p(1-p)/n$. By the Central Limit Theorem, $\bar{X}$ is approximately normal, $N(p, p(1-p)/n)$.
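The stated mean and variance of $\hat{p}$ can be verified exactly from the binomial probabilities. The sketch below uses hypothetical values of $n$ and $p$; `phat_moments` is an illustrative helper, not something from the course materials.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def phat_moments(n, p):
    """Exact mean and variance of p-hat = X / n, with X ~ Binomial(n, p)."""
    values = [k / n for k in range(n + 1)]
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(v * q for v, q in zip(values, probs))
    var = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return mean, var


n, p = 50, 0.1  # hypothetical number of trials and success probability

mean, var = phat_moments(n, p)

# Compare with the closed forms p and p(1 - p)/n.
print(mean, p)
print(var, p * (1 - p) / n)
```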

  10. Sampling from a Poisson distribution. If $X_1, X_2, \ldots, X_n$ are independent and each has the Poisson distribution with parameter $\lambda$, then $X = X_1 + X_2 + \cdots + X_n$ follows the Poisson distribution with parameter $n\lambda$. The sample mean $\bar{X} = X/n = \hat{\lambda}$ also has a discrete distribution, most easily described in terms of the distribution of $X$; in particular $\mathrm{E}(\bar{X}) = \lambda$ and $\mathrm{Var}(\bar{X}) = \lambda/n$. By the Central Limit Theorem, $\bar{X}$ is approximately normal, $N(\lambda, \lambda/n)$.

  11. More generally, if $X_1, X_2, \ldots, X_n$ are independent and $X_i$ has the Poisson distribution with parameter $\lambda_i$, then $X = X_1 + X_2 + \cdots + X_n$ follows the Poisson distribution with parameter $\sum_{i=1}^{n} \lambda_i$.
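For two variables, the additivity of Poisson parameters can be checked by convolving the two probability mass functions directly. This is a numerical sketch with hypothetical parameter values, not a proof.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def pmf_of_sum(k, lam1, lam2):
    """P(X1 + X2 = k) by direct convolution of two independent Poisson pmfs."""
    return sum(
        poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1)
    )


lam1, lam2 = 1.5, 2.5  # hypothetical parameters

# Largest discrepancy between the convolution and Poisson(lam1 + lam2).
max_diff = max(
    abs(pmf_of_sum(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2))
    for k in range(20)
)
print(max_diff)
```

The discrepancy should be at the level of floating-point rounding; the general case of $n$ variables follows by repeating the argument.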

  12. Inferences About Process Quality: Point Estimation of Process Parameters. Point Estimation. In any of these sampling contexts, we need to make inferences about the parameter(s) of the corresponding model. A point estimator of a parameter is a sample statistic that approximates the parameter. As a statistic, it has a sampling distribution, with a mean and a variance. The standard deviation of its sampling distribution is called its standard error.

  13. If an estimator $\hat{\theta}$ of some parameter $\theta$ satisfies $\mathrm{E}(\hat{\theta}) = \theta$, it is called unbiased. In some situations, but not all, unbiased estimators are best. The mean squared error of an estimator $\hat{\theta}$ of some parameter $\theta$ is
     $$\mathrm{E}[(\hat{\theta} - \theta)^2] = \mathrm{bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta}),$$
     which for an unbiased $\hat{\theta}$ is just $\mathrm{Var}(\hat{\theta})$. In a random sample, the sample mean $\bar{X}$ and variance $s^2$ are always unbiased estimators of the population mean $\mu$ and variance $\sigma^2$, respectively, but $s$ is biased for $\sigma$.
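The decomposition of mean squared error, and the remark that unbiased estimators are not always best, can be illustrated with exact binomial calculations. In this sketch the "shrunk" estimator $(X+1)/(n+2)$ is a hypothetical biased alternative chosen for illustration; it does not appear in the slides.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def bias_var_mse(estimator, n, p):
    """Exact bias, variance and MSE of estimator(X, n) with X ~ Binomial(n, p)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    ests = [estimator(k, n) for k in range(n + 1)]
    mean = sum(e * q for e, q in zip(ests, probs))
    bias = mean - p
    var = sum((e - mean) ** 2 * q for e, q in zip(ests, probs))
    mse = sum((e - p) ** 2 * q for e, q in zip(ests, probs))
    return bias, var, mse


def unbiased(k, n):
    """The usual estimator p-hat = X / n."""
    return k / n


def shrunk(k, n):
    """Hypothetical biased alternative that pulls the estimate toward 1/2."""
    return (k + 1) / (n + 2)


n = 10
for p in (0.3, 0.05):
    for name, est in (("unbiased", unbiased), ("shrunk", shrunk)):
        bias, var, mse = bias_var_mse(est, n, p)
        print(p, name, bias, var, mse)
```

With these made-up settings the biased estimator has the smaller mean squared error at $p = 0.3$ and the larger one at $p = 0.05$, so neither dominates.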

  14. In some situations, the sample range, $x_{(n)} - x_{(1)}$, has been used to construct an estimator of the population standard deviation $\sigma$ because it requires little computation. This construction is critically dependent on the assumption that the data are normally distributed; for any other distribution, the relationship between the range and the standard deviation is different.
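One common form of the range-based construction divides the sample range by a tabulated constant, usually written $d_2$ in the control-chart literature. The slides do not name this constant, so treating it as the intended construction is an assumption; the values below are the standard tabulated ones for normal data, rounded to three decimals, and the data are made up.

```python
# Tabulated d2 = E(range) / sigma for normal samples of size n.
# These constants are valid only under normality.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sigma_from_range(xs):
    """Estimate sigma as (largest - smallest) / d2 for a small normal sample."""
    n = len(xs)
    if n not in D2:
        raise ValueError(f"no d2 constant tabulated here for n = {n}")
    return (max(xs) - min(xs)) / D2[n]


data = [9.8, 10.1, 10.4, 9.9, 10.3]  # made-up measurements

sigma_hat = sigma_from_range(data)
print(sigma_hat)
```

For non-normal data the ratio of expected range to $\sigma$ differs, so the same table would give a biased estimate.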

  15. Inferences About Process Quality: Statistical Inference for a Single Sample. Inference for a Single Sample. Inferences about some parameter may be made using: a point estimator; an interval estimator; a hypothesis test.

  16. Mean of a normal population. Point estimator. The usual point estimator of $\mu$ is the unbiased $\bar{X}$. The sampling distribution of $\bar{X}$ is $N(\mu, \sigma^2/n)$, so its standard error is $\sigma/\sqrt{n}$. When $\sigma$ is unknown, we replace it by $s$ to get the estimated standard error $s/\sqrt{n}$.

  17. Interval estimator. The usual interval estimator is a confidence interval, derived from the distribution of $Z$ (when $\sigma$ is known) or $T$ (when $\sigma$ is unknown).
     Known $\sigma$:
     $$\bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
     Unknown $\sigma$:
     $$\bar{X} \pm t_{\alpha/2,\, n-1} \times \frac{s}{\sqrt{n}}$$
     In each case, the interval contains $\mu$ with probability $1 - \alpha$, and is called a $100(1-\alpha)\%$ confidence interval. The confidence level $100(1-\alpha)\%$ is often 95%, but sometimes 99% is preferred.
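Both intervals can be sketched with the Python standard library. The data and the "known" $\sigma$ below are made up. Because the standard library has no $t$ quantile function, the critical value $t_{0.025,\,4} = 2.776$ is taken from a standard table and passed in as a parameter; that is an assumption of this sketch rather than part of the slides.

```python
import math
import statistics
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha=0.05):
    """Known sigma: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half


def t_interval(xbar, s, n, t_crit):
    """Unknown sigma: xbar +/- t_{alpha/2, n-1} * s / sqrt(n).

    t_crit must be supplied by the caller, for example from a t table.
    """
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half


data = [9.8, 10.1, 10.4, 9.9, 10.3]  # made-up measurements
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)  # n - 1 divisor

known = z_interval(xbar, sigma=0.25, n=n)  # sigma = 0.25 is hypothetical

T_CRIT_4DF = 2.776  # tabulated t_{0.025, 4}, for a 95% interval with n = 5
unknown = t_interval(xbar, s, n, T_CRIT_4DF)

print(known)
print(unknown)
```

With only five observations the $t$ interval is noticeably wider than the $z$ interval, reflecting the extra uncertainty from estimating $\sigma$ by $s$.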
