UQ, STAT2201, 2017, Lecture 6 Unit 6 – Statistical Inference Ideas

  1. UQ, STAT2201, 2017, Lecture 6 Unit 6 – Statistical Inference Ideas.

  2. Statistical Inference is the process of forming judgements about the parameters of a population, typically on the basis of random sampling.

  3. The random variables $X_1, X_2, \ldots, X_n$ are an independent and identically distributed (i.i.d.) random sample of size $n$ if (a) the $X_i$'s are independent random variables and (b) every $X_i$ has the same probability distribution. A statistic is any function of the observations in a random sample, and the probability distribution of a statistic is called the sampling distribution.

  4. Any function of the observations, i.e. any statistic, is also a random variable. We call the probability distribution of a statistic a sampling distribution. A point estimate of some population parameter $\theta$ is a single numerical value $\hat{\theta}$ of a statistic $\hat{\Theta}$. The statistic $\hat{\Theta}$ is called the point estimator.

  5. The most common statistic we consider is the sample mean, $\bar{X}$, with a given value denoted by $\bar{x}$. The sample mean is a point estimator of the population mean, $\mu$.
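As a quick illustration, the sketch below (all numbers hypothetical, using only the Python standard library) draws a random sample and uses the sample mean $\bar{x}$ as a point estimate of $\mu$:

```python
import random
import statistics

# A hypothetical population: Normal with mu = 10, sigma = 2 (assumed values).
random.seed(1)
sample = [random.gauss(10, 2) for _ in range(50)]

# The sample mean x-bar is the point estimate of the population mean mu.
x_bar = statistics.mean(sample)
print(round(x_bar, 2))  # close to mu = 10, but not exactly equal
```

A different sample (or seed) gives a different estimate, which is exactly why the sampling distribution of $\bar{X}$ matters.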

  6. The Central Limit Theorem

  7. Central Limit Theorem (for sample means): If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, then the limiting form of the distribution of $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ as $n \to \infty$ is the standard normal distribution. This implies that $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
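The theorem can be checked by simulation. The sketch below (population choice and constants are hypothetical) standardizes many sample means from a non-normal Exponential(1) population and verifies that roughly 95% fall within $\pm 1.96$:

```python
import random
import math
import statistics

# Sketch: sample means from a (non-normal) Exponential(1) population,
# standardized as (x_bar - mu) / (sigma / sqrt(n)), look ~ N(0, 1).
random.seed(0)
mu, sigma, n = 1.0, 1.0, 40  # Exponential(1) has mean 1 and sd 1

z_values = []
for _ in range(5000):
    x_bar = statistics.mean(random.expovariate(1.0) for _ in range(n))
    z_values.append((x_bar - mu) / (sigma / math.sqrt(n)))

# Roughly 95% of the standardized means should fall within +/- 1.96.
coverage = sum(abs(z) < 1.96 for z in z_values) / len(z_values)
print(round(coverage, 3))
```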

  8. The standard error of $\bar{X}$ is given by $\sigma/\sqrt{n}$. In most practical situations $\sigma$ is not known but rather estimated; in this case the estimated standard error (denoted in typical computer output as "SE") is $s/\sqrt{n}$, where the sample standard deviation $s$ is the point estimator of the population standard deviation, $s = \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}}.$
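The computational formula for $s$ can be checked against a library implementation; the data below are hypothetical:

```python
import math
import statistics

# Sketch: the computational formula for s, checked against the library value.
x = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]  # hypothetical sample data
n = len(x)
x_bar = sum(x) / n

# s = sqrt( (sum of x_i^2 - n * x_bar^2) / (n - 1) )
s = math.sqrt((sum(xi**2 for xi in x) - n * x_bar**2) / (n - 1))
se = s / math.sqrt(n)  # estimated standard error of the sample mean

print(round(s, 4), round(se, 4))
assert abs(s - statistics.stdev(x)) < 1e-9  # agrees with the library
```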

  9. Central Limit Theorem (for sums): Manipulate the central limit theorem for sample means, using $\sum_{i=1}^{n} X_i = n\bar{X}$. This yields $Z = \dfrac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}},$ which follows a standard normal distribution as $n \to \infty$. This implies that $\sum_{i=1}^{n} X_i$ is approximately normally distributed with mean $n\mu$ and variance $n\sigma^2$.

  10. Confidence Intervals

  11. Knowing the sampling distribution (or the approximate sampling distribution) of a statistic is the key to the two main tools of statistical inference that we study: (a) Confidence intervals – a method for yielding error bounds on point estimates. (b) Hypothesis testing – a methodology for making conclusions about population parameters.

  12. The formulas for most of the statistical procedures use quantiles of the sampling distribution. When the distribution is $N(0,1)$ (standard normal), the $\alpha$ quantile is denoted $z_\alpha$ and satisfies $\alpha = \int_{-\infty}^{z_\alpha} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx.$ A common value to use for $\alpha$ is $0.05$, and in procedures the expressions $z_{1-\alpha}$ or $z_{1-\alpha/2}$ appear. Note that in this case $z_{1-\alpha/2} = 1.96 \approx 2$.
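These quantiles are obtained by inverting the CDF; a minimal sketch using Python's standard library:

```python
from statistics import NormalDist

# The standard normal quantile z_alpha via the inverse CDF.
alpha = 0.05
z = NormalDist()  # mean 0, sd 1

z_one_sided = z.inv_cdf(1 - alpha)      # z_{1-alpha}
z_two_sided = z.inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}

print(round(z_one_sided, 3), round(z_two_sided, 3))  # 1.645 and 1.96
```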

  13. A confidence interval estimate for $\mu$ is an interval of the form $l \le \mu \le u$, where the end-points $l$ and $u$ are computed from the sample data. Because different samples will produce different values of $l$ and $u$, these end-points are values of random variables $L$ and $U$, respectively. Suppose that $P(L \le \mu \le U) = 1 - \alpha.$ The resulting confidence interval for $\mu$ is $l \le \mu \le u$. The end-points or bounds $l$ and $u$ are called the lower- and upper-confidence limits (bounds), respectively, and $1 - \alpha$ is called the confidence level.

  14. If $\bar{x}$ is the sample mean of a random sample of size $n$ from a normal population with known variance $\sigma^2$, a $100(1-\alpha)\%$ confidence interval on $\mu$ is given by $\bar{x} - z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{n}}.$ Note that it is roughly of the form $\bar{x} - 2\,\mathrm{SE} \le \mu \le \bar{x} + 2\,\mathrm{SE}$. Learn how to do back-of-the-envelope calculations!
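A sketch of the interval computation (the sample mean, $\sigma$, and $n$ below are hypothetical):

```python
import math
from statistics import NormalDist

# Sketch: 95% CI for mu with known sigma (all values hypothetical).
x_bar, sigma, n = 64.46, 1.0, 25
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96

half_width = z * sigma / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 3), round(upper, 3))  # roughly x_bar +/- 2 * SE
```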

  15. Confidence interval formulas give insight into the required sample size: if $\bar{x}$ is used as an estimate of $\mu$, we can be $100(1-\alpha)\%$ confident that the error $|\bar{x} - \mu|$ will not exceed a specified amount $\Delta$ when the sample size is not smaller than $n = \left(\dfrac{z_{1-\alpha/2}\,\sigma}{\Delta}\right)^2.$
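Since $n$ must be an integer, the formula's value is rounded up in practice; a sketch with hypothetical $\sigma$ and $\Delta$:

```python
import math
from statistics import NormalDist

# Sketch: sample size so that |x_bar - mu| <= Delta with 95% confidence
# (sigma and Delta below are hypothetical).
sigma, delta, alpha = 2.0, 0.5, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

n = math.ceil((z * sigma / delta) ** 2)  # round up to the next integer
print(n)
```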

  16. Hypothesis Testing

  17. A statistical hypothesis is a statement about the parameters of one or more populations. The null hypothesis, denoted $H_0$, is the claim that is initially assumed to be true based on previous knowledge. The alternative hypothesis, denoted $H_1$, is a claim that contradicts the null hypothesis.

  18. For some arbitrary value $\mu_0$, a two-sided alternative hypothesis is expressed as: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$. A one-sided alternative hypothesis is expressed as: $H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$ or $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$.

  19. The standard scientific research use of hypothesis testing is to "hope to reject" $H_0$ so as to have statistical evidence for the validity of $H_1$.

  20. A hypothesis test is based on a decision rule that is a function of the test statistic. For example: reject $H_0$ if the test statistic is below a specified threshold; otherwise don't reject.

  21. Rejecting the null hypothesis $H_0$ when it is true is defined as a type I error. Failing to reject the null hypothesis $H_0$ when it is false is defined as a type II error.

  22.
                            H_0 Is True       H_0 Is False
      Fail to reject H_0:   No error          Type II error
      Reject H_0:           Type I error      No error

      $\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$,
      $\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$.

  23. The power of a statistical test is the probability of rejecting the null hypothesis $H_0$ when the alternative hypothesis is true. We desire $\alpha$ to be low and the power, $1 - \beta$, to be as high as possible.

  24. Simple Hypothesis Tests

  25. A typical example of a simple hypothesis test has $H_0: \mu = \mu_0$, $H_1: \mu = \mu_1$, where $\mu_0$ and $\mu_1$ are some specified values for the population mean. This test isn't typically practical but is useful for understanding the concepts at hand. Assuming that $\mu_0 < \mu_1$ and setting a threshold $\tau$, reject $H_0$ if $\bar{x} > \tau$; otherwise don't reject.
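In this setting $\alpha$ and $\beta$ follow directly from normal probabilities for $\bar{X}$; a sketch with hypothetical illustration values:

```python
import math
from statistics import NormalDist

# Sketch: H0: mu = mu0 vs H1: mu = mu1, reject when x_bar > tau.
# All numbers below are hypothetical illustration values.
mu0, mu1, sigma, n, tau = 5.0, 6.0, 2.0, 16, 5.8
se = sigma / math.sqrt(n)  # sd of x_bar under either hypothesis
z = NormalDist()

alpha = 1 - z.cdf((tau - mu0) / se)  # P(x_bar > tau | mu = mu0)
beta = z.cdf((tau - mu1) / se)       # P(x_bar <= tau | mu = mu1)
print(round(alpha, 4), round(beta, 4))
```

Moving $\tau$ up lowers $\alpha$ but raises $\beta$; increasing $n$ shrinks the standard error and lowers both.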

  26. Explicit calculation of the relationships among $\tau$, $\alpha$, $\beta$, $n$, $\sigma$, $\mu_0$ and $\mu_1$ is possible in this case.

  27. Practical Hypothesis Tests (focus of Units 7–8 of the course)

  28. In most hypothesis tests used in practice (and in this course), a specified level of type I error $\alpha$ is predetermined (e.g. $\alpha = 0.05$) and the type II error is not directly specified. The probability of making a type II error, $\beta$, increases (power decreases) rapidly as the true value of $\mu$ approaches the hypothesized value. The probability of making a type II error also depends on the sample size $n$: increasing the sample size results in a decrease in the probability of a type II error. The population (or natural) variability (e.g. described by $\sigma$) also affects the power.

  29. The P-value is the smallest level of significance that would lead to rejection of the null hypothesis $H_0$ with the given data. That is, the P-value is based on the data. It is computed by considering the location of the test statistic under the sampling distribution based on $H_0$.
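For a large-sample two-sided z-test, the P-value follows from the standard normal CDF; a sketch with a hypothetical observed statistic:

```python
from statistics import NormalDist

# Sketch: two-sided P-value for an observed standardized test statistic z,
# which under H0 is ~ N(0, 1) (large-sample z-test).
z_obs = 2.3  # hypothetical observed value
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
print(round(p_value, 4))
```

Here the data would be significant at $\alpha = 0.05$ but not at $\alpha = 0.01$.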

  30. It is customary to call the test statistic (and the data) significant when the null hypothesis $H_0$ is rejected; therefore, we may think of the P-value as the smallest $\alpha$ at which the data are significant. In other words, the P-value is the observed significance level.

  31. Clearly, the P-value provides a measure of the credibility of the null hypothesis. Computing the exact P-value for a statistical test is not always doable by hand. It is typical to report the P-value in studies where $H_0$ was rejected (and new scientific claims were made). Typical ("convincing") values can be of the order of $0.001$.

  32. A General Procedure for Hypothesis Tests is:
      (1) Parameter of interest: From the problem context, identify the parameter of interest.
      (2) Null hypothesis, $H_0$: State the null hypothesis, $H_0$.
      (3) Alternative hypothesis, $H_1$: Specify an appropriate alternative hypothesis, $H_1$.
      (4) Test statistic: Determine an appropriate test statistic.
      (5) Reject $H_0$ if: State the rejection criteria for the null hypothesis.
      (6) Computations: Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute the value.
      (7) Draw conclusions: Decide whether or not $H_0$ should be rejected and report that in the problem context.
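The seven steps can be sketched end-to-end for a two-sided z-test with known $\sigma$ (the data and hypothesized mean below are hypothetical):

```python
import math
from statistics import NormalDist

# Sketch of the seven steps for a two-sided z-test on mu with known sigma.
# The data and the hypothesized mean are hypothetical.
data = [10.4, 9.8, 10.9, 10.1, 9.5, 10.6, 10.2, 9.9]
mu0, sigma, alpha = 10.0, 0.5, 0.05  # (1)-(3): parameter mu; H0: mu = mu0; H1: mu != mu0

n = len(data)
x_bar = sum(data) / n
z_stat = (x_bar - mu0) / (sigma / math.sqrt(n))  # (4) test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # (5) reject H0 if |z_stat| > z_crit

# (6)-(7): compute the value and draw the conclusion in context
print(round(z_stat, 3),
      "reject H0" if abs(z_stat) > z_crit else "fail to reject H0")
```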
