Hypothesis Testing

ST 380 Probability and Statistics for the Physical Sciences
Tests of Hypotheses: Introduction

Recall that a point estimate of some parameter is its most plausible value, in the light of some observed data. Similarly, an interval estimate is a range of reasonably plausible values.

Sometimes, a particular value of the parameter is of interest, and we want to decide how plausible it is, again in the light of some observed data.

Example

A foundry making 16GB flash memory chips has historically had a 3% loss rate to process flaws. New equipment has a greater throughput, but a test batch of 250 chips contains 12 with flaws, a 4.8% rate.

Was that just a chance effect, or is the new equipment more prone to flaws?

The statistical framework: $X$ is the number of flawed chips, and we assume that flaws arise independently, so $X \sim \mathrm{Bin}(n, p)$, with $n = 250$ in the example.

The simplest explanation is that nothing changed, that is, $p = p_0 = .03$. We call this the null hypothesis and denote it $H_0$:

$$H_0 : p = p_0.$$

The alternative is that something did change, and we're especially concerned that it's worse. This alternative hypothesis is denoted $H_a$:

$$H_a : p > p_0.$$

Note: neither $H_0$ nor $H_a$ allows the possibility that the new equipment is better: $p < p_0$. We should really express the null hypothesis as "the new equipment is no worse than the current equipment", and then $H_0$ becomes

$$H_0 : p \le p_0.$$

Now all possibilities are covered.

In other cases, we may be interested in changes in either direction:

$$H_0 : p = p_0, \qquad H_a : p \ne p_0.$$

We now ask: if $H_0$ were true, what is the chance of seeing 12 or more flaws in 250 trials?

The answer is $P(X \ge 12) = .076$ when $p = p_0 = .03$, and less when $p < p_0$.

So finding 12 or more flawed chips is not especially unlikely under the null hypothesis, and we would not regard it as strong evidence that $H_0$ is false.

In any situation, we can carry out a similar calculation: the probability of observing something at least as extreme as what actually happened, if the null hypothesis were true.
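The tail probability quoted for the chip example can be checked in R, the language these notes use for the power curve. This is only a minimal sketch: the variable names are illustrative and not part of the original slides, and the numbers are taken from the example.

```r
# Chip foundry example: X ~ Bin(250, .03) under H0 (values from the text)
n     <- 250    # chips in the test batch
x_obs <- 12     # flawed chips observed
p0    <- 0.03   # historical flaw rate

# One-sided P-value: P(X >= 12) = P(X > 11)
p_value <- pbinom(x_obs - 1, n, p0, lower.tail = FALSE)
print(round(p_value, 3))

# The built-in exact binomial test reports the same upper-tail probability
bt <- binom.test(x_obs, n, p = p0, alternative = "greater")
print(round(bt$p.value, 3))
```

Note the `x_obs - 1`: `pbinom(q, ...)` with `lower.tail = FALSE` returns $P(X > q)$, so the observed count itself has to be included by stepping down one.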

The result is called the $P$-value, and is written $P = .076$, for example.

By convention, $P < .05$ is regarded as "evidence against $H_0$", and $P < .01$ is regarded as "strong evidence". A $P$-value with $.05 \le P < .1$ might be called "weak evidence".
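The conventional thresholds can be written as a small helper. This is a hypothetical function, not something from the course materials, and the wording of the label for $P \ge .1$ is an assumption, since the text does not name that case.

```r
# Hypothetical helper: map a P-value to the conventional verbal label.
# Thresholds .01, .05 and .1 are the ones stated in the text;
# the label for P >= .1 is an assumption made for this sketch.
evidence_label <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  if (p < 0.01) {
    "strong evidence against H0"
  } else if (p < 0.05) {
    "evidence against H0"
  } else if (p < 0.10) {
    "weak evidence against H0"
  } else {
    "little or no evidence against H0"
  }
}

print(evidence_label(0.076))  # the chip foundry P-value
```

These cutoffs are conventions rather than sharp boundaries, so a label like this is best read as a rough summary of the $P$-value, not a replacement for it.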

Tests of Hypotheses: Hypotheses and Test Procedures

Test Procedures

Sometimes we need to make a decision about the null hypothesis, not just weigh the evidence against it; e.g., whether to accept the new equipment, or ask the supplier to fix it. We must decide whether or not to reject the null hypothesis.

Note: a null hypothesis is usually unlikely to be exactly true, so we do not speak of accepting it, only of failing to reject it. Think of it as a working hypothesis, which we use as an approximation until it's shown to be false.

Test procedure

To carry out a hypothesis test, we need:

A test statistic, such as the count $X$ of faulty chips.

Usually, a cutoff point, or critical value, to identify values of the test statistic for which we reject $H_0$, such as $X > 12$.

Formally, a rejection region: the set of values of the test statistic for which we reject $H_0$, such as $\{13, 14, \dots\}$.

Errors

Making a decision about a null hypothesis has the possibility of two kinds of error:

Type I error: rejecting the null hypothesis when it is true;

Type II error: failing to reject the null hypothesis when it is false.

Error Probabilities

Conventionally, the probabilities of Type I and Type II errors are denoted $\alpha$ and $\beta$, respectively.

In cases like the chip foundry, where the hypotheses are

$$H_0 : p \le .03, \qquad H_a : p > .03,$$

both $\alpha$ and $\beta$ depend on $p$. If the rule is to reject $H_0$ when $X > 12$,

$$\alpha(p) = P(X > 12) = \sum_{x=13}^{250} b(x; 250, p), \qquad p \le .03,$$

and

$$\beta(p) = P(X \le 12) = \sum_{x=0}^{12} b(x; 250, p), \qquad p > .03.$$

Significance level

We usually ignore the dependence of $\alpha(p)$ on $p$ by looking only at the worst case. The significance level of the test, also denoted $\alpha$, is the worst (largest) Type I error probability.

In the chip foundry example, this is

$$\alpha = \max_{0 < p \le .03} \alpha(p),$$

and this is easily shown to be $\alpha(.03) = .0402$.
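One way to see the worst case numerically is to evaluate the Type I error probability over a grid of null values. The sketch below assumes the rejection rule $X > 12$ with $n = 250$ from the example; the grid spacing and the names `alpha_fn` and `p_grid` are arbitrary choices made for illustration.

```r
# Type I error probability alpha(p) = P(X > 12) for X ~ Bin(250, p)
alpha_fn <- function(p) pbinom(12, 250, p, lower.tail = FALSE)

# Grid of null values 0 < p <= .03 (spacing chosen only for illustration)
p_grid     <- (1:30) / 1000
alpha_vals <- alpha_fn(p_grid)

# alpha(p) increases with p, so the maximum sits at the boundary p = .03
p_worst <- p_grid[which.max(alpha_vals)]
alpha   <- max(alpha_vals)

print(p_worst)
print(round(alpha, 4))
```

Because the upper tail probability of a binomial count grows with $p$, the grid search is not really needed: evaluating `alpha_fn(0.03)` alone gives the significance level.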

Power

The dependence of $\beta(p)$ on $p$ cannot be handled as simply: if $p$ is just a little greater than $.03$,

$$\beta(p) = \sum_{x=0}^{12} b(x; 250, p) \approx \sum_{x=0}^{12} b(x; 250, .03) = 1 - \alpha = .9598,$$

but, for larger $p$, $\beta(p)$ is more reasonable. For example, $\beta(.05) = .5175$, and $\beta(.10) = .0021$.

We usually focus on

$$\mathrm{Power}(p) = P(\text{Reject } H_0 \text{ as a function of } p) = 1 - \beta(p).$$
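The Type II error probabilities quoted here can be reproduced with `pbinom`. This is a minimal sketch under the example's rule (reject when $X > 12$, $n = 250$); the value $p = .031$ is an illustrative choice for "just a little greater than .03" and does not appear in the original.

```r
# Type II error probability beta(p) = P(X <= 12) for X ~ Bin(250, p)
beta_fn <- function(p) pbinom(12, 250, p)

# Alternatives to inspect; .031 is an illustrative "barely above .03" value
p_alt <- c(0.031, 0.05, 0.10)

result <- data.frame(
  p     = p_alt,
  beta  = beta_fn(p_alt),
  power = 1 - beta_fn(p_alt)   # Power(p) = 1 - beta(p)
)
print(round(result, 4))
```

The first row shows why $\beta$ cannot be summarised by a single worst case: near the boundary the test almost never rejects, whatever rule is used.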

The power curve:

    # Power(p) = P(X > 12) = 1 - P(X <= 12) for X ~ Bin(250, p)
    plot(function(p) 1 - pbinom(12, 250, p), from = .03, to = .10,
         xlab = "p", ylab = "Power", ylim = c(0, 1))
    title("Power curve")
    # Horizontal reference line at the significance level alpha(.03)
    abline(h = 1 - pbinom(12, 250, .03), col = "blue")

Tests About a Population Mean

Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with mean $\mu$.

To decide how plausible a particular value $\mu_0$ is, it is natural to see how far the sample mean $\bar{x}$ is from $\mu_0$. If $\bar{x}$ is close to $\mu_0$, that value seems quite plausible, but not otherwise.

Suppose that we are interested in deviations in either direction:

$$H_0 : \mu = \mu_0, \qquad H_a : \mu \ne \mu_0.$$

For example, 36 water samples taken downstream from the discharge of a water treatment facility showed barium concentrations with sample mean $\bar{x} = 10.87$ mg/L and sample standard deviation $s = 13.31$ mg/L, whereas the upstream concentration was 5.32 mg/L.

The (estimated) standard error of $\bar{X}$ is

$$\frac{13.31}{\sqrt{36}} = 2.22$$

so the observed downstream mean is

$$\frac{10.87 - 5.32}{2.22} = 2.50$$

standard errors higher than upstream.

The natural test statistic is

$$|T| = \frac{|\bar{X} - \mu_0|}{\text{standard error of } \bar{X}}$$

where $T$ has observed value

$$t = \frac{\bar{x} - \mu_0}{\text{standard error of } \bar{X}}.$$

In the example,

$$t = \frac{10.87 - 5.32}{2.22} = 2.50$$

as we calculated earlier.
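The arithmetic for the barium example is short enough to script. The sketch below uses only the summary statistics given in the text, since the raw measurements are not available; the variable names are illustrative.

```r
# Barium example: summary statistics from the text (mg/L)
n    <- 36      # downstream water samples
xbar <- 10.87   # sample mean
s    <- 13.31   # sample standard deviation
mu0  <- 5.32    # upstream concentration, used as the null value

# Estimated standard error of the sample mean
se <- s / sqrt(n)

# Observed value of the test statistic
t_obs <- (xbar - mu0) / se

print(round(se, 2))
print(round(t_obs, 2))
```

With the raw data in hand, `t.test(x, mu = 5.32)` would compute the same statistic directly; that call is not shown here because the individual measurements are not part of the source.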

To test $H_0$, we need to calculate the $P$-value

$$P(|T| \ge |t| \text{ when } H_0 \text{ is true}).$$

We can do this in various cases:

$X_1, X_2, \dots, X_n$ normally distributed, $\sigma$ known: $T \sim N(0, 1)$;

$X_1, X_2, \dots, X_n$ normally distributed, $\sigma$ unknown but estimated by $s$: $T \sim$ Student's $t$;

$n$ large, $\sigma$ known or estimated by $s$: $T \approx N(0, 1)$.

In the example, we could use the large sample size 36 to justify using the normal distribution, and calculate

$$P(|T| \ge 2.50) \approx 1 - \Phi(2.50) + \Phi(-2.50) = .012.$$

Alternatively, we could assume that the individual measurements are normally distributed, and use the $t$-distribution with $n - 1 = 35$ degrees of freedom:

$$P(|T| \ge 2.50) = 1 - F_{35}(2.50) + F_{35}(-2.50) = .017.$$

Either way, $P < .05$ and the $P$-value is close to $.01$, so we have evidence against $H_0$, if not strong evidence.
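Both two-sided $P$-values follow from the symmetry of the reference distributions. As a sketch, assuming the rounded statistic $t = 2.50$ and $n = 36$ from the example (variable names are illustrative):

```r
# Observed test statistic and sample size from the barium example
t_obs <- 2.50
n     <- 36

# Large-sample (normal) approximation: 1 - Phi(t) + Phi(-t) = 2 * Phi(-|t|)
p_normal <- 2 * pnorm(-abs(t_obs))

# Student's t with n - 1 degrees of freedom: 2 * F_{n-1}(-|t|)
p_t <- 2 * pt(-abs(t_obs), df = n - 1)

print(round(p_normal, 3))
print(round(p_t, 3))
```

The $t$-based value is the larger of the two because the $t$ distribution has heavier tails than the normal, which is the price of estimating $\sigma$ by $s$.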

Test Procedure

If we must make a decision, we need a rejection region.

Typically, we first choose the significance level $\alpha$, most commonly $.05$.

The critical value is then either $z_{\alpha/2}$ or $t_{\alpha/2,\,n-1}$, depending on which assumptions we are making.

For instance, in the normal case,

$$P(|T| \ge z_{\alpha/2}) = 1 - \Phi(z_{\alpha/2}) + \Phi(-z_{\alpha/2}) = \alpha/2 + \alpha/2 = \alpha.$$

Then we reject $H_0$ whenever $|t| \ge$ critical value.
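For the barium example, the critical values and the resulting decision could be sketched as follows. This assumes $\alpha = .05$, the rounded statistic $t = 2.50$ and $n = 36$; the variable names are illustrative.

```r
alpha <- 0.05   # chosen significance level
t_obs <- 2.50   # observed statistic from the barium example
n     <- 36

# Upper alpha/2 critical values: z_{alpha/2} and t_{alpha/2, n-1}
z_crit <- qnorm(1 - alpha / 2)
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Reject H0 when |t| is at least the critical value
reject_normal <- abs(t_obs) >= z_crit
reject_t      <- abs(t_obs) >= t_crit

print(round(c(z = z_crit, t = t_crit), 3))
print(c(normal = reject_normal, t = reject_t))
```

Comparing $|t|$ with the critical value is equivalent to comparing the two-sided $P$-value with $\alpha$, so the two approaches always lead to the same decision.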
