Hypothesis testing
DS-GA 1002 Probability and Statistics for Data Science
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda

Example
In a medical study 10% of women and 12.5% of men suffer from heart disease
Hypothesis: Men are more prone to have heart disease than women
If there are 20 people in the study, the effect could be due to chance
If there are 20 000 people, we are more convinced
Hypothesis testing makes this precise

Hypothesis testing
Framework to decide whether patterns in data are random fluctuations
Aim: Establish whether a predefined hypothesis is supported by the data

The hypothesis testing framework
Parametric testing
Nonparametric testing
Multiple testing

Null and alternative hypotheses
Null hypothesis $H_0$: There is no underlying phenomenon (men are not more prone to heart disease)
Alternative hypothesis $H_1$: There is an underlying phenomenon
We reject $H_0$ if it does not explain the data well
Failing to reject $H_0$ does not mean that we think it holds, we just don't have enough evidence
Frequentist perspective: A hypothesis holds or does not hold deterministically

Tests
A test is a procedure to decide whether to reject the null hypothesis
General strategy:
1. Compute a test statistic from the data $T(x_1, \ldots, x_n)$
2. Decide on a rejection region $R$ such that $T(x_1, \ldots, x_n) \in R$ is very unlikely if the null hypothesis holds
3. Reject the null hypothesis if $T(x_1, \ldots, x_n) \in R$

Errors
                 Do not reject $H_0$    Reject $H_0$
$H_0$ is true    Correct decision       Type I error
$H_1$ is true    Type II error          Correct decision

Size and significance level
Priority: Control Type I errors
The size of a test is the probability of making a Type I error
The significance level is an upper bound on the size

Significance level
"The effect is significant (at a level of 5%)"
Translation: Given the assumed probabilistic model, the probability that we reject the null hypothesis when it is true is at most 5%
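The meaning of a significance level can be checked by simulation. The sketch below is only an illustration under assumed settings that are not part of this slide: the data are 20 fair coin flips (so the null hypothesis holds by construction) and the hypothetical test rejects when at least 15 flips are heads. The fraction of simulated datasets that get rejected estimates the size of the test, which should stay below 5%.

```python
import random

def type_one_error_rate(trials=20000, flips=20, threshold=15, seed=0):
    """Estimate P(reject H_0) when H_0 (a fair coin) is true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(trials):
        # Simulate one dataset under the null hypothesis
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        # Reject when the test statistic falls in the rejection region
        rejections += heads >= threshold
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The function name and its parameters are hypothetical; any test with a known null distribution could be checked the same way.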

p value
The p value is the smallest significance level at which we would reject $H_0$ for a particular dataset
It is a function of the data, not the probability that $H_0$ holds

Overview
1. Choose a conjecture
2. Determine the corresponding null hypothesis
3. Choose a test
4. Gather the data
5. Compute the test statistic from the data
6. Compute the p value and reject the null hypothesis if it is below a predefined limit (typically 1% or 5%)

Example: Clutch
Conjecture: NBA player is more effective in 4th quarter
Null hypothesis: He's equally effective
Test statistic: Games out of 20 in which he scores more points per minute in the 4th quarter
What threshold do we need to ensure a significance level of 1%, 5%?
The test statistic is 14, what is the p value?

Example: Clutch
$T_0$ represents the test statistic under the null hypothesis
We consider a rejection region of the form $R := \{ t \mid t \geq \eta \}$
What is the distribution of the test statistic $T_0$ under the null hypothesis? Binomial with parameters $n = 20$ and $1/2$
The size of the test is
$P(T_0 \geq \eta) = \frac{1}{2^n} \sum_{k=\eta}^{n} \binom{n}{k}$

Distribution under null hypothesis
$\eta$                1      2      3      4      5
$P(T_0 \geq \eta)$    1.000  1.000  1.000  0.999  0.994
$\eta$                6      7      8      9      10
$P(T_0 \geq \eta)$    0.979  0.942  0.868  0.748  0.588
$\eta$                11     12     13     14     15
$P(T_0 \geq \eta)$    0.412  0.252  0.132  0.058  0.021
$\eta$                16     17     18     19     20
$P(T_0 \geq \eta)$    0.006  0.001  0.000  0.000  0.000
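The tail probabilities in this table follow from the binomial distribution with parameters 20 and 1/2. A minimal sketch that recomputes them with the standard library is given here; the function name `tail_prob` is illustrative and not part of the slides.

```python
from math import comb

def tail_prob(eta: int, n: int = 20) -> float:
    """P(T_0 >= eta) for T_0 ~ Binomial(n, 1/2)."""
    # Each of the 2^n outcomes is equally likely under the null hypothesis,
    # so we count the outcomes with at least eta successes.
    return sum(comb(n, k) for k in range(eta, n + 1)) / 2 ** n

for eta in range(1, 21):
    print(f"{eta:2d}  {tail_prob(eta):.3f}")
```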

Example: Clutch
What threshold do we need to ensure a significance level of 1%? 16
What threshold do we need to ensure a significance level of 5%? 15
The test statistic is 14, what is the p value? 5.8%
Is this the probability that the null hypothesis holds? No!
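These answers can be recovered programmatically. The sketch below assumes the Binomial(20, 1/2) null distribution from the example; the helper names `tail_prob` and `threshold` are hypothetical. The threshold for a given level is the smallest $\eta$ whose size $P(T_0 \geq \eta)$ does not exceed that level, and the p value is the tail probability at the observed statistic.

```python
from math import comb

def tail_prob(eta: int, n: int = 20) -> float:
    """P(T_0 >= eta) for T_0 ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(eta, n + 1)) / 2 ** n

def threshold(level: float, n: int = 20) -> int:
    """Smallest eta such that the size P(T_0 >= eta) is at most `level`."""
    # eta = n + 1 gives an empty rejection region (size 0), so a match always exists
    return next(eta for eta in range(n + 2) if tail_prob(eta, n) <= level)

print(threshold(0.01), threshold(0.05))  # 16 15
print(round(tail_prob(14), 3))           # 0.058
```

Since the p value 0.058 is above both 1% and 5%, the null hypothesis is not rejected at either level.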

Power
The power of a test is the probability of rejecting $H_0$ under $H_1$
For a given significance level, we want as much power as possible
Problem: We need to know the distribution of the data under $H_1$!
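To see why the distribution under $H_1$ is needed, consider the Clutch test with threshold 15. The sketch below makes an assumption that the slides do not state: under the alternative, the games are independent and the player does better in the 4th quarter with some probability `theta` greater than 1/2, so the test statistic is Binomial(20, `theta`). The power then depends on which value of `theta` we assume.

```python
from math import comb

def rejection_prob(theta: float, eta: int = 15, n: int = 20) -> float:
    """P(T >= eta) when T ~ Binomial(n, theta)."""
    return sum(
        comb(n, k) * theta ** k * (1 - theta) ** (n - k)
        for k in range(eta, n + 1)
    )

# theta = 0.5 is the null hypothesis (this value is the size of the test);
# the other values are hypothetical alternatives.
for theta in (0.5, 0.6, 0.7, 0.8):
    print(f"theta = {theta:.1f}: P(reject) = {rejection_prob(theta):.3f}")
```

The rejection probability grows with `theta`, so the test detects strong effects much more reliably than weak ones.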

Parametric testing
Data are sampled from a known distribution with unknown parameters
Probability measure $P_\theta$ depends on $\theta$
Frequentist perspective: The parameter is deterministic and so are the hypotheses
Notation: $\vec{X}$ is a random vector distributed according to $P_\theta$, the data $\vec{x}$ are a realization of $\vec{X}$

If $H_0$ is $\theta = \theta_0$
The size of a test with test statistic $T$ and rejection region $R$ is
$\alpha := P_{\theta_0}\left( T(\vec{X}) \in R \right)$
If the rejection region is of the form $T(\vec{x}) \geq \eta$
$\alpha = P_{\theta_0}\left( T(\vec{X}) \geq \eta \right)$
The largest $\eta$ at which we reject $H_0$ is $T(\vec{x})$, so the smallest size at which we reject is
$p = P_{\theta_0}\left( T(\vec{X}) \geq T(\vec{x}) \right)$
p value: probability under $H_0$ of observing a test statistic that is at least as extreme as the one we observe
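As a worked instance of the p value formula, take the Clutch example, where $\theta_0 = 1/2$, $n = 20$ and the observed statistic is $T(\vec{x}) = 14$. The binomial coefficients below are computed here for illustration; the final value agrees with the 5.8% quoted in the example.

```latex
p = P_{1/2}\left( T(\vec{X}) \geq 14 \right)
  = \frac{1}{2^{20}} \sum_{k=14}^{20} \binom{20}{k}
  = \frac{38760 + 15504 + 4845 + 1140 + 190 + 20 + 1}{1048576}
  = \frac{60460}{1048576}
  \approx 0.058
```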

Composite hypotheses
$\theta = \theta_0$ is a simple hypothesis
A composite hypothesis is of the form $\theta \in S$ for a certain set $S$
The size of a composite test is
$\alpha = \sup_{\theta \in \mathcal{H}_0} P_\theta\left( T(\vec{X}) \geq \eta \right)$
The p value is
$p = \sup_{\theta \in \mathcal{H}_0} P_\theta\left( T(\vec{X}) \geq T(\vec{x}) \right)$

Power function
The power function of the test is defined as
$\beta(\theta) := P_\theta\left( T(\vec{X}) \in R \right)$
We want $\beta(\theta) \approx 0$ for $\theta \in \mathcal{H}_0$ and $\beta(\theta) \approx 1$ for $\theta \in \mathcal{H}_1$

Example: Coin flip
Conjecture: Coin is biased towards heads, $\theta > 1/2$
Null hypothesis: Coin not biased towards heads, $\theta \leq 1/2$
Test statistic: Number of heads out of $n = 5, 10, 100$ flips
Rejection region: Heads $= n$, Heads $\geq 3n/5$
Power function?

Coin flip power function
If $\eta = n$,
$\beta_1(\theta) = P_\theta\left( T(\vec{X}) \in R \right) = \theta^n$
If $\eta = 3n/5$,
$\beta_2(\theta) = P_\theta\left( T(\vec{X}) \in R \right) = \sum_{k=3n/5}^{n} \binom{n}{k} \theta^k (1 - \theta)^{n-k}$
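Both power functions are binomial tail probabilities, so they can be evaluated numerically. The sketch below uses $n = 5$ as an example and approximates the size of each test, the supremum of $\beta(\theta)$ over the composite null $\theta \leq 1/2$, by a maximum over a grid. The function name `power` and the grid resolution are assumptions made for illustration.

```python
from math import comb

def power(theta: float, n: int, eta: int) -> float:
    """beta(theta) = P_theta(T >= eta) for T ~ Binomial(n, theta)."""
    return sum(
        comb(n, k) * theta ** k * (1 - theta) ** (n - k)
        for k in range(eta, n + 1)
    )

n = 5
null_grid = [i / 1000 for i in range(501)]  # theta values in [0, 1/2]

# Size of each test: largest rejection probability over the null hypothesis
size_1 = max(power(t, n, n) for t in null_grid)           # eta = n
size_2 = max(power(t, n, 3 * n // 5) for t in null_grid)  # eta = 3n/5

print(f"eta = n:    size = {size_1:.5f}, power at 0.75 = {power(0.75, n, n):.3f}")
print(f"eta = 3n/5: size = {size_2:.5f}, power at 0.75 = {power(0.75, n, 3 * n // 5):.3f}")
```

Because both power functions increase with $\theta$, the maximum over the null is attained at $\theta = 1/2$. For $n = 5$ the first test has a small size but also little power, while the second has more power at the cost of a size of 1/2.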

Figure: power function $\beta(\theta)$ as a function of $\theta$ for $\eta = n$, with curves for $n = 5$, $n = 50$ and $n = 100$

Figure: power function $\beta(\theta)$ as a function of $\theta$ for $\eta = 3n/5$, with curves for $n = 5$, $n = 50$ and $n = 100$

Permutation test
Aim: Compare two datasets $\vec{x}_A$ and $\vec{x}_B$
Null hypothesis: The two datasets are sampled from the same distribution
No parametric model...

Test statistic
Choose a test statistic $T$ and evaluate the difference
$T_{\text{diff}}(\vec{x}) := T(\vec{x}_A) - T(\vec{x}_B)$
Test: $R := \{ t \mid t \geq \eta \}$
Problem: How do we determine the significance level or the p value?

Main insight: Exchangeability under permutations
Under $H_0$ the distribution of $T_{\text{diff}}(\vec{X})$ does not change if we permute the labels
The joint distributions of $\vec{X}_1, \vec{X}_2, \ldots, \vec{X}_n$ and of any permutation, such as $\vec{X}_{24}, \vec{X}_n, \ldots, \vec{X}_3$, are the same
The values of $T_{\text{diff}}$ after permuting, $t_{\text{diff},1}, \ldots, t_{\text{diff},n!}$, are uniformly distributed
$P\left( T_{\text{diff}}(\vec{X}) \geq \eta \right) = \frac{1}{n!} \sum_{i=1}^{n!} 1_{t_{\text{diff},i} \geq \eta}$
This is the size of the test!
$p = P\left( T_{\text{diff}}(\vec{X}) \geq T_{\text{diff}}(\vec{x}) \right) = \frac{1}{n!} \sum_{i=1}^{n!} 1_{t_{\text{diff},i} \geq T_{\text{diff}}(\vec{x})}$
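The p value formula can be evaluated exactly for small datasets by enumerating all $n!$ permutations of the pooled data. The sketch below is one possible implementation, not the author's; it assumes the sample mean as the test statistic $T$, and the two datasets are made-up numbers used only for illustration.

```python
from itertools import permutations
from statistics import mean

def permutation_p_value(x_a, x_b, stat=mean):
    """Exact p value of the test that rejects for large T(x_A) - T(x_B)."""
    pooled = list(x_a) + list(x_b)
    n_a = len(x_a)
    observed = stat(x_a) - stat(x_b)
    count = total = 0
    for perm in permutations(pooled):
        # Relabel: the first n_a entries play the role of dataset A
        t_diff = stat(perm[:n_a]) - stat(perm[n_a:])
        count += t_diff >= observed
        total += 1
    return count / total

# Hypothetical data: 7 pooled values, so 7! = 5040 permutations
x_a = [12, 15, 14]
x_b = [9, 10, 11, 8]
print(f"p value: {permutation_p_value(x_a, x_b):.4f}")
```

Enumerating $n!$ permutations is only feasible for very small $n$. For larger datasets a common approach is to evaluate the same average over a random sample of permutations, which approximates the p value.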
