Hypothesis testing DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17 Carlos Fernandez-Granda
Example In a medical study 10% of women and 12.5% of men suffer from heart disease Hypothesis: Men are more prone to have heart disease than women If there are 20 people in the study, effect could be by chance If there are 20 000 people, we are more convinced Hypothesis testing makes this precise
Hypothesis testing Framework to decide whether patterns in data are random fluctuations Aim: Establish whether a predefined hypothesis is supported by the data
The hypothesis testing framework Parametric testing Nonparametric testing Multiple testing
Null and alternative hypotheses Null hypothesis H 0 : There is no underlying phenomenon (men are not more prone to heart disease) Alternative hypothesis H 1 : There is an underlying phenomenon We reject H 0 if it does not explain the data well Failing to reject H 0 does not mean that we think it holds, we just don’t have enough evidence Frequentist perspective: A hypothesis holds or does not hold deterministically
Tests
A test is a procedure to decide whether to reject the null hypothesis
General strategy:
1. Compute a test statistic from the data, $T(x_1, \ldots, x_n)$
2. Decide on a rejection region $R$ such that $T(x_1, \ldots, x_n) \in R$ is very unlikely if the null hypothesis holds
3. Reject the null hypothesis if $T(x_1, \ldots, x_n) \in R$
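The three steps can be sketched in a few lines of Python. This is only an illustration: the function name `run_test`, the one-sided rejection region $\{ t \mid t \geq \eta \}$, and the coin-flip data are assumptions made for the example, not part of the slides.

```python
from typing import Callable, Sequence


def run_test(data: Sequence[float],
             statistic: Callable[[Sequence[float]], float],
             eta: float) -> bool:
    """Return True if the null hypothesis is rejected."""
    t = statistic(data)  # step 1: compute the test statistic
    return t >= eta      # steps 2-3: reject if t lies in R = {t | t >= eta}


# Hypothetical data: 1 if a coin flip came up heads, 0 otherwise.
flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# The statistic is the number of heads; the threshold 9 is an arbitrary choice.
print(run_test(flips, sum, eta=9))
```

How the threshold should be chosen is the subject of the size and significance level of the test.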
Errors
                 Reject H_0? No    Reject H_0? Yes
H_0 is true      ✓                 Type I error
H_1 is true      Type II error     ✓
Size and significance level Priority: Control Type I errors The size of a test is the probability of making a Type I error The significance level is an upper bound on the size
Significance level The effect is significant (at a level of 5%) Translation: Given the assumed probabilistic model, the probability that we reject the null hypothesis when it is true is at most 5%
p value
The p value is the smallest significance level at which we would reject $H_0$ for a particular dataset
It is a function of the data, not the probability that $H_0$ holds
Overview
1. Choose a conjecture
2. Determine the corresponding null hypothesis
3. Choose a test
4. Gather the data
5. Compute the test statistic from the data
6. Compute the p value and reject the null hypothesis if it is below a predefined limit (typically 1% or 5%)
Example: Clutch
Conjecture: NBA player is more effective in 4th quarter
Null hypothesis: He’s equally effective
Test statistic: Games out of 20 in which he scores more points per minute in the 4th quarter
What threshold do we need to ensure a significance level of 1%? Of 5%?
The test statistic is 14, what is the p value?
Example: Clutch
$T_0$ represents the test statistic under the null hypothesis
We consider a rejection region of the form $R := \{ t \mid t \geq \eta \}$
What is the distribution of the test statistic $T_0$ under the null hypothesis? Binomial with parameters $n = 20$ and $1/2$
The size of the test is
$$P(T_0 \geq \eta) = \frac{1}{2^n} \sum_{k=\eta}^{n} \binom{n}{k}$$
Distribution under null hypothesis
η             1      2      3      4      5
P(T_0 ≥ η)    1.000  1.000  1.000  0.999  0.994
η             6      7      8      9      10
P(T_0 ≥ η)    0.979  0.942  0.868  0.748  0.588
η             11     12     13     14     15
P(T_0 ≥ η)    0.412  0.252  0.132  0.058  0.021
η             16     17     18     19     20
P(T_0 ≥ η)    0.006  0.001  0.000  0.000  0.000
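The tail probabilities can be reproduced with a short script. The sketch below assumes only the binomial model with parameters 20 and 1/2 stated in the example; the helper name `binomial_tail` is an illustrative choice.

```python
from math import comb


def binomial_tail(n: int, eta: int, theta: float = 0.5) -> float:
    """P(T >= eta) for T ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(eta, n + 1))


n_games = 20
for eta in range(1, n_games + 1):
    print(f"eta = {eta:2d}   P(T0 >= eta) = {binomial_tail(n_games, eta):.3f}")
```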
Example: Clutch
What threshold do we need to ensure a significance level of 1%? 16
What threshold do we need to ensure a significance level of 5%? 15
The test statistic is 14, what is the p value? 5.8%
Is this the probability that the null hypothesis holds? No!
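These answers can be checked numerically. The following sketch assumes the Binomial(20, 1/2) null distribution from the example; the function names `tail_half` and `smallest_threshold` are hypothetical helpers introduced here.

```python
from math import comb


def tail_half(n: int, eta: int) -> float:
    """P(T0 >= eta) for T0 ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(eta, n + 1)) / 2**n


def smallest_threshold(n: int, level: float) -> int:
    """Smallest eta whose test size P(T0 >= eta) is at most `level`."""
    for eta in range(n + 2):  # eta = n + 1 gives an empty region of size 0
        if tail_half(n, eta) <= level:
            return eta
    raise ValueError("level must be non-negative")


eta_1_percent = smallest_threshold(20, 0.01)
eta_5_percent = smallest_threshold(20, 0.05)
p_value = tail_half(20, 14)  # observed test statistic is 14

print(eta_1_percent, eta_5_percent, f"{p_value:.3f}")
```

Choosing the smallest admissible threshold keeps the rejection region as large as the significance level allows.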
Power The power of a test is the probability of rejecting H 0 under H 1 For a given significance level, we want as much power as possible Problem: We need to know the distribution of the data under H 1 !
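To see why the distribution under $H_1$ is needed, the sketch below computes the power of the clutch test with threshold 15 for a few assumed alternatives. The values 0.6, 0.7 and 0.8 for the per-game probability are hypothetical; the slides do not specify an alternative.

```python
from math import comb


def binomial_tail(n: int, eta: int, theta: float) -> float:
    """P(T >= eta) for T ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(eta, n + 1))


n_games, eta = 20, 15  # threshold for a 5% significance level in the clutch example

# Size: probability of rejecting when the null (theta = 1/2) holds.
size = binomial_tail(n_games, eta, 0.5)

# Power under hypothetical alternatives: probability of a better
# 4th quarter in any single game.
power = {theta: binomial_tail(n_games, eta, theta) for theta in (0.6, 0.7, 0.8)}

print(f"size = {size:.3f}")
for theta, beta in power.items():
    print(f"theta = {theta}: power = {beta:.3f}")
```

The power depends entirely on which alternative is assumed, which is the difficulty the slide points out.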
Parametric testing
Data are sampled from a known distribution with unknown parameters
Probability measure $P_\theta$ depends on $\theta$
Frequentist perspective: The parameter is deterministic and so are the hypotheses
Notation: $\vec{X}$ is a random vector distributed according to $P_\theta$; the data $\vec{x}$ are a realization of $\vec{X}$
If $H_0$ is $\theta = \theta_0$
The size of a test with test statistic $T$ and rejection region $R$ is
$$\alpha := P_{\theta_0}\left( T(\vec{X}) \in R \right)$$
If the rejection region is of the form $T(\vec{x}) \geq \eta$
$$\alpha = P_{\theta_0}\left( T(\vec{X}) \geq \eta \right)$$
The largest $\eta$ at which we still reject $H_0$ is $T(\vec{x})$; it gives the smallest size, so
$$p = P_{\theta_0}\left( T(\vec{X}) \geq T(\vec{x}) \right)$$
p value: probability under $H_0$ of observing a test statistic that is at least as extreme as the one we observe
Composite hypotheses
$\theta = \theta_0$ is a simple hypothesis
A composite hypothesis is of the form $\theta \in S$ for a certain set $S$
The size of a composite test is
$$\alpha = \sup_{\theta \in \mathcal{H}_0} P_\theta\left( T(\vec{X}) \geq \eta \right)$$
The p value is
$$p = \sup_{\theta \in \mathcal{H}_0} P_\theta\left( T(\vec{X}) \geq T(\vec{x}) \right)$$
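As a rough illustration of the supremum, the sketch below approximates the size of a test on a grid of parameter values. It assumes a binomial model (number of heads in $n$ coin flips) with composite null $\theta \leq 1/2$; the grid search and the function names are illustrative assumptions, and the grid only approximates the supremum in general.

```python
from math import comb


def binomial_tail(n: int, eta: int, theta: float) -> float:
    """P(T >= eta) for T ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(eta, n + 1))


def composite_size(n: int, eta: int, grid_points: int = 501) -> float:
    """Approximate sup over theta in [0, 1/2] of P_theta(T >= eta) on a grid."""
    thetas = [0.5 * i / (grid_points - 1) for i in range(grid_points)]
    return max(binomial_tail(n, eta, theta) for theta in thetas)


n, eta = 10, 6
print(f"approximate size:   {composite_size(n, eta):.4f}")
print(f"tail at theta=1/2:  {binomial_tail(n, eta, 0.5):.4f}")
```

For this model the tail probability increases with $\theta$, so the supremum is attained at the boundary $\theta = 1/2$ and the two printed values coincide.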
Power function
The power function of the test is defined as
$$\beta(\theta) := P_\theta\left( T(\vec{X}) \in R \right)$$
We want $\beta(\theta) \approx 0$ for $\theta \in \mathcal{H}_0$ and $\beta(\theta) \approx 1$ for $\theta \in \mathcal{H}_1$
Example: Coin flip Conjecture: Coin is biased towards heads θ > 1 / 2 Null hypothesis: Coin not biased towards heads θ ≤ 1 / 2 Test statistic: Number of heads out of n = 5 , 10 , 100 flips Rejection region: Heads = n , Heads ≥ 3 n / 5 Power function?
Coin flip power function
If $\eta = n$,
$$\beta_1(\theta) = P_\theta\left( T(\vec{X}) \in R \right) = \theta^n$$
If $\eta = 3n/5$,
$$\beta_2(\theta) = P_\theta\left( T(\vec{X}) \in R \right) = \sum_{k=3n/5}^{n} \binom{n}{k} \theta^k (1-\theta)^{n-k}$$
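Both power functions are easy to evaluate numerically. The sketch below is one way to do it; the function names are illustrative, and rounding $3n/5$ up to an integer is an assumption needed only when $n$ is not a multiple of 5.

```python
from math import comb


def power_all_heads(theta: float, n: int) -> float:
    """beta_1(theta): reject only if all n flips are heads."""
    return theta**n


def power_three_fifths(theta: float, n: int) -> float:
    """beta_2(theta): reject if the number of heads is at least 3n/5."""
    eta = (3 * n + 4) // 5  # integer ceiling of 3n/5
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(eta, n + 1))


for n in (5, 50, 100):
    for theta in (0.25, 0.5, 0.75):
        print(f"n = {n:3d}  theta = {theta:.2f}  "
              f"beta_1 = {power_all_heads(theta, n):.3f}  "
              f"beta_2 = {power_three_fifths(theta, n):.3f}")
```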
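[Figure: power function $\beta(\theta)$ against $\theta$ for $\eta = n$, with curves for $n = 5$, $n = 50$ and $n = 100$]
[Figure: power function $\beta(\theta)$ against $\theta$ for the rejection region Heads $\geq 3n/5$, with curves for $n = 5$, $n = 50$ and $n = 100$]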
Permutation test
Aim: Compare two datasets $\vec{x}_A$ and $\vec{x}_B$
Null hypothesis: The two datasets are sampled from the same distribution
No parametric model...
Test statistic
Choose test statistic $T$ and evaluate the difference
$$T_{\text{diff}}(\vec{x}) := T(\vec{x}_A) - T(\vec{x}_B)$$
Test: $R := \{ t \mid t \geq \eta \}$
Problem: How do we determine significance level or p value?
Main insight: Exchangeability under permutations
Under $H_0$ distribution of $T_{\text{diff}}(\vec{X})$ does not change if we permute labels
Joint distribution of $\vec{X}_1, \vec{X}_2, \ldots, \vec{X}_n$ and of any permutation $\vec{X}_{24}, \vec{X}_n, \ldots, \vec{X}_3$ are the same
Values of $T_{\text{diff}}$ after permuting, $t_{\text{diff},1}, \ldots, t_{\text{diff},n!}$, are uniformly distributed
$$P\left( T_{\text{diff}}(\vec{X}) \geq \eta \right) = \frac{1}{n!} \sum_{i=1}^{n!} 1_{t_{\text{diff},i} \geq \eta}$$
This is the size of the test!
$$p = P\left( T_{\text{diff}}(\vec{X}) \geq T_{\text{diff}}(\vec{x}) \right) = \frac{1}{n!} \sum_{i=1}^{n!} 1_{t_{\text{diff},i} \geq T_{\text{diff}}(\vec{x})}$$
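For very small datasets the sum over all $n!$ permutations can be evaluated exactly. The sketch below does this with the sample mean as the statistic $T$; the function name, the choice of the mean, and the six data values are all hypothetical. For realistic sample sizes $n!$ is far too large to enumerate, and one would typically average over a random subset of permutations instead.

```python
from itertools import permutations
from statistics import mean


def permutation_p_value(x_a, x_b, statistic=mean):
    """Exact permutation p value for T_diff = T(x_A) - T(x_B)."""
    pooled = list(x_a) + list(x_b)
    n_a = len(x_a)
    observed = statistic(x_a) - statistic(x_b)

    count = 0  # permutations with t_diff at least as large as observed
    total = 0  # number of permutations visited (n! in the end)
    for perm in permutations(pooled):
        # The first n_a entries play the role of dataset A, the rest of B.
        t_diff = statistic(perm[:n_a]) - statistic(perm[n_a:])
        count += t_diff >= observed
        total += 1
    return count / total


# Hypothetical measurements for two groups.
x_a = [12.1, 11.4, 13.0]
x_b = [9.8, 10.5, 10.1]

p = permutation_p_value(x_a, x_b)
print(p)
```

In this toy example every value in group A exceeds every value in group B, so only relabelings that keep the groups intact reach the observed difference.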