lecture 4 permutation methods

Lecture 4: Permutation Methods, Applied Statistics 2014



  1. Lecture 4: Permutation Methods. Applied Statistics 2014. Sections: Randomization Model, Population Model, Rank Tests, Assignment.

  2. Permutation Methods
     - Non-parametric methods for testing differences among samples (or groups).
     - These tests can serve as alternatives to some classical tests, such as two-sample $t$-tests and ANOVA tests.
     - First introduced in Fisher (1935) and Pitman (1937).
     - There are two typical settings:
       - Randomization model: randomization tests.
       - Population model: permutation tests.
     - They provide a unified framework for rank-based tests such as the Wilcoxon rank test.
     - They are computationally intensive.

  3. Randomization Model
     - Basis: subjects are randomly assigned to different treatments (the usual practice in medicine).
     - The only random aspect of the model is the assignment of treatments.
     - Inference is limited to the subjects under study. There is no population.

  4. Randomization Model - Example
     Example (Ernst (2004)). A new treatment for post-surgical recovery is compared with a standard treatment. Of the $n$ subjects available for the study, $n_1$ are randomly assigned to receive the new treatment, while the remaining $n_2 = n - n_1$ receive the standard treatment. The corresponding recovery times (in days) are recorded: $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$ for the new and standard treatments, respectively.
     - $H_0$: There is no difference between the treatments.
     - $H_a$: The new treatment decreases the recovery times.
     - Test statistic: $T_d = \bar{X} - \bar{Y}$.

  5. Randomization Model - Example
     Specifically, $n = 7$, $n_1 = 4$ and $n_2 = 3$, with
     $(x_1, x_2, x_3, x_4) = (19, 22, 25, 26)$ and $(y_1, y_2, y_3) = (23, 33, 40)$, so that
     $t_d = \bar{x} - \bar{y} = 23 - 32 = -9$.
     How do we compute the $p$-value?
     - The only random aspect is the random assignment of treatments. So if $H_0$ is true, the recovery time of each subject is the same regardless of which treatment is received.
     - Under $H_0$, the distribution of $T_d$ is obtained by reassigning the observed values of the $x_i$'s and $y_i$'s to the two groups in every possible way.

  6. Randomization Model - Example
     There are in total $\binom{7}{4} = 35$ equally likely randomizations.

     | $i$ | $X_1$ | $X_2$ | $X_3$ | $X_4$ | $Y_1$ | $Y_2$ | $Y_3$ | $t_i$ |
     |-----|-------|-------|-------|-------|-------|-------|-------|-------|
     | 1   | 19    | 22    | 25    | 26    | 23    | 33    | 40    | -9.00 |
     | 2   | 22    | 23    | 25    | 26    | 19    | 33    | 40    | -6.67 |
     | 3   | 22    | 33    | 25    | 26    | 19    | 23    | 40    | -0.83 |
     | 4   | 22    | 25    | 26    | 40    | 19    | 23    | 33    | 3.25  |
     | ... |       |       |       |       |       |       |       |       |
     | 35  | 19    | 23    | 33    | 40    | 22    | 25    | 26    | 4.42  |

     The $p$-value is given by
     $$p = P_{H_0}(T_d \le t_d) = \frac{\sum_{i=1}^{35} I(t_i \le t_d)}{35} = \frac{3}{35} \approx 0.0857.$$
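The enumeration on slide 6 can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: the function names `mean_diff` and `randomization_p_value` are made up here, and exact fractions are used so that the comparison `t <= t_obs` is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

new = [19, 22, 25, 26]    # recovery times under the new treatment (slide 5)
standard = [23, 33, 40]   # recovery times under the standard treatment (slide 5)

def mean_diff(x, y):
    """Exact difference of means, mean(x) - mean(y)."""
    return Fraction(sum(x), len(x)) - Fraction(sum(y), len(y))

def randomization_p_value(x, y):
    """Lower-tail p-value over all C(n, n1) reassignments of the pooled values."""
    pooled = list(x) + list(y)
    n, n1 = len(pooled), len(x)
    t_obs = mean_diff(x, y)
    stats = []
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]                         # values relabelled "new"
        ys = [pooled[i] for i in range(n) if i not in chosen] # values relabelled "standard"
        stats.append(mean_diff(xs, ys))
    extreme = sum(1 for t in stats if t <= t_obs)
    return Fraction(extreme, len(stats)), t_obs, len(stats)

p_value, t_obs, m = randomization_p_value(new, standard)
print(t_obs, m, p_value, float(p_value))
```

Only three of the 35 assignments give a mean difference of at most -9, which is where the value 3/35 comes from.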

  7. Randomization Model - Example
     [Figure: Randomisation (reference) distribution of $T_d$; horizontal axis $t$, vertical axis probability.]

  8. Randomization Model - Remarks
     Since the subjects are not randomly chosen, the conclusion cannot be generalized beyond the subjects under study.

  9. Population Model
     Suppose there are two independent random samples: $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$.
     - $H_0: X_1 \overset{d}{=} Y_1$ versus $H_1: E(X_1) > E(Y_1)$.
     - Test statistic: $T = \bar{X}_{n_1} - \bar{Y}_{n_2}$. We reject $H_0$ for large values of $T$.
     - Under $H_0$, the reference distribution of $T$ is obtained in the same way as in the randomization model.

  10. Population Model
      Let $n = n_1 + n_2$. Define $Z_1 = X_1, \dots, Z_{n_1} = X_{n_1}, Z_{n_1+1} = Y_1, \dots, Z_n = Y_{n_2}$, and denote the observed values by $(z_1, \dots, z_n)$. Then
      $$T = \frac{1}{n_1}\sum_{j=1}^{n_1} Z_j - \frac{1}{n_2}\sum_{j=1}^{n_2} Z_{j+n_1}.$$
      Under $H_0$, the $Z_j$'s are iid. Define the event
      $$E = \{(Z_1, \dots, Z_n) = (z_{pe(1)}, \dots, z_{pe(n)}) \text{ for some permutation } pe\}.$$
      Then for any permutation $\tilde{p}$,
      $$P_{H_0}\big((Z_1, \dots, Z_n) = (z_{\tilde{p}(1)}, \dots, z_{\tilde{p}(n)}) \mid E\big) = \frac{1}{n!}.$$
      For the $i$-th of the $\binom{n}{n_1}$ distinct splits of the pooled values into groups of sizes $n_1$ and $n_2$, with $\tilde{p}$ any permutation producing that split, let
      $$t_i = \frac{1}{n_1}\sum_{j=1}^{n_1} z_{\tilde{p}(j)} - \frac{1}{n_2}\sum_{j=1}^{n_2} z_{\tilde{p}(j+n_1)}.$$
      We have
      $$P_{H_0}(T = t_i \mid E) = \frac{1}{\binom{n}{n_1}}.$$
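The step from $1/n!$ to $1/\binom{n}{n_1}$ is a counting argument. A brief sketch, assuming the observed values are distinct and that different splits give different values of the statistic (otherwise the probabilities of coinciding splits add up): reordering the first $n_1$ positions among themselves, or the last $n_2$ positions among themselves, does not change the statistic, so exactly $n_1!\,n_2!$ permutations correspond to each split.

```latex
\begin{align*}
P_{H_0}(T = t_i \mid E)
  &= \#\{\tilde{p} : \tilde{p} \text{ produces the } i\text{-th split}\} \cdot \frac{1}{n!} \\
  &= \frac{n_1!\, n_2!}{n!}
   = \frac{1}{\binom{n}{n_1}} .
\end{align*}
```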

  11. Population Model
      We obtain the conditional sample of $T$: $\{t_1, \dots, t_m\}$, where $m = \binom{n}{n_1}$.
      Write $t = \bar{x}_{n_1} - \bar{y}_{n_2}$ for the observed value. The $p$-value is given by
      $$\frac{\#\{i : t_i \ge t\}}{\binom{n}{n_1}}.$$
      Note that the $p$-value is at least $1/\binom{n}{n_1}$.

  12. Population Model
      Lemma. Let $t_{(1)} \le \dots \le t_{(m)}$ denote the ordered values of the $t_i$'s, where $m = \binom{n}{n_1}$. If the significance level is $\alpha = k/\binom{n}{n_1}$ and we take $[t_{(m-k+1)}, \infty)$ as the critical region, where $t_{(m-k+1)}$ is the $k$-th largest value of the $t_i$'s, then the permutation test is exact, that is,
      $$P_{H_0}(T \ge t_{(m-k+1)} \mid E) = \alpha.$$

  13. Population Model (continued; the lemma from the previous slide is repeated on this slide)
      - The critical value in the lemma (the $k$-th largest of the $t_i$'s) is a random cut, as it depends on the data (the observations).
      - The test is a conditional test, as it generates the permutation distribution of $T$ conditional on the observed values.
      - Conditional on the observed values, the permutation distribution of $T$ does not depend on the underlying population distributions $G$ and $F$. Hence the test is distribution free.

  14. Population Model - Remarks
      - The basic idea is to generate a reference distribution by recalculating a statistic for many permutations of the data.
      - Not all statistics can be used in permutation methods. Suppose $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$. Based on two independent samples of sizes $m$ and $n$, we want to test $H_0: \mu_1 = \mu_2$. The variances are unknown and hence not necessarily equal. Consider the $t$-statistic
        $$T = \frac{\bar{X}_m - \bar{Y}_n}{\sqrt{S_X^2/m + S_Y^2/n}}.$$
        The distribution of $T$ is not invariant under permutation: when $\sigma_1^2 \ne \sigma_2^2$, the pooled observations are not exchangeable even if $H_0$ holds.

  15. Population Model - Remarks
      - Exhaustively computing all permutations is infeasible for large values of $n_1$ and $n_2$. For instance, if $n_1 = n_2 = 15$, then $\binom{30}{15} > 155$ million.
      - We can use Monte Carlo methods to estimate the $p$-value.
        - Generate $B$ samples from the permutation distribution. The function boot in the R package boot can be useful for this purpose.
        - Approximate the $p$-value by its sample counterpart:
          $$\hat{p} = \frac{1 + \sum_{i=1}^{B} I(t_i \ge t)}{1 + B}.$$
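The Monte Carlo estimate $\hat{p}$ can be sketched as follows. This is a plain-Python stand-in rather than the R boot function mentioned on the slide, and the name `monte_carlo_p_value`, the choice $B = 9999$ and the fixed seed are assumptions made for illustration. The recovery-time data from slide 5 are reused with the two groups swapped, so that large values of the statistic are the extreme ones, matching the upper-tail indicator $I(t_i \ge t)$.

```python
import random

def mean_diff(x, y):
    """Difference of sample means, mean(x) - mean(y)."""
    return sum(x) / len(x) - sum(y) / len(y)

def monte_carlo_p_value(x, y, B=9999, seed=0):
    """Estimate the upper-tail permutation p-value from B random shuffles."""
    rng = random.Random(seed)     # fixed seed only to make the sketch reproducible
    pooled = list(x) + list(y)
    n1 = len(x)
    t_obs = mean_diff(x, y)
    count = 0
    for _ in range(B):
        rng.shuffle(pooled)       # one draw from the permutation distribution
        if mean_diff(pooled[:n1], pooled[n1:]) >= t_obs:
            count += 1
    return (1 + count) / (1 + B)  # the "+1" counts the observed arrangement itself

standard = [23, 33, 40]           # slide 5 data, standard treatment
new = [19, 22, 25, 26]            # slide 5 data, new treatment
B = 9999
p_hat = monte_carlo_p_value(standard, new, B=B, seed=0)
print(p_hat)
```

For this small data set the exact value is available (3/35, about 0.0857), so the estimate can be checked against it. With non-integer data, a comparison with a small tolerance may be safer than `>=` on floats.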

  16. Population Model - Example
      Byzantine coins. This is Example 15.6 in Kvam and Vidakovic (2007). Researchers investigated the silver content (% Ag) of a number of Byzantine coins discovered in Cyprus. The coins are from the first and fourth coinage in the reign of King Manuel I Comnenus (1143-1180). Based on the following data, we want to test whether there is a significant difference between the two coinages in terms of silver content.
      - Coins from the first coinage ($X$): (5.9, 6.8, 6.4, 7.0, 6.6, 7.7, 7.2, 6.9, 6.2)
      - Coins from the fourth coinage ($Y$): (5.3, 5.6, 5.5, 5.1, 6.2, 5.8, 5.8)
      - $H_0: X \overset{d}{=} Y$ versus $H_1: E(X) \ne E(Y)$. This is a two-sided alternative.

  17. Population Model - Example
      We choose the test statistic $T = \bar{X} - \bar{Y}$. Note that $n_1 = 9$ and $n_2 = 7$. For each of the $\binom{16}{9} = 11440 =: m$ permutations, we calculate the value $t_i$.
      [Figure: Permutation distribution of $T$ (density), with the observed value marked in blue.]
      The observed test statistic is $t = 1.13$. Let $\bar{t} = \frac{1}{m}\sum_{i=1}^{m} t_i$ be the mean of the permutation distribution. We define the two-sided $p$-value as
      $$p = \frac{1}{m}\sum_{i=1}^{m} I\big(|t_i - \bar{t}| \ge |t - \bar{t}|\big) = 0.000699.$$
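The two-sided $p$-value for the coin data can be reproduced by full enumeration. The sketch below is illustrative rather than the lecture's own code; the variable names are made up, and exact fractions are used so that ties at the observed value are counted correctly.

```python
from fractions import Fraction
from itertools import combinations

# Silver content (% Ag) from slide 16, kept as strings so Fraction is exact.
first = ["5.9", "6.8", "6.4", "7.0", "6.6", "7.7", "7.2", "6.9", "6.2"]
fourth = ["5.3", "5.6", "5.5", "5.1", "6.2", "5.8", "5.8"]

x = [Fraction(v) for v in first]
y = [Fraction(v) for v in fourth]
pooled = x + y
n, n1 = len(pooled), len(x)
n2 = n - n1
total = sum(pooled)

def stat(sum_x):
    """Mean difference for a split whose first group has sum sum_x."""
    return sum_x / n1 - (total - sum_x) / n2

t_obs = stat(sum(x))
# One statistic per way of choosing which n1 pooled values form the first group.
stats = [stat(sum(pooled[i] for i in idx)) for idx in combinations(range(n), n1)]
m = len(stats)
t_bar = sum(stats) / m                         # mean of the permutation distribution
extreme = sum(1 for t in stats if abs(t - t_bar) >= abs(t_obs - t_bar))
p_two_sided = Fraction(extreme, m)
print(m, float(t_obs), extreme, float(p_two_sided))
```

Eight of the 11440 splits are at least as extreme as the observed one, which gives 8/11440, about 0.000699, in agreement with the slide.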

  18. Wilcoxon/Mann-Whitney test
      Suppose $X_1, \dots, X_{n_1} \overset{iid}{\sim} F_X$ and $Y_1, \dots, Y_{n_2} \overset{iid}{\sim} F_Y$. Both $F_X$ and $F_Y$ are continuous.
      - $H_0: F_X = F_Y$ versus $H_1: F_X < F_Y$. Under $H_1$, $X_1$ is stochastically larger than $Y_1$.
      - Let $(R_1, \dots, R_{n_1+n_2})$ be the ranks of the pooled sample $(X_1, \dots, X_{n_1}, Y_1, \dots, Y_{n_2})$. So $R_1$ is the rank of $X_1$ among all $n = n_1 + n_2$ observations.
      - Wilcoxon's test statistic is $T = \sum_{i=1}^{n_1} R_i$. We reject $H_0$ for large values of $T$.
      - What is the reference distribution of $T$ under $H_0$?

  19. Wilcoxon/Mann-Whitney test
      Under $H_0$, we have:
      - $(R_1, \dots, R_{n_1})$ is a random sample without replacement from $\{1, 2, \dots, n_1 + n_2\}$;
      - the distribution of $T$ is known and does NOT depend on $F_X$ ($= F_Y$).
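Because the ranks of the first sample are a random draw without replacement from $\{1, \dots, n\}$ under $H_0$, the null distribution of the rank sum can be tabulated by enumeration for small samples. The following is a minimal sketch under that assumption (continuous distributions, hence no ties); the function name `rank_sum_null_distribution` and the sizes $n_1 = 4$, $n_2 = 3$ are chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def rank_sum_null_distribution(n1, n2):
    """Exact null distribution of T = sum of the n1 ranks of the first sample."""
    n = n1 + n2
    # Every n1-subset of {1, ..., n} is equally likely under H0.
    counts = Counter(sum(ranks) for ranks in combinations(range(1, n + 1), n1))
    m = sum(counts.values())
    return {t: Fraction(c, m) for t, c in sorted(counts.items())}

dist = rank_sum_null_distribution(4, 3)
mean_T = sum(t * p for t, p in dist.items())   # equals n1 * (n + 1) / 2
for t, p in dist.items():
    print(t, p)
```

The table depends only on $n_1$ and $n_2$, not on the data or on $F_X$, which is the distribution-free property stated on the slide.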
