P-values, Randomization Tests, and Nonparametric Combinations of Tests

  1. P-values, Randomization Tests, and Nonparametric Combinations of Tests
     Tonix Virtual Retreat
     Philip B. Stark, University of California, Berkeley
     22 October 2020

  2. Randomized experiments
     • Subjects are recruited at one or more centers
     • Criteria ensure that they have the condition
     • Subjects are randomized to treatment/control or to a treatment level, sometimes with constraints or “bias” to get balance
     • Randomization algorithms are often proprietary

  3. Analyzing the data
     • It is common to use methods such as ANOVA, t-tests, regression, and logistic regression
     • Their assumptions generally have nothing to do with how the experiment was done

  4. Small example
     11 pairs of rats, each pair from the same litter. Randomly (by coin toss), one rat of each pair was put into an “enriched” environment; its sibling got a “normal” environment. After 65 days, cortical mass (mg) was measured.

     enriched      689 656 668 660 679 663 664 647 694 633 653
     impoverished  657 623 652 654 658 646 600 640 605 635 642
     diff           32  33  16   6  21  17  64   7  89  -2  11

     Cartoon from Rosenzweig, M.R., E.L. Bennett, and M.C. Diamond, 1972. Brain changes in response to experience, Scientific American, 226, 22–29.
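A minimal Python sketch that recomputes the difference row (enriched minus impoverished) from the two rows of measurements on this slide, assuming both rows list the littermate pairs in the same order:

```python
# Cortical masses (mg) from the slide, one entry per littermate pair
# (assumption: both rows use the same pair order).
enriched = [689, 656, 668, 660, 679, 663, 664, 647, 694, 633, 653]
impoverished = [657, 623, 652, 654, 658, 646, 600, 640, 605, 635, 642]

# Within-pair difference: enriched sibling minus impoverished sibling.
diffs = [e - i for e, i in zip(enriched, impoverished)]
print(diffs)  # [32, 33, 16, 6, 21, 17, 64, 7, 89, -2, 11]
```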

  5. Informal hypotheses
     Null hypothesis: treatment has “no effect.”
     Alternative hypothesis: treatment increases cortical mass.
     This suggests a one-sided test for an increase.

  8. Test contenders
     • 2-sample Student t-test: $\frac{\text{mean(treatment)} - \text{mean(control)}}{\text{pooled estimate of SD of difference of means}}$
     • 1-sample Student t-test on the differences: $\frac{\text{mean(differences)}}{\text{SD(differences)}/\sqrt{11}}$
     • randomization test using the t-statistic of the differences: same statistic, but its probability is calibrated differently
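Both t-statistics can be computed with Python's standard library. This is a sketch: the helper names `two_sample_t` and `one_sample_t` are illustrative, not from the slides, and the data are the rat measurements from the small example. The randomization test would use `one_sample_t` unchanged; only the reference distribution used to turn it into a P-value differs.

```python
import math
import statistics

enriched = [689, 656, 668, 660, 679, 663, 664, 647, 694, 633, 653]
impoverished = [657, 623, 652, 654, 658, 646, 600, 640, 605, 635, 642]
diffs = [e - i for e, i in zip(enriched, impoverished)]


def two_sample_t(x, y):
    """Hypothetical helper: pooled-variance 2-sample t-statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))  # SE of difference of means
    return (statistics.mean(x) - statistics.mean(y)) / se


def one_sample_t(d):
    """Hypothetical helper: mean(d) / (SD(d) / sqrt(n)), SD with n - 1."""
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


t_two = two_sample_t(enriched, impoverished)  # ignores the litter pairing
t_paired = one_sample_t(diffs)                # uses the litter pairing
print(f"2-sample t: {t_two:.3f}")
print(f"1-sample t on differences: {t_paired:.3f}")
```

Note that the 2-sample statistic treats the groups as independent samples, which the pairing by litter does not justify.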

  12. The Neyman “ticket” model (1930)
      • $S$ subjects, $T$ treatments
      • subject $s$ is represented by a ticket with $T$ numbers on it, $x_{s1}, \ldots, x_{sT}$, set before treatment is assigned (but unknown to the experimenter):

        resp to tx 1   resp to tx 2   ···   resp to tx T
        4              9.2            ···   -3.33

      • $x_{st}$ is the response subject $s$ will have if assigned treatment $t$
      • if subject $s$ is assigned to treatment $t$, we observe $x_{st}$
      • no necessary connection of the numbers across subjects
      • no assumption about the distribution of the numbers
      • “non-interference” is implicit
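A minimal Python sketch of the ticket model, under loudly stated assumptions: the tickets are hypothetical (only the first row echoes the slide), treatments are indexed from 0 as Python does, and each subject's treatment is drawn independently and uniformly, which is a simplification of real trial randomization. The point is that the tickets are fixed in advance and the assignment only chooses which number on each ticket is revealed.

```python
import random

# Hypothetical tickets for S = 3 subjects and T = 3 treatments.
# Row s holds x_s1, ..., x_sT, fixed before assignment.
# Only the first row comes from the slide; the rest are made up.
tickets = [
    [4.0, 9.2, -3.33],
    [1.5, 2.0, 0.7],
    [10.0, 8.0, 12.5],
]
S, T = len(tickets), len(tickets[0])


def observe(tickets, assignment):
    """Reveal x_{s, t_s} for each subject s: one number per ticket."""
    return [tickets[s][t] for s, t in enumerate(assignment)]


rng = random.Random(2020)  # seeded only so the example is reproducible
# Assumption: independent uniform assignment (zero-based treatment index).
assignment = [rng.randrange(T) for _ in range(S)]
observed = observe(tickets, assignment)
print(assignment, observed)
```

Non-interference is built in: `observe` reads subject `s`'s response from ticket `s` using only `assignment[s]`.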

  13. Generalizations
      • subject $s$ is represented by a ticket with $T$ $J$-vectors on it, $\vec{x}_{s1}, \ldots, \vec{x}_{sT}$
      • if subject $s$ is assigned treatment $t_s$, we observe the vector $\vec{x}_{st_s}$

        item   resp to tx 1   resp to tx 2   ···   resp to tx T
        1      4              9.2            ···   -3.33
        2      2              1              ···   17
        ⋮      ⋮              ⋮                    ⋮
        J      5              42             ···   9

  14. More generalizations
      • subject $s$ is represented by a ticket with $T$ probability distributions on it, $F_{s1}, \ldots, F_{sT}$
      • if subject $s$ is assigned treatment $t$, we observe a draw from $F_{st}$
      • $F_{st}$ could be a multivariate distribution

        resp to tx 1      resp to tx 2      ···   resp to tx T
        $F_{11}(\cdot)$   $F_{12}(\cdot)$   ···   $F_{1T}(\cdot)$
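One way to sketch this generalization in Python, assuming each ticket entry $F_{st}$ is represented by a hypothetical sampler function that takes a random generator and returns one draw. Subject 0 has point masses, showing that the scalar ticket is the special case of degenerate distributions; subject 1's normal distributions are made up. A multivariate $F_{st}$ would simply return a tuple instead of a number.

```python
import random

# Hypothetical tickets: each entry plays the role of F_st, a function
# rng -> one draw. Subject 0: point masses (the scalar-ticket special case).
# Subject 1: made-up normal distributions with different means.
tickets = [
    [lambda rng: 4.0, lambda rng: 9.2, lambda rng: -3.33],
    [lambda rng: rng.gauss(0.0, 1.0),
     lambda rng: rng.gauss(1.0, 1.0),
     lambda rng: rng.gauss(2.0, 1.0)],
]


def observe(tickets, assignment, rng):
    """For each subject s, draw one response from F_{s, t_s}."""
    return [tickets[s][t](rng) for s, t in enumerate(assignment)]


rng = random.Random(0)
assignment = [2, 1]  # hypothetical zero-based treatment index per subject
observed = observe(tickets, assignment, rng)
print(observed)
```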

  15. Generic notation
      $x_{st}$ could be a scalar, a vector, or a realization of a random variable or random vector. $\psi(\cdot)$ is a test statistic: it maps the data $x$ to a scalar.

  18. The strong null hypothesis
      • “treatment doesn’t matter at all”
      • subject $s$’s response would have been the same, no matter what treatment was assigned
      • $x_{s1} = x_{s2} = \cdots = x_{sT}$
      • (but $x_{st}$ is not necessarily equal to $x_{rt}$ for $r \ne s$)

        resp to tx 1   resp to tx 2   ···   resp to tx T
        4              4              ···   4

  19. • if the null is true, we know what would have been observed if the random assignment had been different: every subject would have had the same response
      • this induces a null distribution for any test statistic $\psi$
      • that distribution is completely determined by the randomization: no additional assumptions
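A sketch of how the strong null induces a null distribution, assuming for illustration a completely randomized design in which $k$ of $n$ subjects are treated and every size-$k$ subset is equally likely (the rat experiment instead randomizes within pairs). The responses are held fixed, because under the strong null they do not depend on the assignment, and $\psi$ is recomputed for every possible assignment. The function names and the six responses are hypothetical.

```python
from itertools import combinations
from statistics import mean


def diff_in_means(responses, treated):
    """Hypothetical psi: mean of treated responses minus mean of controls."""
    t = [r for i, r in enumerate(responses) if i in treated]
    c = [r for i, r in enumerate(responses) if i not in treated]
    return mean(t) - mean(c)


def randomization_pvalue(responses, treated, psi):
    """Exact upper-tail P-value of psi under the strong null.

    Assumes all C(n, k) treatment groups of size k are equally likely.
    Under the strong null the responses would be the same for any
    assignment, so only the group labels change.
    """
    n, k = len(responses), len(treated)
    observed = psi(responses, set(treated))
    hits = total = 0
    for group in combinations(range(n), k):  # every possible assignment
        total += 1
        # small tolerance so floating-point ties with the observed value count
        if psi(responses, set(group)) >= observed - 1e-12:
            hits += 1
    return hits / total


responses = [1, 2, 3, 10, 11, 12]  # hypothetical data
p = randomization_pvalue(responses, {3, 4, 5}, diff_in_means)
print(p)  # 0.05: the observed split is the most extreme of C(6, 3) = 20
```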

  20. The rats: strong null
      Treatment has no effect: it is as if each rat’s cortical mass was determined before randomization. Then it is equally likely that the rat with the heavier cortex will be assigned to treatment or to control, independently across littermate pairs. This gives $2^{11} = 2048$ equally likely possibilities:
      ±32 ±33 ±16 ±6 ±21 ±17 ±64 ±7 ±89 ±2 ±11
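The 2048 possibilities can be listed directly. A minimal Python sketch, assuming coin-toss assignment within each pair: swapping which sibling got the enriched environment only flips the sign of that pair's difference, so every sign pattern is a possible data set with probability 1/2048. The sketch also checks that the resulting distribution of the summed differences is symmetric about zero.

```python
from itertools import product

diffs = [32, 33, 16, 6, 21, 17, 64, 7, 89, -2, 11]

# Each sign pattern is one equally likely outcome of the 11 coin tosses
# (assumption: fair, independent tosses within each littermate pair).
patterns = list(product((1, -1), repeat=len(diffs)))
signed_sums = [sum(s * d for s, d in zip(signs, diffs)) for signs in patterns]

print(len(patterns))                 # 2048
print(max(signed_sums), sum(diffs))  # 298 294
```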

  21. Alternative hypotheses
      1. An individual’s response depends only on that individual’s assignment
         • special cases: shift, scale, etc.
      2. Interactions/interference: my response could depend on your treatment

  22. Assumptions of the tests
      1. 2-sample t-test:
         • masses are an iid sample from a normal distribution with the same unknown variance and the same unknown mean
         • tests the “weak” null hypothesis (plus normality, independence, non-interference, etc.)
      2. 1-sample t-test on the differences:
         • mass differences are an iid sample from a normal distribution with unknown variance and zero mean
         • tests the “weak” null hypothesis (plus normality, independence, non-interference, etc.)
      3. randomization test:
         • randomization was performed as claimed
         • tests the strong null hypothesis
      The assumptions of the randomization test are true by fiat.

  24. Student t-test calculations
      Mean of differences: 26.73 mg
      Sample SD of differences: 27.33 mg
      t-statistic: $3.244 \equiv t_0$
      P-value for 2-sided t-test: 0.0088
      • Why do cortical weights have a normal distribution?
      • Why is the variance of the difference between treatment and control the same for different litters?
      • Treatment and control are dependent, because assigning a rat to treatment excludes it from the control group, and vice versa.
      • The P-value depends on assuming the differences are an iid sample from a normal distribution.
      • If we reject the null, is that because there is a treatment effect, or because the other assumptions are wrong?
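The slide's numbers can be recomputed in Python. If SciPy is available, `scipy.stats.ttest_1samp(diffs, 0)` returns the same statistic and two-sided P-value; this sketch avoids that dependency by integrating the Student t density numerically with Simpson's rule. The helper names `t_pdf` and `t_two_sided_p` are hypothetical, and the P-value still rests on the normality and iid assumptions questioned above.

```python
import math
import statistics

diffs = [32, 33, 16, 6, 21, 17, 64, 7, 89, -2, 11]
n = len(diffs)
t0 = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central   # the t density is symmetric about 0


p = t_two_sided_p(t0, df=n - 1)
print(f"mean {statistics.mean(diffs):.2f}, SD {statistics.stdev(diffs):.2f}, "
      f"t0 {t0:.3f}, two-sided P {p:.4f}")
```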

  25. Randomization t-test calculations
      We could enumerate all $2^{11} = 2{,}048$ equally likely possibilities and calculate the t-statistic for each. The P-value is $\#\{\text{possibilities with } t \ge t_0\}/2048 \approx 0.0018$.
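A sketch of the full enumeration in Python, assuming coin-toss assignment within each pair so that all $2^{11}$ sign patterns are equally likely under the strong null. It counts the patterns whose t-statistic is at least $t_0$ and, for comparison, those with $|t| \ge |t_0|$; the printed fractions can be set beside the P-value quoted on the slide. Because flipping signs leaves the sum of squared differences unchanged, the t-statistic here is an increasing function of the mean difference, so ranking patterns by the mean difference would give the same P-value.

```python
import math
import statistics
from itertools import product

diffs = [32, 33, 16, 6, 21, 17, 64, 7, 89, -2, 11]


def t_stat(values):
    """1-sample t-statistic: mean / (SD / sqrt(n))."""
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(len(values)))


t0 = t_stat(diffs)
upper = both = total = 0
for signs in product((1, -1), repeat=len(diffs)):
    t = t_stat([s * d for s, d in zip(signs, diffs)])
    total += 1
    upper += t >= t0 - 1e-9           # one-sided count (tolerance for float ties)
    both += abs(t) >= abs(t0) - 1e-9  # two-sided count, for comparison
print(total, upper / total, both / total)
```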

  27. “Statistical procedure and experimental design are only two different aspects of the same whole, and that whole is the logical requirements of the complete process of adding to natural knowledge by experimentation.”

  28. “A Lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup. We will consider the problem of designing an experiment by means of which this assertion can be tested. · · · Our experiment consists in mixing eight cups of tea, four in one way and four in the other, and presenting them to the subject for judgment in a random order. The subject has been told in advance of what the test will consist, namely, that she will be asked to taste eight cups, that these shall be four of each kind, and that they shall be presented to her in a random order, that is in an order not determined arbitrarily by human choice, but by the actual manipulation of the physical apparatus used in games of chance, dice, cards, roulettes, etc., or, more expeditiously, from a published collection of random sampling numbers purporting to give the actual results of such manipulation. Her task is to divide the 8 cups into two sets of 4, agreeing, if possible, with the treatments received.”

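  30. Test statistic: number of correct IDs
      $\binom{8}{4} = 70$ equally likely ways to choose 4 of the 8 cups
      $\binom{4}{3}\binom{4}{1} = 16$ ways to get exactly 3 correct
      P-value for 4 correct: $1/70 \approx 0.014$; for 3 or more correct: $(16 + 1)/70 \approx 0.243$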
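The counts can be checked by brute force. A minimal Python sketch, assuming the null hypothesis that the Lady cannot discriminate: her split of the cups is then fixed, and the randomization makes each of the $\binom{8}{4}$ possible "milk first" sets equally likely. The labeling of her guess as cups 0 to 3 is hypothetical; by symmetry any fixed guess gives the same distribution.

```python
from itertools import combinations
from math import comb

cups = range(8)
guess = {0, 1, 2, 3}  # hypothetical: the 4 cups the Lady calls "milk first"

# Under the null, each of the C(8, 4) = 70 possible "milk first" sets
# is equally likely, whatever she says.
truths = list(combinations(cups, 4))
n_correct = [len(guess & set(truth)) for truth in truths]

p_all4 = n_correct.count(4) / len(truths)                    # 1/70
p_3plus = sum(1 for c in n_correct if c >= 3) / len(truths)  # 17/70
print(len(truths), n_correct.count(3), round(p_all4, 3), round(p_3plus, 3))
# 70 16 0.014 0.243
```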
