Hypothesis testing, part 2

  1. Hypothesis testing, part 2
     With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

  2. CATEGORICAL IV, NUMERIC DV

  3. Independent samples, one IV
     # Conditions   Normal/Parametric   Non-parametric
     Exactly 2      T-test              Mann-Whitney U, bootstrap
     2+             One-way ANOVA       Kruskal-Wallis, bootstrap

  4. Is your data normal?
     • Skewness: asymmetry
     • Kurtosis: “peakedness” relative to normal
       – Both: within ±2 SE of the statistic is OK
     • Or use Shapiro-Wilk (null = normal)
     • Or look at a Q-Q plot
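
A minimal Python sketch of these checks, using scipy.stats on simulated data (the sample below is made up for illustration):

    import numpy as np
    from scipy import stats

    # Hypothetical sample standing in for one condition's DV values
    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=40)

    print("skewness:", stats.skew(x))
    print("excess kurtosis:", stats.kurtosis(x))   # 0 for a normal distribution

    # Shapiro-Wilk: the null hypothesis is that the data are normal
    w, p = stats.shapiro(x)
    print("Shapiro-Wilk p =", p)                   # small p -> evidence against normality

    # Q-Q plot against the normal distribution (needs matplotlib)
    # import matplotlib.pyplot as plt
    # stats.probplot(x, dist="norm", plot=plt); plt.show()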

  5. T-test
     • Already talked about
     • Assumptions: normality, equal variances, independent samples
       – Can use Levene's test to check the equal-variance assumption
     • Post-test: check residuals for assumption fit
       – For a t-test this is the same pre or post
       – For other tests you check residual vs. fit post
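
For illustration, a small scipy sketch of Levene's test plus an independent-samples t-test; the two conditions and their effect size are invented:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    a = rng.normal(5.0, 2.0, size=30)   # condition A (hypothetical)
    b = rng.normal(6.0, 2.0, size=30)   # condition B (hypothetical)

    # Levene's test: null hypothesis is equal variances
    print("Levene p =", stats.levene(a, b).pvalue)

    # Independent-samples t-test (equal_var=False would give Welch's t-test instead)
    t, p = stats.ttest_ind(a, b, equal_var=True)
    print("t =", t, "p =", p)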

  6. One-way ANOVA
     • H0: μ1 = μ2 = μ3
     • H1: at least one doesn't match
     • NOT H1: μ1 != μ2 != μ3
     • Assumptions: normality, common variance, independent errors
     • Intuition: F statistic
       – Variance between / variance within
       – Under the exact null, F ≈ 1; F >> 1 rejects the null

  7. One-way ANOVA
     • F = MS_b / MS_w
     • MS_w = sum over conditions [ sum within condition [ (difference from condition mean)^2 ] ] / df_w
       – df_w = N - k, where k = number of conditions
       – Sum over all conditions; sum per condition
     • MS_b = sum [ (condition mean - grand mean)^2 ] / df_b
       – df_b = k - 1
       – Every observation goes in the sum
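
A rough sketch computing F by hand from these formulas on simulated data, then checking it against scipy.stats.f_oneway (the three conditions and their means are invented):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    groups = [rng.normal(m, 1.0, size=20) for m in (4.0, 4.5, 5.5)]

    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = np.mean(np.concatenate(groups))

    # MS_w: within-condition squared deviations from each condition mean, / (N - k)
    ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_w = ss_w / (N - k)

    # MS_b: (condition mean - grand mean)^2, one term per observation, / (k - 1)
    ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ms_b = ss_b / (k - 1)

    F = ms_b / ms_w
    p = stats.f.sf(F, k - 1, N - k)
    print("manual: F =", F, "p =", p)
    print("scipy: ", stats.f_oneway(*groups))   # should match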

  8. (example from Vibha Sazawal)

  10. (figure: F distribution with rejection region)

  11. Now what? (Contrasts)
      • So we rejected the null. What did we learn?
        – What *didn't* we learn?
        – At least one is different ... which? All?
        – This is called an “omnibus test”
      • To answer our actual research question, we usually need pairwise contrasts

  12. The trouble with contrasts
      • Contrasts mess with your Type I error bounds
        – One test: 95% confident
        – Three tests: 85.7% confident (0.95^3)
        – 5 conditions, all pairs: 4 + 3 + 2 + 1 = 10 tests: 59.9% (0.95^10)
        – UH OH
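
Those percentages are just 0.95 raised to the number of tests (assuming the tests are independent):

    # Familywise confidence after m tests at alpha = 0.05
    for m in (1, 3, 10):
        print(m, "tests ->", round(0.95 ** m, 3))   # 0.95, 0.857, 0.599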

  13. Planned vs. post hoc
      • Planned: you have a theory.
        – Really, no cheating
        – You get n-1 pairwise comparisons for free
        – In theory, should not be control vs. all, but probably OK
        – NO COMPARISONS unless the omnibus test passes
      • Post hoc
        – Anything unplanned
        – More than n-1
        – Requires correction!
        – Doesn't necessarily require the omnibus test first

  14. Correction
      • Adjust {p-values, alpha} to compensate for multiple testing post hoc
      • Bonferroni (most conservative)
        – Assume all possible pairs: m = k(k-1)/2 (combinations)
        – alpha_c = alpha / m
        – Once you have looked, the implication is you did all the comparisons implicitly!
      • Holm-Bonferroni is less conservative
        – Stepwise: adjust alpha as you go
      • Dunnett specifically for all vs. control; others exist
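
As one illustration, statsmodels can apply these corrections to a set of p-values; the raw p-values below are hypothetical post-hoc results:

    from statsmodels.stats.multitest import multipletests

    pvals = [0.003, 0.020, 0.045, 0.300]   # hypothetical pairwise p-values

    for method in ("bonferroni", "holm"):
        reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
        print(method, p_adj.round(3), reject)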

  15. Independent samples, one IV
      # Conditions   Normal/Parametric   Non-parametric
      Exactly 2      T-test              Mann-Whitney U, bootstrap
      2+             One-way ANOVA       Kruskal-Wallis, bootstrap

  16. Non-parametrics: MWU and K-W
      • Good for non-normal data, Likert data (ordinal, not actually numeric)
      • Assumptions: independent samples, at least ordinal data
      • Null: P(X > Y) = P(Y > X), where X, Y are observations from the 2 distributions (MWU)
        – If you assume the same (continuous) distribution shape, this can be seen as comparing medians

  17. MWU and K-W continued
      • Essentially: rank order all the data (both conditions)
        – Total the ranks for condition 1, compare to “expected”
        – Various procedures to correct for ties
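
A minimal scipy sketch of both tests on made-up ordinal (Likert-style) responses; the conditions and values are hypothetical:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a = rng.integers(1, 6, size=25)   # condition A, 1-5 ratings (hypothetical)
    b = rng.integers(2, 7, size=25)   # condition B (hypothetical)
    c = rng.integers(1, 6, size=25)   # condition C (hypothetical)

    # Two independent conditions: Mann-Whitney U (ties handled internally)
    print(stats.mannwhitneyu(a, b, alternative="two-sided"))

    # Three or more independent conditions: Kruskal-Wallis
    print(stats.kruskal(a, b, c))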

  18. Bootstrap
      • Resampling technique(s)
      • Intuition:
        – Create a “null” distribution, e.g., by subtracting means so mA = mB = 0
          • Now you have shifted samples A-hat and B-hat
        – Combine these to make a null distribution
        – Draw samples of size N, with replacement
          • Do it 1000 (or 10k) times
        – Use this to determine the critical value (alpha = 0.05)
        – Compare this critical value to your real data for the test
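
A sketch of this procedure for two conditions in plain numpy; the sample sizes, means, and 10,000 resamples are illustrative choices, not prescribed by the slides:

    import numpy as np

    rng = np.random.default_rng(4)
    a = rng.normal(5.0, 2.0, size=30)   # hypothetical condition A
    b = rng.normal(6.2, 2.0, size=30)   # hypothetical condition B
    observed = a.mean() - b.mean()

    # Shift each sample to mean 0 (enforce the null), then pool
    pooled = np.concatenate([a - a.mean(), b - b.mean()])

    # Resample with replacement to build the null distribution of the mean difference
    diffs = []
    for _ in range(10_000):
        ra = rng.choice(pooled, size=len(a), replace=True)
        rb = rng.choice(pooled, size=len(b), replace=True)
        diffs.append(ra.mean() - rb.mean())
    diffs = np.array(diffs)

    # Two-sided critical values at alpha = 0.05
    lo, hi = np.quantile(diffs, [0.025, 0.975])
    print("critical region outside:", (lo, hi), "observed:", observed)
    print("reject null:", observed < lo or observed > hi)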

  19. Paired samples, one IV
      # Conditions   Normal/Parametric                  Non-parametric
      Exactly 2      Paired T-test                      Wilcoxon signed-rank
      2+             2-way ANOVA w/ subject as random   Friedman
                     factor; mixed models (later)

  20. Paired T-test
      • Two samples per participant/item
      • Test subtracts them
      • Then uses a one-sample T-test with H0: m = 0 and H1: m != 0
      • Regular T-test assumptions, plus: does subtraction make sense here?

  21. Wilcoxon S.R. / Friedman
      • H0: the difference between pairs is symmetric around 0
      • H1: … or not
      • Excludes no-change items
      • Essentially: rank by absolute difference; compare signs * ranks
      • (Friedman = 3+ condition generalization)
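
A small scipy sketch of the paired tests on simulated repeated measures (the data are invented):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    before = rng.normal(5.0, 1.0, size=20)            # hypothetical first condition
    after = before + rng.normal(0.5, 1.0, size=20)    # same participants, second condition

    # Paired t-test: subtract, then a one-sample t-test against 0
    print(stats.ttest_rel(before, after))

    # Wilcoxon signed-rank (non-parametric; drops zero differences by default)
    print(stats.wilcoxon(before, after))

    # Friedman: 3+ related conditions
    third = before + rng.normal(1.0, 1.0, size=20)    # hypothetical third condition
    print(stats.friedmanchisquare(before, after, third))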

  22. One numeric IV, numeric DV
      SIMPLE LINEAR REGRESSION

  23. Simple linear regression
      • E(Y|x) = b0 + b1*x … looks at populations
        – Population mean at this value of x
      • Key H0: b1 = 0
        – b0 usually not important for significance (obviously important in model fit)
      • b1: slope → change in Y per unit X
      • Best fit: least squares, or maximum likelihood
        – LSq: minimize the sum of squares of the residuals
        – ML: maximize the probability of seeing this data with this model

  24. Assumptions, caveats
      • Assumes:
        – linearity in Y ~ X
        – normally distributed error for each x, with constant variance at all x
        – error in measuring X is small compared to the variance of Y (fixed X)
      • Independent errors!
        – Serial correlation, grouped data, etc. (later)
      • Don't interpret widely outside the available x values
      • Can transform for linearity!
        – log(Y), sqrt(Y), 1/Y, Y^2

  25. Assumption/residual checking
      • Before: use a scatterplot to check for plausible linearity
      • After: residual vs. fit plot
        – Residual on Y vs. predicted on X
        – Should be relatively evenly distributed around 0 (linearity)
        – Should have relatively even vertical spread (equal variance)
      • After: quantile-normal plot of the residuals

  26. Model interpretation
      • Interpret b1, interpret the p-value
      • CI: if it crosses 0, it's not significant
      • R^2: fraction of total variation accounted for
        – Intuitively: explained variance / total variance
        – Explained = var(Y) - residual errors
      • f^2 = R^2 / (1 - R^2); small/medium/large: 0.02, 0.15, 0.35 (Cohen)
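
Pulling slides 23-26 together, a minimal statsmodels sketch on simulated data (the true intercept 2.0 and slope 0.8 are made up), showing the slope, its p-value and CI, R^2, and f^2:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    x = rng.uniform(0, 10, size=50)                      # hypothetical numeric IV
    y = 2.0 + 0.8 * x + rng.normal(scale=1.5, size=50)   # DV with invented true slope

    X = sm.add_constant(x)          # adds the intercept column (b0)
    res = sm.OLS(y, X).fit()

    print(res.params)               # b0, b1
    print(res.pvalues[1])           # p-value for H0: b1 = 0
    print(res.conf_int()[1])        # CI for b1: significant if it doesn't cross 0
    r2 = res.rsquared
    print("R^2 =", r2, " f^2 =", r2 / (1 - r2))

    # Residual vs. fit check: residuals should be evenly spread around 0
    resid, fitted = res.resid, res.fittedvalues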

  27. Robustness
      • Brittle to linearity, independent errors
      • Somewhat brittle to fixed-X
      • Fairly robust to equal variance
      • Quite robust to normality

  28. CATEGORICAL OUTCOMES

  29. One Cat. IV, Cat. DV, independent
      • Contingency tables: how many people are in each combination of categories

  30. Chi-square test of independence
      • H0: the distribution of Var1 is the same at every level of Var2 (and vice versa)
        – The null distribution approaches χ^2 as the sample size grows
        – Heuristic: no cells < 5
        – Can use Fisher's exact test (FET) instead
      • Intuition:
        – Sum over rows/columns: (observed - expected)^2 / expected
        – Expected: marginal % * count in the other margin
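
For illustration, scipy's chi-square test of independence on a hypothetical contingency table (all counts invented), plus Fisher's exact test for a small 2x2 table:

    import numpy as np
    from scipy import stats

    # Hypothetical table: rows = levels of Var1, columns = levels of Var2
    table = np.array([[30, 12, 8],
                      [22, 25, 15]])

    chi2, p, dof, expected = stats.chi2_contingency(table)
    print("chi2 =", chi2, "p =", p)
    print("expected counts:\n", expected)   # check the "no cells < 5" heuristic here

    # Small 2x2 table: Fisher's exact test instead
    odds, p_exact = stats.fisher_exact([[8, 2], [1, 5]])
    print("Fisher exact p =", p_exact)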

  31. Paired 2x2 tables
      • Use McNemar's test
        – Contingency table: matches and mismatches for each option
      • H0: the marginals are the same
                       Cond1: Yes   Cond1: No
        Cond2: Yes     a            b           a + b
        Cond2: No      c            d           c + d
                       a + c        b + d       N
      • Essentially a χ^2 test on the agreement
        – Test statistic: (b - c)^2 / (b + c)
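
A minimal sketch of McNemar's statistic from that formula, with hypothetical mismatch counts b and c:

    from scipy import stats

    # Hypothetical paired yes/no outcomes: b and c are the mismatch cells above
    b, c = 12, 4

    stat = (b - c) ** 2 / (b + c)      # test statistic from the slide
    p = stats.chi2.sf(stat, df=1)      # compare against chi-square with 1 df
    print("McNemar chi2 =", stat, "p =", p)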

  32. Paired, continued
      • Cochran's Q: extended for more than two conditions
      • Other similar extensions for related tasks

  33. Critiques
      • Choose a paper that has one (or more) empirical experiments as a central contribution
        – Doesn't have to be human subjects, but can be
        – Does have to have enough description of the experiment
      • 10-12 minute presentation
      • Briefly: research questions, necessary background
      • Main: describe and critique the methods
        – Experimental design, data collection, analysis
        – Good, bad, ugly, missing
      • Briefly, results?

  34. Logistic regression (logit)
      • Numeric IV, binary DV (or ordinal)
      • log( E(Y) / (1 - E(Y)) ) = log( Pr(Y=1) / Pr(Y=0) ) = b0 + b1*x
      • Log odds of success = linear function
        – Odds: 0 to inf., 1 is the middle
        – e.g.: odds = 5 = 5:1 … five successes for every one failure
        – Log odds: -inf to inf with 0 in the middle: good for regression
      • Modeled as a binomial distribution

  35. Interpreting logistic regression
      • Take exp(coef) to get interpretable odds
      • For each unit increase in x, the odds are multiplied by exp(b1)
        – Note that this can make small coefficients important!
      • Use e.g. the Hosmer-Lemeshow test for goodness of fit
        – null = the data fit the model
        – But not a lot of power!
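
One way to see this, using statsmodels on simulated data (the true coefficients 0.5 and 1.2 are made up): fit the logit model, then exponentiate the coefficients to read them as multiplicative changes in the odds.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    x = rng.normal(size=200)                        # hypothetical numeric IV
    p_true = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))     # invented true log-odds = 0.5 + 1.2*x
    y = rng.binomial(1, p_true)                     # binary DV

    res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    print(res.params)            # coefficients on the log-odds scale
    print(np.exp(res.params))    # per-unit multiplicative change in the odds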

  36. MULTIVARIATE

  37. Multiple regression
      • Linear/logistic regression with more variables!
        – At least one numeric, 0+ categorical
      • Still: fixed x, normal errors with equal variance, independent errors (linear)
      • Linear relationship between E(Y) and each x, when the other inputs are held constant
        – Effects of each x are independent!
      • Still check quantile-normal of residuals, residual vs. fit

  38. Model selection
      • Which covariates to keep? (more on this in a bit)

  39. Adding categorical vars
      • Indicator variables (everything is 0 or 1)
      • Need one fewer indicator than conditions
        – One condition is true; or none are true (baseline)
        – Coefs are *relative to baseline*!
      • Model selection: keep all or none of the indicators for one factor
      • Called “ANCOVA” when there is at least one each of numeric + categorical

  40. Interaction
      • What if your covariates *aren't* independent?
      • E(Y) = b0 + b1*x1 + b2*x2 + b12*x1*x2
        – The slope for x1 is different for each value of x2
      • Superadditive: all in the same direction, interaction makes the effects stronger
      • Subadditive: interaction is in the opposite direction
      • For indicator vars, all or none
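
A sketch of an interaction between a numeric IV and an indicator-coded categorical IV using the statsmodels formula interface; the data, condition names, and slopes below are invented for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n = 120
    df = pd.DataFrame({
        "x": rng.normal(size=n),                           # numeric IV (hypothetical)
        "cond": rng.choice(["control", "treatment"], n),   # categorical IV (hypothetical)
    })
    # Hypothetical data with a different slope per condition (an interaction)
    slope = np.where(df["cond"] == "treatment", 2.0, 0.5)
    df["y"] = 1.0 + slope * df["x"] + rng.normal(scale=1.0, size=n)

    # C(cond) becomes an indicator (treatment vs. the control baseline);
    # x * C(cond) expands to x + C(cond) + their interaction term
    res = smf.ols("y ~ x * C(cond)", data=df).fit()
    print(res.summary())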
