hypothesis testing


  1. INTRODUCTION TO DATA ANALYSIS HYPOTHESIS TESTING PART III

  2. LEARNING GOALS
▸ become able to interpret & apply some statistical tests
  ▸ Pearson's χ²-tests of independence
  ▸ z-test
  ▸ one-sample t-test
  ▸ two-sample t-test
  ▸ one-way ANOVA
▸ understand differences and commonalities of different approaches to frequentist testing
  ▸ Fisher
  ▸ Neyman/Pearson
  ▸ modern hybrid NHST

  3. P-VALUE
p(D_obs) = P( T ⪰_{H₀,a} t(D_obs) | H₀ )

  4. Pearson's χ²-test: goodness of fit

  5. PEARSON'S χ²-TESTS
▸ tests for categorical data (with more than two categories)
▸ two flavors:
  ▸ test of goodness of fit
  ▸ test of independence
▸ sampling distribution is a χ²-distribution

  6. χ²-DISTRIBUTION
▸ standard normal random variables: X₁, …, Xₙ
▸ derived RV: Y = X₁² + … + Xₙ²
▸ it follows (by construction) that: Y ∼ χ²-distribution(n)
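The construction above can be checked by simulation. A minimal Python sketch (not part of the slides; the number of draws is chosen arbitrarily):

```python
import numpy as np

# Build a chi^2(n) variate as a sum of n squared standard-normal draws
# and check its moments against the theoretical values.
rng = np.random.default_rng(0)
n = 5                                  # degrees of freedom
x = rng.standard_normal((100_000, n))  # 100k draws of (X_1, ..., X_n)
y = (x ** 2).sum(axis=1)               # Y = X_1^2 + ... + X_n^2

# A chi^2(n) distribution has mean n and variance 2n.
print(y.mean())   # close to 5
print(y.var())    # close to 10
```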

  7. PEARSON'S χ²-TEST [GOODNESS OF FIT]
Is it conceivable that each category (= pair of music + subject choice) has been selected with the same flat probability of 0.25?

  8. FREQUENTIST MODEL FOR PEARSON'S χ²-TEST [GOODNESS OF FIT]
n⃗ ∼ Multinomial(p⃗, N)
χ² = Σ_{i=1}^{k} (n_i − N p_i)² / (N p_i)
Sampling distribution: χ² ∼ χ²-distribution(k − 1)
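As a sketch of this model in practice, `scipy.stats.chisquare` runs the goodness-of-fit test; its default null is exactly the flat distribution over categories. The counts for the four music + subject categories below are invented for illustration:

```python
from scipy import stats

# Hypothetical counts for the four music+subject categories, N = 100;
# the null puts probability 0.25 on each, i.e. expected counts of 25.
observed = [38, 30, 22, 10]

# scipy.stats.chisquare defaults to equal expected frequencies,
# which is the flat-probability null hypothesis of the slide.
chi2, p = stats.chisquare(observed)
print(chi2)       # 17.12 for these counts
print(p < 0.05)   # True: the flat null is rejected
```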

  9. PEARSON'S χ²-TEST [GOODNESS OF FIT]
n⃗ ∼ Multinomial(p⃗, N)
χ² = Σ_{i=1}^{k} (n_i − N p_i)² / (N p_i)
χ² ∼ χ²-distribution(k − 1)


  12. PEARSON'S χ²-TEST [GOODNESS OF FIT]
How to interpret / report the result:
What about the lecturer's conjecture that (colorfully speaking) logic + metal = 🥱?

  13. Pearson's χ²-test: independence

  14. STOCHASTIC INDEPENDENCE
▸ events A and B are stochastically independent iff
  ▸ intuitively: learning one does not change beliefs about the other;
  ▸ formally: P(A | B) = P(A)
▸ notice that P(A | B) = P(A) entails that P(B | A) = P(B) (see web-book)
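The entailment above can be verified with a tiny numeric check (a Python sketch; the probabilities are invented for illustration):

```python
# With a joint probability built as a product of the marginals,
# both conditional probabilities reduce to the corresponding marginal.
p_A, p_B = 0.3, 0.6
p_AB = p_A * p_B          # independence: P(A and B) = P(A) P(B)

p_A_given_B = p_AB / p_B  # P(A | B)
p_B_given_A = p_AB / p_A  # P(B | A)
print(abs(p_A_given_B - p_A) < 1e-12)  # True: P(A | B) = P(A)
print(abs(p_B_given_A - p_B) < 1e-12)  # True: P(B | A) = P(B)
```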

  15. STOCHASTIC INDEPENDENCE

  16. PEARSON'S χ²-TEST [INDEPENDENCE]
Is it conceivable that the outcome in each cell is given by independent choices of row and column options? Hence: is the probability of choosing a cell the product of the probabilities of the row and column choices?

  17. FREQUENTIST MODEL FOR PEARSON'S χ²-TEST [INDEPENDENCE]
p⃗ = vector of the outer product of r⃗ & c⃗
n⃗ ∼ Multinomial(p⃗, N)
χ² = Σ_{i=1}^{k} (n_i − N p_i)² / (N p_i)
Sampling distribution: χ² ∼ χ²-distribution( (k_r − 1) ⋅ (k_c − 1) )
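Pearson's independence test for a table of counts is available as `scipy.stats.chi2_contingency`; note that for 2×2 tables it applies Yates' continuity correction by default. A sketch with an invented 2×2 table (rows: music choice, columns: subject choice):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table of counts; the numbers are invented.
counts = np.array([[30, 10],
                   [15, 45]])

# Expected frequencies are computed from the row and column marginals,
# i.e. the outer-product null of the slide.
chi2, p, dof, expected = stats.chi2_contingency(counts)
print(dof)        # (k_r - 1) * (k_c - 1) = 1
print(p < 0.05)
```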


  22. FREQUENTIST MODEL FOR PEARSON'S χ²-TEST [INDEPENDENCE]
How to interpret / report the result:

  23. z-test

  24. SCENARIO FOR A z-TEST [ONE-SAMPLE]
▸ metric variable x⃗ with samples from a normal distribution
▸ unknown μ
▸ known σ [usually unrealistic!]
Is it plausible to maintain that this data was generated by a normal distribution with mean 100 (if we assume that the standard deviation is known to be 15)?

  25. FREQUENTIST MODEL FOR A z-TEST [ONE-SAMPLE]
x_i ∼ Normal(μ, σ)
z = (x̄ − μ) / (σ/√N)
Sampling distribution: z ∼ Normal(0, 1)
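A minimal Python sketch of this model (the data are invented; μ = 100 and σ = 15 mirror the scenario slide):

```python
import numpy as np
from scipy import stats

# One-sample z-test with known sigma = 15 and null mean mu = 100.
x = np.array([96, 102, 110, 105, 91, 113, 99, 107])
mu0, sigma = 100, 15

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
p = 2 * stats.norm.sf(abs(z))   # two-sided p-value under Normal(0, 1)
print(z)          # about 0.54 for these data
print(p > 0.05)   # True: no evidence against mu = 100 here
```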

  26. FREQUENTIST Z-TEST [APPLICATION]
x_i ∼ Normal(μ, σ)
z = (x̄ − μ) / (σ/√N)
z ∼ Normal(0, 1)


  28. one-sample t-test

  29. FREQUENTIST T-TEST MODEL [ONE-SAMPLE]
x_i ∼ Normal(μ, σ)
μ̂ = x̄
σ̂ = √( 1/(n − 1) ⋅ Σ_{i=1}^{n} (x_i − μ̂)² )
t = (x̄ − μ₀) / (σ̂/√n)
Sampling distribution: t ∼ Student-t(ν = n − 1)
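A Python sketch of the one-sample t-test (the data are invented): σ is now estimated from the sample, and the manual statistic agrees with `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

# One-sample t-test against null mean mu0 = 100.
x = np.array([96, 102, 110, 105, 91, 113, 99, 107])
mu0 = 100

# ddof=1 gives the (n - 1)-denominator estimator from the slide.
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
t_scipy, p = stats.ttest_1samp(x, popmean=mu0)
print(np.isclose(t_manual, t_scipy))  # True: same statistic
print(p > 0.05)
```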

  30. t-DISTRIBUTION
▸ two random variables: X ∼ Normal(0, 1), Y ∼ χ²-distribution(n)
▸ derived RV: Z = X / √(Y/n)
▸ it follows (by construction) that: Z ∼ Student-t(ν = n)

  31. FREQUENTIST T-TEST [APPLICATION]
x_i ∼ Normal(μ, σ)
σ̂ = √( 1/(n − 1) ⋅ Σ_{i=1}^{n} (x_i − x̄)² )
t = (x̄ − μ₀) / (σ̂/√n)
t ∼ Student-t(ν = n − 1)


  33. two-sample t-test (unpaired data, equal variance & unequal sample size)

  34. COMPARING TWO GROUPS OF METRIC MEASURES Is it plausible to assume that the observed prices for conventional and organic avocados could have been generated by a single normal distribution?

  35. FREQUENTIST T-TEST MODEL [TWO-SAMPLE, UNPAIRED, EQUAL VARIANCE, UNEQUAL SAMPLE SIZES]
x_i^A ∼ Normal(μ + δ, σ)
x_i^B ∼ Normal(μ, σ)
σ̂ = √( ((n_A − 1) σ̂_A² + (n_B − 1) σ̂_B²) / (n_A + n_B − 2) ⋅ (1/n_A + 1/n_B) )
t = ((x̄_A − x̄_B) − δ) ⋅ 1/σ̂
Sampling distribution: t ∼ Student-t(ν = n_A + n_B − 2)
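The pooled-variance test above is what `scipy.stats.ttest_ind` computes with `equal_var=True` (its null difference δ is 0 by default). A sketch with invented avocado-style prices:

```python
import numpy as np
from scipy import stats

# Two samples of prices (invented numbers), unequal sample sizes.
conventional = np.array([1.10, 0.95, 1.20, 1.05, 0.99, 1.15])
organic = np.array([1.55, 1.70, 1.48, 1.62])

# equal_var=True requests the pooled-variance (Student) test
# with nu = n_A + n_B - 2 degrees of freedom.
t, p = stats.ttest_ind(organic, conventional, equal_var=True)
print(t > 0)       # True: organic mean is higher in this sample
print(p < 0.05)
```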

  36. TWO-SAMPLE T-TEST EXAMPLE
x_i^A ∼ Normal(μ + δ, σ)
x_i^B ∼ Normal(μ, σ)
σ̂ = √( ((n_A − 1) σ̂_A² + (n_B − 1) σ̂_B²) / (n_A + n_B − 2) ⋅ (1/n_A + 1/n_B) )
t = ((x̄_A − x̄_B) − δ) ⋅ 1/σ̂
t ∼ Student-t(ν = n_A + n_B − 2)


  38. one-way ANOVA

  39. COMPARING K ≥ 2 GROUPS OF METRIC MEASURES Is it plausible to assume that these measures stem from the same normal distribution?

  40. WHY NOT t-TESTS?
▸ we could run t-tests between different groups
▸ chance of an α-error rises with each comparison
▸ common corrections apply
▸ gets tedious with large k
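The α-error inflation can be quantified under the simplifying assumption of independent tests (a sketch; the group count is invented):

```python
from math import comb

# With k groups there are C(k, 2) pairwise t-tests; assuming the
# tests are independent and each run at alpha = 0.05, the chance of
# at least one false positive grows quickly.
alpha, k = 0.05, 6
m = comb(k, 2)                       # number of pairwise comparisons
familywise = 1 - (1 - alpha) ** m    # P(at least one alpha-error)
print(m)                             # 15 tests for 6 groups
print(round(familywise, 2))          # about 0.54
```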

  41. FREQUENTIST MODEL FOR ANOVA [ONE-WAY]
x_ij ∼ Normal(μ, σ)
σ̂_within = Σ_{j=1}^{k} Σ_{i=1}^{n_j} (x_ij − x̄_j)² / Σ_{i=1}^{k} (n_i − 1)
σ̂_between = Σ_{j=1}^{k} n_j (x̄_j − x̄)² / (k − 1)
F = σ̂_between / σ̂_within
Sampling distribution: F ∼ F-distribution( k − 1, Σ_{i=1}^{k} (n_i − 1) )
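One-way ANOVA is available as `scipy.stats.f_oneway`. A sketch with k = 3 groups of invented measurements:

```python
import numpy as np
from scipy import stats

# Three groups of metric measurements (numbers invented).
g1 = np.array([4.1, 3.9, 4.4, 4.0])
g2 = np.array([4.8, 5.1, 4.9, 5.3, 5.0])
g3 = np.array([3.6, 3.8, 3.5])

# f_oneway returns the F-statistic and its p-value under
# F-distribution(k - 1, sum_i (n_i - 1)).
F, p = stats.f_oneway(g1, g2, g3)
print(F > 1)      # between-group variance exceeds within-group
print(p < 0.05)
```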

  42. F-STATISTIC EXAMPLES

  43. F-DISTRIBUTION
▸ two χ²-distributed random variables: X ∼ χ²-distribution(m), Y ∼ χ²-distribution(n)
▸ derived RV: Z = (X/m) / (Y/n)
▸ it follows (by construction) that: Z ∼ F-distribution(m, n)

  44. EXAMPLE

  45. varieties of frequentist testing

  46. THREE VARIETIES OF FREQUENTIST TESTING
[columns: FISHER | NEYMAN/PEARSON | HYBRID NHST*]
explicit & serious alternative Hₐ: ✗ | ✓ | ✗
when to set up statistical model: after data collection | before data collection | after data collection
goal of statistical analysis: quantify evidence against H₀ | decide action: adopt H₀ or Hₐ | decide action: adopt H₀ or ¬H₀
power calculation: ✗ | ✓ | ✗
* this is a worst-case portrait of modern NHST; this is not how it should be done

  47. NEYMAN/PEARSON APPROACH [INFORMAL GIST]
▸ procedure in N/P approach:
  ▸ fix H₀ and Hₐ (based on prior research)
  ▸ determine desired α- and β-error levels
  ▸ calculate the sample size N necessary for β given α
  ▸ run the experiment
  ▸ determine significance based on the α-level
  ▸ make a dichotomous decision:
    ▸ accept Hₐ if the test is significant
    ▸ accept H₀ otherwise
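The sample-size step of this procedure can be sketched for a one-sided z-test (all numbers invented; this is one standard textbook formula, not code from the slides):

```python
from math import ceil
from scipy import stats

# Plan a one-sided z-test with known sigma: how many observations do
# we need to detect a true shift delta with alpha = 0.05 and beta = 0.2?
alpha, beta = 0.05, 0.20
sigma, delta = 15, 5

z_alpha = stats.norm.ppf(1 - alpha)  # critical quantile for alpha
z_beta = stats.norm.ppf(1 - beta)    # quantile granting power 1 - beta
N = ceil(((z_alpha + z_beta) * sigma / delta) ** 2)
print(N)   # 56
```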

  48. LONG-TERM ERROR CONTROL IN NEYMAN/PEARSON APPROACH
[figure: sampling distributions of the mean under H₀ (null hypothesis) and under Hₐ (alternative hypothesis);
α-error = accept Hₐ when H₀ is true; β-error = accept H₀ when Hₐ is true;
more data = tighter curves = lower β]

  49. EXAMPLES FROM TEXTBOOKS
Neither textbook talks about fixing Hₐ and/or calculating the power of a test.

  50. THREE VARIETIES OF FREQUENTIST TESTING
[columns: FISHER | NEYMAN/PEARSON | HYBRID NHST*]
explicit & serious alternative Hₐ: ✗ | ✓ | ✗
when to set up statistical model: after data collection | before data collection | after data collection
goal of statistical analysis: quantify evidence against H₀ | decide action: adopt H₀ or Hₐ | decide action: adopt H₀ or ¬H₀
power calculation: ✗ | ✓ | ✗
* this is a worst-case portrait of modern NHST; this is not how it should be done
