The Aggregate Association Index
Eric J. Beh
School of Mathematical and Physical Sciences, University of Newcastle, Australia
COMPSTAT 2010, Paris, France – August 24
The 2 x 2 Contingency Table
Cross-classify a sample of size $n$ according to two dichotomous variables.

         Column 1    Column 2    Total
Row 1    $p_{11}$    $p_{12}$    $p_{1\bullet}$
Row 2    $p_{21}$    $p_{22}$    $p_{2\bullet}$
Total    $p_{\bullet 1}$    $p_{\bullet 2}$    1

“Let us blot out the contents of the table, leaving only the marginal frequencies . . . [they] by themselves supply no information on . . . the proportionality of the frequencies in the body of the table . . . ” – Fisher (1935)

Define
$$P_1 = \frac{p_{11}}{p_{1\bullet}}, \qquad X^2(P_1 \mid p_{1\bullet}, p_{\bullet 1}) = n \left( \frac{P_1 - p_{\bullet 1}}{p_{2\bullet}} \right)^2 \frac{p_{1\bullet}\, p_{2\bullet}}{p_{\bullet 1}\, p_{\bullet 2}}$$
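To make the definition concrete, here is a minimal Python sketch of the chi-squared statistic viewed as a function of $P_1$ with the margins held fixed. The function name `chi2_given_p1`, its argument names, and the margins in the usage lines are hypothetical and chosen only for illustration.

```python
def chi2_given_p1(P1, n, p1_row, p_col1):
    """X^2(P1 | p1., p.1) for a 2x2 table whose margins are fixed.

    P1      conditional proportion p11 / p1.
    n       sample size
    p1_row  row-1 marginal proportion p1.
    p_col1  column-1 marginal proportion p.1
    """
    p2_row = 1.0 - p1_row   # p2.
    p_col2 = 1.0 - p_col1   # p.2
    return n * ((P1 - p_col1) / p2_row) ** 2 * (p1_row * p2_row) / (p_col1 * p_col2)


# Hypothetical margins, used only to exercise the function
n, p1_row, p_col1 = 100, 0.5, 0.4
print(chi2_given_p1(0.4, n, p1_row, p_col1))  # 0.0: P1 equal to p.1 means independence
print(chi2_given_p1(0.6, n, p1_row, p_col1))  # grows quadratically away from p.1
```

The statistic is a parabola in $P_1$ with its minimum of zero at $P_1 = p_{\bullet 1}$, which is why the margins alone determine its whole shape.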
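Bounds of $P_1$

Duncan & Davis (1953) bounds:
$$L_1 = \max\left(0, \frac{n_{\bullet 1} - n_{2\bullet}}{n_{1\bullet}}\right) \le P_1 \le \min\left(\frac{n_{\bullet 1}}{n_{1\bullet}}, 1\right) = U_1$$

100(1 – α)% confidence bounds:
$$L_\alpha^{*} = p_{\bullet 1} - p_{2\bullet} \sqrt{\frac{\chi^2_\alpha}{n} \frac{p_{\bullet 1}\, p_{\bullet 2}}{p_{1\bullet}\, p_{2\bullet}}} < P_1 < p_{\bullet 1} + p_{2\bullet} \sqrt{\frac{\chi^2_\alpha}{n} \frac{p_{\bullet 1}\, p_{\bullet 2}}{p_{1\bullet}\, p_{2\bullet}}} = U_\alpha^{*}$$
$$L_\alpha = \max\left(0, L_\alpha^{*}\right) < P_1 < \min\left(1, U_\alpha^{*}\right) = U_\alpha$$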
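Both pairs of bounds can be computed from the margins alone. The sketch below is one possible implementation, not the author's code: the function names are hypothetical, and the critical value $\chi^2_\alpha$ (1 degree of freedom) is obtained as the square of a standard normal quantile. The usage lines take $n = 30$ with row totals 13 and 17 and column totals 12 and 18, the margins that appear in the twin example of these slides.

```python
import math
from statistics import NormalDist


def duncan_davis_bounds(n1_row, n2_row, n_col1):
    """Duncan & Davis (1953) bounds L1 <= P1 <= U1 from marginal counts."""
    L1 = max(0.0, (n_col1 - n2_row) / n1_row)
    U1 = min(n_col1 / n1_row, 1.0)
    return L1, U1


def confidence_bounds(n, p1_row, p_col1, alpha=0.05):
    """100(1 - alpha)% bounds L_alpha < P1 < U_alpha, clipped to [0, 1]."""
    p2_row, p_col2 = 1.0 - p1_row, 1.0 - p_col1
    # Chi-squared (1 df) critical value = squared standard normal quantile
    chi2_alpha = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    half_width = p2_row * math.sqrt(
        chi2_alpha / n * (p_col1 * p_col2) / (p1_row * p2_row)
    )
    return max(0.0, p_col1 - half_width), min(1.0, p_col1 + half_width)


L1, U1 = duncan_davis_bounds(n1_row=13, n2_row=17, n_col1=12)
La, Ua = confidence_bounds(n=30, p1_row=13 / 30, p_col1=12 / 30)
print(f"L1 = {L1:.4f}, U1 = {U1:.4f}")
print(f"L_alpha = {La:.4f}, U_alpha = {Ua:.4f}")
```

Values of $P_1$ between $L_\alpha$ and $U_\alpha$ are those for which the chi-squared statistic stays below the critical value; values outside that interval but inside $[L_1, U_1]$ correspond to a statistically significant association.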
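Aggregate Association Index (AAI)

[Figure: the chi-squared statistic $X^2(P_1)$ plotted against $P_1$ on $[0, 1]$, with a horizontal line at the critical value $\chi^2_\alpha$ and the points $L_1$, $L_\alpha$, $p_{\bullet 1}$, $U_\alpha$, $U_1$ marked on the horizontal axis. The region under the curve and above the line is labelled "Statistically significant association".]

If the area under $X^2(P_1)$ but above $\chi^2_\alpha$ is large, then there may be evidence to suggest that there is a significant association (at the α level of significance) between the two dichotomous variables.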
Aggregate Association Index (AAI)

$$A_\alpha = 100 \left( 1 - \frac{\chi^2_\alpha \left[ (L_\alpha - L_1) + (U_1 - U_\alpha) \right] + \int_{L_\alpha}^{U_\alpha} X^2(P_1 \mid p_{1\bullet}, p_{\bullet 1}) \, dP_1}{\int_{L_1}^{U_1} X^2(P_1 \mid p_{1\bullet}, p_{\bullet 1}) \, dP_1} \right)$$
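The index can be evaluated without numerical integration, because $X^2(P_1)$ is a quadratic in $P_1$ and its integral has a closed form. The following Python sketch is an illustration under stated assumptions rather than a reference implementation: the function name `aai` is hypothetical, and $L_\alpha$ and $U_\alpha$ are clipped to $[L_1, U_1]$ (an assumption made here so that the formula remains an area ratio when the confidence bounds fall outside the Duncan & Davis bounds).

```python
import math
from statistics import NormalDist


def aai(n, n1_row, n_col1, alpha=0.05):
    """Aggregate association index A_alpha (in percent) from the margins only."""
    n2_row, n_col2 = n - n1_row, n - n_col1
    p1_row, p2_row = n1_row / n, n2_row / n
    p_col1, p_col2 = n_col1 / n, n_col2 / n

    # Duncan & Davis bounds on P1
    L1 = max(0.0, (n_col1 - n2_row) / n1_row)
    U1 = min(n_col1 / n1_row, 1.0)

    # Chi-squared (1 df) critical value = squared standard normal quantile
    chi2_alpha = NormalDist().inv_cdf(1 - alpha / 2) ** 2

    # Confidence bounds, clipped to [L1, U1] (assumption, see the text above)
    half_width = p2_row * math.sqrt(
        chi2_alpha / n * (p_col1 * p_col2) / (p1_row * p2_row)
    )
    La = max(L1, p_col1 - half_width)
    Ua = min(U1, p_col1 + half_width)

    # X^2(P1) = c * (P1 - p.1)^2, so the integral from a to b is closed-form
    c = n * p1_row / (p2_row * p_col1 * p_col2)

    def integral(a, b):
        return c * ((b - p_col1) ** 3 - (a - p_col1) ** 3) / 3

    numerator = chi2_alpha * ((La - L1) + (U1 - Ua)) + integral(La, Ua)
    return 100 * (1 - numerator / integral(L1, U1))


# Margins of the twin example in these slides: n = 30, row 1 total 13, column 1 total 12
print(f"A_0.05 = {aai(30, 13, 12):.2f}")
```

With these margins the sketch should agree, to two decimals, with the value $A_{0.05} = 61.83$ reported on the slides.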
Example – Fisher’s Twin Data
Fisher's data concern 30 criminal twins, classified according to whether they are monozygotic or dizygotic. The table also classifies whether their same-sex twin has been convicted of a criminal offence.
The Pearson chi-squared statistic is 13.032 (p-value = 0.0003), so there is evidence of a strong association between the two variables.
The product moment correlation is 0.6591, indicating a positive association.
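The three summary statistics can be checked with a few lines of Python. The cell counts are not given in the text of these slides; the values used below (10 and 3 for monozygotic twins, 2 and 15 for dizygotic twins, convicted and not convicted respectively) are the commonly cited counts for Fisher's twin data and are an assumption here. The function name `pearson_2x2` is hypothetical.

```python
import math


def pearson_2x2(a, b, c, d):
    """Pearson chi-squared, its p-value (1 df), and the product moment
    correlation for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # Upper tail of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    corr = (a * d - b * c) / math.sqrt(r1 * r2 * c1 * c2)
    return chi2, p_value, corr


# Assumed cell counts (not stated in the slides' text)
chi2, p_value, corr = pearson_2x2(10, 3, 2, 15)
print(f"chi-squared = {chi2:.3f}, p-value = {p_value:.4f}, correlation = {corr:.4f}")
```

These counts give margins of 13, 17, 12 and 18, and they reproduce the statistic, p-value and correlation quoted above.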
Example – Fisher’s Twin Data
But, as Fisher (1935) did, suppose we “blot out” the cells of the table.
Question: What information do the margins provide about the extent to which the variables are associated?
We shall calculate the aggregate association index.
Example – Fisher’s Twin Data
$$A_{0.05} = 61.83$$
At the 5% level of significance, the margins provide strong evidence that there may exist a significant association between twin type and conviction status, where
$$X^2(P_1) = 30 \cdot \frac{221}{216} \left( \frac{30 P_1 - 12}{17} \right)^2, \qquad 0 \le P_1 \le 0.9231$$
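Direction of the Association
$$A_\alpha = A_\alpha^{+} + A_\alpha^{-}$$
Here $A_\alpha^{+}$ and $A_\alpha^{-}$ are the parts of the index corresponding to a positive and a negative association, respectively.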
Fisher’s Twin Data (. . . revisited)
$$A_{0.05} = 61.83, \qquad A_{0.05}^{+} = 46.43, \qquad A_{0.05}^{-} = 15.40$$
Therefore, based solely on the marginal information, we can determine that the variables are three times more likely to be positively associated than negatively associated.
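The split can be sketched in Python. The slides do not spell out formulas for the two components, so the sketch assumes that $A_\alpha^{+}$ is the area under $X^2(P_1)$ and above $\chi^2_\alpha$ to the right of $U_\alpha$ (where $P_1 > p_{\bullet 1}$), and $A_\alpha^{-}$ the corresponding area to the left of $L_\alpha$, each as a percentage of the total area under the curve on $[L_1, U_1]$. The function name `directional_aai` is hypothetical.

```python
import math
from statistics import NormalDist


def directional_aai(n, n1_row, n_col1, alpha=0.05):
    """Return (A_plus, A_minus) in percent, using the margins only."""
    n2_row = n - n1_row
    p1_row, p2_row = n1_row / n, n2_row / n
    p_col1 = n_col1 / n
    p_col2 = 1.0 - p_col1

    # Duncan & Davis bounds and clipped confidence bounds on P1
    L1 = max(0.0, (n_col1 - n2_row) / n1_row)
    U1 = min(n_col1 / n1_row, 1.0)
    chi2_alpha = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    half_width = p2_row * math.sqrt(
        chi2_alpha / n * (p_col1 * p_col2) / (p1_row * p2_row)
    )
    La = max(L1, p_col1 - half_width)
    Ua = min(U1, p_col1 + half_width)

    # Closed-form integral of X^2(P1) = c * (P1 - p.1)^2
    c = n * p1_row / (p2_row * p_col1 * p_col2)

    def integral(a, b):
        return c * ((b - p_col1) ** 3 - (a - p_col1) ** 3) / 3

    total = integral(L1, U1)
    # Area above the critical value on each side of the independence point p.1
    plus = integral(Ua, U1) - chi2_alpha * (U1 - Ua)
    minus = integral(L1, La) - chi2_alpha * (La - L1)
    return 100 * plus / total, 100 * minus / total


a_plus, a_minus = directional_aai(30, 13, 12)
print(f"A+ = {a_plus:.2f}, A- = {a_minus:.2f}, ratio = {a_plus / a_minus:.2f}")
```

Under this reading the two components should match the reported 46.43 and 15.40 to two decimals, and their ratio of about 3 is the basis of the "three times more likely" statement.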
Discussion
The index provides an indication of the extent to which two dichotomous variables are statistically significantly associated, given only the marginal information.
The index is not meant to infer the individual-level correlation of the variables, but to provide a measure reflecting how likely it is that the two variables are associated.
Further issues:
Investigate the applicability of the index for G (>1) 2x2 tables, including incorporating covariate information (ecological inference).
Links with the correspondence analysis of aggregate data.
Link with Fisher’s exact test.