Psych 201ab: Quantitative methods
L.12 Linear model: Categorical predictors
UCSD Psychology
Overly specific named procedures
Response | ~null | ~binary | ~category | ~numerical | ~numerical + category
Numerical | 1-sample t-test | 2-sample t-test | ANOVA | Regression, Pearson correlation | ANCOVA
Ranked-numerical | | Mann-Whitney U | Kruskal-Wallis | Spearman correlation |
2-category | Binomial test | Fisher’s exact test | Chi-sq. independence | Logistic regression | Logistic regression
k-category | Chi-sq. goodness of fit | Chi-squared independence | Chi-squared independence | |
Conceptually correct, but some restrictions apply.
Overly specific named procedures, with their lm() / glm() equivalents
Response | ~null | ~binary | ~category | ~numerical | ~numerical + category
Numerical | 1-sample t-test: lm(y~1) | 2-sample t-test: lm(y~f) | ANOVA: lm(y~f) | Regression, Pearson correlation: lm(y~x) | ANCOVA: lm(y~x+f)
Ranked-numerical | | Mann-Whitney U: ≈ lm(rank(y)~f) | Kruskal-Wallis: ≈ lm(rank(y)~f) | Spearman correlation: ≈ lm(rank(y)~rank(x)) |
2-category | Binomial test | Fisher’s exact test | Chi-sq. independence | Logistic regression: glm(y~…, family=binomial()) | Logistic regression: glm(y~…, family=binomial())
k-category | Chi-sq. goodness of fit | Chi-squared independence: ≈ glm(y~…, family=poisson()) | Chi-squared independence: ≈ glm(y~…, family=poisson()) | |
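As a small illustration of one cell in the table, the sketch below compares Spearman's correlation with a linear model fit to ranks. The data are simulated and the variable names (`x`, `y`, `rho`, `slope`) are hypothetical, not from the lecture. With no ties, the ranks of `x` and `y` have the same variance, so the slope of the rank regression equals Spearman's rho; the p-values need not match exactly, which is presumably why the table marks these equivalences as approximate.

```r
# Simulated data (hypothetical; not the lecture's data set).
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Named procedure: Spearman correlation.
rho <- cor(x, y, method = "spearman")

# Linear-model version: regress the ranks of y on the ranks of x.
fit <- lm(rank(y) ~ rank(x))
slope <- unname(coef(fit)[2])

# With continuous data (no ties) the two numbers agree.
print(c(spearman = rho, lm_slope = slope))
```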
GLM: Categorical predictors (factors)
• Why?
• Making it go in R.
  – Data representation for categorical variable
  – lm() implementation
• What is it actually doing?
  – Different perspectives on categorical predictors
  – Predictors / design matrix in LM.
  – Coding categories into design matrix.
• Variations that require extensions of LM
  – Unequal variance t-test or ANOVA
  – Repeated measures and other random effects / correlated error structures.
Why categorical predictors?
• Does mean y differ between… (predictor is treated as a dichotomous / binary categorical variable)
  – Treatment and control?
  – Males and females?
  – Dogs and cats?
• Does mean y vary among… (predictor is treated as a categorical variable)
  – Drug types?
  – Ethnicities? Religions? Etc.
  – Dog breeds?
Do the groups have different means?
• If we have 1 group and a point null for the mean, we test the intercept: lm(y~1), a “one-sample t-test”.
• If we have 2 groups and a null of equal means, we test the difference coefficient: lm(y~f), a “2-sample t-test”.
• If we have 3+ groups and a null of equal means, we run an ANOVA: lm(y~f), an “analysis of variance”.
  – Lots of t-tests between pairs of groups are impractical and don’t answer the right question.
  – Instead we test the variance of means across groups: this is the “analysis of variance”.
Three ways to think about factors
• Cell organization: Common formulation for doing ANOVA calculation by hand. We avoid hand calculations, but this formulation helps understand what we are estimating.
• Tidy data frame/table: How we will see our data.
Categorical predictors in R
Categorical predictors in R: 1-sample t-test
• Does the mean of a group differ from some null mean?
• E.g., does the mean level of conscientiousness deviate from random responses?
  – 10 items (1-5 Likert scale), 6 positively coded, 4 negatively coded.
  – Mean expected from random responding: 6 (3*6 - 3*4)
• Why is this wrong?
• Two routes: via lm() and via the t-test function.
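The two routes can be sketched as follows, assuming simulated scores in a hypothetical vector `y` (the lecture's data are not reproduced here). One thing to watch when going through lm(): the intercept test in `lm(y ~ 1)` has a null of 0, so to test against the null mean of 6 the response is shifted by 6 first.

```r
# Simulated conscientiousness scores (hypothetical; not the lecture's data).
set.seed(2)
y <- rnorm(40, mean = 9, sd = 5)
null_mean <- 6  # 3*6 - 3*4, the random-responding mean from the slide

# Via the t-test function: mu sets the null mean.
tt <- t.test(y, mu = null_mean)

# Via lm(): the intercept is tested against 0, so subtract the null mean first.
fit <- lm(I(y - null_mean) ~ 1)
lm_row <- summary(fit)$coefficients["(Intercept)", ]

# Same t statistic and p-value from both routes.
print(c(t_test = unname(tt$statistic), lm = unname(lm_row["t value"])))
print(c(t_test = tt$p.value, lm = unname(lm_row["Pr(>|t|)"])))
```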
Categorical predictors in R: 2-sample t-test
• Do the two groups have the same mean?
• E.g., does the mean level of conscientiousness differ between males and females?
• Two routes: via the t-test function and via lm().
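A minimal sketch of both routes, using a simulated data frame `d` whose column names (`gender`, `consc`) are hypothetical. Note that `t.test()` defaults to the unequal-variance (Welch) test; `var.equal = TRUE` is needed to get the pooled-variance test that matches `lm()`.

```r
# Simulated data (hypothetical; not the lecture's data set).
set.seed(3)
d <- data.frame(
  gender = factor(rep(c("female", "male"), each = 30)),
  consc  = c(rnorm(30, mean = 10, sd = 5), rnorm(30, mean = 8, sd = 5))
)

# Via the t-test function, with pooled variance to match lm().
tt <- t.test(consc ~ gender, data = d, var.equal = TRUE)

# Via lm(): the "gendermale" coefficient is mean(male) - mean(female).
fit <- lm(consc ~ gender, data = d)
lm_row <- summary(fit)$coefficients["gendermale", ]

# t.test reports female - male, lm reports male - female, so the t values
# differ only in sign; the p-values are identical.
print(c(t_test = unname(tt$statistic), lm = unname(lm_row["t value"])))
print(c(t_test = tt$p.value, lm = unname(lm_row["Pr(>|t|)"])))
```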
Categorical predictors in R: one-way ANOVA
• Do the groups have the same mean? I.e., is there non-zero variance across group means?
• E.g., does the mean level of conscientiousness differ among religions?
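One way this might look in R, assuming a simulated data frame `d` with a four-level factor (the level labels "A" to "D" and the column names are placeholders, not the lecture's data). The F-test from `anova()` on the `lm()` fit is the same test that `aov()` reports.

```r
# Simulated data (hypothetical; group labels are placeholders).
set.seed(4)
d <- data.frame(
  religion = factor(rep(c("A", "B", "C", "D"), each = 25)),
  consc    = rnorm(100, mean = rep(c(8, 9, 10, 9), each = 25), sd = 5)
)

# Fit the linear model with a categorical predictor.
fit <- lm(consc ~ religion, data = d)

# F-test of the null that all group means are equal.
tab <- anova(fit)
print(tab)

# The same table via aov().
aov_tab <- summary(aov(consc ~ religion, data = d))[[1]]

f_lm  <- tab[["F value"]][1]
f_aov <- aov_tab[["F value"]][1]
print(c(lm = f_lm, aov = f_aov))
```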
Categorical predictors in R: two-way ANOVA
• Does the mean vary across either/both factors? Consistently?
• E.g., does mean conscientiousness vary among religion, gender?
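A sketch under stated assumptions: the data are simulated, the design is balanced, and the names (`d`, `fit_add`, `fit_int`) are hypothetical. It reads "Consistently?" as asking whether the effect of one factor is the same at every level of the other, i.e. whether an interaction term is needed; that reading is an interpretation, not something the slide states.

```r
# Simulated balanced design (hypothetical; not the lecture's data set).
set.seed(5)
d <- expand.grid(
  rep      = 1:20,
  religion = c("A", "B", "C"),
  gender   = c("female", "male")
)
d$consc <- rnorm(nrow(d), mean = 9, sd = 5)

# Main effects only: each factor shifts the mean additively.
fit_add <- lm(consc ~ religion + gender, data = d)

# Main effects plus the religion:gender interaction.
fit_int <- lm(consc ~ religion * gender, data = d)

# Sequential F-tests for religion, gender, and their interaction.
tab <- anova(fit_int)
print(tab)

# Model comparison: does adding the interaction improve the fit?
cmp <- anova(fit_add, fit_int)
print(cmp)
```

With unbalanced data the sequential tests in `anova(fit_int)` depend on the order of the terms, so the balanced design here is a simplifying assumption.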
Three ways to think about factors
• Cell organization: Common formulation for doing ANOVA calculation by hand. We avoid hand calculations, but this formulation helps understand what we are estimating.
• Tidy data frame/table: How we will see our data.
• Matrix notation: How statistical software represents our data to do the analysis. Makes it easier to think about coding schemes.
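The three views can be put side by side on a toy data set. Everything here (the data frame `d`, the factor levels "a", "b", "c", the y values) is made up for illustration. The design matrix shown is what R builds under its default treatment (dummy) coding.

```r
# Toy data (hypothetical): 3 groups, 2 observations each.
d <- data.frame(
  f = factor(c("a", "a", "b", "b", "c", "c")),
  y = c(1, 3, 4, 6, 8, 10)
)

# 1. Cell organization: one vector of y values per group.
cells <- split(d$y, d$f)
cell_means <- sapply(cells, mean)
print(cell_means)

# 2. Tidy data frame: one row per observation, one column per variable.
print(d)

# 3. Matrix notation: an intercept column plus one indicator column
#    for each non-reference level ("fb", "fc").
X <- model.matrix(~ f, data = d)
print(X)

# Under this coding the intercept is the mean of the reference group "a",
# and the other coefficients are differences from that mean.
fit <- lm(y ~ f, data = d)
print(coef(fit))
```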
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
[Figure: the response plane $\hat{Y}_i \equiv \mu_{Y \mid X_{1i}, X_{2i}}$ drawn over the $(X_1, X_2)$ plane, showing the intercept $\beta_0$, the slopes $\beta_1$ and $\beta_2$, and the residual $\varepsilon_i$ for the observation at $(X_{1i}, X_{2i})$. From Julian Parris.]
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{21} \\
1 & x_{12} & x_{22} \\
1 & x_{13} & x_{23} \\
\vdots & \vdots & \vdots \\
1 & x_{1i} & x_{2i} \\
\vdots & \vdots & \vdots \\
1 & x_{1n} & x_{2n}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

• All the y data points in a single vector.
• All of the x predictors in one matrix (constant 1 for the intercept: sometimes called X0).
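To make the matrix form concrete, the sketch below builds the matrix of predictors by hand and recovers the coefficients with the least-squares normal equations, then compares them with `lm()`. The data and the "true" coefficients are simulated and hypothetical. The normal equations are used here only for transparency; `lm()` itself uses a QR decomposition.

```r
# Simulated data with hypothetical true coefficients (1, 2, -0.5).
set.seed(6)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Predictor matrix: a column of 1s for the intercept (X0), then x1 and x2.
X <- cbind(1, x1, x2)

# Least-squares estimate: solve (X'X) b = X'y.
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Residuals are what is left after subtracting X %*% beta_hat from y.
eps_hat <- y - X %*% beta_hat

# lm() builds the same matrix from the formula and gives the same estimates.
fit <- lm(y ~ x1 + x2)
print(cbind(by_hand = drop(beta_hat), lm = coef(fit)))
```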