Contrast Coding Or: One of These Levels is Not Like the Others Scott Fraundorf (and Tuan Lam) MLM Reading Group – 03.10.11
Administrivia ● 3/10 (TODAY): Contrast coding overview ● 4/7: Simple vs main effects ● 4/21: Principal components analysis ● 1st week of May: Harald Baayen visit
Outline ● Why use contrast coding? ● Example contrasts ● Contrast estimates ● Contrasts in R ● Multiple comparisons ● How does it work? ● Other kinds of coding ● Interactions
Why Use Contrast Coding? ● Scott's example study: Examining recall memory for spoken discourse as a function of: – Location of disfluencies (categorical variable) – Prior story knowledge (continuous variable) ● [Diagram: regression equation with terms LOCATION OF DISFLUENCY, PRIOR KNOWLEDGE, SUBJECT, and ITEM]
Why Use Contrast Coding? ● Regression equation: Predicts values – Could use this to predict whether or not something will be remembered ● [Diagram: regression equation with terms LOCATION OF DISFLUENCY, PRIOR KNOWLEDGE, SUBJECT, and ITEM] ● But in cognitive psych: – Often interested in the effect of specific levels – Test which ones differ significantly
Contrast Coding ● Example: Fluent vs. disfluencies in typical locations vs. in atypical locations ● Which ones differ significantly? ● [Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions]
Contrast Coding ● Contrasts: Test differences between specific levels – Same as a planned comparison in an ANOVA – Also analogous to a post-hoc test ● Planned comparisons vs post-hoc tests – If we are deciding tests post-hoc, greater chance of capitalizing on chance / spurious effect – Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards – We are basically on the honor system here—no way to prove the comparison was planned ahead of time
Contrasts! ● Contrasts are like weighted sums of means – In a multiple regression / MLM context, the estimates also depend on the other variables in the model ● Using your scale to test what's different
Contrast Coding ● It looks like the Fluent stories might not be remembered as well. Let's use a contrast to test this. ● [Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions]
Contrasts TYPICAL FLUENT ATYPICAL Question 1: Do disfluencies affect recall?
Contrasts ● Contrast weights are assigned: Typical = .33, Atypical = .33, Fluent = -.66 ● One side positive. One side negative. – This determines which levels are being compared (+ versus -) – Doesn't really matter which side you choose as the + side. It just affects the sign of the result, but not its magnitude or statistical significance
Contrasts ● Contrast weights are assigned: Typical = .33, Atypical = .33, Fluent = -.66 ● One side positive. One side negative. ● Codes add up to zero. ● Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.) – abs(.33) + abs(-.66) ≈ 1 (exactly 1 with unrounded weights of 1/3 and -2/3)
Contrasts ● Weights: Typical = .33, Atypical = .33, Fluent = -.66 ● One side positive. One side negative. Codes add up to zero. ● Does the contrast differ significantly from zero? If so, the difference between levels is significant. ● Can conceptualize the comparison as: Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent) (holding other variables constant)
Contrasts ● Weights: Typical = .33, Atypical = .33, Fluent = -.66 ● Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent), marked * (significant)
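To make the "weighted sum of means" idea concrete, here is a minimal R sketch of Contrast 1. The cell means are made-up illustrative numbers, not the study's data, and the variable names are hypothetical. In a fitted model the contrast is also adjusted for the other predictors, which this sketch ignores.

```r
# Hypothetical cell means (proportion of story recalled); NOT the study's data
cell_means <- c(Typical = 0.70, Atypical = 0.66, Fluent = 0.60)

# Contrast 1 weights: disfluent conditions (+) versus fluent (-)
w1 <- c(Typical = 0.33, Atypical = 0.33, Fluent = -0.66)

# The weights sum to zero, so the contrast is zero when all means are equal
weight_sum <- sum(w1)

# Value of the contrast: a weighted sum of the condition means
contrast_value <- sum(w1 * cell_means)
contrast_value
```

A contrast value far from zero (relative to its standard error) is what the significance test picks up; flipping the signs of all weights would flip only the sign of `contrast_value`.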
Contrast Coding ● Our first contrast reveals that fluent stories are remembered worse (*). ● Now let's look at Typical vs Atypical ● We always have j – 1 contrasts, where j = the # of levels of the factor – So, here 2 contrasts are needed to fully describe the factor ● [Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions, annotated with *]
Contrasts TYPICAL ATYPICAL Question 2: Does location of disfluencies matter?
Contrasts ● Weights: Typical = .50, Atypical = -.50, Fluent = 0 (zeroed out here!) ● One side positive. One side negative. Codes add up to zero. Sum of absolute values of codes is 1. ● Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)
Contrast Coding ● [Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions, annotated with * and n.s.]
One Important Point! ● Choice of contrasts doesn't affect the total variance accounted for by the variable ● Only about differences between levels ● Can divide this up in multiple different ways and still account for the same total variance ● [Diagram: LOCATION IN STORY]
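The point about total variance can be checked with a small simulation. This is a sketch under stated assumptions: the data are simulated, the column names `location` and `recall` are hypothetical, and plain `lm()` is used instead of a mixed model to keep it self-contained.

```r
set.seed(1)  # simulated data only; not the study's data
d <- data.frame(
  location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 20),
                    levels = c("Typical", "Atypical", "Fluent"))
)
d$recall <- 0.6 + 0.08 * (d$location != "Fluent") + rnorm(nrow(d), sd = 0.1)

# Coding A: the two contrasts from the slides
dA <- d
contrasts(dA$location) <- cbind(c(.33, .33, -.66), c(.50, -.50, 0))

# Coding B: R's default treatment (dummy) coding
dB <- d
contrasts(dB$location) <- contr.treatment(3)

# Same factor, different contrasts: variance explained should match
r2_A <- summary(lm(recall ~ location, data = dA))$r.squared
r2_B <- summary(lm(recall ~ location, data = dB))$r.squared
c(r2_A, r2_B)
```

The individual coefficients differ between the two fits because they answer different questions, but the overall fit of the factor is the same.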
Why -.5 and .5? ● Why [-.5 .5] instead of [-1 1]? ● Doesn't affect significance test ● Does affect β weight (estimate) – Std error is also scaled accordingly FILLER LOCATION: [-.5 .5] FILLER LOCATION: [-1 1]
Contrast Estimates ● Contrast codes: Atypical location = .5, Typical location = -.5 (the codes differ by 1) ● Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant ● In this case, a 1-unit change in the contrast IS the difference between the levels' codes ● Thus, the estimate correctly represents .04825 as the difference between the conditions
Contrast Estimates ● Contrast codes: Atypical location = 1, Typical location = -1 ● Here, the total difference between the levels' codes is 2 ● So, a 1-unit change in the contrast is only HALF the difference between the levels' codes ● Thus, the estimate of the contrast is .024 … only half the difference between the conditions
Contrast Estimates ● Beta weight (estimate) represents the effect of a 1-unit change in the contrast ● Codes [-.5 .5] (Typical = -.5, Atypical = .5; codes differ by 1): 1 unit change in contrast IS the difference between levels (.04825 in this case) ● Codes [-1 1] (Typical = -1, Atypical = 1; codes differ by 2): 1 unit change in contrast IS only half the difference between levels
So Why -.5 and .5? ● The estimate tells you directly about the difference in means! – The actual difference between conditions is .048 – It would be perfectly correct to describe .024 as half the difference between levels, and you could even put a CI around it … it's just less intuitive for your readers ● Both contrasts would account for the same amount of variance ● This is just another case of deciding the scale of a variable – Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
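The scaling argument can be illustrated with simulated two-level data. This is only a sketch: the data, the column names, and the use of `lm()` rather than a mixed model are all assumptions made for illustration.

```r
set.seed(2)  # simulated data only; not the study's data
d <- data.frame(
  location = factor(rep(c("Typical", "Atypical"), each = 30),
                    levels = c("Typical", "Atypical"))
)
d$recall <- 0.65 + 0.05 * (d$location == "Atypical") + rnorm(nrow(d), sd = 0.1)

# Same data, two codings of the same comparison
d_half <- d
contrasts(d_half$location) <- c(-.5, .5)
d_one <- d
contrasts(d_one$location) <- c(-1, 1)

fit_half <- summary(lm(recall ~ location, data = d_half))$coefficients
fit_one  <- summary(lm(recall ~ location, data = d_one))$coefficients

# Row 2 is the contrast: Estimate, Std. Error, t value, p value
fit_half[2, ]
fit_one[2, ]

# Raw difference in condition means, for comparison with the estimates
means <- tapply(d$recall, d$location, mean)
mean_diff <- means[["Atypical"]] - means[["Typical"]]
mean_diff
```

With [-.5 .5] the estimate matches the raw difference in means; with [-1 1] both the estimate and its standard error are halved, so the t value is unchanged.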
Imbalanced Designs ● You may have an unequal number of observations per cell – e.g. some data lost, or responses not codable ● Correct for this in your contrast codes if you want things centered – Ask Tuan or Scott about how to do this :)
Contrasts in R ● To check what the current contrasts are: – contrasts(YourDataFrame$VariableName) ● To set the contrasts: – contrasts(YourDataFrame$VariableName) = cbind(c(.33,.33,-.66),c(.50,-.50,0)) ● Each c(xx,yy,zz) is the weights for one of the contrasts you want to run ● e.g. (.33, .33, -.66) is one contrast ● After setting contrasts, run lmer model to get the results of the contrasts
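Putting the steps together, a minimal end-to-end sketch might look like the following. The data are simulated and the names `d`, `location`, and `recall` are hypothetical. `lm()` stands in for `lmer()` so that no extra packages are needed; the contrast setup is the same either way. The level order is set explicitly because R applies the weights in the order given by `levels()`, which defaults to alphabetical.

```r
set.seed(3)  # simulated data; data frame and column names are hypothetical
d <- data.frame(
  location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 20),
                    levels = c("Typical", "Atypical", "Fluent"))
)
d$recall <- 0.6 + 0.08 * (d$location != "Fluent") + rnorm(nrow(d), sd = 0.1)

# Weights are matched to levels in this order, so check it first
levels(d$location)

# Set the two contrasts; column names label the coefficients in the output
contrasts(d$location) <- cbind(DisfluentVsFluent = c(.33, .33, -.66),
                               TypicalVsAtypical = c(.50, -.50, 0))
contrasts(d$location)

# Fit the model; lm() is used here in place of lmer() (an assumption)
fit <- lm(recall ~ location, data = d)
coef_names <- rownames(summary(fit)$coefficients)
coef_names
```

Naming the columns of the contrast matrix is optional, but it makes the coefficient table easier to read than the default numbered labels.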
Contrasts in R ● Should have j – 1 contrasts, where j = # of levels of the factor ● If using a subset of data, some levels of the factor may no longer be present – e.g. you dropped a condition – But, R still “remembers” that these levels exist and will get mad you didn't specify enough contrasts – Fix this by reconverting to a factor: ● YourDataFrame$Variable = factor(YourDataFrame$Variable)
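The "remembered levels" problem is easy to reproduce. The sketch below uses a tiny made-up data frame (hypothetical names) to show the level count before and after reconverting to a factor. Base R's `droplevels()` is another way to get the same result.

```r
# Tiny made-up data frame with a 3-level factor
d <- data.frame(
  location = factor(c("Typical", "Atypical", "Fluent",
                      "Typical", "Atypical", "Fluent"))
)

# Drop one condition by subsetting
sub <- d[d$location != "Fluent", , drop = FALSE]

# R still remembers the dropped level
n_before <- nlevels(sub$location)

# Reconvert to a factor so only the levels actually present remain
sub$location <- factor(sub$location)
n_after <- nlevels(sub$location)

# Now a single contrast (j - 1 = 1) is the right number to specify
contrasts(sub$location) <- c(-.5, .5)
contrasts(sub$location)
```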
Another R Tip ● To see the mean of each level of an I.V.: – tapply(YourDataFrame$DVName, YourDataFrame$IVName,mean) – Could also do median, sd, etc. ● For a 2-way (or more!) table – tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean) ● Doesn't work if you have missing values – But Tuan has made a version of tapply that fixes this problem
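As a small illustration of these calls, the sketch below uses a made-up data frame with one missing value (all names and numbers are hypothetical). It also shows one base-R way to handle missing values, passing `na.rm = TRUE` through `tapply()` to `mean()`. This is not the custom version of `tapply` mentioned above, just a standard alternative.

```r
# Made-up data with one missing recall score
d <- data.frame(
  recall   = c(0.70, 0.60, NA, 0.50, 0.80, 0.40),
  location = c("Typical", "Typical", "Atypical", "Atypical", "Fluent", "Fluent"),
  age      = c("Younger", "Older", "Younger", "Older", "Younger", "Older")
)

# Default: any cell containing a missing value comes back as NA
m_default <- tapply(d$recall, d$location, mean)

# Extra arguments are passed on to mean(), so na.rm = TRUE skips the NA
m_narm <- tapply(d$recall, d$location, mean, na.rm = TRUE)

# Two-way table of means: location by age group
m_2way <- tapply(d$recall, list(d$location, d$age), mean, na.rm = TRUE)

m_default
m_narm
m_2way
```

Note that a cell whose observations are all missing still has no usable mean even with `na.rm = TRUE`.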
Multiple Comparisons (Here Comes Trouble!)
Multiple Comparisons ● Lots of comparisons you can run ● Suppose we tested both young & older adults on the disfluency task: ● [Diagram: 2 × 3 grid of cells: Atypical / Younger, Fluent / Younger, Typical / Younger; Atypical / Older, Fluent / Older, Typical / Older]