Individual Differences & Item Effects: How to test them, & how to test them well
Individual Differences & Item Effects
Properties of subjects: cognitive abilities (WM task scores, inhibition), gender, age, L2 proficiency, task strategy
Properties of items: lexical frequency, segmental properties, plausibility
Two Challenges
Subject & item properties are not at the level of individual trials: how to implement them in your model? What do they mean statistically?
Subject & item properties are often not experimentally manipulated: how best to investigate them?
Example Study: Fraundorf et al., 2010
"Both the British and the French biologists had been searching Malaysia and Indonesia for the endangered monkeys. Finally, the British spotted one of the monkeys in MALAYSIA and planted a radio tag on it."
Memory questions: British found it or French found it? In Malaysia or in Indonesia?
Manipulate presentational vs. contrastive accents:
Finally, the British spotted one of the monkeys in MALAYSIA...
Finally, the BRITISH spotted one of the monkeys in Malaysia...
Finally, the BRITISH spotted one of the monkeys in MALAYSIA...
Finally, the British spotted one of the monkeys in Malaysia...
Original Results
Contrastive (L+H*) accent benefits memory
No effect of accent on the other item
Effects seem localized
Implementing in R New experiment: do these effects vary with individual differences in working memory? Need trial level and subject level variables in the same dataframe
Implementing in R
Then, can add it to the model just like any other factor:
glmer(Correct ~ Accent * WM_Score + (1|Subject) + (1|StoryID), family=binomial)
(binomial models are fit with glmer(); lmer() is for continuous outcomes)
R automatically figures out it's subject-level: each subject always has the same score
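As a runnable sketch, the whole pipeline can be tried on simulated data. The variable names (`Correct`, `Accent`, `WM_Score`, `Subject`, `StoryID`) come from the slide; the dataframe and effect sizes below are invented purely for illustration:

```r
library(lme4)

set.seed(1)
# Invented toy data: 20 subjects x 10 stories
d <- expand.grid(Subject = factor(1:20), StoryID = factor(1:10))
d$Accent <- ifelse(as.numeric(d$StoryID) %% 2 == 0, "Contrastive", "Presentational")
wm <- rnorm(20)                          # one WM score per subject...
d$WM_Score <- wm[as.numeric(d$Subject)]  # ...repeated across that subject's trials
d$Correct <- rbinom(nrow(d), 1, 0.75)

# In current lme4, binomial models are fit with glmer(), not lmer()
m <- glmer(Correct ~ Accent * WM_Score + (1 | Subject) + (1 | StoryID),
           data = d, family = binomial)
```

Because `WM_Score` is constant within each subject, the model treats it as a subject-level (Level 2) predictor with no extra work on your part.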
Merging Dataframes
What if trials & subjects are in separate files? (Data1: trial-level; Data2: subject-level)
Load them both into R and use merge:
FullDataframe = merge(Data1, Data2, all.x=TRUE)
Need some column that has the same name in both data frames
Can specify WHICH columns to use with the by parameter; see ?merge for more details
Default is to delete subjects if they can't be matched across data frames; all.x=TRUE fills in NA values instead so you can track these subjects
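A minimal sketch of the merge step, with invented Subject IDs and scores; note how the unmatched subject survives with an NA when all.x = TRUE:

```r
# Trial-level data: one row per trial (invented example)
Data1 <- data.frame(Subject = c("S1", "S1", "S2", "S2", "S3"),
                    Correct = c(1, 0, 1, 1, 0))

# Subject-level data: one row per subject; S3 has no WM score
Data2 <- data.frame(Subject = c("S1", "S2"),
                    WM_Score = c(42, 35))

# all.x = TRUE keeps S3's trials, filling WM_Score with NA;
# the default (all.x = FALSE) would silently drop them
FullDataframe <- merge(Data1, Data2, by = "Subject", all.x = TRUE)
```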
What's Going On Statistically?
[Diagram: Level 2 = subjects & items (e.g., a subject, the monkey story); Level 1 = individual trials nested within them]
What's Going On Statistically?
Have random effects of our subjects & items. These result in residuals:
Eun-Kyung: 80% accuracy (+4 vs. mean)
Tuan: 72% accuracy (-4 vs. mean)
Level 2 factors may help us explain this variation
What's Going On Statistically? Model without WM: Unexplained variance between subjects Model with main effect of WM: Unexplained subject variance reduced Fixed effects unchanged because these were manipulated within subjects
Random Slopes & Level 2 Variables
Adding main effects at Level 2 will not change fixed effects at Level 1
But can also add INTERACTIONS with trial-level factors
These help explain the random slopes
Effect of Subject-Level Variables
Remember random slopes? Variance between subjects in a fixed effect
[Plot: memory accuracy for two subjects (Alison, Zhenghan) when the other item has a presentational vs. contrastive accent]
Random Slopes & Level 2 Variables
Adding main effects at Level 2 will not change fixed effects at Level 1
But can also add INTERACTIONS; these help explain the random slopes
May be more interesting theoretically: people with low WM scores DO show a memory penalty if something else in the story gets a contrastive accent
Random Slopes & Level 2 Variables
It's illogical to have a random slope by subject for something at the subject level: there isn't a separate WM effect for each subject
lmer lets you fit this... but I'm not sure what it represents
Individual Differences: How to Do Them Well What Scott has learned from the individual differences literature Example study: Pitch accenting as cue to reference resolution (deaccented referents are usually given) Can we predict individual differences in use of this cue?
Discriminant Validity Many individual differences are correlated e.g. some subjects may just try harder than others Consequently, they would do better on both WM task & eye-tracking task Usually not theoretically interesting Principle #1: Include >1 construct so we know what really matters
Discriminant Validity
How to deal with correlated predictors? Simple solution: regress one on the other:
ModelWM <- lm(WMMean ~ PSpeed, data=Cyclops)
Then use the residuals as the new measure:
Cyclops$ResidWM <- residuals(ModelWM)
"The part of WM we couldn't explain from perceptual speed"
Better solutions: path analysis & structural equation modeling
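The residualization step can be run end-to-end on simulated data (the Cyclops dataframe below is invented to stand in for the real one). A useful sanity check: by construction, regression residuals are uncorrelated with the predictor they were residualized against:

```r
set.seed(1)
# Invented stand-in for Cyclops: WM and perceptual speed are correlated
Cyclops <- data.frame(PSpeed = rnorm(50))
Cyclops$WMMean <- 0.5 * Cyclops$PSpeed + rnorm(50)

ModelWM <- lm(WMMean ~ PSpeed, data = Cyclops)
Cyclops$ResidWM <- residuals(ModelWM)  # WM with perceptual speed partialled out

# By construction, the residualized measure is uncorrelated with PSpeed
cor(Cyclops$ResidWM, Cyclops$PSpeed)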
Discriminant Validity
Some people asked about how to get these colored scatterplots... Need to download & load package gclus. Then:
Cyclops.short <- subset(Cyclops, select=c('PSpeed', 'GoodProsody', 'ResidWM'))  # select which variables go in the scatterplot
Cyclops.r <- abs(cor(Cyclops.short, use="pairwise.complete.obs"))
Cyclops.col <- dmat.color(Cyclops.r)
Cyclops.o <- order.single(Cyclops.r)
cpairs(Cyclops.short, Cyclops.o, panel.colors=Cyclops.col, gap=.5)
Reliability Not all individual measures are good measures Measures may be noisy Measures may not measure a stable or meaningful characteristic Suppose you found vocab predicted outcome but not WM Maybe you had a bad WM measure
Reliability Good tests produce consistent scores Measuring something real about a person Can test this yourself with >1 assessment … or split halves Calculate Pearson's r: cor.test(Cyclops$PSpeed1, Cyclops$PSpeed2) Scatterplot: plot(Cyclops$PSpeed1, Cyclops$PSpeed2) Typical standard may be r = .70 - .80 needed for “good” reliability
Reliability
Good tests produce consistent scores: measuring something real about a person
Can test this yourself with >1 assessment... or split halves
[Scatterplots of test vs. retest: r = .16 (bad!) vs. r = .77 (good!)]
Principle #2: Check reliability of measures!
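For the split-halves route, one common recipe is to correlate the two halves and then apply the Spearman-Brown correction, which adjusts for the fact that each half is only half as long as the full test. A sketch with invented half-scores:

```r
set.seed(2)
# Invented scores on the two halves of a test
half1 <- rnorm(40)
half2 <- 0.8 * half1 + rnorm(40, sd = 0.5)

r_half <- cor(half1, half2)
# Spearman-Brown: estimated reliability of the full-length test
r_full <- (2 * r_half) / (1 + r_half)
```

The corrected estimate is always at least as high as the raw half-to-half correlation (for positive r), since the full test is longer and therefore less noisy.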
Latent Variables
Some things can be measured directly, e.g. gender of a subject, segmental properties of a word
Many things in psychology are measured indirectly:
Alphabet Span Task (read words & recall alphabetically) → ability to do tasks in spite of interference
Latent Variables
But, few tasks are process-pure:
Alphabet Span ← alphabet knowledge + working memory
Reading Span ← working memory + reading ability
Latent Variables Principle 3: Overcome task-specific factors with multiple measures of same construct Simple analysis: Use sum or average as your predictor Advanced techniques Verify measures are related with factor analysis Examine only common variance: latent variable analysis, structural equation modeling
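The simple "sum or average" option can be sketched like this (the column names AlphabetSpan and ReadingSpan, and the data, are invented for illustration). z-scoring each task first keeps a task with a bigger numeric scale from dominating the composite:

```r
set.seed(3)
# Two WM tasks on very different scales (invented data)
Cyclops <- data.frame(AlphabetSpan = rnorm(50, mean = 20, sd = 5),
                      ReadingSpan  = rnorm(50, mean = 3,  sd = 1))

# z-score each task, then average into one composite predictor
Cyclops$WM_Composite <- rowMeans(scale(Cyclops[, c("AlphabetSpan", "ReadingSpan")]))
```

The composite is centered at zero by construction and can go into lmer/lm like any other subject-level predictor.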
Continuous Predictors Many individual differences are continuous Good to include continuous variation if you have full range Splits needed in ANOVA But throws away info.; less powerful Histogram: hist(Cyclops$WM, breaks=20)
Continuous Predictors
Don't want to treat a predictor as continuous if sampling was dichotomous
In this case, we didn't sample middle-aged people
[Plots: with only the two extremes sampled, the pattern in between could take very different shapes]
Here there be dragons: we have no info about what should be in the middle
Comparing Predictors
How do we tell which has a stronger effect?
Perceptual Speed — Measure: # of same/different judgments (e.g., QVT / QVR) in 2 min. Beta = 6.03: 1 add'l trial → prosody score + 6
Vocab — Measure: # of multiple-choice Qs correct of 40 (e.g., TEMERITY: (A) rashness (B) timidity (C) desire (D) kindness). Beta = 14.69: 1 add'l correct word → prosody score + 15
Comparing Predictors
Issue: measures are often on different scales
Perceptual Speed — Beta = 6.03; Range: 82 to 236; Mean: 160; Std. Dev.: 28.75
Vocab — Beta = 14.69; Range: 12.00 to 32.00; Mean: 20.80; Std. Dev.: 5.30
Comparing Predictors
Issue: measures are often on different scales
Solution: standardize the predictors so you are comparing z scores:
Cyclops$Vocab_z = scale(Cyclops$Vocab, center=TRUE, scale=TRUE)  # center so mean = 0, scale so SD = 1
Changes your parameter estimates but not your hypothesis tests
Perceptual speed: standardized beta = .31; Vocab: standardized beta = .14
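That claim ("changes your parameter estimates but not your hypothesis tests") can be verified directly on simulated data (GoodProsody here is an invented outcome variable): standardizing rescales the beta, but the t statistic is unchanged.

```r
set.seed(4)
# Invented data on roughly the slide's Vocab scale
Cyclops <- data.frame(Vocab = rnorm(50, mean = 20.8, sd = 5.3))
Cyclops$GoodProsody <- 0.3 * Cyclops$Vocab + rnorm(50)

m_raw <- lm(GoodProsody ~ Vocab, data = Cyclops)

Cyclops$Vocab_z <- as.numeric(scale(Cyclops$Vocab, center = TRUE, scale = TRUE))
m_z <- lm(GoodProsody ~ Vocab_z, data = Cyclops)

# Betas differ by a factor of sd(Vocab); t values are identical
t_raw <- summary(m_raw)$coefficients["Vocab", "t value"]
t_z   <- summary(m_z)$coefficients["Vocab_z", "t value"]
```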