Scale construction
Michelle Mazurek (some material from Bilge Mutlu)
About scales
• Bridging from qual. to quant.
• Using (typically) ordinal questions
  – Sometimes nominal categorical
• Using them in a repeatable way
• That is validated!
• For construct validity
Thinking about construct validity
• How to measure something complicated / hard to define
  – Risk taking
  – Privacy concern
  – Sociability
  – Etc.
• In a validated way!
What do we want to validate?
• Items -> latent factors
• Reliability: internal consistency, test-retest
• Reflects something in the real world
Overall procedure
• Generate items and review them for wording, match to the intended construct, etc.
  – Expert review; cognitive interviews
• Refine items
  – Check for range effects
  – Do exploratory factor analysis
  – Get rid of ones that don't work
  – Set up subscales
  – Repeat
Overall procedure (2)
• Validate
  – Against other scales, real-world behavior
  – That subscales still intra-correlate and load
  – Test-retest
  – Different populations? Modes (internet)?
EXPLORATORY FACTOR ANALYSIS
Often, multiple components
• Risk perception: different kinds of risk
• Privacy: ideas about collection vs. unauthorized sharing, etc.
• …
• Subscales!
Observable vs. latent
• Observable: answers to items, test scores, other measurements
• Latent: underlying factor that correlates with (governs?) multiple measurable components
• Factor analysis: reduce a large number of observables to a smaller number of latent factors
  – Resulting factors are hopefully (mostly) independent
Factor analysis model
• $X_1, \dots, X_n$: measured variables
• $F_1, \dots, F_m$: latent factors
• $b_{11}, \dots, b_{nm}$: factor loadings
$$
\begin{aligned}
X_1 &= b_{11}F_1 + b_{12}F_2 + \cdots + b_{1m}F_m + e_1 \\
X_2 &= b_{21}F_1 + b_{22}F_2 + \cdots + b_{2m}F_m + e_2 \\
&\;\;\vdots \\
X_n &= b_{n1}F_1 + b_{n2}F_2 + \cdots + b_{nm}F_m + e_n
\end{aligned}
$$
Factor analysis model
• Loadings range from -1 to 1 (for standardized variables), where 0 = no loading
• Ideally, end up with mostly 1s and 0s
• All based on correlation / covariance matrices among the measured variables
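To make "all based on correlation matrices" concrete, here is a minimal sketch that assumes NumPy and a made-up loading matrix `B` (not taken from the running example). The sketch also assumes standardized variables, uncorrelated unit-variance factors, and uncorrelated errors. Under those conditions, the factor model implies a correlation matrix equal to $BB^\top$ plus a diagonal of error variances. Simulating from the model should give a sample correlation matrix close to that implied one.

```python
import numpy as np

# Hypothetical loadings b_ij: 6 measured variables (rows) x 2 factors (columns).
B = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.1, 0.7],
    [0.0, 0.6],
])


def implied_correlation(B):
    """Correlation matrix implied by the factor model for standardized X,
    assuming uncorrelated unit-variance factors and uncorrelated errors."""
    common = B @ B.T                          # covariance shared through the factors
    error_var = 1.0 - np.sum(B**2, axis=1)    # Var(e_i), chosen so that Var(X_i) = 1
    return common + np.diag(error_var)


def simulate(B, n_obs, seed=0):
    """Draw n_obs rows of X = F B^T + e from the model."""
    rng = np.random.default_rng(seed)
    n_vars, n_factors = B.shape
    F = rng.standard_normal((n_obs, n_factors))           # latent factor scores
    error_sd = np.sqrt(1.0 - np.sum(B**2, axis=1))
    E = rng.standard_normal((n_obs, n_vars)) * error_sd   # measurement error e_i
    return F @ B.T + E


R_model = implied_correlation(B)
R_sample = np.corrcoef(simulate(B, n_obs=20_000), rowvar=False)
print(np.round(R_model - R_sample, 2))  # entries should all be close to 0
```

For example, the model-implied correlation between $X_1$ and $X_2$ is $0.8 \cdot 0.7 + 0 \cdot 0.1 = 0.56$.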
Assumptions
• Measurement error has constant variance and mean 0
• No association between errors
• No association between factors and measurement error
• Local/conditional independence:
  – Measured variables are independent given the factor(s)
• In practice, everything is standardized
  – Subtract the mean (center at 0) and divide by the SD (variance = 1)
  – Total variance = number of measured variables
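The standardization step is a few lines of code. The sketch below assumes NumPy and uses randomly generated 1-10 responses as a hypothetical stand-in shaped like the running example (120 × 9). It shows that z-scored items have mean 0 and variance 1. It also shows that the total variance, the trace of the covariance matrix, equals the number of items.

```python
import numpy as np


def standardize(X):
    """z-score each column: subtract the mean, divide by the SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


rng = np.random.default_rng(1)
# Hypothetical stand-in for survey data: 120 respondents x 9 items scored 1-10
raw = rng.integers(1, 11, size=(120, 9))
Z = standardize(raw)

print(np.round(Z.mean(axis=0), 6))             # every item now has mean 0
total_var = np.trace(np.cov(Z, rowvar=False))  # covariance of Z = correlation of raw
print(round(total_var, 6))                     # total variance = number of items (9)
```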
Requires large samples
• Rule of thumb: 10 observations per variable in the list (so a 30-item scale needs n = 300)
Running example
• Teaching reviews (from the “Real Statistics with Excel” website)
• 120 observations of 9 questions
  – All on 1-10 Likert-type scales
  – E.g., is entertaining, communicates well, has expertise in the subject, has passion for teaching, etc.
Overall procedure
• Collect + explore data
• Extract initial factors; choose how many to retain
• Choose and use estimation method
• Rotate
• Interpret, adjust, repeat
Explore data
• Check for range effects
• Check that factor analysis is applicable
  – KMO measure of sampling adequacy (> 0.6)
  – Bartlett's test of sphericity
    • Null hypothesis: the correlation matrix is an identity matrix (everything is uncorrelated). You want to reject it (p < 0.05), but in practice it is almost always rejected.
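Both applicability checks can be computed directly from the correlation matrix $R$. The sketch below assumes NumPy and SciPy, and the data are simulated (hypothetical). Bartlett's statistic is $\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom. The overall KMO compares the squared correlations with the squared partial correlations, which are obtained from $R^{-1}$. `bartlett_sphericity` and `kmo` are illustrative helpers written here, not a specific library's API.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test; H0: the correlation matrix is the identity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log|R|, numerically safer than log(det(R))
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2.0
    return chi2, stats.chi2.sf(chi2, df)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)        # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # use off-diagonal entries only
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


# Hypothetical data: 300 respondents, 6 items driven by 2 latent factors
rng = np.random.default_rng(2)
loadings = np.array([[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.8], [0, 0.7], [0, 0.6]])
X = rng.standard_normal((300, 2)) @ loadings.T + 0.5 * rng.standard_normal((300, 6))

chi2, p_value = bartlett_sphericity(X)
print(f"KMO = {kmo(X):.2f}, Bartlett chi2 = {chi2:.1f}, p = {p_value:.3g}")
```

KMO is high when the partial correlations are small relative to the raw correlations, which means the shared variance is spread across many items rather than concentrated in isolated pairs.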
Overall procedure: Extract initial factors; choose how many to retain
How many factors?
• Theoretical / predicted answer
• Guess and check
• Use PCA to find out
  – Start with # of factors = # of variables
  – Decide how many to retain based on the results
• Too many: some may have zero loadings; not parsimonious
• Too few: variables may get incorrect loadings (worse!)
Using PCA to retain factors
• Each factor has an associated eigenvalue; retain based on eigenvalues
• All with eigenvalue > 1 (Kaiser)
  – Factor contributes more than a single measured variable to the total variance (each measured variable has variance 1)
  – This is obviously arbitrary; can retain too many
• Scree plot (Cattell): plot the eigenvalues, keep those left of the inflection point
  – Subjective
• Fewest factors whose eigenvalues sum to > 70% (or 80%) of total variance
• Others
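The eigenvalue-based rules are easy to compute side by side. The sketch below assumes NumPy and simulated two-factor data. `retention_summary` is a hypothetical helper, and `share` is its threshold for the variance-explained rule. The function returns the eigenvalues in decreasing order (the values a scree plot would show), the Kaiser count, and the smallest number of factors that exceeds the variance share.

```python
import numpy as np


def retention_summary(X, share=0.70):
    """Eigenvalues of the correlation matrix and two retention rules.

    Assumes 0 < share < 1.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]            # largest first (scree order)
    n_kaiser = int(np.sum(eigvals > 1.0))            # Kaiser: eigenvalue > 1
    cum_share = np.cumsum(eigvals) / eigvals.sum()   # eigvals.sum() == # of variables
    n_share = int(np.argmax(cum_share > share)) + 1  # fewest factors exceeding share
    return eigvals, n_kaiser, n_share


# Hypothetical data: 500 respondents, 6 items, 2 latent factors
rng = np.random.default_rng(3)
B = np.array([[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.8], [0, 0.7], [0, 0.6]])
F = rng.standard_normal((500, 2))
X = F @ B.T + rng.standard_normal((500, 6)) * np.sqrt(1 - np.sum(B**2, axis=1))

eigvals, n_kaiser, n_share = retention_summary(X)
print("eigenvalues (plot these for a scree plot):", np.round(eigvals, 2))
print(f"Kaiser rule keeps {n_kaiser}; 70% rule keeps {n_share}")
```

The rules need not agree. With loadings this moderate, two factors can explain less than 70% of the total variance even though exactly two eigenvalues exceed 1.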
Overall procedure
• Collect + explore data
• Extract initial factors; choose how many to retain
• Choose and use estimation method
• Rotate
• Interpret, adjust, repeat
• Confirm: collect new data and fit to model
  – Evaluate adequacy; compare to other models
Main estimation methods
• Maximum likelihood
  – Maximize the likelihood of seeing this correlation matrix (more common in CFA)
• Principal axis
  – Account for as much shared (common) variance as possible with the first factor, etc.
• Principal components (ish)
  – Account for max. (total) variance with the first factor, etc.
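A sketch of iterated principal axis factoring, assuming NumPy. `principal_axis` is a hypothetical helper, not a library function. PCA decomposes $R$ with 1s on the diagonal. Principal axis factoring instead puts communality estimates on the diagonal, so only shared variance is factored, and it re-estimates the communalities until they stabilize. Here it is fed a correlation matrix built from known (made-up) loadings, and it should recover them up to sign. Real data can produce communalities above 1 (Heywood cases), and this sketch does not guard against that.

```python
import numpy as np


def principal_axis(R, n_factors, max_iter=1000, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix R.

    Puts communality estimates on the diagonal of R (the 'reduced' matrix),
    takes the top eigenvectors, re-estimates communalities from the
    resulting loadings, and repeats until they stop changing.
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # start: squared multiple correlations
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = np.sum(loadings**2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


# Hypothetical population correlation matrix built from known loadings
B = np.array([[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.75], [0, 0.65], [0, 0.55]])
R = B @ B.T
np.fill_diagonal(R, 1.0)

loadings, h2 = principal_axis(R, n_factors=2)
print(np.round(loadings, 2))  # matches B up to the sign of each column
print(np.round(h2, 2))        # recovered communalities
```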
Overall procedure: Rotate
Rotating factor loadings
• There are infinitely many equally good solutions for the factor loadings (matrix math)
• Think of these as rotations
  – Factors are axes/vectors, variables "load" onto nearby axes, and the axes can be "rotated" infinitely
• Goal: loadings that are close to either 1 or 0
  – Distribute items among factors
  – Clearly distinguish "on" or "off"
  – Does not improve fit!
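The "matrix math" point fits in a few lines. The sketch below assumes NumPy and a made-up unrotated loading matrix `L`. For any orthogonal matrix $T$, the rotated loadings $LT$ reproduce the same common part $LT(LT)^\top = LL^\top$ and the same communalities, so fit cannot change, even though the individual loadings do.

```python
import numpy as np


def rotation(theta):
    """2 x 2 orthogonal rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Hypothetical unrotated loadings: 4 variables x 2 factors
L = np.array([[0.7, 0.4], [0.6, 0.5], [0.5, -0.5], [0.6, -0.4]])

for theta in (0.3, 0.785, 2.0):
    L_rot = L @ rotation(theta)
    # Same reproduced common variance L L^T, so the fit is identical...
    assert np.allclose(L_rot @ L_rot.T, L @ L.T)
    # ...and the same communalities (row sums of squared loadings)...
    assert np.allclose(np.sum(L_rot**2, axis=1), np.sum(L**2, axis=1))
    # ...but different individual loadings.
    print(f"theta={theta}:", np.round(L_rot, 2).tolist())
```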
Rotation methods
• Orthogonal: factors independent
  – Varimax: maximize the variance of the squared loadings across variables (within each factor)
    • Most common
  – Quartimax: maximize it across factors (within each variable)
• Oblique: factors not independent
  – Oblimin, promax
Choosing rotation
• Maybe not super important
• Orthogonal: simple to interpret
  – Is independence reasonable for your construct?
• Oblique: maybe simpler structure, but interactions are confusing
  – Loadings are no longer interpretable as the correlation between variable and factor
Overall procedure: Interpret, adjust items, repeat
Detour: FA vs. clustering
• Clustering: group observations
  – Find and profile subgroups
• FA: group variables
  – Data reduction
  – Latent factors
Detour: FA vs. PCA
• Meta-analysis study
• CFA: underlying construct
  – Best for correlations of variables, structure of data
• PCA: increased factor loadings
  – Best for summarizing, reducing variables
• (Kim 2008)
Detour: Communality vs. uniqueness
• Communality: variance in the measured variable explained by the factors
• Uniqueness: variance explained by the error term e (not by the factors)
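Using the loadings $b_{ij}$ and error terms $e_i$ from the factor analysis model, and assuming standardized variables and uncorrelated unit-variance (orthogonal) factors, each variable's variance of 1 splits into the two parts. Writing $h_i^2$ for communality and $u_i^2$ for uniqueness is notation introduced here:

$$
\operatorname{Var}(X_i) = 1 = \underbrace{\sum_{j=1}^{m} b_{ij}^2}_{h_i^2 \text{ (communality)}} + \underbrace{\operatorname{Var}(e_i)}_{u_i^2 \text{ (uniqueness)}}, \qquad u_i^2 = 1 - h_i^2
$$

For example, an item with loadings 0.7 and 0.2 on two factors has $h_i^2 = 0.49 + 0.04 = 0.53$ and uniqueness $u_i^2 = 0.47$.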
Choosing items
• Drop anything with uniqueness > 0.5
  – Not well mapped to factors
• Keep items that load > 0.3 (or 0.5)
• Avoid cross-loading items
  – Drop anything that doesn't load at least 2x as strongly on its "main" factor as on any other (Saucier)
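The three rules of thumb can be applied mechanically to a loading matrix. The sketch below assumes NumPy, made-up rotated loadings, and an orthogonal solution, so that uniqueness = 1 - (sum of squared loadings in the item's row). `screen_items` and its parameter names are hypothetical. The cross-loading rule is read as: the primary loading must be at least `ratio` times the next-largest loading.

```python
import numpy as np


def screen_items(loadings, max_uniqueness=0.5, min_loading=0.3, ratio=2.0):
    """Apply the item-selection rules of thumb to a loading matrix.

    Assumes an orthogonal solution on standardized items, so that
    uniqueness = 1 - (sum of squared loadings in the item's row).
    """
    L = np.abs(np.asarray(loadings, dtype=float))
    uniqueness = 1.0 - np.sum(L**2, axis=1)
    ordered = np.sort(L, axis=1)[:, ::-1]  # each row: largest loading first
    primary = ordered[:, 0]
    secondary = ordered[:, 1] if L.shape[1] > 1 else np.zeros(L.shape[0])
    keep = (
        (uniqueness <= max_uniqueness)      # reasonably well explained by factors
        & (primary > min_loading)          # loads on its main factor
        & (primary >= ratio * secondary)   # not cross-loading
    )
    return keep, np.argmax(L, axis=1), uniqueness


# Hypothetical rotated loadings for 5 items on 2 factors
loadings = np.array([
    [0.80, 0.10],   # clean item on factor 0
    [0.55, 0.45],   # cross-loads
    [0.40, 0.05],   # uniqueness too high
    [0.10, 0.75],   # clean item on factor 1
    [-0.72, 0.20],  # negative but strong loading on factor 0
])
keep, main_factor, uniqueness = screen_items(loadings)
print(keep.tolist())         # [True, False, False, True, True]
print(main_factor.tolist())  # [0, 0, 0, 1, 0]
```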
Interpreting a subscale
• Is there a coherent explanation for why these particular questions fit together?
• Do the subscale items have high reliability?
  – Cronbach's alpha > 0.6 for each subscale, > 0.7 for the majority of subscales (McKinley)
  – Item-total correlation (Pearson correlation between the item and the subscale average) > 0.2 (Everitt)
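Both reliability checks are short computations. The sketch below assumes NumPy and a simulated (hypothetical) 4-item subscale. Cronbach's alpha is $\frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_{\text{total}}^2}\right)$ for $k$ items. The item-total correlation follows the slide's definition and includes the item in the average. A common "corrected" variant leaves the item out instead.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for one subscale (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def item_total_correlations(items):
    """Pearson r between each item and the subscale average (item included)."""
    items = np.asarray(items, dtype=float)
    avg = items.mean(axis=1)
    return np.array([np.corrcoef(items[:, j], avg)[0, 1] for j in range(items.shape[1])])


# Hypothetical subscale: 4 items, each = shared trait + independent noise
rng = np.random.default_rng(5)
trait = rng.standard_normal((500, 1))
subscale = trait + rng.standard_normal((500, 4))

print(f"alpha = {cronbach_alpha(subscale):.2f}")       # compare with > 0.6 / 0.7
print(np.round(item_total_correlations(subscale), 2))  # compare with > 0.2
```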
Validating the scale
• Get a new sample, check validity
• Does PCA produce the same # of factors?
• Do items load as predicted?
• Test-retest: same participants, over time
• Validate against real-world data:
  – SEBIS vs. measured security behavior
  – DOSPERT vs. risk behaviors
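Test-retest reliability and validation against behavior both come down to correlating scores for the same people. The sketch below assumes NumPy and uses entirely simulated (hypothetical) scores: two sessions of a scale and one behavior measure, all driven by a stable trait. `pearson_r` is a hypothetical helper.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two score vectors for the same respondents."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


# Hypothetical data for 200 participants
rng = np.random.default_rng(4)
trait = rng.standard_normal(200)                   # stable underlying trait
time1 = trait + 0.4 * rng.standard_normal(200)     # scale score, first session
time2 = trait + 0.4 * rng.standard_normal(200)     # same people, later session
behavior = 0.5 * trait + rng.standard_normal(200)  # e.g., an observed behavior measure

r_retest = pearson_r(time1, time2)
r_criterion = pearson_r(time1, behavior)
print(f"test-retest r = {r_retest:.2f}, criterion r = {r_criterion:.2f}")
```

With these made-up noise levels, the population values are about 0.86 for test-retest and 0.42 for the behavior correlation. Real behavior measures are typically much noisier than repeated administrations of the same scale.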