  1. Experimental design (continued). Spring 2017. Michelle Mazurek. Some content adapted from Bilge Mutlu, Vibha Sazawal, and Howard Seltman.

  2. Administrative • No class Tuesday • Homework 1 • Plug for Tanu Mitra grad student session

  3. Today’s class • Finish threats to validity • Experimental design / choices • Alternatives to experiments

  4. Quick review • Internal validity: causality – Isolate variable of interest – Randomized assignment • External validity – Representative sample – Representative environment/task/analysis • Valid constructs – Measure something meaningful – Reliable

  5. Know what you’re measuring • Especially when dealing with large-scale data from the internet – What are you missing? What is duplicated? – What is the precision and accuracy of the data? – Are you capturing what you think you’re capturing? – *Vantage point* – Representativeness / diversity

  6. Calibrating constructs • Examine outliers and spikes • Check for self-consistency • Compare multiple measures – Multiple datasets – Multiple ways of calculating a value • Test with synthetic data • Check longitudinal data periodically!

  7. Mis-measurements: now what? • Discard? (Why might this be bad?) – Discard outliers? How do you define an outlier? • Use an explicit adjustment?
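To see why the definition of "outlier" matters, here is a minimal Python sketch comparing two common rules, Tukey's 1.5×IQR fences and a |z| > 3 cutoff, on a small hypothetical dataset. The function names and the data are illustrative assumptions, not from the lecture.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, cutoff=3.0):
    """Flag points more than `cutoff` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > cutoff]


# Hypothetical measurements: nine ordinary values and one obvious spike.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(iqr_outliers(data))     # [50]
print(zscore_outliers(data))  # []
```

The spike inflates the very standard deviation it is judged against; with only ten points, no value can exceed |z| of about 2.85, so the z-score rule silently keeps the spike while the IQR rule flags it.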

  8. Other measurement notes • (Don’t really fit here, but from the Paxson paper) • Metadata and good analysis logging are critical! • Be clear about unknowns and limitations

  9. 4. Power • Power: the likelihood that, if there’s a real effect, you will find it. • Why might you not find it? – Sample size – Effect size – Missing explanatory variables – Variability
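The factors above can be made concrete with a normal-approximation power calculation for comparing two group means. This is a minimal sketch, assuming a two-sided, two-sample z-test with equal group sizes and a known, common standard deviation; a t-test gives slightly lower power at small sample sizes. The function name is hypothetical.

```python
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation).

    effect: true difference in group means
    sd: within-group standard deviation (assumed equal in both groups)
    n_per_group: participants in each condition
    """
    z = NormalDist()
    d = effect / sd                   # standardized effect size
    se = (2 / n_per_group) ** 0.5     # SE of the mean difference, in sd units
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d / se
    # Probability that the test statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (10, 30, 64, 100):
    print(n, round(approx_power(effect=0.5, sd=1.0, n_per_group=n), 2))
```

Raising `n_per_group` or `effect` raises power, while doubling `sd` halves the standardized effect and lowers power, which is why variability sits on the list alongside sample and effect size.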

  10. Promote power • Covariates: Measure possible confounds, include in analysis • Use reliable measurements • Control the environment • Potential tradeoff: Generalizability for power – E.g., limit variability between subjects

  11. EXPERIMENTAL DESIGN

  12. Some important decisions • What is the hypothesis? • Between or within subjects? • What treatment levels / conditions? • What dependent variables to measure?

  13. Good hypothesis design • Predicted relationship between (at least) 2 vars – Testable, falsifiable • Operational – Vars are clearly defined – Relationship / how you measure it clearly defined

  14. Good hypothesis design (cont.) • Justified – Exploratory results – Theory in related area – Well-justified intuition? • Parsimonious

  15. Between vs. Within • Between: Each participant belongs to exactly one condition • Within: Each participant belongs to multiple conditions

  16. Between vs. Within • Between – More participants – Cleaner / less bias • Within – More time per participant – More power (less subject-to-subject variability)

  17. Improving on between-subjects • Matching: Get like participants for each condition • Pro: reduces variability • Con: Hard to find; what do you match on? • In general, be very cautious
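One common way to combine matching with randomization is a matched-pairs design: sort participants by the matching variable, pair neighbors, and flip a coin within each pair. A minimal sketch, assuming two conditions, an even number of participants, and participant records with a hypothetical `id` field; the matching variable (age here) is only an example.

```python
import random


def matched_pair_assignment(participants, key, seed=None):
    """Assign participants to conditions "A" and "B" in matched pairs.

    Sorts by the matching variable, pairs neighbors, then randomly puts one
    member of each pair in each condition. Assumes each record has an "id".
    """
    if len(participants) % 2:
        raise ValueError("need an even number of participants")
    rng = random.Random(seed)
    ordered = sorted(participants, key=key)
    assignment = {}
    for i in range(0, len(ordered), 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # coin flip within the pair
        assignment[pair[0]["id"]] = "A"
        assignment[pair[1]["id"]] = "B"
    return assignment


# Hypothetical participants, matched on age.
people = [{"id": f"P{i}", "age": a} for i, a in enumerate([19, 45, 22, 47, 31, 30])]
print(matched_pair_assignment(people, key=lambda p: p["age"], seed=7))
```

Randomizing within pairs keeps the benefit of random assignment, while the pairing makes the two conditions similar on the matched variable; it does nothing for variables you did not match on.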

  18. Improving on within-subjects • Ordering effects can be HUGE – Learning, fatigue – Range effects: learn most for closest conditions • Mitigate via counterbalancing – All possible orders – Balanced Latin square:
        A B C D
        C A D B
        B D A C
        D C B A
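A balanced Latin square like the one above can be generated for any number of conditions with the standard Williams construction. A minimal Python sketch; the function name is ours, and for four conditions it produces a different (but also balanced) square than the one on the slide.

```python
def balanced_latin_square(conditions):
    """Return orderings of `conditions` forming a balanced Latin square
    (a Williams design): every condition appears once in every position, and
    every condition immediately follows every other condition equally often.

    For an even number of conditions this needs n orders; for an odd number,
    the mirrored orders are appended, giving 2n orders.
    """
    n = len(conditions)
    # First order uses the index pattern 0, 1, n-1, 2, n-2, 3, ...
    first, lo, hi = [0], 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
        take_low = not take_low
    # Each later order shifts every index by one, wrapping around (mod n).
    rows = [[(i + shift) % n for i in first] for shift in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return [[conditions[i] for i in row] for row in rows]


for order in balanced_latin_square("ABCD"):
    print(" ".join(order))
# A B D C
# B C A D
# C D B A
# D A C B
```

Participants are then assigned to the orders in rotation, so each order is used equally often.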

  19. Counterbalancing doesn’t fix: • Range effects (most average treatment) • Context effects (what most participants are more familiar with)

  20. Mixed models are also possible • Example: everyone gets the same three tasks • Order of tasks varies • Tool with which to execute tasks varies

  21. Selecting conditions • How many IVs? – Password meter example • How many / which levels for each? – Cannot infer anything about levels you didn’t test

  22. Full-factorial (or not) • Full-factorial: All possible combinations of all IVs – And all orderings? • Not: Only a subset – Selected how? – Recall: Vary at most one thing each time! • Planned comparisons!
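A small sketch of how quickly a full-factorial design grows, and of one way to pick a subset in which each condition differs from a baseline in exactly one IV. The IVs and levels below are hypothetical placeholders loosely inspired by the password-meter example, not the study's actual conditions.

```python
from itertools import product

# Hypothetical IVs and levels (illustration only).
ivs = {
    "meter": ["none", "bar", "text_only"],
    "policy": ["basic8", "comp8"],
    "feedback": ["off", "on"],
}

# Full factorial: every combination of every level of every IV.
full = [dict(zip(ivs, combo)) for combo in product(*ivs.values())]
print(len(full))  # 12 (3 * 2 * 2)

# One subset strategy: a baseline plus conditions that each change exactly one IV.
baseline = {"meter": "none", "policy": "basic8", "feedback": "off"}
one_at_a_time = [baseline] + [
    {**baseline, iv: level}
    for iv, levels in ivs.items()
    for level in levels
    if level != baseline[iv]
]
print(len(one_at_a_time))  # 5 (baseline + 2 + 1 + 1)
```

The one-at-a-time subset supports planned comparisons against the baseline, but it cannot estimate interactions between IVs.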

  23. Why multivariate? • What is different between running one experiment with two IVs vs. two experiments with one IV each? • Interaction effects!
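A worked 2×2 example, with hypothetical cell means (say, mean task time in seconds), of what two separate one-IV experiments can miss: the effect of IV A reverses depending on the level of IV B.

```python
# Hypothetical cell means from a 2x2 design; keys are (level of A, level of B).
means = {
    ("a1", "b1"): 20.0, ("a1", "b2"): 20.0,
    ("a2", "b1"): 30.0, ("a2", "b2"): 10.0,
}


def simple_effect_of_a(b):
    """Effect of moving A from a1 to a2 while B is held at level b."""
    return means[("a2", b)] - means[("a1", b)]


main_effect_a = (simple_effect_of_a("b1") + simple_effect_of_a("b2")) / 2
interaction = simple_effect_of_a("b2") - simple_effect_of_a("b1")  # difference of differences

print(simple_effect_of_a("b1"))  # 10.0
print(simple_effect_of_a("b2"))  # -10.0
print(main_effect_a)             # 0.0
print(interaction)               # -20.0
```

A study of A alone run at b1 would report +10, one run at b2 would report -10, and averaging over B suggests no effect at all; only crossing the two IVs shows that the effect of A depends on B.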

  24. Dependent variables • What and how to measure? – Construct validity, again! – Performance (time, errors, FP/FN, etc.) – Opinions/attitude – Audio recording, screen capture, keystrokes, copy-pasting behavior, etc. – Demographics • Multiple measures toward higher-level construct?

  25. NOT JUST EXPERIMENTS

  26. Kinds of measurement studies • Experimental • Observational/correlational • Quasi-experimental

  27. Observational/correlational • Observe that X and Y (don’t) increase and decrease together / in opposition • Researcher doesn’t apply any control or treatment; just measures incidence – Does lead exposure correlate with crime rate? • Directionality and third variables are both issues

  28. Quasi-experiments • Subset of observational studies • Can’t randomize assignment • But the experimenter controls something • (Diagram: Group 1 receives the treatment; Group 2 does not.)

  29. Observational examples • Cohort study • Regression discontinuity • BIBIFI example

  30. Pluses and minuses • Can measure things that simply can’t be done with true experiments • In general, association at best – Causality very hard to establish – Some statistical techniques exist to help • Low internal validity – Can you maximize it within the available constraints?
