Experimental design (continued) Spring 2017 Michelle Mazurek

Experimental design (continued) Spring 2017 Michelle Mazurek Some content adapted from Bilge Mutlu, Vibha Sazawal, Howard Seltman 1

Administrative • No class Tuesday • Homework 1 • Plug for Tanu Mitra grad student session 2

Today’s class • Finish threats to validity • Experimental design / choices • Alternatives to experiments 3

Quick review • Internal validity: causality – Isolate variable of interest – Randomized assignment • External validity – Representative sample – Representative environment/task/analysis • Valid constructs – Measure something meaningful – Reliable 4

Know what you’re measuring • Especially when dealing with large-scale data from the internet – What are you missing? What is duplicated? – What is the precision and accuracy of the data? – Are you capturing what you think you’re capturing? – *Vantage point* – Representativeness / diversity 5

Calibrating constructs • Examine outliers and spikes • Check for self-consistency • Compare multiple measures – Multiple datasets – Multiple ways of calculating a value • Test with synthetic data • Check longitudinal data periodically! 6

Mis-measurements, now what? • Discard? (Why might this be bad?) – Discard outliers? Definition? • Use an explicit adjustment? 7
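
One way to make the "Definition?" question concrete is to pick an explicit rule and flag points rather than silently dropping them. The sketch below uses Tukey's fences (1.5 × IQR), which is only one of many possible definitions and is not taken from the slides; the timing data and function names are hypothetical.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Pair every value with True if it falls outside the fences."""
    low, high = tukey_fences(values, k)
    return [(v, not (low <= v <= high)) for v in values]

# Hypothetical task-completion times in seconds (illustration only).
times = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7, 48.0, 12.2]
flagged = flag_outliers(times)
outliers = [v for v, is_out in flagged if is_out]
print(outliers)
```

Flagging keeps the decision to discard or adjust visible in the analysis log, so it can be reported as a limitation.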

Other measurement notes • (Don’t really fit here, but from Paxson paper) • Metadata and good analysis logging is critical! • Be clear about unknowns and limitations 8

4. Power • Power: Likelihood that if there’s a real effect, you will find it. • Why might you not find it? – Sample size – Effect size – Missing explanatory variables – Variability 9
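
The dependence of power on sample size, effect size, and variability can be illustrated with a small simulation. This is a rough sketch, assuming normally distributed outcomes, two equal-sized groups, and a two-sided test at the 5% level using a normal approximation (critical value 1.96). The function name and parameters are illustrative, not from the slides.

```python
import random
import statistics

def estimate_power(n_per_group, effect, sd=1.0, sims=1000, seed=0):
    """Fraction of simulated experiments in which the effect is detected."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% level, normal approximation (assumption)
    hits = 0
    for _ in range(sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        # Standard error of the difference between the two group means.
        se = (statistics.variance(control) / n_per_group
              + statistics.variance(treated) / n_per_group) ** 0.5
        if abs(diff / se) > z_crit:
            hits += 1
    return hits / sims

if __name__ == "__main__":
    for n in (20, 50, 100):
        print(n, estimate_power(n, effect=0.5))
```

Raising `n_per_group` or `effect`, or lowering `sd`, should raise the estimated power, which mirrors the first, second, and fourth items in the list above.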

Promote power • Covariates: Measure possible confounds, include in analysis • Use reliable measurements • Control the environment • Potential tradeoff: Generalizability for power – E.g., limit variability between subjects 10

EXPERIMENTAL DESIGN 11

Some important decisions • What is the hypothesis? • Between or within subjects? • What treatment levels / conditions? • What dependent variables to measure? 12

Good hypothesis design • Predicted relationship between (at least) 2 vars – Testable, falsifiable • Operational – Vars are clearly defined – Relationship / how you measure it clearly defined 13

Good hypothesis design (cont.) • Justified – Exploratory results – Theory in related area – Well justified intuition? • Parsimonious 14

Between vs. Within • Between: Each participant belongs to exactly one condition • Within: Each participant belongs to multiple conditions 15
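
For a between-subjects design, randomized assignment with balanced group sizes can be sketched as below. This is a minimal illustration, assuming participants are identified by hashable IDs; `assign_between` and the condition names are hypothetical.

```python
import random

def assign_between(participants, conditions, seed=None):
    """Randomly assign each participant to exactly one condition.

    Shuffling first and then dealing round-robin keeps group sizes
    within one participant of each other.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}

if __name__ == "__main__":
    ids = [f"P{i:02d}" for i in range(1, 13)]  # hypothetical participant IDs
    print(assign_between(ids, ["control", "treatment_a", "treatment_b"], seed=42))
```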

Between vs. Within • Between: More participants; cleaner/less bias • Within: More time each; more power (less variability subj-subj) 16

Improving on between-subjects • Matching: Get like participants for each condition • Pro: reduces variability • Con: Hard to find; what do you match on? • In general, be very cautious 17

Improving on within-subjects • Ordering effects can be HUGE – Learning, fatigue – Range effects: learn most for closest conditions • Mitigate via counterbalancing – All possible orders – Balanced Latin square, e.g. one order per row: A B C D / C A D B / B D A C / D C B A 18
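
A balanced Latin square for an even number of conditions can be generated with a standard construction: build the first row as 0, 1, n-1, 2, n-2, ... and shift it by one for each later row. The sketch below is an illustration, not the method used to produce the slide's square; the function names are made up here, and the construction as written only covers even n.

```python
from collections import Counter

def balanced_latin_square(n):
    """Return n orderings of conditions 0..n-1 (n must be even)."""
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    first = [(j + 1) // 2 if j % 2 else (n - j // 2) % n for j in range(n)]
    return [[(x + i) % n for x in first] for i in range(n)]

def is_latin(square):
    """Each condition appears once per row and once per position."""
    n = len(square)
    full = set(range(n))
    return (all(set(row) == full for row in square)
            and all({row[k] for row in square} == full for k in range(n)))

def is_balanced(square):
    """Each ordered pair of distinct conditions is adjacent exactly once."""
    n = len(square)
    pairs = Counter((row[k], row[k + 1]) for row in square for k in range(n - 1))
    return len(pairs) == n * (n - 1) and all(c == 1 for c in pairs.values())

if __name__ == "__main__":
    labels = "ABCD"
    for row in balanced_latin_square(4):
        print(" ".join(labels[x] for x in row))
```

The square this produces differs from the one on the slide, but both satisfy the same property: every condition immediately follows every other condition exactly once, so first-order carryover is spread evenly across conditions.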

Counterbalancing doesn’t fix: • Range effects (most average treatment) • Context effects (what most participants are more familiar with) 19

Mixed models are also possible • Everyone gets the same three tasks • Order of tasks varies • Tool with which to execute tasks varies 20

Selecting conditions • How many IVs? – Password meter example • How many / which levels for each? – Cannot infer anything about levels you didn’t test 21

Full-factorial (or not) • Full-factorial: All possible combinations of all IVs – And all orderings? • Not: Only a subset – Selected how? – Recall: Vary at most one thing each time! • Planned comparisons! 22
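
Enumerating a full-factorial design makes it easy to see how quickly the number of conditions grows. The sketch below uses two hypothetical IVs loosely inspired by a password-meter study; the factor names and levels are assumptions for illustration only.

```python
from itertools import product

# Hypothetical IVs and levels (not taken from the slides).
factors = {
    "meter": ["none", "bar"],
    "min_length": [8, 12, 16],
}

# Every combination of one level per IV is one condition.
conditions = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for c in conditions:
    print(c)
print(len(conditions), "conditions")
```

With 2 and 3 levels this gives 2 × 3 = 6 conditions. Adding another IV multiplies that count again, which is one reason to consider a planned subset instead.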

Why multivariate? • What is different between running one experiment with two IVs vs. two experiments with one IV each? • Interaction effects! 23
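
An interaction is what two separate one-IV experiments cannot show: whether the effect of one IV changes with the level of the other. Below is a minimal sketch for a 2×2 design computed from cell means. The numbers are invented for illustration, and `two_by_two_effects` is a hypothetical helper.

```python
def two_by_two_effects(means):
    """means[(a, b)] is the mean DV at level a of IV A and level b of IV B."""
    effect_a_at_b0 = means[(1, 0)] - means[(0, 0)]
    effect_a_at_b1 = means[(1, 1)] - means[(0, 1)]
    effect_b_at_a0 = means[(0, 1)] - means[(0, 0)]
    effect_b_at_a1 = means[(1, 1)] - means[(1, 0)]
    return {
        "main_a": (effect_a_at_b0 + effect_a_at_b1) / 2,
        "main_b": (effect_b_at_a0 + effect_b_at_a1) / 2,
        # Nonzero when the effect of A depends on the level of B.
        "interaction": effect_a_at_b1 - effect_a_at_b0,
    }

# Invented cell means, for illustration only.
cell_means = {(0, 0): 10, (1, 0): 12, (0, 1): 11, (1, 1): 19}
effects = two_by_two_effects(cell_means)
print(effects)
```

Here A raises the DV by 2 when B is at level 0 but by 8 when B is at level 1. Two single-IV experiments, each holding the other IV at level 0, would report effects of 2 and 1 and miss this entirely.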

Dependent variables • What and how to measure? – Construct validity, again! – Performance (time, errors, FP/FN, etc.) – Opinions/attitude – Audio recording, screen capture, keystrokes, copy-pasting behavior, etc. – Demographics • Multiple measures toward higher-level construct? 24

NOT JUST EXPERIMENTS 25

Kinds of measurement studies • Experimental • Observational/correlational • Quasi-experimental 26

Observational/correlational • Observe that X and Y (don’t) increase and decrease together / in opposition • Researcher doesn’t apply any control or treatment: just measures incidence – Does lead exposure correlate with crime rate? • Directionality and third-variable problems are both issues 27
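
The third-variable problem can be demonstrated with synthetic data: if an unobserved Z drives both X and Y, the two correlate strongly even though neither causes the other. This is a sketch on simulated data only; the variable names, noise levels, and seed are arbitrary assumptions.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(2000)]   # hidden common cause
x = [zi + rng.gauss(0, 0.5) for zi in z]     # X depends only on Z
y = [zi + rng.gauss(0, 0.5) for zi in z]     # Y depends only on Z

r_xy = pearson(x, y)
# Removing Z's contribution (exact here because X and Y are built as Z + noise).
r_resid = pearson([xi - zi for xi, zi in zip(x, z)],
                  [yi - zi for yi, zi in zip(y, z)])
print(round(r_xy, 2), round(r_resid, 2))
```

The raw correlation is high while the correlation after removing Z is close to zero. In real observational data Z is usually unmeasured, so this check is not available.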

Quasi-experiments • Subset of observational studies • Can’t randomize assignment • But, experimenter controls something [Figure: Group 1 → Treatment → Group 1; Group 2 → Group 2] 28

Observational examples • Cohort study • Regression discontinuity • BIBIFI example 29

Pluses and minuses • Can measure things that simply can’t be done with true experiments • In general, association at best – causality very hard to establish – Some statistical techniques to help exist • Low internal validity – can you maximize it within the available constraints? 30

