Why experimenters should not randomize, and what they should do instead
Maximilian Kasy
Department of Economics, Harvard University
Maximilian Kasy (Harvard) Experimental design 1 / 42
Introduction: Project STAR
Covariate means within school for the actual (D) and the optimal (D*) treatment assignment:

School 16     D* = 0   D* = 1    D = 0    D = 1
girl            0.42     0.54     0.46     0.41
black           1.00     1.00     1.00     1.00
birth date   1980.18  1980.48  1980.24  1980.27
free lunch      0.98     1.00     0.98     1.00
n                123       37      123       37

School 38     D* = 0   D* = 1    D = 0    D = 1
girl            0.45     0.60     0.49     0.47
black           0.00     0.00     0.00     0.00
birth date   1980.15  1980.30  1980.19  1980.17
free lunch      0.86     0.33     0.73     0.73
n                 49       15       49       15
Introduction: Some intuitions
- "Compare apples with apples" ⇒ balance the covariate distribution, not just the means!
- We do not add random noise to estimators, so why add random noise to experimental designs?
- Optimal design for STAR: 19% reduction in mean squared error relative to the actual assignment, equivalent to a 9% increase in sample size, or 773 students.
Introduction: Some context, a very brief history of experiments
How to ensure we compare apples with apples?
1. Physics (Galileo, ...): controlled experiments, not much heterogeneity, no self-selection ⇒ no randomization necessary.
2. Modern RCTs (Fisher, Neyman, ...): observationally homogeneous units with unobserved heterogeneity ⇒ randomized controlled trials (the setup for most of the experimental design literature).
3. Medicine, economics: lots of unobserved and observed heterogeneity ⇒ the topic of this talk.
Introduction: The setup
1. Sampling: random sample of n units; baseline survey ⇒ vector of covariates X_i.
2. Treatment assignment: binary treatment assigned by D_i = d_i(X, U), where X is the matrix of covariates and U a randomization device.
3. Realization of outcomes: Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0.
4. Estimation: estimator β̂ of the (conditional) average treatment effect
   β = (1/n) ∑_i E[Y_i^1 - Y_i^0 | X_i, θ].
Introduction: Questions
- How should we assign treatment? In particular, what if X has continuous or many discrete components?
- How should we estimate β?
- What is the role of prior information?
Introduction: Framework proposed in this talk
1. Decision theoretic: d and β̂ minimize risk R(d, β̂ | X) (e.g., expected squared error).
2. Nonparametric: no functional form assumptions.
3. Bayesian: R(d, β̂ | X) averages expected loss over a prior; the prior is a distribution over the functions x → E[Y_i^d | X_i = x, θ].
4. Non-informative: limit of risk functions under priors such that Var(β) → ∞.
Introduction: Main results
1. The unique optimal treatment assignment does not involve randomization.
2. Identification using conditional independence is still guaranteed without randomization.
3. Tractable nonparametric priors.
4. Explicit expressions for risk as a function of treatment assignment ⇒ choose d to minimize these.
5. MATLAB code to find the optimal treatment assignment.
6. Magnitude of gains: between 5% and 20% reduction in MSE relative to randomization, for realistic parameter values in simulations; for project STAR, a 19% gain relative to the actual assignment.
Introduction: Roadmap
1. Motivating examples
2. Formal decision problem and the optimality of non-randomized designs
3. Nonparametric Bayesian estimators and risk
4. Choice of prior parameters
5. Discrete optimization, and how to use my MATLAB code
6. Simulation results and application to project STAR
7. Outlook: optimal policy and statistical decisions
Introduction: Notation
- Random variables: X_i, D_i, Y_i.
- Values of the corresponding variables: x, d, y.
- Matrices/vectors for observations i = 1, ..., n: X, D, Y; vector of values: d.
- Shorthand for the data generating process: θ.
- "Frequentist" probabilities and expectations: conditional on θ.
- "Bayesian" probabilities and expectations: unconditional.
Introduction: Example 1, no covariates
n_d := ∑_i 1(D_i = d), σ_d² := Var(Y_i^d | θ)

β̂ := ∑_i [ (D_i / n_1) Y_i - ((1 - D_i) / (n - n_1)) Y_i ]

Two alternative designs:
1. Randomization conditional on n_1
2. Complete randomization: D_i i.i.d., P(D_i = 1) = p

Corresponding estimator variances:
1. n_1 fixed ⇒ σ_1²/n_1 + σ_0²/(n - n_1)
2. n_1 random ⇒ E[ σ_1²/n_1 + σ_0²/(n - n_1) ]

Choosing the (unique) minimizing n_1 is optimal; we are indifferent about which of the observationally equivalent units get treatment.
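The optimal n_1 in the first design can be found by direct search over the variance expression above. A minimal Python sketch (illustrative only; the talk's own code is in MATLAB), which recovers the familiar Neyman allocation n_1/n ≈ σ_1/(σ_1 + σ_0):

```python
import numpy as np

def optimal_n1(sigma1, sigma0, n):
    """Return n_1 in {1, ..., n-1} minimizing sigma_1^2/n_1 + sigma_0^2/(n - n_1)."""
    n1 = np.arange(1, n)
    variance = sigma1**2 / n1 + sigma0**2 / (n - n1)
    return int(n1[np.argmin(variance)])

# With sigma_1 twice sigma_0, roughly 2/3 of the units should be treated
n1_star = optimal_n1(2.0, 1.0, 90)
```

Any deterministic assignment with exactly n1_star treated units attains the minimal variance, which is the indifference noted on the slide.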
Introduction: Example 2, discrete covariate
X_i ∈ {0, ..., k}, n_x := ∑_i 1(X_i = x), n_{d,x} := ∑_i 1(X_i = x, D_i = d), σ_{d,x}² := Var(Y_i^d | X_i = x, θ)

β̂ := ∑_x (n_x / n) ∑_i 1(X_i = x) [ (D_i / n_{1,x}) Y_i - ((1 - D_i) / (n_x - n_{1,x})) Y_i ]

Three alternative designs:
1. Stratified randomization, conditional on n_{d,x}
2. Randomization conditional on n_d = ∑_i 1(D_i = d)
3. Complete randomization
Introduction: Corresponding estimator variances
1. Stratified, n_{d,x} fixed ⇒
   V({n_{d,x}}) := ∑_x (n_x / n)² [ σ_{1,x}²/n_{1,x} + σ_{0,x}²/(n_x - n_{1,x}) ]
2. n_{d,x} random but n_d = ∑_x n_{d,x} fixed ⇒ E[ V({n_{d,x}}) | ∑_x n_{1,x} = n_1 ]
3. n_{d,x} and n_d random ⇒ E[ V({n_{d,x}}) ]

⇒ Choosing the unique minimizing {n_{d,x}} is optimal.
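For the stratified design, the optimal within-stratum treatment counts {n_{1,x}} can again be found by search. A Python sketch (illustrative, not the talk's MATLAB code); the variance computed is the stratified difference-in-means variance with weights (n_x/n)²:

```python
import numpy as np
from itertools import product

def stratified_variance(n_x, n1_x, sig1_x, sig0_x):
    """Variance of the stratified difference-in-means estimator, given
    stratum sizes n_x, treated counts n1_x, and outcome sds per stratum."""
    n_x = np.asarray(n_x, dtype=float)
    n1_x = np.asarray(n1_x, dtype=float)
    sig1_x = np.asarray(sig1_x, dtype=float)
    sig0_x = np.asarray(sig0_x, dtype=float)
    n = n_x.sum()
    return float(np.sum((n_x / n) ** 2
                        * (sig1_x**2 / n1_x + sig0_x**2 / (n_x - n1_x))))

def optimal_strata_allocation(n_x, sig1_x, sig0_x):
    """Brute-force search over n_{1,x} in each stratum (feasible for small strata)."""
    best, best_v = None, np.inf
    for n1 in product(*(range(1, nx) for nx in n_x)):
        v = stratified_variance(n_x, n1, sig1_x, sig0_x)
        if v < best_v:
            best, best_v = n1, v
    return best, best_v

# Two strata of 10 units; stratum 0 has the noisier treatment arm
alloc, v = optimal_strata_allocation([10, 10], [2.0, 1.0], [1.0, 1.0])
```

Because the variance is additively separable across strata, the brute-force optimum coincides with applying the Neyman allocation stratum by stratum.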
Introduction: Example 3, continuous covariate
X_i ∈ ℝ continuously distributed ⇒ no two observations share the same X_i!

Alternative designs:
1. Complete randomization
2. Randomization conditional on n_d
3. Discretize and stratify: choose bins [x_j, x_{j+1}], set X̃_i = ∑_j j · 1(X_i ∈ [x_j, x_{j+1}]), and stratify based on X̃_i
4. Special case: pairwise randomization
5. "Fully stratify" (but what does that mean???)
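Design 3 (discretize and stratify) can be sketched as follows; the quantile binning and the even treated/control split within each bin are illustrative assumptions for the example, not prescriptions from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def discretize_and_stratify(x, n_bins=4):
    """Bin a continuous covariate into quantile bins, then within each bin
    assign half of the units (rounding down) to treatment at random."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    d = np.zeros(len(x), dtype=int)
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        treated = rng.choice(idx, size=len(idx) // 2, replace=False)
        d[treated] = 1
    return d

x = rng.normal(size=100)
d = discretize_and_stratify(x)
```

Pairwise randomization is the limiting case of this scheme with bins of size two.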
Introduction: Some references
- Optimal design of experiments: Smith (1918); Kiefer and Wolfowitz (1959); Cox and Reid (2000); Shah and Sinha (1989)
- Nonparametric estimation of treatment effects: Imbens (2004)
- Gaussian process priors: Wahba (1990) (splines); Matheron (1973), Yakowitz and Szidarovszky (1985) ("kriging" in geostatistics); Williams and Rasmussen (2006) (machine learning)
- Bayesian statistics and design: Robert (2007); O'Hagan and Kingman (1978); Berry (2006)
- Simulated annealing: Kirkpatrick et al. (1983)
Decision problem: A formal decision problem
Risk function of treatment assignment d(X, U) and estimator β̂, under loss L and data generating process θ:

  R(d, β̂ | X, U, θ) := E[L(β̂, β) | X, U, θ]   (1)

(d affects the distribution of β̂.)

(Conditional) Bayesian risk:

  R^B(d, β̂ | X, U) := ∫ R(d, β̂ | X, U, θ) dP(θ)      (2)
  R^B(d, β̂ | X)    := ∫ R^B(d, β̂ | X, U) dP(U)       (3)
  R^B(d, β̂)        := ∫ R^B(d, β̂ | X, U) dP(X) dP(U) (4)

Conditional minimax risk:

  R^mm(d, β̂ | X, U) := max_θ R(d, β̂ | X, U, θ)   (5)

Objective: min R^B or min R^mm
Decision problem: Optimality of deterministic designs
Theorem. Given β̂(Y, X, D):
1. d*(X) ∈ argmin_{d(X) ∈ {0,1}^n} R^B(d, β̂ | X)   (6)
   minimizes R^B(d, β̂) among all d(X, U) (random or not).
2. Suppose R^B(d¹, β̂ | X) - R^B(d², β̂ | X) is continuously distributed for all d¹ ≠ d² ⇒ d*(X) is the unique minimizer of (6).
3. Similar claims hold for R^mm(d, β̂ | X, U), if the latter is finite.

Intuition: similar to why estimators should not randomize. R^B(d, β̂ | X, U) does not depend on U ⇒ neither do its minimizers d*, β̂*.
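The first claim rests on a simple observation: the Bayes risk of any randomized design is a mixture over the risks of deterministic assignments, so it can never fall below the risk of the best deterministic one. A toy numeric check (the risk values here are made-up stand-ins, not the paper's actual risk function):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n = 8

# Stand-in Bayes risk R^B(d | X) for each of the 2^n deterministic assignments
risk = {d: rng.uniform() for d in product([0, 1], repeat=n)}

# Best deterministic assignment d*
d_star = min(risk, key=risk.get)

# A randomized design is a distribution over assignments; its risk is the mixture
weights = rng.dirichlet(np.ones(len(risk)))
mixed_risk = float(np.dot(weights, list(risk.values())))

assert risk[d_star] <= mixed_risk  # randomization cannot beat the deterministic optimum
```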
Decision problem: Conditional independence
Theorem. Assume i.i.d. sampling, stable unit treatment values, and D = d(X, U) for U ⊥ (Y^0, Y^1, X) | θ. Then conditional independence holds:

  P(Y_i | X_i, D_i = d_i, θ) = P(Y_i^{d_i} | X_i, θ).

This is true in particular for deterministic treatment assignment rules D = d(X).

Intuition: under i.i.d. sampling, P(Y_i^{d_i} | X, θ) = P(Y_i^{d_i} | X_i, θ).
Nonparametric Bayes: Nonparametric Bayes
Let f(X_i, D_i) = E[Y_i | X_i, D_i, θ].

Assumption (Prior moments)
  E[f(x, d)] = μ(x, d)
  Cov(f(x¹, d¹), f(x², d²)) = C((x¹, d¹), (x², d²))

Assumption (Mean squared error objective)
  Loss L(β̂, β) = (β̂ - β)², Bayes risk R^B(d, β̂ | X) = E[(β̂ - β)² | X]

Assumption (Linear estimators)
  β̂ = w_0 + ∑_i w_i Y_i, where the w_i may depend on X and on D, but not on Y.
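Under these assumptions the Bayes estimator is the posterior best linear predictor, β̂ = E[β] + Cov(β, Y) Var(Y)^{-1} (Y - E[Y]). A Python sketch of the weight computation with an illustrative squared-exponential prior covariance (the kernel, the noise variance, and the zero prior mean are assumptions for the example, not choices made in the talk):

```python
import numpy as np

def squared_exp_cov(x1, d1, x2, d2, length=1.0, var=1.0):
    """Illustrative prior covariance C((x1,d1),(x2,d2)): a squared-exponential
    kernel in x, with f(., 0) and f(., 1) taken as independent."""
    return var * np.exp(-(x1 - x2) ** 2 / (2 * length**2)) * (d1 == d2)

def posterior_mean_weights(X, D, noise_var=0.5):
    """Weights w with beta_hat = w' Y, for beta = (1/n) sum_i f(X_i,1) - f(X_i,0),
    assuming prior mean mu = 0 and i.i.d. outcome noise."""
    n = len(X)
    # Var(Y) = prior covariance of f at the observed (X_i, D_i) plus noise
    K = np.array([[squared_exp_cov(X[i], D[i], X[j], D[j]) for j in range(n)]
                  for i in range(n)]) + noise_var * np.eye(n)
    # Cov(beta, Y_i) = (1/n) sum_j [C((X_j,1),(X_i,D_i)) - C((X_j,0),(X_i,D_i))]
    c = np.array([np.mean([squared_exp_cov(X[j], 1, X[i], D[i])
                           - squared_exp_cov(X[j], 0, X[i], D[i])
                           for j in range(n)]) for i in range(n)])
    return np.linalg.solve(K, c)

X = np.array([0.0, 0.5, 1.0, 1.5])
D = np.array([1, 0, 1, 0])
w = posterior_mean_weights(X, D)
```

With this kernel the treated observations receive positive weight and the controls negative weight, mirroring the difference-in-means structure of the earlier examples.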