What is to be done? Two attempts using Gaussian process priors

Maximilian Kasy
Department of Economics, Harvard University

October 14, 2017
What questions should econometricians work on?

◮ Incentives of the publication process:
  ◮ Appeal to referees from the same subfield.
  ◮ Danger of self-referentiality, untethering from external relevance.
◮ Versus broader usefulness:
  ◮ Tools useful for empirical researchers and policy makers.
  ◮ Anchored in substantive applications and broader methodological considerations.
◮ One way to get there: well-defined decision problems.
Decision problems

◮ Objects to choose carefully:
  ◮ Objective function.
  ◮ Space of possible decisions / policy alternatives.
  ◮ Identifying assumptions.
  ◮ Prior information.
  ◮ Features the prior should be uninformative about.
◮ Once these are specified, coherent and well-behaved solutions can be derived.
◮ A useful tool for tractable solutions without functional-form restrictions: Gaussian process priors.
Outline of this talk

◮ Brief introduction to Gaussian process regression.
◮ Application 1: Optimal treatment assignment in experiments.
  ◮ Setting: treatment assignment given baseline covariates.
  ◮ General decision theory result: non-random rules dominate random rules.
  ◮ Prior for the expectation of potential outcomes given covariates.
  ◮ Expression for the MSE of the ATE estimator, to be minimized by treatment assignment.
◮ Application 2: Optimal insurance and taxation.
  ◮ Economic setting: co-insurance rate for health insurance.
  ◮ Statistical setting: prior for the behavioral average response function.
  ◮ Expression for posterior expected social welfare, to be maximized by choice of the co-insurance rate.
References

Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, chapter 2.

Kasy, M. (2016). Why experimenters might not always want to randomize, and what they could do instead. Political Analysis, 24(3):324–338.

Kasy, M. (2017). Optimal taxation and insurance using machine learning. Working paper, Harvard University.
Brief introduction to Gaussian process regression

◮ Suppose we observe $n$ i.i.d. draws of $(Y_i, X_i)$, where $Y_i$ is real-valued and $X_i$ is a $k$-vector.
◮ Model: $Y_i = f(X_i) + \varepsilon_i$, with $\varepsilon_i \mid X, f(\cdot) \sim N(0, \sigma^2)$.
◮ Prior: $f$ is distributed according to a Gaussian process, $f \mid X \sim GP(0, C)$, where $C$ is a covariance kernel, $\operatorname{Cov}(f(x), f(x') \mid X) = C(x, x')$.
◮ We will leave conditioning on $X$ implicit.
(A small simulation sketch follows below.)
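The slides contain no code; as a minimal sketch of what this prior means in practice, the following draws functions from $GP(0, C)$. The squared-exponential kernel, length scale, and evaluation grid are illustrative assumptions, not choices from the talk.

```python
import numpy as np

def sq_exp_kernel(x1, x2, ls=0.3):
    """Illustrative squared-exponential covariance kernel C(x, x')."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls ** 2)

# Three draws of f from the GP(0, C) prior, evaluated on a grid;
# a small jitter keeps the covariance matrix numerically positive definite.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
C = sq_exp_kernel(x, x) + 1e-10 * np.eye(len(x))
f_draws = rng.multivariate_normal(np.zeros(len(x)), C, size=3)
```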
Posterior mean

◮ The joint distribution of $(f(x), Y)$ is given by
$$\begin{pmatrix} f(x) \\ Y \end{pmatrix} \sim N\left( 0, \begin{pmatrix} C(x, x) & c(x) \\ c(x)' & \mathbf{C} + \sigma^2 I_n \end{pmatrix} \right),$$
where
  ◮ $c(x)$ is the $n$-vector with entries $C(x, X_i)$,
  ◮ and $\mathbf{C}$ is the $n \times n$ matrix with entries $\mathbf{C}_{i,j} = C(X_i, X_j)$.
◮ Therefore
$$E[f(x) \mid Y] = c(x) \cdot \left( \mathbf{C} + \sigma^2 I_n \right)^{-1} \cdot Y.$$
◮ Read: $\widehat{f}(\cdot) = E[f(\cdot) \mid Y]$
  ◮ is a linear combination of the functions $C(\cdot, X_i)$
  ◮ with weights $\left( \mathbf{C} + \sigma^2 I_n \right)^{-1} \cdot Y$.
(A code sketch follows below.)
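A minimal numerical sketch of this formula, reusing the illustrative squared-exponential kernel from above (the data-generating process in the usage example is made up):

```python
import numpy as np

def sq_exp_kernel(x1, x2, ls=0.3):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls ** 2)

def gp_posterior_mean(x_new, X, Y, sigma2, kernel=sq_exp_kernel):
    """E[f(x) | Y] = c(x) (C + sigma^2 I_n)^{-1} Y: a linear combination
    of the functions C(., X_i) with weights (C + sigma^2 I_n)^{-1} Y."""
    C = kernel(X, X)                                # C_{ij} = C(X_i, X_j)
    weights = np.linalg.solve(C + sigma2 * np.eye(len(X)), Y)
    return kernel(x_new, X) @ weights               # row for point x is c(x)

# Toy usage: recover a smooth function from noisy observations.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 40)
Y = np.sin(4 * X) + rng.normal(0, 0.2, size=40)
f_hat = gp_posterior_mean(np.linspace(0, 1, 100), X, Y, sigma2=0.04)
```

Solving the linear system once and reusing the weights for every evaluation point reflects the "linear combination of $C(\cdot, X_i)$" reading above.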
Both applications use Gaussian process priors

1. Optimal experimental design
  ◮ How to assign treatment to minimize the mean squared error of treatment effect estimators?
  ◮ Gaussian process prior for the conditional expectation of potential outcomes given covariates.
2. Optimal insurance and taxation
  ◮ How to choose a co-insurance rate or tax rate to maximize social welfare, given (quasi-)experimental data?
  ◮ Gaussian process prior for the behavioral response function mapping the co-insurance rate into the tax base.
Application 1: "Why experimenters might not always want to randomize"

Setup
1. Sampling: random sample of $n$ units; a baseline survey yields a vector of covariates $X_i$.
2. Treatment assignment: binary treatment assigned by $D_i = d_i(X, U)$, where $X$ is the matrix of covariates and $U$ a randomization device.
3. Realization of outcomes: $Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0$.
4. Estimation: estimator $\widehat{\beta}$ of the (conditional) average treatment effect,
$$\beta = \frac{1}{n} \sum_i E[Y_i^1 - Y_i^0 \mid X_i, \theta].$$
Questions

◮ How should we assign treatment?
◮ In particular, what if $X_i$ has continuous or many discrete components?
◮ How should we estimate $\beta$?
◮ What is the role of prior information?
Some intuition

◮ "Compare apples with apples" ⇒ balance the covariate distribution.
◮ Not just balance of means!
◮ We don't add random noise to estimators; why add random noise to experimental designs?
◮ Identification requires controlled trials (CTs), but not randomized controlled trials (RCTs).
General decision problem allowing for randomization

◮ General decision problem:
  ◮ State of the world $\theta$, observed data $X$, randomization device $U \perp X$.
  ◮ Decision procedure $\delta(X, U)$, loss $L(\delta(X, U), \theta)$.
◮ Conditional expected loss of decision procedure $\delta(X, U)$:
$$R(\delta, \theta \mid U = u) = E[L(\delta(X, u), \theta) \mid \theta].$$
◮ Bayes risk:
$$R^B(\delta, \pi) = \iint R(\delta, \theta \mid U = u) \, d\pi(\theta) \, dP(u).$$
◮ Minimax risk:
$$R^{mm}(\delta) = \int \max_\theta R(\delta, \theta \mid U = u) \, dP(u).$$
Theorem (Optimality of deterministic decisions)

Consider a general decision problem. Let $R^*$ equal $R^B$ or $R^{mm}$. Then:
1. The optimal risk $R^*(\delta^*)$, when considering only deterministic procedures $\delta(X)$, is no larger than the optimal risk when allowing for randomized procedures $\delta(X, U)$.
2. If the optimal deterministic procedure $\delta^*$ is unique, then it has strictly lower risk than any non-trivial randomized procedure.
Proof

◮ Any probability distribution $P(u)$ satisfies $\sum_u P(u) = 1$ and $P(u) \ge 0$ for all $u$.
◮ Thus $\sum_u R_u \cdot P(u) \ge \min_u R_u$ for any set of values $R_u$.
◮ Let $\delta_u(x) = \delta(x, u)$. Then
$$R^B(\delta, \pi) = \sum_u \int R(\delta_u, \theta) \, d\pi(\theta) \, P(u) \ge \min_u \int R(\delta_u, \theta) \, d\pi(\theta) = \min_u R^B(\delta_u, \pi).$$
◮ Similarly,
$$R^{mm}(\delta) = \sum_u \max_\theta R(\delta_u, \theta) \, P(u) \ge \min_u \max_\theta R(\delta_u, \theta) = \min_u R^{mm}(\delta_u).$$
Bayesian setup

◮ Back to the experimental design setting.
◮ Conditional distribution of potential outcomes: for $d = 0, 1$,
$$Y_i^d \mid X_i = x \sim N(f(x, d), \sigma^2).$$
◮ Gaussian process prior: $f \sim GP(\mu, C)$, with
$$E[f(x, d)] = \mu(x, d), \qquad \operatorname{Cov}(f(x_1, d_1), f(x_2, d_2)) = C((x_1, d_1), (x_2, d_2)).$$
◮ Conditional average treatment effect (CATE):
$$\beta = \frac{1}{n} \sum_i E[Y_i^1 - Y_i^0 \mid X_i, \theta] = \frac{1}{n} \sum_i f(X_i, 1) - f(X_i, 0).$$
Notation

◮ Covariance matrix $\mathbf{C}$, with entries $\mathbf{C}_{i,j} = C((X_i, D_i), (X_j, D_j))$.
◮ Mean vector $\mu$, with components $\mu_i = \mu(X_i, D_i)$.
◮ Vector $\overline{C}$ of covariances of the observations with the CATE, with entries
$$\overline{C}_i = \operatorname{Cov}(Y_i, \beta \mid X, D) = \frac{1}{n} \sum_j \left( C((X_i, D_i), (X_j, 1)) - C((X_i, D_i), (X_j, 0)) \right).$$
Posterior expectation and risk

◮ The posterior expectation $\widehat{\beta}$ of $\beta$ equals
$$\widehat{\beta} = \mu_\beta + \overline{C}' \cdot (\mathbf{C} + \sigma^2 I)^{-1} \cdot (Y - \mu).$$
◮ The corresponding risk equals
$$R^B(d, \widehat{\beta} \mid X) = \operatorname{Var}(\beta \mid X, Y) = \operatorname{Var}(\beta \mid X) - \operatorname{Var}(E[\beta \mid X, Y] \mid X) = \operatorname{Var}(\beta \mid X) - \overline{C}' \cdot (\mathbf{C} + \sigma^2 I)^{-1} \cdot \overline{C}.$$
(A numerical sketch follows below.)
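A hedged numerical sketch of this risk. The product kernel on $(x, d)$ pairs is an illustrative assumption, not the paper's specification; the formulas otherwise transcribe the slide.

```python
import numpy as np

def kernel_xd(x1, d1, x2, d2, ls=1.0):
    """Illustrative covariance kernel on (x, d) pairs: squared-exponential in x
    times a simple positive-definite kernel in the treatment indicator."""
    k_x = np.exp(-0.5 * np.sum((np.asarray(x1) - np.asarray(x2)) ** 2) / ls ** 2)
    return k_x * (1.0 + d1 * d2)

def cate_posterior_risk(X, d, sigma2, kernel=kernel_xd):
    """Posterior risk Var(beta | X, Y) = Var(beta | X) - Cbar' (C + sigma^2 I)^{-1} Cbar."""
    n = len(d)
    K = lambda a, b: np.array([[kernel(X[i], a[i], X[j], b[j]) for j in range(n)]
                               for i in range(n)])
    ones, zeros = np.ones(n), np.zeros(n)
    C = K(d, d)                                        # C_{ij} = C((X_i,D_i),(X_j,D_j))
    Cbar = (K(d, ones) - K(d, zeros)).mean(axis=1)     # Cbar_i = Cov(Y_i, beta | X, D)
    K11, K10, K00 = K(ones, ones), K(ones, zeros), K(zeros, zeros)
    var_beta = (K11 - K10 - K10.T + K00).mean()        # prior Var(beta | X)
    return var_beta - Cbar @ np.linalg.solve(C + sigma2 * np.eye(n), Cbar)
```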
Discrete optimization

◮ The optimal design solves
$$\max_d \; \overline{C}' \cdot (\mathbf{C} + \sigma^2 I)^{-1} \cdot \overline{C}.$$
◮ Possible optimization algorithms:
  1. Search over random $d$.
  2. Greedy algorithm (sketched below).
  3. Simulated annealing.
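A minimal sketch of the greedy option, using cate_posterior_risk from above. Since maximizing $\overline{C}' (\mathbf{C} + \sigma^2 I)^{-1} \overline{C}$ is equivalent to minimizing the posterior risk, the sketch minimizes risk directly; the starting point, sweep count, and stopping rule are my choices, not the paper's.

```python
import numpy as np

def greedy_assignment(X, sigma2, risk_fn, n_sweeps=5):
    """Flip one unit's treatment at a time; keep any flip that lowers the risk."""
    n = len(X)
    d = np.arange(n) % 2                      # start from an alternating assignment
    best = risk_fn(X, d, sigma2)
    for _ in range(n_sweeps):
        improved = False
        for i in range(n):
            d[i] = 1 - d[i]                   # tentatively flip unit i
            r = risk_fn(X, d, sigma2)
            if r < best:
                best, improved = r, True      # keep the improving flip
            else:
                d[i] = 1 - d[i]               # revert
        if not improved:                      # local optimum: stop early
            break
    return d, best

# Usage with the risk sketch above:
# d_opt, risk = greedy_assignment(X, 1.0, cate_posterior_risk)
```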
Special case: linear separable model

◮ Suppose $f(x, d) = x' \cdot \gamma + d \cdot \beta$, $\gamma \sim N(0, \Sigma)$, and we estimate $\beta$ using a comparison of means.
◮ The bias of $\widehat{\beta}$ equals $(\overline{X}_1 - \overline{X}_0)' \cdot \gamma$, so the prior expected squared bias is $(\overline{X}_1 - \overline{X}_0)' \cdot \Sigma \cdot (\overline{X}_1 - \overline{X}_0)$.
◮ Mean squared error:
$$MSE(d_1, \dots, d_n) = \sigma^2 \cdot \left( \frac{1}{n_1} + \frac{1}{n_0} \right) + (\overline{X}_1 - \overline{X}_0)' \cdot \Sigma \cdot (\overline{X}_1 - \overline{X}_0).$$
◮ ⇒ Risk is minimized by
  1. choosing treatment and control arms of equal size, and
  2. optimizing balance as measured by the difference in covariate means $(\overline{X}_1 - \overline{X}_0)$.
(A sketch of this computation follows below.)
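This MSE is cheap to evaluate directly; a small sketch, assuming X is an $n \times k$ numpy array and Sigma the prior covariance of $\gamma$:

```python
import numpy as np

def linear_model_mse(X, d, sigma2, Sigma):
    """sigma^2 (1/n1 + 1/n0) + (Xbar_1 - Xbar_0)' Sigma (Xbar_1 - Xbar_0)."""
    treated = np.asarray(d, dtype=bool)
    n1, n0 = treated.sum(), (~treated).sum()
    diff = X[treated].mean(axis=0) - X[~treated].mean(axis=0)   # Xbar_1 - Xbar_0
    return sigma2 * (1.0 / n1 + 1.0 / n0) + diff @ Sigma @ diff
```

Minimizing the first term pins down equal arm sizes; minimizing the second is the balance criterion in the slide.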
Application 2: "Optimal insurance and taxation using machine learning"

Economic setting
◮ Population of insured individuals $i$.
◮ $Y_i$: health care expenditures of individual $i$.
◮ $T_i$: share of health care expenditures covered by the insurance; $1 - T_i$: coinsurance rate; $Y_i \cdot (1 - T_i)$: out-of-pocket expenditures.
◮ Behavioral response to the share covered: structural function $Y_i = g(T_i, \varepsilon_i)$.
◮ Per capita expenditures under policy $t$: average structural function
$$m(t) = E[g(t, \varepsilon_i)].$$
(A hedged numerical preview follows below.)
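A hedged preview of the next step, which this slide only sets up: under strong assumptions (exogenous, (quasi-)randomly assigned $T_i$, so that $E[Y \mid T = t] = m(t)$, plus an illustrative kernel and a centering heuristic in place of a fitted prior mean), the posterior mean of $m$ mirrors the GP regression above.

```python
import numpy as np

def estimate_m(t_grid, T, Y, sigma2=1.0, ls=0.1):
    """Posterior mean of m(t) = E[g(t, eps)] under a GP prior for m.
    Centering Y approximates a constant prior mean (a heuristic,
    not the paper's derivation)."""
    kernel = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    weights = np.linalg.solve(kernel(T, T) + sigma2 * np.eye(len(T)), Y - Y.mean())
    return Y.mean() + kernel(t_grid, T) @ weights
```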