Shrinkage
Econ 2148, Fall 2017
Applications of Gaussian process priors

Maximilian Kasy
Department of Economics, Harvard University

Applications from my own work

Agenda
◮ Optimal treatment assignment in experiments.
  ◮ Setting: treatment assignment given baseline covariates.
  ◮ General decision theory result: non-random rules dominate random rules.
  ◮ Prior for the expectation of potential outcomes given covariates.
  ◮ Expression for the MSE of an estimator of the ATE, to be minimized by the choice of treatment assignment.
◮ Optimal insurance and taxation.
  ◮ Review: the envelope theorem.
  ◮ Economic setting: co-insurance rate for health insurance.
  ◮ Statistical setting: prior for the behavioral average response function.
  ◮ Expression for posterior expected social welfare, to be maximized by the choice of co-insurance rate.

Applications use Gaussian process priors
1. Optimal experimental design
   ◮ How to assign treatment to minimize the mean squared error of treatment effect estimators?
   ◮ Gaussian process prior for the conditional expectation of potential outcomes given covariates.
2. Optimal insurance and taxation
   ◮ How to choose a co-insurance rate or tax rate to maximize social welfare, given (quasi-)experimental data?
   ◮ Gaussian process prior for the behavioral response function mapping the co-insurance rate into the tax base.

Experimental design
Application 1: “Why experimenters might not always want to randomize”

Setup
1. Sampling: random sample of $n$ units; baseline survey $\Rightarrow$ vector of covariates $X_i$.
2. Treatment assignment: binary treatment assigned by $D_i = d_i(X, U)$, where $X$ is the matrix of covariates and $U$ is a randomization device.
3. Realization of outcomes: $Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0$.
4. Estimation: estimator $\hat{\beta}$ of the (conditional) average treatment effect,
   $$\beta = \frac{1}{n} \sum_i E[Y_i^1 - Y_i^0 \mid X_i, \theta].$$

Questions
◮ How should we assign treatment?
◮ In particular, what if $X_i$ has continuous or many discrete components?
◮ How should we estimate $\beta$?
◮ What is the role of prior information?

Some intuition
◮ “Compare apples with apples” $\Rightarrow$ balance the covariate distribution.
◮ Not just balance of means!
◮ We don’t add random noise to estimators – why add random noise to experimental designs?
◮ Identification requires controlled trials (CTs), but not randomized controlled trials (RCTs).

General decision problem allowing for randomization
◮ General decision problem:
  ◮ state of the world $\theta$, observed data $X$, randomization device $U \perp X$,
  ◮ decision procedure $\delta(X, U)$, loss $L(\delta(X, U), \theta)$.
◮ Conditional expected loss of decision procedure $\delta(X, U)$:
  $$R(\delta, \theta \mid U = u) = E[L(\delta(X, u), \theta) \mid \theta].$$
◮ Bayes risk:
  $$R_B(\delta, \pi) = \int \int R(\delta, \theta \mid U = u) \, d\pi(\theta) \, dP(u).$$
◮ Minimax risk:
  $$R_{mm}(\delta) = \int \max_\theta R(\delta, \theta \mid U = u) \, dP(u).$$

Theorem (Optimality of deterministic decisions)
Consider a general decision problem. Let $R^*$ equal $R_B$ or $R_{mm}$. Then:
1. The optimal risk $R^*(\delta^*)$, when considering only deterministic procedures $\delta(X)$, is no larger than the optimal risk when allowing for randomized procedures $\delta(X, U)$.
2. If the optimal deterministic procedure $\delta^*$ is unique, then it has strictly lower risk than any non-trivial randomized procedure.

Practice problem
Prove this. Hints:
◮ Assume for simplicity that $U$ has finite support.
◮ Note that a (weighted) average of numbers is always at least as large as their minimum.
◮ Write the risk (Bayes or minimax) of any randomized assignment rule as a (weighted) average of the risks of deterministic rules.

Solution
◮ Any probability distribution $P(u)$ satisfies $\sum_u P(u) = 1$ and $P(u) \geq 0$ for all $u$.
◮ Thus $\sum_u R_u \cdot P(u) \geq \min_u R_u$ for any set of values $R_u$.
◮ Let $\delta_u(x) = \delta(x, u)$.
◮ Then
  $$R_B(\delta, \pi) = \sum_u \int R(\delta_u, \theta) \, d\pi(\theta) \, P(u) \geq \min_u \int R(\delta_u, \theta) \, d\pi(\theta) = \min_u R_B(\delta_u, \pi).$$
◮ Similarly,
  $$R_{mm}(\delta) = \sum_u \max_\theta R(\delta_u, \theta) \, P(u) \geq \min_u \max_\theta R(\delta_u, \theta) = \min_u R_{mm}(\delta_u).$$

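A minimal numeric sketch of this averaging argument, with hypothetical risk values $R_u$ and an arbitrary randomization distribution $P(u)$ (all names and numbers are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical risks R_u of five deterministic rules delta_u (one per value of U).
R_u = np.array([1.3, 0.7, 2.1, 0.9, 1.5])

# An arbitrary randomization device: a probability distribution P(u) over the rules.
P_u = rng.dirichlet(np.ones(5))

risk_randomized = R_u @ P_u          # risk of the randomized rule: a weighted average
risk_best_deterministic = R_u.min()  # risk of the best deterministic rule

# A weighted average can never fall below the minimum it averages over.
assert risk_randomized >= risk_best_deterministic
print(risk_randomized, risk_best_deterministic)
```
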
Bayesian setup
◮ Back to the experimental design setting.
◮ Conditional distribution of potential outcomes: for $d = 0, 1$,
  $$Y_i^d \mid X_i = x \sim N(f(x, d), \sigma^2).$$
◮ Gaussian process prior: $f \sim GP(\mu, C)$, with
  $$E[f(x, d)] = \mu(x, d), \qquad \mathrm{Cov}(f(x_1, d_1), f(x_2, d_2)) = C((x_1, d_1), (x_2, d_2)).$$
◮ Conditional average treatment effect (CATE):
  $$\beta = \frac{1}{n} \sum_i E[Y_i^1 - Y_i^0 \mid X_i, \theta] = \frac{1}{n} \sum_i \left( f(X_i, 1) - f(X_i, 0) \right).$$

Notation:
◮ Covariance matrix $C$, where $C_{i,j} = C((X_i, D_i), (X_j, D_j))$.
◮ Mean vector $\mu$, with components $\mu_i = \mu(X_i, D_i)$.
◮ Covariance of the observations with the CATE,
  $$\bar{C}_i = \mathrm{Cov}(Y_i, \beta \mid X, D) = \frac{1}{n} \sum_j \left( C((X_i, D_i), (X_j, 1)) - C((X_i, D_i), (X_j, 0)) \right).$$

Practice problem
◮ Derive the posterior expectation $\hat{\beta}$ of $\beta$.
◮ Derive the risk of any deterministic treatment assignment vector $d$, assuming
  1. the estimator $\hat{\beta}$ is used, and
  2. the loss function $(\hat{\beta} - \beta)^2$ is considered.

Solution
◮ The posterior expectation $\hat{\beta}$ of $\beta$ equals
  $$\hat{\beta} = \mu_\beta + \bar{C}' \cdot (C + \sigma^2 I)^{-1} \cdot (Y - \mu),$$
  where $\mu_\beta = \frac{1}{n} \sum_i (\mu(X_i, 1) - \mu(X_i, 0))$ is the prior expectation of $\beta$.
◮ The corresponding risk equals
  $$R_B(d, \hat{\beta} \mid X) = \mathrm{Var}(\beta \mid X, Y) = \mathrm{Var}(\beta \mid X) - \mathrm{Var}(E[\beta \mid X, Y] \mid X) = \mathrm{Var}(\beta \mid X) - \bar{C}' \cdot (C + \sigma^2 I)^{-1} \cdot \bar{C}.$$

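A minimal sketch of this risk computation, assuming a squared-exponential prior covariance in $(x, d)$; the kernel choice and the function names (`kernel`, `posterior_risk`) are illustrative, not from the paper:

```python
import numpy as np

def kernel(x1, d1, x2, d2, ls=1.0, var=1.0):
    """Assumed prior covariance C((x1, d1), (x2, d2)):
    squared-exponential in the concatenated inputs (x, d)."""
    dist2 = np.sum((x1 - x2) ** 2) + (d1 - d2) ** 2
    return var * np.exp(-0.5 * dist2 / ls ** 2)

def posterior_risk(X, d, sigma2=1.0):
    """Bayes risk of beta-hat for assignment d:
    Var(beta | X) - Cbar' (C + sigma^2 I)^{-1} Cbar."""
    n = len(d)
    # Prior covariance matrix C of the observed outcomes.
    C = np.array([[kernel(X[i], d[i], X[j], d[j]) for j in range(n)]
                  for i in range(n)])
    # Cbar_i = Cov(Y_i, beta | X, D): treatment contrasts averaged over units j.
    Cbar = np.array([np.mean([kernel(X[i], d[i], X[j], 1)
                              - kernel(X[i], d[i], X[j], 0) for j in range(n)])
                     for i in range(n)])
    # Var(beta | X) = (1/n^2) * sum over (j, k) of double treatment contrasts.
    V = np.mean([[kernel(X[j], 1, X[k], 1) - kernel(X[j], 1, X[k], 0)
                  - kernel(X[j], 0, X[k], 1) + kernel(X[j], 0, X[k], 0)
                  for k in range(n)] for j in range(n)])
    return V - Cbar @ np.linalg.solve(C + sigma2 * np.eye(n), Cbar)
```
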
Discrete optimization
◮ The optimal design solves
  $$\max_d \; \bar{C}' \cdot (C + \sigma^2 I)^{-1} \cdot \bar{C}.$$
◮ Possible optimization algorithms:
  1. search over random $d$,
  2. greedy algorithms,
  3. simulated annealing.

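As one illustration, a greedy local-search sketch that reuses the `posterior_risk` function from the sketch above; the starting assignment and sweep count are arbitrary choices, not prescriptions from the slides:

```python
import numpy as np  # continues the posterior_risk sketch above

def greedy_design(X, sigma2=1.0, n_sweeps=5):
    """Greedy search: flip one treatment indicator at a time and keep the
    flip whenever it lowers the Bayes risk computed by posterior_risk."""
    n = len(X)
    d = np.arange(n) % 2                      # arbitrary starting assignment
    best = posterior_risk(X, d, sigma2)
    for _ in range(n_sweeps):
        improved = False
        for i in range(n):
            d[i] = 1 - d[i]                   # tentatively flip unit i
            risk = posterior_risk(X, d, sigma2)
            if risk < best:
                best, improved = risk, True   # keep the flip
            else:
                d[i] = 1 - d[i]               # revert the flip
        if not improved:
            break
    return d, best

# Usage with simulated covariates:
# X = np.random.default_rng(0).normal(size=(20, 2))
# d_opt, risk = greedy_design(X)
```
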
Variation of the problem

Practice problem
◮ Suppose that the researcher insists on estimating $\beta$ using a simple comparison of means,
  $$\hat{\beta} = \frac{1}{n_1} \sum_i D_i Y_i - \frac{1}{n_0} \sum_i (1 - D_i) Y_i.$$
◮ Derive again the risk of any deterministic treatment assignment vector $d$, assuming
  1. the estimator $\hat{\beta}$ is used, and
  2. the loss function $(\hat{\beta} - \beta)^2$ is considered.

Solution
◮ Notation: let $\mu_i^d = \mu(X_i, d)$ and $C_{i,j}^{d_1, d_2} = C((X_i, d_1), (X_j, d_2))$.
◮ Collect these terms in the vectors $\mu^d$ and matrices $C^{d_1, d_2}$, and let
  $$\tilde{\mu} = (\mu^0, \mu^1), \qquad \tilde{C} = \begin{pmatrix} C^{00} & C^{01} \\ C^{10} & C^{11} \end{pmatrix}.$$
◮ Weights $w = (w^0, w^1)$, with
  $$w_i^1 = \frac{d_i}{n_1} - \frac{1}{n}, \qquad w_i^0 = -\frac{1 - d_i}{n_0} + \frac{1}{n}.$$
◮ Risk: the sum of variance and prior expected squared bias,
  $$R_B(d, \hat{\beta} \mid X) = \sigma^2 \cdot \left( \frac{1}{n_1} + \frac{1}{n_0} \right) + (w' \cdot \tilde{\mu})^2 + w' \cdot \tilde{C} \cdot w.$$

Special case: linear separable model
◮ Suppose $f(x, d) = x' \cdot \gamma + d \cdot \beta$, $\gamma \sim N(0, \Sigma)$, and we estimate $\beta$ using a comparison of means.
◮ The bias of $\hat{\beta}$ equals $(\bar{X}^1 - \bar{X}^0)' \cdot \gamma$, so the prior expected squared bias is $(\bar{X}^1 - \bar{X}^0)' \cdot \Sigma \cdot (\bar{X}^1 - \bar{X}^0)$.
◮ Mean squared error:
  $$MSE(d_1, \ldots, d_n) = \sigma^2 \cdot \left( \frac{1}{n_1} + \frac{1}{n_0} \right) + (\bar{X}^1 - \bar{X}^0)' \cdot \Sigma \cdot (\bar{X}^1 - \bar{X}^0).$$
◮ $\Rightarrow$ Risk is minimized by
  1. choosing treatment and control arms of equal size, and
  2. optimizing balance as measured by the difference in covariate means $(\bar{X}^1 - \bar{X}^0)$.

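A short sketch of this special-case MSE, under the linear separable model above; the function name `mse_linear` and the simulated inputs are illustrative assumptions:

```python
import numpy as np

def mse_linear(X, d, Sigma, sigma2=1.0):
    """MSE under the linear separable model: sampling variance plus
    prior expected squared bias from imbalance in covariate means."""
    n1 = d.sum()
    n0 = len(d) - n1
    diff = X[d == 1].mean(axis=0) - X[d == 0].mean(axis=0)  # Xbar^1 - Xbar^0
    return sigma2 * (1.0 / n1 + 1.0 / n0) + diff @ Sigma @ diff

# Usage: among equal-sized arms, better-balanced designs have lower MSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
d = rng.permutation(np.arange(20) % 2)  # random assignment with equal arm sizes
print(mse_linear(X, d, np.eye(2)))
```
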
Envelope theorem
Review for application 2: the envelope theorem
◮ Policy parameter $t$.
◮ Vector of individual choices $x$.
◮ Choice set $X$.
◮ Individual utility $\upsilon(x, t)$.
◮ Realized choices
  $$x(t) \in \operatorname{argmax}_{x \in X} \upsilon(x, t).$$
◮ Realized utility
  $$V(t) = \max_{x \in X} \upsilon(x, t) = \upsilon(x(t), t).$$

◮ Let $x^* = x(t^*)$ for some fixed $t^*$.
◮ Define
  $$\tilde{V}(t) = V(t) - \upsilon(x^*, t) \qquad (1)$$
  $$= \upsilon(x(t), t) - \upsilon(x(t^*), t) = \max_{x \in X} \upsilon(x, t) - \upsilon(x^*, t). \qquad (2)$$
◮ The definition of $\tilde{V}$ immediately implies:
  ◮ $\tilde{V}(t) \geq 0$ for all $t$, and $\tilde{V}(t^*) = 0$.
  ◮ Thus $t^*$ is a global minimizer of $\tilde{V}$.
  ◮ If $\tilde{V}$ is differentiable at $t^*$: $\tilde{V}'(t^*) = 0$.
◮ Thus
  $$V'(t^*) = \left. \frac{\partial}{\partial t} \upsilon(x^*, t) \right|_{t = t^*}.$$
◮ Behavioral responses don't matter for the effect of a policy change on individual utility!

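A concrete check of the envelope formula, using a standard quadratic example that is not from the slides:

```latex
% Quadratic example (illustrative, not from the slides):
% utility \upsilon(x, t) = x t - x^2 / 2, choice set X = \mathbb{R}.
\[
  x(t) = \operatorname{argmax}_{x} \left( x t - \tfrac{x^2}{2} \right) = t,
  \qquad
  V(t) = \upsilon(x(t), t) = \tfrac{t^2}{2},
  \qquad
  V'(t^*) = t^*.
\]
% Holding the choice fixed at x^* = x(t^*) = t^* and differentiating directly:
\[
  \left. \frac{\partial}{\partial t} \, \upsilon(x^*, t) \right|_{t = t^*} = x^* = t^*,
\]
% which matches V'(t^*): the behavioral response x'(t) drops out, as the
% envelope theorem asserts.
```
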
Optimal insurance
Application 2: “Optimal insurance and taxation using machine learning”

Economic setting
◮ Population of insured individuals $i$.
◮ $Y_i$: health care expenditures of individual $i$.
◮ $T_i$: share of health care expenditures covered by the insurance;
  $1 - T_i$: co-insurance rate; $Y_i \cdot (1 - T_i)$: out-of-pocket expenditures.
◮ Behavioral response to the share covered: structural function
  $$Y_i = g(T_i, \varepsilon_i).$$
◮ Per capita expenditures under policy $t$: average structural function
  $$m(t) = E[g(t, \varepsilon_i)].$$