Policy Evaluation with Latent Confounders via Optimal Balance

Andrew Bennett¹ (Cornell University, awb222@cornell.edu)
Nathan Kallus¹ (Cornell University, kallus@cornell.edu)

¹ Alphabetical order.

1 / 33
Policy Learning Problem

Given observational data on individuals described by covariates (X), interventions performed on those individuals (T), and resultant outcomes (Y), we wish to estimate the utility of policies that assign treatment to individuals based on their covariates.

This is a challenging problem when the relationship between T and Y in the logged data is confounded, even after controlling for X.

Examples:
- Drug assignment policy: X is the patient information available to doctors, T is the drug assigned, Y is the medical outcome; confounding arises from factors not fully captured by X (e.g. socioeconomics) that influenced drug assignment in the observational data.
- Personalized education: X contains individual student statistics, T is an educational intervention, Y is a measure of post-intervention student outcomes; confounding arises because X poorly captures the criteria used by decision makers in the observational data (e.g. X contains a standardized test score, but decisions were made based on actual student ability).
Setup - Latent Confounder Framework

Logged Data Model:
- Latent Confounders: Z ∈ 𝒵 ⊆ R^p
- Observed Proxies: X ∈ 𝒳 ⊆ R^q
- Treatment: T ∈ {1, ..., m}
- Potential Outcomes: Y(t) ∈ R

Assumption (Z are true confounders): For every t ∈ {1, ..., m}, the variables X, T, Y(t) are mutually independent, conditioned on Z.

[Causal diagram over X, Z, T, Y: Z is a common cause of X, T, and Y]
Setup - Logging and Behavior Policies

Evaluation Policy:
- π_t(x) denotes the probability that the evaluation policy assigns treatment T = t given observed proxies X = x.

Logging Policy:
- e_t(z) denotes the probability that the logging policy assigns treatment T = t given latent confounders Z = z.
- η_t(x) denotes the probability that the logging policy assigns treatment T = t given observed proxies X = x.
Setup - Policy Evaluation Goal

Definition (Policy Value): τ^π = E[∑_{t=1}^m π_t(X) Y(t)].

Goal: estimate the policy value τ^π given iid logged data of the form ((X_1, T_1, Y_1), ..., (X_n, T_n, Y_n)).

We want to find an estimator τ̂^π that minimizes the MSE E[(τ̂^π − τ^π)²].
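As a concrete illustration, the policy value can be approximated by Monte Carlo when the data-generating process is known. The toy model below (Gaussian latent confounder, noisy proxy, logistic evaluation policy) is our own made-up example, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent-confounder model (illustrative, made-up numbers):
# Z ~ N(0,1) is latent, X = Z + noise is a proxy, treatments t ∈ {0, 1},
# and potential outcomes Y(t) = t * Z + noise.
n = 200_000
Z = rng.normal(size=n)
X = Z + 0.5 * rng.normal(size=n)
Y = {t: t * Z + rng.normal(size=n) for t in (0, 1)}

def pi(t, x):
    """Evaluation policy: assign t = 1 with probability sigmoid(x)."""
    p1 = 1.0 / (1.0 + np.exp(-x))
    return p1 if t == 1 else 1.0 - p1

# Policy value tau^pi = E[sum_t pi_t(X) Y(t)], estimated by Monte Carlo.
tau_pi = np.mean(sum(pi(t, X) * Y[t] for t in (0, 1)))
print(tau_pi)
```

In logged data only Y(T) would be observed, which is exactly why estimating τ^π is nontrivial; here we use the full potential outcomes only to define the target quantity.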
Setup - Latent Confounder Model

We denote by ϕ(z; x, t) the conditional density of Z given X = x, T = t.

Assumption (Latent Confounder Model): We assume that we have an identified model for ϕ(z; x, t), and that we can calculate conditional densities and sample Z values using this model.
Setup - Observed Proxies

We do not assume ignorability given X. This means standard approaches based on inverse propensity scores are bound to fail.

Instead, the proxies X can be used (along with T) to compute the posterior of the true confounders Z, which can then be used for evaluation.
Setup - Additional Assumptions

Assumption (Weak Overlap): E[e_t^{-2}(Z)] < ∞.

Assumption (Bounded Variance): The conditional variance of our potential outcomes given X, T is bounded: V[Y(t) | X, T] ≤ σ².
Setup - Mean Value Functions

Define the following mean value functions:
- µ_t(z) = E[Y(t) | Z = z]
- ν_t(x, t′) = E[Y(t) | X = x, T = t′] = E[µ_t(Z) | X = x, T = t′]
- ρ_t(x) = E[Y(t) | X = x] = E[µ_t(Z) | X = x]

Note that we can equivalently rewrite the policy value as:
τ^π = E[∑_{t=1}^m π_t(X) Y(t)] = E[∑_{t=1}^m π_t(X) µ_t(Z)] = E[∑_{t=1}^m π_t(X) ν_t(X, T)]
Past Work - Standard Estimator Types

Weighted, Direct, and Doubly Robust estimators:
- τ̂^π_W = (1/n) ∑_{i=1}^n W_i Y_i
- τ̂^π_ρ̂ = (1/n) ∑_{i=1}^n ∑_{t=1}^m π_t(X_i) ρ̂_t(X_i)
- τ̂^π_{W,ρ̂} = (1/n) ∑_{i=1}^n ∑_{t=1}^m π_t(X_i) ρ̂_t(X_i) + (1/n) ∑_{i=1}^n W_i (Y_i − ρ̂_{T_i}(X_i))

Note that ρ̂_t is not straightforward to estimate via regression, since ρ_t(x) = E[Y(t) | X = x] ≠ E[Y | X = x, T = t].

The correct IPW weights W_i = π_{T_i}(X_i) / e_{T_i}(Z_i) are infeasible since Z_i is not observed, and the naively misspecified IPW weights W_i = π_{T_i}(X_i) / η_{T_i}(X_i) lead to biased evaluation.
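The bias of the naive weights can be seen numerically. The sketch below uses our own toy discrete model (made-up numbers, not from the slides), where both the oracle propensity e_t(z) and the marginal propensity η_t(x) are known in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy confounded model: Z ∈ {0,1} latent, X is Z flipped w.p. 0.2,
# logging propensity e_1(z) = 0.8 if z = 1 else 0.2, Y(t) = t * Z + noise.
n = 400_000
Z = rng.integers(0, 2, size=n)
X = np.where(rng.uniform(size=n) < 0.8, Z, 1 - Z)
e1 = np.where(Z == 1, 0.8, 0.2)
T = (rng.uniform(size=n) < e1).astype(int)
Y = T * Z + 0.1 * rng.normal(size=n)          # observed outcome Y = Y(T)

# Evaluation policy pi_1(x) = 0.5, so tau^pi = 0.5 * E[Z] = 0.25.
# By Bayes, the marginal propensity is eta_1(1) = 0.68 and eta_1(0) = 0.32.
eta1 = np.where(X == 1, 0.68, 0.32)

prop_z = np.where(T == 1, e1, 1 - e1)         # e_{T_i}(Z_i): infeasible in practice
prop_x = np.where(T == 1, eta1, 1 - eta1)     # eta_{T_i}(X_i): feasible but wrong

tau_oracle = np.mean(0.5 / prop_z * Y)        # close to the true value 0.25
tau_naive = np.mean(0.5 / prop_x * Y)         # biased upward (about 0.36 here)
print(tau_oracle, tau_naive)
```

The oracle weights recover τ^π = 0.25, while weighting by η_{T_i}(X_i) is systematically biased, matching the claim on the slide.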
Past Work - Optimal Balancing

Optimal Balancing (Kallus 2018) seeks weights W_i such that τ̂^π_W minimizes an estimate of the worst-case MSE of policy evaluation, given a class of functions for the unknown mean value function.

Define CMSE(W, µ) to be the conditional mean squared error, given the logged data, of τ̂^π_W as an estimate of the sample average policy effect (SAPE), if the mean value function were given by µ.

Choose weights W* for evaluation according to the rule:
W* = argmin_{W ∈ 𝒲} sup_{µ ∈ ℱ} CMSE(W, µ)

This permits a simple QP algorithm when ℱ is a class of RKHS functions.
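To see why an RKHS class yields a QP, here is a deliberately simplified single-treatment sketch (our own reduction, not the paper's multi-treatment program): for ℱ the unit ball of an RKHS with Gram matrix K, the worst-case squared bias of weights W relative to policy weights π is sup_{‖µ‖≤1} ((1/n) ∑_i (W_i − π_i) µ(X_i))² = (W − π)ᵀK(W − π)/n², so minimizing it plus a variance penalty is a quadratic program with a closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_kernel(x, bandwidth=1.0):
    """Gram matrix of a Gaussian RKHS kernel on scalar inputs."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth ** 2))

n = 50
X = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-X))   # illustrative per-unit evaluation-policy weights
K = gaussian_kernel(X)
sigma2 = 0.5

# Objective: (W - pi)^T K (W - pi) / n^2 + (2 sigma^2 / n^2) ||W||^2.
# Setting the gradient to zero gives the linear system (K + 2 sigma^2 I) W = K pi.
W = np.linalg.solve(K + 2 * sigma2 * np.eye(n), K @ pi)

worst_case = (W - pi) @ K @ (W - pi) / n**2 + 2 * sigma2 / n**2 * (W @ W)
print(worst_case)
```

By construction, the optimized worst-case objective is no larger than the value obtained by simply using W = π (which has zero bias term but unpenalized variance).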
Generalized IPS Weights I

Suppose we want to define weights W(X, T) IPS-style such that the weighted estimator is unbiased term-by-term; this requires solving:
E[W(X, T) δ_{T,t} Y(t)] = E[π_t(X) Y(t)], where δ_{T,t} = 1{T = t}

One can easily verify that, if we assume ignorability given X, this equation is solved by the standard IPS weights W(X, T) = π_T(X) / η_T(X).

Theorem (Generalized IPS Weights): If W(x, t) satisfies the above equation, then for each t ∈ {1, ..., m},
W(x, t) = π_t(x) · (∑_{t′=1}^m η_{t′}(x) ν_t(x, t′) + Ω_t(x)) / (η_t(x) ν_t(x, t)),
for some Ω_t(x) such that E[Ω_t(X)] = 0 for all t.
Generalized IPS Weights II

Calculating these generalized IPS weights is not straightforward, since it involves the counterfactual estimation of ν_t(x, t′) for t ≠ t′ (which requires knowledge of Z).

In addition, we would expect high variance from error in estimating ν_t, due to its position in the denominator.

However, the fact that such weights exist supports the idea of using an optimal-balancing-style approach, choosing weights that balance a flexible class of possible mean outcome functions.
Adversarial Objective Motivation

Define the following, where we embed the dependence on µ inside ν_t implicitly:
f_it = W_i δ_{T_i,t} − π_t(X_i)
J(W, µ) = ((1/n) ∑_{i=1}^n ∑_{t=1}^m f_it ν_t(X_i, T_i))² + (2σ²/n²) ‖W‖²_2

Theorem (CMSE Upper Bound): E[(τ̂^π_W − τ^π)² | X_{1:n}, T_{1:n}] ≤ 2 J(W, µ) + O_p(1/n).

Lemma (CMSE Convergence implies Consistency): If E[(τ̂^π_W − τ^π)² | X_{1:n}, T_{1:n}] = O_p(1/n), then τ̂^π_W = τ^π + O_p(1/√n).
Balancing Objective

Our optimal balancing objective is to choose weights W* for evaluation according to the following optimization problem:
W* = argmin_{W ∈ 𝒲} sup_{µ ∈ ℱ} J(W, µ)
Feasibility of Balancing Objective I

Minimizing J(W, µ) over some class of µ ∈ ℱ corresponds to implicitly balancing some class of functions ν indexed by µ, since:
J(W, µ) = ((1/n) ∑_{i=1}^n W_i ν_{T_i}(X_i, T_i) − (1/n) ∑_{i=1}^n ∑_{t=1}^m π_t(X_i) ν_t(X_i, T_i))² + (2σ²/n²) ‖W‖²_2

Note that such balancing would be impossible over a generic flexible class of functions ν ignoring Z, due to the ν_t(x, t′) terms for t ≠ t′.
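The two forms of J(W, µ) agree because ∑_t f_it ν_t(X_i, T_i) = W_i ν_{T_i}(X_i, T_i) − ∑_t π_t(X_i) ν_t(X_i, T_i). A quick numerical check with arbitrary placeholder values (made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

n, m = 8, 3
W = rng.normal(size=n)
T = rng.integers(0, m, size=n)
pi = rng.dirichlet(np.ones(m), size=n)   # pi[i, t] stands in for pi_t(X_i)
nu = rng.normal(size=(n, m))             # nu[i, t] stands in for nu_t(X_i, T_i)
sigma2 = 1.0

delta = np.eye(m)[T]                     # delta[i, t] = 1{T_i = t}
f = W[:, None] * delta - pi              # f_it = W_i 1{T_i = t} - pi_t(X_i)

penalty = 2 * sigma2 / n**2 * np.sum(W**2)
J_form1 = (np.sum(f * nu) / n) ** 2 + penalty
J_form2 = (np.mean(W * nu[np.arange(n), T])
           - np.mean(np.sum(pi * nu, axis=1))) ** 2 + penalty
print(J_form1, J_form2)
```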
Feasibility of Balancing Objective II

The following lemma suggests that this fundamental counterfactual issue may not be a problem, given the implicit constraint imposed by indexing using µ and our overlap assumption:

Lemma (Mean Value Function Overlap): Assuming ‖µ_t‖_∞ ≤ b, under our weak overlap assumption, for all x ∈ 𝒳 and t, t′, t′′ ∈ {1, ..., m}, we have
|ν_t(x, t′′)| ≤ (η_{t′}(x) / η_{t′′}(x)) √(8 b E[e_t^{-2}(Z) | X = x, T = t′] |ν_t(x, t′)|).
Assumptions for Consistent Evaluation I

Define ℱ_t = {µ_t : ∃ (µ′_1, ..., µ′_m) ∈ ℱ with µ′_t = µ_t}. We make the following assumptions:

Assumption (Normed): For each t ∈ {1, ..., m} there exists a norm ‖·‖_t on span(ℱ_t), and there exists a norm ‖·‖ on span(ℱ) defined, given some R^m norm, as ‖µ‖ = ‖(‖µ_1‖_1, ..., ‖µ_m‖_m)‖.

Assumption (Absolutely Star Shaped): For every µ ∈ ℱ and |λ| ≤ 1, we have λµ ∈ ℱ.

Assumption (Convex Compact): ℱ is convex and compact.
Assumptions for Consistent Evaluation II

Assumption (Square Integrable): For each t ∈ {1, ..., m}, the space ℱ_t is a subset of L²(𝒵), and its norm dominates the L² norm (i.e., inf_{µ_t ∈ ℱ_t} ‖µ_t‖_t / ‖µ_t‖_{L²} > 0).

Assumption (Nondegeneracy): Define B(γ) = {µ ∈ span(ℱ) : ‖µ‖ ≤ γ}. Then we have B(γ) ⊆ ℱ for some γ > 0.

Assumption (Boundedness): sup_{µ ∈ ℱ} ‖µ‖_∞ < ∞.
Assumptions for Consistent Evaluation III

Definition (Rademacher Complexity): R_n(ℱ) = E[sup_{f ∈ ℱ} (1/n) ∑_{i=1}^n ε_i f(Z_i)], where the ε_i are iid Rademacher random variables.

Assumption (Complexity): For each t ∈ {1, ..., m}, we have R_n(ℱ_t) = o(1).
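As a sanity check on the complexity assumption, the empirical Rademacher complexity of a fixed finite function class (a toy stand-in for ℱ_t, our own choice of functions) can be estimated by Monte Carlo and shrinks as n grows, consistent with R_n(ℱ_t) = o(1):

```python
import numpy as np

rng = np.random.default_rng(4)

def rademacher_complexity(func_values, trials=2000, rng=rng):
    """Monte Carlo estimate of R_n for func_values of shape (num_funcs, n)."""
    n = func_values.shape[1]
    eps = rng.choice([-1.0, 1.0], size=(trials, n))
    # sup over the class of (1/n) sum_i eps_i f(Z_i), averaged over eps draws
    sups = np.max(eps @ func_values.T / n, axis=1)
    return sups.mean()

Z = rng.normal(size=1000)
fvals = np.stack([np.sin(k * Z) for k in range(1, 6)])  # 5 fixed bounded functions

r_small = rademacher_complexity(fvals[:, :50])   # n = 50
r_large = rademacher_complexity(fvals)           # n = 1000, noticeably smaller
print(r_small, r_large)
```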