

  1. Bootstrapping Sensitivity Analysis
     Qingyuan Zhao, Department of Statistics, The Wharton School, University of Pennsylvania
     June 2, 2019 @ OSU Bayesian Causal Inference Workshop
     (Joint work with Bhaswar B. Bhattacharya and Dylan S. Small)

  2. Why sensitivity analysis?
     ◮ Unless we have a perfectly executed randomized experiment, causal inference is based on some unverifiable assumptions.
     ◮ In observational studies, the most commonly used assumption is ignorability, or no unmeasured confounding:
           A ⊥⊥ Y(0), Y(1) | X.
       We can only say this assumption is "plausible".
     ◮ Sensitivity analysis asks: what if this assumption does not hold? Does our qualitative conclusion still hold?
     ◮ This question appears in many settings:
       1. Confounded observational studies.
       2. Survey sampling with missing not at random (MNAR) data.
       3. Longitudinal studies with non-ignorable dropout.
     ◮ In general, this means that the target parameter (e.g. the average treatment effect) is only partially identified.

  3. Overview: Bootstrapping sensitivity analysis
     Point-identified parameter (Efron's bootstrap):
         Point estimator  ==[ bootstrap ]==>  Confidence interval
     Partially identified parameter (an analogy):
         Extrema estimator  ==[ optimization + percentile bootstrap + minimax inequality ]==>  Confidence interval
     Rest of the talk: apply this idea to IPW estimators in a marginal sensitivity model.

  4. Some existing sensitivity models
     Generally, we need to specify how unconfoundedness is violated.
     1. Y models: consider a specific difference between the conditional distributions Y(a) | X, A and Y(a) | X.
        ◮ Commonly called "pattern mixture models".
        ◮ Robins (1999, 2002); Birmingham et al. (2003); Vansteelandt et al. (2006); Daniels and Hogan (2008).
     2. A models: consider a specific difference between the conditional distributions A | X, Y(a) and A | X.
        ◮ Commonly called "selection models".
        ◮ Scharfstein et al. (1999); Gilbert et al. (2003).
     3. Simultaneous models: consider a range of A models and/or Y models and report the "worst case" result.
        ◮ Cornfield et al. (1959); Rosenbaum (2002); Ding and VanderWeele (2016).
     Our sensitivity model: a hybrid of the 2nd and 3rd, similar to Rosenbaum's.

  5. Rosenbaum's sensitivity model
     ◮ Imagine there is an unobserved confounder U that "summarizes" all confounding, so A ⊥⊥ Y(0), Y(1) | X, U.
     ◮ Let e0(x, u) = P0(A = 1 | X = x, U = u).
     Rosenbaum's sensitivity model:
         R(Γ) = { e(x, u) : 1/Γ ≤ OR( e(x, u1), e(x, u2) ) ≤ Γ, for all x ∈ X and u1, u2 },
     where OR(p1, p2) := [p1 / (1 − p1)] / [p2 / (1 − p2)] is the odds ratio.
     ◮ Rosenbaum's question: can we reject the sharp null hypothesis Y(0) ≡ Y(1) for every e0(x, u) ∈ R(Γ)?
     ◮ Robins (2002): we don't need to assume the existence of U. Let U = Y(1) when the goal is to estimate E[Y(1)].

  6. Our sensitivity model
     ◮ Let e0(x) = P0(A = 1 | X = x) be the propensity score.
     Marginal sensitivity model:
         M(Γ) = { e(x, y) : 1/Γ ≤ OR( e(x, y), e0(x) ) ≤ Γ, for all x ∈ X and y }.
     ◮ Compare this to Rosenbaum's model:
         R(Γ) = { e(x, u) : 1/Γ ≤ OR( e(x, u1), e(x, u2) ) ≤ Γ, for all x ∈ X and u1, u2 }.
     ◮ Tan (2006) first considered this model, but he did not consider statistical inference in finite samples.
     ◮ Relationship between the two models: M(√Γ) ⊆ R(Γ) ⊆ M(Γ).¹
     ◮ For observational studies, we assume both P0(A = 1 | X, Y(1)) and P0(A = 1 | X, Y(0)) are in M(Γ).
     ¹ The second inclusion needs "compatibility": e(x, y) marginalizes to e0(x).

  7. Parametric extension
     ◮ In practice, the propensity score e0(X) = P0(A = 1 | X) is often estimated by a parametric model.
     Definition (Parametric marginal sensitivity model):
         M_{β0}(Γ) = { e(x, y) : 1/Γ ≤ OR( e(x, y), e_{β0}(x) ) ≤ Γ, for all x ∈ X and y },
     where e_{β0}(x) is the best parametric approximation of e0(x).
     This sensitivity model covers both
     1. Model misspecification, that is, e_{β0}(x) ≠ e0(x); and
     2. Missing not at random, that is, e0(x) ≠ e0(x, y).

  8. Logistic representations
     1. Rosenbaum's sensitivity model:
            logit( e(x, u) ) = g(x) + γ u,   where 0 ≤ U ≤ 1 and γ = log Γ.
     2. Marginal sensitivity model:
            logit( e^(h)(x, y) ) = logit( e0(x) ) + h(x, y),   where ‖h‖_∞ = sup |h(x, y)| ≤ γ.
        Due to this representation, we also call it a marginal L∞-sensitivity model.
     3. Parametric marginal sensitivity model:
            logit( e^(h)(x, y) ) = logit( e_{β0}(x) ) + h(x, y),   where ‖h‖_∞ = sup |h(x, y)| ≤ γ.
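     To make the logistic representation concrete, here is a minimal Python sketch (numpy only; the function names are mine, not from the talk) that computes the shifted propensity score e^(h)(x, y) and checks that a logit shift of at most γ = log Γ is the same constraint as the odds-ratio bound defining M(Γ):

     ```python
     import numpy as np

     def expit(t):
         return 1.0 / (1.0 + np.exp(-t))

     def logit(p):
         return np.log(p / (1.0 - p))

     def shifted_propensity(e0, h):
         """Marginal sensitivity model: logit(e^(h)) = logit(e0) + h, with |h| <= gamma."""
         return expit(logit(e0) + h)

     def odds_ratio(p1, p2):
         return (p1 / (1 - p1)) / (p2 / (1 - p2))

     # A shift of h on the logit scale multiplies the odds by exp(h), so
     # |h| <= log(Gamma) is exactly 1/Gamma <= OR(e^(h), e0) <= Gamma.
     Gamma = 2.0
     e0, h = 0.3, -np.log(Gamma)          # largest allowed downward shift
     e_h = shifted_propensity(e0, h)
     print(e_h, odds_ratio(e_h, e0))      # odds ratio equals 1/Gamma = 0.5
     ```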

  9. Confidence interval I
     ◮ For simplicity, consider the "missing data" problem where Y = Y(1) is only observed if A = 1.
     ◮ Observe i.i.d. samples (A_i, X_i, A_i Y_i), i = 1, ..., n.
     ◮ The estimand is μ0 = E0[Y]; however, it is only partially identified under a simultaneous sensitivity model.
     Goal 1 (Coverage of the true parameter): construct a data-dependent interval [L, U] such that
         P0( μ0 ∈ [L, U] ) ≥ 1 − α
     whenever e0(X, Y) = P0(A = 1 | X, Y) ∈ M(Γ).

 10. Confidence interval II
     ◮ The inverse probability weighting (IPW) identity:
         E0[Y] = E[ AY / e0(X, Y) ]  = (under MAR) =  E[ AY / e0(X) ].
     ◮ Define
         μ^(h) = E0[ AY / e^(h)(X, Y) ].
     ◮ Partially identified region: { μ^(h) : e^(h) ∈ M(Γ) }.
     Goal 2 (Coverage of the partially identified region): construct a data-dependent interval [L, U] such that
         P0( { μ^(h) : e^(h) ∈ M(Γ) } ⊆ [L, U] ) ≥ 1 − α.
     ◮ Imbens and Manski (2004) have discussed the difference between these two Goals.
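     As a quick sanity check of the IPW identity, here is a small simulation sketch (my own illustration, not from the talk; the data-generating process is made up) showing that E[AY / e0(X)] recovers E[Y] when missingness depends only on X:

     ```python
     import numpy as np

     rng = np.random.default_rng(0)
     n = 200_000

     # Hypothetical data-generating process with missingness at random given X
     X = rng.normal(size=n)
     e0 = 1.0 / (1.0 + np.exp(-X))        # true propensity P(A = 1 | X)
     A = rng.binomial(1, e0)
     Y = X + rng.normal(size=n)           # outcome, observed only when A = 1

     mu_true = Y.mean()                   # E[Y], available here because we simulated
     mu_ipw = np.mean(A * Y / e0)         # IPW identity: E[AY / e0(X)] = E[Y] under MAR
     print(mu_true, mu_ipw)               # the two agree up to Monte Carlo error
     ```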

 11. An intuitive idea: "the union method"
     ◮ Suppose for any h, we have a confidence interval [L^(h), U^(h)] such that
         liminf_{n→∞} P0( μ^(h) ∈ [L^(h), U^(h)] ) ≥ 1 − α.
     ◮ Let L = inf_{‖h‖∞ ≤ γ} L^(h) and U = sup_{‖h‖∞ ≤ γ} U^(h), so [L, U] is the union interval.
     Theorem
     1. [L, U] satisfies Goal 1 asymptotically.
     2. Furthermore, if the intervals are "congruent", i.e. there exists α′ < α such that
            limsup_{n→∞} P0( μ^(h) < L^(h) ) ≤ α′   and   limsup_{n→∞} P0( μ^(h) > U^(h) ) ≤ α − α′,
        then [L, U] satisfies Goal 2 asymptotically.

 12. Practical challenge: how to take the union?
     ◮ Suppose ĝ(x) is an estimate of logit( e0(x) ).
     ◮ For a specific difference h, we can estimate e^(h)(x, y) by
         ê^(h)(x, y) = 1 / ( 1 + exp{ h(x, y) − ĝ(x) } ).
     ◮ This leads to a (stabilized) IPW estimate of μ^(h):
         μ̂^(h) = [ (1/n) Σ_{i=1}^n A_i / ê^(h)(X_i, Y_i) ]^{−1} · (1/n) Σ_{i=1}^n A_i Y_i / ê^(h)(X_i, Y_i).
     ◮ Under regularity conditions, Z-estimation theory tells us
         √n ( μ̂^(h) − μ^(h) ) →_d N( 0, (σ^(h))² ),
       so we can use [L^(h), U^(h)] = μ̂^(h) ∓ z_{α/2} · σ̂^(h) / √n.
     ◮ However, computing the union interval requires solving a complicated optimization problem.
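     A minimal numpy sketch of this estimator (function and variable names are mine, not from the talk; it follows the slide's convention ê^(h) = 1 / (1 + exp{h − ĝ})):

     ```python
     import numpy as np

     def stabilized_ipw(A, Y, g_hat, h):
         """Stabilized IPW estimate of mu^(h) for a fixed shift h.

         A      : 0/1 array of treatment / response indicators
         Y      : outcomes (only used where A == 1)
         g_hat  : g_hat(X_i), the fitted logit of the propensity score
         h      : h(X_i, Y_i), the hypothesized logit shift, |h| <= log(Gamma)
         """
         e_h = 1.0 / (1.0 + np.exp(h - g_hat))    # shifted propensity score
         w = A / e_h                               # inverse probability weights
         y_obs = np.where(A == 1, Y, 0.0)          # Y enters only for observed units
         return np.sum(w * y_obs) / np.sum(w)      # Hajek (stabilized) form
     ```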

 13. Bootstrapping sensitivity analysis
     Point-identified parameter (Efron's bootstrap):
         Point estimator  ==[ bootstrap ]==>  Confidence interval
     Partially identified parameter (an analogy):
         Extrema estimator  ==[ optimization + percentile bootstrap + minimax inequality ]==>  Confidence interval
     A simple procedure for simultaneous sensitivity analysis
     1. Generate B random resamples of the data. For each resample, compute the extrema of the IPW estimates over M_{β0}(Γ).
     2. Construct the confidence interval using L = Q_{α/2} of the B minima and U = Q_{1−α/2} of the B maxima.
     Theorem: [L, U] achieves Goal 2 for M_{β0}(Γ) asymptotically.
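     A sketch of this two-step procedure in numpy (illustrative only; `extrema_fn` is an assumed helper that returns the minimum and maximum of the stabilized IPW estimate over the sensitivity model — one possible implementation is sketched after the computation slide below):

     ```python
     import numpy as np

     def bootstrap_sensitivity_ci(A, Y, g_hat, gamma, extrema_fn,
                                  B=1000, alpha=0.05, seed=0):
         """Percentile-bootstrap interval [L, U] intended to cover the partially
         identified region {mu^(h) : |h| <= gamma} with probability ~ 1 - alpha.

         extrema_fn(A, Y, g_hat, gamma) must return (min, max) of the
         stabilized IPW estimate over the sensitivity model (assumed helper).
         """
         rng = np.random.default_rng(seed)
         n = len(A)
         mins, maxs = np.empty(B), np.empty(B)
         for b in range(B):
             idx = rng.integers(0, n, size=n)      # resample with replacement
             mins[b], maxs[b] = extrema_fn(A[idx], Y[idx], g_hat[idx], gamma)
         L = np.quantile(mins, alpha / 2)           # Q_{alpha/2} of the B minima
         U = np.quantile(maxs, 1 - alpha / 2)       # Q_{1-alpha/2} of the B maxima
         return L, U
     ```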

 14. Proof of the Theorem
     Partially identified parameter, three ideas: (1) percentile bootstrap, (2) minimax inequality, (3) optimization.
         Extrema estimator  ==>  Confidence interval
     1. The sampling variability of μ̂^(h) can be captured by the bootstrap. The percentile bootstrap CI is
            [ Q_{α/2}( μ̂^(h)_b ), Q_{1−α/2}( μ̂^(h)_b ) ],
        where b indexes the bootstrap resamples.
     2. Generalized minimax inequality (so the percentile bootstrap CI of the extrema contains the union CI):
            Q_{α/2}( inf_h μ̂^(h)_b ) ≤ inf_h Q_{α/2}( μ̂^(h)_b ) ≤ sup_h Q_{1−α/2}( μ̂^(h)_b ) ≤ Q_{1−α/2}( sup_h μ̂^(h)_b ).
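     A small numerical illustration of the generalized minimax inequality (my own sanity check, not part of the talk): for any matrix of bootstrap replicates indexed by (resample b, shift h), the quantiles of the row-wise extrema bracket the extrema of the column-wise quantiles.

     ```python
     import numpy as np

     rng = np.random.default_rng(1)
     vals = rng.normal(size=(1000, 50))   # vals[b, j] plays the role of mu_hat^(h_j) on resample b
     a = 0.025

     lo_outer = np.quantile(vals.min(axis=1), a)          # Q_a of the per-resample minima
     lo_inner = np.min(np.quantile(vals, a, axis=0))      # inf_h of the per-h Q_a
     hi_inner = np.max(np.quantile(vals, 1 - a, axis=0))  # sup_h of the per-h Q_{1-a}
     hi_outer = np.quantile(vals.max(axis=1), 1 - a)      # Q_{1-a} of the per-resample maxima

     assert lo_outer <= lo_inner <= hi_inner <= hi_outer  # the minimax inequality chain
     ```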

 15. Computation
     Partially identified parameter, third idea: optimization.
         Extrema estimator  ==>  Confidence interval
     3. Computing the extrema of μ̂^(h) is a linear fractional program. Let z_i = e^{h(X_i, Y_i)}; we just need to solve
            max or min   Σ_{i=1}^n A_i Y_i [ 1 + z_i e^{−ĝ(X_i)} ]  /  Σ_{i=1}^n A_i [ 1 + z_i e^{−ĝ(X_i)} ]
            subject to   z_i ∈ [Γ^{−1}, Γ],  i = 1, ..., n.
     ◮ This can be converted to a linear program.
     ◮ Moreover, the solution z must have the same/opposite order as Y, so the time complexity can be reduced to O(n) (optimal).
     The role of the bootstrap
     Compared to the union method, the workflow is greatly simplified:
     1. No need to derive σ^(h) analytically (though we could).
     2. No need to optimize σ^(h) (which is very challenging).
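     A sketch of the extrema computation that exploits this monotone structure (my own implementation under the slide's conventions; it sorts the observed Y's and scans every threshold assignment of z_i ∈ {Γ^{−1}, Γ}, so it is O(n log n) because of the sort rather than the O(n) claimed with a more careful implementation):

     ```python
     import numpy as np

     def extrema_ipw(A, Y, g_hat, gamma):
         """Min and max of the stabilized IPW estimate over the marginal
         sensitivity model, i.e. over z_i = exp(h_i) in [exp(-gamma), exp(gamma)].

         Because the weight 1 + z_i * exp(-g_hat_i) is increasing in z_i and the
         objective is a weighted average of Y, the optimum puts z at one endpoint
         below some threshold of Y and at the other endpoint above it.
         """
         z_lo, z_hi = np.exp(-gamma), np.exp(gamma)
         obs = A == 1
         y = Y[obs]
         q = np.exp(-g_hat[obs])                 # exp(-g_hat(X_i)) for observed units
         order = np.argsort(y)
         y, q = y[order], q[order]

         def scan(z_small_y, z_large_y):
             """Objective values when the k smallest Y's get z_small_y and the
             rest get z_large_y, for every threshold k = 0, ..., len(y)."""
             sw = np.sum(1 + z_large_y * q)       # start with all units at z_large_y
             swy = np.sum((1 + z_large_y * q) * y)
             vals = [swy / sw]
             for k in range(len(y)):              # move unit k to z_small_y
                 d = (z_small_y - z_large_y) * q[k]
                 sw += d
                 swy += d * y[k]
                 vals.append(swy / sw)
             return vals

         max_val = max(scan(z_lo, z_hi))          # upweight the large Y's
         min_val = min(scan(z_hi, z_lo))          # upweight the small Y's
         return min_val, max_val
     ```

     Passing this function as `extrema_fn` to the `bootstrap_sensitivity_ci` sketch above completes the illustrative workflow.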
