Bootstrapping Sensitivity Analysis - Qingyuan Zhao - PowerPoint PPT Presentation

1. Bootstrapping Sensitivity Analysis

Qingyuan Zhao
Statistical Laboratory, University of Cambridge
August 3, 2020 @ JSM

2. Sensitivity analysis

The broader concept [Saltelli et al., 2004]
◮ Sensitivity analysis is "the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs".
◮ Model inputs may be any factor that "can be changed in a model prior to its execution", including "structural and epistemic sources of uncertainty".

In observational studies
◮ The most typical question is: how do the qualitative and/or quantitative conclusions of the observational study change if the no-unmeasured-confounding assumption is violated?

3. Sensitivity analysis for observational studies

State of the art
◮ Gazillions of methods specifically designed for different problems.
◮ Various forms of statistical guarantees.
◮ Often not straightforward to interpret.

Goals of this talk
1. What is the common structure behind various methods for sensitivity analysis?
2. Can we bootstrap sensitivity analysis?

4. What is a sensitivity model?

General setup
◮ Observed data O ⟹ (infer) distribution of the full data F.
◮ Prototypical example: observe iid copies of O = (X, A, Y) from the underlying full data F = (X, A, Y(0), Y(1)), where A is a binary treatment, X is covariates, Y is outcome.

An abstraction
A sensitivity model is a family of distributions F_{θ,η} of F that satisfies:
1. Augmentation: setting η = 0 corresponds to a primary analysis assuming no unmeasured confounders.
2. Model identifiability: given η, the implied marginal distribution O_{θ,η} of the observed data O is identifiable.

Statistical problem
Given η (or the range of η), use the observed data to make inference about some causal parameter β = β(θ, η).

5. Understanding sensitivity models

Observational equivalence
◮ F_{θ,η} and F_{θ′,η′} are said to be observationally equivalent if O_{θ,η} = O_{θ′,η′}. We write this as F_{θ,η} ≃ F_{θ′,η′}.
◮ Equivalence class: [F_{θ,η}] = { F_{θ′,η′} | F_{θ,η} ≃ F_{θ′,η′} }.

Types of sensitivity models
Testable models: when F_{θ,η} is not rich enough, [F_{θ,η}] is a singleton and η can be identified from the observed data (should be avoided in practice).
Global models: for any (θ, η) and η′, there exists F_{θ′,η′} ≃ F_{θ,η}.
Separable models: for any (θ, η), F_{θ,η} ≃ F_{θ,0}.

6. A visualization

[Figure: equivalence classes [F_{θ,η}] drawn in the (θ, η) plane. Left: global sensitivity models; Right: separable sensitivity models.]

7. Statistical inference

Modes of inference
1. Point identified: sensitivity analysis is performed at a fixed η.
2. Partially identified: sensitivity analysis is performed simultaneously over η ∈ H for a given range H.

Statistical guarantees of interval estimators
1. A confidence interval [C_L(O_{1:n}; η), C_U(O_{1:n}; η)] satisfies
   inf_{θ₀,η₀} P_{θ₀,η₀}( β(θ₀, η₀) ∈ [C_L(η₀), C_U(η₀)] ) ≥ 1 − α.
2. A sensitivity interval [C_L(O_{1:n}; H), C_U(O_{1:n}; H)] satisfies
   inf_{θ₀,η₀} P_{θ₀,η₀}( β(θ₀, η₀) ∈ [C_L(H), C_U(H)] ) ≥ 1 − α.   (1)

They look almost the same, but because the latter interval only depends on H, (1) is actually equivalent to
   inf_{θ₀,η₀} inf_{F_{θ,η} ≃ F_{θ₀,η₀}} P_{θ₀,η₀}( β(θ, η) ∈ [C_L(H), C_U(H)] ) ≥ 1 − α.
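To spell out the asserted equivalence, here is a short sketch of the argument (not a full proof; it uses only the definitions above: the sensitivity interval is a function of the observed data alone, and observationally equivalent models generate the same observed-data distribution):

```latex
% For any F_{\theta,\eta} \simeq F_{\theta_0,\eta_0}, the event
% \{\beta(\theta,\eta) \in [C_L(H), C_U(H)]\} depends on the data only, and
% O_{\theta,\eta} = O_{\theta_0,\eta_0}, so the event has the same probability
% under P_{\theta_0,\eta_0} as under P_{\theta,\eta}. Therefore
\[
\inf_{\theta_0,\eta_0}\;
\inf_{\mathcal{F}_{\theta,\eta}\simeq\mathcal{F}_{\theta_0,\eta_0}}
  \mathbb{P}_{\theta_0,\eta_0}\!\left(\beta(\theta,\eta)\in[C_L(H),C_U(H)]\right)
= \inf_{\theta,\eta}\;
  \mathbb{P}_{\theta,\eta}\!\left(\beta(\theta,\eta)\in[C_L(H),C_U(H)]\right).
\]
% The right-hand side is exactly the guarantee (1) after relabelling
% (\theta,\eta) as (\theta_0,\eta_0): every pair appears in its own
% equivalence class, so the two coverage statements coincide.
```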

8. Approaches to sensitivity analysis

◮ Point identified sensitivity analysis is basically the same as a primary analysis with a known "offset" η.
◮ Partially identified sensitivity analysis is much harder. Let F_{θ₀,η₀} be the truth. The fundamental problem is to make inference about
   inf_{η ∈ H} { β(θ, η) | F_{θ,η} ≃ F_{θ₀,η₀} }   and   sup_{η ∈ H} { β(θ, η) | F_{θ,η} ≃ F_{θ₀,η₀} }.

Method 1: Solve the population optimization problems analytically.
◮ Not always feasible.
Method 2: Solve the sample approximation problem and use asymptotic normality.
◮ Central limit theorems not always true or established.
Method 3: Take the union of confidence intervals
   [C_L(H), C_U(H)] = ⋃_{η ∈ H} [C_L(η), C_U(η)].
◮ By the union bound, this is a (1 − α)-sensitivity interval if all [C_L(η), C_U(η)] are (1 − α)-confidence intervals.

9. Computational challenges for Method 3

   [C_L(H), C_U(H)] = ⋃_{η ∈ H} [C_L(η), C_U(η)].

◮ Using asymptotic theory, it is often not difficult to construct asymptotic confidence intervals of the form
   [C_L(η), C_U(η)] = β̂(η) ∓ z_{α/2} · σ̂(η) / √n.
◮ Unlike Method 2, which only needs to optimize β̂(η), Method 3 further needs to optimize the usually much more complicated σ̂(η) over η ∈ H (a grid-based sketch follows below).
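A minimal sketch of what Method 3 involves computationally, approximating the union over H by a grid; beta_hat(eta, data) and sigma_hat(eta, data) are hypothetical user-supplied callables (not from the slides) returning the point estimate and its asymptotic standard deviation at a fixed η:

```python
import numpy as np
from scipy.stats import norm

def union_sensitivity_interval(data, eta_grid, beta_hat, sigma_hat, alpha=0.05):
    """Method 3: union of per-eta asymptotic confidence intervals over a grid of H.

    beta_hat and sigma_hat are placeholders for problem-specific estimators;
    in practice the costly part is deriving and optimizing sigma_hat over eta.
    """
    z = norm.ppf(1 - alpha / 2)
    n = len(data)
    lowers, uppers = [], []
    for eta in eta_grid:
        b, s = beta_hat(eta, data), sigma_hat(eta, data)
        lowers.append(b - z * s / np.sqrt(n))
        uppers.append(b + z * s / np.sqrt(n))
    return min(lowers), max(uppers)  # endpoints of the union interval
```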

10. Method 4: Percentile bootstrap

1. For fixed η, use the percentile bootstrap confidence interval (b indexes the data resamples):
   [C_L(η), C_U(η)] = [ Q_{α/2}( β̂_b(η) ), Q_{1−α/2}( β̂_b(η) ) ].
2. Use the generalized minimax inequality to interchange quantile and infimum/supremum:
   Q_{α/2}( inf_η β̂_b(η) ) ≤ inf_η Q_{α/2}( β̂_b(η) ) ≤ sup_η Q_{1−α/2}( β̂_b(η) ) ≤ Q_{1−α/2}( sup_η β̂_b(η) ).
   The outer pair is the percentile bootstrap sensitivity interval; the inner pair is the union sensitivity interval, so the former always contains the latter.

Advantages
◮ Computation is reduced to repeating Method 2 over data resamples (a sketch follows below).
◮ Only need a coverage guarantee for [C_L(η), C_U(η)] at fixed η.
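A sketch of Method 4, assuming a hypothetical solver estimate_extrema(data) that returns (inf over η ∈ H, sup over η ∈ H) of the point estimate β̂(η) on one data set (for the IPW estimator of slide 15 it would wrap the linear fractional program sketched there):

```python
import numpy as np

def percentile_bootstrap_si(data, estimate_extrema, n_boot=1000, alpha=0.05, seed=0):
    """Method 4: percentile bootstrap sensitivity interval.

    estimate_extrema(data) is a placeholder for a problem-specific optimizer
    returning the extrema of beta_hat(eta) over eta in H.  The interval is
    [Q_{alpha/2} of the bootstrapped infima, Q_{1-alpha/2} of the bootstrapped suprema].
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    lows = np.empty(n_boot)
    highs = np.empty(n_boot)
    for b in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]  # resample units with replacement
        lows[b], highs[b] = estimate_extrema(resample)
    return np.quantile(lows, alpha / 2), np.quantile(highs, 1 - alpha / 2)
```

The only model-specific ingredient is the extrema computation, which is exactly the point of the minimax inequality above: the quantiles are taken after the optimization, not before.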

11. Bootstrapping sensitivity analysis

Point-identified parameter: Efron's bootstrap
   Point estimator ⟹ (bootstrap) Confidence interval

Partially identified parameter: three ideas
   Extrema estimator ⟹ (optimization + percentile bootstrap + minimax inequality) Sensitivity interval

Rest of the talk
Apply this idea to IPW estimators for a marginal sensitivity model.

12. Our sensitivity model

◮ Consider the prototypical example: A is a binary treatment, X is covariates, Y is outcome.
◮ U "summarizes" unmeasured confounding, so A ⊥⊥ Y(0), Y(1) | X, U.
◮ Let e₀(x) = P₀(A = 1 | X = x) and e(x, u) = P(A = 1 | X = x, U = u).

Marginal sensitivity models
   E_M(Γ) = { e(x, u) : 1/Γ ≤ OR( e(x, u), e₀(x) ) ≤ Γ, ∀ x ∈ X, u }.
◮ Compare this to the Rosenbaum [2002] model:
   E_R(Γ) = { e(x, u) : 1/Γ ≤ OR( e(x, u₁), e(x, u₂) ) ≤ Γ, ∀ x ∈ X, u₁, u₂ }.
◮ Tan [2006] first considered the marginal model, but he did not consider statistical inference in finite samples.
◮ Relationship between the two models: E_M(√Γ) ⊆ E_R(Γ) ⊆ E_M(Γ).¹

¹ The second part needs "compatibility": e(x, u) should marginalize to e₀(x).

13. Parametric extension

◮ In practice, the propensity score e₀(X) = P₀(A = 1 | X) is often estimated by a parametric model.

Parametric marginal sensitivity models
   E_M(Γ, β₀) = { e(x, u) : 1/Γ ≤ OR( e(x, u), e_{β₀}(x) ) ≤ Γ, ∀ x ∈ X, u }.
◮ e_{β₀}(x) is the best parametric approximation to e₀(x). This sensitivity model covers both
1. Model misspecification, that is, e_{β₀}(x) ≠ e₀(x); and
2. Missing not at random, that is, e₀(x) ≠ e(x, u).

14. Logistic representations

1. Rosenbaum's sensitivity model:
   logit( e(x, u) ) = g(x) + u log Γ, where 0 ≤ U ≤ 1.
2. Marginal sensitivity model:
   logit( e_η(x, u) ) = logit( e₀(x) ) + η(x, u), where η ∈ H_Γ = { η(x, u) : ‖η‖_∞ = sup |η(x, u)| ≤ log Γ } (a small numerical check is sketched below).
3. Parametric marginal sensitivity model:
   logit( e_η(x, u) ) = logit( e_{β₀}(x) ) + η(x, u), where η ∈ H_Γ.
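A small numerical sketch of representation 2, checking that a logit-scale shift η with ‖η‖_∞ ≤ log Γ keeps the odds ratio between e_η(x, u) and e₀(x) inside [1/Γ, Γ]; the propensity values are made up for illustration:

```python
import numpy as np
from scipy.special import expit, logit

def shifted_propensity(e0, eta):
    """Marginal sensitivity model: logit(e_eta) = logit(e0) + eta."""
    return expit(logit(e0) + eta)

gamma = 2.0
e0 = np.array([0.1, 0.3, 0.5, 0.8])  # fitted propensity scores e_0(x), illustrative values
rng = np.random.default_rng(0)
eta = rng.uniform(-np.log(gamma), np.log(gamma), size=e0.size)  # any eta in H_Gamma

e_eta = shifted_propensity(e0, eta)
odds_ratio = (e_eta / (1 - e_eta)) / (e0 / (1 - e0))
# The odds ratio equals exp(eta), so it always lies in [1/Gamma, Gamma].
assert np.all((odds_ratio >= 1 / gamma - 1e-12) & (odds_ratio <= gamma + 1e-12))
```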

15. Computation

Bootstrapping partially identified sensitivity analysis:
   Extrema estimator ⟹ (optimization + percentile bootstrap + minimax inequality) Sensitivity interval

◮ Stabilized inverse-probability weighted (IPW) estimator for β = E[Y(1)]:
   β̂(η) = [ (1/n) Σ_{i=1}^n A_i / ê_η(X_i, U_i) ]^{-1} · (1/n) Σ_{i=1}^n A_i Y_i / ê_η(X_i, U_i),
   where ê_η can be obtained by plugging in an estimator of β₀.
◮ Computing the extrema of β̂(η) is a linear fractional program: let h_i = exp{ −η(X_i, U_i) } and g_i = 1 / e_{β̂₀}(X_i), then
   maximize or minimize   Σ_{i=1}^n A_i Y_i [1 + h_i (g_i − 1)] / Σ_{i=1}^n A_i [1 + h_i (g_i − 1)]
   subject to             h_i ∈ [Γ⁻¹, Γ], i = 1, ..., n.
   This can be converted to a linear program and can in fact be solved in O(n) time (the optimal rate); a simple threshold-scan sketch follows below.
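A sketch of this optimization under the notation above (y and g hold the outcomes and inverse fitted propensities of the treated units; ipw_extrema is a name introduced here). Because the objective is a weighted mean of y with positive weights w_i = 1 + h_i (g_i − 1), each extremum puts h_i at a boundary depending on whether Y_i lies above or below the optimal value, so scanning thresholds in sorted order finds it; this simple version is O(n log n) rather than the O(n) rate mentioned on the slide:

```python
import numpy as np

def ipw_extrema(y, g, gamma):
    """Extrema of sum(Y_i * w_i) / sum(w_i) over the treated units,
    where w_i = 1 + h_i * (g_i - 1) and each h_i ranges over [1/gamma, gamma].

    y : outcomes Y_i of the treated units (numpy array)
    g : 1 / e_{beta_hat_0}(X_i) for the same units, so g_i >= 1
    """
    order = np.argsort(y)
    y, g = y[order], g[order]
    w_lo = 1 + (g - 1) / gamma   # weight when h_i = 1/gamma
    w_hi = 1 + (g - 1) * gamma   # weight when h_i = gamma

    def scan(w_below, w_above):
        # Threshold k: the k smallest outcomes get w_below, the rest get w_above.
        num = np.concatenate(([0.0], np.cumsum(y * w_below))) \
            + np.concatenate((np.cumsum((y * w_above)[::-1])[::-1], [0.0]))
        den = np.concatenate(([0.0], np.cumsum(w_below))) \
            + np.concatenate((np.cumsum(w_above[::-1])[::-1], [0.0]))
        return num / den

    # Minimum up-weights the small outcomes; maximum up-weights the large ones.
    return scan(w_hi, w_lo).min(), scan(w_lo, w_hi).max()
```

In the bootstrap sketch after slide 10, estimate_extrema(resample) could simply call ipw_extrema on the treated rows of each resample.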

16. Example: fish consumption and blood mercury

◮ 873 controls: ≤ 1 serving of fish per month.
◮ 234 treated: ≥ 12 servings of fish per month.
◮ Covariates: gender, age, income (very imbalanced), race, education, ever smoked, # cigarettes.

Implementation details
◮ Rosenbaum's method: 1-1 matching, confidence interval constructed by the Hodges-Lehmann method (assuming the causal effect is constant).
◮ Our method (percentile bootstrap): stabilized IPW for the ATT, with and without augmentation by a linear outcome regression.
