Pseudo-Bayesian Inference for Complex Survey Data

Matt Williams (1) and Terrance Savitsky (2)
(1) National Center for Science and Engineering Statistics, National Science Foundation, mrwillia@nsf.gov
(2) Office of Survey Methods Research, Bureau of Labor Statistics, Savitsky.Terrance@bls.gov

University of Michigan, April 8, 2020
Thank you!
◮ Terrance Savitsky for being a great collaborator and mentor.
◮ Brady West and Jennifer Sinibaldi for making this connection.
◮ Jill Esau for orchestrating.
◮ You all for sharing your time today!
Bio
1. Work
◮ 9 years as a mathematical statistician for the federal government: USDA, HHS, NSF
◮ Sample design, weighting, imputation, estimation, disclosure limitation (production and methods development)
2. Consulting
◮ International surveys on agricultural production (USAID) and on vaccination knowledge, attitudes, and behaviors (UNICEF)
3. Research (ORCID: 0000-0001-8894-1240)
◮ Constrained optimization for survey applications (weight adjustment, benchmarking model estimates)
◮ Applying Bayesian inference methods to data from complex surveys
Outline
1 Informative Sampling (Savitsky and Toth, 2016)
2 Theory and Examples
  Consistency (Williams and Savitsky, 2020)
  Uncertainty Quantification (Williams and Savitsky, in press)
3 Implementation Details
  Model Fitting
  Variance Estimation
4 Related and Current Works
Example: Informative Sampling
◮ Take a sample from the U.S. population of business establishments
◮ Single-stage, fixed-size, pps sampling design
◮ y = (e.g., Hires, Separations)
◮ Size variable is total employment, x
◮ y is not independent of x (y ⊥̸ x)
◮ B = 500 Monte Carlo samples at each of n_ν = (100, 500, 1500, 2500) establishments
Distributions of y in Informative Samples
Figure: Distributions of response values (Hires and Separations) for the population and for informative samples of size 100 and 500.
Population Inference from Informative Samples
◮ Goal: perform inference about a finite population generated from an unknown model, P_θ0(y).
◮ Data: collected under a complex sampling design distribution, P_ν(δ)
◮ Probabilities of inclusion π_i = Pr(δ_i = 1 | y) are often (purposefully) associated with the variable of interest
◮ Such sampling designs are "informative": the balance of information in the sample ≠ the balance in the population.
◮ Biased Estimation: estimating P_θ0(y) without accounting for P_ν(δ) induces bias.
◮ Use inverse probability weights w_i = 1/π_i to mitigate bias.
◮ Incorrect Uncertainty Quantification: failure to account for the dependence induced by P_ν(δ) leads to standard errors and confidence intervals that are the wrong size.
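The bias-mitigation step can be illustrated with a weighted (Hajek-style) mean using w_i = 1/π_i. A minimal sketch, with an invented gamma size measure and linear outcome model (none of these constants come from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50_000
x = rng.gamma(shape=2.0, scale=3.0, size=N)      # size measure
y = 2.0 * x + rng.normal(scale=2.0, size=N)      # outcome tied to size: informative design

pi = np.minimum(1000 * x / x.sum(), 1.0)         # pps inclusion probabilities, E[n] ~ 1000
sampled = rng.uniform(size=N) < pi               # Poisson sampling
w = 1.0 / pi[sampled]                            # inverse probability weights w_i = 1/pi_i

naive = y[sampled].mean()                        # ignores the design: biased upward
weighted = np.sum(w * y[sampled]) / np.sum(w)    # weighted (Hajek) estimate of the pop mean
print(y.mean(), naive, weighted)
```

The unweighted mean overshoots because the sample over-represents large-x (hence large-y) units; the inverse probability weights undo that over-representation.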
Why Bayes?
◮ Allows more complex, non-parametric and semi-parametric models
◮ Use hierarchical modeling to capture rich dependence in data
◮ Small-sample properties available from the posterior distribution
◮ Full uncertainty quantification
◮ Gold standard for imputation
Pseudo Posterior
◮ Pseudo posterior ∝ pseudo likelihood × prior

p^π(θ | y, w̃) ∝ [ ∏_{i=1}^{n} p(y_i | θ)^{w̃_i} ] p(θ)

w_i := 1/π_i,   w̃_i = n w_i / ∑_{i=1}^{n} w_i,   i = 1, …, n
Similar Posterior Geometry

N_P( y_i | μ_i, Φ^{-1} )^{w_i} ∝ N_P( y_i | μ_i, [w_i Φ]^{-1} )

◮ Normalize the weights so that ∑_{i=1}^{n} w_i = n, to preserve the scale of the posterior
Pseudo Posterior Contraction – Count Data

Model: y_{id} ~ (ind) Pois(exp(ψ_{id})), with Ψ_{N×D} = X_{N×P} B_{P×D} + E_{N×D} and E distributed matrix normal with row covariance I_N and column precision Λ_{D×D}.

Figure: Distribution of values within the 95% CI for the coefficients (Emp_Hires and Emp_Seps), comparing the population fit, the weighted pseudo posterior, the unweighted ("ignore") fit, and SRS, at sample sizes 500, 1000, 1500, and 2500.
Frequentist Consistency of a (Pseudo) Posterior
◮ The estimated distribution p^π(θ | y, w̃) collapses around the generating parameter θ_0 with increasing population (N_ν) and sample (n_ν) sizes.
◮ Evaluated with respect to the joint distribution of the population generation P_θ0(y) and the sample inclusion indicators P_ν(δ).
◮ Conditions on the model P_θ0(y) (standard):
  ◮ Complexity of the model limited by the sample size
  ◮ Prior distribution not too restrictive (e.g., not a point mass)
◮ Conditions on the sampling design P_ν(δ) (new):
  ◮ Every unit in the population has a non-zero probability of inclusion ⟹ finite weights
  ◮ Dependence restricted to countable blocks of bounded size ⟹ arbitrary dependence within clusters, but approximate independence between clusters.
Simulation Example: Three-Stage Sample
Area (PPS), Household (Systematic, sorted by size), Individual (PPS)

Figure: Factorization matrix (π_ij / (π_i π_j) − 1) for two PSUs: magnitude (left) and sign (right). Systematic sampling (π_ij = 0); clustering and PPS sampling (π_ij > π_i π_j); independent first-stage sample (π_ij = π_i π_j).
Simulation Examples: Logistic Regression
◮ y_i | μ_i ~ (ind) Bern(F_l(μ_i)), i = 1, …, N
◮ μ = −1.88 + 1.0 x_1 + 0.5 x_2
◮ The x_1 and x_2 distributions are N(0, 1) and Exponential with rate r = 1/5
◮ The size measure used for sample selection is x̃_2 = x_2 − min(x_2) + 1, but neither x̃_2 nor x_2 is available to the analyst.
◮ The intercept is chosen so that the median of μ ≈ 0, hence the median of F_l(μ) ≈ 0.5.
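The population generation described above can be sketched directly (the seed and population size N are arbitrary choices; F_l is the logistic CDF):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000

def F_l(mu):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-mu))

# Covariates as on the slide: x1 ~ N(0, 1); x2 ~ Exponential with rate 1/5 (mean 5).
x1 = rng.normal(size=N)
x2 = rng.exponential(scale=5.0, size=N)

# Linear predictor with the slide's coefficients.
mu = -1.88 + 1.0 * x1 + 0.5 * x2
y = rng.binomial(1, F_l(mu))

# Size measure for selection, shifted to be strictly positive;
# in the simulation it is withheld from the analyst.
x2_tilde = x2 - x2.min() + 1.0
print(np.median(F_l(mu)))   # near 0.5 by choice of intercept
```

The shifted size measure x̃_2 then drives the pps stages of the three-stage design, making the sample informative for y.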
Simulation Example: Three-Stage Sample (cont.)

Figure: The marginal estimate of μ = f(x_1): population curve, sample with equal weights, and inverse probability weights. Top to bottom: estimated curve, log bias, log MSE. Left to right: sample size (50 to 800).
Asymptotic Variances
◮ Let ℓ_θ(y) = log p(y | θ).
◮ Rely on the variance and expected curvature of the score function:

ℓ̇_{θ_0} = ∂ℓ/∂θ |_{θ=θ_0},   ℓ̈_{θ_0} = ∂²ℓ/∂θ² |_{θ=θ_0}

◮ H_{θ_0} = −(1/N_ν) ∑_{i∈U_ν} E_{P_{θ_0}} [ ℓ̈_{θ_0}(y_{νi}) ]
◮ J_{θ_0} = (1/N_ν) ∑_{i∈U_ν} E_{P_{θ_0}} [ ℓ̇_{θ_0}(y_{νi}) ℓ̇_{θ_0}(y_{νi})^T ]
◮ Under correctly specified models:
  ◮ H_{θ_0} = J_{θ_0} (Bartlett's second identity)
  ◮ Posterior variance N_ν V(θ | y) = H_{θ_0}^{-1}, the same as the variance of the MLE (Bernstein–von Mises)
Scaling and Warping of the Pseudo MLE
◮ Misspecified (under-specified) full joint sampling distribution P_ν(δ)
◮ Failure of Bartlett's second identity for the composite likelihood
◮ Asymptotic covariance: H_{θ_0}^{-1} J^π_{θ_0} H_{θ_0}^{-1}
◮ Simple random sampling: J^π_{θ_0} = J_{θ_0}
◮ Unequal weighting: J^π_{θ_0} ≥ J_{θ_0}

J^π_{θ_0} = J_{θ_0} + (1/N_ν) ∑_{i=1}^{N_ν} E_{P_{θ_0}} [ (1/π_{νi} − 1) ℓ̇_{θ_0}(y_{νi}) ℓ̇_{θ_0}(y_{νi})^T ]

◮ The shape of the asymptotic distribution is warped by unequal weighting ∝ 1/π_{νi}
◮ A less efficient (cluster) sampling design: J^π_{θ_0} ≥ J_{θ_0}
◮ A more efficient (stratified) sampling design: J^π_{θ_0} ≤ J_{θ_0}
Asymptotic Covariances Differ
◮ Pseudo MLE: H_{θ_0}^{-1} J^π_{θ_0} H_{θ_0}^{-1} (robust)
◮ Pseudo posterior: H_{θ_0}^{-1} (model-based)
◮ The unadjusted pseudo posterior will give the wrong coverage for uncertainty regions.
Adjust Pseudo Posterior Draws to the Sandwich
◮ Let θ̂_m denote the sampled pseudo posterior draws, m = 1, …, M, with mean θ̄

θ̂^a_m = (θ̂_m − θ̄) R_2^{-1} R_1 + θ̄

◮ where R_1^T R_1 = H_{θ_0}^{-1} J^π_{θ_0} H_{θ_0}^{-1}
◮ and R_2^T R_2 = H_{θ_0}^{-1}
Adjustment Procedure
◮ Procedure to compute the adjustment θ̂^a_m:
  ◮ Input θ̂_m drawn from a single run of MCMC
  ◮ Re-sample the data under the sampling design: draw PSUs (clusters) without replacement
  ◮ Compute Ĥ_{θ_0} and Ĵ^π_{θ_0}
◮ Expectations are with respect to P_{θ_0}, P_ν
◮ Let P^π_{N_ν} = (1/N_ν) ∑_{i=1}^{N_ν} (δ_{νi}/π_{νi}) δ(y_{νi})
◮ J^π_{θ_0} = Var_{P_{θ_0}, P_ν} ( P^π_{N_ν} ℓ̇_{θ_0} )
◮ H^π_{θ_0} = −E_{P_{θ_0}, P_ν} ( P^π_{N_ν} ℓ̈_{θ_0} ) = H_{θ_0}