
Calibrated Bayes: an attractive framework for official statistics in the 21st century



  1. Calibrated Bayes: an attractive framework for official statistics in the 21st century Roderick J. Little

  2. Overview
     • Design-based versus model-based survey inference
     • Current orthodoxy: design-model compromise
       – Strengths and drawbacks
     • An alternative: Calibrated Bayes
     • Two US Census Bureau applications
       – Disclaimer: views are mine, not US Census Bureau
     NTTS 2015: Calibrated Bayes

  3. Overview
     • Design-based versus model-based survey inference
     • Current orthodoxy: design-model compromise
       – Strengths and drawbacks
     • An alternative: Calibrated Bayes
     • Two US Census Bureau applications
       – Disclaimer: views are mine, not US Census Bureau

  4. Survey estimation
     • Design-based inference: population values are fixed; inference is based on the probability distribution of sample selection. This assumes that we have a probability sample (or “quasi-randomization”, where we pretend that we have one)
     • Model-based inference: survey variables are assumed to come from a statistical model; probability sampling is not the basis for inference, but is useful for making the sample selection ignorable (see e.g. Gelman et al., 2003; Little 2004)

  5. Design vs model-based survey inference
     • Two main variants of model-based inference:
       – Superpopulation models: frequentist inference based on repeated samples from a “superpopulation” model
       – Bayes: add prior distribution for parameters; inference about finite population quantities or parameters based on posterior distribution
     • A fascinating part of the more general debate about frequentist versus Bayesian inference in statistics at large:
       – Design-based inference is inherently frequentist
       – Purest form of model-based inference is Bayes

  6. Design-based inference
     • $Y = (Y_1, \ldots, Y_N)$ = population values (fixed); $Z$ = design variables
     • $Q = Q(Y, Z)$ = finite population quantity
     • $I = (I_1, \ldots, I_N)$ = sample inclusion indicators (random), with $I_i = 1$ if unit $i$ is included in the sample and $I_i = 0$ otherwise
     • $Y_{inc}$ = part of $Y$ included in the survey
     • $\hat{q} = \hat{q}(Y_{inc}, I, Z)$ = sample estimate of $Q$
     • $\hat{V} = \hat{V}(Y_{inc}, I, Z)$ = sample estimate of $V$, the variance of $\hat{q}$
     • $\left(\hat{q} - 1.96\sqrt{\hat{V}},\ \hat{q} + 1.96\sqrt{\hat{V}}\right)$ = 95% confidence interval for $Q$
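
The notation above can be made concrete with a minimal Python sketch. It assumes simple random sampling without replacement and takes $Q$ to be the population mean; neither choice comes from the slide, and the function name `srs_mean_ci` is hypothetical.

```python
import math
import random

def srs_mean_ci(y_inc, N, z=1.96):
    """Design-based estimate of the population mean Q.

    Assumes simple random sampling without replacement (an illustrative
    design, not taken from the slides). Returns (q_hat, v_hat, (lo, hi)).
    """
    n = len(y_inc)
    q_hat = sum(y_inc) / n
    # Sample variance of the included values Y_inc.
    s2 = sum((y - q_hat) ** 2 for y in y_inc) / (n - 1)
    # Estimated variance of q_hat, with the finite population correction.
    v_hat = (1 - n / N) * s2 / n
    half_width = z * math.sqrt(v_hat)
    return q_hat, v_hat, (q_hat - half_width, q_hat + half_width)

if __name__ == "__main__":
    random.seed(0)
    population = [random.gauss(50, 10) for _ in range(1000)]  # fixed Y
    sample = random.sample(population, 100)                   # random I
    print(srs_mean_ci(sample, N=len(population)))
```

Only the inclusion indicators are random here: the population list is generated once and then treated as fixed, which is the design-based point of view.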

  7. Choice of $\hat{q}$
     Seek good design-based properties:
     • Design unbiasedness: $E(\hat{q} \mid Y) = Q$ (too strong)
     • Or weaker, design consistency: $\hat{q} \to Q$ as the sample size gets large
     It is natural to seek an estimate that is design-efficient. However, this kind of optimality is not possible without a model (Horvitz and Thompson 1952, Godambe 1955).
     There are many choices of design-consistent estimates. Many survey estimates are motivated by implicit models:
     • Regression model → regression estimator
     • Ratio model → ratio estimator, etc.

  8. Limitations of design-based approach
     • Inference is based on probability sampling, but true probability samples are harder and harder to come by:
       – Noncontact and nonresponse are increasing
       – Face-to-face interviews are increasingly expensive
       – A high proportion of available information is now not based on probability samples (e.g. internet, administrative data)
     • Theory is basically asymptotic: limited tools for small samples, e.g. small area estimation

  9. Design-based methods live in the land of asymptotia
     [Figure: cartoon landscape with labels “Asymptotia Highlands” and “Murky sub-asymptotial forests”, and the question “How many more to reach the promised land of asymptotia?”]

  10. Model-based approaches
      • In model-based, or model-dependent, approaches, models are the basis for the entire inference: estimator, standard error, interval estimation
      • Two variants:
        – Superpopulation modeling
        – Bayesian (full probability) modeling
      • Common theme is to “infer” or “predict” about the non-sampled portion of the population, conditional on the sample and model
      • Superpopulation is super, but Bayes is better ... for small samples

  11. Bayes inference for surveys
      • Model: $p(Y \mid Z)$ = prior distribution for $Y$
      • Data: $Y_{inc}$ = sampled values of $Y$; $Z$ = design variables
      • Inference about $Q = Q(Y, Z)$ is based on the posterior predictive distribution $p(Q(Y, Z) \mid Y_{inc}, Z)$
      In particular:
      • One estimate is the posterior mean: $\hat{q} = E(Q \mid Y_{inc}, Z)$
      • Standard error is the posterior sd: $\sqrt{\mathrm{Var}(Q \mid Y_{inc}, Z)}$
      • 95% posterior probability interval plays the role of a confidence interval (with a simpler interpretation)
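
As a rough illustration of the posterior predictive idea, the sketch below draws the finite population mean $Q$ by simulating the non-sampled values. It assumes an i.i.d. normal model with the noninformative prior $p(\mu, \sigma^2) \propto 1/\sigma^2$ and no design variables; this model and the function name `posterior_draws_of_mean` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def posterior_draws_of_mean(y_inc, N, draws=4000, seed=0):
    """Posterior predictive draws of the finite population mean Q.

    Assumed model (illustrative only): y_i ~ N(mu, sigma^2) i.i.d.,
    with prior p(mu, sigma^2) proportional to 1 / sigma^2.
    """
    rng = np.random.default_rng(seed)
    y_inc = np.asarray(y_inc, dtype=float)
    n = y_inc.size
    ybar, s2 = y_inc.mean(), y_inc.var(ddof=1)
    # sigma^2 | data ~ (n - 1) s^2 / chi-square_{n-1}
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=draws)
    # mu | sigma^2, data ~ N(ybar, sigma^2 / n)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # Total of the N - n non-sampled values, given (mu, sigma^2).
    rest_total = rng.normal((N - n) * mu, np.sqrt((N - n) * sigma2))
    return (y_inc.sum() + rest_total) / N

if __name__ == "__main__":
    q = posterior_draws_of_mean(np.arange(20.0), N=1000)
    lo, hi = np.percentile(q, [2.5, 97.5])
    print("posterior mean:", q.mean())
    print("posterior sd:", q.std(ddof=1))
    print("95% posterior interval:", (lo, hi))
```

The posterior mean, posterior sd and 95% interval are simply summaries of the simulated draws of $Q$. When the whole population is sampled ($n = N$) every draw equals the sample mean, so the uncertainty vanishes.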

  12. Parametric models
      Usually the prior distribution is specified via parametric models:
      $$p(Y \mid Z) = \int p(Y \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$$
      • $p(Y \mid Z, \theta)$ = parametric model, as in superpopulation approach
      • $p(\theta \mid Z)$ = prior distribution for $\theta$
      Inference about $\theta$ is then obtained from its posterior distribution, computed via Bayes’ Theorem:
      $$p(\theta \mid Y_{inc}, Z) \propto p(\theta \mid Z)\, L(\theta \mid Y_{inc}, Z)$$
      • $L(\theta \mid Y_{inc}, Z)$ = likelihood function
      That is: Posterior = Prior x Likelihood
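
The "prior times likelihood" recipe can be checked numerically on a grid. The sketch below is a toy case, assuming a binomial likelihood for a population proportion $\theta$ with a Beta(a, b) prior (with a, b at least 1); this example and the name `grid_posterior` are assumptions for illustration, not part of the talk.

```python
import numpy as np

def grid_posterior(k, n, a=1.0, b=1.0, grid_size=2001):
    """Posterior for a proportion theta on a grid: prior x likelihood.

    Assumed toy model: k successes in n trials, Beta(a, b) prior
    with a, b >= 1 so the prior is finite at the endpoints.
    """
    theta = np.linspace(0.0, 1.0, grid_size)
    prior = theta ** (a - 1) * (1 - theta) ** (b - 1)
    likelihood = theta ** k * (1 - theta) ** (n - k)
    unnormalized = prior * likelihood
    # Normalize so the grid weights sum to one.
    posterior = unnormalized / unnormalized.sum()
    return theta, posterior

if __name__ == "__main__":
    theta, post = grid_posterior(k=7, n=20)
    grid_mean = np.sum(theta * post)
    exact_mean = (1.0 + 7) / (1.0 + 1.0 + 20)  # conjugate Beta result
    print(grid_mean, exact_mean)
```

The grid mean should agree closely with the conjugate Beta posterior mean $(a + k)/(a + b + n)$, which gives a quick sanity check on the normalization step.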

  13. Example: spline model on weights
      [Figure: sample drawn from a population, with design variable $Z$ and survey variable $Y$]
      $$\bar{y}_{HT} = \frac{1}{N} \sum_{i=1}^{n} y_i / \pi_i; \quad \pi_i = \text{selection prob}$$
      A modeling alternative to the HT estimator is to create predictions from a more robust model relating $Y$ to $Z$:
      $$\bar{y}_{mod} = \frac{1}{N} \left( \sum_{i=1}^{n} y_i + \sum_{i=n+1}^{N} \hat{y}_i \right), \quad \hat{y}_i = \text{predictions from model, e.g.:}$$
      • $y_i \sim \mathrm{Nor}(\beta \pi_i, \sigma^2 \pi_i^2)$; leads to $\bar{y}_{HT}$
      • $y_i \sim \mathrm{Nor}(S(\pi_i), \sigma^2 \pi_i^k)$; $S(\pi_i)$ = penalized spline of $Y$ on $Z$
      Simulations in Zheng and Little (2005) suggest better RMSE and confidence coverage for the spline model compared with design-based approaches
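
To make the two estimators concrete, here is a minimal Python sketch of the HT mean and of a prediction estimator. It does not implement the penalized spline: it uses the simpler working model with mean proportional to $\pi_i$ and variance proportional to $\pi_i^2$, for which weighted least squares gives $\hat{\beta}$ as the sample mean of $y_i/\pi_i$. The function names, and the assumption that selection probabilities are known for non-sampled units, are illustrative.

```python
import numpy as np

def ht_mean(y_s, pi_s, N):
    """Horvitz-Thompson estimate of the population mean."""
    y_s, pi_s = np.asarray(y_s, float), np.asarray(pi_s, float)
    return np.sum(y_s / pi_s) / N

def prediction_mean(y_s, pi_s, pi_rest, N):
    """Prediction estimate of the mean under the working model
    y_i ~ Nor(beta * pi_i, sigma^2 * pi_i^2).

    pi_rest holds the selection probabilities of the N - n non-sampled
    units (assumed known from the sampling frame).
    """
    y_s, pi_s = np.asarray(y_s, float), np.asarray(pi_s, float)
    pi_rest = np.asarray(pi_rest, float)
    # Weighted least squares with weights 1 / pi_i^2 reduces to this mean.
    beta_hat = np.mean(y_s / pi_s)
    y_hat_rest = beta_hat * pi_rest  # predictions for non-sampled units
    return (y_s.sum() + y_hat_rest.sum()) / N

if __name__ == "__main__":
    y_s, pi_s = [2.0, 4.0, 6.0], [0.5, 0.5, 0.5]
    print(ht_mean(y_s, pi_s, N=6))
    print(prediction_mean(y_s, pi_s, pi_rest=[0.5, 0.5, 0.5], N=6))
```

In this sketch the two estimates coincide when all selection probabilities are equal; with unequal probabilities they generally differ by a term that depends on the sampled residuals. Swapping the linear mean for a spline in $\pi_i$ would change only the prediction step.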

  14. The model-based perspective: pros
      • Flexible, unified approach for all survey problems
        – Models for nonresponse, response and matching errors, small area models, combining data sources
      • Bayesian approach is not asymptotic, provides better small-sample inferences
      • Probability sampling is justified as making sampling mechanism ignorable, improving robustness

  15. Models bring survey inference closer to the statistical mainstream
      [Figure: cartoon labeled “B/F Gorilla”, with speech bubbles “Follow my (frequentist) statistical standards” and “Why? I am an economist, I build models!”]

  16. The model-based perspective: cons
      • Explicit dependence on the choice of model, which has subjective elements (but assumptions are explicit, not buried in a formula)
      • Bad models provide bad answers: justifiable concerns about the effect of model misspecification
      • Models are needed for all survey variables: need to understand the data, and potential for more complex computations

  17. Overview
      • Design-based versus model-based survey inference
      • Current orthodoxy: design-model compromise
        – Strengths and drawbacks
      • An alternative: Calibrated Bayes
      • Two US Census Bureau applications
        – Disclaimer: views are mine, not US Census Bureau

  18. The current “status quo”: design-model compromise
      • Design-based for large samples, descriptive statistics
        – But may be model assisted, e.g. regression calibration:
          $$\hat{T}_{GREG} = \sum_{i=1}^{N} \hat{y}_i + \sum_{i=1}^{N} I_i (y_i - \hat{y}_i) / \pi_i, \quad \hat{y}_i = \text{model prediction}$$
        – Model estimates adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992)
      • Model-based for small area estimation, nonresponse, time series, ...
      • Attempts to capitalize on best features of both paradigms, but at the expense of “inferential schizophrenia” (Little 2012)?
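
The GREG total can be sketched in a few lines of Python. The version below assumes a single auxiliary variable `x` known for every population unit and a linear working model fitted by design-weighted least squares; both are illustrative assumptions, and `greg_total` is a hypothetical name.

```python
import numpy as np

def greg_total(y_s, x_s, pi_s, x_pop):
    """GREG estimate of the population total of y.

    Sum of model predictions over the population, plus the
    inverse-probability-weighted sum of sample residuals.
    Working model (assumed): y = b0 + b1 * x.
    """
    y_s, x_s, pi_s, x_pop = (np.asarray(v, float)
                             for v in (y_s, x_s, pi_s, x_pop))
    X_s = np.column_stack([np.ones_like(x_s), x_s])
    # Weighted least squares with weights 1 / pi_i: scale rows by sqrt(weight).
    w = np.sqrt(1.0 / pi_s)
    coef, *_ = np.linalg.lstsq(X_s * w[:, None], y_s * w, rcond=None)
    y_hat_pop = coef[0] + coef[1] * x_pop
    y_hat_s = coef[0] + coef[1] * x_s
    return y_hat_pop.sum() + np.sum((y_s - y_hat_s) / pi_s)

if __name__ == "__main__":
    x_pop = np.arange(10.0)
    y_pop = 2.0 + 3.0 * x_pop          # exactly linear toy population
    idx = np.array([1, 4, 7])          # sampled units
    pi_s = np.array([0.2, 0.3, 0.5])   # their inclusion probabilities
    print(greg_total(y_pop[idx], x_pop[idx], pi_s, x_pop), y_pop.sum())
```

When the working model fits perfectly, the residual term is zero and the estimate reproduces the true total. When it does not, the weighted residual term is the adjustment that protects against misspecification.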

  19. Example: when is an area “small”?
      [Figure: “n-ometer” scale of sample size n, with design-based inference above the threshold $n_0$ and model-based inference below it; $n_0$ = “point of inferential schizophrenia”]
      • How do I choose $n_0$?
      • If $n_0 = 35$, should my entire statistical philosophy and inference be different when n = 34 and n = 36?
        – n = 36, CI: [ ] (wider since based on direct estimate)
        – n = 34, CI: [ ] (narrower since based on model)

  20. Multilevel (hierarchical Bayes) models
      $$\tilde{\mu}_a = w_a \bar{y}_a + (1 - w_a)\, \hat{\mu}_a$$
      where $\bar{y}_a$ is the direct estimate and $\hat{\mu}_a$ is the model estimate for area $a$
      [Figure: “n-ometer” showing the weight $w_a$ rising from 0 to 1 as the sample size n increases]
      • Bayesian multilevel model estimates borrow strength increasingly from model as n decreases
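
A small Python sketch shows how such a weight can vary smoothly with sample size. It assumes a normal-normal model with known within-area variance `sigma2` and between-area variance `tau2`, which gives $w_a = \tau^2 / (\tau^2 + \sigma^2 / n_a)$; this specific form and the function names are assumptions for illustration, not taken from the slide.

```python
def shrinkage_weight(n, sigma2, tau2):
    """Weight on the direct estimate for an area with sample size n > 0.

    Assumed model: direct estimate ~ N(mu_a, sigma2 / n),
    mu_a ~ N(model estimate, tau2), both variances known.
    """
    return tau2 / (tau2 + sigma2 / n)

def composite_estimate(ybar_a, mu_hat_a, n, sigma2, tau2):
    """Weighted combination of the direct and model estimates."""
    w = shrinkage_weight(n, sigma2, tau2)
    return w * ybar_a + (1 - w) * mu_hat_a

if __name__ == "__main__":
    for n in (5, 34, 36, 500):
        print(n, shrinkage_weight(n, sigma2=100.0, tau2=4.0))
```

Under these assumptions the weights at n = 34 and n = 36 are nearly identical, so there is no threshold at which the inference changes abruptly.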

  21. Overview
      • Design-based versus model-based survey inference
      • Current orthodoxy: design-model compromise
        – Strengths and drawbacks
      • An alternative: Calibrated Bayes
      • Two US Census Bureau applications
        – Disclaimer: views are mine, not US Census Bureau
