Calibrated Bayes: an attractive framework for official statistics in the 21st century Roderick J. Little

Overview
• Design-based versus model-based survey inference
• Current orthodoxy: design-model compromise
  – Strengths and drawbacks
• An alternative: Calibrated Bayes
• Two US Census Bureau applications
  – Disclaimer: views are mine, not those of the US Census Bureau
NTTS 2015: Calibrated Bayes

Survey estimation
• Design-based inference: population values are fixed, and inference is based on the probability distribution of sample selection. Obviously this assumes that we have a probability sample (or “quasi-randomization”, where we pretend that we have one).
• Model-based inference: survey variables are assumed to come from a statistical model. Probability sampling is not the basis for inference, but is useful for making the sample selection ignorable (see e.g. Gelman et al., 2003; Little 2004).

Design vs model-based survey inference
• Two main variants of model-based inference:
  – Superpopulation models: frequentist inference based on repeated samples from a “superpopulation” model
  – Bayes: add a prior distribution for the parameters; inference about finite population quantities or parameters is based on the posterior distribution
• A fascinating part of the more general debate about frequentist versus Bayesian inference in statistics at large:
  – Design-based inference is inherently frequentist
  – The purest form of model-based inference is Bayes

Design-based inference
$Y = (Y_1, \ldots, Y_N)$ = population values (fixed); $Z$ = design variables
$Q = Q(Y, Z)$ = finite population quantity
$I = (I_1, \ldots, I_N)$ = sample inclusion indicators (random), with $I_i = 1$ if unit $i$ is included in the sample and $I_i = 0$ otherwise
$Y_{inc}$ = part of $Y$ included in the survey
$\hat{q} = \hat{q}(Y_{inc}, I, Z)$ = sample estimate of $Q$
$\hat{V} = \hat{V}(Y_{inc}, I, Z)$ = sample estimate of $V$, the variance of $\hat{q}$
$\left(\hat{q} - 1.96\sqrt{\hat{V}},\ \hat{q} + 1.96\sqrt{\hat{V}}\right)$ = 95% confidence interval for $Q$
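As a minimal sketch of the quantities defined on this slide, the following assumes simple random sampling without replacement and takes Q to be the population mean. The function name `srs_mean_ci`, the toy population and the sample size are illustrative assumptions, not part of the original slides.

```python
import math
import random


def srs_mean_ci(population, n, seed=0):
    """Design-based estimate of a population mean from a simple random sample."""
    rng = random.Random(seed)
    N = len(population)
    y_inc = rng.sample(population, n)          # sampled values, drawn without replacement
    q_hat = sum(y_inc) / n                     # sample mean as the estimate of Q
    s2 = sum((y - q_hat) ** 2 for y in y_inc) / (n - 1)
    v_hat = (1 - n / N) * s2 / n               # variance estimate with finite population correction
    half_width = 1.96 * math.sqrt(v_hat)
    return q_hat, v_hat, (q_hat - half_width, q_hat + half_width)


# Hypothetical fixed population of N = 1000 values (an assumption for illustration).
population = [float(i % 50) for i in range(1000)]
q_hat, v_hat, ci = srs_mean_ci(population, n=100)
print(q_hat, v_hat, ci)
```

The only randomness is in which units are selected; the population values stay fixed, which is the defining feature of the design-based view.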

Choice of $\hat{q}$
Seek good design-based properties:
• Design unbiasedness: $E(\hat{q} \mid Y) = Q$ (too strong)
• Or weaker, design consistency: $\hat{q} \to Q$ as the sample size gets large
It is natural to seek an estimate that is design-efficient. However, this kind of optimality is not possible without a model (Horvitz and Thompson 1952, Godambe 1955).
There are many choices of design-consistent estimates. Many survey estimates are motivated by implicit models:
• Regression model → regression estimator
• Ratio model → ratio estimator, etc.
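To illustrate how an implicit model motivates an estimator, here is a small sketch of the ratio estimator of a population total next to the simple expansion estimator. The population, the auxiliary variable `x_pop` and the function name are hypothetical; the sketch assumes simple random sampling and a known population total of x.

```python
import random


def ratio_estimate_total(y_sample, x_sample, x_total):
    """Ratio estimator of the total of y, using the known population total of x."""
    return sum(y_sample) / sum(x_sample) * x_total


rng = random.Random(1)
N, n = 500, 50
x_pop = [rng.uniform(1.0, 10.0) for _ in range(N)]        # auxiliary variable (assumed known)
y_pop = [2.0 * x + rng.gauss(0.0, 0.5) for x in x_pop]    # y roughly proportional to x
idx = rng.sample(range(N), n)                             # simple random sample

t_ratio = ratio_estimate_total([y_pop[i] for i in idx], [x_pop[i] for i in idx], sum(x_pop))
t_expand = N * sum(y_pop[i] for i in idx) / n             # expansion estimator, no model
print(t_ratio, t_expand, sum(y_pop))
```

Both estimators are design-consistent; the ratio estimator tends to be more efficient when y is close to proportional to x, which is exactly the implicit ratio model.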

Limitations of design-based approach
• Inference is based on probability sampling, but true probability samples are harder and harder to come by:
  – Noncontact and nonresponse are increasing
  – Face-to-face interviews are increasingly expensive
  – A high proportion of available information is now not based on probability samples (e.g. internet, administrative data)
• Theory is basically asymptotic: limited tools for small samples, e.g. small area estimation

[Figure: cartoon map showing the “Asymptotia Highlands” above the “murky sub-asymptotial forests”, with the question “How many more to reach the promised land of asymptotia?”]
Design-based methods live in the land of asymptotia.

Model-based approaches
• In model-based, or model-dependent, approaches, models are the basis for the entire inference: estimator, standard error, interval estimation
• Two variants:
  – Superpopulation modeling
  – Bayesian (full probability) modeling
• Common theme is to “infer” or “predict” about the non-sampled portion of the population, conditional on the sample and model
• Superpopulation is super, but Bayes is better … for small samples

Bayes inference for surveys
Model: $p(Y \mid Z)$ = prior distribution for $Y$
Data: $Y_{inc}$ = sampled values of $Y$; $Z$ = design variables
Inference about $Q = Q(Y, Z)$ is based on the posterior predictive distribution $p(Q(Y, Z) \mid Y_{inc}, Z)$
In particular:
• One estimate is the posterior mean: $\hat{q} = E(Q \mid Y_{inc}, Z)$
• The standard error is the posterior sd: $\sqrt{\mathrm{Var}(Q \mid Y_{inc}, Z)}$
• The 95% posterior probability interval plays the role of the confidence interval (with a simpler interpretation)

Parametric models
Usually the prior distribution is specified via parametric models:
$p(Y \mid Z) = \int p(Y \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$
$p(Y \mid Z, \theta)$ = parametric model, as in the superpopulation approach
$p(\theta \mid Z)$ = prior distribution for $\theta$
Inference about $\theta$ is then obtained from its posterior distribution, computed via Bayes’ Theorem:
$p(\theta \mid Y_{inc}, Z) \propto p(\theta \mid Z)\, L(\theta \mid Y_{inc}, Z)$
$L(\theta \mid Y_{inc}, Z)$ = likelihood function
That is: Posterior ∝ Prior × Likelihood
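A minimal simulation sketch of this recipe, under assumptions that are not in the slides: the sampled values are modeled as independent Normal(μ, σ²), the prior is the standard noninformative one p(μ, σ²) ∝ 1/σ², sampling is ignorable, and Q is the finite population mean. The function name `posterior_draws` and the simulated sample are hypothetical.

```python
import math
import random
import statistics


def posterior_draws(y_inc, N, draws=4000, seed=0):
    """Simulate the posterior predictive distribution of the population mean Q."""
    rng = random.Random(seed)
    n = len(y_inc)
    ybar = statistics.fmean(y_inc)
    s2 = statistics.variance(y_inc)
    q_draws = []
    for _ in range(draws):
        # sigma^2 | data: (n - 1) s^2 divided by a chi-square draw on n - 1 df
        sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2.0)
        # mu | sigma^2, data ~ Normal(ybar, sigma^2 / n)
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n))
        # mean of the N - n non-sampled values, predicted from the model
        ybar_exc = rng.gauss(mu, math.sqrt(sigma2 / (N - n)))
        q_draws.append((n * ybar + (N - n) * ybar_exc) / N)
    return q_draws


data_rng = random.Random(42)
y_inc = [data_rng.gauss(50.0, 10.0) for _ in range(30)]   # hypothetical sample of n = 30
q_draws = sorted(posterior_draws(y_inc, N=1000))
q_hat = statistics.fmean(q_draws)                         # posterior mean
post_sd = statistics.stdev(q_draws)                       # posterior sd
interval = (q_draws[int(0.025 * len(q_draws))], q_draws[int(0.975 * len(q_draws))])
print(q_hat, post_sd, interval)
```

The sampled part of the population mean is known, so each draw only predicts the non-sampled part; this is the "predict the non-sampled portion" theme in computational form.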

Example. Spline model on weights
[Figure: schematic of the sample and the population, with $Z$ observed for all units and $Y$ observed for the sample.]
$\bar{y}_{HT} = \frac{1}{N} \sum_{i=1}^{n} y_i / \pi_i$; $\pi_i$ = selection probability
A modeling alternative to the HT estimator is to create predictions from a more robust model relating $Y$ to $Z$:
$\bar{y}_{mod} = \frac{1}{N} \left( \sum_{i=1}^{n} y_i + \sum_{i=n+1}^{N} \hat{y}_i \right)$, $\hat{y}_i$ = predictions from the model, e.g.:
• $y_i \sim \mathrm{Nor}(\beta \pi_i, \sigma^2 \pi_i^2)$; leads to $\bar{y}_{HT}$
• $y_i \sim \mathrm{Nor}(S(\pi_i), \sigma^2 \pi_i^{k})$; $S(\pi_i)$ = penalized spline of $Y$ on $Z$
Simulations in Zheng and Little (2005) suggest better RMSE and confidence coverage for the spline model compared with design-based approaches.
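The sketch below compares the HT mean with the prediction estimator under the simpler working model with mean proportional to the selection probability and variance proportional to its square. It does not fit the penalized spline. Assumptions not in the slides: a systematic probability-proportional-to-size design, the simulated population, and a weighted least squares fit, which for this working model gives a slope equal to the sample mean of y_i / π_i.

```python
import random


def systematic_pps(pi, rng):
    """Systematic sampling: unit i is selected with probability pi[i] (each pi[i] < 1)."""
    start, cum, k, chosen = rng.random(), 0.0, 0, []
    for i, p in enumerate(pi):
        cum += p
        if cum > start + k:        # the k-th selection point falls in unit i's interval
            chosen.append(i)
            k += 1
    return chosen


rng = random.Random(7)
N, n = 2000, 100
z = [rng.uniform(1.0, 5.0) for _ in range(N)]            # hypothetical size measure
z_total = sum(z)
pi = [n * zi / z_total for zi in z]                      # selection probabilities, sum to n
y = [10.0 * zi + rng.gauss(0.0, 2.0) for zi in z]        # hypothetical survey variable

s = systematic_pps(pi, rng)
in_sample = set(s)

# Horvitz-Thompson mean: inverse-probability-weighted sample total over N.
ybar_ht = sum(y[i] / pi[i] for i in s) / N

# Working model: weighted least squares slope is the mean of y_i / pi_i.
beta_hat = sum(y[i] / pi[i] for i in s) / len(s)
predicted_rest = sum(beta_hat * pi[i] for i in range(N) if i not in in_sample)
ybar_mod = (sum(y[i] for i in s) + predicted_rest) / N

print(ybar_ht, ybar_mod, sum(y) / N)
```

With a fixed sample size, the two estimates differ only by the sum of the sample residuals divided by N, so they nearly coincide when the sampling fraction is small.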

The model-based perspective: pros
• Flexible, unified approach for all survey problems
  – Models for nonresponse, response and matching errors, small area models, combining data sources
• Bayesian approach is not asymptotic, provides better small-sample inferences
• Probability sampling is justified as making the sampling mechanism ignorable, improving robustness

Models bring survey inference closer to the statistical mainstream
[Figure: cartoon in which the “B/F Gorilla” says “Follow my (frequentist) statistical standards” and the reply is “Why? I am an economist, I build models!”]

The model-based perspective: cons
• Explicit dependence on the choice of model, which has subjective elements (but assumptions are explicit, not buried in a formula)
• Bad models provide bad answers: justifiable concerns about the effect of model misspecification
• Models are needed for all survey variables: need to understand the data, and potential for more complex computations

The current “status quo”: design-model compromise
• Design-based for large samples, descriptive statistics
  – But may be model assisted, e.g. regression calibration:
    $\hat{T}_{GREG} = \sum_{i=1}^{N} \hat{y}_i + \sum_{i=1}^{N} I_i (y_i - \hat{y}_i) / \pi_i$, where $\hat{y}_i$ = model prediction and $\pi_i$ = selection probability
  – Model estimates are adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992)
• Model-based for small area estimation, nonresponse, time series, …
• Attempts to capitalize on the best features of both paradigms … but at the expense of “inferential schizophrenia” (Little 2012)?
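A small sketch of the model-assisted estimator on this slide, assuming simple random sampling (so every selection probability equals n/N) and a straight-line working model fitted by ordinary least squares on the sample. The auxiliary variable, the simulated data and the helper names `fit_line` and `greg_total` are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def greg_total(x_pop, sample_idx, y_sample, pi):
    """Sum of model predictions plus inverse-probability-weighted sample residuals."""
    b0, b1 = fit_line([x_pop[i] for i in sample_idx], y_sample)
    predicted = sum(b0 + b1 * x for x in x_pop)
    adjustment = sum((yi - (b0 + b1 * x_pop[i])) / pi[i]
                     for i, yi in zip(sample_idx, y_sample))
    return predicted + adjustment


rng = random.Random(3)
N, n = 1000, 80
x_pop = [rng.uniform(0.0, 10.0) for _ in range(N)]             # auxiliary variable, known for all units
y_pop = [3.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in x_pop]   # hypothetical survey variable
pi = [n / N] * N                                               # simple random sampling
sample_idx = rng.sample(range(N), n)
t_greg = greg_total(x_pop, sample_idx, [y_pop[i] for i in sample_idx], pi)
print(t_greg, sum(y_pop))
```

The second term is the design-based correction: if the working model is wrong, the weighted residuals pull the estimate back toward a design-consistent answer.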

Example: when is an area “small”?
[Figure: “n-o-meter” sample-size scale. Design-based inference applies above the threshold $n_0$ and model-based inference below it; $n_0$ is the “point of inferential schizophrenia”.]
How do I choose $n_0$? If $n_0 = 35$, should my entire statistical philosophy and inference be different when n = 34 and n = 36?
• n = 36: the CI is wider, since it is based on the direct estimate
• n = 34: the CI is narrower, since it is based on the model

Multilevel (hierarchical Bayes) models
(estimate for area $a$) $= w_a \bar{y}_a + (1 - w_a) \times$ (model estimate for area $a$), where $\bar{y}_a$ is the direct estimate
[Figure: “n-o-meter” showing the weight $w_a$ rising from 0 to 1 as the sample size n increases.]
Bayesian multilevel model estimates borrow strength increasingly from the model as n decreases.
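One common way to obtain such a weight is the normal-normal multilevel model with known variances, where the posterior mean for an area puts weight n_a / (n_a + σ²/τ²) on the direct estimate. The sketch below assumes that model; the variance values and the function names are hypothetical and are not taken from the slides.

```python
def shrinkage_weight(n_a, sigma2, tau2):
    """Weight on the direct estimate: n_a / (n_a + sigma2 / tau2)."""
    return n_a / (n_a + sigma2 / tau2)


def composite_estimate(direct, model, w):
    """Weighted average of the direct estimate and the model estimate."""
    return w * direct + (1.0 - w) * model


sigma2, tau2 = 100.0, 4.0   # hypothetical within-area and between-area variances
for n_a in (0, 5, 34, 36, 500):
    w = shrinkage_weight(n_a, sigma2, tau2)
    print(n_a, round(w, 3), round(composite_estimate(12.0, 10.0, w), 3))
```

The weight changes smoothly with the sample size, so areas with n = 34 and n = 36 receive almost the same treatment and no sharp threshold is needed.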
