Inconsistent Regression and Nonresponse Bias: Exploring Their Relationship as a Function of Response Imbalance
Carl-Erik Särndal (presenter) & Peter Lundquist, Statistics Sweden
Workshop on Responsive and Adaptive Survey Design, Washington DC, March 14, 2018
My presentation - The paper as published in JOS is technical - for details, see the paper - focus here more on experience from a research activity - and on future prospects & utility of adaptive survey design - my personal impressions from research mainly with Statistics Sweden I am not a spokesperson for Statistics Sweden; just an independent researcher.
My presentation The JOS paper - and our other published work - looks at adaptive design from a particular point of view: Focus on the (improved) accuracy in estimation that may result from use of adaptive design - it is estimation theory. There are other important aspects of adaptive design: cost efficiency, timeliness, etc., not covered here. Question arising: Is adaptive design “worth the effort” for an agency such as Statistics Sweden?
The institutional context We think in terms of: A National Statistical Institute (NSI) that • is experiencing high nonresponse in important probability sampling surveys, • can rely on a vast supply of auxiliary information (admin registers & paradata), • uses calibrated weighting adjustment as a standard in estimation, • asks itself: is adaptive design important from accuracy (or other) points of view? Statistics Sweden fits the description; other NSIs may not.
Nonresponse dramatically increasing – Swedish Labour Force Survey 2002-2016 [Figure: line chart by year, 2002-2016; vertical axis 0% to 60%; one series per age group: 15-24, 25-54, 55-74 and 15-74 years]
The institutional point of view Statistics Sweden is aware: That high nonresponse causes (high) bias in estimates. And that effective tools are needed to reduce nonresponse bias in survey estimates (for official statistics). But for this, it can rely on a well-established device: Calibrated weighting adjustment
Sampling frame: Register of the population of Sweden. Population (U), Sample (s), Response set (r). s is a probability sample: known positive inclusion probabilities. r = response set (Note: r is not “the sample”; s is the sample)
The institutional point of view Statistics Sweden : Calibrated adjustment weighting at the estimation stage: is a trusted tool that goes a good part of the way to eliminate nonresponse bias. Is considered, right or wrong , as “sufficiently good ” under the (regrettable) circumstances: Why do anything else?
Calibrated adjustment weighting Calibration theory & Calibrated adjustment weighting studied and practiced at Statistics Sweden since 1995 Statistics Sweden in a sense a victim of the success of calibration thinking (theory to which I contributed from 1980’s and on), Statistics Sweden’s attitude could be expressed as: “We have a tool; we know how to calibrate; we have highly qualified “calibrators”, don’t need adaptive design from the accuracy point of view”
Calibrated adjustment weighting Calibrated (or other) adjustment weighting may go part of the way but is insufficient for eliminating nonresponse bias. Deviation from unbiased estimate can be traced to regression incorrectly estimated from the response (of survey variable y on auxiliary vector x ) ; a selection effect; missing not at random (MNAR).
Cornerstones for official statistics from sample surveys For an NSI like Statistics Sweden these include : - probability sampling (from register of Total Swedish Population) - calibrated weighting in the estimation, using much auxiliary information A fixation on the importance of the realized final survey response rate.
Imbalance - a core concept of adaptive design
Computable throughout the data collection period, for the current response set r from the given probability sample s, with respect to a chosen “monitoring vector” x:
$$\mathrm{IMB}(r, \mathbf{x} \mid s) = P^2 \, (\bar{\mathbf{x}}_r - \bar{\mathbf{x}}_s)' \, \boldsymbol{\Sigma}_s^{-1} \, (\bar{\mathbf{x}}_r - \bar{\mathbf{x}}_s)$$
A simple descriptive measure contrasting current response r with given sample s.
$P = \sum_r d_k / \sum_s d_k$: response rate
$\boldsymbol{\Sigma}_s = \sum_s d_k \mathbf{x}_k \mathbf{x}_k' / \sum_s d_k$: weighting matrix
$d_k$: inverse inclusion probability of unit k
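As a rough illustration of how the imbalance could be monitored while data collection is under way, here is a minimal Python sketch of IMB = P² (x̄_r − x̄_s)' Σ_s⁻¹ (x̄_r − x̄_s). It assumes NumPy, and it assumes that x̄_r and x̄_s are the d_k-weighted means of x over the response set and over the sample. The function name `imbalance` and the toy data are hypothetical and are not taken from the JOS paper.

```python
import numpy as np

def imbalance(x, d, responded):
    """IMB = P^2 (xbar_r - xbar_s)' Sigma_s^{-1} (xbar_r - xbar_s).

    x         : (n, p) monitoring vectors for the sample s (include a constant column)
    d         : (n,) design weights, d_k = 1 / inclusion probability
    responded : (n,) booleans, True for units in the current response set r
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    r = np.asarray(responded, dtype=bool)

    P = d[r].sum() / d.sum()                 # weighted response rate
    xbar_s = x.T @ d / d.sum()               # weighted mean over s (assumed definition)
    xbar_r = x[r].T @ d[r] / d[r].sum()      # weighted mean over r (assumed definition)
    sigma_s = (x.T * d) @ x / d.sum()        # weighting matrix Sigma_s
    diff = xbar_r - xbar_s
    return P**2 * diff @ np.linalg.solve(sigma_s, diff)

# Toy sample: x_k = (1, group indicator), equal design weights.
x = np.array([[1, 1], [1, 1], [1, 1], [1, 0], [1, 0], [1, 0]])
d = np.ones(6)
one_group = [True, True, True, False, False, False]   # only group 1 responds
balanced = [True, False, False, True, False, False]   # one respondent per group

print(imbalance(x, d, one_group))   # about 0.25
print(imbalance(x, d, balanced))    # 0 up to rounding
```

Both toy response sets have the same response rate P = 0.5; only their composition with respect to x differs, and that difference is what IMB picks up.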
Central question: Does low imbalance bring more accurate estimation of $Y = \sum_U y_k$? Consider the deviation $\hat{Y}_{CAL} - \hat{Y}_{FUL}$ between the CALibration estimator based on an extensive auxiliary vector x and the unbiased FUL sample estimator (hypothetical under nonresponse). Have done: Theoretical and empirical work, on data from important Statistics Sweden surveys such as the Labour Force Survey
Research work on adaptive design Since 2012, co-authored several papers on the theme : “Improved accuracy in estimation by reduced imbalance (adaptive design)” in co-operation with: - Statistics Sweden - Stockholm University - University of Tartu, Estonia Accomplishment is of two kinds : Getting theoretical work published in the good journals is one thing; Getting the work implemented (in a statistical agency) is another
Deviation of CALibration estimate from unbiased FUL sample estimate
$$(\hat{Y}_{CAL} - \hat{Y}_{FUL}) / N = \bar{\mathbf{x}}_s' (\mathbf{b}_r - \mathbf{b}_s)$$
(see JOS article for details)
$\mathbf{b}_r - \mathbf{b}_s$: difference between regression vectors; the (biased) regression $\mathbf{b}_r$ computed on the response r vis-à-vis the (unbiased) regression $\mathbf{b}_s$ for the whole sample s.
Reducing imbalance IMB is about reducing $\bar{\mathbf{x}}_r - \bar{\mathbf{x}}_s$. But will reduced IMB bring reduced difference $\mathbf{b}_r - \mathbf{b}_s$?
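The relation (Ŷ_CAL − Ŷ_FUL)/N = x̄_s'(b_r − b_s) can be checked numerically on toy data. The sketch below rests on assumptions that the slide does not spell out: b_r and b_s are taken to be design-weighted least-squares coefficient vectors, x contains a constant, calibration is to the sample-level totals Σ_s d_k x_k, and N is replaced by Σ_s d_k. The function names and data are hypothetical; the JOS article has the exact definitions.

```python
import numpy as np

def wls_coef(x, y, d):
    """Design-weighted least squares: solve (sum d x x') b = sum d x y."""
    xtd = x.T * d                       # (p, n): column k scaled by d_k
    return np.linalg.solve(xtd @ x, xtd @ y)

def deviation_both_ways(x, y, d, responded):
    """Return the scaled deviation via regression vectors and via calibrated weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    r = np.asarray(responded, dtype=bool)

    n_hat = d.sum()                     # assumed stand-in for N
    t_s = x.T @ d                       # sample-level totals, sum_s d_k x_k
    xbar_s = t_s / n_hat

    # Regression form: xbar_s' (b_r - b_s)
    b_s = wls_coef(x, y, d)
    b_r = wls_coef(x[r], y[r], d[r])
    via_regression = xbar_s @ (b_r - b_s)

    # Direct form: linear calibration factors g_k on the respondents
    g = t_s @ np.linalg.solve((x[r].T * d[r]) @ x[r], x[r].T)
    y_cal = np.sum(d[r] * g * y[r])     # calibration estimate from r
    y_ful = np.sum(d * y)               # full-sample estimate (needs y for all of s)
    direct = (y_cal - y_ful) / n_hat
    return via_regression, direct

# Toy sample: x_k = (1, group indicator); respondents have lower y within each group.
x = np.array([[1, 1]] * 4 + [[1, 0]] * 4)
y = np.array([3, 5, 4, 6, 1, 2, 2, 3])
d = np.array([10, 10, 20, 20, 10, 10, 20, 20])
responded = [True, True, False, False, True, True, False, False]

via_regression, direct = deviation_both_ways(x, y, d, responded)
print(via_regression, direct)           # both about -0.667
```

In this toy response set the weighted mean of x among respondents equals that of the sample, so the imbalance with respect to this x is zero, yet the scaled deviation is about −2/3 because respondents differ from nonrespondents in y within each group.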
We find: The scaled deviation of the CAL estimator from the unbiased FUL estimator,
$$(\hat{Y}_{CAL} - \hat{Y}_{FUL}) / N = \bar{\mathbf{x}}_s' (\mathbf{b}_r - \mathbf{b}_s),$$
is to a degree reduced by reduced IMB.
Conclusion in JOS article: “The message is that modest expectations for better accuracy are in order, rather than hopes of great payoff, when imbalance is reduced through adaptive data collection.”
The deviation (the nonresponse bias) is not eliminated, not even if IMB can be made to approach zero.
The ultimate question is : Do we need probability sampling ? See it this way: - Adaptive design operates on a drawn probability sample , in order to get from it a set of respondents with low imbalance - Any further gain of accuracy (in estimates) from low imbalance is limited, given that calibrated adjustment weighting takes place at the estimation stage - But one can argue: with non-probability sampling , say quota sampling, we can get a perfectly balanced (zero imbalance) set of respondents and then practice calibrated weighting a) Is adaptive design (acting on a probability sample) sufficiently attractive? b) Should Statistics Sweden (& others) abandon probability sampling ?
Implementation Goals of this workshop (as mentioned in mail-outs) are to address questions such as : - What are best practices of implementing the core principles of adaptive design ? - To what extent can the drivers of adaptive design be conceptualized and effectively implemented in complex surveys ? Crucial questions.
Implementing core principles of adaptive design One core principle - a driver - of adaptive design: Faced with high survey nonresponse, we must strive to get a representative, low-imbalance set of respondents. Imbalance IMB is an example of a simple statistical concept for the data collection. But it is hard to get accepted and effectively implemented. Despite all recent literature on adaptive design, some NSIs still need to get rid of a fixation on high final survey response rate as the dominating driver for the data collection.