Is Random Sampling Necessary?
Dan Hedlin, Department of Statistics, Stockholm University
Focus on official statistics
● Trust is paramount (Holt 2008)
● Very wide group of users
● Official statistics is official
● Bias is important
● Not as cost sensitive as market research
● Strong emphasis on generality and precision (there is a trade-off in model building between generality, realism and precision; see Levins 1966 and Baker et al. 2013, sec. 8.2)
NTTS 2015. Dan Hedlin, Stockholm University, 10/3/2015
Trends
● “Wider, Deeper, Quicker, Better, Cheaper” (Holt 2007)
● Increasing rates of nonresponse in the Western world; respondents are hard to find and hard to contact
● Expansion of data sources and data collection methods; mixed modes
● Expanding research, mostly application driven
On bias
● In official statistics, a minimum MSE estimator is not necessarily desirable. Bias is not on the same footing as variance
● Note that point estimates, not interval estimates, are used
● Bias is (potentially) worse than variance
● Also an issue of trust
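For reference, the standard decomposition behind the first bullet can be written out as below. The symbols $\hat{\theta}$ (an estimator) and $\theta$ (its target) are notation assumed here, not taken from the slides. The point it illustrates is that MSE adds the two components symmetrically, so two estimators with the same MSE can carry very different bias.

```latex
\mathrm{MSE}(\hat{\theta})
  = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big(E[\hat{\theta}] - \theta\big)^2
  = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2
```

Minimising the sum says nothing about how it is split between the two terms, which is why a minimum MSE estimator may still be unattractive when bias is the larger concern.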
On balanced samples
● What is balance?
● Basically that this holds: $\frac{1}{n}\sum_{s} x^{j} = \frac{1}{N}\sum_{U} x^{j}$ for some number j (ignoring weights) (Valliant et al. 2000). “Sample balance”
● Or $\bar{\mathbf{x}}_r = \bar{\mathbf{x}}_s$. “Response set balance”
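A minimal sketch of how the two notions of balance could be checked numerically. The data are made up for illustration, and the helper names `moment` and `is_balanced` are hypothetical; weights are ignored, as on the slide. Sample balance compares the j-th moment of an auxiliary variable x in the sample with that in the population; response set balance compares the respondents with the sample.

```python
from statistics import fmean

def moment(values, j):
    """Unweighted mean of the j-th power of the values."""
    return fmean(v ** j for v in values)

def is_balanced(subset, reference, j=1, tol=1e-9):
    """True if the j-th moment of `subset` matches that of `reference`."""
    return abs(moment(subset, j) - moment(reference, j)) <= tol

# Hypothetical auxiliary variable x for a population U of N = 8 units.
population_x = [1, 2, 3, 4, 5, 6, 7, 8]
sample_x = [1, 4, 5, 8]   # sample s, n = 4
response_x = [4, 5]       # response set r, a subset of s

# Sample balance on the first moment (j = 1): sample mean vs population mean.
sample_balanced = is_balanced(sample_x, population_x, j=1)
# Response set balance: respondent mean vs sample mean.
response_balanced = is_balanced(response_x, sample_x, j=1)
# Balance on j = 1 does not guarantee balance on j = 2.
second_moment_balanced = is_balanced(sample_x, population_x, j=2)

print(sample_balanced, response_balanced, second_moment_balanced)
```

In this toy example the sample is balanced on the mean but not on the second moment, which is why the definition is stated "for some number j".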
A balanced sample is good to have
● $\bar{y}_s - \bar{y}_r = (\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r)'\boldsymbol{\beta}_r + (\boldsymbol{\beta}_s - \boldsymbol{\beta}_r)'\bar{\mathbf{x}}_s$ (Särndal & Lundquist 2014)
● Does small $\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r$ imply small $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$?
● The answer is “probably yes” (Särndal & Lundquist 2014, Sec. 6)
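The decomposition on this slide splits the difference between the sample mean and the respondent mean of y into an imbalance term and a coefficient-difference term. The sketch below checks it numerically under stated assumptions: a single auxiliary variable with an intercept, so the auxiliary vector is (1, x), and ordinary least-squares coefficients fitted separately on the sample s and on the response set r. The data and the helper `ols` are hypothetical.

```python
from statistics import fmean

def ols(x, y):
    """Least-squares (intercept, slope) of y on x."""
    xm, ym = fmean(x), fmean(y)
    sxx = sum((a - xm) ** 2 for a in x)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope

# Hypothetical sample s; the first four units form the response set r.
x_s = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_s = [2.1, 3.9, 6.2, 8.1, 9.7, 12.4]
x_r, y_r = x_s[:4], y_s[:4]

a_s, b_s = ols(x_s, y_s)   # coefficients fitted on the full sample
a_r, b_r = ols(x_r, y_r)   # coefficients fitted on respondents only
xbar_s, xbar_r = fmean(x_s), fmean(x_r)

# Left-hand side: difference between sample mean and respondent mean of y.
lhs = fmean(y_s) - fmean(y_r)

# Imbalance term: the intercept components of the two mean vectors
# (both equal to 1) cancel, leaving only the x component.
imbalance_term = (xbar_s - xbar_r) * b_r
# Coefficient-difference term, evaluated at the sample mean vector (1, xbar_s).
coefficient_term = (a_s - a_r) * 1.0 + (b_s - b_r) * xbar_s

print(lhs, imbalance_term + coefficient_term)
```

The identity holds exactly here because a least-squares fit with an intercept passes through the point of means. If the response set were balanced on x, the imbalance term would vanish and only the coefficient-difference term would remain.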
● So what is the causal mechanism: small $\bar{\mathbf{x}}_s - \bar{\mathbf{x}}_r$ -> small $\boldsymbol{\beta}_s - \boldsymbol{\beta}_r$? Loosely speaking, it is: small variance of response propensities (in groups defined by x)
● Note that we know $\bar{\mathbf{x}}_s$ and $\bar{\mathbf{x}}_r$, and that we can manipulate $\bar{\mathbf{x}}_r$ by adaptive sampling (Schouten et al. 2013, Särndal & Lundquist 2014)
Comparing three strategies
1. Perfect frame + random sample + unknown response propensities + we strive for response set balance
2. The same as 1 but with a nonrandom sample
3. The same as 1 but with a restricted frame. There is auxiliary data on the frame. The only deficiency is undercoverage. E.g. a large, “good” web panel.
Scene
● Specify $f(\mathbf{Y} \mid \mathbf{X}; \boldsymbol{\beta})$ for all variables Y for all N units. X is used for sampling design and estimation.
1. Analytic aim: inference about $\boldsymbol{\beta}$
2. Descriptive aim: inference about $\mathbf{Y}_s$
● Some y are used for post-stratification: $\mathbf{Y} = (\mathbf{Y}^{post}, \mathbf{Y}^{m})$
● For robustness of post-stratification to nonresponse, see Särndal & Lundström (2005)
● We need $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{post}, \mathbf{X}; \boldsymbol{\beta})$ for inference about $\mathbf{Y}_s$
● Further, $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{post}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta})$, where Z indicates web panel membership in Strategy 3
1. Sample selection ignorability criterion: $f(\mathbf{I}_s \mid \mathbf{Y}_s, \mathbf{X}) = f(\mathbf{I}_s \mid \mathbf{Y}_s^{post}, \mathbf{X})$
● True also for some nonrandom sampling designs, e.g. sample balanced designs (Little 1982, Smith 1983)
2. To be able to ignore nonresponse: $f(\mathbf{J}_r \mid \mathbf{I}_s, \mathbf{Y}_s, \mathbf{X}) = f(\mathbf{J}_r \mid \mathbf{I}_s, \mathbf{Y}_s^{post}, \mathbf{X})$
3. To be able to ignore the web panel selection mechanism: $f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{post}, \mathbf{Z}, \mathbf{X}; \boldsymbol{\beta}) = f(\mathbf{Y}_s^{m} \mid \mathbf{Y}_s^{post}, \mathbf{X}; \boldsymbol{\beta})$
(Little 1982, Smith 1983, Valliant et al. 2003)
● We restrict attention to ignorable sampling designs; nonrandom samples must be ignorable
● Hence it is Strategy 3 that is different.
● Does criterion 3 hold in practice? E.g. Sjöström (2012) found that sometimes it does, sometimes it does not. See also Baker et al. (2013).
● Note also that Strategy 3 has, to a limited extent, always been in use in survey sampling, in particular in business surveys (cut-off sampling)
Further issues
● Suppose you are successful in balancing the set of responses. Does it matter whether you started from a random sample or from a nonrandom, ignorable sample? It would seem that it does not.
● A more practical issue: if you strive to balance the response set, is it easier to start from a random sample?
● Which is better, balancing the response set or adjusting through estimation? There is some evidence that balancing is slightly better (Schouten et al. 2014)
● Of course, there is a broader picture (Schouten et al. 2012)
References
● Baker, R. et al. (2013). Report on the AAPOR task force on non-probability sampling. American Association for Public Opinion Research.
● Holt, D. (2007). The Official Statistics Olympic Challenge: Wider, Deeper, Quicker, Better, Cheaper. (With discussion). The American Statistician, 61, 1-15.
● Holt, D. (2008). Official statistics, public policy and public trust. Journal of the Royal Statistical Society, Series A, 171, 1-20.
● Levins, R. (1966). The strategy of model building in population biology. American Scientist.
● Little, R.J.A. (1982). Models for Nonresponse in Sample Surveys. Journal of the American Statistical Association, 77, 237-250.
● Särndal, C.-E. and Lundquist, P. (2014). Accuracy in Estimation with Nonresponse: A Function of Degree of Imbalance and Degree of Explanation. Journal of Survey Statistics and Methodology, 1-27.
● Särndal, C.-E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. New York: Wiley.
● Schouten, B., Bethlehem, J., Beullens, K., Kleven, Ø., Loosvelt, G., Luiten, A., Rutar, K., Shlomo, N. and Skinner, C. (2012). Evaluating, Comparing, Monitoring, and Improving Representativeness of Survey Response Through R-Indicators and Partial R-Indicators. International Statistical Review, 80, 382-399.
● Schouten, B., Calinescu, M. and Luiten, A. (2013). Optimizing quality of response through adaptive survey designs. Survey Methodology, 39, 29-58.
● Schouten, B., Cobben, F., Lundquist, P. and Wagner, J. (2014). Theoretical and Empirical Support for Adjustment of Nonresponse by Design. Discussion paper, 2014/15, Statistics Netherlands.
● Sjöström, T. (2012). Självrekryterade jämfört med slumpmässigt rekryterade paneler. Novus, Sweden. (in Swedish)
● Smith, T.M.F. (1983). On the validity of inferences from non-random sample. Journal of the Royal Statistical Society, Series A, 146, 394-403.
● Valliant, R., Dorfman, A.H. and Royall, R.M. (2000). Finite Population Sampling and Inference: A Prediction Approach. New York: Wiley.