The Impact of Targeted Data Collection on Nonresponse Bias in an Establishment Survey: A Simulation Study of Adaptive Survey Design
Jaki S. McCarthy, USDA, National Agricultural Statistics Service
James Wagner, University of Michigan
Herschel Sanders, RTI
Journal of Official Statistics Special Issue Workshop
Adaptive Designs Tailoring Data Collection
• Differential treatment is applied to individual sample units, either pre-planned or as data collection progresses, to
– Reduce nonresponse bias or
– Contain costs
• For example:
– Self-administered web cases assigned to interviewer modes
– Call attempts limited for some and increased for others
Additional considerations for establishments
• More auxiliary information (frame data) may be available
– Establishment characteristics
– Response in previous contacts
– Publicly available information
• Size of establishments may differ dramatically
– A few very large establishments can dominate some population estimates
Additional considerations for establishments
• Establishments may view responding differently than households
– Reluctance to release proprietary information
– Response is a cost to the business
– Response may be more complicated for some establishments (require multiple reporters, record retrieval, etc.)
– Special handling may be in place for some
Do Household Survey Findings Hold for Establishments?
• Cases more likely to be nonrespondents can be targeted for earlier or more intensive fieldwork (Peytchev 2010, Wagner 2012, Luiten and Schouten 2013)
– To increase response
– To balance sample representativeness
• Many establishment surveys already apply differential data collection strategies
Adaptive Designs: Who and How?
Adaptive or responsive designs have to determine:
– WHO to target: which cases should be targeted?
– WHAT to do with the targeted respondents, i.e., what alternative procedures can be applied to these cases? This is specific to the survey and data collection procedures
Goals of our simulation:
– Can we select establishments to target for alternative procedures?
– Will targeting these establishments help reduce nonresponse bias?
NASS Crop Acreage and Production Survey (APS)
• Survey of US farm operations
• Produces estimates of grain stocks, multiple crop inventories and production
• Conducted quarterly in March, June, September and December
• Estimates produced at the US and State levels (with some crops differing by state)
NASS Crop Acreage and Production Survey (APS)
• Sample design is multivariate probability proportional to size
• Data collection period is short, less than 2 weeks
• Multimode data collection
– Survey is mailed and can be returned or completed online
– Telephone and in-person follow-up begins a few days after mailing
– Most data collected by CATI
• Strata-based nonresponse adjustment
Crops APS: Who to target?
• Extensive auxiliary data available for sampled operations
– All included in mandatory Census of Agriculture
– Additional response history for other NASS surveys is known
• Auxiliary data used to develop classification tree models to identify survey refusals and noncontacts
• Each operation ranked into response propensity groups (McCarthy, Jacobs, and McCracken, 2010; Mitchell and McCarthy, 2012)
– 1-4 for refusals
– 1-5 for noncontacts
Crops APS: How to change data collection?
• What potential strategies are available for THIS survey?
– Move data collection from phone to in-person?
– Assign cases to best interviewers?
– Shift effort from one subset of cases to another?
• Not likely to change response rates dramatically
Data from 2012 Census of Agriculture (COA) used as proxy for simulation
• Key Crops APS estimates include specific crop acreages and production
• Some crops are more common than others; operations differ in size and contribution to population totals
• Prior COA data is highly correlated with Crops APS data
– Matched to Crops APS for ~70% of respondents and nonrespondents
• Matched data set is treated as the 100% response set
Do changes in response rates for targeted operations reduce nonresponse bias?
• Identify key crop estimates in simulation states (Iowa, Tennessee)
– Soybeans
– Alfalfa
• Substitute 2012 COA proxy data for 2010 Crops APS sample members
• Produce estimates and calculate nonresponse bias
– 100% response (gold standard, no nonresponse)
– Current nonresponse
– Potential nonresponse patterns with changes to data collection
Farms in Iowa and Tennessee
Tennessee
• Number of Farms: 68,050
• Average Farm: 160 acres
• Total Cropland: 9,082,099
• 2012 Soybean acres: 1,229,385
• 2012 Alfalfa acres: 14,296
Iowa
• Number of Farms: 88,637
• Average Farm: 345 acres
• Total Cropland: 26,256,347
• 2012 Soybean acres: 9,301,594
• 2012 Alfalfa acres: 656,367
# farms with Soybeans
Iowa: 41,710
Tennessee: 3,656
# farms with Alfalfa
Iowa: 19,717
Tennessee: 1,140
Simulate different patterns of nonresponse
• Simulation 1: resources diverted from the most likely refusals and instead used to increase response among likely noncontacts
• Simulation 2: resources diverted from the most likely to refuse and instead used to increase response from cases with a lower likelihood of refusing
• Simulation 3: if neither of these scenarios results in significant changes in bias, what would be the impact of severely reduced response rates on bias?
Subgroup: Change to Response Probabilities
• Current Baseline Strategy
• Pattern 1: Divert effort from likely refusals to likely noncontacts
– Refusal rank score highest: -10%
– Noncontact rank score in highest two categories: +10%
– All others: 0%
• Pattern 2: Divert effort from most likely to refuse to less likely to refuse
– Refusal rank score highest: -10%
– Refusal rank score second highest: +10%
– All others: 0%
• Pattern 3: Reduce response propensities by 50% (-50%)
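The patterns above can be sketched as adjustments to each case's response probability. This is a minimal illustration, not the authors' code. The `Case` fields and the `adjusted_prob` function are hypothetical names. It also assumes that the ±10% changes are additive percentage points, that the highest rank score marks the most likely refusal or noncontact, that the refusal rule takes precedence when a case qualifies for both Pattern 1 changes, and that Pattern 3 halves every probability.

```python
from dataclasses import dataclass


@dataclass
class Case:
    base_prob: float      # baseline response probability (hypothetical input)
    refusal_rank: int     # 1-4; 4 assumed to mean "most likely to refuse"
    noncontact_rank: int  # 1-5; 5 assumed to mean "most likely noncontact"


def clip(p: float) -> float:
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, p))


def adjusted_prob(case: Case, pattern: int) -> float:
    """Return the response probability under pattern 0 (baseline), 1, 2, or 3."""
    p = case.base_prob
    if pattern == 0:
        return p
    if pattern == 1:
        # Assumption: the refusal rule wins if a case matches both rules.
        if case.refusal_rank == 4:
            return clip(p - 0.10)
        if case.noncontact_rank >= 4:  # highest two noncontact categories
            return clip(p + 0.10)
        return p
    if pattern == 2:
        if case.refusal_rank == 4:
            return clip(p - 0.10)
        if case.refusal_rank == 3:     # second-highest refusal category
            return clip(p + 0.10)
        return p
    if pattern == 3:
        return p * 0.5                 # assumption: multiplicative 50% cut
    raise ValueError(f"unknown pattern: {pattern}")
```

Whether the published simulation shifted probabilities additively or proportionally is not stated on the slide, so the arithmetic here is only one plausible reading.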
Current nonresponse adjustment strategy
• Nonresponse strata based on size and type of commodity
• Self-representing strata of largest operations with sample weights = 1
– No nonresponse allowed in these strata
– In practice, records are manually estimated by analysts if missing
• For remaining strata, imputation within stratum
– Ratio of commodity (soybeans, alfalfa) to total operation acreage calculated for observed cases
– Ratio multiplied by COA total acreage for each nonrespondent
– Value imputed for missing data
• Crop acreages weighted to estimate population total
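The within-stratum ratio imputation described above might look like the following sketch. The record keys (`stratum`, `weight`, `total_acres`, `crop_acres`) and both function names are hypothetical, and using a ratio of sums rather than a mean of per-operation ratios is an assumption. The manual handling of self-representing strata is not modeled.

```python
def impute_crop_acres(records):
    """Fill in missing crop acres with a within-stratum ratio.

    Each record is a dict with hypothetical keys:
      'stratum', 'weight', 'total_acres' (COA total operation acreage),
      'crop_acres' (None for a nonrespondent).
    """
    # Sum crop acres and total acres over observed cases in each stratum.
    sums = {}
    for r in records:
        if r["crop_acres"] is not None:
            crop, total = sums.get(r["stratum"], (0.0, 0.0))
            sums[r["stratum"]] = (crop + r["crop_acres"], total + r["total_acres"])

    completed = []
    for r in records:
        filled = dict(r)
        if filled["crop_acres"] is None:
            crop, total = sums.get(r["stratum"], (0.0, 0.0))
            if total == 0:
                raise ValueError(f"no usable respondents in stratum {r['stratum']}")
            # Stratum ratio times the nonrespondent's COA total acreage.
            filled["crop_acres"] = crop / total * r["total_acres"]
        completed.append(filled)
    return completed


def weighted_total(records):
    """Weighted estimate of the population total of crop acres."""
    return sum(r["weight"] * r["crop_acres"] for r in records)


sample = [
    {"stratum": "A", "weight": 2.0, "total_acres": 100.0, "crop_acres": 50.0},
    {"stratum": "A", "weight": 2.0, "total_acres": 100.0, "crop_acres": 30.0},
    {"stratum": "A", "weight": 2.0, "total_acres": 200.0, "crop_acres": None},
]
completed = impute_crop_acres(sample)
estimate = weighted_total(completed)
```

In this toy stratum the observed ratio is 80 / 200 = 0.4, so the nonrespondent with 200 total acres is assigned about 80 crop acres.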
Simulations run 500 times
• Random draw for nonrespondents each time
• Average bias calculated
• Coverage rate calculated (% of simulations within 95% confidence interval)
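The repeated-draw procedure above can be illustrated with a short Monte Carlo sketch. Several pieces are assumptions rather than the study's method: `adjusted_total` is a simple stand-in adjustment (respondent total scaled up by the weight ratio), coverage is read as the share of replicate estimates falling within a supplied half-width of the full-response total, and `ci_half_width` is a hypothetical input.

```python
import random


def adjusted_total(values, weights, responded):
    """Stand-in adjustment: scale the respondent total by total/respondent weight."""
    resp_w = sum(w for w, r in zip(weights, responded) if r)
    if resp_w == 0:
        raise ValueError("no respondents in this draw")
    resp_total = sum(w * y for y, w, r in zip(values, weights, responded) if r)
    return resp_total * sum(weights) / resp_w


def simulate(values, weights, probs, ci_half_width, n_reps=500, seed=0):
    """Return (percent bias of the mean estimate, coverage rate in percent)."""
    rng = random.Random(seed)
    # Full-response ("gold standard") total.
    truth = sum(w * y for y, w in zip(values, weights))
    estimates = []
    for _ in range(n_reps):
        # Random draw of respondents from the response probabilities.
        responded = [rng.random() < p for p in probs]
        estimates.append(adjusted_total(values, weights, responded))
    mean_est = sum(estimates) / n_reps
    pct_bias = 100.0 * (mean_est - truth) / truth
    covered = sum(abs(e - truth) <= ci_half_width for e in estimates)
    return pct_bias, 100.0 * covered / n_reps
```

Passing a different list of `probs` for each nonresponse pattern would let the same function compare patterns against the baseline.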
Results -- Soybeans
NONRESPONSE PATTERN                            IOWA % Bias  IOWA Coverage rate  TENNESSEE % Bias  TENNESSEE Coverage rate
Baseline Estimates                             0.16         100                 0.32              100
1: Increase refusals and decrease noncontacts  0.14         100                 0.35              99.8
2: Increase response for soft refusals         0.11         100                 0.40              99.8
3: Lower overall RR substantially              -0.18        99                  1.32              91
Results -- Alfalfa
NONRESPONSE PATTERN                            IOWA % Bias  IOWA Coverage rate  TENNESSEE % Bias  TENNESSEE Coverage rate
Baseline Estimates                             -4.26        90                  6.75              97
1: Increase refusals and decrease noncontacts  -4.18        90                  5.79              97
2: Increase response for soft refusals         -3.79        94                  6.64              98
3: Lower overall RR substantially              -2.60        80                  23.7              77
Limitations
• Use of 2012 data as proxy for 2010
• Matched data was not available for full sample
• Results may differ for other crops, geographies
• Alternative strategies are tough to identify
Conclusions
• Nonresponse adjustments for existing nonresponse are fairly good at minimizing nonresponse bias
• Bias is different for different estimates and geographies within the same survey
• Strategies for redirecting resources and altering nonresponse patterns had little impact on adjusted estimates
• Only major impact seen for drastic reductions in response for less common commodities
Takeaways
• Current nonresponse adjustments appear to be working well
• Self-representing strata probably dampen any impact; this may be a key difference between household (HH) and establishment surveys
• We chose simulations based on altering response propensities; perhaps another strategy is better
• Simulations can be used to evaluate nonresponse bias, both for existing processes and proposed alternatives