SLIDE 1

The Impact of Targeted Data Collection on Nonresponse Bias in an Establishment Survey

A Simulation Study of Adaptive Survey Design

Jaki S. McCarthy, USDA, National Agricultural Statistics Service
James Wagner, University of Michigan
Herschel Sanders, RTI

Journal of Official Statistics Special Issue Workshop

SLIDE 2

Adaptive Designs: Tailoring Data Collection

  • Differential treatment is applied to individual sample units, either pre-planned or as data collection progresses, to:
    – Reduce nonresponse bias, or
    – Contain costs
  • For example:
    – Self-administered web cases are reassigned to interviewer-administered modes
    – Call attempts are limited for some cases and increased for others

SLIDE 3

Additional considerations for establishments

  • More auxiliary information (frame data) may be available:
    – Establishment characteristics
    – Response in previous contacts
    – Publicly available information
  • Establishments may differ dramatically in size:
    – A few very large establishments can dominate some population estimates

SLIDE 4

Additional considerations for establishments

  • Establishments may view responding differently than households:
    – Reluctance to release proprietary information
    – Response is a cost to the business
    – Response may be more complicated for some establishments (requiring multiple reporters, record retrieval, etc.)
    – Special handling may be in place for some establishments

SLIDE 5

Do Household Survey Findings Hold for Establishments?

  • Cases more likely to be nonrespondents can be targeted for earlier or more intensive fieldwork (Peytchev 2010; Wagner 2012; Luiten and Schouten 2013):
    – To increase response
    – To balance sample representativeness
  • Many establishment surveys already apply differential data collection strategies

SLIDE 6

Adaptive Designs: Who and How?

Adaptive or responsive designs have to determine:

  • WHO to target: which cases should be targeted?
  • WHAT to do with the targeted cases: which alternative procedures can be applied to them?

The answers are specific to each survey and its data collection procedures.

Goals of our simulation:

  • Can we select establishments to target for alternative procedures?
  • Will targeting these establishments help reduce nonresponse bias?

SLIDE 7

NASS Crop Acreage and Production Survey (APS)

  • Survey of US farm operations
  • Produces estimates of grain stocks and of inventories and production for multiple crops
  • Conducted quarterly: March, June, September, and December
  • Estimates produced at the US and state levels (the crops covered differ by state)

SLIDE 8

NASS Crop Acreage and Production Survey (APS)

  • Sample design is multivariate probability proportional to size
  • Data collection period is short: less than 2 weeks
  • Multimode data collection:
    – Questionnaire is mailed and can be returned by mail or completed online
    – Telephone and in-person follow-up begins a few days after mailing
    – Most data are collected by CATI (computer-assisted telephone interviewing)
  • Strata-based nonresponse adjustment

SLIDE 9

Crops APS: Who to target?

  • Extensive auxiliary data are available for sampled operations:
    – All operations are included in the mandatory Census of Agriculture (COA)
    – Response history for other NASS surveys is also known
  • Auxiliary data were used to develop classification tree models to identify likely survey refusals and noncontacts
  • Each operation is ranked into response propensity groups (McCarthy, Jacobs, and McCracken 2010; Mitchell and McCarthy 2012):
    – 1 to 4 for refusals
    – 1 to 5 for noncontacts
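
One way to turn a fitted classification tree into ordered propensity groups like those above is to sort its terminal nodes by their historical refusal rates. The sketch below is illustrative only: the two-split tree in `leaf_of`, its split variables (a prior refusal and operation size in acres), and the record keys are hypothetical, not the NASS models cited on this slide.

```python
from collections import defaultdict


def leaf_of(case: dict) -> str:
    """Hypothetical two-split classification tree: returns the terminal node for a case."""
    if case["refused_before"]:  # hypothetical auxiliary variable: prior-survey refusal
        return "refused_large" if case["acres"] >= 1000 else "refused_small"
    return "cooperative_large" if case["acres"] >= 1000 else "cooperative_small"


def rank_leaves(history) -> dict:
    """Rank terminal nodes by observed refusal rate (1 = least likely to refuse).

    history: iterable of (case, refused) pairs from past data collection.
    """
    tallies = defaultdict(lambda: [0, 0])  # leaf -> [refusals, cases]
    for case, refused in history:
        tally = tallies[leaf_of(case)]
        tally[0] += int(refused)
        tally[1] += 1
    ordered = sorted(tallies, key=lambda leaf: tallies[leaf][0] / tallies[leaf][1])
    return {leaf: rank for rank, leaf in enumerate(ordered, start=1)}


def refusal_rank(case: dict, leaf_ranks: dict) -> int:
    """Look up the refusal propensity group for a new sample case."""
    return leaf_ranks[leaf_of(case)]
```

With four terminal nodes this yields groups 1 to 4, the same number of refusal groups as on the slide; a larger tree would need its nodes pooled into groups.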

SLIDE 10

Crops APS: How to change data collection?

  • What potential strategies are available for THIS survey?
    – Move data collection from phone to in person?
    – Assign cases to the best interviewers?
    – Shift effort from one subset of cases to another?
  • None of these is likely to change response rates dramatically

SLIDE 11

Data from the 2012 COA used as a proxy for the simulation

  • Key Crops APS estimates include specific crop acreages and production
  • Some crops are more common than others, and operations differ in size and in their contribution to population totals
  • Prior COA data are highly correlated with Crops APS data
    – 2012 COA data matched to Crops APS records for ~70% of respondents and nonrespondents
  • Matched data set treated as a 100% response set
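
The matching step above amounts to a join on an operation identifier: sampled operations with a matching COA record keep their sample weight and response status but take their crop values from the COA. This is a minimal sketch; the key names (`op_id`, `weight`, `responded`, `soybean_acres`) are hypothetical.

```python
def build_proxy_frame(aps_sample: list, coa_by_id: dict):
    """Attach COA proxy values to sampled operations that can be matched.

    aps_sample: list of dicts with hypothetical keys 'op_id', 'weight', 'responded'.
    coa_by_id: dict mapping op_id -> dict of COA values (e.g. 'soybean_acres').
    Returns (matched_records, match_rate). Unmatched operations are dropped, so every
    kept record has complete proxy data and can serve as the 100% response set.
    """
    matched = [
        {**unit, **coa_by_id[unit["op_id"]]}
        for unit in aps_sample
        if unit["op_id"] in coa_by_id
    ]
    match_rate = len(matched) / len(aps_sample) if aps_sample else 0.0
    return matched, match_rate
```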

SLIDE 12

Do changes in response rates for targeted operations reduce nonresponse bias?

  • Identify key crop estimates in the simulation states (Iowa, Tennessee):
    – Soybeans
    – Alfalfa
  • Substitute 2012 COA proxy data for 2010 Crops APS sample members
  • Produce estimates and calculate nonresponse bias for:
    – 100% response (gold standard, no nonresponse)
    – Current nonresponse
    – Potential nonresponse patterns under changes to data collection
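
Bias here is measured against the 100% response estimate. Assuming "% bias" means the signed relative difference between an estimate computed under nonresponse and the gold-standard estimate (the slides do not give the formula), it can be computed as below; the function name and example figures are hypothetical.

```python
def percent_bias(estimate: float, full_response_estimate: float) -> float:
    """Signed relative bias, in percent, against the 100% response (no nonresponse) estimate."""
    if full_response_estimate == 0:
        raise ValueError("full-response estimate must be nonzero")
    return 100.0 * (estimate - full_response_estimate) / full_response_estimate


# Hypothetical example: adjusted estimate of 9.5 million acres vs. a gold standard of 10 million
print(percent_bias(9_500_000, 10_000_000))  # -5.0
```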

SLIDE 13

Farms in Iowa and Tennessee

Tennessee

  • Number of farms: 68,050
  • Average farm size: 160 acres
  • Total cropland: 9,082,099 acres
  • 2012 soybean acres: 1,229,385
  • 2012 alfalfa acres: 14,296

Iowa

  • Number of farms: 88,637
  • Average farm size: 345 acres
  • Total cropland: 26,256,347 acres
  • 2012 soybean acres: 9,301,594
  • 2012 alfalfa acres: 656,367

SLIDE 14

Number of farms with soybeans

  • Iowa: 41,710
  • Tennessee: 3,656

SLIDE 15

Number of farms with alfalfa

  • Iowa: 19,717
  • Tennessee: 1,140

SLIDE 16

Simulate different patterns of nonresponse

  • Simulation 1: resources diverted from the most likely refusals and used instead to increase response among likely noncontacts
  • Simulation 2: resources diverted from the cases most likely to refuse and used instead to increase response among cases with a lower likelihood of refusing
  • Simulation 3: if neither of these scenarios results in significant changes in bias, what would be the impact of severely reduced response rates on bias?

SLIDE 17

Subgroups and changes to response probabilities

Baseline strategy: current response pattern (no change)

Pattern 1: Divert effort from likely refusals to likely noncontacts
  • Refusal rank score highest: -10%
  • Noncontact rank score in highest two categories: +10%
  • All others: 0%

Pattern 2: Divert effort from the cases most likely to refuse to those less likely to refuse
  • Refusal rank score highest: -10%
  • Refusal rank score second highest: +10%
  • All others: 0%

Pattern 3: Reduce response propensities by 50%
  • All cases: -50%
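
The patterns above can be expressed as a function that adjusts each case's baseline response propensity, followed by a Bernoulli draw for response. This sketch rests on assumptions the slide does not state: changes are applied as relative changes, p × (1 + change), clipped to [0, 1]; a higher rank means a higher likelihood of refusal or noncontact; in Pattern 1 the refusal rule takes precedence when a case qualifies for both; and pattern 0 stands for the baseline. The field names and baseline propensities are hypothetical.

```python
import random


def pattern_change(pattern: int, refusal_rank: int, noncontact_rank: int) -> float:
    """Assumed relative change to a case's response propensity under each pattern."""
    if pattern == 1:
        if refusal_rank == 4:       # highest refusal rank (groups 1-4)
            return -0.10
        if noncontact_rank >= 4:    # two highest noncontact groups (groups 1-5)
            return 0.10
        return 0.0
    if pattern == 2:
        if refusal_rank == 4:       # most likely to refuse
            return -0.10
        if refusal_rank == 3:       # second-highest refusal rank
            return 0.10
        return 0.0
    if pattern == 3:
        return -0.50                # halve every case's propensity
    return 0.0                      # pattern 0 (baseline): current propensities


def adjusted_propensity(p: float, pattern: int, refusal_rank: int, noncontact_rank: int) -> float:
    """Apply the relative change and clip the result to a valid probability."""
    change = pattern_change(pattern, refusal_rank, noncontact_rank)
    return min(1.0, max(0.0, p * (1.0 + change)))


def draw_respondents(cases: list, pattern: int, rng: random.Random) -> list:
    """Return one True/False response indicator per case for a single simulation run."""
    return [
        rng.random() < adjusted_propensity(c["p"], pattern, c["refusal_rank"], c["noncontact_rank"])
        for c in cases
    ]
```

Because the changes are multiplicative in this sketch, a case with propensity 0.5 in the highest refusal group drops to 0.45 under Patterns 1 and 2.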

SLIDE 18

Current nonresponse adjustment strategy

  • Nonresponse strata based on size and type of commodity
  • Self-representing stratum of the largest operations, with sample weights = 1
    – No nonresponse allowed in this stratum
    – In practice, missing records are manually estimated by analysts
  • For the remaining strata, imputation within stratum:
    – Ratio of commodity acreage (soybeans, alfalfa) to total operation acreage calculated for observed cases
    – Ratio multiplied by COA total acreage for each nonrespondent
    – Resulting value imputed for the missing data
  • Crop acreages weighted to estimate the population total
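
The imputation and weighting steps above can be sketched as follows. Assumptions: the stratum ratio is a ratio of totals over observed cases (the slide does not say whether a ratio of totals or a mean of per-case ratios is used), records in the self-representing stratum are treated as complete, and the record keys are hypothetical.

```python
from collections import defaultdict


def imputed_weighted_total(records: list) -> float:
    """Weighted population total for one crop with within-stratum ratio imputation.

    Each record is a dict with hypothetical keys:
      'stratum', 'weight', 'total_acres' (COA total operation acreage), and
      'crop_acres' (reported crop acreage, or None for a nonrespondent).
    """
    # Sum crop and total acreage over observed cases in each stratum.
    observed = defaultdict(lambda: [0.0, 0.0])  # stratum -> [crop acres, total acres]
    for r in records:
        if r["crop_acres"] is not None:
            observed[r["stratum"]][0] += r["crop_acres"]
            observed[r["stratum"]][1] += r["total_acres"]

    total = 0.0
    for r in records:
        crop = r["crop_acres"]
        if crop is None:
            crop_sum, acre_sum = observed[r["stratum"]]
            if acre_sum == 0:
                raise ValueError(f"no observed acreage in stratum {r['stratum']!r}")
            crop = (crop_sum / acre_sum) * r["total_acres"]  # impute: ratio x COA acreage
        total += r["weight"] * crop  # weight up to the population total
    return total
```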

SLIDE 19

Simulations run 500 times

  • Nonrespondents randomly drawn anew in each run
  • Average bias calculated across runs
  • Coverage rate calculated (% of simulations within the 95% confidence interval)
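
The simulation loop and its summary statistics can be sketched as below. `one_replicate` is a hypothetical callable that draws nonrespondents, applies the nonresponse adjustment, and returns an estimate with its standard error. Coverage is computed here as the share of runs whose normal-approximation 95% interval (estimate ± 1.96 × SE) contains the 100% response estimate; that reading of the coverage rate is an assumption.

```python
import random


def summarize(results: list, full_response_estimate: float, z: float = 1.96):
    """Average % bias and coverage rate (%) over simulation runs.

    results: list of (estimate, standard_error) pairs, one per run.
    """
    biases = [100.0 * (est - full_response_estimate) / full_response_estimate for est, _ in results]
    covered = sum(
        1 for est, se in results if est - z * se <= full_response_estimate <= est + z * se
    )
    return sum(biases) / len(biases), 100.0 * covered / len(results)


def run_simulation(one_replicate, full_response_estimate: float, n_runs: int = 500, seed: int = 1):
    """Repeat the nonresponse draw n_runs times using a seeded generator."""
    rng = random.Random(seed)
    results = [one_replicate(rng) for _ in range(n_runs)]
    return summarize(results, full_response_estimate)
```

Seeding the generator makes a set of 500 runs reproducible, so a pattern can be rerun and compared against the baseline with the same random draws.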

SLIDE 20

Results -- Soybeans

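% bias and coverage rate (%) by nonresponse pattern

  • Baseline estimates: Iowa 0.16% bias, 100% coverage; Tennessee 0.32% bias, 100% coverage
  • Pattern 1 (increase refusals and decrease noncontacts): Iowa 0.14% bias, 100% coverage; Tennessee 0.35% bias, 99.8% coverage
  • Pattern 2 (increase response for soft refusals): Iowa 0.11% bias, 100% coverage; Tennessee 0.40% bias, 99.8% coverage
  • Pattern 3 (lower overall response rate substantially): Iowa -0.18% bias, 99% coverage; Tennessee 1.32% bias, 91% coverage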

SLIDE 21

Results -- Alfalfa

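% bias and coverage rate (%) by nonresponse pattern

  • Baseline estimates: Iowa -4.26% bias, 90% coverage; Tennessee 6.75% bias, 97% coverage
  • Pattern 1 (increase refusals and decrease noncontacts): Iowa -4.18% bias, 90% coverage; Tennessee 5.79% bias, 97% coverage
  • Pattern 2 (increase response for soft refusals): Iowa -3.79% bias, 94% coverage; Tennessee 6.64% bias, 98% coverage
  • Pattern 3 (lower overall response rate substantially): Iowa -2.60% bias, 80% coverage; Tennessee 23.7% bias, 77% coverage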

SLIDE 22

Limitations

  • Use of 2012 COA data as a proxy for the 2010 Crops APS sample
  • Matched data were not available for the full sample
  • Results may differ for other crops and geographies
  • Alternative data collection strategies are difficult to identify

SLIDE 23

Conclusions

  • Nonresponse adjustments applied to the existing nonresponse are fairly good at minimizing nonresponse bias
  • Bias differs across estimates and geographies within the same survey
  • Strategies for redirecting resources and altering nonresponse patterns had little impact on adjusted estimates
  • The only major impact was seen with drastic reductions in response, for less common commodities such as alfalfa

SLIDE 24

Takeaways

  • Current nonresponse adjustments appear to be working well
  • The self-representing stratum probably dampens any impact; this may be a key difference between household and establishment surveys
  • We chose simulations based on altering response propensities; another strategy may be better
  • Simulations can be used to evaluate nonresponse bias for both existing processes and proposed alternatives
