
Tools for Identifying When to Modify a Survey’s Data Collection



  1. Univariate Tests for Phase Capacity: Tools for Identifying When to Modify a Survey’s Data Collection Protocol
     Workshop on Responsive and Adaptive Survey Design, Bureau of Labor Statistics, Washington, DC, March 14, 2018
     Taylor Lewis¹, Senior Data Scientist, U.S. Office of Personnel Management
     ¹ The opinions, findings, and conclusions expressed in this presentation are those of the authors and do not necessarily reflect those of the U.S. Office of Personnel Management.
     Note: all references in this presentation can be found in the JOS Fall 2017 special issue article of the same name.

  2. Outline
     I. Introduction
        – Background and definitions
        – Illustration of phase capacity in the 2011 Federal Employee Viewpoint Survey (FEVS)
     II. Tests for Phase Capacity
        – Rao, Glickman, and Glynn (2008): a test based on multiple imputation for nonresponse
        – New method: a test amenable to weight adjustment methods for nonresponse
     III. FEVS Application Comparing the Two Phase Capacity Tests
     IV. Limitations and Avenues for Further Research

  3. I. Introduction

  4. Nonresponse and Nonrespondent Follow-Up
     • Invariably, not all sampled entities respond to the initial survey solicitation
     • Most surveys repeatedly follow up with nonrespondents, making additional mailings, phone calls, household visits, etc., often aiming to meet a preset response rate target
     • Each subsequent reminder brings in a new “wave” of data, which tends to be progressively smaller and thus to have less and less impact on estimates
     • Other temporal delineations of waves are possible
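One common delineation treats each calendar week of the field period as a wave. As a minimal sketch, assuming every response carries a submission timestamp and waves are fixed 7-day blocks counted from a hypothetical field-period start, wave membership and the cumulative respondent set as of wave k could be computed as follows (the function names are illustrative, not from the source):

```python
from datetime import datetime, timedelta

def assign_wave(submitted_at, field_start, wave_length=timedelta(days=7)):
    """Return the 1-based wave in which a response arrived.

    Assumes waves are consecutive fixed-length blocks starting at field_start.
    """
    if submitted_at < field_start:
        raise ValueError("response timestamp precedes the field period")
    # timedelta // timedelta yields the number of whole waves already elapsed
    return (submitted_at - field_start) // wave_length + 1

def respondents_as_of(waves, k):
    """Indices of sampled units that had responded by the end of wave k.

    waves[i] is the wave of unit i's response, or None for a nonrespondent.
    """
    return [i for i, w in enumerate(waves) if w is not None and w <= k]
```

Defining waves by reminder dates instead of calendar weeks would only change how `assign_wave` maps timestamps to wave numbers.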

  5. Notion of Phase Capacity
     • In their discussion of responsive survey design, Groves and Heeringa (2006) define the following terms:
       – design phase: data collection period with a stable frame, sample, and recruitment protocol
       – phase capacity: point during a design phase at which additional responses cease influencing key statistics
     • Rather than fixating on a target response rate, they argue one should change design phases (e.g., switch mode, increase incentive) or discontinue nonrespondent follow-up altogether once phase capacity has been reached
     • Problem for practitioners: no calculable rule is given for determining when phase capacity has been reached

  6. Illustration of Phase Capacity in FEVS
     • Federal Employee Viewpoint Survey (FEVS) – a yearly organizational climate survey administered by the U.S. Office of Personnel Management (OPM) to a sample of ~1.3M federal employees from 80+ agencies
     • Web-based instrument composed mainly of attitudinal items posed on a five-point Likert-type scale
     • Key statistics are “percent positive” estimates based on dichotomizing responses, for example, “Completely Agree” or “Agree” selections versus all other possible responses
     • Weekly reminder emails are sent to nonrespondents
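The percent positive statistic is a weighted proportion of respondents in the top categories. A minimal sketch, assuming items are coded 1 to 5 with 4 and 5 as the two “agree” categories and `None` marking item nonresponse (excluded from the base); this coding and the function name are hypothetical, not FEVS’s documented processing:

```python
def percent_positive(responses, weights, positive_codes=frozenset({4, 5})):
    """Weighted percent positive for one five-point Likert item.

    responses      : item codes 1..5, or None for item nonresponse (skipped)
    weights        : one (nonresponse-adjusted) weight per respondent
    positive_codes : codes counted as positive (hypothetical coding)
    """
    numerator = denominator = 0.0
    for code, weight in zip(responses, weights):
        if code is None:
            continue  # item nonrespondents do not enter the base
        denominator += weight
        if code in positive_codes:
            numerator += weight
    return 100.0 * numerator / denominator
```

Recomputing this statistic with each wave’s updated weights produces the kind of trend tracked on the following slides.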

  7. Illustration of Phase Capacity in FEVS (2)
     FEVS 2011 Reminder Schedule and Achieved Responses by Wave of Data Collection (a Calendar Week) for Three Example Agencies

  8. Illustration of Phase Capacity in FEVS (3)
     Trend of an example agency’s nonresponse-adjusted percent positive statistic for FEVS 2011 Item 4, with 95% confidence limits
     • This trend is a commonly observed FEVS pattern (Sigman et al., 2014)
     • Goal: identify estimate stability (i.e., phase capacity) as soon as possible, then change design phase

  9. II. Tests for Phase Capacity

  10. Rao, Glickman, and Glynn (RGG) – MI Test
     • Rao, Glickman, and Glynn (RGG) (2008) studied retrospective “stopping rules”
       – The best-performing method involves multiple imputation (MI)
     • The idea is to multiply impute (Rubin, 1987) the missing data M (M ≥ 2) times for nonrespondents as of wave k, then delete the responses obtained during wave k specifically and repeat for nonrespondents as of wave k – 1
       – The result is 2M completed data sets and two nonresponse-adjusted MI point estimates
     • A t-test is carried out by dividing the difference between the two point estimates by an estimate of the MI variance of that difference
     • Phase capacity is declared once the test statistic is not statistically significant
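Applied retrospectively, the test is run wave by wave, and phase capacity is declared at the first wave whose statistic is not significant. A minimal sketch, assuming the statistics have already been computed for waves 2, 3, …; the two-sided normal critical value is a simplifying assumption, since a t reference distribution would be more faithful to the t-test described above:

```python
from statistics import NormalDist

def first_wave_at_phase_capacity(t_by_wave, alpha=0.05):
    """Return the first wave k whose wave k vs. wave k-1 test is not significant.

    t_by_wave : dict mapping wave k (k >= 2) to its test statistic
    alpha     : two-sided significance level
    """
    # Normal approximation to the reference distribution (an assumption)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    for k in sorted(t_by_wave):
        if abs(t_by_wave[k]) < critical:
            return k  # the wave k responses did not move the estimate significantly
    return None  # phase capacity not reached within the observed waves
```

For example, statistics of 5.1, 2.4, 0.7, and 0.3 at waves 2 through 5 would place phase capacity at wave 4.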

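  11. Visualization of MI Test
     Each of the M imputations yields a pair of completed data sets (2M in total); let $\hat{d}_m$ denote the difference between the wave k and wave k – 1 point estimates from the m-th pair.
     Calculations:
     $$\bar{d}_M = \frac{1}{M}\sum_{m=1}^{M}\hat{d}_m, \qquad \bar{U}_M = \frac{1}{M}\sum_{m=1}^{M}\operatorname{var}(\hat{d}_m), \qquad B_M = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{d}_m - \bar{d}_M\right)^2$$
     $$H_0\colon d = 0 \quad \text{vs.} \quad H_1\colon d \neq 0, \qquad t = \frac{\bar{d}_M}{\sqrt{\bar{U}_M + \left(1 + \frac{1}{M}\right) B_M}}$$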
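The calculations on this slide are Rubin’s (1987) combining rules applied to the M differences. A minimal sketch, assuming each completed-data pair has already produced a difference and a complete-data variance estimate for it; the degrees-of-freedom formula is Rubin’s classic one and is an addition, not something shown on the slide:

```python
import math
from statistics import mean, variance

def mi_difference_test(d_hats, var_d_hats):
    """Combine M per-imputation differences with Rubin's rules.

    d_hats[m]     : wave k minus wave k-1 point estimate from the m-th pair
    var_d_hats[m] : complete-data variance estimate of d_hats[m]
    Returns (d_bar, total_var, t, df).
    """
    M = len(d_hats)
    if M < 2 or len(var_d_hats) != M:
        raise ValueError("need M >= 2 differences and one variance per difference")
    d_bar = mean(d_hats)                 # MI point estimate of the difference
    u_bar = mean(var_d_hats)             # within-imputation variance
    b = variance(d_hats)                 # between-imputation variance (divisor M - 1)
    total_var = u_bar + (1 + 1 / M) * b  # Rubin's total variance
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    t = d_bar / math.sqrt(total_var)
    # Rubin's (1987) degrees of freedom; infinite when the between variance is 0
    df = math.inf if b == 0 else (M - 1) * (1 + u_bar / ((1 + 1 / M) * b)) ** 2
    return d_bar, total_var, t, df
```

With M = 5, as in the FEVS application, the factor 1 + 1/M inflates the between-imputation component by 20%.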

  12. New Method: A Test Amenable to Weighting
     • Alternatively, one could weight up the wave k and wave k – 1 respondents, respectively, producing two sets of nonresponse-adjusted weights, $w_i^{(k)}$ and $w_i^{(k-1)}$
     • Fundamentals of Taylor series linearization (Woodruff, 1971) can be used to derive the variance of the wave-specific weighted mean difference, which is a function of p = 4 totals:
       $$\hat{\delta} = \bar{y}^{(k)} - \bar{y}^{(k-1)} = \frac{\hat{Y}^{(k)}}{\hat{N}^{(k)}} - \frac{\hat{Y}^{(k-1)}}{\hat{N}^{(k-1)}}, \qquad \hat{Y}^{(j)} = \sum_{i} w_i^{(j)} y_i, \quad \hat{N}^{(j)} = \sum_{i} w_i^{(j)}$$
     • Replication variance estimation methods could also be used (Wolter, 2007), and, unlike the MI test, the weighting version generalizes to point estimates other than means
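A minimal sketch of the linearization step, under stated assumptions: the wave k – 1 and wave k weights are treated as fixed (variability from the nonresponse adjustment itself is ignored), units that first responded in wave k get a wave k – 1 weight of 0, and the variance of the linearized total uses the simple with-replacement formula n/(n – 1) Σ(u_i – ū)². Because each earlier respondent contributes to both means, the covariance between the two estimates is carried through u_i. The function name is hypothetical:

```python
import math

def weighting_phase_capacity_test(y, w_prev, w_curr):
    """Linearization-based test of H0: delta = 0.

    y      : outcomes for the respondents as of wave k
    w_prev : nonresponse-adjusted weights from the wave k-1 respondent set
             (0 for units that first responded in wave k)
    w_curr : nonresponse-adjusted weights from the wave k respondent set
    Returns (delta_hat, var_hat, t).
    """
    n = len(y)
    # The p = 4 totals: two weighted sums of y and two sums of weights
    Y_prev = sum(w * v for w, v in zip(w_prev, y))
    Y_curr = sum(w * v for w, v in zip(w_curr, y))
    N_prev, N_curr = sum(w_prev), sum(w_curr)
    ybar_prev, ybar_curr = Y_prev / N_prev, Y_curr / N_curr
    delta_hat = ybar_curr - ybar_prev
    # Linearized values of Y_curr/N_curr - Y_prev/N_prev for each respondent
    u = [wc * (v - ybar_curr) / N_curr - wp * (v - ybar_prev) / N_prev
         for v, wp, wc in zip(y, w_prev, w_curr)]
    u_bar = sum(u) / n
    # With-replacement variance estimator for the total of u (an assumption)
    var_hat = n / (n - 1) * sum((ui - u_bar) ** 2 for ui in u)
    if var_hat == 0:
        # e.g. no new respondents and unchanged weights: the difference is 0
        t = 0.0 if delta_hat == 0 else math.copysign(math.inf, delta_hat)
    else:
        t = delta_hat / math.sqrt(var_hat)
    return delta_hat, var_hat, t
```

The zero-variance branch corresponds to the extreme case discussed later in the deck, where no new respondents arrive during wave k.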

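  13. Visualization of the Weighting Test
     Toy example with 10 respondents as of wave k, the first 6 of whom had responded as of wave k – 1; $u_i$ is the Taylor-linearized value of the difference for respondent i:
     $$\hat{Y}^{(k-1)} = \sum_{i=1}^{6} w_i^{(k-1)} y_i = 99.96, \qquad \hat{N}^{(k-1)} = \sum_{i=1}^{6} w_i^{(k-1)} = 60, \qquad \bar{y}^{(k-1)} = \frac{99.96}{60} = 1.666$$
     $$\hat{Y}^{(k)} = \sum_{i=1}^{10} w_i^{(k)} y_i = 100.86, \qquad \hat{N}^{(k)} = \sum_{i=1}^{10} w_i^{(k)} = 60, \qquad \bar{y}^{(k)} = \frac{100.86}{60} = 1.681$$
     $$\hat{\delta} = \bar{y}^{(k)} - \bar{y}^{(k-1)} = 0.015, \qquad \operatorname{var}(\hat{\delta}) = \operatorname{var}\Big(\sum_{i=1}^{10} u_i\Big) = 0.00567$$
     $$H_0\colon \delta = 0 \quad \text{vs.} \quad H_1\colon \delta \neq 0, \qquad t = \frac{0.015}{\sqrt{0.00567}} \approx 0.2$$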

  14. III. FEVS Application Comparing the Two Phase Capacity Tests

  15. Details of 2011 FEVS Application
     • Investigated 7 percent positive estimates comprising the Job Satisfaction Index for a purposive sample of three agencies
     • Treating the ultimate respondent set as the full sample, used time stamps to group responses by field period week and conducted the two test versions retrospectively
       – The full sample was used to compute “relative nonresponse error”
     • Used categorical demographics on the sample frame (gender, minority status, supervisor status, work unit, and work location) to adjust for nonresponse as follows:
       – MI version: demographics served as main effects in a sequence of imputation models fit independently by agency, using IVEware (Raghunathan et al., 2001) (M = 5)
       – Weighting version: demographics served as raking dimensions (Kalton and Flores-Cervantes, 2003)
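Raking (iterative proportional fitting) rescales weights so that weighted counts match control totals on each dimension in turn, cycling until the weights stabilize. A minimal sketch, assuming complete categorical data for every respondent and known control totals (for example, frame counts); the function and argument names are hypothetical, and production raking routines typically add convergence diagnostics and weight trimming:

```python
def rake(base_weights, margins, categories, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of weights to marginal control totals.

    base_weights : starting weight for each respondent
    margins      : {dimension: {category: control total}}
    categories   : {dimension: [category of respondent i, ...]}
    """
    w = list(base_weights)
    for _ in range(max_iter):
        max_change = 0.0
        for dim, targets in margins.items():
            cats = categories[dim]
            # Current weighted count in each category of this dimension
            current = {c: 0.0 for c in targets}
            for wi, c in zip(w, cats):
                current[c] += wi
            # Scale every respondent's weight so this margin is hit exactly
            for i, c in enumerate(cats):
                new_w = w[i] * targets[c] / current[c]
                max_change = max(max_change, abs(new_w - w[i]))
                w[i] = new_w
        if max_change < tol:
            break  # a full cycle left the weights (essentially) unchanged
    return w
```

For example, raking four unit weights to gender totals of 30/30 and supervisor totals of 20/40 converges to weights of 10, 10, 20, and 20 when the supervisors are the first two respondents.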

  16. Results of 2011 FEVS Application
     Comments:
     • The MI version of the test tends to declare phase capacity sooner
       – Only one instance called for a 3rd wave of data collection
     • Because the nonresponse-adjusted estimates tend to increase with each wave, stopping sooner results in a larger residual nonresponse (NR) error

  17. Interpretation of Application Results
     • The issue is that the variance decreases to 0 more quickly for the weighting version
       – Consider the extreme case of no new respondents: the variance of the difference would be 0 for the weighting version, but for the MI version the $\hat{d}_m$’s are not necessarily 0, because the two sets of imputations are separate random draws
     • Results from a simulation study discussed in the article shed more light on this claim: all else equal, the weighting version has a smaller estimated variance and is therefore more sensitive to changes in the point estimate
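The extreme case can be checked with a small simulation. The sketch below, under deliberately simplified assumptions, keeps the respondent set identical for waves k – 1 and k and imputes the same nonrespondents twice per imputation using a hypothetical normal draw model (an improper imputation chosen only for brevity, not the IVEware models used in the application). The weighting difference would be exactly 0 in this situation, yet the per-imputation MI differences vary from draw to draw, so their between-imputation variance stays positive:

```python
import random
from statistics import mean, stdev, variance

def impute_normal(observed, n_missing, rng):
    # Hypothetical, improper imputation model: draw each missing value from a
    # normal distribution fit to the observed values.
    mu, sd = mean(observed), stdev(observed)
    return [rng.gauss(mu, sd) for _ in range(n_missing)]

def mi_differences_without_new_respondents(observed, n_missing, M, seed=1):
    """Per-imputation differences when no one responded during wave k."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(M):
        # Same respondents at both waves; only the imputed values differ.
        est_k = mean(observed + impute_normal(observed, n_missing, rng))
        est_k_minus_1 = mean(observed + impute_normal(observed, n_missing, rng))
        diffs.append(est_k - est_k_minus_1)
    return diffs

observed = [3.0, 4.0, 4.0, 5.0, 2.0, 4.0]
d_hats = mi_differences_without_new_respondents(observed, n_missing=4, M=5)
print(len(d_hats), variance(d_hats) > 0)
```

The nonzero spread among these differences keeps the MI variance, and hence the denominator of the MI test statistic, away from 0.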

  18. IV. Limitations and Avenues for Further Research

  19. Study Limitations
     • Despite an aversion to the phrase “stopping rule,” stopping was the only design phase change investigated in this research
     • Data must be collected and processed in real time, and it was tacitly assumed that the full sample is “active”
       – This may be impractical for in-person surveys covering a vast geographical expanse, although the tests could be applied to subsamples
     • Actual adoption of these approaches in FEVS would face resistance because:
       – It is desirable to treat each agency equitably; beginning in FEVS 2012, the field period was preset to 6 weeks for all agencies
       – Higher scores are better, so there may be opposition to any change believed to reduce point estimates, a shortened field period included

  20. Ideas for Further Research
     • Work out a more formal theoretical understanding of why the covariance is not accounted for equivalently in the two tests
     • Derive variants of the MI test for point estimates other than means, so that more comparisons can be made against the weighting version
     • Chapter 4 of Lewis (2014) extends the weighting version of the phase capacity test to multivariate settings
       – Something similar could be done for the MI version of the test
     • Both phase capacity testing methods discussed today were retrospective in nature; future research could develop prospective variants in the spirit of the one proposed by Wagner and Raghunathan (2010)

  21. Thanks! Questions/Comments? Taylor.Lewis@opm.gov
