Testing and Imputing Item Nonresponse as Missing Data, with Big and Normal Survey Data




  • Testing and Imputing Item Nonresponse as Missing Data, with Big and Normal Survey Data
    Natalie Jackson, Director of Research, PRRI
    Jeff Gill, Distinguished Professor, Government, Mathematics & Statistics, American University

  • Overview
    ◮ Types of item nonresponse: Two types of survey item nonresponse matter, distinguished by whether or not the respondent actually has an opinion
    ◮ Implications for missing data treatment: This distinction matters for missing data classification and treatment, and for how we think about data structures
    ◮ Data: ANES sets up a partial test for this theory
    ◮ Distinguishing between types of item nonresponse: Some models demonstrate a difference between types of nonresponse, including after imputation
    ◮ Limitations and what’s next: This is still a work in progress

  • Types of Item Nonresponse: Sources in the Survey
    ◮ Don’t Know: In theory, this response comes from respondents who truly do not know the answer to a question
      • In practice, it is not clear whether respondents choose “don’t know” because they really don’t have an attitude, because none of the answer options fit, or because they don’t want to answer
    ◮ Refuse: In theory, this response originates from respondents who do not want to provide their answer to the question
      • Incidence of outright refusals is generally very low. In practice, reasons for refusing are as murky as with “don’t know”
    ◮ Skip: This response is completely ambiguous and occurs frequently on web surveys. We have no insight into the motivation for skipping an item

  • Types of Item Nonresponse: Two Types that Matter
    ◮ The ambiguity in “don’t know”, refuse, and skip provides little help for analysis.
    ◮ A more useful typology of item nonresponse is:
      • True Nonattitudes: The respondent truly does not have an opinion or the factual knowledge required to answer the question, often for knowable reasons such as a lack of education, experience, or interest in the topic
      • Hidden Attitudes: The respondent has an opinion or the factual knowledge required to answer the question, but for some (often unknowable) reason chooses not to disclose it

  • Types of Item Nonresponse: Two Types that Matter
    ◮ Implications of two types:
      • The reason for item nonresponse cannot be assumed consistent across, or even within, respondents:
        – The type of missingness can change within a single variable due to different respondents
        – It can also change within respondents, as they will have distinct reasons for not answering different questions
      • Result: We’re no longer thinking about rows of respondents and columns of variables; we’re thinking about every cell independently

  • Implications for Missing Data Treatment: Reminder of Missing Data Types
    ◮ Goal: $L(\beta \mid X, Y)$ unbiased, efficient, and with the correct standard error.
    ◮ Define: $Z_{mis} = (X_{mis}, Y_{mis})$ and $Z_{obs} = (X_{obs}, Y_{obs})$
    ◮ We stipulate an $n \times k$ matrix, $R$, corresponding to $X$, that contains 0 when the $X$ matrix data value is not missing, and 1 when it is missing.
    ◮ Stipulate a probability model for $R$, where $\phi$ is a parameter in this distribution of $R$.
    ◮ Standard terms from Rubin (1979) for the missing data:
      • MCAR: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid \phi)$, missingness not related to observed or unobserved data
      • MAR: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid Z_{obs}, \phi)$, missingness depends only on observed data
      • Non-Ignorable: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid Z_{obs}, Z_{mis}, \phi)$, missingness depends on unobserved data
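
As an illustration of the indicator matrix R and the three mechanisms, here is a minimal Python sketch. It is not from the slides: the function names, the use of `None` for a missing cell, and the probabilities (which play the role of φ) are all assumptions chosen for illustration.

```python
import random

def missingness_indicator(X):
    """Build R: 1 where a cell of X is missing (None), 0 where it is observed."""
    return [[1 if value is None else 0 for value in row] for row in X]

def draw_missing(mechanism, z_obs, z_mis, rng):
    """Draw one entry of R under a toy model; the probabilities are made up."""
    if mechanism == "MCAR":
        p = 0.3                          # ignores the data entirely
    elif mechanism == "MAR":
        p = 0.5 if z_obs == 0 else 0.1   # depends only on an observed value
    elif mechanism == "NMAR":
        p = 0.5 if z_mis == 1 else 0.1   # depends on the value that goes missing
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return 1 if rng.random() < p else 0

# A 2 x 2 data matrix with two missing cells and its indicator matrix.
X = [[1.0, None], [None, 3.0]]
R = missingness_indicator(X)

# Under MAR, the missingness rate differs across values of the observed covariate.
rng = random.Random(0)
n = 5000
rate_obs0 = sum(draw_missing("MAR", 0, 1, rng) for _ in range(n)) / n
rate_obs1 = sum(draw_missing("MAR", 1, 1, rng) for _ in range(n)) / n
```

In this toy setup the MAR rates can be recovered from observed data alone, while the NMAR branch depends on `z_mis`, which the analyst never sees for the missing cells.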

  • Implications for Missing Data Treatment: Two Types of Item Nonresponse
    True Nonattitudes
    ◮ When people genuinely don’t know the answer to a factual question or don’t have an attitude to report, typically it’s because they lack education, experience, or interest in the topic
    ◮ Those factors are generally measured in a survey. Thus, missingness caused by a true nonattitude could be MAR
    Hidden Attitudes
    ◮ Social desirability, the interviewer interaction, views of the topic as private, and many other possible justifications exist for why a respondent would decline to state an attitude, in addition to education, experience, and interest.
    ◮ Given that we do not/cannot measure these factors in the survey, hidden attitudes are likely Non-Ignorable/NMAR

  • Implications for Missing Data Treatment: Two Types of Item Nonresponse
    Analysis questions:
    ◮ Are these two causes of missing data discernible in survey data: Can we model the difference between true nonattitudes and hidden attitudes?
    ◮ Can we design a model that identifies whether a missing cell is likely to be a true nonattitude or a hidden attitude?

  • Data: ANES 2012, Face-to-Face Sample
    ◮ Dataset chosen because of a question setup that allows a more direct test of this typology than usual
    ◮ Working with the face-to-face sample only
      • 2012 was the first year a web sample was done simultaneously, but given mode differences the web portion is not used here
      • It’s unclear how “don’t know” and refuse were handled in the published questionnaires, but it appears explicit options were not given
      • Item nonresponse is generally much lower in the web sample, so it is not a clear comparison to the face-to-face sample

  • Data: The Test Questions
    ◮ A two-question setup on the respondent ideology question:
    ◮ libcpre_self asks all respondents to place themselves on the standard 7-point ideology scale (1=extremely liberal) and offers “haven’t thought much about it” as an out
      • 590/2054 respondents did not give a substantive response
    ◮ libcpre_choose asks those who said “moderate,” “haven’t thought much about it,” or gave any other nonresponse whether, if they had to choose, they would select liberal or conservative
      • 128 respondents still did not give a substantive response when libcpre_choose is combined with libcpre_self
      • 462 respondents gave a substantive answer to libcpre_choose but not to libcpre_self

  • Data: The Test Questions
    ◮ The 128 respondents who didn’t answer the ideology question twice can be thought of as the “true nonattitudes”
      • It’s possible they are just really stubborn hidden attitudes, but the text of the libcpre_choose question, “if you had to choose,” is designed to ferret those out. We’re making the assumption that it worked
    ◮ The 462 respondents who answered on the second round can be thought of as the “hidden attitudes”
      • It’s possible a few true nonattitudes provided answers in order to mollify interviewers. We’re making the assumption that those cases are few
    ◮ These are important assumptions that we aren’t able to definitively prove, yet there is some evidence in the data to support them
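
The two-question classification rule can be sketched in Python as follows. This is an illustrative reading of the setup, not the authors' code: the response codings (integers 1 to 7 for a substantive scale placement, the strings "liberal" and "conservative" for the follow-up, and `None` for any nonresponse) are assumptions and do not match the actual ANES codes.

```python
# Hypothetical codings, chosen for illustration only.
SUBSTANTIVE_SELF = set(range(1, 8))               # 7-point scale placements
SUBSTANTIVE_CHOOSE = {"liberal", "conservative"}  # forced-choice follow-up

def classify(self_answer, choose_answer):
    """Label a respondent using the first question, then the follow-up."""
    if self_answer in SUBSTANTIVE_SELF:
        return "substantive"
    if choose_answer in SUBSTANTIVE_CHOOSE:
        return "hidden attitude"     # answered only when pressed to choose
    return "true nonattitude"        # no substantive answer on either question

# Consistency check on the counts reported in the slides.
true_nonattitudes = 128
hidden_attitudes = 462
total_nonresponse = true_nonattitudes + hidden_attitudes   # 590 of 2054
```

For example, `classify(None, "liberal")` returns `"hidden attitude"`, while `classify(None, None)` returns `"true nonattitude"`.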

  • Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes
    Figure 1: Modeling Nonresponse to Ideology Question

  • Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes
    ◮ Overall: Nonresponders are more likely to be female, nonwhite, less interested, Democrat/Independent, lower education
    ◮ True nonattitudes compared to hidden attitudes: Less interested, more likely Independent, maybe less likely nonwhite
    ◮ Model fit is much better for explaining true nonattitudes than it is for explaining all missingness or the hidden attitudes

  • Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes
    ◮ Tentative conclusion: We have a much better handle on what causes true nonattitudes, and it hinges on attitudinal variables
    ◮ Back to our analysis questions:
      • Are these two causes of missing data discernible in survey data: Can we model the difference between true nonattitudes and hidden attitudes? Yes
      • Now we turn to imputation questions: Can we treat all missingness equally given the theoretical distinction between true nonattitudes and hidden attitudes? Is it appropriate to impute a response for analytical purposes when the respondent truly does not have an opinion?

  • Effects on Imputed Data: Hot Deck Imputation
    ◮ R package hot.deck used for imputation, given missingness on a categorical variable (libcpre_self is 29% missing)
      • Multiple hot deck imputation for categorical variables (Cranmer & Gill 2013)
      • Uses mice for continuous variables
      • 10 imputations per dataset
    ◮ Two datasets imputed:
      • All 2054 cases
      • 1926 cases imputed, 128 true nonattitudes dropped
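
To make the general idea of multiple hot deck imputation concrete, here is a minimal Python sketch. It is a generic cell-based hot deck (random donor from respondents who match on chosen covariates), not the procedure implemented in the hot.deck R package, and all function names, field names, and data values are hypothetical.

```python
import random

def hot_deck_once(rows, target, match_on, rng):
    """Fill missing `target` values with a random donor from the same cell."""
    donors = {}
    for row in rows:
        if row[target] is not None:
            key = tuple(row[k] for k in match_on)
            donors.setdefault(key, []).append(row[target])
    all_donors = [v for values in donors.values() for v in values]
    completed = []
    for row in rows:
        new_row = dict(row)                        # leave the input untouched
        if new_row[target] is None:
            key = tuple(row[k] for k in match_on)
            pool = donors.get(key) or all_donors   # fall back to any donor
            new_row[target] = rng.choice(pool)     # assumes at least one donor exists
        completed.append(new_row)
    return completed

def multiple_hot_deck(rows, target, match_on, m=10, seed=0):
    """Return m completed datasets, mirroring the 10 imputations per dataset."""
    rng = random.Random(seed)
    return [hot_deck_once(rows, target, match_on, rng) for _ in range(m)]

# Toy data: ideology is missing for two respondents.
rows = [
    {"educ": "hs", "ideology": 2},
    {"educ": "hs", "ideology": None},
    {"educ": "ba", "ideology": 6},
    {"educ": "ba", "ideology": None},
]
imputed = multiple_hot_deck(rows, "ideology", ["educ"], m=10)
```

Each of the `m` completed datasets would then be analyzed separately and the estimates combined. Dropping the 128 true nonattitudes before imputation corresponds to filtering `rows` before the call.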

  • Effects on Imputed Data: Imputed Model Comparison
    Figure 2: Coefficients Across 4 Models

  • Effects on Imputed Data: Imputed Model Comparison
    Figure 3: Ideology Coefficients Across 4 Models

  • Takeaways
    ◮ There is some important difference between the true nonattitudes and revealed hidden attitudes
    ◮ There is a clear hierarchy of model fit when those values are imputed:
      • Imputed all 590 missing cases, Imputed 128 true nonattitudes, Imputed 462 missing cases with 128 deleted, No imputation needed
    ◮ And this is only for one variable in the model; scrutinizing all variables would presumably result in more differences
    ◮ That indicates we should consider not imputing true nonattitudes
    ◮ At minimum, we need to be aware of the differences in the type of item nonresponse that we’re imputing