Testing and Imputing Item Nonresponse as Missing Data, with Big and Normal Survey Data NATALIE JACKSON Director of Research PRRI JEFF GILL Distinguished Professor, Government, Mathematics & Statistics American University
Overview ◮ Types of item nonresponse: Two types of survey item nonresponse matter, distinguished by whether or not the respondent actually has an opinion ◮ Implications for missing data treatment: This distinction matters for missing data classification and treatment - and for how we think about data structures ◮ Data: ANES sets up a partial test for this theory ◮ Distinguishing between types of item nonresponse: Some models demonstrate a difference between types of nonresponse, including after imputation ◮ Limitations and what’s next: This is still a work in progress
Types of Item Nonresponse: Sources in the Survey ◮ Don’t Know: In theory, this response comes from respondents who truly do not know the answer to a question • In practice, it is not clear whether respondents choose “don’t know” because they really don’t have an attitude, because none of the answer options fit, or because they don’t want to answer ◮ Refuse: In theory, this response originates from respondents who do not want to provide their answer to the question • Incidence of outright refusals is generally very low. In practice, reasons for refusing are as murky as with “don’t know” ◮ Skip: This response is completely ambiguous and frequently occurs on web surveys. We have no insight into the motivation for skipping an item
Types of Item Nonresponse: Two Types that Matter ◮ The ambiguity in “don’t know”, refuse, and skip provides little help for analysis. ◮ A more useful typology of item nonresponse is: • True Nonattitudes: The respondent truly does not have an opinion or the factual knowledge required to answer the question, often for knowable reasons - a lack of education, experience, or interest in the topic • Hidden Attitudes: The respondent has an opinion or the factual knowledge required to answer the question, but for some (often unknowable) reason, they choose not to disclose it
Types of Item Nonresponse: Two Types that Matter ◮ Implications of two types: • The reason for item nonresponse cannot be assumed consistent across - or even within - respondents: – The type of missingness can change within a single variable due to different respondents – It can also change within respondents, as they will have distinct reasons for not answering different questions • Result: We’re no longer thinking about rows of respondents and columns of variables - we’re thinking about every cell independently
Implications for Missing Data Treatment: Reminder of Missing Data Types ◮ Goal: $L(\beta \mid X, Y)$ unbiased, efficient, and with the correct standard error. ◮ Define: $Z_{mis} = (X_{mis}, Y_{mis})$ and $Z_{obs} = (X_{obs}, Y_{obs})$ ◮ We stipulate an $n \times k$ matrix, $R$, corresponding to $X$, that contains 0 when the $X$ matrix data value is not missing and 1 when it is missing. ◮ Stipulate a probability model for $R$, where $\phi$ is a parameter in this distribution of $R$. ◮ Standard terms from Rubin (1979) for the missing data: • MCAR: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid \phi)$, missingness not related to observed or unobserved data • MAR: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid Z_{obs}, \phi)$, missingness depends only on observed data • Non-Ignorable: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid Z_{obs}, Z_{mis}, \phi)$, missingness depends on unobserved data
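To make the three mechanisms concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption and not part of the analysis in these slides: the function name `simulate`, the single observed covariate x, the outcome y = x + noise, and the missingness probabilities are all invented. It compares the full-data mean of y with the complete-case mean and with a mean adjusted by strata of the observed x, using R = 1 for a missing value as in the definition above.

```python
import random


def simulate(mechanism, n=20000, seed=0):
    """Return (full mean, complete-case mean, x-adjusted mean) of y.

    Hypothetical toy setup: x is always observed, y may be missing,
    and r = 1 marks a missing y.
    """
    rng = random.Random(seed)
    rows = []  # each entry is (x, y, r)
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p_missing = 0.3                     # unrelated to x or y
        elif mechanism == "MAR":
            p_missing = 0.5 if x > 0 else 0.1   # depends on observed x only
        elif mechanism == "NMAR":
            p_missing = 0.5 if y > 0 else 0.1   # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        r = 1 if rng.random() < p_missing else 0
        rows.append((x, y, r))

    def mean(values):
        return sum(values) / len(values)

    full = mean([y for _, y, _ in rows])
    complete_case = mean([y for _, y, r in rows if r == 0])

    # Adjust using the observed covariate: average the observed y within
    # each x stratum, then weight by the stratum's share of all rows.
    adjusted = 0.0
    for in_stratum in (lambda x: x > 0, lambda x: x <= 0):
        share = mean([1.0 if in_stratum(x) else 0.0 for x, _, _ in rows])
        observed = [y for x, y, r in rows if r == 0 and in_stratum(x)]
        adjusted += share * mean(observed)
    return full, complete_case, adjusted
```

In this toy setup the complete-case mean should track the full mean only under MCAR. Conditioning on the observed x should repair the estimate under MAR but not under NMAR, which is why the classification of a missing cell matters for how it is treated.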
Implications for Missing Data Treatment: Two Types of Item Nonresponse True Nonattitudes ◮ When people genuinely don’t know the answer to a factual question or don’t have an attitude to report, typically it’s because they lack education, experience, or interest in the topic ◮ Those factors are generally measured in a survey. Thus, missingness caused by a true nonattitude could be MAR Hidden Attitudes ◮ Social desirability, the interviewer interaction, views of the topic as private, and many other possible justifications exist for why a respondent would decline to state an attitude, in addition to education, experience, and interest. ◮ Given that we do not/cannot measure these factors in the survey, hidden attitudes are likely Non-Ignorable/NMAR
Implications for Missing Data Treatment: Two Types of Item Nonresponse Analysis questions: ◮ Are these two causes of missing data discernible in survey data: Can we model the difference between true nonattitudes and hidden attitudes? ◮ Can we design a model that identifies whether a missing cell is likely to be a true nonattitude or a hidden attitude?
Data: ANES 2012, Face-to-Face Sample ◮ Dataset chosen because of a question setup that allows a more direct test of this typology than usual ◮ Working with the face-to-face sample only • 2012 was the first year a web sample was done simultaneously, but given mode differences the web portion is not used here • It’s unclear from the published questionnaires how “don’t know” and refuse were handled, but it appears explicit options were not given • Item nonresponse is generally much lower in the web sample, so it is not a clear comparison to the face-to-face sample
Data: The Test Questions ◮ A two-question setup on the respondent ideology question: ◮ libcpre_self asks all respondents to place themselves on the standard 7-point ideology scale (1=extremely liberal) and provides “haven’t thought much about it” as an out • 590/2054 respondents did not give a substantive response ◮ libcpre_choose asks those who said “moderate” or “haven’t thought much about it,” or who gave any other nonresponse, whether they would select liberal or conservative if they had to choose • 128 respondents still did not give a substantive response when libcpre_choose is combined with libcpre_self • 462 respondents gave a substantive answer to libcpre_choose but not to libcpre_self
Data: The Test Questions ◮ The 128 respondents who didn’t answer the ideology question twice can be thought of as the “true nonattitudes” • It’s possible they are just really stubborn hidden attitudes, but the text of the libcpre_choose question - “if you had to choose” - is designed to ferret those out. We’re making the assumption that it worked ◮ The 462 respondents who answered on the second round can be thought of as the “hidden attitudes” • It’s possible a few true nonattitudes provided answers in order to mollify interviewers. We’re making the assumption that those cases are few ◮ These are important assumptions that we aren’t able to definitively prove - yet there is some evidence in the data to support them
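The two-question rule for labeling respondents can be sketched as follows. The response codes here are hypothetical stand-ins, not the actual ANES codebook values, and `classify` and `tally` are illustrative names. The sketch also assumes that any of the seven scale points, including “moderate,” counts as a substantive answer to the first question.

```python
from collections import Counter

# Hypothetical coding for illustration only; not the ANES codebook values.
SCALE_POINTS = set(range(1, 8))               # 7-point ideology scale
FORCED_CHOICES = {"liberal", "conservative"}  # substantive follow-up answers


def classify(self_answer, choose_answer):
    """Label one respondent from the pair (first answer, follow-up answer)."""
    if self_answer in SCALE_POINTS:
        return "answered"
    if choose_answer in FORCED_CHOICES:
        # No scale placement, but an answer when pushed to choose.
        return "hidden attitude"
    # No substantive answer to either question.
    return "true nonattitude"


def tally(pairs):
    """Count labels over an iterable of (self_answer, choose_answer) pairs."""
    return Counter(classify(s, c) for s, c in pairs)
```

Applied to the face-to-face sample under these assumptions, a tally of this kind would correspond to the 462 hidden attitudes and 128 true nonattitudes among the 590 nonrespondents.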
Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes Figure 1: Modeling Nonresponse to Ideology Question
Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes ◮ Overall: Nonresponders are more likely to be female, nonwhite, less interested, Democrat/Independent, lower education ◮ True nonattitudes compared to hidden attitudes: Less interested, more likely Independent, maybe less likely nonwhite ◮ Model fit is much better for explaining true nonattitudes than it is for explaining all missingness or the hidden attitudes
Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes ◮ Tentative conclusion: We have a much better handle on what causes true nonattitudes, and it hinges on attitudinal variables ◮ Back to our analysis questions: • Are these two causes of missing data discernible in survey data: Can we model the difference between true nonattitudes and hidden attitudes? Yes • Now we turn to imputation questions: Can we treat all missingness equally given the theoretical distinction between true nonattitudes and hidden attitudes? Is it appropriate to impute a response for analytical purposes when the respondent truly does not have an opinion?
Effects on Imputed Data: Hot Deck Imputation ◮ R package hot.deck used for imputation given missingness on a categorical variable (libcpre_self is 29% missing) • Multiple hot deck imputation for categorical variables (Cranmer & Gill 2013) • Uses mice for continuous variables • 10 imputations per dataset ◮ Two datasets imputed: • All 2054 cases • 1926 cases imputed, 128 true nonattitudes dropped
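For readers unfamiliar with hot decking, the basic idea of multiple hot deck imputation can be sketched as below. This is a deliberately simplified illustration and not the algorithm in the hot.deck package: it draws donors by exact match on a few covariates, whereas the package has its own donor selection method. The function name `hot_deck_impute`, the list-of-dicts data layout, and the fallback to all donors when nothing matches are assumptions made for the sketch.

```python
import random


def hot_deck_impute(rows, target, match_on, m=10, seed=0):
    """Return m completed copies of rows (a list of dicts).

    A missing target value (None) is filled with the observed value of a
    randomly drawn donor that matches the recipient on every match_on key.
    Simplified illustration only; not the hot.deck package's algorithm.
    """
    rng = random.Random(seed)
    donors = [row for row in rows if row[target] is not None]
    if not donors:
        raise ValueError("no observed values to donate")

    completed = []
    for _ in range(m):
        filled = []
        for row in rows:
            new_row = dict(row)  # copy, so the input data is left untouched
            if row[target] is None:
                pool = [d for d in donors
                        if all(d[key] == row[key] for key in match_on)]
                if not pool:
                    pool = donors  # assumed fallback: any observed case
                new_row[target] = rng.choice(pool)[target]
            filled.append(new_row)
        completed.append(filled)
    return completed
```

Under this sketch, the two imputed datasets on this slide would correspond to calling the function once on all cases and once on the cases remaining after the true nonattitudes are removed, each with m=10.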
Effects on Imputed Data: Imputed Model Comparison Figure 2: Coefficients Across 4 Models
Effects on Imputed Data: Imputed Model Comparison Figure 3: Ideology Coefficients Across 4 Models
Takeaways ◮ There is some important difference between the true nonattitudes and revealed hidden attitudes ◮ There is a clear hierarchy of model fit when those values are imputed: • Imputed all 590 missing cases, Imputed 128 true nonattitudes, Imputed 462 missing cases with 128 deleted, No imputation needed ◮ And this is only for one variable in the model - scrutinizing all variables would presumably result in more differences ◮ That indicates we should consider not imputing true nonattitudes ◮ At minimum, we need to be aware of the differences in type of item nonresponse that we’re imputing