
Testing and Imputing Item Nonresponse as Missing Data, with Big and Normal Survey Data

NATALIE JACKSON, Director of Research, PRRI
JEFF GILL, Distinguished Professor, Government, Mathematics & Statistics, American University

Overview
◮ Types of item nonresponse: Two types of survey item nonresponse matter, distinguished by whether or not the respondent actually has an opinion
◮ Implications for missing data treatment: This distinction matters for how missing data are classified and treated, and for how we think about data structures
◮ Data: ANES data allow a partial test of this theory
◮ Distinguishing between types of item nonresponse: Some models show a difference between the types of nonresponse, including after imputation
◮ Limitations and what's next: This is still a work in progress

Types of Item Nonresponse: Sources in the Survey
◮ Don't Know: In theory, this response comes from respondents who truly do not know the answer to a question
  • In practice, it is not clear whether respondents choose "don't know" because they really don't have an attitude, because none of the answer options fit, or because they don't want to answer
◮ Refuse: In theory, this response comes from respondents who do not want to give their answer to the question
  • Outright refusals are generally very rare, and in practice the reasons for refusing are as murky as those for "don't know"
◮ Skip: This response is completely ambiguous and occurs frequently on web surveys. We have no insight into why a respondent skips an item

Types of Item Nonresponse: Two Types that Matter
◮ The ambiguity of "don't know," refuse, and skip provides little help for analysis.
◮ A more useful typology of item nonresponse is:
  • True Nonattitudes: The respondent truly does not have the opinion or factual knowledge needed to answer the question. This is often for knowable reasons, such as a lack of education, experience, or interest in the topic
  • Hidden Attitudes: The respondent has the opinion or factual knowledge needed to answer the question but, for some (often unknowable) reason, chooses not to disclose it

Types of Item Nonresponse: Two Types that Matter
◮ Implications of two types:
  • The reason for item nonresponse cannot be assumed to be consistent across respondents, or even within a single respondent:
    – The type of missingness can change within a single variable, because different respondents have different reasons for not answering
    – It can also change within a respondent, who may have distinct reasons for not answering different questions
  • Result: We're no longer thinking about rows of respondents and columns of variables. We're thinking about every cell independently
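
One way to hold the cell-level view is a status matrix with the same shape as the data, instead of a single missingness flag per row or per column. The Python below is a hypothetical illustration. The CellStatus labels, the item names, the values, and the impute_mask policy are assumptions made for this sketch. They are not ANES variables or the authors' code.

```python
from enum import Enum


class CellStatus(Enum):
    """Hypothetical per-cell labels for a survey response matrix."""
    OBSERVED = "observed"
    TRUE_NONATTITUDE = "true_nonattitude"
    HIDDEN_ATTITUDE = "hidden_attitude"
    UNKNOWN_MISSING = "unknown_missing"  # missing, but the type is not identified


# Rows are respondents, columns are survey items (illustrative values only).
items = ["ideology", "abortion", "income"]
status = [
    [CellStatus.HIDDEN_ATTITUDE, CellStatus.OBSERVED, CellStatus.OBSERVED],
    [CellStatus.OBSERVED, CellStatus.TRUE_NONATTITUDE, CellStatus.HIDDEN_ATTITUDE],
    [CellStatus.TRUE_NONATTITUDE, CellStatus.OBSERVED, CellStatus.OBSERVED],
]


def column_types(status, j):
    """Set of statuses appearing in column j (one survey item)."""
    return {row[j] for row in status}


def row_types(status, i):
    """Set of statuses appearing in row i (one respondent)."""
    return set(status[i])


def impute_mask(status):
    """Cells to impute under one possible policy: every missing cell except true nonattitudes."""
    to_impute = (CellStatus.HIDDEN_ATTITUDE, CellStatus.UNKNOWN_MISSING)
    return [[cell in to_impute for cell in row] for row in status]
```

Column 0 mixes hidden attitudes and true nonattitudes, so the missingness type varies within one variable. Row 1 mixes both types as well, so it also varies within one respondent.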

Implications for Missing Data Treatment: Reminder of Missing Data Types
◮ Goal: estimates of $\beta$ from $L(\beta \mid X, Y)$ that are unbiased, efficient, and have the correct standard errors.
◮ Define: $Z_{mis} = (X_{mis}, Y_{mis})$ and $Z_{obs} = (X_{obs}, Y_{obs})$.
◮ We stipulate an $n \times k$ matrix $R$, corresponding to $X$, that contains 0 where the $X$ value is observed and 1 where it is missing.
◮ Stipulate a probability model for $R$, where $\phi$ is a parameter of the distribution of $R$.
◮ Standard terms from Rubin (1979) for the missing data:
  • MCAR: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid \phi)$; missingness is related to neither observed nor unobserved data
  • MAR: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid Z_{obs}, \phi)$; missingness depends only on observed data
  • Non-Ignorable: $p(R \mid Z_{obs}, Z_{mis}) = p(R \mid Z_{obs}, Z_{mis}, \phi)$; missingness depends on unobserved data
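
A small simulation makes the three definitions concrete. The sketch below uses a covariate x that is always observed and an outcome y that may be missing, and applies the indicator R to y alone for simplicity. R = 1 is drawn with a probability that depends on nothing (MCAR), on x only (MAR), or on y itself (Non-Ignorable). The distributions and missingness probabilities are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=1):
    """Return (full-data mean of y, complete-case mean of y) under one mechanism.

    x is a fully observed covariate, y = x + noise, and R = 1 marks y as missing.
    """
    rng = random.Random(seed)
    ys, observed = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)
        if mechanism == "MCAR":
            p_miss = 0.3                      # depends on neither x nor y
        elif mechanism == "MAR":
            p_miss = 0.5 if x > 0 else 0.1    # depends only on the observed x
        elif mechanism == "NMAR":
            p_miss = 0.5 if y > 0 else 0.1    # depends on the unobserved y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        ys.append(y)
        if rng.random() >= p_miss:            # R = 0: y is observed
            observed.append(y)
    return statistics.fmean(ys), statistics.fmean(observed)
```

Under MCAR the complete-case mean stays close to the full-data mean. Under MAR and under NMAR it drifts downward, because cases with high x or high y are the ones most likely to go missing.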

Implications for Missing Data Treatment: Two Types of Item Nonresponse
True Nonattitudes
◮ When people genuinely don't know the answer to a factual question or have no attitude to report, it is typically because they lack education, experience, or interest in the topic
◮ Those factors are generally measured in a survey, so missingness caused by a true nonattitude could be MAR
Hidden Attitudes
◮ A respondent may decline to state an attitude because of social desirability, the interaction with the interviewer, viewing the topic as private, or many other possible reasons, in addition to education, experience, and interest
◮ Because we do not (or cannot) measure these factors in the survey, hidden attitudes are likely Non-Ignorable (NMAR)
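
The practical consequence of this distinction can be sketched with a simulation. Imputing within strata of a measured covariate repairs missingness that is driven by that covariate (MAR). It does not repair missingness driven by the unreported attitude itself (NMAR). In the hypothetical setup below, a binary educ variable stands in for education, and each missing y is replaced by its stratum's observed mean. The effect sizes and probabilities are assumptions, and stratum-mean imputation is used only because it is the simplest conditional method, not because the authors used it.

```python
import random
import statistics


def impute_by_stratum(mechanism, n=20000, seed=2):
    """Return (true mean of y, mean of y after imputing missing values by stratum mean).

    educ is a fully observed binary covariate (hypothetical stand-in for education);
    y is the attitude, which may be missing.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        educ = rng.random() < 0.5
        y = (1.0 if educ else -1.0) + rng.gauss(0, 1)
        if mechanism == "MAR":
            p_miss = 0.1 if educ else 0.5     # driven by measured education only
        elif mechanism == "NMAR":
            p_miss = 0.5 if y < 0 else 0.1    # driven by the unreported attitude
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((educ, y, rng.random() < p_miss))

    # Stratum means are estimated from observed cases only.
    stratum_mean = {
        level: statistics.fmean(y for e, y, m in rows if e == level and not m)
        for level in (True, False)
    }
    completed = [stratum_mean[e] if m else y for e, y, m in rows]
    return statistics.fmean(y for _, y, _ in rows), statistics.fmean(completed)
```

Under MAR the completed-data mean matches the true mean. Under NMAR it is biased upward, because within each education stratum the respondents with low values are the ones who withheld their answers.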

Implications for Missing Data Treatment: Two Types of Item Nonresponse
Analysis questions:
◮ Are these two causes of missing data discernible in survey data? Can we model the difference between true nonattitudes and hidden attitudes?
◮ Can we design a model that identifies whether a missing cell is likely to be a true nonattitude or a hidden attitude?

Data: ANES 2012, Face-to-Face Sample
◮ Dataset chosen because its question setup allows a more direct test of this typology than usual
◮ Working with the face-to-face sample only
  • 2012 was the first year a web sample was fielded simultaneously, but because of mode differences the web portion is not used here
  • It's unclear how "don't know" and refusals were handled in the published questionnaires, but it appears explicit options were not given
  • Item nonresponse is generally much lower in the web sample, so it does not offer a clean comparison with the face-to-face sample

Data: The Test Questions
◮ A two-question setup on the respondent ideology question:
◮ libcpre_self asks all respondents to place themselves on the standard 7-point ideology scale (1 = extremely liberal) and offers "haven't thought much about it" as an out
  • 590 of 2054 respondents did not give a substantive response
◮ libcpre_choose asks respondents who said "moderate" or "haven't thought much about it," or who gave any other nonresponse, whether they would select liberal or conservative if they had to choose
  • 128 respondents still did not give a substantive response when libcpre_choose is combined with libcpre_self
  • 462 respondents gave a substantive answer to libcpre_choose but not to libcpre_self
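
The counts on this slide fit together. The arithmetic below uses only the reported figures. It shows that the two groups account for all of the nonresponse to the self-placement item, and that the follow-up question recovered roughly four in five of those nonresponses.

```latex
\begin{aligned}
\text{nonresponse to self-placement:} &\quad 590 / 2054 \approx 0.287 \\
\text{hidden attitudes} + \text{true nonattitudes:} &\quad 462 + 128 = 590 \\
\text{share recovered by the follow-up:} &\quad 462 / 590 \approx 0.783 \\
\text{share still missing:} &\quad 128 / 590 \approx 0.217
\end{aligned}
```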

Data: The Test Questions
◮ The 128 respondents who declined to answer the ideology question twice can be thought of as the "true nonattitudes"
  • It's possible they are just very stubborn hidden attitudes, but the wording of the libcpre_choose question ("if you had to choose") is designed to ferret those out. We're assuming that it worked
◮ The 462 respondents who answered in the second round can be thought of as the "hidden attitudes"
  • It's possible a few true nonattitudes gave answers to mollify interviewers. We're assuming those cases are few
◮ These are important assumptions that we can't definitively prove, yet there is some evidence in the data to support them
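
The labeling rule these assumptions imply can be written as a short function over the two answers. The sketch below is hypothetical in several ways. The response coding (integers 1 to 7 for scale points, the strings "liberal" and "conservative" for the follow-up) is assumed, not taken from the ANES codebook. A moderate (4) counts as substantive on the first question, matching the slide.

```python
from collections import Counter

# Assumed coding, for illustration only (not the ANES codebook values).
SUBSTANTIVE_SELF = set(range(1, 8))            # 7-point scale points
SUBSTANTIVE_CHOOSE = {"liberal", "conservative"}


def classify(self_resp, choose_resp):
    """Label one respondent's ideology cell under the two-question setup.

    self_resp: answer to the self-placement item (int 1-7, or any non-substantive value)
    choose_resp: answer to the forced-choice follow-up, or None if not asked or not answered
    """
    if self_resp in SUBSTANTIVE_SELF:
        return "observed"
    if choose_resp in SUBSTANTIVE_CHOOSE:
        return "hidden_attitude"      # answered only when pressed to choose
    return "true_nonattitude"         # declined both questions


# Toy respondents (made up): (self-placement answer, follow-up answer)
responses = [
    (4, "liberal"),                    # moderate: substantive on the first question
    (None, "conservative"),
    ("haven't thought much", None),
    (2, None),
    ("refused", "refused"),
]
counts = Counter(classify(s, c) for s, c in responses)
```

On the toy data this gives two observed cells, one hidden attitude, and two true nonattitudes. Applying the same rule to the full sample is what the 462/128 split on the previous slide describes.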

Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes
Figure 1: Modeling Nonresponse to Ideology Question

Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes
◮ Overall: Nonresponders are more likely to be female, nonwhite, less interested, Democrats or Independents, and less educated
◮ True nonattitudes compared to hidden attitudes: Less interested, more likely to be Independent, possibly less likely to be nonwhite
◮ Model fit is much better for explaining true nonattitudes than for explaining all missingness or the hidden attitudes
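
The slides do not state the model form behind Figure 1. A common choice for a binary nonresponse indicator is logistic regression, sketched below under that assumption. The fit uses plain batch gradient ascent with the standard library only, on simulated respondents with made-up effect sizes. The same function could model "true nonattitude vs. hidden attitude" among nonrespondents only, by setting y = 1 for true nonattitudes.

```python
import math
import random


def fit_logit(X, y, lr=2.0, iters=500):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.

    X: list of feature lists (first entry 1.0 for the intercept); y: list of 0/1 outcomes.
    """
    k, n = len(X[0]), len(X)
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Simulated respondents (hypothetical effect sizes): nonresponse is more likely
# for female and for low-interest respondents.
rng = random.Random(3)
X, y = [], []
for _ in range(2000):
    female = float(rng.random() < 0.5)
    low_interest = float(rng.random() < 0.3)
    eta = -1.5 + 0.5 * female + 1.2 * low_interest
    X.append([1.0, female, low_interest])
    y.append(int(rng.random() < 1.0 / (1.0 + math.exp(-eta))))

beta = fit_logit(X, y)  # [intercept, female, low_interest]
```

Positive coefficients mean the characteristic raises the odds of nonresponse. In practice a fitted model from an established statistics library would also supply standard errors, which this sketch omits.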

Distinguishing Between Types of Item Nonresponse: Hidden Attitudes vs. True Nonattitudes
◮ Tentative conclusion: We have a much better handle on what causes true nonattitudes, and it hinges on attitudinal variables
◮ Back to our analysis questions:
  • Are these two causes of missing data discernible in survey data? Can we model the difference between true nonattitudes and hidden attitudes? Yes
  • Now we turn to imputation questions: Can we treat all missingness equally, given the theoretical distinction between true nonattitudes and hidden attitudes? Is it appropriate to impute a response for analytical purposes when the respondent truly does not have an opinion?

Effects on Imputed Data: Hot Deck Imputation
◮ The R package hot.deck is used for imputation because the missingness is on a categorical variable (libcpre_self is 29% missing)
  • Multiple hot deck imputation for categorical variables (Cranmer & Gill 2013)
  • Uses mice for continuous variables
  • 10 imputations per dataset
◮ Two datasets imputed:
  • All 2054 cases
  • 1926 cases, with the 128 true nonattitudes dropped before imputation
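
The general idea of multiple hot deck imputation, with results pooled across imputations, can be sketched in a few lines. This is a simplified version. Donors are drawn at random from exact-match adjustment cells, which does not reproduce the donor-selection rule of the hot.deck package. Pooling with Rubin's rules is also an assumption here, since the slides do not say how the 10 imputations were combined. The cell labels and data are hypothetical.

```python
import random
import statistics


def hot_deck(values, cells, rng):
    """Fill each None in `values` with a random donor value from the same adjustment cell.

    Assumes every cell that contains a missing value also has at least one donor.
    """
    donors = {}
    for v, c in zip(values, cells):
        if v is not None:
            donors.setdefault(c, []).append(v)
    return [v if v is not None else rng.choice(donors[c]) for v, c in zip(values, cells)]


def pooled_share(values, cells, target, m=10, seed=0):
    """Impute m times and pool the share of `target` with Rubin's rules.

    Returns (pooled estimate, total variance = W + (1 + 1/m) * B).
    """
    rng = random.Random(seed)
    n = len(values)
    estimates, within = [], []
    for _ in range(m):
        completed = hot_deck(values, cells, rng)
        p = sum(v == target for v in completed) / n
        estimates.append(p)
        within.append(p * (1 - p) / n)        # simple-random-sampling variance of a proportion
    q_bar = statistics.fmean(estimates)
    w_bar = statistics.fmean(within)
    b = statistics.variance(estimates)       # between-imputation variance (needs m >= 2)
    return q_bar, w_bar + (1 + 1 / m) * b


# Hypothetical data: ideology with missing cells, and education as the adjustment cell.
values = ["liberal", "conservative", None, "liberal", None, "conservative"]
cells = ["high_educ", "low_educ", "high_educ", "high_educ", "low_educ", "low_educ"]
estimate, total_var = pooled_share(values, cells, "liberal")
```

In this toy example each cell has donors of only one category, so every imputation is identical and the between-imputation variance is zero. With real data the draws differ across imputations, and that variation is exactly what the (1 + 1/m)B term carries. Dropping the true nonattitudes from `values` and `cells` before calling the function corresponds to the second dataset described above.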

Effects on Imputed Data: Imputed Model Comparison
Figure 2: Coefficients Across 4 Models

Effects on Imputed Data: Imputed Model Comparison
Figure 3: Ideology Coefficients Across 4 Models

Takeaways
◮ There are important differences between true nonattitudes and revealed hidden attitudes
◮ There is a clear hierarchy of model fit when those values are imputed:
  • Imputed all 590 missing cases
  • Imputed the 128 true nonattitudes
  • Imputed the 462 missing cases, with the 128 deleted
  • No imputation needed
◮ And this is only for one variable in the model. Scrutinizing all variables would presumably reveal more differences
◮ This suggests we should consider not imputing true nonattitudes
◮ At a minimum, we need to be aware of the differences in the types of item nonresponse we're imputing
