Assessing and adjusting bias deriving from mode effect in mixed mode social surveys
C. De Vitiis, A. Guandalini, F. Inglese and M.D. Terribili (ISTAT, Italian National Statistical Institute)
Session 29, 29th June 2018
Summary
1. The Mixed Mode in social surveys
2. The survey context
3. The analyses of mode effect treatment for the Aspect of Daily Life survey
4. Final considerations and future developments
1. Mixed mode in social surveys

Why mixed modes?
- To counter declining response rates and coverage, while also reducing the total cost of surveys.
- Using different data collection techniques makes it possible to contact each type of respondent in the most suitable way, which improves population coverage and the response rate.

What are the drawbacks of this choice?
- Mode effects are difficult to control, and the selection and measurement effects are confounded.
- The mode effect refers strictly to differences in measurement error due to the mode of survey administration.
- A selection effect generally occurs as well, because the respondents to the alternative modes have different distributions, even though this is a desirable aspect of the mixed-mode (MM) strategy.

How and when should the mode effect be dealt with?
- Mainly when planning the survey (questionnaire and survey design), to limit measurement error as much as possible.
- In the estimation phase, mainly to treat the selection effect while estimating the measurement effect.
This work illustrates the experimental plan for treating the mode effect in the web/PAPI “Multipurpose Survey on Aspects of Daily Life” (ADL).
Survey data are linked with administrative data, and the resulting auxiliary variables are used to define mixed-mode models.
The final goal is to assess the introduction of the mixed mode and to define an estimation strategy for future editions of the survey.
2. The survey context

The sample survey “Multipurpose survey on households: Aspects of daily life”
- Collects information on recreational and cultural activities in free time, such as sports, reading, cinema, music and the Internet, as well as on social relations and issues relevant to people's quality of life.
- Based on a sample of about 24,000 households, selected through a two-stage sample design (municipalities/households) from the centralized municipal register (LAC).
- Mixed technique: sequential web/PAPI. A self-administered web questionnaire is offered in the invitation letter sent by ISTAT; nonrespondent households are then interviewed in person by an interviewer using a paper questionnaire (PAPI).
- In 2017: sequential web/PAPI with a single-mode PAPI control sample.
- The selected sample of individuals was linked to an administrative database (Archimede Project) through the individual code available from the selection frame, in order to obtain external auxiliary variables.
To treat the mode effect, the use of models is advisable, and the availability of auxiliary variables is crucial.
- External sources: socio-demographic and economic variables from registers or administrative data.
- Survey variables: mode-insensitive socio-demographic variables; mode preference (not yet introduced at ISTAT); paradata (information about the data collection phase).

Auxiliary variables in the ADL survey, at household level:
- Household type: one person under 55; one person over 54; couple with children, at least one under 25; couple with children, none under 25; couple without children; one parent, at least one child under 25; one parent, no child under 25; other types.
- Highest education level: below / equal to / above high school diploma.
- Occupation type: prevalence of employed, self-employed, not of working age, mixed types.
- Municipal type: metropolitan cities; metropolitan area; other municipalities by population (<2,000; 2,000-10,000; 10,000-50,000; >50,000).
- Income class: 5 classes bounded by the quintiles (€11,955; 20,892; 30,028; 46,119).
- Citizenship: Italian / foreign household.
3. The analyses on ADL survey data

The aims of the analyses presented are:
- first, to evaluate the impact on the survey estimates of introducing the mixed-mode (MM) design, compared with the previous single-mode (SM) design (control SM sample);
- then, to analyse in depth the reasons for significant differences between the estimates obtained with the two designs.

The study is developed on two main levels of analysis.

The first level compares the SM and MM samples:
- tests were performed on the differences between the estimates calculated on the two samples;
- the bias caused by total nonresponse in the two samples was studied;
- total response rates and indicators of response representativeness were evaluated to identify differences (especially in the magnitude of the bias) that could explain the differences between the survey estimates produced with the SM and MM samples.

The second level investigates the mode effect (selection and measurement) between the web and PAPI respondents in the MM design:
- the mode effect in the MM sample was analysed with methods that make the samples of web and PAPI respondents comparable, such as the propensity score (Rosenbaum and Rubin, 1983), in order to study the selection effect and the measurement effect for some target variables of the survey.
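The second level of analysis can be made concrete with a cell-based propensity score adjustment. The sketch below is an illustration under stated assumptions, not the procedure applied to the ADL data: the propensity of answering by web is estimated as the web share within each auxiliary cell, PAPI respondents are reweighted by the odds e/(1 - e) so that they resemble web respondents on the auxiliary variable, and mode choice is assumed to be ignorable given that variable. The function name, the record layout and the toy data are hypothetical.

```python
from collections import defaultdict


def decompose_mode_effect(records):
    """Split the raw web-PAPI difference in a mean into two parts.

    records: list of (cell, mode, y) tuples with mode in {"web", "papi"}.
    Returns (selection_effect, measurement_effect); their sum is the
    raw difference mean_web(y) - mean_papi(y).
    """
    counts = defaultdict(lambda: {"web": 0, "papi": 0})
    for cell, mode, _ in records:
        counts[cell][mode] += 1

    # Propensity of answering by web, estimated as the web share per cell.
    propensity = {}
    for cell, c in counts.items():
        if c["web"] == 0 or c["papi"] == 0:
            raise ValueError(f"cell {cell!r} lacks one of the two modes")
        propensity[cell] = c["web"] / (c["web"] + c["papi"])

    web = [y for _, mode, y in records if mode == "web"]
    papi = [(cell, y) for cell, mode, y in records if mode == "papi"]

    mean_web = sum(web) / len(web)
    mean_papi = sum(y for _, y in papi) / len(papi)

    # Odds weights make PAPI respondents resemble web respondents on the cells.
    weights = [propensity[c] / (1 - propensity[c]) for c, _ in papi]
    mean_papi_adj = sum(w * y for w, (_, y) in zip(weights, papi)) / sum(weights)

    measurement = mean_web - mean_papi_adj  # same composition, different mode
    selection = mean_papi_adj - mean_papi   # same mode, different composition
    return selection, measurement


# Hypothetical toy data: (age cell, mode, binary answer).
toy = [
    ("young", "web", 1), ("young", "web", 1), ("young", "web", 0), ("young", "papi", 1),
    ("old", "web", 0), ("old", "papi", 0), ("old", "papi", 0), ("old", "papi", 1),
]
selection, measurement = decompose_mode_effect(toy)
print(f"selection effect: {selection:+.3f}, measurement effect: {measurement:+.3f}")
```

In this toy example the raw web and PAPI means coincide, yet the adjustment separates a selection component and a measurement component of opposite sign. With several auxiliary variables, a model-based propensity (for example a logistic regression) would normally replace the cell shares.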
Test of differences between estimates

To evaluate the differences between the estimates of the main parameters of interest obtained with the mixed-mode and single-mode samples, hypothesis tests were carried out (Martin and Lynn, 2011). Differences in proportions were tested with a t-test, while the independence between the distributions as a whole was evaluated with the chi-square test.

The hypothesis tests concerned the following estimates:
- Satisfaction with life (Satisfaction)
- Health conditions (Health)
- Assessment of the economic situation compared with the previous year (EcoSit)
- Reading books in the last 12 months (Books)
- Frequency of seeing friends (Friends)
- Smoking habit (Smoke)

The differences for Satisfaction, Books and Friends were significant.
Table 1. Estimates (%) for “Reading books” in SM and MM samples

Answer    MM      SM      t-test p-value
No        54.8    57.5    <.0001
Yes       41.6    39.9    0.0004
NR        3.6     2.6     <.0001
Chi-square test p-value: <.0001

Table 2. Estimates (%) for “Seeing friends” in SM and MM samples

Answer               MM      SM      t-test p-value
Everyday             14.8    17.3    <.0001
Sometimes a week     26.0    27.2    0.0018
Once a week          20.4    20.8    0.3271
Sometimes a month    19.4    18.6    0.0303
Sometimes a year     10.8    8.3     <.0001
Never                5.2     5.0     0.5372
No friends           2.0     1.6     0.0029
NR                   1.5     1.1     0.0027
Chi-square test p-value: <.0001
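The test on a single proportion can be sketched as follows. This is a simplified stand-in, not the computation behind Tables 1 and 2: it uses the large-sample z approximation in place of the t-test, treats both samples as simple random samples (ignoring the two-stage design and the survey weights), and the sample sizes are hypothetical because the slides do not report them.

```python
import math


def two_proportion_test(p1, n1, p2, n2):
    """Two-sided large-sample test of H0: the two proportions are equal.

    p1, p2 are estimated proportions (between 0 and 1); n1, n2 are sample sizes.
    Returns (z statistic, two-sided p-value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# "Yes" to reading books: 41.6% (MM) vs 39.9% (SM), taken from Table 1.
# The sample sizes below are hypothetical placeholders.
z, p_value = two_proportion_test(0.416, 20_000, 0.399, 20_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A design-based analysis would replace the simple-random-sampling standard error with one that reflects clustering and weighting, which typically widens it.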
Test of differences between response rates

To assess whether response is independent of the individual structural variables, the hypothesis of independence between response and each variable was tested in the two samples.
The structural variables influence response in both samples.

Table 3. Independence test between response and auxiliary variables in SM and MM samples

Sample                  Auxiliary variable                                    DF   χ²         p-value
SM sample (PAPI)        Geographical area                                     4    131.1118   <.0001
                        Municipal typology                                    5    293.713    <.0001
                        Household typology by number of components and age    6    295.3983   <.0001
                        Income class                                          4    270.174    <.0001
                        Nationality                                           2    567.6386   <.0001
MM sample (web/PAPI)    Geographical area                                     4    91.9192    <.0001
                        Municipal typology                                    5    268.3902   <.0001
                        Household typology by number of components and age    6    142.5824   <.0001
                        Income class                                          4    127.8876   <.0001
                        Nationality                                           2    168.9341   <.0001
The analysis was also carried out on the PAPI component of the two samples.
The results show that the distributions of PAPI respondents by geographical area and by income class are not independent of whether they were selected for the PAPI or the web/PAPI sample.

Table 4. Independence test between auxiliary variables in PAPI respondents of SM and MM samples

Variable                                              DF   χ²         p-value
Geographical area                                     4    186.5848   <.0001
Municipal typology                                    5    17.3572    0.0039
Household typology by number of components and age    6    6.2375     0.3971
Income class                                          4    144.5565   <.0001
Nationality                                           2    6.2907     0.0431
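The independence tests in Tables 3 and 4 are chi-square tests on contingency tables. A minimal sketch using only the Python standard library: `chi_square_independence` computes the Pearson statistic and its degrees of freedom from a two-way table of counts, and `chi2_sf` converts them into a p-value. The function names and the example counts are hypothetical, and the unweighted computation ignores the survey design. The last lines recompute p-values from three (χ², DF) pairs of Table 4.

```python
import math


def chi2_sf(stat, df):
    """Upper tail probability P(X > stat) for a chi-square with df degrees of freedom."""
    x = stat / 2.0
    # Start from Q(1, x) for even df or Q(1/2, x) for odd df, then apply
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1) up to a = df / 2.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_independence(table):
    """Pearson chi-square test on a two-way table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows = respondents / nonrespondents, columns = 3 classes.
stat, df, p_value = chi_square_independence([[120, 90, 60], [80, 110, 140]])
print(f"chi-square = {stat:.2f}, DF = {df}, p-value = {p_value:.4g}")

# (chi-square, DF) pairs taken from Table 4, with p-values recomputed from them.
for label, chi2, dof in [("Municipal typology", 17.3572, 5),
                         ("Household typology", 6.2375, 6),
                         ("Nationality", 6.2907, 2)]:
    print(f"{label}: p-value = {chi2_sf(chi2, dof):.4f}")
```

The recomputed values should agree with the p-value column of Table 4 up to rounding, which is a quick consistency check on the reading of the table columns.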
Analysis of total nonresponse bias

R-indicators are based on a measure of the variability of the response propensities and describe how well the sample of respondents to a survey reflects the population of interest with respect to certain characteristics:

$R(\rho_X) = 1 - 2\,S(\rho_X)$

where $S(\rho_X)$ is the standard deviation of the response propensities $\rho_X$ given the auxiliary variables $X$.

The MM sample of respondents deviates less from representative response than the SM sample does.

Table 5. R-indicators in SM and MM samples

R-indicator                       SM sample   MM sample
$\hat{R}_X$                       0.812       0.852
$\hat{R}_X$ (second estimator)    0.814       0.854
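The R-indicator, one minus twice the standard deviation of the response propensities, can be illustrated numerically. The sketch below assumes that the propensities are estimated as response rates within the cells of a single auxiliary variable and that S is the (optionally weighted) standard deviation with an N - 1 denominator. The function names and the toy data are hypothetical; this is not the computation behind Table 5.

```python
import math
from collections import defaultdict


def cell_propensities(units):
    """Estimate each unit's response propensity as its cell's response rate.

    units: list of (cell, responded) pairs with responded in {0, 1}.
    """
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for cell, r in units:
        sampled[cell] += 1
        responded[cell] += r
    return [responded[cell] / sampled[cell] for cell, _ in units]


def r_indicator(propensities, weights=None):
    """R = 1 - 2 * S(rho), with S the standard deviation of the propensities."""
    if weights is None:
        weights = [1.0] * len(propensities)
    n = sum(weights)
    mean = sum(w * p for w, p in zip(weights, propensities)) / n
    variance = sum(w * (p - mean) ** 2 for w, p in zip(weights, propensities)) / (n - 1)
    return 1.0 - 2.0 * math.sqrt(variance)


# Hypothetical sample: 10 units per cell; 8 respond in cell "A" and 4 in cell "B".
units = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 4 + [("B", 0)] * 6
rho = cell_propensities(units)
print(f"R-indicator: {r_indicator(rho):.3f}")
```

R equals 1 when all propensities coincide and decreases as they spread out, so a higher value indicates a more representative response.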