
Combining Estimates from Related Surveys via Bivariate Models



  1. Combining Estimates from Related Surveys via Bivariate Models (Application: using ACS estimates to improve estimates from smaller U.S. surveys) William R. Bell and Carolina Franco, U.S. Census Bureau 2016 Ross-Royall Symposium February 26, 2016 Bell & Franco () Combining estimates from related surveys February 26, 2016 1 / 17

  2. Disclaimer: This report is released to inform interested parties of ongoing research and to encourage discussion. The views expressed on statistical, methodological, technical, or operational issues are those of the author(s) and not necessarily those of the U.S. Census Bureau.

  3. Introduction
     - Investigate the potential of using bivariate models to borrow strength from estimates from a large survey to improve related estimates from smaller surveys.
     - Motivation: the “large survey” is the Census Bureau’s American Community Survey (ACS), the largest U.S. household survey.
     - The approach is simple and requires no covariates from auxiliary information.
     - Real examples show that large reductions in standard errors of estimates are possible.

  4. ACS: The Largest U.S. Household Survey
     American Community Survey (ACS):
     - Conducted annually (data collected throughout the year); it has replaced the decennial census long-form sample.
     - Samples approximately 3.5 million addresses each year.
     - Encompasses a broad range of topics: demographic, income, health insurance, employment, disabilities, occupations, housing, education, veteran status, etc.
     - Produces estimates annually based on 1 or 5 years of data.

  5. Three Smaller U.S. Surveys
     - Survey of Income and Program Participation (SIPP) Disability Module: approx. 37,000 households and 70,000 persons in the 2008 panel. Detailed questions about many different aspects of disability.
     - National Health Interview Survey (NHIS): about 110,000 persons in the Family Core component, 2013. Questions about a broad range of health topics asked in personal household interviews. Estimates are used to track health status, health care access, and progress toward achieving national health objectives.
     - Current Population Survey (CPS) Annual Social and Economic Supplement: samples about 100,000 addresses. Provides official national estimates of income and poverty.

  6. Four Applications
     (1) SIPP estimates of U.S. state disability rates. ACS variable: estimate of state disability rates (types of disabilities and the time frames differ from SIPP).
     (2) NHIS estimates of U.S. state uninsured rates. ACS variable: estimate of U.S. state uninsured rates (questions asked and the mode of survey delivery and design differ from NHIS).
     (3) CPS estimates of per capita expenditure on health insurance premiums by state. ACS variable: estimated per capita income by state.
     (4) ACS 1-yr estimates (of anything! Take county rates of children in poverty to illustrate). 2nd variable: corresponding previous ACS 5-yr estimates (larger sample size, but less current).

  7. Univariate Gaussian Shrinkage Model for Survey Estimates
     For $m$ small areas:
     $$y_i = Y_i + e_i, \qquad Y_i = \mu + u_i, \qquad i = 1, \ldots, m$$
     - $y_i$ is the direct survey estimate of $Y_i$, the population characteristic of interest for area $i$.
     - $e_i$ is the sampling error in $y_i$, generally assumed to be $N(0, v_i)$, independent, with $v_i$ known.
     - $u_i$ is the area $i$ random effect, usually assumed to be i.i.d. $N(0, \sigma_u^2)$ and independent of the $e_i$.

  8. Shrinkage Estimation (Stein 1956, Carter and Rolph 1974)
     Best linear predictor of $Y_i$ ($\mu$ and $\sigma_u^2$ known):
     $$\hat{Y}_i = (1 - \gamma_i)\, y_i + \gamma_i\, \mu, \qquad \text{where } \gamma_i = \frac{v_i}{v_i + \sigma_u^2}$$
     - The weighted average $\hat{Y}_i$ “shrinks” the direct estimate $y_i$ towards the overall mean $\mu$.
     - The smaller the sampling variance $v_i$, the more weight is placed on the direct survey estimate $y_i$.
     - Parameters unknown: estimate by ML or REML, or take a Bayesian approach.
     - Fay and Herriot (1979) extended the approach to shrink $y_i$ towards a regression mean $\mu_i = x_i'\beta$, and applied this approach to small area estimation.
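
The shrinkage formula on this slide can be sketched numerically. The snippet below is only an illustration with known $\mu$ and $\sigma_u^2$; the function name `shrink_univariate` and all input numbers are made up for this example and do not come from the talk. It also returns the conditional variance $\gamma_i \sigma_u^2$, which is the standard result for this Gaussian model.

```python
import numpy as np

def shrink_univariate(y, v, mu, sigma2_u):
    """Best linear predictor (1 - gamma_i) * y_i + gamma_i * mu, parameters known.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    gamma = v / (v + sigma2_u)              # weight placed on the overall mean
    y_hat = (1.0 - gamma) * y + gamma * mu  # shrunken estimate
    cond_var = gamma * sigma2_u             # Var(Y_i | y_i) = v_i * sigma2_u / (v_i + sigma2_u)
    return y_hat, gamma, cond_var

# Made-up direct estimates and sampling variances for three areas
y = [0.10, 0.20, 0.30]
v = [0.0001, 0.0004, 0.0016]
y_hat, gamma, cond_var = shrink_univariate(y, v, mu=0.20, sigma2_u=0.0004)
print(gamma)   # larger v_i gives larger gamma_i, so more shrinkage toward mu
print(y_hat)
```

In this toy input the area with the largest sampling variance is pulled furthest toward $\mu$, matching the slide's remark about the weight on $y_i$.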

  9. Bivariate Gaussian Model
     $$y_{1i} = Y_{1i} + e_{1i} = (\mu_1 + u_{1i}) + e_{1i}$$
     $$y_{2i} = Y_{2i} + e_{2i} = (\mu_2 + u_{2i}) + e_{2i}, \qquad i = 1, \ldots, m$$
     $$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$$
     $$\begin{pmatrix} e_{1i} \\ e_{2i} \end{pmatrix} \overset{\text{ind.}}{\sim} N(0, V_i), \qquad V_i = \begin{pmatrix} v_{1i} & 0 \\ 0 & v_{2i} \end{pmatrix}$$
     - $y_{1i}$ is the direct estimate of the quantity of interest $Y_{1i}$, and $y_{2i}$ is the direct estimate from another survey of a related quantity $Y_{2i}$.
     - Note that $V_i$ assumes the sampling errors $e_{1i}$ and $e_{2i}$ are uncorrelated. This can be generalized.
     - The alternative of simply including $y_{2i}$ as a regression covariate in the model would ignore its sampling error!
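
To make the structure of the bivariate model concrete, here is a minimal simulation sketch. Everything numeric in it (number of areas, means, variance parameters, ranges of sampling variances) is a hypothetical choice for illustration, not a value from the talk; it assumes, as on the slide, that the two sampling errors are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

m = 51                                    # hypothetical number of areas
mu = np.array([0.12, 0.10])               # hypothetical (mu_1, mu_2)
sigma11, sigma22, rho = 4e-4, 3e-4, 0.8   # hypothetical model parameters
sigma12 = rho * np.sqrt(sigma11 * sigma22)
Sigma = np.array([[sigma11, sigma12],
                  [sigma12, sigma22]])

# Known sampling variances: survey 1 is small (noisy), survey 2 is large (precise)
v1 = rng.uniform(2e-4, 2e-3, size=m)
v2 = rng.uniform(1e-5, 1e-4, size=m)

u = rng.multivariate_normal(np.zeros(2), Sigma, size=m)  # area effects, shape (m, 2)
Y = mu + u                                               # true quantities Y_1i, Y_2i
e = rng.normal(0.0, np.sqrt(np.column_stack([v1, v2])))  # uncorrelated sampling errors
y = Y + e                                                # direct estimates y_1i, y_2i
```

Each row of `y` plays the role of one area's pair $(y_{1i}, y_{2i})$, with its own diagonal $V_i$ built from `v1[i]` and `v2[i]`.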

  10. Estimation/Inference for Model Parameters
     - Unknown parameters: $\mu_1$, $\mu_2$, $\sigma_{11}$, $\sigma_{22}$, and $\sigma_{12}$ or $\rho = \sigma_{12} / \sqrt{\sigma_{11}\sigma_{22}}$.
     - Sampling variances $v_{1i}$ and $v_{2i}$ are treated as known (really estimated using survey microdata).
     - Can estimate unknown parameters by ML or REML.
     - We shall use a Bayesian approach with flat priors on $\mu_1$, $\mu_2$, $\sigma_{11} > 0$, $\sigma_{22} > 0$, and $\rho \in (-1, 1)$.
     - The approach was implemented in JAGS.

  11. Prediction When Model Parameters are Known
     In matrix notation, $y_i = Y_i + e_i = (\mu + u_i) + e_i$, and
     $$\hat{Y}_i^{BP} = E(Y_i \mid y_i) = \mu + \Sigma(\Sigma + V_i)^{-1}(y_i - \mu)$$
     $$\mathrm{MSE}(\hat{Y}_i^{BP}) = \mathrm{Var}(Y_i \mid y_i) = \Sigma - \Sigma(\Sigma + V_i)^{-1}\Sigma$$
     - We are interested in predicting $Y_{1i}$ only, not $Y_{2i}$.
     - $\hat{Y}_{1i}^{BP}$ is a linear combination of $\mu_1$, $(y_{1i} - \mu_1)$, and $(y_{2i} - \mu_2)$.
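
The two matrix formulas above translate almost directly into NumPy. The following is a sketch for a single area with known parameters; the function name `bivariate_best_predictor` and the toy inputs are assumptions made for this example, not part of the talk.

```python
import numpy as np

def bivariate_best_predictor(y_i, mu, Sigma, V_i):
    """Return E(Y_i | y_i) and Var(Y_i | y_i) for one area, parameters known.

    Hypothetical helper for illustration only.
    """
    y_i = np.asarray(y_i, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    V_i = np.asarray(V_i, dtype=float)
    # K = Sigma (Sigma + V_i)^{-1}, computed with a linear solve rather than an explicit inverse
    K = np.linalg.solve((Sigma + V_i).T, Sigma.T).T
    Y_bp = mu + K @ (y_i - mu)
    mse = Sigma - K @ Sigma
    return Y_bp, mse

# Toy inputs: noisy first survey (v_1i = 1), precise second survey (v_2i = 0.1)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
V_i = np.diag([1.0, 0.1])
Y_bp, mse = bivariate_best_predictor([1.0, 1.0], [0.0, 0.0], Sigma, V_i)
print(Y_bp[0], mse[0, 0])  # prediction of Y_1i and its MSE
```

Only the first component matters for the application: `Y_bp[0]` is the prediction of $Y_{1i}$ and `mse[0, 0]` is its MSE. With these toy inputs the MSE falls below the univariate value $\sigma_{11} v_{1i} / (\sigma_{11} + v_{1i}) = 0.5$.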

  14. MSE % Reductions from Shrinkage Estimation
     - Direct estimation to univariate shrinkage (more reduction as $v_{1i}$ increases):
       $$100 \times \left(1 - \frac{\mathrm{Var}(Y_{1i} \mid y_{1i})}{v_{1i}}\right)$$
     - Univariate to bivariate shrinkage (more reduction as $v_{2i}$ decreases and as $\rho$ increases):
       $$100 \times \left(1 - \frac{\mathrm{Var}(Y_{1i} \mid y_{1i}, y_{2i})}{\mathrm{Var}(Y_{1i} \mid y_{1i})}\right)$$
     - Direct estimation to bivariate shrinkage:
       $$100 \times \left(1 - \frac{\mathrm{Var}(Y_{1i} \mid y_{1i}, y_{2i})}{v_{1i}}\right)$$
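
The three percentages can be computed for given parameter values as sketched below. This assumes known model parameters and uncorrelated sampling errors; the function name `mse_reductions` and the inputs are hypothetical, so the output is not meant to reproduce the application results, which rest on estimated parameters.

```python
import numpy as np

def mse_reductions(sigma11, sigma22, rho, v1, v2):
    """Percent MSE reductions for predicting Y_1i (hypothetical helper, known parameters)."""
    sigma12 = rho * np.sqrt(sigma11 * sigma22)
    Sigma = np.array([[sigma11, sigma12], [sigma12, sigma22]])
    V = np.diag([v1, v2])
    var_uni = sigma11 * v1 / (sigma11 + v1)  # Var(Y_1i | y_1i)
    # Var(Y_1i | y_1i, y_2i): (1,1) element of Sigma - Sigma (Sigma + V)^{-1} Sigma
    var_biv = (Sigma - Sigma @ np.linalg.solve(Sigma + V, Sigma))[0, 0]
    return {
        "direct_to_univariate": 100.0 * (1.0 - var_uni / v1),
        "univariate_to_bivariate": 100.0 * (1.0 - var_biv / var_uni),
        "direct_to_bivariate": 100.0 * (1.0 - var_biv / v1),
    }

# Toy inputs: the gain from the second survey grows with rho
for rho in (0.0, 0.5, 0.8):
    print(rho, mse_reductions(1.0, 1.0, rho, v1=1.0, v2=0.1))
```

With $\rho = 0$ the second survey contributes nothing, so the univariate-to-bivariate reduction is zero. The three quantities are linked: the remaining MSE fraction from direct to bivariate is the product of the remaining fractions of the two intermediate steps.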

  17. Application I: 2010 Disability Rates for U.S. States: SIPP borrowing from ACS
     - $y_{1i}$ = SIPP disability estimate, $y_{2i}$ = ACS disability estimate.
     - Smoothing of SIPP direct sampling variance estimates is applied.
     - $\hat{\rho} = 0.82$
     - Univariate shrinkage yields an MSE decrease of 2% to 67% from direct, with a median of 19%.
     - The MSE decrease from the bivariate vs. the univariate model is 6% to 59%, with a median of 29%.
     - The MSE decrease from bivariate vs. direct is 8% to 86%, with a median decrease of 43%.
