Robust Fay-Herriot Estimators in Small Area Estimation




  1. Robust Fay-Herriot Estimators in Small Area Estimation
     Sebastian Warnholz, Statistical Consultancy, FU Berlin
     5th May 2016

  2. Outline
     ◮ Small Area Estimation
     ◮ Area Level Models
     ◮ Robust Area Level Models
     ◮ Example & Simulation Study
     S3RI Research Seminars

  3. Small Area Estimation
     ◮ SAE: estimation of population parameters for small domains / areas
     ◮ Problem: direct estimates may have insufficient precision (high variance)
       ◮ Estimates may be based on survey data which was not designed to make predictions for small domains
       ◮ Very few or no sampled units are available within the target domains
     ◮ Methods used in SAE borrow strength to improve domain predictions by
       ◮ using additional data sources
       ◮ exploiting correlation structures (space and time)
       ◮ often using models

  4. Models in SAE
     ◮ Area level models:
       ◮ Use information on the area level, e.g. aggregates like a direct estimator
       ◮ Are used when unit level information is not available
       ◮ May be useful to reduce computational complexity
     ◮ Unit level models:
       ◮ Use the sampled observations directly
       ◮ May provide more precise parameter estimates due to the increased number of observations

  5. Area Level Models
     ◮ Fay and Herriot (1979):
       ◮ $\bar{y}_i = \theta_i + e_i$; $e_i \sim N(0, \sigma^2_{ei})$; $i = 1, \dots, D$
       ◮ $\theta_i = x_i^\top \beta + v_i$; $v_i \sim N(0, \sigma^2_v)$
     ◮ Combined, an estimator for the population mean can be derived: $\hat{\theta}_i^{FH} = \hat{\gamma}_i \bar{y}_i + (1 - \hat{\gamma}_i) x_i^\top \hat{\beta}$ with $\hat{\gamma}_i = \hat{\sigma}^2_v / (\hat{\sigma}^2_v + \sigma^2_{ei})$
     ◮ When $\sigma^2_{ei} \gg \hat{\sigma}^2_v$ we rely more on the synthetic estimator
     ◮ When $\sigma^2_{ei} \ll \hat{\sigma}^2_v$ the direct estimator is preferred
     ◮ $\sigma^2_{ei}$ is assumed to be known under the model; in practice we may use the sampling variance
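
The shrinkage behaviour of the Fay-Herriot predictor can be sketched in a few lines of Python. This is only an illustration of the formula on this slide: the function name `fh_predict` and all numbers are made up, and the parameters are treated as already estimated.

```python
def fh_predict(y_bar, x, beta, sigma2_v, sigma2_e):
    """Fay-Herriot prediction for a single area.

    y_bar    : direct estimator of the area
    x, beta  : covariates and regression coefficients (same length)
    sigma2_v : (estimated) variance of the random effect
    sigma2_e : sampling variance of the direct estimator (assumed known)
    """
    synthetic = sum(xj * bj for xj, bj in zip(x, beta))  # x_i' beta
    gamma = sigma2_v / (sigma2_v + sigma2_e)             # shrinkage factor
    return gamma * y_bar + (1.0 - gamma) * synthetic, gamma


# Hypothetical area with direct estimate 10 and synthetic estimate 1 + 2 * 3 = 7.
# Small sampling variance: the prediction stays close to the direct estimator.
theta_precise, gamma_precise = fh_predict(10.0, [1.0, 2.0], [1.0, 3.0], 4.0, 1.0)
# Large sampling variance: the prediction moves towards the synthetic estimator.
theta_noisy, gamma_noisy = fh_predict(10.0, [1.0, 2.0], [1.0, 3.0], 4.0, 36.0)
```

With these illustrative inputs the shrinkage factor drops from 0.8 to 0.1 as the sampling variance grows from 1 to 36.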

  6. Outliers in Area Level Models
     $\bar{y}_i = x_i^\top \beta + v_i + e_i$
     ◮ Area level outliers are outliers in the random effect $v_i$, i.e. all units within a domain are outlying
       ◮ Here a robust method can be beneficial
     ◮ Unit level outliers are outliers in $e_i$, i.e. single units
       ◮ We may use estimated sampling variances for $\sigma^2_{ei}$; a large estimated variance (e.g. inflated by unit level outliers) then automatically shifts the FH prediction towards the synthetic estimator
       ◮ When the sampling variances are unreliable they may be replaced by a more stable estimate based on generalised variance functions

  7. Robust Area Level Methods – Review
     ◮ When framed as a violation of the distributional assumption (of $v_i$):
       ◮ Transform the response, i.e. the direct estimator: Sugasawa and Kubokawa (2015)
       ◮ Replace the distribution, e.g.
         ◮ generalised normal: Fabrizi and Trivisano (2010)
         ◮ t-distribution: Bell and Huang (2006)
         ◮ Cauchy distribution: Datta and Lahiri (1995)
     ◮ When we still believe in the normal distribution:
       ◮ Use influence functions in the context of hierarchical Bayes: Ghosh, Maiti and Roy (2008)
       ◮ Use influence functions in the context of linear mixed models: Sinha and Rao (2009)
       ◮ M-quantile regression: Chambers and Tzavidis (2006)

  8. Robust Area Level Methods – Method
     ◮ Here the method by Sinha and Rao (2009) is adapted for area level models framed as a linear mixed model: $y \sim N(X\beta, V)$ with $V = Z V_v Z^\top + V_e$
     ◮ Restrict the influence of the residuals in the ML estimation equations. E.g. for the regression parameters we use $X^\top V^{-1} U^{1/2} \psi(U^{-1/2}(y - X\beta)) = 0$ instead of $X^\top V^{-1}(y - X\beta) = 0$
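
A minimal sketch of how the robust equation for the regression parameters could be solved by iteratively reweighted least squares. It rests on several assumptions not stated on the slide: $U = \mathrm{diag}(V)$, a Huber influence function with tuning constant 1.345, a diagonal $V$ with known entries (as in the basic Fay-Herriot model, so $U = V$), and a model with intercept and one covariate. All function names and data are hypothetical.

```python
def huber_psi(r, k=1.345):
    """Huber influence function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))


def _wls(y, x, w):
    """Weighted least squares for intercept and slope (2x2 normal equations)."""
    s_w = sum(w)
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s_w * s_xx - s_x ** 2
    return (s_xx * s_y - s_x * s_xy) / det, (s_w * s_xy - s_x * s_y) / det


def robust_beta_irwls(y, x, v_diag, k=1.345, max_iter=200, tol=1e-10):
    """Solve sum_i x_i psi(r_i) / sqrt(V_ii) = 0 with r_i = (y_i - x_i'b) / sqrt(V_ii).

    Writing psi(r_i) = w_i * r_i turns each step into a weighted least
    squares fit with weights w_i / V_ii.
    """
    b0, b1 = _wls(y, x, [1.0 / vi for vi in v_diag])  # non-robust start
    for _ in range(max_iter):
        weights = []
        for yi, xi, vi in zip(y, x, v_diag):
            r = (yi - b0 - b1 * xi) / vi ** 0.5
            w_psi = 1.0 if r == 0.0 else huber_psi(r, k) / r
            weights.append(w_psi / vi)
        new_b0, new_b1 = _wls(y, x, weights)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Made-up data on the line y = 2 + 3x with one outlying area at x = 9.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[9] += 50.0
v_diag = [1.0] * 10

b0_wls, b1_wls = _wls(y, x, [1.0 / vi for vi in v_diag])
b0_rob, b1_rob = robust_beta_irwls(y, x, v_diag)
```

In this toy example the non-robust slope is pulled well away from 3 by the single outlier, while the robust slope stays close to it.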

  9. Robust Area Level Methods – Method
     ◮ Solving these robust estimation equations leads to outlier robust parameter estimates, $\hat{\beta}^\psi$ and $\hat{\sigma}_v^{2,\psi}$, and outlier robust predictions: $\hat{\theta}_i^{RFH} = x_i^\top \hat{\beta}^\psi + \hat{v}_i^\psi$
     ◮ In the setting of linear mixed models this representation is the robust empirical best linear unbiased prediction (REBLUP)
     ◮ The MSE of these predictions can be computed using a parametric bootstrap or an approximation based on the results of Chambers, Chandra and Tzavidis (2011)
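
The slides do not spell out the equation behind $\hat{v}_i^\psi$. The sketch below assumes the Fellner-type robustified mixed model equation used in the robust mixed model literature, which for one area reads $\psi((r_i - v_i)/\sigma_{ei})/\sigma_{ei} - \psi(v_i/\sigma_v)/\sigma_v = 0$ with $r_i = \bar{y}_i - x_i^\top\beta$. It is solved here by plain bisection rather than the fixed-point algorithm mentioned later in the talk; the function names and numbers are hypothetical.

```python
def huber_psi(r, k=1.345):
    """Huber influence function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))


def robust_random_effect(resid, sigma2_v, sigma2_e, k=1.345, n_iter=200):
    """Solve the assumed robust equation for one random effect by bisection.

    resid is the residual of the direct estimator from the fixed part,
    i.e. y_bar_i - x_i' beta. Without clipping the solution is the usual
    gamma_i * resid.
    """
    se, sv = sigma2_e ** 0.5, sigma2_v ** 0.5

    def score(v):
        return huber_psi((resid - v) / se, k) / se - huber_psi(v / sv, k) / sv

    # score is non-increasing in v and changes sign between 0 and resid.
    lo, hi = min(0.0, resid), max(0.0, resid)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Small residual: nothing is clipped, so the result equals gamma * resid = 0.8.
v_small = robust_random_effect(1.0, sigma2_v=4.0, sigma2_e=1.0)
# Large residual with a noisy direct estimator: the sampling error term is
# clipped and the random effect stays small (non-robust value would be 20).
v_noisy = robust_random_effect(100.0, sigma2_v=1.0, sigma2_e=4.0)
# Large residual with a precise direct estimator: the random effect term is
# clipped and the prediction stays close to the direct estimator.
v_precise = robust_random_effect(100.0, sigma2_v=4.0, sigma2_e=1.0)
```

Under these assumptions, which of the two terms is bounded depends on the ratio of the two variances, so the robust prediction does not always shrink more strongly than the non-robust one.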

  10. Robust Area Level Methods – Extensions
     ◮ Framed as linear mixed effects models we can incorporate spatial and temporal correlation in the random effects:
       ◮ Simultaneous autoregressive process: Pratesi and Salvati (2008)
       ◮ Random intercept + temporal autocorrelation: Rao and Yu (1994)
       ◮ Combining spatial and temporal correlation: Marhuenda et al. (2013)
     ◮ The same idea for robust predictions can be used for these methods

  11. Robust Area Level Methods – Optimisation
     ◮ Sinha and Rao (2009) derived Newton-Raphson algorithms based on a Taylor series expansion of the estimation equations (unit level models)
     ◮ Schmid (2011) minimised the squared estimation equations for the variance components: more stable
     ◮ Schoch (2012) uses an IRWLS algorithm for $\beta$ and a robust method of moments estimator for the variance parameters: more stable with respect to starting values
     ◮ Chatrchi (2012) uses a fixed-point algorithm for the variance components: slow but stable with respect to starting values
     ◮ For area level models:
       ◮ IRWLS algorithm for the regression parameters
       ◮ Fixed-point algorithm for the random effects
       ◮ For variance components:
         ◮ Fixed-point algorithm for variances
         ◮ Newton-Raphson for correlation parameters

  12. Robust Area Level Methods – Software
     ◮ R packages:
       ◮ rsae: implements the methods by Schoch (2012) for unit level models
       ◮ saeRobust (about to be released): implements the presented methods for
         ◮ Standard RFH
         ◮ Spatial RFH
         ◮ Temporal RFH
         ◮ Spatio-Temporal RFH

  13. CBS Data Example
     ◮ The target statistic is the mean tax turnover of 20 industry sectors in the Netherlands
     ◮ A synthetic population with 63981 observations is available
       ◮ Based on the Structural Business Survey (SBS), an annual survey in the Netherlands conducted by CBS
     ◮ In this example a sample is drawn with a design similar to that of the SBS:
       ◮ Stratified by size class (number of employees) of firms
       ◮ SRSWOR within each stratum
       ◮ Large firms are selected with probability one
       ◮ Sample sizes range between 9 and 1052; 5074 overall
     ◮ The sampling is repeated 500 times and the estimates are compared to the population parameters
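
The sampling design described on this slide can be sketched as follows. The toy population, the stratum names and the sample sizes are invented for illustration and have nothing to do with the CBS data; the Horvitz-Thompson mean is computed for the whole population rather than per industry sector to keep the sketch short.

```python
import random


def stratified_srswor(population, n_by_stratum, seed=1):
    """Draw a simple random sample without replacement within each stratum.

    population   : dict mapping stratum -> list of unit values
    n_by_stratum : dict mapping stratum -> sample size; a size of at least
                   the stratum size gives a take-all stratum
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, units in population.items():
        n_h = min(n_by_stratum[stratum], len(units))
        sample[stratum] = rng.sample(units, n_h)
    return sample


def ht_mean(sample, pop_sizes):
    """Horvitz-Thompson estimate of the population mean with weights N_h / n_h."""
    total = 0.0
    for stratum, values in sample.items():
        weight = pop_sizes[stratum] / len(values)
        total += weight * sum(values)
    return total / sum(pop_sizes.values())


# Toy population: values are constant within the two sampled strata so that
# the estimate can be checked by hand; the "large" stratum is take-all.
population = {
    "small": [1.0] * 50,
    "medium": [5.0] * 20,
    "large": [100.0, 120.0, 140.0],
}
pop_sizes = {h: len(units) for h, units in population.items()}
sample = stratified_srswor(population, {"small": 5, "medium": 4, "large": 3})
estimate = ht_mean(sample, pop_sizes)
true_mean = sum(sum(units) for units in population.values()) / sum(pop_sizes.values())
```

Repeating the draw with different seeds and comparing each estimate to `true_mean` mirrors the structure of the simulation described above.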

  14. Modeling Strategy
     $\bar{y}_i = \beta_0 + \beta_1 \bar{y}_{i,t-1} + v_i + e_i$
     ◮ $\bar{y}_i$ is the direct estimator based on the HT estimator
     ◮ $\bar{y}_{i,t-1}$ is the true tax turnover from the previous period
     ◮ The sampling variances under the FH model, $\sigma^2_{ei}$, are either based on the estimated standard error of the direct estimator or smoothed using a generalised variance function

  15. QQ Plots
     [Figure: normal QQ plots for RFH and FH. Panels: random effects; residuals / sqrt(samplingVar). x-axis: theoretical quantiles.]

  16. Coefficient of Variation
     [Figure: CV in % for direct, eblup and reblup by domain (sorted by increasing CV of direct).]

  17. RBIAS & RRMSE
     [Figure: RBIAS in % and RRMSE in % for Direct, FH, RFH, FH.GVF and RFH.GVF.]
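
The slides do not define the two quality measures. A minimal sketch, assuming the usual definitions for a simulation with a fixed true value: the relative bias is the mean relative error over the replications, and the relative root MSE is the root of the mean squared relative error, both in percent. The function name and the numbers are illustrative only.

```python
from math import sqrt


def rbias_rrmse(estimates, truth):
    """Relative bias and relative root MSE (both in %) for one domain.

    estimates : predictions from the simulation replications
    truth     : true population parameter of the domain (non-zero)
    """
    rel_err = [(est - truth) / truth for est in estimates]
    rbias = 100.0 * sum(rel_err) / len(rel_err)
    rrmse = 100.0 * sqrt(sum(r * r for r in rel_err) / len(rel_err))
    return rbias, rrmse


# Four made-up replications for a domain with true mean 100.
rbias, rrmse = rbias_rrmse([90.0, 110.0, 100.0, 120.0], 100.0)
```

In the study above the same calculation would run over the 500 replications for each of the 20 domains and each estimator.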

  18. Discussion
     ◮ Outlier robust predictions may be beneficial to address area level outliers
     ◮ Unit level outliers?
     ◮ MSE estimation is problematic in scenarios where the estimated variance of the random effect is very small

  19. Thank you for your attention!
     Sebastian Warnholz (Sebastian.Warnholz@fu-berlin.de)

  20. Bibliography
     ◮ Bell / Huang (2006): Using the t-distribution to Deal with Outliers in Small Area Estimation, Proceedings of Statistics Canada Symposium 2006: Methodological Issues in Measuring Population Health
     ◮ Chatrchi (2012): Robust Estimation of Variance Components in Small Area Estimation, MA thesis, School of Mathematics and Statistics, Carleton University, Ottawa, Canada
     ◮ Datta / Lahiri (1995): Robust Hierarchical Bayes Estimation of Small Area Characteristics in the Presence of Covariates and Outliers, Journal of Multivariate Analysis 54, pp. 310–328
     ◮ Ghosh / Maiti / Roy (2008): Influence Functions and Robust Bayes and Empirical Bayes Small Area Estimation, Biometrika 95.3, pp. 573–585
     ◮ Fabrizi / Trivisano (2010): Robust Linear Mixed Models for Small Area Estimation, Journal of Statistical Planning and Inference 140, pp. 433–43

  21. Bibliography
     ◮ Fay / Herriot (1979): Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data, Journal of the American Statistical Association 74 (366), pp. 269–277
     ◮ Gershunskaya (2010): Robust Small Area Estimation Using a Mixture Model, Section on Survey Methods, JSM, pp. 2783–2796
     ◮ Marhuenda / Molina / Morales (2013): Small Area Estimation with Spatio-Temporal Fay-Herriot Models, Computational Statistics and Data Analysis 58, pp. 308–325
     ◮ Pratesi / Salvati (2008): Small Area Estimation: The EBLUP Estimator Based on Spatially Correlated Random Area Effects, Statistical Methods & Applications 17, pp. 113–141
     ◮ Rao / Yu (1994): Small-Area Estimation by Combining Time-Series and Cross-Sectional Data, Canadian Journal of Statistics 22.4, pp. 511–528
     ◮ Schmid (2011): Spatial Robust Small Area Estimation Applied on Business Data, PhD thesis, University of Trier
