Robust Fay-Herriot Estimators in Small Area Estimation
Sebastian Warnholz
Statistical Consultancy – FU Berlin
5th May 2016

Outline
◮ Small Area Estimation
◮ Area Level Models
◮ Robust Area Level Models
◮ Example & Simulation Study
S3RI Research Seminars

Small Area Estimation
◮ SAE: Estimation of population parameters for small domains / areas
◮ Problem: Direct estimates may have insufficient precision (high variance)
  ◮ Estimates may be based on survey data that was not designed to make predictions for small domains
  ◮ Very few or no sampled units are available within target domains
◮ Methods used in SAE borrow strength to improve domain predictions by
  ◮ using additional data sources
  ◮ exploiting correlation structures (space and time)
  ◮ often using models

Models in SAE
◮ Area level models:
  ◮ Use information on the area level, e.g. aggregates like a direct estimator
  ◮ Are used when unit level information is not available
  ◮ May be useful to reduce computational complexity
◮ Unit level models:
  ◮ Use the sampled observations directly
  ◮ May provide more precise parameter estimates due to the increased number of observations

Area Level Models
◮ Fay and Herriot (1979):
  ◮ $\bar{y}_i = \theta_i + e_i$; $e_i \sim N(0, \sigma^2_{ei})$; $i = 1, \dots, D$
  ◮ $\theta_i = x_i^\top \beta + v_i$; $v_i \sim N(0, \sigma^2_v)$
◮ And combined, an estimator for the population mean can be derived:
  $\hat{\theta}_i^{FH} = \hat{\gamma}_i \bar{y}_i + (1 - \hat{\gamma}_i) x_i^\top \hat{\beta}$ with $\hat{\gamma}_i = \hat{\sigma}^2_v / (\hat{\sigma}^2_v + \sigma^2_{ei})$
◮ When $\sigma^2_{ei} \gg \hat{\sigma}^2_v$ we rely more on the synthetic estimator
◮ When $\sigma^2_{ei} \ll \hat{\sigma}^2_v$ the direct estimator is preferred
◮ $\sigma^2_{ei}$ is assumed to be known under the model; in practice we may use the sampling variance

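The weighting between direct and synthetic estimator can be illustrated with a minimal Python sketch. The function name `fh_predict` and all input values are hypothetical; the variance of the random effect and the synthetic part $x_i^\top \hat{\beta}$ are assumed to be already estimated.

```python
def fh_predict(y_bar, synthetic, sigma2_v, sigma2_e):
    """Shrinkage predictor: gamma_i * direct + (1 - gamma_i) * synthetic."""
    preds = []
    for y, s, s2e in zip(y_bar, synthetic, sigma2_e):
        gamma = sigma2_v / (sigma2_v + s2e)  # weight of the direct estimator
        preds.append(gamma * y + (1.0 - gamma) * s)
    return preds


# Hypothetical inputs: three areas with identical direct and synthetic values
# but increasingly noisy direct estimators.
y_bar = [10.0, 10.0, 10.0]
synthetic = [6.0, 6.0, 6.0]      # x_i' beta_hat, assumed already computed
sigma2_e = [0.25, 1.0, 4.0]      # sampling variances, assumed known
theta_fh = fh_predict(y_bar, synthetic, sigma2_v=1.0, sigma2_e=sigma2_e)
```

The larger the sampling variance relative to the variance of the random effect, the closer the prediction moves to the synthetic value.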
Outliers in Area Level Models
$\bar{y}_i = x_i^\top \beta + v_i + e_i$
◮ Area level outliers are outliers in the random effect $v_i$, i.e. all units within a domain are outlying
  ◮ Here a robust method can be beneficial
◮ Unit level outliers are outliers in $e_i$, i.e. single units are outlying
  ◮ We may use estimated sampling variances for $\sigma^2_{ei}$; an outlying unit then inflates the variance estimate, so that the FH model automatically puts more weight on the synthetic estimator
  ◮ When the sampling variances are unreliable they may be replaced by a more stable estimate based on generalised variance functions

Robust Area Level Methods – Review
◮ When framed as a violation of the distributional assumption (of $v_i$):
  ◮ Transform the response, i.e. the direct estimator: Sugasawa and Kubokawa (2015)
  ◮ Replace the distribution, e.g.
    ◮ generalised normal: Fabrizi and Trivisano (2010)
    ◮ t-distribution: Bell and Huang (2006)
    ◮ Cauchy distribution: Datta and Lahiri (1995)
◮ When we still believe in the normal distribution:
  ◮ Use influence functions in the context of hierarchical Bayes: Ghosh, Maiti and Roy (2008)
  ◮ Use influence functions in the context of linear mixed models: Sinha and Rao (2009)
  ◮ M-quantile regression: Chambers and Tzavidis (2006)

Robust Area Level Methods – Method
◮ Here the method by Sinha and Rao (2009) is adapted for area level models framed as a linear mixed model:
  $y \sim N(X\beta, V)$ with $V = Z V_v Z^\top + V_e$
◮ Restrict the influence of the residuals in the ML estimation equations. E.g. for the regression parameters we use
  $X^\top V^{-1} U^{1/2} \psi(U^{-1/2}(y - X\beta)) = 0$
  instead of
  $X^\top V^{-1}(y - X\beta) = 0$

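A small Python sketch may help to see what bounding the residuals does to the estimation equation for the regression parameters. It rests on assumptions the slide does not state: $\psi$ is taken to be Huber's function with tuning constant 1.345, $V$ is diagonal (as in the standard area level model), and $U$ is the diagonal of $V$, so that $V^{-1} U^{1/2}$ reduces to $V^{-1/2}$. The names `huber_psi` and `score_beta` and the data are hypothetical.

```python
import math


def huber_psi(r, k=1.345):
    """Huber influence function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))


def score_beta(y, X, beta, v, psi=None):
    """Left-hand side of the estimation equation for beta with diagonal V.

    Returns sum_i x_i * v_i^(-1/2) * psi(r_i), r_i = (y_i - x_i'beta) / sqrt(v_i).
    With psi=None this is the ML score X' V^(-1) (y - X beta).
    """
    total = [0.0] * len(beta)
    for yi, xi, vi in zip(y, X, v):
        r = (yi - sum(a * b for a, b in zip(xi, beta))) / math.sqrt(vi)
        pr = r if psi is None else psi(r)
        for j, xij in enumerate(xi):
            total[j] += xij * pr / math.sqrt(vi)
    return total


# Hypothetical data: intercept-only model, the last area is an outlier.
y = [1.0, -1.0, 0.5, -0.5, 50.0]
X = [[1.0]] * 5
v = [1.0] * 5                      # v_i = sigma2_v + sigma2_ei
ml_score = score_beta(y, X, [0.0], v)
robust_score = score_beta(y, X, [0.0], v, psi=huber_psi)
```

In the ML score the outlying area contributes its full residual; in the robust version its contribution is capped at the tuning constant.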
Robust Area Level Methods – Method
◮ Solving these robust estimation equations leads to outlier robust parameter estimates, $\hat{\beta}^\psi$ and $\hat{\sigma}_v^{2,\psi}$, and outlier robust predictions:
  $\hat{\theta}_i^{RFH} = x_i^\top \hat{\beta}^\psi + \hat{v}_i^\psi$
◮ In the setting of linear mixed models this representation is the robust empirical best linear unbiased prediction (REBLUP)
◮ The MSE of these predictions can be computed using a parametric bootstrap or an approximation based on the results of Chambers, Chandra and Tzavidis (2011)

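How a robust random effect could be obtained for a single area is sketched below in Python. The sketch assumes a robustified mixed model equation of the form $\psi((\bar{y}_i - x_i^\top \beta - v_i)/\sigma_{ei})/\sigma_{ei} - \psi(v_i/\sigma_v)/\sigma_v = 0$, Huber's $\psi$ with tuning constant 1.345, and known parameters; none of this is spelled out on the slide. Plain bisection is used here for simplicity and is not the fixed-point algorithm of the presented method. All names and numbers are hypothetical.

```python
import math


def huber_psi(r, k=1.345):
    """Huber influence function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))


def robust_random_effect(resid, sigma2_v, sigma2_e, k=1.345, iters=200):
    """Solve psi((resid - v)/s_e)/s_e - psi(v/s_v)/s_v = 0 for v by bisection."""
    s_v, s_e = math.sqrt(sigma2_v), math.sqrt(sigma2_e)

    def f(v):
        return huber_psi((resid - v) / s_e, k) / s_e - huber_psi(v / s_v, k) / s_v

    # f is non-increasing in v and changes sign between 0 and resid.
    lo, hi = min(0.0, resid), max(0.0, resid)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical area: direct estimate 16, synthetic part x_i' beta = 6.
synthetic, y_bar = 6.0, 16.0
resid = y_bar - synthetic
v_robust = robust_random_effect(resid, sigma2_v=1.0, sigma2_e=4.0)
v_linear = robust_random_effect(resid, sigma2_v=1.0, sigma2_e=4.0, k=1e9)
theta_robust = synthetic + v_robust   # robust prediction
theta_linear = synthetic + v_linear   # coincides with the FH shrinkage predictor
```

With a very large tuning constant the solution reduces to the usual shrinkage $\hat{\gamma}_i (\bar{y}_i - x_i^\top \beta)$; with the default constant the predicted random effect of the outlying area is pulled further towards zero.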
Robust Area Level Methods – Extensions
◮ Framed as linear mixed effects models we can incorporate spatial and temporal correlation in the random effects:
  ◮ Simultaneous autoregressive process: Pratesi and Salvati (2008)
  ◮ Random intercept + temporal autocorrelation: Rao and Yu (1994)
  ◮ Combining spatial and temporal correlation: Marhuenda et al. (2013)
◮ The same idea for robust predictions can be used for these methods

Robust Area Level Methods – Optimisation
◮ Sinha and Rao (2009) derived Newton-Raphson algorithms based on a Taylor series expansion of the estimation equations (unit level models)
◮ Schmid (2011) minimised the squared estimation equations for variance components: more stable
◮ Schoch (2012) uses an IRWLS algorithm for $\beta$ and a robust method of moments estimator for the variance parameters: more stable with respect to starting values
◮ Chatrchi (2012) uses a fixed-point algorithm for variance components: slow but stable with respect to starting values
◮ For area level models:
  ◮ IRWLS algorithm for the regression parameters
  ◮ Fixed-point algorithm for the random effects
  ◮ For variance components:
    ◮ Fixed-point algorithm for variances
    ◮ Newton-Raphson for correlation parameters

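The IRWLS step for the regression parameters can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation behind the slides: one covariate plus intercept, a diagonal $V$ with known entries $v_i = \sigma^2_v + \sigma^2_{ei}$, and Huber's $\psi$ with tuning constant 1.345. The weights $w_i = \psi(r_i)/r_i$ turn the robust estimation equation into weighted normal equations that are solved repeatedly. The names `irwls_beta` and `huber_psi` and the data are hypothetical.

```python
import math


def huber_psi(r, k=1.345):
    """Huber influence function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))


def irwls_beta(y, x, v, k=1.345, tol=1e-10, max_iter=200):
    """IRWLS for (beta0, beta1) in y_i = beta0 + beta1 * x_i + error_i."""
    w = [1.0] * len(y)          # first pass is ordinary weighted least squares
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Weighted normal equations with weights a_i = w_i / v_i (2x2 system).
        a = [wi / vi for wi, vi in zip(w, v)]
        s = sum(a)
        sx = sum(ai * xi for ai, xi in zip(a, x))
        sxx = sum(ai * xi * xi for ai, xi in zip(a, x))
        sy = sum(ai * yi for ai, yi in zip(a, y))
        sxy = sum(ai * xi * yi for ai, xi, yi in zip(a, x, y))
        det = s * sxx - sx * sx
        new_b0 = (sxx * sy - sx * sxy) / det
        new_b1 = (s * sxy - sx * sy) / det
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
        # Update Huber weights from the standardised residuals.
        w = []
        for yi, xi, vi in zip(y, x, v):
            r = (yi - b0 - b1 * xi) / math.sqrt(vi)
            w.append(1.0 if r == 0.0 else huber_psi(r, k) / r)
    return b0, b1


# Hypothetical data: exact line 2 + 3 * x with one outlying area.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[9] += 30.0
v = [1.0] * 10
beta_robust = irwls_beta(y, x, v)
beta_wls = irwls_beta(y, x, v, k=1e12)   # psi is effectively the identity
```

With a very large tuning constant all weights stay at one and the result is the usual weighted least squares fit, which the outlier pulls away from the true slope; the Huber weights limit that pull.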
Robust Area Level Methods – Software
◮ R-packages:
  ◮ rsae: implements the methods by Schoch (2012) for unit level models
  ◮ saeRobust (about to be released): implements the presented methods for
    ◮ Standard RFH
    ◮ Spatial RFH
    ◮ Temporal RFH
    ◮ Spatio-Temporal RFH

CBS Data Example
◮ The target statistic is the mean tax turnover of 20 industry sectors in the Netherlands
◮ Available is a synthetic population with 63981 observations
  ◮ Based on the Structural Business Survey (SBS), an annual survey in the Netherlands conducted by CBS
◮ In this example a sample is drawn using a design similar to that of the SBS:
  ◮ Stratified by size class (number of employees) of firms
  ◮ SRSWOR within each stratum
  ◮ Large firms are selected with probability one
  ◮ Sample sizes range between 9 and 1052; 5074 overall
◮ This is repeated 500 times and the results are compared to the population parameters

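The sampling design described above can be mimicked with a short Python sketch: simple random sampling without replacement within strata, with a take-all stratum for large firms. The population, stratum labels and sample sizes below are made up for illustration and have nothing to do with the CBS data; `stratified_sample` is a hypothetical helper.

```python
import random


def stratified_sample(population, n_by_stratum, take_all, seed=1):
    """Draw SRSWOR within strata; strata in `take_all` are fully enumerated.

    population: list of (unit_id, stratum) pairs.
    Returns a list of (unit_id, stratum, design_weight) with weight N_h / n_h.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for unit_id, stratum in population:
        by_stratum.setdefault(stratum, []).append(unit_id)
    sample = []
    for stratum, units in by_stratum.items():
        if stratum in take_all:
            chosen = list(units)                        # inclusion probability one
        else:
            chosen = rng.sample(units, n_by_stratum[stratum])
        weight = len(units) / len(chosen)
        sample.extend((u, stratum, weight) for u in chosen)
    return sample


# Hypothetical population: 100 small firms and 20 large firms.
population = [(i, "small") for i in range(100)] + [(i, "large") for i in range(100, 120)]
sample = stratified_sample(population, {"small": 10}, take_all={"large"})
```

The design weights sum to the population size, which is what the direct estimator relies on.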
Modeling Strategy
$\bar{y}_i = \beta_0 + \beta_1 \bar{y}_{i,t-1} + v_i + e_i$
◮ $\bar{y}_i$ is the direct estimator based on the HT estimator
◮ $\bar{y}_{i,t-1}$ is the true tax turnover from the previous period
◮ The sampling variances under the FH model, $\sigma^2_{ei}$, are either based on the estimated standard error of the direct estimator or smoothed using a generalised variance function

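For reference, a Horvitz-Thompson type direct estimator of a domain mean can be written as the weighted sample total divided by the domain population size. The Python sketch below assumes that the domain size is known and that the weights are inverse inclusion probabilities; the exact variant used in the example is not stated on the slide. The function name `ht_mean` and the numbers are hypothetical.

```python
def ht_mean(values, weights, domain_size):
    """Horvitz-Thompson estimate of a domain mean: sum(w_k * y_k) / N_i."""
    return sum(w * y for w, y in zip(weights, values)) / domain_size


# Hypothetical domain with N_i = 11 firms: two sampled small firms
# (weight 5 each) and one large firm selected with probability one.
y_direct = ht_mean(values=[10.0, 20.0, 100.0], weights=[5.0, 5.0, 1.0], domain_size=11)
```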
QQ Plots
[Figure: normal QQ plots for RFH and FH; shown are the random effects and residuals / sqrt(samplingVar); x-axis: theoretical quantiles]

Coefficient of Variation
[Figure: CV in % by domain (sorted by increasing CV of direct) for direct, eblup and reblup]

RBIAS & RRMSE
[Figure: RBIAS in % and RRMSE in % for RFH.GVF, FH.GVF, RFH, FH and Direct]

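The slides do not define the two quality measures. A common choice in simulation studies, assumed here, is the mean relative error (RBIAS) and the root mean squared relative error (RRMSE) over all replications, both in percent. A minimal Python sketch with the hypothetical helper `rbias_rrmse` and made-up numbers:

```python
import math


def rbias_rrmse(estimates, truth):
    """Relative bias and relative RMSE (both in %) for one domain.

    estimates: predictions from the replications; truth: population value.
    """
    rel = [(e - truth) / truth for e in estimates]
    rbias = 100.0 * sum(rel) / len(rel)
    rrmse = 100.0 * math.sqrt(sum(d * d for d in rel) / len(rel))
    return rbias, rrmse


# Hypothetical predictions from four replications, true domain mean 100.
rbias, rrmse = rbias_rrmse([90.0, 110.0, 100.0, 120.0], truth=100.0)
```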
Discussion
◮ Outlier robust predictions may be beneficial to address area level outliers
◮ Unit level outliers?
◮ MSE estimation is problematic in scenarios where the estimated variance of the random effect is very small

Thank you for your attention!
Sebastian Warnholz (Sebastian.Warnholz@fu-berlin.de)

Bibliography
◮ Bell / Huang (2006): Using the t-distribution to Deal with Outliers in Small Area Estimation, Proceedings of Statistics Canada Symposium 2006: Methodological Issues in Measuring Population Health
◮ Chatrchi (2012): Robust Estimation of Variance Components in Small Area Estimation, MA thesis, School of Mathematics and Statistics, Carleton University, Ottawa, Canada
◮ Datta / Lahiri (1995): Robust Hierarchical Bayes Estimation of Small Area Characteristics in the Presence of Covariates and Outliers, Journal of Multivariate Analysis 54, pp. 310–328
◮ Fabrizi / Trivisano (2010): Robust Linear Mixed Models for Small Area Estimation, Journal of Statistical Planning and Inference 140, pp. 433–43
◮ Ghosh / Maiti / Roy (2008): Influence Functions and Robust Bayes and Empirical Bayes Small Area Estimation, Biometrika 95.3, pp. 573–585

Bibliography
◮ Fay / Herriot (1979): Estimation of Income for Small Places: An Application of James-Stein Procedures to Census Data, Journal of the American Statistical Association 74 (366), pp. 269–277
◮ Gershunskaya (2010): Robust Small Area Estimation Using a Mixture Model, Section on Survey Methods, JSM, pp. 2783–2796
◮ Marhuenda / Molina / Morales (2013): Small Area Estimation with Spatio-Temporal Fay-Herriot Models, Computational Statistics and Data Analysis 58, pp. 308–325
◮ Pratesi / Salvati (2008): Small Area Estimation: The EBLUP Estimator Based on Spatially Correlated Random Area Effects, Statistical Methods & Applications 17, pp. 113–141
◮ Rao / Yu (1994): Small-Area Estimation by Combining Time-Series and Cross-Sectional Data, Canadian Journal of Statistics 22.4, pp. 511–528
◮ Schmid (2011): Spatial Robust Small Area Estimation applied on Business Data, PhD thesis, University of Trier