CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVEYS
Rodolphe Priam, Natalie Shlomo
Southampton Statistical Sciences Research Institute, University of Southampton, United Kingdom
SAE, August 2011
The BLUE-ETS Project is financed by grant agreement no. 244767 under Theme 8 of the 7th Framework Programme (FP7) of the European Union, Socio-economic Sciences and Humanities.
Page 1 Trier - August 2011
BUSINESS SURVEYS
• Statistical units are organisational entities in a country
• Interest lies in small area/domain estimates
• Business registers provide unit-level covariates
• Distributions are typically skewed, with outliers
• Transformations, such as the log, are applied so that normality assumptions are more plausible
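SMALL AREA ESTIMATION
• Central problem in many areas of social statistics; recently also used in business statistics.
• Estimation of the mean in diverse domains (areas $1, 2, \ldots, i, \ldots, m, \ldots, M$):
  True means: $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_i, \ldots, \bar{Y}_m, \ldots, \bar{Y}_M$
  Design-based estimates: $\hat{\bar{Y}}_{1;w}, \hat{\bar{Y}}_{2;w}, \ldots, \hat{\bar{Y}}_{i;w}, \ldots, \hat{\bar{Y}}_{m;w}, \ldots, \hat{\bar{Y}}_{M;w}$
• True population mean $\bar{Y}_i$ and design-based estimate $\hat{\bar{Y}}_{i;w}$: the latter is imprecise because of the small area sample size $n_i$
• Estimated small area mean (EBLUP): $\hat\theta_{i;y}$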
SMALL AREA ESTIMATION AND BENCHMARKING
• Small area estimates in the different domains and their aggregation to a total:
  True values: $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_i, \ldots, \bar{Y}_m, \ldots, \bar{Y}_M$
  Model-based estimates: $\hat\theta_{1;y}, \hat\theta_{2;y}, \ldots, \hat\theta_{i;y}, \ldots, \hat\theta_{m;y}, \ldots, \hat\theta_{M;y}$
• Problem: the total estimated by the model, $\tilde{T}_y = \sum_i w_i \hat\theta_{i;y}$, should match the design-based estimate of the population total, $\hat{T}_y = \sum_i w_i \hat{\bar{Y}}_{i;w}$.
• Solution: benchmark the estimates with an appropriate method
• Consequence: estimation that is more robust to misspecification of the model.
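NESTED ERROR UNIT LEVEL MODEL
• The Battese, Harter and Fuller (1988) (BHF) model for small areas $i = 1, \ldots, M$:
  $Y_i = X_i \beta + 1_{N_i} u_i + e_i$
• The target parameter of interest is the area mean:
  $\bar{Y}_i = 1_{N_i}' Y_i / N_i$
• The EBLUP for non-negligible sampling fractions:
  $\hat\theta^{f}_{i;y} = f_i \bar{y}_i + (1 - f_i)\,[\bar{X}_{ic}' \hat\beta_{GLS} + \hat{u}_i]$
  where $f_i = n_i / N_i$ is the sampling fraction, $\bar{y}_i$ the sample mean, and $\bar{X}_{ic}$ the covariate mean of the non-sampled units in area $i$.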
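The EBLUP formula above can be sketched numerically as follows. This is a minimal sketch rather than the authors' implementation: it assumes the variance components are already estimated and passed in, that every area is sampled, and that the shrinkage factor is the usual $\gamma_i = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2 / n_i)$. The function name `eblup_area_means` and its argument names are hypothetical.

```python
import numpy as np

def eblup_area_means(y, X, area, Xbar_c, N, sigma_u2, sigma_e2):
    """EBLUP of area means under the nested error model (illustrative sketch).

    y: (n,) sample responses; X: (n, p) sample covariates, intercept included;
    area: (n,) integer area labels 0..m-1; Xbar_c: (m, p) covariate means of
    the non-sampled units; N: (m,) area population sizes.
    Assumption: sigma_u2 and sigma_e2 are given, not estimated here.
    """
    m, p = Xbar_c.shape
    A = np.zeros((p, p))
    b = np.zeros(p)
    n = np.zeros(m)
    ybar = np.zeros(m)
    xbar = np.zeros((m, p))
    gamma = np.zeros(m)
    for i in range(m):
        idx = area == i
        yi, Xi = y[idx], X[idx]
        n[i] = len(yi)
        ybar[i], xbar[i] = yi.mean(), Xi.mean(axis=0)
        gamma[i] = sigma_u2 / (sigma_u2 + sigma_e2 / n[i])
        # Accumulate X_i' V_i^{-1} X_i and X_i' V_i^{-1} y_i, using
        # V_i^{-1} = (I - (gamma_i / n_i) 1 1') / sigma_e2.
        A += (Xi.T @ Xi - gamma[i] * n[i] * np.outer(xbar[i], xbar[i])) / sigma_e2
        b += (Xi.T @ yi - gamma[i] * n[i] * xbar[i] * ybar[i]) / sigma_e2
    beta = np.linalg.solve(A, b)           # GLS estimate of beta
    u_hat = gamma * (ybar - xbar @ beta)   # predicted area effects
    f = n / N                              # sampling fractions
    theta = f * ybar + (1 - f) * (Xbar_c @ beta + u_hat)
    return theta, beta, u_hat
```

In practice the variance components would be estimated first (for example by REML) and then plugged in, which is what makes the predictor "empirical".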
BENCHMARKING AT THE LINEAR SCALE (1/2)
• Existing methods considered (see for instance Wang et al. (2008)):
  o The ratio method, with a multiplicative term:
    $\hat\theta^{RT}_{i;y} = \hat\theta^{f}_{i;y}\, \hat{T}_y\, (\tilde{T}^{f}_y)^{-1}$
  o An additive term with variance weighting:
    $\hat\theta^{VAR}_{i;y} = \hat\theta^{f}_{i;y} + \dfrac{N_i (\hat\sigma_u^2 + \hat\sigma_e^2 / n_i)}{\sum_{k=1}^{m} N_k^2 (\hat\sigma_u^2 + \hat\sigma_e^2 / n_k)}\, (\hat{T}_y - \tilde{T}^{f}_y)$
  o Pfeffermann and Barnard (1991):
    $\hat\theta^{PB}_{i;y} = f_i \bar{y}_i + (1 - f_i)\,[\bar{X}_{ic}' \hat\beta_{PB} + \hat{u}^{PB}_i]$
    where $\hat\eta = (\hat\beta_{GLS}', \hat{u}_1, \ldots, \hat{u}_M)'$, $\hat\eta_{PB} = \hat\eta - \hat{C}_{\eta} R' (R \hat\eta - r) / (R \hat{C}_{\eta} R')$, $r = \hat{T}_y - \sum_i n_i \bar{y}_i$, and
    $R = \left( \sum_{i=1}^{M} N_i X_i,\; N_1 - n_1,\; N_2 - n_2,\; \ldots,\; N_m - n_m,\; N_{m+1},\; \ldots,\; N_M \right)$
• Ugarte et al. (2009) applied this constrained model to a business survey covering several regions, with variance calculations
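The ratio and variance-weighted adjustments listed above can be sketched as follows. This is a minimal illustration, assuming that totals aggregate area means with weights $N_i$ and that the design-based total is supplied by the caller; the function names are hypothetical and the variance weights follow the general form discussed by Wang et al. (2008).

```python
def ratio_benchmark(theta, N, T_design):
    """Scale every area estimate by one common factor so totals match."""
    T_model = sum(Ni * th for Ni, th in zip(N, theta))
    return [th * T_design / T_model for th in theta]

def variance_weighted_benchmark(theta, N, n, sigma_u2, sigma_e2, T_design):
    """Additive adjustment; areas with larger direct variance move more."""
    T_model = sum(Ni * th for Ni, th in zip(N, theta))
    # Assumed variance term for area i: sigma_u^2 + sigma_e^2 / n_i
    v = [sigma_u2 + sigma_e2 / ni for ni in n]
    denom = sum(Ni ** 2 * vi for Ni, vi in zip(N, v))
    lam = [Ni * vi / denom for Ni, vi in zip(N, v)]  # sum_i N_i * lam_i == 1
    return [th + l * (T_design - T_model) for th, l in zip(theta, lam)]
```

Both functions return estimates whose $N_i$-weighted sum equals `T_design`; the ratio method changes all areas by the same relative amount, while the additive method shifts the least reliable areas the most.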
BENCHMARKING AT THE LINEAR SCALE (2/2)
• We propose the following method: augment the unconstrained least-squares system by adding one row and one column to the original GLS system:
  $\begin{pmatrix} y_s \\ y_{+;a} \end{pmatrix} = \begin{pmatrix} X_s & w_a \\ X_{+;a}' & w_{+;a} \end{pmatrix} \beta_{PSW} + e_a$
  where $w_a = (w_{1;a}', w_{2;a}', \ldots, w_{m;a}')'$, and the added entries $w_{i;a}$, $X_{+;a}$, $y_{+;a}$ and $w_{+;a}$ are built, as sums over the $m$ sampled areas, from $N_i$, $n_i$, $N_i - n_i$, $\hat\gamma_i$ and $\bar{X}_{ic}$.
• The benchmarking equation follows from the orthogonality of the residual to the added column
SIMULATION FOR LINEAR CASE
• Nested error unit level regression model
• B = 1000 populations generated
• M = 30 areas (no empty areas)
• Sampling fraction $f_i \approx 4\%$
• $\sigma_u = 0.1$, $\sigma_e = 0.3$, and $\beta = (2, 0.25)^T$
• $x_{ij} \sim N(m_i, s_i)$; $m_i \sim N(10, 3)$; $s_i = 2$
[Figures: one generated population; two areas in the population]
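One population from this design could be generated along the following lines. This is a sketch under stated assumptions, not the authors' simulation code: the area size `N_i = 500` is hypothetical (the slides do not give it), the second argument of each $N(\cdot, \cdot)$ is read as a standard deviation, and the sample is drawn by simple random sampling without replacement within each area.

```python
import numpy as np

def generate_population(M=30, N_i=500, f=0.04, beta=(2.0, 0.25),
                        sigma_u=0.1, sigma_e=0.3, seed=0):
    """Generate one nested-error population and a within-area sample."""
    rng = np.random.default_rng(seed)
    m_i = rng.normal(10.0, 3.0, size=M)      # area-level covariate means
    u = rng.normal(0.0, sigma_u, size=M)     # area random effects
    area = np.repeat(np.arange(M), N_i)      # area label of every unit
    x = rng.normal(m_i[area], 2.0)           # unit covariates, s_i = 2
    e = rng.normal(0.0, sigma_e, size=M * N_i)
    y = beta[0] + beta[1] * x + u[area] + e  # nested error unit level model
    n_i = int(round(f * N_i))                # sample size per area
    # Assumed design: simple random sampling without replacement per area.
    sample = np.concatenate([
        rng.choice(np.flatnonzero(area == i), size=n_i, replace=False)
        for i in range(M)
    ])
    return x, y, area, sample
```

Calling the function B = 1000 times with different seeds would give the replicated populations described on the slide.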
SIMULATION RESULT FOR LINEAR CASE (1/2)
Estimators:
  1  EBLUP, $\hat\theta^{f}_{i;y}$
  2  Ratio benchmark, $\hat\theta^{RT}_{i;y}$
  3  Variance weighted benchmark, $\hat\theta^{VAR}_{i;y}$
  4  Pfeffermann and Barnard benchmark, $\hat\theta^{PB}_{i;y}$
  5  Proposed method benchmark, $\hat\theta^{PSW}_{i;y}$

            1          2        3        4        5
BIASREL     0.06%      0.58%    0.60%    0.60%    0.60%
AARB        0.04%      0.60%    0.62%    0.62%    0.62%
ARMSE       1.31%      1.45%    1.46%    1.46%    1.47%
DIFFTOT     4.0x10^2   0.000    0.000    0.000    0.000
SIMULATION RESULT FOR LINEAR CASE (2/2)
Estimators: 1 EBLUP; 2 Ratio benchmark; 3 Variance weighted benchmark; 4 Pfeffermann and Barnard benchmark; 5 Proposed method benchmark
[Figure: results by area (areas 1 to 30) for estimators 1 to 5; vertical axis from -0.004 to 0.012]
LOG TRANSFORMATION FOR SKEWED VARIABLE
• In the BHF model, $y_{ij} = x_{ij} \beta + u_i + e_{ij}$
• In business surveys, distributions are skewed
  o Log-normal transformation: $z_{ij} = \exp(x_{ij} \beta + u_i + e_{ij})$, so that $y_{ij} = \log z_{ij}$ follows the BHF model
  o New formulation of the predictors
BACK-TRANSFORMATION WITH BIAS CORRECTION
• A nearly unbiased estimator is:
  $\hat\theta^{f,sum}_{i;z} = f_i \bar{z}_i + (1 - f_i) \sum_{j \in U_i \setminus s_i} \exp(\hat{y}_{ij} + \hat\alpha_i)$   (1)
  The bias correction $\hat\alpha_i$ can be defined at the unit level or area level (see Chambers, Dorfman (2003) and Molina (2009))
• Another formulation, from Kurnia, Notodiputro, Chambers (2009):
  $\hat\theta^{*,exp}_{i;z} = \exp(\hat\theta^{*}_{i;y} + \tilde\alpha_i)$   (2)
  o The bias correction $\tilde\alpha_i$ is the modified term at the area level
  o We propose the corrective term $\tilde\alpha_{i2}$ and compare it to $\tilde\alpha_{i1}$, where $\hat\Sigma_i$ is the covariance matrix of the covariates (it enters $\tilde\alpha_{i2}$ through the term $\hat\beta' \hat\Sigma_i \hat\beta / 2$).
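The two back-transformations, (1) and (2), can be illustrated with a short sketch. It assumes the log-scale predictions and the bias-correction term are already available, and it takes the sum over the non-sampled units in (1) as an average so that the result is an area mean; both function names are hypothetical.

```python
import math

def back_transform_sum(z_sample, y_hat_nonsample, alpha, N_i):
    """Method (1): observed sample values plus bias-corrected unit predictions.

    z_sample: observed original-scale values in area i;
    y_hat_nonsample: log-scale predictions for the non-sampled units.
    Assumption: the non-sample sum is averaged to give an area mean.
    """
    n_i = len(z_sample)
    f_i = n_i / N_i
    z_bar = sum(z_sample) / n_i
    pred_mean = (sum(math.exp(yh + alpha) for yh in y_hat_nonsample)
                 / len(y_hat_nonsample))
    return f_i * z_bar + (1 - f_i) * pred_mean

def back_transform_area(theta_y, alpha):
    """Method (2): exponentiate the log-scale area estimate plus a correction."""
    return math.exp(theta_y + alpha)
```

Method (1) needs unit-level covariates for every non-sampled unit, whereas method (2) only needs the area-level estimate on the log scale, which is why the choice of the area-level correction term matters more for (2).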
BACK-TRANSFORMATION WITH BIAS CORRECTION
• Approaches under model (1)
  o Chambers, Dorfman (2003) introduce several estimators, including the rast predictor and the smearing predictor
  o Fabrizi, Ferrante, Pacei (2007) compare estimators to a naïve predictor without a bias correction; the twiced smeared estimator performed best in simulation
  o Chandra, Chambers (2011) discuss calibration after a log-transformation
BENCHMARKING AFTER BACK-TRANSFORMATION
Compare benchmarking at different stages, with back-transformation and bias correction by:
  (a) $\hat\alpha_i = (\hat\sigma_u^2 + \hat\sigma_e^2) / 2$, or
  (b) $\tilde\alpha_{i2} = \hat\alpha_i + \hat\beta' \hat\Sigma_i \hat\beta / 2$
• Ratio method under different scenarios
  o No benchmark at log scale, back-transformed method (2), bias correction (a): $\hat\theta^{f,RT}_{i;z}$
  o Benchmark at log scale, back-transformed method (2), bias correction (a): $\hat\theta^{VAR,RT}_{i;z}$, $\hat\theta^{PB,RT}_{i;z}$, $\hat\theta^{PSW,RT}_{i;z}$
  o No benchmark at log scale, back-transformed method (1), bias correction (a): $\hat\theta^{f,sum,RT}_{i;z}$
  o No benchmark at log scale, back-transformed method (2), bias correction (b): $\hat\theta^{f2,RT}_{i;z}$
• A maximization of the log-likelihood of the BHF model under constraints, back-transformed method (2) and bias correction (b): $\hat\theta^{MLC}_{i;z}$
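The bias corrections (a) and (b) and a ratio adjustment on the original scale can be sketched as follows. This is an illustration under assumptions: `beta` holds only the slope coefficients that match the covariate covariance matrix `Sigma_i`, totals aggregate area means with weights $N_i$, and the design-based total on the original scale is supplied by the caller. All function names are hypothetical.

```python
import math

def alpha_a(sigma_u2, sigma_e2):
    """Bias correction (a): half the total model variance on the log scale."""
    return (sigma_u2 + sigma_e2) / 2.0

def alpha_b(sigma_u2, sigma_e2, beta, Sigma_i):
    """Bias correction (b): (a) plus half the quadratic form beta' Sigma_i beta."""
    p = len(beta)
    quad = sum(beta[r] * Sigma_i[r][c] * beta[c]
               for r in range(p) for c in range(p))
    return alpha_a(sigma_u2, sigma_e2) + quad / 2.0

def back_transform_then_ratio(theta_y, alphas, N, T_design_z):
    """Back-transform with method (2), then ratio-adjust to the design total."""
    theta_z = [math.exp(t + a) for t, a in zip(theta_y, alphas)]
    T_model = sum(Ni * t for Ni, t in zip(N, theta_z))
    return [t * T_design_z / T_model for t in theta_z]
```

Passing log-scale estimates that were already benchmarked (VAR, PB or PSW) into `back_transform_then_ratio` corresponds to the "benchmark at log scale" scenarios; passing the plain EBLUP corresponds to the "no benchmark at log scale" scenarios.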
SIMULATION RESULT FOR NON-LINEAR CASE (1/2)
Scenarios:
  o No benchmark at log scale, back-transformed method (2), bias correction (a), ratio adjusted
  o Benchmark at log scale, back-transformed method (2), bias correction (a), ratio adjusted
  o No benchmark at log scale, back-transformed method (1), bias correction (a), ratio adjusted
  o No benchmark at log scale, back-transformed method (2), bias correction (b), ratio adjusted
  o MLC adjustment, back-transformed method (2), bias correction (b)

NOT BENCHMARKED (columns 1a to 6a cover the estimators $\hat\theta^{f}_{i;z}$, $\hat\theta^{f2}_{i;z}$, $\hat\theta^{f,sum}_{i;z}$, $\hat\theta^{VAR}_{i;z}$, $\hat\theta^{PB}_{i;z}$, $\hat\theta^{PSW}_{i;z}$)

            1a         2a         3a         4a         5a         6a
BIASREL     0.39%      11.16%     0.47%      8.77%      8.77%      8.75%
AARB        0.66%      10.89%     0.28%      8.50%      8.49%      8.49%
ARMSE       5.81%      12.05%     5.75%      10.01%     10.01%     10.02%
DIFFTOT     5.6x10^4   3.0x10^5   7.1x10^4   2.5x10^5   2.5x10^5   2.5x10^5

BENCHMARKED (columns 1b to 7b cover the estimators $\hat\theta^{f,RT}_{i;z}$, $\hat\theta^{f2,RT}_{i;z}$, $\hat\theta^{f,sum,RT}_{i;z}$, $\hat\theta^{VAR,RT}_{i;z}$, $\hat\theta^{PB,RT}_{i;z}$, $\hat\theta^{PSW,RT}_{i;z}$, $\hat\theta^{MLC}_{i;z}$)

            1b       2b       3b       4b       5b       6b       7b
BIASREL     2.99%    2.84%    3.03%    2.83%    2.87%    2.90%    2.58%
AARB        3.30%    3.15%    3.34%    3.15%    3.18%    3.20%    2.89%
ARMSE       6.87%    6.84%    6.90%    6.84%    6.86%    6.90%    6.69%
DIFFTOT     0.00     0.00     0.00     0.00     0.00     0.00     0.00