SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS

Estimation of Complex Small Area Parameters with Application to Poverty Indicators
J.N.K. Rao, School of Mathematics and Statistics, Carleton University
(Joint work with Isabel Molina)

NOTATION
• $U$: finite population of size $N$.
• The population is partitioned into $D$ subsets $U_1, \ldots, U_D$ of sizes $N_1, \ldots, N_D$, called domains or areas.
• Variable of interest: $Y$.
• $Y_{dj}$: value of $Y$ for unit $j$ in domain $d$.
• Target: to estimate the domain parameters
  \[ \delta_d = h(Y_{d1}, \ldots, Y_{dN_d}), \quad d = 1, \ldots, D. \]
• We want to use data from a sample $S \subset U$ of size $n$ drawn from the whole population.
• $S_d = S \cap U_d$: sub-sample from domain $d$, of size $n_d$.
• Problem: $n_d$ is small for some domains.

DIRECT ESTIMATORS
• Direct estimator: an estimator that uses only the sample data from the corresponding domain.
• Small area/domain: a subset of the population that is a target of inference and for which the direct estimator does not have enough precision.
• What does "enough precision" mean? Some National Statistical Offices (GB, Spain) allow a maximum coefficient of variation of 20%.
• Indirect estimator: borrows strength from other areas.

NESTED-ERROR REGRESSION MODEL
• Model: with $x_{dj}$ the auxiliary variables at unit level,
  \[ Y_{dj} = x_{dj}'\beta + u_d + e_{dj}, \quad u_d \overset{iid}{\sim} N(0, \sigma_u^2), \quad e_{dj} \overset{iid}{\sim} N(0, \sigma_e^2). \]
• Vector of variance components: $\theta = (\sigma_u^2, \sigma_e^2)'$.
• BLUP of $\bar{Y}_d$: predict the non-sample values by $\hat{Y}_{dj} = x_{dj}'\hat{\beta}_{WLS} + \hat{u}_d$ and take
  \[ \hat{\bar{Y}}_d^{BLUP} = \frac{1}{N_d}\Big( \sum_{j \in s_d} Y_{dj} + \sum_{j \in r_d} \hat{Y}_{dj} \Big), \quad d = 1, \ldots, D. \]
• Empirical BLUP (EBLUP): with $\hat{\theta}$ an estimator of $\theta$,
  \[ \hat{\bar{Y}}_d^{EBLUP} = \hat{\bar{Y}}_d^{BLUP}(\hat{\theta}). \]
Reference: Battese, Harter & Fuller (1988), JASA.
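As a rough illustration of the BLUP of an area mean, the sketch below assumes that $\beta$ and the variance components are already available and uses the standard nested-error predictor $\hat{u}_d = \gamma_d(\bar{y}_{ds} - \bar{x}_{ds}'\beta)$ with $\gamma_d = \sigma_u^2/(\sigma_u^2 + \sigma_e^2/n_d)$. The function name `blup_area_mean` and its arguments are hypothetical, and the estimation of $\beta$ and $\theta$ is not shown.

```python
import numpy as np

def blup_area_mean(y_s, X_s, X_r, beta, s2u, s2e):
    """BLUP-type predictor of the area mean under the nested-error model.

    Assumes beta, s2u (sigma_u^2) and s2e (sigma_e^2) are given; plugging in
    estimates of the variance components would give the EBLUP.
    """
    y_s = np.asarray(y_s, dtype=float)
    n_d = y_s.shape[0]
    # Shrinkage factor gamma_d = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_d)
    gamma_d = s2u / (s2u + s2e / n_d)
    # Predicted area effect: shrunken mean residual of the sampled units
    u_hat = gamma_d * np.mean(y_s - X_s @ beta)
    # Predict the non-sample values and combine them with the observed ones
    y_r_hat = X_r @ beta + u_hat
    N_d = n_d + X_r.shape[0]
    return (y_s.sum() + y_r_hat.sum()) / N_d

# Toy usage with made-up numbers: two sampled and two non-sampled units
est = blup_area_mean(
    y_s=[1.0, 3.0],
    X_s=np.ones((2, 1)),
    X_r=np.ones((2, 1)),
    beta=np.array([0.0]),
    s2u=1.0,
    s2e=2.0,
)
```

With these toy inputs $\gamma_d = 0.5$, so the mean residual of 2 is shrunk to $\hat{u}_d = 1$ before predicting the two non-sampled units.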

SOME POVERTY AND INCOME INEQUALITY MEASURES
• FGT poverty indicator
• Gini coefficient
• Sen index
• Theil index
• Generalized entropy
• Fuzzy monetary index

FGT POVERTY INDICATORS
• $E_{dj}$: welfare measure for individual $j$ in domain $d$, for instance equivalised annual net income.
• $z$: poverty line.
• FGT family of poverty indicators for domain $d$:
  \[ F_{\alpha d} = \frac{1}{N_d} \sum_{j=1}^{N_d} \Big( \frac{z - E_{dj}}{z} \Big)^{\alpha} I(E_{dj} < z), \quad \alpha = 0, 1, 2. \]
  $\alpha = 0$: poverty incidence
  $\alpha = 1$: poverty gap
  $\alpha = 2$: poverty severity
Reference: Foster, Greer & Thorbecke (1984), Econometrica.
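A minimal sketch of the FGT formula for a single domain, assuming the welfare values of all units in the domain are available as an array. The function name `fgt` and the four income values are made up for illustration; the poverty line of 6557 euros is the Spanish 2006 figure quoted in this deck.

```python
import numpy as np

def fgt(welfare, z, alpha):
    """FGT poverty indicator of order alpha for one domain."""
    e = np.asarray(welfare, dtype=float)
    poor = e < z                                # indicator I(E_dj < z)
    gap = np.where(poor, (z - e) / z, 0.0)      # relative shortfall, 0 if not poor
    # Units above the line contribute 0 for every alpha, including alpha = 0
    return float(np.mean(np.where(poor, gap ** alpha, 0.0)))

incomes = [4000.0, 6000.0, 8000.0, 12000.0]     # hypothetical welfare values
z = 6557.0
incidence = fgt(incomes, z, 0)   # share of units below the line
gap = fgt(incomes, z, 1)         # average relative shortfall
severity = fgt(incomes, z, 2)    # average squared relative shortfall
```

Because the relative shortfall lies in (0, 1] for poor units, the three indicators are ordered: severity is at most the gap, which is at most the incidence.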

FGT POVERTY INDICATORS
• Complex non-linear quantities (non-continuous): even though the FGT poverty indicators are also means,
  \[ F_{\alpha d} = \frac{1}{N_d} \sum_{j=1}^{N_d} F_{\alpha dj}, \quad F_{\alpha dj} = \Big( \frac{z - E_{dj}}{z} \Big)^{\alpha} I(E_{dj} < z), \]
  we cannot assume normality for the $F_{\alpha dj}$.
• It is not easy to obtain small area estimators with good bias and MSE properties.
• A method that is valid for estimating poverty measures in small areas for any $\alpha$, and for other poverty or inequality measures, would be desirable.

SMALL AREA ESTIMATION
• Because the poverty line is relative, poverty usually has low frequency, so a large sample size is needed.
  - In Spain, the poverty line for 2006 was 6557 euros, with approximately 20% of the population under the line.
• The Survey on Income and Living Conditions (EU-SILC) has a limited sample size.
  - In the Spanish SILC 2006, $n = 34{,}389$ out of $N = 43{,}162{,}384$ (about 8 out of 10,000).

SAMPLE SIZES OF PROVINCES BY GENDER
• Direct estimators for Spanish provinces are not very precise.
• Provinces × Gender → Small areas (52 × 2).
• CVs of direct and EB estimators of poverty incidences for 5 selected provinces:

  Province    Gender   n_d    Obs. Poor   CV Dir.   CV EB
  Soria       F          17        6       40.37    16.52
  Tarragona   M         129       18       19.85    16.15
  Córdoba     F         230       73        7.52     6.73
  Badajoz     M         472      175        7.12     3.57
  Barcelona   F        1483      191        6.67     5.37

EB METHOD (EMPIRICAL BEST/BAYES)
• Vector with the population elements for domain $d$:
  \[ y_d = (Y_{d1}, \ldots, Y_{dN_d})' = (y_{ds}', y_{dr}')'. \]
• Target parameter: $\delta_d = h(y_d)$.
• Best estimator: the estimator $\hat{\delta}_d$ that minimizes the MSE is
  \[ \hat{\delta}_d^{B} = E_{y_{dr}}(\delta_d \mid y_{ds}). \]
• Best estimator of $F_{\alpha d}$: we need to express $\delta_d = F_{\alpha d}$ in terms of a vector $y_d = (y_{ds}', y_{dr}')'$, that is $F_{\alpha d} = h_{\alpha}(y_d)$, for which we can derive the distribution of $y_{dr} \mid y_{ds}$.

EB METHOD FOR POVERTY ESTIMATION
• Assumption: there exists a transformation $Y_{dj} = T(E_{dj})$ of the welfare variables $E_{dj}$ which follows a normal distribution (i.e., the nested-error model with normal errors $u_d$ and $e_{dj}$).
• FGT poverty indicator as a function of the transformed variables:
  \[ F_{\alpha d} = \frac{1}{N_d} \sum_{j=1}^{N_d} \Big( \frac{z - T^{-1}(Y_{dj})}{z} \Big)^{\alpha} I\big( T^{-1}(Y_{dj}) < z \big). \]
• EB estimator of $F_{\alpha d}$:
  \[ \hat{F}_{\alpha d}^{EB} = E_{y_{dr}}[F_{\alpha d} \mid y_{ds}], \quad F_{\alpha d} = h_{\alpha}(y_d). \]

EB METHOD FOR POVERTY ESTIMATION
• Distribution: $y_d \overset{ind}{\sim} N(\mu_d, V_d)$, $d = 1, \ldots, D$, where
  \[ y_d = \begin{pmatrix} y_{ds} \\ y_{dr} \end{pmatrix}, \quad \mu_d = \begin{pmatrix} \mu_{ds} \\ \mu_{dr} \end{pmatrix}, \quad V_d = \begin{pmatrix} V_{ds} & V_{dsr} \\ V_{drs} & V_{dr} \end{pmatrix}. \]
• Distribution of $y_{dr}$ given $y_{ds}$:
  \[ y_{dr} \mid y_{ds} \sim N(\mu_{dr|ds}, V_{dr|ds}), \]
  where
  \[ \mu_{dr|ds} = \mu_{dr} + V_{drs} V_{ds}^{-1} (y_{ds} - \mu_{ds}), \quad V_{dr|ds} = V_{dr} - V_{drs} V_{ds}^{-1} V_{dsr}. \]
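The conditional mean and covariance of a partitioned normal vector can be computed directly from these formulas. The sketch below is generic, not specific to the nested-error model; the function name `conditional_normal` and the bivariate toy numbers are illustrative assumptions.

```python
import numpy as np

def conditional_normal(mu_s, mu_r, V_s, V_sr, V_r, y_s):
    """Mean and covariance of y_r | y_s when (y_s, y_r) is jointly normal."""
    V_rs = V_sr.T
    # Solve linear systems instead of forming the inverse of V_s explicitly
    mu_cond = mu_r + V_rs @ np.linalg.solve(V_s, y_s - mu_s)
    V_cond = V_r - V_rs @ np.linalg.solve(V_s, V_sr)
    return mu_cond, V_cond

# Toy bivariate case: Var(y_s) = 2, Var(y_r) = 3, Cov = 1, observed y_s = 2
mu_c, V_c = conditional_normal(
    mu_s=np.array([0.0]),
    mu_r=np.array([0.0]),
    V_s=np.array([[2.0]]),
    V_sr=np.array([[1.0]]),
    V_r=np.array([[3.0]]),
    y_s=np.array([2.0]),
)
```

In this toy case the conditional mean is $0 + 1 \cdot \tfrac{1}{2} \cdot 2 = 1$ and the conditional variance is $3 - 1 \cdot \tfrac{1}{2} \cdot 1 = 2.5$.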

EB METHOD FOR POVERTY ESTIMATION
• For the nested-error model:
  \[ \mu_{dr|ds} = X_{dr}\beta + \sigma_u^2 \, 1_{N_d - n_d} 1_{n_d}' V_{ds}^{-1} (y_{ds} - X_{ds}\beta), \]
  \[ V_{dr|ds} = \sigma_u^2 (1 - \gamma_d) \, 1_{N_d - n_d} 1_{N_d - n_d}' + \sigma_e^2 I_{N_d - n_d}, \]
  where $\gamma_d = \sigma_u^2 (\sigma_u^2 + \sigma_e^2 / n_d)^{-1}$.
• Model for simulations:
  \[ y_{dr} = \mu_{dr|ds} + v_d 1_{N_d - n_d} + \epsilon_{dr}, \]
  with $v_d \sim N\{0, \sigma_u^2 (1 - \gamma_d)\}$ and $\epsilon_{dr} \sim N(0_{N_d - n_d}, \sigma_e^2 I_{N_d - n_d})$.
• We only need to generate $N + D$ univariate normal random variables.
Reference: Molina and Rao (2010), CJS.
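A sketch of the simulation model for one area, assuming $\beta$, $\sigma_u^2$ and $\sigma_e^2$ are given. It uses the closed form $\sigma_u^2 1_{N_d-n_d} 1_{n_d}' V_{ds}^{-1}(y_{ds} - X_{ds}\beta) = \gamma_d \, 1_{N_d-n_d} \, \overline{(y_{ds} - X_{ds}\beta)}$, which follows from $V_{ds} = \sigma_u^2 1_{n_d} 1_{n_d}' + \sigma_e^2 I_{n_d}$. The function names are hypothetical.

```python
import numpy as np

def nested_conditional(y_s, X_s, X_r, beta, s2u, s2e):
    """Closed-form mean and covariance of y_dr | y_ds under the nested-error model."""
    n_d, m = len(y_s), X_r.shape[0]
    gamma_d = s2u / (s2u + s2e / n_d)
    # Conditional mean: synthetic part plus shrunken mean residual of the sample
    mu = X_r @ beta + gamma_d * np.mean(y_s - X_s @ beta)
    V = s2u * (1.0 - gamma_d) * np.ones((m, m)) + s2e * np.eye(m)
    return mu, V, gamma_d

def draw_nonsample(y_s, X_s, X_r, beta, s2u, s2e, rng):
    """One draw of y_dr | y_ds using only univariate normal variates."""
    mu, _, gamma_d = nested_conditional(y_s, X_s, X_r, beta, s2u, s2e)
    v_d = rng.normal(0.0, np.sqrt(s2u * (1.0 - gamma_d)))    # one shared area term
    eps = rng.normal(0.0, np.sqrt(s2e), size=mu.shape[0])    # one term per unit
    return mu + v_d + eps

# Toy usage with made-up data: 3 sampled and 4 non-sampled units
rng = np.random.default_rng(1)
X_s = np.column_stack([np.ones(3), rng.normal(size=3)])
X_r = np.column_stack([np.ones(4), rng.normal(size=4)])
beta = np.array([1.0, 0.5])
y_s = rng.normal(size=3)
y_r = draw_nonsample(y_s, X_s, X_r, beta, 0.4, 0.9, rng)
```

Drawing $v_d$ once and adding independent $\epsilon_{dj}$ reproduces the covariance $V_{dr|ds}$ without building or factorizing an $(N_d - n_d) \times (N_d - n_d)$ matrix.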

MONTE CARLO APPROXIMATION
(a) Generate $L$ non-sample vectors $y_{dr}^{(\ell)}$, $\ell = 1, \ldots, L$, from the (estimated) conditional distribution of $y_{dr} \mid y_{ds}$.
(b) Attach the sample elements to form a population vector $y_d^{(\ell)} = (y_{ds}, y_{dr}^{(\ell)})$, $\ell = 1, \ldots, L$.
(c) Calculate the poverty measure with each population vector, $F_{\alpha d}^{(\ell)} = h_{\alpha}(y_d^{(\ell)})$, $\ell = 1, \ldots, L$. Then take the average over the $L$ Monte Carlo generations:
  \[ \hat{F}_{\alpha d}^{EB} = E_{y_{dr}}[F_{\alpha d} \mid y_{ds}] \approx \frac{1}{L} \sum_{\ell=1}^{L} F_{\alpha d}^{(\ell)}. \]
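Steps (a) to (c) can be sketched as a small generic loop. Here `sampler` stands for any routine that draws one non-sample vector from the estimated conditional distribution and `h` for the target functional; both names, and `eb_monte_carlo` itself, are hypothetical placeholders rather than part of the method's published code.

```python
import numpy as np

def eb_monte_carlo(y_s, sampler, h, L, rng):
    """Monte Carlo approximation of E[h(y_d) | y_ds].

    sampler(rng) must return one draw of the non-sample vector y_dr;
    h maps a full population vector y_d to the parameter of interest.
    """
    y_s = np.asarray(y_s, dtype=float)
    total = 0.0
    for _ in range(L):
        y_r = sampler(rng)                   # (a) draw the non-sample values
        y_d = np.concatenate([y_s, y_r])     # (b) attach the sample values
        total += h(y_d)                      # (c) evaluate the parameter
    return total / L                         # average over the L replicates

# Toy usage: poverty incidence on a log scale, with a made-up sampler
rng = np.random.default_rng(0)
y_s = np.log([4000.0, 9000.0])
z = 6557.0
incidence = lambda y: float(np.mean(np.exp(y) < z))    # assumes T = log
sampler = lambda r: r.normal(np.log(7000.0), 0.5, size=8)
est = eb_monte_carlo(y_s, sampler, incidence, L=200, rng=rng)
```

The observed sample values stay fixed across replicates; only the non-sample part varies, which is what makes the estimator conditional on $y_{ds}$.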

NON-SAMPLED AREAS
• $Y_{dj}^{(\ell)}$, for $j = 1, \ldots, N_d$ and $\ell = 1, \ldots, L$, are generated from
  \[ Y_{dj}^{(\ell)} = x_{dj}'\hat{\beta} + u_d^{(\ell)} + e_{dj}^{(\ell)}, \]
  \[ u_d^{(\ell)} \overset{iid}{\sim} N(0, \hat{\sigma}_u^2); \quad e_{dj}^{(\ell)} \overset{iid}{\sim} N(0, \hat{\sigma}_e^2). \]
• Calculate $F_{\alpha d}^{(\ell)}$ from $\{Y_{dj}^{(\ell)}\}$ and use
  \[ \hat{F}_{\alpha d}^{EB} \simeq \frac{1}{L} \sum_{\ell=1}^{L} F_{\alpha d}^{(\ell)}. \]
• $\hat{F}_{\alpha d}^{EB}$ is a synthetic estimator.
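For an area with no sampled units, the whole population vector is generated from the fitted model. The sketch below assumes the fitted values $\hat{\beta}$, $\hat{\sigma}_u^2$, $\hat{\sigma}_e^2$ are given and that the transformation is $T = \log$, so that `np.exp` maps back to the welfare scale; that choice, the function name `synthetic_fgt` and the toy inputs are assumptions for illustration.

```python
import numpy as np

def synthetic_fgt(X_d, beta_hat, s2u_hat, s2e_hat, z, alpha, L, rng,
                  inv_transform=np.exp):
    """Synthetic EB estimate of F_alpha for an area without sample data."""
    X_d = np.asarray(X_d, dtype=float)
    fitted = X_d @ np.asarray(beta_hat, dtype=float)
    N_d = X_d.shape[0]
    total = 0.0
    for _ in range(L):
        u = rng.normal(0.0, np.sqrt(s2u_hat))             # one area effect per replicate
        e = rng.normal(0.0, np.sqrt(s2e_hat), size=N_d)   # unit-level errors
        welfare = inv_transform(fitted + u + e)           # back to the welfare scale
        poor = welfare < z
        gap = np.where(poor, (z - welfare) / z, 0.0)
        total += np.mean(np.where(poor, gap ** alpha, 0.0))
    return total / L

# Toy usage: 50 units with an intercept and one made-up covariate
rng = np.random.default_rng(2)
X_d = np.column_stack([np.ones(50), rng.normal(size=50)])
est = synthetic_fgt(X_d, [np.log(8000.0), 0.3], 0.05, 0.25,
                    z=6557.0, alpha=0, L=100, rng=rng)
```

Since no $y_{ds}$ is available, nothing area-specific enters beyond the covariates, which is why the resulting estimator is synthetic.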

MSE ESTIMATION
• Construct bootstrap populations $\{Y_{dj}^{*(b)}, b = 1, \ldots, B\}$ from
  \[ Y_{dj}^{*} = x_{dj}'\hat{\beta} + u_d^{*} + e_{dj}^{*}; \quad j = 1, \ldots, N_d, \; d = 1, \ldots, D, \]
  \[ u_d^{*} \overset{iid}{\sim} N(0, \hat{\sigma}_u^2); \quad e_{dj}^{*} \overset{iid}{\sim} N(0, \hat{\sigma}_e^2). \]
• Calculate the bootstrap population parameters $F_{\alpha d}^{*}(b)$.
• From each bootstrap population, take the sample with the same indexes $S$ as in the initial sample and calculate the EB estimates $\hat{F}_{\alpha d}^{EB*}(b)$ using the bootstrap sample data $y_s^{*}$ and the known $x_{dj}$.
  \[ mse^{*}(\hat{F}_{\alpha d}^{EB}) = \frac{1}{B} \sum_{b=1}^{B} \big\{ \hat{F}_{\alpha d}^{EB*}(b) - F_{\alpha d}^{*}(b) \big\}^2. \]
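The bootstrap loop can be sketched generically. In the sketch below `estimator` is a hypothetical callable that receives the bootstrap sample values of all areas and returns one estimate per area; in the actual procedure it would refit the nested-error model and recompute the EB estimates, which is not shown here. `h` is the target functional applied to a full area vector, and the toy usage substitutes a plain sample mean for the EB step.

```python
import numpy as np

def bootstrap_mse(X, sample_idx, beta_hat, s2u_hat, s2e_hat, estimator, h, B, rng):
    """Parametric bootstrap MSE for each of the D areas.

    X: list of (N_d x p) covariate arrays, one per area.
    sample_idx: list of index arrays giving the sampled units of each area.
    """
    D = len(X)
    sq_err = np.zeros(D)
    for _ in range(B):
        # Generate one bootstrap population from the fitted model
        u_star = rng.normal(0.0, np.sqrt(s2u_hat), size=D)
        pops = [X[d] @ beta_hat + u_star[d]
                + rng.normal(0.0, np.sqrt(s2e_hat), size=X[d].shape[0])
                for d in range(D)]
        truth = np.array([h(y) for y in pops])             # bootstrap parameters
        samples = [pops[d][sample_idx[d]] for d in range(D)]
        est = np.asarray(estimator(samples), dtype=float)  # bootstrap estimates
        sq_err += (est - truth) ** 2
    return sq_err / B

# Toy usage: 3 areas of 20 units, the first 5 units of each area sampled
rng = np.random.default_rng(3)
X = [np.ones((20, 1)) for _ in range(3)]
idx = [np.arange(5) for _ in range(3)]
mean_est = lambda samples: [float(np.mean(s)) for s in samples]   # stand-in estimator
mse = bootstrap_mse(X, idx, np.array([1.0]), 0.2, 0.5, mean_est,
                    lambda y: float(np.mean(y)), B=50, rng=rng)
```

Keeping the sample indexes fixed across replicates mirrors the slide's instruction to reuse the indexes $S$ of the initial sample.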

WORLD BANK (WB) / ELL METHOD
• Elbers et al. (2003) also used the nested-error model on the transformed variables $Y_{dj}$, using clusters as $d$.
• For comparability we take the cluster as the small area.
• Generate $A$ bootstrap populations $\{Y_{dj}^{*}(a), a = 1, \ldots, A\}$.
• Calculate $F_{\alpha d}^{*}(a)$, $a = 1, \ldots, A$. Then the ELL estimator is
  \[ \hat{F}_{\alpha d}^{ELL} = \frac{1}{A} \sum_{a=1}^{A} F_{\alpha d}^{*}(a) = F_{\alpha d}^{*}(\cdot). \]