Aligning estimates from different surveys using Empirical Likelihood methods EWA KABZINSKA AND YVES G. BERGER THE UNIVERSITY OF SOUTHAMPTON
OUTLINE
1. INTRODUCTION
   1. WHY IS IT BENEFICIAL TO COMBINE INFORMATION?
   2. CURRENT APPROACHES
   3. WHY EMPIRICAL LIKELIHOOD?
   4. EMPIRICAL LOGLIKELIHOOD FUNCTION
   5. CONSTRAINTS
2. POINT ESTIMATION
   1. ESTIMATION OF SCALE LOADS
   2. ESTIMATING EQUATIONS
   3. EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
   4. COMPARISON WITH OTHER APPROACHES
3. CONFIDENCE REGIONS
4. SUMMARY
INTRODUCTION • Often different surveys carried out independently in the same population measure some common variables • Population level parameters associated with these common variables may be unknown or unreliable • Examples: • Household size and composition, tenure type • Income, expenditure • Educational attainment • Ethnic origin
INTRODUCTION WHY IS IT BENEFICIAL TO COMBINE INFORMATION? • CONSISTENCY - both samples give the same point estimate for the common variables • IMPROVED PRECISION - "borrowing strength" from the other samples
INTRODUCTION • Using information from two surveys, we want to obtain a single set of positive weights which: • give the same estimates for the unknown totals of the common variables ($Z$), • capture additional benchmark constraints ($X_1$ and $X_2$), • may be used for estimation of other population-level parameters ($\theta_1^N$ and $\theta_2^N$) • Once the weights are created, each survey can be analysed separately
INTRODUCTION CURRENT APPROACHES • GREG estimators with an enlarged number of predictors by Zieschang [1], Renssen and Nieuwenbroek [2] and Merkouris [3] • Pseudo Empirical Likelihood estimator of Wu [4] • Single-sample Empirical Likelihood approach for complex sampling designs by Berger and De La Riva Torres [5] OTHER RELEVANT WORK • Model-based projection estimator of Kim and Rao (2011) • Weighted Empirical Likelihood approach to the common mean problem by Tsao and Wu (2006)
INTRODUCTION WHY EMPIRICAL LIKELIHOOD? • Variables of interest are often skewed (e.g. income, expenditure) - Empirical Likelihood (EL) is a nonparametric approach • EL makes it easy to incorporate additional benchmark constraints • Asymmetric, data-driven confidence regions may be obtained easily, without relying on variance estimation • Weights are positive by definition
INTRODUCTION EMPIRICAL LOGLIKELIHOOD FUNCTION
\[
\ell(\boldsymbol{m}) = \ell(\boldsymbol{m}_1, \boldsymbol{m}_2) = \sum_{i \in s_1} \log m_{1i} + \sum_{j \in s_2} \log m_{2j} \tag{1}
\]
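INTRODUCTION CONSTRAINTS
\[
\sum_{i \in s_1} m_{1i}\, \pi_{1i} = n_1, \qquad \sum_{j \in s_2} m_{2j}\, \pi_{2j} = n_2 \tag{2}
\]
\[
\sum_{i \in s_1} m_{1i}\, f(x_{1i}, \vartheta_1) = 0, \qquad \sum_{j \in s_2} m_{2j}\, f(x_{2j}, \vartheta_2) = 0 \tag{3}
\]
\[
\sum_{i \in s_1} m_{1i}\, z_{1i} = \sum_{j \in s_2} m_{2j}\, z_{2j} \tag{4}
\]
\[
m_{1i} > 0, \qquad m_{2j} > 0 \tag{5}
\]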
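POINT ESTIMATION ESTIMATION OF SCALE LOADS
Find
\[
\hat{\boldsymbol{m}} = \arg\max_{\boldsymbol{m}} \Big\{ \ell(\boldsymbol{m}) = \sum_{i \in s_1 \cup s_2} \log m_i \Big\}
\]
Solution:
\[
\hat{m}_i = (\pi_i + \boldsymbol{\eta}^{T} \boldsymbol{c}_i)^{-1} \tag{6}
\]
where $\pi_i$ is the inclusion probability of the $i$-th unit, $\boldsymbol{\eta}$ is the vector of Lagrange multipliers and $\boldsymbol{c}_i$ is the vector of constraints (2)-(5)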
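The form of solution (6) can be illustrated numerically. The sketch below is not the authors' algorithm: it is a plain damped Newton iteration on the Lagrange multipliers, written in pure Python, and it keeps only constraints (2), (4) and (5), leaving out the benchmark constraint (3) for brevity. The function names `solve_linear` and `aligned_el_weights` and the toy data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - rest) / aug[r][r]
    return x


def aligned_el_weights(pi1, z1, pi2, z2, tol=1e-10, max_iter=100):
    """Return (m1, m2) maximising sum(log m) under constraints (2), (4) and (5)."""
    pi = list(pi1) + list(pi2)
    # c_i stacks the constraint functions: (pi, 0, z) in sample 1, (0, pi, -z) in sample 2.
    c = [(p, 0.0, z) for p, z in zip(pi1, z1)] + [(0.0, p, -z) for p, z in zip(pi2, z2)]
    target = [float(len(pi1)), float(len(pi2)), 0.0]  # right-hand sides n_1, n_2, 0

    def denominators(eta):
        # pi_i + eta^T c_i, the denominator of (6)
        return [p + sum(e * ck for e, ck in zip(eta, ci)) for p, ci in zip(pi, c)]

    def dual(eta):
        d = denominators(eta)
        if min(d) <= 0.0:
            return math.inf  # would give a non-positive weight, violating (5)
        return sum(e * tk for e, tk in zip(eta, target)) - sum(math.log(v) for v in d)

    eta = [0.0, 0.0, 0.0]  # start from m_i = 1 / pi_i
    for _ in range(max_iter):
        d = denominators(eta)
        grad = [target[k] - sum(ci[k] / di for ci, di in zip(c, d)) for k in range(3)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(ci[a] * ci[b] / di ** 2 for ci, di in zip(c, d)) for b in range(3)]
                for a in range(3)]
        step = solve_linear(hess, [-g for g in grad])
        t, current = 1.0, dual(eta)
        # Halve the Newton step until the dual does not increase (keeps all weights positive).
        while t > 1e-12 and dual([e + t * s for e, s in zip(eta, step)]) > current:
            t /= 2.0
        eta = [e + t * s for e, s in zip(eta, step)]
    m = [1.0 / v for v in denominators(eta)]
    return m[:len(pi1)], m[len(pi1):]


# Toy data (made up): inclusion probabilities and the common variable z in each sample.
pi1, z1 = [0.2, 0.25, 0.4, 0.5], [3.0, 5.0, 4.0, 8.0]
pi2, z2 = [0.25, 0.25, 0.5, 0.5, 0.2], [2.0, 6.0, 5.0, 9.0, 4.0]
m1, m2 = aligned_el_weights(pi1, z1, pi2, z2)
```

With these weights both samples should give the same weighted total of z, which is the alignment property asked for in (4). A production implementation would also need to handle infeasible constraint sets, which this sketch does not attempt.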
POINT ESTIMATION ESTIMATING EQUATIONS
Let $\theta_1^N$ and $\theta_2^N$ be fixed, unknown population-level parameters of interest, solutions to:
\[
\sum_{i \in U} g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{i \in U} g_{2i}(y_{2i}, \theta_2) = 0 \tag{7}
\]
Example: $g_{ti}(y_{ti}, \theta_t) = y_{ti} - \theta_t\, \pi_i\, n_t^{-1}$
Aim: point estimators for $\theta_1^N$ and $\theta_2^N$
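To see what the example estimating function targets, a short derivation may help. It writes $g_{ti}$ for the estimating function (a symbol assumed here), reads $\pi_i$ as the inclusion probability of unit $i$ in survey $t$, and assumes a fixed-size design, so that the inclusion probabilities sum to $n_t$ over the population $U$.

```latex
\sum_{i \in U} g_{ti}(y_{ti}, \theta_t)
  = \sum_{i \in U} y_{ti} - \theta_t \, n_t^{-1} \sum_{i \in U} \pi_i
  = \sum_{i \in U} y_{ti} - \theta_t = 0
\quad \Longrightarrow \quad
\theta_t^N = \sum_{i \in U} y_{ti}
```

Under these assumptions the example corresponds to the population total of $y_t$.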
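POINT ESTIMATION EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
\[
r(\theta_1, \theta_2) = 2 \left\{ \ell(\hat{\boldsymbol{m}}) - \ell(\hat{\boldsymbol{m}}^{*}, \theta_1, \theta_2) \right\} \tag{8}
\]
where $\hat{\boldsymbol{m}}$ maximises $\ell(\boldsymbol{m})$ under constraints (2)-(5), and $\hat{\boldsymbol{m}}^{*}$ maximises it under constraints (2)-(5) together with
\[
\sum_{i \in s_1} m^{*}_{1i}\, g_{1i}(y_{1i}, \theta_1) = 0, \qquad \sum_{j \in s_2} m^{*}_{2j}\, g_{2j}(y_{2j}, \theta_2) = 0
\]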
POINT ESTIMATION EMPIRICAL LOGLIKELIHOOD RATIO FUNCTION
Point estimators for $\theta_1^N$, $\theta_2^N$:
\[
(\hat{\theta}_1, \hat{\theta}_2) = \arg\min_{\theta_1, \theta_2} r(\theta_1, \theta_2) \tag{9}
\]
SOME ASYMPTOTIC PROPERTIES OF THE ESTIMATOR: • Equivalent to a GREG-family estimator making use of information from both samples • Root-n consistent
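A short worked step may clarify how (9) can be evaluated. It is a sketch which assumes that the weights $\hat{m}$ obtained under constraints (2)-(5) admit values of $\theta_1$ and $\theta_2$ solving the weighted estimating equations, and it reuses the example function $g_{ti}(y_{ti}, \theta_t) = y_{ti} - \theta_t \pi_i n_t^{-1}$ (the symbol $g$ is an assumption).

```latex
% r is non-negative because \hat{m}^* maximises over a subset of the set used for \hat{m}
r(\theta_1, \theta_2) \ge 0, \qquad
r(\theta_1, \theta_2) = 0 \ \text{ if } \
\sum_{i \in s_t} \hat{m}_{ti}\, g_{ti}(y_{ti}, \theta_t) = 0, \quad t = 1, 2.

% For the example function, constraint (2) gives \sum_{i \in s_t} \hat{m}_{ti} \pi_i = n_t, hence
\sum_{i \in s_t} \hat{m}_{ti}\, y_{ti}
  - \theta_t \, n_t^{-1} \sum_{i \in s_t} \hat{m}_{ti}\, \pi_i = 0
\quad \Longrightarrow \quad
\hat{\theta}_t = \sum_{i \in s_t} \hat{m}_{ti}\, y_{ti}
```

In this special case the minimiser in (9) is a weighted total computed with the aligned weights.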
POINT ESTIMATION COMPARISON WITH OTHER APPROACHES Two test populations: 1. Skewed distribution generated according to the model proposed in [6] 2. 2006 British Expenditure and Food Survey [7] • x: number of people living in the household and number of rooms in the household • z: gross weekly income • y: total gross expenditure and total expenditure on housing (scenario 7 in Table 1), total expenditure on clothing and total expenditure on housing (scenario 8), total expenditure on clothing and total expenditure on food (scenario 9)
POINT ESTIMATION COMPARISON WITH OTHER APPROACHES • 10 000 iterations • Two independent samples selected by systematic random sampling • Tested estimators: • The proposed Empirical Likelihood estimator (EL), • Wu's Pseudo Empirical Likelihood estimator (PEL; labelled WU in Table 1) [4], • Renssen and Nieuwenbroek's GREG-type estimator (RN) [2], • Zieschang's GREG-type estimator (ZG) [1]
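The sampling step of the simulation can be sketched as follows. The slides do not say whether the systematic samples were drawn with equal or unequal probabilities, so this sketch assumes the equal-probability case with a fractional sampling interval; the function name `systematic_sample` and the seed are hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, rng=random):
    """Equal-probability systematic sample of unit indices (fractional-interval method)."""
    interval = population_size / sample_size      # sampling interval k = N / n
    start = rng.random() * interval               # random start in [0, k)
    picks = (int(start + j * interval) for j in range(sample_size))
    # Guard against a floating-point round-up past the last unit.
    return [min(p, population_size - 1) for p in picks]


rng = random.Random(2006)
s1 = systematic_sample(6645, 500, rng)   # two independent draws from the same frame
s2 = systematic_sample(6645, 500, rng)
```

Each unit has inclusion probability n / N under this scheme, and the two calls use independent random starts.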
POINT ESTIMATION RELATIVE BIASES OF THE ESTIMATORS

                                 Estimators of $\theta_1^N$         Estimators of $\theta_2^N$
Scenario  N       n_1   n_2      EL      WU      RN      ZG         EL      WU      RN      ZG
Generated data
1         100000  1000  1000     0.01%  -0.02%   0.19%  -0.16%     -0.03%  -0.06%  -0.16%  -0.17%
2         100000   200   400     0.01%   0.01%  -0.99%  -0.76%     -0.01%  -0.11%  -0.37%  -0.53%
3         100000   200   200     0.01%   0.13%  -0.76%  -0.64%      0.02%  -0.06%  -0.62%  -0.68%
4           2500   160   160     0.00%  -0.04%  -1.14%  -0.98%     -0.02%  -0.12%  -0.97%  -1.09%
5           2500   140   260    -0.01%   0.15%  -1.28%  -0.98%      0.00%  -0.13%  -0.51%  -0.72%
6           2500   240   240     0.01%   0.13%  -0.76%  -0.64%      0.02%  -0.06%  -0.62%  -0.68%
Expenditure and Food Survey data
7           6645   500   500    -0.11%   0.07%  -0.57%  -0.31%     -0.05%   0.21%  -0.56%  -0.20%
8           6645   500   500     0.38%   0.44%  -0.07%   0.03%      0.06%   0.06%  -0.38%  -0.35%
9           6645   500   500     0.07%   0.07%  -0.38%  -0.30%      0.01%   0.01%  -0.36%  -0.32%

TABLE 1. RELATIVE BIASES OF THE PROPOSED EMPIRICAL LIKELIHOOD ESTIMATOR (EL), WU'S PSEUDO EMPIRICAL LIKELIHOOD ESTIMATOR [4] (WU), GREG ESTIMATORS PROPOSED BY ZIESCHANG [1] (ZG) AND RENSSEN AND NIEUWENBROEK [2] (RN)
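The slides do not state the formula behind the relative biases. A minimal sketch, assuming the usual Monte Carlo definition (mean of the simulated estimates minus the true value, divided by the true value), is given below; the function name `relative_bias` and the numbers are made up for illustration and are not taken from Table 1.

```python
def relative_bias(estimates, true_value):
    """Relative bias in percent: 100 * (mean of estimates - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_value) / true_value


# Toy illustration with made-up numbers (not taken from Table 1).
print(f"{relative_bias([99.0, 101.0, 103.0], 100.0):.2f}%")  # prints 1.00%
```

In the simulation described above, `estimates` would hold the 10 000 values of one estimator and `true_value` the corresponding population parameter.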
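CONFIDENCE REGIONS
Under some regularity conditions,
\[
r(\theta_1^N, \theta_2^N) \to \chi^2_{2} \quad \text{in distribution} \tag{10}
\]
The $(1-\alpha)$ Wilks-type confidence region for $(\theta_1^N, \theta_2^N)$ is constructed by choosing:
\[
\left\{ (\theta_1, \theta_2) : r(\theta_1, \theta_2) \le \chi^2_{df=2,\alpha} \right\} \tag{11}
\]
CONFIDENCE INTERVALS obtained using a numerical algorithm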
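The cut-off in (11) has a closed form: a chi-square variable with 2 degrees of freedom has survival function exp(-c / 2), so its upper α-quantile is -2 log α. The sketch below uses this to test membership of the region and to locate a boundary point by bisection along a ray. The bisection is only a stand-in for the numerical algorithm mentioned on the slide, which is not described there, and `toy_r` is a hypothetical replacement for the real log-likelihood ratio function.

```python
import math


def chi2_df2_upper_quantile(alpha):
    """Upper alpha-quantile of chi-square with 2 df: solves exp(-c / 2) = alpha."""
    return -2.0 * math.log(alpha)


def in_region(r_value, alpha=0.05):
    """True if a point with log-likelihood ratio r_value lies in the (1 - alpha) region."""
    return r_value <= chi2_df2_upper_quantile(alpha)


def boundary_along_ray(r, centre, direction, alpha=0.05, iters=60):
    """Bisection for the point centre + t * direction (t >= 0) where r reaches the cut-off.

    Assumes r is at most the cut-off at the centre and grows without bound along the ray.
    """
    cut = chi2_df2_upper_quantile(alpha)

    def point(t):
        return tuple(c + t * d for c, d in zip(centre, direction))

    lo, hi = 0.0, 1.0
    while r(*point(hi)) <= cut:      # expand until outside the region
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if r(*point(mid)) <= cut:
            lo = mid
        else:
            hi = mid
    return point(lo)


def toy_r(theta1, theta2):
    """Hypothetical stand-in for r, minimised at (10, 5)."""
    return ((theta1 - 10.0) / 2.0) ** 2 + (theta2 - 5.0) ** 2


edge = boundary_along_ray(toy_r, centre=(10.0, 5.0), direction=(1.0, 0.0))
```

Repeating the ray search over a grid of directions traces the boundary of the region; with the real r each evaluation requires re-solving the constrained maximisation in (8).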
SUMMARY • We present an Empirical Likelihood approach to combining information from multiple surveys in the presence of benchmark and consistency constraints • This approach may be used to estimate a wide class of parameters and can be used in complex sampling designs • Under the tested scenarios, the proposed point estimator shows satisfactory performance compared to the other available estimators in terms of relative bias • The main advantage lies in the possibility to construct confidence regions using the $\chi^2$ approximation of the empirical log likelihood ratio function • A numerical algorithm for constructing confidence intervals is proposed • Although the proposed method entails some numerical operations, it is still less computationally intensive than methods such as the bootstrap and relatively easy to implement
LITERATURE
[1] K. D. Zieschang. Sample weighting methods and estimation of totals in the Consumer Expenditure Survey. Journal of the American Statistical Association, 85(412) (1990), 986-1001.
[2] R. H. Renssen and N. J. Nieuwenbroek. Aligning estimates for common variables in two or more sample surveys. Journal of the American Statistical Association, 92(437) (1997), 368-374.
[3] T. Merkouris. Combining independent regression estimators from multiple surveys. Journal of the American Statistical Association, 99(468) (2004), 1131-1139.
[4] C. Wu. Combining information from multiple surveys through the empirical likelihood method. Canadian Journal of Statistics, 32(1) (2004), 15-26.
[5] Y. G. Berger and O. De La Riva Torres. Empirical likelihood confidence intervals for complex sampling designs. Southampton Statistical Sciences Research Institute (S3RI Methodology Working Papers) (2012).
[6] C. Wu and J. N. K. Rao. Pseudo empirical likelihood ratio confidence intervals for complex surveys. Canadian Journal of Statistics, 34 (2006), 359-375.
[7] Office for National Statistics and Department for Environment, Food and Rural Affairs. Expenditure and Food Survey, 2006 [computer file]. 3rd Edition. Colchester, Essex: UK Data Archive [distributor], July 2009. SN: 5986.