Eideh, A. Informative Nonignorable Universidad Carlos III de Madrid

Parametric and Semiparametric Prediction of Finite Population Total Under Informative Sampling and Nonignorable Nonresponse (IN)
(Theory and In Progress)

Abdulhakeem Eideh
Department of Mathematics
Al-Quds University
Abu-Dees Campus, Al-Quds, Palestine
E-mail: msabdul@staff.alquds.edu

Date: 7 November 2019
Department of Statistics, Universidad Carlos III de Madrid

Outline

Introduction
Sample distribution
Response distribution
Estimation of response weights
General theory
Parametric prediction
Semiparametric prediction
Simple ratio population model
Multiple regression population model
Conclusions

Introduction

Following Pfeffermann et al. (1998), survey data may be viewed as the outcome of two random processes:

The process generating the values in the finite population, often referred to as the 'superpopulation model'.
The process selecting the sample data from the finite population values, known as the 'sample selection mechanism'.

Analytic inference from survey data relates to the superpopulation model.

Standard analysis of survey data often fails to account for the complex nature of the sampling design (stratification, clustering, unequal probability of selection, informative sampling).

The sample design affects the analysis because the models in use typically do not incorporate all the design variables that determine the sample selection, either because there are too many of them or because they are not of substantive interest.

However, if the sampling design is informative, standard estimates of the model parameters can be severely biased, possibly leading to false inference, since the sample distribution differs from that of the population.

Three methods for dealing with the effect of unequal selection probabilities and informative sampling are discussed in the literature.

Classical methods:

Probability weighting
Pseudo-likelihood estimation

To overcome the difficulties associated with classical inference procedures, Pfeffermann et al. (1998) proposed a third method for cross-sectional survey data: using the sample distribution induced by the assumed population model under informative sampling.

Eideh (2002, PhD thesis) fitted time series models for longitudinal survey data and models for two-stage cluster samples (SAE; prediction and estimation) under informative sampling, and treated nonignorable nonresponse as informative sampling.
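
The bias under informative sampling, and the effect of probability weighting, can be illustrated with a small simulation. This is only a sketch under stated assumptions, none of which come from the slides: a normal population without covariates, Poisson sampling, and selection probabilities proportional to exp(a1 * y). The names `a1`, `n_expected` and the numeric values are hypothetical.

```python
import math
import random

random.seed(2019)

# Hypothetical settings: population size, expected sample size, model values.
N, n_expected = 50_000, 2_000
mu, sigma, a1 = 10.0, 2.0, 0.3

# Finite population values generated from the superpopulation model N(mu, sigma^2).
y = [random.gauss(mu, sigma) for _ in range(N)]

# Informative design (assumption): selection probability grows with y itself.
size = [math.exp(a1 * yi) for yi in y]
total_size = sum(size)
pi = [min(1.0, n_expected * m / total_size) for m in size]

# Poisson sampling: unit i enters the sample independently with probability pi_i.
# Each sampled unit carries its sampling weight w_i = 1 / pi_i.
sample = [(yi, 1.0 / p) for yi, p in zip(y, pi) if random.random() < p]

population_mean = sum(y) / N
unweighted_mean = sum(yi for yi, _ in sample) / len(sample)
# Probability-weighted (Hajek-type) estimator of the population mean.
weighted_mean = sum(w * yi for yi, w in sample) / sum(w for _, w in sample)

print("population mean:", round(population_mean, 3))
print("unweighted sample mean:", round(unweighted_mean, 3))
print("probability-weighted mean:", round(weighted_mean, 3))
```

Under these assumptions the unweighted sample mean overshoots the population mean because large values of y are over-represented in the sample, while the weighted mean stays close to it.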

Sample distribution (distribution after selection)

The sample distribution is the distribution of the observed outcomes given the selected sample.

Let $U = \{1, \dots, N\}$ denote a finite population consisting of $N$ units.
$y$ is the study variable of interest.
$y_i$ is the value of $y$ for the $i$th population unit.
$x_i$, $i \in U$, is the value of the auxiliary variable(s) $x$.

$z = (z_1, \dots, z_N)$ are the values of a known design variable, used for the sample selection process but not included in the working model under consideration (secondary data analysis).
$\pi_i = \Pr(i \in s) > 0$ are the first-order selection probabilities.
$w_i = 1/\pi_i$, $i = 1, \dots, N$, are the sampling weights.
$y_1, \dots, y_N$ are independent random variables with pdf $f_p(y_i \mid x_i, \theta)$, indexed by a vector parameter $\theta$ (distribution before selection).
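
Pfeffermann et al. (1998): the sample pdf of $y_i$ is

$$
\begin{aligned}
f_s(y_i \mid x_i, \theta, \gamma) &= f_p(y_i \mid x_i, \theta, \gamma, i \in s) \\
&= \frac{\Pr(i \in s \mid x_i, y_i, \gamma)\, f_p(y_i \mid x_i, \theta)}{\Pr(i \in s \mid x_i, \theta, \gamma)} \\
&= \frac{E_p(\pi_i \mid x_i, y_i, \gamma)\, f_p(y_i \mid x_i, \theta)}{E_p(\pi_i \mid x_i, \theta, \gamma)},
\end{aligned}
$$

where

$$
E_p(\pi_i \mid x_i, \theta, \gamma) = \int E_p(\pi_i \mid x_i, y_i, \gamma)\, f_p(y_i \mid x_i, \theta)\, dy_i .
$$

Modeling $E_p(\pi_i \mid x_i, y_i, \gamma)$, with informativeness parameter $\gamma$:
Pfeffermann et al. (1999): linear and exponential models.
Eideh (2002): logit and probit models.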
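
A worked special case may help fix ideas. The following is a sketch under assumptions that are not on the slide: no auxiliary variable, a normal population pdf $f_p(y_i \mid \theta) = N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$, and an exponential model for the conditional expectation of the selection probability with hypothetical coefficients $\gamma = (a_0, a_1)$.

$$
\begin{aligned}
E_p(\pi_i \mid y_i, \gamma) &= \exp(a_0 + a_1 y_i), \\
E_p(\pi_i \mid \theta, \gamma) &= \int \exp(a_0 + a_1 y)\, f_p(y \mid \theta)\, dy
  = \exp\!\left(a_0 + a_1 \mu + \tfrac{1}{2} a_1^2 \sigma^2\right), \\
f_s(y_i \mid \theta, \gamma) &= \exp\!\left(a_1 y_i - a_1 \mu - \tfrac{1}{2} a_1^2 \sigma^2\right)
  \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right) \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\left(y_i - \mu - a_1 \sigma^2\right)^2}{2\sigma^2}\right).
\end{aligned}
$$

So in this case the sample pdf is again normal, $N(\mu + a_1 \sigma^2, \sigma^2)$: the variance is unchanged and the mean is shifted by $a_1 \sigma^2$. The sample and population pdfs coincide only when $a_1 = 0$, that is, when the design is noninformative.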

Based on the sample data $\{y_i, x_i, w_i;\ i \in s\}$ we can estimate $(\theta, \gamma)$ in two steps:

Step one: Estimate $\gamma$ by $\tilde{\gamma}$, using

$$
E_s(w_i \mid x_i, y_i, \gamma) = \frac{1}{E_p(\pi_i \mid x_i, y_i, \gamma)} .
$$

Step two: Maximize over $\theta$ the sample log-likelihood

$$
l_{rs}(\theta, \tilde{\gamma}) = \sum_{i=1}^{n} \left[ \log f_p(y_i \mid x_i, \theta) + \log E_s(w_i \mid x_i, \theta, \tilde{\gamma}) \right].
$$

Variance estimation of $\hat{\theta}$: Pfeffermann and Sverchkov (2003), Eideh and Nathan (2006, 2009), and Eideh (2009) have considered the use of the inverse information matrix for estimating the variance of the maximum sample likelihood estimators, but they treat the estimates of the informativeness parameters as fixed. For a theoretical justification of this practice, see Bonnéry et al. (2018).
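
The two steps can be sketched in code for one simple case. Everything specific here is an assumption made for illustration and is not taken from the slides: a normal population without covariates, selection probabilities proportional to exp(a1 * y + u) with independent noise u, and a log-linear regression of the weights on y as the step-one fit. Under the exponential model E_s(w | theta, gamma) is proportional to exp(-a1 * mu - a1^2 * sigma^2 / 2), so the step-two maximizer has the closed form used below; for other models the log-likelihood would need numerical maximization.

```python
import math
import random

random.seed(7)

# Hypothetical settings (not from the slides).
N, n_expected = 50_000, 2_000
mu, sigma, a1, tau = 10.0, 2.0, 0.3, 0.2

# Population values and an informative Poisson sampling design:
# pi_i is proportional to exp(a1 * y_i + u_i), with u_i ~ N(0, tau^2).
y = [random.gauss(mu, sigma) for _ in range(N)]
size = [math.exp(a1 * yi + random.gauss(0.0, tau)) for yi in y]
total = sum(size)
pi = [min(1.0, n_expected * m / total) for m in size]
sample = [(yi, 1.0 / p) for yi, p in zip(y, pi) if random.random() < p]

ys = [yi for yi, _ in sample]
log_w = [math.log(w) for _, w in sample]
n = len(ys)

# Step one: estimate the informativeness coefficient from the sample weights.
# Under the assumed model log(w_i) is linear in y_i with slope -a1,
# so an ordinary least squares slope gives the estimate a1_tilde.
y_bar = sum(ys) / n
lw_bar = sum(log_w) / n
s_yy = sum((yi - y_bar) ** 2 for yi in ys)
s_yw = sum((yi - y_bar) * (lw - lw_bar) for yi, lw in zip(ys, log_w))
a1_tilde = -s_yw / s_yy

# Step two: maximize sum_i [log f_p(y_i | mu, sigma2) + log E_s(w_i | mu, sigma2, a1_tilde)].
# Setting the derivatives to zero gives the closed-form solution below.
sigma2_hat = s_yy / n
mu_hat = y_bar - a1_tilde * sigma2_hat

print("naive sample mean:", round(y_bar, 3))
print("a1_tilde:", round(a1_tilde, 3))
print("mu_hat:", round(mu_hat, 3), "sigma2_hat:", round(sigma2_hat, 3))
```

The naive sample mean estimates mu + a1 * sigma^2 rather than mu; the step-two estimate removes that shift using the step-one estimate of a1.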

Response Distribution

Three random processes:
1. The process generating the values in the finite population,
2. The sample selection process (sampling design),
3. The nonresponse process.

See Pfeffermann et al. (1998) for informative sample selection.

For the inference problem, Little (1982) classifies the nonresponse mechanism as ignorable (MAR and MCAR) or nonignorable (NMAR).

Cross Classification of Sampling Design and Nonresponse Mechanism

Sampling design     Nonresponse mechanism
                    ignorable       nonignorable
informative         II-Observed     IN-Missing
noninformative      NI-Observed     NN-Observed

Brick (2013): "… Thus, bias is often the largest component of mean square error of the estimates."

Notations

$U = \{1, \dots, N\}$: finite population.
$y_i$: value of $y$ for the $i$th population unit.
$x_i$, $i \in U$: value of the auxiliary variable $x$.
$z = (z_1, \dots, z_N)$: values of known design variables.
$\pi_i = \Pr(i \in s \mid x, y, z) > 0$: selection probability.
$w_i = 1/\pi_i$, $i = 1, \dots, N$: sampling weight.
$y_1, \dots, y_N$: independent random variables with pdf $f_p(y_i \mid x_i, \theta)$.

$I_i = 1$ if unit $i \in U$ is sampled, $I_i = 0$ otherwise.
$s = \{i \mid i \in U, I_i = 1\}$: sample, of size $n$.
$\bar{s} = \{i \mid i \in U, I_i = 0\}$: nonsampled units.
$R_i = 1$ if unit $i \in s$ is observed, $R_i = 0$ otherwise.
$r = \{i \in s \mid R_i = 1\}$: response set.
$\bar{r} = \{i \in s \mid R_i = 0\}$: nonresponse set.
$\psi_i = \Pr(i \in r \mid x, y, v) > 0$: response probability.
$\phi_i = 1/\psi_i$: response weight.
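
Nonsampled distribution, Sverchkov and Pfeffermann (2004):

$$
f_{\bar{s}}(y_i \mid x_i, \theta, \gamma) = f_p(y_i \mid x_i, \theta, \gamma, i \notin s)
= \frac{E_p(1 - \pi_i \mid x_i, y_i, \gamma)\, f_p(y_i \mid x_i, \theta)}{E_p(1 - \pi_i \mid x_i, \theta, \gamma)}
$$

Response distribution, Eideh (2002, 2007):

$$
f_r(y_i \mid x_i, \theta, \gamma, \eta) = f_p(y_i \mid x_i, \theta, \gamma, \eta, i \in r)
= \frac{E_s(\psi_i \mid x_i, y_i, \eta)\, f_s(y_i \mid x_i, \theta, \gamma)}{E_s(\psi_i \mid x_i, \theta, \gamma, \eta)}
$$

Nonresponse distribution, Eideh (2007, 2009):

$$
f_{\bar{r}}(y_i \mid x_i, \theta, \gamma, \eta) = f_p(y_i \mid x_i, \theta, \gamma, \eta, i \in \bar{r})
= \frac{E_s(1 - \psi_i \mid x_i, y_i, \eta)\, f_s(y_i \mid x_i, \theta, \gamma)}{E_s(1 - \psi_i \mid x_i, \theta, \gamma, \eta)}
$$

Here $\psi_i$ denotes the response probability of unit $i$, $\eta$ the parameter of the response model, and $\bar{r}$ the nonresponse set.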
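
For prediction it can be useful to express the nonsampled pdf through quantities that are estimable from the sample. The short derivation below is a sketch that uses only the relation $E_s(w_i \mid x_i, y_i) = 1/E_p(\pi_i \mid x_i, y_i)$ stated earlier and its analogue $E_s(w_i \mid x_i) = 1/E_p(\pi_i \mid x_i)$; the parameters $\theta$ and $\gamma$ are suppressed to keep the notation light.

$$
\begin{aligned}
f_p(y_i \mid x_i) &= \frac{E_p(\pi_i \mid x_i)}{E_p(\pi_i \mid x_i, y_i)}\, f_s(y_i \mid x_i)
  = E_p(\pi_i \mid x_i)\, E_s(w_i \mid x_i, y_i)\, f_s(y_i \mid x_i), \\
E_p(1 - \pi_i \mid x_i, y_i)\, f_p(y_i \mid x_i)
  &= f_p(y_i \mid x_i) - E_p(\pi_i \mid x_i)\, f_s(y_i \mid x_i)
  = E_p(\pi_i \mid x_i)\, E_s(w_i - 1 \mid x_i, y_i)\, f_s(y_i \mid x_i), \\
E_p(1 - \pi_i \mid x_i) &= E_p(\pi_i \mid x_i)\, E_s(w_i - 1 \mid x_i), \\
f_{\bar{s}}(y_i \mid x_i) &= \frac{E_s(w_i - 1 \mid x_i, y_i)\, f_s(y_i \mid x_i)}{E_s(w_i - 1 \mid x_i)} .
\end{aligned}
$$

The last line depends only on the sample pdf and on expectations of the sampling weights under the sample distribution, so both can in principle be modeled from the observed data.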