Robust LH stratified sampling strategy Maria Caterina Bramati Sapienza University of Rome Southampton Research Seminar - July 15th 2014 - ( Southampton Research Seminar ) Robust LH stratified sampling strategy 1 / 37
Introduction: Outline
- Motivation
- Robustness issues in Stratified design
- Some proposals
- Simulation Study
- Further issues: Time-dependent survey variables
- Agenda
Introduction: Key features on surveying firms
1. The population has a skewed distribution (a small number of units accounts for a large share of the study variables)
2. Administrative information is available, providing a list of the statistical units of the target population (e.g. tax declarations, social security registers)
3. Survey burden for firms and costs for national statistical institutes (NSIs)
4. Data quality (administrative sources and survey collection)
5. In the EU: compliance requirements established by EUROSTAT
Sampling Strategy: Sampling design
3 main problems:
1. choice of the sampling design
2. sample size determination
3. sample allocation
under some constraints:
- costs related to the surveying process and statistical burdens
- statistical precision
- legal obligations and requirements (EUROSTAT, NBB, ...)
- availability of auxiliary information
Sampling Strategy: Sampling design (stratified sample)
The population is divided into subgroups (or strata) in order to maximize the intra-group 'homogeneity' (according to a chosen target variable) and to minimize the inter-group 'homogeneity'.
It requires
- mutually exclusive strata: each unit can belong to one stratum only
- collectively exhaustive strata: no population unit is excluded
The choices made for problems 1)-3) (sampling design, sample size, sample allocation) should be linked to quality issues of the final statistical product, balancing costs and benefits.
⇒ Target statistical precision is the constraint under which choices are made
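The two requirements on the strata can be illustrated with a small sketch. It assumes strata are defined by cut points on a single size variable, with a unit falling in stratum h when its value lies in (b_{h-1}, b_h]; the function name `assign_strata` and the toy values are hypothetical and not taken from the talk.

```python
from bisect import bisect_left

def assign_strata(x, boundaries):
    """Assign each unit to exactly one stratum by its value of x.

    boundaries must be sorted; a value equal to a boundary goes to the
    lower stratum (assumed convention: b_{h-1} < x <= b_h).
    """
    strata = [[] for _ in range(len(boundaries) + 1)]
    for unit, value in enumerate(x):
        strata[bisect_left(boundaries, value)].append(unit)
    return strata

# Toy size variable for six units (made-up values).
x = [3.0, 10.0, 11.0, 100.0, 250.0, 7.5]
strata = assign_strata(x, [10.0, 100.0])

# Mutually exclusive and collectively exhaustive: every unit appears once.
all_units = sorted(u for s in strata for u in s)
```

Because each unit is placed by a single lookup, no unit can land in two strata and none is left out, which is exactly the partition property the slide asks for.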
Sampling Strategy: HL sampling algorithm
The HT estimator for the total,
\[ \hat{t}_{ystrat} = \sum_{h=1}^{L} \frac{N_h}{n_h} \sum_{k \in S_h} y_k , \]
has variance estimated by
\[ \widehat{\mathrm{Var}}(\hat{t}_{ystrat}) = \sum_{h=1}^{L} N_h \frac{(1 - a_h)}{a_h} s^2_{yh} \qquad (1) \]
where
\[ s^2_{yh} = \frac{1}{n_h - 1} \sum_{k \in S_h} (y_k - \hat{\bar{y}}_h)^2 , \]
and $\hat{\bar{y}}_h$ is the sample mean of Y within stratum h.
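As a rough numerical illustration of the estimator and of formula (1), the sketch below computes both for a toy two-stratum sample. It reads $a_h$ as the sampling fraction $n_h / N_h$; the function names and the data are hypothetical, and each sampled stratum is assumed to hold at least two units so that $s^2_{yh}$ is defined.

```python
from statistics import mean, variance

def stratified_total(samples, N):
    """HT estimate of the total: sum over strata of (N_h / n_h) * sum of y_k."""
    return sum(N_h * mean(y_h) for y_h, N_h in zip(samples, N))

def stratified_variance(samples, N):
    """Estimated variance: sum over strata of N_h * (1 - a_h) / a_h * s2_yh."""
    total = 0.0
    for y_h, N_h in zip(samples, N):
        a_h = len(y_h) / N_h          # sampling fraction n_h / N_h (assumed reading)
        if a_h < 1:                   # a fully enumerated stratum contributes nothing
            total += N_h * (1 - a_h) / a_h * variance(y_h)
    return total

# Made-up sample: 3 of 30 units in stratum 1, 2 of 4 units in stratum 2.
samples = [[1.0, 2.0, 3.0], [10.0, 14.0]]
N = [30, 4]
t_hat = stratified_total(samples, N)
v_hat = stratified_variance(samples, N)
```

With these numbers the stratum contributions are 30 x 2 + 4 x 12 for the total and 270 x 1 + 4 x 8 for the variance, so the first stratum, sampled at only 10%, dominates the uncertainty.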
Sampling Strategy: HL sampling algorithm
The HL algorithm with Neyman allocation represents an optimal solution for the three problems.
\[ n_{\hat{t}_{ystrat}} = N_L + \frac{\sum_{h=1}^{L-1} W_h^2 s^2_{yh} / a_h}{(cY/N)^2 + \sum_{h=1}^{L-1} \frac{W_h}{N} s^2_{yh}} \qquad (2) \]
\[ a_h = \frac{n_h}{N_h} = \frac{W_h s_{yh}}{\sum_{k=1}^{L-1} W_k s_{yk}} \qquad (3) \]
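Formulas (2) and (3) can be evaluated directly once the stratum sizes and standard deviations are fixed. The sketch below does so under several assumptions that are not stated on the slide: $W_h = N_h / N$ is the stratum weight, stratum L is a take-all stratum of size $N_L$, all $s_{yh}$ are strictly positive, and the input numbers are invented for illustration.

```python
def neyman_shares(W, s):
    """Shares a_h from (3): W_h * s_yh divided by the sum over take-some strata."""
    denom = sum(w * sd for w, sd in zip(W, s))
    return [w * sd / denom for w, sd in zip(W, s)]

def required_sample_size(N_some, s_some, N_L, Y, c):
    """Sample size from (2) for take-some strata 1..L-1 plus a take-all stratum L."""
    N = sum(N_some) + N_L
    W = [N_h / N for N_h in N_some]   # assumed definition W_h = N_h / N
    a = neyman_shares(W, s_some)
    num = sum(w ** 2 * sd ** 2 / a_h for w, sd, a_h in zip(W, s_some, a))
    den = (c * Y / N) ** 2 + sum(w * sd ** 2 for w, sd in zip(W, s_some)) / N
    return N_L + num / den, a

# Hypothetical inputs: two take-some strata, five take-all units, target c = 5%.
n, a = required_sample_size(N_some=[80, 15], s_some=[2.0, 8.0], N_L=5, Y=1000.0, c=0.05)
```

Under the Neyman shares the numerator collapses to $(\sum_h W_h s_{yh})^2$, here $2.8^2 = 7.84$, which is a convenient check on any implementation.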
Sampling Strategy: HL sampling algorithm
The idea of the HL algorithm is to find the optimal strata boundaries $b_1, \ldots, b_{L-1}$ which minimize the sample size $n_{\hat{t}_{ystrat}}$ subject to a required precision $c$, with some appropriate sampling allocation (Neyman, proportional, ...).
However
1. $s^2_{yh}$ is unknown ⇒ use of auxiliary information X for Y
2. the number L of strata is selected by the user
3. low quality of the administrative records: outliers?
BUT the auxiliary information X ≠ Y, the target variable.
⇒ modified HL algorithm (Rivest, 2002): the discrepancy between Y and X is estimated
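The boundary search can be made concrete with a brute-force sketch. This is not the iterative HL procedure: it simply tries every admissible pair of cut points on a small simulated population and keeps the pair with the smallest sample size. It assumes X is used in place of Y, Neyman shares, a take-all top stratum, and at least two units in every take-some stratum; the population, the seed and the function names are all hypothetical.

```python
import random
from itertools import combinations
from statistics import stdev

def sample_size(x_sorted, cuts, c):
    """Sample size n for strata given by index cut points on the sorted x."""
    N = len(x_sorted)
    total = sum(x_sorted)
    edges = [0, *cuts, N]
    strata = [x_sorted[i:j] for i, j in zip(edges[:-1], edges[1:])]
    take_some, take_all = strata[:-1], strata[-1]
    W = [len(s) / N for s in take_some]
    sd = [stdev(s) for s in take_some]
    # With Neyman shares, sum of W_h^2 s_h^2 / a_h equals (sum of W_h s_h)^2.
    num = sum(w * d for w, d in zip(W, sd)) ** 2
    den = (c * total / N) ** 2 + sum(w * d * d for w, d in zip(W, sd)) / N
    return len(take_all) + num / den

def best_cuts(x_sorted, L, c):
    """Exhaustive search over cut points (a stand-in for the HL iterations)."""
    N = len(x_sorted)
    candidates = (
        cuts for cuts in combinations(range(2, N), L - 1)
        if all(hi - lo >= 2 for lo, hi in zip(cuts[:-1], cuts[1:]))
    )
    return min(candidates, key=lambda cuts: sample_size(x_sorted, cuts, c))

random.seed(0)
x = sorted(random.lognormvariate(0.0, 1.0) for _ in range(60))  # skewed toy population
cuts = best_cuts(x, L=3, c=0.05)
n_best = sample_size(x, cuts, c=0.05)
boundaries = [x[i - 1] for i in cuts]  # largest x value in each take-some stratum
```

Exhaustive search is only feasible for tiny populations and few strata, which is why an iterative scheme is needed in practice; the sketch is meant to show what is being minimized, not how to do it efficiently.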
Sampling Strategy: The effects of outliers in the HL sampling algorithm
Types of anomalies:
- erroneous records in the surveyed data (Y) (vertical outliers)
- quality issues in the administrative registers (X) (leverage)
- outliers in both variables (X, Y) (good/bad leverages)
⇒ Unreliable conditional mean and variance of Y | X, affecting
- sample size
- strata bounds
- sample allocation
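A small experiment can show how a single bad register entry distorts the sample size. The sketch below holds the strata fixed, which is a simplification since in the full algorithm the bounds would move as well, and compares the sample size with and without one misrecorded value (40 entered as 40000). It assumes Neyman shares and a take-all top stratum; the data and the function name `n_required` are invented for illustration.

```python
from statistics import stdev

def n_required(take_some, take_all, c):
    """Sample size for fixed strata, using X in place of Y and Neyman shares."""
    N = sum(len(s) for s in take_some) + len(take_all)
    total = sum(sum(s) for s in take_some) + sum(take_all)
    W = [len(s) / N for s in take_some]
    sd = [stdev(s) for s in take_some]
    num = sum(w * d for w, d in zip(W, sd)) ** 2
    den = (c * total / N) ** 2 + sum(w * d * d for w, d in zip(W, sd)) / N
    return len(take_all) + num / den

# Clean register: two take-some strata and two take-all units (made-up values).
clean = [[float(v) for v in range(1, 41)], [float(v) for v in range(50, 70)]]
take_all = [500.0, 800.0]

# Same register with one entry recorded in the wrong unit: 40 becomes 40000.
dirty = [clean[0][:-1] + [40000.0], clean[1]]

n_clean = n_required(clean, take_all, c=0.05)
n_dirty = n_required(dirty, take_all, c=0.05)
sd_clean, sd_dirty = stdev(clean[0]), stdev(dirty[0])
```

One contaminated record out of 62 is enough to inflate the first stratum's standard deviation by orders of magnitude, and the required sample size grows with it, because both the variance term and the Neyman share of that stratum are driven by the outlier.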