Using the Superpopulation Model for Imputations and Variance - PowerPoint PPT Presentation

Using the Superpopulation Model for Imputations and Variance Computation in Survey Sampling
Petr Novák, Václav Kosina
Czech Statistical Office

Introduction

Situation: Let us have a population of $N$ units: $n$ sampled ($sam$) and $N-n$ unknown ($imp$). We want to estimate the population total $Y = \sum_{i=1}^{N} y_i$.

Model assumptions:
$y_i = \beta x_i + \epsilon_i$,
$\epsilon_i$ are independent random variables with $E\epsilon_i = 0$ and $\operatorname{var}\epsilon_i = c_i\sigma^2$,
$x_i$ and $c_i$ are known constants for all $i = 1, \dots, N$,
$\beta$ and $\sigma^2$ are unknown parameters.

Imputation

Estimation: Estimate $\beta$ from the sampled part using the least squares method:

$$\hat\beta = \frac{\sum_{sam} w_i x_i y_i / c_i}{\sum_{sam} w_i x_i^2 / c_i},$$

where $w_i$ are some appropriate weights.

Note: constant weights and $c_i = x_i$ give $\hat\beta = \dfrac{\sum_{sam} y_i}{\sum_{sam} x_i}$.

Data imputation: For each unit from the unknown part we impute $\hat y_i = x_i\hat\beta$. The estimate of the population total is then

$$\hat Y = \sum_{sam} y_i + \sum_{imp} \hat y_i.$$
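The estimation and imputation steps above can be sketched in a few lines of NumPy. This is only an illustration: the function name `fit_beta` and the toy numbers are assumptions made for the example, not part of the presentation.

```python
import numpy as np

def fit_beta(x, y, c, w=None):
    """Weighted least-squares slope for y_i = beta * x_i + eps_i
    with var(eps_i) = c_i * sigma^2 (hypothetical helper)."""
    x, y, c = (np.asarray(a, dtype=float) for a in (x, y, c))
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    return np.sum(w * x * y / c) / np.sum(w * x**2 / c)

# Toy data, made up for illustration only
x_sam = np.array([10.0, 20.0, 30.0, 40.0])   # auxiliary variable, sampled units
y_sam = np.array([12.0, 19.0, 33.0, 41.0])   # observed values, sampled units
x_imp = np.array([15.0, 25.0, 50.0])         # auxiliary variable, unknown units

# Constant weights and c_i = x_i: beta_hat reduces to sum(y) / sum(x)
beta_hat = fit_beta(x_sam, y_sam, c=x_sam)

y_imp_hat = beta_hat * x_imp                 # imputed values for the unknown part
Y_hat = y_sam.sum() + y_imp_hat.sum()        # estimated population total
```

With $c_i = x_i$ the result coincides with the ratio estimator mentioned in the note, which makes the example easy to check by hand.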

Differences from classic techniques

Classic reweighting approach: $y_i$ are treated as constants. Randomness enters through the sample inclusion indicators. The error is computed through $\operatorname{var}\hat Y$.

Superpopulation model approach: $y_i$ are treated as random variables. The real $y_i$ from the imputed part are predicted with $\hat y_i = x_i\hat\beta$. The error is computed through $\operatorname{mse}\hat Y = E(\hat Y - Y)^2$.

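Error computation

The least squares estimator is unbiased ($E\hat\beta = \beta$). Therefore $E\hat y_i = E x_i\hat\beta = x_i\beta = E y_i$, and hence $E\hat Y_{imp} = E Y_{imp}$, where $Y_{imp} = \sum_{imp} y_i$ and $\hat Y_{imp} = \sum_{imp} \hat y_i$. The sampled part contributes no error, so the mean square error of the prediction is

$$\begin{aligned}
\operatorname{mse}\hat Y &= E(\hat Y - Y)^2 = E(\hat Y_{imp} - Y_{imp})^2 \\
&= E(\hat Y_{imp} - E\hat Y_{imp} - Y_{imp} + EY_{imp})^2 \\
&= E(\hat Y_{imp} - E\hat Y_{imp})^2 + E(Y_{imp} - EY_{imp})^2 - 2E(\hat Y_{imp} - E\hat Y_{imp})(Y_{imp} - EY_{imp}) \\
&= \operatorname{var}\hat Y_{imp} + \operatorname{var}Y_{imp}.
\end{aligned}$$

The cross term vanishes because $\hat Y_{imp}$ depends only on the $\epsilon_i$ of the sampled units, which are independent of the $\epsilon_i$ of the imputed part.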
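The decomposition can be checked by simulation. The following is a rough Monte Carlo sketch under assumed conditions that are not in the slides: normally distributed errors, $c_i = x_i$, $w_i \equiv 1$, and made-up values for $\beta$, $\sigma$ and $x_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = 2.0, 1.0                        # assumed true parameters
x_sam = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
x_imp = np.array([15.0, 25.0, 50.0])
R = 100_000                                   # number of simulated populations

# y_i = beta * x_i + eps_i with var(eps_i) = x_i * sigma^2, i.e. c_i = x_i
y_sam = beta * x_sam + rng.normal(size=(R, x_sam.size)) * sigma * np.sqrt(x_sam)
y_imp = beta * x_imp + rng.normal(size=(R, x_imp.size)) * sigma * np.sqrt(x_imp)

# Least squares with w_i = 1 and c_i = x_i, one estimate per replication
beta_hat = y_sam.sum(axis=1) / x_sam.sum()
Y_imp_hat = beta_hat * x_imp.sum()            # predicted total of the imputed part
Y_imp = y_imp.sum(axis=1)                     # true total of the imputed part

mse_emp = np.mean((Y_imp_hat - Y_imp) ** 2)   # empirical E(Y_imp_hat - Y_imp)^2
var_sum = Y_imp_hat.var() + Y_imp.var()       # var(Y_imp_hat) + var(Y_imp)
```

Up to simulation noise, `mse_emp` and `var_sum` should agree, since the empirical covariance between the two totals is close to zero.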

Variance computation

The variance of the estimated values is

$$\operatorname{var}\hat Y_{imp} = \operatorname{var}(X_{imp}\hat\beta) = X_{imp}^2 \operatorname{var}\hat\beta = X_{imp}^2\, \frac{\sum_{sam} w_i^2 x_i^2 / c_i}{\left(\sum_{sam} w_i x_i^2 / c_i\right)^2}\, \sigma^2,$$

where $X_{imp} = \sum_{imp} x_i$. We denote $\operatorname{var}\hat\beta$ as $\sigma^2_\beta$.

The variance of the predicted real values is

$$\operatorname{var}Y_{imp} = \sum_{imp} c_i \sigma^2.$$

Denote $C_{imp} := \sum_{imp} c_i$. We get

$$\operatorname{mse}\hat Y = X_{imp}^2 \sigma^2_\beta + C_{imp}\sigma^2.$$

Possible estimators for $\sigma^2$:

$$\frac{1}{n-1}\sum_{sam} \frac{(y_i - \hat\beta x_i)^2}{c_i}, \qquad \frac{1}{\sum_{sam} w_i - \bar w}\sum_{sam} \frac{w_i (y_i - \hat\beta x_i)^2}{c_i}.$$
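A minimal sketch of how these formulas could be put together in NumPy follows. The function name `mse_total` and the toy data are assumptions for illustration; the weighted estimator of $\sigma^2$ is used, which equals the unweighted one when the weights are constant.

```python
import numpy as np

def mse_total(x_sam, y_sam, c_sam, x_imp, c_imp, w=None):
    """Estimated mse of the imputed total (hypothetical helper).
    Returns (mse_hat, beta_hat, sigma2_hat)."""
    x, y, c = (np.asarray(a, dtype=float) for a in (x_sam, y_sam, c_sam))
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)

    B = np.sum(w * x**2 / c)                       # denominator of beta_hat
    beta_hat = np.sum(w * x * y / c) / B

    # Weighted estimator of sigma^2 with divisor sum(w) - mean(w)
    resid2 = (y - beta_hat * x) ** 2 / c
    sigma2_hat = np.sum(w * resid2) / (np.sum(w) - np.mean(w))

    var_beta = np.sum(w**2 * x**2 / c) / B**2 * sigma2_hat   # sigma_beta^2
    X_imp = np.sum(x_imp)
    C_imp = np.sum(c_imp)
    return X_imp**2 * var_beta + C_imp * sigma2_hat, beta_hat, sigma2_hat

# Toy data, made up for illustration; c_i = x_i and constant weights
x_sam = np.array([10.0, 20.0, 30.0, 40.0])
y_sam = np.array([12.0, 19.0, 33.0, 41.0])
x_imp = np.array([15.0, 25.0, 50.0])

mse_hat, beta_hat, sigma2_hat = mse_total(x_sam, y_sam, x_sam, x_imp, x_imp)
```

Returning $\hat\beta$ and $\hat\sigma^2$ alongside the mse keeps the intermediate quantities available for inspection.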

Special cases

If $w_i \equiv \text{const.}$ and $c_i = x_i$, we get $\sigma^2_\beta = \dfrac{1}{X_{sam}}\sigma^2$ and therefore

$$\operatorname{mse}\hat Y = X_{imp}^2 \frac{\sigma^2}{X_{sam}} + X_{imp}\sigma^2 = \frac{X_{imp} X_{all}}{X_{sam}}\sigma^2.$$

If we have no auxiliary information available and set $x_i \equiv 1$, we impute the sample mean for each unit. We then get the commonly used formula

$$\operatorname{mse}\hat Y = \frac{(N-n)N}{n}\sigma^2 = N^2\left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n}.$$
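As a quick numerical check of the case $x_i \equiv 1$, the sketch below (toy numbers, assumed for illustration) confirms that $\hat\beta$ is the sample mean and that the general expression and the commonly used formula agree.

```python
import numpy as np

y_sam = np.array([3.0, 5.0, 4.0, 8.0])    # toy sample, made up
N = 10                                     # assumed population size
n = y_sam.size
x_sam = np.ones(n)                         # no auxiliary information: x_i = 1

# Least squares with w_i = 1 and c_i = x_i = 1 gives the sample mean
beta_hat = np.sum(x_sam * y_sam / x_sam) / np.sum(x_sam**2 / x_sam)
sigma2_hat = np.sum((y_sam - beta_hat) ** 2) / (n - 1)

X_sam, X_imp, X_all = n, N - n, N
mse_general = X_imp * X_all / X_sam * sigma2_hat       # X_imp * X_all / X_sam * sigma^2
mse_srs = N**2 * (1 - n / N) * sigma2_hat / n          # N^2 (1 - n/N) sigma^2 / n

Y_hat = y_sam.sum() + (N - n) * beta_hat               # equals N * sample mean
```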

Chain imputation

Situation: $x_i$ is not known, but is estimated from $z_i$.

Model: $y_i \mid x_i \sim (x_i\beta_{yx},\ c_i\sigma^2_{yx})$, $\quad x_i \sim (z_i\beta_{xz},\ d_i\sigma^2_{xz})$.

With the help of the conditional variance decomposition we get

$$\begin{aligned}
\operatorname{mse}(\hat Y) &= \operatorname{var}\hat Y_{imp} + \operatorname{var}Y_{imp} \\
&= E\operatorname{var}[\hat Y_{imp} \mid X] + \operatorname{var}E[\hat Y_{imp} \mid X] + E\operatorname{var}[Y_{imp} \mid X] + \operatorname{var}E[Y_{imp} \mid X] \\
&= \dots \\
&= E\operatorname{mse}(\hat Y \mid X) + \beta^2_{yx}\operatorname{mse}(\hat X).
\end{aligned}$$

Chain imputation

Estimated error:

$$\widehat{\operatorname{mse}}\,\hat Y = \widehat{\operatorname{mse}}(\hat Y \mid \hat X) + \hat\beta^2_{yx}\,\widehat{\operatorname{mse}}\,\hat X.$$

The chain structure can be followed up and stacked until we get to an auxiliary variable which is known for all units, e.g. administrative data.
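A two-link chain could be sketched as follows. Several things here are assumptions made only for the example: $z_i$ is known for all units while $x_i$ and $y_i$ are observed on the same sample, $w_i \equiv 1$, $d_i = z_i$, and $c_i = x_i$ with $c_i$ replaced by $\hat x_i$ in the imputed part. The helper names `fit_ratio` and `mse_pred` are hypothetical.

```python
import numpy as np

def fit_ratio(x, y, c):
    """Fit y = beta * x + eps, var(eps) = c * sigma^2, with w_i = 1.
    Returns (beta_hat, sigma2_hat, var_beta_hat)."""
    x, y, c = (np.asarray(a, dtype=float) for a in (x, y, c))
    B = np.sum(x**2 / c)
    beta = np.sum(x * y / c) / B
    sigma2 = np.sum((y - beta * x) ** 2 / c) / (len(y) - 1)
    return beta, sigma2, sigma2 / B        # with w_i = 1, var(beta_hat) = sigma^2 / B

def mse_pred(X_imp, C_imp, sigma2, var_beta):
    """mse of a predicted total: X_imp^2 * sigma_beta^2 + C_imp * sigma^2."""
    return X_imp**2 * var_beta + C_imp * sigma2

# Toy data, made up for illustration
z_sam = np.array([8.0, 18.0, 33.0, 41.0])
x_sam = np.array([10.0, 20.0, 30.0, 40.0])
y_sam = np.array([12.0, 19.0, 33.0, 41.0])
z_imp = np.array([12.0, 22.0, 45.0])       # only z is known for these units

# Link 1: impute x from z (d_i = z_i)
b_xz, s2_xz, vb_xz = fit_ratio(z_sam, x_sam, z_sam)
x_imp_hat = b_xz * z_imp
mse_X = mse_pred(z_imp.sum(), z_imp.sum(), s2_xz, vb_xz)

# Link 2: impute y from the imputed x (c_i = x_i, using x_hat_i where x is unknown)
b_yx, s2_yx, vb_yx = fit_ratio(x_sam, y_sam, x_sam)
y_imp_hat = b_yx * x_imp_hat
mse_Y_given_X = mse_pred(x_imp_hat.sum(), x_imp_hat.sum(), s2_yx, vb_yx)

# Estimated error of the chained imputation
mse_Y = mse_Y_given_X + b_yx**2 * mse_X
```

A longer chain would repeat link 1 for each further auxiliary variable and carry the accumulated mse forward in the same way.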

Stratification level shifts

Situation: The population is divided into strata (size class, NACE, region). There are several stratification levels, going from relatively small groups to larger ones. When there are not enough responding units to estimate $\beta$ in one stratum, we use the estimate from the corresponding higher-level stratum.

[Figure: example with stratification levels S0, S1 and S2.]
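The fallback rule might be sketched as below. The threshold `min_resp`, the layout of the `levels` array and the function names are assumptions for illustration; the slides do not say how "not enough responding units" is decided.

```python
import numpy as np

def beta_ls(x, y, c):
    """Least-squares slope with w_i = 1."""
    return np.sum(x * y / c) / np.sum(x**2 / c)

def beta_with_fallback(x, y, c, levels, min_resp=3):
    """For each base stratum, estimate beta from the finest level that has
    at least `min_resp` responding units (hypothetical rule).

    levels: 2-D integer array, one row per responding unit;
            column 0 is the base stratum, later columns are coarser levels.
    Returns {base_stratum: (beta_hat, level_used)}.
    """
    out = {}
    for s in np.unique(levels[:, 0]):
        path = levels[levels[:, 0] == s][0]        # labels of s at every level
        for lvl, label in enumerate(path):
            mask = levels[:, lvl] == label
            # Accept this level if it is large enough, or if it is the coarsest one
            if mask.sum() >= min_resp or lvl == len(path) - 1:
                out[s] = (beta_ls(x[mask], y[mask], c[mask]), lvl)
                break
    return out

# Toy data: base strata 0, 1, 2; strata 0 and 1 share superstratum 0
levels = np.array([[0, 0], [0, 0], [0, 0], [1, 0], [2, 1], [2, 1], [2, 1]])
x = np.array([10.0, 20.0, 30.0, 40.0, 5.0, 10.0, 15.0])
y = np.array([11.0, 19.0, 30.0, 44.0, 10.0, 20.0, 30.0])

betas = beta_with_fallback(x, y, c=x, levels=levels)
```

In this toy example base stratum 1 has a single respondent, so its $\hat\beta$ is taken from the superstratum that also contains stratum 0.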

Stratification level shifts

If the estimated total of the whole population divided into strata $m_1, \dots, m_K$ is

$$\hat Y = \sum_j \hat Y_{m_j},$$

the mean square error is

$$\begin{aligned}
\operatorname{mse}\hat Y &= \operatorname{var}\hat Y_{imp} + \operatorname{var}Y_{imp} \\
&= \operatorname{var}\sum_j \hat Y^{imp}_{m_j} + \operatorname{var}\sum_j Y^{imp}_{m_j} \\
&= \sum_j \operatorname{var}\hat Y^{imp}_{m_j} + \sum_{j \neq k} \operatorname{cov}(\hat Y^{imp}_{m_j}, \hat Y^{imp}_{m_k}) + \sum_j \operatorname{var}Y^{imp}_{m_j}.
\end{aligned}$$

Both the variances of the estimated values and of the real values can be computed with the methods from above.

Stratification level shifts - covariance computation

Let $m_1$ and $m_2$ be two basic strata, with $\hat\beta$ estimated from superstrata $S_1$ and $S_2$, respectively. Denote $S_d = S_1 \cap S_2$, which is the smaller of $S_1$ and $S_2$ if the stratification levels are well ordered. Denote $S = S_1 \cup S_2$, which is then the larger of the two. Then

$$\begin{aligned}
\operatorname{cov}(\hat Y_{m_1}, \hat Y_{m_2}) &= \operatorname{cov}(X^{imp}_{m_1}\hat\beta_{S_1}, X^{imp}_{m_2}\hat\beta_{S_2}) = X^{imp}_{m_1} X^{imp}_{m_2} \operatorname{cov}(\hat\beta_{S_1}, \hat\beta_{S_2}) \\
&= X^{imp}_{m_1} X^{imp}_{m_2} \operatorname{cov}\left(\frac{\sum_{S_1^{sam}} w_i x_i y_i / c_i}{\sum_{S_1^{sam}} w_i x_i^2 / c_i},\ \frac{\sum_{S_2^{sam}} w_i x_i y_i / c_i}{\sum_{S_2^{sam}} w_i x_i^2 / c_i}\right).
\end{aligned}$$

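Stratification level shifts - covariance computation

The variables $y_i$ belonging to either $S_1$ or $S_2$ but not to $S_d$ are mutually independent, so only the sampled units in $S_d$ contribute to the covariance. Denote as $B_{S_1}$ and $B_{S_2}$ the sums in the denominators:

$$\begin{aligned}
\operatorname{cov}(\hat Y_{m_1}, \hat Y_{m_2}) &= \frac{X^{imp}_{m_1} X^{imp}_{m_2}}{B_{S_1} B_{S_2}} \operatorname{var}\left(\sum_{S_d^{sam}} w_i x_i y_i / c_i\right) \\
&= \frac{X^{imp}_{m_1} X^{imp}_{m_2}}{B_{S_1} B_{S_2}} \sum_{S_d^{sam}} w_i^2 x_i^2 / c_i^2 \operatorname{var}y_i \\
&= \frac{X^{imp}_{m_1} X^{imp}_{m_2}}{B_{S_1} B_{S_2}} \sum_{S_d^{sam}} w_i^2 x_i^2 / c_i\, \sigma^2 \\
&= X^{imp}_{m_1} X^{imp}_{m_2} \frac{B_{S_d}}{B_S}\, \sigma^2_{\beta_{S_d}}.
\end{aligned}$$

The last step uses $\sum_{S_d^{sam}} w_i^2 x_i^2 / c_i\, \sigma^2 = B_{S_d}^2 \sigma^2_{\beta_{S_d}}$ and $B_{S_1} B_{S_2} = B_{S_d} B_S$. This way we can compute all the covariances between base strata and the mean square error of the whole sum.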
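The covariance formula and the resulting mse of a two-stratum total can be illustrated numerically. In this sketch the nesting $S_1 \subset S_2$ (so $S_d = S_1$, $S = S_2$), the choices $w_i \equiv 1$ and $c_i = x_i$, the value of $\sigma^2$ and all numbers are assumptions made for the example.

```python
import numpy as np

def B(w, x, c):
    """Denominator sum: sum of w_i * x_i^2 / c_i over the sampled units."""
    return np.sum(w * x**2 / c)

sigma2 = 0.5                                   # assumed (estimated) sigma^2

# Sampled units: S1 is nested in S2, so S_d = S1 and S = S2
x_S1 = np.array([10.0, 20.0, 30.0])
x_S2 = np.concatenate([x_S1, [40.0, 50.0]])
w_S1, w_S2 = np.ones_like(x_S1), np.ones_like(x_S2)
c_S1, c_S2 = x_S1, x_S2                        # c_i = x_i

X_m1, X_m2 = 35.0, 80.0                        # imputed-part totals of x in m1, m2

B1, B2 = B(w_S1, x_S1, c_S1), B(w_S2, x_S2, c_S2)
num_Sd = np.sum(w_S1**2 * x_S1**2 / c_S1)      # sum over S_d of w^2 x^2 / c

# Third line of the derivation
cov_direct = X_m1 * X_m2 / (B1 * B2) * num_Sd * sigma2

# Closed form: X_m1 * X_m2 * (B_Sd / B_S) * sigma_beta_Sd^2
var_beta_S1 = num_Sd / B1**2 * sigma2
cov_short = X_m1 * X_m2 * (B1 / B2) * var_beta_S1

# mse of the total over the two strata (C_imp = X_imp because c_i = x_i)
var_beta_S2 = np.sum(w_S2**2 * x_S2**2 / c_S2) / B2**2 * sigma2
mse_total = (X_m1**2 * var_beta_S1 + X_m2**2 * var_beta_S2   # var of estimated values
             + 2 * cov_short                                  # sum over j != k
             + (X_m1 + X_m2) * sigma2)                        # var of real values
```

The factor 2 in `mse_total` comes from the sum over ordered pairs $j \neq k$, which counts each pair of strata twice.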

Stratification level shifts - chained imputations

If we have a sophisticated stratification structure and chained imputations, we also need to compute the chained covariance. The covariances are computed with the help of the conditional covariance decomposition:

$$\begin{aligned}
\operatorname{cov}(\hat Y_{m_1}, \hat Y_{m_2}) &= E\operatorname{cov}[\hat Y_{m_1}, \hat Y_{m_2} \mid X] + \operatorname{cov}(E[\hat Y_{m_1} \mid X], E[\hat Y_{m_2} \mid X]) \\
&= E\operatorname{cov}[\hat Y_{m_1}, \hat Y_{m_2} \mid X] + \beta_{S_1}\beta_{S_2}\operatorname{cov}(\hat X_{m_1}, \hat X_{m_2}).
\end{aligned}$$

Computing the mean of the first term with respect to $X$ would be rather difficult, so we substitute it with an estimate based on $\hat X$:

$$\widehat{\operatorname{cov}}(\hat Y_{m_1}, \hat Y_{m_2}) = \widehat{\operatorname{cov}}[\hat Y_{m_1}, \hat Y_{m_2} \mid \hat X] + \hat\beta_{S_1}\hat\beta_{S_2}\,\widehat{\operatorname{cov}}(\hat X_{m_1}, \hat X_{m_2}).$$

Choosing the weights

If no stratification shifts are involved and no outliers are present, we can use $w_i \equiv 1$.

If we compute $\hat\beta$ from a superstratum $S$ consisting of basic strata $k = 1, \dots, K$, we can use $w_i \equiv N_k / n_k$ for units from stratum $k$. Data from the larger strata then influence the estimates more than data from the smaller strata.

If we apply some outlier-detection method, we can use $w_i = 0$ for data which may not fit the model, so that they do not influence the estimates.
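These three rules could be combined in one small helper. The function name `choose_weights`, its arguments and the toy strata are assumptions for illustration; in particular, $n_k$ is counted here before outliers are zeroed out, which the slides do not specify.

```python
import numpy as np

def choose_weights(stratum, N_k=None, outlier=None):
    """Build weights w_i for the sampled units (hypothetical helper).

    stratum: base-stratum label of each sampled unit
    N_k:     dict {label: population size of the stratum}; None means w_i = 1
    outlier: optional boolean flags; flagged units get w_i = 0
    """
    stratum = np.asarray(stratum)
    if N_k is None:
        w = np.ones(stratum.size)
    else:
        labels, counts = np.unique(stratum, return_counts=True)
        n_k = dict(zip(labels.tolist(), counts.tolist()))   # sample size per stratum
        w = np.array([N_k[s] / n_k[s] for s in stratum.tolist()], dtype=float)
    if outlier is not None:
        w[np.asarray(outlier, dtype=bool)] = 0.0            # exclude suspected outliers
    return w

# Toy example: two units sampled from stratum "a" (N=10), one from "b" (N=30)
w = choose_weights(["a", "a", "b"], N_k={"a": 10, "b": 30},
                   outlier=[False, True, False])
```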
