STAT373 / STAT814 / STAT714, Week 10 (2019)

Slide 1: Week 10: STRATIFIED SAMPLING

• We return to the problem of estimating the mean number of overseas-born people per NSW LGA (1996).
• It seems plausible that overseas-born people would be more likely to settle in urban rather than rural areas.
• So perhaps a stratification based on broad geographical groupings of LGAs would be sensible.

Slide 2: Statistical Divisions (SD)

NOTE: LGAs (182 of them) are grouped into 12 Statistical Divisions (SDs). These become our strata.

SD id   SD name            Number of LGAs
5       Sydney             46
10      Hunter             14
15      Illawarra          4
20      Richmond-Tweed     7
25      Mid-North Coast    11
30      Northern           20
35      North Western      14
40      Central West       14
45      South Eastern      19
50      Murrumbidgee       14
55      Murray             16
60      Far West           3

Slide 3: Descriptive statistics of OS born by SD_id

Descriptive Statistics

Variable   SD_id   N    Mean    Median   TrMean   StDev
OSBorn P   5       46   26426   20093    24728    19966
           10      14   9731    1192     4070     23028
           15      4    15311   7439     15311    19457
           20      7    3048    3263     3048     3000
           25      11   2113    1281     1823     2193
           30      20   1084    350      554      2554
           35      14   476     239      388      555
           40      14   913     407      796      1001
           45      19   1420    879      1267     1498
           50      14   644     264      427      994
           55      16   511     262      297      947
           60      3    1486    972      1486     1333

Slide 4:

[Figure: chart labelled by SD_id (5 to 60); only the labels survive.]

Slide 5: Number of OS born in NSW LGAs by SD

[Figure: plot of OSBorn P (vertical axis, 0 to 100000) against SD_id (horizontal axis, 5 to 60).]

Slide 6:

A reasonable stratification strategy for sampling LGAs would be to have the following three strata:
• Stratum 1: 5 (Sydney), 15 (Illawarra)
• Stratum 2: 10 (Hunter), 20 (Richmond-Tweed), 25 (Mid-North Coast), 45 (South Eastern)
• Stratum 3: rest of NSW

Note: Alternatively, Hunter (10), Sydney (5) or Illawarra (15) may be considered as a separate stratum due to its difference in variability.
Slide 7: Descriptive statistics of the three strata

Descriptive Statistics

Variable   stratum   N    Mean    Median   TrMean   StDev
OSBorn P   1         50   25537   19435    23429    19964
           2         51   4074    1211     1933     12384
           3         81   775     322      549      1488

Variable   stratum   SE Mean   Minimum   Maximum   Q1     Q3
OSBorn P   1         2823      2216      97203     9695   33953
           2         1734      102       87264     377    3522
           3         165       18        11607     159    651

Slide 8:

Now let's draw a simple random sample of 10 LGAs from each of the three strata.

Slide 9: Sample from Stratum 1

Pittwater        11177
Baulkham Hills   30267
Marrickville     33538
Leichhardt       16556
Manly            9759
Blacktown        72350
Botany           16002
Willoughby       19180
Waverley         24006
Hunter's Hill    2690

Variable   N    Mean    Median   TrMean   StDev   SE Mean
OS_born    10   23552   17868    20061    19522   6173

Variable   Minimum   Maximum   Q1      Q3
OS_born    2690      72350     10823   31085

Slide 10: Sample from Stratum 2

Merriwa         150
Eurobodalla     3996
Newcastle       16266
Muswellbrook    930
Yarrowlumla     1449
Scone           543
Gunning         203
Singleton       1455
Cntrl Darling   126
Indigo          1086

Variable   N    Mean   Median   TrMean   StDev   SE Mean
OS_born    10   2620   1008     1227     4927    1558

Variable   Minimum   Maximum   Q1    Q3
OS_born    126       16266     190   2090

Slide 11: Sample from Stratum 3

Cobar        300
Tamworth     1888
Parkes       808
Holbrook     137
Walgett      940
Jerilderie   127
Cessnock     2999
Warren       109
Evans        420
Parry        642

Variable   N    Mean   Median   TrMean   StDev   SE Mean
OS_born    10   837    531      658      932     295

Variable   Minimum   Maximum   Q1    Q3
OS_born    109       2999      135   1177

Slide 12: Estimation of $\mu$

We have

$N_1 = 50$, $N_2 = 51$, $N_3 = 81$
$n_1 = 10$, $n_2 = 10$, $n_3 = 10$
$\bar{y}_1 = 23{,}552$, $\bar{y}_2 = 2{,}620$, $\bar{y}_3 = 837$
$s_1 = 19{,}522$, $s_2 = 4{,}927$, $s_3 = 932$

Total sample size, $n = n_1 + n_2 + n_3 = 30$.
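The per-stratum sample summaries (N, Mean, StDev, SE Mean) can be reproduced from the three listed samples. The following is a minimal sketch using only Python's standard library. The variable names and the helper `summarise` are illustrative choices, not part of the slides, and "SE Mean" is assumed to follow the usual Minitab convention $s/\sqrt{n}$, with no finite population correction.

```python
from statistics import mean, stdev

# OS-born counts for the 10 LGAs sampled from each stratum (values from the slides).
samples = {
    1: [11177, 30267, 33538, 16556, 9759, 72350, 16002, 19180, 24006, 2690],
    2: [150, 3996, 16266, 930, 1449, 543, 203, 1455, 126, 1086],
    3: [300, 1888, 808, 137, 940, 127, 2999, 109, 420, 642],
}

def summarise(sample):
    """Return n, sample mean, sample standard deviation and s/sqrt(n)."""
    n = len(sample)
    s = stdev(sample)  # divisor n - 1
    return {"n": n, "mean": mean(sample), "stdev": s, "se_mean": s / n ** 0.5}

summaries = {stratum: summarise(values) for stratum, values in samples.items()}

for stratum, stats in summaries.items():
    print(stratum, stats["n"], round(stats["mean"]),
          round(stats["stdev"]), round(stats["se_mean"]))
```

The rounded means and standard deviations are the $\bar{y}_i$ and $s_i$ values carried forward into the stratified estimator.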
Slide 13:

Therefore, the estimated mean number of overseas-born people per NSW LGA is:

$$\bar{y}_{ST} = \frac{50}{182}(23{,}552) + \frac{51}{182}(2{,}620) + \frac{81}{182}(837) = 7{,}577$$

Slide 14: Estimated variance and standard error

$$\widehat{Var}(\bar{y}_{ST}) = \sum_{i=1}^{3} \left(\frac{N_i}{N}\right)^2 (1 - f_i)\,\frac{s_i^2}{n_i}$$

$$= \left(\frac{50}{182}\right)^2 \left(1 - \frac{10}{50}\right)\frac{19{,}522^2}{10} + \left(\frac{51}{182}\right)^2 \left(1 - \frac{10}{51}\right)\frac{4{,}927^2}{10} + \left(\frac{81}{182}\right)^2 \left(1 - \frac{10}{81}\right)\frac{932^2}{10} = 2{,}469{,}424$$

$$\widehat{SE}(\bar{y}_{ST}) = \sqrt{2{,}469{,}424} = 1{,}571$$

Slide 15: Comparison of SRS with stratified sampling results

Recall: Last week we estimated the mean number of OS-born people per LGA, based on a simple random sample of size $n = 30$.

We obtained $\bar{y} = 6{,}527$ with

$$SE(\bar{y}) = 9{,}713\,\sqrt{(1 - 0.165)/30} = 1{,}620.$$

Slide 16:

With stratified sampling, for the same sample size $n = 30$, we have estimated $\mu$ with increased precision (i.e. smaller standard error/variance, as shown in Slide 14):

$$SE(\bar{y}_{ST}) = \sqrt{\widehat{Var}(\bar{y}_{ST})} = 1{,}571 < SE(\bar{y}) = 1{,}620.$$

Slide 17: Design Effect (Lohr §7.5)

The Design Effect is defined as

$$\text{deff} = \frac{Var(\text{estimate under current sampling plan})}{Var(\text{estimate under SRS with same sample size})}$$

This quantifies the effect on the sampling variance obtained by using the current sampling scheme (e.g. stratified sampling) over SRS.

We have here

$$\text{deff} = \frac{1{,}571^2}{1{,}620^2} = 0.94$$

Note: Usually the design effect for a stratified sample will be less than one (i.e. higher precision), unless all the stratum means are equal.

Slide 18: Estimation of the population total

Recall the population total is

$$\tau = N\mu$$

and its sample estimator (based on a SRS) is

$$y_T = N\bar{y}$$

For a stratified sample,

$$y_{T,ST} = N\bar{y}_{ST}$$
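The arithmetic behind the stratified mean, its estimated variance and the design effect can be checked with a short script. This is a sketch under stated assumptions: the function and variable names are illustrative, the inputs are the rounded stratum summaries quoted in the slides, and the SRS standard error is recomputed from the quoted $s = 9{,}713$ and $n = 30$.

```python
from math import sqrt

# Stratum sizes, sample sizes, sample means and sample SDs (rounded values from the slides).
N_h = [50, 51, 81]
n_h = [10, 10, 10]
ybar_h = [23552, 2620, 837]
s_h = [19522, 4927, 932]

def stratified_mean(N_h, ybar_h):
    """Weighted mean of stratum means with weights N_i / N."""
    N = sum(N_h)
    return sum(Ni * yi for Ni, yi in zip(N_h, ybar_h)) / N

def stratified_variance(N_h, n_h, s_h):
    """Sum over strata of (N_i/N)^2 * (1 - f_i) * s_i^2 / n_i, with f_i = n_i / N_i."""
    N = sum(N_h)
    return sum((Ni / N) ** 2 * (1 - ni / Ni) * si ** 2 / ni
               for Ni, ni, si in zip(N_h, n_h, s_h))

ybar_st = stratified_mean(N_h, ybar_h)
var_st = stratified_variance(N_h, n_h, s_h)
se_st = sqrt(var_st)

# SRS comparison: s = 9,713 from a simple random sample of n = 30 out of N = 182.
N, n_srs, s_srs = sum(N_h), 30, 9713
se_srs = s_srs * sqrt((1 - n_srs / N) / n_srs)

deff = var_st / se_srs ** 2

print(round(ybar_st), round(var_st), round(se_st), round(se_srs), round(deff, 2))
```

Small differences from the slide figures in the last digit or two come from rounding the inputs.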
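Slide 19:

$y_{T,ST}$ is an unbiased estimator of $\tau$. (Easy to show.)

$$Var(y_{T,ST}) = Var(N\bar{y}_{ST}) = N^2\,Var(\bar{y}_{ST}) = N^2 \sum_{i=1}^{L} W_i^2 (1 - f_i)\,\frac{\sigma_i^2}{n_i} = \sum_{i=1}^{L} N_i^2 (1 - f_i)\,\frac{\sigma_i^2}{n_i}$$

where $W_i = N_i/N$.

Slide 20: Confidence intervals for $\mu$ and $\tau$

We have the usual normal approximations:

$$\bar{y}_{ST} \sim N\left(\mu,\ Var(\bar{y}_{ST})\right)$$

and

$$y_{T,ST} \sim N\left(\tau,\ Var(y_{T,ST})\right)$$

Slide 21:

Approximate $100(1 - \alpha)\%$ confidence intervals are given by

$$\bar{y}_{ST} \pm z_{\alpha/2}\,SE(\bar{y}_{ST}) \quad \text{for } \mu$$

and

$$y_{T,ST} \pm z_{\alpha/2}\,SE(y_{T,ST}) = N\bar{y}_{ST} \pm z_{\alpha/2}\,N\,SE(\bar{y}_{ST}) \quad \text{for } \tau$$

Should we use the t distribution? The number of degrees of freedom is unclear, so we use z instead.

Slide 22: OS-born example

Population mean:

$$\bar{y}_{ST} \pm z_{\alpha/2}\,SE(\bar{y}_{ST}) = 7{,}577 \pm 1.96 \times 1{,}571 = 7{,}577 \pm 3{,}079 = (4{,}498,\ 10{,}656)$$

Includes $\mu = 8{,}502$.

Slide 23:

Population total:

$$y_{T,ST} = N\bar{y}_{ST} = 182 \times 7{,}577 = 1{,}379{,}014$$

95% CI:

$$N\left(\bar{y}_{ST} \pm z_{\alpha/2}\,SE(\bar{y}_{ST})\right) = 182 \times (4{,}498;\ 10{,}656) = (818{,}636;\ 1{,}939{,}392)$$

Includes $\tau = 1{,}547{,}364$.

Slide 24: Choice of stratum sample sizes $n_i$

1. Proportional allocation

Sometimes the $N_i$'s and $n$ are known, but we need to work out the $n_i$'s ($n = n_1 + n_2 + \dots + n_L$).

One approach is to insist on sampling the same proportion of each stratum, i.e.

$$f_i = \frac{n_i}{N_i} = \frac{n}{N}, \quad i = 1, \dots, L$$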
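The normal-approximation intervals and the proportional allocation rule can be sketched in a few lines of Python. The function names `normal_ci` and `proportional_allocation` are illustrative, not from the slides. The inputs are the rounded estimate 7,577 and standard error 1,571. The allocation for $n = 30$ is computed here only to illustrate the rule $n_i = nN_i/N$, and is left unrounded because the slides do not say how fractional sample sizes should be rounded.

```python
from statistics import NormalDist

N_h = [50, 51, 81]   # stratum sizes from the slides
N = sum(N_h)         # 182 LGAs in total
ybar_st = 7577       # stratified estimate of the mean (rounded)
se_st = 1571         # its estimated standard error (rounded)

def normal_ci(estimate, se, alpha=0.05):
    """Approximate 100(1 - alpha)% interval: estimate +/- z_{alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return (estimate - z * se, estimate + z * se)

ci_mean = normal_ci(ybar_st, se_st)

# Interval for the total: scale both limits by N. The slides scale the
# rounded limits, so their figures differ slightly from these.
ci_total = tuple(N * limit for limit in ci_mean)

def proportional_allocation(N_h, n):
    """Unrounded n_i = n * N_i / N, so that every f_i = n_i / N_i equals n / N."""
    N = sum(N_h)
    return [n * Ni / N for Ni in N_h]

alloc = proportional_allocation(N_h, 30)

print([round(x) for x in ci_mean])
print([round(x) for x in ci_total])
print([round(a, 2) for a in alloc])
```

In practice the fractional $n_i$ have to be rounded to whole numbers that still sum to $n$, and the sketch leaves that choice open.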