Weights in Understanding Society: Olena Kaminska


  1. Weights in Understanding Society. Olena Kaminska. An initiative by the Economic and Social Research Council, with scientific leadership by the Institute for Social and Economic Research, University of Essex, and survey delivery by NatCen Social Research and Kantar Public.

  2. Topics covered • Should I use weights? • How do I select the correct weight? • I want a larger sample size, but you have 0-weights • Can I create my own tailored weights?

  3. Should I use weights?

  4. The easiest way to represent the population with UKHLS: the svyset command in Stata
      use [...]a_indall.dta
      svyset a_psu [pweight=a_psnenus_xw], strata(a_strata)
      svy: tabulate a_ethn_dv
      svy: logistic a_single_dv a_dvage
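
To make concrete what the probability weight contributes to a tabulation, here is a minimal Python sketch of a weighted proportion. It is an illustration, not the Stata implementation: it reproduces only the point estimate, while the PSU and strata settings passed to svyset matter for standard errors, which this sketch does not compute. The function name and toy data are hypothetical.

```python
from collections import defaultdict

def weighted_proportions(values, weights):
    """Return {category: weighted share} for paired values and weights."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight  # each person counts as `weight` people
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}

# Hypothetical toy data: the over-sampled group carries smaller weights.
countries = ["England", "NI", "NI"]
weights = [3.0, 0.5, 0.5]
shares = weighted_proportions(countries, weights)
print(shares)
```

Unweighted, NI would make up two thirds of this toy sample; with the weights applied its share drops to one quarter.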

  5. Effect of weights in UKHLS: Country distribution (%)
                  Population   Unweighted
      England        84.2         79.8
      Wales           4.7          6.2
      Scotland        8.2          7.8
      NI              2.8          6.2
      Wave 8 (2016-2017) estimates of 0+ population. Population estimates of mid-2016 from ONS.

  6. Effect of weights in UKHLS: Country distribution (%)
                  Population   Unweighted   Weighted
      England        84.2         79.8        84.5
      Wales           4.7          6.2         4.6
      Scotland        8.2          7.8         8.1
      NI              2.8          6.2         2.8
      Wave 8 (2016-2017) estimates of 0+ population. Population estimates of mid-2016 from ONS.

  7. Effect of weights in UKHLS: General election 2017 (%)
                                 Population   Unweighted
      Conservatives                 42.4         36.5
      Labour                        40.0         48.9
      Liberal Democrat               7.4          7.3
      Scottish National Party        3.0          1.9
      Plaid Cymru                    0.5          0.3
      Green Party                    1.6          1.7
      Wave 8 estimates for July-December 2017, excludes NI. 2017 UK general election results from Wikipedia.

  8. Effect of weights in UKHLS: General election 2017 (%)
                                 Population   Unweighted   Weighted
      Conservatives                 42.4         36.5        42.5
      Labour                        40.0         48.9        40.6
      Liberal Democrat               7.4          7.3         7.8
      Scottish National Party        3.0          1.9         2.7
      Plaid Cymru                    0.5          0.3         0.4
      Green Party                    1.6          1.7         2.4
      Wave 8 estimates for July-December 2017, excludes NI; the weight is adjusted because BHPS is also excluded. 2017 UK general election results from Wikipedia.

  9. What if I run my analysis without weights? • Your results may be badly off or only slightly off, but you will not know which

  10. How to select a weight for my analysis?

  11. Naming convention for Understanding Society weights: w_xxxyyzz_aa
      w_ (wave): a_, b_, c_, d_, …
      xxx (who): hhd: household; psn: persons 0+; ind: persons 16+; yth: persons 10-15
      yy (instrument): en: enumeration; in: interview; px: interview or proxy; 5m: "extra 5 minutes"; sc: self-completion; ns: nurse visit; bd: blood
      zz (samples): us: GPS & EMB; bh: BHPS; ub: GPS, EMB & BHPS; ui: GPS, EMB, BHPS & IEMB; 91: BHPS original sample; 01: BHPS original sample + boosts
      _aa (type): _xw: cross-sectional analysis weight; _lw: longitudinal weight; _xd: x-sectional design weight; _li: longitudinal inclusion weight

  12. _aa part: is your analysis longitudinal or cross-sectional? • Longitudinal: _lw • Cross-sectional: _xw

  13. w_ part: which waves do you use? • Use the prefix of the (last) wave of your analysis, e.g. wave 9: i_ weight
      Wave:    1  2  3  4  5  6  7  8  9
      Prefix:  a  b  c  d  e  f  g  h  i
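
The wave-to-prefix mapping can be sketched in a few lines of Python. The helper name is hypothetical, and the slide only shows waves 1 to 9; continuing alphabetically beyond "i" is an assumption.

```python
import string

def wave_prefix(wave):
    """Map a wave number (1 = first wave) to its variable prefix, e.g. 9 -> 'i_'."""
    if not 1 <= wave <= 26:
        raise ValueError("wave must be between 1 and 26")
    # Assumption: letters continue alphabetically after wave 9.
    return string.ascii_lowercase[wave - 1] + "_"

print(wave_prefix(9))
```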

  14. _xxx part: whom do you study? • Household level analysis: _hhdenzz_xw in _hhresp.dta • Everyone in the household (0+): _psnenzz_ weight in _indall.dta • Youth analysis (10-15): _ythsczz_xw in _youth.dta • Adults (16+): _indyyzz_aa in _indresp.dta

  15. yy part: analysis of adults • Questions asked to proxies: _indpxzz_ • Questions in main questionnaire: _indinzz_ • Questions in self-completion questionnaire: _indsczz_ • Questions from nurse visit: _indnszz_ • Questions using information from blood samples: _indbdzz_ • Extra 5 minutes questionnaire: _ind5mzz_

  16. Combination of instruments • Use the lowest level of analysis for your weight:
      Questions available for                          Level of analysis
      Household level (all enumerated individuals)     5
      Adult proxy and main interview                   4
      Adult main interview only (no proxy)             3
      Adult self-completion interview                  2
      Extra 5 minutes interview                        2
      Youth questionnaire                              2
      Nurse visit                                      2
      Information from blood sample                    1
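
The "lowest level" rule can be sketched as a lookup. The level numbers are copied from the slide; the dictionary keys and function name are hypothetical labels. The slide does not say how to choose between two instruments that share a level, so this sketch returns all of them.

```python
# Levels as listed on the slide; keys are hypothetical short labels.
LEVELS = {
    "household": 5,
    "proxy_and_main": 4,
    "main_only": 3,
    "self_completion": 2,
    "extra_5_minutes": 2,
    "youth": 2,
    "nurse_visit": 2,
    "blood": 1,
}

def lowest_level_instruments(instruments):
    """Return the instrument(s) with the lowest level among those used."""
    lowest = min(LEVELS[name] for name in instruments)
    return sorted(name for name in instruments if LEVELS[name] == lowest)

# An analysis combining main-interview and self-completion questions
# would take the self-completion weight.
print(lowest_level_instruments(["main_only", "self_completion"]))
```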

  17. zz_ part: which waves • Wave 6 onwards (BHPS+GPS+EMB+IEMB): _XXXXXui_ • Wave 2 onwards (BHPS+GPS+EMB): _XXXXXub_ • Wave 1 onwards (GPS+EMB): _XXXXXus_ • 2001 onwards (BHPS, including NI): _XXXXX01_lw • 1991 onwards (BHPS, excluding NI): _XXXXX91_lw
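
Putting the parts of the naming convention together, a weight name can be assembled as in the following Python sketch. The function and argument names are hypothetical. The sketch only checks that each code appears in the naming convention; it does not check that the resulting weight is actually released in the data.

```python
WHO = {"hhd", "psn", "ind", "yth"}
INSTRUMENT = {"en", "in", "px", "5m", "sc", "ns", "bd"}
SAMPLES = {"us", "bh", "ub", "ui", "91", "01"}
KIND = {"xw", "lw", "xd", "li"}

def weight_name(wave, who, instrument, samples, kind):
    """Assemble w_xxxyyzz_aa from its parts, e.g. (1, 'psn', 'en', 'us', 'xw')."""
    if not 1 <= wave <= 26:
        raise ValueError("wave must be between 1 and 26")
    for value, allowed in ((who, WHO), (instrument, INSTRUMENT),
                           (samples, SAMPLES), (kind, KIND)):
        if value not in allowed:
            raise ValueError(f"unknown code: {value!r}")
    prefix = "abcdefghijklmnopqrstuvwxyz"[wave - 1]  # wave 1 -> a, wave 9 -> i
    return f"{prefix}_{who}{instrument}{samples}_{kind}"

print(weight_name(1, "psn", "en", "us", "xw"))
```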

  18. I want higher sample size but you have 0-weights

  19. Why 0 weights • TSMs (temporary sample members) are not part of the longitudinal sample by design: they all have 0 longitudinal weight • "TSMs from wave 1", i.e. non-eligible people in eligible EMB and IEMB households (these samples started at waves 1 and 6), always have 0 weights, even in their first wave • Longitudinal weights assume participation in all waves, so anyone who missed at least one wave has 0 weight • Cross-sectional weights require household participation in all waves (although ui weights require participation in waves 1, 2, 6 and onwards)
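
Two of the rules above can be written as a small check. This is a deliberate simplification with hypothetical names: it covers only the TSM rule and the "missed a wave" rule for longitudinal weights, and a False result does not guarantee a positive weight in the released data.

```python
def has_zero_longitudinal_weight(is_tsm, responded_by_wave):
    """True if either stated rule forces a zero longitudinal weight.

    is_tsm:            True for a temporary sample member
    responded_by_wave: one boolean per wave covered by the weight
    """
    return is_tsm or not all(responded_by_wave)

# A permanent sample member who skipped the third of four waves.
print(has_zero_longitudinal_weight(False, [True, True, False, True]))
```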

  20. Zero weights
      Sample size          Estimate   Std Error   CI low   CI high
      20                    0.193      0.096      -0.008    0.395
      30                    0.171      0.075       0.017    0.325
      40                    0.188      0.067       0.053    0.323
      50                    0.139      0.051       0.037    0.241
      75                    0.097      0.036       0.025    0.168
      100                   0.106      0.033       0.039    0.172
      300                   0.171      0.028       0.115    0.226
      500                   0.146      0.020       0.107    0.185
      1000                  0.138      0.013       0.113    0.164
      5000                  0.133      0.006       0.122    0.144
      10000                 0.133      0.004       0.125    0.140
      15000                 0.132      0.003       0.126    0.138
      20000                 0.131      0.003       0.126    0.136
      30000                 0.131      0.002       0.127    0.135
      33818                 0.132      0.002       0.128    0.136
      unweighted 39,289     0.150      0.002       0.146    0.153
      Proportion of natural/adoptive/step mothers of child under 16 from wave 8

  21. Zero weights [Chart: estimate, CI low and CI high plotted against sample size, from 20 to 33818] Proportion and CIs of natural/adoptive/step mothers of child under 16 from wave 8

  22. I want more (sample size): • First, analyse with our weights • If the result is significant, just publish that • If it is not significant and p=0.6, it is unlikely that adding 20% to the sample will take p below 0.05 • If p is marginal, tailored weighting is worth considering
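
A rough back-of-envelope calculation illustrates the point about p=0.6. The Python sketch below rests on simplifying assumptions that are not from the slides: a two-sided z-test, a point estimate that stays the same, and a standard error that shrinks in proportion to 1/sqrt(n). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p_after_sample_increase(p_value, increase):
    """Approximate two-sided p-value after growing the sample by `increase` (0.2 = 20%)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - p_value / 2)   # z-score implied by the current p-value
    z_new = z * sqrt(1 + increase)            # SE shrinks by 1/sqrt(1 + increase)
    return 2 * (1 - std_normal.cdf(z_new))

print(p_after_sample_increase(0.6, 0.2))
print(p_after_sample_increase(0.06, 0.2))
```

Under these assumptions, p=0.6 moves only to roughly 0.57 with 20% more cases, whereas a marginal p=0.06 would fall to roughly 0.04, which is why extra cases are only worth pursuing in the marginal situation.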

  23. Tailored weights

  24. Start with one of our weights • You are studying wave 1 and wave 9: start with the wave 1 weight and model wave 9 response conditional on wave 1 • You are studying wave 8 to 9 change: start with the wave 6 issue weight and model wave 8-9 joint response conditional on a positive wave 6 weight • You are studying youth questions at wave 1 and the main questionnaire at wave 9: start with the enumerated weight i_psnenus_lw

  25. If you want your own attrition adjustment • Start with one of: - Wave 1 weight (GPS+EMB): a_psnenus_xw - Wave 2 issue weight (BHPS+GPS+EMB): b_psnenub_li - Wave 6 issue weight (BHPS+GPS+EMB): f_psnenub_li • Use predictors from wave 1, 2 or 6 respectively • Remember to adjust for newborns, deaths, moves out of the country and turning 16 (entering the adult questionnaire) • You can also create your own cross-sectional weights through a weight share
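
One generic way to build such an adjustment is an inverse response propensity correction within weighting classes, sketched below in Python. This is an illustration under stated assumptions, not the procedure used to produce the released weights: the weighting classes stand in for a fitted response model, all names are hypothetical, and the adjustments for newborns, deaths, emigration and turning 16 mentioned above are not handled.

```python
def tailored_weights(base_weights, responded, classes):
    """Scale respondents' base weights by the inverse weighted response rate of their class.

    base_weights: starting weight per person (e.g. one of the released weights)
    responded:    True if the person responded at the later wave(s) of interest
    classes:      weighting-class label per person, built from base-wave predictors
    Returns one weight per person; nonrespondents and zero-weight cases get 0.
    """
    issued, answered = {}, {}
    for w, r, c in zip(base_weights, responded, classes):
        issued[c] = issued.get(c, 0.0) + w
        if r:
            answered[c] = answered.get(c, 0.0) + w
    result = []
    for w, r, c in zip(base_weights, responded, classes):
        if r and w > 0:
            result.append(w * issued[c] / answered[c])  # w / (weighted response rate)
        else:
            result.append(0.0)
    return result

# Hypothetical toy data: class "a" loses half its weighted sample, class "b" none.
new_weights = tailored_weights([1, 1, 2, 2], [True, False, True, True], ["a", "a", "b", "b"])
print(new_weights)
```

Within each class the respondents' new weights sum to the class's original weighted total, so the class distribution of the starting sample is preserved.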

  26. BHPS 1991 only [Timeline: 1991 Original Sample; 1999; 2001; 2009-10; 2014-15]
      Weight = 1/Prob
      Prob = prob_selection * prob_w1 * prob_attr
      prob_selection: selection probability reflecting sample design
      prob_w1: correction for household and person nonresponse at wave 1
      prob_attr: correction for nonresponse after wave 1
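
As a worked example of the formula, the following Python sketch multiplies the three probabilities and inverts the product. The input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def bhps_weight(prob_selection, prob_w1, prob_attr):
    """Weight = 1 / (prob_selection * prob_w1 * prob_attr)."""
    prob = prob_selection * prob_w1 * prob_attr
    if prob <= 0:
        raise ValueError("overall probability must be positive")
    return 1.0 / prob

# Hypothetical person: selected with probability 0.01, wave 1 response
# probability 0.8, probability of remaining in the sample afterwards 0.5.
print(bhps_weight(0.01, 0.8, 0.5))
```

Each correction below 1 inflates the weight, so a person from a group with heavy attrition represents more people in the population.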

  27. UKHLS samples [Timeline: 1991 Original Sample; 1999 Sc and W boost; 2001 NI sample; 2009-10 GPS + EMB samples; 2014-15 IEMB+NIB samples]

  28. UKHLS samples [Timeline: 1991 Original Sample; 1999 Sc and W boost; 2001 NI sample; 2009-10 GPS + EMB samples; 2014-15 IEMB+NIB samples] • For each person we infer where they lived in '91, '99, '01, '09-'10 and '14-'15 (E, W, Sc, NI or abroad) using: - the place they were selected at - for long-term members, we know where and when they moved - for new members, survey questions - all IEMB sample members were asked where they lived - for ethnic minority groups, place is more detailed: residence in '09-'10 and '14-'15 (London borough, postcode sector)

  29. Design weight at wave 8 [Timeline: 1991 Original Sample; 1999 Sc and W boost; 2001 NI sample; 2009-10 GPS + EMB samples; 2014-15 IEMB+NIB samples]
      Dweight = 1/Dprob
      The total is a sum of 17 selection probabilities:
      Dprob = probE91 + probSc91 + probW91
            + probSc99 + probW99 + probNI01
            + probE09 + probSc09 + probW09 + probNI09
            + pembE09 + pembSc09 + pembW09
            + piembE14 + piembSc14 + piembW14 + pnib14
      Newborns get their mother's Dweight
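
The design weight calculation can be sketched in Python as the inverse of a sum of selection probabilities. The component names follow the formula above, but the numeric values are hypothetical, and components under which a person could not have been selected are assumed to contribute 0.

```python
def design_weight(selection_probs):
    """Dweight = 1 / Dprob, where Dprob sums the person's selection probabilities.

    selection_probs: mapping of component name (e.g. 'probE91') to probability;
    components that do not apply to the person can be omitted or set to 0.
    """
    dprob = sum(selection_probs.values())
    if dprob <= 0:
        raise ValueError("at least one selection probability must be positive")
    return 1.0 / dprob

# Hypothetical person who lived in England in both 1991 and 2009-10, so could
# have entered through the 1991 original sample or through the GPS.
dweight = design_weight({"probE91": 0.0002, "probE09": 0.001})
print(dweight)
```

Summing the probabilities reflects that a person who could have been selected through several samples has a higher overall chance of inclusion and therefore a smaller weight.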

  30. Issue weights at waves 2 and 6 [Timeline: 1991 Original Sample; 1999 Sc and W boost; 2001 NI sample; 2009-10 GPS + EMB samples; 2014-15 IEMB+NIB samples]
      Iweight = 1/Iprob
      nr: probability of wave 1 response and retention until wave 2 (wave 6)
      The total is a sum of 17 products of 2 probabilities:
      Iprob = probE91*nrE91 + probSc91*nrSc91 + probW91*nrW91
            + probSc99*nrSc99 + probW99*nrW99 + probNI01*nrNI01
            + probE09*nrE09 + probSc09*nrSc09 + probW09*nrW09 + probNI09*nrNI09
            + pembE09*nreE09 + pembSc09*nreSc09 + pembW09*nreW09
            + piembE14*nrieE14 + piembSc14*nrieSc14 + piembW14*nrieW14 + pnib14*nrnib14
      Newborns get their mother's Iweight
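
The issue weight differs from the design weight in that each selection probability is multiplied by its matching response and retention probability before summing. A minimal Python sketch, with hypothetical component keys and values:

```python
def issue_weight(selection_probs, retention_probs):
    """Iweight = 1 / Iprob, where Iprob sums selection * retention per component.

    Both arguments map a component key (e.g. 'E91') to a probability;
    every key in selection_probs must also appear in retention_probs.
    """
    iprob = sum(prob * retention_probs[key] for key, prob in selection_probs.items())
    if iprob <= 0:
        raise ValueError("overall inclusion probability must be positive")
    return 1.0 / iprob

# Hypothetical person selectable through two components, with different
# chances of responding at wave 1 and staying until the issue wave.
iweight = issue_weight({"E91": 0.0002, "E09": 0.001}, {"E91": 0.5, "E09": 0.8})
print(iweight)
```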
