A Unified Statistical Framework for Demographic Rates Using Demographic and Health Survey Data

Thomas W. Pullum
tom.pullum@icf.com
The Demographic and Health Surveys Program

Draft of October 25, 2017

Prepared for the IUSSP International Population Conference, Cape Town, South Africa, October 29 to November 4, 2017. DHS is a project of the United States Agency for International Development.

Abstract

The Demographic and Health Surveys Program is a major source of estimates of fertility, under-five mortality, and adult and maternal mortality in developing countries. The textbook definitions of these rates are geared to settings with good vital statistics and census data. This paper is intended to clarify how DHS rates are calculated from survey data, particularly from retrospective birth histories and sibling histories. The rates in DHS reports and on STATcompiler are calculated with CSPro, a non-statistical data processing package that essentially cumulates numerators and denominators and divides. Confidence intervals in the reports are calculated with a repetitive jackknife procedure. This paper presents a statistical modeling approach, in which generalized linear models are applied to the individual-level data files, and the coefficients from those models are converted to rates. Confidence intervals for single rates and compound rates can also be estimated analytically. The methods are described in terms of Stata, within which it is also easy to incorporate the sampling weights and sample design effects. The estimated rates from the models agree exactly with the DHS estimates and the confidence intervals are very similar. The framework is intended to help bridge the gap between demographic and statistical methods, as well as to bring out the conceptual similarities among the different kinds of rates.

1. Introduction

The Demographic and Health Surveys Program has been a major source of estimates of fertility and child mortality in developing countries for more than 30 years. For approximately the last half of that interval, it has also been a major source of estimates of adult and maternal mortality. The procedures for estimating these rates from survey data evolved from methods that were originally developed for vital statistics and census data.
The procedures are implemented by DHS with the Census and Survey Processing System (CSPro), a package developed largely by the U.S. Census Bureau and DHS (with USAID support) for the entry, editing, and tabulation of census and survey data. CSPro is a Windows-based package that has been in widespread use since about 2000 but evolved from previous DOS-based packages that did much the same thing. CSPro is able to produce publication-ready tables of very complex indicators, but it is not a statistical package. It does not include estimation commands and it does not produce standard errors analytically. The DHS estimates of standard errors and confidence intervals that appear in the main survey reports are calculated with a jackknife procedure that is computationally intensive but, in essence, recalculates the indicator repeatedly, omitting one cluster (primary sampling unit, or PSU) at a time.

The DHS procedure for calculating demographic rates is to accumulate a numerator, accumulate a denominator, and divide, for each category of a covariate or each cell of a cross-classification. Programs written in CSPro to accomplish this are computationally efficient and are based on sound demographic procedures, but their logic is quite different from how a demographer or statistician might approach the task with a package such as SPSS, SAS, Stata, or R. It is very difficult for someone working outside of DHS to match the DHS rates. The main exception to this statement is the well-known Stata program "TFR2" that has been prepared and made widely available by Bruno Schoumaker (Schoumaker 2013). Even within DHS, some analysts find TFR2 to be the easiest and fastest way to produce fertility rates that are consistent with those produced by CSPro. So far as I know, there is no generally available alternative to TFR2, and there are no generally available programs at all that are consistent with DHS estimates of under-five mortality or adult and maternal mortality. Such programs surely exist, but presumably they are restricted to personal use or to internal use within various agencies.

Prior to joining DHS in 2011 I had developed a Stata program to calculate the fertility rates, using similar logic to TFR2. This program was consistent with a description of the procedures for calculating fertility rates from survey data (Pullum 2004). Since joining DHS I have developed Stata programs that calculate all the rates, with emerging features and with the inclusion of confidence intervals. The goal has been to reproduce the DHS rates exactly, including very subtle and somewhat arbitrary features such as the treatment of age and time boundaries and the exclusion of any events that occur in the month of the survey, but within a statistical framework that would yield confidence intervals, adjusted for weights and other aspects of the survey design. A secondary goal has been to develop the flexibility to have different time intervals, specified either as a range of calendar time or time before the survey; to include categorical covariates; to include multiple surveys from the same country or from different countries in a single run; and to save the results in files that can be re-analyzed. It is expected that these programs will be distributed through the DHS website in 2018, in two formats: as Stata ado files, with limited options, and as complete Stata code that users can freely adapt for their own use. This paper is in part an introduction to those programs, in advance of their release.

DHS produces the following three sets of demographic rates that will be discussed in this paper:

Fertility rates: General Fertility Rate (GFR); Age-specific Fertility Rates for ages 15-19 through 45-49 (ASFRs); Total Fertility Rate (TFR).

Under-five mortality rates: Neonatal Mortality Rate (NNMR); Post-Neonatal Mortality Rate (PNNMR); Infant Mortality Rate (IMR, 1q0); Child Mortality Rate (CMR, 4q1); Under-Five Mortality Rate (U5MR, 5q0).

Adult mortality rates: Adult Male Mortality Rate (AMMR); Adult Female Mortality Rate (AFMR); Maternal Mortality Rate (MMRate); Maternal Mortality Ratio (MMRatio).
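The accumulate-and-divide calculation and the leave-one-cluster-out jackknife described in this introduction can be illustrated with a minimal sketch. It is written in Python purely for illustration; it is not the CSPro code or the author's Stata code. The cluster totals are hypothetical, sampling weights and stratification are ignored, and the variance formula is one common form of the delete-one jackknife, which is an assumption because the paper does not state the formula used in DHS reports.

```python
import math

def ratio_estimate(numerators, denominators):
    """Accumulate a numerator and a denominator across clusters, then divide."""
    return sum(numerators) / sum(denominators)

def jackknife_se(numerators, denominators):
    """Delete-one-cluster jackknife standard error of the ratio estimate.

    Assumed form: var = (k - 1) / k * sum((r_(i) - r)^2), where r is the
    full-sample ratio and r_(i) is the ratio with cluster i omitted.
    """
    k = len(numerators)
    total_num, total_den = sum(numerators), sum(denominators)
    full = total_num / total_den
    # Recompute the ratio k times, each time leaving out one cluster.
    replicates = [
        (total_num - n) / (total_den - d)
        for n, d in zip(numerators, denominators)
    ]
    variance = (k - 1) / k * sum((r - full) ** 2 for r in replicates)
    return math.sqrt(variance)

# Hypothetical cluster totals: births and woman-years of exposure.
births = [12, 9, 15, 7, 11]
exposure = [80.0, 75.5, 102.0, 60.0, 90.5]

rate = ratio_estimate(births, exposure)
se = jackknife_se(births, exposure)
print(f"rate = {rate:.4f}, jackknife SE = {se:.4f}")
print(f"approximate 95% interval: {rate - 1.96 * se:.4f} to {rate + 1.96 * se:.4f}")
```

Because every replicate requires a full recalculation of the indicator, the cost grows with the number of clusters, which is the sense in which the procedure is computationally intensive.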

We will make minimal use of formal notation. It will be assumed that readers have a good familiarity with the standard definitions of demographic rates in terms of numbers that could be obtained from, say, a complete civil registration and vital statistics (CRVS) system; with statistical concepts such as confidence intervals and generalized linear models, or at least with Poisson and logit regression; and with basic computational procedures that can be implemented with a package such as Stata. It will not be assumed that readers are familiar with how demographic rates can be constructed from DHS data.

The report will describe the calculation of fertility rates, under-five mortality rates, and adult mortality rates, in succession. Within each topic, there will be two sections. The first section describes how DHS data, particularly birth histories and sibling histories, can be converted to estimates of rates. Many demographers who can ably interpret these rates are probably not clear on the details of how they are calculated. This discussion uses the kind of terminology that is compatible with either CSPro calculations or a statistical approach. Section 4.1 is extracted from the current version of the Guide to DHS Statistics (Rutstein and Rojas 2006). Sections 2.1 and 3.1 are drafts (by Pullum) of portions of the next version of this core document, which will appear in 2018. The second section for each set of rates shifts to an emphasis on the statistical approach. The final section of the paper will attempt to bring out the commonalities among the three sets of rates and procedures.

2. Fertility Rates

We will discuss the following fertility rates: the General Fertility Rate (GFR); the Age-specific Fertility Rates for ages 15-19 through 45-49 (ASFRs); and the Total Fertility Rate (TFR). DHS reports include estimates of the Crude Birth Rate (CBR), but it will not be included here.
The most recent surveys obtain the day of children's births, for all children in the birth histories, in addition to calendar month and year. Given the day, month, and year of the interview, which have always been obtained, it will be possible to estimate children's ages more precisely (although it is expected that day of birth will often have to be imputed). Some DHS programs will be modified to take this information into account. The description given here is consistent with current procedures.

2.1. How the fertility rates are calculated with DHS data

The main reports on DHS surveys include several measures of fertility for recent reference periods of time, usually the three years before the survey. We will provide a brief description of the calculation of the following rates: the General Fertility Rate (GFR), Age-Specific Fertility Rates (ASFRs), and the Total Fertility Rate (TFR). These are all calculated as exposure-occurrence rates, in which the numerators consist of numbers of births and the denominators are woman-years of exposure to the risk of childbearing.
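As a rough illustration of an exposure-occurrence rate, the sketch below divides births by woman-years of exposure within each five-year age group and sums the results into a TFR, using the textbook relation that the TFR is five times the sum of the ASFRs. It also fits a one-parameter Poisson log-rate with exposure as an offset, to show why a coefficient from such a model converts back to the same rate. The sketch is in Python rather than Stata, all counts are hypothetical, and the names `asfrs`, `tfr`, and `poisson_log_rate` are invented for this example. The DHS-specific details (reference window, weights, boundary rules) are omitted.

```python
import math

AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]

def asfrs(births, exposure):
    """Births per woman-year of exposure in each five-year age group."""
    return {a: births[a] / exposure[a] for a in AGE_GROUPS}

def tfr(rates):
    """Sum of the ASFRs multiplied by the 5-year width of each age group."""
    return 5 * sum(rates[a] for a in AGE_GROUPS)

def poisson_log_rate(b, e, iterations=50):
    """Maximize the Poisson log-likelihood b*beta - e*exp(beta) by Newton's
    method; exposure e acts as an offset. Requires b > 0."""
    beta = math.log((b + 0.5) / e)  # starting value near the solution
    for _ in range(iterations):
        mu = e * math.exp(beta)     # expected births at the current beta
        beta += (b - mu) / mu       # Newton step: score / information
    return beta

# Hypothetical births and woman-years of exposure (not DHS data).
births = dict(zip(AGE_GROUPS, [90, 210, 200, 150, 90, 40, 10]))
exposure = dict(zip(AGE_GROUPS, [1200.0, 1100.0, 1000.0, 900.0, 800.0, 700.0, 600.0]))

rates = asfrs(births, exposure)
for a in AGE_GROUPS:
    # exp(coefficient) from the Poisson fit reproduces births / exposure.
    model_rate = math.exp(poisson_log_rate(births[a], exposure[a]))
    print(f"{a}: direct = {rates[a]:.4f}, from model = {model_rate:.4f}")
print(f"TFR = {tfr(rates):.3f} births per woman")
```

The agreement between the two columns is a property of the Poisson likelihood for a single cell: the score equation sets expected births equal to observed births, so the fitted rate is the observed births divided by exposure.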
