Confidentiality Protection in the Census Bureau’s Quarterly Workforce Indicators
John M. Abowd (1, 2), Bryce E. Stephens (2, 3), and Lars Vilhuber (1)
(1) Cornell University, (2) U.S. Census Bureau, (3) University of Maryland
August 10, 2005 - p. 1/31
Introduction
■ We will describe the confidentiality protection mechanism as applied to a new statistical product: the Quarterly Workforce Indicators (QWI).
■ The underlying data infrastructure was designed by the Longitudinal Employer-Household Dynamics Program at the Census Bureau and is described in detail in Abowd et al. (2005).
■ From a longitudinally integrated frame of state unemployment insurance wage records, we create measures of employment, wages, hiring, separation, job creation, and job destruction.
Core problem
■ Disclosure proofing is required to protect the information about individuals and businesses that contribute to
✦ (confidential) unemployment insurance (UI) wage records
✦ (confidential) Quarterly Census of Employment and Wages (QCEW, also known as ES-202) reports
✦ as well as information from Census Bureau demographic data that have been integrated with these sources.
■ The primary concern of the confidentiality protection mechanism is thus with small cells, i.e., cells that reflect data on few individuals or few firms.
Protection provided
■ In general, data are considered protected when aggregate cell values do not closely approximate data for any one respondent in the cell (Cox and Zayatz, 1993, pg. 5).
■ In the QWI confidentiality protection scheme, confidential micro-data are considered protected by noise infusion if
1. any inference regarding the magnitude of a particular respondent’s data differs from the confidential quantity by at least c%, even if that inference is made by a coalition of respondents with exact knowledge of their own answers, or
2. any inference regarding the magnitude of an item is incorrect with probability no less than y%,
where c and y are confidential but generally large.
Quality of the disclosed data
■ The confidentiality-protected data must be inference-valid for a well-defined set of analyses.
■ We show that
✦ the theoretical properties of the disclosure-proofing mechanism are designed to maintain analytical validity for trend analysis;
✦ in practice, the disclosure-proofed data are not biased;
✦ in practice, the time-series properties of the disclosure-proofed data remain intact.
Three-layer confidentiality protection in QWI (I)
■ Layer 1: Multiplicative noise infusion at the establishment level, with three very important properties:
1. Every establishment-level data item is distorted by some minimum amount.
2. Distortion amount and direction are time-invariant: data are always distorted in the same direction (increased or decreased) by the same percentage amount in every period.
3. When estimates are aggregated, the effects of the distortion cancel out for the vast majority of the estimates.
Three-layer confidentiality protection in QWI (II)
■ Layer 2: Weighting of estimates at higher levels (e.g., sub-state geography and industry detail)
✦ Weights are constructed such that state-level beginning-of-quarter employment for all private employers matches the first-month-in-quarter employment in QCEW.
✦ The establishment-level weight is used for every indicator in the QWIs.
Three-layer confidentiality protection in QWI (III)
■ Layer 3: Small-cell editing (suppression or synthesizing)
✦ Some aggregate estimates are based on fewer than three persons or establishments.
✦ Currently, these estimates are suppressed and a flag is set to indicate suppression.
✦ In the next version of the disclosure-proofing system, these estimates will be replaced with synthetic values.
Note: Editing is used only when the combination of noise infusion and weighting may not distort the publication data with a high enough probability to meet the criteria laid out above.
✦ Count data such as employment are subject to editing.
✦ Continuous dollar measures like payroll are not.
✦ Regardless of small-cell editing, all published estimates are still substantially influenced by the noise that was infused in the first layer of the protection system.
Implementation of multiplicative noise model
■ A random fuzz factor δ_j is drawn for each establishment j from the density
\[
p(\delta_j) =
\begin{cases}
(b - \delta_j)/(b - a)^2, & \delta_j \in [a, b] \\
(b + \delta_j - 2)/(b - a)^2, & \delta_j \in [2 - b, 2 - a] \\
0, & \text{otherwise}
\end{cases}
\]
with cumulative distribution function
\[
F(\delta_j) =
\begin{cases}
0, & \delta_j < 2 - b \\
(\delta_j + b - 2)^2 / \left[ 2 (b - a)^2 \right], & \delta_j \in [2 - b, 2 - a] \\
0.5, & \delta_j \in (2 - a, a) \\
0.5 + \left[ (b - a)^2 - (b - \delta_j)^2 \right] / \left[ 2 (b - a)^2 \right], & \delta_j \in [a, b] \\
1, & \delta_j > b
\end{cases}
\]
where a = 1 + c/100 and b = 1 + d/100 are constants chosen such that the true value is distorted by a minimum of c percent and a maximum of d percent.
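One way to draw from this distribution is to invert F piecewise. The sketch below is only an illustration of that idea, not the production procedure: the function names are hypothetical, and the values c = 10 and d = 25 are made up for the example, since the actual parameters are confidential.

```python
import math
import random

def fuzz_factor_from_uniform(u, c, d):
    """Map u in [0, 1] to a fuzz factor by inverting F piecewise.

    c and d are the minimum and maximum distortion in percent.
    The values used below are illustrative; the real ones are confidential.
    """
    a = 1 + c / 100
    b = 1 + d / 100
    if u < 0.5:
        # Lower ramp on [2 - b, 2 - a]: u = (delta + b - 2)^2 / (2 (b - a)^2)
        return 2 - b + (b - a) * math.sqrt(2 * u)
    # Upper ramp on [a, b]: u = 0.5 + ((b - a)^2 - (b - delta)^2) / (2 (b - a)^2)
    return b - (b - a) * math.sqrt(2 * (1 - u))

def draw_fuzz_factor(c, d, rng=random):
    """Draw one fuzz factor for an establishment (hypothetical helper)."""
    return fuzz_factor_from_uniform(rng.random(), c, d)

# Illustrative draws with made-up parameters c = 10, d = 25.
rng = random.Random(2005)
draws = [draw_fuzz_factor(10, 25, rng) for _ in range(5)]
```

Every draw lies in [2 − b, 2 − a] or [a, b], so no establishment receives a factor within c percent of 1, and the density is highest near the inner edges 2 − a and a.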
Distribution of Fuzz Factors
[Figure: density of the fuzz factor; horizontal axis marked at 0, 2−b, 2−a, 1, a, b, 2]
Distorting magnitudes and counts
The exact implementation depends on the type of estimate:
■ Magnitudes and counts
\[
X^*_{jt} = \delta_j X_{jt},
\]
where X_jt is an establishment-level statistic among B, E, M, F, A, S, H, R, FA, FS, FH, W_k, WFH, NA, NH, NR, NS.
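Because the same factor multiplies every quarter, period-to-period growth at an establishment is unchanged by the distortion. A minimal sketch with made-up numbers (the function name, the employment series, and the factor 1.12 are all hypothetical):

```python
def distort_series(values_by_quarter, delta_j):
    """Apply one establishment's fixed fuzz factor to every quarter."""
    return [delta_j * x for x in values_by_quarter]

true_employment = [100, 110, 121]  # hypothetical establishment, three quarters
delta_j = 1.12                     # hypothetical draw, held fixed over time
distorted = distort_series(true_employment, delta_j)

# Quarter-over-quarter growth ratios before and after distortion.
true_growth = [later / earlier
               for earlier, later in zip(true_employment, true_employment[1:])]
distorted_growth = [later / earlier
                    for earlier, later in zip(distorted, distorted[1:])]
```

The two growth lists agree up to floating-point rounding, while every level is off by 12 percent.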
Distorting ratios
Ratios are distorted by distorting numerators (magnitudes), but using undistorted denominators:
\[
ZY^*_{jt} = \frac{Y^*_{jt}}{B(Y)_{jt}} = \delta_j \frac{Y_{jt}}{B(Y)_{jt}}
\]
This method is used for
■ average earnings (ZW_k) and
■ average periods of nonemployment (ZN_) for various groups.
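As a small worked sketch of the ratio formula, the function below distorts only the numerator. The function name and the payroll and head-count figures are hypothetical; returning None for a zero denominator is an assumption about how an uncomputable ratio might be represented.

```python
def distorted_ratio(y_jt, b_y_jt, delta_j):
    """Return delta_j * Y_jt / B(Y)_jt, or None if the denominator is zero.

    Only the numerator is distorted; the denominator B(Y)_jt is used as is.
    """
    if b_y_jt == 0:
        return None  # assumption: an uncomputable ratio is reported as missing
    return delta_j * y_jt / b_y_jt

# Hypothetical establishment: payroll 500,000 over 50 workers, factor 0.9.
average_earnings_star = distorted_ratio(500_000, 50, 0.9)  # about 9,000
```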
Distorting flows
■ Distorted net job flow (JF) is computed at the aggregate level (k = geography, industry, or combination of the two for the appropriate age and sex categories) as the product of the aggregated, undistorted rate of growth and the aggregated distorted employment:
\[
JF^*_{kt} = G_{kt} \times \bar{E}^*_{kt} = JF_{kt} \times \frac{\bar{E}^*_{kt}}{\bar{E}_{kt}}
\]
■ The formulas for distorting gross job creation (JC) and job destruction (JD) are similar.
■ The same logic is used to distort wage changes for subgroups.
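The flow formula can be traced with a short sketch. The function name and the cell values are hypothetical, and the zero-employment guard is an assumption rather than something stated on the slide.

```python
def distorted_net_job_flow(jf_kt, e_bar_kt, e_bar_star_kt):
    """Return JF*_kt = G_kt * E-bar*_kt, with G_kt = JF_kt / E-bar_kt.

    jf_kt         : undistorted aggregate net job flow
    e_bar_kt      : undistorted aggregate employment
    e_bar_star_kt : distorted aggregate employment
    """
    if e_bar_kt == 0:
        return None  # assumption: the growth rate is undefined for an empty cell
    g_kt = jf_kt / e_bar_kt  # undistorted rate of growth
    return g_kt * e_bar_star_kt

# Hypothetical cell: net flow 50 on employment 1,000, distorted employment 1,040.
jf_star = distorted_net_job_flow(50, 1_000, 1_040)  # 0.05 * 1,040, about 52
```

The published flow inherits the cell's aggregate distortion (here 4 percent) while the implied growth rate stays at its undistorted value.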
Item suppression
■ Some disclosure risk remains for counts based on very few entities in a cell (fewer than three individuals or employers).
■ Variables affected are: B, E, M, F, A, S, H, R, FA, FH, FS, JC, JD, JF, FJC, FJD, FJF.
➜ Item suppression is based on the number of either workers or employers that contribute data for that item in a cell k in time period t, where a cell represents a particular combination of geography × industry × age × sex.
■ Because of noise infusion, no complementary suppressions are needed.
■ Some denominators may be zero, in which case the ratio or rate cannot be computed.
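The suppression rule can be sketched as a simple threshold check. The function name and the (value, flag) return shape are hypothetical, and the slide does not say how empty cells are treated, so this sketch does not special-case them.

```python
MIN_ENTITIES = 3  # "fewer than three individuals or employers" triggers suppression

def edit_small_cell(value, n_workers, n_employers):
    """Return (published_value, suppressed) for one item in cell k, period t.

    The return shape is an assumption; the slides only say that suppressed
    estimates are withheld and a flag is set.
    """
    if n_workers < MIN_ENTITIES or n_employers < MIN_ENTITIES:
        return None, True
    return value, False

# Hypothetical cells: one with two contributing employers, one with plenty.
small = edit_small_cell(17.0, n_workers=17, n_employers=2)     # suppressed
large = edit_small_cell(412.0, n_workers=412, n_employers=35)  # published
```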
Economic concepts in QCEW and QWI
The economic concepts underlying the Quarterly Census of Employment and Wages (QCEW) and the QWI statistics differ:
■ QCEW: employment on the 12th day of the first month in the quarter (QCEW_1,jt)
■ QWI: several measures of employment, derived from reports of quarterly employment and wages of individual workers at particular employers (state UI accounts).
■ Key definition: beginning-of-quarter employment B_jt counts employees at establishment j in both quarter t and quarter t − 1, who are therefore, by inference, employed on the 1st day of quarter t.
Protection by Weighting
■ QCEW_1,jt and B_jt are not identical because
✦ they do not refer to exactly the same point in time,
✦ the in-scope establishments differ slightly, and
✦ they are computed from different universe data.
■ Actual differences are captured by the QWI weighting scheme: the time series of adjustment weights is defined by
\[
w_t \sum_j b_{jt} = \sum_j QCEW_{1,jt}
\]
■ All variables are weighted by w_t.
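Solving the defining equation for w_t gives the ratio of the two totals. The sketch below assumes b_jt is establishment-level beginning-of-quarter employment; the function name and the three-establishment example are hypothetical.

```python
def adjustment_weight(b_by_establishment, qcew1_by_establishment):
    """Return w_t such that w_t * sum_j b_jt equals sum_j QCEW_1,jt.

    Both arguments list one value per establishment j for a single quarter t.
    """
    total_b = sum(b_by_establishment)
    if total_b == 0:
        return None  # assumption: no weight is defined without QWI employment
    return sum(qcew1_by_establishment) / total_b

# Hypothetical state with three establishments in quarter t.
b_jt = [40, 60, 100]      # beginning-of-quarter employment (assumed meaning of b_jt)
qcew1_jt = [44, 63, 103]  # QCEW first-month employment
w_t = adjustment_weight(b_jt, qcew1_jt)  # 210 / 200 = 1.05
```

A weight that differs from 1 means even an undistorted establishment count would not be published unchanged, which is why weighting acts as a layer of protection.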
Protection by Imputation
■ No actual confidential micro-data are measured at the establishment level in QWI.
■ Workplace characteristics (geography, industry) are multiply imputed for multi-unit employers.
➜ These establishments are protected by a form of synthetic data.
Protection
Table 1: Small Cells: B, Raw vs. Weighted
(a) Illinois

Unweighted                    Weighted count
count            0       1       2       3       4   5 or more
0            99.33    0.66    0.00    0.00    0.00        0.00
1             0.10   96.76    3.13    0.00    0.00        0.00
2             0.01    2.00   84.68   13.26    0.04        0.01
3             0.01    0.01    3.42   75.72   20.26        0.59
4             0.00    0.00    0.01    4.49   67.62       27.87
5 or more     0.00    0.00    0.00    0.01    0.59       99.39

Entries are row percentages. Total number of cells: 14,229,968. For details, see text.
Protection
Table 2: Small Cells: B, Undistorted vs. Distorted
(a) Illinois

Undistorted                   Distorted count
count            0       1       2       3       4   5 or more
0            99.86    0.14    0.00    0.00    0.00        0.00
1             0.91   95.75    3.34    0.00    0.00        0.00
2             0.00    4.27   87.25    8.47    0.00        0.00
3             0.00    0.00   10.69   77.20   12.11        0.00
4             0.00    0.00    0.00   14.73   67.49       17.78
5 or more     0.00    0.00    0.00    0.00    1.93       98.07

Entries are row percentages. Total number of cells: 14,229,968. Both comparisons are for weighted data. For details, see text.