
Catherine Fitch and Steven Ruggles: Family History Technology Workshop, February 2018



  1. Catherine Fitch and Steven Ruggles. Family History Technology Workshop, February 2018. [Photo: Keypunch operators, 1940 Census]

  2. Each person in the world creates a Book of Life. This Book starts with birth and ends with death. Record linkage is the name of the process of assembling the pages of this Book into a volume. - Halbert L. Dunn, 1946

  3. Big Data
     Designed Data: • Censuses • Surveys • Remote sensing (satellite imagery, weather stations)
     Transactional or “Organic” Data: • Administrative (Social Security, Medicare, Military, Taxes) • Commercial (Credit ratings, Phone records, Social Media)

  4. The biggest payoff will lie in new combinations of designed data and organic data, not in one type alone - Robert Groves, 2011

  5. Organic/Transactional data is voluminous, but • shallow (few variables) and • non-representative. Both problems can be overcome by linking to Designed data.

  6. National Longitudinal Research Infrastructure: Life histories for each person • Censuses • Social Security • Military records (draft, enlistment) • Vital records (birth, death, marriage, divorce) • Health (Medicare, Medicaid) • Surveys

  7. National Longitudinal Research Infrastructure: Link across 5+ generations, 1850-2020

  8. The First Microdata: The 1960 Census Samples. Distributed on 13 Univac tapes (or 18,000 punchcards). [Image: Cover, 1960 Census Microdata Codebook]

  9. Historical Data

  10. 1991: Eight Census Years 1850-1980 All Incompatible (except 1960 and 1970)

  11. 1991 IPUMS proposal: An integrated database for 1880, 1900, 1910, 1940, 1950, 1960, 1970, 1980, 1990 • Harmonized codes • Consistent record layout • Integrated documentation • No loss of information

  12. [Figure: Percent Female; Scientists and Engineers, 1900-2005. Source: IPUMS. Graph from “A Century of Women in Science and Engineering,” History Day project by Abby Norling-Ruggles, age 12. Y-axis: Percent Female; x-axis: Year; series: Scientists, Engineers.]

  13. [Figure: IPUMS Data Dissemination, 1995-2017. Y-axis: Gigabytes per week. Annotation: Five terabytes distributed each week.]

  14. [Figure: Registered IPUMS data users, 1995-2017. Y-axis: Number of users. Annotation: 189,000 registered IPUMS users.]

  15. [Figure: Annual citations of IPUMS Data (Google Scholar), 1995-2015. Y-axis: Annual citations. Annotation: A new paper every four hours.]

  16. usa.ipums.org

  17. [Figure: U.S. public use microdata available for research, 1973-2018 (number of person records). Series: Public-use microdata from Census Bureau; Microdata digitized from historical manuscripts. Annotations: 1880, 1940.]

  18. [Figure: Integrated U.S. microdata available for research, 1970-2018 (number of person records). Series: Public-use IPUMS data from Census; IPUMS microdata digitized from historical manuscripts; IPUMS-format microdata in the Census Research Data Centers. Annotation: We are here.]

  19. Federal Statistical Research Data Centers: 30 locations and growing

  20. • Census Longitudinal Infrastructure Project • IPUMS Multigenerational Longitudinal Panel

  21. The Census Longitudinal Infrastructure Project (CLIP). [Photo labels: Sanders, Ferrie, O’Hara, Alexander. 1940 Linking Meeting, Minneapolis, February 10-11, 2014]

  22. [Diagram: CLIP Linking Strategy. Sources shown: 1940 Census; 2000 and 2010 Censuses; SSA Numident; WW II Military; Selective Service; Medicare, Medicaid, HUD; Federal Surveys; Deaths, 1940-2020; Private Vendors.]

  23. Capturing names in the 1990 census through OCR

  24. Multigenerational Longitudinal Panel: Hacker, Ruggles, Warren, Fitch, Sobek, Roberts, Bailey, Goeken, Price

  25. [Diagram: IPUMS Linked Representative Samples. The 100% 1880 Census shown with the 1850, 1860, 1870, 1900, 1910, 1920, and 1930 IPUMS samples. Final version June 2010.]

  26. [Diagram: Multigenerational Longitudinal Panel. Censuses shown: 1850, 1860, 1870, 1880, 1900, 1910, 1920, 1930, 1940.]

  27. [Diagram: MLP Linking Strategy. Sources shown: CLIP; 1940 Census; Numident; WW I Military; Genealogies?; Deaths; Censuses; Births; Marriages. Date ranges shown: 1881-2020, 1850-1930, 1881-1930.]

  28. National Longitudinal Research Infrastructure: Life histories for each person • Impact of early life conditions on later health and well-being • Social, Economic, Geographic Mobility • Life course transitions

  29. National Longitudinal Research Infrastructure: Link across 5+ generations • Impact of forebears on health and well-being • Socioeconomic mobility across generations: Do we have dynasties?

  30. National Longitudinal Research Infrastructure. Understanding the great transformations: demographic transition, family transition, urbanization, immigration, industrialization

  31. Higher prior exposure to water-borne lead among male World War Two U.S. Army enlistees was associated with lower intelligence test scores. Exposure was proxied by urban residence and the water pH levels of the cities where enlistees lived in 1930.

  32. National Longitudinal Research Infrastructure: • Impact of lead exposure on Alzheimer’s disease • Effect of early-life cognitive capacity on later economic success • Transmission of health and well-being over multiple generations • Effects of early-life income support on later outcomes

  33. Thank You.
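
Illustrative note on slide 2: the record linkage that Dunn describes, assembling one person's records from different sources, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method used by CLIP or the Multigenerational Longitudinal Panel. The `Person` fields, the thresholds `min_similarity` and `max_year_gap`, and the sample records are all hypothetical, and the name comparison uses Python's standard `difflib` rather than any linkage-specific algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    """One hypothetical census row; the field names are illustrative."""
    name: str
    birth_year: int
    birthplace: str


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(earlier, later, min_similarity=0.85, max_year_gap=2):
    """Pair each earlier record with its single plausible later match.

    A candidate must share the birthplace, have a birth year within
    max_year_gap, and reach min_similarity on the name. Earlier records
    with zero or several candidates are left unlinked (assumed rule).
    """
    links = []
    for old in earlier:
        candidates = [
            new for new in later
            if new.birthplace == old.birthplace
            and abs(new.birth_year - old.birth_year) <= max_year_gap
            and name_similarity(old.name, new.name) >= min_similarity
        ]
        if len(candidates) == 1:
            links.append((old, candidates[0]))
    return links


# Made-up records standing in for two census years.
census_a = [
    Person("John Andersen", 1902, "Minnesota"),
    Person("Mary Olson", 1910, "Wisconsin"),
]
census_b = [
    Person("John Anderson", 1903, "Minnesota"),
    Person("Mary Olsen", 1910, "Wisconsin"),
    Person("Mary Olson", 1911, "Wisconsin"),
]

links = link_records(census_a, census_b)
for old, new in links:
    print(f"{old.name} ({old.birth_year}) -> {new.name} ({new.birth_year})")
```

In this sketch the spelling variant "Andersen"/"Anderson" is linked, while "Mary Olson" is left unlinked because two later records are equally plausible. Discarding ambiguous matches trades coverage for fewer false links, which is one possible design choice among several.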
