DataCamp Analyzing US Census Data in R

US Census data: an overview
Kyle Walker, Instructor
Course overview

What you'll learn:
- How to acquire US Census data with the tidycensus R package
- How to wrangle US Census data with tidyverse tools
- How to use the tigris R package to acquire US Census Bureau boundary data
- How to visualize and map US Census Bureau data in R with ggplot2
About your instructor
- Fields: spatial demography and spatial data science
- R developer: tidycensus, tigris, and idbr packages
US Census Bureau data
The US Census Bureau API

To get started using US Census data in R, sign up for a Census API key. Then register the key with tidycensus:

    library(tidycensus)
    census_api_key("YOUR KEY GOES HERE", install = TRUE)

Setting install = TRUE saves the key to your .Renviron file, so it is available in future R sessions.

An example key looks like this: "rw6pozt48ur2ugc8kg69x5phdrtnuhb2cb1subd6"
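To confirm that R can see your key, check the CENSUS_API_KEY environment variable, which is where census_api_key() stores it. The helper has_census_key() below is a hypothetical name used only for illustration. After installing a key you may need to run readRenviron("~/.Renviron") or restart R before the helper returns TRUE.

```r
# Hypothetical helper (not part of tidycensus): TRUE if a Census API key is
# set in the CENSUS_API_KEY environment variable for this R session.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

# If the key was just installed, reload the environment file first:
# readRenviron("~/.Renviron")
has_census_key()
```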
Using decennial Census data with tidycensus

P001001 is the total population variable in the 2010 Census Summary File 1, so the request specifies year = 2010:

    state_pop <- get_decennial(geography = "state", variables = "P001001", year = 2010)
    head(state_pop)

    # A tibble: 6 x 4
      GEOID NAME       variable    value
      <chr> <chr>      <chr>       <dbl>
    1 01    Alabama    P001001   4779736
    2 02    Alaska     P001001    710231
    3 04    Arizona    P001001   6392017
    4 05    Arkansas   P001001   2915918
    5 06    California P001001  37253956
    6 08    Colorado   P001001   5029196
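get_decennial() wraps the Census Data API. The hypothetical build_census_url() below is a rough sketch of the kind of request involved. It assembles a query URL for the same variable and geography by hand and only builds the string; nothing is sent. The endpoint path (2010/dec/sf1) and the get/for/key parameters follow the public Census Data API, but the exact URL tidycensus generates may differ.

```r
# Hypothetical helper: build (but do not send) a Census Data API query URL.
build_census_url <- function(year, dataset, variables, geography, key = NULL) {
  url <- paste0(
    "https://api.census.gov/data/", year, "/", dataset,
    "?get=", paste(c("NAME", variables), collapse = ","),  # columns to return
    "&for=", geography, ":*"                               # all units of this geography
  )
  if (!is.null(key)) url <- paste0(url, "&key=", key)      # optional API key
  url
}

build_census_url(2010, "dec/sf1", "P001001", "state")
# returns "https://api.census.gov/data/2010/dec/sf1?get=NAME,P001001&for=state:*"
```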
Using ACS data with tidycensus

B19013_001 is median household income. ACS results include both an estimate and its margin of error (moe). The year = 2016 argument matches the 2012-2016 5-year estimates shown here:

    state_income <- get_acs(geography = "state", variables = "B19013_001", year = 2016)
    head(state_income)

    # A tibble: 6 x 5
      GEOID NAME       variable   estimate   moe
      <chr> <chr>      <chr>         <dbl> <dbl>
    1 01    Alabama    B19013_001    44758   314
    2 02    Alaska     B19013_001    74444   809
    3 04    Arizona    B19013_001    51340   231
    4 05    Arkansas   B19013_001    42336   234
    5 06    California B19013_001    63783   188
    6 08    Colorado   B19013_001    62520   287
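ACS values are sample estimates, so the moe column matters. ACS margins of error are published at the 90 percent confidence level. That means estimate +/- moe is a 90 percent interval, and moe / 1.645 recovers the standard error. The sketch below applies this to three rows copied from the output above. It uses base R, so it runs without an API key.

```r
# Three rows copied from the get_acs() output above.
state_income <- data.frame(
  NAME     = c("Alabama", "Alaska", "Arizona"),
  estimate = c(44758, 74444, 51340),
  moe      = c(314, 809, 231)
)

# 90% interval bounds and the implied standard error.
state_income$lower <- state_income$estimate - state_income$moe
state_income$upper <- state_income$estimate + state_income$moe
state_income$se    <- state_income$moe / 1.645

state_income
```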
Let's get started!
Basic tidycensus functionality
Kyle Walker, Instructor
Geography in tidycensus

- Legal entities, such as counties: geography = "county"
- Statistical entities, such as Census tracts: geography = "tract"

Available geographies: see the tidycensus documentation for the full list.
Geography and variables in tidycensus

    county_income <- get_acs(geography = "county", variables = "B19013_001", year = 2016)
    county_income

    # A tibble: 3,220 x 5
       GEOID NAME                     variable   estimate   moe
       <chr> <chr>                    <chr>         <dbl> <dbl>
     1 01001 Autauga County, Alabama  B19013_001    53099  2631
     2 01003 Baldwin County, Alabama  B19013_001    51365   991
     3 01005 Barbour County, Alabama  B19013_001    33956  2655
     4 01007 Bibb County, Alabama     B19013_001    39776  3306
     5 01009 Blount County, Alabama   B19013_001    46212  2443
     6 01011 Bullock County, Alabama  B19013_001    29335  5435
     7 01013 Butler County, Alabama   B19013_001    34315  2904
     8 01015 Calhoun County, Alabama  B19013_001    41954  1381
     9 01017 Chambers County, Alabama B19013_001    36027  1870
    10 01019 Cherokee County, Alabama B19013_001    38925  2598
    # ... with 3,210 more rows
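The GEOID column encodes the geographic hierarchy. A county GEOID is the two-digit state FIPS code followed by the three-digit county code, so "01001" is county 001 in Alabama (state 01). The base-R sketch below splits a few GEOIDs taken from the outputs in this section. Keeping GEOIDs as character strings preserves the leading zeros that converting to numeric would drop.

```r
# County GEOIDs from the outputs in this section (Alabama and Texas).
county_geoids <- c("01001", "01003", "48001")

state_fips  <- substr(county_geoids, 1, 2)  # first two digits: state
county_fips <- substr(county_geoids, 3, 5)  # last three digits: county within state

data.frame(GEOID = county_geoids, state_fips, county_fips)
```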
Geographic subsets in tidycensus

The state argument limits the results to one state. Naming the variable (hhincome = "B19013_001") replaces the variable ID with that name in the variable column.

    texas_income <- get_acs(geography = "county",
                            variables = c(hhincome = "B19013_001"),
                            state = "TX",
                            year = 2016)
    texas_income

    # A tibble: 254 x 5
       GEOID NAME                    variable estimate   moe
       <chr> <chr>                   <chr>       <dbl> <dbl>
     1 48001 Anderson County, Texas  hhincome    42146  2539
     2 48003 Andrews County, Texas   hhincome    70121  7053
     3 48005 Angelina County, Texas  hhincome    44185  2107
     4 48007 Aransas County, Texas   hhincome    44851  4261
     5 48009 Archer County, Texas    hhincome    62407  5368
     6 48011 Armstrong County, Texas hhincome    65000  9415
     7 48013 Atascosa County, Texas  hhincome    53181  4114
     8 48015 Austin County, Texas    hhincome    56681  4903
     9 48017 Bailey County, Texas    hhincome    40589  8438
    10 48019 Bandera County, Texas   hhincome    55434  4503
    # ... with 244 more rows
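The renaming done by a named variables vector can be pictured as a lookup from ID to name. The base-R sketch below mimics that relabeling on a few hypothetical rows of variable IDs. It illustrates the idea only and is not how tidycensus implements it internally.

```r
# A named vector pairs a friendly label with each Census variable ID.
vars <- c(hhincome = "B19013_001", medage = "B01002_001")

# Hypothetical variable IDs as they might appear in a long-format result.
ids <- c("B19013_001", "B01002_001", "B19013_001")

# Find each ID's position in the named vector, then take the matching name.
labels <- names(vars)[match(ids, vars)]
labels
# returns "hhincome" "medage" "hhincome"
```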
Wide data with tidycensus

With output = "wide", each variable gets its own pair of columns: the estimate (suffix E) and the margin of error (suffix M). B01002_001 is median age.

    get_acs(geography = "county",
            variables = c(hhincome = "B19013_001", medage = "B01002_001"),
            state = "TX",
            output = "wide",
            year = 2016)

    # A tibble: 254 x 6
       GEOID NAME                    hhincomeE hhincomeM medageE medageM
       <chr> <chr>                       <dbl>     <dbl>   <dbl>   <dbl>
     1 48001 Anderson County, Texas      42146      2539    38.9     0.5
     2 48003 Andrews County, Texas       70121      7053    31.2     0.8
     3 48005 Angelina County, Texas      44185      2107    36.7     0.3
     4 48007 Aransas County, Texas       44851      4261    50.7     1.1
     5 48009 Archer County, Texas        62407      5368    44.1     0.7
     6 48011 Armstrong County, Texas     65000      9415    45.9     2.8
     7 48013 Atascosa County, Texas      53181      4114    35.4     0.2
     8 48015 Austin County, Texas        56681      4903    40.8     0.4
     9 48017 Bailey County, Texas        40589      8438    34.4     1.1
    10 48019 Bandera County, Texas       55434      4503    51.3     0.9
    # ... with 244 more rows
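Wide output places each estimate beside its margin of error, so derived columns take a single vectorized step. One example is the coefficient of variation: the standard error (MOE / 1.645, for 90 percent MOEs) divided by the estimate. The sketch below computes it for three Texas counties copied from the output above. The column name hhincome_cv is chosen for illustration. Higher values flag less reliable estimates.

```r
# Three rows copied from the wide output above.
tx <- data.frame(
  NAME      = c("Anderson County, Texas", "Andrews County, Texas", "Bailey County, Texas"),
  hhincomeE = c(42146, 70121, 40589),
  hhincomeM = c(2539, 7053, 8438)
)

# Coefficient of variation: standard error divided by the estimate.
tx$hhincome_cv <- (tx$hhincomeM / 1.645) / tx$hhincomeE

# Least reliable estimates first.
tx[order(tx$hhincome_cv, decreasing = TRUE), ]
```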
Let's practice!
Searching for data with tidycensus
Kyle Walker, Instructor
Searching for Census variables

To find Census variable IDs, use:
- Online resources like Census Reporter
- Built-in variable searching in tidycensus
Choosing a dataset to search

load_variables() returns every variable for a given year and dataset. Setting cache = TRUE stores the result locally, so later calls load faster.

    v16 <- load_variables(year = 2016, dataset = "acs5", cache = TRUE)
    v16

    # A tibble: 22,815 x 3
       name       label                                  concept
       <chr>      <chr>                                  <chr>
     1 B00001_001 Estimate!!Total                        UNWEIGHTED...
     2 B00002_001 Estimate!!Total                        UNWEIGHTED...
     3 B01001_001 Estimate!!Total                        SEX BY AGE
     4 B01001_002 Estimate!!Total!!Male                  SEX BY AGE
     5 B01001_003 Estimate!!Total!!Male!!Under 5 years   SEX BY AGE
     6 B01001_004 Estimate!!Total!!Male!!5 to 9 years    SEX BY AGE
     7 B01001_005 Estimate!!Total!!Male!!10 to 14 years  SEX BY AGE
     8 B01001_006 Estimate!!Total!!Male!!15 to 17 years  SEX BY AGE
     9 B01001_007 Estimate!!Total!!Male!!18 and 19 years SEX BY AGE
    10 B01001_008 Estimate!!Total!!Male!!20 years        SEX BY AGE
    # ... with 22,805 more rows
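Besides browsing v16 interactively, you can search its label or concept columns for keywords. The sketch below runs that kind of search with base R grepl() on a few rows copied from the output above; v_sample is an illustrative name. With the full v16 table, the same expression should work unchanged, since it has the same name and label columns.

```r
# A few rows copied from the load_variables() output above.
v_sample <- data.frame(
  name    = c("B01001_001", "B01001_002", "B01001_003", "B01001_004"),
  label   = c("Estimate!!Total", "Estimate!!Total!!Male",
              "Estimate!!Total!!Male!!Under 5 years",
              "Estimate!!Total!!Male!!5 to 9 years"),
  concept = "SEX BY AGE",
  stringsAsFactors = FALSE
)

# Keyword search on the label column; fixed = TRUE matches the text literally.
hits <- v_sample[grepl("Under 5", v_sample$label, fixed = TRUE), "name"]
hits
# returns "B01001_003"
```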
Filtering a variables dataset

    library(tidyverse)
    B19001 <- filter(v16, str_detect(name, "B19001"))
    B19001

    # A tibble: 170 x 3
       name        label                 concept
       <chr>       <chr>                 <chr>
     1 B19001_001E Estimate!!Total       HOUSEHOLD INCOME…
     2 B19001_002E ...Less than $10,000  HOUSEHOLD INCOME…
     3 B19001_003E ...$10,000 to $14,999 HOUSEHOLD INCOME…
     4 B19001_004E ...$15,000 to $19,999 HOUSEHOLD INCOME…
     5 B19001_005E ...$20,000 to $24,999 HOUSEHOLD INCOME…
     6 B19001_006E ...$25,000 to $29,999 HOUSEHOLD INCOME…
     7 B19001_007E ...$30,000 to $34,999 HOUSEHOLD INCOME…
     8 B19001_008E ...$35,000 to $39,999 HOUSEHOLD INCOME…
     9 B19001_009E ...$40,000 to $44,999 HOUSEHOLD INCOME…
    10 B19001_010E ...$45,000 to $49,999 HOUSEHOLD INCOME…
    # ... with 160 more rows
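str_detect(name, "B19001") is an unanchored match. It therefore also returns the race-iteration tables B19001A through B19001I, which accounts for the 170 rows (17 variables in each of 10 tables). Anchoring the pattern and including the underscore keeps only table B19001 itself. The sketch below uses base R grepl() on a hypothetical vector of names; str_detect() accepts the same regular expressions.

```r
# Hypothetical variable names: two from B19001, two race iterations, one other table.
names_sample <- c("B19001_001", "B19001_002", "B19001A_001", "B19001B_001", "B19013_001")

grepl("B19001", names_sample)    # unanchored: race iterations match too
# returns TRUE TRUE TRUE TRUE FALSE
grepl("^B19001_", names_sample)  # anchored: table B19001 only
# returns TRUE TRUE FALSE FALSE FALSE
```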
ACS variable structure

Anatomy of an ACS variable, using B19001_002E as an example:
- B: refers to a base (detailed) table. Other prefixes: C (collapsed tables), DP (data profiles), S (subject tables).
- 19001: the table ID
- 002: the variable code within the table
- E: refers to the estimate. The suffix is optional in tidycensus functions, which return both the estimate (E) and the margin of error (M) for each variable.
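The anatomy above can be expressed as a regular expression. The hypothetical parse_acs_variable() below is a sketch that handles only B and C table IDs in the shape shown. It allows an optional race-iteration letter after the table number, requires a three-digit variable code, and accepts an optional E or M suffix. DP and S variables use different formats and are rejected.

```r
# Hypothetical helper: split one B or C ACS variable ID into its parts.
# Assumed shape: <B|C><table number><optional letter>_<3 digits><optional E|M>
parse_acs_variable <- function(id) {
  pattern <- "^(B|C)([0-9]+[A-Z]?)_([0-9]{3})(E|M)?$"
  if (!grepl(pattern, id)) stop("Unrecognized ACS variable ID: ", id)
  list(
    prefix   = sub(pattern, "\\1", id),  # table type
    table    = sub(pattern, "\\2", id),  # table ID
    variable = sub(pattern, "\\3", id),  # variable code within the table
    suffix   = sub(pattern, "\\4", id)   # "E", "M", or "" when omitted
  )
}

str(parse_acs_variable("B19001_002E"))
```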
Let's practice!
Visualizing Census data with ggplot2
Kyle Walker, Instructor
ggplot2: a layered grammar of graphics in R
Example: plotting income by state

    library(tidycensus)
    library(tidyverse)

    # 2016 1-year ACS median household income for selected northeastern states
    ne_income <- get_acs(geography = "state",
                         variables = "B19013_001",
                         survey = "acs1",
                         year = 2016,
                         state = c("ME", "NH", "VT", "MA", "RI", "CT", "NY"))

    ggplot(ne_income, aes(x = estimate, y = NAME)) +
      geom_point()
Customizing ggplot2 graphics of ACS data

    ggplot(ne_income, aes(x = estimate, y = reorder(NAME, estimate))) +
      geom_point(color = "navy", size = 4) +
      scale_x_continuous(labels = scales::dollar) +
      theme_minimal(base_size = 14) +
      labs(x = "2016 ACS estimate",
           y = "",
           title = "Median household income by state")
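reorder(NAME, estimate) converts NAME to a factor whose levels are sorted by estimate. ggplot2 draws a discrete axis in level order, which is why the states appear sorted by income. The sketch below shows the resulting level order for three states from the earlier ACS output. It also includes a base-R approximation of the dollar labels that scales::dollar produces; this is an illustration, not the scales implementation.

```r
# Three states and estimates copied from the earlier get_acs() output.
income <- data.frame(
  NAME     = c("Alabama", "Alaska", "Arizona"),
  estimate = c(44758, 74444, 51340),
  stringsAsFactors = FALSE
)

# Factor levels sorted by ascending estimate; the first level sits at the bottom of a y axis.
ordered_names <- reorder(income$NAME, income$estimate)
levels(ordered_names)
# returns "Alabama" "Arizona" "Alaska"

# Rough base-R stand-in for dollar-formatted axis labels.
dollar_labels <- paste0("$", formatC(income$estimate, format = "d", big.mark = ","))
dollar_labels
# returns "$44,758" "$74,444" "$51,340"
```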
Let's practice!