welcome to the course
play

Welcome to the course! Joining Data in R with dplyr Var_1 Var_2 - PowerPoint PPT Presentation

JOINING DATA IN R WITH DPLYR Welcome to the course! Joining Data in R with dplyr Var_1 Var_2 Var_3 Var_4 obs_1 33 3 54 obs_2 20 90 22 obs_3 58 12 15 obs_4 83 81 5 > mean(df$Var_2) [1] 48.5 Joining Data in R with dplyr


  1. JOINING DATA IN R WITH DPLYR Welcome to the course!

  2. Joining Data in R with dplyr Var_1 Var_2 Var_3 Var_4 obs_1 33 3 54 obs_2 20 90 22 obs_3 58 12 15 obs_4 83 81 5 > mean(df$Var_2) [1] 48.5

  3. Joining Data in R with dplyr Var_1 Var_1 Var_2 Var_2 Var_3 Var_3 Var_4 Var_4 Var_5 obs_1 obs_1 33 33 3 3 54 54 87 obs_2 obs_2 20 20 90 90 22 22 42 obs_3 obs_3 58 58 12 12 15 15 73 obs_4 obs_4 83 83 81 81 5 5 88 > df$Var_5 <- df$Var_2 + df$Var_4

  4. Joining Data in R with dplyr

  5. Joining Data in R with dplyr Course outline ● Chapter 1 - Mutating joins + = ● Chapter 2 - Filtering joins and set operations + = ● Chapter 3 - Assembling data + = ● Chapter 4 - Advanced joining ● Chapter 5 - Case study

  6. Joining Data in R with dplyr ● arrange() ● filter() ● select() ● mutate() ● summarise()

  7. Joining Data in R with dplyr merge()

  8. Joining Data in R with dplyr Benefits of dplyr join functions ● Always preserve row order ● Intuitive syntax ● Can be applied to databases, spark, etc.

  9. Joining Data in R with dplyr

  10. JOINING DATA IN R WITH DPLYR Let’s practice!

  11. JOINING DATA IN R WITH DPLYR Keys

  12. Joining Data in R with dplyr

  13. Joining Data in R with dplyr > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar # Example join output name band plays 1 Mick Stones <NA> 2 John Beatles Guitar 3 Paul Beatles Bass 4 Keith <NA> Guitar

  14. Joining Data in R with dplyr Keys primary foreign key key > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar # Example join output name band plays 1 Mick Stones <NA> 2 John Beatles Guitar 3 Paul Beatles Bass 4 Keith <NA> Guitar

  15. Joining Data in R with dplyr Keys primary foreign key key > names2 > plays2 name surname band name surname plays 1 John Coltrane NA 1 John Lennon Guitar 2 John Lennon Beatles 2 Paul McCartney Bass 3 Paul McCartney Beatles 3 Keith Richards Guitar # Example join output name surname band plays 1 John Coltrane <NA> <NA> 2 John Lennon Beatles Guitar 3 Paul McCartney Beatles Bass 4 Keith Richards <NA> Guitar

  16. JOINING DATA IN R WITH DPLYR Let’s practice!

  17. JOINING DATA IN R WITH DPLYR Joins

  18. Joining Data in R with dplyr left_join() > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar > left_join(names, plays, by = "name") name band plays 1 Mick Stones <NA> table to table to key column name(s) as a 2 John Beatles Guitar augment augment with character string 3 Paul Beatles Bass rows from values from first table second table

  19. Joining Data in R with dplyr Multi-column keys > names2 > plays2 name surname band name surname plays 1 John Coltrane NA 1 John Lennon Guitar 2 John Lennon Beatles 2 Paul McCartney Bass 3 Paul McCartney Beatles 3 Keith Richards Guitar > left_join(names2, plays2, by = c("name", "surname")) name surname band plays 1 John Coltrane <NA> <NA> vector of key 2 John Lennon Beatles Guitar column name(s) 3 Paul McCartney Beatles Bass

  20. Joining Data in R with dplyr right_join() > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar > right_join(names, plays, by = "name") name band plays 1 John Beatles Guitar 2 Paul Beatles Bass 3 Keith <NA> Guitar rows from values from second table first table

  21. Joining Data in R with dplyr "tables" ● data frames ● tibbles ( tbl_df ) ● tbl references

  22. Joining Data in R with dplyr tibble vs. data frame > library(tibble) > # A data frame 
 > as.tibble(mtcars) > mtcars # A tibble: 32 × 11 > # entire data frame prints, leaving only last values of last mpg cyl disp hp drat wt qsec vs am gear columns (which have been wrapped around to appear below the first * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> columns) visible in the window, as below 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 Camaro Z28 3 4 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 Pontiac Firebird 3 2 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 Fiat X1-9 4 1 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 Porsche 914-2 5 2 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 Lotus Europa 5 2 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 Ford Pantera L 5 4 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 Ferrari Dino 5 6 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 Maserati Bora 5 8 # ... with 22 more rows, and 1 more variables: carb <dbl> Volvo 142E 4 2

  23. Joining Data in R with dplyr "tables" ● data frames ● tibbles ( tbl_df ) ● tbl references

  24. JOINING DATA IN R WITH DPLYR Let’s practice!

  25. JOINING DATA IN R WITH DPLYR Mutating joins

  26. Joining Data in R with dplyr mutate() > pressure[1:4, ] temperature pressure 1 0 0.0002 2 20 0.0012 3 40 0.0060 4 60 0.0300 > mutate(pressure[1:4, ], fahrenheit = temperature * 1.8 + 32) temperature pressure fahrenheit 1 0 0.0002 32 2 20 0.0012 68 3 40 0.0060 104 4 60 0.0300 140

  27. Joining Data in R with dplyr left_join() > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar > left_join(names, plays, by = "name") name band plays 1 Mick Stones <NA> 2 John Beatles Guitar 3 Paul Beatles Bass

  28. Joining Data in R with dplyr right_join() > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar > right_join(names, plays, by = "name") name band plays 1 John Beatles Guitar 2 Paul Beatles Bass 3 Keith <NA> Guitar

  29. Joining Data in R with dplyr inner_join() > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar > inner_join(names, plays, by = "name") name band plays 1 John Beatles Guitar 2 Paul Beatles Bass

  30. Joining Data in R with dplyr full_join() > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar > full_join(names, plays, by = "name") name band plays 1 Mick Stones <NA> 2 John Beatles Guitar 3 Paul Beatles Bass 4 Keith <NA> Guitar

  31. Joining Data in R with dplyr Syntax > left_join( names, plays, by = "name") > right_join(names, plays, by = "name") > inner_join(names, plays, by = "name") > full_join( names, plays, by = "name") x y by %>%

  32. Joining Data in R with dplyr Pipe operator > x <- 1:10 > x %>% sum() [1] 55 > sum(x) [1] 55 > abs(diff(range(x))) [1] 9 > x %>% > range() %>% > diff() %>% > abs() [1] 9

  33. Joining Data in R with dplyr dplyr and pipes > names > plays name band name plays 1 Mick Stones 1 John Guitar 2 John Beatles 2 Paul Bass 3 Paul Beatles 3 Keith Guitar > names %>% + full_join(plays, by = "name") %>% + mutate(missing_info = is.na(band) | is.na(plays)) %>% + filter(missing_info == TRUE) %>% + select(name, band, plays) name band plays 1 Mick Stones <NA> 2 Keith <NA> Guitar

  34. Joining Data in R with dplyr Summary ● left_join() = ● right_join() = ● inner_join() = ● full_join() =

  35. JOINING DATA IN R WITH DPLYR Let’s practice!

Recommend


More recommend