JOINING DATA IN R WITH DPLYR Binds
● rbind()
● cbind()
● bind_rows()
● bind_cols()
bind_rows()

> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr

> band2
     name  surname
1    Mick   Jagger
2   Keith Richards
3 Charlie    Watts
4  Ronnie     Wood

> bind_rows(band1, band2)
     name   surname
1    John    Lennon
2    Paul McCartney
3  George  Harrison
4   Ringo     Starr
5    Mick    Jagger
6   Keith  Richards
7 Charlie     Watts
8  Ronnie      Wood

band1, band2: tables to combine
bind_cols()

> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr

> plays1
  instrument born
1     Guitar 1940
2       Bass 1942
3     Guitar 1943
4      Drums 1940

> bind_cols(band1, plays1)
    name   surname instrument born
1   John    Lennon     Guitar 1940
2   Paul McCartney       Bass 1942
3 George  Harrison     Guitar 1943
4  Ringo     Starr      Drums 1940
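The two binds shown so far can be reproduced with a short script. This is a minimal sketch that assumes dplyr is installed; band1, band2, and plays1 are rebuilt from the printed output, and the result names all_members and beatles are made up for illustration.

```r
library(dplyr)

# Rebuild the example tables from the printed slide output.
band1 <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  surname = c("Lennon", "McCartney", "Harrison", "Starr"),
  stringsAsFactors = FALSE
)
band2 <- data.frame(
  name = c("Mick", "Keith", "Charlie", "Ronnie"),
  surname = c("Jagger", "Richards", "Watts", "Wood"),
  stringsAsFactors = FALSE
)
plays1 <- data.frame(
  instrument = c("Guitar", "Bass", "Guitar", "Drums"),
  born = c(1940, 1942, 1943, 1940),
  stringsAsFactors = FALSE
)

# Stack the rows of band2 under band1; columns are matched by name.
all_members <- bind_rows(band1, band2)

# Put the columns of plays1 next to band1; rows are matched by position,
# so both tables need the same number of rows in the same order.
beatles <- bind_cols(band1, plays1)

print(all_members)
print(beatles)
```

Because bind_cols() pairs rows purely by position, it is only safe when both tables are known to be sorted the same way.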
Benefits of bind_rows() and bind_cols()
● Faster than rbind() and cbind()
● Return a tibble
● Can handle lists of data frames
● .id argument labels each row with the table it came from
bind_rows()

> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr

> band2
     name  surname
1    Mick   Jagger
2   Keith Richards
3 Charlie    Watts
4  Ronnie     Wood

> bind_rows(Beatles = band1, Stones = band2, .id = "band")
     band    name   surname
1 Beatles    John    Lennon
2 Beatles    Paul McCartney
3 Beatles  George  Harrison
4 Beatles   Ringo     Starr
5  Stones    Mick    Jagger
6  Stones   Keith  Richards
7  Stones Charlie     Watts
8  Stones  Ronnie      Wood

Beatles =, Stones =: label names for the new column
.id = "band": column name for the new column
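The same labelling also works when the tables arrive as a named list, which is the "lists of data frames" case mentioned earlier. The sketch below assumes dplyr is installed; the names bands and labelled are hypothetical.

```r
library(dplyr)

band1 <- tibble(
  name = c("John", "Paul", "George", "Ringo"),
  surname = c("Lennon", "McCartney", "Harrison", "Starr")
)
band2 <- tibble(
  name = c("Mick", "Keith", "Charlie", "Ronnie"),
  surname = c("Jagger", "Richards", "Watts", "Wood")
)

# A named list: the element names become the labels in the new column.
bands <- list(Beatles = band1, Stones = band2)

# .id gives the name of the column that records each row's source table.
labelled <- bind_rows(bands, .id = "band")

print(labelled)
```

Passing a list is convenient when the number of tables is not known in advance, for example after reading many files in a loop.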
JOINING DATA IN R WITH DPLYR Let’s practice!
JOINING DATA IN R WITH DPLYR Build a better data frame
● data.frame()
● as.data.frame()
● data_frame()
● as_data_frame()
data.frame() defaults
● Changes strings to factors (the default before R 4.0.0)
● Adds row names
● Changes unusual column names (check.names = TRUE)
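These defaults can be seen in base R alone. In this sketch the column names are invented for illustration, and stringsAsFactors = TRUE is set explicitly as an assumption so that the string-to-factor conversion appears on current R versions too.

```r
# Column names that are not syntactically valid R names.
df_default <- data.frame(
  `first name` = c("John", "Paul"),
  `2nd` = c("Lennon", "McCartney"),
  stringsAsFactors = TRUE   # assumption: reproduce the older default
)

names(df_default)             # names are rewritten by check.names = TRUE
class(df_default$first.name)  # strings were converted to a factor
rownames(df_default)          # automatic row names "1", "2"

# Switching the defaults off keeps names and types as supplied.
df_plain <- data.frame(
  `first name` = c("John", "Paul"),
  `2nd` = c("Lennon", "McCartney"),
  stringsAsFactors = FALSE,
  check.names = FALSE
)

names(df_plain)
class(df_plain[["first name"]])
```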
data_frame()

> data_frame(
+   Beatles = c("John", "Paul", "George", "Ringo"),
+   Stones = c("Mick", "Keith", "Charlie", "Ronnie"),
+   Zeppelins = c("Robert", "Jimmy", "John Paul", "John")
+ )
# A tibble: 4 × 3
  Beatles Stones  Zeppelins
  <chr>   <chr>   <chr>
1 John    Mick    Robert
2 Paul    Keith   Jimmy
3 George  Charlie John Paul
4 Ringo   Ronnie  John
data_frame()

data_frame() will not…
● Change the data type of vectors (e.g. strings to factors)
● Add row names
● Change column names
● Recycle vectors greater than length one
data_frame()

● Evaluates arguments lazily, in order

> data_frame(
+   numbers = 1:5,
+   squares = numbers ^ 2
+ )
# A tibble: 5 × 2
  numbers squares
    <int>   <dbl>
1       1       1
2       2       4
3       3       9
4       4      16
5       5      25

● Returns a tibble
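The in-order evaluation and the recycling rule can be checked side by side. This sketch uses tibble(), which current tibble releases recommend in place of the deprecated data_frame(); it assumes dplyr is installed (dplyr re-exports tibble()). The object names are hypothetical.

```r
library(dplyr)

# Later arguments can refer to earlier ones because they are
# evaluated in order.
squares_tbl <- tibble(
  numbers = 1:5,
  squares = numbers^2
)

# Length-one vectors are recycled to the full column length.
tagged <- tibble(numbers = 1:5, group = "a")

# Longer vectors are not recycled: a length-2 column next to a
# length-6 column is an error rather than a silent repeat.
recycle_result <- tryCatch(
  tibble(numbers = 1:6, pair = 1:2),
  error = function(e) "error"
)

# Base data.frame() recycles the shorter column without complaint.
base_df <- data.frame(numbers = 1:6, pair = 1:2)

print(squares_tbl)
print(recycle_result)
print(base_df)
```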
as_data_frame()
JOINING DATA IN R WITH DPLYR Let’s practice!
JOINING DATA IN R WITH DPLYR Working with data types
> 1 + 1
[1] 2
> "one" + "one"
Error in "one" + "one" : non-numeric argument to binary operator
[Diagram: a Character column combined with a Number column. Is the result Character? Number?]
Atomic data types

Logical
> typeof(TRUE)
[1] "logical"

Character (i.e. string)
> typeof("hello")
[1] "character"

Double (i.e. numeric w/ decimal)
> typeof(3.14)
[1] "double"

Integer (i.e. numeric w/o decimal)
> typeof(1L)
[1] "integer"

Complex
> typeof(1 + 2i)
[1] "complex"

Raw
> typeof(raw(1))
[1] "raw"
Classes

> x <- c(1L, 2L, 3L, 2L)
> x
[1] 1 2 3 2
> typeof(x)
[1] "integer"
> class(x)
[1] "integer"
> attributes(x) <- list(class = "factor", levels = c("A", "B", "C", "D"))
> x
[1] A B C B
Levels: A B C D
> typeof(x)
[1] "integer"
> class(x)
[1] "factor"

Integer codes map to levels: 1L = A, 2L = B, 3L = C, 4L = D
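Before combining tables it can help to list the type and class of every column. The following is a minimal base R sketch; the members table and its role column are invented for illustration.

```r
# factor() is the usual way to build the object assembled by hand above.
x <- factor(c("A", "B", "C", "B"), levels = c("A", "B", "C", "D"))
typeof(x)  # storage type
class(x)   # class attribute

# A hypothetical table with one character, one integer, one factor column.
members <- data.frame(
  name = c("John", "Paul"),
  born = c(1940L, 1942L),
  role = factor(c("Guitar", "Bass")),
  stringsAsFactors = FALSE
)

# Storage type of each column.
col_types <- vapply(members, typeof, character(1))

# First class of each column; a factor is stored as integer
# but reports the class "factor".
col_classes <- vapply(members, function(col) class(col)[1], character(1))

print(col_types)
print(col_classes)
```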
JOINING DATA IN R WITH DPLYR Let’s practice!
JOINING DATA IN R WITH DPLYR dplyr's coercion rules
[Diagram: a Character column combined with a Number column. Is the result Character? Number?]
as.character()
  Integer, Double, Logical  ->  Character (string)

as.numeric()
  Character, Integer, Logical (TRUE -> 1, FALSE -> 0)  ->  Double

as.integer()
  Double, Logical (TRUE -> 1, FALSE -> 0)  ->  Integer
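Each conversion in the diagram can be tried directly in base R. The variable names below are made up for illustration; note that as.integer() truncates the decimal part of a double rather than rounding it.

```r
# To character
chr_from_int <- as.character(c(1L, 2L))
chr_from_dbl <- as.character(3.5)
chr_from_lgl <- as.character(TRUE)

# To double
dbl_from_chr <- as.numeric("3.14")
dbl_from_int <- as.numeric(2L)
dbl_from_lgl <- as.numeric(c(TRUE, FALSE))   # TRUE -> 1, FALSE -> 0

# To integer
int_from_dbl <- as.integer(3.9)              # decimal part is dropped
int_from_lgl <- as.integer(c(TRUE, FALSE))   # TRUE -> 1, FALSE -> 0

print(chr_from_int)
print(dbl_from_lgl)
print(int_from_dbl)
```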
factors

# x is a factor
> x
[1] A B C B
Levels: A B C D

# How is x stored?
> unclass(x)
[1] 1 2 3 2
attr(,"levels")
[1] "A" "B" "C" "D"

> as.character(x)
[1] "A" "B" "C" "B"
> as.numeric(x)
[1] 1 2 3 2
factors

# y is a factor
> y <- factor(c(5, 6, 7, 6))
> y
[1] 5 6 7 6
Levels: 5 6 7
> unclass(y)
[1] 1 2 3 2
attr(,"levels")
[1] "5" "6" "7"
> as.character(y)
[1] "5" "6" "7" "6"
> as.numeric(y)
[1] 1 2 3 2
> as.numeric(as.character(y))
[1] 5 6 7 6
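Since as.numeric() on a factor returns the integer codes rather than the labels, the two-step conversion is worth wrapping in a small helper. The function name factor_to_numeric is hypothetical, not part of base R or dplyr.

```r
# Convert a factor whose labels are numbers back to those numbers.
# Going through as.character() recovers the labels first; calling
# as.numeric() directly would return the internal integer codes.
factor_to_numeric <- function(f) {
  as.numeric(as.character(f))
}

y <- factor(c(5, 6, 7, 6))

codes  <- as.numeric(y)         # internal codes
values <- factor_to_numeric(y)  # original numbers

print(codes)
print(values)
```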
dplyr's coercion behavior
● dplyr functions will not automatically coerce data types
    ● Returns an error
    ● Expects you to manually coerce data
● Exception: factors
    ● dplyr converts non-aligning factors to strings
    ● Gives warning message
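The error-then-coerce workflow can be sketched as follows. The example assumes dplyr is installed and uses invented tables ids_num and ids_chr; the exact error text depends on the dplyr version, so the sketch only records that an error occurred. Factor handling is left out because it may differ between dplyr versions.

```r
library(dplyr)

# The same column holds numbers in one table and strings in the other.
ids_num <- tibble(id = c(1, 2))
ids_chr <- tibble(id = c("3", "4"))

# Binding them as-is fails: dplyr does not pick a common type for
# a double column and a character column.
attempt <- tryCatch(
  bind_rows(ids_num, ids_chr),
  error = function(e) "error"
)

# Coerce manually, then bind.
fixed <- bind_rows(
  ids_num,
  mutate(ids_chr, id = as.numeric(id))
)

print(attempt)
print(fixed)
```

Choosing the target type explicitly keeps the decision visible in the code: here the strings become numbers, but converting both columns with as.character() would be equally valid if the ids are only labels.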
JOINING DATA IN R WITH DPLYR Let’s practice!