tidyverse wrapup

Steve Bagley, somgen223.stanford.edu (PowerPoint presentation)



  1. Tidyverse wrapup, Steve Bagley

  2. Making numbers into factors using numeric ranges

  3. Making numbers into factors using numeric ranges

      • We use factors for grouping, but numbers themselves do not make very good groups. Would you want to group together all subjects with a weight of exactly 12.5?
      • Instead, we set up non-overlapping intervals and use those as the factor values.
      • Example: 0–10, 10–20, 20–30
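This is a minimal sketch of the 0–10, 10–20, 20–30 example. It assumes you choose the breaks yourself and pass them to base R's `cut()`, whereas the ggplot2 helpers on the following slides pick the breaks for you. The weights are made-up values for illustration.

```r
# Hypothetical weights; the breaks are chosen by hand
weight <- c(5, 12.5, 12.5, 19, 25)

# cut() assigns each value to one non-overlapping interval (a, b]
weight_group <- cut(weight, breaks = c(0, 10, 20, 30))

levels(weight_group)  # the interval labels become the factor levels
table(weight_group)   # how many subjects fall into each interval
```

Any value outside all of the breaks (here, below 0 or above 30) becomes `NA`.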

  4. Example: cut_number

          library(tidyverse)  # cut_number() and friends come from ggplot2
          y <- c(1, 2, 3, 4, 5)
          cut_number(y, n = 2)
          [1] [1,3] [1,3] [1,3] (3,5] (3,5]
          Levels: [1,3] (3,5]

      • cut_number tries to create n bins with approximately the same number of values in each bin.
      • It returns a factor vector that uses a special symbolic code for the ranges.
      • The interval (a,b] spans from a to b and is open on the left end and closed on the right, so it does not include a but does include b. The first interval, [1,3], is closed on both ends so that the smallest value is included.
      • Note the levels of the factor.
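A quick way to check the "approximately equal counts" behavior is to tabulate the bins. This sketch assumes ggplot2 is installed. With 10 distinct values and n = 2, the cut point is the median (5.5), and each bin should receive 5 values.

```r
library(ggplot2)  # provides cut_number()

v <- 1:10
bins <- cut_number(v, n = 2)

levels(bins)  # the cut points are quantiles of v
table(bins)   # number of values that landed in each bin
```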

  5. Example

          tibble(y = y, y_cut = cut_number(y, n = 2))
          # A tibble: 5 x 2
                y y_cut
            <dbl> <fct>
          1     1 [1,3]
          2     2 [1,3]
          3     3 [1,3]
          4     4 (3,5]
          5     5 (3,5]
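Because `y_cut` is a factor, you can use it directly for grouping. This is a minimal sketch, assuming the tidyverse is loaded, that uses dplyr's `count()` to count how many values fall into each bin.

```r
library(tidyverse)

y <- c(1, 2, 3, 4, 5)

counts <- tibble(y = y, y_cut = cut_number(y, n = 2)) %>%
  count(y_cut)  # one row per bin, with the number of values in column n

counts
```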

  6. cut_interval

          z <- c(1, 1, 1, 2, 4, 5)
          cut_number(z, n = 2)
          [1] [1,1.5] [1,1.5] [1,1.5] (1.5,5]   (1.5,5]   (1.5,5]
          Levels: [1,1.5] (1.5,5]
          cut_interval(z, n = 2)
          [1] [1,3] [1,3] [1,3] [1,3] (3,5] (3,5]
          Levels: [1,3] (3,5]

      • cut_interval makes n intervals with the same range (width).
      • By contrast, cut_number puts 3 values into each bin here, so its two intervals have different widths.
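Tabulating both results for the same `z` makes the difference concrete. This sketch assumes ggplot2 is loaded. `cut_number()` balances the counts (3 and 3), while `cut_interval()` keeps the widths equal and lets the counts differ (4 and 2).

```r
library(ggplot2)

z <- c(1, 1, 1, 2, 4, 5)

# Equal counts per bin: the break is the median of z (1.5)
table(cut_number(z, n = 2))

# Equal widths per bin: the range 1..5 is split at its midpoint (3)
table(cut_interval(z, n = 2))
```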

  7. cut_width

          cut_width(z, width = 1)
          [1] [0.5,1.5] [0.5,1.5] [0.5,1.5] (1.5,2.5] (3.5,4.5] (4.5,5.5]
          Levels: [0.5,1.5] (1.5,2.5] (2.5,3.5] (3.5,4.5] (4.5,5.5]

      • cut_width makes intervals of the specified width.
      • The level (2.5,3.5] is present even though no value of z falls in it.
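Empty intervals are easy to miss when you print the factor itself. This is a minimal sketch, assuming ggplot2 is loaded. Tabulating the `cut_width()` result shows the (2.5,3.5] bin with a count of 0, and base R's `droplevels()` removes such unused levels.

```r
library(ggplot2)

z <- c(1, 1, 1, 2, 4, 5)
bins <- cut_width(z, width = 1)

# table() reports every level, including ones with no values
table(bins)

# droplevels() removes the levels that have no values
levels(droplevels(bins))
```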

  8. Graphics example

          iris %>%
            mutate(petal_length = cut_number(Petal.Length, n = 4)) %>%
            ggplot(aes(petal_length, Petal.Width)) +
            geom_boxplot()

      [Figure: box plots of Petal.Width for each petal_length bin: [1,1.6], (1.6,4.35], (4.35,5.1], (5.1,6.9]]
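The same bins can drive a numeric summary instead of a plot. This is a minimal sketch, assuming the tidyverse is loaded. It reports how many flowers fall into each `petal_length` bin and their median `Petal.Width`. The counts may not be exactly equal, because tied petal lengths must all land in the same bin.

```r
library(tidyverse)

petal_summary <- iris %>%
  mutate(petal_length = cut_number(Petal.Length, n = 4)) %>%
  group_by(petal_length) %>%
  summarize(n = n(), median_width = median(Petal.Width))

petal_summary
```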

  9. Formatting numbers

  10. round

          x <- c(1.4234, 1.5, 1.6234, 2.4, 2.5, 10.6)
          round(x)
          [1]  1  2  2  2  2 11
          round(x, digits = 1)
          [1]  1.4  1.5  1.6  2.4  2.5 10.6
          round(x, digits = -1)
          [1]  0  0  0  0  0 10

      • round creates a new, rounded number.
      • A value exactly halfway between two integers is rounded to the even one, so 1.5 and 2.5 both become 2.
      • You can specify the number of digits. A negative value rounds to the left of the decimal point: digits = -1 rounds to a multiple of 10.
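A few extra calls make the halfway rule and the digits argument concrete. The halfway values below are exactly representable in binary, so these results do not depend on floating-point error. For decimals that are not stored exactly, such as 0.15, the result may not follow the halfway rule as you expect.

```r
# Halfway cases go to the even neighbor
halves <- round(c(0.5, 1.5, 2.5, 3.5))
halves

# Negative digits round to the left of the decimal point (here, to hundreds)
to_hundreds <- round(1234.567, digits = -2)
to_hundreds

# Positive digits keep that many decimal places
two_places <- round(1234.567, digits = 2)
two_places
```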

  11. signif

          x
          [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
          signif(x)
          [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
          signif(x, digits = 5)
          [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
          signif(x, digits = 1)
          [1]  1  2  2  2  2 10

      • signif creates a new number, rounded to the specified number of significant digits (6 by default).
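The difference between `round()` and `signif()` shows up most with very large or very small numbers. In this sketch, `round()` with `digits = 2` keeps two decimal places, which wipes out a small value entirely. `signif()` keeps two significant digits at any magnitude.

```r
small <- 0.0012345
big <- 123456

round(small, digits = 2)   # two decimal places: 0
signif(small, digits = 2)  # two significant digits
round(big, digits = 2)     # unchanged: there are no decimals to drop
signif(big, digits = 2)
```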

  12. Data frame example

          library(scales)  # for the number function
          (d <- tibble(x = c(123400, 12340, 1234, 123.4, 12.34, 1.234, 0.1234, 0.01234)) %>%
             mutate(rounded = round(x, digits = 2),
                    signifed = signif(x, digits = 2),
                    number1 = number(x, accuracy = 1),
                    number2 = number(x, accuracy = 0.1)))
          # A tibble: 8 x 5
                      x   rounded   signifed number1 number2
                  <dbl>     <dbl>      <dbl> <chr>   <chr>
          1 123400      123400    120000     123 400 123 400.0
          2  12340       12340     12000     12 340  12 340.0
          3   1234        1234      1200     1 234   1 234.0
          4    123.        123.      120     123     123.4
          5     12.3        12.3      12     12      12.3
          6      1.23        1.23      1.2   1       1.2
          7      0.123       0.12      0.12  0       0.1
          8      0.0123      0.01      0.012 0       0.0
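`number()` has further arguments for common display needs. This is a minimal sketch, assuming the scales package is installed. `big.mark` is set explicitly here because the default thousands separator (a space, as in the output above) may not be what you want. `prefix` and `suffix` add units.

```r
library(scales)

# Comma as thousands separator, two decimal places
number(1234.5678, accuracy = 0.01, big.mark = ",")

# Units via suffix and prefix
number(72.456, accuracy = 0.1, suffix = " kg")
number(19.99, accuracy = 1, prefix = "$")

# The result is always character, so it is for display, not arithmetic
is.character(number(1.5))
```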

  13. Set option

          options(pillar.sigfig = 1)
          d
          # A tibble: 8 x 5
                    x   rounded  signifed number1 number2
                <dbl>     <dbl>     <dbl> <chr>   <chr>
          1 123400    123400    120000    123 400 123 400.0
          2  12340     12340     12000    12 340  12 340.0
          3   1234      1234      1200    1 234   1 234.0
          4    123.      123.      120    123     123.4
          5     12.       12.       12    12      12.3
          6      1.        1.        1.   1       1.2
          7      0.1       0.1       0.1  0       0.1
          8      0.01      0.01      0.01 0       0.0

      • This sets a print option for tibbles. It changes only how numbers are displayed, not the stored values, and the character columns made by number() are unaffected.
      • The default value is 3.
      • A value you set stays in place until you change it (or quit R).
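Because the setting stays in place for the rest of the session, it can be convenient to change it temporarily and then restore it. This is a minimal sketch of that pattern. It relies on `options()` returning the previous values, which you can pass back to `options()` afterward.

```r
library(tibble)

d <- tibble(x = c(123.456, 0.01234))

old <- options(pillar.sigfig = 5)  # returns the previous setting as a list
print(d)                           # printed with 5 significant digits
options(old)                       # put the previous setting back
print(d)
```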

  14. Set option

          options(pillar.sigfig = 5)
          d
          # A tibble: 8 x 5
                       x   rounded   signifed number1 number2
                   <dbl>     <dbl>      <dbl> <chr>   <chr>
          1 123400       123400    120000     123 400 123 400.0
          2  12340        12340     12000     12 340  12 340.0
          3   1234         1234      1200     1 234   1 234.0
          4    123.4        123.4     120     123     123.4
          5     12.34        12.34     12     12      12.3
          6      1.234        1.23      1.2   1       1.2
          7      0.1234       0.12      0.12  0       0.1
          8      0.01234      0.01      0.012 0       0.0

  15. sprintf

          sprintf("The value of x is approximately: %.2f", 1.23456)
          [1] "The value of x is approximately: 1.23"

      • sprintf inserts values into a format string, which contains both literal text and format codes starting with %. Here, %.2f formats a number with 2 digits after the decimal point.
      • The result is of type character. You can print it (or save it).
      • For more about the many format codes, see the help page (?sprintf).
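Here are a few more format codes, as a sketch. `%d` is for integers and `%s` for strings. A width with zero padding (`%03d`) fixes the number of digits, and `%%` produces a literal percent sign. Vector inputs produce one string per element. The group name and values are made up for illustration.

```r
# Integer, string, and fixed-decimal codes in one format string
sprintf("%d subjects in group %s, mean weight %.1f kg", 12L, "A", 65.432)

# At least 3 digits, padded with zeros; vectorized over 1:3
sprintf("id_%03d", 1:3)

# %% produces a literal percent sign
sprintf("%.0f%%", 45.6)
```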

  16. Controlling how data frames print

          print(d, n = 2, width = 20)
          # A tibble: 8 x 5
                 x rounded
             <dbl>   <dbl>
          1 123400  123400
          2  12340   12340
          # ... with 6 more rows, and 3
          #   more variables:
          #   signifed <dbl>,
          #   number1 <chr>,
          #   number2 <chr>

      • This prints 2 rows and limits the output to a width of 20 characters. Columns that do not fit are listed by name and type at the bottom instead.

  17. Printing the entire data frame

          print(d, n = +Inf)

      • This prints all rows.

  18. Row vs column operations

  19. Exercise: Sum along all the columns

          (d1 <- tibble(x = 1:3, y = 11:13, z = 100:102))
          # A tibble: 3 x 3
                x     y     z
            <int> <int> <int>
          1     1    11   100
          2     2    12   101
          3     3    13   102

      • How would you create a new row that contains the column sums?

  20. Answer: Sum all the columns

          d1 %>% summarize_all(sum)
          # A tibble: 1 x 3
                x     y     z
            <int> <int> <int>
          1     6    36   303

      • This applies the sum function to every one of the columns.
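In dplyr 1.0.0 and later, the `*_all()` family is superseded by `across()`. This is a sketch of the same column sums written that way, assuming the tidyverse is loaded.

```r
library(tidyverse)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

# across() applies sum to every column selected by everything()
col_sums <- d1 %>% summarize(across(everything(), sum))
col_sums
```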

  21. Include the sum as the last row

          bind_rows(d1, summarize_all(d1, sum))
          # A tibble: 4 x 3
                x     y     z
            <int> <int> <int>
          1     1    11   100
          2     2    12   101
          3     3    13   102
          4     6    36   303

      • This adds the row of summed values at the bottom.

  22. Exercise: Sum across all the rows

          d1
          # A tibble: 3 x 3
                x     y     z
            <int> <int> <int>
          1     1    11   100
          2     2    12   101
          3     3    13   102

      • How would you create a new column with the sum of all the previous columns?
      • This is a bit more complicated. Each column of a data frame is a vector, but a row cuts across several columns, so there is no single vector to hand to sum().

  23. Answer: Sum across all the rows

          d1 %>% mutate(row_sum = rowSums(.))
          # A tibble: 3 x 4
                x     y     z row_sum
            <int> <int> <int>   <dbl>
          1     1    11   100     112
          2     2    12   101     115
          3     3    13   102     118

      • rowSums is built into base R. Here, . stands for d1, the data frame coming down the pipe.
      • There is also a rowMeans function.
      • But what if we want a different calculation?
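One answer to "a different calculation" is dplyr's `rowwise()` combined with `c_across()`, which collects the selected columns of the current row into a vector. This sketch assumes dplyr 1.0.0 or later (loaded via the tidyverse) and computes each row's median.

```r
library(tidyverse)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

row_stats <- d1 %>%
  rowwise() %>%                                   # treat each row as its own group
  mutate(row_median = median(c_across(x:z))) %>%  # c_across() gathers x, y, z for this row
  ungroup()                                       # remove the row-wise grouping afterward

row_stats
```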

  24. Answer: Sum across all the rows

          d1 %>% mutate(row_sum = reduce(., `+`))
          # A tibble: 3 x 4
                x     y     z row_sum
            <int> <int> <int>   <int>
          1     1    11   100     112
          2     2    12   101     115
          3     3    13   102     118

      • + is a binary operator (it takes two arguments) that computes a sum. reduce, from purrr, applies it to the columns in turn, computing (x + y) + z element by element.
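`reduce()` works with any function that combines two vectors element by element, so swapping in a different binary function gives a different row-wise result. This sketch uses base R's `pmax()` to get each row's maximum. It assumes the tidyverse, which includes purrr, is loaded.

```r
library(tidyverse)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

# A tibble is a list of columns, so reduce() folds over the columns:
# pmax(pmax(x, y), z) gives the element-wise maximum of each row
d1_max <- d1 %>% mutate(row_max = reduce(., pmax))
d1_max
```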

  25. Answer: Sum across all the rows

          d1 %>% mutate(row_sum = flatten_dbl(pmap(., sum)))
          # A tibble: 3 x 4
                x     y     z row_sum
            <int> <int> <int>   <dbl>
          1     1    11   100     112
          2     2    12   101     115
          3     3    13   102     118

      • This is a more complex approach using functions from the purrr package. pmap calls sum once per row, with that row's values as arguments, and flatten_dbl turns the resulting list into a double vector.
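`pmap_dbl()` returns a double vector directly, which avoids the separate `flatten_dbl()` step. Because `pmap()` passes each row's values as arguments named after the columns, a custom function can also refer to them by name. This is a sketch, assuming the tidyverse is loaded; the `label` format is made up for illustration.

```r
library(tidyverse)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

d1_rows <- d1 %>%
  mutate(
    # sum(x = 1, y = 11, z = 100), and so on for each row
    row_sum = pmap_dbl(., sum),
    # the function's argument names match the column names
    label = pmap_chr(., function(x, y, z) sprintf("%d-%d-%d", x, y, z))
  )
d1_rows
```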

  26. How to combine multiple plots together

  27. Package patchwork

          ## install.packages("patchwork")
          library(patchwork)

      • This package allows you to easily combine multiple ggplot plots into a single graphic.

  28. Create two graphs

          g1 <- tibble(x = 1:3, y = 1:3) %>%
            ggplot(aes(x, y)) +
            geom_point(size = 5)
          g2 <- tibble(x = 1:3, y = 3:1) %>%
            ggplot(aes(x, y)) +
            geom_point(size = 5)

  29. Combine using patchwork

          g1 | g2

      [Figure: g1 and g2 placed side by side]

      • Use "|" to place plots side by side.

  30. Combine using patchwork

          g1 / g2

      [Figure: g1 placed above g2]

      • Use "/" to place one plot on top of another.

  31. Combine using patchwork

          (g1 | g2) / (g2 | g1)

      [Figure: 2 x 2 grid with g1 and g2 in the top row, g2 and g1 in the bottom row]

      • Use "( )" for grouping.

  32. Combine using patchwork

          g1 | g2 | g1 | g2

      [Figure: four plots in a single row: g1, g2, g1, g2]
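patchwork also has helpers for adjusting the layout and labeling panels. This sketch assumes patchwork and the tidyverse are installed, and it re-creates g1 and g2 so that it runs on its own. `plot_layout(widths = ...)` sets the relative panel widths, and `plot_annotation()` adds an overall title and panel tags. The title text is made up.

```r
library(tidyverse)
library(patchwork)

g1 <- tibble(x = 1:3, y = 1:3) %>% ggplot(aes(x, y)) + geom_point(size = 5)
g2 <- tibble(x = 1:3, y = 3:1) %>% ggplot(aes(x, y)) + geom_point(size = 5)

combined <- (g1 | g2) +
  plot_layout(widths = c(2, 1)) +                          # left panel twice as wide as the right
  plot_annotation(title = "Two panels", tag_levels = "A")  # tags the panels A and B

combined  # printing draws the combined figure
```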
