
Graphics using ggplot2
Steve Bagley, somgen223.stanford.edu



  1. Graphics using ggplot2
     Steve Bagley, somgen223.stanford.edu

  2. Setup data
       library(tidyverse)  # provides read_csv, str_c, gather, %>%, and ggplot2
       data_dir <- "https://web.stanford.edu/class/somgen223/data/"
       gene_exp1 <- read_csv(str_c(data_dir, "gene_exp1.csv"))
       # reshape to tall format: one row per gene and condition
       gene_tall <- gather(gene_exp1, condition, expression_level, control:treatment)

  3. ggplot2: a package for the grammar of graphics
     • The grammar of graphics is the idea that graphs are composed of known elements combined in specific ways, in the same way that types of words are assembled through the rules of English syntax to form sentences.
     • ggplot2 is a package for doing this in R.
     • There are other plotting packages in R, but they do not follow this conceptual model.

  4. ggplot: the main function
     • ggplot (in the package ggplot2) is the main function for constructing a graph.
     • Nearly every aspect of the graph can be changed. Usually, the defaults are pretty good.
     • You combine the graph produced by ggplot with plot specifications to add to or modify the graph.

  5. How to call ggplot
       ggplot(data = BOD, mapping = aes(x = Time, y = demand)) + geom_point()
     • ggplot is the main plotting function.
     • data = BOD: this tells which data frame contains the data to be plotted.
     • mapping = aes(x = Time, y = demand): use the data in the Time column on the x-axis and the demand column on the y-axis.
     • geom_point(): plot the data as points.
     • Note that you can use positional instead of named arguments to make this expression shorter:
       ggplot(BOD, aes(Time, demand)) + geom_point()
     • The use of “+” to glue these operations together will be explained later.
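
One way to see why the positional form is equivalent: the first two arguments of aes are x and y, so both calls should produce a mapping with the same names. This is a small sketch; the variable names `named` and `positional` are only for illustration.

```r
library(ggplot2)

# The same aesthetic mapping written with named and with positional arguments.
named      <- aes(x = Time, y = demand)
positional <- aes(Time, demand)

# Both mappings carry the aesthetic names x and y.
names(named)
names(positional)
```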

  6. Use geom_point for scatterplot
       BOD %>%
         ggplot(aes(Time, demand)) +
         geom_point()
     [Plot: demand (y) vs. Time (x), drawn as points]
     • aes specifies the aesthetic mapping from the data (columns) to some aspect of the graph (x, y position).
     • There are 6 rows in the BOD data frame.
     • There are 6 points in the graph, one for each row.
     • This alignment of rows and points applies through much of ggplot2.
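
The row-to-point correspondence can be checked directly. The sketch below uses ggplot2's layer_data() to pull out the data computed for the point layer and compares its row count with that of BOD; the variable names are only for illustration.

```r
library(ggplot2)

# One layer of points built from the built-in BOD data frame.
p <- ggplot(BOD, aes(Time, demand)) + geom_point()

# layer_data() returns the data frame ggplot2 computed for a layer
# (here the first and only layer): one row per drawn point.
points_drawn <- layer_data(p, 1)

nrow(BOD)           # rows in the source data
nrow(points_drawn)  # rows (points) in the rendered layer
```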

  7. Example: geom_line
       BOD %>%
         ggplot(aes(Time, demand)) +
         geom_line()
     [Plot: demand (y) vs. Time (x), drawn as a line]

  8. Example: points and lines
       BOD %>%
         ggplot(aes(Time, demand)) +
         geom_point() +
         geom_line()
     [Plot: demand (y) vs. Time (x), drawn as points connected by a line]

  9. Giving arguments to plot specifications: change the size
       BOD %>%
         ggplot(aes(Time, demand)) +
         geom_point(size = 5)
     [Plot: demand (y) vs. Time (x), drawn as large points]

  10. Giving arguments to plot specifications: change the color
        BOD %>%
          ggplot(aes(Time, demand)) +
          geom_point(color = "red")
      [Plot: demand (y) vs. Time (x), drawn as red points]

  11. Exercise: orange trees
      Using the data in Orange:
      1. Pull out the data for tree 2 only.
      2. Plot circumference versus age for those data.

  12. Answer: orange trees
        Orange %>%
          filter(Tree == 2) %>%
          ggplot(aes(age, circumference)) +
          geom_point()
      [Plot: circumference (y) vs. age (x) for tree 2, drawn as points]

  13. ggplot and + operator
      • ggplot(...) + geom_point() is a strange expression: it uses the + operator to add things (plots and plot specifications), which are not numbers.
      • This uses a feature called generic functions: the types of the arguments to + determine which piece of code, called a method, to run.
      • ggplot2 relies on this feature heavily.
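
To make the idea of a method for + concrete, here is a minimal sketch using R's S3 system. The class "note" and its constructor are hypothetical and are not part of ggplot2; they only illustrate how + can be given a meaning for non-numeric objects.

```r
# A hypothetical class "note" (not part of ggplot2), used only to show dispatch.
note <- function(text) structure(list(text = text), class = "note")

# S3 method for `+`: R calls this when an argument of `+` has class "note".
"+.note" <- function(e1, e2) {
  note(paste(e1$text, e2$text))
}

# `+` now combines two notes instead of adding numbers.
combined <- note("points") + note("and lines")
combined$text
```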

  14. Using + in ggplot2
        plot1 <- ggplot(BOD, aes(Time, demand))
        spec1 <- geom_point()
        spec2 <- geom_line(color = "blue")
        plot1 + spec1
        plot1 + spec2
      • Note you can save parts of the graph specification and then add them together.
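
Saved specifications can also be bundled. The sketch below stores two specifications in a list and adds the list to a plot in one step; the names `points_and_line` and `plot2` are only for illustration, and inspecting `$layers` is just one way to confirm what was added.

```r
library(ggplot2)

plot1 <- ggplot(BOD, aes(Time, demand))

# Store several specifications together; adding a list adds each element in turn.
points_and_line <- list(geom_point(), geom_line(color = "blue"))

plot2 <- plot1 + points_and_line

length(plot1$layers)  # the saved base plot is unchanged
length(plot2$layers)  # number of layers now attached to the new plot
```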

  15. Adding a smoother
        BOD %>%
          ggplot(aes(Time, demand)) +
          geom_point() +
          geom_smooth(method = "lm")
      [Plot: demand (y) vs. Time (x), points with a fitted straight line]
      • lm means linear model (best fit, least-squares regression).
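
The same straight-line fit can be computed outside the plot with base R's lm, which may help connect the smoother to an ordinary regression. This sketch assumes the smoother is fitting demand as a function of Time, matching the aesthetic mapping above.

```r
# Least-squares regression of demand on Time for the built-in BOD data.
fit <- lm(demand ~ Time, data = BOD)

# Intercept and slope of the fitted line.
coef(fit)
```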

  16. Making the plot specification depend on the data
        ## fixed size, default
        BOD %>% ggplot(aes(Time, demand)) + geom_point()
        ## fixed size, given as size argument
        BOD %>% ggplot(aes(Time, demand)) + geom_point(size = 5)
        ## size of each point depends on value of Time column for that point
        BOD %>% ggplot(aes(Time, demand)) + geom_point(aes(size = Time))
        ## THIS CAUSES AN ERROR!
        BOD %>% ggplot(aes(Time, demand)) + geom_point(size = Time)
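
A short sketch of why the last call fails: outside aes(), size = Time is an ordinary R argument, so R looks for a variable called Time rather than a column of BOD. The example assumes no variable named Time exists in the session; `result` and `ok` are illustrative names.

```r
library(ggplot2)

# Outside aes(), arguments are ordinary R expressions evaluated immediately,
# so `Time` is looked up as a variable, not as a column of BOD.
result <- tryCatch(
  geom_point(size = Time),
  error = function(e) conditionMessage(e)
)
result  # the error message, assuming no variable named Time exists

# Inside aes(), the expression is captured and evaluated later using the data.
ok <- ggplot(BOD, aes(Time, demand)) + geom_point(aes(size = Time))
```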

  17. Combining numbers and text in a graph
        gene_exp1

        # A tibble: 3 x 3
          gene   control treatment
          <chr>    <dbl>     <dbl>
        1 ABC123       0         1
        2 DEF234      10         3
        3 GKK7        12        13

  18. Plotting treatment vs control
        gene_exp1 %>%
          ggplot(aes(control, treatment)) +
          geom_point()
      [Plot: treatment (y) vs. control (x), drawn as points]

  19. Plotting treatment vs control with gene labels
        gene_exp1 %>%
          ggplot(aes(control, treatment)) +
          geom_point() +
          geom_text(aes(label = gene))
      [Plot: treatment (y) vs. control (x), points labeled ABC123, DEF234, GKK7]

  20. Control placement of text
        gene_exp1 %>%
          ggplot(aes(control, treatment)) +
          geom_point() +
          geom_text(aes(label = gene), hjust = "left", vjust = "bottom")
      [Plot: treatment (y) vs. control (x), gene labels placed above and to the right of each point]

  21. Grouping
        gene_exp1

        # A tibble: 3 x 3
          gene   control treatment
          <chr>    <dbl>     <dbl>
        1 ABC123       0         1
        2 DEF234      10         3
        3 GKK7        12        13
      • Let’s graph the control and treatment values separately for each gene.
      • We’ll need the data in tall format.
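
As a sketch of what "tall format" means here, the block below rebuilds the small table by hand and reshapes it with tidyr's pivot_longer, a newer alternative to the gather call used in the setup slide. The typed-in values are an assumption, read from the printed tibble in the slides, and the row order of the result may differ from what gather produces.

```r
library(tidyr)
library(tibble)

# Assumption: values copied by hand from the printed tibble in the slides.
gene_exp1 <- tribble(
  ~gene,    ~control, ~treatment,
  "ABC123",        0,          1,
  "DEF234",       10,          3,
  "GKK7",         12,         13
)

# One row per gene and condition instead of one column per condition.
gene_tall <- pivot_longer(
  gene_exp1,
  cols = control:treatment,
  names_to = "condition",
  values_to = "expression_level"
)
gene_tall
```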

  22. Grouping in a graph
        gene_tall %>%
          ggplot(aes(gene, expression_level)) +
          geom_point()
      [Plot: expression_level (y) vs. gene (x: ABC123, DEF234, GKK7), drawn as points]
      • It would be nice if the data for each condition were grouped together (color? line?).

  23. Use the mapping to assign color to the grouping variable
        gene_tall %>%
          ggplot(aes(gene, expression_level)) +
          geom_point(aes(color = condition))
      [Plot: expression_level (y) vs. gene (x), points colored by condition (control, treatment)]

  24. Use group to form groups for geom_line
        gene_tall %>%
          ggplot(aes(gene, expression_level)) +
          geom_line(aes(group = condition, color = condition))
      [Plot: expression_level (y) vs. gene (x), one line per condition (control, treatment)]

  25. When to use group?
      • You need to include group when the number of graphical objects is not the same as the number of observations to graph.
      • With geom_line, there are n endpoints, but only n-1 lines between them.
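
To see what group does, the sketch below builds a line layer grouped by condition and inspects the group id assigned to each row. The hand-typed data frame is an assumption that mirrors the tall gene table from the slides; layer_data() is used only to look inside the built plot.

```r
library(ggplot2)

# Assumption: tall data typed in by hand to mirror gene_tall from the slides.
gene_tall <- data.frame(
  gene = rep(c("ABC123", "DEF234", "GKK7"), times = 2),
  condition = rep(c("control", "treatment"), each = 3),
  expression_level = c(0, 10, 12, 1, 3, 13)
)

p <- ggplot(gene_tall, aes(gene, expression_level)) +
  geom_line(aes(group = condition))

# layer_data() exposes the group id ggplot2 assigned to each row:
# six observations, but only as many lines as there are groups.
table(layer_data(p, 1)$group)
```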

  26. Facets
      • Most explorations of data involve making comparisons to highlight an important difference between subsets.
      • One way to do this visually is to put the data for each condition in a separate graph, called a “facet”.
      • ggplot can do this, making sure that the facet axes are nicely lined up.

  27. Facet example
        gene_tall %>%
          ggplot(aes(gene, expression_level)) +
          geom_point() +
          facet_wrap(vars(condition))
      [Plot: expression_level (y) vs. gene (x), one facet each for control and treatment]

  28. Exercise: Orange trees
        ggplot(Orange, aes(age, circumference)) +
          geom_point()
      [Plot: circumference (y) vs. age (x) for all trees, drawn as points]
      • It would be better if we visually distinguish each tree’s data.
      • What is the visual equivalent of group_by?

  29. Answer: Orange trees, using facets
        ggplot(Orange, aes(age, circumference)) +
          geom_point() +
          facet_wrap(vars(Tree))
      [Plot: circumference (y) vs. age (x), one facet per tree (panels 3, 1, 5, 2, 4)]

  30. Answer: Orange trees, using grouping
        ggplot(Orange, aes(age, circumference)) +
          geom_point(aes(color = Tree)) +
          geom_line(aes(color = Tree, group = Tree))
      [Plot: circumference (y) vs. age (x), points and one line per tree, colored by Tree (legend order 3, 1, 5, 2, 4)]

  31. Labeling the graph
        ggplot(Orange, aes(age, circumference)) +
          geom_point(aes(color = Tree)) +
          geom_line(aes(color = Tree, group = Tree)) +
          labs(x = "Age (days)",
               y = "Circumference (mm)",
               title = "Circumference vs. age for orange trees",
               subtitle = "Data from built-in data frame Orange")
      [Plot: the grouped Orange plot with the title, subtitle, and axis labels given in labs]

  32. Including the origin (0, 0)
        ggplot(Orange, aes(age, circumference, group = Tree)) +
          geom_point(aes(color = Tree)) +
          geom_line(aes(color = Tree, group = Tree)) +
          expand_limits(x = 0, y = 0)
      [Plot: the grouped Orange plot with both axes extended to include 0]
