BootcampR AN INTRODUCTION TO R Jason A. Heppler, PhD University of Nebraska at Omaha March 3, 2020 @jaheppler
Hi. I'm Jason. I like to gesture at screens. Digital Engagement Librarian, University of Nebraska at Omaha. Mentor, Mozilla Open Leaders. Researcher, Humanities+Design, Stanford University.
Schedule
• March 10, 1:30-3: Making Networks in CL 232
• March 17, 1:30-3: Making Maps in CL 112
• March 31, 1:30-3: Clustering and Classifying in CL 112
Today's plan • Aesthetics and design • Intro to ggplot • The grammar of graphics • Hands-on! Open up RStudio. We'll start doing a few things together soon.
"The bad news is that whenever you learn a new skill you’re going to suck. It’s going to be frustrating. The good news is that is typical and happens to everyone and it is only temporary. You can’t go from knowing nothing to becoming an expert without going through a period of great frustration and great suckiness." —Hadley Wickham
ggplot is part of the tidyverse. It is a highly functional package for reasoning about and creating statistical charts and graphics.
Edward Tufte suggests that graphical excellence is defined by "that which gives the viewer the greatest number of ideas, in the shortest time, with the least ink, the smallest space, and which tells the truth about data." Edward Tufte, The Visual Display of Quantitative Information (Graphics Press, 1983)
1 For the exploration of data and evidence 2 For the communication of information and results
Types of Visualization • Information visualization: statistical charts and graphs to represent data • Scientific visualization: scientific data that has close ties to real-world objects with spatial properties • Infographic: combining various statistics and visualizations with narrative
Cognitive and Social Aspects of Visualization
Gestalt Principles of Data Visualization
Gestalt Principles of Data Visualization Gestalt psychology is an old practice of understanding how humans perceive patterns. The principles of Gestalt psychology attempt to explore how we view separate visual elements as a whole.
Gestalt Principles of Data Visualization Similarity. Objects that are visually similar (the same color, e.g.) are perceived as part of the same group.
Gestalt Principles of Data Visualization Proximity. Humans perceive objects close together as being part of a single group.
Gestalt Principles of Data Visualization Enclosure. Surrounding a group of related elements with a visual element (a border or a shaded region, e.g.) signals that they belong to the same group.
Gestalt Principles of Data Visualization Closure. Humans tend to fill in the blanks when presented with missing information. When viewing a shape with missing segments, we perceive it as a single unit.
Effective design of complex visualizations must consider these principles and the intentional and unintentional signals our graphics send to our readers.
In the mid-1980s, statisticians William Cleveland and Robert McGill ran experiments with human volunteers to study the perception of quantitative information encoded by different cues.
Let's get started. ggplot2 is a data visualization package that
• uses a grammar of graphics, breaking graphs up into components
• is a popular tool for creating explanatory and exploratory graphics
Supplementary packages for ggplot add further customization and functionality, for example:
• gganimate: create animations
• gghighlight: highlight lines and points
• ggrepel: automatically adjust text labels so they do not overlap
• ggbeeswarm: add non-overlapping points
...and many more.
Grammar of graphics
ggplot has three essential components:
1. data: the dataset you are visualizing
2. aesthetic mappings: which columns map to coordinates (x and y) and which variables map to visual elements (color, shape, size, etc.)
3. geometric layer: the type of graphic (point, line, boxplot, bars, maps, etc.)
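As a minimal sketch of how the three components fit together, the code below uses mpg, a dataset bundled with ggplot2. The dataset and the choice of columns are assumptions made for illustration; they are not part of the workshop data.

```r
library(ggplot2)

# 1. data: mpg, a fuel economy dataset that ships with ggplot2
#    (used here only for illustration)
# 2. aesthetic mappings: engine displacement on x, highway mileage on y,
#    and vehicle class mapped to color
# 3. geometric layer: points, which gives us a scatterplot
p <- ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Printing the object draws the plot
p
```

Swapping geom_point() for another geom changes the type of graphic while the data and the mappings stay the same.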
Grammar of graphics
• geom_*: type of graphic
• stat_*: statistical representation of the data
• scale_*: visual values (axis scale, color scale)
• facet_*: divide plot into subplots
• theme(): adjust background colors, grid lines, font sizes, etc.
• labs(): add labels like title, x and y labels, subtitles, captions, etc.
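To see each of these function families in one place, here is a rough sketch that layers one of each onto a single plot. It again assumes the mpg dataset bundled with ggplot2, and the palette, facet variable, and label text are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +                            # geom_*: draw the data as points
  stat_smooth(method = "lm", se = FALSE) +  # stat_*: add a fitted linear trend per group
  scale_color_brewer(palette = "Dark2") +   # scale_*: choose the color scale
  facet_wrap(~ year) +                      # facet_*: one subplot per model year
  theme(legend.position = "bottom") +       # theme(): move the legend below the plot
  labs(                                     # labs(): title, axis, and legend labels
    title = "Engine size and highway mileage",
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon",
    color = "Drive train"
  )

p
```

Each line after the first is optional and independent, so you can add or remove one piece at a time to see what it contributes.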
Grammar of graphics
A visualization concept created by Leland Wilkinson (1999) to define the elements of statistical graphics: "... describes the meaning of what we do when we construct statistical graphics ... More than a taxonomy ... Computational system based on the underlying mathematics of representing statistical functions of data."
Adapted by the creator of ggplot, Hadley Wickham, in 2009. ggplot offers a consistent and simple syntax for describing statistical graphics, and is highly modular, breaking graphs into semantic components.
See Hadley Wickham, "A Layered Grammar of Graphics," Journal of Computational and Graphical Statistics vol. 19 no. 1 (2010): 3-28, http://vita.had.co.nz/papers/layered-grammar.pdf.
Grammar of graphics

    library(tidyverse)
    # devtools::install_github("hepplerj/superfundr")
    library(superfundr)
    data(superfunds)

    # Let's look at the first five rows
    superfunds %>% head(5)

    # A tibble: 5 x 20
      site_name epa_id city  county state zipcode region npl_status superfund_agree…
      <chr>     <chr>  <chr> <chr>  <chr> <chr>    <dbl> <chr>      <chr>
    1 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
    2 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
    3 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
    4 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
    5 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
    # … with 11 more variables: federal_facility <chr>, op_unit_no <dbl>, seq_id <dbl>,
    #   decision_type <chr>, completion_date <dttm>, fiscal_year <dbl>, media <chr>,
    #   contaminant <chr>, address <fct>, latitude <dbl>, longitude <dbl>
Grammar of graphics

    library(tidyverse)
    # devtools::install_github("hepplerj/superfundr")
    library(superfundr)
    data(superfunds)

    # Let's look at the first five rows
    superfunds %>% head(5)

    # distinct() lets us identify unique values,
    # and the .keep_all argument returns all columns
    # for the rows that match
    superfunds_subset <- superfunds %>%
      distinct(site_name, .keep_all = TRUE)

    ggplot(data = superfunds_subset, aes(x = state)) +
      geom_bar() +
      labs(title = "U.S. Superfund Sites", x = "State", y = "Count")
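To make the effect of distinct() and .keep_all easier to see, here is a small sketch on made-up data. The decisions table and its values are hypothetical and only mimic the shape of superfunds, where a site can appear in several rows.

```r
library(dplyr)

# Hypothetical toy data: one row per cleanup decision, so sites repeat
decisions <- tibble(
  site_name   = c("Site A", "Site A", "Site B", "Site C", "Site C"),
  state       = c("NE", "NE", "IA", "NE", "NE"),
  fiscal_year = c(1990, 1995, 1992, 2001, 2003)
)

# Without .keep_all, only the column named in distinct() comes back
decisions %>% distinct(site_name)

# With .keep_all = TRUE, the first row for each site is kept with every column
sites <- decisions %>% distinct(site_name, .keep_all = TRUE)
sites

# Counting rows per state before and after shows why this matters for a bar chart:
# the raw table counts decisions, the de-duplicated table counts sites
decisions %>% count(state)
sites %>% count(state)
```

Note that distinct() keeps the first matching row for each site, so the other columns (fiscal_year here) reflect that first row only.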
Grammar of graphics
Let's break down what we did with the ggplot code. The code for our previous bar chart looked like:

    ggplot(data = superfunds_subset, aes(x = state)) +
      geom_bar() +
      labs(title = "U.S. Superfund Sites", x = "State", y = "Count")

Setting the labels aside, the core of the plot is:

    ggplot(data = superfunds_subset, aes(x = state)) +
      geom_bar()

ggplot needs:
1. mapping of data
2. to aesthetic attributes
3. using geometric objects
4. with data statistically transformed
5. and, if needed, mapped onto a facet or coordinate system

In our code:
1. mapping of data: data = superfunds_subset
2. to aesthetic attributes: aes(x = state)
3. using geometric objects: geom_bar()

See the ggplot geom_bar() documentation for the differences in the stat argument. By default, geom_bar() uses stat = "count", which makes the height of each bar proportional to the number of cases in each group. That is what we want here: one bar per state, as tall as the number of sites in that state. If you instead want the height of the bars to represent values already in the data, use stat = "identity" (or geom_col()) and map a variable to the y aesthetic.
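The difference between the two stat settings of geom_bar() can be sketched with a few rows of made-up data. The sites and totals data frames below are hypothetical; they only illustrate when each setting applies.

```r
library(ggplot2)

# Hypothetical toy data with one row per site
sites <- data.frame(state = c("NE", "NE", "IA", "KS", "KS", "KS"))

# Default stat = "count": ggplot tallies the rows in each state for us
p_count <- ggplot(sites, aes(x = state)) +
  geom_bar()

# Pre-computed totals: the bar height is already a column in the data
totals <- data.frame(state = c("IA", "KS", "NE"), n = c(1, 3, 2))

# stat = "identity" uses the y values as given, so y must be mapped
p_identity <- ggplot(totals, aes(x = state, y = n)) +
  geom_bar(stat = "identity")

# geom_col() is the shorthand for geom_bar(stat = "identity")
p_col <- ggplot(totals, aes(x = state, y = n)) +
  geom_col()

# All three plots draw the same bars
p_count
p_identity
p_col
```

Use the default when each row is one observation to be counted, and stat = "identity" or geom_col() when the counts or values have already been computed.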