1 Workshop 4 Data visualisation
2 Learning Objectives
By following the slides and applying the techniques to selected examples from the workbook, the successful student will be able to:
• explain what is important in choosing a figure
• determine which variables are best mapped to which elements of a figure
• explain the fundamentals of ggplot and recognise it as a 'tidy' package
• create, with ggplot, appropriate figures to accompany lm and glm analyses of up to three explanatory variables, showing the data, the main statistical model and the results of any post-hoc testing as appropriate
3 Key ideas in data visualisation
• Communicate information clearly, efficiently, honestly (without distortion)
• Should help the reader understand the data, the analysis, and the results
• Make it as easy as possible to make relevant comparisons
• Minimise ink
• See Edward Tufte
4 Key ideas in data visualisation
• Variables to elements
  – Response variable almost always on the vertical axis
  – Explanatory variables: horizontal axis, colour, shape, size, facets. Consider the variable type
• Ideally plot all the data and the model
• Or the model and an additional summary of the data
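A minimal sketch of mapping variables to elements, plotting all the data together with a linear model. The clover data frame here is simulated purely for illustration (an assumption, not the workshop dataset); only the column names follow the slides.

```r
library(ggplot2)

# Hypothetical stand-in for the clover data: simulated values, NOT the workshop dataset
set.seed(1)
clover <- data.frame(
  cycle    = factor(rep(c("A", "B", "C"), each = 10)),
  yarrow.s = round(runif(30, 10, 500))
)
clover$clov.y <- 40 - 0.05 * clover$yarrow.s + rnorm(30, sd = 5)

# Response on the vertical axis; explanatory variables on the horizontal axis and colour
p_model <- ggplot(clover, aes(x = yarrow.s, y = clov.y, colour = cycle)) +
  geom_point() +                                           # all the data
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # a straight-line fit per cycle
p_model
```

Swapping colour = cycle for facet_wrap(~ cycle) would put each cycle in its own panel instead; which is better depends on the comparison the reader needs to make.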
5 Data visualisation to enjoy
• David McCandless: http://www.informationisbeautiful.net/
• Hans Rosling’s Gapminder
6 ggplot2
• ‘Tidy’ datasets
  – are easy to manipulate, model and visualize
  – have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table
“tidy datasets are all alike but every messy dataset is messy in its own way”
Wickham, H. (2014), “Tidy Data,” Journal of Statistical Software, 59, available at http://www.jstatsoft.org/article/view/v059i10
7 Tidy data
• Each variable is in a named column
• Each row is an observation
• Easy to explore, plot, model, report
• Easy way to think about data
• Several powerful packages exist
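A small sketch of what tidying can look like in practice. The wide table and its values are made up for illustration, and tidyr's pivot_longer() is one possible tool for the job (an assumption; the slides do not name a package).

```r
library(tidyr)

# Hypothetical 'messy' layout: one column per cycle (values invented for illustration)
wide <- data.frame(
  plot = 1:3,
  A = c(14.0, 23.1, 18.5),
  B = c(50.7, 32.2, 41.0),
  C = c(11.4, 27.9, 35.6)
)

# Tidy layout: each variable is a named column, each row is one observation
tidy <- pivot_longer(
  wide,
  cols      = c(A, B, C),
  names_to  = "cycle",
  values_to = "clov.y"
)
tidy
```

In the tidy version, cycle and clov.y are ordinary columns, so they can be handed straight to aes().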
8 Keys to understanding ggplot
• ggplot(): empty plot
• ggplot(data = clover): the data
• ggplot(clover, aes(x = yarrow.s, y = clov.y)): the aesthetic maps variables to axes
The data:
'data.frame': 30 obs. of 3 variables:
 $ cycle   : Factor w/ 3 levels "A","B","C":
 $ clov.y  : num 14 50.7 11.4 23.1 32.2 18.5
 $ yarrow.s: int 220 20 510 40 120 300 60 10 …
...but it doesn’t know what to plot
9 Keys to understanding ggplot
geoms say what the data should be plotted as
ggplot(clover, aes(x = yarrow.s, y = clov.y)) + geom_line()
ggplot(clover, aes(x = yarrow.s, y = clov.y)) + geom_bar(stat = "identity")
ggplot(clover, aes(x = yarrow.s, y = clov.y)) + geom_point()
...but ggplot will plot what you tell it, sensible or not
10 Keys to understanding ggplot
● You can have as many geoms as you want
● geoms use the aes() previously defined, or you can add aesthetics within a geom
● geoms have a default ‘stat’, often count or identity
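The three points above can be sketched as follows. The clover data frame is again simulated for illustration only (an assumption, not the workshop dataset).

```r
library(ggplot2)

# Hypothetical stand-in for the clover data: simulated values, NOT the workshop dataset
set.seed(1)
clover <- data.frame(
  cycle    = factor(rep(c("A", "B", "C"), each = 10)),
  yarrow.s = round(runif(30, 10, 500))
)
clover$clov.y <- 40 - 0.05 * clover$yarrow.s + rnorm(30, sd = 5)

# Two geoms on one plot: both inherit x and y from ggplot();
# colour is added as an extra aesthetic for the points only
p_layers <- ggplot(clover, aes(x = yarrow.s, y = clov.y)) +
  geom_line() +
  geom_point(aes(colour = cycle))

# geom_bar() has stat = "count" by default, so it needs only x
# and counts the rows in each level of cycle
p_count <- ggplot(clover, aes(x = cycle)) +
  geom_bar()

p_layers
p_count
```

geom_point() uses stat = "identity" by default, so it plots the values exactly as they appear in the data.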
11 ggplot
• Axes: xlim(), xlab()
• Annotations
• Themes
• Code layout
• Google
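A short sketch pulling these together: axis limits and labels, an annotation, a theme, and one layer per line as a code layout. The data are simulated, and the axis label text and annotation are placeholders (assumptions for illustration only).

```r
library(ggplot2)

# Hypothetical stand-in for the clover data: simulated values, NOT the workshop dataset
set.seed(1)
clover <- data.frame(
  cycle    = factor(rep(c("A", "B", "C"), each = 10)),
  yarrow.s = round(runif(30, 10, 500))
)
clover$clov.y <- 40 - 0.05 * clover$yarrow.s + rnorm(30, sd = 5)

# One layer or setting per line, each line ending in +
p_final <- ggplot(clover, aes(x = yarrow.s, y = clov.y)) +
  geom_point() +
  xlim(0, 600) +                          # axis limits
  xlab("Yarrow (placeholder label)") +    # axis labels: wording is assumed
  ylab("Clover (placeholder label)") +
  annotate("text", x = 450, y = 45,
           label = "Example annotation") +  # placeholder annotation
  theme_classic()                         # a built-in theme with little non-data ink
p_final
```

Note that xlim() drops any points outside the limits, so check the range of the data before setting it.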
12 ggplot
• All the figures you’ve ever done
• The cookbook for R
• ggplot2 cheatsheet
• The R Graph Gallery
• Googling and more googling
13 Summary
• Making the data, the analysis and the result easier to understand is the most important thing
• The response nearly always goes on the vertical axis; explanatory variables are mapped to the horizontal axis, colours, shapes, sizes, facets
• ggplot is awesome