Good Graphics Graphs: Fundamental Principle of Statistical Graphics


  1. Good Graphics Graphs. Aaron Rendahl; slides by Sanford Weisberg & G. Oehlert. School of Statistics, University of Minnesota. February 9, 2009. (STAT8801, Univ. of Minnesota, "Graphs", 45 slides.)
     Fundamental Principle of Statistical Graphics: "Above all else show the data." (Ed Tufte)
     Graphics can be . . .
       . . . all that is read in an article
       . . . efficiently summarize a problem
       . . . very aesthetic
       . . . misleading or otherwise awful
     We must use them well, or else who will?
     From Tilman, Hill and Lehman (2006), Science, p. 1598. [Figure]
     . . . adding prediction intervals. [Figure: scatterplot of average above-ground biomass (g/m^2) against number of species.]

  2. . . . adding species indicator. [Figure: average above-ground biomass (g/m^2) against number of species; legend: None, Other legume, Luppe.]
     Paper usage, New York Times, Feb. 10, 2008. [Figure]
     Bush Tax Cuts, New York Times, Feb. 10, 2008. [Figure]
     New York Times, Feb. 10, 2008. [Figure]

  3. The Aesthetics of Graphics
     Ed Tufte is at the top of the pantheon of statistical graphics gods. Tufte has three extremely influential books on graphics. Not everyone agrees with Tufte, but no one can ignore him.
     Other important sources:
       Lee Wilkinson (The Grammar of Graphics)
       Bill Cleveland (The Elements of Graphing Data)
       Howard Wainer (lots of articles)
     Map of Cancer Rates. [Figure]
     Avoid puzzles: try to figure this one out. [Figure]
     John Snow, Cholera & the Broad St. Pump. [Figure]

  4. The best graphs ever: www.economist.com/printedition/displayStory.cfm?Story_ID=10278643
     The Worst Graph Ever. [Figure: Challenger data, failures (0 to 4) against temperature (30 to 90), with LS line.]
     What they should have done. [Figure: Challenger data, failures against temperature, with Poisson line.]
     Technique
       Graphs may be the only part of an article that is read.
       Good format and design: aesthetics, elegance, and style are difficult to prescribe. Construct, revise, edit, try again.
       Words/numbers/graphics together: data graphics are paragraphs about numbers (Tufte, p 181).
       Graphics and tables must always reinforce message and text.
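
The contrast between the "LS line" and the "Poisson line" on the two Challenger slides can be sketched in R as follows. This is only an illustration under stated assumptions: the `launches` data frame holds made-up counts, not the actual Challenger records, and the variable names are hypothetical. It shows why a least-squares line is awkward for count data (it can predict negative failure counts), while a Poisson regression keeps predictions positive.

```r
# Made-up data for illustration only (NOT the actual Challenger records).
launches <- data.frame(
  temperature = c(50, 55, 60, 62, 65, 68, 70, 72, 75, 78, 80, 82),
  failures    = c(3, 2, 1, 2, 0, 1, 0, 0, 1, 0, 0, 0)
)

# Least-squares line: treats the counts as unbounded continuous values.
ls_fit <- lm(failures ~ temperature, data = launches)

# Poisson regression with the default log link: expected counts stay positive.
pois_fit <- glm(failures ~ temperature, family = poisson, data = launches)

# Predict over the temperature range shown on the slides (30 to 90).
new_temps <- data.frame(temperature = seq(30, 90, by = 5))
ls_pred   <- predict(ls_fit, newdata = new_temps)
pois_pred <- predict(pois_fit, newdata = new_temps, type = "response")

# Plot the data with both fitted curves, labeled axes, and a legend.
plot(failures ~ temperature, data = launches,
     xlim = c(30, 90), ylim = c(0, 4),
     xlab = "Temperature", ylab = "Failures")
lines(new_temps$temperature, ls_pred, lty = 2)
lines(new_temps$temperature, pois_pred, lty = 1)
legend("topright", legend = c("LS line", "Poisson line"),
       lty = c(2, 1), bty = "n")
```

With these made-up numbers the least-squares predictions drop below zero at the warm end of the range, which is impossible for a count; the Poisson curve does not.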

  5. Don’t . . .
       1. . . . mislead
       2. . . . use mysterious abbreviations
       3. . . . include too much clutter (forest for the trees)
       4. . . . misuse placement of origin
       5. . . . include graphs without explanation
       6. . . . use gratuitous color/line variation
       7. . . . SHOUT (use all capital letters)
       8. . . . use chart junk
       9. . . . use pie charts
     Do . . .
       1. . . . use accessible, friendly graphics
       2. . . . include axis labels, titles and legends
       3. . . . use sensible tick marks
       4. . . . facilitate comparisons between graphs by using common scales
       5. . . . avoid unclear abbreviations
     Content-free decoration. [Figure]
     Graphs in R
       1. Basic graphs: use plot, pairs, boxplot
            Uses sensible defaults, but not always
            Reasonably, but not completely, flexible
       2. Lattice graphics
            Very aesthetic and moderately flexible
            Very hard to use well
       3. ggplot2
            I’ve not used it
            Should be very flexible and maybe easier to use
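
Two of the "Do" items, labeled axes and common scales across graphs, can be sketched with the basic `plot` function mentioned on the "Graphs in R" slide. The data, group names, and axis labels below are invented for illustration; only `plot`, `par`, and `range` are standard base R.

```r
# Made-up data for two groups that we want to compare side by side.
set.seed(42)
group_a <- data.frame(x = 1:20, y = rnorm(20, mean = 10, sd = 2))
group_b <- data.frame(x = 1:20, y = rnorm(20, mean = 14, sd = 2))

# A common y-axis range, so the two panels can be compared directly.
common_ylim <- range(group_a$y, group_b$y)

# Two panels side by side; remember the old settings so they can be restored.
old_par <- par(mfrow = c(1, 2))
plot(y ~ x, data = group_a, ylim = common_ylim,
     xlab = "Day", ylab = "Response (units)", main = "Group A")
plot(y ~ x, data = group_b, ylim = common_ylim,
     xlab = "Day", ylab = "Response (units)", main = "Group B")
par(old_par)
```

Without the shared `ylim`, each panel would pick its own axis range and the difference in level between the groups would be hidden in the tick labels.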

  6. A lattice graph

       xyplot(log(RF)~Date|Location,data=scn1,groups=Treatment,
         auto.key=FALSE,layout=c(3,1),
         panel=function(x,y,subscripts,...){
           panel.superpose(x,y,subscripts,...)
         },
         panel.groups=function(x,y,...){
           panel.loess(x,y,...)
           panel.xyplot(x,y,...)}
       )

     I couldn’t figure out how to get a reasonable legend added to the plot to name the colors/symbols, or how to label dates.
     [Figure: three panels (Morris, Lamberton, Waseca) of log(RF) against Date, with Date running from 1 to 5.]
     Tufte’s Data Ink
       Definition (Data ink): Data ink is the “ink” that displays non-redundant data information.
       Definition (Data ink ratio): Proportion of a graphic’s ink devoted to the non-redundant display of data information.
       1. Maximize data ink ratio, within reason
       2. Erase non-data ink, within reason
       3. Erase redundant data ink, within reason
     Bad data-ink ratio. [Figure]
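
For the lattice graph above, one possible way to add a legend and label the dates is sketched below. It is not the slide author's solution. It relies on two standard `xyplot` arguments, `auto.key` (given as a list) and `scales`, and replaces the custom panel functions with `type = c("p", "smooth")`, which draws points plus a loess smooth per group. The `scn_demo` data frame is a made-up stand-in for `scn1`, and the "Visit" labels and treatment names are invented.

```r
library(lattice)

# Made-up stand-in for the slides' scn1 data (the real data are not shown).
set.seed(1)
scn_demo <- expand.grid(
  Date      = 1:5,
  Location  = c("Morris", "Lamberton", "Waseca"),
  Treatment = c("A", "B"),
  Rep       = 1:4
)
scn_demo$RF <- exp(rnorm(nrow(scn_demo), mean = 0.2 * scn_demo$Date))

p <- xyplot(
  log(RF) ~ Date | Location, data = scn_demo, groups = Treatment,
  layout = c(3, 1),
  type = c("p", "smooth"),   # points plus a loess smooth for each group
  # Legend naming the group colors/symbols, showing both points and lines.
  auto.key = list(columns = 2, points = TRUE, lines = TRUE,
                  title = "Treatment"),
  # Explicit tick positions and labels for the date axis (labels are invented).
  scales = list(x = list(at = 1:5, labels = paste("Visit", 1:5)))
)
print(p)
```

If the dates were stored as real `Date` values, the `labels` entry could instead hold formatted date strings at the chosen `at` positions.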

  7. Good data-ink ratio. [Figure]
     Zero data-ink ratio. [Figure]
     Erasable non-data ink. [Figure]
     Erasable non-data ink. [Figure]

  8. Improved non-data ink. [Figure]
     Mighty Ducks
       Non-data ink can be chartjunk.
       Could be shading, hatching, grid, etc.
       Really egregious examples are “ducks”.
       Get rid of it!
     Moiré patterns. [Figure]
     Data, not frames. [Figure]

  9. Quack. [Figure]
     Don’t lie with graphics
       “Lies, damned lies, and statistics” could also be “Lies, damned lies, and graphics.”
       What can we do to avoid misleading?
     Data, area and dimension
       The size of the representation of a number should be proportional to the number.
       The number of information-carrying dimensions should not exceed the dimension of the data.
     Wolf depredations. [Figure]
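
The rule that the size of a representation should be proportional to the number has a simple arithmetic consequence when values are drawn as circles: the eye reads area, so the radius should grow with the square root of the value, not with the value itself. The sketch below uses invented values and hypothetical variable names to compare the two choices.

```r
# Made-up values to be shown as circles.
values <- c(small = 1, medium = 4, large = 9)

# Misleading choice: radius proportional to the value.
radius_wrong <- values
# Proportional choice: area proportional to the value, so radius ~ sqrt(value).
radius_right <- sqrt(values)

circle_area <- function(r) pi * r^2

# Area of each circle relative to the smallest one.
ratio_wrong <- circle_area(radius_wrong) / circle_area(radius_wrong)[1]
ratio_right <- circle_area(radius_right) / circle_area(radius_right)[1]
print(ratio_wrong)  # areas grow as 1, 16, 81: a 9-fold value looks 81 times bigger
print(ratio_right)  # areas grow as 1, 4, 9: matches the data

# Draw the proportional version; symbols() scales circles by the given radii.
symbols(x = 1:3, y = rep(1, 3), circles = radius_right, inches = 0.4,
        xlim = c(0, 4), xlab = "Item", ylab = "", yaxt = "n")
```

The same reasoning applies to any two-dimensional glyph: scaling both width and height by the value squares the visual effect.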

  10. Backward in time? [Figure]
      Oil. [Figure]
      Context and labels
        Keep data in context.
        Use clear and thorough labels to avoid distortion and ambiguity.
      Oil. [Figure]

  11. Oil. [Figure]
      Appropriate data
        Use consistent graphic design.
        Deflate monetary time series.
      Opec. [Figure]
      How to Display Data Badly (Wainer)
        1. Show as few data as possible.
        2. Hide what data you do show.
        3. Ignore the visual metaphor.
        4. Only order matters.
        5. Graph data out of context.
        6. Change scales in mid-axis.
        7. Emphasize the trivial, not the important.
        8. Jiggle the baseline.
        9. Austria first.
        10. Label illegibly, incompletely, inaccurately, and ambiguously.
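
The advice above to deflate monetary time series amounts to dividing each nominal value by a price index and rescaling to a base year. The sketch below uses an invented price series and an invented index (not real oil prices or a real CPI); the column names are hypothetical.

```r
# Made-up nominal prices and a made-up price index (index = 100 in 2000).
prices <- data.frame(
  year        = c(2000, 2001, 2002, 2003),
  nominal     = c(20, 22, 24.2, 26.62),
  price_index = c(100, 110, 121, 133.1)
)

# Express every year's price in base-year (2000) money.
base_index  <- prices$price_index[prices$year == 2000]
prices$real <- prices$nominal * base_index / prices$price_index

print(prices)

# Plot both series on one labeled graph so the difference is visible.
plot(nominal ~ year, data = prices, type = "b", lty = 2,
     ylim = range(prices$nominal, prices$real),
     xlab = "Year", ylab = "Price (made-up units)")
lines(real ~ year, data = prices, type = "b", lty = 1)
legend("topleft", legend = c("Nominal", "Real (2000 money)"),
       lty = c(2, 1), bty = "n")
```

In this constructed example the nominal series climbs by about a third, while the deflated series is flat: the whole apparent rise is inflation.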

  12. Summary
        Many, many ways to do things badly.
        Show the data.
        Do not distort.
        Cause no pain.
