Comparing Distributions

Comparing Distributions. Nick Strayer, Instructor, DataCamp

  1. DataCamp, Visualization Best Practices in R: Comparing Distributions. Nick Strayer, Instructor

  2. Why compare distributions?
     - Verify balanced groups
     - For comparison's sake

  3. Why not facet histograms?

         library(ggplot2)

         # One histogram panel per vehicle color, stacked vertically
         ggplot(md_speeding, aes(x = speed_over)) +
           geom_histogram() +
           facet_grid(vehicle_color ~ .)

  4. The box plot

  5. Box plot pros
     - Familiar
     - Lots of good summary statistics
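
The summary statistics a box plot encodes can be inspected directly. The following is a minimal sketch, assuming a made-up numeric vector `speed_over` (hypothetical values, not taken from the `md_speeding` data). It uses base R's `boxplot.stats()`, which returns the whisker ends, hinges, and median that a box plot draws, plus any points flagged as outliers.

```r
# Hypothetical speeds over the limit; illustrative values only
speed_over <- c(5, 7, 9, 9, 10, 11, 12, 14, 15, 40)

stats <- boxplot.stats(speed_over)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Points more than 1.5 * IQR beyond the hinges, drawn individually as outliers
stats$out
```

Five numbers and a handful of outliers are all the plot reports, which is why the box plot reads quickly.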

  6. Box plot cons
     - Show me the data! A box plot hides the individual observations.

  7. A simple addition
     - geom_jitter() shows raw points jostled to avoid overlap.
     - Layer it under your geom_boxplot().

         library(dplyr)    # filter() and %>%
         library(ggplot2)

         md_speeding %>%
           filter(vehicle_color == 'BLUE') %>%
           ggplot(aes(x = gender, y = speed)) +
           # Draw points behind
           geom_jitter(alpha = 0.3, color = 'steelblue') +
           geom_boxplot(alpha = 0) +  # make transparent
           labs(title = 'Distribution of speed for blue cars by gender')

  9. Let's compare some distributions

  10. Boxplot alternatives. Nick Strayer, Instructor

  11. Limitations of the boxplot with jitter
      - Jostling points can only deal with so much overlap
      - Hard to get an idea of data density

  12. What are some other options?
      - Beeswarm plots
      - Violin plots

  13. Beeswarm plots
      - 'Smart' jittering
      - Individual points are clumped together as close to the axis as possible
      - Handily included as geom_beeswarm() in the ggbeeswarm package.

          library(ggbeeswarm)

          ggplot(data, aes(y = y, x = group)) +
            geom_beeswarm(color = 'steelblue')

  15. Beeswarm pros
      - Individual data points
      - Distributional shape

  16. Beeswarm cons
      - Gets hard to read with lots of data
      - Arbitrary stacking

  17. Violin plots
      - A kernel density estimate (KDE), reflected to be symmetric
      - Just replace geom_boxplot() with geom_violin().

          ggplot(data, aes(y = y, x = group)) +
            geom_violin(fill = 'steelblue')

  19. Violin pros
      - Every datapoint is heard
      - Not every datapoint is seen, so good for lots of data.

  20. Violin cons
      - Kernel width (bandwidth) choice
      - Not every datapoint is seen
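
Why the kernel width matters can be seen without drawing anything. The following is a rough sketch, assuming simulated two-group data and a hypothetical helper `count_peaks()` (neither is from the course). It uses base R's `density()` to estimate the same sample at two bandwidths and counts the humps in each estimate.

```r
# Simulated bimodal sample; illustrative only
set.seed(42)
x <- c(rnorm(200, mean = 10, sd = 1), rnorm(200, mean = 16, sd = 1))

# Hypothetical helper: count local maxima in a density estimate
count_peaks <- function(d) {
  slope <- sign(diff(d$y))
  slope <- slope[slope != 0]   # ignore flat steps
  sum(diff(slope) < 0)         # rising-to-falling transitions
}

narrow <- density(x, bw = 0.5)  # small bandwidth keeps both modes visible
wide   <- density(x, bw = 5)    # large bandwidth smooths them into one hump

count_peaks(narrow)
count_peaks(wide)
```

A violin drawn from the wide estimate would hide the fact that there are two groups. In ggplot2, `geom_violin()` accepts an `adjust` argument that scales the default bandwidth, so the same comparison can be made on the plot itself.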

  21. Let's try some more advanced comparisons!

  22. Comparing spatially related distributions. Nick Strayer, Instructor

  23. What are 'spatially connected axes'?
      - There is an underlying ordering of the classes.
      - E.g. months of the year: Jan < Feb < Mar < ...
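
In R, an underlying ordering like this is usually carried by the factor levels, and ggplot2 lays out a discrete axis in level order. The following is a small sketch, assuming a hypothetical character vector `month` of abbreviated month names (the real `md_speeding` column may be stored differently).

```r
# Hypothetical month labels in arbitrary order
month <- c("Mar", "Jan", "Feb", "Jan", "Dec")

# Default factor levels are alphabetical, which scrambles calendar order
levels(factor(month))

# Supplying the levels explicitly preserves Jan < Feb < Mar < ...
month_ordered <- factor(month, levels = month.abb, ordered = TRUE)
levels(month_ordered)

# Ordered factors support comparisons: is "Mar" later than "Jan"?
month_ordered[1] > month_ordered[2]
```

`month.abb` is a built-in constant holding the twelve abbreviated English month names in calendar order.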

  24. The ridgeline plot

          library(ggplot2)
          library(ggridges)  # gives us geom_density_ridges()

          ggplot(md_speeding, aes(x = speed_over, y = month)) +
            geom_density_ridges(bandwidth = 2) +
            xlim(1, 35)

  25. Ridgeline pros

  26. Ridgeline cons

  28. Let's make some ridgelines!

  29. Congratulations! Nick Strayer, Instructor

  34. Going further
      - Flowing Data: curated list of data visualizations and R-based tutorials.
      - Datawrapper Blog: articles that dig deep into visualization techniques and mistakes.
      - Twitter (#datavis): an ongoing stream of cool projects and inspiration.
      - Books! Data Visualization by Andy Kirk; The Functional Art and The Truthful Art by Alberto Cairo.

  35. Thank You!
