AMOUNTS
MPA 635: Data Visualization
September 25, 2018
PLAN FOR TODAY
More on truth
Amounts
Verbs
Live example
MORE ON TRUTH
DATA AND WHITE LIES
“I secretly wonder if I'm a righteous dude, is it OK for me to sort of maybe possibly mislead people so they pursue a more righteous policy?”
Anonymous MPA 635 student
IS NUDGING OKAY?
WHO DEFINES “GOOD”?
DON’T MESS WITH DATA
You can push people towards policy outcomes, but don't distort data to do it.
“Lies, damned lies, and statistics”
↑ Don’t perpetuate this ↑
AMOUNTS
PROBLEMS WITH BAR PLOTS
#barbarplots
BAR PLOTS AND SUMMARY STATS
GENERAL RULES
More data = better: show actual points
Don’t use bars for summary stats: counts are okay, but there are better solutions
The end of the bar is often all that matters: lollipops, points, heatmaps
Always start at zero!
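One way to act on "the end of the bar is often all that matters" is a lollipop chart. The sketch below assumes ggplot2 is installed; the data frame `toy` and its values are made up for illustration and are not from the lecture.

```r
library(ggplot2)

# Made-up illustrative values, not taken from any real dataset
toy <- data.frame(
  group  = c("A", "B", "C", "D"),
  amount = c(12, 30, 7, 21)
)

lollipop <- ggplot(toy, aes(x = amount, y = reorder(group, amount))) +
  # Thin stem drawn from zero to the value, so the baseline stays at zero
  geom_segment(aes(x = 0, xend = amount, yend = reorder(group, amount))) +
  # The point marks the end of the "bar", which is what readers compare
  geom_point(size = 3) +
  labs(x = "Amount", y = NULL)

lollipop
```

The stem still starts at zero, but most of the ink goes to the point at the end.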
VERBS
MOST COMMON VERBS
filter(): Choose rows based on conditions
select(): Choose (and rename) columns
mutate(): Add column (or change existing column)
group_by(): Make subgroups based on a column
summarize(): Calculate summary statistics for groups
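The five verbs are usually chained together with the pipe. A minimal sketch, assuming dplyr is installed; `toy` is a hypothetical stand-in for gapminder with made-up values, and `total_pop_million` is an illustrative column name.

```r
library(dplyr)

# Toy stand-in for gapminder; all values are made up for illustration
toy <- tibble(
  country   = c("A", "B", "C", "D"),
  continent = c("X", "X", "Y", "Y"),
  year      = c(2000, 2000, 2000, 2000),
  lifeExp   = c(50, 60, 70, 80),
  pop       = c(1e6, 2e6, 3e6, 4e6)
)

result <- toy %>%
  filter(year == 2000) %>%                     # keep rows that match a condition
  select(continent, lifeExp, pop) %>%          # keep only these columns
  mutate(pop_million = pop / 1000000) %>%      # add a new column
  group_by(continent) %>%                      # one subgroup per continent
  summarize(avg_lifeexp = mean(lifeExp),       # one summary row per subgroup
            total_pop_million = sum(pop_million))

result
```

Each verb takes a data frame and returns a data frame, which is why they can be piped one after another.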
FILTER
    gapminder %>% filter(year == 1967)
FILTER
    gapminder %>% filter(lifeExp < 40)
FILTER
    gapminder %>% filter(continent == "Asia", lifeExp < 40)
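Separating conditions with a comma inside filter() means "and": a row must pass every condition. A small sketch, assuming dplyr is installed and using a made-up `toy` data frame, that compares the comma form with `&` and shows `|` for "or":

```r
library(dplyr)

# Made-up values for illustration only
toy <- tibble(
  country   = c("A", "B", "C", "D"),
  continent = c("Asia", "Asia", "Europe", "Europe"),
  lifeExp   = c(35, 65, 38, 75)
)

# Comma and & both mean AND: only rows meeting both conditions remain
both_comma <- toy %>% filter(continent == "Asia", lifeExp < 40)
both_amp   <- toy %>% filter(continent == "Asia" & lifeExp < 40)

# | means OR: rows meeting at least one condition remain
either <- toy %>% filter(continent == "Asia" | lifeExp < 40)
```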
SELECT
    gapminder %>% select(country, year, pop)
MUTATE
    gapminder %>% mutate(something_new = 5)
MUTATE
    gapminder %>% mutate(pop_million = pop / 1000000)
MUTATE
    gapminder %>%
      mutate(lifeExp_binary = ifelse(lifeExp < 40, "Very low", "Not very low"))
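ifelse() handles two categories. If more are needed, one option is dplyr's case_when(). The sketch below assumes dplyr is installed; the `toy` data, the cutoff of 60, and the labels "Medium" and "High" are hypothetical and not from the lecture.

```r
library(dplyr)

# Made-up values for illustration only
toy <- tibble(country = c("A", "B", "C"), lifeExp = c(35, 55, 75))

labelled <- toy %>%
  mutate(
    # Two categories: same pattern as the slide
    lifeExp_binary = ifelse(lifeExp < 40, "Very low", "Not very low"),
    # More than two categories: conditions are checked top to bottom
    lifeExp_level = case_when(
      lifeExp < 40 ~ "Very low",
      lifeExp < 60 ~ "Medium",   # hypothetical cutoff
      TRUE         ~ "High"      # everything else
    )
  )
```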
GROUP_BY + SUMMARIZE
    gapminder %>%
      group_by(continent) %>%
      summarize(avg_lifeexp = mean(lifeExp),
                median_lifeexp = median(lifeExp),
                num_countries = n())
GROUP_BY + SUMMARIZE
    gapminder %>%
      group_by(continent, year) %>%
      summarize(avg_lifeexp = mean(lifeExp),
                median_lifeexp = median(lifeExp),
                num_countries = n())
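Two details are worth checking when running these. n() counts rows, so it only equals the number of countries when each country appears once per group (as it does when grouping by continent and year). And after summarizing by two columns, the result is still grouped by the first one. A sketch on made-up data, assuming dplyr is installed; n_distinct() is shown as one possible way to count unique countries:

```r
library(dplyr)

# Made-up values: three countries, each observed in two years
toy <- tibble(
  country   = c("A", "A", "B", "B", "C", "C"),
  continent = c("X", "X", "X", "X", "Y", "Y"),
  year      = c(2000, 2010, 2000, 2010, 2000, 2010),
  lifeExp   = c(50, 60, 60, 70, 70, 80)
)

# Grouping by continent only: n() counts country-year rows
by_continent <- toy %>%
  group_by(continent) %>%
  summarize(n_rows = n(), n_countries = n_distinct(country))

# Grouping by continent and year: one row per combination
by_both <- toy %>%
  group_by(continent, year) %>%
  summarize(avg_lifeexp = mean(lifeExp), num_countries = n())

# summarize() peels off the last grouping column; ungroup() removes the rest
still_grouped_by <- group_vars(by_both)
flat <- by_both %>% ungroup()
```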
OTHER HELPFUL VERBS
arrange(): Sort a data frame by a column
left_join(): Merge two data frames by column(s)
count(): Shortcut for group_by() %>% summarize(n = n())
gather(): Make a data frame long
spread(): Make a data frame wide
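A quick tour of these helpers on made-up data, assuming dplyr and tidyr are installed. The data frames `toy` and `regions` and their column names are hypothetical. Newer tidyr versions also offer pivot_longer() and pivot_wider() for the same reshaping job as gather() and spread().

```r
library(dplyr)
library(tidyr)

# Made-up values for illustration only
toy <- tibble(
  country   = c("A", "B", "C"),
  continent = c("X", "X", "Y"),
  pop_2000  = c(1, 2, 3),
  pop_2010  = c(2, 3, 4)
)

# arrange(): sort rows, here from largest to smallest pop_2010
sorted <- toy %>% arrange(desc(pop_2010))

# count() gives the same counts as group_by() + summarize(n = n())
counted <- toy %>% count(continent)
manual  <- toy %>% group_by(continent) %>% summarize(n = n())

# left_join(): keep every row of toy, add matching columns from regions
regions <- tibble(continent = c("X", "Y"), label = c("Region X", "Region Y"))
joined  <- toy %>% left_join(regions, by = "continent")

# gather(): wide to long, one row per country-year
long <- toy %>% gather(key = "year", value = "pop", pop_2000, pop_2010)

# spread(): long back to wide, one column per year
wide <- long %>% spread(key = year, value = pop)
```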
LIVE EXAMPLE