Visual comparisons

Comparing distributions: Part 1

R.W. Oldford
The Titanic

The data set `Titanic` provides “information on the fate of passengers on the fatal maiden voyage of the ocean liner ‘Titanic’, summarized according to economic status (class), sex, age and survival.”

The `Titanic` data records the number of people aboard (passengers and crew) in each combination of four categorical variates:

| No. | Variate  | Values              |
|-----|----------|---------------------|
| 1   | Class    | 1st, 2nd, 3rd, Crew |
| 2   | Sex      | Male, Female        |
| 3   | Age      | Child, Adult        |
| 4   | Survived | No, Yes             |
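The layout described in the table can be checked directly in R. A minimal sketch using only base R (`Titanic` ships with the `datasets` package, which R attaches by default):

```r
# Titanic is a four-way contingency table of counts (an object of class "table")
dim(Titanic)               # 4 2 2 2: one extent per variate
names(dimnames(Titanic))   # "Class" "Sex" "Age" "Survived"
dimnames(Titanic)$Class    # "1st" "2nd" "3rd" "Crew"
sum(Titanic)               # 2201 people counted in total
```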
The Titanic

We might be interested in comparing the classes by survival.

```r
library(knitr)

# Subtable of survived/not by class (summing over Sex and Age)
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)
kable(classTable)
```

|     | 1st | 2nd | 3rd | Crew |
|-----|-----|-----|-----|------|
| No  | 122 | 167 | 528 | 673  |
| Yes | 203 | 118 | 178 | 212  |

```r
# Number in each class
classTotals <- apply(classTable, MARGIN = 2, FUN = sum)
# Proportion of each class that survived
classSurvival <- t(classTable["Yes", ] / classTotals)
rownames(classSurvival) <- c("Survived")
kable(classSurvival)
```

|          | 1st       | 2nd       | 3rd       | Crew     |
|----------|-----------|-----------|-----------|----------|
| Survived | 0.6246154 | 0.4140351 | 0.2521246 | 0.239548 |
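Base R also has helpers for exactly these margins. A sketch (the name `viaMargin` is ours) showing that `margin.table()` gives the same subtable as the `apply()` call above, and that `colSums()` gives the class totals:

```r
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)

# Same Survived-by-Class subtable via margin.table()
viaMargin <- margin.table(Titanic, margin = c(4, 1))
all(classTable == viaMargin)   # TRUE when the two agree

# Column sums give the number in each class
classTotals <- colSums(classTable)
classTotals                    # 1st 325, 2nd 285, 3rd 706, Crew 885
```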
The Titanic

Following the rules for tables, a better way to present these numbers is as follows.

```r
# Round to two decimals, then rescale to percentages
newTable <- 100 * round(classSurvival, 2)
# Swap rows and columns
newTable <- t(newTable)
# The values are already in the right order here, but in general
# sort the rows in descending order of the values
descendingOrder <- order(newTable, decreasing = TRUE)
newTable <- newTable[descendingOrder, , drop = FALSE]  # drop = FALSE keeps a one-column matrix
colnames(newTable) <- c("% survived")
kable(newTable, caption = "Survival rates on the Titanic by class")
```

Table 4: Survival rates on the Titanic by class

|      | % survived |
|------|-----------:|
| 1st  | 62         |
| 2nd  | 41         |
| 3rd  | 25         |
| Crew | 24         |

How else might we visually compare these sets of numbers?
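The same table can also be built more compactly. A sketch (the name `newTable2` is ours, not from the slides) that rescales before rounding and uses `sort()` in place of `order()`; for these data it gives the same numbers in the same order:

```r
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)

# Proportion surviving in each class, as whole percentages, largest first
rates <- classTable["Yes", ] / colSums(classTable)
pct <- sort(round(100 * rates), decreasing = TRUE)   # sort() keeps the names

# Hypothetical newTable2: a one-column matrix shaped like newTable
newTable2 <- matrix(pct, ncol = 1,
                    dimnames = list(names(pct), "% survived"))
newTable2
```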
The Titanic

As lengths of bars, colour coded (and labelled) by class:

```r
nvals <- nrow(newTable)
cols <- rainbow(nvals, alpha = 0.5)
barplot(newTable, col = cols, horiz = TRUE,
        names.arg = c(""), axes = FALSE,
        xlab = colnames(newTable))
# Label each segment at its centre
xlocs <- cumsum(newTable)
centres <- c(xlocs[1] / 2, xlocs[1:(nvals - 1)] + diff(xlocs) / 2)
text(centres, 0.75, labels = rownames(newTable))
```

[Figure: a single horizontal stacked bar, “% survived”, with segments labelled 1st, 2nd, 3rd, Crew.]

This compares lengths along a common NON-aligned scale.
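The segment centres above can be computed in one step: the centre of each segment is the running total up to and including it, minus half of its own length. A sketch with a hypothetical helper `segmentCentres()`:

```r
# Hypothetical helper: centre of each segment in a stacked bar
segmentCentres <- function(x) cumsum(x) - x / 2

segmentCentres(c(62, 41, 25, 24))   # 31.0 82.5 115.5 140.0
```

With this helper, the labelling line could read `text(segmentCentres(newTable[, 1]), 0.75, labels = rownames(newTable))`.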
The Titanic

```r
barplot(newTable, col = cols, horiz = TRUE, beside = TRUE,
        names.arg = c(""), xlab = colnames(newTable),
        legend.text = rownames(newTable))
```

[Figure: four separate horizontal bars (1st, 2nd, 3rd, Crew) on a common axis “% survived” running from 0 to 60.]

This compares lengths along a common ALIGNED scale.
The Titanic

Survival and not surviving:

```r
# Divide each row of counts by the class totals to get proportions within class
survivalProportions <- classTable
survivalProportions["Yes", ] <- survivalProportions["Yes", ] / classTotals
survivalProportions["No", ]  <- survivalProportions["No", ] / classTotals
survivalCols <- adjustcolor(c("black", "grey"), 0.5)
barplot(survivalProportions, col = survivalCols, horiz = TRUE,
        beside = TRUE, xlab = "Proportion of class", xlim = c(0, 1))
legend("bottomright", title = "Survival", fill = survivalCols,
       legend = rownames(survivalProportions))
```

[Figure: paired horizontal bars (No, Yes) for each of 1st, 2nd, 3rd and Crew; x-axis “Proportion of class” from 0 to 1; legend “Survival”.]
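The row-by-row division can be replaced by a single call. A sketch (the names `viaProp` and `viaSweep` are ours) showing that `prop.table()` with `margin = 2`, or `sweep()`, divides each column by its class total and reproduces `survivalProportions`. Each column then sums to 1, which is why the stacked (framed) version fills its frame exactly:

```r
classTable  <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)
classTotals <- colSums(classTable)

# Row-by-row division, as on the slide
survivalProportions <- classTable
survivalProportions["Yes", ] <- survivalProportions["Yes", ] / classTotals
survivalProportions["No", ]  <- survivalProportions["No", ] / classTotals

# One call: margin = 2 divides each column by its own total
viaProp  <- prop.table(classTable, margin = 2)
# ... or, more explicitly, with sweep()
viaSweep <- sweep(classTable, MARGIN = 2, STATS = classTotals, FUN = "/")

all.equal(survivalProportions, viaProp)    # TRUE if the results agree
all.equal(survivalProportions, viaSweep)   # TRUE if the results agree
colSums(survivalProportions)               # each class sums to 1
```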
The Titanic

Survival and not surviving; framed:

```r
barplot(survivalProportions, col = survivalCols, horiz = TRUE,
        beside = FALSE, xlab = "Proportion of class", space = 0)
```

[Figure: one stacked horizontal bar per class (1st, 2nd, 3rd, Crew), each filling the frame from 0 to 1; x-axis “Proportion of class”.]

Both are again along a common but non-aligned scale, but now the bars to be compared are closer together and we gain the positive effect of framing.
Warning – problems with stacked bars

Bars placed side by side are pretty natural in some contexts, for example when the horizontal axis (and bar width) represents time. Consider the following “sleep telemetry chart”:

Yellow corresponds to when the baby is awake, blue to when they are asleep.

But take care when these bars are stacked on top of each other (as above; or placed side by side if arranged vertically). Look what happens with many stacked bars (and many segments in each).

Source: www.trixietracker.com/tour/sleep/
Warning – problems with stacked bars

Take care when placing bars of stacked colours side by side. For example, the horizontal lines can look crooked.
Warning – problems with stacked bars Even when the rectangles are the same size, unintended visual effects can be introduced. All lines are perfectly horizontal! This is called the “cafe wall illusion” after a cafe in Bristol, England.
Aside – The cafe wall illusion

Take care when placing bars of stacked colours side by side, or you might induce unintended visual variation.

[Photo: the cafe on St. Michael’s Hill in Bristol, England.]
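The illusion is easy to reproduce with base graphics. A minimal sketch with a hypothetical function `cafeWall()`: rows of alternating dark and light tiles separated by thin grey “mortar” lines, each row shifted horizontally by a fraction of a tile. The default offset cycle `c(0, 0.25, 0.5, 0.25)` is an assumption; how strong the illusion appears depends on the offsets, the tile size and the mortar width.

```r
# Hypothetical helper: draw a cafe-wall pattern with base graphics.
# offsets: horizontal shift of each row, in tile widths (recycled over rows).
# Returns (invisibly) the number of tiles drawn.
cafeWall <- function(nrows = 8, ncols = 10,
                     offsets = c(0, 0.25, 0.5, 0.25),
                     mortar = "grey50", lwd = 2) {
  plot.new()
  plot.window(xlim = c(0, ncols), ylim = c(0, nrows), asp = 1)
  tiles <- 0
  for (row in seq_len(nrows)) {
    shift <- offsets[(row - 1) %% length(offsets) + 1]
    # start one tile to the left so every shifted row still covers [0, ncols]
    lefts <- seq(-1, ncols) + shift
    fills <- rep(c("black", "white"), length.out = length(lefts))
    rect(lefts, row - 1, lefts + 1, row, col = fills, border = NA)
    tiles <- tiles + length(lefts)
  }
  # the mortar lines really are perfectly horizontal
  abline(h = 0:nrows, col = mortar, lwd = lwd)
  invisible(tiles)
}
```

For example, `cafeWall(nrows = 10, ncols = 12)` draws ten rows; setting `offsets = 0` lines the tiles up into a chequer-free column pattern, and the lines then look straight.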
The Titanic - Number of passengers by class

```r
barplot(apply(classTable, MARGIN = 2, FUN = sum),
        col = adjustcolor("steelblue", 0.5),
        xlab = "Class", ylab = "Number of passengers")
```

[Figure: vertical bars for 1st, 2nd, 3rd and Crew; y-axis “Number of passengers” from 0 to 800.]
The Titanic - Number who died in each class

```r
barplot(classTable["No", ], col = survivalCols[1],
        xlab = "Class", ylab = "Number of passengers")
```

[Figure: vertical bars for 1st, 2nd, 3rd and Crew; y-axis “Number of passengers” from 0 to 600.]
The Titanic - Number who survived in each class

```r
barplot(classTable["Yes", ], col = survivalCols[2],
        xlab = "Class", ylab = "Number of passengers")
```

[Figure: vertical bars for 1st, 2nd, 3rd and Crew; y-axis “Number of passengers” from 0 to 200.]
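The Titanic - The proportion of deaths in each class

```r
barplot(classTable, col = survivalCols,
        xlab = "Class", ylab = "Number of passengers")
```

[Figure: stacked vertical bars for 1st, 2nd, 3rd and Crew, with “No” below “Yes”; y-axis “Number of passengers” from 0 to 800.]

The counts are stacked, so the proportion who died in a class is the share of that class’s bar taken up by the darker (“No”) segment.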
The Titanic

```r
savePar <- par(mfrow = c(1, 3))
barplot(apply(classTable, MARGIN = 2, FUN = sum),
        col = adjustcolor("steelblue", 0.5),
        ylim = c(0, 1000),  # ensure common scale
        xlab = "Class", ylab = "Number of passengers")
barplot(classTable["No", ], col = survivalCols[1],
        ylim = c(0, 1000),  # ensure common scale
        main = "Died", xlab = "Class", ylab = "Number of passengers")
barplot(classTable["Yes", ], col = survivalCols[2],
        ylim = c(0, 1000),  # ensure common scale
        main = "Survived", xlab = "Class", ylab = "Number of passengers")
par(savePar)
```
The Titanic

Comparing counts

[Figure: three side-by-side bar charts of number of passengers by class (1st, 2nd, 3rd, Crew): the total (untitled), “Died” and “Survived”, each with y-axis from 0 to 1000.]

- It is easy to compare the number in each class.
- The scales are common and aligned: position, length and area redundantly encode the values.
- It is easier to compare the “shapes” of the distributions as well.
- Again, the “Died” shape looks fairly similar to that of the total, except perhaps for 1st and 2nd classes. (Differences are easier to tell in the framed versions.)
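Rather than typing `ylim = c(0, 1000)` into each call, the common limit can be computed from the data. A sketch (the names `panels`, `panelCols` and `commonYlim` are ours) that uses `pretty()` to round the largest count up to a “nice” axis value and draws the three panels in a loop:

```r
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)
panels <- list(Total    = colSums(classTable),
               Died     = classTable["No", ],
               Survived = classTable["Yes", ])
panelCols <- adjustcolor(c("steelblue", "black", "grey"), 0.5)

# Common y-axis limit from the data: pretty() rounds the largest count up
commonYlim <- c(0, max(pretty(c(0, unlist(panels)))))

savePar <- par(mfrow = c(1, 3))
for (i in seq_along(panels)) {
  barplot(panels[[i]], col = panelCols[i], ylim = commonYlim,
          main = names(panels)[i],
          xlab = "Class", ylab = "Number of passengers")
}
par(savePar)
```

If the data change, the shared scale follows automatically, so the panels stay comparable.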
The Titanic

Comparing shapes - no common scale

```r
savePar <- par(mfrow = c(1, 3))
barplot(apply(classTable, MARGIN = 2, FUN = sum),
        col = adjustcolor("steelblue", 0.5),
        # NO COMMON SCALE
        main = "Total", xlab = "Class", ylab = "Number of passengers")
barplot(classTable["No", ], col = survivalCols[1],
        # NO COMMON SCALE
        main = "Died", xlab = "Class", ylab = "Number of passengers")
barplot(classTable["Yes", ], col = survivalCols[2],
        # NO COMMON SCALE
        main = "Survived", xlab = "Class", ylab = "Number of passengers")
par(savePar)
```
The Titanic

Comparing shapes - no common scale

[Figure: three side-by-side bar charts, “Total”, “Died” and “Survived”, of number of passengers by class (1st, 2nd, 3rd, Crew), each with its own y-axis scale.]

Different scalings make it easier to compare the “shapes” of the distributions but harder to compare the actual values.
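One option (not in the original slides) that keeps a common scale while still making shapes comparable is to rescale each distribution by its own total, so every panel shows proportions. A sketch (the names `shares` and `panelCols` are ours) using `prop.table()` with `margin = 1` to divide each row of `classTable` by its row total:

```r
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)

# Distribution of class within each group: each row divided by its own total
shares <- rbind(colSums(classTable) / sum(classTable),   # everyone
                prop.table(classTable, margin = 1))      # died, survived
rownames(shares) <- c("Total", "Died", "Survived")
panelCols <- adjustcolor(c("steelblue", "black", "grey"), 0.5)

savePar <- par(mfrow = c(1, 3))
for (i in seq_len(nrow(shares))) {
  barplot(shares[i, ], col = panelCols[i],
          ylim = c(0, max(shares)),   # one common scale of proportions
          main = rownames(shares)[i],
          xlab = "Class", ylab = "Proportion of group")
}
par(savePar)
```

The trade-off is that the panels no longer show how many people are in each group, only how each group is spread over the classes.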
South African heart disease

Here we will look at the dataset `SAheart` from the package `ElemStatLearn`. It is a sample from a retrospective study of heart disease in males from a high-risk region of the Western Cape, South Africa. There are 462 cases and 10 variates (see `help(SAheart, package = "ElemStatLearn")` for details). For example, `sbp` is the measured systolic blood pressure, that is, the blood pressure while the heart is pumping; `chd` is 1 if the patient has coronary heart disease; and `famhist` indicates whether or not the patient has a family history of heart disease.

```r
library(ElemStatLearn)
kable(head(SAheart))
```

| sbp | tobacco | ldl  | adiposity | famhist | typea | obesity | alcohol | age | chd |
|----:|--------:|-----:|----------:|:--------|------:|--------:|--------:|----:|----:|
| 160 | 12.00   | 5.73 | 23.11     | Present | 49    | 25.30   | 97.20   | 52  | 1   |
| 144 | 0.01    | 4.41 | 28.61     | Absent  | 55    | 28.87   | 2.06    | 63  | 1   |
| 118 | 0.08    | 3.48 | 32.28     | Present | 52    | 29.14   | 3.81    | 46  | 0   |
| 170 | 7.50    | 6.41 | 38.03     | Present | 51    | 31.99   | 24.26   | 58  | 1   |
| 134 | 13.60   | 3.50 | 27.78     | Present | 60    | 25.99   | 57.34   | 49  | 1   |
| 132 | 6.20    | 6.47 | 36.21     | Present | 62    | 30.77   | 14.14   | 45  | 0   |
Recommend
More recommend