Workshop 5.2: The Grammar of Graphics Murray Logan 16 Jul 2017
Section 1 Graphics in R
Options
• Traditional (base) graphics
  ◦ isolated instructions to the device
• Grid graphics
  ◦ instruction sets
  ◦ lattice
  ◦ ggplot2
Packages
> library(ggplot2)
> library(grid)
> library(gridExtra)
> library(scales)
Graphics infrastructure
• layers of data-driven objects
• coordinate system
• scales
• faceting
• themes
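As a rough illustration of how the components listed above fit together, the sketch below builds one plot in which each line contributes one component. It assumes the built-in CO2 data set (summarised later in this workshop); the choice of facet_wrap() and theme_bw() is only an example, not something prescribed by the slides.

```r
library(ggplot2)

# Each line below supplies one component of the graphics infrastructure
p <- ggplot(data = CO2, mapping = aes(x = conc, y = uptake)) +
  geom_point() +                        # layer: data drawn as points
  coord_cartesian() +                   # coordinate system
  scale_x_continuous(name = "conc") +   # scale for the x axis
  facet_wrap(~Type) +                   # faceting: one panel per Type
  theme_bw()                            # theme: non-data appearance
p  # print the plot
```

Dropping any one of the last four lines still gives a valid plot, because ggplot2 falls back to a default for that component.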
ggplot
> head(BOD)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
> summary(BOD)
      Time           demand
 Min.   :1.000   Min.   : 8.30
 1st Qu.:2.250   1st Qu.:11.62
 Median :3.500   Median :15.80
 Mean   :3.667   Mean   :14.83
 3rd Qu.:4.750   3rd Qu.:18.25
 Max.   :7.000   Max.   :19.80
ggplot
> p <- ggplot() +
+   #single layer - points
+   layer(data=BOD, #data.frame
+     mapping=aes(y=demand,x=Time),
+     stat="identity", #use original data
+     geom="point", #plot data as points
+     position="identity",
+     params = list(na.rm = TRUE),
+     show.legend = FALSE
+   )+ #layer of lines
+   layer(data=BOD, #data.frame
+     mapping=aes(y=demand,x=Time),
+     stat="identity", #use original data
+     geom="line", #plot data as a line
+     position="identity",
+     params = list(na.rm = TRUE),
+     show.legend = FALSE
+   ) +
+   coord_cartesian() + #cartesian coordinates
+   scale_x_continuous() + #continuous x axis
+   scale_y_continuous() #continuous y axis
> p #print the plot
ggplot
[Figure: demand (y axis) against Time (x axis), BOD data]
ggplot
> ggplot(data=BOD, mapping=aes(y=demand,x=Time)) + geom_point() + geom_line()
[Figure: demand (y axis) against Time (x axis), BOD data]
Overview
• data
• layers (geoms)
> p <- ggplot(data=BOD)
> p <- p + geom_point(aes(y=demand, x=Time))
> p
[Figure: demand (y axis) against Time (x axis), BOD data as points]
Overview
• data
• layers (geoms)
• scales
> p <- ggplot(data=BOD)
> p <- p + geom_point(aes(y=demand, x=Time))
> p <- p + scale_x_sqrt(name="Time")
> p
[Figure: demand (y axis) against Time (x axis, square-root scale), BOD data as points]
Section 2 Layers
Layers
• layers of data-driven objects
  ◦ geometric objects to represent data
  ◦ statistical methods to summarize the data
  ◦ mapping of aesthetics
  ◦ position control
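geom_ and stat_
• coupled together: every layer has both a geom_ and a stat_
• a layer can be engaged through either function
• stat_identity - uses the original data unchanged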
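A minimal sketch of the coupling described above, using the BOD data from Section 1. The same layer is requested once through its geom (naming the stat) and once through its stat (naming the geom). The object names p_geom and p_stat are illustrative only.

```r
library(ggplot2)

# Engage the layer through its geom, naming the stat explicitly
p_geom <- ggplot(BOD, aes(y = demand, x = Time)) +
  geom_point(stat = "identity")

# Engage the same layer through its stat, naming the geom explicitly
p_stat <- ggplot(BOD, aes(y = demand, x = Time)) +
  stat_identity(geom = "point")

# Both plots should show the same six points
p_geom
p_stat
```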
geom_
• data - the data frame
• mapping - aesthetics
  (data and mapping are inherited from ggplot() if omitted)
• stat - the stat_ function
• position - how overlapping geoms are arranged
geom_
> ggplot(data=BOD) + geom_point(aes(y=demand, x=Time))
> #OR
> ggplot(data=BOD, aes(y=demand, x=Time)) + geom_point()
[Figure: demand (y axis) against Time (x axis), BOD data as points]
Optional mapping
• alpha - transparency
• colour - colour of the geometric features
• fill - fill colour of geometric features
• linetype - type of line (e.g. solid, dashed)
• size - size of geometric features such as points or text
• shape - shape of geometric features such as points
• weight - weightings of values
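The following sketch, assuming the built-in CO2 data set, shows the difference between mapping an aesthetic to a variable (inside aes()) and setting it to a constant (outside aes()). The particular values for size and alpha are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(CO2) +
  geom_point(
    aes(x = conc, y = uptake,
        colour = Type,        # mapped: colour varies with Type
        shape = Treatment),   # mapped: shape varies with Treatment
    size = 3,                 # set: same size for every point
    alpha = 0.5               # set: same transparency for every point
  )
p
```

Mapped aesthetics generate a legend; aesthetics set to a constant do not.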
geom_point
> head(CO2)
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
6   Qn1 Quebec nonchilled  675   39.2
> summary(CO2)
     Plant             Type         Treatment       conc          uptake
 Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70
 Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90
 Qn3    : 7                                    Median : 350   Median :28.30
 Qc1    : 7                                    Mean   : 435   Mean   :27.21
 Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12
 Qc2    : 7                                    Max.   :1000   Max.   :45.50
 (Other):42
geom_point
> ggplot(CO2) + geom_point(aes(x=conc,y=uptake), colour="red")
[Figure: scatterplot of uptake (y axis) against conc (x axis)]
geom_point
> ggplot(CO2) + geom_point(aes(x=conc,y=uptake, colour=Type))
[Figure: scatterplot of uptake (y axis) against conc (x axis); legend: Type (Quebec, Mississippi)]
geom_point
> ggplot(CO2) + geom_point(aes(x=conc,y=uptake),
+   stat="summary", fun=mean) #fun.y=mean before ggplot2 3.3.0
[Figure: uptake (y axis) against conc (x axis), one point per conc value]
Example data sets
> head(diamonds)
# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4  0.29 Premium   I     VS2      62.4    58   334  4.20  4.23  2.63
5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
> summary(diamonds)
     carat               cut        color        clarity          depth
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50
 Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00
                                    J: 2808   (Other): 2531
     table           price             x                y
 Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000
 1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720
 Median :57.00   Median : 2401   Median : 5.700   Median : 5.710
 Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735
 3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540
 Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900
       z
 Min.   : 0.000
 1st Qu.: 2.910
 Median : 3.530
 Mean   : 3.539
 3rd Qu.: 4.040
 Max.   :31.800
Section 3 Primary geometric objects
geom_bar
Feature    geom  stat  position
Histogram  _bar  _bin  stack
> ggplot(diamonds) + geom_bar(aes(x = carat), stat = "bin") #geom_bar() defaults to stat="count"
[Figure: count (y axis) against carat (x axis)]
geom_bar
Feature   geom  stat    position
Barchart  _bar  _count  stack
> ggplot(diamonds) + geom_bar(aes(x = cut))
[Figure: count (y axis) against cut (x axis: Fair, Good, Very Good, Premium, Ideal)]
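To make the statistic behind the bar heights concrete, the sketch below tallies the cut categories directly with table() and compares them with the values ggplot2 computes for the bar chart. The use of layer_data() to inspect the computed values, and the names cut_counts and bar_data, are illustrative choices rather than part of the original slides.

```r
library(ggplot2)

# Number of diamonds in each cut category, tallied directly
cut_counts <- table(diamonds$cut)
cut_counts

# Values computed by the bar layer for the same data
p <- ggplot(diamonds) + geom_bar(aes(x = cut))
bar_data <- layer_data(p)
bar_data$count
```

The two sets of counts should agree with each other and with the cut column of summary(diamonds).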
geom_bar
Feature   geom  stat    position
Barchart  _bar  _count  stack
> ggplot(diamonds) + geom_bar(aes(x = cut, fill = clarity))
[Figure: stacked bars of count (y axis) against cut (x axis); legend: clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF)]
geom_bar
Feature   geom  stat    position
Barchart  _bar  _count  dodge
> ggplot(diamonds) + geom_bar(aes(x = cut, fill = clarity),
+   position='dodge')
[Figure: side-by-side bars of count (y axis) against cut (x axis); legend: clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF)]