The Strucplot Framework for Visualizing Categorical Data

SLIDE 1

Outline: Introduction, Basic techniques, Highlighting and shading, Visualizing test statistics, Multiway tables, Conclusion

The Strucplot Framework for Visualizing Categorical Data

David Meyer (1), Achim Zeileis (2) and Kurt Hornik (2)

(1) Department of Information Systems and Operations
(2) Department of Statistics and Mathematics

Wirtschaftsuniversität Wien

Dortmund, useR! 2008

SLIDE 2

Introduction

This talk is about statistical graphics: Visualizing Categorical Data using the vcd package. (Motivation: the VCD book for SAS by Michael Friendly.)

vcd includes tools for fitting discrete distributions, manipulating two- and higher-dimensional "flat" tables, computing test statistics, and creating plots that support both exploratory analysis and inference. It also ships with many data sets.

The talk focuses on the "strucplot" framework in vcd, which supports the flexible creation of mosaic, association, and sieve plots and their variants. It starts with exploratory techniques for two-way tables, discusses highlighting and shading techniques, links these with inference methods, and concludes with some methods for higher-dimensional data.

SLIDE 3

The Arthritis data (Koch and Edwards, 1988)

Results from a double-blind clinical trial among 84 patients investigating a new treatment for rheumatoid arthritis, stratified by age and gender. (In this talk, we ignore age.)

                        Improvement
Gender   Treatment   None   Some   Marked
Female   Placebo       19      7        6
Female   Treated        6      5       16
Male     Placebo       10      0        1
Male     Treated        7      2        5

We start with the results for female patients (two-way data).

SLIDE 4

Visualize this with ... a barplot (?)

[Figure: barplot of frequencies by Improvement (None, Some, Marked) for the Treated and Placebo groups.]

SLIDE 5

... a 3D-barplot (?!?)

[Figure: 3D barplot with axes Treatment (Placebo, Treated), Improvement (None, Some, Marked), and Number of patients.]

SLIDE 6

Mosaic of observed frequencies (1)

[Figure: mosaic plot split by Improvement (None, Some, Marked).]

SLIDE 7

Mosaic of observed frequencies (2)

[Figure: mosaic plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked).]

SLIDE 8

Mosaic of observed frequencies: alternative splitting

[Figure: mosaic plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked), with the split order changed.]

SLIDE 9

Mosaic of expected frequencies

[Figure: mosaic plot of the expected frequencies for Treatment (Treated, Placebo) by Improvement (None, Some, Marked).]
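
The expected frequencies behind this mosaic can be sketched numerically. The block below is a minimal illustration in Python with NumPy, not vcd's R interface; it assumes that "expected" means expected under independence of Treatment and Improvement, and the function name `expected_under_independence` is a hypothetical helper introduced for this example.

```python
import numpy as np

# Observed counts for the female patients from the Arthritis table
# (rows: Placebo, Treated; columns: None, Some, Marked).
observed = np.array([[19, 7, 6],
                     [6, 5, 16]], dtype=float)


def expected_under_independence(table):
    """Hypothetical helper: n_i+ * n_+j / n_++ for a two-way table."""
    row_totals = table.sum(axis=1, keepdims=True)   # shape (rows, 1)
    col_totals = table.sum(axis=0, keepdims=True)   # shape (1, cols)
    return row_totals @ col_totals / table.sum()


expected = expected_under_independence(observed)
print(np.round(expected, 2))
```

The expected table keeps the observed row and column totals, which is why all tiles in the mosaic of expected frequencies line up in a regular grid.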

SLIDE 10

Parquet (sieve) diagram

[Figure: sieve diagram of Treatment (Treated, Placebo) by Improvement (None, Some, Marked); the cells are labeled with the observed frequencies 19, 6, 7, 5, 6, 16.]

SLIDE 11

Association plot

Pearson residuals $r_{ij}$: standardized deviations of observed ($n_{ij}$) from expected ($\hat{n}_{ij}$) frequencies:

$r_{ij} = \frac{n_{ij} - \hat{n}_{ij}}{\sqrt{\hat{n}_{ij}}}$

[Figure: association plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked).]
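
As a worked example of the residual definition, the following Python/NumPy sketch computes the Pearson residuals for the female patients. It is an illustration only, not vcd code; `pearson_residuals` is a hypothetical helper name, and the expected frequencies are assumed to be those under independence.

```python
import numpy as np

# Female patients (rows: Placebo, Treated; columns: None, Some, Marked).
observed = np.array([[19, 7, 6],
                     [6, 5, 16]], dtype=float)


def pearson_residuals(table):
    """Hypothetical helper: (observed - expected) / sqrt(expected)."""
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return (table - expected) / np.sqrt(expected)


residuals = pearson_residuals(observed)
print(np.round(residuals, 2))
```

The smallest and largest residuals round to -1.72 (Placebo, Marked) and 1.87 (Treated, Marked), the end points of the legends shown on the shaded plots later in the talk.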

SLIDE 12

Highlighting

Mark the improvement levels:

[Figure: mosaic plot of Treatment (Treated, Placebo) by Improvement, with the levels None, Some, and Marked highlighted.]

SLIDE 13

Spine plot

Rotating the highlighted mosaic clockwise yields a spine plot. (Similar to a barplot, but frequencies are shown by bar widths.)

[Figure: spine plot of Improvement (Marked, Some, None) by Treatment (Treated, Placebo); the vertical axis runs from 0 to 1.]
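
The geometry of a spine plot can be sketched as follows, again in Python with NumPy rather than vcd. The sketch assumes that the bar widths are the marginal proportions of the treatment groups and that each bar is divided according to the conditional proportions of Improvement; the variable names are chosen for this example only.

```python
import numpy as np

# Female patients (rows: Placebo, Treated; columns: None, Some, Marked).
observed = np.array([[19, 7, 6],
                     [6, 5, 16]], dtype=float)

# Bar widths: share of each treatment group among all female patients.
bar_widths = observed.sum(axis=1) / observed.sum()

# Segment heights: distribution of Improvement within each treatment group.
segment_heights = observed / observed.sum(axis=1, keepdims=True)

# Cumulative heights give the positions of the segment boundaries on [0, 1].
segment_boundaries = np.cumsum(segment_heights, axis=1)

print(np.round(bar_widths, 3))
print(np.round(segment_heights, 3))
```

Because every bar has total height 1, the groups can be compared by their conditional distributions, while the widths still show how many patients each group contains.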

SLIDE 14

Friendly’s residual-based shading

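Idea: extend the mosaic plot by adding information on Pearson residuals through color-coding.

[Figure: mosaic plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked) with residual-based shading. Legend: Pearson residuals from -1.72 to 1.87, with breaks at -1.00, 0.00, and 1.00; p-value = 0.0032.]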

SLIDE 15

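Association plot with shading

[Figure: association plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked) with residual-based shading. Legend: Pearson residuals from -1.72 to 1.87, with breaks at -1.00, 0.00, and 1.00; p-value = 0.0032.]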

SLIDE 16

Sieveplot with shading

[Figure: sieve plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked) with shading.]

SLIDE 17

Choice of the cutoff points

Friendly wanted to show "patterns of deviation" only. Any ad-hoc choice of cutoff points can lead to wrong conclusions:

  • Colored cells do not necessarily indicate a significant χ² test.
  • The χ² test can be significant without any colored cell.

Reason: the cutoff points for given significance levels depend on the data.
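
The second case occurs in the female Arthritis table itself, as the following Python/NumPy sketch suggests. It is an illustration under stated assumptions, not vcd code: the fixed cutoff 2 is Friendly's first default cutoff, a cell counts as "colored" when its absolute residual exceeds that cutoff, and the p-value uses the asymptotic χ² distribution, whose survival function for 2 degrees of freedom is exp(-x/2).

```python
import math

import numpy as np

# Female patients (rows: Placebo, Treated; columns: None, Some, Marked).
observed = np.array([[19, 7, 6],
                     [6, 5, 16]], dtype=float)

expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
residuals = (observed - expected) / np.sqrt(expected)

# Chi-square statistic as the sum of squared Pearson residuals.
chi_square = float((residuals ** 2).sum())

# A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom; only for
# df = 2 does the chi-square survival function reduce to exp(-x / 2).
degrees_of_freedom = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = math.exp(-chi_square / 2)

# Assumption: a cell is colored when |residual| exceeds the fixed cutoff 2.
colored = np.abs(residuals) > 2.0

print(round(chi_square, 2), degrees_of_freedom, colored.any())
```

Under these assumptions the test is significant at the 1% level, yet no residual reaches the fixed cutoff, so no cell would be colored.
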

SLIDE 18

Again: Mosaic for the Arthritis data

Visualization of the χ² statistic with Friendly's default cutoff points (2, 4):

[Figure: mosaic plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked). Legend: Pearson residuals from -1.72 to 1.87, with a single break at 0.00; p-value = 0.0032.]

SLIDE 19

The maximum statistic

Wanted: a one-to-one correspondence between visualization and test, i.e., significance if and only if at least one cell is colored.

The χ² statistic does not do this:

$X^2 = \sum_{i,j} r_{ij}^2$

But we can aggregate the residuals with functionals other than the sum of squares, e.g. the maximum:

$M = \max_{i,j} |r_{ij}|$

This is the only test statistic with the desired properties. The distribution under the null can be obtained through simulation (permutation test).
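
A permutation test for the maximum statistic can be sketched as follows. This is a minimal Python/NumPy illustration, not the implementation used in vcd: the helper names, the number of permutations, the seed, and the "+1" correction in the p-value are all assumptions made for this example. The idea is to expand the table into one record per patient, shuffle the Improvement labels (which keeps both margins fixed), and recompute M each time.

```python
import numpy as np

# Female patients (rows: Placebo, Treated; columns: None, Some, Marked).
observed = np.array([[19, 7, 6],
                     [6, 5, 16]])


def max_abs_residual(table):
    """Hypothetical helper: M = max |r_ij| under independence."""
    table = table.astype(float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(np.abs((table - expected) / np.sqrt(expected)).max())


def permutation_test(table, n_permutations=2000, seed=0):
    """Hypothetical helper: permutation p-value for the maximum statistic."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = table.shape
    # One (row label, column label) pair per patient.
    row_idx, col_idx = np.indices(table.shape)
    row_labels = np.repeat(row_idx.ravel(), table.ravel())
    col_labels = np.repeat(col_idx.ravel(), table.ravel())
    observed_stat = max_abs_residual(table)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(col_labels)
        # Re-tabulate the shuffled data into a table of the same shape.
        cells = np.bincount(row_labels * n_cols + shuffled,
                            minlength=n_rows * n_cols)
        if max_abs_residual(cells.reshape(n_rows, n_cols)) >= observed_stat:
            exceed += 1
    # Counting the observed table itself avoids a p-value of exactly 0.
    return observed_stat, (exceed + 1) / (n_permutations + 1)


max_stat, p_value = permutation_test(observed)
print(round(max_stat, 2), p_value)
```

Quantiles of the simulated values of M could serve as data-driven cutoff points: a cell whose absolute residual exceeds the (1 - α) quantile is colored exactly when the test rejects at level α.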

SLIDE 20

Mosaic diagram for the Arthritis data

Visualization of the maximum statistic with data-driven cutoff points (for levels 10% and 1%):

[Figure: mosaic plot of Treatment (Treated, Placebo) by Improvement (None, Some, Marked). Legend: Pearson residuals from -1.72 to 1.87, with breaks at -1.24, 0.00, 1.24, and 1.64; p-value = 0.0096.]

SLIDE 21

A doubledecker diagram

[Figure: doubledecker plot of Improved (Marked, Some, None) by Gender (Female, Male) and Treatment (Placebo, Treated).]

SLIDE 22

A mosaic plot for conditional independence

[Figure: mosaic plot of Gender (Female, Male), Treatment (Treated, Placebo), and Improved (None, Some, Marked). Legend: Pearson residuals from -1.72 to 1.87, with breaks at -1.45, 0.00, and 1.45; p-value = 0.0142.]
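
The residuals behind such a plot can be sketched in Python with NumPy (an illustration only, not vcd code). The sketch assumes the hypothesis is conditional independence of Treatment and Improvement given Gender, so that expected frequencies are computed separately within each gender stratum; the helper name is hypothetical. The Male/Placebo/Some count of 0 is implied by the stated total of 84 patients.

```python
import numpy as np

# counts[gender, treatment, improvement]
# gender: Female, Male; treatment: Placebo, Treated;
# improvement: None, Some, Marked.
counts = np.array([[[19, 7, 6], [6, 5, 16]],
                   [[10, 0, 1], [7, 2, 5]]], dtype=float)


def conditional_independence_residuals(table):
    """Hypothetical helper: Pearson residuals with expected frequencies
    n_i+k * n_+jk / n_++k computed within each stratum k (first axis)."""
    treatment_totals = table.sum(axis=2, keepdims=True)      # n_i+k
    improvement_totals = table.sum(axis=1, keepdims=True)    # n_+jk
    stratum_totals = table.sum(axis=(1, 2), keepdims=True)   # n_++k
    expected = treatment_totals * improvement_totals / stratum_totals
    return (table - expected) / np.sqrt(expected)


residuals = conditional_independence_residuals(counts)
print(np.round(residuals, 2))
```

The extreme residuals again come from the female stratum and round to -1.72 and 1.87, which agrees with the range shown in the legend.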

SLIDE 23

A conditional mosaic diagram

If the conditioning variables have unbalanced frequencies, the resulting strata can become distorted. Solution: a trellis layout.

[Figure: trellis of mosaic plots with one panel for Gender = Female and one for Gender = Male, each showing Treatment (Treated, Placebo) by Improvement (None, Some, Marked).]

SLIDE 24

A conditional association diagram

[Figure: trellis of association plots with one panel for Gender = Female and one for Gender = Male, each showing Treatment (Treated, Placebo) by Improvement (None, Some, Marked).]

SLIDE 25

Conclusion

  • The strucplot framework includes visualization techniques like mosaic, sieve, and association diagrams (and variants thereof).
  • They can be used for both exploratory and modeling tasks.
  • Many features would not exist without the grid graphics engine. (Thanks, Paul [Murrell]!)
  • The framework integrates several different plots, which share some customizable graphical aspects: split directions, spacing, labeling, shading, legend, and content of the tiles.
  • The resulting set of graphical parameters is enormous. Therefore, in developing the package, modularization was key!
  • The useRs' benefit is a flexible framework that can be further adapted and extended.

SLIDE 26

References

Zeileis A, Meyer D, Hornik K (2007). Residual-based Shadings for Visualizing (Conditional) Independence. Journal of Computational and Graphical Statistics, 16(3), pp. 507–525.

Meyer D, Zeileis A, Hornik K (2006). The Strucplot Framework: Visualizing Multi-way Contingency Tables with vcd. Journal of Statistical Software, 17(3), pp. 1–48.

Meyer D, Zeileis A, Hornik K (2008). vcd: Visualizing Categorical Data. R package version 1.0-9.

e-mail: Firstname.Lastname@R-Project.org