Introduction Basic techniques Highlighting and shading Visualizing test statistics Multiway tables Conclusion

The Strucplot Framework for Visualizing Categorical Data
David Meyer (1), Achim Zeileis (2) and Kurt Hornik (2)
(1) Department of Information Systems and Operations
(2) Department of Statistics and Mathematics
Wirtschaftsuniversität Wien
Dortmund, useR! 2008

Introduction

This talk is about statistical graphics: Visualizing Categorical Data using the vcd package. (Motivation: VCD book for SAS by Michael Friendly.)

vcd includes tools for fitting discrete distributions, manipulating two- and higher-dimensional "flat" tables, computing test statistics, and creating plots supporting both exploratory analysis and inference. It also ships many data sets.

The talk focuses on the "strucplot" framework in vcd, which supports the creation of (variants of) mosaic, association, and sieve plots in a flexible way. It starts with exploratory techniques for two-way tables, discusses highlighting and shading techniques, links these with inference methods, and concludes with some methods for higher-dimensional data.

The Arthritis data (Koch and Edwards, 1988)

Results from a double-blind clinical trial among 84 patients investigating a new treatment for rheumatoid arthritis, stratified by age and gender. (In this talk, we ignore age.)

                        Improvement
Gender   Treatment      None   Some   Marked
Female   Placebo          19      7        6
         Treatment         6      5       16
Male     Placebo          10      0        1
         Treatment         7      2        5

We start with the results for female patients (two-way data).

Visualize this with ... a barplot (?)

[Figure: grouped barplot of patient counts by Improvement (None, Some, Marked) for the Treated and Placebo groups.]

... a 3D-barplot (?!?)

[Figure: 3D barplot of the number of patients by Improvement (None, Some, Marked) and Treatment (Treated, Placebo).]

Mosaic of observed frequencies (1)

[Figure: mosaic plot split by Improvement (None, Some, Marked).]

Mosaic of observed frequencies (2)

[Figure: mosaic plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked).]

Mosaic of observed frequencies: alternative splitting

[Figure: mosaic plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked), with the split directions changed.]

Mosaic of expected frequencies

[Figure: mosaic plot of the expected frequencies for Treatment (Placebo, Treated) by Improvement (None, Some, Marked).]

Parquet-(Sieve-)diagram

[Figure: sieve diagram of Treatment (Placebo, Treated) by Improvement (None, Some, Marked), labeled with the observed counts 19, 7, 6 (Placebo) and 6, 5, 16 (Treated).]

Association plot

Pearson residuals $r_{ij}$: standardized deviations of observed ($n_{ij}$) from expected ($\hat{n}_{ij}$) frequencies:

$$r_{ij} = \frac{n_{ij} - \hat{n}_{ij}}{\sqrt{\hat{n}_{ij}}}$$

[Figure: association plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked).]
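
The residual formula can be checked by hand on the female subtable. The following is a minimal Python sketch, not vcd code: the function names are made up for illustration, and the expected frequencies are assumed to come from the usual independence model (row total times column total divided by the grand total), which the slides do not spell out.

```python
from math import sqrt

# Female patients from the Arthritis table
# (rows: Placebo, Treated; columns: None, Some, Marked).
observed = [[19, 7, 6],
            [6, 5, 16]]

def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_residuals(table):
    """r_ij = (n_ij - nhat_ij) / sqrt(nhat_ij) for every cell."""
    expected = expected_frequencies(table)
    return [[(n - e) / sqrt(e) for n, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]

residuals = pearson_residuals(observed)
for label, row in zip(["Placebo", "Treated"], residuals):
    print(label, [round(r, 2) for r in row])
```

The largest and smallest residuals land at about 1.87 and −1.72, the same extremes that appear in the legends of the shaded plots later in the talk.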

Highlighting

Mark the improvement levels:

[Figure: mosaic plot of Treatment (Placebo, Treated) with the Improvement levels (None, Some, Marked) highlighted.]

Spine plot

Turning the highlighted mosaic clockwise yields a spine plot. (Similar to a barplot, but frequencies are shown by bar widths.)

[Figure: spine plot of Improvement (None, Some, Marked) against Treatment (Treated, Placebo); vertical axis from 0 to 1.]
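
To make the geometry concrete, here is a small Python sketch of the quantities a spine plot draws for the female subtable. It assumes the usual construction (bar width proportional to the group's share of all patients, segment heights equal to the conditional proportions within the group); the dictionary layout is only illustrative.

```python
# Female patients: counts of Improvement (None, Some, Marked) per group.
observed = {"Placebo": [19, 7, 6], "Treated": [6, 5, 16]}
levels = ["None", "Some", "Marked"]

n = sum(sum(counts) for counts in observed.values())

spine = {}
for group, counts in observed.items():
    total = sum(counts)
    spine[group] = {
        # Bar width: share of all patients that fall into this group.
        "width": total / n,
        # Segment heights: conditional proportions of each improvement level.
        "heights": {level: c / total for level, c in zip(levels, counts)},
    }

for group, geometry in spine.items():
    heights = {k: round(v, 2) for k, v in geometry["heights"].items()}
    print(group, round(geometry["width"], 2), heights)
```

The widths sum to 1 across groups and the heights sum to 1 within each bar, which is why the vertical axis of the spine plot runs from 0 to 1.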

Friendly's residual-based shading

Idea: extend the mosaic plot by adding information on Pearson residuals through color-coding.

[Figure: shaded mosaic plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked). Legend: Pearson residuals from −1.72 to 1.87 with ticks at −1.00, 0.00 and 1.00; p-value = 0.0032.]

Association plot with shading

[Figure: shaded association plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked). Legend: Pearson residuals from −1.72 to 1.87 with ticks at −1.00, 0.00 and 1.00; p-value = 0.0032.]

Sieve plot with shading

[Figure: shaded sieve plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked).]

Choice of the cutoff points

Friendly wanted to show "patterns of deviation" only.

Any ad-hoc choice can lead to wrong conclusions:
  Colored cells do not necessarily indicate a significant $\chi^2$ test.
  The $\chi^2$ test can be significant without any colored cell.

Reason: the cutoff points for given significance levels depend on the data.

Again: Mosaic for the Arthritis data

Visualization of the $\chi^2$ statistic with Friendly's default cutoff points (2, 4):

[Figure: mosaic plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked). Legend: Pearson residuals from −1.72 to 1.87; p-value = 0.0032.]
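
The mismatch between fixed cutoffs and the test can be reproduced numerically for the female subtable. This Python sketch is an illustration under stated assumptions: it uses the asymptotic $\chi^2$ reference distribution with 2 degrees of freedom, for which the 5% critical value is exactly $-2 \ln 0.05$, and it may not match the way the p-value shown on the slide was obtained.

```python
from math import log, sqrt

# Female patients (rows: Placebo, Treated; columns: None, Some, Marked).
observed = [[19, 7, 6],
            [6, 5, 16]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson residuals under independence, flattened to one list.
residuals = [
    (count - r * c / n) / sqrt(r * c / n)
    for row, r in zip(observed, row_totals)
    for count, c in zip(row, col_totals)
]

x2 = sum(res ** 2 for res in residuals)

# For 2 degrees of freedom the chi-square upper tail is exp(-x / 2),
# so the 5% critical value is -2 * ln(0.05), roughly 5.99.
critical_5pct = -2 * log(0.05)

# Cells that Friendly's first fixed cutoff (|r| > 2) would color.
colored = [res for res in residuals if abs(res) > 2]

print(round(x2, 2), x2 > critical_5pct, len(colored))
```

Here the statistic clearly exceeds the critical value, yet no residual reaches 2 in absolute value, so the fixed cutoffs leave every tile uncolored.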

The maximum statistic

Wanted: a one-to-one correspondence between visualization and test, i.e., significance iff at least one cell is colored.

The $\chi^2$ statistic does not do this:

$$X^2 = \sum_{i,j} r_{ij}^2$$

But we can use functionals other than the sum of squares to aggregate the residuals, e.g. the maximum:

$$M = \max_{i,j} |r_{ij}|$$

This is the only test statistic with the desired properties.

The distribution under the null can be obtained through simulation (permutation test).
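
A permutation test for the maximum statistic can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the vcd implementation: the function names, the number of permutations, the seed, and the "+1" correction in the p-value are all assumptions made for the example, so the estimate will differ slightly from any value reported on the slides.

```python
import random
from math import sqrt

def max_abs_residual(table):
    """M = max |r_ij| with r_ij = (n_ij - nhat_ij) / sqrt(nhat_ij)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return max(
        abs(count - r * c / n) / sqrt(r * c / n)
        for row, r in zip(table, row_totals)
        for count, c in zip(row, col_totals)
    )

def permutation_p_value(table, n_perm=2000, seed=1):
    """Estimate P(M* >= M_obs) by shuffling column labels across patients."""
    rng = random.Random(seed)
    # One (row label, column label) pair per patient, in matching order.
    rows = [i for i, row in enumerate(table) for count in row for _ in range(count)]
    cols = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    m_obs = max_abs_residual(table)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(cols)  # breaks any association, keeps both margins fixed
        permuted = [[0] * len(table[0]) for _ in table]
        for i, j in zip(rows, cols):
            permuted[i][j] += 1
        if max_abs_residual(permuted) >= m_obs - 1e-12:
            exceed += 1
    return m_obs, (exceed + 1) / (n_perm + 1)

# Female patients (rows: Placebo, Treated; columns: None, Some, Marked).
observed = [[19, 7, 6],
            [6, 5, 16]]
m_obs, p_value = permutation_p_value(observed)
print(round(m_obs, 2), p_value)
```

Because shuffling keeps the row and column totals fixed, the expected frequencies are the same in every permuted table and only the cell counts change.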

Mosaic diagram for the Arthritis data

Visualization of the maximum statistic with data-driven cutoff points (for levels 10% and 1%):

[Figure: shaded mosaic plot of Treatment (Placebo, Treated) by Improvement (None, Some, Marked). Legend ticks for the Pearson residuals: 1.87, 1.64, 1.24, 0.00, −1.24, −1.72; p-value = 0.0096.]
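
One way to obtain data-driven cutoffs is to take upper quantiles of the simulated null distribution of $M$. The Python sketch below assumes exactly that (the 90% and 99% empirical quantiles for the 10% and 1% levels); the helper names, permutation count, and seed are illustrative, and the simulated cutoffs will only approximate the values shown in the legend.

```python
import random
from math import ceil, sqrt

def pearson_residuals(table):
    """Pearson residuals under independence, same shape as the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[(count - r * c / n) / sqrt(r * c / n)
             for count, c in zip(row, col_totals)]
            for row, r in zip(table, row_totals)]

def null_max_statistics(table, n_perm=2000, seed=1):
    """Simulate M = max |r_ij| under the null by permuting column labels."""
    rng = random.Random(seed)
    rows = [i for i, row in enumerate(table) for count in row for _ in range(count)]
    cols = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(cols)
        permuted = [[0] * len(table[0]) for _ in table]
        for i, j in zip(rows, cols):
            permuted[i][j] += 1
        simulated.append(max(abs(r) for row in pearson_residuals(permuted) for r in row))
    return simulated

def empirical_quantile(values, q):
    """Smallest simulated value with at least a fraction q of values at or below it."""
    ordered = sorted(values)
    return ordered[ceil(q * len(ordered)) - 1]

# Female patients (rows: Placebo, Treated; columns: None, Some, Marked).
observed = [[19, 7, 6],
            [6, 5, 16]]
null_ms = null_max_statistics(observed)
cutoffs = (empirical_quantile(null_ms, 0.90), empirical_quantile(null_ms, 0.99))

# Shading level per cell: 0 = uncolored, 1 = beyond the 10% cutoff, 2 = beyond the 1% cutoff.
shading = [[sum(abs(r) > c for c in cutoffs) for r in row]
           for row in pearson_residuals(observed)]
print([round(c, 2) for c in cutoffs], shading)
```

With cutoffs built this way, a cell is colored at a given level only if its residual alone is large enough to make the maximum test significant at that level, which is the correspondence the previous slide asks for.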

A doubledecker diagram

[Figure: doubledecker plot of Improved (None, Some, Marked) by Treatment (Placebo, Treated) within Gender (Female, Male); a bullet marks an empty cell.]

A mosaic plot for conditional independence

[Figure: shaded mosaic plot of Treatment (Placebo, Treated) by Improved (None, Some, Marked) within Gender (Female, Male); a bullet marks an empty cell. Legend ticks for the Pearson residuals: 1.87, 1.45, 0.00, −1.45, −1.72; p-value = 0.0142.]
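
Residuals for conditional independence can be sketched by fitting independence separately within each gender stratum. The Python example below assumes that model (Treatment independent of Improved given Gender); the names are illustrative and it does not reproduce the permutation-based cutoffs or the p-value shown in the legend.

```python
from math import sqrt

# Arthritis data by gender (rows: Placebo, Treated; columns: None, Some, Marked).
strata = {
    "Female": [[19, 7, 6], [6, 5, 16]],
    "Male": [[10, 0, 1], [7, 2, 5]],
}

def pearson_residuals(table):
    """Pearson residuals under independence within a single stratum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[(count - r * c / n) / sqrt(r * c / n)
             for count, c in zip(row, col_totals)]
            for row, r in zip(table, row_totals)]

# Conditional independence: expected counts use the margins of each stratum only.
conditional_residuals = {gender: pearson_residuals(table)
                         for gender, table in strata.items()}

all_residuals = [r for rows in conditional_residuals.values()
                 for row in rows for r in row]
for gender, rows in conditional_residuals.items():
    print(gender, [[round(r, 2) for r in row] for row in rows])
print(round(min(all_residuals), 2), round(max(all_residuals), 2))
```

The extreme residuals come from the female stratum and agree with the ends of the legend (about −1.72 and 1.87); the male stratum, including its empty cell, stays well inside that range.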

A conditional mosaic diagram

If the conditioning variables have unbalanced frequencies, the resulting strata can become distorted.

Solution: a trellis layout.

[Figure: two mosaic panels, Gender = Female and Gender = Male, each showing Treatment (Placebo, Treated) by Improvement (None, Some, Marked); a bullet marks an empty cell.]

A conditional association diagram

[Figure: two association-plot panels, Gender = Female and Gender = Male, each showing Treatment (Placebo, Treated) by Improvement (None, Some, Marked).]

Conclusion

The strucplot framework includes visualization techniques like mosaic, sieve and association diagrams (and variants thereof). They can be used for both exploratory and modeling tasks.

Many features would not exist without the grid graphics engine. (Thanks, Paul [Murrell]!)

The framework integrates several different plots, which share some customizable graphical aspects: split directions, spacing, labeling, shading, legend, and content of the tiles.

The resulting set of graphical parameters is enormous. Therefore, in developing the package, modularization was key!

The useRs' benefit is a flexible framework that can be further adapted and extended.

References

Zeileis A, Meyer D, Hornik K (2007). Residual-based Shadings for Visualizing (Conditional) Independence. Journal of Computational and Graphical Statistics, 16(3), 507–525.

Meyer D, Zeileis A, Hornik K (2006). The Strucplot Framework: Visualizing Multi-way Contingency Tables with vcd. Journal of Statistical Software, 17(3), 1–48.

Meyer D, Zeileis A, Hornik K (2008). vcd: Visualizing Categorical Data. R package version 1.0-9.

e-mail: Firstname.Lastname@R-Project.org