Detecting Clusters and Nonlinearity in 3D Dynamic Graphs
John Fox, McMaster University
Robert Stine, University of Pennsylvania
Georges Monette and Nehru Vohra, York University
Interface 2002
1 Introduction

• Three-dimensional dynamic scatterplots have been promoted as
  – geometrical tools for understanding statistical concepts
  – tools for analyzing data
• 3D scatterplots can reveal certain features of data that cannot be apprehended in conventional two-dimensional displays:
  – some kinds of clustering
  – some kinds of nonlinearity (interaction)
• Originally the province of experimental graphical systems, 3D dynamic scatterplots are now found in standard statistical software packages.
• But 3D dynamic scatterplots can be difficult to decode.
• In an experiment on cluster detection, we seek to establish
  – whether data analysts are able to discern clustering in 3D plots
  – if so, whether the probability of detection varies by easily characterized properties of the clusters
  – whether the design of the display is related to cluster detection
• Three further experiments investigate similar issues with respect to the detection of nonlinearity.

2 Design of the Experiments

2.1 Cluster Detection

• Data Generation
  – Subjects viewed displays of rotatable 3D point clouds. The subjects were asked to report whether they saw one or two clusters of points in each display.
  – All data displays included 100 points, either as a single trivariate-normal cluster or as two trivariate-normal clusters with 50 points each, and identical within-cluster covariance matrices but different centroids.
  – Cluster centroids were either displaced diagonally, that is, along the vector (1,1,1)′, or horizontally, that is, parallel to the vector (1,1,0)′.
  – We maintained the same overall covariance matrix for all diagonally displaced displays (including those at ‘zero separation’); a similar procedure was employed for horizontally displaced clusters.
  – When the plot rotates around an axis through the center of the data perpendicular to the horizontal plane, the detection of diagonally displaced clusters depends less critically upon the specific orientation of the point-cloud.
  – Data were generated at three levels of average separation:
    (i) no separation: a single cluster of 100 observations
    (ii) a low level of average separation
    (iii) a medium level of average separation
  – Expected data ellipsoids for the various kinds of clusters employed in the experiment are illustrated in Figure 1.
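The data-generation scheme above can be sketched in a few lines of Python (the experiment itself used Lisp-Stat; Python is substituted here purely for illustration). This is a minimal sketch under stated assumptions, not the authors' procedure: the overall covariance matrix is taken to be the identity, the function names `unit` and `make_display` are hypothetical, and the separation value `d` is an illustrative number because the slides do not report the actual separations. With centroids at ±d·u along a unit direction u, a within-cluster covariance of I − d²uu′ keeps the overall covariance fixed at I for every level of separation.

```python
import math
import random


def unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def make_display(d, direction, n=100, rng=random):
    """Return n points: one cluster if d == 0, otherwise two clusters of n // 2.

    d (hypothetical parameter) is the distance of each centroid from the
    origin along `direction`. It must satisfy 0 <= d < 1 so that the
    within-cluster covariance I - d^2 u u' stays positive definite.
    """
    if not 0 <= d < 1:
        raise ValueError("d must lie in [0, 1)")
    u = unit(direction)
    # (I - c u u') z has covariance I - d^2 u u' when c = 1 - sqrt(1 - d^2).
    c = 1.0 - math.sqrt(1.0 - d * d)
    points = []
    for i in range(n):
        z = [rng.gauss(0, 1) for _ in range(3)]
        proj = sum(zi * ui for zi, ui in zip(z, u))
        sign = 1.0 if i < n // 2 else -1.0  # first half +d, second half -d
        points.append(
            [zi - c * proj * ui + sign * d * ui for zi, ui in zip(z, u)]
        )
    return points


# Diagonal displacement along (1,1,1)'; horizontal displacement along (1,1,0)'.
diagonal = make_display(0.8, (1, 1, 1), rng=random.Random(2002))
single = make_display(0.0, (1, 1, 0), rng=random.Random(2002))
```

Setting `d = 0` yields the single-cluster ("zero separation") display from the same code path, which mirrors the idea of holding the overall covariance constant across separation levels.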
Figure 1. Fifty-percent concentration ellipsoids and planes of separation for diagonally oriented clusters (top) and horizontally oriented clusters (bottom). The level of separation of clusters increases from left to right; the graphs at the far left show single clusters (no separation).

• Display Elements
  – We concentrated on those aspects that seem particularly relevant to perceiving the relative positions of points in 3D space: perspective, depth-cueing, and motion of the display.
    ∗ Perspective was set at three levels: none (i.e., orthographic projection), medium, and high.
    ∗ Depth-cueing was performed by varying the color saturation of objects on the screen according to their distance, with closer points appearing more saturated.
    ∗ Motion of the plot was either continuous, by rotation in the horizontal plane, or under the control of the subject.
  – Our subjective preference is for a display with depth-cueing and moderate perspective.
  – We find a continuously rotating display easier to look at and adequate for the present task.
    ∗ This effect can be achieved by manipulating plot controls, but clumsy use of the controls can also prove disorienting and time-consuming.
    ∗ More generally, there is no guarantee that a simple automatic strategy such as horizontal rotation will reveal structure in a 3D point cloud.
• Software
  – The software used to conduct the experiment was written in Lisp-Stat, and is a modified version of the programs described in some detail by Fox and Stine (1998 Interface).
  – We modified the low-level handling of 3D plots in Lisp-Stat to incorporate perspective and color-based depth-cueing.
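The perspective and depth-cueing elements described above can be illustrated with a short Python sketch. It is not the Lisp-Stat code used in the experiment: the function names `project` and `depth_cued_rgb`, the viewer distances, the linear saturation ramp, the hue, and the minimum saturation are all assumptions chosen for illustration. The sketch treats the z axis as pointing toward the viewer, so larger z means a closer point.

```python
import colorsys
import math


def project(point, viewer_distance=math.inf):
    """Project a 3D point onto the screen plane z = 0.

    An infinite viewer distance gives an orthographic projection (no
    perspective); smaller distances give stronger perspective.
    """
    x, y, z = point
    if math.isinf(viewer_distance):
        return (x, y)
    if z >= viewer_distance:
        raise ValueError("point lies at or behind the viewer")
    scale = viewer_distance / (viewer_distance - z)
    return (x * scale, y * scale)


def depth_cued_rgb(z, z_near, z_far, hue=0.6, min_sat=0.2):
    """Return an RGB color whose saturation falls off with distance.

    Points at z_near are fully saturated; points at z_far get min_sat.
    The linear ramp, hue, and min_sat are illustrative assumptions.
    """
    if z_near <= z_far:
        raise ValueError("z_near must exceed z_far")
    t = (z_near - z) / (z_near - z_far)  # 0 at the nearest, 1 at the farthest
    t = min(1.0, max(0.0, t))
    saturation = 1.0 - (1.0 - min_sat) * t
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


# Hypothetical viewer distances for the three perspective levels.
PERSPECTIVE = {"none": math.inf, "medium": 8.0, "high": 4.0}
screen_xy = project((1.0, 1.0, 2.0), PERSPECTIVE["high"])
color = depth_cued_rgb(2.0, z_near=3.0, z_far=-3.0)
```

Under perspective, a point closer to the viewer is drawn farther from the screen center than an equally offset point farther away, and the saturation ramp adds a second, redundant cue to depth.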
• Procedures
  – Ten volunteer graduate students were recruited for the study. Each subject participated in five sessions, each session lasting approximately one-half hour.
  – A session comprised 72 trials (3 levels of separation × 2 cluster orientations × 3 levels of perspective × 2 levels of depth-cueing × 2 levels of motion, in random order).
  – On each trial of the experiment, subjects were asked “to determine whether there are one or two clusters (groups) of points in the data for each graph.” The stimulus graph was visible until the subject responded, to a maximum of 30 seconds.

2.2 Detection of Nonlinearity

• Data Generation
  – Subjects in three experiments on detecting nonlinearity viewed displays of rotatable 3D point clouds, and were asked whether the response variable in the graph ($Y$) is linearly or nonlinearly related to the two predictors ($X_1$ and $X_2$).
  – All of the datasets employed in the study included 100 randomly generated points.
  – Values of the response variable were constructed according to the full-quadratic model
      $Y = X_1 + X_2 + \sqrt{\alpha}\,(X_1^2 + X_2^2 + 2 X_1 X_2) + \sigma\varepsilon$
    where $X_1, X_2, \varepsilon \sim \mathrm{NID}(0, 1)$, and the values of $\alpha$ and $\sigma$ were selected to generate different (expected) levels of nonlinearity and correlation.
  – We selected three levels of correlation or ‘signal’ (defined as the expected $R^2$ for the model: 1/3, 1/2, or 2/3), and four levels of nonlinearity (defined as the expected proportion of the $R^2$ for the model due to its nonlinear component: 0, 1/3, 1/2, or 2/3), producing 12 fundamental stimulus configurations.
  – Figure 2 illustrates the expected response surface corresponding to a relatively high degree of nonlinearity (for which $\alpha = 2/3$).
  – The three experiments used the same 12 basic stimulus configurations, but manipulated different aspects of the displays.
• Procedures were generally similar to those of the experiment on clustering.
• Experiment 1. General Design: Four variations of spinning 3D scatterplots that are common, and one that is unusual (Figure 3):
  (a) A bare point cloud (except for labelled axes).
  (b) A display with the least-squares regression plane shown as a gray wire-frame.

Figure 2. The response surface $E(Y) = X_1 + X_2 + \sqrt{2/3}\,(X_1^2 + X_2^2 + 2 X_1 X_2)$ from several points of view.
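As an illustration of the full-quadratic model, the following Python sketch draws one 100-point dataset. It is a sketch under assumptions rather than the authors' Lisp-Stat code: the function names `expected_response` and `simulate` are hypothetical, the coefficient on the quadratic term is read as √α (consistent with the √(2/3) in the Figure 2 caption when α = 2/3), and the value of σ used in the example is arbitrary because the slides do not list the values that were used. The code relies on the identity X₁² + X₂² + 2X₁X₂ = (X₁ + X₂)².

```python
import math
import random


def expected_response(x1, x2, alpha):
    """E(Y | x1, x2) under the full-quadratic model, assuming a sqrt(alpha) coefficient."""
    s = x1 + x2
    # x1^2 + x2^2 + 2*x1*x2 equals (x1 + x2)^2.
    return s + math.sqrt(alpha) * s * s


def simulate(alpha, sigma, n=100, rng=random):
    """Draw n rows (x1, x2, y) with x1, x2, and the error all independent N(0, 1)."""
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rng.gauss(0, 1)
        eps = rng.gauss(0, 1)
        y = expected_response(x1, x2, alpha) + sigma * eps
        rows.append((x1, x2, y))
    return rows


# alpha = 2/3 matches the surface in Figure 2; sigma = 1.0 is an arbitrary choice.
data = simulate(alpha=2 / 3, sigma=1.0, rng=random.Random(2002))
```

Setting `alpha = 0` gives a purely linear response, corresponding to the zero-nonlinearity condition; larger `sigma` lowers the expected R² of the model.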