Eulerian Tour Algorithms for Data Visualization and the PairViz Package
Catherine Hurley (NUI Maynooth), R.W. Oldford (U. Waterloo)
July 8 2009
UseR! Monday 13 July 2009
Graphics: Effect Ordering
• Packages: seriation, gclus, corrgram
• Example: PCP Flea data
[Figure: parallel coordinate plots of the flea data (variables Tars1, Tars2, Head, Aede1, Aede2, Aede3). Panels: Standard order; Correlation order.]
PairViz: relationship ordering
• Statistical graphics are about comparisons between variables, cases, groups, models
[Figure: Flea data, correlation order. Axis sequence: Aede3 Aede2 Aede1 Tars2 Aede2 Aede1 Aede3 Tars2 Tars1 Head Tars2 Tars1 Aede1 Head Aede2 Tars1 Aede3 Head. Correlation scale from -0.6 to 0.6.]
A graph model
• Build a graph where nodes are statistical objects
• Edges are relationships
• Example:
  Node     Vis       Edge         Vis
  Group    Boxplot   Two groups   CI for mean diff
  Var      Hist      Two vars     Scatterplot
  2 vars   Scat      4-d space    Dynamic scat
  Model    Resid     2 Models     PCP
[Figure: example graph on nodes A, B, C, D, E, F.]
Example: planned comparisons
• Mice in 5 diet groups, response is lifetime
• Nodes are treatments, edges are planned comparisons
• Weights are p-values
• Reducing calories and protein increases lifetime
[Figure: "Planned comparisons of diets". Graph on the diets lopro, NP, N/N85, N/R50, R/R50, N/R40 with edge p-values 0.0147, 0, 0, 0.3111, 0.0083, shown alongside a plot of lifetime by diet and the lifetime differences for each comparison.]
Graph Traversal
• Traverse all nodes: hamiltonian path
• Traverse all edges: eulerian path
• Use gclus, seriation: hamiltonian paths on complete graphs
• PairViz: eulerian paths
[Figure: three example graphs. Panels: Open hamiltonian path; Closed hamiltonian path; Closed eulerian path on K_7.]
Graph Structures
• Complete graph: all comparisons are interesting
• Edge-weighted graphs: low weight edges are more interesting
• Bipartite graph: e.g. only treatment-control comparisons are of interest
[Figure: flea data parallel coordinate plot. Edges are weighted by 1-corr and the eulerian follows low weight edges.]
[Figure: bipartite graph on nodes X1, X2, X3 and Y1, Y2.]
Graph Structures, cont'd
• Hypercube graph
  • Cube for factorial experiment
  • or model selection: e.g. each node in G is a predictor subset, edge: add/drop predictor
• Line graph
  • Transform G to L(G)
  • Each node in G is a var, each node in L(G) is a var pair, each edge is a 3-d transition
[Figure: cube with nodes 000, 001, 010, 011, 100, 101, 110, 111; graph G on nodes A, B, C, D and its line graph L(G) on nodes AB, AC, AD, BC, BD, CD.]
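The line graph transform on this slide can be sketched in a few lines. This is an illustrative Python sketch only, not PairViz code: the function name `line_graph` is hypothetical, and G is assumed to be the complete graph on the four variables A, B, C, D shown in the figure.

```python
from itertools import combinations

def line_graph(edges):
    """Edges of L(G): two edges of G are adjacent when they share a node."""
    nodes = [tuple(sorted(e)) for e in edges]
    return [(e, f) for e, f in combinations(nodes, 2) if set(e) & set(f)]

variables = ["A", "B", "C", "D"]
g_edges = list(combinations(variables, 2))   # nodes of L(G): the variable pairs
lg_edges = line_graph(g_edges)               # each one links two pairs sharing a variable

for e, f in lg_edges[:3]:
    # Moving from one pair to the next involves three variables in total.
    print("".join(e), "->", "".join(f), sorted(set(e) | set(f)))
```

Every edge of L(G) joins two variable pairs that overlap in one variable, which is why traversing it corresponds to a transition through a 3-d space.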
Algorithms: Complete graph
• Closed eulerian path exists when each node has an even number of edges, i.e. for K_{2n+1}
• Hamiltonian decomposition of graph
  • into hamiltonian cycles: eulerian for K_{2n+1}
  • into hamiltonian paths: approx eulerian for K_{2n}
• Classical algorithm: hpaths
• WHam: weighted_hpaths: pick best for H_1, best orientation and order for others
[Figure: K_7 on nodes 1 to 7, decomposed into three hamiltonian cycles.]
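The hamiltonian decomposition can be illustrated with the classical zigzag construction (often attributed to Walecki). The sketch below is in Python and is not the PairViz `hpaths` implementation; the function names are hypothetical, vertices are numbered from 0 rather than 1, and vertex 2n is assumed to act as the extra hub that closes each path into a cycle.

```python
def zigzag_path(start, n):
    """Hamiltonian path on vertices 0..2n-1: start, +1, -1, +2, -2, ..., +n (mod 2n)."""
    offsets = [0]
    for k in range(1, n):
        offsets += [k, -k]
    offsets.append(n)
    return [(start + d) % (2 * n) for d in offsets]

def hamiltonian_paths(n):
    """n edge-disjoint hamiltonian paths that together cover every edge of K_{2n}."""
    return [zigzag_path(i, n) for i in range(n)]

def eulerian_tour_odd(n):
    """Closed eulerian tour on K_{2n+1}: close each path through the hub, then concatenate."""
    hub = 2 * n
    tour = [hub]
    for path in hamiltonian_paths(n):
        tour += path + [hub]
    return tour

print(hamiltonian_paths(3))   # three hamiltonian paths on K_6
print(eulerian_tour_odd(3))   # closed tour over all 21 edges of K_7
```

For K_{2n} the paths alone give the approximately eulerian sequence: concatenating them repeats only the joining edges between consecutive paths.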
Algorithms: Complete graph, cont'd
• Recursive algorithm: eseq
• Start with eulerian on K_n, append edges to get eulerian on K_{n+2}
[Figure: construction illustrated on nodes 1 to 7.]
Algorithms: general
• Eulerian graph: connected, all nodes have even number of edges
• Otherwise, add edges, pairing up odd nodes; Chinese postman does this in optimal way
• Classical algorithm (Hierholzer, Fleury)
• Our version GrEul (etour) follows weight increasing edges
[Figure: diet comparison graph on lopro, NP, N/N85, N/R50, R/R50, N/R40 with edge p-values 0.0147, 0, 0, 0.3111, 0.0083.]
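As a rough illustration of the classical approach, here is a Hierholzer-style tour in Python that, as a heuristic, tries the lowest-weight unused edge first when extending the walk. It is a sketch under stated assumptions and does not reproduce GrEul or `etour`: the function name `eulerian_tour` and the small example graph with its weights are made up for illustration.

```python
def eulerian_tour(edges, start):
    """Closed eulerian tour of an undirected graph given as (u, v, weight) triples."""
    adj = {}
    for idx, (u, v, w) in enumerate(edges):
        adj.setdefault(u, []).append((w, idx, v))
        adj.setdefault(v, []).append((w, idx, u))
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        raise ValueError("a node has an odd number of edges: graph is not eulerian")
    for nbrs in adj.values():
        nbrs.sort(reverse=True)          # so pop() returns the lowest weight first
    used = [False] * len(edges)
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()                 # discard edges already walked from the other end
        if adj[u]:
            _, idx, v = adj[u].pop()
            used[idx] = True
            stack.append(v)              # extend the current walk
        else:
            tour.append(stack.pop())     # dead end: emit node and backtrack
    if len(tour) != len(edges) + 1:
        raise ValueError("graph is not connected")
    return tour[::-1]

# Hypothetical example: two triangles sharing node "a" (weights are invented).
example = [("a", "b", 1), ("b", "c", 2), ("c", "a", 3),
           ("a", "d", 4), ("d", "e", 5), ("e", "a", 6)]
print(eulerian_tour(example, "a"))
```

A graph with odd nodes, such as the diet comparison tree, would first need extra edges pairing up those nodes before a function like this could be applied.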
Algorithms comparison
Complete graph, no weights
[Figure: three panels, one per algorithm.]
• Etour: prefers low vertices
• Eseq: prefers low edges
• hpaths: 4 hamiltonians
Algorithms: complete, weighted
Eurodist: 21 European cities
[Figure: three panels of Eurodist edge weights (0 to 4000) along each traversal.]
• Algorithm eseq: Eurodist edge weights. Ignores weights
• Weighted etour on Eurodist. Starts in Geneva
• Weighted hamiltonians on Eurodist. Hamiltonian decomp (hamiltonians 1 to 10), with increasing path lengths
Example: model selection
Mammal sleep data
Y = log brain wt. Predictors: A = non dreaming sleep, B = dreaming sleep, C = log body wt, D = life span
• Hypercube graph represents possible moves in a stepwise regression algorithm
• Graph Q_n is hamiltonian, and eulerian for even n
• Edge weights: change in SSE
[Figure: hypercube of models, by level: ABCD; ABC ABD ACD BCD; AB AC AD BC BD CD; A B C D; 0.]
Sleep data: Model residuals
• Eulerian starting with full model
• All models with C are good
• Bar chart: change in SSE
[Figure: model residuals along the eulerian ABCD BCD CD ACD ABCD ABC BC C AC ABC AB A AD ABD BD D AD ACD AC A 0 D CD C 0 B BD BCD BC B AB ABD ABCD.]
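The hypercube of models is easy to build directly. The following Python sketch (hypothetical function name `hypercube_models`, not part of PairViz) lists every predictor subset as a node and joins two subsets when they differ by adding or dropping one predictor; it then checks the even-degree condition for the four predictors A to D.

```python
from itertools import combinations

def hypercube_models(predictors):
    """Nodes are predictor subsets; edges join subsets differing by one predictor."""
    nodes = [frozenset(c)
             for r in range(len(predictors) + 1)
             for c in combinations(predictors, r)]
    edges = [(a, b) for a, b in combinations(nodes, 2) if len(a ^ b) == 1]
    return nodes, edges

nodes, edges = hypercube_models("ABCD")
degree = {n: sum(n in e for e in edges) for n in nodes}

print(len(nodes), len(edges))    # number of models and of add/drop moves
print(set(degree.values()))      # every model has the same number of neighbours
```

With four predictors every node has four neighbours, so all degrees are even and a closed eulerian over the 32 add/drop moves exists, matching the 33-model sequence shown on the slide.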
More variables
• Sleep data: 10 vars (nodes), 45 edges
• Eulerian has length 50
• Using outlying index from scagnostics package for eulerian traversal
• Zoom on first half of display
[Figure: "Eulerian on scagnostics: Outlying". Scatterplot sequence along the eulerian with the outlying index (0.0 to 0.6) for each pair of variables.]
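The length of 50 follows from pairing up odd nodes: in the complete graph on 10 nodes every node has 9 edges, so 5 extra edges are needed on top of the 45. A small Python check, with a hypothetical helper name and assuming each pair of odd nodes can be joined by a single added edge (true for a complete graph):

```python
from collections import Counter
from itertools import combinations

def eulerised_length(nodes, edges):
    """Edges in the tour after adding one edge per pair of odd-degree nodes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sum(1 for n in nodes if degree[n] % 2)
    return len(edges) + odd // 2

nodes = range(10)
edges = list(combinations(nodes, 2))
print(len(edges), eulerised_length(nodes, edges))   # prints "45 50"
```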
More variables, cont'd
• Reduce the graph
• NN graph: eliminate edges with outlier index < .2
• Reduces graph from 10 to 5 nodes, and 45 to 5 edges
• Other nodes have no edges
[Figure: "NN Eulerian on scagnostics: Outlying". Reduced graph on the variables Bd, Br, SW, L, GP, with the outlying index (0.0 to 0.6) along the eulerian.]
In conclusion
• PairViz package: relationship ordering for data visualisation
• Current version: algorithms presented here
• Thanks to graph, igraph
• Work in progress: ordering dynamic visualisations via ggobi, with Adrian Waddell, UW