Big Data Era [1]
[1] https://vimeo.com/102998774
The big problem: Scalability
• Visualization
• Algorithm
• Hardware
Image sources:
https://upload.wikimedia.org/wikipedia/commons/0/05/Sna_large.png
https://upload.wikimedia.org/wikipedia/commons/9/9b/Social_Network_Analysis_Visualization.png
https://c1.staticflickr.com/5/4033/4520018121_6dd39e8d7e_z.jpg
https://c1.staticflickr.com/1/1/916142_ddc2fd0140.jpg
Graph Sampling
• Randomly pick nodes or edges to construct a subgraph that represents the original, unfiltered graph
Which sampling strategy to use? 5
Graph Sampling Evaluation [Leskovec and Faloutsos, KDD 2006]: Random Walk (RW) vs. Forest Fire (FF)
Graph Sampling Evaluation in Visualization
Original Graph vs. samples:
• Random Walk (RW): avg. node degree 2.4, power-law degree distribution
• Forest Fire (FF): avg. node degree 2.4, power-law degree distribution
Distinct visual result!
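The point of this slide, that two samples can share the same summary statistics and still look different, can be illustrated with a minimal sketch. The toy graphs `star` and `path` and the helper names are hypothetical and are not taken from the study; they only show that the average node degree alone does not pin down the structure.

```python
from collections import Counter

def node_degrees(edges):
    """Degree of every node in an undirected graph given as (u, v) edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def average_degree(edges):
    """Average node degree (0.0 for an empty edge list)."""
    degree = node_degrees(edges)
    return sum(degree.values()) / len(degree) if degree else 0.0

def degree_histogram(edges):
    """Map each degree value to the number of nodes with that degree."""
    return dict(Counter(node_degrees(edges).values()))

# Hypothetical toy graphs: both have 5 nodes and 4 edges.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
path = [(0, 1), (1, 2), (2, 3), (3, 4)]

for name, edges in (("star", star), ("path", path)):
    print(name, average_degree(edges), degree_histogram(edges))
```

Both toy graphs have the same average degree, yet one is a hub with leaves and the other is a chain, so a drawing of each would look nothing alike.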
Graph Sampling Evaluation in Visualization
Similarity Measurements
• Data Mining: statistical features (Hub Inclusion, Clustering Coeff., Discovery Quotient, …)
• Visualization: visual factors (?)
Goals and Procedure
• G1: Identify the key visual factors that make the sampled graphs representative (Pilot Study)
• G2: Evaluate the performance of different sampling algorithms on these visual factors (Formal Studies)
Outline
• Selected Sampling Methods
• Pilot Study
• Formal Studies
  • Perception of High Degree Nodes
  • Perception of Cluster Quality
  • Perception of Coverage Area
Node-Based Sampling Original Graph Random Node Sampling 11
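The slide only names the method, so the following is a minimal sketch of one common reading of random node sampling: pick a fraction of the nodes uniformly at random and keep the edges between them (the induced subgraph). The adjacency-dict representation, the function name, and the `rate` and `seed` parameters are assumptions made for illustration.

```python
import random

def random_node_sample(adj, rate, seed=None):
    """Keep round(rate * |V|) nodes chosen uniformly at random,
    plus every original edge whose two endpoints were both kept."""
    rng = random.Random(seed)
    nodes = sorted(adj)  # assumes node ids are sortable
    k = max(1, round(rate * len(nodes)))
    kept = set(rng.sample(nodes, k))
    return {u: {v for v in adj[u] if v in kept} for u in kept}

# Hypothetical toy graph: a ring of 10 nodes.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
sample = random_node_sample(ring, rate=0.5, seed=1)
print(sorted(sample))
```

Because nodes are picked independently of the edges, many kept nodes can end up isolated in the sample.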
Edge-Based Sampling Original Graph Random Edge Sampling 15
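As a minimal sketch of random edge sampling, assuming the plain variant in which edges are drawn uniformly at random and the sample consists of those edges plus their endpoints. The function name and the `rate` and `seed` parameters are illustrative assumptions, not details given on the slide.

```python
import random

def random_edge_sample(edges, rate, seed=None):
    """Keep round(rate * |E|) edges chosen uniformly at random.
    Returns the set of endpoint nodes and the list of kept edges."""
    rng = random.Random(seed)
    edges = list(edges)
    k = max(1, round(rate * len(edges)))
    kept_edges = rng.sample(edges, k)
    nodes = {n for edge in kept_edges for n in edge}
    return nodes, kept_edges

# Hypothetical toy graph: a ring of 10 nodes as an edge list.
ring_edges = [(i, (i + 1) % 10) for i in range(10)]
nodes, kept = random_edge_sample(ring_edges, rate=0.3, seed=0)
print(sorted(nodes), kept)
```

Every sampled node has at least one edge in the sample, which is one way this approach differs from picking nodes directly.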
Traversal-Based Sampling: Random Walk Original Graph Random Walk 18
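A minimal sketch of random walk sampling, assuming the simplest form: start at a random node, repeatedly step to a uniformly chosen neighbor, and collect visited nodes and traversed edges until a target size is reached. The `target_size`, `seed`, and `max_steps` parameters are assumptions; variants that restart or fly back to the start node are not shown.

```python
import random

def random_walk_sample(adj, target_size, seed=None, max_steps=100_000):
    """Walk the graph until target_size distinct nodes were visited
    (or max_steps is exhausted). Returns (visited nodes, traversed edges)."""
    rng = random.Random(seed)
    target_size = min(target_size, len(adj))
    current = rng.choice(sorted(adj))  # assumes node ids are sortable
    visited = {current}
    edges = set()
    for _ in range(max_steps):
        if len(visited) >= target_size:
            break
        neighbors = sorted(adj[current])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        nxt = rng.choice(neighbors)
        edges.add(frozenset((current, nxt)))  # undirected edge
        visited.add(nxt)
        current = nxt
    return visited, edges

# Hypothetical toy graph: a ring of 10 nodes.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
visited, edges = random_walk_sample(ring, target_size=4, seed=3)
print(sorted(visited), len(edges))
```

The sample is always connected, since every new node is reached over an edge from an already visited one.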
Traversal-Based Sampling: Random Jump Original Graph Random Jump 20
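A minimal sketch of random jump sampling, assuming it behaves like a random walk that, with some probability at each step, jumps to a uniformly chosen node anywhere in the graph instead of following an edge. The `jump_prob` default of 0.15 and the other parameter names are assumptions for illustration and are not stated on the slide.

```python
import random

def random_jump_sample(adj, target_size, jump_prob=0.15, seed=None,
                       max_steps=100_000):
    """Random walk with random jumps. Returns (visited nodes, traversed edges)."""
    rng = random.Random(seed)
    nodes = sorted(adj)  # assumes node ids are sortable
    target_size = min(target_size, len(nodes))
    current = rng.choice(nodes)
    visited = {current}
    edges = set()
    for _ in range(max_steps):
        if len(visited) >= target_size:
            break
        neighbors = sorted(adj[current])
        if not neighbors or rng.random() < jump_prob:
            current = rng.choice(nodes)  # jump: no edge is recorded
        else:
            nxt = rng.choice(neighbors)
            edges.add(frozenset((current, nxt)))
            current = nxt
        visited.add(current)
    return visited, edges

# Hypothetical toy graph: two disconnected triangles.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
visited, edges = random_jump_sample(two_triangles, target_size=6, seed=0)
print(sorted(visited), len(edges))
```

Unlike a plain random walk, the jumps let the sample reach disconnected parts of the graph, as the two-triangle example shows.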
Traversal-Based Sampling: Forest Fire Original Graph Forest Fire 22
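A simplified sketch of forest fire sampling on an undirected graph: ignite a random node, let each burning node spread to a geometrically distributed number of its unburned neighbors, and re-ignite elsewhere if the fire dies out before the target size is reached. The `burn_prob` value, the re-ignition rule, and all names are assumptions for illustration; the variant used in the study may differ.

```python
import random
from collections import deque

def forest_fire_sample(adj, target_size, burn_prob=0.7, seed=None):
    """Simplified undirected forest fire. Returns (burned nodes, fire edges)."""
    rng = random.Random(seed)
    nodes = sorted(adj)  # assumes node ids are sortable
    target_size = min(target_size, len(nodes))
    burned, edges = set(), set()
    while len(burned) < target_size:
        # Ignite (or re-ignite) at a random node that has not burned yet.
        start = rng.choice([n for n in nodes if n not in burned])
        burned.add(start)
        frontier = deque([start])
        while frontier and len(burned) < target_size:
            u = frontier.popleft()
            candidates = [v for v in sorted(adj[u]) if v not in burned]
            rng.shuffle(candidates)
            # Geometric count: keep spreading while a biased coin succeeds.
            n_burn = 0
            while n_burn < len(candidates) and rng.random() < burn_prob:
                n_burn += 1
            for v in candidates[:n_burn]:
                if len(burned) >= target_size:
                    break
                burned.add(v)
                edges.add(frozenset((u, v)))
                frontier.append(v)
    return burned, edges

# Hypothetical toy graph: a ring of 10 nodes.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
burned, edges = forest_fire_sample(ring, target_size=5, seed=2)
print(sorted(burned), len(edges))
```

A higher `burn_prob` makes the fire spread more widely from each node, while a lower value produces more re-ignitions and a more scattered sample.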
Pilot Study
• Task:
  • Identify the visual factors that strongly influence the representativeness of sampled graphs
  • We also determine the sampling rate used in the formal studies.
Dataset: 5 Real-World Graphs
Visual Factor Candidates
Results (key visual factors): High Degree Nodes, Cluster Quality, Coverage Area
Formal Study I: High Degree Nodes
[Figure: Sampled Graph and Original Graph side by side, with nodes A and B marked in each; captions: "20 high degree nodes" and "8 high degree nodes?"]
Formal Study I: High Degree Nodes 29
Formal Study I: High Degree Nodes
Experiment Setting: 20 high degree nodes
Data Generation: (N: 1024, D: S), (N: 2048, D: S), (N: 1024, D: L), (N: 2048, D: L)
Formal Study I: High Degree Nodes Results
• Discussions:
  • It is easier to perceive high degree nodes in the RW samples
  • It is more difficult to perceive high degree nodes in the RN samples
  • The above results hold across datasets
[Figure: RW vs. FF; + marks the number of high degree nodes perceived (Visualization), * marks the number of high degree nodes remaining (Data Mining)]
Contradiction with metric-based results!
Formal Study I: High Degree Nodes Results
• Random Walk (RW): 16 high degree nodes remained, 6 high degree nodes perceived
• Forest Fire (FF): 7 high degree nodes remained, 3 high degree nodes perceived
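The "remained" counts are a metric computed on the data, as opposed to the "perceived" counts reported by study participants. A minimal sketch of such a metric follows, assuming "high degree nodes" means the k nodes with the largest degree in the original graph. That definition, the tie-breaking by node id, and the function names are assumptions and are not given on the slide.

```python
def top_degree_nodes(adj, k):
    """The k nodes with the highest degree; ties are broken by node id."""
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    return set(ranked[:k])

def high_degree_retained(adj, sampled_nodes, k):
    """How many of the original top-k degree nodes appear in the sample."""
    return len(top_degree_nodes(adj, k) & set(sampled_nodes))

# Hypothetical toy graph: two connected hubs (0 and 1) with three leaves each.
adj = {
    0: {1, 2, 3, 4},
    1: {0, 5, 6, 7},
    2: {0}, 3: {0}, 4: {0},
    5: {1}, 6: {1}, 7: {1},
}
print(high_degree_retained(adj, sampled_nodes={0, 2, 3, 5}, k=2))
```

A count like this says nothing about whether a viewer can actually spot the hubs in the drawing, which is the gap the slide highlights.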
Outline
• Selected Sampling Methods
• Pilot Study
• Formal Studies
  • Perception of High Degree Nodes (more high degree nodes are perceived in RW)
  • Perception of Cluster Quality
  • Perception of Coverage Area
Formal Study II: Cluster Quality 36
Formal Study II: Cluster Quality Experiment Setting Data Generation 37
Formal Study II: Cluster Quality Results
• Discussions:
  • RE and RJ best preserve the perceived cluster quality in samples
  • RN and FF struggle to preserve the perceived cluster quality
  • The performance of RW and FF depends on graph modularity
Formal Study II: Cluster Quality Results
The number of clusters remaining in the sample is important for perceiving cluster quality in visualization!
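One simple way to count the clusters remaining in a sample is sketched below, assuming each node carries a known cluster label (for example, from the data generator). The `cluster_of` mapping, the `min_size` threshold, and the function name are hypothetical and only illustrate the idea.

```python
from collections import Counter

def clusters_remaining(cluster_of, sampled_nodes, min_size=1):
    """Number of clusters with at least min_size nodes in the sample."""
    sizes = Counter(cluster_of[n] for n in sampled_nodes)
    return sum(1 for size in sizes.values() if size >= min_size)

# Hypothetical labels: three clusters "a", "b", and "c".
cluster_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
sample = {0, 1, 3}
print(clusters_remaining(cluster_of, sample))
print(clusters_remaining(cluster_of, sample, min_size=2))
```

Raising `min_size` makes the count stricter, so a cluster represented by a single stray node no longer counts as remaining.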
Outline
• Selected Sampling Methods
• Pilot Study
• Formal Studies
  • Perception of High Degree Nodes (more high degree nodes are perceived in RW)
  • Perception of Cluster Quality (cluster number is important)
  • Perception of Coverage Area
Formal Study III: Coverage Area 41
Formal Study III: Coverage Area
Experiment Setting
Data Generation: (N: 1024, D: S), (N: 2048, D: S), (N: 1024, D: L), (N: 2048, D: L)
Formal Study III: Coverage Area Results
• Discussions:
  • RE and RJ have the largest perceived coverage area
  • RW has the smallest perceived coverage area in most cases
  • The performance of RW and FF varies depending on graph properties
Contradiction with metric-based results!
Graph configurations:
  BA: G1 (N: 1024, D: S), G2 (N: 1024, D: L), G3 (N: 2048, D: S), G4 (N: 2048, D: L)
  Sah: G5 (N: 1024, M: L), G6 (N: 1024, M: H), G7 (N: 2048, M: L), G8 (N: 2048, M: H)
Data  RN   REN  RW   RJ   FF
G1    22%  29%  22%  28%  27%
G2    23%  31%  24%  29%  29%
G3    21%  29%  23%  28%  28%
G4    22%  31%  24%  28%  28%
G5    24%  39%  41%  41%  40%
G6    25%  36%  34%  36%  33%
G7    27%  45%  46%  46%  47%
G8    21%  32%  29%  32%  29%
All   23%  34%  30%  34%  33%
[Figure: pairwise comparisons and mean scores for RN, REN, RW, RJ, and FF on G1 to G8 and overall, each panel with a χ²(4) statistic; the per-panel values are not recoverable from the extracted text]
Formal Study III: Coverage Area Results
[Figure: example samples labeled RW and RN]
Conclusion
• We provided the first study of how graph sampling strategies can influence the perception of node-link visualizations
• Important visual factors: high degree nodes, cluster quality, and coverage area
• Recommendations for sampling network visualizations:
  • Recommend Random Edge and Random Jump for global structure and cluster quality
  • Recommend Random Walk for perceived high degree nodes
  • Avoid Random Node unless there are specific requirements
  • Random Walk and Forest Fire are modularity sensitive
Graph sampling performance in visualization may VARY from previous metric-based results!
Q&A
Evaluation of Graph Sampling: A Visualization Approach
Yanhong Wu, Nan Cao, Daniel Archambault, Qiaomu Shen, Huamin Qu, and Weiwei Cui
yanhong.wu@ust.hk
http://yhwu.me