Network Analysis, Maneesh Agrawala, CS 448B: Visualization, Fall 2020

  1. Network Analysis. Maneesh Agrawala, CS 448B: Visualization, Fall 2020. Last Time: Network Layout.

  2. Force-Directed Layout. Interactive Example: Configurable Force Layout.

  3. [Figure] d3.force: 7,922 nodes, 11,881 edges [Kai Chang].

  4. Use the Force! http://mbostock.github.io/d3/talk/20110921/. Force-Directed Layout: nodes are charged particles, $F = q_i q_j / d_{ij}^2$, with air resistance $F = -b v_i$; edges are springs, $F = k(L - d_{ij})$. D3’s force layout uses velocity Verlet integration. Assume uniform mass $m$ and timestep $\Delta t$, and take $m = 1$ and $\Delta t = 1$: $F = ma \rightarrow F = a \rightarrow F = \Delta v / \Delta t \rightarrow F = \Delta v$. Forces simplify to velocity offsets! Repeatedly calculate forces and update node positions. The naïve approach is $O(N^2)$; it can be sped up to $O(N \log N)$ using a quadtree or k-d tree. Forces are numerically integrated at each time step.
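
As a rough illustration of the force model above, here is a naive $O(N^2)$ step in Python. It follows the slide's simplification (unit mass, $\Delta t = 1$, so forces become velocity offsets), but it uses a plain semi-implicit Euler update rather than the velocity Verlet integration the slide attributes to D3. The names `force_step` and `layout` and every default constant (charge q, stiffness k, rest length L, drag b) are hypothetical choices for this sketch, not D3's API.

```python
import math
import random


def force_step(pos, vel, edges, q=10.0, k=0.1, rest=30.0, b=0.1):
    """One naive O(N^2) step of a force-directed layout, assuming unit mass and dt = 1.

    pos, vel: lists of [x, y] per node; edges: (i, j) index pairs.
    q: charge of every node, k and rest: spring stiffness and rest length L,
    b: air-resistance coefficient. All defaults are hypothetical.
    """
    n = len(pos)
    force = [[0.0, 0.0] for _ in range(n)]
    # Repulsion between every pair of charged particles: |F| = q * q / d^2.
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            d = max(math.hypot(dx, dy), 1e-9)  # guard against division by zero
            f = q * q / (d * d)
            fx, fy = f * dx / d, f * dy / d  # pushes i away from j
            force[i][0] += fx
            force[i][1] += fy
            force[j][0] -= fx
            force[j][1] -= fy
    # Springs along edges: F = k * (L - d); negative when stretched, pulling i toward j.
    for i, j in edges:
        dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
        d = max(math.hypot(dx, dy), 1e-9)
        f = k * (rest - d)
        fx, fy = f * dx / d, f * dy / d
        force[i][0] += fx
        force[i][1] += fy
        force[j][0] -= fx
        force[j][1] -= fy
    # With m = 1 and dt = 1, F = dv: add forces (plus drag -b * v) to velocity, then move.
    for i in range(n):
        for a in (0, 1):
            vel[i][a] += force[i][a] - b * vel[i][a]
            pos[i][a] += vel[i][a]


def layout(n, edges, iterations=300, seed=0):
    """Run force_step repeatedly from random initial positions."""
    rng = random.Random(seed)
    pos = [[rng.uniform(0, 100), rng.uniform(0, 100)] for _ in range(n)]
    vel = [[0.0, 0.0] for _ in range(n)]
    for _ in range(iterations):
        force_step(pos, vel, edges)
    return pos
```

For example, `layout(3, [(0, 1), (1, 2), (2, 0)])` settles a triangle. The double loop over pairs is exactly the $O(N^2)$ cost that the quadtree approximation is meant to remove.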

  6. A naive calculation of the force at a point sums the forces from all other n−1 points. For a fast approximate calculation, we build a spatial index (here, a quadtree) and use it to compare with distant groups of points instead.

  7. The Barnes-Hut θ parameter controls when to compare with an aggregate center of charge: is $w_{quadnode} / d_{ij} < \theta$? [Figures: θ = 0.5; θ = 0.9 (default setting)]
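
The quadtree approximation and θ test above can be sketched as follows. This is a minimal Barnes-Hut illustration under stated assumptions, not D3's implementation: `Quad`, `build`, `repulsion`, and `naive_repulsion` are hypothetical names, the force on a unit charge is taken as $q / d^2$, and $d$ is measured from the target point to each cell's center of charge.

```python
import math


class Quad:
    """Quadtree cell covering the square [x0, x0 + w) x [y0, y0 + w)."""

    def __init__(self, x0, y0, w):
        self.x0, self.y0, self.w = x0, y0, w
        self.charge = 0.0        # total charge stored below this cell
        self.cx = self.cy = 0.0  # charge-weighted center (center of charge)
        self.point = None        # (x, y, q) while this cell is a single-point leaf
        self.children = None     # four sub-cells once the cell is split

    def insert(self, x, y, q):
        total = self.charge + q  # update the aggregate before descending
        self.cx = (self.cx * self.charge + x * q) / total
        self.cy = (self.cy * self.charge + y * q) / total
        self.charge = total
        if self.children is None:
            if self.point is None:
                self.point = (x, y, q)
                return
            px, py, pq = self.point
            if (px, py) == (x, y):  # coincident points: merge instead of splitting forever
                self.point = (x, y, pq + q)
                return
            h = self.w / 2
            self.children = [Quad(self.x0 + i * h, self.y0 + j * h, h)
                             for j in (0, 1) for i in (0, 1)]
            self.point = None
            self._child(px, py).insert(px, py, pq)
        self._child(x, y).insert(x, y, q)

    def _child(self, x, y):
        h = self.w / 2
        i = 1 if x >= self.x0 + h else 0
        j = 1 if y >= self.y0 + h else 0
        return self.children[2 * j + i]


def build(points):
    """Build a quadtree over (x, y, q) points."""
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)
    w = max(max(p[0] for p in points) - x0, max(p[1] for p in points) - y0) or 1.0
    root = Quad(x0, y0, w * 1.0001)  # small pad so the largest coordinates fall inside
    for x, y, q in points:
        root.insert(x, y, q)
    return root


def repulsion(node, x, y, theta=0.9):
    """Approximate force q / d^2 on a unit charge at (x, y) from everything in node."""
    if node.charge == 0.0:
        return 0.0, 0.0
    dx, dy = x - node.cx, y - node.cy
    d = math.hypot(dx, dy)
    if node.children is None or (d > 0 and node.w / d < theta):
        # A single point, or a cell far enough away (w / d < theta) to treat as one charge.
        if d == 0:
            return 0.0, 0.0  # the target point itself
        f = node.charge / (d * d)
        return f * dx / d, f * dy / d
    fx = fy = 0.0
    for child in node.children:
        cfx, cfy = repulsion(child, x, y, theta)
        fx += cfx
        fy += cfy
    return fx, fy


def naive_repulsion(points, x, y):
    """Exact O(N) sum over all other points, for comparison."""
    fx = fy = 0.0
    for px, py, q in points:
        dx, dy = x - px, y - py
        d = math.hypot(dx, dy)
        if d > 0:
            fx += q * dx / d ** 3
            fy += q * dy / d ** 3
    return fx, fy
```

With `theta=0.0` the test never succeeds, so every leaf is visited and the result matches the naive sum. Larger θ values aggregate more aggressively, trading accuracy for speed, which is what the θ = 0.5 to 2.0 figures compare.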

  8. [Figures: θ = 1.5; θ = 2.0]

  9. Alternative Layouts. Linear node layout, in which circular arcs show connections. Layout quality is sensitive to node ordering!
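
One simple way to see why node ordering matters in a linear layout is to count arc crossings. This sketch assumes every arc is drawn on the same side of the line (some arc diagrams split arcs above and below), and `arc_crossings` is a hypothetical helper, not a measure the slide defines.

```python
from itertools import combinations


def arc_crossings(order, edges):
    """Count crossing pairs of arcs for nodes placed on a line in `order`.

    Assumes all arcs are drawn on the same side of the line.
    """
    pos = {node: i for i, node in enumerate(order)}
    spans = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
    count = 0
    for (a, b), (c, d) in combinations(spans, 2):
        # Two same-side arcs cross iff their spans interleave strictly.
        if a < c < b < d or c < a < d < b:
            count += 1
    return count
```

For edges 0–2 and 1–3, the ordering `[0, 1, 2, 3]` yields one crossing, while `[0, 2, 1, 3]` yields none: same graph, different readability.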

  10. The Shape of Song [Wattenberg ’01]. Limitations of Node-Link Layout: edge crossings and occlusion.

  11. Attribute-Driven Layout. Large node-link diagrams get messy! Is there additional structure we can exploit? Idea: use data attributes to perform layout, e.g., a scatter plot based on node values. Dynamic queries and/or brushing can be used to explore connectivity.

  12. Attribute-Driven Layout. The “Skitter” Layout: Internet connectivity shown as a radial scatterplot. Angle = longitude (geography); radius = degree (# of connections, a statistic of the nodes). Semantic Substrates [Shneiderman 06].
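
The Skitter mapping is a polar-coordinate transform driven by node attributes rather than by forces. In this sketch, `skitter_positions` is hypothetical. It maps longitude to angle and degree to radius as the slide states, but the linear degree-to-radius scaling (and its direction) is an assumption made for illustration.

```python
import math


def skitter_positions(nodes, max_radius=1.0):
    """Radial scatterplot: angle from longitude, radius from degree.

    nodes: dict name -> (longitude_degrees, degree).
    Linear scaling of degree to radius is an assumption.
    """
    max_deg = max(deg for _, deg in nodes.values()) or 1
    out = {}
    for name, (lon, deg) in nodes.items():
        angle = math.radians(lon % 360)  # longitude -> angle around the circle
        r = max_radius * deg / max_deg   # degree -> distance from the center
        out[name] = (r * math.cos(angle), r * math.sin(angle))
    return out
```

Because position comes only from attributes, the layout is deterministic and stable as edges change. Connectivity is then explored by drawing or brushing edges on top of it.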

  13. Summary. Tree layout: indented / node-link / enclosure / layers. How to address issues of scale? Filtering and focus + context techniques. Graph layout: tree layout over a spanning tree; hierarchical “Sugiyama” layout; optimization (force-directed layout); attribute-driven layout. Announcements.

  14. Final project: a data analysis/explainer or a research project.
      Data analysis: analyze a dataset in depth and make a visual explainer. Research: pose a problem and implement a creative solution.
      Deliverables: for a data analysis/explainer, an article with multiple interactive visualizations; for research, an implementation of the solution and a web-based demo if possible; a short video (2 min) demoing and explaining the project.
      Schedule: project proposal Thu 10/29; design review and feedback Tue 11/17 & Thu 11/19; final code and video Sat 11/21 11:59pm.
      Grading: groups of up to 3 people, graded individually; clearly report the responsibilities of each member.
      Network Analysis. *Slides adapted from E. Adar’s / L. Adamic’s Network Theory and Applications course slides.

  15. Diseases (http://diseasome.eu/). Transportation (http://www.lx97.com/maps/).

  16. Lombardi, M., ‘George W. Bush, Harken Energy and Jackson Stephens, ca 1979–90’. Actors and movies (bipartite).

  17. Characterizing networks: What does it look like?

  18. Size? Density? Centralization? Clustering? Components? Cliques? Motifs? Avg. path length? … [Image: www.opte.org] Topics in Network Analysis: centrality/centralization; community structure; pattern identification; models.

  19. Centrality. How far apart are things?

  20. Distance: shortest paths. Shortest path (geodesic path): the shortest sequence of links connecting two nodes. It is not always unique: in the example, A and C are connected by 2 shortest paths, A – E – B – C and A – E – D – C. Distance: shortest paths. Shortest path from 2 to 3: [figure].
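
In an unweighted graph, breadth-first search finds hop distances and can also count how many distinct shortest paths reach each node. The sketch below assumes the example graph's edges are exactly those on the two listed paths (A–E, E–B, B–C, E–D, D–C), since the figure itself is not available; `bfs_paths` and `all_shortest_paths` are hypothetical helper names.

```python
from collections import deque


def bfs_paths(adj, source):
    """BFS from source in an unweighted graph.

    Returns (dist, sigma, parents): hop distance to each reachable node, the
    number of distinct shortest paths to it, and its predecessors on those paths.
    """
    dist, sigma, parents = {source: 0}, {source: 1}, {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first discovery: one hop further than u
                dist[v], sigma[v], parents[v] = dist[u] + 1, 0, []
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                sigma[v] += sigma[u]
                parents[v].append(u)
    return dist, sigma, parents


def all_shortest_paths(parents, source, target):
    """Enumerate every shortest path by walking predecessor lists backwards."""
    if target == source:
        return [[source]]
    return [p + [target] for u in parents[target]
            for p in all_shortest_paths(parents, source, u)]


adj = {"A": ["E"], "E": ["A", "B", "D"], "B": ["E", "C"],
       "D": ["E", "C"], "C": ["B", "D"]}
dist, sigma, parents = bfs_paths(adj, "A")
print(dist["C"], sigma["C"], all_shortest_paths(parents, "A", "C"))
```

Here A and C are 3 hops apart and there are 2 geodesics between them. Path enumeration can grow exponentially on dense graphs, so the count `sigma` is usually what centrality measures use.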

  21. Distance: shortest paths. Shortest path from 2 to 3? [figure] Most important node?

  22. Centrality. [Figures comparing nodes X and Y under outdegree, indegree, closeness, and betweenness] Degree centrality (undirected): $C_D(n_i) = d(n_i) = \sum_j A_{ij} = \sum_j A_{ji}$, where $A$ is the adjacency matrix.

  23. Normalized degree centrality: $C'_D(i) = d(i) / (N - 1)$. When is degree not sufficient? It does not capture the ability to broker between groups, or the likelihood that information originating anywhere in the network reaches you.
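
Degree centrality is a row sum of the adjacency matrix. The normalized form divides by $N - 1$, the largest degree possible in a simple graph. This sketch assumes a symmetric 0/1 matrix with no self-loops; `degree_centrality` is a hypothetical name.

```python
def degree_centrality(A, normalized=False):
    """C_D(n_i) = d(n_i) = sum_j A[i][j] for a symmetric 0/1 adjacency matrix A.

    With normalized=True, divide by N - 1 (the largest possible degree).
    """
    n = len(A)
    degrees = [sum(row) for row in A]
    if normalized:
        return [d / (n - 1) for d in degrees]
    return degrees


# Star: node 0 is linked to nodes 1-4.
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
print(degree_centrality(star), degree_centrality(star, normalized=True))
```

The center of a star reaches the maximum normalized score of 1.0. Because the measure only counts neighbors, it cannot tell a broker between groups from a node inside one dense group.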

  24. Betweenness: assuming nodes communicate using the most direct (shortest) route, how many pairs of nodes have to pass information through the target node? [Figures comparing nodes X and Y] Betweenness examples (non-normalized): [figure with nodes A, B, C, D, E].

  25. Betweenness: definition. $C_B(i) = \sum_{j < k;\ j, k \neq i} g_{jk}(i) / g_{jk}$, where $g_{jk}$ is the number of shortest (geodesic) paths connecting $j$ and $k$, and $g_{jk}(i)$ is the number of those paths that node $i$ is on. Normalization: $C'_B(i) = C_B(i) / [(n - 1)(n - 2)/2]$, where the denominator is the number of pairs of vertices excluding the vertex itself. When are $C_d$ and $C_b$ not sufficient? They do not capture the likelihood that information originating anywhere in the network reaches you.
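
The definition can be evaluated directly, at the cost of speed. This brute-force sketch runs one BFS per node to get distances and shortest-path counts, then uses the fact that $g_{jk}(i) = g_{ji} \cdot g_{ik}$ whenever $d(j,i) + d(i,k) = d(j,k)$. It assumes an unweighted, undirected graph. Faster methods such as Brandes' algorithm exist, and the names here are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Hop distances and shortest-path counts from s (unweighted graph)."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, normalized=False):
    """C_B(i) = sum over pairs j < k (both != i) of g_jk(i) / g_jk."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    cb = {i: 0.0 for i in nodes}
    for j, k in combinations(nodes, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are disconnected: no paths to share
        for i in nodes:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            # i is on a j-k geodesic iff the two legs add up to d(j, k).
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                cb[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    if normalized:
        n = len(nodes)
        pairs = (n - 1) * (n - 2) / 2  # pairs of vertices not including i
        if pairs:
            cb = {i: v / pairs for i, v in cb.items()}
    return cb
```

In a 5-node star, the center lies on the single geodesic of all 6 leaf pairs, so its raw score is 6 and its normalized score is 1.0. In a 4-cycle, each node carries half of one pair's traffic, giving 0.5.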

  26. Closeness: definition. Being close to the center of the graph. Closeness centrality: $C_C(i) = \left[\sum_{j=1, j \neq i}^{N} d(i,j)\right]^{-1}$. Normalized closeness centrality: $C'_C(i) = C_C(i)\,(N - 1) = \frac{N - 1}{\sum_{j=1, j \neq i}^{N} d(i,j)}$. Examples: closeness [figure].
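
Closeness needs the distance from each node to all others, which one BFS per node provides in an unweighted graph. This sketch assumes a connected, undirected graph with at least two nodes. On a disconnected graph the sum would silently skip unreachable nodes. `closeness` is a hypothetical name.

```python
from collections import deque


def closeness(adj, normalized=True):
    """C_C(i) = 1 / sum_j d(i, j); the normalized version multiplies by N - 1.

    Assumes a connected, unweighted, undirected graph (d is hop count).
    """
    n = len(adj)
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())  # d(s, s) = 0 contributes nothing
        scores[s] = (n - 1) / total if normalized else 1 / total
    return scores
```

In a 5-node star the center is one hop from everyone, so its normalized closeness is 4/4 = 1. Each leaf has distance sum 1 + 3 × 2 = 7, giving 4/7.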

  27. Centrality in directed networks. Prestige ~ indegree centrality. Betweenness ~ consider directed shortest paths. Closeness ~ consider the nodes from which the target node can be reached. Influence range ~ the nodes reachable from the target node. These are straightforward modifications of the equations for undirected graphs.
      Characterizing nodes: different centrality metrics are generally positively correlated; when they are not, there is likely something interesting about the network. Suggest possible topologies and node positions to fit each combination:
      High degree, low closeness: node is embedded in a cluster that is far from the rest of the network.
      High degree, low betweenness: node's connections are redundant; communication bypasses him/her.
      High closeness, low degree: node links to a small number of important/active people.
      High closeness, low betweenness: many paths are likely to exist in the network; the node is near many people, but so are many others.
      High betweenness, low degree: node's few ties are crucial for network flow.
      High betweenness, low closeness: rare; the node monopolizes the ties from a small number of people to many others.

  28. Centralization: how equal are the nodes? Centralization is the variation in the centrality scores among the nodes. Freeman's general formula for (degree) centralization, where $C_D(n^*)$ is the maximum value in the network: $C_D = \frac{\sum_{i=1}^{N} [C_D(n^*) - C_D(i)]}{(N - 1)(N - 2)}$. Example (6 nodes: one of degree 5, five of degree 1): $C_D = \frac{(5 - 5) + (5 - 1) \times 5}{(6 - 1)(6 - 2)} = \frac{20}{20} = 1$.
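
Freeman's formula can be computed directly from the degree list. This sketch uses raw degrees, which the $(N-1)(N-2)$ denominator is scaled for, and assumes an undirected graph with at least 3 nodes. `degree_centralization` is a hypothetical name.

```python
def degree_centralization(adj):
    """Freeman degree centralization for an undirected graph with N >= 3 nodes.

    sum_i [C_D(n*) - C_D(i)] / [(N - 1)(N - 2)], with raw degrees C_D(i) = d(i).
    """
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    top = max(degrees)  # C_D(n*), the maximum degree in the network
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


star6 = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(degree_centralization(star6), degree_centralization(path5))
```

The 6-node star reproduces the worked example's value of 1. A 5-node path gives 2/12 ≈ 0.167, and any graph where every node has the same degree (such as a cycle) scores 0.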

  29. Examples: [figures] $C_D = 0.167$, $C_D = 1.0$, $C_D = 0.167$. Community Structure.

  30. How dense is it? density = $e / e_{max}$. Maximum possible number of edges: directed, $e_{max} = n(n-1)$; undirected, $e_{max} = n(n-1)/2$. Is everything connected?
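
The density formula in code, assuming the edge list has no duplicates or self-loops and $n \ge 2$. `density` is a hypothetical name.

```python
def density(n, edges, directed=False):
    """density = e / e_max, with e_max = n(n-1) if directed, else n(n-1)/2."""
    e_max = n * (n - 1) if directed else n * (n - 1) / 2
    return len(edges) / e_max
```

The same three links on 4 nodes give density 3/6 = 0.5 when read as undirected, but 3/12 = 0.25 when read as directed arcs, because a directed graph has twice as many possible edges.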

  31. Connected Components (directed). Strongly connected components: each node in the component can be reached from every other node in the component by following directed links; in the example, {B, C, D, E}, {A}, {G, H}, {F}. Weakly connected components: each node can be reached from every other node by following links in either direction; in the example, {A, B, C, D, E}, {G, H, F}. Community finding (clustering).
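
Both kinds of components can be computed with graph traversals. This sketch uses Kosaraju's two-pass algorithm for strong components and an undirected traversal for weak ones. The test graph's edges are invented to give the same component structure as the slide's example, since the figure's actual links are not available. Function names are hypothetical, every node must appear as a key in `adj`, and the recursive first pass may hit Python's recursion limit on very deep graphs.

```python
def strongly_connected(adj):
    """Kosaraju's algorithm; adj maps every node to a list of its out-neighbors."""
    order, seen = [], set()

    def visit(u):  # first pass: record nodes in order of DFS completion
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in adj:
        if u not in seen:
            visit(u)
    rev = {u: [] for u in adj}  # reversed graph
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    comps, assigned = [], set()
    for u in reversed(order):  # second pass on the reversed graph
        if u in assigned:
            continue
        comp, stack = set(), [u]
        assigned.add(u)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in rev[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def weakly_connected(adj):
    """Components when links may be followed in either direction."""
    undirected = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            undirected[u].add(v)
            undirected[v].add(u)
    comps, seen = [], set()
    for u in adj:
        if u in seen:
            continue
        comp, stack = set(), [u]
        seen.add(u)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in undirected[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


# Hypothetical edges: cycle B->C->D->E->B, A->B, F->G, G<->H.
adj = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["B"],
       "F": ["G"], "G": ["H"], "H": ["G"]}
```

Every strongly connected component sits inside a weakly connected one. Here A and F each form a singleton strong component, because nothing links back to them.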

  32. Hierarchical clustering. Process: calculate affinity weights W for all pairs of vertices; start with N disconnected vertices; add edges one by one between pairs of clusters in order of decreasing weight (using the closest distance to compare clusters); the result is nested components. Cluster Dendrograms [figure].
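
The process above, where pairs are added in decreasing-weight order and clusters are compared by their closest pair, is single-linkage agglomerative clustering. It can be sketched with a union-find structure. The weights here are treated as affinities (higher means more similar). `single_linkage` is a hypothetical name, and the returned merge history is the information a dendrogram draws.

```python
def single_linkage(nodes, weights):
    """Agglomerative clustering by adding pairs in order of decreasing affinity.

    weights: dict frozenset({u, v}) -> affinity W_uv (higher = more similar).
    Returns the merge history as (affinity, cluster_a, cluster_b) tuples.
    """
    parent = {u: u for u in nodes}
    members = {u: frozenset([u]) for u in nodes}

    def find(u):  # union-find root lookup with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    merges = []
    for pair, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        u, v = tuple(pair)
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same component: no new merge
        merges.append((w, members[ru], members[rv]))
        parent[rv] = ru
        members[ru] = members[ru] | members[rv]
        del members[rv]
    return merges


W = {frozenset("ab"): 0.9, frozenset("cd"): 0.8,
     frozenset("bc"): 0.3, frozenset("ac"): 0.1}
history = single_linkage("abcd", W)
```

Cutting the history at any affinity threshold gives the nested components at that level. For example, keeping only merges above 0.5 leaves {a, b} and {c, d}.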

  33. Hierarchical clustering (closeness) [figure]. Betweenness clustering: Girvan and Newman 2002 iterative algorithm: compute $C_b$ of all edges; remove the edge $i$ where $C_b(i) = \max(C_b)$; recalculate betweenness and repeat.
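
One level of the Girvan–Newman loop can be sketched as below. Edge betweenness is computed with a Brandes-style dependency accumulation, which is a swapped-in technique for speed. The loop stops at the first split instead of building the full hierarchy. All names are hypothetical, and the graph is assumed to be unweighted and undirected.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS: distances, path counts, predecessors, visit order
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in order}
        for w in reversed(order):  # accumulate from the farthest nodes back to s
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1 + delta[w])
                eb[frozenset((u, w))] += c
                delta[u] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair was counted from both ends


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove the max-betweenness edge, recomputing each time, until the graph splits."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    start = len(components(adj))
    while True:
        eb = edge_betweenness(adj)
        if not eb:
            return components(adj)
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > start:
            return comps


# Two triangles joined by the bridge c-d.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
```

The bridge c–d carries all 9 cross-triangle geodesics, so it is removed first and the two triangles emerge as communities.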

  34. Clustering coefficient. Local clustering coefficient: $C_i = \frac{\text{number of closed triplets centered on } i}{\text{number of connected triplets centered on } i}$; in the example, $C_i = 1/3 = 0.33$. Global clustering coefficient: $C_G = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$; in the example, $C_G = 3 \times 1/5 = 0.6$. Pattern finding: motifs. Define/search for a particular structure, e.g., complete triads [figure with nodes W, X, Y, Z].
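
Both coefficients reduce to counting neighbor pairs that are themselves linked. In this sketch (hypothetical names, undirected graph stored as neighbor sets), a triangle with one pendant vertex attached to $i$ reproduces the quoted values $C_i = 1/3$ and $C_G = 3 \times 1/5 = 0.6$. That graph is one configuration consistent with those numbers, not necessarily the slide's figure.

```python
from itertools import combinations


def local_clustering(adj, i):
    """C_i = closed triplets centered on i / connected triplets centered on i."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no triplets centered on i (a common convention)
    closed = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return closed / (k * (k - 1) / 2)


def global_clustering(adj):
    """C_G = 3 * number of triangles / number of connected triplets."""
    triplets = sum(len(n) * (len(n) - 1) / 2 for n in adj.values())
    # Counting closed triplets at every center sees each triangle 3 times.
    closed = sum(1 for i in adj for u, v in combinations(adj[i], 2) if v in adj[u])
    return closed / triplets if triplets else 0.0


# Triangle i-a-b plus a pendant vertex d attached to i.
g = {"i": {"a", "b", "d"}, "a": {"i", "b"}, "b": {"i", "a"}, "d": {"i"}}
```

Node i has 3 neighbor pairs, and only a–b is linked. The graph has 5 connected triplets in total (3 centered on i, 1 each on a and b) and one triangle.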

  35. Motifs can overlap in the network graph [figure: motif to be found; motif matches] (http://mavisto.ipk-gatersleben.de/frequency_concepts.html). 4-node subgraphs [figure].
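
A minimal search for the complete-triad motif shows how matches overlap. This sketch (hypothetical name, undirected graph as neighbor sets) enumerates triangles as node sets. In the complete graph on 4 nodes it finds 4 matches, and every pair of matches shares an edge. How such overlapping matches should be counted is a separate modeling choice that this code does not make.

```python
from itertools import combinations


def triangles(adj):
    """All complete triads {u, v, w}; matches may share nodes and edges."""
    found = set()
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                found.add(frozenset((u, v, w)))
    return found


k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
tris = triangles(k4)
```

Using a set of frozensets deduplicates matches that are discovered from different starting nodes.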

  36. Simulating network models. Small world network: Milgram (1967), mean path length in US social networks; ~6 hops separate any two people.
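
The quantity behind the "~6 hops" claim is the mean shortest-path length. For a given graph it can be measured with one BFS per node. This sketch averages over ordered pairs of distinct nodes where the second is reachable from the first. `mean_path_length` is a hypothetical name, and the code measures a graph rather than simulating any network model.

```python
from collections import deque


def mean_path_length(adj):
    """Average hop distance over ordered pairs (s, t), s != t, with t reachable from s."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

A 4-node path averages 20/12 ≈ 1.67 hops, and a 6-node ring averages 1.8. Running this on a large social network is what would test a small-world claim.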
