Why Graphs? Discussion is based on the book and slides by Let us - PDF document

Let us now look at implementing graph algorithms in MapReduce.

Why Graphs?
• Discussion is based on the book and slides by Jimmy Lin and Chris Dyer
• Analyze hyperlink structure of the Web
• Social networks
  – Facebook friendships, Twitter followers, email flows, phone call patterns
• Transportation networks
  – Roads, bus routes, flights
• Interactions between genes, proteins, etc.

What is a Graph?
• G = (V, E)
  – V: set of vertices (nodes)
  – E: set of edges (links), E ⊆ V × V
• Edges can be directed or undirected
• Graph might have cycles or not (acyclic graph)
• Nodes and edges can be annotated
  – E.g., social network: node has demographic information like age; edge has type of relationship like friend or family

Graph Problems
• Graph search and path planning
  – Find driving directions from A to B
  – Recommend possible friends in social network
  – How to route IP packets or delivery trucks
• Graph clustering
  – Identify communities in social networks
  – Partition large graph to parallelize graph processing
• Minimum spanning trees
  – Connected graph of minimum total edge weight

More Graph Problems
• Bipartite graph matching
  – Match nodes on “left” with nodes on “right” side
  – E.g., match job seekers and employers, singles looking for dates, papers with reviewers
• Maximum flow
  – Maximum traffic between source and sink
  – E.g., optimize transportation networks
• Finding “special” nodes
  – E.g., disease hubs, leader of a community, people with influence

Graph Representations
• Usually one of these two:
  – Adjacency matrix
  – Adjacency list
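As a rough sketch of the two representations just named, the following Python builds both from one edge list. The 4-node graph and all function names are assumptions made for illustration. The sketch also shows why inlinks are harder to get from an adjacency list than outlinks, and how multiplying the matrix by itself counts two-step paths.

```python
# Illustrative 4-node directed graph (node ids and edges are assumed for this sketch).
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (4, 1), (4, 3)]


def to_adjacency_matrix(nodes, edges):
    """|N| x |N| matrix; entry [i][j] is 1 if there is an edge i -> j, else 0."""
    index = {n: k for k, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
    return matrix


def to_adjacency_list(nodes, edges):
    """Map each node to the list of its outlinks (one compact 'row' per node)."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    return adj


def inlinks(adj, target):
    """Inlinks require scanning every node's outlink list."""
    return [u for u, outs in adj.items() if target in outs]


def two_step_paths(matrix):
    """Matrix product M x M: entry [i][j] counts the two-step paths from i to j."""
    size = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


matrix = to_adjacency_matrix(nodes, edges)
adj = to_adjacency_list(nodes, edges)
print(matrix)
print(adj)
print(inlinks(adj, 1))
print(two_step_paths(matrix)[0][0])
```

The matrix stores all |N|² entries even when most are zero, while the list stores only existing edges, which is the space trade-off for sparse graphs.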

Adjacency Matrix
• Matrix M of size |N| by |N|
  – Entry M(i,j) contains weight of edge from node i to node j; 0 if no edge

        1  2  3  4
    1   0  1  0  1
    2   1  0  1  1
    3   1  0  0  0
    4   1  0  1  0

Example source: Jimmy Lin

Properties
• Advantages
  – Easy to manipulate with linear algebra
    • M × M: entry (i,j) = number of two-step paths to go from node i to node j
  – Operation on outlinks and inlinks corresponds to iteration over rows and columns
• Disadvantage
  – Huge space overhead for sparse matrix
  – E.g., Facebook friendship graph

Adjacency List
• Compact row-wise representation of matrix

    1: 2, 4
    2: 1, 3, 4
    3: 1
    4: 1, 3

Properties
• Advantages
  – More space-efficient
  – Still easy to compute over outlinks for each node
• Disadvantage
  – Difficult to compute over inlinks for each node
• Note: remember inverse Web graph discussion

Parallel Breadth-First Search
• Case study: single-source shortest path problem
  – Find the shortest path from a source node s to all other nodes in the graph
• For non-negative edge weights, Dijkstra’s algorithm is the classic sequential solution
  – Initialize distance d[s]=0, all others to ∞
  – Maintain priority queue of nodes sorted by distance
  – Remove first node u from queue and update d[v] for each node v in adjacency list of u if (1) v is in queue and (2) d[v] > d[u]+weight(u,v)

Dijkstra’s Algorithm Example
[Figure: initial state of the example on a weighted graph; source has distance 0, all other nodes have distance ∞. Example from CLR; example from Jimmy Lin’s presentation.]
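The sequential procedure described above can be sketched in Python as follows. This is a minimal sketch, not the slides' own code. Python's `heapq` has no decrease-key operation, so the sketch pushes a new queue entry on every update and skips stale entries when they are popped. The node names `s, t, x, y, z` and the edge directions are assumptions for illustration; the weights are the ones that appear in the example figure.

```python
import heapq
import math


def dijkstra(graph, source):
    """Single-source shortest paths for non-negative edge weights.

    graph maps each node to a list of (neighbor, weight) pairs.
    """
    dist = {node: math.inf for node in graph}
    dist[source] = 0
    queue = [(0, source)]  # priority queue ordered by tentative distance
    done = set()           # nodes already removed from the queue
    while queue:
        d_u, u = heapq.heappop(queue)
        if u in done:
            continue  # stale entry left over from an earlier update
        done.add(u)
        for v, w in graph[u]:
            # Update only nodes still "in the queue" whose distance improves.
            if v not in done and dist[v] > d_u + w:
                dist[v] = d_u + w
                heapq.heappush(queue, (dist[v], v))
    return dist


# Assumed example graph (names and directions are hypothetical).
example = {
    "s": [("t", 10), ("y", 5)],
    "t": [("x", 1), ("y", 2)],
    "x": [("z", 4)],
    "y": [("t", 3), ("x", 9), ("z", 2)],
    "z": [("s", 7), ("x", 6)],
}
print(dijkstra(example, "s"))
```

On this assumed graph the final distances are 0, 8, 9, 5 and 7 for s, t, x, y and z.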

Dijkstra’s Algorithm Example (continued)
[Figures: five further steps of the same example, showing the tentative distances being updated as nodes are removed from the priority queue. Example from CLR.]

Parallel Single-Source Shortest Path
• Priority queue is core element of Dijkstra’s algorithm
  – No global shared data structure in MapReduce
• Dijkstra’s algorithm proceeds sequentially, node by node
  – Taking non-min node could affect correctness of algorithm
• Solution: perform parallel breadth-first search

Parallel Breadth-First Search
• Start at source s
• In first round, find all nodes reachable in one hop from s
• In second round, find all nodes reachable in two hops from s, and so on
• Keep track of min distance for each node
  – Also record corresponding path
• Iterations stop when no shorter path possible

BFS Visualization
[Figure: breadth-first search expanding over a graph with nodes n0 to n9. Example from Jimmy Lin’s presentation.]

MapReduce Code: Single Iteration

    map(nid n, node N)   // N stores node’s current min distance and adjacency list
      d = N.distance
      emit(nid n, N)     // Pass along graph structure
      for all nid m in N.adjacencyList do
        emit(nid m, d + w(n,m))   // Emit distances to reachable nodes

    reduce(nid m, [d1,d2,…])
      dMin = ∞; M = ∅
      for all d in [d1,d2,…] do
        if isNode(d) then
          M = d                   // Recover graph structure
        else if d < dMin then     // Look for min distance in list
          dMin = d
      if dMin < M.distance        // Needed to avoid overwriting of source node’s distance
        M.distance = dMin         // Update node’s shortest distance
      emit(nid m, node M)

Overall Algorithm
• Need driver program to control the iterations
• Initialization: SourceNode.distance = 0, all others have distance = ∞
• When to stop iterating?
• If all edges have weight 1, can stop as soon as no node has ∞ distance any more
  – Can detect this with Hadoop counter
• Number of iterations depends on graph diameter
  – In practice, many networks show the small-world phenomenon, e.g., six degrees of separation

Dealing With Diverse Edge Weights
• “Detour” path can be shorter than “direct” connection, hence cannot stop as soon as all node distances are finite
• Stop when no node’s shortest distance changes any more
  – Can be detected with Hadoop counter
  – Worst case: |N| iterations
[Figure: graph with nodes n1 to n9 in which a direct edge of weight 10 competes with a longer path of weight-1 edges. Example from Jimmy Lin’s presentation.]

MapReduce Algorithm Analysis
• Brute-force approach that performs many irrelevant computations
  – Computes distances for nodes that still have infinity distance
  – Repeats previous computations inside “search frontier”
• Dijkstra’s algorithm only explores the search frontier, but needs the priority queue
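To make the iteration scheme concrete, here is a small single-process Python simulation of the map, shuffle, reduce and driver steps. It is only a sketch: no Hadoop API is used, the in-memory `defaultdict` stands in for the shuffle phase, and the `updates` variable plays the role that a Hadoop counter would play. The function names and the example graph are assumptions, and every node is assumed to have its own record in the input.

```python
import math
from collections import defaultdict


def map_node(nid, node):
    """Emit the node record itself, then a candidate distance for each neighbor."""
    yield nid, node  # pass along graph structure
    for m, weight in node["adj"].items():
        yield m, node["distance"] + weight


def reduce_node(nid, values):
    """Recover the node record and keep the smallest candidate distance."""
    d_min, node = math.inf, None
    for value in values:
        if isinstance(value, dict):  # the node record, not a distance
            node = value
        elif value < d_min:
            d_min = value
    changed = d_min < node["distance"]
    if changed:
        node = {**node, "distance": d_min}
    return node, changed


def parallel_bfs(graph, source):
    """Driver: repeat map/shuffle/reduce until no distance changes."""
    state = {
        nid: {"distance": 0 if nid == source else math.inf, "adj": dict(adj)}
        for nid, adj in graph.items()
    }
    iterations = 0
    while True:
        shuffled = defaultdict(list)  # stand-in for the shuffle phase
        for nid, node in state.items():
            for key, value in map_node(nid, node):
                shuffled[key].append(value)
        updates = 0  # stand-in for a Hadoop counter
        new_state = {}
        for nid, values in shuffled.items():
            node, changed = reduce_node(nid, values)
            new_state[nid] = node
            if changed:
                updates += 1
        state = new_state
        iterations += 1
        if updates == 0:
            break
    return {nid: node["distance"] for nid, node in state.items()}, iterations


# Assumed weighted example graph: node -> {neighbor: edge weight}.
graph = {
    "s": {"t": 10, "y": 5},
    "t": {"x": 1, "y": 2},
    "x": {"z": 4},
    "y": {"t": 3, "x": 9, "z": 2},
    "z": {"s": 7, "x": 6},
}
distances, rounds = parallel_bfs(graph, "s")
print(distances, rounds)
```

Every round recomputes candidate distances for all nodes, including those far from the search frontier, which is the brute-force cost noted in the analysis above. The last round changes nothing and serves only to detect termination.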

Typical Graph Processing in MapReduce
• Graph represented by adjacency list per node, plus extra node data
• Map works on a single node u
  – Node u’s local state and links only
• Node v in u’s adjacency list is intermediate key
  – Passes results of computation along outgoing edges
• Reduce combines partial results for each destination node
• Map also passes graph itself to reducers
• Driver program controls execution of iterations

PageRank Introduction
• Popularized by Google for evaluating the quality of a Web page
• Based on random Web surfer model
  – Web surfer can reach a page by jumping to it or by following the link from another page pointing to it
  – Modeled as random process
• Intuition: important pages are linked from many other (important) pages
  – Goal: find pages with greatest probability of access

PageRank Definition
• PageRank of page n:
  – $P(n) = \alpha \frac{1}{|V|} + (1 - \alpha) \sum_{m \in L(n)} \frac{P(m)}{C(m)}$
  – |V| is number of pages (nodes)
  – α is probability of random jump
  – L(n) is the set of pages linking to n
  – P(m) is m’s PageRank
  – C(m) is m’s out-degree
• Definition is recursive
  – Compute by iterating until convergence (fixpoint)

Computing PageRank
• Similar to BFS for shortest path
• Computing P(n) only requires P(m) and C(m) for all pages linking to n
  – During iteration, distribute P(m) evenly over outlinks
  – Then add contributions over all of n’s inlinks
• Initialization: any probability distribution over the nodes

PageRank Example
[Figure: two PageRank iterations on a five-node graph; the values shown on the nodes are listed below. Source: Jimmy Lin’s presentation.]

    Node   Initial   After iteration 1   After iteration 2
    n1     0.2       0.066               0.1
    n2     0.2       0.166               0.133
    n3     0.2       0.166               0.183
    n4     0.2       0.3                 0.2
    n5     0.2       0.3                 0.383
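One iteration of the PageRank computation can be sketched in Python as below. Two things are assumptions made for this sketch and are not stated in the slides: the edge set of the five-node graph, which was chosen so that it reproduces the node values listed in the example, and the setting α = 0 (no random jump), which is the value under which those numbers come out. The sketch also assumes every node has at least one outlink.

```python
def pagerank_iteration(graph, rank, alpha):
    """One PageRank step: P(n) = alpha/|V| + (1 - alpha) * sum of P(m)/C(m) over inlinks."""
    num_nodes = len(graph)
    incoming = {node: 0.0 for node in graph}
    for m, outlinks in graph.items():
        share = rank[m] / len(outlinks)  # distribute P(m) evenly over outlinks
        for target in outlinks:
            incoming[target] += share    # add contributions over inlinks
    return {
        node: alpha / num_nodes + (1 - alpha) * incoming[node] for node in graph
    }


# Assumed edge set (hypothetical, chosen to match the example's node values).
graph = {
    "n1": ["n2", "n4"],
    "n2": ["n3", "n5"],
    "n3": ["n4"],
    "n4": ["n5"],
    "n5": ["n1", "n2", "n3"],
}
rank0 = {node: 1 / len(graph) for node in graph}  # uniform initialization
rank1 = pagerank_iteration(graph, rank0, alpha=0.0)
rank2 = pagerank_iteration(graph, rank1, alpha=0.0)
print(rank1)
print(rank2)
```

In a MapReduce setting the inner loop corresponds to the map phase emitting `share` keyed by `target`, and the final sum corresponds to the reduce phase for each destination node.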
