Big graphs for big data: parallel matching and clustering on billion-vertex graphs
Rob H. Bisseling
Mathematical Institute, Utrecht University
Collaborators: Bas Fagginger Auer, Fredrik Manne, Albert-Jan Yzelman
Asia-trip A-Eskwadraat, July 2014

Outline
◮ Graph Matching
  • Introduction
  • Greedy algorithm
  • Parallelisable 1/2-approximation algorithm
  • BSP algorithm
  • GPU algorithm
  • Results
◮ Clustering
  • Introduction
  • Sequential algorithm
  • GPU algorithm
  • Results
◮ Conclusion

Matching can win you a Nobel prize
Source: Slate magazine, October 15, 2012

Motivation of graph matching
◮ Graph matching is a pairing of neighbouring vertices.
◮ It has applications in
  • medicine: finding suitable donors for organs
  • social networks: finding partners
  • scientific computing: finding pivot elements in matrix computations
  • graph coarsening: making the graph smaller by merging similar vertices

Motivation of greedy/approximation graph matching
◮ Optimal solution is possible in polynomial time.
◮ Time for weighted matching in graph $G = (V, E)$ is $O(mn + n^2 \log n)$, with $n = |V|$ the number of vertices and $m = |E|$ the number of edges (Gabow 1990).
◮ The aim is a billion vertices, $n = 10^9$, with 100 edges per vertex, i.e. $m = 10^{11}$.
◮ Thus, a time of $O(10^{20})$ = 100,000 Petaflop units is far too long. Fastest supercomputer today, the Chinese Tianhe-2 (Milky-Way 2), performs 33.8 Petaflop/s.
◮ We need linear-time greedy or approximation algorithms.

Formal definition of graph matching
◮ A graph is a pair $G = (V, E)$ with vertices $V$ and edges $E$.
◮ All edges $e \in E$ are of the form $e = (v, w)$ for vertices $v, w \in V$.
◮ A matching is a collection $M \subseteq E$ of disjoint edges.
◮ Here, the graph is undirected, so $(v, w) = (w, v)$.

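The definition above can be checked mechanically. The following is a minimal sketch, not taken from the slides: the function name `is_matching` and the path graph used as an example are assumptions made for illustration.

```python
def is_matching(edges, candidate):
    """Return True if `candidate` is a matching in the graph with edge list `edges`."""
    # The graph is undirected, so (v, w) and (w, v) denote the same edge.
    edge_set = {frozenset(e) for e in edges}
    used = set()
    for v, w in candidate:
        if frozenset((v, w)) not in edge_set:
            return False  # not an edge of the graph, so M is not a subset of E
        if v in used or w in used:
            return False  # shares a vertex with an earlier edge, so not disjoint
        used.update((v, w))
    return True


# Hypothetical example graph: the path 1 - 2 - 3 - 4.
path = [(1, 2), (2, 3), (3, 4)]
print(is_matching(path, [(1, 2), (3, 4)]))  # True: the two edges are disjoint
print(is_matching(path, [(1, 2), (2, 3)]))  # False: both edges contain vertex 2
```
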
Maximal matching
◮ A matching is maximal if we cannot enlarge it further by adding another edge to it.

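Maximality is equivalent to saying that every edge of the graph touches at least one matched vertex. A small sketch of that test follows; the name `is_maximal` and the example graph are illustrative assumptions, and the input is assumed to be a valid matching already.

```python
def is_maximal(edges, matching):
    """Return True if no edge of the graph can be added to `matching`.

    Assumes `matching` is already a valid matching of the graph.
    """
    matched = {x for e in matching for x in e}
    # An edge can still be added only if both of its endpoints are unmatched.
    return all(v in matched or w in matched for v, w in edges)


# Hypothetical example graph: the path 1 - 2 - 3 - 4.
path = [(1, 2), (2, 3), (3, 4)]
print(is_maximal(path, [(2, 3)]))  # True: edges (1,2) and (3,4) are both blocked
print(is_maximal(path, [(1, 2)]))  # False: edge (3,4) can still be added
```
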
Maximum matching
◮ A matching is maximum if it possesses the largest possible number of edges, compared to all other matchings.

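To see the difference between maximal and maximum, a brute-force search on a tiny graph is enough. This sketch is an illustration only: it is exponential in the number of edges, and the names `maximum_matching` and `is_disjoint` are assumptions, not part of the talk.

```python
from itertools import combinations


def is_disjoint(candidate):
    """Return True if no vertex appears in more than one edge of `candidate`."""
    vertices = [x for e in candidate for x in e]
    return len(vertices) == len(set(vertices))


def maximum_matching(edges):
    """Brute force: try edge subsets from largest to smallest (tiny graphs only)."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            if is_disjoint(subset):
                return list(subset)
    return []


# Hypothetical example graph: the path 1 - 2 - 3 - 4.
path = [(1, 2), (2, 3), (3, 4)]
print(maximum_matching(path))  # [(1, 2), (3, 4)]
```

On this path, the single edge (2, 3) is a maximal matching, but it is not maximum, because the matching printed above has two edges.
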
Edge-weighted matching
◮ If the edges are provided with weights $\omega : E \to \mathbb{R}_{>0}$, finding a matching $M$ which maximises
  $\omega(M) = \sum_{e \in M} \omega(e)$,
  is called edge-weighted matching.
◮ Greedy matching provides us with maximal matchings, but not necessarily with maximum possible weight.

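The weight of a matching is simply the sum of its edge weights. The sketch below evaluates it for two matchings of a small path; the weights 2, 3, 2 and the name `matching_weight` are made up for illustration.

```python
def matching_weight(weights, matching):
    """Sum of the weights of the edges in `matching`.

    `weights` maps frozenset({v, w}) to the weight of the undirected edge (v, w).
    """
    return sum(weights[frozenset(e)] for e in matching)


# Hypothetical weighted path 1 - 2 - 3 - 4 with edge weights 2, 3, 2.
weights = {
    frozenset((1, 2)): 2.0,
    frozenset((2, 3)): 3.0,
    frozenset((3, 4)): 2.0,
}
print(matching_weight(weights, [(2, 3)]))          # 3.0
print(matching_weight(weights, [(1, 2), (3, 4)]))  # 4.0
```

Both matchings are maximal, but only the second one has maximum weight, which is the situation described in the last bullet.
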
Sequential greedy matching
◮ In random order, vertices $v \in V$ select and match neighbours one-by-one.
◮ Here, we can pick
  • the first available neighbour $w$ of $v$: greedy random matching
  • the neighbour $w$ with maximum $\omega(v, w)$: greedy weighted matching
◮ Or: we sort all the edges by weight, and successively match the vertices $v$ and $w$ of the heaviest available edge $(v, w)$. This is commonly called greedy matching.

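The two vertex-based variants can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the adjacency format, the function name `greedy_vertex_matching`, and the `seed` parameter for the random order are all hypothetical.

```python
import random


def greedy_vertex_matching(adj, weighted=False, seed=0):
    """Visit the vertices in random order and match each one to a free neighbour.

    adj: dict mapping vertex -> dict of neighbour -> edge weight (symmetric).
    weighted=False picks the first available neighbour (greedy random matching);
    weighted=True picks the heaviest available neighbour (greedy weighted matching).
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    mate = {}
    for v in order:
        if v in mate:
            continue  # v was already chosen by an earlier vertex
        free = [w for w in adj[v] if w not in mate and w != v]
        if not free:
            continue  # all neighbours are taken; v stays unmatched
        w = max(free, key=lambda u: adj[v][u]) if weighted else free[0]
        mate[v], mate[w] = w, v
    return {frozenset((v, w)) for v, w in mate.items()}


# Hypothetical weighted path 1 - 2 - 3 - 4 with edge weights 2, 3, 2.
adj = {
    1: {2: 2.0},
    2: {1: 2.0, 3: 3.0},
    3: {2: 3.0, 4: 2.0},
    4: {3: 2.0},
}
print(greedy_vertex_matching(adj, weighted=False))
print(greedy_vertex_matching(adj, weighted=True))
```

Whichever order the vertices are visited in, the result is a maximal matching: an edge with two unmatched endpoints would have been taken when the first of them was visited.
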
Sequential greedy random matching
[Figure: step-by-step animation of sequential greedy random matching on an example graph, labelled 1 to 9]

Greedy matching is a 1/2-approximation algorithm
◮ Weight $\omega(M) \geq \omega_{\text{optimal}} / 2$
◮ Cardinality $|M| \geq |M_{\text{card-max}}| / 2$, because $M$ is maximal.
◮ Time complexity is $O(m \log m)$, because all edges must be sorted.

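The bound can be observed on a small instance by comparing the edge-sorted greedy matching with a brute-force optimum. The sketch below is illustrative: the edge-list format, the function names and the example weights are assumptions, and the brute-force search is exponential, so it is only usable on tiny graphs.

```python
from itertools import combinations


def greedy_matching(weighted_edges):
    """Sort the edges by decreasing weight and take each edge whose endpoints are free.

    weighted_edges: list of (v, w, weight). The sort dominates the cost: O(m log m).
    """
    matched = set()
    M = []
    for v, w, wt in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        if v not in matched and w not in matched:
            M.append((v, w, wt))
            matched.update((v, w))
    return M


def optimal_weight(weighted_edges):
    """Maximum matching weight by brute force over all edge subsets (tiny graphs only)."""
    best = 0.0
    for k in range(1, len(weighted_edges) + 1):
        for subset in combinations(weighted_edges, k):
            vertices = [x for v, w, _ in subset for x in (v, w)]
            if len(vertices) == len(set(vertices)):  # edges are pairwise disjoint
                best = max(best, sum(wt for _, _, wt in subset))
    return best


# Hypothetical weighted path 1 - 2 - 3 - 4 with edge weights 2, 3, 2.
edges = [(1, 2, 2.0), (2, 3, 3.0), (3, 4, 2.0)]
M = greedy_matching(edges)
greedy_weight = sum(wt for _, _, wt in M)
print(M, greedy_weight, optimal_weight(edges))  # [(2, 3, 3.0)] 3.0 4.0
```

Here greedy reaches weight 3 against an optimum of 4, and one edge against a maximum of two, so both inequalities hold.
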
Parallel greedy matching: trouble
[Figure: example graph, labelled 1 to 9, shown in three animation steps]
◮ Suppose we match vertices simultaneously.
◮ Two vertices each find an unmatched neighbour...
◮ ...but generate an invalid matching.

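The conflict can be reproduced with a toy model in which every vertex inspects the same initial state and claims a neighbour at the same time. This is a hypothetical simulation written for illustration, not the parallel code from the talk; `simultaneous_proposals` and `conflicts` are made-up names.

```python
def simultaneous_proposals(adj):
    """Every vertex sees all neighbours as unmatched and claims its first one."""
    return {v: neighbours[0] for v, neighbours in adj.items() if neighbours}


def conflicts(claims):
    """Return the vertices that were claimed by more than one proposer."""
    claimed_by = {}
    for v, w in claims.items():
        claimed_by.setdefault(w, []).append(v)
    return {w: vs for w, vs in claimed_by.items() if len(vs) > 1}


# Hypothetical example graph: the path 1 - 2 - 3, as adjacency lists.
adj = {1: [2], 2: [1, 3], 3: [2]}
claims = simultaneous_proposals(adj)
print(claims)             # {1: 2, 2: 1, 3: 2}
print(conflicts(claims))  # {2: [1, 3]}
```

Vertices 1 and 3 both find vertex 2 unmatched, so accepting both claims would put vertex 2 in two edges, which is not a matching.
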
Parallelisable dominant-edge algorithm
    while $E \neq \emptyset$ do
        pick a dominant edge $(v, w) \in E$
        $M := M \cup \{(v, w)\}$
        $E := E \setminus \{(x, y) \in E : x = v \lor x = w\}$
        $V := V \setminus \{v, w\}$
    return $M$
◮ An edge $(v, w) \in E$ is dominant if
  $\omega(v, w) = \max \{\omega(x, y) : (x, y) \in E \land (x = v \lor x = w)\}$
[Figure: an edge $(v, w)$ and its neighbouring edges, with edge weights 2, 5, 3, 9, 6, 7, 6, 8]

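A sequential sketch of the dominant-edge idea follows. Two things in it are assumptions rather than part of the slide: ties between equal weights are broken by the endpoint numbers, and all dominant edges found in one pass are matched together in a single round, which is meant to hint at why the algorithm parallelises. The function name and the example weights are hypothetical.

```python
def dominant_edge_matching(weighted_edges):
    """Repeatedly match every edge that is the heaviest edge at both of its endpoints.

    weighted_edges: list of (v, w, weight) for a simple undirected graph.
    """
    # Total order on edges: weight first, endpoints as tie-break (an assumption).
    def key(e):
        return (e[2], min(e[0], e[1]), max(e[0], e[1]))

    E = list(weighted_edges)
    M = []
    while E:
        # Record the heaviest remaining edge incident to each vertex.
        best = {}
        for e in E:
            for x in e[:2]:
                if x not in best or key(e) > key(best[x]):
                    best[x] = e
        # Dominant edges: heaviest at both endpoints, hence pairwise disjoint.
        dominant = [e for e in E if best[e[0]] is e and best[e[1]] is e]
        M.extend(dominant)
        matched = {x for e in dominant for x in e[:2]}
        # Remove every edge incident to a newly matched vertex.
        E = [e for e in E if e[0] not in matched and e[1] not in matched]
    return M


# Hypothetical weighted path 1 - 2 - 3 - 4 - 5 - 6 with edge weights 5, 9, 6, 7, 8.
edges = [(1, 2, 5), (2, 3, 9), (3, 4, 6), (4, 5, 7), (5, 6, 8)]
print(dominant_edge_matching(edges))  # [(2, 3, 9), (5, 6, 8)]
```

In this example both dominant edges are found in the first round, independently of each other, and no sorting of the full edge list is needed.
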