DiCeS: Detecting Communities in Network Streams Over the Cloud
Panagiotis Liakos†, Katia Papakonstantinopoulou‡, Alexandros Ntoulas†, Alex Delis†
† University of Athens, ‡ Athens University of Economics and Business
12th IEEE International Conference on Cloud Computing, Milan, Italy, July 8th–13th, 2019
Belgian Mobile Phone Network
- Two large clusters of communities
- Limited interaction between clusters!
- Brussels acts as a bridge!
Source: Fast unfolding of community hierarchies in large networks, Blondel et al.
UoA Panagiotis Liakos DiCeS • Motivation 2/24
Climate change conversation on Twitter (source: carbonbrief.org)
- Real-world networks are massive!
- They change rapidly!
- They exhibit community structure!
Motivation
We want to extract the community structure of the nodes in a network that changes rapidly. This has many useful applications:
- launching accurate and successful advertising campaigns
- providing more informative and engaging social network feeds
- gaining insights into the evolution of large real-world networks
The size of graph data keeps increasing:
- Facebook has more than 2 billion registered users
- Google indexes more than 1 trillion unique URLs
Prior Contribution: CoEuS [LND17] (IEEE Big Data 2017)
A novel community detection algorithm that operates on a graph stream, using space sublinear in the number of edges. Additionally:
- a PageRank-like edge quality variation
- a novel clustering technique for community size determination
CoEuS' context
[Figure: a graph stream of edges feeds communities that are initialized with seed-sets]
CoEuS is centralized by design.
DiCeS' context
[Figure: the edges of the graph stream are spread across several worker nodes]
Our Contribution
- We propose DiCeS, a novel distributed community detection algorithm for network streams.
- We implement DiCeS as a cloud application that handles streams of real-world networks at high rates: using just 8 workers, it processes 50 million edges per hour.
- We achieve near-linear horizontal scalability.
- We offer significant improvements in accuracy.
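To put the 50-million-edges-per-hour figure in perspective, the short sketch below converts it into per-second rates. It assumes, purely for illustration, that the load is spread evenly over the 8 workers. The slides do not state how the load is actually balanced.

```python
# Back-of-the-envelope rates for "50 million edges per hour with 8 workers".
# ASSUMPTION: the load is spread evenly over the workers (not stated in the slides).
EDGES_PER_HOUR = 50_000_000
WORKERS = 8
SECONDS_PER_HOUR = 3600


def edge_rates(edges_per_hour: int, workers: int) -> tuple[float, float]:
    """Return (cluster-wide edges per second, edges per second per worker)."""
    overall = edges_per_hour / SECONDS_PER_HOUR
    return overall, overall / workers


overall, per_worker = edge_rates(EDGES_PER_HOUR, WORKERS)
print(f"{overall:,.0f} edges/s overall, {per_worker:,.0f} edges/s per worker")
# prints: 13,889 edges/s overall, 1,736 edges/s per worker
```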
Apache Storm
Apache Storm is a stream processing framework with broad use in production environments.
- Tuple: the fundamental data unit
- Spout: a source of tuples
- Bolt: transforms streams into the desired result
- Grouping: determines how tuples are distributed among the tasks of a bolt
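The slides do not say which grouping DiCeS uses. To show what a grouping decides, the sketch below imitates two of Storm's built-in groupings in plain Python:

- **Shuffle grouping** spreads tuples randomly over a bolt's tasks.
- **Fields grouping** sends every tuple with the same value of a chosen field to the same task.

The function names are hypothetical. Both the independent random choice and the hash-modulo rule are simplifications of what Storm does internally.

```python
import random


def shuffle_grouping(tuples, num_tasks, seed=0):
    """Assign each tuple to a task chosen at random (simplification: Storm's
    shuffle grouping additionally balances the counts across tasks)."""
    rng = random.Random(seed)
    return [rng.randrange(num_tasks) for _ in tuples]


def fields_grouping(tuples, num_tasks, field_index=0):
    """Send tuples with equal values at `field_index` to the same task
    (hash-modulo is a simplification of Storm's internal rule)."""
    return [hash(t[field_index]) % num_tasks for t in tuples]


edges = [(1, 2), (1, 3), (2, 3), (1, 4)]
print(shuffle_grouping(edges, 3))  # a random task id per edge
print(fields_grouping(edges, 3))   # every edge with u == 1 lands on the same task
```

With shuffle grouping, any task may receive any edge, so per-edge state has to live outside the bolt. Fields grouping instead keeps all edges of a node on one task.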
Redis
Redis is an in-memory key-value data store.
- Ultra-fast read/write operations
- Complex data types: Strings, Sets, Sorted Sets
- Redis Cluster
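Redis Cluster spreads keys over its nodes by hashing each key into one of 16,384 hash slots. The slot is the CRC16 (XMODEM variant) of the key modulo 16384. If the key contains a non-empty `{...}` hash tag, only the tag is hashed, which lets related keys share a slot.

The sketch below reimplements that mapping in Python. The key names in the usage lines (such as `{C42}:members`) are hypothetical. The slides do not describe DiCeS' key layout.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty hash tag: hash only its contents
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_hash_slot("123456789"))                                        # 12739
print(key_hash_slot("{C42}:members") == key_hash_slot("{C42}:scores"))   # True
```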
Design Principles
- Scalability: isolate the processing of every edge; use a distributed key-value store
- Fault tolerance: all edges must be processed; failing nodes must be restored
- Interactivity: update the target communities; obtain results on demand
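The fault-tolerance requirement that every edge be processed is what Storm's acking mechanism provides: a spout tracks in-flight tuples and replays those that fail. The toy loop below mimics that behaviour in a single process. A cap on in-flight tuples plays the role of Storm's `topology.max.spout.pending` setting, which presumably corresponds to the "maximum allowed pending tuples" of the scalability experiments.

This is a simplified sketch, not Storm's implementation. `run_at_least_once` and `flaky_bolt` are hypothetical names.

```python
from collections import Counter, deque


def run_at_least_once(edges, process, max_pending):
    """Toy spout/acker loop: keep at most `max_pending` tuples in flight,
    acknowledge tuples that `process` handles and replay those that fail.
    Note: a tuple that always fails is retried forever in this sketch."""
    queue = deque(enumerate(edges))  # (message id, edge) pairs not yet in flight
    processed = []
    while queue:
        # Emit up to `max_pending` tuples (cf. topology.max.spout.pending).
        in_flight = [queue.popleft() for _ in range(min(max_pending, len(queue)))]
        for msg_id, edge in in_flight:
            if process(edge):
                processed.append(edge)        # ack
            else:
                queue.append((msg_id, edge))  # fail: schedule a replay
    return processed


attempts = Counter()


def flaky_bolt(edge):
    attempts[edge] += 1
    return not (edge == (2, 3) and attempts[edge] == 1)  # first try of (2, 3) fails


result = run_at_least_once([(1, 2), (2, 3), (3, 4)], flaky_bolt, max_pending=2)
print(result)  # [(1, 2), (3, 4), (2, 3)]
```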
DiCeS' Spout
- Community initialization
- Stream ingestion
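Algorithm 1 distinguishes two kinds of tuple from the spout: a 1-element tuple (a renewed set of communities) and a 2-element tuple (an edge). A minimal spout matching this contract could look like the generator below.

It assumes a whitespace-separated edge list with integer node ids and `#` comment lines, in the style of SNAP datasets. The input format and the name `dices_spout` are assumptions, not details from the slides.

```python
from typing import Iterable, Iterator


def dices_spout(seed_sets: dict, edge_lines: Iterable[str]) -> Iterator[tuple]:
    """Yield tuples in the shape Algorithm 1 expects (hypothetical input format)."""
    # Community initialization: a 1-element tuple carrying the seed-sets.
    yield (seed_sets,)
    # Stream ingestion: one (u, v) tuple per edge line.
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        u, v = line.split()[:2]
        yield (int(u), int(v))


stream = ["# FromNodeId ToNodeId", "1 3", "3\t4", ""]
for tup in dices_spout({"C1": {1, 2}}, stream):
    print(len(tup), tup)
```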
DiCeS' Bolts
- Stream processing
- Community expansion
- Community pruning
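The slides do not spell out how pruning works. One plausible reading, sketched below under that assumption, keeps only the highest-scoring members of each community. The score is the value stored by `communities[C].put(...)` in Algorithm 1. The target size of each community is taken as given (in CoEuS it comes from a separate clustering technique). `prune` and `target_size` are hypothetical names.

```python
import heapq


def prune(communities: dict, target_size: dict) -> dict:
    """Keep the `target_size[C]` highest-scoring members of every community C.

    communities: community id -> {node: membership score}
    target_size: community id -> desired number of members (assumed given)
    """
    return {
        c: dict(heapq.nlargest(target_size[c], members.items(), key=lambda kv: kv[1]))
        for c, members in communities.items()
    }


print(prune({"C1": {1: 0.9, 2: 0.2, 3: 0.5}}, {"C1": 2}))  # {'C1': {1: 0.9, 3: 0.5}}
```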
Our topology
[Figure: a spout ingests the network stream and the community seed-sets and feeds multiple processing bolts and a pruning bolt, alongside a distributed key-value store (Redis Cluster)]
The parallelism of a running topology can be changed with Storm's rebalance command:
    $ storm rebalance topology-name [-n new-num-workers] [-e component=parallelism]*
DiCeS' Bolt
Algorithm 1: DiCeS
input: a tuple emitted from the spout
begin
    if tuple.length == 1 then                  // renewed set of communities
        communities ← tuple[0]
    else                                       // handling of an edge
        u ← tuple[0]; v ← tuple[1]
        degrees[u] += 1; degrees[v] += 1
        foreach C ∈ nc[u] ∪ nc[v] do
            if u ∈ C then cDegrees[C][v] += cDegrees[C][u] / degrees[u]
            if v ∈ C then cDegrees[C][u] += cDegrees[C][v] / degrees[v]
            if u ∈ C then
                communities[C].put(v, cDegrees[C][v] / degrees[v])
                nc[v].add(C)
            if v ∈ C then
                communities[C].put(u, cDegrees[C][u] / degrees[u])
                nc[u].add(C)
    emit(1)
end
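Below is a single-process Python rendering of Algorithm 1, meant only to make the update rule concrete. It rests on these assumptions:

- Plain dictionaries stand in for the Redis Cluster.
- The update terms are read as fractions (cDegrees[C][u] / degrees[u] and so on).
- Seed members start with a community degree of 1.0. The slides give no initial value, and with 0 nothing would ever propagate.
- `nc` is rebuilt whenever a renewed set of communities arrives.
- Membership of u and v in each community is read once, before any update.

The class and attribute names are hypothetical.

```python
from collections import defaultdict


class DiCeSBoltSketch:
    """Single-process sketch of Algorithm 1; plain dicts stand in for Redis."""

    def __init__(self, seed_degree: float = 1.0):
        self.seed_degree = seed_degree   # ASSUMPTION: initial cDegree of seed members
        self.communities = {}            # C -> {node: membership score}
        self.degrees = defaultdict(int)  # node -> edges seen so far
        self.c_degrees = defaultdict(lambda: defaultdict(float))  # C -> node -> cDegree
        self.nc = defaultdict(set)       # node -> communities containing it
        self.emitted = []                # stands in for Storm's emit()

    def execute(self, tup: tuple) -> None:
        if len(tup) == 1:
            # Renewed set of communities: {C: iterable of member nodes}.
            self.communities = {c: {n: self.seed_degree for n in nodes}
                                for c, nodes in tup[0].items()}
            self.nc.clear()              # ASSUMPTION: rebuild the node -> communities index
            for c, members in self.communities.items():
                for n in members:
                    self.nc[n].add(c)
                    self.c_degrees[c].setdefault(n, self.seed_degree)
        else:
            u, v = tup
            self.degrees[u] += 1
            self.degrees[v] += 1
            for c in self.nc[u] | self.nc[v]:
                members, cd = self.communities[c], self.c_degrees[c]
                u_in, v_in = u in members, v in members  # read once, before updates
                if u_in:
                    cd[v] += cd[u] / self.degrees[u]
                if v_in:
                    cd[u] += cd[v] / self.degrees[v]
                if u_in:
                    members[v] = cd[v] / self.degrees[v]
                    self.nc[v].add(c)
                if v_in:
                    members[u] = cd[u] / self.degrees[u]
                    self.nc[u].add(c)
        self.emitted.append(1)


bolt = DiCeSBoltSketch()
bolt.execute(({"C1": {1, 2}},))  # seed-set of community C1
bolt.execute((1, 3))             # 3 joins C1 with score 1.0 / 1
bolt.execute((3, 4))             # 4 joins C1 with score (1.0 / 2) / 1
print(bolt.communities["C1"])
```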
Datasets
Graph        Type           Nodes        Edges          Av. Degree (|E|/|V|)  Av. Community Size
DBLP         Co-authorship  317,080      1,049,866      3.31                  22.45
Amazon       Co-purchasing  334,863      925,872        2.76                  13.49
Youtube      Social         1,134,890    2,987,624      2.63                  14.59
LiveJournal  Social         3,997,962    34,681,189     8.67                  27.80
Orkut        Social         3,072,441    117,185,083    38.14                 215.72
Friendster   Social         65,608,366   1,806,067,135  27.53                 46.81
- The largest network (Friendster) exceeds 1.8 billion edges.
- Accompanying ground-truth communities allow us to evaluate accuracy.
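The Av. Degree values in the dataset table equal the number of edges divided by the number of nodes. The usual average degree of an undirected graph, 2|E|/|V|, is twice that. The snippet below recomputes both ratios from the Nodes and Edges columns so the convention can be checked.

```python
# (nodes, edges) copied from the dataset table.
DATASETS = {
    "DBLP": (317_080, 1_049_866),
    "Amazon": (334_863, 925_872),
    "Youtube": (1_134_890, 2_987_624),
    "LiveJournal": (3_997_962, 34_681_189),
    "Orkut": (3_072_441, 117_185_083),
    "Friendster": (65_608_366, 1_806_067_135),
}

for name, (nodes, edges) in DATASETS.items():
    # |E|/|V| matches the table's column; 2|E|/|V| is the conventional average degree.
    print(f"{name:<12} |E|/|V| = {edges / nodes:6.2f}   2|E|/|V| = {2 * edges / nodes:6.2f}")
```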
Performance
[Figure: average processing time per edge on Amazon, DBLP, Youtube, LiveJournal, Orkut, and Friendster with 2, 4, and 8 bolts]
We can reduce our processing time by adding bolts.
Scalability
[Figure: execution time (in s) for 2, 4, and 8 worker nodes, with the maximum number of pending tuples set to 5, 10, 15, and 20 thousand]
- The maximum allowed number of pending tuples impacts the performance.
- DiCeS offers near-linear scaling!
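The slides do not define "near-linear scaling" formally. A standard way to quantify it, given here as an assumption rather than the paper's own metric, uses speedup and parallel efficiency. Both are measured relative to the smallest configuration, w_0 = 2 worker nodes, where T(w) is the execution time with w workers:

```latex
S(w) = \frac{T(w_0)}{T(w)}, \qquad
E(w) = \frac{w_0 \, S(w)}{w}, \qquad
w \in \{2, 4, 8\},\; w_0 = 2
```

Perfectly linear scaling gives E(w) = 1, that is, T(8) = T(2)/4. Near-linear scaling means E(w) stays close to 1.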
Fault Tolerance
[Figure: processing time (in sec) versus total edges processed (in thousands), from 0 to 1,000 thousand edges]
DiCeS recovers its speed almost immediately.
Average Degree & Number of Communities
[Figure: average processing time per edge of CoEuS and DiCeS (8 bolts) on graphs with average degree 10 or 20 and 2K or 4K communities]
Increasing the average degree and the number of communities has less impact on DiCeS than on CoEuS.