Graph Algorithms for Community Detection & Recommendations Mark Needham & Amy Hodler, Neo4j
Mark Needham (@markhneedham), Neo4j Labs Engineer
Amy Hodler (@amyhodler), Analytics & AI Programs
Graph Algorithms for Community Detection & Recommendations
• Graph Algorithms
• Neo4j Social Network
• Finding Influencers
• Identifying Communities
• Investigating the Graph Community
What are Graph Analytics and Algorithms?
Query (e.g. Cypher/Python): real-time, local decisioning and pattern matching
• Local patterns: you know what you're looking for and are making a decision
Graph Algorithms Libraries: global analysis and iterations
• Global computation: you're learning the overall structure of a network, updating data, and predicting
Don't Need Graph Algorithms to Answer . . .
• Questions with just a few connections or flat (not nested)
• Questions solved with specific, well-crafted queries
• Simple statistical results (sums, averages, ratios)
• Regular reporting based on defined criteria and well-organized data
What Do People Do with Graph Algorithms?
Understand & Predict Complex Behavior
• Propagation Pathways
• Flow & Dynamics
• Interactions & Resiliency
Requires Understanding Relationships and Structures
Using Graph Algorithms
Explore, Plan, Measure: Find significant patterns and plan for optimal structures. Score outcomes and set a threshold value for a prediction.
Machine Learning: Use the measures as features to train an ML model.

1st Node   2nd Node   Common Neighbors   Preferential Attachment   label
1          2          4                  15                        1
3          4          7                  12                        1
5          6          1                  1                         0
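The two feature columns in the table can be sketched in plain Python. This is a minimal illustration, not the Neo4j library's implementation: the function name `link_prediction_features` and the toy graph are assumptions made for the example. Common Neighbors counts shared neighbors of the pair; Preferential Attachment multiplies the two nodes' degrees.

```python
def link_prediction_features(adjacency, node_a, node_b):
    """Compute two link-prediction measures for a pair of nodes.

    adjacency maps each node to the set of its neighbors (undirected graph).
    """
    neighbors_a = adjacency[node_a]
    neighbors_b = adjacency[node_b]
    return {
        # Number of neighbors the two nodes share.
        "common_neighbors": len(neighbors_a & neighbors_b),
        # Product of the two nodes' degrees.
        "preferential_attachment": len(neighbors_a) * len(neighbors_b),
    }


# Hypothetical toy graph: nodes 1 and 2 share neighbors 3 and 4.
toy_graph = {
    1: {3, 4, 5},
    2: {3, 4},
    3: {1, 2},
    4: {1, 2},
    5: {1},
}

features = link_prediction_features(toy_graph, 1, 2)
print(features)
```

Each pair's feature values would form one row of a training table like the one above, with the label column recording whether the relationship actually exists.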
Neo4j Graph Algorithms Library
45+ Graph & ML Algorithms in Neo4j
• Pathfinding & Search: Finds optimal paths or evaluates route availability and quality
• Centrality / Importance: Determines the importance of distinct nodes in the network
• Community Detection: Detects group clustering or partition options
• Link Prediction: Estimates the likelihood of nodes forming a future relationship
• Similarity: Evaluates how alike nodes are
neo4j.com/graph-algorithms-book/
Graph and ML Algorithms in Neo4j (Updated April 2019)
Pathfinding & Search
• Parallel Breadth First Search & DFS
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
neo4j.com/docs/graph-algorithms/current/
How To…
1. Call as a Cypher procedure
2. Pass in a specification (Label, Prop, Query) and configuration
3. The stream variant returns (a lot of) results
   CALL algo.<name>.stream('Label','TYPE',{conf}) YIELD nodeId, score
4. The non-stream variant writes results to the graph and returns statistics
   CALL algo.<name>('Label','TYPE',{conf})
Cypher Projection
Pass in Cypher statements for the node list and the relationship list.
  CALL algo.<name>(
    'MATCH ... RETURN id(n) AS id',
    'MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target',
    {graph:'cypher'})
Cypher Projection Example: Russian Twitter Trolls
https://www.nbcnews.com/pages/author/ben-popken
Inferred Relationships AMPLIFIED
PageRank on Inferred AMPLIFIED Graph
  CALL algo.pageRank(
    "MATCH (t:Troll) RETURN id(t) AS id",
    "MATCH (r1:Troll)-[:POSTED]->(:Tweet)<-[:RETWEETED]-(:Tweet)<-[:POSTED]-(r2:Troll)
     RETURN id(r2) as source, id(r1) as target",
    {graph:'cypher'})
https://www.nbcnews.com/tech/social-media/russian-trolls-went-attack-during-key-election-moments-n827176
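What the relationship statement infers can be sketched outside the database. The snippet below is an illustration under assumptions: the function `infer_amplified`, the dictionary shapes, and the sample accounts are all made up for the example. It derives an (amplifier, original author) pair whenever one account's tweet retweets another account's tweet, mirroring the source and target columns of the projection.

```python
def infer_amplified(posted, retweeted):
    """Derive inferred AMPLIFIED edges.

    posted:    tweet id -> account that posted it
    retweeted: retweet id -> id of the original tweet
    Returns a set of (amplifier, original_author) pairs.
    """
    edges = set()
    for retweet_id, original_id in retweeted.items():
        amplifier = posted.get(retweet_id)
        original_author = posted.get(original_id)
        # Skip retweets whose author or original tweet is unknown.
        if amplifier is not None and original_author is not None:
            edges.add((amplifier, original_author))
    return edges


# Hypothetical sample data: two accounts retweet a tweet by troll_a.
posted = {"t1": "troll_a", "t2": "troll_b", "t3": "troll_c"}
retweeted = {"t2": "t1", "t3": "t1"}

print(sorted(infer_amplified(posted, retweeted)))
```

Unlike the Cypher statement, which returns one row per retweet, this sketch collapses repeated amplification between the same two accounts into a single pair.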
How does it work?
1) Read projected graph
2) Load projected graph
3) Execute algorithm
4) Store results
Everything is concurrent.
[Diagram: Neo4j, graph loader, in-memory projected graph, procedures]
Architecture Considerations
● Parallelization - everything, leverage lots of CPUs
  ○ Community Edition restricted to 4 cores!
● Memory
  ○ Need enough heap to fit projected graph in memory
  ○ Memory requirements vary per algorithm
● Causal Clusters
  ○ Do not run graph algos on core members
  ○ Streaming method only for read replicas
  ○ Consider snapshot
Enter the NEuler
install.graphapp.io
Investigating the Neo4j Social Graph
Neo4j Twitter Graph
Twint: Twitter scraping tool
Centrality Algorithms
Determine the importance of distinct nodes in the network. Developed for distinct uses or types of importance.
Degree Centrality
Degree Centrality
Measures the number of direct relationships. Can measure in-degree, out-degree, or both.
Tip / Caution: This is the simplest of the centrality algorithms. When globally averaged, it can be skewed by supernodes. Other algorithms are better for determining influence over more than just direct neighbors.
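Counting direct relationships is simple enough to sketch in a few lines. This is a minimal illustration, not the library's procedure; the function name `degree_centrality` and the sample follower edges are assumptions for the example. It also shows how a single supernode pulls the average away from what most nodes look like.

```python
from collections import Counter


def degree_centrality(edges):
    """Count in-degree, out-degree, and total degree from (source, target) pairs."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    nodes = set(in_degree) | set(out_degree)
    return {
        node: {
            "in": in_degree[node],
            "out": out_degree[node],
            "both": in_degree[node] + out_degree[node],
        }
        for node in nodes
    }


# Hypothetical FOLLOWS edges: three accounts follow "hub", which follows one back.
follows = [("ann", "hub"), ("bob", "hub"), ("cat", "hub"), ("hub", "ann")]
degrees = degree_centrality(follows)

# The supernode "hub" lifts the mean in-degree above most nodes' own value.
mean_in_degree = sum(d["in"] for d in degrees.values()) / len(degrees)
print(degrees["hub"], mean_in_degree)
```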
Degree Centrality - Uses
Use When
• Understanding immediate connectedness or direct influence
• Quick estimation of network densities such as min/max and mean degrees
• Individual probabilities
Example uses
• Popularity & Gregariousness
• Likelihood of Flu
Degree Centrality
PageRank
PageRank
Measures the transitive (directional) influence of nodes and considers the influence of neighbors and their neighbors.
Tip / Caution:
• Test your damping factor, as it will change results (the default works well for power law distributions).
• Spark uses an inverse damping factor: resetProbability=0.15 is equal to dampingFactor:0.85 in Neo4j and other libraries.
• Be careful with mixing node types.
Personalized PageRank Calculation
  CALL algo.pageRank('Page', 'LINKS',
    {iterations:20, dampingFactor:0.85, sourceNodes: [siteA]})
[Formula figure; labeled terms: node being ranked (“u”), nodes linking to “u”, outdegree of that node, damping factor]
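The terms labeled on the slide (nodes linking to u, their outdegree, the damping factor) fit together as in the power-iteration sketch below. It is an assumption-laden illustration rather than Neo4j's implementation: it uses the textbook normalized form where ranks sum to 1, it restarts only at the source nodes, and the function name and four-page toy graph are invented for the example. Library implementations may scale scores differently.

```python
def personalized_pagerank(edges, source_nodes, damping=0.85, iterations=100):
    """Power-iteration PageRank that teleports only to source_nodes."""
    sources = set(source_nodes)
    nodes = sorted({n for edge in edges for n in edge} | sources)
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Teleport distribution: uniform over the source nodes, zero elsewhere.
    teleport = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for v in nodes:
            if out_links[v]:
                # Each node linking to u passes on rank divided by its outdegree.
                share = damping * rank[v] / len(out_links[v])
                for u in out_links[v]:
                    new_rank[u] += share
            else:
                # Dangling node: hand its rank back to the teleport distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[v] * teleport[n]
        rank = new_rank
    return rank


# Hypothetical link graph: A -> B -> C -> A, plus D -> A.
links = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
ranks = personalized_pagerank(links, ["A"])
print(ranks)
```

Because restarts only land on A, page D (which nothing links to) ends with a rank of zero, which is the point of personalization: scores describe influence as seen from the chosen source nodes.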
PageRank - Uses
Use When
• Anytime you’re looking for broad influence over a network
• Many domain specific variations for differing analysis, e.g. Personalized PageRank for personalized recommendations
Example uses
• Recommendations: Who To Follow with personalized PR
• Fraud Detection: Feature engineering for machine learning
PageRank
Betweenness Centrality
Betweenness Centrality
The sum of the % shortest paths that pass through a node, calculated by pairs.
Tip / Caution:
• Computationally intensive: use RA Brandes approximation on large graphs.
• Assumes all communication between nodes happens along the shortest path and with the same frequency (not always the case in real life).
Betweenness Centrality: Node D Calculation

Pairs with Shortest Paths Through D   Total Possible Shortest Paths for that Pair   % of Total Through D (1/Total)
A,E                                   1                                             1
B,E                                   1                                             1
C,E                                   1                                             1
B,C                                   2 (through D & A)                             0.5
Betweenness Score                                                                   3.5

[Figure: graph of nodes A, B, C, D, E labeled with scores D = 3.5, A = 0.5, B = 0, C = 0, E = 0]

1. For a node, find the shortest paths that go through it
   • B, C, and E have no shortest paths through them and are assigned a value of 0
2. For each shortest path in step one, calculate its percentage of the total possible shortest paths for that pair
3. Add together all the values in step two; this is the node's Betweenness Centrality score
4. Repeat for each node
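The worked example can be checked with a short script. Two assumptions apply: the edge list below is reconstructed from the pairs in the table (the original figure did not survive extraction), and the code uses Brandes' exact algorithm for unweighted, undirected graphs rather than whatever the library runs internally. The function name is chosen for the example.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Exact betweenness (Brandes) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: total / 2.0 for v, total in score.items()}


# Assumed edges, reconstructed from the table: A-B, A-C, A-D, B-D, C-D, D-E.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["A", "B", "C", "E"],
    "E": ["D"],
}

scores = betweenness_centrality(graph)
print(scores)
```

On this reconstructed graph the script reproduces the slide's values: D scores 3.5, A scores 0.5, and B, C, and E score 0.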
Betweenness Centrality - Uses
Use When
• Identify bridges
• Uncover control points
• Find bottlenecks and vulnerabilities
Example use
• Network Resilience: Key points of cascading failure
Betweenness Centrality
Community Detection Algorithms
Evaluate how a group is clustered or partitioned. Different approaches to define a community.
Louvain Modularity
Louvain Modularity
Continually maximizes the modularity by comparing relationship weights and densities to an estimate/average.
Tip / Caution: ALL Modularity algorithms:
• Merge smaller communities into larger ones
• Review intermediates
• Can plateau with similar modularity on several partitions, forming local maxima & stalling progress
• Treat as a guide and test/validate results
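The quantity being maximized can be computed directly for a given partition. The sketch below evaluates the standard modularity score for an unweighted, undirected graph; it does not perform the Louvain optimization itself, and the function name, the two-triangle graph, and the partitions are assumptions made for the example.

```python
def modularity(edges, community_of):
    """Modularity of a partition: for each community, the fraction of edges
    inside it minus the fraction expected from its share of total degree."""
    m = len(edges)
    internal_edges = {}
    degree_sum = {}
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal_edges[ca] = internal_edges.get(ca, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical graph: two triangles joined by a single bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the kind of comparison a modularity-based method makes as it merges communities. Comparing scores across several candidate partitions is also one way to review intermediates.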
Louvain Modularity - Uses
Use When
• Community detection in large networks
• Uncover hierarchical structures in data
• Evaluate different grouping thresholds
Example use
• Understanding the Brain: Mapping hierarchy of functions