
Graph Algorithms for Community Detection & Recommendations



  1. Graph Algorithms for Community Detection & Recommendations Mark Needham & Amy Hodler, Neo4j

  2. Mark Needham (@markhneedham), Neo4j Labs Engineer • Amy Hodler (@amyhodler), Analytics & AI Programs

  3. Graph Algorithms for Community Detection & Recommendations • Graph Algorithms • Neo4j Social Network • Finding Influencers • Identifying Communities • Investigating the Graph Community

  4. What are Graph Analytics and Algorithms?

  5. Query (e.g. Cypher/Python) vs. Graph Algorithms Libraries
     • Query (e.g. Cypher/Python): real-time, local decisioning and pattern matching. Local patterns: you know what you're looking for and are making a decision.
     • Graph Algorithms Libraries: global analysis and iterations. Global computation: you're learning the overall structure of a network, updating data, and predicting.

  6. Don’t Need Graph Algorithms to Answer . . .
     • Questions with just a few connections or flat (not nested)
     • Questions solved with specific, well-crafted queries
     • Simple statistical results (sums, averages, ratios)
     • Regular reporting based on defined criteria and well-organized data

  7. What Do People Do with Graph Algorithms?

  8. Understand & Predict Complex Behavior: Propagation Pathways • Flow & Dynamics • Interactions & Resiliency. Requires Understanding Relationships and Structures.

  9. Using Graph Algorithms
     • Explore, Plan, Measure: find significant patterns and plan for optimal structures. Score outcomes and set a threshold value for a prediction.
     • Machine Learning: use the measures as features to train an ML model.
     1st Node | 2nd Node | Common Neighbors | Preferential Attachment | label
     1 | 2 | 4 | 15 | 1
     3 | 4 | 7 | 12 | 1
     5 | 6 | 1 | 1 | 0
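The two feature columns in the table above can be computed directly from neighbor sets. The following is a minimal Python sketch under stated assumptions: the toy graph and the function names are hypothetical and are not taken from the slides or from the Neo4j library. The label column is left out because it would come from an observed outcome, such as a relationship that appears in a later snapshot.

```python
from itertools import combinations

# Hypothetical undirected toy graph as an adjacency map (not from the slides).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
    "e": set(),
}

def common_neighbors(g, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(g[u] & g[v])

def preferential_attachment(g, u, v):
    """Product of the two node degrees."""
    return len(g[u]) * len(g[v])

def link_features(g):
    """One feature row per unordered node pair, usable as ML model input."""
    return [
        {
            "first": u,
            "second": v,
            "common_neighbors": common_neighbors(g, u, v),
            "preferential_attachment": preferential_attachment(g, u, v),
        }
        for u, v in combinations(sorted(g), 2)
    ]

for row in link_features(graph):
    print(row)
```

A score threshold on either column gives a simple rule-based prediction; feeding both columns to a classifier corresponds to the machine-learning path on the slide.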

  10. Neo4j Graph Algorithms Library

  11. +45 Graph & ML Algorithms in Neo4j
     • Pathfinding & Search: finds optimal paths or evaluates route availability and quality
     • Centrality / Importance: determines the importance of distinct nodes in the network
     • Community Detection: detects group clustering or partition options
     • Link Prediction: estimates the likelihood of nodes forming a future relationship
     • Similarity: evaluates how alike nodes are
     neo4j.com/graph-algorithms-book/

  12. Graph and ML Algorithms in Neo4j (updated April 2019)
     • Pathfinding & Search: Parallel Breadth First Search & DFS • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk
     • Centrality / Importance: Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality
     • Community Detection: Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity: 1 Step & Multi-Step • Balanced Triad (identification)
     • Link Prediction: Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors
     • Similarity: Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity
     neo4j.com/docs/graph-algorithms/current/

  13. How To… (applies to Pathfinding & Search, Community Detection, Centrality / Importance, Similarity, and Link Prediction)
     1. Call as Cypher procedure
     2. Pass in specification (Label, Prop, Query) and configuration
     3. The stream variant returns (a lot of) results: CALL algo.<name>.stream('Label','TYPE',{conf}) YIELD nodeId, score
     4. The non-stream variant writes results to the graph and returns statistics: CALL algo.<name>('Label','TYPE',{conf})

  14. Cypher Projection Pass in Cypher statement for node- and relationship-lists. CALL algo.<name>( 'MATCH ... RETURN id(n)', 'MATCH (n)-->(m) RETURN id(n) as source, id(m) as target', {graph:'cypher'})

  15. Cypher Projection Example: Russian Twitter Trolls https://www.nbcnews.com/pages/author/ben-popken

  16. Inferred Relationships AMPLIFIED

  17. PageRank on Inferred AMPLIFIED Graph CALL algo.pageRank( "MATCH (t:Troll) RETURN id(t) AS id", "MATCH (r1:Troll)-[:POSTED]->(:Tweet)<-[:RETWEETED]-(:Tweet)<-[:POSTED]-(r2:Troll) RETURN id(r2) as source, id(r1) as target", {graph:'cypher'}) https://www.nbcnews.com/tech/social-media/russian-trolls-went-attack-during-key-election-moments-n827176

  18. How does it work? 1) Read projected graph from Neo4j 2) Load projected graph into the in-memory graph (Projected Graph Loader) 3) Execute algorithm 4) Store results in Neo4j. Everything is concurrent.

  19. Architecture Considerations
     ● Parallelization: everything, leverage lots of CPUs
       ○ Community Edition restricted to 4 cores!
     ● Memory
       ○ Need enough heap to fit projected graph in memory
       ○ Memory requirements vary per algorithm
     ● Causal Clusters
       ○ Do not run graph algos on core members
       ○ Streaming method only for read replicas
       ○ Consider snapshot

  20. Enter the NEuler

  21. install.graphapp.io

  22. Investigating the Neo4j Social Graph

  23. Neo4j Twitter Graph

  24. Twint: Twitter scraping tool

  25. Neo4j Twitter Graph

  26. Centrality Algorithms: determine the importance of distinct nodes in the network. Each was developed for a distinct use or type of importance.

  27. Degree Centrality

  28. Degree Centrality: measures the number of direct relationships. Can measure in-degree, out-degree, or both. (Figure: In-Degree.)
     Tip / Caution: This is the simplest of the centrality algorithms. When globally averaged, it can be skewed by supernodes. Other algorithms are better for determining influence over more than just direct neighbors.
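As a rough illustration of in-degree, out-degree, and combined degree, here is a minimal Python sketch. The `follows` edge list and the function name are hypothetical and stand in for a directed relationship type such as a Twitter follow; this is not the Neo4j procedure.

```python
from collections import Counter

# Hypothetical directed relationships as (source, target) pairs.
follows = [
    ("ann", "bob"),
    ("cat", "bob"),
    ("dan", "bob"),
    ("bob", "ann"),
    ("dan", "ann"),
]

def degree_centrality(edges):
    """Return in-degree, out-degree, and their sum for every node."""
    out_deg = Counter(source for source, _ in edges)
    in_deg = Counter(target for _, target in edges)
    nodes = set(out_deg) | set(in_deg)
    # Counter returns 0 for nodes missing from one side.
    return {
        n: {"in": in_deg[n], "out": out_deg[n], "both": in_deg[n] + out_deg[n]}
        for n in nodes
    }

scores = degree_centrality(follows)
for name in sorted(scores, key=lambda n: scores[n]["in"], reverse=True):
    print(name, scores[name])
```

Ranking by in-degree approximates popularity, while out-degree approximates gregariousness.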

  29. Degree Centrality - Uses
     • Use When: understanding immediate connectedness or direct influence; quick estimation of network densities such as min/max and mean degrees
     • Examples: Popularity & Gregariousness; Likelihood of Flu; Individual probabilities

  30. Degree Centrality

  31. PageRank

  32. PageRank: measures the transitive (directional) influence of nodes and considers the influence of neighbors and their neighbors.
     Tip / Caution: Test your damping factor, as it will change results (the default works well for power-law distributions). Spark uses an inverse damping factor: resetProbability=0.15 is equal to dampingFactor:0.85 in Neo4j and other libraries. Be careful with mixing node types.

  33. Personalized PageRank Calculation CALL algo.pageRank('Page', 'LINKS', {iterations:20, dampingFactor:0.85, sourceNodes: [siteA]})
     Formula annotations (the formula image is not recoverable): nodes linking to “u” • “u” is the node being ranked • outdegree of that node • damping factor
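The calculation can be sketched as a plain power iteration. The following Python is an illustrative sketch under stated assumptions, not the Neo4j implementation: it uses the normalized textbook variant in which ranks sum to 1, the `links` graph is hypothetical, and `source_nodes` mimics the role of `sourceNodes` by restricting the teleport step to the given nodes. In Spark terms, `resetProbability` corresponds to `1 - damping`.

```python
def pagerank(links, damping=0.85, iterations=20, source_nodes=None):
    """Power-iteration PageRank; source_nodes switches to the personalized form."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    teleport_set = list(source_nodes) if source_nodes else nodes
    # Teleport distribution: uniform over all nodes, or only over source nodes.
    teleport = {
        n: (1.0 / len(teleport_set) if n in teleport_set else 0.0) for n in nodes
    }
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                # Each node splits its damped rank evenly over its outdegree.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling node: redistribute its rank via the teleport vector.
                for n in nodes:
                    new_rank[n] += damping * rank[u] * teleport[n]
        rank = new_rank
    return rank

# Hypothetical page graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

global_rank = pagerank(links)
personal_rank = pagerank(links, source_nodes=["a"])
print(global_rank)
print(personal_rank)
```

In the personalized run, page `d` receives no rank because nothing links to it and the teleport step never lands on it, which is the behavior that makes the personalized form useful for recommendations seeded from one user or page.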

  34. PageRank - Uses
     • Use When: anytime you’re looking for broad influence over a network. There are many domain-specific variations for differing analysis, e.g. Personalized PageRank for personalized recommendations.
     • Examples: Recommendations (Who To Follow with personalized PR); Fraud Detection (feature engineering for machine learning)

  35. PageRank

  36. Betweenness Centrality

  37. Betweenness Centrality: the sum of the % of shortest paths that pass through a node, calculated by pairs.
     Tip / Caution: Computationally intensive: use the RA-Brandes approximation on large graphs. Assumes all communication between nodes happens along the shortest path and with the same frequency (not always the case in real life).

  38. Betweenness Centrality: Node D Calculation
     Pairs with shortest paths through D | Total possible shortest paths for that pair | % of total through D (1/Total)
     A,E | 1 | 1
     B,E | 1 | 1
     C,E | 1 | 1
     B,C | 2 | 0.5 (through D & A)
     Betweenness score for D: 3.5
     (Figure: node scores A = 0.5, B = 0, C = 0, D = 3.5, E = 0.)
     1. For a node, find the shortest paths that go through it
        • B, C, and E have no shortest paths through them and are assigned a value of 0
     2. For each shortest path in step one, calculate its percentage of the total possible shortest paths for that pair
     3. Add together all the values in step two; this is the node’s Betweenness Centrality score
     4. Repeat for each node
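The four steps above can be checked with a small brute-force Python sketch. The edge list is an assumption: the slide's figure is not recoverable, so the edges below were chosen to be consistent with the pairs and path counts in the table. The function names are hypothetical, and the pairwise approach is only suitable for tiny graphs.

```python
from collections import deque
from itertools import combinations

# Assumed undirected edges, inferred from the table (not shown in the source).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")]

def build_adjacency(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_counts(adj, source):
    """Distances from source and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(edge_list):
    adj = build_adjacency(edge_list)
    info = {n: bfs_counts(adj, n) for n in adj}
    score = {n: 0.0 for n in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # pair is disconnected
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path when the distances add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score

scores = betweenness(edges)
print(scores)
```

Under the assumed edges, D scores 3.5 and A scores 0.5, matching the values on the slide.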

  39. Betweenness Centrality - Uses
     • Use When: identify bridges; uncover control points; find bottlenecks and vulnerabilities
     • Examples: Network Resilience (key points of cascading failure)

  40. Betweenness Centrality

  41. Community Detection Algorithms: evaluate how a group is clustered or partitioned. Different algorithms take different approaches to defining a community.

  42. Louvain Modularity

  43. Louvain Modularity: continually maximizes the modularity by comparing relationship weights and densities to an estimate/average.
     Tip / Caution, for ALL modularity algorithms:
     • Merge smaller communities into larger ones
     • Review intermediates
     • Can plateau with similar modularity on several partitions, forming local maxima and stalling progress
     • Treat as a guide and test/validate results
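The quantity being maximized can be computed for any given partition. The sketch below is a minimal Python illustration of the standard modularity score for an unweighted, undirected graph; it is not the Louvain algorithm itself and not the Neo4j implementation. The graph (two triangles joined by one edge) and all names are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q = sum over communities of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    internal = {}    # L_c: edges with both ends inside community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )

# Hypothetical graph: two triangles connected by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

two_groups = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_group = {n: "x" for n in range(1, 7)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Splitting along the bridge scores higher than lumping everything together, which is the comparison a modularity-based method repeats as it merges communities. Several different partitions can score nearly the same, which is why the slide advises reviewing intermediate results.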

  44. Louvain Modularity - Uses
     • Use When: community detection in large networks; uncover hierarchical structures in data; evaluate different grouping thresholds
     • Examples: Understanding the Brain (mapping hierarchy of functions)
