  1. Hej! (Hi!) @ryguyrg

  2. ABOUT ME
     • Developed web apps for 5 years, including e-commerce, business workflow, and more.
     • Worked at Google for 8 years on Google Apps and Cloud Platform.
     • Technologies: Python, Java, BigQuery, Oracle, MySQL, OAuth
     ryan@neo4j.com @ryguyrg

  3. Carpe Diem Data

  4. Why YOU Are (Hopefully) Here Today

  5-6. Power of Graph Algorithms to Understand Your Data

  7-8. Graph Algorithms on ACID: graph algorithms + an ACID-compliant native graph database

  9. [Diagram: example property graph with the nodes Person A, Person B, Acme Inc, Address X, Bank Bahamas, Bank US and Account 123, linked by relationships such as IS_OFFICER_OF, LIVES_AT, REGISTERED and HAS_ACCOUNT; callouts mark one NODE and one RELATIONSHIP]

  10. Anti-Money Laundering

  11. Product Recommendations

  12. Sports

  13. Literature

  14. Urban Planning

  15. Toxic Waste Management

  16. Historical Tooling

  17. [Slide of tool logos joined by "OR"; only NumPy is legible]

  18. The New World

  19. [Architecture diagram of the original pipeline. Components: Twitter Streaming API; Python tweet collection; RabbitMQ; MongoDB (includes user data); MySQL; R scripts (graph stats, community detection); .graphml export; graph visualization; Tableau; Neo4j]

  20. • Hit a wall with igraph/R • Need to scale graph algorithms

  21. [The architecture diagram from slide 19, shown again]

  22. OPTIMIZED FOR

  23. OLTP

  24. GREAT FOR

  25. Subgraph Queries

  26. WORKING ON

  27. Global Queries

  28. IN

  29. Neo4j Graph Algorithms

  30. Neo4j platform • Native Graph Database • Cypher Query Language • Analytics Integrations • Wide Range of APOC Procedures • Optimized Graph Algorithms

  31. Algorithm categories
     • Pathfinding: finds the optimal path or evaluates route availability and quality
     • Centrality: determines the importance of distinct nodes in the network
     • Community detection: evaluates how a group is clustered or partitioned
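To make "how a group is clustered" concrete, here is a minimal sketch of one such measure, the local clustering coefficient: the fraction of a node's neighbour pairs that are themselves connected. It assumes an undirected graph held as a Python dict mapping each node to a set of neighbours; the function name and data layout are illustrative, not part of the Neo4j library.

```python
from itertools import combinations


def local_clustering(adj: dict[str, set[str]]) -> dict[str, float]:
    """Local clustering coefficient of every node in an undirected graph.

    adj maps each node to the set of its neighbours (assumed symmetric).
    """
    result = {}
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            # No neighbour pairs exist, so this sketch defines the coefficient as 0.
            result[node] = 0.0
            continue
        # Count neighbour pairs that are linked to each other (closed triangles).
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        result[node] = links / (k * (k - 1) / 2)
    return result


if __name__ == "__main__":
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(local_clustering(graph))
```

In the sample graph, "b" and "c" sit in a closed triangle (coefficient 1.0), while "a" links three neighbours of which only one pair is connected (coefficient 1/3).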

  32. Usage
     1. Call as a Cypher procedure.
     2. Pass in a specification (Label, Prop, Query) and configuration.
     3. The ~.stream variant returns (a lot of) results:
        CALL algo.<name>.stream('Label','TYPE',{conf})
        YIELD nodeId, score
     4. The non-stream variant writes results to the graph and returns statistics:
        CALL algo.<name>('Label','TYPE',{conf})
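The difference between the stream and non-stream variants can be mimicked in plain Python. In this sketch, out-degree stands in for an arbitrary algorithm, a dict of per-node property dicts stands in for the graph, and the function names and statistics keys are hypothetical; it only illustrates "return every row" versus "write back, return statistics".

```python
import time
from collections import Counter
from typing import Iterator

Edge = tuple[int, int]


def compute_out_degree(edges: list[Edge]) -> Counter:
    # Stand-in "algorithm": score each source node by its number of outgoing edges.
    # Nodes without outgoing edges receive no score in this toy version.
    return Counter(src for src, _ in edges)


def stream_variant(edges: list[Edge]) -> Iterator[tuple[int, int]]:
    # Like the ~.stream procedures: hand every (nodeId, score) row back to the caller.
    for node_id, score in compute_out_degree(edges).items():
        yield node_id, score


def write_variant(edges: list[Edge], properties: dict[int, dict], write_property: str) -> dict:
    # Like the non-stream procedures: store scores on the nodes, return only statistics.
    start = time.perf_counter()
    scores = compute_out_degree(edges)
    compute_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    for node_id, score in scores.items():
        properties.setdefault(node_id, {})[write_property] = score
    write_ms = (time.perf_counter() - start) * 1000

    return {"nodes": len(scores), "computeMillis": compute_ms, "writeMillis": write_ms}
```

The stream form suits ad-hoc inspection (sort, limit, join with other data), while the write form avoids shipping millions of rows to the client and leaves the scores on the nodes for later queries.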

  33. What about Virtual Graphs?
     Pass in Cypher statements for the node and relationship lists:
        CALL algo.<name>(
          'MATCH ... RETURN id(n)',
          'MATCH (n)-->(m) RETURN id(n) as source, id(m) as target',
          {graph:'cypher'})
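Conceptually, a virtual graph is assembled from two result sets: a list of node ids and a list of (source, target) rows. The sketch below builds an adjacency list from exactly those two inputs. The function name is hypothetical, and dropping relationships whose endpoints are missing from the node list is an assumption of this sketch, not documented library behaviour.

```python
def build_projected_graph(node_rows, rel_rows):
    """Build an adjacency list from node-query and relationship-query results.

    node_rows: iterable of node ids (what 'MATCH ... RETURN id(n)' would produce).
    rel_rows: iterable of dicts with 'source' and 'target' keys.
    Relationships touching a node outside node_rows are skipped (assumption).
    """
    adjacency = {node_id: [] for node_id in node_rows}
    for row in rel_rows:
        src, dst = row["source"], row["target"]
        if src in adjacency and dst in adjacency:
            adjacency[src].append(dst)
    return adjacency


if __name__ == "__main__":
    nodes = [0, 1, 2]
    rels = [
        {"source": 0, "target": 1},
        {"source": 1, "target": 2},
        {"source": 2, "target": 5},  # node 5 is not in the node list
    ]
    print(build_projected_graph(nodes, rels))
```

Because the projection exists only for the duration of the call, the same stored data can be reshaped (filtered, collapsed across multi-hop paths) per analysis without changing the graph itself.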

  34. Supported Centrality Algos • PageRank (baseline) • Betweenness • Closeness • Degree
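Two of the simpler centrality measures can be sketched on an unweighted, undirected graph stored as a dict of neighbour lists. Degree counts neighbours; closeness here uses breadth-first search and divides the number of reachable nodes by the sum of hop distances to them. That normalisation is one common choice and is assumed for illustration; the library's own formula may differ.

```python
from collections import deque


def degree_centrality(adj):
    # Degree: number of neighbours of each node.
    return {node: len(nbrs) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    # Closeness: reachable nodes divided by the sum of BFS hop distances to them.
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        reachable = len(dist) - 1
        scores[source] = reachable / total if total else 0.0
    return scores


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(degree_centrality(path))
    print(closeness_centrality(path))
```

On the path a-b-c, the middle node "b" scores highest on both measures: it has two neighbours and is one hop from everything.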

  35. Supported Centrality Algos: PageRank example
     Stream the 20 highest-scoring pages:
        CALL algo.pageRank.stream('Page', 'LINKS', {iterations:20, dampingFactor:0.85})
        YIELD node, score
        RETURN node, score ORDER BY score DESC LIMIT 20
     Write the scores back to the graph as the "pagerank" property:
        CALL algo.pageRank('Page', 'LINKS',
          {iterations:20, dampingFactor:0.85, write: true, writeProperty:"pagerank"})
        YIELD nodes, loadMillis, computeMillis, writeMillis
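The `iterations` and `dampingFactor` settings in these calls correspond to the two knobs of PageRank's power iteration. The sketch below is the textbook normalised variant (scores sum to 1, dangling nodes spread their score evenly); it is an illustration of the algorithm, and Neo4j's implementation may scale or handle dangling nodes differently.

```python
def pagerank(out_links, iterations=20, damping_factor=0.85):
    """Textbook PageRank by power iteration; scores sum to 1.

    out_links maps every node to the list of nodes it links to.
    Dangling nodes (no out-links) spread their score evenly over all nodes.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Every node gets the teleport share plus its share of dangling mass.
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node splits its damped score evenly across its out-links.
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping_factor * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
    scores = pagerank(links, iterations=20, damping_factor=0.85)
    for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:20]:
        print(page, round(score, 4))
```

The final loop mirrors the streamed query above: order by score descending and keep the top 20.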

  36. Supported Pathfinding Algos • Single Source Shortest Path • All-Nodes SSP • Parallel BFS / DFS
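Single-source shortest path on a graph with non-negative weights is classically solved with Dijkstra's algorithm; the sketch below uses Python's `heapq` as the priority queue. The graph format (node to list of (neighbour, weight) pairs) is an assumption, and this is not necessarily how the Neo4j procedure is implemented. Running it once from every node gives a naive all-nodes variant.

```python
import heapq


def single_source_shortest_paths(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every reachable node.

    graph maps node -> list of (neighbour, non-negative weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for nbr, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    return dist


if __name__ == "__main__":
    roads = {
        "A": [("B", 4), ("C", 1)],
        "C": [("B", 2), ("D", 7)],
        "B": [("D", 1)],
    }
    print(single_source_shortest_paths(roads, "A"))
```

Here the direct edge A to B (weight 4) loses to the detour through C (1 + 2 = 3), which is exactly the kind of route-quality question slide 31 describes.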

  37. Goal: Iterate Quickly
     • Combine data from sources into one graph
     • Project to relevant subgraphs
     • Enrich data with algorithms
     • Traverse, collect, filter, and aggregate with queries
     • Visualize, Explore, Decide, Export
     • From all APIs and Tools

  38. A Note on Performance: Neo4j Is Significantly Faster
     [Bar chart: runtime in seconds for Union-Find (Connected Components) and PageRank, Spark GraphX vs Neo4j; bar values 416, 251, 152 and 124 seconds]
     • Spark GraphX (results publicly available): Amazon EC2 cluster running 64-bit Linux; 128 CPUs with 68 GB of memory, 2 hard disks
     • Neo4j configuration: physical machine running 64-bit Linux; 128 CPUs with 55 GB RAM, SSDs
     • Twitter 2010 dataset: 41.65 million nodes, 1.47 billion relationships
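For reference, the Union-Find benchmark computes connected components. A minimal disjoint-set sketch (with path halving) shows the idea: every edge merges the sets of its two endpoints, and nodes sharing a root end up in the same component. The function name and input format are illustrative only.

```python
def connected_components(nodes, edges):
    """Label each node with a component representative using union-find."""
    parent = {node: node for node in nodes}

    def find(x):
        # Path halving: point each visited node at its grandparent to flatten the tree.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)
    return {node: find(node) for node in nodes}


if __name__ == "__main__":
    labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
    print(labels)
```

Each edge is processed once with near-constant work, which is why the algorithm remains practical on a graph the size of the Twitter 2010 dataset.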

  39. What Does the Future Look Like?

  40. Improved Performance & Testing

  41. Improved Performance & Testing • Scaling via Parallel Processing

  42. Scaling Across the Cluster

  43. THANK YOU! ryan@neo4j.com @ryguyrg
