
An Intro to Graphs. Stefan Armbruster, Neo Technology



  1. An Intro to Graphs. Stefan Armbruster, Neo Technology

  2. Agenda • Introduction – NO-SQL context – What is Neo4j? – When/why should I use it? • Graph Queries – Cypher query language – Create and query data • Technical Overview – Deployment modes – Java APIs – Other libraries • Case Studies • Q&A

  3. Introduction

  4. Relational all the things (figure axes: volume, complexity)

  5. The Relational Crossroads

  6. Four NOSQL Categories arising from the “relational crossroads” • Denormalise: KV (key-value), CF (column family), Doc (document) • Normalise: Graph

  7. Four NOSQL Categories arising from the “relational crossroads” • Denormalise • Normalise

  8. Let’s talk about graphs

  9. What is a graph? • Vertex • Edge

  10. What is a graph? • Node • Relationship

  11. Meet Leonhard Euler • Swiss mathematician • Inventor of Graph Theory (1736) • Image: http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg

  12. Königsberg (Prussia) - 1736

  13. [Figure: the land masses of Königsberg, labelled A, B, C and D]

  14. [Figure: the land masses A, B, C and D, joined by bridges numbered 1 to 7]
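Euler's 1736 result can be checked mechanically: a connected graph has a walk that crosses every edge exactly once only if zero or two of its nodes have an odd number of edges. The following is a minimal sketch of that degree check. The bridge counts follow the historical layout, but which letter stands for which land mass is an assumption, since the figure itself is not recoverable here.

```python
from collections import Counter

# Seven bridges between four land masses. Which letter denotes which
# land mass is an assumption; here B is taken to be the central island.
BRIDGES = [
    ("A", "B"), ("A", "B"),              # two bridges from one bank to the island
    ("B", "C"), ("B", "C"),              # two bridges from the island to the other bank
    ("A", "D"), ("B", "D"), ("C", "D"),  # one bridge each to the fourth land mass
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return count

def has_euler_walk(edges):
    """Degree test only: assumes the graph is connected."""
    odd = [node for node, degree in degrees(edges).items() if degree % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four land masses have an odd number of bridges, so the test fails and no such walk exists.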

  15. What are graphs good for? Complexity

  16. Data Complexity: complexity = f(size, semi-structure, connectedness)

  17. Size

  18. The Real Complexity: complexity = f(size, semi-structure, connectedness)

  19. Semi-Structure

  20. Semi-Structure
      USER_ID | FIRST_NAME | LAST_NAME | EMAIL_1 | EMAIL_2 | FACEBOOK | TWITTER | SKYPE
      315 | Mark | Needham | mark.needham@neotechnology.com | m.h.needham@gmail.com | NULL | @markhneedham | mk_jnr1984
      Tables: USER, CONTACT, CONTACT_TYPE
      • Email: mark.needham@neotechnology.com
      • Email: m.h.needham@gmail.com
      • Twitter: @markhneedham
      • Skype: mk_jnr1984
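The contrast on this slide can be sketched in a few lines: a fixed set of contact columns forces a NULL wherever a user has no value, while a list of typed contacts stores only what exists. This is an illustrative sketch, not code from the talk. The helper name `to_contacts` and the column-to-type mapping are assumptions; the row values are the ones shown on the slide.

```python
# The wide relational row from the slide; None stands in for SQL NULL.
ROW = {
    "USER_ID": 315,
    "FIRST_NAME": "Mark",
    "LAST_NAME": "Needham",
    "EMAIL_1": "mark.needham@neotechnology.com",
    "EMAIL_2": "m.h.needham@gmail.com",
    "FACEBOOK": None,
    "TWITTER": "@markhneedham",
    "SKYPE": "mk_jnr1984",
}

# Assumed mapping from fixed columns to contact types.
CONTACT_COLUMNS = {
    "EMAIL_1": "Email",
    "EMAIL_2": "Email",
    "FACEBOOK": "Facebook",
    "TWITTER": "Twitter",
    "SKYPE": "Skype",
}

def to_contacts(row):
    """Turn the fixed contact columns into (type, value) pairs, skipping NULLs."""
    return [
        (contact_type, row[column])
        for column, contact_type in CONTACT_COLUMNS.items()
        if row.get(column) is not None
    ]

for contact_type, value in to_contacts(ROW):
    print(f"{contact_type}: {value}")
```

A third email address or a new contact type then becomes one more pair rather than one more column.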

  21. The Real Complexity: complexity = f(size, semi-structure, connectedness)

  22. Social Network

  23. Network Impact Analysis

  24. Route Finding

  25. Recommendations

  26. Logistics

  27. Access Control

  28. Fraud Analysis

  29. Neo4j is a Graph Database

  30. When Should I Use Graph Databases? • Densely-connected, semi-structured domains – Lots of join tables? Connectedness – Lots of sparse tables? Semi-structure • Data Model Volatility • Join Complexity and Performance • Millions of ‘joins’ per second • Consistent query times as dataset grows

  31. Graph Modeling

  32. Labeled Property Graph Data Model

  33. Relationships (continued) • Nodes can be connected by more than one relationship • Nodes can have more than one relationship • Self relationships are allowed
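The three rules on this slide can be made concrete with a small in-memory model. This is a sketch of a labeled property graph in plain Python, not Neo4j's API. The class names, the `relate` helper and the sample people are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    labels: set
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list, repr=False)
    incoming: list = field(default_factory=list, repr=False)

@dataclass(eq=False)
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)

def relate(start, rel_type, end, **properties):
    """Create a typed, directed relationship and attach it to both end points."""
    rel = Relationship(start, rel_type, end, properties)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel

alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})

relate(alice, "KNOWS", bob, since=2010)   # first relationship between the pair
relate(alice, "WORKS_WITH", bob)          # second relationship, same two nodes
relate(alice, "FOLLOWS", alice)           # self relationship

print([rel.type for rel in alice.outgoing])
```

Relationships carry their own properties (`since` here), which is what distinguishes a property graph from a plain edge list.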

  34. Graph Queries • A language for describing graphs • Creating nodes, relationships and properties • Querying data

  35. Querying a Graph • “Graph local” vs “Graph global” – Contextualized “ego-centric” queries • “Parachute” into graph – Start node(s) • Found through Index lookups • Crawl the surrounding graph – 2 million+ joins per second • No more Index lookups: Index-free adjacency
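A graph-local query does one index lookup to find the start node and then follows direct references from node to node. The sketch below illustrates that idea in plain Python and says nothing about Neo4j's storage format. The `Node` class, the `index` dictionary and the sample names are hypothetical.

```python
from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbours = []   # direct references: hopping needs no index

def connect(a, b):
    a.neighbours.append(b)
    b.neighbours.append(a)

def neighbourhood(start, max_depth):
    """Names of nodes within max_depth hops of start, in breadth-first order."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue               # do not expand beyond the depth limit
        for other in node.neighbours:
            if other not in seen:
                seen.add(other)
                found.append(other.name)
                queue.append((other, depth + 1))
    return found

# Hypothetical data: a chain Alice - Bob - Carol - Dave.
index = {name: Node(name) for name in ["Alice", "Bob", "Carol", "Dave"]}
connect(index["Alice"], index["Bob"])
connect(index["Bob"], index["Carol"])
connect(index["Carol"], index["Dave"])

start = index["Alice"]             # the only index lookup: the "parachute"
print(neighbourhood(start, 2))
```

The cost of the crawl depends on the size of the neighbourhood visited, not on the total number of nodes in `index`.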

  36. Queries: Pattern Matching • Pattern

  37. Start Node • Pattern

  38. Match • Pattern

  39. Match • Pattern

  40. Match • Pattern

  41. Non-Match • Pattern

  42. Non-Match • Pattern • Not anchored to start node
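The match and non-match cases can be illustrated with a toy matcher: a pattern counts as matched only when it can be laid over the graph starting at the chosen start node. The figures from these slides are not recoverable, so the pattern below (two KNOWS hops), the sample names and the `match` function are assumptions made for illustration.

```python
# Hypothetical graph as (start, type, end) triples.
EDGES = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Alice", "KNOWS", "Dave"),
    ("Eve", "KNOWS", "Frank"),
    ("Frank", "KNOWS", "Grace"),
]

def match(start, pattern, edges=EDGES):
    """Return every path from `start` whose relationship types follow `pattern`."""
    paths = [[start]]
    for rel_type in pattern:
        # Extend each partial path by one relationship of the required type.
        paths = [
            path + [end]
            for path in paths
            for (source, found_type, end) in edges
            if source == path[-1] and found_type == rel_type
        ]
    return paths

print(match("Alice", ["KNOWS", "KNOWS"]))
```

The path Eve, Frank, Grace has the same shape, but it is not anchored to the start node Alice, so it is not reported for her.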

  43. Other models to look at • Graph Gist https://github.com/neo4j-contrib/graphgist/wiki • Chapter 3 of Graph Databases • Neo4j Manual http://docs.neo4j.org/chunked/milestone/data-modeling-examples.html

  44. Technical Overview • Deployment modes • Java APIs • Additional libraries

  45. Embedded • Host in Java process • Access to Java APIs

  46. Server • HTTP/JSON interface • Server wraps embedded instance

  47. High Availability • Available in Enterprise edition • Scale horizontally for availability and read throughput – Scale vertically for writes • Master-Slave replication – Every instance is a full copy of the store • Master coordinates writes – Master is immediately consistent – Cluster is eventually consistent
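The difference between "immediately consistent" and "eventually consistent" can be shown with a toy model: the master applies each write at once, while a slave sees it only after it has caught up. This sketch is not Neo4j's replication protocol. The classes, the shared log and the `catch_up` step are assumptions chosen to keep the example small.

```python
class Instance:
    """One cluster member holding a full copy of the store."""
    def __init__(self):
        self.store = {}
        self.applied = 0          # number of log entries applied so far

class Cluster:
    def __init__(self, slave_count):
        self.master = Instance()
        self.slaves = [Instance() for _ in range(slave_count)]
        self.log = []             # ordered writes, coordinated by the master

    def write(self, key, value):
        """All writes go through the master, which applies them at once."""
        self.log.append((key, value))
        self.master.store[key] = value
        self.master.applied = len(self.log)

    def catch_up(self, slave):
        """A slave replays the log entries it has not yet applied."""
        for key, value in self.log[slave.applied:]:
            slave.store[key] = value
        slave.applied = len(self.log)

cluster = Cluster(slave_count=2)
cluster.write("name", "Neo")

slave = cluster.slaves[0]
before = slave.store.get("name")    # the slave has not seen the write yet
cluster.catch_up(slave)
after = slave.store.get("name")     # the slave now agrees with the master
print(before, after)
```

Reads can be served by any instance, which is why adding instances raises read throughput, while every write still passes through the single master.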

  48. Neo4j Architecture

  49. Other Libraries • Graph Algorithms – Shortest Path – Shortest Weighted Path – A* – Dijkstra – Custom cost evaluators – Available in the core distribution • Neo4j Spatial – Geospatial data – 3rd party library – Used in Telco production systems – https://github.com/neo4j/spatial
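As a reminder of what the shortest weighted path algorithms compute, here is a minimal sketch of Dijkstra's algorithm in plain Python. It is not the Neo4j graph algorithms API. The `dijkstra` function, the `ROADS` data and its weights are hypothetical, and the sketch assumes all weights are non-negative.

```python
import heapq

def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable."""
    queue = [(0, source, [source])]    # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue                   # a cheaper route to this node was already expanded
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical directed graph: node -> {neighbour: weight}.
ROADS = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 7},
    "B": {"D": 1},
}

print(dijkstra(ROADS, "A", "D"))
```

A custom cost evaluator, as listed on the slide, would correspond to computing `weight` from relationship properties instead of reading a fixed number.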

  50. Spring Data Neo4j • POJO based development • Dynamically generated repositories • Polyglot persistence – Object state persisted to graph and SQL database – Distributed transactions • Maintained by Neo Technology

  51. Case Studies

  52. Industry: Retail. Use case: Retail & C2C Delivery. San Francisco & London
      Background
      • As eBay seeks to expand its global retail presence, quick & predictable delivery is an important competitive cornerstone
      • To counter & upstage Amazon Prime, eBay acquired U.K.-based Shutl to form the core of a new delivery service, launching eBay Now (www.ebay.com/now) prior to Christmas 2013
      • Founded in 2009, Shutl was the U.K. leader in same-day delivery, with 70% of the market
      Business problem
      • Enable customer-selected delivery inside 90 min
      • Maintain a large network of routes covering many carriers and couriers. Calculate multiple routing operations simultaneously, in real time, across all possible routes
      • Scale to enable a variety of services, including same-day delivery, consumer-to-consumer shipping (www.shutl.it) and more predictable delivery times
      Solution & Benefits
      • Neo4j runs at the heart of the system, calculating all possible routes in real time for every order
      • The Neo4j-based solution is thousands of times faster than the prior MySQL solution
      • Queries require 10-100 times less code, improving time-to-market & code quality
      • Neo4j makes it possible to add functionality that was previously not possible, and to easily extend the platform over time

  53. Industry: Media. Use case: Master Data Management (Television EPG Data). London, UK
      Background
      • Zeebox is a well-established UK startup that offers second screen applications to end-users, advertisers and broadcasters
      • Founded by true media experts, Zeebox aims to reinvent TV since the advent of … TV.
      Business problem
      • Data complexity was growing exponentially as more broadcasters and more shows were being added
        – leading to development time increases for applications - a key strategic disadvantage in a fast-moving industry
      • Query times on the MySQL based model were starting to explode
        – risk of having worse end-user experience. This was “make or break” with respect to Zeebox’ offering and market position
      Solution & Benefits
      • Neo4j 2.0 offered a much simpler, natural way to model, implement and query their electronic program guide data
        – leading to faster development cycles
        – no “wedging” of the model into an artificial relational representation
      • Future-safe solution: adding more channels/broadcasters/programs does not complicate the model unnecessarily
      • Query times went from 80 seconds (MySQL) to 42 milliseconds (neo4j 2.0 traversal)

  54. Industry: Online Job Search. Use case: Social / Recommendations. Sausalito, CA
      Background
      • Online jobs and career community, providing anonymized inside information to job seekers
      • [Figure: Person and Company nodes connected by KNOWS and WORKS_AT relationships]
      Business problem
      • Wanted to leverage known fact that most jobs are found through personal & professional connections
      • Needed to rely on an existing source of social network data. Facebook was the ideal choice.
      • End users needed to get instant gratification
      • Aiming to have the best job search service, in a very competitive market
      Solution & Benefits
      • First-to-market with a product that let users find jobs through their network of Facebook friends
      • Job recommendations served real-time from Neo4j
      • Individual Facebook graphs imported real-time into Neo4j
      • Glassdoor now stores > 50% of the entire Facebook social graph
      • Neo4j cluster has grown seamlessly, with new instances being brought online as graph size and load have increased
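The Person, KNOWS and WORKS_AT shape from this slide lends itself to a simple recommendation: suggest the companies where a person's friends work. The sketch below illustrates that traversal in plain Python and is not Glassdoor's implementation. The names, companies and the `recommend_companies` function are hypothetical.

```python
from collections import Counter

# Hypothetical data in the shape of the slide's figure.
KNOWS = {
    "Ann": ["Ben", "Cat"],
    "Ben": ["Ann", "Dan"],
    "Cat": ["Ann"],
    "Dan": ["Ben"],
}
WORKS_AT = {"Ann": "Acme", "Ben": "Initech", "Cat": "Initech", "Dan": "Globex"}

def recommend_companies(person):
    """Rank companies by how many of the person's friends work there."""
    own_company = WORKS_AT.get(person)
    counts = Counter(
        WORKS_AT[friend]
        for friend in KNOWS.get(person, [])
        if friend in WORKS_AT and WORKS_AT[friend] != own_company
    )
    return counts.most_common()

print(recommend_companies("Ann"))
```

Each recommendation touches only the person's own neighbourhood, which is what allows results to be served in real time as the overall graph grows.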
