
How graph databases started the multi-model revolution - Luca Garulli


  1. How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015

  2. Welcome to Big Data: “90% of the data in the world today has been created in the last two years alone.” - IBM

  3. Just Data. Diagram: Commodore, Jill (Customer), Amiga 1200 (Product), Order #134 (Order), Luca (Provider), Bruno (Provider), Monitor 40” (Product), Mouse (Product)

  4. Just Data. Data by itself has little value, it’s the relationship between data that gives it incredible value. Diagram: Commodore, Jill (Customer), Amiga 1200 (Product), Order #134 (Order), Luca (Provider), Bruno (Provider), Monitor 40” (Product), Mouse (Product)

  5. Relationships give data “meaning”. Diagram: Commodore, Jill (Customer), Amiga 1200 (Product), Order #134 (Order), Luca (Provider), Bruno (Provider), Monitor 40” (Product), Mouse (Product), connected by edges labeled (Makes), (Has) and (Sells)

  6. Top NoSQL categories: Key/Value Databases, Document Databases, Column Databases, Graph Databases

  7. Top NoSQL categories: Key/Value Databases, Graph Databases, Document Databases, Column Databases

  8. Why do most NoSQL products avoid managing relationships?

  9. Joins is the Evil. Is this familiar? Tables: Customer (ID, Name), CustomerAddress (ID, Address), Address (ID, Location). Rows as laid out on the slide: 10 John / 10 24 / 24 Milan; 11 John / 10 33 / 33 London; 24 Mike / 32 44 / 18 Paris; 28 Mike / 18 Madrid; 44 Moscow

  10. Why is the join so slow?

  11. Index Lookup: how does it work? Imagine an Address Book where we want to find Luca’s phone number. Tree: A-Z; A-L, M-Z

  12. Index Lookup: how does it work? Index algorithms are all similar and based on balanced trees. Tree: A-Z; A-L, M-Z; A-D, E-L, M-R, S-Z

  13. Index Lookup: how does it work? Tree: A-Z; A-L, M-Z; A-D, E-L, M-R, S-Z; A-B, C-D, E-G, H-L

  14. Index Lookup: how does it work? Tree: A-Z; A-L, M-Z; A-D, E-L, M-R, S-Z; A-B, C-D, E-G, H-L; E-F, G, H-J, K-L

  15. Index Lookup: how does it work? Found! This lookup took 5 steps. With millions of indexed records, the tree depth could be 1000’s of levels! Tree: A-Z; A-L, M-Z; A-D, E-L, M-R, S-Z; A-B, C-D, E-G, H-L; E-F, G, H-J, K-L; Luca

  16. Joins Kill Performance. Joins are executed every time you cross relationships. Querying millions of records joining 3-4 tables could generate billions of combinations. Tables shown: Customer (ID, Name), CustomerAddress (ID, Address), Address (ID, Location), with the same rows as slide 9
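
The “billions of combinations” remark on slide 16 can be illustrated with a minimal Python sketch, assuming a naive nested-loop join with no indexes at all. The rows and the `nested_loop_join` helper are hypothetical, loosely modeled on the Customer, CustomerAddress and Address tables; real engines use indexes and query planners to avoid scanning the full cross product.

```python
from itertools import product

# Illustrative rows only (not the exact slide data).
customers = [(10, "John"), (24, "Mike")]              # (id, name)
customer_addresses = [(10, 24), (10, 33), (24, 18)]   # (customer_id, address_id)
addresses = [(24, "Milan"), (33, "London"), (18, "Paris")]  # (id, location)


def nested_loop_join():
    """Join the three tables by testing every combination of rows."""
    examined = 0
    rows = []
    for (cid, name), (ca_cid, ca_aid), (aid, location) in product(
        customers, customer_addresses, addresses
    ):
        examined += 1
        # Keep the combination only if both join conditions hold.
        if cid == ca_cid and ca_aid == aid:
            rows.append((name, location))
    return rows, examined


if __name__ == "__main__":
    rows, examined = nested_loop_join()
    print(rows, examined)
```

With 2, 3 and 3 rows the loop examines 2 x 3 x 3 = 18 combinations to return 3 matches; the same product over millions of rows is what makes an unindexed join explode.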

  17. This is why the database query performance suffers as the database increases in size: O(log N)
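
To make the O(log N) on slide 17 concrete, here is a minimal Python sketch, assuming a sorted in-memory list stands in for a balanced-tree index (a real database index is usually a B-tree variant with a much larger fan-out). The `lookup_steps` function and the sample keys are hypothetical; the code only counts how many probes a binary search needs as the index grows.

```python
def lookup_steps(sorted_keys, target):
    """Binary search over sorted keys; returns (found, number of probes)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1   # target is in the upper half
        else:
            hi = mid - 1   # target is in the lower half
    return False, steps


if __name__ == "__main__":
    for size in (1_000, 1_000_000):
        keys = list(range(size))
        found, steps = lookup_steps(keys, size - 1)
        print(size, found, steps)
```

The probe count grows with the logarithm of the index size, and a join pays that cost again for every row it crosses.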

  18. RDBMS performance on traversal (chart axes: Database Size, Performance)

  19. In a world that’s becoming more connected, we need a better way to store data and manage relationships. Read: Data is important, but relationships are even more fundamental today

  20. “A graph database is any storage system that provides index-free adjacency” - Marko Rodriguez (author of TinkerPop Blueprints)

  21. Every developer knows the Relational Model, but who knows the Graph one?

  22. Back to school: Graph Theory crash course

  23. Basic Graph: Luca --Visited--> Sao Paulo

  24. Property Graph Model*. Edges are directed. Vertices and Edges can have properties. Diagram: Luca (company: OrientTechnologies) --Visited (on: 2015)--> Sao Paulo (people: 12,000,000). * https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

  25. 1-N and N-M Relationships. An Edge connects only 2 vertices. Use multiple edges to represent 1-N and N-M relationships. Diagram: Luca --Visited (on: 2015)--> Sao Paulo; Luca --Worked (on: 2015)--> Sao Paulo
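
The property graph model from slides 23 to 25 can be sketched in a few lines of Python. This is an illustration only: the `Vertex` and `Edge` classes, their field names and the `neighbours` helper are hypothetical, and the edge direction (from Luca to Sao Paulo) is an assumption about the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex   # vertex the edge starts from
    in_vertex: Vertex    # vertex the edge points to
    properties: dict = field(default_factory=dict)


luca = Vertex("Luca", {"company": "OrientTechnologies"})
sao_paulo = Vertex("Sao Paulo", {"people": 12_000_000})

# Each edge connects exactly two vertices; several edges between the same
# pair express different relationships.
edges = [
    Edge("Visited", luca, sao_paulo, {"on": 2015}),
    Edge("Worked", luca, sao_paulo, {"on": 2015}),
]


def neighbours(vertex, label):
    """Names of the vertices reached from `vertex` through edges with `label`."""
    return [
        e.in_vertex.name
        for e in edges
        if e.out_vertex is vertex and e.label == label
    ]


if __name__ == "__main__":
    print(neighbours(luca, "Visited"))
```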

  26. Congrats! This is your diploma in «Graph Theory»

  27. Graph theory is so simple, yet so powerful

  28. How does a true* Graph Database manage relationships? *a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB

  29. Each element in the Graph has its own immutable Record ID. Diagram: Luca (Vertex) #15:99, Visited (Edge, on: 2015) #22:11, Sao Paulo (Vertex) #13:55

  30. Connections use persistent pointers. Diagram: Luca (Vertex) #15:99 with out = #22:11; Visited (Edge, on: 2015) #22:11 with in = #15:99 and out = #13:55; Sao Paulo (Vertex) #13:55 with in = #22:11

  31. Same diagram as slide 30

  32. Same diagram as slide 30

  33. A Graph Database creates the relationship just once (when the edge is created) VS an RDBMS, which computes the relationship every time you query the database

  34. When you move from an RDBMS to a Graph Database, you jump from O(log N) speed to near O(1). With a Graph Database, the traversing time is not affected by database size! This is huge in the BigData age
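
A minimal Python sketch of the idea behind slides 29 to 34, assuming each record stores the Record IDs of the records it is connected to. The Record IDs come from the slides; the field names (`out_edges`, `in_edges`, `source`, `target`), the direction of the edge and the `reachable` helper are hypothetical. A Python dict stands in for fetching a record directly by its Record ID.

```python
# Records keyed by Record ID. Every connection is stored as a pointer
# (another Record ID), so no join or index search is needed to follow it.
records = {
    "#15:99": {"kind": "Vertex", "name": "Luca",
               "out_edges": ["#22:11"], "in_edges": []},
    "#13:55": {"kind": "Vertex", "name": "Sao Paulo",
               "out_edges": [], "in_edges": ["#22:11"]},
    "#22:11": {"kind": "Edge", "label": "Visited", "on": 2015,
               "source": "#15:99", "target": "#13:55"},
}


def reachable(vertex_rid, label):
    """Follow outgoing edges with `label` and return the target vertex names."""
    names = []
    for edge_rid in records[vertex_rid]["out_edges"]:
        edge = records[edge_rid]                 # direct fetch by Record ID
        if edge["label"] == label:
            names.append(records[edge["target"]]["name"])
    return names


if __name__ == "__main__":
    print(reachable("#15:99", "Visited"))
```

The cost of each hop depends on how many edges the current vertex has, not on how many records the database holds, which is the point slide 34 makes.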

  35. Graph Databases Easily Manage Complex Relationships. No costs to traverse relationships: • Recommendation engines • Social Applications • Spatial Apps • Master Data Management • Information Clustering. Diagram: John, NYC, San Josè, Pulp Fiction, Mr Bean, Thriller, Comedy, Theater A, Theater B, Theater C, with edges labeled Lives in and Likes

  36. GraphDB Database Quadrant. Chart axes: Relationships Complexity, Data Complexity. Plotted: Graph, Relational, Document, Column, Key Value

  37. GraphDB Database Quadrant. These were 1st generation NoSQL products, where each tool was only good at a few use cases. Chart axes: Relationships Complexity, Data Complexity. Plotted: Graph, Relational, Document, Column, Key Value

  38. 1st Generation NoSQL: Scenario. Components shown: Application, Primary DB, Oracle (RDBMS), Redis or Memcache (Key/Value), Neo4j (GraphDB), MongoDB (DocDB), ETL

  39. 1st Generation NoSQL: Fact. In more than 90% of use cases, NoSQL products are used as a second DBMS

  40. 1st Generation NoSQL: Problems - No standard between NoSQL products - Multiple vendors = multiple skills - ETL + synchronization code is costly to write and maintain - Performance and Reliability are hard to predict. Components shown: Application, Oracle (RDBMS), Redis or Memcache (Key/Value), Neo4j (GraphDB), MongoDB (DocDB), ETL

  41. 2nd Generation NoSQL is Multi-Model

  42. What’s a Multi-Model DBMS? Multi-Model represents the intersection of multiple models in just one product. Diagram: Key/Value, Graph, Document, Object

  43. What’s a Multi-Model DBMS? - Just one product to learn and maintain - Just one vendor relationship to manage - No ETL, no synchronization required - Performance and Reliability are easy to test from the beginning. Multi-Model represents the intersection of multiple models in just one product. Diagram: Key/Value, Graph, Document, Object

  44. Relationships give data “meaning”. Diagram: Commodore, Jill (Customer), Amiga 1200 (Product), Order #134 (Order), Luca (Provider), Bruno (Provider), Monitor 40” (Product), 3 Wheel Mouse (Product), connected by edges labeled (Makes), (Has) and (Sells)

  45. Multi-Model domain schema. Legend: Vertex, Edge, Inherits. Vertex classes: V, Actor (name: string, surname: string), Customer, Provider, Order, Product. Edge classes: Makes, Sells, Has. Other fields shown: number: int, price: decimal, date: datetime, price: decimal, name: string, qty: int

  46. Vertices and Edges are Documents. Diagram: Jill --Makes--> Order. Document: { "@rid": "12:382", "@class": "Customer", "name": "Jill", "surname": "Raggio", "phone": "+39 33123212", "details": { "city": "London", "tags": "millennial" } }. General purpose solution: • JSON • Schema-less • Schema-full • Schema-hybrid • Nested documents • Rich indexing and querying • Developer friendly
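
As a small illustration of slide 46, the document can be loaded and navigated like any nested JSON structure. This is a sketch in plain Python using only the standard `json` module; the `get_path` helper is hypothetical and is not part of any OrientDB API.

```python
import json

# The Customer document from slide 46, written with plain JSON quotes.
raw = """
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
"""

customer = json.loads(raw)


def get_path(doc, dotted):
    """Read a nested field using a dotted path such as 'details.city'."""
    value = doc
    for key in dotted.split("."):
        value = value[key]
    return value


if __name__ == "__main__":
    print(customer["@class"], get_path(customer, "details.city"))
```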

  47. Polymorphic queries. SELECT * FROM Customer returns Jill (Customer). SELECT * FROM Provider returns Luca (Provider), Bruno (Provider). SELECT * FROM Actor returns Bruno (Provider), Jill (Customer), Luca (Provider)
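
The behavior on slide 47 mirrors class inheritance in an object-oriented language. The sketch below is an analogy in plain Python, not OrientDB code: the `Actor`, `Customer` and `Provider` classes follow the schema on slide 45, while `select_from` is a hypothetical stand-in for `SELECT * FROM <class>`.

```python
class Actor:
    def __init__(self, name):
        self.name = name


class Customer(Actor):
    pass


class Provider(Actor):
    pass


records = [Customer("Jill"), Provider("Luca"), Provider("Bruno")]


def select_from(cls):
    """Return the names of all records of `cls`, including its subclasses."""
    return sorted(r.name for r in records if isinstance(r, cls))


if __name__ == "__main__":
    print(select_from(Customer))
    print(select_from(Provider))
    print(select_from(Actor))
```

Querying the parent class returns the records of every subclass, which is what makes the query polymorphic.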

  48. Multi-Model complex domains schema. Legend: Vertex, Edge, Inherits. Vertex classes: V, MusicTaste, Account, Genre, Band, Location. Edge classes: Likes, Performs, Plays

  49. Multi-Model complex domains. Diagram: Jill (Account), Luca (Account), Indie (Genre), Rock (Genre), Snow Patrol (Band), 123, 1st Street Austin, TX (Location), connected by edges labeled (Likes), (Plays) and (Performs); the diagram also shows April 7, 2015 9pm-11.30pm
