Push vs. Pull: The Future of Real-Time Databases in the Cloud


  1. Push vs. Pull: The Future of Real-Time Databases in the Cloud
     Wolfram Wingerath, ww@baqend.com
     December 10, SCDM 2018, Seattle
     www.baqend.com

  2. About me: Wolfram Wingerath
     PhD Thesis & Research; Distributed Systems Engineer
     Research:
     • Real-Time Databases
     • Stream Processing
     • NoSQL & Cloud Databases
     • …
     Practice:
     • Backend-as-a-Service
     • Web Caching + Real-Time Database
     • …
     www.baqend.com

  3. Outline
     Push-Based Data Access: Why Real-Time Databases?
     • A Small History Lesson
     • The Problem With Traditional Databases
     • Real-Time Databases to the Rescue!
     Real-Time Databases: System survey
     Discussion: What are the bottlenecks?
     Future Directions: Scalability & Use Cases

  4. A Short History of Data Management: Hot Topics Through The Ages
     Timeline figure (1970, 1980, 1990, 2000, 2010, today).
     Eras shown: Relational Databases; Active Databases; CEP & Streams; Big Data & NoSQL; Stream Processing; Real-Time Databases.
     Systems and concepts shown: Relational Model, System R, Ingres, Entity-Relationship Model, SQL Standard, PostgreSQL, Triggers, HiPAC, Starburst, Rapide, Telegraph, STREAM, Aurora & Borealis, GFS, MapReduce, Bigtable, Dynamo, Storm, Samza, Spark, Flink, Meteor, Firebase, RethinkDB, Baqend.

  5. Traditional Databases. The Problem: No Request – No Data!
     • What's the current state? (example query in the figure: circular shapes)
     • Periodic polling for query result maintenance: → inefficient → slow

  6. Real-Time Databases: Always Up-to-Date With Database State
     • Example query in the figure: circular shapes
     • Real-time queries for query result maintenance: → efficient → fast

  7. Real-Time Query Maintenance: Matching Every Query Against Every Update
     Potential bottlenecks:
     • Number of queries
     • Write throughput
     • Query complexity
     Similar processing for:
     • Triggers
     • ECA rules
     • Materialized views
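A minimal sketch of the matching step named on slide 7, assuming each query can be modeled as a boolean predicate over a single document (sorted or aggregating queries are out of scope here). The names `match_update`, `Document` and the sample queries are illustrative and are not taken from any of the surveyed systems.

```python
from typing import Any, Callable, Dict, Optional

Document = Dict[str, Any]
Predicate = Callable[[Document], bool]


def match_update(
    queries: Dict[str, Predicate],
    before: Optional[Document],
    after: Optional[Document],
) -> Dict[str, str]:
    """Match one write against every registered query.

    `before` and `after` are the document images around the write
    (None means the document did not exist or was deleted).
    Returns a map of query id -> "add" | "remove" | "change".
    """
    events: Dict[str, str] = {}
    # One predicate evaluation per query and per write: the work grows
    # with the number of queries, the write rate, and the predicate cost.
    for query_id, predicate in queries.items():
        was_match = before is not None and predicate(before)
        is_match = after is not None and predicate(after)
        if is_match and not was_match:
            events[query_id] = "add"
        elif was_match and not is_match:
            events[query_id] = "remove"
        elif was_match and is_match and before != after:
            events[query_id] = "change"
    return events


# Hypothetical registered real-time queries.
queries: Dict[str, Predicate] = {
    "circles": lambda d: d.get("shape") == "circle",
    "large": lambda d: d.get("size", 0) > 10,
}
```

Calling `match_update(queries, None, {"shape": "circle", "size": 5})` reports an `"add"` for the `circles` query only. The loop over all queries is where the three bottlenecks from the slide show up.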

  8. Outline
     Push-Based Data Access: Why Real-Time Databases?
     Real-Time Databases: System survey
     • Meteor
     • RethinkDB
     • Parse
     • Firebase
     • Others
     Discussion: What are the bottlenecks?
     Future Directions: Scalability & Use Cases

  9. Real-Time Databases

  10. Meteor
      Overview:
      ◦ JavaScript framework for interactive apps and websites
        ▪ MongoDB under the hood
        ▪ Real-time result updates, full MongoDB expressiveness
      ◦ Open-source: MIT license
      ◦ Managed service: Galaxy (Platform-as-a-Service)
      History:
      ◦ 2011: Skybreak is announced
      ◦ 2012: Skybreak is renamed to Meteor
      ◦ 2015: Managed hosting service Galaxy is announced

  11. Live Queries: Poll-and-Diff
      • Change monitoring: app servers detect relevant changes → incomplete in multi-server deployment
      • Poll-and-diff: queries are re-executed periodically → staleness window → does not scale with queries
      Figure labels: repeat query every 10 seconds; forward CRUD; monitor incoming writes; app server; app server; CRUD
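Poll-and-diff as described on slide 11 can be sketched as follows. This is an illustration under assumptions, not Meteor's implementation: documents are plain dictionaries identified by an `_id` field, and `run_query` stands for any hypothetical callable that re-executes the query against the database.

```python
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]
Diff = Tuple[List[Document], List[Document], List[Document]]


def diff_results(previous: List[Document], current: List[Document]) -> Diff:
    """Compare two query results and return (added, removed, changed)."""
    prev_by_id = {doc["_id"]: doc for doc in previous}
    curr_by_id = {doc["_id"]: doc for doc in current}
    added = [doc for key, doc in curr_by_id.items() if key not in prev_by_id]
    removed = [doc for key, doc in prev_by_id.items() if key not in curr_by_id]
    changed = [
        doc
        for key, doc in curr_by_id.items()
        if key in prev_by_id and prev_by_id[key] != doc
    ]
    return added, removed, changed


def poll_once(
    run_query: Callable[[], List[Document]],
    last_result: List[Document],
) -> Tuple[List[Document], Diff]:
    """Re-execute the query once and diff against the last known result.

    A real deployment would call this on a timer (the slide mentions every
    10 seconds); writes made between two polls stay invisible until the
    next poll, which is the staleness window.
    """
    current = run_query()
    return current, diff_results(last_result, current)
```

Every subscribed query costs one full re-execution per polling interval, whether or not anything changed, which is why the approach does not scale with the number of queries.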

  12. Oplog Tailing Basics: MongoDB Replication
      • Oplog: rolling record of data modifications
      • Master-slave replication: secondaries subscribe to the oplog
      Figure labels: MongoDB cluster (3 shards); Primary A, Primary B, Primary C; write operation; apply; propagate change; Secondary C1, Secondary C2, Secondary C3
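The "rolling record" idea from slide 12 can be illustrated with a bounded buffer that a follower reads from. This is a simplified model under assumptions: timestamps are plain integers, and the class `Oplog` with its methods is hypothetical rather than MongoDB's API.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple

Entry = Tuple[int, Dict[str, Any]]


class Oplog:
    """Bounded, append-only log: the oldest entries roll off when it is full."""

    def __init__(self, capacity: int) -> None:
        self._entries: Deque[Entry] = deque(maxlen=capacity)
        self._next_ts = 1

    def append(self, operation: Dict[str, Any]) -> int:
        """Record one data modification and return its timestamp."""
        ts = self._next_ts
        self._entries.append((ts, operation))
        self._next_ts += 1
        return ts

    def read_since(self, last_ts: int) -> List[Entry]:
        """Return all entries newer than `last_ts`, as a follower would.

        Raises LookupError if entries the follower still needs have
        already rolled off the log.
        """
        if self._entries and self._entries[0][0] > last_ts + 1:
            raise LookupError("follower is too far behind; full resync needed")
        return [(ts, op) for ts, op in self._entries if ts > last_ts]
```

A follower keeps the timestamp of the last entry it applied and repeatedly calls `read_since` with it; app servers that tail the log, as on the following slides, consume it in the same way.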

  13. Oplog Tailing: Tapping into the Oplog
      Figure labels: MongoDB cluster (3 shards); Primary A, Primary B, Primary C; oplog broadcast; app servers: monitor oplog, query (when in doubt), push relevant events; CRUD

  14. Oplog Tailing: Oplog Info Is Incomplete
      What game does Bobby play?
      → if baccarat, he takes first place!
      → if something else, nothing changes!
      Partial update from oplog: { name: "Bobby", score: 500 } // game: ???
      Baccarat players sorted by high-score:
        1. { name: "Joy", game: "baccarat", score: 100 }
        2. { name: "Tim", game: "baccarat", score: 90 }
        3. { name: "Lee", game: "baccarat", score: 80 }
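The Bobby example can be replayed in a few lines. The sketch assumes a hypothetical `fetch_full` callable that stands in for the extra database query an app server issues "when in doubt"; none of the function names come from Meteor.

```python
from typing import Any, Callable, Dict, List, Optional

Document = Dict[str, Any]


def matches_baccarat(doc: Document) -> Optional[bool]:
    """True or False if decidable, None if the `game` field is missing."""
    if "game" not in doc:
        return None
    return doc["game"] == "baccarat"


def resolve(partial: Document, fetch_full: Callable[[str], Document]) -> Document:
    """Return a document on which the query can be decided.

    If the partial update lacks a queried field, fall back to a full
    lookup (the "query when in doubt" step).
    """
    if matches_baccarat(partial) is None:
        return fetch_full(partial["name"])
    return partial


def leaderboard(players: List[Document]) -> List[str]:
    """Names of baccarat players sorted by score, highest first."""
    matching = [p for p in players if matches_baccarat(p)]
    ranked = sorted(matching, key=lambda p: p["score"], reverse=True)
    return [p["name"] for p in ranked]


players = [
    {"name": "Joy", "game": "baccarat", "score": 100},
    {"name": "Tim", "game": "baccarat", "score": 90},
    {"name": "Lee", "game": "baccarat", "score": 80},
]
partial_update = {"name": "Bobby", "score": 500}  # game: ???
```

With only `partial_update` in hand, `matches_baccarat` returns `None`: the server cannot tell whether the leaderboard changes until it has fetched Bobby's full document.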

  15. Oplog Tailing: Tapping into the Oplog
      • Every Meteor server receives all DB writes through oplogs → does not scale
      • Bottleneck!
      Figure labels: MongoDB cluster (3 shards); Primary A, Primary B, Primary C; oplog broadcast; app servers: monitor oplog, query (when in doubt), push relevant events; CRUD
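A back-of-the-envelope model of why the broadcast does not scale, assuming every app server must process every write exactly once. The function name and the figures in the usage note are illustrative only.

```python
from typing import Dict


def broadcast_load(writes_per_second: int, app_servers: int) -> Dict[str, int]:
    """Writes processed per second when every server tails the full oplog."""
    per_server = writes_per_second  # independent of the number of servers
    return {
        "per_server": per_server,
        "cluster_total": per_server * app_servers,
    }
```

Going from 1 to 4 app servers at 1,000 writes per second leaves `per_server` at 1,000 while `cluster_total` rises to 4,000: adding servers spreads client connections, but not the write stream each server has to keep up with.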

  16. RethinkDB
      Overview:
      ◦ "MongoDB done right": comparable queries and data model, but also:
        ▪ Push-based queries (filters only)
        ▪ Joins (non-streaming)
        ▪ Strong consistency: linearizability
      ◦ JavaScript SDK (Horizon): open-source, as managed service
      ◦ Open-source: Apache 2.0 license
      History:
      ◦ 2009: RethinkDB is founded
      ◦ 2012: RethinkDB is open-sourced under AGPL
      ◦ 2016, May: first official release of Horizon (JavaScript SDK)
      ◦ 2016, October: RethinkDB announces shutdown
      ◦ 2017: RethinkDB is relicensed under Apache 2.0

  17. RethinkDB Changefeed Architecture
      • Range-sharded data
      • RethinkDB proxy: support node without data
        ◦ Client communication
        ◦ Request routing
        ◦ Real-time query matching
      • Every proxy receives all database writes → does not scale
      • Bottleneck!
      Figure labels: RethinkDB storage cluster; RethinkDB proxy; RethinkDB proxy; App server; App server
      William Stein, RethinkDB versus PostgreSQL: my personal experience (2017), http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.html (2017-02-27)
      Daniel Mewes, Comment on GitHub issue #962: Consider adding more docs on RethinkDB Proxy (2016), https://github.com/rethinkdb/docs/issues/962 (2017-02-27)

  18. Parse
      Overview:
      ◦ Backend-as-a-Service for mobile apps
        ▪ MongoDB: largest deployment world-wide
        ▪ Easy development: great docs, push notifications, authentication, …
        ▪ Real-time updates for most MongoDB queries
      ◦ Open-source: BSD license
      ◦ Managed service: discontinued
      History:
      ◦ 2011: Parse is founded
      ◦ 2013: Parse is acquired by Facebook
      ◦ 2015: more than 500,000 mobile apps reported on Parse
      ◦ 2016, January: Parse shutdown is announced
      ◦ 2016, March: Live Queries are announced
      ◦ 2017: Parse shutdown is finalized

  19. Parse LiveQuery Architecture
      • LiveQuery Server: no data, real-time query matching
      • Every LiveQuery Server receives all database writes → does not scale
      • Bottleneck!
      Illustration taken from: http://parseplatform.github.io/docs/parse-server/guide/#live-queries (2017-02-22)

  20. Firebase
      Overview:
      ◦ Real-time state synchronization across devices
      ◦ Simplistic data model: nested hierarchy of lists and objects
      ◦ Simplistic queries: mostly navigation/filtering
      ◦ Fully managed, proprietary
      ◦ App SDK for app development, mobile-first
      ◦ Google services integration: analytics, hosting, authorization, …
      History:
      ◦ 2011: chat service startup Envolve is founded
        → was often used for cross-device state synchronization
        → state synchronization is separated (Firebase)
      ◦ 2012: Firebase is founded
      ◦ 2014: Firebase is acquired by Google

  21. Firebase: Real-Time State Synchronization
      • Tree data model: application state ~ JSON object
      • Subtree syncing: push notifications for specific keys only
        → Flat structure for fine granularity
        → Limited expressiveness!
      Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016), https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)

  22. Firebase: Query Processing in the Client
      • Push notifications for specific keys only
      • Order by a single attribute
      • Apply a single filter on that attribute
      • Non-trivial query processing in client → does not scale!
      Jacob Wenger, on the Firebase Google Group (2015), https://groups.google.com/forum/#!topic/firebase-talk/d-XjaBVL2Ko (2017-02-27)
      Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016), https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)
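The restriction on slide 22 can be modeled in plain Python. This is not the Firebase SDK: `server_query` is a hypothetical stand-in for a query that may order by one attribute and filter on that same attribute only, and `client_query` shows the remaining work that ends up in the client.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def server_query(
    records: List[Record],
    order_by: str,
    start_at: Optional[Any] = None,
    end_at: Optional[Any] = None,
) -> List[Record]:
    """Restricted query: order by ONE attribute, range-filter on that attribute."""
    selected = [
        r
        for r in records
        if order_by in r
        and (start_at is None or r[order_by] >= start_at)
        and (end_at is None or r[order_by] <= end_at)
    ]
    return sorted(selected, key=lambda r: r[order_by])


def client_query(
    records: List[Record],
    order_by: str,
    start_at: Any,
    extra_attr: str,
    extra_value: Any,
) -> Tuple[List[Record], int]:
    """Apply a second condition in the client.

    Returns the final result and the number of records that had to be
    transferred to the client to compute it.
    """
    received = server_query(records, order_by, start_at=start_at)
    result = [r for r in received if r.get(extra_attr) == extra_value]
    return result, len(received)
```

Any condition beyond the single ordered attribute is evaluated after the data has already been shipped, so the transferred count can be much larger than the final result.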

  23. Firebase: Hard Scaling Limits
      "Scale to around 100,000 concurrent connections and 1,000 writes/second in a single database. Scaling beyond that requires sharding your data across multiple databases."
      Bottleneck!
      Firebase, Choose a Database: Cloud Firestore or Realtime Database (2018), https://firebase.google.com/docs/database/rtdb-vs-firestore (2018-03-10)
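As a rough worked example of the quoted limits, the following sketch estimates how many databases a workload would need. It assumes that load can be spread evenly across shards, which the quote does not promise, and it treats the "around" figures as hard constants; the function name is illustrative.

```python
import math

# Approximate per-database limits taken from the quote above.
MAX_CONNECTIONS_PER_DB = 100_000
MAX_WRITES_PER_SECOND_PER_DB = 1_000


def databases_needed(connections: int, writes_per_second: int) -> int:
    """Lower bound on the number of databases, assuming perfectly even sharding."""
    by_connections = math.ceil(connections / MAX_CONNECTIONS_PER_DB)
    by_writes = math.ceil(writes_per_second / MAX_WRITES_PER_SECOND_PER_DB)
    return max(1, by_connections, by_writes)
```

For 250,000 concurrent connections and 1,500 writes per second, the connection limit dominates and at least 3 databases are needed.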

  24. Firebase Firestore: New Model
      Figure labels: documents, collections, references
      Illustration taken from: Todd Kerpelman, Cloud Firestore for Realtime Database Developers (2017), https://firebase.googleblog.com/2017/10/cloud-firestore-for-rtdb-developers.html (2018-03-10)

  25. Firebase Firestore: New Model
      Figure labels (partly unrecoverable): finer access granula[…]lates tree-like structure
      Illustration taken from: Todd Kerpelman, Cloud Firestore for Realtime Database Developers (2017), https://firebase.googleblog.com/2017/10/cloud-firestore-for-rtdb-developers.html (2018-03-10)
