The Case for Change Notifications in Pull-Based Databases
Wolfram Wingerath, Felix Gessert, Steffen Friedrich, Erik Witt and Norbert Ritter
Presenter: Wolfram Wingerath, wingerath@informatik.uni-hamburg.de
March 6th, 2017, Stuttgart
Traditional Databases
No Request? No Data!
What's the current state?
Query maintenance: periodic polling
→ Inefficient
→ Slow
[Figure label: circular shapes]
Ideal: Push-Based Data Access
Self-Maintaining Results
Find people in Room B:
    db.User.find()
      .equal('room','B')
      .ascending('name')
      .limit(3)
      .streamResult()
Result:
1. Erik (5/10)
2. Wolle (22/8)
3.
[Figure: x/y plot showing rooms A, B and C]
Real-Time Databases
Firebase
Overview:
◦ Real-time state synchronization across devices
◦ Simplistic data model: nested hierarchy of lists and objects
◦ Simplistic queries: mostly navigation/filtering
◦ Fully managed, proprietary
◦ App SDK for app development, mobile-first
◦ Google services integration: analytics, hosting, authorization, …
History:
◦ 2011: chat service startup Envolve is founded
  → was often used for cross-device state synchronization
  → state synchronization is separated (Firebase)
◦ 2012: Firebase is founded
◦ 2014: Firebase is acquired by Google
Firebase
Real-Time State Synchronization
• Tree data model: application state ≈ JSON object
• Subtree syncing: push notifications for specific keys only
  → Flat structure for fine granularity
  → Limited expressiveness!
Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)
Firebase
Query Processing in the Client
• Push notifications for specific keys only
• Order by a single attribute
• Apply a single filter on that attribute
• Non-trivial query processing in client
  → does not scale!
Jacob Wenger, on the Firebase Google Group (2015) https://groups.google.com/forum/#!topic/firebase-talk/d-XjaBVL2Ko (2017-02-27)
Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)
Meteor
Overview:
◦ JavaScript framework for interactive apps and websites
  ◦ MongoDB under the hood
  ◦ Real-time result updates, full MongoDB expressiveness
◦ Open-source: MIT license
◦ Managed service: Galaxy (Platform-as-a-Service)
History:
◦ 2011: Skybreak is announced
◦ 2012: Skybreak is renamed to Meteor
◦ 2015: Managed hosting service Galaxy is announced
Live Queries
Poll-and-Diff
• Change monitoring: app servers detect relevant changes
  → incomplete in multi-server deployment
• Poll-and-diff: queries are re-executed periodically
  → staleness window
  → does not scale with queries
[Figure: two app servers; each monitors incoming writes, forwards CRUD operations and polls the DB every 10 seconds]
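The poll-and-diff idea on this slide can be illustrated with a minimal sketch. It is not Meteor's implementation: `diffResults`, the `id` field and the sample records are assumptions made for illustration only. The function compares two consecutive polls of the same query and derives what would have to be pushed to the client.

```javascript
// Poll-and-diff sketch (illustrative, not Meteor code).
// Each poll is an array of result documents with a unique `id` (assumed).
function diffResults(previous, current) {
  const before = new Map(previous.map((doc) => [doc.id, doc]));
  const after = new Map(current.map((doc) => [doc.id, doc]));
  // Documents that entered the result since the last poll.
  const added = current.filter((doc) => !before.has(doc.id));
  // Documents that left the result since the last poll.
  const removed = previous.filter((doc) => !after.has(doc.id));
  // Documents present in both polls whose content differs.
  // JSON.stringify is a crude comparison that assumes stable key order.
  const changed = current.filter(
    (doc) =>
      before.has(doc.id) &&
      JSON.stringify(before.get(doc.id)) !== JSON.stringify(doc)
  );
  return { added, removed, changed };
}

// Two consecutive polls of the same query (hypothetical data).
const poll1 = [
  { id: 1, name: "Erik", seat: 5 },
  { id: 2, name: "Wolle", seat: 22 },
];
const poll2 = [
  { id: 2, name: "Wolle", seat: 8 },
  { id: 3, name: "Felix", seat: 10 },
];
const delta = diffResults(poll1, poll2);
console.log(delta);
```

Every active query needs its own periodic re-execution and diff, which is why the cost grows with the number of queries, and changes only become visible at the next poll (the staleness window).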
Oplog Tailing
Basics: MongoDB Replication
• Oplog: rolling record of data modifications
• Master-slave replication: secondaries subscribe to oplog
[Figure: MongoDB cluster (3 shards) with Primary A, Primary B and Primary C; a write operation is applied on Primary C and the change is propagated to Secondary C1, Secondary C2 and Secondary C3]
Oplog Tailing
Tapping into the Oplog
• Every Meteor server receives all DB writes through oplogs
  → does not scale
[Figure: MongoDB cluster (3 shards) with Primary A, Primary B and Primary C; the oplog is broadcast to the app servers, which monitor the oplog, query the database when in doubt and push relevant events to clients issuing CRUD operations. The oplog broadcast is marked "Bottleneck!"]
Oplog Tailing
Oplog Info is Incomplete
What game does Bobby play?
→ if baccarat, he takes first place!
→ if something else, nothing changes!
Partial update from oplog:
{ name: "Bobby", score: 500 } // game: ???
Baccarat players sorted by high-score:
1. { name: "Joy", game: "baccarat", score: 100 }
2. { name: "Tim", game: "baccarat", score: 90 }
3. { name: "Lee", game: "baccarat", score: 80 }
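The Bobby example can be restated as a small sketch. `matchesFromPartial` is a hypothetical helper, not part of Meteor or MongoDB; it only shows why a partial oplog entry may be insufficient to decide whether a query result changes, which is when the app server has to query the database.

```javascript
// Query from the slide: baccarat players sorted by high-score.
// matchesFromPartial is an illustrative helper (assumption), not a real API.
function matchesFromPartial(partial) {
  // The partial update does not carry the field the query filters on.
  if (!("game" in partial)) return "unknown";
  return partial.game === "baccarat" ? "match" : "no match";
}

// Partial update as seen in the oplog: the `game` field is missing.
const oplogEntry = { name: "Bobby", score: 500 };
// Full document, which only an extra database query would reveal.
const fullDocument = { name: "Bobby", game: "baccarat", score: 500 };

console.log(matchesFromPartial(oplogEntry));   // undecidable from the oplog alone
console.log(matchesFromPartial(fullDocument)); // decidable with the full document
```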
RethinkDB
Overview:
◦ "MongoDB done right": comparable queries and data model, but also:
  ◦ Push-based queries (filters only)
  ◦ Joins (non-streaming)
  ◦ Strong consistency: linearizability
◦ JavaScript SDK (Horizon): open-source, as managed service
◦ Open-source: Apache 2.0 license
History:
◦ 2009: RethinkDB is founded
◦ 2012: RethinkDB is open-sourced under AGPL
◦ 2016, May: first official release of Horizon (JavaScript SDK)
◦ 2016, October: RethinkDB announces shutdown
◦ 2017: RethinkDB is relicensed under Apache 2.0
RethinkDB
Changefeed Architecture
• Range-sharded data
• RethinkDB proxy: support node without data
  • Client communication
  • Request routing
  • Real-time query matching
• Every proxy receives all database writes
  → does not scale
[Figure: RethinkDB storage cluster feeding two RethinkDB proxies, each serving an app server; the write stream to the proxies is marked "Bottleneck!"]
William Stein, RethinkDB versus PostgreSQL: my personal experience (2017) http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.html (2017-02-27)
Daniel Mewes, Comment on GitHub issue #962: Consider adding more docs on RethinkDB Proxy (2016) https://github.com/rethinkdb/docs/issues/962 (2017-02-27)
Parse
Overview:
◦ Backend-as-a-Service for mobile apps
  ◦ MongoDB: largest deployment world-wide
  ◦ Easy development: great docs, push notifications, authentication, …
  ◦ Real-time updates for most MongoDB queries
◦ Open-source: BSD license
◦ Managed service: discontinued
History:
◦ 2011: Parse is founded
◦ 2013: Parse is acquired by Facebook
◦ 2015: more than 500,000 mobile apps reported on Parse
◦ 2016, January: Parse shutdown is announced
◦ 2016, March: Live Queries are announced
◦ 2017: Parse shutdown is finalized
Parse
LiveQuery Architecture
• LiveQuery Server: no data, real-time query matching
• Every LiveQuery Server receives all database writes
  → does not scale
[Figure: the write stream to the LiveQuery Servers is marked "Bottleneck!"]
Illustration taken from: http://parseplatform.github.io/docs/parse-server/guide/#live-queries (2017-02-22)
Comparison by Real-Time Query
Why Complexity Matters
Systems compared: Firebase, Meteor, RethinkDB, Parse
Example queries (matching conditions, then ordering):
• Todos created by "Bob", ordered by deadline
• Todos created by "Bob" AND with status equal to "active"
• Todos with "work" in the name, ordered by deadline
• Todos with "work" in the name AND status of "active", ordered by deadline AND then by the creator's name
Quick Comparison
DBMS vs. RT DB vs. DSMS vs. Stream Processing
Columns: Database Management | Real-Time Databases | Data Stream Management | Stream Processing
Data: persistent collections | persistent/ephemeral streams
Processing: one-time | one-time + continuous | continuous
Access: random | random + sequential | sequential
Streams: structured | structured, unstructured
Discussion
Common Issues
Every database with real-time features suffers from several of these problems:
• Expressiveness:
  • Queries
  • Data model
  • Legacy support
• Performance:
  • Latency & throughput
  • Scalability
• Robustness:
  • Fault-tolerance, handling malicious behavior etc.
  • Separation of concerns:
    → Availability: will a crashing real-time subsystem take down primary data storage?
    → Consistency: can real-time be scaled out independently from primary storage?
Engineering Efforts: Add-On Real-Time Queries
InvaliDB
External Query Maintenance
[Figure labels: Pub-Sub, Pub-Sub]
InvaliDB
Change Notifications
Query: SELECT * FROM posts WHERE title LIKE "%NoSQL%" ORDER BY year DESC
Object: { title: "SQL", year: 2016 }
Notification types: add, change, changeIndex, remove
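How the four notification types on this slide could be derived is sketched below. This is an assumption-laden illustration, not InvaliDB code: `classify`, `evaluate` and `notify` are hypothetical names, the `id` field is added for bookkeeping, and the `LIKE "%NoSQL%"` filter is approximated by a case-sensitive substring test. The notification type is read off the object's position in the sorted result before and after a write.

```javascript
// Filter and ordering of the example query (simplified, assumed semantics).
const matches = (post) => post.title.includes("NoSQL");
const evaluate = (posts) =>
  posts.filter(matches).sort((a, b) => b.year - a.year); // ORDER BY year DESC

// Position -1 means "not part of the result".
function classify(beforeIndex, afterIndex) {
  if (beforeIndex < 0 && afterIndex < 0) return null; // irrelevant write
  if (beforeIndex < 0) return "add";                  // entered the result
  if (afterIndex < 0) return "remove";                // left the result
  return beforeIndex === afterIndex ? "change" : "changeIndex";
}

// Applies an update to one post and reports the resulting notification type.
function notify(posts, id, update) {
  const indexIn = (list) => evaluate(list).findIndex((p) => p.id === id);
  const before = indexIn(posts);
  const updated = posts.map((p) => (p.id === id ? { ...p, ...update } : p));
  return { type: classify(before, indexIn(updated)), posts: updated };
}

const posts0 = [
  { id: 1, title: "NoSQL Databases", year: 2015 },
  { id: 2, title: "SQL", year: 2016 },
  { id: 3, title: "Scalable NoSQL", year: 2012 },
];
const step1 = notify(posts0, 2, { title: "NoSQL vs. SQL" });      // starts matching
const step2 = notify(step1.posts, 3, { year: 2017 });             // moves to the top
const step3 = notify(step2.posts, 1, { title: "NoSQL Databases, 2nd ed." });
const step4 = notify(step3.posts, 2, { title: "SQL" });           // stops matching
console.log([step1, step2, step3, step4].map((s) => s.type));
```

Re-evaluating the whole result per write is only for readability here; a real system would maintain the result incrementally.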
InvaliDB
Filter Queries: Distributed Query Matching
Two-dimensional partitioning:
• by Query
• by Object
→ scales with queries and writes
Implementation:
• Apache Storm
• Topology in Java
• MongoDB query language
• Pluggable query engine
[Figure labels: Write op!, Match!]
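One way to read the two-dimensional partitioning is as a grid of matching nodes: a query is installed on one row (one node per object partition), and a write is sent to one column (one node per query partition), so exactly one node sees both. The sketch below follows that reading; the grid size, the toy `hash` function and the node naming are assumptions for illustration and do not reflect the actual Storm topology.

```javascript
// Assumed grid dimensions: 3 query partitions x 2 object partitions.
const QUERY_PARTITIONS = 3;
const OBJECT_PARTITIONS = 2;

// Toy string hash (illustrative only), kept within unsigned 32-bit range.
function hash(text) {
  let h = 0;
  for (const ch of text) h = (h * 31 + ch.codePointAt(0)) >>> 0;
  return h;
}

// A query lives on every node of its row: one node per object partition.
function nodesForQuery(queryId) {
  const row = hash(queryId) % QUERY_PARTITIONS;
  return Array.from({ length: OBJECT_PARTITIONS }, (_, col) => `${row}/${col}`);
}

// A write goes to every node of its column: one node per query partition.
function nodesForWrite(objectId) {
  const col = hash(objectId) % OBJECT_PARTITIONS;
  return Array.from({ length: QUERY_PARTITIONS }, (_, row) => `${row}/${col}`);
}

// The single node responsible for matching this query against this object.
function matchingNode(queryId, objectId) {
  const queryNodes = new Set(nodesForQuery(queryId));
  return nodesForWrite(objectId).filter((node) => queryNodes.has(node));
}

console.log(matchingNode("todos-by-bob", "todo-42"));
```

Under this reading, adding rows spreads the queries over more nodes and adding columns spreads the write stream, so no single node has to see all queries or all writes.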
InvaliDB
Staged Real-Time Query Processing
Change notifications go through up to 4 query processing stages:
1. Filter queries: track matching status
   → before- and after-images
2. Sorted queries: maintain result order
3. Joins: combine maintained results
4. Aggregations: maintain aggregations
[Figure: pipeline of Filtering, Ordering, Joins and Aggregation (∑) stages, each emitting "Event!"]
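A minimal sketch of two of these stages, assuming the filter stage hands before- and after-images to a downstream aggregation stage. `filterStage` and `aggregationStage` are hypothetical names and the event shape is an assumption; the player data is borrowed from the baccarat example earlier in the deck. The point is that the sum is maintained incrementally from the images instead of being recomputed.

```javascript
// Stage 1 (filter): track matching status using before- and after-images.
// A null image means the object did not exist before / no longer exists after.
function filterStage(matches, before, after) {
  return {
    wasMatch: before !== null && matches(before),
    isMatch: after !== null && matches(after),
    before,
    after,
  };
}

// Stage 4 (aggregation): maintain a sum over `field` for matching objects.
function aggregationStage(sum, event, field) {
  if (event.wasMatch) sum -= event.before[field]; // retract the old contribution
  if (event.isMatch) sum += event.after[field];   // apply the new contribution
  return sum;
}

const isBaccarat = (player) => player.game === "baccarat";
let total = 100 + 90 + 80; // Joy, Tim and Lee from the earlier slide

// Bobby switches to baccarat: he starts matching.
total = aggregationStage(total, filterStage(isBaccarat,
  { name: "Bobby", game: "poker", score: 500 },
  { name: "Bobby", game: "baccarat", score: 500 }), "score");
const afterBobby = total;

// Tim's score changes while he keeps matching.
total = aggregationStage(total, filterStage(isBaccarat,
  { name: "Tim", game: "baccarat", score: 90 },
  { name: "Tim", game: "baccarat", score: 95 }), "score");
const afterTim = total;

// Lee is deleted: no after-image.
total = aggregationStage(total, filterStage(isBaccarat,
  { name: "Lee", game: "baccarat", score: 80 }, null), "score");
console.log(afterBobby, afterTim, total);
```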
InvaliDB
Low Latency + Linear Scalability
Research in Hamburg
Delivering Dynamic Content
Two Bottlenecks: Latency and Processing
High Latency
Processing Time
Solution: Global Caching
Fresh Data from Ubiquitous Web Caches
Low Latency
Less Processing