Agile web-development with RethinkDB Ilya Verbitskiy
Ilya Verbitskiy
• Distributed systems, application security, fintech
• ilya@verbitskiy.co
• @ilich_x86
• https://github.com/ilich
Demo
• https://github.com/ilich/rethinkdb-101
What is RethinkDB?
• Open-source database for building realtime web applications.
• NoSQL database that stores schemaless JSON documents.
• Distributed database that is easy to scale.
• High-availability database with automatic failover and robust fault tolerance.
• The second most popular database on GitHub.
The Good
• Changefeeds.
• Map-reduce.
• Geospatial queries.
• Collaborative web and mobile apps.
• Streaming analytics apps.
• Multiplayer games.
• Realtime marketplaces.
• Connected devices.
The Bad
• RethinkDB is not a good choice if you need full ACID support or strong schema enforcement.
• If you are doing deep, computationally intensive analytics, you are better off using a system like Hadoop.
• In some cases RethinkDB trades off write availability in favor of data consistency.
RethinkDB vs. …
• MongoDB
• Firebase
Can I use my programming language?
• JavaScript/Node.js
• Python
• Ruby
• Java
• C#/.NET
• C++
• Go
• PHP
• … and even more on https://rethinkdb.com/docs/install-drivers/
RethinkDB Structure
• Database → Table → Document
• A document is a schemaless JSON document.
Introduction to ReQL
• ReQL is the RethinkDB query language.
• ReQL key principles:
  • ReQL embeds into your programming language.
  • All ReQL queries are chainable.
  • All queries execute on the server.
• Good starting point if you already know SQL: https://www.rethinkdb.com/docs/sql-to-reql/javascript/
Understanding ReQL
• The client driver translates ReQL queries into the RethinkDB wire protocol and sends them to the server for execution.
• An anonymous function passed to a query must return a valid ReQL expression.
• In JavaScript, use the lt and gt commands instead of the < and > operators.
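Why the native comparison operators do not work can be seen in a toy model of a query builder. The sketch below is not the RethinkDB driver: `Term`, `field`, and `compile` are hypothetical names invented for illustration. It assumes only that the driver calls the anonymous function once on the client, with a placeholder for the row, in order to build an expression it can send to the server.

```javascript
// Toy model of a client-side query builder (hypothetical, not the RethinkDB driver).
class Term {
  constructor(op, args) {
    this.op = op;
    this.args = args;
  }
  // Each method returns a new Term, so a call records the operation instead of running it.
  field(name) { return new Term('field', [this, name]); }
  gt(value) { return new Term('gt', [this, value]); }
  lt(value) { return new Term('lt', [this, value]); }
}

// Call the user's function once with a placeholder row and inspect what it returns.
function compile(predicate) {
  const result = predicate(new Term('row', []));
  if (!(result instanceof Term)) {
    throw new TypeError('predicate must return a query expression, got ' + typeof result);
  }
  return result;
}

// Works: gt() builds an expression tree that could be serialized and sent to a server.
const ok = compile(user => user.field('age').gt(30));
console.log(ok.op); // gt

// Fails: `>` is evaluated by JavaScript on the client and yields a plain boolean.
try {
  compile(user => user.field('age') > 30);
} catch (e) {
  console.log(e.message);
}
```

In this model the `>` version never reaches the server as a comparison at all, which is the reason the slide recommends lt and gt.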
Supported data types
• Number
• String (UTF-8)
• Boolean
• Null
• Object
• Array (by default, up to 100,000 elements)
• Dates and times
• Binary objects
• Geometry objects and geospatial queries (indexes, GeoJSON support)
Data modeling in RethinkDB: embedded arrays
• Similar to MongoDB.
• Queries are simpler.
• The data is often colocated on disk. If you have a dataset that doesn’t fit into RAM, data is loaded from disk faster.
• Any update to the main document atomically updates both the main data and the linked data.
• Up to 100,000 elements per array by default.
• Deleting, adding, or updating an embedded document requires loading the entire array, modifying it, and writing the entire parent document back to disk.
• Because of the previous limitation, it’s best to keep the size of the array to no more than a few hundred documents.
Data modeling in RethinkDB: multiple tables
• Similar to SQL.
• Operations on a parent document don’t require loading the data for every child document of that parent into memory.
• There is no limit on the number of child documents, so this approach is more suitable for large amounts of data.
• The queries linking the data tend to be more complicated.
• With this approach you cannot atomically update both the parent data and the child data.
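The trade-off between the two approaches can be sketched with plain JavaScript objects. The `authors`/`posts` data and the `postsFor` helper are illustrative assumptions; everything runs in memory and no database is involved.

```javascript
// Approach 1: embedded array. One document holds the parent and all children.
const authorEmbedded = {
  id: 1,
  name: 'Ada',
  posts: [
    { title: 'First post' },
    { title: 'Second post' }
  ]
};

// Adding a child means producing (and storing) a new version of the whole parent document.
const updated = {
  ...authorEmbedded,
  posts: [...authorEmbedded.posts, { title: 'Third post' }]
};

// Approach 2: multiple tables. Children point at the parent through a foreign key.
const authors = [{ id: 1, name: 'Ada' }];
const posts = [
  { id: 10, authorId: 1, title: 'First post' },
  { id: 11, authorId: 1, title: 'Second post' }
];

// Reading a parent together with its children now needs an explicit join step.
function postsFor(authorId) {
  return posts.filter(post => post.authorId === authorId);
}

console.log(updated.posts.length); // 3
console.log(postsFor(1).length);   // 2
```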
Changefeeds
• Changefeeds allow clients to receive changes on a table.
• The changes command returns a cursor that receives updates.
• Each update includes the new and old value of the modified record.
• Changefeeds cannot guarantee delivery, since they are unidirectional with no acknowledgement returned from clients.
// Node.js
r.table('users')
  .changes()
  .run(conn, function (err, cursor) {
    if (err) throw err;
    // Use cursor to process changes
  });
…
// Sample change document delivered by the cursor (an insert, so old_val is null)
{
  old_val: null,
  new_val: { "city": "MINNEAPOLIS", "state": "MN", "_id": "55311" }
}
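One way to process the documents a changefeed delivers is to check which of old_val and new_val is null. The helper name `describeChange` is an assumption made for this sketch; it relies only on the shape of change documents, where an insert carries old_val: null and a delete carries new_val: null.

```javascript
// Classify a changefeed document by which side of the change is missing.
// `describeChange` is a hypothetical helper, not part of the RethinkDB driver.
function describeChange(change) {
  if (change.old_val === null && change.new_val !== null) return 'insert';
  if (change.old_val !== null && change.new_val === null) return 'delete';
  return 'update';
}

const sample = {
  old_val: null,
  new_val: { city: 'MINNEAPOLIS', state: 'MN', _id: '55311' }
};
console.log(describeChange(sample)); // insert
```

Inside the run callback, a function like this could be called from `cursor.each(function (err, change) { ... })` to route each change to the right handler.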
Commands supporting changefeeds
• filter
• getAll
• map
• pluck
• between
• union
• min
• max
• orderBy.limit
Sharding and replication
• RethinkDB is designed for clustering and easy scalability.
• To add a new server to the cluster, just launch it with the --join parameter.
• Configure sharding and replication per table.
• Any feature that works with a single database will work in a sharded cluster.
Sharding and replication
• There is a hard limit of 64 shards.
• All sharding is currently done based on the table’s primary key only.
• RethinkDB uses system statistics for the table to find the optimal set of split points to break up the table evenly.
• Sharding and replication are configured through the table configuration:
  • Number of shards
  • Number of replicas
• Replicas can be associated with servers using server tags.
• Tags are assigned to a server using the --server-tag parameter.
• Use the rebalance command to rebalance the shards of a table.
• Use the reconfigure command to set up a table’s sharding and replication.
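The reconfigure and rebalance commands mentioned above could be wrapped roughly as follows. This is a minimal sketch under stated assumptions: `reshardTable` and `rebalanceTable` are hypothetical helper names, `r` is assumed to be the module exported by the official `rethinkdb` npm package, and `conn` an already open connection. Both are passed in as arguments so the snippet has no dependencies of its own.

```javascript
// Apply a shard/replica count to one table. Hypothetical helper; `r` and `conn`
// are supplied by the caller (assumed: the `rethinkdb` module and an open connection).
function reshardTable(r, conn, tableName, shards, replicas) {
  // The deck states a hard limit of 64 shards, so reject obviously invalid input early.
  if (!Number.isInteger(shards) || shards < 1 || shards > 64) {
    throw new RangeError('shards must be an integer between 1 and 64');
  }
  return r.table(tableName)
    .reconfigure({ shards: shards, replicas: replicas })
    .run(conn);
}

// Ask the server to even out the shards of a table after its data has grown or shrunk.
function rebalanceTable(r, conn, tableName) {
  return r.table(tableName).rebalance().run(conn);
}

// Assumed usage against a local server on the default driver port:
// const r = require('rethinkdb');
// r.connect({ host: 'localhost', port: 28015 })
//   .then(conn => reshardTable(r, conn, 'users', 2, 1));
```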
RethinkDB Security
• An injection attack against RethinkDB is unlikely because ReQL embeds into your programming language.
• Make sure that you use the latest database drivers!
• Be careful with the .match() function. It may allow a regular expression injection attack.
• Do not use r.js('…') to execute JavaScript code on the server. It is vulnerable to JavaScript injection attacks.
• Use TLS encryption.
• By default, the admin account does not have a password. Always run your primary server with the --initial-password parameter.
• You cannot set a password on the administrator web interface. Make sure it is behind a firewall or bound to localhost (--bind-http parameter).
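One possible mitigation for the .match() risk is to escape user input before it becomes part of a pattern. The sketch below is an assumption rather than a RethinkDB feature: `escapeRegex` is a hypothetical helper, and the `users` table and `name` field in the commented query are made up for illustration.

```javascript
// Escape characters that have special meaning in a regular expression so that
// user input is matched literally. `escapeRegex` is a hypothetical helper name.
function escapeRegex(input) {
  return String(input).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Without escaping, this input would match every value.
const userInput = '.*';
const safePattern = '^' + escapeRegex(userInput);
console.log(safePattern); // ^\.\*

// Assumed usage with the driver (table and field names are illustrative):
// r.table('users').filter(user => user('name').match(safePattern))
```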
Additional Resources
• RethinkDB: https://www.rethinkdb.com/
• RethinkDB installation: https://www.rethinkdb.com/docs/install/
• Thirty-second quickstart: https://www.rethinkdb.com/docs/quickstart/
• Ten-minute guide: https://www.rethinkdb.com/docs/guide/javascript/
• Cookbook: https://www.rethinkdb.com/docs/cookbook/javascript/
• Cheat sheet: https://www.rethinkdb.com/docs/sql-to-reql/javascript/
Questions?