Designing for Distributed, Unstructured Data
Matt Brender, Developer Advocate at Basho
=> curl $RIAK/props
{ "Matt Brender": ["developer advocate", "ops > dev", "mbrender@basho.com", "@mjbrender", "neckbeardinfluence.com", "geek-whisperers.com", "indoor enthusiast"] }
tweet me @mjbrender
I’m saying “Riak,” not “react” as in React.js.
{
  "text": "Woot! #qconnewyork",
  "entities": {
    "hashtags": ["#qconnewyork"],
    "symbols": [],
    "urls": [],
    "user_mentions": [
      {
        "screen_name": "mjbrender",
        "name": "Matt Brender",
        "id": 4948123,
        "id_str": "42424242",
        "indices": [81, 92]
      },
      {
        "screen_name": "mjbrender",
        "name": "Matt Brender",
        "id": 376825877,
        "id_str": "376825877",
        "indices": [121, 132]
      }
    ]
  }
}
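The tweet above is semi-structured: nested objects, arrays that may be empty, and fields that not every record carries. As a minimal sketch using only Python's standard library (the trimmed record and the summarize helper are illustrative assumptions, not part of Twitter's or Riak's APIs), reading it defensively with defaults looks like this:

```python
import json

# Trimmed copy of the tweet above; only the fields used below are kept.
RAW_TWEET = """
{"text": "Woot! #qconnewyork",
 "entities": {"hashtags": ["#qconnewyork"], "symbols": [], "urls": [],
              "user_mentions": [{"screen_name": "mjbrender", "id": 4948123},
                                {"screen_name": "mjbrender", "id": 376825877}]}}
"""


def summarize(tweet: dict) -> dict:
    """Hypothetical helper: flatten the nested fields we care about.

    Every lookup has a default, because semi-structured records may omit
    whole sub-objects (e.g. a tweet with no "entities" at all).
    """
    entities = tweet.get("entities", {})
    mentions = entities.get("user_mentions", [])
    return {
        "text": tweet.get("text", ""),
        "hashtags": list(entities.get("hashtags", [])),
        # The same screen name can appear more than once; keep it unique.
        "mentioned_users": sorted({m["screen_name"] for m in mentions}),
        "mentioned_ids": [m["id"] for m in mentions],
    }


full = summarize(json.loads(RAW_TWEET))
bare = summarize({"text": "no entities here"})
print(full)
print(bare)
```

In a relational schema the hashtags and mentions would each need their own child table; here they travel with the record, and a missing branch simply yields empty lists.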
Just Hoarding?
A common pattern
Our Problem(s)
• Same data in different formats
  • Cache
  • Denormalisation
  • Indexes
  • Aggregations
• We’re sticking to what we know
  • Relational databases with SQL queries
  • Not anticipating scaling needs
• We’re not sure what’s next
  • Bitten by architectural choices in the past
  • New systems require consideration
  • Not sure what justifies investment
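The first problem, the same data in different formats, is easy to see in miniature. A sketch, assuming a hypothetical dict of canonical tweet records: one source record fans out into a denormalised copy, a secondary index, and an aggregation, and every derived view has to be rebuilt or patched whenever the source changes.

```python
from collections import Counter, defaultdict

# Hypothetical canonical records, keyed by tweet id.
TWEETS = {
    "t1": {"user": "mjbrender", "hashtags": ["#qconnewyork", "#riak"]},
    "t2": {"user": "basho", "hashtags": ["#riak"]},
    "t3": {"user": "mjbrender", "hashtags": []},
}


def denormalise_by_user(tweets):
    """Copy every record under its author, duplicating data for fast reads."""
    out = defaultdict(list)
    for tweet_id, tweet in sorted(tweets.items()):
        out[tweet["user"]].append({"id": tweet_id, **tweet})
    return dict(out)


def hashtag_index(tweets):
    """Secondary index: hashtag -> sorted list of tweet ids."""
    index = defaultdict(set)
    for tweet_id, tweet in tweets.items():
        for tag in tweet["hashtags"]:
            index[tag].add(tweet_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


def hashtag_counts(tweets):
    """Aggregation: how many times each hashtag appears across all tweets."""
    return Counter(tag for tweet in tweets.values() for tag in tweet["hashtags"])


by_user = denormalise_by_user(TWEETS)
index = hashtag_index(TWEETS)
counts = hashtag_counts(TWEETS)
print(by_user, index, counts, sep="\n")
```

Adding a fourth tweet means touching all three views as well as the source record, which is the bookkeeping this slide is pointing at.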
Can’t I just …
The Choices
This or That
• NoSQL
  • Types
    • Key/Value
    • Document
    • Columnar
    • Graph
• Hadoop
  • HDFS
  • Map/Reduce
  • YARN
• “Messaging Queues”
  • Pub/Sub
  • Commit Log
• Spark
  • Successor to Map/Reduce
  • Compute-focused
So, NoSQL
What Qualifies as NoSQL?
Basho Confidential
NoSQL Community
• Persistence
• Querying
• Scaling
Persistence
Querying
Other Queries
Understanding how you get your data back
• Query Languages
  • SQL(?)
• Query Interfaces
  • HTTP/S
  • Protocol Buffers
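To make the HTTP interface concrete: Riak KV 2.x addresses an object by bucket type, bucket, and key under the path /types/<type>/buckets/<bucket>/keys/<key>, and listens for HTTP on port 8098 by default. The sketch below uses only Python's standard library; the host, the tweets bucket, and the helper names are assumptions for illustration, and it needs a running node to do anything beyond building URLs.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: a local Riak node on the default HTTP port.
RIAK_HTTP = "http://localhost:8098"


def object_url(bucket: str, key: str, bucket_type: str = "default",
               base: str = RIAK_HTTP) -> str:
    """Build /types/<type>/buckets/<bucket>/keys/<key>, percent-encoding each part."""
    t, b, k = (quote(part, safe="") for part in (bucket_type, bucket, key))
    return f"{base}/types/{t}/buckets/{b}/keys/{k}"


def store_json(bucket: str, key: str, body: bytes, **kwargs) -> int:
    """PUT a JSON body (hypothetical helper); returns the HTTP status code."""
    req = Request(object_url(bucket, key, **kwargs), data=body, method="PUT",
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=5) as resp:
        return resp.status


def fetch(bucket: str, key: str, **kwargs) -> bytes:
    """GET the stored value; urlopen raises HTTPError (e.g. 404) if it is missing."""
    with urlopen(object_url(bucket, key, **kwargs), timeout=5) as resp:
        return resp.read()


# Against a live node you would call, for example:
#   store_json("tweets", "tweet:42", b'{"text": "Woot! #qconnewyork"}')
#   print(fetch("tweets", "tweet:42"))
print(object_url("tweets", "tweet:42"))
```

The Protocol Buffers interface (port 8087 by default) is binary and is normally reached through one of the client libraries rather than by hand.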
Apache Solr Integration
Write it like Riak. Query it like Solr.
• Distributed Full-Text Search: Standard full-text Solr queries automatically expand into distributed search queries, returning a complete result set across instances.
• Ad-Hoc Query Support: Broad support for Solr query parameters, e.g., exact match, range queries, and/or/not, sorting, pagination, scoring, and ranking.
• Index Synchronization: Riak KV uses intelligent monitoring to detect changes to data and propagates them to the Solr indexes, keeping the two synchronized.
• Solr API Support: Query data in Riak KV using existing Solr APIs.
• Auto-Restart: Solr OS processes are monitored continuously and automatically started or restarted whenever failures are detected.
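Riak Search in Riak 2.x exposes Solr queries over HTTP at /search/query/<index>, accepting standard Solr parameters such as q, wt, rows, start, and sort. A minimal sketch, assuming a hypothetical index named tweets_idx built on Riak's default schema (whose dynamic fields use suffixes such as _s for strings and _i for integers); the field names and helper functions are illustrative only.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit
from urllib.request import urlopen

RIAK_HTTP = "http://localhost:8098"  # assumption: local node, default HTTP port


def search_url(index: str, query: str, rows: int = 10, start: int = 0,
               sort: str = "", base: str = RIAK_HTTP) -> str:
    """Build a Solr-style query URL against Riak Search (hypothetical helper)."""
    params = {"wt": "json", "q": query, "rows": rows, "start": start}
    if sort:
        params["sort"] = sort  # e.g. "score desc"
    return f"{base}/search/query/{quote(index, safe='')}?{urlencode(params)}"


def search(index: str, query: str, **kwargs) -> list:
    """Run the query against a live node and return Solr's matching docs."""
    with urlopen(search_url(index, query, **kwargs), timeout=5) as resp:
        return json.load(resp)["response"]["docs"]


# Exact match plus a range clause, second page of 20 results (fields hypothetical).
url = search_url("tweets_idx", "user_s:mjbrender AND retweets_i:[10 TO *]",
                 rows=20, start=20)
print(url)
print(parse_qs(urlsplit(url).query)["q"])
```

Pagination here is plain Solr rows/start; the distributed fan-out across Riak nodes happens server-side, so the client sends one query.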
Polylingual Querying
There is a diverse group of client libraries for Riak that support both the HTTP and Protocol Buffer APIs, spanning Basho Supported Libraries and Community Libraries:
• Java • Clojure • Ruby • Go • Python • Perl • PHP • Scala • Erlang • R • .NET • Node.js • C
Scale means
Sharding
Sharding Strategies
[Diagram: a Master with three Slaves, OR Node 1, Node 2, Node 3]
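One way to see why Dynamo-style systems such as Riak place data on a hash ring rather than using hash(key) mod N is to count how many keys change owner when a fourth node joins a three-node cluster. The sketch below is a generic consistent-hashing ring with virtual nodes (the node names, key names, and 64 points per node are assumptions). Riak's own ring instead divides a 160-bit SHA-1 keyspace into a fixed number of equal partitions that nodes claim, but the effect on data movement is similar.

```python
import hashlib
from bisect import bisect_right


def h(value: str) -> int:
    """160-bit SHA-1 hash of a string, as an integer."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")


def modulo_owner(key: str, nodes: list) -> str:
    """Naive sharding: most keys change owner whenever len(nodes) changes."""
    return nodes[h(key) % len(nodes)]


class Ring:
    """Consistent-hashing ring: each node owns several points on the circle."""

    def __init__(self, nodes: list, points_per_node: int = 64):
        self.points = sorted((h(f"{node}#{i}"), node)
                             for node in nodes for i in range(points_per_node))
        self.hashes = [p for p, _ in self.points]

    def owner(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping past the top.
        idx = bisect_right(self.hashes, h(key)) % len(self.points)
        return self.points[idx][1]


keys = [f"tweet:{i}" for i in range(10_000)]
three = ["node1", "node2", "node3"]
four = three + ["node4"]

mod_moved = [k for k in keys if modulo_owner(k, three) != modulo_owner(k, four)]
ring3, ring4 = Ring(three), Ring(four)
ring_moved = [k for k in keys if ring3.owner(k) != ring4.owner(k)]

print(f"hash mod N moved {len(mod_moved) / len(keys):.0%} of keys")
print(f"ring moved {len(ring_moved) / len(keys):.0%} of keys")
```

With modulo placement a key stays put only when its hash leaves the same remainder mod 3 and mod 4, so roughly three quarters of the keys move. On the ring, only keys that now fall on node4's new points move, roughly the new node's fair share.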
Sharding Strategies
CAP Theorem
• AP: Riak, Cassandra, Couchbase, Voldemort
• CA: RDBMS, MySQL, Postgres
• CP: MongoDB, BigTable, Redis, HBase
What Are You Sacrificing?
• CA
  • Data is consistent and can be read and written from any node until a partition occurs; then the nodes fall out of sync (and won't re-sync)
• CP
  • Data is consistent across all nodes, and partition tolerance (no data de-sync) is maintained by becoming unavailable when a node goes down
• AP
  • Nodes remain online even if they can't communicate with each other and re-sync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)
The Dynamo Paper
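The Dynamo paper describes replicating each key to N nodes and tuning how many replicas must answer a read (R) and acknowledge a write (W); Riak exposes these as the n_val, r, and w bucket properties. Choosing R + W > N means every read quorum shares at least one replica with every write quorum. The brute-force check below is plain Python written for illustration, not anything Riak runs, and it confirms that rule for small clusters:

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible R-replica read set intersects every W-replica write set."""
    replicas = range(n)
    return all(set(read) & set(write)
               for read in combinations(replicas, r)
               for write in combinations(replicas, w))


# Exhaustive results for every (N, R, W) with N up to 5.
table = {(n, r, w): quorums_always_overlap(n, r, w)
         for n in range(1, 6) for r in range(1, n + 1) for w in range(1, n + 1)}

print(quorums_always_overlap(3, 2, 2))  # True
print(quorums_always_overlap(3, 1, 1))  # False
```

Dynamo, and Riak by default, actually use sloppy quorums with hinted handoff, so under node failures the replicas that answer may not be the key's usual N; the overlap guarantee shown here is for the strict case.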
Conflict
Conflict Resolution
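Before a conflict can be resolved it has to be detected. The Dynamo paper does this with vector clocks, and Riak tracks causality the same way (newer releases can use dotted version vectors); when two writes are concurrent and siblings are enabled (allow_mult), Riak keeps both values as siblings for the client to resolve. A generic vector-clock comparison follows, with hypothetical actor names; it is not Riak's internal encoding.

```python
from typing import Dict

Clock = Dict[str, int]  # actor id -> number of updates seen from that actor


def increment(clock: Clock, actor: str) -> Clock:
    """Return a new clock recording one more update by `actor`."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def compare(a: Clock, b: Clock) -> str:
    """Classify two versions as 'equal', 'before', 'after', or 'concurrent'."""
    actors = set(a) | set(b)
    a_le_b = all(a.get(x, 0) <= b.get(x, 0) for x in actors)
    b_le_a = all(b.get(x, 0) <= a.get(x, 0) for x in actors)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a: b can simply replace a
    if b_le_a:
        return "after"       # a descends from b
    return "concurrent"      # neither saw the other: keep both as siblings


base = increment({}, "client_a")          # first write
left = increment(base, "client_b")        # two clients update the same version
right = increment(base, "client_c")

print(compare(base, left))   # before
print(compare(left, right))  # concurrent
```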
set conflict resolution
2015:05:25 { ["Tom" : "Beth"], ["Beth" : "Tom"], ["George" : "Jim"] }
2015:05:26 { ["George" : "Tom"], ["Beth" : "Jim"], ["George" : "Jim"] }
2015:05:27 { ["Beth" : "Tom"], ["Beth" : "Jim"], ["Beth" : "George"] }
set conflict resolution
[Diagram sequence: three Clients and Riak]
• Riak and a client both hold { ["Tom" : "Beth"], ["Beth" : "Tom"], ["George" : "Jim"] }.
• One client adds ["Jane" : "Tom"], and Riak then holds { ["Jane" : "Tom"], ["Tom" : "Beth"], ["Beth" : "Tom"], ["George" : "Jim"] }.
• Two clients and Riak now hold that four-element set.
• One client adds ["Tom" : "Jane"]; another adds ["Beth" : "Jane"].
• Two diverging versions result:
  { ["Jane" : "Tom"], ["Tom" : "Beth"], ["Beth" : "Tom"], ["George" : "Jim"], ["Tom" : "Jane"] }
  { ["Jane" : "Tom"], ["Tom" : "Beth"], ["Beth" : "Tom"], ["George" : "Jim"], ["Beth" : "Jane"] }
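The two diverging set versions at the end of this sequence can be merged on read by taking their union, which keeps both ["Tom" : "Jane"] and ["Beth" : "Jane"]. A minimal sketch of that resolver in Python follows (the function name is hypothetical). The second half shows its known weakness: a plain union cannot tell a removed element from one that was never added, so a delete made on one sibling comes back after the merge. That gap is one reason Riak 2.0 also ships a convergent set data type.

```python
from functools import reduce
from typing import FrozenSet, Iterable, Tuple

Pair = Tuple[str, str]


def resolve_by_union(siblings: Iterable[FrozenSet[Pair]]) -> FrozenSet[Pair]:
    """Merge sibling versions of a set by keeping every element any of them holds."""
    return reduce(lambda merged, sibling: merged | sibling, siblings, frozenset())


# Version everyone agreed on before the concurrent updates.
shared = frozenset({("Jane", "Tom"), ("Tom", "Beth"), ("Beth", "Tom"), ("George", "Jim")})

sibling_1 = shared | {("Tom", "Jane")}   # one client's addition
sibling_2 = shared | {("Beth", "Jane")}  # another client's concurrent addition
merged = resolve_by_union([sibling_1, sibling_2])
print(sorted(merged))

# Weakness: a concurrent removal is undone, because union only ever adds.
removed = shared - {("George", "Jim")}
resurrected = resolve_by_union([sibling_1, removed])
print(("George", "Jim") in resurrected)  # True
```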
Recommend
More recommend