supercharging cassandra

Supercharging Cassandra... Tom Wilkie, Founder & VP Engineering



  1. Supercharging Cassandra... Tom Wilkie Founder & VP Engineering @tom_wilkie

  2. Before the Flood (1990): small databases; BTree indexes; BTree file systems; RAID; old hardware

  3. Two Revolutions (2010): distributed, shared-nothing databases. On each node: write-optimised indexes; BTree file systems; RAID; new hardware

  4. Bridging the Gap (2011): distributed, shared-nothing databases. On each node: Castle; new hardware

  5. [Diagram labels: Big Data Applications; Memcached; Open API; Management (Deployment, Monitoring); Acunu Storage Core; Cross-Cluster Management UI]

  6. 1. Predictability

  7. Small random inserts: inserting 3 billion rows [plot legend: Acunu powered Cassandra vs ‘standard’ Cassandra]

  8. Insert latency while inserting 3 billion rows [plot legend: Acunu powered Cassandra (x) vs ‘standard’ Cassandra (+)]

  9. Small random range queries, performed immediately after inserts [plot legend: Acunu powered Cassandra vs ‘standard’ Cassandra]

  10. Performance summary

                                    Standard    Acunu     Benefits
      inserts         rate          ~32k/s      ~45k/s    >1.4x
                      95% latency   ~32s        ~0.3s     >100x
      gets            rate          ~100/s      ~350/s    >3.5x
                      95% latency   ~2s         ~0.5s     >4x
      range queries   rate          ~0.4/s      ~40/s     >100x
                      95% latency   ~15s        ~2s       >7.5x

  11. Doubling Array Inserts: buffer arrays in memory until we have > B of them [diagram: keys 2 and 9 being inserted]

  12. Doubling Array Inserts, etc... Similar to log-structured merge trees (LSM), cache-oblivious lookahead array (COLA), ... [diagram: keys 11 and 8 inserted next; the arrays merge into 2 8 9 11]
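
The insert sequence on the two slides above can be sketched in a few lines of Python. This is a toy, in-memory illustration only, not Castle's implementation: the class name `DoublingArray` and its methods are hypothetical, it ignores the block size B and all on-disk concerns, and it assumes the simplest variant, in which level k is either empty or holds one sorted run of 2**k keys.

```python
from bisect import bisect_left, bisect_right
from heapq import merge


class DoublingArray:
    """Toy doubling array: level k is None or a sorted list of 2**k keys."""

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]
        for k, run in enumerate(self.levels):
            if run is None:
                # Free slot: the carried run has exactly 2**k keys.
                self.levels[k] = carry
                return
            # Slot occupied: merge the two sorted runs in one sequential
            # pass and carry the doubled run to the next level.
            carry = list(merge(run, carry))
            self.levels[k] = None
        self.levels.append(carry)

    def contains(self, key):
        # A lookup must check every non-empty level.
        for run in self.levels:
            if run:
                i = bisect_left(run, key)
                if i < len(run) and run[i] == key:
                    return True
        return False

    def range_query(self, lo, hi):
        """Return all keys with lo <= key <= hi, in sorted order."""
        hits = (
            run[bisect_left(run, lo):bisect_right(run, hi)]
            for run in self.levels
            if run
        )
        return list(merge(*hits))
```

Inserting 2, 9, 11, 8 in that order leaves a single sorted run `[2, 8, 9, 11]` at level 2, matching the slide. The merges behave like carries in a binary counter, which is why every write is a sequential pass over sorted data rather than a random update in place.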

  13. Demo https://acunu-videos.s3.amazonaws.com/dajs.html

  14. B = “block size”, say 8KB at 100 bytes/entry ~= 100 entries
      • B-Tree: Update O(log_B N) random IOs; Range Query (Size Z) O(Z/B) random IOs
        8KB @ 100MB/s, w/ 8ms seek = 100 IOs/s; ~ log (2^30)/log 100 = 5 IOs/update; 100 / 5 = 20 updates/s
      • Doubling Array: Update O((log N)/B) sequential IOs; Range Query (Size Z) O(Z/B) sequential IOs
        8KB @ 100MB/s = 13k IOs/s; ~ log (2^30)/100 = 0.2 IOs/update; 13k / 0.2 = 65k updates/s
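
The back-of-envelope arithmetic on this slide can be reproduced as follows. It is a sketch under stated assumptions: the slide's figures appear to use natural logarithms (that choice gives 0.2 IOs/update for the doubling array), the 100 random IOs/s figure is taken from the slide as given, and all constant and function names are made up for the illustration.

```python
import math

N = 2 ** 30                 # entries in the index (the slide's example)
B = 100                     # entries per 8KB block at ~100 bytes/entry
RANDOM_IOS_PER_SEC = 100    # slide's rounded figure for 8KB IOs with an 8ms seek
SEQ_IOS_PER_SEC = 100 * 1024 / 8   # 100MB/s in 8KB blocks = 12800, "13k" on the slide


def btree_ios_per_update(n=N, b=B):
    """O(log_B N): one random IO per level of the tree."""
    return math.log(n) / math.log(b)


def doubling_array_ios_per_update(n=N, b=B):
    """O((log N)/B): each merge pass is amortised over B entries per block."""
    return math.log(n) / b


btree_updates_per_sec = RANDOM_IOS_PER_SEC / btree_ios_per_update()
da_updates_per_sec = SEQ_IOS_PER_SEC / doubling_array_ios_per_update()

print(f"B-Tree:         {btree_ios_per_update():.2f} IOs/update, "
      f"{btree_updates_per_sec:.0f} updates/s")
print(f"Doubling Array: {doubling_array_ios_per_update():.2f} IOs/update, "
      f"{da_updates_per_sec:.0f} updates/s")
```

Without the slide's intermediate rounding the results come out at roughly 4.5 IOs/update and 22 updates/s for the B-Tree, and roughly 0.2 IOs/update and 60k updates/s for the doubling array, in line with the slide's 20 and 65k.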

  15. More
      • Opensource (GPLv2, MIT for user libraries)
      • http://bitbucket.org/acunu
      • Loadable Kernel Module, targeting CentOS’s 2.6.18
      • http://www.acunu.com/blogs/andy-twigg/why-acunu-kernel/
      • http://goo.gl/wXNDQ
      • http://goo.gl/gzihe
      [Castle architecture diagram, labels from top to bottom: Userspace / Acunu Kernel; userspace interface (shared memory interface for keys and values, in-kernel workloads, async shared memory ring, shared buffers); kernelspace interface (streaming interface, range queries, key insert, buffered value insert, key get, buffered value get); Doubling Arrays (doubling array mapping layer, Bloom filters, insert queues, arrays management, merges); Arrays mapping layer (modlist btree, Version tree); Cache (block mapping & cacheing layer, prefetcher, extent block cache, flusher); "Extent" layer (extent allocator, freespace manager, mapper); Linux Kernel (linux's block & MM layers: Block layer, page cache, Memory manager)]

  16. 2. Monitoring

  17. jQuery VisualVM

  18. mx4j: Rest-JMX adapter Munin, Nagios etc

  19. 3. Operations

  20. -bash-3.2$ nodetool ...
      Available commands:
        ring - Print informations on the token ring
        join - Join the ring
        info - Print node informations (uptime, load, ...)
        cfstats - Print statistics on column families
        version - Print cassandra version
        tpstats - Print usage statistics of thread pools
        drain - Drain the node (stop accepting writes and flush all column families)
        decommission - Decommission the node
        compactionstats - Print statistics on compactions
        disablegossip - Disable gossip (effectively marking the node dead)
        enablegossip - Reenable gossip
        disablethrift - Disable thrift server
        enablethrift - Reenable thrift server
        netstats [host] - Print network information on provided host (connecting node by default)
        move <new token> - Move node on the token ring to a new token
        removetoken status|force|<token> - Show status of current token removal, force completion of pending removal or remove providen token
        setcompactionthroughput <value_in_mb> - Set the MB/s throughput cap for compaction in the system, or 0 to disable throttling.
        snapshot [keyspaces...] -t [snapshotName] - Take a snapshot of the specified keyspaces using optional name snapshotName
        clearsnapshot [keyspaces...] -t [snapshotName] - Remove snapshots for the specified keyspaces. Either remove all snapshots or remove the snapshots with the given name.
        flush [keyspace] [cfnames] - Flush one or more column family
        repair [keyspace] [cfnames] - Repair one or more column family
        cleanup [keyspace] [cfnames] - Run cleanup on one or more column family
        compact [keyspace] [cfnames] - Force a (major) compaction on one or more column family
        scrub [keyspace] [cfnames] - Scrub (rebuild sstables for) one or more column family
        invalidatekeycache [keyspace] [cfnames] - Invalidate the key cache of one or more column family
        invalidaterowcache [keyspace] [cfnames] - Invalidate the key cache of one or more column family
        getcompactionthreshold <keyspace> <cfname> - Print min and max compaction thresholds for a given column family
        cfhistograms <keyspace> <cfname> - Print statistic histograms for a given column family
        setcachecapacity <keyspace> <cfname> <keycachecapacity> <rowcachecapacity> - Set the key and row cache capacities of a given column family
        setcompactionthreshold <keyspace> <cfname> <minthreshold> <maxthreshold> - Set the min and max compaction thresholds for a given column family
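
As an illustration of how these commands tend to get scripted, here is a minimal Python sketch that flushes a keyspace and then snapshots it. The `flush` and `snapshot ... -t` argument shapes are taken from the help text above; the `-h` host option, the helper names `nodetool_argv` and `flush_and_snapshot`, and the flush-before-snapshot ordering are assumptions made for the example, not something the slide prescribes.

```python
import subprocess


def nodetool_argv(command, *args, host=None):
    """Build the argv list for one nodetool invocation (nothing is executed here)."""
    argv = ["nodetool"]
    if host is not None:
        argv += ["-h", host]  # assumption: -h selects the node to connect to
    return argv + [command, *args]


def flush_and_snapshot(keyspace, snapshot_name, host=None, run=subprocess.check_call):
    """Flush a keyspace, then snapshot it under the given name.

    `run` is injectable so the sequence can be inspected without a cluster.
    """
    steps = [
        nodetool_argv("flush", keyspace, host=host),
        nodetool_argv("snapshot", keyspace, "-t", snapshot_name, host=host),
    ]
    for argv in steps:
        run(argv)
    return steps
```

Calling `flush_and_snapshot("ks1", "before_upgrade", run=print)` prints the two command lines instead of running them, which is a convenient dry run; the keyspace and snapshot names here are placeholders.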

  21. * SNAPSHOTS * And clones!

  22. [Diagram: versions v0 to v6]

  23. Rebuild

  24. Disk Layout: RDA (random duplicate allocation) [diagram: numbered blocks 1 to 16, each appearing more than once across the disks]
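
The slide gives only the name and a picture, so the following Python sketch is a guess at the idea rather than Acunu's allocator: it assumes that every block is written to two distinct disks chosen at random. The function names `rda_layout` and `rebuild_sources` are hypothetical.

```python
import random
from collections import Counter


def rda_layout(num_blocks, num_disks, copies=2, seed=0):
    """Map each block to `copies` distinct, randomly chosen disks."""
    rng = random.Random(seed)
    return {
        block: rng.sample(range(num_disks), copies)
        for block in range(num_blocks)
    }


def rebuild_sources(layout, failed_disk):
    """Count, per surviving disk, how many blocks it supplies to rebuild failed_disk."""
    sources = Counter()
    for disks in layout.values():
        if failed_disk in disks:
            survivor = next(d for d in disks if d != failed_disk)
            sources[survivor] += 1
    return sources


layout = rda_layout(num_blocks=1000, num_disks=8)
print(sorted(rebuild_sources(layout, failed_disk=0).items()))
```

Under this assumption the surviving copies of a failed disk's blocks are scattered over many other disks, so a rebuild can read from all of them in parallel instead of draining a single mirror partner. That would be consistent with the fast-rebuild claim later in the deck.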

  25. Future

  26. Memcache + Cassandra: 100k random inserts/sec! [diagram: a memcached client (get/put) and a Cass client (get/insert) talk to nodes that each run Cassandra and memcache on Castle on H/W]

  27. [Diagram: four copies of a version tree with versions v1, v12, v13, v15, v16, v24]

  28. Beware the “write cliff”... [plot annotation: ~device capacity]

  29. • Castle: Predictable Performance for Big Data
      • Monitoring: distributed, multi-master tools give you an aggregated and summarised view of your cluster
      • Snapshots & Clones: addressing real problems with new workloads
      • RDA: lightning-fast rebuilds for massive disks

  30. Questions? Tom Wilkie @tom_wilkie tom@acunu.com http://bitbucket.org/acunu http://github.com/acunu http://www.acunu.com/download http://www.acunu.com/insights
