Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index

Sarath Lakshman, Sriram Melkote, John Liang, Ravi Mayuram (Couchbase, Inc). Presenter: Xiaoyao Qian, 04.04.2017


  1. Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index. Sarath Lakshman, Sriram Melkote, John Liang, Ravi Mayuram (Couchbase, Inc). Presenter: Xiaoyao Qian, 04.04.2017

  2. 4 million entries/sec; 10 million lookups/sec

  3. https://www.mysql.com/why-mysql/benchmarks/

  4. Motivation

  6. Outline: Lock-Free Skiplist, MVCC, GC, Backup & Recovery, Memory Reclamation, Evaluation, GSI. Lock-Free Skiplist: Ordered Linked List

  7. n: number of nodes in the next level; f: fanout factor. Average O(log N) for insert, lookup, and delete

  8. Lock-free List Operations

  9. DoubleCAS: isdeleted=0 -> isdeleted=1 (figure: list nodes 1, 4, 8, 6)
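
The idea behind the DoubleCAS on this slide is that a node's next pointer and its isdeleted flag must change together, so that an insert cannot succeed behind a node that was concurrently marked deleted. Go has no double-word compare-and-swap, so the sketch below substitutes a different but common technique: the pointer and the flag are bundled in an immutable `link` value that is swapped through one `atomic.Pointer` (Go 1.19 or later). All type and function names here are hypothetical, and this is not how Nitro itself implements the operation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// link bundles the successor pointer with the owner's delete flag so that a
// single CompareAndSwap covers both (a stand-in for DoubleCAS).
type link struct {
	next    *node
	deleted bool
}

type node struct {
	key  int
	link atomic.Pointer[link]
}

func newNode(key int, next *node) *node {
	n := &node{key: key}
	n.link.Store(&link{next: next})
	return n
}

// softDelete flips isdeleted from 0 to 1 while keeping next unchanged.
// It returns false if the node was already marked.
func softDelete(n *node) bool {
	for {
		old := n.link.Load()
		if old.deleted {
			return false
		}
		if n.link.CompareAndSwap(old, &link{next: old.next, deleted: true}) {
			return true
		}
	}
}

// insertAfter links n directly after prev. It fails if prev is marked deleted
// or if prev's link changed between the load and the swap.
func insertAfter(prev, n *node) bool {
	old := prev.link.Load()
	if old.deleted {
		return false
	}
	n.link.Store(&link{next: old.next})
	return prev.link.CompareAndSwap(old, &link{next: n})
}

func main() {
	n8 := newNode(8, nil)
	n4 := newNode(4, n8)
	n1 := newNode(1, n4)

	fmt.Println("mark 4 deleted:", softDelete(n4))
	fmt.Println("insert 6 after deleted 4:", insertAfter(n4, newNode(6, nil)))
	fmt.Println("1 still points to:", n1.link.Load().next.key)
}
```

In a full list the failed insert would retry from an earlier live node, and the marked node would be physically unlinked later; both steps are left out here.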

  10. MVCC: Multi-Version Concurrency Control. Immutable snapshots; fast, low-overhead snapshots; avoids phantom reads; memory efficiency; fast and scalable garbage collection

  11. MVCC primitives: lifetime and descriptor. Descriptor: refcount = x; Descriptor: refcount = y

  12. Snapshot Iteration: filter out items with bornSn > termSn, or with deadSn set and deadSn <= termSn

  13. Comparison with Copy-On-Write B+ Tree (COW B+)

  14. (1) The snapshot Sn(x) descriptor shows refcount = 0. (2) The previous snapshot Sn(x-1) has been garbage collected, i.e. snapshots can only be garbage collected in the sequential order of their termSn. (3) #gc_workers = #concurrent_writers. (4) Writers keep track of a deadList, which is attached to the snapshot descriptor; whenever a node is marked as deleted, it is added to the deadList. (5) GC workers use the deadList of a snapshot to physically remove nodes from the skiplist.

  15. Backup: (1) Traverse the level 0 linked list of the skiplist and write out the entries into data files. (2) All entries that don’t belong to the snapshot are ignored. (3) Node metadata (i.e. lifetime) is not serialized; it can be recreated during recovery. Pros and cons: ✓ Minimum backup file size. ✓ Compression friendly. ✓ Since the skiplist is ordered, the data written to disk is also ordered. ❌ Could block garbage collection.

  16. Backup: shard1, shard2, shard3

  17. Recovery. Buf: [nil, nil, nil, nil]

  18. Recovery. Buf: [nil, nil, nil, nil] -> [n1, n1, n1, n1]

  19. Recovery. Buf: [n1, n1, n1, n1] -> [n2, n2, n1, n1]

  20. Recovery. Buf: [n2, n2, n1, n1] -> [n3, n3, n3, n3]

  21. Recovery. Buf: [n3, n3, n3, n3] -> [n4, n3, n3, n3]

  22. Recovery. Buf: [n4, n3, n3, n3] -> [n5, n5, n5, n5]

  23. Recovery. Buf: [n5, n5, n5, n5] -> [n6, n6, n6, n5]

  24. Recovery. Buf: [n6, n6, n6, n5] -> [n7, n6, n6, n5]

  25. Recovery. Buf: [n7, n6, n6, n5] -> [nil, nil, nil, nil]

  26. Non-intrusive Backup (between backup worker and garbage collector). INIT: the backup worker announces "Backing up termSn"; the garbage collector acks. ACTIVE: the garbage collector unlinks, and writes eligible data to delta backup files. TERMINATE: the backup worker asks "Are you done?"; the garbage collector closes the delta backup files and acks.

  28. AccessBarrier. BarrierSession: liveCount = 2; t1, t2, t3

  29. AccessBarrier. BarrierSessionClose. BarrierSession: liveCount = 2; t1, t2, t3

  30. AccessBarrier. Terminated. BarrierSession: liveCount = 2; t1, t2, t3

  33. Global Secondary Index architecture

  35. “TALK IS CHEAP, SHOW ME THE CODE”: https://github.com/couchbase/nitro. ~15,000 lines of code, mainly in Golang, with a little C/C++. Apache 2.0 License

  36. Questions & Discussions: (1) #GC_workers = #writers? Wouldn’t that be too intensive? (2) A skiplist may have poor cache utilization because its nodes are not stored in consecutive memory. Can this be optimized? (3) How can a single large index be distributed?
