  1. STI-BT: A Scalable Transactional Index. Nuno Diegues and Paolo Romano. 34th International Conference on Distributed Computing Systems (ICDCS)

  4. Distributed Key-Value (DKV) stores rise in popularity:
     • scalability
     • fault-tolerance
     • elasticity
     Recent trend:
     • cloud adoption, large/elastic scaling
     • move towards strong consistency
     • easy and transparent APIs

  8. We focus on two main disadvantages:
     • they typically embrace weak consistency
     • key-value access is too simplistic: mainly an index on the primary key
     Providing a secondary index is non-trivial, but it is desirable!
     State-of-the-art solutions either:
     • did not provide strongly consistent transactions (programming becomes more difficult)
     • were not fully decentralised (not scalable)
     • required several hops to access the index (more latency)

  11. In this work we present STI-BT: a Scalable Transactional Index
      • serializable distributed transactions
      • secondary indexes via a distributed B+Tree implementation
      • index accesses/changes obey the transactions' semantics
      Provides strong consistency + scalable indexing
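To make slide 11 concrete, below is a minimal, single-process sketch (not the paper's implementation; the class and method names are made up) of what it means for index changes to obey the transaction's semantics: the write of a record under its primary key and the corresponding secondary-index update become visible together, and the index can then serve range scans.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class AtomicIndexSketch {
        static final Map<String, String> records = new HashMap<>();          // primary key -> record
        static final TreeMap<String, String> salaryIndex = new TreeMap<>();  // ordered map standing in for the B+Tree

        // Apply the record write and the index update as a single step, so no reader
        // can observe the record without its index entry or vice versa.
        static synchronized void putEmployee(String id, String record, String salary) {
            records.put(id, record);
            salaryIndex.put(salary + "#" + id, id);   // composite key keeps equal salaries distinct
        }

        static synchronized void printSalaryRange(String from, String to) {
            // Range scan over the secondary index, then fetch each record by primary key.
            salaryIndex.subMap(from, to).values().forEach(id -> System.out.println(records.get(id)));
        }

        public static void main(String[] args) {
            putEmployee("e42", "Alice, engineering", "053000");
            putEmployee("e43", "Bob, sales", "047000");
            printSalaryRange("040000", "060000");     // prints Bob then Alice, ordered by salary
        }
    }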

  12. Outline
      • Background on DKV stores
      • STI-BT
      • Evaluation
      • Related Work

  17. Background on the DKV store
      Infinispan: a DKV store by Red Hat
      Distributed vector-clock-based protocol: GMU [ICDCS’12]
      • Genuine (partial replication)
      • Multi-versioned: read-only transactions do not abort
      • Update Serializability for update transactions
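As a rough illustration of why the multi-versioned design lets read-only transactions run without aborting, here is a toy sketch of the visibility rule. It uses a single scalar timestamp per version purely for brevity; GMU itself orders versions with vector clocks, as the slide notes.

    import java.util.Map;
    import java.util.TreeMap;

    public class MultiVersionSketch {
        // commit timestamp -> value; versions are added, never overwritten in place
        private final TreeMap<Long, String> versions = new TreeMap<>();

        synchronized void write(long commitTimestamp, String value) {
            versions.put(commitTimestamp, value);
        }

        synchronized String readAtSnapshot(long snapshot) {
            Map.Entry<Long, String> e = versions.floorEntry(snapshot);  // newest version not newer than the snapshot
            return e == null ? null : e.getValue();
        }

        public static void main(String[] args) {
            MultiVersionSketch cell = new MultiVersionSketch();
            cell.write(10, "v1");
            long snapshot = 15;                      // a read-only transaction fixes its snapshot here
            cell.write(20, "v2");                    // a concurrent update commits later
            System.out.println(cell.readAtSnapshot(snapshot));  // still "v1": the reader needs no abort
        }
    }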

  21. GMU data set (figure): the data set is partitioned across the servers by a consistent hash function, with replication degree 2
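Slides 18-21 only show this placement as a figure; the sketch below spells out one common way such a scheme can work (the ring construction and the toy hash are assumptions, not Infinispan's actual code): each key is stored on the next two distinct servers found clockwise on a hash ring.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    public class ConsistentHashSketch {
        private final TreeMap<Integer, String> ring = new TreeMap<>();  // position on the ring -> server
        private final int replicationDegree;

        ConsistentHashSketch(List<String> servers, int replicationDegree) {
            this.replicationDegree = replicationDegree;
            for (String s : servers) {
                ring.put(hash(s), s);      // one position per server; real systems add virtual nodes
            }
        }

        static int hash(String key) {
            return Integer.remainderUnsigned(key.hashCode() * 0x9E3779B9, 1 << 16);  // toy 16-bit ring
        }

        // The key is stored on the first `replicationDegree` servers found clockwise from its position.
        List<String> ownersOf(String key) {
            List<String> owners = new ArrayList<>();
            Integer pos = ring.ceilingKey(hash(key));
            if (pos == null) pos = ring.firstKey();        // wrap around the end of the ring
            while (owners.size() < replicationDegree) {
                owners.add(ring.get(pos));
                pos = ring.higherKey(pos);
                if (pos == null) pos = ring.firstKey();    // wrap around again if needed
            }
            return owners;
        }

        public static void main(String[] args) {
            ConsistentHashSketch placement = new ConsistentHashSketch(List.of("S1", "S2", "S3", "S4"), 2);
            System.out.println("key P stored on " + placement.ownersOf("P"));
            System.out.println("key Z stored on " + placement.ownersOf("Z"));
        }
    }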

  25. GMU: genuine partial replication
      • no central component
      • a transaction involves only the machines holding the data it reads/writes
      • committing a transaction runs consensus only among those machines
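The sketch below illustrates this "genuine" property (the helper and its inputs are made up for illustration): the set of servers contacted at commit time is derived only from the keys in the transaction's read and write sets.

    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;
    import java.util.function.Function;

    public class GenuineCommitSketch {
        // placement maps a key to the servers replicating it (e.g. the consistent hash above).
        static Set<String> commitParticipants(Set<String> readSet, Set<String> writeSet,
                                              Function<String, Set<String>> placement) {
            Set<String> servers = new TreeSet<>();
            for (String key : readSet)  servers.addAll(placement.apply(key));
            for (String key : writeSet) servers.addAll(placement.apply(key));
            return servers;   // the commit consensus involves only these servers, nobody else
        }

        public static void main(String[] args) {
            // Toy placement: each key lives on exactly two of the four servers.
            Map<String, Set<String>> placement = Map.of(
                    "P", Set.of("S1", "S2"),
                    "Z", Set.of("S3", "S4"));
            // A transaction that only read and wrote key P commits among S1 and S2 only.
            System.out.println(commitParticipants(Set.of("P"), Set.of("P"), placement::get));
        }
    }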

  26. Outline
      • Background on DKV stores
      • STI-BT
        - maximizing data locality
        - hybrid replication
        - elastic scaling
        - concurrency enhancements
      • Evaluation
      • Related Work

  29. The need for data locality of the index
      Starting point:
      • consider a distributed B+Tree built on the DKV
      • tree nodes placed with the random consistent hash
      (figure: tree nodes scattered across servers S1-S4 by the consistent hash function)
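To ground this starting point, here is a small sketch of the assumed representation: each B+Tree node is an ordinary key-value entry keyed by a node id, so a lookup walks node ids from the root to a leaf and each fetch may land on a different server, which is exactly the "several hops" problem of the next slides.

    import java.util.List;
    import java.util.Map;

    public class TreeOnKvSketch {
        // A node holds sorted separator keys plus either child node ids (inner node) or values (leaf).
        record Node(List<String> keys, List<String> childIds, List<String> values, boolean leaf) {}

        static String lookup(Map<String, Node> kvStore, String rootId, String searchKey) {
            Node node = kvStore.get(rootId);                      // fetching the root: possibly a remote hop
            while (!node.leaf()) {
                int i = 0;
                while (i < node.keys().size() && searchKey.compareTo(node.keys().get(i)) >= 0) i++;
                node = kvStore.get(node.childIds().get(i));       // one more potential hop per tree level
            }
            int i = node.keys().indexOf(searchKey);
            return i < 0 ? null : node.values().get(i);
        }

        public static void main(String[] args) {
            Map<String, Node> store = Map.of(
                    "root",  new Node(List.of("M"), List.of("leafA", "leafB"), List.of(), false),
                    "leafA", new Node(List.of("C", "F"), List.of(), List.of("cat", "fox"), true),
                    "leafB", new Node(List.of("P", "Z"), List.of(), List.of("pig", "zebra"), true));
            System.out.println(lookup(store, "root", "Z"));       // -> zebra, after two node fetches
        }
    }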

  37. Current problems with data locality
      Problems with consistent hashing data placement:
      - one index access entails several hops
      - some servers receive more load than others
      - range scan operations are also inefficient
      (figure: a "delete Z" traversal and a "scan P to Z" hop across servers S1-S4, with uneven server load)

  39. Where typical solutions fall short
      Partial replication of the index:
      • poor locality
      • poor load balancing
      Full replication of the index:
      • consensus on updates is too expensive
      • prevents scaling out storage

  42. STI-BT: maximizing data locality of the index
      Hybrid replication:
      • top nodes are accessed more but modified less
      • better load balancing, and the expensive consensus cost is paid only rarely
      Co-located data placement:
      • sub-trees are grouped to reduce network hops
      • the transaction migrates to exploit co-location
      (figure: nodes above the cut-off level C are fully replicated; the sub-trees below it are partially replicated across S1-S4)
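A minimal sketch of the hybrid replication rule, under the assumption that the decision is made purely per tree level: nodes above the cut-off level are replicated on every server, while deeper nodes live only on the servers that own their sub-tree.

    import java.util.List;

    public class HybridReplicationSketch {
        enum Mode { FULL, PARTIAL }

        static Mode replicationMode(int nodeDepth, int cutOffLevel) {
            // Shallow nodes are read often but rarely split/modified, so full replication is cheap.
            return nodeDepth < cutOffLevel ? Mode.FULL : Mode.PARTIAL;
        }

        // Fully replicated nodes live on every server; partially replicated nodes live only on
        // the owners chosen for their co-located sub-tree.
        static List<String> placement(int nodeDepth, int cutOffLevel,
                                      List<String> allServers, List<String> subTreeOwners) {
            return replicationMode(nodeDepth, cutOffLevel) == Mode.FULL ? allServers : subTreeOwners;
        }

        public static void main(String[] args) {
            List<String> all = List.of("S1", "S2", "S3", "S4");
            List<String> owners = List.of("S2", "S3");            // owners of one co-located sub-tree
            System.out.println(placement(0, 2, all, owners));     // root: [S1, S2, S3, S4]
            System.out.println(placement(3, 2, all, owners));     // deep node: [S2, S3]
        }
    }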

  50. Transaction migration driven by data co-location (animated figure)
      Lookup K:
      1. the lookup starts at the root of the index on the local server
      2. local search through the fully replicated top levels, down to the cut-off level C
      3. the transaction migrates to the server that owns the partially replicated sub-tree containing K
      4. local search continues inside that co-located sub-tree until K is reached
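The sketch below puts those four steps into code form. The data layout (a fully replicated top map, references to sub-tree roots with an owner server, and per-server local maps) is an assumption made to keep the example self-contained, and "migration" is modelled simply as switching to the owner's local map.

    import java.util.List;
    import java.util.Map;

    public class MigratingLookupSketch {
        // Inner node of the fully replicated top: sorted separator keys, one child id per slot.
        record TopNode(List<String> keys, List<String> childIds) {}
        // Reference from the top of the tree to a partially replicated sub-tree and its owner server.
        record SubTreeRef(String rootId, String owner) {}
        // A sub-tree is flattened to a single leaf here just to keep the sketch short.
        record Leaf(List<String> keys, List<String> values) {}

        static int childIndex(List<String> keys, String k) {
            int i = 0;
            while (i < keys.size() && k.compareTo(keys.get(i)) >= 0) i++;
            return i;
        }

        static String lookup(Map<String, TopNode> top, Map<String, SubTreeRef> refs,
                             Map<String, Map<String, Leaf>> servers, String rootId, String key) {
            // Steps 1-2: purely local search through the fully replicated top levels.
            String nodeId = rootId;
            while (top.containsKey(nodeId)) {
                TopNode n = top.get(nodeId);
                nodeId = n.childIds().get(childIndex(n.keys(), key));
            }
            // Step 3: the child is below the cut-off level; migrate the transaction to its owner.
            SubTreeRef ref = refs.get(nodeId);
            Map<String, Leaf> ownerStore = servers.get(ref.owner());   // execution continues on that server
            // Step 4: local search inside the co-located sub-tree.
            Leaf leaf = ownerStore.get(ref.rootId());
            int i = leaf.keys().indexOf(key);
            return i < 0 ? null : leaf.values().get(i);
        }

        public static void main(String[] args) {
            Map<String, TopNode> top = Map.of(
                    "root", new TopNode(List.of("M"), List.of("subL", "subR")));
            Map<String, SubTreeRef> refs = Map.of(
                    "subL", new SubTreeRef("leafL", "S1"),
                    "subR", new SubTreeRef("leafR", "S2"));
            Map<String, Map<String, Leaf>> servers = Map.of(
                    "S1", Map.of("leafL", new Leaf(List.of("C", "F"), List.of("cat", "fox"))),
                    "S2", Map.of("leafR", new Leaf(List.of("P", "Z"), List.of("pig", "zebra"))));
            System.out.println(lookup(top, refs, servers, "root", "Z"));  // migrates to S2, returns zebra
        }
    }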

  53. Grouping the index in sub-trees
      Still rely on consistent hashing:
      • preserves the fully decentralized design and quick lookup of data
      Exploit knowledge of the structure of the indexed data:
      • general-purpose data placement is agnostic of the data
      • but we know how the index will be structured
      (figure: every node of a sub-tree carries the sub-tree's unique key k_u; the consistent hash of k_u selects the server, and a local map lookup then resolves the node)
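One way to realize this grouping, sketched below under the assumption that node keys are (k_u, node id) pairs: the consistent hash is computed over k_u alone, so every node of a sub-tree lands on the same server, and the node id is resolved by a plain local map lookup on that server.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SubTreeGroupingSketch {
        // k_u (the sub-tree's unique key) decides placement; nodeId is resolved locally on the server.
        record NodeKey(String subTreeKu, String nodeId) {}

        // Stand-in for the consistent hash function, computed over k_u only.
        static String serverFor(String ku, List<String> servers) {
            return servers.get(Math.floorMod(ku.hashCode(), servers.size()));
        }

        public static void main(String[] args) {
            List<String> servers = List.of("S1", "S2", "S3", "S4");
            Map<String, Map<NodeKey, String>> cluster = new HashMap<>();   // server -> its local map
            servers.forEach(s -> cluster.put(s, new HashMap<>()));

            // Every node of sub-tree "st7" shares k_u = "st7", so all of them land on the same server.
            for (String nodeId : List.of("inner-0", "leaf-0", "leaf-1")) {
                NodeKey key = new NodeKey("st7", nodeId);
                cluster.get(serverFor(key.subTreeKu(), servers)).put(key, "contents of " + nodeId);
            }

            // Lookup: the consistent hash on k_u picks the server, a local map lookup finds the node.
            String server = serverFor("st7", servers);
            System.out.println(server + " holds " + cluster.get(server).keySet());
        }
    }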
