
IMPLEMENTING RIAK IN ERLANG: BENEFITS AND CHALLENGES

Steve Vinoski
Basho Technologies, Cambridge, MA USA
http://basho.com
@stevevinoski
vinoski@ieee.org
http://steve.vinoski.net/
Wednesday, April 24, 13

ERLANG


  4. Consistent Hashing
     • Riak uses SHA-1 as a hash function
     • Treats its 160-bit value space as a ring
     • Divides the ring into partitions called "virtual nodes" or vnodes (default 64)
     • Each vnode claims a portion of the ring space
     • Each physical node in the cluster hosts multiple vnodes
     [figure: ring shared by physical nodes node 0 to node 3]
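
A minimal Python sketch of the vnode layout described above. It assumes the ring is cut into 64 equal partitions and uses a hypothetical round-robin claim (partition i goes to physical node i mod 4); Riak's actual claim algorithm is more involved, and `build_ring` is an illustrative name, not a riak_core function.

```python
from collections import Counter

RING_SIZE = 2 ** 160   # SHA-1 yields 160-bit values
NUM_VNODES = 64        # default partition (vnode) count from the slide

def build_ring(physical_nodes, num_vnodes=NUM_VNODES):
    """Return (partition_start, owner) pairs covering the whole ring.

    Hypothetical round-robin claim: partition i is hosted by
    physical_nodes[i % len(physical_nodes)].
    """
    step = RING_SIZE // num_vnodes   # every vnode claims an equal slice
    return [(i * step, physical_nodes[i % len(physical_nodes)])
            for i in range(num_vnodes)]

if __name__ == "__main__":
    ring = build_ring(["node0", "node1", "node2", "node3"])
    # With 64 vnodes and 4 physical nodes, each physical node hosts 16 vnodes.
    print(Counter(owner for _, owner in ring))
```

Hosting many vnodes per physical node means that adding or removing a machine moves whole vnodes between nodes rather than re-hashing keys.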

  5. Hash Ring
     [figure: ring divided among node 0 to node 3, with positions 0 (= 2^160), 2^160/4, 2^160/2, and 3*2^160/4 marked]
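
The positions marked in the figure follow from splitting the 2^160 value space evenly. Worked out for Q equal partitions (Q = 4 matches the figure; Q = 64 is the default vnode count), assuming the convention that a hash h falls in partition floor(h / (2^160/Q)):

```latex
\begin{aligned}
\text{partition size} &= \frac{2^{160}}{Q} \\
Q = 4:\quad \text{boundaries} &= 0,\ \frac{2^{160}}{4},\ \frac{2^{160}}{2},\ \frac{3 \cdot 2^{160}}{4}
  \quad (\text{position } 2^{160} \text{ wraps back to } 0) \\
Q = 64:\quad \text{partition size} &= \frac{2^{160}}{2^{6}} = 2^{154} \\
\text{partition}(h) &= \left\lfloor \frac{h}{2^{160}/Q} \right\rfloor, \qquad 0 \le h < 2^{160}
\end{aligned}
```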

  7. Hash Ring
     [figure: ring with node 0 to node 3; a bucket/key pair maps to a point on the ring]
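
Placing a bucket/key pair means hashing it and walking the ring. A minimal sketch, with two loud assumptions: the pair is encoded by simply joining bucket and key bytes (Riak hashes an Erlang-encoded term, so real positions differ), and the N replicas live on the partition containing the hash plus the next N-1 partitions clockwise (Riak's exact boundary convention may differ). `key_position` and `preference_list` are hypothetical names.

```python
import hashlib

RING_SIZE = 2 ** 160
NUM_VNODES = 64

def key_position(bucket: bytes, key: bytes) -> int:
    """Hash a bucket/key pair onto the ring (hypothetical encoding)."""
    digest = hashlib.sha1(bucket + b"/" + key).digest()   # 20-byte SHA-1 digest
    return int.from_bytes(digest, "big")                   # 0 <= position < 2**160

def preference_list(position: int, n: int = 3, num_vnodes: int = NUM_VNODES):
    """Partition indices, walking clockwise, assumed to hold the n replicas."""
    first = position // (RING_SIZE // num_vnodes)          # partition containing position
    return [(first + i) % num_vnodes for i in range(n)]    # wrap past the top of the ring

if __name__ == "__main__":
    pos = key_position(b"users", b"steve")
    print(hex(pos), preference_list(pos))
```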

  11. N/R/W Values
     • N = number of replicas to store (default 3, can be set per bucket)
     • R = read quorum = number of replica responses needed for a successful read (can be specified per request)
     • W = write quorum = number of replica responses needed for a successful write (can be specified per request)
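
The quorum rule can be expressed as a small check: a request sent to N replicas can complete as soon as R (reads) or W (writes) of them have answered successfully, without waiting for the rest. A minimal sketch that ignores timeouts, retries, and conflict resolution; `replies_needed` is a hypothetical helper, not a Riak API.

```python
def replies_needed(replies, quorum, n=3):
    """Return how many replies, in arrival order, it took to reach the quorum.

    replies: booleans, True for a successful replica response.
    Returns None if the quorum is never reached (the request fails).
    """
    if not 1 <= quorum <= n:
        raise ValueError("quorum must be between 1 and N")
    successes = 0
    for count, ok in enumerate(replies, start=1):
        successes += ok
        if successes >= quorum:
            return count   # request can complete; remaining replies are not awaited
    return None

if __name__ == "__main__":
    # N = 3, R = 2: the first two successful replies satisfy the read.
    print(replies_needed([True, True, True], quorum=2))    # 2
    # One replica fails: the read still succeeds on the third reply.
    print(replies_needed([False, True, True], quorum=2))   # 3
```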

  12. N/R/W Values
     [figure: node 0 to node 3]
     For details see http://docs.basho.com/riak/1.3.1/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/

  18. Implementing Consistent Hashing
     • Erlang's crypto module, through its integration with OpenSSL, provides the SHA-1 function
     • Hash values are 160 bits
     • But that's OK: Erlang's integers are arbitrary precision
     • And Erlang binaries store these large values efficiently
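
Python integers are also arbitrary precision, so the same idea can be shown outside Erlang: a 20-byte SHA-1 digest converts losslessly to an integer in [0, 2^160) and back. In Erlang the analogous step is a binary pattern match such as `<<I:160/integer>> = Digest` on the result of `crypto:hash(sha, Data)` in current OTP releases; the Python below is only an analogue, and `sha1_as_int` / `int_as_binary` are illustrative names.

```python
import hashlib

def sha1_as_int(data: bytes) -> int:
    """SHA-1 digest interpreted as a big-endian unsigned 160-bit integer."""
    digest = hashlib.sha1(data).digest()   # always 20 bytes = 160 bits
    return int.from_bytes(digest, "big")

def int_as_binary(value: int) -> bytes:
    """Pack a ring position back into a fixed-width 20-byte binary."""
    return value.to_bytes(20, "big")

if __name__ == "__main__":
    h = sha1_as_int(b"some key")
    # The round trip is exact: no precision is lost at 160 bits.
    print(h, int_as_binary(h).hex())
```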

  25. Riak's Ring

  30. Ring State
     • All nodes in a Riak cluster are peers: no masters or slaves
     • Nodes exchange their understanding of ring state via a gossip protocol

  31. Distributed Erlang
     • Erlang has distribution built in; it's required for supporting multiple nodes for reliability
     • By default Erlang nodes form a mesh: every node knows about every other node
     • Riak uses this for intra-cluster communication

  32. Distributed Erlang
     • Riak lets you simulate a multi-node installation on a single machine, which is nice for development
     • Run "make devrel" or "make stagedevrel" in a riak repository clone (git://github.com/basho/riak.git)
     • Let's assume we have nodes dev1, dev2, and dev3 running in a cluster, with nothing on the 4th node yet
     • Instead of starting riak, let's start the 4th node as just a plain distributed Erlang node
     [figure: node 0 to node 3]
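
The demo relies on distributed Erlang's default behavior: a node that connects to one member of a cluster also gets connected to the members that node already knows, so the connections close into a full mesh. A toy Python model of that transitive behavior, assuming default settings (no hidden nodes, no connection restrictions); node names follow the slide's dev1 to dev4, and `connect` is a hypothetical helper, not an Erlang API.

```python
def connect(links, a, b):
    """Connect node a to node b, then close the mesh transitively.

    links maps each node to the set of nodes it is connected to. This
    models the default distributed Erlang behaviour in which connecting
    to one cluster member also connects you to the members it knows.
    """
    cluster = {a, b} | links.setdefault(a, set()) | links.setdefault(b, set())
    for node in cluster:
        links.setdefault(node, set()).update(cluster - {node})

if __name__ == "__main__":
    links = {}
    connect(links, "dev1", "dev2")
    connect(links, "dev2", "dev3")   # dev1, dev2, dev3 form the cluster
    connect(links, "dev4", "dev1")   # the plain Erlang node contacts only dev1
    print(sorted(links["dev4"]))     # dev4 is now connected to dev1, dev2 and dev3
```

In a real shell on the fourth node, the corresponding check would be to call `net_adm:ping/1` on one cluster node and then inspect `nodes()`.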

  42. Distributed Erlang Mesh
     • Nodes talk to each other occasionally to check liveness
     • Mesh approach makes it easy to set up a cluster
     • But communication overhead means it doesn't scale to large clusters of more than 150 nodes (yet)
     [figure: full mesh connecting node 0 to node 3]
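
The scaling limit follows from the mesh's shape: n fully connected nodes maintain n(n-1)/2 connections, each carrying periodic liveness checks, so the total grows quadratically while each node handles n-1 of them. A quick calculation (the 150-node figure is the slide's; `mesh_connections` is just arithmetic, not a Riak or Erlang function):

```python
def mesh_connections(n: int) -> int:
    """Number of pairwise connections in a full mesh of n nodes."""
    return n * (n - 1) // 2

if __name__ == "__main__":
    for n in (4, 50, 150):
        # total links in the cluster, and how many each node maintains itself
        print(n, "nodes:", mesh_connections(n), "connections,", n - 1, "per node")
```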

  43. Gossip
     • Riak nodes are peers; there's no master
     • But the ring has state, such as which vnodes each node has claimed
     • Nodes periodically send their understanding of the ring state to other, randomly chosen nodes
     • Riak's gossip module also provides an API for sending ring state to specific nodes
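
The gossip pattern can be sketched as a toy simulation: each round, every node pushes its view of the ring to one randomly chosen peer, and the receiver keeps whichever view is newer. This is a deliberate simplification: a single integer version stands in for Riak's ring reconciliation, which is more involved because it must handle concurrent changes. `gossip_round` and `rounds_to_converge` are hypothetical names.

```python
import random

def gossip_round(views, rng):
    """One round: every node pushes its (version, ring) view to one random peer.

    Simplification: the receiver keeps the higher-versioned view instead of
    reconciling concurrent ring changes as Riak does.
    """
    nodes = list(views)
    for sender in nodes:
        peer = rng.choice([n for n in nodes if n != sender])
        if views[sender][0] > views[peer][0]:
            views[peer] = views[sender]

def rounds_to_converge(views, rng, max_rounds=1000):
    """Gossip until every node holds the same view; return the rounds used."""
    rounds = 0
    while len(set(views.values())) > 1:
        if rounds == max_rounds:
            raise RuntimeError("views did not converge")
        gossip_round(views, rng)
        rounds += 1
    return rounds

if __name__ == "__main__":
    # node0 has seen a new claim (version 2); the other seven nodes are stale.
    views = {f"node{i}": (1, "old ring") for i in range(1, 8)}
    views["node0"] = (2, "new ring")
    print(rounds_to_converge(views, random.Random(42)), "rounds")
```

Because no node is special, any node can originate a ring change and random peer selection still spreads it to the whole cluster.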

  47. Control vs. Data
     • Distributed Erlang: good for the control plane, not so good for the data plane
     • Sending large data can cause busy distribution ports and head-of-line blocking
     • Use TCP, UDP, etc. directly for data plane traffic
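
One way to keep bulk data off the distribution channel is a dedicated socket with simple length-prefixed framing, so large objects never queue behind control messages. A minimal sketch using a local socket pair as a stand-in for a real TCP connection between nodes; the 4-byte big-endian length header and the `send_blob` / `recv_blob` names are illustrative assumptions, not Riak's wire protocol.

```python
import socket
import struct
import threading

HEADER = struct.Struct(">I")   # assumed framing: 4-byte big-endian payload length

def send_blob(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed frame."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, failing if the peer closes early."""
    chunks = []
    while size:
        chunk = sock.recv(min(size, 65536))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)

def recv_blob(sock: socket.socket) -> bytes:
    """Receive one length-prefixed frame."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

if __name__ == "__main__":
    a, b = socket.socketpair()
    big = b"x" * (4 * 1024 * 1024)   # a 4 MiB "object"
    # Send from a separate thread so writer and reader cannot deadlock
    # once the socket buffers fill up.
    sender = threading.Thread(target=send_blob, args=(a, big))
    sender.start()
    received = recv_blob(b)
    sender.join()
    print(len(received) == len(big))
    a.close()
    b.close()
```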
