
Consistent Hashing in your python applications (Europython 2017)



  1. Consistent Hashing in your python applications Europython 2017

  2. @ultrabug: Gentoo Linux developer, CTO at Numberly

  3. History & main use cases
      ● Distributed (web) caching (Akamai)
      ● P2P (Chord & BitTorrent)
      ● Distributed databases (data distribution / sharding): Amazon DynamoDB, Cassandra / ScyllaDB, Riak, CockroachDB

  4. MAPPING referential -> information

  5. Phonebook: name -> phone number

  6. Map logic: Referential -> selection (logical operation) -> INFORMATION. What matters: lookup efficiency.

  7. MAP key -> value

  8. Python dict() {key: value}

  9. Python dict() is a Hash Table

  10. Hash Table logic: Hash function(key) -> logical operation -> LOCATION. What matters: implementation.

  11. Python dict() implementation: an array (in memory), where hash(key) & (size of array - 1) = array index
       hash('a') = 12416037344 & 11 = 0   ->   0 | value: 123
                                               1 |
       hash('c') = 12672038114 & 11 = 2   ->   2 | value: 'coco'
       hash('b') = 12544037731 & 11 = 3   ->   3 | value: None
                                               ...
                                              11 |

  12. Key factors to consider for the LOCATION: distribution (balancing), accuracy, efficiency, scaling

  13. Python dict efficiency & scaling: hash(key) & (size of array - 1) = array index
       hash('a') = 12416037344 & 11 = 0   ->   0 | value: 123
                                               1 | MEMORY
       hash('c') = 12672038114 & 11 = 2   ->   2 | value: 'coco'
       hash('b') = 12544037731 & 11 = 3   ->   3 | value: None
                                               ...
                                              11 | MEMORY
       ● hash() = uneven distribution
       ● Optimized for fast lookups: O(1)
       ● Memory inefficient (probing)
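The index computation shown on the dict() slides can be reproduced in a few lines. This is a minimal sketch that reuses the hash values printed on the slide, because hash() of a str changes between interpreter runs (hash randomization). `slot_index` is a hypothetical helper name, and the probing that CPython's real dict performs on collisions is left out.

```python
# Hash values copied from the slide; hash() of a str is randomized per process,
# so calling hash('a') yourself will give different numbers.
SLIDE_HASHES = {"a": 12416037344, "c": 12672038114, "b": 12544037731}
MASK = 11  # the slide's "size of array - 1"


def slot_index(hash_value, mask=MASK):
    """Map a hash value to an array index with a bitwise AND (hypothetical helper)."""
    return hash_value & mask


indexes = {key: slot_index(h) for key, h in SLIDE_HASHES.items()}
print(indexes)  # {'a': 0, 'c': 2, 'b': 3}
```

The AND keeps only the low bits of the hash, which is why the lookup is a single cheap operation and why the quality of the low bits decides how evenly slots are used.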

  14. Distributed Hash Tables (DHT)

  15. Split your key space into buckets: hash(key) -> operator -> bucket (one such mapping per bucket). The hash function will impact the size of each bucket.

  16. Distribute your buckets to servers
       hash(key) -> operator -> bucket 0 -> SERVER 0
       hash(key) -> operator -> bucket 1 -> SERVER 1
       hash(key) -> operator -> bucket 2 -> SERVER 2
       What's the best operator function to find the server hosting the bucket for my key?

  17. Naive DHT implementation: md5(key) % (number of buckets) = server
       int(md5(b'd').hexdigest(), 16) % 3 = 0   ->   bucket 0, SERVER 0
       int(md5(b'e').hexdigest(), 16) % 3 = 1   ->   bucket 1, SERVER 1
       int(md5(b'f').hexdigest(), 16) % 3 = 2   ->   bucket 2, SERVER 2
       Simple & looking good...

  18. Naive DHT implementation: md5(key) % (number of buckets) = server ...until you add / remove a bucket/server!
       Servers: bucket 0 / SERVER 0, bucket 1 / SERVER 1, bucket 2 / SERVER 2, bucket 3 / SERVER 3 (added)
       int(md5(b'd').hexdigest(), 16) % 4 = 1 (was 0)
       int(md5(b'e').hexdigest(), 16) % 4 = 2 (was 1)
       int(md5(b'f').hexdigest(), 16) % 4 = 3 (was 2)
       int(md5(b'g').hexdigest(), 16) % 4 = 1

  19. n/(n+1) ~ fraction of remapped keys (modulo placement, going from n to n+1 servers)
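The n/(n+1) figure for modulo placement can be checked empirically. The sketch below uses the md5-modulo operator from the naive DHT slides and counts how many keys change server when going from 3 to 4 servers. The key names and `naive_server` are made up for illustration.

```python
from hashlib import md5


def naive_server(key, n_servers):
    """Naive placement from the slides: md5(key) % (number of buckets)."""
    return int(md5(key.encode()).hexdigest(), 16) % n_servers


# Made-up sample keys, only used to measure the effect of adding a server.
keys = [f"key-{i}" for i in range(10_000)]

moved = sum(naive_server(k, 3) != naive_server(k, 4) for k in keys)
fraction = moved / len(keys)
print(f"remapped: {fraction:.1%} (n/(n+1) with n=3 is 75.0%)")
```

A key keeps its server only when its hash gives the same remainder modulo 3 and modulo 4, which happens for roughly one key in four.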

  20. HELP! we need consistency

  21. The Hash Ring

  22. Place your servers on the continuum (ring): hash(server 1), hash(server 0), hash(server 2)

  23. A key's bucket is on the next server in the ring (diagram: two hash(key) positions on a ring holding SERVER 0, SERVER 1 and SERVER 2)

  24. 1/n ~ fraction of remapped keys (hash ring with n servers, when one server is added or removed)
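The ring lookup described on the slides above (hash the servers onto a continuum, then walk to the next server from hash(key)) can be sketched with a sorted list and a binary search. This is a minimal illustration, not the implementation of any library listed in this talk: `SimpleRing` and `ring_hash` are hypothetical names, md5 is an arbitrary choice, and each server gets a single point on the ring.

```python
from bisect import bisect_left
from hashlib import md5


def ring_hash(value):
    """Position of a string on the ring (md5 chosen arbitrarily for this sketch)."""
    return int(md5(value.encode()).hexdigest(), 16)


class SimpleRing:
    """Hypothetical minimal hash ring with one point per server."""

    def __init__(self, servers):
        self._points = sorted((ring_hash(s), s) for s in servers)
        self._hashes = [h for h, _ in self._points]

    def get_server(self, key):
        # First server at or after hash(key); the modulo wraps past the last point.
        idx = bisect_left(self._hashes, ring_hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"key-{i}" for i in range(10_000)]  # made-up sample keys
before = SimpleRing(["server 0", "server 1", "server 2"])
after = SimpleRing(["server 0", "server 1", "server 2", "server 3"])

moved = [k for k in keys if before.get_server(k) != after.get_server(k)]
print(f"remapped: {len(moved) / len(keys):.1%}")
print({after.get_server(k) for k in moved})  # only the new server receives keys
```

Only the keys that fall on the arc just before the new server change owner, and all of them move to that new server. With a single point per server the size of that arc depends on where the new server lands, so the measured fraction only matches 1/n on average.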

  25. Uneven partitions lead to hotspots (diagram: server 0, server 1 and server 2 own unequal arcs of the ring). Hash functions are not perfect.

  26. Which hash function to use?
       ● Cryptographic hash functions: MD5, SHA1, SHA256. Standard, widely adopted, need conversion to int.
       ● Non cryptographic hash functions: CityHash (google), Murmur (v3). Optimized for key lookups, fast, need C libs.
       ● Speed, slowest to fastest: SHAX - MD5 - CityHash128 - Murmur3 - CityHash64 - CityHash32
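The "need conversion to int" remark for cryptographic hashes refers to turning a digest into a number that can be placed on the ring. A small sketch using only the standard library's hashlib follows. `hash_to_int` is a hypothetical helper, and the non-cryptographic functions are left out because they require third-party C libraries.

```python
import hashlib


def hash_to_int(key, algorithm="md5"):
    """Convert a digest to an integer ring position (hypothetical helper)."""
    digest = hashlib.new(algorithm, key).digest()
    return int.from_bytes(digest, "big")


for name in ("md5", "sha1", "sha256"):
    print(name, hash_to_int(b"d", name))
```

Reading the raw digest as a big-endian integer gives the same number as the `int(md5(...).hexdigest(), 16)` form used on the naive DHT slides, without going through a hex string.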

  27. Hash Rings: vnodes & weights mitigate hotspots and reduce load variance on servers
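To illustrate how vnodes and weights spread the load, the sketch below gives every server many points on the ring, scaled by its weight, and counts where sample keys land. Everything here is an assumption made for illustration: the `"{server}-{i}"` vnode naming, the default of 160 vnodes, md5 as hash and the helper names. It is not the ketama algorithm.

```python
from bisect import bisect_left
from collections import Counter
from hashlib import md5


def ring_hash(value):
    return int(md5(value.encode()).hexdigest(), 16)


def build_ring(weights, vnodes=160):
    """weights: {server: int weight}; each server gets vnodes * weight points."""
    points = []
    for server, weight in weights.items():
        for i in range(vnodes * weight):
            # Vnode naming scheme is an assumption of this sketch.
            points.append((ring_hash(f"{server}-{i}"), server))
    points.sort()
    return points


def load(weights, vnodes, keys):
    """Count how many keys each server owns."""
    points = build_ring(weights, vnodes)
    hashes = [h for h, _ in points]
    owners = (points[bisect_left(hashes, ring_hash(k)) % len(points)][1] for k in keys)
    return Counter(owners)


keys = [f"key-{i}" for i in range(10_000)]  # made-up sample keys
even = load({"server 0": 1, "server 1": 1, "server 2": 1}, 160, keys)
weighted = load({"server 0": 1, "server 1": 1, "server 2": 2}, 160, keys)
print("equal weights:", dict(even))
print("server 2 weighted x2:", dict(weighted))
```

With many points per server the arcs average out, so equal weights give roughly equal shares, and doubling a server's weight roughly doubles its share of keys.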

  28. My preciouuus!

  29. Consistent Hashing implementations in python
       ● ConsistentHashing: a simple implementation of consistent hashing
       ● consistent_hash: the algorithm is the same as libketama
       ● hash_ring: uses md5 as hashing function
       ● python-continuum: uses md5 as hashing function
       ● uhashring: full featured, ketama compatible

  30. uhashring

  31. Example use case #1: Database instances distribution (clients A, B, C and D spread over DB1, DB2, DB3 and DB4)

  32. Example use case #1 Database instances distribution

  33. Example use case #1 Database instances distribution

  34. Example use case #2: Disk & network I/O distribution (tasks A, B, C and D spread over disk 1, disk 2, disk 3 and disk 4)

  35. Example use case #3: Log & tracing consistency (user_id A, B, C and D spread over worker 1, worker 2, worker 3 and worker 4)

  36. Example use case #4: python-memcached consolidation (keys 'potato', 'coconut', 'tomato' and 'raspberry' spread over cache 1, cache 2, cache 3 and cache 4)

  37. Live demo raffle
       ● List of GIFs
       ● One of the GIFs is the winner
       ● Every participant is a node (bucket)
       ● hash(WINNER_GIF_URL) picks the winner node

  38. http://ep17.nbly.co (silly live demo)

  39. Thanks! github.com/ultrabug/ep2017, github.com/ultrabug/uhashring, @ultrabug
