Leveraging bloom filters on Redis Cristian Castiblanco - PowerPoint PPT Presentation


  1. Leveraging bloom filters on Redis

  2. Cristian Castiblanco me@cristian.io | cristian@scopely.com https://cristian.io

  3. Stream processing at Scopely

  4. Stream processing at Scopely

  5. Idempotence

  6. An operation is said to be idempotent when applying it multiple times has the same effect as applying it once.
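
To make the definition concrete, here is a minimal sketch contrasting an idempotent operation with a non-idempotent one. The function names and the `evt-1` identifier are illustrative and do not come from the slides.

```python
def add_event(seen: set, event_id: str) -> set:
    """Idempotent: adding the same ID again leaves the set unchanged."""
    seen.add(event_id)
    return seen


def count_event(counter: dict, event_id: str) -> dict:
    """Not idempotent: every repeated call changes the result."""
    counter[event_id] = counter.get(event_id, 0) + 1
    return counter


# Apply each operation twice with the same input.
seen = add_event(add_event(set(), "evt-1"), "evt-1")
counts = count_event(count_event({}, "evt-1"), "evt-1")

print(seen)    # {'evt-1'}
print(counts)  # {'evt-1': 2}
```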

  7. Simplest approach to idempotence

  8. Idempotence with Redis sets

  9. Idempotence with Redis sets

  10. Idempotence with Redis sets

  11. Idempotence with Redis sets
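
The slides for this step are diagrams, so the following is only a sketch of how deduplication with a Redis set might look. It relies on `SADD` returning the number of members that were actually added. `FakeRedis`, `process_once`, and the `seen_events` key are hypothetical names; `FakeRedis` is an in-memory stand-in so the example runs without a server, and its `sadd(name, *values)` method mirrors the shape of the redis-py call.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in that mimics SADD's return value."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        target = self._sets.setdefault(name, set())
        before = len(target)
        target.update(values)
        return len(target) - before  # number of newly added members


def process_once(client, store, event_id, handler):
    """Run handler only the first time event_id is recorded in the set."""
    if client.sadd(store, event_id) == 1:
        handler(event_id)
        return True
    return False  # duplicate delivery, skipped


client = FakeRedis()
processed = []
for event_id in ["a", "b", "a"]:
    process_once(client, "seen_events", event_id, processed.append)

print(processed)  # ['a', 'b']
```

Note that this sketch records the ID before the handler runs, so a handler failure would leave the event marked as seen; a real pipeline has to choose which ordering it can tolerate.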

  12. Memory usage per idempotence store 320 million records/day ≈ 70GB of memory

  13. Is there a better way?

  14. Is there a better way? • Space-efficient

  15. Is there a better way? • Space-efficient • Cost-effective

  16. Is there a better way? • Space-efficient • Cost-effective • More performant

  17. Is there a better way? • Space-efficient • Cost-effective • More performant • Awesome

  18. Enter bloom filters Probabilistic data structure to check for item membership

  19. Enter bloom filters Probabilistic data structure to check for item membership

  20. Bloom filters query

  21. Bloom filters query • Definitely not in the set

  22. Bloom filters query • Definitely not in the set • Probably in the set

  23. Bloom filters query • Definitely not in the set • Probably in the set • Configurable error rate
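
The query semantics above can be illustrated with a minimal Bloom filter. This is a teaching sketch, not the implementation used in the talk: the class name, the choice of SHA-256, and the double-hashing scheme are all assumptions.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hash positions per item."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from two halves of one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "probably in the set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=1024, num_hashes=7)
bf.add("foo")
print(bf.might_contain("foo"))  # True: an added item is never reported missing
print(bf.might_contain("bar"))  # almost certainly False, but True is possible
```

Because bits are only ever set and never cleared, a lookup for an added item always finds all of its bits set, which is why false negatives cannot occur.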

  24. Bloom filters space efficiency Given 10,000,000 UUIDs...

  25. Bloom filters space efficiency Given 10,000,000 UUIDs... • Redis set: 1 GB

  26. Bloom filters space efficiency Given 10,000,000 UUIDs... • Redis set: 1 GB • Plain text: ~300 MB

  27. Bloom filters space efficiency Given 10,000,000 UUIDs... • Redis set: 1 GB • Plain text: ~300 MB • gzip: ~150 MB

  28. Bloom filters space efficiency Given 10,000,000 UUIDs... • Redis set: 1 GB • Plain text: ~300 MB • gzip: ~150 MB • Bloom filter with 1e-05 error rate: ~30 MB (i.e., 1 in 100,000)

  29. Bloom filters space efficiency Given 10,000,000 UUIDs... • Redis set: 1 GB • Plain text: ~300 MB • gzip: ~150 MB • Bloom filter with 1e-05 error rate: ~30 MB (i.e., 1 in 100,000) • Bloom filter with 1e-11 error rate: ~60 MB (i.e., 1 in 100 billion)
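
The Bloom filter sizes on this slide can be sanity-checked with the standard sizing formula for an optimally configured filter, m = -n ln(p) / (ln 2)^2 bits for n items and false positive probability p. The helper name below is hypothetical, and real implementations may round or lay out memory differently, which could explain the slide's ~60 MB figure sitting slightly below the formula's result.

```python
import math


def bloom_size_megabytes(capacity: int, error_rate: float) -> float:
    """Theoretical size of an optimally tuned Bloom filter, in MB (10^6 bytes)."""
    bits = -capacity * math.log(error_rate) / (math.log(2) ** 2)
    return bits / 8 / 1_000_000


size_1e5 = bloom_size_megabytes(10_000_000, 1e-5)
size_1e11 = bloom_size_megabytes(10_000_000, 1e-11)

print(f"{size_1e5:.1f} MB")   # about 30.0 MB
print(f"{size_1e11:.1f} MB")  # about 65.9 MB
```

Tightening the error rate by six orders of magnitude only roughly doubles the memory, because the size grows with the logarithm of 1/p.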

  30. Memory usage comparison Sets 70GB vs Bloom Filters 7GB

  31. Latency comparison Redis sets Bloom filters

  32. Bloom filters example

  33. False positive == dropped data
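
In a deduplication pipeline, a false positive means a new record is mistaken for a duplicate and discarded. A rough estimate of the cost follows, combining the 320 million records/day figure from the memory slide with the 0.00001 error rate from the ReBloom example; whether that error rate matches the production setting is an assumption, and the function name is hypothetical.

```python
def expected_false_drops(records_per_day: int, error_rate: float) -> float:
    """Worst-case estimate: each false positive drops one genuinely new record."""
    return records_per_day * error_rate


drops = expected_false_drops(320_000_000, 1e-5)
print(f"{drops:.0f} records/day")  # 3200 records/day
```

The estimate is an upper bound: the configured error rate is only reached once the filter is filled to its capacity, and a partially filled filter produces fewer false positives.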

  34. Bloom filters characteristics • Capacity • Error rate probability

  35. Scaling bloom filters

  36. Scaling bloom filters

  37. Scaling bloom filters

  38. Scaling bloom filters

  39. Scaling bloom filters

  40. Scaling bloom filters

  41. Scaling bloom filters

  42. Scaling bloom filters
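
The scaling slides are diagrams, so the mechanism is not spelled out in the text. One common approach, assumed here, is a scalable Bloom filter: when the current filter reaches capacity, a larger one with a tighter error rate is stacked on top, and lookups consult every layer. All names and the doubling/halving factors below are illustrative choices.

```python
import hashlib
import math


class _Layer:
    """One fixed-size Bloom filter sized for a capacity and error rate."""

    def __init__(self, capacity: int, error_rate: float):
        self.capacity = capacity
        self.count = 0
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(-math.log2(error_rate)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class ScalableBloomFilter:
    """Stack of layers; layer i gets error_rate / 2**(i + 1), so the sum stays below error_rate."""

    def __init__(self, initial_capacity: int, error_rate: float):
        self.error_rate = error_rate
        self.layers = [_Layer(initial_capacity, error_rate / 2)]

    def might_contain(self, item: str) -> bool:
        return any(layer.might_contain(item) for layer in self.layers)

    def add(self, item: str) -> bool:
        """Return True if the item was new, False if it was probably seen before."""
        if self.might_contain(item):
            return False
        last = self.layers[-1]
        if last.count >= last.capacity:
            # Current layer is full: add one with double the capacity, half the error.
            depth = len(self.layers)
            last = _Layer(last.capacity * 2, self.error_rate / 2 ** (depth + 1))
            self.layers.append(last)
        last.add(item)
        return True


sbf = ScalableBloomFilter(initial_capacity=100, error_rate=0.001)
for i in range(1000):
    sbf.add(f"id-{i}")
print(len(sbf.layers))  # 4 layers: capacities 100, 200, 400, 800
```

The trade-off is that every lookup may touch several layers, so a filter that grows many times is slower and larger than one sized correctly up front.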

  43. Tuning bloom filters Size depends on capacity/error probability

  44. Tuning bloom filters

  45. Tuning bloom filters • False positive probability: • Depends on your use case

  46. Tuning bloom filters • False positive probability: • Depends on your use case • Initial capacity: • Can't be too generous • Can't be too conservative
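
To see why the initial capacity cannot be too conservative, the standard approximation for the false positive rate, (1 - e^(-kn/m))^k for k hash functions, n inserted items, and m bits, can be evaluated as a filter is filled past its planned capacity. The capacity of one million and the 1e-5 target are illustrative values, not figures from the talk.

```python
import math


def false_positive_rate(num_bits: int, num_hashes: int, items: int) -> float:
    """Approximate false positive probability after inserting `items` elements."""
    return (1 - math.exp(-num_hashes * items / num_bits)) ** num_hashes


capacity, target = 1_000_000, 1e-5

# Size the filter for the planned capacity and target error rate.
num_bits = math.ceil(-capacity * math.log(target) / math.log(2) ** 2)
num_hashes = round(num_bits / capacity * math.log(2))

rates = {}
for load in (0.5, 1.0, 2.0, 4.0):
    rates[load] = false_positive_rate(num_bits, num_hashes, int(capacity * load))
    print(f"{load:>4}x capacity -> false positive rate {rates[load]:.2e}")
```

At the planned capacity the rate sits near the target, but it climbs by orders of magnitude once the filter holds a multiple of that capacity. Going the other way and reserving far more capacity than needed keeps the rate low while spending memory on bits that are never used, which is the "too generous" side of the trade-off.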

  47. First attempt: Lua scripts

  48. Second attempt: bloomd github.com/armon/bloomd

  49. bloomd drawbacks

  50. bloomd drawbacks • Lack of High Availability

  51. bloomd drawbacks • Lack of High Availability • No clustering support

  52. bloomd drawbacks • Lack of High Availability • No clustering support • Maintenance

  53. bloomd drawbacks • Lack of High Availability • No clustering support • Maintenance • Rigid API

  54. bloomd drawbacks • Lack of High Availability • No clustering support • Maintenance • Rigid API • Feels like abandonware

  55. ReBloom Bloom filters as a Redis module

  56. ReBloom example

          > BF.RESERVE your_filter 0.00001 50000000
          OK
          > BF.ADD your_filter foo
          1
          > BF.EXISTS your_filter foo
          1
          > BF.EXISTS your_filter bar
          0
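
The same commands can drive an idempotence check from application code, since `BF.ADD` replies 1 when the item was newly added and 0 when it probably existed already. The sketch below assumes a client object exposing `execute_command(*args)`, as redis-py clients do. `FakeBloomClient` and `process_once` are hypothetical names; the fake is backed by an exact set so the example runs without a server, which also means it never produces the false positives a real filter can.

```python
class FakeBloomClient:
    """Hypothetical stand-in for a Redis client with the Bloom module loaded."""

    def __init__(self):
        self._filters = {}

    def execute_command(self, *args):
        command, key, *rest = args
        if command == "BF.RESERVE":
            self._filters[key] = set()
            return "OK"
        items = self._filters.setdefault(key, set())
        if command == "BF.ADD":
            is_new = rest[0] not in items
            items.add(rest[0])
            return int(is_new)
        if command == "BF.EXISTS":
            return int(rest[0] in items)
        raise ValueError(f"unsupported command: {command}")


def process_once(client, filter_name, event_id, handler):
    """Handle event_id only if BF.ADD reports it as newly added."""
    if client.execute_command("BF.ADD", filter_name, event_id) == 1:
        handler(event_id)
        return True
    return False  # probably a duplicate (or, rarely, a false positive)


client = FakeBloomClient()
client.execute_command("BF.RESERVE", "your_filter", 0.00001, 50000000)

processed = []
for event_id in ["foo", "bar", "foo"]:
    process_once(client, "your_filter", event_id, processed.append)

print(processed)  # ['foo', 'bar']
print(client.execute_command("BF.EXISTS", "your_filter", "baz"))  # 0
```

Using the reply of `BF.ADD` keeps the check and the insert in a single round trip, instead of a `BF.EXISTS` followed by a separate `BF.ADD`.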

  57. ReBloom

  58. ReBloom • Clustering

  59. ReBloom • Clustering • Redundancy/replication

  60. ReBloom • Clustering • Redundancy/replication • Lower cognitive overhead

  61. ReBloom • Clustering • Redundancy/replication • Lower cognitive overhead • Powerful API

  62. ReBloom • Clustering • Redundancy/replication • Lower cognitive overhead • Powerful API • No maintenance

  63. Summary • Bloom filters significantly reduce memory usage and latency • Redis modules allow your custom data structures to scale github.com/casidiablo cristian.io
