Leveraging bloom filters on Redis Cristian Castiblanco - PowerPoint PPT Presentation

Leveraging bloom filters on Redis

Cristian Castiblanco me@cristian.io | cristian@scopely.com https://cristian.io

Stream processing at Scopely

Idempotence

An operation is said to be idempotent when applying it multiple times has the same effect as applying it once.

Simplest approach to idempotence

Idempotence with Redis sets
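A minimal sketch of the set-based approach, assuming each record carries a unique ID and that the client exposes an `sadd` method with Redis SADD semantics (it returns how many members were newly added). `InMemorySets`, `process_once` and the key name are hypothetical and not from the talk. The stand-in class only exists so the sketch runs without a server; a redis-py client could presumably be passed instead.

```python
class InMemorySets:
    """Hypothetical stand-in for a Redis client; mimics SADD's return value."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        target = self._sets.setdefault(key, set())
        added = 0
        for member in members:
            if member not in target:
                target.add(member)
                added += 1
        return added


def process_once(client, store_key, record_id, handler):
    """Run handler only the first time record_id is seen.

    SADD returns 1 when the member is new and 0 when it already existed,
    so one call both checks for and records the ID.
    """
    if client.sadd(store_key, record_id) == 1:
        handler(record_id)
        return True
    return False


if __name__ == "__main__":
    processed = []
    client = InMemorySets()
    for record_id in ["a", "b", "a"]:
        process_once(client, "idempotence:store", record_id, processed.append)
    print(processed)  # the repeated "a" is skipped
```

Every ID is stored in full, which is why memory grows linearly with the number of records.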

Memory usage per idempotence store
320 million records/day ≈ 70 GB of memory

Is there a better way?
• Space-efficient
• Cost-effective
• More performant
• Awesome

Enter bloom filters
A probabilistic data structure to check for item membership

Bloom filters query
• Definitely not in the set
• Probably in the set
• Configurable error rate

Bloom filters space efficiency
Given 10,000,000 UUIDs...
• Redis set: 1 GB
• Plain text: ~300 MB
• gzip: ~150 MB
• Bloom filter with 1e-05 error rate: ~30 MB (i.e., 1 in 100,000)
• Bloom filter with 1e-11 error rate: ~60 MB (i.e., 1 in 100 billion)
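The Bloom filter sizes can be roughly reproduced from the textbook formula for an optimally sized classic Bloom filter, m = -n·ln(p) / (ln 2)², with k = (m/n)·ln 2 hash functions. This is a sketch under that assumption; the function names are illustrative, and a real implementation may use a different layout and so a somewhat different size.

```python
import math


def optimal_bits(capacity, error_rate):
    """Bits needed for `capacity` items at the target false positive rate."""
    return math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))


def optimal_hashes(capacity, bits):
    """Number of hash functions that minimises the false positive rate."""
    return max(1, round(bits / capacity * math.log(2)))


def size_mb(capacity, error_rate):
    """Filter size in megabytes (10^6 bytes)."""
    return optimal_bits(capacity, error_rate) / 8 / 1e6


if __name__ == "__main__":
    for rate in (1e-5, 1e-11):
        bits = optimal_bits(10_000_000, rate)
        hashes = optimal_hashes(10_000_000, bits)
        print(f"p={rate:g}: {size_mb(10_000_000, rate):.1f} MB, k={hashes}")
```

For 10,000,000 items this gives about 30 MB at 1e-05 and about 66 MB at 1e-11, in the same range as the rounded figures on the slide. Tightening the error rate by six orders of magnitude only roughly doubles the size, because the size grows with the logarithm of 1/p.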

Memory usage comparison
Sets: 70 GB vs. Bloom filters: 7 GB

Latency comparison
[Chart: Redis sets vs. Bloom filters]

Bloom filters example

False positive == dropped data
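To make the trade-off concrete, here is a small illustrative Bloom filter used for deduplication. It is not the implementation from the talk. It assumes string IDs and derives its bit positions from a single SHA-256 digest via double hashing, and all names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity, error_rate):
        # Standard sizing for a classic Bloom filter.
        self.num_bits = math.ceil(
            -capacity * math.log(error_rate) / (math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: position_i = h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        """Set the item's bits; return True if at least one bit was unset."""
        was_new = False
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                was_new = True
                self.bits[byte] |= mask
        return was_new

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def deduplicate(record_ids, bloom):
    """Yield only IDs the filter reports as new.

    A false positive makes a genuinely new ID look like a duplicate,
    so that record is silently dropped.
    """
    for record_id in record_ids:
        if bloom.add(record_id):
            yield record_id
```

A false negative cannot happen: once an ID has been added, all of its bits are set, so a repeat is always recognised. The only failure mode is dropping a new record, at a frequency bounded by the configured error rate while the filter stays within its capacity.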

Bloom filters characteristics
• Capacity
• Error rate probability

Scaling bloom filters

Tuning bloom filters
Size depends on capacity and error probability
• False positive probability:
  • Depends on your use case
• Initial capacity:
  • Can't be too generous
  • Can't be too conservative
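One way to see why the initial capacity can't be too conservative is the standard approximation for a classic Bloom filter's false positive rate after n insertions: p ≈ (1 - e^(-k·n/m))^k. The sketch below assumes a fixed-size filter that does not grow. The function names are illustrative.

```python
import math


def optimal_bits(capacity, error_rate):
    """Bits for an optimally sized classic Bloom filter."""
    return math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))


def expected_error_rate(num_bits, num_hashes, inserted):
    """Approximate false positive rate after `inserted` distinct items."""
    return (1 - math.exp(-num_hashes * inserted / num_bits)) ** num_hashes


if __name__ == "__main__":
    capacity, target = 10_000_000, 1e-5
    bits = optimal_bits(capacity, target)
    hashes = max(1, round(bits / capacity * math.log(2)))
    for inserted in (capacity, 2 * capacity):
        rate = expected_error_rate(bits, hashes, inserted)
        print(f"{inserted:>11,d} items -> error rate ~{rate:.2e}")
```

Under these assumptions, a filter sized for 10,000,000 items at 1e-05 degrades to an error rate of roughly 1% once 20,000,000 items have been inserted. Oversizing wastes memory, and undersizing quietly erodes the error guarantee.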

First attempt: Lua scripts

Second attempt: bloomd
github.com/armon/bloomd

bloomd drawbacks
• Lack of High Availability
• No clustering support
• Maintenance
• Rigid API
• Feels like abandonware

ReBloom
Bloom filters as a Redis module

ReBloom example
    > BF.RESERVE your_filter 0.00001 50000000
    OK
    > BF.ADD your_filter foo
    1
    > BF.EXISTS your_filter foo
    1
    > BF.EXISTS your_filter bar
    0
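From application code, the same commands could be issued through any client that can send raw commands. The sketch below assumes a client object with an `execute_command(*args)` method, as redis-py provides, and a server with the Bloom filter module loaded. The helper names are hypothetical.

```python
def reserve_filter(client, key, error_rate, capacity):
    """Create an empty filter; mirrors `BF.RESERVE key error_rate capacity`."""
    return client.execute_command("BF.RESERVE", key, error_rate, capacity)


def seen_before(client, key, item):
    """Add item to the filter and report whether it was probably already there.

    BF.ADD replies 1 when the item was newly added and 0 when it may have
    existed, so one call both checks for and records the item.
    """
    newly_added = client.execute_command("BF.ADD", key, item)
    return not bool(newly_added)


# Usage against a live server (assumes redis-py is installed):
#
#   import redis
#   client = redis.Redis()
#   reserve_filter(client, "your_filter", 0.00001, 50_000_000)
#   if not seen_before(client, "your_filter", event_id):
#       handle(event_id)  # hypothetical handler
```

Using BF.ADD alone, rather than BF.EXISTS followed by BF.ADD, avoids a second round trip and a race between the check and the write.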

ReBloom
• Clustering
• Redundancy/replication
• Lower cognitive overhead
• Powerful API
• No maintenance

Summary
• Bloom filters significantly reduce memory usage and latency
• Redis modules allow your custom data structures to scale

github.com/casidiablo
cristian.io
