
Bloom Filters and their Applications
These slides were developed by, and are used with permission from, Shengquan Wang. CPSC 662

Introduction
• Membership query: given a set $S = \{x_1, x_2, \ldots, x_n\}$ over a universe $U$, answer queries of the form "Is $y \in S$?"
  – Example: spell check
• Data structure requirements
  – Space
  – Search time
• Each $x_i$ can be a long string, and $n$ can be a very large number
• Hashing (randomized) is one of the good candidates

Hash Function
• Converts an input from a (typically) large domain into an output in a (typically) smaller range
[Figure: H(x) maps elements to table slots 0 to 7; labels mark a collision and a false positive for a query y.]

Examples of Simple Hash Functions
• Truncation: if students have a 9-digit identification number, take the last 3 digits as the table position
  – e.g. 925371622 becomes 622
• Folding: split a 9-digit number into three 3-digit numbers and add them
  – e.g. 925371622 becomes 925 + 371 + 622 = 1918
• Modular arithmetic: if the table size is 1000, truncation always stays within the table range, but folding does not, so the sum should be reduced mod 1000
  – e.g. 1918 mod 1000 = 918 (1918 % 1000)
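The three simple hash functions above can be sketched in a few lines. This is only an illustration: the function names and the default table size are assumptions, not part of the slides. For the ID 925371622 the three digit groups are 925, 371, and 622, which sum to 1918.

```python
def truncation_hash(key: int, digits: int = 3) -> int:
    """Keep only the last `digits` decimal digits of the key."""
    return key % 10**digits


def folding_hash(key: int, chunk: int = 3) -> int:
    """Split the key into `chunk`-digit groups (from the right) and add them."""
    total = 0
    while key > 0:
        total += key % 10**chunk
        key //= 10**chunk
    return total


def folding_mod_hash(key: int, table_size: int = 1000) -> int:
    """Fold the key, then reduce the sum so it fits in the table."""
    return folding_hash(key) % table_size


student_id = 925371622
print(truncation_hash(student_id))   # 622
print(folding_hash(student_id))      # 1918 (925 + 371 + 622)
print(folding_mod_hash(student_id))  # 918
```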

Hashing Performance
• Hash each element of the set to $b$ bits, with $b = 2\log_2 n$
  – The probability that two given elements collide is $2^{-b} = 1/n^2$
  – False positive probability $\le n \cdot 1/n^2 = 1/n$, by a union bound over the $n$ stored hashes (asymptotically vanishing probability of error)
  – Binary search time = $O(\log_2 n)$
  – Space = $\Theta(n \log_2 n)$ bits

Bloom Filters
• Generalized randomized data structure
• Invented by Burton Bloom in 1970
• Basic idea: use an $m$-bit array to represent a set of $n$ elements, using $k$ hash functions
• A Bloom filter provides an answer in
  – "Constant" search time (the time to hash)
  – A small amount of space
  – But with some probability of being wrong
B. Bloom, "Space/time tradeoffs in hash coding with allowable errors," CACM 13 (1970).

Example
• Start with an $m$-bit array $B$, filled with 0s
  B: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
• Hash each item $x_j \in S$ into $[1, \ldots, m]$, $k$ times. If $H_i(x_j) = a \in [1, \ldots, m]$, then set $B[a] = 1$
  B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
• To check whether $y \in S$, check whether all bits $B[H_i(y)]$ are 1
• False positive: all bits $B[H_i(y)]$ are 1, but $y$ is not in $S$

Example
[Figure: a 12-bit filter (m = 12) with three hash functions h_1, h_2, h_3; elements x_1 and x_2 are inserted, and queries y_1, y_2, y_3 are checked, one of which is a false positive.]
x_1 -> {2, 5, 9}
x_2 -> {5, 7, 11}
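The insert and lookup steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `BloomFilter` is hypothetical, the $k$ hash functions are simulated by salting SHA-256 with the index $i$ (the slides do not prescribe any particular hash family), and positions are 0-based rather than in $[1, \ldots, m]$.

```python
import hashlib


class BloomFilter:
    """m-bit array with k salted hash functions (illustrative only)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # start with all zeros

    def _positions(self, item: str):
        # Assumption: H_i(x) is SHA-256 of "i:x", reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set B[a] = 1 for every position a = H_i(item).
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item: str) -> bool:
        # "Possibly in S" only if every probed bit is 1.
        return all(self.bits[a] for a in self._positions(item))


bf = BloomFilter(m=16, k=3)
for word in ["apple", "banana"]:
    bf.add(word)

print("apple" in bf)   # True: inserted items are always found
print("cherry" in bf)  # usually False, but may be a false positive
```

A lookup never misses an inserted item, because insertion sets exactly the bits that the lookup probes; errors are one-sided.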

Probabilities
[Figure: example 12-bit filter 1 0 1 0 0 1 1 1 0 1 1 0]
• Notation:
  – $n$ = number of elements in the set to be represented
  – $m$ = size of the Bloom filter in bits
  – $k$ = number of hash functions
• Probability that a bit is still zero after all elements are hashed into the Bloom filter: $p = (1 - 1/m)^{kn} \approx e^{-kn/m}$
• Probability of a false positive: $f = (1 - p)^k \approx (1 - e^{-kn/m})^k$

Determining the value of k
• Goal: choose the $k$ that minimizes the false positive rate
• Optimal result: $k = (\ln 2)\, m/n \Rightarrow f = (1/2)^k \approx (0.6185)^{m/n}$
  – $m$ = number of bits in the Bloom filter
  – $n$ = number of elements in the set
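The optimal $k$ can be recovered with a short derivation. The sketch below uses the standard approximation $f \approx (1 - e^{-kn/m})^k$ and treats $k$ as a real number; the substitution $p = e^{-kn/m}$ is introduced here for convenience.

```latex
\begin{align*}
f(k) &= \left(1 - e^{-kn/m}\right)^{k},
  \qquad p = e^{-kn/m} \;\Rightarrow\; k = -\frac{m}{n}\ln p \\
\ln f &= k \ln(1 - p) = -\frac{m}{n}\,\ln p \,\ln(1 - p) \\
\intertext{The product $\ln p \,\ln(1-p)$ is symmetric in $p$ and $1-p$
and is maximized at $p = 1/2$, so $\ln f$ is minimized there:}
p = \tfrac12 &\;\Rightarrow\; k = \frac{m}{n}\ln 2, \qquad
f = \left(\tfrac12\right)^{k}
  = \left(2^{-\ln 2}\right)^{m/n}
  \approx (0.6185)^{m/n}
\end{align*}
```

In other words, the filter is best used when about half of its bits are set.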

Example
[Figure: false positive rate (y-axis, 0 to 0.1) versus number of hash functions (x-axis, 0 to 10) for m/n = 8; the optimum is at k = 8 ln 2 = 5.545...]

Tradeoffs
• Three parameters:
  – Size $m/n$: bits per item
  – Time $k$: number of hash functions
  – Error $f$: false positive probability
• With $k$ chosen optimally, the false positive probability decreases exponentially as the space $m/n$, and with it the number of hash functions, increases linearly
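The curve in the example can be reproduced numerically. The sketch below evaluates the approximation $f = (1 - e^{-kn/m})^k$ for $m/n = 8$ and $k = 1, \ldots, 10$; the function name is an assumption for illustration.

```python
import math


def false_positive_rate(bits_per_item: float, k: int) -> float:
    """Approximate false positive rate (1 - e^(-k n/m))^k, with m/n given."""
    return (1.0 - math.exp(-k / bits_per_item)) ** k


bits_per_item = 8  # m/n, as in the example
rates = {k: false_positive_rate(bits_per_item, k) for k in range(1, 11)}

for k, f in rates.items():
    print(f"k = {k:2d}   f = {f:.4f}")

best_k = min(rates, key=rates.get)
print("best integer k:", best_k)
print("real-valued optimum (m/n) ln 2:", bits_per_item * math.log(2))
```

Because $k$ must be an integer in practice, the best choice is one of the two integers next to $8 \ln 2$; the rate is nearly flat around the optimum, so the exact choice matters little.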

Comparison
                          Hashing                  Bloom filters
Bits per element          $2\log_2 n$              $m/n$ (e.g. $m/n = 8$)
Space                     $\Theta(n \log_2 n)$     $n \cdot (m/n)$
False positive rate (f)   $1/n$                    $(1 - e^{-kn/m})^k$ ($\approx 0.02$)
Lookup time               $O(\log_2 n)$            $O(k)$
Slide annotations: k = 1; tradeoff between m/n and f.

Application: Distributed Caching
• Caches send each other Bloom filters summarizing the URLs they hold
• False positives do not hurt much
  – Errors arise from cache changes anyway
[Figure: Web Cache 1 through Web Cache 6.]
L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary Cache: A scalable wide-area Web cache sharing protocol," IEEE/ACM Transactions on Networking, 2000.

Example
http://www.perl.com/pub/a/2004/04/08/bloom_filters.html
http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html
http://www.flipcode.com/articles/article_bloomfilters.shtml
http://loaf.cantbedone.org/about.htm
http://www.cap-lore.com/code/BloomTheory.html
http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/cbf2.pdf
http://lemonodor.com/archives/000881.html
http://citeseer.ist.psu.edu/mitzenmacher01compressed.html

Application: Set Reconciliation for Content Delivery
• Suppose two hosts A and B hold document sets $S_A$ and $S_B$
• A wants to know $S_A - S_B$, the documents that B does not have, so that it can send them to B
• B sends a Bloom filter representing $S_B$
• A sends those of its documents that are not in that Bloom filter
• False positives: the result is approximate, because a document that falsely matches the filter is not sent
J. Byers, J. Considine, M. Mitzenmacher, and S. Rost, "Informed Content Delivery Across Adaptive Overlay Networks," SIGCOMM 2002.
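The reconciliation exchange can be sketched as below. All names, the filter size, and the salted SHA-256 hashing are assumptions made for illustration; the cited paper's actual protocol is more elaborate.

```python
import hashlib


def positions(item: str, m: int, k: int) -> list[int]:
    """k salted hash positions in [0, m) (assumed hash family)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def build_filter(items: set[str], m: int, k: int) -> list[int]:
    """Host B: summarize its set as an m-bit Bloom filter."""
    bits = [0] * m
    for item in items:
        for a in positions(item, m, k):
            bits[a] = 1
    return bits


def maybe_contains(bits: list[int], item: str, k: int) -> bool:
    return all(bits[a] for a in positions(item, len(bits), k))


def documents_to_send(s_a: set[str], filter_b: list[int], k: int) -> set[str]:
    """Host A: send only documents that are definitely not at B."""
    return {doc for doc in s_a if not maybe_contains(filter_b, doc, k)}


s_a = {"doc1", "doc2", "doc3", "doc4", "doc5"}
s_b = {"doc2", "doc4", "doc9"}
m, k = 64, 4

filter_b = build_filter(s_b, m, k)          # B -> A
to_send = documents_to_send(s_a, filter_b, k)  # A -> B
print(sorted(to_send))
```

A never sends a document B already has, since the filter has no false negatives. A false positive only means that some document in $S_A - S_B$ is skipped, which is the approximation the slide refers to.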

Application: Set Intersection for Keyword Search
• Let $H_A$ and $H_B$ be the hosts responsible for keywords A and B respectively
• Suppose we want the documents containing both keywords A and B, i.e. find $S_A \cap S_B$
• Steps:
  – $H_A$ sends a Bloom filter representing $S_A$ to $H_B$
  – $H_B$ computes an approximate $S_A \cap S_B$ and sends it back to $H_A$
• False positives: $H_A$ can detect them by checking the returned documents against $S_A$, so they cause no problem
P. Reynolds and A. Vahdat, "Efficient Peer-to-peer keyword searching."

Application: Moderate-sized P2P networks
• Distributed hash tables for scalability
• For a moderate-sized P2P network:
  – Per-node Bloom filter
  – Use 8 or 16 bits per object instead of 64-bit identifiers
  – False positives: not much of a problem
F. M. Cuenca-Acuna, C. Peery, R. P. Martin, and T. D. Nguyen, "PlanetP: Using gossiping to build content addressable peer-to-peer information sharing communities."
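The two-step intersection for keyword search, including the final check at $H_A$ that removes false positives, can be sketched as follows. The helper names, the filter parameters, and the salted SHA-256 hashing are assumptions for illustration only.

```python
import hashlib


def positions(item: str, m: int, k: int) -> list[int]:
    """k salted hash positions in [0, m) (assumed hash family)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def build_filter(items: set[str], m: int, k: int) -> list[int]:
    bits = [0] * m
    for item in items:
        for a in positions(item, m, k):
            bits[a] = 1
    return bits


def maybe_contains(bits: list[int], item: str, k: int) -> bool:
    return all(bits[a] for a in positions(item, len(bits), k))


m, k = 64, 4
s_a = {"d1", "d2", "d3", "d4"}        # documents with keyword A (at H_A)
s_b = {"d3", "d4", "d5", "d6", "d7"}  # documents with keyword B (at H_B)

# Step 1: H_A sends a Bloom filter of S_A to H_B.
filter_a = build_filter(s_a, m, k)

# Step 2: H_B keeps the documents that pass the filter (approximate S_A ∩ S_B).
candidates = {doc for doc in s_b if maybe_contains(filter_a, doc, k)}

# Step 3: H_A discards false positives by checking against its own set.
exact = candidates & s_a
print(sorted(exact))  # ['d3', 'd4']
```

The final result is exact regardless of the filter size: the candidates always include every true match, and every candidate outside $S_A$ is dropped in step 3. A smaller filter only increases the number of wasted candidates sent back.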

Application: Resource Routing
• The network has a tree topology
• Node B has Bloom filters for all of its child sub-trees collectively, and also for each child sub-tree individually
[Figure: tree with nodes A through N; filters labeled S_b, S_f, S_g, S_h.]
S. Rhea and J. Kubiatowicz, "Probabilistic Location and Routing," INFOCOM 2002.

Application: Multicast
• Typically, routers maintain a list of interfaces for each multicast address
• An efficient solution: keep a list of addresses for each interface, and use a Bloom filter to represent these addresses
  – Parallelizable
• False positives: not bad, they just waste some resources
B. Gronvall, "Scalable Multicast Forwarding," SIGCOMM 2002.

Application: Detecting Routing Loops
• Current mechanism: TTL
• Each packet contains a small Bloom filter that tracks the nodes visited
  – If the filter does not change at a node, the packet may be in a loop
• False positives: problematic
A. Whitaker and D. Wetherall, "Forwarding without Loops in Icarus," OPENARCH 2002.

Application: IP Traceback
• Use Bloom filters to record the packets seen by each router
• False positives:
  – A router mistakenly identifies a packet as having been seen
  – Multiple possible paths
A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio, S. T. Kent, and W. T. Strayer, "Hash-based IP traceback," SIGCOMM 2001.
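The loop check can be sketched as below: the packet carries a small bit mask, each node ORs its own bits into it, and a node that finds the mask unchanged reports a possible loop. The filter size, the number of hashes, and the function names are assumptions; this is not the Icarus design itself.

```python
import hashlib

FILTER_BITS = 32  # assumed size of the in-packet filter
NUM_HASHES = 2    # assumed number of hash functions


def node_mask(node_id: str) -> int:
    """Bits that a node sets in the packet's filter (salted SHA-256)."""
    mask = 0
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{node_id}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return mask


def first_possible_loop(path: list[str]):
    """Return the hop index where the filter stops changing, else None."""
    packet_filter = 0
    for hop, node_id in enumerate(path):
        updated = packet_filter | node_mask(node_id)
        if updated == packet_filter:
            return hop  # possible loop (or a false positive)
        packet_filter = updated
    return None


# The packet returns to node "A" at hop 3.
print(first_possible_loop(["A", "B", "C", "A", "D"]))
```

A real loop is always flagged no later than the first revisited node, because that node's bits are already set. With a small filter, an unvisited node whose bits happen to be set is flagged as well, which is why false positives are problematic in this application.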

Summary
• The Bloom Filter Principle: Wherever a list or set is used, and space is a consideration, a Bloom filter should be considered. When using a Bloom filter, consider the potential effects of false positives.

References
• B. Bloom, "Space/time tradeoffs in hash coding with allowable errors," CACM 13 (1970).
• A. Broder and M. Mitzenmacher, "Network Applications of Bloom Filters: A Survey," Allerton Conference 2002.
• M. Mitzenmacher, "Compressed Bloom Filters," PODC 2001.
• S. Cohen and Y. Matias, "Spectral Bloom Filters," SIGMOD 2003.
• B. Chazelle, J. Kilian, R. Rubinfeld, and A. Tal, "The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables," SODA 2004.
