semi joins and bloom join
play

Semi-Joins and Bloom Join Databases: The Complete Book Ch 20 1 - PowerPoint PPT Presentation

Semi-Joins and Bloom Join Databases: The Complete Book Ch 20 1 Practical Concerns UNION R 1 S 1 R 1 S 2 R 2 S 1 R N S M R 1 R 2 R N S 1 S 2 S M 2 Practical Concerns UNION R 1 S 1 R 1 S 2 R 2 S 1 R N


  1. Semi-Joins and Bloom Join Databases: The Complete Book Ch 20 1

  2. Practical Concerns UNION R 1 ⋈ S 1 R 1 ⋈ S 2 R 2 ⋈ S 1 R N ⋈ S M R 1 R 2 R N S 1 S 2 S M … … 2

  3. Practical Concerns UNION R 1 ⋈ S 1 R 1 ⋈ S 2 R 2 ⋈ S 1 R N ⋈ S M R 1 R 2 R N S 1 S 2 S M … … Where does the computation happen? How does the data get there? 2

  4. Distributing the Work Let’s start simple… what can we do with no partitions? ⋈ B R S 3

  5. Distributing the Work Let’s start simple… what can we do with no partitions? ⋈ B R S R and S may be any RA expression… 3

  6. Distributing the Work ⋈ B R S Node 1 4

  7. Distributing the Work ⋈ B R S Node 1 No Parallelism! 4

  8. Distributing the Work ⋈ B Node 3 R S Node 1 Node 2 5

  9. Distributing the Work All of R and ⋈ B All of S Node 3 get sent! R S Node 1 Node 2 Lots of Data Transfer! 5

  10. Distributing the Work ⋈ B R S Node 1 Node 2 6

  11. Distributing the Work All of R get sent ⋈ B R S Node 1 Node 2 Better! We can guess whether R or S is smaller. 6

  12. Distributing the Work What can we do if R is partitioned? U ⋈ B ⋈ B R 1 S R 2 7

  13. Distributing the Work There are lots of partitioning strategies, but this one is interesting…. U ⋈ B ⋈ B R 1 S R 2 Node 1 Node 2 Node 3 8

  14. Distributing the Work … it can be used as a model for partitioning S… U ⋈ B ⋈ B R 1 S 1 R 2 Node 1 Node 2 Node 3 9

  15. Distributing the Work … it can be used as a model for partitioning S… U ⋈ B ⋈ B R 1 S 2 R 2 Node 1 Node 2 Node 3 10

  16. Distributing the Work …and neatly captures the data transfer issue. U ⋈ B ⋈ B R 1 S R 2 Node 1 Node 2 Node 3 11

  17. Distributing the Work So let’s use it: S i joins with R 1 ,R 2 ,…,R N locally. 12

  18. Distributing the Work So let’s use it: S i joins with R 1 ,R 2 ,…,R N locally. Goal : Minimize amount of data sent from R k to S i 12

  19. Distributing the Work So let’s use it: S i joins with R 1 ,R 2 ,…,R N locally. Goal : Minimize amount of data sent from R k to S i Solution 1 : Use the partitioning strategy (like last lecture) 12

  20. Distributing the Work So let’s use it: S i joins with R 1 ,R 2 ,…,R N locally. Goal : Minimize amount of data sent from R k to S i Solution 1 : Use the partitioning strategy (like last lecture) Solution 2 : “Hints” to figure out what R k should send 12

  21. Sending Hints R k ⋈ B S i The naive approach… Node 1 Node 2 R k S i 13

  22. Sending Hints R k ⋈ B S i The naive approach… Node 1 Node 2 Send me R k R k S i 14

  23. Sending Hints R k ⋈ B S i The naive approach… R k Node 1 Node 2 R k S i 15

  24. Sending Hints R k ⋈ B S i The smarter approach… π B ( ) S i Node 1 Node 2 R k S i 16

  25. Sending Hints R k ⋈ B S i The smarter approach… π B ( ) S i Node 1 Node 2 ⋈ π B ( ) R k S i R k S i 17

  26. Sending Hints R k ⋈ B S i The smarter approach… Node 1 Node 2 <1,A> <2,X> <2,B> <3,Y> <2,C> <6,Y> <3,D> <4,E> 18

  27. Sending Hints R k ⋈ B S i The smarter approach… Node 1 Node 2 Send me rows <1,A> <2,X> with a ‘B’ of <2,B> <3,Y> 2,3, or 6 <2,C> <6,Y> <3,D> <4,E> 19

  28. Sending Hints R k ⋈ B S i The smarter approach… Node 1 Node 2 <2,B> Send me rows <1,A> <2,X> <2,C> with a ‘B’ of <2,B> <3,Y> 2,3, or 6 <3,D> <2,C> <6,Y> <3,D> <4,E> 20

  29. Sending Hints R k ⋈ B S i The smarter approach… Node 1 Node 2 <2,B> Send me rows <1,A> <2,X> <2,C> with a ‘B’ of <2,B> <3,Y> 2,3, or 6 <3,D> <2,C> <6,Y> <3,D> This is called a semi-join. <4,E> 20

  30. Sending Hints Now Node 1 sends as little data as possible… … but Node 2 needs to send a lot of data. Can we do better? 21

  31. Sending Hints R k ⋈ B S i Strategy 1 : Parity Bits Node 1 Node 2 <1,A> <2,X> 1 0 <2,B> <6,Y> 0 0 <2,C> 0 <3,D> 1 <4,E> 0 22

  32. Sending Hints R k ⋈ B S i Strategy 1 : Parity Bits Node 1 Node 2 Send me data <1,A> <2,X> 1 0 with a parity <2,B> <6,Y> 0 0 bit of ‘0’ <2,C> 0 <3,D> 1 <4,E> 0 23

  33. Sending Hints R k ⋈ B S i Strategy 1 : Parity Bit Node 1 Node 2 <2,B> Send me data <1,A> <2,X> 1 0 <2,C> with a parity <2,B> <6,Y> 0 0 <4,E> bit of ‘0’ <2,C> 0 <3,D> 1 <4,E> 0 24

  34. Sending Hints R k ⋈ B S i Strategy 1 : Parity Bit Node 1 sending too much is ok! (Node 2 still needs to compute ⋈ B ) Node 1 Node 2 <2,B> Send me data <1,A> <2,X> 1 0 <2,C> with a parity <2,B> <6,Y> 0 0 <4,E> bit of ‘0’ <2,C> 0 <3,D> 1 <4,E> 0 24

  35. Sending Hints R k ⋈ B S i Strategy 1 : Parity Bit Node 1 sending too much is ok! (Node 2 still needs to compute ⋈ B ) Node 1 Node 2 <2,B> Send me data <1,A> <2,X> 1 0 <2,C> with a parity <2,B> <6,Y> 0 0 <4,E> bit of ‘0’ <2,C> 0 <3,D> 1 Problem: One parity bit is too little <4,E> 0 24

  36. Sending Hints R k ⋈ B S i Strategy 1 : Parity Bit Node 1 Node 2 <1,A> <2,X> 1 0 <2,B> <3,Y> 0 1 <2,C> <6,Y> 0 0 <3,D> 1 Problem: One parity bit is too little <4,E> 0 25

  37. Sending Hints R k ⋈ B S i Strategy 2 : Parity Bits Node 1 Node 2 <2,B> Send me data <1,A> <2,X> 01 10 <2,C> with parity <2,B> 11 <3,Y> 10 <3,D> bits 10 or 11 <2,C> <6,Y> 10 10 <3,D> 11 <4,E> 00 26

  38. Sending Hints R k ⋈ B S i Strategy 2 : Parity Bits Node 1 Node 2 <2,B> Send me data <1,A> <2,X> 01 10 <2,C> with parity <2,B> 11 <3,Y> 10 <3,D> bits 10 or 11 <2,C> <6,Y> 10 10 <3,D> 11 Problem: Almost as much data as π B <4,E> 00 26

  39. Sending Hints Can we summarize the parity bits? 27

  40. Bloom Filters Alice Bob Carol Dave 28

  41. Bloom Filters Alice Bob Bloom Filter Carol Dave 29

  42. Bloom Filters Is Alice part of the set? Alice Bob Bloom Filter Carol Dave 30

  43. Bloom Filters Is Alice part of the set? Yes Alice Bob Bloom Filter Carol Dave 30

  44. Bloom Filters Is Alice part of the set? Yes Alice Bob Is Eve part of Bloom Filter the set? Carol Dave 30

  45. Bloom Filters Is Alice part of the set? Yes Alice Bob Is Eve part of Bloom No Filter the set? Carol Dave 30

  46. Bloom Filters Is Alice part of the set? Yes Alice Bob Is Eve part of Bloom No Filter the set? Carol Dave Is Fred part of the set? 30

  47. Bloom Filters Is Alice part of the set? Yes Alice Bob Is Eve part of Bloom No Filter the set? Carol Dave Yes Is Fred part of the set? 30

  48. Bloom Filters Is Alice part of the set? Yes Alice Bob Is Eve part of Bloom No Filter the set? Carol Dave Yes Bloom Filter Guarantee Is Fred part Test definitely returns Yes if the element is in the set of the set? Test usually returns No if the element is not in the set 30

  49. Bloom Filters A Bloom Filter is a bit vector M - # of bits in the bit vector K - # of hash functions For ONE key (or record): For i between 0 and K: bitvector[ hash i (key) % M ] = 1 31

  50. Bloom Filters A Bloom Filter is a bit vector M - # of bits in the bit vector K - # of hash functions For ONE key (or record): For i between 0 and K: bitvector[ hash i (key) % M ] = 1 Each bit vector has ~K bits set 31

  51. Bloom Filters Filters are combined Key 1 00101010 by Bitwise-OR e.g. (Key 1 | Key 2) Key 2 01010110 = 01111110 Key 3 10000110 Key 4 01001100 32

  52. Bloom Filters Filters are combined Key 1 00101010 by Bitwise-OR e.g. (Key 1 | Key 2) Key 2 01010110 = 01111110 Key 3 10000110 How do we test for inclusion? Key 4 01001100 32

  53. Bloom Filters Filters are combined Key 1 00101010 by Bitwise-OR e.g. (Key 1 | Key 2) Key 2 01010110 = 01111110 Key 3 10000110 How do we test for inclusion? (Key & Filter) == Key? Key 4 01001100 (Key 1 & S) = 00101010 (Key 3 & S) = 00000110 (Key 4 & S) = 01001100 32

  54. Bloom Filters Filters are combined Key 1 00101010 by Bitwise-OR e.g. (Key 1 | Key 2) Key 2 01010110 = 01111110 Key 3 10000110 How do we test for inclusion? (Key & Filter) == Key? √ Key 4 01001100 (Key 1 & S) = 00101010 (Key 3 & S) = 00000110 (Key 4 & S) = 01001100 32

  55. Bloom Filters Filters are combined Key 1 00101010 by Bitwise-OR e.g. (Key 1 | Key 2) Key 2 01010110 = 01111110 Key 3 10000110 How do we test for inclusion? (Key & Filter) == Key? √ Key 4 01001100 (Key 1 & S) = 00101010 X (Key 3 & S) = 00000110 (Key 4 & S) = 01001100 32

Recommend


More recommend