
Fast Prefix Search in Little Space, with Applications

Fast Prefix Search in Little Space, with Applications. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna. ESA 2010. Talk overview: 1. What? 2. Why? 3. What else? 4. How? 5. Then what?


  1. Fast Prefix Search in Little Space, with Applications. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna. ESA 2010.

  2. Talk overview

  3. Talk overview: 1. What? 2. Why? 3. What else? 4. How? 5. Then what?

  4. 1. What

  5. 1. What ✤ Standard (RAM) model, word size w. ✤ Static set S of n strings. ✤ Prefix query: Given a string p, which strings in S have p as a prefix? ‣ Report all matching strings.

  6. 1. What ✤ Standard (RAM) model, word size w. ✤ Static set S of n strings. ✤ Prefix query: Given a string p, which strings in S have p as a prefix? ‣ Report ranks of all matching strings. ‣ Index: Assume strings stored sorted.

  7. 1. What ✤ Standard (RAM) model, word size w. ✤ Static set S of n strings, w bits each. ✤ Prefix query: Given a string p, which strings in S have p as a prefix? ‣ Report ranks of all matching strings. ‣ Index: Assume strings stored sorted.
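
The prefix query above can be sketched with plain binary search over the sorted strings. This is only a baseline illustration of what the query returns (a contiguous range of ranks), not the data structure from the talk; the function name `prefix_rank_range` and the sample strings are made up for this example.

```python
from bisect import bisect_left

def prefix_rank_range(sorted_strings, p):
    """Return (lo, hi) such that the strings with prefix p have ranks lo..hi-1.

    Assumes sorted_strings is sorted lexicographically, as on the slide.
    """
    # First string that is >= p; every string with prefix p starts here.
    lo = bisect_left(sorted_strings, p)
    # From lo onward, strings with prefix p come first, so binary-search
    # for the first string that no longer starts with p.
    left, right = lo, len(sorted_strings)
    while left < right:
        mid = (left + right) // 2
        if sorted_strings[mid].startswith(p):
            left = mid + 1
        else:
            right = mid
    return lo, left

# Hypothetical sample data.
strings = sorted(["Lisbon", "Liverpool", "Liverpudlian", "Livorno", "London"])
lo, hi = prefix_rank_range(strings, "Liverp")
matches = strings[lo:hi]
```

Both searches take O(log n) string comparisons, which is the cost a faster index would aim to avoid.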

  8. 2. Why?

  9. 2. Why? ALGO Liverp*

  10. 2. Why? ✤ OLAP in a nutshell: ‣ Dimensions D = Set<rooted tree>. ‣ FactTable F = List<node from each D, number>. ‣ Query: Given subtrees of D, sum up the numbers in F where all nodes are contained in the subtrees.
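
A brute-force version of the OLAP query on this slide might look as follows. The encoding is an assumption made for this sketch, not something the slide states: each tree node is written as its "/"-separated path from the root, so that "node lies in the subtree" becomes a prefix test. The names `in_subtree`, `olap_sum`, and the sample facts are hypothetical.

```python
def in_subtree(node, root):
    """True if node equals root or is a descendant of it (path encoding)."""
    return node == root or node.startswith(root + "/")

def olap_sum(facts, subtree_roots):
    """Sum the numbers of all facts whose nodes all lie in the given subtrees.

    facts: list of (nodes, number), with one node per dimension.
    subtree_roots: one subtree root per dimension, in the same order.
    """
    total = 0
    for nodes, number in facts:
        if all(in_subtree(n, r) for n, r in zip(nodes, subtree_roots)):
            total += number
    return total

# Hypothetical fact table with two dimensions: place and time.
facts = [
    (("Europe/UK/Liverpool", "2010/09"), 5),
    (("Europe/UK/London", "2010/09"), 7),
    (("Europe/Italy/Milan", "2010/10"), 11),
    (("Asia/Japan/Tokyo", "2010/09"), 13),
]
uk_2010 = olap_sum(facts, ("Europe/UK", "2010"))
```

Under this encoding, selecting a subtree in one dimension is exactly a prefix query on that dimension's path strings, which is one way to read the link between OLAP and prefix search.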

  11. 2. Why? [Figure labels: fast memory, slow memory, index, data (sorted).]

  12. 3. What else? ✤ Special case of range query ‣ return rank_S([a;b]) ✤ Generalizes point query ‣ return rank_S({x}) ✤ No easier than existence queries ‣ return S ∩ [a;b] ≠ ∅
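
The relationships on this slide can be made concrete for w-bit integer keys. The sketch below uses binary search as a stand-in for a real index and assumes keys are stored sorted; the function names and the 4-bit sample keys are invented for illustration. A prefix p of length at most w corresponds to the range from p padded with zeros to p padded with ones.

```python
from bisect import bisect_left, bisect_right

def range_ranks(sorted_keys, a, b):
    """Ranks (lo, hi) such that sorted_keys[lo:hi] are the keys in [a; b]."""
    return bisect_left(sorted_keys, a), bisect_right(sorted_keys, b)

def point_ranks(sorted_keys, x):
    """Point query: the range query with a = b = x."""
    return range_ranks(sorted_keys, x, x)

def range_exists(sorted_keys, a, b):
    """Existence query: is S ∩ [a; b] non-empty?"""
    lo, hi = range_ranks(sorted_keys, a, b)
    return lo < hi

def prefix_ranks(sorted_keys, prefix_bits, w):
    """Prefix query as a range query; prefix_bits is a '0'/'1' string, len <= w."""
    pad = w - len(prefix_bits)
    a = int(prefix_bits + "0" * pad, 2)  # smallest w-bit key with this prefix
    b = int(prefix_bits + "1" * pad, 2)  # largest w-bit key with this prefix
    return range_ranks(sorted_keys, a, b)

# Hypothetical 4-bit keys: 0001, 0101, 0110, 1100.
keys = [0b0001, 0b0101, 0b0110, 0b1100]
ranks_01 = prefix_ranks(keys, "01", 4)
```

Here a prefix query is answered by one range query, and a point query is the degenerate range with equal endpoints, matching the two reductions listed on the slide.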

  13. Results on query time (space O(nw) bits): range

  14. Results on query time (space O(nw) bits). Columns: existence, rank. Rows: point, range.

  15. Results on query time (space O(nw) bits). Columns: existence, rank. Rows: point, range. Point existence: O(1) [FKS, FOCS ’82].

  16. Results on query time (space O(nw) bits). Columns: existence, rank. Rows: point, range. Point existence: O(1) [FKS, FOCS ’82]. Point rank: O(log w) [vEB, FOCS ’75]. Range rank: Ω(log w) [PT, STOC ’06]. [Overlay: first page of Mihai Pătraşcu and Mikkel Thorup, "Time-Space Trade-Offs for Predecessor Search (Extended Abstract)".]

  17. Results on query time (space O(nw) bits). Columns: existence, rank. Rows: point, range. Point existence: O(1) [FKS, FOCS ’82]. Point rank: O(log w) [vEB, FOCS ’75]. Range existence: O(1) [ABR, STOC ’01]. Range rank: Ω(log w) [PT, STOC ’06]. [Overlay: first page of Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe, "Optimal Static Range Reporting in One Dimension".]

  18. Results on query time (space O(nw) bits). Columns: existence, rank. Rows: point, range. Point existence: O(1) [FKS, FOCS ’82]. Point rank: O(log w) [vEB, FOCS ’75]. Range existence: O(1) [ABR, STOC ’01]. Range rank: Ω(log w) [PT, STOC ’06].

  19. Weak queries ✤ Guarantee output only on some inputs ‣ Rank of prefixes of strings in S, in O(1) time [ABR ’01]. ‣ Represent a function with domain S, without storing S [SS ’89], [CKRT ’04]. ‣ Rank of any string in S, using O(n log log w) bits of space [BBPV ’09]. [Overlays: first pages of Alstrup, Brodal, and Rauhe, "Optimal Static Range Reporting in One Dimension"; Chazelle, Kilian, Rubinfeld, and Tal, "The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables"; Belazzougui, Boldi, Pagh, and Vigna, "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses".]

  20. Weak queries ✤ Guarantee output only on some inputs ‣ Rank of prefixes of strings in S, in O(1) time [ABR ’01]. ‣ Represent a function with domain S, without storing S [SS ’89], [CKRT ’04]. ‣ Rank of any string in S, using O(n log log w) bits of space [BBPV ’09]. [Overlays: first pages of Chazelle, Kilian, Rubinfeld, and Tal, "The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables"; Belazzougui, Boldi, Pagh, and Vigna, "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses".]

  21. Weak queries ✤ Guarantee output only on some inputs ‣ Rank of prefixes of strings in S, in O(1) time [ABR ’01]. ‣ Represent a function with domain S, without storing S [SS ’89], [CKRT ’04]. ‣ Rank of any string in S, using O(n log log w) bits of space [BBPV ’09]. [Overlay: first page of Belazzougui, Boldi, Pagh, and Vigna, "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses".]
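
To illustrate only the contract of a weak query ("guarantee output only on some inputs"), here is a toy rank function that is correct for strings in S and returns an arbitrary value otherwise, without keeping the strings themselves. It is not the construction from [BBPV ’09] and comes nowhere near O(n log log w) bits: it stores an 8-byte fingerprint per key. The names `fingerprint` and `build_weak_rank`, the digest size, and the sample set are assumptions of this sketch.

```python
import hashlib

def fingerprint(s):
    """8-byte fingerprint of a string (digest size chosen arbitrarily here)."""
    return hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()

def build_weak_rank(sorted_strings):
    """Build a rank function that keeps fingerprints of S, not S itself."""
    table = {}
    for rank, s in enumerate(sorted_strings):
        fp = fingerprint(s)
        if fp in table:
            # Extremely unlikely for small sets; a real scheme would rehash.
            raise ValueError("fingerprint collision; use a longer digest")
        table[fp] = rank

    def weak_rank(x):
        # Correct when x is in S. For x outside S the answer is arbitrary:
        # usually the default 0, or some rank if a fingerprint happens to match.
        return table.get(fingerprint(x), 0)

    return weak_rank

# Hypothetical sample set.
sample = sorted(["Lisbon", "Liverpool", "Liverpudlian", "Livorno", "London"])
weak_rank = build_weak_rank(sample)
```

A caller who needs a definite answer for arbitrary inputs has to verify the returned rank against the stored sorted data, for instance by checking `sample[weak_rank(x)] == x`.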
