Tight Lower Bound for Comparison-Based Quantile Summaries

  1. Tight Lower Bound for Comparison-Based Quantile Summaries
     Pavel Veselý, University of Warwick
     8 April 2020
     Based on joint work with Graham Cormode (Warwick)
     Powered by BeamerikZ

  2. Overview of the talk
     Quantiles & Distributions; Big Data Algorithms; Streaming Model
     [Figure: plot with axis ticks 0, 0.5, 1 and the median marked]

  6. Motivation: Monitoring Latencies of Web Requests
     Source: C. Masson, J.E. Rim, and H.K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12):2195–2205, 2019.
     Millions of observations
     • no need to store all observed latencies
     What does the distribution look like? What is the median latency?
     • Average latency too high due to ∼2% of very high latencies

  12. Streaming Model
      Motivation: monitoring latencies of requests
      Streaming model = one pass over data & limited memory
      Streaming algorithm
      • receives data in a stream, item by item
      • uses memory sublinear in N = stream length
      • at the end, computes the answer
      Challenges:
      • N very large & not known
      • Data independent
      • Stream ordered arbitrarily
      • No random access to data
      Main objective: space
      How to summarize the input?
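
To make the streaming model concrete, here is a minimal Python sketch of a one-pass algorithm that keeps only a constant number of values in memory. It is an illustration and not part of the talk; the function name `one_pass_stats` and the choice of statistics (count, mean, maximum) are assumptions made for the example.

```python
def one_pass_stats(stream):
    """Read the stream once, item by item, keeping O(1) state.

    Hypothetical helper for illustration: it never stores the items,
    never looks back (no random access), and does not need to know
    the stream length N in advance.
    """
    count = 0
    total = 0.0
    largest = None
    for x in stream:
        count += 1
        total += x
        if largest is None or x > largest:
            largest = x
    if count == 0:
        raise ValueError("empty stream")
    # The answer is computed only at the end of the stream.
    return count, total / count, largest


# A few slow requests dominate the mean, which is why the talk asks
# for the median instead.
latencies_ms = iter([10, 20, 30, 1000])
print(one_pass_stats(latencies_ms))
```

The mean and the maximum fit this model with constant memory. The median does not, which is what motivates summarizing the input rather than storing it.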

  16. Selection Problem & Streaming
      • Input: stream of N numbers
      • Goal: find the k-th smallest
        • e.g.: the median, 99th percentile
      • O(N) time offline algorithm [Blum et al. ’73]
      • Streaming restrictions:
        • just one pass over the data
        • limited memory: o(N)
      No streaming algorithm for exact selection: Ω(N) space needed to find the median [Munro & Paterson ’80, Guha & McGregor ’07]
      What about finding an approximate median?
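
For contrast with the streaming setting, the offline selection problem can be sketched as follows. This is a minimal Python illustration in the spirit of the median-of-medians technique associated with Blum et al.; it is not taken from the slides, and the function name `select` and the 1-indexed convention for k are assumptions made for the example.

```python
def select(items, k):
    """Return the k-th smallest element of items (k is 1-indexed).

    Illustrative median-of-medians selection. It needs all N items
    in memory at once, which is exactly what streaming forbids.
    """
    items = list(items)
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    while True:
        if len(items) <= 5:
            return sorted(items)[k - 1]
        # Median of each group of at most 5 items.
        groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # Pivot = median of the group medians, found recursively.
        pivot = select(medians, (len(medians) + 1) // 2)
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]


data = [5, 3, 8, 1, 9, 2, 7]
print(select(data, 4))  # the median of the 7 items
```

The sketch uses only comparisons between items, but it reads the input repeatedly and partitions it in place, so it does not satisfy the one-pass, o(N)-memory restrictions listed above.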

  22. Approximate Median & Quantiles
      How to define an approximate median?
      φ-quantile = ⌈φ·N⌉-th smallest element (φ ∈ [0, 1])
      • Median = .5-quantile
      • Quartiles = .25, .5, and .75-quantiles
      • Percentiles = .01, .02, ..., .99-quantiles
      [Figure: sorted data with the .25-quantile, median, and .75-quantile marked]
      ε-approximate φ-quantile = any φ′-quantile for φ′ ∈ [φ − ε, φ + ε]
      • .01-approximate medians are .49- and .51-quantiles (and items in between)
      ε-approximate selection:
      • query k-th smallest → return k′-th smallest for k′ = k ± εN
      Offline summary: sort data & select ∼ 1/(2ε) items
      [Figure: sorted data with the min. (0-quantile), 2ε-quantile, 4ε-quantile, ... marked]
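
The offline summary described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the talk: the names `quantile`, `offline_summary`, and `query` are hypothetical, the 0-quantile is mapped to the minimum as on the slide, and a query is answered with the stored item whose quantile is nearest to φ.

```python
import math


def quantile(sorted_data, phi):
    """phi-quantile = ceil(phi * N)-th smallest item; phi = 0 gives the minimum."""
    rank = max(1, math.ceil(phi * len(sorted_data)))
    return sorted_data[rank - 1]


def offline_summary(data, eps):
    """Sort the data and keep the 0-, 2*eps-, 4*eps-, ... quantiles.

    Stores about 1/(2*eps) items; the last stored quantile is capped at 1.
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must be in (0, 1]")
    if not data:
        raise ValueError("empty input")
    s = sorted(data)
    count = math.ceil(1 / (2 * eps)) + 1
    return [quantile(s, min(1.0, 2 * eps * i)) for i in range(count)]


def query(summary, eps, phi):
    """Return a stored item whose quantile is within about eps of phi."""
    i = min(len(summary) - 1, round(phi / (2 * eps)))
    return summary[i]


data = list(range(1, 1001))          # the ranks equal the values here
summary = offline_summary(data, eps=0.05)
print(len(summary), query(summary, 0.05, 0.5))
```

Because stored quantiles are spaced 2ε apart, every φ ∈ [0, 1] is within ε of some stored quantile, which is why about 1/(2ε) items suffice offline. Floating-point rounding in `math.ceil` can shift a stored rank by one position, so the sketch guarantees rank error of roughly εN + 1 rather than exactly εN.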

  25. ε-Approximate Quantile Summaries
      Data structure with two operations:
      • Update(x): x = new item from the stream
      • Quantile Query(φ): For φ ∈ [0, 1], return ε-approximate φ-quantile
      Additional operations:
      • Rank Query(x): For item x, determine its rank = position in the ordering of the input
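
The interface above can be illustrated with a deliberately naive Python baseline. It is a sketch for showing the semantics of the three operations only: the class name `ExactQuantileSummary` and the method names are hypothetical, it stores every item (so it uses Θ(N) space and is not a streaming summary), and it assumes the rank of x means the number of stored items that are at most x.

```python
import bisect
import math


class ExactQuantileSummary:
    """Naive baseline: keeps all items sorted, so answers are exact (eps = 0)."""

    def __init__(self):
        self._sorted = []

    def update(self, x):
        """Update(x): insert a new item from the stream."""
        bisect.insort(self._sorted, x)

    def quantile_query(self, phi):
        """Quantile Query(phi): the ceil(phi * N)-th smallest item seen so far."""
        if not self._sorted:
            raise ValueError("no items seen yet")
        if not 0.0 <= phi <= 1.0:
            raise ValueError("phi must be in [0, 1]")
        rank = max(1, math.ceil(phi * len(self._sorted)))
        return self._sorted[rank - 1]

    def rank_query(self, x):
        """Rank Query(x): number of stored items <= x (assumed definition)."""
        return bisect.bisect_right(self._sorted, x)


s = ExactQuantileSummary()
for item in [5, 1, 4, 2, 3]:
    s.update(item)
print(s.quantile_query(0.5), s.rank_query(4))
```

The baseline accesses items only through comparisons, matching the comparison-based setting named in the talk title. An actual ε-approximate summary would keep far fewer items and accept rank error up to εN in exchange.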
