
Moment-Based Quantile Sketches for Efficient Aggregation Queries



  1. Moment-Based Quantile Sketches for Efficient Aggregation Queries. Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan, Peter Bailis. Stanford University.

  2. Motivation: Monitoring production data streams. Billions of events per day of mobile app telemetry data. Example quantile query: the 99th percentile, grouped by operating system (Android, iOS), where Location = USA. [Figure: p99 latency over time for each operating system.] When response latency spikes, we need to issue queries. Percentiles are targeted: a single metric, for specific sub-populations.

  3. Goal: Enabling fast quantile queries at scale. Example query: the 99th percentile, grouped by operating system, where Location = USA. Users expect interactive response on large datasets (billions of events per day). Baseline: scan and sort billions of rows, with multi-second latencies. [Figure: scalable queries combine statistics, $\mu_i = \frac{1}{n} \sum_{x \in X} x^i$; optimization, $\theta' = \theta - \frac{\nabla F(\theta)}{\nabla^2 F(\theta)}$; and data summaries.]

  4. Systems make use of summaries to scale. [Figure: raw values, a summary, and the quantiles read from it: 99th percentile 401 ms, 95th percentile 197 ms, 50th percentile 48 ms.] Summaries represent a dataset using sublinear space (e.g. a histogram). Quantile estimates can be extracted from a quantile summary. Summaries are commonly used to avoid sorting large datasets.

  5. Pre-aggregating summaries reduces latency. Systems can pre-aggregate summaries for populations ahead of time. [Figure: data associated with day of week; the summaries for Day=Sat and Day=Sun are combined into one for Day=Weekend; quantiles shown: 99th percentile 105 ms, 95th percentile 87 ms, 50th percentile 40 ms.] Mergeable summaries [Agarwal et al., PODS '12] can be combined without loss of accuracy, which improves query response time.
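
To illustrate what "mergeable" means here, the following is a minimal sketch that assumes the summary is a fixed-bucket histogram (the slides mention histograms only as an example of a summary). The class name `HistogramSummary`, its bounds, and its bucket count are hypothetical choices for illustration, not anything from the talk. With identical bucket edges, adding the counts of two summaries gives exactly the summary of the combined data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistogramSummary:
    """Hypothetical fixed-bucket histogram over [lo, hi)."""
    lo: float
    hi: float
    counts: List[int]

    @classmethod
    def empty(cls, lo: float, hi: float, buckets: int) -> "HistogramSummary":
        return cls(lo, hi, [0] * buckets)

    def _width(self) -> float:
        return (self.hi - self.lo) / len(self.counts)

    def add(self, x: float) -> None:
        # Clamp out-of-range values into the first or last bucket.
        idx = int((x - self.lo) / self._width())
        idx = min(max(idx, 0), len(self.counts) - 1)
        self.counts[idx] += 1

    def merge(self, other: "HistogramSummary") -> "HistogramSummary":
        # Merging is element-wise addition; it needs identical bucket edges.
        if (self.lo, self.hi, len(self.counts)) != (other.lo, other.hi, len(other.counts)):
            raise ValueError("summaries must share bucket edges")
        merged = [a + b for a, b in zip(self.counts, other.counts)]
        return HistogramSummary(self.lo, self.hi, merged)

    def quantile(self, q: float) -> float:
        # Return the upper edge of the first non-empty bucket at which the
        # running count reaches q * total (a coarse estimate).
        target = q * sum(self.counts)
        running = 0
        for i, c in enumerate(self.counts):
            running += c
            if c > 0 and running >= target:
                return self.lo + (i + 1) * self._width()
        return self.hi


def summarize(values: List[float]) -> HistogramSummary:
    # Illustrative bounds: latencies in [0, 200) ms, 20 buckets of 10 ms.
    summary = HistogramSummary.empty(0.0, 200.0, 20)
    for v in values:
        summary.add(v)
    return summary
```

For example, `summarize(sat).merge(summarize(sun))` has the same counts as `summarize(sat + sun)`, so a weekend query can be answered from the two pre-aggregated day summaries without touching the raw rows.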

  6. Challenge: aggregations bottlenecked by merge. Many attributes means potentially more pre-aggregated subpopulations: App Version × OS Version × Location × Day × HW Make. 5 columns × 20 distinct values each = 3.2M combinations. Queries are bottlenecked when merging pre-aggregated summaries. Greenwald-Khanna (GK) sketch: an updatable equi-depth histogram. GK performance: 3 µs × 1 million merges = 3 seconds. How can we optimize quantile summaries for aggregation?
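
The two figures on this slide follow from simple arithmetic, reproduced below as a quick check. The variable names are illustrative only; the inputs (5 columns, 20 values each, 3 µs per merge, 1 million merges) are the ones quoted on the slide.

```python
# Number of pre-aggregated subpopulations: one per combination of values.
columns = 5
distinct_values_per_column = 20
combinations = distinct_values_per_column ** columns  # 20^5

# Time for a query that merges one million pre-aggregated summaries.
merge_cost_microseconds = 3
merges = 1_000_000
total_seconds = merge_cost_microseconds * merges / 1_000_000

print(f"{combinations:,} combinations, {total_seconds} s of merge time")
```

The point of the slide is that per-merge cost, not scanning, dominates once a query has to combine this many summaries.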

  7. Talk Outline
     1. Setting: quantile roll-ups at scale
     2. Challenge: merging pre-aggregated summaries
     3. Summarizing data using statistics (moments sketch)
     4. Improving sketch performance
     5. Results: benchmark + integrated into data systems

  8. Efficient data summaries using statistics. How can we optimize quantile summaries for aggregation? Use statistics to summarize sub-populations (at indexing time). Aggregate the statistics using arithmetic (at query time). [Figure: example sub-population statistic values 1, 3, 2, 4, 2, 2 and an aggregated value 10.]
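
As a rough sketch of "aggregate statistics using arithmetic", assume each sub-population is summarized by its count and its power sums (the sum of `x**i` for `i = 1..k`). The class `PowerSums` and its methods are hypothetical names for illustration; the sketch presented in the talk may track additional fields. Merging is then plain addition, with no sorting or bucket alignment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PowerSums:
    """Hypothetical summary: a count plus power sums of the values."""
    count: int
    sums: List[float]  # sums[i - 1] holds the sum of x**i over all values

    @classmethod
    def from_values(cls, values: Iterable[float], k: int) -> "PowerSums":
        vals = list(values)
        return cls(len(vals), [sum(x ** i for x in vals) for i in range(1, k + 1)])

    def merge(self, other: "PowerSums") -> "PowerSums":
        # Query-time aggregation: add counts and add power sums.
        if len(self.sums) != len(other.sums):
            raise ValueError("summaries must track the same number of powers")
        return PowerSums(
            self.count + other.count,
            [a + b for a, b in zip(self.sums, other.sums)],
        )

    def moments(self) -> List[float]:
        # i-th moment = (1 / n) * sum of x**i.
        return [s / self.count for s in self.sums]


# Index time: summarize each sub-population separately.
part_a = PowerSums.from_values([1, 3, 2, 4], k=2)
part_b = PowerSums.from_values([2, 2], k=2)

# Query time: combine the summaries without revisiting the raw values.
combined = part_a.merge(part_b)
```

Because addition is associative and commutative, the merged summary is identical to one built directly from all the values, whatever order the merges happen in.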

  9. Moments: statistics that capture distribution shape. Moments are averages of powers of the data values. The $i$-th moment is $\mu_i = \frac{1}{n} \sum_{x \in X} x^i$; the first moment is the mean. Intuition: averages bound the number of "large" values. Constraining one moment (for example, to equal 1) limits the size of the tail; constraining a further moment (for example, to equal 6) limits it further. Given $k$ moments, the distribution is known to within $O(1/k)$, so quantiles can be estimated.
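
One standard way to make the "averages bound the number of large values" intuition concrete is Markov's inequality applied to $x^i$. This is a sketch under the assumption that all data values are non-negative, which the slide does not state; the threshold $t$ is a symbol introduced here for illustration.

```latex
% For non-negative x and any threshold t > 0, the indicator [x >= t] is at most x^i / t^i.
% Averaging that inequality over the dataset X of size n gives:
\[
  \frac{\bigl|\{\, x \in X : x \ge t \,\}\bigr|}{n}
  \;\le\; \frac{1}{t^{i}} \cdot \frac{1}{n} \sum_{x \in X} x^{i}
  \;=\; \frac{\mu_i}{t^{i}} .
\]
% Example with i = 1 and mean \mu_1 = 1: at most 1/10 of the values can be >= 10.
```

Each additional known moment adds another such constraint, which is why higher moments tighten the bound on the tail.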
