Effective management of high volume numeric data with histograms - PowerPoint PPT Presentation

Effective management of high volume numeric data with histograms Fred Moyer @Circonus DataEngConf SF ‘18

@phredmoyer ● Engineer to Engineer @circonus ● Recovering C and Perl programmer ● Geeking out on histograms since 2015

Pain driven development ● Observability tools caused a telemetry firehose ● Existing monitoring systems got washed away ● Average based metrics gave limited insight

“Effective Management” ● Performance AND scalability ● Avoid memory allocations, copies, locks, waits ● Persist data in size efficient structures

Histogram Basics [Figure: histogram of number of samples vs. sample value, marking the mode, the median q(0.5), the mean, q(0.9), and q(1)]

Heatmap Basics

Histogram Types ● Fixed Bucket ● Approximate ● Linear ● Log Linear ● Cumulative

Fixed Bucket ● User specified bins/buckets [Figure: number of samples vs. sample value]

Approximate ● Centroids indicating data grouping [Figure: number of samples vs. sample value]

Linear ● Evenly sized bins [Figure: number of samples vs. sample value]

Log Linear ● Logarithmically increasing bin sizes [Figure: number of samples vs. sample value]

Cumulative ● Total sample count [Figure: number of samples vs. sample value]

Custom [Figure: number of samples vs. sample value]

Open Source Log Linear ● C - github.com/circonus-labs/libcircllhist ● Go - github.com/circonus-labs/circonusllhist

Open Source Log Linear

Open Source Log Linear ● Bin size increases by 10x at each power of 10 ● 90 bins per power of 10
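One way to read the "90 bins" figure is that each bin is keyed by two significant decimal digits (10 through 99) plus a power-of-ten exponent. The sketch below assumes that reading; `log_linear_bin` and `bin_bounds` are hypothetical helper names, not functions from libcircllhist or circonusllhist, and only positive values are handled.

```python
from decimal import Decimal


def log_linear_bin(value):
    """Map a positive number to (mantissa, exponent), with mantissa in 10..99.

    Assumed scheme: the bin's lower bound is mantissa * 10**exponent and
    its width is 10**exponent, so widths grow 10x per power of ten.
    """
    d = Decimal(str(value))
    if d <= 0:
        raise ValueError("this sketch only handles positive values")
    # adjusted() is the exponent of the most significant digit.
    exponent = d.adjusted() - 1
    # Shift so two digits sit left of the decimal point, then truncate.
    mantissa = int(d.scaleb(-exponent))
    return mantissa, exponent


def bin_bounds(mantissa, exponent):
    """Return (lower, upper) bounds of a bin as Decimal values."""
    lower = Decimal(mantissa).scaleb(exponent)
    width = Decimal(1).scaleb(exponent)
    return lower, lower + width


print(log_linear_bin(1.05))   # (10, -1): the bin from 1.0 to 1.1
print(log_linear_bin(250))    # (25, 1): the bin from 250 to 260
```

Decimal arithmetic is used here only to keep the example free of binary floating-point rounding at bin edges.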

Open Source Log Linear

Bin data structure
● Value: int8_t, 1 byte
● Exponent: int8_t, 1 byte
● Count: uint64_t, max 8 bytes, varbit encoded
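The field sizes above give an upper bound of 1 + 1 + 8 = 10 bytes per bin. The sketch below packs a bin at that fixed maximum width using Python's `struct` module. The little-endian layout and the names `pack_bin` and `unpack_bin` are assumptions for illustration; the slide says the count is varbit encoded, which this example does not attempt to reproduce.

```python
import struct

# Assumed fixed-width layout: signed 8-bit value, signed 8-bit exponent,
# unsigned 64-bit count, little-endian with no padding.
BIN_FORMAT = "<bbQ"


def pack_bin(value, exponent, count):
    """Serialize one bin at its maximum (non-varbit) size."""
    return struct.pack(BIN_FORMAT, value, exponent, count)


def unpack_bin(data):
    """Inverse of pack_bin: returns (value, exponent, count)."""
    return struct.unpack(BIN_FORMAT, data)


print(struct.calcsize(BIN_FORMAT))  # 10
```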

Storage efficiency - 1 month
30 days of one minute histograms
30 days * 24 hours/day * 60 histograms/hour * 300 bins/histogram * 10 bytes/bin * 1 kB/1,024 bytes * 1 MB/1,024 kB = 123.6 MB

Storage efficiency - 1 year
365 days of five minute histograms
365 days * 24 hours/day * 12 histograms/hour * 300 bins/histogram * 10 bytes/bin * 1 kB/1,024 bytes * 1 MB/1,024 kB = 300.8 MB
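The storage arithmetic on these two slides can be re-run with a few lines of Python. This is a rough sketch: `histogram_storage_mb` is a hypothetical helper, and it assumes every histogram occupies the full 300 bins at 10 bytes each, which is the worst case used on the slides.

```python
def histogram_storage_mb(days, histograms_per_hour,
                         bins_per_histogram=300, bytes_per_bin=10):
    """Worst-case storage in MB (1 MB = 1024 * 1024 bytes)."""
    total_bytes = (days * 24 * histograms_per_hour
                   * bins_per_histogram * bytes_per_bin)
    return total_bytes / (1024 * 1024)


# One month of one-minute histograms, then one year of five-minute ones.
print(round(histogram_storage_mb(30, 60), 1))
print(round(histogram_storage_mb(365, 12), 1))
```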

Quantile calculation
1. Given a quantile q(X) where 0 < X < 1
2. Sum up the counts of all the bins, C
3. Multiply X * C to get count Q
4. Walk bins, sum bin boundary counts until > Q
5. Interpolate quantile value q(X) from bin

Linear interpolation
left_value=1.0, right_value=1.1
left_count=600, right_count=800, bin_count=200
X = 0.5, Q = 700
q(X) = left_value + (Q - left_count) / (right_count - left_count) * bin_width
q(X) = 1.0 + (700 - 600) / (800 - 600) * 0.1
q(X) = 1.05
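The quantile steps and the interpolation formula can be sketched together as follows. This is an illustrative sketch, not the libcircllhist implementation: the `quantile` function, the `(left_value, bin_width, count)` tuple layout, and the example bins are all assumptions. The example bins are chosen so that the middle bin has left_count=600 and right_count=800, with a total of 1,400 samples so that X = 0.5 gives Q = 700.

```python
def quantile(bins, x):
    """Estimate q(x) from bins sorted by left_value.

    bins: list of (left_value, bin_width, count) tuples (assumed layout).
    """
    if not 0 < x < 1:
        raise ValueError("x must satisfy 0 < x < 1")
    total = sum(count for _, _, count in bins)      # C
    if total == 0:
        raise ValueError("histogram is empty")
    target = x * total                              # Q
    left_count = 0
    for left_value, bin_width, count in bins:
        right_count = left_count + count
        if right_count > target:
            # Linear interpolation inside the bin that contains Q.
            fraction = (target - left_count) / (right_count - left_count)
            return left_value + fraction * bin_width
        left_count = right_count
    raise RuntimeError("target count not reached; check the inputs")


# Hypothetical bins: 600 samples below 1.0, 200 in [1.0, 1.1), 600 above.
example_bins = [(0.9, 0.1, 600), (1.0, 0.1, 200), (1.1, 0.1, 600)]
print(quantile(example_bins, 0.5))
```

The walk over the bins is linear in the number of bins, which is small and bounded for a log linear histogram.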

Recap ● Several different types of histograms ● Highly space efficient ● O(1) and O(n) complexity calculating quantiles ● What other fun things can we do?

Inverse Quantiles
● What’s the 95th percentile latency?
  ○ q(0.95) = 10ms
● What percent of requests exceeded 10ms?
  ○ 5% for this data set; what about others?

Inverse Quantile calculation
1. Given a sample value X, locate its bin
2. Using the previous linear interpolation equation, solve for Q given X

Inverse Quantile calculation
X = left_value + (Q - left_count) / (right_count - left_count) * bin_width
X - left_value = (Q - left_count) / (right_count - left_count) * bin_width
(X - left_value) / bin_width = (Q - left_count) / (right_count - left_count)
(X - left_value) / bin_width * (right_count - left_count) = Q - left_count
Q = (X - left_value) / bin_width * (right_count - left_count) + left_count

Linear interpolation
left_value=1.0, right_value=1.1
left_count=600, right_count=800
X = 1.05
Q = (X - left_value) / bin_width * (right_count - left_count) + left_count
Q = (1.05 - 1.0) / 0.1 * (800 - 600) + 600
Q = 700

Inverse Quantile calculation
1. Given a sample value X, locate its bin
2. Using the previous linear interpolation equation, solve for Q given X
3. Sum the bin counts up to Q as Q_left
4. Inverse quantile q_inv(X) = (Q_total - Q_left) / Q_total
5. For Q_left = 700, Q_total = 1,000, q_inv(X) = 0.3
6. 30% of sample values exceeded X
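The inverse quantile steps can be sketched in the same style. As before, this is a hedged illustration rather than the library's code: `inverse_quantile`, the `(left_value, bin_width, count)` tuple layout, and the example bins are assumptions, with the bins picked so that Q_left = 700 and Q_total = 1,000 for X = 1.05.

```python
def inverse_quantile(bins, x_value):
    """Fraction of samples estimated to exceed x_value.

    bins: list of (left_value, bin_width, count) sorted by left_value
    (assumed layout).
    """
    total = sum(count for _, _, count in bins)      # Q_total
    if total == 0:
        raise ValueError("histogram is empty")
    left_count = 0
    for left_value, bin_width, count in bins:
        if x_value < left_value:
            # x_value falls before this bin (or in a gap between bins).
            return (total - left_count) / total
        if x_value < left_value + bin_width:
            # Solve the interpolation equation for Q given X.
            q_left = left_count + (x_value - left_value) / bin_width * count
            return (total - q_left) / total
        left_count += count
    return 0.0  # x_value lies above every bin


# Hypothetical bins: 600 samples below 1.0, 200 in [1.0, 1.1), 200 above.
example_bins = [(0.9, 0.1, 600), (1.0, 0.1, 200), (1.1, 0.1, 200)]
print(inverse_quantile(example_bins, 1.05))
```

Returning the fraction above the threshold, rather than below it, matches the SLO-style question "what percent of requests exceeded X".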

Quantiles - Heatmap

Quantiles - q(0.9)

Inverse Quantiles - SLO

Anomalies

Thank you! Questions? Bug me at the Circonus booth Come to Office Hours Tweet @phredmoyer or @circonus
