
Internet Traffic Analysis: Mohammed Alasmar, Coseners - PowerPoint PPT Presentation



  1. Internet Traffic Analysis. Mohammed Alasmar, Coseners '19, 2019. https://ieeexplore.ieee.org/document/8737483

  2. Motivations. Reliable traffic modelling is important for network planning, deployment and management, e.g. (1) network dimensioning and (2) traffic billing. Historically, network traffic has been widely assumed to follow a Gaussian distribution. Deciding whether Internet flows could be heavy-tailed became important because a heavy tail implies significant departures from Gaussianity.

  3. Traffic volumes at different T. $X_i$: the amount of traffic seen in the time period $[iT, (i+1)T)$. The input is an Internet trace (trace.pcap), which is aggregated at different sampling times (T).
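The aggregation step on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes the trace has already been parsed into `(timestamp_seconds, size_bytes)` pairs, and the function names `aggregate_traffic` and `to_mbps` are hypothetical.

```python
from collections import defaultdict


def aggregate_traffic(packets, T):
    """Return a list X where X[i] is the byte count seen in [i*T, (i+1)*T).

    packets: iterable of (timestamp_seconds, size_bytes) pairs (assumed format).
    T: aggregation period in seconds. Time is measured from the first packet.
    """
    packets = list(packets)
    if not packets:
        return []
    t0 = min(ts for ts, _ in packets)
    volumes = defaultdict(int)
    for ts, size in packets:
        # Index of the half-open interval that contains this packet.
        volumes[int((ts - t0) // T)] += size
    n_bins = max(volumes) + 1
    # Periods without any packet are reported as 0 bytes.
    return [volumes[i] for i in range(n_bins)]


def to_mbps(volumes, T):
    """Convert per-period byte counts into data rates in Mbps."""
    return [8 * v / T / 1e6 for v in volumes]


# Four packets aggregated at T = 10 msec.
packets = [(0.000, 1500), (0.004, 500), (0.012, 1000), (0.031, 1500)]
X = aggregate_traffic(packets, T=0.010)
rates = to_mbps(X, T=0.010)
```

Re-running the same function with a larger `T` gives the coarser series that the slides compare (for example T = 1 sec and T = 5 sec).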

  4. Traffic volumes at different T. [Figure: data rate (Mbps) against time (seconds) for T = 10 msec, T = 1 sec and T = 5 sec.]

  5. Goal. Investigating the distribution of the amount of traffic per unit time using a robust statistical approach. [Figure: PDF of the data rate (Mbps) for T = 10 ms, T = 1 sec and T = 5 sec.]

  6. Goal. Investigating the distribution of the amount of traffic per unit time using a robust statistical approach.

  7. Goal. [Figure: panels for T = 10 ms, T = 1 sec and T = 5 sec.]

  8. Datasets. We study a large number of traffic traces (230) from many different networks (2009 to 2018):

     | Dataset | #Traces |
     |---|---|
     | Twente [1] | 40 |
     | MAWI [2] | 107 |
     | Auckland [3] | 25 |
     | Waikato [4] | 30 |
     | CAIDA [5] | 27 |

     [1] https://www.simpleweb.org/wiki/index.php/Traces, 2009.
     [2] http://mawi.wide.ad.jp/mawi/, 2016-2018.
     [3] https://wand.net.nz/wits/auck/9/, 2009.
     [4] https://wand.net.nz/wits/waikato/8/, 2010-2011.
     [5] http://www.caida.org/data/overview/, 2016.

  9. Power-law test. Our analysis is based on the framework proposed by Clauset et al. (see the reference on slide 27). The framework combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov statistic and likelihood ratios.

  10. Power-law test. Power-law distribution: $p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$, where $\alpha$ is the scaling exponent and $x_{\min}$ is the lower bound of the power-law tail. [Figure: PDF against data rate (bps, ×10^6) with $x_{\min}$ marked.]
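As a rough illustration of the fitting step, the sketch below uses the standard closed-form maximum-likelihood estimate for a continuous power-law tail, $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$, together with the Kolmogorov–Smirnov distance between the tail and the fitted model. It assumes $x_{\min}$ is already known; in the full framework $x_{\min}$ is itself chosen by scanning candidates and keeping the one with the smallest KS distance. The function names are hypothetical and the data are synthetic.

```python
import math
import random


def fit_alpha(data, x_min):
    """MLE of the scaling exponent for the tail x >= x_min.

    Returns (alpha, n_tail). Raises ZeroDivisionError if every tail
    value equals x_min, because the log-sum is then zero.
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    return alpha, n


def ks_distance(data, x_min, alpha):
    """Largest gap between the empirical CDF of the tail and the
    power-law CDF, 1 - (x / x_min) ** (1 - alpha)."""
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    d = 0.0
    for k, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from k/n to (k+1)/n at x.
        d = max(d, abs(model_cdf - k / n), abs(model_cdf - (k + 1) / n))
    return d


# Synthetic check: draw from a power law with alpha = 2.5, x_min = 2.0
# by inverse-transform sampling, then recover the exponent.
random.seed(1)
true_alpha, x_min = 2.5, 2.0
sample = [x_min * (1.0 - random.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(20000)]
alpha_hat, n_tail = fit_alpha(sample, x_min)
distance = ks_distance(sample, x_min, alpha_hat)
```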

  11. Power-law test

  12. Likelihood ratio: $\mathcal{R}$, p = fit.distributionCompare(powerlaw, alternative), where the alternative is Weibull, lognormal or exponential. The likelihood ratio is the power-law likelihood function divided by the alternative likelihood function: $\frac{L_1}{L_2} = \frac{\prod_{i=1}^{n} p_1(x_i)}{\prod_{i=1}^{n} p_2(x_i)}$. The log-likelihood ratio is $\mathcal{R} = \ln \frac{L_1}{L_2}$. If $\mathcal{R} > 0$, then the power law is favoured. If $\mathcal{R} < 0$, then the alternative is favoured. If $p < 0.1$, then the value of $\mathcal{R}$ can be trusted.
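The comparison on this slide can be sketched without any particular library. The example below is an assumption-laden illustration: it picks the exponential as the alternative because its tail-conditional MLE has a closed form, and it uses the significance formula from the Clauset et al. framework, $p = \operatorname{erfc}\left(|\mathcal{R}| / (\sqrt{2n}\,\sigma)\right)$, where $\sigma$ is the standard deviation of the per-sample log-likelihood differences. All function names are hypothetical.

```python
import math
import random


def power_law_logpdf(x, alpha, x_min):
    """Log of the power-law density ((alpha-1)/x_min) * (x/x_min)**(-alpha)."""
    return math.log((alpha - 1.0) / x_min) - alpha * math.log(x / x_min)


def exponential_logpdf(x, lam, x_min):
    """Log density of an exponential tail shifted to start at x_min."""
    return math.log(lam) - lam * (x - x_min)


def likelihood_ratio_test(tail, x_min):
    """Compare power law (model 1) with exponential (model 2) on x >= x_min.

    Returns (R, normalised_R, p): R > 0 favours the power law, R < 0 the
    alternative, and a small p means the sign of R can be trusted.
    """
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / x_min) for x in tail)  # power-law MLE
    lam = 1.0 / (sum(tail) / n - x_min)                       # exponential MLE
    diffs = [power_law_logpdf(x, alpha, x_min) - exponential_logpdf(x, lam, x_min)
             for x in tail]
    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    normalised = R / (sigma * math.sqrt(n))
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * sigma))
    return R, normalised, p


# Synthetic data with an exponential tail: the alternative should win.
random.seed(3)
x_min = 1.0
tail = [x_min + random.expovariate(1.0) for _ in range(5000)]
R, normalised, p = likelihood_ratio_test(tail, x_min)
```

The normalised value $\mathcal{R} / (\sigma \sqrt{n})$ is the quantity plotted as "Normalised LLR" in the results slides, under the assumption that the same normalisation is used there.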

  13. Normalised Log-Likelihood Ratio (LLR) ($\mathcal{R}$), T = 100 msec. The log-normal is the best fit for the vast majority of traces. [Figure: normalised LLR against rank of trace for the Weibull, lognormal and exponential alternatives; circled points have p > 0.1.]

  14. The log-normal distribution is the best fit for the vast majority of traces. The log-normal distribution is not the best fit for:
     - 1 out of 27 CAIDA traces
     - 9 out of 107 MAWI traces
     - 2 out of 30 Waikato traces
     - 5 out of 40 Twente traces
     - 1 out of 25 Auckland traces

     [Figure: normalised LLR against rank of trace for the MAWI, Waikato, Twente and Auckland traces, with Weibull, lognormal and exponential alternatives; anomalous traces are marked.]

  15. Anomalous traces. Anomalous traces are a poor fit for all distributions tried. This is often due to traffic outages or links that hit maximum capacity. [Figure: PDF against data rate (Mbps) for an anomalous trace and for a log-normal trace.]

  16. At different sampling times T. Normalised Log-Likelihood Ratio (LLR) test results for all studied traces and the log-normal distribution at different timescales: $\mathcal{R} < 0$, i.e., log-normal is favoured. [Figure: normalised LLR against rank of trace for the CAIDA traces at T = 5 sec, T = 1 sec, T = 100 msec and T = 5 msec.]

  17. The correlation coefficient test. Strong goodness-of-fit (GOF) is assumed to exist when the value of the correlation coefficient $\gamma$ is greater than 0.95. [Figure: $\gamma$ against rank of trace for the CAIDA traces under the log-normal and Gaussian models, at T = 5 sec, T = 1 sec, T = 100 msec and T = 5 msec.]
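One common way to compute such a coefficient is as the linear correlation of a Q-Q plot: the sorted samples are paired with the quantiles of the candidate model. The sketch below assumes that definition, which may differ in detail from the one used in the talk; for the log-normal model it works on the logarithm of the samples. The names `qq_correlation` and `pearson` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson(xs, ys):
    """Plain linear (Pearson) correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def qq_correlation(samples, lognormal=False):
    """Correlation between sorted samples and normal quantiles.

    With lognormal=True the samples are log-transformed first, so the
    result measures the fit of a log-normal model. Standard-normal
    quantiles suffice because correlation ignores location and scale.
    """
    values = sorted(math.log(x) for x in samples) if lognormal else sorted(samples)
    n = len(values)
    normal = NormalDist()
    quantiles = [normal.inv_cdf((i + 1) / (n + 1)) for i in range(n)]
    return pearson(values, quantiles)


# Synthetic log-normal "data rates": the log-normal model should pass the
# 0.95 threshold, and the Gaussian model should score lower.
random.seed(5)
rates = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
gamma_lognormal = qq_correlation(rates, lognormal=True)
gamma_gaussian = qq_correlation(rates, lognormal=False)
```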

  18. Use case 1: Bandwidth provisioning. The bandwidth provisioning approach gives the link the essential bandwidth that guarantees the required performance. Overprovisioning: in conventional methods, bandwidth is allocated by upgrading the link bandwidth to 30% of the average traffic value.

  19. Use case 1: Bandwidth provisioning. The following inequality (the 'link transparency formula') has been used for bandwidth provisioning: $P\left(\frac{A_T}{T} \ge C\right) \le \varepsilon$, i.e., the probability that the captured traffic $A_T$ over a specific aggregation timescale $T$ is larger than the link capacity $C$ has to be smaller than the value of a performance criterion $\varepsilon$. $\varepsilon$ has to be chosen carefully by the network provider in order to meet the specified SLA.
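To make the inequality concrete: if the data rate $A_T / T$ is modelled as log-normal with parameters $\mu$ and $\sigma$ (the mean and standard deviation of the log rate), the smallest capacity meeting the criterion is $C = \exp(\mu + \sigma z_{1-\varepsilon})$, where $z_{1-\varepsilon}$ is the standard normal quantile. Under a Gaussian model it is $C = m + s\, z_{1-\varepsilon}$. The sketch below is an illustration under those assumptions, with hypothetical function names and synthetic data, not the procedure used in the talk.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def lognormal_capacity(rates, epsilon):
    """Capacity C with P(rate >= C) = epsilon under a fitted log-normal."""
    logs = [math.log(r) for r in rates]
    mu, sigma = fmean(logs), pstdev(logs)
    z = NormalDist().inv_cdf(1.0 - epsilon)
    return math.exp(mu + sigma * z)


def gaussian_capacity(rates, epsilon):
    """Capacity C with P(rate >= C) = epsilon under a fitted Gaussian."""
    z = NormalDist().inv_cdf(1.0 - epsilon)
    return fmean(rates) + pstdev(rates) * z


def exceedance(rates, capacity):
    """Fraction of aggregation periods whose rate reaches the capacity."""
    return sum(r >= capacity for r in rates) / len(rates)


# Synthetic log-normal rates (Mbps) and a target of epsilon = 0.01.
random.seed(7)
rates = [random.lognormvariate(3.0, 0.5) for _ in range(20000)]
epsilon = 0.01
c_lognormal = lognormal_capacity(rates, epsilon)
c_gaussian = gaussian_capacity(rates, epsilon)
achieved_lognormal = exceedance(rates, c_lognormal)
achieved_gaussian = exceedance(rates, c_gaussian)
```

On data like this the Gaussian model suggests a lower capacity, so the link is exceeded more often than the target allows. That is the kind of gap the provisioning results compare across models.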

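  20. Use case 1: Bandwidth provisioning. Example: $\varepsilon = 0.01$. [Figure: expected link capacity $C$ satisfying $P\left(\frac{A_T}{T} \ge C\right) \le \varepsilon$ against the performance criterion $\varepsilon$ for the MAWI traces, under the Gaussian, Weibull and log-normal models.]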

  21. Bandwidth provisioning: Results. Target: ε = 0.5. [Figure: three panels (log-normal, Weibull, Gaussian), each showing results per dataset at T = 0.1 s, T = 0.5 s and T = 1 s. M: MAWI, T: Twente, C: CAIDA, W: Waikato, A: Auckland.]

  22. Use case 2: 95th percentile pricing (burstable billing). Customers are not billed for brief spikes in network traffic. [Figure: data rate (Mbps) against time (sec) and against percentile; one axis is labelled "[5 minutes]".]
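A small sketch of how a 95th percentile bill is typically derived: the rate samples (conventionally 5-minute averages) are sorted, the top 5% are discarded, and the highest remaining sample is billed. The second function shows how a fitted log-normal model could predict that value, $\exp(\mu + \sigma z_{0.95})$. Both functions are hypothetical illustrations on synthetic data; exact rounding conventions vary between providers.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def percentile_95(samples_mbps):
    """Billed rate: the sample at the 95th percentile of the sorted rates."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) - 1, written in integer arithmetic to avoid float rounding.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


def lognormal_percentile_95(samples_mbps):
    """95th percentile predicted by a log-normal fitted to the samples."""
    logs = [math.log(s) for s in samples_mbps]
    z95 = NormalDist().inv_cdf(0.95)
    return math.exp(fmean(logs) + pstdev(logs) * z95)


# Synthetic log-normal rates (Mbps): compare the billed and predicted values.
random.seed(11)
samples = [random.lognormvariate(5.0, 0.4) for _ in range(5000)]
billed = percentile_95(samples)
predicted = lognormal_percentile_95(samples)
```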

  23. 95th percentile pricing: Results. The log-normal model provides much more accurate predictions of the 95th percentile. [Figure: predicted value (Mbps) against actual value (Mbps) for the MAWI traces; the red reference line shows where perfect predictions would be located.]

  24. More details …. Thanks! Questions?

  25. Summary
     - The distribution of traffic on Internet links is an important problem that has received relatively little attention.
     - We use a well-known, state-of-the-art statistical framework to investigate the problem using a large corpus of traces.
     - We investigated the distribution of the amount of traffic observed on a link in a given (small) aggregation period, which we varied from 5 msec to 5 sec.
     - The vast majority of traces fitted the lognormal assumption best, and this remained true across all timescales tried.
     - We investigated the impact of the distribution on two sample traffic engineering problems. Firstly, we looked at predicting the proportion of time a link will exceed a given capacity. Secondly, we looked at predicting the 95th percentile transit bill that an ISP might be given.
     - For both of these problems the log-normal distribution gave a more accurate result than a heavy-tailed distribution or a Gaussian distribution.

  26. Backup

  27. Power-law test. Power-law distribution: $p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$. The test proceeds in five steps:
     - Step 1: Estimate ($\alpha$, $x_{\min}$, $n_{\text{tail}}$) using MLE and the KS test.
     - Step 2: Estimate the uncertainty in the fitted parameters (bootstrapping).
     - Step 3: Compute the goodness-of-fit p-value for H0: the power law is favoured. Depending on whether p is above or below 0.1, H0 is either rejected or not rejected.
     - Step 4: Compute the log-likelihood ratio $\mathcal{R}$ against each alternative ($\mathcal{R} > 0$ or $\mathcal{R} < 0$).
     - Step 5: Compute the p-value for $\mathcal{R}$. If p > 0.1, none is favoured. If p < 0.1, the alternative is favoured when $\mathcal{R} < 0$ and the power law is favoured when $\mathcal{R} > 0$.

     [Ref] A. Clauset, C. R. Shalizi, and M. Newman, "Power-law distributions in empirical data," arXiv:0706.1062v2, 2009.
