structural analysis of network traffic flows



  1. Structural Analysis of Network Traffic Flows
     Eric Kolaczyk, Anukool Lakhina, Dina Papagiannaki, Mark Crovella, Christophe Diot, and Nina Taft

  2. Traditional Network Traffic Analysis vs. What ISPs Care About
     Traditional network traffic analysis
     • Focus on
       – Short ‘stationary’ timescales
       – Traffic on a single link in isolation
     • Principal results
       – Scaling properties
       – Packet delays and losses
     What ISPs care about
     • Focus on
       – Long, nonstationary timescales
       – Traffic on all links simultaneously
     • Principal goals
       – Capacity planning
       – Traffic engineering
       – Anomaly detection

  3. Need for Whole-Network Traffic Analysis
     • Traffic engineering: How does traffic move throughout the network?
     • Anomaly detection: Which links show unusual traffic?
     • Capacity planning: How much to upgrade, and where in the network?

  4. This is Complicated!
     • Measuring and modeling traffic on all links simultaneously is challenging.
       – Even single-link modeling is difficult
       – 100s of links in large IP networks
       – High-dimensional timeseries
     • Significant correlation in link traffic
     • Is there a more fundamental representation?

  5. Origin-Destination Flows
     [Figure: total traffic on the link, plotted as traffic against time]
     • Link traffic arises from the superposition of Origin-Destination (OD) flows
     • Modeling OD flows instead of link traffic removes a significant source of correlation
     • A fundamental primitive for whole-network analysis
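
The superposition described on this slide can be sketched in a few lines. The three-node topology, the route table, and the flow volumes below are invented for illustration; none of them come from the talk's datasets.

```python
# Hypothetical toy network: each OD pair maps to the links its traffic crosses.
routes = {
    ("A", "B"): ["A-B"],
    ("A", "C"): ["A-B", "B-C"],
    ("B", "C"): ["B-C"],
}
# Made-up OD flow volumes for a single time bin.
od_traffic = {("A", "B"): 10, ("A", "C"): 5, ("B", "C"): 7}

def link_loads(routes, od_traffic):
    """Sum, for every link, the traffic of all OD flows routed over it."""
    loads = {}
    for od_pair, links in routes.items():
        for link in links:
            loads[link] = loads.get(link, 0) + od_traffic[od_pair]
    return loads

loads = link_loads(routes, od_traffic)
print(loads)
```

Flow A to C shows up on both links, which is one way the same OD flow can induce correlation between link timeseries.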

  6. But, This Is Still Complicated
     • Even more OD flows than links
     • Still a high-dimensional, multivariate timeseries
     • How do we extract meaning from this high-dimensional structure in a systematic manner?

  7. High Dimensionality: A General Strategy
     • Look for good low-dimensional representations
     • Often a high-dimensional structure can be explained by a small number of independent variables
     • A commonly used technique: Principal Component Analysis (PCA) (aka KL-Transform, SVD, …)

  8. Our work
     • Measure complete sets of OD flow timeseries from two backbone networks
     • Use PCA to understand their structure
       – Decompose OD flows into simpler features
       – Characterize individual features
       – Reconstruct OD flows as a sum of features
     • Call this structural analysis

  9. Datasets
     • Abilene: 11 PoPs, 121 OD flows.
     • Sprint-Europe: 13 PoPs, 169 OD flows.
     • Collect sampled traffic from every ingress link using NetFlow
     • Use BGP tables to resolve egress points
     • Week-long datasets, 5- or 10-minute timesteps

  10. Example OD Flows
      [Figure: twelve panels of week-long OD flow traffic timeseries, x-axis Mon to Sun or timestep index. Panels: Ab. OD Flows 29, 27, and 59; OD Flows 167, 96, 18, 124, 111, 157, 42, 131, and 84.]
      Some have visible structure, some less so …

  11. Specific Questions of Structural Analysis
      • Are there low-dimensional representations for a set of OD flows?
      • Do OD flows share common features?
      • What do the features look like?
      • Can we get a high-level understanding of a set of OD flows in terms of these features?

  12. Principal Component Analysis
      • Coordinate transformation method
      [Figure: original data in coordinates (x1, x2) and transformed data in coordinates (u1, u2)]

  13. Properties of Principal Components
      • Each PC points in the direction of maximum (remaining) energy in the set of OD flows
      • PCs are ordered by the amount of energy they capture
      • Eigenflow: the set of OD flows mapped onto a PC; a common trend
      • Eigenflows are ordered from most common to least common trend

  14. PCA on OD flows
      $X = U \Sigma V^{T}$
      • X: OD flow matrix (time × # OD pairs)
      • U: Eigenflow matrix (time × # OD pairs)
      • V: Principal component matrix (# OD pairs × # OD pairs)
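
A minimal sketch of this decomposition using NumPy's SVD routine. The data is random and stands in for a real OD flow matrix, the sizes are toy values, and centering each column first is an assumed preprocessing step that the slide does not state.

```python
import numpy as np

rng = np.random.default_rng(0)
t, p = 200, 12                      # time bins and OD pairs (toy sizes)
X = rng.normal(size=(t, p))         # synthetic stand-in for OD flow data
X = X - X.mean(axis=0)              # center each OD flow (assumed preprocessing)

# Thin SVD: U is t x p (eigenflows), s holds the singular values,
# and Vt is V transposed (rows of Vt are the principal components).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

X_rebuilt = U @ np.diag(s) @ Vt     # X = U Sigma V^T
print(np.allclose(X, X_rebuilt))
```

`np.linalg.svd` returns the singular values sorted from largest to smallest, which matches the ordering of principal components by captured energy.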

  15. PCA on OD flows (2)
      • Each eigenflow is a weighted sum of all OD flows
      • Eigenflows are orthonormal
      • Singular values indicate the energy attributable to a principal component
      • Each OD flow is a weighted sum of all eigenflows
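
One way to write these statements as equations, derived here from $X = U \Sigma V^{T}$ rather than copied from the slide. The column notation is introduced for this sketch: $u_i$ and $v_i$ are the $i$-th columns of $U$ and $V$, $\sigma_i$ is the $i$-th singular value, and $x_j$ is the $j$-th column of $X$ (the timeseries of OD flow $j$).

```latex
\begin{align}
  u_i &= \frac{X v_i}{\sigma_i}
      && \text{(eigenflow $i$: weighted sum of all OD flows)} \\
  u_i^{T} u_j &= \delta_{ij}
      && \text{(eigenflows are orthonormal)} \\
  x_j &= \sum_{i} \sigma_i \, V_{ji} \, u_i
      && \text{(OD flow $j$: weighted sum of all eigenflows)}
\end{align}
```

The first line follows from multiplying $X = U \Sigma V^{T}$ on the right by $V$; the third is column $j$ of the same product.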

  16. An Example Eigenflow and PC
      [Figure: Eigenflow 6 over one week (Mon to Sun), shown with traffic in OD Flow 167 and traffic in OD Flow 96; PC 6 plotted against OD flow index, with OD Flow 167 and OD Flow 94 marked.]

  17. Outline For Rest of Talk
      • Find intrinsic dimensionality of OD flows
      • Structural analysis
        – Decompose OD flows
        – Characterize eigenflows
        – Reconstruct OD flows
      • Potential applications

  18. Low Intrinsic Dimensionality of OD Flows
      [Figure: magnitude of the singular values, in order, for Sprint-1 and Abilene]
      Plot of (square root of) energy captured by each dimension.
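
A sketch of how the energy captured per dimension can be read off the singular values. The data is synthetic (three shared trends plus small noise), and the 99 % cutoff is an arbitrary choice for this example, not a threshold from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
t, p, k = 300, 40, 3
# Synthetic data: k shared trends mixed into p flows, plus small noise.
trends = rng.normal(size=(t, k))
mixing = rng.normal(size=(k, p))
X = trends @ mixing + 0.01 * rng.normal(size=(t, p))
X = X - X.mean(axis=0)

s = np.linalg.svd(X, compute_uv=False)      # singular values, largest first
energy = s**2 / np.sum(s**2)                # fraction of energy per dimension
cumulative = np.cumsum(energy)
# Smallest number of dimensions whose cumulative energy reaches 99 %.
n_dims = int(np.searchsorted(cumulative, 0.99) + 1)
print(n_dims, cumulative[:5])
```

Plotting `s` (the square root of the unnormalized energy) against its index would give a curve of the kind this slide shows.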

  19. Approximating With Top 5 Eigenflows
      [Figure: traffic in OD Flow 88 over one week (Mon to Sun), original vs. 5-PC approximation]

  20. Approximating With Top 5 Eigenflows
      [Figure: traffic in OD Flow 79 over one week (Mon to Sun), original vs. 5-PC approximation]

  21. Approximating With Top 5 Eigenflows
      [Figure: traffic in OD Flow 96 over one week (Mon to Sun), original vs. 5-PC approximation]
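
The approximation in the last three slides amounts to keeping only the leading terms of the SVD. Below is a minimal sketch on synthetic data; the function name `approximate`, the diurnal sine trend, and the per-flow weights are all made up for illustration.

```python
import numpy as np

def approximate(X, k):
    """Rebuild X from its top-k eigenflows only (rank-k truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
hours = np.arange(7 * 24)                          # one week of hourly bins
daily = np.sin(2 * np.pi * hours / 24)             # synthetic diurnal trend
weights = rng.uniform(1.0, 5.0, size=20)           # made-up per-flow scale
X = np.outer(daily, weights) + 0.1 * rng.normal(size=(hours.size, 20))

X5 = approximate(X, 5)
rel_err = np.linalg.norm(X - X5) / np.linalg.norm(X)
print(rel_err)
```

Column `j` of `X5` plays the role of the "5 PC" curve for OD flow `j`, and column `j` of `X` the "Original" curve.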

  22. Outline
      • Find intrinsic dimensionality of OD flows
      • Structural analysis
        – Decompose OD flows
        – Characterize eigenflows
        – Reconstruct OD flows
      • Potential applications

  23. Structure of OD Flows
      [Figure: CDF, Pr[X<x], of the number of eigenflows in an OD flow, for Sprint-1 and Abilene]
      • Most OD flows have fewer than 20 significant eigenflows
      • Can think of each OD flow as having only a small set of “features”

  24. Kinds of Eigenflows
      [Figure: three example eigenflows over one week (Mon to Sun): Eigenflow 2, Eigenflow 20, and Eigenflow 29]
      • Deterministic (d-eigenflows): predictable (periodic) trends
      • Spike (s-eigenflows): sudden, isolated spikes and drops
      • Noise (n-eigenflows): roughly stationary and Gaussian

  25. D-eigenflows Have Periodicity
      [Figure: Eigenflow 1 over one week (Mon to Sun); power spectrum (FFT energy against hours, 0 to 48) for Sprint-1 and Abilene]

  26. S-eigenflows Have Spikes
      [Figure: Sprint-1 Eigenflow 8 and Abilene Eigenflow 10 over one week (Mon to Sun), with a 5-sigma threshold marked]

  27. N-eigenflows Are Gaussian
      [Figure: Eigenflow 39 over one week (Mon to Sun); qq-plot of quantiles of input sample against standard normal quantiles, for Sprint-1 and Abilene]
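
The three checks on the preceding slides (periodicity in the power spectrum, a 5-sigma spike threshold, Gaussian behaviour) suggest a simple labeling heuristic. The sketch below is one possible reading, not the authors' procedure: the function name, the order of the tests, the `peak_ratio` cutoff, and the choice to treat "noise" as the fall-through case instead of running a normality test are all assumptions.

```python
import numpy as np

def classify_eigenflow(u, bins_per_hour=1.0, peak_ratio=0.5, n_sigma=5.0):
    """Return 'd', 's', or 'n' for one eigenflow using two heuristic checks."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    # Periodicity: share of spectral power at the 12-hour and 24-hour periods.
    power = np.abs(np.fft.rfft(u)) ** 2
    freqs = np.fft.rfftfreq(u.size, d=1.0 / bins_per_hour)   # cycles per hour
    peak_bins = {int(np.argmin(np.abs(freqs - 1.0 / period)))
                 for period in (12.0, 24.0)}
    share = sum(power[b] for b in peak_bins) / power[1:].sum()
    if share > peak_ratio:
        return "d"
    # Spikes: any sample further than n_sigma standard deviations from the mean.
    if np.max(np.abs(u)) > n_sigma * u.std():
        return "s"
    return "n"                                               # assumed fall-through

rng = np.random.default_rng(3)
hours = np.arange(7 * 24)                                    # one week, hourly bins
periodic = np.sin(2 * np.pi * hours / 24) + 0.1 * rng.normal(size=hours.size)
spiky = rng.normal(size=hours.size)
spiky[50] += 30.0                                            # one isolated spike
noisy = rng.normal(size=hours.size)

labels = [classify_eigenflow(x) for x in (periodic, spiky, noisy)]
print(labels)
```

For 5- or 10-minute timesteps, `bins_per_hour` would be 12 or 6 so that the 12-hour and 24-hour periods map to the right frequency bins.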

  28. Hundreds of Eigenflows, But Only Three Basic Types

  29. An OD Flow, Reconstructed
      [Figure: four stacked panels over one week (Mon to Sun): original OD flow; D-components (d-eigenflows); S-components (s-eigenflows); N-components (n-eigenflows)]

  30. Another OD Flow, Reconstructed
      [Figure: four stacked panels over one week (Mon to Sun): original OD flow; D-components (d-eigenflows); S-components (s-eigenflows); N-components (n-eigenflows)]

  31. Which Eigenflows Are Most Significant?
      [Figure: type (d, s, or n) of each eigenflow, in order, for Sprint and for Abilene]
      • 1-6: d-eigenflows appear to be most significant in both networks.
      • 5-10: s-eigenflows are next most important.
      • 12 and beyond: n-eigenflows account for the rest.

  32. Contribution of Eigenflow Types
      Fraction of total OD flow energy captured by each type of eigenflow
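
Given singular values and a type label per eigenflow, the fraction on this slide can be computed as the share of squared singular values belonging to each type. The helper name `energy_by_type`, the singular values, and the labels below are placeholders; the talk's actual numbers are not reproduced here.

```python
import numpy as np

def energy_by_type(singular_values, labels):
    """Fraction of total energy (sum of squared singular values) per type."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    labels = np.asarray(labels)
    return {t: float(energy[labels == t].sum() / energy.sum())
            for t in ("d", "s", "n")}

# Made-up singular values and labels, for illustration only.
fractions = energy_by_type([4.0, 2.0, 1.0, 1.0], ["d", "s", "n", "n"])
print(fractions)
```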

  33. Contribution to Each OD Flow (Sprint)
      [Figure: fraction of total energy from deterministic, spike, and noise eigenflows for each Sprint OD flow, ordered large to small]
      • Largest OD flows: strong deterministic component.
      • Smallest OD flows: primarily dominated by spikes.
      • Regardless of size, n-eigenflows account for a fairly constant portion.
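
A per-flow breakdown of this kind can be sketched from the SVD: because eigenflows are orthonormal, eigenflow i contributes (σ_i · V[j, i])² to the energy of OD flow j. The function name, the random data, and the labels below are hypothetical, and this is an assumed way to compute such shares rather than the method confirmed by the talk.

```python
import numpy as np

def per_flow_shares(s, V, labels):
    """For each OD flow, the share of its energy in d-, s-, and n-eigenflows."""
    s = np.asarray(s, dtype=float)
    V = np.asarray(V, dtype=float)
    labels = np.asarray(labels)
    contrib = (V * s) ** 2                    # contrib[j, i] = (sigma_i * V[j, i])**2
    totals = contrib.sum(axis=1)
    return {t: contrib[:, labels == t].sum(axis=1) / totals
            for t in ("d", "s", "n")}

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))                 # synthetic stand-in for OD flows
U, s, Vt = np.linalg.svd(X, full_matrices=False)
labels = ["d", "d", "s", "s", "n", "n"]       # made-up labels, for illustration only
shares = per_flow_shares(s, Vt.T, labels)
print(shares["d"])
```

Sorting the OD flows by total energy before plotting the three shares would give the large-to-small ordering used on this slide.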
