  1. SVM Learning of IP Address Structure for Latency Prediction
     Rob Beverly, Karen Sollins and Arthur Berger
     {rbeverly,sollins,awberger}@csail.mit.edu
     SIGCOMM MineNet06

  2. SVM Learning of IP Address Structure for Latency Prediction
     • The case for Latency Prediction
     • The case for Machine Learning
     • Data and Methodology
     • Results
     • Going Forward

  3. The Case for Latency Prediction

  4. Latency Prediction (again?)
     • Significant prior work:
       – King [Gummadi 2002]
       – Vivaldi [Dabek 2004]
       – Meridian [Wong 2005]
       – Others: IDMaps, GNP, etc.
     • Prior methods:
       – Active queries
       – Synthetic coordinate systems
       – Landmarks
     • Our work seeks to provide an agent-centric (single-node) alternative

  5. Why Predict Latency?
     1. Service Selection: balance load, optimize performance, P2P replication
     2. User-directed Routing: e.g. IPv6 with per-provider logical interfaces
     3. Resource Scheduling: Grid computing, etc.
     4. Network Inference: measure additional topological network properties

  6. An Agent-Centric Approach
     • Hypothesis: two hosts within the same subnetwork likely have consistent congestion and latency
     • Registry allocation policies give network structure – but fragmented and discontinuous
     • Formulate as a supervised learning problem
     • Given latencies to a set of (random) destinations as training:
       – predict_latency(unseen IP)
       – error = |predict(IP) – actual(IP)|

  7. The Case for Machine Learning

  8. Why Machine Learning?
     • Internet-scale networks are:
       – Complex (high-dimensional)
       – Dynamic
     • Can accommodate and recover from infrequent errors in a probabilistic world
     • Traffic provides a large and continuous training base

  9. Candidate Tool: Support Vector Machine
     • Supervised learning (but amenable to online learning)
     • Separates the training set into two classes in the most general way
     • Main insight: find the hyper-plane separator that maximizes the minimum margin between the convex hulls of the classes
     • Second insight: if the data is not linearly separable, take it to a higher dimension
     • Result: generalizes well, fast, accommodates unknown data structure
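The maximum-margin idea above can be sketched with scikit-learn's SVC (an illustrative stand-in; the talk does not name a particular SVM implementation, and the toy points below are hypothetical):

```python
# Sketch of the maximum-margin separator on a toy 2-D, two-class problem.
from sklearn.svm import SVC

# Two linearly separable classes in 2 dimensions (made-up example points).
X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1e6)  # a large C approximates a hard margin
clf.fit(X, y)

# Only the support vectors (the points touching the margin) define the
# separator; interior points could be removed without changing the solution.
print(clf.support_vectors_)
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))
```

Points far from the margin get zero weight in the solution, which is the property the next two slides illustrate graphically.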

  10. SVMs – Maximize Minimum Margin
      [Figure: simplest case, 2 classes in 2 dimensions, linearly separable; legend: positive examples, negative examples, support vector]

  11. SVMs – Redefining the Margin
      [Figure: a single new positive example redefines the margin; legend: positive examples, negative examples]

  12. Non-SV Points Don’t Affect the Solution
      [Figure: legend: positive examples, negative examples]

  13. IP Latency Non-Linearity
      [Figure: axes IP Bit 0 vs. IP Bit 1; legend: 200ms examples, 10ms examples]

  14. Higher Dimensions for Non-Linearity
      [Figure: 2 classes in 2 dimensions, NOT linearly separable; legend: 200ms examples, 10ms examples]

  15. Kernel Function Φ
      [Figure: 2 classes in 3 dimensions, linearly separable; legend: 200ms examples, 10ms examples]
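The kernel trick on the slide above can be demonstrated on XOR-style data, the classic case of two classes that no line separates in 2 dimensions. This sketch uses scikit-learn with an RBF kernel purely for illustration; the talk does not specify which kernel Φ corresponds to:

```python
# XOR-style data: not linearly separable in 2 dimensions, but separable
# once the kernel implicitly lifts the points to a higher dimension.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 0, 1, 1]

linear = SVC(kernel="linear").fit(X, y)   # no line can split these classes
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # implicit higher-dim mapping

print(linear.score(X, y))  # less than 1.0: linear separator cannot fit XOR
print(rbf.score(X, y))     # 1.0: the kernel's lifting makes it separable
```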

  16. Support Vector Regression
      • Same idea as classification
      • ε-insensitive loss function
      [Figure: latency vs. first octet, mapped by Φ; ε-tube around the regression fit]
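The ε-insensitive loss can be sketched with scikit-learn's SVR: deviations smaller than epsilon cost nothing, so points inside the ε-tube do not become support vectors. The data below is a made-up noiseless trend, not the paper's measurements:

```python
# Support Vector Regression with an epsilon-insensitive loss.
from sklearn.svm import SVR

X = [[i] for i in range(10)]
y = [2.0 * i for i in range(10)]  # noiseless linear trend, for illustration

# Errors smaller than epsilon (here 0.5) are not penalized at all.
model = SVR(kernel="linear", C=100.0, epsilon=0.5)
model.fit(X, y)

pred = model.predict([[4.5]])[0]
print(pred)  # close to the true value 9.0, within the epsilon tube
```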

  17. Data and Methodology

  18. Data Set
      • 30,000 random hosts responding to ping
      • Average latency to each over 5 pings
      • Non-trivial distribution for learning

  19. Methodology
      [Figure: the (IP, latency) data set is permuted, then split into train and test portions]
      • Average of 5 experiments:
        – Randomly permute the data set
        – Split the data set into training / test points

  20. Methodology
      [Figure: permuted (IP, latency) data set; the training portion builds the SVM, the test portion measures performance]
      • Average of 5 experiments:
        – Training data defines the SVM
        – Performance measured on (unseen) test points
        – Each bit of the IP is an input feature
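The two methodology steps above can be sketched in plain Python: turn each IPv4 address into 32 binary input features, then permute and split the (IP, latency) pairs. Function names and the train fraction are illustrative choices, not from the talk:

```python
import random

def ip_to_bits(ip):
    """Each bit of the 32-bit IPv4 address becomes one input feature."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    value = (a << 24) | (b << 16) | (c << 8) | d
    # Most significant bit first, matching the MSB discussion later.
    return [(value >> (31 - i)) & 1 for i in range(32)]

def permute_and_split(dataset, train_fraction=0.8, seed=0):
    """Randomly permute the (IP, latency) pairs, then split train/test."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

print(ip_to_bits("18.0.0.1")[:8])  # first octet 18 = 00010010
```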

  21. Results

  22. Results
      • Spoiler: so, does it work?
      • Yes: within 30% for more than 75% of predictions
      • Performance varies with the selection of parameters (a multi-optimization problem):
        – Training size
        – Input dimension
        – Kernel
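The headline metric ("within 30% for more than 75% of predictions") is a fraction of predictions whose relative error falls under a tolerance; a minimal sketch, with made-up latencies for illustration:

```python
def fraction_within(predictions, actuals, tolerance):
    """Fraction of predictions whose relative error |pred - actual| / actual
    is within the given tolerance (e.g. 0.30 for 'within 30%')."""
    hits = sum(
        1 for p, a in zip(predictions, actuals)
        if abs(p - a) <= tolerance * a
    )
    return hits / len(actuals)

# Hypothetical latencies in milliseconds, purely for illustration.
actual = [100.0, 50.0, 200.0, 80.0]
predicted = [110.0, 55.0, 190.0, 160.0]
print(fraction_within(predicted, actual, 0.30))  # 3 of 4 within 30% -> 0.75
```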

  23. Training Size

  24. Question: Are MSBs Better Predictors?
      • Determine error versus the number of most significant bits of the test input IPs

  25. Question: Are MSBs Better Predictors?
      • Determine error versus the number of most significant bits of the test input IPs
      • Powerful result: the majority of the discriminatory power is in the first 8 bits

  26. Feature Selection
      • Use feature selection to determine which individual bits of the address contribute to the discriminatory power of the prediction:

        θ_i ← argmin_j V(f({θ_1, …, θ_{i−1}, x_j}), y)   ∀ x_j ∉ {θ_1, …, θ_{i−1}}
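The selection rule above is greedy forward selection: at step i, add the not-yet-chosen feature that minimizes the validation loss V of a model built on the features selected so far. A minimal sketch with a toy loss function (the loss and feature indices are invented for illustration):

```python
def greedy_forward_selection(candidates, loss, num_features):
    """At each step, add the candidate feature that minimizes the
    validation loss of the currently selected feature set plus it."""
    selected = []
    for _ in range(num_features):
        remaining = [f for f in candidates if f not in selected]
        best = min(remaining, key=lambda f: loss(selected + [f]))
        selected.append(best)
    return selected

# Toy loss: pretend low-numbered bits (the most significant ones)
# carry most of the discriminatory power -- purely illustrative.
def toy_loss(feats):
    return sum(feats) + 10.0 / (1 + len(feats))

order = greedy_forward_selection(list(range(8)), toy_loss, 3)
print(order)  # picks the "most significant" bits first: [0, 1, 2]
```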

  27. Feature Selection
      [Figure: note the x-axis is cumulative bit “features”]

  28. Feature Selection Matches Intuition and Registry Allocations

  29. Performance
      • Given the empirically optimal training size and input features
      • How well can agents predict latency to unknown destinations?

  30. Prediction Performance
      [Figure: predicted vs. actual latency, with the “Ideal” line shown]

  31. Prediction Performance
      Within 30% for >75% of predictions

  33. Going Forward

  34. Future Research
      • How agents select training data (random, BGP prefix, registry allocation, from TCP flows, etc.)
      • How performance decays over time, and how often to retrain
      • Online, continuous learning

  35. Summary – Questions?
      • Major results:
        – An agent-centric approach to latency prediction
        – Validation of SVMs and kernel functions as a means to learn on the basis of Internet addresses
        – Feature-selection analysis of the informational content of IP addresses in predicting latency
        – Latency estimation accuracy within 30% of the true value for >75% of data points

  36. Prediction Accuracy
      Only ~25% of predictions are accurate to within 5% of the actual value

  37. Prediction Accuracy
      • But >85% of predictions are accurate to within 50% of the true value
      • About 45% of predictions are accurate to within 10% of the true value

  38. Prediction Accuracy
      Bottom line: suitable for coarse-grained latency prediction
