Optimization of Decision Trees for TCP Performance RCA — Marco Weiss



SLIDE 1

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

Optimization of Decision Trees for TCP Performance RCA

Marco Weiss

advised by Simon Bauer, Benedikt Jaeger
Friday 19th July, 2019

SLIDE 2

Introduction

Why TCP Root Cause Analysis (RCA)?

  • TCP: Widely used transport-layer protocol
  • Performance issues directly affect user experience
  • RCA to identify and overcome performance limitations

Figure 1: Affected user (symbol picture)1

Why Decision Trees (DT)?

  • Divide-and-conquer strategy for complex problems
  • Intuitive and interpretable
  • Variety of extensions to increase prediction performance

Figure 2: Simple categorical decision tree ("Does it have fur?" yes → "Does it bark?" → Dog/Cat; no → "Does it fly?" → Bird/Fish)2

1 taken from shutterstock.com   2 taken from IN2064 lecture slides

M. Weiss — Decision Trees for TCP RCA

SLIDE 3

Approach

  1. Handcrafted decision trees (baseline)
  2. Genetic algorithm for decision tree optimization
  3. Decision tree learning
  4. Ensemble methods

SLIDE 4

Selected Related Work

TCP Root Cause Analysis

  • Siekkinen et al.: RCA based on DT and limitation scores [11]

Machine learning approaches to TCP RCA

  • Hagos et al.: Random forest, gradient boosting, and recurrent neural networks to predict congestion window (CW) size and round-trip time (RTT) [6, 7]
  • El Khayat et al.: Decision tree boosting to find the root cause of packet loss in wireless networks [4]

Genetic Algorithms (GA) for DT optimization

  • Bala et al.: GA for data preprocessing [1]
  • Papagelis et al., Cha et al.: GA as optimization method for DT learning [9, 3]

SLIDE 5

TCP Performance Limitations RCA

Root causes

  • Application limited
  • Capacity bottleneck: Unshared (ub) / shared (sb)
  • Receiver window (rw)
  • Congestion avoidance (cw)

Limitation scores

  • Dispersion score sdisp
  • Retransmission score sretr
  • RTT score sRTT
  • Receiver window score srwnd
  • Burstiness score sb

SLIDE 6

TCP Performance Limitations Dataset

Generation [12]

  • Using the network emulator Mininet
  • Different test setups and network topologies to enforce different throughput limitations
  • Measurements during 533 different bulk transfer periods (BTP)

Structure

  • 1 measurement-label pair {x, y} per BTP
  • Measurement vector x = (sdisp, sretr, sRTT, srwnd, sb) ∈ R^5
  • Label (ground truth) y ∈ {cw, rw, sb, ub}

Figure 3: Dataset visualization using t-SNE embedding
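A t-SNE embedding like the one in Figure 3 can be reproduced in outline with scikit-learn. The score vectors below are random stand-ins for the actual 533-BTP dataset, and the perplexity value is an illustrative choice, not the one used for the figure:

```python
# Sketch: embed the 5-dimensional limitation-score vectors in 2D with t-SNE.
# X is a synthetic stand-in for the real measurement vectors.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.random((533, 5))  # 533 BTPs x 5 limitation scores (synthetic)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (533, 2): one 2D point per BTP, ready for a scatter plot
```

Coloring the resulting 2D points by their labels y ∈ {cw, rw, sb, ub} then shows how well the four root causes separate in score space.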


SLIDE 7

Handcrafted Decision Trees

Approach

  • Topology based on the underlying mechanisms of TCP
  • Select threshold values by inspection of the measurements
  • Note: classification accuracy is measured on seen data

  Method          Accuracy [-]
  Baseline        0.73
  Optimized GA
  DT learning
  Random Forest
  Extra-Trees

Figure 4: Baseline decision tree based on [12]
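A handcrafted tree of this kind amounts to nested threshold comparisons on the limitation scores. The branching order and the TH_* threshold values below are hypothetical placeholders for illustration, not the values from the baseline tree in Figure 4:

```python
# Sketch of a handcrafted decision tree over the limitation scores.
# Branching order and thresholds are hypothetical, not the thesis's values.
TH_RWND, TH_RETR, TH_DISP = 0.5, 0.5, 0.5  # hypothetical thresholds

def classify(s_disp, s_retr, s_rtt, s_rwnd, s_b):
    """Return one of the root causes: cw, rw, sb, ub."""
    if s_rwnd > TH_RWND:
        return "rw"   # receiver-window limited
    if s_retr > TH_RETR:
        return "cw"   # congestion avoidance (loss-driven)
    # shared vs. unshared capacity bottleneck
    return "sb" if s_disp > TH_DISP else "ub"

print(classify(0.1, 0.8, 0.3, 0.2, 0.0))  # -> cw
```

"Selecting thresholds by inspection" then means tuning the TH_* constants by hand until the labels on the seen data are reproduced well.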


SLIDE 8

Genetic Algorithm for Decision Tree Optimization

Problem

  • Optimize threshold parameters of baseline tree
  • "Off-the-shelf" DT learning algorithms optimize threshold parameters and tree topology
  • DT optimization is NP-complete [8]

Optimization with GA

  • No domain-specific knowledge required
  • Non-restrictive on type of cost function
  • Analogy: Survival (and reproduction) of the fittest
    • Fitness → classification performance
    • Individual → set of threshold values
    • Survival → keep good individuals, discard bad individuals
    • Reproduction → genetic operators

SLIDE 9

Implementation

Problem formulation

  • Candidate solution: c = {th1, th2, th3, th4}, with thj ∈ Tj for all j
  • Discrete threshold set: Tj = {xj : x ∈ D}
  • Population: ci ∈ C, i = 1, ..., npop
  • Genetic operators: Crossover(c, c′), Mutate(c), Elitism(C)
  • Hyperparameters: pcross, pmut, nelit, npop

Figure 5: Schematic optimization with GA: Initial population
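The GA loop sketched in Figures 5-10 (initial population, crossover, mutation, fitness evaluation, elitism, next generation) can be written down compactly. The fitness function and all hyperparameter values below are illustrative stand-ins; the real fitness is the classification accuracy of the baseline tree with thresholds c:

```python
# GA sketch for optimizing 4 thresholds drawn from discrete sets T_j.
# Fitness, threshold sets, and hyperparameters are illustrative stand-ins.
import random

random.seed(0)
T = [sorted(random.random() for _ in range(20)) for _ in range(4)]  # sets T_j
TARGET = [t[10] for t in T]          # synthetic optimum inside the grid

def fitness(c):
    # Stand-in for the tree's classification accuracy with thresholds c.
    return -sum(abs(g, ) if False else abs(g - t) for g, t in zip(c, TARGET))

def crossover(a, b):                 # uniform crossover of two candidates
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(c, p_mut=0.2):            # resample a threshold from its set T_j
    return [random.choice(T[j]) if random.random() < p_mut else g
            for j, g in enumerate(c)]

def ga(n_pop=40, n_gen=60, n_elit=2):
    pop = [[random.choice(t) for t in T] for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:n_elit]                             # elitism: keep the best
        while len(nxt) < n_pop:
            a, b = random.sample(pop[:n_pop // 2], 2)  # fitter half reproduces
            nxt.append(mutate(crossover(a, b)))
        pop = nxt
    return max(pop, key=fitness)

best = ga()
print(round(fitness(best), 3))       # near 0 = near the synthetic optimum
```

Because candidates are tuples of thresholds rather than trees, no domain-specific crossover or mutation is needed, which is exactly what makes the GA attractive here.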


SLIDE 10

Implementation

Figure 6: Schematic optimization with GA: Crossover


SLIDE 11

Implementation

Figure 7: Schematic optimization with GA: Mutation


SLIDE 12

Implementation

Figure 8: Schematic optimization with GA: Fitness evaluation


SLIDE 13

Implementation

Figure 9: Schematic optimization with GA: Elitism


SLIDE 14

Implementation

Figure 10: Schematic optimization with GA: Next generation


SLIDE 15

Results

Convergence

  • Select 3 random subsets of the training data, each 10% of its size
  • Brute-force threshold optimization on the subsets gives an upper bound for the accuracy
  • Use a down-scaled GA population size to account for the smaller subsets

  Method          Accuracy [-]
  Baseline        0.73
  Optimized GA    0.79
  DT learning
  Random Forest
  Extra-Trees

Figure 11: Best-in-population accuracy of GA-optimized DTs for 3 different subsets of training data (avg. over 10 runs per subset)
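The brute-force upper bound amounts to exhaustively scoring every threshold combination on a subset. A minimal sketch, with a small grid and a synthetic accuracy function standing in for evaluation on the 10% subsets:

```python
# Exhaustive threshold search over the Cartesian product of small
# discrete threshold sets; grid and scoring are illustrative stand-ins.
from itertools import product

T = [[0.2, 0.4, 0.6, 0.8]] * 4       # 4 small threshold sets -> 4^4 = 256 combos
TARGET = (0.4, 0.6, 0.2, 0.8)        # synthetic optimum inside the grid

def accuracy(c):                     # stand-in for subset classification accuracy
    return -sum(abs(g - t) for g, t in zip(c, TARGET))

best = max(product(*T), key=accuracy)
print(best)                          # -> (0.4, 0.6, 0.2, 0.8)
```

This is only feasible on small subsets because the number of combinations grows as the product of the set sizes, which is why the full dataset needs the GA.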


SLIDE 16

Decision Tree Learning

Problem

  • Find a DT that performs well on unseen data
  • More precisely: Find best split dimension and threshold for every node

DT learning algorithm

  • Greedy heuristic: Maximize purity of new child nodes
  • Use classification and regression tree (CART) algorithm as implemented in scikit-learn [10]
  • Procedure: hyperparameter search, then training on the complete training data

Figure 12: Schematic node distributions for the decision tree learning algorithm (using the misclassification rate)3: splitting node t into tL and tR reduces the impurity from i(t) = 1 − 0.5 = 0.5 to i(tL) = 0 and i(tR) = 0.33

3 taken from IN2064 lecture slides
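The two-step procedure (hyperparameter search, then training on the complete training data) might look as follows with scikit-learn's CART implementation; the data and the searched depth range are stand-ins for the actual setup:

```python
# Sketch: CART learning with scikit-learn on synthetic score vectors.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((533, 5))                             # limitation-score vectors
y = rng.choice(["cw", "rw", "sb", "ub"], size=533)   # root-cause labels

# Step 1: hyperparameter search (here only max_depth, via cross-validation)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {"max_depth": list(range(2, 16))}, cv=5)
search.fit(X, y)

# Step 2: train on the complete training data with the chosen depth
clf = DecisionTreeClassifier(max_depth=search.best_params_["max_depth"],
                             random_state=0).fit(X, y)
print(clf.score(X, y))                               # training accuracy
```

At each node, CART greedily picks the split dimension and threshold that maximize the purity of the two children, exactly the criterion illustrated in Figure 12.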


SLIDE 17

Results

Observations

  • ACCtrain = 1.0 for depth > 15
  • No overfitting! Possible explanations:
    • Low-dimensional data
    • No (significant) noise in the synthetic data
    • Needs further investigation...
  • Slightly better performance than handcrafted DTs of equal depth (ACCval,d=5 = 0.82)

  Method          Accuracy [-]
  Baseline        0.73
  Optimized GA    0.79
  DT learning     0.92
  Random Forest
  Extra-Trees

Figure 13: Train-test curve for DT learning hyperparameter estimation


SLIDE 18

Results

Ensemble methods

  • Marginally better performance than single DT
  • Pro: Robust and easy to train
  • Contra: Interpretability lost

  Method          Accuracy [-]
  Baseline        0.73
  Optimized GA    0.79
  DT learning     0.92
  Random Forest   0.93
  Extra-Trees     0.94


SLIDE 19

Summary

Classification performance

  • Accuracy of the handcrafted DTs could be significantly improved by threshold optimization with a GA

  • Accuracy could be improved even further by using the decision tree learning algorithm
  • Highest accuracy with ensemble methods: ACCval = 0.93
  • More expressive models: Better accuracy but less interpretability

Outlook

  • Performance might be limited by scores
  • Further investigate structure of dataset
  • Use ML approaches directly on raw temporal data

SLIDE 20

Ensemble Methods (Backup Material)

Limitations of DTs [8]

  • High-variance estimators
  • Hierarchical tree-growing process: Unstable
  • Small changes in input data might lead to completely different trees
  • Greedy learning algorithm might lead to bad prediction performance

Ensemble methods

  • Average over multiple weak estimators to reduce variance
  • Problem: Estimators have to be decorrelated
  • Random forest: Heuristic-based with random subsets of split variables [2]
  • Extremely randomized trees (extra-trees): Completely randomize DT building [5]
  • Use the scikit-learn implementations of random forest and extra-trees
  • Default hyperparameters and nest = 100 give good performance
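A minimal sketch of the two scikit-learn ensembles named above, with default hyperparameters and n_estimators = 100 as stated on the slide; the data is again a synthetic stand-in for the score vectors:

```python
# Sketch: the two scikit-learn ensembles from the slide on synthetic data.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((533, 5))                             # limitation-score vectors
y = rng.choice(["cw", "rw", "sb", "ub"], size=533)   # root-cause labels

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra-Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
print(scores)
```

Both average 100 decorrelated trees: the random forest decorrelates via bootstrap samples and random feature subsets, extra-trees additionally randomizes the split thresholds themselves.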

SLIDE 21

Bibliography

[1] J. Bala, J. Huang, H. Vafaie, K. DeJong, and H. Wechsler. Hybrid learning using genetic algorithms and decision trees for pattern classification. In IJCAI (1), pages 719–724, 1995.

[2] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[3] S.-H. Cha and C. C. Tappert. A genetic algorithm for constructing compact binary decision trees. Journal of Pattern Recognition Research, 4(1):1–13, 2009.

[4] I. El Khayat, P. Geurts, and G. Leduc. Improving TCP in wireless networks with an adaptive machine-learnt classifier of packet loss causes. In International Conference on Research in Networking, pages 549–560. Springer, 2005.

[5] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.

[6] D. H. Hagos, P. E. Engelstad, A. Yazidi, and Ø. Kure. A machine learning approach to TCP state monitoring from passive measurements. In 2018 Wireless Days (WD), pages 164–171. IEEE, 2018.

[7] D. H. Hagos, P. E. Engelstad, A. Yazidi, and Ø. Kure. Recurrent neural network-based prediction of TCP transmission states from passive measurements. In 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA), pages 1–10. IEEE, 2018.

[8] S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345–389, 1998.


SLIDE 22

Bibliography

[9] A. Papagelis and D. Kalles. GATree: Genetically evolved decision trees. In Proceedings 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2000), pages 203–206. IEEE, 2000.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] M. Siekkinen, G. Urvoy-Keller, E. W. Biersack, and D. Collange. A root cause analysis toolkit for TCP. Computer Networks, 52(9):1846–1858, 2008.

[12] L. J. Stemplinger. TCP flow performance root cause monitoring. Bachelor's thesis, Technical University of Munich, 2019.
