Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich

Optimization of Decision Trees for TCP Performance RCA
Marco Weiss
advised by Simon Bauer and Benedikt Jaeger
Friday, 19th July 2019
SLIDE 2
Introduction
Why TCP Root Cause Analysis (RCA)?
- TCP: Widely used transport layer protocol
- Performance issues directly affect user experience
- RCA to identify and overcome performance limitations
Figure 1: Affected user (symbol picture)¹
Why Decision Trees (DT)?
- Divide-and-conquer strategy for complex problems
- Intuitive and interpretable
- Variety of extensions to increase prediction performance
Figure 2: Simple categorical decision tree2
¹ taken from shutterstock.com
² taken from IN2064 lecture slides
- M. Weiss — Decision Trees for TCP RCA
2
SLIDE 3
Approach
- 1. Handcrafted decision trees (baseline)
- 2. Genetic Algorithm for decision tree optimization
- 3. Decision tree learning
- 4. Ensemble methods
SLIDE 4
Selected Related Work
TCP Root Cause Analysis
- Siekkinen et al.: RCA based on DTs and limitation scores [11]
Machine learning approaches to TCP RCA
- Hagos et al.: Random forest, gradient boosting, and recurrent neural networks to predict congestion window (CW) size and round trip time (RTT) [6, 7]
- El Khayat et al.: Decision tree boosting to find the root cause of packet loss in wireless networks [4]
Genetic Algorithms (GA) for DT optimization
- Bala et al.: GA for data preprocessing [1]
- Papagelis et al., Cha et al.: GA as optimization method for DT learning [9, 3]
SLIDE 5
TCP Performance Limitations RCA
Root causes
- Application limited
- Capacity bottleneck: Unshared (ub) / shared (sb)
- Receiver window (rw)
- Congestion avoidance (cw)
Limitation scores
- Dispersion score s_disp
- Retransmission score s_retr
- RTT score s_RTT
- Receiver window score s_rwnd
- Burstiness score s_b
SLIDE 6
TCP Performance Limitations Dataset
Generation [12]
- Using the network emulator Mininet
- Different test setups and network topologies to enforce different throughput limitations
- Measurements during 533 different bulk transfer periods (BTP)
Structure
- 1 measurement-label pair {x, y} per BTP
- Measurement vector x = (s_disp, s_retr, s_RTT, s_rwnd, s_b) ∈ R^5
- Label (ground truth) y ∈ {cw, rw, sb, ub}
Figure 3: Dataset visualization using t-SNE embedding
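As a minimal sketch, one such measurement-label pair might be represented as follows; the score values are made up for illustration, since the real measurements come from the Mininet emulation of [12]:

```python
# Sketch of the dataset structure: one measurement-label pair {x, y}
# per bulk transfer period (BTP). Score values here are placeholders.

ROOT_CAUSES = {"cw", "rw", "sb", "ub"}

btp = {
    "x": (0.12, 0.55, 0.30, 0.05, 0.78),  # (s_disp, s_retr, s_RTT, s_rwnd, s_b)
    "y": "cw",                            # ground-truth root cause label
}

dataset = [btp] * 533  # the dataset holds 533 such pairs (dummy copies here)
print(len(dataset), len(btp["x"]))  # -> 533 5
```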
SLIDE 7
Handcrafted Decision Trees
Approach
- Topology based on underlying mechanisms of TCP
- Select threshold values by inspection of measurements
- Note: classification accuracy is reported on seen data

Method        Accuracy [-]
Baseline      0.73
Figure 4: Baseline decision tree based on [12]
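A handcrafted tree of this kind reduces to nested threshold tests on the limitation scores. In the sketch below, the test order and threshold values th1..th4 are hypothetical placeholders, NOT the actual baseline tree of Figure 4:

```python
# Hedged sketch of a handcrafted RCA tree: nested threshold tests on the
# limitation scores. Test order and thresholds are hypothetical.

def classify_btp(s_disp, s_retr, s_rtt, s_rwnd, s_b,
                 th1=0.5, th2=0.5, th3=0.5, th4=0.5):
    """Map the limitation scores of one BTP to a root cause label."""
    if s_rwnd > th1:   # receiver window dominates the transfer
        return "rw"
    if s_retr > th2:   # heavy retransmissions point to congestion avoidance
        return "cw"
    if s_disp > th3:   # high dispersion points to a capacity bottleneck
        return "sb" if s_b > th4 else "ub"
    return "ub"        # otherwise: unshared capacity bottleneck

print(classify_btp(0.1, 0.8, 0.3, 0.2, 0.4))  # -> cw
```

The GA described on the following slides optimizes exactly such a set of four thresholds while leaving the topology fixed.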
SLIDE 8
Genetic Algorithm for Decision Tree Optimization
Problem
- Optimize threshold parameters of the baseline tree
- "Off-the-shelf" DT learning algorithms optimize both threshold parameters and tree topology
- DT optimization is NP-complete [8]
Optimization with GA
- No domain-specific knowledge required
- Non-restrictive on type of cost function
- Analogy: Survival (and reproduction) of the fittest
- Fitness → classification performance
- Individual → set of threshold values
- Survival → keep good individuals, discard bad individuals
- Reproduction → genetic operators
SLIDE 9
Implementation
Problem formulation
- Candidate solution: c = {th_1, th_2, th_3, th_4}, th_j ∈ T_j for all j
- Discrete threshold set: T_j = {x_j : x ∈ D}
- Population: c_i ∈ C, where i = 1, ..., n_pop
- Genetic operators: Crossover(c, c′), Mutate(c), Elitism(C)
- Hyperparameters: p_cross, p_mut, n_elit, n_pop
Figure 5: Schematic optimization with GA: Initial population
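The schematic optimization loop can be sketched as a minimal genetic algorithm. The fitness function, threshold grids, and operator probabilities below are illustrative placeholders, not the setup of the thesis (where fitness would be the classification accuracy of the baseline tree under thresholds c):

```python
import random

random.seed(0)  # reproducible toy run

# T_j: the discrete candidate set for each of the four thresholds
THRESH_SETS = [[i / 10 for i in range(11)] for _ in range(4)]

def fitness(c):
    # Placeholder fitness: closeness to a hypothetical optimum.
    target = (0.2, 0.5, 0.7, 0.4)
    return -sum((a - b) ** 2 for a, b in zip(c, target))

def crossover(c1, c2, p_cross=0.7):
    if random.random() < p_cross:          # single-point crossover
        k = random.randrange(1, 4)
        return c1[:k] + c2[k:], c2[:k] + c1[k:]
    return c1, c2

def mutate(c, p_mut=0.1):
    # Re-draw each threshold from its set T_j with probability p_mut.
    return tuple(random.choice(THRESH_SETS[j]) if random.random() < p_mut else th
                 for j, th in enumerate(c))

def evolve(n_pop=20, n_gen=50, n_elit=2):
    pop = [tuple(random.choice(T) for T in THRESH_SETS) for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:n_elit]                 # elitism: keep the best individuals
        while len(nxt) < n_pop:
            a, b = random.sample(pop[:n_pop // 2], 2)  # select from fitter half
            for child in crossover(a, b):
                nxt.append(mutate(child))
        pop = nxt[:n_pop]
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

Each generation thus performs exactly the steps of Figures 5 to 10: fitness evaluation, elitism, selection, crossover, and mutation.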
SLIDE 10
Implementation
Figure 6: Schematic optimization with GA: Crossover
SLIDE 11
Implementation
Figure 7: Schematic optimization with GA: Mutation
SLIDE 12
Implementation
Figure 8: Schematic optimization with GA: Fitness evaluation
SLIDE 13
Implementation
Figure 9: Schematic optimization with GA: Elitism
SLIDE 14
Implementation
Figure 10: Schematic optimization with GA: Next generation
SLIDE 15
Results
Convergence
- Select 3 random subsets of the training data, each 10% of its size
- Brute-force threshold optimization on the subsets gives an upper bound for the accuracy
- Use a down-scaled GA population size to account for the smaller subsets

Method        Accuracy [-]
Baseline      0.73
Optimized GA  0.79

Figure 11: Best-in-population accuracy of GA-optimized DTs for 3 different subsets of the training data (avg. over 10 runs per subset)
SLIDE 16
Decision Tree Learning
Problem
- Find a DT that performs well on unseen data
- More precisely: Find best split dimension and threshold for every node
DT learning algorithm
- Greedy heuristic: maximize purity of the new child nodes
- Use the classification and regression tree (CART) algorithm as implemented in scikit-learn [10]
- Procedure
- Hyperparameter search
- Training on the complete training data
(Example in figure: parent impurity i(t) = 1 − 0.5 = 0.5; child impurities i(t_L) = 0, i(t_R) = 0.33)
Figure 12: Schematic node distributions for decision tree learning algorithm (using misclassification rate)³
³ taken from IN2064 lecture slides
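The greedy split heuristic can be illustrated without scikit-learn. The sketch below uses the misclassification rate as impurity, as in Figure 12; the toy data is made up and this is not the actual CART implementation used in the thesis:

```python
# Sketch of one greedy CART-style split: choose the (dimension, threshold)
# pair that minimizes the weighted impurity of the two child nodes.

def impurity(labels):
    """Misclassification rate: 1 minus the share of the majority class."""
    if not labels:
        return 0.0
    top = max(labels.count(c) for c in set(labels))
    return 1.0 - top / len(labels)

def best_split(X, y):
    best = None
    for j in range(len(X[0])):                  # candidate split dimension
        for th in sorted({x[j] for x in X}):    # candidate threshold
            left  = [yi for x, yi in zip(X, y) if x[j] <= th]
            right = [yi for x, yi in zip(X, y) if x[j] >  th]
            cost = (len(left) * impurity(left)
                    + len(right) * impurity(right)) / len(y)
            if best is None or cost < best[0]:
                best = (cost, j, th)
    return best  # (weighted child impurity, dimension, threshold)

# Toy data: dimension 0 separates the two classes perfectly at 0.2.
X = [(0.1, 3.0), (0.2, 1.0), (0.8, 2.0), (0.9, 4.0)]
y = ["cw", "cw", "rw", "rw"]
print(best_split(X, y))  # -> (0.0, 0, 0.2)
```

A full tree learner applies this search recursively to each child node until a stopping criterion (e.g. maximum depth) is met.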
SLIDE 17
Results
Observations
- ACC_train = 1.0 for depth > 15
- No overfitting! Possible explanations:
- Low-dimensional data
- No (significant) noise in the synthetic data
- Needs further investigation
- Slightly better performance than handcrafted DTs of equal depth (ACC_val = 0.82 at depth 5)

Method        Accuracy [-]
Baseline      0.73
Optimized GA  0.79
DT learning   0.92
Figure 13: Train-test curve for DT learning hyperparameter estimation
SLIDE 18
Results
Ensemble methods
- Marginally better performance than single DT
- Pro: Robust and easy to train
- Contra: Interpretability lost
Method         Accuracy [-]
Baseline       0.73
Optimized GA   0.79
DT learning    0.92
Random Forest  0.93
Extra-Trees    0.94
SLIDE 19
Summary
Classification performance
- Accuracy of handcrafted DTs could be significantly improved by threshold optimization with a GA
- Accuracy could be improved even further by using the decision tree learning algorithm
- Highest accuracy with ensemble methods: ACC_val = 0.93
- More expressive models: Better accuracy but less interpretability
Outlook
- Performance might be limited by scores
- Further investigate structure of dataset
- Use ML approaches directly on raw temporal data
SLIDE 20
Ensemble Methods (Backup Material)
Limitations of DTs [8]
- High-variance estimators
- Hierarchical tree-growing process: Unstable
- Small changes in input data might lead to completely different trees
- Greedy learning algorithm might lead to bad prediction performance
Ensemble methods
- Average over multiple weak estimators to reduce variance
- Problem: Estimators have to be decorrelated
- Random forest: Heuristic-based with random subsets of split variables [2]
- Extremely randomized trees (extra-trees): Completely randomize DT building [5]
- Use the implementations of random forest and extra-trees in scikit-learn
- Default hyperparameters and n_est = 100 give good performance
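The variance-reduction idea can be sketched with hand-made threshold "stumps" in place of trained trees; the stump thresholds, labels, and score indices below are made-up placeholders, not the random forest / extra-trees setup of the slides:

```python
from collections import Counter

# Sketch of the ensemble principle: combine several decorrelated weak
# estimators by majority vote to reduce the variance of the prediction.

def stump(j, th, left, right):
    """Weak estimator: a single threshold test on score dimension j."""
    return lambda x: left if x[j] <= th else right

def majority_vote(estimators):
    """Combine estimators by majority vote over their predicted labels."""
    return lambda x: Counter(e(x) for e in estimators).most_common(1)[0][0]

# Decorrelated stumps: each looks at a different (hypothetical) score.
forest = majority_vote([
    stump(0, 0.5, "ub", "sb"),  # hypothetical split on dispersion score
    stump(1, 0.4, "ub", "cw"),  # hypothetical split on retransmission score
    stump(3, 0.6, "ub", "rw"),  # hypothetical split on receiver window score
])

x = (0.9, 0.2, 0.1, 0.2, 0.3)  # made-up limitation scores
print(forest(x))               # -> ub
```

Random forest decorrelates its trees via bootstrap samples and random feature subsets [2]; extra-trees additionally randomizes the split thresholds themselves [5].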
SLIDE 21
Bibliography
[1] J. Bala, J. Huang, H. Vafaie, K. DeJong, and H. Wechsler. Hybrid learning using genetic algorithms and decision trees for pattern classification. In IJCAI (1), pages 719–724, 1995.
[2] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[3] S.-H. Cha and C. C. Tappert. A genetic algorithm for constructing compact binary decision trees. Journal of Pattern Recognition Research, 4(1):1–13, 2009.
[4] I. El Khayat, P. Geurts, and G. Leduc. Improving TCP in wireless networks with an adaptive machine-learnt classifier of packet loss causes. In International Conference on Research in Networking, pages 549–560. Springer, 2005.
[5] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.
[6] D. H. Hagos, P. E. Engelstad, A. Yazidi, and Ø. Kure. A machine learning approach to TCP state monitoring from passive measurements. In 2018 Wireless Days (WD), pages 164–171. IEEE, 2018.
[7] D. H. Hagos, P. E. Engelstad, A. Yazidi, and Ø. Kure. Recurrent neural network-based prediction of TCP transmission states from passive measurements. In 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA), pages 1–10. IEEE, 2018.
[8] S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345–389, 1998.
SLIDE 22
Bibliography
[9] A. Papagelis and D. Kalles. GA tree: genetically evolved decision trees. In Proceedings 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2000), pages 203–206. IEEE, 2000.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] M. Siekkinen, G. Urvoy-Keller, E. W. Biersack, and D. Collange. A root cause analysis toolkit for TCP. Computer Networks, 52(9):1846–1858, 2008.
[12] L. J. Stemplinger. TCP flow performance root cause monitoring. Bachelor's thesis, Technical University of Munich, 2019.