Optimization of Decision Trees for TCP Performance RCA
Marco Weiss, advised by Simon Bauer, Benedikt Jaeger
Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich
Friday 19th July, 2019
Introduction

Why TCP Root Cause Analysis (RCA)?
• TCP: widely used transport layer protocol
• Performance issues directly affect user experience
• RCA to identify and overcome performance limitations
Figure 1: Affected user (symbol picture, taken from shutterstock.com)

Why Decision Trees (DT)?
• Divide-and-conquer strategy for complex problems
• Intuitive and interpretable
• Variety of extensions to increase prediction performance
Figure 2: Simple categorical decision tree (taken from IN2064 lecture slides)
Approach
1. Handcrafted decision trees (baseline)
2. Genetic Algorithm for decision tree optimization
3. Decision tree learning
4. Ensemble methods
Selected Related Work

TCP Root Cause Analysis
• Siekkinen et al.: RCA based on DTs and limitation scores [11]

Machine learning approaches to TCP RCA
• Hagos et al.: Random forests, gradient boosting, and recurrent neural networks to predict congestion window (CW) size and round-trip time (RTT) [6, 7]
• El Khayat et al.: Decision tree boosting to find the root cause of packet loss in wireless networks [4]

Genetic Algorithms (GA) for DT optimization
• Bala et al.: GA for data preprocessing [1]
• Papagelis et al., Cha et al.: GA as optimization method for DT learning [9, 3]
TCP Performance Limitations

Root causes
• Application limited
• Capacity bottleneck: unshared (ub) / shared (sb)
• Receiver window (rw)
• Congestion avoidance (cw)

Limitation scores
• Dispersion score s_disp
• Retransmission score s_retr
• RTT score s_RTT
• Receiver window score s_rwnd
• Burstiness score s_b
TCP Performance Limitations: Dataset Generation [12]
• Using the network emulator Mininet
• Different test setups and network topologies to enforce different throughput limitations
• Measurements during 533 different bulk transfer periods (BTPs)

Structure
• One measurement-label pair {x, y} per BTP
• Measurement vector x = (s_disp, s_retr, s_RTT, s_rwnd, s_b) ∈ R^5
• Label (ground truth) y ∈ {cw, rw, sb, ub}
Figure 3: Dataset visualization using t-SNE embedding
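As a minimal sketch of the dataset layout described above (not the thesis code; the file name and column names are assumptions), loading the 533 score vectors and reproducing a t-SNE view like Figure 3 could look as follows:

```python
# Sketch only: illustrates the dataset structure from this slide.
# File name and column names are assumptions.
import pandas as pd
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

FEATURES = ["s_disp", "s_retr", "s_rtt", "s_rwnd", "s_b"]  # five limitation scores
LABELS = ["cw", "rw", "sb", "ub"]                          # root-cause classes

df = pd.read_csv("btp_scores.csv")          # hypothetical file: 533 rows, one per BTP
X = df[FEATURES].to_numpy()                 # measurement vectors x in R^5
y = df["root_cause"].to_numpy()             # ground-truth labels

# 2D embedding of the 5-dimensional score vectors, as in Figure 3
emb = TSNE(n_components=2, random_state=0).fit_transform(X)
for label in LABELS:
    mask = y == label
    plt.scatter(emb[mask, 0], emb[mask, 1], label=label, s=10)
plt.legend()
plt.show()
```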
Handcrafted Decision Trees

Approach
• Topology based on underlying mechanisms of TCP
• Select threshold values by inspection of measurements
• Note: classification accuracy for seen data

Method | Accuracy [-]
Baseline | 0.73
Optimized GA |
DT learning |
Random Forest |
Extra-Trees |

Figure 4: Baseline decision tree based on [12]
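A handcrafted tree of this kind is just a fixed sequence of threshold comparisons on the five scores. The sketch below is purely illustrative: the split order and the threshold values th1..th4 are placeholders, not the actual baseline tree of Figure 4 / [12].

```python
# Minimal sketch of a handcrafted decision tree over the limitation scores.
# Split order and thresholds are hypothetical, NOT the baseline tree from [12].
from dataclasses import dataclass

@dataclass
class Thresholds:
    th1: float = 0.5
    th2: float = 0.5
    th3: float = 0.5
    th4: float = 0.5

def classify(s_disp, s_retr, s_rtt, s_rwnd, s_b, th: Thresholds) -> str:
    """Return one of 'cw', 'rw', 'sb', 'ub' by walking fixed threshold splits."""
    if s_rwnd > th.th1:      # receiver window fills up -> receiver-window limited
        return "rw"
    if s_retr > th.th2:      # many retransmissions -> congestion avoidance
        return "cw"
    if s_disp > th.th3:      # high dispersion -> shared bottleneck
        return "sb"
    return "ub"              # otherwise: unshared capacity bottleneck

def accuracy(X, y, th: Thresholds) -> float:
    preds = [classify(*x, th) for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)
```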
Genetic Algorithm for Decision Tree Optimization

Problem
• Optimize the threshold parameters of the baseline tree
• "Off-the-shelf" DT learning algorithms optimize both threshold parameters and tree topology
• DT optimization is NP-complete [8]

Optimization with GA
• No domain-specific knowledge required
• Few restrictions on the type of cost function
• Analogy: survival (and reproduction) of the fittest
  • Fitness → classification performance
  • Individual → set of threshold values
  • Survival → keep good individuals, discard bad individuals
  • Reproduction → genetic operators
Implementation

Problem formulation
• Candidate solution c = {th_1, th_2, th_3, th_4}, with th_j ∈ T_j for all j
• Discrete threshold sets T_j = {x_j : x ∈ D}
• Population c_i ∈ C, where i = 1 ... n_pop
• Genetic operators: Crossover(c, c'), Mutate(c), Elitism(C)
• Hyperparameters: p_cross, p_mut, n_elit, n_pop

Figures 5-10: Schematic optimization with GA, shown step by step: initial population, crossover, mutation, fitness evaluation, elitism, next generation. A code sketch of this loop follows below.
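A hedged sketch of how such a GA loop over the discrete threshold sets might look; the operator details, selection scheme, and hyperparameter values are assumptions, not the thesis implementation. The fitness function reuses classify()/accuracy() from the handcrafted-tree sketch above.

```python
# Sketch of a GA optimizing the four thresholds of the handcrafted tree.
# Operator details and hyperparameter values are assumptions.
import random

def init_population(threshold_sets, n_pop):
    # threshold_sets: list of 4 lists T_j of candidate values taken from the data D
    return [[random.choice(T) for T in threshold_sets] for _ in range(n_pop)]

def crossover(c1, c2, p_cross=0.7):
    if random.random() < p_cross:
        point = random.randrange(1, len(c1))          # single-point crossover
        return c1[:point] + c2[point:], c2[:point] + c1[point:]
    return c1[:], c2[:]

def mutate(c, threshold_sets, p_mut=0.1):
    # re-draw each threshold from its discrete set with probability p_mut
    return [random.choice(T) if random.random() < p_mut else th
            for th, T in zip(c, threshold_sets)]

def run_ga(threshold_sets, fitness, n_pop=50, n_elit=2, n_gen=100):
    pop = init_population(threshold_sets, n_pop)
    for _ in range(n_gen):
        scored = sorted(pop, key=fitness, reverse=True)
        nxt = scored[:n_elit]                          # elitism: keep best individuals
        while len(nxt) < n_pop:
            p1, p2 = random.sample(scored[: n_pop // 2], 2)   # select among fitter half
            c1, c2 = crossover(p1, p2)
            nxt += [mutate(c1, threshold_sets), mutate(c2, threshold_sets)]
        pop = nxt[:n_pop]
    return max(pop, key=fitness)

# Fitness = training accuracy of the handcrafted tree with the candidate thresholds:
# best = run_ga(threshold_sets, lambda c: accuracy(X_train, y_train, Thresholds(*c)))
```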
Results: GA Convergence
• Select 3 random subsets of the training data, each 10% of its size
• Brute-force threshold optimization on these subsets gives an upper bound for the accuracy
• Use a down-scaled GA population size to account for the smaller subsets

Method | Accuracy [-]
Baseline | 0.73
Optimized GA | 0.79
DT learning |
Random Forest |
Extra-Trees |

Figure 11: Best-in-population accuracy of GA-optimized DTs for 3 different subsets of the training data (avg. over 10 runs per subset)
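For reference, the brute-force upper bound mentioned above amounts to an exhaustive grid search over the discrete threshold sets; a sketch under the same assumptions as before (it reuses accuracy() and Thresholds from the handcrafted-tree sketch):

```python
# Sketch: exhaustive search over the discrete threshold sets, giving an upper
# bound on the accuracy achievable with this tree topology. Only feasible on
# small 10% subsets, since the grid grows as |T_1|*|T_2|*|T_3|*|T_4|.
from itertools import product

def brute_force(threshold_sets, X, y):
    best_acc, best_th = 0.0, None
    for combo in product(*threshold_sets):
        acc = accuracy(X, y, Thresholds(*combo))
        if acc > best_acc:
            best_acc, best_th = acc, combo
    return best_acc, best_th
```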
Decision Tree Learning

Problem
• Find a DT that performs well on unseen data
• More precisely: find the best split dimension and threshold for every node

DT learning algorithm
• Greedy heuristic: maximize the purity of the new child nodes
• Use the classification and regression tree (CART) algorithm as implemented in scikit-learn [10]
• Procedure: hyperparameter search, then training on the complete training data

Figure 12: Schematic node distributions for the decision tree learning algorithm, using the misclassification rate as impurity: i(t) = 1 − 0.5 = 0.5 before the split, i(t_L) = 0 and i(t_R) = 0.33 after it (taken from IN2064 lecture slides)
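A minimal sketch of this procedure with scikit-learn's CART implementation, reusing X and y from the dataset sketch above; the searched hyperparameter grid and the split ratio are assumptions, not the thesis setup.

```python
# Sketch: CART-based DT learning with a hyperparameter search, then evaluation.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": range(2, 21), "criterion": ["gini", "entropy"]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_tree = search.best_estimator_
print("best depth:", best_tree.get_depth())
print("validation accuracy:", best_tree.score(X_val, y_val))
```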
Results: Decision Tree Learning

Observations
• ACC_train = 1.0 for depth > 15
• No overfitting! Possible explanations:
  • Low-dimensional data
  • No (significant) noise in the synthetic data
  • Needs further investigation...
• Slightly better performance than the optimized handcrafted DTs at equal depth (ACC_val, d=5 = 0.82)

Method | Accuracy [-]
Baseline | 0.73
Optimized GA | 0.79
DT learning | 0.92
Random Forest |
Extra-Trees |

Figure 13: Train-test curve for DT learning hyperparameter estimation
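A train/validation curve over the tree depth, as in Figure 13, could be sketched like this (the depth range and cross-validation setup are assumptions):

```python
# Sketch: accuracy of the CART tree as a function of max_depth (cf. Figure 13).
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

depths = np.arange(1, 21)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="accuracy",
)
plt.plot(depths, train_scores.mean(axis=1), label="train")
plt.plot(depths, val_scores.mean(axis=1), label="validation")
plt.xlabel("max_depth")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```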
Results: Ensemble Methods
• Marginally better performance than a single DT
• Pro: robust and easy to train
• Contra: interpretability lost

Method | Accuracy [-]
Baseline | 0.73
Optimized GA | 0.79
DT learning | 0.92
Random Forest | 0.93
Extra-Trees | 0.94
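A short sketch of the two ensemble methods on the same score features, reusing the train/validation split from the CART sketch above; the number of trees and other hyperparameters are assumptions.

```python
# Sketch: random forest and extremely randomized trees on the limitation scores.
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

for name, model in [
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("Extra-Trees", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]:
    model.fit(X_train, y_train)
    print(name, "validation accuracy:", model.score(X_val, y_val))
```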
Summary

Classification performance
• Accuracy of the handcrafted DTs could be significantly improved by threshold optimization with the GA
• Accuracy could be improved even further by using the decision tree learning algorithm
• Highest accuracy with ensemble methods: ACC_val = 0.93
• More expressive models: better accuracy but less interpretability

Outlook
• Performance might be limited by the scores
• Further investigate the structure of the dataset
• Use ML approaches directly on the raw temporal data