Anomaly Detection in Computer Networks
Carolina Fortuna, Blaž Fortuna, Mihael Mohorčič
Outline • Intrusion Detection • Data Acquisition • Datasets • Related Work • Data Preprocessing • Evaluation Criteria • Scenarios • Results • Conclusions
Intrusion Detection • The intrusion detector learning task is to build a predictive model (i.e. a classifier) capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. http://www.acm.org/sigs/sigkdd/kddcup/index.php?section=1999&method=task
Data Acquisition • Nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. • The TCP dump data was processed and prepared for the learning contest.
Data Acquisition • Attacks fall into four main categories: – DOS: denial-of-service – R2L: unauthorized access from a remote machine – U2R: unauthorized access to local superuser (root) privileges – probe: surveillance and other probing
Training Datasets
Full dataset:
• 4898431 instances
• 42 attributes
– 8 nominal
– 34 continuous
• Nominal target variable
– 23 values
• No missing values
For the experiments:
• a 10% subset of the full dataset
• a 100,000-instance subset
• a filtered subset
Test Dataset • 311029 instances • 42 attributes – 8 nominal – 34 continuous • Nominal target variable – 38 values, 17 of which are not in the training dataset (Table 2) • No missing values
Data Analysis
[Figure: "Traffic categories" bar chart; number of instances (log scale) per category (normal, probe, DoS, U2R, R2L) for the 10%, full and test datasets]
Attack category   10% examples   Rel. no   Full examples   Rel. no   Test examples   Rel. no
probe                     4107      0.8%           41102      0.8%            4166      1.3%
DoS                     391458     79.2%         3883370     79.2%          229853     73.9%
U2R                         52     0.01%              52    0.001%              70     0.02%
R2L                       1126      0.2%            1126     0.02%           16347      5.2%
Data Analysis
                  10% examples   Rel. no   Full examples   Rel. no   Test examples   Rel. no
Normal traffic           97278     19.6%          972781     19.8%           60593     19.4%
Data Preprocessing • Two-class input data for SVM • Continuous values; nominal values are encoded in a binary manner • Normalized feature values • Instance format: “class feature_No:value … feature_No:value”; features with a zero value can be omitted.
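The preprocessing steps above can be sketched as a small encoder. This is a minimal illustration, not the authors' code: the function name `encode_instance`, min-max scaling as the normalization method, one feature per nominal value as the binary encoding, and feature numbering starting at 1 are all assumptions.

```python
def encode_instance(label, continuous, ranges, nominal, vocab):
    """Encode one connection record as 'class feature_No:value ...'.

    label      -- positive for one class, otherwise negative (assumed +1/-1)
    continuous -- list of raw continuous attribute values
    ranges     -- list of (min, max) per continuous attribute (assumed min-max scaling)
    nominal    -- list of nominal attribute values (strings)
    vocab      -- list of allowed values per nominal attribute
    """
    parts = ["+1" if label > 0 else "-1"]
    feature_no = 1  # assumption: feature numbers start at 1

    # Continuous attributes: scale to [0, 1]; zero-valued features are omitted.
    for value, (lo, hi) in zip(continuous, ranges):
        scaled = (value - lo) / (hi - lo) if hi > lo else 0.0
        if scaled != 0.0:
            parts.append(f"{feature_no}:{scaled:g}")
        feature_no += 1

    # Nominal attributes: one binary feature per possible value.
    for value, allowed in zip(nominal, vocab):
        for candidate in allowed:
            if value == candidate:
                parts.append(f"{feature_no}:1")
            feature_no += 1

    return " ".join(parts)


# Hypothetical record: two continuous attributes and one nominal attribute.
line = encode_instance(
    label=-1,
    continuous=[5.0, 0.0],
    ranges=[(0.0, 10.0), (0.0, 1.0)],
    nominal=["tcp"],
    vocab=[["icmp", "tcp", "udp"]],
)
print(line)
```

Only the non-zero features appear in the output line, which keeps the files small when most binary-encoded features are zero.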
Related Work • Mixture of bagging and boosting with modified sampling and replacement. • Decision trees. • Dynamic subset selection and self-organizing maps. • K-nearest neighbors • SVM • …
Support Vector Machines • Determines a hyperplane that is able to separate positive examples from negative examples. • A linear classifier known as the maximum margin classifier.
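As a rough illustration of the linear classifier described above, the sketch below evaluates a separating hyperplane that has already been found. The weight vector `w` and offset `b` are made-up values, and the margin formula 2/||w|| assumes the usual canonical scaling in which the closest examples satisfy |w·x + b| = 1. Training itself (finding `w` and `b`) is not shown.

```python
import math


def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 for the positive side of the hyperplane, -1 otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Width of the margin, 2 / ||w||, under the canonical scaling assumption."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane in two dimensions.
w, b = [1.0, 1.0], -3.0
print(classify(w, b, [3.0, 2.0]))   # point on the positive side
print(classify(w, b, [0.0, 1.0]))   # point on the negative side
print(margin_width([3.0, 4.0]))
```

A maximum margin classifier picks, among all separating hyperplanes, the one for which `margin_width` is largest.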
Evaluation Criteria
• Criteria used for the KDD Cup 1999
• Confusion matrix
• Average cost per test example (ACTE), computed using the entrywise product between the cost and the confusion matrices:
$(Cm \circ C)_{i,j} = cm_{i,j} \cdot c_{i,j}$
$ACTE = \frac{1}{311029} \sum_{i,j} cm_{i,j} \cdot c_{i,j}$
Cost matrix C:
          normal   probe   DOS   U2R   R2L
normal         0       1     2     2     2
probe          1       0     2     2     2
DOS            2       1     0     2     2
U2R            3       2     2     0     2
R2L            4       2     2     2     0
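The ACTE computation can be sketched in a few lines. The cost matrix is the one shown on this slide; the confusion matrix is copied from the first matrix on the later "Results (10% dataset)" slide. Reading rows as the actual class and columns as the predicted class is an assumption, supported by the row sums matching the class counts of the test set. The function name `acte` is illustrative.

```python
# Cost matrix from the slide; class order: normal, probe, DOS, U2R, R2L.
COST = [
    [0, 1, 2, 2, 2],
    [1, 0, 2, 2, 2],
    [2, 1, 0, 2, 2],
    [3, 2, 2, 0, 2],
    [4, 2, 2, 2, 0],
]

# First confusion matrix from the "Results (10% dataset)" slide
# (assumed: rows = actual class, columns = predicted class).
CONFUSION = [
    [59367, 211, 818, 12, 185],
    [901, 3002, 148, 0, 115],
    [7047, 52, 222754, 0, 0],
    [32, 0, 0, 32, 6],
    [14791, 11, 2, 11, 1532],
]


def acte(confusion, cost=COST):
    """Average cost per test example: sum of the entrywise product / number of examples."""
    total_examples = sum(sum(row) for row in confusion)
    total_cost = sum(
        cm * c
        for cm_row, c_row in zip(confusion, cost)
        for cm, c in zip(cm_row, c_row)
    )
    return total_cost / total_examples


print(sum(sum(row) for row in CONFUSION))  # number of test examples
print(round(acte(CONFUSION), 4))
```

For this matrix the result is about 0.2480, which agrees with the 0.2479 reported for the One-to-one IDS on the 10% dataset up to rounding.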
The One-to-all Scenario
The One-to-one Scenario
The One-to-all-3categ Scenario
Results (100k instances)
                    One-to-all   One-to-one   One-to-all-3categ
ACTE                    0.5306       1.6656              0.2641
Detection rate           99.2%        95.0%               90.3%
Diagnosis rate           91.3%         3.3%               90.1%
False alarm rate         99.6%        12.8%                1.6%
• One-to-one IDS has the poorest ACTE; One-to-all-3categ IDS has the best ACTE.
• One-to-all IDS
– high detection rate, good diagnosis rate, very high false alarm rate: classifies most of the normal traffic as intrusion.
– doesn’t detect probe, R2L and U2R
– confuses DoS with normal quite often
– needs parameter optimization
Results (100k instances) • One-to-one scenario has a lower false alarm rate but poor diagnosis performance: it detects most of the attacks, but it doesn’t classify them correctly. • The high ACTE seems to come from misclassifying DoS attacks as R2L attacks. • One-to-all-3categ IDS gives the best results: good ACTE, good detection and diagnosis rates and a low false alarm rate.
Results
• Next:
– tune parameters using 10-fold cross-validation
– use a larger training dataset
[Figure: U2R (j=10); F1, sigma_F1, BEP and sigma_BEP plotted against c]
c           0          0.01       0.1        1          10         100        1000
F1          0          0          0.633831   0.63881    0.628171   0.537989   0.124624
sigma_F1    0          0          0.154595   0.133095   0.140434   0.123092   0.0824277
BEP         0.578254   0.578254   0.594087   0.638532   0.651032   0.548254   0.563135
sigma_BEP   0.185968   0.185968   0.181891   0.150817   0.154317   0.101332   0.21024
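A minimal sketch of how the planned 10-fold cross-validation over c could be organized. Everything here is illustrative: `k_fold_indices`, `tune_c` and the `score_fn` callback are hypothetical names, the folds are neither shuffled nor stratified, and the scoring function in the usage example is a stand-in rather than an actual SVM run. With rare classes such as U2R, stratified folds would probably be preferable.

```python
def k_fold_indices(n_instances, k=10):
    """Split instance indices into k folds; return a list of (train, test) index lists."""
    indices = list(range(n_instances))
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def tune_c(candidates, n_instances, score_fn, k=10):
    """Return the candidate c with the best mean score over k folds.

    score_fn(c, train_indices, test_indices) is supplied by the caller and is
    expected to train on the first index list and evaluate (e.g. F1) on the second.
    """
    splits = k_fold_indices(n_instances, k)
    best_c, best_score = None, float("-inf")
    for c in candidates:
        mean_score = sum(score_fn(c, train, test) for train, test in splits) / k
        if mean_score > best_score:
            best_c, best_score = c, mean_score
    return best_c, best_score


# Usage with a stand-in scoring function that peaks at c = 1.
best_c, best_score = tune_c(
    candidates=[0.01, 0.1, 1, 10, 100, 1000],
    n_instances=100,
    score_fn=lambda c, train, test: -abs(c - 1.0),
)
print(best_c)
```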
Results (10% dataset)
                    One-to-all   One-to-one   One-to-all-3categ
ACTE                    0.2625       0.2479              0.2653
Detection rate           90.2%        90.9%               90.3%
Diagnosis rate           90.1%        90.7%               90.1%
False alarm rate          1.6%        2.02%                1.6%
• One-to-all IDS improved the overall performance as well as the detection, diagnosis and false alarm rates.
• One-to-one IDS also improved: it has the smallest ACTE and good detection and diagnosis rates.
Results (10% dataset) • One-to-all-3categ IDS: – unexpected result – there is no improvement in the detection, diagnosis and false alarm rates.
Results (10% dataset)
Confusion matrices (labels assigned by matching each matrix to the ACTE values reported for the 10% dataset):
One-to-one IDS (ACTE 0.2479)
          normal   probe      DOS    U2R    R2L      %
normal     59367     211      818     12    185   97.9
probe        901    3002      148      0    115   72.0
DOS         7047      52   222754      0      0   96.9
U2R           32       0        0     32      6   45.7
R2L        14791      11        2     11   1532    9.3
%           72.2    91.6     99.5   58.1   83.3
One-to-all IDS (ACTE 0.2625)
          normal   probe      DOS    U2R    R2L      %
normal     59611     300      678      4      0   98.3
probe       1053    2922      191      0      0   70.1
DOS         7242      22   222589      0      0   96.8
U2R           54       0        0     11      5   15.7
R2L        15959      16        2      2    368    2.2
%           71.0    89.6     99.6   64.7   98.6
One-to-all-3categ IDS (ACTE 0.2653)
          normal   probe      DOS    U2R    R2L      %
normal     59593     313      672      5     10   98.3
probe        767    3120      181      6     92   74.8
DOS         7113     324   222406      0     10   96.7
U2R           60       0        0      5      5    7.1
R2L        16186      11        2      1    147    0.8
%           71.1    82.8     99.6   29.4   55.6
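The detection, diagnosis and false alarm rates are not defined explicitly on the slides. The sketch below uses definitions that are an assumption, chosen because they reproduce the reported figures closely: detection rate is the share of attacks not classified as normal, diagnosis rate is the share of attacks assigned to their correct category, and false alarm rate is the share of normal connections classified as any attack. The input is the first confusion matrix on this slide, read with rows as the actual class and columns as the predicted class.

```python
# First confusion matrix from the slide; class order: normal, probe, DOS, U2R, R2L.
CONFUSION = [
    [59367, 211, 818, 12, 185],
    [901, 3002, 148, 0, 115],
    [7047, 52, 222754, 0, 0],
    [32, 0, 0, 32, 6],
    [14791, 11, 2, 11, 1532],
]


def ids_rates(confusion, normal_index=0):
    """Return (detection, diagnosis, false_alarm) rates under the assumed definitions."""
    normal_row = confusion[normal_index]
    normal_total = sum(normal_row)
    false_alarms = normal_total - normal_row[normal_index]

    attack_total = detected = diagnosed = 0
    for i, row in enumerate(confusion):
        if i == normal_index:
            continue
        attack_total += sum(row)
        detected += sum(row) - row[normal_index]  # flagged as some attack
        diagnosed += row[i]                       # flagged as the right attack

    return (
        detected / attack_total,
        diagnosed / attack_total,
        false_alarms / normal_total,
    )


detection, diagnosis, false_alarm = ids_rates(CONFUSION)
print(f"detection={detection:.4f} diagnosis={diagnosis:.4f} false_alarm={false_alarm:.4f}")
```

Under these definitions the matrix gives roughly 90.9% detection, 90.8% diagnosis and 2.02% false alarms, close to the 90.9%, 90.7% and 2.02% reported for the One-to-one IDS.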
Results • Tradeoff: the more accurate the SVM model is at classifying R2L connections, the poorer it is at classifying normal connections, and the other way around. • One-to-all-3categ IDS performs worse than the other two IDSs in classifying R2L and U2R attacks, and performs slightly better on classifying probe attacks. • Even though we introduced the One-to-all-3categ IDS in order to perform better at separating the three minority classes from the two major ones, it seems that the model built using SVM is not accurate enough for this voting system to be effective.
Evaluation • One-to-one IDS with 0.2479 ACTE would rank 8th in the KDD Cup 1999. • Less accurate than other results in the literature, but a simpler system. • Higher accuracy can be obtained by increasing the complexity of the system: – SVMs with different kernels – Hybrid systems • that combine several machine learning methods • that combine machine learning methods with the more classical ones based on signatures
Conclusions • Very large and unbalanced dataset. • Proposed a two-level voting IDS that performed well on a small training set but relatively poorly when the training dataset increased. • Attacks such as R2L and U2R that result in a small number of traffic packets seem to pose a real challenge for detection and diagnosis. • Usually simplicity and speed are traded for accuracy, and machine learning methods are complemented by traditional signature-based methods.