

  1. A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks
  Imtiaz Ullah and Qusay H. Mahmoud
  33rd Canadian Conference on Artificial Intelligence, 12-15 May 2020

  2. Agenda
  ▪ Introduction
  ▪ Motivation
  ▪ Problem Statement
  ▪ Related Work
  ▪ Testbed Architecture
  ▪ Results and Analysis
    o Correlated Features
    o Feature Ranking
    o Learning Curve
    o Validation Curve
    o Classification
  ▪ Conclusion
  ▪ Future Work

  3. Introduction
  ▪ Smart digital devices
    o Have become part of our daily lives
    o Improve the quality of life
    o Make communication easier
    o Increase data transfer and information sharing
  ▪ "Things" in the IoT could be anything
    o Physical
    o Virtual
  ▪ Technological challenges
    o Security
    o Power usage
    o Scalability
    o Communication mechanisms

  4. Introduction Cont.
  ▪ The exponential growth of the IoT makes it an attractive target for attackers.
  ▪ The effects of cyber-attacks are becoming more destructive.
  Fig. 1. Source: https://www.forbes.com/sites/gilpress/2016/09/02/internet-of-things-by-the-numbers-what-new-surveys-found/#a60d28116a0e

  5. Motivation
  ▪ The exponential growth of Internet of Things (IoT) devices provides a large attack surface for intruders to launch more destructive cyber-attacks.
  ▪ New detection techniques and algorithms require a well-designed dataset for IoT networks.
  ▪ Available IoT intrusion datasets have a limited number of features.
  ▪ Available IoT datasets contain very few flow-based features.

  6. Problem Statement
  ▪ First, we reviewed the weaknesses of various intrusion detection datasets.
  ▪ Second, we proposed a new dataset, adapted from https://ieee-dataport.org/open-access/iot-network-intrusion-dataset
  ▪ Third, we provide a significant set of features with their corresponding weights.
  ▪ Finally, we propose a new detection and classification methodology using the generated dataset.
  ▪ The IoT Botnet dataset can be accessed at https://sites.google.com/view/iot-network-intrusion-dataset.

  7. Related Work
  ▪ DARPA 98 / 99
    o Developed at MIT Lincoln Lab via an emulated network environment.
    o The DARPA 98 dataset contains seven weeks of network traffic.
    o The DARPA 99 dataset contains five weeks of network traffic.
  ▪ Lee and Stolfo developed the KDD99 dataset from DARPA 98/99.
  ▪ NSL-KDD removed redundant records from the KDD99 dataset.
    o The KDD99 training data contains 78% redundant instances.
    o The testing data contains 75% redundant instances.
  ▪ ISCX dataset, developed at the Canadian Institute for Cybersecurity (CIC), University of New Brunswick.
    o Systematic approach to generating normal and malicious traffic.
    o Multistage attacks.
    o Publicly available.

  8. Related Work Cont.
  ▪ UNSW-NB15
    o Comprehensive modern normal network traffic.
    o Diverse intrusion scenarios.
    o In-depth structured network traffic information.
    o Publicly available.
    o 49 features: Flow, Basic, Content, Time, Additional Generated, Connection, and Labeled features.
  ▪ CICIDS2017
    o Modern normal and malicious network traffic.
    o 80 network features.
    o Reliable normal and malicious network flows.
    o Publicly available.
  ▪ CICDDoS2019
    o Up-to-date normal and malicious DDoS network traffic.
    o 12 DDoS attacks.
    o Publicly available.
    o Comprehensive metadata about IP addresses.

  9. Related Work Cont.
  ▪ BoT-IoT dataset
    o Developed via legitimate and emulated IoT networks.
    o Designed around a typical smart home configuration.
    o Publicly available.
    o 49 features.
  ▪ Botnet IoT dataset
    o Generated using:
      • Nine commercial IoT devices.
      • Two IoT-based botnets, BASHLITE and Mirai.
    o 115 network features.

  10. Testbed Architecture
  ▪ A typical smart home environment.
  ▪ The SKT NGU smart home device and an EZVIZ Wi-Fi camera were used to generate the IoTID20 dataset.
  ▪ Other devices: laptops, tablets, smartphones.
  ▪ The SKT NGU and the EZVIZ Wi-Fi camera are the IoT victim devices; all other devices in the testbed are attacking devices.
  ▪ CICFlowMeter was used to extract features.
  Fig. 2. Source: https://ieee-dataport.org/open-access/iot-network-intrusion-dataset

  11. Testbed Architecture Cont.
  ▪ New IoTID20 dataset for anomalous activity detection in IoT networks.
  ▪ IoTID20 is available in CSV format.
  ▪ Covers various types of IoT attacks and attack families.
  ▪ Large number of general features.
  ▪ Large number of flow-based features.
  ▪ High-rank features.
  Fig. 3. IoTID20 Dataset Attack Taxonomy

  12. Label Features of IoTID20
  Table 1. Binary, Category, and Subcategory labels of the IoTID20 dataset

  Binary   | Category | Subcategory
  ---------|----------|------------------------------------------
  Normal   | Normal   | Normal
  Anomaly  | DoS      | Syn Flooding
  Anomaly  | Mirai    | Brute Force, HTTP Flooding, UDP Flooding
  Anomaly  | MITM     | ARP Spoofing
  Anomaly  | Scan     | Host Port, OS
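  A minimal pandas sketch of inspecting the three label levels in Table 1. The file name IoTID20.csv and the column names Label, Cat, and Sub_Cat are assumptions about the released CSV.

```python
import pandas as pd

# Load the combined IoTID20 CSV; the file name is hypothetical.
df = pd.read_csv("IoTID20.csv")

# Inspect the binary, category, and subcategory label columns of Table 1.
for col in ("Label", "Cat", "Sub_Cat"):
    print(df[col].value_counts(), "\n")
```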

  13. Results and Analysis
  $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
  $\text{Precision} = \frac{TP}{TP + FP}$
  $\text{Recall} = \frac{TP}{TP + FN}$
  $\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
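  A minimal sketch of the four metrics above, computed directly from raw confusion-matrix counts; the example counts are made up for illustration.

```python
def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Illustrative counts only.
print(evaluate(tp=90, tn=85, fp=10, fn=15))
```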

  14. IoTID20 Dataset Correlated Features
  ▪ Correlated features degrade the detection capability of a machine learning algorithm.
  ▪ A correlation coefficient threshold of 0.70 was used to remove the correlated features.
  Table 2. IoTID20 Dataset Correlated Features
  Total Features: 12
  Feature Names: Active_Max, Bwd_IAT_Max, Bwd_Seg_Size_Avg, Fwd_IAT_Max, Fwd_Seg_Size_Avg, Idle_Max, PSH_Flag_Cnt, Pkt_Size_Avg, Subflow_Bwd_Byts, Subflow_Bwd_Pkts, Subflow_Fwd_Byts, Subflow_Fwd_Pkts
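  A minimal sketch of threshold-based correlated-feature removal with pandas, assuming the features are already in a DataFrame `X`; the paper's exact pair-selection rule is not stated, so dropping the second feature of each pair is an assumption.

```python
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.70) -> pd.DataFrame:
    """Drop one feature from every pair whose |correlation| exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return X.drop(columns=to_drop)
```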

  15. Feature Ranking
  ▪ More than 70% of the features ranked with a value greater than 0.50.
  ▪ The Shapiro-Wilk algorithm was used to rank the IoTID20 features.
  ▪ High-ranked features improve feature selection.
  Fig. 4. Feature ranking via the Shapiro-Wilk algorithm (x-axis: IoTID20 features; y-axis: ranking score, 0 to 1)
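  A minimal sketch of per-feature Shapiro-Wilk scoring. Using the W statistic as the ranking score is an assumption about the paper's procedure, and the subsampling is there because scipy's test is intended for at most 5,000 observations.

```python
import pandas as pd
from scipy.stats import shapiro

def rank_features(X: pd.DataFrame, max_n: int = 5000) -> pd.Series:
    """Score each numeric feature with the Shapiro-Wilk W statistic."""
    scores = {}
    for col in X.select_dtypes("number").columns:
        values = X[col].dropna()
        # Subsample large columns; shapiro() is meant for small samples.
        sample = values.sample(min(max_n, len(values)), random_state=0)
        w, _ = shapiro(sample)
        scores[col] = w
    return pd.Series(scores).sort_values(ascending=False)
```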

  16. Learning Curve
  ▪ A learning curve shows:
    o The relationship between the training and validation scores of an algorithm for various training-set sizes.
    o Whether the algorithm would benefit from more data, or whether the data provided is enough for good performance.
  Fig. 5. Learning Curve for Binary Label (training-set sizes 32,000 to 102,000; score 50 to 100)
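  A minimal scikit-learn sketch of such a learning curve, assuming a prepared feature matrix `X` and binary label vector `y`; the decision tree estimator and the size grid are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Train/validation scores at increasing training-set sizes (5-fold CV).
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.3, 1.0, 8), cv=5, scoring="accuracy")

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:>7d}  train={tr:.3f}  validation={va:.3f}")
```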

  17. Learning Curve
  Fig. 6. Learning Curve for Category Label

  18. Learning Curve
  Fig. 7. Learning Curve for Subcategory Label

  19. Validation Curve
  ▪ A validation curve shows:
    o The effectiveness of a classifier on the data it was trained on.
    o The efficiency of the classifier on new test data.
  Fig. 8. Validation Curve for Binary Label (parameter values 1 to 10; score 50 to 100)
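  A minimal scikit-learn sketch of a validation curve over the same `X` and `y`. Sweeping a decision tree's max_depth from 1 to 10 mirrors the figure's x-axis, but the choice of swept parameter is an assumption, not the paper's stated setup.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

depths = np.arange(1, 11)  # matches the 1-10 range on the x-axis
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="accuracy")

print(train_scores.mean(axis=1))  # score on the training folds
print(val_scores.mean(axis=1))    # score on the held-out folds
```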

  20. Validation Curve
  Fig. 9. Validation Curve for Category Label

  21. Validation Curve
  Fig. 10. Validation Curve for Subcategory Label

  22. Binary Classification
  ▪ Classifies the dataset as normal or malicious network traffic.
  ▪ SVM, Gaussian NB, LDA, and logistic regression performed poorly for binary-label classification.
  ▪ The decision tree, random forest, and ensemble classifiers performed very well for binary-label classification.
  ▪ 3-, 5-, and 10-fold cross-validation tests were run to check for classifier overfitting.
  ▪ The cross-validation results remained unchanged across folds.
  Fig. 11. F-Score for Binary Label (classifiers: SVM, Gaussian NB, LDA, Logistic Regression, Decision Tree, Random Forest, Ensemble; classes: Normal, Anomaly)
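  A minimal sketch of the binary-label experiment with the classifiers the slide highlights, assuming `X` and a binary label vector `y_binary`; the hyperparameters and the hard-voting ensemble composition are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
ensemble = VotingClassifier([("dt", dt), ("rf", rf)], voting="hard")

# 3-, 5-, and 10-fold cross-validation, as on the slide; scores that stay
# stable across fold counts suggest the classifier is not overfitting.
for model in (dt, rf, ensemble):
    for k in (3, 5, 10):
        scores = cross_val_score(model, X, y_binary, cv=k, scoring="f1_macro")
        print(type(model).__name__, k, round(scores.mean(), 3))
```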

  23. Category Classification
  ▪ Classifies the dataset as normal network traffic or one of the following attack categories: DoS, Mirai, MITM, or Scan.
  ▪ The decision tree estimator performs very well for all attack categories.
  ▪ Poor performance by logistic regression, LDA, Gaussian NB, and SVM.
  Fig. 12. F-Score for Category Label (classes: Normal, DoS, Mirai, MITM, Scan)
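  A minimal sketch of per-category F-scores like those in Fig. 12, assuming `X` and a category label vector `y_category`; the 70/30 split is an assumption about the evaluation protocol.

```python
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hold out 30% of the data, stratified by category; the ratio is assumed.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y_category, test_size=0.3, random_state=0, stratify=y_category)

dt = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Per-class precision, recall, and F-score for Normal, DoS, Mirai, MITM, Scan.
print(classification_report(y_te, dt.predict(X_te)))
```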
