Data Mining in Intrusion Detection
Feng Pan

Outline
• Related Data Mining Background
  • Pattern Mining
    • Association Patterns
    • Sequence Patterns
  • Association Rules
  • Classification
    • Decision Tree
    • CBA: Classification Based on Association Rules
• Data Mining Based Detection Methods

Typical Dataset in Data Mining
• A dataset consists of records.
• Each record consists of a set of items.
  r1  host_IP="10.11.0.1", host_port>1100, flag="SF"
  r2  flag="SF"
  r3  host_IP="10.11.0.1", host_port>1100, duration>100ms, bytes>1KB
  r4  flag="SF", service="telnet", bytes>1KB
  r5  host_IP="10.11.0.1", flag="SF", duration>100ms, bytes>1KB
  r6  host_IP="10.11.0.1", host_port>1100, flag="SF"

Pattern Mining
• Association Patterns: sets of items that frequently occur together; the order of the items does not matter.
  • Pattern 1: host_IP="10.11.0.1", host_port>1100
  • Pattern 2: flag="SF", bytes>1KB
  • Pattern 3: duration>100ms, bytes>1KB

Association Rules
• From the association patterns we can derive association rules (LHS -> RHS).
  • If {flag="SF", bytes>1KB} is an association pattern, we get the rule flag="SF" -> bytes>1KB.
• Two measures of the goodness of a rule:
  • Support: number of records containing (LHS ∪ RHS).
    support( flag="SF" -> bytes>1KB ) = 2
  • Confidence: number of records containing (LHS ∪ RHS) / number of records containing LHS.
    confidence( flag="SF" -> bytes>1KB ) = 2/5 = 40%
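The support and confidence computations above can be made concrete with a few lines of code. The following is a minimal sketch of my own (not part of the original slides); each record r1-r6 is encoded as a set of item strings:

    # Minimal sketch: support and confidence of flag="SF" -> bytes>1KB over r1..r6.
    records = [
        {'host_IP=10.11.0.1', 'host_port>1100', 'flag=SF'},                     # r1
        {'flag=SF'},                                                             # r2
        {'host_IP=10.11.0.1', 'host_port>1100', 'duration>100ms', 'bytes>1KB'},  # r3
        {'flag=SF', 'service=telnet', 'bytes>1KB'},                              # r4
        {'host_IP=10.11.0.1', 'flag=SF', 'duration>100ms', 'bytes>1KB'},         # r5
        {'host_IP=10.11.0.1', 'host_port>1100', 'flag=SF'},                      # r6
    ]

    def support(itemset, records):
        """Number of records that contain every item in the itemset."""
        return sum(1 for r in records if itemset <= r)

    def confidence(lhs, rhs, records):
        """support(LHS union RHS) / support(LHS)."""
        return support(lhs | rhs, records) / support(lhs, records)

    lhs, rhs = {'flag=SF'}, {'bytes>1KB'}
    print(support(lhs | rhs, records))    # 2  (r4 and r5)
    print(confidence(lhs, rhs, records))  # 0.4 (flag="SF" appears in 5 records)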
Classification
• There are many different classification methods:
  • Decision Tree
  • Neural Network
  • CBA (Classification Based on Association Rules)
• A CBA classifier is constructed from association rules.

CBA
• Add a new column: Class Label.
  • Records are labeled by the user using domain knowledge.
  • The class label can be considered as a special feature.
  r1  host_IP="10.11.0.1", host_port>1100, flag="SF"                    normal
  r2  flag="SF"                                                         normal
  r3  host_IP="10.11.0.1", host_port>1100, duration>100ms, bytes>1KB    abnormal
  r4  flag="SF", service="telnet", bytes>1KB                            normal
  r5  host_IP="10.11.0.1", flag="SF", duration>100ms, bytes>1KB         abnormal
  r6  host_IP="10.11.0.1", host_port>1100, flag="SF"                    abnormal
• We find the association rule
  duration>100ms, bytes>1KB -> abnormal (support = 2, confidence = 100%)
• Then a simple rule of the classifier is
  duration>100ms, bytes>1KB -> abnormal

CBA
• We can find many rules with different support and confidence:
  • duration>100ms, bytes>1KB -> abnormal (support = 2, confidence = 100%)
  • host_IP="10.11.0.1", host_port>1100 -> abnormal (support = 2, confidence = 66%)
  • flag="SF", service="telnet" -> normal (support = 1, confidence = 100%)
• Sort all the rules according to their confidence and support:
  1) duration>100ms, bytes>1KB -> abnormal
  2) flag="SF", service="telnet" -> normal
  3) host_IP="10.11.0.1", host_port>1100 -> abnormal

CBA
• Given an unknown record, apply the rules in order.
  • "host_IP="10.11.0.1", host_port>1100, flag="SF"":
    rule 3 applies -> classify as abnormal.
  • "host_IP="10.11.0.1", host_port>1100, flag="SF", service="telnet"":
    rule 2 applies -> classify as normal.
    (Rule 3 also applies, but rule 2 is ranked higher because of its higher confidence.)
  • "host_port>1100, service="telnet", duration>100ms":
    no rule applies, so classify to the default class:
    • Random, or
    • Majority class

Outline
• Why Data Mining
• Challenges for Data Mining in intrusion detection
• Two layers at which to use Data Mining
  • Mining in the connection data
  • Mining in the alarm records

Why Data Mining?
• The dataset is large.
• Constructing an IDS manually is expensive and slow.
• Updates are frequent, since new intrusions appear frequently.
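A compact sketch of the CBA-style classification described on the slides above: rules are kept sorted by (confidence, support), the first matching rule classifies the record, and a default class covers the rest. This is my own illustrative code, not the original CBA implementation:

    # Rule-based classifier sketch: (antecedent itemset, class, support, confidence).
    from collections import Counter

    rules = [
        (frozenset({'duration>100ms', 'bytes>1KB'}), 'abnormal', 2, 1.00),
        (frozenset({'flag=SF', 'service=telnet'}),   'normal',   1, 1.00),
        (frozenset({'host_IP=10.11.0.1', 'host_port>1100'}), 'abnormal', 2, 0.66),
    ]
    # Sort by confidence first, then support (CBA-style rule precedence).
    rules.sort(key=lambda r: (r[3], r[2]), reverse=True)

    labels = ['normal', 'normal', 'abnormal', 'normal', 'abnormal', 'abnormal']
    default_class = Counter(labels).most_common(1)[0][0]   # majority class fallback

    def classify(record):
        for antecedent, cls, _, _ in rules:
            if antecedent <= record:        # rule antecedent is contained in the record
                return cls
        return default_class

    print(classify({'host_IP=10.11.0.1', 'host_port>1100', 'flag=SF'}))
    # -> 'abnormal' via rule 3
    print(classify({'host_IP=10.11.0.1', 'host_port>1100', 'flag=SF', 'service=telnet'}))
    # -> 'normal' via rule 2 (higher-ranked than rule 3)
    print(classify({'host_port>1100', 'service=telnet', 'duration>100ms'}))
    # -> no rule matches, the default class is returned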
Can Data Mining Work?
• Challenges for Data Mining in building an IDS:
  • Develop techniques to automate the knowledge-intensive process of feature selection.
  • Customize the general algorithms to incorporate domain knowledge, so that only relevant patterns are reported.
  • Compute detection models that are accurate and efficient at run time.

Challenge in Feature Selection
• There are many features in the connection records, both relevant and irrelevant.
• Automatic detectors (classifiers) are sensitive to the features used; missing a key feature for some attack may result in worse performance.
  • In the experiments on the DARPA data, omitting the "host_count" feature made the IDS unable to detect DOS attacks.
• Different attacks require different features.
• Some useful features are not in the original data.

Challenge in Pattern Mining
• A very large number of patterns can be found in the dataset; the system may be overwhelmed.
• For different attacks, pattern mining should focus on different feature subsets.
• For sequence patterns, different attacks have different optimal window sizes.

Challenge in Building Models
• A single model is not able to capture all types of attacks.
• An ideal model consists of several lightweight models, each of which focuses on its own aspect.

Mining in the Data
• Two kinds of datasets:
  • Network-based dataset
  • Host-based dataset
• Build the IDS by mining these records.
• When attacks are found, send alarms to the administration system.

Framework of Building an IDS
• Step 1: Preprocessing — summarize the raw data.
• Step 2: Association rule mining.
• Step 3: Find sequence patterns (frequent episodes) based on the association rules.
• Step 4: Construct new features based on the sequence patterns.
• Step 5: Construct classifiers on different sets of features.
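The five-step framework above can be viewed as a simple pipeline. The skeleton below is my own illustration; every helper is a hypothetical stub standing in for the real mining components described on the following slides:

    # Illustrative skeleton of the five-step IDS-building framework (stub functions).
    def preprocess(raw):                        # Step 1: summarize raw audit data into connection records
        return raw
    def mine_association_rules(conns):          # Step 2: association rule mining over the records
        return []
    def mine_frequent_episodes(conns, seeds):   # Step 3: sequence patterns (frequent episodes)
        return []
    def construct_features(conns, episodes):    # Step 4: new features derived from the episodes
        return conns
    def train_classifier(features, attack):     # Step 5: one classifier per attack type / feature set
        return lambda record: 'normal'

    def build_ids(raw_audit_data):
        conns    = preprocess(raw_audit_data)
        rules    = mine_association_rules(conns)
        episodes = mine_frequent_episodes(conns, seeds=rules)
        feats    = construct_features(conns, episodes)
        return [train_classifier(feats, t) for t in ('DOS', 'PROBING', 'R2L', 'U2R')]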
Preprocessing
• Summarize raw data into high-level events, e.g., a network connection (network-based data) or a host session (host-based data).
• Bro and NFR can be used to do the summarizing.
  • Bro policy script:
    const ftp_guest_ids = { "anonymous", "ftp", "guest", } &redef;
    redef ftp_guest_ids += { "visitor", "student" };
    redef ftp_guest_ids -= "ftp";
  • NFR N-Codes

Association Rule Mining
• Customizing: report only important and relevant patterns.
• Define relevant features and reference features.
• A pattern must contain relevant features or reference features; otherwise the pattern is not interesting.

Association Rule Mining
• Example on the Shell Command Data (a host-based dataset).

Sequence Pattern Mining
• Frequent Episodes.
  • X, Y -> Z  [c, s, w]  (c: confidence, s: support, w: window size)
  • Given occurrences of itemsets X and Y, Z will occur within time window w.
• Different window sizes may generate different results.
• The window size is related to the attack type:
  • DOS: w = 2 sec
  • Slow probing: w = 1 min

Feature Construction
• Construct new features according to the frequent episodes.
• Some features show close relationships to each other; such features can be combined.
• Some frequent episodes may indicate interesting new features.

Build Model (Classifier)
• Build different classifiers for different attacks, applied in a chain:
  Classifier 1: is DOS?     -- yes --> DOS;     no --> Classifier 2
  Classifier 2: is Probing? -- yes --> Probing; no --> Classifier 3
  Classifier 3: is R2L?     -- yes --> R2L;     no --> ... --> Normal
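The classifier chain on the last slide above can be sketched in a few lines. This is illustrative code of my own; the per-attack decision functions are hypothetical stand-ins, not the models from the original work:

    # Layered model sketch: lightweight per-attack classifiers applied in order;
    # the first positive decision labels the connection, otherwise it is normal.
    def looks_like_dos(conn):      # e.g., many half-open (S0) connections to one host
        return conn.get('count', 0) > 50 and conn.get('pct_S0', 0.0) > 0.8

    def looks_like_probing(conn):  # e.g., many different services contacted in the window
        return conn.get('pct_diff_service', 0.0) > 0.8

    def looks_like_r2l(conn):      # e.g., repeated failed logins on a login service
        return conn.get('service') == 'telnet' and conn.get('failed_logins', 0) > 3

    CLASSIFIER_CHAIN = [('DOS', looks_like_dos),
                        ('PROBING', looks_like_probing),
                        ('R2L', looks_like_r2l)]

    def classify_connection(conn):
        for label, classifier in CLASSIFIER_CHAIN:
            if classifier(conn):
                return label
        return 'normal'

    print(classify_connection({'count': 120, 'pct_S0': 0.95}))  # DOS
    print(classify_connection({'service': 'http'}))             # normal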
The DARPA Data
• 4 GB of compressed tcpdump data covering 7 weeks of network traffic.
• Contains 4 main categories of attacks:
  • DOS: denial of service, e.g., ping-of-death, syn flood
  • R2L: unauthorized access from a remote machine, e.g., guessing passwords
  • U2R: unauthorized access to local superuser privileges by a local unprivileged user, e.g., buffer overflow
  • PROBING: e.g., port scan, ping sweep

Preprocessing
• Use Bro scripts to summarize the raw data into one record per connection.
• Each connection record contains some "intrinsic" features:
  time, duration, service, src_host, dst_host, src_port, wrong_fragment, flag
  • wrong_fragment: e.g., the fragment size is not a multiple of 8 bytes, or fragment offsets overlap
  • flag: how the connection was established and terminated

Build Training Data
• Normal data set:
  • Randomly extract sequences of normal connection records.
• Data set for each attack type:
  • Extract all the records that fall within a surrounding time window of plus or minus 5 minutes of the whole duration of each attack.

Feature Construction
• Time-based "traffic" features: can detect DOS and PROBING attacks.
• Example: "same host"
  • Examine only the connections in the past 2 seconds that have the same dst_host as the current connection.
  • Features: count, percentage of the same service, percentage of different services, percentage of S0 flags, percentage of rejected-connection flags.

Feature Construction
• "same service"
  • Examine only the connections in the past 2 seconds that have the same service as the current connection.
  • Features: count, percentage of different dst_hosts, percentage of S0 flags, percentage of rejected-connection flags.

Feature Construction
• Some slow probing attacks need a larger window size.
• Host-based "traffic" features:
  • Instead of the 2-second time window, use a connection window of the last 100 connections.
  • Compute the same set of features on the connection window.
  • Can detect slow probing.
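As an illustration of the "same host" time-based traffic features, the sketch below (my own code, not from the original experiments) computes the 2-second window statistics for one connection; the record field names are assumptions:

    # Time-based "same host" traffic features: statistics over connections in the
    # past 2 seconds that share the current connection's dst_host.
    def same_host_features(current, history, window=2.0):
        recent = [c for c in history
                  if c['dst_host'] == current['dst_host']
                  and 0 <= current['time'] - c['time'] <= window]
        n = len(recent)
        if n == 0:
            return {'count': 0, 'pct_same_srv': 0.0, 'pct_diff_srv': 0.0,
                    'pct_S0': 0.0, 'pct_rej': 0.0}
        same_srv = sum(c['service'] == current['service'] for c in recent)
        return {
            'count': n,
            'pct_same_srv': same_srv / n,
            'pct_diff_srv': (n - same_srv) / n,
            'pct_S0':  sum(c['flag'] == 'S0'  for c in recent) / n,   # half-open connections
            'pct_rej': sum(c['flag'] == 'REJ' for c in recent) / n,   # rejected connections
        }

    history = [
        {'time': 10.1, 'dst_host': 'h1', 'service': 'http', 'flag': 'S0'},
        {'time': 10.8, 'dst_host': 'h1', 'service': 'ftp',  'flag': 'REJ'},
        {'time': 11.0, 'dst_host': 'h2', 'service': 'http', 'flag': 'SF'},
    ]
    current = {'time': 11.5, 'dst_host': 'h1', 'service': 'http', 'flag': 'S0'}
    print(same_host_features(current, history))
    # {'count': 2, 'pct_same_srv': 0.5, 'pct_diff_srv': 0.5, 'pct_S0': 0.5, 'pct_rej': 0.5}

For the host-based "traffic" features, the same statistics would be computed over the last 100 connections instead of the 2-second time window.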