Evaluation of Anomaly Detection Method based on Pattern Recognition
● Romain Fontugne, The Graduate University for Advanced Studies
● Yosuke Himura, The University of Tokyo
● Kensuke Fukuda, National Institute of Informatics
CNRS-Wide 02-03/03/2009
Outline
● Motivation
● Temporal-spatial structure of anomaly
● Pattern-recognition-based method
– Hough transform
● Parameter space
● MAWI database
● Case study
● Conclusion
Motivation (1)
● Network traffic anomalies:
– Misconfigurations, failures, network attacks
● Side effects:
– Bandwidth consumption
– Weakened network performance
– Harmful traffic
– Altered traffic characteristics
Motivation (2)
● Difficulties:
– Huge amount of data
– Variety of anomalous traffic
– Identification of tiny flows
● Anomaly detection method:
– Usually treated as a statistical problem
   ● Evaluate the main characteristics of traffic
   ● Discriminate traffic with singularities
Temporal-spatial structure of anomaly (darknet)
● Unwanted traffic
● Linear structures
● Unusual distribution of traffic feature
Figure: darknet traffic plots; axes: Time, Destination address, Source port
Temporal-spatial structure of anomaly (MAWI)
● Samplepoint-F:
– 2009/02/21
Figure: MAWI traffic plots; axes: Destination address, Source port; annotated traffic: ssh, http
Pattern-recognition-based method
● Identification of linear structures in pictures:
– Generate pictures from traffic
– Hough transform
– Retrieve packet information
– Report anomalies
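The first step listed above, generating pictures from traffic, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `(timestamp, feature_value)` packet layout, the function name `packets_to_picture`, the parameters `time_bin`, `width`, `height` and `space`, and the linear scaling of the feature onto the vertical axis are all hypothetical.

```python
from collections import defaultdict


def packets_to_picture(packets, time_bin=1.0, width=8, height=8, space=2**32):
    """Map (timestamp, feature_value) pairs onto picture pixels.

    x axis: time, quantised into bins of `time_bin` seconds (assumed layout).
    y axis: a traffic feature such as the destination address as an integer,
            scaled linearly from [0, space) to [0, height).
    Returns (pixels, index); `index` maps each pixel back to the packets that
    produced it, so packet information can be retrieved after detection.
    """
    if not packets:
        return set(), {}
    t0 = min(t for t, _ in packets)
    index = defaultdict(list)
    for pkt in packets:
        t, value = pkt
        x = int((t - t0) / time_bin)
        if x >= width:  # packet falls outside this picture
            continue
        y = value * height // space
        index[(x, y)].append(pkt)
    return set(index), dict(index)


# Illustrative packets: the last one lies beyond the picture's time range.
packets = [(0.0, 0), (1.5, 2**31), (1.7, 2**31 + 5), (100.0, 7)]
pixels, index = packets_to_picture(packets)
```

Keeping the pixel-to-packet index alongside the picture is one simple way to support the "retrieve packet information" step once a line has been found.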
Hough transform
● Voting procedure
– Points elect lines
– Polar coordinates: ρ = x · cos θ + y · sin θ
– Hough space
● Identifying a line means extracting a maximum in the Hough space
– Relative threshold
Figure: original picture and the corresponding Hough space
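The voting procedure on this slide can be sketched in a few lines of pure Python. This is a minimal sketch, not the authors' code: the function name `hough_lines`, the angular resolution `n_theta`, the integer rounding of ρ, and the default `rel_threshold` value are assumptions, and the slides' voting weight is fixed to one vote per point here.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rel_threshold=0.9):
    """Vote in (theta, rho) space and keep cells near the global maximum.

    Each point (x, y) votes for every sampled line
    rho = x*cos(theta) + y*sin(theta) passing through it;
    rho is rounded to the nearest integer bin (an assumed resolution).
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(k, rho)] += 1
    if not votes:
        return []
    peak = max(votes.values())
    # Relative threshold: keep accumulator cells whose vote count is at
    # least `rel_threshold` times the highest count in the Hough space.
    return [(math.pi * k / n_theta, rho, v)
            for (k, rho), v in votes.items()
            if v >= rel_threshold * peak]


# Ten aligned points on the horizontal line y = 3, plus one isolated point.
points = [(x, 3) for x in range(10)] + [(5, 7)]
lines = hough_lines(points)
```

Because neighbouring angles fall into the same rounded ρ bin, several nearly identical candidates can pass the threshold for one visible line; a real detector would typically merge them.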
Parameter space
● Hough parameters:
– Weight for the voting procedure
– Threshold to determine candidate lines
● Picture resolution:
– Time bin
– Size of pictures
Evaluation of parameter space
● Heuristics:
– suspected = false positive + unknown
● Probability of suspected = suspected / total anomalies
– Lower is better
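The metric above is simple arithmetic, and a short sketch shows how it could be used to compare parameter settings. The function name `prob_suspected`, the setting labels, and all counts below are hypothetical and purely illustrative; they are not results from the study.

```python
def prob_suspected(false_positive, unknown, total_anomalies):
    """suspected = false positive + unknown; return suspected / total anomalies."""
    if total_anomalies <= 0:
        raise ValueError("total_anomalies must be positive")
    return (false_positive + unknown) / total_anomalies


# Hypothetical counts for two parameter settings (illustrative, not measured).
settings = {
    "time_bin=1s": {"false_positive": 3, "unknown": 2, "total_anomalies": 20},
    "time_bin=10s": {"false_positive": 1, "unknown": 1, "total_anomalies": 20},
}

# Lower is better, so pick the setting with the smallest probability.
best = min(settings, key=lambda name: prob_suspected(**settings[name]))
```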
MAWI database
● Samplepoint-B:
– From 2001/01 to 2006/06
Case study: Sasser infection
● Gamma modeling vs. pattern recognition (2004/08/01)
● Gamma modeling-based method tuned to detect the same number of anomalies
– (Includes many false positives)
Figure: three panels (Gamma only, Both, Hough only); axes: destination IP, source port, port entropy, nb. pkt
Discussion
● Two different backgrounds
– 50% of their results in common
● Detection of anomalies involving a tiny number of packets
● Easy identification of network/port scans (dispersed distribution)
● Intensive use of source ports
● Gamma modeling = deeper analysis of the traffic's characteristics (highlights singular traffic)
Conclusion and future work
● No perfect method
● Combination of several methods
● Need for methods with different backgrounds
● Future work
– Auto-tuning of parameters
– Sampled data
– More graphical representations
– Study good combinations
Thank you
Any questions?
romain@nii.ac.jp
Comparison (2)
Figure: three panels (Gamma only, Hough only, Both)
Original data
Figure: axes: destination IP, source port, port entropy, volume