Multivariate Online Anomaly Detection Using Kernel Recursive Least Squares

Tarem Ahmed, Mark Coates and Anukool Lakhina*
tarem.ahmed@mail.mcgill.ca, coates@ece.mcgill.ca, anukool@cs.bu.edu
* Boston University

IEEE Infocom, Anchorage, AK, May 6-12, 2007

Research supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the Agile All-Photonics Research Network (AAPN).
Introduction

What is a network anomaly?
- Deviation from the normal trend of some traffic characteristic
- Short-lived, rare event
- May be deliberate or accidental, harmful or innocuous
- Examples: DoS attacks, viruses, large data transfers, equipment failures

[Plot: number of packets on the NYCM-CHIN link vs. timestep, showing a short-lived spike]

Objective: autonomously detect anomalies in real time in multivariate, network-wide data
Network Traffic Characteristics [Lakhina 05]

- Intrinsic low dimensionality
- High spatial correlation
- Enables use of Principal Component Analysis (PCA)

[Figure: Abilene weathermap. Source: Indiana University]
Existing Approach: PCA

- Determine PCs of the traffic flow timeseries
  - Assign the few highest PCs to the normal subspace
  - Assign the remaining PCs to the residual subspace
- Anomaly flagged when the magnitude of the projection onto the residual subspace exceeds a threshold
- Online PCA: project each new arrival onto past PCs
- Problems:
  - Covariance structure is not stationary
  - Too sensitive to the threshold
Background: The 'Kernel Trick'

- Mapping from input space into feature space H:

  φ : x_i ∈ R^d → φ(x_i) ∈ H

- A kernel computes the inner product of feature vectors, without explicit knowledge of the feature vectors themselves:

  k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩

- H is typically of much higher dimension than R^d
- Many algorithms rely only on inner products in H, and can hence employ the kernel trick
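To make the trick concrete, here is a minimal Python sketch (not from the slides) of two kernels evaluated directly on input vectors. The linear kernel is the one used in the experiments later in the talk; the Gaussian kernel's feature space is infinite-dimensional, yet the inner product costs only O(d):

```python
import numpy as np

def linear_kernel(x_i, x_j):
    """Linear kernel: phi is the identity, so k(x_i, x_j) = <x_i, x_j>."""
    return float(np.dot(x_i, x_j))

def gaussian_kernel(x_i, x_j, sigma=1.0):
    """Gaussian (RBF) kernel: inner product in an infinite-dimensional
    feature space H, computed without ever forming phi(x) explicitly."""
    return float(np.exp(-np.sum((x_i - x_j) ** 2) / (2.0 * sigma ** 2)))

# Two hypothetical 121-dimensional flow vectors (the dimension matches
# the backbone-flow data described later in the talk).
x, y = np.random.rand(121), np.random.rand(121)
print(linear_kernel(x, y), gaussian_kernel(x, y))
```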
Background: Kernel Recursive Least Squares (KRLS)

- Should be possible to describe the region of normality in feature space using a sparse dictionary D = {x̃_j}, j = 1, ..., m
- Feature vector φ(x_t) is said to be approximately linearly independent of {φ(x̃_j)}, j = 1, ..., m, if [Engel 04]:

  δ_t = min_a ‖ Σ_{j=1}^{m} a_j φ(x̃_j) − φ(x_t) ‖² > ν    (1)

  where ν is a threshold and the x̃_j are the dictionary members
- Using (1), recursively construct D = {x̃_1, x̃_2, ..., x̃_m} such that φ(D) approximately spans the feature space of the observed data (see the sketch below)
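By the kernel trick, the minimization in (1) has the closed form δ_t = k(x_t, x_t) − k̃_t^T K̃^{-1} k̃_t, where K̃ is the kernel matrix of the dictionary and k̃_t the vector of kernels between dictionary members and x_t. A minimal sketch follows; KRLS in [Engel 04] updates K̃^{-1} recursively, whereas for clarity this version solves the linear system afresh each time:

```python
import numpy as np

def ald_error(dictionary, x_t, kernel):
    """Approximate-linear-dependence error delta_t from (1):
    delta_t = k(x_t, x_t) - k_t^T K^{-1} k_t, with K the kernel matrix
    of the current dictionary and k_t its kernel vector with x_t."""
    if not dictionary:
        return float(kernel(x_t, x_t))
    K = np.array([[kernel(xi, xj) for xj in dictionary] for xi in dictionary])
    k_t = np.array([kernel(xi, x_t) for xi in dictionary])
    a = np.linalg.solve(K, k_t)     # optimal coefficients a in (1)
    return float(kernel(x_t, x_t) - k_t @ a)
```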
Kernel-based Online Anomaly Detection (KOAD): Key Idea

- δ_t: distance between the new sample and the span of the dictionary [Engel 04]
- Two thresholds, ν₁ < ν₂

[Figure: simplified 2-D depiction of the span of φ(D), with the region where ν₁ < δ < ν₂ and the region where δ > ν₂]
KOAD: The Algorithm

1. Set thresholds ν₁, ν₂
2. Evaluate the current measurement
3. Process the previous Orange alarm
4. Remove any obsolete dictionary element
1. Set thresholds ν₁, ν₂

- ν₂: upper threshold
  - Controls immediate flagging (Red1 alarms) of anomalies
- ν₁: lower threshold
  - Determines the dictionary that is built
- The thresholds are intertwined
  - Together they determine the dictionary and the space of normality
  - Should be made adaptive!
2. Evaluate current measurement

- At timestep t, with arriving input vector x_t:
  - Evaluate δ_t according to (1)
  - Compare with ν₁ and ν₂, where ν₁ < ν₂
- If δ_t > ν₂: infer x_t is far from normality: Red1
- If δ_t > ν₁: raise Orange; resolve l timesteps later, after the "usefulness" test
- If δ_t < ν₁: infer x_t is close to normality: Green

(The decision logic is sketched below.)
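A minimal sketch of the resulting three-way decision (function and return names are illustrative, not from the paper):

```python
def koad_classify(delta_t, nu1, nu2):
    """Three-way KOAD decision on the ALD error delta_t (nu1 < nu2)."""
    if delta_t > nu2:
        return "Red1"    # far from normality: flag immediately
    elif delta_t > nu1:
        return "Orange"  # ambiguous: resolve l timesteps later
    else:
        return "Green"   # close to normality: no alarm
```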
3. Resolving an Orange alarm

- An Orange alarm may represent:
  - a migration or expansion of the region of normality: Green
  - an isolated incident: Red2
- Track the contribution of x_t in explaining the l subsequent arrivals:
  - kernel of x_t with {x_i}, i = t+1, ..., t+l
  - perform the secondary "Usefulness Test"
3. The "Usefulness Test"

- Define a closeness threshold d
- A high kernel value of x_t with x_i, i = t+1, ..., t+l, implies x_i is close to x_t
- If most (a fraction ε) of the l subsequent kernel values are high, then x_t is useful as a dictionary member (see the sketch below)
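A minimal sketch of the test, assuming the l subsequent measurements have already been collected (variable names are hypothetical):

```python
def usefulness_test(x_orange, subsequent, kernel, d, eps):
    """Resolve an Orange alarm on x_orange after observing the next l
    arrivals: if at least a fraction eps of them have kernel value with
    x_orange above the closeness threshold d, then x_orange helps
    explain normal traffic (keep it: Green); otherwise it was an
    isolated incident (Red2)."""
    close = sum(1 for x in subsequent if kernel(x_orange, x) > d)
    return "Green" if close >= eps * len(subsequent) else "Red2"
```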
4. Remove any obsolete dictionary element

- Test whether the kernel of arriving x_t with any dictionary member remains consistently low
- If so, that dictionary element is obsolete and must be deleted
- Dropping involves dimensionality reduction
  - Different from downdating
  - A difficult problem
- KOAD also incorporates exponential forgetting
  - The impact of past observations is gradually reduced (see the sketch below)
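One way to realize the obsolescence test with exponential forgetting is to keep, per dictionary element, a forgetting average of its kernel value with arriving traffic; elements whose score stays low become candidates for dropping. A sketch under that assumption (the forgetting factor lam and drop threshold are hypothetical, and the actual deletion additionally requires the dimensionality-reduction step mentioned above):

```python
import numpy as np

def update_usage(usage, dictionary, x_t, kernel, lam=0.99):
    """Exponentially forgetting average of each dictionary element's
    kernel value with the arriving measurement x_t."""
    k_t = np.array([kernel(xi, x_t) for xi in dictionary])
    return lam * usage + (1.0 - lam) * k_t

# Elements with consistently low usage are obsolete:
# obsolete = np.where(usage < drop_threshold)[0]
```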
Relationship with MVS

- The region of normality should correspond to a Minimum Volume Set (MVS)
- One-Class Neighbor Machine (OCNM) for estimating the MVS proposed in [Muñoz 06]
  - Requires choice of a sparsity measure g; example: the k-th nearest-neighbour distance
  - Identifies the fraction µ of points inside the MVS (see the sketch below)

[Figure: 2-D isomap of the number of packets in the NYCM-CHIN backbone flow, with normal and anomalous points marked]
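A minimal sketch of OCNM with the k-th nearest-neighbour distance as the sparsity measure g, assuming a batch of samples X (2% outliers, as in the experiments, corresponds to mu = 0.98; the brute-force distance matrix is for clarity only):

```python
import numpy as np

def ocnm_knn(X, k=10, mu=0.98):
    """Flag outliers: points whose k-th nearest-neighbour distance g
    exceeds the mu-quantile of g lie outside the minimum volume set."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise
    g = np.sort(D, axis=1)[:, k]       # column 0 is the self-distance 0
    return g > np.quantile(g, mu)      # True => flagged as outlier

# Example: 500 hypothetical samples of 121 backbone-flow measurements
flags = ocnm_knn(np.random.rand(500, 121))
```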
Experimental Data

- Abilene backbone network
- Statistics collected at 11 backbone routers
- IP space mapped to 121 backbone flows
- Obtain timeseries of backbone flow metrics:
  - number of packets
  - number of bytes
  - number of individual IP flows
Experimental Setup

- KOAD
  - x_t = flow vector (number of packets, bytes or individual IP flows in each backbone flow during interval t)
  - Linear kernel
- PCA
  - 4 PCs assigned to the normal subspace
- OCNM
  - 2% outliers
- Code and instructions for replicating our experiments available at [WebPage]
Results: Comparing Algorithms

[Plot: three stacked panels vs. timestep (500-2000), on log scales: KOAD δ_t, magnitude of the PCA residual projection, and the OCNM Euclidean distance measure]
Results: Comparing Dictionary Elements

[Plot: kernel values of arriving traffic with three dictionary elements (normal, obsolete, anomaly) vs. timestep (1000-2000)]
Results: Long-lived "Anomalies"

[Plot: number of IP flows (×10⁴) and KOAD δ_t vs. timestep (850-1200), showing a long-lived event]
Results: PCA Missed Detections

[Plot: number of IP flows and magnitude of the PCA residual projection vs. timestep (800-1800), showing anomalies that PCA fails to flag]
Conclusions

- Anomaly detection is an important problem
- Proposed KOAD is equally effective as PCA
  - Faster time-to-detection (minutes vs. hours)
- Complexity:
  - KOAD: O(m²) in general, O(m³) when dropping occurs
  - PCA: O(tR²) with R PCs
Work-In-Progress

- Combinations of PCA, OCNM and KOAD
- Supervised learning; adaptively set the parameters ν₁, ν₂
- Distributed versions; incremental OCNM
- Other applications: traffic incident detection [Ahmed 07]
References

[WebPage] T. Ahmed and M. Coates, "Online sequential diagnosis of network anomalies," project description. [Online]. Available: http://www.tsp.ece.mcgill.ca/Networks/projects/projdesc-monit-tarem.html

[Ahmed 07] T. Ahmed, B. Oreshkin and M. Coates, "Machine learning approaches to network anomaly detection," in Proc. USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), Cambridge, MA, Apr. 2007.

[Engel 04] Y. Engel, S. Mannor and R. Meir, "The kernel recursive least-squares algorithm," IEEE Trans. Signal Processing, vol. 52, no. 8, pp. 2275-2285, Aug. 2004.

[Lakhina 05] A. Lakhina, M. Crovella and C. Diot, "Mining anomalies using traffic feature distributions," in Proc. ACM SIGCOMM, Philadelphia, PA, Aug. 2005.

[Muñoz 06] A. Muñoz and J. Moguerza, "Estimation of high density regions using one-class neighbor machines," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 476-480, Mar. 2006.