Nonparametric Sequential Change Detection for High-Dimensional Problems
Yasin Yılmaz
Electrical Engineering, University of South Florida
Allerton 2017
Outline
1 Introduction
2 Background
3 ODIT: Online Discrepancy Test
4 Numerical Results
5 Conclusion
Introduction
Anomaly Detection
Objective: identify patterns that deviate from a nominal behavior
Applications: cybersecurity, quality control, fraud detection, fault detection, health care, ...
In the literature, statistical outlier detection is typically equated with anomaly detection. However, an outlier could be a nominal tail event or a real anomalous event (e.g., a mean shift).
[Figure: nominal density f_0(x) vs. anomalous density f_1(x) with a shifted mean]
Problem Formulation
Instead of anomaly = outlier, also consider the temporal dimension.
Proposed model: anomaly = persistent outliers
Objective: timely and accurate detection of anomalies in high-dimensional datasets
Approach: sequential & nonparametric anomaly detection
[Figure: a nominal signal x(t) with a single outlier vs. a signal with persistent outliers after t=10 (each anomalous sample occurring with prob. 0.2)]
Motivating Facts: IoT Security, Smart Grid, ...
IoT devices: 8.4B in 2017, expected to hit 20B by 2020 [1]
IoT systems: highly vulnerable; scalable security solutions are needed [2]
Mirai IoT botnet: largest recorded DDoS attack, with at least 1.1 Tbps bandwidth (Oct. 2016) [2]
Persirai IoT botnet targets at least 120,000 IP cameras (May 2017) [3]
A plausible cyberattack against the US grid could leave 100M people without power, with up to $1 trillion of monetary loss [4]
[1] R. Minerva, A. Biru, and D. Rotondi, "Towards a definition of the Internet of Things (IoT)," IEEE Internet Initiative, no. 1, 2015.
[2] E. Bertino and N. Islam, "Botnets and Internet of Things Security," Computer, vol. 50, no. 2, pp. 76-79, Feb. 2017.
[3] Trend Micro, "Persirai: New Internet of Things (IoT) Botnet Targets IP Cameras," May 9, 2017, available online.
[4] Trevor Maynard and Nick Beecroft, "Business Blackout," Lloyd's Emerging Risk Report, p. 60, May 2015.
Motivating Facts: IoT Security, Smart Grid, ... (cont.)
Challenges:
Unknown anomalous distribution: parametric methods, as well as signature-based methods (e.g., antivirus), are not feasible
High-dimensional problems: even the nominal distribution is difficult to know
Hence, nonparametric methods are needed, and timely and accurate detection is critical.
Background
Sequential Change Detection - CUSUM
Lorden's minimax formulation:
\[ \inf_T \sup_\tau \operatorname{ess\,sup}_{\{x_1,\dots,x_\tau\}} \mathbb{E}_\tau[T - \tau \mid T \ge \tau] \quad \text{s.t.} \quad \mathbb{E}_\infty[T] \ge \beta \]
CUSUM recursion and stopping rule:
\[ W_t = \max\left\{ W_{t-1} + \log \frac{f_1(x_t)}{f_0(x_t)},\ 0 \right\}, \qquad T = \min\{t : W_t \ge h\} \]
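The CUSUM recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the Gaussian pre- and post-change densities, the threshold h, and the change point are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def cusum(xs, f0, f1, h):
    """CUSUM: W_t = max(W_{t-1} + log(f1(x_t)/f0(x_t)), 0);
    return the first time t (1-indexed) with W_t >= h, or None."""
    w = 0.0
    for t, x in enumerate(xs, start=1):
        w = max(w + np.log(f1(x) / f0(x)), 0.0)
        if w >= h:
            return t
    return None

# Illustrative example: mean shift from N(0,1) to N(2,1) at tau = 50.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 1, 50), rng.normal(2, 1, 50)])
t_alarm = cusum(xs, norm(0, 1).pdf, norm(2, 1).pdf, h=5.0)
```

Before the change the log-likelihood ratio has negative drift, so W_t hovers near zero; after the change the drift turns positive and the statistic climbs to the threshold, yielding an alarm shortly after the change point.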
Statistical Outlier Detection
Needs a statistical description f_0 of the nominal (e.g., no-attack) behavior (baseline)
Determines instances that significantly deviate from the baseline
With f_0 completely known, x is an outlier if \( \int_x^\infty f_0(y)\,dy < \alpha \) (p-value)
Equivalently, x is an outlier if \( x \notin \Omega_\alpha \), the most compact set of data points under f_0 (minimum volume set):
\[ \Omega_\alpha = \arg\min_A \int_A dy \quad \text{subject to} \quad \int_A f_0(y)\,dy \ge 1 - \alpha \]
Uniformly most powerful test when the anomalous distribution is a linear mixture of f_0 and the uniform distribution
Coincides with the minimum entropy set, which minimizes the Rényi entropy while satisfying the same false alarm constraint
[Figure: minimum volume set of a unimodal nominal density f_0(x)]
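The p-value rule above is easy to state in code when f_0 is completely known. A minimal sketch, assuming a standard normal nominal density (an illustrative choice, not from the talk):

```python
from scipy.stats import norm

def is_outlier(x, alpha=0.05):
    """Right-tail p-value test under a fully known nominal f0 = N(0,1):
    declare x an outlier if the tail mass beyond x is below alpha."""
    p_value = norm.sf(x)  # survival function: integral of f0 from x to infinity
    return p_value < alpha

print(is_outlier(3.0))  # True: deep in the tail
print(is_outlier(0.5))  # False: well inside the bulk of f0
```

The slide's point is precisely that this per-sample test cannot distinguish a nominal tail event from a real anomaly; the temporal aggregation introduced later addresses that.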
Geometric Entropy Minimization (GEM)
High-dimensional datasets: even if f_0 is known, it is very computationally expensive (if not impossible) to determine \Omega_\alpha
Various methods exist for learning \Omega_\alpha
GEM is very effective with high-dimensional datasets while asymptotically achieving \Omega_\alpha for \( \lim_{K,N \to \infty} K/N \to 1 - \alpha \)
Training: randomly partition the training set into two parts \( X^{N_1} \) and \( X^{N_2} \), and form the K-kNN graph [5]:
\[ \bar{X}^{N_1}_K = \arg\min_{X^{N_1}_K} L_k(X^{N_1}_K, X^{N_2}), \qquad L_k(X^{N_1}_K, X^{N_2}) = \sum_{i=1}^{K} \sum_{l=k^*}^{k} |e_i(l)|^\gamma \]
Test: a new point \( x_t \in \mathbb{R}^d \) is an outlier if \( x_t \notin \bar{X}^{N_1+1}_{K} \), equivalently if \( L_t = \sum_{l=k^*}^{k} |e_t(l)|^\gamma > L_{(K)} \)
[Figure: K-kNN graph between two training partitions, with a test point]
[5] A. O. Hero III, "Geometric entropy minimization (GEM) for anomaly detection and localization," NIPS, pp. 585-592, 2006.
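The GEM training and test steps above can be sketched as follows. This is a simplified illustration under assumptions not in the slides: k* = 1 (sum over all k nearest-neighbor edges), gamma = 1, Euclidean distances, and a brute-force distance computation rather than an efficient graph construction.

```python
import numpy as np

def gem_train(X1, X2, k=4, alpha=0.05, gamma=1.0):
    """Simplified GEM training: score each point in partition X1 by the sum
    of gamma-powered distances to its k nearest neighbors in partition X2,
    keep the (1 - alpha) fraction with the smallest scores, and return the
    baseline threshold L_(K) = largest retained score."""
    d = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, :k]        # k smallest distances per point
    scores = (knn ** gamma).sum(axis=1)    # L_i for each training point
    K = int(np.ceil((1 - alpha) * len(X1)))
    return np.sort(scores)[K - 1]          # L_(K)

def gem_test(x, X2, L_K, k=4, gamma=1.0):
    """Declare x an outlier if its kNN statistic L_t exceeds L_(K)."""
    d = np.sort(np.linalg.norm(X2 - x, axis=1))[:k]
    return (d ** gamma).sum() > L_K

# Illustrative data: two nominal partitions drawn from N(0, I) in 2D.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(200, 2))
X2 = rng.normal(size=(200, 2))
L_K = gem_train(X1, X2)
```

A far-away test point such as (8, 8) has large kNN distances and is flagged, while a point near the origin falls inside the learned minimum-volume set.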
ODIT: Online Discrepancy Test
Online Discrepancy Test (ODIT)
GEM lacks the temporal aspect
In GEM, x_t is an outlier if \( L_t = \sum_{l=k^*}^{k} |e_t(l)|^\gamma > L_{(K)} \)
In ODIT, \( D_t = L_t - L_{(K)} \) is treated as positive/negative evidence for anomaly
D_t approximates the log-likelihood ratio \( \ell_t = \log \frac{p(r(x_t) \mid H_1)}{p(r(x_t) \mid H_0)} \) between H_1, claiming x_t is anomalous, and H_0, claiming x_t is nominal
Assuming independence, \( \sum_{t=1}^T D_t \) gives the aggregate anomaly evidence up to time T (as does \( \sum_{t=1}^T \ell_t \), the sufficient statistic for optimum detection)
Similar to CUSUM (the optimum minimax sequential change detector), ODIT decides using
\[ T_d = \min\{t : s_t \ge h\}, \qquad s_t = \max\{s_{t-1} + D_t, 0\} \]
[Figure: ODIT statistic s_t crossing the detection threshold h]
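The ODIT decision rule above mirrors the CUSUM recursion, with the discrepancy D_t = L_t - L_(K) replacing the log-likelihood ratio. A minimal sketch, with an illustrative evidence sequence (negative drift before the change, positive after) standing in for real kNN statistics:

```python
def odit(D, h):
    """ODIT recursion on the evidence sequence D_t = L_t - L_(K):
    s_t = max(s_{t-1} + D_t, 0); alarm at T_d = min{t : s_t >= h}."""
    s = 0.0
    for t, d in enumerate(D, start=1):
        s = max(s + d, 0.0)
        if s >= h:
            return t
    return None

# Negative evidence for 20 samples, then persistent positive evidence.
D = [-0.1] * 20 + [0.3] * 20
print(odit(D, h=2.0))  # 27
```

Before the change, the max-with-zero clamp keeps s_t at zero despite the negative evidence; after the change, s_t accumulates the positive evidence 0.3 per step and crosses h = 2.0 seven samples later, at t = 27.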
Theoretical Justification - Asymptotics
Asymptotic Optimality - Scalarized Problem
As the training set grows (\( N_2 \to \infty \)), ODIT is asymptotically optimum for
\[ H_0: r(x_t) \sim f^k_0, \ \forall t \qquad H_1: r(x_t) \sim f^k_0, \ t < \tau, \quad \text{and} \quad r(x_t) \sim f^k_{\text{uni}}, \ t \ge \tau \]
Assumptions: {x_t} independent; r(x_t) is the kNN distance; f_0(x_t) > 0 is Lebesgue continuous
Here \( f^k_0 \) and \( f^k_{\text{uni}} \) are the distributions of the kNN distance under f_0 and under the uniform distribution on a d-dimensional grid with spacing \( r_\alpha \), where \( \int_{r_\alpha}^\infty f^k_0(r)\,dr = \alpha \)
Sketch of the Proof
For independent {x_t}, a continuous f_0 > 0 defines a non-homogeneous Poisson point process with continuous rate \( \lambda(x) > 0 \).
Obtain a homogeneous Poisson point process with rate k by defining a d-dimensional non-homogeneous grid with cell volume \( k/\lambda(x) \) [6].
For this homogeneous Poisson point process, the nearest-neighbor distance distribution is given by
\[ D_x(r^d) = k\,d\,v_d(x, r)\,e^{-k v_d(x, r)}\,dr^d \]
Under H_0, \( r(x_t) = r_t \) comes from \( f^k_0 \), which can be computed using the training set as L_t. Under H_1, \( r(x_t) = r_\alpha \) comes from \( f^k_{\text{uni}} \), which has a single atom at \( r_\alpha \), computed as \( L_{(K)} \).
As the training set grows, \( L_t \to r_t \) and \( L_{(K)} \to r_\alpha \).
The optimum CUSUM test computes \( \log \frac{D_x(r_\alpha)}{D_x(r_t)} = kc(r^d_t - r^d_\alpha) \)
[6] Robert Gallager, 6.262 Discrete Stochastic Processes, Chapter 2, Spring 2011, Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.