A Game-Theoretic Approach for Alert Prioritization
Aron Laszka, Yevgeniy Vorobeychik, Daniel Fabbri, Chao Yan, Bradley Malin
Intrusion Detection
• Detection and mitigation of cyber-attacks is of crucial importance; however, attackers try to stay stealthy
• Intrusion Detection Systems (IDS)
  • generate alerts when they encounter suspicious activity
  • in order to be able to detect novel attacks, they must also generate a large number of false alerts
[Figure: an IDS produces attack alerts and false alerts; number of alerts ≫ alert investigation budget B (available manpower, …)]
Problem: Which alerts to investigate?
Alert Prioritization
[Figure: alerts awaiting prioritization]
Alert Types
• Alert types $T$
  • for example, matching different rules in an intrusion detection system (e.g., Snort)
  • before investigating them, alerts of the same type appear equally important
  • cumulative distribution $F_t$ of the number of false alerts of type $t$ is known
• Attacks $A$
  • for example, targeting certain machines or using certain exploitation techniques
  • impact of attack $a$ is $L_a$
  • probability of attack $a$ raising an alert of type $t$ is $R_{a,t}$
[Figure: alerts grouped into alert types $t_1$, $t_2$, $t_3$, $t_4$]
Alert Types
[Figure: attacks $a_1$ and $a_2$ raise alerts of types $t_1$, $t_2$, $t_3$, $t_4$ with probability $R_{a,t}$]
Alert Prioritization Problem
• Naïve prioritization
[Figure: alert types $t_1$, $t_2$, $t_3$, $t_4$; labels "investigate (using budget $B$)" and "attack"]
Alert Prioritization Problem
• Random choice
[Figure: orderings $o_1$, $o_2$, $o_3$, … of the alert types $t_1$, $t_2$, $t_3$, $t_4$]
Problem: What is the optimal probability distribution?
Game-Theoretic Model
• Players
  1. Defender: selects an alert prioritization strategy $p$, which is a probability distribution over the set $O$ of possible orderings of $T$
  2. Adversary: selects an attack $a$ from the set of possible attacks $A$
• Supposing that the defender uses ordering $o \in O$
  • probability of investigating the $k$-th type $o_k$ (before exhausting budget $B$) is
    $$\mathrm{PI}(o, k) = \sum_{n \,:\, C_{o_k} + \sum_{i=1}^{k} n_i \cdot C_{o_i} \le B} \left[ F^*_{o_k}(n_k) - F^*_{o_k}(n_k - 1) \right] \cdot \prod_{i=1}^{k-1} \left( F_{o_i}(n_i) - F_{o_i}(n_i - 1) \right)$$
  • probability of investigating attack $a$ (before exhausting budget $B$) is
    $$\mathrm{PD}(o, a) = \sum_{\hat{T} \subseteq T} \; \prod_{t \in \hat{T}} R_{a,t} \prod_{t \in T \setminus \hat{T}} (1 - R_{a,t}) \cdot \mathrm{PI}\left(o, \min\{\, i \mid o_i \in \hat{T} \,\}\right)$$
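The subset sum for $\mathrm{PD}(o, a)$ can be evaluated directly on small instances. The sketch below is only an illustration under stated assumptions: the function name `detection_probability`, the dictionary `R_a`, and the precomputed list `PI` (with `PI[k]` standing for $\mathrm{PI}(o, k+1)$) are hypothetical names, not part of the slides. The empty subset is skipped on the assumption that an attack raising no alert cannot be investigated.

```python
from itertools import combinations


def detection_probability(order, R_a, PI):
    """Evaluate PD(o, a) by enumerating every non-empty subset T_hat of alert types.

    order: list of alert types, i.e. the ordering o (hypothetical representation)
    R_a:   dict mapping each type t to R_{a,t}
    PI:    list where PI[k] is the probability of investigating the k-th type (0-based)
    """
    position = {t: k for k, t in enumerate(order)}
    total = 0.0
    for size in range(1, len(order) + 1):
        for T_hat in combinations(order, size):
            raised = set(T_hat)
            # Probability that the attack raises alerts of exactly the types in T_hat.
            prob = 1.0
            for t in order:
                prob *= R_a[t] if t in raised else 1.0 - R_a[t]
            # Detection happens through the earliest raised type in the ordering.
            first = min(position[t] for t in raised)
            total += prob * PI[first]
    return total


example = detection_probability(["t1", "t2"], {"t1": 0.5, "t2": 0.5}, [1.0, 0.5])
```

The loop visits $2^{|T|} - 1$ subsets, which is why this direct evaluation only suits very small sets of alert types.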
Optimal Alert Prioritization
• Adversary’s gain and defender’s loss
  • adversary’s expected gain:
    $$\mathrm{EG}(p, a) = \sum_{o \in O} p_o \cdot (1 - \mathrm{PD}(o, a)) \cdot G_a - K_a$$
  • defender’s expected loss:
    $$\mathrm{EL}(p, a) = \sum_{o \in O} p_o \cdot (1 - \mathrm{PD}(o, a)) \cdot L_a$$
• Solution concept: strong Stackelberg equilibrium
  • adversary’s best responses:
    $$\mathrm{BR}(p) = \operatorname*{argmax}_{a \in A} \mathrm{EG}(p, a)$$
  • optimal prioritization strategy:
    $$\min_{p,\; a \in \mathrm{BR}(p)} \mathrm{EL}(p, a)$$
Challenge: finding an optimal probability distribution over a set of exponential size!
Theorem: Finding an optimal alert prioritization strategy is an NP-hard problem.
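The gain, loss, and best-response definitions translate almost line for line into code. The following is a minimal sketch, assuming the detection probabilities are already available in a dictionary `PD` keyed by `(ordering, attack)` and that a strategy `p` is a dictionary from orderings (tuples) to probabilities. All function and variable names are hypothetical. Ties among best responses are broken in the defender's favor, matching the $\min$ over $a \in \mathrm{BR}(p)$ above.

```python
def undetected_probability(p, PD, a):
    """Probability that attack a is not investigated under mixed strategy p."""
    return sum(prob * (1.0 - PD[(o, a)]) for o, prob in p.items())


def expected_gain(p, PD, a, G, K):
    """EG(p, a): adversary's expected gain."""
    return undetected_probability(p, PD, a) * G[a] - K[a]


def expected_loss(p, PD, a, L):
    """EL(p, a): defender's expected loss."""
    return undetected_probability(p, PD, a) * L[a]


def best_responses(p, PD, attacks, G, K, tol=1e-9):
    """BR(p): attacks whose expected gain is maximal (up to a numerical tolerance)."""
    gains = {a: expected_gain(p, PD, a, G, K) for a in attacks}
    best = max(gains.values())
    return [a for a in attacks if gains[a] >= best - tol]


def defender_loss(p, PD, attacks, G, K, L):
    """Loss under the best response that is most favorable to the defender."""
    return min(expected_loss(p, PD, a, L) for a in best_responses(p, PD, attacks, G, K))


# Toy numbers, invented purely for illustration.
o1, o2 = ("t1", "t2"), ("t2", "t1")
p = {o1: 0.5, o2: 0.5}
PD = {(o1, "a1"): 0.5, (o2, "a1"): 0.5, (o1, "a2"): 0.0, (o2, "a2"): 1.0}
G = {"a1": 1.0, "a2": 2.0}
K = {"a1": 0.0, "a2": 0.25}
L = {"a1": 1.0, "a2": 1.0}
loss = defender_loss(p, PD, ["a1", "a2"], G, K, L)
```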
Computing Detection Probabilities
• Probability of detecting an attack
  $$\mathrm{PI}(o, k) = \sum_{n \,:\, C_{o_k} + \sum_{i=1}^{k} n_i \cdot C_{o_i} \le B} \left[ F^*_{o_k}(n_k) - F^*_{o_k}(n_k - 1) \right] \cdot \prod_{i=1}^{k-1} \left( F_{o_i}(n_i) - F_{o_i}(n_i - 1) \right)$$
  $$\mathrm{PD}(o, a) = \sum_{\hat{T} \subseteq T} \; \prod_{t \in \hat{T}} R_{a,t} \prod_{t \in T \setminus \hat{T}} (1 - R_{a,t}) \cdot \mathrm{PI}\left(o, \min\{\, i \mid o_i \in \hat{T} \,\}\right)$$
  (the sum over $\hat{T} \subseteq T$ has an exponential number of terms)
• Dynamic programming algorithm
Algorithm 1 Computing $\mathrm{PD}(o, a)$
Input: prioritization game, prioritization $o$, attack $a$
for $b = 0, 1, \ldots, B$ do
    $\mathrm{PD}(o, a, |T|, b) \leftarrow R_{a,o_{|T|}} \cdot F^*_{o_{|T|}}\left(\lfloor b / C_{o_{|T|}} \rfloor - 1\right)$
end for
for $i = |T| - 1, \ldots, 2, 1$ do
    for $b = 0, 1, \ldots, B$ do
        $\mathrm{PD}(o, a, i, b) \leftarrow R_{a,o_i} \cdot F^*_{o_i}\left(\lfloor b / C_{o_i} \rfloor - 1\right) + (1 - R_{a,o_i}) \sum_{j=0}^{\lfloor b / C_{o_i} \rfloor} \left( F_{o_i}(j) - F_{o_i}(j - 1) \right) \cdot \mathrm{PD}(o, a, i + 1, b - j \cdot C_{o_i})$
    end for
end for
Return $\mathrm{PD}(o, a) := \mathrm{PD}(o, a, 1, B)$
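Algorithm 1 can be sketched as a bottom-up table fill. The code below is an illustration rather than the authors' implementation: `F` and `F_star` are assumed to be dictionaries of callables giving $F_t(n)$ and $F^*_t(n)$ for $n \ge 0$, costs `C[t]` are assumed to be positive integers, and `B` a non-negative integer. Negative arguments are treated as probability 0. An extra all-zero row after the last type reproduces the base case for $i = |T|$, because the second term of the recursion then vanishes.

```python
def detection_probability_dp(order, R_a, C, F, F_star, B):
    """Dynamic program for PD(o, a); table[i][b] covers types order[i:] with budget b."""

    def cdf(f, n):
        # Distributions are only defined for n >= 0; below that the probability is 0.
        return f(n) if n >= 0 else 0.0

    m = len(order)
    table = [[0.0] * (B + 1) for _ in range(m + 1)]  # row m is the all-zero sentinel
    for i in range(m - 1, -1, -1):
        t = order[i]
        for b in range(B + 1):
            n_max = b // C[t]  # how many alerts of type t fit into budget b
            # Case 1: the attack raises an alert of type t.
            hit = R_a[t] * cdf(F_star[t], n_max - 1)
            # Case 2: it does not; j false alerts of type t consume j * C[t] budget.
            miss = 0.0
            for j in range(n_max + 1):
                pmf = cdf(F[t], j) - cdf(F[t], j - 1)
                miss += pmf * table[i + 1][b - j * C[t]]
            table[i][b] = hit + (1.0 - R_a[t]) * miss
    return table[0][B]


# Toy instance with deterministic false-alert counts (invented for illustration):
# type t1 always has exactly one false alert, type t2 has none.
def step(threshold):
    return lambda n: 1.0 if n >= threshold else 0.0


F = {"t1": step(1), "t2": step(0)}
F_star = {"t1": step(1), "t2": step(0)}  # placeholder choice, not the slides' definition
R_a = {"t1": 0.5, "t2": 1.0}
C = {"t1": 1, "t2": 1}
pd_budget_2 = detection_probability_dp(["t1", "t2"], R_a, C, F, F_star, 2)
pd_budget_1 = detection_probability_dp(["t1", "t2"], R_a, C, F, F_star, 1)
```

With unit costs the table has $|T| \cdot (B + 1)$ entries and each entry sums over at most $B + 1$ terms, which avoids the exponential subset enumeration.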
Finding an Optimal Alert Prioritization Strategy
• Linear-programming based formulation
  • for each attack $a \in A$, solve
    $$\max_{p} \sum_{o \in O} p_o \cdot \mathrm{PD}(o, a)$$
    subject to
    $$\forall a' \in A: \; \sum_{o \in O} p_o \cdot D(o, a') \ge \Delta(K_{a'}), \qquad \sum_{o \in O} p_o = 1, \qquad p \ge 0,$$
    where $D(o, a') = (1 - \mathrm{PD}(o, a)) \, G_a - (1 - \mathrm{PD}(o, a')) \, G_{a'}$ and $\Delta(K_{a'}) = K_a - K_{a'}$; the first constraint states that $a$ is a best response to $p$
  • the program has one variable per ordering $o \in O$, so an exponential number of variables
  • output the solution that attains the lowest loss
Problem: Finding an improving column (i.e., ordering) is an NP-hard problem.
• Polynomial-time column generation approach
Algorithm 2 Greedy Column Generation
Input: prioritization game, reduced cost function $\bar{c}$
$o \leftarrow \emptyset$
while $\exists\, t \in T \setminus o$ do
    $o \leftarrow o + \operatorname*{argmax}_{t \in T \setminus o} \bar{c}(o + t)$
end while
Return $o$
where $\bar{c}(o) = \mathrm{PD}(o, a) + \sum_{a' \in A} y(\bar{O}, a') \cdot D(o, a')$ (i.e., reduced cost function)
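The greedy loop of Algorithm 2 is short enough to sketch in full. The version below assumes that the reduced cost function is supplied as a callable that can score partial orderings; the name `greedy_ordering` and the toy scoring function are hypothetical and stand in for $\bar{c}$, which in the actual approach depends on $\mathrm{PD}$ and the dual values $y$.

```python
def greedy_ordering(types, reduced_cost):
    """Build an ordering by repeatedly appending the type with the best reduced cost.

    types:        iterable of alert types
    reduced_cost: callable taking a (partial) ordering as a list and returning a number
    """
    order = []
    remaining = list(types)
    while remaining:
        # Score every one-step extension of the current partial ordering.
        best = max(remaining, key=lambda t: reduced_cost(order + [t]))
        order.append(best)
        remaining.remove(best)
    return order


# Toy stand-in for the reduced cost: earlier positions count more (illustration only).
weights = {"a": 1.0, "b": 3.0, "c": 2.0}


def toy_reduced_cost(partial):
    return sum(weights[t] / (k + 1) for k, t in enumerate(partial))


column = greedy_ordering(["a", "b", "c"], toy_reduced_cost)
```

The loop makes $|T|$ passes and evaluates at most $|T|$ candidate extensions per pass, so it needs $O(|T|^2)$ reduced-cost evaluations instead of a search over all $|T|!$ orderings. It offers no optimality guarantee for the column it returns.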
Numerical Results - Synthetic Dataset
[Figure, panel "Defender’s Loss": defender’s expected loss vs. number of attack and alert types (2 to 7), for Optimal and Greedy Column Generation]
[Figure, panel "Running Time": running time [s] (logarithmic axis) vs. number of attack and alert types (2 to 7), for Optimal and Greedy Column Generation]
$K_a = 0$, $C_t = 1$, $B = 5|T|$; $D_a$ and $G_a$ were drawn at random from $[0.5, 1]$; each $R_{a,t}$ is either 0 (with probability 1/3) or drawn at random from $[0, 1]$; and every $F_t$ has a Poisson distribution whose mean is drawn at random from $[5, 15]$.
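An instance with these parameters could be generated along the following lines. This is a sketch under assumptions: "drawn at random" is read as uniform sampling, the number of attacks is taken equal to the number of alert types (as the shared axis label suggests), and the dictionary layout and function names are invented for illustration.

```python
import math
import random


def poisson_cdf(mean, n):
    """P[X <= n] for X ~ Poisson(mean); 0 for negative n."""
    if n < 0:
        return 0.0
    term = math.exp(-mean)  # P[X = 0]
    total = term
    for k in range(1, n + 1):
        term *= mean / k  # P[X = k] from P[X = k - 1]
        total += term
    return min(total, 1.0)


def random_instance(n, seed=None):
    """Draw a synthetic game with n alert types and n attacks (assumed equal counts)."""
    rng = random.Random(seed)
    types = [f"t{i + 1}" for i in range(n)]
    attacks = [f"a{i + 1}" for i in range(n)]
    return {
        "types": types,
        "attacks": attacks,
        "K": {a: 0.0 for a in attacks},                   # K_a = 0
        "C": {t: 1 for t in types},                       # C_t = 1
        "B": 5 * n,                                       # B = 5|T|
        "D": {a: rng.uniform(0.5, 1.0) for a in attacks}, # D_a as named on the slide
        "G": {a: rng.uniform(0.5, 1.0) for a in attacks},
        # R_{a,t} is 0 with probability 1/3, otherwise uniform on [0, 1].
        "R": {
            (a, t): 0.0 if rng.random() < 1.0 / 3.0 else rng.uniform(0.0, 1.0)
            for a in attacks
            for t in types
        },
        # Mean of the Poisson false-alert distribution F_t.
        "mean": {t: rng.uniform(5.0, 15.0) for t in types},
    }


instance = random_instance(4, seed=0)
F_t1 = lambda n: poisson_cdf(instance["mean"]["t1"], n)  # F_t for type t1
```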
Real-World Dataset: Electronic Medical Record System Alerts
• Access logs from the electronic medical record (EMR) system in place at Vanderbilt University Medical Center
  • integrated with human-resources data to document medical department affiliation, employment information, and home addresses
[Figure: employees accessing patient records 1, 2, 3, …; Explanation Based Auditing System [1]; alert types $T$: 1. same surname, 2. coworkers, 3. home within 0.25 miles, 4. …, 5. …, 6. …; attacks $A$ ~ potential misuses]
[1] Fabbri, D., and LeFevre, K. 2013. Explaining accesses to electronic medical records using diagnosis information. Journal of the American Medical Informatics Association 20(1):52–60.
Numerical Results - Real-World Dataset
• Data collected from five consecutive weeks of access logs from 2016
• 8,481,767 accesses made by 14,531 users to 161,426 patient records, leading to a total of 863,989 alerts
• Approximated the distributions of false alerts using Poisson distributions
• In order to find optimal strategies, we restricted the alerts to 12 randomly selected patients
[Figure, panel "Defender’s Loss": defender’s expected loss (0 to 1) vs. defender’s budget (0 to 1.2 · 10^4), for Optimal and Greedy Column Generation]
Conclusion & Future Work
• Prioritization of alerts is of crucial importance to the effectiveness of intrusion and misuse detection
• Result highlights
  • introduced first model of alert prioritization against strategic adversaries
  • showed that finding an optimal prioritization strategy is NP-hard
  • proposed an efficient column-generation based approach
  • evaluated numerically using synthetic and real-world datasets
• Future work
  • constant approximation ratio algorithms
  • modeling multiple adversary types as a Bayesian Stackelberg game
Thank you for your attention! Questions?