Statistical Modeling of UNIX Users and Processes With Application to Computer Intrusion Detection
Wen-Hua Ju
Acknowledgement
• Yehuda Vardi (Rutgers)
• Matthias Schonlau (RAND)
• William DuMouchel (AT&T Labs)
• Alan F. Karr (NISS)
• Allan Wilks (AT&T Labs)
• Daryl Pregibon (AT&T Labs)
How Statisticians Got Involved …
• Refine telephone-fraud detection techniques, developed by AT&T Labs Statistics Research, and apply them to detecting intrusion into networked computer systems.
• But …
  – Multiple intruder motives
  – Hard-to-quantify losses
  – Massive data
• Something simpler: characterization of, and differentiation among, users of a computer system
Outline
• Experiments and Data
  – UNIX users
  – UNIX processes
• Models for finite-state discrete stochastic processes
  – Hybrid High-order Markov Chain
  – Rarity of Occurrence
• Results and Discussion
Computer Intrusion and Intrusion Detection
• Computer intrusion: A sequence of related actions by a malicious adversary that results in the occurrence of unauthorized security threats to a target computing or networking domain.
  Edward Amoroso (1999)
Experiments and Data
• UNIX Users: Detecting Masquerades
  – Command sequences (AT&T Labs)
  – Collected by the UNIX acct auditing mechanism
Experiments and Data
• UNIX Users: Detecting Masquerades
  – 70 users, 15,000 commands each
    • 50 users: normal users (intrusion targets)
    • 20 users: masqueraders
  – Simplifying assumption
    • Blocks of 100 commands
  – Blocks are randomly chosen from the masqueraders' data and inserted into the normal users' data
  – Data available at http://www.schonlau.net/intrusion.html
Experiments and Data
• UNIX Processes:
  – System-call traces (Computer Immune System Research, University of New Mexico)
  – Normal data: synthetic and live
  – Intrusion data: real intrusions
High-order Markov Chain Model
• High-order vs. regular Markov model
• Problem: huge parameter space
• Mixture Transition Distribution (MTD) (Raftery 1985; Raftery and Tavaré 1994)
  – Auto-regressive
  – Only one extra parameter is added to the model for each extra lag
High-order Markov Chain Model
MTD Model
$$P(X_t = s_{i_0} \mid X_{t-1} = s_{i_1}, \ldots, X_{t-l} = s_{i_l}) = \sum_{j=1}^{l} \lambda_j \, r(s_{i_0} \mid s_{i_j}), \qquad t = l+1, l+2, \ldots$$
where $R = \{r(s_i \mid s_j)\}$ and $\lambda = \{\lambda_i\}$ satisfy
$$r(s_i \mid s_j) \ge 0 \ \text{ and } \ \sum_{i=1}^{K} r(s_i \mid s_j) = 1, \quad \forall j = 1, \ldots, K$$
$$\lambda_i \ge 0, \qquad \sum_{i=1}^{l} \lambda_i = 1$$
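As a rough illustration of the MTD form, the sketch below computes the next-state distribution as a weighted mixture of columns of a single transition matrix. The function name, the toy matrix, and the weights are illustrative assumptions, not values from the talk.

```python
import numpy as np

def mtd_transition_prob(history, lam, R):
    """Next-state distribution under an MTD model (illustrative sketch).

    history: state indices [i_1, ..., i_l], lag 1 first.
    lam:     length-l weights, nonnegative, summing to 1.
    R:       K x K matrix with R[i, j] = r(s_i | s_j); each column sums to 1.
    """
    R = np.asarray(R, dtype=float)
    # Mix the columns of R selected by the lagged states, weighted by lam.
    return sum(l_j * R[:, i_j] for l_j, i_j in zip(lam, history))

# Hypothetical toy values: K = 2 states, l = 2 lags.
R = np.array([[0.9, 0.2],
              [0.1, 0.8]])
lam = [0.7, 0.3]
p = mtd_transition_prob([0, 1], lam, R)
```

With l lags the model needs only the K x K matrix plus l weights, rather than a table with K^(l+1) entries.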
High-order Markov Chain Model
MTD Model: Parameter estimation via MLE
$$\log L(x_1, \ldots, x_T) = \sum_{i_0=1}^{K} \cdots \sum_{i_l=1}^{K} N(s_{i_0}, \ldots, s_{i_l}) \, \log\Big( \sum_{j=1}^{l} \lambda_j \, r(s_{i_0} \mid s_{i_j}) \Big)$$
• Direct maximization: sequential quadratic programming algorithm, but …
• Alternating maximization
  – Fix r(·|·): easy
  – Fix λ: still too many parameters
$$\sum_k a_k \log b_k, \quad \text{where } \sum_k a_k = T - l \ \text{ and } \ \sum_k b_k = K^l$$
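The log-likelihood can be evaluated directly from the counts N of each (current state, lagged states) tuple. The following is a minimal sketch under the assumption that states are integer indices and R[i][j] stores r(s_i | s_j); the function name and toy inputs are hypothetical.

```python
import numpy as np
from collections import Counter

def mtd_log_likelihood(seq, lam, R):
    """Sum of N(tuple) * log(mixture probability) over observed tuples."""
    l = len(lam)
    # Count each tuple (x_t, x_{t-1}, ..., x_{t-l}); there are T - l of them.
    counts = Counter(
        tuple(seq[t - j] for j in range(l + 1)) for t in range(l, len(seq))
    )
    total = 0.0
    for (i0, *lags), n in counts.items():
        prob = sum(lam[j] * R[i0][lags[j]] for j in range(l))
        total += n * np.log(prob)
    return total

# Hypothetical toy inputs: 2 states, 2 lags.
R = [[0.9, 0.2],
     [0.1, 0.8]]
lam = [0.7, 0.3]
ll = mtd_log_likelihood([0, 1, 0, 0, 1], lam, R)
```

This only evaluates the objective; it does not perform the maximization discussed on the slide.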
High-order Markov Chain Model
MTD Model: MLE
It is equivalent to solving the following linear system for b (or λ):
$$\hat{b}_k = \frac{a_k}{\sum_k a_k} \sum_k b_k = \frac{a_k}{T - l} K^l, \quad \forall k$$
$$\sum_{j=1}^{l} \lambda_j \, \hat{r}(s_{i_0} \mid s_{i_j}) = \frac{K^l}{T - l} N(s_{i_0}, \ldots, s_{i_l}), \quad \forall (i_0, \ldots, i_l)$$
Can be "solved" efficiently using the EM algorithm, in the sense of minimizing the K-L distance
High-order Markov Chain Model
Application to Command Data
• Exhaustive Command Space (ECS) Model:
  – Treat all commands as Markov chain states
• Partial Command Space (PCS) Model:
  – Treat frequently used commands as Markov chain states, and use "other" to represent the rest
• Modification for "other"
  – r(other | ·) are small
  – r(· | other) are equal
• Use the parameter estimates as the user profile
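The PCS state mapping can be sketched as below. The cutoff rule (keep the `top_n` most frequent commands) and the sample command stream are assumptions made for illustration; the talk does not say how "frequently used" was defined.

```python
from collections import Counter

def to_partial_command_space(commands, top_n):
    """Keep the top_n most frequent commands; map the rest to "other"."""
    frequent = {cmd for cmd, _ in Counter(commands).most_common(top_n)}
    return [cmd if cmd in frequent else "other" for cmd in commands]

# Hypothetical command stream, for illustration only.
stream = ["ls", "cd", "ls", "vi", "ls", "cd", "gcc"]
mapped = to_partial_command_space(stream, top_n=2)
```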
High-order Markov Chain Model
Application to Command Data
• Hypothesis Testing as a Decision Rule
  $H_0$: Command blocks are from user u
  $H_1$: Command blocks are NOT from user u
• Likelihood-ratio-like test
$$X_u = X(c_1, \ldots, c_T \mid \hat{\Lambda}_1, \ldots, \hat{\Lambda}_U, \hat{R}_1, \ldots, \hat{R}_U) = \log \frac{\max_{v \ne u} L(c_1, \ldots, c_T \mid \hat{\Lambda}_v, \hat{R}_v)}{L(c_1, \ldots, c_T \mid \hat{\Lambda}_u, \hat{R}_u)}$$
Reject $H_0$ if $X_u > w$
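A minimal sketch of the decision rule, assuming the log-likelihood of the block under each user's fitted profile has already been computed. The dictionary of log-likelihoods, the user labels, and the threshold value are hypothetical.

```python
def lr_like_statistic(loglik_by_user, u):
    """X_u = log(max over v != u of L_v) - log(L_u), from log-likelihoods."""
    best_other = max(ll for v, ll in loglik_by_user.items() if v != u)
    return best_other - loglik_by_user[u]

def reject_h0(loglik_by_user, u, w):
    """Flag the block as not coming from user u when X_u exceeds w."""
    return lr_like_statistic(loglik_by_user, u) > w

# Hypothetical log-likelihoods of one block under three user profiles.
logliks = {"u1": -120.0, "u2": -150.0, "u3": -135.0}
x_u1 = lr_like_statistic(logliks, "u1")
x_u2 = lr_like_statistic(logliks, "u2")
```

Working on the log scale avoids underflow when blocks are long.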
Hybrid High-order Markov Chain Model
• In case of no or not enough training data: independence model
$$P(c_1, \ldots, c_T \mid \text{user } u) = \prod_{t=1}^{T} P(c_t \mid \text{user } u) = \prod_{t=1}^{T} q_{u c_t}$$
• Estimate the q's using modified user/command counts
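The independence model can be sketched as follows. The talk does not specify how the counts are "modified"; additive smoothing with a constant `alpha` is used here purely as an assumed stand-in, and all names and sample data are hypothetical.

```python
import math
from collections import Counter

def estimate_q(commands, vocabulary, alpha=1.0):
    """Smoothed command probabilities for one user (assumed add-alpha rule)."""
    counts = Counter(commands)
    total = len(commands) + alpha * len(vocabulary)
    return {c: (counts[c] + alpha) / total for c in vocabulary}

def independence_log_prob(block, q):
    """log P(c_1, ..., c_T | user u) = sum over t of log q[c_t]."""
    return sum(math.log(q[c]) for c in block)

# Hypothetical training data and test block.
q = estimate_q(["ls", "ls", "cd"], ["ls", "cd", "vi"])
lp = independence_log_prob(["ls", "vi"], q)
```

Smoothing keeps unseen commands from receiving zero probability, which would make the log-probability undefined.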
Hybrid High-order Markov Chain Application to Command Data
• Test statistics
$$X_{1u}(c_1, \ldots, c_T \mid \text{user } u) = \log \frac{\max_{v \ne u} L(c_1, \ldots, c_T \mid \hat{\Lambda}_v, \hat{R}_v)}{L(c_1, \ldots, c_T \mid \hat{\Lambda}_u, \hat{R}_u)}$$
$$X_{2u}(c_1, \ldots, c_T \mid \text{user } u) = \log \frac{\max_{v \ne u} \prod_{i=1}^{T} \hat{q}_{v c_i}}{\prod_{i=1}^{T} \hat{q}_{u c_i}}$$
$$\hat{\rho} = \frac{X_{1u}}{X_{2u}}$$
$$X'_{2u} = \begin{cases} \tau_1 X_{2u}, & \text{if } \hat{\rho} < \tau_1 \\ \hat{\rho} X_{2u}, & \text{if } \tau_1 \le \hat{\rho} \le \tau_2 \\ \tau_2 X_{2u}, & \text{if } \hat{\rho} > \tau_2 \end{cases}$$
Hybrid High-order Markov Chain Application to Command Data
• Hybrid test statistic
$$X_u = \begin{cases} X_{1u}, & \text{if } s/T \le \xi_1 \\ \dfrac{\xi_2 - s/T}{\xi_2 - \xi_1} X_{1u} + \dfrac{s/T - \xi_1}{\xi_2 - \xi_1} X'_{2u}, & \text{if } \xi_1 \le s/T \le \xi_2 \\ X'_{2u}, & \text{if } s/T > \xi_2 \end{cases}$$
$s$: # of "other" in $\{c_1, \ldots, c_T\}$
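A small sketch of the hybrid statistic, assuming it uses the Markov-chain statistic when the block has few "other" commands, the rescaled independence statistic when it has many, and a linear blend in between. The function name and the threshold values are illustrative assumptions.

```python
def hybrid_statistic(x1, x2_prime, s, T, xi1, xi2):
    """Blend X_1u and X'_2u according to the fraction s/T of "other" commands."""
    frac = s / T
    if frac <= xi1:
        return x1
    if frac > xi2:
        return x2_prime
    # Weight on x1 falls linearly from 1 at xi1 to 0 at xi2.
    w = (xi2 - frac) / (xi2 - xi1)
    return w * x1 + (1.0 - w) * x2_prime

# Hypothetical statistics and thresholds for a block of T = 100 commands.
low = hybrid_statistic(2.0, 4.0, s=10, T=100, xi1=0.2, xi2=0.6)
mid = hybrid_statistic(2.0, 4.0, s=40, T=100, xi1=0.2, xi2=0.6)
high = hybrid_statistic(2.0, 4.0, s=80, T=100, xi1=0.2, xi2=0.6)
```

The blend is continuous at both thresholds, so a small change in s does not cause a jump in the score.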
Rarity of Occurrence Model
• Motivation: depend not only on frequency
  – Schonlau and Theus (2000)
• Rarity of command(s)
  – Popular and frequently used
  – Popular but not frequently used
  – Rare or unique
• Define the rarity index of a command based on the number of users who used this command
Rarity of Occurrence Model
• Rarity index example:
  – Total of 50 users
  – A command used by only 1 user: 50/50
  – A command used by all 50 users: 1/50
  – A command used by no users: ½(?)
  – Defined for both an individual command and a short sequence of commands
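The slide gives only the two endpoints of the rarity index. The sketch below assumes a linear rule, (U - n + 1) / U for a command used by n of U users, which reproduces both endpoints; the linear form in between and the function name are assumptions, and the value for unseen commands is the tentative ½ from the slide.

```python
from fractions import Fraction

def rarity_index(n_users_with_cmd, n_users=50, unseen=Fraction(1, 2)):
    """Rarity index under an assumed linear rule matching the slide's endpoints."""
    if n_users_with_cmd == 0:
        # Tentative value for a command no user has used (marked "?" on the slide).
        return unseen
    return Fraction(n_users - n_users_with_cmd + 1, n_users)

unique_cmd = rarity_index(1)    # used by a single user
common_cmd = rarity_index(50)   # used by every user
```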
Rarity of Occurrence Model
• Anomaly signal of user u's short command sequence $(c_{k_1}, \ldots, c_{k_l})$ defined as the weighted rarity index
  – Weight (+/-) depends on frequency
  – Case 1: User u has used $P_u$
  – Case 2: User u didn't use $P_u$, but has used all the commands
  – Case 3: User u didn't use all the commands
• Test score is defined as a weighted sum of anomaly signals
Rarity of Occurrence Model
• Entropy model (only tried on the system call data)
  – Motivation
  – Shannon's entropy of distribution $\{p_i\}$
  – Small entropy indicates abnormality
  – Test score is defined as the sum of weighted entropies
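For reference, Shannon's entropy of a distribution {p_i} is the negative sum of p_i log p_i. The sketch below computes it for the empirical distribution of a symbol sequence; the base-2 logarithm and the sample inputs are assumptions, and the weighting used in the test score is not reproduced here.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Entropy (in bits) of the empirical distribution of a sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical sequences: an even two-symbol mix and a constant sequence.
h_mixed = shannon_entropy(["a", "a", "b", "b"])
h_const = shannon_entropy(["a", "a", "a", "a"])
```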
UNIX Command Results (figure not recovered)
Discussion
• Hybrid High-order Markov Chain Model
  – Multi-layer defense scheme
  – Computational demand
  – Likelihood ratio
• Rarity of Occurrence Model
  – Good performance
  – Global information is important
• Future study
  – Utilizing more information
  – Relaxing experimental limitations
  – Other audit data formats
Conclusion