Empirical Analysis and Statistical Modelling of Attack Processes based on Honeypots
M. Kaâniche¹, E. Alata¹, V. Nicomette¹, Y. Deswarte¹, M. Dacier²
¹LAAS-CNRS, ²Eurecom
Mohamed.Kaaniche@laas.fr
ACI “Sécurité & Informatique”, http://acisi.loria.fr
Workshop on Empirical Evaluation of Dependability and Security (WEEDS-DSN06), Philadelphia, PA, June 28, 2006
Outline
■ Context and motivation
■ Data collection
■ Attack processes modeling
■ Conclusion and open issues
Context
Real data and methodologies are needed to learn about malicious activities on the Internet and to analyze their impact on system security.
Several initiatives for monitoring malicious threats exist:
■ CAIDA
■ Motion Sensor project
■ DShield
■ CADHo
CADHo Objectives
Build and deploy on the Internet a distributed platform of identically configured low-interaction honeypots in a large number of diverse locations.
Carry out various analyses of the collected data to better understand threats, and build models that characterize attack processes.
Analyze and model the behavior of malicious attackers once they manage to gain access to and compromise a target:
■ high-interaction honeypots
Leurré.com data collection platform
Figure: platform architecture. Components: Internet connection, reverse firewall, virtual switch, three virtual machines (Mach0: Windows 98 workstation; Mach1: Windows NT, FTP + web server; Mach2: Red Hat 7.3, FTP server) and an observer (tcpdump).
35 platforms, 25 countries, 5 continents
Data analysis
Data collected since 2004: 80,000 different IP addresses from 91 different countries.
Information extracted from the logs:
■ raw packets (entire frames, including payloads)
■ IP address of the attacking machine
■ time and duration of the attack
■ targeted virtual machines and ports
■ geographic location of the attacking machine (MaxMind, NetGeo)
■ OS of the attacking machine (p0f, ettercap, disco)
Automatic data analyses have been developed to extract useful trends and identify hidden phenomena in the data:
■ clustering techniques, time series analysis, etc.
■ publications available at www.leurrecom.org/paper.htm
Modeling Objectives
Identify the probability distributions that best characterize attack occurrence and attack propagation processes.
Model the time relationships between attacks coming from different sources (or directed to different destinations).
Analyze whether data collected on different platforms exhibit similar or different malicious attack activities.
Predict the occurrence of new attacks on a given platform based on past observations on this platform and on other platforms.
Estimate the impact of attacks on the security of target systems:
■ high-interaction honeypots to analyze attackers' behavior once they compromise and gain access to a target
Examples
Analysis of the time evolution of the number of attacks, taking into account the geographic location of the attacking machines.
Characterization and statistical modeling of the times between attacks.
Analysis of the propagation of attacks among the honeypot platforms.
Data:
■ 320 days, from January 1st, 2004 to April 17, 2005
■ 14 honeypot platforms (the most active ones)
■ 816,475 observed attacks
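Attack occurrence and geographic distribution
The number of attacks per unit of time, considering a single platform or all platforms, can be described as a linear regression of the attacks originating from a single country only:
$$Y(t) = \alpha_j X_j(t) + \beta_j$$
where Y(t) is the number of attacks per unit of time and X_j(t) is the number of attacks originating from country j in the same time unit.
Country | α_j | β_j | R²
Russia | 44.57 | 1555.67 | 0.93
USA | 5.13 | 759.1 | 0.94
UK | 25.93 | 438.03 | 0.94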
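The regression can be reproduced with ordinary least squares. The sketch below is a minimal, dependency-free version, assuming two equal-length series: `x`, the attacks from one country per time unit, and `y`, the total attacks per time unit. The slide does not state how α_j, β_j and R² were computed, so ordinary least squares is an assumption here, and `fit_country_regression` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def fit_country_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of Y(t) = alpha * X(t) + beta.

    x: attacks from a single country per time unit (hypothetical input series)
    y: total attacks per time unit (one platform or all platforms)
    Returns (alpha, beta, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    # Coefficient of determination: R^2 = 1 - SS_res / SS_tot
    ss_res = sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return alpha, beta, r_squared


# Toy series lying exactly on Y = 2X + 3
print(fit_country_regression([1, 2, 3, 4], [5, 7, 9, 11]))  # -> (2.0, 3.0, 1.0)
```

Running it once per country against the same `y` series yields one (α_j, β_j, R²) triple per country.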
“Times between attacks” analysis
An attack is associated with an IP address:
■ its occurrence time is the first time a packet is received from the corresponding address
t_i = time between attacks i and i-1
Platform | P5 | P6 | P9 | P20 | P23
#t_i | 85,890 | 148,942 | 46,268 | 224,917 | 51,580
#IP | 79,549 | 90,620 | 42,230 | 162,156 | 47,859
(#t_i: number of observed times between attacks; #IP: number of distinct attacking IP addresses.)
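A minimal sketch of how the t_i values and the per-IP attack counts could be derived, assuming the attacks have already been identified and each one is given as a hypothetical `(occurrence_time_seconds, ip)` pair. The rule for splitting the packets of one address into separate attacks is not given on the slide, so that step is deliberately left out; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# One already-identified attack: (occurrence time in seconds, source IP address)
Attack = Tuple[float, str]


def inter_attack_times(attacks: Sequence[Attack]) -> List[float]:
    """t_i = time between attack i and attack i-1, in chronological order."""
    times = sorted(t for t, _ in attacks)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def attacks_per_ip(attacks: Sequence[Attack]) -> Dict[str, int]:
    """Number of attacks attributed to each source IP address."""
    return dict(Counter(ip for _, ip in attacks))


# Hypothetical records for one platform (placeholder addresses)
records = [(12.0, "10.0.0.1"), (0.0, "10.0.0.1"), (5.0, "10.0.0.2"), (12.0, "10.0.0.3")]
print(inter_attack_times(records))   # -> [5.0, 7.0, 0.0]
print(len(attacks_per_ip(records)))  # -> 3 distinct IP addresses
```

With n attacks on a platform, `inter_attack_times` returns n - 1 values, and the histogram of `attacks_per_ip` gives the number of attacks per IP address.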
Number of attacks per IP address
“Times between attacks” distribution
Best fit provided by a mixture distribution (Pareto and exponential):
$$\mathrm{pdf}(t) = P_a \frac{k}{(t+1)^{k+1}} + (1 - P_a)\,\lambda e^{-\lambda t}$$
Platform 6: P_a = 0.0115, k = 0.1183, λ = 0.1364/sec.
Figure: pdf of the time between attacks (sec.) on platform 6, comparing the data, the mixture (Pareto, Exp.) fit and an exponential fit.
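The mixture density can be evaluated directly, and integrating each component from 0 to t gives a closed-form CDF, F(t) = P_a(1 - (t+1)^(-k)) + (1 - P_a)(1 - e^(-λt)). The sketch below assumes t is in seconds and plugs in the platform 6 parameters; `mixture_pdf` and `mixture_cdf` are illustrative names.

```python
import math


def mixture_pdf(t: float, pa: float, k: float, lam: float) -> float:
    """pdf(t) = Pa * k / (t + 1)^(k + 1) + (1 - Pa) * lam * exp(-lam * t), for t >= 0."""
    pareto = k / (t + 1.0) ** (k + 1.0)  # Pareto component shifted to start at t = 0
    expo = lam * math.exp(-lam * t)      # exponential component
    return pa * pareto + (1.0 - pa) * expo


def mixture_cdf(t: float, pa: float, k: float, lam: float) -> float:
    """P(T <= t), obtained by integrating each component from 0 to t."""
    return pa * (1.0 - (t + 1.0) ** (-k)) + (1.0 - pa) * (1.0 - math.exp(-lam * t))


# Platform 6 parameters reported on the slide (t in seconds, lam in 1/sec.)
P6 = {"pa": 0.0115, "k": 0.1183, "lam": 0.1364}

tail = 1.0 - mixture_cdf(3600.0, **P6)  # probability of waiting more than one hour
print(f"P(T > 3600 s) = {tail:.4f}")   # -> P(T > 3600 s) = 0.0044
```

With these parameters the exponential component alone would give P(T > 3600 s) = e^(-491), which is negligible, so the long inter-attack times are carried entirely by the Pareto component.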
“Times between attacks” distribution
Figure: pdf of the time between attacks (sec.) on platforms 5, 9, 20 and 23, comparing the data, the mixture (Pareto, Exp.) fit and an exponential fit. Fitted mixture parameters:
Platform | P_a | k | λ (/sec.)
P5 | 0.0051 | 0.173 | 0.121
P9 | 0.0019 | 0.1668 | 0.276
P20 | 0.0144 | 0.0183 | 0.0136
P23 | 0.0031 | 0.1240 | 0.275
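The slides do not say how P_a, k and λ were estimated. One possible approach, sketched here as an assumption rather than as the authors' method, is maximum likelihood via expectation-maximization (EM), which has closed-form M-steps for this mixture: P_a is the mean Pareto responsibility r_i, k = Σ r_i / Σ r_i ln(t_i + 1), and λ = Σ (1 - r_i) / Σ (1 - r_i) t_i. The synthetic sampler and its parameter values are hypothetical and only serve to check that the fit recovers known values.

```python
import math
import random
from typing import List, Sequence, Tuple


def em_fit_mixture(ts: Sequence[float], pa: float = 0.5, k: float = 1.0, lam: float = 1.0,
                   max_iter: int = 1000, tol: float = 1e-8) -> Tuple[float, float, float]:
    """Maximum-likelihood fit of pa*k/(t+1)^(k+1) + (1-pa)*lam*exp(-lam*t) by EM.

    ts: times between attacks (>= 0, not all zero). Returns (pa, k, lam).
    """
    n = len(ts)
    logs = [math.log1p(t) for t in ts]  # ln(t + 1), reused at every iteration
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility r_i of the Pareto component for each t_i
        resp: List[float] = []
        ll = 0.0
        for t, lg in zip(ts, logs):
            f_par = pa * k * math.exp(-(k + 1.0) * lg)
            f_exp = (1.0 - pa) * lam * math.exp(-lam * t)
            total = max(f_par + f_exp, 1e-300)  # guard against underflow
            resp.append(f_par / total)
            ll += math.log(total)
        if ll - prev_ll < tol:  # log-likelihood no longer improving
            break
        prev_ll = ll
        # M-step: closed-form maximizers of the expected complete-data log-likelihood
        r_sum = sum(resp)
        pa = r_sum / n
        k = r_sum / sum(r * lg for r, lg in zip(resp, logs))
        lam = (n - r_sum) / sum((1.0 - r) * t for r, t in zip(resp, ts))
    return pa, k, lam


def sample_mixture(n: int, pa: float, k: float, lam: float, seed: int = 0) -> List[float]:
    """Synthetic times drawn from the mixture, used only to check the fit."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if rng.random() < pa:
            # Inverse CDF of the Pareto part: F(t) = 1 - (t + 1)^(-k)
            out.append((1.0 - rng.random()) ** (-1.0 / k) - 1.0)
        else:
            out.append(rng.expovariate(lam))
    return out


data = sample_mixture(4000, pa=0.3, k=0.5, lam=2.0, seed=1)
estimate = em_fit_mixture(data)
print(estimate)  # roughly (0.3, 0.5, 2.0) at this sample size
```

Applied to a platform's measured t_i, the returned triple would be comparable with the fitted values reported for each platform. EM increases the likelihood at every iteration but can converge slowly when the two components overlap strongly, so the starting values and `max_iter` may need adjusting.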
Propagation of attacks
A propagation is assumed to occur when the IP address of an attacking machine observed on a given platform is also observed on another platform.
Propagation graph:
■ nodes identify the platforms
■ transitions identify propagations
A propagation from Pi to Pj occurs for an IP address when the next occurrence of this address after visiting Pi is observed on Pj.
■ Probabilities are associated with the transitions to reflect their likelihood of occurrence
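A minimal sketch of how the transition probabilities could be estimated, assuming each observation is a hypothetical `(ip, time_seconds, platform)` triple and that the probability of Pi → Pj is the fraction of an IP's consecutive observations starting at Pi whose next platform is Pj. Returns to the same platform (Pi → Pi) are kept as self-transitions here; skipping pairs with `src == dst` would restrict the graph to cross-platform propagation. The exact estimator used for the slides is not stated.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# One observation: (source IP address, time in seconds, platform identifier)
Observation = Tuple[str, float, str]


def propagation_probabilities(obs: Iterable[Observation]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next platform = Pj | current platform = Pi) from per-IP visit sequences."""
    visits: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for ip, t, platform in obs:
        visits[ip].append((t, platform))
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in visits.values():
        seq.sort()  # chronological order for this IP (ties broken by platform name)
        for (_, src), (_, dst) in zip(seq, seq[1:]):
            counts[src][dst] += 1  # Pi -> Pj, including Pi -> Pi returns
    # Normalize each row so that the outgoing probabilities of Pi sum to 1
    return {src: {dst: c / sum(row.values()) for dst, c in row.items()}
            for src, row in counts.items()}


observations = [("ip-a", 0.0, "P5"), ("ip-a", 10.0, "P6"), ("ip-a", 20.0, "P6"),
                ("ip-b", 1.0, "P5"), ("ip-b", 5.0, "P9"), ("ip-c", 3.0, "P6")]
print(propagation_probabilities(observations))
# -> {'P5': {'P6': 0.5, 'P9': 0.5}, 'P6': {'P6': 1.0}}
```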
Propagation graph
Figure: propagation graph among platforms P5, P6, P9, P20 and P23, with transition probabilities (%) on the edges.
Issues under investigation:
■ focus on specific attacks (largest clusters, worms, etc.)
■ timing characteristics and probability distributions
Summary and Conclusions
Preliminary models characterizing the attack processes observed on low-interaction honeypots.
Several open issues:
■ predictive models that can support decision making during the design and operation stages
■ how to assess the impact of attacks on the security of target systems?
High-interaction honeypots:
■ analyze attackers' behavior once they gain access to a target
■ validate a theoretical model for the quantitative evaluation of security, developed at LAAS in the 1990s:
  - privilege graph to describe vulnerabilities and attack scenarios
  - METF (“Mean Effort To security Failure”) to quantify security
  - assumptions about intruders' behavior
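To illustrate how an METF value can be computed from a privilege graph, the sketch below assumes one possible intruder-behavior model: the intruder moves along the arcs as a continuous-time Markov process, each arc (k, l) succeeding with rate λ_kl (the inverse of its mean effort). The METF from node k then satisfies the standard mean-time-to-absorption equations METF_k = T_k + Σ_l P_kl METF_l, with T_k = 1 / Σ_l λ_kl, P_kl = λ_kl T_k and METF = 0 at the target. The graph, node names and rates are hypothetical, and this is not presented as the LAAS evaluation tool itself.

```python
from typing import Dict, Tuple

# Hypothetical privilege graph: (from_node, to_node) -> success rate lambda of that
# elementary attack (1 / mean effort). Names and values are illustrative only.
Rates = Dict[Tuple[str, str], float]


def metf(rates: Rates, target: str, max_iter: int = 100_000, tol: float = 1e-12) -> Dict[str, float]:
    """Mean effort to reach `target` from every node of the privilege graph.

    Solves METF_k = T_k + sum_l P_kl * METF_l, with T_k = 1 / sum_l lambda_kl,
    P_kl = lambda_kl * T_k and METF_target = 0, by fixed-point iteration.
    Assumes every non-target node has an outgoing arc and can reach the target.
    """
    out: Dict[str, Dict[str, float]] = {}
    for (src, dst), lam in rates.items():
        out.setdefault(src, {})[dst] = lam
        out.setdefault(dst, {})  # make sure sink nodes (e.g. the target) exist
    values = {node: 0.0 for node in out}
    for _ in range(max_iter):
        new = {}
        for node, arcs in out.items():
            if node == target:
                new[node] = 0.0
            else:
                # T_k + sum_l P_kl * METF_l == (1 + sum_l lambda_kl * METF_l) / sum_l lambda_kl
                new[node] = (1.0 + sum(lam * values[dst] for dst, lam in arcs.items())) / sum(arcs.values())
        delta = max(abs(new[node] - values[node]) for node in out)
        values = new
        if delta < tol:
            break
    return values


# Two routes to "root": directly (rate 1) or via "user" (rate 1, then rate 0.5)
print(metf({("insider", "root"): 1.0, ("insider", "user"): 1.0, ("user", "root"): 0.5}, "root"))
# -> {'insider': 1.5, 'root': 0.0, 'user': 2.0}
```

Fixed-point iteration is used for brevity; because privilege graphs may contain cycles, solving the same equations as a linear system would be an equally valid choice.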