
Empirical Analysis and Statistical Modelling of Attack Processes - PowerPoint PPT Presentation



  1. Empirical Analysis and Statistical Modelling of Attack Processes based on Honeypots  M. Kaâniche (1), E. Alata (1), V. Nicomette (1), Y. Deswarte (1), M. Dacier (2); (1) LAAS-CNRS, (2) Eurecom  Mohamed.Kaaniche@laas.fr  ACI “Sécurité & Informatique” http://acisi.loria.fr  Workshop on Empirical Evaluation of Dependability and Security (WEEDS-DSN06), Philadelphia, PA, June 28, 2006

  2. Outline  Context and motivation  Data collection  Attack processes modeling  Conclusion and open issues

  3. Context  Need for real data and methodologies to learn about malicious activities on the Internet and analyze their impact on systems security  Several initiatives for monitoring malicious threats do exist ■ CAIDA ■ Motion Sensor project ■ Dshield ■ CADHo

  4. CADHo Objectives  Build and deploy on the Internet a distributed platform of identically configured low-interaction honeypots in a large number of diverse locations  Carry out various analyses based on the collected data to better understand threats and build models to characterize attack processes  Analyze and model the behavior of malicious attackers once they manage to get access and compromise a target ■ High-interaction honeypots

  5. Leurré.com data collection platform  [Figure: platform architecture. Labelled components: three virtual machines, Mach0 (Windows 98 workstation), Mach1 (Windows NT, ftp + web server) and Mach2 (Redhat 7.3, ftp server); a virtual switch; a reverse firewall; the Internet; and an observer (tcpdump).]

  6. 35 platforms, 25 countries, 5 continents

  7. Data analysis  Data collection since 2004  80 000 different IP addresses from 91 different countries  Information extracted from the logs ■ Raw packets (entire frames including payloads) ■ IP address of the attacking machine ■ Time of the attack and duration ■ Targeted virtual machines and ports ■ Geographic location of the attacking machine (Maxmind, NetGeo) ■ OS of the attacking machine (p0f, ettercap, disco)  Automatic data analyses have been developed to extract useful trends and identify hidden phenomena from the data ■ Clustering techniques, time series analysis, etc. ■ Publications available at: www.leurrecom.org/paper.htm

  8. Modeling Objectives  Identify probability distributions that best characterize attack occurrence and attack propagation processes  Model the time relationships between attacks coming from different sources (or to different destinations)  Analyze whether data collected from different platforms exhibit similar or different malicious attack activities  Predict occurrence of new attacks on a given platform based on past observations on this platform and other platforms  Estimate impact of attacks on security of target systems ■ High-interaction honeypots to analyze attackers' behavior once they compromise and get access to a target

  9. Examples  Analysis of the time evolution of the number of attacks taking into account the geographic location of attacking machines  Characterization and statistical modeling of times between attacks  Analysis of the propagation of attacks among the honeypot platforms  Data ■ 320 days from January 1st 2004 to April 17, 2005 ■ 14 honeypot platforms (the most active ones) ■ 816475 observed attacks

  10. Attack occurrence and geographic distrib.  The number of attacks per unit of time, considering a single platform or all platforms, can be described as a linear regression of the attacks originating from a single country only: $Y(t) = \alpha_j X_j(t) + \beta_j$

      Country   α_j     β_j       R²
      Russia    44.57   1555.67   0.93
      USA       5.13    759.1     0.94
      UK        25.93   438.03    0.94
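A minimal sketch of how such a per-country regression could be fitted by ordinary least squares. The function name `fit_country_regression` and the demo series are hypothetical and not taken from the slides; the slides do not say which fitting procedure or time unit was used.

```python
def fit_country_regression(x_country, y_total):
    """Least-squares fit of Y(t) = alpha * X_j(t) + beta.

    x_country: attacks per time unit originating from country j (assumed input)
    y_total:   attacks per time unit from all countries, same time units
    Returns (alpha, beta, r2).
    """
    n = len(x_country)
    mean_x = sum(x_country) / n
    mean_y = sum(y_total) / n
    sxx = sum((x - mean_x) ** 2 for x in x_country)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_country, y_total))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    # Coefficient of determination R^2 = 1 - SS_res / SS_tot
    ss_res = sum((y - (alpha * x + beta)) ** 2 for x, y in zip(x_country, y_total))
    ss_tot = sum((y - mean_y) ** 2 for y in y_total)
    r2 = 1.0 - ss_res / ss_tot
    return alpha, beta, r2


# Made-up, exactly linear demo data (not honeypot measurements).
x_demo = [10, 12, 9, 15, 20, 18]
y_demo = [3 * x + 100 for x in x_demo]
alpha_demo, beta_demo, r2_demo = fit_country_regression(x_demo, y_demo)
```

An R² close to 1, as in the slide's table, would indicate that the attacks from that one country track the overall attack count closely.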

  11. “Times between attacks” analysis  An attack is associated with an IP address ■ its occurrence time is the first time a packet is received from the corresponding address  $t_i$ = time between attacks $i$ and $i-1$

      Platform   P5      P6       P9      P20      P23
      #t_i       85890   148942   46268   224917   51580
      #IP        79549   90620    42230   162156   47859
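The definition above can be sketched as follows, under a simplifying assumption: each source address counts as exactly one attack, dated at its first packet. The slide's own counts of intervals and addresses suggest an address can account for more than one attack, which this sketch does not model. The function name and the `(timestamp, src_ip)` record layout are hypothetical.

```python
def times_between_attacks(packets):
    """Return the gaps between successive attack occurrence times.

    packets: iterable of (timestamp_seconds, src_ip) pairs in any order.
    Assumption: one attack per source address, occurring at its first packet.
    """
    first_seen = {}
    for timestamp, src_ip in packets:
        if src_ip not in first_seen or timestamp < first_seen[src_ip]:
            first_seen[src_ip] = timestamp
    occurrence_times = sorted(first_seen.values())
    # t_i = occurrence time of attack i minus occurrence time of attack i-1
    return [later - earlier
            for earlier, later in zip(occurrence_times, occurrence_times[1:])]


# Made-up packet log using documentation-range addresses.
demo_packets = [
    (0, "192.0.2.1"),
    (4, "192.0.2.2"),
    (5, "192.0.2.1"),
    (9, "192.0.2.3"),
    (11, "192.0.2.2"),
]
demo_gaps = times_between_attacks(demo_packets)
```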

  12. Number of attacks per IP address

  13. “Times between attacks” distribution  Best fit provided by a mixture distribution (Pareto, Exp.): $pdf(t) = P_a \frac{k}{(t+1)^{k+1}} + (1 - P_a)\,\lambda e^{-\lambda t}$  Platform 6: Pa = 0.0115, k = 0.1183, λ = 0.1364/sec.  [Figure: pdf against time between attacks for Platform 6, comparing the data with the fitted mixture (Pareto, Exp.) and exponential distributions.]
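A short sketch that evaluates this mixture, taking the density to be Pa·k/(t+1)^(k+1) + (1 − Pa)·λ·e^(−λt) with the Platform 6 parameters quoted on the slide. The function names are hypothetical, and the closed-form CDF is derived here from that density rather than taken from the slides.

```python
import math


def mixture_pdf(t, pa, k, lam):
    """Density of the Pareto/exponential mixture at time t >= 0 (seconds)."""
    pareto_part = k / (t + 1.0) ** (k + 1.0)
    exponential_part = lam * math.exp(-lam * t)
    return pa * pareto_part + (1.0 - pa) * exponential_part


def mixture_cdf(t, pa, k, lam):
    """Integral of mixture_pdf from 0 to t (derived, not from the slides)."""
    pareto_cdf = 1.0 - (t + 1.0) ** (-k)
    exponential_cdf = 1.0 - math.exp(-lam * t)
    return pa * pareto_cdf + (1.0 - pa) * exponential_cdf


# Platform 6 parameters as quoted on the slide.
PA_P6, K_P6, LAMBDA_P6 = 0.0115, 0.1183, 0.1364

# Density at the tick positions used on the slide's time axis (1, 31, ..., 271).
p6_curve = [(t, mixture_pdf(t, PA_P6, K_P6, LAMBDA_P6)) for t in range(1, 272, 30)]
```

With Pa this small the exponential term dominates short intervals, while the Pareto term, however lightly weighted, decays only polynomially and so governs the long tail that a pure exponential misses.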

  14. “Times between attacks” distribution  [Figure: four panels (Platforms 5, 9, 20 and 23) plotting pdf against time between attacks (sec.), each comparing the data with the fitted mixture (Pareto, Exp.) and exponential distributions.]  Parameter sets shown in the Platform 5 and Platform 9 panels: (Pa = 0.0051, k = 0.173, λ = 0.121/sec.) and (Pa = 0.0019, k = 0.1668, λ = 0.276/sec.)  Parameter sets shown in the Platform 20 and Platform 23 panels: (Pa = 0.0144, k = 0.0183, λ = 0.0136/sec.) and (Pa = 0.0031, k = 0.1240, λ = 0.275/sec.)

  15. Propagation of attacks  A propagation is assumed to occur when an IP address of an attacking machine observed at a given platform is observed at another platform  Propagation graph ■ Nodes identify the platforms ■ Transitions identify propagations  A propagation between Pi and Pj occurs from an IP address when the next occurrence of this address is observed on Pj after visiting Pi ■ Probabilities are associated with the transitions to reflect their likelihood of occurrence
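One way the propagation graph could be built from the logs is sketched below. Two choices here are assumptions, not stated on the slide: a repeat visit to the same platform is counted as a self-transition, and each transition probability is normalized over all transitions leaving the source platform. The function name and the `(timestamp, ip, platform)` record layout are hypothetical.

```python
from collections import Counter, defaultdict


def propagation_probabilities(observations):
    """Estimate transition probabilities between platforms.

    observations: iterable of (timestamp, ip, platform) tuples.
    Returns {(src_platform, dst_platform): probability}.
    """
    visits_by_ip = defaultdict(list)
    for timestamp, ip, platform in observations:
        visits_by_ip[ip].append((timestamp, platform))

    counts = Counter()
    for visits in visits_by_ip.values():
        visits.sort()  # chronological order for this address
        # Each consecutive pair of occurrences is one transition Pi -> Pj.
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            counts[(src, dst)] += 1

    outgoing = Counter()
    for (src, _), n in counts.items():
        outgoing[src] += n
    return {edge: n / outgoing[edge[0]] for edge, n in counts.items()}


# Made-up observations using documentation-range addresses.
demo_obs = [
    (1, "192.0.2.1", "P5"), (2, "192.0.2.1", "P6"), (3, "192.0.2.1", "P6"),
    (1, "192.0.2.2", "P5"), (5, "192.0.2.2", "P5"),
    (4, "192.0.2.3", "P6"), (6, "192.0.2.3", "P5"),
]
demo_probs = propagation_probabilities(demo_obs)
```

Under this normalization the probabilities on the transitions leaving any one node sum to 1, so each node's outgoing edges form a distribution over where an address is next observed.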

  16. Propagation graph  [Figure: propagation graph over platforms P5, P6, P9, P20 and P23; the transitions are labelled with probabilities ranging from 0.6% to 96.1%.]  Issues under investigation ■ Focus on specific attacks (largest clusters, worms, etc.) ■ Timing characteristics and probability distributions

  17. Summary and Conclusions  Preliminary models to characterize attack processes observed on low-interaction honeypots  Several open issues ■ Predictive models that can be used to support decision making during design and operation stages ■ How to assess the impact of attacks on the security of target systems?  High-interaction honeypots ■ Analyze attackers' behavior once they get access to a target ■ Validate a theoretical model for quantitative evaluation of security developed by LAAS in the 1990s  Privilege graph to describe vulnerabilities and attack scenarios  METF (“Mean Effort To security Failure”) to quantify security  Assumptions about intruders' behavior
