Understanding risk for network resilience



  1. Understanding risk for network resilience
     Paul Smith, Marcus Schoeller (NEC), and David Hutchison
     p.smith@comp.lancs.ac.uk
     Multi-service Networks, July 2009

  2. What is network resilience?
     • The ability of a networked system to provide an acceptable level of service in light of challenges
     • Challenges
       • Component faults
       • Hardware destruction
       • Human mistakes
       • Malicious attacks
       • ...

  3. Nothing comes for free
     • Providing network resilience has a cost
     • We need systems to do this
     • We are resource constrained
       • Money ($, £, €), CPU cycles, bandwidth

  4. Need to prioritise and focus efforts
     [Figure labels: "All challenges"; "Most probable high-impact challenges"]

  5. Identifying critical challenges
     1. Identify critical assets
     2. Determine cost of asset compromise
     3. Develop system understanding
     4. Identify challenges to the system
     5. Identify system faults
     6. Identify probability of failure
     7. Determine measure of exposure
     exposure = cost × probability
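The exposure measure on this slide (exposure = cost × probability) can be illustrated with a minimal Python sketch. The `Challenge` class, the `rank_by_exposure` helper, and all cost and probability figures are hypothetical and chosen only for illustration; the slides do not give concrete numbers or an implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    """A challenge to the system (hypothetical structure, not from the slides)."""
    name: str
    cost: float          # cost of asset compromise, in arbitrary units
    probability: float   # probability of failure, between 0 and 1

    @property
    def exposure(self) -> float:
        # Measure of exposure as defined on the slide: cost x probability.
        return self.cost * self.probability


def rank_by_exposure(challenges):
    """Return the challenges ordered from highest to lowest exposure."""
    return sorted(challenges, key=lambda c: c.exposure, reverse=True)


# Illustrative values only; real figures would come from the analysis steps.
challenges = [
    Challenge("component fault", cost=10_000, probability=0.05),
    Challenge("malicious attack", cost=50_000, probability=0.02),
    Challenge("human mistake", cost=2_000, probability=0.10),
]

ranked = rank_by_exposure(challenges)
for challenge in ranked:
    print(f"{challenge.name}: exposure = {challenge.exposure:.2f}")
```

With these made-up inputs the ranking puts the malicious attack first, even though it is the least probable challenge, because its cost dominates the product.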

  6. What’s difficult about this?
     • Determining reliable measures for challenge occurrence probabilities (and the probability of a challenge leading to failure)
     • Quantifying the impact of a challenge

  7. Challenge: Getting reliable numbers
     • Off-line analysis
       • Advisories are useful (e.g., www.cert.org)
       • Fault and attack tree analysis
         • Issues of scalability because of complexity
       • Simulation
         • Need to develop good challenge and fault models
     • Record monitoring data from on-line system
       • Classify challenges using machine learning
       • Introduces resource and security concerns
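As an illustration of the fault tree analysis mentioned on this slide, the following Python sketch combines basic-event probabilities through AND and OR gates. It assumes the basic events are statistically independent, which is a simplification. The tree shape, the event names, and the probabilities are hypothetical and do not come from the slides.

```python
from math import prod


def p_and(*probabilities: float) -> float:
    """AND gate: all inputs must occur (assumes independent events)."""
    return prod(probabilities)


def p_or(*probabilities: float) -> float:
    """OR gate: at least one input occurs (assumes independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities, for illustration only.
p_primary_link_fails = 0.01
p_backup_link_fails = 0.05
p_misconfiguration = 0.002

# Connectivity is lost only if both the primary and the backup link fail.
p_connectivity_loss = p_and(p_primary_link_fails, p_backup_link_fails)

# The service fails if connectivity is lost or a misconfiguration occurs.
p_service_failure = p_or(p_connectivity_loss, p_misconfiguration)

print(f"P(service failure) = {p_service_failure:.6f}")
```

With these inputs the top-event probability is about 0.0025. The tree here has only two gates; the scalability issue noted on the slide arises because real trees grow quickly as more components and challenges are modelled.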

  8. Need for on-line impact measures for automated mitigation
     [Figure: process diagram. Recoverable stage labels: Detect Symptoms, Identify, Understand, Determine Impact, Mitigate. Annotations: "Detection of challenge symptoms"; "Understand root cause"]
     • Symptoms, e.g., service failures, network congestion, poor performance, anomalous traffic
     • Impact, e.g., monetary cost, resource utilisation, effect on other services, loss of data

  9. Conclusions
     • To make best use of limited resources, we need to determine the high-impact challenges
     • Getting good numbers for challenge probabilities and their impact is hard
     • Some on-line components are necessary, which has implications for system design
     • For more information see www.resumenet.eu
