Empirical Studies in Cybersecurity: Some Challenges
Michel Cukier
Adding Science to Cybersecurity • Empirical studies are needed to add science to cybersecurity • Challenges: – Security metrics are lacking – Security data are not publicly available
Availability of Security Data • The few available datasets have issues (e.g., MIT LL 98/99) • NSF helped initiate collaborations, but none succeeded (2001) • NSF workshop on the lack of available data (2010) • DHS PREDICT dataset: – Context is missing – More datasets will be added over time
The End?
A Rare Collaboration • Unique relationship with – G. Sneeringer, Director of Security, and his security team at the Office of Information Technology • Access to security-related data collected on the UMD network • Development of testbeds for monitoring attackers • Enables unique empirical studies
Incident Data • Incidents: – Confirmed compromised computers – More than 12,000 records since June 2001 • Models: – Software reliability growth models, time series, epidemiological models • Questions: – # incidents: relevant metric? – Impact of time (age, duration)?
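To make the modeling bullet concrete, here is a minimal sketch of fitting one of the listed model families, a Goel-Okumoto software reliability growth model, to cumulative incident counts. The monthly counts below are invented for illustration; the actual UMD incident records are not public.

```python
# A minimal sketch of fitting a Goel-Okumoto software reliability growth
# model, m(t) = a * (1 - exp(-b * t)), to cumulative incident counts.
# The data are hypothetical, not the study's incident records.
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    """Expected cumulative number of incidents by time t."""
    return a * (1.0 - np.exp(-b * t))

# Hypothetical monthly cumulative incident counts.
months = np.arange(1, 13)
cumulative_incidents = np.array([40, 75, 105, 130, 150, 167,
                                 181, 193, 203, 211, 218, 224])

(a_hat, b_hat), _ = curve_fit(goel_okumoto, months, cumulative_incidents,
                              p0=(250.0, 0.1))
print(f"estimated total incidents a = {a_hat:.1f}, rate b = {b_hat:.3f}")
```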
Intrusion Prevention System (IPS) Data • Intrusion Prevention System (IPS) alerts: – IPSs located at the border and inside UMD network – More than 7 million events since September 2006 • Models: – Identify outliers, define metrics containing some memory • In-house validation
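One common way to "define metrics containing some memory" is an exponentially weighted moving average (EWMA). The sketch below flags outlier days whose alert counts deviate strongly from the running average; the smoothing factor `alpha` and threshold `k` are illustrative assumptions, not values from the study.

```python
# A minimal sketch of a metric with memory: an EWMA of daily alert
# counts plus a running EWMA variance, flagging days that deviate by
# more than k standard deviations. Parameters are illustrative only.
def ewma_outliers(daily_counts, alpha=0.2, k=3.0):
    """Flag days whose alert count deviates strongly from the EWMA."""
    ewma = daily_counts[0]
    var = 0.0
    outliers = []
    for day, count in enumerate(daily_counts[1:], start=1):
        deviation = count - ewma
        if var > 0 and abs(deviation) > k * var ** 0.5:
            outliers.append(day)
        # Update the running mean and variance with exponential decay.
        ewma += alpha * deviation
        var = (1 - alpha) * (var + alpha * deviation ** 2)
    return outliers

print(ewma_outliers([120, 130, 125, 128, 900, 131, 127]))   # -> [4]
```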
Network Flows • Network flows: – 130,000 IP addresses monitored (two class B networks belonging to UMD) • Tool: – Goal: increase network visibility – Nfsight (available on sourceforge) • In-house validation • Next goal: – An efficient flow-based IDS
Backend Algorithm
• Example flows:
  – Request flow: 2009-07-30 09:34:56.321 TCP 10.0.0.1:2455 → 10.1.2.3:80
  – Reply flow: 2009-07-30 09:34:56.322 TCP 10.1.2.3:80 → 10.0.0.1:2455
  – Resulting bi-flow: 2009-07-30 09:34:56.321 TCP 10.0.0.1:2455 → 10.1.2.3:80 (10.0.0.1 is the client; 10.1.2.3 hosts the service on tcp/80)
• Algorithm (see the sketch below):
  – Receive a batch of 5 minutes of flows
  – Pair up unidirectional flows using {src/dst IP/port and protocol}
  – Run heuristics and calculate probabilities for each endpoint to host a service
  – Output endpoint results and bidirectional flows
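A minimal sketch of the pairing step in Python, under the assumption that two unidirectional flows are matched when one flow's 5-tuple is the mirror image of the other's. The `Flow` record fields are invented for illustration and do not reflect Nfsight's actual internal format.

```python
# A minimal sketch of pairing unidirectional flows into bi-flows within
# one 5-minute batch: a flow matches an earlier flow whose 5-tuple is
# its mirror image, and the earlier flow is treated as the request.
from collections import namedtuple

Flow = namedtuple("Flow", "ts proto src_ip src_port dst_ip dst_port")

def pair_flows(batch):
    """Pair unidirectional flows in one batch into (request, reply) bi-flows."""
    pending = {}   # unmatched flows keyed by their own 5-tuple
    biflows = []
    for f in sorted(batch, key=lambda f: f.ts):
        key = (f.proto, f.src_ip, f.src_port, f.dst_ip, f.dst_port)
        mirror = (f.proto, f.dst_ip, f.dst_port, f.src_ip, f.src_port)
        if mirror in pending:
            # The earlier flow is the request, this one the reply.
            biflows.append((pending.pop(mirror), f))
        else:
            pending[key] = f
    return biflows, list(pending.values())   # pairs + unmatched flows

request = Flow("09:34:56.321", "TCP", "10.0.0.1", 2455, "10.1.2.3", 80)
reply   = Flow("09:34:56.322", "TCP", "10.1.2.3", 80, "10.0.0.1", 2455)
pairs, unmatched = pair_flows([request, reply])
print(pairs[0][0].src_ip, "->", pairs[0][0].dst_ip)   # 10.0.0.1 -> 10.1.2.3
```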
Heuristics

Heuristic ID   Features and Formula Used                    Output Values
Timing:
Heuristic 0    Timestamp of request < timestamp of reply    [0, …]
Port numbers:
Heuristic 1    Src port > Dst port                          {0, 0.5, 1}
Heuristic 2    Src port > 1024 > Dst port                   {0, 0.5, 1}
Heuristic 3    Port in /etc/services                        {0, 0.5, 1}
Fan in/out relationships:
Heuristic 4    # ports related                              [0, …]
Heuristic 5    # IPs related                                [0, …]
Heuristic 6    # tuples related                             [0, …]
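The port-number heuristics (1-3) translate directly into code. The sketch below scores how likely the destination endpoint of a bi-flow is the server side; the exact mapping to the {0, 0.5, 1} outputs and the short stand-in for /etc/services are assumptions for illustration.

```python
# A minimal sketch of heuristics 1-3 from the table, each scoring how
# likely the destination endpoint of a bi-flow hosts the service.
# 0.5 encodes "no evidence either way".
WELL_KNOWN = {21, 22, 25, 53, 80, 443}   # stand-in for /etc/services

def h1_port_order(src_port, dst_port):
    """Heuristic 1: servers tend to use the lower port number."""
    if src_port == dst_port:
        return 0.5
    return 1.0 if src_port > dst_port else 0.0

def h2_ephemeral(src_port, dst_port):
    """Heuristic 2: client ports are usually ephemeral (> 1024)."""
    if src_port > 1024 > dst_port:
        return 1.0
    if dst_port > 1024 > src_port:
        return 0.0
    return 0.5

def h3_known_service(src_port, dst_port):
    """Heuristic 3: a port listed in /etc/services suggests a server."""
    src_known, dst_known = src_port in WELL_KNOWN, dst_port in WELL_KNOWN
    if dst_known and not src_known:
        return 1.0
    if src_known and not dst_known:
        return 0.0
    return 0.5

scores = [h(2455, 80) for h in (h1_port_order, h2_ephemeral, h3_known_service)]
print(scores, "->", sum(scores) / len(scores))   # [1.0, 1.0, 1.0] -> 1.0
```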
Front-end
Case Study: Scanning Activity
Case Study: Worm Outbreak
Case Study: Distributed Attacks
Honeypot (HP) Data • Honeypot data: – Malicious activity collected on more than 1,200 HPs (low and high interaction) – Low interaction HPs deployed at UIUC, AT&T, and PJM, and in France and Morocco – High interaction HPs for the study of attacks/attackers
Details of Experiment • Easy access to honeypots through entry point: SSH • Multiple honeypots per attacker for an extended period of time: one month • Configure honeypots given to one attacker with increasing network limitations: some ports blocked • Collect data such as network traffic, keystrokes entered, and rogue software downloaded
Configuration Details • The network gateway has two network interfaces: – One in front of the Internet, configured with 40 public IP addresses from the University of Maryland – One configured with a private IP address • OpenSSH was modified to reject SSH attempts on its public IP addresses until the 150th try (see the sketch below) • Up to 40 honeypots can exist in parallel • Attackers can deploy up to 3 honeypots • Honeypots: – HP1: no network limitation – HP2: main IRC port blocked (port 6667) – HP3: every port blocked except HTTP, HTTPS, FTP, DNS, and SSH
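A minimal sketch of the "reject until the 150th try" idea. The real modification was made inside OpenSSH itself (C code); this counter is only an illustration of the logic, and counting per source IP rather than globally is an assumption.

```python
# A minimal sketch of gating SSH logins: reject every attempt from a
# source until it has made 150 tries, simulating a brute force that
# eventually succeeds. Per-source-IP counting is an assumption.
from collections import defaultdict

ATTEMPT_THRESHOLD = 150
attempts = defaultdict(int)

def allow_ssh_attempt(src_ip):
    """Return True once this source has reached the attempt threshold."""
    attempts[src_ip] += 1
    return attempts[src_ip] >= ATTEMPT_THRESHOLD

for _ in range(ATTEMPT_THRESHOLD):
    allowed = allow_ssh_attempt("203.0.113.7")
print(attempts["203.0.113.7"], allowed)   # 150 True
```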
Test-bed Architecture
Attacker Identification • Attacker IP address • Attacker AS number (identifies network on the Internet) • Attacker actions: – Rogue software origin – Way of performing specific actions – Files accessed • Comparison of keystroke profiles
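One simple way to compare keystroke profiles is to treat each attacker session as a bag of commands and compute cosine similarity between command-frequency vectors, as sketched below. This representation is an assumption for illustration, not necessarily the comparison used in the study.

```python
# A minimal sketch of comparing two keystroke profiles represented as
# command-frequency vectors, using cosine similarity: values near 1
# suggest the sessions may belong to the same attacker.
import math
from collections import Counter

def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two command-frequency profiles."""
    common = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in common)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

session_1 = Counter(["w", "uname -a", "wget", "tar", "./run", "passwd"])
session_2 = Counter(["w", "uname -a", "wget", "tar", "./go", "passwd"])
print(f"{cosine_similarity(session_1, session_2):.2f}")   # 0.83
```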
Attacker Skills • An analyst assesses each attacker's skill • This approach was preferred because it is easier to reproduce • Criteria based on: – Is the attacker careful about not being seen? – Does the attacker check the target environment? – How familiar is the attacker with the rogue software? – Is the attacker protecting the compromised target?
Attacker Skills (Cont.)

Criterion                  Assessment
Hide                       Ratio of # sessions where the attacker hid
Restore deleted files      Ratio of # sessions where deleted files were restored
Check presence             Ratio of # sessions where presence was checked
Delete downloaded file     0 if the downloaded file is not deleted, 1 otherwise
Check system               0 if the system has never been checked, 1 otherwise
Edit configuration file    0 if a configuration file has never been edited, 1 otherwise
Change system              0 if the system has never been modified, 1 otherwise
Change password            0 if the password has never been changed, 1 otherwise
Create new user            0 if no new user has been created, 1 otherwise
Rogue software adequacy    0 if less than half of the installed rogue software is adequate, 1 otherwise
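A minimal sketch of how the ten criteria could be combined into the single skill level used in the following slides, assuming an equal-weight sum (ratio criteria contribute fractionally, binary flags 0 or 1). The slides do not give the exact aggregation formula, so this is an illustration only.

```python
# A minimal sketch of aggregating the ten criteria into one skill
# score by summing them. Equal weighting is an assumption.
def skill_score(criteria):
    """Sum of the ten per-attacker criterion values."""
    return sum(criteria.values())

attacker = {
    "hide": 0.8,                     # hid in 80% of sessions
    "restore_deleted": 0.5,          # restored files in 50% of sessions
    "check_presence": 1.0,           # checked presence in every session
    "delete_downloaded": 1,
    "check_system": 1,
    "edit_config": 0,
    "change_system": 1,
    "change_password": 1,
    "create_user": 0,
    "rogue_software_adequacy": 1,
}
print(f"skill level = {skill_score(attacker):.1f}")   # 7.3
```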
Overall Results
• Experiment run from May 17, 2010 to November 5, 2010

Honeypot   # sessions   # non-empty sessions
All        312          211 (68%)
HP1        160          110 (69%)
HP2        105          74 (70%)
HP3        47           27 (57%)
Who Launched the Attacks?
• Top countries for brute-force attempts (based on AS number): China (34), USA (27), Korea (8), Italy (7)
• Top countries for compromises (based on IP address): Romania (75), Lebanon (32), USA (24), UK (16)
Analysis as a Function of Attacker Skill
[Figure: bar chart of the percentage of attackers meeting each criterion (criterion IDs 1-10), over all honeypots]
• Results:
  – 95% check presence or system
  – 79% delete the downloaded file
  – 77% change the password
  – 15% create a new user
• There might be a link between attackers' actions and their skills
Analysis as a Function of Attacker Skill (Cont.)
[Figure: histograms of the percentage of attackers at each skill level (0-9) for (a) create new user, average skill level = 7.7, and (b) hide, average skill level = 6.3]
Analysis as a Function of Attacker Skill (Cont.)
[Figure: histograms of the percentage of attackers at each skill level (0-9) for password change, average skill level = 6.0, and check presence, average skill level = 5.5 (panels (c) and (d))]
Why Was the Attack Launched?
[Figure: bar chart of the average number of attackers per honeypot type (HP1, HP2, HP3, All); values between 1 and 1.44]
• For the 60 deployed honeypots, 9 (15%) were targeted by more than one attacker
• 7 honeypots were targeted by 2 different attackers, one honeypot by 3 different attackers, and 1 honeypot by 5 different attackers
• Raises the important issue of how access is shared and why
• Even though 77% of the attackers changed the password, 15% did share access with at least 1 other attacker
Challenges • Generalization? – Replication (same method) – Reproduction (different method) – Re-analysis of data • Issues: – Need collaborations for replication – Need to develop a new method for reproduction – Re-analysis might not be possible
The End?
Theories from Social Sciences to Add Science to Cybersecurity • For the last year: – Focus on criminological theories – Collaboration with David Maimon and his research team • Consider various criminological theories • Identify theories that need to be adapted to cybersecurity
New Use of IPS Alerts • Application to Routine Activity Theory (RAT): – Crime is normal and depends on the opportunities available – If a target is not protected enough, and if the reward is worth it, crime will happen • Alerts = Attack attempts (blocked by IPS) • Results: – Number of alerts is linked to daily activity – Origin of attack is linked to user origin
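The "number of alerts is linked to daily activity" result suggests a simple check: correlate hourly alert counts with a proxy for routine campus activity, in the spirit of Routine Activity Theory. Both hourly series in this sketch are invented for illustration, not the study's data.

```python
# A minimal sketch of testing whether attack attempts track routine
# activity: Pearson correlation between hourly users online (a proxy
# for campus activity) and hourly IPS alert counts. Data are invented.
import numpy as np

users_online = np.array([ 50,  40,  30,  25,  25,  40, 120, 400,
                         900, 1200, 1300, 1250, 1100, 1200, 1250, 1150,
                         900, 600, 450, 400, 350, 250, 150,  80])
alerts       = np.array([ 12,  10,   8,   7,   8,  11,  25,  70,
                         140, 180, 195, 190, 170, 185, 190, 175,
                         145, 100,  80,  75,  66,  50,  33,  20])

r = np.corrcoef(users_online, alerts)[0, 1]
print(f"correlation between activity and alerts: r = {r:.2f}")
```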
Use of Honeypot Data • Describe attacker/attack: – Network data – Attacker keystrokes • Empirical study: – Effect of warnings – Various HP configurations (CPU, memory, disk space)
Issues • Mismatch between what criminological theories need and what HP data contain • Need statistically significant results (e.g., 6 months, over 120 HPs/week deployed, about 2,900 HPs, 3,700 sessions) • Experiments need to be deployed over a long period of time: attacks/attackers might evolve
Some Good News • Empirical studies are solid scientific work • Developed approaches can be applied at other locations • Results do not need to be identical (e.g., crime varies between cities)
The End!