PAM 2004: A Robust Classifier for Passive TCP/IP Fingerprinting (typeset by FoilTeX)

A Robust Classifier for Passive TCP/IP Fingerprinting
Rob Beverly, MIT CSAIL (rbeverly@csail.mit.edu)
PAM 2004, April 20, 2004

Outline
• A Robust Classifier for Passive TCP/IP Fingerprinting
  – Background
  – Motivation
  – Our Approach/Description of Tool
  – Application 1: Measuring an Exchange Point
  – Application 2: NAT Inference
  – Conclusions
  – Questions?

Background
• Objective: Identify Properties of a Remote System over the Network
• Grand Vision: Passively Determine the TCP Implementation in Real Time [Paxson 97]
• Easier: Passively Identify the Remote Operating System/Version → “Fingerprinting”
• What’s the Motivation?

Motivation
• Fingerprinting Is Often Regarded as a Security Attack
• Fingerprinting as a Tool:
  – In Packet Traces, Distinguish Effects Due to the OS from Effects Due to the Network Path
  – Intrusion Detection Systems [Taleck 03]
  – Serving OS-Specific Content
• Fingerprinting a Section of the Network:
  – Provides a Unique Cross-Sectional View of Traffic
  – Building Representative Network Models
  – Inventory

Motivation Cont’d
• We Select Two Applications:
  – Characterizing One Hour of Traffic from a Commercial Internet Exchange Point (USA)
  – Inferring NAT (Network Address Translation) Deployment
• More on These Later...

TCP/IP Fingerprinting Background
• Observation: TCP Stacks Differ between Vendors and OS Versions
• Differences Due to:
  – Features
  – Implementation
  – Settings, e.g. Socket Buffer Sizes
• Two Ways to Fingerprint:
  – Active
  – Passive

TCP/IP Fingerprinting Cont’d
• Active Fingerprinting
  – A “Probe” Host Sends Traffic to a Remote Machine
  – Scans for Open Ports
  – Sends Specially Crafted Packets
  – Observes the Response and Matches It against a List of Response Signatures
[Figure: an active probe host sends Probe 1 and Probe 2 to a remote machine and receives Reply 1]

TCP/IP Fingerprinting Cont’d
• Passive Fingerprinting
  – Assumes the Ability to Observe Traffic
  – Makes a Determination Based on the Normal Traffic Flow
[Figure: a passive monitor with a classifier observes traffic between hosts A and B]

Active vs. Passive Fingerprinting
• Active Fingerprinting
  – Advantages: Can Be Run Anywhere, Adaptive
  – Disadvantages: Intrusive, Detectable, Not Scalable
  – Tool: nmap, with a Database of ∼450 Signatures
• Passive Fingerprinting
  – Advantages: Non-Intrusive, Scalable
  – Disadvantages: Requires an Acceptable Monitoring Point
  – Tool: p0f, which Relies Exclusively on SYN Uniqueness
• We Want to Fingerprint All Traffic on a Busy, Representative Link
• Therefore, Use Passive Fingerprinting

Robust Classifier
• Passive Rule-Based Tools on Exchange Point Traces:
  – Fail to Identify up to ∼5% of Trace Hosts
• Problems:
  – TCP Stack “Scrubbers” [Smart et al. 00]
  – TCP Parameter Tuning
  – Signatures Must Be Updated Regularly
• Idea: Use Statistical Learning Methods to Make a “Best Guess” for Each Host

Robust Classifier Cont’d
• Created Classifier Tool:
  – Naive Bayesian Classifier
  – Maximum-Likelihood Inference of Host OS
  – Each Classification Has a Degree of Confidence
• Difficult Question: How to Train the Classifier?
• Train the Classifier Using:
  – p0f Signatures (∼200)
  – Web Logs
  – Special Collection Web Page + Altruistic Users

Robust Classifier Cont’d
• Question: Why Not Measure the OS Distribution Using, e.g., Web Logs?
  – Want a General Method, Not an HTTP-Specific One
  – Avoid Deep-Packet Inspection
  – Web Browsers Can Lie, for Anonymity and Compatibility

Robust Classifier Cont’d
• Inferences Are Made Based on the Initial SYN of the TCP Handshake
• Fields with Differentiation Power:
  – Originating TTL (0-255, as the Packet Left the Host)
  – Initial TCP Window Size (bytes)
  – SYN Size (bytes)
  – Don’t Fragment Bit (on/off)
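A minimal sketch of a naive Bayesian classifier over the four SYN fields listed above. The add-one smoothing, the class and field names, and the three training signatures are illustrative assumptions, not the tool's actual code or training data; the sketch only shows how a most-likely OS plus a confidence can be produced even when no training signature matches exactly.

```python
import math
from collections import Counter, defaultdict

# SYN fields used as features (TTL assumed already rounded up to the originating TTL).
FIELDS = ("ttl", "window", "syn_size", "df")


class NaiveBayesFingerprinter:
    """Discrete naive Bayes over SYN header fields (add-one smoothing is an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # OS label -> number of training SYNs
        self.field_counts = defaultdict(Counter)  # (OS label, field) -> value counts
        self.field_values = defaultdict(set)      # field -> values seen in training

    def train(self, samples):
        """samples: iterable of (os_label, {field: value}) pairs."""
        for os_label, feats in samples:
            self.class_counts[os_label] += 1
            for f in FIELDS:
                self.field_counts[(os_label, f)][feats[f]] += 1
                self.field_values[f].add(feats[f])

    def classify(self, feats):
        """Return (most probable OS label, its normalized posterior as confidence)."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for os_label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for f in FIELDS:
                k = len(self.field_values[f]) + 1  # +1 reserves mass for unseen values
                c = self.field_counts[(os_label, f)][feats[f]]
                score += math.log((c + self.alpha) / (n + self.alpha * k))
            log_scores[os_label] = score
        # Convert log scores to posteriors without overflow.
        top = max(log_scores.values())
        weights = {o: math.exp(s - top) for o, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


# Illustrative training signatures (hypothetical, not the p0f database).
TRAINING = [
    ("FreeBSD", {"ttl": 64, "window": 65535, "syn_size": 60, "df": True}),
    ("Linux", {"ttl": 64, "window": 5840, "syn_size": 60, "df": True}),
    ("Windows", {"ttl": 128, "window": 64240, "syn_size": 48, "df": True}),
]

clf = NaiveBayesFingerprinter()
clf.train(TRAINING)
# A "tuned" FreeBSD SYN whose size matches no training signature exactly.
label, conf = clf.classify({"ttl": 64, "window": 65535, "syn_size": 70, "df": True})
print(label, round(conf, 3))  # FreeBSD 0.571
```

With uniform priors this reduces to picking the maximum-likelihood OS; the unseen SYN size lowers the confidence rather than producing "Unknown".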

Robust Classifier Cont’d
• Originating TTL:
  – Round Up to the Next Power of 2
  – Example: Monitor Observes a Packet with TTL=59; Infer an Originating TTL of 64
• Initial TCP Window Size Can Be:
  – Fixed
  – A Function of the MTU (Maximum Transmission Unit) or MSS (Maximum Segment Size)
  – Other
• Initial TCP Window Size Matching:
  – No Visibility into TCP Options, so the MSS Is Not Known Directly
  – For Each Common MSS (1460, 1380, 1360, 796, 536) ± IP Options Size
  – Check Whether the Window Size Is an Integer Multiple of It
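The TTL trick and the window-size matching can be sketched as follows. The cap at 255, the function names, and the 12-byte option adjustment in the last usage line are assumptions for illustration; the slide does not say which option sizes are tried.

```python
# Common MSS values from the slide; option-size adjustments are a parameter (assumption).
COMMON_MSS = (1460, 1380, 1360, 796, 536)


def infer_originating_ttl(observed_ttl):
    """Round an observed TTL up to the next power of two, capped at 255."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    ttl = 1
    while ttl < observed_ttl:
        ttl *= 2
    return min(ttl, 255)  # 256 is not a legal TTL; 255 is a common initial value


def window_as_mss_multiple(window, option_adjustments=(0,)):
    """Return (segment_size, multiple) if window is an integer multiple of a
    common MSS adjusted by -/+ one of option_adjustments; otherwise None."""
    if window <= 0:
        return None
    for mss in COMMON_MSS:
        for adj in option_adjustments:
            for seg in (mss - adj, mss + adj):
                if seg > 0 and window % seg == 0:
                    return seg, window // seg
    return None


print(infer_originating_ttl(59))      # 64
print(window_as_mss_multiple(64240))  # (1460, 44)
print(window_as_mss_multiple(65535))  # None (a fixed window)
print(window_as_mss_multiple(5792, option_adjustments=(0, 12)))  # (1448, 4)
```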

Example
Description | TTL | Win Size (bytes) | SYN Size (bytes) | DF | Conf | Rule-Based Correct | Bayesian Correct
FreeBSD 5.2 | 64 | 65535 | 60 | T | 0.988 | Y | Y
FreeBSD (1) | 64 | 65535 | 70 | T | 0.940 | N | Y
FreeBSD (2) | 64 | 65530 | 60 | T | 0.497 | N | Y
• Example 2: Tuned FreeBSD; Window Scaling Throws Off the Rule-Based Approach
    kern.ipc.maxsockbuf=4194304
    net.inet.tcp.sendspace=1048576
    net.inet.tcp.recvspace=1048576
    net.inet.tcp.rfc3042=1
    net.inet.tcp.rfc3390=1
• More Fields in the Rule-Based Approach → Fragile
• Learning on Additional Fields → More Robust
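To see why an exact-match rule table is fragile, the sketch below looks up the three FreeBSD rows of the table against a rule built from the first row (the second rule is a hypothetical placeholder). A field-agreement score is shown for contrast; it is a deliberately crude stand-in, not the Bayesian classifier, and its score is not the table's Conf value.

```python
# Exact-match rules keyed on (TTL, window, SYN size, DF). The FreeBSD 5.2 entry
# uses the first table row; the second entry is a hypothetical placeholder.
RULES = {
    (64, 65535, 60, True): "FreeBSD 5.2",
    (128, 65535, 48, True): "HypotheticalOS",
}


def rule_based_lookup(ttl, window, syn_size, df):
    """Exact match on every field; None plays the role of 'Unknown'."""
    return RULES.get((ttl, window, syn_size, df))


def best_partial_match(ttl, window, syn_size, df):
    """Crude alternative (not the Bayesian classifier): pick the signature that
    agrees on the most fields and report the fraction of fields that agree."""
    observed = (ttl, window, syn_size, df)

    def agreement(signature):
        return sum(a == b for a, b in zip(signature, observed))

    best = max(RULES, key=agreement)
    return RULES[best], agreement(best) / len(observed)


for row in [(64, 65535, 60, True), (64, 65535, 70, True), (64, 65530, 60, True)]:
    print(row, rule_based_lookup(*row), best_partial_match(*row))
```

A single changed field (SYN size 70, or window 65530) is enough to make the exact-match lookup return nothing, which mirrors the "N" entries in the Rule-Based column.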

Classifying a Cross-Section of the Internet
• Traces:
  – MIT LCS Border Router
  – NLANR MOAT
  – Commercial Internet Exchange Point Link (USA)
• Analyze a One-Hour Trace from the Exchange Point
• Collected in 2003 at 16:00 PST on a Wednesday

Classifying a Cross-Section of the Internet
• Traces:
  – Commercial Internet Exchange Point Link (USA)
[Figure: a passive monitor and classifier on the exchange point link interconnecting several autonomous systems (AS 1 through AS 4, AS N, AS M)]

Classifying a Cross-Section of the Internet
• For Brevity (and Computational Ease):
  – Group into Six Broad OS Categories
  – Measure Host, Packet and Byte Distributions
  – Using the p0f-Trained Bayesian, Web-Trained Bayesian and Rule-Based Classifiers
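One way to compute host, packet and byte distributions per OS category, assuming per-flow records of (source IP, OS label, packets, bytes). The label-to-category mapping, the one-OS-per-IP assumption and the sample flows are hypothetical; the sample is made up only to show how host and byte shares can diverge.

```python
from collections import Counter


def to_category(os_label):
    """Hypothetical mapping from a detailed OS label to one of six broad
    categories; None (a rule-based miss) maps to "Unknown"."""
    if os_label is None:
        return "Unknown"
    for prefix in ("Windows", "Linux", "Mac", "Solaris"):
        if os_label.startswith(prefix):
            return prefix
    if "BSD" in os_label:
        return "BSD"
    return "Other"


def distributions(flows):
    """flows: iterable of (src_ip, os_label, packets, bytes).
    Returns three dicts (hosts, packets, bytes) mapping category -> percent."""
    host_category = {}
    packets, octets = Counter(), Counter()
    for src_ip, os_label, n_packets, n_bytes in flows:
        category = to_category(os_label)
        host_category[src_ip] = category  # assumption: one OS per IP in the trace
        packets[category] += n_packets
        octets[category] += n_bytes
    hosts = Counter(host_category.values())

    def percent(counts):
        total = sum(counts.values())
        return {c: 100.0 * v / total for c, v in counts.items()} if total else {}

    return percent(hosts), percent(packets), percent(octets)


# Illustrative (made-up) flows, not trace data.
flows = [
    ("10.0.0.1", "Windows XP", 10, 1_000),
    ("10.0.0.2", "Windows 2000", 10, 1_000),
    ("10.0.0.3", "Linux 2.4", 20, 8_000),
]
hosts, pkts, byts = distributions(flows)
print(hosts, pkts, byts)
```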

Host Distribution
• Windows Dominates Host Count: 92.6-94.8%
[Figure: Host Distribution (59,595 unique hosts): percent of hosts per category (Windows, Linux, Mac, BSD, Solaris, Other, Unknown) for the Bayesian, WT-Bayesian (web-trained) and Rule-Based classifiers]
• Note: Unknown Applies Only to Rule-Based

Packet Distribution
• Windows: 76.9-77.8%; Linux: 18.7-19.1%
[Figure: Packet Distribution (30.7 MPackets): percent of packets per category (Windows, Linux, Mac, BSD, Solaris, Other, Unknown) for the Bayesian, WT-Bayesian and Rule-Based classifiers]

Byte Distribution
• Windows: 44.6-45.2%; Linux: 52.3-52.6%
[Figure: Byte Distribution (7.2 GBytes): percent of bytes per category (Windows, Linux, Mac, BSD, Solaris, Other, Unknown) for the Bayesian, WT-Bayesian and Rule-Based classifiers]

Byte Distribution
• Interesting Results
• Windows Dominates Hosts, but Linux Hosts Contribute the Most Traffic!
• Top 10 Largest Flows:
  – 55% of Byte Traffic!
  – 5 Linux, 2 Windows
  – Software Mirror, Web Crawlers (a Packet Every 2-3 ms)
  – SMTP Servers
  – Aggressive Pre-Fetching Web Caches
• Conclusion: Linux Dominates Traffic, Primarily Due to Server Applications in Our Traces (YMMV)

Classifying for NAT Inference
• Second Potential Application of the Classifier
• Goal: Understand NAT Prevalence in the Internet
• Motivation: “E2E-ness” of the Internet
• Assume Hosts behind an IP-Masquerading NAT Have Different OSes or OS Versions (a Strong Assumption)
• Look for Traffic from the Same IP with Different Signatures to Get a Lower Bound on NAT Deployment
• In an Hour-Long Trace, Assume the Influence of DHCP and Dual-Booting Machines Is Negligible
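A sketch of the signature-based lower bound: count distinct SYN signatures per source IP over the trace. The (source IP, signature) input format is an assumption. The ratio of inferred hosts to unique IPs corresponds to the "NAT Inflation" quantity used later in the deck.

```python
from collections import defaultdict


def nat_lower_bound(syns):
    """syns: iterable of (src_ip, os_signature) pairs from one trace.
    Returns (IPs showing more than one signature, inferred host count, unique IPs)."""
    signatures = defaultdict(set)
    for src_ip, sig in syns:
        signatures[src_ip].add(sig)
    nat_ips = sorted(ip for ip, sigs in signatures.items() if len(sigs) > 1)
    # Each distinct signature behind an IP counts as at least one host; hosts
    # sharing an OS behind the same NAT are indistinguishable, hence a lower bound.
    inferred_hosts = sum(len(sigs) for sigs in signatures.values())
    return nat_ips, inferred_hosts, len(signatures)


syns = [
    ("192.0.2.1", "Windows"),
    ("192.0.2.1", "Linux"),
    ("192.0.2.1", "Windows"),
    ("192.0.2.7", "Linux"),
]
nat_ips, hosts, ips = nat_lower_bound(syns)
print(nat_ips, hosts, ips, hosts / ips)  # ['192.0.2.1'] 3 2 1.5
```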

NAT Inference
• Existing Approaches: sflow [Phaal 03], IP ID [Bellovin 02]
• sflow:
  – Monitor Must Be before the 1st-Hop Router
  – Using the TTL Trick, Look for Unexpectedly Low TTLs (Decremented by the NAT)
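The sflow-style TTL check reduces to comparing each observed TTL with the common initial values. The set of initial TTLs below is an assumption, and the check is only meaningful when, as the slide requires, the monitor sits before the first-hop router.

```python
# Assumed set of common initial TTLs; the slide only refers to "the TTL trick".
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def hops_below_initial(observed_ttl):
    """Distance from the smallest common initial TTL >= observed_ttl."""
    if not 0 <= observed_ttl <= 255:
        raise ValueError("TTL must be in 0..255")
    return min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl) - observed_ttl


def looks_nat_decremented(observed_ttl):
    """With the monitor before the first-hop router, a host's own packets should
    arrive with an untouched initial TTL; anything lower suggests an extra hop
    (e.g. a NAT) between the host and the monitor."""
    return hops_below_initial(observed_ttl) > 0


print(looks_nat_decremented(64), looks_nat_decremented(63), looks_nat_decremented(127))
# False True True
```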

NAT Inference
• IP ID [Bellovin 02]:
  – If the IP ID Is a Sequential Counter
  – Construct IP ID Sequences
  – Coalesce, Prune with Empirical Thresholds
  – Number of Remaining Sequences Estimates the Number of Hosts
[Figure: three hosts behind a NAT emit IP ID sequences 1,2,3,4; 22,23,24,26,28; and 101,102,105,107,106, observed by a passive monitor]
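A simplified sketch of IP ID sequence matching for the packets of one public IP. The greedy closest-gap assignment, the coalescing rule, and the thresholds max_gap=10 and min_len=2 are assumptions standing in for the unspecified empirical thresholds; this is not Bellovin's exact algorithm. Zero IP IDs are skipped because a NAT may zero them when DF is set.

```python
def ipid_sequences(packets, max_gap=10, min_len=2):
    """packets: (time, ip_id) pairs seen from ONE source IP, in arrival order.
    Returns the surviving IP ID sequences; their count estimates the hosts."""
    sequences = []
    for t, ip_id in packets:
        if ip_id == 0:
            continue  # zeroed IP IDs carry no sequence information
        best, best_gap = None, None
        for seq in sequences:
            gap = (ip_id - seq[-1][1]) % 65536  # 16-bit counter wrap-around
            if 0 < gap <= max_gap and (best_gap is None or gap < best_gap):
                best, best_gap = seq, gap
        if best is None:
            sequences.append([(t, ip_id)])
        else:
            best.append((t, ip_id))

    # Coalesce: append a sequence to an earlier one if it starts after that one
    # ends, both in time and (within max_gap) in IP ID.
    sequences.sort(key=lambda seq: seq[0][0])
    merged = []
    for seq in sequences:
        for prev in merged:
            if seq[0][0] > prev[-1][0] and 0 < (seq[0][1] - prev[-1][1]) % 65536 <= max_gap:
                prev.extend(seq)
                break
        else:
            merged.append(seq)

    # Prune: short sequences are treated as noise.
    return [seq for seq in merged if len(seq) >= min_len]


# Interleaving of the three IP ID streams in the figure (timestamps = arrival order).
ids = [1, 22, 101, 2, 23, 102, 3, 24, 105, 4, 26, 107, 28, 106]
seqs = ipid_sequences(list(enumerate(ids)))
print(len(seqs), [[ip_id for _, ip_id in s] for s in seqs])
# 3 [[1, 2, 3, 4], [22, 23, 24, 26, 28], [101, 102, 105, 107]]
```

The reordered ID 106 starts a one-packet sequence that pruning discards, so the estimate is still three hosts.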

Sequence Matching Obstacles
• Question of Whether IP ID Sequence Matching Works:
  – IP ID Is Used for Reassembling Fragmented IP Packets
  – No Defined Semantics, e.g. *BSD Uses a Pseudo-Random Number Generator!
  – If the DF Bit Is Set, There Is No Need for Reassembly; NAT Sets the IP ID to 0
  – A Proper NAT Should Rewrite the IP ID to Ensure Uniqueness!
• Further, These Obstacles Will Become Significant in the Future!
• We Seek to Determine the Practical Impact of These Limitations and How Well an Alternative Approach Works in Comparison

Evaluating NAT Inference Algorithms
• To Evaluate Different NAT Inference Algorithms:
  – Gathered ∼2.5M Packets from an Academic Building (No NAT)
  – Synthesize NAT Traffic
  – Reduce the Number of Unique Addresses by Combining the Traffic of n IP Addresses into 1
• We Term n the “NAT Inflation” Factor
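The synthetic-NAT setup can be sketched as below: source addresses are grouped n at a time and rewritten to a single address, and NAT inflation is the number of hosts per observed address. The packet representation and the choice of each group's first address as the "public" one are assumptions.

```python
def synthesize_nat(packets, n):
    """Combine the traffic of every n distinct source IPs into one address.
    packets: list of dicts with a 'src' key. Returns (rewritten packets, true host count)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    sources = sorted({p["src"] for p in packets})
    # Each group of n consecutive sources is mapped onto the group's first address.
    public = {src: sources[(i // n) * n] for i, src in enumerate(sources)}
    return [dict(p, src=public[p["src"]]) for p in packets], len(sources)


def nat_inflation(host_count, packets):
    """Hosts (true or inferred) per unique source address seen in the trace."""
    return host_count / len({p["src"] for p in packets})


packets = [{"src": f"10.0.0.{i}", "ip_id": 100 * i} for i in range(1, 5)]
nat_packets, true_hosts = synthesize_nat(packets, n=2)
print(sorted({p["src"] for p in nat_packets}), nat_inflation(true_hosts, nat_packets))
# ['10.0.0.1', '10.0.0.3'] 2.0
```

An inference method is then scored by feeding its inferred host count, rather than the true one, into the same ratio and comparing the result with n.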

Evaluating NAT Inference
• Synthetic Traces Created with a NAT Inflation Factor of 2.0
• Inferred NAT Inflation:
  – IP ID Sequence Matching: 2.07
  – TCP Signature: 1.22
• IP ID Technique Works Well!
• TCP Classification Does Not Have Enough Discrimination Power

NAT Inflation in the Internet
• Results:
  – IP ID Sequence Matching: 1.092
  – TCP Signature: 1.02
• A Measurement-Based Lower Bound for Understanding NAT Prevalence in the Internet

Future
• How to Validate the Classifier’s Performance? (What’s the Correct Answer?)
• Expand Learning to Additional Fields/Properties of the Flow
• How to Properly Train the Classifier?
• Web Page (Honest Users Please!): http://momo.lcs.mit.edu/finger/finger.php
• Identifying the TCP Stack Variant (e.g. Reno, Tahoe)
