Methodology and tools to analyze DITL DNS data


  1. Methodology and tools to analyze DITL DNS data
     Sebastian Castro, secastro@caida.org, CAIDA
     9th CAIDA/WIDE workshop, January 2008

  2. Process overview
     • Data collection
     • Trace curation
     • Trace merging
     • Trace analysis
     • Plotting
     (The slide's flow diagram shows root server operators uploading traces to the OARC fileserver, curation, merging, and analysis running on an analysis server and a CAIDA box with intermediate data kept on the fileserver, results loaded into a database, and graphs/aggregated data as the final output.)

  3. Data collection
     • Done by each operator based on CAIDA recommendations
       – http://www.caida.org/projects/ditl/
     • Using tcpdump, dnscap, etc., plus helper scripts (a capture sketch follows below)
     • All traces uploaded to the OARC file server
       – All further processing is done on OARC boxes due to data access restrictions.
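
To make the collection step concrete, here is a minimal libpcap sketch of what a tcpdump- or dnscap-style capture does: full-length port-53 packets written to a pcap file. This is an illustration only, not one of the DITL helper scripts; the interface name and output filename are placeholders.

```cpp
// Minimal capture sketch: full-length port-53 packets to a pcap file,
// roughly what "tcpdump -s 0 -w out.pcap port 53" does.
// "eth0" and "ditl-capture.pcap" are placeholders.
#include <pcap.h>
#include <cstdio>

int main() {
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t* handle = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
    if (!handle) { std::fprintf(stderr, "pcap_open_live: %s\n", errbuf); return 1; }

    // Restrict the capture to DNS traffic with a BPF filter.
    struct bpf_program filter;
    if (pcap_compile(handle, &filter, "port 53", 1, PCAP_NETMASK_UNKNOWN) == -1 ||
        pcap_setfilter(handle, &filter) == -1) {
        std::fprintf(stderr, "filter: %s\n", pcap_geterr(handle));
        return 1;
    }

    pcap_dumper_t* dumper = pcap_dump_open(handle, "ditl-capture.pcap");
    if (!dumper) { std::fprintf(stderr, "%s\n", pcap_geterr(handle)); return 1; }

    // Write every matching packet to the dump file until interrupted.
    pcap_loop(handle, -1, pcap_dump, reinterpret_cast<u_char*>(dumper));

    pcap_dump_close(dumper);
    pcap_close(handle);
    return 0;
}
```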

  4. Data verification
     • Verify trace completeness and integrity
       – Are there missing pieces?
       – Truncated packets?
       – Truncated gzip files?
       – Check clock skew
       – Count DNS queries, responses, IPv4 packets, TCP, UDP, etc. (see the sketch below)
     • Select the best dataset available, in terms of coverage
       – Coverage is defined as the number of packets seen versus the number of packets expected to be seen.
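
A sketch of the counting side of verification, under simplifying assumptions: Ethernet link type, IPv4/UDP only, and plain pcap input (gzipped traces would need to be decompressed first). It tallies IPv4 packets and DNS queries versus responses using the QR bit of the DNS header; this is the idea behind the checks, not the project's actual verification tool.

```cpp
// Counting sketch: IPv4 packets and DNS queries vs. responses in a trace.
// Assumes Ethernet framing and IPv4/UDP; VLAN tags, IPv6, and TCP DNS
// are ignored for brevity.
#include <pcap.h>
#include <cstdio>
#include <cstdint>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s trace.pcap\n", argv[0]); return 1; }
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t* p = pcap_open_offline(argv[1], errbuf);
    if (!p) { std::fprintf(stderr, "%s\n", errbuf); return 1; }

    uint64_t ipv4 = 0, queries = 0, responses = 0;
    struct pcap_pkthdr* hdr;
    const u_char* pkt;
    while (pcap_next_ex(p, &hdr, &pkt) == 1) {
        if (hdr->caplen < 14 + 20 + 8 + 3) continue;      // eth + min IP + UDP + DNS flags
        if (pkt[12] != 0x08 || pkt[13] != 0x00) continue; // not IPv4
        ipv4++;
        unsigned ihl = (pkt[14] & 0x0f) * 4;              // IP header length
        if (pkt[14 + 9] != 17) continue;                  // not UDP
        if (hdr->caplen < 14 + ihl + 8 + 3) continue;     // truncated packet
        const u_char* udp = pkt + 14 + ihl;
        unsigned sport = (udp[0] << 8) | udp[1];
        unsigned dport = (udp[2] << 8) | udp[3];
        if (sport != 53 && dport != 53) continue;         // not DNS
        const u_char* dns = udp + 8;
        if (dns[2] & 0x80) responses++; else queries++;   // QR bit of DNS flags
    }
    std::printf("ipv4=%llu queries=%llu responses=%llu\n",
                (unsigned long long)ipv4, (unsigned long long)queries,
                (unsigned long long)responses);
    pcap_close(p);
    return 0;
}
```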

  5. Example of coverage
     (The slide shows a coverage plot; the image is not reproduced here.)

  6. Trace merging
     • Transform the original traces:
       – Split into homogeneous time intervals (1-hour chunks)
       – Correct clock skew (where known)
       – Translate destination addresses (see the sketch below)
         • All instances of the same root share the same IP address, so they are impossible to distinguish.
         • Some use private addresses internally.
         • Transform 192.33.4.12 (C-root) into 3.0.0.4 (3 represents C, 4 represents the instance number).
       – Filter other traffic:
         • DNS queries sent to other addresses on the same machine
         • DNS traffic generated by the machine itself (zone synchronization traffic)
         • Leave only queries
     • The goal is one file per hour with all instances included.
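
The destination-address translation can be illustrated with a small hypothetical helper: encode the root letter and instance number into a synthetic IPv4 address, so C-root instance 4 becomes 3.0.0.4. The function name is invented for illustration, and how each capture file maps to a (letter, instance) pair is assumed to come from trace metadata.

```cpp
// Hypothetical helper mirroring the rewrite described above; the real
// tool's interface may differ.
#include <cstdint>
#include <cstdio>

// A=1, B=2, C=3, ... in the first octet; instance number in the last.
uint32_t synthetic_root_addr(char root_letter, unsigned instance) {
    uint32_t letter_index = (uint32_t)(root_letter - 'A' + 1);
    return (letter_index << 24) | (instance & 0xff);
}

int main() {
    uint32_t a = synthetic_root_addr('C', 4); // C-root, instance 4
    std::printf("%u.%u.%u.%u\n",
                a >> 24, (a >> 16) & 0xff, (a >> 8) & 0xff, a & 0xff);
    // prints 3.0.0.4
    return 0;
}
```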

  7. DNS analyses
     • Analyses currently available:
       – Client and query rates per instance
       – AS/prefix coverage per instance
       – Distribution of queries by query type, both global and deaggregated by root and instance
       – Node/cloud switching per client
       – Source port distribution
       – EDNS support (by client and by query), EDNS buffer size
       – Invalid queries
       – Recursive queries
       – RFC 1918-sourced query counts

  8. Trace analysis tool
     • Reads pcap and pcap.gz files
     • Output as text files
       – SQL files to create the tables and load the data
       – Plain files with some statistics
     • Written in C/C++
     • Memory footprint: 300 MB to 6 GB
     • Uses patricia trees to implement routing-table lookups (see the sketch below)
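
To show the idea behind the patricia-tree lookup, here is an uncompressed binary-trie sketch of longest-prefix matching. A real patricia/radix tree collapses single-child chains for speed and memory; only the matching logic is kept here, and the example prefix entry is hypothetical.

```cpp
// Uncompressed binary-trie sketch of longest-prefix matching over IPv4.
#include <cstdint>
#include <cstdio>
#include <string>

struct Node {
    Node* child[2] = {nullptr, nullptr};
    bool is_prefix = false;
    std::string label;   // e.g. origin AS for this prefix
};

void insert(Node* root, uint32_t prefix, int len, const std::string& label) {
    Node* n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit]) n->child[bit] = new Node();
        n = n->child[bit];
    }
    n->is_prefix = true;
    n->label = label;
}

// Walk the trie bit by bit, remembering the deepest prefix node passed.
const Node* lookup(const Node* root, uint32_t addr) {
    const Node* best = nullptr;
    const Node* n = root;
    for (int i = 0; i < 32 && n; i++) {
        if (n->is_prefix) best = n;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    if (n && n->is_prefix) best = n;
    return best;
}

int main() {
    Node root;
    insert(&root, 0xC0210400, 24, "192.33.4.0/24 (hypothetical entry)");
    const Node* hit = lookup(&root, 0xC021040C); // 192.33.4.12
    std::printf("%s\n", hit ? hit->label.c_str() : "no match");
    return 0;   // nodes are leaked; fine for a sketch
}
```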

  9. Database
     • PostgreSQL
       – Usually one table per analysis (example below)
       – Not much work done on performance tuning
       – Gave us some problems with table access control
     • One database per dataset
       – Root traces 2007
       – Root traces 2006
       – ORSN 2007
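
As a hypothetical illustration of the one-table-per-analysis layout (the table and column names are assumed, not the project's actual schema):

```sql
-- Hypothetical shape of one analysis table: one row per instance
-- per 1-hour chunk.
CREATE TABLE query_rate (
    instance TEXT,        -- e.g. 'C-root #4'
    hour     TIMESTAMP,   -- start of the 1-hour chunk
    queries  BIGINT,
    clients  BIGINT
);
```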

  10. Data presentation
      • Some preprocessing/data aggregation done using Perl/AWK
      • Graphs generated using ploticus
      • Grouping of results could be done easily

  11. Process example: DITL 2007 analysis flow
      DNS traces (~740 GB) → data curation (weeks) → trace merging (18-30 hours) → merged traces (~160 GB) → trace analysis (2-3 days) → SQL dump (tables and data) plus text files → database loading (15-20 min) → plot & report generation (1-5 min) → PNG/EPS plots

  12. Recent improvements
      • Better performance
        – Replaced map with hash_map (unordered associative arrays) for a 40% performance gain (see the sketch below)
      • Simpler selection of which analyses to run
        – Via the command line
      • An object-oriented design
        – More organized code
        – Allows others to add analyses
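
A sketch of the container swap behind that gain: per-client query counting keyed by source address. In 2008-era GCC the hashed container was __gnu_cxx::hash_map from <ext/hash_map>; the standard equivalent, std::unordered_map, is shown here, and the counter name is invented for illustration.

```cpp
// Container-swap sketch: hashed lookups are O(1) on average vs. O(log n)
// for the ordered std::map; iteration order (which the counters do not
// need) is given up in exchange.
#include <unordered_map>
#include <cstdint>
#include <cstdio>

int main() {
    std::unordered_map<uint32_t, uint64_t> queries_per_client; // keyed by source IPv4
    queries_per_client[0xC0A80001]++;   // hypothetical client 192.168.0.1
    queries_per_client[0xC0A80001]++;
    std::printf("clients=%zu\n", queries_per_client.size());
    return 0;
}
```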

  13. What's next
      • Add new analyses:
        – Daily patterns by query type
        – Locality of queries by TLD
        – Improved criteria for invalid-query classification
        – IPv6-related traffic (queries and packets)
        – … put your desired analysis here …

  14. Conclusions
      • Having tools and procedures to collect and analyze the data makes things easier.
        – It made comparisons between 2006 and 2007 pretty straightforward.
      • The current tools cover the basics.
        – They are clearly subject to improvement and extension.
        – Performance could become an issue with larger datasets.
