The University of North Carolina at Chapel Hill
Department of Computer Science

Tracking the Evolution of Web Traffic: 1995-2003

Félix Hernández-Campos
Kevin Jeffay
F. Donelson Smith
http://www.cs.unc.edu/Research/dirt

11th ACM/IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)
Orlando, October 13th, 2003

Web Traffic Measurement and Analysis at UNC-Chapel Hill

• In 1997, populating web traffic generators for experimental networking research motivated a large-scale study of web traffic at UNC with three goals:
  – Develop a light-weight methodology
    » Based on passive measurement
    » Easy to keep models up to date
  – Replace smaller-scale, quickly aging models
    » Mah, 1995 data set
    » Crovella et al., 1995 data set (revised with 1998 data)
  – Characterize the use of the HTTP protocol
    » E.g., use of persistent connections

Web Traffic Measurement and Analysis at UNC-Chapel Hill

• Our methodology and first results were published in SIGMETRICS/Performance'01
  – What TCP/IP Protocol Headers Can Tell Us About the Web
• Modeling aspect explored in a series of papers
  – E.g., Variable Heavy Tails in Internet Traffic (with J.S. Marron)
    » (Part I: Understanding Heavy Tails published in MASCOTS'02)
• In this talk, I will describe our approach and our observations on the evolution of web traffic:
  – Three data sets: 1999, 2001 and 2003
  – Comparisons to Mah and Crovella et al.

Methodology
Study of Web Content Consumers

[Diagram: web clients at the University of North Carolina at Chapel Hill send HTTP requests to web servers on the Internet, which return HTTP responses.]

• We studied a large collection of users (~35,000) as web content consumers
• Packet header traces were the only source of data for our study
  – Anonymized IP addresses
  – No HTTP headers
Methodology
One-Way Packet Header Traces

[Diagram: a traffic monitor (tcpdump) on the Gigabit Ethernet link between web clients at the University of North Carolina at Chapel Hill and web servers on the Internet.]

• Only inbound TCP/IP headers are captured
  – Eliminate synchronization and buffering issues on the NIC
  – Reduce trace size
• Trace collection: 2.7 TB of packet headers
  – ~40 billion packets
  – ~16 TB of data transfers

Methodology
Processing Sequence Overview

[Diagram: processing pipeline. tcpdump trace (raw TCP/IP headers) → Filter & Sort (Port 80) → TCP connections → Connection-level Analysis → HTTP Req/Rsp exchanges → Client-level Analysis → HTTP client behavior → Statistical Analysis.]

TCP/IP Headers and HTTP
Request/response Exchange

[Diagram: segments exchanged between Web Client (UNC) and Web Server (Internet).]

  Client → Server   SYN
  Server → Client   SYN-ACK
  Client → Server   ACK
  Client → Server   DATA      seqno 305    ackno 1       (HTTP request, 304 bytes)
  Server → Client   ACK       seqno 1      ackno 305
  Server → Client   DATA      seqno 1461   ackno 305     (HTTP response, 2875 bytes)
  Server → Client   DATA      seqno 2876   ackno 305
  Client → Server   ACK       seqno 305    ackno 2876
  Client → Server   FIN
  Server → Client   FIN-ACK
  Server → Client   FIN
  Client → Server   FIN-ACK

TCP/IP Headers and HTTP
Server-to-client Segments Only

[Diagram: the same connection, showing only the segments visible in the one-way trace.]

  Server → Client   SYN-ACK
  Server → Client   ACK       seqno 1      ackno 305     (ackno increased: HTTP request, 304 bytes)
  Server → Client   DATA      seqno 1461   ackno 305
  Server → Client   DATA      seqno 2876   ackno 305     (seqno increased: HTTP response, 2875 bytes)
  Server → Client   FIN
  Server → Client   FIN-ACK
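The inference illustrated on the last slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the `Segment` type and the `infer_exchanges` function are hypothetical names, sequence and acknowledgment numbers are assumed to be relative (starting at 1, as labeled on the slide), and the sketch ignores retransmissions, reordering, sequence-number wraparound, pipelined requests, and the sequence number consumed by FIN. The idea is that, seen from the server-to-client direction only, an advance in ackno means the client sent request bytes, and an advance in seqno means the server sent response bytes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One server-to-client TCP segment (hypothetical record type)."""
    seqno: int  # relative sequence number, as labeled on the slide
    ackno: int  # relative acknowledgment number


def infer_exchanges(segments: List[Segment],
                    init_seq: int = 1,
                    init_ack: int = 1) -> List[Tuple[int, int]]:
    """Return (request_bytes, response_bytes) pairs for one connection.

    Assumes segments are in order and free of retransmissions.
    """
    exchanges = []
    last_seq, last_ack = init_seq, init_ack
    req = rsp = 0
    for seg in segments:
        ack_advance = seg.ackno - last_ack
        seq_advance = seg.seqno - last_seq
        if ack_advance > 0:
            # New request bytes after response data: the previous
            # exchange is complete (persistent connection).
            if rsp > 0:
                exchanges.append((req, rsp))
                req = rsp = 0
            req += ack_advance
            last_ack = seg.ackno
        if seq_advance > 0:
            rsp += seq_advance
            last_seq = seg.seqno
    if req or rsp:
        exchanges.append((req, rsp))
    return exchanges


# The server-to-client segments from the slide (SYN-ACK, ACK, DATA, DATA).
slide_segments = [
    Segment(seqno=1, ackno=1),
    Segment(seqno=1, ackno=305),
    Segment(seqno=1461, ackno=305),
    Segment(seqno=2876, ackno=305),
]
print(infer_exchanges(slide_segments))  # [(304, 2875)]
```

On the slide's example, the ackno advance from 1 to 305 yields the 304-byte request and the seqno advance from 1 to 2876 yields the 2875-byte response. A second ackno advance after response data would start a new exchange on the same connection, which is how a persistent connection would appear under these assumptions.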