An Empirical Model of HTTP Network Traffic
  Bruce A. Mah
  bmah@ca.sandia.gov
  University of California at Berkeley and Sandia National Laboratories
  IEEE INFOCOM ’97, 10 April 1997
  Last Change: April 4, 1997

Motivation
  HTTP dominates wide-area Internet traffic
    Leading contributor to byte and packet counts across the NSFNET backbone as early as April 1995
  A synthetic workload of this application is needed
    Network simulators
    Benchmarks

Synopsis
  We have constructed a synthetic workload of HTTP network activity based on traffic traces.
  This model is consistent with prior Web measurement studies.

Overview
  Prior Work
  Model Components
  Methodology
  Measurements

Prior Work
  Measurement Studies
    Server logs
      Only measure activity at one server
    Client logs
      Require instrumented clients
    Static Document surveys
    Document indices
      Don’t capture user reference patterns
  Synthetic Workloads
    tcplib
      Predates the World Wide Web

Model Components
  [Figure: client-server diagram labeling the model components]
  Request Sizes (primary, secondary)
  Reply Sizes (primary, secondary)
  Document Length
  User Think Time
  Consecutive Documents Per Server
  Server Selection
  Each component defined by a probability distribution

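Since each component is defined by a probability distribution, a simulator can draw values from it by inverse-transform sampling over the measured values. The following is a minimal sketch of that idea, not the author's C++ implementation. The class name `EmpiricalDistribution` and the sample reply sizes are hypothetical.

```python
import bisect
import random


class EmpiricalDistribution:
    """Samples from observed values via their empirical CDF (hypothetical helper)."""

    def __init__(self, samples):
        if not samples:
            raise ValueError("need at least one sample")
        self.values = sorted(samples)

    def cdf(self, x):
        # Fraction of observations less than or equal to x.
        return bisect.bisect_right(self.values, x) / len(self.values)

    def sample(self, rng):
        # Inverse transform: map a uniform draw in [0, 1) to an order statistic.
        n = len(self.values)
        index = min(int(rng.random() * n), n - 1)
        return self.values[index]


# Illustrative reply sizes in bytes; these are NOT values from the traces.
reply_sizes = EmpiricalDistribution([300, 1200, 2100, 7000, 18000])
rng = random.Random(0)
draws = [reply_sizes.sample(rng) for _ in range(5)]
```

One such object per model component (request size, reply size, document length, think time, and so on) would be enough to drive a simple workload generator under this assumption.
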
Methodology
  Packet Traces
    tcpdump on an Ethernet
    Easily trace many clients
    Capture protocol overheads
    Lose higher-level information (documents, cache behavior)
  Filtering
    Remove non-local clients
    Remove periodic retrievals
  Apply Heuristics
    Attempt to recover some higher-level structures
  Construct Probability Distributions

Request Sizes
  [Figure: CDF of request length in bytes (0 to 2000), primary and secondary requests]
  Request sizes have a bimodal distribution

Reply Sizes
  [Figure: CDF of reply length in bytes (0 to 30000), primary and secondary replies]
  Primary: mean 17932 bytes, median 2099 bytes
  Secondary: mean 6868 bytes, median 1985 bytes
  HTTP reply sizes have a heavy-tailed distribution
  Primary replies tend to be longer than secondary replies

Document Length
  [Figure: CDF of number of connections per document (0 to 10), one curve per idle threshold T_thresh = 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
  A timing heuristic can discriminate between documents
  80% of documents require fewer than four file transfers
  Picked idle threshold T_thresh = 1.0 seconds

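The timing heuristic can be sketched as follows. The slide does not give the exact rule, so this assumes that each connection from one client is a (start, end) pair in seconds and that a new document begins whenever the client has been idle for at least T_thresh. The function name `group_into_documents` is hypothetical.

```python
def group_into_documents(connections, t_thresh=1.0):
    """Group one client's (start, end) connections into documents.

    Assumption: a new document starts when the gap between the latest
    connection end seen so far and the next connection start is at least
    t_thresh seconds.
    """
    documents = []
    busy_until = None
    for start, end in sorted(connections):
        if busy_until is None or start - busy_until >= t_thresh:
            documents.append([])  # idle long enough: start a new document
        documents[-1].append((start, end))
        busy_until = end if busy_until is None else max(busy_until, end)
    return documents


# Illustrative connection times in seconds; not taken from the traces.
trace = [(0.0, 0.4), (0.5, 0.9), (1.2, 1.5), (10.0, 10.3)]
docs = group_into_documents(trace, t_thresh=1.0)
lengths = [len(d) for d in docs]  # connections per document
```

The list of per-document connection counts is the quantity whose CDF the Document Length slide plots for each candidate threshold.
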
User Think Time
  [Figure: CDF of user think time in seconds (0 to 2000)]
  Mean 1313 seconds, median 15 seconds

Consecutive Documents per Server
  [Figure: CDF of consecutive documents retrieved (2 to 14)]
  Mean 4.1, median 2.0
  80% of visits to a server retrieve fewer than six documents

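One way to obtain such counts is to take the sequence of servers that a client retrieved documents from and measure the length of each run of the same server. This is a hedged sketch of that bookkeeping, with a hypothetical function name and made-up server labels.

```python
from itertools import groupby


def consecutive_runs(servers):
    """Return (server, run_length) for each run of identical consecutive servers."""
    return [(server, sum(1 for _ in group)) for server, group in groupby(servers)]


# Illustrative per-document server sequence for one client.
visits = ["a", "a", "b", "a", "a", "a"]
runs = consecutive_runs(visits)
```

Each run length is one observation of "consecutive documents per server"; a return visit to the same server after going elsewhere counts as a separate run.
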
Server Selection
  Rank   #   Location
     1  43   *.cs.berkeley.edu
     2  11   *.berkeley.edu
     3   8   *.*.com
     4   7   *.*.com
     5   6   *.cs.berkeley.edu
     6   6   *.*.com
     7   6   www4.*.net
     8   6   *.cs.berkeley.edu
     9   5   www5.*.com
    10   5   www.*.com
  Four hosts in the top ten ranking are local
  Sample size seems small, Zipf’s Law approximation

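A Zipf's Law approximation gives the server at rank r a selection probability proportional to 1/r. The sketch below shows one way a simulator might use it. The exponent of 1.0, the function names, and the server labels are assumptions, not details from the slides.

```python
import random


def zipf_weights(n, exponent=1.0):
    """Normalized weights proportional to 1 / rank**exponent for ranks 1..n."""
    raw = [1.0 / rank ** exponent for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def pick_server(servers, rng, exponent=1.0):
    """Pick one server; servers must be ordered from most to least popular."""
    weights = zipf_weights(len(servers), exponent)
    return rng.choices(servers, weights=weights, k=1)[0]


# Hypothetical server names, ordered by popularity rank.
servers = ["s1", "s2", "s3", "s4"]
weights = zipf_weights(len(servers))
choice = pick_server(servers, random.Random(1))
```

Under this assumption the top-ranked server is chosen twice as often as the second and three times as often as the third.
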
Areas for Future Work
  Better Server Selection Distribution
    Filter effects caused by local servers
    Larger sample size
  Persistent-Connection HTTP
    Can’t use TCP connection sizes to determine request/reply sizes
  Correlation between model distributions
    Do users retrieve more or fewer consecutive documents from “popular” sites?
  Newer datasets

Summary
  Packet traces can help to build a model of HTTP traffic
  Characterized HTTP network traffic to build a synthetic workload
  Results consistent with prior Web measurement studies
  C++ source code available for simulators
  For more information and model data:
    http://http.cs.berkeley.edu/~bmah/Software/HttpModel