Evaluating a New Approach to Strong Web Cache Consistency with Snapshots of Collected Content ∗
Mikhail Mikhailov and Craig E. Wills
Computer Science Department, Worcester Polytechnic Institute
{mikhail,cew}@cs.wpi.edu
Presented by Craig E. Wills at the 12th International World Wide Web Conference, Budapest, Hungary, May 23, 2003
∗ Partially supported by the National Science Foundation Grant CCR-9988250.
Roadmap
• Current Practice and Previous Work
• Approach to Object Management
• Evaluation Methodology
• Results
• Conclusions and Future Work
Current Practice: Cache Control
Servers can tell caches no-cache or Expires and can provide a Last-Modified time.
Given only Last-Modified, caches use a percentage of the object's age as a freshness estimate:
• this approach is heuristic (caches may use different percentages)
• it results in many unnecessary validations (15–37% of all requests)
• clients still receive stale objects
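The age-based heuristic described on this slide can be sketched as follows. This is a minimal illustration, not the code of any particular cache: the function names and the default fraction of 0.1 are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def heuristic_freshness_lifetime(last_modified, fetched_at, fraction=0.1):
    """Estimate how long a cached copy may be served without validation.

    The lifetime is a fixed fraction of the object's age at fetch time.
    The default fraction (0.1) is an assumed value; caches differ here.
    """
    age_at_fetch = fetched_at - last_modified
    if age_at_fetch < timedelta(0):
        # Guard against clock skew between server and cache.
        age_at_fetch = timedelta(0)
    return age_at_fetch * fraction


def is_fresh(last_modified, fetched_at, now, fraction=0.1):
    """Return True while the cached copy is still within its estimated lifetime."""
    lifetime = heuristic_freshness_lifetime(last_modified, fetched_at, fraction)
    return now - fetched_at < lifetime


# Hypothetical object: last modified 10 days before it was fetched.
last_modified = datetime(2002, 6, 1, 9, 0)
fetched_at = datetime(2002, 6, 11, 9, 0)
fresh_after_12h = is_fresh(last_modified, fetched_at, fetched_at + timedelta(hours=12))
fresh_after_2d = is_fresh(last_modified, fetched_at, fetched_at + timedelta(days=2))
```

Both failure modes listed above follow from the estimate: an object that changes inside the estimated lifetime is served stale, and an object that never changes is still revalidated once the lifetime runs out.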
Previous Work
• Volumes (Krishnamurthy and Wills, WWW98; Cohen et al., SIGCOMM98; Krishnamurthy and Rexford, Addison Wesley01)
• Piggyback (In)Validation (Krishnamurthy and Wills, WWW98, USITS97)
• Server Invalidation, Leases, Volume Leases (Liu and Cao, ICDCS97; Yin et al., ICDCS98, USITS99, WWW01, TOIT02)
• Data Update Propagation (Challenger and Iyengar, USITS97, SC98, INFOCOM99, INFOCOM00)
Our Goals
• manage objects deterministically rather than heuristically
• guarantee strong cache consistency
• scale to a large number of clients (no per-client state)
• eliminate unnecessary validations and byte transfers
News Portal Example
• CO - HTML page (container): index.html
• EO1 - CSS object: main.css
• EO2 - JavaScript code: main.js
• EO3 - site logo image: logo1.gif
• EO4 - top story photo: top.photo.jpg
• EO5 - ad banner image: adbanner.gif
Foundation for Our Approach
• objects are not stand-alone; they have relationships with other objects
• objects change at different rates, i.e., they have different change characteristics
• a Web page is composed of many heterogeneous objects
• caches can contact servers on each access to fetch frequently changing objects
• servers can inform caches of updates to other objects on the same page
• servers could expose the internal structure of composite objects to clients, so clients too can operate on individual components
Object Change Characteristics
[Figure: objects classified along two axes. Vertical axis, "Changes how often?": on each access, frequently, rarely, never. Horizontal axis, "Changes predictably? (Can be managed deterministically?)": yes, no. Classes shown: Born-on-Access (BoA), Periodic, Non-Deterministic (ND), Relatively Dynamic (RDyn), Relatively Static (RSt), Static. Legend: Cacheable, Uncacheable.]
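Object Relationships
[Figure: relationships among index.html, main.css, main.js, logo1.gif, top.photo.jpeg, and adbanner.gif, with each object labeled by its change characteristic (one BoA, two St, three RSt).]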
Combining Object Change Characteristics and Relationships
• Use the retrieval of a Born-on-Access object to manage Non-Deterministic objects
• If need be, force validation of one Non-Deterministic object to manage other Non-Deterministic objects
MONARCH
Management of Objects in a Network using Assembly, Relationships, and Change cHaracteristics
• servers classify objects based on object change characteristics
• servers group related objects into volumes
• servers determine which combination describes each volume
• servers designate one object in each volume to be the manager
• servers assign all objects Content Control Commands (CCCs)
• servers and caches use CCCs to manage objects deterministically
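The idea of managing a volume through its manager object can be sketched with a small model. Everything below is hypothetical: the `Volume` and `Cache` classes, the revision counter, and the invalidation list are illustrative assumptions and do not reproduce the actual CCC format. The sketch only shows how a server can report changes while keeping per-volume state rather than per-client state.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical server-side volume: related objects plus a revision counter."""
    manager: str                                      # object designated as manager
    revision: int = 0                                 # bumped on every member update
    changed_at: dict = field(default_factory=dict)    # url -> revision of last change

    def update(self, url):
        """Record that a member object changed on the server."""
        self.revision += 1
        self.changed_at[url] = self.revision

    def invalidations_since(self, cached_revision):
        """Objects changed after the revision the cache reports holding."""
        return sorted(u for u, r in self.changed_at.items() if r > cached_revision)


class Cache:
    """Hypothetical cache that remembers the volume revision it last saw."""

    def __init__(self, objects):
        self.objects = set(objects)
        self.revision = 0

    def fetch_manager(self, volume):
        """Fetch the manager object and apply the piggybacked invalidations."""
        stale = volume.invalidations_since(self.revision)
        self.objects.difference_update(stale)
        self.revision = volume.revision
        return stale


volume = Volume(manager="index.html")
cache = Cache(["main.css", "top.photo.jpg"])
first = cache.fetch_manager(volume)      # nothing has changed yet
volume.update("top.photo.jpg")           # server-side change
second = cache.fetch_manager(volume)     # change reported with the manager
third = cache.fetch_manager(volume)      # already up to date
```

Because the cache supplies the revision it holds, the server in this sketch stores nothing about individual clients.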
Evaluation Methodology
• selected recognizable Web sites: amazon.com, boston.com, cisco.com, cnn.com, espn.com, ora.com, photo.net, slashdot.org, usenix.org, wpi.edu, yahoo.com
• collected snapshots of content (home page, static, and transient links) from each site every 15 minutes, 9am–9pm, for 14 days (June/July 2002); fetched each object for at least 1 hour
• simulated 9 scenarios for each site:
  content: home, home+static, home+static+transient
  requests: every 15min, 9am, 9am+noon+4pm+8pm
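The 9 scenarios are the cross product of the three content sets and the three request patterns. The sketch below enumerates them. Expressing request times as minutes after midnight, and counting 9pm itself as the last request of the 15-minute pattern, are assumptions made for the example.

```python
from itertools import product

CONTENT_SETS = ["home", "home+static", "home+static+transient"]

# Request times as minutes after midnight.
# Assumption: the 15-minute pattern includes both 9am and 9pm.
REQUEST_TIMES = {
    "every 15min": list(range(9 * 60, 21 * 60 + 1, 15)),
    "9am": [9 * 60],
    "9am+noon+4pm+8pm": [9 * 60, 12 * 60, 16 * 60, 20 * 60],
}

# One scenario per (content set, request pattern) pair.
scenarios = list(product(CONTENT_SETS, REQUEST_TIMES))

for content, pattern in scenarios:
    print(f"{content:24s} {pattern:18s} {len(REQUEST_TIMES[pattern])} requests/day")
```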
Policies Studied
• MONARCH (M)
• No Cache (NC) and Optimal (Opt)
• Never Validate (NV) and Always Validate (AV)
• Heuristic (H5, H10) and Current Practice (CP)
• Object and Volume Leases (OVL)
Effectiveness of the CP Policy
Requests and KB served by server
Site     Opt req   Opt KB   CP req   CP KB   NC req   NC KB
cisco    1.9       2.9      3.5      19.6    19.2     55.4
cnn*     6.3       56.1     16.4     77.6    31.4     190.8
espn*    4.3       75.3     19.4     85.4    38.7     159.5
(* - stale content served in at least one scenario)
• transfers 50–60% fewer bytes than NC
• transfers more bytes than Opt
• issues more requests than Opt
• serves stale content
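The comparison against NC can be reproduced from the table values with simple arithmetic. The sketch below assumes each policy's two numbers are (requests, KB), which matches the CNN baseline of 31.4 requests and 190.8 KB; the helper name `relative_to_nc` is illustrative.

```python
# (requests, KB) served by the server, per site and policy, from the table.
TABLE = {
    "cisco": {"Opt": (1.9, 2.9), "CP": (3.5, 19.6), "NC": (19.2, 55.4)},
    "cnn": {"Opt": (6.3, 56.1), "CP": (16.4, 77.6), "NC": (31.4, 190.8)},
    "espn": {"Opt": (4.3, 75.3), "CP": (19.4, 85.4), "NC": (38.7, 159.5)},
}


def relative_to_nc(site, policy):
    """Return (% requests, % bytes) of a policy relative to No Cache."""
    requests, kilobytes = TABLE[site][policy]
    nc_requests, nc_kilobytes = TABLE[site]["NC"]
    return 100 * requests / nc_requests, 100 * kilobytes / nc_kilobytes


for site in TABLE:
    for policy in ("Opt", "CP"):
        pct_requests, pct_bytes = relative_to_nc(site, policy)
        print(f"{site:6s} {policy:4s} {pct_requests:5.1f}% requests  {pct_bytes:5.1f}% bytes")
```

The same two percentages are the axes used when policies are plotted relative to NC.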
Comparison of Policies
[Figure: scatter plot for CNN (NC baseline: 31.4 requests, 190.8 KB). x-axis: % Requests to Server; y-axis: % Bytes Fetched; percentages are relative to NC. Policies plotted: Opt, M, CP, OVL, H5, *H10, AV, *NV.]
• all policies, even AV, offer substantial (at least 50–60%) byte savings
• H5 and H10 outperform CP but serve stale content in at least one scenario
• M, OVL, and Opt have similar performance
Server Overhead
         MONARCH                              OVL
Site     Volumes (total)   Revisions (avg)   Object Leases (avg)   Object Leases (max)
cisco    4                 121               67                    70
cnn      84                198               580                   941
espn     63                167               525                   806
• number of object leases maintained by OVL depends on the number of clients and request arrivals
• overhead of MONARCH is independent of arrival rate or number of clients
• invalidation traffic is negligible
Response Time
Based on data obtained by Krishnamurthy et al., IMW02
[Figure: bar chart of response time for a modem client (sec., axis 0 to 20) under M, CP, AV, and NC, for two groups of Web sites: boston/cnn/espn and cisco/photonet/slashdot.]
• Opt, M, and OVL map to the same bucket
• H5 and H10 map to Opt or CP
• can improve upon CP for larger, more dynamic pages
Conclusions
• MONARCH manages objects deterministically and provides strong cache consistency
• Server state maintained by MONARCH is independent of request rate or number of clients
• MONARCH outperforms heuristic policies in terms of requests and bytes
• Used snapshots of content actively collected from real Web sites to evaluate cache consistency policies
Future Work
• Content assembly: selective, personalization, URL-rewriting
• Dynamic change characteristics
• Coupling MONARCH with existing templating mechanisms
• Applying our ideas to non-HTML content: WML, MPEG-4, on-line games
• Further refine and expand the content collection methodology