View Invalidation for Dynamic Content Caching in Multitiered Architectures
K. Selçuk Candan, Divyakant Agrawal, Wen-Syan Li, Oliver Po, Wang-Pin Hsiung
NEC USA, C&C Research Labs., CA, USA
Multi-tiered architectures
• Clients do not access the database directly.
• Instead, they use applications which invoke DBMSs, or they access result caches:
  - proxy cache (A)
  - front-end cache (B)
  - edge cache (C)
  - user-side cache (D)
  - middle-tier caches (E)
Presented by K. Selçuk Candan 12/3/2002
Problem
[Figure: only the label "Users" survives]
Result caches and consistency
• Various view materialization and update management techniques have been proposed to deal with updates to the underlying data.
• These techniques guarantee that cached results are always consistent with the underlying data.
Strong consistency requirements
[Figure labels: Queries, Data Warehouse, Data, Data]
Result caches and consistency (cont.)
• Other applications do not require that caches reflect the database exactly at all times.
Relaxed consistency requirements
[Figure labels: Queries, Data Warehouse, Middle-tier Cache, Misses, Data]
Invalidation vs. view maintenance
• Result caches need all outdated results to be invalidated in a timely fashion.
Example
• Page: http://www.autobuy.com/modelinfo?car=Toyota
      select maker, model, price from Car where maker = "Toyota";
  is cached.
Example (cont.)
• If a new tuple (Toyota; Avalon; 25000) is inserted into Car, then we can either
  - recompute the new results of this query (preferably incrementally) and rerun the application to regenerate the page, or
  - purge the corresponding page from the cache; the request can still be served from the database!
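The purge option can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the in-memory `page_cache`, the `car_table` list standing in for the database, the helper names, and the Camry row are not from the slides.

```python
# Hypothetical in-memory stand-ins for the page cache and the Car table.
page_cache = {}
car_table = [("Toyota", "Camry", 20000)]  # made-up sample row


def page_url(maker):
    return f"http://www.autobuy.com/modelinfo?car={maker}"


def render_model_info(maker):
    """Stand-in for the application: runs the query and builds the page."""
    rows = [r for r in car_table if r[0] == maker]
    return "\n".join(f"{mk} {model} {price}" for mk, model, price in rows)


def get_page(maker):
    """Serve from the cache when possible; otherwise go to the database."""
    url = page_url(maker)
    if url not in page_cache:
        page_cache[url] = render_model_info(maker)
    return page_cache[url]


def insert_car(maker, model, price):
    """Apply the update, then purge the page whose query selects this maker."""
    car_table.append((maker, model, price))
    page_cache.pop(page_url(maker), None)


get_page("Toyota")                     # the page is now cached
insert_car("Toyota", "Avalon", 25000)  # purges the stale page
fresh = get_page("Toyota")             # cache miss: regenerated from the data
```

Purging does no recomputation at update time; the cost of regenerating the page is paid only if the page is requested again.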
Overinvalidation as a tool
• Overinvalidation can be used if accurate invalidation is too expensive or not feasible in a given time frame.
• Underinvalidation is not acceptable!
Invalidation is inherently cheaper than view maintenance:
• we do not need to compute all consequences of updates
• to reduce the invalidation delay, we can overinvalidate
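One simple form of overinvalidation is to purge every page that reads an updated table instead of testing each page's predicate. The sketch below is only an illustration of that trade-off; the `pages_by_table` bookkeeping, the shortened URLs, and the function names are hypothetical and not taken from the slides.

```python
# Hypothetical bookkeeping: which cached pages read which tables.
pages_by_table = {
    "Car": {"/modelinfo?car=Toyota", "/modelinfo?car=Honda"},
    "Mileage": {"/mileage?model=Avalon"},
}


def new_cache():
    """Build a cache that holds every known page (placeholder bodies)."""
    return {url: "<html>" for urls in pages_by_table.values() for url in urls}


def invalidate_exact(cache, maker):
    """Precise: purge only the page whose predicate matches the new tuple."""
    cache.pop(f"/modelinfo?car={maker}", None)


def invalidate_coarse(cache, table):
    """Overinvalidation: purge every page that reads the updated table."""
    for url in pages_by_table.get(table, ()):
        cache.pop(url, None)


exact_cache = new_cache()
invalidate_exact(exact_cache, "Toyota")   # the Honda page survives

coarse_cache = new_cache()
invalidate_coarse(coarse_cache, "Car")    # the Honda page is purged too
remaining = sorted(coarse_cache)
```

The coarse version never leaves a stale page behind, so it is safe; its cost is extra cache misses for pages that were in fact still valid.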
Query and update streams
[Figure: timeline with updates (up1, up2, up3), invalidations (inv2, inv3, inv1) and queries (q1, q2, q3, q4, q5)]
Example
• Query:
      select * from Car, Mileage where Car.maker = "Toyota" and Car.model = Mileage.model;
• New tuples:
  - ("Mitsubishi", "Galant", 23000): no additional information required
  - ("Toyota", "Avalon", 25000): additional information required
• For the second tuple, we need to check whether Car.model = Mileage.model can be satisfied using the data in the database (polling query).
Polling queries (cont.)
• Polling query that has to be answered:
      select * from Mileage where "Avalon" = Mileage.model;
• If the result of the polling query is non-empty, then the newly inserted tuple affects the query.
Key point: we only need to check for existence; we do not need to evaluate the polling query completely.
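The existence-only check can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: the `epa` column of Mileage, the sample mileage row, and the function name are invented here; only the Car/Mileage join and the two inserted tuples come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Car (maker text, model text, price integer);
    create table Mileage (model text, epa integer);  -- epa is a made-up column
    insert into Mileage values ('Avalon', 28);       -- made-up sample row
""")


def insert_affects_cached_join(conn, maker, model):
    """Decide whether a tuple inserted into Car can change the cached join."""
    if maker != "Toyota":
        # The local predicate already fails: no polling query is needed.
        return False
    # Polling query as an existence test: stop at the first matching row.
    row = conn.execute(
        "select 1 from Mileage where model = ? limit 1", (model,)
    ).fetchone()
    return row is not None


galant = insert_affects_cached_join(conn, "Mitsubishi", "Galant")
avalon = insert_affects_cached_join(conn, "Toyota", "Avalon")
```

The `limit 1` clause lets the database stop at the first match, which is all that invalidation needs.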
?: the effect of updates on join views
- no distinction between deleted or inserted tuples
- no need to evaluate the entire ?
Challenges in calculating ?
[Formula not recoverable; its terms were annotated "not available!!!" and "available from the update logs"]
• snapshot-based: a copy of the database is maintained
• synchronous: a single copy is maintained
  - the copy is locked during invalidation
• asynchronous: a single copy is maintained
  - no locking is used
Snapshot-based approach (new and old versions are available)
Results
• Snapshot-based approach
  - no over- or under-invalidation
  - replication overhead
Synchronous approach (only the new version is available)
- the old version of the database is not available, which leads to OVERINVALIDATION
Results (cont.)
• Synchronous approach
  - when there are more than two relations, unrecoverable over-invalidation is possible
  - locking overhead
Asynchronous approach (neither the old nor the new version is available)
Results (cont.)
• Asynchronous approach
  - when there are more than two relations, unrecoverable under-invalidation is possible
  - no overhead
Efficiency: consolidated invalidation
[Figure: timeline; only the axis label TIME survives]
Consolidated invalidation
[Figure sequence not recoverable]
Consolidation versus individual invalidation
• Individual invalidation: [cost formula not recoverable]
  - [symbol lost] is the average top-1 retrieval cost
  - [symbol lost] is the number of queries
• Consolidated invalidation: [cost formula not recoverable]
  - [symbol lost] is the total size of ?
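Since the cost formulas did not survive, the contrast can only be sketched under assumptions. The Python example below shows one plausible reading: individual invalidation issues one existence-only polling query per cached query instance, while consolidated invalidation joins the whole batch of updates against the database once. The `DeltaCar` table, the `epa` column, the Honda and Ford entries, and all sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Mileage (model text, epa integer);   -- epa is made up
    insert into Mileage values ('Avalon', 28), ('Civic', 35);
    -- Hypothetical delta table holding the batch of tuples inserted into Car.
    create table DeltaCar (maker text, model text, price integer);
    insert into DeltaCar values
        ('Toyota', 'Avalon', 25000),
        ('Mitsubishi', 'Galant', 23000),
        ('Honda', 'Civic', 18000);
""")

# Makers for which a join page of the same query type is currently cached.
cached_makers = ["Toyota", "Honda", "Ford"]


def individual(conn, makers):
    """One existence-only polling query per cached query instance."""
    stale = set()
    for maker in makers:
        hit = conn.execute(
            "select 1 from DeltaCar d, Mileage m "
            "where d.maker = ? and d.model = m.model limit 1",
            (maker,),
        ).fetchone()
        if hit is not None:
            stale.add(maker)
    return stale


def consolidated(conn, makers):
    """One polling query over the whole batch, then match makers in memory."""
    affected = {
        row[0]
        for row in conn.execute(
            "select distinct d.maker from DeltaCar d, Mileage m "
            "where d.model = m.model"
        )
    }
    return affected & set(makers)


stale_individual = individual(conn, cached_makers)
stale_consolidated = consolidated(conn, cached_makers)
```

Both functions flag the same pages; the difference is that the number of database round trips grows with the number of cached queries in the first case and with the size of the update batch in the second.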
Polling query overhead
[Plot not recoverable]
Overinvalidation vs. table sizes
[Plot not recoverable]
Overinvalidation vs. update rate
[Plot not recoverable]
Conclusions
• Fast invalidation is key for caching in multi-tiered architectures
• Hard consistency is not required by many applications
  - Overinvalidation is acceptable
  - Underinvalidation is not!
• View invalidation is inherently cheaper than view maintenance
• View invalidation is feasible!