GlobeTP: Template-Based Database Replication for Scalable Web Applications
Tobias Groothuyse, Swaminathan Sivasubramanian, and Guillaume Pierre. In Proceedings of WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.
Dina Adel Said
dsaid@vt.edu
Problem Definition
• How to provide a scalable infrastructure for hosting dynamically generated web content?
• Past solutions:
1. Cache generated pages.
2. Distribute the computation across multiple application servers.
3. Cache the results of DB queries.
• Problem: the bottleneck is the throughput of the origin DB.
Problem Definition (cont.)
• Solution: use DB replication.
• Problem: it does not scale linearly, because every update, delete, and insert (UDI) query must be executed on each DB replica.
• Past solutions:
1. Increase the throughput of each individual server.
2. Partial replication.
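The scaling limit above can be illustrated with a rough model. This is only a sketch under stated assumptions, not a formula from the paper: reads are assumed to spread evenly over the replicas, every replica applies every UDI query, and the function name and workload numbers are hypothetical.

```python
def per_server_load(read_load: float, udi_load: float, n_servers: int) -> float:
    """Load on one replica under full replication (illustrative model).

    Assumption: reads are divided evenly among the replicas, while
    every UDI query is executed on every replica.
    """
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    return read_load / n_servers + udi_load


if __name__ == "__main__":
    # Hypothetical workload: 80 units of read work and 20 units of UDI work.
    for n in (1, 2, 4, 8, 16):
        print(n, per_server_load(80.0, 20.0, n))
```

In this model the per-server load never drops below the UDI load, however many replicas are added.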
Partial Replication
• Past solutions:
– Depends on the application programmer: Gao et al. [2003].
– GlobeDB: Sivasubramanian et al. [2005].
∗ Record-level replication granularity.
∗ Provides excellent query latency.
∗ A central server maintains all the updates, then sends batch updates to the other servers.
∗ Does not improve throughput, because the central server is a bottleneck.
GlobeTP: Template-Based Solution
• The queries issued by a web application typically belong to a small number of query templates.
• Query template: a parameterized SQL query whose parameters are passed at run time.
• Knowing these templates, table placements are selected to ensure maximum throughput and reasonable latency.
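As an illustration of what a query template looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and data are hypothetical and do not come from the paper or its benchmarks.

```python
import sqlite3

# Hypothetical template: the SQL text is fixed; only the "?" parameter
# changes from one request to the next.
TEMPLATE = "SELECT title FROM book WHERE author_id = ? ORDER BY title"


def run_template(conn: sqlite3.Connection, author_id: int) -> list:
    """Execute the template with a run-time parameter."""
    return [row[0] for row in conn.execute(TEMPLATE, (author_id,))]


def demo() -> tuple:
    """Build a tiny in-memory database and run the template twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, author_id INTEGER)")
    conn.executemany(
        "INSERT INTO book VALUES (?, ?)",
        [("A", 1), ("B", 2), ("C", 1)],
    )
    return run_template(conn, 1), run_template(conn, 2)
```

Both calls in `demo` are instances of the same template, so a placement decision made once for the template covers every query it generates.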
Models
• Application Model:
– The application programmer is required to specify the application's query templates explicitly.
• System Model: (figure not reproduced)
Main problems to consider
1. Cluster identification: ensure that the placement of tables leaves at least one server able to execute each query template.
2. Consider all the defined templates, read or UDI, and determine the placement that provides the maximum throughput.
3. Define a load-balancing algorithm that distributes read queries efficiently.
Data Placement: Cluster Identification
• Goal: determine the sets of tables that must be replicated together so that the templates function correctly, while minimizing the number of servers that must execute each UDI query.
• Characterize each query template by:
1. Whether it is read or UDI.
2. The set of tables that it accesses.
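The characterization above can be sketched in a few lines. This is an illustrative reading, not the paper's implementation: it assumes a read template can run on any server holding all of its tables, and a UDI template must run on every server holding at least one of its tables. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    qid: str
    kind: str          # "read" or "udi"
    tables: frozenset  # tables the template accesses


def read_clusters(templates):
    """Distinct table sets that must be co-located for some read template."""
    return {t.tables for t in templates if t.kind == "read"}


def servers_for(template, placement):
    """Servers able to run a read template, or obliged to run a UDI one.

    placement maps a server name to the set of tables it holds.
    """
    if template.kind == "read":
        # A read needs all of its tables on the same server.
        return sorted(s for s, held in placement.items() if template.tables <= held)
    # Assumption: a UDI must be applied wherever any of its tables lives.
    return sorted(s for s, held in placement.items() if template.tables & held)


# Hypothetical templates and placement.
TEMPLATES = [
    Template("q1", "read", frozenset({"customer", "orders"})),
    Template("q2", "read", frozenset({"item"})),
    Template("q3", "udi", frozenset({"orders"})),
]
PLACEMENT = {"s1": {"customer", "orders"}, "s2": {"item"}}
```

With this placement, `q3` only has to run on `s1`, whereas under full replication it would run on both servers.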
Data Placement: Load Analysis
• Determines the load received by each cluster.
• The load on a table cluster depends on:
– Whether the query is read or UDI.
– The frequency of occurrence of the template.
– The computational complexity of executing the query:
∗ Use DB system tools to estimate the actual execution time.
∗ Run the query in a live system.
• Determines the load on the DB servers (read or UDI query).
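A minimal sketch of this load analysis follows. It assumes that a template's load is its frequency times its per-query execution time, that reads are split evenly over the eligible servers, and that a UDI query costs its full load on every server holding one of its tables. The workload figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical workload:
# (qid, kind, tables, queries per second, seconds per query)
WORKLOAD = [
    ("q1", "read", frozenset({"customer", "orders"}), 50.0, 0.004),
    ("q2", "read", frozenset({"item"}), 200.0, 0.001),
    ("q3", "udi", frozenset({"orders"}), 10.0, 0.010),
]


def cluster_loads(workload):
    """Sum frequency * cost per (kind, table cluster)."""
    loads = defaultdict(float)
    for _qid, kind, tables, freq, cost in workload:
        loads[(kind, tables)] += freq * cost
    return dict(loads)


def server_loads(workload, placement):
    """Estimate the load on each server for a given table placement."""
    loads = {server: 0.0 for server in placement}
    for qid, kind, tables, freq, cost in workload:
        if kind == "read":
            targets = [s for s, held in placement.items() if tables <= held]
            if not targets:
                raise ValueError(f"no server can execute read template {qid}")
            share = freq * cost / len(targets)  # assumption: even split
        else:
            targets = [s for s, held in placement.items() if tables & held]
            share = freq * cost                 # every holder pays in full
        for server in targets:
            loads[server] += share
    return loads
```

For example, `server_loads(WORKLOAD, {"s1": {"customer", "orders"}, "s2": {"item"}})` charges `q1` and `q3` to `s1` and `q2` to `s2`.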
Data Placement: Cluster Placement
• Determines the placement of the clusters across the set of DB servers such that the load received by each replica is minimized.
• Exhaustive search has complexity $O(2^{N \times T} / N!)$, where $T$ is the number of tables and $N$ is the number of nodes.
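To give a feel for the search space, the sketch below computes the size estimate and runs a brute-force search on a toy instance. It is not the paper's placement algorithm: the cost model (even split of reads, full UDI cost on every holder, minimizing the most loaded server) and the workload are assumptions made for illustration.

```python
from itertools import product
from math import factorial


def search_space_size(n_nodes: int, n_tables: int) -> float:
    """Approximate number of distinct placements, 2^(N*T) / N!."""
    return 2 ** (n_nodes * n_tables) / factorial(n_nodes)


def best_placement(workload, tables, n_nodes):
    """Try every assignment of table subsets to servers (tiny inputs only).

    workload is a list of (kind, tables, load) tuples.
    Returns (load of the most loaded server, tuple of table sets).
    """
    names = sorted(tables)
    subsets = [
        frozenset(t for i, t in enumerate(names) if mask >> i & 1)
        for mask in range(2 ** len(names))
    ]
    best = None
    for combo in product(subsets, repeat=n_nodes):
        loads = [0.0] * n_nodes
        feasible = True
        for kind, tabs, load in workload:
            if kind == "read":
                targets = [i for i, held in enumerate(combo) if tabs <= held]
                share = load / len(targets) if targets else 0.0
            else:
                targets = [i for i, held in enumerate(combo) if tabs & held]
                share = load
            if not targets:  # template cannot be executed anywhere
                feasible = False
                break
            for i in targets:
                loads[i] += share
        if feasible and (best is None or max(loads) < best[0]):
            best = (max(loads), combo)
    return best


# Hypothetical workload over two tables.
TOY = [
    ("read", frozenset({"a"}), 4.0),
    ("read", frozenset({"b"}), 4.0),
    ("udi", frozenset({"a"}), 1.0),
    ("udi", frozenset({"b"}), 1.0),
]
```

On the toy workload with two nodes, the search prefers putting each table on its own server over full replication, because each UDI query then runs only once. The exponential growth of `search_space_size` shows why exhaustive search is only practical for small instances.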
Query Routing
• Round Robin (RR): efficient if all incoming queries have the same cost.
• RR-QID: RR by query ID
– Each query template is identified by its QID.
– Each QID has a queue associated with the set of DB servers that can serve that QID.
– RR is applied within each queue.
• Cost-based routing
– When a query arrives, the query router estimates the current load on each DB server.
– The query is scheduled on the least loaded DB server that can serve it.
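The two template-aware policies can be sketched as follows. This is a simplified illustration rather than the GlobeTP router: the class names are hypothetical, and the cost-based variant assumes the router estimates load by summing the per-template costs of the queries it has dispatched and not yet seen complete.

```python
from itertools import cycle


class RRQIDRouter:
    """Round robin applied separately to each query template (QID)."""

    def __init__(self, servers_by_qid):
        # One rotating queue of eligible servers per QID.
        self._queues = {qid: cycle(servers) for qid, servers in servers_by_qid.items()}

    def route(self, qid):
        return next(self._queues[qid])


class CostBasedRouter:
    """Send each query to the least loaded server able to serve it."""

    def __init__(self, servers_by_qid, cost_by_qid):
        self._servers = servers_by_qid
        self._cost = cost_by_qid
        self.load = {s: 0.0 for servers in servers_by_qid.values() for s in servers}

    def route(self, qid):
        server = min(self._servers[qid], key=lambda s: self.load[s])
        self.load[server] += self._cost[qid]  # estimated outstanding work
        return server

    def done(self, qid, server):
        """Call when the query completes to release its estimated cost."""
        self.load[server] -= self._cost[qid]


# Hypothetical routing table: q1 can run on s1 or s2, q2 only on s2.
SERVERS = {"q1": ["s1", "s2"], "q2": ["s2"]}
```

With `CostBasedRouter(SERVERS, {"q1": 1.0, "q2": 5.0})`, an expensive `q2` sent to `s2` steers the following `q1` queries toward `s1`, which plain RR-QID would not do.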
Experiments
• Compare GlobeTP with full DB replication using:
– TPC-W: standard e-commerce benchmark.
– RUBBoS: bulletin-board benchmark modeled after slashdot.org.
Experiments (cont.)
• Query latency distributions using 4 servers.
Experiments (cont.)
• Maximum achievable throughputs with 90% of queries processed within 100 ms.
Advantages
• Easily coupled with a distributed DB query cache.
• Does not require any modification to the application itself.
Disadvantages
• Does not support transactions; however, they could be implemented in the query router.
• Limited by the table-level granularity of partial replication.
• Fault-tolerance issues.
• Does not take into account the long-term load variations that must be expected when operating a popular dynamic web site.
References
Lei Gao, Mike Dahlin, Amol Nayate, Jiandan Zheng, and Arun Iyengar. Application specific data replication for edge services. In WWW ’03: Proceedings of the 12th International Conference on World Wide Web, 449–460, Budapest, Hungary. 2003. ISBN 1-58113-680-3.
Swaminathan Sivasubramanian, Gustavo Alonso, Guillaume Pierre, and Maarten van Steen. GlobeDB: autonomic data replication for web applications. In WWW ’05: Proceedings of the 14th International Conference on World Wide Web, 33–42, Chiba, Japan. 2005. ISBN 1-59593-046-9.
Thank you
dsaid@vt.edu