Building a Database on S3
Matthias Brantner, Daniela Florescu+, David Graf, Donald Kossmann, Tim Kraska
Systems Group, ETH Zurich; 28msec Inc.; +Oracle


1. Building a Database on S3
   Matthias Brantner, Daniela Florescu+, David Graf, Donald Kossmann, Tim Kraska
   Systems Group, ETH Zurich; 28msec Inc.; +Oracle
   September 25, 2007

2. Motivation
- Building a web page, starting a blog, and making both searchable for the public have become a commodity
- But providing your own service (and getting rich) still comes at a high cost:
  - have the right (business) idea
  - run your own web server and database
  - maintain the infrastructure
  - keep the service up 24x7
  - back up the data
  - tune the system if the service is used more often
  - and then comes the Digg effect

3. Requirements for DM on the Web
- Scalability
  - response time independent of the number of clients
- No administration
  - "outsource" patches, backups, fault tolerance
- 100 percent read + write availability
  - no client is ever blocked under any circumstances
- Cost ($$$)
  - gets cheaper every year, leverages new technology
  - pay as you go, no upfront investment

4. Utility Computing as a Solution
- Scalability: response time independent of the number of clients
- No administration: "outsource" patches, backups, fault tolerance
- 100 percent read + write availability: no client is ever blocked under any circumstances
- Cost ($$$): gets cheaper every year, leverages new technology; pay as you go, no upfront investment
- Open question: consistency, treated as an optimization goal, not a constraint

5. Utility Computing as a Solution (cont.)
- Two questions remain:
  - Consistency: how much consistency is required by my application?
  - Cost: how much does it cost?
- Consistency becomes an optimization goal, not a constraint

6. Amazon Web Services (AWS)
- Most popular utility provider; gives us all the necessary building blocks (storage, CPU cycles, etc.)
- Other providers are also appearing on the market
- Amazon infrastructure services:
  - Simple Storage Service (S3): (virtually) infinite store; costs $0.15 per GB-month plus transfer costs ($0.10-$0.17 per GB in/out)
  - Simple Queuing Service (SQS): message service that allows a client to receive a message exclusively; costs $0.0001 per message sent plus transfer costs
  - Elastic Compute Cloud (EC2): virtual instances with 1-8 virtual cores (= 1.0-2.5 GHz Opterons), 1.7-15 GB of memory, and 160-1690 GB of instance storage; costs $0.10-$0.80 per hour plus transfer costs
  - SimpleDB: basically a text index; costs $0.14 per SimpleDB machine hour consumed
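The talk predates today's SDKs, but to make the two core building blocks concrete, here is a minimal sketch using the modern boto3 library; the bucket name, queue name, and message bodies are hypothetical:

    import boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")

    # S3: a (virtually) infinite key-value store, billed per GB-month.
    s3.put_object(Bucket="my-db-bucket", Key="page-4", Body=b"<page bytes>")
    page = s3.get_object(Bucket="my-db-bucket", Key="page-4")["Body"].read()

    # SQS: a message service, billed per message sent.
    url = sqs.create_queue(QueueName="demo-queue")["QueueUrl"]
    sqs.send_message(QueueUrl=url, MessageBody="log record: insert r42")
    msgs = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=1)

The "exclusive receive" of SQS works via visibility timeouts: a received message stays hidden from all other consumers until it is deleted or the timeout expires.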

7. Plan of Attack
- Step 1: Use S3 as a huge shared disk
  - leverage its scalability and no-administration features
- Step 2: Allow concurrent access to the shared disk in a distributed system
  - keep the properties of a distributed system, maximize consistency
- Step 3: Make application-specific trade-offs
  - consistency vs. cost
  - consistency vs. availability
  - consistency à la carte (levels of consistency)

9. Shared-Disk Architecture
- Clients 1..M each run the full stack: Application, Record Manager, Page Manager
- Pages 1..N are stored on S3
- The stack could be executed completely on the client or on EC2
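A minimal sketch of the two layers, assuming one S3 object per page; the class names, bucket name, and newline-separated record layout are illustrative choices, not taken from the talk:

    import boto3

    BUCKET = "shared-disk-bucket"  # hypothetical bucket holding all pages

    class PageManager:
        """Reads and writes whole pages from/to S3, with a local page cache."""
        def __init__(self):
            self.s3 = boto3.client("s3")
            self.cache = {}

        def read(self, page_id):
            if page_id not in self.cache:
                obj = self.s3.get_object(Bucket=BUCKET, Key=f"page-{page_id}")
                self.cache[page_id] = obj["Body"].read()
            return self.cache[page_id]

        def write(self, page_id, data):
            self.cache[page_id] = data
            self.s3.put_object(Bucket=BUCKET, Key=f"page-{page_id}", Body=data)

    class RecordManager:
        """Maps records onto pages; here a page is simply a sequence of
        newline-separated records."""
        def __init__(self, pages):
            self.pages = pages

        def read_record(self, page_id, slot):
            return self.pages.read(page_id).split(b"\n")[slot]

        def insert_record(self, page_id, record):
            self.pages.write(page_id, self.pages.read(page_id) + b"\n" + record)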

10. Problem: Eventual Consistency
- Two clients update the same page: the last update wins
- Resulting consistency problems:
  - inconsistency between indexes and pages
  - lost records
  - lost updates
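The lost-update race is easy to reproduce with the naïve PageManager sketched above: both clients read the same version of a page and write back full pages, so the later PUT erases the earlier one:

    # Two independent clients, each with its own page cache.
    a, b = PageManager(), PageManager()

    page_a = a.read(4)          # both read the same version of page 4
    page_b = b.read(4)

    a.write(4, page_a + b"\nrecord of client 1")
    b.write(4, page_b + b"\nrecord of client 2")
    # "Last update wins": client 2's PUT overwrites the whole page,
    # so client 1's record is silently lost.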

12. Levels of Consistency [Tanenbaum]
- Shared disk (naïve approach)
  - no concurrency control at all
- Eventual consistency (Basic Protocol)
  - updates become visible at some time and will persist
  - no lost updates at the page level
- Atomicity
  - all or none of a transaction's updates become visible
- Monotonic reads, read your writes, monotonic writes, ...
- Strong consistency
  - database-style consistency (ACID) via OCC

14. Basic Protocol: Queues
- One pending-update (PU) queue and one lock queue are associated with each page
- A lock queue contains exactly one message, inserted directly after the queue is created
- Clients commit changes to pages in two phases
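A sketch of how the two queues of a page could be set up with SQS via boto3; the queue names and token text are hypothetical. The single seeded message acts as a lock token: whichever client receives it holds the page lock, since SQS hides a received message from all other consumers until it is returned or deleted:

    import boto3

    sqs = boto3.client("sqs")

    def setup_queues_for_page(page_id):
        """Create the PU queue and the lock queue for one page. The lock
        queue is seeded with exactly one message that serves as the lock
        token: whoever receives it holds the lock on the page."""
        pu = sqs.create_queue(QueueName=f"pu-queue-page-{page_id}")["QueueUrl"]
        lock = sqs.create_queue(QueueName=f"lock-queue-page-{page_id}")["QueueUrl"]
        sqs.send_message(QueueUrl=lock, MessageBody="lock-token")
        return pu, lock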

15. Basic Protocol, Step 1: Commit
- Clients commit their update log records to the PU queues of the affected pages

17. Basic Protocol, Step 1: Commit (cont.)
- Sending the log records to the PU queues constitutes the commit of the transaction
- Once the records are in the queues, the transaction is finished
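Under the same assumptions, commit is then just appending the transaction's log records to the PU queues of the pages it touched; applying those records to the pages on S3 (the second phase, which is where the lock queues come in) happens later and is not shown here:

    def commit(sqs, log_records):
        """log_records: list of (page_id, log_record_text) pairs collected
        by the transaction. Once every record sits in its PU queue, the
        transaction counts as committed and the client is done."""
        for page_id, record in log_records:
            url = sqs.get_queue_url(QueueName=f"pu-queue-page-{page_id}")["QueueUrl"]
            sqs.send_message(QueueUrl=url, MessageBody=record)

    # e.g. commit(boto3.client("sqs"),
    #             [(1, "insert r42"), (4, "update r7: x := 3")])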
