The OceanStore Write Path
Sean C. Rhea, John Kubiatowicz
University of California, Berkeley
June 11, 2002


  1. The OceanStore Write Path
     Sean C. Rhea, John Kubiatowicz
     University of California, Berkeley
     June 11, 2002

  8. Introduction: the OceanStore Write Path
     • The Inner Ring
       – Acts as the single point of consistency for a file
       – Performs write access control, serialization
       – Creates archival fragments of new data and disperses them
       – Certifies the results of its actions with cryptography
     • The Second Tier
       – Caches certificates and data produced at the inner ring
       – Self-organizes into a dissemination tree to share results
     • The Archival Storage Servers
       – Store archival fragments generated in the Inner Ring
     • The Client Machines
       – Create updates and send them to the inner ring
       – Wait for responses to come down the dissemination tree

  11. Introduction: the OceanStore Write Path (cont’d)
     [Figure: timeline (Time axis) across App, Inner Ring, Replica, and Archive nodes, with phases T_req, T_agree, and T_disseminate]
     1. A client sends an update to the inner ring
     2. The inner ring performs a Byzantine agreement, applying the update
     3. The results are sent down the dissemination tree and into the archive

  14. Write Path Details
     • Inner Ring uses Byzantine agreement for fault tolerance
       – Up to f of 3f + 1 servers can fail
       – We use a modified version of the Castro-Liskov protocol
     • Inner Ring certifies decisions with proactive threshold signatures
       – Single public (verification) key
       – Each member has a key share which lets it generate signature shares
       – Need f + 1 signature shares to generate full signature
       – Independent sets of key shares can be used to control membership
     • Second Tier and Archive are ignorant of composition of Inner Ring
       – Know only the single public key
       – Allows simple replacement of faulty Inner Ring servers
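
The two thresholds on this slide (a ring of 3f + 1 servers tolerating f faults, and f + 1 signature shares per full signature) can be tabulated with a small Python sketch. The function names `ring_parameters` and `max_faults` are hypothetical, chosen only for illustration. The comment on why f + 1 shares suffice is the usual reasoning for such thresholds, not something the slides state.

```python
def ring_parameters(f: int) -> dict:
    """Sizes implied by the slide for a ring that tolerates f faulty servers."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        "ring_size": 3 * f + 1,   # from the slide: up to f of 3f + 1 servers can fail
        "max_faulty": f,
        # from the slide: f + 1 shares are needed for a full signature;
        # with at most f faulty members, any f + 1 shares include at least
        # one from a correct member (assumed rationale)
        "shares_needed": f + 1,
    }


def max_faults(ring_size: int) -> int:
    """Largest f such that 3f + 1 <= ring_size."""
    if ring_size < 1:
        raise ValueError("ring_size must be at least 1")
    return (ring_size - 1) // 3


if __name__ == "__main__":
    for f in range(1, 4):
        print(ring_parameters(f))
```

For example, a four-server ring corresponds to f = 1, so two signature shares are enough to produce the full signature.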

  15. Micro Benchmarks: Update Latency vs. Update Size
     [Figure: latency (ms) vs. update size (kB) for 1024 bit keys and 512 bit keys; each line is annotated slope = 0.6 s/MB]
     • Use two key sizes to show effects of Moore’s Law on latency
       – 512 bit keys are not secure, but are 4× faster
       – Gives an upper bound on latency three years from now

  18. Micro Benchmarks: Update Latency Remarks
     • Threshold signatures are expensive
       – Takes 6.3 ms to generate regular 1024 bit signature
       – But takes 73.9 ms to generate 1024 bit threshold signature share
       – (Combining shares takes less than 1 ms)
     • Unfortunately, this is a mathematical fact of life
       – Cannot use Chinese Remainder Theorem in computing shares (4×)
       – Making individual shares verifiable is expensive
     • Almost no research into performance of threshold cryptography

  20. Micro Benchmarks: Throughput vs. Update Size
     [Figure: total update operations per second (Ops/s) and total bandwidth (MB/s) vs. size of update (kB)]
     • Using 1024 bit keys, 60 synchronous clients
     • Max throughput is a respectable 5 MB/s
       – Berkeley DB through Java can only do about 7.5 MB/s
     • But we have a problem with small updates
       – 13 ops/s is atrocious!

  25. Batching: A Solution to the Small Update Problem
     • What if we could combine many small updates into a single batch?
     • Each Inner Ring member
       – Decides result of each update individually
       – Generates a signature share over the results of all of the updates
     • Saves CPU time
       – Generating signature shares is expensive
     • Saves network bandwidth
       – Each Byzantine agreement requires O(ringsize²) messages
     • But makes signatures unwieldy
       – Each signature is now O(batchsize) long
       – For high throughput, we want batch sizes in the hundreds or thousands
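
The savings claimed on this slide can be made concrete with a rough amortization sketch in Python. It uses the 73.9 ms share-generation time quoted earlier in the deck. The function names, the constant factor `c`, and the simple "one agreement and one share per batch" cost model are assumptions for illustration, not measurements from the talk.

```python
SHARE_MS = 73.9  # from the slides: time to generate one 1024 bit threshold signature share


def per_update_share_ms(batch_size: int) -> float:
    """Signature-share CPU time charged to each update when one share covers a whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return SHARE_MS / batch_size


def agreement_messages_per_update(ring_size: int, batch_size: int, c: float = 1.0) -> float:
    """Messages per update, assuming one agreement of c * ring_size**2 messages per batch.

    The constant c is hypothetical; the slide gives only the O(ringsize^2) bound.
    """
    if ring_size < 1 or batch_size < 1:
        raise ValueError("ring_size and batch_size must be at least 1")
    return c * ring_size ** 2 / batch_size


if __name__ == "__main__":
    for batch in (1, 10, 100, 1000):
        print(batch, per_update_share_ms(batch), agreement_messages_per_update(4, batch))
```

Under this model both costs fall linearly with batch size, which is consistent with the slide's interest in batches of hundreds or thousands of updates. The model ignores any extra latency from waiting for a batch to fill.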

  27. Merkle Trees: Making Batching Efficient
     [Figure: binary hash tree with top hash H_1 over nodes H_2, H_3, H_4, H_5, H_8, H_9, ..., H_15 and leaves Result 1, Result 2, ..., Result 15, with Path 2 marked]
     Key: H_i = SHA1(H_{2i}, H_{2i+1})
     Sign: (n = 15, H_1)
     • Build a Merkle Tree over results
       – Each node is a hash of its two children
     • Sign only the tree size and the top hash
       – To verify Result 2 , need only signature plus
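
A minimal Python sketch of a hash tree built with the slide's rule H_i = SHA1(H_{2i}, H_{2i+1}), using `hashlib` from the standard library. Several details are assumptions not taken from the slides: the function names (`build_tree`, `proof_for`, `verify`), padding the leaves to a power of two with hashes of the empty string, concatenating the two child hashes before hashing, and storing leaves at heap indices m to 2m - 1. The threshold signature over (n, H_1) is omitted. The check follows the standard Merkle construction, in which a leaf is verified against the top hash using the sibling hashes along its path.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def _leaf_count(n: int) -> int:
    """Smallest power of two that is >= n (and at least 1)."""
    m = 1
    while m < n:
        m *= 2
    return m


def build_tree(results):
    """Return a heap-indexed list h with h[1] as the top hash and
    h[i] = SHA1(h[2i] + h[2i+1]) for every internal node."""
    n = len(results)
    m = _leaf_count(n)
    h = [b""] * (2 * m)
    for k in range(m):
        leaf = results[k] if k < n else b""  # padding with empty leaves is an assumption
        h[m + k] = sha1(leaf)
    for i in range(m - 1, 0, -1):
        h[i] = sha1(h[2 * i] + h[2 * i + 1])
    return h


def proof_for(h, k: int):
    """Sibling hashes from leaf k (0-based) up to, but excluding, the top hash."""
    i = len(h) // 2 + k
    path = []
    while i > 1:
        path.append(h[i ^ 1])  # i ^ 1 is the sibling of node i
        i //= 2
    return path


def verify(result: bytes, k: int, n: int, path, root: bytes) -> bool:
    """Recompute the top hash from one result and its sibling path."""
    i = _leaf_count(n) + k
    node = sha1(result)
    for sibling in path:
        # even index: node is a left child; odd index: node is a right child
        node = sha1(node + sibling) if i % 2 == 0 else sha1(sibling + node)
        i //= 2
    return i == 1 and node == root


if __name__ == "__main__":
    results = [f"result {j}".encode() for j in range(1, 16)]  # n = 15, as on the slide
    tree = build_tree(results)
    path = proof_for(tree, 1)  # proof for the second result
    print(len(path), verify(results[1], 1, len(results), path, tree[1]))
```

With this layout the proof for one result in a batch of 15 is four sibling hashes, so its size grows with the logarithm of the batch size.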
