Pond – The Ocean Store Prototype
Presented by Jon Hess
cs294-4, Fall 2003

Overview
– Goals
– Features
– Design
– Implementation
– Experimental Results

Goals
– A distributed file system offering:
– Incremental scalability
  • More servers translate to more available data
– Secure sharing
  • Access control
– Long-term durability
  • With high probability, data should not be able to leave the system

Key Features
– Location-independent routing
  • Tapestry
– Byzantine update agreement
  • For management of the inner ring
– Push-based cache correction
  • Overlay locality-aware multicast network
– Continuous archiving
  • Erasure codes

Design
– Two-tier network
  • Upper tier composed of well-connected, powerful servers
    – Serialize changes to data
  • Lower tier composed of user workstations
    – Cache data
    – Archive data
    – Read / write data

The Data Object
• Can be thought of as corresponding to a file
• Is composed of immutable versions
• Each version is broken into a B-tree of blocks
• Is referenced by an AGUID
  – Versions by VGUID
  – Blocks by BGUID
• Can be conditionally operated on

[Figure, two slides: Data Object (AGUID) with a chain of versions (VGUID), from previous version to newest version; each version is labeled MD and BGUID and sits above blocks labeled IB.]
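
The data object described above (an AGUID naming a series of immutable versions, each version naming its blocks) can be sketched in Python. This is a minimal illustration, not Pond's implementation: the `DataObject` class and the `guid` helper are hypothetical names, naming blocks and versions by a SHA-1 hash of their content is an assumption the slides do not state, and a flat block list stands in for the B-tree.

```python
import hashlib


def guid(data: bytes) -> str:
    """Name a piece of data by its SHA-1 digest (an assumed naming scheme)."""
    return hashlib.sha1(data).hexdigest()


class DataObject:
    """A file-like object made of immutable versions, each a list of blocks."""

    def __init__(self, aguid: str, block_size: int = 4):
        self.aguid = aguid
        self.block_size = block_size
        self.blocks = {}    # BGUID -> block bytes, shared across versions
        self.versions = {}  # VGUID -> ordered list of BGUIDs
        self.history = []   # VGUIDs, oldest first

    def write(self, content: bytes) -> str:
        """Store content as a new version and return its VGUID."""
        bguids = []
        for i in range(0, len(content), self.block_size):
            block = content[i:i + self.block_size]
            bguid = guid(block)
            self.blocks[bguid] = block  # an identical block is stored once
            bguids.append(bguid)
        vguid = guid("".join(bguids).encode())
        self.versions[vguid] = bguids
        self.history.append(vguid)
        return vguid

    def read(self, vguid: str) -> bytes:
        """Reassemble the content of one version from its blocks."""
        return b"".join(self.blocks[b] for b in self.versions[vguid])


obj = DataObject(aguid="example-aguid")  # placeholder AGUID
v1 = obj.write(b"hello world!")
v2 = obj.write(b"hello there!")
```

After the two writes, both versions remain readable through their VGUIDs, and the block the two versions have in common (`b"hell"`) is stored only once. That sharing is what makes keeping every old version affordable in this sketch.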

Retrieving Data
– AGUID: secure hash of name and public key
– Contact primary replica to find VGUID
– From the VGUID retrieve BGUIDs
– Copy the block data to the local system
– Join the dissemination tree
  • Act as a cached copy

Controlling Data
– Primary Replica
  • Publishes AGUID to VGUID mappings
    – Digitally signs
  • Enforces access control
  • Serializes writes
  • Pushes cache updates
  • Archives data

Writing Data
– Send a request to the primary replica
– Replica verifies credentials
– Checks predicates
– Creates new VGUID and then associates data
– Pushes update down dissemination tree

[Figure: Writer, Primary Replica, Archive Servers (Erasure), Caching Readers]
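
The write path listed above (verify credentials, check predicates, create a new VGUID, push the update) can be sketched as follows. Everything here is a simplified assumption rather than Pond's code: `make_aguid` and `PrimaryReplica` are hypothetical names, SHA-1 is an assumed choice of secure hash, a plain set of writer names stands in for signed credentials, the only predicate is "the object is still at the version I expect", and a list of callbacks stands in for the dissemination tree.

```python
import hashlib


def make_aguid(name: str, public_key: bytes) -> str:
    """AGUID as a secure hash over the object name and the owner's public key."""
    return hashlib.sha1(name.encode() + public_key).hexdigest()


class PrimaryReplica:
    """Serializes writes for one object and pushes them to caching readers."""

    def __init__(self, aguid, writers):
        self.aguid = aguid
        self.writers = set(writers)  # hypothetical access-control list
        self.versions = {}           # VGUID -> data
        self.latest = None           # current AGUID -> VGUID mapping
        self.subscribers = []        # callbacks standing in for the dissemination tree

    def update(self, writer, expected_vguid, data):
        """Apply a conditional write; return the new VGUID, or None if rejected."""
        if writer not in self.writers:     # verify credentials
            return None
        if expected_vguid != self.latest:  # check the predicate
            return None
        # Create the new VGUID and associate the data with it.
        vguid = hashlib.sha1((self.latest or "").encode() + data).hexdigest()
        self.versions[vguid] = data
        self.latest = vguid
        for push in self.subscribers:      # push the update to caching readers
            push(vguid, data)
        return vguid
```

A writer that passes a stale `expected_vguid` gets `None` back and has to re-read before retrying. Because every update goes through the one `update` method, writes to the object are serialized.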

Archiving Data With Erasure Codes
– Divides data into N chunks
– Encodes chunks to M erasure blocks
– M > N
– Any N of the M blocks are sufficient for reconstruction
– Located by erasure block number and BGUID
– How does one know the BGUID?
  • The AGUID is unavailable?

Primary Replica – The Inner Ring
– Byzantine internal decisions
– Decisions are published with a public-key signature
  • Each node has a fraction of the private key
  • Enough fractions to prove that a Byzantine agreement was reached are required to sign a decision

Inner Ring – Changing Nodes
– Byzantine decision
  • Decides to elect
  • Decides who to elect
  • Chooses the key set
– Old keys are deleted
  • By the Byzantine assumption, conspiring nodes do not have enough keys to publish

The Responsible Party
– Publishes node statistics
– Used to nominate nodes to the inner ring
– Has no say over the actions of the inner rings
– There could be many of them
– Being compromised would not destroy the network
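
The "any N of the M blocks" property from the archiving slide can be illustrated with a small polynomial code. The slides do not name the erasure code, so this is an assumed stand-in: a Reed-Solomon-style scheme over the prime field of size 257, with hypothetical function names, where each "chunk" is a single byte value. A real archive would apply the same step to every byte position of larger chunks. It needs Python 3.8 or later for `pow(den, -1, P)`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through the (xi, yi) points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(chunks, m):
    """Turn N data symbols into M fragments (index, value).

    Fragments 1..N carry the data itself; fragments N+1..M are extra
    evaluations of the polynomial that passes through the data points.
    Requires m < P so that all fragment indices are distinct mod P.
    """
    points = list(enumerate(chunks, start=1))
    return [(x, _interpolate(points, x)) for x in range(1, m + 1)]


def decode(fragments, n):
    """Rebuild the N data symbols from any N distinct fragments."""
    points = fragments[:n]
    return [_interpolate(points, x) for x in range(1, n + 1)]


data = list(b"pond")      # N = 4 symbols
frags = encode(data, 8)   # M = 8 fragments
recovered = decode(frags[4:], 4)  # use only the four redundant fragments
```

Here `recovered` equals `data` even though none of the original four fragments was used. Fragment values can reach 256, so they do not always fit in one byte. Production codes avoid this by working over a field of size 256 instead.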

Implementation of the Pond Prototype
– Pros
  • 50,000 lines of Java
  • Event based between modules
  • Some modules are pluggable
  • Highly portable
– Cons
  • Garbage collector ‘Stops The World’

Storage Overhead
– B-tree dominates cost of small files
– Convergence at 32KB
– Erasure codes add 4.8x storage penalty

Write Latency Components
– For small updates
  • Computing the signature dominates
– For large updates
  • Computing the erasure fragments dominates
Tests are local to minimize the network’s effect.

Write Throughput
– Increasing data size amortizes signature time
– Approaches 8MB/s as block size grows
– With archiving enabled
  • Performance peaks at 2.6MB/s

Propagation Efficiency
– As replicas increase
  • Network economy becomes more efficient
– Fewer high-RTT links are used
– Tests are with 10, 20, and 50 replicas
  • This is 2%, 4%, and 10% of the network!
  • Are these numbers likely to occur in practice?

Andrew Benchmark
– WAN
  • Read performance
    – Up to 4.6x better
  • Write performance
    – Up to 7.3x worse
– LAN
  • Read performance
    – From 2x to 3x worse
  • Write performance
    – From 8x to 80x worse
Are these tradeoffs acceptable?

Questions?