GlobalFS: A Strongly Consistent Multi-Site Filesystem
Leandro Pacheco, Raluca Halalai, Valerio Schiavoni, Fernando Pedone, Etienne Rivière, Pascal Felber
RainbowFS Workshop, May 3rd, 2017

Distributed applications
Distributed storage options:
- SQL databases
- NoSQL databases
- Key-value storage
- Caches
- Coordination systems
- File systems
File systems offer easy interoperability for existing applications.

Global infrastructure
[Figure: Amazon's AWS global infrastructure]

CAP theorem
Weak consistency:
- Lower latency
- Higher availability
- Possibly incorrect/unexpected results
Strong consistency:
- Clear semantics and guarantees
- Easier to reason about
- Block instead of providing incorrect results

What is GlobalFS?
- Geographically distributed filesystem
- Familiar interface (POSIX)
- Strong consistency
- Fault-tolerance through replication
- Flexible performance through locality

Overall design
- Separate data and metadata
- Partial replication
- Metadata protocol exploiting atomic multicast
- Causal reads

Separate data and metadata
Metadata:
- Controls file contents, properties and filesystem structure
- Metadata refers to data blobs
Immutable data:
- Variable sized blobs
[Figure: a file's metadata pointing to a sequence of blobs 1, 2, 3, 4, …]
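
As an illustration of the split above, here is a minimal sketch of metadata that refers to immutable blobs. The names BlobStore, FileMetadata and read_file are hypothetical, and deriving blob ids from a SHA-256 hash is an assumption made for the example; the slides do not say how GlobalFS names its blobs.

```python
import hashlib
from dataclasses import dataclass, field


class BlobStore:
    """Hypothetical in-memory store of immutable, variable-sized blobs."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumption: the blob id is the SHA-256 of the content.
        blob_id = hashlib.sha256(data).hexdigest()
        # A blob is written once and never modified afterwards.
        self._blobs.setdefault(blob_id, data)
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


@dataclass
class FileMetadata:
    """Hypothetical metadata record: the file content is an ordered list of blob ids."""
    path: str
    blob_ids: list = field(default_factory=list)


def read_file(meta: FileMetadata, store: BlobStore) -> bytes:
    # File contents are the concatenation of the referenced blobs, in order.
    return b"".join(store.get(blob_id) for blob_id in meta.blob_ids)


store = BlobStore()
meta = FileMetadata("/www/pets.html", [store.put(b"<html>"), store.put(b"</html>")])
content = read_file(meta, store)
```

Because blobs never change, modifying a file in this sketch means storing new blobs and updating the list in the metadata, which keeps all consistency concerns in the metadata layer.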

Partial replication
- Immutable data is simple to replicate consistently
- Metadata is partitioned between replica groups (i.e., partitions)

Partial replication (example)
[Figure: three sites (EU, US, SA) and a directory tree with / at the root, the entries www, bin, etc and home, and the entries alice, bob and mark. The upper part of the tree is marked "Global Replication"; alice, bob and mark are labeled with the sites US, SA and EU.]
Global multicast (global replication):
- costly updates
- fast local reads
Local multicast:
- fast updates
- local or remote reads
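
The following sketch shows one way a client could decide which partition is responsible for a path, using longest-prefix matching. The table PARTITIONS, the function partition_for, and the exact assignment of directories to sites are all assumptions made for illustration; the slide only shows that some entries are replicated globally and others at a single site.

```python
# Hypothetical partition table: path prefix -> replica group.
# "global" stands for metadata replicated at every site.
PARTITIONS = {
    "/": "global",
    "/home/alice": "US",
    "/home/bob": "SA",
    "/home/mark": "EU",
}


def partition_for(path: str) -> str:
    """Return the replica group of the longest matching path prefix."""
    parts = path.rstrip("/").split("/")
    while parts:
        prefix = "/".join(parts) or "/"
        if prefix in PARTITIONS:
            return PARTITIONS[prefix]
        parts.pop()  # try the parent directory next
    return PARTITIONS["/"]


local_owner = partition_for("/home/alice/notes.txt")
global_owner = partition_for("/etc/hosts")
```

Under this assumed layout, an update to /home/alice/notes.txt only involves the US group (fast update), while an update under /etc is multicast to all sites (costly update, fast local reads everywhere).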

Partial ordering
- GlobalFS exploits atomic multicast
- Atomic delivery to groups of processes
- Partial ordering: messages for different groups don't have to be ordered between themselves
- Partial ordering is critical for scalability

Architecture
[Figure: the application talks to a client (FUSE). The client sends read or update commands to the metadata replicas through atomic multicast, and inserts or fetches immutable data in the data store.]

Consistent update operations
Step 1: Write data blobs to data store
Step 2: Issue a metadata update
Three kinds of metadata update:
- Single-partition, e.g., write to file in G1
- Uncoordinated multi-partition, e.g., write to file in {G1, G2}
- Coordinated multi-partition, e.g., move file from G1 to G2
[Figure: request/reply timelines for groups G1 and G2 in each case, showing the atomic multicast and execution phases]
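
The two steps above can be sketched as follows. This is a toy model, not the GlobalFS protocol: toy_multicast simply delivers the command to each destination group in turn and stands in for a real atomic multicast, and Group, update_file and the "blob-N" ids are hypothetical names introduced for the example.

```python
class Group:
    """Hypothetical replica group holding metadata for some files."""

    def __init__(self, name):
        self.name = name
        self.log = []    # commands in delivery order
        self.files = {}  # path -> list of blob ids

    def deliver(self, command):
        self.log.append(command)
        op, path, blob_ids = command
        if op == "write":
            self.files[path] = list(blob_ids)


def toy_multicast(groups, command):
    # Stand-in for atomic multicast: a real implementation guarantees that
    # all destination groups deliver commands in a consistent order.
    for group in groups:
        group.deliver(command)


def update_file(path, chunks, store, groups):
    # Step 1: write the data blobs to the data store.
    blob_ids = []
    for chunk in chunks:
        blob_id = "blob-%d" % len(store)  # assumption: ids are simple counters
        store[blob_id] = chunk
        blob_ids.append(blob_id)
    # Step 2: issue the metadata update, which refers to the new blobs.
    toy_multicast(groups, ("write", path, tuple(blob_ids)))
    return blob_ids


store = {}
g1, g2 = Group("G1"), Group("G2")
ids = update_file("/shared/report.txt", [b"part one", b"part two"], store, [g1, g2])
```

Writing the blobs before the metadata means that any reader who sees the new metadata can already fetch every blob it refers to.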

Causal read operations
Causally related updates are seen in the same order, e.g., operations done by the same client.
Client A:
- Creates an image cat.jpg
- Modifies a page pets.html to include the image cat.jpg
Client B:
- Opens the pets.html page and finds a broken image reference
Where is the cat?

Causal read operations
Step 1: Contact a metadata replica for a list of blob ids
Step 2: Get the data from the data store
- Approach inspired by vector clocks
- Vector is composed of one counter per replica group
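
To make the vector idea concrete, here is a minimal sketch of a client that tracks one counter per replica group and only accepts a read from a replica that has caught up with what the client has already seen. The functions merge and can_serve, the Client class, and the rule that a stale replica must not answer are assumptions made for illustration; the slides only state that the vector has one counter per replica group.

```python
def merge(a, b):
    """Entry-wise maximum of two vectors (dicts: group name -> counter)."""
    return {g: max(a.get(g, 0), b.get(g, 0)) for g in set(a) | set(b)}


def can_serve(replica_vec, client_vec):
    """A replica is fresh enough if it has applied everything the client saw."""
    return all(replica_vec.get(g, 0) >= c for g, c in client_vec.items())


class Client:
    """Hypothetical client remembering the latest state it has observed."""

    def __init__(self):
        self.seen = {}

    def observe(self, reply_vec):
        # Fold the vector returned with each reply into the client's view.
        self.seen = merge(self.seen, reply_vec)


# Client B reads pets.html; the reply says it depends on update 1 in G1
# (the creation of cat.jpg) and update 1 in G2 (the page edit).
client_b = Client()
client_b.observe({"G1": 1, "G2": 1})

stale_replica = {"G1": 0, "G2": 1}  # has not applied the cat.jpg creation yet
fresh_replica = {"G1": 1, "G2": 1}

stale_ok = can_serve(stale_replica, client_b.seen)
fresh_ok = can_serve(fresh_replica, client_b.seen)
```

In this sketch the stale replica is rejected, so client B never sees pets.html referencing an image that does not exist yet from its point of view.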

Evaluation
- Complete prototype in Java: https://github.com/pacheco/GlobalFS
- Filesystem in Userspace (FUSE)
- URingPaxos for atomic multicast
- Global deployment using Amazon EC2

Maximum throughput by operation
[Figure: GlobalFS throughput. Maximum throughput in operations/sec for GlobalFS and CalvinFS; operations: read 1KB, local create 1KB, local write 1KB, global create 1KB, global write 1KB.]
3 region deployment: US west, US east and Europe

Geographical scalability
[Figure: Geographical scalability. Normalized throughput per region as more regions are added (1, 3, 6 and 9 regions), for read 1KB, create and write 1KB, compared against the ideal.]
The 9-region deployment uses all EC2 regions available at the time.

GlobalFS: Summary
- Strong consistency at global scale
- Simple and familiar API (POSIX)
- Flexible performance through partial replication and locality
- Cheap causal read operations
Thank you!
Leandro Pacheco, pachecol@usi.ch