CS 555: DISTRIBUTED SYSTEMS [GOOGLE FILE SYSTEM AND RPC]
Shrideep Pallickara
Dept. of Computer Science, Colorado State University
CS555: Distributed Systems [Fall 2019], November 21, 2019
Slides created by: Shrideep Pallickara

Frequently asked questions from the previous class survey
¨ After a snapshot, if a client seeks to change a chunk, how is that handled?
¨ Why is caching of files (at the application level) not done?

Topics covered in this lecture
¨ Google File System
  ¤ Replication
  ¤ Consistency in GFS
  ¤ Deletion of files and garbage collection
¨ RPC
  ¤ Persistence/transience
  ¤ Synchronous/asynchronous communications
  ¤ Parameters in RPC settings

REPLICATION

Reasons why chunk replicas are created
¨ Chunk creation
¨ Re-replication
¨ Rebalancing

Chunk replica creation
¨ Place replicas on chunk servers with below-average disk space utilization
¨ Limit the number of recent creations on a chunk server
  ¤ Predictor of imminent heavy traffic
¨ Spread replicas across racks
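
The three placement rules for new replicas can be combined into one selection routine. The following is a minimal sketch, not GFS's actual algorithm: the `ChunkServer` fields, the `max_recent_creations` threshold, and the two-pass rack handling are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_utilization: float  # fraction of disk in use, 0.0 to 1.0
    recent_creations: int    # replicas created on this server recently


def place_replicas(servers, count=3, max_recent_creations=5):
    """Pick up to `count` servers for the replicas of a new chunk."""
    average = sum(s.disk_utilization for s in servers) / len(servers)
    # Rules 1 and 2: below-average utilization, few recent creations.
    candidates = [
        s for s in servers
        if s.disk_utilization < average
        and s.recent_creations < max_recent_creations
    ]
    candidates.sort(key=lambda s: s.disk_utilization)

    chosen, used_racks = [], set()
    # Rule 3, first pass: at most one replica per rack.
    for s in candidates:
        if len(chosen) < count and s.rack not in used_racks:
            chosen.append(s)
            used_racks.add(s.rack)
    # Second pass: fill any remaining slots even if a rack repeats.
    for s in candidates:
        if len(chosen) < count and s not in chosen:
            chosen.append(s)
    return chosen
```

The hard cutoff on recent creations is one possible reading of "limit"; a real master could just as well weight the count instead of excluding servers outright.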

Re-replicate chunks when replication level drops
¨ Prioritize by how far a chunk is from its replication goal
¨ Preference for chunks of live files
¨ Boost priority of chunks blocking client progress

Rebalancing replicas
¨ Examine current replica distribution and move replicas
  ¤ Better disk space
  ¤ Load balancing
¨ Removal of existing replicas
  ¤ Prefer chunk servers with below-average free disk space
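
The re-replication priorities listed earlier (distance from the replication goal, live files first, a boost for chunks blocking a client) can be expressed as a sort key. This is only a sketch: the `ChunkState` fields are hypothetical, and the relative order of the three factors is an assumption, since the slides do not say how they are weighed against each other.

```python
from dataclasses import dataclass


@dataclass
class ChunkState:
    chunk_id: str
    live_replicas: int
    replication_goal: int
    file_is_live: bool     # False if the owning file was recently deleted
    blocking_client: bool  # True if a client is waiting on this chunk


def rereplication_order(chunks):
    """Return under-replicated chunks, most urgent first."""
    needy = [c for c in chunks if c.live_replicas < c.replication_goal]
    return sorted(
        needy,
        key=lambda c: (
            not c.blocking_client,                     # boosted chunks first
            -(c.replication_goal - c.live_replicas),   # larger deficit first
            not c.file_is_live,                        # live files first
        ),
    )
```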

Incorporating a new chunk server
¨ Do not swamp new server with lots of chunks
  ¤ Concomitant traffic will bog down the machine
¨ Gradually fill up new server with chunks

CONSISTENCY IN GFS

In GFS the state of a file region after a mutation depends on …
¨ TYPE of the mutation
¨ SUCCESS/FAILURE of the mutation
¨ Whether there were CONCURRENT mutations

GFS has a relaxed consistency model
¨ Consistent: All clients see the same data
  ¤ On all replicas
¨ Defined
  ¤ Region is consistent, and clients see what the mutation writes in its entirety

File state region after a mutation
                     Write                     Record Append
Serial success       defined                   defined interspersed with inconsistent
Concurrent success   consistent but undefined  defined interspersed with inconsistent
Failure              inconsistent              inconsistent

Implications for applications
¨ Rely on appends instead of overwrites
¨ Checkpoint
¨ Write records that are
  ¤ Self-validating
  ¤ Self-identifying
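
One way an application might make its records self-validating and self-identifying is to frame each record with a checksum and a unique id, so that a reader can skip padding or fragments and drop duplicates. The sketch below is an illustration under assumptions: the header layout, the `MAGIC` marker, and the byte-by-byte resynchronization are invented here and are not a GFS-defined format.

```python
import struct
import zlib

MAGIC = 0x47465352                # arbitrary marker chosen for this sketch
HEADER = struct.Struct(">IIQI")   # magic, payload length, record id, CRC32


def encode_record(record_id: int, payload: bytes) -> bytes:
    """Frame a payload so a reader can validate and identify it."""
    header = HEADER.pack(MAGIC, len(payload), record_id, zlib.crc32(payload))
    return header + payload


def read_records(data: bytes):
    """Return valid payloads in order, skipping garbage and duplicates."""
    seen_ids = set()
    payloads = []
    offset = 0
    while offset + HEADER.size <= len(data):
        magic, length, record_id, checksum = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        valid = (
            magic == MAGIC
            and len(payload) == length
            and zlib.crc32(payload) == checksum
        )
        if not valid:
            offset += 1  # padding or a fragment: resynchronize byte by byte
            continue
        if record_id not in seen_ids:  # self-identifying: drop duplicates
            seen_ids.add(record_id)
            payloads.append(payload)
        offset = start + length
    return payloads
```

The checksum handles the "inconsistent" stretches that record append may leave behind, while the record id handles the duplicates that retried appends can produce.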

DELETION OF FILES & GARBAGE COLLECTION

Garbage collection in GFS
¨ After a file is deleted, GFS does not reclaim space immediately
¨ Done lazily during garbage collection at
  ¤ File and chunk levels

Master logs a file’s deletion immediately
¨ File is renamed to a hidden name
  ¤ Includes deletion timestamp
¨ Master scans the file system namespace
  ¤ Delete if hidden file has existed for more than 3 days
¨ When file is removed from namespace
  ¤ In-memory metadata is also removed
  ¤ Severs links to all its chunks!

Garbage collection: When Master scans its chunk namespace
¨ Identifies orphaned chunks
  ¤ Not reachable from any file
¨ Erase metadata for these chunks
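
The lazy deletion steps above (rename to a hidden name, expire after three days, then collect orphaned chunks) can be sketched as a toy master. The `Master` class, the `.deleted.` prefix, and the in-memory dictionaries are hypothetical stand-ins for illustration; only the three-day grace period comes from the slides.

```python
GRACE_PERIOD_SECONDS = 3 * 24 * 60 * 60  # three days
HIDDEN_PREFIX = ".deleted."              # hypothetical naming scheme


class Master:
    def __init__(self):
        self.namespace = {}       # file name -> list of chunk ids
        self.chunk_metadata = {}  # chunk id -> metadata (e.g. replica locations)

    def delete_file(self, name, now):
        """Rename the file to a hidden name that embeds the deletion time."""
        hidden = f"{HIDDEN_PREFIX}{int(now)}.{name}"
        self.namespace[hidden] = self.namespace.pop(name)
        return hidden

    def scan_namespace(self, now):
        """Drop hidden files that are older than the grace period."""
        for name in list(self.namespace):
            if not name.startswith(HIDDEN_PREFIX):
                continue
            deleted_at = int(name[len(HIDDEN_PREFIX):].split(".", 1)[0])
            if now - deleted_at > GRACE_PERIOD_SECONDS:
                del self.namespace[name]  # severs the links to its chunks

    def scan_chunks(self):
        """Erase metadata of chunks that no file references any more."""
        reachable = {c for chunks in self.namespace.values() for c in chunks}
        orphaned = set(self.chunk_metadata) - reachable
        for chunk_id in orphaned:
            del self.chunk_metadata[chunk_id]
        return orphaned
```

Until `scan_namespace` removes the hidden entry, the file's chunks stay reachable, so a deletion can still be undone by renaming the file back.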

The role of heart-beats in garbage collection
¨ Chunk server reports a subset of the chunks it currently has
¨ Master replies with identity of chunks no longer present in its metadata
  ¤ Chunk server free to delete its replica of such chunks

Stale chunks and issues
¨ If a chunk server fails
  ¤ AND misses mutations to the chunk
  ¤ The chunk replica becomes stale
¨ Working with a stale replica causes problems with:
  ¤ Correctness
  ¤ Consistency
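
The heart-beat exchange used for garbage collection amounts to a set difference computed by the master. A minimal sketch follows, assuming an in-process call in place of a real heart-beat message; `ChunkServerNode`, `master_heartbeat_reply`, and the `batch_size` parameter are hypothetical names introduced for this example.

```python
def master_heartbeat_reply(known_chunks, reported_chunks):
    """Return the reported chunks the master no longer has metadata for."""
    return set(reported_chunks) - set(known_chunks)


class ChunkServerNode:
    def __init__(self, chunks):
        self.chunks = dict(chunks)  # chunk id -> replica data

    def heartbeat(self, master_known, batch_size):
        """Report a subset of local chunks and delete what the master disowns."""
        reported = sorted(self.chunks)[:batch_size]
        obsolete = master_heartbeat_reply(master_known, reported)
        for chunk_id in obsolete:
            del self.chunks[chunk_id]  # free to delete this replica
        return obsolete
```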

Aiding the detection of stale chunks
¨ Master maintains a chunk version number for each chunk
  ¤ Distinguish between stale and up-to-date chunks
¨ When master grants a new lease on chunk
  ¤ Increase version number
  ¤ Inform replicas
  ¤ Record new version
  ¤ All of this occurs BEFORE any client can write to chunk

If a replica is unavailable its version number will not be advanced
¨ When a chunk server restarts, it reports to the Master with the following:
  ¤ Set of chunks
  ¤ Corresponding version numbers
¨ Used to detect stale replicas
¨ Remove stale replicas in regular garbage collection
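
The version-number bookkeeping described above can be sketched in a few lines. This is an illustrative model, not GFS code: `VersionMaster`, `Replica`, and the direct attribute update that stands in for "inform replicas" are assumptions.

```python
class Replica:
    def __init__(self):
        self.versions = {}  # chunk id -> version number held by this replica


class VersionMaster:
    def __init__(self):
        self.versions = {}  # chunk id -> latest version number

    def grant_lease(self, chunk_id, live_replicas):
        """Bump the version and inform reachable replicas before any write."""
        new_version = self.versions.get(chunk_id, 0) + 1
        self.versions[chunk_id] = new_version
        for replica in live_replicas:
            replica.versions[chunk_id] = new_version
        return new_version

    def stale_chunks(self, report):
        """Given a restart report (chunk id -> version), find stale replicas."""
        return {
            chunk_id
            for chunk_id, version in report.items()
            if version < self.versions.get(chunk_id, 0)
        }
```

A replica that was down during `grant_lease` keeps its old number, which is exactly what the restart report exposes.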

Additional safeguards against stale replicas
¨ Include chunk version number
  ¤ When client requests chunk information
    ▪ Client/Chunk server verify version to make sure things are up-to-date
  ¤ During cloning operations
    ▪ Clone the most up-to-date chunk
¨ Clients and chunk servers expected to verify versioning information

DATA INTEGRITY