National Technical University of Athens
Design & Implementation of a Portable File Synchronisation Mechanism for a Cloud Storage Environment
Supervisor: Prof. Nektarios Koziris
Assistant Supervisor: Dr. Vangelis Koukis
Candidate: Vasilis Gerakaris
2/9/2015

Table of Contents
• Introduction
• Design & Implementation
  • Syncing algorithm
  • Core Classes / API
• Optimisations
  • Request Queuing
  • Directory Monitoring
  • Local Block Storage
  • Local deduplication - FUSE
• Comparison with existing software
• Future Work

Introduction (i) - The problem
File Synchronisation: The process of updating files in two or more different locations, following certain rules.
Why is it needed?
• Copying files between different computers
• Backups
Important Qualities
✓ Needs to be efficient
✓ Needs to be reliable (no errors)
✓ Needs to detect & handle update conflicts/renames/deletions

Introduction (i) - The problem (cont)
File Synchronisation: The process of updating files in two or more different locations, following certain rules.
Software designed for that purpose already exists, namely:
• rsync
• ownCloud
• Dropbox
• Google Drive
We focus on a more specific aspect of the problem.

Large Similar Files (i) - Definition
What are they?
Files that satisfy the following two requirements:
• Are large in size (several GBs)
• Have a lot of their data in common
Examples: VM images, VM snapshots
Why are they important?
Many VMs are being used on cloud service providers (Amazon AWS, ~okeanos, etc.), so there should be a way to efficiently synchronise their images and snapshots.

Large Similar Files (ii) - Definition (cont)
[Figure: diagram with User A, User B, the Compute Service (VMs) and the Object Storage Service (Files, Image File, Snapshot File, Custom image), linked by the operations Connect, Upload, Register, Clone, Snapshot and Store.]

Large Similar Files (iii) - Definition (cont)
[Figure: Snapshot t0, Snapshot t1, Snapshot t2]
We can use these similarities to optimise the synchronisation!

Syncing algorithm (i) - Modification detection
Modification detection: Comparison of hash digests
✓ Reliable
✗ Very slow, especially on large files
Faster alternative: Use last modification time as an indicator.
Why we need history data:
Need to know what to do in the following cases:
• File exists on both locations and is different
• File exists on A but not on B (or vice-versa)

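The two detection strategies on this slide can be sketched as follows. This is a minimal illustration, not the framework's code: the function names `file_etag` and `maybe_modified` and the 1 MiB chunk size are assumptions made for the example.

```python
import hashlib
import os


def file_etag(path, chunk_size=1 << 20):
    """Reliable but slow: SHA-256 hex digest of the whole file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so multi-GB files never sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def maybe_modified(path, recorded_modtime):
    """Fast indicator: has the modification time moved since it was recorded?"""
    return int(os.stat(path).st_mtime) != recorded_modtime
```

A caller would typically run the cheap `maybe_modified` check first and compute the digest only for files whose modification time has changed.
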
Syncing algorithm (ii) - Initial algorithm

(a) File change detection between two points in time

Time T1              Time T2              Change
Does not Exist       Exists               Created
Exists               Does not Exist       Deleted
Exists (ETag = J)    Exists (ETag = J)    No Change
Exists (ETag = J)    Exists (ETag = K)    Modified

(b) Syncing actions based on file states

File replica A       File replica B       Action
No Change            No Change            No Action
Created (ETag = J)   Created (ETag = J)   No Action
Created (ETag = J)   Created (ETag = K)   Merge*
Deleted              Deleted              No Action
Deleted              No Change            Delete B
Modified             No Change            Update B
Modified (ETag = J)  Modified (ETag = K)  Merge*

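The two tables of the initial algorithm can be expressed as a pair of small functions. This is a sketch only: the function names are hypothetical, `None` stands for "does not exist", the mirrored rows (changes on B only) are added by symmetry, and the result for combinations the tables do not list is an assumption.

```python
def detect_change(etag_t1, etag_t2):
    """Table (a): classify one replica between T1 and T2 (None = does not exist)."""
    if etag_t1 is None and etag_t2 is None:
        return None  # the file never existed, nothing to classify
    if etag_t1 is None:
        return "Created"
    if etag_t2 is None:
        return "Deleted"
    return "No Change" if etag_t1 == etag_t2 else "Modified"


def sync_action(change_a, etag_a, change_b, etag_b):
    """Table (b): choose an action from the changes seen on replicas A and B."""
    if change_a == change_b:
        if change_a in ("No Change", "Deleted"):
            return "No Action"
        # Both created or both modified: identical ETags need no work
        # (assumed for the Modified case, which the table does not list).
        return "No Action" if etag_a == etag_b else "Merge"
    if change_b == "No Change":
        return "Delete B" if change_a == "Deleted" else "Update B"
    if change_a == "No Change":
        # Mirrored rows, assumed by symmetry with the table.
        return "Delete A" if change_b == "Deleted" else "Update A"
    return "Unspecified"  # e.g. Deleted vs Modified: not covered by the table
```

For example, a file deleted on A and untouched on B yields `sync_action("Deleted", None, "No Change", "J") == "Delete B"`.
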
Syncing algorithm (iii) - What we propose
Limitations
✗ Can't detect renames (or worse, renames & modifications)
Our solution for syncing with a central metadata server:
• Store the metadata of all files, as they were during the last successful sync, in a local state database (StateDB).
• Reconcile local directory replicas (Local) and remote server replicas (Remote) in three steps:
  1. Detect updates from Local Directory
  2. Detect updates from StateDB
  3. Detect updates from Remote Directory
• FCFS updates on conflicts, with conflicting copies being renamed.

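A local state database of this kind could look roughly like the sketch below. The slides do not say how StateDB is stored, so the use of SQLite, the table layout and the method names (`record`, `lookup`, `lookup_inode`, `forget`) are all assumptions; the columns follow the file metadata fields named elsewhere in this deck.

```python
import sqlite3


class StateDB:
    """Hypothetical SQLite-backed store of file metadata from the last sync."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS state ("
            "phash INTEGER PRIMARY KEY, path TEXT, inode INTEGER, "
            "modtime INTEGER, type INTEGER, etag TEXT)"
        )

    def record(self, phash, path, inode, modtime, ftype, etag):
        """Store (or overwrite) the metadata of one file after a successful sync."""
        self.conn.execute(
            "INSERT OR REPLACE INTO state VALUES (?, ?, ?, ?, ?, ?)",
            (phash, path, inode, modtime, ftype, etag),
        )
        self.conn.commit()

    def lookup(self, phash):
        """Return the stored row for a path hash, or None if it is unknown."""
        cur = self.conn.execute("SELECT * FROM state WHERE phash = ?", (phash,))
        return cur.fetchone()

    def lookup_inode(self, inode):
        """Return the stored row for an inode, or None (useful for rename checks)."""
        cur = self.conn.execute("SELECT * FROM state WHERE inode = ?", (inode,))
        return cur.fetchone()

    def forget(self, phash):
        """Drop the entry of a file that no longer exists on either side."""
        self.conn.execute("DELETE FROM state WHERE phash = ?", (phash,))
        self.conn.commit()
```

Keying the table on the path hash gives a fast "have we seen this path before?" lookup, while the inode lookup lets a sync step recognise a known file that reappears under a new path.
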
3-step synchronisation (i) - Updates from Local Directory
[Flowchart. Checks: "phash exists in StateDB?", "Local modtime == StateDB modtime?", "inode exists in StateDB?", "File exists on Remote?" (twice), "StateDB ETag == Remote ETag?". Outcomes: New local file, No local change, Local modified, Renamed, Conflict.]

3-step synchronisation (ii) - Updates from StateDB
[Flowchart. First check: "File exists on local/remote?", with four branches: Local exists, Remote exists / Local exists, Remote doesn't exist / Local doesn't exist, Remote exists / Local doesn't exist, Remote doesn't exist. Further checks: "Local modtime == StateDB modtime?", "Remote ETag == StateDB ETag?", "inode exists in StateDB?". Outcomes: No change, Local modified, Remote modified, Deleted, Renamed / Deleted.]

3-step synchronisation (iii) - Updates from Remote Directory
[Flowchart. Checks: "phash exists in StateDB?", "Remote ETag == StateDB ETag?", "Local modtime == StateDB modtime?". Outcomes: New remote file, No remote changes, Remote modified, Conflict.]

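One plausible reading of the remote-update flowchart is sketched below. The order of the checks (path hash first, then ETag, then modification time) and the mapping of each branch to an outcome are assumptions, since the flowchart's arrows are not available here. The function name and the dictionary shape of `state_entry` are hypothetical.

```python
def classify_remote(remote_etag, state_entry, local_modtime):
    """Classify a remote file against the last synced state.

    state_entry is the StateDB record for the file's path hash, given here
    as a dict with 'etag' and 'modtime' keys, or None if the phash is unknown.
    """
    if state_entry is None:
        # phash not in StateDB: the file appeared on the server since last sync.
        return "New remote file"
    if remote_etag == state_entry["etag"]:
        # Remote content is what we last synced.
        return "No remote changes"
    if local_modtime == state_entry["modtime"]:
        # Remote changed while the local copy stayed untouched.
        return "Remote modified"
    # Both sides changed since the last sync.
    return "Conflict"
```

For instance, `classify_remote("K", {"etag": "J", "modtime": 100}, 100)` returns `"Remote modified"` under these assumptions.
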
Core Classes / API
What we have done:
• Built a cross-platform framework in Python that can be used to synchronise files with any cloud storage service, as long as some API functions are implemented.
• Created abstract classes for representations of files, filesystem directories and cloud storage services.
• Implemented a class that uses the Synnefo (Pithos) API as an example.
• Created a proof-of-concept application that syncs a local directory with the Pithos+ service offered by ~okeanos.

Core Classes / API (i) - FileStat
The core class used in this framework to represent file objects.

FileStat
  phash: int
  path: str
  inode: int
  modtime: int
  type: int
  etag: str

• phash: The (integer) hash digest of the relative path string. It is used for fast indexing in the StateDB. Assumed unique for each file path.
• etag: The ETag (sha-256 digest) of the file. Assumed unique for each file version.

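The fields above map naturally onto a small record type. The sketch below is illustrative only: the slides do not state how phash is computed, so `path_hash` (a truncated SHA-256 of the relative path) and the `for_path` constructor are assumptions.

```python
import hashlib
from dataclasses import dataclass


def path_hash(rel_path):
    """Hypothetical phash: first 8 bytes of SHA-256(path), as a 63-bit integer."""
    digest = hashlib.sha256(rel_path.encode("utf-8")).digest()
    # Mask the top bit so the value fits a signed 64-bit database column.
    return int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF


@dataclass(frozen=True)
class FileStat:
    phash: int    # integer hash of the relative path
    path: str     # path relative to the sync directory
    inode: int    # local inode number
    modtime: int  # last modification time
    type: int     # file type code
    etag: str     # SHA-256 digest of the file contents

    @classmethod
    def for_path(cls, path, inode, modtime, ftype, etag):
        """Build a FileStat, deriving phash from the path."""
        return cls(path_hash(path), path, inode, modtime, ftype, etag)
```

Deriving phash from the path in one place keeps lookups consistent: any component that knows a relative path can compute the same key.
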
Core Classes / API (ii) - LocalDirectory

LocalDirectory
  sync_dir: str
  + get_all_objects_fstat()
  + get_modified_objects_fstat()
  + get_file_fstat(str path)

• get_all_objects_fstat: Returns all local files' metadata as FileStat objects.
• get_modified_objects_fstat: Returns file metadata only for the files that were modified since the last sync.
• get_file_fstat: Returns the FileStat object for the file path if it exists, else returns None.

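A rough sketch of how these three methods might behave on a local filesystem follows. It is not the framework's implementation: `LocalStat` is a reduced stand-in for FileStat, and the `known_modtimes` argument (a `{path: modtime}` snapshot from the last sync) is an assumption, since the slide shows `get_modified_objects_fstat()` without parameters.

```python
import os
from collections import namedtuple

# Reduced stand-in for the framework's FileStat (assumption, for brevity).
LocalStat = namedtuple("LocalStat", "path inode modtime")


class LocalDirectory:
    def __init__(self, sync_dir):
        self.sync_dir = sync_dir

    def get_file_fstat(self, path):
        """Return metadata for a path relative to sync_dir, or None if absent."""
        try:
            st = os.stat(os.path.join(self.sync_dir, path))
        except FileNotFoundError:
            return None
        return LocalStat(path, st.st_ino, int(st.st_mtime))

    def get_all_objects_fstat(self):
        """Walk sync_dir and return metadata for every regular file found."""
        stats = []
        for root, _dirs, files in os.walk(self.sync_dir):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), self.sync_dir)
                fstat = self.get_file_fstat(rel)
                if fstat is not None:  # file may vanish during the walk
                    stats.append(fstat)
        return stats

    def get_modified_objects_fstat(self, known_modtimes):
        """Return only files whose modtime differs from the last-sync snapshot."""
        return [s for s in self.get_all_objects_fstat()
                if known_modtimes.get(s.path) != s.modtime]
```

Files missing from the snapshot are reported as modified too, which is what a sync step would want for newly created files.
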
Core Classes / API (iii) - CloudClient
Closely resembles the OpenStack API (used by Synnefo as well).

CloudClient
  + get_object_fstat(str path)
  + get_all_objects_fstat()
  + download_object(str path, file fd)
  + upload_object(str rel_path, str sync_dir)
  + update_object(str rel_path, str sync_dir, str etag)
  + delete_object(str path)
  + rename_object(str old_path, str new_path)

PithosClient
  pithos: PithosClient
  + init(str auth_URL, str auth_token, str ca_certs_path)
  - _modtime_from_remote(dict remote_obj)
  - _is_directory_from_remote(dict remote_obj)
  - _etag_from_remote(dict remote_obj)
  - _fstat_from_metadata(dict obj_metadata, str path)

To properly handle race conditions:
• upload_object() is used for new files.
• update_object() is used for existing files.

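The split between upload_object() and update_object() can be illustrated with an in-memory stand-in for a storage service. This is a sketch under assumptions: the slides do not describe how the real client rejects a racing write, so the `ConflictError` exception and the "fail if the object exists / fail if the ETag differs" checks are hypothetical, and the methods take the file contents as bytes instead of the `sync_dir` argument shown above.

```python
import hashlib
from abc import ABC, abstractmethod


class ConflictError(Exception):
    """Raised when the remote object is not in the state the caller expected."""


class CloudClient(ABC):
    @abstractmethod
    def upload_object(self, rel_path, data):
        """Create a new object; must not overwrite an existing one."""

    @abstractmethod
    def update_object(self, rel_path, data, etag):
        """Replace an object only if its current ETag matches `etag`."""


class InMemoryClient(CloudClient):
    """Hypothetical backend used only to demonstrate the two write paths."""

    def __init__(self):
        self._objects = {}  # rel_path -> (etag, data)

    def upload_object(self, rel_path, data):
        if rel_path in self._objects:
            # Someone else created the file first: do not clobber it.
            raise ConflictError(rel_path)
        etag = hashlib.sha256(data).hexdigest()
        self._objects[rel_path] = (etag, data)
        return etag

    def update_object(self, rel_path, data, etag):
        current = self._objects.get(rel_path)
        if current is None or current[0] != etag:
            # The object changed (or vanished) since we last saw it.
            raise ConflictError(rel_path)
        new_etag = hashlib.sha256(data).hexdigest()
        self._objects[rel_path] = (new_etag, data)
        return new_etag
```

With this shape, a client that loses a race gets an explicit error and can fall back to the conflict handling (renaming the conflicting copy) instead of silently overwriting another user's update.
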