Synchronisation solutions, NDS2 Cryptobox
Kamil Guryn, BIAMAN, www.pb.edu.pl
Maciej Brzeźniak, PSNC, www.psnc.pl
& NDS2 project partners
Presentation agenda
● Background
● Features of good synchronisation mechanisms
● Discussion of features, example scenarios
● Comparison of existing solutions
● Cryptobox:
  ○ Motivation: why yet another solution?
  ○ Features of Cryptobox
● Live demo
Background
● Some NRENs offer data sync & share solutions
● Some NRENs consider providing them:
  ○ Purchase of hosted services (e.g. NORDUnet: box.com)
  ○ Purchase of licenses/support for private/community cloud services:
    ■ ownCloud
    ■ PowerFolder
  ○ Other initiatives
● We need to understand the technical aspects of sync solutions
● The aim of this presentation is to:
  ○ Discuss selected features of synchronisation algorithms
  ○ Present the algorithms implemented in Cryptobox, part of the NDS2 project
Steps of the typical synchronisation process
How it works:
  ○ filespace enumeration
  ○ detection = comparison of current and past state
  ○ reconciliation
    ■ planning propagation
    ■ conflict resolution
  ○ propagation
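
The enumeration and detection steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the state is a plain dictionary keyed by relative path, and a file counts as modified when its size or modification time differs. The function names `enumerate_tree` and `detect_changes` are hypothetical and are not taken from Cryptobox.

```python
import os

def enumerate_tree(root):
    """Filespace enumeration: map relative path -> (size, mtime in ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            state[rel] = (st.st_size, st.st_mtime_ns)
    return state

def detect_changes(past, current):
    """Detection: compare the current state with the stored past state."""
    created = sorted(p for p in current if p not in past)
    deleted = sorted(p for p in past if p not in current)
    modified = sorted(
        p for p in current if p in past and current[p] != past[p]
    )
    return {"created": created, "deleted": deleted, "modified": modified}
```

Note that a path-keyed comparison like this reports a rename as one deletion plus one creation, which is exactly the weakness that the rename/move detection slides address.
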
Synchronization problems - overview
● tracking changes on the stores:
  ○ file identifier support is not present everywhere
  ○ async event reporting mechanisms: inotify or FSwatcher
● anchor-based sync needs a specialized datastore:
  ○ e.g. a simple SFTP server is not capable
  ○ concurrency during sync can break anchor-based logic: consistency, missing changes, complexity
● conflict detection and resolution
  ○ unresolved items may lead to an endless sync loop
● concurrency of data access and synchronisation
  ○ locking for sync can interfere with user activities
  ○ concurrent file activities should be supported
● unreliable networks, sync interrupts & errors
● endless sync loops, statedb loss or corruption
● and many more …
Data synchronisation methods and algorithms
What is needed to synchronise two data stores:
● change detection:
  ○ client and/or server-side
  ○ rename/move detection: id-based vs hash-based
  ○ relative ordering of asynchronous events, e.g.:
    ■ rename within same dir detected
    ■ move outside dir = delete + create
  ○ combination of id/hash and fs events (e.g. Dropbox)
● full-enumeration vs anchor-based change detection
● client-side state database
  ○ needed for change detection
● data transmission
  ○ proprietary vs open protocols
Important features of a sync solution
● Intelligent name change / move detection on both sides
● Concurrency resistance during detection and synchronization
● Conflict resolution
● Efficient indexing of large datastores
● Support for advanced scenarios:
  ○ graceful cancellation of a running sync
  ○ preview mode of the incremental sync operation, without committing changes to the replicas
  ○ support for sync with partially equal file hierarchies
● Security and privacy: client-side cryptography
  ○ Wuala and SpiderOak do it client-side
  ○ Dropbox/Box do it server-side (PRISM ….)
  ○ ownCloud does not plan to support it
  ○ Cryptobox does it client-side
Intelligent rename/move detection - synchronization clue
Example scenario:
● directory folderX contains 10GB of data
● user on machine A moves folderX to folderZ
● changes are synchronized to the server as a move operation
● in some solutions:
  ○ changes are misinterpreted by the sync client on machine B
  ○ this may result in an unnecessary download of the data instead of propagating the name change / move
  ○ while acceptable for small files, this may be a killer for big data volumes
Intelligent change detection explained: FileID-based vs hash-based
FileID-based - lightweight:
  ○ no need to analyse file contents
  ○ does not require a lot of I/O
  ○ does not load the CPU
Hash-based - resource consuming:
  ○ needs analysis of file contents
  ○ I/O needed to read file contents
  ○ CPU load for calculating hashes
Intelligent change detection explained: FileID-based
● FileID must be provided/supported by the (file)system
● Works in:
  ○ Windows: NTFS, ReFS
  ○ Linux: EXT2/3/4
  ○ OS X: HFS+
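
To make the trade-off concrete, here is a minimal Python sketch that contrasts the two approaches. It assumes that the pair (`st_dev`, `st_ino`) returned by `os.stat` can serve as the FileID on the filesystems listed above, and it uses SHA-256 purely as an example hash. The helper names `file_id` and `content_hash` are illustrative, not part of Cryptobox.

```python
import hashlib
import os
import tempfile

def file_id(path):
    """FileID-based: a single metadata lookup, no file content is read."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def content_hash(path, chunk_size=1 << 20):
    """Hash-based: the whole file has to be read and hashed."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Rename a file and check that both identifiers still match afterwards.
with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "a.txt")
    new = os.path.join(d, "b.txt")
    with open(old, "wb") as f:
        f.write(b"payload")
    id_before, hash_before = file_id(old), content_hash(old)
    os.rename(old, new)
    id_after, hash_after = file_id(new), content_hash(new)
```

Both identifiers survive the rename, but `file_id` costs one `stat` call regardless of file size, while `content_hash` has to read every byte.
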
Sync issues: FS events analysis (1)
● FS events reporting (inotify, FSwatcher) helps in detecting changes
● BUT reliable interpretation of user activity is not trivial:
  ○ Common issues:
    ■ FS events are reported asynchronously (unordered)!
    ■ all events have to be analysed to ensure consistency
    ■ sync in a concurrent environment (opposing concurrent file activity events, opposing folder hierarchy events)
  ○ Client-side issues:
    ■ buffer overflows caused by many operations in a short period of time may lead to missed events …
  ○ Server-side issues:
    ■ needs specialized server functionality
    ■ server-to-client event notification mechanisms are needed to avoid costly full namespace scans
Sync issues: FS events analysis (2)

| operation | event logged |
|---|---|
| rename folder\a.txt -> folder\b.txt | rename |
| move folder\a.txt -> folder2\a.txt | delete folder\a.txt & create folder2\a.txt |
| copy a.txt to local sync folder | create (there is no event about releasing the file lock, i.e. closing the file) |

Example scenario where event logging fails to provide a consistent state:
● Client side: copy a.txt to folderA\folderB\folderC
● Server side: rename folderA to folderX
● The path folderA\folderB\folderC is not valid anymore; the changes should be propagated in the correct order (plus concurrency resistance …)
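
The "move = delete + create" row above suggests one way to recover moves from raw events: pair a delete and a create that refer to the same file ID. The Python sketch below shows the idea under stated assumptions. The event tuples `(kind, path, file_id)` are a hypothetical format, and the file ID of a deleted path is assumed to come from the client-side state database, since the raw event alone may not carry it.

```python
def coalesce_events(events):
    """Pair delete/create events with the same file ID into a single move.

    events: iterable of (kind, path, file_id), kind is "create" or "delete".
    The order of the events is not trusted, since they arrive asynchronously.
    """
    deleted = {}  # file_id -> old path
    created = {}  # file_id -> new path
    for kind, path, file_id in events:
        if kind == "delete":
            deleted[file_id] = path
        elif kind == "create":
            created[file_id] = path
        else:
            raise ValueError(f"unsupported event kind: {kind}")

    ops = []
    for file_id, old_path in deleted.items():
        if file_id in created:
            # Same ID seen on both sides: this is really a move.
            ops.append(("move", old_path, created.pop(file_id)))
        else:
            ops.append(("delete", old_path))
    ops.extend(("create", path) for path in created.values())
    return ops
```

For example, the unordered pair `("create", "folder2/a.txt", 7)` and `("delete", "folder/a.txt", 7)` collapses into one `("move", "folder/a.txt", "folder2/a.txt")` operation, so no data needs to be re-transferred.
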
Concurrency during detection and synchronization
Scenario:
1. File A.txt is synchronized across replicas
2. Bob updates A.txt on machine1 (client)
3. Alice updates A.txt on the server (by updating it on machine2 and performing a sync), while Bob's changes are still pending
This must not lead to the loss of Alice's changes.
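
One common way to avoid losing Alice's update in this scenario is to compare both replicas against the version recorded at the last successful sync before overwriting anything. The sketch below illustrates that check. It assumes versions are opaque tokens (for example a content hash or a revision number), and the function name `next_action` is hypothetical; it is not a description of how Cryptobox decides.

```python
def next_action(base_version, local_version, server_version):
    """Decide what to do with one file.

    base_version is the version stored in the state database at the last
    successful sync; the other two are the versions seen right now.
    """
    if local_version == server_version:
        return "noop"  # replicas already agree
    local_changed = local_version != base_version
    server_changed = server_version != base_version
    if local_changed and server_changed:
        return "conflict"  # keep both versions, never overwrite blindly
    if local_changed:
        return "upload"
    return "download"
```

In the scenario above the base is the synchronized A.txt, Bob's copy and the server copy both differ from it, and the result is `"conflict"` rather than an upload that would overwrite Alice's changes.
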
Conflict resolution
Scenario:
1. Bob creates file A.txt on his local machine1
2. Alice also creates A.txt on the server (by creating it locally on machine2 and performing a sync), while Bob's changes are still pending
This leads to a name conflict, which should be resolved automatically with a no-data-loss policy.
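
A no-data-loss policy for this name conflict can be as simple as keeping both files and giving one of them a distinct name. The following Python sketch shows one possible naming rule. The "conflicted copy" pattern and the function `conflict_name` are assumptions made for illustration, not the scheme used by Cryptobox.

```python
import os

def conflict_name(path, owner, existing):
    """Return a free name for the losing copy so that no data is dropped.

    existing: set of paths already present in the target directory.
    """
    stem, ext = os.path.splitext(path)
    candidate = f"{stem} (conflicted copy, {owner}){ext}"
    n = 2
    while candidate in existing:
        # Keep counting until a name is found that nothing else uses.
        candidate = f"{stem} (conflicted copy, {owner} {n}){ext}"
        n += 1
    return candidate
```

With this rule, Bob's file would be stored next to Alice's A.txt under a new name, and both users keep their content.
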
Security and privacy: client-side cryptography
● Support for client-side encryption is desired these days:
  ○ People are concerned about privacy
  ○ Some organisation ;) is known to analyse our data
  ○ AES-256 CTR encryption
    ■ AES-256 recommended by NIST to 2031
    ■ CTR mode recommended by Niels Ferguson and Bruce Schneier
  ○ AES-NI supported on an increasing number of CPU platforms
  ○ File names are to be encrypted too, as they may be meaningful!
● Integrity control
  ○ Lets users verify the consistency of the data
  ○ Enables detecting failures and intentional tampering
  ○ SHA-512 considered to be safe beyond 2012 (SHA-1 only to 2012)
  ○ Fast implementations possible using regular CPUs
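
The integrity-control part can be illustrated with Python's standard library alone. The sketch below computes a SHA-512 digest when data is stored and verifies it when the data is read back. The function names are hypothetical, and the encryption step (AES-256 CTR) is deliberately left out because it would need a third-party library.

```python
import hashlib
import hmac

def sha512_digest(data: bytes) -> str:
    """Compute the SHA-512 digest of a block of data as a hex string."""
    return hashlib.sha512(data).hexdigest()

def verify(data: bytes, expected_hex: str) -> bool:
    """Check data against a stored digest using a constant-time comparison."""
    return hmac.compare_digest(sha512_digest(data), expected_hex)

stored = sha512_digest(b"file contents")
intact = verify(b"file contents", stored)
tampered = verify(b"file contentz", stored)
```

Here `intact` is true and `tampered` is false, which is how a client could notice both storage failures and deliberate modification.
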
Synchronisation solution – comparison *

| Feature | NDS Cryptobox | ownCloud | BOX.com | Dropbox | SpiderOak | PowerFolder | Remarks |
|---|---|---|---|---|---|---|---|
| Client & server-side move detection | YES | YES | NO | YES | YES | NO | |
| Real-time change detection | YES | NO | YES | YES | YES | YES | client side only |
| Files under user control | YES | YES | NO | NO | NO | YES | |
| Client-side encryption | YES | NO | NO | NO | YES | NO | |
| Concurrency resistant on detection and propagation | YES | NO | YES | YES | ??? | YES | may lead to chaos; but goes through temp |
| Sync files of any size | YES | YES | NO | YES | YES | YES | in multiple parts, then merge |
| Synchronisation of any folder | YES | YES | NO | NO | YES | YES | |
| No file-locking policy | YES | NO | NO | YES | ??? | NO | Dropbox way; sync goes through temp |
| Preview mode | YES | NO | NO | NO | NO | NO | |
| Live-sync | NO | NO | NO | YES | NO | NO | |

* Comparison made to the best of our knowledge, based on documentation analysis and tests. If you notice any inconsistency, please contact us.
What is NDSCryptobox? (1)
● Motivation: we were not happy with existing solutions: services, tools, applications and libraries (we tested most of them)
● Issues:
  ○ Simplistic / unreliable sync algorithms
  ○ Lack of client & server-side move/rename discovery
  ○ Lack of client-side encryption
  ○ Slow detection and propagation of changes
  ○ Reliability issues with concurrent changes
  ○ File-locking policy makes the local sync folder harder to use
  ○ Some solutions are not able to pass even simple tests:
    ■ e.g. multiple folder renames while uploading lead to chaos on the local computer
    ■ concurrent changes occurring on one replica during detection and propagation
  ○ No way to benefit from special features of our NDS2 system:
    ■ inefficient enumeration makes sync too expensive for the storage backend
    ■ fast & lightweight metadata enumeration is needed
What is NDSCryptobox? (2)
● About NDS2 project
  ○ Secure Storage Cloud with efficient and easy data access
  ○ More information:
    ■ nds.psnc.pl
    ■ NDS2 paper published at TNC2013
● NDSCryptobox in NDS2 architecture
  ○ one of the client applications…
  ○ … accessing the system using SFTP
  ○ + server-side mechanisms: deltas, enumeration etc.