Parallax
Dutch Meyer
University of British Columbia
dmeyer@cs.ubc.ca
The Plan
- Virtual Machines and Storage
- Parallax Feature Overview
- Technical Design
- System Evaluation
- Conclusion
Parallax is a Storage Service
Observations on Naïve Storage
- Virtual machines can be created and destroyed easily; storage can't
- VM encapsulation makes capturing whole-machine state attractive, but capturing a whole disk image is slow
- Giving similar VMs similar disk images results in wasted space
Our Research Questions
- How do we make volume provisioning agile enough to match VM creation?
- Can we capture whole-disk state at near-continuous granularity?
- How much data redundancy can we eliminate?
- How much overhead does it take to do all of this well?
Parallax Storage System
- Use snapshots as a unifying tool for
  - Provisioning new volumes
  - Data sharing
  - Low-overhead state capture and backup
- Allow block-level layout optimization
- Allow disconnected/degraded operation
- Compatibility due to a VM-based architecture operating at the block level
Snapshots
- Data protection across a range of granularities, from low (e.g. days) to high (e.g. ms)
- Uses: "what if" configuration and testing, backup, legal compliance, paranoia
- Time travel: by capturing whole-machine state at high frequency, we can revisit previous machine states
Provisioning via Gold Mastering
- Use snapshots to create a copy of some reference volume, which can be further specialized
- Requirements include
  - Global availability
  - Efficient operation
  - No hard limits on the number of volumes
Data Sharing
- Commonly derived disks can share common data
- Sharing is read-only, with copy-on-write (COW) when data is modified
- We can further eliminate redundancy by detecting duplicate blocks and deduplicating them (current focus)
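The duplicate-block detection mentioned above can be illustrated with a content-addressed store. This is only a minimal sketch, not the Parallax mechanism (which the talk describes as work in progress): the `DedupStore` class, the 4 KB block size, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; not stated in the talk


class DedupStore:
    """Toy block store keeping one physical copy per distinct block content."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes (the single physical copy)
        self.refcount = {}  # digest -> number of logical references

    def write(self, data):
        """Store a block and return the digest that identifies it."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data  # first copy of this content
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def read(self, digest):
        return self.blocks[digest]


store = DedupStore()
a = store.write(b"\x00" * BLOCK_SIZE)
b = store.write(b"\x00" * BLOCK_SIZE)  # same content, no new physical block
c = store.write(b"\x01" * BLOCK_SIZE)
```

Two logical writes of identical content end up sharing one stored block, which is the space saving the slide refers to.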
Parallax Implementation Building Virtual Disks Locking and Synchronization Storage Services
System Review
- The Parallax engine is a user-mode tapdisk driver for block management
- Provides services to any VM sharing the same physical machine
- Federates across multiple physical machines to share a single volume of storage
Building Virtual Disks
- Flexibility in block placement is essential to providing disk isolation
- Parallax uses a radix tree to facilitate this
  - Fixed height
  - Root is linked to a disk image
  - Nodes are disk blocks, containing an array of pointers
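A fixed-height radix tree of this kind can be sketched as follows. The height of 3 and the 4 bits consumed per level are assumptions chosen to keep the example small; the real node fan-out depends on the block and pointer sizes, which the slides do not give. In-memory lists stand in for on-disk node blocks.

```python
BITS_PER_LEVEL = 4  # assumed: 16 pointers per node
HEIGHT = 3          # assumed fixed tree height


def split_address(vaddr):
    """Split a virtual block address into one index per level, top level first."""
    mask = (1 << BITS_PER_LEVEL) - 1
    return [(vaddr >> (BITS_PER_LEVEL * level)) & mask
            for level in reversed(range(HEIGHT))]


def new_node():
    """A node is an array of pointers; None marks an unmapped slot."""
    return [None] * (1 << BITS_PER_LEVEL)


def map_block(root, vaddr, paddr):
    """Record that virtual block vaddr lives at physical block paddr."""
    *inner, last = split_address(vaddr)
    node = root
    for index in inner:
        if node[index] is None:
            node[index] = new_node()
        node = node[index]
    node[last] = paddr


def lookup(root, vaddr):
    """Walk from the root to the physical address, or None if unmapped."""
    node = root
    for index in split_address(vaddr):
        if node is None:
            return None
        node = node[index]
    return node


root = new_node()
map_block(root, 0b010111000001, 7042)
```

Because every lookup walks exactly `HEIGHT` nodes, any virtual block can be placed at any physical location without changing the cost of translation.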
Radix Nodes and Trees
Taking A Snapshot
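The figure for this slide is not reproduced in the text. As a rough illustration only, one common way to get cheap snapshots from a radix tree is path copying: a snapshot keeps the old root, and a later write copies just the nodes on the path it touches while sharing everything else. The sketch below assumes that approach, a height of 3, and 16 pointers per node. For simplicity it copies the path on every write, not only on the first write after a snapshot.

```python
FANOUT_BITS = 4             # assumed: 16 pointers per node
FANOUT = 1 << FANOUT_BITS
HEIGHT = 3                  # assumed fixed tree height


def indices(vaddr):
    """One node index per level, top level first."""
    mask = FANOUT - 1
    return [(vaddr >> (FANOUT_BITS * level)) & mask
            for level in reversed(range(HEIGHT))]


def cow_write(node, path, paddr):
    """Return a new subtree with path mapped to paddr; untouched children are shared."""
    copy = list(node) if node is not None else [None] * FANOUT
    head, rest = path[0], path[1:]
    copy[head] = paddr if not rest else cow_write(copy[head], rest, paddr)
    return copy


def lookup(root, vaddr):
    node = root
    for index in indices(vaddr):
        if node is None:
            return None
        node = node[index]
    return node


base = cow_write(None, indices(0x5C1), 100)
base = cow_write(base, indices(0x001), 50)
snapshot = base                                # taking a snapshot: keep the old root
live = cow_write(base, indices(0x5C1), 200)    # later write goes to a new root
```

After the write, `snapshot` still resolves `0x5C1` to its old location, and the subtree holding `0x001` is the same object in both trees.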
IO Batching
- Parallax follows the semantics of a physical disk
  - Simultaneous requests may be completed in any order
  - Must retain "crash consistency"
- Updating radix trees can involve several IO operations
  - Batching becomes essential to maintaining performance
  - Ordering constraints are imposed for crash consistency
- We use a dependency tracking system to issue writes in the correct order
- Writes are aggressively pipelined, similar to instruction scheduling
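Dependency-ordered issue of writes can be sketched with a topological sort. The specific constraint used here, that a block must be written before the parent node pointing to it, is an assumption for illustration; the slides only say that ordering constraints exist. The write names are hypothetical. `graphlib` is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Each write maps to the set of writes that must complete first (assumed constraints).
deps = {
    "data_block": set(),
    "leaf_node": {"data_block"},
    "inner_node": {"leaf_node"},
    "root": {"inner_node"},
    "other_data": set(),
}


def issue_in_batches(deps):
    """Group writes into batches; writes inside one batch may complete in any order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)                 # pretend the whole batch completed
    return batches


batches = issue_in_batches(deps)
# batches == [["data_block", "other_data"], ["leaf_node"], ["inner_node"], ["root"]]
```

Independent writes land in the same batch, which is where pipelining helps: only the chain from data block to root has to be serialized.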
Parallax Implementation Building Virtual Disks Locking and Synchronization Storage Services
Federating Physical Machines
- All machines share a single disk
- Some synchronization is required between physical machines
  - Data plane is protected through long-lived, coarse-grained allocation
  - Control plane requires a lock manager
Lock Management
- The current system has 3 contentious locks
  - Creating a virtual disk
  - Claiming a virtual disk
  - Requesting a new extent
- In practice these locks are taken very infrequently
- It is possible to further limit contention in our design
Parallax Implementation Building Virtual Disks Locking and Synchronization Storage Services
Degraded Operation
Evaluation: Performance
- System throughput
- Per-request latency
Evaluation: Snapshots
- Snapshot overhead
- Storage overheads
Conclusion
- We can use VM-based encapsulation to extend the services normally provided in a storage stack
- Despite using several potentially high-overhead techniques, Parallax achieves reasonable performance
Future Work
- Working on deduplication and layout optimization
- Expose features to aware file systems
- More storage services for VMs: caching, encryption, etc.
- General release
End of Presentation Thanks! Questions?
Extents
- We wish to minimize contention for the shared disk
- The simple approach is to partition the disk into large extents, each of which can be given exclusively to an individual physical machine
- We currently use a 2 GB extent size
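The extent scheme can be sketched as a small allocator. The 2 GB extent size comes from the slide; the `ExtentAllocator` class, the 4 KB block size, the first-free policy, and the use of a local `threading.Lock` in place of the cross-machine lock manager are assumptions for illustration.

```python
import threading

EXTENT_SIZE = 2 * 1024**3  # 2 GB, as stated in the talk
BLOCK_SIZE = 4096          # assumed block size


class ExtentAllocator:
    """Hands out whole extents; the lock is only needed when claiming one."""

    def __init__(self, disk_bytes):
        self.total = disk_bytes // EXTENT_SIZE
        self.owner = {}               # extent index -> owning host
        self.lock = threading.Lock()  # stand-in for the shared lock manager

    def claim(self, host):
        """Give the first free extent to host (assumed first-free policy)."""
        with self.lock:
            for extent in range(self.total):
                if extent not in self.owner:
                    self.owner[extent] = host
                    return extent
        raise RuntimeError("no free extents")


def extent_byte_range(extent):
    """Byte offsets [start, end) covered by an extent on the shared disk."""
    return extent * EXTENT_SIZE, (extent + 1) * EXTENT_SIZE


alloc = ExtentAllocator(10 * 1024**3)  # a 10 GB disk holds 5 extents
first = alloc.claim("hostA")
second = alloc.claim("hostB")
blocks_per_extent = EXTENT_SIZE // BLOCK_SIZE
```

Once a host owns an extent it can allocate blocks inside it without further coordination, so the lock is taken once per 2 GB of allocation instead of once per block.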
Translating: Virtual to Physical
[Figure: a virtual address, shown as the bit groups 0101 1100 0001, is translated through the radix tree from the root through nodes A, B, and C]