Storage Deduplication in Cloud Computing
João Paulo and José Pereira
University of Minho
July 2010
Cloud Computing Overview
Cloud Computing
- Cloud services allow clients to shift their data and applications into the "cloud".
- These services run on a scalable and dependable infrastructure with a large pool of servers spread across several data centres.
Virtualization
- Virtualization is key to achieving the elasticity provided by cloud computing.
- Virtual machines (VMs) can be deployed or migrated in a few minutes.
- VM isolation allows better management of resources and failures.
Deduplication
- Cloud services store clients' data, applications and VM images.
- Deduplication makes it possible to:
  - reduce the storage space needed;
  - optimize the management of stored data.
- However, deduplication introduces overhead to the service.
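To make the idea concrete, here is a minimal Python sketch of block-level deduplication by content hash: data is split into fixed-size blocks, each block is identified by a hash of its content, and identical blocks are stored only once. The 4 KiB block size, the use of SHA-256 and the `DedupStore` class are assumptions made for illustration, not details taken from the presented system.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the slides do not specify one


class DedupStore:
    """Hypothetical content-addressed store: each distinct block is kept once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block content

    def write(self, data: bytes) -> list:
        """Split data into fixed-size blocks and return their digests in order."""
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # stored only the first time
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its list of block digests."""
        return b"".join(self.blocks[d] for d in recipe)


if __name__ == "__main__":
    store = DedupStore()
    image = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
    recipe = store.write(image)
    print(len(recipe), "logical blocks,", len(store.blocks), "stored blocks")
    # prints: 4 logical blocks, 2 stored blocks
    assert store.read(recipe) == image
```

The saving comes from keeping four logical blocks in two physical ones; the cost is hashing every block and maintaining the digest index, which is the overhead the slide refers to.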
Outline
1 Shared Storage Deduplication
2 Experimental Evaluation - Preliminary Results
3 Conclusions
4 Future Work and Challenges
Shared Storage Deduplication: Scenario
[Figure: several groups of VMs, each group running on a different physical machine]
- Groups of VMs run on different physical machines.
- Each VM has its own virtual disk.
- Virtual disks are kept on shared storage.
Shared Storage Deduplication: Xen Blktap Mechanism
Blktap
- Implemented within Xen.
- Makes it possible to implement virtual block devices for virtual machines.
- Provides a user-level disk I/O interface (Tapdisk).
- Runs an independent handler process for each disk.
- Makes copy-on-write easy to implement.
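As a rough illustration of why a user-level disk handler makes copy-on-write easy, the following Python sketch layers a private, writable overlay over a shared, read-only base image. `CowDisk` and its methods are hypothetical; this is not the Blktap or Tapdisk API, only the general copy-on-write technique under the assumption of 4 KiB blocks.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class CowDisk:
    """Hypothetical copy-on-write virtual disk over a shared, read-only base image.

    Reads come from the VM's private overlay when the block has been written,
    otherwise from the base image, which is never modified.
    """

    def __init__(self, base_blocks):
        self.base = base_blocks  # list of bytes, shared by every VM using the image
        self.overlay = {}        # block number -> private, writable copy

    def read(self, blkno: int) -> bytes:
        if blkno in self.overlay:
            return bytes(self.overlay[blkno])
        return self.base[blkno]

    def write(self, blkno: int, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > BLOCK_SIZE:
            raise ValueError("write must stay within one block")
        if blkno not in self.overlay:
            # First write to this block: copy the base block into the overlay.
            self.overlay[blkno] = bytearray(self.base[blkno])
        self.overlay[blkno][offset:offset + len(data)] = data


if __name__ == "__main__":
    base = [bytes(BLOCK_SIZE), bytes(BLOCK_SIZE)]
    vm1, vm2 = CowDisk(base), CowDisk(base)
    vm1.write(0, 0, b"hello")
    assert vm1.read(0)[:5] == b"hello"
    assert vm2.read(0) == bytes(BLOCK_SIZE)  # the other VM still sees the base block
```

Because each disk has its own handler, the overlay logic lives in one process per disk and the guest never needs to know that its blocks are shared.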
[Figure: VM1, VM2 and VM3 run on Physical Machines 1 and 2. Each VM's reads and writes go to its own Tap process, which accesses that VM's disk through asynchronous I/O (aio).]
Shared Storage Deduplication: Deduplication Challenges
Deduplication is usually applied to backup scenarios, where data is practically immutable. In a virtualized scenario, where stored data changes constantly, we must take into account:
- the overhead introduced by the deduplication algorithm;
- the best approach to finding duplicated data, which must be transparent to the VMs;
- the metadata needed to share identical data.
Shared Storage Deduplication: Deduplication Algorithm
[Figure: VM1, VM2 and VM3 run on Physical Machines 1 and 2; both machines use a common DHT and reach the shared storage through components labelled Tap disk (extended) and Server. The VMs issue reads and writes to the Tap disk.]
[Figure build: each VM has its own V->P (virtual-to-physical address) translation table, through which its reads and writes are mapped onto the shared storage.]
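A per-VM V->P table can be sketched as a simple map from virtual block addresses to physical block addresses. In this Python sketch, `SharedStorage`, `TranslationTable`, allocation on first write and the 4 KiB block size are all hypothetical simplifications, not the prototype's actual data structures.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class SharedStorage:
    """Hypothetical shared storage: a fixed array of physical blocks."""

    def __init__(self, nblocks: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(nblocks)]
        self.free = list(range(nblocks))  # physical addresses not yet allocated

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("shared storage is full")
        return self.free.pop(0)


class TranslationTable:
    """Hypothetical per-VM V->P table: virtual block address -> physical address."""

    def __init__(self, storage: SharedStorage):
        self.storage = storage
        self.v2p = {}

    def write(self, vaddr: int, data: bytes) -> None:
        if vaddr not in self.v2p:
            self.v2p[vaddr] = self.storage.allocate()  # map on first write
        self.storage.blocks[self.v2p[vaddr]] = data

    def read(self, vaddr: int) -> bytes:
        paddr = self.v2p.get(vaddr)
        if paddr is None:
            return bytes(BLOCK_SIZE)  # never-written blocks read as zeros
        return self.storage.blocks[paddr]
```

Because a VM reaches its data only through its own table, two virtual addresses (possibly in different VMs) can later be pointed at the same physical block, which is what makes sharing possible.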
[Figure build: writes mark the written addresses as dirty; each VM keeps its own list of dirty addresses.]
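Recording dirty addresses lets the write path stay cheap while a background task looks for duplicates later. The Python sketch below assumes a per-VM set guarded by a lock; `DirtyTracker`, `mark` and `drain` are hypothetical names, not the prototype's.

```python
import threading


class DirtyTracker:
    """Hypothetical per-VM record of virtual addresses written since the last share pass."""

    def __init__(self):
        self._lock = threading.Lock()
        self._dirty = set()

    def mark(self, vaddr: int) -> None:
        """Called on the write path after the VM's write completes."""
        with self._lock:
            self._dirty.add(vaddr)

    def drain(self) -> list:
        """Called by the background task: take every dirty address, leaving none."""
        with self._lock:
            dirty, self._dirty = self._dirty, set()
        return sorted(dirty)


if __name__ == "__main__":
    tracker = DirtyTracker()
    for vaddr in (5, 2, 5):
        tracker.mark(vaddr)
    print(tracker.drain())  # [2, 5]
    print(tracker.drain())  # []
```

Using a set means a block written many times between passes is hashed only once, and swapping in a fresh set keeps the lock held only briefly.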
[Figure build: some V->P entries are now marked COW (copy-on-write), and a Share process runs on each physical machine. The DHT holds Hash -> (Padd, Cont) entries, mapping a block's hash to its physical address and a counter.]
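One plausible reading of the Share step is sketched below in Python: for each dirty address, hash the block, look the hash up in a Hash -> (Padd, Cont) index, and either register the block or remap the virtual address to the existing copy. A plain dict stands in for the DHT, `Sharer` is a hypothetical name, marking every registered block as COW is an assumption, and the sketch trusts SHA-256 digests instead of comparing block contents.

```python
import hashlib


class Sharer:
    """Hypothetical share pass; a plain dict stands in for the DHT.

    dht maps a block digest to [physical address, reference count],
    mirroring the Hash -> (Padd, Cont) entries on the slide.
    """

    def __init__(self, blocks, free_list):
        self.blocks = blocks        # physical address -> block content
        self.free_list = free_list  # physical addresses released by sharing
        self.dht = {}

    def share(self, v2p, cow, dirty_vaddrs):
        """Deduplicate one VM's dirty virtual addresses.

        v2p: the VM's virtual -> physical table (updated in place)
        cow: the VM's set of copy-on-write virtual addresses (updated in place)
        """
        for vaddr in dirty_vaddrs:
            paddr = v2p[vaddr]
            digest = hashlib.sha256(self.blocks[paddr]).hexdigest()
            entry = self.dht.get(digest)
            if entry is None:
                self.dht[digest] = [paddr, 1]  # first copy of this content
            elif entry[0] != paddr:
                v2p[vaddr] = entry[0]          # point at the existing copy
                entry[1] += 1
                self.free_list.append(paddr)   # our duplicate block is now unused
            # Registered blocks may be shared, so they must never be overwritten in place.
            cow.add(vaddr)
```

Calling `share` for several VMs with the same `Sharer` shares identical blocks across VMs, since they all consult one index.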
[Figure build: a free blocks update queue is added next to the DHT. Each machine now has a free blocks buffer, and writes to COW blocks are tracked as COW addresses alongside the dirty addresses.]
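A write to a COW address cannot modify the shared block, so it must be redirected to a fresh block. The Python sketch below shows one way to do this with a per-machine buffer of free blocks; `CowWriter` is hypothetical, and it assumes whole-block writes (so no old content needs copying) and that every virtual address is already mapped.

```python
from collections import deque


class CowWriter:
    """Hypothetical write path of one VM with copy-on-write redirection.

    Assumes whole-block writes and that every virtual address is already mapped.
    """

    def __init__(self, blocks, v2p, cow, free_buffer, update_queue):
        self.blocks = blocks              # physical address -> block content
        self.v2p = v2p                    # this VM's virtual -> physical table
        self.cow = cow                    # this VM's copy-on-write virtual addresses
        self.free_buffer = free_buffer    # deque of free physical addresses for this machine
        self.update_queue = update_queue  # old addresses whose reference count must drop
        self.dirty = set()                # addresses for the next share pass

    def write(self, vaddr: int, data: bytes) -> None:
        if vaddr in self.cow:
            old = self.v2p[vaddr]
            new = self.free_buffer.popleft()  # fresh private block (IndexError if empty)
            self.v2p[vaddr] = new
            self.cow.discard(vaddr)
            self.update_queue.append(old)  # count is decremented later, off the write path
        self.blocks[self.v2p[vaddr]] = data
        self.dirty.add(vaddr)
```

Taking blocks from a local buffer avoids contacting shared metadata on every COW write, and deferring the reference-count update keeps the DHT off the critical path.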
[Figure build: a GC (garbage collector) process runs on each machine and uses the DHT's Hash -> (Padd, Cont) entries; a free blocks queue is added next to the DHT.]
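The GC step can be read as reference counting: each released physical address decrements its Hash -> (Padd, Cont) counter, and blocks that reach zero go to the free blocks queue. In this Python sketch, the `collect` function is hypothetical, a dict stands in for the DHT, and finding the entry by re-hashing the block is an assumption that holds only because registered blocks are never overwritten in place.

```python
import hashlib
from collections import deque


def collect(blocks, dht, update_queue, free_queue):
    """Hypothetical GC pass over the queue of released physical addresses.

    dht maps a block digest to [physical address, reference count]. The entry
    is found by re-hashing the block, which works because registered blocks
    are never overwritten in place. Blocks whose count reaches zero are
    handed to free_queue for reuse.
    """
    while update_queue:
        paddr = update_queue.popleft()
        digest = hashlib.sha256(blocks[paddr]).hexdigest()
        entry = dht.get(digest)
        if entry is None or entry[0] != paddr:
            continue  # unexpected here: every copy-on-write block is registered
        entry[1] -= 1
        if entry[1] == 0:
            del dht[digest]
            free_queue.append(paddr)
```

Running the collector in the background batches many decrements together, at the cost of freed space becoming reusable only after the next pass.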
[Figure build: the GC and Share processes are merged into one GC/Share process on each machine; the last build shows a single GC/Share process next to the free blocks queue.]
Experimental Evaluation - Preliminary Results
Experimental Evaluation - Preliminary Results: Evaluated Prototype
[Figure: Physical Machine 1 hosts VM1 and VM2, with a Tap disk, a GC/Share process, a free blocks queue and a free blocks buffer per VM, on top of the shared storage]
- The prototype does not include the distributed and fault-tolerant design.
- Two optimizations:
  - a set of mutexes for each VM's translation table;
  - the refilling granularity of each VM's free blocks buffer.
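The slide names the two optimizations but not how they are implemented. The Python sketch below shows one plausible reading: lock striping for the translation table (each mutex guards a subset of virtual addresses) and batched refilling of a per-VM free-blocks buffer from a shared queue. `StripedTable`, `FreeBlockBuffer`, the modulo striping and the `batch` parameter are all hypothetical.

```python
import threading
from collections import deque


class StripedTable:
    """Hypothetical V->P table protected by a set of mutexes (lock striping).

    Virtual address v belongs to stripe v % nstripes; each stripe has its own
    lock and map, so requests that fall in different stripes do not contend.
    """

    def __init__(self, nstripes: int = 16):
        self._locks = [threading.Lock() for _ in range(nstripes)]
        self._maps = [{} for _ in range(nstripes)]

    def get(self, vaddr: int):
        i = vaddr % len(self._locks)
        with self._locks[i]:
            return self._maps[i].get(vaddr)

    def set(self, vaddr: int, paddr: int) -> None:
        i = vaddr % len(self._locks)
        with self._locks[i]:
            self._maps[i][vaddr] = paddr


class FreeBlockBuffer:
    """Hypothetical per-VM buffer refilled from a shared free-blocks queue.

    `batch` is the refilling granularity: how many blocks are moved per refill.
    """

    def __init__(self, shared_queue: deque, shared_lock: threading.Lock, batch: int):
        self.shared = shared_queue
        self.shared_lock = shared_lock
        self.batch = batch
        self.local = deque()

    def take(self) -> int:
        if not self.local:
            with self.shared_lock:  # one shared-lock acquisition per batch
                for _ in range(min(self.batch, len(self.shared))):
                    self.local.append(self.shared.popleft())
        if not self.local:
            raise RuntimeError("no free blocks available")
        return self.local.popleft()
```

A larger `batch` means fewer acquisitions of the shared lock but more free blocks sitting idle in each VM's buffer; more stripes reduce contention on the table at the cost of more locks.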