Accelerating the Data Deduplication Performance with GPU in Hybrid Storage Systems
Prince Hamandawana, Awais Khan, Changgyu Lee, Sungyong Park, Youngjae Kim
Department of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea


  1. Accelerating the Data Deduplication Performance with GPU in Hybrid Storage Systems
  Prince Hamandawana, Awais Khan, Changgyu Lee, Sungyong Park, Youngjae Kim
  Department of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea
  PDSW-DISCS '17 WIP session, November 13, 2017, Denver, USA
  Laboratory for Advanced System Software

  2. Inline Deduplication in Cloud Storage Systems
  • To achieve high space utilization in a tiered cloud storage system, the following techniques are discussed in the community:
  1. Compression
  2. Erasure Coding
    • Cannot remove replicated data across the cluster
    • Difficult to deploy in inline mode
  3. Inline Data Deduplication
    • Higher storage efficiency by removing replicated data across the cluster
    • Eliminates duplicated data in the cache tier
  • However, the overhead of inline deduplication directly affects performance.
  • In a hybrid storage system, cache-tier nodes are equipped with SSDs, and inline deduplication reduces the amount of data written to the SSDs → lower write amplification, longer lifetime

  3. Inline Deduplication Framework on Ceph
  [Diagram: a client object is mapped by the CRUSH algorithm to a cache node in the Cache Tier (SSD), which maintains the Fingerprint Index; behind it sits the Storage Tier with Storage Node #1, #2, and #3.]

  4. Inline Deduplication Framework on Ceph
  [Diagram, Chunking: the incoming object is split into chunks 1-4 at the cache node.]

  5. Inline Deduplication Framework on Ceph
  [Diagram, Fingerprinting: a fingerprint is computed for each of chunks 1-4 on the cache node.]

  6. Inline Deduplication Framework on Ceph
  [Diagram, Deduplication Check: the chunk fingerprints are looked up in the Fingerprint Index; chunk 1 is found not to be a duplicate.]

  7. Inline Deduplication Framework on Ceph
  [Diagram, Deduplication Check: the non-duplicate chunk 1 is written to the Storage Tier; the lookup for the next chunk finds a duplicate.]

  8. Inline Deduplication Framework on Ceph
  [Diagram, Deduplication Check: for the duplicate chunk, only the reference count in the Fingerprint Index is increased; chunks 3 and 4 remain to be checked.]
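The write path in slides 4-8 boils down to: chunk the object, fingerprint each chunk, query the fingerprint index, then either write the chunk to the storage tier (index miss) or only increase its reference count (index hit). Below is a minimal, self-contained sketch of that flow; the fixed-size chunking, the std::hash stand-in for a cryptographic fingerprint (a real system would use something like SHA-1), and the in-memory index are illustrative assumptions, not the Ceph implementation.

```cpp
// Minimal sketch of the inline deduplication write path (slides 4-8).
// Assumptions: fixed-size chunking, std::hash as a stand-in for a
// cryptographic fingerprint, and an in-memory fingerprint index.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

using Chunk = std::vector<char>;
using Fingerprint = std::size_t;                        // real systems: e.g. 160-bit SHA-1
static std::unordered_map<Fingerprint, int> g_index;    // fingerprint -> reference count

// Step 1: split the incoming object into fixed-size chunks.
static std::vector<Chunk> chunking(const std::vector<char>& object, std::size_t chunk_size) {
    std::vector<Chunk> chunks;
    for (std::size_t off = 0; off < object.size(); off += chunk_size) {
        std::size_t len = std::min(chunk_size, object.size() - off);
        chunks.emplace_back(object.begin() + off, object.begin() + off + len);
    }
    return chunks;
}

// Step 2: compute a fingerprint for one chunk (stand-in hash).
static Fingerprint fingerprint(const Chunk& c) {
    return std::hash<std::string>{}(std::string(c.begin(), c.end()));
}

// Step 3: query the index; only unique chunks reach the storage tier.
static void dedup_write(const std::vector<char>& object, std::size_t chunk_size) {
    for (const Chunk& c : chunking(object, chunk_size)) {
        Fingerprint fp = fingerprint(c);
        auto it = g_index.find(fp);
        if (it == g_index.end()) {
            g_index.emplace(fp, 1);                     // not duplicate: write to storage tier
            std::printf("miss: write chunk to storage tier (fp=%zu)\n", fp);
        } else {
            ++it->second;                               // duplicate: increase reference count only
            std::printf("hit : reference count is now %d (fp=%zu)\n", it->second, fp);
        }
    }
}

int main() {
    std::vector<char> object(16 * 1024, 'A');           // toy object full of duplicate content
    dedup_write(object, 4 * 1024);                      // 4 KB chunks: 1 write, 3 ref-count hits
    return 0;
}
```

In the framework of slide 3, the lookup would of course go against the shared Fingerprint Index kept on the cache tier rather than a process-local hash map.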

  9. Fingerprint Overhead and GPU Acceleration
  • Deduplication overhead consists of:
    • Chunking
    • Calculating fingerprints
    • Fingerprint query
  • We observed that fingerprinting accounts for more than 70% of the total deduplication overhead.
  • To reduce the fingerprinting overhead, we propose GPU acceleration for fingerprint calculation.
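As a rough illustration of how such a per-stage breakdown can be obtained, the sketch below times chunking, fingerprinting, and index queries separately with std::chrono. The buffer sizes and the std::hash stand-in are assumptions; with a lightweight hash like this the proportions will not match the SHA-style fingerprinting measured on the slides, so the point is only the instrumentation pattern.

```cpp
// Sketch of measuring the per-stage deduplication overhead (chunking,
// fingerprinting, fingerprint query). Sizes and the std::hash stand-in
// fingerprint are illustrative assumptions only.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    const std::size_t object_size = 64u << 20, chunk_size = 128u << 10;  // 64 MB object, 128 KB chunks
    std::vector<char> object(object_size, 'A');

    auto t0 = clock::now();
    std::vector<std::string> chunks;                       // stage 1: chunking
    for (std::size_t off = 0; off < object.size(); off += chunk_size)
        chunks.emplace_back(object.data() + off, chunk_size);

    auto t1 = clock::now();
    std::vector<std::size_t> fps;                          // stage 2: fingerprinting (the hot stage)
    for (const auto& c : chunks)
        fps.push_back(std::hash<std::string>{}(c));

    auto t2 = clock::now();
    std::unordered_map<std::size_t, int> index;            // stage 3: fingerprint query
    for (std::size_t fp : fps)
        ++index[fp];

    auto t3 = clock::now();
    auto ms = [](auto a, auto b) {
        return std::chrono::duration<double, std::milli>(b - a).count();
    };
    std::printf("chunking %.2f ms, fingerprint %.2f ms, query %.2f ms\n",
                ms(t0, t1), ms(t1, t2), ms(t2, t3));
    return 0;
}
```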

  10. Accelerating Fingerprint Calculation with GPU
  [Diagram: the cache node is equipped with a GPU; chunks 1-4 are offloaded to the GPU for fingerprinting.]

  11. Accelerating Fingerprint Calculation with GPU
  [Diagram: the GPU computes the fingerprints of chunks 1-4 in parallel.]

  12. Accelerating Fingerprint Calculation with GPU
  [Diagram: the computed fingerprints are returned to the cache node for the deduplication check against the Fingerprint Index.]
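A minimal sketch of the offload pattern in slides 10-12, under the assumption that one CUDA thread fingerprints one chunk so that an entire batch is hashed in parallel: the chunks are copied to the GPU, the kernel computes one digest per chunk, and the digests are copied back for the index query on the CPU. The 64-bit FNV-1a hash is only a stand-in for the cryptographic fingerprint (e.g., SHA-1 or MD5) a real deduplication engine would compute, and the batch and chunk sizes are illustrative.

```cpp
// Sketch of GPU-accelerated fingerprinting (slides 10-12): one CUDA thread
// computes the fingerprint of one chunk, so a whole batch is hashed in
// parallel. FNV-1a is a stand-in for a real cryptographic fingerprint.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void fingerprint_kernel(const unsigned char* chunks, std::size_t chunk_size,
                                   int num_chunks, std::uint64_t* fingerprints) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_chunks) return;
    const unsigned char* chunk = chunks + static_cast<std::size_t>(i) * chunk_size;
    std::uint64_t h = 1469598103934665603ULL;           // FNV-1a offset basis
    for (std::size_t j = 0; j < chunk_size; ++j) {
        h ^= chunk[j];
        h *= 1099511628211ULL;                          // FNV-1a prime
    }
    fingerprints[i] = h;
}

int main() {
    const std::size_t chunk_size = 128 * 1024;          // 128 KB chunks
    const int num_chunks = 256;                         // one 32 MB batch
    std::vector<unsigned char> host_chunks(chunk_size * num_chunks, 'A');
    std::vector<std::uint64_t> host_fps(num_chunks);

    unsigned char* d_chunks = nullptr;
    std::uint64_t* d_fps = nullptr;
    cudaMalloc(&d_chunks, host_chunks.size());
    cudaMalloc(&d_fps, num_chunks * sizeof(std::uint64_t));

    // Copy the batch of chunks to the GPU, hash them in parallel, copy digests back.
    cudaMemcpy(d_chunks, host_chunks.data(), host_chunks.size(), cudaMemcpyHostToDevice);
    int threads = 128, blocks = (num_chunks + threads - 1) / threads;
    fingerprint_kernel<<<blocks, threads>>>(d_chunks, chunk_size, num_chunks, d_fps);
    cudaMemcpy(host_fps.data(), d_fps, num_chunks * sizeof(std::uint64_t), cudaMemcpyDeviceToHost);

    std::printf("chunk 0 fingerprint: %llu\n", (unsigned long long)host_fps[0]);
    cudaFree(d_chunks);
    cudaFree(d_fps);
    return 0;
}
```

In practice the offload pays off when many chunks are batched per transfer; CUDA streams can additionally overlap the host-device copies with kernel execution.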

  13. Experiment Setup
  • Ceph Jewel v10.2.5
  • CUDA Toolkit 8.0
  • 4 OSD servers
    • Intel Xeon E5-2640 v3 @ 2.60 GHz
    • 32 GB memory
    • 12 GB NVIDIA Tesla K80 GPU
    • 2 SSDs (Cache Tier), 4 HDDs (Storage Tier)
  • Ceph RBD client
    • Workload: 1 GB of random 4 MB writes in total, issued with the fio benchmark

  14. Preliminary Results
  • GPU fingerprinting reduced the fingerprint overhead by about 65%.
  • The total deduplication overhead is reduced to 52%.
  [Chart: total time (sec), broken into Chunking, Fingerprint, and Fingerprint Query, for CPU vs. GPU fingerprinting at chunk sizes of 128, 256, 512, and 1024 KB; the fingerprint portion is 65% lower with the GPU.]

  15. Q&A  Contact: Changgyu Lee (changgyu@sogang.ac.kr) Department of Computer Science and Engineering Sogang University, Seoul, Republic of Korea 15
