Taming the Cloud Object Storage
Ali Anwar★, Yue Cheng★, Aayush Gupta†, Ali R. Butt★
★Virginia Tech, †IBM Research – Almaden
Cloud object stores enable cost-efficient data storage
Cloud object stores support various workloads
[Figure: example applications backed by object storage: website hosting, online gaming, online video sharing, enterprise backup]
One size does not fit all: replace the monolithic object store with specialized, fine-grained object stores, each launched on its own sub-cluster
Reason 1: Classification of workloads
Applications have different service-level requirements, e.g., average latency per request, queries per second (QPS), and data transfer throughput (MB/s)
Small objects (~1–100 KB)
• Website hosting: Get 90%, Put 5%, Delete 5%
• Online gaming: Get 5%, Put 90%, Delete 5%
Large objects (~1–100 MB)
• Online video sharing: Get 90%, Put 5%, Delete 5%
• Enterprise backup: Get 5%, Put 90%, Delete 5%
Reason 2: Heterogeneous resources
• Datacenters hosting object stores are becoming increasingly heterogeneous
• Hardware-to-workload mismatch
• Meeting SLA requirements is challenging
Outline: Introduction, Motivation, Contribution, Design, Evaluation
Background: Swift object store
[Figure: a Swift deployment with a proxy tier in front of storage server nodes]
Swift: Proxy and Storage servers
[Figure: request flow from clients through the proxy server to the storage server nodes]
Swift: Ring architecture
[Figure: the proxy uses the ring to map each object onto the storage server nodes]
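For intuition, here is a minimal sketch of the consistent-hashing idea behind the Swift ring: an object's account/container/object path is hashed, the top bits of the hash select a partition, and the partition maps to a storage device. This is a simplification for illustration, not Swift's actual ring-builder code; the partition power and device list below are made up.

```python
import hashlib
import struct

PART_POWER = 8                       # 2**8 = 256 partitions (illustrative value)
DEVICES = ["storage-node-1", "storage-node-2", "storage-node-3"]  # hypothetical nodes

# Hypothetical partition-to-device table: here we simply stripe partitions across
# devices; real Swift builds this table to balance load and to place replicas in
# distinct failure domains.
PART_TO_DEVICE = {p: DEVICES[p % len(DEVICES)] for p in range(2 ** PART_POWER)}

def ring_lookup(account: str, container: str, obj: str) -> str:
    """Map an object path to a storage device via hash -> partition -> device."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path).digest()
    # Take the top PART_POWER bits of the hash as the partition number.
    part = struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)
    return PART_TO_DEVICE[part]

print(ring_lookup("AUTH_demo", "photos", "cat.jpg"))
```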
Benchmark used: COSBench
• COSBench is an Intel-developed benchmark for measuring Cloud Object Storage service performance
• Targets S3, OpenStack Swift, and similar object stores; not for file systems or block devices
• Used to compare different hardware/software stacks, identify bottlenecks, and guide optimizations
Workloads used

Workload   Object size distribution   Workload characteristics     Application scenario
A          1–128 KB                   Get 90%, Put 5%, Delete 5%   Web hosting
B          1–128 KB                   Get 5%, Put 90%, Delete 5%   Online game hosting
C          1–128 MB                   Get 90%, Put 5%, Delete 5%   Online video sharing
D          1–128 MB                   Get 5%, Put 90%, Delete 5%   Enterprise backup
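As a concrete illustration, the sketch below encodes the four workloads from the table above as plain Python data and issues a matching Get/Put/Delete mix against a Swift endpoint with python-swiftclient. It is a hand-rolled stand-in for what COSBench automates, not part of MOS; the endpoint, credentials, container name, and object population are placeholders.

```python
import random
import swiftclient  # pip install python-swiftclient

# Workload definitions from the table above: object size range (KB) and op mix.
WORKLOADS = {
    "A": {"size_kb": (1, 128),       "mix": {"get": 0.90, "put": 0.05, "delete": 0.05}},
    "B": {"size_kb": (1, 128),       "mix": {"get": 0.05, "put": 0.90, "delete": 0.05}},
    "C": {"size_kb": (1024, 131072), "mix": {"get": 0.90, "put": 0.05, "delete": 0.05}},  # 1-128 MB
    "D": {"size_kb": (1024, 131072), "mix": {"get": 0.05, "put": 0.90, "delete": 0.05}},
}

def run(workload: str, conn: swiftclient.Connection, container: str, ops: int = 100) -> None:
    """Issue a Get/Put/Delete mix drawn from the given workload definition."""
    spec = WORKLOADS[workload]
    names = [f"obj-{i}" for i in range(32)]           # small, fixed object population
    for _ in range(ops):
        op = random.choices(list(spec["mix"]), weights=list(spec["mix"].values()))[0]
        name = random.choice(names)
        if op == "put":
            size = random.randint(*spec["size_kb"]) * 1024
            conn.put_object(container, name, contents=b"x" * size)
        elif op == "get":
            try:
                conn.get_object(container, name)
            except swiftclient.ClientException:
                pass                                   # object may not exist yet
        else:
            try:
                conn.delete_object(container, name)
            except swiftclient.ClientException:
                pass

# Placeholder endpoint and credentials (replace with a real Swift deployment).
conn = swiftclient.Connection(authurl="http://proxy:8080/auth/v1.0",
                              user="test:tester", key="testing")
conn.put_container("bench")
run("A", conn, "bench")
```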
Experimental setup for the motivational study
[Figure: COSBench client nodes (8 cores) drive proxy servers (32 cores), which connect to storage servers (32 cores, 3 SATA SSDs per node); the testbed includes both 1 Gbps and 10 Gbps network links]
Configuration 1 – Default monolithic
[Figure: a single monolithic store; COSBench load is spread round-robin across all proxy servers, which share all storage nodes and both the 1 Gbps and 10 Gbps links]
Configuration 2 – Favors small objects
[Figure: proxy, network, and storage resources are partitioned between a small-object store and a large-object store, with the split favoring small objects]
Configuration 3 – Favors large objects
[Figure: the same partitioning into small-object and large-object stores, with the split favoring large objects]
Performance under a multi-tenant environment – Workloads A & B
[Figure: throughput (QPS) across Workloads A–D for Configurations 1, 2, and 3; small-object and large-object panels]
Performance under a multi-tenant environment – Workloads A & B
[Figure: throughput (MB/s) across Workloads A–D for Configurations 1, 2, and 3; small-object and large-object panels]
Performance under a multi-tenant environment – latency
[Figure: average latency (sec) across Workloads A–D for Configurations 1, 2, and 3; small-object and large-object panels]
Key insights
• Cloud object store workloads can be classified by the size of the objects they access
• When multiple tenants run workloads with drastically different behaviors, they compete with each other for object store resources
Outline: Introduction, Motivation, Contribution, Design, Evaluation
Contributions
• Perform a performance and resource-efficiency analysis of the major hardware and software configuration opportunities
• Design MOS, Micro Object Storage, which (1) dynamically provisions fine-grained microstores and (2) exposes the microstore interfaces to tenants
• Evaluate MOS to showcase its advantages
Outline: Introduction, Motivation, Contribution, Design, Evaluation
Design criteria for MOS
We studied the effect of three knobs on the performance of a typical object store to derive design rules of thumb (the corresponding tunables are sketched below):
• Proxy server settings
• Storage server settings
• Hardware changes
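To make the knobs concrete, here is a minimal sketch of per-microstore tunables corresponding to the three knobs; the field names are illustrative, not MOS's actual configuration schema.

```python
# Illustrative per-microstore tunables corresponding to the three knobs above.
MICROSTORE_CONFIG = {
    "proxy": {"workers": 16},           # proxy server settings
    "storage": {"object_workers": 8},   # storage server settings
    "hardware": {                       # hardware choices
        "nic_gbps": 10,                 # 1 or 10 Gbps NICs
        "disk": "ssd",                  # "hdd" or "ssd"
    },
}
```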
Effect of proxy server settings
[Figure: small-object throughput (10³ QPS) and per-node CPU utilization vs. number of proxy workers, with throughput leveling off as CPU utilization reaches 100%; large-object throughput (GB/s) vs. number of proxy workers, leveling off at the 10 Gbps NIC bandwidth limit]
Effect of storage server settings
[Figure: throughput vs. number of object storage workers, for small objects (10³ QPS) and large objects (GB/s)]
Effect of hardware settings
[Figure: throughput with HDD vs. SSD storage and 1 Gbps vs. 10 Gbps NICs, for small objects (10³ QPS) and large objects (GB/s)]
Rules of thumb
• CPU on the proxy is the first-priority resource for small-object-intensive workloads
• Network bandwidth is more important than CPU on the proxy for large-object-intensive workloads
• Sizing rules (see the sketch below): proxyCores = storageNodes × coresPerStorageNode and BW_proxies = storageNodes × BW_storageNode
• A faster network cannot effectively improve QPS for small-object-intensive workloads, so pair a weaker network (1 Gbps NICs) with good storage devices (SSDs)
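A minimal sketch of the two sizing rules above, applied to a microstore's storage sub-cluster; the function and type names are illustrative, not MOS code. Summing over nodes simply generalizes the homogeneous-node formulas to a heterogeneous set of storage nodes.

```python
from dataclasses import dataclass

@dataclass
class StorageNode:
    cores: int          # CPU cores on the storage node
    nic_gbps: float     # NIC bandwidth of the storage node

def proxy_cores_needed(storage_nodes: list[StorageNode]) -> int:
    """Small-object rule: proxyCores = storageNodes * coresPerStorageNode."""
    return sum(node.cores for node in storage_nodes)

def proxy_bandwidth_needed(storage_nodes: list[StorageNode]) -> float:
    """Large-object rule: BW_proxies = storageNodes * BW_storageNode (in Gbps)."""
    return sum(node.nic_gbps for node in storage_nodes)

# Example: three 32-core storage nodes, each with a 10 Gbps NIC.
nodes = [StorageNode(cores=32, nic_gbps=10.0)] * 3
print(proxy_cores_needed(nodes))       # 96 proxy cores for a small-object microstore
print(proxy_bandwidth_needed(nodes))   # 30 Gbps of aggregate proxy bandwidth for large objects
```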
MOS design
[Figure: MOS architecture. Each microstore (1…N) has its own load balancer/load redirector, workload monitor, proxy servers, and object storage servers. A resource manager provisions microstores from a shared free resource pool.]
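The sketch below illustrates the role of the load redirector in this architecture: requests are routed to the tenant's microstore and spread across that microstore's proxies. It is an illustrative simplification under assumed names, not MOS's implementation.

```python
import itertools

class Microstore:
    """A fine-grained store: a set of proxies plus storage nodes for one tenant."""
    def __init__(self, name, proxies, storage_nodes):
        self.name = name
        self.proxies = proxies
        self.storage_nodes = storage_nodes
        self._rr = itertools.cycle(proxies)   # round-robin over this microstore's proxies

    def pick_proxy(self):
        return next(self._rr)

class LoadRedirector:
    """Maps each tenant to its microstore and forwards requests to one of its proxies."""
    def __init__(self):
        self.tenant_to_store = {}

    def register(self, tenant, microstore):
        self.tenant_to_store[tenant] = microstore

    def route(self, tenant, request):
        store = self.tenant_to_store[tenant]
        return f"{request} -> {store.name} via {store.pick_proxy()}"

# Hypothetical deployment: one small-object and one large-object microstore.
redirector = LoadRedirector()
redirector.register("web-tenant", Microstore("small-objects", ["proxy-1", "proxy-2"], ["sn-1"]))
redirector.register("backup-tenant", Microstore("large-objects", ["proxy-3"], ["sn-2", "sn-3"]))
print(redirector.route("web-tenant", "GET /photos/cat.jpg"))
```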
Resource provisioning algorithm
• Initially, the algorithm conservatively allocates the same amount of resources to each microstore, then uses a greedy approach for subsequent allocation
• It keeps track of the set of free resources, including their hardware configuration, the load currently served, and resource utilization such as CPU and network bandwidth utilization
• It periodically collects monitoring data from each microstore and, based on it, aggressively increases or linearly decreases each microstore's resources (a sketch follows this list)
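A minimal sketch of the provisioning loop described above, under the assumption that "aggressive increase" means multiplicative growth from the free pool and "linear decrease" means releasing a fixed step per interval; the thresholds and step sizes are made-up parameters, not values from the paper.

```python
def provision_step(allocations, utilization, free_pool,
                   high=0.8, low=0.3, grow=2.0, shrink_step=1):
    """One periodic pass: grow overloaded microstores aggressively from the free
    pool, shrink underloaded ones linearly back into it.

    allocations: {microstore: allocated resource units}
    utilization: {microstore: observed utilization in [0, 1]}
    free_pool:   number of unallocated resource units
    """
    for store, alloc in allocations.items():
        util = utilization[store]
        if util > high and free_pool > 0:
            # Aggressive (multiplicative) increase, capped by what is free.
            want = int(alloc * grow) - alloc
            grant = min(want, free_pool)
            allocations[store] += grant
            free_pool -= grant
        elif util < low and alloc > shrink_step:
            # Linear decrease: release a fixed step back to the free pool.
            allocations[store] -= shrink_step
            free_pool += shrink_step
    return allocations, free_pool

# Example: two microstores start with equal, conservative allocations.
allocs = {"small-objects": 4, "large-objects": 4}
utils = {"small-objects": 0.95, "large-objects": 0.10}
print(provision_step(allocs, utils, free_pool=8))
# -> small-objects grows to 8 units, large-objects shrinks by one unit
```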
Outline: Introduction, Motivation, Contribution, Design, Evaluation
Preliminary evaluation via simulation – experimental setup
• Compute nodes: 3 × 32-core, 4 × 16-core, 31 × 8-core, and 12 × 4-core machines
• Network: 18 × 10 Gbps and 32 × 1 Gbps NICs
• Storage: 70% HDD, 30% SSD (restated as data below)
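For reference, the simulated resource pool can be written down directly from the numbers above; this is just the setup restated as data, with illustrative field names.

```python
# Simulated cluster from the setup above.
SIMULATED_CLUSTER = {
    "compute_nodes": {32: 3, 16: 4, 8: 31, 4: 12},   # cores -> machine count
    "nics":          {10: 18, 1: 32},                # Gbps  -> NIC count
    "storage_mix":   {"hdd": 0.70, "ssd": 0.30},     # fraction of devices
}

total_cores = sum(cores * count for cores, count in SIMULATED_CLUSTER["compute_nodes"].items())
print(total_cores)   # 3*32 + 4*16 + 31*8 + 12*4 = 456 cores in total
```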
Aggregated throughput
[Figure: aggregate throughput over a 20-minute run for Default, MOS static, and MOS dynamic; small objects in 10³ QPS, large objects in GB/s]
Timeline under dynamically changing workloads
[Figure: throughput over time (10³ QPS and GB/s) as the workload mix shifts across Workloads A–D in four stages]
Resource utilization timeline
[Figure: CPU and network utilization over time for the microstores serving Workloads A, B, C, and D]