Optimizing Hash-based Distributed Storage Using Client Choices
Peilun Li and Wei Xu
Institute for Interdisciplinary Information Sciences, Tsinghua University
Data Placement Design #1
• Centralized management: GFS, HDFS, …
[Diagram: the client asks the name server to resolve data name → server name and server name → server IP, then reads or writes the data on the chosen data server directly.]
Data Placement Design #2
• Hash-based distributed management: Ceph, Dynamo, FDS, …
[Diagram: the client computes data name → server name with a hash function; a monitor server only maintains the server name → server IP map; the client then accesses the data server directly.]
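A minimal sketch of the hash-based mapping just described, assuming a toy modular hash and invented names (real systems such as Ceph use CRUSH rather than this):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative only: the client hashes the data name to pick a server and
// only needs the monitor's small server-name -> IP map.  Real systems use
// more elaborate placement functions than this modular hash.
struct Cluster {
    std::vector<std::string> server_names;              // e.g. {"osd.0", "osd.1", ...}
    std::unordered_map<std::string, std::string> ips;   // server name -> IP, from the monitor

    const std::string& place(const std::string& data_name) const {
        std::size_t idx = std::hash<std::string>{}(data_name) % server_names.size();
        return server_names[idx];
    }

    const std::string& resolve(const std::string& server_name) const {
        return ips.at(server_name);
    }
};
```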
Pros and Cons of Different Designs
• Centralized management
  • Pros: global performance optimization.
  • Cons: the centralized name server can become a bottleneck.
• Hash-based distributed management
  • Pros: avoids the centralized-server bottleneck.
  • Cons: fixed placement makes it hard to do optimization; some optimizations are vulnerable to changes of the lower-level storage architecture.
Motivation
• We want to use server information to improve system performance in hash-based distributed management.
  • Static information: network structure, failure domain, …
  • Dynamic information: latency, memory utilization, …
• We want a flexible system so that new optimizations for specific applications can be added easily.
  • We do not want to redesign the whole placement algorithm or hash function.
Solution: Multiple Hash Functions
[Diagram: the client hashes the data name with hash functions 1–3, producing candidate servers 1–3; a policy picks one of them (here server 2) to hold the data; the monitor still only maps server name → server IP.]
Solution: Multiple Hash Functions
• We can use multiple hash functions to provide multiple choices, and choose the best one with a fixed policy.
  • Different servers provide different performance.
• A performance requirement, or even a specific application, can have its own optimization policy.
• Easy to implement as an independent module.
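A sketch of how multiple hash functions can be derived by salting a single hash with the choice index, giving k candidate servers for one data name (the salting scheme and names are assumptions for illustration, not the paper's exact construction):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hash function i is hash(data_name + "#" + i); each one maps the data name
// to a candidate server, and duplicates are skipped so the k choices land on
// k different servers whenever possible.
std::vector<std::size_t> candidate_servers(const std::string& data_name,
                                           std::size_t num_servers,
                                           std::size_t k) {
    std::vector<std::size_t> candidates;
    // Bound the number of attempts so the loop terminates even if hashes collide.
    for (std::size_t i = 0; candidates.size() < k && i < 4 * k; ++i) {
        std::size_t s =
            std::hash<std::string>{}(data_name + "#" + std::to_string(i)) % num_servers;
        if (std::find(candidates.begin(), candidates.end(), s) == candidates.end())
            candidates.push_back(s);
    }
    return candidates;
}
```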
How Does Write Work Now?
[Diagram: using the multi-hash candidates (servers 1–3), the client sends a Write-Query to each; the servers reply "no data" plus their performance metrics; the client picks one (here server 2) with the policy, caches the choice, and writes the data there.]
How Does Read Work Now?
[Diagram: the client sends a Read-Query to each multi-hash candidate (servers 1–3); the server that has the data (here server 2) answers; the client caches the choice and reads the data from it.]
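A sketch of the first-access logic shown in the two protocol slides above (message and field names are invented): every candidate is probed once; a candidate that already holds the object wins a read, while a fresh write is decided by the policy.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical reply to a Read-Query / Write-Query probe: whether the server
// holds the object, plus piggybacked performance metrics for the policy.
struct ProbeReply {
    std::size_t server;
    bool        has_data;
    double      free_space_gb;
    double      free_memory_gb;
};

// A policy ranks the probe replies and returns the chosen server.
using Policy = std::function<std::size_t(const std::vector<ProbeReply>&)>;

// First access to an object: if some candidate already has the data, it is
// the only valid choice (read path); otherwise the policy decides where the
// new data should go (write path).
std::size_t first_access(const std::vector<ProbeReply>& replies, const Policy& policy) {
    for (const auto& r : replies)
        if (r.has_data) return r.server;
    return policy(replies);
}
```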
Simple Server
• Gather server performance metrics.
  • CPU/memory/disk utilization, average read/write latency, unflushed journal size, …
• Answer client probing.
  • Check whether the requested data exists on this server.
  • Piggyback server metrics with the probing results.
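A rough sketch of what the small server module does, with invented names and placeholder metric collection (the actual patch on Ceph will look different): answer a probe with the existence check and piggyback the local metrics in the same reply.

```cpp
#include <string>
#include <unordered_set>

// Metrics the server reports with every probe reply (field names illustrative).
struct ServerMetrics {
    double cpu_util;                 // 0.0 - 1.0
    double mem_util;                 // 0.0 - 1.0
    double disk_util;                // 0.0 - 1.0
    double avg_read_latency_ms;
    double avg_write_latency_ms;
    double unflushed_journal_mb;
};

struct ProbeResult {
    bool          has_data;   // does this server hold the requested object?
    ServerMetrics metrics;    // piggybacked, so no extra round trip is needed
};

class ProbeService {
public:
    // Answer a client probe for one object name.
    ProbeResult handle_probe(const std::string& object_name) const {
        return ProbeResult{local_objects_.count(object_name) > 0, collect_metrics()};
    }

private:
    // In the real module these values come from the OS and the storage daemon;
    // here they are placeholders.
    ServerMetrics collect_metrics() const { return ServerMetrics{}; }

    std::unordered_set<std::string> local_objects_;   // objects stored locally
};
```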
Clever Client
• Provide multiple choices.
• Probe the server choices before the first access.
• Make a choice if new data needs to be written.
• Cache the choice after the first access.
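A sketch of the client-side choice cache, assuming a probe RPC and a policy function are supplied from outside (all names are hypothetical): the probe and the decision happen only on the first access, and every later access reuses the cached choice.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal probe reply for this sketch (see the server-side sketch above).
struct ProbeReply { std::size_t server; bool has_data; double free_memory_gb; };

class ChoiceCache {
public:
    using ProbeFn  = std::function<std::vector<ProbeReply>(const std::string&)>;
    using ChooseFn = std::function<std::size_t(const std::vector<ProbeReply>&)>;

    ChoiceCache(ProbeFn probe, ChooseFn choose)
        : probe_(std::move(probe)), choose_(std::move(choose)) {}

    // Return the server for this object, probing only on the first access.
    std::size_t server_for(const std::string& object_name) {
        auto it = cache_.find(object_name);
        if (it != cache_.end()) return it->second;          // cached: no probe round trip

        std::size_t chosen = choose_(probe_(object_name));  // probe all candidates once
        cache_[object_name] = chosen;                       // remember for later accesses
        return chosen;
    }

private:
    ProbeFn  probe_;
    ChooseFn choose_;
    std::unordered_map<std::string, std::size_t> cache_;
};
```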
Making the Best Choice
• A policy takes server information as input and outputs the best choice.
• Example policies (evaluated in the following slides): space, local, memory, cpu, latency, journal.
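A sketch of how such policies can be written as interchangeable functions over the piggybacked metrics (the Candidate fields and policy signatures are assumptions, not the system's exact interface): each policy simply ranks the candidates by one metric.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Per-candidate information available to a policy (illustrative fields).
struct Candidate {
    std::size_t server;
    double      free_space_gb;
    double      free_memory_gb;
    int         network_distance;   // e.g. 0 = same host, 1 = same rack, 2 = cross rack
};

// A policy maps the candidate list to the chosen server.
using Policy = std::function<std::size_t(const std::vector<Candidate>&)>;

// "space": the server with the most free disk space.
std::size_t space_policy(const std::vector<Candidate>& cs) {
    return std::max_element(cs.begin(), cs.end(),
        [](const Candidate& a, const Candidate& b) {
            return a.free_space_gb < b.free_space_gb;
        })->server;
}

// "memory": the server with the most free memory (largest file system cache).
std::size_t memory_policy(const std::vector<Candidate>& cs) {
    return std::max_element(cs.begin(), cs.end(),
        [](const Candidate& a, const Candidate& b) {
            return a.free_memory_gb < b.free_memory_gb;
        })->server;
}

// "local": the closest server in the network topology.
std::size_t local_policy(const std::vector<Candidate>& cs) {
    return std::min_element(cs.begin(), cs.end(),
        [](const Candidate& a, const Candidate& b) {
            return a.network_distance < b.network_distance;
        })->server;
}
```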
Implementation
• We implement it on top of Ceph.
  • About 140 lines of C++ code for the server module.
  • Easy to implement on other systems.
• Only the block device interface is supported for now.
  • It ensures that only one client is accessing the block device data.
Evaluation Setup
• Testbed cluster:
  • 3 machines.
  • 15 × 4 TB hard drives.
  • 2 × 12-core 2.1 GHz Xeon CPUs.
  • 128 GB memory.
  • 10 Gb NIC.
  • Workloads are generated with the librbd engine of FIO; 8 images are read/written concurrently with a 4 MB block size on the same machine.
• Production cluster:
  • 44 machines.
  • 4 × 4 TB hard drives and a 256 GB SSD.
  • 2 × 10 Gb NICs.
  • Workloads are generated with the webserver module of FileBench.
• The number of choices is fixed to 2.
Policy space Saves Disk Space
• space chooses the server with the most free space to store data.
• A hash-based storage system is full as soon as any single disk is full.
[Chart: disk capacity utilization when the system becomes full — 73% for the baseline vs. 96% with space.]
Policy local Reduces Network Bottleneck
• local chooses the closest server to store data.
• Can save cross-rack network bandwidth.
[Charts: throughput (MB/s) over time for baseline vs. local on the testbed and on the production cluster; on the production cluster local reaches 12947.2 MB/s vs. 7963.1 MB/s for the baseline.]
Policy memory Improves Read Throughput
• memory chooses the server with the most free memory.
  • Servers coexist with other running programs.
  • More free memory ⇒ larger file system buffer cache ⇒ better read performance.
[Chart: throughput (MB/s) over time for baseline vs. memory.]
Inefficient Policies
• Policies cpu, latency, and journal do not work well.
[Charts: throughput (MB/s) over time for baseline vs. cpu, baseline vs. latency, and baseline vs. journal; each policy performs about the same as the baseline.]
Why Are They Inefficient?
• The Ceph server is not CPU intensive under this hardware configuration.
• Queue-based transient metrics, e.g. unflushed journal size, change too fast, so we cannot obtain a consistent measurement.
• However, applying ineffective policies still gives performance similar to the baseline!
Summary of Different Policies
• General improvement:

  Policy   Performance Change         Improvement
  local    1545 MB/s → 1900 MB/s      23.0%
  memory   778 MB/s → 1403 MB/s       80.3%
  space    73% → 96%                  31.5%
  cpu      1545 MB/s → 1513 MB/s      -1.9%
  latency  402 MB/s → 396 MB/s        -1.5%
  journal  402 MB/s → 396 MB/s        -1.5%
Probing Overhead
• The most significant overhead is server probing.
[Charts: latency CDFs (percentile vs. latency in ms) for 4 MB sequential writes and 4 KB random writes, comparing 2 choices against no probing.]
Discussion about Probing Overhead
• Probing adds 2.7 ms of average latency because of the extra round trip.
• Latency increases by 2.7% for large sequential writes and 6.9% for small random writes.
• Probing is only done on a client's first access.
  • The overhead is amortized over all subsequent accesses of an object.
Future Work
• Develop more advanced choice policies based on multiple metrics.
• Provide an application-level API, so the application itself can make the choices.
• Explore different ways to collaboratively cache the choice information, in order to reduce the number of probes.
Conclusion
• Hash-based design in distributed systems can be flexible as well.
• Best-effort statistical optimization can be both simple and efficient.
• Without significant queueing effects, the power of two choices may not work well in a real system.
Thank You
We are hiring: faculty members and postdocs in any CS field.
Contact: weixu@tsinghua.edu.cn