Data Location Optimization for a Self-Organized Storage System
Hannes Mühleisen, Tilman Walther and Robert Tolksdorf
[A. Bockoven]
[Thomas Schmickl]
Brood Sorting - Algorithm

    item = null
    while (true)
        if (item != null)
            if (similarity(item, nearbyItems()) > α)
                drop(item)
                item = null
        else
            item = min(similarity(nearbyItems()²))
            pickup(item)
        move()
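The loop above can be tried out in a few lines of Python. This is only a minimal sketch under assumptions that are not on the slide: items are plain labels, "nearby items" means the pile at the ant's current position on a ring of piles, similarity is the fraction of items with the same label, and the names `brood_sort`, `least_similar_index`, `alpha` and `seed` are made up for illustration.

```python
import random


def similarity(item, others):
    """Fraction of `others` carrying the same label as `item` (0.0 if empty)."""
    if not others:
        return 0.0
    return sum(1 for other in others if other == item) / len(others)


def least_similar_index(pile):
    """Index of the item that fits worst with the rest of its pile."""
    def score(i):
        return similarity(pile[i], pile[:i] + pile[i + 1:])
    return min(range(len(pile)), key=score)


def brood_sort(piles, steps, alpha=0.5, seed=0):
    """Let one ant walk a ring of piles for `steps` iterations.

    Assumed model: the ant drops its item when the current pile is
    similar enough (> alpha), otherwise picks up the worst-fitting item.
    """
    rng = random.Random(seed)
    piles = [list(p) for p in piles]  # work on a copy of the input
    pos, item = 0, None
    for _ in range(steps):
        nearby = piles[pos]
        if item is not None:
            if similarity(item, nearby) > alpha:
                nearby.append(item)   # drop(item)
                item = None
        elif nearby:
            item = nearby.pop(least_similar_index(nearby))  # pickup(item)
        pos = (pos + rng.choice((-1, 1))) % len(piles)      # move()
    if item is not None:
        piles[pos].append(item)  # put down whatever is still being carried
    return piles


if __name__ == "__main__":
    start = [["a", "b", "a"], ["b", "b", "a"], ["a", "a", "b"], ["b"]]
    print(brood_sort(start, steps=500))
```

The sketch never creates or loses items; it only relocates them, which is the property a storage system would need from such a scheme. Whether the piles actually become purer depends on `alpha`, the number of steps and the starting layout.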
Probabilistic Request Routing
[Figure: nodes S1 to S6 connected by links labelled with percentages; a request "#B?" is routed toward item #B.] [Lindgren03]
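As a rough illustration of the idea, the following sketch forwards a request hop by hop to the neighbour with the highest probability for the requested key. It is not the scheme from [Lindgren03]: the greedy choice, the table layout, the function name `route` and all numbers in the example are assumptions made for this sketch, and the topology is not the one in the figure.

```python
def route(tables, stores, start, key, max_hops=10):
    """Greedily follow the most promising neighbour for `key`.

    tables: node -> key -> {neighbour: probability} (hypothetical layout)
    stores: node -> set of keys held locally
    Returns the path to a node holding `key`, or None if routing fails.
    """
    path = [start]
    node = start
    for _ in range(max_hops):
        if key in stores.get(node, ()):
            return path
        # Only consider neighbours not visited yet, to avoid loops.
        candidates = {
            n: p
            for n, p in tables.get(node, {}).get(key, {}).items()
            if n not in path
        }
        if not candidates:
            return None
        node = max(candidates, key=candidates.get)
        path.append(node)
    return path if key in stores.get(node, ()) else None


if __name__ == "__main__":
    tables = {
        "S1": {"#B": {"S2": 0.5, "S4": 0.7}},
        "S2": {"#B": {"S6": 0.25}},
        "S4": {"#B": {"S3": 0.1, "S5": 0.95}},
    }
    stores = {"S5": {"#B"}}
    print(route(tables, stores, "S1", "#B"))
```

A real system would also update the probabilities after each successful or failed lookup; that part is left out here.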
Research Question
Can brood sorting improve data placement in a large-scale distributed storage system based on probabilistic routing?
Some Adaptations
• Data is clustered into a limited number of “buckets”
• Movement is split into two phases:
  • Search phase: every node periodically generates a “profile” of its locally stored data and sends it on its way
  • Response phase: nodes compare incoming profiles to their locally stored data and generate movement responses
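The two phases could look roughly like this in Python. The slide does not say what a profile contains or how a response is computed, so this sketch assumes a profile is a per-bucket item count and that a node answers with a move whenever it holds more items of a bucket than the sender. `make_profile`, `respond` and the tuple layouts are hypothetical names and shapes chosen for illustration.

```python
from collections import Counter


def make_profile(node_id, items):
    """Search phase: summarise local data as item counts per bucket.

    `items` is a list of (bucket, payload) pairs (assumed representation).
    """
    return {
        "origin": node_id,
        "buckets": Counter(bucket for bucket, _ in items),
    }


def respond(profile, local_id, local_items):
    """Response phase: compare an incoming profile with local data.

    Assumed rule: if this node holds more items of a bucket than the
    profile's origin, suggest moving that bucket from the origin to here.
    Returns a list of (bucket, from_node, to_node) suggestions.
    """
    local = Counter(bucket for bucket, _ in local_items)
    moves = []
    for bucket, remote_count in sorted(profile["buckets"].items()):
        if local.get(bucket, 0) > remote_count:
            moves.append((bucket, profile["origin"], local_id))
    return moves


if __name__ == "__main__":
    n1_items = [("red", "x1"), ("blue", "x2")]
    n2_items = [("red", "y1"), ("red", "y2"), ("green", "y3")]
    profile = make_profile("n1", n1_items)
    print(respond(profile, "n2", n2_items))
```

Sending only counts rather than the items themselves keeps the search phase cheap; data moves only after a node has answered with a movement response.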
[Figure, three steps (1) to (3): nodes 1, 2 and 3, with a “Profile” shown in step (1); step (3) is marked “✓ Clean!”]
Evaluation
• Cluster of 100 Linux nodes
• Two datasets, random & synthetic
• 1000 write operations, four phases
• Recorded data:
  • # Data items in network
  • # Successful movement operations
  • Bucket amount & size
Data Items vs. Move Operations (synthetic/100nodes)
[Plot: Data Items and Move Operations over Sample, samples 0 to 120]
Bucket Amount vs. Average Size (synthetic/100nodes)
[Plot: Total Amount and Average Size over Sample, samples 0 to 120]
Data Items vs. Move Operations (random/100nodes)
[Plot: Data Items and Move Operations over Sample, samples 0 to 250]
Bucket Amount vs. Average Size (random/100nodes)
[Plot: Total Amount and Average Size over Sample, samples 0 to 250]
Conclusion
• Brood Sorting works! *
* YMMV
Thank You! Questions? Web Page: http://hannes.muehleisen.org