BORG: Block-reORGanization for Self-optimizing Storage Systems
Medha Bhadkamkar, Jorge Guerra, Luis Useche, Sam Burnett, Jason Liptak, Raju Rangaswami, Vagelis Hristidis
Florida International University
March 9, 2009
1 / 33
Problem
◮ I/O is the bottleneck
  - Legacy filesystems favor sequential access.
  - Realistic workloads are not necessarily sequential.
◮ Proposed solution
  - Co-locate data based on workload block access patterns.
  - Improve sequentiality.
2 / 33
Workload Characteristics that motivate BORG
◮ Workloads
  - office: browser, OpenOffice applications, gnuplot, etc.
  - developer: emacs, gcc, gdb, etc.
  - Subversion (SVN) server: source and document repository
  - Web server: department web server
◮ Workload statistics summary

Workload type   File system size [GB]   Total reads [GB]   Total writes [GB]
office            8.29                    6.49               0.32
developer        45.59                    3.82              10.46
SVN server        2.39                    0.29               0.62
web server      169.54                   21.07               2.24
3 / 33
Non-uniform Access Frequency Distribution
◮ Frequently accessed data is usually a small portion of the entire data.
◮ Frequently accessed data is spread over the entire disk area.

Workload type   File system size [GB]   Unique reads [GB]   Unique writes [GB]   Top 20% data access
office            8.29                   1.63                0.22                 51.40 %
developer        45.59                   2.57                3.96                 60.27 %
SVN server        2.39                   0.17                0.18                 45.79 %
web server      169.54                   7.32                0.33                 59.50 %
4 / 33
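The "Top 20% data access" column can be read as the share of all accesses that go to the most frequently accessed 20% of the data. The slides do not give the exact definition, so the following is only a minimal sketch under that assumption; the function name `top_fraction_share` and the block-granularity input are hypothetical.

```python
from collections import Counter
from math import ceil

def top_fraction_share(accessed_blocks, fraction=0.2):
    """Share of all accesses that hit the most popular `fraction` of unique blocks.

    Assumption: popularity is the per-block access count; the slides may use
    a different granularity or definition.
    """
    counts = Counter(accessed_blocks)
    if not counts:
        return 0.0
    # Number of unique blocks that make up the "top" set (at least one).
    k = max(1, ceil(len(counts) * fraction))
    top_accesses = sum(count for _, count in counts.most_common(k))
    return top_accesses / sum(counts.values())

# Block 1 is hot; blocks 2 to 5 are each touched once.
example_accesses = [1] * 6 + [2, 3, 4, 5]
example_share = top_fraction_share(example_accesses)
```

A skewed workload gives a share well above the chosen fraction, which is the property the table illustrates.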
Non-uniform Access Frequency Distribution
[Figure: access frequency]
The Opportunity
Co-locating frequently accessed data can improve I/O performance.
5 / 33
Workload Characteristics - Partial Determinism
◮ Non-sequential accesses repeat in a block access sequence.

Workload type   Partial determinism
office          65.42 %
developer       61.56 %
SVN server      50.73 %
web server      15.55 %

The Opportunity
Using partial determinism information can improve sequentiality of accesses.
6 / 33
Temporal Locality
◮ There is a substantial overlap in the working sets across days.
[Figure: data access overlap with Day 1 (%) for Day 1 through Day 7 of the week; series: all accesses, top 20% accesses]
The Opportunity
Using information of past I/O activity for optimizing layout can improve performance.
7 / 33
BORG in a nutshell
◮ Uses block access patterns to identify hot block sequences in the workload.
◮ Reorganizes blocks in a separate BORG OPTimized partition (BOPT).
◮ Assimilates write requests in the partition.
◮ Operates in the background.
◮ Can be dynamically inserted or removed when required.
◮ Is independent of filesystems.
◮ Maintains consistency through a persistent page-level indirection map.
8 / 33
System Architecture
[Figure: I/O stack, top to bottom]
User:   Application
Kernel: VFS
        Page Cache
        File Systems (EXT3, JFS...)
        BORG Layer
        I/O Scheduler
        Device Driver
Legend: existing components, new components
9 / 33
System Architecture
[Figure: I/O stack with BORG components]
User-space components: Analyzer, Planner
Kernel-space components (BORG Layer): I/O Profiler, BOPT-space Reconfigurator, I/O Indirector
Data exchanged between the two: I/O Trace, Layout Plan
I/O stack, top to bottom:
User:   Application
Kernel: VFS
        Page Cache
        File Systems (EXT3, JFS...)
        BORG Layer
        I/O Scheduler
        Device Driver
Legend: existing components, new components
10 / 33
I/O Profiler
◮ Each I/O operation is logged with:
  - Temporal attribute: timestamp
  - Process-level attributes: process ID, name
  - Block-level attributes: start LBA, length of I/O, mode (R/W)

Sample Trace
[Timestamp]       [PID]   [Exec.]    [StartLBA]   [Size]   [Mode]
705423195774700   5745    screen      6914207     32       R
705423259644748   5755    utempter   24379775      8       R
705423379492524   5755    utempter   24787567      8       R
705423421266908   5753    bash        7498311     24       R
705423454005104   5755    utempter   24793415      8       R
705423493292648   5753    bash       34543375     64       R
705423565122668   5766    stty       34543439     16       R
...               ...     ...        ...          ...      ...
12 / 33
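To make the trace format concrete, here is a minimal sketch that parses lines shaped like the sample trace into typed records. The names `TraceRecord` and `parse_trace_line` are hypothetical, and the sketch assumes whitespace-separated fields in exactly the six-column order shown; the units of the size field are not stated on the slide and are left uninterpreted.

```python
from typing import NamedTuple

class TraceRecord(NamedTuple):
    timestamp: int   # temporal attribute
    pid: int         # process-level attribute
    executable: str  # process-level attribute
    start_lba: int   # block-level attribute
    size: int        # length of the I/O, in the trace's own units
    mode: str        # "R" or "W"

def parse_trace_line(line: str) -> TraceRecord:
    """Parse one whitespace-separated trace line (assumed six fields)."""
    timestamp, pid, executable, start_lba, size, mode = line.split()
    if mode not in ("R", "W"):
        raise ValueError(f"unexpected mode: {mode!r}")
    return TraceRecord(int(timestamp), int(pid), executable,
                       int(start_lba), int(size), mode)

sample_lines = [
    "705423195774700 5745 screen 6914207 32 R",
    "705423259644748 5755 utempter 24379775 8 R",
    "705423421266908 5753 bash 7498311 24 R",
]
records = [parse_trace_line(line) for line in sample_lines]

# Group requests per process, which is the view a per-process analysis needs.
requests_by_pid = {}
for record in records:
    requests_by_pid.setdefault(record.pid, []).append(
        (record.start_lba, record.size))
```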
Analyzer
◮ Builds a per-process directed, weighted graph.
◮ A vertex is the per-request LBA range (start LBA, length).
◮ An edge is a temporal dependency between two ranges.
◮ Weights represent frequency of access.
◮ Graphs are merged into a single master access graph.
[Figure: two process graphs with vertices r1:(0, 3), r2:(4, 2), r3:(8, 2) and s1:(1, 6), s2:(9, 1), and the master access graph after merging, with vertices r1:(0, 1), r1,s1:(1, 2), s1:(3, 1), r2,s1:(4, 2), s1:(6, 1), r3:(8, 1), r3,s2:(9, 1); edge weights are 1 except for one edge of weight 2]
14 / 33
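The two Analyzer steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `build_process_graph` assumes an edge links each request to the next request of the same process and that the weight counts how often that transition occurs; `split_ranges` assumes merging splits overlapping LBA ranges at every range boundary, which is what the vertex labels in the figure suggest. Both function names are hypothetical.

```python
from collections import Counter

def build_process_graph(requests):
    """Directed, weighted graph for one process.

    `requests` is the time-ordered list of (start_lba, length) ranges.
    Returns a Counter mapping (src_range, dst_range) to a transition count.
    """
    edges = Counter()
    for src, dst in zip(requests, requests[1:]):
        edges[(src, dst)] += 1
    return edges

def split_ranges(ranges):
    """Split possibly overlapping ranges into disjoint segments.

    `ranges` maps a vertex name to (start_lba, length). Returns a list of
    (owners, start_lba, length), where `owners` names every original vertex
    that covers the segment.
    """
    boundaries = sorted({start for start, _ in ranges.values()} |
                        {start + length for start, length in ranges.values()})
    segments = []
    for low, high in zip(boundaries, boundaries[1:]):
        owners = tuple(sorted(
            name for name, (start, length) in ranges.items()
            if start <= low and high <= start + length))
        if owners:  # skip gaps that no request covers
            segments.append((owners, low, high - low))
    return segments

# Vertices taken from the figure on the Analyzer slide.
figure_ranges = {"r1": (0, 3), "r2": (4, 2), "r3": (8, 2),
                 "s1": (1, 6), "s2": (9, 1)}
master_vertices = split_ranges(figure_ranges)

example_edges = build_process_graph([(0, 3), (4, 2), (0, 3), (4, 2)])
```

With the figure's inputs, `split_ranges` yields the same seven vertices as the master access graph on the slide; carrying edge weights over to the split vertices is left out of this sketch.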
Planner
◮ Uses the master access graph as input.
◮ Chooses the most connected node for initial placement.
◮ Then repeatedly chooses the node most connected to the already placed node set.
◮ Places it according to the direction of the connecting edge.
[Figure: example weighted, directed graph over nodes A to J]
F → H → J → A → G → C → B → E → D
15 / 33
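The placement heuristic above can be sketched as a greedy loop. Several details are assumptions, since the slide does not spell them out: "most connected" is taken as the largest sum of incident edge weights for the first node and the heaviest single edge to the placed set afterwards, a node reached by an outgoing edge is appended after the current layout, and a node with an edge into the placed set is put before it. The function name `plan_layout` is hypothetical.

```python
from collections import deque

def plan_layout(edges):
    """Greedy linear placement from a weighted, directed graph.

    `edges` maps (src, dst) to a weight. Returns the nodes in layout order.
    Nodes that are not connected to the placed set are left unplaced.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return []
    connectivity = {node: 0 for node in nodes}
    for (src, dst), weight in edges.items():
        connectivity[src] += weight
        connectivity[dst] += weight
    start = max(nodes, key=lambda node: connectivity[node])
    layout, placed = deque([start]), {start}
    while len(placed) < len(nodes):
        best = None  # (weight, node, place_after)
        for (src, dst), weight in sorted(edges.items()):
            if src in placed and dst not in placed:
                candidate = (weight, dst, True)    # placed -> new node
            elif dst in placed and src not in placed:
                candidate = (weight, src, False)   # new node -> placed
            else:
                continue
            if best is None or candidate[0] > best[0]:
                best = candidate
        if best is None:
            break  # remaining nodes are disconnected from the layout
        _, node, place_after = best
        if place_after:
            layout.append(node)
        else:
            layout.appendleft(node)
        placed.add(node)
    return list(layout)

# Small hypothetical graph, not the one from the slide.
example_layout = plan_layout({("A", "B"): 5, ("B", "C"): 3, ("D", "A"): 4})
```

Here A is placed first, B follows it because of the edge A to B, D is placed in front because of the edge D to A, and C goes last.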
Reconfigurator
[Figure: reconfiguration workflow, block leaving the BOPT space]
Steps: 1. Graph G; 2. Current plan; 3. Planner writes plan; 4. Reconfigurator reads plan; 5. Reads from BOPT space; 6. Writes to FS space

Plan
Operation   Source       Dest.
Leaving     C' (BOPT)    C (FS)

BOPT space: C', D', W'; FS space: A, B, C, D
Legend: BOPT Read Cache, BOPT Write Buffer
17 / 33
Reconfigurator
[Figure: reconfiguration workflow, block relocated within the BOPT space]
Steps: 1. Graph G; 2. Current plan; 3. Planner writes plan; 4. Reconfigurator reads plan; 5. Reads from BOPT space; 6. Writes to BOPT space

Plan
Operation   Source       Dest.
Leaving     C' (BOPT)    C (FS)
Relocate    D' (BOPT)    D" (BOPT)

BOPT space: D', D", W'; FS space: A, B, C, D
Legend: BOPT Read Cache, BOPT Write Buffer
18 / 33
Reconfigurator
[Figure: reconfiguration workflow, block entering the BOPT space]
Steps: 1. Graph G; 2. Current plan; 3. Planner writes plan; 4. Reconfigurator reads plan; 5. Reads FS block B; 6. Writes to BOPT space

Plan
Operation   Source       Dest.
Leaving     C' (BOPT)    C (FS)
Relocate    D' (BOPT)    D" (BOPT)
Incoming    B (FS)       B' (BOPT)

BOPT space: B', D", W'; FS space: A, B, C, D
Legend: BOPT Read Cache, BOPT Write Buffer
19 / 33
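The three kinds of plan entries on the Reconfigurator slides (Leaving, Relocate, Incoming) can be derived by comparing the current BOPT layout with the new plan. The sketch below is an assumption about how such a comparison could look, not a description of BORG's code: `diff_plans` is a hypothetical name, each plan is modeled as a mapping from FS block to BOPT location, and the leaving, relocate, incoming ordering simply mirrors the order of the slides.

```python
def diff_plans(current, target):
    """Derive copy operations that turn the `current` layout into `target`.

    Both arguments map an FS block to its BOPT location. Each operation is
    (kind, block, source, destination), where source and destination are
    (space, address) pairs.
    """
    order = {"leaving": 0, "relocate": 1, "incoming": 2}
    operations = []
    for block in current.keys() | target.keys():
        old, new = current.get(block), target.get(block)
        if new is None:
            # No longer in the plan: copy the block back to the FS space.
            operations.append(("leaving", block, ("BOPT", old), ("FS", block)))
        elif old is None:
            # Newly planned: copy the block from the FS space into BOPT.
            operations.append(("incoming", block, ("FS", block), ("BOPT", new)))
        elif old != new:
            # Stays in BOPT but moves to a different location.
            operations.append(("relocate", block, ("BOPT", old), ("BOPT", new)))
    operations.sort(key=lambda op: (order[op[0]], op[1]))
    return operations

# Hypothetical layouts shaped like the slide example: C leaves, D moves, B enters.
current_layout = {"C": 0, "D": 1}
target_layout = {"D": 2, "B": 0}
plan_operations = diff_plans(current_layout, target_layout)
```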
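I/O Indirector
[Figure: read request for FS block B, which has an entry in borg_map; the I/O Indirector redirects the read to B' in the BOPT space]

borg_map
FS block   BOPT block   Dirty
B          B'           0
C          C'           1

BOPT space: B', D"; FS space: A, B, C, D
Legend: BOPT Read Cache, BOPT Write Buffer
21 / 33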
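I/O Indirector
[Figure: read request for FS block A, which has no entry in borg_map (marked X); the I/O Indirector passes the read to A in the FS space]

borg_map
FS block   BOPT block   Dirty
B          B'           0
C          C'           1

BOPT space: B', D"; FS space: A, B, C, D
Legend: BOPT Read Cache, BOPT Write Buffer
22 / 33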
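I/O Indirector
[Figure: write request for FS block W; the I/O Indirector redirects the write to W' in the BOPT space and records the mapping as dirty]

borg_map
FS block   BOPT block   Dirty
B          B'           0
C          C'           1
W          W'           1

BOPT space: B', D", W'; FS space: A, B, C, D
Legend: BOPT Read Cache, BOPT Write Buffer
23 / 33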
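The three I/O Indirector slides (mapped read, unmapped read, write) can be summarized in a small sketch of the indirection map. This is an in-memory illustration only: the class name `Indirector`, the list of free write-buffer slots, and the fallback to the FS space when no slot is free are all assumptions, and the real borg_map is persistent.

```python
class Indirector:
    """Toy model of request redirection through a borg_map-like table."""

    def __init__(self, write_buffer_slots):
        # fs_block -> [bopt_block, dirty]
        self.borg_map = {}
        # Free locations in the BOPT write buffer (assumed allocation scheme).
        self.free_slots = list(write_buffer_slots)

    def read(self, fs_block):
        entry = self.borg_map.get(fs_block)
        if entry is None:
            return ("FS", fs_block)      # not mapped: serve from FS space
        return ("BOPT", entry[0])        # mapped: serve the BOPT copy

    def write(self, fs_block):
        entry = self.borg_map.get(fs_block)
        if entry is None:
            if not self.free_slots:
                return ("FS", fs_block)  # assumed fallback when buffer is full
            entry = [self.free_slots.pop(0), False]
            self.borg_map[fs_block] = entry
        entry[1] = True                  # BOPT copy is now newer than FS copy
        return ("BOPT", entry[0])

indirector = Indirector(write_buffer_slots=["W'"])
indirector.borg_map["B"] = ["B'", False]
indirector.borg_map["C"] = ["C'", True]

mapped_read = indirector.read("B")
unmapped_read = indirector.read("A")
absorbed_write = indirector.write("W")
```

The dirty flag records that the BOPT copy has diverged from the FS copy, so a block that later leaves the BOPT space would need to be written back first.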
Evaluation
Goals
◮ How effective is BORG?
◮ What are the overheads?
◮ When is it not effective?
◮ How sensitive is it to different parameters?
Setup
◮ Metric: total disk busy time
◮ 5 hosts with different configurations
◮ Linux 2.6.22 kernel
◮ reiserfs and ext3
24 / 33
Busy times for Webserver
Setup
◮ Over 1.1 million requests to over 255,000 files in one week.
◮ BOPT size 8 GB, 4 reconfigurations
◮ Evaluated BORG with cumulative and partial traces
[Figure: disk busy time (sec) for phases N1 to N5; series: Vanilla, BORG-C, BORG-P]
Summary
◮ 14-35% reduction in busy times for cumulative traces and 5-39% for partial traces.
25 / 33
Busy times for Webserver
Setup
◮ Over 1.1 million requests to over 255,000 files in one week.
◮ BOPT size 8 GB, 4 reconfigurations
◮ Evaluated BORG with cumulative and partial traces
[Figure: disk busy time (sec) for reconfiguration phases R1 to R4; series: Vanilla, BORG-C, BORG-P]
Summary
◮ Busy times are higher in reconfiguration phases due to copy overheads.
26 / 33
BORG Overhead
Setup
◮ Over 1.1 million requests to over 255,000 files in one week.
◮ BOPT size 8 GB, 4 reconfigurations
◮ Cumulative and partial traces
[Figure: time (sec) spent in the Analyzer, Planner, and Reconfigurator for reconfigurations R1 to R4, with cumulative (C) and partial (P) traces]
Summary
◮ Linear increase in planning and analysis overheads for cumulative traces.
27 / 33