  1. IMPLEMENTING COLLOCATION GROUPS

  2. About Draper Lab
     • An independent, not-for-profit corporation dedicated to applied research, engineering development, education, and technology transfer
       – Spun off from the Massachusetts Institute of Technology in 1973
       – Expertise in guidance, navigation and control systems
       – Early applications: the U.S. Navy's Fleet Ballistic Missile Program and NASA's Apollo Program

  3. Agenda
     • Why collocation groups?
     • ITSM code components
     • Additional tools
     • A process to move 40 TB
     • Conclusions

  4. Why do I want collocation groups?
     • Number of nodes vs. number of slots
       1. Nodes < slots: collocate by node or filespace
       2. Nodes > slots: can't collocate
     • If collocation is on, there is no control of node mixing, yet migration still needs one mount per node
     • Node size vs. tape capacity
       1. Size > tape capacity: collocation fills the tape
       2. Size < tape capacity: collocation wastes tape
     • Collocation by group builds "supernodes" that satisfy case 1 in both comparisons

  5. Server Configuration
     • Sun V480, 4 processors, Solaris 9
     • Raw disk for db, log, and backuppool; no RAID
     • TSM server code at 5.3.1.3

     Acronym   Function          DB size (pages)   Number of files   Physical TB
     LM        Library manager   5,500             5,000             6
     SS        For servers       8,500,000         47,500,000        18
     SD        For desktops      45,400,000        230,000,000       40
     SD2       For desktops      4,300,000         33,000,000        2

  6. The starting SD server mess
     • Volumes
       – 417 to process
       – Average nodes / volume is 188
       – Max is 713
       – 25 are over 500
     • Nodes
       – 1635 nodes
       – Average 48 volumes / node
       – Max is 132
       – 25 are over 100

  7. New server commands
     • Def, del, upd, query collocgroup
       – Names and describes the group
     • Def, del collocmember
       – Adds a node to a group
     • Query nodedata
       – Very fast!!
       – Lists tapes which have files for a node or group, no separation by filespace
     • Upd stgpool colloc=group (see the sketch below)
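
A minimal sketch of the new commands in action, assuming made-up group, node, and tape pool names (DESKTOP_G01, PC0001..., SD_TAPEPOOL); in practice the define statements come from defgroups.pl:

     /* name and describe a group, then add nodes to it */
     define collocgroup DESKTOP_G01 description="Desktop supernode 01"
     define collocmember DESKTOP_G01 PC0001,PC0002,PC0003
     query collocgroup DESKTOP_G01
     /* which tapes hold data for a node? fast, but no breakdown by filespace */
     query nodedata PC0001 stgpool=SD_TAPEPOOL
     /* switch the tape pool to group collocation */
     update stgpool SD_TAPEPOOL collocate=group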

  8. The secret perl scripts
     • 4 scripts in the bin directory, not documented
     • Used only defgroups.pl
       – Analyzes 'q occ' data, creates define statements to build the groups
       – Execution:
         ./defgroups.pl id pwd domain size [execute]

  9. Fix the defgroups.pl SQL
     • Eliminate the stgpool subselect
       – Change it to an IN list, naming your tape stgpool
     • Eliminate the join between nodes & occupancy
       – Check domain_name with a subselect instead
     • Eliminate the check for a collocgroup
       – It is always null while implementing
     • Runtime drops from "beyond the limits of my patience" to 5 minutes (see the sketch below)
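
After those changes the query ends up roughly like the sketch below (tape pool and domain names are placeholders); it sums occupancy per node, which is all defgroups.pl needs in order to pack nodes into groups:

     select node_name, sum(physical_mb)
       from occupancy
      where stgpool_name in ('SD_TAPEPOOL')
        and node_name in (select node_name from nodes where domain_name='DESKTOPS')
      group by node_name
      order by 2 desc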

  10. Using Query Nodedata
     • SQL generates a 'q nodedata' command for each node (see the sketch below)
       – Also 'q nodedata * stg=pool_name'
     • Run the file from step 1, directing output to a 2nd file
       – 'q nodedata' doesn't have a corresponding SQL table (the very expensive volumeusage table is close)
     • Edit the output down to only node name and volume name
     • Load into MySQL
     • Analyze
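
Step 1 can be a single SELECT that emits one command per node (pool and domain names below are placeholders); feed the output back in as a dsmadmc macro for step 2, redirect that output to a file, then trim it to node/volume pairs before loading into MySQL:

     /* step 1: emit one 'q nodedata' command per node in the domain */
     select 'q nodedata ' || node_name || ' stg=SD_TAPEPOOL' from nodes where domain_name='DESKTOPS'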

  11. Tools
     • MySQL desktop development server
       – Very handy to have!
       – No select for nodedata
       – Do complex joins without killing the server
       – http://www.mysql.com/
     • UltraEdit editor
       – Sorting, column editing, hex editing
       – http://www.ultraedit.com/

  12. Preliminaries
     • Decide the target number of tapes in each group
       – Convert it to 'size in megs' for defgroups.pl
       – Goal is 4 tapes
       – We compress at the client, so LTO2 capacity is 200 GB and 'size' is 800,000
     • Run defgroups.pl on the domain(s)
     • Execute the commands from defgroups.pl
     • 'Update stgpool <name> colloc=group'
     • Mark all current tapes readonly (see the sketch below)
       – Stops migration to uncollocated filling tapes
       – Makes the SQL easier
     • Have as many scratch tapes as groups
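
Marking the existing tapes read-only is one UPDATE VOLUME per status; a sketch with a placeholder pool name (verify the WHERESTATUS values against your server level):

     update volume * access=readonly wherestgpool=SD_TAPEPOOL wherestatus=full
     update volume * access=readonly wherestgpool=SD_TAPEPOOL wherestatus=filling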

  13. A process to minimize tape mounts
     • With collocation by group turned on, a move or reclaim within the tapepool will need an output tape mount for each collocgroup on the input tape
       – Potentially very slow, and stressful for the tape drives
     • The solution is to move data from tape to devt=file pools on disk, where files are put into groups, then migrate back to tape

  14. Storage pools
     • 3 sequential pools on disk (see the sketch below)
       – seqdisk3, seqdisk4, seqdisk5
     • 2 pools receive data from tapes
       – Seqdisk3 & 4 each have two 69 GB volumes
       – Not collocated, moves don't reconstruct
     • Seqdisk5 receives data from seqdisk3 & 4
       – 170 8 GB volumes on 10 146 GB drives, each with its own file system
       – Collocated by group, moves reconstruct
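
A sketch of how such pools could be defined, with hypothetical device class names and directories; the real layout spreads seqdisk5's volumes over ten file systems rather than the single directory shown:

     /* landing pools fed from tape: a few large FILE volumes, not collocated */
     define devclass BIGFILE devtype=file maxcapacity=69g directory=/tsm/seq3
     define stgpool SEQDISK3 BIGFILE maxscratch=2 collocate=no
     define stgpool SEQDISK4 BIGFILE maxscratch=2 collocate=no
     /* staging pool that regroups the data before it migrates back to tape */
     define devclass SMALLFILE devtype=file maxcapacity=8g directory=/tsm/seq5/fs01
     define stgpool SEQDISK5 SMALLFILE maxscratch=170 collocate=group nextstgpool=SD_TAPEPOOL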

  15. The schedules and scripts
     • Each script is executed every 10 minutes by a schedule
       – 6 similar schedules for each script (see the sketch below)
       – For a given script, run at 00:00, 00:10, 00:20, etc.
     • T4_VOLUMES_ODD moves odd-numbered volumes to seqdisk3
     • T4_VOLUMES_EVEN moves even-numbered volumes to seqdisk4
     • T4_MOVES moves seqdisk3 & 4 volumes to seqdisk5
     • T4_MIGRATES starts migration of seqdisk5 to tape
     • T4_VOLUMES_DIRECT moves some tapes directly tape to tape
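
The 10-minute cadence comes from six staggered administrative schedules per script; a sketch for one of the scripts (schedule names are made up):

     define schedule T4_MOVES_00 type=administrative cmd="run T4_MOVES" active=yes starttime=00:00 period=1 perunits=hours
     define schedule T4_MOVES_10 type=administrative cmd="run T4_MOVES" active=yes starttime=00:10 period=1 perunits=hours
     /* ...and four more at 00:20, 00:30, 00:40 and 00:50 */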

  16. SQL to make the scripts
     • Use a file as a macro to create the script
     • The T4_VOLUMES* scripts have a prolog with logic
       – Check if backuppool migration is running; exit if yes
       – Check if SEQDISK3 is being used; exit if yes
       – Check for space in SEQDISK3; if there is, run
     • Run SQL to select odd/even volumes ordered by pct_utilized and append it to the file
     • For each volume, 4 lines are needed in the script (see the sketch below)
       – Test if the volume is still full or filling
       – Goto the script lines that issue a move command
       – Issue the move command
       – Exit
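
One plausible shape for the generated body, four lines per volume as described above (volume and pool names are placeholders, and the return-code symbols and label syntax should be checked against the server script documentation):

     /* volume A00123: skip it if it is no longer full or filling */
     select volume_name from volumes where volume_name='A00123' and status in ('FULL','FILLING')
     if (rc_notfound) goto vol2
     move data A00123 stgpool=SEQDISK3 wait=yes
     exit
     vol2: select volume_name from volumes where volume_name='A00456' and status in ('FULL','FILLING')
     if (rc_notfound) goto done
     move data A00456 stgpool=SEQDISK3 wait=yes
     exit
     done: exit

Exiting after the first successful move keeps each run to a single volume, so the next 10-minute invocation re-checks the prolog conditions before picking the next tape.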

  17. Other methods to move all that data
     • Direct tape to tape within the pool
       – Not as bad as I had feared!
       – Analyzed which tapes had the fewest groups on them and moved those tape to tape
       – Of 278 tapes, 219 have 30 or more groups (42 max)
     • Move nodedata directly tape to tape (see the sketch below)
       – Move nodedata list-of-all-the-nodes-in-group
       – Needs extra scratch tapes because source tapes aren't emptied quickly
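
The nodedata route is a single command per group, listing every node in it (node and pool names are placeholders):

     /* output lands on scratch tapes in the same pool; the source tapes empty slowly */
     move nodedata PC0001,PC0002,PC0003 fromstgpool=SD_TAPEPOOL tostgpool=SD_TAPEPOOL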

  18. The results so far
     • Started on Aug 5, results as of Sep 8
     • Volumes
       – 160 to process
       – Average nodes / volume is 188
       – Max is 485
       – 10 are over 400
     • Nodes
       – 1629 nodes
       – Average 22 volumes / node
       – Max is 63
       – 4 are over 50

  19. Summary
     • Match your process to your resources
       – Does your disk write speed match your tape read speed?
     • The more groups you have, the longer a tape to tape move or reclaim will take
     • Do 2 processes?
       – Few collocation groups on a tape: do tape to tape
       – Lots of collocation groups on a tape: do tape to file
