
STORK: Making Data Placement a First Class Citizen in the Grid - PowerPoint PPT Presentation



  1. STORK: Making Data Placement a First Class Citizen in the Grid. Tevfik Kosar, University of Wisconsin-Madison. May 25th, 2004, CERN.

  2. Need to move data around... from terabytes (TB) to petabytes (PB) of data.

  3. While doing this: locate the data; access heterogeneous resources; cope with all kinds of failures; allocate and de-allocate storage; move the data; clean up everything. All of these need to be done reliably and efficiently!

  4. Stork: a scheduler for data placement activities in the Grid. What Condor is for computational jobs, Stork is for data placement. Stork comes with a new concept: "Make data placement a first class citizen in the Grid."

  5. Outline: Introduction; The Concept; Stork Features; Big Picture; Case Studies; Conclusions.

  6. The Concept. Individual jobs: stage-in, execute the job, stage-out.

  7. The Concept. The individual jobs, expanded: allocate space for input & output data; stage-in; execute the job; release input space; stage-out; release output space.

  8. The Concept. The same steps, separated into data placement jobs (allocate space, stage-in, release input space, stage-out, release output space) and computational jobs (execute the job).

  9. The Concept. A Condor DAG specification mixes the two job types: DaP A A.submit; DaP B B.submit; Job C C.submit; ...; Parent A child B; Parent B child C; Parent C child D, E; ... DAGMan dispatches the computational jobs to the Condor job queue and the data placement (DaP) jobs to the Stork queue.
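As a concrete sketch, the mixed specification above written out in the notation the slide uses (the submit-file names and the roles attributed to each node are illustrative):

```text
# DaP nodes are dispatched to the Stork queue,
# Job nodes to the Condor job queue.
DaP  A  A.submit      # e.g. a data placement step such as stage-in
DaP  B  B.submit      # e.g. a second placement step
Job  C  C.submit      # a computational job
...
Parent A child B
Parent B child C
Parent C child D, E
```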

  10. Why Stork? Stork understands the characteristics and semantics of data placement jobs, so it can make smart scheduling decisions for reliable and efficient data placement.

  11. Understanding Job Characteristics & Semantics. Job_type = transfer, reserve, release? Source and destination hosts, files, protocols to use? Knowing these, Stork can: determine the concurrency level; select alternate protocols; select alternate routes; tune network parameters (TCP buffer size, I/O block size, number of parallel streams); ...

  12. Support for Heterogeneity. Protocol translation using the Stork memory buffer.

  13. Support for Heterogeneity. Protocol translation using the Stork disk cache.

  14. Flexible Job Representation and Multilevel Policy Support.

      [
        Type       = "Transfer";
        Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
        Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
        ...
        Max_Retry  = 10;
        Restart_in = "2 hours";
      ]
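The retry semantics implied by Max_Retry can be sketched as follows (a minimal illustration, not Stork's implementation; the names TransferJob and run_with_retries are ours, and a real scheduler would also wait Restart_in between attempts):

```python
# Sketch of re-running a failed data placement job up to Max_Retry times.
from dataclasses import dataclass

@dataclass
class TransferJob:
    src_url: str
    dest_url: str
    max_retry: int = 10

def run_with_retries(job, attempt_transfer):
    """attempt_transfer(job) returns True on success, False on failure."""
    for attempt in range(1, job.max_retry + 1):
        if attempt_transfer(job):
            return attempt              # attempts used, e.g. for logging
    raise RuntimeError(f"job failed after {job.max_retry} attempts")

# A flaky transfer that succeeds on its third try:
calls = {"n": 0}
def flaky(job):
    calls["n"] += 1
    return calls["n"] >= 3

job = TransferJob("srb://host/x.dat", "nest://host/x.dat")
print(run_with_retries(job, flaky))     # prints 3
```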

  15. Failure Recovery and Efficient Resource Utilization. Fault tolerance: just submit a bunch of data placement jobs, and then go away. Control the number of concurrent transfers from/to any storage system: prevents overloading. Space allocations and de-allocations: make sure space is available.
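The per-storage-system concurrency cap mentioned above can be sketched with a semaphore per host (an illustrative sketch, not Stork's code; the class and method names are ours):

```python
# Cap concurrent transfers to any one storage system to avoid overloading it.
import threading

class TransferThrottle:
    def __init__(self, max_concurrent=4):
        self.max_concurrent = max_concurrent
        self._sems = {}                    # one semaphore per storage host
        self._lock = threading.Lock()

    def _sem_for(self, host):
        with self._lock:
            if host not in self._sems:
                self._sems[host] = threading.Semaphore(self.max_concurrent)
            return self._sems[host]

    def transfer(self, host, do_transfer):
        # Blocks if `host` already has max_concurrent transfers in flight.
        with self._sem_for(host):
            return do_transfer()

throttle = TransferThrottle(max_concurrent=4)
print(throttle.transfer("srb.sdsc.edu", lambda: "done"))   # prints done
```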

  16. Run-time Adaptation: dynamic protocol selection.

      [
        dap_type      = "transfer";
        src_url       = "drouter://slic04.sdsc.edu/tmp/test.dat";
        dest_url      = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
        alt_protocols = "nest-nest, gsiftp-gsiftp";
      ]

      [
        dap_type = "transfer";
        src_url  = "any://slic04.sdsc.edu/tmp/test.dat";
        dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
      ]
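The fallback behaviour behind alt_protocols can be sketched as follows (our illustration, not Stork's code): parse the list, then try each protocol pair in order until one transfer succeeds.

```python
# Dynamic protocol selection: fall back through alternate protocol pairs.
def parse_alt_protocols(value):
    """'nest-nest, gsiftp-gsiftp' -> [('nest', 'nest'), ('gsiftp', 'gsiftp')]"""
    return [tuple(p.strip().split("-")) for p in value.split(",")]

def transfer_with_fallback(src, dest, protocol_pairs, attempt):
    """attempt((src_proto, dest_proto), src, dest) -> True on success."""
    for pair in protocol_pairs:
        if attempt(pair, src, dest):
            return pair                  # the protocol pair that worked
    raise RuntimeError("all protocols failed: %s" % (protocol_pairs,))

pairs = parse_alt_protocols("nest-nest, gsiftp-gsiftp")
# Pretend the nest transfer fails and gsiftp succeeds:
winner = transfer_with_fallback("slic04.sdsc.edu/tmp/test.dat",
                                "quest2.ncsa.uiuc.edu/tmp/test.dat",
                                pairs,
                                lambda p, s, d: p == ("gsiftp", "gsiftp"))
print(winner)   # prints ('gsiftp', 'gsiftp')
```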

  17. Run-time Adaptation: run-time protocol auto-tuning.

      [
        link     = "slic04.sdsc.edu - quest2.ncsa.uiuc.edu";
        protocol = "gsiftp";
        bs       = 1024KB;    // block size
        tcp_bs   = 1024KB;    // TCP buffer size
        p        = 4;         // parallelism
      ]
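For comparison, a tuned transfer like the one above maps roughly onto the standard globus-url-copy client (the flag spellings below come from globus-url-copy itself, not from Stork; the URLs reuse the hosts above for illustration):

```shell
# -p: parallel streams, -tcp-bs: TCP buffer size, -bs: block size (bytes)
globus-url-copy -p 4 -tcp-bs 1048576 -bs 1048576 \
    gsiftp://slic04.sdsc.edu/tmp/test.dat \
    gsiftp://quest2.ncsa.uiuc.edu/tmp/test.dat
```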

  18. Outline: Introduction; The Concept; Stork Features; Big Picture; Case Studies; Conclusions.

  19. [Architecture diagram] The user submits job descriptions to a Planner, which produces an abstract DAG.

  20. [Diagram, extended] The Planner consults the RLS to turn the abstract DAG into a concrete DAG, which is handed to the Workflow Manager.

  21. [Diagram, extended] The Workflow Manager dispatches work to a Computation Scheduler driving the compute nodes and a Data Placement Scheduler driving the storage systems.

  22. [Diagram, extended] A Policy Enforcer sits between the Workflow Manager and the two schedulers; computational job and data placement job log files are collected from the compute nodes and storage systems.

  23. [Diagram, extended] A Data Miner analyzes the log files and, together with network monitoring tools, drives a feedback mechanism back into the schedulers.

  24. [Diagram, instantiated] The same picture with concrete components: Pegasus as the planner, DAGMan as the workflow manager, the Matchmaker as the policy enforcer, and Condor/Stork and Condor-G as the schedulers.

  25. Outline: Introduction; The Concept; Stork Features; Big Picture; Case Studies; Conclusions.

  26. Case Study I: SRB-UniTree Data Pipeline. Transfer ~3 TB of DPOSS data from the SRB server @SDSC to the UniTree server @NCSA. A data transfer pipeline created with Stork, driven from a submit site and using caches at SDSC and NCSA.

  27. Failure Recovery. [Timeline plot] Failures survived by the pipeline: UniTree not responding; DiskRouter reconfigured and restarted; SDSC cache reboot and software problem; UW CS network outage.

  28. Case Study II.

  29. Dynamic Protocol Selection.

  30. Runtime Adaptation. Before tuning: parallelism = 1, block_size = 1 MB, tcp_bs = 64 KB. After tuning: parallelism = 4, block_size = 1 MB, tcp_bs = 256 KB.

  31. Case Study III. [Pipeline diagram] The user submits a DAG at the WCER management site @UW. Files are split, moved by DiskRouter/globus-url-copy to a staging site @UW, handed to the Condor pool @UW via the Condor file transfer mechanism, processed, merged, and put into the SRB server @SDSC with SRB put; replicas also flow to other Condor processing pools. (Numbered arrows 1-8 in the original distinguish control flow, input data flow, and output data flow.)

  32. Conclusions. Regard data placement as individual jobs. Treat computational and data placement jobs differently. Introduce a specialized scheduler for data placement. Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, and reliable and efficient transfers.

  33. Future work. Enhanced interaction between Stork and higher-level planners (better coordination of CPU and I/O). Interaction between multiple Stork servers, and job delegation. Enhanced authentication mechanisms. More run-time adaptation.

  34. Related Publications.
      Tevfik Kosar and Miron Livny. "Stork: Making Data Placement a First Class Citizen in the Grid". In Proceedings of the 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.
      George Kola, Tevfik Kosar and Miron Livny. "A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication". To appear in Proceedings of the 14th ACM Int. Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2004), Kinsale, Ireland, June 2004.
      Tevfik Kosar, George Kola and Miron Livny. "A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment". In Proceedings of the 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003.
      George Kola, Tevfik Kosar and Miron Livny. "Run-time Adaptation of Grid Data Placement Jobs". In Proceedings of the Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.
