STORK: Making Data Placement a First Class Citizen in the Grid
Tevfik Kosar, University of Wisconsin-Madison
May 25th, 2004, CERN
Need to move data around...
[Figure labels: TB, TB, PB, PB]
While doing this...
• Locate the data
• Access heterogeneous resources
• Cope with all kinds of failures
• Allocate and de-allocate storage
• Move the data
• Clean up everything
All of these need to be done reliably and efficiently!
Stork
• A scheduler for data placement activities in the Grid
• What Condor is for computational jobs, Stork is for data placement
• Stork comes with a new concept: "Make data placement a first class citizen in the Grid."
Outline
• Introduction
• The Concept
• Stork Features
• Big Picture
• Case Studies
• Conclusions
The Concept
Individual jobs:
• Stage-in
• Execute the job
• Stage-out
The same steps, broken into individual jobs:
• Allocate space for input & output data (data placement job)
• Stage-in (data placement job)
• Execute the job (computational job)
• Release input space (data placement job)
• Stage-out (data placement job)
• Release output space (data placement job)
The Concept
Condor DAG specification:
    DaP A A.submit
    DaP B B.submit
    Job C C.submit
    …..
    Parent A child B
    Parent B child C
    Parent C child D, E
    …..
[Figure: DAGMan reads the DAG (nodes A to F) and feeds two queues, the Condor job queue and the Stork job queue.]
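The split between the two queues can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DAGMan's implementation: the `NODES` and `EDGES` dictionaries and the `dispatch_order` function are hypothetical names, only nodes A, B and C are modelled because the slide elides the definitions of D and E, and a real workflow manager releases a node when its parents finish rather than computing the whole order up front.

```python
from collections import deque

# Hypothetical in-memory form of the DAG on the slide: node -> (kind, submit file).
# "DaP" marks a data placement node, "Job" a computational node.
NODES = {
    "A": ("DaP", "A.submit"),
    "B": ("DaP", "B.submit"),
    "C": ("Job", "C.submit"),
}
# Parent -> children edges, as in "Parent A child B".
EDGES = {"A": ["B"], "B": ["C"]}


def dispatch_order(nodes, edges):
    """Return (node, queue) pairs in an order that respects parent/child edges."""
    indegree = {name: 0 for name in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    # Nodes without parents are ready immediately.
    ready = deque(sorted(name for name, deg in indegree.items() if deg == 0))
    order = []
    while ready:
        name = ready.popleft()
        kind, _submit_file = nodes[name]
        # Data placement nodes go to Stork, everything else to Condor.
        queue = "stork" if kind == "DaP" else "condor"
        order.append((name, queue))
        for child in edges.get(name, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a DAG")
    return order
```

Calling `dispatch_order(NODES, EDGES)` sends A and B to the Stork queue before C reaches the Condor queue, which mirrors the stage-in then execute ordering described earlier.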
Why Stork?
• Stork understands the characteristics and semantics of data placement jobs.
• It can therefore make smart scheduling decisions for reliable and efficient data placement.
Understanding Job Characteristics & Semantics
• Job_type = transfer, reserve, release?
• Source and destination hosts, files, protocols to use?
  → Determine concurrency level
  → Can select alternate protocols
  → Can select alternate routes
  → Can tune network parameters (TCP buffer size, I/O block size, # of parallel streams)
  → …
Support for Heterogeneity
Protocol translation using the Stork memory buffer.
Support for Heterogeneity
Protocol translation using the Stork disk cache.
Flexible Job Representation and Multilevel Policy Support
    [
      Type      = "Transfer";
      Src_Url   = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
      Dest_Url  = "nest://turkey.cs.wisc.edu/kosart/x.dat";
      … … … …
      Max_Retry = 10;
      Restart_in = "2 hours";
    ]
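A retry bound such as Max_Retry can be illustrated with a small Python sketch. It rests on assumptions: the function name `run_with_retries` and the `transfer` callable are hypothetical, Max_Retry is treated as the total number of attempts (the slide does not say whether the first attempt counts), and Restart_in is not modelled because its exact semantics are not given here.

```python
def run_with_retries(transfer, max_retry):
    """Call transfer() until it succeeds or max_retry attempts are used.

    Returns the number of attempts made. Raises RuntimeError, chained to the
    last OSError, once the attempts are exhausted.
    """
    last_error = None
    for attempt in range(1, max_retry + 1):
        try:
            transfer()  # hypothetical callable that performs one transfer attempt
            return attempt
        except OSError as err:
            # Treat I/O errors as transient and try again.
            last_error = err
    raise RuntimeError(f"gave up after {max_retry} attempts") from last_error
```

Keeping the bound in the job description rather than in the scheduler lets each job carry its own policy, which is the point of the per-job attributes above.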
Failure Recovery and Efficient Resource Utilization
• Fault tolerance
  → Just submit a bunch of data placement jobs, and then go away.
• Control the number of concurrent transfers from/to any storage system
  → Prevents overloading
• Space allocation and de-allocation
  → Make sure space is available
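Limiting concurrent transfers per storage system can be sketched as a simple counter keyed by host. This is an illustrative assumption about how such a limit might be tracked, not Stork's code: the `ConcurrencyLimiter` class and its methods are hypothetical, and a single shared limit is applied to every host.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ConcurrencyLimiter:
    """Track active transfers per host and refuse to exceed a fixed limit."""

    def __init__(self, max_per_host):
        self.max_per_host = max_per_host
        self.active = defaultdict(int)  # host -> number of running transfers

    @staticmethod
    def _hosts(src_url, dest_url):
        # A transfer loads both its source and its destination storage system.
        return {urlparse(src_url).hostname, urlparse(dest_url).hostname}

    def try_start(self, src_url, dest_url):
        """Reserve a slot on both endpoints; return False if either is full."""
        hosts = self._hosts(src_url, dest_url)
        if any(self.active[host] >= self.max_per_host for host in hosts):
            return False
        for host in hosts:
            self.active[host] += 1
        return True

    def finish(self, src_url, dest_url):
        """Release the slots taken by a completed or failed transfer."""
        for host in self._hosts(src_url, dest_url):
            self.active[host] -= 1
```

A scheduler would call `try_start` before launching a transfer and leave the job queued when it returns False, so an overloaded server only delays jobs instead of failing them.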
Run-time Adaptation
Dynamic protocol selection
    [
      dap_type = "transfer";
      src_url = "drouter://slic04.sdsc.edu/tmp/test.dat";
      dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
      alt_protocols = "nest-nest, gsiftp-gsiftp";
    ]

    [
      dap_type = "transfer";
      src_url = "any://slic04.sdsc.edu/tmp/test.dat";
      dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
    ]
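The fallback behaviour suggested by alt_protocols can be sketched as follows. This is a minimal illustration, assuming the alternates are tried in the listed order after the protocol named in the URLs fails; the function names and the `try_transfer` callback are hypothetical, and the "any://" form, where the scheduler picks the protocol itself, is not modelled.

```python
def candidate_protocols(src_url, dest_url, alt_protocols=""):
    """List (source protocol, destination protocol) pairs, primary pair first."""
    primary = (src_url.split("://", 1)[0], dest_url.split("://", 1)[0])
    pairs = [primary]
    # alt_protocols looks like "nest-nest, gsiftp-gsiftp".
    for item in alt_protocols.split(","):
        item = item.strip()
        if item:
            src_protocol, dest_protocol = item.split("-", 1)
            pairs.append((src_protocol, dest_protocol))
    return pairs


def transfer_with_fallback(src_url, dest_url, alt_protocols, try_transfer):
    """Try each protocol pair in turn; return the pair that worked, or None.

    try_transfer(src_protocol, dest_protocol) is a hypothetical callback that
    returns True when the transfer succeeds with that pair.
    """
    for src_protocol, dest_protocol in candidate_protocols(src_url, dest_url, alt_protocols):
        if try_transfer(src_protocol, dest_protocol):
            return (src_protocol, dest_protocol)
    return None
```

With the first job description above, the candidates would be drouter-drouter, then nest-nest, then gsiftp-gsiftp.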
Run-time Adaptation
Run-time protocol auto-tuning
    [
      link = "slic04.sdsc.edu – quest2.ncsa.uiuc.edu";
      protocol = "gsiftp";
      bs = 1024KB;      // block size
      tcp_bs = 1024KB;  // TCP buffer size
      p = 4;
    ]
[Figure: big-picture architecture, built up over several slides. Components shown in the final build:]
• USER, JOB DESCRIPTIONS
• PLANNER, RLS
• Abstract DAG, Concrete DAG
• WORKFLOW MANAGER
• POLICY ENFORCER
• DATA PLACEMENT SCHEDULER, COMPUTATION SCHEDULER
• C. JOB LOG FILES, D. JOB LOG FILES
• STORAGE SYSTEMS, COMPUTE NODES
• DATA MINER, NETWORK MONITORING TOOLS, FEEDBACK MECHANISM
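[Figure: the same architecture with concrete tools in place of the generic components:]
• USER, JOB DESCRIPTIONS
• PEGASUS (planner), RLS
• Abstract DAG, Concrete DAG
• DAGMAN (workflow manager)
• MATCHMAKER (policy enforcer)
• STORK (data placement scheduler), CONDOR/CONDOR-G (computation scheduler)
• C. JOB LOG FILES, D. JOB LOG FILES
• STORAGE SYSTEMS, COMPUTE NODES
• DATA MINER, NETWORK MONITORING TOOLS, FEEDBACK MECHANISM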
Case Study I: SRB-UniTree Data Pipeline
• Transfer ~3 TB of DPOSS data from SRB @SDSC to UniTree @NCSA
• A data transfer pipeline created with Stork
[Figure labels: Submit Site, SRB Server, SDSC Cache, NCSA Cache, UniTree Server]
Failure Recovery
[Figure annotations: UniTree not responding; DiskRouter reconfigured and restarted; SDSC cache reboot & UW CS network outage; software problem]
Case Study II
Dynamic Protocol Selection
Runtime Adaptation
Before tuning:
• parallelism = 1
• block_size = 1 MB
• tcp_bs = 64 KB
After tuning:
• parallelism = 4
• block_size = 1 MB
• tcp_bs = 256 KB
Case Study III
[Figure: numbered steps 1 to 8 across the following sites and mechanisms. User submits a DAG at the management site; WCER; Management Site @UW; Staging Site @UW; Condor Pool @UW; SRB Server @SDSC; Other Replicas; Other Condor Pools. Operations: Split files, Processing, Merge files, SRB put. Transfer mechanisms: Condor File Transfer Mechanism, DiskRouter/Globus-url-copy. Legend: Control flow, Input Data flow, Output Data flow.]
Conclusions
• Regard data placement as individual jobs.
• Treat computational and data placement jobs differently.
• Introduce a specialized scheduler for data placement.
• Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, reliable and efficient transfers.
Future work
• Enhanced interaction between Stork and higher level planners
  → better coordination of CPU and I/O
• Interaction between multiple Stork servers and job delegation
• Enhanced authentication mechanisms
• More run-time adaptation
Related Publications
• Tevfik Kosar and Miron Livny. "Stork: Making Data Placement a First Class Citizen in the Grid". In Proceedings of 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.
• George Kola, Tevfik Kosar and Miron Livny. "A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication". To appear in Proceedings of 14th ACM Int. Workshop on Network and Operating Systems Support for Digital Audio and Video (Nossdav 2004), Kinsale, Ireland, June 2004.
• Tevfik Kosar, George Kola and Miron Livny. "A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment". In Proceedings of 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003.
• George Kola, Tevfik Kosar and Miron Livny. "Run-time Adaptation of Grid Data Placement Jobs". In Proceedings of Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.