
Wings for Pegasus: A Semantic Approach for Creating Very Large Scientific Workflows



  1. Wings for Pegasus: A Semantic Approach for Creating Very Large Scientific Workflows
     Yolanda Gil, Jihie Kim, Varun Ratnakar, Ewa Deelman
     USC Information Sciences Institute
     www.isi.edu/ikcap/wings, pegasus.isi.edu
     Presentation at “OWL: Experiences and Directions”, Athens, GA, November 10-11, 2006
     USC INFORMATION SCIENCES INSTITUTE, Yolanda Gil, November 11, 2006

  2. Computing and the Future of Science

  3. Sharing Data Collection Instruments: LIGO (ligo.caltech.edu)

  4. Sharing Computing Resources [Slide from C. Catlett of UC and TeraGrid]

  5. Integrating Diverse Models of Complex Phenomena [Slide from T. Jordan of SCEC]
     [Figure: a Seismic Hazard Model integrating seismicity, paleoseismology, geologic structure, local site effects, faults, stress transfer, rupture dynamics, crustal motion, crustal deformation, and seismic velocity structure]

  6. Computational Workflows
     Interdependent sets of computations
     - Dependencies are data flow: output of C1 is input for C2
     - Computations can be submitted for execution in various remote resources
     - Input data may be obtained from remote data repositories
     - New data products may be stored in remote data repositories
     [Figure: example workflow whose task result is a hazard curve (SA vs. prob. exc.). Recoverable component labels: UTM Converter (get-Lat-Long-given-UTM), PEER-Fault Ruptures, CVM-get-Velocity-at-point, Basin-Depth Calculator, Field (2000) IMR, Hazard Curve Calculator]
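
The data-flow dependencies on this slide can be illustrated with a small sketch. It is not taken from the presentation: the computation names C1 to C4 and the `execution_order` helper are assumptions chosen for illustration, and the ordering uses Python's standard `graphlib` module rather than anything from Wings or Pegasus.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each computation maps to the computations
# whose outputs it consumes (its data-flow predecessors).
dependencies = {
    "C1": set(),
    "C2": {"C1"},          # output of C1 is input for C2
    "C3": {"C1"},
    "C4": {"C2", "C3"},
}


def execution_order(deps):
    """Return one order in which every computation runs after its inputs exist."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

C2 and C3 depend only on C1, so they could be submitted to different remote resources at the same time; the sketch only fixes a valid sequential order.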

  7. Pegasus: Planning for Execution in Grids [Deelman et al SPJ ’05; Deelman et al JGC ’05; Deelman et al JGC ’03]
     - Maps a workflow instance to an executable workflow
     - Automatically locates physical locations for both workflow components and data
     - Finds appropriate resources to execute the components
     - Reuses existing data products where applicable
     - Publishes newly derived data products
       - Adds data management nodes to the workflow
       - Supports automated provenance information capture
     - Restructures workflows to improve performance
     - Provides reliability via re-tries and re-mapping in case of failures
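
The idea of reusing existing data products can be sketched as a backward walk from the desired results that stops wherever a file is already available. This is only an illustration under stated assumptions, not Pegasus's actual algorithm or API: the `reduce_workflow` function, the job names, and the file names are all hypothetical.

```python
def reduce_workflow(jobs, desired, available):
    """Return the names of the jobs that still have to run.

    jobs:      dict mapping job name -> (input files, output files)
    desired:   files the user wants
    available: files that already exist (for example, from earlier runs)
    """
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    needed = set()
    pending = [f for f in desired if f not in available]
    while pending:
        wanted = pending.pop()
        job = producer.get(wanted)
        if job is None:
            raise ValueError(f"no job produces {wanted!r} and it is not available")
        if job in needed:
            continue
        needed.add(job)
        inputs, _ = jobs[job]
        # Only chase inputs that do not already exist.
        pending.extend(i for i in inputs if i not in available)
    return needed


# Hypothetical three-step chain: a -> b -> c
jobs = {
    "a": (["f0"], ["f1"]),
    "b": (["f1"], ["f2"]),
    "c": (["f2"], ["f3"]),
}
print(reduce_workflow(jobs, desired=["f3"], available={"f0", "f2"}))  # {'c'}
```

Because `f2` already exists, jobs `a` and `b` are unnecessary and only `c` remains.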

  8. Mapping Workflow Instances to Grid Resources in Pegasus
     [Figure: a workflow of tasks (nodes a to i) mapped to the final workflow that produces the desired results. Key: original node, input transfer node, registration node, output transfer node, unnecessary nodes]

  9. Pegasus Application Domains
     - Southern California Earthquake Center
       - _ million jobs, ~10TB data per workflow
     - Pulsar search for gravitational-wave physics (LIGO)
       - Largest ever NSF project
       - ~100,000 tasks per workflow
     - Galaxy morphology for NVO and NASA in Montage
       - ~50,000 tasks per workflow
     - Tomography for neural structure reconstruction
     - High-energy physics: Compact Muon Solenoid
     - Gene alignment

  10. Creating Large Scientific Workflows
      - Current approaches: scripts to create thousands of jobs and the dataflow among them
        - Scripts are workflow-specific and costly to create and debug

  11. Our Approach to Creating Large-Scale Scientific Workflows
      1. Capture the underlying structure of workflows as generic workflow “templates”
      2. Automatic creation of “workflow instances” for given data inputs
      [Figure: (1) workflow template, (2) workflow instance]

  12. Wings/Pegasus Framework: Creation of Large-Scale Grid Workflows
      1. Workflow Template (generic known-to-work recipes)
         - Specifies application components and dataflow among them
         - No data specified, just their type
      2. Workflow Instance (data-specific)
         - Specifies data files for a given template
         - Expands parallel data processing steps
         - Logical file names, not physical file replicas
      3. Executable Workflow (actual run)
         - Specifies physical locations of data files (may be in data repositories)
         - Assigns hosts/pools for execution of each component
         - Expands workflow to include data movements among execution sites
         - Reduces workflow by reusing previously executed computations
         - Restructures workflow by grouping related executions for efficiency
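
The step from template to instance can be sketched as follows. This is a simplified illustration and not the Wings implementation: the `Step` class, the `instantiate` function, and the component and file names are hypothetical. The sketch assumes every step is applied once per file in the current collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    component: str    # application component to run
    output_type: str  # type of data the component produces


def instantiate(template, input_files):
    """Bind a template to input files, creating one job per file per step.

    Only logical file names are produced; physical locations are left
    to the later mapping stage.
    """
    jobs = []
    current = list(input_files)
    for step in template:
        produced = []
        for index, logical_name in enumerate(current):
            output = f"{step.output_type}_{index}"
            jobs.append((step.component, logical_name, output))
            produced.append(output)
        current = produced
    return jobs


# Hypothetical two-step template bound to three input files.
template = [Step("Extract", "Features"), Step("Summarize", "Summary")]
instance = instantiate(template, ["in_0.dat", "in_1.dat", "in_2.dat"])
print(len(instance))  # 6
```

The template never changes with the data; only the instance grows with the number of input files, which is what makes it feasible to generate very large workflows from a compact description.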

  13. Wings: Workflow Instance Generation and Selection
      [Figure: Wings architecture diagram. Recoverable labels follow.]
      - Users: seasoned NL researcher, student, algorithm developer
      - Example requests: “Validate this workflow based on the component specs”; “Show me workflows that prune MT rules”; “Run this workflow with the WSJ-04 data set”; “Here is a new Rule pruning code, takes in a set of MT rules, is compiled for MPI”
      - Wings functions: Workflow Creation, Workflow Selection, Data Selection, Component Specification
      - Workflow Libraries: workflow templates specify complex analyses sequences; workflow instances specify data
      - Ontologies (OWL): domain terms, component types, workflow products
      - Application Components: specify data requirements and execution requirements
      - Data Repositories: preexisting data collections, workflow execution results
      - Pipeline: Workflow Template, Workflow Instance, Pegasus, Executable Workflow, DAGMan/Grid

  14. Example: A Workflow for Pruning Rules in a Machine Translation System

  15. Workflows for Brain Imaging Analysis (full ontologies and data available at http://vtcpc.isi.edu/provenance)
      [Figure: workflow template and workflow instance]

  16. Workflows for Brain Imaging Analysis (full ontologies and data available at http://vtcpc.isi.edu/provenance)
      [Figure: workflow template and workflow instance, annotated with Template Metadata Propagation Axioms]

  17. Workflows for Brain Imaging Analysis (full ontologies and data available at http://vtcpc.isi.edu/provenance)
      [Figure: workflow template and workflow instance, annotated with Metadata of Actual Input Data and Template Metadata Propagation Axioms]

  18. Workflows for Brain Imaging Analysis (full ontologies and data available at http://vtcpc.isi.edu/provenance)
      [Figure: workflow template and workflow instance, annotated with Metadata of Actual Input Data, Template Metadata Propagation Axioms, and Metadata Attributes Automatically Generated for New Data Products of the Workflow]

  19. Editing and Creating Workflows with Repetitive Structure
      [Figure: Wings Editor showing a workflow template and a workflow instance]

  20. A Wings Workflow Template for Seismic Hazard Analysis
      [Figure legend: Single File, File Collection, Nested File Collection, Application Component, Component Collection]

  21. Constraints on Workflow Templates
      - Constraints on files/collections of different workflow components
      - Constraints on number of elements in different collections
      [Figure: constraint graph for CybershakeTemplate. Recoverable node labels: InputLink_SiteNameFile_to_BoxNameCheck, SiteNameFile, SiteName, SGTsSiteName, InputLink_SGTCollforRup_to_SeismogramGen, CC-SGTs, C-SGT-forRups, F-SGT, N_Rups, InputLink_RuptureVars_to_SeisgmogramGen, CC-RuptureVariations, C-RuptVars, F-RV. Recoverable property labels: hasLink, hasFile, hasSiteName, isSameAs, hasN_Items]
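
The two kinds of constraints named on this slide can be sketched as simple checks over the data bound to a template. The sketch rests on one reading of the diagram, which is an assumption: that `isSameAs` requires the two site names to match and that both collections share the same `hasN_Items` value. The `check_template_constraints` function and the sample values are hypothetical, and the presentation itself expresses these constraints in OWL rather than Python.

```python
def check_template_constraints(site_name, sgts_site_name, sgt_for_rups, rupture_vars):
    """Return a list of violated constraints (empty if the bindings are consistent)."""
    violations = []
    # Constraint across components: both links must refer to the same site.
    if site_name != sgts_site_name:
        violations.append("site names differ")
    # Constraint on collection sizes: both collections need the same item count.
    if len(sgt_for_rups) != len(rupture_vars):
        violations.append("collection sizes differ")
    return violations


# Hypothetical bindings: matching site, two items in each collection.
print(check_template_constraints("SiteA", "SiteA", ["sgt_0", "sgt_1"], ["rv_0", "rv_1"]))  # []
```

Checking such constraints before instantiation catches inconsistent bindings before any jobs are generated.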
