
bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data



  1. bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data
     Project Website: http://www.biokepler.org
     Ilkay Altintas¹, Daniel Crawl¹, Weizhong Li², Shulei Sun², Jianwu Wang¹, Sitao Wu²
     ¹ San Diego Supercomputer Center, UCSD
     ² Center for Research in Biological Systems, UCSD

  2. Kepler: a Scientific Workflow System
     • A cross-project collaboration initiated August 2003
     • Download times > 40,000
     • 2.3 released on 20 Jan 2012
     • Builds upon the open-source Ptolemy II framework
     Ptolemy II: A laboratory for investigating design
     KEPLER: A problem-solving environment for Scientific Workflow
     KEPLER = “Ptolemy II + X” for Scientific Workflows
     (Slide footer: 07/14/12, http://www.biokepler.org/)

  3. bioKepler: a Module Being Built in Kepler
     • Use Distributed Data-Parallel (DDP) frameworks, e.g., MapReduce, to accelerate bioinformatics tool execution
     • Create configurable, reusable, and executable DDP components in the scientific workflow system
     • Support different execution engines and computational environments, and optimize workflow execution
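
The data-parallel pattern named on this slide can be illustrated with a small local sketch. It is not bioKepler or Hadoop code: `run_data_parallel`, `count_bases`, and `merge_counts` are hypothetical names, a thread pool stands in for a DDP engine, and base counting stands in for a real bioinformatics tool.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Dict, List, Sequence


def run_data_parallel(
    partitions: Sequence[List[str]],
    map_fn: Callable[[List[str]], Dict[str, int]],
    reduce_fn: Callable[[Dict[str, int], Dict[str, int]], Dict[str, int]],
    max_workers: int = 4,
) -> Dict[str, int]:
    """Apply map_fn to every partition concurrently, then fold the results."""
    if not partitions:
        raise ValueError("at least one partition is required")
    # Map phase: each partition is processed independently of the others.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(map_fn, partitions))
    # Reduce phase: merge the per-partition results into one answer.
    return reduce(reduce_fn, mapped)


def count_bases(partition: List[str]) -> Dict[str, int]:
    """Stand-in for a tool run: count each base in one partition."""
    counts: Dict[str, int] = {}
    for sequence in partition:
        for base in sequence:
            counts[base] = counts.get(base, 0) + 1
    return counts


def merge_counts(left: Dict[str, int], right: Dict[str, int]) -> Dict[str, int]:
    """Combine two partial count tables without mutating either."""
    merged = dict(left)
    for base, n in right.items():
        merged[base] = merged.get(base, 0) + n
    return merged


if __name__ == "__main__":
    parts = [["ACGT", "AA"], ["GG"], ["T"]]
    print(run_data_parallel(parts, count_bases, merge_counts))
```

Because the map step never looks across partitions, the same two functions could in principle be handed to a different execution engine, which is the portability the slide describes.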

  4. Conceptual Framework

  5. Software Architecture

  6. Sample bioActors
     • Alignment: BLAST, BLAT
     • Profile-Sequence Alignment: PSI-BLAST
     • Hidden Markov Model: HMMER
     • Mapping: Bowtie, BWA, SAMtools
     • Multiple Alignment: ClustalW, MUSCLE
     • Clustering: CD-HIT, Blastclust
     • Gene Prediction: Glimmer, GENSCAN, FragGeneScan
     • tRNA Prediction: tRNAscan-SE, Meta-RNA
     • Phylogeny: FastTree, RAxML

  7. DDP BLAST Workflow via Splitting Query Sequences
     • Switch the director to work with other DDP engines, such as Hadoop
     • Execute with data partition
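
Splitting query sequences, as this slide's title describes, can be sketched in a few lines. The sketch assumes the queries arrive as FASTA text and uses round-robin assignment; the function names (`parse_fasta`, `split_queries`, `to_fasta`) are hypothetical, and the actual bioKepler partitioning strategy is not shown on the slide.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_queries(text: str, num_partitions: int) -> List[List[Record]]:
    """Assign whole records round-robin so no sequence is cut in half."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for index, record in enumerate(parse_fasta(text)):
        partitions[index % num_partitions].append(record)
    # Drop empty partitions when there are fewer records than partitions.
    return [p for p in partitions if p]


def to_fasta(records: List[Record]) -> str:
    """Serialize one partition back to FASTA text for a downstream tool."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)


if __name__ == "__main__":
    queries = ">q1\nACGT\nAC\n>q2\nGG\n>q3\nTTT\n"
    for part in split_queries(queries, 2):
        print(to_fasta(part), end="")
```

Each partition is a valid FASTA file in its own right, so every worker can run the alignment tool against the full reference database and the per-partition outputs can be merged afterwards.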

  8. DDP BLAST Workflow Experiments
     [Plot: total execution time (hours; axis from 1.0 to 4.0) versus number of slave CPU cores (16, 32, 48, 64)]

  9. Questions?
     • More Information
       jianwu@sdsc.edu
       http://www.biokepler.org
       http://www.kepler-project.org
     • Acknowledgements
