BioinfoGRID Project: Bioinformatics Grid Application for life science
LEGRÉ Yannick (legre@clermont.in2p3.fr), CNRS/IN2P3, LPC Clermont-Ferrand, on behalf of the BioInfoGrid consortium
Credit slides: Luciano MILANESI, ITB CNR
http://www.itb.cnr.it/bioinfogrid
ISGC 2006 – Taipei – May 1st – 4th, 2006
BioinfoGRID Project
• The BIOINFOGRID project proposes to combine the bioinformatics services and applications for molecular biology users with the Grid infrastructure operated by the EGEE and EGEE-II projects.
• In the BIOINFOGRID initiative we plan to evaluate genomics, transcriptomics, proteomics and molecular dynamics application studies based on GRID technology.
• Start date: 1st January 2006
• End date: 31st December 2007
BioinfoGRID http://www.itb.cnr.it/bioinfogrid
Introduction
• A typical gene lab can produce 100 terabytes of information a year, the equivalent of 1 million encyclopedias.
• Few biologists have the computational skills needed to fully explore such an astonishing amount of data; nor do they have the skills to explore the exploding amount of data being generated from clinical trials.
• The amount of data already available is immense, and current knowledge is only the tip of the data iceberg.
Source: "Bioinformatics: Emerging Opportunities and Emerging Gaps", Paula E. Stephan and Grant Black
Bioinformatics applications in GRID
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
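The entry shown on this slide follows the Swiss-Prot flat-file layout, in which each line starts with a two-letter code (ID for the identifier line, DE for description lines). As a minimal sketch of how a grid job might read such a record, the Python below extracts the entry name, sequence length and description. The function name `parse_entry` and the dictionary keys are illustrative assumptions, not part of any BioinfoGRID tool, and only the ID and DE line types from the slide are handled.

```python
# Minimal sketch: parse the ID and DE lines of a Swiss-Prot-style flat-file entry.
# parse_entry and its output keys are hypothetical names chosen for illustration.

SAMPLE_ENTRY = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
"""


def parse_entry(text):
    """Return the entry name, sequence length and joined description."""
    entry = {"id": None, "length": None, "description": ""}
    description_parts = []
    for line in text.splitlines():
        # The line code is everything before the first space.
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code == "ID":
            fields = rest.split()
            entry["id"] = fields[0]
            # The length is the number just before the trailing "AA." token.
            entry["length"] = int(fields[-2])
        elif code == "DE":
            description_parts.append(rest)
    # DE lines are continuation lines, so join them with single spaces.
    entry["description"] = " ".join(description_parts)
    return entry


entry = parse_entry(SAMPLE_ENTRY)
print(entry["id"], entry["length"])
```

Other line types of the real format (accession numbers, references, sequence data) are deliberately ignored here.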
BIOINFOGRID: Workpackages
WP   Description
WP1  Genomics Applications in GRID
WP2  Proteomics Applications in GRID
WP3  Transcriptomics Applications in GRID
WP4  Database and Functional Genomics Applications
WP5  Molecular Dynamics Applications
WP6  Coordination of technical aspects and relation with RI Projects, user training, application support and resources integration
WP7  Dissemination and Outreach
WP8  Project Management Office
Genomics applications in GRID (WP1)
Aim of WP1: use of computational GRID to analyse molecular biological data at the genomic scale
Projects
• The GRID version of the Portal system: unification of larger groups of bioinformatics tools into single analytical steps and their optimization for GRID
• GRID analysis of cDNA data: computer-aided functional annotation of cDNAs in order to optimize sensitivity and specificity
Genomics applications in GRID (WP1)
• GRID analysis of genomic databases: integration of precomputed data, gene identification, differentiation of pseudogenes, comparative genome analysis, etc.
• Multiple alignments: testing of new algorithms for computationally very demanding alignment procedures, optimization for GRID
Proteomics Applications in GRID (WP2)
Aim of WP2: use of computational GRIDs to analyse molecular biological data in proteomics
Projects
• Perform functional protein analysis in GRID: testing the functional protein domain annotations of large protein families using GRID and related databases
Proteomics Applications in GRID (WP2)
• Protein surface calculation in GRID: the grid will be used to compute a volumetric description of the protein, from which a precise representation of the corresponding surface is obtained
Transcriptomics applications in GRID (WP3)
Aim of WP3: use of computational GRIDs to analyse transcriptomics data and to apply phylogenetic methods based on estimated trees
Projects
• Algorithmic tools for gene expression data analysis in GRID: evaluate the computational tools for extracting biologically significant information from gene expression data.
• Algorithms will focus on clustering steady-state and time-series gene expression data, multiple testing and meta-analysis of microarray experiments from different groups, and identification of transcription sites.
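Multiple testing is listed above as one focus of the WP3 algorithms, but the slide does not name a specific correction. As an illustration only, the sketch below uses the Benjamini-Hochberg procedure, a common choice for controlling the false discovery rate when thousands of genes are tested at once. The function name, the `alpha` default and the p-values are assumptions made up for this example and are not project data.

```python
# Minimal sketch of the Benjamini-Hochberg procedure (an assumed example;
# the slide does not say which multiple-testing correction is used).

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Illustrative p-values, one per gene (made-up numbers).
p_values = [0.01, 0.045, 0.02, 0.005, 0.5]
rejected = benjamini_hochberg(p_values)
print(rejected)
```

Each gene's test is independent of the others, so the p-values could be computed in separate grid jobs and corrected together in one final step.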
Transcriptomics applications in GRID (WP3)
• Microarray-specific data analysis that lets the GRID user store and search microarray data, with direct access to the data files held on Storage Elements on GRID servers.
• Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data.
• Scientific instruments and experiments provide huge amounts of data.
(Figure labels: EGEE-II, microarray)
Phylogenetic application in GRID (WP3)
• Phylogenetic methods: reconstructing the evolutionary history of a group of taxa is a major research thrust in computational biology and a standard part of exploratory sequence analysis. An evolutionary history not only gives relationships among taxa, but is also an important tool for inferring the universal tree of life, for inferring structural, physiological and biochemical properties of sequences from other similar sequences, and for reconstructing tissue evolution.
Database Applications in GRID (WP4)
Aim of WP4: this WP will make it possible to manage biological databases using the GRID EGEE-II infrastructure.
Projects
• Biological databases on GRID: these databases will be complemented by other databases publicly available on the Internet, using GRID and web services where appropriate.
Functional Analogous Finder (WP4)
• Functional Analogous Finder: using the GO terms, their associations to gene products and a simple chi-square approach, we plan to compare all associated GO terms and their ascending parents to validate the functional analogy between two gene products.
• In addition, we weight each GO term according to how often it is used: the more gene products a term describes, the less specific it is, and the lower its weight and impact on the statistic.
• A search within the UniProt products for a functional analogue therefore involves comparing the GO terms of the gene product of interest with the GO terms of the other gene products.
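The comparison described above can be sketched in a few lines of Python. The slide does not give the exact chi-square statistic, so this example substitutes a simpler weighted overlap score that keeps the two stated ideas: GO terms are expanded with their ascending parents, and frequently used terms get a lower weight. The toy ontology, the usage counts, the negative-log weighting and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "GO:C": ["GO:B"],
    "GO:D": ["GO:B"],
    "GO:B": ["GO:A"],
    "GO:A": [],
}
# Hypothetical usage counts: how many gene products each term describes.
USAGE = {"GO:A": 100, "GO:B": 20, "GO:C": 5, "GO:D": 5}
TOTAL_PRODUCTS = 100


def with_ancestors(terms):
    """Expand a set of GO terms with all of their ascending parents."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, []))
    return seen


def weight(term):
    """Assumed weighting: the more often a term is used, the lower its weight."""
    return -math.log(USAGE[term] / TOTAL_PRODUCTS)


def similarity(terms_a, terms_b):
    """Weighted overlap of two expanded term sets (a stand-in for the chi-square test)."""
    a = with_ancestors(terms_a)
    b = with_ancestors(terms_b)
    shared = sum(weight(t) for t in sorted(a & b))
    union = sum(weight(t) for t in sorted(a | b))
    return shared / union if union > 0 else 0.0


same = similarity({"GO:C"}, {"GO:C"})
siblings = similarity({"GO:C"}, {"GO:D"})
print(same, siblings)
```

Each pairwise comparison is independent, so a search of one gene product against many others could be split into batches and run as separate grid jobs.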
Molecular applications in GRID (WP5)
Aim of WP5: the objective is to test the scalability of Molecular Dynamics simulations, which usually take a very long time to complete a relevant analysis. The analysis will notably use, as a starting point, the data generated by the WISDOM application on the EGEE infrastructure.
Projects
• Wide In Silico Docking On Malaria initiative (WISDOM): this protocol has to coordinate the different analysis steps in order to complete the simulation on the GRID platform
WISDOM Data challenge
• Mandate: deploy a Molecular Dynamics application in a grid environment
– Evaluation of performance
• Goal: contribute to the WISDOM initiative dedicated to in silico drug discovery
– Start from the results of the WISDOM docking data challenge in 2005
– Rerank the best hits using Molecular Dynamics
• Strategy: deployment of MD software on different grid infrastructures
– Grid of PCs: EGEE-II
– Grid of supercomputers: DEISA
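The reranking goal above can be pictured as a two-stage filter: keep only the best docking hits, then reorder them by a score obtained from Molecular Dynamics. The sketch below is a simplified illustration under stated assumptions. The function `rerank_top_hits`, the ligand names and every score are invented, and "lower score is better" is assumed for both stages. It does not reproduce the actual WISDOM workflow.

```python
# Minimal sketch of reranking docking hits with a second, MD-based score.
# All names and numbers here are hypothetical illustrations.

def rerank_top_hits(docking_scores, md_score, top_n):
    """Keep the top_n docking hits, then sort them by their MD score.

    docking_scores: dict mapping ligand name -> docking score (lower is better)
    md_score: callable returning an MD-derived score for a ligand (lower is better)
    """
    # Stage 1: select the best hits from the docking run.
    best = sorted(docking_scores, key=docking_scores.get)[:top_n]
    # Stage 2: rescore only those hits; in practice this is the expensive MD step.
    rescored = [(name, md_score(name)) for name in best]
    return sorted(rescored, key=lambda pair: pair[1])


# Made-up docking scores for four ligands.
docking = {"ligand_a": -9.1, "ligand_b": -8.7, "ligand_c": -7.9, "ligand_d": -5.2}
# Made-up MD-derived scores, standing in for real simulation output.
md_results = {"ligand_a": -30.0, "ligand_b": -42.5, "ligand_c": -35.0}

ranking = rerank_top_hits(docking, md_results.__getitem__, top_n=3)
print(ranking)
```

Because each ligand is rescored independently, stage 2 maps naturally onto one grid job per ligand.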
Dissemination and Outreach (WP6, WP7)
• The BIOINFOGRID initial training course: http://www.itb.cnr.it/bioinfogrid/project-events/initial-training-course
• Course objectives
– give bioinformatics users a general overview of the state of the art in the development of Grid middleware and infrastructures, in particular the state of the LCG and gLite middleware and of the EGEE infrastructure;
– provide detailed technical information and precise instructions on how to use the GRID, so that new users can start using the Grid in the best possible way.
• A BioinfoGRID International Conference will be organized towards the end of the project in 2007.
CREDITS
• Suhai Sándor (DKFZ), Germany
• Mazzucato Mirco (INFN), Italy
• Breton Vincent (CNRS/IN2P3), France
• Giorgio Maggi (INFN), Italy
• Legre Yannick (CNRS/IN2P3), France
• Francesco Beltrame (DIST), Italy
• Lio’ Pietro (University of Cambridge), UK
• Meloni Giovanni (CILEA), Italy
• Giselle Andreas (CNR-ITB), Italy
• Ivan Merelli (CNR-ITB), Italy
Thank you for your attention!
4th HealthGrid conference, 6th – 9th June, Valencia (Spain)
http://valencia2006.healthgrid.org