Integrating multi-omics Luciano Milanesi Outline Introduction - PowerPoint PPT Presentation

Integrating multi-omics Luciano Milanesi

Outline • Introduction • Omics challenges • Data Integration • Big Data • Personalized system medicine • International Initiatives • Conclusions

Big Data in “Omics Sciences” The "Omics Sciences" consist of several areas of investigation: • Genomics • Proteomics • Interactomics • Bioinformatics • Neuroinformatics • Systems Biology • Metabolomics • etc. These and the related disciplines constitute the paradigm around which all research in biomedicine, biotechnology and ICT applicable to the biomedical sciences revolves.

Omics Applications Sequencing Genomes: From Individual to Populations (figure: disease-resistant population vs. disease-susceptible population; gene X sequences ATG TTATAG and ATGTTTATAG)

SNP and Biomarkers Analysis (figure: source databases PDB, EnsEMBL, GO, KEGG, CNV, RefGENE, dbSNP, HapMap, BioGRID, Reactome; Integrated Knowledge Database with components labeled Integrated SNPs DB, Ontological Annotations and Biological Features/Entities DB; outputs: list of SNPs, list of ranked genes)

Omics Technology

Omics Data Explosion

Rate of sequence data generation (figure: number of bases on a logarithmic axis from 1E+05 to 1E+14, plotted by date from 1980 to 2015; series: capillary reads, assembled sequences, next-generation reads)

Cost of sequence data generation

Omics Complexity Explosion Interactomics and Pathways Discovery

Omics Applications • Virology • Clinical Medicine & Oncology • Bacterial, fungal and protozoal • Bioinformatics • Systems Biology

Biomedical Complex System

Omics Data Integration System Medicine (figure labels: Bioinformatics, Systems Biology, Biotechnology, ICT, System Medicine)

What is Big Data? • Definition • Big Data refers to a collection of data sets so large and complex that they cannot be processed with conventional databases and tools. • Because of its size and the numbers involved, Big Data is hard to capture, store, search, share, analyze and visualize. • The three V's: Volume, Velocity, Variety • High Volume: the amount of data • High Velocity: the rate at which data are collected, acquired, generated or processed • High Variety: different data types, such as audio, video, image data and sequence data • Processing • Parallel processing (e.g. Hadoop) • Processing of data sets too large for transactional databases • Analyzing interactions rather than transactions
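
The parallel processing model behind tools such as Hadoop can be illustrated with a minimal map/reduce sketch. It is only an illustration, not Hadoop code: the function names (`map_kmers`, `reduce_counts`, `count_kmers`) and the toy reads are assumptions made for this example, and the map calls run sequentially here, whereas a cluster would distribute them across nodes.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k=3):
    """Map step: count the k-mers found in a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial k-mer counts."""
    return left + right


def count_kmers(reads, k=3):
    # Each map call is independent of the others, which is what lets a
    # framework such as Hadoop run them in parallel on different nodes.
    # Here they run one after another to keep the sketch self-contained.
    partial_counts = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partial_counts, Counter())


# Toy reads chosen for illustration only.
reads = ["ATGTTATAG", "ATGTTTATAG"]
kmer_counts = count_kmers(reads)
```

Because the reduce step is associative, partial counts can be merged in any order, which is the property that makes this kind of workload easy to scale out.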

Big Data • Collection – get the data • Storage – keep the data • Querying – make sense of the data • Visualization – see the scientific value

Who is collecting all of this data? Big Pharmaceutical Companies Medical Science • Databases from: • e-Health • Patient Records • Medical Imaging (MRI & CT scans, …) • Telemedicine • Genomics • Environmental data • Food science • Biosensors

Data integration • Cloud computing, in combination with Big Data tools, can provide the computing power and scale needed for large-scale data integration in translational medicine, and allows the analysis to be performed in a more efficient and economical way.

CNR-ITB Data Center • Resources: • HPC (High Performance Computing) Cluster • HPSI (High Performance Storage Infrastructure) DDN • WRVM (Web Remote Virtual Machine) • Databases: MySQL, ORACLE, SQL Server • Cluster Intel Servers: 44 • Total RAM: 2,080 GB • Total Disk space: 1,164 TB • 192 CPUs and 1,216 cores • GPU Server: 16 GPUs, 16 CPUs and 96 cores • Operating systems: Ubuntu 13.04, CentOS 6.5, Windows Server, Mac OS • Portal technology: Java portal (LIFERAY) • GRID Node • Virtual Node • Cloud Computing • Hadoop

European Grid and Cloud Infrastructure • Distributed, federated storage and compute facilities • Grid and Cloud compute platforms • Virtual Research Environments • > 200 user research projects • 350 resource centres in 40 countries • 400,000 logical CPU cores • 190 PB disk, 180 PB tape • > 99.6% reliability

European Cloud Infrastructure • Standards enable federation: • OCCI: VM Image management • OVF: VM Image format • BDII: Information system • X509: Authentication • APEL: Accounting • (CDMI: Cloud storage) • Domain-specific services in Virtual Machine Images • EGI FedCloud interfaces + VM image Marketplace • Cloud hypervisors: the cloud hypervisor is a local choice, e.g. OpenStack, OpenNebula, EmotiveCloud (Spain), Okeanos (OpenStack impl. in GR), WNoDeS (Italy), … • Cloud resources: private/public, academic/commercial • http://go.egi.eu/cloud

In Silico Drug Discovery • Starting compound database: millions of chemical compounds • Starting target structure model • Docking: predict how small molecules bind to a receptor of known 3D structure • Predicted binding models • Post-analysis • Compounds for assay • Resources required: 100 CPU years, 1 TB disk space • Reference: D'Ursi P., Chiappori F., Merelli I., Cozzi P., Rovida E., Milanesi L. Virtual screening pipeline and ligand modelling for H5N1 neuraminidase. Biochemical and Biophysical Research Communications. 2009
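
To put the figure of 100 CPU years into perspective, the elapsed time on a parallel machine can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope estimate: the function name `wall_clock_days`, the `efficiency` parameter and the core count of 1,216 are assumptions chosen for illustration, and the default assumes perfect scaling, which is plausible only when each compound can be docked independently of the others.

```python
def wall_clock_days(cpu_years, cores, efficiency=1.0):
    """Estimate elapsed days for a job of `cpu_years` spread over `cores`.

    `efficiency` is the assumed parallel efficiency (1.0 = perfect scaling).
    """
    if cores <= 0 or not 0 < efficiency <= 1:
        raise ValueError("cores must be positive and efficiency in (0, 1]")
    return cpu_years * 365.0 / (cores * efficiency)


# 100 CPU years (from the slide) on an illustrative 1,216-core cluster.
days_on_cluster = wall_clock_days(100, 1216)
```

Under these assumptions the screening would take roughly a month of wall-clock time instead of a century on a single core.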

GPU – Graphics Processing Unit GPUs implement a SIMD (Single Instruction Multiple Data) many-core architecture, providing a very high level of parallelism on intense data-parallel computation problems.
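
The data-parallel programming model can be emulated in a few lines. This is a sketch of the idea only, not GPU code: the names `hamming_kernel` and `launch` and the toy sequences are assumptions for illustration, and the loop in `launch` stands in for threads that a GPU would execute concurrently.

```python
def hamming_kernel(thread_id, query, database, out):
    """Work done by one 'thread': compare the query with one database sequence."""
    target = database[thread_id]
    out[thread_id] = sum(a != b for a, b in zip(query, target))


def launch(kernel, n_threads, *args):
    # Every thread runs the same instructions on a different data element.
    # On a GPU these iterations would execute concurrently; this loop only
    # emulates that behaviour sequentially.
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


# Toy sequences of equal length, chosen for illustration only.
query = "ATGTTATAG"
database = ["ATGTTATAG", "ATGTTTTAG", "CTGTTATAC"]
distances = [0] * len(database)
launch(hamming_kernel, len(database), query, database, distances)
```

Each thread writes only to its own output slot, so no synchronization between threads is needed, which is what makes this kind of problem a good fit for SIMD hardware.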

GPU – Graphics Processing Unit l GPU-based solution in bioinformatics for: – Sequence Database Searching • CUDASW++ – Multiple Sequence Alignment • CUDA-BLASTP – Next-Generation Sequencing • DecGPU, CUDA-EC, Musket, SOAP3-dp, CUSHAW – Genome-Wide Association Studies • Mendel_GPU, GENIE, SWIFTLINK – Motif Finding • mCUDA-MEME

GPU – Graphics Processing Unit l SNP genotyping analysis is very susceptible to SNPs chromosomal position errors; l SNP mapping data are provided along the SNP arrays without information to assess in advance their accuracy; l moreover, mapping data are related with a given build of a genome and need to be updated when a new build is available.

MIMOmics EU Project • The aim of MIMOmics is to develop new statistical methods for the integrated analysis of metabolomics, proteomics, glycomics and genomic datasets in large studies. • Our partners are involved in EU-funded projects, i.e. GEHA, IDEAL, Mark-Age, ENGAGE, EuroSpan, and BBMRI. • In these consortia the primary goal is to identify molecular profiles that monitor and explain complex traits with novel findings so far. • MIMOmics web site http://www.mimomics.eu at CNR (Milan, Italy)

Omics Scientific Web Portal MIMOmics resources (data sets and computational tools) MIMOmics authorized users Project Web Portal to: • create and define user credentials for all MIMOmics resources • access MIMOmics resources • develop, test and use tools on the available data sets • create analysis pipelines combining tools and data sets

Omics Scientific Web Portal • The Omics Scientific Web Portal is based on Liferay Portal technology • Liferay is a robust technology, fully supported in terms of accessibility and scalability • Liferay provides a flexible template interface • With Liferay, users can manage contents and documents in a distributed and dynamic way over the internet • Liferay is compliant with the Java Portlet API 2.0 • Features shown: Documents Management, Web Editing, Collaboration, Services

User Registration

Omics Scientific Web Portal (screenshot labels: Project Documents, User Registration, Link to MIMOmics resources) Omics scientific web portal: • partner contacts can create new users with the same credentials for all MIMOmics resources • access MIMOmics resources • upload and download MIMOmics datasets • develop, test and use MIMOmics methods • create analysis pipelines combining tools and data sets
