From construction to deployment of LifeWatchGreece: The potential role of EGI - LW Competence Centre


  1. From construction to deployment of LifeWatchGreece: The potential role of EGI - LW Competence Centre
     by Emmanouela Panteri
     Contributors: Christos Arvanitidis, Nicolas Bailly, Sarah Faulwetter, Jacques Lagnel, George Perantinos, Anastasis Oulas, Panagiotis Vavilis, Kleoniki Keklikoglou, Matina Nikolopoulou, Alexandros Gougousis and about 30 data managers
     4/23/2015

  2. LifeWatchGreece e-Infrastructure
     LWG e-infrastructure:
     ● Multi-server e-infrastructure currently deployed at HCMR, Crete
     ● Hosts biodiversity data and applications
     Applications: a series of web tools (vLabs or e-services) for the public
     ● e-services: searching datasets/data or one-shot analyses
     ● vLabs: interfaces for advanced selection of datasets/data, and more elaborate suites of analyses

  3. Accessed by the LifeWatchGreece Portal (portal.lifewatchgreece.eu)
     Application development in 2 steps:
     ● Independent development of a web application (by the team)
     ● Integration into the infrastructure / portal
     Access Control
     ● Landing page: list of applications
     ● One-time sign-up for accessing all apps
     ● A few applications, the compute-intensive ones, require additional credentials
     ● User rights management
     Graphical Interface
     ● A common graphical interface frame/wrapper introducing all applications

  4. LWG e-Infrastructure: advantages
     ● Applications can be developed in any programming language (PHP, Java EE, .NET, ...)
     ● Design, development and maintenance of applications are independent from each other: a common standard is needed only for data exchange (DwC, …)
     Each application runs in an independent execution environment:
     ● Scalable: the number of VMs can grow if more apps are added
     ● Compartmented security: an affected application does not compromise the others
     ● Core developers are involved only at the integration stage
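
The slide names Darwin Core (DwC) as the only standard shared between applications. As a rough illustration of what that could mean in practice, the sketch below assumes applications exchange occurrence records as flat rows keyed by Darwin Core term names. The term names are standard Darwin Core terms; the record values, the `REQUIRED_TERMS` choice and both function names are hypothetical and not taken from the LifeWatchGreece code base.

```python
import csv
import io

# Terms this sketch insists on. This selection is an assumption made for
# illustration, not a LifeWatchGreece requirement.
REQUIRED_TERMS = ("occurrenceID", "scientificName", "basisOfRecord")


def missing_terms(record):
    """Return the required Darwin Core terms that are absent or empty."""
    return [term for term in REQUIRED_TERMS if not record.get(term)]


def to_csv(records, terms):
    """Serialise records to CSV text with one column per Darwin Core term."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=terms, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up example record, for illustration only.
record = {
    "occurrenceID": "example:occ:1",
    "scientificName": "Posidonia oceanica",
    "basisOfRecord": "HumanObservation",
    "decimalLatitude": "35.33",
    "decimalLongitude": "25.28",
}

if __name__ == "__main__":
    print(missing_terms(record))
    print(to_csv([record], ["occurrenceID", "scientificName"]))
```

Because each application only has to read and write rows like these, it can be written in any language and still interoperate with the others.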

  5. LWG e-Infrastructure: advantages
     ● Other integration methods: iframes, for graphically integrating commercial apps
     ● Open source applications can be integrated with few adaptations at both the access level and the graphical level, especially when they follow an MVC architecture
     ● Moreover, most CMSes can be easily integrated, at least at the access control level
     ● Certain JavaScript and CSS frameworks are provided by default through libraries, in order to enforce the consistency of the user interface throughout the portal

  6. LWG Portal diagram
     [Diagram: LWG Application Layer, Data Layer, Cluster, Communication]

  7. LWG e-Infrastructure: What is missing
     ● No user workspace
     ● Currently, files are not retrievable from one session to the next, or from one tool to another
     ● Could the EGI Competence Center provide such functionality?
     ● Workspace development will significantly increase the storage requirements
     ● It would require some work between the LWG infrastructure and the EGI-CC (e.g., space allocation after sign-up)

  8. HPC bioinformatics platform
     Mainly focused on OMICs NGS data analysis:
     ● Transcriptomics (RNA-Seq)
     ● Genomics (eukaryote and bacterial)
     ● Metagenomics (microbial community)
     ● Metabarcoding
     ● RAD-Sequencing
     More than 170 bioinformatics packages covering:
     ● Genome and transcriptome de novo assembly
     ● Functional and structural gene annotation
     ● Sequence similarity (parallel BLAST) and mapping
     ● Population genetics
     ● Phylogeny reconstruction
     ● Statistics (250 R packages installed)
     ● Genetic marker mining/analysis
     43 users from 11 institutes in 5 countries (Greece, Italy, France, Norway, Portugal)
     More than 8000 jobs submitted during the last month

  9. HPC bioinformatics platform upgrade
     13-node configuration, hardware:
     ● 13 worker nodes
     ● 300 CPU cores
     ● 2.5 TB RAM
     ● 120 TB storage
     ● 40 Gbps InfiniBand network
     13-node configuration, software:
     ● CentOS Linux/Debian
     ● Resource Manager: SLURM
     ● Storage: Lustre and ZFS/NFS
     ● Storage group/users quota
     ● LXC virtualization
     ● User management via LDAP
     9-node configuration, hardware:
     ● 9 worker nodes
     ● 108 cores
     ● 784 GB RAM
     ● 30 TB storage
     ● 10 Gbps Ethernet network
     9-node configuration, software:
     ● Gentoo Linux (open source)
     ● Resource Manager: Torque/Maui
     ● Storage: XFS/NFS
     ● Storage users quota
     Languages: GCC, ICC/IFC, R, BioPerl, Biopython, Ruby, BioJava, ...
     Parallelization: OpenMPI, OpenMP and pthreads
     Database servers: MySQL, PostgreSQL, ...
     ~3x performance

  10. Bioinformatic challenges
     RNA-Seq data analysis => 360 Mreads, optimised and parallelised pipeline (runtime on the biocluster in hours; runtime on a PC with 1 CPU):
     ● Assembly (requires ~350 GB shared RAM): 12 (10 threads); >120 h
     ● Annotation, BLAST: 96 (94 jobs); >> 3 months
     ● Annotation, InterPro: 32 (48 jobs); 1.5 months
     ● Mapping: 4 (12*10 threads); >10 days
     ● Total: ~6 days; >5 months
     Sequence similarity search: parallel BLAST => 10,000 queries (Nb CPUs: 1 / 94):
     ● blastn (nt): speedup 1.0 / 105.4; runtime 6.1 days / 1.4 h
     ● blastx (nr): speedup 1.0 / 88.8; runtime 3.1 days / 11.6 h
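
The speedup figures on this slide follow from the usual definition, serial runtime divided by parallel runtime. The sketch below is a hedged arithmetic check, assuming the blastn row reads as 6.1 days on 1 CPU and 1.4 h on 94 CPUs, and that the biocluster pipeline steps take 12, 96, 32 and 4 hours. The function names are illustrative only. The result differs slightly from the 105.4 on the slide, presumably because the slide used unrounded runtimes.

```python
def speedup(serial_hours, parallel_hours):
    """Speedup = runtime on 1 CPU / runtime on N CPUs."""
    return serial_hours / parallel_hours


def efficiency(serial_hours, parallel_hours, n_cpus):
    """Parallel efficiency = speedup / number of CPUs used."""
    return speedup(serial_hours, parallel_hours) / n_cpus


# blastn (nt) figures as read from the slide (an assumed reading).
blastn_serial_h = 6.1 * 24   # 6.1 days on 1 CPU, converted to hours
blastn_parallel_h = 1.4      # 1.4 h on 94 CPUs

# RNA-Seq pipeline steps on the biocluster, in hours:
# assembly, BLAST annotation, InterPro annotation, mapping.
pipeline_hours = [12, 96, 32, 4]
pipeline_days = sum(pipeline_hours) / 24

if __name__ == "__main__":
    print(round(speedup(blastn_serial_h, blastn_parallel_h), 1))  # 104.6
    print(round(efficiency(blastn_serial_h, blastn_parallel_h, 94), 2))
    print(pipeline_days)  # 6.0, matching the "~6 days" total
```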

  11. eServices and vLabs: the R-vLab
     ● Uses the “R” programming language
     ● Supports an integrated and optimized online R environment (data manipulation and computational speed-up)
     ● Helps overcome a severe lack of computational power, e.g. when calculating several biodiversity indices and multivariate analyses on large matrices
     How can the EGI Competence Center help the LWG e-infrastructure increase its computational power?

  12. eServices and vLabs: the R-vLab
     Conventional Mantel compared to Parallel Mantel: ~20-fold speed-up
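
The slide gives no detail on how the parallel Mantel test is built. A Mantel test correlates two distance matrices and obtains a p-value by repeatedly permuting the rows and columns of one of them; since the permutations are independent, they can be split across workers. The following is a minimal Python sketch of that idea, not the R-vLab implementation (which uses R). All function names and parameters are hypothetical, and any speed-up obtained this way depends on matrix size and core count.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper(matrix, order):
    """Upper-triangle entries of matrix with rows/columns reordered."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def count_extreme(task):
    """Run one batch of permutations; count correlations >= the observed one."""
    a_flat, b, r_obs, n_perm, seed = task
    rng = random.Random(seed)
    order = list(range(len(b)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(a_flat, upper(b, order)) >= r_obs:
            hits += 1
    return hits


def mantel(a, b, n_perm=999, workers=1, seed=0):
    """One-sided permutation Mantel test; returns (r_observed, p_value)."""
    identity = list(range(len(a)))
    a_flat = upper(a, identity)
    r_obs = pearson(a_flat, upper(b, identity))
    # Split the permutations into one batch per worker, each with its own seed.
    base, extra = divmod(n_perm, workers)
    sizes = [base + (1 if k < extra else 0) for k in range(workers)]
    tasks = [(a_flat, b, r_obs, size, seed + k) for k, size in enumerate(sizes)]
    if workers == 1:
        hits = count_extreme(tasks[0])  # conventional (serial) run
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            hits = sum(pool.map(count_extreme, tasks))
    return r_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    points = [0.0, 1.0, 2.5, 4.0, 7.0, 11.0]
    a = [[abs(p - q) for q in points] for p in points]
    b = [[2 * d for d in row] for row in a]
    print(mantel(a, b, n_perm=199, workers=2))
```

With `workers=1` the function behaves like a conventional Mantel test; raising `workers` distributes the same permutation count over several processes.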

  13. eServices and vLabs: MicroCT
     ● Micro-computed tomography
     ● Non-destructive method of 3D x-ray microscopy
     ● Creation of 3D models of objects from a series of x-ray projection images
     MicroCT offers:
     ● Collection of virtual galleries of taxa, displayed and disseminated
     ● Manipulation of the 3D models through a series of online tools
     ● Download of datasets for local manipulation
     How can the EGI Competence Center help the LWG e-infrastructure with storage and image manipulation, incl. 3D models?

  14. MicroCT: current issues
     In general:
     ● Potentially large increase in the number of image galleries, especially from museum specimen collections (several orders of magnitude)
     ● Need for 3D metadata standards: dissemination and searching
     ● Need for 3D data annotation protocols and tools
     ● Need for tools to search across scattered catalogues of galleries (centralized or distributed)
     In LWG:
     ● MicroCT generates many image files: storage issue
     ● Processing and manipulating images is CPU intensive: computing issue

  15. LWG and EGI Competence Center
     Processing power and storage requirements
     Harvesting various other repositories such as:
     ● Taxonomic: CoL and PESI (and components: FADA, ERMS, E+MP), WoRMS, EEA/EUNIS, ...
     ● Occurrences: GBIF, OBIS, ...
     ● Species traits: PolyTraits, FishBase, SeaLifeBase, eModNet, ...
     ● Bibliography: RefBank, BHL, AnimalBase, ...
     ● Citizen Science: iNaturalist, ...
     ● Workflows: BioVel, ...
     Install mirror websites: FishBase, RefBank, GNI
     Develop Web Services for disseminating LWG data:
     ● Concerns about performance due to Web services use

  16. Linked Data / Linking Open Data
     LifeWatchGreece principle: make data available to everybody
     A number of datasets are ready as RDF in triplestores
     Diagram from http://lod-cloud.net/
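
To make "datasets as RDF" concrete, the sketch below shows how a single occurrence could be written as two RDF triples in N-Triples syntax using plain Python. It is an illustration only: the subject URI under `example.org` and the species value are made up, the helper names are hypothetical, and the slide does not say which vocabularies or serialisation LifeWatchGreece uses. The Darwin Core namespace and the `rdf:type` URI are the standard ones.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triple(subject, predicate, obj, obj_is_iri=False):
    """Format one subject-predicate-object statement as an N-Triples line."""
    rendered = f"<{obj}>" if obj_is_iri else literal(obj)
    return f"<{subject}> <{predicate}> {rendered} ."


# Hypothetical subject URI and made-up values, for illustration only.
occurrence = "http://example.org/occurrence/1"
lines = [
    triple(occurrence, RDF_TYPE, DWC + "Occurrence", obj_is_iri=True),
    triple(occurrence, DWC + "scientificName", "Posidonia oceanica"),
]

if __name__ == "__main__":
    print("\n".join(lines))
```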

  17. LifeWatchGreece
     LifeWatchGreece Research Infrastructure, funded by the GSRT (Greek government: structural funds), is the national effort to address the above requirement and to support relevant studies. To achieve its aim, LWG RI adheres to the central lifewatch.eu guidelines and attempts to bring together all the Greek scientific human resources working on biodiversity data and data observatories. Coordinated by the Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC, www.imbbc.hcmr.gr) of the Hellenic Center for Marine Research (HCMR, www.hcmr.gr), LWG includes 49 partner institutions covering a wide range of scientific disciplines (terrestrial, marine and freshwater biology, zoology, botany, geography, forestry, agriculture, genetics, biotechnology, pharmacy, aquaculture, education and law).
     Thank you ;)
