
Diversity in vivo, Multicore in silico: How to link metagenomics and community ecology



  1. Diversity in vivo, Multicore in silico : How to link metagenomics and community ecology Alain Franc & al. Agreenium, INRA and Université de Bordeaux, Toulouse Bordeaux, France October 2014

  2. Plan • Diversity and ecology • Molecular systematics • Discrete mathematics for molecular systematics • Tools for discrete mathematics for molecular systematics • Case study: Amazonian trees and dimensionality reduction • Case study: diatoms and inventories through NGS • Next future

  3. DIVERSITY AND ECOLOGY

  4. Some examples: Biodiversity and Applied Mathematics. Molecular imprint of evolution: discrete mathematics and statistical modelling - Global alignment (very hard problem) - Inferring large phylogenies (very hard problem) - Coalescence models (technical, rich domain) - Genetic distances and evolutionary distances (Donoghue & al., 2009). Ecological modelling: dynamical systems - Community assembly - Diffuse coevolution (geographical mosaic …). A challenge: how to link and assemble those two modelling domains?

  5. MOLECULAR SYSTEMATICS

  6. Evolution

  7. Few individuals: many traits, genome-wide cover. Many individuals: few DNA regions of interest.

  8. DISCRETE MATHEMATICS FOR MOLECULAR SYSTEMATICS

  9. Taxonomy on edit distance. Definition: the edit distance between two strings is the minimum number of edits needed to transform one string into the other, the allowable edit operations being insertion, deletion, or substitution of a single character. Example: kitten → sitten (substitution of 'k' with 's'); sitten → sittin (substitution of 'e' with 'i'); sittin → sitting (insertion of 'g' at the end).
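
The definition above can be sketched as a short dynamic programme. This is a minimal illustration, assuming unit cost for every insertion, deletion, and substitution (the classical Levenshtein distance); the function name `edit_distance` is hypothetical and not taken from the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed one row at a time."""
    # prev[j] = distance between the empty prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

The row-by-row form keeps memory linear in the length of the second string, which matters when many pairs of reads are compared.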

  10. A pint of methods … What we have: genetic distances between sequences. What we want: an evolutionary distance as branch length on a phylogenetic tree (time?). Tool for linking both: a graph for visualizing the pairwise distance network according to a threshold
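
One possible reading of "a graph according to a threshold" is sketched here: two sequences are linked whenever their pairwise distance does not exceed a chosen value. The function name `threshold_graph` and the toy matrix are assumptions made for illustration only.

```python
def threshold_graph(dist, threshold):
    """Adjacency sets: i and j are linked when dist[i][j] <= threshold."""
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


# Toy symmetric distance matrix for four sequences (made-up values).
D = [
    [0, 1, 2, 9],
    [1, 0, 4, 9],
    [2, 4, 0, 9],
    [9, 9, 9, 0],
]
graph = threshold_graph(D, threshold=2)
print(graph)
```

With this toy matrix and a threshold of 2, sequence 0 is linked to 1 and 2, while sequence 3 stays isolated.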

  11. Ultrametric distances A taxon is … a disc … a clique a clade
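
A distance is ultrametric when d(x, z) ≤ max(d(x, y), d(y, z)) for every triple. Under that condition, "being within distance t" is an equivalence relation, so a disc of radius t is at the same time a clique and a connected component of the threshold graph. The check below is a brute-force sketch; the name `is_ultrametric` and the tolerance `tol` are assumptions for illustration.

```python
from itertools import permutations


def is_ultrametric(dist, tol=1e-9):
    """True when every triple satisfies the strong triangle inequality."""
    n = len(dist)
    return all(
        dist[i][k] <= max(dist[i][j], dist[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )


# Made-up examples: the first matrix is ultrametric, the second is not.
tree_like = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
not_tree_like = [[0, 1, 2], [1, 0, 4], [2, 4, 0]]
print(is_ultrametric(tree_like), is_ultrametric(not_tree_like))
```

The test is cubic in the number of sequences, so it is meant for small matrices only.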

  12. Graphs … From Wikipedia, graphs http://fr.wikipedia.org/wiki/Th%C3%A9orie_des_graphes

  13. Two useful notions on graphs: clique and connected component. Basis for phylogenetics; basis for BLAST; ultrametrics. Finding connected components: easy. Finding cliques: hard (NP-complete).
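
The contrast between the two notions can be sketched on a small graph. Connected components come from a linear-time traversal, whereas clique enumeration is shown with the basic Bron-Kerbosch recursion (no pivoting), which can take exponential time in the worst case. Function names and the toy graph are assumptions for illustration.

```python
def connected_components(adj):
    """Components of an undirected graph given as {node: set(neighbours)}."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


def maximal_cliques(adj):
    """All maximal cliques, by the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy graph: a path 1 - 0 - 2 plus an isolated node 3.
adj = {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}
print(connected_components(adj))
print(maximal_cliques(adj))
```

On this graph the component {0, 1, 2} is not a clique, since 1 and 2 are not linked; the two notions coincide only when the underlying distance is ultrametric.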

  14. TOOLS FOR DISCRETE MATHEMATICS FOR MOLECULAR SYSTEMATICS

  15. A couple of success stories … HPC and turbulences Astrophysics, Climate change, Earth Sciences, Bioinformatics, etc … From HPC to distributed computing Computational science Tighter links with services Science as service for communities

  16. What about Biology? Discrete mathematics on words and strings: from sequences to proteins. Genetics and random processes: population genetics, inferring phylogenies, … Ecology, systems biology, and ([very] large) dynamical systems. Bioinformatics. Integrative biology. Medicine. Agriculture. Environment. Response to climate change.

  17. An experiment with Turing (IDRIS)? Scaling: millions of reads, exact calculation, no heuristics (local alignment). Data flows of several tens of TB per job. Towards diagnostics for community ecology
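
The slide does not name the algorithm. As an illustration of exact local alignment without heuristics, here is a sketch of the Smith-Waterman score, assuming a linear gap penalty. The scoring values (`match`, `mismatch`, `gap`) and the function name are arbitrary assumptions, not the settings used on Turing.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))  # 6, from the shared "ACG"
```

Each pair costs time proportional to the product of the two lengths, which is why exact all-against-all comparison of millions of reads calls for massively parallel machines.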

  18. Towards a shared, virtual Biodiversity Lab. Distinguish, mobilize, and unite three types of knowledge and skills: - Evolutionary biology and ecology - Applied mathematics and statistical modelling - Computer science and High Performance Computing

  19. How does it work? From one module to a network of modules: modularity and networking in workflows. Galaxy servers make this possible as soon as a command line launches a module.

  20. Galaxy Workflows

  21. Where is it possible to compute? From a unique portal, the Galaxy instance: • Local Galaxy server • Mesocentre (Tier 2): Avakas, 1000 cores • Tier 1 (IDRIS, one pipeline, not via Galaxy) • EGI GRID France-Grille • Cloud (ongoing, with UPV Valencia). Where from? From any computer connected to the internet. Currently available from French Guiana (IP Cayenne works with it)

  22. CASE STUDY: AMAZONIAN TREES AND DIMENSIONALITY REDUCTION

  23. CASE STUDY: DIATOMS AND INVENTORIES THROUGH NGS

  24. An example of taxonomic annotation from NGS: pairwise distances (distance matrix from local alignment) → building a graph → selection of a barcoding gap → computing connected components and cliques → statistics on taxa and characters → visualisation
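
The slides do not say how the barcoding gap is selected. One simple heuristic, shown here purely as an assumption, is to place the threshold in the middle of the widest empty interval between consecutive pairwise distances, on the idea that within-taxon distances sit below the gap and between-taxon distances above it. The function name and the toy matrix are hypothetical.

```python
def widest_gap_threshold(dist):
    """Midpoint of the widest gap between consecutive pairwise distances."""
    n = len(dist)
    values = sorted({dist[i][j] for i in range(n) for j in range(i + 1, n)})
    if len(values) < 2:
        raise ValueError("need at least two distinct pairwise distances")
    # Each candidate is (gap width, lower bound, upper bound).
    _, low, high = max((hi - lo, lo, hi) for lo, hi in zip(values, values[1:]))
    return (low + high) / 2


# Toy matrix (made-up values): small distances within a group, 9 across.
D = [
    [0, 1, 2, 9],
    [1, 0, 4, 9],
    [2, 4, 0, 9],
    [9, 9, 9, 0],
]
print(widest_gap_threshold(D))  # 6.5, halfway between 4 and 9
```

The resulting value would then serve as the threshold for building the graph whose connected components and cliques are computed in the next step.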

  25. Metabarcoding + NGS on diatom communities. Cross validation: false-positives, false-negatives, abundances. Taxonomic inventory. Quality indices.
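
Cross validation of two inventories reduces to set comparisons once each inventory is a list of taxon names. The sketch below assumes the microscopy list is taken as the reference; the function name and the placeholder taxa are hypothetical and do not come from the study.

```python
def compare_inventories(reference, detected):
    """Split taxa into true positives, false negatives, false positives."""
    reference, detected = set(reference), set(detected)
    return {
        "true_positives": reference & detected,
        "false_negatives": reference - detected,  # in reference, not detected
        "false_positives": detected - reference,  # detected, not in reference
    }


# Placeholder taxon names, for illustration only.
microscopy = ["taxon_A", "taxon_B", "taxon_C"]
metabarcoding = ["taxon_A", "taxon_C", "taxon_D"]
result = compare_inventories(microscopy, metabarcoding)
print({key: sorted(taxa) for key, taxa in result.items()})
```

Abundance comparisons would need read counts per taxon in addition to presence or absence, which this sketch leaves out.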

  26. Taxonomic inventories: microscopy vs metabarcoding on a mock community. Kermarrec et al. 2013. Metabarcoding with rbcL / 454 / RSYST DB (100% homology, 40 000 reads): 19/21; false-negatives = 2 sp. under 0.6%; 2 false-positives = taxonomy problem (Gomphonema sp. complex). Metabarcoding with rbcL / PGM / RSYST DB (99% homology, 54 000 reads): 17/21; false-negatives = 3 sp. under 1% and 1 sp. at 1.9%; 3 false-positives = 1 to 5 reads.

  27. Taxonomic inventories: microscopy vs metabarcoding, Lake Geneva. Seasonal dynamics of benthic diatoms; monthly samplings during 1 year; 10 environmental samples (April 2012 to March 2013). Diatoms: scraped from 5 stones, 50 cm depth. Metabarcoding: rbcL / PGM / RSYST DB, 250 000 reads.

  28. NEXT FUTURE …

  29. Molecular based taxonomy and systematics: an open route for (new) methods. Sequences known by pairwise distances. Distance geometry, pattern recognition, machine learning. Clustering. Multidimensional scaling, linear and nonlinear (e.g. Sammon, 1969). Manifold learning: IsoMap, EigenMap, etc. Graph-based methods: spectral clustering.
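
As an illustration of the linear case, here is a dependency-free sketch of classical (Torgerson) multidimensional scaling: the squared distances are double-centred and the leading eigenvectors are extracted by power iteration with deflation. It assumes a symmetric matrix that is close to Euclidean; negative eigenvalues from strongly non-Euclidean distances are not handled. The function name and parameters are assumptions, and a real analysis would rather rely on a numerical library.

```python
import math


def classical_mds(dist, dims=2, iters=200):
    """Coordinates whose Euclidean distances approximate dist."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double centring: b = -1/2 * J * sq * J, with J the centring matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                      for i in range(n))
        if lam <= 1e-12:
            break
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflation: remove the component just found.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Three points on a line at positions 0, 1 and 3 (made-up example).
positions = [0.0, 1.0, 3.0]
D = [[abs(p - q) for q in positions] for p in positions]
print(classical_mds(D, dims=2))
```

For this example the first coordinate reproduces the original spacing up to sign and translation, and the second coordinate stays at zero because the points lie on a line.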

  30. Continuum of population differentiation: complete independence, modest connectivity, substantial connectivity, panmixia (subpopulations are completely congruent). Pattern recognition … After Waples and Gaggiotti, 2006, Molecular Ecology

  31. Pattern and functions. Biodiversity from populations to biomes

  32. Speculation: Assemblage and Scaling. Item: number. Atoms: 92; Molecules: 10^6 ?; Cell types: 3 × 10^2; Organisms: 10^7; Communities. Living systems: diversity …, assembly of heterogeneous parts, distributed systems. Distributed computing for distributed systems? http://www.fractalforums.com/images-showcase-%28rate-my-fractal%29/the-lego-molecule/?PHPSESSID=00a24d7f4234586a8e5ba4dd9c82541b One modelling goal: how to visualize/simulate large associations of small/large numbers of types with modular structures

  33. Thanks to. Team: Yec’han Laizet, Jean-Marc Frigerion, Philippe Chaumeil. HPC: Pierre Gay (MCIA Bordeaux), Sylvie Thérond (IDRIS), Michel Daydé (e-Biothon), Vincent Breton (idGC, GIS FG). (Meta)barcoding: LMGE, Clermont: Didier Debroas, Gisèle Bronner; Carrtel, Thonon: Agès Bouchez, Frédéric Rimet, Isabelle Domaizon; AMAP: Jean-François Molino, Daniel Sabatier; IP Cayenne: Benoit de Thoisy, Anne Lavergne, Sourakhata Tirera
