
Chromosome-scale Assemblies of Wild Musa Genomes using long reads and optical maps



  1. Chromosome-scale Assemblies of Wild Musa Genomes using long reads and optical maps
     Jean-Marc Aury, jmaury@genoscope.cns.fr, @J_M_Aury
     Banana Genomics, 01/15/2019

  2. Genoscope overview (http://www.genoscope.cns.fr)
     • French National Sequencing Center, led by Patrick Wincker, created in 1997 and part of the CEA since 2007.
     • Provides high-throughput sequencing data to the academic community and carries out in-house genomic projects.
     • Focus on biodiversity: de novo sequencing and metagenomic projects (TaraOceans).
     • But it is not enough to know just one individual's genome. A single reference genome is not compatible with resequencing approaches.
     Species shown: Quercus robur (oak), Vitis vinifera (grape), Brassica napus (rapeseed), Musa acuminata (banana).

  3. Sequencing capacities
     • 2 Illumina HiSeq 2500
     • 2 Illumina HiSeq 4000
     • 1 Illumina NovaSeq
     • 2 MiSeq
     • 6 Oxford Nanopore MkI
     • 1 PromethION
     • 1 Saphyr System

  4. Genome assembly difficulties
     Figure: genome with repeats R1, R2 and R3; short-read sequencing; contig graph of contigs 1 to 5.
     => Repetitive regions lead to fragmented assemblies and an underestimated repeat content.

  5. Genome assembly difficulties
     Figure: haplotypes 1 and 2; short-read sequencing; contig graph of contigs 1 to 7.
     => Heterozygous regions lead to fragmented assemblies and cause allelic duplication, which overestimates the size of the haploid genome.

  6. Read Length Matters
     Figure: assemblies with 1 contig per chromosome.
     => The yeast genome assembly is resolved with 30X coverage of reads averaging 25 kb.

  7. Sequencing of plant genomes using the MinION
     • Large-scale genomic project focused on Musa genomes.
     • Musa spp. are essential crops in (sub-)tropical countries and are interesting models for studying reticulate evolution.
     • Modern species have hybrid genomes.
     • In this context, we are currently sequencing 7 banana genomes.

  8. Continuity of current plant genome assemblies
     Many plant genomes have already been sequenced, but only 6 plant species have an assembly with a contig N50 > 5 Mb.
     Figure: plant genome assemblies by year (2017, 2018). Source: http://www.genoscope.cns.fr/genomes

  9. Genome assembly of plant genomes using long and short reads
     So far, 5 Musa have been sequenced.

     | | Musa schizocarpa | Musa textilis | Musa acuminata ssp zebrina | Musa acuminata ssp malaccensis | Musa acuminata ssp burmannica |
     |---|---|---|---|---|---|
     | Estimated genome size | 587 Mb | 700 Mb | 530 Mb | 530 Mb | 530 Mb |
     | # flowcells | 18 | 23 | 46 | 21 | 5 |
     | Cumul. size | 27 Gb | 36 Gb | 81 Gb | 35 Gb | 32 Gb |
     | N50 | 24 kb | 28 kb | 18 kb | 16 kb | 25 kb |
     | Coverage | 51X | 51X | 150X | 66X | 60X |
     | N50 of longest 30X | 32 kb | 36 kb | 32 kb | 27 kb | 30 kb |

     Goal: reach at least 30X coverage and an N50 of 30 kb.
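The coverage values on this slide follow from dividing the cumulative read size by the estimated genome size. The sketch below illustrates that arithmetic only; the function names `coverage` and `bases_needed` are hypothetical and not part of any Genoscope tooling.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_needed(target_coverage: float, genome_size: float) -> float:
    """Number of bases required to reach a target depth."""
    return target_coverage * genome_size


# 35 Gb of reads over a 530 Mb genome, as in one column of the table.
depth = coverage(35e9, 530e6)
print(f"{depth:.0f}X")

# Bases needed for the 30X goal on a 530 Mb genome.
print(f"{bases_needed(30, 530e6) / 1e9:.1f} Gb")
```

With these inputs the depth rounds to 66X, matching the 35 Gb / 530 Mb column of the table.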

  10. Genome assembly process
      • Input: Nanopore reads
      • Read subset selection: all reads, longest reads (30X), Filtlong reads (30X)
      • Assembly with Ra and SMARTdenovo
      • Best assembly selection (cumulative size & contig N50)
      • Polishing (Racon x 3 + Pilon x 3)
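The "longest reads (30X)" subset from the process above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function name `select_longest_reads` and its parameters are hypothetical, and reads are represented by their lengths only. The Filtlong subset, which also weighs read quality, is not reproduced here.

```python
from typing import Iterable, List


def select_longest_reads(read_lengths: Iterable[int],
                         genome_size: int,
                         target_coverage: float = 30.0) -> List[int]:
    """Keep the longest reads until their total length reaches
    target_coverage times genome_size."""
    budget = genome_size * target_coverage
    selected: List[int] = []
    total = 0
    # Walk reads from longest to shortest and stop once the budget is met.
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break
        selected.append(length)
        total += length
    return selected


# Toy example: a 1-base "genome" and an 18X target keep the two longest reads.
subset = select_longest_reads([5, 10, 3, 8, 1], genome_size=1, target_coverage=18)
print(subset)
```

In practice the same loop would carry read identifiers alongside the lengths so that the selected reads can be written back to a FASTQ file.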

  11. Genome assembly results

      | | Musa schizocarpa | Musa textilis | Musa acuminata ssp zebrina | Musa acuminata ssp malaccensis | Musa acuminata ssp burmannica |
      |---|---|---|---|---|---|
      | # contigs | 437 | 608 | 718 | 427 | 704 |
      | Cumul. size | 527 Mb | 601 Mb | 510 Mb | 477 Mb | 481 Mb |
      | N50 | 2.1 Mb | 3.2 Mb | 2.0 Mb | 2.7 Mb | 1.9 Mb |
      | Max size | 12.8 Mb | 21.5 Mb | 13.1 Mb | 16.0 Mb | 11.2 Mb |

      The assemblies are highly contiguous, but not enough to decipher genome organization at the chromosome level.
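The N50 reported throughout these slides is the length of the contig (or read, or scaffold) at which the running total of lengths, taken from longest to shortest, first reaches half of the overall size. A minimal sketch of that standard definition, with a hypothetical function name `n50`:

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence down; the N50 is the length
    # at which the running sum first covers half of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig set with a total size of 20: 6 + 5 = 11 covers half of it.
print(n50([2, 3, 4, 5, 6]))
```

The same function applies unchanged to read lengths (read N50) and scaffold lengths (scaffold N50).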

  12. Bionano data
      Organization of nanopore contigs using optical maps

      | | Musa schizocarpa (BspQI) | Musa schizocarpa (DLE) | Musa acuminata ssp malaccensis (BspQI) | Musa acuminata ssp malaccensis (DLE) |
      |---|---|---|---|---|
      | # of molecules | 938,740 | 1,952,550 | 1,003,793 | 357,005 |
      | Molecule N50 | 211 kb | 215 kb | 275 kb | 232 kb |
      | Coverage | 358X | 672X | 557X | 173X |
      | Maps | 266 | 197 | 252 | 24 |
      | Map N50 | 5.1 Mb | 28.7 Mb | 8.0 Mb | 35.0 Mb |
      | Cumulative size | 565 Mb | 643 Mb | 571 Mb | 469 Mb |

      Contiguity of DLE maps is 5 to 15 times higher than that of BspQI maps.

  13. Hybrid Assembly Process
      Inputs: Nanopore assembly and optical maps (DLE and BspQI maps, non haplotype with extend and split)
      • Hybrid scaffolding
      • GapChecker (internal process)
      • Polishing (Pilon x 1)

  14. Chromosome-scale assemblies
      Organization of nanopore contigs using optical maps
      Bionano Direct Label and Stain (DLS) technology

      | | Musa schizocarpa | Musa acuminata ssp malaccensis |
      |---|---|---|
      | # scaffolds | 227 | 144 |
      | Cumul. size (N's) | 525 Mb (1.5%) | 473 Mb (0.8%) |
      | N50 | 36.8 Mb | 34.6 Mb |
      | Contig N50 (nanopore assembly) | 6.5 Mb (2.1 Mb) | 8.6 Mb (2.7 Mb) |
      | % chromosomes in ≤3 scaffolds | 11 / 11 | 11 / 11 |

      Hybrid scaffolding generated chromosome-scale assemblies and also improved the contig N50.

  15. Chromosome-scale assemblies
      Figure: schematic view of chromosome 7 from the banana genome assembly.

  16. Chromosome-scale assemblies
      Comparison of the existing reference with long-read assemblies

      | Musa sp. | Musa acuminata | Musa acuminata | Musa schizocarpa |
      |---|---|---|---|
      | Reference | D’Hont et al. | This study | This study |
      | Estimated genome size | 523 | 523 | 587 |
      | # chromosomes | 11 | 11 | 11 |
      | Cumulative size | 397,008,016 | 473,451,791 | 496,921,565 |
      | % of anchored bases | 88.06% | >94% | 94.60% |
      | Max size | 44,889,171 | 47,700,946 | 54,858,060 |
      | # of N’s | 33,488,183 (8.43%) | 2,616,737 (0.58%) | 6,816,353 (1.37%) |
      | Number of genes | 36,542 | In progress | 32,371 |
      | % of anchored genes | 91.98% | In progress | 98.46% |

  17. Chromosome-scale assemblies
      Comparison of the existing reference with long-read assembly (dot plot: https://dnanexus.github.io/dot/)
      High variability in the centromeric regions => sequences originating from these regions are always difficult to order and orient correctly.

  18. Chromosome-scale assemblies
      Comparison of the existing reference with long-read assembly (dot plots: https://dnanexus.github.io/dot/)
      Significant differences in chromosome length, mainly in centromeres: 35 Mb vs 43 Mb for chromosome 6 and 34 Mb vs 48 Mb for chromosome 9.

  19. Continuity of current plant genome assemblies
      Using Nanopore+Bionano, we were able to add four more species with contig N50 > 5 Mb.
      Figure labels: M. schizocarpa, Pahang. Source: http://www.genoscope.cns.fr/genomes

  20. Sequencing of the banana genome using the PromethION

      | | Musa schizocarpa (PromethION) | Musa schizocarpa (MinION) |
      |---|---|---|
      | Estimated genome size | | |
      | # flowcells | 1 | 18 |
      | Cumul. size | 17.6 Gb | 27 Gb |
      | N50 | 26 kb | 24 kb |
      | Coverage | 34X | 51X |
      | # scaffolds | 199 | 227 |
      | Cumulative size | 519.5 Mb | 525.6 Mb |
      | N50 | 36.8 Mb | 36.9 Mb |
      | Contig N50 | 10.0 Mb | 6.5 Mb |
      | Sequencing costs | ~ $6,000 | ~ $16,000 |
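One way to read the cost comparison on this slide is as a cost per gigabase of sequence produced. The sketch below only does that division on the approximate figures shown; the function name `cost_per_gb` is hypothetical, and the results inherit the "~" uncertainty of the quoted costs.

```python
def cost_per_gb(cost_usd: float, yield_gb: float) -> float:
    """Sequencing cost in USD per gigabase of output."""
    if yield_gb <= 0:
        raise ValueError("yield_gb must be positive")
    return cost_usd / yield_gb


# Approximate figures from the slide: (cost in USD, cumulative size in Gb).
runs = {
    "PromethION": (6_000, 17.6),
    "MinION": (16_000, 27.0),
}
for platform, (cost, gb) in runs.items():
    print(f"{platform}: ~${cost_per_gb(cost, gb):.0f} per Gb")
```

On these numbers the PromethION run comes out at roughly $341 per Gb and the MinION runs at roughly $593 per Gb.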

  21. Conclusion
      • The Musa schizocarpa assembly and annotation are publicly available:
        Genoscope website: www.genoscope.cns.fr/plants
        NCBI database: PRJEB26661
      • The Musa acuminata (Pahang) reference genome based on long reads and optical maps should be publicly available in the coming months; gene prediction is in progress.
      • PromethION throughput allows sequencing of large genomes.
      • The Nanopore error rate is acceptable for de novo sequencing projects, but homopolymers are still an issue.
      • DNA extraction (quantity and quality) is key to obtaining “ultra-long” reads and generating optical maps.

  22. Acknowledgments
      • Genoscope labs
      • Bioinformatics: Benjamin Istace, Stefan Engelen, Caroline Belser and Marion Dubarry
      • Sequencing lab: Corinne Cruaud, Erwan Denis and Arnaud Lemainque
      • Angélique D’Hont, Guillaume Martin, Franc-Christophe Baurens, Jaroslav Dolezel and Eva Hribova
      • Funding agencies: CEA, Genoscope and France Génomique
      R&DBioSeq Team, www.genoscope.cns.fr/rdbioseq, jmaury@genoscope.cns.fr, @J_M_Aury

