Chromosome-scale Assemblies of Wild Musa Genomes using long reads and optical maps


Jean-Marc Aury, jmaury@genoscope.cns.fr, @J_M_Aury. Banana Genomics, 01/15/2019


SLIDE 1

Chromosome-scale Assemblies of Wild Musa Genomes using long reads and optical maps

Jean-Marc Aury

jmaury@genoscope.cns.fr @J_M_Aury

Banana Genomics, 01/15/2019

SLIDE 2

Genoscope overview

  • French National Sequencing Center led by Patrick Wincker, created in 1997 and part of the CEA since 2007
  • Provides high-throughput sequencing data to the academic community, and carries out in-house genomic projects
  • Focus on biodiversity: de novo sequencing and metagenomic projects (Tara Oceans)
  • But... it's not enough to just know one individual’s genome.

A single reference genome is not compatible with resequencing approaches

http://www.genoscope.cns.fr

Brassica napus (rapeseed), Musa acuminata (banana), Quercus robur (oak), Vitis vinifera (grape)

SLIDE 3

Sequencing capacities

  • 2 Illumina HiSeq 2500
  • 2 MiSeq
  • 2 Illumina HiSeq 4000
  • 6 Oxford Nanopore MkI
  • 1 PromethION
  • 1 Saphyr System
  • 1 Illumina NovaSeq

SLIDE 4

Genome assembly difficulties

Figure: a genome with three repeats (R1, R2, R3) is sequenced with short reads; the resulting contig graph contains five contigs (Contig 1 to Contig 5).

=> Repetitive regions lead to fragmented assemblies and to an underestimated repeat content

SLIDE 5

Genome assembly difficulties

Figure: two haplotypes (Haplotype1, Haplotype2) are sequenced with short reads; the resulting contig graph contains seven contigs (Contig 1 to Contig 7).

=> Heterozygous regions lead to fragmented assemblies and cause allelic duplication (overestimating the size of the haploid genome)

SLIDE 6

Read Length Matters

=> The yeast genome assembly is resolved (1 contig per chromosome) when using 30X coverage of reads that are 25 kb long on average

SLIDE 7

Sequencing of plant genomes using the MinION

  • Large-scale genomic project focused on Musa genomes
  • Musa spp. are essential crops in (sub-)tropical countries, and are interesting models for studying reticulate evolution
  • Modern species have hybrid genomes
  • In this context, we are currently sequencing 7 banana genomes.

SLIDE 8

Continuity of current plant genome assemblies

Many plant genomes have already been sequenced, but only 6 plant species have an assembly with a contig N50 > 5 Mb

Figure: plant genome assemblies by year (axis labels: 2017, 2018)

http://www.genoscope.cns.fr/genomes
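
The contig N50 used as the threshold on this slide is the length L such that contigs of length at least L contain at least half of the assembled bases. A minimal Python sketch, assuming a hypothetical `n50` helper and toy lengths that are not data from the talk:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy contig lengths in Mb (illustrative only, not from the talk).
toy_contigs_mb = [8, 6, 5, 3, 2, 1]
print(n50(toy_contigs_mb))
```

The same function applies to read lengths, contigs, scaffolds or optical maps; only the input list changes.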

SLIDE 9

Genome assembly of plant genomes using long and short reads

                       Musa         Musa      Musa acuminata  Musa acuminata   Musa acuminata
                       schizocarpa  textilis  ssp zebrina     ssp malaccensis  ssp burmannica
Estimated genome size  587 Mb       700 Mb    530 Mb          530 Mb           530 Mb
# flowcells            18           23        46              21               5
Cumul. size            27 Gb        36 Gb     81 Gb           35 Gb            32 Gb
N50                    24 kb        28 kb     18 kb           16 kb            25 kb
Coverage               51X          51X       150X            66X              60X
N50 longest 30X        32 kb        36 kb     32 kb           27 kb            30 kb

So far, 5 Musa genomes have been sequenced, with the goal of reaching at least 30X coverage and an N50 of 30 kb
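
Coverage figures of this kind are usually obtained by dividing the cumulative read size by a genome size. The sketch below assumes that simple ratio (the denominator actually used for each column is not stated in the slides) and uses a hypothetical `coverage` helper with the Musa textilis values from the table:

```python
def coverage(total_bases, genome_size):
    """Sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


GB = 1_000_000_000
MB = 1_000_000

# Values read from the table: Musa textilis, 36 Gb of reads, 700 Mb genome.
textilis = coverage(36 * GB, 700 * MB)
print(f"Musa textilis: {textilis:.0f}X")

# Bases needed to reach the 30X target for a 700 Mb genome.
target_bases = 30 * 700 * MB
print(f"30X of 700 Mb = {target_bases / GB:.0f} Gb")
```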

SLIDE 10

Genome assembly process

  • Nanopore reads
  • Read subset selection: all reads, longest reads (30X), Filtlong reads (30X)
  • Assembly with Ra and SMARTdenovo
  • Best assembly selection (cumulative size & contig N50)
  • Polishing (Racon x 3 + Pilon x 3)
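
The "longest reads (30X)" subset can be sketched as follows: sort reads by length and keep the longest ones until their cumulative size reaches 30 times the genome size. This is an illustrative sketch with a hypothetical `select_longest_reads` helper, not the code used at Genoscope; the Filtlong subset also takes read quality into account and is not reproduced here.

```python
def select_longest_reads(read_lengths, genome_size, target_coverage=30):
    """Pick the longest reads until they sum to target_coverage x genome_size.

    Returns the selected lengths, longest first. If the data set is too
    small to reach the target, all reads are returned.
    """
    target_bases = target_coverage * genome_size
    selected = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        selected.append(length)
        total += length
    return selected


# Toy example: 100 bp "genome", 3X target, so 300 bases are needed.
toy_reads = [50, 120, 80, 30, 150, 10]
subset = select_longest_reads(toy_reads, genome_size=100, target_coverage=3)
print(subset)
```

Working on lengths only keeps the sketch short; a real implementation would carry read identifiers along so the selected reads can be extracted from the FASTQ file.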

SLIDE 11

Genome assembly results

             Musa         Musa      Musa acuminata  Musa acuminata   Musa acuminata
             schizocarpa  textilis  ssp zebrina     ssp malaccensis  ssp burmannica
# contigs    437          608       718             427              704
Cumul. size  527 Mb       601 Mb    510 Mb          477 Mb           481 Mb
N50          2.1 Mb       3.2 Mb    2.0 Mb          2.7 Mb           1.9 Mb
Max size     12.8 Mb      21.5 Mb   13.1 Mb         16.0 Mb          11.2 Mb

High contiguity of the assemblies, but insufficient to decipher genome organization at the chromosome level

SLIDE 12

Bionano data

Organization of nanopore contigs using optical maps

                  Musa schizocarpa      Musa acuminata ssp malaccensis
Enzyme            BspQI      DLE        BspQI      DLE
# of molecules    938,740    1,952,550  1,003,793  357,005
N50 (molecules)   211 kb     215 kb     275 kb     232 kb
Coverage          358X       672X       557X       173X
Maps              266        197        252        24
N50 (maps)        5.1 Mb     28.7 Mb    8.0 Mb     35.0 Mb
Cumulative size   565 Mb     643 Mb     571 Mb     469 Mb

Contiguity of DLE maps is 5 to 15 times higher than that of BspQI maps

SLIDE 13

Hybrid Assembly Process

  • Inputs: nanopore assembly and optical maps (DLE and BspQI maps, non-haplotype with extend and split)
  • Hybrid scaffolding
  • GapChecker (internal process)
  • Polishing (Pilon x 1)

SLIDE 14

Chromosome-scale assemblies

Organization of nanopore contigs using optical maps

Bionano Direct Label and Stain (DLS) technology

                                Musa schizocarpa  Musa acuminata ssp malaccensis
# scaffolds                     227               144
Cumul. size (N's)               525 Mb (1.5%)     473 Mb (0.8%)
N50                             36.8 Mb           34.6 Mb
Contig N50 (nanopore assembly)  6.5 Mb (2.1 Mb)   8.6 Mb (2.7 Mb)
Chromosomes in ≤3 scaffolds     11 / 11           11 / 11

Hybrid scaffolding generated chromosome-scale assemblies and also improved the contig N50
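
The table reports both a scaffold N50 and a contig N50. Contigs are typically recovered from a scaffold by cutting it at runs of N, and the share of N's is the gap fraction. The sketch below illustrates that relationship on a toy sequence; `contig_lengths`, `n_fraction` and the `min_gap` parameter are hypothetical names, and the threshold used in the talk is not stated.

```python
import re


def contig_lengths(scaffold_seq, min_gap=1):
    """Split a scaffold on runs of at least min_gap N's and return contig lengths."""
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    return [len(piece) for piece in gap.split(scaffold_seq) if piece]


def n_fraction(scaffold_seq):
    """Fraction of a scaffold made of N (gap) characters."""
    if not scaffold_seq:
        return 0.0
    return scaffold_seq.upper().count("N") / len(scaffold_seq)


# Toy scaffold: three contigs separated by two gaps.
toy_scaffold = "ACGTACGT" + "N" * 4 + "ACGT" + "N" * 2 + "ACGTAC"
print(contig_lengths(toy_scaffold))
print(n_fraction(toy_scaffold))
```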

SLIDE 15

Chromosome-scale assemblies

Schematic view of chromosome 7 from banana genome assembly

SLIDE 16

Chromosome-scale assemblies

Comparison of the existing reference with long-read assemblies

Musa sp.                    Musa acuminata       Musa acuminata      Musa schizocarpa
Reference                   D'Hont et al.        This study          This study
Estimated genome size (Mb)  523                  523                 587
# chromosomes               11                   11                  11
Cumulative size             397,008,016          473,451,791         496,921,565
% of anchored bases         88.06%               >94%                94.60%
Max size                    44,889,171           47,700,946          54,858,060
# of N's                    33,488,183 (8.43%)   2,616,737 (0.58%)   6,816,353 (1.37%)
Number of genes             36,542               In progress         32,371
% of anchored genes         91.98%               In progress         98.46%
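
One way to read the table is to subtract the N count from the cumulative size, which gives the number of resolved (non-N) bases in each assembly. The sketch below does that arithmetic with the values listed in the table; the `resolved_bases` helper and the dictionary layout are illustrative assumptions.

```python
# (cumulative size, number of N's) as listed in the table, in bases.
assemblies = {
    "Musa acuminata (D'Hont et al.)": (397_008_016, 33_488_183),
    "Musa acuminata (this study)": (473_451_791, 2_616_737),
    "Musa schizocarpa (this study)": (496_921_565, 6_816_353),
}


def resolved_bases(cumulative_size, n_count):
    """Bases that are not gaps: cumulative size minus the N count."""
    return cumulative_size - n_count


for name, (size, n_count) in assemblies.items():
    print(f"{name}: {resolved_bases(size, n_count):,} non-N bases")

reference = resolved_bases(*assemblies["Musa acuminata (D'Hont et al.)"])
long_read = resolved_bases(*assemblies["Musa acuminata (this study)"])
print(f"Gain in non-N bases: {long_read - reference:,}")
```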

SLIDE 17

Chromosome-scale assemblies

Comparison of the existing reference with long-read assembly

High variability in the centromeric regions => sequences originating from these regions are always difficult to order and orient correctly

https://dnanexus.github.io/dot/

SLIDE 18

Chromosome-scale assemblies

Comparison of the existing reference with long-read assembly

Significant differences in chromosome length, mainly in centromeres:

  • 34 Mb vs 48 Mb for chromosome 9
  • 35 Mb vs 43 Mb for chromosome 6

https://dnanexus.github.io/dot/

SLIDE 19

Continuity of current plant genome assemblies

Using Nanopore+Bionano we were able to add four more species with contig N50 > 5Mb

Figure labels: Pahang, M. schizocarpa

http://www.genoscope.cns.fr/genomes

SLIDE 20

Sequencing of the banana genome using the PromethION

Musa schizocarpa             PromethION  MinION
# flowcells                  1           18
Cumul. size (reads)          17.6 Gb     27 Gb
N50 (reads)                  26 kb       24 kb
Coverage                     34X         51X
# scaffolds                  199         227
Cumulative size (scaffolds)  519.5 Mb    525.6 Mb
N50 (scaffolds)              36.8 Mb     36.9 Mb
Contig N50                   10.0 Mb     6.5 Mb
Sequencing costs             ~ $6,000    ~ $16,000
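
The two runs can be compared on cost per gigabase and yield per flowcell. The sketch below only rearranges the figures from this slide; the costs are approximate in the source, and the `cost_per_gb` helper and dictionary keys are illustrative names.

```python
# Figures from the slide; costs are approximate ("~") in the source.
runs = {
    "PromethION": {"cost_usd": 6_000, "yield_gb": 17.6, "flowcells": 1},
    "MinION": {"cost_usd": 16_000, "yield_gb": 27.0, "flowcells": 18},
}


def cost_per_gb(cost_usd, yield_gb):
    """Sequencing cost divided by the cumulative read size in Gb."""
    return cost_usd / yield_gb


for platform, run in runs.items():
    per_gb = cost_per_gb(run["cost_usd"], run["yield_gb"])
    per_flowcell = run["yield_gb"] / run["flowcells"]
    print(f"{platform}: ~${per_gb:.0f} per Gb, {per_flowcell:.1f} Gb per flowcell")
```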

SLIDE 21

Conclusion

  • Musa schizocarpa assembly and annotation are publicly available
      Genoscope website: www.genoscope.cns.fr/plants
      NCBI database: PRJEB26661
  • Musa acuminata (Pahang) reference genome based on long reads and optical maps should be publicly available in the coming months; gene prediction is in progress
  • PromethION throughput allows sequencing of large genomes
  • Nanopore error rate is acceptable for de novo sequencing projects, although homopolymers are still an issue
  • DNA extraction (quantity and quality) is key to obtaining “ultra-long” reads and generating optical maps

SLIDE 22

Acknowledgments

  • Genoscope labs
  • Bioinformatics: Benjamin Istace, Stefan Engelen, Caroline Belser and Marion Dubarry
  • Sequencing lab: Corinne Cruaud, Erwan Denis and Arnaud Lemainque
  • Angélique D’Hont, Guillaume Martin, Franc-Christophe Baurens, Jaroslav Dolezel and Eva Hribova
  • Funding agencies: CEA, Genoscope and France Génomique

R&DBioSeq Team

www.genoscope.cns.fr/rdbioseq
jmaury@genoscope.cns.fr @J_M_Aury
