


GigaScience, 9, 2020, 1–12
doi: 10.1093/gigascience/giz160
Data Note

The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies

Weiwen Wang 1,*,†, Ashutosh Das 1,2,†, David Kainer 1, Miriam Schalamun 1,3, Alejandro Morales-Suarez 4, Benjamin Schwessinger 1 and Robert Lanfear 1,*

1 Research School of Biology, the Australian National University, 134 Linnaeus Way, Acton, Canberra, ACT, 2601, Australia;
2 Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chittagong Veterinary and Animal Sciences University, Khulshi, Chattogram, 4225, Bangladesh;
3 Institute of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences, Muthgasse 18, Vienna, 1190 Wien, Austria; and
4 Department of Biological Sciences, Macquarie University, Building 6SR (E8B), 6 Science Rd, Sydney, NSW, 2109, Australia

* Correspondence address. Weiwen Wang, Research School of Biology, the Australian National University, 134 Linnaeus Way, Acton, Canberra, ACT, 2601, Australia. E-mail: wei.wang@anu.edu.au, http://orcid.org/0000-0001-9319-450X; Robert Lanfear, Research School of Biology, the Australian National University, 134 Linnaeus Way, Acton, Canberra, ACT, 2601, Australia. E-mail: rob.lanfear@anu.edu.au, http://orcid.org/0000-0002-1140-2596
† Equal contribution.

Abstract

Background: Eucalyptus pauciflora (the snow gum) is a long-lived tree with high economic and ecological importance. Currently, little genomic information for E. pauciflora is available. Here, we sequentially assemble the genome of E. pauciflora with different methods, and combine multiple existing and novel approaches to help select the best genome assembly.
Findings: We generated high coverage of long- (Nanopore, 174×) and short- (Illumina, 228×) read data from a single E. pauciflora individual and compared assemblies from 5 assemblers (Canu, SMARTdenovo, Flye, Marvel, and MaSuRCA) with different read lengths (1 and 35 kb minimum read length). A key component of our approach is to keep a randomly selected collection of ∼10% of both long and short reads separated from the assemblies to use as a validation set for assessing assemblies. Using this validation set along with a range of existing tools, we compared the assemblies in 8 ways: contig N50, BUSCO scores, LAI (long terminal repeat assembly index) scores, assembly ploidy, base-level error rate, CGAL (computing genome assembly likelihoods) scores, structural variation, and genome sequence similarity. Our result showed that MaSuRCA generated the best assembly, which is 594.87 Mb in size, with a contig N50 of 3.23 Mb, and an estimated error rate of ∼0.006 errors per base.

Conclusions: We report a draft genome of E. pauciflora, which will be a valuable resource for further genomic studies of eucalypts. The approaches for assessing and comparing genomes should help in assessing and choosing among many potential genome assemblies from a single dataset.

Keywords: long-read assembly; nanopore sequencing; hybrid assembly; genome assessment; assembly comparison; Eucalyptus pauciflora; haplotig separation; genome polishing

Received: 25 October 2019; Revised: 19 November 2019; Accepted: 2 December 2019

© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
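Two ideas from the Findings can be illustrated with a short sketch: holding out a random ∼10% of reads as a validation set, and computing contig N50. The abstract does not say how either step was implemented, so the function names, the fixed seed, and the selection of reads by ID are assumptions made here for illustration only; this is not the authors' pipeline.

```python
import random


def split_reads(read_ids, validation_fraction=0.10, seed=0):
    """Randomly hold out a fraction of reads as a validation set.

    Assumption: each ID stands for one long read or one short-read pair,
    so that both mates of a pair always end up on the same side.
    Returns (assembly_ids, validation_ids).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(read_ids)
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def contig_n50(contig_lengths):
    """Return the contig N50.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    assembly_ids, validation_ids = split_reads(range(1000))
    print(len(assembly_ids), len(validation_ids))  # 900 100
    print(contig_n50([80, 70, 50, 40, 30, 20]))    # 70
```

Keeping the validation reads out of every assembly means that measures computed from them, such as base-level error rate, are not biased towards an assembler that has already seen those reads.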
