Improving genome assemblies, assessing structural variation and trait association using chromosome genomics and Illumina skim genotyping by sequencing
David Edwards
University of Queensland, Australia
Dave.Edwards@uq.edu.au
Outline
• Chromosome sequencing
• SNP discovery
• Genotyping by sequencing (skim method)
• Validating genome structure
The challenge of genome sequence
Technology - Next Generation sequencing
Thanks to Roger Hellens, Plant and Food New Zealand
Hexaploid wheat genome
17 billion bases
http://www.jic.ac.uk/staff/graham-moore/wheat_meiosis.htm
Chromosome sequencing
• Isolate individual or groups of chromosomes using flow cytometry
• Generate NGS libraries and PE Illumina data
• Assemble or map reads to reference genome
Mapping reads to reference genomes
[Figure: panels labelled 1 to 12]
Sequencing wheat chromosome arms
[Figure: Ta 7DS shown against Bd 1 and Bd 3]
www.wheatgenome.info
Berkman et al., Plant Biotechnology Journal (2011)
7BS/4AL translocation
7DS and 7BL sequence similarity with Brachypodium
7BS/4AL translocation
• Translocation between Bradi1g49500 and Bradi1g49550
• Intervening 4 genes missing from all assemblies
• ~13% genes moved from 7BS to 4AL
• 13 genes moved from 4AL to 7BS
Berkman et al. (2012) Theoretical and Applied Genetics 3, 423-432
Wheat genome evolution
[Figure: timeline with events labelled 50,000 years ago and 10,000 years ago; genome labels AA, AW, BB, DD, AABB, AABBDD; chromosomes 7A, 7B, 7D]
GBrowse
http://wheatgenome.info/
Lai et al. (2012) Plant and Cell Physiology 53, 1-7
Genome sequencing in chickpea
Two draft genomes published in 2013
Chickpea reference (Kabuli)
[Figure: panels K8, D8, K3, D3, K5, D5; K = Kabuli, D = Desi]
Chickpea reference (Desi)
[Figure: panel A; labels 8, 3, 5]
Chromosome sequencing
• Sequencing isolated chromosomes identifies misassemblies and rearrangements at base pair resolution
SGSautoSNP
• Generate a reference
• Map variety-specific reads to the reference
• Call differences between the varieties
• At least two reads defining the difference
• No conflict within a variety (homozygous genomes)
>95% accuracy for canola
>93% accuracy for wheat
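The calling rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SGSautoSNP code: the function name `call_snp`, the input layout (bases observed per variety at one reference position) and the reading of "at least two reads" as "each allele supported by at least two reads" are all assumptions.

```python
from collections import Counter


def call_snp(bases_by_variety, min_reads=2):
    """Return {variety: allele} if the position looks like a SNP, else None.

    bases_by_variety maps a (hypothetical) variety name to the bases its
    mapped reads show at one reference position, e.g. {"v1": "AAA"}.
    """
    alleles = {}
    for variety, bases in bases_by_variety.items():
        observed = set(bases)
        if len(observed) > 1:
            # Conflict within a variety: reject (genomes assumed homozygous).
            return None
        if observed:
            alleles[variety] = bases[0]

    # Count the reads supporting each allele across all varieties.
    support = Counter()
    for variety, allele in alleles.items():
        support[allele] += len(bases_by_variety[variety])

    # Assumption: a SNP has exactly two alleles, each backed by >= min_reads.
    if len(support) != 2 or min(support.values()) < min_reads:
        return None
    return alleles


print(call_snp({"v1": "AAA", "v2": "GG"}))
```

A position where any single variety shows two different bases is discarded outright, which is what makes the approach suited to homozygous lines.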
Brassica SNP matrix
A 0
Bn 55,716 0
E 57,492 67,676 0
I 27,487 33,874 26,406 0
J 100,933 108,457 86,807 52,377 0
M1 52,541 61,657 43,746 20,655 93,148 0
M51 53,627 69,495 54,071 30,968 93,966 56,190 0
M52 64,088 68,533 63,092 34,656 51,013 63,219 60,793 0
M91 70,214 80,230 57,023 38,612 89,294 67,496 60,932 58,091 0
M2 34,535 38,248 27,954 18,731 41,866 34,073 29,306 27,318 11,944 0
Mu 106,182 121,584 87,536 46,824 192,343 72,205 114,260 130,317 131,155 66,838 0
N 159,608 208,373 146,700 73,345 270,623 139,082 178,653 205,985 215,689 113,928 258,980 0
No 81,073 97,160 86,610 39,263 164,813 81,265 93,250 98,393 97,109 46,546 174,630 252,923 0
S 40,857 42,661 53,786 28,431 92,840 51,584 55,260 60,118 64,493 31,424 101,900 160,234 81,474 0
Sr 65,657 85,317 63,305 38,484 113,199 68,078 3,798 73,578 73,825 35,584 137,597 215,422 115,212 68,231 0
T 124,971 149,974 100,000 51,304 212,272 61,611 132,415 153,887 153,504 82,307 175,304 296,891 213,237 119,697 157,308 0
Tf 57,190 76,556 78,239 39,240 140,978 68,383 59,394 78,257 90,655 41,702 157,441 262,784 125,298 65,430 74,385 194,683 0
Tr 11,193 14,028 12,553 6,760 21,972 12,045 6,624 13,849 16,149 7,794 25,791 39,920 20,127 12,249 8,314 30,468 12,331 0
(columns, left to right) A Bn E I J M1 M51 M52 M91 M2 Mu N No S Sr T Tf Tr
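A pairwise matrix of this kind could be tallied as sketched below, assuming each cell counts the SNP positions at which two varieties carry different alleles. The function name `pairwise_snp_counts` and the per-SNP dictionary layout are hypothetical, and the slide does not say how its counts were produced.

```python
from itertools import combinations


def pairwise_snp_counts(snps, varieties):
    """Count, for each pair of varieties, the SNPs where their alleles differ.

    snps is a list of {variety: allele} dicts, one per SNP position.
    A variety missing from a dict has no call at that position.
    """
    counts = {pair: 0 for pair in combinations(varieties, 2)}
    for alleles in snps:
        for a, b in counts:
            # Only positions called in both varieties can contribute.
            if a in alleles and b in alleles and alleles[a] != alleles[b]:
                counts[(a, b)] += 1
    return counts


snps = [
    {"A": "T", "Bn": "C", "E": "T"},
    {"A": "G", "Bn": "G", "E": "A"},
    {"A": "C", "E": "T"},
]
print(pairwise_snp_counts(snps, ["A", "Bn", "E"]))
```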
Skim GBS
• Determine SNPs by sequencing parents and running SGSautoSNP
• Low-coverage skim sequencing of the segregating population
• Map reads to the reference genome
• Call genotype where reads cover previously defined SNP
• Impute and clean to define haplotype blocks
Genotype calling
Call genotype of previously predicted SNPs
[Figure: example calls A, A, T/C, C/A]
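Calling a genotype at a previously predicted SNP can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `call_genotype`, the "A"/"B"/"-" codes for parent A, parent B and no call, and the choice to ignore bases matching neither parent are not taken from the slides.

```python
def call_genotype(read_bases, parent_a_allele, parent_b_allele):
    """Assign one individual's SNP to parent "A", parent "B", or "-" (no call).

    read_bases holds the bases that the individual's skim reads show at a
    SNP already defined between the two parents.
    """
    # Assumption: bases matching neither parental allele are treated as
    # sequencing errors and ignored.
    informative = {
        base for base in read_bases
        if base in (parent_a_allele, parent_b_allele)
    }
    if informative == {parent_a_allele}:
        return "A"
    if informative == {parent_b_allele}:
        return "B"
    # No coverage, or reads supporting both parents: leave uncalled.
    return "-"


print(call_genotype("T", "T", "C"))
```

At skim coverage most SNPs in any one individual have no reads at all, so most calls are "-" at this stage.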
Pre-imputation
After imputation and cleaning
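One simple way to impute and clean a row of skim genotype calls is sketched below. It is not the pipeline used for these slides. The function name `impute_and_clean` and both rules are assumptions: a call that disagrees with matching called neighbours on either side is dropped, then gaps flanked by the same parent are filled.

```python
def impute_and_clean(calls, missing="-"):
    """Clean isolated conflicting calls, then fill gaps inside haplotype blocks.

    calls is a string of single-character calls along one chromosome for one
    individual, e.g. "A--A-B-A" ("A"/"B" = parent, "-" = no call).
    """
    calls = list(calls)

    # Cleaning: a call whose nearest called neighbours agree with each other
    # but not with it is treated as an error and set to missing.
    called = [i for i, c in enumerate(calls) if c != missing]
    for k in range(1, len(called) - 1):
        prev_call = calls[called[k - 1]]
        this_call = calls[called[k]]
        next_call = calls[called[k + 1]]
        if prev_call == next_call and this_call != prev_call:
            calls[called[k]] = missing

    # Imputation: fill a run of missing calls when the calls flanking it agree.
    last = None  # index of the most recent non-missing call
    for i, c in enumerate(calls):
        if c == missing:
            continue
        if last is not None and calls[last] == c:
            for j in range(last + 1, i):
                calls[j] = c
        last = i
    return "".join(calls)


print(impute_and_clean("A--A-B-A--A-B--B"))
```

Gaps between calls from different parents stay missing, because the recombination breakpoint could lie anywhere inside them.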
Misplaced contigs in assembly?