Introduction Pilot Study Implementation Markers New Tools Summary
Genotyping in Thousands by sequencing (GT-seq): A low cost, high-throughput, targeted SNP genotyping method
Nathan Campbell, Stephanie Harmon, Shawn Narum
Columbia River Inter-Tribal Fish Commission

What is GT-seq?
• Next Gen Sequencing of multiplex PCR amplicons containing SNPs
  • Genotyping in Thousands by sequencing (GT-seq)
  • A method of Genotyping by Sequencing for thousands of individuals
  • Hundreds of loci (specific panels of target loci)
  • Currently set up for Illumina sequencing
  • SNP loci are genotyped using the ratio of allele 1 to allele 2 read counts at each locus (similar to RAD genotyping)
• Alternative to TaqMan assays
  • Our lab formerly ran panels of 96–192 TaqMan assays for genotyping various fish species
  • GT-seq can produce the same genotypes generated using TaqMan assays
  • GT-seq greatly reduces the cost of lab reagents/supplies for genotyping

2,068 samples in one lane
• 22 plates of 94 steelhead samples
• 157M raw reads
• 5.4–8.6M reads per plate
• 96.1% of the samples (1,987) genotyped at ≥ 90% of target loci
[Figure: Percentage of Genotypes Collected vs. On-Target Fraction per 96-well plate]
[Figure: Percentage of Genotypes Collected vs. Individual Raw Reads]
[Figure: Percentage of Genotypes Collected vs. Individual On-Target Reads]
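
The pass-rate figure on this slide can be reproduced with a short calculation. This is a minimal sketch, assuming a hypothetical list of per-sample genotyped-locus counts and the 192-locus steelhead panel; the function name and the toy data are illustrative and are not part of the GT-seq pipeline.

```python
def samples_passing(genotyped_loci, n_loci=192, min_rate=0.90):
    """Count samples genotyped at >= min_rate of the target loci.

    Returns (number passing, percentage passing).
    """
    passing = sum(1 for g in genotyped_loci if g / n_loci >= min_rate)
    return passing, 100.0 * passing / len(genotyped_loci)


# Toy data (hypothetical): number of loci genotyped for each of five samples
toy = [192, 180, 173, 172, 100]
print(samples_passing(toy))

# Figures reported on the slide: 22 plates of 94 samples, 1,987 passing
total_samples = 22 * 94
reported_pct = round(100.0 * 1987 / total_samples, 1)
print(total_samples, reported_pct)
```

The last two lines only check the slide's own arithmetic: 22 plates of 94 samples give the 2,068 total, and 1,987 of those is the reported 96.1%.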

Read distribution among loci
[Figure: Percentage of On-Target reads and % Genotypes Collected for each of the 192 target loci]

GT-seq Plots
GT-seq has very low background signal and excellent heterozygote ratios across loci.
The graphs below plot allele 1 vs. allele 2 read counts for all GT-seq loci. The orange line shows the 1:1 ratio of allele 1 to allele 2 expected for heterozygotes (y = x). The other lines show the cutoff values used in the genotyping script:
• Below red = A1 homozygote
• Left of blue = A2 homozygote
• Between blue or red and black = NA
• Between the black lines = heterozygote
• Any data point below a read depth of 10 = NA
Genotypes are color coded; yellow triangles are "No Calls".
*Both graphs are the same plot zoomed to different scales.
[Figure: GT-seq Genotyping, Allele 2 Counts vs. Allele 1 Counts, 0–800 reads]
[Figure: GT-seq Genotyping (zoom), Allele 2 Counts vs. Allele 1 Counts, 0–50 reads]
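
The cutoff regions described above can be sketched as a ratio test on the two allele counts. This is an illustration only: the slide gives the minimum read depth of 10 but not the slopes of the red, blue, and black lines, so the ratio thresholds below (`hom_ratio`, `het_low`, `het_high`) are assumed values, and `call_genotype` is a hypothetical name rather than the genotyping script itself.

```python
def call_genotype(a1, a2, min_depth=10, hom_ratio=10.0, het_low=0.2, het_high=5.0):
    """Call a genotype from allele 1 and allele 2 read counts.

    min_depth comes from the slide; the ratio thresholds are assumptions.
    """
    if a1 + a2 < min_depth:
        return "NA"          # too few reads to call
    if a2 == 0:
        return "A1/A1"       # avoid division by zero; only allele 1 observed
    ratio = a1 / a2
    if ratio >= hom_ratio:
        return "A1/A1"       # region below the red line
    if ratio <= 1.0 / hom_ratio:
        return "A2/A2"       # region left of the blue line
    if het_low <= ratio <= het_high:
        return "A1/A2"       # region between the black lines
    return "NA"              # ambiguous zone between cutoffs


for counts in [(300, 2), (3, 250), (120, 110), (4, 3), (70, 10)]:
    print(counts, call_genotype(*counts))
```

Keeping an explicit "NA" zone between the homozygote and heterozygote regions trades a few missing calls for fewer miscalls.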

GT-seq Plots
Comparison to the same TaqMan assays…
[Figure: GT-seq Genotyping, Allele 2 Counts vs. Allele 1 Counts]
[Figure: TaqMan Genotyping, Allele 2 (6FAM) fluorescence vs. Allele 1 (VIC) fluorescence]

Genotype Accuracy Compared to TaqMan
A. Unmodified TaqMan probe sequences as search strings
B. 15 search string modifications based on observed variations in sequence data

                              A.        B.
% Concordant:                 99.3%     99.9%
# Discordant:                 133       10
# Concordant:                 18474     18813
# GT-seq Genotypes:           18768     18988
# TaqMan Genotypes:           19039     19039
# Genotyped by both methods:  18607     18824
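
The concordance percentages follow from the counts in the table, taking concordant genotypes as a share of those genotyped by both methods. A minimal sketch; the function and variable names are illustrative.

```python
def concordance(concordant, genotyped_by_both):
    """Percent of genotypes called by both methods that agree."""
    return 100.0 * concordant / genotyped_by_both


# (concordant, genotyped by both methods) for each search-string set
runs = {"A": (18474, 18607), "B": (18813, 18824)}
for label, (n_concordant, n_both) in runs.items():
    print(f"{label}: {concordance(n_concordant, n_both):.1f}% concordant")
```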

Genotyping costs for GT-seq
• GT-seq: $3.98/sample
• TaqMan: $16.50/sample
[Figure: Sequencing cost per sample for GT-seq (one HiSeq SR100 lane; left axis, $0.00–$10.00) and total supplies cost of genotyping for GT-seq and 5' exonuclease TaqMan (right axis, $0–$160,000) vs. Individual Samples (0–10,000)]
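
A rough comparison of the two per-sample figures, as a sketch. It assumes both rates are flat per sample, which is a simplification: the plot shows the GT-seq sequencing cost per sample changing with the number of samples in a lane. The 2,068-sample batch size is taken from the pilot study; the variable names are illustrative.

```python
GTSEQ_PER_SAMPLE = 3.98     # USD per sample, from the slide
TAQMAN_PER_SAMPLE = 16.50   # USD per sample, from the slide

# Per-sample cost of GT-seq relative to TaqMan
cost_ratio = GTSEQ_PER_SAMPLE / TAQMAN_PER_SAMPLE
print(f"GT-seq costs {cost_ratio:.1%} of TaqMan per sample")

# Assumed flat-rate totals for a batch the size of the pilot study
n_samples = 2068
gtseq_total = GTSEQ_PER_SAMPLE * n_samples
taqman_total = TAQMAN_PER_SAMPLE * n_samples
print(f"GT-seq: ${gtseq_total:,.2f}  TaqMan: ${taqman_total:,.2f}")
```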

Benefits of GT-seq
• Open source method
  • Not a kit; not proprietary
• Fast/simple library preparation
  • Requires only a 96-well thermal cycler
• Simple genotyping pipeline
  • New faster scripts (raw data to genotypes in less than 1 hour)
• Clean data
  • Low background, high accuracy, high throughput
  • Generates the same genotypes as TaqMan (99.9% concordant)
• Less than 1/4 the cost ($3.98/sample)

Applications for GT-seq
• Stock Improvement (marker assisted selection)
  • Create panel of trait related SNPs
  • Quickly generate GEBVs for large numbers of potential broodstock
• Genetic Monitoring
  • Use large numbers of neutral SNP loci to monitor abundance and dispersal of various stocks within a target species
• Large-Scale Parentage
  • Assign juvenile samples to potential parents in genotype database

GT-seq Panels
• O. mykiss (Steelhead and Rainbow trout): 192 loci
  • Recently expanded to 287 loci
  • Campbell et al. 2015
• O. tshawytscha (Chinook Salmon): 299 loci
  • Includes SNPs from TaqMan assays and RAD markers
• O. kisutch (Coho Salmon): 258 loci
  • Includes SNPs from TaqMan assays and RAD markers
• O. nerka (Sockeye Salmon): 93 loci
  • All SNPs were converted from previously developed TaqMan assays
• E. tridentatus (Pacific Lamprey): 316 loci
  • All SNP targets are from RAD markers

SAMPLES GENOTYPED BY GT-SEQ
Species           Samples
O. tshawytscha    73725
O. mykiss         15969
O. nerka          6996
E. tridentatus    6037
O. kisutch        2839
105,566 total samples genotyped at end of 2015

Lessons Learned
• GT-seq is somewhat sensitive to DNA concentration and quality
  • Low concentration DNA samples (<5 ng/µL) genotype poorly
  • Dirty DNA extracts okay
  • Higher concentration is more important than purity
[Figure: Genotyping Percentage vs. DNA concentration (ng/µL, log scale), Qiagen and Chelex extractions]

GT-seq target loci
• Few limitations for target SNPs
  • Avoid repetitive sequence (messy amplification)
  • Avoid duplicated loci
  • Enough flanking sequence for primer design
• Diploid organisms
  • Tetraploid genotyping is possible but hasn't been explored
• Most SNPs are viable targets

GT-seq targets from RAD loci
• Advantages
  • Thousands of SNPs to choose from
  • Summary statistics available (Fst, MAF, etc.)
  • Samples with known genotypes
• Caveats
  • SNP site must be at position 25 or beyond in RAD sequence
  • Must have enough flanking sequence to design primers surrounding SNP
• Strategy
  • Gather R1 and R2 sequences from specified RAD loci and create scaffolds
  • Mask any base ambiguities and design primers flanking target SNP
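
The caveats and the masking step above can be sketched as two small helpers. This is a hedged illustration, not the lab's scripts: the function names are hypothetical, positions are assumed to be 1-based within the scaffold, and `min_flank=25` is an assumed value, since the slide only says "enough" flanking sequence. Only the position-25 rule comes from the slide.

```python
# IUPAC codes that stand for more than one possible base
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def mask_ambiguities(seq):
    """Replace every ambiguous base with N before primer design."""
    return "".join("N" if base.upper() in IUPAC_AMBIGUOUS else base for base in seq)


def is_candidate(snp_pos, scaffold_len, min_pos=25, min_flank=25):
    """Check a SNP against the position and flanking-sequence caveats.

    snp_pos is 1-based. min_pos follows the slide; min_flank is an assumption.
    """
    downstream = scaffold_len - snp_pos   # bases available after the SNP
    return snp_pos >= min_pos and downstream >= min_flank


print(mask_ambiguities("ACGTRYACN"))
print(is_candidate(24, 200), is_candidate(25, 200), is_candidate(190, 200))
```

In practice the target SNP itself would need to be marked separately so that primers are designed around it rather than masked along with the other ambiguities.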

Designing GT-seq primers for RAD loci
• Sample RAD specific sequences from raw fastq data (100)
  • https://github.com/GTseq
• Collect coordinates from R1 sequences and gather corresponding R2 sequences for scaffolding
• Export masked consensus sequences for primer design (Primer3)

New GT-seq tools
• Faster barcode splitting script
  • Python script using multiple processors
  • Fewer compute resources for faster barcode splitting (20 min)
• Faster/expanded genotyping script
  • Perl script (100x faster than original; 6 min)
  • Generates summary statistics for each individual sample
  • Allows for allele corrections
• GTseq_SummaryFigures
  • Python script using the Matplotlib module to generate summary figures for any GT-seq library
  • Outputs scatter plots for each SNP locus