library building bioinformatic pipeline day 1
play

Library Building & Bioinformatic Pipeline DAY 1 Building a - PDF document

3/29/2017 Library Building & Bioinformatic Pipeline DAY 1 Building a Library Step one, build a plate with your samples to the proper concentrations Excel Spreadsheet logs the library FL_148_NC15, total DNA 200ng, in 26 uL 1 2


  1. 3/29/2017 Library Building & Bioinformatic Pipeline DAY 1 Building a Library • Step one, build a plate with your samples to the proper concentrations • Excel Spreadsheet logs the library FL_148_NC15, total DNA 200ng, in 26 uL 1 2 3 4 5 6 7 8 9 10 11 12 A B C D E F G H 1

  2. 3/29/2017 Digest equal amounts Extract high quality of DNA from each genomic DNA from Ligate unique barcode individual (ideally individuals (Qiagen Kits), and general Y-adapter >200ng) with spec DNA to each sample. SphI/MluCI on 96-well concentrations plate Purify Pool all individual samples into one tube, Then PCR 8x (PCR BioAnalyze size-select on Pippin indices in at this step) Prep for 300- 500bp Purify Burford Reiskind et al. 2016 Mol. Ecol. Resources Digest equal amounts Extract high quality of DNA from each genomic DNA from Ligate unique barcode individual (ideally individuals (Qiagen Kits), and general Y-adapter >200ng) with spec DNA to each sample. SphI/MluCI on 96-well concentrations plate Purify Pool all individual samples into one tube, Then PCR 8x (PCR BioAnalyze size-select on Pippin indices in at this step) Prep for 300- 500bp Purify Burford Reiskind et al. 2016 Mol. Ecol. Resources 2

  3. 3/29/2017 • 96 well plate, each well holds 200ng of 1 individual’s DNA • First digest for 3 hours – SphI & MlucI, cutter buffer, template • Ligation attach P1 and P2 adapters to the sticky ends, P1 has the barcode (48 in total), and illumina platform binding sequemce well A1: FL_148_NC15; barcode ATGGT 1 2 3 4 5 6 7 8 9 10 11 12 A B C D DIGESTION LIGATION E F G H 3

  4. 3/29/2017 Digest equal amounts Extract high quality of DNA from each genomic DNA from Ligate unique barcode individual (ideally individuals (Qiagen Kits), and general Y-adapter >200ng) with spec DNA to each sample. SphI/MluCI on 96-well concentrations plate Purify Pool all individual samples into one tube, Then PCR 8x (PCR BioAnalyze size-select on Pippin indices in at this step) Prep for 300- 500bp Purify 1 2 3 4 5 6 7 8 9 10 11 12 A B C D E F G H • Pool 48 individuals into one tube • Size select for the size range of the platform, hiseq target template of 200bps, miseq target template of 400 bps. 4

  5. 3/29/2017 Digest equal amounts Extract high quality of DNA from each genomic DNA from Ligate unique barcode individual (ideally individuals (Qiagen Kits), and general Y-adapter >200ng) with spec DNA to each sample. SphI/MluCI on 96-well concentrations plate Purify Pool all individual samples into one tube, Then PCR 8x (PCR BioAnalyze size-select on Pippin indices in at this step) Prep for 300- 500bp Purify Illumina Hi-seq • https://www.youtube.com/watch?v=9YxExTS wgPM&t=30s 5

  6. 3/29/2017 Bioinformatic Pipeline QC, demultiplex from Illumina & filter in STACKS ( process_radtags ) Align to reference genome with Bowtie2 or DeNovo if there is no genome Run denovo or ref pipeline Bining reads and SNP detection with STACKS Population Measures Detect outlier loci (GenePop, Structure, (Bayescan, Lositan, GeneLand) DAPC) What happens next?!? First quality check (QC) and processing sequences before STACKS run • If more than 1 library is pooled? – Illumina platform will de-multiplex the libraries based on the index that is in the reverse PCR primer • Run a fastqc analysis to check phred scores on sequences in general (helps determine how the library ran, and whether you need to trim sequences – phred scores = 30 and above) • Need fastq , fasta, or BAM files to run through STACKS • Use Unix terminal commands to interface with STACKS and mysql to create STAOKS output databases • Step 1: process radtags • Step 2: rename folders barcode to individual’s name • Step 3: STACKS pipeline (either denovo or with a reference genome ) • Step 4: Population pipeline 6

  7. 3/29/2017 Quality Control • FastqX Toolkit - http://hannonlab.cshl.edu /fastx_toolkit/ • Can determine if there are problem areas within the sequence • Toolkit gives you different scripts to create – The fastqc file below, and to trim your sequences, etc. etc. process_radtag • All the sequences for all 48 individuals of Library 4! FL_LIB4_S1.fastq.qz • Need to de-multiplex them into individual folders based on their individual barcode in you P1 adapter • STACKS website:- f, -o, -q, -r, -b, sample_AGTCT.fq.qz --inline_null, --renz_1, -i, -c, -t sample_AGTGT.fq.qz • Result in output folders like this : …to the last barcode used 7

  8. 3/29/2017 Renaming your files • From barcode to the associated individual name sample_AGTCT.fq.qz F148_NC_15.fq.gz sample_AGTGT.fq.qz F34_Tx_15.fq.gz …to the last barcode used …to the last individual • At this step you can combine several libraries STACKS pipeline • IF you have a reference genome, align the reads post process_radtag to the reference genome using Bowtie2 or another program • This generates BAM files that are used in the ref_map.pl • Step 1: Create a mysql data base: • Step 2: Create an output folder for STACKS • Step 3: Run the denovo_map.pl 8

  9. 3/29/2017 STACKS output STACKS F148_NC_15.alleles.tsv F148_NC_15.tags.tsv F148_NC_15.matches.tsv F148_NC_15.snps.tsv • Step 1: open the stacks database to confirm that you created loci and have SNP data. • Step 2: setup the population pipeline Population pipeline • For our purposes, this is another filtering step, but it can also be used to gain population summaries • Create a population map text file • Run the population pipeline using unix command line: • Parameters & output files 9

  10. 3/29/2017 Library Specs Recruitment season of 2014 • Number of populations = (5 NC 2 TX) • Number of loci = ? • Number of loci for the whitelist =? • Number post population pipeline = ? Recruitment season of 2015 • Number of populations (3 NC 1 TX) with N = 136 individuals • Number of loci with 1 to 2 SNPs = 104,252 • Number of loci in the whitelist (20 to 140 individuals stack depth of 10) = 51,090 • Number post-population pipeline = 12,294 loci (85% of individuals within a population share the loci) – PLINK, genepop and structure files created 10

Recommend


More recommend