UHT Sequencing Course: Large-scale genotyping (Christian Iseli)

  1. UHT Sequencing Course: Large-scale genotyping. Christian Iseli, January 2009

  2. Overview: Introduction; Examples; Base calling method and parameters; Read filtering; Read classification; Detailed alignment; Alignment analysis; Output generation. Jun 04

  3. Introduction. Basic problem: distinguish polymorphisms from sequencing errors. Use quality "measures"; use redundancy; use knowledge about the data source.
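To make the redundancy argument concrete, here is a minimal sketch (not the course's own method) that converts a Phred quality into an error probability and estimates how likely it is that independent sequencing errors alone produce at least k copies of one specific non-reference base at a position covered by n reads. It assumes independent errors spread evenly over the three other bases; real error profiles are not that uniform. Function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q into the probability that the call is wrong."""
    return 10 ** (-q / 10)


def prob_errors_mimic_allele(depth: int, alt_count: int, error_rate: float) -> float:
    """Probability that at least `alt_count` of `depth` reads show one specific
    non-reference base through sequencing errors alone.

    Assumes independent errors spread evenly over the three other bases.
    """
    p = error_rate / 3  # chance that one read shows this particular wrong base
    return sum(
        math.comb(depth, i) * p**i * (1 - p) ** (depth - i)
        for i in range(alt_count, depth + 1)
    )


# Q20 means a 1% error rate; 8 of 20 reads showing the same alternate base
# is then very unlikely to be explained by errors alone.
err = phred_to_error(20)
print(err, prob_errors_mimic_allele(20, 8, err))
```

The more reads agree on the same alternate base, the faster this probability collapses, which is why redundancy complements per-base quality values.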

  4. Examples: retinitis pigmentosa; hypertrophic cardiomyopathy; HSA 21q genotyping

  5. Retinitis pigmentosa: inherited eye disease; linkage analysis; PRPF31 mutation; incomplete penetrance; attempt sequencing

  6. PRPF31 example: c.1374+654C>G [figure; labels 13 and 14]
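In HGVS coding-DNA notation, `c.1374+654C>G` denotes a C-to-G substitution 654 nucleotides into the intron that follows coding position c.1374, the last base of the preceding exon. The hypothetical helper below only parses this simple substitution form as an illustration; it is not a full HGVS parser.

```python
import re
from typing import NamedTuple, Optional


class CodingSubstitution(NamedTuple):
    exon_anchor: int    # c. position of the nearest exonic base
    intron_offset: int  # 0 = exonic; +n = n nt into the following intron; -n = n nt before the next exon
    ref: str
    alt: str


_PATTERN = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def parse_c_substitution(text: str) -> Optional[CodingSubstitution]:
    """Parse 'c.N>' or 'c.N+M' / 'c.N-M' single-base substitutions; None otherwise."""
    m = _PATTERN.match(text)
    if m is None:
        return None
    anchor, offset, ref, alt = m.groups()
    return CodingSubstitution(int(anchor), int(offset) if offset else 0, ref, alt)


print(parse_c_substitution("c.1374+654C>G"))
```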

  7. PRPF31 example

  8. PRPF31 example, zoom

  9. PRPF31 example, MFA

  10. Examples: retinitis pigmentosa; hypertrophic cardiomyopathy; HSA 21q genotyping

  11. Hypertrophic cardiomyopathy: small collection of known genes; PCR-amplify gene pieces; sequence

  12. Small deletion

  13. Examples: retinitis pigmentosa; hypertrophic cardiomyopathy; HSA 21q genotyping

  14. "Exome" sequencing: extract selected genomic parts; sequence the collected pieces

  15. Coverage on HSA 21q

  16. Coverage detail, HSA 21q
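Per-base coverage profiles like these can be computed from the start and end of each aligned read. A minimal sketch using a difference array, linear in the number of reads plus the region length; coordinates are assumed 1-based and inclusive, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def coverage(region_start: int, region_end: int,
             alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Per-base depth over [region_start, region_end].

    `alignments` holds the (start, end) of each aligned read, 1-based inclusive;
    reads partly outside the region are clipped, reads fully outside are ignored.
    """
    length = region_end - region_start + 1
    diff = [0] * (length + 1)
    for start, end in alignments:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s > e:
            continue  # read lies entirely outside the region
        diff[s] += 1       # depth rises where the read starts
        diff[e + 1] -= 1   # and falls just after it ends
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


print(coverage(1, 10, [(1, 5), (3, 8), (9, 12), (20, 30)]))
```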

  17. HSA 21q, HapMap NA12782

  18. Base calling: Rolexa; FastQ; ...
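Rolexa writes its calls as FastQ. A minimal reader sketch follows; it assumes unwrapped four-line records and a configurable ASCII offset for the quality characters: 33 for Sanger-style Phred scores, 64 for Illumina 1.3+ output. Which offset Rolexa uses is not stated here, and older Solexa-scaled scores also use offset 64 but are not Phred values, so they would need a separate conversion. The function name is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, quality scores) from unwrapped 4-line FastQ records.

    `offset` is the ASCII offset of the quality characters (33 or 64).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it).strip(), next(it), next(it).strip()
        except StopIteration:
            raise ValueError("truncated FastQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FastQ record: " + header.strip())
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header.strip())
        yield header[1:].strip(), seq, [ord(ch) - offset for ch in qual]
```

A file object iterates over lines, so `for name, seq, quals in read_fastq(fh, offset=64)` works directly on an opened file.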

  19. Read filtering: entropy; quality values; (position)
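One way to read this slide: reject low-complexity reads using the Shannon entropy of their base composition, then reject reads with poor quality, optionally judging quality only over the first positions, since Illumina quality tends to drop toward the 3' end. The thresholds (1.0 bit, mean quality 20) and the `first_n` parameter are hypothetical, not values from the course.

```python
import math
from collections import Counter
from typing import List, Optional


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of a read's base composition: 0 for 'AAAA', 2 for 'ACGT'."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def passes_read_filters(
    seq: str,
    quals: List[int],
    min_entropy: float = 1.0,        # hypothetical threshold
    min_mean_quality: float = 20.0,  # hypothetical threshold
    first_n: Optional[int] = None,   # judge quality on the first n bases only
) -> bool:
    if shannon_entropy(seq) < min_entropy:
        return False  # low-complexity read, e.g. poly-A
    window = quals[:first_n] if first_n is not None else quals
    if not window:
        return False
    return sum(window) / len(window) >= min_mean_quality
```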

  20. Filtering example: Rolexa base calling; filter reads for length and ambiguity
      • ACGTU -> 1
      • KMRSWY -> 2
      • BDHV -> 3
      • N -> 4
      – Minimum length 20
      – Maximum ambiguity 81
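The per-code values on this slide equal the number of nucleotides each IUPAC code can stand for. Assuming a read's ambiguity is the product of these values (the number of unambiguous sequences the read could represent), the two filters can be sketched as follows. The product interpretation is an assumption, not something the slide states, and the function names are hypothetical.

```python
# Per-base values from the slide: how many nucleotides each IUPAC code can stand for.
_AMBIGUITY = {}
for _codes, _value in (("ACGTU", 1), ("KMRSWY", 2), ("BDHV", 3), ("N", 4)):
    for _code in _codes:
        _AMBIGUITY[_code] = _value


def ambiguity(seq: str) -> int:
    """Assumed score: product of per-base values, i.e. the number of
    unambiguous sequences the read could represent."""
    score = 1
    for base in seq.upper():
        score *= _AMBIGUITY[base]  # raises KeyError on non-IUPAC characters
    return score


def keep_read(seq: str, min_length: int = 20, max_ambiguity: int = 81) -> bool:
    """Length and ambiguity filter with the slide's thresholds."""
    return len(seq) >= min_length and ambiguity(seq) <= max_ambiguity
```

Under this reading, a read of 20 bp or more passes with up to three N's (4^3 = 64) or four B/D/H/V codes (3^4 = 81), but not with four N's (4^4 = 256).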

  21. Read classification: use fetchGWI against the whole genome
      – Single exact match -> U (unique)
      – Multiple exact matches -> R (repeat)
      – No exact match -> M (missed)
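fetchGWI is the course's own tool and its interface is not reproduced here. The sketch below swaps in a naive exact substring search on both strands (adequate only for toy genomes) just to illustrate the U/R/M rule; the function names and the choice to count reverse-complement hits are assumptions.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_exact_hits(genome: Dict[str, str], read: str, limit: int = 2) -> int:
    """Count exact occurrences of `read` on both strands, stopping at `limit`
    (two hits are enough to call a read a repeat)."""
    hits = 0
    # Using a set means a reverse-palindromic read is searched only once.
    for probe in {read, reverse_complement(read)}:
        for chrom_seq in genome.values():
            start = chrom_seq.find(probe)
            while start != -1:
                hits += 1
                if hits >= limit:
                    return hits
                start = chrom_seq.find(probe, start + 1)
    return hits


def classify_read(genome: Dict[str, str], read: str) -> str:
    """U = single exact match, R = multiple exact matches, M = no exact match."""
    n = count_exact_hits(genome, read)
    if n == 1:
        return "U"
    return "R" if n > 1 else "M"


genome = {"chrA": "TTTTGATTACAGGGG", "chrB": "CCCCAAAAACCCCAAAAA"}
for read in ("GATTACA", "TGTAATC", "CCCCAAAAA", "GGGGGGGGG"):
    print(read, classify_read(genome, read))
```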

  22. Detailed alignment: use the M reads; split the region of interest into chunks (e.g. 300 bp + 40 bp overlap); find reads sharing an identical 12-mer with a chunk; globally align the reads against the chunks; filter the alignments and retain the "good" set (e.g. at most 3 mismatches)
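The steps of this slide can be sketched as below, under stated assumptions: chunks are `step + overlap` bases long and start every `step` bases; the 12-mer test only asks whether a read shares at least one 12-mer with the chunk; and the "global" alignment aligns the whole read against any stretch of the chunk (free end gaps on the chunk side), counting substitutions and indels together rather than mismatches only. All names are hypothetical.

```python
from typing import List, Set, Tuple


def make_chunks(seq: str, step: int = 300, overlap: int = 40) -> List[Tuple[int, str]]:
    """Split a region into (offset, chunk) pieces of step + overlap bases,
    starting every `step` bases; the last chunk may be shorter."""
    chunks = []
    for start in range(0, len(seq), step):
        chunks.append((start, seq[start:start + step + overlap]))
        if start + step + overlap >= len(seq):
            break  # this chunk already reaches the end of the region
    return chunks


def kmer_set(seq: str, k: int = 12) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_reads(chunk: str, reads: List[str], k: int = 12) -> List[str]:
    """Reads sharing at least one identical k-mer with the chunk."""
    index = kmer_set(chunk, k)
    return [r for r in reads if not kmer_set(r, k).isdisjoint(index)]


def fitted_edit_distance(read: str, chunk: str) -> int:
    """Fewest substitutions + indels needed to align the whole read
    to some stretch of the chunk (end gaps on the chunk are free)."""
    prev = [0] * (len(chunk) + 1)  # row 0: any chunk prefix may be skipped
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(chunk)
        for j in range(1, len(chunk) + 1):
            cur[j] = min(
                prev[j - 1] + (read[i - 1] != chunk[j - 1]),  # match or mismatch
                prev[j] + 1,     # read base against a gap
                cur[j - 1] + 1,  # chunk base against a gap
            )
        prev = cur
    return min(prev)  # any chunk suffix may be skipped


def good_alignments(chunk: str, reads: List[str], max_diff: int = 3) -> List[str]:
    return [r for r in candidate_reads(chunk, reads)
            if fitted_edit_distance(r, chunk) <= max_diff]
```

The 12-mer prefilter keeps the quadratic alignment step limited to reads that plausibly belong to the chunk.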

  23. Alignment analysis: map the retained reads to the full genome; remove those with better matches outside the region of interest
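A sketch of this second filtering pass, assuming each retained read comes with a list of genome-wide hits annotated with their number of differences. A read is dropped when some hit outside the region of interest is strictly better than its best hit inside; ties are kept, which is an assumption since the slide does not say how ties are handled. The hit representation is hypothetical.

```python
from typing import Dict, List, Tuple

# (chromosome, 1-based position, number of differences): a hypothetical hit record
Hit = Tuple[str, int, int]


def retain_reads(
    hits_by_read: Dict[str, List[Hit]], chrom: str, start: int, end: int
) -> List[str]:
    """Keep reads whose best in-region hit is not beaten by a hit elsewhere."""
    def inside(hit: Hit) -> bool:
        return hit[0] == chrom and start <= hit[1] <= end

    kept = []
    for read_id, hits in hits_by_read.items():
        in_diffs = [h[2] for h in hits if inside(h)]
        out_diffs = [h[2] for h in hits if not inside(h)]
        if not in_diffs:
            continue  # no longer maps to the region at all
        if out_diffs and min(out_diffs) < min(in_diffs):
            continue  # strictly better map outside the region of interest
        kept.append(read_id)
    return kept
```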

  24. Practical alignment analysis 1 [figure; labels: 12-mers, U, R, M]

  25. Practical alignment analysis 2 [figure; labels: 12-mers, U, R, M]

  26. Output generation: create a multiple sequence alignment; prepare text output in column format; call SNPs (alleles, coverage, etc.)
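For the SNP-calling step, a minimal sketch that summarizes one column of the multiple alignment: coverage, allele counts, and a diploid genotype built from the alleles above a minimum fraction. The thresholds (coverage 4, allele fraction 0.2) are hypothetical placeholders; the course's actual calling rules are not given on the slide.

```python
from collections import Counter
from typing import Dict, Optional


def call_column(
    ref_base: str,
    column: str,
    min_coverage: int = 4,             # hypothetical threshold
    min_allele_fraction: float = 0.2,  # hypothetical threshold
) -> Dict[str, object]:
    """Summarize one alignment column; gaps ('-') and 'N' are ignored."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    coverage = sum(counts.values())
    genotype: Optional[str] = None
    if coverage >= min_coverage:
        alleles = sorted(
            (b for b, c in counts.items() if c / coverage >= min_allele_fraction),
            key=lambda b: (-counts[b], b),  # most frequent first
        )
        if len(alleles) == 1:
            genotype = alleles[0] * 2  # homozygous
        elif alleles:
            genotype = "".join(sorted(alleles[:2]))  # two most frequent alleles
    return {
        "coverage": coverage,
        "counts": dict(counts),
        "genotype": genotype,
        "snp": genotype is not None and genotype != ref_base.upper() * 2,
    }
```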

  27. Results in CSV files

  28. Detailed view in UCSC

  29. Results in MFA

  30. Script srMap: needs fetch.conf, an input chunk and genomic coordinates; produces MFA and CSV output

  31. Script prepareJobs: needs genomic coordinates; prepares scripts to process each chunk using srMap

  32. Script local2genomic: needs the CSV file produced by srMap; adds genomic coordinates
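The coordinate shift behind a step like local2genomic is simple arithmetic if chunk positions are 1-based and the genomic position of the chunk's first base is known. Since the CSV layout of srMap is not shown here, the column names (`pos`, `chrom`, `genomic_pos`) and function names in this sketch are hypothetical.

```python
from typing import Dict, Iterable, List


def local_to_genomic(local_pos: int, chunk_start: int) -> int:
    """Map a 1-based position within a chunk to a 1-based genomic coordinate,
    given the genomic position of the chunk's first base."""
    return chunk_start + local_pos - 1


def add_genomic_coordinates(
    rows: Iterable[Dict[str, str]], chrom: str, chunk_start: int, pos_field: str = "pos"
) -> List[Dict[str, str]]:
    """Copy each CSV row and append hypothetical 'chrom' and 'genomic_pos' columns."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["chrom"] = chrom
        new_row["genomic_pos"] = str(local_to_genomic(int(row[pos_field]), chunk_start))
        out.append(new_row)
    return out
```

Rows could come from `csv.DictReader(fh)` and be written back with `csv.DictWriter`.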

  33. Script collateCsv: needs the CSV file produced by local2genomic; merges the chunks back together
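Because neighbouring chunks overlap (by 40 bp in the earlier example), the same genomic position can appear in two chunk files. One plausible merge policy, sketched below, keeps the record with the higher coverage at each position; whether collateCsv does this is not stated, and the field names are hypothetical.

```python
from typing import Dict, Iterable, List


def collate(
    chunks: Iterable[List[Dict[str, str]]],
    pos_field: str = "genomic_pos",
    cov_field: str = "coverage",
) -> List[Dict[str, str]]:
    """Merge per-chunk rows into one list sorted by position, keeping the
    higher-coverage row where overlapping chunks report the same position."""
    best: Dict[int, Dict[str, str]] = {}
    for rows in chunks:
        for row in rows:
            pos = int(row[pos_field])
            current = best.get(pos)
            if current is None or int(row[cov_field]) > int(current[cov_field]):
                best[pos] = row
    return [best[p] for p in sorted(best)]
```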

  34. Script matchGenotype: needs a CSV file produced by srMap, local2genomic or collateCsv, and a genotype file, e.g. genotypes_chrMT_YRI_r24_nr.b36_fwd.txt.gz; compares the detected SNPs with the reference and produces CSV output
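A sketch of the comparison step. It assumes the HapMap file is whitespace-separated with a header row containing `chrom`, `pos` and one column per individual (e.g. `NA18507`), with two-letter genotypes such as `AG` and `NN` for missing calls; check this against the actual file. Called genotypes are given as a dictionary keyed by (chromosome, position); all names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]  # (chromosome, position)


def load_hapmap_genotypes(lines: Iterable[str], sample: str) -> Dict[Key, str]:
    """Read one individual's genotypes from an assumed whitespace-separated
    table with 'chrom', 'pos' and per-sample columns; 'NN' calls are skipped."""
    it = iter(lines)
    header = next(it).split()
    chrom_i, pos_i, sample_i = header.index("chrom"), header.index("pos"), header.index(sample)
    genotypes = {}
    for line in it:
        fields = line.split()
        if not fields:
            continue
        gt = fields[sample_i]
        if gt != "NN":
            genotypes[(fields[chrom_i], int(fields[pos_i]))] = "".join(sorted(gt))
    return genotypes


def compare_genotypes(called: Dict[Key, str], reference: Dict[Key, str]) -> Dict[str, int]:
    """Count reference sites whose called genotype matches, differs, or is missing."""
    stats = {"match": 0, "mismatch": 0, "not_called": 0}
    for key, ref_gt in reference.items():
        gt = called.get(key)
        if gt is None:
            stats["not_called"] += 1
        elif "".join(sorted(gt)) == ref_gt:
            stats["match"] += 1
        else:
            stats["mismatch"] += 1
    return stats
```

The gzipped genotype file can be opened in text mode with `gzip.open(path, "rt")` and passed directly as `lines`.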

  35. Exercise data sources
      – http://www.illumina.com/HumanGenome/
      – http://ftp.hapmap.org/genotypes/latest/fwd_strand/non-redundant/
      – ftp://ftp.ncbi.nih.gov:21/pub/TraceDB/ShortRead/SRA000271/fastq
      – Locally in the UHTS_SNP subdirectory of student accounts

  36. Exercise 1: analyze Illumina reads from NA18507; confirm the HapMap genotype for the mitochondrial genome; choose subsets of the reads and see how coverage and SNP calls are affected (confirm other genomic regions of interest)
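Read subsets for this exercise can be drawn by keeping each read independently with a fixed probability. A seeded sketch (hypothetical helper) so that repeated runs give the same subset:

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def subsample(records: Iterable[T], fraction: float, seed: int = 0) -> Iterator[T]:
    """Keep each record independently with probability `fraction` (Bernoulli sampling)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # private generator: reproducible, no global state
    for record in records:
        if rng.random() < fraction:
            yield record
```

For paired reads, sample (read 1, read 2) tuples so that mates stay together.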

  37. Exercise 2: analyze paired Illumina reads from NA18507; look at the mitochondrial DNA and explain the apparent gap near coordinates 1-120

  38. Exercise 3: analyze paired Illumina reads from NA18507. Can you confirm a homozygous 1 kb deletion on chromosome 20 at 61 Mb?

  39. Exercise 4: analyze paired Illumina reads from NA18507. Can you confirm a complex rearrangement on chromosome 5? What do you expect to see in the pairs?
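For Exercises 3 and 4, read pairs can be sorted by how their mates map. A sketch assuming forward-reverse (FR) pairs, both mates of length `read_len`, and a known mean insert size with a tolerance; the categories and parameter names are hypothetical. A deletion in the sample tends to produce pairs whose apparent insert exceeds the expected size by roughly the deletion length, while rearrangements tend to produce pairs with unexpected orientation or mates on different chromosomes.

```python
from typing import NamedTuple


class Mate(NamedTuple):
    chrom: str
    pos: int       # leftmost 1-based mapped position
    forward: bool  # True if mapped to the + strand


def classify_pair(a: Mate, b: Mate, read_len: int, mean_insert: float, tolerance: float) -> str:
    """Label a mapped pair as concordant or as one kind of discordant pair."""
    if a.chrom != b.chrom:
        return "interchromosomal"
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    if not (left.forward and not right.forward):
        return "orientation"  # expected FR: left mate on +, right mate on -
    insert = right.pos + read_len - left.pos  # outer distance spanned by the pair
    if insert > mean_insert + tolerance:
        return "deletion-like"
    if insert < mean_insert - tolerance:
        return "insertion-like"
    return "concordant"
```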
