UHT Sequencing Course
Large-scale genotyping
Christian Iseli
January 2009
Overview
• Introduction
• Examples
• Base calling method and parameters
• Read filtering
• Read classification
• Detailed alignment
• Alignment analysis
• Output generation
Jun 04
Introduction
• Basic problem: distinguish polymorphisms from sequencing errors
  – Use quality “measures”
  – Use redundancy
  – Use knowledge about the data source
Examples
• Retinitis pigmentosa
• Hypertrophic cardiomyopathy
• HSA 21q genotyping
Retinitis pigmentosa
• Inherited eye disease
• Linkage analysis
• PRPF31 mutation
• Incomplete penetrance
• Attempt sequencing
PRPF31 example
• c.1374+654C>G
(figure labels: 13, 14)
PRPF31 example
PRPF31 example, zoom
PRPF31 example, MFA
Examples
• Retinitis pigmentosa
• Hypertrophic cardiomyopathy
• HSA 21q genotyping
Hypertrophic cardiomyopathy
• Small collection of known genes
• PCR-amplify gene pieces
• Sequence
Small deletion
Examples
• Retinitis pigmentosa
• Hypertrophic cardiomyopathy
• HSA 21q genotyping
“Exome” sequencing
• Extract selected genomic parts
• Sequence the collected pieces
Coverage on HSA 21q
Coverage detail, HSA 21q
HSA 21q, HapMap NA12782
Base calling
• Rolexa
• FASTQ
• ...
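FASTQ files pair each read with a per-base quality string. The minimal Python sketch below reads four-line FASTQ records and decodes the quality characters into Phred scores. It is not one of the course scripts, and the ASCII offset is an assumption that must match the data: 33 for Sanger-style files, 64 for Illumina 1.3+ output; older Solexa-scaled qualities use a different scale and are not handled.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    quals: List[int]  # Phred scores decoded from the quality string


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream laid out as four lines per record.

    offset=33 decodes Phred+33; pass offset=64 for Illumina 1.3+ Phred+64.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        name = header[1:].rstrip("\r\n")
        yield FastqRecord(name, seq, [ord(c) - offset for c in qual])


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII5!\n")
    for rec in parse_fastq(demo):
        print(rec.name, rec.seq, rec.quals)  # read1 ACGT [40, 40, 20, 0]
```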
Read filtering
• Entropy
• Quality values
• (Position)
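The entropy criterion can be illustrated with the Shannon entropy of a read's base composition: a read made of one repeated base scores 0 bits, while a read using A, C, G and T equally scores the maximum of 2 bits. This is a sketch of one plausible filter, not the course's definition; the slide gives neither the formula nor a cut-off, so `min_entropy=1.0` is a hypothetical value.

```python
import math
from collections import Counter


def base_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the A/C/G/T composition of a read (0 to 2)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no informative bases at all
    # H = sum over bases of p * log2(1/p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def passes_entropy(seq: str, min_entropy: float = 1.0) -> bool:
    """Keep reads whose composition is not too repetitive.

    min_entropy=1.0 is a hypothetical cut-off, not a value from the course.
    """
    return base_entropy(seq) >= min_entropy


if __name__ == "__main__":
    for read in ("AAAAAAAAAAAAAAAAAAAA", "ACGTTGCAACGTAGGCTAAC"):
        print(read, round(base_entropy(read), 2), passes_entropy(read))
```

Low-entropy reads such as poly-A stretches tend to align almost anywhere, which is why they are worth removing before classification.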
Filtering example
• Rolexa base calling
• Filter reads for length and ambiguity
  – Per-base ambiguity: ACGTU -> 1, KMRSWY -> 2, BDHV -> 3, N -> 4
  – Minimum length 20
  – Maximum ambiguity 81
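One way to apply these length and ambiguity limits is sketched below. The per-base values come from the slide and equal the number of nucleotides each IUPAC code stands for. How they combine into a read-level score is not stated, so this sketch assumes the product, i.e. the number of concrete sequences the read could represent; under that assumption a limit of 81 allows, for example, four B/D/H/V positions but not five.

```python
from math import prod

# Per-base ambiguity values from the slide: the number of nucleotides
# each IUPAC code can stand for (U is treated like T).
DEGENERACY = {
    **{b: 1 for b in "ACGTU"},
    **{b: 2 for b in "KMRSWY"},
    **{b: 3 for b in "BDHV"},
    "N": 4,
}


def ambiguity(seq: str) -> int:
    """ASSUMPTION: read ambiguity = product of the per-base values.

    Raises KeyError for characters outside the IUPAC table above.
    """
    return prod(DEGENERACY[b] for b in seq.upper())


def keep_read(seq: str, min_length: int = 20, max_ambiguity: int = 81) -> bool:
    """Apply the slide's length and ambiguity limits to one base-called read."""
    return len(seq) >= min_length and ambiguity(seq) <= max_ambiguity
```

If the course scripts sum the values instead, only the `prod` call would change; the thresholds would then need different values.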
Read classification
• Use fetchGWI against the whole genome
  – Single exact match -> U (unique)
  – Multiple exact matches -> R (repeat)
  – No exact match -> M (missed)
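The U/R/M rule can be sketched with a naive exact search over both strands of a toy genome string. fetchGWI is built for fast genome-wide matching, whereas this sketch scans the sequence linearly, so it only illustrates the classification logic, not the tool.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_exact(genome: str, read: str) -> int:
    """Count exact (possibly overlapping) occurrences on both strands.

    A set is used so that a palindromic read is not counted twice.
    """
    hits = 0
    for query in {read, revcomp(read)}:
        start = genome.find(query)
        while start != -1:
            hits += 1
            start = genome.find(query, start + 1)
    return hits


def classify(genome: str, read: str) -> str:
    """U = single exact match, R = multiple exact matches, M = none."""
    n = count_exact(genome, read)
    if n == 1:
        return "U"
    if n > 1:
        return "R"
    return "M"
```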
Detailed alignment
• Use M reads
• Split the region of interest into chunks (e.g. 300 bp + 40 bp overlap)
• Find reads sharing an identical 12-mer with a chunk
• Global alignment of reads vs. chunks
• Filter alignments and retain a “good” set
  – E.g. maximum 3 mismatches
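The chunking and 12-mer seeding steps can be sketched as follows, with the slide's example values as defaults. Two assumptions apply: each chunk is 300 bp plus the next 40 bp (so consecutive chunks share 40 bp), and the gapped global alignment is replaced by a simpler ungapped comparison at the offset implied by each shared 12-mer. Unlike a real global alignment, this cannot detect indels such as the small deletion shown earlier.

```python
from collections import defaultdict


def make_chunks(region, size=300, overlap=40):
    """Split a region into (start, sequence) chunks of size + overlap bp.

    Chunks start every `size` bp and the last one may be shorter. No chunk
    starts inside the final `overlap` bp, which the previous chunk covers.
    """
    starts = range(0, max(len(region) - overlap, 1), size)
    return [(s, region[s:s + size + overlap]) for s in starts]


def kmer_index(chunk, k=12):
    """Map every k-mer of the chunk to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(chunk) - k + 1):
        index[chunk[i:i + k]].append(i)
    return index


def seeded_hits(read, chunk, index, k=12, max_mismatches=3):
    """Ungapped placements implied by shared k-mers, kept if few mismatches.

    Returns sorted (offset_in_chunk, mismatches) pairs.
    """
    hits = set()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            offset = i - j  # where the read would start in the chunk
            if offset < 0 or offset + len(read) > len(chunk):
                continue  # the read would hang off the chunk edge
            window = chunk[offset:offset + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((offset, mismatches))
    return sorted(hits)
```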
Alignment analysis
• Map the retained reads to the full genome
• Remove reads that map better outside the region of interest
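The reasoning behind this check is that a read placed in the region of interest but matching better elsewhere probably comes from that other locus. The brute-force sketch below stands in for a real genome-wide mapper and is practical only on toy sequences: it compares the best ungapped placement inside the region with the best one outside, on both strands. Keeping reads that tie is an assumption, since the slide only says to remove reads with better maps outside.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def placements(read, genome):
    """Yield (start, mismatches) for each ungapped placement on both strands."""
    length = len(read)
    for query in (read, read.translate(COMPLEMENT)[::-1]):
        for start in range(len(genome) - length + 1):
            window = genome[start:start + length]
            yield start, sum(a != b for a, b in zip(query, window))


def keep_in_region(read, genome, region_start, region_end):
    """Keep a read unless it maps strictly better outside [region_start, region_end).

    Ties are kept (an assumption); placements that straddle the region
    boundary count as outside.
    """
    inside, outside = [], []
    for start, mismatches in placements(read, genome):
        if region_start <= start and start + len(read) <= region_end:
            inside.append(mismatches)
        else:
            outside.append(mismatches)
    if not inside:
        return False  # the read cannot be placed in the region at all
    return not outside or min(outside) >= min(inside)
```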
Practical alignment analysis 1
(figure labels: 12-mers, U, R, M)
Practical alignment analysis 2
(figure labels: 12-mers, U, R, M)
Output generation
• Create a multiple sequence alignment
• Prepare text output in column format
• Call SNPs (alleles, coverage, etc.)
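In the multiple sequence alignment, each column holds the read bases stacked on one reference position, so coverage, allele counts and a genotype can be derived column by column. The sketch below is a hypothetical caller, not the one used in the course: `min_coverage` and `min_allele_fraction` are illustrative thresholds, and gaps or ambiguous symbols are simply ignored.

```python
from collections import Counter


def call_column(bases, min_coverage=4, min_allele_fraction=0.2):
    """Summarise one alignment column as (coverage, allele counts, genotype).

    Thresholds are hypothetical. "NN" means too few reads to call.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    coverage = sum(counts.values())
    if coverage < min_coverage:
        return coverage, counts, "NN"
    # Alleles supported by a large enough fraction of the reads, most frequent first
    alleles = [b for b, n in counts.most_common() if n / coverage >= min_allele_fraction]
    if len(alleles) == 1:
        genotype = alleles[0] * 2  # homozygous call
    else:
        genotype = "".join(sorted(alleles[:2]))  # keep the two best-supported alleles
    return coverage, counts, genotype
```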
Results in CSV files
Detailed view in UCSC
Results in MFA
Script srMap
• Needs fetch.conf, an input chunk and genomic coordinates
• Produces MFA and CSV output
Script prepareJobs
• Needs genomic coordinates
• Prepares scripts that process each chunk with srMap
Script local2genomic
• Needs the CSV file produced by srMap
• Adds genomic coordinates
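The coordinate shift performed by local2genomic amounts to adding the chunk's genomic start to each local position. The column names (`pos`, `chrom`, `genomic_pos`) and the 1-based convention below are assumptions for illustration; the actual CSV layout produced by srMap is not described in these slides.

```python
import csv
import io


def add_genomic_coordinates(csv_text, chrom, chunk_start):
    """Append chrom and genomic_pos columns to a chunk-local CSV.

    Hypothetical layout: a header row and a 1-based 'pos' column relative to
    the chunk; chunk_start is the 1-based genomic position of the chunk's
    first base.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames + ["chrom", "genomic_pos"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["chrom"] = chrom
        row["genomic_pos"] = chunk_start + int(row["pos"]) - 1
        writer.writerow(row)
    return out.getvalue()
```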
Script collateCsv
• Needs the CSV files produced by local2genomic
• Merges the chunks back together
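Because neighbouring chunks overlap, some positions are reported twice and merging has to pick one row. The sketch below uses a hypothetical rule, keeping the row with higher coverage, on the reasoning that reads near a chunk edge may be under-counted when they do not fit entirely inside the chunk. The field names are likewise assumptions.

```python
def collate(chunk_rows):
    """Merge per-chunk rows into a single list sorted by genomic position.

    Each row is a dict with (hypothetical) 'genomic_pos' and 'coverage'
    fields. When two chunks report the same position, the row with the
    higher coverage wins; on a tie the first one seen is kept.
    """
    merged = {}
    for rows in chunk_rows:
        for row in rows:
            pos = int(row["genomic_pos"])
            best = merged.get(pos)
            if best is None or int(row["coverage"]) > int(best["coverage"]):
                merged[pos] = row
    return [merged[p] for p in sorted(merged)]
```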
Script matchGenotype
• Needs a CSV file produced by srMap, local2genomic or collateCsv
• Needs a genotype file, e.g. genotypes_chrMT_YRI_r24_nr.b36_fwd.txt.gz
• Compares detected SNPs with the reference genotypes and produces CSV output
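The core of the comparison can be sketched independently of the file formats. Assuming the detected genotypes and the HapMap genotypes have already been loaded into dictionaries keyed by position (parsing the gzipped HapMap file is not shown), each reference site is classified as a match, a mismatch, a missing call or not covered. The category names are hypothetical.

```python
def compare_genotypes(called, reference):
    """Classify each reference site by comparing it with the detected genotype.

    Both arguments map genomic position -> two-letter genotype such as "AG";
    an "N" marks a missing call. Allele order is ignored ("GA" == "AG").
    """
    results = {}
    for pos, ref_gt in reference.items():
        call = called.get(pos)
        if call is None:
            status = "not_covered"  # no reads called at this site
        elif "N" in call or "N" in ref_gt:
            status = "no_call"
        elif sorted(call) == sorted(ref_gt):
            status = "match"
        else:
            status = "mismatch"
        results[pos] = status
    return results
```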
Exercise data source
• http://www.illumina.com/HumanGenome/
• http://ftp.hapmap.org/genotypes/latest/fwd_strand/non-redundant/
• ftp://ftp.ncbi.nih.gov:21/pub/TraceDB/ShortRead/SRA000271/fastq
• Locally in the UHTS_SNP subdirectory of student accounts
Exercise 1
• Analyze Illumina reads from NA18507
• Confirm the HapMap genotype for the mitochondrial genome
• Choose subsets of the reads and see how coverage and SNP calls are affected (confirm other genomic regions of interest)
Exercise 2
• Analyze paired Illumina reads from NA18507
• Look at the mitochondrial DNA and explain the apparent gap near coordinates 1-120
Exercise 3
• Analyze paired Illumina reads from NA18507
• Can you confirm a homozygous 1 kb deletion on chromosome 20 at 61 Mb?
Exercise 4
• Analyze paired Illumina reads from NA18507
• Can you confirm a complex rearrangement on chromosome 5?
• What do you expect to see in the pairs?