Master’s Thesis Genome Assembly: Scaffolding Guided by Related Genomes Runar Furenes Department of Informatics University of Oslo 2013-06-05 Scaffolding Guided by Related Genomes 1 / 42
Presentation overview
- Introduction
- Problem specification
- Methods
- Materials
- Results
- Discussion
- Questions
Introduction
Genome assembly: From biological DNA to complete sequenced genome
[Figure: example sequence fragments ACTCGCA, GGCATGCA, GGCTAAGCT, CGGATTACC]
Genome assembly: Scaffolding
- A scaffold consists of at least two contigs
- Each contig within a scaffold is ordered and oriented
- A gap estimate is provided for each pair of contigs
- Mate pairs are commonly used in scaffolding
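The scaffold definition above can be sketched as a small data structure. This is a minimal illustration, not code from the thesis: the class names `PlacedContig` and `Scaffold`, their fields, and the convention of storing the gap estimate on the left contig of each pair are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlacedContig:
    """A contig placed in a scaffold (hypothetical structure)."""
    name: str
    length: int           # contig length in bases
    forward: bool = True  # orientation within the scaffold
    gap_after: int = 0    # estimated gap to the next contig, in bases


@dataclass
class Scaffold:
    """An ordered list of oriented contigs with gap estimates between them."""
    contigs: List[PlacedContig] = field(default_factory=list)

    def is_valid(self) -> bool:
        # By the definition above, a scaffold has at least two contigs.
        return len(self.contigs) >= 2

    def estimated_length(self) -> int:
        # Sum of contig lengths plus the gaps between neighbouring contigs.
        # The gap stored on the last contig is ignored: nothing follows it.
        bases = sum(c.length for c in self.contigs)
        gaps = sum(c.gap_after for c in self.contigs[:-1])
        return bases + gaps


example = Scaffold([
    PlacedContig("contig_1", 1000, forward=True, gap_after=200),
    PlacedContig("contig_2", 500, forward=False),
])
```

Here `example.estimated_length()` adds the two contig lengths and the single gap between them.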
Motivation for the thesis
- Scaffolding is an important step in the process of genome assembly
- Scaffolding often requires time-consuming and expensive lab work
- Using genomes related to the target genome may make this easier
- The continuous growth in the number of available fully sequenced genomes makes this increasingly relevant
Hypotheses
- Related genomes can be helpful in scaffolding
- Many such related genomes can be preferable to a few
- It can be beneficial to use only the ends of contigs
Problem specification
Scaffolding problem specific to this thesis: Using related genomes in a scaffolding process
- Related genomes may have nucleotide sequence similarities
  - Can contigs be scaffolded with high accuracy using one or more such related genomes?
- More distantly related genomes have more sequence similarity at the protein level than at the nucleotide level
  - Can the same process run at the protein level instead?
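Why protein-level comparison can succeed where nucleotide-level comparison fails can be shown with a toy example: synonymous codons let two DNA sequences differ at most positions and still encode the same protein. The sketch below is an illustration only, not the thesis's method. The two sequences are made up, the helper names `translate` and `identity` are hypothetical, and the standard genetic code is assumed.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna: str) -> str:
    """Translate DNA in reading frame 0; trailing partial codons are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(x: str, y: str) -> float:
    """Fraction of identical positions; assumes equal-length, non-empty input."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Made-up sequences that use different codons for the same amino acids.
seq_a = "CTGAGCCGT"
seq_b = "TTATCAAGA"

nucleotide_identity = identity(seq_a, seq_b)
protein_identity = identity(translate(seq_a), translate(seq_b))
```

In this toy case the two sequences share only 2 of 9 nucleotides, yet both translate to the same three amino acids, so `protein_identity` is 1.0.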
Earlier research on this subject: Other works
Existing tools:
- ABACAS [1]
- GRASS [2]
These tools can use additional information, such as reference genome(s), in their scaffolding algorithms.
[1] Assefa et al. 2009
[2] Gritsenko et al. 2012
Methods
Overview: Proposed method: GuideScaff
GuideScaff is a pipeline producing scaffolds from contigs and guiding genomes.
Main steps:
- Use contigs or contig ends from an assembly
- Match contigs with guiding genomes
- Use agreeing matches to create scaffolds
- Evaluate scaffolds with target genome if available
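The "use agreeing matches" step can be sketched as a simple vote across guiding genomes. This is not GuideScaff's actual implementation, which the slides do not describe in detail. It assumes that the matching step has already produced, per guiding genome, one match per contig as a tuple `(contig, start, end, strand)`; the function names and the `min_support` threshold are hypothetical. For brevity the sketch does not treat an adjacency and its reverse-complement equivalent as the same.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One match: (contig name, start on guide, end on guide, strand "+" or "-").
Match = Tuple[str, int, int, str]
Adjacency = Tuple[str, str]


def adjacencies(matches: List[Match]) -> List[Adjacency]:
    """Order contigs by start position on one guide and pair up neighbours.

    Each contig is written as name plus strand, e.g. "c1+".
    """
    ordered = sorted(matches, key=lambda m: m[1])
    return [(a[0] + a[3], b[0] + b[3]) for a, b in zip(ordered, ordered[1:])]


def agreeing_adjacencies(per_guide: Dict[str, List[Match]],
                         min_support: int) -> List[Adjacency]:
    """Keep adjacencies that at least min_support guiding genomes agree on."""
    votes: Counter = Counter()
    for matches in per_guide.values():
        votes.update(set(adjacencies(matches)))
    return sorted(pair for pair, n in votes.items() if n >= min_support)


# Toy input: two guides agree on c1 -> c2 but disagree on c3's orientation.
per_guide = {
    "guide_1": [("c1", 0, 100, "+"), ("c2", 150, 300, "+"), ("c3", 400, 500, "-")],
    "guide_2": [("c1", 10, 110, "+"), ("c2", 160, 310, "+"), ("c3", 1000, 1100, "+")],
}
supported = agreeing_adjacencies(per_guide, min_support=2)
```

With this toy input only the adjacency between `c1+` and `c2+` has support from both guides, so it is the only one kept. Chaining the kept adjacencies into scaffolds and deriving gap estimates from the match coordinates would be further steps.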
Overview: Contig end extraction
- Contigs are assumed to be more or less correct
- Scaffold consists of entire contigs
- Contigs can map to multiple locations in a genome
- Using only contig ends could make it easier
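Contig end extraction can be sketched in a few lines. The function name `contig_ends`, the parameter `end_length`, and the choice to return short contigs whole are assumptions for illustration; the slides do not state which end length or short-contig rule the thesis used.

```python
from typing import List


def contig_ends(sequence: str, end_length: int) -> List[str]:
    """Return the left and right ends of a contig.

    A contig too short to yield two non-overlapping ends is returned
    whole as a single piece (an assumed rule, chosen for simplicity).
    """
    if end_length <= 0:
        raise ValueError("end_length must be positive")
    if len(sequence) <= 2 * end_length:
        return [sequence]
    return [sequence[:end_length], sequence[-end_length:]]


ends = contig_ends("ACGTACGTAC", 3)
short = contig_ends("ACGT", 3)
```

Each extracted end would then be matched against the guiding genomes in place of the full contig, while the resulting scaffold still consists of entire contigs.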