Searching Sequence databases 1

Searching Sequence databases 1: Blast

The Central dogma of Biology

Protein translation

gene (DNA) -> Transcription -> mRNA -> Translation -> protein

ATG CGT AAT TCA TGA ATG
 M   R   N   S   *

Blast Variants

What is a 6 frame translation of a nucleotide sequence?

- blastn: nucleotide versus nucleotide
- blastp: protein versus protein
- blastx: nucleotide query, protein database
- tblastn: protein query, nt database
- tblastx: nucleotide query, nucleotide database, alignments at a protein level
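One way to answer the six-frame question on this slide is with a small sketch. It translates a nucleotide sequence in the three forward reading frames and in the three frames of its reverse complement, which is the idea behind the translated searches (blastx, tblastn, tblastx). The function names are illustrative and are not part of any BLAST tool; the codon table is the standard genetic code, and unknown codons are rendered as `X` by assumption.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; incomplete trailing codons are ignored."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with non-ACGT letters
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # The example sequence from the "Protein translation" slide.
    for label, protein in sorted(six_frame_translation("ATGCGTAATTCATGAATG").items()):
        print(label, protein)
```

Frame +1 of the slide's example reproduces M R N S * followed by the M encoded by the final ATG, because this sketch does not stop at stop codons.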

Homology versus similarity

- Sequence alignment tools compute similarity
- Two sequences are homologous if they share a common evolutionary ancestor.
  - Orthologous if the pair was created in a speciation event
  - Paralogous if the pair was created by a gene duplication
- Homology does not always imply sequence similarity. Ancient protein families might be homologous, but greatly diverged. The score function must reflect this.

Scoring Function

- For DNA, we worked with a simple match/mismatch criterion.
- Purines (AG) & Pyrimidines (CT)
- Transitions (nucleotide substitutions within a group) are more likely than transversions.
- When aligning cDNA, a substitution at the third base of a codon is more likely than at the other positions.
- Q: Can you devise an algorithm to score appropriately?
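One possible answer to the question above, as a rough sketch rather than an established scoring scheme: penalize transitions less than transversions, and soften mismatch penalties at the third codon position. The numeric scores and the `wobble_discount` parameter are assumptions chosen only for illustration, and the sketch handles only gap-free, in-frame sequences of equal length.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=1.0, transition=-0.5, transversion=-1.0):
    """Score one aligned pair of bases (score values are illustrative)."""
    if a == b:
        return match
    # A transition stays within a group: purine<->purine or pyrimidine<->pyrimidine.
    same_group = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_group else transversion


def score_ungapped(seq1, seq2, wobble_discount=0.5):
    """Score two equal-length, in-frame coding sequences without gaps.

    Mismatch penalties at the third base of each codon are multiplied by
    wobble_discount, since substitutions there are more likely.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        s = substitution_score(a, b)
        if s < 0 and i % 3 == 2:  # third codon position
            s *= wobble_discount
        total += s
    return total


if __name__ == "__main__":
    print(score_ungapped("ATGCGT", "ATGCGC"))  # third-position transition
    print(score_ungapped("ATGCGT", "CTGCGT"))  # first-position transversion
```

A third-position transition costs the least here and a first-position transversion the most, which matches the ordering of likelihoods stated on the slide.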

Scoring proteins

- Scoring protein sequence alignments is a much more complex task than scoring DNA
- Not all substitutions are equal
- Problem was first worked on by Pauling and collaborators
- In the 1970s, Dayhoff created the first similarity matrices.
- “One size does not fit all”
- Homologous proteins which are evolutionarily close should be scored differently than proteins that are evolutionarily distant
- Different proteins might evolve at different rates and we need to normalize for that

PAM1 matrix

- Align many proteins that are very similar
  - Is this a problem?
- PAM1 distance is the probability of a substitution when 1% of the residues have changed
- Estimate the frequency $P_{ab}$ of every pair of substitutions a,b
- $S(a,b) = \log_{10}\left(\frac{P_{ab}}{P_a P_b}\right) = \log_{10}\left(\frac{P_{b|a}}{P_b}\right)$
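The log-odds score on this slide can be illustrated with a toy calculation. The sketch below estimates $P_{ab}$ and $P_a$ directly from a list of aligned residue pairs and applies $S(a,b) = \log_{10}(P_{ab} / (P_a P_b))$. It is a simplified illustration, not Dayhoff's actual procedure, and the function name and the toy counts are made up for the example.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Compute S(a,b) = log10(P_ab / (P_a * P_b)) from aligned residue pairs.

    aligned_pairs is an iterable of (a, b) tuples taken from alignments of
    very similar proteins. Each pair is counted in both orientations so the
    resulting scores are symmetric.
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Joint frequency of each ordered pair.
    p_ab = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue (marginal of the joint).
    p_a = Counter()
    for (a, _), p in p_ab.items():
        p_a[a] += p

    return {(a, b): math.log10(p / (p_a[a] * p_a[b])) for (a, b), p in p_ab.items()}


if __name__ == "__main__":
    # Hypothetical toy data: mostly conserved positions, a few A/L substitutions.
    pairs = [("A", "A")] * 9 + [("L", "L")] * 9 + [("A", "L")] * 2
    for pair, score in sorted(log_odds_scores(pairs).items()):
        print(pair, round(score, 3))
```

Pairs seen more often than chance predicts get a positive score, and rarer ones get a negative score.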

PAM 1
