Introduction: Applied Bioinformatics. Michael Schroeder, Biotechnology Center, TU Dresden
DNA – the molecule of life http://www.ornl.gov/hgmis
High-throughput Technology § 1950s: Watson and Crick, Cambridge § 2000s: Sanger Center § 2010s: BGI, Beijing
Drug Discovery [Figure: new drugs per year and R&D spending, 1960 to 1995]
Genetic Code
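The slide shows only the title. As a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and a DNA sequence read in reading frame 0, translation is a lookup of one codon (three bases) at a time. The names `CODON_TABLE` and `translate` are illustrative, not from the lecture.

```python
from itertools import product

# Standard genetic code (NCBI table 1). The 64 letters below follow codon order
# TTT, TTC, TTA, TTG, TCT, ... with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a DNA string codon by codon in reading frame 0.

    Codons containing unknown bases become 'X'; a trailing incomplete codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop translation at the first stop codon
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))                 # MA
print(translate("ATGGCCTAA", to_stop=False))  # MA*
```

Encoding the table as one 64-letter string keeps it compact; a dictionary built from it gives constant-time codon lookup.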
Actinidin and Papain: cysteine proteases in kiwi and papaya, respectively. Both tenderise meat and break down casein (milk). 50% sequence identity, same structure.
Hemoglobin and Leghemoglobin: oxygen transport in red blood cells and legumes, respectively. 11% sequence identity, same structure.
Sequence-Structure Relation
Similar sequences hint at … § common ancestry and § possibly similar function
Similar sequence, similar function? § Monkey v-sis and human PDGF are 85% similar (Doolittle et al., Science, 1983: "Simian sarcoma virus onc gene, v-sis, is derived from the gene encoding a platelet-derived growth factor.") § Hypothesis: cancer = deregulated growth factor. Alignment from: http://pdf.aminer.org/000/244/500/design_and_implementation_of_a_dna_sequence_processor.pdf
Similar sequence, common ancestry?
>sp|P00674|RNP_HORSE Ribonuclease pancreatic Horse
KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ…
>sp|P00673|RNP_BALAC Ribonuclease pancreatic Minke whale
RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ…
>sp|P00686|RNP_MACRU Ribonuclease pancreatic Red kangaroo
ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQE…
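The three records above are in FASTA format: a header line starting with `>` followed by one or more sequence lines. A minimal parsing sketch follows. It assumes the simplified `db|accession|entry_name description` headers shown on the slide (real UniProtKB headers carry further fields such as `OS=`), and `parse_fasta` and `split_header` are illustrative names. The example uses only the first 16 residues of the horse sequence.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_header(header):
    """Split a simplified UniProtKB-style header 'db|accession|entry_name description'."""
    identifier, _, description = header.partition(" ")
    db, accession, entry_name = identifier.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


# First 16 residues of the horse sequence, split over two lines
example = """>sp|P00674|RNP_HORSE Ribonuclease pancreatic Horse
KESPAMKF
ERQHMDSG
"""
for header, seq in parse_fasta(example):
    print(split_header(header)["entry_name"], seq)  # RNP_HORSE KESPAMKFERQHMDSG
```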
Alignment (CLUSTAL 2.1 multiple sequence alignment)

sp|P00674|RNP_HORSE   KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ 60
sp|P00673|RNP_BALAC   RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ 60
sp|P00686|RNP_MACRU   -ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ 59
                       *:** **:*****: :......*** ** *.**.* ***:***:**. *.*:* *

sp|P00674|RNP_HORSE   KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF 120
sp|P00673|RNP_BALAC   KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF 120
sp|P00686|RNP_MACRU   ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF 118
                      :*: ****::***:*.* : **:** *..****** *:**: :::******* ******

sp|P00674|RNP_HORSE   DASVEVST 128
sp|P00673|RNP_BALAC   DNSV---- 124
sp|P00686|RNP_MACRU   DAYV---- 122
                      *  *

Number of identical residues: § Horse and Minke whale: 95 § Minke whale and Red kangaroo: 82 § Horse and Red kangaroo: 75
http://www.genome.jp/tools/clustalw
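The pairwise counts can be checked with a few lines of code. This sketch counts the columns in which two aligned rows carry the same residue (gap columns never count) and marks columns that are identical in every row with `*`. It leaves out CLUSTAL's `:` and `.` marks for similar residues, and the function names are hypothetical. On the horse and minke whale rows it reproduces the count of 95.

```python
def count_identities(row1, row2, gap="-"):
    """Count alignment columns where both rows carry the same residue (gaps never count)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(row1, row2) if x == y and x != gap)


def star_line(rows, gap="-"):
    """Mark columns that are identical in all rows with '*'.

    Simplified: CLUSTAL's ':' and '.' marks for similar residues are left out.
    """
    return "".join(
        "*" if col[0] != gap and all(c == col[0] for c in col) else " "
        for col in zip(*rows)
    )


# Aligned rows of the horse and minke whale ribonucleases, copied from the alignment
HORSE = ("KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ"
         "KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF"
         "DASVEVST")
WHALE = ("RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ"
         "KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF"
         "DNSV----")

print(count_identities(HORSE, WHALE))                         # 95
print(repr(star_line(["DASVEVST", "DNSV----", "DAYV----"])))  # '*  *    '
```

Dividing such a count by a length gives percent identity, but which length is used (alignment columns, shorter sequence, or aligned residue pairs) is a convention that differs between tools.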
Similar sequence, common ancestry?
African elephant: sp|P24958|CYB_LOXAF; Mammoth: sp|P92658|CYB_MAMPR; Indian elephant: sp|O47885|CYB_ELEMA
Elephant and Mammoth § Mammoth vs. African elephant: 10 mismatches § Mammoth vs. Indian elephant: 14 mismatches § Significant?
Similarity implies homology? Sequence similarity is not the same as homology.
Similarity usually implies homology § Conservation: sequences similar in many species § Convergent evolution § Varying mutation rates § Horizontal gene transfer
Homologue, Orthologue, Paralogue
Darwin's Tree of Life
Tree of Life with 2.3 million species (opentreeoflife.org)
Sequence Alignments § Why compare and align sequences? § How to judge an alignment? § How to compute an alignment? § How to compute an alignment fast?
How to judge an alignment § Scoring scheme: number of matches, mismatches, and gaps; substitution matrices § Significance: E-value, P-value, Z-score § Structure: benchmark the sequence alignment against a structure alignment § Function: benchmark whether sequence similarity implies similar function
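A minimal sketch of the simplest scoring scheme: count matches, mismatches, and gap columns, then weight them. The weights (match +1, mismatch -1, gap -2, linear in the number of gap columns) are illustrative assumptions, not values from the lecture; a substitution matrix such as BLOSUM62 would replace the fixed match and mismatch scores with a lookup per residue pair. The example scores the alignment of RDISLVKNAGI and RNILVSDAKNVGI from the "From Distance to Alignment" slide.

```python
def score_alignment(row1, row2, match=1, mismatch=-1, gap=-2, gap_char="-"):
    """Score a pairwise alignment column by column with a linear gap penalty.

    Returns (score, counts). The default weights are illustrative assumptions.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(row1, row2):
        if x == gap_char or y == gap_char:
            counts["gap"] += 1
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    score = (counts["match"] * match
             + counts["mismatch"] * mismatch
             + counts["gap"] * gap)
    return score, counts


print(score_alignment("RDISLV---KNAGI", "RNI-LVSDAKNVGI"))
# (-2, {'match': 8, 'mismatch': 2, 'gap': 4})
```

The raw score alone says little about significance; E-values, P-values, and Z-scores compare it with scores expected for unrelated sequences.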
Levenshtein (or Edit) Distance: the minimum number of insertions, deletions, and replacements needed to convert string a into string b
Levenshtein (or Edit) Distance: Let $a = a_1 \dots a_m$ and $b = b_1 \dots b_n$ be strings. Then $\operatorname{lev}_{a,b} = \operatorname{lev}_{a,b}(m, n)$ is the Levenshtein distance of $a$ and $b$, where

$$
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0,\\[4pt]
\min\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1\\
\operatorname{lev}_{a,b}(i,j-1) + 1\\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise,}
\end{cases}
$$

for $0 \le i \le m$ and $0 \le j \le n$, and

$$
1_{(a_i \neq b_j)} :=
\begin{cases}
1 & \text{if } a_i \neq b_j,\\
0 & \text{if } a_i = b_j.
\end{cases}
$$
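The recurrence translates directly into dynamic programming: filling an (m+1) by (n+1) table row by row guarantees that lev(i-1, j), lev(i, j-1), and lev(i-1, j-1) are already known when lev(i, j) is computed, giving O(mn) time and space. A minimal Python sketch (strings are 0-indexed here, so $a_i$ is `a[i - 1]`):

```python
def levenshtein(a, b):
    """Levenshtein distance of a and b, filling the table lev(i, j) bottom-up."""
    m, n = len(a), len(b)
    # lev[i][j] = distance between the prefixes a[:i] and b[:j]
    lev = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if min(i, j) == 0:
                lev[i][j] = max(i, j)  # only insertions or only deletions
            else:
                lev[i][j] = min(
                    lev[i - 1][j] + 1,                           # delete a_i
                    lev[i][j - 1] + 1,                           # insert b_j
                    lev[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # replace, or match at cost 0
                )
    return lev[m][n]


print(levenshtein("kitten", "sitting"))  # 3
```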
From Distance to Alignment: aligning RDISLVKNAGI and RNILVSDAKNVGI

R D I S L V - - - K N A G I
R N I - L V S D A K N V G I

From lectures.molgen.mpg.de/Alg/Intro/
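One way to obtain such an alignment is to keep the full Levenshtein table and trace back from lev(m, n) to lev(0, 0), recording at each step which case of the recurrence produced the minimum: diagonal (match or replacement), up (deletion, gap in b), or left (insertion, gap in a). This is a sketch: when several cases tie it prefers the diagonal, so it returns one optimal alignment, which need not be the one shown on the slide. The function name `align` is illustrative.

```python
def align(a, b, gap="-"):
    """Return (row_a, row_b, distance): one minimum-cost alignment via Levenshtein traceback."""
    m, n = len(a), len(b)
    lev = [[max(i, j) if min(i, j) == 0 else 0 for j in range(n + 1)]
           for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            lev[i][j] = min(lev[i - 1][j] + 1,
                            lev[i][j - 1] + 1,
                            lev[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    # Walk back from (m, n); each step undoes the case that produced lev[i][j]
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and lev[i][j] == lev[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1  # match/replace
        elif i > 0 and lev[i][j] == lev[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append(gap); i -= 1                # deletion
        else:
            top.append(gap); bottom.append(b[j - 1]); j -= 1                # insertion
    return "".join(reversed(top)), "".join(reversed(bottom)), lev[m][n]


row_a, row_b, dist = align("RDISLVKNAGI", "RNILVSDAKNVGI")
print(row_a)
print(row_b)
print(dist)
```

Every column of the result costs 0 (match) or 1 (replacement or gap), so the number of differing columns always equals the distance.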
Computing Alignments Fast (compbio.pbworks.com)
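The slide gives only the title and source. One common speed-up, used in different forms by heuristic tools such as FASTA and BLAST, is to look first for short word (k-mer) matches and examine only the regions around them more closely. The sketch below covers just the seeding step with exact matches and a hypothetical word length k=2; hits that fall on the same diagonal (target position minus query position) hint at a shared aligned region. All function names are illustrative.

```python
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map each k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, target, k):
    """Return (query_pos, target_pos) pairs where both sequences share an exact k-mer."""
    index = kmer_index(target, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def diagonal_counts(hits):
    """Count seed hits per diagonal (target_pos - query_pos)."""
    return Counter(j - i for i, j in hits)


hits = seed_hits("RDISLVKNAGI", "RNILVSDAKNVGI", k=2)
print(hits)                         # [(4, 3), (6, 8), (9, 11)]
print(dict(diagonal_counts(hits)))  # {-1: 1, 2: 2}
```

Building the index takes one pass over the target, so finding seeds avoids filling the full m by n dynamic-programming table; the price is that alignments without any shared word are missed.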
Computing multiple sequence alignments
Computing phylogenetic trees § Distance-based: neighbour joining, hierarchical clustering § Character-based: parsimony method, maximum likelihood (Saitou, Kyushu Museum, 2002)
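As an illustration of the distance-based family, here is a minimal sketch of UPGMA, which is hierarchical clustering with average linkage (neighbour joining uses a different merge criterion that does not assume a constant evolutionary rate, and is not shown). UPGMA repeatedly merges the two closest clusters and sets the distance from the new cluster to every other cluster to the size-weighted average of the old distances; branch lengths are omitted here. The input distances are hypothetical: 100 minus the pairwise counts on the ribonuclease alignment slide (Horse and Minke whale 95, Minke whale and Red kangaroo 82, Horse and Red kangaroo 75).

```python
from itertools import combinations


def upgma(distances):
    """Cluster leaves with UPGMA (average linkage) and return a Newick-style string.

    distances maps every pair (name_a, name_b) of leaves to their distance.
    Branch lengths are omitted.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    size = {name: 1 for pair in distances for name in pair}
    while len(size) > 1:
        # Closest pair of current clusters; ties go to the first pair in sorted order
        a, b = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Size-weighted average of the distances from a and from b to c
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


dist = {("Horse", "Whale"): 5, ("Whale", "Kangaroo"): 18, ("Horse", "Kangaroo"): 25}
print(upgma(dist))  # ((Horse,Whale),Kangaroo)
```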