Introduction Applied Bioinformatics Michael Schroeder - PowerPoint PPT Presentation


  1. Introduction: Applied Bioinformatics. Michael Schroeder, Biotechnology Center, TU Dresden

  2. DNA – the molecule of life http://www.ornl.gov/hgmis

  3. High-throughput Technology § 1950s: Watson and Crick § 2000s: Sanger Center, Cambridge § 2010s: BGI, Beijing

  4. Drug Discovery [Figure: new drugs per year and R&D spending, plotted against year (60 to 95)]

  5. Genetic Code

  6. Actinidin and Papain: cysteine proteases in kiwi and papaya, respectively. They tenderise meat and break down casein (milk). 50% sequence ID, same structure.

  7. Hemoglobin and Leghemoglobin: oxygen transport in red blood cells and legumes, respectively. 11% sequence ID, same structure.

  8. Sequence-Structure Relation

  9. Similar sequences hint at … § common ancestry and § possibly similar function

  10. Similar sequence, similar function? § Monkey v-sis and human PDGF are 85% similar (Doolittle et al., Science, 1983: "Simian sarcoma virus onc gene, v-sis, is derived from the gene encoding a platelet-derived growth factor") § Hypothesis: cancer = deregulated growth factor § Alignment from: http://pdf.aminer.org/000/244/500/design_and_implementation_of_a_dna_sequence_processor.pdf

  11. Similar sequence, common ancestry?

      >sp|P00674|RNP_HORSE Ribonuclease pancreatic Horse
      KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ…
      >sp|P00673|RNP_BALAC Ribonuclease pancreatic Minke whale
      RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ…
      >sp|P00686|RNP_MACRU Ribonuclease pancreatic Red kangaroo
      ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQE…

  12. Alignment (CLUSTAL 2.1 multiple sequence alignment, http://www.genome.jp/tools/clustalw)

      sp|P00674|RNP_HORSE KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ 60
      sp|P00673|RNP_BALAC RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ 60
      sp|P00686|RNP_MACRU -ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ 59
                           *:** **:*****:  :......*** **  *.**.* ***:***:**.   *.*:* *

      sp|P00674|RNP_HORSE KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF 120
      sp|P00673|RNP_BALAC KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF 120
      sp|P00686|RNP_MACRU ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF 118
                          :*: ****::***:*.* : **:** *..****** *:**: :::*******  ******

      sp|P00674|RNP_HORSE DASVEVST 128
      sp|P00673|RNP_BALAC DNSV---- 124
      sp|P00686|RNP_MACRU DAYV---- 122
                          *  *

      Number of aligned residues: § Horse and Minke whale: 95 § Minke whale and Red kangaroo: 82 § Horse and Red kangaroo: 75
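The pairwise counts on slide 12 can be checked by counting the alignment columns in which two rows carry the same residue. The following is a minimal sketch, assuming that "aligned residues" means identical residues in the same column (the slide does not define the term). `count_identical` is a hypothetical helper name, and the three rows are the CLUSTAL rows from the slide joined across blocks.

```python
def count_identical(row_a: str, row_b: str) -> int:
    """Count alignment columns where both rows hold the same residue.

    Gap characters ('-') never count as a match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")


# Aligned rows from the CLUSTAL output, one string per sequence.
horse = (
    "KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ"
    "KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF"
    "DASVEVST"
)
whale = (
    "RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ"
    "KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF"
    "DNSV----"
)
kangaroo = (
    "-ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ"
    "ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF"
    "DAYV----"
)

print(count_identical(horse, whale))     # 95
print(count_identical(whale, kangaroo))  # 82
print(count_identical(horse, kangaroo))  # 75
```

Under this reading, the three counts agree with the numbers given on the slide.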

  13. Similar sequence, common ancestry?

  14. African elephant: sp|O47885|CYB_ELEMA; Mammoth: sp|P92658|CYB_MAMPR; Indian elephant: sp|P24958|CYB_LOXAF

  15. Elephant and Mammoth § Mammoth-African elephant: 10 mismatches § Mammoth-Indian elephant: 14 mismatches § Significant?

  16. Similarity implies homology Sequence similarity is not equal to homology

  17. Similarity usually implies homology § Conservation: Sequences similar in many species § Convergent evolution § Mutation rate varied § Horizontal gene transfer

  18. Homologue Orthologue Paralogue

  19. Darwin's Tree of Life

  20. Tree of Life with 2.3 million species (opentreeoflife.org)

  21. Sequence Alignments § Why compare and align sequences? § How to judge an alignment? § How to compute an alignment? § How to compute an alignment fast?

  22. How to judge an alignment § Scoring scheme: number of matches, mismatches, gaps; substitution matrices § Significance: E-value, P-value, Z-score § Structure: benchmark sequence against structure alignment § Function: benchmark whether sequence alignment implies similar function
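A scoring scheme based on matches, mismatches, and gaps can be sketched as a sum over alignment columns. This is a minimal illustration, not a standard tool: `score_alignment` is a hypothetical name, the score values (+1, -1, -2) are arbitrary assumptions, and a real scheme would typically replace the match/mismatch constants with a substitution matrix.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum per-column scores of a pairwise alignment (linear gap penalty).

    The default values are illustrative assumptions, not taken from the slides.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score += gap        # insertion or deletion
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


# 8 matches, 2 mismatches, 4 gap columns: 8 - 2 - 8 = -2
print(score_alignment("RDISLV---KNAGI", "RNI-LVSDAKNVGI"))  # -2
```

Changing the gap penalty changes which of several candidate alignments scores best, which is why the scoring scheme has to be fixed before alignments are compared.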

  23. Sequence Alignments § Why compare and align sequences? § How to judge an alignment? § How to compute an alignment? § How to compute an alignment fast?

  24. Levenshtein (or Edit) Distance: minimum number of insertions, deletions, and replacements to convert string a into string b

  25. Levenshtein (or Edit) Distance. Let $a = a_1 \ldots a_m$ and $b = b_1 \ldots b_n$ be strings. Then $\mathrm{lev}_{a,b} = \mathrm{lev}_{a,b}(m, n)$ is the Levenshtein distance of $a$ and $b$, where

      $$
      \mathrm{lev}_{a,b}(i, j) =
      \begin{cases}
        \max(i, j) & \text{if } \min(i, j) = 0, \\
        \min \begin{cases}
          \mathrm{lev}_{a,b}(i-1, j) + 1 \\
          \mathrm{lev}_{a,b}(i, j-1) + 1 \\
          \mathrm{lev}_{a,b}(i-1, j-1) + 1_{(a_i \neq b_j)}
        \end{cases} & \text{otherwise,}
      \end{cases}
      $$

      and $0 \le i \le m$ and $0 \le j \le n$ and

      $$
      1_{(a_i \neq b_j)} :=
      \begin{cases}
        1 & \text{if } a_i \neq b_j, \\
        0 & \text{if } a_i = b_j.
      \end{cases}
      $$
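The recurrence on slide 25 translates directly into a dynamic-programming table. The sketch below fills the table row by row; the function name `levenshtein` and the list-of-lists layout are choices made for this illustration, not something the slides prescribe.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the full (m+1) x (n+1) table."""
    m, n = len(a), len(b)
    # lev[i][j] holds the distance between the prefixes a[:i] and b[:j].
    lev = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        lev[i][0] = i  # min(i, j) = 0, so the value is max(i, j) = i
    for j in range(n + 1):
        lev[0][j] = j  # min(i, j) = 0, so the value is max(i, j) = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Indicator term: 1 if the characters differ, 0 otherwise.
            differ = 0 if a[i - 1] == b[j - 1] else 1
            lev[i][j] = min(
                lev[i - 1][j] + 1,           # deletion
                lev[i][j - 1] + 1,           # insertion
                lev[i - 1][j - 1] + differ,  # replacement or match
            )
    return lev[m][n]


print(levenshtein("kitten", "sitting"))  # 3
```

The table has (m+1)(n+1) cells and each is filled in constant time, so the running time is proportional to the product of the two string lengths.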

  26. From Distance to Alignment. Aligning RDISLVKNAGI and RNILVSDAKNVGI (from lectures.molgen.mpg.de/Alg/Intro/):

      R D I S L V - - - K N A G I
      R N I - L V S D A K N V G I
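An alignment can be read off the edit-distance table by tracing back from the bottom-right cell to the top-left one. The following is a minimal sketch under unit costs; `align` is a hypothetical name, and when several alignments are equally good the traceback order chosen here (diagonal first) may return a different one than the alignment shown on the slide.

```python
def align(a: str, b: str) -> tuple[str, str]:
    """Return one optimal unit-cost alignment of a and b as two gapped rows."""
    m, n = len(a), len(b)
    lev = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        lev[i][0] = i
    for j in range(n + 1):
        lev[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            differ = 0 if a[i - 1] == b[j - 1] else 1
            lev[i][j] = min(lev[i - 1][j] + 1,
                            lev[i][j - 1] + 1,
                            lev[i - 1][j - 1] + differ)

    # Trace back from (m, n) to (0, 0), emitting one column per step.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        differ = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and lev[i][j] == lev[i - 1][j - 1] + differ:
            row_a.append(a[i - 1])   # match or replacement
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and lev[i][j] == lev[i - 1][j] + 1:
            row_a.append(a[i - 1])   # character of a against a gap
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")        # character of b against a gap
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


top, bottom = align("RDISLVKNAGI", "RNILVSDAKNVGI")
print(top)
print(bottom)
```

Every column in which the two rows differ corresponds to one edit operation, so the number of such columns equals the Levenshtein distance.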

  27. Sequence Alignments § Why compare and align sequences? § How to judge an alignment? § How to compute an alignment? § How to compute an alignment fast?

  28. Computing Alignments fast (compbio.pbworks.com)

  29. Computing multiple sequence alignments

  30. Computing phylogenetic trees § Distance-based: neighbour joining, hierarchical clustering § Character-based: parsimony method, maximum likelihood (Saitou, Kyushu Museum, 2002)
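As an illustration of the distance-based route, the sketch below builds a tree topology by average-linkage hierarchical clustering (often called UPGMA). It is not neighbour joining and it omits branch lengths. The function name `upgma` and the input format are assumptions made for this example, and the toy distances are simply the 128 alignment columns of slide 12 minus the identical-residue counts given there, which is an illustrative choice rather than a proper evolutionary distance.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples.

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    """
    # Each cluster is a pair (subtree, tuple of leaf names in that subtree).
    clusters = [(name, (name,)) for name in names]

    def average_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average leaf-to-leaf distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Toy distances: 128 alignment columns minus identical residues (95, 82, 75).
distances = {
    frozenset({"horse", "whale"}): 128 - 95,
    frozenset({"whale", "kangaroo"}): 128 - 82,
    frozenset({"horse", "kangaroo"}): 128 - 75,
}
print(upgma(["horse", "whale", "kangaroo"], distances))
# ('kangaroo', ('horse', 'whale'))
```

With these toy distances, horse and Minke whale are joined first and the red kangaroo attaches last.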
