Alignments in Practice: BLAST and CLUSTAL

  1. Alignments in Practice: BLAST and CLUSTAL
     Introduction to Bioinformatics, Dortmund, 16.-20.07.2007
     Lectures: Sven Rahmann
     Exercises: Udo Feldkamp, Michael Wurst

  2. Overview
     ● Dot Plots
     ● Nucleotide BLAST
     ● Protein BLAST
     ● BLAST Statistics
     ● BLAT
     ● CLUSTAL
     ● JalView

  3. Dotter – Tool for Dot Plots
     ● http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
     ● Dotlet: a Java applet for Dot Plots

  4. Dot Plots
     ● Hemoglobin Alpha against Hemoglobin Beta
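A dot plot marks every position pair (i, j) at which the two sequences agree. The following is a minimal sketch of that idea, not how Dotter or Dotlet are implemented; the function names, the `window` and `min_matches` parameters, and the toy sequences are assumptions made for illustration (they are not real hemoglobin sequences).

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=1):
    """Return a grid of booleans: cell (i, j) is True when the window of
    seq_a starting at i and the window of seq_b starting at j agree in at
    least min_matches positions."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        plot.append(row)
    return plot


def render(plot):
    """Draw the grid as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)


# Toy sequences, invented for this example.
print(render(dot_plot("HEAGAWGHEE", "PAWHEAE")))
```

Similar regions show up as diagonal runs of dots. Raising `window` and `min_matches` suppresses isolated chance matches, which is the usual way to make such diagonals stand out.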

  5. EBI Alignment Service

  6. BLAST
     ● URL: http://www.ncbi.nlm.nih.gov/BLAST/
     ● Basic Local Alignment Search Tool

  7. Choose the right BLAST

  8. Nucleotide BLAST Interface

  9. BLAST Parameters
     ● Expect threshold
       – low [0.01] = strict
       – high [100] = loose
     ● Word size: speed vs. sensitivity
       – high = faster
       – low = slower, but more sensitive
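The word-size trade-off can be seen on a toy example: the longer the required exact word, the fewer seeds two diverged sequences share. This is a simplified sketch using exact word matches only; the function name and the two sequences are made up for illustration, and real BLAST seeding is more elaborate.

```python
def shared_words(query, subject, word_size):
    """Return the set of words of length word_size found in both sequences."""
    def words(seq):
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}
    return words(query) & words(subject)


# Two invented sequences that differ at a few positions.
query = "ACGTTGCAAGCT"
subject = "ACGTAGCAAGGT"
for word_size in (3, 5, 7):
    print(word_size, sorted(shared_words(query, subject, word_size)))
```

With a small word size many short seeds survive the mismatches, so more candidate regions must be examined; with a large word size there are few seeds (or none), so the search is faster but may miss the similarity entirely.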

  10. Protein BLAST

  11. Protein BLAST Parameters

  12. Translated BLAST
      ● protein query against nucleotide database
        – nucleotide sequence not unique
        – also consider reverse complement
      ● nucleotide query against protein database
        – consider all 6 reading frames
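The six reading frames are the three codon offsets on the given strand plus the three on its reverse complement. A minimal sketch, assuming the standard genetic code and an unambiguous DNA alphabet; the function names and the frame labels ("+1" to "-3") are choices made for this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the translations of all six reading frames, keyed '+1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frames("ATGGCCTAA").items():
    print(label, protein)
```

The reverse direction (protein to nucleotide) has no such simple function, because most amino acids are encoded by several codons; that is the "not unique" point on the slide.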

  13. BLAST Output

  14. BLAST Output II
      ● Database + Accession
      ● Description
      ● Bit score
      ● E-value
      ● Link

  15. BLAST Statistics
      ● How good / reliable is a hit found by BLAST?
      ● Raw score := score of the alignment according to scoring matrix and gap penalties
      ● Bit score := raw score rescaled to log2 units (bits), normalized for the scoring system so that scores from different matrices are comparable
      ● E-value := expected number of hits with this score or better in a hypothetical database of random proteins of the same size
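These three quantities are linked by the Karlin-Altschul formulas: the bit score is S' = (λS − ln K) / ln 2, and the E-value is E = m · n · 2^(−S'), where m and n are the query and database lengths. The sketch below simply evaluates these formulas. The values of λ and K depend on the scoring matrix and gap penalties; the ones used here are illustrative assumptions, and real BLAST additionally corrects the lengths for edge effects.

```python
import math


def bit_score(raw_score, lam, k):
    """Rescale a raw score to bits: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score:
    E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative parameter values (assumed for this example).
LAMBDA, K = 0.267, 0.041
bits = bit_score(100, LAMBDA, K)
print(bits, e_value(bits, query_len=300, db_len=10_000_000))
```

Because the E-value scales with the database length, the same alignment becomes less significant when it is found in a larger database.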

  16. More on Statistics
      ● Null model := random model describing sequences without intentional signal (here: pair of random sequences without intentional similarity)
      ● (single) p-value for observed score s := Prob(Score >= s) in the null model, for one comparison
      ● (multiple) p-value := Prob(Score >= s at least once) in the null model, over all comparisons made in a search
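If the comparisons are treated as independent, the multiple p-value follows from the single one as 1 − (1 − p)^n. Likewise, if the number of chance hits is modeled as Poisson with mean E, the probability of at least one hit is 1 − e^(−E). The sketch below evaluates both; the independence and Poisson assumptions and the function names are simplifications introduced for this example.

```python
import math


def multiple_p_value(single_p, n_comparisons):
    """Probability of reaching the score at least once in n independent
    comparisons: 1 - (1 - p)**n, computed stably for very small p."""
    if single_p >= 1.0:
        return 1.0
    return -math.expm1(n_comparisons * math.log1p(-single_p))


def p_from_e(e_value):
    """Probability of at least one chance hit when the number of chance
    hits is Poisson distributed with mean e_value: 1 - exp(-E)."""
    return -math.expm1(-e_value)


# A score that looks rare in one comparison is unremarkable in a million.
print(multiple_p_value(1e-6, 1), multiple_p_value(1e-6, 1_000_000))
```

For small values the p-value and the E-value are nearly equal, which is why BLAST reports only the E-value.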

  17. BLAT
      ● BLAST-Like Alignment Tool
      ● index-based
      ● developed at UC Santa Cruz
      ● especially for searching in whole genomes
      ● very fast
      ● limited to nearly exact matches
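"Index-based" means that the genome is preprocessed once into a lookup table of short words, so each query only needs table lookups instead of a scan of the whole genome. The sketch below is loosely modeled on that idea (non-overlapping genome words, overlapping query words) and is not BLAT's actual implementation; the function names, the word length and the toy genome are assumptions.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(query, index, k):
    """Look up every overlapping k-mer of the query in the index and
    return (query_pos, genome_pos) pairs for exact matches."""
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.append((q, g))
    return hits


# Toy genome, invented for this example.
index = build_index("AAACCCGGGTTT", k=3)
print(find_hits("CCCGGG", index, k=3))
```

Hits that share the same difference `genome_pos - query_pos` lie on one diagonal and can be joined into a longer match. Since only exact words are looked up, a region needs stretches of identical sequence to be found, which matches the "nearly exact matches" limitation.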

  18. UCSC Genome Browser + BLAT

  19. CLUSTAL

  20. What Clustal Did (“Output file”)

  21. Clustal Results (pretty)

  22. Clustal Results (“alignment file”)

          CLUSTAL W (1.83) multiple sequence alignment

          FOS_RAT         MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFCADLSVSSANF 60
          FOS_MOUSE       MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFCADLSVSSANF 60
          FOS_HUMAN       MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNAQDFCTDLAVSSANF 60
          FOS_CHICK       MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFCTDLAVSSANF 60
          FOS_ZEBRAFISH   MMFTSLNADCDASS-RCSTASPSGDSVGYY------------PLNQTQEFTDLSVSSASF 47
                          **: .: .: :*.* ***:***:***: ** *:* : :**:****.*

          FOS_RAT         IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTPS-TGAYARAGVVKTMSGGR 119
          FOS_MOUSE       IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQS-AGAYARAGMVKTVSGGR 119
          FOS_HUMAN       IPTVTAISTSPDLQWLVQPALVSSVAPSQTRAPHPFGVPAPS-AGAYSRAGVVKTMTGGR 119
          FOS_CHICK       VPTVTAISTSPDLQWLVQPTLISSVAPSQNRG-HPYGVPAPAPPAAYSRPAVLKAP-GGR 118
          FOS_ZEBRAFISH   VPTVTAISSCPDLQWMVQP-MISSAAPS-------NGAAQSYNPSSYPKMRVTGAK---- 95
                          :*******:.*****:*** ::**.*** * . ..:*.: : :

          FOS_RAT         AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 179
          FOS_MOUSE       AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 179
          FOS_HUMAN       AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 179
          FOS_CHICK       GQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEEEKSAL 178
          FOS_ZEBRAFISH   --TSNKRSRSEQLSPEEEEKKRVRRERSKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 153
                          : .:*.: **********:*:****.**************************:*****

          FOS_RAT         QTEIANLLKEKEKLEFILAAHRPACKIPNDLGFPEE----MSVTS-LDLTGGLPEATTPE 234
          FOS_MOUSE       QTEIANLLKEKEKLEFILAAHRPACKIPDDLGFPEE----MSVAS-LDLTGGLPEASTPE 234
          FOS_HUMAN       QTEIANLLKEKEKLEFILAAHRPACKIPDDLGFPEE----MSVAS-LDLTGGLPEVATPE 234
          FOS_CHICK       QAEIANLLKEKEKLEFILAAHRPACKMPEELRFSEE----LAAATALDLG----APSPAA 230
          FOS_ZEBRAFISH   QNDIANLLKEKERLEFILAAHKPICKIPADASFPEPSSSPMSSISVPEIVTTSVVSSTPN 213
                          * :*********:********:* **:* : *.* :: : :: :..
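An alignment file in this layout is easy to read programmatically: each block repeats the sequence names, and the rows of one name are concatenated across blocks. The following is a minimal sketch of such a reader together with a pairwise percent-identity measure. The function names are made up, the parser handles only the simple layout shown here, and the embedded toy alignment (`seqA`, `seqB`) is invented for the example.

```python
def parse_clustal(text):
    """Collect the aligned rows of each sequence from a Clustal-style file.
    Header, blank and conservation lines are skipped."""
    sequences = {}
    for line in text.splitlines():
        if not line or line[0].isspace() or line.startswith("CLUSTAL"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # a trailing residue count is ignored
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


def percent_identity(row_a, row_b):
    """Percentage of identical residues among columns without a gap."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Toy alignment in two blocks, invented for this example.
TOY_ALIGNMENT = """\
CLUSTAL W (1.83) multiple sequence alignment

seqA   MMFSG-NAD 8
seqB   MMFTGLNAD 9
       ***:* ***

seqA   YEA 11
seqB   CDA 12
         *
"""

rows = parse_clustal(TOY_ALIGNMENT)
print(rows["seqA"], rows["seqB"], percent_identity(rows["seqA"], rows["seqB"]))
```

Other definitions of percent identity divide by the full alignment length or by the shorter sequence length; the choice of denominator here is one convention among several.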

  23. Clustal Guide Tree

  24. Clustal Guide Tree
      ● Guide tree is not a phylogenetic tree, just a computational device
      ● Cladogram: edge lengths have no meaning
      ● Phylogram: edge lengths correspond to distances
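To illustrate what "computational device" means: a guide tree only fixes the order in which sequences are merged into the growing alignment, closest pairs first. The sketch below builds such a merge order with simple average-linkage (UPGMA-style) clustering, which is swapped in here for brevity; Clustal W itself derives its guide tree by neighbour joining. The function name and the pairwise distances are invented for this example.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster by repeatedly merging the two closest clusters.
    distances maps (name_a, name_b) pairs to a distance.
    Returns the merge order as nested tuples."""
    sizes = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Size-weighted average of the distances to the merged members.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Made-up pairwise distances, for illustration only.
toy_distances = {
    ("rat", "mouse"): 0.02, ("rat", "human"): 0.05, ("mouse", "human"): 0.06,
    ("rat", "fish"): 0.50, ("mouse", "fish"): 0.50, ("human", "fish"): 0.52,
}
print(upgma(["rat", "mouse", "human", "fish"], toy_distances))
```

The nested tuples are read inside out: the innermost pair is aligned first, and each enclosing level adds a sequence or group to the existing alignment. Nothing in this procedure models evolution, which is why the result should not be interpreted as a phylogeny.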

  25. JalView: Alignment Editor (start from the CLUSTAL web site)

  26. Simple JalView Window
      ● Simple alignment editor (Java applet)
      ● Complex alignment editor (Java application)
        – Web Start, or
        – Download installer

  27. Starting or Installing JalView
      www.jalview.org

  28. Multiple Alignment @ BiBiServ

  29. For Windows/Mac: QAlign2
      ● URL: http://gi.cebitec.uni-bielefeld.de/QAlign/
      ● Live Demo of QAlign2
