Intro to Alignment Algorithms: Global and Local


  1. Intro to Alignment Algorithms: Global and Local. Algorithmic Functions of Computational Biology, Professor Istrail

  2. Sequence Comparison. Biomolecular sequences:
     DNA sequences (strings over the 4-letter alphabet {A, C, G, T})
     RNA sequences (strings over the 4-letter alphabet {A, C, G, U})
     Protein sequences (strings over the 20-letter alphabet of amino acids)
     Sequence similarity helps in the discovery of genes and in the prediction of the structure and function of proteins.

  3. The Basic Similarity Analysis Algorithm: Global Similarity • Scoring Schemes • Edit Graphs • Alignment = Path in the Edit Graph • The Principle of Optimality • The Dynamic Programming Algorithm • The Traceback

  4. The Sequence Alignment Problem. Input: two sequences over the same alphabet and a scoring scheme. Output: an alignment of the two sequences of maximum score. Example: GCGCATTTGAGCGA and TGCGTTAGGGTGACCA. A possible alignment, in which each column is a match, a mismatch, or an indel:
     -GCGCATTTGAGCGA--
     TGCG--TTAGGGTGACC

  5. Mismatch, Deletion, Insertion (CSCI2820, Class 4)
     Mismatch:
       TCAGGGGGCTATT
       AGTCCTCCGATAA
     Deletion (in template):
       TCAGGGGGCTATT
       AGTCC-CCGATAA
     Insertion (in template):
       TCAGGGGG-CTATT
       AGTCCCCCCGATAA

  6. Consider two sequences $X = x_1 x_2 \ldots x_n$ and $Y = y_1 y_2 \ldots y_m$ over the alphabet $\Sigma = \{A, C, G, T\}$, where every $x_i$ and $y_j$ belongs to $\Sigma$.

  7. Scoring Schemes: Unit-score
     δ   A   C   G   T   -
     A   1   0   0   0   0
     C   0   1   0   0   0
     G   0   0   1   0   0
     T   0   0   0   1   0
     -   0   0   0   0   0

  8. Alignment
     A C G
     | | |
     A G G
     A is aligned with A, C is aligned with G, G is aligned with G.
     Under the unit-score scheme: Score = δ(A,A) + δ(C,G) + δ(G,G) = 1 + 0 + 1 = 2

  9. Gaps. "-" is the gap symbol.
     Without gaps (score 7):            With gaps (score 8, optimal):
       ACATGGAAT                          ACATGG-AAT
       ACAGGAAAT                          ACA-GGAAAT
     Without gaps (score 0):            With gaps (score 3, optimal):
       AAAGGG                             ---AAAGGG
       GGGAAA                             GGGAAA---

  10. Scores for aligned pairs:
      δ(x,y) = the score for aligning x with y
      δ(x,-) = the score for aligning x with -
      δ(-,y) = the score for aligning - with y

  11. Alignment
      A-CG-G
      ATCGTG
      Score = δ(A,A) + δ(-,T) + δ(C,C) + δ(G,G) + δ(-,T) + δ(G,G)
      The score of an alignment is the sum of the scores of the pairwise aligned symbols.
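
The column-by-column sum described above can be sketched in a few lines of Python. This is only an illustration: the function names `unit_score` and `alignment_score` are made up here, and the scoring function is the unit-score scheme from slide 7 (1 for identical letters, 0 for everything else).

```python
GAP = "-"

def unit_score(a: str, b: str) -> int:
    """Unit-score scheme: 1 for two identical letters, 0 for a mismatch or a gap."""
    return 1 if a == b and a != GAP else 0

def alignment_score(top: str, bottom: str, delta=unit_score) -> int:
    """Sum delta over the columns of an alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same number of columns")
    return sum(delta(a, b) for a, b in zip(top, bottom))

print(alignment_score("ACG", "AGG"))                # 2
print(alignment_score("ACATGGAAT", "ACAGGAAAT"))    # 7
print(alignment_score("ACATGG-AAT", "ACA-GGAAAT"))  # 8
```

Passing a different `delta` (for example a lookup into a PAM matrix) changes the scheme without changing the summation.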

  12. Margaret Dayhoff & PAM Similarity Matrices (ARTEMIS Summer 2008, Professor Istrail)

  13. Dr. Margaret Oakley Dayhoff: The Mother & Father of Bioinformatics

  14. Scoring Scheme: Dayhoff PAM scoring matrices (partial; entries not shown on the slide are marked "...")
      δ    -   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
      -   -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8
      A   -8   3  -3   0   0  -3  -1   0   1  -3  -1  -3  -2  -2  -4   1   1   1  -7  -4   0
      R   -8 ...   6 ...
      N  ...           4 ...
      D  ...
      Partial alignment for Monkey and Trout somatotropin proteins:
      PTIPLSRLFDNAMLRAHRLHQ
      SAIENQRLFNIAVSRVQHLHL

  15. Scoring Functions. Mutations = substitutions, insertions, deletions. Scoring function = a sum of terms, one for each pair of aligned residues and one for each gap. The meaning = the log of the relative likelihood that the sequences are related, compared to being unrelated. Identities and conservative substitutions are positive terms. Non-conservative substitutions are negative terms.
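
The "log of the relative likelihood" reading can be written out under simplifying assumptions that the slides do not state: alignment columns are independent and there are no gaps. The symbols $p_{ab}$ (probability of seeing residues $a$ and $b$ aligned in related sequences) and $q_a$ (background frequency of residue $a$) are introduced here only for illustration.

```latex
\[
S(x, y)
  = \log \frac{P(x, y \mid \text{related})}{P(x, y \mid \text{unrelated})}
  = \log \prod_{k} \frac{p_{x_k y_k}}{q_{x_k}\, q_{y_k}}
  = \sum_{k} \delta(x_k, y_k),
\qquad
\delta(a, b) = \log \frac{p_{ab}}{q_a\, q_b}.
\]
```

Under this reading a term is positive exactly when $p_{ab} > q_a q_b$, that is, when the pair appears in related sequences more often than chance predicts, which is the case for identities and conservative substitutions.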

  16. Global alignment problem • Input: Sequences X and Y of length m and n respectively, and a similarity matrix • Output: An optimal global alignment of X and Y • Global alignments require that all bases in both sequences be aligned

  17. Local alignment problem • Input: Sequences X and Y of length m and n respectively, and a similarity matrix • Output: An optimal local alignment of X and Y • Local alignments do not require using all bases of either sequence in the alignment • Applicable when looking for subsequences of similarity

  18. The Edit Graph. Suppose that we want to align AGT with AT. We are going to construct a graph where alignments between the two sequences correspond to paths between the Begin and End nodes of the graph. This is the Edit Graph.

  19. The sequence AGT has length 3 (positions 0, 1, 2, 3). The sequence AT has length 2 (positions 0, 1, 2). The Edit Graph has (3+1)*(2+1) nodes.

  20. [Figure: grid of nodes with columns 0 to 3 labeled A, G, T and rows 0 to 2 labeled A, T; Begin is the top-left node and End is the bottom-right node.] AGT indexes the columns, and AT indexes the rows of this "table".

  21. [Figure: the same grid, from Begin to End.] The Graph is directed. The nodes (i,j) will hold values.

  22. [Figure: the edit graph grid for AGT (columns 0 to 3) and AT (rows 0 to 2), from Begin to End.]

  23. Directed edges get as labels pairs of aligned letters. [Figure: the edit graph for AGT and AT with every edge labeled: horizontal edges pair a letter of AGT with a gap, vertical edges pair a gap with a letter of AT, and diagonal edges pair a letter of AGT with a letter of AT.]

  24. Alignment = Path in the Edit Graph. [Figure: the labeled edit graph with the path of the following alignment highlighted.]
      AGT
      A-T
      Every path from Begin to End corresponds to an alignment. Every alignment corresponds to a path between Begin and End.
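
To make the path/alignment correspondence concrete, here is a small sketch that walks every Begin-to-End path of the edit graph and spells out the alignment each one encodes. The function name `alignments` and the recursive walk are illustrative choices, not something taken from the slides; node (i, j) means that i letters of the first sequence and j letters of the second have been consumed.

```python
def alignments(x: str, y: str):
    """Yield every alignment of x and y as a (top_row, bottom_row) pair.

    Each yielded pair is read off one Begin-to-End path of the edit graph.
    """
    def walk(i, j, top, bottom):
        if i == len(x) and j == len(y):   # reached the End node
            yield top, bottom
            return
        if i < len(x) and j < len(y):     # diagonal edge labeled (x_i, y_j)
            yield from walk(i + 1, j + 1, top + x[i], bottom + y[j])
        if i < len(x):                    # horizontal edge labeled (x_i, -)
            yield from walk(i + 1, j, top + x[i], bottom + "-")
        if j < len(y):                    # vertical edge labeled (-, y_j)
            yield from walk(i, j + 1, top + "-", bottom + y[j])

    yield from walk(0, 0, "", "")

paths = list(alignments("AGT", "AT"))
print((len("AGT") + 1) * (len("AT") + 1))  # 12 nodes in the edit graph
print(len(paths))                          # 25 Begin-to-End paths
print(("AGT", "A-T") in paths)             # True
```

Exhaustive enumeration grows very quickly with sequence length, which is why the slides turn to dynamic programming next.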

  25. The Principle of Optimality. The optimal answer to a problem is expressed in terms of the optimal answers to its sub-problems.

  26. Dynamic Programming. Given: two sequences X and Y. Find: an optimal alignment of X with Y. Part 1: compute the optimal alignment score. Part 2: construct an optimal alignment. The optimal alignment we are looking for is the maximal-score path in the Edit Graph from the Begin vertex to the End vertex.

  27. The DP Matrix S(i,j). [Figure: the grid for AGT (columns 0 to 3) and AT (rows 0 to 2), with the entries S(1,0) and S(2,1) marked.]

  28. The DP Matrix. Matrix S = [S(i,j)], where S(i,j) = the score of the maximal-score path from the Begin vertex to the vertex (i,j). The optimal path to (i,j) must pass through one of the vertices (i-1,j-1), (i-1,j), (i,j-1).

  29. Case: the optimal path to (i,j) arrives from (i-1,j). It is an optimal path to (i-1,j) followed by the edge labeled (xi, -), so its score is S(i-1,j) + δ(xi, -).

  30. Case: the optimal path to (i,j) arrives from (i-1,j-1). It is an optimal path to (i-1,j-1) followed by the edge labeled (xi, yj), so its score is S(i-1,j-1) + δ(xi, yj).

  31. Case: the optimal path to (i,j) arrives from (i,j-1). It is an optimal path to (i,j-1) followed by the edge labeled (-, yj), so its score is S(i,j-1) + δ(-, yj).

  32. The Basic ALGORITHM
      $$S(i,j) = \max \begin{cases} S(i-1,j-1) + \delta(x_i, y_j) \\ S(i-1,j) + \delta(x_i, -) \\ S(i,j-1) + \delta(-, y_j) \end{cases}$$
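
A minimal sketch of how this recurrence fills the DP matrix, using the unit-score scheme. The slides do not state the boundary conditions, so the first row and column here are an assumption: they accumulate gap scores along the edges of the edit graph. The names `global_score_matrix` and `unit_score` are illustrative.

```python
GAP = "-"

def unit_score(a: str, b: str) -> int:
    """Unit-score scheme: 1 for two identical letters, 0 otherwise."""
    return 1 if a == b and a != GAP else 0

def global_score_matrix(x: str, y: str, delta=unit_score):
    """Return S where S[i][j] is the best score of aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary: only gap edges lead to nodes in the first row/column.
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], GAP)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + delta(GAP, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),  # (x_i, y_j)
                S[i - 1][j] + delta(x[i - 1], GAP),           # (x_i, -)
                S[i][j - 1] + delta(GAP, y[j - 1]),           # (-, y_j)
            )
    return S

S = global_score_matrix("AGT", "AT")
# Print with AT indexing the rows and AGT indexing the columns.
for j in range(len("AT") + 1):
    print([S[i][j] for i in range(len("AGT") + 1)])
# [0, 0, 0, 0]
# [0, 1, 1, 1]
# [0, 1, 1, 2]
```

The bottom-right entry, S(3,2) = 2, is the optimal global score for AGT against AT under this scheme.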

  33. Optimal Alignment and Traceback. [Figure: the labeled edit graph for AGT and AT with the DP values at its nodes.]
                A   G   T
            0   1   2   3
        0   0   0   0   0
      A 1   0   1   1   1
      T 2   0   1   1   2
      Optimal alignment:
      AGT
      A-T
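
The traceback can be sketched as follows: fill the matrix, then walk back from the End node, at each step taking an edge whose score explains the current entry. The tie-breaking order (diagonal first, then the edge labeled (x_i, -), then (-, y_j)) and the gap-accumulating boundary are assumptions of this sketch, so it returns one optimal alignment, not necessarily the only one. The name `align_global` is illustrative.

```python
GAP = "-"

def unit_score(a: str, b: str) -> int:
    """Unit-score scheme: 1 for two identical letters, 0 otherwise."""
    return 1 if a == b and a != GAP else 0

def align_global(x: str, y: str, delta=unit_score):
    """Return (score, top_row, bottom_row) for one optimal global alignment."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], GAP)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + delta(GAP, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                S[i - 1][j] + delta(x[i - 1], GAP),
                S[i][j - 1] + delta(GAP, y[j - 1]),
            )

    # Traceback from End (n, m) to Begin (0, 0), collecting columns in reverse.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + delta(x[i - 1], GAP):
            top.append(x[i - 1]); bottom.append(GAP); i -= 1
        else:
            top.append(GAP); bottom.append(y[j - 1]); j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))

print(align_global("AGT", "AT"))  # (2, 'AGT', 'A-T')
```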

  34. The Basic ALGORITHM: Local Similarity. We add the 0 option to the maximum:
      $$S(i,j) = \max \begin{cases} 0 \\ S(i-1,j-1) + \delta(x_i, y_j) \\ S(i-1,j) + \delta(x_i, -) \\ S(i,j-1) + \delta(-, y_j) \end{cases}$$
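
A hedged sketch of the local variant follows. Two things here are assumptions rather than content of the slides: the scoring scheme (match +1, mismatch -1, gap -1, chosen because the 0 option only matters when some scores are negative) and the traceback rule (start at the highest-scoring cell and stop at the first cell holding 0). The names `simple_score` and `align_local` are illustrative.

```python
GAP = "-"

def simple_score(a: str, b: str) -> int:
    """Hypothetical scheme for illustration: match +1, mismatch -1, gap -1."""
    return 1 if a == b and a != GAP else -1

def align_local(x: str, y: str, delta=simple_score):
    """Return (score, top_row, bottom_row) for one best local alignment."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                0,                                            # start afresh here
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                S[i - 1][j] + delta(x[i - 1], GAP),
                S[i][j - 1] + delta(GAP, y[j - 1]),
            )
            if S[i][j] > best:
                best, best_i, best_j = S[i][j], i, j

    # Trace back from the best cell until a cell with score 0 is reached.
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + delta(x[i - 1], GAP):
            top.append(x[i - 1]); bottom.append(GAP); i -= 1
        else:
            top.append(GAP); bottom.append(y[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(align_local("AAAGGG", "GGGAAA"))  # (3, 'AAA', 'AAA')
```

For AAAGGG against GGGAAA the two runs AAA and GGG tie with score 3; this sketch reports the first one it encounters.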

  35. Protein global alignment. X = hlsek, Y = nlsak • X and Y represent a protein subsequence from the BRCA2 (early onset) protein in human and chimpanzee • Global alignments are used when the two sequences being compared represent a similar biological sequence

  36. Margaret Dayhoff's PAM 100 similarity matrix (partial)
           A   N   E   H   L   K   S   *
       A   4  -1   0  -3  -3  -3   1  -9
       N  -1   5   1   2  -4   1   1  -9
       E   0   1   5  -1  -5  -1  -1  -9
       H  -3   2  -1   7  -3  -2  -2  -9
       L  -3  -4  -5  -3   6  -4  -4  -9
       K  -3   1  -1  -2  -4   5  -1  -9
       S   1   1  -1  -2  -4  -1   4  -9
       *  -9  -9  -9  -9  -9  -9  -9   1
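
As an illustration, the global score of hlsek against nlsak can be computed from this partial matrix. The sketch assumes that the `*` row and column give the score of pairing a residue with a gap, which the slide does not state; the names `parse_matrix` and `global_score` are illustrative. It keeps only two rows of the DP matrix at a time, which is enough when only the score is needed.

```python
PAM100_PARTIAL = """
    A   N   E   H   L   K   S   *
A   4  -1   0  -3  -3  -3   1  -9
N  -1   5   1   2  -4   1   1  -9
E   0   1   5  -1  -5  -1  -1  -9
H  -3   2  -1   7  -3  -2  -2  -9
L  -3  -4  -5  -3   6  -4  -4  -9
K  -3   1  -1  -2  -4   5  -1  -9
S   1   1  -1  -2  -4  -1   4  -9
*  -9  -9  -9  -9  -9  -9  -9   1
"""

def parse_matrix(text: str) -> dict:
    """Turn the whitespace-separated table into a {(row, column): score} dict."""
    rows = [line.split() for line in text.strip().splitlines()]
    columns = rows[0]
    return {(r[0], c): int(v) for r in rows[1:] for c, v in zip(columns, r[1:])}

def global_score(x: str, y: str, delta: dict, gap: str = "*") -> int:
    """Optimal global alignment score, treating `gap` as the gap symbol (assumed)."""
    x, y = x.upper(), y.upper()
    prev = [0]
    for b in y:                                   # boundary: gaps against y
        prev.append(prev[-1] + delta[(gap, b)])
    for a in x:
        cur = [prev[0] + delta[(a, gap)]]         # boundary: a against a gap
        for j, b in enumerate(y, start=1):
            cur.append(max(
                prev[j - 1] + delta[(a, b)],      # a aligned with b
                prev[j] + delta[(a, gap)],        # a aligned with a gap
                cur[j - 1] + delta[(gap, b)],     # gap aligned with b
            ))
        prev = cur
    return prev[-1]

print(global_score("hlsek", "nlsak", parse_matrix(PAM100_PARTIAL)))  # 17
```

With a gap costing 9, the best alignment is the ungapped one: 2 (H,N) + 6 (L,L) + 4 (S,S) + 0 (E,A) + 5 (K,K) = 17.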
