An introduction to multiple alignments
Original version by Cédric Notredame, updated by Laurent Falquet
Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique
CN+LF-2006.01

Overview
• Multiple alignments
  • How-to, goal, problems, use
• Patterns
  • PROSITE database, syntax, use
• PSI-BLAST
  • BLAST, matrices, use
• [Profiles/HMMs] …

Overview
• What are multiple alignments?
• How can I use my alignments?
• How does the computer align the sequences?
• The progressive alignment algorithm
• What are the difficulties?
• Prerequisites?
• How can we compare sequences?
• How can we align sequences?

Sometimes two sequences are not enough
The man with TWO watches NEVER knows the exact time.

What is a multiple sequence alignment?
• What can it do for me?
• How can I produce one of these?
• How can I use it?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
            ***. ::: .: ..  .    :  . .      *  .  *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      *   : .* . :

What is a multiple sequence alignment?
• Structural/biochemical criteria
  • Residues playing a similar role end up in the same column.
• Evolution criteria
  • Residues having the same ancestor end up in the same column.

[Same alignment as on the previous slide]
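
The '*' symbols under the alignment mark the columns in which every sequence carries the same residue. The following is a minimal Python sketch of how such columns can be found, assuming the first block of the slide's alignment is typed in as plain strings. The names `ALIGNMENT`, `conserved_columns` and `star_line` are illustrative and do not belong to any alignment program; the ':' and '.' symbols, which depend on residue similarity groups, are deliberately not reproduced.

```python
# First block of the alignment shown on the slide; '-' marks a gap.
ALIGNMENT = {
    "chite": "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",
    "wheat": "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",
    "mouse": "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",
}


def conserved_columns(aligned):
    """Return the 0-based indices of gap-free columns holding a single residue."""
    rows = list(aligned.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for index, column in enumerate(zip(*rows)):
        if "-" not in column and len(set(column)) == 1:
            hits.append(index)
    return hits


def star_line(aligned):
    """Build a line with '*' under fully conserved columns and spaces elsewhere."""
    length = len(next(iter(aligned.values())))
    marks = set(conserved_columns(aligned))
    return "".join("*" if i in marks else " " for i in range(length))


if __name__ == "__main__":
    for name, sequence in ALIGNMENT.items():
        print(f"{name} {sequence}")
    # Six spaces keep the marks under the sequences (5-letter name + 1 space).
    print(" " * 6 + star_line(ALIGNMENT))
```

Only identity is tested here, so the sketch marks fewer columns than the slide's consensus line, which also flags conservative substitutions.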

How can I use a multiple alignment?

chite   ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat   --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr   KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
unknown -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
              ***. ::: .: ..  .    :  . .      *  .  *: *

chite   AATAKQNYIRALQEYERNGG-
wheat   ANKLKGEYNKAIAAYNKGESA
trybr   AEKDKERYKREM---------
unknown AKDDRIRYDNEMKSWEEQMAE
        *   : .* . :

Less than 30% identity, BUT conserved where it MATTERS.

Extrapolation
[Diagram labels: Unknown sequence, Homology?, SwissProt]
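
The "less than 30% identity" remark can be checked directly from the aligned rows. The sketch below is one possible way to do it in Python, assuming identity is counted only over columns where both sequences have a residue (other definitions exist and give different numbers). The names `ALIGNED` and `percent_identity` are illustrative.

```python
# Both blocks of the slide's alignment, joined per sequence; '-' marks a gap.
ALIGNED = {
    "chite": ("---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD"
              "AATAKQNYIRALQEYERNGG-"),
    "wheat": ("--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE"
              "ANKLKGEYNKAIAAYNKGESA"),
    "trybr": ("KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP"
              "AEKDKERYKREM---------"),
    "unknown": ("-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP"
                "AKDDRIRYDNEMKSWEEQMAE"),
}


def percent_identity(seq_a, seq_b):
    """Percent identical residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    query = ALIGNED["unknown"]
    for name, sequence in ALIGNED.items():
        if name != "unknown":
            print(f"unknown vs {name}: {percent_identity(query, sequence):.1f}% identity")
```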

How can I use a multiple alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
            ***. ::: .: ..  .    :  . .      *  .  *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      *   : .* . :

Extrapolation
• Prosite Patterns
  P-K-R-[PA]-x(1)-[ST]…

How can I use a multiple alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-IQGKLKLVNEAWKNLSP
            ***. ::: .: ..  .    :  . .      *  .  *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      *   : .* . :

[Figure annotations: L?, K>R; amino acid letters A F D E F G H Q I V L W]

Extrapolation
• Prosite Patterns
• Prosite Profiles
  • More sensitive
  • More specific
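
A PROSITE pattern reads naturally as a regular expression. The sketch below translates the basic pattern elements (single residues, `[..]` allowed sets, `{..}` forbidden sets, `x` wildcards, `(n)` and `(n,m)` repeats, `<` and `>` anchors) into Python `re` syntax. It uses only the part of the pattern that is visible on the slide, `P-K-R-[PA]-x(1)-[ST]`; the remainder is elided there and is not guessed. The function name `prosite_to_regex` is illustrative, and less common syntax is rejected rather than handled.

```python
import re

# One pattern element: optional '<', a core, optional repeat, optional '>'.
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate basic PROSITE pattern syntax into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


if __name__ == "__main__":
    # Only the visible start of the slide's pattern; the rest is elided there.
    regex = re.compile(prosite_to_regex("P-K-R-[PA]-x(1)-[ST]"))
    ungapped = {
        "chite": "ADKPKRPLSAYMLWLNSARESIKRENPDFKVTEVAKKGGELWRGLKD",
        "wheat": "DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",
        "trybr": "KKDSNAPKRAMTSFMFFSSDFRSKHSDLSIVEMSKAAGAAWKELGP",
        "mouse": "KPKRPRSAYNIYVSESFQEAKDDSAQGKLKLVNEAWKNLSP",
    }
    for name, sequence in ungapped.items():
        hit = regex.search(sequence)
        print(name, hit.group(0) if hit else "no match")
```

A pattern either matches or it does not, which is the limitation that profiles address by scoring every residue at every position.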

PROSITE profile (see also HMMs)
A substitution cost for every amino acid, at every position.

How can I use a multiple alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
            ***. ::: .: ..  .    :  . .      *  .  *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      *   : .* . :

Phylogeny
• Evolution
• Paralogy/Orthology
[Tree figure with leaves: chite, wheat, trybr, mouse]
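
Returning to the profile idea stated above (a score for every amino acid at every position), the sketch below builds a small position-specific scoring table from a gap-free block of the alignment. It is a simplified illustration, not the procedure used to build real PROSITE profiles: it assumes a uniform background frequency of 1/20, a flat pseudocount of 1, and no sequence weighting or gap penalties. The names `BLOCK`, `build_profile` and `score_segment` are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Alignment columns 6 to 14 of the slide (gap-free in all four sequences).
BLOCK = [
    "KPKRPLSAY",  # chite
    "KPKRAPSAF",  # wheat
    "APKRAMTSF",  # trybr
    "KPKRPRSAY",  # mouse
]


def build_profile(rows, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dictionary per column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*rows):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            frequency = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(frequency / background)
        profile.append(scores)
    return profile


def score_segment(profile, segment):
    """Sum the per-position scores of a segment as long as the profile."""
    if len(segment) != len(profile):
        raise ValueError("segment and profile must have the same length")
    return sum(column[aa] for column, aa in zip(profile, segment))


if __name__ == "__main__":
    profile = build_profile(BLOCK)
    for candidate in ("KPKRPLSAY", "KPRRALTAF", "GGGGGGGGG"):
        print(candidate, round(score_segment(profile, candidate), 2))
```

Unlike a pattern, the table still gives a graded score to a segment that carries a substitution never seen in the alignment, which is one way to read the "more sensitive" claim on the previous slide.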

How can I use a multiple alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
            ***. ::: .: ..  .    :  . .      *  .  *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      *   : .* . :

Phylogeny
Structure prediction
Column constraint -> Evolution constraint -> Structure constraint

How can I use a multiple alignment?

[Same alignment as on the previous slide]

Phylogeny
Structure prediction
• PsiPred or PhD for secondary structure prediction: 75% accurate.
• Threading: is improving but is not yet as good.

How can I use a multiple alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
            ***. ::: .: ..  .    :  . .      *  .  *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      *   : .* . :

Phylogeny
Structure prediction
Caution! Automatic multiple sequence alignment methods are not always perfect…

The problem
• Why is it difficult to compute a multiple sequence alignment?

Biology: What is a good alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
            ***. ::: .: ..  .    :  . .      *  .  *: *

Computation: What is the good alignment?

The problem
• Why is it difficult to compute a multiple sequence alignment?

CIRCULAR PROBLEM....
Good sequences <-> Good alignment

The problem
• Same as pairwise alignment problem
• We do NOT know how sequences evolve.
• We do NOT understand the relation between structures and sequences.
• We would NOT recognize the “correct” alignment if we had it IN FRONT of our eyes…

The Charlie Chaplin paradox

What do I need to know to make a good multiple alignment?
• How do sequences evolve?
• How does the computer align the sequences?
• How can I choose my sequences?
• What is the best program?
• How can I use my alignment?

An alignment is a story

[Figure: one sequence, ADKPKRPLSAYMLWLN, changes along two lineages through mutations + selection. Event labels on the figure: Deletion, Insertion, Mutation.]

Starting sequence:  ADKPKRPLSAYMLWLN
After the changes:  ADKPRRPLS-YMLWLN
                    ADKPKRPKPRLSAYMLWLN

Alignment of the two resulting sequences:
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
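
For two sequences, one standard answer to "How does the computer align the sequences?" is dynamic programming. The sketch below is a plain global alignment in the Needleman-Wunsch style, applied to the two sequences from "An alignment is a story". The scoring values (match +1, mismatch -1, gap -1 per position) are assumptions chosen for illustration, not those of any particular program. With such a simple scheme the gaps need not land in the same columns as on the slide, since several alignments can share the best score.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(seq_a), len(seq_b)

    def substitution(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align two residues
                score[i - 1][j] + gap,                     # gap in seq_b
                score[i][j - 1] + gap,                     # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = needleman_wunsch("ADKPRRPLSYMLWLN", "ADKPKRPKPRLSAYMLWLN")
    print(top)
    print(bottom)
    print("score:", total)
```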

Homology
• Same sequences -> same origin? -> same function? -> same 3D fold?

[Figure: % sequence identity plotted against length (mark at 100); labels: "Same 3D fold", "30%", "Twilight zone".]

Residues and mutations
• All residues are equal, but some more than others…

[Figure: diagram of the amino acids grouped by property; group labels: Small, Aliphatic, Aromatic, Hydrophobic, Polar.]

Accurate matrices are data driven rather than knowledge driven.