Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines
Blaise Gassend, Charles W. O'Donnell, William Thies, Andrew Lee, Marten van Dijk, and Srinivas Devadas
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Workshop on Pattern Recognition in Bioinformatics, August 20, 2006
Protein Structure Prediction
• Classical problem: given sequence, predict structure
  [Figure: Sequence → Structure]
• High-level approaches
  1. Energy-minimization (ab initio) techniques
     - Elegant, but often lack correct parameters
  2. Homology-based techniques
     - Useful, but hard to predict new proteins
• Our approach: use energy minimization, but learn parameters from existing proteins
Our Framework (Training)
[Diagram: training loop]
• Protein Data Bank: supplies an amino-acid sequence and its correct structure
• Prediction Algorithm: takes the amino-acid sequence and the Energy Parameters, outputs a predicted structure
• Predicted structure correct: Done!
• Predicted structure incorrect: add to the Constraints energy(incorrect) > energy(correct)
• Learning Algorithm: takes the Constraints, outputs new Energy Parameters
Our Framework (Testing)
[Diagram: Amino-acid Sequence + Energy Parameters → Prediction Algorithm → Predicted structure]
Initial Focus: Secondary Structure
• Classify each residue as alpha helix, beta strand, or coil
  – In this paper, restrict to all-alpha proteins
• Applications:
  – Informing tertiary structure predictors
  – Identification of homologous proteins
  – Identification of active sites (coils)
Secondary Structure Predictors
[Figure: published predictors plotted by year and accuracy. x-axis: Year (1975 to 2010); y-axis: Prediction Accuracy (Q3), 50% to 100%. Predictors are divided into "Sequence Only" and "Sequence + Alignment". The plot is built up over several slides, adding one family of methods at a time.]
• Statistical Methods: Chou/Fasman, GOR, Zvelebil et al., DSC
• Neural Networks: Qian/Sejnowski, PHD, Riis/Krogh, SSPro, PSIPred (two entries), Peterson, SSPro4, Porter
• SVMs: Hua/Sun, Casbon, Ceroni, Kim, Ward, Nguyen, Hu
• HMMs: Schmidler et al., Martin (two entries), Nguyen, HMMSTR, Won
• Callouts added on the plot: "1400-2900 parameters"; "680 MB of support vectors"; "471 parameters • Exploits biochemical models • Offers biological insight"
• THIS PAPER: 302 params
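The accuracy axis of this plot is Q3, the fraction of residues whose predicted state (helix, strand, or coil) matches the reference. A minimal sketch of that metric follows; the function name and the one-letter state codes (H, E, C) are assumptions made for illustration, not notation taken from the slides.

```python
def q3_accuracy(predicted: str, actual: str) -> float:
    """Fraction of residues whose predicted state equals the reference state.

    Both arguments are per-residue state strings of equal length, using
    hypothetical one-letter codes: H = helix, E = strand, C = coil.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("structures must be non-empty")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


if __name__ == "__main__":
    # Hypothetical 12-residue example; the two strings differ at one position.
    print(q3_accuracy("CCCHHHHHHCCC", "CCHHHHHHHCCC"))
```

Multiplying the result by 100 gives the percentage scale used on the plot.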
Our Framework Applied to Helix Prediction
[Diagram: the training loop, with each box specialized to helix prediction]
• Protein Data Bank: Alpha Helices
• Example amino-acid sequence: MNIFEMLRIDEGL, with correct structure HHHHHHHHH
• Prediction Algorithm: Hidden Markov Model
• Learning Algorithm: Support Vector Machines
• Predicted structure correct: Done!
• Predicted structure incorrect: add to the Constraints energy(incorrect) > energy(correct)
Energy Parameters

Description of Energy Parameters                       Name      Number of Parameters
Energy of residue R in a helix                         H_R       20
Energy of residue R at offset i (-3…3) from N-cap      N_{R,i}   140
Energy of residue R at offset i (-3…3) from C-cap      C_{R,i}   140
Penalty for coils of length 1 or 2                               2
Total                                                            302

• Example:
  Sequence:  MNIFELRIDEGL
  Structure:    HHHHHH
  Energy = H_F + H_E + H_L + H_R + H_I + H_D   (Helix)
         + N_{M,-3} + N_{N,-2} + N_{I,-1} + N_{F,0} + N_{E,1} + N_{L,2} + N_{R,3}   (N-cap)
         + C_{L,-3} + C_{R,-2} + C_{I,-1} + C_{D,0} + C_{E,1} + C_{G,2} + C_{L,3}   (C-cap)
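The example sum can be reproduced mechanically. The sketch below lists which parameters are added for a single helix; it assumes, as the example suggests, that the offset-0 residue of the N-cap (C-cap) is the first (last) helical residue. The function name, the 0-based indexing, and the choice to skip offsets that fall off the sequence ends are illustrative assumptions, and the coil-length penalty is left out.

```python
def energy_terms(sequence: str, helix_start: int, helix_end: int) -> list[str]:
    """List the parameter names summed for one helix.

    helix_start and helix_end are 0-based, inclusive indices of the first
    and last helical residue (hypothetical convention for this sketch).
    """
    terms = []
    # One H_R term for every residue inside the helix.
    for i in range(helix_start, helix_end + 1):
        terms.append(f"H_{sequence[i]}")
    # Seven N-cap terms, then seven C-cap terms, for offsets -3..3.
    for name, cap in (("N", helix_start), ("C", helix_end)):
        for offset in range(-3, 4):
            pos = cap + offset
            if 0 <= pos < len(sequence):  # assumption: ignore offsets past the ends
                terms.append(f"{name}_{{{sequence[pos]},{offset}}}")
    return terms


if __name__ == "__main__":
    # The slide's example: residues F..D (indices 3..8) form the helix.
    print(" + ".join(energy_terms("MNIFELRIDEGL", 3, 8)))
```

With a table of learned values for each name, the energy would be the sum of the looked-up terms.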
Learning the Parameters
Feature Space
[Figure: 2-D feature space. x-axis: A, # of Alanines in Helices; y-axis: G, # of Glycines in Helices. Points mark the legal structures and the correct structure; an arrow marks w.]
• Energy(structure) = H_A * A + H_G * G = w · [A G], where w = [H_A H_G] represents the energy parameters
• Highest energy in direction of energy parameters w
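A small sketch of this two-feature view: count alanines and glycines that fall inside helices, then take the dot product with w. The function names, the use of '-' for non-helical positions, and the numeric values of w are assumptions for illustration only.

```python
def helix_counts(sequence: str, structure: str, residues: str = "AG") -> list[int]:
    """Count, for each residue type in `residues`, its occurrences inside helices.

    `structure` marks helical positions with 'H'; any other character
    (here '-', a hypothetical choice) is treated as non-helical.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [
        sum(1 for aa, ss in zip(sequence, structure) if ss == "H" and aa == r)
        for r in residues
    ]


def energy(w: list[float], features: list[int]) -> float:
    """Energy as the dot product w . features."""
    return sum(wi * fi for wi, fi in zip(w, features))


if __name__ == "__main__":
    w = [-1.0, 2.0]  # hypothetical [H_A, H_G], not learned values
    f = helix_counts("GAAGAG", "-HHHH-")  # [A, G] counts inside the helix
    print(f, energy(w, f))
```

Every legal structure for a sequence maps to one point [A, G] in this space, so comparing energies reduces to comparing projections onto w.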
Learning the Parameters
Feature Space
[Figure: 2-D feature space. x-axis: A, # of Alanines in Helices; y-axis: G, # of Glycines in Helices. Points mark the legal structures, the correct structure, and the predicted structure. Over successive slides the figure shows the current w, then a separating hyperplane, then the new w.]
1. Predict structure
2. Refine parameters
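The predict-then-refine loop can be sketched on a toy problem. This is not the method from the talk: brute-force enumeration stands in for the Hidden Markov Model prediction step, and a perceptron-style update with a fixed margin stands in for the Support Vector Machine step. All function names, the margin, the assumption that every H/- string is a legal structure, and the toy sequence are hypothetical.

```python
from itertools import product


def features(sequence, structure, residues):
    """Count each residue type in `residues` at helical ('H') positions."""
    return [
        sum(1 for aa, ss in zip(sequence, structure) if ss == "H" and aa == r)
        for r in residues
    ]


def energy(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def all_structures(n):
    # Assumption: every string over {H, -} counts as a legal structure.
    return ["".join(p) for p in product("H-", repeat=n)]


def predict(w, sequence, residues):
    """Return the minimum-energy structure by exhaustive search."""
    return min(
        all_structures(len(sequence)),
        key=lambda s: energy(w, features(sequence, s, residues)),
    )


def train(sequence, correct, residues, margin=1.0, max_rounds=100):
    w = [0.0] * len(residues)
    f_correct = features(sequence, correct, residues)
    rivals = [s for s in all_structures(len(sequence)) if s != correct]
    for _ in range(max_rounds):
        # Step 1: find the lowest-energy incorrect structure under the current w.
        rival = min(rivals, key=lambda s: energy(w, features(sequence, s, residues)))
        f_rival = features(sequence, rival, residues)
        # Stop once energy(incorrect) exceeds energy(correct) by the margin.
        if energy(w, f_rival) - energy(w, f_correct) >= margin:
            break
        # Step 2: shift w to raise the rival's energy relative to the correct one.
        w = [wi + fr - fc for wi, fr, fc in zip(w, f_rival, f_correct)]
    return w


if __name__ == "__main__":
    seq, truth = "AGAAG", "H-HH-"  # toy data: alanines helical, glycines not
    w = train(seq, truth, "AG")
    print(w, predict(w, seq, "AG"))
```

Each update enforces one violated constraint of the form energy(incorrect) > energy(correct); an SVM solver would instead satisfy all collected constraints at once while maximizing the margin.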