
Protein Structure Bioinformatics: Introduction, Secondary Structure Prediction & Fold Recognition



  1. Introduction to Protein Structure Bioinformatics (29.9.2004)
     Protein Structure Bioinformatics: Introduction, Secondary Structure Prediction & Fold Recognition
     EMBnet course, Basel, September 29, 2004
     Lorenza Bordoli, Swiss Institute of Bioinformatics

     Overview
     • Introduction
     • Secondary Structure Prediction
     • Fold Recognition

  2. Principles of protein structure
     • Primary Structure
     • Secondary Structure
     • Tertiary Structure (Fold)
     • Quaternary Structure

     Principles of protein structure
     Protein structures include:
     • Core region:
       - Secondary structure elements packed in close proximity in a hydrophobic environment
       - Limited amino acid substitution
     • Outside the core:
       - Loops and structural elements in contact with water, membrane or other proteins
       - Amino acid substitution: not as restricted as in the core

  3. PDB Holdings [two figures, not reproduced]

  4. Protein Structure Databases
     • PDB: http://www.pdb.org
       - X-ray, NMR => atom coordinates of the proteins are deposited in the PDB, the worldwide repository for 3-D biological macromolecular structure data.
     • EBI-MSD: http://www.ebi.ac.uk/msd/ (2003)
       - Suite of web-based search and retrieval interfaces for macromolecular structure research.

     Protein Structure Databases
     http://www.wwpdb.org/

  5. Introduction
     • Goal: understand the relationship between amino acid sequence and three-dimensional structure in proteins. Can we predict the structure from the sequence?
     • Currently: comparative (homology) modeling; see Thursday's lecture (Torsten)

     Homology Modeling
     Homology modeling = comparative protein modeling
     Structure is better conserved than sequence.
     Similar sequence => similar structure
     Idea: use experimental 3D structures of related family members (templates) to calculate a model for a new sequence (target).

  6. Flow chart: analyze a new protein sequence
     [Flow chart not reproduced. Box labels: Protein sequence; Database similarity search (BLAST); Protein family search (Pfam); Does sequence align with a protein of known structure?; Relationship to known structure?; Homology modeling; Predicted 3D structural model; Structure prediction (secondary structure, fold recognition); Hints for domain assignment? Function?; 3D structural analysis in laboratory]

     Secondary structure assignment
     • DSSP
       - Dictionary of Secondary Structure of Proteins (Kabsch & Sander, 1983)
       - Based on recognition of hydrogen-bonding patterns in known structures
     • Automated assignment of secondary structure
       - Interprets backbone hydrogen bonds
       - Uses a Coulomb approximation for the hydrogen bond energy (-0.5 kcal/mol cut-off)
       - Secondary structures are assigned to consecutive segments of residues with hydrogen bonds
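The Coulomb approximation mentioned for DSSP can be sketched in a few lines. The partial charges (0.42e and 0.20e) and the conversion factor 332 are taken from the Kabsch & Sander (1983) definition rather than from the slide, and the function names and the coordinate-tuple input format are illustrative assumptions.

```python
import math

# Constants from Kabsch & Sander (1983), not from the slide:
# partial charges of 0.42e (C=O) and 0.20e (N-H); 332 converts
# e^2 / Angstrom to kcal/mol.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol, the cut-off quoted on the slide


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a backbone C=O and an N-H group.

    Each argument is an (x, y, z) coordinate tuple in Angstrom.
    """
    d = math.dist
    return Q1 * Q2 * F * (1 / d(o, n) + 1 / d(c, h) - 1 / d(o, h) - 1 / d(c, n))


def is_hbond(c, o, n, h):
    """True if the energy is below the -0.5 kcal/mol cut-off."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF
```

For an idealized collinear C=O...H-N arrangement with an O...N distance of 2.9 Å, this gives an energy of roughly -3 kcal/mol, well below the cut-off.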

  7. Secondary structure assignment
     • DSSP secondary structure elements
     • 8 secondary structure classes (with their 3-state reduction):
       - H (α-helix) → H
       - G (3_10-helix) → H
       - I (π-helix) → H
       - E (extended strand) → E
       - B (residue in isolated β-bridge) → E
       - T (turn) → L
       - S (bend) → L
       - " " (blank = other) → L

     Secondary Structure prediction
     • What is protein secondary structure prediction?
       - Simplification of the prediction problem
       - 3D → 1D
     • Why do we need it?
       - As a starting point for 3D modeling:
         Improve sequence alignments
         Use in fold recognition (discover family/superfamily relationships)
         Definition of loops / core regions
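The 8-class to 3-state reduction listed for DSSP is a plain lookup. A minimal sketch follows; the function name is hypothetical, and mapping any unlisted symbol to L is an assumption rather than something the slide states.

```python
# 8 DSSP classes -> 3 states, exactly as listed on the slide.
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strand and isolated bridge
    "T": "L", "S": "L", " ": "L",   # turn, bend, other
}


def reduce_dssp(dssp_string):
    """Collapse a DSSP 8-class string into the 3-state alphabet H/E/L.

    Symbols not in the table are mapped to L (an assumption).
    """
    return "".join(DSSP_TO_3STATE.get(symbol, "L") for symbol in dssp_string)
```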

  8. Secondary Structure prediction
     • Assumption:
       - There should be a correlation between amino acid sequence and secondary structure
     • What can we predict?
       - α-helix
       - β-strand
       - Loop (coil)

     Secondary Structure prediction
     • Projection onto strings of structural assignments
     • "Secondary Structure" 3-state model: (S) β-strand (E), (H) α-helix, (L) loop

     SEQ MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAK
     SS  SSSSSSLLLLLLHHHHHHHHHHHLLLSSSLHHHHHHHHHHHLLLLLLHHH
     SS  SSSSSS      HHHHHHHHHHH   SSS HHHHHHHHHHH      HHH
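The projection onto a string of states can be made concrete with a short sketch that splits the slide's 3-state string into segments and reproduces the second display line, in which loops are blanked out. The function names are illustrative, not part of the course material.

```python
from itertools import groupby

# 3-state assignment string from the slide (S = strand, H = helix, L = loop).
SS = "SSSSSSLLLLLLHHHHHHHHHHHLLLSSSLHHHHHHHHHHHLLLLLLHHH"


def segments(ss):
    """Return runs of identical states as (state, start, end), 1-based inclusive."""
    result, position = [], 1
    for state, run in groupby(ss):
        length = len(list(run))
        result.append((state, position, position + length - 1))
        position += length
    return result


def mask_loops(ss):
    """Blank out loop positions so only helices and strands remain visible."""
    return ss.replace("L", " ")
```

Calling `segments(SS)` lists the strand, loop and helix runs in order, and `mask_loops(SS)` yields the loop-free display line.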

  9. Accuracy of prediction
     • 3-state-per-residue accuracy:
       - Gives the % of correctly predicted residues in α, β or other state
       - $Q_3 = 100 \cdot \frac{\sum_i c_i}{N}$
         N = total number of residues
         c_i = number of correctly predicted residues in state i (H, E, L)

     Performance Evaluation
     • Assumption: there should be a correlation* between amino acid sequence and secondary structure
     • Systematic performance testing is a prerequisite for the reliability of a method

     [Diagram: the PDB dataset is split into a training set (PDB sub-set: derive correlation*) and a test set (PDB sub-set => Q3)]
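The Q3 definition translates directly into code. This is a minimal sketch that assumes the prediction and the reference assignment are equal-length strings over H, E and L; the function name is hypothetical.

```python
from collections import Counter


def q3(predicted, observed):
    """Three-state per-residue accuracy: 100 * sum_i(c_i) / N.

    c_i counts residues correctly predicted in state i; N is the total
    number of residues.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = Counter(o for p, o in zip(predicted, observed) if p == o)
    return 100.0 * sum(correct.values()) / len(observed)
```

As the Performance Evaluation slide stresses, the score is only meaningful when the observed string comes from a test set that was not used to derive the method's parameters.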

  10. Conformational Preferences
      [Figure not reproduced; panels: α, β, RT. Source: Biochimica et Biophysica Acta 916: 200-204 (1987).]

      1st Generation secondary structure prediction
      • 1st generation: based on single amino acid propensities
        - Chou and Fasman, 1974
        - Robson, 1976
        - GOR-1: Garnier, Osguthorpe, and Robson, 1978
      • Preference of particular residues for certain secondary structure elements:
        - Single-residue statistics: analysis of the frequency of each of the 20 aa in α-helices, β-strands or coils
        - Databases of very limited size
        - < 55% Q3 accuracy

  11. 1st Generation secondary structure prediction
      • Chou and Fasman (partial table):

        Amino Acid   Pα     Pβ     Pt
        Glu          1.51   0.37   0.74
        Met          1.45   1.05   0.60
        Ala          1.42   0.83   0.66
        Val          1.06   1.70   0.50
        Ile          1.08   1.60   0.50
        Tyr          0.69   1.47   1.14
        Pro          0.57   0.55   1.52
        Gly          0.57   0.75   1.56

      Chou-Fasman Pij-values

        Name            P(H)  P(E)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
        Alanine          142    83       66  0.06   0.076   0.035   0.058
        Arginine          98    93       95  0.07   0.106   0.099   0.085
        Aspartic Acid    101    54      146  0.147  0.11    0.179   0.081
        Asparagine        67    89      156  0.161  0.083   0.191   0.091
        Cysteine          70   119      119  0.149  0.05    0.117   0.128
        Glutamic Acid    151    37       74  0.056  0.06    0.077   0.064
        Glutamine        111   110       98  0.074  0.098   0.037   0.098
        Glycine           57    75      156  0.102  0.085   0.19    0.152
        Histidine        100    87       95  0.14   0.047   0.093   0.054
        Isoleucine       108   160       47  0.043  0.034   0.013   0.056
        Leucine          121   130       59  0.061  0.025   0.036   0.07
        Lysine           114    74      101  0.055  0.115   0.072   0.095
        Methionine       145   105       60  0.068  0.082   0.014   0.055
        Phenylalanine    113   138       60  0.059  0.041   0.065   0.065
        Proline           57    55      152  0.102  0.301   0.034   0.068
        Serine            77    75      143  0.12   0.139   0.125   0.106
        Threonine         83   119       96  0.086  0.108   0.065   0.079
        Tryptophan       108   137       96  0.077  0.013   0.064   0.167
        Tyrosine          69   147      114  0.082  0.065   0.114   0.125
        Valine           106   170       50  0.062  0.048   0.028   0.053
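The f(i) to f(i+3) columns are positional turn frequencies. In the usual description of the Chou-Fasman method they are multiplied across a tetrapeptide, and the product is considered together with the mean P(turn). The sketch below computes only those two quantities; the decision thresholds are not given on the slide and are deliberately left out. The dictionary layout and function name are illustrative assumptions.

```python
# (P(turn), f(i), f(i+1), f(i+2), f(i+3)) per residue, transcribed from the
# Chou-Fasman Pij table on the slide, keyed by one-letter amino acid code.
TURN = {
    "A": (66, 0.06, 0.076, 0.035, 0.058),
    "R": (95, 0.07, 0.106, 0.099, 0.085),
    "D": (146, 0.147, 0.11, 0.179, 0.081),
    "N": (156, 0.161, 0.083, 0.191, 0.091),
    "C": (119, 0.149, 0.05, 0.117, 0.128),
    "E": (74, 0.056, 0.06, 0.077, 0.064),
    "Q": (98, 0.074, 0.098, 0.037, 0.098),
    "G": (156, 0.102, 0.085, 0.19, 0.152),
    "H": (95, 0.14, 0.047, 0.093, 0.054),
    "I": (47, 0.043, 0.034, 0.013, 0.056),
    "L": (59, 0.061, 0.025, 0.036, 0.07),
    "K": (101, 0.055, 0.115, 0.072, 0.095),
    "M": (60, 0.068, 0.082, 0.014, 0.055),
    "F": (60, 0.059, 0.041, 0.065, 0.065),
    "P": (152, 0.102, 0.301, 0.034, 0.068),
    "S": (143, 0.12, 0.139, 0.125, 0.106),
    "T": (96, 0.086, 0.108, 0.065, 0.079),
    "W": (96, 0.077, 0.013, 0.064, 0.167),
    "Y": (114, 0.082, 0.065, 0.114, 0.125),
    "V": (50, 0.062, 0.048, 0.028, 0.053),
}


def turn_scores(tetrapeptide):
    """Return (product of positional frequencies, mean P(turn)) for 4 residues."""
    if len(tetrapeptide) != 4:
        raise ValueError("expected exactly four residues")
    product = 1.0
    for position, residue in enumerate(tetrapeptide):
        product *= TURN[residue][1 + position]  # f(i + position)
    mean_p_turn = sum(TURN[residue][0] for residue in tetrapeptide) / 4
    return product, mean_p_turn
```

For example, `turn_scores("NPDG")` combines f(i) of Asn, f(i+1) of Pro, f(i+2) of Asp and f(i+3) of Gly, each of which is high in its column.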

  12. Chou-Fasman
      How it works:
      a. Assign all of the residues the appropriate set of parameters.
      b. Identify α-helix and β-sheet regions. Extend the regions in both directions.
      c. If structures overlap, compare the average values for P(H) and P(E) and assign secondary structure based on the best scores.
      d. Turns are modeled as tetra-peptides using 2 different probability values.

      Assign Pij values
      1. Assign all of the residues the appropriate set of parameters

                 T    S    P    T    A    E    L    M    R    S    T    G
      P(H)      69   77   57   69  142  151  121  145   98   77   69   57
      P(E)     147   75   55  147   83   37  130  105   93   75  147   75
      P(turn)  114  143  152  114   66   74   59   60   95  143  114  156

      Note: the values listed under T (69 / 147 / 114) are the table's Tyrosine entries; the table gives Threonine as 83 / 119 / 96.
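Step 1, assigning each residue its parameters, amounts to a table lookup. The sketch below uses the P(H), P(E) and P(turn) columns of the Chou-Fasman Pij table; the dictionary and function names are hypothetical. Because it looks T up as Threonine, its output for the example peptide differs from the slide's rows at the three T positions.

```python
# (P(H), P(E), P(turn)) per residue, transcribed from the Chou-Fasman Pij table.
CHOU_FASMAN = {
    "A": (142, 83, 66),   "R": (98, 93, 95),    "D": (101, 54, 146),
    "N": (67, 89, 156),   "C": (70, 119, 119),  "E": (151, 37, 74),
    "Q": (111, 110, 98),  "G": (57, 75, 156),   "H": (100, 87, 95),
    "I": (108, 160, 47),  "L": (121, 130, 59),  "K": (114, 74, 101),
    "M": (145, 105, 60),  "F": (113, 138, 60),  "P": (57, 55, 152),
    "S": (77, 75, 143),   "T": (83, 119, 96),   "W": (108, 137, 96),
    "Y": (69, 147, 114),  "V": (106, 170, 50),
}


def assign_parameters(sequence):
    """Return one list per parameter, aligned with the residues of `sequence`."""
    rows = {"P(H)": [], "P(E)": [], "P(turn)": []}
    for residue in sequence:
        p_helix, p_strand, p_turn = CHOU_FASMAN[residue]
        rows["P(H)"].append(p_helix)
        rows["P(E)"].append(p_strand)
        rows["P(turn)"].append(p_turn)
    return rows


if __name__ == "__main__":
    peptide = "TSPTAELMRSTG"  # example peptide from the slide
    for name, values in assign_parameters(peptide).items():
        print(f"{name:8s}", " ".join(f"{v:4d}" for v in values))
```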

  13. Scan peptide for α-helix regions
      2. Identify regions where 4 of 6 aa have P(H) > 100: the "alpha-helix nucleus"

                 T    S    P    T    A    E    L    M    R    S    T    G
      P(H)      69   77   57   69  142  151  121  145   98   77   69   57

      Extend α-helix nucleus
      3. Extend the helix in both directions until a set of four residues has an average P(H) < 100.

                 T    S    P    T    A    E    L    M    R    S    T    G
      P(H)      69   77   57   69  142  151  121  145   98   77   69   57

      Repeat steps 1-3 for the entire peptide
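Steps 2 and 3 can be sketched as a window scan followed by a greedy extension. This is one possible reading of the slide's stopping rule (grow by one residue while the four-residue window at the growing end still averages at least 100), not a full Chou-Fasman implementation. The function names are hypothetical, and the P(H) values are copied from the slide's example row.

```python
# P(H) row for the peptide TSPTAELMRSTG, exactly as printed on the slide.
P_H_ROW = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]


def helix_nuclei(p_h, window=6, min_formers=4, threshold=100):
    """Start indices of windows in which at least 4 of 6 residues have P(H) > 100."""
    return [
        start
        for start in range(len(p_h) - window + 1)
        if sum(value > threshold for value in p_h[start:start + window]) >= min_formers
    ]


def extend_helix(p_h, start, end, width=4, threshold=100):
    """Grow the half-open range [start, end) residue by residue.

    A residue is added while the four-residue window ending at the new
    terminus has a mean P(H) of at least `threshold`.
    """
    def window_mean(first):
        return sum(p_h[first:first + width]) / width

    while start > 0 and window_mean(start - 1) >= threshold:
        start -= 1
    while end < len(p_h) and window_mean(end - width + 1) >= threshold:
        end += 1
    return start, end


if __name__ == "__main__":
    nuclei = helix_nuclei(P_H_ROW)
    first = nuclei[0]
    print(nuclei, extend_helix(P_H_ROW, first, first + 6))
```

With the slide's row, the qualifying six-residue windows start at indices 2, 3 and 4 (0-based), and extending the first one gives the half-open range (2, 10), that is, the residues PTAELMRS.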
