
SAM-T04: what’s new for CASP6



  1. SAM-T04: what’s new for CASP6
     Kevin Karplus, Richard Hughey, Jenny Draper, Sol Katzman, Martina Koeva, George Shackelford, Bret Barnes, Marcia Soriano
     karplus@soe.ucsc.edu
     Biomolecular Engineering, University of California, Santa Cruz
     CASP6, SAM-T04 – p.1/43

  2. Steps of SAM-Txx Methods
     - Iterative search and alignment [rewritten, minor improvements]
     - Local structure prediction [new alphabets, minor tweaks]
     - Multi-track HMMs [minor tweaks]
     - Finding medium-length fragments (fragfinder) [multi-track HMMs, filter implausible]
     - Contact prediction [all new]
     - Conformation generation (undertaker) [major changes]

  3. Contact prediction: new in 2004!
     - Use mutual information between columns.
     - Thin alignments aggressively (30%, 35%, 40%, 50%, 62%).
     - Compute e-value for mutual info (correcting for small-sample effects).
     - Compute z-score of log(e-value) within protein.
     - Feed e-values, z-scores, conservation, amino-acid profile, and separation along chain into neural net.
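The first and fourth steps above can be illustrated with a minimal sketch. It computes raw mutual information between two alignment columns and z-scores of log(e-value) within one protein. It does not reproduce the small-sample e-value correction from the slide, and the function names, gap handling, and input layout are assumptions made for illustration, not the SAM-T04 code.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Raw mutual information (bits) between two alignment columns.

    col_a and col_b hold one character per sequence; rows with a gap ('-')
    in either column are skipped (an assumption, not from the slides).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def z_scores(values):
    """Z-score of each value relative to the others from the same protein."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    print(mutual_information("AAVV", "LLII"))  # perfectly covarying columns
    print(mutual_information("AAVV", "LILI"))  # independent columns
    # Hypothetical e-values for three column pairs of one protein.
    print(z_scores([math.log(e) for e in (1e-3, 1e-1, 1.0)]))
```

Raw mutual information is inflated when few sequences remain after thinning, which is presumably why the slide converts it to an e-value before computing z-scores.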

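  4. Evaluating contact prediction
     Two measures of contact prediction, where $\chi(i,j)$ is 1 if predicted pair $(i,j)$ is a true contact and 0 otherwise, and the sums run over the predicted pairs:
     - Accuracy: $\frac{\sum \chi(i,j)}{\sum 1}$ (favors short-range predictions, where contact probability is higher).
     - Weighted accuracy: $\frac{\sum \chi(i,j) / \mathrm{Prob}(\mathrm{contact} \mid \mathrm{separation} = |i-j|)}{\sum 1}$ (equals 1 if predictions are no better than chance based on separation).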
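A small sketch of the two measures, assuming predictions are given as residue-index pairs, true contacts as a set of pairs, and the background probability of contact as a lookup by separation. The names `accuracy`, `weighted_accuracy`, and `prob_by_sep`, and the numbers in the example, are hypothetical.

```python
def _is_contact(pair, contacts):
    """True if the pair is a known contact, in either index order."""
    i, j = pair
    return (i, j) in contacts or (j, i) in contacts


def accuracy(predictions, contacts):
    """Fraction of predicted pairs that are true contacts."""
    if not predictions:
        raise ValueError("need at least one prediction")
    hits = sum(1 for pair in predictions if _is_contact(pair, contacts))
    return hits / len(predictions)


def weighted_accuracy(predictions, contacts, prob_by_sep):
    """Average of is_contact / Prob(contact | separation) over predictions.

    A value near 1 means the predictions are no better than choosing pairs
    by chain separation alone.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    total = 0.0
    for i, j in predictions:
        if _is_contact((i, j), contacts):
            total += 1.0 / prob_by_sep[abs(i - j)]
    return total / len(predictions)


if __name__ == "__main__":
    predictions = [(1, 5), (2, 10), (3, 20)]
    contacts = {(1, 5), (3, 20)}
    prob_by_sep = {4: 0.5, 8: 0.25, 17: 0.1}  # made-up background rates
    print(accuracy(predictions, contacts))
    print(weighted_accuracy(predictions, contacts, prob_by_sep))
```

In this toy case the correct long-range prediction (separation 17) contributes five times as much to the weighted measure as the correct short-range one, which is the point of the weighting.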

  5. Contact prediction results
     [Two plots, by protein, each against predictions/residue (axis ticks at 0.01, 0.1, 1). Left: accuracy of contact prediction (true positives/predicted). Right: weighted accuracy of contact prediction (avg is_contact/prob(contact|sep)). Curves in each plot: neural net; thin62, e-value; thin62, raw mi.]

  6. Undertaker
     Undertaker is UCSC’s attempt at a fragment-packing program (named because it optimizes burial).
     - New cost functions (especially H-bonds).
     - Improved clash detection.
     - New conformation change operators (tweaking torsion angles, rigid body movements of chunks).
     - New ways to specify constraints (Hbond, SSbond, HelixConstraint, StrandConstraint, SheetConstraint).
     - Improved adaptation of genetic algorithm.

  7. Model 1 vs. Robetta 1
     [Scatter plot of smooth GDT scores: SAM-T04 model1 (y-axis) against Robetta model1 (x-axis). Labeled targets: 229_1, 270, 207, 231, 222_2, 199_1, 249_1, 281, 280_1, 247_1, 243, 230, 201, 264, 214, 197, 206, 202_1, 248, 262, 262_2, 202.]

  8. Good stuff from Murzin
     We won’t discuss the following:
     - T0270: 1t0tA became available after servers ran.
     - T0213: Murzin suggested using 1t62A for T0213, T0214, and T0227. T04 scored 1t62A best, but we messed up the good alignment.
     - T0214: We used 1t62A, but we never got a good alignment.
     - T0227: T04 scored 1t62A best, but 2° (secondary-structure) prediction was poor, so we had bad alignments.
     - T0240: We submitted both dimer and monomer, but mistakenly put the dimer first.
     - T0245: 1tljA became available, but we don’t have the true structure yet.

  9. Best vs. Robetta best (NF and FR/A)
     [Scatter plot of smooth GDT scores: SAM-T04 best (y-axis) against Robetta best (x-axis). Labeled targets: 248_1, 281, 201, 215, 209_2, 230, 235_2, 248_3, 248_2, 212, 199_3, 272_1, 280_2.]

  10. Good stuff from Robetta
      We won’t discuss the following, because the good stuff in them seems to have come from better Robetta models:
      - T0209_2: sheet constraints from Robetta-model1.
      - T0248 (all 3 domains): borrows heavily from Robetta-model2.

  11. Model 1 vs. alignment (NF and FR/A)
      [Scatter plot of smooth GDT scores: SAM-T04 model1 (y-axis) against SAM-T04 first alignment (x-axis). Labeled targets: 281, 248_1, 215, 209_2, 201, 235_2, 230, 248_2, 248_3, 212, 199_3, 238, 272_2, 216_2.]

  12. Auto vs. align (NF and FR/A)
      [Scatter plot of smooth GDT scores: SAM-T04 automatic (y-axis) against SAM-T04 first alignment (x-axis). Labeled targets: 215, 235_2, 281, 230, 248_1, 248_3, 199_3, 209_2, 201, 280_2, 272_1, 212, 248_2, 242, 272_2, 241_2, 238, 273, 209_1, 216_2.]

  13. Target T0201 (NF)
      - We tried forcing various sheet topologies and selected 4 by hand.
      - Model 1 has the right topology (5.9117 all-atom RMSD).
      - The unconstrained cost function is not good at choosing topology.
      - Contact prediction didn’t help, though the first prediction was right.
      - Helices were too short.
      - The highest-GDT and lowest-RMSD model (try41-opt2.repack-nonPC, 5.4912 all-atom) has the wrong topology.

  14. Target T0201 (NF)

  15. Target T0201 (NF)
      Wrong topology, but best scoring decoy.

  16. Target T0230 (FR/A)
      - Good except for the C-terminal loop and helix, which flopped the wrong way.
      - We have secondary structure right, including phase of beta strands.
      - Contact prediction helped, but we put too much weight on it: decoys fit the predictions better than the real structure does.

  17. Target T0230 (FR/A)

  18. Target T0230 (FR/A)
      Real structure with contact predictions (figure).

  19. Target T0281 (FR/A)
      - Third strand has an off-by-one error.
      - Top T04 hit (1gefA) is good; T2K put it 3rd.
      - We submitted the best model we had in GDT score (try7-opt1 had better RMSD).
      - Sol’s hand work helped, but my attempts to force M1-P4 as a first strand and to remove the bulge at R22 were misguided.

  20. Target T0281 (FR/A)
      Red is real structure.

  21. Target T0215 (FR/A)
      - Secondary structure good, but helix packing angles wrong.
      - Need helix packing info in undertaker; hand-added constraints were wrong.
      - Too few homologs for contact prediction.

  22. Target T0215 (FR/A)
      Red is real structure.

  23. Target T0212 (FR/A)
      - We tried to force a jelly-roll structure with the N-terminal strand omitted.
      - Swapping the N- and C-terminal strands of our model would make it almost right.
      - Strand T60-A66 is off by one.

  24. Target T0212 (FR/A)

  25. Web sites
      - UCSC bioinformatics degrees: http://www.soe.ucsc.edu/programs/bionformatics/
      - SAM tool suite info: http://www.soe.ucsc.edu/research/compbio/sam.html
      - HMM servers: http://www.soe.ucsc.edu/research/compbio/HMM-apps/
      - These slides: http://www.soe.ucsc.edu/~karplus/papers/casp6-slides.pdf
      - CASP6 all working files: http://www.soe.ucsc.edu/~karplus/casp6

  26. Iterative search using HMMs
      The SAM-T98, T99, T2K, and T04 methods all use a similar method for building a target HMM, given a single sequence (or a seed alignment).
      The target04 script
      - uses perl modules to encapsulate programs, for greater flexibility.
      - uses fastacmd instead of grep for counting and retrieving sequences.
      - uses blastpgp on each iteration to prefetch sequences for hmmscore.
      - uses the cheap_gaps transition regularizer throughout.

  27. Local Structure Alphabets
      Use more backbone alphabets:
      - DSSP & DSSP-ehl2
      - Str2
      - Stride
      - Bystroff
      - alpha
      Use burial alphabets:
      - CB-14-7
      - near-backbone-11

  28. Neural Net
      - We use neural nets to predict local properties.
      - Input is a profile with probabilities of amino acids at each position of the target chain, plus insertion and deletion probabilities.
      - New in 2004 is an additional 20 inputs with one-hot encoding of the amino acid in the target sequence.
      - Neural nets were retrained using T04 alignments and a better training set.
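The input layout described above can be sketched for a single position as follows. The ordering of the inputs, the amino-acid order, and the function names are assumptions for illustration; the slides do not say how the real networks arrange their inputs or how wide a window they see.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def one_hot(residue):
    """20-element one-hot vector; all zeros for an unknown residue."""
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def position_features(profile_probs, insert_prob, delete_prob, residue):
    """Concatenate profile, indel probabilities, and one-hot target residue."""
    if len(profile_probs) != len(AMINO_ACIDS):
        raise ValueError("profile must have 20 amino-acid probabilities")
    return list(profile_probs) + [insert_prob, delete_prob] + one_hot(residue)


if __name__ == "__main__":
    uniform = [0.05] * 20  # placeholder profile column
    features = position_features(uniform, 0.01, 0.02, "K")
    print(len(features))
```

Under these assumptions each position contributes 42 numbers: 20 profile probabilities, 2 indel probabilities, and the 20 one-hot inputs added in 2004.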

  29. Multi-track HMMs
      - Using more 2-track HMMs: amino acid plus each local structure alphabet.
      - Using 3-track HMMs: amino acid, backbone (str2), burial (CB-14-7).
      - Generate many alignments for each potential template:
        - use different HMMs.
        - use both local and global.
        - use both Viterbi and posterior decoding.
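To make the "many alignments for each potential template" point concrete, here is a small sketch that enumerates every combination of HMM, alignment mode, and decoding method. The HMM labels and the dictionary layout are hypothetical; the slides do not say how the real scripts organize these runs.

```python
from itertools import product

MODES = ("local", "global")
DECODINGS = ("viterbi", "posterior")


def alignment_settings(hmm_names):
    """One settings record per (HMM, mode, decoding) combination."""
    return [
        {"hmm": hmm, "mode": mode, "decoding": decoding}
        for hmm, mode, decoding in product(hmm_names, MODES, DECODINGS)
    ]


if __name__ == "__main__":
    # Hypothetical labels for one 2-track and one 3-track HMM.
    for setting in alignment_settings(["aa+str2", "aa+str2+CB-14-7"]):
        print(setting)
```

With two HMMs this yields eight candidate alignments per template, and the count grows linearly with the number of HMMs used.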

  30. Fragfinder
      - Medium-length fragments (9 long) for every position.
      - Generated from 3-track HMMs.
      - Residues filtered to remove improbable φ-ψ pairs (creating smaller fragments).
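The filtering step can be sketched as splitting a fragment wherever a residue's φ-ψ pair is judged improbable. In this sketch each residue carries a precomputed probability for its φ-ψ pair and a cutoff decides what counts as improbable. Both the data layout and the cutoff value are assumptions, since the slides do not say how fragfinder scores φ-ψ pairs.

```python
def split_on_improbable(fragment, min_prob):
    """Split a fragment at residues whose phi-psi probability is too low.

    fragment is a list of (phi, psi, prob) tuples; residues with
    prob < min_prob are dropped, and the remaining runs become smaller
    fragments.
    """
    pieces = []
    current = []
    for residue in fragment:
        if residue[2] >= min_prob:
            current.append(residue)
        elif current:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


if __name__ == "__main__":
    # A 9-residue fragment with made-up angles and probabilities.
    probs = [0.3, 0.2, 0.001, 0.4, 0.5, 0.25, 0.0005, 0.3, 0.2]
    fragment = [(-60.0, -45.0, p) for p in probs]
    for piece in split_on_improbable(fragment, min_prob=0.01):
        print(len(piece))
```

Here the two low-probability residues break the 9-residue fragment into pieces of 2, 3, and 2 residues.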

  31. Best vs. Robetta best
      [Scatter plot of smooth GDT scores: SAM-T04 best (y-axis) against Robetta best (x-axis). Labeled targets: 270, 268_2, 231, 207, 249_1, 240, 243, 201, 264, 213, 197, 206, 202_1, 227, 262, 202, 262_2.]

  32. SAM-T04 auto vs. Robetta 1
      [Scatter plot of smooth GDT scores: SAM-T04 automatic (y-axis) against Robetta model1 (x-axis). Labeled targets: 229_1, 270, 231, 247_3, 249_1, 254, 280_1, 243, 206, 234, 199_2, 251, 222_2, 202_1, 209_2, 228_2, 262_2.]

  33. Model 1 vs. SAM-T04 auto
      [Scatter plot of smooth GDT scores: SAM-T04 model1 (y-axis) against SAM-T04 automatic (x-axis). Labeled targets: 222_2, 207, 199_1, 268_2, 229, 248_1, 243, 280_1, 234, 281, 209_2, 205, 199_2, 228_2, 197, 208, 262_1.]

  34. Model 1 vs. alignment
      [Scatter plot of smooth GDT scores: SAM-T04 model1 (y-axis) against SAM-T04 first alignment (x-axis). Labeled targets: 233_1, 231, 277, 222_2, 207, 275, 249_1, 268_2, 232, 264_2, 247_1, 281, 248_1, 215, 201, 209_2, 251, 235_2, 223_1, 248_2, 212, 223, 262_2, 272_2.]
