
Protein Interaction Prediction: The PIPE and InSiPS Projects



  1. Protein Interaction Prediction: The PIPE and InSiPS Projects Frank Dehne School of Computer Science Centre For Advanced Studies Canada Frank Dehne ■ www.dehne.net

  2. Parallel Computational Biochemistry: Protein-Protein Interactions

  3. The PIPE Project (Start: 2003): Multi-Disciplinary Team ● Computer Science – F.Dehne ● Biochemistry – A.Golshani – J.Greenblatt (Toronto) ● Biomedical Engineering – J.Green ● Graduate Students (CS) – S.Pitre, C.North, A.Amos-Binks, A.Schoenrock, ... ● Graduate Students (Biochemistry) – A.Wong – B.Samanfar, M.Hooshyar, M.Alamgir, K.Omidi, D.Burnside, ...

  4. Proteins

  5. Proteins Primary Sequence: V H L T P E E K ... 3D Structure: [figure]

  6. Protein-Protein Interactions (PPIs)

  7. Protein-Protein Interaction Networks: Partial Arabidopsis PPI Network

  8. PPI Enabled Cell Processes: S. cerevisiae (Yeast)

  9. Tandem Affinity Purification (TAP): Do YGL227W and YMR135C interact?

  10. Experimental Data: TAP tag, Y2H; S. cerevisiae (Yeast)

  11. Experimental Data

      | species | # proteins | # protein pairs | # known interactions | # unknown interactions |
      |---|---|---|---|---|
      | S. cerevisiae | 6,300 | 19,867,056 | 15,151 | ??? |
      | C. elegans | 23,684 | 280,454,086 | 6,607 | ??? |
      | H. sapiens | 22,513 | 253,406,328 | 41,678 | ??? |
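The pair counts in the table are consistent with counting unordered pairs of distinct proteins, n(n-1)/2. The short check below is only a sketch; the function name `unordered_pairs` is ours, not from the slides. The C. elegans and H. sapiens rows match exactly. The yeast row matches n = 6,304, which suggests the listed 6,300 is a rounded figure (an inference from the arithmetic, not something the slide states).

```python
def unordered_pairs(n: int) -> int:
    """Number of unordered pairs of distinct proteins among n proteins."""
    return n * (n - 1) // 2


print(unordered_pairs(23_684))  # C. elegans row: 280454086
print(unordered_pairs(22_513))  # H. sapiens row: 253406328
print(unordered_pairs(6_304))   # matches the yeast row: 19867056
```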

  12. PPI Prediction ● Can we detect PPIs based on primary sequence only? ● Advantages: – No 3D structure information needed (PDB is small). – Can be applied to all proteins, even those without known 3D structure. – Can be applied to all genomes, even newly sequenced ones.

  13. Basic PIPE Algorithm: String comparison. Match = (Sum of pairwise PAM values > Threshold)
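The match rule on this slide can be illustrated with a minimal sketch: two equal-length fragments match when the sum of their pairwise substitution scores exceeds a threshold. Everything named here is an assumption for illustration. In particular, `toy_score` is a placeholder (+2 for identical residues, -1 otherwise) and does not contain real PAM values, and the threshold of 10 is arbitrary.

```python
def toy_score(x: str, y: str) -> int:
    """Placeholder substitution score. NOT real PAM values."""
    return 2 if x == y else -1


def fragment_score(frag_a: str, frag_b: str, score=toy_score) -> int:
    """Sum of pairwise scores over two equal-length fragments."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same length")
    return sum(score(x, y) for x, y in zip(frag_a, frag_b))


def is_match(frag_a: str, frag_b: str, threshold: int, score=toy_score) -> bool:
    """Match = (sum of pairwise scores > threshold)."""
    return fragment_score(frag_a, frag_b, score) > threshold


# Seven identical residues and one mismatch: 7 * 2 - 1 = 13 > 10
print(is_match("VHLTPEEK", "VHLSPEEK", threshold=10))  # True
# Eight mismatches: -8, which is not > 10
print(is_match("VHLTPEEK", "AAAAAAAA", threshold=10))  # False
```

To get closer to the slide, `toy_score` would be replaced by a lookup into an actual PAM matrix.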

  14. PIPE Output: Positive

  15. PIPE Output: Negative

  16. PIPE: Detecting Novel Protein-Protein Interactions. Yeast: YGL227W - YMR135C

  17. PIPE: Detecting Novel Protein-Protein Interactions. Yeast: YGL227W - YMR135C. Experimental Verification: Banting and Best Institute of Medical Research, Toronto

  18. PIPE: Detecting Novel Protein-Protein Interactions. Yeast: YGL227W - YMR135C. Banting and Best Institute of Medical Research, Toronto

  19. PIPE: Detecting Novel Protein-Protein Interactions. Yeast: YGL227W - YMR135C. Banting and Best Institute of Medical Research, Toronto. Protein complex: YGL227W, YMR135C, YIL017C, YDL176W, YIL097W, YDR255C, YBR105C

  20. PIPE: Elucidating the Architecture of Protein Complexes. S. cerevisiae

  21. Global Scan of Entire Protein Interaction Networks

      | species | # proteins | # protein pairs | # known interactions | # unknown interactions |
      |---|---|---|---|---|
      | S. cerevisiae | 6,300 | 19,867,056 | 15,151 | ??? |
      | C. elegans | 23,684 | 280,454,086 | 6,607 | ??? |
      | H. sapiens | 22,513 | 253,406,328 | 41,678 | ??? |

  22. Challenges ● Large number of protein pairs – Requires innovative data structures for approximate string matching (Hamming distance via PAM matrix). – Requires high performance computing. ● Small number of true positives (very sparse, ~0.1% density) – Requires very high specificity, ~99.95% (i.e. a false positive rate below 0.05%). – Otherwise: #false positives > #true positives
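The sparsity argument above can be made concrete with a little arithmetic. The sketch below computes the expected true and false positive counts from the density and the false positive rate given on the slide. The sensitivity of 0.5 is an assumption chosen purely for illustration, since the slide does not state one, and the function name is ours.

```python
def expected_counts(n_pairs, density, sensitivity, false_positive_rate):
    """Expected (true positives, false positives) for a scan of n_pairs."""
    positives = n_pairs * density     # truly interacting pairs
    negatives = n_pairs - positives   # non-interacting pairs
    true_pos = positives * sensitivity
    false_pos = negatives * false_positive_rate
    return true_pos, false_pos


# Density ~0.1% as on the slide; sensitivity 0.5 is a hypothetical value.
tp, fp = expected_counts(1_000_000, 0.001, 0.5, 0.0005)
print(tp > fp)  # at a 0.05% false positive rate, TPs narrowly outnumber FPs

tp, fp = expected_counts(1_000_000, 0.001, 0.5, 0.01)
print(tp > fp)  # at a 1% false positive rate, FPs dominate by a wide margin
```

With a density of 0.1%, negatives outnumber positives by roughly 1,000 to 1, so even a small false positive rate is multiplied by a very large base.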

  23. Challenges ● False positives created by “popular” motifs that are not related to protein interaction.

  24. PIPE's Prediction Accuracy

  25. BMC Bioinformatics: Comparison Study [Figure: panels Human and Yeast; annotations: PIPE, 2nd, Consensus (incl. PIPE)]

  26. PIPE's Performance: Sequential Performance Improvements ● The character-based amino acid representation was converted into binary encodings, which eliminated lookups in PAM120. ● The “sliding window” process was improved to use incremental updates. ● Fast similarity search: all possible protein fragment comparisons were pre-computed, and all matches of similar fragments were stored in a hash table.
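The incremental-update idea for the sliding window can be sketched as follows: instead of re-summing every window, subtract the score of the residue pair that leaves the window and add the score of the pair that enters it. This is only an illustration under simplifying assumptions. It slides along a single alignment of two sequences, uses a placeholder `toy_score` rather than PAM120, and does not attempt the binary encoding or the hash table mentioned above.

```python
def toy_score(x: str, y: str) -> int:
    """Placeholder substitution score. NOT real PAM120 values."""
    return 2 if x == y else -1


def window_scores_naive(a: str, b: str, w: int) -> list:
    """Re-sum every window of length w from scratch: O(n * w)."""
    if w < 1:
        raise ValueError("window length must be positive")
    n = min(len(a), len(b))
    return [sum(toy_score(a[i + k], b[i + k]) for k in range(w))
            for i in range(n - w + 1)]


def window_scores_incremental(a: str, b: str, w: int) -> list:
    """Update each window from the previous one: O(n)."""
    if w < 1:
        raise ValueError("window length must be positive")
    n = min(len(a), len(b))
    if n < w:
        return []
    current = sum(toy_score(a[k], b[k]) for k in range(w))
    scores = [current]
    for i in range(1, n - w + 1):
        current -= toy_score(a[i - 1], b[i - 1])          # pair leaving the window
        current += toy_score(a[i + w - 1], b[i + w - 1])  # pair entering the window
        scores.append(current)
    return scores


a, b = "VHLTPEEKSAVT", "VHLSPEEKAAVT"
print(window_scores_incremental(a, b, 4) == window_scores_naive(a, b, 4))  # True
```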

  27. Large Scale Parallelization: MP-PIPE ● Architecture: Cluster of multi-core processors ● One MP-PIPE worker per proc. ● Each worker with multiple threads ● H. sapiens protein pairs
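The general work-distribution pattern behind an all-pairs scan can be sketched as below: enumerate the protein pairs, cut them into chunks, and hand the chunks to a pool of workers. This is not MP-PIPE's implementation. It shows a single level of parallelism on one machine, whereas the slide describes workers across a cluster with threads inside each worker. `toy_predict` is a hypothetical stand-in for a PIPE prediction, and Python threads are used only to show the pattern (they do not give CPU parallelism for pure-Python work).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_predict(pair):
    """Hypothetical stand-in for a PIPE prediction on one protein pair."""
    a, b = pair
    return (a, b, len(a) == len(b))  # placeholder rule, not biology


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_chunk(chunk):
    """Work done by one worker on one chunk of pairs."""
    return [toy_predict(pair) for pair in chunk]


def scan_all_pairs(proteins, n_workers=4, chunk_size=2):
    """Distribute all unordered protein pairs over a pool of workers."""
    pairs = list(combinations(proteins, 2))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunk_results = pool.map(process_chunk, chunked(pairs, chunk_size))
        return [result for chunk in chunk_results for result in chunk]


results = scan_all_pairs(["AAA", "BBB", "CC", "D"])
print(len(results))  # 6 pairs from 4 proteins
```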

  28. Global Scan of Entire Protein Interaction Networks. MP-PIPE's superior performance and prediction accuracy enabled the first ever complete scan of entire protein interaction networks.

      | species | # proteins | # protein pairs | # known interactions | # novel PIPE pred. * | Running time (1,000 proc. cores) |
      |---|---|---|---|---|---|
      | S. cerevisiae | 6,300 | 19,867,056 | 15,151 | 14,438 | 1 hour |
      | C. elegans | 23,684 | 280,454,086 | 6,607 | 32,548 | 1 week |
      | H. sapiens | 22,513 | 253,406,328 | 41,678 | 130,470 | 3 months |

      * False positive rate: 0.0001

  29. H. sapiens dsDNA Break Repair. Blue: Proteins known to be involved in dsDNA break repair. Green: Known interactions. Red: Novel interactions discovered by PIPE. Yellow: Novel proteins likely involved in dsDNA break repair.

  30. InSiPS: The In Silico Protein Synthesizer. A computational tool that can synthesize proteins with specific protein-protein interaction prediction profiles.

  31. The In Silico Protein Synthesizer (InSiPS) ● Given – a set of target proteins and – a set of non-target proteins. ● Design a protein (sequence) that is – predicted to interact with the target proteins and – predicted not to interact with the non-targets.

  32. Drugs Based On PPI Inhibitors

  33. Drugs Based On PPI Inhibitors

  34. Fragment Based Screening

  35. InSiPS: Synthetic Proteins As PPI Inhibitors ● More “druggable targets” ● Can attach to “flat”, larger interaction regions that smaller compounds cannot recognize ● Natural compounds can have side effects [Figure labels: pathway, intercept X, target, No side effects]

  36. InSiPS: Algorithm

  37. Performance On BlueGene/Q. Population Size: 1500 Sequences. 1 Target. 250 Non-targets. [Figure: two plots; x-axis: #Nodes (16 cores per node)]

  38. Parameter Tuning / Limitations [Figure: Parameters] ● Can InSiPS always find a PPI inhibitor for any combination of target / non-target proteins? ● No! (May not even be biochemically possible.)

  39. InSiPS: Limitations [Figure: “Good” Cases and “Bad” Cases]

  40. InSiPS: Experimental Verification ● Task: Design a protein that attaches to a yeast protein involved in DNA repair, thereby blocking its function. ● Target yeast protein: YAL017W (PSK1) – DNA repair ● Non-targets: All other yeast proteins (~6,000) ● InSiPS-generated protein “Anti-PSK1”: HHHHHHSDNEHLHKCQRLKTRWKMARQFSDPQHNMYWIINWAQAMNIHADQNQEEEEELHDASVNNAEQYMAQCAPE EACQYPVRRSYGLHATNCIERRKCCMIMYQHPTCRQWEAKNTCAISRAGKGVYWKGIIFMRAWKHWCTRRLVQ ● Fitness: 0.465163 ● Target score: 0.71832232 ● Max non-target score: 0.35243136 (YLL039C) ● Avg non-target score: 0.0720702297 [Figure labels: InSiPS, Blue Gene/Q]
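The three scores reported on this slide are numerically consistent with a fitness of the form target score × (1 − max non-target score). That relationship is inferred here from the reported numbers; the slide itself does not give a formula, and how several targets would be combined is not shown. The sketch below, with a function name of our choosing, simply reproduces the reported value under that assumption.

```python
def fitness(target_score: float, non_target_scores) -> float:
    """Assumed form: reward the target score, penalize the worst non-target."""
    return target_score * (1.0 - max(non_target_scores))


# Target and max non-target scores as reported for Anti-PSK1.
value = fitness(0.71832232, [0.35243136])
print(round(value, 6))  # 0.465163, matching the reported fitness
```

Under this form, a candidate only scores well if it is predicted to bind its target strongly and its single strongest off-target prediction stays low. The average non-target score does not enter.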

  41. InSiPS: Experimental Verification

  42. InSiPS: Experimental Verification [Figure: UV light exposure under three conditions (PSK1, Deletion, Anti-PSK1), each labelled DNA, mRNA, Protein]

  43. InSiPS: Experimental Verification. Expression of Anti-PSK1 causes sensitivity to UV light. Equal numbers of cells were serially diluted and exposed to 30 s of UV light. [Figure: panel labels include WT, WT (empty vector), WT + Anti-PSK1 expressed, and PSK1 knockout; axis: decreasing cell density]

  44. Current Project: Muscular Dystrophy. With Alex Blais, Ottawa General Hospital

  45. Muscular Dystrophy. With Alex Blais, Ottawa General Hospital

  46. Muscular Dystrophy. With Alex Blais, Ottawa General Hospital. Stem Cell Therapy: 1. Muscle biopsy from healthy donor 2. Satellite cell isolation 3. In vitro expansion 4. Transplantation into patient [Figure labels: Dystrophic patient, Healthy donor]
