Protein Interaction Prediction: The PIPE and InSiPS Projects
Frank Dehne
School of Computer Science
Centre For Advanced Studies Canada
Frank Dehne ■ www.dehne.net
Parallel Computational Biochemistry: Protein-Protein Interactions
The PIPE Project (Start: 2003)
Multi-Disciplinary Team
● Computer Science
– F.Dehne
● Biochemistry
– A.Golshani
– A.Wong
– J.Greenblatt (Toronto)
● Biomedical Engineering
– J.Green
● Graduate Students (CS)
– S.Pitre, C.North, A.Amos-Binks, A.Schoenrock, ...
● Graduate Students (Biochemistry)
– B.Samanfar, M.Hooshyar, M.Alamgir, K.Omidi, D.Burnside, ...
Proteins
Primary Sequence: V H L T P E E K ...
3D Structure: [figure]

Protein-Protein Interactions (PPIs)

Protein-Protein Interaction Networks
Partial Arabidopsis PPI Network

PPI Enabled Cell Processes
S. cerevisiae (Yeast)

Tandem Affinity Purification (TAP)
Do YGL227W and YMR135C interact?

Experimental Data
TAP tag, Y2H
S. cerevisiae (Yeast)
Experimental Data
species         # proteins   # protein pairs   # known interactions   # unknown interactions
S. cerevisiae        6,300        19,867,056                 15,151   ???
C. elegans          23,684       280,454,086                  6,607   ???
H. sapiens          22,513       253,406,328                 41,678   ???
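The pair counts in the table are consistent with counting unordered protein pairs, n(n-1)/2. The short check below reproduces them. One assumption is needed: the yeast pair count matches n = 6,304 exactly, so the 6,300 on the slide is treated here as a rounded figure.

```python
from math import comb

# Protein counts per species. 6304 for yeast is an ASSUMPTION: the slide
# shows the rounded value 6,300, but the pair count matches n = 6304.
proteins = {"S. cerevisiae": 6304, "C. elegans": 23684, "H. sapiens": 22513}


def unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n proteins: n * (n - 1) / 2."""
    return comb(n, 2)


pair_counts = {species: unordered_pairs(n) for species, n in proteins.items()}

for species, count in pair_counts.items():
    print(f"{species}: {count:,} protein pairs")
```

The quadratic growth is the practical point: a roughly fourfold increase in proteins from yeast to human or worm gives more than a tenfold increase in pairs to test.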
PPI Prediction
● Can we detect PPIs based on primary sequence only?
● Advantages:
– No 3D structure information needed. (PDB is small)
– Can be applied to all proteins, even those without known 3D structure.
– Can be applied to all genomes, even newly sequenced ones.
Basic PIPE Algorithm
String comparison
Match = (Sum of pairwise PAM values > Threshold)
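The match rule on this slide can be sketched as follows. This is a minimal illustration, not the PIPE implementation: `toy_score` is a hypothetical stand-in for a PAM lookup (real PAM values differ per amino-acid pair), and the function names, window handling, and threshold are assumptions made for the example.

```python
def toy_score(a: str, b: str) -> int:
    # ASSUMPTION: stand-in for a PAM matrix lookup. Real PAM values
    # depend on the specific pair of amino acids.
    return 2 if a == b else -1


def window_score(frag_a: str, frag_b: str, score=toy_score) -> int:
    """Sum of pairwise substitution scores over two equal-length fragments."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same length")
    return sum(score(x, y) for x, y in zip(frag_a, frag_b))


def is_match(frag_a: str, frag_b: str, threshold: int, score=toy_score) -> bool:
    """Match = (sum of pairwise scores > threshold), as stated on the slide."""
    return window_score(frag_a, frag_b, score) > threshold


def matching_windows(query: str, protein: str, threshold: int, score=toy_score):
    """Start positions in `protein` whose window matches `query`."""
    w = len(query)
    return [
        i
        for i in range(len(protein) - w + 1)
        if is_match(query, protein[i:i + w], threshold, score)
    ]


hits = matching_windows("VHLT", "AAVHLTPEEVHKT", threshold=4)
print(hits)
```

With the toy scores, both the exact occurrence and the one-substitution variant `VHKT` pass the threshold, which is the kind of approximate match a substitution-matrix score is meant to allow.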
PIPE Output: Positive

PIPE Output: Negative

PIPE: Detecting Novel Protein-Protein Interactions
Yeast: YGL227W - YMR135C
Experimental Verification: Banting and Best Institute of Medical Research, Toronto
Protein complex: YGL227W, YMR135C, YIL017C, YDL176W, YIL097W, YDR255C, YBR105C

PIPE: Elucidating the Architecture of Protein Complexes
S. cerevisiae
Global Scan of Entire Protein Interaction Networks
species         # proteins   # protein pairs   # known interactions   # unknown interactions
S. cerevisiae        6,300        19,867,056                 15,151   ???
C. elegans          23,684       280,454,086                  6,607   ???
H. sapiens          22,513       253,406,328                 41,678   ???
Challenges
● Large number of protein pairs
– Requires innovative data structures for approximate string matching (Hamming distance via PAM matrix).
– Requires high performance computing.
● Small number of true positives (very sparse, ~0.1% density)
– Requires very high specificity, ~99.95% (i.e. less than 0.05% false positive rate)
– Otherwise: #false positives > #true positives
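The arithmetic behind the specificity requirement can be worked through as below. The pair count and the ~0.1% density come from the slides. The sensitivity value is an assumption chosen only for illustration, since the slide does not state one.

```python
def expected_hits(n_pairs, density, sensitivity, false_positive_rate):
    """Expected (true positives, false positives) over a full scan."""
    positives = n_pairs * density        # pairs that really interact
    negatives = n_pairs - positives      # pairs that do not
    return positives * sensitivity, negatives * false_positive_rate


N_PAIRS = 253_406_328   # H. sapiens pair count from the table
DENSITY = 0.001         # ~0.1% of pairs interact (from the slide)
SENSITIVITY = 0.5       # ASSUMPTION: illustrative value, not from the slides

for fpr in (0.01, 0.001, 0.0005):
    tp, fp = expected_hits(N_PAIRS, DENSITY, SENSITIVITY, fpr)
    precision = tp / (tp + fp)
    print(f"FPR={fpr:.2%}  TP~{tp:,.0f}  FP~{fp:,.0f}  precision~{precision:.2f}")
```

Because true interactions make up only about one pair in a thousand, a false positive rate equal to the density already produces about as many false positives as there are real interactions in total. Under the assumed sensitivity, the counts only balance near the 0.05% rate quoted on the slide.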
Challenges
● False positives created by “popular” motifs that are not related to protein interaction.

PIPE's Prediction Accuracy
BMC Bioinformatics: Comparison Study
[Figure: comparison of prediction methods for Human and Yeast; labels: PIPE, 2nd, Consensus (incl. PIPE)]
PIPE's Performance
PIPE Sequential Performance Improvements:
● Character based amino acid representation was converted into binary encodings that eliminated lookup in PAM120.
● “Sliding window” process was improved to use incremental updates.
● Fast similarity search: Pre-computed all possible protein fragment comparisons and stored all matches of similar fragments in a hash table.
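The incremental sliding-window idea can be illustrated with a small sketch. It is an assumption about what "incremental updates" means here: when the window moves one position, the new score is the old score minus the pair that left plus the pair that entered, instead of a full re-summation. `toy_score` is again a hypothetical stand-in for PAM120 values.

```python
def toy_score(a: str, b: str) -> int:
    # ASSUMPTION: stand-in for a PAM120 lookup.
    return 2 if a == b else -1


def window_scores_naive(a: str, b: str, w: int):
    """Re-sum every aligned window from scratch: O(n * w) score lookups."""
    n = min(len(a), len(b))
    return [
        sum(toy_score(a[i + k], b[i + k]) for k in range(w))
        for i in range(n - w + 1)
    ]


def window_scores_incremental(a: str, b: str, w: int):
    """Slide the window with O(1) work per step after the first window."""
    n = min(len(a), len(b))
    if n < w:
        return []
    per_position = [toy_score(x, y) for x, y in zip(a, b)]
    current = sum(per_position[:w])
    scores = [current]
    for i in range(1, n - w + 1):
        # Add the pair entering on the right, drop the pair leaving on the left.
        current += per_position[i + w - 1] - per_position[i - 1]
        scores.append(current)
    return scores


naive = window_scores_naive("VHLTPEEK", "VHKTPEEK", 3)
fast = window_scores_incremental("VHLTPEEK", "VHKTPEEK", 3)
print(naive == fast, fast)
```

Both functions return the same scores. The incremental version does a constant amount of work per window shift, which matters when the window is slid across every pair of proteins.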
Large Scale Parallelization: MP-PIPE
Architecture:
● Cluster of multi-core processors
● One MP-PIPE worker per processor
● Each worker with multiple threads
[Figure: H. sapiens protein pairs]
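The two-level layout described on this slide (workers, each running several threads) can be sketched in a single process as follows. This is only an illustration of how the pair list could be partitioned. The names `toy_predict`, `split`, `worker`, and `scan` are hypothetical, the predictor is a placeholder rather than PIPE, and a real cluster would dispatch each chunk to a different processor instead of looping over chunks.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_predict(pair) -> bool:
    # ASSUMPTION: placeholder predictor, not PIPE. Calls a pair "interacting"
    # when the two sequences share at least three distinct residues.
    a, b = pair
    return len(set(a) & set(b)) >= 3


def split(items, parts):
    """Round-robin partition of the work list into `parts` chunks."""
    return [items[i::parts] for i in range(parts)]


def worker(pairs, threads):
    """One worker: evaluates its chunk of pairs with a pool of threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        flags = list(pool.map(toy_predict, pairs))
    return [pair for pair, hit in zip(pairs, flags) if hit]


def scan(proteins, workers, threads):
    """Scan all unordered pairs, chunk by chunk."""
    pairs = list(combinations(proteins, 2))
    predicted = []
    # In a cluster, each chunk would be sent to a separate processor.
    for chunk in split(pairs, workers):
        predicted.extend(worker(chunk, threads))
    return sorted(predicted)


result = scan(["VHLT", "VHLK", "PEEK", "AAAA"], workers=2, threads=2)
print(result)
```

Since every protein pair is evaluated independently in this sketch, the work divides cleanly across chunks with no communication between them.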
Global Scan of Entire Protein Interaction Networks
MP-PIPE's superior performance and prediction accuracy enabled the first ever complete scan of entire protein interaction networks.
species         # proteins   # protein pairs   # known interactions   # novel PIPE pred. *   Running time (1,000 proc. cores)
S. cerevisiae        6,300        19,867,056                 15,151                 14,438   1 hour
C. elegans          23,684       280,454,086                  6,607                 32,548   1 week
H. sapiens          22,513       253,406,328                 41,678                130,470   3 months
* False positive rate: 0.0001
H. sapiens dsDNA Break Repair
● Blue: Proteins known to be involved in dsDNA break repair
● Green: Known interactions
● Red: Novel interactions discovered by PIPE
● Yellow: Novel proteins likely involved in dsDNA break repair
InSiPS: The In Silico Protein Synthesizer
A computational tool that can synthesize proteins with specific protein-protein interaction prediction profiles.
The In Silico Protein Synthesizer (InSiPS)
● Given
– a set of target proteins and
– a set of non-target proteins.
● Design a protein (sequence) that is
– predicted to interact with the target proteins and
– predicted not to interact with the non-targets.
Drugs Based On PPI Inhibitors

Fragment Based Screening
InSiPS: Synthetic Proteins As PPI Inhibitors
● More “druggable targets”
● Can attach to “flat” interaction regions that smaller compounds cannot recognize
● Natural compounds can have side effects
[Figure labels: larger pathway, intercept (X), target, no side effects]
InSiPS: Algorithm
Performance On BlueGene/Q
Population Size: 1500 Sequences. 1 Target. 250 Non-targets.
[Figure: two plots, x-axis: #Nodes (16 cores per node)]
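The "population size" parameter suggests a population-based search over candidate sequences. The sketch below shows one such search under stated assumptions: an evolutionary loop with truncation selection, one-point crossover, and point mutation. None of these details are given on the slides. `toy_interaction_score` is a hypothetical stand-in for a PPI predictor, and the fitness (target score times one minus the worst non-target score) is an assumption chosen because it is consistent with the scores reported on the experimental verification slide.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_interaction_score(candidate: str, protein: str) -> float:
    # ASSUMPTION: placeholder for a real PPI predictor. Returns the
    # fraction of aligned positions that agree, a value in [0, 1].
    n = min(len(candidate), len(protein))
    return sum(candidate[i] == protein[i] for i in range(n)) / n


def fitness(candidate, target, non_targets) -> float:
    # ASSUMPTION: high target score, penalised by the worst non-target score.
    worst = max(toy_interaction_score(candidate, p) for p in non_targets)
    return toy_interaction_score(candidate, target) * (1.0 - worst)


def evolve(target, non_targets, length, population_size=50, generations=40, seed=0):
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(population_size)
    ]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, target, non_targets), reverse=True)
        history.append(fitness(population[0], target, non_targets))
        survivors = population[: population_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < population_size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)                        # one-point crossover
            child = list(mom[:cut] + dad[cut:])
            child[rng.randrange(length)] = rng.choice(AMINO_ACIDS)  # point mutation
            children.append("".join(child))
        population = survivors + children
    best = max(population, key=lambda s: fitness(s, target, non_targets))
    return best, history


best, history = evolve("VHLTPEEK", ["AAAAAAAA", "KEEPTLHV"], length=8)
print(best, round(fitness(best, "VHLTPEEK", ["AAAAAAAA", "KEEPTLHV"]), 3))
```

Each generation requires scoring every candidate against the target and every non-target, so the cost per generation grows with population size times the number of proteins, which is the kind of workload that benefits from many nodes.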
Parameter Tuning
[Figure: parameters]

Limitations
● Can InSiPS always find a PPI inhibitor for any combination of target / non-target proteins?
● No! (May not even be biochemically possible.)
InSiPS: Limitations
[Figure: “Good” Cases vs. “Bad” Cases]
InSiPS: Experimental Verification
● Task: Design a protein that attaches to a yeast protein involved in DNA repair, thereby blocking its function.
● Target Yeast protein: YAL017W (PSK1) – DNA repair
● Non-Targets: All other Yeast proteins (~6,000)
● InSiPS generated protein: “Anti-PSK1”:
HHHHHHSDNEHLHKCQRLKTRWKMARQFSDPQHNMYWIINWAQAMNIHADQNQEEEEELHDASVNNAEQYMAQCAPE
EACQYPVRRSYGLHATNCIERRKCCMIMYQHPTCRQWEAKNTCAISRAGKGVYWKGIIFMRAWKHWCTRRLVQ
● Fitness: 0.465163
● Target score: 0.71832232
● Max non-target score: 0.35243136 (YLL039C)
● Avg non-target score: 0.0720702297
[Figure: InSiPS on Blue Gene/Q]
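The three reported scores are numerically consistent with a fitness of the form target score × (1 − max non-target score). The slide does not state the formula, so treating it as the one InSiPS uses is an assumption. The check below only shows that the numbers agree.

```python
# Scores copied from the slide.
target_score = 0.71832232
max_non_target_score = 0.35243136
reported_fitness = 0.465163


def combined_fitness(target: float, max_non_target: float) -> float:
    # ASSUMPTION: candidate formula inferred from the reported numbers.
    return target * (1.0 - max_non_target)


computed = combined_fitness(target_score, max_non_target_score)
print(f"computed={computed:.6f} reported={reported_fitness:.6f}")
```

Under this reading, the average non-target score does not enter the fitness. Only the single worst non-target (YLL039C here) limits it.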
InSiPS: Experimental Verification
[Figure: response to UV light in three cases (PSK1, Deletion, Anti-PSK1), each drawn as DNA → mRNA → Protein]
InSiPS: Experimental Verification
[Figure: spot assay with decreasing cell density; strains: WT, PSK1 knockout, WT (empty vector), WT + Anti-PSK1 expressed]
Expression of Anti-PSK1 causes sensitivity to UV light.
Equal numbers of cells were serially diluted and exposed to 30 s of UV light.
Current Project: Muscular Dystrophy
With Alex Blais, Ottawa General Hospital
Muscular Dystrophy
With Alex Blais, Ottawa General Hospital
Stem Cell Therapy (healthy donor → dystrophic patient)
1. Muscle biopsy from healthy donor
2. Satellite cell isolation
3. In vitro expansion
4. Transplantation into patient