Pathways analysis in proteomics
Angela Bachi
Dibit-San Raffaele Scientific Institute, Milano
Pathways analysis in proteomics: the input is the expression proteomics data, and the output is the list of activated or dominant pathways in a given sample.
Aim
- To generate non-trivial functional hypotheses on biological systems
- To define disease biomarkers as pathways or pathway patterns instead of single molecules
- To rationalize how molecules interact in ‘molecular pathways’, i.e. chains of chemical reactions or physical interactions in which the product of one reaction becomes the reactant of the next
- To construct, for each organism, the minimum set of linearly independent (orthogonal) pathways whose non-negative linear combinations represent all potential steady states of the organism
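The last aim relies on the steady-state condition: a flux distribution v through a reaction network with stoichiometric matrix S is at steady state when S·v = 0, and it can be written as a non-negative combination of pathway vectors. The sketch below checks this on a hypothetical five-reaction toy network; the network, the pathway vectors P1 and P2, and the function names are illustrative assumptions, not taken from the source.

```python
# Hypothetical toy network with internal metabolites A, B, C.
# Reactions: v1: -> A, v2: A -> B, v3: A -> C, v4: B ->, v5: C ->
S = [
    [1, -1, -1, 0, 0],  # balance of A
    [0, 1, 0, -1, 0],   # balance of B
    [0, 0, 1, 0, -1],   # balance of C
]

# Two hypothetical pathway vectors (one flux value per reaction).
P1 = [1, 1, 0, 1, 0]  # uptake of A, A -> B, export of B
P2 = [1, 0, 1, 0, 1]  # uptake of A, A -> C, export of C


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def is_steady_state(matrix, flux):
    """A flux distribution is at steady state when S . v = 0 for every metabolite."""
    return all(value == 0 for value in mat_vec(matrix, flux))


def combine(weights, pathways):
    """Non-negative linear combination of pathway vectors."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(pathways[0])
    return [sum(w * p[i] for w, p in zip(weights, pathways)) for i in range(n)]


v = combine([2, 3], [P1, P2])
print(v)
print(is_steady_state(S, v))
```

Any non-negative weighting of P1 and P2 stays at steady state, while an arbitrary flux such as [1, 1, 0, 0, 0] (B produced but never consumed) does not.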
Differential Analysis of Promonocytic U937 “Plus” and “Minus” Cell Clones
The promonocytic U937 cell line is used as an in vitro model of HIV infection. Two different U937 clones have been described and defined as “Plus” and “Minus” according to their ability or inability to support productive HIV-1 infection. The aim of this study was to investigate the whole proteome of the Plus (10) and Minus (34) clones in order to detect qualitative and quantitative differences in protein expression and to unravel protein correlates of efficient or inefficient HIV replication.
Known differences at morphological, proliferative and molecular levels
[Figure panels: Expression of common and differential myelomonocytic Ag in U937 Plus and Minus cell clones; Cellular factors differentially expressed by Plus and Minus U937 cell clones]
•SINGLE-STEP ANALYSIS
About 250 ng of tryptic digest were injected into the nLC and separated with a 194-minute gradient.
[Figure: base peak chromatogram of run AA1_7LM (relative abundance vs time, RT 8.95-77.50 min), acquired as a high-resolution full MS scan followed by Top 5 low-resolution MS2 scans; below, the MS2 spectrum of precursor m/z 681.33 (CID 35.00, scan #746, RT 25.55 min, scan range m/z 175-1375)]
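In the Top 5 acquisition, each full MS scan is followed by up to five MS2 scans on selected precursors. A minimal sketch of how such data-dependent precursor selection could work, assuming selection purely by intensity above a minimum threshold; real instrument methods add rules (for example charge-state screening and dynamic exclusion) that are not modelled here. The function name select_top_n, the threshold and the scan values are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_top_n(full_scan: List[Peak], n: int = 5, min_intensity: float = 0.0) -> List[float]:
    """Return the m/z values of the n most intense peaks at or above min_intensity.

    Hypothetical simplification of "Top N" data-dependent acquisition: each
    returned precursor would be isolated and fragmented in its own MS2 scan.
    """
    candidates = [peak for peak in full_scan if peak[1] >= min_intensity]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:n]]


# Hypothetical full scan: (m/z, intensity) pairs.
scan = [(681.33, 1.8e4), (770.38, 9.0e3), (893.98, 2.5e4), (1008.03, 4.0e3),
        (1057.07, 3.1e4), (902.98, 1.2e4), (954.50, 500.0)]
print(select_top_n(scan, n=5, min_intensity=1000.0))
```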
•SINGLE-STEP ANALYSIS
About 250 ng of tryptic digest were injected into the nLC and separated with a 180-minute gradient. Mass spectra were acquired twice, with two partially overlapping mass ranges:
•m/z 300-900 (low mass)
•m/z 750-1600 (high mass)

 | U937_10 | U937_10bis | U937_34 | U937_34bis
Number of queries | 10510 | 13708 | 12415 | 10689
Number of proteins (bold red, ion score > 20) | 664 | 733 | 752 | 680

Mass ranges combined. Mascot and X! Tandem searches; 5 ppm precursor tolerance, 0.5 Da fragment tolerance. At least 2 peptides at 95%, protein probability >99%. More than 700 proteins identified.
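The precursor tolerance and the acceptance thresholds can be written as two small checks. This sketch assumes the usual definition of mass error in parts per million, (observed - reference) / reference × 10^6, and encodes the rule "at least 2 peptides at 95%, protein probability >99%" with hypothetical function names; it is not the Mascot or X! Tandem implementation.

```python
from typing import List


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error of observed_mz in parts per million relative to reference_mz."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_tolerance(observed_mz: float, reference_mz: float, tol_ppm: float = 5.0) -> bool:
    """True if the precursor falls inside the +/- tol_ppm window (5 ppm in this study)."""
    return abs(ppm_error(observed_mz, reference_mz)) <= tol_ppm


def accept_protein(peptide_probs: List[float], protein_prob: float,
                   min_peptides: int = 2, peptide_cutoff: float = 0.95,
                   protein_cutoff: float = 0.99) -> bool:
    """Accept a protein with >= min_peptides peptides at >= 95% and protein probability > 99%."""
    confident = sum(1 for p in peptide_probs if p >= peptide_cutoff)
    return confident >= min_peptides and protein_prob > protein_cutoff


# Two m/z values reported for the same peptide (R.YESLTDPSKLDSGK.E) in the QUANTI table.
print(round(ppm_error(770.381287, 770.379028), 2))
print(within_tolerance(770.381287, 770.379028))
print(accept_protein([0.99, 0.97, 0.80], protein_prob=0.995))
```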
Expression Proteomics
• Fast (2-4 h)
• Reliable (more than 2 peptides, >99%)
• Quantitative (R² around 0.9)
• Specific (FDR < 1%)
• Deep (dynamic range >1:20)
• Sensitive (100 ng of protein)
• Comprehensive (>500 proteins)
• Inexpensive (200 euro)
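The slide does not state how the R² of about 0.9 was obtained. One common choice, assumed here, is the squared Pearson correlation between log10 protein abundances of two replicate runs (for example U937_10 and U937_10bis), computed over the proteins quantified in both. The function name, the log transform and the data are assumptions.

```python
import math
from typing import Dict


def replicate_r_squared(rep_a: Dict[str, float], rep_b: Dict[str, float]) -> float:
    """Squared Pearson correlation of log10 abundances for proteins present in both runs."""
    shared = sorted(set(rep_a) & set(rep_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared proteins")
    x = [math.log10(rep_a[k]) for k in shared]
    y = [math.log10(rep_b[k]) for k in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical abundances for two replicate runs; P5 is only in run_2 and is ignored.
run_1 = {"P1": 4.3e8, "P2": 1.0e8, "P3": 6.2e8, "P4": 1.9e8}
run_2 = {"P1": 4.0e8, "P2": 1.3e8, "P3": 5.5e8, "P4": 2.2e8, "P5": 7.5e7}
print(round(replicate_r_squared(run_1, run_2), 3))
```

Working on log values keeps a few very abundant proteins from dominating the correlation, which is why the transform is assumed here.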
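The QUANTI analysis
The MASCOT database search result is compared with the chromatographic run to obtain the abundance of each identified peptide: the retention time and accurate mass of the identified peptide locate its chromatographic peak, and the peptide abundance is the sum of its isotopic peaks in the MS spectrum.
[Figure: MASCOT result; chromatographic peak at the peptide retention time; MS of the peptide with accurate mass; sum of isotopic peaks = abundance of the peptide]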
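The matching step can be sketched under stated assumptions: an identification carries a retention time and a precursor m/z; each LC-MS feature carries an apex retention time, an m/z and the intensities of its isotopic peaks; among features within 5 ppm and a hypothetical ±2 min retention-time window, the one closest in retention time is taken as the peptide's chromatographic peak. Feature, quantify_peptide, the RT window and all values are illustrative, not the QUANTI software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    apex_rt: float                    # retention time of the chromatographic apex (min)
    mz: float                         # m/z of the feature
    isotope_intensities: List[float]  # intensities of the isotopic peaks


def quantify_peptide(id_rt: float, id_mz: float, features: List[Feature],
                     tol_ppm: float = 5.0, rt_window: float = 2.0) -> Optional[float]:
    """Return the summed isotopic intensity of the best-matching feature, or None."""
    matches = [
        f for f in features
        if abs(f.mz - id_mz) / id_mz * 1e6 <= tol_ppm and abs(f.apex_rt - id_rt) <= rt_window
    ]
    if not matches:
        return None
    best = min(matches, key=lambda f: abs(f.apex_rt - id_rt))  # closest apex in time
    return sum(best.isotope_intensities)


# Hypothetical features for one identification at RT 30.0 min, m/z 500.2500.
features = [
    Feature(apex_rt=30.4, mz=500.2510, isotope_intensities=[1.0e5, 6.0e4, 1.5e4]),
    Feature(apex_rt=31.5, mz=500.2520, isotope_intensities=[2.0e4, 1.0e4]),
    Feature(apex_rt=30.1, mz=500.2600, isotope_intensities=[9.0e5]),  # about 20 ppm away
]
print(quantify_peptide(30.0, 500.2500, features))
```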
Peptide QUANTIfication
SEQUENCE | RT MASCOT | RT APEX | MZ MASCOT | MZ QUANTI | IPI | MASCOT SCORE | INTENSITY (FULLINT)
R.YESLTDPSKLDSGK.E | 91.200607 | 84.407524 | 770.381287 | 770.379028 | IPI00784295 | 61 | 212771.6406
K.HLEINPDHSIIETLR.Q | 86.204216 | 86.17421 | 893.976074 | 893.978271 | IPI00784295 | 64 | 65058548
K.VILHLKEDQTEYLEER.R | 83.876076 | 83.714897 | 1008.028015 | 1008.027954 | IPI00784295 | 70 | 10953614
K.DLVILLYETALLSSGFSLEDPQTHANR.I | 133.445953 | 131.745605 | 1001.520325 | 1001.524231 | IPI00784295 | 137 | 203456512
M.PEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNK.E | 173.490768 | 173.519012 | 1560.401001 | 1560.397095 | IPI00784295 | 54 | 182668.7188
M.PEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLR.E | 177.030182 | 176.660629 | 1335.139771 | 1335.140381 | IPI00784295 | 26 | 87048.85938
K.TLNDELEIIEGMK.F | 82.980316 | 77.69648 | 752.882202 | 752.881042 | IPI00784154 | 96 | 1714931.5
R.ALMLQGVDLLADAVAVTMGPK.G | 125.89357 | 126.296585 | 1057.073608 | 1057.074341 | IPI00784154 | 98 | 118192832
R.TALLDAAGVASLLTTAEVVVTEIPK.E | 201.519104 | 95.73555 | 828.140686 | 828.142761 | IPI00784154 | 69 | 1631606.125
K.VVIGMDVAASEFFR.S | 122.195076 | 119.691261 | 770.89563 | 770.895142 | IPI00465248 | 115 | 32162.14258
K.IDKLMIEMDGTENK.S | 105.149544 | 103.420265 | 818.901428 | 818.9021 | IPI00465248 | 56 | 147749.8281
R.AAVPSGASTGIYEALELR.D | 92.408096 | 92.50663 | 902.976379 | 902.978149 | IPI00465248 | 93 | 456041760
K.LAMQEFMILPVGAANFR.E | 101.599594 | 101.737404 | 954.499329 | 954.499756 | IPI00465248 | 70 | 258961744
K.FTASAGIQVVGDDLTVTNPK.R | 112.769814 | 117.65992 | 1017.031128 | 1017.029846 | IPI00465248 | 88 | 78492.90625
K.FTASAGIQVVGDDLTVTNPKR.I | 88.349304 | 88.265793 | 1095.084839 | 1095.081543 | IPI00465248 | 111 | 9042372

Protein QUANTIfication
Abundance = sum of the intensity of every peak belonging to that protein.
IPI | Abundance
IPI00784295 | 279951163.2
IPI00784154 | 121539369.6
IPI00465248 | 724304280.9
… | …
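Summing the INTENSITY values of each protein's peptides in the peptide table reproduces the protein abundances listed under Protein QUANTIfication. A minimal sketch of that aggregation, using the (IPI, intensity) pairs from the table; the function name protein_abundance is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (IPI accession, peptide intensity) pairs from the peptide table.
PEPTIDES = [
    ("IPI00784295", 212771.6406), ("IPI00784295", 65058548),
    ("IPI00784295", 10953614), ("IPI00784295", 203456512),
    ("IPI00784295", 182668.7188), ("IPI00784295", 87048.85938),
    ("IPI00784154", 1714931.5), ("IPI00784154", 118192832),
    ("IPI00784154", 1631606.125),
    ("IPI00465248", 32162.14258), ("IPI00465248", 147749.8281),
    ("IPI00465248", 456041760), ("IPI00465248", 258961744),
    ("IPI00465248", 78492.90625), ("IPI00465248", 9042372),
]


def protein_abundance(peptides: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Protein abundance = sum of the intensities of all peptides assigned to it."""
    totals: Dict[str, float] = defaultdict(float)
    for accession, intensity in peptides:
        totals[accession] += intensity
    return dict(totals)


for accession, abundance in protein_abundance(PEPTIDES).items():
    print(accession, round(abundance, 1))
```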
SINGLE-STEP ANALYSIS
• Total lysates digested in solution; 250 ng of digested proteins separated by LC with a very long gradient
• Analysis in duplicate
• MS acquired twice, with two different, partially overlapping mass ranges (m/z 300-900 and 750-1600)
• Mass spectra combined and submitted to the database search as a whole; MASCOT and X! Tandem algorithms used for the search
• More than 700 proteins identified and quantified: 19 unique to clone 10, 47 unique to clone 34, 64 up-regulated and 27 down-regulated in clone 34
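The slide reports how many proteins are unique or regulated, but not the rule used to call them. A sketch under assumed criteria: a protein detected in only one clone is "unique", and a protein detected in both is called up- or down-regulated in clone 34 when the abundance ratio 34/10 crosses a hypothetical 2-fold threshold. The function name, the threshold and the data are illustrative.

```python
from typing import Dict, List


def classify(clone_10: Dict[str, float], clone_34: Dict[str, float],
             fold: float = 2.0) -> Dict[str, List[str]]:
    """Group proteins by presence and by the abundance ratio clone 34 / clone 10."""
    groups: Dict[str, List[str]] = {
        "unique_10": [], "unique_34": [], "up_in_34": [], "down_in_34": [], "unchanged": [],
    }
    for protein in sorted(set(clone_10) | set(clone_34)):
        if protein not in clone_34:
            groups["unique_10"].append(protein)
        elif protein not in clone_10:
            groups["unique_34"].append(protein)
        else:
            ratio = clone_34[protein] / clone_10[protein]
            if ratio >= fold:
                groups["up_in_34"].append(protein)
            elif ratio <= 1.0 / fold:
                groups["down_in_34"].append(protein)
            else:
                groups["unchanged"].append(protein)
    return groups


# Hypothetical abundances (for example, the mean of the duplicate runs of each clone).
clone_10 = {"P1": 1.0e8, "P2": 4.0e7, "P3": 2.0e8, "P4": 5.0e7}
clone_34 = {"P1": 2.5e8, "P2": 1.0e7, "P3": 2.2e8, "P5": 3.0e7}
for group, proteins in classify(clone_10, clone_34).items():
    print(group, proteins)
```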
GO analysis
PSE (Zubarev et al. J Proteomics. 2008)
Protein names and protein abundances are loaded. Two analyses can be performed, direct and TF-mediated, but both pass through the Key Node filtering step (Key Nodes are signaling molecules found at pathway intersections in the upstream vicinity of the genes from the input list). The resulting gene sets are compared, and their intersection is mapped onto a pathway database. Each key node found receives a score reflecting its connectivity, i.e. how many input-list genes it reaches and how close it is to the reached genes. The key nodes with the highest connectivity (highest score) are then selected, and their downstream genes are chosen as a subset for subsequent mapping onto the pathways.
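The comparison and mapping steps described above can be illustrated with plain set operations. This sketch assumes that each analysis (direct and TF-mediated) returns a set of gene symbols and that the pathway database is a dictionary from pathway name to member genes; all gene and pathway names are hypothetical, and the actual ExPlain mapping is not reproduced.

```python
from typing import Dict, List, Set, Tuple


def map_intersection(direct: Set[str], tf_mediated: Set[str],
                     pathways: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Intersect the two gene sets and list the shared genes hit in each pathway.

    Pathways with no hit are dropped; the rest are ordered by number of hits.
    """
    shared = direct & tf_mediated
    hits = [(name, sorted(members & shared)) for name, members in pathways.items()]
    hits = [(name, genes) for name, genes in hits if genes]
    hits.sort(key=lambda item: (-len(item[1]), item[0]))
    return hits


direct_genes = {"G1", "G2", "G3", "G4"}
tf_genes = {"G2", "G3", "G4", "G7"}
pathway_db = {
    "pathway_A": {"G2", "G3", "G9"},
    "pathway_B": {"G4"},
    "pathway_C": {"G1", "G7"},
}
for name, genes in map_intersection(direct_genes, tf_genes, pathway_db):
    print(name, genes)
```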
Load protein list and abundances into ExPlain
IPI no. | Abundance
IPI00003362 | 433576562
IPI00784154 | 104565394
IPI00021439 | 620550130
IPI00010796 | 188395124
IPI00784347 | 75332303
IPI00604784 | 1741269646
IPI00465028 | 149096200
IPI00019502 | 98212724
IPI00788958 | 82243515
IPI00396378 | 101936146
IPI00169383 | 143714167
IPI00027720 | 115625859
IPI00215743 | 35239791
IPI00219018 | 193798265
IPI00554648 | 217364644
IPI00021405 | 80042812
IPI00030363 | 66232299
IPI00291006 | 98192134
IPI00003865 | 130129714
IPI00303476 | 91854550
IPI00465248 | 176033959
IPI00021428 | 86977256
IPI00013808 | 47563661
… | …
• Proteins converted to genes
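After loading, proteins are converted to genes. A sketch of that step, assuming a protein-to-gene lookup table is available and that the abundances of several protein entries mapping to the same gene are summed; both the lookup and the summing rule are assumptions, and the accessions and gene symbols below are placeholders, not real IPI mappings.

```python
from typing import Dict


def proteins_to_genes(abundance: Dict[str, float],
                      protein_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Sum protein abundances per gene; proteins without a mapping are skipped."""
    per_gene: Dict[str, float] = {}
    for protein, value in abundance.items():
        gene = protein_to_gene.get(protein)
        if gene is None:
            continue  # no gene assigned to this protein entry
        per_gene[gene] = per_gene.get(gene, 0.0) + value
    return per_gene


# Placeholder accessions and gene symbols (hypothetical mapping).
abundances = {"PROT_1": 4.0e8, "PROT_2": 1.0e8, "PROT_3": 6.0e8, "PROT_4": 2.0e7}
mapping = {"PROT_1": "GENE_A", "PROT_2": "GENE_A", "PROT_3": "GENE_B"}
print(proteins_to_genes(abundances, mapping))
```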
KeyNodes Analysis
Key Node = signaling molecule found at pathway intersections in the upstream vicinity of the genes from the input list
KeyNode score = connectivity
[Table: KeyNode name, score]
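The KeyNode score reflects connectivity: how many input-list genes a key node reaches and how close they are. One possible scoring is sketched here under stated assumptions: the signaling network is a directed adjacency list, a breadth-first search runs downstream from the candidate node up to a hypothetical maximum depth, and each input gene reached at distance d adds 1/d to the score. The exact ExPlain formula is not given in the source, so this weighting is an assumption.

```python
from collections import deque
from typing import Dict, List, Set


def keynode_score(network: Dict[str, List[str]], key_node: str,
                  input_genes: Set[str], max_depth: int = 3) -> float:
    """Sum of 1/distance over input-list genes reachable downstream of key_node."""
    distance = {key_node: 0}
    queue = deque([key_node])
    while queue:  # breadth-first search gives shortest downstream distances
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not expand beyond the maximum depth
        for target in network.get(node, []):
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    return sum(1.0 / d for gene, d in distance.items() if gene in input_genes and d > 0)


# Hypothetical network: edges point from regulator to target.
network = {
    "K1": ["X", "G1"],
    "X": ["G2", "G3"],
    "K2": ["G3"],
}
input_list = {"G1", "G2", "G3"}
print(keynode_score(network, "K1", input_list))
print(keynode_score(network, "K2", input_list))
```

K1 reaches all three input genes (one directly, two through X) and therefore scores higher than K2, which reaches only one.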
Mapping of signaling molecules onto pathways
Pathway Score
Pathway score = ∑ KeyNode scores
(.htm file of the pathway list; .htm file of the KeyNode list)

Pathway name | Score
EGF pathway | 4.38
stress-associated pathways | 3.45
E2F network | 3.34
Caspase network | 3.28
insulin pathway | 2.80
p53 pathway | 2.71
T-cell antigen receptor pathway | 2.54
JNK pathway | 2.42
Fas pathway | 2.40
PRL pathway | 2.29
B-cell antigen receptor pathway | 2.22
TGFbeta pathway | 2.16
RANKL pathway | 2.06
Sphase(Cdk2) | 2.03
Epo pathway | 1.86
TLR4 pathway | 1.85
IL-1 pathway | 1.83
G1phase(Cdk2) | 1.80
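The pathway score is the sum of the scores of the key nodes that fall in that pathway. A sketch of the summation and ranking, assuming key-node scores and pathway memberships are available as dictionaries; the key-node names, scores and memberships are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_pathways(keynode_scores: Dict[str, float],
                  pathway_members: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Pathway score = sum of the scores of its key nodes; highest score first.

    Pathways without any scored key node are dropped.
    """
    scored = [
        (name, sum(keynode_scores.get(node, 0.0) for node in members))
        for name, members in pathway_members.items()
    ]
    scored = [item for item in scored if item[1] > 0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


scores = {"K1": 2.0, "K2": 1.0, "K3": 0.5}
membership = {"pathway_A": {"K1", "K3"}, "pathway_B": {"K2"}, "pathway_C": {"K4"}}
for name, score in rank_pathways(scores, membership):
    print(name, score)
```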