Proteomics Informatics – Protein identification I: searching protein sequence collections and significance testing (Week 4)
Peptide Mapping - Mass Accuracy
Peptide Mapping - Database Size [Figure: panels for Human, C. elegans, S. cerevisiae]
Peptide Mapping - Cys-Containing Peptides [Figure: panels for Human, C. elegans, S. cerevisiae]
Identification – Peptide Mass Fingerprinting
[Flowchart: Sequence DB → Pick Protein → Digestion → All Peptide Masses; measured MS data; Compare, Score, Test Significance → Identified Proteins. Repeat for each protein.]
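The workflow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the ProFound implementation: the function names, the toy two-protein database, and the use of a plain match count as the "score" are all assumptions made for the example. Trypsin is assumed to cleave after K or R unless the next residue is P, with no missed cleavages.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of an intact peptide
PROTON = 1.00728   # charge carrier for a singly protonated ion


def tryptic_peptides(sequence):
    """In silico digestion: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(measured, theoretical, tolerance):
    """Number of measured masses with a theoretical mass within tolerance."""
    return sum(
        any(abs(m - t) <= tolerance for t in theoretical) for m in measured
    )


def rank_proteins(measured, database, tolerance=0.5):
    """Repeat for each protein: digest, compute masses, compare, score."""
    scores = []
    for name, sequence in database.items():
        theoretical = [mh_mass(p) for p in tryptic_peptides(sequence)]
        scores.append((name, count_matches(measured, theoretical, tolerance)))
    # Highest match count first; a real tool would test significance here.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_db = {"protA": "SGFLEEDELKAAGR", "protB": "MKWVTFR"}  # hypothetical
    measured = [mh_mass("SGFLEEDELK"), mh_mass("AAGR")]
    print(rank_proteins(measured, toy_db))
```

A match count is the crudest possible score; it ignores mass accuracy and protein length, which is why the slides go on to a probabilistic score and an explicit significance test.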
ProFound – Search Parameters http://prowl.rockefeller.edu/
ProFound – Protein Identification by Peptide Mapping
$$P(k \mid DI) \propto P(k \mid I)\,\frac{(N-r)!}{N!}\,\prod_{i=1}^{r}\left\{\sqrt{\frac{2}{\pi}}\,\frac{m_{\max}-m_{\min}}{\sigma_i}\,\sum_{j=1}^{g_i}\exp\!\left[-\frac{(m_i-m_{ij0})^2}{2\sigma_i^2}\right]\right\}F_{\text{pattern}}$$
W. Zhang & B.T. Chait, Analytical Chemistry 72 (2000) 2482-2489
ProFound Results
Peptide Mapping – Mass Accuracy [Figure: two panels, ProFound (-log(e), axis 0 to 7) and Mascot (Score, axis 0 to 140), each plotted against Mass Tolerance (Da), 0 to 2]
Peptide Mapping - Database Size
Peptide mapping example, expectation values by database searched:
S. cerevisiae   4.8e-7
Fungi           8.4e-6
All Taxa        2.9e-4
Database size
Missed Cleavage Sites
Peptide mapping example, expectation values:
u = 1   4.8e-7
u = 2   1.1e-5
u = 4   6.8e-4
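One way to see why the expectation value gets worse as u grows is to count how many candidate peptides a digest produces. The sketch below assumes that u is the maximum number of missed cleavage sites allowed per peptide and that trypsin cleaves after K or R unless followed by P; the function names and the toy sequence are hypothetical.

```python
def cleavage_boundaries(sequence):
    """Positions where the sequence can be cut, including both ends."""
    cuts = {0, len(sequence)}
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            cuts.add(i + 1)
    return sorted(cuts)


def digest(sequence, max_missed):
    """All peptides that span at most `max_missed` uncut cleavage sites."""
    bounds = cleavage_boundaries(sequence)
    n_fragments = len(bounds) - 1
    peptides = []
    for i in range(n_fragments):
        # Joining k+1 adjacent fragments corresponds to k missed cleavages.
        last = min(i + 1 + max_missed, n_fragments)
        for j in range(i + 1, last + 1):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


if __name__ == "__main__":
    toy = "AAKGGKCCKDD"  # hypothetical sequence with three cleavage sites
    for u in (0, 1, 2, 4):
        print(u, len(digest(toy, u)))
```

Every extra candidate peptide is another chance for a measured mass to match at random, so a larger u enlarges the effective search space in much the same way as a larger database.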
Peptide Mapping - Partial Modifications
Protein    No modifications searched   Searched with possible phosphorylation of S/T/Y
DARPP-32   0.00006                     0.01
CFTR       0.00002                     0.005
Even if the protein is modified, it is usually better to search a protein sequence database without specifying possible modifications when using peptide mapping data.
Peptide Mapping - Ranking by Direct Calculation of the Significance
General Criteria for a Good Protein Identification Algorithm
- The response to random input data should be random.
- Maximum number of correct identifications and minimum number of incorrect identifications for any data set.
- Maximal separation between the scores for correct identifications and the distribution of scores for random matching proteins, for any data set.
- The statistical significance of the results should be calculated.
- The searches should be fast.
Response to Random Data [Figure: y-axis Normalized Frequency]
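The criteria above ask for a calculated significance and a well-behaved response to random data. One generic way to connect the two is to compare an observed score with scores obtained from random matches. The sketch below is an assumption-laden illustration, not the calculation used by ProFound or Mascot: the function names are hypothetical, the add-one correction is a common convention chosen here for convenience, and the expectation value is taken as the p-value multiplied by the number of candidates searched.

```python
def empirical_p_value(score, random_scores):
    """Fraction of random-match scores that reach the observed score.

    The +1 terms keep the estimate above zero when no random score
    reaches the observed one (assumed convention for this sketch).
    """
    reached = sum(1 for s in random_scores if s >= score)
    return (reached + 1) / (len(random_scores) + 1)


def expectation_value(p_value, n_candidates):
    """Expected number of random candidates scoring at least this well."""
    return p_value * n_candidates


if __name__ == "__main__":
    null_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up random-match scores
    p = empirical_p_value(8, null_scores)
    print(p, expectation_value(p, 100))
```

Under this reading, searching more candidates (a larger database, more missed cleavages, more allowed modifications) raises the expectation value for the same score, which is the trend shown in the peptide mapping examples.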
Peptide Fragmentation [Schematic: Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector; fragment ion types b and y]
Identification – Tandem MS
Tandem MS – Sequence Confirmation
Residue   S     G     F     L     E     E     D     E     L     K
b ions    88    145   292   405   534   663   778   907   1020  1166
y ions    1166  1080  1022  875   762   633   504   389   260   147
[Figure: MS/MS spectrum, % Relative Abundance (0 to 100) versus m/z (0 to 1000); labeled peaks 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080 and [M+2H]2+; mass differences of 113 and 129 between peaks are annotated]
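The b and y ion values for SGFLEEDELK can be reproduced from residue masses. The sketch below assumes singly charged fragments and standard monoisotopic masses: a b ion is the sum of the N-terminal residues plus a proton, and a y ion is the sum of the C-terminal residues plus water and a proton. The function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O
PROTON = 1.00728   # H+


def fragment_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[k] covers the first k+1 residues, y[k] the last k+1 residues.
    Only fragments shorter than the full peptide are returned.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = suffix = 0.0
    for k in range(len(masses) - 1):
        prefix += masses[k]
        suffix += masses[-1 - k]
        b_ions.append(prefix + PROTON)
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


def precursor_mh(peptide):
    """Monoisotopic [M+H]+ of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    b, y = fragment_ions("SGFLEEDELK")
    print("b:", [f"{m:.2f}" for m in b])
    print("y:", [f"{m:.2f}" for m in y])
    print("[M+H]+:", f"{precursor_mh('SGFLEEDELK'):.2f}")
```

The computed values agree with the nominal masses on the slide to within 1 Da. The 1166 listed at the end of both series corresponds to the [M+H]+ of the intact peptide.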
Tandem MS – de novo Sequencing
[Figure: MS/MS spectrum, % Relative Abundance versus m/z, with labeled peaks 260 to 1080 and [M+2H]2+] Mass Differences → Sequences consistent with spectrum
Amino acid masses
1-letter code   3-letter code   Monoisotopic   Average    Chemical formula
A               Ala             71.0371        71.0788    C3H5ON
R               Arg             156.101        156.188    C6H12ON4
N               Asn             114.043        114.104    C4H6O2N2
D               Asp             115.027        115.089    C4H5O3N
C               Cys             103.009        103.139    C3H5ONS
E               Glu             129.043        129.116    C5H7O3N
Q               Gln             128.059        128.131    C5H8O2N2
G               Gly             57.0215        57.0519    C2H3ON
H               His             137.059        137.141    C6H7ON3
I               Ile             113.084        113.159    C6H11ON
L               Leu             113.084        113.159    C6H11ON
K               Lys             128.095        128.174    C6H12ON2
M               Met             131.04         131.193    C5H9ONS
F               Phe             147.068        147.177    C9H9ON
P               Pro             97.0528        97.1167    C5H7ON
S               Ser             87.032         87.0782    C3H5O2N
T               Thr             101.048        101.105    C4H7O2N
W               Trp             186.079        186.213    C11H10ON2
Y               Tyr             163.063        163.176    C9H9O2N
V               Val             99.0684        99.1326    C5H9ON
Tandem MS – de novo Sequencing
Mass differences between all pairs of peaks (column peak minus row peak):
m/z    260   292   389   405   504   534   633   663   762   778   875   907  1020  1022  1079
260           32   129   145   244   274   373   403   502   518   615   647   760   762   819
292                 97   113   212   242   341   371   470   486   583   615   728   730   787
389                       16   115   145   244   274   373   389   486   518   631   633   690
405                             99   129   228   258   357   373   470   502   615   617   674
504                                   30   129   159   258   274   371   403   516   518   575
534                                         99   129   228   244   341   373   486   488   545
633                                               30   129   145   242   274   387   389   446
663                                                     99   115   212   244   357   359   416
762                                                           16   113   145   258   260   317
778                                                                 97   129   242   244   301
875                                                                       32   145   147   204
907                                                                            113   115   172
1020                                                                                   2    59
1022                                                                                        57
Tandem MS – de novo Sequencing
Mass differences in the matrix that correspond to amino acid residue masses (highlighted): 57, 97, 99, 113, 115, 129, 147
Tandem MS – de novo Sequencing
Mass differences assigned to residues, by starting peak:
260: 129 (E)
292: 97 (P), 113 (I/L)
389: 115 (D)
405: 99 (V), 129 (E)
504: 129 (E)
534: 99 (V), 129 (E)
633: 129 (E)
663: 99 (V), 115 (D)
762: 113 (I/L)
778: 97 (P), 129 (E)
875: 147 (F)
907: 113 (I/L), 115 (D)
1022: 57 (G)
In rows 292, 405, 534, 663, 778 and 907 one candidate assignment is crossed out (X).
Sequences consistent with the spectrum: …GF(I/L)EEDE(I/L)… and, read in the opposite direction, …(I/L)EDEE(I/L)FG…
Peptide M+H = 1166
1166 - 1020 - 18 = 128 ⇒ K or Q
1166 - 1079 = 87 ⇒ S
Result: SGF(I/L)EEDE(I/L)(K/Q)
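The mass-difference reasoning on this slide can be sketched as a small graph search: peaks are nodes, and two peaks are linked whenever their difference equals a nominal residue mass. This is a simplified illustration using integer masses and the peak list from the matrix; the function names are hypothetical, and real de novo tools use accurate masses, intensities and both ion series together.

```python
# Nominal (integer) residue masses; I and L, and K and Q, are not resolved.
NOMINAL_RESIDUE = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "I/L", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}

# Peak list taken from the mass-difference matrix on the slide.
PEAKS = [260, 292, 389, 405, 504, 534, 633, 663, 762,
         778, 875, 907, 1020, 1022, 1079]


def residue_edges(peaks):
    """For each peak, list (higher_peak, residue) pairs one residue away."""
    edges = {p: [] for p in peaks}
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            residue = NOMINAL_RESIDUE.get(high - low)
            if residue is not None:
                edges[low].append((high, residue))
    return edges


def longest_ladder(peaks, start):
    """Longest chain of residue-spaced peaks beginning at `start`."""
    edges = residue_edges(peaks)

    def extend(peak):
        best = []
        for next_peak, residue in edges[peak]:
            candidate = [residue] + extend(next_peak)
            if len(candidate) > len(best):
                best = candidate
        return best

    return extend(start)


if __name__ == "__main__":
    print(longest_ladder(PEAKS, 260))
    # Terminal residues from the precursor mass, as on the slide.
    print(NOMINAL_RESIDUE[1166 - 1079], NOMINAL_RESIDUE[1166 - 1020 - 18])
```

Starting from 260 every step has exactly one residue-sized difference, so the ladder is unambiguous. Starting from 292 each peak has two candidate steps and a chain can hop between the two ion series, which is where candidates have to be discarded.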
Tandem MS – de novo Sequencing
Challenges in de novo sequencing
- Neutral loss (-H2O, -NH3)
- Modifications
- Background peaks
- Incomplete information