Proteomics Informatics – Protein identification I: searching protein sequence collections and significance testing (Week 4)
Peptide Mapping - Mass Accuracy
Peptide Mapping - Database Size [panels: Human, C. elegans, S. cerevisiae]
Peptide Mapping - Cys-Containing Peptides [panels: Human, C. elegans, S. cerevisiae]
Identification – Peptide Mass Fingerprinting
Experiment: Digestion → MS
Sequence DB: Pick Protein (repeat for each protein) → All Peptide Masses
Compare, Score, Test Significance → Identified Proteins
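A minimal sketch of the peptide mass fingerprinting loop on this slide, under several assumptions: the enzyme is trypsin (cleave after K or R unless the next residue is P, no missed cleavages), the score is a plain count of matched masses, and the two-protein database and the measured mass list are hypothetical. The residue masses are the monoisotopic values from the amino acid mass table in the de novo sequencing slide. Real search engines use more elaborate scoring and significance tests.

```python
# Monoisotopic residue masses (Da), as listed in the amino acid mass table.
RESIDUE_MASS = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.040, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.01056   # H2O added to the summed residue masses
PROTON = 1.00728   # charge carrier for the singly protonated peptide


def digest(protein):
    """In silico tryptic digest (assumed rule: after K/R, not before P)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(measured, protein, tol=0.05):
    """Number of measured masses explained by the protein's peptides."""
    theoretical = [peptide_mh(p) for p in digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured)


def rank_proteins(measured, database, tol=0.05):
    """Score every protein in the database and sort best first."""
    scores = {name: count_matches(measured, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical database and measured masses, for illustration only.
DATABASE = {"protA": "SGFLEEDELKAAR", "protB": "MKTAYIAKQR"}
MEASURED = [1166.56, 317.19]
print(rank_proteins(MEASURED, DATABASE))
```

A count of matches is the simplest possible score. It says nothing about significance, which is why the workflow ends with a separate significance test.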
ProFound Results
Database size
Mixtures
Peptide Fragmentation
Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector
Fragment ion types: b, y
Identification – Tandem MS
Tandem MS – Sequence Confirmation
Residue   S     G     F     L     E     E     D     E     L     K
b ions    88    145   292   405   534   663   778   907   1020  1166
y ions    1166  1080  1022  875   762   633   504   389   260   147
[Spectrum: % Relative Abundance (0 to 100) versus m/z (0 to 1000); labeled peaks: 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080, and [M+2H]2+]
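The b and y ion values listed for SGFLEEDELK can be reproduced with a short calculation. This is a sketch that assumes singly charged fragments and uses the monoisotopic residue masses from the amino acid mass table in the de novo sequencing slide. The function names are illustrative.

```python
# Monoisotopic residue masses (Da) for the residues in SGFLEEDELK.
RESIDUE_MASS = {
    "S": 87.032, "G": 57.0215, "F": 147.068, "L": 113.084,
    "E": 129.043, "D": 115.027, "K": 128.095,
}
WATER = 18.01056   # H2O retained by C-terminal (y) fragments
PROTON = 1.00728   # charge carrier


def b_ions(peptide):
    """Singly charged b1..b(n-1): N-terminal prefix masses plus a proton."""
    total, ions = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide):
    """Singly charged y1..y(n-1): C-terminal suffix masses plus water and a proton."""
    total, ions = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions


PEPTIDE = "SGFLEEDELK"
for i, (b, y) in enumerate(zip(b_ions(PEPTIDE), y_ions(PEPTIDE)), start=1):
    print(f"b{i} = {b:8.2f}    y{i} = {y:8.2f}")
```

The value 1166 that closes both series on the slide is the intact singly protonated peptide, so the sketch stops at fragment n-1. Small differences from the slide's integers (for example 1079.5 versus 1080 or 1079) come from rounding.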
Tandem MS – Sequence Confirmation [same spectrum, with peak spacings of 113 highlighted]
Tandem MS – Sequence Confirmation [same spectrum, with peak spacings of 129 highlighted]
Tandem MS – de novo Sequencing
[Spectrum: % Relative Abundance (0 to 100) versus m/z (0 to 1000); labeled peaks: 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080, and [M+2H]2+]
Mass Differences → Sequences consistent with spectrum
Amino acid masses
1-letter code  3-letter code  Monoisotopic  Average   Chemical formula
A              Ala            71.0371       71.0788   C3H5ON
R              Arg            156.101       156.188   C6H12ON4
N              Asn            114.043       114.104   C4H6O2N2
D              Asp            115.027       115.089   C4H5O3N
C              Cys            103.009       103.139   C3H5ONS
E              Glu            129.043       129.116   C5H7O3N
Q              Gln            128.059       128.131   C5H8O2N2
G              Gly            57.0215       57.0519   C2H3ON
H              His            137.059       137.141   C6H7ON3
I              Ile            113.084       113.159   C6H11ON
L              Leu            113.084       113.159   C6H11ON
K              Lys            128.095       128.174   C6H12ON2
M              Met            131.04        131.193   C5H9ONS
F              Phe            147.068       147.177   C9H9ON
P              Pro            97.0528       97.1167   C5H7ON
S              Ser            87.032        87.0782   C3H5O2N
T              Thr            101.048       101.105   C4H7O2N
W              Trp            186.079       186.213   C11H10ON2
Y              Tyr            163.063       163.176   C9H9O2N
V              Val            99.0684       99.1326   C5H9ON
Tandem MS – de novo Sequencing
Mass differences between peaks (column peak minus row peak):
      260  292  389  405  504  534  633  663  762  778  875  907 1020 1022 1079
 260        32  129  145  244  274  373  403  502  518  615  647  760  762  819
 292             97  113  212  242  341  371  470  486  583  615  728  730  787
 389                  16  115  145  244  274  373  389  486  518  631  633  690
 405                       99  129  228  258  357  373  470  502  615  617  674
 504                            30  129  159  258  274  371  403  516  518  575
 534                                 99  129  228  244  341  373  486  488  545
 633                                      30  129  145  242  274  387  389  446
 663                                           99  115  212  244  357  359  416
 762                                                16  113  145  258  260  317
 778                                                     97  129  242  244  301
 875                                                          32  145  147  204
 907                                                              113  115  172
1020                                                                     2   59
1022                                                                         57
Tandem MS – de novo Sequencing
[Same mass-difference matrix, with differences equal to an amino acid residue mass highlighted: 97 and 113 (row 292), 99 and 129 (rows 405 and 534), 99 and 115 (row 663), 97 and 129 (row 778), 113 and 115 (row 907), 57 (row 1022)]
Tandem MS – de novo Sequencing
[Same mass-difference matrix, with matching differences replaced by residue letters; candidates marked X on the slide are rejected]
Peaks 292 → 405 → 534 → 663 → 778 → 907 → 1020: (I/L), E, E, D, E, (I/L)
Peaks 260 → 389 → 504 → 633 → 762 → 875 → 1022 → 1079: E, D, E, E, (I/L), F, G
Partial sequence, depending on reading direction: …GF(I/L)EEDE(I/L)… or …(I/L)EDEE(I/L)FG…
Peptide M+H = 1166
1166 - 1079 = 87 => S
1166 - 1020 - 18 = 128 => K or Q
S GF(I/L)EEDE(I/L)…
SGF(I/L)EEDE(I/L)(K/Q)
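The mass-difference step of de novo sequencing can be sketched as follows. The sketch assumes integer (nominal) residue masses rounded from the monoisotopic values in the amino acid mass table, an ascending peak list, and exact integer matching. `candidate_steps` and `read_ladder` are illustrative names, not part of any tool named in these slides.

```python
# Nominal residue masses (Da). I/L and K/Q share a nominal mass and
# cannot be distinguished at this precision.
NOMINAL = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "I/L", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}

# Peak list from the mass-difference matrix on the slides.
PEAKS = [260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907,
         1020, 1022, 1079]


def candidate_steps(peaks):
    """List (low, high, residue) for each peak pair separated by a residue mass."""
    steps = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            residue = NOMINAL.get(high - low)
            if residue is not None:
                steps.append((low, high, residue))
    return steps


def read_ladder(path):
    """Translate consecutive peaks into residues (None if a step is no residue mass)."""
    return [NOMINAL.get(high - low) for low, high in zip(path, path[1:])]


print(len(candidate_steps(PEAKS)), "candidate steps")
print(read_ladder([292, 405, 534, 663, 778, 907, 1020]))
print(read_ladder([260, 389, 504, 633, 762, 875, 1022, 1079]))
```

Some candidate steps, such as 292 → 389 (P) or 405 → 504 (V), connect peaks from different ion series. A naive longest-path search would happily follow them, which is one reason de novo interpretation needs the consistency checks that the crossed-out candidates on the slide represent.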
Tandem MS – de novo Sequencing
Challenges in de novo sequencing
• Neutral loss (-H2O, -NH3)
• Modifications
• Background peaks
• Incomplete information
Tandem MS – Database Search
Experiment: Lysis → Fractionation → Digestion → LC-MS → MS/MS
Sequence DB: Pick Protein (repeat for all proteins) → Pick Peptide (repeat for all peptides) → All Fragment Masses
Compare, Score, Test Significance
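The compare-and-score step of this workflow can be sketched with a shared peak count. Everything specific here is an assumption made for illustration: the candidate peptide list is hypothetical, fragments are singly charged b and y ions only, the score is the number of spectrum peaks explained, and the 1 Da tolerances reflect the integer-rounded peak values shown on the slides.

```python
RESIDUE_MASS = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.040, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fragment_masses(peptide):
    """Singly charged b and y ions for every backbone cleavage site."""
    masses = []
    for i in range(1, len(peptide)):
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        suffix = sum(RESIDUE_MASS[aa] for aa in peptide[i:])
        masses.append(prefix + PROTON)           # b ion
        masses.append(suffix + WATER + PROTON)   # y ion
    return masses


def shared_peak_count(peaks, peptide, tol):
    """Number of observed peaks explained by a theoretical fragment."""
    theoretical = fragment_masses(peptide)
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)


def search(peaks, precursor_mh, candidates, precursor_tol=1.0, fragment_tol=1.0):
    """Keep candidates matching the precursor mass, then rank them by score."""
    hits = [(pep, shared_peak_count(peaks, pep, fragment_tol))
            for pep in candidates
            if abs(peptide_mh(pep) - precursor_mh) <= precursor_tol]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Peak list from the slides; the candidate list is hypothetical.
PEAKS = [260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907,
         1020, 1022, 1079]
CANDIDATES = ["SGFLEEDELK", "SGFELEDELK", "TAYIAK"]
print(search(PEAKS, 1166.56, CANDIDATES))
```

The second candidate has the same precursor mass as the first but two residues swapped, so it explains fewer peaks. The third is removed by the precursor mass filter before any fragment comparison.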
Search Results
Significance Testing
• False protein identification is caused by random matching.
• An objective criterion for testing the significance of protein identification results is necessary.
• The significance of protein identifications can be tested once the distribution of scores for false results is known.
Significance Testing - Expectation Values
The majority of sequences in a collection will give a score due to random matching.
Significance Testing - Expectation Values
Spectrum (m/z) → Database Search → List of Candidates → Distribution of Scores for Random and False Identifications → Extrapolate and Calculate Expectation Values → List of Candidates With Expectation Values
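One way to sketch the extrapolation step: count how many low-ranking (presumably random) candidates reach each score, fit a straight line to the logarithm of those counts, and extrapolate the line to the score of the top candidate. The log-linear model, the use of the whole background rather than only its high-score tail, and the synthetic background scores below are all assumptions for illustration. The slides do not specify these details.

```python
import math


def survival_counts(scores):
    """For each distinct score s, the number of candidates scoring >= s."""
    counts = {}
    for rank, s in enumerate(sorted(scores, reverse=True), start=1):
        counts[s] = rank  # the last rank seen for s counts all scores >= s
    return sorted(counts.items())


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expectation_value(top_score, background_scores):
    """Expected number of random candidates scoring at least top_score."""
    points = [(s, math.log(n)) for s, n in survival_counts(background_scores)]
    slope, intercept = fit_line(points)
    return math.exp(intercept + slope * top_score)


# Synthetic background: the count of scores >= s halves with every unit of s.
BACKGROUND = [0] * 32 + [1] * 16 + [2] * 8 + [3] * 4 + [4] * 2 + [5, 6]
print(expectation_value(16, BACKGROUND))
```

An expectation value far below 1 means that a random match with a score this high is unlikely in a search of this size.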
Rho-diagrams: Overall Quality of a Data Set
Expectation values as a function of score for random matching:
$e(s) \propto \exp(-\beta s)$ for some constant $\beta > 0$
Definition: $E_i$ ($i = 0, -1, -2, \ldots$) is the number of spectra that have been assigned an expectation value between $\exp(i)$ and $\exp(i-1)$.
For random matching:
$E_i = \int_{e=\exp(i-1)}^{e=\exp(i)} N \, de = N\{\exp(i) - \exp(i-1)\} = N \exp(i)\{1 - \exp(-1)\}$
$\rho(i) = \log\left(\frac{E_i}{E_0}\right) = \log\left(\frac{N \exp(i)\{1 - \exp(-1)\}}{N\{1 - \exp(-1)\}}\right) = i$
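The definition of the rho-diagram translates directly into a binning exercise. The sketch below assumes natural logarithms, bins that include their upper edge (exp(i-1) < e <= exp(i)), and that expectation values above 1 are ignored. `rho_diagram` is an illustrative name.

```python
import math
from collections import Counter


def rho_diagram(expectation_values):
    """Return {i: rho(i)} with rho(i) = log(E_i / E_0) for i = 0, -1, -2, ..."""
    counts = Counter()
    for e in expectation_values:
        if 0 < e <= 1:
            # Bin index i such that exp(i - 1) < e <= exp(i).
            counts[math.ceil(math.log(e))] += 1
    if counts[0] == 0:
        raise ValueError("E_0 is zero, so rho is undefined")
    return {i: math.log(n / counts[0])
            for i, n in sorted(counts.items(), reverse=True)}


# Hypothetical expectation values for seven spectra.
print(rho_diagram([0.9, 0.8, 0.5, 0.4, 0.2, 0.3, 0.1]))
```

For purely random matching the points fall on the diagonal rho(i) = i. An excess of spectra with very small expectation values lifts the points above that diagonal.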
Rho-diagram: Random Matching [plot; axis ticks from 0 to -6; axis label: log(e)]
Rho-diagram: Data Quality [plot; axis ticks from 0 to -10; axis label: log(e)]
Rho-diagram Parameters
How many fragments are sufficient?
• To identify an unmodified peptide?
• To identify a modified peptide?
• To localize a modification on a peptide?
How many fragments are sufficient?
How does it depend on different parameters?
• Precursor mass
• Precursor mass error
• Fragment mass error
• Background peaks