Mass Spectra Alignments and their Significance
Sebastian Böcker 1, Hans-Michael Kaltenbach 2
1 Technische Fakultät, Universität Bielefeld
2 NRW Int’l Graduate School in Bioinformatics and Genome Research, Universität Bielefeld
Böcker, Kaltenbach, Mass Spectra Alignments, CPM 2005
Overview
◮ Mass Spectrometry in Proteomics
◮ Protein Identification via MS
◮ Alignment of Spectra
◮ Score Significance
◮ Conclusion
Proteins
Biology: Proteins are directed polymers of 20 different amino acids.
Example: G T D N S T D M K K A T I Q K A T S K A
Mathematics: Proteins are strings over an alphabet Σ.
Mass Spectrometry
Mass Spectrometry in Bioscience: Mass spectrometry measures the masses and quantities of the molecules in a sample. It is widely used in the biosciences to identify proteins and other biomolecules.
Fragmentation of peptides
Problem: Measuring the mass of a protein alone is not sufficient for identification.
Idea: Break up the protein into smaller pieces in a deterministic way. The spectrum of these pieces is called a fingerprint of the protein.
[Figure: abundance vs. mass, first for the intact protein and then for its fragments]
Peptide Mass Fingerprints
Enzymatic cleavage example: An enzyme cuts the amino acid sequence after each letter K.
G T D N S T D M K K A T I Q K A T S K A
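The cleavage rule on this slide can be illustrated with a minimal Python sketch. The function name `cleave` and its parameters are illustrative assumptions, not part of the talk; the only rule taken from the slide is "cut after each K".

```python
def cleave(sequence: str, cleavage_char: str = "K") -> list[str]:
    """Split a sequence after every occurrence of cleavage_char."""
    fragments = []
    current = []
    for residue in sequence:
        current.append(residue)
        if residue == cleavage_char:
            # The cleavage character closes the current fragment.
            fragments.append("".join(current))
            current = []
    if current:
        # Trailing residues after the last cleavage site form a final fragment.
        fragments.append("".join(current))
    return fragments


fragments = cleave("GTDNSTDMKKATIQKATSKA")
print(fragments)  # ['GTDNSTDMK', 'K', 'ATIQK', 'ATSK', 'A']
```

Real enzymes have additional rules and missed cleavages, which this sketch deliberately ignores.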
Peptide Mass Fingerprints
Artificial spectrum of GTDSTNKDMKASTAKAKQIT
[Figure: relative abundance vs. mass, with peaks labeled AK / 199.3618, QIT / 343.3801, DMK / 374.4614, ASTAK / 458.5151, GTDSTNK / 703.7071]
Real Mass Spectrum (PMF peaks annotated)
[Figure: measured mass spectrum]
Processing the spectrum
Peak extraction: Spectra are summarized into peak lists, but extracting peaks is inherently difficult.
Problem: Peak lists are never correct
◮ Inaccurate calibration
◮ Sample contamination
◮ Peak detection
◮ ...
Identification
Protein Identification with PMF
◮ Isolate many copies of ONE protein
◮ Digest it into specific smaller fragments (Mass Fingerprint)
◮ Make a mass spectrum of these fragments
◮ Compare the spectrum to all predicted mass spectra from the DB
[Diagram: the peak list of the mass fingerprint obtained via mass spectrometry is compared with the peak lists obtained via in-silico fragmentation of the database sequences (AVKKPPTVHIIT..., KVVGTASILLYV..., VVNMTREEEASD..., QEVFGGTELLPP..., PLMKKRPHGTFD..., KLMMMTGERDFG..., HILKMLVFDSAQ...); the comparison yields a score and its significance]
Comparing Two Peak Lists
Peak lists and empty peaks: Let $S_m$, $S_p$ be an extracted and a predicted peak list. Let $\varepsilon$ denote a special gap peak.
Scoring scheme: Each assignment between the two peak lists can be scored:
$$\mathrm{score}(S_p, S_m) = \sum_{\text{matched } i,j} \mathrm{score}(i, j) + \sum_{\text{missing } i} \mathrm{score}(i, \varepsilon) + \sum_{\text{additional } j} \mathrm{score}(\varepsilon, j)$$
The three sums cover the matched peaks, the missing peaks, and the additional peaks, respectively.
Matching peak lists
Matching
◮ One-to-one peak matching
◮ Peak matchings should not cross
◮ Every peak must be matched either to a peak or to the gap peak
◮ The matching score is based mainly on the mass difference but can include other features
Best matching: With such scoring schemes, the best peak list matching can be computed by standard global alignment.
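The global alignment mentioned above can be sketched as a Needleman-Wunsch style dynamic program over two mass-sorted peak lists. This is a minimal sketch, not the authors' implementation: the function `best_matching_score`, the callbacks `match`, `missing`, `additional`, and the toy scoring values are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def best_matching_score(
    predicted: Sequence[float],
    measured: Sequence[float],
    match: Callable[[float, float], float],
    missing: Callable[[float], float],
    additional: Callable[[float], float],
) -> float:
    """Best score of a one-to-one, non-crossing matching of two sorted peak lists."""
    n, m = len(predicted), len(measured)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    # First column: every predicted peak is matched to the gap peak (missing).
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + missing(predicted[i - 1])
    # First row: every measured peak is matched to the gap peak (additional).
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + additional(measured[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + match(predicted[i - 1], measured[j - 1]),
                best[i - 1][j] + missing(predicted[i - 1]),
                best[i][j - 1] + additional(measured[j - 1]),
            )
    return best[n][m]


# Hypothetical toy scoring: reward close masses, penalize mismatches and gaps.
def toy_match(a: float, b: float) -> float:
    return 1.0 if abs(a - b) <= 10 else -1.0


def toy_gap(_: float) -> float:
    return -0.5


score = best_matching_score(
    [500.0, 800.0, 1200.0], [502.0, 1195.0], toy_match, toy_gap, toy_gap
)
print(score)  # 1.5
```

Processing both lists in increasing mass order is what enforces the non-crossing condition. The running time is proportional to the product of the two list lengths.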
Scoring scheme example: Peak counting
Peak counting score:
$$\mathrm{score}(i, j) = \begin{cases} 1 & |\mathrm{mass}(i) - \mathrm{mass}(j)| \le \delta \\ 0 & \text{else} \end{cases} \qquad \mathrm{score}(i, \varepsilon) = \mathrm{score}(\varepsilon, j) = 0$$
Example: $\delta = 10$, $S_m = \{1000, 1230, 1500\}$ and $S_p = \{1000, 1235, 1700\}$
Alignment:
S_p   1000   1235   ε      1700
S_m   1000   1230   1500   ε
$\mathrm{score}(S_m, S_p) = (1 + 1) + 0 + 0 = 2$.
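The example on this slide can be checked with a few lines of Python. The names `GAP` and `peak_counting_score` are illustrative assumptions; the masses, the threshold $\delta = 10$, and the alignment are taken from the slide.

```python
from typing import Optional

GAP = None  # stands in for the gap peak ε


def peak_counting_score(i: Optional[float], j: Optional[float], delta: float) -> int:
    """1 if both peaks exist and their masses differ by at most delta, else 0."""
    if i is GAP or j is GAP:
        return 0  # score(i, ε) = score(ε, j) = 0
    return 1 if abs(i - j) <= delta else 0


# Alignment from the slide as (S_p peak, S_m peak) pairs.
alignment = [(1000, 1000), (1235, 1230), (GAP, 1500), (1700, GAP)]
total = sum(peak_counting_score(i, j, delta=10) for i, j in alignment)
print(total)  # 2
```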
Estimating the score distribution
Problem: The score distribution depends on
◮ Measured spectrum
◮ Sequence length
◮ Mass and probability of characters
Estimation techniques
◮ Different null models: sampling against spectra or sampling against sequences
◮ Sampling against sequences: random or DB sequences both take a long time
◮ Estimation of moments: works with certain classes of distributions
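Sampling against random sequences can be sketched as follows. This is only an illustration under several assumptions: the five-letter alphabet with rounded integer masses in `TOY_MASSES`, the simplified score (number of measured peaks that have a predicted peak within `delta`, rather than a full alignment), and the add-one estimate of the p-value are all choices made here, not taken from the talk.

```python
import random

# Assumed toy alphabet with rounded integer residue masses; K is the cleavage character.
TOY_MASSES = {"A": 71, "G": 57, "S": 87, "T": 101, "K": 128}


def fragment_masses(sequence: str, cleavage_char: str = "K") -> list[int]:
    """Masses of the fragments obtained by cutting after each cleavage character."""
    masses, current = [], 0
    for residue in sequence:
        current += TOY_MASSES[residue]
        if residue == cleavage_char:
            masses.append(current)
            current = 0
    if current:
        masses.append(current)
    return masses


def count_matches(measured: list[int], predicted: list[int], delta: int) -> int:
    """Simplified peak counting: measured peaks with a predicted peak within delta."""
    return sum(1 for peak in measured if any(abs(peak - p) <= delta for p in predicted))


def sampled_p_value(measured, observed_score, length, num_samples, delta=0, seed=0):
    """Estimate P(score >= observed_score) by scoring random sequences."""
    rng = random.Random(seed)
    alphabet = sorted(TOY_MASSES)
    hits = 0
    for _ in range(num_samples):
        sequence = "".join(rng.choice(alphabet) for _ in range(length))
        if count_matches(measured, fragment_masses(sequence), delta) >= observed_score:
            hits += 1
    # Add-one estimate so that the p-value is never exactly zero.
    return (hits + 1) / (num_samples + 1)


print(sampled_p_value([199, 286], observed_score=2, length=20, num_samples=1000))
```

Every p-value requires many scored samples, and small p-values require very many, which is why the slide calls this approach slow.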
Score distribution
Claim: In most useful cases, the score distribution for a fixed string length can be well approximated by a normal distribution and is then determined by its expectation and variance. Missing and additional scores are usually very small compared to match scores.
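If the normal approximation holds, a p-value follows directly from the expectation and variance. The sketch below assumes that the p-value of interest is the upper tail probability $P(\text{score} \ge s)$; the function name `normal_p_value` is hypothetical.

```python
import math


def normal_p_value(score: float, mean: float, variance: float) -> float:
    """Upper-tail probability P(X >= score) for X ~ Normal(mean, variance)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    z = (score - mean) / math.sqrt(variance)
    # 1 - Phi(z), written with the complementary error function for accuracy in the tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(normal_p_value(score=1.96, mean=0.0, variance=1.0))
```

Using `math.erfc` instead of `1 - Phi(z)` avoids cancellation when the p-value is very small.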
Computing moments
Main Idea: The probability of a peak corresponds to the probability of a fragment of the same mass in the peptide.
◮ Discretize masses by scaling and rounding
◮ Compute the probability that a fragment of length $l$ has mass $\neq m$
◮ Compute the probability that a string of length $L$ has no fragment of peak mass $m$
◮ Can all be done in preprocessing
◮ Estimate moments
◮ Compute p-value
Fragment probability
Weighted alphabet: We call the tuple $(\Sigma, \mu)$ with mass function $\mu : \Sigma \to \mathbb{N}$ an (integer) weighted alphabet. Define $\mu(s) := \sum_{k=1}^{|s|} \mu(s_k)$.
Fragments: Let $x$ be the cleavage character and $\Sigma_x = \Sigma \setminus \{x\}$. The number of fragments of length $l$ with mass $m$ is then given by
$$c[l, m] = \sum_{\sigma \in \Sigma_x,\ \mu(\sigma) \le m} c[l - 1, m - \mu(\sigma)]$$
and for a uniform character distribution we get the probability that a fragment of length $l$ does not have mass $m$:
$$r[l, m] = 1 - \frac{c[l, m]}{|\Sigma_x|^l}$$
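The recurrence for $c[l, m]$ and the probability $r[l, m]$ can be tabulated with a short dynamic program. The slide does not state the base case, so the sketch assumes $c[0, 0] = 1$ and $c[0, m] = 0$ for $m > 0$ (the empty fragment has mass 0). The function names and the two-letter toy alphabet with masses 1 and 2 are illustrative.

```python
def fragment_counts(masses: list[int], max_len: int, max_mass: int) -> list[list[int]]:
    """c[l][m]: number of strings over Sigma_x of length l with total mass m.

    masses holds mu(sigma) for every character sigma in Sigma_x.
    """
    c = [[0] * (max_mass + 1) for _ in range(max_len + 1)]
    c[0][0] = 1  # assumed base case: the empty string has mass 0
    for l in range(1, max_len + 1):
        for m in range(max_mass + 1):
            # Append one character sigma with mu(sigma) <= m to a shorter string.
            c[l][m] = sum(c[l - 1][m - mu] for mu in masses if mu <= m)
    return c


def prob_mass_differs(masses: list[int], max_len: int, max_mass: int) -> list[list[float]]:
    """r[l][m] = 1 - c[l][m] / |Sigma_x|^l for uniformly distributed characters."""
    c = fragment_counts(masses, max_len, max_mass)
    k = len(masses)
    return [
        [1.0 - c[l][m] / k**l for m in range(max_mass + 1)]
        for l in range(max_len + 1)
    ]


# Toy alphabet Sigma_x = {a, b} with mu(a) = 1 and mu(b) = 2.
c = fragment_counts([1, 2], max_len=3, max_mass=4)
r = prob_mass_differs([1, 2], max_len=3, max_mass=4)
print(c[2][3], r[2][3])  # 2 0.5
```

In the toy alphabet, the strings of length 2 with mass 3 are `ab` and `ba`, so $c[2, 3] = 2$ and $r[2, 3] = 1 - 2/4$.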
Probability in Strings
Main idea: We compute the probability $p[L, m]$ that a string of length $L$ has NO fragment of mass $m$. Then the very first fragment must not have mass $m$, and the remaining string must have no fragment of mass $m$. Iterate.
[Figure: the string G T D S T N K D M K A S T A K A K Q I T, with the first cleavage site at position $l$]
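One way to turn this idea into a recurrence is sketched below. The slide does not give the formula, so this is a reconstruction under stated assumptions rather than the authors' exact method: characters are independent and uniform over $\Sigma$, a fragment's mass counts only its non-cleavage characters, the target mass $m$ is positive, and $p[0, m] = 1$. The sketch conditions on the position $l$ of the first cleavage character:

$$p[L, m] = \left(\tfrac{k}{n}\right)^{L} r[L, m] + \sum_{l=1}^{L} \left(\tfrac{k}{n}\right)^{l-1} \tfrac{1}{n}\, r[l-1, m]\; p[L-l, m], \qquad k = |\Sigma_x|,\ n = |\Sigma|.$$

```python
def prob_no_fragment(masses: list[int], target_mass: int, max_len: int) -> list[float]:
    """p[L] for L = 0..max_len: probability that a uniform random string of
    length L over Sigma has no fragment of mass target_mass.

    masses holds mu(sigma) for the characters of Sigma_x; the cleavage character
    is the one additional character of Sigma (assumed not to count toward mass).
    """
    if target_mass <= 0:
        raise ValueError("target_mass must be positive")
    k = len(masses)  # |Sigma_x|
    n = k + 1        # |Sigma|
    # c[l][m]: number of strings over Sigma_x of length l with mass m.
    c = [[0] * (target_mass + 1) for _ in range(max_len + 1)]
    c[0][0] = 1
    for l in range(1, max_len + 1):
        for m in range(target_mass + 1):
            c[l][m] = sum(c[l - 1][m - mu] for mu in masses if mu <= m)
    # r[l]: probability that l uniform characters from Sigma_x do not sum to target_mass.
    r = [1.0 - c[l][target_mass] / k**l for l in range(max_len + 1)]

    p = [1.0] * (max_len + 1)  # p[0] = 1: the empty string has no fragments
    for L in range(1, max_len + 1):
        # Case 1: no cleavage character at all, so the whole string is one fragment.
        total = (k / n) ** L * r[L]
        # Case 2: the first cleavage character is at position l.
        for l in range(1, L + 1):
            first_site = (k / n) ** (l - 1) * (1 / n)
            total += first_site * r[l - 1] * p[L - l]
        p[L] = total
    return p


# Toy alphabet: Sigma_x = {a, b} with masses 1 and 2, plus the cleavage character.
p = prob_no_fragment([1, 2], target_mass=3, max_len=3)
print(p)
```

For the toy alphabet, exactly 2 of the 9 strings of length 2 contain a fragment of mass 3 (`ab` and `ba`), which agrees with $p[2, 3] = 7/9$.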