CSE182-L11 Protein sequencing and Mass Spectrometry CSE182 Course - PowerPoint PPT Presentation


Course Summary
• Sequence Comparison (BLAST & other tools)
• Protein Motifs:
  – Profiles/Regular Expression/HMMs
• Discovering protein coding genes
  – Gene finding HMMs
  – DNA signals (splice signals)
• How is the genomic sequence itself obtained?
  – LW statistics
  – Sequencing and assembly
• Next topic: the dynamic aspects of the cell
[Figure labels: Gene finding, ESTs, Protein sequence analysis]

The Dynamic nature of the cell
• The molecules in the body (RNA and proteins) are constantly turning over.
  – New ones are ‘created’ through transcription and translation
  – Proteins are modified post-translationally
  – ‘Old’ molecules are degraded

Dynamic aspects of cellular function
• Expressed transcripts
  – Microarrays to ‘count’ the number of copies of RNA
• Expressed proteins
  – Mass spectrometry is used to ‘count’ the number of copies of a protein sequence.
• Protein-protein interactions (protein networks)
• Protein-DNA interactions
• Population studies

The peptide backbone
The peptide backbone breaks to form fragments with characteristic masses.
N-terminus  H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
               [AA residue i-1]   [AA residue i]  [AA residue i+1]

Mass Spectrometry

Nobel citation ’02

The promise of mass spectrometry
• Mass spectrometry is coming of age as the tool of choice for proteomics
  – Protein sequencing, networks, quantitation, interactions, structure…
• Computation has a big role to play in the interpretation of MS data.
• We will discuss algorithms for
  – Sequencing, Modifications, Interactions…

Sample Preparation
Enzymatic Digestion (Trypsin) + Fractionation

Single Stage MS
Mass Spectrometry
LC-MS: 1 MS spectrum / second

Tandem MS
Secondary Fragmentation
Ionized parent peptide


Ionization
The peptide backbone breaks to form fragments with characteristic masses.
Ionized parent peptide (H+ added):
N-terminus  H+ H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
                  [AA residue i-1]   [AA residue i]  [AA residue i+1]

Fragment ion generation
The peptide backbone breaks to form fragments with characteristic masses.
Ionized peptide fragment (backbone broken between CO of residue i-1 and NH of residue i):
N-terminus  H+ H…-HN-CH(R_{i-1})-CO    NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
                  [AA residue i-1]     [AA residue i]  [AA residue i+1]

Tandem MS for Peptide ID (November 09)
b ions:   88    145   292   405   534   663   778   907   1020  1166
residue:  S     G     F     L     E     E     D     E     L     K
y ions:   1166  1080  1022  875   762   633   504   389   260   147
[Spectrum: % Intensity (0 to 100) vs. m/z (250 to 1000); the precursor peak is labelled [M+2H]2+]

Peak Assignment
b ions:   88    145   292   405   534   663   778   907   1020  1166
residue:  S     G     F     L     E     E     D     E     L     K
y ions:   1166  1080  1022  875   762   633   504   389   260   147
Peak assignment implies Sequence (Residue tag) Reconstruction!
[Spectrum: % Intensity (0 to 100) vs. m/z (250 to 1000); labelled peaks: b3 to b9, y2 to y9, and [M+2H]2+]

Database Searching for peptide ID
• For every peptide from a database
  – Generate a hypothetical spectrum
  – Compute a correlation between the hypothetical and experimental spectra
  – Choose the best
• Database searching is very powerful and is the de facto standard for MS.
  – Sequest, Mascot, and many others
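The search loop above can be sketched in a few lines of Python. This is only an illustration, not how Sequest or Mascot score: the "correlation" is replaced by a simple shared-peak count, the hypothetical spectrum contains only singly charged b- and y-ions (using the offsets b = P + 1 and y = S + 19 from the ion-offset slide), and `RESIDUE_MASS`, `theoretical_peaks`, `shared_peak_count`, and `best_match` are names made up for this sketch.

```python
# Monoisotopic residue masses (Da); only the residues needed for the example.
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02146, "F": 147.06841, "L": 113.08406,
    "E": 129.04259, "D": 115.02694, "K": 128.09496,
}

def theoretical_peaks(peptide):
    """Hypothetical spectrum: singly charged b- and y-ions only (assumption)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks = []
    prefix = 0.0
    for m in masses[:-1]:                     # every internal cleavage site
        prefix += m
        peaks.append(prefix + 1.0)            # b-ion = prefix mass + 1
        peaks.append(total - prefix + 19.0)   # y-ion = suffix mass + 19
    return peaks

def shared_peak_count(observed, peptide, tol=0.5):
    """Number of hypothetical peaks that have an observed peak within tol."""
    return sum(
        1 for t in theoretical_peaks(peptide)
        if any(abs(t - o) <= tol for o in observed)
    )

def best_match(observed, database, tol=0.5):
    """Return the database peptide with the highest shared-peak count."""
    return max(database, key=lambda pep: shared_peak_count(observed, pep, tol))
```

Real tools use more careful scoring (intensities, noise models, significance estimates); the shared-peak count is just the simplest stand-in that makes the generate, compare, choose structure visible.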

Spectra: the real story
• Noise Peaks
• Ions, not prefixes & suffixes
• Mass to charge ratio, and not mass
  – Multiply charged ions
• Isotope patterns, not single peaks

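Peptide fragmentation possibilities (ion types)
[Figure: backbone -HN-CH-CO-NH-CH-CO-NH- with side chain R_i on residue i and the side chain of residue i+1 drawn as CH-R’R”. Cleavage sites are labelled with N-terminal fragments a_i, b_i, b_{i+1}, c_i, d_{i+1} and C-terminal fragments x_{n-i}, y_{n-i}, y_{n-i-1}, v_{n-i}, w_{n-i}, z_{n-i}, grouped into low energy fragments and high energy fragments.]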

Ion types, and offsets
• P = prefix residue mass
• S = suffix residue mass
• b-ions = P+1
• y-ions = S+19
• a-ions = P-27
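As a rough illustration of these offsets, the sketch below builds the b-, y-, and a-ion ladders for the peptide SGFLEEDELK used in the spectrum slides. The residue masses are standard monoisotopic values; the function name `ion_ladders` and the dictionary layout are assumptions made for this example, and the integer offsets (+1, +19, -27) are the slide's approximations rather than exact proton and water masses.

```python
# Monoisotopic residue masses (Da); only the residues needed for the example.
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02146, "F": 147.06841, "L": 113.08406,
    "E": 129.04259, "D": 115.02694, "K": 128.09496,
}

def ion_ladders(peptide):
    """Singly charged fragment masses for each internal cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ladders = {"b": [], "y": [], "a": []}
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m                           # P: prefix residue mass
        suffix = total - prefix               # S: suffix residue mass
        ladders["b"].append(prefix + 1)       # b-ion = P + 1
        ladders["a"].append(prefix - 27)      # a-ion = P - 27
        ladders["y"].append(suffix + 19)      # y-ion = S + 19
    ladders["y"].reverse()                    # index 0 now holds y1
    return ladders

ladders = ion_ladders("SGFLEEDELK")
```

With these masses, `ladders["b"][0]` is about 88.03 and `ladders["y"][0]` is about 147.09, which agrees with the first b-ion (88) and last y-ion (147) listed on the spectrum slide.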

Mass-Charge ratio
• The X-axis is not mass, but (M+Z)/Z
  – Z=1 implies that the peak is at M+1
  – Z=2 implies that the peak is at (M+2)/2
• M=1000, Z=2: the peak position is at 501
• Quiz: Suppose you see a peak at 501. Is the mass 500, or is it 1000?
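The quiz can be explored with a small sketch. It follows the slide's simplification that each charge adds exactly 1 Da; `peak_position` and `candidate_masses` are hypothetical helper names.

```python
def peak_position(mass, charge):
    """Position on the m/z axis: (M + Z) / Z, with a proton taken as 1 Da."""
    return (mass + charge) / charge

def candidate_masses(mz, max_charge=3):
    """Invert (M + Z) / Z for each plausible charge: M = mz * Z - Z."""
    return {z: mz * z - z for z in range(1, max_charge + 1)}

print(peak_position(1000, 2))   # 501.0
print(candidate_masses(501))    # {1: 500, 2: 1000, 3: 1500}
```

A single peak at 501 is therefore consistent with several masses; the charge has to be determined from other evidence, such as the spacing of the isotope peaks.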

Isotopic peaks
• Ex: Consider peptide SAM
• Mass = 308.12802
• You should see: [Figure: a single peak at 308.13]
• Instead, you see: [Figure: a cluster of peaks spanning 308.13 to 310.13]

Isotopes
• C-12 is the most common. Suppose C-13 occurs with probability 1%
• EX: SAM
  – Composition: C11 H22 N3 O5 S1
• What is the probability that you will see a single C-13?
  $\binom{11}{1} \cdot 0.01 \cdot (0.99)^{10}$
• Note that C, S, O, N all have isotopes. Can you compute the isotopic distribution?
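The single-C-13 probability is a binomial term, which can be checked numerically. The sketch below uses the slide's assumed 1% C-13 frequency; `prob_k_heavy` is a name chosen for this example.

```python
from math import comb

def prob_k_heavy(n_atoms, k, p_heavy):
    """Binomial probability that exactly k of n_atoms are the heavy isotope."""
    return comb(n_atoms, k) * p_heavy**k * (1 - p_heavy) ** (n_atoms - k)

# SAM has 11 carbon atoms; probability that exactly one of them is C-13.
p_one_c13 = prob_k_heavy(11, 1, 0.01)
print(round(p_one_c13, 4))  # 0.0995
```

So roughly one SAM molecule in ten carries exactly one C-13, which is why the peak 1 Da above the lightest mass is clearly visible.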

All atoms have isotopes
• Isotopes of atoms
  – O-16,18, C-12,13, S-32,34…
  – Each isotope has a frequency of occurrence
• If a molecule (peptide) has a single copy of C-13, that will shift its peak by 1 Da
• With multiple copies of a peptide, we have a distribution of intensities over a range of masses (isotopic profile).
• How can you compute the isotopic profile of a peak?

Isotope Calculation
• Denote:
  – N_C: number of carbon atoms in the peptide
  – p_C: probability of occurrence of C-13 (~1%)
  – Then
    $\Pr[\text{Peak at } M] = \binom{N_C}{0} p_C^{0} (1 - p_C)^{N_C}$
    $\Pr[\text{Peak at } M+1] = \binom{N_C}{1} p_C^{1} (1 - p_C)^{N_C - 1}$
[Figure: isotopic profiles for N_C = 50 and N_C = 200, with the +1 peak marked]

Isotope Calculation Example
• Suppose we consider Nitrogen and Carbon
• N_N: number of Nitrogen atoms
• p_N: probability of occurrence of N-15
• Pr(peak at M)
• Pr(peak at M+1)?
• Pr(peak at M+2)?
  $\Pr[\text{Peak at } M] = \binom{N_C}{0} p_C^{0} (1 - p_C)^{N_C} \binom{N_N}{0} p_N^{0} (1 - p_N)^{N_N}$
  $\Pr[\text{Peak at } M+1] = \binom{N_C}{1} p_C^{1} (1 - p_C)^{N_C - 1} \binom{N_N}{0} p_N^{0} (1 - p_N)^{N_N} + \binom{N_C}{0} p_C^{0} (1 - p_C)^{N_C} \binom{N_N}{1} p_N^{1} (1 - p_N)^{N_N - 1}$
How do we generalize? How can we handle Oxygen (O-16,18)?
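One way to answer the M+2 question, under the same assumptions as the two formulas above (only carbon and nitrogen are considered, and each has a single heavy isotope 1 Da above the light one), is to sum over the three ways of placing two extra neutrons: two C-13, one C-13 plus one N-15, or two N-15.

```latex
\Pr[\text{Peak at } M+2]
  = \binom{N_C}{2} p_C^{2} (1 - p_C)^{N_C - 2} \, (1 - p_N)^{N_N}
  + \binom{N_C}{1} p_C (1 - p_C)^{N_C - 1} \binom{N_N}{1} p_N (1 - p_N)^{N_N - 1}
  + (1 - p_C)^{N_C} \binom{N_N}{2} p_N^{2} (1 - p_N)^{N_N - 2}
```

The number of terms grows with the shift and with the number of elements, which is what motivates a more systematic formulation.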

General isotope computation
• Definition:
  – Let p_{i,a} be the abundance of the isotope of atom a with mass i Da above the least mass
  – Ex: p_{0,C}: abundance of C-12, p_{2,O}: abundance of O-18, etc.
• Characteristic polynomial
  $\phi(x) = \prod_a \left( p_{0,a} + p_{1,a} x + p_{2,a} x^2 + \cdots \right)^{N_a}$
• Prob{M+i}: coefficient of x^i in φ(x) (a binomial convolution)
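The characteristic polynomial can be expanded by repeated polynomial multiplication, one factor per atom. The following is a minimal sketch: the abundance values are rounded, illustrative numbers (C-13 at the slide's 1%; N-15 and O-18 approximate, O-17 ignored), and `ABUNDANCES`, `poly_mul`, and `isotopic_profile` are names invented for this example.

```python
# Illustrative isotope abundances: entry i is p_{i,a}, the isotope i Da
# above the lightest one. Rounded values, assumed for this sketch only.
ABUNDANCES = {
    "C": [0.99, 0.01],           # C-12, C-13 (1% as on the slide)
    "N": [0.996, 0.004],         # N-14, N-15 (approximate)
    "O": [0.998, 0.0, 0.002],    # O-16, (O-17 ignored), O-18 (approximate)
}

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (a convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def isotopic_profile(composition, abundances, max_shift=5):
    """Coefficients of phi(x): entry i is Pr[peak at M + i], for i <= max_shift."""
    profile = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            # Truncating is safe: low-order coefficients never depend on
            # higher-order ones.
            profile = poly_mul(profile, abundances[element])[: max_shift + 1]
    return profile

# Carbon and nitrogen of SAM only, to mirror the two-element example.
profile = isotopic_profile({"C": 11, "N": 3}, ABUNDANCES)
```

Multiplying one factor at a time keeps the code short; for large peptides one would normally raise each element's polynomial to the power N_a with repeated squaring or an FFT-based convolution instead.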

Isotopic Profile Application
• In DxMS, hydrogen atoms are exchanged with deuterium
• The rate of exchange indicates how buried the peptide is (in the folded state)
• Consider the observed characteristic polynomials of the isotope profile, φ_{t1}, φ_{t2}, at various time points. Then
  $\phi_{t_2}(x) = \phi_{t_1}(x) \left( p_{0,H} + p_{1,H} x \right)^{N_H}$
• The estimates of p_{1,H} can be obtained by a deconvolution
• Such estimates at various time points should give the rate of incorporation of deuterium and, therefore, the accessibility.

Quiz
• How can you determine the charge on a peptide?
• The difference between the first and second isotope peak is 1/Z
• Proposal:
  – Given a mass, predict a composition and the isotopic profile
  – Do a ‘goodness of fit’ test to isolate the peaks corresponding to the isotope
  – Compute the difference
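The last step of the proposal, reading the charge off the isotope spacing, might look like the sketch below. It assumes the isotope peaks have already been isolated (the goodness-of-fit step is not shown), and `charge_from_isotope_peaks` is a hypothetical name.

```python
def charge_from_isotope_peaks(mz_peaks, max_charge=4):
    """Pick the charge Z whose expected spacing 1/Z best matches the data.

    mz_peaks: m/z positions of at least two adjacent isotope peaks.
    """
    peaks = sorted(mz_peaks)
    spacing = peaks[1] - peaks[0]             # first vs. second isotope peak
    return min(range(1, max_charge + 1), key=lambda z: abs(spacing - 1.0 / z))

print(charge_from_isotope_peaks([501.0, 501.5, 502.0]))  # 2
```

Once Z is known, the peak at 501 in the earlier quiz is no longer ambiguous: a spacing of 0.5 means Z = 2 and a mass of 1000.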

Tandem MS summary
• The basics of peptide ID using tandem MS are simple.
  – Correlate experimental with theoretical spectra
• In practice, there might be many confounding problems.
  – Isotope peaks, noise peaks, varying charges, post-translational modifications, no database.
• Recall that we discussed how peptides could be identified by scanning a database.
• What if the database did not contain the peptide of interest?

De novo analysis basics
• Suppose all ions were prefix ions. Could you tell what the peptide was?
• Can post-translational modifications help?
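For the idealized case in the first question, where every peak is a prefix ion and the ladder is complete, the peptide can be read off from consecutive mass differences. The sketch below assumes noise-free prefix residue masses and a small residue table; `read_sequence` is a made-up name. Isoleucine is included to show that I and L share the same mass and cannot be told apart this way.

```python
# Monoisotopic residue masses (Da); a small table for this example only.
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02146, "F": 147.06841, "L": 113.08406,
    "I": 113.08406, "E": 129.04259, "D": 115.02694, "K": 128.09496,
}

def read_sequence(prefix_masses, tol=0.02):
    """Decode a complete ladder of prefix residue masses into a sequence."""
    sequence = []
    previous = 0.0
    for mass in sorted(prefix_masses):
        diff = mass - previous                # mass of the next residue
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
        if not matches:
            sequence.append("?")              # no residue explains this gap
        elif len(matches) == 1:
            sequence.append(matches[0])
        else:
            sequence.append("[" + "".join(matches) + "]")  # ambiguous residues
        previous = mass
    return "".join(sequence)

# Build the ideal prefix ladder for SGFLEEDELK and decode it again.
prefixes = []
running = 0.0
for aa in "SGFLEEDELK":
    running += RESIDUE_MASS[aa]
    prefixes.append(running)
decoded = read_sequence(prefixes)
print(decoded)  # SGF[LI]EEDE[LI]K
```

Real spectra mix prefix and suffix ions, miss peaks, and contain noise, so this direct read-out only works in the idealized setting the question describes.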
