CSE182-L7
Protein structure basics
Protein sequencing via MS
Quiz
• What research won the Nobel Prize in Chemistry in 2004?
• In 2002?
A structural view of proteins
CS view of a protein
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRYF
YNAKAGLCQTFVYGGCRAKRNNFKSAED
CMRTCGGAIGPWENL
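To a computer scientist, the record above is just a header line plus a string over the 20-letter amino-acid alphabet. The sketch below parses FASTA text into (header, sequence) pairs and counts residues. The function name `parse_fasta` is our own, and the header is assumed to follow the UniProt `sp|accession|name` layout shown on the slide.

```python
from collections import Counter

# The BPTI record from the slide: one header line, then sequence lines.
FASTA = """>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRYF
YNAKAGLCQTFVYGGCRAKRNNFKSAED
CMRTCGGAIGPWENL"""


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


header, seq = parse_fasta(FASTA)[0]
accession = header.split("|")[1]  # "P00974" for UniProt-style headers
composition = Counter(seq)        # residue letter -> count
print(accession, len(seq), composition["C"])
```

Once the sequence is a plain string, string algorithms (pattern matching, alignment) apply directly.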
Protein structure basics
Side chains determine amino-acid type
• The residues may have different properties.
• Aspartic acid (D) and glutamic acid (E) are acidic residues.
Bond angles form structural constraints
Various constraints determine 3D structure
• Constraints:
  • Structural constraints due to physicochemical properties
  • Constraints due to bond angles
  • H-bond formation
• Surprisingly, a few conformations are seen over and over again.
Alpha-helix
• 3.6 residues per turn.
• H-bonds between the C=O of residue i and the N-H of residue i+4 stabilize the structure.
• First proposed by Linus Pauling.
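The 3.6 residues per turn fix the rotation per residue at 360/3.6 = 100 degrees. A minimal sketch of the resulting helical-wheel geometry, assuming the standard i to i+4 backbone H-bond pattern; `wheel_angle` and `hbond_pairs` are illustrative names, not from the slides.

```python
RESIDUES_PER_TURN = 3.6                       # from the slide
DEG_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN   # 100 degrees of rotation per residue


def wheel_angle(i):
    """Angle in degrees [0, 360) of residue i (1-based) on a helical wheel."""
    # Round before the modulo so float error (e.g. 1799.9999...) does not yield 360.
    return round((i - 1) * DEG_PER_RESIDUE, 6) % 360.0


def hbond_pairs(n):
    """(i, i+4) pairs: C=O of residue i bonds to N-H of residue i+4, for an n-residue helix."""
    return [(i, i + 4) for i in range(1, n - 3)]


for i in range(1, 8):
    print(i, wheel_angle(i))
print(hbond_pairs(8))
```

Residues i and i+4 end up only 40 degrees apart on the wheel, i.e. nearly stacked on the same face, which is why the backbone H-bond between them fits.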
Beta-sheet
• Each strand by itself has 2 residues per turn, and is not stable.
• Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel.
• Beta-sheets have long-range interactions that stabilize the structure, while alpha-helices have local interactions.
Domains
• The basic structures (helix, strand, loop) combine to form complex 3D structures.
• Certain combinations are popular: many sequences, but only a few folds.
3D structure
• Predicting tertiary structure is an important problem in bioinformatics.
• Premise: clues to structure can be found in the sequence.
• While de novo tertiary structure prediction is hard, there are many intermediate, tractable goals.
PDB
• The PDB (Protein Data Bank) database is a compendium of structures.
Protein Domains
• An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.
• Example: the zinc finger domain is a DNA-binding domain.
• What is a domain?
  • Part of a sequence that can fold independently, and is present in other sequences as well.
Proteins containing zf domains
• How can we find a motif corresponding to a zf (zinc finger) domain?
Domain review
• What is a domain?
• How are domains represented?
  • Motifs (regular expressions & others)
  • Multiple alignments
  • Profiles
  • Profile HMMs
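A profile turns a multiple alignment into a position-specific scoring matrix (PSSM): one log-odds score per residue per column. The sketch below assumes a uniform 1/20 background, a pseudocount of 1, gap-free input, and a hypothetical toy alignment (not a real domain family). Profile HMMs extend this with insert and delete states, which are not shown.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Per-column log2-odds scores from equal-length, gap-free aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum of column scores for a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start_index)."""
    width = len(profile)
    return max((score(profile, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


alignment = ["CAHC", "CSHC", "CGHC", "CAHC"]  # hypothetical toy alignment
profile = build_profile(alignment)
print(best_hit(profile, "MKLCAHCPRT"))
```

Unlike a regular expression, the profile gives graded scores, so a segment that deviates at one column can still score well.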
Protein Domain databases
• PROSITE (motifs): regular expressions & profiles. http://us.expasy.org/prosite/
• BLOCKS: multiple alignments
• Pfam: HMMs. http://www.sanger.ac.uk/Software/Pfam/
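PROSITE patterns map almost one-to-one onto ordinary regular expressions, which answers the earlier question of how to find a zf motif. A minimal translator, assuming only the common syntax (`x`, `[..]`, `{..}`, `(n)` / `(n,m)`, `<`, `>`); the zinc finger pattern below is the C2H2 pattern as we recall it from PROSITE (PS00028) and should be checked against the current entry. The test sequence is hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = element.startswith("<")   # N-terminal anchor
        end_anchor = element.endswith(">")       # C-terminal anchor
        element = element.strip("<>")
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)  # e.g. x(2,4)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            token = "."                            # any residue
        elif element.startswith("{"):
            token = "[^" + element[1:-1] + "]"     # forbidden residues
        else:
            token = element                        # single residue or [..] set
        parts.append(("^" if start_anchor else "") + token + repeat
                     + ("$" if end_anchor else ""))
    return "".join(parts)


ZF_C2H2 = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"  # assumed PS00028 pattern
regex = prosite_to_regex(ZF_C2H2)
print(regex)
match = re.search(regex, "YKCPECGKSFSRSDELTRHIRIHTGEK")  # hypothetical sequence
print(match.group(0) if match else None)
```

Rare syntax (e.g. `>` inside brackets) is not handled; the point is that a regular expression is a crisp yes/no domain description.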
How are Proteins Sequenced?
Mass Spec 101
Nobel Citation, 2002
Mass Spectrometry
Sample Preparation
• Enzymatic digestion (trypsin) + fractionation
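The digestion step can be simulated in silico. A sketch assuming the common textbook trypsin rule (cleave after K or R, but not when the next residue is P); real digests also contain missed cleavages, which this ignores. `trypsin_digest` is a hypothetical helper name.

```python
def trypsin_digest(protein):
    """In-silico tryptic digest: cut after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


print(trypsin_digest("SGFLEEDELKAKPGRWT"))
```

Because trypsin leaves K or R at the C-terminus of most peptides, the y-ion series of each peptide starts from a predictable residue.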
Single Stage MS
• LC-MS: 1 MS spectrum / second
Tandem MS
• Secondary fragmentation of the ionized parent peptide
The peptide backbone
• The peptide backbone breaks to form fragments with characteristic masses.
  N-terminus  H-...-NH-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH  C-terminus
  (amino-acid residues i-1, i, i+1)
Ionization
• Ionized parent peptide (a proton, H+, is added):
  H+  H-...-NH-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH
  (N-terminus on the left, C-terminus on the right; residues i-1, i, i+1)
Fragment ion generation
• The backbone breaks between residues i-1 and i, giving an ionized peptide fragment:
  H+  H-...-NH-CH(R_{i-1})-CO | NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH
Tandem MS for Peptide ID
Peptide SGFLEEDELK, doubly charged precursor [M+2H]2+:
  residue:  S     G     F     L     E     E     D     E     L     K
  b ions:   88    145   292   405   534   663   778   907   1020  1166
  y ions:   1166  1080  1022  875   762   633   504   389   260   147
Figure: MS/MS spectrum, % intensity (0 to 100) vs. m/z (0 to 1000).
Peak Assignment
• Same b/y ion table for SGFLEEDELK as on the previous slide.
Figure: the spectrum with assigned peaks labeled b3 to b9 and y2 to y9; % intensity vs. m/z; precursor [M+2H]2+.
• Peak assignment implies sequence (residue tag) reconstruction!
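Why does peak assignment give the sequence? Consecutive ions in one series differ by exactly one residue mass. The sketch reads the gaps between the b1 to b9 values of the SGFLEEDELK table and looks them up in a table of nominal (integer) residue masses; the table is standard, while `sequence_tag` is a hypothetical helper.

```python
# Nominal residue masses; I/L and K/Q are indistinguishable at this resolution.
NOMINAL_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "I/L", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}


def sequence_tag(ladder):
    """Residues implied by the mass gaps between consecutive ions of one series."""
    peaks = sorted(ladder)
    return [NOMINAL_MASS.get(round(b - a), "?") for a, b in zip(peaks, peaks[1:])]


b_ions = [88, 145, 292, 405, 534, 663, 778, 907, 1020]  # b1..b9 from the table
print(sequence_tag(b_ions))
```

The first residue follows from b1 itself (88 - 1 = 87, i.e. S). In real spectra, noise peaks and missing ions make this a search over candidate paths rather than a single pass.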
Database Searching for peptide ID
• For every peptide from a database:
  • Generate a hypothetical spectrum.
  • Compute a correlation between the observed (experimental) spectrum and the hypothetical spectrum.
• Choose the best-scoring peptide.
• Database searching is very powerful and is the de facto standard for MS.
  • Sequest, Mascot, and many others
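The search loop can be sketched in a few lines. This is a simplified stand-in, not Sequest's or Mascot's scoring: it uses a shared-peak count instead of a true correlation, nominal residue masses, and singly charged b and y ions with the +1 / +19 offsets. The observed peak list and the candidate database are hypothetical.

```python
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def hypothetical_spectrum(peptide):
    """Nominal b ions (prefix + 1) and y ions (suffix + 19) for a peptide."""
    masses = [NOMINAL[aa] for aa in peptide]
    peaks = set()
    for k in range(1, len(masses)):
        peaks.add(sum(masses[:k]) + 1)   # b_k
        peaks.add(sum(masses[k:]) + 19)  # y_(n-k)
    return peaks


def shared_peaks(observed, theoretical, tol=0.5):
    """Number of observed peaks within tol of some theoretical peak."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def search(observed, database):
    """Score every candidate peptide; return the best (score, peptide)."""
    return max((shared_peaks(observed, hypothetical_spectrum(p)), p) for p in database)


observed = [88.0, 145.1, 292.1, 405.2, 534.2, 147.1, 260.2, 389.2, 504.2, 633.3, 311.0]
database = ["SGFLEEDELK", "AGFLEEDELK", "SGFLEEDELR", "PEPTIDEK"]
print(search(observed, database))
```

Note that I/L isomers (e.g. SGFLEEDEIK) would tie with SGFLEEDELK under any mass-based score.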
Spectra: the real story
• Noise peaks
• Ions, not prefixes & suffixes
• Mass-to-charge ratio, not mass
• Multiply charged ions
• Isotope patterns, not single peaks
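Peptide fragmentation possibilities (ion types)
Figure: cleavage sites along the backbone -HN-CH(R_i)-CO-NH-CH(R_{i+1})-CO-NH-. N-terminal fragments a_i, b_i, c_i (and d_{i+1}); C-terminal fragments x_{n-i}, y_{n-i}, z_{n-i} (and v_{n-i}, w_{n-i}); the figure distinguishes low-energy from high-energy fragments.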
Ion types, and offsets
• P = prefix residue mass
• S = suffix residue mass
• b-ions = P+1
• y-ions = S+19
• a-ions = P-27
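Applying these offsets to the SGFLEEDELK example reproduces the b and y rows of the earlier table. A sketch using standard monoisotopic residue masses (only the seven residues this peptide needs) and the slide's nominal offsets; `fragment_ions` is an illustrative name.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def fragment_ions(peptide):
    """(b, y, a) ion masses using the offsets b = P+1, y = S+19, a = P-27."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    prefixes = [sum(masses[:k]) for k in range(1, n)]      # P for b1..b(n-1)
    suffixes = [sum(masses[n - k:]) for k in range(1, n)]  # S for y1..y(n-1)
    b = [p + 1 for p in prefixes]
    y = [s + 19 for s in suffixes]
    a = [p - 27 for p in prefixes]
    return b, y, a


b, y, a = fragment_ions("SGFLEEDELK")
print([round(m) for m in b])
print([round(m) for m in y])
```

The last column of the table (1166) is the whole peptide: total residue mass 1147.54 plus 19, i.e. the singly charged precursor.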
Mass-Charge ratio
• The X-axis is (M+Z)/Z.
  • Z=1 implies that the peak is at M+1.
  • Z=2 implies that the peak is at (M+2)/2.
  • M=1000, Z=2: the peak position is at 501.
• Suppose you see a peak at 501. Is the mass 500, or is it 1000?
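The ambiguity is easy to see numerically: the same m/z corresponds to different masses for different charges. One common way to pick Z is the isotope spacing: isotope peaks differ by about 1 Da in mass, hence about 1/Z in m/z. A sketch with a nominal proton mass of 1 (as on the slide) and a nominal 1 Da isotope shift (the true 13C shift is about 1.0034 Da); the function names are hypothetical.

```python
PROTON = 1.0  # nominal proton mass, as on the slide


def mz(mass, z):
    """Peak position (M + z) / z for a neutral mass M carrying z protons."""
    return (mass + z * PROTON) / z


def neutral_mass(peak, z):
    """Invert mz(): the neutral mass implied by a peak for charge z."""
    return peak * z - z * PROTON


def charge_from_isotope_spacing(spacing, isotope_shift=1.0):
    """Isotope peaks ~1 Da apart in mass sit ~1/z apart in m/z."""
    return round(isotope_shift / spacing)


print(mz(1000, 2))                                  # 501.0
print(neutral_mass(501, 1), neutral_mass(501, 2))   # 500.0 1000.0
print(charge_from_isotope_spacing(0.5))             # 2
```

So a peak at 501 whose isotope neighbors sit 0.5 apart points to Z=2 and M=1000.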
Isotopic peaks
• Ex: consider the peptide SAM.
• Mass = 308.12802
• You should see: 308.13
• Instead, you see a cluster of peaks from 308.13 to 310.13.
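The value 308.12802 is a monoisotopic mass: every atom is taken as its most abundant isotope. A sketch that recomputes it from the SAM composition C11 H22 N3 O5 S1 given for this example, using standard monoisotopic element masses; `monoisotopic_mass` is an illustrative name.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}


def monoisotopic_mass(composition):
    """Sum over elements of count times monoisotopic element mass."""
    return sum(MONO[element] * count for element, count in composition.items())


sam = {"C": 11, "H": 22, "N": 3, "O": 5, "S": 1}
print(round(monoisotopic_mass(sam), 5))  # 308.12802
```

Any heavier isotope anywhere in the molecule shifts the peak up by roughly 1 or 2 Da, which produces the extra peaks.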
Isotopes
• C-12 is the most common. Suppose C-13 occurs with probability 1%.
• Ex: SAM
  • Composition: C11 H22 N3 O5 S1
  • What is the probability that you will see a single C-13?
    $\binom{11}{1}\,(0.01)^{1}\,(0.99)^{10} \approx 0.0995$
• Note that C, S, O, and N all have isotopes. Can you compute the isotopic distribution?
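One way to answer the closing question: treat each atom's extra neutrons as an independent random shift and convolve the per-atom distributions. The sketch assumes approximate natural abundances (values as we recall them from standard tables; check before relying on them) and nominal +1 Da steps, so it gives peak intensities at M, M+1, M+2, ..., not exact masses. It also evaluates the single-C-13 binomial with p = 0.01.

```python
from math import comb

# Approximate natural abundances, indexed by extra neutrons (+0, +1, +2, ...).
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(p, q):
    """Distribution of the sum of two independent nominal mass shifts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def isotope_distribution(composition, peaks=4):
    """Relative intensities of the M, M+1, M+2, ... peaks for a composition."""
    dist = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            dist = convolve(dist, ABUNDANCE[element])
    return dist[:peaks]


sam = {"C": 11, "H": 22, "N": 3, "O": 5, "S": 1}
print([round(p, 3) for p in isotope_distribution(sam)])

# The slide's question: exactly one C-13 among 11 carbons, with p = 0.01.
print(round(comb(11, 1) * 0.01 * 0.99 ** 10, 4))  # 0.0995
```

In this sketch the M+2 peak is clearly visible, mostly from sulfur-34, which is consistent with the SAM cluster reaching about 310.13.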