Statistical modeling in molecular medicine: proteomics Anna Gambin - PowerPoint PPT Presentation

Statistical modeling in molecular medicine: proteomics Anna Gambin Institute of Informatics, University of Warsaw

Outline
• mass spec basics
• modeling isotopic distribution
• modeling exopeptidase activity
• incorporating MEROPS data
• peptidase activity in time
• modeling electron transfer dissociation
• deconvolution of spectra
• modeling fragmentation

Mass Spectrometry data source: Center for Proteomics, Proteins Antwerp, Belgium

Identifying proteins is complicated
• there are plenty of proteins in a sample
• proteins are frequently fragmented
• even a single protein has a complicated signal

Chemical compounds are made of different isotopes: isotopic envelope

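Huge number of isotopologues for a compound $\mathrm{C}_c\mathrm{H}_h\mathrm{N}_n\mathrm{O}_o\mathrm{S}_s$, where $n_e$ is the number of atoms of element $e$ and $i_e$ the number of its isotopes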

Important observation: some isotopic variants are more probable than others

Assume
1) variants of isotopes of atoms are independent
2) elements vary in abundances of isotopes
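One common reading of these two assumptions is that the probability of an isotopic variant factorises into one multinomial term per element. The sketch below illustrates that reading only; the abundance table holds approximate natural abundances chosen for illustration, and the names `ABUNDANCES`, `log_multinomial` and `isotopologue_probability` are hypothetical, not taken from the slides.

```python
from math import exp, lgamma, log

# Approximate natural isotope abundances, lightest isotope first
# (illustrative values, not taken from the slides).
ABUNDANCES = {
    "C": (0.9893, 0.0107),
    "H": (0.999885, 0.000115),
    "O": (0.99757, 0.00038, 0.00205),
}


def log_multinomial(counts, probs):
    """Log-probability of isotope counts for the atoms of one element."""
    n = sum(counts)
    result = lgamma(n + 1)
    for count, prob in zip(counts, probs):
        result += count * log(prob) - lgamma(count + 1)
    return result


def isotopologue_probability(variant):
    """Product over elements of independent multinomial probabilities.

    `variant` maps an element symbol to its tuple of isotope counts.
    """
    return exp(sum(log_multinomial(counts, ABUNDANCES[element])
                   for element, counts in variant.items()))


# Water with only the lightest isotopes: two 1H atoms and one 16O atom.
light_water = {"H": (2, 0), "O": (1, 0, 0)}
p_light_water = isotopologue_probability(light_water)
```

Working in log space keeps the computation stable for large molecules, where individual factorials overflow.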

$o_0 + o_1 + o_2 = 200$ (counts of the three oxygen isotopes among 200 oxygen atoms)

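How much do we gain by considering the smallest set with a fixed probability $p$?

$$\text{size of the smallest set} \;\approx\; C_{lattice}\,\frac{\pi^{k/2}}{\Gamma(k/2+1)}\left(\sqrt{q_{\chi^2(k)}(p)}\right)^{k}\sqrt{\det\Delta}\prod_{e \in \text{Elements}} n_e^{(i_e-1)/2} \;\propto\; \prod_{e \in \text{Elements}} n_e^{(i_e-1)/2}$$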

To get the smallest set with probability P:
1) Find the most probable variant
2) While Total Probability < P: get the next layer, i.e. the variants v with p > P(v) >= qp, where p = P(v_min) for the least probable variant v_min of the previous layer
3) Trim the least probable variants from the last layer so that Total Probability >= P still holds
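The idea of growing the set outwards from the most probable variant can be sketched for a single element as follows. This is a simplified illustration, not the layered implementation from the slides: it swaps the layers for a plain priority queue that pops variants one at a time in order of decreasing probability. The function names and the oxygen abundances are assumptions made for the example.

```python
import heapq
from math import exp, lgamma, log


def log_multinomial(counts, probs):
    """Log-probability of one multinomial configuration."""
    result = lgamma(sum(counts) + 1)
    for count, prob in zip(counts, probs):
        result += count * log(prob) - lgamma(count + 1)
    return result


def neighbours(counts):
    """Configurations reachable by swapping one atom to another isotope."""
    for i in range(len(counts)):
        if counts[i] == 0:
            continue
        for j in range(len(counts)):
            if i != j:
                moved = list(counts)
                moved[i] -= 1
                moved[j] += 1
                yield tuple(moved)


def most_probable(n, probs):
    """Hill-climb to the mode, starting with all atoms on the top isotope."""
    start = [0] * len(probs)
    start[max(range(len(probs)), key=probs.__getitem__)] = n
    current = tuple(start)
    while True:
        best = max(neighbours(current), default=current,
                   key=lambda c: log_multinomial(c, probs))
        if log_multinomial(best, probs) <= log_multinomial(current, probs):
            return current
        current = best


def smallest_set(n, probs, target):
    """Pop variants by decreasing probability until their total reaches target."""
    start = most_probable(n, probs)
    heap = [(-log_multinomial(start, probs), start)]
    seen = {start}
    chosen, total = [], 0.0
    while heap and total < target:
        neg_logp, conf = heapq.heappop(heap)
        chosen.append((conf, exp(-neg_logp)))
        total += exp(-neg_logp)
        for nb in neighbours(conf):
            if nb not in seen:
                seen.add(nb)
                heapq.heappush(heap, (-log_multinomial(nb, probs), nb))
    return chosen, total


# 200 oxygen atoms, approximate abundances of 16O, 17O, 18O (illustrative).
oxygen = (0.99757, 0.00038, 0.00205)
variants, covered = smallest_set(200, oxygen, 0.99)
```

The heap costs a logarithmic factor per variant, which is the overhead a layered scheme with trimming is designed to avoid.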

Monotonic Expansion Property: for each v, the set {W : P(W) >= P(v)} is adjacent to v
[Figure: multinomial distribution; smallest set with current Total Probability]


Our OPTIMAL implementation uses
• a queue for storing subsequent layers
• a version of quickselect for trimming
• other tricks
Complexity: O(n) in the total number of configurations
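A quickselect-style trim of the last layer could look like the sketch below. It is an illustration under assumptions, not the implementation referred to above: `trim_layer` is a hypothetical name, a layer is taken to be a list of `(variant, probability)` pairs, and `needed` is the probability mass still missing before the layer is added.

```python
import random


def trim_layer(layer, needed, seed=0):
    """Keep the fewest, most probable items of `layer` whose total >= needed.

    Partitions around a random pivot as quickselect does, so the expected
    running time is linear in the layer size and no full sort is needed.
    """
    rng = random.Random(seed)
    kept = []
    candidates = list(layer)
    while candidates and needed > 0:
        pivot = rng.choice(candidates)[1]
        high = [item for item in candidates if item[1] > pivot]
        equal = [item for item in candidates if item[1] == pivot]
        low = [item for item in candidates if item[1] < pivot]
        high_sum = sum(prob for _, prob in high)
        if high_sum >= needed:
            # The answer lies entirely within the more probable part.
            candidates = high
            continue
        kept.extend(high)
        needed -= high_sum
        for item in equal:
            if needed <= 0:
                break
            kept.append(item)
            needed -= item[1]
        candidates = low
    return kept


layer = [("a", 0.1), ("b", 0.4), ("c", 0.2), ("d", 0.3)]
kept = trim_layer(layer, needed=0.65)
```

In this toy layer the two most probable items carry 0.7 of the mass, so they are enough to cover the missing 0.65.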

We provide theoretical background and get better run times

Proteolytic fragmentation: LC-MS/MS
• data for colorectal cancer patients and healthy donors
• ca. 1000 peptides
• preprocessing: spectra interpretation and retention time alignment

Exopeptidase activity
• motivation: differential exoprotease activities contribute to cancer type–specific serum peptidome degradation
• our goal: first formal model estimated from LC-MS/MS data
Villanueva, J., Nazarian, A., Lawlor, K., et al. 2008. A sequence-specific exopeptidase activity test (SSEAT) for “functional” biomarker discovery. Mol. Cell. Proteomics 7, 509–518.

Cleavage graph ($\star$ denotes creation, $\dagger$ degradation)
FTSSTS
FTSST TSSTS SSTSY
FTSS TSST SSTS STSY
FTS TSS SST STS TSY
FT SS TS ST SY

Transition intensities for the Markov process describing the flow of particles through the graph, i.e. the process of peptidome degradation:

$$Q(x, x') = \begin{cases}
a_{\star i} & \text{if } x'_i = x_i + 1,\ x'_{-i} = x_{-i} \text{ for some } i & \text{(create)},\\
a_{r(i,j)}\, x_i & \text{if } x'_j = x_j + 1,\ x'_i = x_i - 1 \text{ and } x'_{-i-j} = x_{-i-j} \text{ for some } i \to j & \text{(move)},\\
a_{i\dagger}\, x_i & \text{if } x'_i = x_i - 1,\ x'_{-i} = x_{-i} \text{ for some } i & \text{(annihilate/degrade)}.
\end{cases}$$
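A process with create, move and degrade intensities of this form can be simulated with a standard Gillespie-type scheme. The sketch below is a minimal illustration under assumptions: the function name, the dictionary-based rate tables and the toy rates are all hypothetical and are not estimates from the data.

```python
import random
from collections import Counter


def simulate_degradation(a_star, a_move, a_dag, x0, t_end, seed=0):
    """Simulate peptide counts on a cleavage graph up to time t_end.

    a_star[i]    : creation intensity of peptide i
    a_move[i, j] : per-particle intensity of the move i -> j
    a_dag[i]     : per-particle degradation intensity of peptide i
    """
    rng = random.Random(seed)
    x = Counter(x0)
    t = 0.0
    while True:
        events = [(rate, ("create", i, None)) for i, rate in a_star.items()]
        events += [(rate * x[i], ("move", i, j))
                   for (i, j), rate in a_move.items()]
        events += [(rate * x[i], ("degrade", i, None))
                   for i, rate in a_dag.items()]
        total = sum(rate for rate, _ in events)
        if total <= 0:
            break  # nothing can happen any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        kind, i, j = rng.choices([event for _, event in events],
                                 weights=[rate for rate, _ in events])[0]
        if kind == "create":
            x[i] += 1
        elif kind == "move":
            x[i] -= 1
            x[j] += 1
        else:
            x[i] -= 1
    return dict(x)


# Toy chain FTSS -> FTS -> FT with made-up rates and no creation or loss.
final = simulate_degradation(
    a_star={},
    a_move={("FTSS", "FTS"): 1.0, ("FTS", "FT"): 0.5},
    a_dag={},
    x0={"FTSS": 10},
    t_end=1000.0,
)
```

Events whose source peptide has count zero get weight zero and are never chosen, so counts stay non-negative.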

In equilibrium

Proposition 1 (Equilibrium distribution). The process $(X(t))$ has the equilibrium (stationary) distribution $\pi$ given by:

$$\pi(x) = \prod_{i \in V} e^{-\lambda_i}\,\frac{\lambda_i^{x_i}}{x_i!},$$

where the configuration of intensities $(\lambda_i)_{i \in V}$ is the unique solution to the following system of “balance” equations:

$$\sum_{k \to i} \lambda_k\, a_{r(k,i)} + a_{\star i} = \lambda_i \Big( \sum_{i \to j} a_{r(i,j)} + a_{i\dagger} \Big) \quad \text{for every } i \in V.$$

Old as the hills, but…
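Because cleavage only shortens peptides, the graph has no cycles, and the balance equations can then be solved node by node in topological order. The sketch below assumes exactly that (an acyclic graph in which every node has positive total outflow); the function name and the toy rates are hypothetical.

```python
from graphlib import TopologicalSorter


def equilibrium_intensities(a_star, a_move, a_dag):
    """Solve inflow = intensity * outflow for every node of an acyclic graph.

    a_star[i]    : creation intensity of node i
    a_move[k, i] : intensity of the move k -> i
    a_dag[i]     : degradation intensity of node i
    Assumes every node has a positive total outflow.
    """
    nodes = set(a_star) | set(a_dag) | {n for edge in a_move for n in edge}
    predecessors = {node: set() for node in nodes}
    for source, target in a_move:
        predecessors[target].add(source)

    intensity = {}
    # static_order() yields every node after all of its predecessors.
    for i in TopologicalSorter(predecessors).static_order():
        inflow = a_star.get(i, 0.0) + sum(
            intensity[k] * a_move[(k, i)] for k in predecessors[i])
        outflow = a_dag.get(i, 0.0) + sum(
            rate for (source, _), rate in a_move.items() if source == i)
        intensity[i] = inflow / outflow
    return intensity


# Toy chain A -> B with made-up rates.
lam = equilibrium_intensities(
    a_star={"A": 2.0},
    a_move={("A", "B"): 1.0},
    a_dag={"A": 1.0, "B": 0.5},
)
```

For the toy chain, node A receives 2.0 and loses at rate 1.0 + 1.0, giving intensity 1.0; node B receives 1.0 and loses at rate 0.5, giving 2.0.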

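Hierarchical Bayesian model

Hyperparameters: $(B_r)_{r \in R}$, $(B_{\star i})_{i \in V_{in}}$, $S_{shape}$, $S_{rate}$
$(b_r)_{r \in R} \sim \mathrm{Dir}((B_r)_{r \in R})$
$(b_{\star i})_{i \in V_{in}} \sim \mathrm{Dir}((B_{\star i})_{i \in V_{in}})$
$s \sim \mathrm{Gamma}(S_{shape}, S_{rate})$
$\lambda_i = \lambda_i(s, b_\star, b)$ for $i \in V$
$x_i \sim \mathrm{Poiss}(\lambda_i)$ for $i : \epsilon_i = 1$
Missing readings ($q$): $\delta_i \sim \mathrm{Bern}(q)$ for $i : \epsilon_i = 1$
Errors ($(\epsilon_i)_{i \in V}$, $\tau$): $y_i \sim \mathrm{LogNormal}(x_i, \tau)$ for $i : \delta_i = 1$, and $y_i \sim \mathrm{Background}$ for $i : \delta_i = 0$

Metropolis-Hastings to sample from posterior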
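The sampling step can be illustrated on a deliberately reduced version of the model. The sketch below keeps only the Gamma prior on s and Poisson counts, and assumes the hypothetical simplification $\lambda_i = s\,w_i$ with known weights $w_i$; the full model with Dirichlet parameters, missing readings and log-normal errors is not reproduced. All names and numbers are made up for the example.

```python
import math
import random


def log_posterior(s, counts, weights, shape, rate):
    """Unnormalised log-posterior of s: Gamma prior, Poisson(s * w_i) counts."""
    if s <= 0:
        return -math.inf
    log_prior = (shape - 1) * math.log(s) - rate * s
    log_lik = sum(x * math.log(s * w) - s * w for x, w in zip(counts, weights))
    return log_prior + log_lik


def metropolis_hastings(counts, weights, shape, rate,
                        n_samples, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings on log(s)."""
    rng = random.Random(seed)
    s = 1.0
    # Adding log(s) is the Jacobian term for walking on the log scale.
    current = log_posterior(s, counts, weights, shape, rate) + math.log(s)
    samples = []
    for _ in range(n_samples):
        proposal = s * math.exp(rng.gauss(0.0, step))
        candidate = (log_posterior(proposal, counts, weights, shape, rate)
                     + math.log(proposal))
        if rng.random() < math.exp(min(0.0, candidate - current)):
            s, current = proposal, candidate
        samples.append(s)
    return samples


counts = [3, 12, 15]          # made-up peptide counts
weights = [0.5, 1.0, 1.5]     # made-up known weights
draws = metropolis_hastings(counts, weights, shape=2.0, rate=1.0,
                            n_samples=20000)
posterior_mean = sum(draws[2000:]) / len(draws[2000:])
```

This reduced model is conjugate, with posterior Gamma(2 + 30, 1 + 3) and mean 8, which gives a simple check that the sampler behaves sensibly.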

NON TRIVIAL TASK: filling the cleavage graph with real data
Data:
• 1000 peptides: mass, charge, retention time
• 243 precursor peptides
• ca. 40 000 subsequences
From aa sequence:
• calculate mass
• consider all charges
• predict retention time (random forests)
Quite often: missing reads and errors!
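Enumerating the candidate subsequences of a precursor is the simple part of this task. The sketch below builds the nodes and exopeptidase edges for one precursor; the function name and the minimum fragment length of 2 are assumptions, and FTSSTSY is used because its fragments match the peptides named on the cleavage graph slide.

```python
def cleavage_graph(precursor, min_len=2):
    """Nodes: contiguous subsequences; edges: loss of one terminal residue."""
    nodes = {
        precursor[start:end]
        for start in range(len(precursor))
        for end in range(start + min_len, len(precursor) + 1)
    }
    edges = set()
    for peptide in nodes:
        if len(peptide) > min_len:
            edges.add((peptide, peptide[1:]))   # N-terminal residue removed
            edges.add((peptide, peptide[:-1]))  # C-terminal residue removed
    return nodes, edges


nodes, edges = cleavage_graph("FTSSTSY")
shortest = sorted(p for p in nodes if len(p) == 2)
```

Repeated fragments such as TS collapse into one node because the nodes are stored in a set.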

Cleavage graph for real proteolytic events
• 20 colorectal cancer patients and 20 healthy donors
• ca. 1000 peptides
• preprocessing phase
MUCH SMALLER cleavage graphs!
[Figure: cleavage graph for peptide u = MSFT†LTN†K with cuts by pepsin ($\rho^{peps}_{vw}$), thermolysin ($\rho^{ther}_{xy}$, $\rho^{ther}_{vz}$) and chymotrypsin ($\rho^{chem}_{st}$); fragments MSFT, LTNK, K, LTN, MSFTL, TN]

Identified enzymes make sense!
[Figure: heatmap with color key and histogram; rows: peptidases; columns: data set no.]
Peptidases shown: plasmin, neprilysin, calpain 2, matrix metallopeptidase 3, kallikrein related peptidase 3, aminopeptidase PILS, legumain, cathepsin K, membrane type matrix metallopeptidase 4, cathepsin H, ADAM10 peptidase, ADAM17 peptidase, caspase 1, ADAMTS4 peptidase, pepsin A, chymotrypsin C, ADAMTS5 peptidase, membrane type matrix metallopeptidase 6, calpain 1, cathepsin L, cathepsin G, myeloblastin, chymase (Homo sapiens type), tryptase alpha, matrix metallopeptidase 20, tripeptidyl peptidase I, elastase 1, granzyme B (Homo sapiens type), cathepsin S, trypsin 1, membrane type matrix metallopeptidase 3, cathepsin B, eupitrilysin

A. Gambin, B. Kluge / Modeling Proteolysis from MS data

Stochastic dynamics in time
[Figure: cleavage graph for peptide u = MSFT†LTN†K with cuts by pepsin ($\rho^{peps}_{vw}$), thermolysin ($\rho^{ther}_{xy}$, $\rho^{ther}_{vz}$) and chymotrypsin ($\rho^{chem}_{st}$); fragments MSFT, LTNK, K, LTN, MSFTL, TN]

From MEROPS: denote by $\rho_{vw}$ the vector of all peptidase affinity coefficients for the cleavage $v \dagger w$.
To be estimated: the vector $c$ of peptidase cutting intensities. The intensity to perform the cleavage is proportional to $c^T \rho_{vw}$:

$$Q_{xx'} = \begin{cases} c^T \rho_{vw}\, x_u & \text{if } x' = x - \delta_u + \delta_v + \delta_w \text{ and } u = v \dagger w,\\ 0 & \text{otherwise.} \end{cases}$$

CME, for $P(x, t) = P(X(t) = x)$:

$$\frac{\partial}{\partial t} P(x, t) = \sum_{y \neq x} \big( Q_{yx} P(y, t) - Q_{xy} P(x, t) \big) = \sum_{u = v \dagger w} c^T \rho_{vw} \big[ (x_u + 1)\, P(x + \delta_u - \delta_v - \delta_w, t) - x_u P(x, t) \big] = \sum_{u = v \dagger w} c^T \rho_{vw} \big[ x'_u P(x', t) - x_u P(x, t) \big],$$

where in the last sum $x' = x + \delta_u - \delta_v - \delta_w$.

No more monomolecular system: we have reactions A -> B and A -> B + C (endopeptidases)

Interesting moments...

Denote by $E_q(t)$ the expected number of instances of peptide $q$ at time $t$:

$$E_q(t) = \sum_x x_q P(x, t).$$

From the equation above:

$$\frac{\partial}{\partial t} E_q(t) = \sum_{u \to q} \lambda_{uq} E_u(t) + \lambda_{qq} E_q(t), \quad q \in V,$$

$$E(t) = E(0)^T \exp(\Lambda t), \quad \Lambda = (\lambda_{vw})_{v,w \in V}.$$

[Figure: the matrix $\Lambda$ for peptide VAHRFKDLGEEN]
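The closed form $E(t) = E(0)^T \exp(\Lambda t)$ can be evaluated numerically for a small system. The sketch below is a dependency-free illustration using a Taylor series with scaling and squaring, which is adequate for small, well-scaled matrices; the helper names and the two-peptide example are assumptions made for the example.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=30, squarings=10):
    """Matrix exponential via a Taylor series with scaling and squaring."""
    n = len(a)
    scale = 2 ** squarings
    scaled = [[value / scale for value in row] for row in a]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[value / k for value in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def expected_counts(e0, lam, t):
    """Row vector E(0)^T times exp(Lambda * t)."""
    n = len(e0)
    m = mat_exp([[value * t for value in row] for row in lam])
    return [sum(e0[u] * m[u][q] for u in range(n)) for q in range(n)]


# Two peptides: u turns into q at rate 1, q is stable (made-up example).
lam = [[-1.0, 1.0],
       [0.0, 0.0]]
e_t = expected_counts([10.0, 0.0], lam, t=1.0)
```

For this example the exact answer is $E_u(1) = 10e^{-1}$ and $E_q(1) = 10(1 - e^{-1})$.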

ETD fragmentation
• more fragments
• more insight into structure
• more confidence in correct identification

ETD: some bonds get easily broken, others not

The goal of masstodon
• understand fragmentation inside the instrument under different experimental conditions
• use purified chemical samples
• study fragmentation pathways
Solution: locate fragments in data
1) deconvolute signals
2) infer fragmentation reaction constants
