Automatic identification and characterisation of proteins
Gérard, Daniel, Stephane, Sebastien, Khaled, Pierre-Alain, Robin, David, Christine, Ron, Patricia H., Markus, Patricia P., Marc, Vassilios, Nadine

Definitions
• Genome (DNA): « Genomics »
• mRNA: « Transcriptomics »
• Proteome (proteins): « Proteomics »
• Cell function
Definitions (cont.)
• Proteome: the set of expressed genes (proteins) in a given organism, at a given point in time, in a given situation. The protein complement of the genome.
• Proteomics: the analysis of proteome(s)

The big challenge
• To observe the whole system
• To understand the systems, instead of their components and their complexity
Protein chemistry ≠ Proteomics
Protein chemistry:
• Individual proteins
• Complete sequence analysis
• Emphasis on structure and function
• Structural biology
Proteomics:
• Complex mixtures
• Partial sequence analysis
• Emphasis on identification by DB matching
• Systems biology

Question
If we can measure gene expression (mRNA), why bother with proteomics?
Proteome complexity
[Figure: a gene product drawn as segments a b c d, shown in several forms: alternative splicing (e.g. a b c; a c d); variant segments (a' b c d; a b' c' d); truncations and fragments; PTMs, discrete and heterogeneous]
[Figure: annotated protein map. Labels include α2-macroglobulin, ceruloplasmin, C3 (and C3α), C1s, α1-antitrypsin, prothrombin, hemopexin (and fragment), α1-B-glycoprotein, α2-antiplasmin, albumin, α1-antichymotrypsin, antithrombin III, α2-HS-glycoprotein, angiotensinogen, Gc-globulin, histidine-rich glycoprotein, orosomucoid, fibrinogen γ chain (and cleaved γ chain), paraoxonase, LRG, actin, haptoglobin β chain (and cleaved β chain), haptoglobin α1 and α2 chains (and α2 fragment), Zn-α-glycoprotein, NA3, α-microglobulin, transthyretin, serum amyloid P, C-reactive protein, SRBP, vitronectin (fragment), apolipoproteins (ProapoA-I, ApoA-I and fragment, ApoA-II, ApoA-IV, ApoC-II and C-III, ApoD, ApoE3,3 and ApoE fragment, ApoJ), and immunoglobulin chains (IgA S and α chains, IgM µ chain, IgD δ chain, Ig J chain, Ig light chains, Ig κ light chain)]
[Figure: log-log plot of the abundance of 19 gene products at the protein level (X-axis: protein abundance, CB-stained 2D-PAGE gels) and mRNA level (Y-axis: message abundance, % of clones in RTI); R = 0.48. Plotted gene products: ACTG, ACTB, CPS, CYB5, TBA1, LAMR, ENPL, CRTC, TBB1, HSP90, F1ATPB, LAMB, HSP60, PDI, TPM, GR75, HSC70, BIP, HSP70. Modified from Electrophoresis 1997, 18, 533-537]

Proteins are heterogeneous chemical entities
• Huge variations in concentration
• Wide range of molecular weight
• Wide range of pI
• Wide range of hydrophobicity
• Modify over time
• Differ from individual to individual
• Various stabilities
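To make the R = 0.48 figure on the log-log plot concrete, here is a minimal sketch of how a correlation between protein and mRNA abundance could be computed on log-transformed values. It assumes R is a Pearson coefficient taken on log10 values, which the slide does not state; the function name and the example numbers are hypothetical and are not the data behind the plot.

```python
import math


def log_log_pearson(protein_abundance, mrna_abundance):
    """Pearson correlation of log10(protein) against log10(mRNA).

    Assumption: both inputs are equal-length sequences of positive numbers.
    """
    x = [math.log10(v) for v in protein_abundance]
    y = [math.log10(v) for v in mrna_abundance]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and spreads, without the common 1/n factor (it cancels).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the function.
    protein = [1_000, 20_000, 50_000, 400_000]
    mrna = [0.01, 0.005, 0.2, 0.05]
    print(round(log_log_pearson(protein, mrna), 2))
```

A value near 1 would mean mRNA level predicts protein level well; a value such as the 0.48 reported on the slide indicates a weak relationship, which is the slide's argument for measuring proteins directly.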
Active proteins do not live alone
• Interactions with other proteins
• Interactions with DNA/RNA
• Interactions with chemical entities (ligands, cofactors, metal ions, hormones, etc.)

How variable is the human proteome?
• 1 proteome per tissue (250 tissues)
• 1 proteome per sub-cellular fraction, etc.
• 1000 diseases
• Proteomes change with external influences (food, drugs, …)
• Proteomes change with time
• One genome per individual, but how many proteomes? >10^6?
• 40'000 genes → 1'000'000 proteins?
Definitions (end)
• Proteome: the set of expressed genes (proteins) in a given organism, at a given point in time, in a given situation. The protein complement of the genome.
• Proteomics: the analysis of proteome(s)
• Proteomatics: bioinformatics for proteomics

Proteomics: an analytical and bioinformatic challenge
• No equivalent of PCR
  – A small amount of a polypeptide must be detected and analysed without any amplification
• No specific hybridisation
• One gene produces n? proteins
• Requires analytical methods and specific bioinformatics tools
Proteomics: a challenge
• Integration of four important tools:
  1. Analytical protein separation technologies
  2. Databases (complete genome sequences, EST, proteins)
  3. Mass spectrometry (MS)
  4. Bioinformatics

1. Analytical protein separation technologies
• LC
• 1-D
• 2-D
2. Databases
• Complete genome sequences
• EST
• Proteins
Large but known index of possible proteins
Prediction (motifs, domains, structures)

3. Mass spectrometry (MS)
• Precise analysis of bio-molecules (proteins and peptides):
  – Exact measurement of intact protein masses > 100 kDa
  – Exact measurement of peptide masses from enzymatic digestion (high sensitivity and high precision)
  – Peptide sequence analysis
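As a rough illustration of "peptide masses from enzymatic digestion", the sketch below cleaves a sequence with a simplified trypsin rule (after K or R, unless the next residue is P) and sums standard monoisotopic residue masses plus one water. The choice of trypsin, the function names and the toy sequence are assumptions made for illustration; the slides do not name an enzyme or a software tool.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added per free peptide


def tryptic_digest(sequence):
    """Split after K or R, except when the following residue is P.

    Simplified rule: no missed cleavages are generated.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    toy_sequence = "GASPKVTCLRENKTME"  # hypothetical sequence
    for pep in tryptic_digest(toy_sequence):
        print(pep, round(peptide_mass(pep), 4))
```

The resulting list of masses is what a mass spectrometer would be expected to observe for that protein, before any modification or missed cleavage is taken into account.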
4. Bioinformatics
• Correlation of MS data and specific protein sequence databases
• De novo sequencing
• Characterisation of proteins with MS data

Principal applications of proteomics
- Analysis of differential protein expression associated with a disease, different cell states, sample treatments → drug targets, disease markers
- Study of protein/protein interactions
- Micro-characterisation of proteins
  * Identification (catalogue of proteins)
  * Characterisation of post-translational modifications
From A. Pandey, M. Mann, Nature 2000, 405, 837-846
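The "correlation of MS data and protein sequence databases" step can be sketched as a simple mass-matching search: digest every database entry in silico, then count how many observed peptide masses fall within a tolerance of a theoretical one. This is a deliberately naive count-based score, not the scoring used by any particular identification tool; the trypsin rule, the tolerance, the function names and the two-entry database are all hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Split after K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.2):
    """Number of observed masses explained by the sequence's digest."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def rank_candidates(observed_masses, database, tolerance=0.2):
    """Return (name, score) pairs, best-matching entry first."""
    scores = {
        name: count_matches(observed_masses, seq, tolerance)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical database and spectrum, for illustration only.
    database = {"protA": "GASPKVTCLR", "protB": "WWYYKFFHHR"}
    observed = [458.25, 590.32]
    print(rank_candidates(observed, database))
```

Counting matches is the simplest possible score; it ignores protein length, mass accuracy and modifications, which is one reason the slides list characterisation and validation as separate bioinformatics tasks.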
Proteomics pathway
[Diagram: Sample; Separation; Data analysis, selection of spot(s); Post-separation analysis (peptide sequence, e.g. ... NRTKGG ...); Databases; Data processing; Sample and data tracking]

Proteomics pathway
• Sample: choice of sample, sample collection, sample pre-fractionation, sample pre-treatment
• Separation: LC (CEX, affinity, etc.), 1-DE (CE, SDS), 2-DLC, 2-DE
• Data analysis: samples comparison, statistical analysis, choice of fractions (LC), choice of gel spots (1-DE, 2-DE), systematic analysis
• Post-separation analysis: Edman sequencing, AAA, endoproteolytic cleavage, mass spectrometry (MALDI-MS, ESI MS/MS)
• Data processing: specific identification tools, specific characterisation tools, analysis tools, validation tools