Computational Systems Biology TUM WS 2010/11 Lecture 4: Protein Structure and Disorder in Complete Genomes 2010-11-11 Dr. Arthur Dong
How To Read A Paper
Focus: technical details or the big picture?
Within the paper:
• What's the whole point, the take-home lesson?
• Why did they do what they did? (historical perspective)
• Are any parts problematic, and could they be improved?
• Expected versus unexpected
Go beyond the paper:
• Observation – Question – Hypothesis – Investigation – Application
• What's the next obvious step?
• Can I apply the same ideas/techniques in other areas?
• Turn any question into a project (and possibly a paper)!
Proteins are the worker molecules in a cell
• Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.
• Transport: Some proteins transport substances such as oxygen and ions. Example: haemoglobin carries oxygen.
• Information transfer: For example, hormones. Insulin controls the amount of sugar in the blood.
Levels of Protein Structure
Sometimes we don't have a choice... Secondary structures, α-helix and β-sheet, have regular hydrogen-bonding patterns.
Tertiary structure
Protein Structure in Complete Genomes
1990s – The start of complete-genome sequencing
• Sequencing and Assembly
• Gene Prediction
• Proteome – the “parts” list of all proteins (our starting point)
• H. influenzae – 1995 (bacteria)
• M. jannaschii – 1996 (archaea)
• S. cerevisiae – 1996 (eukarya)
Comparison of living organisms at different scales:
• At the level of atoms and amino acids (physics and chemistry) they are all the same.
• At the species level they are all different.
• Find the happy medium – molecular biology (individual proteins etc.) and systems biology (the interaction of proteins etc.)
3 diverse organisms from 3 kingdoms of life
• Expect significant differences in their genomes – what are those?
• What is actually similar?
Method – sequence analysis
Object – protein structure
Perspective – genome-wide, systems-level
Compare secondary structures across genomes The expected The unexpected Why? Possible explanations
Comparison of super-secondary structures
Protein Tertiary Structure: PDB and SCOP
PDB – depository of all solved structures (can be multi-domain or multi-protein)
SCOP – classification of domains/proteins by structural and evolutionary relatedness
SCOP hierarchy:
• Family: homologs (evolutionarily related, >30% sequence identity, similar function)
• Superfamily: likely homologs (low sequence identity but similar function)
• Fold: similar tertiary structure – same secondary elements arranged in the same way in space. Difference mainly in flanking and connecting regions, e.g. loops/turns. Possibly no evolutionary relation and low sequence identity.
Folds across genomes Bias → Structural Genomics Ancient folds Prevalence of mixed folds
5 Most Common Folds Present in All 3 Genomes Similar architecture! Similar function (basic metabolism) Why are they common? (evolution, folding energy, ...)
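Finding the folds present in all genomes comes down to a set intersection followed by a ranking. The sketch below shows one minimal way to do that in Python. The fold names and counts are made-up placeholders, not the folds or numbers behind this slide, and `shared_folds` and `rank_shared` are hypothetical helper names.

```python
from collections import Counter

# Toy input (hypothetical, NOT real genome data):
# genome -> {fold label: number of proteins assigned to that fold}
fold_counts = {
    "H. influenzae": {"fold_a": 12, "fold_b": 9, "fold_c": 7, "fold_d": 5},
    "M. jannaschii": {"fold_a": 8, "fold_b": 6, "fold_c": 4, "fold_e": 2},
    "S. cerevisiae": {"fold_a": 15, "fold_b": 11, "fold_c": 3, "fold_f": 20},
}

def shared_folds(counts):
    """Return the set of folds that occur in every genome."""
    fold_sets = [set(per_genome) for per_genome in counts.values()]
    return set.intersection(*fold_sets)

def rank_shared(counts):
    """Rank the shared folds by their total count summed over all genomes."""
    shared = shared_folds(counts)
    total = Counter()
    for per_genome in counts.values():
        for fold, n in per_genome.items():
            if fold in shared:
                total[fold] += n
    return total.most_common()

if __name__ == "__main__":
    for fold, n in rank_shared(fold_counts):
        print(fold, n)
```

Summing raw counts favours the largest genome. Ranking by the fraction of each proteome assigned to a fold would be a reasonable alternative.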
Application: Whole-genome trees based on fold occurrence
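One simple way to turn fold occurrence into a tree is to describe each genome by the set of folds it contains, compute pairwise distances, and cluster. The sketch below assumes Jaccard distance and average-linkage clustering, which may differ from the method used for the tree on this slide. The genome names, fold labels and `average_linkage_tree` are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |intersection| / |union| of two fold sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage_tree(fold_sets):
    """Agglomerative clustering; returns the tree topology as nested tuples."""
    # Each cluster is (tree, member names); a leaf's tree is its genome name.
    clusters = [(name, [name]) for name in fold_sets]
    while len(clusters) > 1:
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(clusters), 2):
            # Average distance over all pairs of members of the two clusters.
            d = sum(
                jaccard_distance(fold_sets[x], fold_sets[y])
                for x in mi for y in mj
            ) / (len(mi) * len(mj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Toy presence/absence data (hypothetical, NOT real genome data).
genome_folds = {
    "genome_A": {"f1", "f2", "f3", "f4"},
    "genome_B": {"f1", "f2", "f3", "f5"},
    "genome_C": {"f1", "f6", "f7"},
}

if __name__ == "__main__":
    print(average_linkage_tree(genome_folds))
```

With only three genomes the tree just says which pair is closest. The same code runs unchanged on more genomes.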
Protein Disorder
What is protein disorder?
• Not everything folds into compact 3D structure
• Abundance of “floppy”, extended regions
• Conformation ensemble rather than fixed structure
What is its function?
• Coupled binding (“induced fit” rather than “lock-and-key”)
• High specificity, low affinity (easily reversible)
• Interaction with a large number of targets
Can you predict disorder from sequence?
• Low sequence complexity
• Amino acid compositional bias
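Low sequence complexity can be measured as the Shannon entropy of the amino acid composition in a sliding window. The sketch below shows only that idea and is not a real disorder predictor. The window size and entropy threshold are arbitrary assumptions, and `low_complexity_windows` is a hypothetical helper name.

```python
import math
from collections import Counter

def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def low_complexity_windows(sequence, window=12, threshold=2.2):
    """Return (start, entropy) for every window whose entropy is below threshold.

    window and threshold are illustrative defaults, not calibrated values.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h < threshold:
            hits.append((start, round(h, 3)))
    return hits

if __name__ == "__main__":
    # Made-up sequence: a serine-rich stretch followed by a diverse stretch.
    toy = "SSSSSSSSSSSS" + "ACDEFGHIKLMN"
    print(low_complexity_windows(toy, window=12, threshold=1.0))
```

A window with a single residue type has entropy 0. A window of 12 different residues has entropy log2(12), about 3.58 bits. Compositional bias could be scored the same way, by counting the fraction of residues in a window that belong to a chosen residue set.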
Coupling of folding to target binding
Figure: KID domain of CREB; pKID bound to KIX domain of CBP (CREB binding protein). Labels: predicted α-helices in free peptide; experimentally determined α-helices in complex.
• Can provide tighter binding than similar sized, folded proteins.
• Enthalpy-Entropy compensation.
• Allows post-translational modification.
Protein Disorder in Complete Genomes Which kinds of proteins tend to be disordered?
Gene Ontology – A Unifying Vocabulary Across Organisms
Clustering of Genes – mRNA versus GO
GO: Molecular Function Un/expected?
GO: Cellular Component Consistent?
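To ask which kinds of proteins tend to be disordered, one can group proteins by GO term and compute the fraction of each group that is disordered. The sketch below assumes each protein already has a predicted disordered fraction and a list of GO terms. The records, the term labels and the 0.3 cutoff are made-up placeholders, not results from the lecture.

```python
from collections import defaultdict

# Toy records (hypothetical): (protein id, GO terms, fraction of residues predicted disordered)
proteins = [
    ("P1", ["term_A"], 0.62),
    ("P2", ["term_A", "term_B"], 0.45),
    ("P3", ["term_C"], 0.05),
    ("P4", ["term_C"], 0.12),
    ("P5", ["term_B"], 0.20),
]

def disordered_fraction_by_term(records, cutoff=0.3):
    """For each GO term, the fraction of annotated proteins called disordered.

    A protein is called disordered if its disordered fraction is >= cutoff
    (the cutoff is an illustrative assumption).
    """
    totals = defaultdict(int)
    disordered = defaultdict(int)
    for _pid, terms, frac in records:
        for term in terms:
            totals[term] += 1
            if frac >= cutoff:
                disordered[term] += 1
    return {term: disordered[term] / totals[term] for term in totals}

if __name__ == "__main__":
    for term, frac in sorted(disordered_fraction_by_term(proteins).items()):
        print(term, frac)
```

The same grouping works for any GO branch (molecular function or cellular component). Only the annotation lists change.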
Some obvious questions: Is disorder conserved? More protein interactions?
Recommendations
More recommendations