Protein Threading

Basic premise
• Structure is better conserved than sequence
  o A structure can tolerate a wide range of mutations.
  o Physical forces favor certain structures.
• The number of unique structural (domain) folds in nature is fairly small (possibly a few thousand)
  o Number of folds is limited. Currently ~700. Estimated total: 1,000 to 10,000.
• Statistics from Protein Data Bank (~35,000 structures)
  o 90% of new structures submitted to PDB in the past three years have similar structural folds in PDB
[Figure: TIM barrel]

Concept of Threading
o Thread (align or place) a query protein sequence onto a template structure in an "optimal" way
o A good alignment gives an approximate backbone structure
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
[Figure: template set]

Threading problem
• Threading: given a sequence and a fold (template), compute the optimal alignment score between the sequence and the fold.
• If we can solve the above problem, then, given a sequence, we can try each known fold and find the fold that best fits this sequence.
• Because there are only a few thousand folds, we can find the correct fold for the given sequence.
• Threading is NP-hard.

Components of Threading
• Template library
  o Use structures from DB classification categories (PDB)
• Scoring function
  o Single and pairwise energy terms
• Alignment
  o Consideration of pairwise terms leads to NP-hardness
  o Heuristics
• Confidence assessment
  o Z-score, P-value similar to sequence alignment statistics
• Improvements
  o Local threading, multi-structure threading

Protein Threading – structure database
• Build a template database
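The "try each known fold and keep the best fit" idea can be sketched in a few lines. This is a minimal illustration, not the method of any particular threading program: it assumes only single-residue (environment) terms and a linear gap penalty, which makes the alignment solvable by ordinary dynamic programming, whereas the pairwise terms mentioned above are what lead to NP-hardness. The two-letter environment alphabet (`B` buried, `E` exposed), the hydrophobic set, the energy values, and the tiny template library are all hypothetical.

```python
# Toy fold recognition: align a sequence to each template with dynamic
# programming over single-residue energies only, then keep the lowest energy.
# All numeric values and template names below are hypothetical.

HYDROPHOBIC = set("AVLIMFWC")  # assumed residue classes, for illustration only


def singleton_energy(residue, environment):
    """Toy fit of one residue to one environment ('B' buried, 'E' exposed).

    Negative is favourable: hydrophobic residues like buried positions,
    all other residues like exposed positions.
    """
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return -1.0 if buried == hydrophobic else 1.0


def thread(sequence, template_envs, gap_penalty=1.5):
    """Return the minimum alignment energy of sequence against one template."""
    n, m = len(sequence), len(template_envs)
    energy = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        energy[i][0] = i * gap_penalty  # leading residues left unaligned
    for j in range(1, m + 1):
        energy[0][j] = j * gap_penalty  # leading template positions skipped
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            place = energy[i - 1][j - 1] + singleton_energy(
                sequence[i - 1], template_envs[j - 1]
            )
            skip_residue = energy[i - 1][j] + gap_penalty
            skip_position = energy[i][j - 1] + gap_penalty
            energy[i][j] = min(place, skip_residue, skip_position)
    return energy[n][m]


def best_fold(sequence, library):
    """Try every template in the library and return the best-fitting name."""
    return min(library, key=lambda name: thread(sequence, library[name]))


library = {"fold_a": "BBEEBB", "fold_b": "EEBBEE"}  # hypothetical templates
print(best_fold("LVKDIA", library))
```

Because the loop over the library is linear in the number of folds, the cost of fold recognition is dominated by the cost of one sequence-to-template alignment.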

Protein Threading – energy function
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
• E_p: how preferable it is to put two particular residues nearby
• E_s: how well a residue fits a structural environment
• E_g: alignment gap penalty
• Total energy: E_p + E_s + E_g
• Find a sequence-structure alignment that minimizes the energy function

Assessing Prediction Reliability
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
[Figure: the query threaded onto four templates, with Score = -1500, Score = -720, Score = -1120, Score = -900]
• Which one is the correct structural fold for the target sequence, if any?
• The one with the best score (here, the lowest energy)?

Prediction of Protein Structures
• Examples – a few good examples
[Figure: actual vs. predicted structures]

Prediction of Protein Structures
• Not so good example
[Figure: actual vs. predicted structure]

Existing Prediction Programs
• PROSPECT
  o https://csbl.bmb.uga.edu/protein_pipeline
• FUGUE
  o http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html
• THREADER
  o http://bioinf.cs.ucl.ac.uk/threader/
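The reliability question can be made concrete with a Z-score over the four scores shown on the slide. This is only a sketch under stated assumptions: the scores are treated as energies (lower is better), the Z-score is taken against the other templates in this tiny set, whereas real threading programs typically use a much larger background distribution, and the template names and the three-term breakdown of the -1500 score are hypothetical.

```python
from statistics import mean, pstdev


def total_energy(e_pair, e_singleton, e_gap):
    """Total threading energy as on the slide: E_p + E_s + E_g."""
    return e_pair + e_singleton + e_gap


def z_scores(energies):
    """Map each template name to (energy - mean) / standard deviation."""
    values = list(energies.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all energies are identical; Z-score is undefined")
    return {name: (e - mu) / sigma for name, e in energies.items()}


# The four scores from the slide; the template names are hypothetical.
energies = {
    "template_1": -1500,
    "template_2": -720,
    "template_3": -1120,
    "template_4": -900,
}

# Hypothetical breakdown of one score into its three terms.
assert total_energy(-1300.0, -260.0, 60.0) == -1500.0

scores = z_scores(energies)
best = min(scores, key=scores.get)  # most negative Z-score = lowest energy
print(best, round(scores[best], 2))
```

With only four templates the best score lies about 1.5 standard deviations below the mean, which illustrates why a raw score alone does not tell us whether any of the folds is correct.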

CASP/CAFASP
• CASP: Critical Assessment of Structure Prediction
• CAFASP: Critical Assessment of Fully Automated Structure Prediction
  1. Won't get tired
  2. High-throughput
[Figure labels: CASP Predictor, CAFASP Predictor]

CASP6/CAFASP4
• 64 targets
• Resources for predictors
  o No X-ray, NMR machines (of course)
  o CAFASP4 predictors: no manual intervention
  o CASP6 predictors: anything (servers, google, …)
• Evaluation:
  o CASP6: assessed by experts + computer
  o CAFASP4: evaluated by a computer program
  o Predicted structures are superimposed on the experimental structures.
• CASP7 will be held this year (November)

Protein structure databases
• PDB
  o 3D structures
• SCOP
  o Murzin, Brenner, Hubbard, Chothia
  o Classification
    • Class (mostly alpha, mostly beta, alpha/beta (interspersed), alpha+beta (segregated), multi-domain, membrane)
    • Fold (similar structure)
    • Superfamily (homology, distant sequence similarity)
    • Family (homology and close sequence similarity)
[Figure: (a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome. Courtesy of David Goodsell, TSRI]

The SCOP Database
Structural Classification Of Proteins
• FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function
• SUPERFAMILY: proteins whose families have some sequence and function/structure similarity suggesting a common evolutionary origin
• COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry
• CLASS: alpha, beta, alpha–beta, alpha+beta, multidomain

Protein databases
• CATH
  o Orengo et al
  o Class (alpha, beta, alpha/beta, few SSEs)
  o Architecture (orientation of SSEs but ignoring connectivity)
  o Topology (orientation and connectivity, based on SSAP = fold of SCOP)
  o Homology (sequence similarity = superfamily of SCOP)
  o S level (high sequence similarity = family of SCOP)
  o SSAP alignment tool (dynamic programming)

Protein databases
• FSSP
  o DALI structure alignment tool (distance matrix)
  o Holm and Sander
• MMDB
  o VAST structure comparison (hierarchical)
  o Madej, Bryant et al

Protein structure comparison
• Levels of structure description
  o Atom/atom group
  o Residue
  o Fragment
  o Secondary structure element (SSE)
• Basis of comparison
  o Geometry/architecture of coordinates/relative positions
  o Sequential order of residues along backbone, ...
  o Physico-chemical properties of residues, …

How to compare?
• Key problem: find an optimal correspondence between the arrangements of atoms in two molecular structures (say A and B) in order to align them in 3D
• Optimality of the alignment is determined using a root mean square measure of the distances between corresponding atoms in the two molecules
• Complication: it is not known a priori which atom in molecule B corresponds to a given atom in molecule A (the two molecules may not even have the same number of atoms)

Structure Analysis – Basic Issues
• Coordinates for representing 3D structures
  o Cartesian
  o Other (e.g. dihedral angles)
• Basic operations
  o Translation in 3D space
  o Rotation in 3D space
• Comparing 3D structures
  o Root mean square distances between points of two molecules are typically used as a measure of how well they are aligned
  o Efficient ways to compute minimal RMSD once correspondences are known (O(n) algorithm)
    • Using eigenvalue analysis of correlation matrix of points
  o Due to the high computational complexity, practical algorithms rely on heuristics

Structure Analysis – Basic Issues
• Sequence order dependent approaches
  o Computationally this is easier
  o Interest in motifs preserving sequence order
• Sequence order independent approaches
  o More general
  o Active sites may involve non-local AAs
  o Searching with structural information

[Figure: Find the optimal alignment]
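The two basic operations, translation and rotation, move a structure without changing its shape. A small sketch can show this: the matrix of pairwise distances between atoms is unchanged by a rigid-body move, which is one reason a distance-matrix representation can be compared without first superimposing the structures. The coordinates below are hypothetical, the rotation is restricted to the z axis for brevity, and the helper names are illustrative only; this is not the DALI algorithm itself.

```python
import math


def translate(points, shift):
    """Translate every point by the vector shift = (dx, dy, dz)."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def rotate_z(points, angle):
    """Rotate every point about the z axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def distance_matrix(points):
    """All pairwise Euclidean distances between the points."""
    return [[math.dist(p, q) for q in points] for p in points]


def same_distances(a, b, tol=1e-9):
    """True if two equally sized point sets have matching distance matrices."""
    da, db = distance_matrix(a), distance_matrix(b)
    return all(
        math.isclose(u, v, abs_tol=tol)
        for row_a, row_b in zip(da, db)
        for u, v in zip(row_a, row_b)
    )


# Hypothetical atom coordinates, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 3.8)]
moved = rotate_z(translate(atoms, (5.0, -2.0, 3.0)), math.pi / 3)

print(same_distances(atoms, moved))
```

The coordinates of `moved` differ from those of `atoms`, yet the internal distances agree, so any comparison based on those distances is independent of how the two structures happen to be placed in space.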

Optimal Alignment
Which atom in structure A corresponds to which atom in structure B?

THESESENTENCESALIGN--NICELY
|||  ||  |||| |||||  ||||||
THE--SEQUENCE-ALIGNEDNICELY

Structure Comparison
• Find the highest number of atoms aligned with the lowest RMSD (Root Mean Squared Deviation)
• Find a balance between local regions with very good alignments and overall alignment

Structural Alignment
Methods to superimpose structures by translation and rotation
• Translation
  x1, y1, z1  →  x1 + d, y1, z1
  x2, y2, z2  →  x2 + d, y2, z2
  x3, y3, z3  →  x3 + d, y3, z3
• Rotation

Structure Comparison
An optimal superposition of myoglobin and beta-hemoglobin, which are structural neighbors. However, their sequence homology is only 8.5%.

Structure Comparison
Scoring system to find optimal alignment
Answer: Root Mean Square Deviation (RMSD)

\[ \mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2} \]

n = number of atoms
d_i = distance between the 2 corresponding atoms i in the 2 structures

Root Mean Square Deviation
For five pairs of corresponding atoms (red and blue structures):

\[ \mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{5} \left( X_{\mathrm{RED},i} - X_{\mathrm{BLUE},i} \right)^2}{5}} \sim \frac{d_1 + d_2 + d_3 + d_4 + d_5}{5} \]
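The RMSD definition translates directly into code once the atom correspondence is known. The sketch below assumes the two coordinate lists are already paired atom by atom and does not perform any superposition; the function name and the coordinates are hypothetical. It reuses the slide's translation example, shifting every atom by d along x, so every d_i equals d.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates.

    coords_a[i] and coords_b[i] are assumed to be corresponding atoms;
    no translation or rotation is applied before measuring.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical five-atom structure and a copy translated by d = 2.0 along x.
d = 2.0
structure_a = [
    (0.0, 0.0, 0.0),
    (1.5, 0.0, 0.0),
    (3.0, 0.5, 0.0),
    (4.5, 0.5, 1.0),
    (6.0, 1.0, 1.0),
]
structure_b = [(x + d, y, z) for x, y, z in structure_a]

print(rmsd(structure_a, structure_b))
```

Since every atom is displaced by the same distance, the RMSD equals that distance; undoing the translation before measuring would bring it to zero, which is why structures are superimposed first.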
