Kpax Protein Structure Alignment. Demo: Using Kpax on Linux. Dave Ritchie

Kpax – Protein Structure Alignment
Dave Ritchie
Team Orpailleur, Inria Nancy – Grand Est

Outline
Overview of Protein Sequences and Structures
Structural Alignment Using Dynamic Programming
The Kpax Algorithm Explained
Demo: Using Kpax on Linux
Practical: Homology Modeling Using Kpax + Modeler

Protein Sequences and Structures
[Figure]
Source: "The Gam protein of bacteriophage Mu is an orthologue of eukaryotic Ku", F.A. di Fagagna et al., EMBO Reports (2003), 4, 47–52

Comparing Two Strings
Q. Suppose we have two strings, e.g. EXPONENTIAL and POLYNOMIAL. How do we measure their similarity?
A1. In information theory, the edit distance measures the cost of transforming one string into another using one-character edits.
A2. Match 3 letters and then give a score for each pair...

     POLYNOMIAL
            |||
    EXPONENTIAL

Q. Suppose gaps are allowed. What is the best possible alignment?
A. How about

    --POLYNOM-IAL
      ||  |   |||
    EXPO--NENTIAL

or

    --POLYNOMIAL
      ||  |  |||
    EXPONEN-TIAL

?
Q. Which is better?
A1. The second one? (6 matches + 3 gaps vs 6 matches + 5 gaps)
A2. ... It depends on the score for each pair and the penalty for a gap.
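To make answer A2 concrete, here is a minimal sketch that scores the two gapped alignments from the slide column by column. The function name score_alignment and the numeric values (match = 1, mismatch = 0, gap = 0.5) are assumptions chosen for illustration; the slides do not give a scoring scheme.

```python
def score_alignment(top, bottom, match=1.0, mismatch=0.0, gap=0.5):
    """Score two equal-length gapped strings, one column at a time.

    The default scores are illustrative assumptions, not values from the slides.
    Returns (number of matches, number of gap columns, total score).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    matches = 0
    gaps = 0
    total = 0.0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1          # one symbol is aligned to a gap
            total -= gap
        elif a == b:
            matches += 1       # identical letters
            total += match
        else:
            total += mismatch  # two different letters
    return matches, gaps, total


first = score_alignment("--POLYNOM-IAL", "EXPO--NENTIAL")
second = score_alignment("--POLYNOMIAL", "EXPONEN-TIAL")
print(first)   # (6, 5, 3.5)
print(second)  # (6, 3, 4.5)
```

With these assumed values the second alignment wins because it has the same number of matches and fewer gaps. Setting gap=0.0 makes the two alignments score equally, which is the point of A2.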

Dynamic Programming
Dynamic programming (DP) is a method of dividing a problem into smaller sub-problems. It was first described by Richard Bellman in the 1950s. But instead of using recursion, it uses a table ("memoisation").
Goal: find similarity E(n, m) between two strings: x[1:n] and y[1:m]
Sub-goal: find E(i, j) between two prefixes: x[1:i] and y[1:j]
Observation: the best alignment must end on one of three columns: x[i] over y[j], x[i] over a gap (−), or a gap (−) over y[j]
Method: build similarity table with scores S(i, j) and penalties P(i):

$$E(i,j) = \max \begin{cases} E(i-1,j-1) + S(i,j) \\ E(i,j-1) - P(i) \\ E(i-1,j) - P(j) \end{cases}$$

Then, "trace back" from E(n, m) to E(1, 1) to extract the alignment.

Back-Tracking Through The DP Scoring Table
[Figure: DP scoring table, with POLYNOMIAL along the columns and EXPONENTIAL along the rows]
This gives the desired optimal alignment

    --POLYNOMIAL
      ||  |  |||
    EXPONEN-TIAL

3D Least-Squares Fitting
Least-squares fitting finds the 3D rotation/translation matrix M that minimises the sum of squared distances:

$$F = \sum_{i=1}^{N} \left( x_i^A - M \cdot x_i^B \right)^2$$

For proteins, the $x_i$ are normally Cα atom coordinates.
The translational part is easy: shift centres of mass to the origin.
The rotation can be found using eigenvector or quaternion methods.
The residual error (RMSD) is then given by

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i^A - M \cdot x_i^B \right)^2}$$

So, given a list of aligned Cα's, we can fit optimally to some RMSD.

So, What's The Problem?
DP is "perfect" for 1D string matching.
Least-squares fitting is "perfect" for 3D superposition.
BUT
Proteins are not made of 1D symbols or 3D points. They are made of complex 3D chemical components (amino acid residues). It is difficult to write a good scoring function to compare residues...
Similar 1D protein sub-sequences can have different 3D shapes (α-helices, β-strands), i.e. global environment can affect local shape.
We don't know a priori the right 1D pairings for 3D fitting...
Proteins are globally flexible. Even if many local 1D regions "match", not all of them might simultaneously superpose well in 3D space...
ADDITIONALLY!
Proteins can contain multiple repeats and/or transpositions...
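The DP recurrence and trace-back from the Dynamic Programming slide can be sketched in a few lines for the EXPONENTIAL / POLYNOMIAL example. This is a minimal illustration under stated assumptions, not Kpax code: the function name align is made up, the gap penalty is a single constant rather than the position-dependent P(i) of the slide, the scores (match = 1, mismatch = 0, gap = 0.5) are arbitrary, and the table has an extra row and column 0 for empty prefixes, so the trace-back runs to E(0, 0).

```python
def align(x, y, match=1.0, mismatch=0.0, gap=0.5):
    """Global alignment by dynamic programming (assumed constant gap penalty)."""
    n, m = len(x), len(y)
    # E[i][j] holds the best score for the prefixes x[:i] and y[:j].
    E = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        E[i][0] = -gap * i          # x[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        E[0][j] = -gap * j          # y[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            E[i][j] = max(E[i - 1][j - 1] + s,   # x[i] over y[j]
                          E[i][j - 1] - gap,     # gap over y[j]
                          E[i - 1][j] - gap)     # x[i] over gap

    # Trace back from E[n][m], re-deriving which case produced each cell.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if E[i][j] == E[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and E[i][j] == E[i - 1][j] - gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return E[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, top, bottom = align("EXPONENTIAL", "POLYNOMIAL")
print(score)   # 4.5
print(top)
print(bottom)
```

Several alignments can share the best score, so the strings printed by the trace-back may differ from the one on the slide while still having 6 matches and 3 gaps.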

Over 100 Structure Alignment Algorithms in 25 Years
http://en.wikipedia.org/wiki/Structural_alignment_software

Quick List of Structural Alignment Approaches
"elastic" Gaussian scoring
"double dynamic programming" on Cα distance matrices
triples or higher fragments (8-tuples) of Cα atoms
backbone Cα vectors
backbone torsion angles
secondary structure elements
geometric hashing
Voronoi tessellations
structural alphabets
Lagrangian contact map optimisation
eigenvector analysis of distance matrices
Fourier correlations
90 more...
Gaussian fragments
...

Introducing Kpax
http://kpax.loria.fr/
Dynamic programming with Gaussian scores
Uses NO sequence similarity OR secondary structure information
Very fast database search (CATH, SCOP, Pfam, ..., user-defined)
Rigid and flexible structural alignments
Multiple flexible alignments coming soon...

Defining Local Coordinate Frames
All Cα atoms have highly conserved tetrahedral geometry.
Exploit this to define a "canonical" Cα–C–N orientation, e.g. put Cα at the origin, C on the negative z axis, and N in the positive xz plane.
Now, ALL α-helices and β-strands look the same at the origin.
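The canonical orientation described on the Defining Local Coordinate Frames slide can be sketched with plain vector arithmetic. This is an illustration under assumptions, not the Kpax implementation: the helper names (local_frame, to_local) are made up, "N in the positive xz plane" is read as "N has zero y and positive x", and the atom coordinates are invented numbers rather than values from a real structure.

```python
import math


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(p / length for p in a)


def local_frame(ca, c, n):
    """Return orthonormal axes (x, y, z) of the assumed canonical frame."""
    z = unit(sub(ca, c))                    # points away from C, so C lies on -z
    v = sub(n, ca)
    # Remove the z component of CA->N; what remains defines the +x direction.
    x = unit(tuple(p - dot(v, z) * q for p, q in zip(v, z)))
    y = cross(z, x)                         # completes a right-handed frame
    return x, y, z


def to_local(p, ca, frame):
    """Express point p in the frame centred on CA."""
    d = sub(p, ca)
    return tuple(dot(d, axis) for axis in frame)


# Invented coordinates, for illustration only.
ca, c, n = (1.0, 2.0, 3.0), (2.2, 2.5, 3.6), (0.1, 2.9, 2.4)
frame = local_frame(ca, c, n)
c_local = to_local(c, ca, frame)
n_local = to_local(n, ca, frame)
print(c_local)  # x and y close to 0, z negative
print(n_local)  # y close to 0, x positive
```

Applying to_local to the neighbouring Cα atoms of every residue puts all fragments in the same frame, which is what allows their distances to be compared directly.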

Comparing Structural Fragments
In the canonical frame, similar structures have similar distances between their up-stream and down-stream Cα atoms:
[Figure]
But how to combine all the distances into a single score?

Representing Local Geometry as a Product of Gaussians
Calculate the Gaussian distribution of all Cα atoms in CATH:
[Figure: scatter plot of CATH Cα positions in the local frame; axes x, y, z; positions labelled −3 to +3]
This gives a Gaussian width $\sigma_k$ for each up-stream and down-stream Cα.
Then, represent residue i as a product of Gaussians:

$$\psi_i = \phi_i^{-n}(x_{i-n}) \cdots \phi_i^{-1}(x_{i-1}) \, \phi_i^{+1}(x_{i+1}) \cdots \phi_i^{+n}(x_{i+n})$$

where each individual Gaussian function has the form:

$$\phi_i^{k}(x_{i+k}) = N_k \, e^{-\beta_k r_k^2 / 2\sigma_k^2}$$

Calculating a Per-Residue Local Similarity Score
Calculate the local-frame similarity, $K_{ij}^{\mathrm{local}}$, as an overlap integral

$$K_{ij}^{\mathrm{local}} = \int \psi_i \, \psi_j \, dx_{-n} \cdots dx_{+n}$$

With products of Gaussians, this reduces to a simple sum in the exponent:

$$K_{ij}^{\mathrm{local}} = \exp\left( -\sum_{k=-n}^{n} \beta_k R_{i+k,j+k}^2 / 4\sigma_k^2 \right)$$

In identical α-helices, β-strands, and even loops, $K_{ij}^{\mathrm{local}} = 1$.
Nice, but how to match correctly a short α-helix with a longer one?

Detecting Secondary Structure Elements
By sliding a model α-helix and β-strand along a structure, Kpax detects its secondary structure elements (SSEs) automatically (it does not distinguish π or $3_{10}$ helices or detect β-turns). Here are some examples:
[Figure]
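The closed form for the local similarity score is easy to evaluate once both fragments are expressed in their own canonical frames. The sketch below is an illustration under assumptions: the function name k_local is made up, R is taken to be the distance between the k-th neighbouring Cα of residue i and the k-th neighbouring Cα of residue j (each in its own local frame), and the widths and weights (sigma, beta) are placeholder values, not the ones Kpax derives from CATH.

```python
import math


def k_local(frag_i, frag_j, sigma, beta):
    """exp(-sum_k beta_k * R_k^2 / (4 * sigma_k^2)) for two local-frame fragments.

    frag_i, frag_j: lists of (x, y, z) neighbour coordinates, ordered k = -n..+n.
    sigma, beta:    per-offset widths and weights (assumed values in this sketch).
    """
    total = 0.0
    for p, q, s, b in zip(frag_i, frag_j, sigma, beta):
        r2 = sum((a - c) ** 2 for a, c in zip(p, q))   # squared distance R_k^2
        total += b * r2 / (4.0 * s * s)
    return math.exp(-total)


# Invented fragment with two up-stream and two down-stream neighbours.
frag_a = [(-3.0, 1.0, 4.0), (-1.5, 0.5, 3.0), (1.5, -0.5, 3.0), (3.0, -1.0, 4.0)]
# Same fragment with the last neighbour displaced by 1 unit along x.
frag_b = frag_a[:3] + [(4.0, -1.0, 4.0)]
sigma = [1.0, 1.0, 1.0, 1.0]   # placeholder widths
beta = [1.0, 1.0, 1.0, 1.0]    # placeholder weights

same = k_local(frag_a, frag_a, sigma, beta)
moved = k_local(frag_a, frag_b, sigma, beta)
print(same)    # 1.0
print(moved)   # exp(-0.25), i.e. below 1
```

Identical fragments give exactly 1, matching the statement on the slide, and any displacement lowers the score smoothly towards 0.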
