algorithms in bioinformatics

Algorithms in Bioinformatics: Molecular Distance Geometry - PowerPoint PPT Presentation

  1. Algorithms in Bioinformatics (AlgBioInfo): Molecular Distance Geometry. Antonio Mucherino, www.antoniomucherino.it, IRISA, University of Rennes 1, Rennes, France. Last update: October 5th, 2016. Outline: Introduction; Proteins; Methods for protein determination; Distance Geometry; the MDGP; the Simulated Annealing; Discrete Distance Geometry; the DDGP; the BP algorithm; Vertex orders; Making order; Consecutivity; de Bruijn; Optimization; Ending; Challenge; More research.

  2. Proteins. Proteins are biochemical molecules consisting of one or more polypeptides, typically folded into a globular or fibrous form, which perform a certain biological function. They are chains of smaller molecules called amino acids. Their three-dimensional conformations can give clues about their biological function. Google finds about 315,000,000 documents containing the word “protein”. Wikipedia: http://en.wikipedia.org/wiki/Protein YouTube: http://www.youtube.com/watch?v=Q7dxi4ob2O4

  3. The Protein Data Bank (PDB). It is a database containing three-dimensional conformations of proteins. The database is expanding rapidly: the snapshot shown on this slide was taken a few years ago, and the total number of conformations in the database has since passed the 110,000 threshold. http://www.rcsb.org/pdb/

  4. Identifying protein conformations. How to identify the three-dimensional conformation of a protein?
     - Experimental methods: X-ray crystallography, Nuclear Magnetic Resonance (NMR), ...
     - Computational methods: homology modeling, ab-initio approaches, ...
     This is a non-exhaustive list.

  6. X-ray crystallography. X-ray crystallography is an experimental method for determining the arrangement of atoms within a crystal. Crystals of proteins are generated in order to discover their conformation. The crystal must have a certain size in order to be used. The process of generating the crystal can be very difficult and expensive. Wikipedia: http://en.wikipedia.org/wiki/X-ray_crystallography YouTube: http://www.youtube.com/watch?v=j4HgLf_eJoc

  7. NMR. Nuclear Magnetic Resonance (NMR) studies the behavior of the magnetic moments of nuclei with spin. The protein sample is subjected to an intense external magnetic field, which induces the alignment of the magnetic moments of the nuclei. The analysis of this phenomenon makes it possible to estimate the distance between pairs of nuclei (i.e., between pairs of atoms). NMR does not directly provide information about the coordinates of the atoms. Wikipedia: http://en.wikipedia.org/wiki/Nuclear_magnetic_resonance_spectroscopy YouTube: http://www.youtube.com/watch?v=IGk3NAziVWs

  8. Identifying protein conformations. How to identify the three-dimensional conformation of a protein?
     - Experimental methods: X-ray crystallography, Nuclear Magnetic Resonance (NMR), ...
     - Computational methods: homology modeling, ab-initio approaches, ...
     We will study in detail the problem of identifying protein conformations from the data obtained through NMR experiments.

  10. The Molecular Distance Geometry Problem (MDGP)

  11. The Molecular Distance Geometry Problem. Let $G = (V, E, d)$ be a simple weighted undirected graph, where
      - $V$ is the set of vertices of $G$: it is the set of atoms;
      - $E$ is the set of edges of $G$: it is the set of known distances;
      - $E' \subset E$ is the subset of $E$ where distances are exact;
      - $d$ gives the weights associated with the edges of $G$: the numerical value of each weight corresponds to the known distance; it can be an interval $[\underline{d}(u,v), \bar{d}(u,v)]$.

      Definition. The DGP is the problem of finding an embedding $x : V \to \mathbb{R}^K$ such that:

      $$\forall (u,v) \in E' \qquad \|x_u - x_v\| = d(u,v),$$
      $$\forall (u,v) \in E \setminus E' \qquad \underline{d}(u,v) \le \|x_u - x_v\| \le \bar{d}(u,v).$$

      Equality constraints represent (hyper)spheres; inequality constraints represent (hyper)spherical shells. The MDGP is NP-hard.
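
The definition on slide 11 can be read as a simple feasibility test: given candidate coordinates, every edge must meet its exact distance or fall inside its interval. The sketch below is only an illustration of that reading, not code from the presentation; the names `is_valid_embedding`, `bounds`, and `tol`, the numerical tolerance, and the three-atom toy instance are all assumptions made for the example. An exact distance is encoded as an interval whose lower and upper bounds coincide.

```python
import math

def is_valid_embedding(x, bounds, tol=1e-9):
    """Return True if the embedding x satisfies every distance constraint.

    x      : dict mapping each vertex to a coordinate tuple in R^K
    bounds : dict mapping each edge (u, v) to (lower, upper);
             lower == upper encodes an exact distance (an edge of E')
    tol    : numerical tolerance (an assumption of this sketch)
    """
    for (u, v), (lower, upper) in bounds.items():
        dist = math.dist(x[u], x[v])  # Euclidean norm ||x_u - x_v||
        if dist < lower - tol or dist > upper + tol:
            return False
    return True

# Hypothetical toy instance in R^3 with three "atoms".
coords = {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (1.0, 1.0, 0.0)}
bounds = {
    ("a", "b"): (1.0, 1.0),  # exact distance
    ("b", "c"): (1.0, 1.0),  # exact distance
    ("a", "c"): (1.2, 1.6),  # interval distance
}

print(is_valid_embedding(coords, bounds))
```

Checking a given embedding is cheap; the hard part of the problem is finding one.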

  12. MDGP instances. Where to find the necessary information about the distances? When working with molecules, a set of distances can be derived from their chemical structure; additional distances can be obtained by experimental techniques, such as NMR.
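
One common way in which chemical structure yields distances, assumed here because the slide's figure is not available, is through bond lengths and bond angles: the bond length is itself the distance between two bonded atoms, and for three atoms bonded in a chain the law of cosines gives the distance between the first and the third. The function name `distance_1_3` and the numerical values below are illustrative assumptions, not data from the presentation.

```python
import math

def distance_1_3(d12, d23, angle):
    """Distance between atoms 1 and 3 of a bonded chain 1-2-3.

    d12, d23 : bond lengths (same length unit)
    angle    : bond angle at atom 2, in radians
    Uses the law of cosines on the triangle formed by the three atoms.
    """
    return math.sqrt(d12 ** 2 + d23 ** 2 - 2.0 * d12 * d23 * math.cos(angle))

# Illustrative values only: two bonds of length 1.5 and a 109.5 degree angle.
d13 = distance_1_3(1.5, 1.5, math.radians(109.5))
print(round(d13, 3))
```

Distances obtained this way would be exact, whereas distances estimated by NMR are typically represented by intervals.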

  13. Global optimization. By definition, the MDGP is a constraint satisfaction problem. However, it is generally reformulated as a global optimization problem, where the objective is to minimize a penalty function capable of measuring the violation of the constraints:

      $$\frac{1}{|E|} \sum_{(u,v) \in E} \left( \frac{\max\left(\underline{d}(u,v) - \|x_u - x_v\|,\, 0\right)}{\underline{d}(u,v)} + \frac{\max\left(\|x_u - x_v\| - \bar{d}(u,v),\, 0\right)}{\bar{d}(u,v)} \right)$$

      When all distance constraints are satisfied, the value of the penalty function at the solution is zero.
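
The penalty function of slide 13 averages, over all edges, the relative violation of the lower bound plus the relative violation of the upper bound. The following sketch evaluates it directly; it is an illustration under stated assumptions, not the author's implementation. The name `penalty`, the dictionary-based data layout, and the toy instance are hypothetical, and the code assumes that every lower bound is strictly positive so that the divisions are defined.

```python
import math

def penalty(x, bounds):
    """Mean relative violation of the distance bounds.

    x      : dict mapping each vertex to a coordinate tuple in R^K
    bounds : dict mapping each edge (u, v) to (lower, upper), with
             lower > 0 (assumption); lower == upper for exact distances
    """
    total = 0.0
    for (u, v), (lower, upper) in bounds.items():
        dist = math.dist(x[u], x[v])
        total += max(lower - dist, 0.0) / lower  # pair too close
        total += max(dist - upper, 0.0) / upper  # pair too far apart
    return total / len(bounds)

# Hypothetical toy instance: one exact distance and one interval distance.
bounds = {("a", "b"): (1.0, 1.0), ("a", "c"): (1.2, 1.6)}
feasible = {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (1.0, 1.0, 0.0)}
stretched = {"a": (0.0, 0.0, 0.0), "b": (2.0, 0.0, 0.0), "c": (1.0, 1.0, 0.0)}

print(penalty(feasible, bounds))   # all constraints satisfied
print(penalty(stretched, bounds))  # the edge (a, b) is twice too long
```

A value of zero certifies that the embedding satisfies every constraint, so any global optimization method can use this function as its objective.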

  14. The penalty function. The penalty function of the optimization problem is strongly non-smooth. The search space is, a priori, continuous, and optimization methods risk getting stuck at local minima whose objective value is very close to the optimal one. Function graphic from: C. Lavor, A. Mucherino, L. Liberti, N. Maculan, On the Computation of Protein Backbones by using Artificial Backbones of Hydrogens, Journal of Global Optimization 50 (2), 329–344, 2011.
