Introduction Continuous Optimization Numerical Results Conclusions
Continuous global optimization for protein structure analysis
P. Bertolazzi (1), C. Guerra (2), F. Lampariello (1), G. Liuzzi (1)
PR PS BB 2011, 13/09/2011
(1) IASI - Consiglio Nazionale delle Ricerche
(2) DEI - Università di Padova
Outline
1. Introduction
2. Continuous Optimization
3. Numerical Results
4. Conclusions
Problem description
Given two patches taken from the surfaces of two proteins, find the isometric transformation (roto-translation) that best overlaps one patch onto the other.
Motivations
- Binding pockets or cavities of similar shape are likely to bind the same ligand.
- Surface alignment helps determine whether a portion of a target protein is similar to the active site of a known (model) protein.
- If so, the target protein is likely to bind the same ligand as the model protein, and thus to have similar functional properties.
[Figure: proteins 1gol and 1csn. Ligand: ATP (adenosine triphosphate)]
Approaches
In computer vision and computer graphics the problem is also known as Surface Registration.
- Geometric Hashing [Lamdan & Wolfson IEEE CV ’88]
  - Preprocessing to build the hash table (time consuming)
  - Recognition based on a voting process
  - Finds the most similar image among a set of reference images
- Iterative Closest Point (ICP) [Besl & McKay IEEE PAMI ’92]
  - No preprocessing needed
  - Fast, but often yields poor alignments
  - Outcome depends on the initial guess
- Shape Contexts [Belongie et al. IEEE PAMI ’02]
  - Preprocessing to build the shape contexts (time consuming)
  - Recognition based on a correlation process
Iterative Closest Point
- Assumes that closest points correspond to each other
- Optimizes the transformation to reduce the overall error
- Good for registering surfaces whose shapes are very similar
- The final result depends heavily on the initial relative position of the surfaces
- Converges to local minima
- Poor results for protein surfaces: ICP is attracted by local minima
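The ICP loop described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it works on 2D points (so that the least-squares rotation has a closed form and no linear-algebra library is needed), uses brute-force nearest-neighbour search, and all function names are hypothetical rather than taken from the talk.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired points src -> dst."""
    n = len(src)
    cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
    cd = (sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n)
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cs[0], py - cs[1]
        bx, by = qx - cd[0], qy - cd[1]
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that sends the rotated source centroid onto the target centroid.
    tx = cd[0] - (c * cs[0] - s * cs[1])
    ty = cd[1] - (s * cs[0] + c * cs[1])
    return theta, tx, ty

def apply_2d(points, theta, tx, ty):
    """Rotate by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def mean_sq_error(points, fixed):
    """Mean squared distance from each point to its nearest point in fixed."""
    return sum(min(math.dist(p, q) for q in fixed) ** 2 for p in points) / len(points)

def icp_2d(moving, fixed, iters=50, tol=1e-12):
    """Alternate closest-point matching and rigid fitting until the error stalls."""
    current = list(moving)
    err = mean_sq_error(current, fixed)
    for _ in range(iters):
        # Correspondence step: each moving point is paired with its closest fixed point.
        matches = [min(fixed, key=lambda q: math.dist(p, q)) for p in current]
        # Fitting step: best rigid motion for the current pairing.
        theta, tx, ty = best_rigid_2d(current, matches)
        current = apply_2d(current, theta, tx, ty)
        new_err = mean_sq_error(current, fixed)
        improvement = err - new_err
        err = new_err
        if improvement < tol:
            break
    return current, err

fixed = [(0, 0), (4, 0), (4, 1), (0, 3), (2, 5)]
near_start = apply_2d(fixed, 0.05, 0.1, -0.1)   # small perturbation of the target
aligned, err = icp_2d(near_start, fixed)
```

Because the pairing is recomputed from the current pose at every step, the error can only decrease, but nothing prevents the loop from settling on a wrong pairing when the starting pose is far from the target. That is the dependence on the initial guess noted in the slide.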
Continuous Global Optimization
The problem can be formulated mathematically as
\[
  \min_{u = (x, y, z, \alpha, \beta, \gamma)} f(u) \qquad \text{(P)}
\]
where f(u) is a so-called distance function.
The global minimizer(s) $u^\star = (x, y, z, \alpha, \beta, \gamma)^\star$ of Problem (P) give the isometric transformation(s) under which the two surfaces best overlap each other.
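The slides do not define f(u) explicitly, so the following is only a sketch of what such a distance function might look like. It assumes that (x, y, z) is a translation, that (α, β, γ) are rotation angles about the x, y and z axes composed as Rz·Ry·Rx, and that the distance is the mean nearest-neighbour distance between the moved patch and the fixed one. All names are hypothetical.

```python
import math

def rotation_matrix(alpha, beta, gamma):
    """R = Rz(gamma) * Ry(beta) * Rx(alpha); the angle convention is an assumption."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]

def transform(points, u):
    """Apply the roto-translation u = (x, y, z, alpha, beta, gamma) to 3D points."""
    x, y, z, alpha, beta, gamma = u
    rot = rotation_matrix(alpha, beta, gamma)
    shift = (x, y, z)
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))
        for p in points
    ]

def distance_function(u, moving, fixed):
    """Hypothetical f(u): mean distance from each moved point to its nearest fixed point."""
    moved = transform(moving, u)
    return sum(min(math.dist(p, q) for q in fixed) for p in moved) / len(moved)

# Build a target patch from a known transformation, then evaluate f at two points.
patch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
u_true = (1.0, -2.0, 0.5, 0.3, -0.4, 1.1)
target = transform(patch, u_true)
f_at_truth = distance_function(u_true, patch, target)      # essentially zero
f_at_identity = distance_function((0.0,) * 6, patch, target)  # strictly larger
```

A function of this kind takes a minimum over nearest neighbours, so it is not differentiable everywhere and typically has many local minima, which is consistent with the derivative-free global method used in the talk.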
Problem Properties
Problem (P) has the following distinguishing properties:
- many local minima are present besides the global ones
- first derivatives of f are unavailable
Hence we use a Derivative-Free Controlled Random Search (DF-CRS) global optimization method.
ACRS
The method maintains a population of candidate solutions throughout the entire process. It consists mainly of two phases:
- an initial global random search phase: generation of the initial population
- an iterative local refinement phase: progressive update of the population
ACRS
Let $\epsilon > 0$ be a given tolerance, $N = 6$, $p = 50N$.
Global phase: randomly generate the set $S_0 = \{u_0^1, \dots, u_0^p\}$. Set $k = 0$.
Do While $\left(f_k^{\max} - f_k^{\min}\right) > \epsilon$
  Local phase: generate a new point and update the set $S_k$
  Set $k = k + 1$
End Do
where
\[
  u_k^{\max} = \arg\max_{u \in S_k} f(u), \quad f_k^{\max} = f(u_k^{\max}), \qquad
  u_k^{\min} = \arg\min_{u \in S_k} f(u), \quad f_k^{\min} = f(u_k^{\min}).
\]
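The slide leaves the local phase unspecified ("generate a new point and update the set"). The sketch below fills that step with the basic reflection rule of Price's classical controlled random search, which is an assumption on our part and not necessarily the rule used in the ACRS variant of the talk. The box bounds, the iteration cap and the function name are likewise hypothetical.

```python
import random

def crs_minimize(f, bounds, eps=1e-6, pop_factor=50, max_iter=50000, seed=0):
    """Basic controlled random search over a box; returns (best point, best value, iterations)."""
    rng = random.Random(seed)
    n = len(bounds)
    p = pop_factor * n  # population size p = 50 N, as in the slide
    # Global phase: uniform random population inside the box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(p)]
    vals = [f(u) for u in pop]
    k = 0
    # Stop when the spread f_max - f_min of the population falls below eps.
    while max(vals) - min(vals) > eps and k < max_iter:
        worst = max(range(p), key=vals.__getitem__)
        # Local phase (assumed rule): reflect one random point through the
        # centroid of n other random points of the population.
        idx = rng.sample(range(p), n + 1)
        centroid = [sum(pop[i][d] for i in idx[:n]) / n for d in range(n)]
        trial = [2.0 * centroid[d] - pop[idx[n]][d] for d in range(n)]
        if all(lo <= t <= hi for t, (lo, hi) in zip(trial, bounds)):
            f_trial = f(trial)
            # Update: the trial point replaces the current worst point if it is better.
            if f_trial < vals[worst]:
                pop[worst], vals[worst] = trial, f_trial
        k += 1
    best = min(range(p), key=vals.__getitem__)
    return pop[best], vals[best], k

# Toy usage on a smooth 2D function with minimum at (1, -2).
best_u, best_f, iterations = crs_minimize(
    lambda u: (u[0] - 1.0) ** 2 + (u[1] + 2.0) ** 2,
    bounds=[(-5.0, 5.0), (-5.0, 5.0)],
)
```

For the alignment problem the same loop would be run with N = 6 variables, with f taken to be the distance function of Problem (P) and with bounds chosen for the translation and rotation ranges.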
Running Time
- Numerical experience shows that ACRS requires, on average, between $O(10^3)$ and $O(10^4)$ iterations to converge.
- It is able to recover a good alignment when the surfaces are indeed similar.
Binding sites alignment
- We perform an all-to-all comparison on a dataset of 100 proteins, each in complex with one of 9 ligands: AMP, ATP, FAD, FMN, GLC, HEME, NAD, PO4, and TES.
- The proteins were carefully selected so that the dataset is non-redundant and the binding sites are not evolutionarily related.
- Use the atoms near the binding site (within 7 Å of the ligand).
- Report
\[
  q_{ij} = \frac{2 \cdot \text{num. aligned atoms}}{\text{num. } P_1 + \text{num. } P_2},
\]
which lies between 0 and 1.
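As a small worked example of the score q_ij, the sketch below counts aligned atoms and evaluates the ratio. The slides do not say how aligned atoms are counted, so the greedy one-to-one matching and the 1 Å cutoff used here are purely illustrative assumptions, as are the function names.

```python
import math

def count_aligned(p1, p2, cutoff=1.0):
    """Greedy one-to-one matching: an atom of p1 is aligned if an unused atom
    of p2 lies within `cutoff` (hypothetical criterion and threshold)."""
    used = set()
    count = 0
    for a in p1:
        best, best_d = None, cutoff
        for j, b in enumerate(p2):
            d = math.dist(a, b)
            if j not in used and d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            count += 1
    return count

def alignment_score(n_aligned, n1, n2):
    """q = 2 * (number of aligned atoms) / (atoms in P1 + atoms in P2)."""
    return 2.0 * n_aligned / (n1 + n2)

p1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
p2 = [(0.0, 0.0, 0.1), (1.0, 0.2, 0.0)]
n_aligned = count_aligned(p1, p2)                   # 2 atoms are matched
q = alignment_score(n_aligned, len(p1), len(p2))    # 2 * 2 / (3 + 2) = 0.8
```

With a one-to-one matching the number of aligned atoms cannot exceed the size of the smaller patch, which is why q stays between 0 and 1 and reaches 1 only when the two patches have the same number of atoms and every atom is aligned.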
Binding sites alignment
[Figure: all-to-all similarity matrix; both axes are grouped by ligand class (AMP, ATP, FAD, FMN, GLC, HEM, NAD, PO4, TES). Red corresponds to a high number of aligned atoms (good similarity).]
- The red areas lie mostly around the main diagonal: proteins of the same class are correctly classified.
- Proteins belonging to the PO4 group are similar to each other and well separated from the other groups.
- To some extent, similar considerations also apply to HEM and FAD.