Center for Bioinformatics Tübingen
Incorporating Molecular Flexibility into Three-Dimensional Structural Kernels
Andreas Jahn
4th German Conference on Chemoinformatics, 10.11.2008, Goslar
Computer Science Department • Computer Architecture • Prof. Zell
Introduction & Motivation
“… enzyme and substrate must fit each other like a lock and key.” (Emil Fischer, 1894)
“Form follows function.” (Louis H. Sullivan, 1896)
→ Activity is a function of the 3D structure.
• The 3D structure is not unique because the compounds are flexible.
• What are the possible 3D structures?
• Possible solution: conformational sampling
• But: time-consuming and non-deterministic
→ Try to encode the flexibility and the possible shape into the data structure.
Basics: Optimal Assignment Kernel
Source: QSAR Comb. Sci., 2006, 25, 4, 317-323
• Atom-based similarity measure
• An RBF kernel calculates the local atom similarity using atom and bond descriptors.
• The similarity incorporates the local neighbourhood:
  $w(i, j) = K_{RBF}(i, j) + \gamma_1 K_{RBF}(i_1, j_1) + \gamma_1 K_{RBF}(i_2, j_1) + \gamma_2 K_{RBF}(i_3, j_2) + \dots$
• The atom-wise similarity acts as the weight of an edge in a complete bipartite graph.
• Choose the edges that maximize the sum of the edge weights.
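Written out, the last two bullets describe a maximum-weight matching between the atoms of two molecules. The following is a sketch only: the symbols $x$, $y$ (the two molecules), $x_i$, $y_j$ (their atoms) and $\pi$ (an injective map from atoms of $x$ to atoms of $y$) are notation assumed here and do not appear on the slide.

```latex
k(x, y) = \max_{\pi} \sum_{i=1}^{|x|} w\bigl(x_i, y_{\pi(i)}\bigr), \qquad |x| \le |y|
```

If $x$ has more atoms than $y$, the roles of the two molecules are swapped.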
Basics: Two problems of the Optimal Assignment Kernel
Source: J.-P. Vert, Technical Report HAL-00218278, 2008
• The Optimal Assignment Kernel is not a valid kernel function.
  → Fix the kernel matrix with $K \leftarrow K - \lambda_{\min} I$.
• No consideration of the flexibility and the shape of the structures.
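Why such a shift repairs the matrix can be seen from its eigenvalues. This is a short sketch, assuming that $K$ is symmetric, that $v_i$ and $\lambda_i$ denote its eigenvectors and eigenvalues, and that $\lambda_{\min}$ is the smallest (negative) eigenvalue; this notation is not on the slide.

```latex
(K - \lambda_{\min} I)\, v_i = (\lambda_i - \lambda_{\min})\, v_i,
\qquad \lambda_i - \lambda_{\min} \ge 0 \quad \text{for all } i
```

All eigenvalues of the shifted matrix are therefore non-negative, so it is positive semidefinite. When $\lambda_{\min}$ is already non-negative, the matrix is positive semidefinite to begin with.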
Methods - Overview
Two different methods were implemented:
• OAK FLEX
  • Encode the neighbourhood flexibility space relative to an atom.
  • Determine the similarity of the flexibility spaces.
  • Incorporate the similarity of the flexibility spaces into the Optimal Assignment Kernel.
• Rigid Superposition
  • Identify rigid scaffolds of the structures.
  • Superpose the rigid fragments and determine a similarity score.
  • Integrate the similarity score into the Optimal Assignment Kernel.
Rigid Superposition
• A rule-based expert system identifies rigid scaffolds of the structures.
• Calculate all pairwise similarity values.
• Calculate the optimal assignment of the fragments.

Similarity of fragments     Molecule b
                            F #1      F #2
Molecule a     F #1         23.662     8.676
               F #2          6.599    19.262

Assigned pairs: Fragment #1 (a) with Fragment #1 (b), 23.662; Fragment #2 (a) with Fragment #2 (b), 19.262
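With two fragments per molecule there are only two possible assignments, so the optimum can be checked by hand from the similarity values in the table:

```latex
\begin{aligned}
\text{F \#1} \to \text{F \#1},\ \text{F \#2} \to \text{F \#2}: &\quad 23.662 + 19.262 = 42.924 \\
\text{F \#1} \to \text{F \#2},\ \text{F \#2} \to \text{F \#1}: &\quad 8.676 + 6.599 = 15.275
\end{aligned}
```

The first assignment has the larger sum, which matches the pairs shown on the slide.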
Rigid Superposition
• Superpose the assigned fragments.
• Calculate a similarity score based on the overlap volume.
• Integrate the information into the Optimal Assignment Kernel.
OAK FLEX: Encode the neighbourhood flexibility space relative to an atom
• The flexibility space results from rotatable bonds.
  → All single bonds outside of a ring generate flexibility spaces.
• The flexibility space of the whole molecule is important.
  → For each atom, the relative flexibility space has to be enumerated.
• Flexibility spaces have to be comparable.
  → A unique parameterisation of the space is necessary.
OAK FLEX: Flexibility space and the unique parameterisation
• The core atom acts as the origin.
• The flexibility is parameterised relative to the core atom.
• A 1st-order rotation is parameterised by d1 and r1.
[Figure: core atom, neighbour n1, rotatable bond, neighbour n2]
OAK FLEX: Enumeration of the 1st-order rotations
• Depth-limited search with a depth limit of 2.
• Prune subtrees after a rigid bond.
[Figure: search example with nodes numbered 1 to 11]
OAK FLEX: Extension to the 2nd-order rotation
• Unique parameterisation by M1, M2, r2 and h.
• An additional flag is necessary for case differentiation.
[Figure: core atom and parameter h]
OAK FLEX: Different cases of the 2nd-order rotation
• Both cases are special cases of the 1st-order rotation.
  → Only two parameters and the flag are necessary.
[Figure: the two cases, with core atom and atom n1 labelled]
OAK FLEX: Enumeration of the 2nd-order rotations
• Depth-limited search with a depth limit of 3.
• Prune subtrees after 2 rigid bonds.
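The two enumerations (depth 2 with pruning after one rigid bond, depth 3 with pruning after two) can be sketched as one depth-limited search. This is a minimal illustration and not the implementation behind the slides: the `Bond` record, `build_adjacency` and `enumerate_paths` are hypothetical names, and "pruning" is read here as "stop expanding a node once the path to it has crossed `max_rigid` rigid bonds". The rotatable-bond rule (single bond outside of a ring) is the one stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    a: int          # index of the first atom
    b: int          # index of the second atom
    order: int      # 1 = single, 2 = double, ...
    in_ring: bool   # True if the bond is part of a ring


def is_rotatable(bond):
    # Rule from the slides: single bonds outside of a ring are rotatable.
    return bond.order == 1 and not bond.in_ring


def build_adjacency(bonds):
    # Map each atom index to its (neighbour, bond) pairs.
    adj = {}
    for bond in bonds:
        adj.setdefault(bond.a, []).append((bond.b, bond))
        adj.setdefault(bond.b, []).append((bond.a, bond))
    return adj


def enumerate_paths(adj, core, max_depth, max_rigid):
    """Return all atom paths with exactly max_depth bonds that start at core.

    Assumption: a node is not expanded any further once the path leading
    to it has crossed max_rigid rigid bonds.
    """
    results = []

    def visit(path, rigid_count):
        if len(path) - 1 == max_depth:
            results.append(tuple(path))
            return
        if rigid_count >= max_rigid:
            return  # prune this subtree
        for neighbour, bond in adj.get(path[-1], []):
            if neighbour in path:
                continue  # never walk back along the path
            path.append(neighbour)
            visit(path, rigid_count + (0 if is_rotatable(bond) else 1))
            path.pop()

    visit([core], 0)
    return results


# Toy molecule (made up): chain 0-1-2-3 of single bonds, plus a double bond 1=4.
bonds = [Bond(0, 1, 1, False), Bond(1, 2, 1, False),
         Bond(2, 3, 1, False), Bond(1, 4, 2, False)]
adj = build_adjacency(bonds)

first_order_from_0 = enumerate_paths(adj, core=0, max_depth=2, max_rigid=1)
first_order_from_4 = enumerate_paths(adj, core=4, max_depth=2, max_rigid=1)
second_order_from_4 = enumerate_paths(adj, core=4, max_depth=3, max_rigid=2)
```

In the toy molecule, atom 4 has no 1st-order rotation because its only bond is rigid, but the depth-3 search still reaches atom 3 through the rotatable bond between atoms 1 and 2.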
OAK FLEX: Similarity calculation of two flexibility spaces
• RBF kernel based on the parameters.
• An individual σ adjusts the weight of each parameter:
  $\text{Similarity} = \exp\left(-\left(\frac{(d - d')^2}{2\sigma_d} + \frac{(r - r')^2}{2\sigma_r}\right)\right)$
  where (d, r) and (d', r') are the parameters of the two flexibility spaces, each taken relative to its own core atom.
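As a small illustration, the similarity of two 1st-order flexibility spaces could be computed as follows. The function name, the tuple layout `(d, r)` and the default σ values are assumptions made for this sketch; the denominators follow the slide, with one σ per parameter.

```python
from math import exp


def flexibility_similarity(space_a, space_b, sigma_d=1.0, sigma_r=1.0):
    """RBF similarity of two flexibility spaces given as (d, r) tuples.

    sigma_d and sigma_r weight the two parameters individually; their
    default values are placeholders, not values from the talk.
    """
    d_a, r_a = space_a
    d_b, r_b = space_b
    return exp(-((d_a - d_b) ** 2 / (2 * sigma_d)
                 + (r_a - r_b) ** 2 / (2 * sigma_r)))


# Made-up parameter values for illustration only.
same = flexibility_similarity((1.5, 0.9), (1.5, 0.9))
near = flexibility_similarity((1.5, 0.9), (1.6, 1.0))
far = flexibility_similarity((1.5, 0.9), (3.0, 2.5))
```

Identical spaces give a similarity of 1, and the value decays towards 0 as the parameters move apart. A larger σ for one parameter makes the similarity less sensitive to differences in that parameter.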
OAK FLEX: Comparison of the flexibility spaces of two core atoms
• Each atom has a list of flexibility spaces.
• But: only one similarity value is needed.
  → Calculate the similarity matrix and use the optimal assignment.

Similarity    #1            #2            #3
#1            RBF(#1,#1)    RBF(#1,#2)    RBF(#1,#3)
#2            RBF(#2,#1)    RBF(#2,#2)    RBF(#2,#3)

• Normalize the similarity value: $k(a, b) \leftarrow \frac{k(a, b)}{\sqrt{k(a, a)\, k(b, b)}}$
[Figure: atom a and atom b with their numbered flexibility spaces]
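The comparison of two atoms can be sketched end to end: build the RBF similarity matrix, pick the best assignment, then normalize. This sketch makes several assumptions: flexibility spaces are `(d, r)` tuples, both σ values are 1, and the optimal assignment is found by brute force over permutations, which is only practical for the handful of spaces a single atom has and merely stands in for the Hungarian method named in the talk. All function names are hypothetical.

```python
from itertools import permutations
from math import exp, sqrt


def rbf(p, q, sigma_d=1.0, sigma_r=1.0):
    # RBF similarity of two (d, r) parameter tuples.
    return exp(-((p[0] - q[0]) ** 2 / (2 * sigma_d)
                 + (p[1] - q[1]) ** 2 / (2 * sigma_r)))


def assignment_score(spaces_a, spaces_b):
    """Maximum summed similarity over all one-to-one assignments.

    Every space of the shorter list is matched to a distinct space of
    the longer list (brute force, for illustration only).
    """
    small, large = sorted((spaces_a, spaces_b), key=len)
    best = 0.0
    for chosen in permutations(range(len(large)), len(small)):
        total = sum(rbf(small[i], large[j]) for i, j in enumerate(chosen))
        best = max(best, total)
    return best


def normalized_similarity(spaces_a, spaces_b):
    # k(a, b) / sqrt(k(a, a) * k(b, b)); 0 if either atom has no spaces.
    k_aa = assignment_score(spaces_a, spaces_a)
    k_bb = assignment_score(spaces_b, spaces_b)
    if k_aa == 0.0 or k_bb == 0.0:
        return 0.0
    return assignment_score(spaces_a, spaces_b) / sqrt(k_aa * k_bb)


# Made-up flexibility spaces: atom a has two, atom b has three.
atom_a = [(1.5, 0.9), (2.4, 1.2)]
atom_b = [(1.4, 1.0), (2.5, 1.1), (0.3, 0.2)]
similarity_ab = normalized_similarity(atom_a, atom_b)
```

Because each RBF value is at most 1, the self-similarity of an atom equals its number of flexibility spaces, and the normalized value stays between 0 and 1.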
OAK FLEX: Overview of the calculation steps
• 1st-order rotations of atom A and atom B → RBF kernel → matrix of the 1st-order rotations → Hungarian method → 1st-order rotation similarity matrix
• 2nd-order rotations of atom A and atom B → RBF kernel → matrix of the 2nd-order rotations → Hungarian method → 2nd-order rotation similarity matrix
• OAK → local atom similarity matrix
• The three matrices feed into the Flex-Matrix → Hungarian method → normalisation → kernel value
Results
• Methods evaluated on 8 QSAR datasets compiled by Sutherland et al. (Source: J. Med. Chem., 2004, 47, 22, 5541-5554).
• ε-SVR is used to build the models.
• Seeded 10-fold multirun
• Equal folds for both methods
  → Comparison of the methods is possible.
• 100 multiruns generate 1000 MSE values.
• Each value is considered as a sample of a Gaussian distribution.
  → A paired Wilcoxon signed-rank test determines significant shifts of the mean.
• Hypotheses for the test:
  $H_0: \mu_{OAK} = \mu_{OAK\,FLEX}$
  $H_1: \mu_{OAK} > \mu_{OAK\,FLEX}$
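The one-sided paired test can be sketched in plain Python. This is an illustration under stated assumptions, not the evaluation code of the talk: it uses the normal approximation of the Wilcoxon signed-rank statistic without continuity or tie correction, drops zero differences, and runs on made-up MSE values. The function name `wilcoxon_greater` is hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_greater(x, y):
    """One-sided paired Wilcoxon signed-rank test (H1: x tends to exceed y).

    Returns an approximate p-value from the normal approximation.
    Zero differences are dropped; tied absolute differences share
    their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    std = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / std
    return 0.5 * erfc(z / sqrt(2))  # P(Z >= z)


# Made-up paired MSE values, one pair per fold; not data from the talk.
mse_oak = [0.80 + 0.01 * i for i in range(20)]
mse_oak_flex = [v - 0.02 - 0.001 * i for i, v in enumerate(mse_oak)]
p_value = wilcoxon_greater(mse_oak, mse_oak_flex)
```

A small p-value supports $H_1$, that is, a lower MSE for OAK FLEX. Pairing is only meaningful because both methods are evaluated on identical folds, as the slide points out.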
Results
             OAK                            OAK FLEX
Dataset      MSE            Q²              MSE            Q²              p-value
ACE α        1.48 ± 0.61    0.73 ± 0.13     1.52 ± 0.63    0.71 ± 0.13     0.98
AchE β       0.86 ± 0.36    0.48 ± 0.21     0.80 ± 0.30    0.52 ± 0.19     0.02
BZR γ        0.67 ± 0.30    0.48 ± 0.19     0.58 ± 0.25    0.54 ± 0.17     < 0.001
COX2 δ       1.02 ± 0.31    0.51 ± 0.13     0.97 ± 0.27    0.53 ± 0.12     0.001
DHFR ε       0.64 ± 0.19    0.71 ± 0.08     0.60 ± 0.17    0.73 ± 0.08     0.001
GPB ζ        0.55 ± 0.33    0.59 ± 0.25     0.53 ± 0.37    0.60 ± 0.29     0.089
THER η       1.64 ± 0.96    0.64 ± 0.21     1.56 ± 1.00    0.66 ± 0.22     0.149
THR θ        0.47 ± 0.26    0.57 ± 0.25     0.42 ± 0.24    0.59 ± 0.24     0.022

α Angiotensin Converting Enzyme, β Acetylcholinesterase, γ Benzodiazepine Receptor, δ Cyclooxygenase II, ε Dihydrofolate Reductase, ζ Glycogen Phosphorylase B, η Thermolysin, θ Thrombin
Computation time: Comparison of the average runtime
• Overhead between 16% and 70%.
• The overhead correlates with the flexibility of the molecules.

Dataset            ACE     AchE    BZR     COX2    DHFR    GPB     THERM   THR
Ø OAK (ms)         5.3     6.7     4.8     6.2     6.0     4.7     7.2     11.7
Ø OAK FLEX (ms)    7.9     8.5     5.6     8.5     8.2     6.9     12.3    18.0
Factor             1.49    1.26    1.16    1.37    1.36    1.46    1.70    1.53
Ring atoms         35%     71%     77%     66%     58%     36%     24%     51%
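The stated correlation can be checked against the table itself. The sketch below takes the share of ring atoms as an inverse proxy for flexibility (an assumption: fewer ring atoms means more rotatable bonds) and computes the Pearson correlation with the runtime factor. The `pearson` helper is written out by hand so that the snippet needs no third-party packages.

```python
from math import sqrt

# Values copied from the runtime table, in dataset order:
# ACE, AchE, BZR, COX2, DHFR, GPB, THERM, THR
ring_atoms_percent = [35, 71, 77, 66, 58, 36, 24, 51]
runtime_factor = [1.49, 1.26, 1.16, 1.37, 1.36, 1.46, 1.70, 1.53]


def pearson(xs, ys):
    # Pearson correlation coefficient of two equally long sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(ring_atoms_percent, runtime_factor)
print(f"Pearson r between ring-atom share and runtime factor: {r:.2f}")
```

The coefficient comes out strongly negative: datasets with a smaller share of ring atoms, and hence more flexible molecules, show the larger overhead.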
Discussion
• Interpretation of kernel models is difficult.
• But: visualization of the mappings discloses differences.
[Figure: mappings for OAK and OAK FLEX]
Conclusion
• The method incorporates molecular flexibility into similarity calculations.
• Significant performance gain on 5 of 8 QSAR datasets.
• This type of flexibility encoding is not suitable for all datasets (ACE: quality of the model decreased).
• Publication: Fechner, N.; Jahn, A.; Hinselmann, G.; Zell, A. Journal of Chemical Information and Modeling, in revision.
Acknowledgement
I thank Nikolas Fechner, Georg Hinselmann and Andreas Zell.
Thank you for your attention
OAK FLEX: Performance tuning
• Hungarian method: $O((a + b)^3)$
• Performance problem due to the high number of calculations.
  → Implementation of a greedy heuristic to reduce the computational cost.

              1st-order rotations         2nd-order rotations
              Heuristic    Hungarian      Heuristic    Hungarian
Ø sum         2.079        2.091          2.709        2.18
Difference    0.561%                      0.008%
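The slide does not say which greedy rule was used. One common choice, assumed here, is to pick the largest remaining matrix entry whose row and column are still free. The sketch compares that rule with the exact optimum, which is found by brute force as a stand-in for the Hungarian method on tiny inputs. Both function names are hypothetical.

```python
from itertools import permutations


def greedy_assignment(matrix):
    """Greedy heuristic: repeatedly take the largest entry whose row
    and column are both still unused. Returns the summed similarity."""
    entries = sorted(((value, i, j)
                      for i, row in enumerate(matrix)
                      for j, value in enumerate(row)), reverse=True)
    used_rows, used_cols, total = set(), set(), 0.0
    for value, i, j in entries:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            total += value
    return total


def optimal_assignment(matrix):
    """Exact optimum by brute force (assumes a non-empty matrix with
    no more rows than columns); only feasible for small matrices."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return max(sum(matrix[i][j] for i, j in enumerate(cols))
               for cols in permutations(range(n_cols), n_rows))


# Fragment similarities from the Rigid Superposition slide.
fragments = [[23.662, 8.676], [6.599, 19.262]]
# Made-up matrix on which the greedy rule is not optimal.
tricky = [[10.0, 9.0], [9.0, 1.0]]

greedy_fragments = greedy_assignment(fragments)
optimal_fragments = optimal_assignment(fragments)
greedy_tricky = greedy_assignment(tricky)
optimal_tricky = optimal_assignment(tricky)
```

On the fragment matrix both approaches select the same pairs. On the second matrix the greedy rule commits to the entry 10 and ends with a sum of 11, while the optimum is 18, which shows that the heuristic trades exactness for speed.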