Donovan N. Chin & R. Aldrin Denny
Traditional Drug Discovery (insert graph) In Silico Prediction of ADME (insert graph) ◦ Potency ◦ Absorption ◦ Lead ◦ Drug ◦ Toxicity ◦ Excretion ◦ Metabolism ◦ Distribution
Target IVY (Brute-force virtual screening of very large compound libraries) Lead Discovery IVY (Utilize predictive models from Biogen data for more efficient virtual screening) Lead Optimization Candidate
(insert graph) ◦ Potency ◦ Lead ◦ Drug ◦ Toxicity ◦ Excretion ◦ Metabolism ◦ Distribution ◦ Absorption
Goal: Identify the crystallographic binding mode and rank-order ligands with respect to binding to the protein (insert graph) Receptor Docking Ligand Shape Generate plausible trial binding modes using a docking function, then re-rank the modes with a scoring function
(insert graph) 341 Active 47 Non-Active
(insert graph) After filtering by Pharmacophore Feature
(insert graph)
(insert functions for) ◦ F_Score* ◦ D_Score ◦ G_Score ◦ PMF_Score ◦ Chem_Score ◦ ICM_Score*
Cell Adhesion Assay (50% Serum) ◦ (insert graph) Biochemical Adhesion Assay ◦ (insert graph) Scoring Functions Are Poor More Often Than Not
Receptor Site View Library Design FlexX Score Consensus Score >= 3 e.g. Contact Map, CLogP, MW, HBOND, Rotatable bonds Consensus = 5? If yes, substructure exists? If yes, Pharmacophore < 4.2 Å? If yes, Publish Hit Report
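The gates in this flowchart can be sketched as a simple filter cascade. This is only an illustrative reading of the slide: the `DockedCompound` fields, the function names, and the toy compounds are hypothetical, and the "no" branches (not shown on the slide) are treated here simply as "not reported".

```python
from dataclasses import dataclass


@dataclass
class DockedCompound:
    # All field names are hypothetical, chosen to mirror the slide's gates.
    name: str
    consensus: int             # consensus score, assumed to range over 0..5
    has_substructure: bool     # required substructure present?
    pharmacophore_dist: float  # pharmacophore distance in angstroms


def passes_first_filter(c):
    """First gate on the slide: consensus score >= 3."""
    return c.consensus >= 3


def is_reportable_hit(c):
    """Final gates: consensus = 5, substructure exists, pharmacophore < 4.2 A."""
    return c.consensus == 5 and c.has_substructure and c.pharmacophore_dist < 4.2


# Toy data, invented for illustration only.
compounds = [
    DockedCompound("cmpd_a", 5, True, 3.8),
    DockedCompound("cmpd_b", 5, True, 4.9),
    DockedCompound("cmpd_c", 4, True, 3.5),
    DockedCompound("cmpd_d", 2, False, 6.0),
]

shortlist = [c.name for c in compounds if passes_first_filter(c)]
hits = [c.name for c in compounds if is_reportable_hit(c)]
print("shortlist:", shortlist)
print("hit report:", hits)
```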
(insert graph)
Goal: Predict hit/miss class based on presence of features (fingerprints)
Method
◦ Given a set of N samples, of which some subset A are good ("active"), we estimate for a new compound: $P(\text{good}) \approx A/N$
◦ Given a set of binary features: for a given feature F that appears in N samples, A of them good, can we estimate $P(\text{good} \mid F) \approx A/N$? (Problem: the error gets worse as N becomes small)
◦ Corrected estimate: $P'(\text{good} \mid F) = \dfrac{A + P(\text{good})\,k}{N + k}$, so that $P'(\text{good} \mid F) \to P(\text{good})$ as $N \to 0$ and $P'(\text{good} \mid F) \to A/N$ as N becomes large
◦ (If $k = 1/P(\text{good})$ this is the Laplacian correction)
Descriptors (insert)
Advantages
◦ Can describe a huge number of features (up to 4 billion; MDL 1024; Leadscope 27,000)
◦ Contains tertiary and stereochemistry information
◦ Fast
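The corrected estimate in the Method bullets can be checked numerically with a few lines of Python. This is a minimal sketch, not the software used in the study: the function name and the toy counts are hypothetical, and `k` defaults to 1/P(good) as the slide describes for the Laplacian correction.

```python
def corrected_estimate(a, n, p_good, k=None):
    """P'(good | F) = (A + P(good) * k) / (N + k).

    a      -- number of good samples containing feature F
    n      -- number of samples containing feature F
    p_good -- baseline P(good) over the whole data set
    k      -- shrinkage weight; defaults to 1 / P(good) (Laplacian correction)
    """
    if k is None:
        k = 1.0 / p_good
    return (a + p_good * k) / (n + k)


# Toy numbers, invented for illustration: 10% of all samples are good.
p_good = 0.1
unseen = corrected_estimate(0, 0, p_good)      # feature never observed
rare = corrected_estimate(1, 1, p_good)        # seen once, in a good sample
common = corrected_estimate(80, 100, p_good)   # seen often, mostly good

print(f"unseen feature: {unseen:.3f}")
print(f"rare feature:   {rare:.3f}")
print(f"common feature: {common:.3f}")
```

With no observations the estimate falls back to the baseline P(good); with a single observation the naive A/N would be 1.0, while the corrected value stays close to the baseline; with many observations it approaches A/N.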
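Classification Analysis ◦ Developing Non-Linear Scoring Functions to classify actives and non-actives ◦ (insert graphs) ◦ Cost Function to Minimize: Gini impurity at node N, $i(N) = 1 - \sum_j P^2(\omega_j)$, where $P(\omega_j)$ is the fraction of samples at the node that belong to class $\omega_j$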
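As an illustration of the cost function, the Gini impurity of a node and the size-weighted impurity of a candidate split can be computed as follows. This is a generic sketch of the standard definition, not the CART software used for the slides; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum_j P(class_j)^2 for the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Toy node with two actives and two non-actives.
node = ["active", "active", "inactive", "inactive"]
print(gini_impurity(node))                       # mixed node
print(split_impurity(node[:2], node[2:]))        # split that separates the classes
print(split_impurity(node[::2], node[1::2]))     # split that separates nothing
```

A tree-growing procedure of this kind picks, at each node, the split with the lowest weighted impurity.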
Training Set Prediction Success (insert table) 10-fold cross validation Randomly split training and test sets Significant Improvement in Separating Actives from Non-Actives
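For readers unfamiliar with the procedure, 10-fold cross validation with a random split can be sketched in plain Python. The helper name, the seed, and the sample count of 100 are hypothetical; the slides do not say how the folds were actually generated.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Indices are shuffled once, dealt into k folds, and each fold
    serves as the test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(100, k=10))
for train, test in splits[:2]:
    print(len(train), "training samples,", len(test), "test samples")
```

A model would be fitted on each `train` list and evaluated on the matching `test` list, and the ten results averaged.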
(insert graph) Significant Improvement in Finding Hits Using New SF
Optimal tree identified (insert graph) No random effects (insert graph)
(insert cluster) Able to identify different molecular property criteria that lead to hits
(insert graph)
(insert graph) Size = magnitude of OBA; OBA values cover the range of descriptor space
(insert graph) Choose 1D & 2D descriptors for ease of interpretation and lower “noise”
Build Model (insert graphs) Apply Model
Features found in high OBA Features found in low OBA It would be useful if CART offered a similar view
◦ Improved scoring functions for separating hits from non-hits in structure-based drug design developed with CART and Bayesian models ◦ Identified key differences in molecular physical properties that led to hits ◦ Built reasonably predictive OBA model (cannot expect method to extend to other systems given complexity of OBA, however)
Biogen IDEC Modeling ◦ Rajiah Denny ◦ Claudio Chuaqui ◦ Juswinder Singh ◦ Herman van Vlijmen ◦ Norman Wang ◦ Anuj Patel ◦ Zhan Deng Chemistry ◦ Kevin Guckian ◦ Dan Scott ◦ Thomas Durand-Reville ◦ Pat Conlon ◦ Charlie Hammond ◦ Chuck Jewell Pharmacology ◦ Tonika Bonhert