Lhasa trusted community KNIME nodes Data processing and metabolism prediction Dr Samuel Webb samuel.webb@lhasalimited.org
Who am I? • Working within the Research Group at Lhasa Limited • Activities include: • Software tool development • Data mining • Algorithm development • Managing Lhasa’s internal KNIME nodes and build • Managing Lhasa’s open source KNIME contribution
What is KNIME? “Our KNIME Analytics Platform is the leading open solution for data-driven innovation, designed for discovering the potential hidden in data, mining for fresh insights, or predicting new futures. Organizations can take their collaboration, productivity and performance to the next level with a robust range of commercial extensions to our open source platform.” – www.knime.org/about • Analytics platform • Core software open source • Software development kit (SDK) makes it easy to develop your own nodes
KNIME and cheminformatics • Large number of downloads for the community plugins • Large number of community developers • Some examples of node types: • Chemical engines: ChemAxon, RDKit, CDK and Indigo • General purpose and algorithms: Vernalis, Enalos and Lhasa • Data searches: CIR and EMBL-EBI
What does Lhasa use KNIME for? • Data processing: • Combining datasets: find overlap, compare activities when overlap exists, join in data where no overlap exists… • Monitoring: • Extracting data from the database which has been altered, identifying review work content • (Q)SAR • Model building, clustering, algorithm development, applicability domains, chemical space investigation…
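The dataset-combining step above can be sketched outside KNIME in a few lines of Python. This is only an illustration of the idea (find overlap, compare activities, join in the rest), not how the Lhasa workflows are built. The function name `combine_datasets`, the SMILES keys and the activity labels are all hypothetical.

```python
def combine_datasets(a, b):
    """Combine two {structure_id: activity} mappings.

    Returns (overlap, conflicts, merged):
      overlap   - ids present in both datasets
      conflicts - overlapping ids whose activities disagree
      merged    - union of both datasets; on conflict, dataset `a` wins
    """
    overlap = sorted(a.keys() & b.keys())
    conflicts = {k: (a[k], b[k]) for k in overlap if a[k] != b[k]}
    merged = {**b, **a}  # entries from `a` overwrite entries from `b`
    return overlap, conflicts, merged


# Hypothetical data keyed by SMILES (assumption: keys are already canonical)
in_house = {"CCO": "inactive", "c1ccccc1": "active"}
public = {"c1ccccc1": "inactive", "CC(=O)O": "inactive"}

overlap, conflicts, merged = combine_datasets(in_house, public)
```

Letting one dataset win on conflict is a design choice made for this sketch; in practice the conflicting rows are usually the ones worth reviewing by hand.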
Free, open source plugins released LHASA CONTRIBUTION TO KNIME
What have we released?
General nodes
• Data manipulation
  • Discretise
• Model scoring
  • Binary Scorer
  • Binned performance
  • Result
• Table manipulation
  • Dumb Joiner (to be deprecated)
  • Row Splitter (col+)
  • Table to HTML
Metabolism nodes
• SMARTCyp 2.4.2
  • Cytochrome P450 site of metabolism predictor
  • Integration of Patrik Rydberg’s open source tool
• WhichCyp 1.2
  • Prediction of binding to Cytochrome P450 isoform(s)
  • Integration of Patrik Rydberg’s open source tool
Disclaimer • These nodes / plugins are not Lhasa Limited products • Help / support for these nodes is provided via: • The KNIME forum: https://tech.knime.org/forum/lhasa-nodes • knime@lhasalimited.org (preferable to use the KNIME forum)
More information • https://tech.knime.org/lhasa-nodes-for-knime
Why would you use these nodes? • Here we calculate the performance of the Random Forest with Morgan and MACCS fingerprints • Filter out rows where either model predicts active • Convert the performance table to HTML and email
Generic nodes: model performance • Similar functionality to the Scorer node • Calculates various performance metrics for binary classification models • Can choose multiple prediction columns
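As a rough illustration of scoring several prediction columns against one activity column, here is a minimal Python sketch. It does not reproduce the node's implementation or its exact list of metrics. The column names (`activity`, `rf_morgan`, `rf_maccs`), the class labels and the chosen metrics are assumptions made for the example.

```python
def binary_metrics(actual, predicted, positive="active"):
    """Confusion-matrix counts and a few derived metrics for one prediction column."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Avoid ZeroDivisionError when a class is absent
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def score_columns(rows, actual_col, prediction_cols, positive="active"):
    """Score every selected prediction column against the same actual column."""
    actual = [row[actual_col] for row in rows]
    return {
        col: binary_metrics(actual, [row[col] for row in rows], positive)
        for col in prediction_cols
    }


# Hypothetical table: one row per compound, two model prediction columns
rows = [
    {"activity": "active", "rf_morgan": "active", "rf_maccs": "active"},
    {"activity": "active", "rf_morgan": "inactive", "rf_maccs": "active"},
    {"activity": "inactive", "rf_morgan": "inactive", "rf_maccs": "active"},
    {"activity": "inactive", "rf_morgan": "inactive", "rf_maccs": "inactive"},
]
performance = score_columns(rows, "activity", ["rf_morgan", "rf_maccs"])
```

The result holds one metrics dictionary per prediction column, which maps naturally onto a performance table with one row per model.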
Generic nodes: table to HTML • Convert a table to a single HTML cell • The String renderer will render HTML tags • Select which columns to include • StringValue, IntValue, DoubleValue • Creates a single cell output
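The idea of collapsing a table into one HTML string can be sketched in Python as below. This is an illustration under assumptions, not the node's code: the function name `table_to_html`, the row-as-dictionary layout and the bare `<table>` markup are all choices made for the example.

```python
from html import escape


def table_to_html(rows, columns):
    """Render the selected columns of a list of row dictionaries as one HTML string."""
    header = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[c]))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


# Hypothetical performance table; the "note" column is deliberately left out
rows = [{"model": "RF <Morgan>", "accuracy": 0.75, "note": "x"}]
html_cell = table_to_html(rows, ["model", "accuracy"])
```

Cell values are escaped so that characters such as `<` in the data are shown as text rather than interpreted as markup. The single string result corresponds to the "single cell output" described on the slide.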
SMARTCyp 2.4.2 • SMARTCyp is a method for predicting which sites in a molecule are most liable to metabolism by Cytochrome P450. • It has been shown to be applicable to metabolism by the isoforms 1A2, 2A6, 2B6, 2C8, 2C19, 2E1 and 3A4, and specific models for the isoforms 2C9 and 2D6 are included in SMARTCyp 2.4.2 • SMARTCyp is developed by the Department of Drug Design and Pharmacology at the University of Copenhagen and is funded by Lhasa Limited. More details can be found at: http://www.farma.ku.dk/smartcyp/about.php
SMARTCyp 2.4.2 usage • Let’s recreate the results table from “SMARTCyp: A 2D Method for Prediction of Cytochrome P450-Mediated Drug Metabolism” by Patrik Rydberg, David E. Gloriam, Jed Zaretzki, Curt Breneman and Lars Olsen • http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4055970/ • Metabolic position = any site listed as primary, secondary or tertiary • Use the top 3 predicted sites: accuracy increases as you increase the rank limit • Considering only the top ranked site gives 65% accuracy in identifying an experimentally seen site of metabolism (SOM), vs 81% using the top 3 sites
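The top-k accuracy measure described above (a molecule counts as correct if any of its k best ranked atoms is an experimentally observed SOM) can be sketched as follows. The function name `top_k_accuracy`, the molecule identifiers and the atom indices are hypothetical and are not taken from the paper's dataset.

```python
def top_k_accuracy(predictions, observed, k):
    """Fraction of molecules with at least one observed SOM among the top-k ranked atoms.

    predictions: {molecule_id: [atom indices, best rank first]}
    observed:    {molecule_id: set of experimentally observed SOM atom indices}
    """
    hits = sum(
        1
        for mol_id, ranked in predictions.items()
        if set(ranked[:k]) & observed[mol_id]
    )
    return hits / len(predictions)


# Hypothetical ranked predictions and observed sites for three molecules
predictions = {"mol1": [5, 2, 9], "mol2": [1, 4, 7], "mol3": [3, 8, 6]}
observed = {"mol1": {5}, "mol2": {7}, "mol3": {10}}

top1 = top_k_accuracy(predictions, observed, k=1)
top3 = top_k_accuracy(predictions, observed, k=3)
```

Because the top-3 set always contains the top-1 atom, accuracy can only stay the same or rise as the rank limit increases, which matches the trend quoted on the slide.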
SMARTCyp 2.4.2 usage • Here we’ve incorporated multiple chemical engines from the same platform • RDKit: rendering • CDK: rendering and SMARTCyp processing
WhichCyp • Predicts binding to CYP isoforms 1A2, 2C9, 2C19, 2D6 and 3A4 • Further reading: • Michal Rostkowski, Ola Spjuth and Patrik Rydberg. WhichCyp: Prediction of Cytochromes P450 Inhibition, Bioinformatics, 2013, 29, 2051-2052
WhichCyp usage • Renders images of the predictions as a PNG • May be updated to SVG in the future • Input: a structure column that is compatible with a CDK Value such as: • Mol • SDF • Smiles • CDK • Outputs the values you would get in the CSV file when running manually: • Binding, Missing Signatures and sensitivity warnings
WHERE CAN I GET THEM?
Getting our nodes: • Download KNIME: https://www.knime.org/downloads/overview • Select the “+ all free extensions” download and Lhasa’s nodes will be included
Getting our nodes: • Alternatively they can be added to an existing KNIME • Trusted Community Contributions - http://update.knime.org/community-contributions/trusted/3.1
Thank you Support: https://tech.knime.org/forum