Lhasa trusted community KNIME nodes: Data processing and metabolism prediction


  1. Lhasa trusted community KNIME nodes: Data processing and metabolism prediction • Dr Samuel Webb • samuel.webb@lhasalimited.org

  2. Who am I? • Working within the Research Group at Lhasa Limited • Activities include: • Software tool development • Data mining • Algorithm development • Managing Lhasa’s internal KNIME nodes and build • Managing Lhasa’s open source KNIME contribution

  3. What is KNIME? “Our KNIME Analytics Platform is the leading open solution for data-driven innovation, designed for discovering the potential hidden in data, mining for fresh insights, or predicting new futures. Organizations can take their collaboration, productivity and performance to the next level with a robust range of commercial extensions to our open source platform.” – www.knime.org/about • Analytics platform • Core software open source • Software development kit (SDK) makes it easy to develop your own nodes

  4. KNIME and cheminformatics • Large number of downloads for the community plugins • Large number of community developers • Some examples of node types: chemical engines (ChemAxon, RDKit, CDK and Indigo); general purpose and algorithms (Vernalis, Enalos and Lhasa); data searches (CIR and EMBL-EBI)

  5. What does Lhasa use KNIME for? • Data processing: combining datasets (finding overlap, comparing activities where overlap exists, joining in data where no overlap exists) • Monitoring: extracting data that has been altered in the database, identifying content for review work • (Q)SAR: model building, clustering, algorithm development, applicability domains, chemical space investigation
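The dataset-combining step on this slide can be sketched outside KNIME in a few lines. This is only an illustration of the idea, not how the Lhasa workflows are built: the function name, the use of SMILES strings as compound keys, and the example activities are all assumptions made for the sketch (in practice the structures would need to be standardised before they could be compared as keys).

```python
def combine_datasets(left, right):
    """Combine two {compound_id: activity} mappings.

    Returns four dicts: compounds where both sources agree, compounds
    where they disagree (with both activities), and the compounds found
    in only one of the two sources.
    """
    overlap = left.keys() & right.keys()
    agreeing = {k: left[k] for k in overlap if left[k] == right[k]}
    conflicting = {k: (left[k], right[k]) for k in overlap if left[k] != right[k]}
    only_left = {k: v for k, v in left.items() if k not in overlap}
    only_right = {k: v for k, v in right.items() if k not in overlap}
    return agreeing, conflicting, only_left, only_right


# Hypothetical example data, keyed by SMILES for illustration only.
in_house = {"CCO": "inactive", "c1ccccc1": "active", "CC(=O)O": "inactive"}
literature = {"c1ccccc1": "active", "CC(=O)O": "active", "CCN": "inactive"}

agreeing, conflicting, only_in_house, only_literature = combine_datasets(
    in_house, literature
)
```

The `conflicting` result is the part that needs expert attention; the two "only" results are the rows that can simply be joined in.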

  6. LHASA CONTRIBUTION TO KNIME • Free, open source plugins released

  7. What have we released? General nodes: • Data manipulation: Discretise • Model scoring: Binary Scorer, Binned performance, Result • Table manipulation: Dumb Joiner (to be deprecated), Row Splitter (col+), Table to HTML. Metabolism nodes: • SMARTCyp 2.4.2: Cytochrome P450 site of metabolism predictor; integration of Patrik Rydberg’s open source tool • WhichCyp 1.2: prediction of binding to Cytochrome P450 isoform(s); integration of Patrik Rydberg’s open source tool

  8. Disclaimer • These nodes / plugins are not Lhasa Limited products • Help / support for these nodes is provided via: • The KNIME forum: https://tech.knime.org/forum/lhasa-nodes • knime@lhasalimited.org (preferable to use the KNIME forum)

  9. More information • https://tech.knime.org/lhasa-nodes-for-knime

  10. Why would you use these nodes? (workflow annotations) • Here we calculate the performance of the Random Forest with Morgan and MACCS fingerprints • Filter out rows where either model predicts active • Convert the performance table to HTML and email

  11. Why would you use these nodes? (workflow annotations) • Here we calculate the performance of the Random Forest with Morgan and MACCS fingerprints • Filter out rows where either model predicts active • Convert the performance table to HTML and email

  12. Generic nodes: model performance • Similar functionality to the Scorer node • Calculates various performance metrics for binary classification models • Can choose multiple prediction columns
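To make the idea of scoring several prediction columns at once concrete, here is a minimal sketch in plain Python. The slides do not list which metrics the node reports, so the metrics below are standard choices rather than a description of the node's output; the function name, column names and data are hypothetical.

```python
def binary_performance(actual, predicted, positive="active"):
    """Confusion-matrix counts and common metrics for one prediction column."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)

    def ratio(numerator, denominator):
        # Avoid ZeroDivisionError when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(pairs)),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical table: one column of known activities, two prediction columns.
table = {
    "Activity":    ["active", "active", "inactive", "inactive", "active", "inactive"],
    "RF (Morgan)": ["active", "inactive", "inactive", "inactive", "active", "active"],
    "RF (MACCS)":  ["active", "active", "inactive", "active", "active", "inactive"],
}

# Score every selected prediction column against the same activity column.
results = {
    column: binary_performance(table["Activity"], table[column])
    for column in ("RF (Morgan)", "RF (MACCS)")
}
```

Each entry of `results` corresponds to one row of a performance table, one row per prediction column.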

  13. Generic nodes: table to HTML • Converts a table to a single HTML cell • The String renderer will render the HTML tags • Select which columns to include • Supported column types: StringValue, IntValue, DoubleValue • Creates a single-cell output
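The conversion itself is simple to picture: every selected column becomes a `<th>` or `<td>` and the whole table collapses into one string. The sketch below shows that idea in plain Python; it is not the node's implementation, and the function name and example values are hypothetical.

```python
from html import escape


def table_to_html(columns, rows):
    """Render a table as a single HTML string (one 'cell' of output)."""
    header = "".join(f"<th>{escape(str(name))}</th>" for name in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(value))}</td>" for value in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


# Hypothetical performance table with string and numeric values.
html_cell = table_to_html(
    ["Model", "Sensitivity", "Specificity"],
    [["RF (Morgan)", 0.67, 0.67], ["RF <MACCS>", 1.0, 0.67]],
)
```

Escaping the cell values keeps characters such as `<` in the data from being interpreted as markup when the string is rendered or emailed.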

  14. SMARTCyp 2.4.2 • SMARTCyp is a method for predicting which sites in a molecule are most liable to metabolism by Cytochrome P450. • It has been shown to be applicable to metabolism by the isoforms 1A2, 2A6, 2B6, 2C8, 2C19, 2E1 and 3A4, and specific models for the isoforms 2C9 and 2D6 are included in SMARTCyp 2.4.2 • SMARTCyp is developed by the Department of Drug Design and Pharmacology at the University of Copenhagen and is funded by Lhasa Limited. More details can be found at: http://www.farma.ku.dk/smartcyp/about.php

  15. SMARTCyp 2.4.2 usage • Let’s recreate the results table from "SMARTCyp: A 2D Method for Prediction of Cytochrome P450-Mediated Drug Metabolism" by Patrik Rydberg, David E. Gloriam, Jed Zaretzki, Curt Breneman and Lars Olsen (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4055970/) • Metabolic position = any site listed as primary, secondary or tertiary • Use the top 3 predicted sites; accuracy increases as you increase the rank limit • Considering only the top-ranked site gives 65% accuracy in identifying an experimentally observed site of metabolism (SOM), versus 81% when using the top 3 sites
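The "top k" accuracy quoted on this slide counts a molecule as correct if any of its k best-ranked predicted sites is an experimentally observed site of metabolism. A minimal sketch of that calculation follows; the function name, molecule identifiers and atom indices are hypothetical and are not taken from the paper's dataset.

```python
def top_k_accuracy(ranked_predictions, experimental_sites, k):
    """Fraction of molecules with at least one observed SOM in the top k.

    ranked_predictions: {mol_id: [atom indices, best-ranked first]}
    experimental_sites: {mol_id: set of observed atom indices}
    """
    hits = 0
    for mol_id, ranked in ranked_predictions.items():
        # A hit is any overlap between the top-k atoms and the observed sites.
        if set(ranked[:k]) & experimental_sites[mol_id]:
            hits += 1
    return hits / len(ranked_predictions)


# Hypothetical predictions (atom indices ranked by predicted lability).
ranked = {
    "mol_a": [3, 7, 1],
    "mol_b": [5, 2, 9],
    "mol_c": [4, 8, 6],
    "mol_d": [1, 2, 3],
}
# Hypothetical experimentally observed sites of metabolism.
observed = {"mol_a": {3}, "mol_b": {9, 11}, "mol_c": {2}, "mol_d": {2}}

top1 = top_k_accuracy(ranked, observed, k=1)
top3 = top_k_accuracy(ranked, observed, k=3)
```

Because the top-k set can only grow with k, the accuracy never decreases as the rank limit is raised, which is the trend the slide describes.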

  16. SMARTCyp 2.4.2 usage

  17. SMARTCyp 2.4.2 usage

  18. SMARTCyp 2.4.2 usage

  19. SMARTCyp 2.4.2 usage

  20. SMARTCyp 2.4.2 usage • Here we’ve incorporated multiple chemical engines from the same platform • RDKit: rendering • CDK: rendering and SMARTCyp processing

  21. WhichCyp • Predicts binding to Cyp isoforms: 1A2, 2C9, 2C19, 2D6 and 3A4. • Further reading: Michal Rostkowski, Ola Spjuth and Patrik Rydberg. WhichCyp: Prediction of Cytochromes P450 Inhibition, Bioinformatics, 2013, 29, 2051-2052

  22. WhichCyp usage • Renders images of the predictions as a PNG (may be updated to SVG in the future) • Input: a structure column that is compatible with a CDK Value, such as Mol, SDF, Smiles or CDK • Outputs the values you would get in the CSV file when running manually: binding, missing signatures and sensitivity warnings

  23. WHERE CAN I GET THEM?

  24. Getting our nodes: • Download KNIME: https://www.knime.org/downloads/overview • Select the "+ all free extensions" download and Lhasa’s nodes will be included

  25. Getting our nodes: • Alternatively, they can be added to an existing KNIME installation from the Trusted Community Contributions update site: http://update.knime.org/community-contributions/trusted/3.1

  26. Thank you • Support: https://tech.knime.org/forum
