
RDKit (cheminformatics) Neo4j Integration
Presenter: Evgeny Sorokin
Mentors: Christian Pilger (BASF), Greg Landrum (RDKit), Stefan Armbruster (Neo4j)

Motivation
● Neo4j = useful tool to map knowledge
● Chemical/pharmaceutical R&D:
  ○ Required: mapping data of completely different nature (recipe, process, application test, chemical structures)
  ○ Knowledge graphs are frequently a good choice here over other data models
  ○ Problem: Neo4j does not support chemical structures
● RDKit
  ○ is a widely used Open Source tool to deal with chemical structures
  ○ has proven its value in conjunction with Postgres
● Idea: enrich Neo4j's capabilities by combining it with RDKit => GSoC project

Chemical structure representation
● Not intended: "dissolve" atoms and bonds as nodes and relations into the graph!
● Intended: use available structure representations as node properties
  ○ SMILES format: c1ccccc1 (single-line ASCII representation --> exact search via string matching)
  ○ MOL format: (3D coordinates: richer format, more details --> advantages in sub-structure searches)
Example node properties: name: benzene, formula: C6H6, SMILES: c1ccccc1

Chemical structures example
name: benzene
formula: C6H6
SMILES: c1ccccc1

Requirements
● Basic functionality:
  ○ Exact chemical search ("find the molecule benzene")
  ○ Chemical substructure search ("find all molecules that contain a benzene moiety")
● Typical application scenarios in graph context:
  ○ Find entry points into the graph
  ○ Filter paths during graph traversal with chemical structure conditions

How was it implemented - storage in a graph
● A new node with labels :Chemical:Structure is processed by the RDKit event handler
● From either the smiles or the mdlmol property, a list of 7-8 properties is created per node:
  ○ canonical_smiles
  ○ inchi
  ○ formula
  ○ molecular_weight
  ○ fp - bit-vector fingerprint
  ○ fp_ones - count of positive bits
  ○ mdlmol
  ○ smiles [optional]
● A full-text index is created for the fingerprint property

How was it implemented - exact search
Simple case: compare two canonical SMILES with each other, find a match.
SMILES: O=S(=O)(Cc1ccccc1)CS(=O)(=O)Cc1ccccc1
Canonical SMILES: O=S(=O)(CC1=CC=CC=C1)CS(=O)(=O)CC1=CC=CC=C1

How was it implemented - SSS (substructure search)
● A chemical fingerprint is a characteristic pattern that encodes which structural features are present in a molecule.
● Two kinds: bitvector and count-based fingerprints
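The difference between the two fingerprint kinds can be illustrated with a minimal pure-Python sketch. It assumes structural features have already been reduced to integer ids; the ids, the function name `fold_features`, and the 16-position length are hypothetical and chosen only for illustration, and this is not how RDKit computes its fingerprints internally.

```python
from collections import Counter


def fold_features(feature_ids, n_bits=16):
    """Fold hypothetical integer feature ids into n_bits positions.

    Returns (bit_fp, count_fp):
      bit_fp   - 1 if at least one feature landed on the position, else 0
      count_fp - how many features landed on the position
    """
    counts = Counter(fid % n_bits for fid in feature_ids)
    count_fp = [counts.get(i, 0) for i in range(n_bits)]
    bit_fp = [1 if c > 0 else 0 for c in count_fp]
    return bit_fp, count_fp


# Hypothetical feature ids: 3 occurs twice, and 19 folds onto the
# same position as 3 because 19 % 16 == 3.
features = [3, 3, 19, 7, 12]
bit_fp, count_fp = fold_features(features)
```

The bitvector only records that position 3 is occupied, while the count-based vector also records how many features landed there.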

How was it implemented - SSS
1. Each of the structures is encoded as a bitvector fingerprint
2. Bitvectors are transformed into a string of positive indexes
3. A full-text index is applied to the transformed bitvectors (numbers -> words)
4. Search is done with constraints regarding specific properties.
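The steps above can be sketched in plain Python, without Neo4j or RDKit. This is an illustration under stated assumptions, not the plugin's actual code: the helper names are hypothetical, indexes are 1-based to match the quiz example at the end of the deck, the set comparison stands in for the full-text index lookup, and the `fp_ones` check is an assumed reading of "constraints regarding specific properties". A fingerprint screen of this kind only yields candidates, which would normally still be confirmed by a real substructure match.

```python
def bits_to_words(bits, base=1):
    """Step 2: turn a bit vector into a space-separated string of set-bit indexes."""
    return " ".join(str(i + base) for i, b in enumerate(bits) if b)


def make_node(name, bits):
    """Build a hypothetical node record with the fp and fp_ones properties."""
    return {"name": name, "fp": bits_to_words(bits), "fp_ones": sum(bits)}


def could_contain(candidate_words, query_words):
    """Every index word of the query must also appear in the candidate."""
    return set(query_words.split()) <= set(candidate_words.split())


def screen(nodes, query_bits):
    """Steps 3-4: return names of nodes that pass the fingerprint screen."""
    query_words = bits_to_words(query_bits)
    query_ones = sum(query_bits)
    hits = []
    for node in nodes:
        # Assumed property constraint: a superstructure cannot have fewer set bits.
        if node["fp_ones"] < query_ones:
            continue
        if could_contain(node["fp"], query_words):
            hits.append(node["name"])
    return hits


# Hypothetical 9-bit fingerprints, far shorter than real ones.
nodes = [
    make_node("mol_a", [1, 0, 1, 0, 1, 1, 0, 0, 1]),
    make_node("mol_b", [1, 0, 0, 0, 1, 0, 0, 0, 0]),
]
query = [1, 0, 1, 0, 0, 0, 0, 0, 0]
candidates = screen(nodes, query)
```

Here `mol_a` is stored with `fp` equal to "1 3 5 6 9", and only it passes the screen for the query "1 3".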

How was it implemented - SSS

Chemical reactions’ relationships

What are possible applications 2.) expand path apoc.path.expand

Resources
● https://github.com/rdkit/neo4j-rdkit
● https://www.rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
● https://www.rdkit.org/
● https://neo4j.com/docs/cypher-manual/current/schema/index/
● http://tiny.cc/mol_block_definition
● @evgerher via telegram

Hunger games Q&A
1) Hard: what format can resolve a situation when a chemical structure has a chirality property (ex.: lactic acid)?
   a) SMILES
   b) MOL block
   c) All of the above
2) Medium: what is the difference between bitvector and count-based fingerprints?
   a) Harder to store
   b) Does not support similarity search
   c) Does not keep track of occurrence counts
3) Easy: transformation of bitvector [1 0 1 0 1 1 0 0 1] is:
   a) "1 3 5 6 9"
   b) "2 4 6 7 9"
   c) "1 3 5 6 9"
