Towards Implementing Semantic Literature-Based Discovery with a Graph Database
Dimitar Hristovski 1, Andrej Kastrin 2, Dejan Dinevski 3, Thomas C. Rindflesch 4
1 Faculty of Medicine, Ljubljana, Slovenia; 2 Faculty of Information Studies, Novo mesto, Slovenia; 3 Faculty of Medicine, Maribor, Slovenia; 4 National Library of Medicine, Bethesda, USA
E-mail: dimitar.hristovski@gmail.com

Text Mining
• Information extraction: Extract structured information from unstructured documents.
• Document summarization: Reduce documents to a summary of their most important parts.
• Question answering: Automatically answer questions posed by humans.
• Literature-based discovery

Literature-based Discovery (LBD)
• Methodology for generating hypotheses by uncovering implicit relationships in existing knowledge

Swanson’s LBD
• Raynaud’s disease is associated with high blood viscosity
• Fish oil has been shown to lead to a reduction in blood viscosity
• Hypothesis: fish oil may therefore benefit patients with Raynaud’s disease
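The reasoning on this slide can be sketched as a search over a small in-memory graph. The following is a minimal illustration in Python, not the authors' implementation: the two edges come from the bullets above, and the function names `build_neighbors` and `open_discovery` are hypothetical.

```python
from collections import defaultdict

# Toy undirected concept graph holding only the two facts from the slide.
EDGES = [
    ("Fish oil", "Blood viscosity"),
    ("Blood viscosity", "Raynaud's disease"),
]


def build_neighbors(edges):
    """Map every concept to the set of concepts it is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def open_discovery(neighbors, start):
    """Find concepts reachable in two steps from `start` but not directly linked.

    Returns {target concept: sorted list of intermediate concepts}.
    """
    direct = neighbors.get(start, set())
    hypotheses = defaultdict(set)
    for middle in direct:
        for target in neighbors.get(middle, set()):
            if target != start and target not in direct:
                hypotheses[target].add(middle)
    return {target: sorted(mids) for target, mids in hypotheses.items()}


if __name__ == "__main__":
    graph = build_neighbors(EDGES)
    for target, mids in open_discovery(graph, "Fish oil").items():
        print(f"Fish oil -> {mids} -> {target}")
```

Once a direct link between the two end concepts is already known, the pair is no longer reported, which is what makes the remaining pairs candidate hypotheses.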

Representing Biomedical Knowledge as a Concept Graph
• Nodes: biomedical concepts
• Edges and/or arcs: relations between the concepts
• Concept relations:
  – Co-occurrences
  – Semantic relations

From Documents to Concept Graph
[Pipeline diagram: MEDLINE citations → SemRep → semantic relations (SemMedDB) → CSV export → aggregation & preparation → load to graph database (Neo4j) → Cypher queries for LBD]

Extracting Semantic Relations with SemRep
• SemRep is a natural language processing system that extracts semantic propositions from the biomedical research literature
• Example: From “dexamethasone is a potent inducer of multidrug resistance-associated protein expression in rat hepatocytes” SemRep extracts:
  – Dexamethasone STIMULATES Multidrug Resistance-Associated Proteins
  – Multidrug Resistance-Associated Proteins PART_OF Rats
  – Hepatocytes PART_OF Rats
• SemMedDB: a MySQL database of semantic relations extracted from MEDLINE

Neo4j
• A native graph database
• Supports the property graph data model
• Has a declarative query language, Cypher, which uses ASCII art to represent graph patterns
From: http://dx.doi.org/10.1186/1742-4682-4-50

Export from SemMedDB
• 52 616 158 semantic relation instances exported
• CSV format

Aggregation and Loading with LOAD CSV
LOAD CSV FROM 'semmed_sub_rel_obj.txt' AS line
WITH line
MERGE (c1:Concept {cui: line[0]})
  ON CREATE SET c1.name=line[1], c1.type=line[2], c1.freq=1
  ON MATCH SET c1.freq = c1.freq + 1
MERGE (c2:Concept {cui: line[4]})
  ON CREATE SET c2.name=line[5], c2.type=line[6], c2.freq=1
  ON MATCH SET c2.freq = c2.freq + 1
MERGE (c1)-[r:Relation {type:line[3]}]->(c2)
  ON CREATE SET r.freq = 1
  ON MATCH SET r.freq = r.freq + 1;
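The aggregation that the MERGE clauses perform can be mirrored outside the database. The sketch below, in Python, assumes the column order implied by the query above (subject CUI, name, semantic type, relation, object CUI, name, semantic type). The CUIs and semantic-type codes in the sample rows are placeholders, not real identifiers, and `aggregate` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Placeholder rows; the CUIs and type codes are made up for illustration.
SAMPLE = """C0000001,Dexamethasone,phsu,STIMULATES,C0000002,Multidrug Resistance-Associated Proteins,gngm
C0000001,Dexamethasone,phsu,STIMULATES,C0000002,Multidrug Resistance-Associated Proteins,gngm
C0000003,Hepatocytes,cell,PART_OF,C0000004,Rats,mamm
"""


def aggregate(rows):
    """Collapse relation instances into unique concepts and relations with counts."""
    concepts = {}          # cui -> {"name", "type", "freq"}
    relations = Counter()  # (subject cui, relation type, object cui) -> freq
    for row in rows:
        if len(row) < 7:   # skip blank or malformed lines
            continue
        s_cui, s_name, s_type, rel, o_cui, o_name, o_type = row[:7]
        for cui, name, sem_type in ((s_cui, s_name, s_type), (o_cui, o_name, o_type)):
            # Same effect as MERGE: create with freq 1, otherwise increment.
            node = concepts.setdefault(cui, {"name": name, "type": sem_type, "freq": 0})
            node["freq"] += 1
        relations[(s_cui, rel, o_cui)] += 1
    return concepts, relations


if __name__ == "__main__":
    concepts, relations = aggregate(csv.reader(io.StringIO(SAMPLE)))
    print(len(concepts), "concepts,", len(relations), "aggregated relations")
```

Aggregating before loading means each unique concept and relation is written once, rather than being matched and updated once per instance.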

Aggregation and Loading with Import Tool
• Aggregation with AWK scripts
• Preparation of import files with AWK scripts and shell utilities (e.g. join, sort, ...)
• Stand-alone batch import tool jexp (https://github.com/jexp/batch-import)
• Import worked very fast

Results – Graph Database Size
• 269 047 nodes (unique concepts)
• 14 150 952 relationships between the nodes (aggregated from 52 616 158 relation instances)
• 58 relationship types (e.g. TREATS, CAUSES, ...)
• 132 node labels used for semantic types

Implementing LBD with Cypher
• Most general LBD
• Finding novel treatments
• Generic “inhibit the cause of the disease” discovery pattern
• More specific version of “inhibit the cause of the disease”

Most General LBD
MATCH (x:Concept)--(y:Concept)--(z:Concept)
WHERE NOT (x)--(z)
RETURN x, y, z;

General Query for Finding Novel Treatments
MATCH (drug:Concept:phsu)-[r1]->(y)-[r2]->(disease:Concept:dsyn)
WHERE NOT (drug)-[:TREATS]->(disease)
RETURN drug, disease, count(y) AS y_count
ORDER BY y_count DESC;
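To make the ranking step concrete, here is a rough Python equivalent of this query over a toy directed graph. All node names and edges are invented for illustration, `novel_treatments` is a hypothetical function name, and the semantic-type codes `phsu` and `dsyn` are taken from the query.

```python
from collections import defaultdict

# Invented toy data: node -> semantic type, plus directed typed edges.
NODE_TYPES = {
    "drug_a": "phsu", "drug_b": "phsu",
    "gene_1": "gngm", "gene_2": "gngm",
    "disease_x": "dsyn",
}
EDGES = [
    ("drug_a", "INHIBITS", "gene_1"),
    ("drug_a", "INHIBITS", "gene_2"),
    ("drug_b", "INHIBITS", "gene_1"),
    ("gene_1", "CAUSES", "disease_x"),
    ("gene_2", "CAUSES", "disease_x"),
    ("drug_b", "TREATS", "disease_x"),
]


def novel_treatments(node_types, edges):
    """Rank (drug, disease) pairs with no TREATS edge by number of two-step paths."""
    outgoing = defaultdict(list)  # source -> [target, ...], any relation type
    treats = set()                # known (drug, disease) treatments
    for source, rel, target in edges:
        outgoing[source].append(target)
        if rel == "TREATS":
            treats.add((source, target))

    counts = defaultdict(int)
    for drug, sem_type in node_types.items():
        if sem_type != "phsu":
            continue
        for middle in outgoing.get(drug, []):
            for disease in outgoing.get(middle, []):
                if node_types.get(disease) == "dsyn" and (drug, disease) not in treats:
                    counts[(drug, disease)] += 1

    # Highest path count first, then alphabetical for a stable order.
    rows = [(drug, disease, n) for (drug, disease), n in counts.items()]
    return sorted(rows, key=lambda r: (-r[2], r[0], r[1]))


if __name__ == "__main__":
    for drug, disease, y_count in novel_treatments(NODE_TYPES, EDGES):
        print(drug, disease, y_count)
```

Like `count(y)` in the query, the sketch counts connecting paths, so an intermediate concept linked by several relation types is counted once per path.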

“Inhibit the Cause of the Disease” Discovery Pattern
MATCH (drug:phsu)-[:INHIBITS]->(gene:gngm)-[:CAUSES]->(disease:dsyn)
WHERE NOT (drug)-[:TREATS]->(disease)
RETURN drug, gene, disease;
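Unlike the general query, this pattern fixes both relation types, so each result carries its own explanation (the gene). A small Python sketch of the same matching logic follows. The data is invented and `inhibit_the_cause` is a hypothetical name, so treat it as an illustration of the pattern rather than of Neo4j's execution.

```python
from collections import defaultdict

# Invented toy data for illustration only.
NODE_TYPES = {
    "drug_a": "phsu",
    "gene_1": "gngm",
    "disease_x": "dsyn",
    "disease_y": "dsyn",
}
EDGES = [
    ("drug_a", "INHIBITS", "gene_1"),
    ("gene_1", "CAUSES", "disease_x"),
    ("gene_1", "CAUSES", "disease_y"),
    ("drug_a", "TREATS", "disease_y"),      # already known, so not novel
    ("drug_a", "STIMULATES", "gene_1"),     # wrong relation type, ignored
]


def inhibit_the_cause(node_types, edges):
    """Return (drug, gene, disease) triples matching the discovery pattern."""
    edge_set = set(edges)
    causes = defaultdict(list)  # gene -> diseases it causes
    for source, rel, target in edges:
        if (rel == "CAUSES" and node_types.get(source) == "gngm"
                and node_types.get(target) == "dsyn"):
            causes[source].append(target)

    results = []
    for drug, rel, gene in edges:
        if rel != "INHIBITS" or node_types.get(drug) != "phsu":
            continue
        for disease in causes.get(gene, []):
            # Keep only pairs without a known TREATS relation.
            if (drug, "TREATS", disease) not in edge_set:
                results.append((drug, gene, disease))
    return sorted(results)


if __name__ == "__main__":
    for triple in inhibit_the_cause(NODE_TYPES, EDGES):
        print(triple)
```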

Visualization of the Last Query

Discussion
• Challenges when loading into Neo4j
• Indexing confusion in Neo4j
• Fast performance with a small number of starting nodes
• Unpredictable performance with a large number of starting nodes or when aggregation is required

Future Work
• Performance evaluation and comparison: speed and storage
• Compare with: relational database(s) (e.g. MySQL), triple store (e.g. Virtuoso)
• Develop web application

Conclusions
• The graph database Neo4j is suitable for representing the biomedical knowledge needed for semantic LBD
• LBD discovery patterns are (relatively) easy to express in the query language Cypher

More Specific Version of “Inhibit the Cause of the Disease”
MATCH (drug:Concept:phsu)-[:ISA]->(m:Concept {name:"Antipsychotic Agents"})
WITH drug
MATCH (drug)-[:INHIBITS]->(gene:gngm)-[:CAUSES]->(s:neop)
WHERE NOT (drug)-[:TREATS]->(s)
RETURN drug, count(distinct gene), count(distinct s);
