
Learning Links in MeSH Co-occurrence Network: Preliminary Results

Andrej Kastrin (1), Thomas C. Rindflesch (2) and Dimitar Hristovski (3)
andrej.kastrin@gmail.com, dimitar.hristovski@gmail.com

(1) Faculty of Information Studies, Novo mesto, Slovenia
(2) Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA
(3) Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia

MIE 2014, Istanbul, Turkey

Literature-Based Discovery
• Find implicit relations between entities.
• Propose implicit relations as potential scientific hypotheses.
• Swanson’s XYZ model:
  • Relations XY and YZ are known.
  • The implicit relation XZ is a (putative) new discovery.
[Figure: diagram of concepts X, Y and Z]
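The XYZ model can be sketched as an open-discovery search over a co-occurrence network: starting from X, collect every Z that is reachable through some intermediate Y but is not already linked to X. This is a minimal illustration under our own assumptions, not the authors' system; the helper names `build_neighbors` and `xyz_candidates` are hypothetical, and the two hand-written relations simply mirror Swanson's fish oil example.

```python
from collections import defaultdict


def build_neighbors(pairs):
    """Build an undirected adjacency map from (concept, concept) co-occurrence pairs."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def xyz_candidates(neighbors, x):
    """Map each Z linked to X only indirectly (via some Y) to the sorted list of those Y terms."""
    candidates = defaultdict(list)
    for y in neighbors.get(x, set()):
        for z in neighbors[y]:
            # Skip X itself and any Z that already co-occurs with X (a known relation).
            if z != x and z not in neighbors[x]:
                candidates[z].append(y)
    return {z: sorted(ys) for z, ys in candidates.items()}


# Hypothetical known relations mirroring Swanson's fish oil example.
known = [
    ("Fish oil", "Blood viscosity"),
    ("Blood viscosity", "Raynaud's disease"),
]
print(xyz_candidates(build_neighbors(known), "Fish oil"))
# {"Raynaud's disease": ['Blood viscosity']}
```

In a real literature network, X has thousands of Y neighbors, so the candidate list has to be ranked; the link prediction scores discussed later are one way to do that.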

Swanson’s Example
• Blood viscosity was found to co-occur with Raynaud’s disease.
• Fish oil reduces blood viscosity.
• Fish oil was therefore proposed as a new treatment for Raynaud’s disease.
[Figure: XYZ diagram with X = fish oil, Y = high blood viscosity, Z = Raynaud’s disease]

Literature-Based Discovery as a Link Prediction Problem
• We can model the biomedical literature as a network of biomedical concepts.
• Link prediction refers to predicting future links between concepts that are not directly connected in the current snapshot of the network.
[Figure: diagram of concepts X, Y and Z]

MEDLINE/PubMed
www.ncbi.nlm.nih.gov/pubmed

Medical Subject Headings (MeSH)
• MeSH is the source of nodes for our network.
• MeSH is a comprehensive controlled vocabulary for indexing articles in the life sciences.
• The 2013 version of MeSH contains 26 853 descriptors.
• Each article in MEDLINE/PubMed is indexed with about 10-15 descriptors.
• Some descriptors are marked with an asterisk (*) to indicate a major topic of the article.

MeSH Terms as Used to Describe a Paper

PMID- 20091016
TI  - Chi-square-based scoring function for...
AB  - OBJECTIVES: Text categorization has been used...
MH  - Access to Information
MH  - Algorithms
MH  - Artificial Intelligence
MH  - Bayes Theorem
MH  - *Chi-Square Distribution
MH  - Data Collection
MH  - Data Interpretation, Statistical
MH  - *Data Mining
MH  - Humans
MH  - *MEDLINE
MH  - Medical Informatics
MH  - *Natural Language Processing
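Because the network's nodes come from the MH lines of records like this one, a small parser makes the node definition concrete. This sketch assumes the MEDLINE display format shown (one heading per `MH  - ` line, optional subheadings after `/`, and `*` marking a major topic on either the descriptor or a subheading); `parse_mesh_headings` is a hypothetical helper, not part of any PubMed tool.

```python
def parse_mesh_headings(medline_text):
    """Extract (descriptor, is_major) pairs from the MH fields of a MEDLINE-format record.

    Assumptions: one heading per 'MH  - ' line; subheadings follow '/';
    an asterisk on the descriptor or on any subheading marks a major topic.
    """
    headings = []
    for line in medline_text.splitlines():
        # The field tag is everything before the first '-' (e.g. 'MH  ').
        tag, sep, value = line.partition("-")
        if not sep or tag.strip() != "MH":
            continue
        parts = value.strip().split("/")
        is_major = any(part.startswith("*") for part in parts)
        descriptor = parts[0].lstrip("*")
        headings.append((descriptor, is_major))
    return headings


record = """PMID- 20091016
TI  - Chi-square-based scoring function for...
MH  - Algorithms
MH  - *Chi-Square Distribution
MH  - *Data Mining
MH  - Humans
"""
print(parse_mesh_headings(record))
# [('Algorithms', False), ('Chi-Square Distribution', True), ('Data Mining', True), ('Humans', False)]
```

Each article then contributes one set of descriptors, and every pair within that set is a candidate co-occurrence edge.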

Methods
• We have a training network $G[t_1, t_2]$, which contains interactions among nodes that take place in the time interval $[t_1, t_2]$.
• We have a test network $G[t_3, t_4]$, which contains interactions among nodes that take place in the time interval $[t_3, t_4]$.
• Learning (prediction) task: provide a list of edges that are present in the test network but absent in the training network.
[Figure: training network and test network over nodes A to H]
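The learning task can be stated directly in code: the targets are the test-network edges that the training network lacks. This is a sketch with hypothetical helper names (`normalize`, `new_links`) and toy node labels; restricting targets to nodes already present in the training network is our assumption (a common convention in link prediction), not something stated on the slide.

```python
def normalize(edge):
    """Represent an undirected edge as a sorted tuple so (a, b) and (b, a) compare equal."""
    a, b = edge
    return (a, b) if a <= b else (b, a)


def new_links(train_edges, test_edges, known_nodes_only=True):
    """Return edges present in the test network but absent from the training network."""
    train = {normalize(e) for e in train_edges}
    test = {normalize(e) for e in test_edges}
    targets = test - train
    if known_nodes_only:
        # Assumed convention: keep only pairs whose endpoints both appear in the
        # training network, since nothing can be scored for an unseen node.
        nodes = {n for edge in train for n in edge}
        targets = {(u, v) for u, v in targets if u in nodes and v in nodes}
    return targets


train_example = [("A", "B"), ("B", "C"), ("C", "D")]
test_example = [("B", "A"), ("A", "C"), ("C", "D"), ("D", "E")]
print(sorted(new_links(train_example, test_example)))  # [('A', 'C')]
```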

Data Collection
• We constructed two networks:
  • Training network (2003-2007)
  • Test network (2008-2012)
• Networks were post-processed to remove non-informative edges.
• We applied a $\chi^2$ test of independence to each co-occurrence pair to obtain a statistic indicating whether the pair occurs together more often than expected by chance.
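The slides do not spell out the contingency table or the cut-off, so the following is only a sketch under stated assumptions: each article counts once, the 2×2 table cross-classifies articles by the presence of X and of Y, Pearson's statistic is computed without continuity correction, and a pair is kept when it co-occurs more often than expected and exceeds 3.841 (the χ² critical value for one degree of freedom at α = 0.05). Function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count articles per descriptor and per unordered descriptor pair."""
    term_counts, pair_counts = Counter(), Counter()
    for terms in articles:
        unique = sorted(set(terms))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


def chi_square_2x2(n_xy, n_x, n_y, n_total):
    """Pearson chi-square (no continuity correction) for a 2x2 article table."""
    a = n_xy                        # articles with both X and Y
    b = n_x - n_xy                  # X without Y
    c = n_y - n_xy                  # Y without X
    d = n_total - n_x - n_y + n_xy  # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n_total * (a * d - b * c) ** 2 / denom if denom else 0.0


def informative_pairs(articles, critical=3.841):
    """Keep pairs that co-occur more often than expected and exceed the critical value."""
    term_counts, pair_counts = cooccurrence_counts(articles)
    n = len(articles)
    kept = {}
    for (x, y), n_xy in pair_counts.items():
        expected = term_counts[x] * term_counts[y] / n
        stat = chi_square_2x2(n_xy, term_counts[x], term_counts[y], n)
        if n_xy > expected and stat > critical:
            kept[(x, y)] = stat
    return kept


articles = [{"A", "B", "Humans"}] * 8 + [{"C", "Humans"}] * 12
print(informative_pairs(articles))  # {('A', 'B'): 20.0}
```

In the toy data, a descriptor such as Humans, which appears in every article, co-occurs with everything yet shows no association, so all of its pairs are filtered out.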

Similarity Measures for Link Prediction
• For each node pair $(u, v)$ we calculate a similarity score $s(u, v)$.
• A higher score $s(u, v)$ indicates a higher likelihood that a link forms between nodes $u$ and $v$.
• We used two similarity measures:
  • Jaccard coefficient: $s_{uv} = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$, where $\Gamma(u)$ is the set of neighbors of $u$
  • Adamic-Adar coefficient: $s_{uv} = \sum_{z \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(z)|}$

Jaccard Coefficient
$s_{uv} = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|} = \frac{4}{9} = 0.44$
[Figure: example nodes u and v with their neighbors]

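Adamic-Adar Coefficient
$s_{uv} = \sum_{z \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(z)|} = \frac{1}{\log 7} + \cdots + \frac{1}{\log 4} = 7.60$
[Figure: nodes u and v with common neighbors z1, z2, z3 and z4]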
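The two coefficients translate almost line by line into code. This is a sketch over plain adjacency sets; the helper names and the toy graph are hypothetical, chosen so that the Jaccard value equals the 4/9 of the worked example (the slide's own graph is not recoverable). The slides do not state the logarithm base for Adamic-Adar; the sketch assumes the natural logarithm, so it is not expected to reproduce the 7.60 above.

```python
import math


def from_edges(edges):
    """Undirected adjacency sets, Γ(u), from an edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def jaccard(neighbors, u, v):
    """|Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|, or 0.0 when both neighborhoods are empty."""
    nu, nv = neighbors.get(u, set()), neighbors.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def adamic_adar(neighbors, u, v):
    """Sum of 1 / log|Γ(z)| over common neighbors z (natural log assumed)."""
    common = neighbors.get(u, set()) & neighbors.get(v, set())
    # For u != v every common neighbor has degree >= 2, so log|Γ(z)| > 0.
    return sum(1.0 / math.log(len(neighbors[z])) for z in common)


# Toy graph: u and v share z1..z4; u also has a, b and v also has c, d, e.
edges = [(w, z) for w in ("u", "v") for z in ("z1", "z2", "z3", "z4")]
edges += [("u", "a"), ("u", "b"), ("v", "c"), ("v", "d"), ("v", "e")]
g = from_edges(edges)
print(round(jaccard(g, "u", "v"), 2))  # 0.44
print(round(adamic_adar(g, "u", "v"), 2))  # 5.77
```

Adamic-Adar rewards shared neighbors with low degree: a common neighbor linked to only a few concepts adds more than a hub such as a very general descriptor. NetworkX provides `jaccard_coefficient` and `adamic_adar_index`, which take a graph and an optional iterable of node pairs and yield `(u, v, score)` triples; its Adamic-Adar implementation also uses the natural logarithm.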

Performance Assessment
• The major challenge is the huge number of possible node pairs.
• We use a bootstrap resampling approach:
  • We draw a random sample of 1000 nodes and create the corresponding training and test networks.
  • We compute a link prediction score $s(u, v)$ for each node pair that has no interaction before time $t_3$.
  • We assign the class label “positive” to a node pair if the link occurs in the test network, and “negative” otherwise.
  • We repeat this procedure 100 times.
• Using the class labels and similarity scores, we construct an ROC curve.
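The evaluation loop can be sketched as follows. Assumptions not taken from the slides: nodes are sampled from the training network, the Jaccard coefficient is computed on the training subnetwork induced by the sample, AUC comes from the rank-sum (Mann-Whitney) statistic with average ranks for ties, and the sketch reports one AUC per repeat rather than averaging ROC curves. All function names are hypothetical; the defaults mirror the 1000 nodes and 100 repeats stated above.

```python
import random
from itertools import combinations


def from_edges(edges):
    """Undirected adjacency sets from an edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def jaccard(neighbors, u, v):
    union = neighbors[u] | neighbors[v]
    return len(neighbors[u] & neighbors[v]) / len(union) if union else 0.0


def auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) formula; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc(train_nb, test_edges, score_fn, sample_size=1000, repeats=100, seed=0):
    """One AUC per bootstrap repeat, each computed on a random node sample."""
    rng = random.Random(seed)
    test = {frozenset(e) for e in test_edges}
    nodes = sorted(train_nb)
    results = []
    for _ in range(repeats):
        sample = rng.sample(nodes, min(sample_size, len(nodes)))
        kept = set(sample)
        # Training network induced on the sampled nodes.
        sub = {n: train_nb[n] & kept for n in sample}
        scores, labels = [], []
        for u, v in combinations(sample, 2):
            if v in sub[u]:
                continue  # pair already linked in training: not a candidate
            scores.append(score_fn(sub, u, v))
            labels.append(frozenset((u, v)) in test)  # positive if the link appears later
        if any(labels) and not all(labels):  # AUC needs both classes
            results.append(auc(scores, labels))
    return results


train = from_edges([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")])
print(bootstrap_auc(train, [("a", "c"), ("b", "e")], jaccard, repeats=3))
# [0.625, 0.625, 0.625]
```

In the toy run, the sample always covers all six nodes, so every repeat gives the same AUC; with a large network, the repeats differ and their spread indicates how stable the estimate is. The rank-based AUC avoids comparing every positive with every negative, which matters when a 1000-node sample yields hundreds of thousands of candidate pairs.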

Results: Topological Characteristics of the MeSH Networks

Parameter                 Train (2003-2007)   Test (2008-2012)
Nodes                     24 225              25 570
Edges                     4 897 380           5 615 965
Edges (reduced)           3 328 288           3 810 535
Density                   0.01                0.01
Mean degree               274.78              298.05
Average path length       2.23                2.20
Clustering coefficient    0.27                0.26
Small-worldness index     21.57               20.70
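The density and mean degree rows can be checked from the node and edge counts with the standard formulas for an undirected simple graph: mean degree $2E/N$ and density $2E/(N(N-1))$. Using the reduced edge counts reproduces the reported 274.78 and 298.05; that the table used the reduced counts is our inference from the arithmetic, not something the slide states. A minimal sketch with hypothetical function names:

```python
def mean_degree(n_nodes, n_edges):
    """Average degree of an undirected simple graph: 2E / N."""
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    """Fraction of possible undirected edges present: 2E / (N (N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Node and reduced-edge counts copied from the table.
for name, n, e in [("Train", 24_225, 3_328_288), ("Test", 25_570, 3_810_535)]:
    print(name, round(mean_degree(n, e), 2), round(density(n, e), 2))
# Train 274.78 0.01
# Test 298.05 0.01
```

With the unreduced edge counts, the mean degree would be about 404 for the training network, which does not match the table.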

Similarity Score Distribution
[Figure: density of Jaccard coefficient scores for class 0 and class 1]

Prediction Performance
[Figure: ROC curves (average true positive rate vs. false positive rate) for the Jaccard coefficient, AUC = 0.78, and the Adamic-Adar coefficient, AUC = 0.82]
AUC (area under the ROC curve): 0.90–1.00 = excellent, 0.80–0.90 = good, 0.70–0.80 = fair, 0.60–0.70 = poor, 0.50–0.60 = fail

Example

Future Work
• Explore the role of node and edge attributes in prediction performance.
• Extend the study to semantic relations instead of co-occurrences.
• Assess prediction performance on a large-scale network.
• Develop network filtering methods.
• Develop a web application for real-time computing.
