  1. Exploring the application of deep learning techniques on medical text corpora. José Antonio Miñarro-Giménez (a), Oscar Marín-Alonso (a,b) and Matthias Samwald (a). (a) Section for Medical Expert and Knowledge-Based Systems, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Austria & Vienna University of Technology, Austria. (b) Dept. of Computer Technology, University of Alicante, Alicante, Spain. MIE 2014, 1st September 2014, Istanbul, Turkey.

  2. Introduction • Problem: it is increasingly difficult to find relevant information in the growing biomedical literature, spread across many journals: Artificial Intelligence in Medicine, Computer Methods and Programs in Biomedicine, IEEE Journal of Biomedical and Health Informatics, International Journal of Medical Informatics, International Journal of Technology Assessment in Health Care, Journal of Biomedical Informatics, Medical & Biological Engineering & Computing, Journal of the American Medical Informatics Association, Medical Decision Making, Methods of Information in Medicine, Statistical Methods in Medical Research, Statistics in Medicine, Briefings in Bioinformatics, BMC Bioinformatics, Medical Image Analysis, Artificial Intelligence, Neuroinformatics, Bioinformatics, among others.

  3. Introduction • Challenge: automatically process the biomedical literature. • Approaches: data mining, information extraction methods, natural language processing, ... • Tools: word2vec (https://code.google.com/p/word2vec/)

  4. Word2vec toolkit • The toolkit learns vector models of words from a text corpus. Training options: • Type of architecture: skip-gram or continuous bag-of-words. • Vector space dimension. • Size of the context window. • Training algorithms: hierarchical softmax and/or negative sampling. • Threshold for downsampling the frequent words. • ...
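
As an illustration, a minimal sketch of how these options map onto a word2vec training call, assuming the gensim re-implementation rather than the original C toolkit (gensim 4.x parameter names; older releases use `size` instead of `vector_size`) and a hypothetical pre-processed corpus file:

```python
# Minimal sketch: training a vector model with the options listed above,
# using the gensim re-implementation of word2vec (gensim 4.x parameter names).
# "processed_corpus.txt" is a hypothetical placeholder for a pre-processed corpus.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("processed_corpus.txt")   # one pre-processed sentence per line

model = Word2Vec(
    sentences,
    sg=1,             # architecture: 1 = skip-gram, 0 = continuous bag-of-words
    vector_size=300,  # dimension of the vector space
    window=5,         # size of the context window
    hs=0,             # hierarchical softmax off ...
    negative=10,      # ... negative sampling on (10 noise words)
    sample=1e-5,      # threshold for downsampling frequent words
    min_count=5,      # ignore words that occur fewer than 5 times
    workers=4,        # training threads
)
model.wv.save_word2vec_format("vectors.bin", binary=True)
```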

  5. Word2vec toolkit [diagram: the word2vec toolkit and its two search tools, distance and analogy]

  6. Word2vec Analogy method

  7. Distance vs Analogy
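
To make the two search methods concrete, a brief sketch using gensim's KeyedVectors; the model file name and the query terms (warfarin, heparin, thrombophlebitis) are illustrative assumptions, not results from the slides:

```python
# Sketch of the two search methods on a trained model.
# The model file and the query terms are illustrative assumptions.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Distance method: nearest neighbours of a single term by cosine similarity.
print(wv.most_similar("warfarin", topn=10))

# Analogy method: given a known pair (warfarin, thrombophlebitis) and a query
# drug (heparin), look for x such that warfarin : thrombophlebitis = heparin : x,
# i.e. the vector arithmetic thrombophlebitis - warfarin + heparin.
print(wv.most_similar(positive=["thrombophlebitis", "heparin"],
                      negative=["warfarin"], topn=10))
```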

  8. Corpus
     Corpora | Word count | Vocabulary size
     Clinically relevant subset of PubMed, full abstracts | 161,428,286 | 204,096
     Conclusion sections from clinically relevant subset of PubMed, "pubmed_key_assertions" | 17,342,158 | 47,703
     Merck Manuals | 12,667,064 | 49,174
     Medscape | 25,854,998 | 63,600
     Clinically relevant subset of Wikipedia, "wikipedia" | 10,945,677 | 65,875
     Combined corpus (including all corpora above), "combined" | 236,835,672 | 261,353

  9. NDF-RT ontology
     Relationship | Description | Example
     may_treat | Provides the association between drugs and the diseases they may treat. | Warfarin -> may_treat -> "Thrombophlebitis"
     may_prevent | Provides the list of diseases that a drug may prevent. | Warfarin -> may_prevent -> "Myocardial Infarction"
     has_PE | Relates drugs to their corresponding physiological effects. | Warfarin -> has_PE -> "Decreased Coagulation Factor Concentration"
     has_MoA | The mechanisms of action of each drug. | Warfarin -> has_MoA -> "Vitamin K Epoxide Reductase Inhibitors"
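
For evaluation, these relationships serve as gold-standard drug-target pairs; one possible representation (the data structure is an assumption, and the entries are just the Warfarin examples from the table):

```python
# NDF-RT relationships as gold-standard drug -> target pairs for evaluation.
# The data structure is an assumption; the entries are the examples above,
# written as single tokens (multiword terms grouped with underscores).
ndf_rt = {
    "may_treat":   [("warfarin", "thrombophlebitis")],
    "may_prevent": [("warfarin", "myocardial_infarction")],
    "has_PE":      [("warfarin", "decreased_coagulation_factor_concentration")],
    "has_MoA":     [("warfarin", "vitamin_k_epoxide_reductase_inhibitors")],
}
```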

  10. Testing system [architecture diagram: a RESTful client queries a RESTful server that provides an analogy service and a distance service (wrapping the word2vec analogy and distance tools) together with a query module and a matching module against the NDF-RT ontology; the word2vec train tool produces the trained corpus used by the services, and results are returned to the client]
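
A minimal sketch of how such a server could expose the two search services over REST; the Flask framework, endpoint names, and model path are assumptions, and the query and matching modules against NDF-RT are omitted:

```python
# Sketch of a RESTful server exposing the distance and analogy services.
# The framework (Flask), endpoint names and model path are assumptions;
# the query and matching modules against NDF-RT are omitted for brevity.
from flask import Flask, jsonify, request
from gensim.models import KeyedVectors

app = Flask(__name__)
wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # trained corpus

@app.route("/distance")
def distance():
    term = request.args["term"]
    return jsonify([(w, float(s)) for w, s in wv.most_similar(term, topn=20)])

@app.route("/analogy")
def analogy():
    # a is to b as c is to ?
    a, b, c = request.args["a"], request.args["b"], request.args["c"]
    return jsonify([(w, float(s)) for w, s in
                    wv.most_similar(positive=[b, c], negative=[a], topn=20)])

if __name__ == "__main__":
    app.run()
```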

  11. Pre-processing corpus

  12. Pre-processing corpus [pipeline diagram] • Gathering: raw text corpora and the list of terms from the NDF-RT ontology. • Processing corpora: remove punctuation signs, avoid capitalized words, group multiword terms. • Output: processed text.
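
A hedged sketch of these pre-processing steps; the exact rules are not given in the slides, so the term list and the underscore convention for grouping multiword terms are assumptions:

```python
# Sketch of the pre-processing steps: remove punctuation, lowercase,
# and group multiword NDF-RT terms into single tokens (here via underscores).
# The term list and the underscore convention are illustrative assumptions.
import re

ndf_rt_terms = ["myocardial infarction", "vitamin k epoxide reductase inhibitors"]

def preprocess(text, terms=ndf_rt_terms):
    text = text.lower()                              # avoid capitalized words
    text = re.sub(r"[^\w\s]", " ", text)             # remove punctuation signs
    for term in sorted(terms, key=len, reverse=True):
        text = text.replace(term, term.replace(" ", "_"))  # group multiword terms
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Warfarin may prevent Myocardial Infarction."))
# -> "warfarin may prevent myocardial_infarction"
```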

  13. Statistics 1. The number of resulting word vectors with at least one correct term from the relationships of the NDF-RT ontology (i.e., the hit rate). 2. The evaluation of window size and the type of architecture. 3. The evaluation of vector dimension in the vector model.
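
For the first statistic, a possible hit-rate computation: a drug counts as a hit when at least one correct NDF-RT target appears among the model's top-n suggestions (the gold-standard format, the search function, and the cut-off are assumptions):

```python
# Hit rate: fraction of drugs for which at least one correct NDF-RT target
# appears among the top-n suggestions returned by a search method.
# The gold-standard format, the search function and n are assumptions.
def hit_rate(gold, search, topn=20):
    """gold: {drug: set of correct terms}; search: drug -> ranked list of terms."""
    hits = sum(1 for drug, targets in gold.items()
               if targets & set(search(drug)[:topn]))
    return hits / len(gold)

# Hypothetical usage with the "may_treat" relationship and a trained model `wv`:
# gold = {"warfarin": {"thrombophlebitis"}, ...}
# print(hit_rate(gold, lambda d: [w for w, _ in wv.most_similar(d, topn=20)]))
```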

  14. Results: hit rate
      Corpus | Tool | may_treat | may_prevent | has_PE | has_MoA
      combined | Analogy | 27.37% | 10.59% | 2.49% | 6.91%
      combined | Distance | 3.21% | 6.32% | 0.67% | 6.81%
      PubMed key assertions | Analogy | 15.74% | 5.09% | 0.84% | 1.51%
      PubMed key assertions | Distance | 2.13% | 4.07% | 0.37% | 3.60%
      wikipedia | Analogy | 14.9% | 5.35% | 2.22% | 2.69%
      wikipedia | Distance | 1.3% | 3.34% | 0.32% | 3.34%

  15. Results Window size

  16. Results Vector dimension

  17. Conclusions • Word2vec is very efficient at generating vector models and executing the different search methods. • Pre-processing the corpus content is needed to improve the resulting vector models. • The analogy method retrieves better related terms than the distance search method. • The generated vector models provide the best results when searching for information related to the "may_treat" relationship. • However, a hit rate of only 27% is poor compared to other approaches. • The choice of vector dimension has more impact than other training parameters such as the size of the context window. • The number of indexed terms is a better indicator of corpus quality than the raw number of words.

  18. Future work • Test the word2vec toolkit with even larger medical corpora (> 10 GB). • Investigate the use of contextual knowledge to improve precision and recall of word2vec search methods. – Medical terminologies and ontologies.

  19. QUESTIONS ?
