TextMed: A Multi-Agent System with Reinforcement Learning Agents for Biomedical Text Mining. Michael Camara, Janyl Jumadinova, Oliver Bonham-Carter. September 9, 2015
Big Data
Biomedical Research ◮ PubMed: U.S. National Library of Medicine free search engine ◮ 24 million records (abstracts and citations) ◮ Annual growth rate of 4%
Text Mining ◮ Text summarization ◮ Document retrieval ◮ Document classification ◮ Information extraction
Preprocessing 1. Lister: downloads and decompresses data 2. Abstract creation: keyword used to obtain relevant abstracts 3. Dataset creation: abstracts divided into datasets 4. Agent allocation: an agent is assigned to each dataset
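The preprocessing steps after the download can be sketched roughly as follows. This is a minimal illustration, not the Lister program itself: the function names, the round-robin split, and the `agent-N` labels are all assumptions made for the example, and the download/decompress step is left out.

```python
from typing import Dict, List


def filter_abstracts(abstracts: List[str], keyword: str) -> List[str]:
    """Keep only abstracts that mention the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [text for text in abstracts if needle in text.lower()]


def split_into_datasets(abstracts: List[str], n_datasets: int) -> List[List[str]]:
    """Deal abstracts round-robin into n_datasets groups (assumed strategy)."""
    datasets: List[List[str]] = [[] for _ in range(n_datasets)]
    for index, text in enumerate(abstracts):
        datasets[index % n_datasets].append(text)
    return datasets


def allocate_agents(datasets: List[List[str]]) -> Dict[str, List[str]]:
    """Assign one hypothetically named agent to each dataset."""
    return {f"agent-{i}": dataset for i, dataset in enumerate(datasets)}


# Toy abstracts, invented for the example.
abstracts = [
    "Penicillin successfully treated the infection.",
    "A study of sleep patterns in adults.",
    "Side effects of penicillin in children.",
    "Penicillin resistance is increasing.",
]
relevant = filter_abstracts(abstracts, "penicillin")
assignment = allocate_agents(split_into_datasets(relevant, 2))
```

Here three of the four toy abstracts mention the keyword, and they are shared between two agents.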
TextMed: Parsing 1. Scan through each document with keyword
TextMed: MeSH Keyword List 2. Obtain keyword from MeSH (Medical Subject Heading) list
TextMed: Match Found? 3. Iterate through list until match found
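Steps 1 to 3 could look something like the sketch below. It assumes single-word terms and a tiny hand-written term list standing in for the real MeSH list; `first_mesh_match` is a hypothetical helper, not part of TextMed.

```python
import re
from typing import List, Optional, Tuple


def first_mesh_match(document: str, mesh_terms: List[str]) -> Optional[Tuple[str, int]]:
    """Iterate through the term list until one occurs in the document.

    Returns (term, word_index) for the first matching term, or None.
    Only single-word terms are handled in this sketch.
    """
    words = re.findall(r"[a-z']+", document.lower())
    for term in mesh_terms:
        lowered = term.lower()
        if lowered in words:
            return term, words.index(lowered)
    return None


# Stand-in for the MeSH list; the real list is far larger.
toy_terms = ["Aspirin", "Penicillin"]
match = first_mesh_match("The penicillin successfully treated the condition.", toy_terms)
no_match = first_mesh_match("A study of sleep patterns.", toy_terms)
```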
TextMed: SentiStrength 4. Perform sentiment analysis on keyword match
TextMed: SentiStrength (cont.) Sentiment Analysis Example: “The penicillin successfully treated the condition, but the patient complained of severe side effects afterwards.”
TextMed: SentiStrength (cont.) Sentiment Analysis Example: “The penicillin successfully [+3] treated the condition, but the patient complained of severe [-2] side effects afterwards.” Sentiment Score = [+3] + [-2] = 1
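The arithmetic in this example can be reproduced with a toy word-score lookup. This is only an illustration of summing per-word scores: it does not call SentiStrength, and the two-entry lexicon simply copies the scores shown on the slide.

```python
import re
from typing import Dict

# Toy lexicon: only the two scored words from the slide example.
TOY_LEXICON: Dict[str, int] = {"successfully": 3, "severe": -2}


def sentiment_score(sentence: str, lexicon: Dict[str, int] = TOY_LEXICON) -> int:
    """Sum the lexicon scores of every word in the sentence (unknown words score 0)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(word, 0) for word in words)


sentence = (
    "The penicillin successfully treated the condition, but the patient "
    "complained of severe side effects afterwards."
)
score = sentiment_score(sentence)  # (+3) + (-2)
```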
TextMed: Reinforcement Learning 5. Perform reinforcement learning:
TextMed: Reinforcement Learning (cont.) 1. Give command 2. Dog performs an action 3. Give treat if action matches command 4. Dog tries to maximize treats
TextMed: Reinforcement Learning (cont.) 1. Provide list of possible actions 2. Agent performs an action 3. Agent receives reward based on how sentiment changes: $R_k = \sum_{i=d}^{N} \frac{|gs_k - ls_{k,d}|}{N}$ 4. Agent tries to optimize reward for next time
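A heavily simplified sketch of this loop is given below. It makes several assumptions that the slides do not confirm: the reward is taken to be the absolute gap between a global and a local sentiment value, the actions are taken to be proximity settings, and the agent simply remembers the mean reward per action and prefers the smallest. `ProximityAgent` and its methods are hypothetical names.

```python
from typing import Dict, List


def reward(global_sentiment: float, local_sentiment: float) -> float:
    """Assumed reward: absolute gap between global and local sentiment."""
    return abs(global_sentiment - local_sentiment)


class ProximityAgent:
    """Tracks the running mean reward of each action (smaller is better)."""

    def __init__(self, actions: List[int]) -> None:
        self.mean_reward: Dict[int, float] = {a: 0.0 for a in actions}
        self.counts: Dict[int, int] = {a: 0 for a in actions}

    def update(self, action: int, observed: float) -> None:
        """Fold one observed reward into the running mean for this action."""
        self.counts[action] += 1
        n = self.counts[action]
        self.mean_reward[action] += (observed - self.mean_reward[action]) / n

    def best_action(self) -> int:
        """Return the tried action with the smallest mean reward."""
        tried = [a for a, n in self.counts.items() if n > 0]
        if not tried:
            raise ValueError("no action has been tried yet")
        return min(tried, key=lambda a: self.mean_reward[a])


# Toy run: local sentiment observed at three proximity settings,
# compared against an assumed global sentiment of 1.0.
agent = ProximityAgent([1, 3, 13])
local_by_action = {1: 3.0, 3: 3.0, 13: 1.0}
for action, local in local_by_action.items():
    agent.update(action, reward(1.0, local))
chosen = agent.best_action()
```

A real agent would also need some exploration policy; that is omitted here to keep the example deterministic.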
TextMed: Continue Parsing 6. Continue parsing all keywords, then begin next document
TextMed: Multiple Agents 7. Multiple agents working simultaneously
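One way to picture several agents working at once is a thread pool with one task per dataset. This is a sketch only: `run_agent` is a placeholder that just counts keyword mentions, and the slides do not say how TextMed actually schedules its agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_agent(dataset: List[str]) -> int:
    """Placeholder agent: count documents that mention the keyword."""
    return sum(1 for document in dataset if "penicillin" in document.lower())


# Toy datasets, one per agent (invented for the example).
datasets = [
    ["Penicillin treated the infection.", "Penicillin resistance is rising."],
    ["Side effects of penicillin.", "A study of sleep patterns."],
]

# One worker per dataset; map() returns results in dataset order.
with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
    results = list(pool.map(run_agent, datasets))
```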
Experimental Setup ◮ Three primary datasets used for experiments ◮ Each dataset obtained using different keywords with Lister program and PubMed database ◮ Similar pattern of results for each
Alzheimer’s Dataset: Reward Data ◮ Smaller reward = more optimal, less sentiment fluctuation ◮ Initially high reward, becomes smaller over time
Alzheimer’s Dataset: Local Sentiment vs Global Sentiment ◮ Sentiment after learning ◮ Sentiment before learning ◮ Highly variable throughout all documents ◮ Variable at beginning, stabilizes near end
Proximity Parameter Keyword = penicillin. The penicillin successfully treated the condition, but the patient complained of severe side effects afterwards.
Proximity Parameter Keyword = penicillin. Proximity = 1: The penicillin successfully [+3] treated the condition, but the patient complained of severe side effects afterwards. Sentiment Score = [+3] + 0 = 3
Proximity Parameter Keyword = penicillin. Proximity = 2: The penicillin successfully [+3] treated the condition, but the patient complained of severe side effects afterwards. Sentiment Score = [+3] + 0 = 3
Proximity Parameter Keyword = penicillin. Proximity = 3: The penicillin successfully [+3] treated the condition, but the patient complained of severe side effects afterwards. Sentiment Score = [+3] + 0 = 3
Proximity Parameter Keyword = penicillin. Proximity = 13: The penicillin successfully [+3] treated the condition, but the patient complained of severe [-2] side effects afterwards. Sentiment Score = [+3] + [-2] = 1
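The effect of the proximity parameter can be sketched as a word window around the keyword. The example assumes proximity counts whole words on either side of the keyword and reuses the two slide scores as a toy lexicon; `proximity_score` is a hypothetical helper, not the TextMed implementation.

```python
import re
from typing import Dict

# Toy lexicon: only the two scored words from the slide example.
TOY_LEXICON: Dict[str, int] = {"successfully": 3, "severe": -2}


def proximity_score(
    sentence: str,
    keyword: str,
    proximity: int,
    lexicon: Dict[str, int] = TOY_LEXICON,
) -> int:
    """Sum lexicon scores of words within `proximity` words of the keyword."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if keyword.lower() not in words:
        return 0
    center = words.index(keyword.lower())
    start = max(0, center - proximity)
    stop = min(len(words), center + proximity + 1)
    return sum(lexicon.get(word, 0) for word in words[start:stop])


sentence = (
    "The penicillin successfully treated the condition, but the patient "
    "complained of severe side effects afterwards."
)
scores = {p: proximity_score(sentence, "penicillin", p) for p in (1, 2, 3, 13)}
```

Under these assumptions the small windows only reach "successfully", while a window of 13 words also reaches "severe", matching the scores on the slides.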
Alzheimer’s: Proximity/Reward Heatmap
Future Work ◮ Optimize SentiStrength for biomedical texts ◮ Modify reinforcement learning algorithm ◮ Incorporate data from multiple databases ◮ Incorporate data from medical records ◮ Compare to other systems
Thank You: ◮ Professor Jumadinova ◮ Oliver Bonham-Carter ◮ Dr. Michael Thelwall ◮ Dr. Barbara Lotze Research Fellowship Fund