machine learning vs knowledge based approaches to adr



  1. Machine learning vs. knowledge based approaches to ADR identification November 2017

  2. Topics: Short about us • Identifying ADRs • Machine learning for semantic relations identification • Results • Challenges

  3. SHORT ABOUT US Focus on text analytics for pharmaceuticals. Since 1998. Other text sources • Voice of the Patient • Electronic Health Records • Scientific literature • FDA • Patents • Business opportunities

  4. SHORT ABOUT US Advanced Databases Group, Universidad Carlos III de Madrid
     • Research lines: natural language processing; accessibility
     • Resources produced: drug-drug interaction collection (DDI Corpus); DINTO ontology

  5. Our goal at TAC ADR: “Combine Knowledge Based with Machine Learning Based approaches to leverage ADR identification”

  6. Identifying ADRs

  7. TOPIC EXTRACTION NLP and resource-based approach • SIDER • UMLS • Training corpus

  8. TOPIC EXTRACTION NLP and resource-based approach • SIDER • UMLS • Training corpus
     Dictionary          #entries
     Adverse Reactions   21,826
     Factor              41
     Severity            158
     Animal              27
     DrugClass           101
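
The dictionary-driven step on this slide can be illustrated with a small sketch. It is not the authors' implementation: the tiny term lists, the `build_matchers` and `tag` helpers, and the regex-based matching are all assumptions made for illustration, standing in for the much larger dictionaries whose entry counts are listed on the slide.

```python
import re

# Hypothetical miniature dictionaries (illustrative only); the real resources
# are far larger, e.g. 21,826 adverse reaction entries.
DICTIONARIES = {
    "AdverseReaction": ["nausea", "skin rash", "headache"],
    "Severity": ["severe", "mild"],
    "Animal": ["rats", "dogs"],
}

def build_matchers(dictionaries):
    """Compile one case-insensitive regex per entity type, longest terms first."""
    matchers = {}
    for label, terms in dictionaries.items():
        ordered = sorted(terms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
        matchers[label] = re.compile(pattern, re.IGNORECASE)
    return matchers

def tag(text, matchers):
    """Return (start, end, label, surface) tuples sorted by position in the text."""
    found = []
    for label, regex in matchers.items():
        for match in regex.finditer(text):
            found.append((match.start(), match.end(), label, match.group(0)))
    return sorted(found)

MATCHERS = build_matchers(DICTIONARIES)
mentions = tag("Severe skin rash and nausea were observed in rats.", MATCHERS)
for start, end, label, surface in mentions:
    print(label, repr(surface), start, end)
```

Sorting terms longest-first keeps multiword entries such as "skin rash" from being shadowed by a shorter entry that shares a prefix.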

  9. TOPIC EXTRACTION NLP and resource-based approach. And some rules to identify negation: • MeaningCloud Insights Engine API supports this rule syntax
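
The rule shown on the original slide is not preserved in this transcript, so the following is only a generic sketch of a cue-based negation rule. It does not use the MeaningCloud rule syntax; the cue list, the window size, and the `is_negated` helper are hypothetical choices for illustration.

```python
import re

# Assumed cue list and window size; neither comes from the presentation.
NEGATION_CUES = {"no", "not", "without", "denies", "absence"}
WINDOW = 4  # number of tokens before the mention that are inspected

def is_negated(sentence, mention):
    """Return True if a negation cue occurs within WINDOW tokens before the mention."""
    tokens = re.findall(r"\w+", sentence.lower())
    mention_tokens = re.findall(r"\w+", mention.lower())
    size = len(mention_tokens)
    for i in range(len(tokens) - size + 1):
        if tokens[i:i + size] == mention_tokens:
            preceding = tokens[max(0, i - WINDOW):i]
            return any(token in NEGATION_CUES for token in preceding)
    return False  # mention not found in the sentence

print(is_negated("No cases of hepatotoxicity were reported.", "hepatotoxicity"))
print(is_negated("Nausea was reported in 5% of patients.", "nausea"))
```

A fixed window like this is deliberately crude: it ignores scope terminators and cues that follow the mention.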

  10. Machine Learning for semantic relations identification

  11. Machine learning for semantic relations identification. Representing ADR mention context through a set of features, for ADRMention – Other pairs (where Other is Severity, DrugClass, Negation, Animal or Factor):
      • M1TXT, M2TXT, BWTXT: the text of both/between mentions.
      • C1BOW, C2BOW: bag-of-words of both mentions.
      • C1POS, C2POS: part of speech of both mentions.
      • PB1POS, PA1POS, PB2POS, PA2POS, PWPOS: the PoS tags of the two tokens before/after/between both mentions.
      • WA1TXT, WB2TXT, WA2TXT, WB1TXT: the two tokens after/before the mention.
      • LA1LEM, LB2LEM, LA2LEM, LB1LEM: the lemmas of the two tokens after/before both mentions.
      • LWLEM: the lemmas between the two mentions.
      • NTOKB: the number of tokens between the two mentions.
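
A minimal sketch of how the text-based subset of these features could be computed for one pair of mentions. The `pair_features` function, the whitespace tokenization, and the span convention (token offsets, end exclusive, first mention before the second) are assumptions; the PoS and lemma features are left out because they would require a tagger and lemmatizer that the slides do not name.

```python
def pair_features(tokens, m1, m2):
    """Build text-based context features for two mentions.

    tokens: list of token strings for the sentence.
    m1, m2: (start, end) token spans, end exclusive, with m1 before m2.
    """
    (s1, e1), (s2, e2) = m1, m2
    between = tokens[e1:s2]
    return {
        "M1TXT": " ".join(tokens[s1:e1]),           # text of the first mention
        "M2TXT": " ".join(tokens[s2:e2]),           # text of the second mention
        "BWTXT": " ".join(between),                 # text between the mentions
        "C1BOW": sorted({t.lower() for t in tokens[s1:e1]}),
        "C2BOW": sorted({t.lower() for t in tokens[s2:e2]}),
        "WB1TXT": " ".join(tokens[max(0, s1 - 2):s1]),  # two tokens before mention 1
        "WA1TXT": " ".join(tokens[e1:e1 + 2]),          # two tokens after mention 1
        "WB2TXT": " ".join(tokens[max(0, s2 - 2):s2]),  # two tokens before mention 2
        "WA2TXT": " ".join(tokens[e2:e2 + 2]),          # two tokens after mention 2
        "NTOKB": len(between),                      # number of tokens in between
    }

tokens = "Severe cases of skin rash were observed".split()
features = pair_features(tokens, (0, 1), (3, 5))  # "Severe" and "skin rash"
for name, value in features.items():
    print(name, "=", value)
```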

  12. Machine learning for semantic relations identification. And the algorithm?
      • SVM, support vector machines (using scikit-learn on Python)
      • Specifically, the SVC implementation: default parameter values, linear kernel
      But, no deep learning??!! Of course (CNN), but not in the official runs.
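
A sketch of the classifier setup the slide describes: scikit-learn's `SVC` with a linear kernel and otherwise default parameters. The training pairs, their labels, and the use of `DictVectorizer` to encode the feature dictionaries are assumptions for illustration; the slides do not say how the features were vectorized.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up toy examples: each dict holds a few of the features named on the
# previous slide, and each label is a relation type.
train_X = [
    {"M1TXT": "rash", "M2TXT": "severe", "BWTXT": "", "NTOKB": 0},
    {"M1TXT": "nausea", "M2TXT": "mild", "BWTXT": "", "NTOKB": 0},
    {"M1TXT": "rash", "M2TXT": "rats", "BWTXT": "in", "NTOKB": 1},
    {"M1TXT": "nausea", "M2TXT": "dogs", "BWTXT": "observed in", "NTOKB": 2},
]
train_y = ["Effect", "Effect", "Hypothetical", "Hypothetical"]

# DictVectorizer one-hot encodes string-valued features and passes numeric
# ones through; SVC keeps its defaults except for the linear kernel.
model = make_pipeline(DictVectorizer(), SVC(kernel="linear"))
model.fit(train_X, train_y)

predicted = model.predict(
    [{"M1TXT": "headache", "M2TXT": "rats", "BWTXT": "in", "NTOKB": 1}]
)
print(predicted[0])
```

Feature values unseen at training time (here "headache") are silently ignored by `DictVectorizer`, which is one reason lexical coverage matters for this kind of model.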

  13. Results

  14. Results • Task 1. ADR and related entities: low precision!! • Task 2. Relations between ADRs and entities: oh, oh!!

  15. Results • Task 3. Positive ADRs: only a few negated mentions? Pretty good!! • Task 4. Normalization through MedDRA: using dictionaries with semantic information produces nice results

  16. Challenges

  17. Challenges
      • Negation identification requires more effort (not only in the ADR field). Some weird things found in the test set, e.g.: "The most frequently observed malignancies other than non-melanoma skin cancer …" Negation?
      • CNNs and the use of syntactic features improve results:
                       P     R     F1
        Other          0.71  0.81  0.76
        Negated        0.72  0.40  0.51
        Hypothetical   0.75  0.75  0.75
        Effect         0.76  0.61  0.68
        Avg / total    0.73  0.73  0.73
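
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the per-class P and R values on the slide; the `f1_score` helper and the `reported` dictionary are names introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Per-class (P, R, F1) values copied from the slide.
reported = {
    "Other": (0.71, 0.81, 0.76),
    "Negated": (0.72, 0.40, 0.51),
    "Hypothetical": (0.75, 0.75, 0.75),
    "Effect": (0.76, 0.61, 0.68),
}

for label, (p, r, f1) in reported.items():
    print(f"{label}: recomputed F1 = {f1_score(p, r):.2f}, reported = {f1:.2f}")
```

The low recall for Negated (0.40) is what pulls its F1 down to 0.51, which matches the slide's point that negation needs more effort.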

  18. Challenges
      • Recall must be improved:
        o separated multiword mentions
        o ADRs with no MedDRA code, enough lexical resources?
      • How to approach errors when applying deep learning?
      • Enough accuracy for practical applications? What does FDA think?

  19. Thanks QUESTIONS?

  20. LabDA Resources: Corpus DDI (Drug-Drug Interactions)
      • 1,025 annotated documents, 18,502 entities and 5,028 DDIs (by expert pharma)
      • MedLine and DrugBank texts
      • Annotation guidelines and inter-annotator agreement
      • Available at labda.inf.uc3m.es
      • Used at the DDIExtraction 2011 and DDIExtraction 2013 SemEval tasks

  21. LabDA Resources: DINTO ontology, knowledge about drugs and interactions (11,555 DDIs and 8,786 pharmacological entities). Available at OBO Foundry. Application to information extraction and prediction.

  22. MeaningCloud LLC. Automating the extraction of meaning from any information source. Address: 35-37 36th Street, 11106 Astoria NY. Contact info: jmartinez@meaningcloud.com. Telephone: +1 (646) 403-3104. meaningcloud.com
