
Medical Text Data Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+ - PowerPoint PPT Presentation

CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from Medical Text Data Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+, Bing Qin+, Ting Liu+ +Harbin Institute of Technology, China *University of Notre


  1. CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from Medical Text Data • Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+, Bing Qin+, Ting Liu+ • +Harbin Institute of Technology, China • *University of Notre Dame, USA

  2. Pseudo Causal Relation • Gold standard ⁃ Randomized controlled experiments ⁃ Too costly • Observational data ⁃ Structured data, e.g. EHR ⁃ Unstructured data (text data), e.g. medical literature, patient reports • Pseudo causal relation ⁃ Semantic-level causal relations ⁃ Verified true causal knowledge ⁃ Or, have not been identified previously ⁃ Or, no evidence to support them

  3. Previous Studies • Extract causal relations from single sentences ⁃ While causal relations usually span multiple sentences • Use only textual information and ignore structural information ⁃ While causal relations naturally have an attached network structure • Only extraction rather than inference ⁃ While causality itself is a basic logical rule

  4. Causation Transitivity • Preserving transitivity is a basic desideratum for an adequate analysis of causation -- L. A. Paul and Ned Hall, “Causation: A User’s Guide” • [Figure: transitivity diagram with nodes A, B, C]

  5. Causation Transitivity in Medical Text • [Figure: causal network over Obesity, Diabetes, Hyperglycemia, and Metformin; edges are labeled "cause" or "?"] • "Obesity usually increases the risk of diabetes." • "People with diabetes have more sugar in blood, called hyperglycemia." • "Metformin has become a mainstay of type 2 diabetes management and is now the recommended first-line drug for treating the disease."
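
The idea of proposing a new link from two existing ones can be sketched in a few lines of Python. This is only an illustration of the one-step transitivity rule, not the CausalTriad model itself: the function name `transitive_hypotheses` is hypothetical, and the two "known" edges in the usage note are assumed purely for the example.

```python
def transitive_hypotheses(known_edges):
    """Propose A -> C whenever A -> B and B -> C are known and A -> C is not.

    known_edges: iterable of (cause, effect) tuples.
    Returns a set of (cause, effect) tuples that are hypotheses, not verified facts.
    """
    known = set(known_edges)
    hypotheses = set()
    for a, b in known:
        for b2, c in known:
            # Chain the two edges only when the middle entity matches.
            if b == b2 and a != c and (a, c) not in known:
                hypotheses.add((a, c))
    return hypotheses


# Assumed edges, chosen only to mirror the entities named on the slide.
known = {("obesity", "diabetes"), ("diabetes", "hyperglycemia")}
print(transitive_hypotheses(known))
```

With these assumed inputs, the only proposed hypothesis is the pair `("obesity", "hyperglycemia")`. A hard rule like this over-generates, which is presumably why the deck combines transitivity with textual evidence rather than applying it alone.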

  6. Motivation • Jointly utilize ⁃ Textual information (context and co-occurrence) ⁃ Structural information (causation transitivity rule) • Through inference to ⁃ Discover causal relations in text ⁃ Generate new causal relation hypotheses

  7. Problem Definition • Problem: Causal Relation Discovery from Triad Structures • Medical Cause-Effect Candidates Network $G = (V, E)$, $E \subseteq V \times V$ • Triad Structure ⁃ Each triangle in the network ⁃ Basic unit

  8. Our method • Causal Relation Candidates Matching • 3 Clues for Causal Discovery ⁃ Causal Association ⁃ Contextual Information ⁃ Causal Transitivity Rules • Factor Graph Model

  9. Causal Relation Candidates Matching • Medical Dictionary ⁃ Dryad data package ⁃ TCMonline and TCMID • For every n consecutive sentences ⁃ Match medical entities ⁃ Pair the matched entities with each other ⁃ Every two pairs with a shared entity generate a triad structure • E.g., $(e_i, e_k)$ and $(e_i, e_j)$ generate a triad structure $(e_k, e_i, e_j)$
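
The matching steps above can be sketched as follows. This is a minimal illustration under stated assumptions: entity matching is a naive lowercase substring lookup, the dictionary is a toy set rather than the Dryad or TCM resources, and the function names (`match_entities`, `candidate_pairs`, `triads`) are hypothetical.

```python
from itertools import combinations


def match_entities(sentence, dictionary):
    """Return the dictionary terms found in the sentence (naive substring match)."""
    lowered = sentence.lower()
    return {term for term in dictionary if term in lowered}


def candidate_pairs(sentences, dictionary, n=2):
    """Collect unordered entity pairs co-occurring within any n consecutive sentences."""
    pairs = set()
    for start in range(max(1, len(sentences) - n + 1)):
        entities = set()
        for sentence in sentences[start:start + n]:
            entities |= match_entities(sentence, dictionary)
        # Sorting makes each unordered pair appear in one canonical form.
        pairs.update(combinations(sorted(entities), 2))
    return pairs


def triads(pairs):
    """Every two pairs sharing exactly one entity yield a triad (end, shared, end)."""
    result = set()
    for p, q in combinations(sorted(pairs), 2):
        shared = set(p) & set(q)
        if len(shared) == 1:
            (center,) = shared
            ends = sorted((set(p) | set(q)) - shared)
            result.add((ends[0], center, ends[1]))
    return result


sentences = [
    "Obesity usually increases the risk of diabetes.",
    "People with diabetes have more sugar in blood, called hyperglycemia.",
]
dictionary = {"obesity", "diabetes", "hyperglycemia"}
print(triads(candidate_pairs(sentences, dictionary, n=1)))
```

With `n=1` each sentence is its own window, so the two pairs share only `diabetes` and a single triad results. A larger window adds cross-sentence pairs and therefore more triads.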

  10. Our method • Causal Relation Candidates Matching • 3 Clues for Causal Discovery ⁃ Causal Association ⁃ Contextual Information ⁃ Causal Transitivity Rules • Factor Graph Model

  11. 3 Clues for Causal Discovery • Causal Association ⁃ Frequently co-occurring entities are more likely to be causally related [Do and Roth 2013] ⁃ $e_i$ is a possible cause of entity $e_j$ if $e_j$ happens more frequently with $e_i$ than by itself [Suppes 1970] • Contextual Information ⁃ Causal relations in the text tend to share special contexts ⁃ Like domain-related words, causal triggers, connectives, etc. • Causation Transitivity Rule
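
One common way to write the Suppes-style condition quoted above is as a comparison of a conditional and a marginal probability. The notation below is an assumption for illustration (the slide states the condition only in words); $P$ denotes probabilities estimated from co-occurrence in text.

```latex
% e_i is a possible cause of e_j if observing e_i raises the probability of e_j:
P(e_j \mid e_i) > P(e_j)
% For P(e_i) > 0 this is equivalent to positive pointwise mutual information:
\frac{P(e_i, e_j)}{P(e_i)\,P(e_j)} > 1
\quad\Longleftrightarrow\quad
\log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)} > 0
```

The condition is symmetric in $e_i$ and $e_j$, so on its own it signals association rather than direction.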

  12. Causal Association • Modeling causal association: $CA(e_{ij}) = I(e_i, e_j) \times D(e_i, e_j) \times \mathrm{Max}(u_i, u_j)$ ⁃ Larger mutual information: $I(e_i, e_j) = \log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)}$ ⁃ Award pairs that co-occur closer together, while penalizing those that are further apart in text: $D(e_i, e_j) = -\log \frac{|sent(e_i) - sent(e_j)| + 1}{2 \times WS}$ ⁃ Model the frequency of co-occurrence of two medical entities, $\mathrm{Max}(u_i, u_j)$, where $u_i = \frac{P(e_i, e_j)}{\max_k P(e_i, e_k) - P(e_i, e_j) + \epsilon}$ and $u_j = \frac{P(e_i, e_j)}{\max_k P(e_k, e_j) - P(e_i, e_j) + \epsilon}$
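
A small numeric sketch may help make the score concrete. It assumes the slide's formula is read as the product of three terms: a pointwise mutual information term, a sentence-distance term over a window of size WS, and the larger of two relative co-occurrence ratios. The function name, argument names, and default `eps` are hypothetical, and the probabilities are passed in directly rather than estimated from a corpus.

```python
import math


def causal_association(p_ij, p_i, p_j, max_p_ik, max_p_kj,
                       sent_i, sent_j, window_size, eps=1e-6):
    """Association score for an entity pair, read as I * D * Max(u_i, u_j).

    p_ij: joint probability of the pair; p_i, p_j: marginals.
    max_p_ik / max_p_kj: largest joint probability of e_i (resp. e_j) with any entity.
    sent_i, sent_j: sentence indices of the two mentions; window_size: WS.
    """
    # Mutual information term: log P(e_i, e_j) / (P(e_i) P(e_j)).
    pmi = math.log(p_ij / (p_i * p_j))
    # Distance term: larger when the two mentions are closer together.
    dist = -math.log((abs(sent_i - sent_j) + 1) / (2 * window_size))
    # Relative co-occurrence of this pair versus each entity's strongest pairing.
    u_i = p_ij / (max_p_ik - p_ij + eps)
    u_j = p_ij / (max_p_kj - p_ij + eps)
    return pmi * dist * max(u_i, u_j)


same_sentence = causal_association(0.1, 0.2, 0.25, 0.2, 0.3, 0, 0, window_size=2)
next_sentence = causal_association(0.1, 0.2, 0.25, 0.2, 0.3, 0, 1, window_size=2)
print(same_sentence > next_sentence)
```

With all other inputs fixed, the same-sentence pair scores higher than the adjacent-sentence pair, which is the behaviour the distance term is meant to produce.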

  13. Contextual Information (1) • Encode Synthetic Context

  14. Contextual Information (2) • Encode context based on pre-trained word2vec Word Embedding • Three ways

  15. Causation Transitivity Rules • Angle rules and triadic rule

  16. Integrate 3 Clues • By combining evidence from textual support and structural inference, the three clues together are better equipped to discover causal relations. • They are complementary in several ways: ⁃ Causal association gives preference to frequently co-occurring causal pairs. ⁃ Causal transitivity rules identify causal relations that have little textual support but follow the transitivity rule, and generate new causal hypotheses. ⁃ Contextual information from the text can potentially eliminate frequently co-occurring medical entities that are not causally related.

  17. Our method • Causal Relation Candidates Matching • 3 Clues for Causal Discovery ⁃ Causal Association ⁃ Contextual Information ⁃ Causal Transitivity Rules • Factor Graph Model

  18. CausalTriad: Factor Graph for Each Triad Structure

  19. Experiments • Data collection ⁃ TCM consists of the abstracts of 106,151 papers. ⁃ HealthBoards consists of posts on health and medical issues such as diseases, symptoms, medicines, and side effects.

  20. Experimental Results • Generating new causal relation hypotheses

  21. Experimental Results • Different types of causal relations ⁃ DISEASE – cause – SYMPTOM ⁃ FORMULA – against – DISEASE ⁃ HERB – against – DISEASE ⁃ FORMULA – relieve – SYMPTOM ⁃ HERB – relieve – SYMPTOM ⁃ DISEASE – bring – DISEASE ⁃ DRUG – against – DISEASE ⁃ DISEASE – cause – SYMPTOM

  22. Experimental Results • Patterns of causal reasoning rules

  23. Experimental Results • Causal relation extraction

  24. Experimental Results • Extracting causal relations from single sentences and multiple sentences • Extracting implicit causal relations

  25. Influence Factors • Influence from the size of labeled training data

  26. Influence Factors • Influence from the number of bootstrapping rounds and window size

  27. Conclusions • We propose CausalTriad to incorporate both textual and structural clues for causal relation discovery from text. • Experimental results on two datasets demonstrate that: ⁃ CausalTriad is effective for discovering explicit and implicit causal relations from both single and multiple sentences. ⁃ CausalTriad can generate new causal relation hypotheses through inference.

  28. Thank You! Any comments and suggestions? • Sendong (Stan) Zhao, Meng Jiang, Ming Liu, Bing Qin, Ting Liu • Homepage: http://ir.hit.edu.cn/~sdzhao/ • Email: zhaosendong@gmail.com
