
Automatic Detection of Dementia in Mandarin Chinese

Bai Li, advised by Frank Rudzicz. Li B., Hsu Y-T., Rudzicz F. “Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus”. To appear at NAACL 2019.


  1. Automatic Detection of Dementia in Mandarin Chinese • Bai Li, advised by Frank Rudzicz • Li B., Hsu Y-T., Rudzicz F. “Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus”. To appear at NAACL 2019.

  2. Alzheimer’s Disease (AD) and Dementia • Neurodegenerative disease • 5.7 million patients in the USA, 50 million worldwide • Symptoms (early): forgetfulness, language impairment • Symptoms (late): loss of motor control, death • One of the most costly diseases • No known cure • For this presentation, Alzheimer’s disease ≈ dementia

  3. Why detect Alzheimer’s disease? • Early treatment: no known drugs slow the progression of AD, but treatment can reduce symptoms • Clinical trials: current treatments may be ineffective because they are started too late

  4. Detecting AD • Many tests: MRI, PET scan • Cognitive tests: category naming, picture naming, picture description • Cognitive tests are cheap and non-intrusive, and can serve as a screening mechanism

  5. Category Fluency • Name as many {animals, fruits, colours} as possible in 60 seconds

  6. Picture Description • Describe this picture in as much detail as possible

  7. Linguistic impairment of AD • People with dementia use language differently • Word-finding difficulties: “the boy is standing on the chair” • More pronouns and adverbial constructions: “he’s reaching up there” • Acoustic abnormalities: higher pause rate, slower speech • Less complex sentences

  8. Feature extraction for AD detection • Automated tools extract relevant features • Length of narration • Vocabulary diversity • Type-token ratio: $\frac{\text{unique words}}{\text{total words}}$ • Frequency metrics in corpus
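The type-token ratio on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' feature extractor: the function name `type_token_ratio`, the whitespace tokenization, and the choice to lowercase tokens before counting are all assumptions.

```python
def type_token_ratio(tokens):
    """Return (number of unique words) / (total number of words)."""
    if not tokens:
        return 0.0
    # Assumption: case-insensitive types, so "The" and "the" count once.
    types = {t.lower() for t in tokens}
    return len(types) / len(tokens)


# Hypothetical narration, tokenized on whitespace for illustration only.
narration = "the boy is on the stool and the girl is laughing".split()
ttr = type_token_ratio(narration)  # 8 unique words out of 11 tokens
```

A lower ratio means more repetition. Raw type-token ratio also depends on narration length, so it is usually read alongside the length feature listed on the same slide.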

  9. Syntactic features • Part-of-speech tag counts (e.g., #adj, #noun, #pronoun/#noun) • Constituency parse tree: max, mean, and median heights; production rule counts; lengths of clauses, dependent clauses, and coordinate phrases • Dependency parse tree: mean, median, and max dependency distance
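As a rough illustration of the dependency-distance features, the sketch below takes a parse given as a list of head indices and computes the mean, median, and max distance between each word and its head. The head list for the example sentence is hand-written for illustration (not parser output), and the function name `dependency_distances` is hypothetical.

```python
from statistics import mean, median


def dependency_distances(heads):
    """heads[i] is the 1-based position of the head of token i+1; 0 marks the root."""
    # The root has no head, so it contributes no distance.
    return [abs(h - i) for i, h in enumerate(heads, start=1) if h != 0]


# Tokens: she(1) 's(2) standing(3) on(4) a(5) chair(6)
# Assumed heads: she->standing, 's->standing, standing=root,
# on->chair, a->chair, chair->standing
heads = [3, 3, 0, 6, 6, 3]
dists = dependency_distances(heads)

mean_dist = mean(dists)
median_dist = median(dists)
max_dist = max(dists)
```

Longer distances suggest that related words sit further apart in the sentence, which is one way to quantify syntactic complexity.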

  10. Machine learning to detect dementia • Fraser et al. (2016) extract over 400 features and achieve 81% classification accuracy using logistic regression • Fraser, Kathleen C., Jed A. Meltzer, and Frank Rudzicz. "Linguistic features identify Alzheimer’s disease in narrative speech." Journal of Alzheimer's Disease 49.2 (2016): 407-422.

  11. DementiaBank • Collected between 1983 and 1988 at the University of Pittsburgh • 551 cookie theft narrations (241 healthy, 310 dementia) • Mini-Mental State Exam (MMSE), scored out of 30 • Other tasks • Demographic information, diagnosis

  12. Mandarin Dataset: Lu Corpus • 49 speakers of Taiwanese Mandarin • Several tasks for each speaker: cookie theft picture description, category fluency (animals, fruits, colours, places in Taiwan), picture naming (30 items) • Transcripts of the picture description available • Diagnostic information unknown

  13. Dementia Score using PCA • Derive a proxy score for dementia
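The slide does not say which variables enter the PCA. One plausible reading, shown here purely as an assumption, is to standardize each speaker's scores on the cognitive tasks and take the first principal component as the proxy. The data below is synthetic and the name `first_pc_score` is hypothetical.

```python
import numpy as np


def first_pc_score(scores):
    """Project standardized task scores onto their first principal component."""
    X = np.asarray(scores, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Vt[0]
    # The sign of a principal axis is arbitrary; orient it so that
    # higher task scores give a higher proxy score.
    if pc1.sum() < 0:
        pc1 = -pc1
    return Z @ pc1


# Synthetic stand-in for 49 speakers and 5 task scores (not real data).
rng = np.random.default_rng(0)
ability = rng.normal(size=49)
tasks = np.column_stack([ability + 0.3 * rng.normal(size=49) for _ in range(5)])
proxy = first_pc_score(tasks)
```

Because the score is a projection of centered data, it has mean zero and only its ordering across speakers is meaningful, which fits a rank-based evaluation.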

  14. How to detect dementia in Chinese • Q: Why not just do the same thing that we did with English? • A: Not enough data • Solution: combine datasets across different languages • Use transfer learning!

  15. Some Domain Adaptation Methods • Large corpus in domain S, small corpus in domain T • Want an accurate model for domain T • Existing methods require the same features in S and T • Daumé III, Hal. "Frustratingly Easy Domain Adaptation." Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 2007.
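To see why such methods need the same features in S and T, consider the feature augmentation from the cited Daumé III (2007) paper: every feature vector is copied into a shared block and a domain-specific block. The sketch below is illustrative; the function name `augment` and the toy inputs are assumptions.

```python
import numpy as np


def augment(X, domain):
    """Map features to [shared, source-only, target-only] blocks."""
    X = np.asarray(X, dtype=float)
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")


# Toy feature vectors; both domains must share the same two features.
source_aug = augment([[1.0, 2.0]], "source")
target_aug = augment([[3.0, 4.0]], "target")
```

The shared block only makes sense if column j means the same thing in both domains. That condition fails when English and Mandarin features are extracted with different tag sets and tools.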

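  16. Cross-language features: Difficulties • Mandarin: 她站在一个椅子上 (gloss: she-stand-at-one-CL-chair-on) • English: she's standing on a chair • English tags: she/PRP 's/VBZ standing/VBG on/IN a/DT chair/NN • Mandarin tags: 她/PN 站/VV 在/P 一/CD 个/M 椅子/NN 上/LC • How the tags align across the two languages is unclear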

  17. Cross-language features: Difficulties • Experiments: poor accuracy with universal cross-language features • The model must learn not only to detect dementia but also how features correspond across languages • We only have n=49 samples!

  18. Idea: Extract features separately, learn correspondences using out-of-domain data!

  19. Movie subtitles! • OpenSubtitles corpus, containing aligned subtitles in 62 languages

  20. Baselines • (1) Unilingual: train a model using Mandarin data only, evaluate using cross-validation • (2) Google Translate: translate the Mandarin narration into English, then run the English classifier

  21. Evaluation: Spearman’s correlation between the model’s output and the dementia score
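Spearman's correlation is the Pearson correlation of the ranks, so it rewards any monotone agreement between model output and dementia score. The following is a self-contained sketch with tied values given average ranks; the helper names are hypothetical, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))
```

A rank-based measure suits this setting because the dementia score is a proxy whose scale is arbitrary, and only the ordering of speakers matters.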

  22. Proposed Model (Learning Feature Correspondence) • (1) Extract feature vector $x$ in Chinese • (2) Extract feature vector $y$ in English, independently • (3) Learn mapping function $f: x \to y$ using the OpenSubtitles movie dialogue corpus • This is a multi-output regression problem

  23. Proposed Model (Learning Feature Correspondence) • Unsupervised with respect to Mandarin: only English dementia data is used during training

  24. Independent Linear Regressions • For each target feature, train a separate linear regression • Use ElasticNet regularization with an independent hyperparameter search per target • [Diagram: Chinese and English feature columns (num_characters, num_words, pronoun_count, noun_verb_ratio, …)]
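A minimal sketch of the per-target setup, assuming scikit-learn is available: one `ElasticNetCV` model is fitted for each English feature, so each target gets its own hyperparameter search. The `l1_ratio` grid, the 3-fold cross-validation, the helper names, and the synthetic data are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV


def fit_independent_regressions(X_zh, Y_en):
    """Fit one ElasticNet model per English target feature."""
    models = []
    for j in range(Y_en.shape[1]):
        # Each target gets its own cross-validated search over alpha and l1_ratio.
        model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=3)
        model.fit(X_zh, Y_en[:, j])
        models.append(model)
    return models


def predict_english_features(models, X_zh):
    """Stack the per-target predictions into one feature matrix."""
    return np.column_stack([m.predict(X_zh) for m in models])


# Synthetic stand-in for aligned sentence pairs (not real subtitle data).
rng = np.random.default_rng(0)
X_zh = rng.normal(size=(200, 4))
Y_en = X_zh @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(200, 3))

models = fit_independent_regressions(X_zh, Y_en)
Y_hat = predict_english_features(models, X_zh)
```

Fitting the targets separately keeps each search cheap, but it ignores any structure shared between the English features.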

  25. Reduced Rank Regression • Problem: independent regressions do not take advantage of relationships between outputs • Solution: constrain the coefficient matrix to rank $R$ • Note: equivalent to a linear neural network with a hidden layer of size $R$ • [Diagram: Chinese and English feature columns (num_words, num_characters, pronoun_count, noun_verb_ratio, …)]
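The classical closed-form solution to reduced rank regression can be sketched with NumPy: fit ordinary least squares, then project the coefficients onto the top right singular vectors of the fitted values. This assumes centered data with no intercept and no regularization. The function name and the synthetic data are hypothetical, and the slide does not say whether this exact estimator was used.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Return a coefficient matrix of at most the given rank minimizing ||Y - XB||."""
    # Step 1: unconstrained least-squares solution.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: principal directions of the fitted values in output space.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T
    # Step 3: project the coefficients onto those directions.
    return B_ols @ V_r @ V_r.T


# Synthetic data whose true coefficient matrix has rank 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
true_B = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
Y = X @ true_B + 0.01 * rng.normal(size=(100, 5))

B = reduced_rank_regression(X, Y, rank=2)
```

The product `(B_ols @ V_r) @ V_r.T` has the same shape as a linear network with a bottleneck: the first factor maps inputs to a hidden layer of size `rank`, and the second maps the hidden layer to the outputs.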

  26. Joint Feature Selection • Problem: some features are noisy or impossible to reconstruct • Solution: order by $R^2$, use only the top $K$ features • [Diagram: Chinese and English feature columns (num_words, num_characters, pronoun_count, noun_verb_ratio, …), with example scores $R^2 = 0.8$ and $R^2 = 0.1$]
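Ordering features by $R^2$ and keeping the top $K$ can be sketched as follows. It assumes `y_true` and `y_pred` hold the true and reconstructed English features on held-out sentence pairs, one column per feature. The function names and toy numbers are illustrative assumptions.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination for each column (feature)."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


def top_k_features(y_true, y_pred, k):
    """Return indices of the k best-reconstructed features, best first."""
    scores = r_squared(y_true, y_pred)
    order = np.argsort(scores)[::-1]
    return order[:k], scores


# Toy example: feature 0 is reconstructed exactly, feature 1 is predicted
# by its mean (R^2 = 0), and feature 2 is reconstructed with small errors.
y_true = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 4.0]])
y_pred = np.array([[1.0, 2.5, 1.5], [2.0, 2.5, 2.0], [3.0, 2.5, 3.0], [4.0, 2.5, 3.5]])

selected, scores = top_k_features(y_true, y_pred, k=2)
```

Only the selected columns would then be passed to the English classifier, so poorly reconstructed features cannot inject noise into its input.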

  27. Results • Figure annotation: $p = 0.06$ • Initial model not very good • Reduced rank regression also not effective • Joint feature selection beats baselines

  28. Results: Number of features • [Figure: accuracy of the English classifier using K features, and performance of the whole model]

  30. Ablation study • About 1000-2000 parallel sentences needed

  31. Summary • First use of NLP to detect dementia in Mandarin Chinese • (1) Extracted lexicosyntactic features in English and Chinese • (2) Used an out-of-domain corpus to learn a correspondence model • (3) Combined with an English dementia classifier

  32. Future Work • Need for human transcripts • Incorporate speech data • Apply to other languages (French, Korean) • Collect quality data in multiple languages
