Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease


  1. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease
Juan Camilo Vásquez-Correa (1, 2), Juan Rafael Orozco-Arroyave (1, 2), Elmar Nöth (2)
(1) GITA research group, University of Antioquia UdeA.
(2) Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg.
jcamilo.vasquez@udea.edu.co
18th INTERSPEECH, 2017. November 9, 2017

  2. Outline: Introduction, Methods, Experimental framework, Results, Conclusion

  4. Introduction: Parkinson’s Disease
◮ Second most prevalent neurodegenerative disorder worldwide.
◮ Patients develop several motor and non-motor impairments (O. Hornykiewicz 1998).
◮ Speech impairments are among the earliest manifestations.

  5. Introduction: Speech impairments
Speech impairments in PD patients are known as hypokinetic dysarthria, which affects phonation, articulation, prosody, and intelligibility.
[Figure: the four affected speech dimensions, illustrated with /pa-ta-ka/ examples.]

  6. Introduction: Imprecise articulation
◮ One of the most deviant speech dimensions in PD.
◮ Reduced velocity of lip, tongue, and jaw movements.
◮ The literature strongly indicates that imprecise consonants are caused by a reduced range of movement of the articulators.
[Figure: /pa-ta-ka/ example.]

  7. Introduction: Hypothesis
PD patients have difficulty starting and stopping vocal fold vibration, and these difficulties can be observed in speech signals by modeling the transitions between voiced and unvoiced sounds.
[Figure: an onset transition (unvoiced to voiced) and an offset transition (voiced to unvoiced).]

  8. Introduction: Aims
◮ To model the time-frequency (TF) information provided by the onset and offset transitions using the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT).
◮ To “learn” features from the time-frequency representations with a convolutional neural network (CNN).
◮ Why TF representations and feature learning? Both have been used successfully in several paralinguistic tasks, such as emotion, deception, and depression detection, among others.

  10. Methods
Pipeline: transition detection → time-frequency representations → convolutional neural network

  11. Methods: Transition detection
Onset (unvoiced to voiced) and offset (voiced to unvoiced) transitions are detected from the presence of the fundamental frequency (F0): an onset is where F0 appears, and an offset is where it disappears.
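The detection step can be sketched as follows. This is a minimal illustration, assuming a frame-level F0 track in which unvoiced frames are reported as 0 (as many pitch trackers do) and a 10 ms frame shift. `find_transitions`, `cut_segments`, and the default `half_width` are hypothetical choices for illustration, not the authors' code or settings.

```python
import numpy as np


def find_transitions(f0, frame_shift=0.01):
    """Locate onset and offset transitions in a frame-level F0 track.

    Frames with f0 > 0 are treated as voiced (assumption: the pitch tracker
    reports 0 for unvoiced frames). Returns times in seconds: onsets are the
    first voiced frame after an unvoiced run, offsets the first unvoiced
    frame after a voiced run.
    """
    voiced = (np.asarray(f0) > 0).astype(int)
    change = np.diff(voiced)  # +1: unvoiced -> voiced, -1: voiced -> unvoiced
    onsets = (np.flatnonzero(change == 1) + 1) * frame_shift
    offsets = (np.flatnonzero(change == -1) + 1) * frame_shift
    return onsets, offsets


def cut_segments(x, fs, times, half_width=0.08):
    """Cut a window of +/- half_width seconds around each transition time.

    Transitions too close to the signal edges are skipped.
    half_width is a hypothetical value, not taken from the slides.
    """
    half = int(round(half_width * fs))
    segments = []
    for t in times:
        center = int(round(t * fs))
        if center - half >= 0 and center + half <= len(x):
            segments.append(x[center - half:center + half])
    if not segments:
        return np.empty((0, 2 * half))
    return np.stack(segments)


# Usage: 10 ms frames; voicing starts at frames 2 and 6, stops at frames 4 and 7.
f0 = [0, 0, 120, 125, 0, 0, 110, 0]
onsets, offsets = find_transitions(f0)
```

Each segment returned by `cut_segments` is then the input to the time-frequency analysis; skipping edge transitions keeps every segment the same length, which a CNN with a fixed input size needs.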

  12. Methods: Time-frequency representation
[Figure: STFT of an onset transition for a PD patient (left) and an HC subject (right); frequency (Hz, 0 to 4000) vs. time (s, 0 to 0.14).]
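An STFT image of one transition segment could be computed along these lines. This numpy-only sketch assumes a Hann window; the window length (15 ms), hop (2.5 ms), and FFT size (256) are hypothetical values, not the parameters used in the study, and `stft_magnitude` is a hypothetical helper.

```python
import numpy as np


def stft_magnitude(x, fs, win_len=0.015, hop=0.0025, n_fft=256):
    """Magnitude STFT of a short segment as a (frequency, time) image.

    win_len and hop are in seconds. All three defaults are hypothetical.
    """
    n_win = int(round(win_len * fs))
    n_hop = int(round(hop * fs))
    if n_win > n_fft:
        raise ValueError("n_fft must be >= the window length in samples")
    window = np.hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // n_hop
    frames = np.stack([x[i * n_hop:i * n_hop + n_win] * window
                       for i in range(n_frames)])
    # rfft zero-pads each frame to n_fft samples; transpose to (freq, time)
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, spec


# Usage: a 160 ms, 1 kHz test tone sampled at 16 kHz.
fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(2560) / fs)
freqs, spec = stft_magnitude(x, fs)
```

A log compression such as `np.log1p(spec)` is a common extra step before feeding the image to a CNN; the slides do not say whether one was used.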

  13. Methods: Time-frequency representation
[Figure: CWT of an onset transition for a PD patient (left) and an HC subject (right); scale (up to 500) vs. time (s, 0 to 0.14).]
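A CWT scalogram can be sketched in the same spirit. The slides do not name the mother wavelet, so this example assumes an analytic Morlet wavelet with w0 = 6 and computes the transform in the frequency domain, following the Torrence and Compo formulation; `cwt_morlet` is a hypothetical helper.

```python
import numpy as np


def cwt_morlet(x, scales, w0=6.0):
    """Scalogram |W(s, t)| using an analytic Morlet wavelet (sample spacing 1).

    Each scale is computed as a product in the frequency domain, following
    the Torrence and Compo formulation. Because the FFT makes the
    convolution circular, values near the segment edges include wrap-around.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(len(x))  # angular frequency, rad/sample
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        # Fourier transform of the Morlet wavelet at scale s; the step
        # (omega > 0) keeps positive frequencies only (analytic wavelet).
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        out[i] = np.abs(np.fft.ifft(x_hat * np.sqrt(2 * np.pi * s) * psi_hat))
    return out


# Usage: a sine at 0.05 cycles/sample responds most strongly near
# scale w0 / (2 * pi * 0.05), i.e. roughly 19.
n = 2560
x = np.sin(2 * np.pi * 0.05 * np.arange(n))
scales = np.arange(1, 65)
S = cwt_morlet(x, scales)
peak_scale = scales[S.mean(axis=1).argmax()]
```

Large scales correspond to low frequencies, which is why the vertical axis of a scalogram is often plotted as scale rather than frequency.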

  14. Methods: Convolutional neural network
Architecture: input layer (time-frequency representation) → convolution layer I → max-pooling layer 1 → convolution layer II → max-pooling layer 2 → fully connected MLP → PD vs. HC decision.
The CNN learns high-level representations from the low-level time-frequency input.
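The layer sequence above can be illustrated with a forward pass written in plain numpy. Every layer size is an assumption (16 filters of 5×5 per convolution layer, 2×2 max-pooling, a 64-unit hidden layer, ReLU activations, a softmax output over two classes); training is omitted and would in practice be handled by a deep learning framework. `init_params` and `forward` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w, b):
    """'Valid' 2-D cross-correlation, as computed by most CNN libraries.

    x: (C, H, W), w: (F, C, kh, kw), b: (F,) -> (F, H - kh + 1, W - kw + 1)
    """
    kh, kw = w.shape[2:]
    patches = sliding_window_view(x, (kh, kw), axis=(1, 2))  # (C, H', W', kh, kw)
    return np.einsum("chwij,fcij->fhw", patches, w) + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def max_pool2(x):
    """Non-overlapping 2x2 max-pooling; an odd last row/column is dropped."""
    f, h, w = x.shape
    x = x[:, :h - h % 2, :w - w % 2]
    return x.reshape(f, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def init_params(input_shape, n_filters=(16, 16), k=5, hidden=64, seed=0):
    """Random (untrained) weights; every size here is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    h, w = input_shape
    for _ in range(2):  # two stages of (k x k valid conv, 2x2 pool)
        h, w = (h - k + 1) // 2, (w - k + 1) // 2
    n1, n2 = n_filters
    return {
        "w1": rng.normal(0, 0.1, (n1, 1, k, k)), "b1": np.zeros(n1),
        "w2": rng.normal(0, 0.1, (n2, n1, k, k)), "b2": np.zeros(n2),
        "w3": rng.normal(0, 0.1, (hidden, n2 * h * w)), "b3": np.zeros(hidden),
        "w4": rng.normal(0, 0.1, (2, hidden)), "b4": np.zeros(2),
    }


def forward(tf_image, p):
    """Map one time-frequency image (H, W) to two class probabilities."""
    x = np.asarray(tf_image, dtype=float)[None]        # channel axis: (1, H, W)
    x = max_pool2(relu(conv2d(x, p["w1"], p["b1"])))   # conv layer I + max-pool 1
    x = max_pool2(relu(conv2d(x, p["w2"], p["b2"])))   # conv layer II + max-pool 2
    h = relu(p["w3"] @ x.ravel() + p["b3"])            # fully connected hidden layer
    logits = p["w4"] @ h + p["b4"]
    e = np.exp(logits - logits.max())                  # numerically stable softmax
    return e / e.sum()                                 # class order is arbitrary here


image = np.random.default_rng(1).random((40, 40))  # stand-in for an STFT/CWT image
params = init_params(image.shape)
probs = forward(image, params)
```

Pooling twice shrinks a 40×40 input to 7×7 per filter map before the fully connected layer, which is what keeps the MLP small.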

  16. Data
◮ Three databases with recordings in three languages: Spanish, German, and Czech.
◮ Diadochokinetic exercises, isolated sentences, read texts, and monologues.

  17. Data
Table: Databases
| Language | PD patients | Healthy controls | Notes |
|---|---|---|---|
| Spanish | 50 | 50 | Balanced in age (60 years) and gender. Patients in the middle stage of the disease. |
| German | 88 | 88 | Balanced in age (64 years). Patients in low and middle stages of the disease. |
| Czech | 20 | 15 | All male speakers. Patients diagnosed during the recording session. |

  18. Experiments and validation
◮ Classification of PD patients vs. HC subjects within the same language.
  ◮ 10-fold cross-validation: 8 folds for training, 1 for hyper-parameter optimization, and 1 for testing.
◮ Cross-language classification.
  ◮ One language is used for training and validation, and another language for testing.
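The rotation of the ten folds can be sketched as below. It assumes that subjects, rather than individual recordings, are assigned to folds, and that the fold following the test fold is used for hyper-parameter tuning; the slides specify neither detail, and `rotating_folds` is a hypothetical helper.

```python
import numpy as np


def rotating_folds(n_subjects, n_folds=10, seed=0):
    """Yield (train, validation, test) subject indices for each round.

    Assigning subjects (not recordings) to folds keeps a speaker out of
    more than one partition; this is an assumption. In round k, fold k is
    the test set, fold (k + 1) mod n_folds tunes hyper-parameters, and the
    remaining n_folds - 2 folds (8 when n_folds = 10) are used for training.
    """
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n_subjects) % n_folds  # equal-sized folds
    for test in range(n_folds):
        val = (test + 1) % n_folds
        train_idx = np.flatnonzero((fold_of != test) & (fold_of != val))
        val_idx = np.flatnonzero(fold_of == val)
        test_idx = np.flatnonzero(fold_of == test)
        yield train_idx, val_idx, test_idx


# Usage: 100 subjects (e.g. a corpus of 50 patients and 50 controls).
splits = list(rotating_folds(100))
```

This assignment is not stratified by class; with balanced PD/HC corpora, a stratified variant that builds folds separately for patients and controls would keep each fold balanced.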

  19. Experiments and validation
◮ Results are compared with a previous study [1] that used a support vector machine (SVM); this is the baseline in the results tables.
[1] Juan Camilo Vásquez-Correa et al. “Effect of acoustic conditions on algorithms to detect Parkinson’s disease from speech”. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5065–5069.

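  21. Results: same language for train and test
TFR = time-frequency representation.
| Language | TFR | Onset | Offset | Onset+Offset |
|---|---|---|---|---|
| Spanish | CNN-STFT | 85.3 | 81.6 | 85.9 |
| | CNN-CWT | 84.2 | 81.8 | 85.2 |
| | Baseline | 69.3 | 69.6 | 71.6 |
| German | CNN-STFT | 70.3 | 68.0 | 75.0 |
| | CNN-CWT | 68.0 | 66.9 | 70.5 |
| | Baseline | 72.7 | 70.9 | 74.0 |
| Czech | CNN-STFT | 77.9 | 80.4 | 84.4 |
| | CNN-CWT | 89.2 | 87.7 | 89.4 |
| | Baseline | 75.3 | 74.4 | 78.8 |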

  24. Results: same language for train and test
[Figure: Output of the CNN after the last max-pooling layer for a PD patient (left) and an HC speaker (right); frequency (Hz, 0 to 4000) vs. time (ms, 0 to 150), colour scale from low to high energy.]

  25. Results: same language for train and test
| Speech task | Spanish | German | Czech |
|---|---|---|---|
| Read text | 85.0 | 70.3 | 88.5 |
| Monologue | 85.6 | 70.3 | 89.1 |
| /pa-ta-ka/ | 85.4 | 70.7 | 89.2 |

  26. Results: different language for train and test
| Train language | Test language | TFR | Onset | Offset | Onset+Offset |
|---|---|---|---|---|---|
| Spanish | German | CNN-STFT | 51.7 | 50.2 | 54.7 |
| Spanish | German | Baseline | 53.7 | 55.0 | 54.1 |
| Spanish | Czech | CNN-CWT | 55.2 | 55.4 | 57.9 |
| Spanish | Czech | Baseline | 60.3 | 57.4 | 60.4 |
| German | Spanish | CNN-STFT | 58.0 | 55.7 | 55.8 |
| German | Spanish | Baseline | 53.5 | 53.5 | 53.6 |
| German | Czech | CNN-STFT | 53.0 | 52.4 | 53.0 |
| German | Czech | Baseline | 50.9 | 51.7 | 52.6 |
| Czech | Spanish | CNN-CWT | 53.8 | 56.3 | 56.7 |
| Czech | Spanish | Baseline | 53.4 | 51.6 | 52.4 |
| Czech | German | CNN-STFT | 54.0 | 51.8 | 54.0 |
| Czech | German | Baseline | 51.2 | 51.0 | 50.7 |

  30. Conclusion
◮ A deep learning approach is proposed to model articulation impairments of PD patients.
◮ Voiced-unvoiced transitions are modeled with CNNs using STFT and CWT representations.

  31. Conclusion
◮ The proposed method classifies PD patients and HC subjects and improves on the baseline in most settings when the same language is used for training and testing (for German, only the onset+offset CNN-STFT model exceeds the baseline).
◮ Additional approaches are needed when the training and test languages differ.
◮ Recurrent neural networks and other architectures may be considered to assess co-articulation.
◮ Deep learning approaches trained with phonation, articulation, and prosody information could be developed to evaluate specific speech impairments.
