CAPT Xie Yanlu Beijing Language and Culture University Outline - PowerPoint PPT Presentation

  1. Landmark in Chinese CAPT Xie Yanlu Beijing Language and Culture University

  2. Outline: English landmark; Methods to select Chinese landmark; Experiments in Chinese CAPT; Discussion

  3. 2016/2/11

  4. Objective in using computer-aided pronunciation training (CAPT). Basic fact: a learner's erroneous sound always deviates a little from the canonical sound. Example for the lip feature (rounding vs. spreading) with Pinyin "e" and "o": the "e" sound; rounded "e", written e{o}; spread "o", written o{w}; the "o" sound. Mispronunciation detection is a typical distinctive-feature selection problem.

  5. Quantal nonlinearities: high-slope nonlinearities are natural category boundaries (Stevens, 1989). [Figure: acoustics plotted against articulation, with two stable regions.] Natural category = robustness to noise and variation; therefore languages tend to choose natural boundaries as their distinctive features.

  6. Nonlinear Map from Acoustic Features to Perceptual Features (Kuhl 1992)

  7. Consonant Confusions at -6dB SNR
     Columns: P T K F TH S SH B D G V DH Z ZH M N
     Each row lists the counts for one consonant in left-to-right column order; empty cells are omitted, so the counts do not align with the column headers.
     P: 80 43 64 17 14 6 2 1 1 1 1 2
     T: 71 84 55 5 9 3 8 1 1 1
     K: 66 76 107 12 8 9 4 1 1
     F: 18 12 9 175 48 11 1 7 2 1 2 2
     TH: 19 17 16 104 64 32 7 5 4 5 6 4 5
     S: 8 5 4 23 39 107 45 4 2 3 1 1 3 2 1
     SH: 1 6 3 4 6 29 195 3 1
     B: 1 5 4 4 136 10 9 47 16 6 1 5 4
     D: 8 5 80 45 11 20 20 26 1
     G: 2 3 63 66 3 19 37 56 3
     V: 2 2 48 5 5 145 45 12 4
     DH: 6 31 6 17 86 58 21 5 6 4
     Z: 1 1 17 20 27 16 28 94 44 1
     ZH: 1 26 18 3 8 45 129 2
     M: 1 4 4 1 3 177 46
     N: 4 1 5 2 7 1 6 47 163
     Distinctive features: ± nasal, ± voiced, ± fricative, ± strident

  8. Pronunciation Erroneous Tendency (PET). Confusions in CAPT.
     Diacritics: raising, lowering, advancing, backing, lengthening, shortening, centralizing, rounding, spreading, labiodentalizing, laminalizing, devoicing, voicing, insertion, deletion, stopping, fricativizing, lateral, nasalizing, flapping.
     Examples of PETs with their diacritic and notation:
     Spreading: diacritic w, e.g. u{w}. The round sound "u" has a spreading lip.
     Backing: diacritic -, e.g. n{-}. The tongue position of the phoneme is a little back.
     Shortening: diacritic ";", e.g. p{;}. The aspiration duration of phoneme p is shorter.
     Laminalizing: diacritic sh, e.g. sh{sh}. The blade-palatal phoneme sh is pronounced as the Japanese lamina-alveolar.

  9. Confusions in CAPT. [Table: PETs (Laminalizing, Backing, Spreading, Shortening) and the diacritic-marked confusions for each phone; cell alignment lost, entries in original order:] sh sh, x zh zh, z, j ch ch, q, q6, en x sh j x, sh an an, ang, e v v, j ang ang ing ing u u, iu, q6 f f eng eng, ang q, j, i|sh|, q zh|sh| k k, g r r uo uo

  10. Phonetic landmark: a phonetic landmark is an instantaneous speech event that is perceptually salient ("salient" = easy to detect), and that has high information density about the message the speaker wishes to communicate.

  11. Landmarks are redundant (Stevens, 1999). To recognize a stop consonant, it is necessary and sufficient to hear any one of these (figure label: "backed"): release into vowel; closure from vowel; "ejective" burst. These are three "acoustic landmarks" with very different spectral patterns.

  12. Landmark locations. Four different candidate landmark locations: (1) the temporal midpoint of the vowel; (2) the boundary between the vowel and the consonant; (3) the middle of the consonant; (4) the boundary between the consonant and its following segment.
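The four candidate locations can be computed directly from segment boundaries. The sketch below is only illustrative: it assumes a vowel immediately followed by a consonant, with times in seconds, and the function and key names are hypothetical rather than taken from the slides.

```python
def candidate_landmarks(vowel_start, vowel_end, cons_end):
    """Return the four candidate landmark times for a vowel + consonant pair.

    Assumption: the consonant starts exactly where the vowel ends.
    """
    return {
        "vowel_midpoint": (vowel_start + vowel_end) / 2,
        "vowel_consonant_boundary": vowel_end,
        "consonant_midpoint": (vowel_end + cons_end) / 2,
        "consonant_next_boundary": cons_end,
    }


# Made-up timings: vowel from 0.0 s to 0.5 s, consonant from 0.5 s to 1.0 s.
marks = candidate_landmarks(0.0, 0.5, 1.0)
```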

  13. English Landmark
     1) For all vowel-type phones (these usually have labels that start with the letters a, e, i, o, u, for example [ih], [ae], etc.) => find the middle of the interval = (start time + end time)/2 and put a V landmark.
     2) For all glide-type phones ([h], [w], [y], [r], [l]) => find the middle of the interval, and put a G landmark.
     3) For all nasal-type phones ([m], [n], [ng]) => at the start time, put the Nc landmark, and at the end time, put the Nr landmark.
     4) For all stop-closure phones ([b-cl], [d-cl], etc.) => at the start time, put the Sc landmark.
     5) For all stop-type phones ([b], [d], etc.) => at the start time, put the Sr landmark.
     6) For all fricative-type phones ([v], [dh], [z], etc.) => at the start time, put the Fc landmark, and at the end time, put the Fr landmark.
     7) For all affricate-type phones ([jh] or [dj], [ch]) => at the start time, put the Sr landmark, and also put the Fc landmark, and at the end time, put the Fr landmark.
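The seven rules above can be sketched as a small lookup function. This is a minimal illustration, not the authors' implementation: the stop and fricative inventories are assumptions (the slide only lists "[b], [d], etc." and "[v], [dh], [z], etc."), and the Fc landmark of an affricate is placed at the start time, which is one reading of rule 7.

```python
VOWEL_INITIALS = set("aeiou")
GLIDES = {"h", "w", "y", "r", "l"}
NASALS = {"m", "n", "ng"}
AFFRICATES = {"jh", "dj", "ch"}
# Assumed inventories; the slide gives only a few examples of each class.
STOPS = {"b", "d", "g", "p", "t", "k"}
FRICATIVES = {"v", "dh", "z", "zh", "f", "th", "s", "sh"}


def landmarks(phone, start, end):
    """Return (time, label) pairs for one labelled phone interval."""
    mid = (start + end) / 2
    if phone in GLIDES:
        return [(mid, "G")]
    if phone in NASALS:
        return [(start, "Nc"), (end, "Nr")]
    if phone.endswith("-cl"):
        return [(start, "Sc")]
    if phone in AFFRICATES:
        # Rule 7, read as: Sr and Fc at the start, Fr at the end.
        return [(start, "Sr"), (start, "Fc"), (end, "Fr")]
    if phone in STOPS:
        return [(start, "Sr")]
    if phone in FRICATIVES:
        return [(start, "Fc"), (end, "Fr")]
    if phone[:1] in VOWEL_INITIALS:
        # Vowel heuristic from rule 1: the label starts with a, e, i, o or u.
        return [(mid, "V")]
    return []


# Made-up alignment for a syllable like "man".
segments = [("m", 0.0, 0.25), ("ae", 0.25, 0.75), ("n", 0.75, 1.0)]
all_marks = [lm for phone, s, e in segments for lm in landmarks(phone, s, e)]
```

The vowel test runs last because it is only a first-letter heuristic; the explicit consonant classes take priority.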

  14. How to find Chinese landmark: refer to English landmark in IPA; perception; observation; intuition/guess?

  15. How to find Chinese landmark: English landmark in CAPT; IPA projection; Chinese landmark in CAPT. Nasal: an/ang, en/eng, in/ing. Dorsal: j q x k/z c s f. Vowel: v u r uo. Zh/ch.

  16. How to find Chinese landmark: perception of modified speech. Each syllable is segmented as I V T N, where V is the pure vowel, T the nasalized vowel and N the nasal consonant; a prime marks a segment taken from the other syllable of the pair.
     IV+t-N (I V T'): the nasal consonant is cut and the nasalized vowel is exchanged.
     IV-T+N (I V N): the nasalized vowel is cut.
     IV-T+n (I V N'): the nasalized vowel is cut and the nasal consonant is exchanged.

  17. /ban/ vs /bang/. Originals: ban1 = V1 T1 N1, bang1 = V2 T2 N2. Six revised stimuli (Revised1 to Revised6) are built from them:
     IV+t-N: V1 T2 and V2 T1 (the nasal consonant is cut and the nasalized vowel is exchanged).
     IV-T+N: V1 N1 and V2 N2 (the nasalized vowel is cut).
     IV-T+n: V1 N2 and V2 N1 (the nasalized vowel is cut and the nasal consonant is exchanged).
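The three manipulations amount to cutting and swapping labelled segments between the two syllables of a pair. The sketch below illustrates that bookkeeping with lists of labels standing in for audio samples; the function name and data layout are hypothetical, and real stimuli would also need careful waveform splicing at the joins.

```python
def modify_pair(a, b):
    """Build the three modified versions of each syllable in a minimal pair.

    `a` and `b` map the segment labels "I", "V", "T", "N" to lists of samples.
    Each result is a tuple (modified a, modified b).
    """
    def join(*parts):
        out = []
        for part in parts:
            out.extend(part)
        return out

    return {
        # Nasal consonant cut, nasalized vowel exchanged.
        "IV+t-N": (join(a["I"], a["V"], b["T"]), join(b["I"], b["V"], a["T"])),
        # Nasalized vowel cut.
        "IV-T+N": (join(a["I"], a["V"], a["N"]), join(b["I"], b["V"], b["N"])),
        # Nasalized vowel cut, nasal consonant exchanged.
        "IV-T+n": (join(a["I"], a["V"], b["N"]), join(b["I"], b["V"], a["N"])),
    }


# Labels stand in for the audio of each segment.
ban = {"I": ["b"], "V": ["V1"], "T": ["T1"], "N": ["N1"]}
bang = {"I": ["b"], "V": ["V2"], "T": ["T2"], "N": ["N2"]}
stimuli = modify_pair(ban, bang)
```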

  18. The nasalized vowels play a dominant role in perception.

  19. How to find Chinese landmark: Dorsal

  20. Following vowel landmark: T and VOT (Wu 1989); coarticulation (Öhman 1966); initial C, first V, T and P all start at the syllable onset (Xu 2006). We cannot explain the result for Dorsal: is it due to the landmark, or due to the coarticulation?

  21. English Landmark & Chinese Landmark

  22. System validation
     Corpus: text, 301 utterances; #speakers, 7 females; #utterances, 1899; #phonemes, 26431; average length per utterance, 14; #kinds of specific PETs, 65.
     Metrics: F1 score; true positive rate (TPR); false positive rate (FPR); Receiver Operating Characteristic (ROC), a metric that formulates the relationship between the true positive rate (TPR) and the false positive rate (FPR).
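For reference, the metrics named on this slide can be computed as follows. This is a minimal sketch on made-up data: the labels and scores are invented, the positive class is assumed to mean "mispronounced", and none of the function names come from the slides.

```python
def counts(labels, preds):
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    tn = sum(1 for y, p in zip(labels, preds) if not y and not p)
    return tp, fp, fn, tn


def tpr_fpr_f1(labels, preds):
    tp, fp, fn, tn = counts(labels, preds)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return tpr, fpr, f1


def roc_points(labels, scores):
    """Sweep the decision threshold over the scores; return (FPR, TPR) points."""
    points = [(0.0, 0.0)]
    for th in sorted(set(scores), reverse=True):
        preds = [s >= th for s in scores]
        tpr, fpr, _ = tpr_fpr_f1(labels, preds)
        points.append((fpr, tpr))
    return points


# Invented detector scores; label 1 = mispronounced (assumed convention).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
tpr, fpr, f1 = tpr_fpr_f1(labels, [s >= 0.5 for s in scores])
curve = roc_points(labels, scores)
```

Each ROC point corresponds to one threshold, so comparing two landmark sets means comparing how their curves trade TPR against FPR.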

  23. Phonetic Labels

  24. Best acoustic cues selected for individual phones

  25. Landmark: onset of vowel. [Figure: Receiver Operating Characteristic (ROC) curves; panel annotations: Nearly the same, Eng>Chn, Chn>Eng.]

  26. Landmark: following vowel. [Figure panel annotations: Eng>Chn, Eng>Chn, Eng>Chn, Chn>Eng.]

  27. Discussion. For most of the 16 phones, English landmarks, located at both the start and the end of the phone duration, slightly outperformed the Chinese landmarks that were defined by empirical analysis of error pairs in the large-scale corpus. Chinese landmarks might lose some significant information for discriminating pronunciation errors, especially for the nasal and fricative phones.

  28. Convolution Forgetting Curve Model Xie Yanlu Beijing Language and Culture University

  29. Outline: Introduction; Exponential shape forgetting curve model; Convolution Forgetting Curve Model; Experiments in cognitive learning

  30. The procedure of memory (Ebbinghaus H., 1913): quantitative description. Mathematical description: $f(t) = a_1 \exp(-a_2 t) + a_3$, the exponential function in forgetting (Wixted, J. T., et al., 1991); $f(t) = a_1 \exp(-t/T_1) + a_2 \exp(-t/T_2) + a_3$ (Rubin, David C., et al., 1999).
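To make the two formulas concrete, the sketch below evaluates both curves. The parameter values are invented for illustration only and are not fitted to any data from the slides.

```python
import math


def single_exponential(t, a1, a2, a3):
    """f(t) = a1 * exp(-a2 * t) + a3."""
    return a1 * math.exp(-a2 * t) + a3


def double_exponential(t, a1, T1, a2, T2, a3):
    """f(t) = a1 * exp(-t / T1) + a2 * exp(-t / T2) + a3."""
    return a1 * math.exp(-t / T1) + a2 * math.exp(-t / T2) + a3


# Invented parameters: retention starts at 1.0 and decays towards a3.
times = [0.0, 1.0, 2.0, 5.0, 10.0]
single = [single_exponential(t, 0.6, 0.5, 0.4) for t in times]
double = [double_exponential(t, 0.4, 1.0, 0.3, 10.0, 0.3) for t in times]
```

In both forms $a_3$ is the level the curve approaches as $t$ grows; the second form adds a slower decay term with time constant $T_2$.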

  31. Exponential shape forgetting curve model Forgetting curve from University of Waterloo

  32. Procedure of convolution memory model (Baddeley A. D., 2000). [Diagram components: Input, Central Executive, Output; Visuo-spatial sketchpad, Episodic Buffer, Phonological loop; Long-term memory.]

  33. Convolution Forgetting Curve Model. Long-term memory formation is the result of the interaction of the input and the central executive in working memory. The relationship between stimulation (study) and memory resembles the interaction of a signal and a system in circuit theory: $y(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau = f(t) * h(t)$

  34. One time learning convolution model (OCM): $y(t) = \delta(t) * h(t) = h(t)$, with $y(t) = h(t) = a_1 \exp(-a_2 t) + a_3$. The parameters represent the personal intrinsic characteristics of the learner.
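The OCM identity, that a single study impulse convolved with $h$ reproduces $h$, can be checked numerically. The sketch below is an illustration under stated assumptions: a discrete unit impulse stands in for $\delta(t)$, and the parameter values and sampling step are invented.

```python
import math


def h(t, a1=0.6, a2=0.5, a3=0.4):
    """Learner response h(t) = a1 * exp(-a2 * t) + a3 (invented parameters)."""
    return a1 * math.exp(-a2 * t) + a3


def convolve(f, g):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


dt = 0.5
times = [k * dt for k in range(8)]
h_samples = [h(t) for t in times]
impulse = [1.0] + [0.0] * (len(times) - 1)  # discrete stand-in for delta(t)

# Keep only the samples that fall inside the observed time window.
y = convolve(impulse, h_samples)[:len(times)]
```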

  35. Repeated learning convolution model (RCM): $y(t) = \sum_{n=1}^{N} f(t - nT) * h(t)$; with study impulses at times $T_n$, $y(t) = \sum_{n=1}^{N} \delta(t - T_n) * h(t) = \sum_{n=1}^{N} h(t - T_n)$, so that $y(t) = \sum_{n=1}^{N} a_1 \exp[-a_2 (t - T_n)] + N a_3$.
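The RCM sum can be evaluated directly as a superposition of shifted responses. In the sketch below the study times and parameters are invented, and $h$ is assumed to be zero before its study event (an assumption not stated on the slide), so the closed form with $N a_3$ applies once all $N$ study events have occurred.

```python
import math


def h(t, a1, a2, a3):
    """Single-study response; assumed to be zero before the study event."""
    if t < 0:
        return 0.0
    return a1 * math.exp(-a2 * t) + a3


def rcm(t, study_times, a1, a2, a3):
    """y(t) = sum over n of h(t - T_n)."""
    return sum(h(t - Tn, a1, a2, a3) for Tn in study_times)


# Invented example: three study sessions at t = 0, 2 and 5.
a1, a2, a3 = 0.6, 0.5, 0.1
study_times = [0.0, 2.0, 5.0]
y_late = rcm(10.0, study_times, a1, a2, a3)

# Closed form from the slide, valid after the last study event.
closed = sum(a1 * math.exp(-a2 * (10.0 - Tn)) for Tn in study_times) + 3 * a3
```

With a single study time the sum reduces to the one time learning model, $y(t) = h(t - T_1)$.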
