How Abstract Phonemic Categories Are Necessary for Coping With Speaker-Related Variation
Anne Cutler, Frank Eisner, James M. McQueen and Dennis Norris
Presented by Marius Volz
Exemplar Theory seminar, 24 June 2020
Introduction: Variability in Speech Sounds
- Listeners can understand speech sounds despite considerable variability
- Sources of variability: talkers' vocal-tract shapes, dialect, the position of words, ambient noise, etc.
- No two utterances of the same speech are ever acoustically identical
Introduction: Variability in Speech Sounds

Abstractionist Model
- Relevant information is extracted from the signal
- Abstract representations can be mapped onto representations of words in the lexicon
- Word perception and voice perception are independent processes
- Evidence: whispered and synthesised speech; aphasia in the right vs. the left hemisphere

Episodic Model
- Lexicon entries of words include information about the talker's voice
- Complex and detailed memory traces for words
- Normalisation procedures would be redundant
Introduction: Variability in Speech Sounds
- Nygaard, Sommers and Pisoni (1994): trained some listeners to identify voices
- Trained listeners recognised more new words in noise than untrained listeners
- Exposure to talkers' voices facilitated later recognition of new words, so talker-specific information must have been encoded
- Adjusting to various voices increases processing demands
- Perceptual knowledge retained in procedural memory enhances processing efficiency for utterances by the same talker
Introduction: Variability in Speech Sounds

Abstractionist Model
- Relevant information is extracted from the signal
- Abstract representations can be mapped onto representations of words in the lexicon
- Word perception and voice perception are independent processes
- Evidence: whispered and synthesised speech; aphasia in the right vs. the left hemisphere

Episodic Model
- Lexicon entries of words include information about the talker's voice
- Complex and detailed memory traces for words
- Normalisation procedures would be redundant
- Talker-specific information plays a role in speech perception
- An extreme abstractionist view is untenable
Introduction: Variability in Speech Sounds

Abstractionist Model
- Relevant information is extracted from the signal
- Abstract representations can be mapped onto representations of words in the lexicon
- Word perception and voice perception are independent processes
- Evidence: whispered and synthesised speech; aphasia in the right vs. the left hemisphere

Episodic Model
- Lexicon entries of words include information about the talker's voice
- Complex and detailed memory traces for words
- Normalisation procedures would be redundant
- Talker-specific information plays a role in speech perception
- An extreme abstractionist view is untenable

Cutler et al. (this paper)
- Neither view is tenable in its extreme version
- Talker-specific knowledge could be stored prelexically
- This would make generalisation of a talker's idiosyncrasies across the vocabulary possible
Lexically-Guided Perceptual Learning
- The perceptual system adjusts rapidly to the articulatory idiosyncrasies of a talker
- Norris, McQueen, and Cutler (2003): two groups of listeners
- Training: words that ended in [f] or [s]; for each group, one of the fricatives was replaced by an ambiguous fricative [?]
- Lexical decision task: 90% of [?]-final words were accepted as real words
- Test: categorising sounds from an [ɛf]-[ɛs] continuum
- Participants were more likely to categorise an ambiguous sound as their respective training sound (illustrated in the sketch below)
- Conclusion: a prelexical adjustment in how the acoustic signal is mapped onto a phonemic category
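A hypothetical illustration of such a category-boundary shift (the continuum steps, slope, and boundary positions below are invented for illustration, not data from the study): a logistic categorisation function along a 7-step [ɛf]-[ɛs] continuum, displaced in opposite directions for the two training groups.

```python
# Hypothetical illustration of a shifted phoneme-category boundary; the
# numbers are invented, not data from Norris, McQueen, and Cutler (2003).
import numpy as np

steps = np.arange(1, 8)  # 7 steps from clearly [f]-like to clearly [s]-like

def p_s_response(step, boundary, slope=1.5):
    # Probability of an [s] response under a logistic psychometric function.
    return 1.0 / (1.0 + np.exp(-slope * (step - boundary)))

# Listeners trained to hear [?] as [f] accept more of the continuum as [f]:
# their [f]/[s] boundary sits closer to the [s] end, and vice versa.
for group, boundary in [("[?]-as-[f] group", 4.8), ("[?]-as-[s] group", 3.2)]:
    print(group, np.round(p_s_response(steps, boundary), 2))
```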
Lexically-Guided Perceptual Learning
- Eisner and McQueen (2005): similar training conditions
- Learning was talker-specific: the effect applied only to fricative test sounds uttered by the training talker
- Kraljic and Samuel (2006): generalised learning found for [d]-[t] and [b]-[p] contrasts
- Stops contain less information about the talker than fricatives, which may explain the generalisation
Lexically-Guided Perceptual Learning
- Eisner and McQueen (2006): is learning stable over time?
- Training consisted of listening to a short story with either [f] or [s] replaced
- One group was trained in the morning and tested 12 hours later; another was trained in the evening and tested 12 hours later
- Effects did not decrease in either group
- Results suggest that lexically guided perceptual learning is automatic
Lexical Generalisability
- In episodic models, phoneme categorisation is postlexical, based on lexical episodic traces
- If a listener learns about a talker's unusual speech sound and recognition of all words containing that sound is affected, this indicates prelexical phoneme categorisation
- If learning generalises to words not heard during training, this is evidence for abstract prelexical phonemic representations
Lexical Generalisability
- McQueen, Cutler, and Norris (2006)
- Training: auditory lexical decision task; final [f] or [s] was replaced with an ambiguous fricative
- Test: cross-modal identity priming task; an auditory prime followed by a visual lexical decision task, with speed and accuracy of the decision measured and compared
- Critical words: DOOF and DOOS; prime: [do:?] or a phonologically unrelated word
- Listeners were faster and more accurate with their training fricatives
- More wrong answers (negative values) when the fricatives of the training and target words differed
- Conclusion: perceptual adjustments are applied to other words in the lexicon
Simulations With an Episodic Model
- Abstract, flexible prelexical representations help in dealing with phonetic variability
- Episodic models contain detailed traces and lack this abstraction and flexibility
- Episodic models should therefore not be able to explain lexical generalisation
Simulations With an Episodic Model
- MINERVA-2 (Hintzman, 1986): a simulation model of human memory
- Each episode lays down a trace in long-term memory
- New inputs activate all traces in proportion to how well their contents match
- An aggregate echo of all activated traces is returned to working memory
- Each trace is a vector consisting of name fields (category identity) and form fields (phonetic patterns); a sketch of these mechanics follows below
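A minimal sketch of these mechanics in Python, following the published MINERVA-2 equations (activation is the cube of probe-trace similarity; the echo is the activation-weighted sum of all traces); the vector sizes and the name/form split used here are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal MINERVA-2 sketch (Hintzman, 1986). Features take values +1/0/-1;
# the field sizes below are illustrative, not the paper's parameters.
import numpy as np

def similarity(probe, trace):
    # Normalise by the number of features nonzero in the probe or the trace.
    n_relevant = np.sum((probe != 0) | (trace != 0))
    return float(probe @ trace) / n_relevant

def echo(probe, memory):
    # Each stored trace is activated by the cube of its similarity to the
    # probe; the echo returned to working memory is the activation-weighted
    # sum of all traces.
    activations = np.array([similarity(probe, t) ** 3 for t in memory])
    return activations @ memory

# One stored episode: name fields (category identity) plus form fields
# (phonetic pattern).
rng = np.random.default_rng(0)
name = rng.choice([-1.0, 1.0], size=10)
form = rng.choice([-1.0, 1.0], size=20)
memory = np.stack([np.concatenate([name, form])])

# Probing with the form alone (name fields left blank) returns an echo
# whose name portion points toward the stored category identity.
probe = np.concatenate([np.zeros(10), form])
print(np.all(np.sign(echo(probe, memory)[:10]) == name))  # -> True
```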
Simulations With an Episodic Model
- New training items are similar to existing traces except for their final portion, the ambiguous fricative
- A test item's ambiguous sound matches such a training stimulus and thus activates its entire trace
- Training episodes therefore resonate with test inputs but do not help in interpreting them
Simulations With an Episodic Model
- Because memory contains more unambiguous than ambiguous training sounds, the unambiguous sound receives proportionally stronger activation simply through its higher quantity
- → the opposite of the studies' results
Simulations With an Episodic Model
Training phase
- 40 words ending in [f] and 40 words ending in [s]
- 20 additional ambiguous items that originally ended in the same final phoneme (the trained fricative)
- 20 additional episodes of unambiguous items ending in the other final phoneme

Test phase (a sketch of this design follows below)
- The content of the echo was compared to the two possible interpretations, to determine whether it was more similar to the trained than to the untrained fricative
- The score for form retrieval was slightly below chance → the opposite of the effect found in human data
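A sketch of that training-and-test design, reusing the MINERVA-2 equations from above; the feature counts, the random word stems, and the coding of [f], [s], and the ambiguous fricative as feature patterns are assumptions made for illustration, not the paper's materials.

```python
# Hypothetical re-creation of the simulation design described above; vector
# sizes and segment codings are assumptions, not the paper's materials.
import numpy as np

rng = np.random.default_rng(1)
STEM, SEG = 30, 10  # features for the word stem and for the final segment

f_seg = rng.choice([-1.0, 1.0], SEG)  # pattern standing in for [f]
s_seg = rng.choice([-1.0, 1.0], SEG)  # pattern standing in for [s]
ambiguous = (f_seg + s_seg) / 2.0     # [?]: zero where [f] and [s] disagree

def item(final_segment):
    # A word episode: a new random stem plus a final-segment pattern.
    return np.concatenate([rng.choice([-1.0, 1.0], STEM), final_segment])

memory = np.stack(
    [item(f_seg) for _ in range(40)] +      # unambiguous [f]-final words
    [item(s_seg) for _ in range(40)] +      # unambiguous [s]-final words
    [item(ambiguous) for _ in range(20)] +  # retrained items: [f] -> [?]
    [item(s_seg) for _ in range(20)])       # extra unambiguous [s] episodes

def echo(probe, memory):
    # MINERVA-2: activation = (normalised similarity) cubed; the echo is
    # the activation-weighted sum of all traces.
    n_relevant = ((probe != 0) | (memory != 0)).sum(axis=1)
    activations = (memory @ probe / n_relevant) ** 3
    return activations @ memory

# Test: a new word ending in the ambiguous fricative. Compare the echo's
# final-segment portion with the trained ([f]) and untrained ([s]) readings.
final = echo(item(ambiguous), memory)[STEM:]
print("match to trained [f]:  ", round(float(final @ f_seg), 3))
print("match to untrained [s]:", round(float(final @ s_seg), 3))
# Human listeners generalise toward the trained sound; the model's echo
# shows no such pull (the paper reports form retrieval slightly below chance).
```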
Simulations With an Episodic Model
- A pure episodic model is unable to simulate the results from experiments with humans
- There is no generalisation that would pull test inputs in the direction of the trained sound
- Episodic models can abstract a prototype, and inputs will activate that prototype
- But echoes of ambiguous input sounds will also be ambiguous
- There is no relationship between the name fields of different words (oliif vs. doof)
Conclusions
Abstractionist Model vs. Episodic Model
- Abstract prelexical representations help in dealing with variation in the speech signal
- Efficient: idiosyncrasies are stored once instead of once for each word
- They benefit comprehension of unheard signals containing such idiosyncrasies
- Inflexible with respect to acquiring new phonemic categories (second-language acquisition)
- But flexible with respect to adjustments to existing categories (new words with the critical sound)
- This flexibility is incompatible with episodic models
- Talker-specific information also helps in identifying phonemes and words
- Suggests a hybrid model in which both episodic traces and prelexical abstractions influence speech recognition
Thank You for Your Attention!