From Speech Perception to Language Andrew Nevins (Harvard University) Lectures at Universidade Federal do Rio de Janeiro May 2006
Your background? Syllable? Heavy syllable? Stress? Secondary stress? Vowel reduction? Graphs/quantitative data? Top-down vs. bottom-up processing?
Stress is part of all speakers’ knowledge Abso-bloomin’-lutely
Stress can indicate lexical contrasts: perVERT vs. PERvert. Its acoustic correlates are greater duration, greater amplitude, and a pitch contour on the stressed syllable (the vowel carries most of this)
Contrastive vs. Fixed Stress In languages like English and Russian, stress is not always fixed in the same position, so it can be used to contrast different words (e.g. trústy vs. trustée; or pi.sál vs. pí.sal, a mistake to be careful of!) In other languages (Czech, French, Turkish, Polish, Finnish,...) stress is always in a fixed location (e.g. always on the 1st syllable in Czech, always on the last syllable in French, etc.)
Perceptual use of Fixed Stress Canyoureadthiseasilywithoutpunctuationorspaces? Vroomen 1998: Learners can use stress as a cue for word boundaries in an artificial word monitoring task Infants at 7.5 months can already segment words from fluent speech The “bnick” strategy is one way: [bn] is not a possible English onset, so a stretch like stabnick must be parsed “stab nick” Metrical segmentation strategy...
The same effect in free stress languages? English speakers sometimes (accidentally) take the stressed syllable to be evidence for a word boundary Thus for a must to a.vóid, a common “slip of the ear” is something like a muscular void 9-month infants prefer to listen to strong-weak words (róbin) rather than weak-strong words (giráffe) [Jusczyk, Cutler & Redanz 93]
Infants finding words How do infants learn new words? How do they separate the target word from the surrounding context? “Fast mapping” and Carey’s chromium study Brent & Siskind: only 9% of utterances are one word Boundaries between words are not marked by acoustic events
Jusczyk & Aslin Two groups of infants: one heard cup and dog during the familiarization phase, the other group heard feet and bike “The cup was bright and shiny” “Meg put her cup back on the table” During the test phase, both groups heard sentences with all 4 words A later experiment showed they had no preference for tup, bog, zeet, gike: they are not storing words “coarsely”
How do they do it? Allophony: aspiration vs. flapping vs. glottalization (notate, notable, note) Transitional probabilities? TP(A→B) = P(AB)/P(A): in “prettybaby”, TP(pre→tty) > TP(tty→ba) “Local minima” of TPs might be used to find word boundaries Based on 2 minutes(!) of exposure to streams like pulikiberagafodaru, infants show a preference for words with high internal TP (Saffran et al.) Note that statistics are not a panacea...
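The TP computation above can be sketched in code. This is a minimal illustration, not from the lecture: the two-syllable “words” and the toy stream are invented, and real infant input is far noisier.

```python
# Sketch: transitional probabilities over a syllable stream, with local
# TP minima proposed as word boundaries. Toy data invented for illustration.
from collections import Counter

def transitional_probs(syllables):
    """TP(A -> B) = P(AB) / P(A), estimated from bigram/unigram counts."""
    unigrams = Counter(syllables)
    bigrams = Counter(zip(syllables, syllables[1:]))
    return {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

def segment_at_minima(syllables, tps):
    """Insert a word boundary wherever TP dips below both of its neighbors."""
    vals = [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, start = [], 0
    for i in range(1, len(vals) - 1):
        if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]:
            words.append(syllables[start:i + 1])
            start = i + 1
    words.append(syllables[start:])
    return words

# Two "words" concatenated in varying order: within-word TPs stay high,
# across-word TPs are lower, so boundaries fall at the local minima.
pretty, baby = ["pre", "tty"], ["ba", "by"]
stream = sum([pretty, baby, pretty, pretty, baby, baby, pretty, baby], [])
tps = transitional_probs(stream)
words = segment_at_minima(stream, tps)
```

On this toy stream the local-minimum rule recovers every word boundary; with more word types and shared syllables it degrades, which is one sense in which “statistics are not a panacea.”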
Unique Stress Constraint Yang & Gambell 2005 The Unique Stress Constraint: each word carries exactly one primary stress (chewbácca vs. dárthváder) Take the sequence WSSSW (W = weak syllable, S = stressed syllable): the USC will automatically segment this as [WS][S][SW]. Take SWWWS. Already you know there are 2 words, and probabilities can work on the medial W’s.
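A minimal sketch of the USC as a segmenter over stress strings (not Yang & Gambell’s actual implementation; the greedy left-to-right policy here is one simple way to realize the constraint):

```python
# Sketch: Unique Stress Constraint segmentation. 'S' = stressed syllable,
# 'W' = weak syllable. Two S's cannot share a word, so a second stress
# forces a boundary before it.
def usc_segment(stress_string):
    chunks, current = [], ""
    for syll in stress_string:
        if syll == "S" and "S" in current:
            # A second stress cannot join the current word: close it off.
            chunks.append(current)
            current = ""
        current += syll
    chunks.append(current)
    return chunks

print(usc_segment("WSSSW"))  # ['WS', 'S', 'SW']
```

For SWWWS the constraint only guarantees that two words are present; where the boundary falls among the medial W’s is left to other cues (e.g. the probabilities mentioned above).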
Algebraic “subtraction” Yang & Gambell 2005 If you already know “big” then extracting “snake” is easy in bigsnake Kids seem to do this, saying “two dults”, perhaps after doing subtraction on adult, and “I was have” after behave
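The subtraction idea can be sketched as follows. This is an illustration only: the helper name and the edge-matching policy are mine, not Yang & Gambell’s.

```python
# Sketch: "algebraic subtraction" of a known word from an unsegmented
# stretch, leaving the residue as a candidate new word.
def subtract_known(utterance, known_words):
    # Try longer known words first so "big" beats "bi", etc.
    for w in sorted(known_words, key=len, reverse=True):
        if utterance.startswith(w):
            return [w, utterance[len(w):]]
        if utterance.endswith(w):
            return [utterance[:-len(w)], w]
    return [utterance]

print(subtract_known("bigsnake", {"big"}))  # ['big', 'snake']
```

The child errors work the same way: subtracting the known word a from adult leaves “dult” (hence “two dults”), and subtracting be from behave leaves “have”.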
Language-specific processing How much of perception is guided by “training”: what language you already speak? “Top-down” influences on processing
Day 2 The effects of contrastive status (a linguistic property about the way the lexicon is built up) on the way that raw acoustic properties are perceived
Stress “Deafness” in French It is well known that one’s native phonology affects one’s ability to perceive segmental contrasts; e.g. the difficulty of [l]/[r] perception by Japanese speakers Dupoux & Peperkamp suggest that it may also affect one’s ability to perceive suprasegmental contrasts
Stress-deafness Test Dupoux & Peperkamp Subjects required to learn 2 CVCV nonwords that differ only in (a) place of articulation of C2 or (b) stress, and transcribe auditorily presented sequences, e.g. kúpi-kúti vs. mípa-mipá Longer duration for stressed σ Higher F0 for stressed σ
Speakers of fixed-stress languages are comparatively bad at perceiving contrastive stress. Note that Finnish has initial fixed stress and Spanish has final fixed stress
Rhythm and Prosody “Those guys talk fast!” “I can’t find the word boundaries!”
Rhythmic differences across languages Syllable-timed rhythm (Spanish, Italian) vs. stress-timed rhythm (English, Dutch) Lloyd James/Kenneth Pike: “Machine-gun languages versus Morse-code languages”
Acoustic Correlates of Rhythm? Durational isochrony (“even spacing”) not experimentally upheld Phonological characteristics (Dauer 1983): (a) more syllable types in stress-timed languages (e.g. CCVC, VCC, etc.); (b) reduction of unstressed syllables Yet: Catalan has the same syllable structure as Spanish, but has vowel reduction; Polish allows complex syllable types, but has no reduction
Ratios and variance Take a look at the spectrogram... which is more salient? Ramus et al. measured vowel/consonant intervals “Next Tuesday on”: [n][e][kst][u][sd][eio][n] %V (the proportion of the utterance that is vocalic) and Variance(C) (the variability of consonantal interval durations)
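The two metrics can be computed directly from a list of labeled intervals. A minimal sketch, assuming the utterance has already been segmented into vocalic/consonantal intervals; the durations below are invented for illustration:

```python
# Sketch: Ramus-style rhythm metrics from (type, duration) intervals,
# where type is 'V' (vocalic) or 'C' (consonantal).
from statistics import pstdev

def rhythm_metrics(intervals):
    v = [d for t, d in intervals if t == "V"]
    c = [d for t, d in intervals if t == "C"]
    percent_v = 100 * sum(v) / (sum(v) + sum(c))  # %V
    delta_c = pstdev(c)  # variability of consonantal intervals
    return percent_v, delta_c

# "Next Tuesday on" as alternating C/V intervals (durations in s, made up)
utterance = [("C", 0.06), ("V", 0.10), ("C", 0.18), ("V", 0.12),
             ("C", 0.11), ("V", 0.21), ("C", 0.05)]
pv, dc = rhythm_metrics(utterance)
```

Stress-timed languages tend toward lower %V and higher consonantal variability (complex clusters, reduced vowels); syllable-timed languages pattern the other way.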
Babies can tell!
Rhythmic differences (the same sentence in English, Italian, and Japanese): The next local elections will take place during the winter Le prossime elezioni locali avranno luogo in inverno Tsugi no chiho senkyo wa haruni okonawareru daru Infants hear speech low-pass filtered at 400 Hz, which removes segmental detail but preserves rhythm...
Homework assignment distribution: three parts Feel free to ask questions! nevins@fas.harvard.edu Individual appointments possible Requests for next week’s discussion are encouraged
Day 3: Categories and Speech-Specificity What makes something a category? How does “speech mode” influence perception?
The effects of contrastive status Let A, B be a pair of sounds that differ along only a single acoustic dimension Tokens produced in between the extremes of “A”-ness and “B”-ness may be perceived differently depending on whether A and B are used contrastively in the language
Liberman et al. presented a continuum of linguistic stimuli and non-linguistic stimuli. The only acoustic difference: [la] has a falling F3 and [ra] a rising F3
Idealized Categorization: 8 stimuli vary along an acoustic dimension in even steps. Nonetheless, they are perceived as belonging to 2 distinct groups: items 1-4 are categorized as “A” 100% of the time (as “B” 0%), and items 5-8 as “B” 100% of the time (as “A” 0%)
Idealized Discrimination: for each pair of adjacent stimuli on the continuum, subjects must correctly guess “identical” or “not identical”. Within a “category”, subjects cannot reliably discriminate two acoustically different stimuli; they can only guess (50%). But across the “category” boundary, they are perfect, even though the acoustic difference there is the same as for other pairs
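The idealized pattern on the two slides above can be stated as a pair of functions. This is a toy formalization (mine, not from the lecture), with the boundary placed between steps 4 and 5:

```python
# Sketch: idealized identification and discrimination functions for
# categorical perception on an 8-step continuum.
def identify(step, boundary=4.5):
    """P(label = 'A'); idealized as all-or-none around the boundary."""
    return 1.0 if step < boundary else 0.0

def discriminate(step_a, step_b, boundary=4.5):
    """P(correct same/different): chance (0.5) within a category,
    perfect (1.0) across the boundary."""
    same_side = identify(step_a, boundary) == identify(step_b, boundary)
    return 0.5 if same_side else 1.0

# Adjacent pairs: only the (4, 5) pair straddles the boundary.
pairs = [(i, i + 1) for i in range(1, 8)]
scores = [discriminate(a, b) for a, b in pairs]
```

The discrimination function is flat at chance except for a single peak at the category boundary; that peak is the signature of categorical perception.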
[Figure: the visible light spectrum, wavelength in nanometers]
English speakers: Stimuli 1-6 categorized as [ra] around 100%; stimuli 7-8 not reliably categorized; stimuli 9-13 categorized as [ra] around 0% Discrimination of stimuli 3 steps apart peaked near the category boundary for English speakers; the discrimination function shows no pattern for Japanese speakers
MMN (mismatch negativity) only for Hindi speakers when the -50 ms stimulus was presented after a sequence of -10 ms stimuli
On the stimuli that were F3 transitions alone, both populations showed non-categorical perception
Contrastiveness and Distributional Patterns These “bell-curved” distribution functions, with highest frequency centered symmetrically around a mean, are called Gaussian distributions. If a 2-way distinction is contrastive in the language, will it show a unimodal pattern, with most actual utterances centered around the middle of the continuum, or will it have more utterances near the extremes (a bimodal pattern)? Hint: think about humans’ identification function when there are two contrastive categories along such a continuum
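The two distributional patterns can be simulated. A minimal sketch (not from the lecture): token counts, mode positions, and spread are invented, and each sample is a Gaussian draw rounded onto the 8-step continuum.

```python
# Sketch: unimodal vs. bimodal token distributions along an 8-point
# continuum, as in distributional-learning setups.
# Unimodal: one Gaussian peaked mid-continuum (no contrast).
# Bimodal: a mixture of two Gaussians near the ends (a 2-way contrast).
import random

def sample_tokens(n, modes, spread=0.7, seed=0):
    """Draw n continuum positions from a Gaussian mixture at `modes`,
    rounded and clamped to steps 1..8."""
    rng = random.Random(seed)
    return [min(8, max(1, round(rng.gauss(rng.choice(modes), spread))))
            for _ in range(n)]

unimodal = sample_tokens(64, modes=[4.5])  # peak between steps 4 and 5
bimodal = sample_tokens(64, modes=[2, 7])  # peaks near steps 2 and 7
```

Under the bimodal regime most tokens sit near the extremes, which is exactly where a two-category identification function is most confident; a contrastive distinction should therefore produce the bimodal shape.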
Maye, Werker & Gerken Infants heard: 16 tokens on an 8-point ta-da continuum, 4 ma, 4 la 2.3 minutes total Then, they were presented tokens 3 & 6, and tokens 1 & 8 Infants in the bimodal condition looked longer in general They also looked longer when tokens 3 and 6 were presented in sequence than when 1 or 8 was presented alone.
What about allophones? These are also not in a unimodal distribution (though we don’t really have evidence that they are “as bimodal” as contrastive pairs) Learning that two categories are allophonic requires noticing that they are found in completely distinct environments (Notice Maye et al.’s kids heard the stimuli in identical environments: word-initial and followed by the same vowel)
Is speech processed differently than other sound? Going back to the l/r study, it was interesting that Japanese speakers could distinguish F3 transitions when presented alone da vs. ga is also distinguished by F3 Duplex perception (Liberman et al.): the third formant of a da/ga continuum played to one ear, and the rest of the sound played to the other ear