From Speech Perception to Language Andrew Nevins (Harvard University) Lectures at Universidade Federal do Rio de Janeiro May 2006
Your background? Syllable? Heavy syllable? Stress? Secondary stress? Vowel reduction? Graphs/quantitative data? Top-down vs. bottom-up processing?
Stress is part of all speakers’ knowledge Abso-bloomin’-lutely
Stress can indicate lexical contrasts: perVERT vs. PERvert. Its acoustic correlates are greater duration, greater amplitude, and a pitch contour on the stressed syllable (the vowel carries most of this)
Contrastive vs. Fixed Stress In languages like English and Russian, stress is not always fixed in the same position, so it can be used to contrast different words (e.g. trústy vs. trustée; or pi.sál vs. pí.sal, a mistake to be careful of!) In other languages (Czech, French, Turkish, Polish, Finnish,...) stress is always in a fixed location (e.g. always on the 1st syllable in Czech, always on the last syllable in French, etc.)
Perceptual use of Fixed Stress Canyoureadthiseasilywithoutpunctuationorspaces? Vroomen 1998: Learners can use stress as a cue for word boundaries in an artificial word monitoring task Infants at 7.5 months can already segment words from fluent speech The “bnick” strategy is one way: [bn] is not a possible English onset, so a stretch like stabnick must be parsed “stab nick” Metrical segmentation strategy...
The same effect in free stress languages? English speakers sometimes (accidentally) take the stressed syllable to be evidence for a word boundary Thus for a must to a.vóid, a common “slip of the ear” is something like a muscular void 9-month infants prefer to listen to strong-weak words (róbin) rather than weak-strong words (giráffe) [Jusczyk, Cutler & Redanz 93]
Infants finding words How do infants learn new words? How do they separate the target word from the surrounding context? “Fast mapping” and Carey’s chromium study Brent & Siskind: only 9% of utterances are one word Boundaries between words are not marked by acoustic events
Jusczyk & Aslin Two groups of infants: one heard cup and dog during the familiarization phase, the other group heard feet and bike “The cup was bright and shiny” “Meg put her cup back on the table” During the test phase, both groups heard sentences with all 4 words A later experiment showed they had no preference for tup, bog, zeet, gike: they are not storing words “coarsely”
How do they do it? Allophony: aspiration vs. flapping vs. glottalization (notate, notable, note) Transitional probabilities? TP(A→B) = P(AB)/P(A): in “prettybaby”, TP(pre→tty) > TP(tty→ba) “Local minima” of TPs might be used to find word boundaries Based on 2 minutes(!) of exposure to streams like pulikiberagafodaru, infants show a preference for words with high internal TP (Saffran et al.) Note that statistics are not a panacea...
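The TP computation above can be sketched in code. This is a minimal illustration, not from the lecture: the two-syllable “words” and the toy stream are invented, and real infant input is far noisier.

```python
# Sketch: transitional probabilities over a syllable stream, with local
# TP minima proposed as word boundaries. Toy data invented for illustration.
from collections import Counter

def transitional_probs(syllables):
    """TP(A -> B) = P(AB) / P(A), estimated from bigram/unigram counts."""
    unigrams = Counter(syllables)
    bigrams = Counter(zip(syllables, syllables[1:]))
    return {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

def segment_at_minima(syllables, tps):
    """Insert a word boundary wherever TP dips below both of its neighbors."""
    vals = [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, start = [], 0
    for i in range(1, len(vals) - 1):
        if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]:
            words.append(syllables[start:i + 1])
            start = i + 1
    words.append(syllables[start:])
    return words

# Two "words" concatenated in varying order: within-word TPs stay high,
# across-word TPs are lower, so boundaries fall at the local minima.
pretty, baby = ["pre", "tty"], ["ba", "by"]
stream = sum([pretty, baby, pretty, pretty, baby, baby, pretty, baby], [])
tps = transitional_probs(stream)
words = segment_at_minima(stream, tps)
```

On this toy stream the local-minimum rule recovers every word boundary; with more word types and shared syllables it degrades, which is one sense in which “statistics are not a panacea.”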
Unique Stress Constraint Yang & Gambell 2005 The Unique Stress Constraint: each word carries exactly one primary stress (chewbácca vs. dárthváder) Take the sequence WSSSW (W = weak syllable, S = stressed syllable): the USC will automatically segment this as [WS][S][SW]. Take SWWWS. Already you know there are 2 words, and probabilities can work on the medial W’s.
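A minimal sketch of the USC as a segmenter over stress strings (not Yang & Gambell’s actual implementation; the greedy left-to-right policy here is one simple way to realize the constraint):

```python
# Sketch: Unique Stress Constraint segmentation. 'S' = stressed syllable,
# 'W' = weak syllable. Two S's cannot share a word, so a second stress
# forces a boundary before it.
def usc_segment(stress_string):
    chunks, current = [], ""
    for syll in stress_string:
        if syll == "S" and "S" in current:
            # A second stress cannot join the current word: close it off.
            chunks.append(current)
            current = ""
        current += syll
    chunks.append(current)
    return chunks

print(usc_segment("WSSSW"))  # ['WS', 'S', 'SW']
```

For SWWWS the constraint only guarantees that two words are present; where the boundary falls among the medial W’s is left to other cues (e.g. the probabilities mentioned above).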
Algebraic “subtraction” Yang & Gambell 2005 If you already know “big” then extracting “snake” is easy in bigsnake Kids seem to do this, saying “two dults”, perhaps after doing subtraction on adult, and “I was have” after behave
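The subtraction idea can be sketched as follows. This is an illustration only: the helper name and the edge-matching policy are mine, not Yang & Gambell’s.

```python
# Sketch: "algebraic subtraction" of a known word from an unsegmented
# stretch, leaving the residue as a candidate new word.
def subtract_known(utterance, known_words):
    # Try longer known words first so "big" beats "bi", etc.
    for w in sorted(known_words, key=len, reverse=True):
        if utterance.startswith(w):
            return [w, utterance[len(w):]]
        if utterance.endswith(w):
            return [utterance[:-len(w)], w]
    return [utterance]

print(subtract_known("bigsnake", {"big"}))  # ['big', 'snake']
```

The child errors work the same way: subtracting the known word a from adult leaves “dult” (hence “two dults”), and subtracting be from behave leaves “have”.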
Language-specific processing How much of perception is guided by “training”: what language you already speak? “Top-down” influences on processing
Day 2 The effects of contrastive status (a linguistic property about the way the lexicon is built up) on the way that raw acoustic properties are perceived
Stress “Deafness” in French It is well known that one’s native phonology affects one’s ability to perceive segmental contrasts; e.g. the difficulty of [l]/[r] perception by Japanese speakers Dupoux & Peperkamp suggest that it may also affect one’s ability to perceive suprasegmental contrasts
Stress-deafness Test Dupoux & Peperkamp Subjects required to learn 2 CVCV nonwords that differ only in (a) place of articulation of C2 or (b) stress, and transcribe auditorily presented sequences, e.g. kúpi-kúti vs. mípa-mipá Longer duration for stressed σ Higher F0 for stressed σ
Speakers of fixed-stress languages are comparatively bad at perceiving contrastive stress. Note that Finnish has initial fixed stress and Spanish has final fixed stress
Rhythm and Prosody “Those guys talk fast!” “I can’t find the word boundaries!”
Rhythmic differences across languages Syllable-timed rhythm (Spanish, Italian) vs. stress-timed rhythm (English, Dutch) Lloyd James/Kenneth Pike: “Machine-gun languages versus Morse-code languages”
Acoustic Correlates of Rhythm? Durational isochrony (“even spacing”) not experimentally upheld Phonological characteristics (Dauer 1983): (a) more syllable types in stress-timed languages (e.g. CCVC, VCC, etc.); (b) reduction of unstressed syllables Yet: Catalan has the same syllable structure as Spanish, but has vowel reduction; Polish allows complex syllable types, but has no reduction
Ratios and variance Take a look at the spectrogram... which is more salient? Ramus et al. measured vowel/consonant intervals “Next Tuesday on”: [n][e][kst][u][sd][eio][n] %V (the proportion of the utterance that is vocalic) and Variance(C) (the variability of consonantal interval durations)
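The two metrics can be computed directly from a list of labeled intervals. A minimal sketch, assuming the utterance has already been segmented into vocalic/consonantal intervals; the durations below are invented for illustration:

```python
# Sketch: Ramus-style rhythm metrics from (type, duration) intervals,
# where type is 'V' (vocalic) or 'C' (consonantal).
from statistics import pstdev

def rhythm_metrics(intervals):
    v = [d for t, d in intervals if t == "V"]
    c = [d for t, d in intervals if t == "C"]
    percent_v = 100 * sum(v) / (sum(v) + sum(c))  # %V
    delta_c = pstdev(c)  # variability of consonantal intervals
    return percent_v, delta_c

# "Next Tuesday on" as alternating C/V intervals (durations in s, made up)
utterance = [("C", 0.06), ("V", 0.10), ("C", 0.18), ("V", 0.12),
             ("C", 0.11), ("V", 0.21), ("C", 0.05)]
pv, dc = rhythm_metrics(utterance)
```

Stress-timed languages tend toward lower %V and higher consonantal variability (complex clusters, reduced vowels); syllable-timed languages pattern the other way.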
Babies can tell!
Rhythmic differences (the same sentence in English, Italian, and Japanese): The next local elections will take place during the winter Le prossime elezioni locali avranno luogo in inverno Tsugi no chiho senkyo wa haruni okonawareru daru Infants hear speech low-pass filtered at 400 Hz, which removes segmental detail but preserves rhythm...
Homework assignment distribution: three parts Feel free to ask questions! nevins@fas.harvard.edu Individual appointments possible Requests for next week’s discussion are encouraged
Day 3: Categories and Speech-Specificity What makes something a category? How does “speech mode” influence perception?
The effects of contrastive status Let A, B be a pair of sounds that differ along only a single acoustic dimension Tokens produced in between the extremes of “A”-ness and “B”-ness may be perceived differently depending on whether A and B are used contrastively in the language
Liberman et al. presented a continuum of linguistic stimuli and non-linguistic stimuli. The only acoustic difference: [la] has a falling F3 and [ra] a rising F3
Idealized Categorization: 8 stimuli vary along an acoustic dimension in even steps. Nonetheless, they are perceived as belonging to 2 distinct groups: items 1-4 are categorized as “A” 100% of the time (as “B” 0%), and items 5-8 as “B” 100% of the time (as “A” 0%)
Idealized Discrimination: for each pair of adjacent stimuli on the continuum, subjects must correctly guess “identical” or “not identical”. Within a “category”, subjects cannot reliably discriminate two acoustically different stimuli; they can only guess (50%). But across the “category” boundary, they are perfect, even though the acoustic difference there is the same as for other pairs
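The idealized pattern on the two slides above can be stated as a pair of functions. This is a toy formalization (mine, not from the lecture), with the boundary placed between steps 4 and 5:

```python
# Sketch: idealized identification and discrimination functions for
# categorical perception on an 8-step continuum.
def identify(step, boundary=4.5):
    """P(label = 'A'); idealized as all-or-none around the boundary."""
    return 1.0 if step < boundary else 0.0

def discriminate(step_a, step_b, boundary=4.5):
    """P(correct same/different): chance (0.5) within a category,
    perfect (1.0) across the boundary."""
    same_side = identify(step_a, boundary) == identify(step_b, boundary)
    return 0.5 if same_side else 1.0

# Adjacent pairs: only the (4, 5) pair straddles the boundary.
pairs = [(i, i + 1) for i in range(1, 8)]
scores = [discriminate(a, b) for a, b in pairs]
```

The discrimination function is flat at chance except for a single peak at the category boundary; that peak is the signature of categorical perception.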
[Figure: the visible light spectrum, wavelength in nanometers]
English speakers: Stimuli 1-6 categorized as [ra] around 100%; stimuli 7-8 not reliably categorized; stimuli 9-13 categorized as [ra] around 0% Discrimination of stimuli 3 steps apart peaked near the category boundary for English speakers; the discrimination function shows no pattern for Japanese speakers
MMN (mismatch negativity) only for Hindi speakers when the -50 ms stimulus was presented after a sequence of -10 ms stimuli
On the stimuli that were F3 transitions alone, both populations showed non-categorical perception
Contrastiveness and Distributional Patterns These “bell-curved” distribution functions, with highest frequency centered symmetrically around a mean, are called Gaussian distributions. If a 2-way distinction is contrastive in the language, will it show a unimodal pattern, with most actual utterances centered around the middle of the continuum, or will it have more utterances near the extremes (a bimodal pattern)? Hint: think about humans’ identification function when there are two contrastive categories along such a continuum
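The two distributional patterns can be simulated. A minimal sketch (not from the lecture): token counts, mode positions, and spread are invented, and each sample is a Gaussian draw rounded onto the 8-step continuum.

```python
# Sketch: unimodal vs. bimodal token distributions along an 8-point
# continuum, as in distributional-learning setups.
# Unimodal: one Gaussian peaked mid-continuum (no contrast).
# Bimodal: a mixture of two Gaussians near the ends (a 2-way contrast).
import random

def sample_tokens(n, modes, spread=0.7, seed=0):
    """Draw n continuum positions from a Gaussian mixture at `modes`,
    rounded and clamped to steps 1..8."""
    rng = random.Random(seed)
    return [min(8, max(1, round(rng.gauss(rng.choice(modes), spread))))
            for _ in range(n)]

unimodal = sample_tokens(64, modes=[4.5])  # peak between steps 4 and 5
bimodal = sample_tokens(64, modes=[2, 7])  # peaks near steps 2 and 7
```

Under the bimodal regime most tokens sit near the extremes, which is exactly where a two-category identification function is most confident; a contrastive distinction should therefore produce the bimodal shape.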
Maye, Werker & Gerken Infants heard: 16 tokens on an 8-point ta-da continuum, 4 ma, 4 la 2.3 minutes total Then, they were presented tokens 3 & 6, and tokens 1 & 8 Infants in the bimodal condition looked longer in general They also looked longer when tokens 3 and 6 were presented in sequence than when 1 or 8 was presented alone.
What about allophones? These are also not in a unimodal distribution (though we don’t really have evidence that they are “as bimodal” as contrastive pairs) Learning that two categories are allophonic requires noticing that they are found in completely distinct environments (Notice Maye et al.’s kids heard the stimuli in identical environments: word-initial and followed by the same vowel)
Is speech processed differently than other sound? Going back to the l/r study, it was interesting that Japanese speakers could distinguish F3 transitions when presented alone da vs. ga is also distinguished by F3 Duplex perception (Liberman et al.): the third formant of a da/ga continuum played to one ear, and the rest of the sound played to the other ear