Stefanie Shattuck-Hufnagel Speech Communication Group Research - PowerPoint PPT Presentation

Cue-based analysis of speech: Implications for prosodic transcription Stefanie Shattuck-Hufnagel Speech Communication Group Research Laboratory of Electronics MIT

A stark view: Some unanswered questions
• What are the contrastive categories of spoken prosody?
• How does their phonetic implementation vary systematically with context?
• How do they relate to meaning and to interaction?

Prosodic parallels to a feature-cue-based approach to speech processing?
1) Segmental phonology: growing evidence that language users systematically control:
• individual acoustic cues to contrastive phonemic segments
• contextually appropriate parameter values of these cues
2) Models: representation and processing of surface phonetic information at this level of detail
• feature-cue-based processing (Halle, Stevens)
3) Parallels in prosodic phonology?
• if so, what are the implications for prosodic transcription?

Instruction giver’s map Instruction follower’s map

Reduction of surface word forms It’s probably the same thing.

probably the

Strengthening/clarification of surface word forms
Are you going to have to do that all over again? ProbabLY.

Extremes of variation in word forms

Surface phonetic segments often not appropriate for transcription
• Cues not aligned in time
– Cues to a feature can be distributed over time
• nasality in V preceding a nasal coda C in I can go
• duration of V preceding a voiceless coda C in I can’t go
– Cues to features of two segments can overlap in time
• /n + dh/ of win those → interdental nasal
• Cues selected individually
– Individual cues to features survive ‘deletion’ of segment
• Duration of V preceding a ‘deleted’ voiced coda C in cat
– Individual cues to features are sometimes added
• Glottalized word-final /t/ sometimes also has closure and release burst

Feature-cue-based transcription provides a better fit
• Stevens 2002 (extending Halle 1972): Two types of features, two types of cues
– Landmarks: abrupt spectral changes as cues to articulator-free features
• Consonant, Vowel, Glide, Continuant, Sonorant, Strident
– Landmark-related cues: spectral patterns near Landmarks, as cues to articulator-bound features
• Labial, Coronal, Velar, Voiced, Nasal etc.
– Additional acoustic events

Landmark cues Rapid spectral changes across several energy bands which provide information about articulator-free features Boyce et al. 2013
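The idea of a Landmark as a rapid change across several energy bands can be illustrated with a toy sketch. This is not the detector used in the work cited on the slides: the function name, the per-frame band energies, the 9 dB jump threshold and the three-band requirement are all assumptions chosen for illustration.

```python
def detect_landmarks(band_energy_db, min_jump_db=9.0, min_bands=3):
    """Return (frame_index, direction) pairs for abrupt energy changes.

    band_energy_db: list of frames, each a list of per-band energies in dB.
    A frame is flagged when at least `min_bands` bands rise ("+") or
    fall ("-") by `min_jump_db` or more relative to the previous frame.
    Both thresholds are illustrative assumptions, not published values.
    """
    landmarks = []
    for i in range(1, len(band_energy_db)):
        prev, cur = band_energy_db[i - 1], band_energy_db[i]
        rises = sum(1 for p, c in zip(prev, cur) if c - p >= min_jump_db)
        falls = sum(1 for p, c in zip(prev, cur) if p - c >= min_jump_db)
        if rises >= min_bands:
            landmarks.append((i, "+"))
        elif falls >= min_bands:
            landmarks.append((i, "-"))
    return landmarks


# Hypothetical data: 6 frames x 4 bands, energies in dB.
frames = [
    [30, 28, 25, 20],
    [31, 29, 25, 21],
    [45, 44, 40, 22],  # abrupt rise in three bands
    [46, 44, 41, 23],
    [30, 29, 27, 22],  # abrupt fall in three bands
    [30, 28, 26, 21],
]
landmarks = detect_landmarks(frames)
print(landmarks)
```

Requiring agreement across several bands is what separates an abrupt, Landmark-like event from a gradual change or a fluctuation confined to one band.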

Landmark labelling captures individual cue patterns

Advantages of Landmark Cues in Speech Perception
• Reliably produced
– 80% of predicted LMs in AEMT Corpus (Shattuck-Hufnagel & Veilleux 2007)
• Robustly detectable (‘auditory edges’)
• Highly informative
– Articulator-free features (~manner) provide estimate of CV structure of the utterance
– Identification of regions rich in cues to other features (place, voicing)
– Inter-Landmark times provide estimate of durational markers of prosodic structure
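The last point, that inter-Landmark times give an estimate of durational markers, can be sketched as follows. The Landmark times are hypothetical, and the rule of flagging intervals longer than 1.5 times the median is an assumption made for illustration, not a criterion from the talk.

```python
from statistics import median


def inter_landmark_intervals(landmark_times_ms):
    """Durations (ms) between successive Landmarks, in time order."""
    times = sorted(landmark_times_ms)
    return [b - a for a, b in zip(times, times[1:])]


def long_intervals(intervals, ratio=1.5):
    """Indices of intervals longer than `ratio` times the median interval.

    The ratio is an illustrative assumption; a long interval is only a
    candidate for lengthening, not evidence of a prosodic boundary.
    """
    if not intervals:
        return []
    threshold = ratio * median(intervals)
    return [i for i, d in enumerate(intervals) if d > threshold]


# Hypothetical Landmark times in milliseconds.
times_ms = [100, 180, 270, 350, 620]
intervals = inter_landmark_intervals(times_ms)
candidates = long_intervals(intervals)
print(intervals, candidates)
```

Here the final interval stands out against the others, the kind of pattern that might accompany final lengthening, although a real analysis would need to normalize for segment identity and speaking rate.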

Extension to Production
A sketch of an extrinsic timing model
Stage 1: a phonological planning stage
– symbolic segmental representations are sequenced and slotted into an appropriate prosodic structure
– appropriate acoustic cues are selected for each segment’s features in its context
Stage 2: a phonetic planning stage
– cues are mapped onto sets of articulators
– appropriate values for spatial and temporal parameters of movement are computed
Stage 3: a motor-sensory implementation stage
– articulator movements are generated and tracked.
Turk and Shattuck-Hufnagel 2014

Evidence for a Feature-Cue-Based production planning model
• Evidence that speakers can choose among individual cues
– Feature cues left behind in phonetic reduction
– New cues in challenging speaking circumstances
– Inventory constraints on LM modification
• Evidence that speakers compute cue parameter values
– Conversational convergence: partial, governed by social values
– Covert contrast in development
– Inventory constraints on final lengthening

Conversational convergence/divergence Neilson 2011

Covert contrast in child speech Scobbie 1998; see also Gibbon 1990

Covert contrast for stop voicing Macken & Barton 1980 JCL

Characteristics of the FCBP approach
• More complex planning by the speaker
– Not ‘choose a surface allophone’
– But instead, ‘choose context-appropriate feature cues and cue parameter values’
• Extensive interpretation by the listener
– Which linguistic constituents and structures does the signal contain cues for?
– What information about the interaction and the situation does the signal contain cues for?

Parallels in Prosodic Processing?
• Individual variation in cue patterns
– Irregular pitch periods at prosodic boundaries and prominences (Pierrehumbert & Talkin 1992, Dilley et al. 1996)
• New cues in challenging speaking situations
– Dysarthric speakers use duration instead of F0 to signal question vs statement (Patel 2003)
– Whispered speech in Mandarin shows amplitude variation analogous to F0 shape for tones (Gao 2003)
• Interpretation of ambiguous cues in context
– Early prominence patterns influence interpretation of ambiguous later prominence (Dilley & Shattuck-Hufnagel 1998)
– Early speaking rate influences interpretation of ambiguous cues to function words (Dilley & Pitt 2008)

Parallels in Prosodic Processing?
• Individual variation in cue patterns
– Irregular pitch periods at prosodic boundaries and prominences (Pierrehumbert & Talkin 1992, Dilley et al. 1996)
• New cues in challenging speaking situations
– Dysarthric speakers use duration instead of F0 to signal question vs statement (Patel 2003)
– Whispered speech in Mandarin shows amplitude variation analogous to F0 shape for tones (Gao 2003)
• Interpretation of cues in context
– Early prominence patterns influence interpretation of ambiguous later prominence (Dilley & Shattuck-Hufnagel 1998)
– Early speaking rate influences interpretation of ambiguous cues to function words (Dilley & Pitt 2008)

New cues in challenging speaking situations: Dysarthric Speech Patel 2003

New cues in challenging speaking situations: Whispered Speech https://lingos.co/blog/mandarin-tones/ Gao 1999

New cues in challenging speaking situations: Whispered Speech Gao 1999

New Cues in challenging speaking situations: Whispered Speech Gao 2003

Implications for Prosodic Transcription?
• Determine the contrastive categories
• Determine the range of appropriate cues and cue parameter values for each category, across contexts
• Determine the relationship of the categories (and cue parameter values) to meaning and to interaction

Implications for Prosodic Transcription?
• Determine the contrastive categories
• Determine the range of appropriate cues and cue parameter values for each category, across contexts
• Determine the relationship of the categories (and cue parameter values) to meaning and to interaction
• Can cue-based transcription move us toward these goals?

Some useful steps
• Consider prosodic elements in terms of distributed cues to contrastive elements and parameter values for those cues
– Rather than as a sequence of surface elements
• Develop displays of parameters as compelling as F0 contours
– Duration and amplitude as % of typical
– Autodetection of irregular pitch periods
• Create inventories of contrastive use of prosodic phrasing and prominence across languages
• Investigate ‘phonological equivalence’ in prosody
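Two of the display ideas above, duration as a percentage of typical and autodetection of irregular pitch periods, might be prototyped along the following lines. This is a minimal sketch under stated assumptions: the function names, the reference durations, the glottal pulse times and the 20% period-to-period change criterion are all hypothetical and are not taken from the talk.

```python
from statistics import mean


def percent_of_typical(value, reference_values):
    """Express a measurement as a percentage of the mean of reference tokens.

    'Typical' is taken here to be the mean of comparable tokens (for
    example, the same vowel from the same speaker); that is an assumption.
    """
    return 100.0 * value / mean(reference_values)


def irregular_periods(pulse_times_ms, max_change=0.2):
    """Indices of pitch periods that differ sharply from the previous one.

    pulse_times_ms: times of successive glottal pulses in milliseconds.
    A period is flagged when it differs from the preceding period by
    more than `max_change` (as a fraction); the 20% default is illustrative.
    """
    periods = [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]
    flagged = []
    for i in range(1, len(periods)):
        change = abs(periods[i] - periods[i - 1]) / periods[i - 1]
        if change > max_change:
            flagged.append(i)
    return flagged


# Hypothetical vowel durations (ms) for comparable tokens, plus one test token.
reference_ms = [100, 120, 110, 110]
pct = percent_of_typical(165, reference_ms)

# Hypothetical glottal pulse times (ms): regular, then irregular, then regular.
pulses_ms = [0, 10, 20, 30, 45, 52, 62, 72]
flagged = irregular_periods(pulses_ms)
print(pct, flagged)
```

Plotting `pct` over time, or marking the flagged periods on a waveform, would give duration and voice-quality displays that can be read alongside an F0 contour.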

Phonological equivalence

Which differences distinguish contrasts?
