A Preliminary Study on Landmark Detection
SPEECH LAB, NTHU EE
Advisor: Hsiao-Chuan Wang
Speaker: Jyh-Min CHENG
Date: Aug. 18, 2006

Introduction
- A knowledge-based speech recognition system is dedicated to processing speech (as opposed to signals in general) and is therefore efficient
- Rather than explicitly specifying speech knowledge in a recognition system, a statistical approach builds models by training on speech data, thereby acquiring that knowledge implicitly

Introduction (cont.)
- Statistical methods have been successful for large-vocabulary, speaker-independent speech recognition
  - Lee, K.-F. (1989). Automatic speech recognition: the development of the SPHINX system
- Because they rely heavily on training data, statistical methods do not generalize easily to tasks for which they are not explicitly trained
  - Such tasks require retraining, adaptation, etc.

Introduction (cont.)
- Performance degrades when there is an environment mismatch
  - Das, S., Bakis, R., A., Nahamoo, D., and Picheny, M. (1993). “Influence of background noise and microphone on the performance of the IBM Tangora speech recognition system”
- A combination of knowledge-based and statistical approaches
  - Knowledge sources are added, such as phone duration, an auditory front-end, the mel-frequency scale, etc.
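The mel-frequency scale is one example of such a knowledge source. It warps frequency to follow perceived pitch more closely. The following is a minimal Python sketch that assumes the commonly used form mel = 2595 log10(1 + f/700). Other variants of the mel formula exist, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels using 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0):
        print(f"{f:7.1f} Hz -> {hz_to_mel(f):7.1f} mel")
```

With these constants, 1000 Hz maps to roughly 1000 mel. Equal steps in Hz therefore shrink on the mel scale as frequency increases.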

Introduction (cont.)
- A knowledge-based speech recognition system was proposed
  - Stevens, K. N., Manuel, S. Y., Shattuck-Hufnagel, S., and Liu, S. (1992). “Implementation of a model for lexical access based on features”, ICSLP


Introduction (cont.) – Distinctive features
- Distinctive features concisely describe the sounds of a language at a sub-segmental level
  - They have a relatively direct relation to acoustics and articulation
    - Jakobson, R., Fant, G., and Halle, M. (1952). “Preliminaries to speech analysis”
  - They can concisely describe many of the contextual variations of a segment
    - Speaking styles, phonological assimilation across word boundaries, etc.
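To make feature bundles concrete, the sketch below represents segments as small Python dictionaries. It models place assimilation as a change to a single feature, for example the [n] of "ten" being produced like [m] before "bikes" in casual speech. The feature names and values are a simplified, hypothetical set chosen for illustration, not the feature system used in this study.

```python
# Simplified, hypothetical feature bundles (not a complete feature system).
N_ALVEOLAR = {"consonantal": "+", "sonorant": "+", "nasal": "+", "place": "alveolar"}
M_LABIAL = {"consonantal": "+", "sonorant": "+", "nasal": "+", "place": "labial"}
B_LABIAL = {"consonantal": "+", "sonorant": "-", "nasal": "-", "place": "labial"}


def feature_diff(a: dict, b: dict) -> set:
    """Names of features whose values differ between two bundles."""
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


def assimilate_place(segment: dict, following: dict) -> dict:
    """Return a copy of `segment` that takes its place feature from `following`."""
    result = dict(segment)
    result["place"] = following["place"]
    return result


if __name__ == "__main__":
    # "ten bikes": [n] before [b] may surface with labial place, i.e. like [m].
    surface_n = assimilate_place(N_ALVEOLAR, B_LABIAL)
    print(surface_n == M_LABIAL)                # True
    print(feature_diff(N_ALVEOLAR, surface_n))  # {'place'}
```

The contextual variant differs from the underlying segment in exactly one feature. A phone inventory would instead treat it as a different unit.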

Introduction (cont.) – Landmark detection
- Landmarks are a guide to the presence of underlying segments, which organize distinctive features into bundles
- They define regions in an utterance where the acoustic correlates of distinctive features are most salient
- They mark perceptual foci and articulatory targets

Introduction (cont.) – Landmark detection
- For some phonetic contrasts, a listener focuses on landmarks to obtain the acoustic cues needed to decipher the underlying distinctive features
  - Stevens, K. N. (1985). “Evidence for the role of acoustic boundaries in the perception of speech sounds”
  - Furui, S. (1986). “On the role of spectral transition for speech perception”
  - Ohde, R. M. (1994). “The developmental role of acoustic boundaries in speech perception”

Introduction (cont.) – Landmark detection
- Once the landmarks have been found, subsequent processing can focus on the relevant portions of speech instead of treating every part of the signal as equally important
  - This minimizes the amount of processing needed
  - Landmarks are independent of timing factors such as speaking rate and segmental duration
  - They provide timing information to aid later processing
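Focusing on landmarks can be sketched computationally: given the landmark times, only short windows around each landmark are passed on to later analysis. The landmark times, sampling rate and 20 ms half-width below are hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def landmark_windows(n_samples: int, fs: int, landmark_times: Sequence[float],
                     half_width_s: float) -> List[Tuple[int, int]]:
    """(start, end) sample ranges centred on each landmark, clipped to the signal."""
    half = int(round(half_width_s * fs))
    windows = []
    for t in landmark_times:
        centre = int(round(t * fs))
        start, end = max(0, centre - half), min(n_samples, centre + half)
        if start < end:
            windows.append((start, end))
    return windows


def samples_covered(windows: Sequence[Tuple[int, int]]) -> int:
    """Number of distinct samples inside the windows, merging any overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(windows):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    fs, n = 16000, 16000                 # one second of audio at 16 kHz
    landmarks = [0.20, 0.35, 0.62]       # hypothetical landmark times (s)
    wins = landmark_windows(n, fs, landmarks, half_width_s=0.02)
    used = samples_covered(wins)
    print(wins, used, used / n)
```

In this example, three 40 ms windows cover 1920 of the 16000 samples (12%) in one second of signal. Overlapping windows are merged so that no sample is counted twice.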

Introduction (cont.) – Landmark detection, frame-based processing, and segmentation
- Landmark detection is just one way to organize the speech waveform
- Frame-based processing and segmentation are two other possibilities

Introduction (cont.) – Landmark detection, frame-based processing, and segmentation
- Frame-based processing is the most popular way of dividing up the speech waveform
- Segmentation is more structured than frame-based processing
  - It finds boundaries in the speech waveform
  - The boundaries delimit unequal-length, semi-steady-state, abutting regions, each corresponding to a phone or sub-phone unit
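The two ways of dividing the waveform can be contrasted in code. Frame-based processing uses equal-length windows at a fixed hop. This sketch assumes 25 ms frames with a 10 ms hop at 16 kHz, a common but not universal choice. Segmentation uses boundaries to delimit abutting regions of unequal length. The boundary positions are hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_ranges(n_samples: int, frame_len: int, hop: int) -> List[Tuple[int, int]]:
    """Frame-based processing: equal-length (possibly overlapping) windows at a fixed hop."""
    return [(s, s + frame_len) for s in range(0, n_samples - frame_len + 1, hop)]


def segment_ranges(boundaries: Sequence[int], n_samples: int) -> List[Tuple[int, int]]:
    """Segmentation: abutting, unequal-length regions delimited by boundary samples."""
    inner = sorted(b for b in set(boundaries) if 0 < b < n_samples)
    edges = [0] + inner + [n_samples]
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    fs = 16000
    n = fs // 10                                         # 100 ms of audio
    frames = frame_ranges(n, frame_len=400, hop=160)     # 25 ms frames, 10 ms hop
    segments = segment_ranges([300, 1100], n)            # hypothetical boundaries
    print(len(frames), frames[:2])
    print(segments)
```

The frames overlap and ignore the signal's content. The segments never overlap, cover the signal exactly once, and take their lengths from wherever the boundaries fall.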

Introduction (cont.) – Landmark detection, frame-based processing, and segmentation
- Subsequent processing focuses on these regions, typically computing averages across each region and sometimes measuring attributes near the boundaries
  - Gish, H., and Ng, K. (1993). “A segmental speech model with applications to word spotting”, ICASSP
  - Zue, V. W., Glass, J. R., Goodine, D., Leung, H., Phillips, M., Polifroni, J., and Seneff, S. (1990b). “Recent progress on the SUMMIT system”
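Segment-level processing might look like the sketch below: it averages a measurement across each region and checks how the measurement changes at each boundary. The per-frame energies and segment ranges are made up for illustration. Comparing the last frame before a boundary with the first frame after it is just one simple choice.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def segment_means(values: Sequence[float], segments: Sequence[Tuple[int, int]]) -> List[float]:
    """Average a per-frame measurement over each segment (half-open frame ranges)."""
    return [mean(values[start:end]) for start, end in segments]


def boundary_jumps(values: Sequence[float], segments: Sequence[Tuple[int, int]]) -> List[float]:
    """Change across each internal boundary: first frame after it minus last frame before it."""
    return [values[nxt[0]] - values[prev[1] - 1] for prev, nxt in zip(segments, segments[1:])]


if __name__ == "__main__":
    energy_db = [40.0, 42.0, 41.0, 70.0, 72.0, 71.0, 73.0, 50.0, 48.0]  # made-up energies (dB)
    segs = [(0, 3), (3, 7), (7, 9)]                                     # made-up segments
    print(segment_means(energy_db, segs))   # [41.0, 71.5, 49.0]
    print(boundary_jumps(energy_db, segs))  # [29.0, -23.0]
```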

Introduction (cont.) – Landmark detection, frame-based processing, and segmentation
- The segmentation approach performs better than, or comparably to, a frame-based approach while substantially reducing the computational load in training and testing
  - Flammia, G., Dalsgaard, P., Anderson, O., and Linberg, B. (1992). “Segment based variable frame rate speech analysis and recognition using spectral variation function”, ICSLP
  - Marcus, J. (1993). “Phonetic recognition in a segmental-based HMM”, ICASSP

Introduction (cont.) – Landmark detection, frame-based processing, and segmentation
- Segmentation was a popular method of organizing the speech waveform from the 1970s through the mid-1980s
  - It is compatible with acoustic-phonetic processing
  - Weinstein, C. J., McCandless, S. S., Mondshein, L. F., and Zue, V. W. (1975). “A system for acoustic-phonetic analysis of continuous speech”, IEEE ASSP

Introduction (cont.) – Landmark detection, frame-based processing, and segmentation
- Segmentation fails when parts of the waveform lack sharp boundaries, such as those corresponding to diphthongs and semivowels
  - Over-segmentation
    - Andre-Obrecht, R. (1988). “A new statistical approach for the automatic segmentation of continuous speech signals”, IEEE ASSP
  - Multi-level representation
    - Glass, J. R. (1988). “Finding acoustic regularities in speech: applications to phonetic recognition”
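One generic way to build a multi-level representation is to start from an over-segmentation and repeatedly merge the most similar pair of adjacent regions. Every intermediate level is kept so that a later stage can choose the appropriate one. The sketch below does this on a single made-up measurement per region. It illustrates the idea only and does not claim to reproduce the procedure of Glass (1988). It weights region means by the number of initial regions merged, which assumes those regions are of equal size.

```python
from typing import List, Sequence, Tuple

Level = List[Tuple[int, int]]  # regions as (first_index, last_index), inclusive


def merge_levels(values: Sequence[float]) -> List[Level]:
    """Bottom-up merging of adjacent regions; returns every level, finest first."""
    # Each region: (first_index, last_index, mean value); start with one region per value.
    regions = [(i, i, float(v)) for i, v in enumerate(values)]
    levels = [[(a, b) for a, b, _ in regions]]
    while len(regions) > 1:
        # Adjacent pair whose means are closest (first such pair on ties).
        k = min(range(len(regions) - 1),
                key=lambda i: abs(regions[i][2] - regions[i + 1][2]))
        a0, a1, a_mean = regions[k]
        b0, b1, b_mean = regions[k + 1]
        na, nb = a1 - a0 + 1, b1 - b0 + 1
        regions[k:k + 2] = [(a0, b1, (a_mean * na + b_mean * nb) / (na + nb))]
        levels.append([(a, b) for a, b, _ in regions])
    return levels


if __name__ == "__main__":
    # Made-up per-region measurements from an over-segmented stretch of speech.
    for level in merge_levels([1.0, 1.1, 5.0, 5.2, 9.0]):
        print(level)
```

A gradual transition, such as a diphthong, may be split at the finest level. It will appear as one region at some coarser level, so an error at the finest level does not have to be final.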

Introduction (cont.) – Landmark detection, frame-based processing, and segmentation
- Landmark detection differs from both frame-based processing and segmentation
  - Landmarks are foci, so speech processing is done around a landmark rather than between two landmarks
  - Not all boundaries are landmarks, and not all landmarks are boundaries
  - The problem of semivowels and diphthongs is avoided altogether
  - Landmark detection is typically more hierarchical
  - Landmarks are associated with distinctive features, whereas segmentation is associated with phones

Objective
- The most numerous types of landmarks are acoustically abrupt
  - Zue, V., Seneff, S., and Glass, J. (1990a). “Speech database development at MIT: TIMIT and beyond”, Speech Communication
  - An estimate based on a phonetically balanced subset of sentences in the TIMIT corpus shows that acoustically abrupt landmarks make up approximately 68% of all landmarks in speech
- Abrupt landmarks are often associated with consonantal segments, such as a stop closure or release
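Abrupt landmarks coincide with rapid spectral change, so a very simple candidate detector can look for large frame-to-frame changes in band energy. The sketch below groups consecutive above-threshold changes of the same sign and reports the largest change in each group as a rise or a fall. The single energy band, the 9 dB threshold and the frame span are hypothetical parameters. This is a toy illustration, not the detector used in the studies cited here.

```python
from typing import List, Sequence, Tuple


def abrupt_candidates(energy_db: Sequence[float], span: int = 1,
                      threshold_db: float = 9.0) -> List[Tuple[float, str]]:
    """Candidate abrupt landmarks from one band's frame energies (dB).

    Differences are taken `span` frames apart. Each run of consecutive differences
    with magnitude >= threshold_db and the same sign yields one candidate at the
    largest difference in the run. Returns (frame_position, "rise" | "fall"), where
    frame_position is the centre of the differencing span.
    """
    diffs = [energy_db[i + span] - energy_db[i] for i in range(len(energy_db) - span)]
    candidates = []
    i = 0
    while i < len(diffs):
        if abs(diffs[i]) < threshold_db:
            i += 1
            continue
        rising = diffs[i] > 0
        best = i
        j = i
        # Extend the run while the change stays large and keeps the same sign.
        while j < len(diffs) and abs(diffs[j]) >= threshold_db and (diffs[j] > 0) == rising:
            if abs(diffs[j]) > abs(diffs[best]):
                best = j
            j += 1
        candidates.append((best + span / 2, "rise" if rising else "fall"))
        i = j
    return candidates


if __name__ == "__main__":
    # Made-up energies: an abrupt rise (e.g. a release) then an abrupt fall (e.g. a closure).
    energy = [30, 30, 30, 30, 40, 55, 60, 60, 60, 60, 45, 32, 30, 30, 30]
    print(abrupt_candidates(energy))  # [(4.5, 'rise'), (9.5, 'fall')]
```

To convert the returned frame positions to times, multiply them by the frame step (for example, 10 ms).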

I. LANDMARKS
- Landmarks are categorized into four groups:
  - Abrupt-consonantal (AC)
  - Abrupt (A)
  - Nonabrupt (N)
  - Vocalic (V)
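The four landmark groups map naturally onto an enumeration, as in the small Python sketch below. The class and field names are hypothetical. The sketch also assumes that only AC and A landmarks count as acoustically abrupt, an assumption read off the group names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class LandmarkType(Enum):
    ABRUPT_CONSONANTAL = "AC"
    ABRUPT = "A"
    NONABRUPT = "N"
    VOCALIC = "V"

    @property
    def is_abrupt(self) -> bool:
        # Assumption: only AC and A landmarks count as acoustically abrupt.
        return self in (LandmarkType.ABRUPT_CONSONANTAL, LandmarkType.ABRUPT)


@dataclass(frozen=True)
class Landmark:
    time_s: float        # hypothetical field: landmark time in seconds
    kind: LandmarkType


def abrupt_fraction(landmarks: Sequence[Landmark]) -> float:
    """Share of landmarks whose type is acoustically abrupt."""
    if not landmarks:
        return 0.0
    return sum(lm.kind.is_abrupt for lm in landmarks) / len(landmarks)


if __name__ == "__main__":
    labels = [(0.10, "AC"), (0.18, "V"), (0.25, "AC"), (0.31, "N")]  # made-up labels
    marks = [Landmark(t, LandmarkType(code)) for t, code in labels]
    print(abrupt_fraction(marks))  # 0.5
```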

I. LANDMARKS (cont.)
- Phonologically, segments can be classified as [+consonantal] or [-consonantal]
  - Sagey, E. (1986). “The representation of features and relations in nonlinear phonology”
- A [+consonantal] segment involves a primary articulator (lips, tongue blade, or tongue body) forming a tight constriction in the midline of the vocal tract
- A [-consonantal] segment does not involve a primary articulator forming a tight constriction (e.g., it may involve the soft palate or the glottis instead)
- Speech is formed by a series of articulator narrowings and releases

I. LANDMARKS (cont.)
- The most salient of these narrowings and releases are acoustically abrupt
- An acoustically abrupt constriction involving a primary articulator is typically tight and results from implementing a [+consonantal] segment
- One abrupt-consonantal (AC) landmark marks the closure of such a constriction, and another marks its release

I. LANDMARKS (cont.)
- The clearest manifestation of an AC landmark occurs when the constriction is adjacent to a [-consonantal] segment
- A pair of these landmarks, one on either side of the constriction, will be referred to as the outer AC landmarks
  - Ex: [b] closure and release in “able”
- Other landmarks can occur within or outside the pair of outer AC landmarks
(Figure: outer AC landmarks)

I. LANDMARKS (cont.)
- A common sequence of landmarks is one in which the outer AC landmarks are governed by the same underlying segment and are thus implemented by the same articulator
  - Ex: [b] closure and release in “able”
- Some outer AC landmarks are not implemented by the same articulator
  - Ex: [p] closure and [d] release in “tap dance”
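The definition of outer AC landmarks can be expressed as a scan over a sequence of segments. Each run of [+consonantal] segments flanked on both sides by [-consonantal] segments yields one closure landmark at the first segment and one release landmark at the last. The sketch below also reports whether the two landmarks are implemented by the same articulator. The ASCII transcriptions of "tap dance" and "able" are simplified and hypothetical. Runs at utterance edges are skipped because they lack a [-consonantal] neighbor on one side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PRIMARY_ARTICULATORS = {"lips", "tongue blade", "tongue body"}


@dataclass(frozen=True)
class Segment:
    label: str
    consonantal: bool                  # True for a [+consonantal] segment
    articulator: Optional[str] = None  # primary articulator making the constriction

    def __post_init__(self) -> None:
        if self.consonantal and self.articulator not in PRIMARY_ARTICULATORS:
            raise ValueError(f"{self.label}: [+consonantal] needs a primary articulator")


def vow(label: str) -> Segment:
    """Helper for a [-consonantal] segment."""
    return Segment(label, consonantal=False)


def con(label: str, articulator: str) -> Segment:
    """Helper for a [+consonantal] segment."""
    return Segment(label, consonantal=True, articulator=articulator)


def outer_ac_pairs(segments: List[Segment]) -> List[Tuple[int, int, bool]]:
    """(closure_index, release_index, same_articulator) for each run of [+consonantal]
    segments that has a [-consonantal] segment on both sides."""
    pairs = []
    i = 0
    while i < len(segments):
        if not segments[i].consonantal:
            i += 1
            continue
        start = i
        while i < len(segments) and segments[i].consonantal:
            i += 1
        end = i - 1
        if start > 0 and i < len(segments):  # flanked on both sides
            same = segments[start].articulator == segments[end].articulator
            pairs.append((start, end, same))
    return pairs


if __name__ == "__main__":
    # Simplified, hypothetical transcriptions.
    able = [vow("ei"), con("b", "lips"), vow("@"), con("l", "tongue blade")]
    tap_dance = [con("t", "tongue blade"), vow("ae"), con("p", "lips"),
                 con("d", "tongue blade"), vow("ae"), con("n", "tongue blade"),
                 con("s", "tongue blade")]
    print(outer_ac_pairs(able))       # [(1, 1, True)]
    print(outer_ac_pairs(tap_dance))  # [(2, 3, False)]
```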
