Automated Large-Scale Phonetic Analysis: DASS
William A. Kretzschmar, Jr., Joseph Stanley, Katherine Kuiper
University of Georgia

DASS
• 64 interviews available on a portable USB drive
• 370 hours of sound files (c. 200 GB, about 5,000 files in all), plus metadata
• LICHEN user interface software
Map by Peggy Renwick
University of Georgia: Paulina Bounds, Steven Coats, William A. Kretzschmar, Jr., Tony Snodgrass
University of Oulu: Ilkka Juuso, Lisa Lena Opas-Hänninen, Tapio Seppänen

NSF grant for automated phonetic analysis
• Automatically extract stressed vowels in the DASS interviews
• 1.5 million tokens overall
• Extent of variation in vowels pronounced by one individual
• Variation across regional and social categories of speakers
• Challenge for generalizations based on small datasets, like Labov’s Southern Shift
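One way to quantify the variation within one speaker's vowels is to summarize formant measurements per speaker and vowel. The following is a minimal sketch, not the project's actual code: the record layout, the speaker labels, and all F1/F2 values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (speaker_id, vowel, F1 in Hz, F2 in Hz).
# All values are invented; they are not DASS measurements.
tokens = [
    ("spk1", "ae", 700.0, 1800.0),
    ("spk1", "ae", 740.0, 1700.0),
    ("spk1", "ae", 660.0, 1900.0),
    ("spk2", "ae", 800.0, 1600.0),
    ("spk2", "ae", 820.0, 1500.0),
]

def summarize(records):
    """Group tokens by (speaker, vowel) and report count, mean, and SD of F1/F2."""
    groups = defaultdict(list)
    for speaker, vowel, f1, f2 in records:
        groups[(speaker, vowel)].append((f1, f2))

    summary = {}
    for key, values in groups.items():
        f1s = [f1 for f1, _ in values]
        f2s = [f2 for _, f2 in values]
        summary[key] = {
            "n": len(values),
            "f1_mean": mean(f1s),
            "f2_mean": mean(f2s),
            # Sample standard deviation needs at least two tokens.
            "f1_sd": stdev(f1s) if len(f1s) > 1 else 0.0,
            "f2_sd": stdev(f2s) if len(f2s) > 1 else 0.0,
        }
    return summary

summary = summarize(tokens)
for key, stats in summary.items():
    print(key, stats)
```

The same grouping key could be swapped for a regional or social category to compare variation across groups of speakers rather than within one individual.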

Complex systems
• Distributions in nonlinear patterns
• “Scale-free” distribution, i.e. the same pattern at every level of scale (overall, regional subsets, social subsets, individuals)
• Big Data needed to show the patterns at all levels
[Figure: Nonlinear A-curve pattern, vowel in half]
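The idea of checking for the same pattern at several levels of scale can be sketched as a rank-frequency count, computed once for all tokens and once for a subset. This is only an illustrative sketch: the group labels and variant names are hypothetical, the toy data are far too small to show a real A-curve, and the slides do not specify how variants were counted.

```python
from collections import Counter

def rank_frequency(variants):
    """Return (variant, count) pairs ordered from most to least frequent."""
    return Counter(variants).most_common()

# Hypothetical toy data: (speaker_group, variant) pairs, invented for illustration.
tokens = [
    ("A", "v1"), ("A", "v1"), ("A", "v1"), ("A", "v2"), ("A", "v3"),
    ("B", "v1"), ("B", "v1"), ("B", "v2"), ("B", "v2"), ("B", "v4"),
]

# Same computation at two levels of scale: everything, then one subset.
overall = rank_frequency(v for _, v in tokens)
group_a = rank_frequency(v for g, v in tokens if g == "A")

for label, dist in (("overall", overall), ("group A", group_a)):
    print(label, dist)
```

With real data, plotting the counts against rank for each level would show whether a few variants dominate and many are rare at every level.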

Forced alignment with automatic formant extraction
• A computational goal since the 1970s
• P2FA as an early success (Yuan and Liberman 2008), used with automatic formant extraction in Evanini 2009
• P2FA has turned into FAVE (Rosenfelder et al. 2011)
• DARLA (Dartmouth Linguistic Automation), Reddy and Stanford 2015

Why DASS?
• LAGS is already widely used in analyses of Southern speech (e.g. Dorrill 2003, Feagin 2003, Schönweitz 2001, and Thomas 2005)
• Thomas (2001) has demonstrated successful acoustic analysis of our old recordings
• The Atlas web site has received about a million accesses per year in recent years, so it is already a dataset that people want to use
• DASS makes a good sample across the South

The pilot study (Renwick and Olsen 2015)
• Ten speakers from section AK of LAGS, in Southeast Georgia, about 30 hours of audio
• Manual transcription of files, with semi-automated alignment using Perl and formant extraction in Praat, with manual adjustments
• For one speaker (LAGS 195), the study found 76,735 words, as opposed to the 800+ targets that LAGS looked for: way more phonetic information!

Our progress: the short story
• 35 part-time undergraduate transcribers
• Transcriptions with the Transcriber tool (available free online)
• 3 graduate assistants and our administrative assistant monitor transcription and quality control
• Forced alignment with DARLA, automatic formant extraction with modified FAVE

Initial results: æ
[Figure: tokens of æ for Speaker 40 (F, W, 38, TN) and Speaker 434 (M, B, 90, AL)]

Initial results: i
[Figure: tokens of i for Speaker 40 (F, W, 38, TN) and Speaker 434 (M, B, 90, AL)]

Complex Systems and the Humanities http://emergence.libs.uga.edu

Thanks for your patience!

Selected References
Kretzschmar, William A., Jr., Paulina Bounds, Jacqueline Hettel, Lee Pederson, Ilkka Juuso, Lisa Lena Opas-Hänninen, and Tapio Seppänen. 2013. The Digital Archive of Southern Speech (DASS). Southern Journal of Linguistics 37.2: 17-38.
Reddy, Sravana, and James Stanford. 2015. Toward completely automated vowel extraction: Introducing DARLA. Linguistics Vanguard.
Renwick, Margaret, and Rachel Miller Olsen. 2015. Voices of coastal Georgia. Paper presented at the Acoustical Society of America (ASA 2015), Jacksonville.
Rosenfelder, Ingrid, Joe Fruehwald, Keelan Evanini, and Jiahong Yuan. 2011. FAVE (Forced Alignment and Vowel Extraction) Program Suite. http://fave.ling.upenn.edu.
