University of Southern California, IEEE Odyssey, June 2016

Understanding individual-level speech variability: From novel speech production data to robust speaker recognition
Shrikanth (Shri) Narayanan
Signal Analysis and Interpretation Laboratory (SAIL), http://sail.usc.edu
University of Southern California
IEEE Odyssey, June 2016

Different individuals... each with a uniquely shaped vocal instrument

Different individuals... each with a uniquely shaped vocal instrument
[Figure labels: nose, tongue, velum]

...and with differing articulatory strategies during speech
[Figure: fifteen different individuals producing the vowel /i/]

Theme: What role can speech science play in understanding and supporting speech technology development?

Talk focus: Vocal tract structure and function
Characterize the interplay between vocal-tract structure and function
• Structure: physical characteristics of the vocal-tract apparatus
  – e.g., hard palate geometry, tongue volume, velum mass
• Function: behavioral characteristics of speech articulation
  – e.g., dynamic formation of constrictions in the vocal tract

Overarching questions
• How are individual vocal-tract structural differences reflected in the speech acoustics?
• Can structural differences be predicted from acoustics?
• How do individuals adapt to structural differences to achieve phonetic equivalence?
• What contributes to distinguishing speakers from one another in the speech signal?
The aim is not only to differentiate individuals by their speech signal, but to understand what makes them different from a structure-function perspective.

Summary of specific goals of this talk
• Quantify individual variability in vocal-tract morphology
• Predict morphological details from acoustics
• Characterize individual articulatory strategy
• Explore applications to automatic speaker recognition
• Interpret speaker recognition in terms of variability in morphology and strategy (including speaking-style differences)
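As one concrete instance of predicting a morphological detail from acoustics, the sketch below estimates vocal tract length from measured formants using the textbook uniform-tube approximation: a tube closed at the glottis and open at the lips resonates at F_n = (2n - 1) c / (4L). This is a standard simplification, not the method used in this talk; the function name, the assumed speed of sound, and the example formant values are illustrative assumptions.

```python
from typing import Sequence

# Assumed speed of sound in warm, humid air (cm/s); an illustrative constant.
SPEED_OF_SOUND_CM_S = 35000.0


def estimate_vtl_cm(formants_hz: Sequence[float],
                    c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Hypothetical helper: estimate vocal tract length (cm) from F1..Fn.

    Uniform closed-open tube: F_n = (2n - 1) * dF / 2 with dF = c / (2L).
    The formant spacing dF is fitted by least squares through the origin,
    then converted back to a length with L = c / (2 * dF).
    """
    if not formants_hz:
        raise ValueError("need at least one formant")
    # Regressors x_n = (2n - 1) / 2 for n = 1, 2, ...
    xs = [(2 * n - 1) / 2.0 for n in range(1, len(formants_hz) + 1)]
    spacing = (sum(f * x for f, x in zip(formants_hz, xs))
               / sum(x * x for x in xs))
    return c / (2.0 * spacing)


# Formants of an idealized neutral (uniform) 17.5 cm tract.
print(estimate_vtl_cm([500.0, 1500.0, 2500.0]))  # 17.5
```

Fitting the spacing jointly across all formants is less sensitive to one mismeasured formant than inverting each F_n on its own; since real vocal tracts are not uniform tubes, the result is only a first-order estimate.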

Speech Production and Articulation kNowledge (SPAN) Group, http://sail.usc.edu/span
[Figure: overview diagram linking multimodal data acquisition, multimodal analysis & modeling, scientific insights/models/theory, and technology/applications]
• Multimodal data acquisition: 3D MRI, EMA, RT-MRI, audio
• Diverse stimuli: vowels and continuants; read sentences; spontaneous speech; non-speech gestures
• Analysis, modeling, and insight topics: direct image analysis, forced alignment, articulator tracking, acoustic feature extraction, dynamics of production, cross-modal registration, 3D vocal tract shaping, airway segmentation, morphological characterization, articulatory coordination, task-dynamic modeling, source-filter interaction, realization of prosody, dynamic 3D vocal tract modeling, speaker-specific phonetics, joint factor analysis, manifold learning, multiview learning

Rest of the talk
• Measuring speech production: getting data
  – focus on magnetic resonance imaging
• Analysis of speech production data
• Some modeling & application results
  – Characterizing vocal tract morphology
  – Understanding speaker-specific articulatory strategy
  – Inferring vocal tract structure/strategy from the speech signal
  – Enriching speaker verification with production information

Methods for vocal tract imaging: getting speech production data

Speech production studies: Data is integral
• Observe, measure, and visualize articulatory details during speech
• Long history of instrumentation and imaging applications
• A number of techniques, each with its own strengths and limitations in terms of:
  – Spatial and temporal resolution
  – Subject safety
  – Flexibility, ease of use, portability
  – Data interpretability
  – Specific research and application needs

Commonly used speech production data types
• X-ray
  + high temporal and spatial resolution
  − radiation; limited resolution for soft tissue
• Electromagnetometry (EMA)
  + safe; high temporal resolution; flesh-point tracking
  − invasive; spatially sparse data; not for pharyngeal structures
• Ultrasound
  + safe; high temporal resolution; portable
  − provides an incomplete view of the vocal tract
• Palatography
  + safe; high temporal resolution; portable
  − invasive; provides indirect information on the oral cavity

Classic speech production data examples
• X-ray (Stevens, 1962)
• Ultrasound (Stone, 1980)
• Electropalatography (courtesy: UCLA Phonetics Lab)
• Electromagnetometry
[Figure labels: upper lip, velum, tongue, teeth, lower lip]
Sources: http://www.speech.umaryland.edu, http://psyc.queensu.ca/~munhallk/05_database.htm

Newer possibilities: MRI for structural vocal tract imaging
Capable of 3D imaging of the hydrogen concentration in the human body
• A number of advantages:
  – Non-invasive, no ionizing radiation
  – Arbitrary scan plane: information on the complete vocal tract geometry
  – Excellent, flexible structural differentiation: good soft-tissue contrast, SNR
  – Amenable to computerized 3D modeling: reconstruction and visualization
  – Quantitative information: area function and acoustic relations
  – Variability analyses
• Limitations/challenges:
  – Slow: spatial and temporal resolution tradeoffs, optimizing to a given application
  – Noisy images: susceptibility, blurring artifacts
  – Imaging teeth
  – Interaction with other physiological activities: respiration, swallowing, other movement
  – Clean, synchronized audio (and other modalities, as needed)
  – Ease of experimentation, including cost and portability
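The "area function and acoustic relations" advantage can be made concrete with a standard acoustic-tube calculation. The sketch below is a minimal example under strong assumptions: the vocal tract is a chain of lossless cylindrical sections (no wall, viscous, or radiation losses), closed at the glottis and ideally open at the lips, and formants are found as the zeros of the D entry of the chain (ABCD) matrix. The function names and the example areas and lengths are hypothetical, not taken from the talk's MRI data.

```python
import math
from typing import List, Sequence

# Assumed speed of sound in the vocal tract (cm/s); an illustrative constant.
SPEED_OF_SOUND_CM_S = 35000.0


def _d_entry(freq_hz: float, areas_cm2: Sequence[float],
             lengths_cm: Sequence[float], c: float) -> float:
    """D entry of the glottis-to-lips chain matrix at freq_hz.

    Each lossless section is [[cos kl, j*Z*sin kl], [j*sin kl / Z, cos kl]]
    with Z = 1/A (rho*c set to 1, since it cancels in D). Off-diagonal
    entries are stored without their factor j, so all arithmetic is real.
    """
    k = 2.0 * math.pi * freq_hz / c
    a, b, cc, d = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for area, length in zip(areas_cm2, lengths_cm):
        cs, sn = math.cos(k * length), math.sin(k * length)
        a2, b2, c2, d2 = cs, sn / area, sn * area, cs
        # [[a, jb], [jc, d]] @ [[a2, jb2], [jc2, d2]], using j * j = -1
        a, b, cc, d = (a * a2 - b * c2, a * b2 + b * d2,
                       cc * a2 + d * c2, d * d2 - cc * b2)
    return d


def formants_from_area_function(areas_cm2: Sequence[float],
                                lengths_cm: Sequence[float],
                                f_max_hz: float = 4000.0,
                                step_hz: float = 2.5,
                                c: float = SPEED_OF_SOUND_CM_S) -> List[float]:
    """Resonances of a lossless tube closed at the glottis, open at the lips.

    Sections are ordered from glottis to lips. With zero pressure at the
    lips, U_lips / U_glottis = 1 / D(f), so formants are the zeros of D.
    """
    def d_at(f: float) -> float:
        return _d_entry(f, areas_cm2, lengths_cm, c)

    roots: List[float] = []
    f_prev = 1.0
    d_prev = d_at(f_prev)
    f = f_prev + step_hz
    while f <= f_max_hz:
        d_cur = d_at(f)
        if d_cur == 0.0:
            roots.append(f)
        elif d_prev * d_cur < 0.0:
            # Refine the bracketed sign change by bisection.
            lo, hi, d_lo = f_prev, f, d_prev
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                d_mid = d_at(mid)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            roots.append(0.5 * (lo + hi))
        f_prev, d_prev = f, d_cur
        f += step_hz
    return roots


# Uniform 17.5 cm tube split into ten hypothetical 5 cm^2 sections.
uniform = formants_from_area_function([5.0] * 10, [1.75] * 10, f_max_hz=3000.0)
print([round(x) for x in uniform])  # [500, 1500, 2500]
```

For a uniform tube D reduces to cos(kL), recovering the quarter-wave resonances; a narrow back cavity with a wide front cavity (an /a/-like two-tube shape) raises F1, which is the kind of structure-to-acoustics relation MRI area functions make measurable.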

MRI: Toward real-time acquisition for speech (circa 2004)
• Improving MRI temporal resolution
  – A non-2D-FFT acquisition strategy (spiral k-space trajectory) on a GE Signa 1.5T CV/i scanner with a low-flip-angle spiral gradient echo: 9-10 images/second
  – Pulse sequence adapted from one originally developed for cardiac imaging
  – Effective reconstruction rates of 24-35 frames/second using a sliding window reconstruction technique
• First to use real-time MRI and synchronous noise-cancelled audio to understand vocal tract movements during natural speech production
[Figure labels: VELUM, TONGUE]
Narayanan, S., Nayak, K., Lee, S., Sethy, A., and Byrd, D. An approach to real-time magnetic resonance imaging for speech production. J. Acoust. Soc. Am., 115:1771-1776, 2004.
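The jump from 9-10 images/second to 24-35 frames/second comes from sliding-window reconstruction: each frame is reconstructed from a full set of spiral interleaves, but successive frames advance by only a few interleaves, so neighbouring frames share data. The sketch below shows that bookkeeping; the function names, the 13 interleaves per image, and the 8 ms per interleave are hypothetical values chosen only so the numbers fall in the reported ranges.

```python
from typing import List, Tuple


def sliding_windows(n_interleaves: int, per_image: int,
                    step: int) -> List[Tuple[int, int]]:
    """(start, stop) interleave indices feeding each reconstructed frame.

    Each frame uses `per_image` consecutive interleaves (one full image's
    worth of k-space); successive frames advance by `step` interleaves,
    so neighbouring frames share per_image - step interleaves.
    """
    if not 0 < step <= per_image:
        raise ValueError("need 0 < step <= per_image")
    return [(s, s + per_image)
            for s in range(0, n_interleaves - per_image + 1, step)]


def frame_rates(tr_ms: float, per_image: int, step: int) -> Tuple[float, float]:
    """Return (native images/s, sliding-window frames/s).

    tr_ms is the (hypothetical) time to acquire one interleave.
    """
    native = 1000.0 / (tr_ms * per_image)
    effective = 1000.0 / (tr_ms * step)
    return native, effective


# Hypothetical parameters: 13 interleaves per image, 8 ms per interleave.
print(tuple(round(r, 2) for r in frame_rates(8.0, 13, 4)))  # (9.62, 31.25)
print(sliding_windows(21, 13, 4))  # [(0, 13), (4, 17), (8, 21)]
```

Each reconstructed frame still integrates data acquired over the full window, so a sliding window raises the frame rate without shortening the time span that each frame represents.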

Can we speed up MRI to even better rates?

Spatial vs. time resolution: speech MRI
• Our new system (circa 2015) enables visualization of all speech tasks
[Figure: spatial resolution (mm²) vs. time resolution (ms) for speech MRI. Compared: Cartesian (R=2.4, 1 slice), spiral (R=6.5, 1 slice), and the proposed single-slice system at 12 ms/frame (83 fps). Speech tasks marked on the plot: consonant constrictions, closures of alveolar trills, sustained sounds, velic movements, velopharyngeal closure, tongue movements (vowel-to-consonant transitions), coarticulation events.]
Lingala, S., Zhu, Y., Kim, Y.-C., Toutios, A., Narayanan, S., and Nayak, K. A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magnetic Resonance in Medicine, 2016.
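The "12 ms/frame (83 fps)" figure and the role of the R values can be checked with simple arithmetic. The sketch assumes R denotes the undersampling (acceleration) factor, as is conventional in MRI, so that acquiring 1/R of the fully sampled data cuts the frame time by roughly R, ignoring reconstruction time and fixed overheads. The function names and the 78 ms fully sampled frame time are hypothetical, chosen for illustration only and not reported for this system.

```python
def fps_from_frame_time(ms_per_frame: float) -> float:
    """Frames per second for a given time resolution in milliseconds."""
    return 1000.0 / ms_per_frame


def accelerated_frame_time_ms(full_ms: float, r: float) -> float:
    """Approximate frame time when only 1/R of the fully sampled data is acquired.

    First-order estimate: ignores reconstruction time and per-frame overheads.
    """
    if r < 1.0:
        raise ValueError("acceleration factor R must be >= 1")
    return full_ms / r


print(round(fps_from_frame_time(12.0), 1))  # 83.3
# Hypothetical: a 78 ms fully sampled frame accelerated by R = 6.5.
print(accelerated_frame_time_ms(78.0, 6.5))  # 12.0
```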
