CAN STANDARD ANALYSIS TOOLS BE USED ON DECOMPRESSED SPEECH?

R.J.J.H. van Son
Institute of Phonetic Sciences/ACLC, University of Amsterdam
Herengracht 338, 1016CG Amsterdam
Rob.van.Son@hum.uva.nl

Introduction

Large Speech Corpora aim at:
- Natural Interactions
- Field Recordings by Volunteers
- Large Amounts of it (Months)
- Internet Distribution

Solutions:
- Minidisc Recorders
- Compressed Storage
- Compressed Distribution

Methods

SPEECH (IFAcorpus):
- 125 segmented sentences, read and retold
- 4 male and 4 female speakers
- Recorded on 2 microphones to CD-audio

Analysis using Praat 4.0.16:
- Pitch (Simple: Auto Correlation)
- Formants 1-3 (Burg algorithm)
- Spectral Center of Gravity (first spectral moment)

TEST CONDITIONS:
- Microphone change: from HF condenser (Sennheiser MKH 105) to head-mounted dynamic (Shure SM10A)
- Sony Minidisc: ATRAC3 on Walkman MZ-R909
- Ogg Vorbis (40 kbs): 1.0rc3, 45 kbs effective (factor 15.5)
- Ogg Vorbis (80 kbs): 1.0rc3, 85 kbs effective (factor 8.3)
- MP3 (192 kbs): LAME 3.92, 204 kbs effective (factor 3.5)

All compressed recordings aligned to within 0.5 ms of original.
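
The compression factors listed for the test conditions look like the ratio of an uncompressed bit-rate to the effective compressed bit-rate. The slide does not state the reference rate. The sketch below assumes a single 16-bit channel at 44.1 kHz (705.6 kbs), which gives the quoted 8.3 and 3.5 and comes close to 15.5 for the 45 kbs condition. The constant and function names are illustrative only.

```python
# Assumed reference (not stated on the slide): one 16-bit channel at 44.1 kHz.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
REFERENCE_KBS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000  # 705.6 kbs


def compression_factor(effective_kbs: float) -> float:
    """Ratio of the assumed uncompressed bit-rate to the effective bit-rate."""
    return REFERENCE_KBS / effective_kbs


# Effective bit-rates taken from the test conditions above.
conditions = [
    ("Ogg Vorbis (40 kbs)", 45),
    ("Ogg Vorbis (80 kbs)", 85),
    ("MP3 (192 kbs)", 204),
]
for name, kbs in conditions:
    print(f"{name}: factor {compression_factor(kbs):.1f}")
```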

Jump Errors

- Pitch can pick wrong (sub-)harmonic
- Formants can be mislabeled
- Results in large "jump" errors that have to be handled
- Excluding differences larger than 9 semitones catches most of these jumps
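
The 9-semitone exclusion rule can be sketched as follows. The distance between two frequencies in semitones is 12 times the base-2 logarithm of their ratio, so an octave error (a wrong harmonic) is 12 semitones and falls above the threshold. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
import math

JUMP_THRESHOLD_ST = 9.0  # exclusion threshold from the slide, in semitones


def semitone_difference(f_test_hz: float, f_ref_hz: float) -> float:
    """Signed distance in semitones between a test and a reference frequency."""
    return 12.0 * math.log2(f_test_hz / f_ref_hz)


def split_jumps(pairs):
    """Split (test, reference) pairs into kept differences and jump errors."""
    kept, jumps = [], []
    for f_test, f_ref in pairs:
        diff = semitone_difference(f_test, f_ref)
        if abs(diff) > JUMP_THRESHOLD_ST:
            jumps.append(diff)
        else:
            kept.append(diff)
    return kept, jumps


# Hypothetical F0 values in Hz; the second pair is an octave error.
pairs = [(121.0, 120.0), (240.0, 120.0), (118.5, 120.0)]
kept, jumps = split_jumps(pairs)
print(f"kept {len(kept)} differences, flagged {len(jumps)} jump(s)")
```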

[Figure: Large Jumps in F0-F3 (# differences > 9 semitones). Vowels, N = 2415. Y-axis: # jumps in % (0.0% to 4.0%); x-axis: F0, F1, F2, F3. Series: Microphone change, Sony Minidisc, Ogg Vorbis (40 kbs), Ogg Vorbis (80 kbs), MP3 (192 kbs).]

Systematic Differences

Bit-rate 80 kbs and higher:
- Pitch < 0.04 semitones
- Formants < 0.04 semitones
- CoG < 0.15 semitones

Bit-rate 40 kbs:
- F2/F3 0.1 semitones
- CoG < 0.5 semitones

Microphone switch:
- Formants < 0.5 semitones
- CoG < 5 semitones (!)

Root-Mean-Square Errors

- Systematic Differences are Ignored in this Study
- Standard Deviation == Root-Mean-Square Error
- Discard Pitch and Formant (not CoG) Differences > 9 semitones (> 10 standard deviations of the difference)
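
A minimal sketch of the error measure described above: differences are expressed in semitones, jumps beyond the threshold are discarded, and the standard deviation of what remains is reported. Subtracting the mean is what makes the systematic difference drop out. The use of the population standard deviation, the function name, and the example values are assumptions and are not taken from the study.

```python
import math
import statistics


def rms_error_semitones(test_hz, ref_hz, max_jump_st=9.0):
    """Standard deviation of semitone differences after discarding jumps.

    Pass max_jump_st=math.inf to keep every difference, as the slide
    does for CoG.
    """
    diffs = [12.0 * math.log2(t / r) for t, r in zip(test_hz, ref_hz)]
    kept = [d for d in diffs if abs(d) <= max_jump_st]
    # Population standard deviation: RMS of the differences around their mean.
    return statistics.pstdev(kept)


# Hypothetical values in Hz: +1 semitone, -1 semitone, and a two-octave jump.
ref = [100.0, 100.0, 100.0]
test = [100.0 * 2 ** (1 / 12), 100.0 * 2 ** (-1 / 12), 400.0]
print(f"RMS error: {rms_error_semitones(test, ref):.2f} semitones")
```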

[Figure: RMS Errors in Pitch, Formant & CoG. Vowels, N = 2322. Y-axis: RMS error in semitones (0.0 to 2.0); x-axis: F0, F1, F2, F3, CoG. Series: Microphone change, Sony Minidisc, Ogg Vorbis (40 kbs), Ogg Vorbis (80 kbs), MP3 (192 kbs). One off-scale bar is labeled 4.1.]

[Figure: RMS Errors in F0 (All Sonorants). Y-axis: F0 RMS error in semitones (0.0 to 2.0); x-axis: Manner of Articulation (Vowels, Vowel-like, Nasals, Total). N values as listed on the slide: 2322, 785, 786, 3549. Series: Microphone change, Sony Minidisc, Ogg Vorbis (40 kbs), Ogg Vorbis (80 kbs), MP3 (192 kbs).]

[Figure: RMS Errors in CoG (all continuants). Y-axis: CoG RMS error in semitones (0.0 to 2.0); x-axis: Manner of Articulation, with Vowels (N = 2415), Vowel-like (N = 853), Nasals (N = 795), Fricatives (N = 863), Total (N = 4926). Series: Microphone change, Sony Minidisc, Ogg Vorbis (40 kbs), Ogg Vorbis (80 kbs), MP3 (192 kbs). Off-scale bar labels as listed on the slide: 4.1, 5.4, 3.2, 7.6, 2.5, 5.3.]

Cascaded Compression

Field situation:
- Record on Minidisc
- Transmit/Store/Distribute with 80 kbs Compression
- Archive with 192 kbs Compression

Simulated with:
CD-audio (Original) -> Sony Minidisc -> Ogg Vorbis 80 kbs -> MP3 192 kbs

Cascaded Compression: Sony MD > Ogg Vorbis (80kbs) > MP3 (192kbs)

- Pitch and Formants: Weakest Link Determines RMS Error (Sony Minidisc)
- CoG: Total Error = Sum of Component RMS Errors

[Figure: RMS error in semitones (0.0 to 2.0) for Sony MD alone versus the compression cascade. X-axis: F0, F1, F2, F3, CoG, F0, CoG, F0, CoG, CoG, grouped by manner of articulation (Vowels, Vowel-like, Nasals, Fricatives). N values as listed on the slide: 863, 814, 2348, 786.]
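
The two combination rules reported for the cascade can be written down as a rough sketch. These are the empirical observations from the slide and not a general law. The function names and the component values below are hypothetical and are not measurements from the study.

```python
def cascade_error_pitch_formant(component_rms):
    """Weakest link: the largest single-stage RMS error dominates."""
    return max(component_rms)


def cascade_error_cog(component_rms):
    """CoG: the stage errors add up across the cascade."""
    return sum(component_rms)


# Hypothetical per-stage RMS errors in semitones
# (Minidisc, Ogg Vorbis 80 kbs, MP3 192 kbs).
stages = [0.5, 0.25, 0.25]
print(f"pitch/formant estimate: {cascade_error_pitch_formant(stages)} semitones")
print(f"CoG estimate: {cascade_error_cog(stages)} semitones")
```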

Discussion and Conclusions

Decompressed Speech can be used for Pitch, Formant, and Whole Spectrum (CoG) Analysis:
- RMS error < 1 semitone (< 6%)
  - Vowels < 0.7 semitone
  - Nasals < 0.3 semitone
- Holds for Low bit-rates (40 kbs) for Pitch and Formants

Repeated Compression:
- Combined Error
  - Pitch & Formants: Weakest Link
  - CoG: Sum of Component RMS Errors
- Solution: (Partial) Translation of Formats, i.e., No Decompression

CoG Strongly Affected by:
- Low bit-rates (40 kbs)
- Repeated Compression
- Microphone Choice
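
The "< 6%" figure quoted for 1 semitone follows from the definition of a semitone as a frequency ratio of 2^(1/12). A small sketch of the conversion, with an illustrative function name:

```python
def semitones_to_percent(semitones: float) -> float:
    """Relative frequency change in percent for a difference in semitones."""
    return (2.0 ** (semitones / 12.0) - 1.0) * 100.0


# Error bounds quoted in the conclusions, in semitones.
for st in (1.0, 0.7, 0.3):
    print(f"{st} semitone(s) = {semitones_to_percent(st):.2f}%")
```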
