7-Speech Quality Assessment Quality Levels Subjective Tests - PowerPoint PPT Presentation

7-Speech Quality Assessment
– Quality Levels
– Subjective Tests
– Objective Tests
– Intelligibility
– Naturalness

Quality Levels
– Synthetic Quality (below 4.8 kbps)
– Communication Quality (4.8 to 13 kbps)
– Toll Quality (13 to 64 kbps)
– Broadcast Quality (above 64 kbps)

Test Types
              Intelligibility              Naturalness
Subjective    DRT, MRT                     MOS, DAM
Objective     None (future: ASR systems)   AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM

First Class: Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
– The listener selects between two CVC (consonant-vowel-consonant) words that differ in the first consonant
– The first consonants should have specific properties
– Ex. hop - fop and than - dan
Modified Rhyme Test (MRT)
– The listener selects among several CVC words that differ in the first consonant
– Ex. cat, bat, rat, mat, fat, sat

First Class (Cont’d): Subjective Intelligibility Tests
– DRT is widely applicable and credible
– In this test the listener hears the speech only once

$$\mathrm{DRT}\,\% = 100 \times \frac{N_{\mathrm{Correct}} - N_{\mathrm{Incorrect}}}{N_{\mathrm{Tests}}}$$
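As an illustration of the DRT score (100 times the difference between correct and incorrect responses, divided by the number of tests), here is a minimal Python sketch. The function name `drt_score` and the sample counts are hypothetical, chosen only for this example.

```python
def drt_score(n_correct: int, n_incorrect: int, n_tests: int) -> float:
    """Return the DRT score in percent: 100 * (correct - incorrect) / tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if n_correct < 0 or n_incorrect < 0 or n_correct + n_incorrect > n_tests:
        raise ValueError("inconsistent response counts")
    return 100.0 * (n_correct - n_incorrect) / n_tests


# Hypothetical session: 100 word pairs, 90 answered correctly, 10 incorrectly.
print(drt_score(90, 10, 100))  # 80.0
```

Because incorrect answers are subtracted, a listener who guesses at random between the two words scores about 0 rather than 50.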

Second Class: Subjective Naturalness Tests
Mean Opinion Score (MOS)
– MOS is widely applicable and credible
– In this test the listener can hear the speech many times
Diagnostic Acceptability Measure (DAM)
– This test is very complex

Mean Opinion Score (MOS)
MOS scores are defined as follows:
Score   Speech Quality
1       Not Acceptable
2       Weak
3       Medium
4       Good
5       Excellent
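Since MOS is the mean of the individual listener ratings on this 1 to 5 scale, the computation can be sketched in a few lines of Python. The names `MOS_LABELS` and `mean_opinion_score` and the sample ratings are hypothetical, used only for illustration.

```python
# Score labels taken from the MOS scale on the slide.
MOS_LABELS = {
    1: "Not Acceptable",
    2: "Weak",
    3: "Medium",
    4: "Good",
    5: "Excellent",
}


def mean_opinion_score(ratings):
    """Average a list of integer listener ratings, each between 1 and 5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in MOS_LABELS:
            raise ValueError(f"invalid rating: {rating!r}")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five listeners for one speech sample.
ratings = [4, 5, 3, 4, 4]
mos = mean_opinion_score(ratings)
print(mos, MOS_LABELS[round(mos)])  # 4.0 Good
```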

Diagnostic Acceptability Measure (DAM)
– This test is very complex
– It scores 19 different parameters
– These parameters are divided into 3 main groups: Signal Quality, Background Quality, Total Quality

Objective Tests
– These tests cannot be used for intelligibility, because a system cannot recognize speech intelligibility
– Objective tests can only be used for speech naturalness

Objective Tests (Cont’d)
Articulation Index (AI)
Signal to Noise Ratio (SNR)
– Global (Classic) SNR
– Segmental SNR
– Frequency Weighted Segmental SNR

Articulation Index (AI)
– AI assumes that the distortions in different frequency bands are independent, and measures signal quality separately in each band
– In each band it determines the percentage of the signal that is perceptible to the listener
– 20 bands spanning 200 Hz to 6100 Hz

Articulation Index (Cont’d)
The signal perceptible to the listener is:
– 1- Above the human hearing threshold
– 2- Below the human pain threshold
– 3- Above the masking noise level
– In each case, one of conditions 1 or 3 prevails

Articulation Index (Cont’d)
In AI, the SNR is measured separately in each band:

$$AI = \frac{1}{20}\sum_{j=1}^{20}\frac{\min(SNR_j,\,30)}{30}$$
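The AI formula (the per-band SNR capped at 30 dB, divided by 30, and averaged over the bands) can be sketched as follows. This is a minimal illustration, not a standards-compliant implementation: the function name `articulation_index` is hypothetical, the per-band SNRs are assumed to be already measured in dB, and only the upper cap from the slide is applied (negative SNRs are not clipped).

```python
def articulation_index(band_snrs_db, cap_db=30.0):
    """Average of min(SNR_j, cap) / cap over all bands (SNRs in dB)."""
    if not band_snrs_db:
        raise ValueError("at least one band SNR is required")
    contributions = [min(snr, cap_db) / cap_db for snr in band_snrs_db]
    return sum(contributions) / len(contributions)


# Hypothetical measurement: 10 bands at 45 dB (capped to 30) and 10 at 15 dB.
band_snrs = [45.0] * 10 + [15.0] * 10
print(articulation_index(band_snrs))  # 0.75
```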

Signal to Noise Ratio (SNR)

$$\varepsilon(n) = s(n) - \hat{s}(n)$$

$$E_\varepsilon = \sum_n \varepsilon^2(n) = \sum_n [s(n) - \hat{s}(n)]^2$$

$$E_s = \sum_n s^2(n)$$

$$SNR_{(global)} = 10\log_{10}\frac{E_s}{E_\varepsilon} = 10\log_{10}\frac{\sum_n s^2(n)}{\sum_n [s(n) - \hat{s}(n)]^2}$$
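A minimal Python sketch of the global SNR, assuming the reference signal s(n) and the processed signal ŝ(n) are given as equal-length, time-aligned sequences of samples. The function name `global_snr_db` and the sample values are hypothetical.

```python
import math


def global_snr_db(s, s_hat):
    """Global SNR in dB: 10*log10(sum(s^2) / sum((s - s_hat)^2))."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in s)
    error_energy = sum((x - y) ** 2 for x, y in zip(s, s_hat))
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # identical signals: no error at all
    return 10.0 * math.log10(signal_energy / error_energy)


# Hypothetical two-sample signals: signal energy 200, error energy 2.
print(global_snr_db([10.0, 10.0], [9.0, 9.0]))
```

In this example the energy ratio is 100, so the result is 20 dB up to floating-point rounding.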

Segmental SNR

$$SNR_{(seg)} = \frac{1}{N}\sum_{j=1}^{N} 10\log_{10}\left[\frac{\sum_{n=m_j-M+1}^{m_j} s^2(n)}{\sum_{n=m_j-M+1}^{m_j} [s(n) - \hat{s}(n)]^2}\right]$$

– N: number of frames
– M: frame length
– m_j: index of the last sample of the j’th frame
– Each term of the outer sum is the SNR of the j’th frame
– Usually averaged over “good frames”: frames with an SNR higher than -10 dB, saturated at +30 dB
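The segmental SNR with the "good frames" rule can be sketched as below. This is one possible reading, under stated assumptions: frames are non-overlapping blocks of `frame_len` samples, frames at or below -10 dB are discarded, and the remaining frame SNRs are capped at +30 dB before averaging. The function name `segmental_snr_db` and its parameters are hypothetical.

```python
import math


def segmental_snr_db(s, s_hat, frame_len, floor_db=-10.0, ceil_db=30.0):
    """Mean of per-frame SNRs (dB) over frames above floor_db, capped at ceil_db."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    # Assumption: non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(s) - frame_len + 1, frame_len):
        ref = s[start:start + frame_len]
        est = s_hat[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        if signal_energy == 0.0:
            continue  # silent frame: SNR is -infinity, below the floor
        if error_energy == 0.0:
            snr = math.inf  # saturates at ceil_db below
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        if snr > floor_db:
            frame_snrs.append(min(snr, ceil_db))
    if not frame_snrs:
        raise ValueError("no good frames found")
    return sum(frame_snrs) / len(frame_snrs)


# Hypothetical signals with three 2-sample frames:
# frame 1 is 20 dB, frame 2 is error-free (capped at 30 dB),
# frame 3 is -20 dB and is discarded as a bad frame.
s = [10.0, 10.0, 10.0, 10.0, 1.0, 1.0]
s_hat = [9.0, 9.0, 10.0, 10.0, 11.0, 11.0]
print(segmental_snr_db(s, s_hat, frame_len=2))
```

For these values the two good frames average to 25 dB, whereas the global SNR would be dominated by the single bad frame.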

Frequency Weighted Segmental SNR
Siemens Formula:

$$SNR_{FWS} = \frac{1}{N}\sum_{k=1}^{N}\frac{1}{W_k}\sum_{j=1}^{F} w_{j,k}\,10\log_{10}\frac{\sum s(n)^2}{\sum [s(n) - \hat{s}(n)]^2}$$

$$W_k = \sum_{j=1}^{F} w_{j,k}$$

– F: number of frequency bands
– N: number of frames

Frequency Weighted Segmental SNR
Deller Formula:

$$SNR_{(fw\text{-}seg)} = \frac{1}{M}\sum_{j=0}^{M-1} 10\log_{10}\left[\frac{\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\left[E_{s,k}(m_j)/E_{\varepsilon,k}(m_j)\right]}{\sum_{k=1}^{K} w_{j,k}}\right]$$

Frequency Weighted Segmental SNR
Other Formulas:

$$SNR_{(fw\text{-}seg)} = \frac{1}{M}\sum_{j=0}^{M-1}\frac{1}{\sum_{k=1}^{K} w_{j,k}}\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\frac{E_{s,k}(m_j)}{E_{\varepsilon,k}(m_j)}$$

$$SNR_{(fw\text{-}seg)} = \frac{1}{M}\sum_{j=0}^{M-1}\left[\frac{\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\left[E_{s,k}(m_j)/E_{\varepsilon,k}(m_j)\right]}{\sum_{k=1}^{K} w_{j,k}}\right]$$

The Final Formula
The right formula for fw-seg SNR is thus:

$$SNR_{(fw\text{-}seg)} = \frac{1}{M}\sum_{j=0}^{M-1}\left[\frac{\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\left[E_{s,k}(m_j)/E_{\varepsilon,k}(m_j)\right]}{\sum_{k=1}^{K} w_{j,k}}\right]$$

Where
– M is the number of frames
– j is the frame index
– k is the frequency band index
– K is the number of frequency bands
– $w_{j,k}$ is the weight of the kth band of the jth frame
– $E_{s,k}$ and $E_{\varepsilon,k}$ are the energies of the kth band of the signal and the noise, respectively
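A minimal Python sketch of the final fw-seg SNR formula, assuming the per-frame, per-band signal energies, noise energies, and weights have already been computed (for example by a filter bank, which is outside this sketch). The function name `fw_seg_snr_db` and the nested-list layout `[frame][band]` are hypothetical choices for illustration.

```python
import math


def fw_seg_snr_db(signal_energy, noise_energy, weights):
    """Frequency-weighted segmental SNR in dB.

    Each argument is a list of frames; each frame is a list of K band values.
    For every frame, the band SNRs (dB) are averaged with the frame's weights;
    the result is the mean of these weighted averages over all frames.
    """
    if not (len(signal_energy) == len(noise_energy) == len(weights)):
        raise ValueError("all inputs must have the same number of frames")
    if not signal_energy:
        raise ValueError("at least one frame is required")
    total = 0.0
    for e_s, e_n, w in zip(signal_energy, noise_energy, weights):
        if not (len(e_s) == len(e_n) == len(w)):
            raise ValueError("all inputs must have the same number of bands")
        weighted_sum = sum(
            w_k * 10.0 * math.log10(es_k / en_k)
            for es_k, en_k, w_k in zip(e_s, e_n, w)
        )
        total += weighted_sum / sum(w)
    return total / len(signal_energy)


# Hypothetical energies for 2 frames and 2 bands.
E_s = [[100.0, 10.0], [1000.0, 1.0]]
E_n = [[1.0, 1.0], [1.0, 1.0]]
w = [[1.0, 1.0], [3.0, 1.0]]
print(fw_seg_snr_db(E_s, E_n, w))
```

In this example frame 0 gives (20 + 10) / 2 = 15 dB and frame 1 gives (3 × 30 + 0) / 4 = 22.5 dB, so the result is 18.75 dB up to floating-point rounding.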

Itakura Measure
– $H(\omega)$ is the envelope spectrum of $S(\omega)$

$$S(\omega) = F\{R(\tau)\}, \qquad S(\omega) = |X(\omega)|^2$$

– Uses the all-pole (AR) model

Itakura Measure (Cont’d)

$$H(\omega) = \frac{1}{1 - \sum_{i=1}^{p} a_i e^{-j\omega i}}$$

– This measure is based on the spectral difference between the original signal and the signal under assessment
– $a_i$: autoregressive coefficients
– $K_i$: reflection coefficients
– $R_i$: autocorrelation coefficients

Itakura Measure (Cont’d)

$$d(g_s(m), g_{\hat{s}}(m)) = \frac{1}{M}\sum_{l=1}^{M}\left[g_s(l,m) - g_{\hat{s}}(l,m)\right]^2$$

– m: index of the frame
– l: index of the coefficients

Itakura Measure (Cont’d)

$$\tilde{d}_{lp}(\theta_s(m), \theta_{\hat{s}}(m')) = \frac{\sum_{l=1}^{M} W_{l,m,m'}\left[\theta_s(l,m) - \theta_{\hat{s}}(l,m')\right]}{\sum_{l=1}^{M} W_{l,m,m'}}$$

– $\theta_s(l,m)$ is the l’th parameter of the frame corresponding to the m’th sample

Weighted Spectral Slope Measure (WSSM)

$$\Delta|s(k,m)| = |s(k+1,m)| - |s(k,m)|$$

$$\Delta|\hat{s}(k,m)| = |\hat{s}(k+1,m)| - |\hat{s}(k,m)|$$

– $|s(k+1,m)|$ and $|s(k,m)|$ are in dB
– $s(k,m)$ is the STFT of the k’th band of the frame corresponding to the m’th sample

$$d_{WSSM}(|s(m,\omega)|, |\hat{s}(m,\omega)|) = K + \sum_{k=1}^{36} W_{k,m}\left[\Delta|s(k,m)| - \Delta|\hat{s}(k,m)|\right]^2$$
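The WSSM distance for a single frame can be sketched as follows, assuming the band magnitudes of the reference and processed signals are already available in dB. The names `spectral_slopes` and `wssm_distance` are hypothetical. The slide does not define the additive term K, so it is exposed here as a parameter `k_term` that defaults to 0. The number of bands is left general rather than fixed at 36.

```python
def spectral_slopes(band_db):
    """Slope between adjacent bands: |s(k+1)| - |s(k)|, magnitudes in dB."""
    return [band_db[k + 1] - band_db[k] for k in range(len(band_db) - 1)]


def wssm_distance(ref_db, test_db, weights, k_term=0.0):
    """k_term plus the weighted sum of squared slope differences for one frame.

    k_term stands in for the undefined constant K (an assumption of this sketch).
    """
    ref_slopes = spectral_slopes(ref_db)
    test_slopes = spectral_slopes(test_db)
    if not (len(ref_slopes) == len(test_slopes) == len(weights)):
        raise ValueError("need one weight per slope and equal band counts")
    return k_term + sum(
        w * (r - t) ** 2 for w, r, t in zip(weights, ref_slopes, test_slopes)
    )


# Hypothetical 3-band frame: slopes are [2, 2] vs. [1, 3], weights [1, 2].
print(wssm_distance([0.0, 2.0, 4.0], [0.0, 1.0, 4.0], [1.0, 2.0]))  # 3.0
```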

PESQ: Perceptual Evaluation of Speech Quality

PESQ
The principal result of PESQ is the MOS, which directly expresses the voice quality. The PESQ MOS as defined by ITU recommendation P.862 ranges from 1.0 (worst) to 4.5 (best). This may be surprising at first glance, since the ITU scale ranges up to 5.0, but the explanation is simple: PESQ simulates a listening test and is optimized to reproduce the average result over all listeners (remember, MOS stands for Mean Opinion Score). Statistics show, however, that the best average result one can generally expect from a listening test is not 5.0 but about 4.5. It appears that subjects are always cautious about giving a score of 5, meaning "excellent", even if there is no degradation at all.
