  1. MBE Vocoder (Page 0 of 34)

  2. Outline
     - Introduction to vocoders
     - MBE vocoder
       - MBE parameters
       - Parameter estimation
       - Analysis and synthesis algorithms
     - AMBE
     - IMBE

  3. Vocoders - analyzer
     1. Speech is first segmented by applying a window (e.g. a Hamming window).
     2. Excitation and system parameters are calculated for each segment:
        - Excitation parameters: voiced/unvoiced decision, pitch period
        - System parameters: spectral envelope / system impulse response
     3. These parameters are then transmitted.
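The segmentation step above can be sketched as a minimal framing routine. This is an illustrative sketch: the frame length, hop size, sampling rate, and test tone below are arbitrary choices, not values from the slides.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a signal into overlapping segments and apply a Hamming window."""
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        frames[i] = x[i * hop : i * hop + frame_len] * win
    return frames

fs = 8000                            # hypothetical sampling rate
t = np.arange(fs) / fs               # one second of signal
x = np.sin(2 * np.pi * 200 * t)      # hypothetical 200 Hz tone
frames = frame_signal(x)             # each row is one windowed segment
```

Each row of `frames` is one windowed segment from which the excitation and system parameters would then be estimated.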

  4. Vocoders - synthesizer
     [Block diagram: an excitation signal (pulse train for voiced frames, white noise for unvoiced frames) drives the system described by the transmitted parameters to produce the synthesized voice.]

  5. Vocoders
     - Vocoders usually have poor quality, due to:
       - fundamental limitations in the speech models
       - inaccurate parameter estimation
       - the inability of a pulse train / white noise source to produce all voice qualities: speech synthesized entirely with a periodic source exhibits a "buzzy" quality, and speech synthesized entirely with a noise source exhibits a "hoarse" quality.
     - A potential solution to the buzziness of vocoders is to use mixed excitation models.
     - In these vocoders, periodic and noise-like excitations are mixed with a calculated ratio, and this ratio is sent along with the parameters.

  6. Multi-Band Excitation speech model
     - Because speech is only short-time stationary, a window w(n) is usually applied to the signal: s_w(n) = w(n)·s(n).
     - The Fourier transform of a windowed segment can be modeled as the product of a spectral envelope and an excitation spectrum: Ŝ_w(ω) = H_w(ω)·|E_w(ω)|.
     - In most models, H_w(ω) is a smoothed version of the original speech spectrum S_w(ω).
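A toy numeric illustration of the product model Ŝ_w(ω) = H_w(ω)·|E_w(ω)|. The envelope shape, harmonic spacing, and unit-peak excitation below are arbitrary stand-ins (a real E_w(ω) would be replicas of the window transform centered at the harmonics):

```python
import numpy as np

N = 512                                   # spectral samples on [0, pi)
H = np.exp(-np.linspace(0, 2, N))         # hypothetical smooth spectral envelope H_w(w)

# Idealized periodic excitation |E_w(w)|: unit peaks at harmonic bins of a
# hypothetical fundamental (every 25th bin stands in for a harmonic).
E = np.zeros(N)
harmonic_bins = np.arange(25, N, 25)
E[harmonic_bins] = 1.0

S_hat = H * np.abs(E)                     # the modeled spectrum: envelope times excitation
```

The modeled spectrum is nonzero only at the harmonic bins, where it takes the envelope's value, which is the essence of the voiced part of the model.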

  7. MBE model (cont'd)
     - The spectral envelope must be represented accurately enough to prevent degradations in the spectral envelope from dominating.
       - Quality improvements are achieved by the addition of a frequency-dependent voiced/unvoiced mixture function.
     - In previous simple models, the excitation spectrum is totally specified by the fundamental frequency ω0 and a single voiced/unvoiced decision for the entire spectrum.
     - In the MBE model, the excitation spectrum is specified by the fundamental frequency ω0 and a frequency-dependent voiced/unvoiced mixture function.

  8. Multi-banding
     - In general, a continuously varying frequency-dependent voiced/unvoiced mixture function would require a large number of parameters to represent it accurately. Adding that many parameters would severely decrease the utility of the model in applications such as bit-rate reduction.
     - To further reduce the number of these binary parameters, the spectrum is divided into multiple frequency bands, and a binary voiced/unvoiced parameter is allocated to each band.
     - The MBE model differs from previous models in that the spectrum is divided into a large number of frequency bands (typically 20 or more), whereas previous models used at most three frequency bands.
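The band allocation can be sketched as a simple harmonic-to-band mapping. Three harmonics per band is an illustrative grouping, not a value from the slides, which only say "typically 20 or more" bands:

```python
def band_of_harmonic(m, harmonics_per_band=3):
    """Map harmonic index m (1-based) to its V/UV band index (0-based).

    Grouping harmonics into bands and giving each band one binary V/UV
    flag is the MBE idea; the group size here is an arbitrary choice."""
    return (m - 1) // harmonics_per_band

# 60 harmonics grouped three at a time yields 20 bands, matching the
# "typically 20 or more" figure on the slide.
bands = [band_of_harmonic(m) for m in range(1, 61)]
```

Every harmonic in a band shares that band's single voiced/unvoiced bit, which is what keeps the parameter count low.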

  9. Multi-banding
     [Figure: the original spectrum and its spectral envelope are combined with per-band V/UV information; the excitation spectrum mixes a periodic spectrum (voiced bands) and a noise spectrum (unvoiced bands) to form the synthetic spectrum.]

  10. MBE parameters
     The parameters used in the MBE model are:
     1. the spectral envelope
     2. the fundamental frequency
     3. the V/UV information for each harmonic
     4. the phase of each harmonic declared voiced.
     The phases of harmonics in frequency bands declared unvoiced are not included, since they are not required by the synthesis algorithm.

  11. Parameter estimation
     - In many approaches (e.g. LPC-based algorithms), the algorithms for estimating the excitation parameters and the spectral envelope parameters operate independently.
     - These parameters are usually estimated using heuristic criteria, without explicit consideration of how close the synthesized speech will be to the original speech.
       - This can result in a synthetic spectrum quite different from the original spectrum.
     - In MBE, the excitation and spectral envelope parameters are estimated simultaneously, so that the synthesized spectrum is closest in the least-squares sense to the spectrum of the original speech ("analysis by synthesis").

  12. Parameter estimation (cont'd)
     The estimation process is divided into two major steps:
     1. First, the pitch period and spectral envelope parameters are estimated to minimize the error between the original spectrum and the synthetic spectrum.
     2. Then, the V/UV decisions are made based on the closeness of fit between the original and the synthetic spectrum at each harmonic of the estimated fundamental.

  13. Parameter estimation (cont'd)
     - The parameters are estimated by minimizing the error criterion
         ε = (1/2π) ∫_{−π}^{π} |S_w(ω) − Ŝ_w(ω)|² dω
       where Ŝ_w(ω) = H_w(ω)·|E_w(ω)|.
     - The error over a frequency interval [a_m, b_m] is
         ε_m = (1/2π) ∫_{a_m}^{b_m} |S_w(ω) − A_m·E_w(ω)|² dω
       which is minimized at
         A_m = ∫_{a_m}^{b_m} S_w(ω)·E_w*(ω) dω / ∫_{a_m}^{b_m} |E_w(ω)|² dω.
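The closed-form minimizer for A_m has a direct discrete counterpart: replace the integrals over [a_m, b_m] with sums over the corresponding FFT bins. A sketch, with the band edges, window, and test amplitude chosen arbitrarily for a sanity check:

```python
import numpy as np

def envelope_sample(S, E, a, b):
    """Discrete least-squares envelope amplitude over bins [a, b):
       A_m = sum(S * conj(E)) / sum(|E|^2),
    the discrete form of the closed-form minimizer of the band error."""
    Eb = E[a:b]
    return np.sum(S[a:b] * np.conj(Eb)) / np.sum(np.abs(Eb) ** 2)

# Sanity check: if S is exactly A_true * E on the band, we recover A_true.
E = np.fft.fft(np.hamming(64), 256)       # a window transform as the excitation
A_true = 2.5 + 0.5j
S = A_true * E
A_est = envelope_sample(S, E, 10, 50)
```

Because the estimator is a projection of S onto E over the band, it reproduces the true complex amplitude exactly whenever the model holds on that band.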

  14. Pitch estimation and spectral envelope
     - An efficient method for obtaining a good approximation to the periodic transform P(ω) in this interval is to precompute samples of the Fourier transform of the window w(n) and center them around the harmonic frequency associated with the interval.
     - For unvoiced frequency intervals, the envelope parameters are estimated by substituting idealized white noise (unity across the band) for |E_w(ω)| in the previous formulas, which reduces to averaging the original spectrum over each frequency interval.
     - For unvoiced regions, only the magnitude of A_m is estimated, since the phase of A_m is not required for speech synthesis.

  15. More about pitch estimation
     - Experimentally, the error ε tends to vary slowly with the pitch period P.
     - The initial estimate is therefore obtained by evaluating the error at integer pitch periods.
     - Since integer multiples of the correct pitch period have spectra with harmonics at the correct frequencies, the error ε will be comparable for the correct pitch period and its integer multiples.
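The coarse search over integer pitch periods can be sketched as below. Note this uses a simple time-domain difference error as a stand-in, not the frequency-domain criterion of the previous slides; the test signal and search range are illustrative:

```python
import numpy as np

def pitch_error(x, P):
    """Normalized squared difference between the signal and a copy delayed
    by candidate integer pitch period P (a crude stand-in for the MBE
    spectral error; small when P matches the true period)."""
    return np.sum((x[:-P] - x[P:]) ** 2) / np.sum(x ** 2)

fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * (fs / 40) * t)     # hypothetical signal, pitch period 40
errors = {P: pitch_error(x, P) for P in range(20, 121)}
# As the slide notes, integer multiples of the true period (P = 80, 120)
# also give near-zero error, so the tracker must guard against doubling.
```

The near-tie between P = 40 and its multiples is exactly why a dynamic-programming pitch tracker and a refinement step appear later in the analysis flowchart.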

  16. More about pitch estimation (cont'd)
     [Figure: error vs. pitch period for a speech segment, with the original spectrum overlaid on the synthetic spectrum for P = 42.48 and for P = 42.]

  17. V/UV decision
     - The voiced/unvoiced decision for each harmonic is made by comparing the normalized error over each harmonic of the estimated fundamental,
         ε̂_m = ε_m / [ (1/2π) ∫_{a_m}^{b_m} |S_w(ω)|² dω ],
       to a threshold.
     - When the normalized error over the m-th harmonic is below the threshold, that band is marked voiced; otherwise it is marked unvoiced.
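The per-band decision rule can be sketched directly: normalize each band's synthesis error by the band's energy and threshold it. The threshold value and toy spectra below are arbitrary illustrations:

```python
import numpy as np

def vuv_decisions(S, S_hat, band_edges, threshold=0.1):
    """Per-band V/UV decision: a band is voiced when the normalized error
    (band synthesis error divided by band energy) is below the threshold.
    The threshold value here is an arbitrary illustrative choice."""
    out = []
    for a, b in band_edges:
        err = np.sum(np.abs(S[a:b] - S_hat[a:b]) ** 2)
        energy = np.sum(np.abs(S[a:b]) ** 2)
        out.append(err / energy < threshold)
    return out

# Toy spectra: good fit in the lower band, poor fit in the upper band.
S = np.ones(100)
S_hat = S.copy()
S_hat[50:] += 1.0
dec = vuv_decisions(S, S_hat, [(0, 50), (50, 100)])
```

A band where the harmonic model fits well (low normalized error) comes out voiced; a band where it fits poorly comes out unvoiced.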

  18. Analysis algorithm flowchart
     [Flowchart: Start → window the speech segment → compute error vs. pitch period (autocorrelation approach) → select an initial pitch period (dynamic-programming pitch tracker) → refine the initial pitch period (frequency-domain approach) → make a V/UV decision for each frequency band → select V/UV spectral envelope parameters for each frequency band → Stop.]

  19. Speech synthesis
     - The voiced signal can be synthesized as the sum of sinusoidal oscillators with frequencies at the harmonics of the fundamental and amplitudes set by the spectral envelope parameters (the time-domain method).
     - The unvoiced signal can be synthesized as the sum of bandpass-filtered white noise.
     - The frequency-domain method was selected for synthesizing the unvoiced portion of the synthetic speech.

  20. Synthesis algorithm block diagram
     [Block diagram: the V/UV decisions separate the spectral envelope samples into voiced and unvoiced envelope samples. The voiced envelope samples drive a bank of harmonic oscillators to produce the voiced speech. The unvoiced envelope samples are linearly interpolated and used to replace the weighted STFT envelope of a white-noise sequence; overlap-add then produces the unvoiced speech.]

  21. MBE synthesis algorithm
     - First, the spectral envelope samples are separated into voiced and unvoiced spectral envelope samples, depending on whether they lie in frequency bands declared voiced or unvoiced.
     - Voiced envelope samples include both magnitude and phase, whereas unvoiced envelope samples include only the magnitude.
     - Voiced speech is synthesized from the voiced envelope samples by summing the outputs of a bank of sinusoidal oscillators running at the harmonics of the fundamental frequency:
         ŝ_v(t) = Σ_m A_m(t)·cos(θ_m(t)).
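The oscillator-bank sum can be sketched as follows. As simplifying assumptions, the amplitudes and fundamental are held constant over the frame and the initial phases θ_m(0) are taken as zero; the harmonic amplitudes and pitch period are illustrative values:

```python
import numpy as np

def synth_voiced(A, w0, n_samples):
    """Sum-of-oscillators voiced synthesis,
       s_v(n) = sum_m A_m * cos(m * w0 * n),
    with amplitudes and frequencies held constant over the frame and
    initial phases taken as zero for simplicity."""
    n = np.arange(n_samples)
    s = np.zeros(n_samples)
    for m, Am in enumerate(A, start=1):
        s += Am * np.cos(m * w0 * n)
    return s

# Three harmonics of a 40-sample pitch period (illustrative values).
s = synth_voiced([1.0, 0.5, 0.25], w0=2 * np.pi / 40, n_samples=160)
```

The result repeats every 40 samples, i.e. the synthesized voiced frame is periodic at the fundamental, as the model intends.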

  22. MBE synthesis algorithm (voiced)
     - The phase function θ_m(t) is determined by an initial phase θ_m(0) and a frequency track ω_m(t):
         θ_m(t) = θ_m(0) + ∫_0^t ω_m(τ) dτ.
     - The frequency track ω_m(t) is linearly interpolated between the m-th harmonic of the current frame and that of the next frame:
         ω_m(t) = m·ω_0(0)·(S − t)/S + m·ω_0(S)·(t/S)
       where S is the frame length.
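A discrete version of the phase and frequency-track equations, with a cumulative sum standing in for the integral. The harmonic index, fundamental values, and frame length below are illustrative:

```python
import numpy as np

def harmonic_phase_track(m, w0_start, w0_end, S, theta0=0.0):
    """Discrete phase track for the m-th harmonic over a frame of S samples:
       w_m(n)     = m * (w0_start * (S - n)/S + w0_end * n/S)  (linear interp.)
       theta_m(n) = theta0 + cumulative sum of w_m             (discrete integral)"""
    n = np.arange(S)
    w_m = m * (w0_start * (S - n) / S + w0_end * n / S)
    return theta0 + np.cumsum(w_m)

# Second harmonic, fundamental gliding from 0.1 to 0.12 rad/sample (illustrative).
theta = harmonic_phase_track(m=2, w0_start=0.1, w0_end=0.12, S=160)
```

Interpolating the frequency rather than the phase keeps θ_m(t) continuous and monotonically increasing across the frame, which avoids clicks at frame boundaries.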
