linear prediction



  1. Linear Prediction 1

  2. Outline: Windowing; LPC; Introduction to Vocoders; Excitation modeling; Pitch Detection

  3. Short-Time Processing: The speech signal is inherently non-stationary. For continuant phonemes there are stationary periods of at least 20-25 ms. The short-time speech frames are assumed to be stationary. The frame length should be chosen to include just one phoneme or allophone. Frame lengths are usually chosen to be between 10 and 50 ms. We consider rectangular and Hamming windows here.
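As an illustration of the short-time processing described on this slide, here is a minimal sketch that cuts a signal into overlapping frames and applies a rectangular or Hamming window. The 8 kHz sampling rate, 25 ms frame length, 10 ms hop, and the helper names (`hamming`, `rectangular`, `frames`) are assumptions chosen for the example, not values taken from the slides.

```python
import math

def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def rectangular(N):
    """Rectangular window: all samples weighted equally."""
    return [1.0] * N

def frames(signal, frame_len, hop, window):
    """Split `signal` into overlapping frames and multiply each by the window."""
    w = window(frame_len)
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = signal[start:start + frame_len]
        out.append([s * wn for s, wn in zip(seg, w)])
    return out

fs = 8000                       # assumed sampling rate in Hz
frame_len = fs * 25 // 1000     # 25 ms frame -> 200 samples
hop = fs * 10 // 1000           # 10 ms hop   -> 80 samples

# 100 ms of a 200 Hz sine as a stand-in for a stationary voiced segment
signal = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 10)]

ham_frames = frames(signal, frame_len, hop, hamming)
rect_frames = frames(signal, frame_len, hop, rectangular)
```

The Hamming window tapers the frame edges to about 0.08, whereas the rectangular window leaves the samples unchanged.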

  4. Rectangular Window

  5. Hamming Window

  6. Comparison of Windows

  7. Comparison of Windows (cont'd)

  8. Linear Prediction Coding (LPC): Based on an all-pole model of the speech production system:
     $$H(z) = \frac{A}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$
     In the time domain, we get:
     $$s[n] = \sum_{k=1}^{p} a_k\, s[n-k] + A\, u_g[n]$$
     In other words, we can predict s[n] as a function of the p previous signal samples (and the excitation). The set {a_k} is one way of representing the time-varying filter. There are many other ways to represent this filter (e.g., pole values, lattice filter values, LSP, ...).
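The time-domain recursion on this slide can be sketched directly in code. The function name `all_pole_synthesis`, the coefficient values, and the impulse-train excitation below are hypothetical choices for illustration only.

```python
def all_pole_synthesis(a, excitation, gain=1.0):
    """Run s[n] = sum_{k=1..p} a[k-1] * s[n-k] + gain * u[n] over an excitation."""
    p = len(a)
    s = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for k in range(1, p + 1):
            if n - k >= 0:          # samples before n = 0 are taken as zero
                acc += a[k - 1] * s[n - k]
        s.append(acc)
    return s

# One-pole example: the impulse response decays geometrically
impulse_response = all_pole_synthesis([0.5], [1.0, 0.0, 0.0, 0.0])

# Voiced-like excitation: an impulse every 80 samples (assumed pitch period)
train = [1.0 if n % 80 == 0 else 0.0 for n in range(240)]
voiced = all_pole_synthesis([1.2, -0.8], train, gain=0.5)
```

With a single coefficient of 0.5 the output to a unit impulse is 1, 0.5, 0.25, 0.125, which shows how each sample is built from the previous ones plus the excitation.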

  9. LPC parameter estimation: There are many methods to estimate the LPC parameters. Autocorrelation method: the optimal coefficients a_k are the solution of a set of p linear equations. Covariance method. Procedures such as Levinson-Durbin, Burg, and Le Roux estimate these parameters efficiently.
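A minimal sketch of the autocorrelation method solved with the Levinson-Durbin recursion follows. It assumes the convention s[n] ≈ Σ a_k s[n-k], so the p linear equations are Σ_k a_k r[|i-k|] = r[i] for i = 1..p. The function names are illustrative, and the autocorrelation values used in the examples are made up for the demonstration.

```python
def autocorr(x, p):
    """Short-time autocorrelation r[0..p] of a (windowed) frame x."""
    N = len(x)
    return [sum(x[n] * x[n - k] for n in range(k, N)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve sum_k a_k r[|i-k|] = r[i], i = 1..p, in O(p^2) operations.

    Returns (a, E): a[0..p-1] holds a_1..a_p, E is the final prediction error.
    """
    a = [0.0] * (p + 1)             # a[0] is unused
    E = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / E               # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        E *= (1.0 - k_i * k_i)      # error shrinks at every stage
    return a[1:], E

# Autocorrelation of an ideal first-order process with a_1 = 0.5
a_ar1, E_ar1 = levinson_durbin([1.0, 0.5, 0.25], 2)

# Arbitrary (positive-definite) autocorrelation values for a second check
r_demo = [2.0, 1.0, 0.3]
a_demo, E_demo = levinson_durbin(r_demo, 2)
```

For the first example the recursion recovers a_1 = 0.5 and a_2 = 0, since a first-order model already explains the autocorrelation. In practice `autocorr` would be applied to a windowed speech frame to obtain r.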

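  10. LPC Parameters in Coding (vocoders)
      [Figure: two block diagrams. Full speech production model: a voiced branch (impulse generator DT driven by the pitch period P, gain, glottal filter G(z)) and an unvoiced branch (white noise generator, gain) are selected by a V/UV switch and passed through the vocal tract filter H(z) and the lip radiation filter R(z) to give the speech signal s(n). Simplified LPC model: the same voiced and unvoiced excitation branches drive a single all-pole filter that outputs the speech signal s(n).]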

  11. Linear Prediction (Introduction): The object of linear prediction is to estimate the output sequence from a linear combination of input samples, past output samples, or both:
      $$\hat{y}(n) = \sum_{j=0}^{q} b(j)\, x(n-j) - \sum_{i=1}^{p} a(i)\, y(n-i)$$
      The factors a(i) and b(j) are called predictor coefficients.

  12. Linear Prediction (Introduction): Many systems of interest to us are describable by a linear, constant-coefficient difference equation:
      $$\sum_{i=0}^{p} a(i)\, y(n-i) = \sum_{j=0}^{q} b(j)\, x(n-j)$$
      If Y(z)/X(z) = H(z), where H(z) is a ratio of polynomials N(z)/D(z), then
      $$N(z) = \sum_{j=0}^{q} b(j)\, z^{-j} \quad \text{and} \quad D(z) = \sum_{i=0}^{p} a(i)\, z^{-i}$$
      Thus the predictor coefficients give us immediate access to the poles and zeros of H(z).

  13. Linear Prediction (Types of System Model): There are two important variants. All-pole model (in statistics, the autoregressive (AR) model): the numerator N(z) is a constant. All-zero model (in statistics, the moving-average (MA) model): the denominator D(z) is equal to unity. The mixed pole-zero model is called the autoregressive moving-average (ARMA) model.

  14. Linear Prediction (Derivation of LP equations): Given a zero-mean signal y(n), in the AR model:
      $$\hat{y}(n) = -\sum_{i=1}^{p} a(i)\, y(n-i)$$
      The error is:
      $$e(n) = y(n) - \hat{y}(n) = \sum_{i=0}^{p} a(i)\, y(n-i), \quad a(0) = 1$$
      To derive the predictor we use the orthogonality principle, which states that the desired coefficients are those that make the error orthogonal to the samples y(n-1), y(n-2), ..., y(n-p).

  15. Linear Prediction (Derivation of LP equations): Thus we require that
      $$\langle y(n-j)\, e(n) \rangle = 0 \quad \text{for } j = 1, 2, \ldots, p$$
      Or,
      $$\Big\langle y(n-j) \sum_{i=0}^{p} a(i)\, y(n-i) \Big\rangle = 0$$
      Interchanging the operations of averaging and summing, and representing the average ⟨ ⟩ by a sum over n, we have
      $$\sum_{i=0}^{p} a(i) \sum_{n} y(n-i)\, y(n-j) = 0, \quad j = 1, \ldots, p$$
      The required predictors are found by solving these equations.

  16. Linear Prediction (Derivation of LP equations): The orthogonality principle also states that the resulting minimum error is given by
      $$E = \langle e^2(n) \rangle = \langle y(n)\, e(n) \rangle$$
      Or,
      $$\sum_{i=0}^{p} a(i) \sum_{n} y(n-i)\, y(n) = E$$
      We can minimize the error over all time:
      $$\sum_{i=0}^{p} a(i)\, r_{i-j} = 0, \quad j = 1, 2, \ldots, p$$
      $$\sum_{i=0}^{p} a(i)\, r_i = E$$
      where
      $$r_i = \sum_{n=-\infty}^{\infty} y(n)\, y(n-i)$$
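As a restatement that is not on the slides: taking a(0) = 1 and using the symmetry r_{-k} = r_k of the autocorrelation of a real signal, the p equations above can be written as a Toeplitz system, with the minimum error following from the last equation.

```latex
\begin{bmatrix}
r_0     & r_1     & \cdots & r_{p-1} \\
r_1     & r_0     & \cdots & r_{p-2} \\
\vdots  & \vdots  & \ddots & \vdots  \\
r_{p-1} & r_{p-2} & \cdots & r_0
\end{bmatrix}
\begin{bmatrix} a(1) \\ a(2) \\ \vdots \\ a(p) \end{bmatrix}
= -\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{bmatrix},
\qquad
E = r_0 + \sum_{i=1}^{p} a(i)\, r_i
```

For p = 1 this reduces to a(1) = -r_1 / r_0 and E = r_0 - r_1^2 / r_0, so the error is never larger than the signal energy r_0.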

  17. Linear Prediction (Applications): Autocorrelation matching: We have a signal y(n) with known autocorrelation r_yy(n). We model this with the AR system shown below:
      [Figure: e(n) drives the filter σ / (1 - A(z)), which outputs y(n)]
      $$H(z) = \frac{\sigma}{1 - A(z)} = \frac{\sigma}{1 - \sum_{i=1}^{p} a_i z^{-i}}$$

  18. Linear Prediction (Order of Linear Prediction): The choice of predictor order depends on the analysis bandwidth. The rule of thumb is:
      $$p = \frac{2\, BW}{1000} + c$$
      where BW is the analysis bandwidth in Hz and c is a small additive constant. For a normal vocal tract, there is an average of about one formant per kilohertz of bandwidth. One formant requires two complex-conjugate poles. Hence for every formant we require two predictor coefficients, or two coefficients per kilohertz of bandwidth.
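The rule of thumb can be turned into a small calculation. The slide does not give a value for the constant c, so the default `c=2` below is an assumption made only for this example, as is the function name.

```python
def predictor_order(bandwidth_hz, c=2):
    """Rule of thumb p = 2*BW/1000 + c (two coefficients per kHz, plus c).

    c=2 is an assumed default; the slides do not specify its value.
    """
    return round(2 * bandwidth_hz / 1000) + c

order_4k = predictor_order(4000)        # 4 kHz bandwidth (8 kHz sampling)
order_5k = predictor_order(5000, c=3)   # 5 kHz bandwidth with c = 3
```

With a 4 kHz analysis bandwidth, four formants are expected, which accounts for 8 coefficients before the constant is added.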

  19. Linear Prediction (AR Modeling of Speech Signal): True model:
      [Figure: block diagram. Voiced branch: impulse generator DT (driven by the pitch), gain, glottal filter G(z). Unvoiced branch: uncorrelated noise generator, gain. A V/U switch selects the volume velocity U(n), which passes through the vocal tract filter H(z) and the LP filter R(z) to give the speech signal s(n).]

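  20. Linear Prediction (AR Modeling of Speech Signal): Using LP analysis:
      [Figure: block diagram. Voiced branch: impulse generator DT driven by the pitch estimate, gain. Unvoiced branch: white noise generator. A V/U switch selects the excitation of a single all-pole (AR) filter H(z), which outputs the speech signal s(n).]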

  21. Introduction to Vocoders
      [Figure: original speech signal s(n) → vocoder analysis → channel (or storage), carrying V/UV, pitch, and filter parameters → vocoder synthesizer → synthesized speech signal ŝ(n)]
      Besides the estimation of the vocal tract parameters, a vocoder needs excitation estimation. In early vocoders, this was achieved by estimating V/UV, pitch, and gain. More modern vocoders involve more sophisticated estimation of the excitation, such as in CELP, where vector quantization is used.

  22. Pitch Detection: Since the speech signal in voiced frames is quasi-periodic (and not fully periodic), pitch detection is not always easy. It is especially difficult in phonemes that show less periodic behavior. Some pitch detection methods: AMDF (Average Magnitude Difference Function); autocorrelation with center clipping.
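The two methods named on this slide can be sketched as follows. This is a simplified illustration on a perfectly periodic synthetic frame, not a robust detector: the search range of 60-400 Hz, the clipping ratio of 0.3, the normalization by the number of overlapping samples, and all function names are assumptions made for the example.

```python
import math

def amdf(x, lag):
    """Average magnitude difference at one lag; small when lag matches the period."""
    N = len(x)
    return sum(abs(x[n] - x[n - lag]) for n in range(lag, N)) / (N - lag)

def center_clip(x, ratio=0.3):
    """Zero samples below a threshold and shift the rest toward zero."""
    c = ratio * max(abs(v) for v in x)
    return [v - c if v > c else v + c if v < -c else 0.0 for v in x]

def norm_autocorr(x, lag):
    """Autocorrelation at one lag, averaged over the overlapping samples."""
    N = len(x)
    return sum(x[n] * x[n - lag] for n in range(lag, N)) / (N - lag)

def pitch_amdf(x, fs, fmin=60, fmax=400):
    """Pitch in Hz from the lag that minimizes the AMDF."""
    lags = range(fs // fmax, fs // fmin + 1)
    return fs / min(lags, key=lambda lag: amdf(x, lag))

def pitch_autocorr(x, fs, fmin=60, fmax=400):
    """Pitch in Hz from the autocorrelation peak of the center-clipped frame."""
    y = center_clip(x)
    lags = range(fs // fmax, fs // fmin + 1)
    return fs / max(lags, key=lambda lag: norm_autocorr(y, lag))

fs = 8000   # assumed sampling rate in Hz
f0 = 100    # true pitch of the synthetic frame in Hz
# 40 ms frame: fundamental plus a weaker second harmonic
frame = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs) for n in range(320)]

f0_amdf = pitch_amdf(frame, fs)
f0_acf = pitch_autocorr(frame, fs)
```

On real speech the AMDF minimum and the autocorrelation peak are less sharp than on this synthetic frame, which is why a voiced/unvoiced decision and some smoothing across frames are usually added.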
