Speech Signal Representations
Berlin Chen, 2003

References:
1. X. Huang et al., Spoken Language Processing, Chapters 5, 6
2. J. R. Deller et al., Discrete-Time Processing of Speech Signals, Chapters 4-6
3. J. W. Picone, "Signal modeling techniques in speech recognition," Proceedings of the IEEE, September 1993, pp. 1215-1247
Introduction
• Current speech recognition systems are composed mainly of two parts:
– A front-end feature extractor (feature extraction module)
 • Required to discover salient characteristics suited for classification
 • Based on scientific and/or heuristic knowledge about the patterns to be recognized
– A back-end classifier (classification module)
 • Required to set class boundaries accurately in the feature space
 • Statistically designed according to the fundamental Bayes decision theory
Background Review: Digital Signal Processing
Analog Signal to Digital Signal
• Analog signal: $x_a(t)$
• Discrete-time signal: $x[n] = x_a(nT)$, where $T$ is the sampling period and $F_s = 1/T$ is the sampling rate
• Digital signal: a discrete-time signal with discrete amplitude
• Example: a sampling period of $T = 125\ \mu s$ gives a sampling rate of $F_s = 1/T = 8$ kHz
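A minimal sketch of the two steps above, assuming NumPy: sample a hypothetical analog tone at $T = 125\ \mu s$ (8 kHz) to get $x[n] = x_a(nT)$, then quantize the amplitudes to obtain a digital signal. The 440 Hz tone and the 16-bit word length are illustrative assumptions, not values from the slides.

```python
import numpy as np

T = 125e-6          # sampling period in seconds (value from the slide)
Fs = 1.0 / T        # sampling rate in Hz (about 8000)

def x_a(t):
    """Hypothetical analog signal: a 440 Hz cosine (illustrative choice)."""
    return np.cos(2 * np.pi * 440.0 * t)

n = np.arange(80)            # 80 samples, i.e. 10 ms at 8 kHz
x = x_a(n * T)               # discrete-time signal x[n] = x_a(nT)

# Quantize amplitudes to 16-bit integers to obtain a digital signal
# (the 16-bit word length is an assumption, not from the slide).
x_digital = np.round(x * 32767).astype(np.int16)
```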
Analog Signal to Digital Signal
• Continuous-time to discrete-time conversion: the continuous-time signal $x_a(t)$ is multiplied by a periodic impulse train $s(t)$ (sampling), and the impulse weights are then converted to a sequence $x[n] = x_a(nT)$, the discrete-time signal
• Periodic impulse train: $s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$, where $\delta(t) = 0$ for all $t \neq 0$ and $\int_{-\infty}^{\infty} \delta(t)\,dt = 1$
• Sampled signal:
$$x_s(t) = x_a(t)\,s(t) = \sum_{n=-\infty}^{\infty} x_a(nT)\,\delta(t - nT) = \sum_{n=-\infty}^{\infty} x[n]\,\delta(t - nT)$$
• Hence $x_s(t)$ can be uniquely specified by $x[n]$
Analog Signal to Digital Signal
• A continuous signal $x_a(t)$ sampled at different periods (figure: the same $x_a(t)$ sampled with different sampling periods $T$; in each case $x_s(t) = \sum_{n} x_a(nT)\,\delta(t - nT) = \sum_{n} x[n]\,\delta(t - nT)$)
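Analog Signal to Digital Signal
• Spectra
 – $X_a(j\Omega)$ is band-limited: $X_a(j\Omega) = 0$ for $|\Omega| > \Omega_N$
 – Spectrum of the impulse train: $S(j\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$, where $\Omega_s = 2\pi/T = 2\pi F_s$ is the sampling frequency
 – Spectrum of the sampled signal:
$$X_s(j\Omega) = \frac{1}{2\pi} X_a(j\Omega) * S(j\Omega) = \frac{1}{T} \sum_{k=-\infty}^{\infty} X_a\big(j(\Omega - k\Omega_s)\big)$$
 – Low-pass filter: $R(j\Omega) = T$ for $|\Omega| < \Omega_s/2 = \pi/T$, and $0$ otherwise; the filtered spectrum is $X_p(j\Omega) = R(j\Omega)\,X_s(j\Omega)$
• If $\Omega_s - \Omega_N > \Omega_N$, i.e., $\Omega_s > 2\Omega_N$, the shifted copies do not overlap and $X_p(j\Omega) = X_a(j\Omega)$
• If $\Omega_s - \Omega_N < \Omega_N$, i.e., $\Omega_s < 2\Omega_N$, high-frequency components get superimposed on low-frequency components (aliasing distortion), and $X_a(j\Omega)$ cannot be recovered from $X_p(j\Omega)$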
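The aliasing case ($\Omega_s < 2\Omega_N$) can be seen directly in the samples. A minimal sketch, assuming NumPy and the 8 kHz rate from the earlier example: a 5 kHz cosine lies above $F_s/2 = 4$ kHz, and its samples are identical to those of a 3 kHz cosine, so the two tones cannot be told apart after sampling. The tone frequencies are illustrative choices.

```python
import numpy as np

Fs = 8000.0                  # sampling rate in Hz (8 kHz, as in the earlier example)
n = np.arange(64)            # sample indices
t = n / Fs                   # sampling instants nT

# A 5 kHz cosine violates Fs > 2 * F_N, since 5 kHz > Fs / 2 = 4 kHz ...
x_high = np.cos(2 * np.pi * 5000.0 * t)
# ... and its samples coincide with those of a 3 kHz cosine (8 kHz - 5 kHz).
x_low = np.cos(2 * np.pi * 3000.0 * t)

print(np.allclose(x_high, x_low))   # True: the 5 kHz tone aliases to 3 kHz
```

Algebraically, $\cos(2\pi \cdot 5n/8) = \cos(2\pi n - 2\pi \cdot 3n/8) = \cos(2\pi \cdot 3n/8)$ for every integer $n$.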
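Analog Signal to Digital Signal
• To avoid aliasing (overlapping, fold-over)
 – The sampling frequency should be greater than twice the highest frequency of the signal to be sampled: $\Omega_s > 2\Omega_N$
 – This is the (Nyquist) sampling theorem
• To reconstruct the original continuous signal
 – Filter with an ideal low-pass filter with band limit $\Omega_s/2$
 – Equivalently, convolve in the time domain with the filter's impulse response $h(t)$:
$$x_a(t) = \sum_{n=-\infty}^{\infty} x_a(nT)\,h(t - nT), \qquad h(t) = \mathrm{sinc}\!\left(\frac{\Omega_s t}{2}\right) = \frac{\sin(\pi t/T)}{\pi t/T}$$
$$x_a(t) = \sum_{n=-\infty}^{\infty} x_a(nT)\,\mathrm{sinc}\!\left(\frac{\Omega_s (t - nT)}{2}\right)$$
where $\mathrm{sinc}(x) = \sin(x)/x$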
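A minimal sketch of the sinc reconstruction formula, assuming NumPy. The infinite sum is truncated to $|n| \le 2000$, so the result is an approximation; the 1 kHz sampling rate and 100 Hz tone are illustrative choices. Note that `np.sinc` is the normalized sinc, $\sin(\pi u)/(\pi u)$, so `np.sinc((t - n*T) / T)` equals $h(t - nT)$ above.

```python
import numpy as np

Fs = 1000.0                      # sampling rate in Hz (illustrative choice)
T = 1.0 / Fs                     # sampling period
F0 = 100.0                       # tone frequency, well below Fs / 2

n = np.arange(-2000, 2001)       # finite stand-in for n = -inf..inf (truncation approximates)
x = np.cos(2 * np.pi * F0 * n * T)   # samples x[n] = x_a(nT)

def reconstruct(t, x, n, T):
    """Band-limited interpolation: x_a(t) ~= sum_n x[n] sin(pi(t-nT)/T) / (pi(t-nT)/T).

    np.sinc(u) = sin(pi u) / (pi u), so np.sinc((t - nT) / T) = h(t - nT).
    """
    return np.sum(x * np.sinc((t - n * T) / T))

t_mid = 0.5 * T                  # an instant halfway between two samples
print(reconstruct(t_mid, x, n, T), np.cos(2 * np.pi * F0 * t_mid))  # nearly equal
```

At a sampling instant $t = kT$ every term except $n = k$ vanishes, so the sum returns $x[k]$ exactly; between samples the truncated sum only approximates $x_a(t)$.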
Two Main Approaches to Digital Signal Processing
• Filtering: signal in $x[n]$ → filter → signal out $y[n]$
 – Amplify or attenuate some frequency components of $x[n]$
• Parameter extraction: signal in $x[n]$ → parameter extraction → parameters out, a sequence of parameter vectors $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_L$ with $\mathbf{c}_l = [c_{l1}, c_{l2}, \ldots, c_{lm}]^T$
 – e.g.: 1. Spectrum estimation 2. Parameters for recognition
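A minimal sketch contrasting the two approaches, assuming NumPy. The two-point averaging filter and the two per-frame parameters (log energy and mean) are illustrative assumptions, not methods from the slides; they only show the shapes involved: filtering maps $x[n]$ to another sequence $y[n]$, while parameter extraction maps $x[n]$ to $L$ vectors of $m$ parameters each.

```python
import numpy as np

def two_point_average(x):
    """Filtering sketch: y[n] = (x[n] + x[n-1]) / 2, taking x[-1] = 0.

    This illustrative low-pass filter keeps the DC (omega = 0) component and
    removes the omega = pi component.
    """
    x_prev = np.concatenate(([0.0], x[:-1]))
    return 0.5 * (x + x_prev)

def extract_parameters(x, frame_len, hop):
    """Parameter-extraction sketch: one m-dimensional vector c_l per frame.

    Here m = 2 (frame log energy and frame mean); these two parameters are
    illustrative assumptions. Column l of the result holds c_l.
    """
    num_frames = 1 + (len(x) - frame_len) // hop
    C = np.zeros((2, num_frames))
    for l in range(num_frames):
        frame = x[l * hop : l * hop + frame_len]
        C[0, l] = np.log(np.sum(frame ** 2) + 1e-10)   # log energy (small floor avoids log 0)
        C[1, l] = np.mean(frame)                        # mean value of the frame
    return C

n = np.arange(16)
x = 1.0 + np.cos(np.pi * n)        # DC plus the omega = pi component: 2, 0, 2, 0, ...
y = two_point_average(x)           # every output sample is (approximately) 1
C = extract_parameters(x, frame_len=8, hop=4)   # 2 x 3 matrix: L = 3 vectors, m = 2
```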
Sinusoid Signals
• $x[n] = A\cos(\omega n + \phi)$
 – $A$: amplitude
 – $\omega$: angular frequency, $\omega = 2\pi f = 2\pi/T$, where $f$ is the normalized frequency, $0 \le f \le 1$, and $T$ is the period represented by a number of samples
 – $\phi$: phase
• (Figure: a sinusoid with period $T = 25$ samples)
Sinusoid Signals
• $x[n]$ is periodic with a period of $N$ (samples) if $x[n + N] = x[n]$ for all $n$:
$$A\cos(\omega(n + N) + \phi) = A\cos(\omega n + \phi) \;\Rightarrow\; \omega N = 2\pi k \;\Rightarrow\; \omega = \frac{2\pi k}{N}$$
for some positive integer $k$
• Examples (sinusoid signals)
 – $x_1[n] = \cos(\pi n/4)$ is periodic with period $N = 8$
 – $x_2[n] = \cos(3\pi n/8)$ is periodic with period $N = 16$
 – $x_3[n] = \cos(n/3)$ is not periodic
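Sinusoid Signals
• $x_1[n] = \cos(\pi n/4)$:
$$\cos\left(\frac{\pi}{4}n\right) = \cos\left(\frac{\pi}{4}(n + N_1)\right) = \cos\left(\frac{\pi}{4}n + \frac{\pi}{4}N_1\right) \;\Rightarrow\; \frac{\pi}{4}N_1 = 2\pi k \;\Rightarrow\; N_1 = 8k$$
($N_1$ and $k$ are positive integers), so the period is $N_1 = 8$
• $x_2[n] = \cos(3\pi n/8)$:
$$\cos\left(\frac{3\pi}{8}n\right) = \cos\left(\frac{3\pi}{8}(n + N_2)\right) = \cos\left(\frac{3\pi}{8}n + \frac{3\pi}{8}N_2\right) \;\Rightarrow\; \frac{3\pi}{8}N_2 = 2\pi k \;\Rightarrow\; N_2 = \frac{16}{3}k$$
($N_2$ and $k$ are positive integers), so the period is $N_2 = 16$ (with $k = 3$)
• $x_3[n] = \cos(n/3)$:
$$\cos\left(\frac{1}{3}n\right) = \cos\left(\frac{1}{3}(n + N_3)\right) = \cos\left(\frac{1}{3}n + \frac{1}{3}N_3\right) \;\Rightarrow\; \frac{1}{3}N_3 = 2\pi k \;\Rightarrow\; N_3 = 6\pi k$$
($N_3$ and $k$ are positive integers); since $\pi$ is irrational, $6\pi k$ is never an integer, so $N_3$ does not exist!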
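The derivations above reduce to one rule: if $\omega/\pi$ is a rational number, then $\omega N = 2\pi k$ means $N \cdot (\omega/\pi)/2 = k$, so the smallest period $N$ is the denominator of $(\omega/\pi)/2$ in lowest terms. A minimal sketch using Python's `fractions.Fraction`; the function name `fundamental_period` is hypothetical. A case like $\cos(n/3)$ cannot be passed in at all, because $\omega/\pi = 1/(3\pi)$ is not rational, which matches the conclusion that no period exists.

```python
from fractions import Fraction

def fundamental_period(omega_over_pi):
    """Smallest N > 0 with omega * N = 2*pi*k for some integer k.

    omega_over_pi is omega / pi as an exact rational number, e.g. Fraction(3, 8)
    for omega = 3*pi/8. Since omega*N = 2*pi*k  <=>  N * (omega_over_pi / 2) = k,
    N is the denominator of omega_over_pi / 2 in lowest terms.
    """
    return (Fraction(omega_over_pi) / 2).denominator

print(fundamental_period(Fraction(1, 4)))   # 8  for cos(pi n / 4)
print(fundamental_period(Fraction(3, 8)))   # 16 for cos(3 pi n / 8)
```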
Sinusoid Signals
• Complex exponential signal
 – Use Euler's relation to express complex numbers:
$$z = x + jy = Ae^{j\phi} = A(\cos\phi + j\sin\phi), \quad A \text{ is a real number}$$
 – $x = A\cos\phi$ (real part, Re), $y = A\sin\phi$ (imaginary part, Im)
Sinusoid Signals
• A sinusoid signal
$$x[n] = A\cos(\omega n + \phi) = \mathrm{Re}\{Ae^{j(\omega n + \phi)}\} = \mathrm{Re}\{Ae^{j\phi}e^{j\omega n}\}$$
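A minimal numerical check of Euler's relation and of writing a sinusoid as the real part of a complex exponential, using Python's standard `cmath` and `math` modules. The values of $A$, $\phi$, and $\omega$ are illustrative.

```python
import cmath
import math

A, phi, omega = 2.0, math.pi / 3, math.pi / 4   # illustrative values

# Euler's relation: A e^{j phi} = A cos(phi) + j A sin(phi)
z = A * cmath.exp(1j * phi)
print(z.real, A * math.cos(phi))   # equal up to rounding
print(z.imag, A * math.sin(phi))   # equal up to rounding

# x[n] = A cos(omega n + phi) = Re{A e^{j phi} e^{j omega n}}
x = [(z * cmath.exp(1j * omega * n)).real for n in range(8)]
x_ref = [A * math.cos(omega * n + phi) for n in range(8)]
```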
Sinusoid Signals
• Sum of two complex exponential signals with the same frequency:
$$A_0e^{j(\omega n + \phi_0)} + A_1e^{j(\omega n + \phi_1)} = e^{j\omega n}\left(A_0e^{j\phi_0} + A_1e^{j\phi_1}\right) = e^{j\omega n}Ae^{j\phi} = Ae^{j(\omega n + \phi)}$$
where $A_0$, $A_1$, and $A$ are real numbers
 – When only the real part is considered:
$$A_0\cos(\omega n + \phi_0) + A_1\cos(\omega n + \phi_1) = A\cos(\omega n + \phi)$$
 – The sum of N sinusoids of the same frequency is another sinusoid of the same frequency
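The derivation above suggests a direct procedure: add the phasors $A_i e^{j\phi_i}$ and read $A$ and $\phi$ off the magnitude and angle of the sum. A minimal sketch with Python's `cmath`; the helper name `sum_of_sinusoids` and the component values are hypothetical.

```python
import cmath
import math

def sum_of_sinusoids(components):
    """Combine A_i cos(omega n + phi_i), all at the same omega, into A cos(omega n + phi).

    The phasors A_i e^{j phi_i} are added; A and phi are the magnitude and
    angle of the resulting complex number.
    """
    total = sum(a * cmath.exp(1j * p) for a, p in components)
    return abs(total), cmath.phase(total)

omega = 0.3                                           # any common frequency (illustrative)
components = [(1.0, 0.0), (2.0, math.pi / 2), (0.5, -1.0)]
A, phi = sum_of_sinusoids(components)

# Check sample by sample that the sum is a single sinusoid of the same frequency.
for n in range(10):
    lhs = sum(a * math.cos(omega * n + p) for a, p in components)
    rhs = A * math.cos(omega * n + phi)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
```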
Some Digital Signals
Some Digital Signals
• Any signal sequence $x[n]$ can be represented as a sum of shifted and scaled unit impulse sequences (signals):
$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n - k]$$
where $x[k]$ is the scale (weight) and $\delta[n - k]$ is the time-shifted unit impulse sequence
• Example: for a sequence that is nonzero only for $-2 \le n \le 3$,
$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n - k] = \sum_{k=-2}^{3} x[k]\,\delta[n - k]$$
$$= x[-2]\delta[n + 2] + x[-1]\delta[n + 1] + x[0]\delta[n] + x[1]\delta[n - 1] + x[2]\delta[n - 2] + x[3]\delta[n - 3]$$
$$= (1)\delta[n + 2] + (-2)\delta[n + 1] + (2)\delta[n] + (1)\delta[n - 1] + (-1)\delta[n - 2] + (1)\delta[n - 3]$$
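A minimal sketch of the impulse decomposition, assuming NumPy: the example sequence is rebuilt as $\sum_k x[k]\,\delta[n - k]$ over an index range wide enough to show the zeros on either side. The `delta` helper is a hypothetical name for the unit impulse.

```python
import numpy as np

# Example sequence from the slide: x[-2..3] = 1, -2, 2, 1, -1, 1 (zero elsewhere)
samples = {-2: 1, -1: -2, 0: 2, 1: 1, 2: -1, 3: 1}

def delta(n):
    """Unit impulse: 1 where n == 0, else 0 (works on integer arrays)."""
    return (n == 0).astype(int)

n = np.arange(-5, 6)                                   # index range to evaluate
x = sum(xk * delta(n - k) for k, xk in samples.items())  # sum of scaled, shifted impulses
print(dict(zip(n.tolist(), x.tolist())))
```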
Digital Systems
• A digital system $T$ is a system that, given an input signal $x[n]$, generates an output signal $y[n]$:
$$y[n] = T\{x[n]\}$$
(block diagram: $x[n] \rightarrow T\{\cdot\} \rightarrow y[n]$)
Properties of Digital Systems
• Linear
 – A linear combination of inputs maps to the same linear combination of outputs:
$$T\{a x_1[n] + b x_2[n]\} = aT\{x_1[n]\} + bT\{x_2[n]\}$$
• Time-invariant (time-shift invariant)
 – A time shift of the input by $m$ samples gives a shift of the output by $m$ samples:
$$y[n \pm m] = T\{x[n \pm m]\}, \quad \forall m$$
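A minimal sketch that tests both properties numerically, assuming NumPy. The two systems are hypothetical examples: $y[n] = 2x[n] + x[n-1]$ (linear and time-invariant) and $y[n] = x[n]^2$ (time-invariant but not linear). Sequences are finite, and delays pad with zeros. A passing check on particular inputs is evidence, not a proof, of the property.

```python
import numpy as np

def shift(x, m):
    """Delay a finite sequence by m >= 0 samples, padding with zeros (keeps the length)."""
    return np.concatenate((np.zeros(m), x[: len(x) - m]))

def system_lti(x):
    """Hypothetical system y[n] = 2 x[n] + x[n-1] (linear and time-invariant)."""
    return 2 * x + shift(x, 1)

def system_square(x):
    """Hypothetical system y[n] = x[n]^2 (time-invariant but not linear)."""
    return x ** 2

def is_linear(T, x1, x2, a=2.0, b=-3.0):
    # T{a x1 + b x2} must equal a T{x1} + b T{x2}.
    return np.allclose(T(a * x1 + b * x2), a * T(x1) + b * T(x2))

def is_time_invariant(T, x, m=3):
    # Shifting the input by m must shift the output by m.
    return np.allclose(T(shift(x, m)), shift(T(x), m))

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(32), rng.standard_normal(32)
print(is_linear(system_lti, x1, x2), is_time_invariant(system_lti, x1))        # True True
print(is_linear(system_square, x1, x2), is_time_invariant(system_square, x1))  # False True
```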
Properties of Digital Systems
• Linear time-invariant (LTI)
 – The system output can be expressed as a convolution of the input $x[n]$ and the impulse response $h[n]$:
$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n - k]$$
 – The system can be characterized by its impulse response $h[n]$, which is also a signal sequence
 – If the input $x[n]$ is the impulse $\delta[n]$, the output is $h[n]$
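A minimal sketch of the convolution sum for finite sequences that start at $n = 0$, assuming NumPy. Each input sample $x[k]$ contributes a copy of $h$ scaled by $x[k]$ and shifted by $k$, which is the impulse decomposition combined with linearity and time invariance. The impulse response and input values are hypothetical; `np.convolve` is used only as a cross-check.

```python
import numpy as np

def lti_output(x, h):
    """y[n] = sum_k x[k] h[n - k] for finite sequences starting at n = 0."""
    y = np.zeros(len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        y[k : k + len(h)] += xk * h      # each input sample adds a scaled, shifted copy of h
    return y

h = np.array([1.0, 0.5, 0.25])           # hypothetical impulse response
x = np.array([1.0, -2.0, 2.0, 1.0])      # arbitrary input

y = lti_output(x, h)
print(np.allclose(y, np.convolve(x, h)))  # True: matches NumPy's convolution
impulse = np.array([1.0])                  # delta[n]
print(lti_output(impulse, h))              # the output is h itself
```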