An Algorithm for Determining the Endpoints for Isolated Utterances
L.R. Rabiner and M.R. Sambur
The Bell System Technical Journal, Vol. 54, No. 2, Feb. 1975, pp. 297-315

Outline
• Intro to problem
• Solution
• Algorithm
• Summary

Motivation
• Word recognition needs to detect word boundaries in speech
• Recognizing silence can reduce:
  – Processing load
  – (Network not identified as savings source)
• Easy in soundproof room, with digitized tape

Visual Recognition
“Eight”
• Easy
• Note how quiet beginning is (tape)

Slightly Tougher Visual Recognition
“Six”
• “sss” starts crossing the ‘zero’ line, so can still detect

Tough Visual Recognition
“Four”
• Eye picks ‘B’, but ‘A’ is real start
  – /f/ is a weak fricative
Tough Visual Recognition
“Nine”
• Difficult to say where final trailing off ends

Tough Visual Recognition
“Five”
• Eye picks ‘A’, but ‘B’ is real endpoint
  – V becomes devoiced

The Problem
• Noisy computer room with background noise
  – Weak fricatives: /f, th, h/
  – Weak plosive bursts: /p, t, k/
  – Final nasals
  – Voiced fricatives becoming devoiced
  – Trailing off of sounds (ex: binary, three)
• Simple, efficient processing
  – Avoid hardware costs

The Solution
• Two measurements:
  – Energy
  – Zero crossing rate
• Simple, fast, accurate

Energy
• Sum of magnitudes of 10 ms of sound, centered on interval:
  – $E(n) = \sum_{i=-50}^{50} |s(n+i)|$

Zero (Level) Crossing Rate
• Number of zero crossings per 10 ms
  – Normal number of cross-overs during silence
  – Increase in cross-overs during speech
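The two measurements above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `energy` and `zero_crossings` are hypothetical, the window is clipped at the signal edges (an assumption), and a sample equal to zero is treated as non-negative (also an assumption). The ±50 sample window matches 10 ms only if the signal is sampled at about 10 kHz.

```python
def energy(s, n, half_width=50):
    """E(n): sum of |s(n+i)| for i in [-half_width, half_width].

    The window is clipped to the signal bounds (an assumption; the
    slides do not say how edges are handled).
    """
    lo = max(0, n - half_width)
    hi = min(len(s), n + half_width + 1)
    return sum(abs(x) for x in s[lo:hi])


def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame.

    A sample equal to zero is treated as non-negative (an assumption).
    """
    count = 0
    for a, b in zip(frame, frame[1:]):
        if (a >= 0) != (b >= 0):
            count += 1
    return count
```

Calling `zero_crossings` on successive 10 ms frames gives the per-frame rate that the algorithm later compares against its zero-crossing threshold.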
The Algorithm: Startup
• At initialization, record sound for 100 ms
  – Assume ‘silence’
  – Measure background noise
• Compute average (IZC’) and std dev (σ) of zero crossing rate
• Choose zero-crossing threshold (IZCT)
  – Threshold for unvoiced speech
  – IZCT = min(25 crossings per 10 ms, IZC’ + 2σ)

The Algorithm: Thresholds
• Compute energy, E(n), for interval
  – Get max, IMX
  – Have silence, IMN
  I1 = 0.03 * (IMX - IMN) + IMN   (3% of peak energy)
  I2 = 4 * IMN   (4x silent energy)
• Get energy thresholds (ITU and ITL)
  – ITL = MIN(I1, I2)
  – ITU = 5 * ITL

The Algorithm: Energy Computation
• Search sample for energy greater than ITL
  – Save as start of speech, say s
• Search for energy greater than ITU
  – s becomes start of speech
  – If energy falls below ITL, restart
• Search for energy less than ITL
  – Save as end of speech
• Results in conservative estimates
  – Endpoints may be outside

The Algorithm: Zero Crossing Computation
• Search back 250 ms
  – Count number of intervals where rate exceeds IZCT
• If 3+, set starting point, s, to first time
• Else s remains the same
• Do similar search after end

The Algorithm: Example
(Word begins with strong fricative)

Algorithm: Examples
“Half”
• Caught trailing /f/
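The threshold and search steps can be put together as a rough Python sketch. It works on per-frame values (one energy and one zero-crossing count per 10 ms) and makes several assumptions that the slides do not confirm: IMN is taken as the mean energy of the first 10 frames, the population standard deviation is used, the end of speech is found by running the start search on the reversed sequence, and the zero-crossing threshold is the silence mean plus two standard deviations, as in the Rabiner and Sambur paper. All function and constant names are hypothetical.

```python
from statistics import mean, pstdev

FRAMES_SILENCE = 10  # 100 ms of assumed silence, at 10 ms per frame
LOOKBACK = 25        # 250 ms zero-crossing search window, in frames


def thresholds(energy, zcr):
    """Return (ITL, ITU, IZCT) from per-frame energy and zero-crossing counts."""
    imx = max(energy)
    imn = mean(energy[:FRAMES_SILENCE])  # assumption: IMN = mean silent energy
    i1 = 0.03 * (imx - imn) + imn        # 3% of peak energy above silence
    i2 = 4 * imn                         # 4x silent energy
    itl = min(i1, i2)
    itu = 5 * itl
    silent = zcr[:FRAMES_SILENCE]
    izct = min(25, mean(silent) + 2 * pstdev(silent))
    return itl, itu, izct


def energy_start(energy, itl, itu):
    """First frame above ITL that reaches ITU before dropping below ITL."""
    s = None
    for i, e in enumerate(energy):
        if s is None:
            if e > itl:
                s = i      # tentative start of speech
        elif e < itl:
            s = None       # fell below ITL first: restart the search
        if s is not None and e > itu:
            return s
    return None


def zc_extend(zcr, window, izct, fallback):
    """Move an endpoint outward if 3+ frames in the window exceed IZCT.

    `window` lists frame indices from nearest the endpoint outward, so
    the last hit is the one farthest from the energy-based estimate.
    """
    hits = [i for i in window if zcr[i] > izct]
    return hits[-1] if len(hits) >= 3 else fallback


def find_endpoints(energy, zcr):
    """Return (start_frame, end_frame), or None if no speech is found."""
    itl, itu, izct = thresholds(energy, zcr)
    s = energy_start(energy, itl, itu)
    if s is None:
        return None
    n = len(energy)
    # Assumption: the end is found by the same search run backwards.
    e = n - 1 - energy_start(energy[::-1], itl, itu)
    lo = max(0, s - LOOKBACK)
    s = zc_extend(zcr, range(s - 1, lo - 1, -1), izct, s)
    e = zc_extend(zcr, range(e + 1, min(n, e + 1 + LOOKBACK)), izct, e)
    return s, e
```

With a low-energy, high-zero-crossing stretch just before the first loud frame (a weak fricative such as /f/), the energy search alone would start at the loud frame, and the zero-crossing step moves the start back to where the fricative begins.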
Algorithm: Examples
“Four”
Notice how different each “four” is

Evaluation: Part 1
• 54-word vocabulary
• Read by 2 males, 2 females
• No gross errors (off by more than 50 ms)
• Some small errors
  – Losing weak fricatives
  – None affected recognition

Evaluation: Part 2
• 10 speakers
• Count 0 to 9
• No errors at all

Evaluation 3: Your Project 1

Future Work
• Three classes of speech:
  – Silence
  – Unvoiced speech
  – Voiced speech
• May be more computationally intensive solutions that are more effective