
Assessment of Vocal Noise via Bi-directional Long-term Linear Prediction of Running Speech



  1. Assessment of Vocal Noise via Bi-directional Long-term Linear Prediction of Running Speech
     F. Bettens*, F. Grenez*, J. Schoentgen*,**
     * Université Libre de Bruxelles
     ** National Fund for Scientific Research, Belgium

  2. Cause ⇒ Vocal Dysperiodicities
     • Vocal Fold Dynamics ⇒ Diplophonia, Bi-Phonation, Random Vibrations
     • Perturbations ⇒ Vocal Jitter & Shimmer, Frequency & Amplitude Tremor
     • (Audible) Additive Noise Owing to Turbulence ⇒ Breathiness, Breathy Voice, Whispery Voice, …
     • “Parasitic” Vibrations ⇒ Vibrations of Ventricular Folds or Ary-Epiglottic Ligaments, …
     • Transients ⇒ Pitch Breaks, Phonation Breaks, Timbre Breaks, …

  3. Existing Cues of Vocal Noise
     • Detection of individual vocal cycles (or harmonics)
       – Steady vowel fragments
       – (Pseudo)-Periodicity
       – Period Perturbation Quotient
       – Amplitude Perturbation Quotient
       – Harmonics-to-Noise Ratio

  4. Objectives : Analyses of Dysperiodicities
     • Drop the requirement that speech fragments be:
       – (Pseudo)-Periodic
       – Steady
     • Any speech fragment:
       – Modal Voices & (Very) Hoarse Voices
       – Sustained Vowels & Running Speech

  5. Motivation : Analysis of Running Speech
     • Voicing in running speech
       – Variable acoustic impedance
       – Voicing onsets & offsets
       – Variable pressure drops
       – Variable laryngeal positions
     • Voice Loading

  6. Double Linear Predictive Analysis
     • Conventional short-term linear prediction:
       $$x'[n] = -\sum_{i=1}^{N} a_i\, x[n-i] \;\Rightarrow\; e_S[n] = x[n] - x'[n]$$
       with $x[n]$: speech signal, $e_S[n]$: forward short-term prediction error
     • Long-term linear prediction:
       $$y'[n] = -\sum_{i=0}^{M} b_i\, y[n-P-i] \;\Rightarrow\; e_L[n] = y[n] - y'[n]$$
       with $y[n] = e_S[n]$, $e_L[n]$: forward double prediction error
     • Remove existing correlations ⇒ unpredictable noise component (Qi, 1999)
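The short-term stage can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to obtain the coefficients $a_i$; the slides do not say which estimator was used. The function names and the toy signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return [1, a_1, ..., a_N] for the convention x'[n] = -sum_i a_i x[n-i].

    Assumes r[0] > 0 and a signal that is not perfectly predictable.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient of stage m
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a


def short_term_error(x, a):
    """e_S[n] = x[n] - x'[n], treating samples before n = 0 as zero."""
    order = len(a) - 1
    errors = []
    for n in range(len(x)):
        predicted = -sum(a[i] * x[n - i] for i in range(1, min(order, n) + 1))
        errors.append(x[n] - predicted)
    return errors


# Toy first-order signal x[n] = 0.9 ** n, so a_1 should come out close to -0.9.
x = [0.9 ** n for n in range(200)]
a = levinson_durbin(autocorrelation(x, 2), 2)
e_s = short_term_error(x, a)
print(a)
```

In the double analysis of slide 6, the residual `e_s` would then be the input $y[n]$ of the long-term stage.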

  7. Double Linear Predictive Analysis: Drawbacks ⇒ Solutions
     • Short-term stage:
       $$x'[n] = -\sum_{i=1}^{N} a_i\, x[n-i] \;\Rightarrow\; e_S[n] = x[n] - x'[n]$$
       – $e_S[n]$ is an artificial signal
       – the dysperiodicities in the weighted sum $x'[n]$ are omitted
       ⇒ remove the short-term linear predictive analysis stage
     • Long-term stage:
       $$y'[n] = -\sum_{i=0}^{M} b_i\, y[n-P-i] \;\Rightarrow\; e_L[n] = y[n] - y'[n]$$
       – $e_L[n]$ is inflated to the right of unvoiced/voiced boundaries
       ⇒ proceed to bi-directional analysis

  8. Bi-directional Long-term Prediction
     • Forward long-term linear prediction:
       $$\overrightarrow{y}'[n] = -\sum_{i=0}^{M} \overrightarrow{b}_i\, y[n-P-i] \;\Rightarrow\; \overrightarrow{e}_L[n] = y[n] - \overrightarrow{y}'[n]$$
       with $y[n] = x[n]$: speech signal, $\overrightarrow{e}_L[n]$: forward long-term prediction error
     • Backward long-term linear prediction:
       $$\overleftarrow{y}'[n] = -\sum_{i=0}^{M} \overleftarrow{b}_i\, y[n+P+i] \;\Rightarrow\; \overleftarrow{e}_L[n] = y[n] - \overleftarrow{y}'[n]$$
       with $\overleftarrow{e}_L[n]$: backward long-term prediction error
     • Bi-directional long-term linear prediction: keep the “best” (frame by frame)
       $$e_L[n] = \min\left(\overrightarrow{e}_L[n],\, \overleftarrow{e}_L[n]\right)$$
       with $e_L[n]$: bi-directional long-term prediction error
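The frame-by-frame selection can be sketched as follows. Several choices here are assumptions rather than statements from the slides: the coefficients $b_i$ are fitted per frame by least squares, the “best” direction is taken to be the one with the lower error energy in the frame, and a frame with no usable predictor keeps the signal itself as its error. All function names, the frame length and the synthetic signal are hypothetical.

```python
import random


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial
    pivoting. Returns None when the system is (numerically) singular."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (aug[r][size] - tail) / aug[r][r]
    return solution


def energy(values):
    return sum(v * v for v in values)


def directional_error(y, start, stop, P, M, sign):
    """Long-term prediction error on y[start:stop].

    sign = -1 predicts from past samples y[n-P-i] (forward prediction);
    sign = +1 predicts from future samples y[n+P+i] (backward prediction).
    Returns None when the needed samples lie outside y or the fit is singular.
    """
    offsets = [sign * (P + i) for i in range(M + 1)]
    if start + min(offsets) < 0 or stop - 1 + max(offsets) >= len(y):
        return None
    rows = [[y[n + d] for d in offsets] for n in range(start, stop)]
    target = y[start:stop]
    # Least squares for e[n] = y[n] + sum_i b_i * y[n + d_i]: (R^T R) b = -R^T y
    taps = range(M + 1)
    gram = [[sum(r[i] * r[j] for r in rows) for j in taps] for i in taps]
    rhs = [-sum(r[i] * t for r, t in zip(rows, target)) for i in taps]
    b = solve(gram, rhs)
    if b is None:
        return None
    return [t + sum(bi * ri for bi, ri in zip(b, r)) for r, t in zip(rows, target)]


def long_term_error(y, P, M, frame_len, signs=(-1, +1)):
    """Per frame, keep the candidate error with the lowest energy."""
    errors = []
    for start in range(0, len(y), frame_len):
        stop = min(start + frame_len, len(y))
        candidates = [directional_error(y, start, stop, P, M, s) for s in signs]
        candidates = [c for c in candidates if c is not None]
        # Assumption: with no usable predictor, the frame itself is the error.
        best = min(candidates, key=energy) if candidates else y[start:stop]
        errors.extend(best)
    return errors


# Synthetic signal: weak noise ("unvoiced"), then a sawtooth with a period of
# 50 samples ("voiced"). The onset lies at n = 200.
rng = random.Random(0)
y = [0.01 * rng.uniform(-1.0, 1.0) for _ in range(200)]
y += [(n % 50) / 50.0 for n in range(200, 600)]
forward_only = long_term_error(y, P=50, M=2, frame_len=50, signs=(-1,))
bidirectional = long_term_error(y, P=50, M=2, frame_len=50)
print(energy(forward_only[200:250]), energy(bidirectional[200:250]))
```

In the first voiced frame the forward predictor can only draw on the preceding noise, so its error is inflated, whereas the backward predictor draws on the following cycle. This mirrors the drawback at unvoiced/voiced boundaries that motivates the bi-directional analysis.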

  9. Long-term Prediction Distance : P
     • P is given by the position of the maximum of the auto-correlation function
     • Example: steady vowel [a] (dysphonic speaker) ⇒ P = 184 (2 cycles)
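A minimal sketch of this lag search is given below. The search range is an assumption, since the slide does not state the bounds within which the maximum is sought, and the pulse train is only a synthetic stand-in for a voiced fragment. The function names are hypothetical.

```python
def autocorrelation_at(x, lag):
    """Biased autocorrelation of x at a single non-negative lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def prediction_distance(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] at which the autocorrelation is largest."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation_at(x, lag))


# Synthetic pulse train with one pulse every 92 samples.
x = [1.0 if n % 92 == 0 else 0.0 for n in range(1000)]
P = prediction_distance(x, 50, 130)
print(P)
```

As the slide's example shows, for real (dysphonic) speech the maximum may fall at a multiple of the cycle length rather than at a single cycle.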

  10. Vocal Noise Cue
      • Signal-to-Dysperiodicity Ratio:
        $$\mathrm{SDR} = 10 \log_{10} \frac{\sum_{n=1}^{N} x^2[n]}{\sum_{n=1}^{N} e_L^2[n]}\ \mathrm{dB}$$
        with $x[n]$: speech signal, $e_L[n]$: bi-directional long-term prediction error
      • Example: steady vowel [a]
        – healthy speaker: SDR = 31.2 dB
        – dysphonic speaker: SDR = 10.1 dB
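The ratio can be computed directly from the two sequences. The sketch below assumes that the signal and the prediction error cover the same samples; the function name and the handling of zero energy are hypothetical choices.

```python
import math


def sdr_db(x, e):
    """Signal-to-dysperiodicity ratio in dB: 10 * log10(sum x^2 / sum e^2)."""
    if len(x) != len(e):
        raise ValueError("signal and prediction error must have equal length")
    signal_energy = sum(v * v for v in x)
    error_energy = sum(v * v for v in e)
    if signal_energy == 0.0:
        raise ValueError("signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfectly predictable signal
    return 10.0 * math.log10(signal_energy / error_energy)


# An error with one tenth of the signal amplitude gives a ratio close to 20 dB.
print(sdr_db([1.0] * 100, [0.1] * 100))
```

A higher value means that less of the signal energy is left in the prediction error, which is why the healthy speaker in the example scores higher than the dysphonic speaker.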

  11. Results 1 : Sentence (1 female speaker; modal phonation type) ( http://www.limsi.fr/VOQUAL/ : “Il est sorti avant le jour”)
      Figure, segment [il]: speech signal; bi-directional long-term prediction error; forward long-term prediction error

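  12. Results 2 : Sentence (1 female speaker; 5 phonation types) ( http://www.limsi.fr/VOQUAL/ : “Il est sorti avant le jour”)

      | Direction      | Signal  | Double prediction SDR (dB) | Long-term prediction SDR (dB) |
      |----------------|---------|----------------------------|-------------------------------|
      | Bi-directional | Modal   | 25.7                       | 19.5                          |
      | Bi-directional | Rough1  | 16.9                       | 11.4                          |
      | Bi-directional | Rough2  | 13.9                       | 8.0                           |
      | Bi-directional | Rough3  | 9.8                        | 3.6                           |
      | Bi-directional | Whisper | 9.5                        | 3.2                           |
      | Forward        | Modal   | 25.4                       | 16.2                          |
      | Forward        | Rough1  | 16.8                       | 10.3                          |
      | Forward        | Rough2  | 13.7                       | 6.9                           |
      | Forward        | Rough3  | 9.6                        | 2.7                           |
      | Forward        | Whisper | 9.3                        | 1.8                           |

      • SDR bi-directional > SDR forward
      • SDR double > SDR long-term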

  13. Conclusion
      • The forward & backward long-term prediction of speech enables the analysis of any speech signal with a view to assessing vocal noise (i.e. vocal dysperiodicities).
      • The analysis is not based on any assumptions regarding the periodicity or stationarity of the speech signals.
