Assessment of Vocal Noise via Bi-directional Long-term Linear Prediction of Running Speech
F. Bettens*, F. Grenez*, J. Schoentgen*,**
* Université Libre de Bruxelles
** National Fund for Scientific Research, Belgium
Cause                                Vocal Dysperiodicities
Vocal fold dynamics                  Diplophonia, bi-phonation, random vibrations
Perturbations                        Vocal jitter & shimmer, frequency & amplitude tremor (audible)
Additive noise owing to turbulence   Breathiness, breathy voice, whispery voice, …
"Parasitic" vibrations               Vibrations of ventricular folds or ary-epiglottic ligaments, …
Transients                           Pitch breaks, phonation breaks, timbre breaks, …
Existing Cues of Vocal Noise
• Detection of individual vocal cycles (or harmonics)
  – Steady vowel fragments
  – (Pseudo)-periodicity
  – Period Perturbation Quotient
  – Amplitude Perturbation Quotient
  – Harmonics-to-Noise Ratio
Objectives: Analyses of Dysperiodicities
• Drop the requirement that speech fragments be:
  – (Pseudo)-periodic
  – Steady
• Analyze any speech fragment:
  – Modal voices & (very) hoarse voices
  – Sustained vowels & running speech
Motivation: Analysis of Running Speech
• Voicing in running speech
  – Variable acoustic impedance
  – Voicing onsets & offsets
  – Variable pressure drops
  – Variable laryngeal positions
• Voice loading
Double Linear Predictive Analysis
• Conventional short-term linear prediction:
$$x'[n] = -\sum_{i=1}^{N} a_i\, x[n-i] \;\Rightarrow\; e_S[n] = x[n] - x'[n] \quad \text{(forward short-term prediction error)}$$
with x[n]: speech signal
• Long-term linear prediction:
$$y'[n] = -\sum_{i=0}^{M} b_i\, y[n-P-i] \;\Rightarrow\; e_L[n] = y[n] - y'[n] \quad \text{(forward double prediction error)}$$
with y[n] = e_S[n]
Remove existing correlations ⇒ unpredictable noise component (Qi, 1999)
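The short-term stage can be illustrated with a minimal sketch. The slides do not say how the coefficients a_i are estimated, so the autocorrelation method with the Levinson-Durbin recursion is an assumption here, and the names `autocorr`, `levinson` and `short_term_residual` are hypothetical. The sign convention follows the slide: x'[n] = -Σ a_i x[n-i] and e_S[n] = x[n] - x'[n].

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of x (no window, no normalization)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion (assumed estimator, not taken from the slides).

    Returns [1, a_1, ..., a_N] such that x'[n] = -sum_i a_i * x[n - i].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable signal: stop early
            break
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient of stage m
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a


def short_term_residual(x, order):
    """Return (a, e_S) with e_S[n] = x[n] - x'[n]; samples before n = 0 count as 0."""
    a = levinson(autocorr(x, order), order)
    e = []
    for n in range(len(x)):
        pred = -sum(a[i] * x[n - i] for i in range(1, order + 1) if n - i >= 0)
        e.append(x[n] - pred)
    return a, e
```

For a decaying exponential such as x[n] = 0.9^n, a second-order analysis should give a_1 close to -0.9 and a_2 close to 0, with a residual that is essentially a single pulse at n = 0.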
Double Linear Predictive Analysis
Drawbacks and solutions:
– e_S[n] is an artificial signal
– The dysperiodicities in the weighted sum x'[n] are omitted:
$$x'[n] = -\sum_{i=1}^{N} a_i\, x[n-i] \;\Rightarrow\; e_S[n] = x[n] - x'[n]$$
  ⇒ Solution: remove the short-term linear predictive analysis stage
– e_L[n] is inflated to the right of unvoiced/voiced boundaries:
$$y'[n] = -\sum_{i=0}^{M} b_i\, y[n-P-i] \;\Rightarrow\; e_L[n] = y[n] - y'[n]$$
  ⇒ Solution: proceed to bi-directional analysis
Bi-directional Long-term Prediction
• Forward long-term linear prediction:
$$\overrightarrow{y}'[n] = -\sum_{i=0}^{M} \overrightarrow{b}_i\, y[n-P-i] \;\Rightarrow\; \overrightarrow{e}_L[n] = y[n] - \overrightarrow{y}'[n] \quad \text{(forward long-term prediction error)}$$
with y[n] = x[n]: speech signal
• Backward long-term linear prediction:
$$\overleftarrow{y}'[n] = -\sum_{i=0}^{M} \overleftarrow{b}_i\, y[n+P+i] \;\Rightarrow\; \overleftarrow{e}_L[n] = y[n] - \overleftarrow{y}'[n] \quad \text{(backward long-term prediction error)}$$
• Bi-directional long-term linear prediction: keep the "best" error, frame by frame:
$$e_L[n] = \min\left(\overrightarrow{e}_L[n],\, \overleftarrow{e}_L[n]\right) \quad \text{(bi-directional long-term prediction error)}$$
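A simplified sketch of the frame-by-frame selection follows. It makes several assumptions that are not in the slides: a single prediction tap (M = 0), a per-frame least-squares estimate of that tap, and "best" read as "lower error energy within the frame". The names `one_tap_error`, `energy` and `bidirectional_error` are hypothetical.

```python
def energy(e):
    """Sum of squared samples."""
    return sum(v * v for v in e)


def one_tap_error(y, start, stop, shift):
    """Single-tap long-term prediction error for the frame y[start:stop].

    shift = -P predicts from the past (forward prediction),
    shift = +P predicts from the future (backward prediction).
    Returns None when the reference samples fall outside the signal.
    """
    if start + shift < 0 or stop + shift > len(y):
        return None
    cur = y[start:stop]
    ref = y[start + shift:stop + shift]
    den = energy(ref)
    # Least-squares tap for y'[n] = -b * y[n + shift]; b = 0 if the reference is silent.
    b = -sum(c * r for c, r in zip(cur, ref)) / den if den > 0.0 else 0.0
    # e[n] = y[n] - y'[n] = y[n] + b * y[n + shift]
    return [c + b * r for c, r in zip(cur, ref)]


def bidirectional_error(y, P, frame_len):
    """Per frame, keep the forward or backward error with the lower energy."""
    out = []
    for start in range(0, len(y), frame_len):
        stop = min(start + frame_len, len(y))
        candidates = [e for e in (one_tap_error(y, start, stop, -P),
                                  one_tap_error(y, start, stop, +P))
                      if e is not None]
        if candidates:
            out.extend(min(candidates, key=energy))
        else:
            out.extend(y[start:stop])  # no reference on either side
    return out
```

On a signal made of silence followed by a strictly periodic segment, the forward error of the first voiced frame equals the signal itself (its reference is silent), whereas the backward error vanishes. This is the inflation to the right of unvoiced/voiced boundaries that the bi-directional analysis is meant to avoid.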
Long-term Prediction Distance: P
P is set at the maximum of the auto-correlation function.
Example: steady vowel [a] (dysphonic speaker) ⇒ P = 184 (2 cycles)
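The choice of P can be sketched as a search for the lag that maximizes the autocorrelation. The search range (`min_lag`, `max_lag`) and the function name `long_term_distance` are assumptions; the slides only state that P is the maximum of the auto-correlation function.

```python
def long_term_distance(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] that maximizes the autocorrelation of x."""
    def r(k):
        # Autocorrelation at lag k, without windowing or normalization.
        return sum(x[n] * x[n + k] for n in range(len(x) - k))

    return max(range(min_lag, max_lag + 1), key=r)
```

The maximum need not fall on a single cycle. As in the slide's example (P = 184, 2 cycles), it may land on a multiple of the cycle length, depending on the signal and on the search range.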
Vocal Noise Cue
Signal-to-Dysperiodicity Ratio:
$$\mathrm{SDR} = 10 \log_{10}\left( \frac{\sum_{n=1}^{N_P} x^2[n]}{\sum_{n=1}^{N_P} e_L^2[n]} - 1 \right) \ \mathrm{dB}$$
Example: steady vowel [a]
[Figure: speech signal x[n] and bi-directional long-term prediction error e_L[n] for a healthy speaker and a dysphonic speaker]
Healthy speaker: SDR = 31.2 dB
Dysphonic speaker: SDR = 10.1 dB
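A minimal sketch of the ratio, reading the slide's formula as SDR = 10 log10(Σ x²[n] / Σ e_L²[n] - 1) dB. That reading, including the "- 1" term, is an assumption, as is the function name `sdr_db`.

```python
import math


def sdr_db(x, e_l):
    """Signal-to-dysperiodicity ratio in dB, under the reading stated above."""
    signal_energy = sum(v * v for v in x)
    error_energy = sum(v * v for v in e_l)
    if error_energy == 0.0:
        raise ValueError("prediction error has zero energy; SDR is unbounded")
    arg = signal_energy / error_energy - 1.0
    if arg <= 0.0:
        raise ValueError("error energy is not below signal energy; SDR undefined")
    return 10.0 * math.log10(arg)
```

The two guards cover the cases in which the logarithm is undefined: a perfectly predicted frame, and an error at least as energetic as the signal itself.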
Results 1: Sentence (1 female speaker; modal phonation type)
(http://www.limsi.fr/VOQUAL/: "Il est sorti avant le jour", i.e. "He went out before daybreak")
[Figure: segment [il]; panels: speech signal, bi-directional long-term prediction error, forward long-term prediction error]
Results 2: Sentence (1 female speaker; 5 phonation types)
(http://www.limsi.fr/VOQUAL/: "Il est sorti avant le jour", i.e. "He went out before daybreak")

Direction        Signal    Double prediction   Long-term prediction
                           SDR (dB)            SDR (dB)
Bi-directional   Modal     25.7                19.5
                 Rough1    16.9                11.4
                 Rough2    13.9                 8.0
                 Rough3     9.8                 3.6
                 Whisper    9.5                 3.2
Forward          Modal     25.4                16.2
                 Rough1    16.8                10.3
                 Rough2    13.7                 6.9
                 Rough3     9.6                 2.7
                 Whisper    9.3                 1.8

SDR bi-directional > SDR forward
SDR double > SDR long-term
Conclusion
• Forward and backward long-term prediction of speech enables the analysis of any speech signal to assess vocal noise (i.e. vocal dysperiodicities).
• The analysis makes no assumptions about the periodicity or stationarity of the speech signal.