End-to-End Probabilistic Inference for Nonstationary Audio Analysis
(or how to apply Spectral Mixture GPs to audio)

William Wilkinson, Michael Riis Andersen, Josh Reiss, Dan Stowell, Arno Solin
June 12, 2019
Queen Mary University of London / Aalto University / Technical University of Denmark
Probabilistic time-frequency analysis

We previously showed that a spectral mixture Gaussian process is equivalent to a probabilistic filter bank, i.e. a filter bank that adapts to the signal and can make predictions / generate new data.

[Figure: filter response (dB) vs. frequency (Hz), comparing a standard filter bank with the probabilistic / adaptive filter bank]

\[
\text{[Prior]} \qquad f(t) \sim \mathcal{GP}\Big(0,\ \sum_{d=1}^{D} \sigma_d^2 \exp(-|t - t'|/\ell_d)\,\cos(\omega_d (t - t'))\Big),
\]
\[
\text{[Likelihood]} \qquad y_k = f(t_k) + \sigma_y\,\varepsilon_k.
\]
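As a concrete illustration (not part of the slides), here is a minimal NumPy sketch of the exponentially-damped-cosine covariance appearing in the prior above, and of a draw from the resulting spectral mixture GP. The function names, the two example components, and the 16 kHz / 50 ms settings are illustrative assumptions.

```python
import numpy as np

def exp_cos_kernel(t1, t2, sigma2, lengthscale, omega):
    """One spectral-mixture component:
    k(t, t') = sigma2 * exp(-|t - t'| / lengthscale) * cos(omega * (t - t'))."""
    tau = t1[:, None] - t2[None, :]            # pairwise time differences
    return sigma2 * np.exp(-np.abs(tau) / lengthscale) * np.cos(omega * tau)

def spectral_mixture_kernel(t1, t2, sigma2s, lengthscales, omegas):
    """Sum of D damped-cosine components, i.e. the prior covariance of f(t)."""
    return sum(exp_cos_kernel(t1, t2, s2, ell, w)
               for s2, ell, w in zip(sigma2s, lengthscales, omegas))

# draw one sample from a two-component spectral-mixture prior
t = np.linspace(0.0, 0.05, 800)                # 50 ms at 16 kHz
K = spectral_mixture_kernel(t, t, [1.0, 0.5], [0.01, 0.02],
                            2 * np.pi * np.array([440.0, 880.0]))
f = np.random.multivariate_normal(np.zeros_like(t), K + 1e-8 * np.eye(len(t)))
```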
End-to-End probabilistic time-frequency analysis

The next step in the signal processing chain is often to analyse the dependencies in the spectrogram, with e.g. non-negative matrix factorisation (NMF).
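For context, a minimal sketch of standard multiplicative-update NMF applied to a magnitude spectrogram, S ≈ W H. This is the classical Lee–Seung Euclidean update, not the probabilistic formulation developed in this talk; the function name, initialisation, and iteration count are arbitrary choices.

```python
import numpy as np

def nmf(S, n_components, n_iters=200, eps=1e-9):
    """Multiplicative-update NMF: S (D x T, non-negative) ~= W (D x K) @ H (K x T)."""
    D, T = S.shape
    rng = np.random.default_rng(0)
    W = rng.random((D, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    for _ in range(n_iters):
        H *= (W.T @ S) / (W.T @ W @ H + eps)   # update temporal activations
        W *= (S @ H.T) / (W @ H @ H.T + eps)   # update spectral templates
    return W, H

# e.g. S = np.abs(spectrogram) ** 2; W, H = nmf(S, n_components=3)
```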
End-to-End probabilistic time-frequency analysis

[Figure: hierarchical decomposition of the audio signal y_k (time axis sampled at 16 kHz, frequency axis in Hz). The signal y_k = GP spectrogram × GP carrier (the subbands f_d(t)), and the GP spectrogram is in turn factorised as NMF weights (W) × positive modulator GPs (g_n(t)).]
The model

GP prior:
\[
f_d(t) \sim \mathcal{GP}\big(0,\ \sigma_d^2 \exp(-|t - t'|/\ell_d)\,\cos(\omega_d (t - t'))\big), \quad d = 1, 2, \ldots, D,
\]
\[
g_n(t) \sim \mathcal{GP}\big(0,\ \kappa_g^{(n)}(t, t')\big), \quad n = 1, 2, \ldots, N.
\]

Likelihood model:
\[
y_k = \sum_d a_d(t_k)\, f_d(t_k) + \sigma_y\,\varepsilon_k,
\]
where the squared amplitudes (the magnitude spectrogram) are
\[
a_d^2(t_k) = \sum_n W_{d,n}\,\mathrm{softplus}\big(g_n(t_k)\big).
\]

This is a nonstationary spectral mixture GP.
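The generative structure can be made concrete with a short sampling sketch. It assumes a squared-exponential covariance for the modulators g_n(t) (the slides only write κ_g^{(n)}); the subband frequencies, lengthscales, and the choices D = 3, N = 2 are illustrative.

```python
import numpy as np

def exp_cos_k(t, s2, ell, omega):
    tau = t[:, None] - t[None, :]
    return s2 * np.exp(-np.abs(tau) / ell) * np.cos(omega * tau)

def rbf_k(t, s2, ell):
    tau = t[:, None] - t[None, :]
    return s2 * np.exp(-0.5 * (tau / ell) ** 2)

def softplus(x):
    return np.log1p(np.exp(x))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.05, 800)                        # 50 ms at 16 kHz
jitter = 1e-8 * np.eye(len(t))

# D fast subband carriers f_d(t) and N slow positive modulators g_n(t)
omegas = 2 * np.pi * np.array([200.0, 400.0, 800.0])   # subband centre frequencies (Hz)
F = np.stack([rng.multivariate_normal(np.zeros_like(t),
              exp_cos_k(t, 1.0, 0.01, w) + jitter) for w in omegas])   # (D, T)
G = np.stack([rng.multivariate_normal(np.zeros_like(t),
              rbf_k(t, 1.0, 0.02) + jitter) for _ in range(2)])        # (N, T)

W = rng.random((3, 2))                                 # non-negative NMF weights
a = np.sqrt(W @ softplus(G))                           # a_d^2 = sum_n W_dn softplus(g_n)
sigma_y = 0.01
y = np.sum(a * F, axis=0) + sigma_y * rng.standard_normal(len(t))
```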
Inference

We show how to write the model as a stochastic differential equation:
\[
\frac{\mathrm{d}\tilde{\mathbf{f}}(t)}{\mathrm{d}t} = \mathbf{F}\,\tilde{\mathbf{f}}(t) + \mathbf{L}\,\mathbf{w}(t), \qquad
y_k = H\big(\tilde{\mathbf{f}}(t_k)\big) + \sigma_y\,\varepsilon_k,
\]
such that inference can proceed via Kalman filtering & smoothing.

Usually the nonlinear H(·) is dealt with via linearisation (EKF), but we implement full Expectation Propagation (EP) in the Kalman smoother, and the infinite-horizon solution, which scales as \(\mathcal{O}(M^2 T)\).
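A minimal sketch of the linear building block behind this: the exp-cos kernel of a single subband has an exact two-dimensional state-space form, which can be discretised and filtered with a standard Kalman recursion. The full model stacks D + N such blocks and replaces the linear measurement update with EKF or EP, as described above; the function names and parameter values below are assumptions.

```python
import numpy as np

def exp_cos_statespace(sigma2, ell, omega, dt):
    """Discrete-time state-space form of the exp-cos kernel (2-d state per component):
    continuous dynamics F = [[-1/ell, -omega], [omega, -1/ell]], P_inf = sigma2 * I."""
    lam = 1.0 / ell
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    A = np.exp(-lam * dt) * np.array([[c, -s], [s, c]])   # expm(F * dt)
    Pinf = sigma2 * np.eye(2)
    Q = Pinf - A @ Pinf @ A.T                             # process noise over one step
    return A, Q, Pinf

def kalman_filter(y, A, Q, Pinf, H, sigma_y2):
    """Linear Kalman filter for one subband; the full model uses EKF / EP because
    the measurement function H(.) is nonlinear in the joint state."""
    m, P = np.zeros(A.shape[0]), Pinf.copy()
    means = []
    for yk in y:
        m, P = A @ m, A @ P @ A.T + Q                     # predict
        S = H @ P @ H.T + sigma_y2                        # innovation variance
        K = P @ H.T / S                                   # Kalman gain
        m = m + K * (yk - H @ m)                          # measurement update
        P = P - np.outer(K, H @ P)
        means.append(m.copy())
    return np.array(means)

fs = 16000.0
A, Q, Pinf = exp_cos_statespace(sigma2=1.0, ell=0.01, omega=2 * np.pi * 440.0, dt=1.0 / fs)
H = np.array([1.0, 0.0])
y = np.random.randn(1600)                                 # stand-in for an audio frame
m_filt = kalman_filter(y, A, Q, Pinf, H, sigma_y2=1e-2)
```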
Applications and Results

The fully probabilistic model can, without modification, be applied to:

• Missing data synthesis — [Figure: signal reconstructed across a gap over 0–40 ms, comparing EP, IHGP, and EKF inference against the true signal]; a sketch of how the state-space form handles gaps follows this list.
• Denoising — [Figure: SNR (dB) vs. corrupting noise variance for EP, IHGP, and EKF variants and a spectral subtraction (SpecSub) baseline]
• Source separation — [Figure: input audio y separated into three sources, piano notes C, E, and G, over roughly 1–6 s]
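One way to read "without modification" for the missing-data task: in the state-space form, samples that are absent simply receive no measurement update, and the filter (and later the smoother) prediction fills the gap. A hedged sketch, reusing the A, Q, Pinf, H construction from the inference sketch above; `kalman_filter_missing` is a hypothetical helper name.

```python
import numpy as np

def kalman_filter_missing(y, mask, A, Q, Pinf, H, sigma_y2):
    """Same recursion as before, but observations flagged missing are skipped:
    no special imputation machinery is needed."""
    m, P = np.zeros(A.shape[0]), Pinf.copy()
    means, variances = [], []
    for yk, observed in zip(y, mask):
        m, P = A @ m, A @ P @ A.T + Q                 # predict
        if observed:                                  # update only where data exist
            S = H @ P @ H.T + sigma_y2
            K = P @ H.T / S
            m = m + K * (yk - H @ m)
            P = P - np.outer(K, H @ P)
        means.append(H @ m)                           # predictive mean of the signal
        variances.append(H @ P @ H.T)                 # predictive variance
    return np.array(means), np.array(variances)

# example: a 10 ms gap in the middle of a 1600-sample frame
# mask = np.ones(len(y), dtype=bool); mask[700:860] = False
# mu, var = kalman_filter_missing(y, mask, A, Q, Pinf, H, sigma_y2=1e-2)
```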
Thanks for listening!
Poster: 6:30pm Weds, Pacific Ballroom #217
Contact: william.wilkinson@aalto.fi