Multi-Pitch Estimation via Semidefinite Programming
August 24, 2016
T. L. Jensen
Joint work with L. Vandenberghe, UCLA
Dept. of Electronic Systems, Aalborg University

Agenda
◮ Multi-pitch estimation.
◮ Superresolution/gridless/atomic norm using semidefinite programming.
◮ Bringing it together.
◮ Complex- and real-valued data.
◮ Simulations.

Multi-pitch estimation I
◮ Harmonic signals: fundamental $\omega_k$, first harmonic $2 \cdot \omega_k$, second harmonic $3 \cdot \omega_k$.
◮ Multi-pitch: superposition of $k = 1, \ldots, K$ harmonic signals.
◮ Applications in music, speech, vibration analysis, etc.

Multi-pitch estimation II
[Figure: periodogram as a function of the frequency $\omega$ for the example below.]
◮ $K = 2$ pitches
◮ $L = 3$ harmonics
◮ $N = 160$ samples
◮ SNR = 31 dB
◮ Multi-pitch estimation: estimate $\omega_k$, the amplitudes (and $K$) [1].
◮ The problem may be ill-posed or ill-conditioned.
[1] M. G. Christensen and A. Jakobsson. Multi-Pitch Estimation. San Rafael, CA, USA: Morgan & Claypool, 2009.

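A minimal sketch of how a periodogram like the one on this slide could be produced. The fundamentals, the noise level `sigma`, the unit amplitudes, and the helper names `real_multipitch` and `periodogram` are assumptions for illustration; the slide does not state the frequencies used in its figure.

```python
import cmath
import math
import random

def real_multipitch(omegas, L, N, sigma, seed=0):
    """Sum of K real pitches, each with L unit-amplitude harmonics, plus white Gaussian noise."""
    rng = random.Random(seed)
    return [
        sum(math.cos(l * w * n) for w in omegas for l in range(1, L + 1))
        + rng.gauss(0.0, sigma)
        for n in range(N)
    ]

def periodogram(y, n_grid=512):
    """Evaluate |sum_n y_n exp(-j w n)|^2 / N on a uniform grid of [0, pi)."""
    N = len(y)
    grid = [math.pi * i / n_grid for i in range(n_grid)]
    power = [
        abs(sum(y[n] * cmath.exp(-1j * w * n) for n in range(N))) ** 2 / N
        for w in grid
    ]
    return grid, power

omegas = [0.1580, 0.6364]  # assumed fundamentals, not taken from the figure
y = real_multipitch(omegas, L=3, N=160, sigma=0.1)  # sigma is an assumed noise level
grid, power = periodogram(y)

# Location of the strongest periodogram peak and the true harmonic frequencies.
peak = grid[max(range(len(power)), key=power.__getitem__)]
harmonics = [l * w for w in omegas for l in range(1, 4)]
print("strongest peak at", round(peak, 3))
print("harmonics at", [round(h, 3) for h in harmonics])
```

The peaks only sit on the grid, which is one motivation for the gridless formulation that follows.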
Atomic decomposition
◮ Atomic decomposition over a continuous dictionary $\mathcal{A}_n \subseteq \mathbb{C}^n$ using a regularization term:
\[
\begin{array}{ll}
\text{minimize} & f\Big(\sum_{k=1}^{r} a_k c_k^H\Big) + \sum_{k=1}^{r} \|c_k\|_2 \\
\text{subject to} & a_k \in \mathcal{A}_n, \quad k = 1, \ldots, r.
\end{array}
\tag{1}
\]
◮ Variables: atoms $a_k \in \mathbb{C}^n$, coefficients $c_k \in \mathbb{C}^m$, $k = 1, \ldots, r$, and the number of selected atoms $r$.
◮ $m = 1$: single measurement; $m > 1$: multiple measurement case. Notice the (group-)sparsity promoting term.
◮ In the current literature, often
\[
\mathcal{A}_n = \Big\{ \tfrac{s}{\sqrt{n}} \big[ 1, \exp(j\omega), \ldots, \exp(j(n-1)\omega) \big]^T \;\Big|\; |\omega - \alpha| \le \beta,\ |s| = 1,\ s \in \mathbb{C} \Big\}
\tag{2}
\]
with $\alpha = 0$ and $\beta = \pi$.

Atomic decomposition as an SDP
◮ With $\alpha = 0$ and $\beta = \pi$, and $f$ convex, the atomic decomposition is equivalent to the SDP
\[
\begin{array}{ll}
\text{minimize} & f(X_{12}) + \tfrac{1}{2} \big( \operatorname{tr} X_{11} + \operatorname{tr} X_{22} \big) \\
\text{subject to} & \begin{bmatrix} X_{11} & X_{12} \\ X_{12}^H & X_{22} \end{bmatrix} \succeq 0 \\
& X_{11} \in \mathbb{T}^{n}, \quad X_{12} \in \mathbb{C}^{n \times m}, \quad X_{22} \in \mathbb{H}^{m}
\end{array}
\tag{3}
\]
with $r = \operatorname{rank}(X_{11}^\star)$ [2].
[2] E. J. Candès and C. Fernandez-Granda. "Super-resolution from noisy data". In: J. Fourier Anal. Appl. 19.6 (2013), pp. 1229–1254; G. Tang et al. "Compressed Sensing Off the Grid". In: IEEE Trans. Information Theory 59.11 (2013), pp. 7465–7490; B. N. Bhaskar, G. Tang, and B. Recht. "Atomic Norm Denoising With Applications to Line Spectral Estimation". In: IEEE Trans. Signal Processing 61.23 (2013), pp. 5987–5999; Y. Li and Y. Chi. "Off-the-Grid Line Spectrum Denoising and Estimation With Multiple Measurement Vectors". In: IEEE Trans. Signal Processing 64.5 (2016), pp. 1257–1269.

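One direction of the equivalence can be checked by hand. The sketch below assumes a given decomposition with unit-norm atoms and nonzero coefficients, and shows that it yields a feasible point of the SDP with the same objective value; the converse direction is the part established in the cited references.

```latex
% Assumption: X_{12} = \sum_{k=1}^{r} a_k c_k^H with a_k \in \mathcal{A}_n
% (so \|a_k\|_2 = 1) and c_k \neq 0.
\[
X_{11} = \sum_{k=1}^{r} \|c_k\|_2 \, a_k a_k^H, \qquad
X_{22} = \sum_{k=1}^{r} \frac{c_k c_k^H}{\|c_k\|_2}.
\]
% Each a_k a_k^H has entries e^{j(p-q)\omega_k}/n, so X_{11} is Toeplitz.
\[
\begin{bmatrix} X_{11} & X_{12} \\ X_{12}^H & X_{22} \end{bmatrix}
= \sum_{k=1}^{r} \frac{1}{\|c_k\|_2}
\begin{bmatrix} \|c_k\|_2 \, a_k \\ c_k \end{bmatrix}
\begin{bmatrix} \|c_k\|_2 \, a_k \\ c_k \end{bmatrix}^H \succeq 0,
\]
\[
\tfrac{1}{2} \big( \operatorname{tr} X_{11} + \operatorname{tr} X_{22} \big)
= \tfrac{1}{2} \sum_{k=1}^{r} \big( \|c_k\|_2 \|a_k\|_2^2 + \|c_k\|_2 \big)
= \sum_{k=1}^{r} \|c_k\|_2 .
\]
```

So the optimal value of the SDP is at most that of the atomic decomposition problem.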
Complex-valued multi-pitch model
The complex-valued multi-pitch model can be formulated as
\[
x = \sum_{l=1}^{L} Z_K(l\omega) \bar{c}_l, \qquad y = x + w
\tag{4}
\]
with
\[ y = \big[ y_0, \ldots, y_{N-1} \big]^T \tag{5} \]
\[ \bar{c}_l = \big[ \bar{c}_{l,1}, \ldots, \bar{c}_{l,K} \big]^T \tag{6} \]
\[ \omega = \big[ \omega_1, \ldots, \omega_K \big]^T \tag{7} \]
\[ Z_K(\omega) = \big[ z(\omega_1), \ldots, z(\omega_K) \big] \tag{8} \]
\[ z(\omega_k) = \big[ 1, \exp(j\omega_k), \ldots, \exp(j(N-1)\omega_k) \big]^T \tag{9} \]
\[ w = \big[ w_0, \ldots, w_{N-1} \big]^T \sim \mathcal{CN}(0, \sigma^2 I). \tag{10} \]

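The model (4)-(10) can be sketched in a few lines of standard-library Python. The function names, the example fundamentals, and the coefficient values are assumptions for illustration only; `coeffs[l-1][k]` plays the role of the $k$th entry of $\bar{c}_l$.

```python
import cmath
import math
import random

def z(omega, N):
    """Sampled complex exponential [1, exp(j w), ..., exp(j (N-1) w)], as in (9)."""
    return [cmath.exp(1j * n * omega) for n in range(N)]

def multipitch(omegas, coeffs, N):
    """Noise-free signal x = sum_l Z_K(l * omega) c_l, as in (4)."""
    x = [0j] * N
    for l, c_l in enumerate(coeffs, start=1):
        for w, c in zip(omegas, c_l):
            z_l = z(l * w, N)
            for n in range(N):
                x[n] += c * z_l[n]
    return x

def add_noise(x, sigma, seed=0):
    """Add circular complex Gaussian noise with variance sigma^2 per sample, as in (10)."""
    rng = random.Random(seed)
    s = sigma / math.sqrt(2.0)  # real and imaginary parts each carry half the variance
    return [x_n + complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for x_n in x]

omegas = [0.1580, 0.6364]                # assumed fundamentals, K = 2
coeffs = [[1.0, 1.0j], [0.5, -0.5], [0.25j, 0.25]]  # assumed coefficients, L = 3
x = multipitch(omegas, coeffs, N=160)
y = add_noise(x, sigma=0.1)
print(len(y), abs(x[0]))
```

At $n = 0$ every exponential equals one, so `x[0]` is simply the sum of all coefficients, which gives a quick sanity check.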
Bringing it together I
◮ Relating the formulations at $n = NL$:
\[
X_{12} = \sum_{k=1}^{r} a_k c_k^H, \qquad a_k \in \mathcal{A}_{NL}.
\tag{11}
\]
◮ Define the selection matrix $P_l$ that selects $N$ elements from every $l$th element of $v$, $P_l v = \big[ v_1, v_{1+l}, \ldots, v_{1+(N-1)l} \big]^T$. Then for some $a_k \in \mathcal{A}_{NL}$
\[
z(l\omega_k) = P_l a_k,
\tag{12}
\]
and we may form the selection and add matrix
\[
P = \big[ P_1 \; P_2 \; \cdots \; P_L \big] \in \mathbb{R}^{N \times NL^2}, \qquad P_l \in \mathbb{R}^{N \times NL}.
\tag{13}
\]

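The selection in (12) is easy to check numerically. In this sketch the names `atom`, `select`, and `z` are hypothetical, and the factor $s/\sqrt{n}$ from the dictionary (2) is undone explicitly; in the formulation it is assumed to be absorbed into the coefficients.

```python
import cmath
import math

def atom(omega, n, s=1.0):
    """Atom (s / sqrt(n)) [1, exp(j w), ..., exp(j (n-1) w)] from the dictionary (2)."""
    return [s / math.sqrt(n) * cmath.exp(1j * i * omega) for i in range(n)]

def select(v, l, N):
    """P_l v: entries v_1, v_{1+l}, ..., v_{1+(N-1)l} (1-based), i.e. v[0], v[l], ... (0-based)."""
    return [v[i * l] for i in range(N)]

def z(omega, N):
    """Sampled complex exponential of length N, as in (9)."""
    return [cmath.exp(1j * i * omega) for i in range(N)]

N, L = 8, 3          # small assumed sizes
omega = 0.4          # assumed fundamental
n = N * L
a = atom(omega, n)

# For each harmonic l, striding through the length-NL atom reproduces z(l * omega).
errors = []
for l in range(1, L + 1):
    picked = [math.sqrt(n) * v for v in select(a, l, N)]
    errors.append(max(abs(p - q) for p, q in zip(picked, z(l * omega, N))))
print(errors)
```

The largest index touched is $1 + (N-1)L \le NL$, which is why atoms of length $n = NL$ suffice for $L$ harmonics.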
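Bringing it together II
◮ Let $c_k = \big[ [\bar{c}_1]_k \; \cdots \; [\bar{c}_L]_k \big]^H$.
◮ Then
\[
\begin{aligned}
\sum_{l=1}^{L} Z_K(l\omega) \bar{c}_l
&= \sum_{l=1}^{L} \sum_{k=1}^{K} z(l\omega_k) [\bar{c}_l]_k \\
&= \sum_{k=1}^{K} \sum_{l=1}^{L} P_l a_k [\bar{c}_l]_k \\
&= \sum_{k=1}^{K} P \operatorname{vec}\big( a_k c_k^H \big) \\
&= P \operatorname{vec}\Big( \sum_{k=1}^{K} a_k c_k^H \Big) \\
&= P \operatorname{vec}(X_{12})
\end{aligned}
\]
for some $a_k \in \mathcal{A}_{NL}$, $k = 1, \ldots, K$, and $K = r$.
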
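The chain of equalities can be verified numerically on a small instance. The sizes, fundamentals, and coefficient values below are assumptions, and the atom normalization $1/\sqrt{n}$ is compensated by scaling the coefficient rows by $\sqrt{n}$.

```python
import cmath
import math

N, L = 8, 3
omegas = [0.3, 0.7]                      # assumed fundamentals, K = 2
cbar = [[1.0 + 0.5j, -0.2j],             # cbar[l-1][k] = k-th entry of c_l (assumed values)
        [0.4, 0.3 + 0.1j],
        [-0.6j, 0.8]]
n = N * L

def atom(omega):
    """Normalized atom of length n = N * L."""
    return [cmath.exp(1j * i * omega) / math.sqrt(n) for i in range(n)]

# X12 = sum_k a_k c_k^H, where row c_k^H = sqrt(n) * [cbar[0][k], ..., cbar[L-1][k]].
X12 = [[0j] * L for _ in range(n)]
for k, w in enumerate(omegas):
    a = atom(w)
    for i in range(n):
        for l in range(L):
            X12[i][l] += a[i] * math.sqrt(n) * cbar[l][k]

# P vec(X12): apply the stride-l selection P_l to column l and add over l.
lhs = [sum(X12[i * l][l - 1] for l in range(1, L + 1)) for i in range(N)]

# Direct evaluation of sum_l Z_K(l * omega) c_l.
rhs = [
    sum(cbar[l - 1][k] * cmath.exp(1j * i * l * w)
        for l in range(1, L + 1) for k, w in enumerate(omegas))
    for i in range(N)
]

err = max(abs(p - q) for p, q in zip(lhs, rhs))
print("max deviation:", err)
```

The point of the identity is that the harmonic model becomes linear in $X_{12}$, so it can be imposed as a linear constraint in the SDP.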
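A complex-valued SDP formulation
◮ A complex-valued multi-pitch estimator can then be formulated via the SDP
\[
\begin{array}{ll}
\text{minimize} & \tfrac{1}{2} \big( \operatorname{tr}(X_{11}) + \operatorname{tr}(X_{22}) \big) \\
\text{subject to} & \|y - x\|_2 \le \delta \\
& x = P \operatorname{vec}(X_{12}) \\
& \begin{bmatrix} X_{11} & X_{12} \\ X_{12}^H & X_{22} \end{bmatrix} \succeq 0 \\
& X_{11} \in \mathbb{T}^{NL}, \quad X_{22} \in \mathbb{H}^{L}, \quad X_{12} \in \mathbb{C}^{NL \times L}.
\end{array}
\tag{14}
\]
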
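A real-valued SDP formulation I
◮ The real-valued model is
\[
x = \Re\Big( \sum_{l=1}^{L} Z_K(l\omega) \bar{c}_l \Big), \qquad y = x + w
\tag{15}
\]
with $w \sim \mathcal{N}(0, \sigma^2 I)$.
◮ For real-valued data $y \in \mathbb{R}^N$, an atomic norm multi-pitch SDP estimator is
\[
\begin{array}{ll}
\text{minimize} & \tfrac{1}{2} \big( \operatorname{tr}(X_{11}) + \operatorname{tr}(X_{22}) \big) \\
\text{subject to} & \|y - P \operatorname{vec}(\Re(X_{12}))\|_2 \le \delta \\
& \begin{bmatrix} X_{11} & X_{12} \\ X_{12}^H & X_{22} \end{bmatrix} \succeq 0 \\
& X_{11} \in \mathbb{T}^{NL}, \quad X_{22} \in \mathbb{H}^{L}, \quad X_{12} \in \mathbb{C}^{NL \times L}
\end{array}
\tag{16}
\]
with a solution $(X_{11}^\star, X_{22}^\star, X_{12}^\star)$.
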
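A real-valued SDP formulation II
◮ The optimal objective is
\[
\tfrac{1}{2} \big( \operatorname{tr}(X_{11}^\star) + \operatorname{tr}(X_{22}^\star) \big)
= \tfrac{1}{2} \big( \operatorname{tr}(\Re(X_{11}^\star)) + \operatorname{tr}(\Re(X_{22}^\star)) \big)
\]
and
\[
\begin{bmatrix} X_{11}^\star & X_{12}^\star \\ (X_{12}^\star)^H & X_{22}^\star \end{bmatrix} \succeq 0
\;\Rightarrow\;
\Re\left( \begin{bmatrix} X_{11}^\star & X_{12}^\star \\ (X_{12}^\star)^H & X_{22}^\star \end{bmatrix} \right) \succeq 0.
\tag{17}
\]
◮ If $X_{11}^\star$ is Toeplitz, then $\Re(X_{11}^\star)$ is also Toeplitz.
◮ So, $(\Re(X_{11}^\star), \Re(X_{22}^\star), \Re(X_{12}^\star))$ also solves the previous SDP.
◮ We can instead solve the equivalent real SDP
\[
\begin{array}{ll}
\text{minimize} & \tfrac{1}{2} \big( \operatorname{tr}(X_{11}) + \operatorname{tr}(X_{22}) \big) \\
\text{subject to} & \|y - P \operatorname{vec}(X_{12})\|_2 \le \delta \\
& \begin{bmatrix} X_{11} & X_{12} \\ X_{12}^T & X_{22} \end{bmatrix} \succeq 0 \\
& X_{11} \in \mathbb{S}^{NL} \cap \mathbb{T}^{NL}, \quad X_{22} \in \mathbb{S}^{L}, \quad X_{12} \in \mathbb{R}^{NL \times L}
\end{array}
\tag{18}
\]
with a solution that also solves the complex SDP (16).
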
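The implication (17) and the trace identity follow from standard facts about Hermitian matrices. A short sketch of the reasoning, writing $X$ for the Hermitian block matrix and $\bar{v}$ for the elementwise conjugate of a vector $v$:

```latex
% X Hermitian positive semidefinite, v an arbitrary complex vector.
\[
\Re(X) = \tfrac{1}{2} \big( X + \bar{X} \big), \qquad
v^H \bar{X} v = \overline{\bar{v}^H X \bar{v}} = \bar{v}^H X \bar{v} \ge 0,
\]
\[
v^H \Re(X) v = \tfrac{1}{2} \big( v^H X v + \bar{v}^H X \bar{v} \big) \ge 0 .
\]
% The diagonal of a Hermitian matrix is real, hence
\[
\operatorname{tr}(X) = \operatorname{tr}(\Re(X)).
\]
```

Since the data constraint in (16) involves only $\Re(X_{12})$, taking real parts preserves feasibility and the objective value, which is what allows the restriction to real variables.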
Frequency constraint
◮ If the signal $y$ is Nyquist sampled: $-\pi \le L\omega_k \le \pi$.
◮ Recall the dictionary $\mathcal{A}_n$:
\[
\mathcal{A}_n = \Big\{ \tfrac{s}{\sqrt{n}} \big[ 1, \exp(j\omega), \ldots, \exp(j(n-1)\omega) \big]^T \;\Big|\; |\omega - \alpha| \le \beta,\ |s| = 1,\ s \in \mathbb{C} \Big\}.
\tag{19}
\]
◮ The constraint controlled by the parameters $\alpha, \beta$ can be imposed by adding a semidefinite cone constraint [3]
\[
-e^{j\alpha} F X_{11} G^T - e^{-j\alpha} G X_{11} F^T + 2\cos(\beta)\, G X_{11} G^T \preceq 0
\tag{20}
\]
where $F = \big[ 0 \;\; I_{NL-1} \big]$ and $G = \big[ I_{NL-1} \;\; 0 \big]$.
◮ With the selection $\alpha = 0$, $\beta = \pi/L$, (20) is a real semidefinite cone constraint and Toeplitz.
[3] H.-H. Chao and L. Vandenberghe. "Extension of semidefinite programming methods for atomic decomposition". In: ICASSP. 2016, pp. 4757–4761.

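For the selection $\alpha = 0$ used here, it is straightforward to see why atoms inside the band satisfy (20) as a negative semidefinite constraint. The sketch assumes that $X_{11}$ admits a decomposition into atoms with nonnegative weights $p_k$, and it only shows that band-limited atoms are feasible; the converse is the subject of the cited reference.

```latex
% Assumption: X_{11} = \sum_k p_k\, a(\omega_k) a(\omega_k)^H with p_k \ge 0,
% a(\omega) = \tfrac{1}{\sqrt{n}} [1, e^{j\omega}, \ldots, e^{j(n-1)\omega}]^T, n = NL.
\[
F a(\omega) = e^{j\omega} G a(\omega), \qquad g_k = G a(\omega_k),
\]
\[
- F X_{11} G^T - G X_{11} F^T + 2\cos(\beta)\, G X_{11} G^T
= 2 \sum_k p_k \big( \cos\beta - \cos\omega_k \big)\, g_k g_k^H .
\]
% For \omega_k \in [-\pi, \pi] and \beta \in [0, \pi]:
\[
|\omega_k| \le \beta \;\Longleftrightarrow\; \cos\omega_k \ge \cos\beta ,
\]
% so every term is negative semidefinite when all \omega_k lie in the band.
```

With $\beta = \pi$ the factor $\cos\beta - \cos\omega_k$ is never positive, consistent with the unconstrained case.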
Simulations I
◮ Monte Carlo, $R = 500$ repetitions, known model order, $K = 2$, $L = 3$, real-valued data; otherwise the same setup as in [4].
◮ The proposed estimators are implemented with a CVXOPT custom solver [5] based on a non-canonical semidefinite cone representation [6], and with an alternating direction method of multipliers (ADMM) using a fixed number of $k = 350$ iterations.
◮ Selection of $\delta$: 1) solve the SDP with $\delta$ selected by averaging the smallest $1/3$ of the coefficients of the periodogram; 2) extract the frequencies $\omega^\star$, re-select the regularization parameter as the minimum of the corresponding linear least-squares problem, and re-solve the SDP.
[4] M. G. Christensen et al. "Multi-pitch estimation". In: Signal Processing 88.4 (Apr. 2008), pp. 972–983.
[5] M. S. Andersen et al. "Interior-point methods for large-scale cone programming". In: Optimization for Machine Learning. Ed. by S. Sra, S. Nowozin, and S. J. Wright. MIT Press, 2011.
[6] T. Roh and L. Vandenberghe. "Discrete transforms, semidefinite programming and sum-of-squares representations of nonnegative polynomials". In: SIAM J. Optimiz. 16 (2006), pp. 939–964.

Simulations II
◮ At least for unbiased estimators, the accuracy should be governed by the asymptotic Cramér-Rao lower bound (CRLB) for estimating a single fundamental $\hat{\omega}_k$:
\[
\operatorname{var}(\hat{\omega}_k) \ge \frac{24 \sigma^2}{N (N^2 - 1) \sum_{l=1}^{L} A_{k,l}^2 l^2}
\tag{21}
\]
where $A_{k,l} = |[\bar{c}_l]_k|$. In these simulations $A_{k,l} = 1$.
◮ The bound depends on the "enhanced SNR" [7] (for a single pitch) or pseudo SNR (PSNR) for the $k$th pitch [8]:
\[
\mathrm{PSNR}_k = 10 \log_{10} \frac{\sum_{l=1}^{L} A_{k,l}^2 l^2}{\sigma^2}.
\tag{22}
\]
[7] A. Nehorai and B. Porat. "Adaptive comb filtering for harmonic signal enhancement". In: IEEE Trans. Acoust., Speech, Signal Process. 34.5 (Oct. 1986), pp. 1124–1138.
[8] M. G. Christensen et al. "Multi-pitch estimation". In: Signal Processing 88.4 (Apr. 2008), pp. 972–983.

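The bound (21) and the PSNR (22) are simple to evaluate. The sketch below uses hypothetical helper names and takes the simulation values stated on the slides ($N = 160$, $L = 3$, $A_{k,l} = 1$) at a PSNR of 40 dB.

```python
import math

def weighted_power(amplitudes):
    """sum_l A_l^2 l^2, where amplitudes[l-1] = A_{k,l}."""
    return sum((A * l) ** 2 for l, A in enumerate(amplitudes, start=1))

def psnr_db(amplitudes, sigma2):
    """Pseudo SNR of one pitch in dB, as in (22)."""
    return 10.0 * math.log10(weighted_power(amplitudes) / sigma2)

def crlb(amplitudes, sigma2, N):
    """Asymptotic CRLB on the variance of a single fundamental estimate, as in (21)."""
    return 24.0 * sigma2 / (N * (N ** 2 - 1) * weighted_power(amplitudes))

def sigma2_for_psnr(amplitudes, target_db):
    """Noise variance that yields the requested PSNR (inverse of (22))."""
    return weighted_power(amplitudes) / 10.0 ** (target_db / 10.0)

amplitudes = [1.0, 1.0, 1.0]               # A_{k,l} = 1, L = 3
sigma2 = sigma2_for_psnr(amplitudes, 40.0)
bound = crlb(amplitudes, sigma2, N=160)
print("sigma^2:", sigma2)
print("PSNR [dB]:", psnr_db(amplitudes, sigma2))
print("sqrt(CRLB):", math.sqrt(bound))
```

Substituting (22) into (21) shows that the bound depends on the amplitudes and the noise only through the PSNR; for these values the square root of the bound is approximately $2.4 \cdot 10^{-5}$.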
Simulations III: closely spaced fundamentals
[Figure: RMSE (logarithmic axis, $10^{-5}$ to $10^{0}$) versus $\Delta$ from 0.00 to 0.05 for ANLS, OPTFILT, ORTH, SDPMP ADMM, CSDPMP ADMM, SDPMP, CSDPMP, and the CRLB.]
Figure: RMSE as a function of the fundamental frequency difference $\omega_2 - \omega_1 = \Delta$, $K = 2$, $N = 160$, $L = 3$, $\mathrm{PSNR}_1 = \mathrm{PSNR}_2 = 40$ dB.

Simulations IV: versus PSNR
[Figure: RMSE (logarithmic axis, $10^{-5}$ to $10^{0}$) versus PSNR from 0 to 40 for ANLS, OPTFILT, ORTH, SDPMP ADMM, CSDPMP ADMM, SDPMP, CSDPMP, and the CRLB.]
Figure: RMSE as a function of $\mathrm{PSNR} = \mathrm{PSNR}_1 = \mathrm{PSNR}_2$, $K = 2$, $N = 160$, $L = 3$, and $\omega_1 = 0.1580$, $\omega_2 = 0.6364$.

Summary
Multi-pitch estimation using semidefinite programming:
◮ Convex optimization (semidefinite programming (SDP)).
◮ Gridless (atomic norm/superresolution; numerically, the accuracy is determined by the underlying method).
◮ The real-valued model is "easier"/"computationally more efficient" compared to the complex-valued model.
◮ Approximately achieves the CRLB.
◮ High resolution (separating two pitches with almost the same frequency).