
Parametric Pitch Estimators for Music Signals
SMC 2012 Tutorial
Mads Græsbøll Christensen, Dept. of Architecture, Design & Media Technology, Aalborg University
Outline: Introduction · Statistical Methods · Filtering Methods · Subspace Methods · Amplitude Estimation · Discussion


  1. Given a cost function $J(\cdot)$, first evaluate candidate fundamental frequencies $\omega_k$ on a grid $\Omega$ as

$\hat{\omega}_k = \arg\min_{\omega_k \in \Omega} J(\omega_k)$. (15)

The grid $\Omega$ must be sufficiently dense or we may miss the minimum! Then use this estimate as an initial estimate $\hat{\omega}_k^{(0)}$ and refine it in the following manner:

$\hat{\omega}_k^{(i+1)} = \hat{\omega}_k^{(i)} - \alpha \nabla J(\hat{\omega}_k^{(i)})$. (16)

Find the step size $\alpha$ using so-called line search:

$\alpha = \arg\min_{\alpha} J\big(\hat{\omega}_k^{(i)} - \alpha \nabla J(\hat{\omega}_k^{(i)})\big)$, (17)

which can be done in a number of ways. Usually, only a few iterations are required.
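As a concrete illustration, the following NumPy sketch implements the two-stage search of (15)-(17); the cost function `J`, the central-difference gradient, and the backtracking line search are placeholder choices of mine, not the tutorial's specific implementation:

```python
import numpy as np

def refine_pitch(J, grid, step0=1.0, iters=5, eps=1e-6):
    """Coarse grid search (15) followed by gradient steps (16)
    with a backtracking line search for the step size (17).
    J maps a fundamental frequency (radians/sample) to a cost."""
    # Stage 1: evaluate J on the grid and keep the minimizer.
    w = grid[np.argmin([J(v) for v in grid])]
    # Stage 2: a few gradient iterations around the coarse estimate.
    for _ in range(iters):
        g = (J(w + eps) - J(w - eps)) / (2 * eps)  # numerical gradient
        alpha = step0
        while J(w - alpha * g) >= J(w) and alpha > 1e-12:
            alpha *= 0.5  # backtrack until the cost decreases
        w = w - alpha * g
    return w
```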

  2. On the Complex Signal Model
There are a number of reasons we use the complex signal model:
• Simpler math.
• Faster algorithms.
Real signals can be mapped to (almost) equivalent complex signals:
• Using the Hilbert transform to calculate the discrete-time analytic signal.
• The equivalence does not hold for very low and very high frequencies (relative to $N$).
• It is, in most cases, possible to account for real signals in estimators, but it is often not worth the trouble.
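A minimal sketch of this mapping using SciPy's `hilbert`, which returns the discrete-time analytic signal directly (the test signal is an invented example):

```python
import numpy as np
from scipy.signal import hilbert

# A real harmonic signal with two partials.
n = np.arange(200)
x = np.cos(0.3 * n) + 0.5 * np.cos(0.6 * n + 1.0)

# Its discrete-time analytic signal: complex-valued, with the
# negative-frequency content (approximately) removed.
xa = hilbert(x)
```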

  3. Issues
The following issues occur when decomposing speech and audio signals using the signal model:
• Non-stationarity.
• Noise characteristics.
• Overlapping harmonics.
• Order estimation/model selection.
• Inharmonicity.
The one thing we want to avoid is multidimensional nonlinear optimization!

  4. On Modified Signal Models
A myriad of modified signal models exist, including:
• AM-FM models allowing for various kinds of modulation.
• Polynomial phase and amplitude models.
• Other parametric modulation models.
• Inharmonicity models.
• Perturbed (uncertainty) models.
These are sometimes easy to incorporate in estimators, sometimes difficult, depending on the type of estimator. Prior knowledge (like amplitude smoothness, small perturbations, or the distribution of $\omega_k$) can be incorporated using priors.

  5. Parameter Estimation Bounds
An estimator of the parameter vector $\boldsymbol{\theta} \in \mathbb{R}^P$ is said to be unbiased if the expected value of each estimate $\hat{\theta}_i$ is the true parameter, i.e.,

$E\{\hat{\theta}_i\} = \theta_i \quad \forall \theta_i$, (18)

and the difference $\theta_i - E\{\hat{\theta}_i\}$, if any, is referred to as the bias. The Cramér-Rao lower bound (CRLB) of the parameter is given (under so-called regularity conditions) by

$\mathrm{var}(\hat{\theta}_i) \geq \big[\mathbf{I}(\boldsymbol{\theta})^{-1}\big]_{ii}$, (19)

with

$[\mathbf{I}(\boldsymbol{\theta})]_{il} = -E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_l}\right\}$, (20)

where $\ln p(\mathbf{x};\boldsymbol{\theta})$ is the log-likelihood function of the observed signal $\mathbf{x} \in \mathbb{C}^N$.

  6. The following asymptotic bounds can be established for the pitch estimation problem in white Gaussian noise:

$\mathrm{var}(\hat{\omega}_k) \geq \frac{6\sigma^2}{N(N^2-1)\sum_{l=1}^{L_k} A_{k,l}^2 l^2}$ (21)

$\mathrm{var}(\hat{A}_{k,l}) \geq \frac{\sigma^2}{2N}$ (22)

$\mathrm{var}(\hat{\phi}_{k,l}) \geq \frac{\sigma^2}{2N}\left(\frac{1}{A_{k,l}^2} + \frac{3 l^2 (N-1)^2}{(N^2-1)\sum_{m=1}^{L_k} A_{k,m}^2 m^2}\right)$. (23)

These depend on the following quantity:

$\mathrm{PSNR}_k = 10\log_{10}\frac{\sum_{l=1}^{L_k} A_{k,l}^2 l^2}{\sigma^2}\ [\mathrm{dB}]$. (24)

For colored noise, the squared amplitudes should be weighted by the reciprocal of the noise PSD.
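These bounds are simple to evaluate numerically; a small sketch (the amplitude vector and noise variance are example inputs):

```python
import numpy as np

def pitch_crlb(A, sigma2, N):
    """Asymptotic CRLB (21) on the fundamental frequency for
    harmonic amplitudes A[l-1], l = 1..L, white-noise variance
    sigma2, and N samples."""
    l = np.arange(1, len(A) + 1)
    return 6 * sigma2 / (N * (N**2 - 1) * np.sum(np.asarray(A)**2 * l**2))

def psnr_db(A, sigma2):
    """Pseudo signal-to-noise ratio (24) in dB."""
    l = np.arange(1, len(A) + 1)
    return 10 * np.log10(np.sum(np.asarray(A)**2 * l**2) / sigma2)
```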

  7. Such bounds are useful for a number of reasons:
• An estimator attaining the bound is optimal.
• The bounds tell us how performance can be expected to depend on various quantities.
• The bounds can be used as benchmarks in simulations.
• They provide us with "rules of thumb" (e.g., include as many harmonics as possible; less noise should result in better performance, and the same goes for more samples).
Caveat emptor: the CRLB does not accurately predict the performance of nonlinear estimators under adverse conditions (thresholding behavior). It is also possible to calculate exact CRLBs, where no asymptotic approximations are used; these predict more complicated phenomena. An estimator attaining the bound is said to be efficient. A more fundamental property is consistency.

  8. Evaluation of Estimators
Basically, two questions need to be answered: 1) how does an estimator perform given that the model is true? and 2) is the model true?
• Monte Carlo: repeated experiments with parameters and/or noise being randomized in each run.
• Synthetic signals: make it possible to measure the performance of estimators.
• MIDI signals: same as above, but may still ultimately be model-based.
• Audio databases: real signals allow us to answer the second question.
But how do we measure performance? Speech/audio databases are okay, but they contain subjective aspects: the pitch may not be well-defined in a particular segment, or we may be trying to solve an ill-posed problem. It is of course possible to violate model assumptions with synthetic signals (whereby robustness is revealed).

  9. Basically, we check whether the estimator is efficient or at least consistent. A good measure is the root mean square estimation error (RMSE):

$\mathrm{RMSE} = \sqrt{\frac{1}{SK}\sum_{k=1}^{K}\sum_{s=1}^{S}\big(\hat{\omega}_k^{(s)} - \omega_k\big)^2}$, (25)

where $\omega_k$ and $\hat{\omega}_k^{(s)}$ are the true fundamental frequency and the estimate for source $k$ in Monte Carlo iteration $s$. The RMSE can be used to bound the probability of errors using the Chebyshev inequality. Some annotated speech and audio databases: Keele Pitch Database, RWC Music Database, MAPS database, Iowa Musical Instrument Samples. Relevant MIREX tasks: audio melody extraction, chord detection, multi-pitch estimation & tracking, score following (training/tweaking sets available).
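A sketch of such a Monte Carlo evaluation for a single pitch; the three-harmonic test signal, noise level, and run count are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_rmse(estimator, omega_true, S=100, N=200, sigma=0.1):
    """RMSE (25) of `estimator` (a function mapping a signal to a
    fundamental-frequency estimate) over S randomized runs."""
    n = np.arange(N)
    err2 = 0.0
    for _ in range(S):
        phases = rng.uniform(0, 2 * np.pi, 3)  # random phases per run
        x = sum(np.exp(1j * (omega_true * l * n + phases[l - 1]))
                for l in (1, 2, 3))
        x = x + sigma * (rng.standard_normal(N)
                         + 1j * rng.standard_normal(N)) / np.sqrt(2)
        err2 += (estimator(x) - omega_true) ** 2
    return np.sqrt(err2 / S)
```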

  10. Statistical Methods
Statistical methods are based on statistical models of the observed signal, with the observation pdf being characterized by a number of parameters. Maximum likelihood (ML) estimation is perhaps the most commonly used of all types of estimators. It is often based on a deterministic-plus-stochastic signal model, where the parameters of interest are considered deterministic but unknown and the observation noise is the stochastic part. ML is statistically efficient for a sufficiently high number of samples but can be computationally demanding for nonlinear problems like ours. Approximate methods can often be derived by making explicit assumptions.

  11. Maximum Likelihood Method
For multivariate Gaussian signals, the likelihood function of the observed signal sub-vector $\mathbf{x}_k(n)$ can be written as

$p(\mathbf{x}_k(n);\boldsymbol{\theta}_k) = \frac{1}{\pi^M \det(\mathbf{Q}_k)}\, e^{-\mathbf{e}_k^H(n)\mathbf{Q}_k^{-1}\mathbf{e}_k(n)}$. (26)

Assuming that the deterministic part is stationary and the noise is i.i.d., the likelihood of $\{\mathbf{x}_k(n)\}_{n=0}^{G-1}$ can be written as

$p(\{\mathbf{x}_k(n)\};\boldsymbol{\theta}_k) = \prod_{n=0}^{G-1} p(\mathbf{x}_k(n);\boldsymbol{\theta}_k) = \frac{1}{\pi^{MG}\det(\mathbf{Q}_k)^G}\, e^{-\sum_{n=0}^{G-1}\mathbf{e}_k^H(n)\mathbf{Q}_k^{-1}\mathbf{e}_k(n)}$.

The log-likelihood function is $\mathcal{L}(\boldsymbol{\theta}_k) = \ln p(\{\mathbf{x}_k(n)\};\boldsymbol{\theta}_k)$, and the maximum likelihood estimator is

$\hat{\boldsymbol{\theta}}_k = \arg\max_{\boldsymbol{\theta}_k} \mathcal{L}(\boldsymbol{\theta}_k)$. (27)

  12. For white Gaussian noise, i.e., $\mathbf{Q}_k = \sigma_k^2\mathbf{I}$, and setting $M = N$, the log-likelihood function is

$\mathcal{L}(\boldsymbol{\theta}_k) = -N\ln\pi - N\ln\sigma_k^2 - \frac{1}{\sigma_k^2}\|\mathbf{e}_k\|_2^2$, (28)

with $\mathbf{e}_k = \mathbf{x}_k - \mathbf{Z}_k\mathbf{a}_k$. Given $\omega_k$ and $L_k$, we can substitute the amplitudes by their least-squares estimates, and the ML noise variance estimate is then

$\hat{\sigma}_k^2 = \frac{1}{N}\big\|\mathbf{x}_k - \mathbf{Z}_k(\mathbf{Z}_k^H\mathbf{Z}_k)^{-1}\mathbf{Z}_k^H\mathbf{x}_k\big\|_2^2$. (29)

This leads to the following estimator:

$\hat{\omega}_k = \arg\max_{\omega_k}\mathcal{L}(\omega_k) = \arg\max_{\omega_k}\mathbf{x}_k^H\mathbf{Z}_k(\mathbf{Z}_k^H\mathbf{Z}_k)^{-1}\mathbf{Z}_k^H\mathbf{x}_k$. (30)

The above is a nonlinear function of $\omega_k$, and the method is termed the nonlinear least-squares (NLS) method.

  13. Since $\lim_{M\to\infty}\frac{1}{M}\mathbf{Z}_k^H\mathbf{Z}_k = \mathbf{I}$, the projection matrix $\boldsymbol{\Pi}_{\mathbf{Z}_k} = \mathbf{Z}_k(\mathbf{Z}_k^H\mathbf{Z}_k)^{-1}\mathbf{Z}_k^H$ can be approximated as

$\boldsymbol{\Pi}_{\mathbf{Z}_k} \approx \frac{1}{M}\mathbf{Z}_k\mathbf{Z}_k^H$. (31)

Using this, the noise variance estimate can be simplified, i.e.,

$\hat{\sigma}_k^2 \approx \frac{1}{N}\Big\|\mathbf{x}_k - \frac{1}{N}\mathbf{Z}_k\mathbf{Z}_k^H\mathbf{x}_k\Big\|_2^2$. (32)

Writing out the log-likelihood function, we get

$\hat{\omega}_k = \arg\max_{\omega_k}\mathcal{L}(\omega_k) = \arg\max_{\omega_k}\big(-N\ln\pi - N\ln\hat{\sigma}_k^2 - N\big)$ (33)

$= \arg\max_{\omega_k}\mathbf{x}_k^H\mathbf{Z}_k\mathbf{Z}_k^H\mathbf{x}_k = \arg\max_{\omega_k}\|\mathbf{Z}_k^H\mathbf{x}_k\|_2^2$, (34)

where $\|\mathbf{Z}_k^H\mathbf{x}_k\|_2^2 = \sum_{l=1}^{L_k}\big|\sum_{n=0}^{N-1} x_k(n)\, e^{-j\omega_k l n}\big|^2 = \sum_{l=1}^{L_k}|X_k(\omega_k l)|^2$, i.e., this can be computed using an FFT (harmonic summation)! This is known as the approximate NLS (ANLS) method. Note that these estimators are known to be robust to colored noise.
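A compact sketch of ANLS harmonic summation (34) via a zero-padded FFT; the grid, FFT size, and test signal are example choices:

```python
import numpy as np

def anls_pitch(x, L, grid, nfft=8192):
    """Harmonic summation (34): pick the candidate fundamental
    whose first L harmonic bins carry the most power. `x` should
    be complex (analytic); `grid` is in radians/sample."""
    X2 = np.abs(np.fft.fft(x, nfft)) ** 2
    def cost(w):
        bins = np.round(w * np.arange(1, L + 1) * nfft / (2 * np.pi))
        return X2[bins.astype(int) % nfft].sum()
    return max(grid, key=cost)

# Example on a synthetic five-harmonic signal with omega_k = 0.3142:
n = np.arange(400)
x = sum(np.exp(1j * 0.3142 * l * n) for l in range(1, 6))
x = x + 0.1 * (np.random.randn(400) + 1j * np.random.randn(400))
w_hat = anls_pitch(x, L=5, grid=np.linspace(0.05, 0.7, 2000))
```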

  14. Figure: Log-likelihood as a function of the number of harmonics for a synthetic periodic signal (with $L_k = 5$).

  15. Figure: Cost function as a function of the fundamental frequency [radians] for a synthetic signal with $\omega_k = 0.3142$, shown for model orders $L_k = 5$ and $L_k = 8$.

  16. Figure: Speech signals (amplitude versus time [ms]) with pitches of 165 and 205 Hz.

  17. Figure: Approximate maximum likelihood cost function for the two speech signals (left) and their mixture (right), as a function of the fundamental frequency [Hz].

  18. Inharmonicity
To incorporate the inharmonicity model, we only have to replace the frequencies $\omega_k l$ by $\psi_{k,l} = \omega_k l\sqrt{1 + B_k l^2}$, i.e., the estimator becomes

$(\hat{\omega}_k, \hat{B}_k) = \arg\max_{\omega_k, B_k}\sum_{l=1}^{L_k}|X_k(\psi_{k,l})|^2$ (35)

$= \arg\max_{\omega_k, B_k}\sum_{l=1}^{L_k}\big|X_k\big(\omega_k l\sqrt{1 + B_k l^2}\big)\big|^2$, (36)

which means that we, in principle, have to sweep over combinations of the two nonlinear parameters to obtain the estimates. Similarly, in the exact method, $\mathbf{Z}_k$ would then be a function of both $\omega_k$ and $B_k$.
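A sketch of that two-dimensional sweep, reusing the FFT-based cost; the grids are example choices, and a real implementation would refine around the coarse maximum:

```python
import numpy as np

def anls_inharmonic(x, L, w_grid, B_grid, nfft=8192):
    """Sweep over (omega, B) combinations, evaluating the harmonic
    summation cost (36) at psi_l = omega * l * sqrt(1 + B * l**2)."""
    X2 = np.abs(np.fft.fft(x, nfft)) ** 2
    l = np.arange(1, L + 1)
    best, best_cost = (None, None), -np.inf
    for w in w_grid:
        for B in B_grid:
            psi = w * l * np.sqrt(1 + B * l**2)
            bins = np.round(psi * nfft / (2 * np.pi)).astype(int) % nfft
            c = X2[bins].sum()
            if c > best_cost:
                best, best_cost = (w, B), c
    return best  # (omega_hat, B_hat)
```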

  19. Model and Order Selection
For the parametric methods to work, the model order $L_k$ must be known! To determine the model order (or choose between different models), one can use a number of different methods. The MAP method penalizes nonlinear and linear parameters differently and is well-suited for our purposes. First, we introduce $\mathbb{Z}_q = \{0, 1, \ldots, q-1\}$, which is the candidate model index set, with $\mathcal{M}_m$, $m \in \mathbb{Z}_q$, being the candidate models. The principle of MAP-based model selection is to choose the model that maximizes the a posteriori probability, i.e.,

$\hat{\mathcal{M}}_k = \arg\max_{\mathcal{M}_m,\, m\in\mathbb{Z}_q} p(\mathcal{M}_m|\mathbf{x}_k) = \arg\max_{\mathcal{M}_m,\, m\in\mathbb{Z}_q}\frac{p(\mathbf{x}_k|\mathcal{M}_m)\, p(\mathcal{M}_m)}{p(\mathbf{x}_k)}$. (37)

  20. Assuming that all the models are equally probable, i.e.,

$p(\mathcal{M}_m) = \frac{1}{q}$, (38)

and noting that $p(\mathbf{x}_k)$ is constant once $\mathbf{x}_k$ has been observed, the MAP model selection criterion reduces to

$\hat{\mathcal{M}}_k = \arg\max_{\mathcal{M}_m,\, m\in\mathbb{Z}_q} p(\mathbf{x}_k|\mathcal{M}_m)$, (39)

which is the likelihood function when seen as a function of $\mathcal{M}_m$. Since the various models also depend on a number of unknown parameters, we will integrate those out as

$p(\mathbf{x}_k|\mathcal{M}_m) = \int_{\Theta_k} p(\mathbf{x}_k|\boldsymbol{\theta}_k,\mathcal{M}_m)\, p(\boldsymbol{\theta}_k|\mathcal{M}_m)\, d\boldsymbol{\theta}_k$. (40)

  21. We will use the method of Laplace integration. Assuming that the likelihood function is highly peaked, we can write

$\int_{\Theta_k} p(\mathbf{x}_k|\boldsymbol{\theta}_k,\mathcal{M}_m)\, p(\boldsymbol{\theta}_k|\mathcal{M}_m)\, d\boldsymbol{\theta}_k = \pi^{D_k/2}\det\big(\hat{\mathbf{H}}_k\big)^{-1/2}\, p(\mathbf{x}_k|\hat{\boldsymbol{\theta}}_k,\mathcal{M}_m)\, p(\hat{\boldsymbol{\theta}}_k|\mathcal{M}_m)$, (41)

where $D_k$ is the number of parameters and

$\hat{\mathbf{H}}_k = -\left.\frac{\partial^2\ln p(\mathbf{x}_k|\boldsymbol{\theta}_k,\mathcal{M}_m)}{\partial\boldsymbol{\theta}_k\,\partial\boldsymbol{\theta}_k^T}\right|_{\boldsymbol{\theta}_k = \hat{\boldsymbol{\theta}}_k}$ (42)

is the Hessian of the log-likelihood function evaluated at $\hat{\boldsymbol{\theta}}_k$ (also known as the observed information matrix).

  22. Taking the logarithm and ignoring constant terms (and $\frac{D_k}{2}\ln\pi$), we get

$\hat{\mathcal{M}}_k = \arg\min_{\mathcal{M}_m,\, m\in\mathbb{Z}_q}\underbrace{-\ln p(\mathbf{x}_k|\hat{\boldsymbol{\theta}}_k,\mathcal{M}_m)}_{\text{log-likelihood}} + \underbrace{\tfrac{1}{2}\ln\det\hat{\mathbf{H}}_k}_{\text{penalty}}$, (43)

which can be used directly for selecting between various models and orders. The Hessian matrix is related to the Fisher information matrix, only it is evaluated at $\hat{\boldsymbol{\theta}}_k$. We introduce the normalization matrix

$\mathbf{K}_N = \begin{bmatrix} N^{-3/2} & \mathbf{0} \\ \mathbf{0} & N^{-1/2}\mathbf{I} \end{bmatrix}$, (44)

where $\mathbf{I}$ is a $2L_k \times 2L_k$ identity matrix.

  23. Using this normalization matrix, we can write the determinant of the Hessian as

$\det\hat{\mathbf{H}}_k = \det\big(\mathbf{K}_N^{-2}\big)\det\big(\mathbf{K}_N\hat{\mathbf{H}}_k\mathbf{K}_N\big)$. (45)

And, finally, by observing that $\mathbf{K}_N\hat{\mathbf{H}}_k\mathbf{K}_N = O(1)$ and taking the logarithm, we obtain

$\ln\det\hat{\mathbf{H}}_k = \ln\det\big(\mathbf{K}_N^{-2}\big) + \ln\det\big(\mathbf{K}_N\hat{\mathbf{H}}_k\mathbf{K}_N\big)$ (46)

$= \ln\det\big(\mathbf{K}_N^{-2}\big) + O(1)$ (47)

$= 3\ln N + 2L_k\ln N + O(1)$. (48)

When the additive noise is a white complex Gaussian process, the log-likelihood term is $N\ln\sigma_k^2$, where $\sigma_k^2$ is then replaced by an estimate $\hat{\sigma}_k^2(L_k)$.

  24. Model Selection Rules
This all leads to the following rule (for $L_k \geq 1$):

$\hat{L}_k = \arg\min_{L_k} N\ln\hat{\sigma}_k^2(L_k) + L_k\ln N + \tfrac{3}{2}\ln N$. (49)

No harmonics are present if

$N\ln\hat{\sigma}_k^2(0) < N\ln\hat{\sigma}_k^2(\hat{L}_k) + \hat{L}_k\ln N + \tfrac{3}{2}\ln N$. (50)

Comments:
• Accurate order estimation is critical to the pitch estimation problem but is also a very difficult problem.
• Statistical order estimation methods (MDL, MAP, AIC) are based on asymptotic approximations and are often arbitrary and suboptimal.
• Colored noise may cause problems for order estimation, more so than for fundamental frequency estimation!
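A sketch combining the ANLS variance estimate (32) with the MAP rule (49)-(50); the FFT-grid rounding is an approximation of mine, and a return value of 0 means "no harmonics present":

```python
import numpy as np

def map_order(x, w, Lmax, nfft=8192):
    """MAP order selection (49)-(50) at a given fundamental w,
    using the approximate NLS noise variance (32)."""
    N = len(x)
    X2 = np.abs(np.fft.fft(x, nfft)) ** 2
    power = np.sum(np.abs(x) ** 2)
    best_L, best_cost = 0, N * np.log(power / N)   # the L_k = 0 hypothesis
    for L in range(1, Lmax + 1):
        bins = np.round(w * np.arange(1, L + 1) * nfft
                        / (2 * np.pi)).astype(int) % nfft
        sigma2 = max(power - X2[bins].sum() / N, 1e-12) / N
        cost = N * np.log(sigma2) + L * np.log(N) + 1.5 * np.log(N)
        if cost < best_cost:
            best_L, best_cost = L, cost
    return best_L
```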

  25. Figure: MAP model selection criterion and the log-likelihood term alone (no order penalty) for a synthetic signal with $L_k = 5$, as a function of the number of harmonics.

  26. Figure: Voiced speech signal spectrogram (top) and pitch estimates (bottom).

  27. Expectation Maximization Algorithm
We write the signal model as a sum of $K$ sources in additive white Gaussian noise, i.e.,

$\mathbf{x} = \sum_{k=1}^{K}\mathbf{x}_k$, (51)

where the individual sources are given by $\mathbf{x}_k = \mathbf{Z}_k\mathbf{a}_k + \beta_k\mathbf{e}$, and
• the noise source is decomposed as $\mathbf{e}_k = \beta_k\mathbf{e}$, where $\beta_k \geq 0$ is chosen so that $\sum_{k=1}^{K}\beta_k = 1$;
• the set $\{\mathbf{x}_k\}$ is referred to as the complete data, which is unobservable, while the observed data is $\mathbf{x}$;
• $\mathbf{x}$ and $\{\mathbf{x}_k\}$ are assumed to be jointly Gaussian;
• the observations are assumed to be white Gaussian.

  28. The problem is then to estimate the complete data set or its parameters. By stacking the complete data in a vector $\mathbf{y}$ as

$\mathbf{y} = \big[\mathbf{x}_1^T\ \mathbf{x}_2^T\ \cdots\ \mathbf{x}_K^T\big]^T$, (52)

we can now write the incomplete data as

$\mathbf{x} = \mathbf{H}\mathbf{y}$, (53)

where $\mathbf{H} = [\mathbf{I}\ \cdots\ \mathbf{I}]$. In each iteration, where $(i)$ denotes the iteration number, the EM algorithm consists of two steps: the E-step, i.e.,

$U(\boldsymbol{\theta},\boldsymbol{\theta}^{(i)}) = \int \ln p(\mathbf{y},\mathbf{x};\boldsymbol{\theta})\, p(\mathbf{y}|\mathbf{x};\boldsymbol{\theta}^{(i)})\, d\mathbf{y}$, (54)

and the M-step, i.e.,

$\boldsymbol{\theta}^{(i+1)} = \arg\max_{\boldsymbol{\theta}} U(\boldsymbol{\theta},\boldsymbol{\theta}^{(i)})$. (55)

  29. Define an estimate of the $k$th source at iteration $(i)$ as

$\hat{\mathbf{x}}_k^{(i)} = \mathbf{Z}_k^{(i)}\hat{\mathbf{a}}_k^{(i)} + \beta_k\Big(\mathbf{x} - \sum_{k=1}^{K}\mathbf{Z}_k^{(i)}\hat{\mathbf{a}}_k^{(i)}\Big)$, (56)

where $\mathbf{Z}_k^{(i)}$ is constructed from $\hat{\omega}_k^{(i)}$. The problem of estimating the fundamental frequencies then becomes

$\hat{\omega}_k^{(i+1)} = \arg\max_{\omega_k}\hat{\mathbf{x}}_k^{(i)H}\mathbf{Z}_k\big(\mathbf{Z}_k^H\mathbf{Z}_k\big)^{-1}\mathbf{Z}_k^H\hat{\mathbf{x}}_k^{(i)}$, (57)

and the amplitudes can be found given $\hat{\omega}_k^{(i+1)}$ as

$\hat{\mathbf{a}}_k^{(i+1)} = \big(\mathbf{Z}_k^{(i+1)H}\mathbf{Z}_k^{(i+1)}\big)^{-1}\mathbf{Z}_k^{(i+1)H}\hat{\mathbf{x}}_k^{(i)}$. (58)

This process is then repeated until convergence for $i = 0,\ldots,I-1$. This is the same as the single-pitch ML method, but applied to the source estimates.
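A sketch of the EM iteration (56)-(58) under simplifying assumptions that are mine, not the tutorial's: equal noise shares $\beta_k = 1/K$, the same order $L$ for all sources, and a local grid search as the M-step:

```python
import numpy as np

def Z_matrix(w, L, N):
    """Vandermonde matrix of the L harmonics of w over N samples."""
    return np.exp(1j * w * np.arange(N)[:, None] * np.arange(1, L + 1))

def em_multipitch(x, w_init, L, iters=10, width=0.02, pts=50):
    """EM for K pitches: redistribute the residual (56), then run
    a single-pitch NLS on each source estimate (57)-(58)."""
    N, K = len(x), len(w_init)
    w = np.array(w_init, dtype=float)
    for _ in range(iters):
        # Current source reconstructions with LS amplitudes (58).
        s = []
        for k in range(K):
            Z = Z_matrix(w[k], L, N)
            s.append(Z @ np.linalg.lstsq(Z, x, rcond=None)[0])
        resid = x - sum(s)
        for k in range(K):
            xk = s[k] + resid / K           # E-step, eq. (56)
            cand = np.linspace(w[k] - width, w[k] + width, pts)
            def nls(wc):                    # M-step, eq. (57)
                Z = Z_matrix(wc, L, N)
                a = np.linalg.lstsq(Z, xk, rcond=None)[0]
                return np.linalg.norm(Z @ a) ** 2
            w[k] = max(cand, key=nls)
    return w
```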

  30. Comments:
• The EM algorithm leads to an implicit source separation.
• It is quite complicated to use and, especially, to initialize.
• Many different variations of these ideas can be (and have been) devised, for example the harmonic matching pursuit.
• The model order estimation problem can cause difficulties.

  31. Harmonic Fitting
Idea: estimate the unconstrained frequencies $\{\psi_{k,l}\}$ and fit the fundamental frequency to those (aka EXIP). Define

$\boldsymbol{\theta}'_k = [A_{k,1}\ \phi_{k,1}\ \psi_{k,1}\ \cdots\ A_{k,L_k}\ \phi_{k,L_k}\ \psi_{k,L_k}]^T$ (59)

and

$\boldsymbol{\eta}'_k = [\omega_k\ A_{k,1}\ \phi_{k,1}\ \cdots\ A_{k,L_k}\ \phi_{k,L_k}]^T$. (60)

The basic idea of the method is that there exists a so-called selection matrix $\mathbf{S}' \in \mathbb{Z}^{3L_k\times(2L_k+1)}$ that relates the vectors as

$\boldsymbol{\theta}'_k = \mathbf{S}'\boldsymbol{\eta}'_k$. (61)

We can now find an estimate of $\boldsymbol{\eta}'_k$ from estimates $\hat{\boldsymbol{\theta}}'_k$ as

$\hat{\boldsymbol{\eta}}'_k = \arg\min_{\boldsymbol{\eta}'_k}\frac{1}{2}\big\|(\mathbf{W}')^{1/2}\big(\hat{\boldsymbol{\theta}}'_k - \mathbf{S}'\boldsymbol{\eta}'_k\big)\big\|_2^2$. (62)

How do we choose $\mathbf{W}'$?

  32. If a maximum likelihood estimator is used for $\boldsymbol{\theta}'_k$, then the estimates will asymptotically be distributed according to the CRLB! Hence, we may choose $\mathbf{W}' = \mathbf{I}(\hat{\boldsymbol{\theta}}'_k)$, the Fisher information matrix (FIM). $\mathbf{W}'$ then becomes block diagonal for large $N$, i.e.,

$\mathbf{W}' = \begin{bmatrix}\mathbf{W}'_1 & & \mathbf{0}\\ & \ddots & \\ \mathbf{0} & & \mathbf{W}'_{L_k}\end{bmatrix}$, (63)

where the individual sub-matrices contain the inverse of the CRLB matrix for the individual sinusoids of the unconstrained model, i.e.,

$\mathbf{W}'_l = \frac{1}{\sigma_k^2}\begin{bmatrix}2N & 0 & 0\\ 0 & 2N\hat{A}_{k,l}^2 & N^2\hat{A}_{k,l}^2\\ 0 & N^2\hat{A}_{k,l}^2 & \frac{2}{3}N^3\hat{A}_{k,l}^2\end{bmatrix}$. (64)

  33. The weighting does not lead to refined estimates of the amplitudes. Consequently, we define $\boldsymbol{\theta}_k \in \mathbb{R}^{2L_k\times 1}$ and $\boldsymbol{\eta}_k \in \mathbb{R}^{(L_k+1)\times 1}$ like $\boldsymbol{\theta}'_k$ and $\boldsymbol{\eta}'_k$ but without the amplitudes. Now we may rewrite (61) as

$\boldsymbol{\theta}_k = \begin{bmatrix}0 & 1 & 0 & \cdots & 0\\ 1 & 0 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ 2 & 0 & 0 & \cdots & 0\\ \vdots & & & & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ L_k & 0 & 0 & \cdots & 0\end{bmatrix}\boldsymbol{\eta}_k \triangleq \mathbf{S}\boldsymbol{\eta}_k$. (65)

As before, we can state our estimator as the minimizer of the norm of the error between the left and the right side of this expression, i.e.,

$\hat{\boldsymbol{\eta}}_k = \arg\min_{\boldsymbol{\eta}_k}\frac{1}{2}\big\|\mathbf{W}^{1/2}\big(\hat{\boldsymbol{\theta}}_k - \mathbf{S}\boldsymbol{\eta}_k\big)\big\|_2^2$. (66)

  34. $\mathbf{W}$ is now block diagonal with sub-matrices $\mathbf{W}_l$ defined as

$\mathbf{W}_l = \frac{1}{\sigma_k^2}\begin{bmatrix}2N\hat{A}_{k,l}^2 & N^2\hat{A}_{k,l}^2\\ N^2\hat{A}_{k,l}^2 & \frac{2}{3}N^3\hat{A}_{k,l}^2\end{bmatrix}$, (67)

and the cost function is

$J = \frac{1}{2}\big\|\mathbf{W}^{1/2}\big(\hat{\boldsymbol{\theta}}_k - \mathbf{S}\boldsymbol{\eta}_k\big)\big\|_2^2 = \frac{1}{2\sigma_k^2}\sum_{l=1}^{L_k}\hat{A}_{k,l}^2\Big(2N(\hat{\phi}_{k,l}-\phi_{k,l})^2 + 2N^2(\hat{\phi}_{k,l}-\phi_{k,l})(\hat{\psi}_{k,l}-l\omega_k) + \tfrac{2}{3}N^3(\hat{\psi}_{k,l}-l\omega_k)^2\Big)$.

Substituting the phases $\{\phi_{k,l}\}$ by estimates and solving for $\omega_k$, we get

$\hat{\omega}_k = \frac{\sum_{l=1}^{L_k} l\,\hat{A}_{k,l}^2\,\hat{\psi}_{k,l}}{\sum_{l=1}^{L_k} l^2\,\hat{A}_{k,l}^2}$, (68)

which is essentially a closed-form fundamental frequency estimator!
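The closed form (68) is a one-liner once unconstrained frequency and amplitude estimates are available (e.g., from ESPRIT followed by least squares); a sketch:

```python
import numpy as np

def harmonic_fit(psi_hat, A_hat):
    """Weighted harmonic fit (68): fundamental frequency from
    unconstrained frequency estimates psi_hat[l-1] and amplitude
    estimates A_hat[l-1], l = 1..L."""
    l = np.arange(1, len(psi_hat) + 1)
    A2 = np.asarray(A_hat) ** 2
    psi = np.asarray(psi_hat)
    return np.sum(l * A2 * psi) / np.sum(l**2 * A2)
```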

  35. Experimental Details
• RMSE as a function of various conditions.
• Percentage of correctly estimated model orders.
• $\omega_1 = 0.6364$ and $\omega_2 = 0.1580$, three harmonics, unit amplitudes.
• 100 Monte Carlo iterations per point for the RMSE, 1000 for the order estimates.
• When estimating $\omega_k$, the model order is assumed known (and vice versa).
• White Gaussian noise.
• First a coarse estimate is found using grid search; then gradient/Newton methods are used for refinement.

  36. Experimental Results
Figure: RMSE of the ANLS (one and two sources) and EM (two sources) estimators compared to the CRLB, as a function of $N$ (with PSNR = 40 dB) and of PSNR (with $N$ = 400).

  37. Figure: RMSE of the ANLS and EM estimators compared to the CRLB, as a function of the difference $\Delta$ [radians] between two fundamental frequencies, for $N$ = 160 and PSNR = 40 dB.

  38. Figure: Percentage of correctly estimated model orders as a function of $N$ (with PSNR = 40 dB) and PSNR (with $N$ = 500).

  39. Discussion
• For the single-pitch case, the NLS method performs extremely well, being statistically efficient, even asymptotically for colored noise.
• The associated problems of model and order selection can be solved consistently within the framework.
• It is somewhat problematic for the multi-pitch case, requiring multidimensional nonlinear optimization.
• The EM algorithm and similar methodologies provide only a partial solution.
• The harmonic fitting approach is very sensitive to spurious estimates, and its generalization to multiple pitches is not straightforward.

  40. Filtering Methods
Intuitive idea: filter the observed signal with filters having unit gain at the candidate harmonics while suppressing everything else. This can be used for estimating parameters, extracting periodic signals, and separating periodic signals. One can use classical IIR/FIR filters or adaptive optimal filters. The history of comb filters goes far back. As we shall see, there also exist some connections between statistical methods and filtering methods.

  41. Comb Filtering
Mathematically, we may express periodicity as $x(n) \approx x(n-D)$, where $D$ is the pitch period. It follows that a measure of periodicity can be obtained using a metric on $e(n)$, defined as

$e(n) = x(n) - \alpha x(n-D)$. (69)

Taking the z-transform of this expression, we get

$E(z) = X(z) - \alpha X(z) z^{-D}$ (70)

$= X(z)\big(1 - \alpha z^{-D}\big)$. (71)

The transfer function $H(z)$ of the filter that operates on $x(n)$ can be seen to be

$H(z) = \frac{E(z)}{X(z)} = 1 - \alpha z^{-D}$. (72)

This mathematical structure is known as a comb filter.

  42. A more efficient alternative is notch filters, which are filters that cancel out signal components at certain frequencies. These have the following form:

$H(z) = \frac{P(z)}{P(\rho^{-1}z)} = \frac{1 + \beta_1 z^{-1}}{1 + \rho\beta_1 z^{-1}}$. (73)

Using $L_k$ such notch filters having notches at the frequencies $\{\psi_i\}$, we obtain

$P(z) = \prod_{i=1}^{L_k}\big(1 - e^{j\psi_i} z^{-1}\big) = 1 + \beta_1 z^{-1} + \ldots + \beta_{L_k} z^{-L_k}$, (74)

which has zeros on the unit circle at the desired frequencies. This polynomial defines the numerator, while the denominator is

$P(\rho^{-1}z) = \prod_{i=1}^{L_k}\big(1 - \rho e^{j\psi_i} z^{-1}\big) = 1 + \rho\beta_1 z^{-1} + \ldots + \rho^{L_k}\beta_{L_k} z^{-L_k}$. (75)

  43. Combining these, one obtains the following comb filter:

$H(z) = \frac{P(z)}{P(\rho^{-1}z)} = \frac{1 + \beta_1 z^{-1} + \beta_2 z^{-2} + \ldots + \beta_{L_k} z^{-L_k}}{1 + \rho\beta_1 z^{-1} + \rho^2\beta_2 z^{-2} + \ldots + \rho^{L_k}\beta_{L_k} z^{-L_k}}$. (76)

By applying this filter, for various candidate fundamental frequencies, to our observed signal $x(n)$, we obtain the filtered signal $e(n)$ in which the harmonics have been suppressed:

$e(n) = x(n) + \beta_1 x(n-1) + \beta_2 x(n-2) + \ldots + \beta_{L_k} x(n-L_k)$ (77)
$\quad\; - \rho\beta_1 e(n-1) - \rho^2\beta_2 e(n-2) - \ldots - \rho^{L_k}\beta_{L_k} e(n-L_k)$. (78)

Finally, we can use this signal for finding the fundamental frequency:

$\hat{\omega}_k = \arg\min_{\omega} J$ with $J = \sum_{n=1}^{N}|e(n)|^2$. (79)
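A sketch of the cost (79) using SciPy's `lfilter`; the pole radius $\rho$ and the use of `np.poly` to expand (74) are implementation choices of mine:

```python
import numpy as np
from scipy.signal import lfilter

def comb_cost(x, w, L, rho=0.99):
    """Output power (79) of the notch comb filter (76) with notches
    at the L harmonics of w. Minimize over candidate fundamentals."""
    # Numerator P(z): zeros on the unit circle at the harmonics (74).
    b = np.poly(np.exp(1j * w * np.arange(1, L + 1)))
    # Denominator P(z/rho): the same zeros pulled inside the circle (75).
    a = b * rho ** np.arange(L + 1)
    e = lfilter(b, a, x)
    return np.sum(np.abs(e) ** 2)

# w_hat = min(grid, key=lambda w: comb_cost(x, w, L))
```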

  44. Figure: Z-plane representation of the zeros (circles) and poles (crosses) of the comb filter, and its frequency response (magnitude versus frequency [radians]).

  45. Classical Methods
Returning to the original comb filter

$e(n) = x(n) - \alpha x(n-D)$, (80)

we see that it can be thought of as a prediction problem with unknowns $\alpha$ and $D$. We could determine these as

$\{\hat{\alpha}, \hat{D}\} = \arg\min_{\alpha, D} E\big\{|e(n)|^2\big\}$, (81)

which is also what long-term predictors in speech coders do. Assuming $\alpha = 1$, we obtain

$E\big\{|e(n)|^2\big\} = E\big\{(x(n) - x(n-D))^*(x(n) - x(n-D))\big\}$ (82)

$= E\big\{|x(n)|^2\big\} + E\big\{|x(n-D)|^2\big\} - E\big\{x^*(n)x(n-D)\big\} - E\big\{x^*(n-D)x(n)\big\}$. (83)

Assuming that the signal is stationary, we have that $E\{|x(n)|^2\} = E\{|x(n-D)|^2\} = \sigma^2$.

  46. Furthermore, we have that $r(D) = E\{x^*(n)x(n-D)\}$. This leads to the following estimator:

$\hat{D} = \arg\max_{D} 2\,\mathrm{Re}(r(D))$, (84)

which is the well-known autocorrelation method (complex version). One can generalize this principle as follows (with $p \geq 1$):

$\hat{D} = \arg\min_{D} E\big\{|x(n) - x(n-D)|^p\big\}$. (85)

Comments: (i) for $p = 2$, we obtain the autocorrelation method (corresponding to $e(n)$ being Gaussian); (ii) for $p = 1$, we obtain the average magnitude difference function (AMDF) method (corresponding to $e(n)$ being Laplacian); (iii) the estimates are restricted to integer lags (fractional delays require more work); (iv) the estimates may be non-unique (and the summation limits require consideration).
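Sample-based sketches of both estimators; the lag range is the caller's choice, and `D_min` must be at least 1:

```python
import numpy as np

def acf_pitch_period(x, D_min, D_max):
    """Autocorrelation method (84): maximize 2 Re r(D) with
    r(D) = sum_n x*(n) x(n-D) over integer lags."""
    lags = range(D_min, D_max + 1)
    return max(lags, key=lambda D: 2 * np.real(np.vdot(x[D:], x[:-D])))

def amdf_pitch_period(x, D_min, D_max):
    """AMDF method, (85) with p = 1: minimize the mean magnitude of
    the prediction error x(n) - x(n-D)."""
    lags = range(D_min, D_max + 1)
    return min(lags, key=lambda D: np.mean(np.abs(x[D:] - x[:-D])))
```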

  47. FIR Methods
We construct a vector from $M$ time-reversed samples of the observed signal, i.e.,

$\mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-M+1)]^T$, (86)

with $M \leq N$ and with $(\cdot)^T$ denoting the transpose. Next, we introduce the output signal $y_{k,l}(n)$ of the $l$th filter for the $k$th source, having coefficients $h_{k,l}(n)$, as

$y_{k,l}(n) = \sum_{m=0}^{M-1} h_{k,l}(m)\, x(n-m) = \mathbf{h}_{k,l}^H\mathbf{x}(n)$, (87)

with $\mathbf{h}_{k,l}$ being a vector containing the impulse response of the $l$th filter, i.e.,

$\mathbf{h}_{k,l} = \big[h_{k,l}^*(0)\ \cdots\ h_{k,l}^*(M-1)\big]^H$. (88)

  48. The output power of the $l$th filter can be written as

$E\big\{|y_{k,l}(n)|^2\big\} = E\big\{\mathbf{h}_{k,l}^H\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{h}_{k,l}\big\} = \mathbf{h}_{k,l}^H\mathbf{R}\mathbf{h}_{k,l}$. (89)

The total output power of all the filters is

$\sum_{l=1}^{L_k} E\big\{|y_{k,l}(n)|^2\big\} = \sum_{l=1}^{L_k}\mathbf{h}_{k,l}^H\mathbf{R}\mathbf{h}_{k,l}$. (90)

Defining a matrix $\mathbf{H}_k$ consisting of the filters $\{\mathbf{h}_{k,l}\}$ as

$\mathbf{H}_k = [\mathbf{h}_{k,1}\ \cdots\ \mathbf{h}_{k,L_k}]$, (91)

we can write the total output power as a sum of the power of the subband signals, i.e.,

$\sum_{l=1}^{L_k} E\big\{|y_{k,l}(n)|^2\big\} = \mathrm{Tr}\big\{\mathbf{H}_k^H\mathbf{R}\mathbf{H}_k\big\}$. (92)

  49. Interpretation of the NLS Method
Suppose we construct the filters from complex sinusoids as

$\mathbf{h}_{k,l} = \big[e^{-j\omega_k l\cdot 0}\ \cdots\ e^{-j\omega_k l(M-1)}\big]^T$. (93)

The matrix $\mathbf{H}_k$ is then identical to the Vandermonde matrix $\mathbf{Z}_k$, except that it is time-reversed, i.e.,

$\mathbf{Z}_k = [\mathbf{z}(\omega_k)\ \cdots\ \mathbf{z}(\omega_k L_k)]$, (94)

with $\mathbf{z}(\omega) = [1\ e^{-j\omega}\ \cdots\ e^{-j\omega(M-1)}]^T$. Then, we may write the total output power of the filterbank as

$\mathrm{Tr}\big\{\mathbf{H}_k^H\mathbf{R}\mathbf{H}_k\big\} = \mathrm{Tr}\big\{\mathbf{Z}_k^H\mathbf{R}\mathbf{Z}_k\big\}$ (95)

$= E\big\{\|\mathbf{Z}_k^H\mathbf{x}(n)\|_2^2\big\}$. (96)

Except for the expectation, this is the FFT method introduced earlier.

  50. Optimal Filterbank
Idea: find a set of filters that pass power undistorted at specific frequencies while minimizing the output power:

$\min_{\mathbf{H}_k}\mathrm{Tr}\big\{\mathbf{H}_k^H\mathbf{R}\mathbf{H}_k\big\}$ s.t. $\mathbf{H}_k^H\mathbf{Z}_k = \mathbf{I}$, (97)

where $\mathbf{I}$ is the $L_k\times L_k$ identity matrix. As before, the matrix $\mathbf{Z}_k \in \mathbb{C}^{M\times L_k}$ is constructed from $L_k$ complex sinusoidal vectors. Using the method of Lagrange multipliers, the filterbank matrix $\mathbf{H}_k$ can be shown to be

$\mathbf{H}_k = \mathbf{R}^{-1}\mathbf{Z}_k\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}$. (98)

This data- and frequency-dependent filterbank can then be used to estimate the fundamental frequencies as

$\hat{\omega}_k = \arg\max_{\omega_k}\mathrm{Tr}\big\{\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\big\}$. (99)
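A sketch of the cost (99) on a sample covariance matrix; the snapshot construction and solver choices are mine:

```python
import numpy as np

def sample_cov(x, M):
    """Sample covariance of the time-reversed snapshots (86)."""
    snaps = np.array([x[n - M + 1:n + 1][::-1]
                      for n in range(M - 1, len(x))])
    return snaps.T @ snaps.conj() / snaps.shape[0]

def filterbank_cost(R, w, L):
    """Optimal-filterbank cost (99): Tr{(Z^H R^{-1} Z)^{-1}}."""
    M = R.shape[0]
    Z = np.exp(-1j * w * np.arange(M)[:, None] * np.arange(1, L + 1))
    return np.real(np.trace(np.linalg.inv(Z.conj().T @ np.linalg.solve(R, Z))))

# w_hat = max(grid, key=lambda w: filterbank_cost(R, w, L))
```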

  51. Figure: Cost function based on optimal filtering for the two speech signals (left) and their mixture (right), as a function of the fundamental frequency [Hz].

  52. Optimal Filter
Suppose that we instead design a single filter $\mathbf{h}_k$ for the $k$th source. This filter design problem can be stated as

$\min_{\mathbf{h}_k}\mathbf{h}_k^H\mathbf{R}\mathbf{h}_k$ s.t. $\mathbf{h}_k^H\mathbf{z}(\omega_k l) = 1$ for $l = 1,\ldots,L_k$. (100)

This has the solution

$\mathbf{h}_k = \mathbf{R}^{-1}\mathbf{Z}_k\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1}$ and $\mathbf{h}_k^H\mathbf{R}\mathbf{h}_k = \mathbf{1}^H\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1}$.

As before, we readily obtain an estimate of the fundamental frequency as

$\hat{\omega}_k = \arg\max_{\omega_k}\mathbf{1}^H\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1}$. (101)

It is perhaps not clear how these two estimators are related.

  53. Figure: Frequency response (magnitude [dB]) of the optimal filterbank (top) and the single optimal filter (bottom) for white noise.

  54. Asymptotic Analysis
Comparing the optimal filters, we observe that they are related as

$\mathbf{h}_k = \mathbf{R}^{-1}\mathbf{Z}_k\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1} = \mathbf{H}_k\mathbf{1} = \sum_{l=1}^{L_k}\mathbf{h}_{k,l}$, (102)

but generally $\mathbf{1}^H\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1} \neq \mathrm{Tr}\big\{\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\big\}$. Analyzing the asymptotic properties of the cost function, we see that

$\lim_{M\to\infty}\frac{1}{M}\mathbf{Z}_k^H\mathbf{R}\mathbf{Z}_k = \mathrm{diag}\big(\big[\Phi(\omega_k)\ \cdots\ \Phi(\omega_k L_k)\big]\big)$, (103)

with $\Phi(\omega)$ being the PSD of $x(n)$. It can therefore be seen that

$\lim_{M\to\infty} M\,\mathbf{1}^H\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1} = \sum_{l=1}^{L_k}\Phi(\omega_k l)$. (104)

Conclusion: the methods are asymptotically equivalent, and they are equivalent to the NLS method too!

  55. Order Estimation
As stated earlier, the MAP criterion is defined for $L_k \geq 1$ as

$\hat{L}_k = \arg\min_{L_k} N\ln\hat{\sigma}^2(L_k) + L_k\ln N + \tfrac{3}{2}\ln N$. (105)

We now need to find the estimate $\hat{\sigma}^2(L_k)$ from the residual $\hat{e}(n) = x(n) - y_k(n)$. Additionally, $y_k(n)$ is

$y_k(n) = \sum_{l=1}^{L_k}\sum_{m=0}^{M-1} h_{k,l}(m)\, x(n-m) = \sum_{m=0}^{M-1} h_k(m)\, x(n-m)$, (106)

where $h_k(m)$ is the sum over the filters of the filterbank. We can now write (with $\mathbf{g}_k = [(1 - h_k(0))\ {-h_k(1)}\ \cdots\ {-h_k(M-1)}]^H$)

$\hat{e}(n) = x(n) - \sum_{m=0}^{M-1} h_k(m)\, x(n-m) \triangleq \mathbf{g}_k^H\mathbf{x}(n)$. (107)

  56. We can then estimate the noise variance as

$\hat{\sigma}^2(L_k) = E\big\{|\hat{e}(n)|^2\big\} = E\big\{\mathbf{g}_k^H\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{g}_k\big\} = \mathbf{g}_k^H\mathbf{R}\mathbf{g}_k$. (108)

Writing $\mathbf{g}_k = \mathbf{b}_1 - \mathbf{h}_k$, with $\mathbf{b}_1 = [1\ 0\ \cdots\ 0]^T$, the variance estimate is rewritten as

$\hat{\sigma}^2(L_k) = \mathbf{b}_1^H\mathbf{R}\mathbf{b}_1 - \mathbf{b}_1^H\mathbf{R}\mathbf{h}_k - \mathbf{h}_k^H\mathbf{R}\mathbf{b}_1 + \mathbf{h}_k^H\mathbf{R}\mathbf{h}_k$. (109)

The first term can be identified as $\mathbf{b}_1^H\mathbf{R}\mathbf{b}_1 = E\{|x(n)|^2\}$, and $\mathbf{h}_k^H\mathbf{R}\mathbf{h}_k$ we already know. Writing out the cross-terms yields

$\mathbf{b}_1^H\mathbf{R}\mathbf{h}_k = \mathbf{b}_1^H\mathbf{R}\mathbf{R}^{-1}\mathbf{Z}_k\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1} = \mathbf{h}_k^H\mathbf{R}\mathbf{h}_k$. (110)

Therefore, the variance estimate can be expressed as

$\hat{\sigma}^2(L_k) = E\big\{|x(n)|^2\big\} - \mathbf{1}^H\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}\mathbf{1}$. (111)

  57. Efficient Implementation
Both the filterbank method and the single-filter method require the calculation of

$\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1}$. (112)

To apply the matrix inversion lemma (MIL) to the calculation of the cost function, we first define a matrix

$\mathbf{Z}_k^{(L_k-1)} = [\mathbf{z}(\omega_k)\ \cdots\ \mathbf{z}(\omega_k(L_k-1))]$ (113)

and a vector $\mathbf{z}_k^{(L_k)} = \big[e^{-j\omega_k L_k\cdot 0}\ \cdots\ e^{-j\omega_k L_k(M-1)}\big]^T$. We can now write

$\big(\mathbf{Z}_k^H\mathbf{R}^{-1}\mathbf{Z}_k\big)^{-1} = \begin{bmatrix}\mathbf{Z}_k^{(L_k-1)H}\mathbf{R}^{-1}\mathbf{Z}_k^{(L_k-1)} & \mathbf{Z}_k^{(L_k-1)H}\mathbf{R}^{-1}\mathbf{z}_k^{(L_k)}\\ \mathbf{z}_k^{(L_k)H}\mathbf{R}^{-1}\mathbf{Z}_k^{(L_k-1)} & \mathbf{z}_k^{(L_k)H}\mathbf{R}^{-1}\mathbf{z}_k^{(L_k)}\end{bmatrix}^{-1} \triangleq \boldsymbol{\Xi}_{L_k}$, (114)

where $\boldsymbol{\Xi}_{L_k}$ is the matrix calculated for an order-$L_k$ model.

  58. Next, define the quantities

$\xi_{L_k} = \mathbf{z}_k^{(L_k)H}\mathbf{R}^{-1}\mathbf{z}_k^{(L_k)}$ and $\boldsymbol{\eta}_{L_k} = \mathbf{Z}_k^{(L_k-1)H}\mathbf{R}^{-1}\mathbf{z}_k^{(L_k)}$. (115)

We can now express the matrix inverse using the MIL in terms of these as

$\boldsymbol{\Xi}_{L_k} = \begin{bmatrix}\boldsymbol{\Xi}_{L_k-1} & \mathbf{0}\\ \mathbf{0}^H & 0\end{bmatrix} + \frac{1}{\xi_{L_k} - \boldsymbol{\eta}_{L_k}^H\boldsymbol{\Xi}_{L_k-1}\boldsymbol{\eta}_{L_k}}\begin{bmatrix}-\boldsymbol{\Xi}_{L_k-1}\boldsymbol{\eta}_{L_k}\\ 1\end{bmatrix}\begin{bmatrix}-\boldsymbol{\eta}_{L_k}^H\boldsymbol{\Xi}_{L_k-1} & 1\end{bmatrix}$ (116)

$= \begin{bmatrix}\boldsymbol{\Xi}_{L_k-1} & \mathbf{0}\\ \mathbf{0}^H & 0\end{bmatrix} + \frac{1}{\beta_{L_k}}\begin{bmatrix}\boldsymbol{\zeta}_{L_k}\boldsymbol{\zeta}_{L_k}^H & -\boldsymbol{\zeta}_{L_k}\\ -\boldsymbol{\zeta}_{L_k}^H & 1\end{bmatrix}$. (117)

This shows that once $\boldsymbol{\Xi}_{L_k-1}$ is known, $\boldsymbol{\Xi}_{L_k}$ can be obtained in a simple way.

  59. For a given $\omega_k$, we calculate the order-1 inverse matrix as

$\boldsymbol{\Xi}_1 = \frac{1}{\xi_1}$, (118)

and then, for $l = 2,\ldots,L_k$, calculate

$\boldsymbol{\kappa}_l = \mathbf{R}^{-1}\mathbf{z}_k^{(l)}$ (119)

$\xi_l = \mathbf{z}_k^{(l)H}\boldsymbol{\kappa}_l$ (120)

$\boldsymbol{\eta}_l = \mathbf{Z}_k^{(l-1)H}\boldsymbol{\kappa}_l$ (121)

$\boldsymbol{\zeta}_l = \boldsymbol{\Xi}_{l-1}\boldsymbol{\eta}_l$ (122)

$\beta_l = \xi_l - \boldsymbol{\eta}_l^H\boldsymbol{\Xi}_{l-1}\boldsymbol{\eta}_l$ (123)

$\boldsymbol{\Xi}_l = \begin{bmatrix}\boldsymbol{\Xi}_{l-1} & \mathbf{0}\\ \mathbf{0}^H & 0\end{bmatrix} + \frac{1}{\beta_l}\begin{bmatrix}\boldsymbol{\zeta}_l\boldsymbol{\zeta}_l^H & -\boldsymbol{\zeta}_l\\ -\boldsymbol{\zeta}_l^H & 1\end{bmatrix}$, (124)

from which the fundamental frequency and the model order can be found.
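A direct transcription of the recursion (118)-(124); it returns $\boldsymbol{\Xi}_l$ for every order up to `Lmax`, so the order-dependent costs come almost for free (the dict-based interface is my choice):

```python
import numpy as np

def xi_recursion(Rinv, w, Lmax):
    """Order-recursive computation of Xi_l = (Z_l^H R^{-1} Z_l)^{-1}
    for l = 1..Lmax at fundamental w, following (118)-(124)."""
    M = Rinv.shape[0]
    z = lambda l: np.exp(-1j * w * l * np.arange(M))
    Z = z(1)[:, None]
    Xi = np.array([[1.0 / np.vdot(z(1), Rinv @ z(1))]])   # (118)
    out = {1: Xi}
    for l in range(2, Lmax + 1):
        zl = z(l)
        kappa = Rinv @ zl                 # (119)
        xi = np.vdot(zl, kappa)           # (120)
        eta = Z.conj().T @ kappa          # (121)
        zeta = Xi @ eta                   # (122)
        beta = xi - np.vdot(eta, zeta)    # (123)
        Xi = np.block([                   # (124)
            [Xi + np.outer(zeta, zeta.conj()) / beta, -zeta[:, None] / beta],
            [-zeta.conj()[None, :] / beta, np.array([[1.0 / beta]])]])
        Z = np.column_stack([Z, zl])
        out[l] = Xi
    return out
```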

  60. Experimental Results
Figure: RMSE of the optimal filtering method (one and two sources) compared to the CRLB, as a function of $N$ (with PSNR = 40 dB) and of PSNR (with $N$ = 400).

  61. Figure: RMSE of the optimal filtering method compared to the CRLB, as a function of the difference $\Delta$ [radians] between two fundamental frequencies, for $N$ = 160 and PSNR = 40 dB.

  62. Figure: Percentage of correctly estimated model orders as a function of $N$ (with PSNR = 40 dB) and PSNR (with $N$ = 500).

  63. Figure: Percentage of correctly estimated model orders as a function of $M$ (with PSNR = 40 dB and $N$ = 200).

  64. Discussion
• The methods based on optimal filtering form an intriguing alternative for pitch estimation.
• Especially so as they lead to a natural decoupling of the multi-pitch estimation problem.
• They can also be used directly for enhancement and separation.
• They have excellent performance under adverse conditions, like closely spaced multiple pitches.
• They are robust to colored noise.
• Complexity may be prohibitive for some applications.
• Order and model selection can be performed consistently within the framework.

  65. Subspace Methods
In subspace methods, the full observation space is divided into signal (plus noise) and noise subspaces. The properties of these can then be used for various estimation and identification tasks. Some of the most elegant unconstrained frequency estimators are subspace methods, with ESPRIT, MODE, and root-MUSIC essentially being the only closed-form frequency estimators. There has been some interest in subspace methods in connection with sinusoidal speech and audio modeling (the unconstrained model).

  66. Covariance Matrix Model
Define $\mathbf{Z} = [\mathbf{Z}_1\ \cdots\ \mathbf{Z}_K]$ and $\mathbf{a}(n) = \big[\mathbf{a}_1^T(n)\ \cdots\ \mathbf{a}_K^T(n)\big]^T$. We write the model as

$\mathbf{x}(n) = \sum_{k=1}^{K}\mathbf{Z}_k\mathbf{a}_k(n) + \mathbf{e}(n) = \mathbf{Z}\mathbf{a}(n) + \mathbf{e}(n)$. (125)

As shown earlier, the covariance matrix is then ($\sum_{k=1}^{K}\mathbf{Z}_k\mathbf{P}_k\mathbf{Z}_k^H$ has rank $V = \sum_{k=1}^{K} L_k$)

$\mathbf{R} = E\big\{\mathbf{x}(n)\mathbf{x}^H(n)\big\} = \sum_{k=1}^{K}\mathbf{Z}_k\mathbf{P}_k\mathbf{Z}_k^H + \sigma^2\mathbf{I} = \mathbf{Z}\mathbf{P}\mathbf{Z}^H + \sigma^2\mathbf{I}$, (126)

with $\mathbf{P}_k = \mathrm{diag}\big([\,|A_{k,1}|^2\ \cdots\ |A_{k,L_k}|^2\,]\big)$ and $\mathbf{P} = \mathrm{diag}([\mathbf{P}_1\ \cdots\ \mathbf{P}_K])$.

  67. Let

$\mathbf{R} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ (127)

be the eigenvalue decomposition (EVD) of the covariance matrix. $\mathbf{U}$ contains the $M$ orthonormal eigenvectors of $\mathbf{R}$, i.e.,

$\mathbf{U} = [\mathbf{u}_1\ \cdots\ \mathbf{u}_M]$, (128)

and $\boldsymbol{\Lambda}$ is a diagonal matrix containing the corresponding (sorted) positive eigenvalues $\lambda_k$. Let $\mathbf{S}$ be formed as

$\mathbf{S} = [\mathbf{u}_1\ \cdots\ \mathbf{u}_V]$. (129)

The subspace spanned by the columns of $\mathbf{S}$ we denote $\mathcal{R}(\mathbf{S})$, which is the same space as $\mathcal{R}(\mathbf{Z})$.

  68. Similarly, let $\mathbf{G}$ be formed as

$\mathbf{G} = [\mathbf{u}_{V+1}\ \cdots\ \mathbf{u}_M]$, (130)

where $\mathcal{R}(\mathbf{G})$ is the so-called noise subspace. Using the EVD, the covariance matrix model can now be written as

$\mathbf{U}^H\Big(\sum_{k=1}^{K}\mathbf{Z}_k\mathbf{P}_k\mathbf{Z}_k^H\Big)\mathbf{U} = \boldsymbol{\Lambda} - \sigma^2\mathbf{I}$. (131)

Some useful properties that can be exploited for estimation purposes can now be established. It can be seen that (MUSIC)

$\mathbf{Z}^H\mathbf{G} = \mathbf{0}$ and $\mathbf{Z}_k^H\mathbf{G} = \mathbf{0}\ \forall k$. (132)

When only one source is present, we have that $\mathbf{Z} = \mathbf{Z}_k$ and $V = L_k$.

  69. Furthermore, we have that $\mathbf{S} = \mathbf{Z}\mathbf{B}$. Defining

$\underline{\mathbf{S}} = [\mathbf{I}\ \mathbf{0}]\mathbf{S}$ and $\overline{\mathbf{S}} = [\mathbf{0}\ \mathbf{I}]\mathbf{S}$, (133)

and

$\underline{\mathbf{Z}} = [\mathbf{I}\ \mathbf{0}]\mathbf{Z}$ and $\overline{\mathbf{Z}} = [\mathbf{0}\ \mathbf{I}]\mathbf{Z}$, (134)

we see that

$\overline{\mathbf{Z}} = \underline{\mathbf{Z}}\mathbf{D}$ and $\overline{\mathbf{S}} = \underline{\mathbf{S}}\boldsymbol{\Xi}$, (135)

with $\mathbf{D} = \mathrm{diag}\big([e^{j\psi_1}\ \cdots\ e^{j\psi_L}]\big)$. This leads us to (ESPRIT)

$\boldsymbol{\Xi} = \mathbf{B}^{-1}\mathbf{D}\mathbf{B}$, (136)

from which the frequencies $\{\psi_l\}$ can be found.

  70. Pre-whitening
The question is how to deal with colored noise. The classical approach is to do either a) sub-band processing or b) pre-whitening. When the noise is not white, the covariance matrix model is

$\mathbf{R} = E\big\{\mathbf{x}(n)\mathbf{x}^H(n)\big\} = \mathbf{Z}\mathbf{P}\mathbf{Z}^H + \mathbf{Q}$, (137)

where $\mathbf{Q} = E\{\mathbf{e}(n)\mathbf{e}^H(n)\}$. Since covariance matrices are symmetric and positive definite, so are their inverses, and the Cholesky factorization of $\mathbf{Q}^{-1}$ is

$\mathbf{Q}^{-1} = \mathbf{L}\mathbf{L}^H$, (138)

where $\mathbf{L}$ is an $M\times M$ lower triangular matrix. By multiplying the observed signal vectors by this matrix, we get

$E\big\{\mathbf{L}^H\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{L}\big\} = \mathbf{L}^H\mathbf{Z}\mathbf{P}\mathbf{Z}^H\mathbf{L} + \mathbf{I}$. (139)

A simple implementation of this is via a pre-whitening filter.
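A sketch of (138)-(139) applied to a matrix of snapshots (one snapshot per row); inverting $\mathbf{Q}$ explicitly is acceptable at these matrix sizes:

```python
import numpy as np

def prewhiten(snaps, Q):
    """Pre-whiten snapshots: with Q^{-1} = L L^H (Cholesky), map
    each x(n) to L^H x(n) so the noise covariance becomes I."""
    L = np.linalg.cholesky(np.linalg.inv(Q))
    return snaps @ L.conj()   # row n becomes (L^H x(n))^T
```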

  71. Rank Estimation
As we have seen, the likelihood function of the observed signal can, for the Gaussian case, be expressed as

$p(\{\mathbf{x}(n)\};\boldsymbol{\zeta}) = \prod_{n=0}^{G-1} p(\mathbf{x}(n);\boldsymbol{\zeta})$ (140)

$= \frac{1}{\pi^{GM}\det(\mathbf{R})^G}\, e^{-\sum_{n=0}^{G-1}\mathbf{x}^H(n)\mathbf{R}^{-1}\mathbf{x}(n)}$. (141)

By taking the logarithm, we obtain the log-likelihood function

$\mathcal{L}(\boldsymbol{\zeta}) = \ln p(\{\mathbf{x}(n)\};\boldsymbol{\zeta})$ (142)

$= -GM\ln\pi - G\ln\det(\mathbf{R}) - \sum_{n=0}^{G-1}\mathbf{x}^H(n)\mathbf{R}^{-1}\mathbf{x}(n)$. (143)

  72. As it turns out, this can be expressed as

$\mathcal{L}(\boldsymbol{\zeta}) = -GM\ln\pi - G\ln\prod_{v=1}^{M}\hat{\lambda}_v - G(M-L')\ln\frac{\frac{1}{M-L'}\sum_{v=L'+1}^{M}\hat{\lambda}_v}{\big(\prod_{v=L'+1}^{M}\hat{\lambda}_v\big)^{1/(M-L')}} - GM$.

Using the AIC ($\nu = 2$) or MDL ($\nu = \frac{1}{2}\ln N$) penalty, we obtain the following cost function to be minimized for determining the rank of the signal subspace:

$J(L') = -\mathcal{L}(\boldsymbol{\zeta}) + \big(L'(2M - L') + 1\big)\nu$. (144)

The signal subspace dimension is identical to the number of harmonics for the single-pitch case and to the total number of harmonics for multi-pitch signals. Unfortunately, the criterion performs poorly in practice, especially when the noise is colored.
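Dropping the terms that do not depend on $L'$, the criterion reduces to the log-ratio of the arithmetic to the geometric mean of the candidate noise eigenvalues plus the penalty; a sketch (using the snapshot count in the MDL penalty is my simplification):

```python
import numpy as np

def rank_estimate(lams, G, nu=None):
    """Minimize J(L') of (144) over the candidate rank L'.
    lams: eigenvalues of the sample covariance, largest first.
    G: number of snapshots. nu defaults to the MDL penalty."""
    M = len(lams)
    nu = 0.5 * np.log(G) if nu is None else nu
    def J(Lp):
        tail = lams[Lp:]
        # Arithmetic over geometric mean of the assumed noise eigenvalues.
        ratio = np.mean(tail) / np.exp(np.mean(np.log(tail)))
        return G * (M - Lp) * np.log(ratio) + (Lp * (2 * M - Lp) + 1) * nu
    return min(range(M), key=J)
```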

  73. Figure: Log-ratio between the geometric and arithmetic means of the eigenvalues and the MDL cost function, as a function of the candidate rank.

  74. Angles Between Subspaces
The principal angles $\{\theta_k\}$ between the two subspaces $\mathcal{Z} = \mathcal{R}(\mathbf{Z})$ and $\mathcal{G} = \mathcal{R}(\mathbf{G})$ are defined recursively for $k = 1,\ldots,K$ as

$\cos(\theta_k) = \max_{\mathbf{u}\in\mathcal{Z}}\max_{\mathbf{v}\in\mathcal{G}}\frac{\mathbf{u}^H\mathbf{v}}{\|\mathbf{u}\|_2\|\mathbf{v}\|_2} = \mathbf{u}_k^H\mathbf{v}_k$, (145)

where $K$ is the minimal dimension of the two subspaces, i.e., $K = \min\{V, M-V\}$, and $\mathbf{u}^H\mathbf{u}_i = 0$ and $\mathbf{v}^H\mathbf{v}_i = 0$ for $i = 1,\ldots,k-1$. For subspace $\mathcal{G}$, the projection matrix is

$\boldsymbol{\Pi}_{\mathcal{G}} = \mathbf{G}\big(\mathbf{G}^H\mathbf{G}\big)^{-1}\mathbf{G}^H = \mathbf{G}\mathbf{G}^H$, (146)

while for subspace $\mathcal{Z}$, the projection matrix is

$\boldsymbol{\Pi}_{\mathcal{Z}} = \mathbf{Z}\big(\mathbf{Z}^H\mathbf{Z}\big)^{-1}\mathbf{Z}^H$. (147)

  75. Using the two projection matrices, we can now write

$\cos(\theta_k) = \max_{\mathbf{y}}\max_{\mathbf{z}}\frac{\mathbf{y}^H\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\mathbf{z}}{\|\mathbf{y}\|_2\|\mathbf{z}\|_2} = \mathbf{y}_k^H\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\mathbf{z}_k = \sigma_k$ (148)

for $k = 1,\ldots,K$. Furthermore, we require that $\mathbf{y}^H\mathbf{y}_i = 0$ and $\mathbf{z}^H\mathbf{z}_i = 0$ for $i = 1,\ldots,k-1$. It follows that $\{\sigma_k\}$ are the singular values of $\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}$, which are related to the Frobenius norm as

$\|\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\|_F^2 = \mathrm{Tr}\big\{\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\boldsymbol{\Pi}_{\mathcal{G}}^H\boldsymbol{\Pi}_{\mathcal{Z}}^H\big\} = \mathrm{Tr}\{\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\} = \sum_{k=1}^{K}\sigma_k^2$. (149)

Interestingly, this can be related to the Frobenius norm of the difference between the two projection matrices, i.e.,

$\|\boldsymbol{\Pi}_{\mathcal{Z}} - \boldsymbol{\Pi}_{\mathcal{G}}\|_F^2 = \mathrm{Tr}\{\boldsymbol{\Pi}_{\mathcal{Z}} + \boldsymbol{\Pi}_{\mathcal{G}} - 2\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\} = M - 2\|\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\|_F^2$. (150)

  76. The Frobenius norm of the product $\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}$ can be rewritten as

$\|\boldsymbol{\Pi}_{\mathcal{Z}}\boldsymbol{\Pi}_{\mathcal{G}}\|_F^2 = \mathrm{Tr}\big\{\big(\mathbf{Z}^H\mathbf{Z}\big)^{-1}\mathbf{Z}^H\mathbf{G}\mathbf{G}^H\mathbf{Z}\big\}$. (151)

This can be simplified because $\boldsymbol{\Pi}_{\mathcal{Z}} \approx \frac{1}{M}\mathbf{Z}\mathbf{Z}^H$ for large $M$:

$\sum_{k=1}^{K}\sigma_k^2 \approx \frac{1}{M}\mathrm{Tr}\big\{\mathbf{Z}^H\mathbf{G}\mathbf{G}^H\mathbf{Z}\big\} = \frac{1}{M}\|\mathbf{Z}^H\mathbf{G}\|_F^2$. (152)

By averaging over all the nontrivial angles, we now arrive at (with $K = \min\{V, M-V\}$)

$\frac{1}{K}\sum_{k=1}^{K}\cos^2(\theta_k) = \frac{1}{K}\sum_{k=1}^{K}\sigma_k^2 \approx \frac{1}{MK}\|\mathbf{Z}^H\mathbf{G}\|_F^2$, (153)

which can be used to measure the level of orthogonality to obtain a) parameter estimates and b) order estimates.

  77. Estimation Using the Orthogonality Property
For the single-pitch case, the covariance matrix model is

$\mathbf{R}_k = E\big\{\mathbf{x}_k(n)\mathbf{x}_k^H(n)\big\} = \mathbf{Z}_k\mathbf{P}_k\mathbf{Z}_k^H + \sigma^2\mathbf{I}$. (154)

By forming a matrix $\mathbf{Z}_k$ for different candidate frequencies and then measuring the angles between the subspaces, we can obtain an estimate as

$\hat{\omega}_k = \arg\min_{\omega_k}\|\mathbf{Z}_k^H\mathbf{G}\|_F^2 = \arg\min_{\omega_k}\sum_{l=1}^{L_k}\|\mathbf{z}^H(\omega_k l)\mathbf{G}\|_2^2$, (155)

i.e., we find estimates by maximizing the angles between the subspaces $\mathcal{R}(\mathbf{Z}_k)$ and $\mathcal{R}(\mathbf{G})$. For an unknown model order, we arrive at

$\hat{\omega}_k = \arg\min_{\omega_k}\min_{L_k}\frac{1}{MK}\|\mathbf{Z}_k^H\mathbf{G}\|_F^2$ with $K = \min\{L_k, M-L_k\}$. (156)
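A sketch of the single-pitch subspace estimator (155) for a known order; the snapshot length `M` and the candidate grid are the usual free choices:

```python
import numpy as np

def subspace_pitch(x, M, L, grid):
    """Minimize ||Z_k^H G||_F^2 (155) over candidate fundamentals,
    with G built from the sample covariance of M-length snapshots."""
    snaps = np.array([x[n - M + 1:n + 1][::-1]
                      for n in range(M - 1, len(x))])
    R = snaps.T @ snaps.conj() / snaps.shape[0]
    lam, U = np.linalg.eigh(R)   # eigenvalues in ascending order
    G = U[:, :M - L]             # noise subspace: the M - L smallest
    m = np.arange(M)[:, None]
    def cost(w):
        Z = np.exp(-1j * w * m * np.arange(1, L + 1))
        return np.linalg.norm(Z.conj().T @ G) ** 2
    return min(grid, key=cost)
```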

  78. To recapitulate, the covariance matrix model for the multi-pitch case is

$\mathbf{R} = \sum_{k=1}^{K}\mathbf{Z}_k\mathbf{P}_k\mathbf{Z}_k^H + \sigma^2\mathbf{I} = \mathbf{Z}\mathbf{P}\mathbf{Z}^H + \sigma^2\mathbf{I}$. (157)

The subspace orthogonality property states that the matrix $\mathbf{Z}$ and all its sub-matrices are orthogonal to $\mathbf{G}$, i.e.,

$\mathbf{Z}^H\mathbf{G} = \mathbf{0}$ and $\mathbf{Z}_k^H\mathbf{G} = \mathbf{0}\ \forall k$. (158)

First, assume that the model orders are known and note that $\|\mathbf{Z}^H\mathbf{G}\|_F^2 = \sum_{k=1}^{K}\|\mathbf{Z}_k^H\mathbf{G}\|_F^2$. The set of fundamental frequency estimates is then

$\{\hat{\omega}_k\} = \arg\min_{\{\omega_k\}}\|\mathbf{Z}^H\mathbf{G}\|_F^2 = \arg\min_{\{\omega_k\}}\sum_{k=1}^{K}\|\mathbf{Z}_k^H\mathbf{G}\|_F^2$, (159)

which allows for independent optimization over the sources.

  79. For the case where the model orders $\{L_k\}$ (and thus the rank of $\mathbf{G}$) are unknown, the estimator becomes

$\{\hat{\omega}_k\} = \arg\min_{\{\omega_k\}}\min_{\{L_k\}}\frac{1}{MK}\|\mathbf{Z}^H\mathbf{G}\|_F^2$, (160)

where $K = \min\big\{\sum_{k=1}^{K} L_k,\ M - \sum_{k=1}^{K} L_k\big\}$, since the ranks of the signal and noise subspaces depend on the total number of harmonics. Fortunately, the estimator can be simplified somewhat as

$\min_{\{\omega_k\}}\min_{\{L_k\}}\frac{1}{MK}\|\mathbf{Z}^H\mathbf{G}\|_F^2 = \min_{\{L_k\}}\sum_{k=1}^{K}\min_{\omega_k}\frac{1}{MK}\|\mathbf{Z}_k^H\mathbf{G}\|_F^2$. (161)

Simplification: find $\mathbf{G}$ first using another method.

  80. Figure: Subspace-based cost function for the two speech signals (left) and their mixture (right), as a function of the fundamental frequency [Hz].

  81. Figure: The cost function based on subspace orthogonality as a function of the model order $L_k$ for a signal where the true order is five.

  82. Figure: Clarinet signal spectrogram (top) and pitch estimates obtained using the joint estimator (bottom).

  83. Inharmonicity
Consider the unconstrained signal model and various models for $\{\psi_{k,l}\}$:

$x_k(n) = \sum_{l=1}^{L_k} A_{k,l}\, e^{j(\psi_{k,l}n + \phi_{k,l})} + e_k(n)$. (162)

• $\psi_{k,l} = \omega_k l$.
• $\psi_{k,l} = \omega_k l\sqrt{1 + B_k l^2}$ with $B_k \ll 1$ (pinned ends).
• $\psi_{k,l} = \omega_k l\sqrt{1 + B_k l^2}\big(1 + \tfrac{2}{\pi}\sqrt{B_k} + \tfrac{4}{\pi^2}B_k\big)$ (clamped ends).
Comments:
• We refer to the last two models as parametric inharmonicity models.
• Deviation from the harmonic model is sometimes referred to as inharmonicity (a very pronounced effect for some instruments).

  84. It is trivial to incorporate these models in MUSIC. The $\mathbf{Z}_k$ matrix is constructed from the two nonlinear parameters as

$\mathbf{Z}_k = \big[\mathbf{z}\big(\omega_k\sqrt{1 + B_k}\big)\ \cdots\ \mathbf{z}\big(\omega_k L_k\sqrt{1 + B_k L_k^2}\big)\big]$. (163)

Thus, estimates can be obtained as

$(\hat{\omega}_k, \hat{B}_k) = \arg\min_{\omega_k, B_k}\|\mathbf{Z}_k^H\mathbf{G}\|_F^2$, (164)

which has to be evaluated for a large range of combinations of the two parameters.
