Least Squares Estimation, Filtering, and Prediction
J. McNames, Portland State University, ECE 539/639 (Ver. 1.02)

• Motivation
• Principle of least squares
• Normal equations
• Weighted least squares
• Statistical properties
• FIR filters
• Windowing
• Combined forward-backward linear prediction
• Narrowband Interference Cancellation

Motivation

• If the second-order statistics are known, the optimum estimator is given by the normal equations
• In many applications, they aren't known
• Alternative approach is to estimate the coefficients from observed data
• Two possible approaches
  – Estimate the required moments from the available data and build an approximate MMSE estimator
  – Build an estimator that minimizes some error functional calculated from the available data

MMSE versus Least Squares

• Recall that MMSE estimators are optimal in expectation across the ensemble of all stochastic processes with the same second-order statistics
• Least squares estimators minimize the error on a given block of data
  – In signal processing applications, the block of data is a finite-length period of time
• No guarantees about optimality on other data sets or other stochastic processes
• If the process is ergodic and stationary, the LSE estimator approaches the MMSE estimator as the size of the data set grows
  – This is the first time in this class we've discussed estimation from data
  – First time we need to consider ergodicity

Principle of Least Squares

• Will only discuss the sum of squares as the performance criterion
  – Recall our earlier discussion about alternatives
  – Essentially, picking the sum of squares will permit us to obtain a closed-form optimal solution
• Requires a data set where both the inputs and desired responses are known
• Recall the range of possible applications
  – Plant modeling for control (system identification)
  – Inverse modeling/deconvolution
  – Interference cancellation
  – Prediction
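The convergence claim on the "MMSE versus Least Squares" slide (for a stationary, ergodic process the LSE estimator approaches the MMSE estimator as the data record grows) can be illustrated with a small simulation. The NumPy sketch below is my own illustration, not part of the original notes: it assumes a white input and independent additive noise, so the MMSE (Wiener) coefficients equal the true combiner c_opt, and it shows the block least squares estimate moving toward c_opt as N increases.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 3
c_opt = np.array([1.0, -0.5, 0.25])   # with white x(n) and independent noise,
                                      # the MMSE (Wiener) solution equals c_opt

for N in (20, 200, 2000, 20000):
    X = rng.standard_normal((N, M))                 # stationary, ergodic inputs
    y = X @ c_opt + 0.5 * rng.standard_normal(N)    # desired response plus noise
    c_lse, *_ = np.linalg.lstsq(X, y, rcond=None)   # LSE from this block of data
    print(N, np.linalg.norm(c_lse - c_opt))         # gap typically shrinks as N grows
```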
Recalling the Book's Notation

• y(n) ∈ C is the target or desired response
• x_k(n) represent the inputs
• These may be of several types
  – Multiple sensors, no lags: x(n) = [x_1(n), x_2(n), ..., x_M(n)]^T
  – Lag window: x(n) = [x(n), x(n-1), ..., x(n-M+1)]^T
  – Combined
• Data sets consist of values over the time span 0 ≤ n ≤ N-1
• Boldface is now used for vectors and matrices
• The coefficients are represented as c(n)

Change in Notation

In a trade of elegance and simplicity for inconsistency, I'm going to break with some of the book's notational conventions.

Notes:  \hat{y}(n) \triangleq \sum_{k=1}^{M} c_k(n) x_k(n) = c^T(n) x(n)

Book:   \hat{y}(n) \triangleq \sum_{k=1}^{M} c_k^*(n) x_k(n) = c^H(n) x(n)

• In the case that c is real, they are consistent
• Rationale
  – The inner product notation leads to unnecessary complications in the notation
  – Most books use the same notation that the notes use
  – Leads to a symmetry: c^T(n) x(n) = x^T(n) c(n)

Definitions

Estimate:               \hat{y}(n) \triangleq \sum_{k=1}^{M} c_k(n) x_k(n) = c^T(n) x(n)

Estimation error:       e(n) \triangleq y(n) - \hat{y}(n) = y(n) - c^T(n) x(n)

Sum of squared errors:  E_e \triangleq \sum_{n=0}^{N-1} |e(n)|^2

• The coefficient vector c(n) is typically held constant over the data window, 0 ≤ n ≤ N-1
• Contrast with the adaptive filter approach
• The coefficients c that minimize E_e are called the linear LSE estimator

Note: my definitions differ from the book's by a conjugate factor (*)

Matrix Formulation

\begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(N-1) \end{bmatrix}
= \begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(N-1) \end{bmatrix}
- \begin{bmatrix}
    x_1(0)   & x_2(0)   & \cdots & x_M(0)   \\
    x_1(1)   & x_2(1)   & \cdots & x_M(1)   \\
    \vdots   & \vdots   & \ddots & \vdots   \\
    x_1(N-1) & x_2(N-1) & \cdots & x_M(N-1)
  \end{bmatrix}
  \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_M \end{bmatrix}

or, compactly, e = y - Xc, where

e ≜ [e(0)  e(1)  ...  e(N-1)]^T        error data vector (N × 1)
y ≜ [y(0)  y(1)  ...  y(N-1)]^T        desired response vector (N × 1)
X ≜ [x^T(0)  x^T(1)  ...  x^T(N-1)]^T  input data matrix (N × M)
c ≜ [c_1  c_2  ...  c_M]^T             combiner parameter vector (M × 1)
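As a concrete companion to the Matrix Formulation slide, the NumPy sketch below builds the N × M data matrix X for the lag-window case and evaluates e = y - Xc and E_e exactly as defined above. The function name build_lag_matrix, the choice to treat samples before n = 0 as zero, and the random test data are illustrative assumptions of mine, not conventions from the notes or the book.

```python
import numpy as np

def build_lag_matrix(x, M):
    """Return the N x M data matrix X whose n-th row is
    [x(n), x(n-1), ..., x(n-M+1)] (lag-window case).
    Samples before n = 0 are treated as zero, which is only one of
    several possible windowing conventions."""
    N = len(x)
    X = np.zeros((N, M), dtype=x.dtype)
    for k in range(M):
        X[k:, k] = x[:N - k]          # column k holds x(n) delayed by k samples
    return X

# Small demonstration with synthetic data
rng = np.random.default_rng(0)
N, M = 8, 3
x = rng.standard_normal(N)            # input signal x(n)
y = rng.standard_normal(N)            # desired response y(n)
c = rng.standard_normal(M)            # an arbitrary combiner vector c

X = build_lag_matrix(x, M)            # input data matrix (N x M)
e = y - X @ c                         # error vector e = y - Xc
E_e = np.sum(np.abs(e) ** 2)          # sum of squared errors E_e
print(X.shape, e.shape, E_e)
```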
Input Data Matrix Notation

X = \begin{bmatrix} x^T(0) \\ x^T(1) \\ \vdots \\ x^T(N-1) \end{bmatrix}
  = \begin{bmatrix}
      x_1(0)   & x_2(0)   & \cdots & x_M(0)   \\
      x_1(1)   & x_2(1)   & \cdots & x_M(1)   \\
      \vdots   & \vdots   & \ddots & \vdots   \\
      x_1(N-1) & x_2(N-1) & \cdots & x_M(N-1)
    \end{bmatrix}
  = \begin{bmatrix} \tilde{x}_1 & \tilde{x}_2 & \cdots & \tilde{x}_M \end{bmatrix}

• We will need to reference both the row and column vectors of the data matrix X
• This is always awkward to denote
• The book's notation is redundant
  – Row vectors ("snapshots") are indicated by the time index (n) and boldface x
  – Column vectors ("data records") are indicated by the index k and an accented x (the book uses \tilde{x})
• I don't know of a more elegant solution

Size of Data Matrix

e_{N \times 1} = y_{N \times 1} - X_{N \times M} \, c_{M \times 1}

• Suppose we wish to make the error zero, e = 0, so that y = Xc
• Suppose X has maximum rank
  – Linearly independent rows or columns
  – Always occurs with real data
• N linear equations and M unknowns
  – N < M: underdetermined, infinite number of solutions
  – N = M: unique solution, c = X^{-1} y
  – N > M: overdetermined, no solution in general
• In practical applications we pick N > M (why?)

Block Processing

• LSE estimators can be used in block processing mode
  – Take a segment of N input-output observations, say n_1 ≤ n ≤ n_1 + N - 1
  – Estimate the coefficients
  – Increment the temporal location of the block to n_1 + N_0
• The blocks overlap by N - N_0 samples
• Reminiscent of Welch's method of PSD estimation
• Useful for parametric time-frequency analysis
• In most other nonstationary applications, adaptive filters are usually used instead

Normal Equations

E_e = \|e\|^2 = e^H e
    = (y - Xc)^H (y - Xc)
    = (y^H - c^H X^H)(y - Xc)
    = y^H y - c^H X^H y - y^H X c + c^H X^H X c

• E_e is a nonlinear function of y, X, and c
• It is a quadratic function of each of these components
• E_e is the sum of squared errors
  – Energy of the error signal over the interval 0 ≤ n ≤ N-1
• If we take the average of the squared errors (ASE), we obtain an estimate of the mean square error, i.e. the estimated power of the error:

\hat{P}_e = \frac{1}{N} E_e = \frac{1}{N} \sum_{n=0}^{N-1} |e(n)|^2
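Minimizing the quadratic E_e over c leads to the normal equations X^H X c = X^H y, the closed-form solution alluded to on the Motivation slide. The NumPy sketch below is my own illustration with synthetic data and names of my choosing: it solves the normal equations directly and cross-checks the result against numpy.linalg.lstsq, which solves the same least squares problem with a better-conditioned SVD-based routine.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 200, 4                           # overdetermined case: N > M
X = rng.standard_normal((N, M))         # input data matrix (N x M)
c_true = np.array([0.5, -1.0, 0.25, 2.0])
y = X @ c_true + 0.1 * rng.standard_normal(N)    # noisy desired response

# Solve the normal equations X^H X c = X^H y directly
c_ne = np.linalg.solve(X.conj().T @ X, X.conj().T @ y)

# Cross-check with the library least squares solver
c_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(c_ne, c_ls))          # both give the same LSE coefficients

e = y - X @ c_ne                        # error over the block
print(np.sum(np.abs(e) ** 2) / N)       # average squared error, an estimate of P_e
```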