Least Squares Estimation, Filtering, and Prediction


J. McNames, Portland State University, ECE 539/639, Least Squares, Ver. 1.02

Topics
• Principle of least squares
• Normal equations
• Weighted least squares
• Statistical properties
• FIR filters
• Windowing
• Combined forward-backward linear prediction
• Narrowband interference cancellation

Motivation
• If the second-order statistics are known, the optimum estimator is given by the normal equations
• In many applications, they aren't known
• An alternative approach is to estimate the coefficients from observed data
• Two possible approaches
  – Estimate the required moments from the available data and build an approximate MMSE estimator
  – Build an estimator that minimizes some error functional calculated from the available data

MMSE versus Least Squares
• Recall that MMSE estimators are optimal in expectation across the ensemble of all stochastic processes with the same second-order statistics
• Least squares estimators minimize the error on a given block of data
  – In signal processing applications, the block of data is a finite-length period of time
• No guarantees about optimality on other data sets or other stochastic processes
• If the process is ergodic and stationary, the LSE estimator approaches the MMSE estimator as the size of the data set grows (see the numerical sketch below)
  – This is the first time in this class we've discussed estimation from data
  – First time we need to consider ergodicity

Principle of Least Squares
• Will only discuss the sum of squares as the performance criterion
  – Recall our earlier discussion about alternatives
  – Essentially, picking the sum of squares will permit us to obtain a closed-form optimal solution
• Requires a data set where both the inputs and the desired responses are known
• Recall the range of possible applications
  – Plant modeling for control (system identification)
  – Inverse modeling/deconvolution
  – Interference cancellation
  – Prediction
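The convergence claim above can be checked numerically. The following is a minimal sketch, not part of the original slides, assuming a white unit-variance lag-window input, a hypothetical "true" coefficient vector `c_true`, and an arbitrary noise level. Under these assumptions the MMSE (Wiener) solution computed from the known second-order statistics is `c_true` itself, so the distance between the block least-squares estimate and `c_true` should shrink as the block length N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 3
c_true = np.array([0.8, -0.4, 0.2])   # hypothetical "true" combiner coefficients

def make_block(N):
    """Return a lag-window data matrix X (N x M) and a noisy desired response y."""
    x = rng.standard_normal(N + M - 1)                       # white, unit-variance input
    X = np.column_stack([x[M - 1 - k : M - 1 - k + N] for k in range(M)])
    y = X @ c_true + 0.1 * rng.standard_normal(N)            # arbitrary noise level
    return X, y

# For a white unit-variance input, R = I and d = c_true, so the MMSE (Wiener)
# solution computed from the known second-order statistics is c_true itself.
c_mmse = c_true

# Least squares estimates from finite blocks: the error should shrink as N grows.
for N in (20, 200, 2000):
    X, y = make_block(N)
    c_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes sum |y(n) - c^T x(n)|^2
    print(f"N = {N:5d}   ||c_ls - c_mmse|| = {np.linalg.norm(c_ls - c_mmse):.4f}")
```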

Recalling the Book's Notation
• $y(n) \in \mathbb{C}^{1\times 1}$ is the target or desired response
• $x_k(n)$ represent the inputs
• These may be of several types
  – Multiple sensors, no lags: $\mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_M(n)]^T$
  – Lag window: $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-M+1)]^T$
  – Combined
• Data sets consist of values over the time span $0 \le n \le N-1$
• Boldface is now used for vectors and matrices
• The coefficients are represented as $\mathbf{c}(n)$

Change in Notation
In a trade of elegance and simplicity for inconsistency, I'm going to break with some of the book's notational conventions.

Notes: $\hat{y}(n) \triangleq \sum_{k=1}^{M} c_k(n)\,x_k(n) = \mathbf{c}^T(n)\,\mathbf{x}(n)$

Book: $\hat{y}(n) \triangleq \sum_{k=1}^{M} c_k^*(n)\,x_k(n) = \mathbf{c}^H(n)\,\mathbf{x}(n)$

• In the case that $\mathbf{c}$ is real, they are consistent
• Rationale
  – The inner product notation leads to unnecessary complications in the notation
  – Most books use the same notation that the notes use
  – Leads to a symmetry: $\mathbf{c}^T(n)\,\mathbf{x}(n) = \mathbf{x}^T(n)\,\mathbf{c}(n)$

Definitions
Estimate: $\hat{y}(n) \triangleq \sum_{k=1}^{M} c_k(n)\,x_k(n) = \mathbf{c}^T(n)\,\mathbf{x}(n)$

Estimation error: $e(n) \triangleq y(n) - \hat{y}(n) = y(n) - \mathbf{c}^T(n)\,\mathbf{x}(n)$

Sum of squared errors: $E_e \triangleq \sum_{n=0}^{N-1} |e(n)|^2$

• The coefficient vector $\mathbf{c}(n)$ is typically held constant over the data window, $0 \le n \le N-1$
• Contrast with the adaptive filter approach
• The coefficients $\mathbf{c}$ that minimize $E_e$ are called the linear LSE estimator
Note: my definitions differ from the book's by a conjugate factor (*).

Matrix Formulation
$$
\begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(N-1) \end{bmatrix}
=
\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(N-1) \end{bmatrix}
-
\begin{bmatrix}
x_1(0) & x_2(0) & \cdots & x_M(0) \\
x_1(1) & x_2(1) & \cdots & x_M(1) \\
\vdots & \vdots & \ddots & \vdots \\
x_1(N-1) & x_2(N-1) & \cdots & x_M(N-1)
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_M \end{bmatrix}
$$

$$\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{c}$$

where
• $\mathbf{e} \triangleq [\,e(0)\; e(1)\; \cdots\; e(N-1)\,]^T$ — error data vector ($N \times 1$)
• $\mathbf{y} \triangleq [\,y(0)\; y(1)\; \cdots\; y(N-1)\,]^T$ — desired response vector ($N \times 1$)
• $\mathbf{X} \triangleq [\,\mathbf{x}(0)\; \mathbf{x}(1)\; \cdots\; \mathbf{x}(N-1)\,]^T$ — input data matrix ($N \times M$)
• $\mathbf{c} \triangleq [\,c_1\; c_2\; \cdots\; c_M\,]^T$ — combiner parameter vector ($M \times 1$)
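As a concrete illustration of the definitions and the matrix formulation above, here is a minimal sketch (mine, not from the slides) that forms the $N \times M$ data matrix $\mathbf{X}$, the estimate $\hat{\mathbf{y}} = \mathbf{Xc}$, the error $\mathbf{e} = \mathbf{y} - \mathbf{Xc}$, and the sum of squared errors $E_e$. All numerical values are arbitrary placeholders.

```python
import numpy as np

N, M = 6, 2
# Rows of X are the snapshots x^T(n), n = 0, ..., N-1 (placeholder values).
X = np.array([[1.0, 0.0],
              [0.9, 1.0],
              [0.7, 0.9],
              [0.4, 0.7],
              [0.1, 0.4],
              [0.0, 0.1]])
y = np.array([1.0, 1.4, 1.3, 0.9, 0.4, 0.1])   # desired response vector (N x 1)
c = np.array([0.8, 0.5])                        # combiner parameter vector (M x 1)

y_hat = X @ c                    # y_hat(n) = c^T x(n)
e = y - y_hat                    # estimation error e = y - Xc
E_e = np.sum(np.abs(e) ** 2)     # sum of squared errors over 0 <= n <= N-1
print(e, E_e)
```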

Input Data Matrix Notation
$$
\mathbf{X} =
\begin{bmatrix} \mathbf{x}^T(0) \\ \mathbf{x}^T(1) \\ \vdots \\ \mathbf{x}^T(N-1) \end{bmatrix}
=
\begin{bmatrix}
x_1(0) & x_2(0) & \cdots & x_M(0) \\
x_1(1) & x_2(1) & \cdots & x_M(1) \\
\vdots & \vdots & \ddots & \vdots \\
x_1(N-1) & x_2(N-1) & \cdots & x_M(N-1)
\end{bmatrix}
=
\begin{bmatrix} \tilde{\mathbf{x}}_1 & \tilde{\mathbf{x}}_2 & \cdots & \tilde{\mathbf{x}}_M \end{bmatrix}
$$
• We will need to reference both the row and the column vectors of the data matrix $\mathbf{X}$
• This is always awkward to denote
• The book's notation is redundant
  – Row vectors ("snapshots") indicated by $(n)$ and boldface $\mathbf{x}$
  – Column vectors ("data records") indicated by $k$ and $\tilde{\mathbf{x}}$ (book uses $\tilde{\mathbf{x}}$)
• I don't know of a more elegant solution

Size of Data Matrix
$$\underbrace{\mathbf{e}}_{N \times 1} = \underbrace{\mathbf{y}}_{N \times 1} - \underbrace{\mathbf{X}}_{N \times M}\,\underbrace{\mathbf{c}}_{M \times 1}$$
• Suppose we wish to make $\|\mathbf{e}\| = 0$, i.e. $\mathbf{y} = \mathbf{X}\mathbf{c}$
• Suppose $\mathbf{X}$ has maximum rank
  – Linearly independent rows or columns
  – Always occurs with real data
• $N$ linear equations and $M$ unknowns
  – $N < M$: underdetermined, infinite number of solutions
  – $N = M$: unique solution, $\mathbf{c} = \mathbf{X}^{-1}\mathbf{y}$
  – $N > M$: overdetermined, no solution in general
• In practical applications we pick $N > M$ (why?)

Block Processing
• LSE estimators can be used in block processing mode
  – Take a segment of $N$ input-output observations, say $n_1 \le n \le n_1 + N - 1$
  – Estimate the coefficients
  – Increment the temporal location of the block to $n_1 + N_0$
• The blocks overlap by $N - N_0$ samples
• Reminiscent of Welch's method of PSD estimation
• Useful for parametric time-frequency analysis
• In most other nonstationary applications, adaptive filters are usually used instead

Normal Equations
$$
\begin{aligned}
E_e = \|\mathbf{e}\|^2 = \mathbf{e}^H\mathbf{e}
&= (\mathbf{y} - \mathbf{X}\mathbf{c})^H(\mathbf{y} - \mathbf{X}\mathbf{c}) \\
&= (\mathbf{y}^H - \mathbf{c}^H\mathbf{X}^H)(\mathbf{y} - \mathbf{X}\mathbf{c}) \\
&= \mathbf{y}^H\mathbf{y} - \mathbf{c}^H\mathbf{X}^H\mathbf{y} - \mathbf{y}^H\mathbf{X}\mathbf{c} + \mathbf{c}^H\mathbf{X}^H\mathbf{X}\mathbf{c}
\end{aligned}
$$
• $E_e$ is a nonlinear function of $\mathbf{y}$, $\mathbf{X}$, and $\mathbf{c}$
• Is a quadratic function of each of these components
• $E_e$ is the sum of squared errors
  – The energy of the error signal over the interval $0 \le n \le N-1$
• If we take the average of the squared errors (ASE), we have an estimate of the mean square error, i.e. the estimated power of the error:
$$\hat{P}_e = \frac{1}{N} E_e = \frac{1}{N} \sum_{n=0}^{N-1} |e(n)|^2$$
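The following is a minimal numerical sketch of the overdetermined ($N > M$) case, not taken from the slides: it assumes the standard result that the least squares coefficients satisfy the normal equations $\mathbf{X}^H\mathbf{X}\mathbf{c} = \mathbf{X}^H\mathbf{y}$, and it computes the average squared error $\hat{P}_e = E_e / N$. The complex test data are arbitrary, and `np.linalg.lstsq` is included only as a cross-check (it solves the same problem with better numerical conditioning).

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 4                                       # overdetermined: N > M
X = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Normal-equations solution X^H X c = X^H y (fine for illustration;
# QR / lstsq is preferred numerically for ill-conditioned X).
c = np.linalg.solve(X.conj().T @ X, X.conj().T @ y)

e = y - X @ c                      # error over the block 0 <= n <= N-1
E_e = np.real(e.conj() @ e)        # sum of squared errors, e^H e
P_e = E_e / N                      # average squared error (estimated error power)

c_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(P_e, np.allclose(c, c_lstsq))
```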
