

  1. SI231 Matrix Computations Lecture 3: Least Squares Ziping Zhao Fall Term 2020–2021 School of Information Science and Technology ShanghaiTech University, Shanghai, China

  2. Lecture 3: Least Squares
     • Part I: linear representations
       – time-series modeling, Vandermonde matrix
       – basis representation
       – discrete-time linear time-invariant systems, Toeplitz matrix, circulant matrix
       – OFDM, localization
     • Part II: least squares (LS)
       – LS by normal equation
       – projection theorem, orthogonal projection, pseudo-inverse
       – LS by optimization
     • Part III: extensions
       – matrix factorization, PCA, matrix completion
       – gradient descent, online algorithms

  3. Main Result
     • Problem: given y ∈ R^m, A ∈ R^{m×n}, solve

           min_{x ∈ R^n} ‖y − Ax‖₂²        (LS)

       – called (linear) least squares (LS)
       – find an x whose residual r = y − Ax is the smallest in the Euclidean sense
     • Solution: suppose that A has full column rank (m ≥ n). The solution to (LS) is unique and is given by

           x_LS = (AᵀA)⁻¹Aᵀy

       – complexity: O(mn² + n³)
       – LS gives a solution to an overdetermined system of equations y = Ax (m > n), in which case it is very likely that y ∉ R(A)
       – if A is semi-orthogonal, the solution simplifies to x_LS = Aᵀy
       – if A is square, the solution simplifies to x_LS = A⁻¹y
       – unless specified otherwise, in this lecture we assume A has full column rank
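The main result above can be checked numerically. The following is a minimal sketch (not part of the lecture) that solves the LS problem both by the normal equations x = (AᵀA)⁻¹Aᵀy and by a library LS solver, on random data; the two solutions coincide when A has full column rank.

```python
import numpy as np

# Minimal sketch: solve min_x ||y - Ax||_2^2 for a full-column-rank A.
rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))   # random A has full column rank with probability 1
y = rng.standard_normal(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)      # normal-equation solution
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # library LS solver

print(np.allclose(x_normal, x_lstsq))  # both give the unique LS solution
```

In practice one solves LS via a QR or SVD-based routine (as `lstsq` does) rather than forming AᵀA explicitly, which squares the condition number.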

  4. Part I: Linear Representations

  5. Linear Representation
     There are numerous applications in which we deal with a representation

           y = Ax,   or   y = Ax + v,

     where y is given; A is given or stipulated; x is to be determined; v is noise or error.
     • the first case, y = Ax, is the one we have learned (partially) in Lecture 2
     • we also deal with a representation of multiple given y's:

           y_i = A x_i,   i = 1, …, n,

       in which case the problem can also be written as Y = AX, and both the x_i's (or X) and A are to be determined (cf. Part III)

  6. Time Series
     • let y_t, t = 0, 1, …, be a real-valued time series
     • examples: speech signals, music, stock market indices, real-time seismic waveforms, air quality index (AQI), sunspot counts, …
     [Figure: sunspot time series. Source: http://sunspotwatch.com]

  7. Time Series
     • one can analyze a time series using model-free techniques such as the Fourier transform
       – by model-free, we mean that we make few assumptions about the time series
     • we can also apply a model
     • model-based approaches exploit problem structure and can work very well, assuming that you choose the right model for your data

  8. Harmonic Model for Time Series
     • Harmonic model:

           y_t = Σ_{i=1}^k A_i r_i^t cos(2π f_i t + φ_i) + v_t,   t = 0, 1, …

       for some positive integer k and for some A_i > 0, r_i > 0, f_i ∈ (−1/2, 1/2], φ_i ∈ [0, 2π), i = 1, …, k; v_t is noise or modeling error.
       – the (A_i, r_i, f_i, φ_i)'s are the model parameters and are unknown
       – k is called the model order; it is also unknown, but we can plug in a guess
     • we can use the Hilbert transform¹ to convert y_t to a complex time series:

           ỹ_t = Σ_{i=1}^k A_i r_i^t e^{j(2π f_i t + φ_i)} + ṽ_t = Σ_{i=1}^k α_i z_i^t + ṽ_t,

       where α_i = A_i e^{j φ_i}, z_i = r_i e^{j 2π f_i}.
       ¹ call hilbert in MATLAB

  9. Harmonic Model for Time Series
     • suppose the z_i's are known and the observation time window is T. Then,

           ⎡ ỹ_0     ⎤   ⎡ 1          1          ⋯  1          ⎤ ⎡ α_1 ⎤   ⎡ ṽ_0     ⎤
           ⎢ ỹ_1     ⎥   ⎢ z_1        z_2        ⋯  z_k        ⎥ ⎢ α_2 ⎥   ⎢ ṽ_1     ⎥
           ⎢ ỹ_2     ⎥ = ⎢ z_1²       z_2²       ⋯  z_k²       ⎥ ⎢  ⋮  ⎥ + ⎢ ṽ_2     ⎥
           ⎢  ⋮      ⎥   ⎢  ⋮          ⋮             ⋮         ⎥ ⎣ α_k ⎦   ⎢  ⋮      ⎥
           ⎣ ỹ_{T−1} ⎦   ⎣ z_1^{T−1}  z_2^{T−1}  ⋯  z_k^{T−1}  ⎦           ⎣ ṽ_{T−1} ⎦
               = y                      = A                        = x        = v

       – we can estimate the amplitude-phase coefficients α_i from {ỹ_t} via LS, given information of the frequencies f_i and the damping coefficients r_i
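The system above can be sketched numerically. This is an illustrative example (the z_i, α_i values below are made up, and the data is noiseless): build the T × k Vandermonde matrix from known roots z_i and recover the amplitude-phase coefficients α_i by LS.

```python
import numpy as np

# Sketch: known roots z_i = r_i e^{j 2 pi f_i}; estimate alpha_i by LS
# on the Vandermonde system y = A alpha (noiseless toy data).
T, k = 64, 2
z = np.array([0.99 * np.exp(2j * np.pi * 0.10),
              0.95 * np.exp(2j * np.pi * 0.25)])   # assumed known roots
alpha_true = np.array([1.0 + 0.5j, 0.8 - 0.2j])    # hypothetical amplitudes

t = np.arange(T)
A = z[None, :] ** t[:, None]     # T x k Vandermonde matrix, A[t, i] = z_i^t
y = A @ alpha_true               # synthetic complex observations y~_t

alpha_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(alpha_hat, alpha_true))
```

With noisy data the same `lstsq` call returns the LS estimate rather than an exact recovery.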

  10. Vandermonde Matrix
     A matrix A ∈ C^{m×n} is said to be Vandermonde if it takes the form

               ⎡ 1          1          ⋯  1          ⎤
               ⎢ z_1        z_2        ⋯  z_n        ⎥
           A = ⎢ z_1²       z_2²       ⋯  z_n²       ⎥,
               ⎢  ⋮          ⋮             ⋮         ⎥
               ⎣ z_1^{m−1}  z_2^{m−1}  ⋯  z_n^{m−1}  ⎦

     where z_i ∈ C, i = 1, …, n, are called the roots of the Vandermonde matrix.
     • Fact: a Vandermonde A has full rank if its roots are distinct, i.e., z_i ≠ z_j for all i, j with i ≠ j
       – Vandermonde matrices possess a stronger linear independence property: if we pick any k columns of A, with k ≤ m, they are always linearly independent (verify as a mini exercise)
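The full-rank fact can be illustrated numerically (this is a spot check with made-up roots, not a proof): a tall Vandermonde matrix with distinct roots has full column rank.

```python
import numpy as np

# Spot check of the fact: Vandermonde matrix with distinct roots
# has full column rank (illustrative only).
z = np.array([0.5, -0.3, 2.0, 1.1])       # distinct roots (arbitrary choice)
m = 6
A = z[None, :] ** np.arange(m)[:, None]   # 6 x 4 Vandermonde matrix, rows 1, z, z^2, ...
print(np.linalg.matrix_rank(A))           # -> 4, i.e., full column rank
```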

  11. Autoregressive Model for Time Series
     • Autoregressive (AR) model:

           y_t = a_1 y_{t−1} + a_2 y_{t−2} + … + a_q y_{t−q} + v_t,   t = 0, 1, …

       for some coefficient vector a ∈ R^q and for some positive integer (or model order) q.
       – models y_t as being related to its past values in a linear manner
       – also called the all-pole model in signals and systems

  12. Autoregressive Model for Time Series
     • let T + 1 be the observation time window. We have

           ⎡ y_1 ⎤   ⎡ y_0                           ⎤           ⎡ v_1 ⎤
           ⎢ y_2 ⎥   ⎢ y_1      y_0                  ⎥ ⎡ a_1 ⎤   ⎢ v_2 ⎥
           ⎢  ⋮  ⎥   ⎢  ⋮        ⋱       ⋱           ⎥ ⎢ a_2 ⎥   ⎢  ⋮  ⎥
           ⎢ y_q ⎥ = ⎢ y_{q−1}   ⋯      y_1  y_0     ⎥ ⎢  ⋮  ⎥ + ⎢ v_q ⎥
           ⎢  ⋮  ⎥   ⎢  ⋮                      ⋱     ⎥ ⎣ a_q ⎦   ⎢  ⋮  ⎥
           ⎣ y_T ⎦   ⎣ y_{T−1}   ⋯           y_{T−q} ⎦           ⎣ v_T ⎦
              = y                  = A                    = x       = v

       (blank entries are zero)
       – we can estimate the AR coefficients a_i from {y_t}_{t=0}^T via LS
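AR estimation by LS can be sketched as follows. This is an illustrative example, not the lecture's code: it simulates a toy AR(2) series with made-up coefficients, and, for simplicity, keeps only the rows t = q, …, T in which every regressor y_{t−1}, …, y_{t−q} is observed (one common convention; the slide's Toeplitz system instead pads the early rows with zeros).

```python
import numpy as np

# Sketch: estimate AR(q) coefficients by LS on a simulated series.
rng = np.random.default_rng(1)
a_true = np.array([1.5, -0.7])   # hypothetical stable AR(2) coefficients
q, T = 2, 2000

y = np.zeros(T + 1)
for t in range(q, T + 1):
    y[t] = a_true[0] * y[t - 1] + a_true[1] * y[t - 2] + 0.1 * rng.standard_normal()

# Stack y_t = a_1 y_{t-1} + ... + a_q y_{t-q} + v_t for t = q, ..., T:
A = np.column_stack([y[q - 1 - i:T - i] for i in range(q)])  # column i holds y_{t-1-i}
b = y[q:]

a_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(a_hat)   # close to a_true for a long enough series
```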

  13. Autoregressive Model for Time Series
     • Prediction: suppose a is known and we have the time series up to time t − 1.
       – we may predict the present from the past via

             ŷ_t = a_1 y_{t−1} + a_2 y_{t−2} + … + a_q y_{t−q}

       – we may also try to predict the future by recursively running

             ŷ_{t+d} = a_1 ŷ_{t+d−1} + a_2 ŷ_{t+d−2} + … + a_q ŷ_{t+d−q},   d = 1, 2, …

         where we denote ŷ_{t−i} = y_{t−i} for i = 1, …, q.
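The recursive prediction above is a short loop: predictions are fed back in as past values. A toy sketch, with made-up coefficients and history (not from the lecture):

```python
import numpy as np

# Sketch of d-step-ahead AR prediction: feed predictions back as inputs.
a = np.array([1.5, -0.7])            # hypothetical known AR(2) coefficients
history = [0.2, 0.5, 0.9, 1.1]       # toy observed values ..., y_{t-2}, y_{t-1}

y_hat = list(history)                # hat{y}_{t-i} = y_{t-i} for observed samples
for d in range(3):                   # predict hat{y}_t, hat{y}_{t+1}, hat{y}_{t+2}
    y_hat.append(a[0] * y_hat[-1] + a[1] * y_hat[-2])

print(y_hat[len(history):])
```

Note that prediction errors compound with d, since each step reuses earlier predictions.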

  14. Toy Demo: Predicting the Hang Seng Index
     [Figure: Hang Seng Index (×10⁴) vs. day, over roughly 60 days.]
     blue: Hang Seng Index during a certain time period.
     red: training phase; ŷ_t = Σ_{i=1}^q a_i y_{t−i}; a is obtained by LS; q = 10.
     green: prediction phase; ŷ_{t+d} = Σ_{i=1}^q a_i ŷ_{t+d−i}.

  15. Moving Average Model for Time Series
     • Moving Average (MA) model:

           y_t = b_1 v_t + b_2 v_{t−1} + … + b_p v_{t−p+1},   t = 0, 1, …

       for some coefficient vector b ∈ R^p; p is the model order; v_t is unknown but assumed to be "white."
     • not as simple as the AR case; roughly speaking, we can use this trick:

           Y(z) = B(z)V(z)  ⟹  V(z) = (1/B(z)) Y(z) = A(z)Y(z)  ⟹  convert back in time as an AR model with many a_i's,

       where X(z) denotes the z-transform of x_t and A(z) = 1/B(z).
     • one can also use an ARMA model
     • further reading: [Stoica-Moses'97]

  16. Polynomial Model for Time Series
     • Polynomial model:

           y_t = a_0 + a_1 t + a_2 t² + … + a_p t^p + v_t,   t = 0, 1, …,

       where a ∈ R^{p+1}.
       – p = 1: a line; p = 2: a quadratic; …
     • Interpolation: use a_0 + a_1 t + a_2 t² + … + a_p t^p to predict y_t for any t ∈ R
     • we have

           ⎡ y_0     ⎤   ⎡ 1    0    ⋯   0        ⎤ ⎡ a_0 ⎤   ⎡ v_0     ⎤
           ⎢  ⋮      ⎥   ⎢ ⋮    ⋮        ⋮        ⎥ ⎢ a_1 ⎥   ⎢  ⋮      ⎥
           ⎢ y_t     ⎥ = ⎢ 1    t    ⋯   t^p      ⎥ ⎢  ⋮  ⎥ + ⎢ v_t     ⎥
           ⎢  ⋮      ⎥   ⎢ ⋮    ⋮        ⋮        ⎥ ⎣ a_p ⎦   ⎢  ⋮      ⎥
           ⎣ y_{T−1} ⎦   ⎣ 1   T−1   ⋯  (T−1)^p   ⎦           ⎣ v_{T−1} ⎦
               = y                = A                  = x        = v

       – Aᵀ is Vandermonde with distinct roots 0, 1, …, T−1; thus A has full rank

  17. Curve Fitting
     Aim: given a set of input-output data pairs (x_i, y_i) ∈ R × R, i = 1, …, m, find a function f(x) that fits the data well.
     [Figure: scatter plot of the samples (x_i, y_i) over x ∈ [−1, 1].]

  18. Curve Fitting
     As with time series, we can apply a polynomial model f(x) = Σ_{i=0}^p a_i x^i and use LS.
     [Figure: "true" curve, samples, and fitted curve over x ∈ [−1, 1].]
     "True" curve: the true f(x); p = 5. Fitted curve: estimated f(x); a obtained by LS; p = 5.
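The curve-fitting recipe can be sketched in a few lines. This is a toy example (made-up degree-3 coefficients, noiseless data, not the deck's p = 5 demo): form the Vandermonde-structured matrix with rows [1, x_i, x_i², …, x_i^p] and solve for a by LS.

```python
import numpy as np

# Sketch: fit f(x) = sum_{i=0}^p a_i x^i to data (x_i, y_i) by LS.
rng = np.random.default_rng(2)
p = 3
a_true = np.array([1.0, -2.0, 0.5, 3.0])   # hypothetical true coefficients

x = rng.uniform(-1, 1, size=30)
A = x[:, None] ** np.arange(p + 1)         # A[i, j] = x_i^j (Vandermonde structure)
y = A @ a_true                             # noiseless samples of f at the x_i

a_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(a_hat, a_true))
```

With noisy samples, `a_hat` minimizes the sum of squared residuals instead of matching `a_true` exactly.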
