Spectral Analysis of Stationary Stochastic Processes
Hanxiao Liu (hanxiaol@cs.cmu.edu)
February 20, 2016
Outline
◮ Stationarity
◮ The time-frequency dual
◮ Spectral representation
◮ Marginal/conditional dependencies
◮ Inference
Stationary Stochastic Process

Strong stationarity: for all $t_1, \dots, t_k$ and $h$,
\[
(X(t_1), \dots, X(t_k)) \overset{D}{=} (X(t_1 + h), \dots, X(t_k + h)) \tag{1}
\]

Weak/2nd-order stationarity:
\[
E\big[X(t)\, X(t)^{\top}\big] < \infty \quad \forall t \tag{2}
\]
\[
E(X(t)) = \mu \quad \forall t \tag{3}
\]
\[
\operatorname{Cov}(X(t), X(t+h)) = \Gamma(h) \quad \forall t, h \tag{4}
\]
The r.h.s. does not depend on t.

◮ Γ(h): autocovariance function (marginal dependencies)
◮ Γ(0): variance (power) of X
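As a quick illustration (not from the slides), here is a minimal numpy sketch that simulates an AR(1) process, which is weakly stationary, and checks that the sample mean and lag-2 autocovariance barely change between two disjoint time windows; the coefficient, noise level, and sample size are arbitrary choices.

```python
import numpy as np

# Illustrative check of weak stationarity on a simulated AR(1) process
# x(t) = 0.7 x(t-1) + eps(t); all parameter values are made up for the example.
rng = np.random.default_rng(0)
N, a, sigma = 100_000, 0.7, 1.0
x = np.zeros(N)
eps = rng.normal(0.0, sigma, N)
for t in range(1, N):
    x[t] = a * x[t - 1] + eps[t]

def lag_cov(z, h):
    """Sample covariance between z(t) and z(t+h)."""
    return np.mean((z[:-h] - z.mean()) * (z[h:] - z.mean()))

# Mean and lag-2 covariance over two disjoint windows: both should be close to
# each other (and to the theoretical value a^2 * sigma^2 / (1 - a^2)).
for lo, hi in [(0, N // 2), (N // 2, N)]:
    w = x[lo:hi]
    print(f"window [{lo},{hi}): mean={w.mean():+.3f}  Gamma(2)={lag_cov(w, 2):.3f}")
print("theory Gamma(2):", a**2 * sigma**2 / (1 - a**2))
```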
Spectral Representation Theorem
\[
X(t) = \int_{-\pi}^{\pi} e^{i\omega t}\, dZ(\omega) \tag{5}
\]
◮ $E[dZ(\omega)\, dZ^{*}(\omega')] = 0$ if $\omega \neq \omega'$.
◮ $*$ denotes the Hermitian (conjugate) transpose.

Compared to X(t), we are more interested in Γ(h) (illustrative animations A and B).
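A hedged, finite-frequency analogue of (5): a random superposition of finitely many sinusoids with uncorrelated coefficients is stationary, and its autocovariance is the sum of the per-frequency powers times cos(ω_k h). The frequencies and variances in this sketch are made up for illustration.

```python
import numpy as np

# Finite-frequency analogue of the spectral representation (5):
# X(t) = sum_k (A_k cos(w_k t) + B_k sin(w_k t)) with uncorrelated A_k, B_k.
# Such a process is stationary with Gamma(h) = sum_k var_k * cos(w_k h).
rng = np.random.default_rng(1)
omegas = np.array([0.3, 1.1, 2.4])       # frequencies (arbitrary choices)
sig2 = np.array([1.0, 0.5, 0.25])        # "power" carried by each frequency

def sample_path(T):
    A = rng.normal(0, np.sqrt(sig2))
    B = rng.normal(0, np.sqrt(sig2))
    t = np.arange(T)
    return (A[None, :] * np.cos(np.outer(t, omegas))
            + B[None, :] * np.sin(np.outer(t, omegas))).sum(axis=1)

# Monte-Carlo estimate of Gamma(h) = E[X(0) X(h)] across many realizations.
h, reps = 5, 20_000
vals = np.array([sample_path(h + 1) for _ in range(reps)])
print("MC    Gamma(5):", np.mean(vals[:, 0] * vals[:, h]))
print("exact Gamma(5):", np.sum(sig2 * np.cos(omegas * h)))
```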
Spectral Representation Theorem
\[
\begin{aligned}
\Gamma(h) &= E\big[X(0)\, X(h)^{\top}\big] && (6)\\
&= E\Big[\int_{\omega} e^{i\omega \cdot 0}\, dZ(\omega) \int_{\omega'} e^{i\omega' h}\, dZ^{*}(\omega')\Big] && (7)\\
&= \int_{\omega}\int_{\omega'} e^{i\omega' h}\, E\big[dZ(\omega)\, dZ^{*}(\omega')\big] && (8)\\
&= \int_{\omega} e^{i\omega h}\, E\big[dZ(\omega)\, dZ^{*}(\omega)\big] && (9)\\
&= \int_{\omega} e^{i\omega h}\, s(\omega)\, d\omega && (10)
\end{aligned}
\]
Γ(h) - covariance with lag h (time domain)
s(ω) - covariance at frequency ω (frequency domain)
Spectral Density Function

The Fourier transform pair
\[
\Gamma(h) = \int_{\omega} e^{i\omega h}\, s(\omega)\, d\omega \tag{11}
\]
\[
s(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \Gamma(h)\, e^{-i\omega h} \tag{12}
\]

We call s the spectral density function, since
\[
\Gamma(0) = \int_{\omega} s(\omega)\, d\omega \tag{13}
\]
Γ(0) = Cov(X(t), X(t)) = cumulative effect of s(ω)
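A minimal numerical check of the pair (11)-(13), assuming an MA(1) process whose autocovariance is known in closed form; the values of θ and σ² are arbitrary.

```python
import numpy as np

# Numerical check of the Fourier pair (11)-(12) for an MA(1) process
# x(t) = eps(t) + theta*eps(t-1): Gamma(0) = (1+theta^2) sigma^2, Gamma(±1) = theta sigma^2.
theta, sigma2 = 0.6, 1.0
gamma = {0: (1 + theta**2) * sigma2, 1: theta * sigma2, -1: theta * sigma2}

omega = np.linspace(-np.pi, np.pi, 4001)
# (12): s(w) = (1/2pi) * sum_h Gamma(h) e^{-iwh}  (only h = -1, 0, 1 are nonzero here)
s = sum(g * np.exp(-1j * omega * h) for h, g in gamma.items()).real / (2 * np.pi)

# (13): integrating s over [-pi, pi] should recover Gamma(0)
print("integral of s:", np.sum(s) * (omega[1] - omega[0]))   # ~ 1.36
print("Gamma(0)     :", gamma[0])                            # 1.36
```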
Marginal Dependencies

Γ(h) ← sample autocovariance function
\[
\hat{\Gamma}(h) = \frac{1}{N} \sum_{t=0}^{N-h-1} \big(X(t) - \bar{X}\big)\big(X(t+h) - \bar{X}\big)^{\top} \tag{14}
\]
Asymptotic normality under mild assumptions.

s(ω) ← periodogram. Let $\omega_k = \frac{2\pi k}{N}$,
\[
I(\omega_k) = d(k)\, d(k)^{*} \;\to\; \hat{s}(\omega) \tag{15}
\]
where $d(k) := \frac{1}{\sqrt{N}} \sum_{t=0}^{N-1} X(t)\, e^{-i\omega_k t}$ is obtained via the DFT.

◮ bad estimator in general
◮ good estimator with appropriate smoothing
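A sketch of the raw periodogram and a simple moving-average (Daniell-type) smoother for a scalar AR(1) test signal; the 1/(2π) scaling is chosen here to match the convention in (12), and the signal parameters and window width are arbitrary.

```python
import numpy as np

# Raw periodogram via the FFT, plus a crude moving-average (Daniell-type) smoother.
rng = np.random.default_rng(2)
N, a = 8192, 0.7
eps = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):                      # AR(1) test signal
    x[t] = a * x[t - 1] + eps[t]

d = np.fft.fft(x) / np.sqrt(N)             # d(k) = N^{-1/2} sum_t x(t) e^{-i w_k t}
I = np.abs(d) ** 2 / (2 * np.pi)           # raw periodogram, scaled to estimate s(w_k)

def smooth(I, half_width=20):
    """Average the periodogram over 2*half_width+1 neighbouring Fourier frequencies."""
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    padded = np.concatenate([I[-half_width:], I, I[:half_width]])   # frequencies wrap around
    return np.convolve(padded, kernel, mode="valid")

omega_k = 2 * np.pi * np.arange(N) / N
s_true = 1.0 / (2 * np.pi * np.abs(1 - a * np.exp(-1j * omega_k)) ** 2)
k = N // 8                                 # inspect one frequency away from 0
print("raw       :", I[k])                 # noisy, high variance
print("smoothed  :", smooth(I)[k])         # much closer to the target
print("true s(w) :", s_true[k])
```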
Conditional Dependence

For time series i and j,
\[
X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}} \tag{16}
\]
\[
\iff \operatorname{Cov}\big(X_i(t),\, X_j(t+h) \mid X_{V \setminus \{i,j\}}\big) = 0, \quad \forall h \tag{17}
\]
\[
\iff \big(\Gamma(h)^{-1}\big)_{ij} = 0, \quad \forall h \tag{18}
\]
\[
\iff \big(s(\omega)^{-1}\big)_{ij} = 0, \quad \forall \omega \in [0, 2\pi] \tag{19}
\]

Inferring conditional dependencies
◮ = inferring Γ(h)^{-1}
◮ = inferring s(ω)^{-1}

Applicable to any stationary X
Autoregressive Gaussian Process

The autoregressive (AR) process
\[
X(t) = -\sum_{h=1}^{p} A_h X(t-h) + \epsilon(t) \tag{20}
\]
where ε(t) is Gaussian white noise, ε(t) ∼ N(0, Σ).

We'd like to parametrize s(ω)^{-1} with A
◮ Inferring conditional dependencies for AR can be cast as an optimization problem w.r.t. A
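A minimal sketch simulating a 2-dimensional AR(2) process in the sign convention of (20); the coefficient matrices and noise covariance are hypothetical values chosen only to keep the process stable.

```python
import numpy as np

# Simulate a 2-dimensional AR(2) process in the sign convention of (20):
# X(t) = -A_1 X(t-1) - A_2 X(t-2) + eps(t),  eps(t) ~ N(0, Sigma).
rng = np.random.default_rng(3)
A = [np.array([[-0.5, 0.1],
               [ 0.0, -0.4]]),              # A_1 (hypothetical)
     np.array([[ 0.2, 0.0],
               [ 0.0,  0.1]])]              # A_2 (hypothetical)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
L = np.linalg.cholesky(Sigma)

p, m, N = len(A), 2, 5000
X = np.zeros((N, m))
for t in range(p, N):
    eps = L @ rng.standard_normal(m)
    X[t] = -sum(A[h] @ X[t - 1 - h] for h in range(p)) + eps

print("sample covariance Gamma(0):\n", np.cov(X[p:].T))
```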
Filter Theorem

For any stationary X and $\{a_t\}$ s.t. $\sum_{t=-\infty}^{\infty} |a_t| < \infty$, the process $Y(t) = \sum_{h=-\infty}^{\infty} a_h X(t-h)$ is stationary with
\[
s_Y(\omega) = \big|\mathcal{A}(e^{i\omega})\big|^2\, s_X(\omega) \tag{21}
\]
where $\mathcal{A}(z) = \sum_{h=-\infty}^{\infty} a_h z^{-h}$.

In 1-d AR, $\epsilon(t) = x(t) + \sum_{h=1}^{p} a_h x(t-h)$, so $s(\omega)^{-1} = |\mathcal{A}(e^{i\omega})|^2 / \sigma^2$.

Multi-dimensional analogy:
\[
s(\omega)^{-1} = \mathcal{A}(e^{i\omega})\, \Sigma^{-1}\, \mathcal{A}(e^{i\omega})^{*} \tag{22}
\]
where $\mathcal{A}(z) = \sum_{h=0}^{p} A_h z^{-h}$, $A_0 := I$.
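A sketch evaluating (22) on a frequency grid for the hypothetical coefficients used above. An (i, j) entry that is (numerically) zero at every ω would signal a missing conditional dependency, as on the previous slide; with these particular made-up coefficients no entry vanishes.

```python
import numpy as np

# Evaluate s(w)^{-1} = A(e^{iw}) Sigma^{-1} A(e^{iw})^* on a frequency grid,
# with A(z) = sum_{h=0}^{p} A_h z^{-h}, A_0 = I.  Same hypothetical values as above.
A = [np.eye(2),
     np.array([[-0.5, 0.1], [0.0, -0.4]]),
     np.array([[ 0.2, 0.0], [0.0,  0.1]])]          # A_0, A_1, A_2
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.3], [0.3, 1.0]]))

def s_inv(omega):
    Aw = sum(A[h] * np.exp(-1j * h * omega) for h in range(len(A)))
    return Aw @ Sigma_inv @ Aw.conj().T

omegas = np.linspace(0, 2 * np.pi, 200)
vals = np.stack([s_inv(w) for w in omegas])          # shape (200, 2, 2)

# max_w |(s(w)^{-1})_{ij}|: an entry that is ~0 for ALL w signals X_i independent of X_j given the rest
print(np.round(np.abs(vals).max(axis=0), 3))
```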
Parametrized Spectral Density

Parametrize s(ω)^{-1} by AR parameters:
\[
\begin{aligned}
s(\omega)^{-1} &= \Big(\sum_{h=0}^{p} A_h e^{-ih\omega}\Big)^{*} \Sigma^{-1} \Big(\sum_{h=0}^{p} A_h e^{-ih\omega}\Big) && (23)\\
&= Y_0 + \frac{1}{2} \sum_{h=1}^{p} \big(e^{-ih\omega}\, Y_h + e^{ih\omega}\, Y_h^{\top}\big) && (24)
\end{aligned}
\]
where $Y_0 = \sum_{h=0}^{p} A_h^{\top} \Sigma^{-1} A_h$ and $Y_h = 2 \sum_{i=0}^{p-h} A_i^{\top} \Sigma^{-1} A_{i+h}$.

$B_h \overset{\text{def}}{=} \Sigma^{-1/2} A_h \;\Rightarrow\; Y_0 = \sum_{h=0}^{p} B_h^{\top} B_h$, $\; Y_h = 2 \sum_{i=0}^{p-h} B_i^{\top} B_{i+h}$

$(s(\omega)^{-1})_{ij} = 0 \iff (Y_h)_{ij} = (Y_h)_{ji} = 0$ for all $h = 0, \dots, p$, i.e. linear constraints over Y ⟺ quadratic constraints over B.
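A quick numerical check, under the same hypothetical coefficients as above, that the Y_h form (24) agrees with the direct product (23) when B_h = Σ^{-1/2} A_h.

```python
import numpy as np
from scipy.linalg import sqrtm

# Check numerically that the Y_h parametrization (24) reproduces the direct product (23).
A = [np.eye(2),
     np.array([[-0.5, 0.1], [0.0, -0.4]]),
     np.array([[ 0.2, 0.0], [0.0,  0.1]])]
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
p = len(A) - 1

Sig_inv_half = np.real(sqrtm(np.linalg.inv(Sigma)))              # Sigma^{-1/2}
B = [Sig_inv_half @ A[h] for h in range(p + 1)]                  # B_h = Sigma^{-1/2} A_h
Y0 = sum(B[h].T @ B[h] for h in range(p + 1))
Y = [2 * sum(B[i].T @ B[i + h] for i in range(p + 1 - h)) for h in range(1, p + 1)]

def s_inv_direct(w):          # (23)
    Aw = sum(A[h] * np.exp(-1j * h * w) for h in range(p + 1))
    return Aw.conj().T @ np.linalg.inv(Sigma) @ Aw

def s_inv_Y(w):               # (24)
    return Y0 + 0.5 * sum(np.exp(-1j * h * w) * Y[h - 1]
                          + np.exp(1j * h * w) * Y[h - 1].T for h in range(1, p + 1))

w = 0.73
print(np.allclose(s_inv_direct(w), s_inv_Y(w)))   # True
```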
Conditional MLE

Simplification: fix x(1), ..., x(p).
\[
\begin{aligned}
\epsilon(t) &= \sum_{h=0}^{p} A_h x(t-h) && (25)\\
&= [A_0, \dots, A_p]
\begin{bmatrix} x(t) \\ x(t-1) \\ \vdots \\ x(t-p) \end{bmatrix}
:= \mathbf{A}\,\mathbf{x}(t) \;\sim\; \mathcal{N}(0, \Sigma) && (26)
\end{aligned}
\]

A least-squares estimate. The likelihood is
\[
\text{Likelihood}
= \frac{e^{-\frac{1}{2}\sum_{t=p+1}^{N} \mathbf{x}(t)^{\top} \mathbf{A}^{\top} \Sigma^{-1} \mathbf{A}\, \mathbf{x}(t)}}{(2\pi)^{\frac{m(N-p)}{2}} (\det \Sigma)^{\frac{N-p}{2}}}
\;\overset{\mathbf{B} = \Sigma^{-1/2}\mathbf{A}}{=}\;
\frac{e^{-\frac{1}{2}\sum_{t=p+1}^{N} \mathbf{x}(t)^{\top} \mathbf{B}^{\top} \mathbf{B}\, \mathbf{x}(t)}}{(2\pi)^{\frac{m(N-p)}{2}} (\det B_0)^{\,p-N}} \tag{27}
\]
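A sketch of the conditional (least-squares) estimate: regress x(t) on the stacked lags, so the fitted coefficient blocks are -A_1, ..., -A_p and the residual covariance estimates Σ. The simulated data and all constants are illustrative, not from the slides.

```python
import numpy as np

# Least-squares (conditional ML) estimate of the AR coefficients: regress x(t)
# on the stacked lags [x(t-1); ...; x(t-p)].  In the convention of (25), A_0 = I
# and the regression coefficient blocks are -A_1, ..., -A_p.
rng = np.random.default_rng(4)
A_true = [np.array([[-0.5, 0.1], [0.0, -0.4]]),
          np.array([[ 0.2, 0.0], [0.0,  0.1]])]
p, m, N = 2, 2, 20_000
X = np.zeros((N, m))
for t in range(p, N):
    X[t] = -sum(A_true[h] @ X[t - 1 - h] for h in range(p)) + rng.standard_normal(m)

T = np.array([np.concatenate([X[t - h] for h in range(1, p + 1)]) for t in range(p, N)])
coef, *_ = np.linalg.lstsq(T, X[p:], rcond=None)            # solves T @ coef ≈ x(t)
A_hat = [-coef[h * m:(h + 1) * m].T for h in range(p)]       # A_h = -(h-th block)^T
resid = X[p:] - T @ coef
Sigma_hat = resid.T @ resid / (N - p)                        # noise covariance estimate

print("A_1 estimate:\n", np.round(A_hat[0], 2))              # ~ A_true[0]
print("Sigma estimate:\n", np.round(Sigma_hat, 2))           # ~ identity
```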
Regularized ML

Maximize the log-likelihood:
\[
\min_{\mathbf{B}} \; -2\log\det B_0 + \operatorname{tr}\big(C \mathbf{B}^{\top} \mathbf{B}\big) \tag{28}
\]
Solution given by the Yule-Walker equations.

Enforcing sparsity over s(ω)^{-1}:
\[
\min_{\mathbf{B}} \; -2\log\det B_0 + \operatorname{tr}\big(C \mathbf{B}^{\top} \mathbf{B}\big) + \gamma \big\| D(\mathbf{B}^{\top} \mathbf{B}) \big\|_1 \tag{29}
\]

Convex relaxation:
\[
\min_{Z \succeq 0} \; -\log\det Z_{00} + \operatorname{tr}(C Z) + \gamma \| D(Z) \|_1 \tag{30}
\]
◮ Exact if rank(Z*) ≤ m
◮ Bregman divergence + ℓ_1-regularization. Well studied.
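A hedged sketch of the relaxation (30) in CVXPY. The construction of C as a stacked-lag sample covariance, the reassembly of the Y_h blocks from Z, and the choice of D(·) as the off-diagonal entries of each Y_h are assumptions made here for illustration; see Songsiri and Vandenberghe (2010) for the exact formulation.

```python
import numpy as np
import cvxpy as cp

# Sketch of the convex relaxation (30).  Z stands in for B^T B, its top-left
# m x m block is Z_00, and D(Z) is taken to be the off-diagonal entries of the
# block sums Y_h reassembled from Z (an assumption for this example).
m, p, gamma = 3, 2, 0.1
n = (p + 1) * m
rng = np.random.default_rng(5)
data = rng.standard_normal((500, n))          # placeholder stacked-lag samples
C = data.T @ data / 500.0

Z = cp.Variable((n, n), PSD=True)

def Y(h):
    """Y_h = (2 if h > 0 else 1) * sum_i Z_{i, i+h}, mirroring the previous slide with Z = B^T B."""
    S = sum(Z[i*m:(i+1)*m, (i+h)*m:(i+h+1)*m] for i in range(p + 1 - h))
    return S if h == 0 else 2 * S

offdiag = np.ones((m, m)) - np.eye(m)         # mask selecting off-diagonal entries
penalty = sum(cp.sum(cp.abs(cp.multiply(offdiag, Y(h)))) for h in range(p + 1))

objective = -cp.log_det(Z[:m, :m]) + cp.trace(C @ Z) + gamma * penalty
problem = cp.Problem(cp.Minimize(objective))
problem.solve()                               # SCS (shipped with CVXPY) handles log_det
print("optimal value:", problem.value)
```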
Non-stationary Extensions

With stationarity,
\[
s(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \Gamma(h)\, e^{-i\omega h} \tag{31}
\]

Without stationarity? The Wigner-Ville spectrum:
\[
s(t, \omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \Gamma\Big(t + \tfrac{h}{2},\, t - \tfrac{h}{2}\Big)\, e^{-i\omega h} \tag{32}
\]

Other types of power spectra
◮ Rihaczek spectrum
◮ (Generalized) evolutionary spectrum
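A crude single-realization analogue of (32), the discrete pseudo Wigner-Ville distribution; the spectrum in (32) is its expectation over realizations. The lag window, normalization, and the chirp test signal below are all assumptions made for illustration.

```python
import numpy as np

# Discrete pseudo Wigner-Ville distribution:
# W(t, w) ~ (1/2pi) sum_h x(t+h) conj(x(t-h)) e^{-2 i w h}, lag range truncated by a window.
def wigner_ville(x, omegas, half_window=32):
    N = len(x)
    W = np.zeros((N, len(omegas)))
    for t in range(N):
        hmax = min(half_window, t, N - 1 - t)
        hs = np.arange(-hmax, hmax + 1)
        prod = x[t + hs] * np.conj(x[t - hs])
        W[t] = (prod @ np.exp(-2j * np.outer(hs, omegas))).real / (2 * np.pi)
    return W

# A complex chirp whose instantaneous frequency 0.004*t drifts upward over time.
t = np.arange(512)
x = np.exp(1j * 0.002 * t**2)
W = wigner_ville(x, omegas=np.linspace(0, np.pi, 128))
print(W.shape)                                 # (512, 128): time x frequency grid
# The peak frequency index grows with t, tracking the chirp's rising frequency.
print(np.argmax(W[100]), np.argmax(W[400]))
```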
Reference I

Bach, F. R. and Jordan, M. I. (2004). Learning graphical models for stationary time series. IEEE Transactions on Signal Processing, 52(8):2189–2199.

Basu, S., Michailidis, G., et al. (2015). Regularized estimation in sparse high-dimensional time series models. The Annals of Statistics, 43(4):1535–1567.

Matz, G. and Hlawatsch, F. (2003). Time-varying power spectra of nonstationary random processes.

Pereira, J., Ibrahimi, M., and Montanari, A. (2010). Learning networks of stochastic differential equations. In Advances in Neural Information Processing Systems, pages 172–180.

Songsiri, J., Dahl, J., and Vandenberghe, L. (2010). Graphical models of autoregressive processes. Convex Optimization in Signal Processing and Communications, pages 89–116.
Reference II

Songsiri, J. and Vandenberghe, L. (2010). Topology selection in graphical models of autoregressive processes. The Journal of Machine Learning Research, 11:2671–2705.

Tank, A., Foti, N. J., and Fox, E. B. (2015). Bayesian structure learning for stationary time series. In Uncertainty in Artificial Intelligence (UAI 2015), Amsterdam, The Netherlands, pages 872–881.