High-dimensional Time Series Models
George Michailidis, University of Florida
Transdisciplinary Foundations of Data Science, IMA, September 2016

Introduction

Learning Tasks with Temporally Dependent Data
Predictive inference, forecasting, segmentation, covariance estimation/graphical modeling
- Regression models: $y_t = X_t \beta + \epsilon_t$, where the $p$-dimensional predictors $X_t$ and the error term $\epsilon_t$ are generated by stationary processes
- Autoregressive models: $X_t = A X_{t-1} + E_t$, where the $p$-dimensional error process $E_t$ is white noise
- Related control problem: $X_t = A X_{t-1} + B U_t + E_t$, together with a cost/performance function
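
As a rough illustration of the autoregressive model $X_t = A X_{t-1} + E_t$, the sketch below simulates one path with Gaussian white noise. The function name `simulate_var1`, the zero initial state, and the example transition matrix are assumptions made for illustration only.

```python
import numpy as np

def simulate_var1(A, T, noise_std=1.0, seed=0):
    """Simulate X_t = A X_{t-1} + E_t with Gaussian white noise E_t.

    Returns an array of shape (T + 1, p) whose row t holds X_t.
    X_0 is set to zero (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    X = np.zeros((T + 1, p))
    for t in range(1, T + 1):
        X[t] = A @ X[t - 1] + noise_std * rng.standard_normal(p)
    return X

# Hypothetical 2-dimensional transition matrix.
A = np.array([[0.5, 0.2],
              [0.0, 0.4]])
X = simulate_var1(A, T=200)
print(X.shape)
```

Setting `noise_std=0.0` reduces the recursion to the deterministic part $A X_{t-1}$, which is a quick way to sanity-check the loop.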

Learning Tasks with Temporally Dependent Data (continued)
Predictive inference, forecasting, segmentation, covariance estimation/graphical modeling
- Factor models: $X_t = \Lambda F_t + E_t$, where $X_t$ is a $p$-dimensional process, $F_t$ a $k$-dimensional latent factor process, and $E_t$ a noise process
- A popular model in the economics/finance literature lets the factors evolve dynamically over time, e.g. $F_t = \Phi F_{t-1} + U_t$
- Structural breaks: given a multivariate time series $X_t$, identify the points in time at which the structure of the model changes, e.g. $X_t = A_1 X_{t-1} I(t \le \tau) + A_2 X_{t-1} I(t > \tau) + E_t$ for some $\tau \in [0, T]$
- There is an online version of the problem for streaming data
- Estimate the covariance matrix of temporally dependent data
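
To make the structural-break model concrete, the sketch below simulates a VAR(1) whose transition matrix switches from $A_1$ to $A_2$ after time $\tau$, then locates the break by scanning candidate points and fitting least squares on each side. The scan is only one simple illustrative approach, not a method taken from these slides; the function names, the minimum segment length `min_len`, and the example matrices are assumptions.

```python
import numpy as np

def simulate_break_var1(A1, A2, tau, T, seed=0):
    """Simulate a VAR(1) using A1 for t <= tau and A2 for t > tau, with N(0, I) noise."""
    rng = np.random.default_rng(seed)
    p = A1.shape[0]
    X = np.zeros((T + 1, p))
    for t in range(1, T + 1):
        A = A1 if t <= tau else A2
        X[t] = A @ X[t - 1] + rng.standard_normal(p)
    return X

def segment_sse(X, start, end):
    """Residual sum of squares of a least-squares VAR(1) fit on targets X_start..X_end."""
    Y = X[start:end + 1]          # targets X_t
    Z = X[start - 1:end]          # predictors X_{t-1}
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return float(np.sum((Y - Z @ B) ** 2))

def estimate_break(X, min_len=20):
    """Return the candidate break point that minimizes the total two-segment SSE."""
    T = X.shape[0] - 1
    candidates = range(min_len, T - min_len + 1)
    sse = [segment_sse(X, 1, tau) + segment_sse(X, tau + 1, T) for tau in candidates]
    return candidates[int(np.argmin(sse))]

# Hypothetical example: the dynamics flip sign after t = 120.
A1 = 0.8 * np.eye(2)
A2 = -0.8 * np.eye(2)
X = simulate_break_var1(A1, A2, tau=120, T=300)
tau_hat = estimate_break(X)
print(tau_hat)
```

The exhaustive scan costs one pair of regressions per candidate, which is fine for a single break in low dimensions but would need regularization and a smarter search in the high-dimensional setting these slides target.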

Application areas
- Macroeconomics/Finance
- Functional Genomics
- Neuroscience
- Control of large networks

Application areas: Economics
- Testing the relationship between money and income (Sims, 1972, 1980)
- Understanding the stock price-volume relation (Hiemstra et al., 1994)
- Dynamic effect of government spending and taxes on output (Blanchard and Jones, 2002)
- Identifying and measuring the effects of monetary policy innovations on macroeconomic variables (Bernanke et al., 2005)

Forecasting models in Economics
[Figure: time series of Employment, Federal Funds Rate, and Consumer Price Index, February 1960 to August 1974]

Application areas: Functional Genomics
- Identify regulatory mechanisms from time course data (panel data structure)
[Figure: HeLa gene expression regulatory network. From: Fujita et al., 2007]

Application areas: Neuroscience
- Identify brain connectivity regions

Need for high-dimensional models
- Economics: forecasting with many predictors (De Mol et al., 2008) or understanding structural relationships (Christiano et al., 1999)
- Finance: building large-scale systemic risk models
- Functional Genomics: reconstructing gene regulatory networks from limited experimental data
- Neuroscience: building detailed connectivity maps from temporal data exhibiting multiple structural changes
- Network control: for large sparse systems (Liu, Slotine, Barabasi, 2011)

Key issues
- Nature of the data measurements: numerical, count, binary (see Raskutti et al., 2016, for models for count data)
- Capturing the correct dynamics (see Chen and Shojaie, 2016, for models for self-exciting processes)
- How does the temporal dependence impact estimation and prediction accuracy?

Illustration of estimation accuracy

Modeling Framework

Vector Autoregression
- Canonical model for understanding lead-lag cross-dependencies
- Successful for forecasting and for intervention analysis (impulse response)
- Poses a number of technical challenges in high dimensions

The VAR Model
- $p$-dimensional, discrete-time, stationary process $X_t = \{X_{t,1}, \dots, X_{t,p}\}$

$$X_t = A_1 X_{t-1} + \dots + A_d X_{t-d} + \epsilon_t, \qquad \epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \Sigma_\epsilon) \tag{1}$$

- $A_1, \dots, A_d$: $p \times p$ transition matrices (solid, directed edges)
- $\Sigma_\epsilon^{-1}$: contemporaneous dependence (dotted, undirected edges)
- Stability: with $A(z) := I_p - \sum_{t=1}^{d} A_t z^t$, the roots of $\det A(z)$ lie outside $\{z \in \mathbb{C} : |z| \le 1\}$
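
The stability condition can be checked numerically through the standard companion-form equivalence: the roots of $\det(I_p - \sum_{t=1}^{d} A_t z^t)$ lie outside the closed unit disk exactly when all eigenvalues of the $dp \times dp$ companion matrix have modulus below 1. The sketch below assumes this route; the helper names `companion_matrix` and `is_stable` and the example matrices are illustrative.

```python
import numpy as np

def companion_matrix(A_list):
    """Stack transition matrices A_1, ..., A_d into the (d*p) x (d*p) companion matrix."""
    d = len(A_list)
    p = A_list[0].shape[0]
    top = np.hstack(A_list)                  # first block row: [A_1 ... A_d]
    if d == 1:
        return top
    shift = np.eye(p * (d - 1), p * d)       # [I 0]: carries lagged blocks forward
    return np.vstack([top, shift])

def is_stable(A_list):
    """Check that every companion eigenvalue has modulus strictly below 1."""
    eigenvalues = np.linalg.eigvals(companion_matrix(A_list))
    return bool(np.max(np.abs(eigenvalues)) < 1.0)

I2 = np.eye(2)
print(is_stable([0.5 * I2, 0.3 * I2]))   # True: largest modulus is about 0.85
print(is_stable([0.5 * I2, 0.6 * I2]))   # False: largest modulus is about 1.06
```

For large $dp$ the dense eigenvalue computation becomes the bottleneck, so in practice one might only estimate the spectral radius.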

Detour: VARs and Granger Causality
- Concept introduced by Granger (1969)
- A time series $X$ is said to Granger-cause $Y$ if it can be shown, usually through a series of F-tests on lagged values of $X$ (with lagged values of $Y$ also included), that those $X$ values provide statistically significant information about future values of $Y$
- In the context of a high-dimensional VAR model, $X_{T-t,j}$ is Granger-causal for $X_{T,i}$ if $(A_t)_{i,j} \neq 0$
- Granger causality does not imply true causality; it is built on correlations
- It is also related to estimating a Directed Acyclic Graph (DAG) with $(d + 1) \times p$ variables and a known ordering of the variables
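
Under the VAR reading of Granger causality, the causal graph can be read directly off the nonzero pattern of the transition matrices. The sketch below assumes the convention that entry $(i, j)$ of $A_t$ multiplies component $j$ at lag $t$ in the equation for component $i$; the function name `granger_edges` and the tolerance `tol` are illustrative.

```python
import numpy as np

def granger_edges(A_list, tol=1e-8):
    """Return directed edges (j, i): series j Granger-causes series i at some lag."""
    edges = set()
    for A in A_list:
        rows, cols = np.nonzero(np.abs(A) > tol)
        # Entry (i, j) of A_t links component j at lag t to component i now.
        edges.update((int(j), int(i)) for i, j in zip(rows, cols))
    return edges

# Hypothetical 3-variable VAR(2) with sparse transition matrices.
A1 = np.array([[0.5, 0.0, 0.0],
               [0.3, 0.4, 0.0],
               [0.0, 0.0, 0.2]])
A2 = np.zeros((3, 3))
A2[2, 1] = -0.1
print(sorted(granger_edges([A1, A2])))
```

With estimated rather than true matrices, the tolerance plays the role of a hard threshold on small coefficients.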

Estimating high-dimensional VARs through regression
- Data: $\{X_0, X_1, \dots, X_T\}$, one replicate, observed at $T + 1$ time points
- Construct the autoregression

$$
\underbrace{\begin{bmatrix} (X_T)' \\ (X_{T-1})' \\ \vdots \\ (X_d)' \end{bmatrix}}_{\mathcal{Y}}
=
\underbrace{\begin{bmatrix}
(X_{T-1})' & (X_{T-2})' & \cdots & (X_{T-d})' \\
(X_{T-2})' & (X_{T-3})' & \cdots & (X_{T-1-d})' \\
\vdots & \vdots & \ddots & \vdots \\
(X_{d-1})' & (X_{d-2})' & \cdots & (X_0)'
\end{bmatrix}}_{\mathcal{X}}
\underbrace{\begin{bmatrix} A_1' \\ \vdots \\ A_d' \end{bmatrix}}_{B^*}
+
\underbrace{\begin{bmatrix} (\epsilon_T)' \\ (\epsilon_{T-1})' \\ \vdots \\ (\epsilon_d)' \end{bmatrix}}_{E}
$$

$$\operatorname{vec}(\mathcal{Y}) = \operatorname{vec}(\mathcal{X} B^*) + \operatorname{vec}(E) = (I \otimes \mathcal{X}) \operatorname{vec}(B^*) + \operatorname{vec}(E)$$

$$\underbrace{Y}_{Np \times 1} = \underbrace{Z}_{Np \times q} \, \underbrace{\beta^*}_{q \times 1} + \underbrace{\operatorname{vec}(E)}_{Np \times 1}, \qquad \operatorname{vec}(E) \sim N(0, \Sigma_\epsilon \otimes I)$$

with $N = T - d + 1$ and $q = d p^2$.
- Key assumption: the $A_t$ are sparse, $\sum_{t=1}^{d} \|A_t\|_0 \le k$
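
The construction above can be sketched in a few lines: stack the targets $X_T, \dots, X_d$ as rows of a response matrix, stack the $d$ lagged copies side by side as the design matrix, and obtain the vectorized form with a Kronecker product. The function names `build_var_regression` and `vectorize_regression` are assumptions for illustration, and `vec` is taken to be column stacking.

```python
import numpy as np

def build_var_regression(X, d):
    """Arrange observations X_0..X_T (rows of X) into Y = Xmat @ B + E.

    Row k of Y is X_{T-k}; row k of Xmat is [X_{T-k-1}', ..., X_{T-k-d}'].
    """
    T = X.shape[0] - 1
    N = T - d + 1
    idx = T - np.arange(N)                        # time indices T, T-1, ..., d
    Y = X[idx]                                    # N x p response matrix
    Xmat = np.hstack([X[idx - lag] for lag in range(1, d + 1)])   # N x (d*p)
    return Y, Xmat

def vectorize_regression(Y, Xmat):
    """Return (Z, y) with Z = I_p kron Xmat and y = vec(Y), using column stacking."""
    p = Y.shape[1]
    Z = np.kron(np.eye(p), Xmat)                  # (N*p) x (d*p^2)
    y = Y.reshape(-1, order="F")
    return Z, y

# Small example with random data: p = 3 series, T + 1 = 11 time points, d = 2 lags.
rng = np.random.default_rng(0)
X = rng.standard_normal((11, 3))
Y, Xmat = build_var_regression(X, d=2)
Z, y = vectorize_regression(Y, Xmat)
print(Y.shape, Xmat.shape, Z.shape, y.shape)
```

Forming $Z$ explicitly is wasteful because of its block-diagonal structure; since the least-squares loss separates across the $p$ columns of the response, the matrix form is usually the one to work with in practice.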

Estimation Methods
- $\ell_1$-penalized least squares ($\ell_1$-LS):

$$\operatorname*{argmin}_{\beta \in \mathbb{R}^q} \; \frac{1}{N} \|Y - Z\beta\|^2 + \lambda_N \|\beta\|_1$$

- $\ell_1$-penalized log-likelihood ($\ell_1$-LL) (Davis et al., 2012):

$$\operatorname*{argmin}_{\beta \in \mathbb{R}^q} \; \frac{1}{N} (Y - Z\beta)' \left( \Sigma_\epsilon^{-1} \otimes I \right) (Y - Z\beta) + \lambda_N \|\beta\|_1$$
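
A minimal sketch of the $\ell_1$-LS estimator follows, written in matrix form: it minimizes $\frac{1}{N}\|\mathcal{Y} - \mathcal{X}B\|_F^2 + \lambda \|B\|_1$, which matches the vectorized objective when $Z = I \otimes \mathcal{X}$. The solver is plain proximal gradient descent (ISTA), chosen here for brevity and not taken from the slides; the function names, iteration count, and test design are assumptions.

```python
import numpy as np

def soft_threshold(V, thresh):
    """Entrywise soft-thresholding, the proximal operator of thresh * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - thresh, 0.0)

def lasso_var(Y, Xmat, lam, n_iter=500):
    """Minimize (1/N) * ||Y - Xmat @ B||_F^2 + lam * ||B||_1 by proximal gradient."""
    N = Y.shape[0]
    B = np.zeros((Xmat.shape[1], Y.shape[1]))
    # Lipschitz constant of the gradient of the smooth part: 2 * sigma_max^2 / N.
    lipschitz = 2.0 * np.linalg.norm(Xmat, 2) ** 2 / N
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = -2.0 / N * Xmat.T @ (Y - Xmat @ B)
        B = soft_threshold(B - step * grad, step * lam)
    return B

# Hypothetical sparse regression: 3 active coefficients out of 12.
rng = np.random.default_rng(0)
Xmat = rng.standard_normal((200, 6))
B_true = np.zeros((6, 2))
B_true[0, 0], B_true[3, 1], B_true[5, 0] = 0.8, -0.6, 0.5
Y = Xmat @ B_true + 0.1 * rng.standard_normal((200, 2))
B_hat = lasso_var(Y, Xmat, lam=0.1)
print(np.round(B_hat, 2))
```

Rows of `B_hat` follow the stacked layout $[A_1'; \dots; A_d']$, so the estimate of $A_t$ is the transpose of the $t$-th $p \times p$ block when `Xmat` comes from a VAR design.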

$\ell_1$-LL Algorithm
- The objective function is jointly non-convex, but convex in $B$ for fixed $\Sigma_\epsilon^{-1}$ and convex in $\Sigma_\epsilon^{-1}$ for fixed $B$
- The algorithm converges to a stationary point near the truth with high probability under high-dimensional scaling, provided it is initialized at a good point (details in Lin et al., 2016)