Analysis of Big Dependent Data in Economics and Finance

Ruey S. Tsay
Booth School of Business, University of Chicago
September 2016
Outline

1. Big data? Machine learning? Data science? What is in it for economics and finance?
2. Real-world data are often dynamically dependent
3. A simple example: methods for independent data may fail
4. Trade-off between simplicity and reality
5. Some methods useful for analyzing big dependent data in economics and finance
6. Examples
7. Concluding remarks
Big dependent data

1. Accurate information is the key to success in the competitive global economy. We live in the information age.
2. What is big data? High dimension (many variables)? Large sample size? Both?
3. Not all big data sets are useful: confounding & noise
4. Need to develop methods to efficiently extract useful information from big data
5. Know the limitations of big data
6. Issues emerging from big data: privacy? ethical issues?
7. Focus here: methods for analyzing big dependent data in economics and finance
What are available?

Statistical methods focus on sparsity (simplicity):
1. Various penalized regressions, e.g., Lasso and its extensions
2. Various dimension-reduction methods and models
3. Common framework used: independent observations, with limited extensions to stationary data

Real data are often dynamically dependent!

Some useful concepts in analyzing big data:
1. Parsimony vs. sparsity: parsimony does not imply sparsity (Parsimony ⇏ Sparsity; see the next slide)
2. Simplicity vs. reality: trade-off between feasibility & sophistication
Parsimonious, not sparse

A simple example:
$$y_t = c + \sum_{i=1}^{k} \beta x_{it} + \epsilon_t = c + \beta \sum_{i=1}^{k} x_{it} + \epsilon_t,$$
where $k$ is large, the $x_{it}$ are not perfectly correlated, and the $\epsilon_t$ are iid $N(0, \sigma^2)$.

The model has three parameters, so it is parsimonious, but it is not sparse because $y_t$ depends on all $k$ explanatory variables. In some applications, $\sum_{i=1}^{k} x_{it}$ is a close approximation to the first principal component. For example, the level of interest rates is important to an economy. The fused lasso can handle this difficulty in some situations.
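A minimal R sketch of this point (hypothetical, not from the talk): the single-coefficient model is fit exactly by OLS on the constructed regressor $\sum_i x_{it}$, while a plain lasso has no sparsity to exploit and tends to keep a large fraction of the $k$ variables.

```r
## Sketch (assumed seed and dimensions): a parsimonious but non-sparse model.
## y_t depends on ALL k regressors through a single coefficient beta.
set.seed(1)
T <- 200; k <- 50
X <- matrix(rnorm(T * k), T, k)         # not perfectly correlated regressors
c0 <- 1; beta <- 0.3
y <- c0 + beta * rowSums(X) + rnorm(T, sd = 0.5)

## Parsimonious fit: regress y on the single constructed regressor sum_i(x_it)
fit <- lm(y ~ rowSums(X))
coef(fit)                               # recovers c0 and beta with 2 coefficients

## A plain lasso must instead spread weight over many of the k variables
library(glmnet)
lasso <- cv.glmnet(X, y)
sum(coef(lasso, s = "lambda.min") != 0) # number of nonzero coefficients kept
```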
What is LASSO regression?

Model (assume the data are mean-adjusted):
$$y_i = \sum_{j=1}^{p} \beta_j X_{j,i} + \epsilon_i.$$

Matrix form, with $X$ the design matrix:
$$Y = X\beta + \epsilon.$$

Objective function (useful in particular when $p > T$):
$$\hat{\beta}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_1 \right),$$
where $\lambda \geq 0$ is a penalty parameter, $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$, and $\|Y - X\beta\|_2^2 = \sum_{i=1}^{T} (y_i - X_i'\beta)^2$.
What is the big deal? Sparsity

Using convexity, LASSO is equivalent to the constrained problem
$$\hat{\beta}_{\mathrm{opt}}(R) = \arg\min_{\beta;\ \|\beta\|_1 \leq R} \|Y - X\beta\|_2^2 / T.$$

Old friend: ridge regression,
$$\hat{\beta}_{\mathrm{Ridge}}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_2^2 \right), \quad \text{or} \quad \hat{\beta}(R) = \arg\min_{\beta;\ \|\beta\|_2^2 \leq R} \|Y - X\beta\|_2^2 / T.$$

Special case $p = 2$: $\|Y - X\beta\|_2^2 / T$ is quadratic, the constraint region $\|\beta\|_1 \leq R$ is a diamond, while $\|\beta\|_2^2 \leq R$ is a disk. The quadratic contours typically touch the diamond at a corner, where some coefficients are exactly zero. Thus LASSO leads to sparsity; ridge does not.
Computation and extensions

1. Optimization: least angle regression (LARS) by Efron et al. (2004) makes the computation very efficient.
2. Extensions:
   - Group lasso: Yuan and Lin (2006). Subsets of $X$ have a specific meaning, e.g., treatment.
   - Elastic net: Zou and Hastie (2005). Uses a combination of $L_1$ and $L_2$ penalties.
   - SCAD: Fan and Li (2001). Nonconcave penalized likelihood. [Smoothly clipped absolute deviation.]
   - Various Bayesian methods: the penalty function plays the role of the prior.
3. Packages available in R: lars, glmnet, gamlr, gbm, and many others (see the sketch below).
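A minimal elastic-net sketch with glmnet (an assumed example, not from the slides): the alpha argument mixes the two penalties, with alpha = 1 giving the lasso fit and alpha = 0 the ridge fit.

```r
## Elastic net in glmnet: penalty is (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
library(glmnet)
set.seed(2)
X <- matrix(rnorm(100 * 20), 100, 20)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(100)

fit_lasso <- glmnet(X, y, alpha = 1)    # pure lasso (L1 penalty)
fit_ridge <- glmnet(X, y, alpha = 0)    # pure ridge (L2 penalty)
fit_enet  <- glmnet(X, y, alpha = 0.5)  # elastic net: equal mix of the two
plot(fit_enet, xvar = "lambda")         # coefficient paths versus log(lambda)
```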
A simulated example

$p = 300$, $T = 150$, $x_{j,i}$ iid $N(0, 1)$, $\epsilon_i$ iid $N(0, 0.25)$.
$$y_i = x_{3,i} + 2(x_{4,i} + x_{5,i} + x_{7,i}) - 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + \epsilon_i$$

1. How? R demonstration (a sketch follows below)
2. Selection of $\lambda$? Cross-validation (10-fold), measuring prediction accuracy
3. The commands lars and cv.lars of the package lars
4. The commands glmnet and cv.glmnet of the package glmnet
5. Relationship between the two packages: glmnet with alpha = 1 computes the same lasso path as lars
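A hypothetical reconstruction of the R demonstration (the seed and exact calls are assumptions, not the author's script):

```r
## Simulate the example: 9 active variables out of p = 300, T = 150 observations
library(lars)
library(glmnet)
set.seed(3)
p <- 300; T <- 150
X <- matrix(rnorm(T * p), T, p)
beta <- rep(0, p)
beta[3] <- 1
beta[c(4, 5, 7)] <- 2
beta[c(11, 12, 13, 21, 22, 30)] <- -2
y <- drop(X %*% beta) + rnorm(T, sd = 0.5)   # sd = sqrt(0.25)

## lars: full lasso path, then 10-fold cross-validation over the path
fit1 <- lars(X, y, type = "lasso")
cv1  <- cv.lars(X, y, K = 10)

## glmnet: alpha = 1 reproduces the lasso; cv.glmnet picks lambda by CV error
fit2 <- glmnet(X, y, alpha = 1)
cv2  <- cv.glmnet(X, y, nfolds = 10)
which(coef(cv2, s = "lambda.min")[-1] != 0)  # indices of selected variables
```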
Lasso may fail for dependent data

1. Data-generating model: scalar Gaussian autoregressive model of order 3, AR(3),
$$x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t, \quad a_t \sim N(0, 1).$$
Generate 2000 observations. See Figure 1.
2. Big data setup:
   - Dependent variable: $x_t$, $t = 11, \ldots, 2000$
   - Regressors: $X_t = [x_{t-1}, x_{t-2}, \ldots, x_{t-10}, z_{1t}, \ldots, z_{10,t}]$, where the $z_{it}$ are iid $N(0, 1)$. Dimension = 20, sample size 1990.
3. Run the lasso regression via the lars package of R. See Figure 2 for results. Lag 3, $x_{t-3}$, was not selected. Lasso fails in this case.
Figure 1: Time plot of the simulated AR(3) time series with 2000 observations.
Figure 2: Lasso coefficient paths for the AR(3) series (standardized coefficients versus $|\beta|/\max|\beta|$).
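A sketch reproducing the experiment (the seed and the handling of start-up values are assumptions):

```r
## Simulate the AR(3) with a double unit root and run lasso on 10 lags + 10 noises
library(lars)
set.seed(4)
n <- 2000
a <- rnorm(n + 3)
x <- numeric(n + 3)
for (t in 4:(n + 3)) {
  x[t] <- 1.9 * x[t - 1] - 0.8 * x[t - 2] - 0.1 * x[t - 3] + a[t]
}
x <- x[-(1:3)]                          # drop the three start-up values

## Response: x_t for t = 11, ..., 2000; regressors: 10 lags plus 10 noise series
y  <- x[11:n]
Xl <- sapply(1:10, function(k) x[(11 - k):(n - k)])  # x_{t-1}, ..., x_{t-10}
Z  <- matrix(rnorm(length(y) * 10), length(y), 10)   # z_{1t}, ..., z_{10,t}
Xt <- cbind(Xl, Z)

fit <- lars(Xt, y, type = "lasso")
plot(fit)                 # coefficient paths; lag 3 is typically never selected
```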
OLS works if we entertain AR models

Run the linear regression using the first three variables of $X_t$. Fitted model:
$$x_t = 1.902 x_{t-1} - 0.807 x_{t-2} - 0.095 x_{t-3} + \epsilon_t, \quad \hat{\sigma}_\epsilon = 1.01.$$
All estimates are statistically significant with $p$-values less than $2.22 \times 10^{-5}$. The residuals are well behaved, e.g., $Q(10) = 12.23$ with $p$-value 0.20 (after adjusting the degrees of freedom). A simple time series method works for dependent data.
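A sketch of this fit, continuing the objects y and Xl from the previous sketch ($Q(10)$ is the Ljung-Box statistic; fitdf adjusts the degrees of freedom for the three estimated AR coefficients):

```r
## OLS on the first three lags only; no intercept, matching the slide
fit_ols <- lm(y ~ Xl[, 1:3] - 1)
summary(fit_ols)                        # all three lag coefficients significant

## Ljung-Box test Q(10) on the residuals, df adjusted for 3 fitted parameters
Box.test(residuals(fit_ols), lag = 10, type = "Ljung-Box", fitdf = 3)
```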
Why does lasso fail?

Two possibilities:
1. Scaling effect: lasso standardizes each variable in $X_t$. For unit-root nonstationary time series, standardization might wash out the dependence in the stationary part.
2. Multicollinearity: unit-root time series have strong serial correlations. [The ACF approaches 1 at all lags.]

This artificial example highlights the difference between independent and dependent data. We need to develop methods for big dependent data!
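The multicollinearity point is easy to check on the simulated series (again continuing x and Xl from the earlier sketch):

```r
## Sample autocorrelations of the unit-root series stay close to 1 ...
acf(x, lag.max = 10, plot = FALSE)
## ... so the lagged regressors are nearly perfectly collinear
round(cor(Xl[, 1:3]), 4)
```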
Possible solutions

1. Re-parameterization using time series properties
2. Use different penalties for different parameters

The first approach is easier. For this particular time series, define $\Delta x_t = (1 - B)x_t$ and $\Delta^2 x_t = (1 - B)^2 x_t$. Then
$$x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t = x_{t-1} + \Delta x_{t-1} - 0.1 \Delta^2 x_{t-1} + a_t = \text{double} + \text{single} + \text{stationary} + a_t.$$
The coefficients of $x_{t-1}$, $\Delta x_{t-1}$, and $\Delta^2 x_{t-1}$ are 1, 1, and $-0.1$, respectively.
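A hedged sketch of the re-parameterization (not shown in the slides; the index arithmetic continues x, y, Z, and n from the earlier sketch): rebuild the design matrix in levels, first differences, and second differences, then rerun the lasso, so the stationary term $\Delta^2 x_{t-1}$ competes on a fair scale after standardization.

```r
## Re-parameterized regressors: x_{t-1}, Delta x_{t-1}, Delta^2 x_{t-1}, plus noise
dx  <- diff(x)                          # Delta x_t, aligned so dx[i] = x[i+1] - x[i]
d2x <- diff(x, differences = 2)         # Delta^2 x_t
Xrp <- cbind(x[10:(n - 1)],             # x_{t-1} for t = 11, ..., 2000
             dx[9:(n - 2)],             # Delta x_{t-1}
             d2x[8:(n - 3)],            # Delta^2 x_{t-1}
             Z)                         # the 10 iid noise regressors
fit_rp <- lars(Xrp, y, type = "lasso")
plot(fit_rp)                            # the stationary term can now be selected
```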