Analysis of Big Dependent Data in Economics and Finance

Ruey S. Tsay
Booth School of Business, University of Chicago
September 2016
Outline

1. Big data? Machine learning? Data science? What is in it for economics and finance?
2. Real-world data are often dynamically dependent
3. A simple example: methods for independent data may fail
4. Trade-off between simplicity and reality
5. Some methods useful for analyzing big dependent data in economics and finance
6. Examples
7. Concluding remarks
Big dependent data

1. Accurate information is the key to success in the competitive global economy. We live in the information age.
2. What is big data? High dimension (many variables)? Large sample size? Both?
3. Not all big data sets are useful: confounding & noise
4. Need to develop methods to efficiently extract useful information from big data
5. Know the limitations of big data
6. Issues emerging from big data: privacy? ethical issues?
7. Focus here: methods for analyzing big dependent data in economics and finance
What are available?

Statistical methods focus on sparsity (simplicity):
1. Various penalized regressions, e.g., Lasso and its extensions
2. Various dimension-reduction methods and models
3. Common framework used: independent observations, with limited extensions to stationary data

Real data are often dynamically dependent!

Some useful concepts in analyzing big data:
1. Parsimony vs. sparsity: parsimony does not imply sparsity (Parsimony ⇏ Sparsity; see the next slide)
2. Simplicity vs. reality: trade-off between feasibility & sophistication
Parsimonious, not sparse

A simple example:
$$y_t = c + \sum_{i=1}^{k} \beta x_{it} + \epsilon_t = c + \beta \sum_{i=1}^{k} x_{it} + \epsilon_t,$$
where $k$ is large, the $x_{it}$ are not perfectly correlated, and the $\epsilon_t$ are iid $N(0, \sigma^2)$.

The model has three parameters, so it is parsimonious, but it is not sparse because $y_t$ depends on all $k$ explanatory variables. In some applications, $\sum_{i=1}^{k} x_{it}$ is a close approximation to the first principal component. For example, the level of interest rates is important to an economy. The fused lasso can handle this difficulty in some situations.
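A minimal R sketch of this point (hypothetical, not from the talk): the single-coefficient model is fit exactly by OLS on the constructed regressor $\sum_i x_{it}$, while a plain lasso has no sparsity to exploit and tends to keep a large fraction of the $k$ variables.

```r
## Sketch (assumed seed and dimensions): a parsimonious but non-sparse model.
## y_t depends on ALL k regressors through a single coefficient beta.
set.seed(1)
T <- 200; k <- 50
X <- matrix(rnorm(T * k), T, k)         # not perfectly correlated regressors
c0 <- 1; beta <- 0.3
y <- c0 + beta * rowSums(X) + rnorm(T, sd = 0.5)

## Parsimonious fit: regress y on the single constructed regressor sum_i(x_it)
fit <- lm(y ~ rowSums(X))
coef(fit)                               # recovers c0 and beta with 2 coefficients

## A plain lasso must instead spread weight over many of the k variables
library(glmnet)
lasso <- cv.glmnet(X, y)
sum(coef(lasso, s = "lambda.min") != 0) # number of nonzero coefficients kept
```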
What is LASSO regression?

Model (assume the data are mean-adjusted):
$$y_i = \sum_{j=1}^{p} \beta_j X_{j,i} + \epsilon_i.$$

Matrix form, with $X$ the design matrix:
$$Y = X\beta + \epsilon.$$

Objective function (useful in particular when $p > T$):
$$\hat{\beta}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_1 \right),$$
where $\lambda \geq 0$ is a penalty parameter, $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$, and $\|Y - X\beta\|_2^2 = \sum_{i=1}^{T} (y_i - X_i'\beta)^2$.
What is the big deal? Sparsity

Using convexity, LASSO is equivalent to the constrained problem
$$\hat{\beta}_{\mathrm{opt}}(R) = \arg\min_{\beta;\ \|\beta\|_1 \leq R} \|Y - X\beta\|_2^2 / T.$$

Old friend: ridge regression,
$$\hat{\beta}_{\mathrm{Ridge}}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_2^2 \right), \quad \text{or} \quad \hat{\beta}(R) = \arg\min_{\beta;\ \|\beta\|_2^2 \leq R} \|Y - X\beta\|_2^2 / T.$$

Special case $p = 2$: $\|Y - X\beta\|_2^2 / T$ is quadratic, the constraint region $\|\beta\|_1 \leq R$ is a diamond, while $\|\beta\|_2^2 \leq R$ is a disk. The quadratic contours typically touch the diamond at a corner, where some coefficients are exactly zero. Thus LASSO leads to sparsity; ridge does not.
Computation and extensions

1. Optimization: least angle regression (LARS) by Efron et al. (2004) makes the computation very efficient.
2. Extensions:
   - Group lasso: Yuan and Lin (2006). Subsets of $X$ have a specific meaning, e.g., treatment.
   - Elastic net: Zou and Hastie (2005). Uses a combination of $L_1$ and $L_2$ penalties.
   - SCAD: Fan and Li (2001). Nonconcave penalized likelihood. [Smoothly clipped absolute deviation.]
   - Various Bayesian methods: the penalty function plays the role of the prior.
3. Packages available in R: lars, glmnet, gamlr, gbm, and many others (see the sketch below).
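A minimal elastic-net sketch with glmnet (an assumed example, not from the slides): the alpha argument mixes the two penalties, with alpha = 1 giving the lasso fit and alpha = 0 the ridge fit.

```r
## Elastic net in glmnet: penalty is (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
library(glmnet)
set.seed(2)
X <- matrix(rnorm(100 * 20), 100, 20)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(100)

fit_lasso <- glmnet(X, y, alpha = 1)    # pure lasso (L1 penalty)
fit_ridge <- glmnet(X, y, alpha = 0)    # pure ridge (L2 penalty)
fit_enet  <- glmnet(X, y, alpha = 0.5)  # elastic net: equal mix of the two
plot(fit_enet, xvar = "lambda")         # coefficient paths versus log(lambda)
```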
A simulated example

$p = 300$, $T = 150$, $x_{j,i}$ iid $N(0, 1)$, $\epsilon_i$ iid $N(0, 0.25)$.
$$y_i = x_{3,i} + 2(x_{4,i} + x_{5,i} + x_{7,i}) - 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + \epsilon_i$$

1. How? R demonstration (a sketch follows below)
2. Selection of $\lambda$? Cross-validation (10-fold), measuring prediction accuracy
3. The commands lars and cv.lars of the package lars
4. The commands glmnet and cv.glmnet of the package glmnet
5. Relationship between the two packages: glmnet with alpha = 1 computes the same lasso path as lars
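A hypothetical reconstruction of the R demonstration (the seed and exact calls are assumptions, not the author's script):

```r
## Simulate the example: 9 active variables out of p = 300, T = 150 observations
library(lars)
library(glmnet)
set.seed(3)
p <- 300; T <- 150
X <- matrix(rnorm(T * p), T, p)
beta <- rep(0, p)
beta[3] <- 1
beta[c(4, 5, 7)] <- 2
beta[c(11, 12, 13, 21, 22, 30)] <- -2
y <- drop(X %*% beta) + rnorm(T, sd = 0.5)   # sd = sqrt(0.25)

## lars: full lasso path, then 10-fold cross-validation over the path
fit1 <- lars(X, y, type = "lasso")
cv1  <- cv.lars(X, y, K = 10)

## glmnet: alpha = 1 reproduces the lasso; cv.glmnet picks lambda by CV error
fit2 <- glmnet(X, y, alpha = 1)
cv2  <- cv.glmnet(X, y, nfolds = 10)
which(coef(cv2, s = "lambda.min")[-1] != 0)  # indices of selected variables
```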
Lasso may fail for dependent data

1. Data-generating model: scalar Gaussian autoregressive model of order 3, AR(3),
$$x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t, \quad a_t \sim N(0, 1).$$
Generate 2000 observations. See Figure 1.
2. Big data setup:
   - Dependent variable: $x_t$, $t = 11, \ldots, 2000$
   - Regressors: $X_t = [x_{t-1}, x_{t-2}, \ldots, x_{t-10}, z_{1t}, \ldots, z_{10,t}]$, where the $z_{it}$ are iid $N(0, 1)$. Dimension = 20, sample size 1990.
3. Run the lasso regression via the lars package of R. See Figure 2 for results. Lag 3, $x_{t-3}$, was not selected. Lasso fails in this case.
Figure 1: Time plot of the simulated AR(3) time series with 2000 observations.
Figure 2: Lasso coefficient paths for the AR(3) series (standardized coefficients versus $|\beta|/\max|\beta|$).
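A sketch reproducing the experiment (the seed and the handling of start-up values are assumptions):

```r
## Simulate the AR(3) with a double unit root and run lasso on 10 lags + 10 noises
library(lars)
set.seed(4)
n <- 2000
a <- rnorm(n + 3)
x <- numeric(n + 3)
for (t in 4:(n + 3)) {
  x[t] <- 1.9 * x[t - 1] - 0.8 * x[t - 2] - 0.1 * x[t - 3] + a[t]
}
x <- x[-(1:3)]                          # drop the three start-up values

## Response: x_t for t = 11, ..., 2000; regressors: 10 lags plus 10 noise series
y  <- x[11:n]
Xl <- sapply(1:10, function(k) x[(11 - k):(n - k)])  # x_{t-1}, ..., x_{t-10}
Z  <- matrix(rnorm(length(y) * 10), length(y), 10)   # z_{1t}, ..., z_{10,t}
Xt <- cbind(Xl, Z)

fit <- lars(Xt, y, type = "lasso")
plot(fit)                 # coefficient paths; lag 3 is typically never selected
```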
OLS works if we entertain AR models

Run the linear regression using the first three variables of $X_t$. Fitted model:
$$x_t = 1.902 x_{t-1} - 0.807 x_{t-2} - 0.095 x_{t-3} + \epsilon_t, \quad \hat{\sigma}_\epsilon = 1.01.$$
All estimates are statistically significant with $p$-values less than $2.22 \times 10^{-5}$. The residuals are well behaved, e.g., $Q(10) = 12.23$ with $p$-value 0.20 (after adjusting the degrees of freedom). A simple time series method works for dependent data.
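A sketch of this fit, continuing the objects y and Xl from the previous sketch ($Q(10)$ is the Ljung-Box statistic; fitdf adjusts the degrees of freedom for the three estimated AR coefficients):

```r
## OLS on the first three lags only; no intercept, matching the slide
fit_ols <- lm(y ~ Xl[, 1:3] - 1)
summary(fit_ols)                        # all three lag coefficients significant

## Ljung-Box test Q(10) on the residuals, df adjusted for 3 fitted parameters
Box.test(residuals(fit_ols), lag = 10, type = "Ljung-Box", fitdf = 3)
```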
Why does lasso fail?

Two possibilities:
1. Scaling effect: lasso standardizes each variable in $X_t$. For unit-root nonstationary time series, standardization might wash out the dependence in the stationary part.
2. Multicollinearity: unit-root time series have strong serial correlations. [The ACF approaches 1 at all lags.]

This artificial example highlights the difference between independent and dependent data. We need to develop methods for big dependent data!
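The multicollinearity point is easy to check on the simulated series (again continuing x and Xl from the earlier sketch):

```r
## Sample autocorrelations of the unit-root series stay close to 1 ...
acf(x, lag.max = 10, plot = FALSE)
## ... so the lagged regressors are nearly perfectly collinear
round(cor(Xl[, 1:3]), 4)
```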
Possible solutions

1. Re-parameterization using time series properties
2. Use different penalties for different parameters

The first approach is easier. For this particular time series, define $\Delta x_t = (1 - B)x_t$ and $\Delta^2 x_t = (1 - B)^2 x_t$. Then
$$x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t = x_{t-1} + \Delta x_{t-1} - 0.1 \Delta^2 x_{t-1} + a_t = \text{double} + \text{single} + \text{stationary} + a_t.$$
The coefficients of $x_{t-1}$, $\Delta x_{t-1}$, and $\Delta^2 x_{t-1}$ are 1, 1, and $-0.1$, respectively.
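A hedged sketch of the re-parameterization (not shown in the slides; the index arithmetic continues x, y, Z, and n from the earlier sketch): rebuild the design matrix in levels, first differences, and second differences, then rerun the lasso, so the stationary term $\Delta^2 x_{t-1}$ competes on a fair scale after standardization.

```r
## Re-parameterized regressors: x_{t-1}, Delta x_{t-1}, Delta^2 x_{t-1}, plus noise
dx  <- diff(x)                          # Delta x_t, aligned so dx[i] = x[i+1] - x[i]
d2x <- diff(x, differences = 2)         # Delta^2 x_t
Xrp <- cbind(x[10:(n - 1)],             # x_{t-1} for t = 11, ..., 2000
             dx[9:(n - 2)],             # Delta x_{t-1}
             d2x[8:(n - 3)],            # Delta^2 x_{t-1}
             Z)                         # the 10 iid noise regressors
fit_rp <- lars(Xrp, y, type = "lasso")
plot(fit_rp)                            # the stationary term can now be selected
```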