Latent and Network Models with Applications to Finance

Jingchen Liu, Department of Statistics, Columbia University
Joint work with Yunxiao Chen, Xiaoou Li, and Zhiliang Ying
ISFA-Columbia Workshop, June 28, 2016
Modeling multivariate distributions

◮ Multivariate random vector: (R_1, ..., R_J)
◮ Continuous vectors: multivariate Gaussian, multivariate t-distribution, ...
◮ Categorical vectors: loglinear model, ...
◮ Copula
◮ Regression
Latent variable modeling

◮ There exists α such that f(R_1, ..., R_J | α) is simple.
◮ What counts as simple?
◮ Independence, small variance, ...
Graphical representation

[Figure: graphical representation of the latent variable model]
Local independence

f(R_1, ..., R_J | α) = ∏_j f(R_j | α)
Applications

◮ Finance, political science
◮ Education
◮ Psychiatry/psychology
◮ Marketing and e-commerce
Linear factor models

◮ (R_1, ..., R_J) is continuous.
◮ Linear factor models: α = (α_1, ..., α_K), R_j = a_j^⊤ α + ε_j
◮ Principal component analysis
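A minimal sketch (not from the talk) of how principal component analysis recovers the K dominant factor directions in the linear factor model above; all dimensions and values are illustrative.

```python
import numpy as np

# Simulate the linear factor model R_j = a_j^T alpha + eps_j
# (made-up dimensions: J assets, K factors, n days).
rng = np.random.default_rng(0)
J, K, n = 10, 2, 500
A = rng.normal(size=(K, J))                     # loadings a_j stacked as columns
alpha = rng.normal(size=(n, K))                 # daily factor scores
R = alpha @ A + 0.1 * rng.normal(size=(n, J))   # observed returns

# PCA: the top-K eigenvectors of the sample covariance span the loading space.
cov = np.cov(R, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
top = eigvecs[:, -K:]                           # estimated factor directions
```

With K = 2 true factors, the two largest eigenvalues dominate the remaining (noise-level) ones by orders of magnitude, which is the usual visual diagnostic for the number of factors.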
Categorical variables and item response theory models

◮ Binary R_j ∈ {0, 1}.
◮ P(R_j = 1 | α) = e^{a_j^⊤ α − b_j} / (1 + e^{a_j^⊤ α − b_j}), α ∈ R^K

[Figure: logistic item response curve]
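A small sketch of the item response function above; the loading a_j, difficulty b_j, and trait vector α are made-up numbers, not fitted values.

```python
import numpy as np

def irt_prob(a_j, b_j, alpha):
    """P(R_j = 1 | alpha) = exp(a_j^T alpha - b_j) / (1 + exp(a_j^T alpha - b_j))."""
    z = np.dot(a_j, alpha) - b_j
    return 1.0 / (1.0 + np.exp(-z))   # numerically identical logistic form

a_j = np.array([1.0, 0.5])            # discrimination (loading) vector, illustrative
b_j = 0.2                             # difficulty, illustrative
alpha = np.array([0.3, -0.1])
p = irt_prob(a_j, b_j, alpha)         # probability of a positive response
```

The function is monotone increasing in a_j^⊤ α and equals 1/2 exactly when a_j^⊤ α = b_j, matching the S-shaped curve on the slide.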
Stock Price Structure

◮ Data 1: 97 stocks selected from the S&P 100, over 1013 trading days from 2009 to 2014.
◮ Data 2: 117 stocks selected from the SSE 180 (Shanghai Stock Exchange), over 1159 trading days from 2009 to 2014.
Exploratory analysis

◮ The block circled in blue contains mostly energy companies: APA (Apache Corp), APC (Anadarko Petroleum), BHI (Baker Hughes), COP (ConocoPhillips), CVX (Chevron), DVN (Devon), ...
◮ The block circled in black contains financial companies: C (Citigroup), BAC (Bank of America), MS (Morgan Stanley), BK (Bank of New York Mellon), JPM (JPMorgan), ...

[Figure: heatmap of stock-stock correlation (Data 1; based on daily log returns); stocks have been reordered]
Linear factor models

◮ Linear factor models: R_j = a_j^⊤ α + ε_j
◮ Fama-French model: R = R_f + β(K − R_f) + b_s · SMB + b_v · HML + α
Linear factor models

◮ (R_1, ..., R_J) is not multivariate Gaussian in many ways when J is large!
◮ Marginal tails, joint tails, asymmetric correlation, ...
◮ Too many factors!
Nonlinear factor models

◮ Dichotomize: R_ji = 1 if S_i^close > S_i^open for stock j on day i
◮ P(R_j = 1 | α) = e^{a_j^⊤ α − b_j} / (1 + e^{a_j^⊤ α − b_j}), α ∈ R^K
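The dichotomization step can be sketched in a couple of lines; the toy price arrays below (days × stocks) are made up for illustration.

```python
import numpy as np

# R[i, j] = 1 exactly when stock j closes above its open on day i.
open_prices = np.array([[100.0, 50.0],
                        [101.0, 49.0]])
close_prices = np.array([[101.5, 49.5],
                         [100.5, 49.5]])
R = (close_prices > open_prices).astype(int)
```

For these toy prices, stock 1 is up on day 1 and down on day 2, and stock 2 the reverse, so R = [[1, 0], [0, 1]].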
Latent graphical model

[Figures: latent graphical model illustrations]
Issues of concern

◮ Parametric/nonparametric models: latent variable and graph
◮ Inference: identifiability
The latent variable component - IRT model

◮ Alternative formulation:
P(R_j = 1 | α) = e^{a_j^⊤ α − b_j} / (1 + e^{a_j^⊤ α − b_j})  ⇔  P(R_j | α) ∝ e^{R_j (a_j^⊤ α − b_j)}
◮ Local independence:
P(R_1, ..., R_J | α) = ∏_{j=1}^J P(R_j | α) ∝ e^{∑_j R_j (a_j^⊤ α − b_j)}
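A quick numerical sanity check of the local-independence factorization: the product of per-item logistic probabilities equals the exponential-family form e^{∑_j R_j (a_j^⊤ α − b_j)} / ∏_j (1 + e^{a_j^⊤ α − b_j}). All parameter values below are made up.

```python
import numpy as np

A = np.array([[1.0, 0.5, -0.3],
              [0.2, 1.0,  0.8]])      # K x J loadings, illustrative
b = np.array([0.1, -0.2, 0.0])        # difficulties, illustrative
alpha = np.array([0.5, -0.4])
R = np.array([1, 0, 1])               # one observed response pattern

z = A.T @ alpha - b                   # a_j^T alpha - b_j for each item
p = 1 / (1 + np.exp(-z))              # P(R_j = 1 | alpha)

# Product of per-item probabilities...
joint = np.prod(np.where(R == 1, p, 1 - p))
# ...equals the exponential-family expression from the slide.
alt = np.exp((R * z).sum()) / np.prod(1 + np.exp(z))
```

The two expressions agree to machine precision, which is all local independence asserts here.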
The graphical component - Ising model

P(R_1, ..., R_J | S) ∝ e^{(1/2) ∑_{i,j} s_{ij} R_i R_j}

◮ Physics
◮ Graphical representation
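For small J the Ising normalizing constant can be computed by brute force over all 2^J binary configurations; the matrix S below is an arbitrary toy example, not estimated from data.

```python
import itertools
import numpy as np

# Toy interaction matrix: symmetric with zero diagonal, made-up values.
S = np.array([[ 0.0, 1.0, -0.5],
              [ 1.0, 0.0,  0.3],
              [-0.5, 0.3,  0.0]])
J = S.shape[0]

# Enumerate all binary configurations and normalize exp((1/2) r^T S r).
configs = list(itertools.product([0, 1], repeat=J))
weights = np.array([np.exp(0.5 * np.array(r) @ S @ np.array(r)) for r in configs])
probs = weights / weights.sum()       # exact distribution over 2^J states
```

This exact enumeration is only feasible for small J; for the ~100-stock graphs later in the talk, the normalizing constant is intractable, which is one reason estimation there works with pseudo-likelihood-style objectives rather than the full likelihood.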
Latent graphical model: IRT model + Ising model

◮ Local independence no longer holds:
P(R_1, ..., R_J | α) ∝ e^{∑_j R_j (a_j^⊤ α − b_j) + (1/2) ∑_{i,j} s_{ij} R_i R_j}
◮ Simplification: since R_j² = R_j, the −b_j R_j terms can be absorbed into the diagonal of S, giving
P(R_1, ..., R_J | α) ∝ e^{∑_j R_j a_j^⊤ α + (1/2) ∑_{i,j} s_{ij} R_i R_j}
Latent variable and network modeling

◮ The item response function:
f_{A,S}(R | α) ∝ exp{α^⊤ A R + (1/2) R^⊤ S R}
where A_{K×J} = (a_1, ..., a_J) and S_{J×J} = (s_{ij})
◮ Population (prior) distribution such that
f_{A,S}(R, α) ∝ exp{−|α|²/2 + α^⊤ A R + (1/2) R^⊤ S R}
Latent variable and network modeling

◮ Marginalized likelihood:
L(A, S) = ∫ f(R, α) dα ∝ exp{(1/2) R^⊤ (A^⊤ A + S) R}
◮ Let L_{J×J} = A^⊤ A; then
L(L, S) = f(R | L, S) ∝ exp{(1/2) R^⊤ (L + S) R}
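The marginalization rests on the Gaussian integral ∫ exp(−|α|²/2 + α^⊤ c) dα ∝ exp(|c|²/2) with c = A R, which is where the A^⊤ A term comes from. A numerical check of the K = 1 case, with c a stand-in scalar:

```python
import numpy as np

# Verify: integral of exp(-a^2/2 + c*a) over a equals sqrt(2*pi) * exp(c^2/2).
c = 0.7                                          # stand-in for the scalar A R
grid = np.linspace(-10.0, 10.0, 200001)
vals = np.exp(-grid**2 / 2 + c * grid)
integral = vals.sum() * (grid[1] - grid[0])      # fine Riemann sum
closed_form = np.sqrt(2 * np.pi) * np.exp(c**2 / 2)
```

Since the exp(c²/2) factor carries R only through (1/2) R^⊤ A^⊤ A R, the latent factors leave behind exactly the low-rank term L = A^⊤ A in the quadratic form.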
Identifiability

◮ Identifiability of L and S
◮ Low-dimensional latent factor: L_{J×J} = A^⊤ A is positive semi-definite with rank K ≪ J
◮ Small remaining dependence: S is sparse