

  1. Latent and Network Models with Applications to Finance
     Jingchen Liu, Department of Statistics, Columbia University
     Joint work with Yunxiao Chen, Xiaoou Li, and Zhiliang Ying
     ISFA-Columbia Workshop, June 28, 2016

  2. Modeling multivariate distribution
     ◮ Multivariate random vector: (R_1, ..., R_J)
     ◮ Continuous vectors: multivariate Gaussian, multivariate t-distribution, ...
     ◮ Categorical vectors: loglinear model, ...
     ◮ Copula
     ◮ Regression

  7. Latent variable modeling
     ◮ There exists α such that f(R_1, ..., R_J | α) is simple.
     ◮ What is considered simple?
     ◮ Independence, small variance, ...

  10. Graphical representation [figure]

  11. Local independence
      f(R_1, ..., R_J | α) = ∏_j f(R_j | α)
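A minimal numerical sketch of local independence (toy probabilities, not from the talk): given α, the joint pmf of (R_1, ..., R_J) is just the product of the univariate conditionals, and the resulting 2^J probabilities sum to one.

```python
import numpy as np
from itertools import product

# p_j = f(R_j = 1 | alpha) for J = 3 items; values are made up.
p = np.array([0.2, 0.5, 0.9])

def joint(r, p=p):
    """f(R_1,...,R_J | alpha) = prod_j f(R_j | alpha) under local independence."""
    return np.prod(np.where(np.asarray(r) == 1, p, 1 - p))

# The 2^J joint probabilities form a valid pmf: they sum to one.
total = sum(joint(r) for r in product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```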

  12. Applications
      ◮ Finance, political sciences
      ◮ Education
      ◮ Psychiatry/psychology
      ◮ Marketing and e-commerce

  16. Linear factor models
      ◮ (R_1, ..., R_J) is continuous.
      ◮ Linear factor models: α = (α_1, ..., α_K), R_j = a_j^⊤ α + ε_j
      ◮ Principal component analysis
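A hedged sketch of the connection between the linear factor model and principal component analysis (simulated data with assumed dimensions, not the talk's data): when R_j = a_j^⊤ α + ε_j with K factors, the sample covariance of R has K dominant eigenvalues.

```python
import numpy as np

# Simulate R_j = a_j^T alpha + eps_j with K = 2 factors, J = 10 series.
rng = np.random.default_rng(0)
K, J, n = 2, 10, 5000
A = rng.normal(size=(K, J))            # loadings a_j as columns of A
alpha = rng.normal(size=(n, K))        # latent factors, one row per observation
eps = 0.1 * rng.normal(size=(n, J))    # small idiosyncratic noise
R = alpha @ A + eps                    # n x J observations

# PCA step: eigenvalues of the sample covariance, sorted descending.
eigvals = np.linalg.eigvalsh(np.cov(R.T))[::-1]
print(eigvals[:3].round(2))            # the first K eigenvalues dominate the rest
```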

  19. Categorical variable and item response theory model
      ◮ Binary R_j ∈ {0, 1}.
      ◮ P(R_j = 1 | α) = e^{a_j^⊤ α − b_j} / (1 + e^{a_j^⊤ α − b_j}), α ∈ R^K
      [Figure: logistic item response curve]
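A short sketch of the item response function above (toy loadings a_j and difficulty b_j, chosen for illustration): the logistic form maps a_j^⊤ α − b_j to a probability in (0, 1), and equals 1/2 exactly when a_j^⊤ α = b_j.

```python
import numpy as np

def irf(alpha, a_j, b_j):
    """P(R_j = 1 | alpha) = exp(a_j^T alpha - b_j) / (1 + exp(a_j^T alpha - b_j))."""
    z = a_j @ alpha - b_j
    return 1.0 / (1.0 + np.exp(-z))    # numerically stable logistic form

a_j, b_j = np.array([1.0, 0.5]), 0.2   # toy loadings and difficulty
# Here a_j^T alpha = 0*1.0 + 0.4*0.5 = 0.2 = b_j, so the probability is 1/2.
print(round(irf(np.array([0.0, 0.4]), a_j, b_j), 3))  # 0.5
```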

  21. Stock Price Structure
      ◮ Data 1: 97 stocks selected from the S&P 100, over 1013 trading days from 2009 to 2014.
      ◮ Data 2: 117 stocks selected from the SSE 180 (Shanghai Stock Exchange), over 1159 trading days from 2009 to 2014.

  23. Exploratory Analysis
      ◮ The block circled in blue contains mostly energy companies: APA (Apache Corp), APC (Anadarko Petroleum), BHI (Baker Hughes), COP (ConocoPhillips), CVX (Chevron), DVN (Devon), ...
      ◮ The block circled in black contains the financial companies: C (Citi), BAC (Bank of America), MS (Morgan Stanley), BK (Bank of New York Mellon), JPM (JPMorgan), ...
      [Figure: heatmap of stock-stock correlation (Data 1; based on daily log returns); stocks have been reordered]
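A sketch of the exploratory computation behind the heatmap (simulated prices standing in for the S&P 100 data, which is not reproduced here): daily log returns r_t = log(S_t / S_{t−1}), then the stock-by-stock correlation matrix.

```python
import numpy as np

# Toy price paths: 1014 days x 3 stocks (geometric random walks).
rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(0.01 * rng.normal(size=(1014, 3)), axis=0))

log_ret = np.diff(np.log(prices), axis=0)  # 1013 x 3 daily log returns
corr = np.corrcoef(log_ret.T)              # 3 x 3 correlation matrix (heatmap input)
print(corr.shape, bool(np.allclose(np.diag(corr), 1.0)))  # (3, 3) True
```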

  24. Linear factor model
      ◮ Linear factor models: R_j = a_j^⊤ α + ε_j
      ◮ Fama-French model: R = R_f + β(K − R_f) + b_s · SMB + b_v · HML + α
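A hedged sketch of estimating the Fama-French loadings by ordinary least squares (simulated factors and made-up true coefficients; the slide itself only states the model):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
mkt_ex, smb, hml = rng.normal(size=(3, n))   # K - R_f, SMB, HML factor series (toy)
beta, b_s, b_v = 1.2, 0.5, -0.3              # assumed true loadings
r_ex = beta * mkt_ex + b_s * smb + b_v * hml + 0.05 * rng.normal(size=n)

# OLS with an intercept (the alpha term in the slide's equation).
X = np.column_stack([np.ones(n), mkt_ex, smb, hml])
coef, *_ = np.linalg.lstsq(X, r_ex, rcond=None)
print(coef[1:].round(1))   # close to the true loadings (1.2, 0.5, -0.3)
```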

  26. Linear factor model
      ◮ (R_1, ..., R_J) is not multivariate Gaussian in many ways if J is large!
      ◮ Marginal tails, joint tails, asymmetric correlation, ...
      ◮ Too many factors!

  29. Nonlinear factor model
      ◮ Dichotomize: R_ji = 1 if S_i^close > S_i^open for stock j on day i
      ◮ P(R_j = 1 | α) = e^{a_j^⊤ α − b_j} / (1 + e^{a_j^⊤ α − b_j}), α ∈ R^K
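A tiny sketch of the dichotomization step (toy open/close prices): R_ji = 1 exactly when the close exceeds the open for stock j on day i.

```python
import numpy as np

# Rows are days, columns are stocks; prices are made up.
open_p  = np.array([[100.0, 50.0],
                    [101.0, 49.0]])
close_p = np.array([[101.0, 49.5],
                    [100.5, 49.2]])

R = (close_p > open_p).astype(int)   # binary up/down indicator
print(R)   # [[1 0]
           #  [0 1]]
```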

  31. Latent graphical model [figure]
  32. Latent graphical model [figure]
  33. Latent graphical model [figure]

  34. Issues of concern
      ◮ Parametric/nonparametric models: latent variable and graph
      ◮ Inference: identifiability

  36. The latent variable component – IRT model
      ◮ Alternative formulation:
        P(R_j = 1 | α) = e^{a_j^⊤ α − b_j} / (1 + e^{a_j^⊤ α − b_j})  ⇔  P(R_j | α) ∝ e^{R_j (a_j^⊤ α − b_j)}
      ◮ Local independence:
        P(R_1, ..., R_J | α) = ∏_j P(R_j | α) ∝ e^{∑_{j=1}^J R_j (a_j^⊤ α − b_j)}

  38. Graphical component – Ising model
      P(R_1, ..., R_J | S) ∝ e^{(1/2) ∑_{i,j} s_ij R_i R_j}
      ◮ Physics
      ◮ Graphical representation
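A brute-force sketch of the Ising pmf above for a toy J = 3 network (the matrix S is made up): the normalizing constant is computed by summing the unnormalized weights over all 2^J configurations, which is feasible only for small J.

```python
import numpy as np
from itertools import product

# Symmetric interaction matrix with zero diagonal (toy values).
S = np.array([[ 0.0, 1.0, -0.5],
              [ 1.0, 0.0,  0.3],
              [-0.5, 0.3,  0.0]])

def unnorm(r):
    """Unnormalized Ising weight exp{(1/2) sum_ij s_ij R_i R_j}."""
    r = np.asarray(r, float)
    return np.exp(0.5 * r @ S @ r)

configs = list(product([0, 1], repeat=3))
Z = sum(unnorm(r) for r in configs)          # partition function, 2^J terms
probs = {r: unnorm(r) / Z for r in configs}  # normalized pmf
print(round(sum(probs.values()), 10))  # 1.0
```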

  40. Latent graphical model: IRT model + Ising model
      ◮ No local independence:
        P(R_1, ..., R_J | α) ∝ e^{∑_j R_j (a_j^⊤ α − b_j) + (1/2) ∑_{i,j} s_ij R_i R_j}
      ◮ Simplification (since R_j² = R_j, the −b_j terms can be absorbed into the diagonal of S):
        P(R_1, ..., R_J | α) ∝ e^{∑_j R_j a_j^⊤ α + (1/2) ∑_{i,j} s_ij R_i R_j}

  42. Latent variable and network modeling
      ◮ The item response function:
        f_{A,S}(R | α) ∝ exp{ α^⊤ A R + (1/2) R^⊤ S R }
        where A_{K×J} = (a_1, ..., a_J) and S_{J×J} = (s_ij)
      ◮ Population (prior) distribution such that
        f_{A,S}(R, α) ∝ exp{ −|α|²/2 + α^⊤ A R + (1/2) R^⊤ S R }

  44. Latent variable and network modeling
      ◮ Marginalized likelihood:
        L(A, S) = ∫ f(R, α) dα ∝ exp{ (1/2) R^⊤ (A^⊤ A + S) R }
      ◮ Let L_{J×J} = A^⊤ A:
        L(L, S) = f(R | L, S) ∝ exp{ (1/2) R^⊤ (L + S) R }
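A numerical sketch verifying the marginalization step for K = 1 (toy loadings a and one binary pattern R, chosen for illustration): integrating out the standard-normal latent trait gives ∫ exp{−α²/2 + α·a^⊤R} dα = √(2π)·exp{(a^⊤R)²/2}, which is where the quadratic term (1/2)R^⊤A^⊤AR in the marginal comes from.

```python
import numpy as np

a = np.array([0.7, -0.3, 0.5])     # loadings for K = 1 (toy values)
R = np.array([1, 0, 1])            # one binary response pattern
c = a @ R                          # scalar a^T R

# Riemann sum over a wide grid approximates the Gaussian integral.
grid = np.linspace(-10.0, 10.0, 200001)
dx = grid[1] - grid[0]
numeric = np.sum(np.exp(-grid**2 / 2 + grid * c)) * dx

# Closed form: complete the square in the exponent.
closed = np.sqrt(2 * np.pi) * np.exp(c**2 / 2)
print(round(numeric / closed, 6))  # 1.0
```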

  46. Identifiability
      ◮ Identifiability of L and S
      ◮ Low-dimensional latent factor: L_{J×J} = A^⊤ A is positive semi-definite of rank K ≪ J
      ◮ Small remaining dependence: S is sparse
