Stock Movement Prediction from Tweets and Historical Prices (Yumo Xu) - PowerPoint presentation transcript

  1. Stock Movement Prediction from Tweets and Historical Prices. Yumo Xu and Shay B. Cohen, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh. ACL 2018. https://yumoxu.github.io/, yumo.xu@ed.ac.uk

  2. Who cares about stock movements? No one would be unhappy if they could predict stock movements: investors, governments, and researchers alike.

  3. Background
  ◮ Two mainstream approaches in finance: technical and fundamental analysis
  ◮ Two main content resources in NLP: public news and social media
  ◮ A brief history of NLP models: feature engineering (before 2010) → topic models (2013-2015, generative) → event-driven neural nets (2014-2015) → hierarchical attention nets (2018)

  4. However, it has never been easy... Complexities: the market is highly stochastic, and we make temporally-dependent predictions from chaotic data.

  5. Divide and treat
  1. Chaotic market information (noisy and heterogeneous) → Market Information Encoder
  2. High market stochasticity (random-walk theory; Malkiel, 1999) → Variational Movement Decoder
  3. Temporally-dependent prediction → Attentive Temporal Auxiliary. When a company suffers a major scandal on a trading day, its stock price will show a downtrend in the coming trading days: public information needs time to be absorbed into movements (Luss and d'Aspremont, 2015), and is thus largely shared across temporally close predictions.

  6. Problem Formulation: Stock Movement Prediction
  ◮ We estimate the binary movement of a target trading day d, where 1 denotes rise and 0 denotes fall
  ◮ We use the market information, comprising relevant tweets and historical prices, in the lag [d − Δd, d − 1], where Δd is a fixed lag size
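To make the formulation concrete, here is a minimal sketch of how a (window, label) pair could be built from a closing-price series; the function name, the lag default, and the use of raw closing prices are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the problem setup (illustrative; not the authors' code).

def make_example(close, d, lag=5):
    """Build (lag window, label) for target trading day d.

    Label: 1 (rise) if the close of day d exceeds that of day d-1, else 0 (fall).
    Window: the trading days [d - lag, d - 1] whose tweets and prices are used.
    """
    assert d >= lag, "need a full lag window before the target day"
    label = 1 if close[d] > close[d - 1] else 0
    window = list(range(d - lag, d))
    return window, label

close = [10.0, 10.2, 10.1, 10.4, 10.3, 10.5, 10.6, 10.4, 10.7, 10.8, 11.0]
print(make_example(close, d=10))  # ([5, 6, 7, 8, 9], 1)
```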

  7. Generative Process
  ◮ T eligible trading days in the Δd lag
  ◮ Encode the observed market information as a random variable X = [x_1; ...; x_T]
  ◮ Generate the latent driven factor Z = [z_1; ...; z_T]
  ◮ Generate the stock movements y = [y_1, ..., y_T] from X and Z
  [Figure: plate diagram over the |D| training examples, with observed X, targets y, latent Z with generative parameters θ and approximate posterior parameters φ]
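Read as a generative story, one way to spell out the per-day sampling (a sketch for the auxiliary days; the main target y_T additionally conditions on all of X and Z, as the factorization on the next slide makes precise):

```latex
% Sketch of the generative story for one example, t = 1, ..., T-1
\begin{align*}
  z_t &\sim p_\theta\left(z_t \mid z_{<t},\, x_{\le t}\right) \\
  y_t &\sim p_\theta\left(y_t \mid x_{\le t},\, z_t\right)
\end{align*}
```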

  8. Factorization
  ◮ For multi-task learning, we model $p_\theta(y \mid X) = \sum_Z p_\theta(y, Z \mid X)$ instead of $p_\theta(y_T \mid X)$
    Main target: $y_T$; temporal auxiliary targets: $y^* = [y_1, \dots, y_{T-1}]$
  ◮ Factorization:
    $$p_\theta(y, Z \mid X) = p_\theta(y_T \mid X, Z)\, p_\theta(z_T \mid z_{<T}, X) \prod_{t=1}^{T-1} p_\theta(y_t \mid x_{\le t}, z_t)\, p_\theta(z_t \mid z_{<t}, x_{\le t}, y_t)$$

  9. Primary components (a structural sketch follows below)
  1. Market Information Encoder (MIE): encodes the observed market information X
  2. Variational Movement Decoder (VMD): infers Z from X and y, and decodes the stock movements y from X and Z
  3. Attentive Temporal Auxiliary (ATA): integrates the temporal loss terms for training
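As a structural sketch only, the three components could compose roughly as below, assuming PyTorch; every class name, layer choice, and dimension here is illustrative and simplified (e.g. the encoder's Bi-GRU-plus-attention stack is collapsed into one projection), not the authors' released code.

```python
# Structural sketch of StockNet's three components (illustrative, not the
# authors' code). Assumes PyTorch; names and dimensions are invented.
import torch
import torch.nn as nn

class MarketInformationEncoder(nn.Module):
    """Encodes tweets + prices for each day t into a market feature x_t."""
    def __init__(self, in_dim, x_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, x_dim)  # stand-in for Bi-GRU + attention

    def forward(self, raw):                   # raw: (batch, T, in_dim)
        return torch.tanh(self.proj(raw))     # X:   (batch, T, x_dim)

class VariationalMovementDecoder(nn.Module):
    """Infers latent factors z_t and builds a movement feature g_t per day."""
    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        self.rnn = nn.GRU(x_dim, h_dim, batch_first=True)
        self.to_mu = nn.Linear(h_dim, z_dim)
        self.to_logvar = nn.Linear(h_dim, z_dim)
        self.to_g = nn.Linear(x_dim + h_dim + z_dim, h_dim)

    def forward(self, X):
        H, _ = self.rnn(X)                    # deterministic states h_t
        mu, logvar = self.to_mu(H), self.to_logvar(H)
        Z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        G = torch.tanh(self.to_g(torch.cat([X, H, Z], dim=-1)))
        return G, mu, logvar

class AttentiveTemporalAuxiliary(nn.Module):
    """Scores auxiliary days to weight their losses against the target day."""
    def __init__(self, h_dim):
        super().__init__()
        self.score = nn.Linear(h_dim, 1)

    def forward(self, G):                     # G: (batch, T, h_dim)
        v_star = torch.softmax(self.score(G[:, :-1]).squeeze(-1), dim=-1)
        return v_star                         # weights for days 1..T-1
```

A full forward pass would decode each day's movement from g_t and combine the per-day losses with the weights from the ATA, as described on the Attentive Temporal Auxiliary slide below.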

  10. StockNet architecture. [Figure: (a) Variational Movement Decoder (VMD), with a variational encoder/decoder producing z_t ~ N(μ, δ²) regularized by a KL term against the prior; (b) Market Information Encoder (MIE), with a message embedding layer, Bi-GRUs, and attention over message corpora and historical prices; (c) Attentive Temporal Auxiliary (ATA), with temporal attention over g_1, ..., g_T feeding the training objective; (d) the VAE components.]

  11. Variational Movement Decoder
  ◮ Goal: recurrently infer Z from X and y, and decode y from X and Z
  ◮ Challenge: posterior inference is intractable in our factorized model
  VAE solutions:
  ◮ Neural approximation and reparameterization
  ◮ A recurrent ELBO
  ◮ Adopt a posterior approximator $q_\phi(z_t \mid z_{<t}, x_{\le t}, y_t) \sim \mathcal{N}(\mu, \delta^2 I)$, where $\phi = \{\mu, \delta\}$
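In isolation, the reparameterization trick and a closed-form Gaussian KL look as follows; this is generic VAE mechanics sketched in NumPy, not the paper's exact layers. Note the KL against N(0, I) is shown only for concreteness: in StockNet the KL is taken against the learned conditional prior p_θ(z_t | z_{<t}, x_{≤t}).

```python
# Reparameterization trick and a closed-form Gaussian KL (generic VAE mechanics).
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, delta^2 I) differentiably via z = mu + delta * eps."""
    delta = np.exp(0.5 * log_var)         # standard deviation delta
    eps = rng.standard_normal(mu.shape)   # noise from N(0, I)
    return mu + delta * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL[N(mu, delta^2 I) || N(0, I)], summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.zeros(4)     # toy posterior parameters
print(reparameterize(mu, log_var, rng))    # one sample of z
print(kl_to_standard_normal(mu, log_var))  # 0.0: posterior equals the prior here
```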


  12. Interface between VMD and ATA
  ◮ Integrate the deterministic feature $h_t^s$ and the latent variable $z_t$: $g_t = \tanh(W_g [x_t, h_t^s, z_t] + b_g)$
  ◮ Decode the movement hypotheses: first the auxiliary targets $\tilde{y}_1, \dots, \tilde{y}_{T-1}$, then the main target $\tilde{y}_T$
  ◮ Temporal attention: combine an information score for each $g_t$ with a dependency score against $g_T$ to produce the weight vector $v^*$ over $g_1, \dots, g_{T-1}$, which feeds the training objective (a sketch follows below)
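A NumPy sketch of this interface; the slide gives the form of g_t but only names the two attention scores, so the score parameterizations below (a learned information vector and a dot-product dependency on g_T) are assumptions.

```python
# Sketch of the VMD/ATA interface (attention score forms are assumptions).
import numpy as np

rng = np.random.default_rng(0)
T, x_dim, h_dim, z_dim, g_dim = 5, 8, 8, 4, 8

X = rng.standard_normal((T, x_dim))   # encoded market info x_t
H = rng.standard_normal((T, h_dim))   # deterministic RNN states h_t^s
Z = rng.standard_normal((T, z_dim))   # latent factors z_t

# g_t = tanh(W_g [x_t, h_t^s, z_t] + b_g): integrates both feature types
W_g = rng.standard_normal((g_dim, x_dim + h_dim + z_dim)) * 0.1
b_g = np.zeros(g_dim)
G = np.tanh(np.concatenate([X, H, Z], axis=-1) @ W_g.T + b_g)

# Temporal attention over auxiliary days 1..T-1 (illustrative score forms):
w_i = rng.standard_normal(g_dim) * 0.1
info = G[:-1] @ w_i                   # information score: how informative day t is
dep = G[:-1] @ G[-1]                  # dependency score: relation to target day T
scores = info * dep
v_star = np.exp(scores) / np.exp(scores).sum()  # softmax -> weight vector v*
print(v_star)
```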

  13. Attentive Temporal Auxiliary
  ◮ Break the approximated objective down into temporal objectives $f \in \mathbb{R}^{T \times 1}$:
    $$f_t = \log p_\theta(y_t \mid x_{\le t}, z_{\le t}) - \lambda\, D_{KL}\left[ q_\phi(z_t \mid z_{<t}, x_{\le t}, y_t) \,\|\, p_\theta(z_t \mid z_{<t}, x_{\le t}) \right]$$
  ◮ Reuse $v^*$ to build the final temporal weight vector $v \in \mathbb{R}^{1 \times T}$: $v = [\alpha v^*, 1]$, where $\alpha \in [0, 1]$ controls the overall auxiliary effect
  ◮ Recompose the final objective over the $N$ training examples:
    $$F(\theta, \phi; X, y) = \frac{1}{N} \sum_n v^{(n)} f^{(n)}$$
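Numerically, the recomposition is just a weighted sum of per-day objectives for each example; a toy sketch (all numbers arbitrary):

```python
# Weighted recomposition of per-day objectives (toy numbers).
import numpy as np

alpha = 0.5                                   # strength of the auxiliary effect
v_star = np.array([0.1, 0.2, 0.3, 0.4])       # attention over auxiliary days 1..T-1
f = np.array([-1.2, -0.9, -1.1, -0.8, -0.7])  # f_t = log-likelihood - lambda * KL

v = np.concatenate([alpha * v_star, [1.0]])   # v = [alpha * v*, 1]; main day weight 1
objective = v @ f                             # one example's contribution to F
print(objective)                              # averaged over N examples in training
```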

  14. Experimental setup
  ◮ Dataset: two-year daily price movements of 88 stocks, with two components: a Twitter dataset and a historical price dataset
    Training: 20 months, 20,339 movements; development: 2 months, 2,555 movements; test: 2 months, 3,720 movements
  ◮ Lag window size: 5
  ◮ Metrics: accuracy and the Matthews Correlation Coefficient (MCC); see the sketch below
  ◮ Comparative study: five baselines from different genres and five StockNet variants
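MCC complements accuracy because it accounts for all four confusion-matrix cells and stays near zero for uninformative predictors even under class imbalance. With scikit-learn (assumed available), both metrics are one call each:

```python
# Computing the two evaluation metrics on toy predictions.
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # 1 = rise, 0 = fall
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))     # 0.75
print(matthews_corrcoef(y_true, y_pred))  # 0.5
```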

  15. Baselines and variants
  Baselines:
  ◮ Rand: a naive predictor making random guesses
  ◮ ARIMA: autoregressive integrated moving average (a price-only sketch follows below)
  ◮ RandForest (Pagolu et al., 2016)
  ◮ TSLDA (Nguyen and Shirai, 2015)
  ◮ HAN (Hu et al., 2018)
  StockNet variants:
  ◮ HedgeFundAnalyst: the fully-equipped model
  ◮ TechnicalAnalyst: uses only prices
  ◮ FundamentalAnalyst: uses only tweets
  ◮ IndependentAnalyst: optimizes only the main target
  ◮ DiscriminativeAnalyst: a discriminative variant
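For instance, an ARIMA price-only baseline could be run along these lines with statsmodels; this is a sketch, and the order (1, 1, 1) and the thresholding into rise/fall are assumptions, not the paper's configuration.

```python
# Sketch of an ARIMA price-only baseline (order and thresholding assumed).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.standard_normal(60))  # toy daily closing prices

model = ARIMA(prices, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=1)[0]              # next-day price estimate

movement = 1 if forecast > prices[-1] else 0       # 1 = rise, 0 = fall
print(forecast, movement)
```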

  16. Results
  Baseline comparison:
  ◮ An accuracy of 56% is generally reported as a satisfying result for this task
  ◮ ARIMA does not yield satisfying results
  ◮ The two best baselines are TSLDA and HAN

    Baseline models    Acc.     MCC
    Rand               50.89    -0.002266
    ARIMA              51.39    -0.020588
    RandForest         53.08     0.012929
    TSLDA              54.07     0.065382
    HAN                57.64     0.051800

  Variant comparison:
  ◮ The two information sources are integrated effectively
  ◮ The generative framework incorporates randomness properly

    StockNet variants        Acc.     MCC
    TechnicalAnalyst         54.96    0.016456
    FundamentalAnalyst       58.23    0.071704
    IndependentAnalyst       57.54    0.036610
    DiscriminativeAnalyst    56.15    0.056493
    HedgeFundAnalyst         58.23    0.080796
