Stock Movement Prediction from Tweets and Historical Prices
Yumo Xu and Shay B. Cohen
Institute for Language, Cognition, and Computation, School of Informatics, University of Edinburgh
ACL 2018. https://yumoxu.github.io/, yumo.xu@ed.ac.uk
Who cares about stock movements?

No one would be unhappy if they could predict stock movements:
◮ Investors
◮ Governments
◮ Researchers
Background

◮ Two mainstreams in finance: technical and fundamental analysis
◮ Two main content resources in NLP: public news and social media
◮ History of NLP models:
  Feature engineering (before 2010)
  ↓
  Topic models (2013-2015) (generative)
  ↓
  Event-driven neural nets (2014-2015)
  ↓
  Hierarchical attention nets (2018)
However, it has never been easy...

Complexities: the market is highly stochastic, and we make temporally-dependent predictions from chaotic data.
Divide and treat

1. Chaotic market information → Market Information Encoder
   Noisy and heterogeneous
2. High market stochasticity → Variational Movement Decoder
   Random-walk theory (Malkiel, 1999)
3. Temporally-dependent prediction → Attentive Temporal Auxiliary
   When a company suffers a major scandal on a trading day, its stock price tends to keep falling on the coming trading days: public information takes time to be absorbed into movements (Luss and d'Aspremont, 2015), and is thus largely shared across temporally-close predictions
Problem Formulation

Stock Movement Prediction
◮ We estimate the binary movement y ∈ {0, 1}, where 1 denotes rise and 0 denotes fall
◮ Target trading day: d
◮ We use the market information, comprising relevant tweets and historical prices, in the lag [d − ∆d, d − 1], where ∆d is a fixed lag size
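To make the lag concrete, here is a minimal Python sketch of assembling the window [d − ∆d, d − 1] for a target day (lag_window is a hypothetical helper, not the authors' code; real eligibility would additionally require an exchange calendar and tweet/price availability):

    from datetime import date, timedelta

    def lag_window(target_day: date, lag_size: int = 5) -> list[date]:
        """Candidate days [d - lag_size, d - 1] for a target trading day d.
        Non-trading days would still need filtering against an exchange
        calendar; only days with both tweets and prices are eligible."""
        return [target_day - timedelta(days=offset)
                for offset in range(lag_size, 0, -1)]

    # e.g. predict the movement on 2014-08-07 from the five preceding days
    print(lag_window(date(2014, 8, 7)))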
Generative Process

[Plate diagram: observed market information X and movements y; latent factor Z; variational parameters φ and generative parameters θ; plate over the |D| examples]

◮ T eligible trading days in the ∆d lag
◮ Encode observed market information as a random variable X = [x_1; ...; x_T]
◮ Generate the latent driven factor Z = [z_1; ...; z_T]
◮ Generate stock movements y = [y_1, ..., y_T] from X, Z
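A toy numpy sketch of this generative story follows; it is illustrative only: StockNet parameterizes these distributions with neural networks and conditions them on the history, whereas all weights below are random stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)
    T, dim_x, dim_z = 5, 8, 4

    X = rng.normal(size=(T, dim_x))        # encoded market information x_1..x_T
    W_z = rng.normal(size=(dim_x, dim_z))  # toy stand-ins for learned parameters
    w_y = rng.normal(size=dim_x + dim_z)

    Z = np.empty((T, dim_z))
    y = np.empty(T, dtype=int)
    for t in range(T):
        Z[t] = X[t] @ W_z + rng.normal(size=dim_z)   # latent factor z_t | x_t
        p_rise = 1.0 / (1.0 + np.exp(-np.concatenate([X[t], Z[t]]) @ w_y))
        y[t] = rng.binomial(1, p_rise)               # movement y_t (1 = rise, 0 = fall)
    print(y)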
Factorization

◮ For multi-task learning, we model p_θ(y | X) = Σ_Z p_θ(y, Z | X) instead of p_θ(y_T | X)
  Main target: y_T
  Temporal auxiliary target: y* = [y_1, ..., y_{T−1}]
◮ Factorization:

  p_θ(y, Z | X) = p_θ(y_T | X, Z) p_θ(z_T | z_{<T}, X) ∏_{t=1}^{T−1} p_θ(y_t | x_{≤t}, z_t) p_θ(z_t | z_{<t}, x_{≤t}, y_t)
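Since the sum over Z is intractable, training maximizes a variational lower bound instead; a reconstruction of its recurrent form in LaTeX (consistent with the per-step objectives f_t on the ATA slide, which additionally weight the KL term by λ; this is a sketch, not the paper's verbatim equation):

    \log p_\theta(y \mid X) \;\geq\; \sum_{t=1}^{T} \Big(
        \mathbb{E}_{q_\phi}\big[\log p_\theta(y_t \mid x_{\le t}, z_{\le t})\big]
        - D_{\mathrm{KL}}\big[\, q_\phi(z_t \mid z_{<t}, x_{\le t}, y_t)
          \;\big\|\; p_\theta(z_t \mid z_{<t}, x_{\le t}) \,\big] \Big)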
Primary components

[Plate diagram repeated: observed X, y; latent Z; parameters φ, θ; plate over |D|]

1. Market Information Encoder (MIE): encodes X
2. Variational Movement Decoder (VMD): infers Z from X, y and decodes stock movements y from X, Z
3. Attentive Temporal Auxiliary (ATA): integrates temporal loss for training
StockNet architecture

[Figure: StockNet architecture. (b) The Market Information Encoder (MIE) embeds the message corpora with a message embedding layer, Bi-GRUs and attention, and combines them with historical prices over the lag window (e.g. 02/08-06/08). (a) The Variational Movement Decoder (VMD) runs a variational encoder/decoder over hidden states h_1..h_T, sampling z_t ~ N(µ, δ²) with a KL term D_KL against the prior N(0, I). (c) The Attentive Temporal Auxiliary (ATA) applies temporal attention over g_1..g_T to form the training objective for the target day (e.g. 07/08). (d) VAE component.]
Variational Movement Decoder

◮ Goal: recurrently infer Z from X, y and decode y from X, Z
◮ Challenge: posterior inference is intractable in our factorized model

VAE solutions
◮ Neural approximation and reparameterization
◮ Recurrent ELBO
◮ Adopt a posterior approximator q_φ(z_t | z_{<t}, x_{≤t}, y_t) ∼ N(µ, δ² I), where φ = {µ, δ}
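The reparameterization step can be sketched in a few lines of numpy (illustrative; in StockNet, µ and log δ² are produced by the recurrent approximator network rather than passed in directly):

    import numpy as np

    def reparameterize(mu, log_var, rng=np.random.default_rng()):
        """Sample z ~ N(mu, diag(exp(log_var))) via z = mu + delta * eps,
        eps ~ N(0, I), so gradients can flow through mu and log_var
        instead of through the sampling operation itself."""
        eps = rng.normal(size=np.shape(mu))
        return mu + np.exp(0.5 * log_var) * eps

    z_t = reparameterize(np.zeros(4), np.zeros(4))  # toy mu = 0, delta = 1
    print(z_t)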
[StockNet architecture figure repeated; see the placeholder above.]
Interface between VMD and ATA

◮ Integrate the deterministic feature h_t and the latent variable z_t:

  g_t = tanh(W_g [x_t, h_t^s, z_t] + b_g)

◮ Decode movement hypotheses ỹ_t: first the auxiliary targets, then the main target ỹ_T
◮ Temporal attention over g_1, ..., g_{T−1}: combines an information score and a dependency score (against g_T) into the weight vector v*
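A hedged numpy sketch of this interface: compute g_t, then attend over the auxiliary steps with g_T as the query (the dot-product scoring below is a simple stand-in, not the paper's exact combination of information and dependency scores):

    import numpy as np

    rng = np.random.default_rng(1)
    T, dx, dh, dz, dg = 4, 8, 16, 4, 16

    x = rng.normal(size=(T, dx))   # market information x_t
    h = rng.normal(size=(T, dh))   # deterministic decoder features h_t^s
    z = rng.normal(size=(T, dz))   # latent factors z_t
    W_g = 0.1 * rng.normal(size=(dx + dh + dz, dg))
    b_g = np.zeros(dg)

    # g_t = tanh(W_g [x_t, h_t^s, z_t] + b_g)
    g = np.tanh(np.concatenate([x, h, z], axis=1) @ W_g + b_g)

    # attention over the auxiliary steps 1..T-1, with g_T as the query
    scores = g[:-1] @ g[-1]
    v_star = np.exp(scores - scores.max())
    v_star /= v_star.sum()         # normalized temporal weights v*
    print(v_star)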
Attentive Temporal Auxiliary

◮ Break down the approximated objective L into temporal objectives f ∈ R^{T×1}:

  f_t = log p_θ(y_t | x_{≤t}, z_{≤t}) − λ D_KL[q_φ(z_t | z_{<t}, x_{≤t}, y_t) ‖ p_θ(z_t | z_{<t}, x_{≤t})]

◮ Reuse v* to build the final temporal weight vector v ∈ R^{1×T}:

  v = [α v*, 1], where α ∈ [0, 1] controls the overall auxiliary effects

◮ Recompose the final training objective F:

  F(θ, φ; X, y) = (1/N) Σ_{n=1}^{N} v^(n) f^(n)
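Continuing the sketch above, the temporal weighting and the resulting per-example objective could look as follows (the f_t values here are placeholders for the log-likelihood-minus-KL terms):

    import numpy as np

    def weighted_objective(f, v_star, alpha):
        """Weight the per-step objectives f_1..f_T: the T-1 auxiliary
        steps get alpha * v*, the main target day always gets weight 1."""
        v = np.concatenate([alpha * v_star, [1.0]])   # v = [alpha v*, 1]
        return v @ f                                  # scalar objective, one example

    f = np.array([-0.7, -0.6, -0.5, -0.4])   # toy f_t values
    v_star = np.array([0.2, 0.3, 0.5])
    print(weighted_objective(f, v_star, alpha=0.5))  # averaged over N in training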
Experimental setup

◮ Dataset: two-year daily price movements of 88 stocks
  Two components: a Twitter dataset and a historical price dataset
  Training: 20 months, 20,339 movements
  Development: 2 months, 2,555 movements
  Test: 2 months, 3,720 movements
◮ Lag window size: 5
◮ Metrics: accuracy and Matthews Correlation Coefficient (MCC)
◮ Comparative study: five baselines from different genres and five StockNet variants
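MCC is reported alongside accuracy because rise/fall labels can be skewed; a minimal sketch of the standard formula from a confusion matrix:

    import math

    def mcc(tp, tn, fp, fn):
        """Matthews Correlation Coefficient: +1 perfect, 0 chance level,
        -1 total disagreement; robust to imbalanced class distributions."""
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0

    print(mcc(tp=1000, tn=900, fp=850, fn=970))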
Baselines and variants

Baselines
◮ Rand: a naive predictor making random guesses
◮ ARIMA: Autoregressive Integrated Moving Average
◮ RandForest (Pagolu et al., 2016)
◮ TSLDA (Nguyen and Shirai, 2015)
◮ HAN (Hu et al., 2018)

StockNet variants
◮ HedgeFundAnalyst: fully equipped
◮ TechnicalAnalyst: from prices only
◮ FundamentalAnalyst: from tweets only
◮ IndependentAnalyst: optimizing only the main target
◮ DiscriminativeAnalyst: a discriminative variant
Results

Baseline comparison
◮ An accuracy of 56% is generally reported as a satisfying result (Nguyen and Shirai, 2015)
◮ ARIMA does not yield satisfying results
◮ The two best baselines: TSLDA and HAN

  Baseline models     Acc.    MCC
  Rand                50.89   -0.002266
  ARIMA               51.39   -0.020588
  RandForest          53.08    0.012929
  TSLDA               54.07    0.065382
  HAN                 57.64    0.051800

Variant comparison
◮ The two information sources are integrated effectively
◮ The generative framework incorporates randomness properly

  StockNet variants        Acc.    MCC
  TechnicalAnalyst         54.96   0.016456
  FundamentalAnalyst       58.23   0.071704
  IndependentAnalyst       57.54   0.036610
  DiscriminativeAnalyst    56.15   0.056493
  HedgeFundAnalyst         58.23   0.080796