  1. Dive Deeper in Finance – GTC 2017 – San José, California. Daniel Egloff, Dr. sc. math., Managing Director, QuantAlea. May 7, 2017

  2. Today ▪ Generative models for financial time series – Sequential latent Gaussian variational autoencoder ▪ Implementation in TensorFlow – Recurrent variational inference using TF control flow operations ▪ Applications to FX data – 1s to 10s OHLC aggregated data – Event-based models for tick data are work in progress

  3. Generative Models and GPUs ▪ "What I cannot create, I do not understand" (Richard Feynman) ▪ Generative models are a recent innovation in Deep Learning – GANs – Generative adversarial networks – VAEs – Variational autoencoders ▪ Training is computationally demanding – Explorative modelling is not possible without GPUs

  4. Deep Learning ▪ Deep Learning in finance is complementary to existing models, not a replacement ▪ Deep Learning benefits – Richer functional relationships between explanatory and response variables – Models complicated interactions – Automatic feature discovery – Capable of handling large amounts of data – Standard training procedures with backpropagation and SGD – Frameworks and tooling

  5. Latent Variable – Encoding/Decoding ▪ The latent variable z can be thought of as an encoded representation of x ▪ The likelihood p(x|z) serves as decoder ▪ The posterior p(z|x) provides the encoder [diagram: x → encoder p(z|x) → z → decoder p(x|z) → x, with prior p(z)]

  6. Intractable Maximum Likelihood ▪ Maximum likelihood is the standard model fitting approach: p(x) = ∫ p(x|z) p(z) dz → max ▪ Problem: the marginal p(x) and the posterior p(z|x) = p(x|z) p(z) / p(x) are intractable, and their calculation suffers from exponential complexity ▪ Solutions – Markov Chain MC, Hamiltonian MC – Approximation and variational inference

  7. Variational Autoencoders ▪ Assume a latent space with prior p(z) [diagram: x → p(z|x) → z → p(x|z) → x, with prior p(z)]

  8. Variational Autoencoders ▪ Parameterize the likelihood p(x|z) with a deep neural network p_φ(x|z) [diagram: decoder network producing μ and σ]

  9. Variational Autoencoders ▪ Parameterize the likelihood p(x|z) with a deep neural network ▪ Approximate the intractable posterior p(z|x) with a deep neural network q_θ(z|x) [diagram: encoder q_θ(z|x) and decoder p_φ(x|z), each producing μ and σ]

  10. Variational Autoencoders ▪ Parameterize the likelihood p(x|z) with a deep neural network ▪ Approximate the intractable posterior p(z|x) with a deep neural network ▪ Learn the parameters θ and φ with backpropagation
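A minimal sketch of this encoder/decoder parameterization in TensorFlow 1.x (the version used later in the talk); the layer sizes, function names, and the log-σ outputs are illustrative assumptions, not the talk's actual code:

```python
import tensorflow as tf  # TensorFlow 1.x style API

def encoder(x, latent_dim=2):
    # q_theta(z|x): MLP producing the mean and log-std of a diagonal Gaussian
    h = tf.layers.dense(x, 64, activation=tf.nn.relu)
    mu = tf.layers.dense(h, latent_dim)
    log_sigma = tf.layers.dense(h, latent_dim)
    return mu, log_sigma

def decoder(z, data_dim):
    # p_phi(x|z): MLP producing the mean and log-std of a diagonal Gaussian
    h = tf.layers.dense(z, 64, activation=tf.nn.relu)
    mu = tf.layers.dense(h, data_dim)
    log_sigma = tf.layers.dense(h, data_dim)
    return mu, log_sigma

def sample_z(mu, log_sigma):
    # reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients with respect to theta flow through mu and log_sigma
    eps = tf.random_normal(tf.shape(mu))
    return mu + tf.exp(log_sigma) * eps
```

Backpropagating through the sampled z is what makes joint training of θ and φ with SGD possible.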

  11. Variational Inference ▪ Which loss to optimize? ▪ Can we choose the posterior from a flexible family of distributions Q by minimizing a distance to the real posterior? q*(z|x) = argmin_{q_θ ∈ Q} KL(q_θ(z|x) ‖ p_φ(z|x)) ▪ Problem: not computable, because it involves the marginal p_φ(x): KL(q_θ(z|x) ‖ p_φ(z|x)) = E_{q_θ(z|x)}[log q_θ(z|x)] − E_{q_θ(z|x)}[log p_φ(x, z)] + log p_φ(x) ≥ 0 – The KL term can be made small if Q is flexible enough

  12. Variational Inference ▪ Which loss to optimize? ▪ Can we choose the posterior from a flexible family of distributions Q by minimizing a distance to the real posterior? q*(z|x) = argmin_{q_θ ∈ Q} KL(q_θ(z|x) ‖ p_φ(z|x)) ▪ Drop the left-hand side (the KL term), since it is nonnegative: 0 ≤ E_{q_θ(z|x)}[log q_θ(z|x)] − E_{q_θ(z|x)}[log p_φ(x, z)] + log p_φ(x), where the first two terms equal −ELBO(θ, φ)

  13. Variational Inference ▪ Which loss to optimize? ▪ Can we choose the posterior from a flexible family of distributions Q by minimizing a distance to the real posterior? q*(z|x) = argmin_{q_θ ∈ Q} KL(q_θ(z|x) ‖ p_φ(z|x)) ▪ Obtain a tractable lower bound for the marginal: ELBO(θ, φ) ≤ log p_φ(x) ▪ Training criterion: maximize the evidence lower bound

  14. Variational Inference ▪ To interpret the lower bound, write it as log p_φ(x) ≥ ELBO(θ, φ) = E_{q_θ(z|x)}[log p_φ(x|z)] − KL(q_θ(z|x) ‖ p(z)), where the first term is a reconstruction score and the second term penalizes deviation from the prior [diagram: z ~ q_θ(z|x) fed into the decoder p_φ(x|z)] ▪ The smaller KL(q_θ(z|x) ‖ p_φ(z|x)), the tighter the lower bound
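For reference, the identity behind slides 11–14 can be written out as follows (a standard derivation, stated in the deck's notation with generative parameters φ and inference parameters θ):

```latex
\begin{align*}
\log p_\varphi(x)
  &= \underbrace{\mathbb{E}_{q_\theta(z|x)}\!\left[\log p_\varphi(x,z) - \log q_\theta(z|x)\right]}_{\mathrm{ELBO}(\theta,\varphi)}
   + \mathrm{KL}\!\left(q_\theta(z|x)\,\|\,p_\varphi(z|x)\right)
   \;\ge\; \mathrm{ELBO}(\theta,\varphi), \\
\mathrm{ELBO}(\theta,\varphi)
  &= \mathbb{E}_{q_\theta(z|x)}\!\left[\log p_\varphi(x|z)\right]
   - \mathrm{KL}\!\left(q_\theta(z|x)\,\|\,p(z)\right).
\end{align*}
```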

  15. Applications to Time Series ▪ Sequence structure for the observable and the latent factor ▪ Model setup – Gaussian distributions with parameters computed by a deep recurrent neural network – Standard Gaussian prior – Model training with variational inference

  16. Inference and Training [diagram: an inference RNN with hidden states h_{t−1}, h_t, h_{t+1} maps the observations x_t to the parameters μ_t, σ_t of q_θ(z|x); the sampled latents z_t drive a generator RNN whose hidden states produce the parameters μ_t, σ_t of the observation model]

  17. Implied Factorization ▪ The probability distributions factorize: p_φ(x_{≤T} | z_{≤T}) = ∏_{t=1}^{T} p_φ(x_t | x_{<t}, z_{≤t}) = ∏_{t=1}^{T} N(x_t | μ_φ(x_{<t}, z_{≤t}), σ_φ(x_{<t}, z_{≤t})) and q_θ(z_{≤T} | x_{≤T}) = ∏_{t=1}^{T} q_θ(z_t | x_{<t}, z_{<t}) = ∏_{t=1}^{T} N(z_t | μ_θ(x_{<t}, z_{<t}), σ_θ(x_{<t}, z_{<t})) ▪ Loss calculation – The distributions can easily be simulated to calculate the expectation term – The Kullback-Leibler term can be calculated analytically

  18. Calculating ELBO ▪ Loss calculation – The Kullback-Leibler term can be calculated analytically – For fixed t, the quantities μ_φ, μ_θ, σ_φ, σ_θ depend on z_t ~ N(z_t | μ_θ(x_{<t}, z_{<t}), σ_θ(x_{<t}, z_{<t})) – Simulate from this distribution to estimate the expectation with a sample mean: ELBO(θ, φ) = −E_q[ Σ_{t=1}^{T} { (x_t − μ_φ)ᵀ σ_φ⁻¹ (x_t − μ_φ) + log det σ_φ + μ_θᵀ μ_θ + tr σ_θ − log det σ_θ } ] (up to additive constants and an overall factor of 1/2), approximated with Monte Carlo sampling from q_θ(z_{≤T} | x_{≤T})
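A minimal sketch of these two per-time-step terms for diagonal Gaussians (the diagonal parameterization is an assumption; the function names are illustrative):

```python
import numpy as np

def gaussian_log_likelihood(x_t, mu_x, sigma_x):
    # log N(x_t | mu_x, diag(sigma_x^2)): the reconstruction term at one time step
    return -0.5 * np.sum(((x_t - mu_x) / sigma_x) ** 2
                         + 2.0 * np.log(sigma_x)
                         + np.log(2.0 * np.pi))

def kl_to_standard_normal(mu_z, sigma_z):
    # KL( N(mu_z, diag(sigma_z^2)) || N(0, I) ), available in closed form
    return 0.5 * np.sum(sigma_z ** 2 + mu_z ** 2 - 1.0 - 2.0 * np.log(sigma_z))

# one Monte Carlo estimate of the ELBO for a sampled path z_{<=T} ~ q_theta:
#   sum over t of  gaussian_log_likelihood(x_t, mu_phi_t, sigma_phi_t)
#                  - kl_to_standard_normal(mu_theta_t, sigma_theta_t)
```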

  19. Generation [diagram: latent variables z_{t−1}, z_t, z_{t+1} drawn from the prior p(z) drive the generator RNN with hidden states h_t, which outputs μ_t, σ_t of p_φ(x|z) and the generated observations x_t]

  20. Time Series Embedding ▪ A single historical value is not predictive enough ▪ Embedding – Use a lag of ~20 historical observations at every time step [diagram: a batch of embedded windows shifted over time steps t, t+1, t+2]
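A sketch of this embedding step for a 1-D series, assuming the lag of ~20 mentioned above (array layout and names are illustrative):

```python
import numpy as np

def embed(series, lag=20):
    # turn a 1-D series into overlapping windows of `lag` historical observations,
    # so that time step t is represented by (x_{t-lag+1}, ..., x_t)
    windows = [series[t - lag + 1 : t + 1] for t in range(lag - 1, len(series))]
    return np.stack(windows)                      # shape: (num_steps, lag)

# batches of consecutive embedded steps then form the model input,
# i.e. tensors of shape (time_steps, batch, lag)
```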

  21. Implementation ▪ Implementation in TensorFlow ▪ Running on P100 GPUs for model training ▪ Long time series and large batch sizes require substantial GPU memory

  22. TensorFlow Dynamic RNN ▪ Unrolling an RNN with tf.nn.dynamic_rnn – Simple to use – Can handle variable sequence lengths ▪ Not flexible enough for generative networks
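For comparison, the straightforward usage this refers to looks roughly like the following sketch (TensorFlow 1.x; the cell size and input dimensions are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
inputs = tf.placeholder(tf.float32, [None, None, 60])    # (batch, time, embedding)
seq_len = tf.placeholder(tf.int32, [None])               # per-sequence lengths

# dynamic_rnn builds the while loop for you and unrolls the cell over time
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         sequence_length=seq_len,
                                         dtype=tf.float32)
```

The limitation is that the loop body is fixed: there is no hook to sample latent variables at each step and feed them into a second, generative RNN, which is why the talk uses tf.while_loop directly.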

  23. TensorFlow Control Structures ▪ Using tf.while_loop – More to program; one needs to understand the control structures in more detail – Much more flexible
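The basic pattern is a condition and a body operating on a tuple of loop variables, sketched here with a trivial computation (purely illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

T = 100                                   # number of iterations (illustrative)

def cond(t, acc):
    return t < T

def body(t, acc):
    # in the VAE below, this is where the inference and generator RNN cells are
    # stepped and their outputs written to TensorArray objects
    return (t + 1, acc + tf.cast(t, tf.float32))

t_final, acc_final = tf.while_loop(cond, body,
                                   loop_vars=(tf.constant(0), tf.constant(0.0)))
```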

  24. Implementation ▪ Notations

  25. Implementation ▪ Variable and weight setup – Recurrent neural network definition

  26. Implementation ▪ Allocate TensorArray objects ▪ Fill input TensorArray objects with data
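A sketch of what this step might look like (TensorFlow 1.x; all sizes and names are illustrative assumptions, not the talk's actual code):

```python
import tensorflow as tf  # TensorFlow 1.x

T, batch_size, embed_dim = 600, 32, 60                       # illustrative sizes
x = tf.placeholder(tf.float32, [T, batch_size, embed_dim])   # time-major input

# one TensorArray per quantity read or written inside the while loop
x_ta = tf.TensorArray(tf.float32, size=T)
mu_q_ta = tf.TensorArray(tf.float32, size=T)
sigma_q_ta = tf.TensorArray(tf.float32, size=T)
mu_p_ta = tf.TensorArray(tf.float32, size=T)
sigma_p_ta = tf.TensorArray(tf.float32, size=T)

# fill the input TensorArray with the data, one slice per time step
x_ta = x_ta.unstack(x)
```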

  27. Implementation ▪ While loop body, inference part – update the inference RNN state

  28. Implementation ▪ While loop body, generation part – update the generator RNN state (a combined sketch of both parts follows)
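Continuing the sketch above, a possible loop body covering both parts; the cell sizes, the explicit projection weights, and the exp parameterization of σ are assumptions:

```python
latent_dim, data_dim, rnn_units = 2, 60, 128              # illustrative sizes
inference_cell = tf.nn.rnn_cell.LSTMCell(rnn_units)
generator_cell = tf.nn.rnn_cell.LSTMCell(rnn_units)

# projection weights are created outside the loop body (cf. slide 25)
w_q = tf.get_variable("w_q", [rnn_units, 2 * latent_dim])
b_q = tf.get_variable("b_q", [2 * latent_dim])
w_p = tf.get_variable("w_p", [rnn_units, 2 * data_dim])
b_p = tf.get_variable("b_p", [2 * data_dim])

def loop_body(t, inf_state, gen_state, mu_q_ta, sigma_q_ta, mu_p_ta, sigma_p_ta):
    x_t = x_ta.read(t)                                     # current embedded observation

    # inference part: step the inference RNN, emit q_theta(z_t | x_{<t}, z_{<t})
    inf_out, inf_state = inference_cell(x_t, inf_state)
    mu_q, log_sigma_q = tf.split(tf.matmul(inf_out, w_q) + b_q, 2, axis=1)
    sigma_q = tf.exp(log_sigma_q)
    z_t = mu_q + sigma_q * tf.random_normal(tf.shape(mu_q))   # reparameterized sample

    # generation part: step the generator RNN, emit p_phi(x_t | x_{<t}, z_{<=t})
    gen_out, gen_state = generator_cell(z_t, gen_state)
    mu_p, log_sigma_p = tf.split(tf.matmul(gen_out, w_p) + b_p, 2, axis=1)
    sigma_p = tf.exp(log_sigma_p)

    return (t + 1, inf_state, gen_state,
            mu_q_ta.write(t, mu_q), sigma_q_ta.write(t, sigma_q),
            mu_p_ta.write(t, mu_p), sigma_p_ta.write(t, sigma_p))
```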

  29. Implementation ▪ Call while loop ▪ Stacking TensorArray objects
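Again continuing the sketch (batch_size, the cells, and the TensorArray objects come from the previous snippets):

```python
# run the loop; the body must return loop variables with the same structure
_, _, _, mu_q_ta, sigma_q_ta, mu_p_ta, sigma_p_ta = tf.while_loop(
    cond=lambda t, *_: t < T,
    body=loop_body,
    loop_vars=(tf.constant(0),
               inference_cell.zero_state(batch_size, tf.float32),
               generator_cell.zero_state(batch_size, tf.float32),
               mu_q_ta, sigma_q_ta, mu_p_ta, sigma_p_ta))

# stack the per-step results back into (time, batch, dim) tensors
mu_q, sigma_q = mu_q_ta.stack(), sigma_q_ta.stack()
mu_p, sigma_p = mu_p_ta.stack(), sigma_p_ta.stack()
```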

  30. Implementation ▪ Loss Calculation
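A sketch of the loss in the notation of slide 18: the Gaussian reconstruction term plus the analytic KL to the standard normal prior (constants dropped; the optimizer choice is an assumption):

```python
# negative ELBO, summed over time and feature dimensions, averaged over the batch
recon = 0.5 * tf.reduce_sum(((x - mu_p) / sigma_p) ** 2 + 2.0 * tf.log(sigma_p),
                            axis=[0, 2])
kl = 0.5 * tf.reduce_sum(sigma_q ** 2 + mu_q ** 2 - 1.0 - 2.0 * tf.log(sigma_q),
                         axis=[0, 2])
loss = tf.reduce_mean(recon + kl)

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```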

  31. FX Market ▪ The FX market is the largest and most liquid market in the world ▪ Decentralized over-the-counter market – No need to go through a centralized exchange – No single price for a currency at a given point in time ▪ Fierce competition between market participants ▪ 24 hours, 5 ½ days per week – As one major forex market closes, another one opens

  32. FX Data ▪ Collect tick data from a major liquidity provider, e.g. LMAX ▪ Aggregation to OHLC bars (1s, 10s, …) ▪ Focus on the US trading session [diagram: trading sessions – US session 8am – 5pm EST, London session 3am – 12pm EST, Asian session 7pm – 4am EST (Tokyo) and 5pm – 2am EST (Sydney)]
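A sketch of the aggregation step with pandas (the file name, column layout, and the assumption that timestamps are already in EST are all illustrative):

```python
import pandas as pd

# tick data with a DatetimeIndex and a trade price column (illustrative layout)
ticks = pd.read_csv("eurusd_ticks.csv", parse_dates=["timestamp"],
                    index_col="timestamp")

# aggregate to 1-second OHLC bars; use "10S" for 10-second bars
bars = ticks["price"].resample("1S").ohlc().dropna()

# keep only the US trading session, 8am - 5pm EST
bars = bars.between_time("08:00", "17:00")
```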

  33. EURUSD 2016

  34. Single Day

  35. One Hour

  36. 10 Min Sampled at 1s – At high frequency, FX prices fluctuate in a range of deci-pips (1/10 pip = 1 deci-pip); larger jumps are on the order of multiple pips and more

  37. Setup ▪ Normalize data with the standard deviation σ̂ estimated over the training interval ▪ 260 trading days in 2016, one model per day ▪ 60-dim embedding, 2-dim latent space [chart: training interval followed by out-of-sample test interval]
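A sketch of the normalization step (the split point and the application to the raw series are assumptions):

```python
import numpy as np

def normalize(series, train_end):
    # estimate sigma_hat on the training interval only, then apply it everywhere,
    # so that no information from the out-of-sample test interval leaks in
    sigma_hat = np.std(series[:train_end])
    return series / sigma_hat, sigma_hat
```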

  38. Results Training

  39. Out of Sample

  40. Volatility of Prediction

  41. Latent Variables

  42. Pricing in E-Commerce ▪ Attend our talk on our latest work on AI and GPU-accelerated genetic algorithms with Jet.com
