  1. Trade and Manage Wealth with Deep RL and Memory NVIDIA GTC 2018 March 26, 2018 Daniel Egloff, Founder, CEO, Head R&D

  2. Problem
  Retail investor customer demands
  • Manage portfolio more actively
  • Get additional return from smart investment decisions
  Lack of products: current digital platforms focus on passively managed products.
  Market barriers: regulation limits access to hedge fund style active products.
  Costs: existing actively managed products charge high fees.

  3. Question Can modern AI technology replace a PM or trader?

  4. Solution – AI Agents
  Smart data-aware AI agents for active investment decisions
  • AI powered alternative to smart beta ETFs
  • AI supervised and automated trading strategies
  Save costs: fully automated, without human interaction.
  Save time: delegate work to smart agents.
  Smart returns: extra return from smart, data-driven decisions; don't miss market opportunities.

  5. Market Validation
  • 90% of the robo-advisors today are ETF-based. ETFs alone have run out of steam to fuel the next growth of robo-advisors.
  • Robo-advisors need more than 6 years to make a profit on a customer, post acquisition. Source: Burnmark report, April 2017.
  • Intergenerational wealth transfer: 12tn of wealth transferring from the 1920/30 to the 1946/64 generation. Source: Burnmark report, April 2017.
  • Investor behavior change, digital channels, reduced margins: growth of robo-advisors, growth of online brokers, growth of the ETF market.
  • Growth of smart beta ETFs: smart beta 30% growth in 2017. Source: EY ETF report 2017.

  6. AI Foundations • Several recent innovations in AI and Big Data • Deep Reinforcement Learning • Differentiable Neural Computer • Large scale data streaming infrastructures from eCommerce

  7. Classical Portfolio Construction • Information bottleneck • Signal based, not end-to-end • Partial and staged data usage • Many data sources cannot be integrated with current factor based portfolio construction • Retrospective • Design, fit, deploy, re-engineer offline • Missing feedback link and on-line learning • Difficult to account for nonlinear dependencies

  8. Signal Based
  [Diagram] Market state S_t evolves to the next market state S_t+1. A forecasting system ι_1 produces signals that feed a trade rule system ι_2, which outputs weights and trades; after transaction costs the trades yield P&L/utility V(ι_1, ι_2). The forecasting system is fit by supervised learning on its forecast error. The signal interface between the two systems is an information bottleneck.

  9. Reinforcement Learning Based • Full information in portfolio weights and trades • Feedback loop to improve on good decisions and avoid unsuccessful decisions • Allows for more realistic modeling of intelligence of a successful PM or trader • Much easier process

  10. Reinforcement Learning Based
  [Diagram] Market state S_t evolves to the next market state S_t+1. A single trade system ι_1 outputs weights/trades directly; after transaction costs and a delay, the realized P&L/utility V(ι_1) is fed back as the reinforcement learning signal.

  11. AI Trading Agents
  PaaS to design, train, test, deploy and run agents.
  [Screenshot] Example agent configuration: objective excess return; risk medium; strategy & style; frequency 1h; universe AAPL AMZN BAC GOOG WMT JMP; "I learn from" price, VIX, news, LOB. Initial training (batch architecture) and online learning (online architecture); "My performance" with live and historic stats; training data scenarios bull, bear, crash; brokers & exchanges connectivity; streaming architecture.

  12. Challenges and Insights

  13. Reinforcement Learning Setup
  Learning a behavioral strategy which maximizes the long-term sum of rewards by direct interaction with an unknown and uncertain environment.
  While not terminal do:
  • Agent perceives state s_t
  • Agent performs action a_t
  • Agent receives reward r_t
  • Environment evolves to state s_t+1
  [Diagram] Agent-environment loop: the environment sends state and reward to the agent; the agent sends an action back to the environment.
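A minimal sketch of this interaction loop (the environment interface and the placeholder agent below are illustrative assumptions, not the platform's actual API):

```python
import numpy as np

class RandomAgent:
    """Placeholder agent: samples long-only portfolio weights uniformly at random."""
    def __init__(self, n_assets, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_assets = n_assets

    def act(self, state):
        w = self.rng.random(self.n_assets)
        return w / w.sum()                    # weights on the simplex

    def observe(self, state, action, reward, next_state):
        pass                                  # a learning agent would update its policy here

def run_episode(env, agent):
    """Generic agent-environment loop from the slide: perceive, act, receive reward, repeat."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:                           # while not terminal
        action = agent.act(state)                     # agent performs action a_t
        next_state, reward, done = env.step(action)   # environment returns r_t and evolves to s_t+1
        agent.observe(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    return total_reward
```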

  14. RL - State
  Environment state
  • What is the market state?
  • Which data is required?
  • Price data, top of the book
  • LOB L1, L2
  • LOB messages
  • Secondary and non-standard data
  • Event data versus time-clocked data
  • How to combine agent and market state into the environment state?
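One illustrative way to combine market data and agent holdings into a single environment state (the feature choices, shapes, and function name below are assumptions made for the sketch):

```python
import numpy as np

def build_env_state(mid_prices, lob_l1, lob_l2, positions, cash):
    """Stack market features and agent features into one observation vector.

    mid_prices : (n_assets,) latest mid prices
    lob_l1     : (n_assets, 4) best bid/ask price and size per asset (LOB level 1)
    lob_l2     : (n_assets, 4) second-level bid/ask price and size per asset (LOB level 2)
    positions  : (n_assets,) current holdings in units
    cash       : scalar cash balance
    """
    market_state = np.concatenate([mid_prices, lob_l1.ravel(), lob_l2.ravel()])
    agent_state = np.concatenate([positions, [cash]])
    return np.concatenate([market_state, agent_state])
```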

  15. RL - Policy
  Agent policy specification
  • What is the agent action?
  • Continuous action for percentages of wealth
  • Discrete units of lots to buy/sell
  • Order implementation using market/limit orders
  • Long only vs. long/short?
  • Long-only agents do not face bankruptcy
  • Short positions can lead to bankruptcy

  16. RL - Policy
  Distributions on the simplex
  • Commonly known distributions (Dirichlet, ...) are not appropriate
  • Exploit the less known Hilbert space structure on the (open) simplex, which yields an isometry to Euclidean space (Aitchison)
  • Pull back a normal distribution, Student-t, etc. through the isometry
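A minimal numerical sketch of this construction, assuming the isometric log-ratio (ILR) transform as the Aitchison isometry with a Helmert-style contrast basis (the basis construction, asset count, and Gaussian parameters are illustrative):

```python
import numpy as np

def helmert_basis(d):
    """Orthonormal basis (rows) of the hyperplane {v in R^d : sum(v) = 0},
    used as the contrast matrix of the isometric log-ratio (ILR) transform."""
    H = np.zeros((d - 1, d))
    for i in range(1, d):
        H[i - 1, :i] = 1.0 / i
        H[i - 1, i] = -1.0
        H[i - 1] /= np.linalg.norm(H[i - 1])
    return H

def ilr(w, H):
    """Aitchison isometry: open simplex -> Euclidean space R^(d-1)."""
    logw = np.log(w)
    clr = logw - logw.mean(axis=-1, keepdims=True)    # centred log-ratio coordinates
    return clr @ H.T

def ilr_inv(z, H):
    """Inverse isometry: R^(d-1) -> open simplex (weights sum to 1)."""
    y = z @ H
    e = np.exp(y - y.max(axis=-1, keepdims=True))     # numerically stable closure
    return e / e.sum(axis=-1, keepdims=True)

# Pull back a normal distribution from Euclidean space to the simplex:
# sample Gaussian coordinates and map them to long-only portfolio weights.
d = 5                                                 # number of assets (illustrative)
H = helmert_basis(d)
rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=0.5, size=(10, d - 1))  # Euclidean samples
weights = ilr_inv(z, H)                               # 10 weight vectors on the open simplex
```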

  17. RL - Interaction
  Interaction of agent and environment
  • Market evolution, LOB resilience
  • Temporal and permanent market impact
  • Position change
  • Order cancellations
  • Partial fills and impact
  [Diagram] One time step: from the current state the policy produces target trades; market liquidity determines the executed/filled trades and the impact on market prices; the market evolves; agent positions are updated from the filled trades, giving the new state.
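A toy sketch of such an interaction step, assuming a linear impact model and a simple liquidity-capped fill rule (the class name, coefficients, and random market evolution are illustrative, not the actual simulator):

```python
import numpy as np

class ToyMarketEnv:
    """One-step sketch of the agent/market interaction: target trades -> partial fills
    -> temporary and permanent impact -> market evolution -> new state."""

    def __init__(self, prices, liquidity, cash=1_000_000.0,
                 temp_impact=0.1, perm_impact=0.01, seed=0):
        self.prices = np.asarray(prices, dtype=float)         # current mid prices
        self.liquidity = np.asarray(liquidity, dtype=float)   # max units fillable per step
        self.positions = np.zeros_like(self.prices)
        self.cash = cash
        self.temp_impact = temp_impact
        self.perm_impact = perm_impact
        self.rng = np.random.default_rng(seed)

    def step(self, target_trades):
        wealth_before = self.cash + self.positions @ self.prices

        # Partial fills: executed trades are capped by the available liquidity.
        filled = np.clip(target_trades, -self.liquidity, self.liquidity)

        # Temporary impact worsens the execution price of the filled trades.
        exec_prices = self.prices * (1.0 + self.temp_impact * filled / self.liquidity)
        self.cash -= filled @ exec_prices
        self.positions += filled

        # Permanent impact shifts mid prices, then the market evolves randomly.
        self.prices = self.prices * (1.0 + self.perm_impact * filled / self.liquidity)
        self.prices = self.prices * np.exp(self.rng.normal(0.0, 0.01, size=self.prices.shape))

        reward = self.cash + self.positions @ self.prices - wealth_before
        state = np.concatenate([self.prices, self.positions, [self.cash]])
        return state, reward, filled
```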

  18. Our 6 Main Challenges • Sparse trading, learn to use cash and wait for opportunities • Robustness of RL • Scaling up RL training • Handling high resolution event time series data • Adapting agents to changing markets while not forgetting • Explaining agent decisions and behavior

  19. Sparse Trading • Reward modelling, including realistic transaction cost modelling • Adding risk to give cash a value • Properly balance risk and reward • Combining tree search and RL or option framework to learn to postpone trading
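One simple way to write such a reward, assuming a proportional transaction-cost rate and a variance-style risk penalty that gives cash an implicit value (the coefficients and function name are illustrative):

```python
import numpy as np

def step_reward(pnl, trades, prices, returns_window, cost_rate=0.0005, risk_aversion=2.0):
    """Per-step reward = P&L - transaction costs - risk penalty.

    pnl            : mark-to-market P&L of the current positions over the step
    trades         : traded units per asset this step
    prices         : execution prices per asset
    returns_window : recent portfolio returns used to estimate risk
    """
    transaction_costs = cost_rate * np.sum(np.abs(trades) * np.asarray(prices))
    risk_penalty = risk_aversion * np.var(returns_window)   # risky positions are penalized; cash is not
    return pnl - transaction_costs - risk_penalty
```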

  20. Robustness • Reward modelling • Very long history • Looking at different scales of time series • Training on synthesized data, e.g. reconstruct prices from skewed sampling from empirical return distribution
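A sketch of one such data-synthesis step, reading "skewed sampling" as resampling empirical returns with weights tilted toward large moves (the tilt function and parameters are assumptions):

```python
import numpy as np

def synthesize_prices(prices, n_steps, tail_tilt=2.0, seed=0):
    """Rebuild a synthetic price path by resampling empirical returns with a skewed weighting."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(np.log(prices))                        # empirical log returns
    # Tilt the sampling distribution toward large absolute moves (the tails).
    weights = np.abs(returns - returns.mean()) ** tail_tilt + 1e-12
    weights = weights / weights.sum()
    sampled = rng.choice(returns, size=n_steps, replace=True, p=weights)
    return prices[0] * np.exp(np.cumsum(sampled))            # synthetic price path
```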

  21. Scaling up RL
  • Breaking long episodes into partial episodes with differentiable memory
  [Diagram] Partial roll-out: a DNC estimates the initial state of each partial episode from the preceding partial states; within the roll-out, the policy produces actions a and the environment reacts with states s.
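A structural sketch of splitting one long episode into partial roll-outs while carrying a memory state across the boundaries; here an LSTM stands in for the DNC memory and the surrogate loss is only schematic (all sizes and names are assumptions):

```python
import torch
import torch.nn as nn

class MemoryPolicy(nn.Module):
    """Recurrent policy whose hidden state plays the role of the external memory."""
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, states, memory=None):
        out, memory = self.rnn(states, memory)
        return self.head(out), memory

def train_on_long_episode(policy, optimizer, episode_states, episode_rewards, chunk_len=128):
    """Break one long episode into partial roll-outs; the memory is carried over but detached,
    so gradients stay local to each partial episode while context is preserved."""
    memory = None
    for start in range(0, episode_states.size(1), chunk_len):
        states = episode_states[:, start:start + chunk_len]      # (batch, T_chunk, state_dim)
        rewards = episode_rewards[:, start:start + chunk_len]    # (batch, T_chunk)
        actions, memory = policy(states, memory)
        # Schematic surrogate loss standing in for a real RL objective on this partial episode.
        loss = -(actions.sum(dim=-1) * rewards).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        memory = tuple(m.detach() for m in memory)               # carry memory across partial episodes
```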

  22. High Resolution Event TS
  • New hybrid RNN-CNN network topology
  • Properly apply convolution over time and cross section
  • Cross section should be permutation invariant!
  • Convolution at different time frequencies
  • Residual NN
  • OHLC bars are too simplistic
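One architecture sketch consistent with these points: temporal convolutions shared across assets (weight sharing plus pooling makes the cross section permutation invariant), a dilated convolution for a coarser time frequency, and an RNN on top (layer sizes and the pooling choice are assumptions; residual connections are omitted for brevity):

```python
import torch
import torch.nn as nn

class CrossSectionTemporalNet(nn.Module):
    """Temporal CNN shared across assets + permutation-invariant pooling over the cross section + RNN."""
    def __init__(self, n_features, conv_channels=32, rnn_hidden=64):
        super().__init__()
        # 1D convolutions over time, applied with the same weights to every asset.
        self.temporal_conv = nn.Sequential(
            nn.Conv1d(n_features, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
            # Dilated convolution captures a coarser time frequency with the same kernel size.
            nn.Conv1d(conv_channels, conv_channels, kernel_size=5, padding=4, dilation=2),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(conv_channels, rnn_hidden, batch_first=True)

    def forward(self, x):
        # x: (batch, n_assets, n_features, time)
        b, a, f, t = x.shape
        h = self.temporal_conv(x.reshape(b * a, f, t))   # shared weights per asset
        h = h.reshape(b, a, -1, t)
        h = h.mean(dim=1)                                # mean-pool over assets: permutation invariant
        h = h.transpose(1, 2)                            # (batch, time, channels) for the RNN
        out, _ = self.rnn(h)
        return out[:, -1]                                # summary of the full window
```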

  23. Adapting while not Forgetting
  • New attention mechanism penalized relative to the prior attention
  • Prior attention reflects "agent style"
  • Prioritization of new data
  [Diagram] Timeline of history and new data; the important times are marked in the prior attention.
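A minimal sketch of one way to realize such a penalty: regularize the adapted attention distribution toward the prior attention with a KL term (the loss combination and the weighting are assumptions):

```python
import torch
import torch.nn.functional as F

def attention_with_prior_penalty(scores, prior_attention, task_loss, penalty_weight=0.1):
    """Penalize deviation of the new attention weights from the prior 'agent style' attention.

    scores          : (batch, T) unnormalized attention scores on the history
    prior_attention : (batch, T) attention weights of the previously trained agent (rows sum to 1)
    task_loss       : scalar loss of the adapted agent on the new data
    """
    new_attention = F.softmax(scores, dim=-1)
    # KL(new || prior): large when the adapted agent attends to very different times than before.
    kl = torch.sum(new_attention * (torch.log(new_attention + 1e-8)
                                    - torch.log(prior_attention + 1e-8)), dim=-1).mean()
    return task_loss + penalty_weight * kl, new_attention
```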

  24. Explaining Agent Decisions • Learning supervised model to explain agent returns • Compare to different ETF and investment products following a specific investment style • Value • Growth • Momentum • Mean reversion
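A sketch of such a supervised explanation model: a linear regression of the agent's returns on the returns of style proxies, with the intercept read as unexplained alpha (the factor inputs and the least-squares choice are illustrative):

```python
import numpy as np

def explain_agent_returns(agent_returns, style_returns, style_names):
    """Regress agent returns on style/ETF returns to attribute performance to investment styles.

    agent_returns : (T,) realized agent returns
    style_returns : (T, K) returns of style proxies, e.g. value, growth, momentum, mean reversion
    style_names   : list of K style labels
    """
    agent_returns = np.asarray(agent_returns, dtype=float)
    X = np.column_stack([np.ones(len(agent_returns)), style_returns])   # intercept = unexplained alpha
    coef, *_ = np.linalg.lstsq(X, agent_returns, rcond=None)
    exposures = dict(zip(["alpha"] + list(style_names), coef))
    residual = agent_returns - X @ coef
    r_squared = 1.0 - residual.var() / agent_returns.var()
    return exposures, r_squared
```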

  25. Contact Info www.flink.ai | daniel@flink.ai
