Trade and Manage Wealth with Deep RL and Memory
NVIDIA GTC 2018, March 26, 2018
Daniel Egloff, Founder, CEO, Head R&D
Problem
Retail investor customer demands
• Manage portfolio more actively
• Get additional return from smart investment decisions
• Lack of products: current digital platforms focus on passively managed products
• Market barriers: regulations limit access to hedge fund style active products
• Costs: existing actively managed products charge high fees
Question
Can modern AI technology replace a PM or trader?
Solution – AI Agents
Smart data-aware AI agents for active investment decisions
• AI powered alternative to smart beta ETFs
• AI supervised and automated trading strategies
• Save time: don't miss market opportunities, delegate work to smart agents
• Smart returns: extra return from smart data driven decisions
• Save costs: fully automated, without human interaction
Market Validation
• 90% of the robo-advisors today are ETF-based. ETFs alone have run out of steam to fuel the next growth of robo advisors.
• Robo advisors need more than 6 years to make a profit on a customer, post acquisition. Source: Burnmark report, April 2017.
• Intergenerational wealth transfer: 12tn of wealth transferring from the 1920/30 to the 1946/64 generation. Source: Burnmark report, April 2017.
• Investor behavior change, digital channels, reduced margins
• Growth of robo advisors, growth of the ETF market, growth of online brokers
• Growth of Smart Beta ETFs: 30% smart beta growth in 2017. Source: EY ETF report 2017.
AI Foundations
Several recent innovations in AI and Big Data:
• Deep Reinforcement Learning
• Differentiable Neural Computer
• Large scale data streaming infrastructures from eCommerce
Classical Portfolio Construction
• Information bottleneck
  • Signal based, not end-to-end
  • Partial and staged data usage
  • Many data sources cannot be integrated with current factor based portfolio construction
• Retrospective
  • Design, fit, deploy, re-engineer offline
  • Missing feedback link and on-line learning
• Difficult to account for nonlinear dependencies
Signal Based
Diagram: the market state S_t feeds a forecasting system θ_1 that produces signals; a separate trade rule system θ_2 turns the signals into weights and trades; P&L/utility V(θ_1, θ_2) is computed net of transaction costs, and the market evolves to the next state S_{t+1}. Supervised learning only minimizes the forecast error of θ_1, so there is an information bottleneck between signals and trading decisions.
Reinforcement Learning Based
• Full information in portfolio weights and trades
• Feedback loop to improve on good decisions and avoid unsuccessful decisions
• Allows for more realistic modeling of the intelligence of a successful PM or trader
• Much simpler process
Reinforcement Learning Based
Diagram: the market state S_t feeds a single trading system θ_1 that directly outputs weights and trades; P&L/utility V(θ_1), net of transaction costs and with delay, is fed back to the trading system by reinforcement learning, and the market evolves to the next state S_{t+1}.
AI Trading Agents
PaaS to design, train, test, deploy and run agents.
Platform screenshot: an example agent configuration with objective: excess return; risk: medium; strategy & style; frequency: 1h; universe: AAPL, AMZN, BAC, GOOG, WMT, JMP; data sources ("I learn from"): price, VIX, news, LOB; performance ("My performance"): live and historic stats; brokers & exchanges; training data scenarios: bull, bear, crash; initial training and online learning on batch, online, and streaming architectures.
Challenges and Insights
Reinforcement Learning Setup
Learning a behavioral strategy which maximizes the long term sum of rewards by direct interaction with an unknown and uncertain environment.
While not terminal do:
  Agent perceives state s_t
  Agent performs action a_t
  Agent receives reward r_t
  Environment evolves to state s_{t+1}
Diagram: agent and environment exchange state, action and reward in a loop.
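A minimal Python sketch of the loop above; the env and agent objects and their methods are hypothetical placeholders, not an API from the talk.

    # Minimal agent-environment interaction loop, mirroring the pseudocode above.
    # `env` and `agent` are hypothetical placeholders, not the actual platform API.
    def run_episode(env, agent):
        total_reward = 0.0
        s_t = env.reset()                      # agent perceives initial state s_t
        done = False
        while not done:                        # while not terminal
            a_t = agent.act(s_t)               # agent performs action a_t
            s_next, r_t, done = env.step(a_t)  # environment evolves to s_{t+1}, returns reward r_t
            agent.observe(s_t, a_t, r_t, s_next, done)  # agent receives reward and learns
            total_reward += r_t
            s_t = s_next
        return total_reward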
RL - State
Environment state
• What is the market state? Which data is required?
  • Price data, top of the book
  • LOB L1, L2
  • LOB messages
  • Secondary and non standard data
• Event data versus time clocked data
• How to combine agent state and market state into the environment state?
RL - Policy
Agent policy specification
• What is the agent action?
  • Continuous action for percentages of wealth
  • Discrete units of lots to buy/sell
  • Order implementation using market/limit orders
• Long only vs. long/short?
  • Long-only agents do not face bankruptcy
  • Short positions can lead to bankruptcy
RL - Policy
Distributions on the simplex
• Commonly known distributions (Dirichlet, …) are not appropriate
• Exploit the less known Hilbert space structure on the (open) simplex, leading to an isometry to Euclidean space (Aitchison)
• Pull back the normal distribution, Student-t, etc. through the isometry
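A minimal sketch of the Aitchison idea, assuming the isometric log-ratio (ilr) transform as the isometry; the basis construction and the Gaussian example are illustrative, not necessarily the exact parameterization used in the talk.

    import numpy as np

    def ilr_basis(d):
        """Orthonormal basis of the hyperplane {v in R^d : sum(v) = 0}, shape (d, d-1)."""
        H = np.zeros((d, d - 1))
        for j in range(d - 1):
            H[: j + 1, j] = 1.0
            H[j + 1, j] = -(j + 1)
            H[:, j] /= np.linalg.norm(H[:, j])
        return H

    def ilr(x, V):
        """Isometric log-ratio transform: open simplex -> R^{d-1} (Aitchison geometry)."""
        clr = np.log(x) - np.mean(np.log(x))
        return V.T @ clr

    def ilr_inv(z, V):
        """Inverse ilr: R^{d-1} -> open simplex (used to pull back Euclidean distributions)."""
        clr = V @ z
        w = np.exp(clr - clr.max())   # numerically stable softmax
        return w / w.sum()

    # Pull back a normal distribution onto the simplex: sample in R^{d-1}, map to weights.
    d = 5
    V = ilr_basis(d)
    z = np.random.default_rng(0).normal(size=d - 1)   # e.g. output of a Gaussian policy head
    weights = ilr_inv(z, V)                           # portfolio weights, > 0 and summing to 1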
RL - Interaction
Interaction of agent and environment
• Market evolution, LOB resilience
• Temporary and permanent market impact
• Position change
• Order cancellations
• Partial fills and impact (a toy sketch follows below)
Diagram: the policy turns the state into target trades; market liquidity determines the executed/filled trades and impacts market prices; agent positions and market prices evolve over time into the new state.
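A toy sketch of how fills and impact could enter such an environment; the linear impact model and the parameters eta/gamma are assumptions for illustration only, not a calibrated model from the talk.

    import numpy as np

    def execute(target_trade, mid, liquidity, eta=0.1, gamma=0.05):
        """Toy fill model: partial fills capped by available liquidity, with
        temporary impact on the execution price and permanent impact on the mid."""
        filled = np.clip(target_trade, -liquidity, liquidity)   # partial fill
        exec_price = mid * (1.0 + eta * filled / liquidity)     # temporary impact on fill price
        new_mid = mid * (1.0 + gamma * filled / liquidity)      # permanent impact on the mid
        return filled, exec_price, new_mid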
Our 6 Main Challenges
• Sparse trading: learn to use cash and wait for opportunities
• Robustness of RL
• Scaling up RL training
• Handling high resolution event time series data
• Adapting agents to changing markets while not forgetting
• Explaining agent decisions and behavior
Sparse Trading
• Reward modelling, including realistic transaction cost modelling
• Adding risk to give cash a value
• Properly balance risk and reward (a sketch of such a reward follows below)
• Combining tree search and RL, or an options framework, to learn to postpone trading
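One way to make these bullets concrete is a per-step reward of P&L net of transaction costs minus a risk penalty, so that holding cash keeps value; the form and the parameters below are illustrative assumptions, not the reward used in the talk.

    import numpy as np

    def step_reward(w_prev, w_new, returns, cost_rate=5e-4, risk_aversion=0.1):
        """Illustrative per-step reward: portfolio P&L net of turnover-proportional
        transaction costs, minus a crude variance-style risk penalty."""
        pnl = float(np.dot(w_new, returns))                        # portfolio return over the step
        costs = cost_rate * float(np.abs(w_new - w_prev).sum())    # proportional transaction costs
        risk = risk_aversion * pnl ** 2                            # simple risk penalty
        return pnl - costs - risk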
Robustness
• Reward modelling
• Very long history
• Looking at different scales of time series
• Training on synthesized data, e.g. reconstructing prices from skewed sampling of the empirical return distribution (see the sketch below)
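A minimal sketch of the last bullet: resample log returns from the empirical distribution with an over-weighted left tail and rebuild a synthetic price path; the particular skewing scheme is an assumption for illustration.

    import numpy as np

    def synthetic_prices(prices, n_steps, skew_tail=2.0, seed=0):
        """Resample empirical log returns (over-weighting the worst returns) and
        reconstruct a synthetic price path for robustness training."""
        rng = np.random.default_rng(seed)
        rets = np.diff(np.log(prices))
        ranks = rets.argsort().argsort()                       # 0 = worst return
        probs = 1.0 + skew_tail * (1.0 - ranks / (len(rets) - 1))
        probs /= probs.sum()                                   # skewed sampling probabilities
        sampled = rng.choice(rets, size=n_steps, p=probs, replace=True)
        return prices[0] * np.exp(np.cumsum(sampled))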
Scaling up RL
• Breaking long episodes into partial episodes with differentiable memory (see the sketch below)
Diagram: within a partial roll-out, a DNC reads partial states s from the environment and estimates the initial state of the roll-out; the policy maps estimated states to actions a, and the environment reacts.
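A sketch of the partial roll-out idea under simplifying assumptions: split a long episode into chunks and let a recurrent memory module (a stand-in for the DNC) supply the estimated initial state of each chunk; all interfaces (env, policy, memory, learner) are hypothetical placeholders.

    def train_with_partial_rollouts(env, policy, memory, learner, chunk_len=128):
        """Run one long episode as a sequence of partial roll-outs. The memory module
        summarizes the history across chunk boundaries, so learning updates can be
        applied per chunk instead of over the whole episode."""
        obs, done = env.reset(), False
        mem = memory.initial_state()
        while not done:
            rollout = []
            for _ in range(chunk_len):
                mem = memory.write(mem, obs)           # fold the observation into memory
                state = memory.read(mem)               # estimated environment state
                action = policy.act(state)
                obs, reward, done = env.step(action)
                rollout.append((state, action, reward))
                if done:
                    break
            learner.update(rollout)                    # e.g. one policy-gradient step per partial roll-out
            mem = memory.detach(mem)                   # truncate gradients at the chunk boundary
        return policy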
High Resolution Event TS
• New hybrid RNN-CNN network topology (a sketch follows below)
• Properly apply convolution over time and cross section
  • Cross section should be permutation invariant!
• Convolution at different time frequencies
• Residual NN
• OHLC bars are too simplistic
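A minimal PyTorch sketch of one way to combine temporal convolution with a permutation-invariant cross section: temporal filters are shared across assets and the cross section is mean-pooled before a GRU; layer sizes and the pooling choice are assumptions, not the topology from the talk.

    import torch
    import torch.nn as nn

    class CrossSectionTemporalNet(nn.Module):
        """Hybrid CNN/RNN sketch: shared temporal convolution per asset (permutation
        equivariant), mean-pooling over assets (permutation invariant), GRU over time."""
        def __init__(self, n_features, hidden=32):
            super().__init__()
            self.temporal = nn.Conv1d(n_features, hidden, kernel_size=5, padding=2)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):
            # x: (batch, assets, time, features)
            b, a, t, f = x.shape
            x = x.reshape(b * a, t, f).transpose(1, 2)   # (batch*assets, features, time)
            h = torch.relu(self.temporal(x))             # shared temporal convolution per asset
            h = h.reshape(b, a, -1, t).mean(dim=1)       # mean over assets: permutation invariant
            h = h.transpose(1, 2)                        # (batch, time, hidden)
            out, _ = self.rnn(h)
            return self.head(out[:, -1])                 # one output per batch element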
Adapting while not Forgetting
• New attention mechanism relative to a prior attention, with a penalty
• Prior attention reflects the "agent style"
Diagram: prioritization of new data; important times marked in the prior are highlighted both in the history and in the new data.
Explaining Agent Decisions
• Learning a supervised model to explain agent returns (a sketch follows below)
• Compare to different ETFs and investment products following a specific investment style
  • Value
  • Growth
  • Momentum
  • Mean reversion
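A minimal sketch of returns-based style analysis as one such supervised model: regress the agent's returns on the returns of style products and read the fitted exposures; the unconstrained least-squares form is an assumption for illustration.

    import numpy as np

    def explain_agent_returns(agent_rets, style_rets, style_names):
        """Regress agent returns on style product returns (value, growth, momentum,
        mean reversion) to see which styles best explain the agent's behavior."""
        X = np.column_stack([np.ones(len(agent_rets)), style_rets])  # intercept = unexplained alpha
        coef, *_ = np.linalg.lstsq(X, agent_rets, rcond=None)
        alpha, exposures = coef[0], coef[1:]
        return alpha, dict(zip(style_names, exposures))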
Contact Info
www.flink.ai | daniel@flink.ai