Trade and Manage Wealth with Deep RL and Memory
NVIDIA GTC 2018, March 26, 2018
Daniel Egloff, Founder, CEO, Head R&D
Problem
Retail investor customer demands
• Manage portfolio more actively
• Get additional return from smart investment decisions
• Lack of products: current digital platforms focus on passively managed products
• Market barriers: regulations limit access to hedge fund style active products
• Costs: existing actively managed products charge high fees
Question
Can modern AI technology replace a PM or trader?
Solution – AI Agents
Smart data-aware AI agents for active investment decisions
• AI powered alternative to smart beta ETFs
• AI supervised and automated trading strategies
• Save time: don't miss market opportunities, delegate work to smart agents
• Smart returns: extra return from smart data driven decisions
• Save costs: fully automated, without human interaction
Market Validation
• 90% of the robo-advisors today are ETF-based. ETFs alone have run out of steam to fuel the next growth of robo advisors.
• Robo advisors need more than 6 years to make a profit on a customer, post acquisition. Source: Burnmark report, April 2017.
• Intergenerational wealth transfer: 12tn of wealth transferring from the 1920/30 to the 1946/64 generation. Source: Burnmark report, April 2017.
• Investor behavior change, digital channels, reduced margins
• Growth of robo advisors, growth of the ETF market, growth of online brokers
• Growth of Smart Beta ETFs: 30% smart beta growth in 2017. Source: EY ETF report 2017.
AI Foundations
Several recent innovations in AI and Big Data:
• Deep Reinforcement Learning
• Differentiable Neural Computer
• Large scale data streaming infrastructures from eCommerce
Classical Portfolio Construction
• Information bottleneck
  • Signal based, not end-to-end
  • Partial and staged data usage
  • Many data sources cannot be integrated with current factor based portfolio construction
• Retrospective
  • Design, fit, deploy, re-engineer offline
  • Missing feedback link and on-line learning
• Difficult to account for nonlinear dependencies
Signal Based
Diagram: the market state S_t feeds a forecasting system θ_1 that produces signals; a separate trade rule system θ_2 turns the signals into weights and trades; P&L/utility V(θ_1, θ_2) is computed net of transaction costs, and the market evolves to the next state S_{t+1}. Supervised learning only minimizes the forecast error of θ_1, so there is an information bottleneck between signals and trading decisions.
Reinforcement Learning Based
• Full information in portfolio weights and trades
• Feedback loop to improve on good decisions and avoid unsuccessful decisions
• Allows for more realistic modeling of the intelligence of a successful PM or trader
• Much simpler process
Reinforcement Learning Based
Diagram: the market state S_t feeds a single trading system θ_1 that directly outputs weights and trades; P&L/utility V(θ_1), net of transaction costs and with delay, is fed back to the trading system by reinforcement learning, and the market evolves to the next state S_{t+1}.
AI Trading Agents
PaaS to design, train, test, deploy and run agents.
Platform screenshot: an example agent configuration with objective: excess return; risk: medium; strategy & style; frequency: 1h; universe: AAPL, AMZN, BAC, GOOG, WMT, JMP; data sources ("I learn from"): price, VIX, news, LOB; performance ("My performance"): live and historic stats; brokers & exchanges; training data scenarios: bull, bear, crash; initial training and online learning on batch, online, and streaming architectures.
Challenges and Insights
Reinforcement Learning Setup
Learning a behavioral strategy which maximizes the long term sum of rewards by direct interaction with an unknown and uncertain environment.
While not terminal do:
  Agent perceives state s_t
  Agent performs action a_t
  Agent receives reward r_t
  Environment evolves to state s_{t+1}
Diagram: agent and environment exchange state, action and reward in a loop.
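A minimal Python sketch of the loop above; the env and agent objects and their methods are hypothetical placeholders, not an API from the talk.

    # Minimal agent-environment interaction loop, mirroring the pseudocode above.
    # `env` and `agent` are hypothetical placeholders, not the actual platform API.
    def run_episode(env, agent):
        total_reward = 0.0
        s_t = env.reset()                      # agent perceives initial state s_t
        done = False
        while not done:                        # while not terminal
            a_t = agent.act(s_t)               # agent performs action a_t
            s_next, r_t, done = env.step(a_t)  # environment evolves to s_{t+1}, returns reward r_t
            agent.observe(s_t, a_t, r_t, s_next, done)  # agent receives reward and learns
            total_reward += r_t
            s_t = s_next
        return total_reward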
RL - State
Environment state
• What is the market state? Which data is required?
  • Price data, top of the book
  • LOB L1, L2
  • LOB messages
  • Secondary and non standard data
• Event data versus time clocked data
• How to combine agent state and market state into the environment state?
RL - Policy
Agent policy specification
• What is the agent action?
  • Continuous action for percentages of wealth
  • Discrete units of lots to buy/sell
  • Order implementation using market/limit orders
• Long only vs. long/short?
  • Long-only agents do not face bankruptcy
  • Short positions can lead to bankruptcy
RL - Policy
Distributions on the simplex
• Commonly known distributions (Dirichlet, …) are not appropriate
• Exploit the less known Hilbert space structure on the (open) simplex, leading to an isometry to Euclidean space (Aitchison)
• Pull back the normal distribution, Student-t, etc. through the isometry
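A minimal sketch of the Aitchison idea, assuming the isometric log-ratio (ilr) transform as the isometry; the basis construction and the Gaussian example are illustrative, not necessarily the exact parameterization used in the talk.

    import numpy as np

    def ilr_basis(d):
        """Orthonormal basis of the hyperplane {v in R^d : sum(v) = 0}, shape (d, d-1)."""
        H = np.zeros((d, d - 1))
        for j in range(d - 1):
            H[: j + 1, j] = 1.0
            H[j + 1, j] = -(j + 1)
            H[:, j] /= np.linalg.norm(H[:, j])
        return H

    def ilr(x, V):
        """Isometric log-ratio transform: open simplex -> R^{d-1} (Aitchison geometry)."""
        clr = np.log(x) - np.mean(np.log(x))
        return V.T @ clr

    def ilr_inv(z, V):
        """Inverse ilr: R^{d-1} -> open simplex (used to pull back Euclidean distributions)."""
        clr = V @ z
        w = np.exp(clr - clr.max())   # numerically stable softmax
        return w / w.sum()

    # Pull back a normal distribution onto the simplex: sample in R^{d-1}, map to weights.
    d = 5
    V = ilr_basis(d)
    z = np.random.default_rng(0).normal(size=d - 1)   # e.g. output of a Gaussian policy head
    weights = ilr_inv(z, V)                           # portfolio weights, > 0 and summing to 1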
RL - Interaction
Interaction of agent and environment
• Market evolution, LOB resilience
• Temporary and permanent market impact
• Position change
• Order cancellations
• Partial fills and impact (a toy sketch follows below)
Diagram: the policy turns the state into target trades; market liquidity determines the executed/filled trades and impacts market prices; agent positions and market prices evolve over time into the new state.
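A toy sketch of how fills and impact could enter such an environment; the linear impact model and the parameters eta/gamma are assumptions for illustration only, not a calibrated model from the talk.

    import numpy as np

    def execute(target_trade, mid, liquidity, eta=0.1, gamma=0.05):
        """Toy fill model: partial fills capped by available liquidity, with
        temporary impact on the execution price and permanent impact on the mid."""
        filled = np.clip(target_trade, -liquidity, liquidity)   # partial fill
        exec_price = mid * (1.0 + eta * filled / liquidity)     # temporary impact on fill price
        new_mid = mid * (1.0 + gamma * filled / liquidity)      # permanent impact on the mid
        return filled, exec_price, new_mid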
Our 6 Main Challenges
• Sparse trading: learn to use cash and wait for opportunities
• Robustness of RL
• Scaling up RL training
• Handling high resolution event time series data
• Adapting agents to changing markets while not forgetting
• Explaining agent decisions and behavior
Sparse Trading
• Reward modelling, including realistic transaction cost modelling
• Adding risk to give cash a value
• Properly balance risk and reward (a sketch of such a reward follows below)
• Combining tree search and RL, or an options framework, to learn to postpone trading
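One way to make these bullets concrete is a per-step reward of P&L net of transaction costs minus a risk penalty, so that holding cash keeps value; the form and the parameters below are illustrative assumptions, not the reward used in the talk.

    import numpy as np

    def step_reward(w_prev, w_new, returns, cost_rate=5e-4, risk_aversion=0.1):
        """Illustrative per-step reward: portfolio P&L net of turnover-proportional
        transaction costs, minus a crude variance-style risk penalty."""
        pnl = float(np.dot(w_new, returns))                        # portfolio return over the step
        costs = cost_rate * float(np.abs(w_new - w_prev).sum())    # proportional transaction costs
        risk = risk_aversion * pnl ** 2                            # simple risk penalty
        return pnl - costs - risk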
Robustness
• Reward modelling
• Very long history
• Looking at different scales of time series
• Training on synthesized data, e.g. reconstructing prices from skewed sampling of the empirical return distribution (see the sketch below)
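A minimal sketch of the last bullet: resample log returns from the empirical distribution with an over-weighted left tail and rebuild a synthetic price path; the particular skewing scheme is an assumption for illustration.

    import numpy as np

    def synthetic_prices(prices, n_steps, skew_tail=2.0, seed=0):
        """Resample empirical log returns (over-weighting the worst returns) and
        reconstruct a synthetic price path for robustness training."""
        rng = np.random.default_rng(seed)
        rets = np.diff(np.log(prices))
        ranks = rets.argsort().argsort()                       # 0 = worst return
        probs = 1.0 + skew_tail * (1.0 - ranks / (len(rets) - 1))
        probs /= probs.sum()                                   # skewed sampling probabilities
        sampled = rng.choice(rets, size=n_steps, p=probs, replace=True)
        return prices[0] * np.exp(np.cumsum(sampled))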
Scaling up RL
• Breaking long episodes into partial episodes with differentiable memory (see the sketch below)
Diagram: within a partial roll-out, a DNC reads partial states s from the environment and estimates the initial state of the roll-out; the policy maps estimated states to actions a, and the environment reacts.
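A sketch of the partial roll-out idea under simplifying assumptions: split a long episode into chunks and let a recurrent memory module (a stand-in for the DNC) supply the estimated initial state of each chunk; all interfaces (env, policy, memory, learner) are hypothetical placeholders.

    def train_with_partial_rollouts(env, policy, memory, learner, chunk_len=128):
        """Run one long episode as a sequence of partial roll-outs. The memory module
        summarizes the history across chunk boundaries, so learning updates can be
        applied per chunk instead of over the whole episode."""
        obs, done = env.reset(), False
        mem = memory.initial_state()
        while not done:
            rollout = []
            for _ in range(chunk_len):
                mem = memory.write(mem, obs)           # fold the observation into memory
                state = memory.read(mem)               # estimated environment state
                action = policy.act(state)
                obs, reward, done = env.step(action)
                rollout.append((state, action, reward))
                if done:
                    break
            learner.update(rollout)                    # e.g. one policy-gradient step per partial roll-out
            mem = memory.detach(mem)                   # truncate gradients at the chunk boundary
        return policy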
High Resolution Event TS
• New hybrid RNN-CNN network topology (a sketch follows below)
• Properly apply convolution over time and cross section
  • Cross section should be permutation invariant!
• Convolution at different time frequencies
• Residual NN
• OHLC bars are too simplistic
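A minimal PyTorch sketch of one way to combine temporal convolution with a permutation-invariant cross section: temporal filters are shared across assets and the cross section is mean-pooled before a GRU; layer sizes and the pooling choice are assumptions, not the topology from the talk.

    import torch
    import torch.nn as nn

    class CrossSectionTemporalNet(nn.Module):
        """Hybrid CNN/RNN sketch: shared temporal convolution per asset (permutation
        equivariant), mean-pooling over assets (permutation invariant), GRU over time."""
        def __init__(self, n_features, hidden=32):
            super().__init__()
            self.temporal = nn.Conv1d(n_features, hidden, kernel_size=5, padding=2)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):
            # x: (batch, assets, time, features)
            b, a, t, f = x.shape
            x = x.reshape(b * a, t, f).transpose(1, 2)   # (batch*assets, features, time)
            h = torch.relu(self.temporal(x))             # shared temporal convolution per asset
            h = h.reshape(b, a, -1, t).mean(dim=1)       # mean over assets: permutation invariant
            h = h.transpose(1, 2)                        # (batch, time, hidden)
            out, _ = self.rnn(h)
            return self.head(out[:, -1])                 # one output per batch element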
Adapting while not Forgetting
• New attention mechanism relative to a prior attention, with a penalty
• Prior attention reflects the "agent style"
Diagram: prioritization of new data; important times marked in the prior are highlighted both in the history and in the new data.
Explaining Agent Decisions
• Learning a supervised model to explain agent returns (a sketch follows below)
• Compare to different ETFs and investment products following a specific investment style
  • Value
  • Growth
  • Momentum
  • Mean reversion
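A minimal sketch of returns-based style analysis as one such supervised model: regress the agent's returns on the returns of style products and read the fitted exposures; the unconstrained least-squares form is an assumption for illustration.

    import numpy as np

    def explain_agent_returns(agent_rets, style_rets, style_names):
        """Regress agent returns on style product returns (value, growth, momentum,
        mean reversion) to see which styles best explain the agent's behavior."""
        X = np.column_stack([np.ones(len(agent_rets)), style_rets])  # intercept = unexplained alpha
        coef, *_ = np.linalg.lstsq(X, agent_rets, rcond=None)
        alpha, exposures = coef[0], coef[1:]
        return alpha, dict(zip(style_names, exposures))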
Contact Info
www.flink.ai | daniel@flink.ai