Pivotal Confidential–Internal Use Only
Implementing a highly scalable stock prediction system with Apache Geode (incubating), Spring XD and Spark MLlib
Fred Melo, William Markito
@fredmelo_br, @william_markito
About us
Fred Melo: Technical Director for Data, fmelo@pivotal.io, @fredmelo_br
William Markito: Enterprise Architect for GemFire, wmarkito@pivotal.io, @william_markito
A Simple Example
Data sources → look for patterns → forecast
Applicability: "Smart System"
What do we want to build?
A Smart System that:
• Evaluates LIVE DATA (real-time trading data): "According to historical trends, there's an 80% chance this stock's price might go down within the next few minutes."
• Learns from HISTORICAL TRENDS: "How were the technical indicator readings when the latest price drops happened?"
Live data becomes historical over time.
The Machine Learning Pipeline: data flow
Data temperature, from Hot (live data in Apache Geode / GemFire, ingested through Spring XD) to Cold (historical dataset in Apache HAWQ, fed through Spring XD)
1 - Live data is ingested into the grid.
2 - The trained ML model compares new data to historical patterns.
3 - Results are pushed immediately to deployed applications.
4 - "Hot" data ages, becoming part of the historical dataset.
5 - Re-training is triggered and the ML model is updated.
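Steps 1, 4 and 5 can be sketched as a single-process toy in Python. It assumes a count-based aging policy (the oldest records leave the hot tier once it exceeds a fixed size) and a re-training trigger after a fixed number of newly aged records; `TemperaturePipeline`, `hot_capacity` and `retrain_every` are hypothetical names, and nothing here uses the Geode, Spring XD or HAWQ APIs.

```python
from collections import deque


class TemperaturePipeline:
    """Hypothetical sketch of hot data aging into a historical store and triggering re-training."""

    def __init__(self, hot_capacity, retrain_every):
        self.hot = deque()        # stand-in for the in-memory grid holding live ("hot") data
        self.historical = []      # stand-in for the historical ("cold") store
        self.hot_capacity = hot_capacity
        self.retrain_every = retrain_every
        self.model_version = 0
        self._aged_since_training = 0

    def ingest(self, record):
        # Step 1: live data is ingested into the hot tier.
        self.hot.append(record)
        # Step 4: when the hot tier is over capacity, the oldest records age into history.
        while len(self.hot) > self.hot_capacity:
            self.historical.append(self.hot.popleft())
            self._aged_since_training += 1
        # Step 5: enough newly aged data triggers re-training.
        if self._aged_since_training >= self.retrain_every:
            self.retrain()

    def retrain(self):
        # Placeholder: a real system would fit a new model on self.historical here.
        self.model_version += 1
        self._aged_since_training = 0


pipeline = TemperaturePipeline(hot_capacity=3, retrain_every=2)
for price in [10, 11, 12, 13, 14, 15]:
    pipeline.ingest(price)
print(list(pipeline.hot), pipeline.historical, pipeline.model_version)
# [13, 14, 15] [10, 11, 12] 1
```

A time-based policy (age out records older than N minutes) would fit the "Hot"/"Cold" idea just as well; a count-based one simply keeps the example deterministic.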
The Machine Learning Pipeline: data flow, Simplified Model
Data temperature, from Hot (live data in Apache Geode / GemFire, ingested through Spring XD) to Warm
1 - Live data is ingested into the grid.
2 - The trained ML model compares new data to historical patterns.
3 - Results are pushed immediately to deployed applications.
5 - Re-training is triggered and the ML model is updated.
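Architecture overview (Spring XD streams writing to Apache Geode regions)
1 - Real data or Simulator: Split, Transform, Sink into /Stocks
2 - Enrich with technical Indicators, Filter, store in /TechIndicators
3 - Predict with Machine Learning, store in /Predictions, shown on the Dashboard
Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native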
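The Enrich and Filter steps that feed /TechIndicators can be sketched as follows. The choice of indicators is an assumption: an exponential moving average (alpha = 2 / (period + 1), seeded with the first price) and an RSI computed from simple averages of gains and losses (Wilder's smoothed RSI is the more common variant). The window length, field names and `enrich_and_filter` are hypothetical.

```python
def ema(prices, period):
    """Exponential moving average with alpha = 2 / (period + 1), seeded with the first price."""
    alpha = 2.0 / (period + 1)
    out = [float(prices[0])]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def rsi(prices, period):
    """RSI from simple averages of the last `period` gains and losses; None until enough history."""
    out = [None] * len(prices)
    for i in range(period, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - period + 1, i + 1)]
        avg_gain = sum(c for c in changes if c > 0) / period
        avg_loss = sum(-c for c in changes if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0            # no losses in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


def enrich_and_filter(stocks, period=3):
    """Hypothetical Enrich + Filter step: attach indicators, drop records lacking history."""
    prices = [s["price"] for s in stocks]
    for record, avg, strength in zip(stocks, ema(prices, period), rsi(prices, period)):
        if strength is not None:      # Filter: skip records without a full RSI window
            yield {**record, "ema": avg, "rsi": strength}


stocks = [{"symbol": "AAPL", "price": p} for p in [10, 11, 12, 11, 13]]
for row in enrich_and_filter(stocks):
    print(row)
```

Dropping records that lack a full indicator window is one plausible job for the Filter step; it keeps half-computed indicators out of the data the model later trains on.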
Too complex? Let's take it in small bites…
GemFire and Spring XD
Apache Geode Concepts
• Cache
  • Configurable through XML or Java
• Region (this system uses /Stocks, /TechIndicators and /Predictions)
  • A distributed java.util.Map on steroids
  • Highly available, redundant
• Member
  • Locator, Server, Client
• Callbacks
  • Listener, Writer, AsyncEventListener (Parallel/Serial)
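To illustrate the Region and Callbacks ideas, here is a toy, single-process analogy: a key/value map where writer callbacks run before a change (and can veto it by raising) and listener callbacks run after it. This mirrors the roles of Geode's CacheWriter and CacheListener only conceptually; the `Region` class and its methods are hypothetical and are not the Geode API, and asynchronous (AsyncEventListener-style) delivery is not modeled.

```python
class Region:
    """Toy single-process analogy of a Geode region: a key/value map with callbacks.
    Not the Geode API; real regions are distributed and replicated across members."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self._writers = []    # run before a change and may raise to veto it (CacheWriter-like)
        self._listeners = []  # run after a change has been applied (CacheListener-like)

    def add_writer(self, callback):
        self._writers.append(callback)

    def add_listener(self, callback):
        self._listeners.append(callback)

    def put(self, key, value):
        op = "update" if key in self._data else "create"
        old = self._data.get(key)
        for writer in self._writers:
            writer(op, key, old, value)      # an exception here aborts the put
        self._data[key] = value
        for listener in self._listeners:
            listener(op, key, old, value)

    def get(self, key):
        return self._data.get(key)


def reject_negative_prices(op, key, old, new):
    if new < 0:
        raise ValueError(f"negative price for {key}")


events = []
stocks = Region("/Stocks")
stocks.add_writer(reject_negative_prices)
stocks.add_listener(lambda op, key, old, new: events.append((op, key, new)))
stocks.put("AAPL", 100.0)
stocks.put("AAPL", 101.5)
try:
    stocks.put("AAPL", -1.0)
except ValueError:
    pass
print(events, stocks.get("AAPL"))
# [('create', 'AAPL', 100.0), ('update', 'AAPL', 101.5)] 101.5
```

The vetoed put never reaches the map or the listeners, which is why the last value stays 101.5.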
Apache Geode HA and Fault Tolerance
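The core idea behind surviving a member failure is keeping redundant copies of each entry on different members. The sketch below is a deliberate simplification and not Geode's actual bucket placement: keys hash to buckets, each bucket gets a primary plus `redundant_copies` secondaries on distinct members, and reads fall back to any surviving copy. `PartitionedStore` and its methods are hypothetical, and the sketch does not restore lost copies on the remaining members, which a real cluster would also do.

```python
import zlib


class PartitionedStore:
    """Toy model of partitioned data with redundant copies. A simplification of the idea
    behind Geode partitioned regions, not their actual placement algorithm."""

    def __init__(self, members, num_buckets=8, redundant_copies=1):
        if redundant_copies + 1 > len(members):
            raise ValueError("need at least redundant_copies + 1 members")
        self.members = {name: {} for name in members}  # member name -> locally held entries
        self.alive = set(members)
        self.num_buckets = num_buckets
        self.redundant_copies = redundant_copies

    def _owners(self, key):
        # Hash the key to a bucket, then place the primary and each redundant copy
        # on consecutive (hence distinct) members.
        bucket = zlib.crc32(key.encode()) % self.num_buckets
        names = sorted(self.members)
        return [names[(bucket + i) % len(names)] for i in range(self.redundant_copies + 1)]

    def put(self, key, value):
        for name in self._owners(key):
            if name in self.alive:
                self.members[name][key] = value

    def get(self, key):
        # Serve the read from the first surviving copy.
        for name in self._owners(key):
            if name in self.alive and key in self.members[name]:
                return self.members[name][key]
        raise KeyError(key)

    def fail(self, name):
        self.alive.discard(name)


store = PartitionedStore(["server1", "server2", "server3"], redundant_copies=1)
for i, symbol in enumerate(["AAPL", "GOOG", "MSFT", "TSLA"]):
    store.put(symbol, 100.0 + i)
store.fail("server2")
print([store.get(s) for s in ["AAPL", "GOOG", "MSFT", "TSLA"]])
# [100.0, 101.0, 102.0, 103.0]
```

With one redundant copy, any single member can fail without losing data; losing two members at once could make some keys unavailable.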
Spring XD Concepts
• Streams (Pipelines)
• Sources
• Filters
• Sinks
• Taps
Pipelines in this system: 1 - Split, Transform, Sink; 2 - Enrich, Filter; 3 - Predict
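The stream concepts can be illustrated with a toy in-process analogy: a source yields messages, processors transform them (a filter returns `None` to drop one), a sink consumes what survives, and a tap observes messages at some stage without altering the flow. This is not Spring XD's API or DSL; the `Stream` class is hypothetical, and tapping the source output by default is a choice made for this sketch.

```python
class Stream:
    """Toy analogy of a Spring XD stream: source | processor | ... | sink, plus taps.
    Not Spring XD's API; modules here are plain Python callables."""

    def __init__(self, source, processors, sink):
        self.source = source          # callable returning an iterable of messages
        self.processors = processors  # callables: message -> new message, or None to drop it
        self.sink = sink              # callable that consumes each surviving message
        self.taps = {}                # stage index -> tap callbacks

    def tap(self, callback, after_stage=0):
        # Stage 0 is the source output; stage i is the output of the i-th processor.
        self.taps.setdefault(after_stage, []).append(callback)

    def _notify_taps(self, stage, message):
        for callback in self.taps.get(stage, []):
            callback(message)         # taps only observe; their return value is ignored

    def run(self):
        for message in self.source():
            self._notify_taps(0, message)
            for stage, process in enumerate(self.processors, start=1):
                message = process(message)
                if message is None:   # a filter dropped this message
                    break
                self._notify_taps(stage, message)
            else:
                self.sink(message)    # only reached when no processor dropped the message


def parse(line):
    symbol, price = line.split(",")
    return {"symbol": symbol, "price": float(price)}


def positive_price(quote):
    return quote if quote["price"] > 0 else None


stocks_region = []    # stand-in for the /Stocks sink
raw_lines = []        # filled by a tap on the source output
stream = Stream(lambda: ["AAPL,100.5", "GOOG,-1", "MSFT,50.25"],
                [parse, positive_price], stocks_region.append)
stream.tap(raw_lines.append)
stream.run()
print(len(raw_lines), stocks_region)
# 3 [{'symbol': 'AAPL', 'price': 100.5}, {'symbol': 'MSFT', 'price': 50.25}]
```

The tap sees all three raw lines, including the one the filter later drops, which is what makes taps useful for feeding a second pipeline (such as the Enrich step) from an existing one.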
Machine Learning Model (e.g. Linear Regression)
• Label: moving average (x+1)
• Features: price (x), moving average (x), relative strength (x)
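To make the label/feature split concrete, here is a minimal linear regression that predicts the label (the average at x+1) from the three features (price, average and relative strength at x). It swaps in ordinary least squares solved through the normal equations in pure Python rather than Spark MLlib, whose linear regression has its own API and training procedure; the feature rows and "true" weights below are synthetic, made up only to show that the fit recovers them.

```python
def fit_linear_regression(features, labels):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved with Gauss-Jordan elimination. w[0] is the intercept."""
    rows = [[1.0] + [float(x) for x in f] for f in features]  # prepend the intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, labels))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))  # partial pivoting
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                for c in range(col, n + 1):
                    a[k][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, feature_row):
    """Apply the fitted weights: intercept plus weighted sum of the features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Synthetic example: (price(x), average(x), relative strength(x)) -> average(x+1).
true_weights = [0.5, 0.2, 0.7, 0.01]  # intercept, price, average, relative strength
features = [(10, 9.5, 40), (11, 10.2, 55), (12, 11.0, 60),
            (11.5, 11.3, 48), (13, 12.1, 70), (12.5, 12.4, 52)]
labels = [predict(true_weights, f) for f in features]
weights = fit_linear_regression(features, labels)
print(weights)
print(predict(weights, (12, 11.5, 50)))
```

Because the synthetic labels are an exact linear function of the features, the fitted weights come out essentially equal to `true_weights`; with real market data the fit would only approximate the relationship, and re-training (step 5 of the pipeline) refreshes the weights as new history accumulates.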
Demo Time
Source code and detailed instructions are available at: https://github.com/Pivotal-Open-Source-Hub/StockInference-Spark
Follow us on Twitter! Fred Melo @fredmelo_br, William Markito @william_markito