A Stock Prediction System using open-source software Fred Melo William Markito fmelo@pivotal.io wmarkito@pivotal.io @fredmelo_br @william_markito
It's all about DATA Prediction Data Sources Look for patterns
Machine Learning is the answer Clustering Genetic Algorithms Neural Networks
Applying Machine Learning Train with historical dataset Apply model to the new input
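A minimal sketch of the train-then-apply loop on this slide. A least-squares line stands in for the neural network used in the demo, and the price series, `train` and `predict` are made-up names for illustration only.

```python
# Sketch only: a one-variable least-squares fit stands in for the real model.

def train(history):
    """Fit next_value = slope * current_value + intercept on historical data."""
    xs = history[:-1]          # value at step x
    ys = history[1:]           # value at step x + 1
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, current_value):
    """Apply the trained model to a new input."""
    slope, intercept = model
    return slope * current_value + intercept


historical_prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical data
model = train(historical_prices)                     # train with historical dataset
next_price = predict(model, 15.0)                    # apply model to the new input
```

Training and scoring are kept as separate functions so the model can be retrained offline while scoring runs on live input.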
Why so hard? Hard to scale Hard to make it real-time Hard to add new data sources Why?
Traditional models are reactive and static Store Analytics Data Lake HDFS No real-time information Hard to change ETL based Labor intensive Data-source specific Inefficient
Stream-based, real-time closed-loop analytics are needed In-Memory Data Stream Pipeline Real-Time Data Expert System / Data Lake HDFS Machine Learning Continuous Learning Multiple Data Sources Continuous Improvement Real-Time Processing Continuous Adapting Store Everything
How can it be addressed? Info Look at past trends Neural Network (for similar input) Evaluate current input Analysis Score / Predict
How can it be addressed? Info Filter Neural Network Analysis [ json ]
How can it be addressed? Info Filter Enrich Neural Network Analysis
How can it be addressed? Info Filter Enrich Transform Neural Network Analysis
How can it be addressed? Info Neural Network Filter Enrich Transform Analysis
How can it be addressed? Info Neural Network Filter Enrich Transform Analysis Transform
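The Filter, Enrich and Transform stages above can be sketched as chained Python generators. This is not the SpringXD implementation from the demo. The message fields (`symbol`, `price`), the moving-average enrichment and the stage names are assumptions made for illustration.

```python
import json


def filter_stage(messages):
    """Drop messages that are not JSON objects or that lack a price."""
    for raw in messages:
        try:
            tick = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(tick, dict) and "price" in tick:
            yield tick


def enrich_stage(ticks, window=3):
    """Attach a moving average over the last `window` prices seen."""
    recent = []
    for tick in ticks:
        recent.append(tick["price"])
        recent = recent[-window:]
        yield {**tick, "avg": sum(recent) / len(recent)}


def transform_stage(ticks):
    """Turn each enriched tick into the numeric vector a model would consume."""
    for tick in ticks:
        yield [tick["price"], tick["avg"]]


raw_messages = [                      # hypothetical input stream
    '{"symbol": "XYZ", "price": 10.0}',
    'not json',
    '{"symbol": "XYZ"}',
    '{"symbol": "XYZ", "price": 12.0}',
]
vectors = list(transform_stage(enrich_stage(filter_stage(raw_messages))))
```

Each stage consumes the previous stage's output one message at a time, so nothing is batched or stored between steps.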
How can it be addressed? Neural Network Real-time In-Memory Data Grid scoring Train
How can it be addressed? Neural Network In-Memory Data Grid Update Push Front-end
Streaming real-time analytics architecture Other Sources and Destinations Distributed Computing JMS Fast Data Ingest Transform Sink SpringXD Store / Analyze Predict / Machine Learning
Demo Architecture Fast Data HTTP Sink Transform Split Filter Predict Sink HTTP SpringXD Push Machine Learning Extensible Open-Source Fault-Tolerant Horizontally Scalable Dashboard
splitter http-server Simulator geode-json Transformer client splitter http-client tap geode-json obj-to-json client shell - R SpringXD
Data Stream Pipelining SpringXD INGEST / SINK, PROCESS, ANALYZE
• Little or no coding required
• Dozens of built-in connectors
• Seamless integration with Kafka, Sqoop
• Create new connectors easily using Spring
• Call Spark, Reactor or RxJava easily
• Built-in configurable filtering, splitting and transformation
• Out-of-box configurable jobs for batch processing
• Import and invoke PMML jobs
• Call Python, R, Madlib and other tools
• Built-in configurable counters and gauges
Scale-Out and HA Architecture SpringXD: XD admin, Stream Deployment of Ingest, Split, Filter, Transform and Sink across XD Nodes, Messaging
Geode client-server architecture
Partitioned Regions
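A minimal sketch of the idea behind a partitioned region: keys hash to a fixed set of buckets, and buckets are spread across servers. This is not Geode's actual hashing or rebalancing logic. The CRC32 hash, the bucket count of 113, the round-robin assignment and the server names are all illustrative assumptions.

```python
import zlib


def bucket_for(key, total_buckets=113):
    """Map a key to a bucket with a deterministic hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % total_buckets


def assign_buckets(members, total_buckets=113):
    """Spread buckets across members round-robin (illustrative policy only)."""
    return {b: members[b % len(members)] for b in range(total_buckets)}


members = ["server-1", "server-2", "server-3"]   # hypothetical member names
bucket_owner = assign_buckets(members)


def member_for(key):
    """Find the member that hosts the bucket for this key."""
    return bucket_owner[bucket_for(key)]


owner = member_for("XYZ")
```

Because the key-to-bucket mapping is fixed, adding a server only means moving whole buckets, not rehashing every key.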
Event handling
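A minimal sketch of event handling on a data region: listeners are registered once and called on every create or update, which is how a new prediction could be pushed to a front-end. The `Region` class and its methods here are hypothetical Python stand-ins, not the Geode API.

```python
class Region:
    """Toy key-value region that notifies listeners on every put."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def add_listener(self, listener):
        """Register a callable taking (event, key, old_value, new_value)."""
        self._listeners.append(listener)

    def put(self, key, value):
        event = "update" if key in self._data else "create"
        old_value = self._data.get(key)
        self._data[key] = value
        for listener in self._listeners:
            listener(event, key, old_value, value)


pushed = []                                   # stands in for a push to the dashboard
stocks = Region()
stocks.add_listener(lambda event, key, old, new: pushed.append((event, key, old, new)))

stocks.put("XYZ", 10.0)                       # fires a "create" event
stocks.put("XYZ", 10.5)                       # fires an "update" event
```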
Neural Networks
Neural Network: price(x), medium avg(x), relative strength(x), medium avg(x+1)
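A minimal sketch of how the quantities named on this slide could be computed from a price series. The slide does not define "relative strength". The simple-average Relative Strength Index used here is an assumption, as are the window size and the sample prices.

```python
def moving_average(prices, window):
    """Average of the last `window` prices."""
    return sum(prices[-window:]) / window


def relative_strength_index(prices, window):
    """RSI with simple averages over the last `window` price changes.

    Assumption: the slide's "relative strength" is read as RSI.
    Requires at least window + 1 prices.
    """
    changes = [b - a for a, b in zip(prices[-window - 1:-1], prices[-window:])]
    avg_gain = sum(c for c in changes if c > 0) / window
    avg_loss = sum(-c for c in changes if c < 0) / window
    if avg_loss == 0:
        return 100.0                      # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


prices = [10.0, 11.0, 10.0, 12.0, 11.0]   # hypothetical series
window = 4
features = [
    prices[-1],                               # price(x)
    moving_average(prices, window),           # avg(x)
    relative_strength_index(prices, window),  # relative strength(x)
]
```

A vector like `features` is what a network would be fed at step x. The next step's average, avg(x+1), would be the value to predict.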
Neural Network
Demo Time
SpringXD http://projectgeode.org http://projects.spring.io/spring-xd http://www.r-project.org
The demo code is on GitHub! @fredmelo_br @william_markito Follow-up: In-Memory Unconference "A place for all things in-memory: projects, people, ideas, roadmaps, discussions." Location: Hill Country A/B, Weds 4:15pm - 6pm (after this talk)