VISUALISING REAL TIME TRAFFIC DATA USING ELASTICSEARCH AND C3JS @jettroCoenradie, Trifork Amsterdam. Case Study: ANWB (Royal Dutch Automobile Association)
FACT SHEET Jettro Coenradie • Software engineer @ Trifork, specialised in search • Twitter: @jettroCoenradie, @gridshore • GitHub: https://github.com/jettro • LinkedIn: https://www.linkedin.com/in/jettro • Blogs: http://www.gridshore.nl, http://blog.trifork.com/author/jettro/
GOAL Ideas for combining (open) data Evaluate options and performance
WHAT IS ANWB? • Dutch Automobile Driver Assistance • Sister organisation of: FDM (Denmark), ADAC (Germany), AA (England)
Founded in 1883 as Algemene Nederlandse Wieler Bond (General Dutch Bicycle Association)
WHAT IS ELASTICSEARCH • Distributed / Scalable search • Structured and full-text • Data analytics • Log analysis
(OPEN) DATA Real time traffic data Weather data Automobile Assistance data
GOAL FOR THE PROJECT • Number of cars on the roads • Wrong data • Traffic intensity on the roads
FLOW OF THE PROJECT • Get to know the data: Logstash / Kibana • Start improving data quality • Present data using our own charts
TECHNICAL OVERVIEW • Data view: Tomcat, Spring MVC, c3js • Data integration: Spring Integration (XML / CSV) • Data store: Elasticsearch
DEMO
[Diagram: Index A, Index B, Index C. An index is split into shards (Shard 1, Shard 2), each with a replica (Shard R 1, Shard R 2). Every shard is a Lucene index.]
TIME BASED INDICES Strings Numbers NDW Dates Geo points
TIME BASED INDICES Alias NDW-2014-09-15 NDW NDW-2014-09-16 NDW-2014-09-17 mapping-template
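The daily indices and the shared alias from this slide can be sketched as follows. This is a minimal illustration, not the project's code: the helper names are hypothetical, the index prefix is lowercased because Elasticsearch requires lowercase index names, and the resulting dictionary is only the request body one would send to the `_aliases` endpoint.

```python
from datetime import date, timedelta

ALIAS = "ndw"  # lowercase: Elasticsearch rejects upper-case index names


def daily_index_name(day, prefix=ALIAS):
    """Return the name of the index holding one day of data, e.g. ndw-2014-09-15."""
    return f"{prefix}-{day.isoformat()}"


def alias_actions(days, alias=ALIAS):
    """Build a body for POST /_aliases that points one alias at several daily indices."""
    return {
        "actions": [
            {"add": {"index": daily_index_name(day), "alias": alias}}
            for day in days
        ]
    }


start = date(2014, 9, 15)
days = [start + timedelta(days=offset) for offset in range(3)]
body = alias_actions(days)
for action in body["actions"]:
    print(action)
```

Searching through the alias covers all three days at once, and an old day can be dropped by deleting its index.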
SCHEMA-LESS Dynamic schema • There is always a schema • The schema can be dynamic • Often you want to be specific: dates / numbers / geo locations
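Being specific about dates, numbers and geo locations means supplying an explicit mapping. The sketch below shows what such a mapping could look like. The field names are assumptions made for illustration, and the typeless `mappings` layout is the one used by Elasticsearch 7 and later; older versions nest the properties under a type name.

```python
import json

# Hypothetical field names; only the field types come from the slides.
mapping = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},      # measurement time
            "speed_kmh": {"type": "float"},     # a numeric measurement
            "road": {"type": "keyword"},        # exact-match string, not analysed
            "location": {"type": "geo_point"},  # latitude / longitude
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Putting the same mapping in an index template applies it automatically to every new daily index.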
SEARCH Full text search Versus Structured search
STRUCTURED SEARCH Filters • Can be cached most of the time • No scoring • Fast
FILTERS WE USED • Range filters • Term filters • Composite (bool) filters
Date Range Filter Range Filter Term Filter
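A combination of a date range filter, a numeric range filter and a term filter might be written as below. This is a sketch under assumptions: the field names are hypothetical, and the body uses the `bool` query with a `filter` clause from current Elasticsearch versions, whereas the 1.x releases wrapped filters in a `filtered` query.

```python
def traffic_filter(road, start, end, min_speed):
    """Build a search body that only filters (no scoring) on time, speed and road."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": start, "lt": end}}},  # date range
                    {"range": {"speed_kmh": {"gte": min_speed}}},         # numeric range
                    {"term": {"road": road}},                             # exact term
                ]
            }
        }
    }


body = traffic_filter("A2", "2014-09-15T07:00:00", "2014-09-15T09:00:00", 50)
print(body)
```

All three clauses must match, and none of them contributes to a relevance score.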
AGGREGATIONS Two types of aggregations • Create buckets of data • Compute Metrics
[Diagram: a set of documents is split into buckets by a condition. Term condition: red, blue, green, yellow. Range condition: 0-10, 10-20, 20-30, 30-40.]
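The bucketing idea can be imitated in plain Python to show what happens to a set of documents. This runs locally and does not call Elasticsearch; the documents and field names are made up for the example. Range buckets include the lower bound and exclude the upper bound, as the Elasticsearch range aggregation does.

```python
from collections import defaultdict

# Made-up documents, for illustration only.
docs = [
    {"color": "red", "value": 4},
    {"color": "blue", "value": 12},
    {"color": "red", "value": 27},
    {"color": "green", "value": 35},
    {"color": "yellow", "value": 10},
]


def term_buckets(documents, field):
    """Group documents by the exact value of a field."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def range_buckets(documents, field, edges):
    """Group documents into half-open ranges [lo, hi) built from consecutive edges."""
    buckets = {(lo, hi): [] for lo, hi in zip(edges, edges[1:])}
    for doc in documents:
        for (lo, hi), members in buckets.items():
            if lo <= doc[field] < hi:
                members.append(doc)
                break
    return buckets


by_color = term_buckets(docs, "color")
by_range = range_buckets(docs, "value", [0, 10, 20, 30, 40])
print({color: len(members) for color, members in by_color.items()})
print({bounds: len(members) for bounds, members in by_range.items()})
```

A metric aggregation would then compute a number, such as an average, over the documents in each bucket.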
AGGREGATIONS WE USED • Date histogram aggregations • Terms aggregations • AVG aggregations
Date Histogram Aggregation + AVG metric Aggregation
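A date histogram with a nested average could be requested with a body like the one below. The field names and aggregation names are assumptions, and `fixed_interval` is the parameter name in Elasticsearch 7 and later; earlier versions use `interval`.

```python
def avg_per_interval(metric_field, interval="1h", time_field="timestamp"):
    """Build a body that buckets documents per time interval and averages one field per bucket."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": time_field, "fixed_interval": interval},
                "aggs": {
                    "avg_value": {"avg": {"field": metric_field}}
                },
            }
        },
    }


body = avg_per_interval("speed_kmh")
print(body)
```

Each bucket in the response then carries one timestamp and one average, which maps directly onto a time series chart.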
Terms Aggregation
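A terms aggregation creates one bucket per distinct value of a field. A minimal request body, assuming a hypothetical `road` field that is indexed as an exact (not analysed) string, might look like this:

```python
def top_terms(field, size=10):
    """Build a body that counts documents per distinct value of a field."""
    return {
        "size": 0,  # skip the hits, return only the buckets
        "aggs": {
            "by_term": {
                "terms": {"field": field, "size": size}
            }
        },
    }


body = top_terms("road", size=5)
print(body)
```

The `size` inside `terms` limits how many buckets come back, ordered by document count by default.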
GEO LOCATIONS Two types of locations • Using latitude and longitude • Using geohash (creating a grid)
GEO LAT/LON • Used for distance based queries • Used for distance based aggregations
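A distance based query on a latitude / longitude field can be sketched as a `geo_distance` filter. The field name `location` and the coordinates (roughly the centre of Amsterdam) are assumptions chosen for illustration.

```python
def within_distance(lat, lon, distance="10km", field="location"):
    """Build a body that keeps only documents within a distance of a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


body = within_distance(52.37, 4.90, distance="5km")
print(body)
```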
GEO HASH • Uses a hash to represent a square • More characters means more precision
GEOHASH http://www.bigdatamodeling.org/2013/01/intuitive-geohash.html
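To make the relation between hash length and precision concrete, here is a small sketch of the standard geohash encoding in plain Python. It is an illustration of the public algorithm, not code from the project: the encoder halves the longitude and latitude ranges in turn and emits one base-32 character per five bits.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude / longitude pair as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


short = geohash_encode(57.64911, 10.40744, precision=3)
longer = geohash_encode(57.64911, 10.40744, precision=8)
print(short, longer)
```

A shorter hash is always a prefix of a longer hash for the same point, so each extra character narrows the same cell down further.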
PERCOLATOR “The opposite of executing a query and finding results”
PERCOLATOR “Match an (existing) document against stored queries.”
PERCOLATOR Stored queries, one geo polygon filter per region: Noord-West, Noord-Oost, Zuid-West, Zuid. Document { location: [ 3.5123, 46.3412 ] } matches: Zuid-West
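The percolator idea, matching one document against stored queries, can be imitated locally. The sketch below does not use Elasticsearch: each stored "query" is a polygon, and the region outlines are invented rectangles chosen only so that the example point from the slide falls into one of them. Coordinates are treated as [longitude, latitude] pairs, which is an assumption.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting test: count how many polygon edges a ray to the east crosses."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical region outlines as (lon, lat) corners; not real boundaries.
stored_queries = {
    "Noord-West": [(3.0, 48.0), (5.0, 48.0), (5.0, 50.0), (3.0, 50.0)],
    "Noord-Oost": [(5.0, 48.0), (7.0, 48.0), (7.0, 50.0), (5.0, 50.0)],
    "Zuid-West": [(3.0, 46.0), (5.0, 46.0), (5.0, 48.0), (3.0, 48.0)],
    "Zuid": [(5.0, 46.0), (7.0, 46.0), (7.0, 48.0), (5.0, 48.0)],
}


def percolate(document, queries):
    """Return the names of all stored queries that match the document."""
    lon, lat = document["location"]
    return [name for name, polygon in queries.items() if point_in_polygon(lon, lat, polygon)]


matches = percolate({"location": [3.5123, 46.3412]}, stored_queries)
print(matches)
```

In Elasticsearch the polygons would be registered once as queries, and each incoming document would be percolated to find the region it belongs to.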
QUESTIONS @jettroCoenradie