OpenTSDB + Bigtable Integrating time series database with Google Cloud Bigtable Danil Zburivsky, Big Data Practice Lead - @zburivsky Christos Soulios, Big Data Architect - @c_soulios
Pythian specializes in design, implementation, and management of systems that directly contribute to revenue and business success. History • 19 years in business • Growing at 30+% per year • 400+ employees • 300+ customers worldwide • HQ Ottawa, Canada, with global reach • Technology agnostic = trusted advisor • Deep expertise: Oracle, Oracle Apps, MySQL, AWS, SQL Server, Cassandra/DataStax, Azure, PostgreSQL, Cloudera, MapR, Hortonworks etc. Google Premier Partner Status (as of end Aug) • 5 Certified Developers (soon to be 12) • Dedicated Google Technical Champion • Launch partner for: Kubernetes, Dataflow, Cloud SQL, Dataproc • Integrated OpenTSDB with Bigtable • DW Explorers Program Partner • Upcoming BigQuery & Cloud ML Launch Partner
Time series data • (time, metric, value) • OS and apps metrics • Industrial equipment • Web traffic
Storing time series data is a challenge • Volume can be explosive • Data arrival and access patterns are different
Better alternatives — specialized stores • NoSQL • Data model and storage optimized for time series • Separate query language
OpenTSDB • Open source • Uses HBase as a data store • Data model optimized for time series • REST API • Row key: <metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>] • Columns: <col_t+1>[...<col_t+N>]
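The row key layout on this slide can be sketched in a few lines. This is a minimal illustration, not OpenTSDB's actual code: it assumes 3-byte UIDs, a 4-byte base timestamp rounded down to the hour, and tag pairs ordered by tag key UID. The function names `row_key` and `column_offset` are hypothetical.

```python
import struct

UID_WIDTH = 3            # assumption: 3-byte UIDs for metrics, tag keys, tag values
ROW_SPAN_SECONDS = 3600  # assumption: one row covers one hour of data points


def uid_bytes(uid: int) -> bytes:
    """Encode a numeric UID as a fixed-width big-endian byte string."""
    return uid.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid: int, timestamp: int, tags: dict) -> bytes:
    """Build <metric_uid><timestamp><tagk1><tagv1>...<tagkN><tagvN>.

    The timestamp stored in the key is the start of the row's time span,
    so all points of one series within that span share a single row.
    """
    base_time = timestamp - (timestamp % ROW_SPAN_SECONDS)
    key = uid_bytes(metric_uid) + struct.pack(">I", base_time)
    for tagk, tagv in sorted(tags.items()):  # stable order: sort by tag key UID
        key += uid_bytes(tagk) + uid_bytes(tagv)
    return key


def column_offset(timestamp: int) -> int:
    """Seconds since the row's base time; this is what a column identifies."""
    return timestamp % ROW_SPAN_SECONDS
```

Two points of the same series that fall within the same hour produce the same row key and differ only in their column offset, which is what keeps scans over a time range sequential.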
OpenTSDB Architecture [Diagram: servers send metrics to the TSDs over TSD RPC; the Web UI and scripts/alerting talk to the TSDs over HTTP; the TSDs talk to HBase over HBase RPC]
HBase can be too much • HBase requires a full Hadoop setup (3xZK, 2xNN, 3xDN, 2xHMaster, 3xHRegion) • HBase tuning is a job for the brave (HFiles, WAL, MemStore, BucketCache, BlockCache)
But all I wanted was a time series database
Google Cloud Bigtable • Highly Scalable NoSQL database • Low latency, high throughput • Powers most Google products • Available as a Google Cloud Service
Migrate HBase apps to Cloud Bigtable • The Bigtable client is API compatible with HBase client • Only replace hbase-client.jar with bigtable-hbase.jar • No code changes required!
Migrate OpenTSDB to Cloud Bigtable • OpenTSDB does not use standard hbase-client.jar • OpenTSDB is based on AsyncHBase library
AsyncHBase library • Open source HBase client library • Thread-safe: multiple threads share the same client instance • Fully asynchronous, non-blocking • Implements the low-level HBase RPCs
Detour: Asynchronous programming
Detour: Why asynchronous? • Efficient thread usage • Fewer threads = less memory • CPU scheduler friendly • Extremely high concurrency
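A small sketch of the idea, using Python's `asyncio` rather than AsyncHBase itself: many simulated requests are in flight at once, yet all of them are served by a single thread. The names `fake_rpc`, `fan_out` and `run` are hypothetical, and `asyncio.sleep` stands in for a network round trip.

```python
import asyncio
import threading


async def fake_rpc(i: int, delay: float = 0.01):
    """Simulated non-blocking request: yields to the event loop while waiting."""
    await asyncio.sleep(delay)
    return i, threading.get_ident()  # record which thread handled the request


async def fan_out(n: int):
    """Start n requests concurrently and wait for all of them."""
    return await asyncio.gather(*(fake_rpc(i) for i in range(n)))


def run(n: int):
    """Return the request results (in order) and the set of threads used."""
    results = asyncio.run(fan_out(n))
    values = [value for value, _ in results]
    threads = {thread_id for _, thread_id in results}
    return values, threads
```

Because the waits overlap instead of each one occupying its own blocked thread, the number of concurrent requests is not tied to the number of threads.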
AsyncHBase library http://www.tsunanet.net/~tsuna/asynchbase/benchmark/viz.html
AsyncHBase library “AsyncHBase client differs significantly from HBase's client. Switching to it is not easy as it requires to rewrite all the code that was interacting with any HBase API” (AsyncHBase documentation)
AsyncBigtable library ● Complete rewrite of AsyncHBase API ● Uses standard hbase-client for Bigtable access ● Compatible with the bigtable-hbase API
AsyncBigtable challenges ● OpenTSDB jar dependencies ● AsyncBigtable is not async! ● BufferedMutator + Threadpool to emulate async
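The "buffer + thread pool" approach to emulating an asynchronous API over a blocking client can be sketched as follows. This is an illustration of the general technique in Python, not the AsyncBigtable implementation: `BlockingStore` is a hypothetical stand-in for a synchronous client, and `AsyncFacade` is a hypothetical wrapper that buffers writes, hands full batches to worker threads, and returns a future per write.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BlockingStore:
    """Hypothetical synchronous client: write_batch blocks until done."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def write_batch(self, batch: dict) -> None:
        with self._lock:
            self.rows.update(batch)


class AsyncFacade:
    """Emulates an async write API with a buffer and a thread pool."""

    def __init__(self, store, batch_size=3, workers=4):
        self._store = store
        self._batch_size = batch_size
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._buffer = []
        self._lock = threading.Lock()

    def put(self, key, value) -> Future:
        """Buffer one write and return immediately with a future for it."""
        future = Future()
        with self._lock:
            self._buffer.append((key, value, future))
            batch = self._drain() if len(self._buffer) >= self._batch_size else None
        if batch:
            self._pool.submit(self._write, batch)
        return future

    def flush(self) -> None:
        """Submit whatever is buffered, even if the batch is not full."""
        with self._lock:
            batch = self._drain()
        if batch:
            self._pool.submit(self._write, batch)

    def close(self) -> None:
        self.flush()
        self._pool.shutdown(wait=True)

    def _drain(self):
        # Caller must hold self._lock.
        batch, self._buffer = self._buffer, []
        return batch

    def _write(self, batch) -> None:
        # Runs on a pool thread: the blocking call happens here, not in put().
        try:
            self._store.write_batch({key: value for key, value, _ in batch})
        except Exception as exc:
            for _, _, future in batch:
                future.set_exception(exc)
        else:
            for _, _, future in batch:
                future.set_result(None)
```

The caller never blocks on the store, but each in-flight batch still occupies a worker thread, which is why this counts as emulation rather than true non-blocking I/O.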
AsyncBigtable library ● Merged upstream in OpenTSDB v2.3.0 ● http://opentsdb.net/docs/build/html/user_guide/backends/bigtable.html ● https://github.com/OpenTSDB/asyncbigtable
Future work ● Native Bigtable API ● Fully asynchronous ● Improve performance ● Add more unit tests
Questions? https://github.com/opentsdb/asyncbigtable