Hawkular Metrics: Metric Storage & Alerting
Stefan Negrea
About Me
● Co-Creator of Hawkular Metrics
● Introduction to Hawkular Metrics
● Hawkular Alerting
● Hawkular Demo
Pre-History
● 2006 JBoss Operations Network 1.0
● 2008 Project RHQ
  ○ JBoss Operations Network 2.0
  ○ Metrics stored in Postgres
Pre-History
● 2012 - 2013 RHQ Storage Nodes
  ○ Cassandra based
  ○ Store metrics
● 2014 RHQ Metrics
Hawkular = hawk + monocular
It’s a hawk with a monocular. Hawks are known for very sharp vision and are very good hunters; they catch prey at high speed by anticipating its movements. The goal is to monitor and catch anomalies in fast-paced environments.
All* projects are Apache License 2.0
History
● 2014 Hawkular organization formed
● 2014 Hawkular Alerting started
● 02/2015 RHQ Metrics joins Hawkular org
● 12/2015 Hawkular Metrics integrated in OpenShift Origin v3
● 10/2016 Hawkular Metrics includes Hawkular Alerting
Hawkular Metrics
Hawkular Metrics is a storage engine for metric data.
● metric data = a measurement taken at a specific time
● storage engine = store metrics efficiently for their useful lifetime
Supported Metrics
● Gauge (e.g. memory usage)
  ○ number
  ○ varies (not monotonic)
  ○ rate of change
  ○ example: (metric1, 4.5, 1493301898245), (metric1, 5.6, 1493301898246), (metric1, 1.2, 1493301898247)
● Counter (e.g. number of visitors)
  ○ integer
  ○ monotonic (increasing or decreasing)
  ○ rate of change
  ○ support for reset
  ○ example: (metric2, 4, 1493301898248), (metric2, 5, 1493301898249), (metric2, 9, 1493301898250), (metric2, 0, 1493301898251)
Supported Metrics
● Availability (e.g. server status)
  ○ availability of a resource
  ○ up, down, or unknown
  ○ can compute interesting stats based on values
  ○ example: (metric3, UP, 1493301898253), (metric3, DOWN, 1493301898254), (metric3, UP, 1493301898255)
● String (e.g. value of configuration key ‘k’)
  ○ just that
  ○ possible uses: logs, events, config
  ○ example: (metric4, “k=v”, 1493301898256), (metric4, “k=t”, 1493301898257), (metric4, “k=1”, 1493301898258), (metric4, “k=4”, 1493301898259)
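As a rough sketch, one data point of each type could be pushed with the hawkular-client Python library used later in the demo. The slides only show MetricType.Gauge; the Counter, Availability, and String type names (and the 'up' availability value) are assumptions about the client API.

#!/usr/bin/env python3
from hawkular.metrics import HawkularMetricsClient, MetricType

client = HawkularMetricsClient(tenant_id='demo')   # tenant id is always required

# Gauge: floating point sample, e.g. memory usage in percent
client.push(MetricType.Gauge, 'metric1', 4.5)

# Counter: monotonic integer, e.g. total number of visitors
client.push(MetricType.Counter, 'metric2', 9)

# Availability: resource state; exact value representation may differ per client version
client.push(MetricType.Availability, 'metric3', 'up')

# String: arbitrary text, e.g. the value of configuration key 'k'
client.push(MetricType.String, 'metric4', 'k=v')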
Cassandra - Storage
Management & Support
● Highly available, fault tolerant
● No specialized node roles
● Minimal configuration
Performance & Scalability
● Optimized for writes
● Data compression
● Indexing
Cassandra - Storage
● CQL based
● Partitioning & indexing of data based on usage
● Use built-in compression & TTL
● Use the DataStax driver, fully async
● Support for the latest C* 3.0.x release
● Keep updating to the latest stable release
● Use multiple tables for indexing
App Layer
● REST API with JSON
● JAX-RS 2.0 (async spec)
● Fully async = JAX-RS 2.0 async + RxJava + async C* driver
● Stateless** server (Metrics, mostly)
● Minimal clustering via Infinispan
● Schema management
● Easy to use
  ○ packaged distribution with WildFly
  ○ download and run, only a JDK required
Performance - Sample
C*: 4 CPU, 4GB / Hawkular: 4 CPU, 4GB
● 10 datapoints per message: 2592 req/sec => 25920 datapoints/sec
● 100 datapoints per message: 365 req/sec => 36500 datapoints/sec
● 5000 datapoints per message: 7.6 req/sec => 38000 datapoints/sec
C*: 8 CPU, 8GB / Hawkular: 8 CPU, 4GB
● 10 datapoints per message: 4655 req/sec => 46550 datapoints/sec
● 100 datapoints per message: 604 req/sec => 60400 datapoints/sec
● 5000 datapoints per message: 15 req/sec => 75000 datapoints/sec
Features
● Multi-tenant
  ○ tenant id required on each request (HAWKULAR-TENANT header)
  ○ no way to get data from multiple tenants at once
● Can insert data without pre-creating metrics
● Data is compressed using Gorilla compression
  ○ 2 hour time window
  ○ further reduces disk footprint
  ○ LZ4 enabled in Cassandra
  ○ Load testing:
    ■ 5000 data points/sec for 5 days = 26GB
    ■ 83M data points ~ 1GB of disk space
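As a hedged illustration of the tenant header and implicit metric creation, a minimal requests-based insert could look like the sketch below; the base URL (http://localhost:8080/hawkular/metrics) and the JSON payload shape are assumptions based on the default standalone distribution, not shown on the slide.

import time
import requests

BASE = 'http://localhost:8080/hawkular/metrics'   # assumed default location
HEADERS = {'Hawkular-Tenant': 'demo'}             # tenant id required on every request

now_ms = int(time.time() * 1000)
payload = [{'id': 'metric1', 'data': [{'timestamp': now_ms, 'value': 4.5}]}]

# 'metric1' does not have to be pre-created; it comes into existence on first insert
resp = requests.post(BASE + '/gauges/raw', json=payload, headers=HEADERS)
resp.raise_for_status()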
Features
● Bulk insertion endpoint for metrics and data
● Tagging support for metrics and single data points
  ○ key, value; multi-tag support
    ■ tag1 = d
  ○ metrics queryable via TQL (tag query language)
    ■ AND, OR, NOT
    ■ grouping
    ■ wildcard matching
    ■ a1 = 'd' OR ( a1 != 'ab' AND c1 )
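Continuing the same assumptions, tagging a metric and then querying definitions with a TQL expression might look like this sketch; the /gauges/{id}/tags path and the tags query parameter are assumed names, so check the Metrics documentation.

import requests

BASE = 'http://localhost:8080/hawkular/metrics'
HEADERS = {'Hawkular-Tenant': 'demo'}

# Attach a tag (key a1, value d) to an existing gauge metric
requests.put(BASE + '/gauges/metric1/tags', json={'a1': 'd'}, headers=HEADERS)

# Find gauge definitions whose tags match a TQL expression
resp = requests.get(BASE + '/gauges',
                    params={'tags': "a1 = 'd' OR ( a1 != 'ab' AND c1 )"},
                    headers=HEADERS)
print(resp.json())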
Features - Simple REST API
● Endpoint for each metric type
  ○ /gauges, /availability, /counters, /strings
  ○ each metric type has almost identical endpoints
● Raw data - /gauges/raw
● Raw data for a single metric - /strings/{metric_id}/raw
● Query time aggregation
  ○ multiple metrics - /availability/stats
  ○ single metric - /counters/{metric_id}/stats
● Bulk operations - /metrics
** String metrics do not have stats (yet?)
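For example, reading the raw data of a single metric back could look like the following sketch; the start/end millisecond parameters are an assumption (by default the API returns a recent time window).

import time
import requests

BASE = 'http://localhost:8080/hawkular/metrics'
HEADERS = {'Hawkular-Tenant': 'demo'}

end = int(time.time() * 1000)
start = end - 8 * 60 * 60 * 1000   # last 8 hours

resp = requests.get(BASE + '/gauges/metric1/raw',
                    params={'start': start, 'end': end},
                    headers=HEADERS)
for point in resp.json():
    print(point['timestamp'], point['value'])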
Features - Aggregation & Rate
● Query Time Aggregation
  ○ combine multiple metrics and get statistical data
  ○ gauge and counter: average, median, percentile, sum
  ○ availability: ratios for uptime and downtime, downtime duration
  ○ time slicing: first group data, then compute stats
  ○ single or multiple metrics
● Rate
  ○ available for gauges and counters
  ○ rate of change of the values for the timespan
  ○ ex: how fast is the number of total requests increasing
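A sketch of query-time stats and rate for single metrics, under the same assumptions as the earlier examples; the buckets parameter and the /counters/{id}/rate path are taken from the documented API rather than from this slide, so treat them as assumptions.

import time
import requests

BASE = 'http://localhost:8080/hawkular/metrics'
HEADERS = {'Hawkular-Tenant': 'demo'}

end = int(time.time() * 1000)
start = end - 60 * 60 * 1000   # last hour

# Bucketed statistics (avg, median, percentiles, ...) for a single gauge,
# dividing the time range into 12 equal buckets
stats = requests.get(BASE + '/gauges/metric1/stats',
                     params={'start': start, 'end': end, 'buckets': 12},
                     headers=HEADERS).json()

# Rate of change for a counter, e.g. how fast a total-requests counter grows
rate = requests.get(BASE + '/counters/metric2/rate',
                    params={'start': start, 'end': end},
                    headers=HEADERS).json()

print(stats)
print(rate)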
Metrics + Alerting
● Natural fit: collect data and then alert on anomalies
● Two ways to alert on metric data
  ○ dedicated API for setting up alerts; incoming data is filtered and processed by the alerting engine
  ○ Metrics Alerter that queries single or multiple metrics; no need to predefine alert triggers ahead of time
Alerting Features
● Single and group triggers
● Template triggers
● Complex conditions
● Dampening
● Auto-resolve / auto-disable triggers
● Pluggable notifiers
Roadmap - 2017
● Automatic & persisted aggregation
● Management capabilities for the Cassandra cluster
● Query language
● Performance improvements
  ○ already have a good baseline, but can do better
  ○ read/write
Demo
Demo
● Install ccm
  ○ https://github.com/pcmanus/ccm
● Start a single node C* cluster
  ○ ccm create -v 3.0.12 -n 1 -s hawkular
● Download, extract, and start Hawkular Metrics
  ○ https://origin-repository.jboss.org/nexus/content/groups/public/org/hawkular/metrics/hawkular-metrics-wildfly-standalone/0.26.1.Final/
  ○ bin/standalone -b 0.0.0.0
● Download, extract, and start Grafana
● Download, install, and configure the Hawkular datasource plugin for Grafana
  ○ https://grafana.com/plugins/hawkular-datasource/installation
  ○ https://github.com/hawkular/hawkular-grafana-datasource
  ○ pick a tenant id of your choice
Demo
● Install the Hawkular Metrics python client via pip
  ○ pip install hawkular-client
● Install psutil to collect CPU stats
  ○ pip install psutil
● Create a custom agent (using the python client)
  ○ make sure you use the same tenant id configured with Grafana
  ○ pre-create and tag a metric for each CPU
  ○ collect CPU usage every 10 seconds
  ○ send the data to Hawkular Metrics
Demo
#!/usr/bin/env python3
import psutil, time
from hawkular.metrics import HawkularMetricsClient, MetricType

client = HawkularMetricsClient(tenant_id='test')

# Pre-create and tag a gauge metric for each CPU (tag: cpu=cpuN)
cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
for index, cpu in enumerate(cpu_percent):
    client.create_metric_definition(MetricType.Gauge, 'cpu%s' % index, cpu='cpu%s' % index)

# Collect per-CPU usage and push it every 10 seconds
while True:
    cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
    for index, cpu in enumerate(cpu_percent):
        client.push(MetricType.Gauge, 'cpu%s' % index, float(cpu))
    time.sleep(10)
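Usage note: saved as, say, agent.py (a hypothetical file name), the script runs with python3 agent.py; as long as it uses the same tenant id configured in the Grafana datasource, the cpuN gauges it creates can be graphed directly in Grafana.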
Resources
● Web - http://www.hawkular.org/
● Github - https://github.com/hawkular
● Metrics Documentation - http://www.hawkular.org/tags/metrics.html
● Alerting Documentation - http://www.hawkular.org/tags/alerts.html
● Twitter - https://twitter.com/hawkular_org
Thank you! hawkular.org #hawkular (on freenode) snegrea@redhat.com