ClickHouse Materialized Views: A Secret Weapon for High Performance Analytics. Robert Hodges, Percona Live 2018 Amsterdam
Introduction to Presenter: Robert Hodges, Altinity CEO. 30+ years on DBMS plus virtualization and security. ClickHouse is DBMS #20. Altinity (www.altinity.com): leading software and services provider for ClickHouse; major committer and community sponsor in the US and Western Europe.
Introduction to ClickHouse
● Understands SQL
● Runs on bare metal to cloud
● Stores data in columns
● Parallel and vectorized execution
● Scales to many petabytes
● Is open source (Apache 2.0)
● Is WAY fast!
ClickHouse materialized views are triggers. An INSERT into the source table fires the view's trigger, which transforms the newly inserted rows and writes the result to the target table.
You can even create recursive (chained) views: the target table of one materialized view can serve as the source table of another, so a single INSERT cascades through several transformations.
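As a sketch of such a chain (table and column names here are hypothetical, not from the deck), one view can roll raw events up to daily counts, and a second view, reading the first view's explicit target table, can roll those up to monthly counts:

```sql
-- Illustrative sketch only: events, events_daily, and all columns are hypothetical.
CREATE TABLE events_daily (
    event_date Date, event_type String, c UInt64
) ENGINE = SummingMergeTree ORDER BY (event_date, event_type);

-- First hop: raw events -> daily counts, written to an explicit target table.
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily
AS SELECT toDate(event_time) AS event_date, event_type, count() AS c
FROM events GROUP BY event_date, event_type;

-- Second hop: fires when the first view inserts into events_daily.
CREATE MATERIALIZED VIEW events_monthly_mv
ENGINE = SummingMergeTree ORDER BY (month, event_type)
AS SELECT toStartOfMonth(event_date) AS month, event_type, sum(c) AS c
FROM events_daily GROUP BY month, event_type;
```

Using an explicit TO table for the first view, and selecting FROM that table in the second, is the predictable way to chain; behavior when selecting directly from another materialized view varies by ClickHouse version.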
Why might a materialized view be useful?

SELECT
    toYear(FlightDate) AS year,
    sum(Cancelled) / count(*) AS cancelled,
    sum(DepDel15) / count(*) AS delayed_15
FROM airline.ontime
GROUP BY year
ORDER BY year ASC
LIMIT 10
...
10 rows in set. Elapsed: 0.894 sec. Processed 173.82 million rows, 1.74 GB (194.52 million rows/s., 1.95 GB/s.)

Can we make it faster?
Let’s precompute and store aggregates!

CREATE MATERIALIZED VIEW ontime_daily_cancelled_mv
ENGINE = SummingMergeTree
PARTITION BY tuple()
ORDER BY (FlightDate, Carrier)
POPULATE
AS SELECT
    FlightDate,
    Carrier,
    count(*) AS flights,
    sum(Cancelled) / count(*) AS cancelled,
    sum(DepDel15) / count(*) AS delayed_15
FROM ontime
GROUP BY FlightDate, Carrier
Query performance impact is significant

SELECT
    toYear(FlightDate) AS year,
    sum(flights) AS flights,
    sum(cancelled) AS cancelled,
    sum(delayed_15) AS delayed_15
FROM airline.ontime_daily_cancelled_mv
GROUP BY year
ORDER BY year ASC
LIMIT 10
...
10 rows in set. Elapsed: 0.007 sec. Processed 148.16 thousand rows, 3.85 MB (20.37 million rows/s., 529.50 MB/s.)
What’s going on under the covers?

● ontime (MergeTree): compressed size ~14.6 GB, uncompressed size ~55.4 GB.
● An INSERT into ontime fires the trigger of ontime_daily_cancelled_mv (the materialized view), which writes aggregated rows into the hidden target table .inner.ontime_daily_cancelled_mv (SummingMergeTree): compressed size ~2.31 MB, uncompressed size ~4.10 MB.
● A SELECT on the view reads from that target table.

The materialized view is 6471x smaller than the source table!
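Size figures like these can be checked on any installation from the system.parts table; the sketch below assumes the database is named airline, as in the earlier queries:

```sql
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed
FROM system.parts
WHERE active
  AND database = 'airline'
  AND table IN ('ontime', '.inner.ontime_daily_cancelled_mv')
GROUP BY table
```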
A brief study of ClickHouse table structures

CREATE TABLE ontime (
    Year UInt16,
    Quarter UInt8,
    Month UInt8,
    ...
)
ENGINE = MergeTree()               -- Table engine type
PARTITION BY toYYYYMM(FlightDate)  -- How to break data into parts
ORDER BY (Carrier, FlightDate)     -- How to index and sort data in each part
Possible ways to transform tables with a materialized view:
● Reduce data (e.g. ontime → ontime_agg)
● And/or change layout (e.g. ontime → ontime_resorted)
● And/or change table type (e.g. read from ontime_kafka)
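As a sketch of the "change layout" case, a materialized view can re-sort the same rows under a different ORDER BY (the ontime_resorted name comes from the slide; the particular sort key below is illustrative):

```sql
CREATE MATERIALIZED VIEW ontime_resorted
ENGINE = MergeTree
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (FlightDate, Carrier)  -- different sort key than the source table
POPULATE
AS SELECT * FROM ontime
```

Queries that filter on FlightDate can then use this copy's sparse index instead of scanning the Carrier-sorted original, much like a Vertica projection.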
Exercise: the famous ‘last point problem’

Problem: Show the current CPU utilization for each host. The cpu table stores many utilization samples per host over time; we want only the latest sample for every host.
ClickHouse can solve this using a subquery

SELECT t.hostname, tags_id, 100 - usage_idle usage
FROM (
    SELECT tags_id, usage_idle           -- TABLE SCAN!
    FROM cpu
    WHERE (tags_id, created_at) IN
        (SELECT tags_id, max(created_at) -- USES INDEX
         FROM cpu GROUP BY tags_id)
) AS c
INNER JOIN tags AS t ON c.tags_id = t.id
ORDER BY usage DESC, t.hostname ASC      -- OPTIMIZED JOIN COST
LIMIT 10
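A related formulation, not on the slide, uses ClickHouse's LIMIT BY clause to keep the newest row per tags_id. It is more compact but still scans the table, so it shares the efficiency problem measured on the next slide:

```sql
SELECT tags_id, usage_idle
FROM cpu
ORDER BY tags_id, created_at DESC
LIMIT 1 BY tags_id
```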
SQL queries work but are inefficient

Using direct query on table:
10 rows in set. Elapsed: 0.566 sec. Processed 32.87 million rows, 263.13 MB (53.19 million rows/s., 425.81 MB/s.)

OUTPUT:
┌─hostname──┬─tags_id─┬─usage─┐
│ host_1002 │    9003 │   100 │
│ host_1116 │    9117 │   100 │
│ host_1141 │    9142 │   100 │
│ host_1163 │    9164 │   100 │
│ host_1210 │    9211 │   100 │
│ host_1216 │    9217 │   100 │
│ host_1234 │    9235 │   100 │
│ host_1308 │    9309 │   100 │
│ host_1419 │    9420 │   100 │
│ host_1491 │    9492 │   100 │
└───────────┴─────────┴───────┘

Can we bring last point performance closer to real-time?
Create target table for aggregate data

CREATE TABLE cpu_last_point_idle_agg (
    created_date AggregateFunction(argMax, Date, DateTime),
    max_created_at AggregateFunction(max, DateTime),
    time AggregateFunction(argMax, String, DateTime),
    tags_id UInt32,
    usage_idle AggregateFunction(argMax, Float64, DateTime)  -- Minimal data
)
ENGINE = AggregatingMergeTree()  -- Different table type
PARTITION BY tuple()
ORDER BY tags_id                 -- Different storage layout
argMaxState links columns with aggregates

CREATE MATERIALIZED VIEW cpu_last_point_idle_mv
TO cpu_last_point_idle_agg  -- MV writes TO the target table
AS SELECT
    argMaxState(created_date, created_at) AS created_date,
    maxState(created_at) AS max_created_at,
    argMaxState(time, created_at) AS time,
    tags_id,
    argMaxState(usage_idle, created_at) AS usage_idle  -- Derive data
FROM cpu
GROUP BY tags_id
Digression: How aggregation works
● Source value: created_at
● Partial aggregate: maxState(created_at)
● Merged aggregate: maxMerge(max_created_at)
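The -State / -Merge pairing can be tried on its own with a toy query; this sketch builds partial max states over a numbers() sequence and merges them into a final value:

```sql
SELECT maxMerge(partial) AS final_max
FROM
(
    SELECT maxState(number) AS partial
    FROM numbers(100)
    GROUP BY number % 4  -- four partial aggregate states
)
-- final_max: 99, the max over all four partial states
```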
Selecting rows that match max value
● Source values (same row): created_at, usage_idle
● Partial aggregates: maxState(created_at); argMaxState(usage_idle, created_at) picks the usage_idle value from any row with the matching created_at
● Merged aggregates: maxMerge(max_created_at); argMaxMerge(usage_idle) picks usage_idle from the aggregate with the matching created_at
Let’s hide the merge details with a view

CREATE VIEW cpu_last_point_idle_v AS
SELECT
    argMaxMerge(created_date) AS created_date,
    maxMerge(max_created_at) AS created_at,
    argMaxMerge(time) AS time,
    tags_id,
    argMaxMerge(usage_idle) AS usage_idle
FROM cpu_last_point_idle_mv
GROUP BY tags_id
...Select again from the covering view

SELECT t.hostname, tags_id, 100 - usage_idle usage
FROM cpu_last_point_idle_v AS b
INNER JOIN tags AS t ON b.tags_id = t.id
ORDER BY usage DESC, t.hostname ASC
LIMIT 10
...
10 rows in set. Elapsed: 0.005 sec. Processed 14.00 thousand rows, 391.65 KB (2.97 million rows/s., 82.97 MB/s.)

Last point view is 113 times faster
Common uses for materialized views ● Precompute aggregates ● Fetch last point data ● Transform table on-disk indexing and sorting ○ Like a Vertica projection ● Keep aggregates after raw input is dropped ● Create data cleaning pipelines ● Read from Kafka queues
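The Kafka use case in the last bullet typically pairs a Kafka engine table with a materialized view that continuously moves consumed rows into MergeTree storage. A hedged sketch (broker address, topic, group, and all columns below are illustrative):

```sql
-- Hypothetical queue table: consumes JSON messages from a Kafka topic.
CREATE TABLE readings_queue (
    sensor_id UInt32, reading Float64, ts DateTime
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'readings',
         kafka_group_name = 'ch_readings',
         kafka_format = 'JSONEachRow';

-- Durable storage for the consumed rows.
CREATE TABLE readings (
    sensor_id UInt32, reading Float64, ts DateTime
) ENGINE = MergeTree ORDER BY (sensor_id, ts);

-- The view acts as the consumer loop, draining the queue into MergeTree.
CREATE MATERIALIZED VIEW readings_mv TO readings
AS SELECT sensor_id, reading, ts FROM readings_queue;
```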
Thank you!
Presenter: rhodges@altinity.com
ClickHouse: https://github.com/ClickHouse/ClickHouse
Altinity: https://www.altinity.com (We’re hiring!)
...to reduce the amount of data we read
● PARTITION BY expression: only parts whose rows match the partition expression are read
● ORDER BY clause: a sparse index on the sort columns selects which rows to read within each part
● Skip indexes reduce the data read still further
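Skip indexes are declared per column expression on an existing table; a minimal sketch (the index name and granularity below are illustrative):

```sql
ALTER TABLE ontime
    ADD INDEX dep_delay_minmax DepDel15 TYPE minmax GRANULARITY 4;

-- Build the index for already-existing parts (supported in newer versions):
ALTER TABLE ontime MATERIALIZE INDEX dep_delay_minmax;
```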