Databases Have Forgotten About Single-Node Performance: A Wrongheaded Trade-Off
Matvey Arye, PhD, Core Database Engineer
mat@timescale.com · github.com/timescale
How computing used to look
How computing looks today
What happened? @timescale
Databases changed from scaling up to scaling out
We entered the MapReduce Era
But… MapReduce specializes in long jobs that touch lots of data
• Grep: 150 seconds
• Sort: 890 seconds
• Average job completion time of Google production workloads: 634 seconds
(Source: Google MapReduce paper)
Why the high latency?
Network Latencies • Connection setup/teardown • TCP ramp-up • Network transfer
Consensus is Expensive
• Dreaded two-phase commit
• Snapshot isolation even harder
• Coordinators often bottlenecks
The “Straggler” Problem
The high latency of distributed databases is nothing new

Graph connected component analysis:

System   | Cores | twitter_rv | uk_2007_05
Spark    | 128   | 1784s      | 8000s+
Giraph   | 128   | 200s       | 8000s+
GraphLab | 128   | 242s       | 714s
GraphX   | 128   | 251s       | 800s
Laptop   | 1     | 153s       | 417s
Laptop*  | 1     | 15s        | 30s

PageRank (20 iterations):

System   | Cores | twitter_rv | uk_2007_05
Spark    | 128   | 857s       | 1759s
Giraph   | 128   | 596s       | 1235s
GraphLab | 128   | 249s       | 833s
GraphX   | 128   | 419s       | 462s
Laptop   | 1     | 300s       | 651s

Source: Scalability! But at what COST? (McSherry et al.)
But that was 10 years ago. The world has changed.

Then                 | Now
160 GB spinning rust | 60 TB SSDs
2-core CPUs          | 64-core CPUs
4 GB RAM             | TBs of RAM
                     | GPUs, TPUs, edge computing
And data needs have become real-time:
• Financial & Marketing
• Datacenter & DevOps
• Web / Mobile Events
• Transportation & Logistics
• Industrial Machines
Shorter latency requirements mean we need to focus again on scaling up
So how do we improve single-node performance? (Lessons from TimescaleDB)
Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage). Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics).
Insight #1 Partition, even on a single machine.
Partitioning on a single node: the hypertable is split into chunks (sub-tables) along a time dimension and a space dimension, with older chunks further along the time axis.
Chunks are “right-sized” Recent (hot) chunks fit in memory
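The partitioning above can be sketched in SQL. A minimal example, assuming a hypothetical `data` table; the column names and the 4-way space partitioning are illustrative, not the benchmark schema, and chunk sizing is left at TimescaleDB's defaults:

```sql
-- Illustrative schema: a narrow metrics table.
CREATE TABLE data (
  time    TIMESTAMPTZ      NOT NULL,
  host_id INTEGER          NOT NULL,
  temp    DOUBLE PRECISION
);

-- Turn it into a hypertable: chunks are cut along the time dimension
-- and hashed across a space dimension on host_id (4 partitions here).
SELECT create_hypertable('data', 'time', 'host_id', 4);
```

Inserts then land in whichever chunk covers the row's time interval and hash slice, so recent ("hot") chunks and their indexes stay small enough to live in memory.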
TimescaleDB vs. PostgreSQL (single-row inserts): 144K metrics/s (14.4K inserts/s).
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics).
TimescaleDB vs. PostgreSQL (batch inserts): 1.11M metrics/s, >20x PostgreSQL.
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics).
Other benefits to partitioning
Avoid querying chunks via constraint exclusion
SELECT time, device_id, temp FROM data
WHERE time > '2017-08-22 18:18:00+00'
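One way to see exclusion working is to EXPLAIN the query (a sketch; actual chunk names and plan shape vary by version and data):

```sql
-- The plan should list only the chunks whose time range can overlap
-- the WHERE clause, not every chunk of the hypertable.
EXPLAIN SELECT time, device_id, temp
FROM data
WHERE time > '2017-08-22 18:18:00+00';
```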
Efficient retention policies: drop chunks, don't delete rows ⇒ avoids vacuuming
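A retention policy then becomes a one-liner. A sketch using the drop_chunks call of the TimescaleDB 0.x era; the table name and the three-month window are illustrative:

```sql
-- Drop every chunk whose rows are all older than three months.
-- Dropping a chunk is a metadata change plus file removal: no per-row
-- DELETEs, no dead tuples, nothing for VACUUM to clean up.
SELECT drop_chunks(interval '3 months', 'data');
```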
Insight #2 Partition across many disks, even on a single machine
Single node: scaling up by adding disks
• How: chunks spread across many disks (elastically!), either RAIDed or via distinct tablespaces
• Benefits: faster inserts, parallelized queries
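The tablespace route might look like this (the paths and tablespace names are made up for illustration; attach_tablespace is the TimescaleDB call for binding tablespaces to a hypertable):

```sql
-- One tablespace per physical disk.
CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

-- Attach them to the hypertable; newly created chunks are then
-- spread across the attached tablespaces.
SELECT attach_tablespace('disk1', 'data');
SELECT attach_tablespace('disk2', 'data');
```

Because new chunks are created continually as time advances, disks can be added later and the hypertable grows onto them without reshuffling old data.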
Elasticity on a single node
Insight #3 Query Optimizations
Avoid querying chunks via constraint exclusion (more)
SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours'
Won't exclude chunks in plain PostgreSQL
Example Query Optimization
CREATE INDEX ON readings(time);
SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
Will this use the index?
Example Query Optimization
CREATE INDEX ON readings(time);
SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
TimescaleDB understands time
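The optimization rests on monotonicity: since date_trunc('minute', time) never decreases as time increases, a backward scan of the (time) index emits buckets already in ORDER BY ... DESC order, so the query can stop after 10 buckets instead of aggregating the whole table and sorting. EXPLAIN is the way to check whether the planner exploits this (a sketch; the exact plan depends on version and data):

```sql
-- With the monotonicity of date_trunc recognized, look for a backward
-- index scan feeding the aggregate, rather than a full scan plus Sort.
EXPLAIN SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
```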
How to find bottlenecks? Profile and read flame graphs.
[Flame graph of query planning under 10-way concurrency: ~91% of samples (110) fall inside postgres`standard_planner, dominated by constraint exclusion (relation_excluded_by_constraint) and relation-size estimation (estimate_rel_size → smgrnblocks → mdnblocks → FileSeek).]
Insight #4 HA != horizontally distributed
Single Node + Replicas = High-availability
Replication is much cheaper than horizontal distribution
• Avoids the consensus penalty
• Can more easily be asynchronous
• Still provides HA
Results
Results vs. Cassandra (5 nodes): queries

Query type            | Scanned (h) | Devices | Metrics | Timescale query time (ms) | Cassandra multiplier
groupby-hour          | 24          | all     | 1       | 27866.50                  | 9.66
groupby-hour          | 24          | all     | 5       | 35827.60                  | 37.52
groupby-hour          | 24          | all     | 10      | 44912.30                  | 45.05
high-cpu              | 24          | all     | 1       | 49807                     | 32.95
cpu-max               | 12          | 8       | 10      | 126.90                    | 11.84
cpu-max               | 12          | 1       | 10      | 20.54                     | 37.41
high-cpu              | 24          | 1       | 1       | 56                        | 18.65
groupby-minute        | 12          | 1       | 1       | 33.80                     | 0.91
groupby-minute        | 1           | 8       | 1       | 23.50                     | 1.97
groupby-minute        | 12          | 1       | 5       | 27.60                     | 10.25
groupby-minute        | 1           | 8       | 5       | 20.40                     | 17.90
lastpoint             | all         | all     | 10      | 266.10                    | 2285.30
groupby-orderby-limit | all         | all     | 1       | 75.20                     | 3181.62

Higher multiplier indicates worse Cassandra performance.
TimescaleDB vs. Cassandra, insert metrics/s (5 nodes): Cassandra 150K, Timescale 745K
Results vs. Mongo: queries

Query type            | Scanned (h) | Devices | Metrics | Timescale query time (ms) | Mongo multiplier
groupby-hour          | 24          | all     | 1       | 29968                     | 7.63
groupby-hour          | 24          | all     | 5       | 39157                     | 5.51
groupby-hour          | 24          | all     | 10      | 49058                     | 5.50
high-cpu              | 24          | all     | 1       | 51323                     | 5.24
cpu-max               | 12          | 8       | 10      | 260                       | 1.14
cpu-max               | 12          | 1       | 10      | 28                        | 1.32
high-cpu              | 24          | 1       | 1       | 95                        | 1.17
groupby-minute        | 12          | 1       | 1       | 62                        | 0.77
groupby-minute        | 1           | 8       | 1       | 28                        | 1.95
groupby-minute        | 12          | 1       | 5       | 69                        | 0.94
groupby-minute        | 1           | 8       | 5       | 26                        | 2.17
lastpoint             | all         | all     | 10      | 453                       | 101.72
groupby-orderby-limit | all         | all     | 1       | 149                       | 1667.25

Higher multiplier indicates worse Mongo performance.
TimescaleDB vs. Mongo, insert metrics/s: Mongo 807K, Timescale 994K
Lessons for DB designers
• Concentrate on scale-up
• Consider (insights!):
  • Partitioning, even on a single node
  • Even across disks
• Performance analysis:
  • High-level: query optimization
  • Low-level: profiling
• High availability is possible on a single node
Lessons for DB users
• Absolute performance is as important as scaling numbers.
• Don't go horizontally distributed unless you have to.
• HA is not the same as horizontal scalability.
• Replication is cheaper than distribution.
• SQL and ACID are extremely useful.
Open Source (Apache 2.0): github.com/timescale/timescaledb
Join the community: slack.timescale.com
Timescale is hiring! Core Database Engineers • R&D Engineers • Solutions Engineers • Evangelists • Customer Success • careers.timescale.com