Relational Model – Single Series Query
● Find measurements
  – Specify time range
  – Specify metric name
  – Specify dimension value
● Aggregate data points
  – Round to desired interval
  – Group by that interval
  – Take average of all data points in that interval

    SELECT
      TIME_ROUND(timestamp, 60),
      AVG(value)
    FROM
      measurements
    WHERE
      timestamp BETWEEN '2015-01-01Z00:00:00'
                    AND '2015-01-01Z01:00:00'
      AND name = 'cpu.percent'
      AND dimensions @> '{"host": "dev-01"}'::JSONB
    GROUP BY 1
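The round-then-group step can be sketched outside the database. This is a minimal illustration, assuming TIME_ROUND rounds a timestamp down to the start of its interval; the slides do not define TIME_ROUND, so the helper names and the rounding direction here are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def time_round(ts: datetime, interval_s: int) -> datetime:
    """Round a timestamp down to the start of its interval (assumed semantics)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % interval_s, tz=timezone.utc)


def average_per_interval(points, interval_s):
    """Group (timestamp, value) pairs by rounded timestamp and average each group."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[time_round(ts, interval_s)].append(value)
    return {bucket: sum(vals) / len(vals) for bucket, vals in sorted(buckets.items())}


points = [
    (datetime(2015, 1, 1, 0, 0, 5, tzinfo=timezone.utc), 10.0),
    (datetime(2015, 1, 1, 0, 0, 35, tzinfo=timezone.utc), 20.0),
    (datetime(2015, 1, 1, 0, 1, 10, tzinfo=timezone.utc), 40.0),
]
result = average_per_interval(points, 60)
for bucket, avg in result.items():
    print(bucket.isoformat(), avg)
```

The first two points share the 00:00 minute and average to 15.0; the third falls in the 00:01 minute on its own.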
Relational Model – Performance Analysis
[Figure: Query Duration (seconds) as a function of Data Volume (M/rows) and of Time Range (seconds)]
Target: <100ms (Query Duration)
[Figure: Relational Model – Single Series Query (vs Time Range). Query Duration (seconds) against Query Time Range (seconds), 0 to 9000; series: 1M Rows, 2M Rows, 3M Rows]
[Figure: Relational Model – Single Series Query (vs Data Volume). Query Duration (seconds) against Data Volume (M-rows), 0 to 10]
Relational Model - Analysis
✔ Query time fixed regardless of time range
✔ On target for < ~1M rows
✗ Query time scales linearly with data volume
✗ Every query reads every row
  ✗ Full table scan
Indexing
● Timestamps are essentially integers
● PostgreSQL has many index types
  – BTREE, HASH, BRIN, GIN, GiST
● BTREE excellent for Equality and Between
Indexing – BTREE
[Figure: a sorted index of keys 1 to 8, each pointing at a row of an unsorted table ("one" to "eight")]
[Figure: the same index with an equality lookup, = 7]
[Figure: the same index with a range lookup, >= 6 and <= 8]
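The equality and range lookups in the diagrams can be imitated with a sorted list and binary search. This is only a sketch: a sorted Python list stands in for the B-tree (a real B-tree is a balanced tree of pages), and the table row order below is illustrative.

```python
from bisect import bisect_left, bisect_right

# Table rows in arbitrary (insertion) order: (key, payload).
table = [(3, "three"), (2, "two"), (4, "four"), (6, "six"),
         (1, "one"), (5, "five"), (8, "eight"), (7, "seven")]

# The "index": keys in sorted order, each paired with its row position.
index = sorted((key, pos) for pos, (key, _) in enumerate(table))
keys = [key for key, _ in index]


def lookup_between(low, high):
    """Return rows with low <= key <= high using two binary searches."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [table[pos] for _, pos in index[lo:hi]]


def lookup_equal(key):
    """Equality is a range lookup whose bounds coincide."""
    return lookup_between(key, key)


print(lookup_equal(7))       # [(7, 'seven')]
print(lookup_between(6, 8))  # [(6, 'six'), (7, 'seven'), (8, 'eight')]
```

Both lookups touch only the matching slice of the index, which is why the same structure serves equality and BETWEEN predicates.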
Indexing – Single Series Query
● BETWEEN predicate
● Eliminates huge portion of table contents
  – High selectivity
● Excellent candidate for index creation

    SELECT
      TIME_ROUND(timestamp, 60),
      AVG(value)
    FROM
      measurements
    WHERE
      timestamp BETWEEN '2015-01-01Z00:00:00'
                    AND '2015-01-01Z01:00:00'
      AND name = 'cpu.percent'
      AND dimensions @> '{"host": "dev-01"}'::JSONB
    GROUP BY 1
Indexing - Timestamp
● Specify table to index
● Specify index type
  – Optional: BTREE is default
● Specify column to index

    CREATE INDEX ON measurements
      USING BTREE (timestamp);
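One way to see the effect of a timestamp index is to compare query plans before and after creating it. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL (an assumption made so the example is self-contained); the index name and sample data are hypothetical, and in PostgreSQL the equivalent check would use EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (timestamp INTEGER, value REAL, name TEXT)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(t, float(t % 7), "cpu.percent") for t in range(10_000)],
)

QUERY = "SELECT AVG(value) FROM measurements WHERE timestamp BETWEEN 100 AND 200"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX measurements_timestamp ON measurements (timestamp)")
plan_after = plan(QUERY)   # the range predicate can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies between SQLite versions, but the first plan reports a scan of the table and the second names the new index.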
[Figure: Indexing – Single Series Query (vs Data Volume). Query Duration (seconds) against Data Volume (millions/rows), 0 to 10; series: No Index, With Index]
[Figure: Indexing – Single Series Query (vs Data Volume). Query Duration (seconds) against Data Volume (millions/rows), 0 to 10; series labelled 7000, 8000, 9000]
[Figure: Indexing – Single Series Query (vs Time Range, 10M Rows). Query Duration (seconds) against Query Time Range (seconds), 0 to 10000; series: 1 Metric, 10 Metrics]
[Figure: Indexing – Single Series Query (vs Time Range, 10M Rows). Query Duration (seconds) against Query Time Range (seconds), 0 to 10000; series: 1 Metric, 10 Metrics, 100 Metrics]
Indexing - Analysis
✔ Data Volume
  ✔ To 10M
✔ Time Range
  ✔ To 9000s (10 Metrics)
✔ Query time stable as Data Volume increases
✗ Time Range
  ✗ Over 4000s (100 Metrics)
✗ Now apparent query duration increases as Time Range grows
✗ Increasing number of metrics drastically affects query duration
  ✗ Data for each uninteresting series must be filtered out
Indexing – Single Series Query
● More indexing?
  – name
  – dimensions

    SELECT
      TIME_ROUND(timestamp, 60),
      AVG(value)
    FROM
      measurements
    WHERE
      timestamp BETWEEN '2015-01-01Z00:00:00'
                    AND '2015-01-01Z01:00:00'
      AND name = 'cpu.percent'
      AND dimensions @> '{"host": "dev-01"}'::JSONB
    GROUP BY 1
Indexing – Additional
● Create new indexes on measurements table
● Specify name
  – Equality: Use BTREE
● Specify dimensions
  – Containment: Use GIN
  – Find contents of JSON

    CREATE INDEX ON measurements
      USING BTREE (name);

    CREATE INDEX ON measurements
      USING GIN (dimensions);
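For readers unfamiliar with the containment operator, the predicate dimensions @> '{"host": "dev-01"}' asks whether the stored JSON object includes every key/value pair of the query object. A minimal sketch of that test for flat objects follows; PostgreSQL's JSONB containment also handles nested structures, which this hypothetical helper does not, and the sample rows are invented for illustration.

```python
def contains(dimensions: dict, wanted: dict) -> bool:
    """True if every key/value pair in `wanted` also appears in `dimensions`."""
    return all(
        key in dimensions and dimensions[key] == value
        for key, value in wanted.items()
    )


rows = [
    {"host": "dev-01", "region": "eu"},
    {"host": "dev-02", "region": "eu"},
    {"host": "dev-01"},
]
matching = [row for row in rows if contains(row, {"host": "dev-01"})]
print(matching)  # [{'host': 'dev-01', 'region': 'eu'}, {'host': 'dev-01'}]
```

Extra keys in the stored object do not prevent a match; only the pairs named in the query have to be present.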
[Figure: Indexing – Series Query (vs Time Range, 10M Rows, 100 Metrics). Query Duration (seconds) against Query Time Range (seconds), 0 to 10000; series: Time & Metric, Time Index]
Indexing
✗ Oh dear
✗ Too much indexing can be harmful
Normalisation

Denormalised table:

    CREATE TABLE measurements (
      timestamp TIMESTAMPTZ,
      value FLOAT8,
      name VARCHAR,
      dimensions JSONB,
      value_meta JSON
    );

Normalised tables:

    CREATE TABLE values (
      timestamp TIMESTAMPTZ,
      value FLOAT8,
      metric_id INT,
      value_meta JSON
    );

    CREATE TABLE metrics (
      id SERIAL,
      name VARCHAR,
      dimensions JSONB,
      UNIQUE (name, dimensions)
    );
Normalisation
● Values stored by integer id
  – References entry in metric table
  – The name/dimensions for each metric are only stored once
  – Eliminates repeated bulky data in measurements table
● Metric table defines id
  – SERIAL produces incrementing integers to allot id values
  – UNIQUE constraint is useful during normalisation
    ● Implicitly creates suitable index

    CREATE TABLE values (
      timestamp TIMESTAMPTZ,
      value FLOAT8,
      metric_id INT,
      value_meta JSON
    );

    CREATE TABLE metrics (
      id SERIAL,
      name VARCHAR,
      dimensions JSONB,
      UNIQUE (name, dimensions)
    );
Normalisation – View
● Mimic measurements
  – Views can be queried in the same way as tables
● Defined with SELECT
  – Query to run which produces contents of view
● Join normalised tables
● Can re-use same queries as before

    CREATE VIEW measurements AS
      SELECT
        timestamp,
        value,
        name,
        dimensions,
        value_meta
      FROM
        values
      INNER JOIN
        metrics ON (metric_id = id);
Normalisation – View Insert
● Can't insert data into views by default
● Can specify an action to perform on INSERT
● Insert into values
● Helper procedure to allocate metric_id
● Normalisation is transparent for user

    CREATE RULE measurements_insert AS
      ON INSERT TO measurements
      DO INSTEAD
        INSERT INTO values (
          timestamp,
          value,
          metric_id,
          value_meta
        ) VALUES (
          NEW.timestamp,
          NEW.value,
          create_metric(NEW.name, NEW.dimensions),
          NEW.value_meta
        );
Normalisation – Metric Lookup
● Stored procedure
  – Take name/dimensions
  – Returns metric_id
● Find existing metric
  – Return existing id
● If new, then INSERT
  – Allocates new id
  – Return the new id

    CREATE FUNCTION create_metric (
      in_name VARCHAR,
      in_dims JSONB
    ) RETURNS INT LANGUAGE plpgsql AS $_$
    DECLARE
      out_id INT;
    BEGIN
      SELECT id INTO out_id
        FROM metrics AS m
        WHERE m.name = in_name
          AND m.dimensions = in_dims;
      IF NOT FOUND THEN
        INSERT INTO metrics ("name", "dimensions")
          VALUES (in_name, in_dims)
          RETURNING id INTO out_id;
      END IF;
      RETURN out_id;
    END;
    $_$;
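The combination of a joining view, an insert action and a find-or-create lookup can be tried end to end in a self-contained form. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, which changes several details: an INSTEAD OF trigger replaces the rule, INSERT OR IGNORE against the UNIQUE constraint replaces the create_metric procedure, dimensions are stored as JSON text, and value_meta is left out. All of these substitutions are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metrics (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    dimensions TEXT NOT NULL,
    UNIQUE (name, dimensions)
);
CREATE TABLE "values" (
    timestamp  INTEGER,
    value      REAL,
    metric_id  INTEGER REFERENCES metrics (id)
);
CREATE VIEW measurements AS
    SELECT v.timestamp AS timestamp, v.value AS value,
           m.name AS name, m.dimensions AS dimensions
    FROM "values" AS v
    INNER JOIN metrics AS m ON v.metric_id = m.id;
CREATE TRIGGER measurements_insert INSTEAD OF INSERT ON measurements
BEGIN
    INSERT OR IGNORE INTO metrics (name, dimensions)
        VALUES (NEW.name, NEW.dimensions);
    INSERT INTO "values" (timestamp, value, metric_id)
        SELECT NEW.timestamp, NEW.value, id FROM metrics
        WHERE name = NEW.name AND dimensions = NEW.dimensions;
END;
""")


def insert(ts, value, name, dims):
    """Insert through the view; the trigger normalises behind the scenes."""
    conn.execute(
        "INSERT INTO measurements (timestamp, value, name, dimensions) "
        "VALUES (?, ?, ?, ?)",
        (ts, value, name, json.dumps(dims, sort_keys=True)),
    )


insert(1, 10.0, "cpu.percent", {"host": "dev-01"})
insert(2, 20.0, "cpu.percent", {"host": "dev-01"})
insert(1, 5.0, "cpu.percent", {"host": "dev-02"})

metric_count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
value_count = conn.execute('SELECT COUNT(*) FROM "values"').fetchone()[0]
view_rows = conn.execute(
    "SELECT timestamp, value FROM measurements "
    "WHERE dimensions = ? ORDER BY timestamp",
    (json.dumps({"host": "dev-01"}, sort_keys=True),),
).fetchall()
print(metric_count, value_count, view_rows)
```

Three inserts through the view produce three rows in the values table but only two metric definitions, and the caller never refers to metric_id.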
Normalisation - Indexing
● Timestamp index
  – Same as before
● New index on metric_id
  – Allow efficient filtering of metrics during JOIN
  – Serves similar purpose to existing metric indexing

    CREATE INDEX ON values
      USING BTREE (timestamp);

    CREATE INDEX ON values
      USING BTREE (metric_id);
[Figure: Normalisation – Series Query (vs Time, 10M Rows, 100 Metrics). Query Duration (seconds) against Query Time Range (seconds), 0 to 9000; series: Normalised, Denormalised (Time Index), Denormalised (Extra Index)]
Normalisation
✔ Normalisation eliminated overhead of additional metric indexing
✗ The metric indexing still doesn't have a positive effect
Normalisation – Bitmap Index Scan
[Figure: values table with rows A to H (time 10:01 to 10:04, two metrics per timestamp). For time between :02 and :03 and metric 2, the time index selects rows C, D, E, F and the metric index selects rows B, D, F, H; combining the two leaves rows D and F]

Normalisation – Multi-Column Indexing
[Figure: the same values table with a single index over time and metric. For time between :02 and :03 and metric 2, the index leads directly to rows D and F]
Normalisation – Multi-Column Indexing

Single-column indexes:

    CREATE INDEX ON values
      USING BTREE (timestamp);

    CREATE INDEX ON values
      USING BTREE (metric_id);

Multi-column indexes:

    CREATE INDEX ON values
      USING BTREE (timestamp, metric_id);

    CREATE INDEX ON values
      USING BTREE (metric_id, timestamp);
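Column order in a multi-column index determines which entries sit next to each other. The sketch below uses sorted lists of tuples as a stand-in for the two index orderings and runs the same lookup (metric 2, time between 2 and 3) against each. It illustrates the general ordering property only; it does not claim which ordering the measurements in this deck used, and the integer times are illustrative.

```python
from bisect import bisect_left, bisect_right

# Eight rows as (time, metric_id): four timestamps, two metrics each.
rows = [(t, m) for t in (1, 2, 3, 4) for m in (1, 2)]

# Index ordered by (metric_id, time): one metric's entries are contiguous.
by_metric_time = sorted((m, t) for t, m in rows)
lo = bisect_left(by_metric_time, (2, 2))
hi = bisect_right(by_metric_time, (2, 3))
metric_first = by_metric_time[lo:hi]

# Index ordered by (time, metric_id): the time range also spans other metrics,
# so entries for uninteresting metrics are visited and then discarded.
by_time_metric = sorted(rows)
times = [t for t, _ in by_time_metric]
scanned = by_time_metric[bisect_left(times, 2):bisect_right(times, 3)]
time_first = [(t, m) for t, m in scanned if m == 2]

print(metric_first)                   # [(2, 2), (2, 3)]
print(len(scanned), len(time_first))  # 4 2
```

With the metric first, the lookup reads exactly the two matching entries; with the time first, it reads four entries to keep two.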
[Figure: Normalisation – Series Query (vs Range, 10M Rows, 100 Metrics). Query Duration (seconds) against Query Time Range (seconds), 0 to 9000; series: Normalised (Single Index), Normalised, Denormalised (Time Index), Denormalised (Extra Index)]
Normalisation
● Increase volume 10M to 100M
  – @ 1Hz / 100 Metrics: 1M seconds
    ● Before: ~1.15 days
    ● Now: ~11.5 days
● Increase max time ranges from 9000s to 90,000s
  – Before: 2.5 hours
  – Now: 1.04 days
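The conversions between row counts, seconds and days follow from dividing the row count by the number of metrics and the sample rate. A small sketch of that arithmetic, with hypothetical helper names:

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def covered_seconds(rows: int, metrics: int, rate_hz: float = 1.0) -> float:
    """Wall-clock seconds spanned by `rows` samples spread over `metrics` series."""
    return rows / (metrics * rate_hz)


before = covered_seconds(10_000_000, 100)   # 100,000 s
now = covered_seconds(100_000_000, 100)     # 1,000,000 s

print(before / SECONDS_PER_DAY)    # about 1.16 days
print(now / SECONDS_PER_DAY)       # about 11.6 days
print(9_000 / SECONDS_PER_HOUR)    # 2.5 hours
print(90_000 / SECONDS_PER_DAY)    # about 1.04 days
```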
[Figure: Normalisation – Series Query (vs Volume, 10 Metrics). Query Duration (seconds) against Data Volume (M-rows), 0 to 100; series labelled 10000, 20000, 30000]
[Figure: Normalisation – Series Query (vs Range, 100M Rows). Query Duration (seconds) against Query Time Range (seconds), 0 to 90000; series: 1 Metric, 10 Metrics]
[Figure: Normalisation – Series Query (vs Range, 100M Rows). Query Duration (seconds) against Query Time Range (seconds), 0 to 90000; series: 1 Metric, 10 Metrics, 100 Metrics]
[Figure: Normalisation – Series Query (vs Range, 100M Rows) (+Config). Query Duration (seconds) against Query Time Range (seconds), 0 to 90000; series: 1 Metric, 10 Metrics, 100 Metrics]
Normalisation - Analysis
✔ Data Volume
  ✔ To 100M
✔ Time Range
  ✔ To 90,000s (10 Metrics)
✗ Time Range
  ✗ Over 30,000s (100 Metrics)
  ✗ Over 90,000s
✗ Need a better strategy for servicing larger time ranges
Summarising
Summarising - Problem
● For 100 metrics, some queries miss target 100ms
  – Over ~40Ks (~11 hours)
● Query might be returning up to 40K points
● Is this actually necessary?
  – Especially if data is simply used for visualisation
  – An average 1080p monitor is only ~2000 pixels wide
● Let's say 4000 points are enough, or even 400
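If a fixed number of points is enough, the summary interval follows from the time range. The sketch below assumes 1 Hz source data (one point per second per series) and a hypothetical helper name; it simply picks the smallest whole-second interval that keeps the result within the point budget.

```python
import math


def summary_interval(range_s: int, max_points: int) -> int:
    """Smallest whole-second interval that yields at most `max_points` points."""
    return max(1, math.ceil(range_s / max_points))


print(summary_interval(40_000, 4_000))  # 10
print(summary_interval(40_000, 400))    # 100
```

For a 40,000 second range, budgets of 4000 and 400 points correspond to 10-second and 100-second intervals respectively.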
Summarising - Example

values
    time   value  metric
    10:00  10     1
    10:00  2      2
    10:01  20     1
    10:01  6      2
    10:02  5      1
    10:02  4      2
    10:03  15     1
    10:03  1      2

values_2
    time   sum  metric
    10:00  30   1
    10:00  8    2
    10:02  20   1
    10:02  5    2

✔ Summary table just a fraction of the size
Summarising
● Create values table
  – Use for 10:1 summary
● One entry/time period
  – Per metric
  – UNIQUE provides indexing
● Multiple aggregates
  – SUM
  – COUNT
  – MIN
  – MAX

    CREATE TABLE values_10 (
      timestamp TIMESTAMPTZ,
      metric_id INT,
      sum FLOAT8,
      count FLOAT8,
      min FLOAT8,
      max FLOAT8,
      UNIQUE (metric_id, timestamp)
    );
Summarising
● Create a view as before
  – Only storing metric_id
● Simplifies queries
● Joins metric definitions

    CREATE VIEW summary_10 AS
      SELECT * FROM values_10
      INNER JOIN metrics ON (metric_id = id);
Summarising – Trigger Definition
● Boilerplate
● Define trigger function
  – Stored procedure
  – Contents omitted
● Trigger to execute…
  – On INSERT
  – To values table
  – Data passed to procedure

    CREATE FUNCTION summarise_10 ()
      RETURNS TRIGGER
      LANGUAGE plpgsql AS $_$
    BEGIN
      :
    END;
    $_$;

    CREATE TRIGGER summarise_10_t
      AFTER INSERT ON values
      FOR EACH ROW
      EXECUTE PROCEDURE summarise_10 ();
Summarising – Trigger Action
● Insert into summary
  – NEW is inserted data
● Round time to period
  – 10 seconds
● Initial aggregate values
● If entry exists already
● Update instead
  – EXCLUDED is the row proposed for insertion
  – Combine new value with existing aggregate value
  – Existing columns are qualified with the table name so they are not ambiguous with EXCLUDED

    INSERT INTO values_10 VALUES (
      TIME_ROUND(NEW.timestamp, 10),
      NEW.metric_id,
      NEW.value,
      1,
      NEW.value,
      NEW.value
    )
    ON CONFLICT (metric_id, timestamp) DO UPDATE SET
      sum = values_10.sum + EXCLUDED.sum,
      count = values_10.count + EXCLUDED.count,
      min = LEAST(values_10.min, EXCLUDED.min),
      max = GREATEST(values_10.max, EXCLUDED.max)
    ;
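The insert-or-update logic of the trigger can be followed step by step with an in-memory dictionary keyed by (metric_id, rounded time). This is a sketch of the aggregation logic only, not of the trigger itself: the function name is hypothetical, times are whole minutes, and the period is 2 so that the input matches the 2:1 example table (values, values_2) earlier in the deck.

```python
def add_value(summary, period, timestamp, metric_id, value):
    """Fold one raw value into the summary entry for its period."""
    bucket = timestamp - timestamp % period   # round time down to the period
    key = (metric_id, bucket)
    entry = summary.get(key)
    if entry is None:
        # First value in this period: initial aggregate values.
        summary[key] = {"sum": value, "count": 1, "min": value, "max": value}
    else:
        # Entry exists already: combine the new value with the aggregates.
        entry["sum"] += value
        entry["count"] += 1
        entry["min"] = min(entry["min"], value)
        entry["max"] = max(entry["max"], value)


# (minute, value, metric) rows from the example table.
raw = [(0, 10, 1), (0, 2, 2), (1, 20, 1), (1, 6, 2),
       (2, 5, 1), (2, 4, 2), (3, 15, 1), (3, 1, 2)]

summary = {}
for minute, value, metric in raw:
    add_value(summary, 2, minute, metric, value)

for key in sorted(summary):
    print(key, summary[key])
```

Eight raw rows collapse into four summary entries, with sums of 30 and 20 for metric 1 and 8 and 5 for metric 2.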
Summarising – Single Series Query
● Mostly unchanged
● Query summary table, not raw measurements
● Have to aggregate the partial aggregations
  – MIN: MIN(min)
  – MAX: MAX(max)
  – SUM: SUM(sum)
  – COUNT: SUM(count)
  – AVG: SUM(sum)/SUM(count)

    SELECT
      TIME_ROUND(timestamp, 60),
      (SUM(sum) / SUM(count)) AS avg
    FROM
      summary_10
    WHERE
      timestamp BETWEEN '2015-01-01Z00:00:00'
                    AND '2015-01-01Z01:00:00'
      AND name = 'cpu.percent'
      AND dimensions @> '{"host": "dev-01"}'::JSONB
    GROUP BY 1
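The reason AVG is rebuilt as SUM(sum)/SUM(count), rather than averaging per-period averages, shows up as soon as periods hold different numbers of points. A small sketch with made-up partial aggregates and a hypothetical helper name:

```python
def combine(partials):
    """Merge partial aggregates the same way the summary query does."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return {
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
        "sum": total,
        "count": count,
        "avg": total / count,
    }


partials = [
    {"sum": 30.0, "count": 2, "min": 10.0, "max": 20.0},  # values 10 and 20
    {"sum": 5.0, "count": 1, "min": 5.0, "max": 5.0},     # value 5
]
combined = combine(partials)
naive_avg = sum(p["sum"] / p["count"] for p in partials) / len(partials)

print(combined["avg"])  # 35 / 3, the true mean of 10, 20 and 5
print(naive_avg)        # 10.0, the mean of the two per-period averages
```

Keeping sum and count separately in the summary table is what makes the correct average recoverable at any coarser interval.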
[Figure: Summarising – Series Query (vs Range, 100M Rows). Query Duration (seconds) against Query Time Range (seconds), 0 to 90000; series: 1 Metric, 10 Metrics, 100 Metrics]
[Figure: Summarising – Series Query (vs Range, 100M Rows). Query Duration (seconds) against Query Time Range (seconds), 0 to 90000; two sets of series, each: 1 Metric, 10 Metrics, 100 Metrics]
Summarising
● Increase volume to 1BN
  – @ 1Hz / 100 Metrics: 10M seconds
  – Before: 11.5 days
  – Now: ~115 days (16½ weeks)
● Increase max time ranges from 90Ks to 900Ks
  – Before: ~1.04 days
  – Now: ~10.4 days
[Figure: Summarising – Series Query (vs Volume; 100M-1BN). Query Duration (seconds) against Data Volume (M-rows), 0 to 1000; series labelled 100000, 200000, 300000]
[Figure: Summarising – Series Query (vs Range, 1BN Rows). Query Duration (seconds) against Query Time Range (seconds), 0 to 900000]
Summarising
✔ Data Volume
  ✔ To 1BN – ~16 weeks
✔ Time Range
  ✔ To ~10 days
✔ To scale further? Try 100:1 summary
Closing Notes