How to visually spot and analyze slow MongoDB operations Kay Agahd idealo internet GmbH
About me ● Current location: Berlin/Germany ● Education: Engineer's degree, Software Engineering ● Experience: ○ Software developer from 1998 - 2009 in Paris/France ○ Database engineer since 2012 at idealo in Berlin/Germany ● Certifications: ○ MongoDB certified Java developer (final grade 100%) ○ MongoDB certified DBA (final grade 96%) linkedin.com/in/kayag/ 2
About idealo ● founded in 2000 ● Europe’s leading price comparison website and app ● Germany, Austria, United Kingdom, France, Italy and Spain ● > 1 billion offers online (November 2018) ● fast growing 3
idealo and MongoDB ● different types of databases: Oracle, MySQL, PostgreSQL, MongoDB ● MongoDB in production since v1.6 (ca. 2011) ● sharding in production since MongoDB v1.8 ● MongoDB stores mainly offers for back-end usage ● > 2 billions docs in offerStore, up to 1 bn both read/writes per day ● > 10 billions docs in offerHistory 4
Motivation Why we need a better MongoDB profiler
Review profiling ● MongoDB supports profiling of “slow” operations ● “slow” is a threshold to be set when turning profiling on (100 ms) ● profiler writes collected data to a capped collection of the profiled database ● profiling per-database or per-instance on a running mongod 6
Inconveniences ● each mongod and database needs to be handled separately ● sharding: shards * repl. factor * databases = #profilers ● gives only a view on a limited time span due to capped collection ● profiling/analyzing may add stress to the profiled server ● different formats of “query” field makes querying more difficult ● bug: ops through mongos omit the user (JIRA: SERVER-7538) 7
idealo requirements ● easily switch on/off profiling, even for many mongod’s involved ● quick overview of types of slow-ops and their quantity within a time period (“types” means op type, user, server, query shape, etc.) ● historical view to see how slow-ops evolve to extrapolate them ● discovering spikes in time or in slow-op types ● filtering by slow-op types and/or time range to drill down ● usable also by non-database-admins, e.g. software developers 8
What we’ve built MongoDB slow operations profiler
How it works mongod 1 slow ops app DB 1 profiler 1..n collector DB DB 2 DB n mongod n profiler n..m DB n1 DB n2 DB nm 10
Example of slow op document { "shardVersion": [ "nreturned": 185, "op": "query", Timestamp(14944, 25276), "responseLength": 119954, "ns": "offerStore.offer", ObjectId("591c6...8fcde") "protocol": "op_command", "query": { ] "millis": 4189, "find": "offer", }, "execStats": { "filter": { "keysExamined": 2210852, "stage": "PROJECTION", "shopId": 292731, "docsExamined": 232, "nReturned": 185, "opIds": { "cursorExhausted": true, "executionTimeMillisEstimate": 3941, "$in": [ "keyUpdates": 0, "works": 2210853, 29337,5478 "writeConflicts": 0, "advanced": 185, ] ... 124 lines omitted ... "numYield": 17272, }, "locks": { "offerStatus": { }, "Global": { "$gt": 0 "ts": ISODate("2018-10-26T07:17:12.747Z"), "acquireCount": { } "client": "10.135.128.219", "r": }, "allUsers": [ NumberLong("34546")}}, "projection": { { "Database": { "traceId": 1, "user": "__system", "acquireCount": { "bokey": 1, "db": "local" "r": "version": 1, } NumberLong("17273")}}, "offerTitle": 1 ], "Collection": { }, "user": "__system@local" "acquireCount": { "batchSize":1000, } "r": 11 ... NumberLong("17273")}} },
Condense slow op documents { "_id":ObjectId("5bd3090b68b5c4203f53ce7e"), "ts":ISODate("2018-10-26T12:31:07.752Z"), "adr":"host523.idealo.de", "lbl":"offerStoreDE", "rs":"offerStoreDE09", "db":"offerStore", "col":"offers" "op":"getmore", "fields":["shopId", "opIds.$in", "offerStatus.$gt"], "sort":["_id"], "nret":500, "reslen":94656, "millis":5322, "user":"__system@local" } 12
Some numbers of the collector DB ● > 250 millions slow ops stored within the last > 5 years ● average doc size = 238 Bytes ● uncompressed data size ca. 55 GB ● index size < 9 GB ● total storage size (snappy compression) < 12 GB 13
Configuration ● Collector { "collector":{ "hosts":["myCollectorHost_member1:27017", "myCollectorHost_member2:27017", "myCollectorHost_member3:27017"], "db":"profiling", "collection":"slowops", "adminUser":"", "adminPw":"" }, ... 14
Configuration ● databases to be profiled "profiled":[ { "label":"dbs foo", "hosts":["someHost1:27017", "someHost2:27017", "someHost3:27017"], "ns":["someDB.someCollection", "anotherDB.*"], "enabled": false }, { "label":"dbs bar", "hosts":["someMongoRouter1:27017","someMongoRouter2:27017"], "ns":["*.*"], "adminUser":"kay", "adminPw":"never.tell.it!:-)", "enabled":false, "slowMS":500, "responseTimeoutInMs":2000 } ],... 15
How it looks 16
How it looks - part 2 17
Slow ops diagram 2018/10/30 10:04 = Count:95 db=offerStore coll=offers op=query fields=[shopId,mCC.$gt] Duration: avg:322 max:990 sum:31.682 ms 18
Slow ops data table 19
Slow ops search form 20
Further benefits ● global collector allows to see evolution of slow ops ● aggregate slow ops of last minute grouped by label {$match: { ts: {$gt:from, $lt:to}} }, {$group: {_id: {label:"$lbl"}, count:{$sum:1}, sumMs:{$sum:"$millis"}, maxMs:{$max:"$millis"}, sumNret:{$sum:"$nret"}, sumResplen:{$sum:"$reslen"} } } ● send metrics i.e. count, durations and nReturned to graphite ● build grafana dashboard 21
Historical view per DBS 22
Further documentation ● Open source project, you are welcome to contribute: https://github.com/idealo/mongodb-slow-operations-profiler ● Blog post 1 of 2: https://medium.com/idealo-tech-blog/how-to-visually-spot- and-analyze-slow-mongodb-operations-d91ac819e0de ● Blog post 2 of 2: https://medium.com/idealo-tech-blog/mongodb-slow- operations-analyzer-2-0-24da414fad13 We are hiring: jobs.idealo.de 23
Rate My Session 24
Recommend
More recommend