how to visually spot and analyze slow mongodb operations
play

How to visually spot and analyze slow MongoDB operations Kay Agahd - PowerPoint PPT Presentation

How to visually spot and analyze slow MongoDB operations Kay Agahd idealo internet GmbH About me Current location: Berlin/Germany Education: Engineer's degree, Software Engineering Experience: Software developer from 1998 - 2009


  1. How to visually spot and analyze slow MongoDB operations Kay Agahd idealo internet GmbH

  2. About me ● Current location: Berlin/Germany ● Education: Engineer's degree, Software Engineering ● Experience: ○ Software developer from 1998 - 2009 in Paris/France ○ Database engineer since 2012 at idealo in Berlin/Germany ● Certifications: ○ MongoDB certified Java developer (final grade 100%) ○ MongoDB certified DBA (final grade 96%) linkedin.com/in/kayag/ 2

  3. About idealo ● founded in 2000 ● Europe’s leading price comparison website and app ● Germany, Austria, United Kingdom, France, Italy and Spain ● > 1 billion offers online (November 2018) ● fast growing 3

  4. idealo and MongoDB ● different types of databases: Oracle, MySQL, PostgreSQL, MongoDB ● MongoDB in production since v1.6 (ca. 2011) ● sharding in production since MongoDB v1.8 ● MongoDB stores mainly offers for back-end usage ● > 2 billions docs in offerStore, up to 1 bn both read/writes per day ● > 10 billions docs in offerHistory 4

  5. Motivation Why we need a better MongoDB profiler

  6. Review profiling ● MongoDB supports profiling of “slow” operations ● “slow” is a threshold to be set when turning profiling on (100 ms) ● profiler writes collected data to a capped collection of the profiled database ● profiling per-database or per-instance on a running mongod 6

  7. Inconveniences ● each mongod and database needs to be handled separately ● sharding: shards * repl. factor * databases = #profilers ● gives only a view on a limited time span due to capped collection ● profiling/analyzing may add stress to the profiled server ● different formats of “query” field makes querying more difficult ● bug: ops through mongos omit the user (JIRA: SERVER-7538) 7

  8. idealo requirements ● easily switch on/off profiling, even for many mongod’s involved ● quick overview of types of slow-ops and their quantity within a time period (“types” means op type, user, server, query shape, etc.) ● historical view to see how slow-ops evolve to extrapolate them ● discovering spikes in time or in slow-op types ● filtering by slow-op types and/or time range to drill down ● usable also by non-database-admins, e.g. software developers 8

  9. What we’ve built MongoDB slow operations profiler

  10. How it works mongod 1 slow ops app DB 1 profiler 1..n collector DB DB 2 DB n mongod n profiler n..m DB n1 DB n2 DB nm 10

  11. Example of slow op document { "shardVersion": [ "nreturned": 185, "op": "query", Timestamp(14944, 25276), "responseLength": 119954, "ns": "offerStore.offer", ObjectId("591c6...8fcde") "protocol": "op_command", "query": { ] "millis": 4189, "find": "offer", }, "execStats": { "filter": { "keysExamined": 2210852, "stage": "PROJECTION", "shopId": 292731, "docsExamined": 232, "nReturned": 185, "opIds": { "cursorExhausted": true, "executionTimeMillisEstimate": 3941, "$in": [ "keyUpdates": 0, "works": 2210853, 29337,5478 "writeConflicts": 0, "advanced": 185, ] ... 124 lines omitted ... "numYield": 17272, }, "locks": { "offerStatus": { }, "Global": { "$gt": 0 "ts": ISODate("2018-10-26T07:17:12.747Z"), "acquireCount": { } "client": "10.135.128.219", "r": }, "allUsers": [ NumberLong("34546")}}, "projection": { { "Database": { "traceId": 1, "user": "__system", "acquireCount": { "bokey": 1, "db": "local" "r": "version": 1, } NumberLong("17273")}}, "offerTitle": 1 ], "Collection": { }, "user": "__system@local" "acquireCount": { "batchSize":1000, } "r": 11 ... NumberLong("17273")}} },

  12. Condense slow op documents { "_id":ObjectId("5bd3090b68b5c4203f53ce7e"), "ts":ISODate("2018-10-26T12:31:07.752Z"), "adr":"host523.idealo.de", "lbl":"offerStoreDE", "rs":"offerStoreDE09", "db":"offerStore", "col":"offers" "op":"getmore", "fields":["shopId", "opIds.$in", "offerStatus.$gt"], "sort":["_id"], "nret":500, "reslen":94656, "millis":5322, "user":"__system@local" } 12

  13. Some numbers of the collector DB ● > 250 millions slow ops stored within the last > 5 years ● average doc size = 238 Bytes ● uncompressed data size ca. 55 GB ● index size < 9 GB ● total storage size (snappy compression) < 12 GB 13

  14. Configuration ● Collector { "collector":{ "hosts":["myCollectorHost_member1:27017", "myCollectorHost_member2:27017", "myCollectorHost_member3:27017"], "db":"profiling", "collection":"slowops", "adminUser":"", "adminPw":"" }, ... 14

  15. Configuration ● databases to be profiled "profiled":[ { "label":"dbs foo", "hosts":["someHost1:27017", "someHost2:27017", "someHost3:27017"], "ns":["someDB.someCollection", "anotherDB.*"], "enabled": false }, { "label":"dbs bar", "hosts":["someMongoRouter1:27017","someMongoRouter2:27017"], "ns":["*.*"], "adminUser":"kay", "adminPw":"never.tell.it!:-)", "enabled":false, "slowMS":500, "responseTimeoutInMs":2000 } ],... 15

  16. How it looks 16

  17. How it looks - part 2 17

  18. Slow ops diagram 2018/10/30 10:04 = Count:95 db=offerStore coll=offers op=query fields=[shopId,mCC.$gt] Duration: avg:322 max:990 sum:31.682 ms 18

  19. Slow ops data table 19

  20. Slow ops search form 20

  21. Further benefits ● global collector allows to see evolution of slow ops ● aggregate slow ops of last minute grouped by label {$match: { ts: {$gt:from, $lt:to}} }, {$group: {_id: {label:"$lbl"}, count:{$sum:1}, sumMs:{$sum:"$millis"}, maxMs:{$max:"$millis"}, sumNret:{$sum:"$nret"}, sumResplen:{$sum:"$reslen"} } } ● send metrics i.e. count, durations and nReturned to graphite ● build grafana dashboard 21

  22. Historical view per DBS 22

  23. Further documentation ● Open source project, you are welcome to contribute: https://github.com/idealo/mongodb-slow-operations-profiler ● Blog post 1 of 2: https://medium.com/idealo-tech-blog/how-to-visually-spot- and-analyze-slow-mongodb-operations-d91ac819e0de ● Blog post 2 of 2: https://medium.com/idealo-tech-blog/mongodb-slow- operations-analyzer-2-0-24da414fad13 We are hiring: jobs.idealo.de 23

  24. Rate My Session 24

Recommend


More recommend