A Proactive approach to Monitoring Slow Queries Shashank Sahni ThousandEyes
Shashank Sahni Principal SRE @ThousandEyes Reach out to shashank@thousandeyes.com Twitter @shredder12 @shredder12 @ThousandEyes 2
MySQL @ThousandEyes Percona Xtradb Clusters Low Latency Transactional store 7k tx/sec @shredder12 @ThousandEyes 3
What is a Slow Query?
If exec_time > threshold Victims of Poorly Dealing with Database Designed a lot of data Congestion @shredder12 @ThousandEyes 5
Performance Impact
Database Performance Impact BAD WORSE WORST • Slow by Design • Steal resources from • Thrashing System Resources other queries • High latency for • Leaving DB unresponsive. applications issuing them Outage!! • DB congestion due to extended locks. @shredder12 @ThousandEyes 7
Slow Queries Always Sneak in Performance Human Mistake Schema Changes Degrades Over Time Bad Design Deleted Index Increasing table size @shredder12 @ThousandEyes 8
Proactively dealing with Slow Queries
Catch Early and Minimize Impact Continuous Automation Notification Process Faster analysis New changes are Notify application pushed everyday owner for faster resolution @shredder12 @ThousandEyes 10
Slow Query Pipeline @ThousandEyes
Detects new queries in under 20 min @shredder12 @ThousandEyes 12 12
Notifications • Notifies the application owner – MySQL user associated to app • JIRA Issues – Creates Issue for new queries – Raises priority if performance worsens – Reopens Issue if query shows up again @shredder12 @ThousandEyes 13
Select Top Offenders count > C && (time_sum > S || time_avg > A) Occurrence Total Exec Time Avg Exec Time • Avoids one off queries • Queries with high total exec_time • Queries with high avg exec_time • Captures Recurring queries • Filters queries which have small • Filters queries which have high avg but high count avg but small coun @shredder12 @ThousandEyes 14
Select Poorly Designed Queries full_scan_sum > 0 && Innodb_queue_wait_max < T Full Table Scan Queued or Executing • Not using Indexes • Filters query whose exec_time wasn’t spend in Storage engine’s queue. Their Execution was indeed slow. • Poor Joins • Set T to just under your slow query threshold • Rarely genuine long_query_time. @shredder12 @ThousandEyes 15
Architecture Anemometer pt-query-digest pushing analyzed logs from databases Slow Query Database SQN Jira Slow Query Notifier @shredder12 @ThousandEyes 16
pt-query-digest MySQL Query Analyzer from Percona • Great tool for ad hoc or batch analysis • Invaluable Insights Our Setup • Runs periodically on all DB nodes • Analyzes slow query logs and stores the data in a central slow query database. @shredder12 @ThousandEyes 17
Anemometer Slow Query Monitor from Box • Web utility to visualize and search through analyzed slow query data • Good Search and Deepdive features • Not under active development. • Github - box/anemometer @shredder12 @ThousandEyes 18
Slow Query Notifier SQN From ThousandEyes • In-house tool to monitor slow query events. • Configurable query properties to catch top offenders – JIRA workflow for notifications • create/reopen/prioritize • To be open-sourced @shredder12 @ThousandEyes 19
Lessons Learned
Target Top Offenders Happy Less Noise More Impact Developers! @shredder12 @ThousandEyes 21
Share Every Win Result of resolving Top 4 queries @shredder12 @ThousandEyes 22
Future Works
Future Works Open source Slow-Query-Notifier Support for notification backends – email etc. Support for Mongodb Evaluate PMM for Slow Query visualization 24 @shredder12 @ThousandEyes
25 @shredder12 @ThousandEyes
Thank You
Recommend
More recommend