
When it all Goes Wrong, by Will Leinweber (@leinweber)



  1. When it all Goes Wrong

  2. Will Leinweber, @leinweber, Citus Data (Microsoft), bitfission.com
 (warning: autoplays midi)

  3. coming from: citus cloud, heroku postgres

  4. special thanks
 citus cloud: dan farina (@danfarina)
 heroku postgres: maciek sakrejda (@uhoh_itsmaciek)

  5. same sorts of problems: from pages & alerts, from support tickets

  6. this talk: more for the app dev who uses postgres
 rather than the dba

  7. the problem with Postgres: it’s pretty good, so you don’t get experience with how it breaks

  8. what to do for a problem

  9. what to do for a problem

  10. complicated system: network, hardware, o/s, postgres

  11. using the database (too much): 95% application, 4% autovacuum, 1% everything else

  12. hard to convince: all the graphs say the DB is slow and nothing has changed …must be the database!

  13. https://upload.wikimedia.org/wikipedia/commons/9/98/Survivorship-bias.png

  14. “but I didn’t change anything”: no deploys! no database migrations! no scaling!

  15. “but I didn’t change anything” https://upload.wikimedia.org/wikipedia/commons/0/09/Redherring.gif

  16. “but I didn’t change anything”: more traffic? change in access patterns? one big user logged in?

  17. run out of a resource

  18. snowball

  19. example
 manageable: user 1s query => 2x
 expensive: frequent, small queries 3ms => 12ms
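
To see why the frequent, small queries are the ones that snowball, a rough worked example using Little's law (average queries in flight = arrival rate × latency) may help. Only the 3 ms and 12 ms latencies come from the slide; the arrival rate of 2,000 queries per second is an assumed figure for illustration.

```python
def busy_backends(queries_per_second: float, latency_ms: float) -> float:
    """Little's law: average number of queries in flight at once."""
    return queries_per_second * latency_ms / 1000.0


# Assumed arrival rate for illustration only; not a figure from the talk.
RATE = 2000

before = busy_backends(RATE, 3)   # small queries at 3 ms
after = busy_backends(RATE, 12)   # the same queries at 12 ms

print(f"in flight at 3 ms:  {before:.0f}")
print(f"in flight at 12 ms: {after:.0f}")
print(f"growth factor: {after / before:.0f}x")
```

With the same traffic, four times the latency means four times as many backends busy at once, which is how a slowdown measured in milliseconds can exhaust a connection pool.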

  20. assumptions: app, maintenance, hardware

  21. assumptions: postgres should not crash …with overcommit off and no containers; large extensions increase chance

  22. if not postgres, then what

  23. system resources: cpu, memory, disk, parallelism / backends, locks

  24. cpu mem disk parallelism

  25. cpu mem disk parallelism: credentials wrong; networking broken; locking issue, check pg_locks; idle in transaction
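
For the locking case, one way to get from a pile of waiting backends to a culprit is to ask Postgres which backends block which, then look for the sessions at the root of the chain. This is a sketch, assuming Postgres 9.6 or later, where pg_blocking_pids() is available. The helper name root_blockers and the sample pids are hypothetical, and the function works on rows already fetched with whatever driver the application uses.

```python
# Query to run against Postgres (9.6+); returns one row per blocked backend.
BLOCKED_SQL = """
SELECT pid, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
"""


def root_blockers(rows):
    """Return pids that block others but are not themselves blocked.

    rows: iterable of (pid, blocked_by) pairs shaped like BLOCKED_SQL output,
    where blocked_by is a list of pids.
    """
    blocked = set()
    blockers = set()
    for pid, blocked_by in rows:
        blocked.add(pid)
        blockers.update(blocked_by)
    return sorted(blockers - blocked)


# Hypothetical sample: 101 waits on 100, and 102 waits on 101.
sample = [(101, [100]), (102, [101])]
print(root_blockers(sample))
```

A root blocker whose state in pg_stat_activity is idle in transaction usually points at application code that opened a transaction and never finished it.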

  26. cpu mem disk parallelism: application submitting backlogged workload; connection leak; pool sizes set too large; pg_locks issue + application backlog

  27. cpu mem disk parallelism: workload skew causing thrashing; unusual sequential scan workload; failover or restart => no cache; pg_prewarm

  28. cpu mem disk parallelism: same as just disk, but also the application is piling on

  29. cpu mem disk parallelism: large GROUP BYs; high disk latency due to unusual page dispersion pattern in the workload

  30. cpu mem disk parallelism: workload has high mem (GROUP BY)
 + app adding backlog; lock contention slowing mem release

  31. cpu mem disk parallelism: large GROUP BYs + paging in unusual data

  32. cpu mem disk parallelism: look for what is causing disk access

  33. cpu mem disk parallelism: small, in-memory workload; lots of seq scans on small table; index scan w/ filter dropping lots

  34. cpu mem disk parallelism: app backlog
 + too much processing on small data; simply a lot of work

  35. cpu mem disk parallelism: large seq scans

  36. cpu mem disk parallelism: loading cold data + application backlog

  37. cpu mem disk parallelism: small # of backends doing a lot more work

  38. cpu mem disk parallelism: entity, workload, entity*workload; soft deletes and non-conditional indexes

  39. cpu mem disk parallelism: reporting query

  40. cpu mem disk parallelism: app backlog, but with CPU/mem problems

  41. tools of the trade

  42. tools of the trade: C symbols

  43. tools of the trade: perf
 perf record -p <pid> && perf report

  44. tools of the trade: perf
 perf top

  45. tools of the trade: perf
 www.brendangregg.com/perf.html

  46. tools of the trade: gdb
 gdb -batch -ex 'bt' -p <pid>

  49. tools of the trade: iostat
 iostat -xm 10

  50. tools of the trade: iotop

  51. tools of the trade: htop

  52. tools of the trade: bwm-ng

  53. tools of the trade: backends
 pgrep -lf postgres + grep + wc
 select * from pg_stat_activity
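
Rather than reading every row of pg_stat_activity, it often helps to count backends by state first. This is a minimal sketch: the SQL uses only the standard state column, while summarize_states and the sample values are hypothetical and stand in for rows fetched by a driver.

```python
from collections import Counter

# Grouped version of the pg_stat_activity query on the slide.
STATE_SQL = "SELECT state, count(*) FROM pg_stat_activity GROUP BY state"


def summarize_states(states):
    """Count backends per state; None covers rows with no reported state."""
    return Counter(s if s is not None else "(no state)" for s in states)


# Hypothetical sample of the state column from pg_stat_activity.
sample = ["active", "idle", "idle in transaction", "idle", None, "active", "idle"]
counts = summarize_states(sample)
for state, n in counts.most_common():
    print(f"{n:3d}  {state}")
```

A large idle count next to a small active count suggests pool sizing or a connection leak, while a growing idle in transaction count points back at the application.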

  54. tools of the trade: pg_s_s
 select * from pg_stat_statements
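
A bare select * on pg_stat_statements returns a lot of rows; ranking by total execution time usually surfaces the statements worth attention first. This is a sketch under assumptions: the column is named total_exec_time on Postgres 13 and later (total_time before that), and top_by_share and the sample rows are hypothetical.

```python
# Postgres 13+ column names; older versions use total_time instead.
TOP_SQL = """
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10
"""


def top_by_share(rows, limit=3):
    """Rank (query, calls, total_ms) rows by share of total execution time."""
    grand_total = sum(total_ms for _, _, total_ms in rows)
    if grand_total == 0:
        return []
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)[:limit]
    return [(query, calls, total_ms / grand_total) for query, calls, total_ms in ranked]


# Hypothetical rows: a frequent small query can outweigh a rare slow one.
sample = [
    ("SELECT * FROM users WHERE id = $1", 1_000_000, 3_000_000.0),
    ("SELECT region, sum(amount) FROM orders GROUP BY region", 10, 600_000.0),
    ("UPDATE sessions SET seen = now() WHERE id = $1", 200_000, 400_000.0),
]
for query, calls, share in top_by_share(sample):
    print(f"{share:5.1%}  {calls:>9}  {query}")
```

In this made-up sample, the 3 ms lookup accounts for most of the database time even though the report query is far slower per call.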

  55. tools of the trade: summary
 columns: cpu, mem, disk, parallelism, network
 perf x; gdb x; iostat x; iotop x; htop x x; bwm x; pgrep x

  56. what to do

  57. what to do: configuration change

  58. what to do: db change

  59. what to do: code change

  60. flirting with disaster
 Velocity NY 2013: Richard Cook, “Resilience In Complex Adaptive Systems”
 Jens Rasmussen: Risk management in a dynamic society: a modeling problem

  61. flirting with disaster: economic boundary

  62. flirting with disaster: economic boundary, workload boundary

  63. flirting with disaster: economic boundary, performance boundary, workload boundary

  64. flirting with disaster: economic boundary, error margin, performance boundary, workload boundary

  65. flirting with disaster: economic boundary, performance boundary, workload boundary

  66. flirting with disaster: economic boundary, error margin, performance boundary, workload boundary

  67. flirting with disaster: economic boundary, error margin, performance boundary, workload boundary

  68. flirting with disaster
 Velocity NY 2013: Richard Cook, “Resilience In Complex Adaptive Systems”
 Jens Rasmussen: Risk management in a dynamic society: a modeling problem

  69. thank you
 Will Leinweber, @leinweber, citusdata.com
