Production Profiling: What, Why and How (Richard Warburton)


  1. Production Profiling: What, Why and How. Richard Warburton (@richardwarburto), Sadiq Jaffer (@sadiqj). https://www.opsian.com

  2. Why Performance Matters; Development isn’t Production; Profiling vs Monitoring; Production Profiling; Conclusion

  3. Customer Experience

  4. Responsive Applications make more Money. Amazon: 100ms of latency costs 1% of sales. Google: an extra 500ms in search page generation time drops traffic by 20%.

  5. Stop Costly Downtime

  6. Reduce Costs

  7. Why Performance Matters; Development isn’t Production; Profiling vs Monitoring; Production Profiling; Conclusion

  8. Development isn’t Production: performance testing in development can be easier; you may not have access to production; tooling is often desktop-based; but development is not representative of production.

  9. Unrepresentative Hardware vs

  10. Unrepresentative Software

  11. Unrepresentative Workloads vs

  12. The JVM may have very different behaviour in production: HotSpot does adaptive optimisation, so production may optimise differently.

  13. Why Performance Matters; Development isn’t Production; Profiling vs Monitoring; Production Profiling; Conclusion

  14. Ambient/Passive/System Metrics: a preconfigured numerical measure about the system, e.g. CPU time usage or page-load times. Cheap and sometimes effective.

  15. Logging: records arbitrary events emitted by the system being monitored, e.g. log4j/slf4j/logback output or logs of GC events. Often manual, aids system understanding, expensive.

  16. Coarse Grained Instrumentation: measures time within some instrumented section of the code, e.g. time spent inside the controller layer of your web-app or performing SQL queries. More detailed and actionable, though expensive.
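
As an illustration of the coarse-grained instrumentation described on this slide, here is a minimal Java sketch. `SectionTimer` and the section names are hypothetical helpers invented for this example, not part of any library mentioned in the talk; the "sql" section is simulated with a sleep.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper: accumulates wall-clock time per named code section.
final class SectionTimer {
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    // Runs the work and records how long it took under the given section name.
    <T> T time(String section, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            totalNanos.computeIfAbsent(section, k -> new LongAdder()).add(elapsed);
            calls.computeIfAbsent(section, k -> new LongAdder()).increment();
        }
    }

    long totalNanos(String section) {
        LongAdder adder = totalNanos.get(section);
        return adder == null ? 0L : adder.sum();
    }

    long calls(String section) {
        LongAdder adder = calls.get(section);
        return adder == null ? 0L : adder.sum();
    }
}

final class SectionTimerDemo {
    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        // "sql" stands in for a database call; here it is just a sleep.
        String row = timer.time("sql", () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "row";
        });
        System.out.println(row + ": " + timer.totalNanos("sql") / 1_000_000
                + " ms over " + timer.calls("sql") + " call(s)");
    }
}
```

The sketch only sees time inside sections someone remembered to wrap, which is the blind spot the talk goes on to describe.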

  17. Production Profiling: What methods use up CPU time? What lines of code allocate the most objects? Where are your CPU cache misses coming from? Automatic, can be cheap but often isn’t.

  18. Where Instrumentation can be blind in the Real World. Problem: every 5 seconds an HTTP endpoint would be really slow. Instrumentation: on the servlet request, didn’t even show the pause! Cause: Tomcat expired its resources cache every 5 seconds; on load, one resource scanned the entire classpath.

  19. Surely a better way? Not just Metrics - Actionable Insights. Diagnostics aren’t Diagnosis. What about Profiling?

  20. Why Performance Matters; Development isn’t Production; Profiling vs Monitoring; Production Profiling; Conclusion

  21. How to use Production Profilers: 1) Extract relevant time period and apps/machines; 2) Choose a type of profile: CPU Time/Wallclock Time/Memory; 3) View results to tell you what the dominant consumer of a resource is; 4) Fix biggest bottleneck; 5) Deploy / Iterate.
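
Steps 1 and 3 of this workflow can be sketched in Java as a query over stored samples. This is only an illustration under stated assumptions: `ProfileSample`, `ProfileQuery`, the host names and the timestamps are all hypothetical, and a real profiler would store full stacks rather than just the top frame.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical shape of one stored profiling sample.
final class ProfileSample {
    final String host;
    final long epochMillis;
    final String topFrame;

    ProfileSample(String host, long epochMillis, String topFrame) {
        this.host = host;
        this.epochMillis = epochMillis;
        this.topFrame = topFrame;
    }
}

final class ProfileQuery {
    // Step 1: keep samples for one host in [fromMillis, toMillis).
    // Step 3: return the method that appears most often as the top frame.
    static Optional<String> dominantFrame(List<ProfileSample> samples, String host,
                                          long fromMillis, long toMillis) {
        Map<String, Long> counts = samples.stream()
                .filter(s -> s.host.equals(host))
                .filter(s -> s.epochMillis >= fromMillis && s.epochMillis < toMillis)
                .collect(Collectors.groupingBy(s -> s.topFrame, Collectors.counting()));
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<ProfileSample> samples = Arrays.asList(
                new ProfileSample("web-1", 100, "Repo.readPerson"),
                new ProfileSample("web-1", 150, "Repo.readPerson"),
                new ProfileSample("web-1", 160, "View.printHtml"),
                new ProfileSample("web-2", 120, "View.printHtml"));
        System.out.println(dominantFrame(samples, "web-1", 0, 500).orElse("no samples"));
    }
}
```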

  22. CPU Time vs Wallclock Time
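
To make the distinction on this slide concrete, the following Java sketch measures both clocks around the same task using the standard `ThreadMXBean` API. The class and method names (`CpuVsWallclock`, `measure`, `sleepQuietly`) are made up for the example. A sleeping thread accumulates wallclock time but almost no CPU time, so the two numbers diverge.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuVsWallclock {
    // Returns {wallNanos, cpuNanos} for running the task on the current thread.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported on this JVM");
        }
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        return new long[] {wall, cpu};
    }

    // Stand-in for blocking work such as waiting on I/O or a lock.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] result = measure(() -> sleepQuietly(100));
        System.out.println("wallclock ms: " + result[0] / 1_000_000);
        System.out.println("cpu ms:       " + result[1] / 1_000_000);
    }
}
```

A CPU-time profile would barely register this task, while a wallclock profile would show it clearly, which is why the choice of profile type depends on whether the problem is computation or waiting.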

  23. Profiling Hotspots

  24. Profiling Treeviews

  25. Profiling Flamegraphs
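
Flamegraph tools commonly take "folded" input: one line per distinct stack, frames joined by `;` from root to leaf, followed by a sample count. The Java sketch below shows that aggregation step only; `FoldedStacks` is a hypothetical name and the stacks are invented for illustration, borrowing method names that appear elsewhere in these slides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FoldedStacks {
    // Each sample is a stack listed root-first.
    // The result maps "root;...;leaf" to the number of samples with that exact stack.
    static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> stack : samples) {
            counts.merge(String.join(";", stack), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented samples, for illustration only.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("WebServerThread.run", "Controller.next", "Repo.readPerson"),
                Arrays.asList("WebServerThread.run", "Controller.next", "Repo.readPerson"),
                Arrays.asList("WebServerThread.run", "Controller.next", "View.printHtml"));
        fold(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

In the rendered graph, the width of each box is proportional to the number of samples in which that frame appears at that position in the stack.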

  26. Instrumenting Profilers: add instructions to collect timings (e.g. the JVisualVM profiler). Inaccurate: modifies the behaviour of the program. High overhead: more than 2x slower.

  27. Sampling/Statistical Profilers new Person() Repo.readPerson() View.printHtml() ??? ??? Controller.doSomething() Controller.next() WebServerThread.run()
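
A very small sampling profiler can be sketched in Java with the standard `Thread.getAllStackTraces()` call. This is a toy under stated assumptions, not how production profilers work: `NaiveSampler` is a made-up name, it counts waiting threads as well as running ones, and because this API collects stacks at safepoints its results are subject to safepoint bias.

```java
import java.util.HashMap;
import java.util.Map;

final class NaiveSampler {
    // Takes `samples` snapshots of all thread stacks, `intervalMillis` apart,
    // and counts how often each method appears as the top frame.
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> topFrames = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                // Skip the sampling thread itself and threads with no Java frames.
                if (e.getKey() == Thread.currentThread() || stack.length == 0) {
                    continue;
                }
                StackTraceElement top = stack[0];
                topFrames.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return topFrames;
    }

    // Burns CPU so the sampler has something to find.
    static void spin() {
        long x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x = x * 31 + 1;
        }
    }

    public static void main(String[] args) {
        Thread busy = new Thread(NaiveSampler::spin, "busy");
        busy.setDaemon(true);
        busy.start();
        sample(50, 10).entrySet().stream()
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue()))
                .limit(5)
                .forEach(e -> System.out.println(e.getValue() + " " + e.getKey()));
        busy.interrupt();
    }
}
```

The overhead is set by the sampling interval rather than by how often methods are called, which is what makes the statistical approach cheaper than instrumentation.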

  28. Safepoint Bias after Inlining new Person() Repo.readPerson() View.printHtml() ??? Controller.doSomething() Controller.next() WebServerThread.run()

  29. Time to Safepoint VM Operation Threads -XX:+PrintSafepointStatistics Safepoint poll

  30. Advanced Statistical Profiling in Java: OS signals to interrupt threads on a resource consumption threshold; the JVM’s signal-handler-safe AsyncGetCallTrace to walk the stack.

  31. People are put off by practical as much as technical issues

  32. Barriers to Ad-Hoc Production Profiling: generally requires access to production; process involves manual work and is hard to automate; low-overhead open source profilers are unsupported.

  33. What if we profiled all the time?

  34. Historical Data: allows for post-hoc incident analysis; enables correlation with other data/metrics; supports performance regression analysis.

  35. Putting Samples in Context: application version; environment parameters (machine type, CPU, location, etc.). With ad-hoc profiling we can’t do this.
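
One way to picture what context tagging enables: if every sample carries the application version, the share of samples spent in a given method can be compared across releases. The Java sketch below assumes hypothetical `TaggedSample` and `VersionComparison` types and invented data; it is not Opsian's data model.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sample tagged with the application version that produced it.
final class TaggedSample {
    final String appVersion;
    final String topFrame;

    TaggedSample(String appVersion, String topFrame) {
        this.appVersion = appVersion;
        this.topFrame = topFrame;
    }
}

final class VersionComparison {
    // Fraction of a version's samples whose top frame is `frame` (0.0 if the version has no samples).
    static double share(List<TaggedSample> samples, String version, String frame) {
        long total = samples.stream()
                .filter(s -> s.appVersion.equals(version))
                .count();
        if (total == 0) {
            return 0.0;
        }
        long hits = samples.stream()
                .filter(s -> s.appVersion.equals(version) && s.topFrame.equals(frame))
                .count();
        return (double) hits / total;
    }

    public static void main(String[] args) {
        // Invented data: Repo.readPerson takes a larger share of samples in 1.1 than in 1.0.
        List<TaggedSample> samples = Arrays.asList(
                new TaggedSample("1.0", "Repo.readPerson"),
                new TaggedSample("1.0", "View.printHtml"),
                new TaggedSample("1.0", "View.printHtml"),
                new TaggedSample("1.0", "View.printHtml"),
                new TaggedSample("1.1", "Repo.readPerson"),
                new TaggedSample("1.1", "Repo.readPerson"),
                new TaggedSample("1.1", "Repo.readPerson"),
                new TaggedSample("1.1", "View.printHtml"));
        System.out.println("1.0: " + share(samples, "1.0", "Repo.readPerson"));
        System.out.println("1.1: " + share(samples, "1.1", "Repo.readPerson"));
    }
}
```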

  36. Opsian - Continuous Profiling: JVM Agents, Opsian Aggregation service, Web Reports

  37. Summary: We can profile in production with low overhead. To overcome practical issues we can profile production all the time. Profiling all the time opens up new capabilities.

  38. Why Performance Matters; Development isn’t Production; Profiling vs Monitoring; Production Profiling; Conclusion

  39. Performance Matters. Development isn’t Production. Metrics can be unactionable. Instrumentation has high overhead. Continuous Profiling provides insight.

  40. We need an attitude shift on profiling + monitoring

  41. Systematic not Ad Hoc Proactive Continuous not Reactive

  42. Please do Production Profiling. All the time.

  43. Any Questions? https://www.opsian.com/

  44. The End
