djigger: An open-source performance analysis solution


  1. djigger An open-source performance analysis solution

  2. Context
     Performance Testing & Analysis @ several companies
     ● Depending on project : often no tools or tools that can’t be used
     ● 2012 Thread dumps are available : while (true) do kill -3 PID done
     ● Analyzing thread dumps manually is a pain
     ● Let’s build our own thread dump analyzer !

  3. Development, 2012 to 2016, with public release. Milestones in order: Thread Dump Analyzer (aggregate events), Sampler (no more kill -3), Collector (24/7 archiving), Agent (instrument), Full APM (distributed tracing). ~ 10 companies use djigger in France and Switzerland

  4. About performance analysis

  5. My definition Performance Analysis : gathering and interpreting necessary & sufficient data to understand and optimize a system or solve a performance problem.

  6. Necessary and sufficient conditions Performance Analysis : gathering and interpreting necessary & sufficient data to understand and optimize a system or solve a performance problem. Necessary : without the necessary data, we can’t understand or solve the problem. Sufficient : runtimes are complex and we can’t afford to harvest every detail.

  7. It’s not just about tools Performance Analysis : gathering and interpreting necessary & sufficient data to understand and optimize a system or solve a performance problem. Necessary : without the necessary data, we can’t understand or solve the problem. Sufficient : runtimes are complex and we can’t afford to harvest every detail. Many factors affect our ability to do this correctly, not just tooling.

  8. Many factors are at play... Permissions; Monitoring & environment maturity; Knowledge of the stack; Problem inputs

  9. Many factors are at play... Permissions: who owns the code? may I access the system? may I change things?

  10. Many factors are at play... Monitoring & environment maturity: do we have proper tooling? are all environments monitored? do I have the necessary data?

  11. Many factors are at play... Knowledge of the stack: have I already seen this pattern? are components closed/proprietary? can I understand this runtime?

  12. Many factors are at play... Problem inputs: what’s the occurrence pattern? what’s the desired behaviour? what are the actual symptoms?

  13. About metrics

  14. There’s a ton of metrics out there: Cache hit ratio, User CPU, Memory Pool usage, Logs, AWR / v$, Net I/O, Kern CPU, Queue size, Cache, Heap Size, Disk I/O, dumps ... ?

  15. I don’t play the elimination game (anymore): Cache hit ratio, User CPU, Memory Pool usage, Logs, AWR / v$, Net I/O, Kern CPU, Queue size, Cache, Heap Size, Disk I/O, dumps

  16. Let’s look at what the program is doing What are the main actors of a program’s execution?

  17. Let’s look at what the program is doing What are the main actors of a program’s execution? Threads. What’s the most important information about a thread?

  18. Let’s look at what the program is doing What are the main actors of a program’s execution? Threads. What’s the most important information about a thread? Its stack state (in particular, method calls). ...but what are Java stacks blind to?

  19. Let’s look at what the program is doing What are the main actors of a program’s execution? Threads. What’s the most important information about a thread? Its stack state (in particular, method calls). ...but what are Java stacks blind to? GC pauses.

  20. Look at what the program is doing I check thread stacks and GC overhead first.
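
As a rough illustration of the "GC overhead" half of that first check, the sketch below reads the cumulative collection time exposed by the standard `java.lang.management` beans and relates it to JVM uptime. The class name `GcOverhead` and its methods are hypothetical and not part of djigger. The reported time is an approximation: for concurrent collectors it is not purely pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {
    // Fraction of elapsed time spent in GC (0.0 when no time has elapsed).
    static double overhead(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / elapsedMillis;
    }

    // Sum of the approximate collection times of all collectors.
    // A collector reports -1 when the value is undefined, so negatives are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double pct = 100 * overhead(totalGcMillis(), uptime);
        System.out.printf("GC overhead since JVM start: %.2f%%%n", pct);
    }
}
```

Measuring over a window (two readings, then the difference) is usually more telling than the since-startup figure printed here.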

  21. Analysis process

  22. A 3-step approach to analyzing latency issues WHAT WHERE WHY

  23. A 3-step approach to analyzing latency issues WHAT ex.: a servlet call

  24. A 3-step approach to analyzing latency issues
      WHAT: ex.: a servlet call
      WHERE: ex.: time is spent in DB

  25. A 3-step approach to analyzing latency issues
      WHAT: ex.: a servlet call
      WHERE: ex.: time is spent in DB
      WHY: ex.: 1-n pattern and query can be cached

  26. A 3-step approach to analyzing latency issues
      WHAT: Find out which events are problematic (transaction, method, click..). ex.: a servlet call
      WHERE: Identify top consumers in the execution trees. ex.: time is spent in DB
      WHY: Read stacks & object data to identify faulty or optimizable behaviour. ex.: 1-n pattern and query can be cached

  27. Collecting events sampling instrumentation

  28. Collecting events
      sampling: Thread-dump events, approximation of reality
      instrumentation: Concrete measurements and object capture

  29. Collecting events
      sampling: Thread-dump events, approximation of reality; without agent
      instrumentation: Concrete measurements and object capture; with agent (for BCI)

  30. Stacktrace Sampling

  31. A dummy thread at runtime [Diagram: call timeline of one thread. mypackage.MyClass.main() calls MyClass.doStuff() and MyClass.doMoreStuff(); the nested calls shown are get() (five times), acquireConnection(), socketRead(), put() and Object.wait()]

  32. A dummy thread at runtime [Same diagram, with a horizontal "time" axis and the vertical axis labelled "stacked methods"]

  33. A random thread dump [Same diagram, with one instant captured as a thread dump:]
      at java.lang.Object.wait()
      at mypackage.datasource.acquireConnection()
      at mypackage.MyClass.doMoreStuff()
      at mypackage.MyClass.main()

  34. Sampling = periodical thread dumps [Same diagram, with thread dumps taken at regular intervals]
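
To make "periodical thread dumps" concrete, here is a minimal in-process sketch. It is an illustration under stated assumptions, not djigger's sampler: it uses the standard `Thread.getAllStackTraces()` call instead of `kill -3` or a remote connector, and the class `StackSampler`, its `Sample` type and the interval value are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StackSampler {
    // One sample = the stack of one thread at one instant (innermost frame first).
    static final class Sample {
        final long timestampMillis;
        final String threadName;
        final StackTraceElement[] stack;

        Sample(long timestampMillis, String threadName, StackTraceElement[] stack) {
            this.timestampMillis = timestampMillis;
            this.threadName = threadName;
            this.stack = stack;
        }
    }

    // Takes `count` thread dumps of the current JVM, `intervalMillis` apart.
    static List<Sample> sample(int count, long intervalMillis) {
        List<Sample> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long now = System.currentTimeMillis();
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                samples.add(new Sample(now, e.getKey().getName(), e.getValue()));
            }
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                    break;
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> samples = sample(5, 100);
        System.out.println(samples.size() + " thread-stack samples collected");
    }
}
```

Each sample only says where a thread was at that instant, which is why the result is an approximation of reality: calls shorter than the interval can be missed entirely.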

  35. Time-based events [Diagram: a sequence of sampled events (e) laid out along the time axis]

  36. Time-based aggregation Tree aggregator 57% 43% 14% 28% 43% 43%
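
The tree aggregation can be sketched as follows: every sampled stack is merged into a call tree, and each node's percentage is the share of samples that passed through it. This is a simplified illustration, not djigger's aggregator; `CallTree`, `add` and `percent` are hypothetical names, and stacks are assumed to be given outermost frame first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CallTree {
    final String method;
    int hits; // number of samples that passed through this node
    final Map<String, CallTree> children = new LinkedHashMap<>();

    CallTree(String method) {
        this.method = method;
    }

    // Merges one sampled stack (outermost frame first) into the tree.
    void add(List<String> stack) {
        hits++;
        CallTree node = this;
        for (String frame : stack) {
            node = node.children.computeIfAbsent(frame, CallTree::new);
            node.hits++;
        }
    }

    // Share of all samples, in percent, that passed through the given path.
    double percent(String... path) {
        CallTree node = this;
        for (String frame : path) {
            node = node.children.get(frame);
            if (node == null) {
                return 0.0;
            }
        }
        return hits == 0 ? 0.0 : 100.0 * node.hits / hits;
    }

    // Prints the tree with one percentage per node, relative to `total` samples.
    void print(String indent, int total) {
        for (CallTree child : children.values()) {
            System.out.printf("%s%s %.0f%%%n", indent, child.method, 100.0 * child.hits / total);
            child.print(indent + "  ", total);
        }
    }

    public static void main(String[] args) {
        CallTree root = new CallTree("root");
        root.add(Arrays.asList("main", "doStuff"));
        root.add(Arrays.asList("main", "doMoreStuff", "acquireConnection"));
        root.add(Arrays.asList("main", "doMoreStuff", "acquireConnection"));
        root.add(Arrays.asList("main", "doMoreStuff", "socketRead"));
        root.print("", root.hits);
    }
}
```

Thread-based aggregation follows the same idea with one tree per thread name instead of a single shared tree.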

  37. Thread-based aggregation Thread 1 X% Y% Z% Tree aggregator A% B% C% Thread 2 Y% D% Thread 3

  38. What does it look like in djigger?

  39. 3-step approach with sampling (without agent)
      1 WHAT: search aggregated events
      2 WHERE: drill-down and stats
      3 WHY: read stacks locally

  40. Example

  41. Example

  42. Example

  43. Instrumentation

  44. A dummy thread at runtime (again) [Diagram: call timeline of one thread. mypackage.MyClass.main() calls MyClass.doStuff() and MyClass.doMoreStuff(); the nested calls shown are get() (five times), acquireConnection(), socketRead(), put() and Object.wait()]

  45. Subscriptions [Same diagram, with the legend: Active subscriptions, Start event, End event]

  46. Subscription-based events
      e begin = 11:38:20.243, method= doMoreStuff, duration= 1223 ms
      e begin = 11:38:20.252, method= acquireConnection, duration= 613 ms
      e begin = 11:38:20.271, method= wait, duration= 599 ms
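
The events above carry a begin time, a method name and a duration. The sketch below shows how such an event can be produced around a method call. Note the substitution: a real agent would inject this timing through bytecode instrumentation (BCI), whereas this sketch wraps calls by hand so that it stays self-contained. `MethodEvents`, `measured` and the two demo methods are hypothetical names, not djigger APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

class MethodEvents {
    static final class Event {
        final long beginMillis;
        final String method;
        final long durationMillis;

        Event(long beginMillis, String method, long durationMillis) {
            this.beginMillis = beginMillis;
            this.method = method;
            this.durationMillis = durationMillis;
        }

        @Override
        public String toString() {
            return "begin = " + beginMillis + ", method= " + method + ", duration= " + durationMillis + " ms";
        }
    }

    // Recorded events, in the order in which the measured calls finished.
    static final List<Event> EVENTS = Collections.synchronizedList(new ArrayList<>());

    // Runs `body` and records one event for it, even if it throws.
    static <T> T measured(String method, Supplier<T> body) {
        long begin = System.currentTimeMillis();
        long startNanos = System.nanoTime();
        try {
            return body.get();
        } finally {
            long durationMillis = (System.nanoTime() - startNanos) / 1_000_000;
            EVENTS.add(new Event(begin, method, durationMillis));
        }
    }

    // Demo methods standing in for "subscribed" methods.
    static String acquireConnection() {
        return measured("acquireConnection", () -> "connection");
    }

    static String doMoreStuff() {
        return measured("doMoreStuff", () -> acquireConnection());
    }

    public static void main(String[] args) {
        doMoreStuff();
        for (Event e : EVENTS) {
            System.out.println(e);
        }
    }
}
```

Unlike sampling, every subscribed call yields an exact measurement, at the cost of running extra code inside the monitored methods.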

  47. Transaction flags (tId= 1fa23)
      e begin = 11:38:20.243, method= doMoreStuff, duration= 1223 ms
      e begin = 11:38:20.252, method= acquireConnection, duration= 613 ms
      e begin = 11:38:20.271, method= wait, duration= 599 ms

  48. Object capture (tId= 1fa23)
      executeQuery(“SELECT * FROM TABLE”)
      e begin = 11:38:20.243, …, data = “ SELECT * FROM MYTABLE ”
      e begin = 11:38:20.252, method= acquireConnection, duration= 613 ms
      e begin = 11:38:20.271, method= wait, duration= 599 ms

  49. Distributed transactions
      JVM 1 (tId= 1fa23)
        e begin = 11:38:20.243, ...
        e begin = 11:38:20.252, ...
        e begin = 11:38:20.271, ...
      drill-down
      JVM 2 (tId= 87e01)
        e begin = 11:38:20.301, ...
        e begin = 11:38:20.252, ...
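
The slides do not say how the two transaction ids are linked, so the following is only one plausible way to do it, sketched under assumptions: each JVM tags its thread with a local tId, and an outgoing call carries that tId so the callee can store it as its parent, which is what a drill-down from one JVM to the next would follow. `TransactionContext` and the header name `X-Parent-Tid` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class TransactionContext {
    // Hypothetical header name used to carry the caller's tId to the next JVM.
    static final String PARENT_HEADER = "X-Parent-Tid";

    private static final ThreadLocal<String> TID = new ThreadLocal<>();
    private static final ThreadLocal<String> PARENT_TID = new ThreadLocal<>();

    // Entry point: creates a local tId and remembers the caller's tId, if any.
    static String begin(Map<String, String> incomingHeaders) {
        String tid = UUID.randomUUID().toString().substring(0, 8);
        TID.set(tid);
        PARENT_TID.set(incomingHeaders.get(PARENT_HEADER));
        return tid;
    }

    static String current() {
        return TID.get();
    }

    static String parent() {
        return PARENT_TID.get();
    }

    // Outgoing call: headers that let the callee link back to this transaction.
    static Map<String, String> outgoingHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put(PARENT_HEADER, TID.get());
        return headers;
    }

    public static void main(String[] args) {
        // Both "JVMs" are simulated on one thread to keep the sketch runnable.
        String jvm1 = begin(new HashMap<String, String>());
        Map<String, String> wire = outgoingHeaders();
        String jvm2 = begin(wire);
        System.out.println("JVM 1 tId=" + jvm1 + " -> JVM 2 tId=" + jvm2 + " (parent=" + parent() + ")");
    }
}
```

Every event recorded on a thread can then be stamped with `current()`, which gives the per-transaction grouping shown on the "Transaction flags" slide.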

  50. 3-step approach with instrumentation (with agent)
      1 WHAT: search entry point events
      2 WHERE: drill-down across JVMs
      3 WHY: capture object data
      refine

  51. What does it look like in djigger?

  52. What does it look like in djigger? invoke() executeQuery() invoke() executeQuery() executeQuery() invoke() invoke() handleRequest() invoke() invoke() ... ...

  53. What does it look like in djigger?

  54. Component overview

  55. connectors JMX, -javaagent, kill -3, jstack, process attach, ... events

  56. PROFILER MODE connectors JMX, -javaagent, kill -3, jstack, process attach, ... events harvest & analyze client

  57. APM MODE connectors JMX, -javaagent, kill -3, jstack, process attach, ... events harvest collector analyze events persist client events store

  58. Download and try out djigger !

  59. Download djigger at http://denkbar.io

  60. Thanks for your attention
