djigger: An open-source performance analysis solution
Context
● Performance Testing & Analysis @ several companies
● Depending on the project: often no tools, or tools that can’t be used
● 2012: thread dumps are available (while true; do kill -3 PID; done)
● Analyzing thread dumps manually is a pain
● Let’s build our own thread dump analyzer!
Development: 2012, 2013, 2014, 2015, 2016, public release
● Thread Dump Analyzer (aggregate events)
● Sampler (no more kill -3)
● Collector (24/7 archiving)
● Agent (instrument)
● Full APM (distributed tracing)
~10 companies use djigger in France and Switzerland
About performance analysis
My definition. Performance analysis: gathering and interpreting necessary & sufficient data to understand and optimize a system or solve a performance problem.
Necessary and sufficient conditions
● Necessary: without the necessary data, we can neither understand nor solve the problem
● Sufficient: runtimes are complex and we can’t afford to harvest every detail
It’s not just about tools
● Many factors affect our ability to do this correctly, not just tooling
Many factors are at play...
● Permissions, environment: who owns the code? may I access the system? may I change things?
● Monitoring & maturity: do we have proper tooling? are all environments monitored? do I have the necessary data?
● Knowledge of the stack: have I already seen this pattern? are components closed/proprietary? can I understand this runtime?
● Problem inputs: what’s the occurrence pattern? what’s the desired behaviour? what are the actual symptoms?
About metrics
There’s a ton of metrics out there: Cache hit ratio, User CPU, Memory Pool usage, Logs, AWR / v$, Net I/O, Kern CPU, Queue size, Cache, Heap Size, Disk I/O, dumps
I don’t play the elimination game (anymore)
Let’s look at what the program is doing
● What are the main actors of a program’s execution? Threads.
● What’s the most important information about a thread? Its stack state (in particular, method calls).
● ...but what are Java stacks blind to? GC pauses.
Look at what the program is doing: I check thread stacks and GC overhead first.
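A minimal sketch of such a first check, assuming it runs inside the JVM under investigation. It only uses the standard java.lang.management beans; the class name FirstLook is hypothetical and not part of djigger. The GC figure is cumulative collection time divided by uptime, so it is only a rough approximation of pause overhead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of djigger): a first look at GC overhead and thread stacks.
class FirstLook {

    // Cumulative GC time as a percentage of JVM uptime (approximation only).
    static double gcOverheadPercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    // One thread dump of all live threads, rendered as text.
    static String dumpStacks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.printf("GC overhead: %.2f%%%n", gcOverheadPercent());
        System.out.print(dumpStacks());
    }
}
```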
Analysis process
A 3-step approach to analyzing latency issues
● WHAT: find out which events are problematic (transaction, method, click...). Ex.: a servlet call
● WHERE: identify top consumers in the execution trees. Ex.: time is spent in DB
● WHY: read stacks & object data to identify faulty or optimizable behaviour. Ex.: 1-n pattern and query can be cached
Collecting events
● Sampling: thread-dump events, approximation of reality; without agent
● Instrumentation: concrete measurements and object capture; with agent (for BCI)
Stacktrace Sampling
A dummy thread at runtime [Figure: timeline of one thread, with time on the horizontal axis and stacked methods on the vertical axis. mypackage.MyClass.main() is at the bottom, with MyClass.doStuff() and MyClass.doMoreStuff() on top of it. Further frames shown: five get() calls, acquireConnection() with Object.wait() on top of it, socketRead() and put().]
A random thread dump, taken while the dummy thread is waiting for a connection:
at java.lang.Object.wait()
at mypackage.datasource.acquireConnection()
at mypackage.MyClass.doMoreStuff()
at mypackage.MyClass.main()
Sampling = periodic thread dumps [Figure: the dummy thread timeline, dumped at regular intervals.]
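The idea of periodic thread dumps can be sketched as follows. This is an illustration only, assuming in-process sampling through the standard ThreadMXBean; the names StackSampler and Sample are hypothetical, and djigger's own sampler and connectors may work differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sampler (not djigger code): takes a thread dump every intervalMillis.
class StackSampler {

    // One sampled event: when it was taken, for which thread, and the stack (innermost frame first).
    static final class Sample {
        final long timestampMillis;
        final String threadName;
        final StackTraceElement[] stack;

        Sample(long timestampMillis, String threadName, StackTraceElement[] stack) {
            this.timestampMillis = timestampMillis;
            this.threadName = threadName;
            this.stack = stack;
        }
    }

    // Takes `count` dumps of all live threads, sleeping intervalMillis between dumps.
    static List<Sample> sample(int count, long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<Sample> samples = new ArrayList<Sample>();
        for (int i = 0; i < count; i++) {
            long now = System.currentTimeMillis();
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                samples.add(new Sample(now, info.getThreadName(), info.getStackTrace()));
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> samples = sample(5, 100);
        System.out.println("Collected " + samples.size() + " thread-dump events");
    }
}
```

A shorter interval gives a closer approximation of reality at the cost of more overhead on the sampled JVM.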
Time-based events [Figure: a series of sampled events (e) laid out along the time axis.]
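Time-based aggregation [Figure: the sampled events are fed into a tree aggregator, which produces a call tree whose nodes are labelled 57%, 43%, 14%, 28%, 43% and 43%.]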
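A minimal sketch of a tree aggregator, assuming each sample is one stack and a node's percentage is the share of samples in which that call path was on the stack. The class TreeAggregator and the sample data in main are made up for illustration; this is not djigger's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical aggregator (not djigger code): merges sampled stacks into one call tree.
class TreeAggregator {

    static final class Node {
        final String method;
        int hits; // number of samples whose stack passes through this node
        final Map<String, Node> children = new LinkedHashMap<String, Node>();

        Node(String method) {
            this.method = method;
        }
    }

    final Node root = new Node("<root>");

    // Adds one sampled stack, given outermost frame first (main() ... leaf).
    void add(List<String> outermostFirst) {
        root.hits++;
        Node current = root;
        for (String method : outermostFirst) {
            Node child = current.children.get(method);
            if (child == null) {
                child = new Node(method);
                current.children.put(method, child);
            }
            child.hits++;
            current = child;
        }
    }

    // Adds a JVM stack trace, which is ordered innermost frame first, so it is reversed here.
    void add(StackTraceElement[] innermostFirst) {
        List<String> path = new ArrayList<String>();
        for (int i = innermostFirst.length - 1; i >= 0; i--) {
            path.add(innermostFirst[i].getClassName() + "." + innermostFirst[i].getMethodName());
        }
        add(path);
    }

    // Percentage of all samples in which the given call path was on the stack.
    double percent(String... path) {
        Node current = root;
        for (String method : path) {
            current = current.children.get(method);
            if (current == null) {
                return 0.0;
            }
        }
        return root.hits == 0 ? 0.0 : 100.0 * current.hits / root.hits;
    }

    public static void main(String[] args) {
        TreeAggregator tree = new TreeAggregator();
        // Made-up data: 7 samples of the dummy thread.
        for (int i = 0; i < 4; i++) {
            tree.add(Arrays.asList("main", "doStuff"));
        }
        for (int i = 0; i < 3; i++) {
            tree.add(Arrays.asList("main", "doMoreStuff", "acquireConnection", "wait"));
        }
        System.out.printf("doStuff: %.0f%%, doMoreStuff: %.0f%%%n",
                tree.percent("main", "doStuff"), tree.percent("main", "doMoreStuff"));
    }
}
```

With 7 samples, a node hit 4 times accounts for 4/7 of the samples and a node hit 3 times for 3/7, which is how percentages of this kind arise from a small number of dumps.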
Thread-based aggregation [Figure: the stacks of Thread 1, Thread 2 and Thread 3 (labelled X%, Y%, Z%, D%) are fed into the tree aggregator, which produces a single tree labelled A%, B%, C%.]
What does it look like in djigger?
3-step approach with sampling (without agent)
● 1. WHAT: search aggregated events
● 2. WHERE: drill-down and stats
● 3. WHY: read stacks locally
Example
Instrumentation
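A dummy thread at runtime (again) [Figure: timeline of one thread. mypackage.MyClass.main() is at the bottom, with MyClass.doStuff() and MyClass.doMoreStuff() on top of it. Further frames shown: five get() calls, acquireConnection() with Object.wait() on top of it, socketRead() and put().]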
Subscriptions [Figure: the dummy thread timeline, annotated with the fields "Active subscriptions", "Start event" and "End event".]
Subscription-based events
e begin = 11:38:20.243, method = doMoreStuff, duration = 1223 ms
e begin = 11:38:20.252, method = acquireConnection, duration = 613 ms
e begin = 11:38:20.271, method = wait, duration = 599 ms
Transaction flags
e begin = 11:38:20.243, method = doMoreStuff, duration = 1223 ms
e begin = 11:38:20.252, method = acquireConnection, duration = 613 ms
e begin = 11:38:20.271, method = wait, duration = 599 ms
tId = 1fa23
Object capture
executeQuery("SELECT * FROM MYTABLE")
e begin = 11:38:20.243, …, data = "SELECT * FROM MYTABLE"
e begin = 11:38:20.252, method = acquireConnection, duration = 613 ms
e begin = 11:38:20.271, method = wait, duration = 599 ms
tId = 1fa23
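The event fields shown on these slides (begin, method, duration, tId, data) can be illustrated with a small sketch. It uses java.lang.reflect.Proxy as a stand-in for bytecode instrumentation, which is an assumption made to keep the example self-contained: a real agent rewrites bytecode and does not need an interface. All names here (SubscriptionDemo, QueryRunner, Event, instrument) are hypothetical and not djigger's API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration (not djigger code): subscription-based events via a dynamic proxy.
class SubscriptionDemo {

    // Made-up interface standing in for an instrumented component.
    interface QueryRunner {
        String executeQuery(String sql);
    }

    // One instrumentation event with a transaction flag (tId) and captured object data.
    static final class Event {
        final long beginMillis;
        final String method;
        final long durationMillis;
        final String tId;
        final Object data;

        Event(long beginMillis, String method, long durationMillis, String tId, Object data) {
            this.beginMillis = beginMillis;
            this.method = method;
            this.durationMillis = durationMillis;
            this.tId = tId;
            this.data = data;
        }
    }

    static final List<Event> EVENTS = Collections.synchronizedList(new ArrayList<Event>());
    static final ThreadLocal<String> TRANSACTION_ID = new ThreadLocal<String>();

    // Wraps target so that calls to subscribed method names emit an Event.
    static <T> T instrument(final T target, Class<T> type, final Set<String> subscriptions) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (!subscriptions.contains(method.getName())) {
                    return call(target, method, args); // not subscribed: no event
                }
                long begin = System.currentTimeMillis();
                long startNanos = System.nanoTime();
                try {
                    return call(target, method, args);
                } finally {
                    long durationMillis = (System.nanoTime() - startNanos) / 1_000_000;
                    Object data = (args != null && args.length > 0) ? args[0] : null; // first argument only
                    EVENTS.add(new Event(begin, method.getName(), durationMillis, TRANSACTION_ID.get(), data));
                }
            }
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    // Invokes the real method and unwraps exceptions thrown by it.
    static Object call(Object target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) {
        QueryRunner real = new QueryRunner() {
            @Override
            public String executeQuery(String sql) {
                return "row";
            }
        };
        Set<String> subscriptions = new HashSet<String>();
        subscriptions.add("executeQuery");
        TRANSACTION_ID.set("1fa23");
        instrument(real, QueryRunner.class, subscriptions).executeQuery("SELECT * FROM MYTABLE");
        for (Event e : EVENTS) {
            System.out.println("e begin = " + e.beginMillis + ", method = " + e.method
                    + ", duration = " + e.durationMillis + " ms, tId = " + e.tId + ", data = " + e.data);
        }
    }
}
```

Keeping the transaction id in a ThreadLocal is one simple way to flag all events of a transaction within a single JVM; carrying it across JVMs would need it to travel with the request, which this sketch does not attempt.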
Distributed transactions
JVM 1, tId = 1fa23:
e begin = 11:38:20.243, ...
e begin = 11:38:20.252, ...
e begin = 11:38:20.271, ...
drill-down
JVM 2, tId = 87e01:
e begin = 11:38:20.301, ...
e begin = 11:38:20.252, ...
3-step approach with instrumentation (with agent)
● 1. WHAT: search entry point events
● 2. WHERE: drill-down across JVMs
● 3. WHY: capture object data
● Refine
What does it look like in djigger? [Screenshot: call tree with handleRequest(), invoke() and executeQuery() nodes.]
Component overview
Connectors (JMX, -javaagent, kill -3, jstack, process attach, ...) produce events
● PROFILER MODE: the client harvests & analyzes events
● APM MODE: the collector harvests events and persists them to a store; the client analyzes events
Download and try out djigger!
Download djigger at http://denkbar.io
Thanks for your attention