LinuxCon North America 2016
Investigating System Performance for DevOps Using Kernel Tracing
jeremie.galarneau@efficios.com
@LeGalarneau

Presenter
Jérémie Galarneau
EfficiOS Inc.
  – Head of Support
  – http://www.efficios.com
Maintainer of
  – Babeltrace
  – LTTng-Tools

Projects
● LTTng-tools
  – Session configuration daemon
  – Network streaming daemon
  – Command-line Interface
● Babeltrace
  – CLI trace reader
  – CTF-Writer library
  – CTF reader/writer Python bindings

Previous Talks
● Have mostly focused on LTTng tracer internals
  – Scalability
  – Reducing per-event cost
● Today
  – How traces can be collected in the real world
  – Tools we have developed to make them useful

LTTng: Two Tracers, Two Groups of Users
● User space (LTTng-UST)
  – Some open source projects are already instrumented (node.js, CoreCLR, QEMU, ...)
  – Manually instrument applications (requires knowledge of internals)
  – Often used as a flexible, high-speed logger
  – Understand interactions between user space and kernel space
● Kernel space (LTTng-modules)
  – Out-of-tree kernel modules (no kernel patching required, your distro has packages)
  – Leverages mainline kernel instrumentation
  – Only useful to kernel experts... or is it?

Do we need tools?
● Traces are huge
  – Recording very fine-grained events
    ● block, sched, irq, net, power, syscalls, ...
  – Manual inspection works, but requires knowledge of application/kernel internals
  – Not always clear what you are looking for

Yes, we do!
● User space developers are building custom tools!
  – Know exactly what they are looking for
  – Modelling their application
  – Tracking internal resources (worker threads, memory pools, connections, users, etc.)
● Originally text-based tools
  – Piping hundreds of GBs of text traces through grep, sed, perl, awk...
  – Lots of one-off scripts being passed around
  – Unmanageable, hard to maintain, etc.
  – Break when Babeltrace’s text output changes (new event fields)

Babeltrace Python Bindings
● Introduced Python bindings to read traces (2013)
  – Provide users with an easy way to “hack something together”
    ● Debugging
    ● Testing
  – Reasonably efficient under most scenarios
  – Scripts are maintained as internal tools
● Could we do the same for kernel space?

LTTng analyses
● Development started in early 2014
● Collection of utils
● Models some kernel subsystems to track their current state
  – Latency statistics and distributions (IO, Scheduling, IRQ)
  – System call statistics
  – IRQ handler duration
  – Top resource users
https://github.com/lttng/lttng-analyses

LTTng analyses - Trace Compass Integration
● Invoke custom analyses
● LAMI 1.0
  – Open Specification
  – JSON based

Principles – Tracking Resources
[...]
sched_switch: cpu_id = 3, next_tid = 1234
syscall_entry_open: tid = 1234, filename = /etc/ld.so.cache, flags = 524288, mode = 1
syscall_exit_open: tid = 1234, ret (fd) = 22

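The resource tracking shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lttng-analyses implementation: events are plain dictionaries with hypothetical keys (`name`, `cpu_id`, `next_tid`, `filename`, `ret`), and the file descriptor table is kept per thread for simplicity.

```python
def track_open_files(events):
    """Rebuild a tid -> {fd: filename} table from a stream of kernel events.

    Assumption: each event is a dict with at least "name" and "cpu_id".
    """
    current_tid = {}    # cpu_id -> tid currently running on that CPU
    pending_open = {}   # tid -> filename of an open() that has not returned yet
    fd_table = {}       # tid -> {fd: filename}

    for ev in events:
        name = ev["name"]
        cpu = ev["cpu_id"]
        if name == "sched_switch":
            # The scheduler tells us which thread now owns this CPU.
            current_tid[cpu] = ev["next_tid"]
        elif name == "syscall_entry_open":
            tid = current_tid.get(cpu)
            if tid is not None:
                pending_open[tid] = ev["filename"]
        elif name == "syscall_exit_open":
            tid = current_tid.get(cpu)
            filename = pending_open.pop(tid, None)
            # A negative return value means open() failed: nothing to track.
            if filename is not None and ev["ret"] >= 0:
                fd_table.setdefault(tid, {})[ev["ret"]] = filename
    return fd_table
```

The key point is that the system call events alone do not say which thread issued them; that information comes from the state built up from earlier `sched_switch` events on the same CPU.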
Principles – Tracking Latencies
[Diagram: read() duration, spanning syscall_entry_read, sched_switch, sched_switch, syscall_exit_read]

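As a rough sketch of the latency tracking above, the function below measures each read() from entry to exit and also counts how much of that time the thread spent switched out. The event layout (dicts with `name`, `timestamp` in nanoseconds, `tid`, `prev_tid`, `next_tid`) is an assumption made for illustration; thread IDs are taken as already resolved.

```python
def read_latencies(events):
    """Return a (tid, total_ns, off_cpu_ns) tuple for each completed read()."""
    in_read = {}   # tid -> state of the read() currently in progress
    results = []

    for ev in events:
        name, ts = ev["name"], ev["timestamp"]
        if name == "syscall_entry_read":
            in_read[ev["tid"]] = {"begin": ts, "off_cpu": 0, "out_since": None}
        elif name == "sched_switch":
            # Thread leaving the CPU while inside read(): start the off-CPU clock.
            prev = in_read.get(ev["prev_tid"])
            if prev is not None:
                prev["out_since"] = ts
            # Thread coming back on CPU: stop the clock and accumulate.
            nxt = in_read.get(ev["next_tid"])
            if nxt is not None and nxt["out_since"] is not None:
                nxt["off_cpu"] += ts - nxt["out_since"]
                nxt["out_since"] = None
        elif name == "syscall_exit_read":
            state = in_read.pop(ev["tid"], None)
            if state is not None:
                results.append((ev["tid"], ts - state["begin"], state["off_cpu"]))
    return results
```

Separating total duration from off-CPU time hints at whether a slow read() was waiting (for I/O or for the scheduler) or actually running.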
Custom Periods
$ lttng-periodlog --period [NAME[(PARENT)]]:BEGIN_EXPR[:END_EXPR]
                  --period-aggregate PERIODS
                  --period-aggregate-by PERIOD
                  --period-group-by FIELD
                  [...]
● Example for read() durations
$ lttng-periodlog --top --freq --stats \
      --period 'read_duration : $evt.$name == "syscall_entry_read" : $evt.$name == "syscall_exit_read"' \
      --period-captures 'read_duration : read_size = $evt.count'

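The idea behind a period (a begin expression, an end expression, and optional captured fields) can be approximated with ordinary Python predicates. The sketch below is not how lttng-periodlog evaluates its expressions; `match_periods`, its callback signatures, and the event dictionaries are all hypothetical. In particular, reading `count` from the begin event is an assumption about where that field lives.

```python
def match_periods(events, begin, end, capture=None):
    """Return one dict per completed period.

    begin(evt) -> bool opens a period.
    end(begin_evt, evt) -> bool closes the period opened by begin_evt.
    capture(begin_evt, end_evt) -> dict of extra fields to record (optional).
    """
    open_periods = []   # begin events still waiting for their end event
    completed = []

    for evt in events:
        for i, begin_evt in enumerate(open_periods):
            if end(begin_evt, evt):
                del open_periods[i]
                record = {
                    "begin": begin_evt["timestamp"],
                    "duration": evt["timestamp"] - begin_evt["timestamp"],
                }
                if capture is not None:
                    record.update(capture(begin_evt, evt))
                completed.append(record)
                break
        else:
            # The event closed nothing; it may open a new period instead.
            if begin(evt):
                open_periods.append(evt)
    return completed


# Hypothetical equivalent of the read_duration period, matched per thread.
def is_read_entry(evt):
    return evt["name"] == "syscall_entry_read"


def is_matching_read_exit(begin_evt, evt):
    return evt["name"] == "syscall_exit_read" and evt["tid"] == begin_evt["tid"]


def capture_read_size(begin_evt, end_evt):
    return {"read_size": begin_evt["count"]}
```

Passing the begin event to the end predicate is what lets an end condition refer back to the event that opened the period, for example to require the same thread.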
Custom Periods
[Diagram: the job_len period runs from job_start to job_end and is divided into fetch, process, and post, delimited by the events job_start, fetch_end, processing_end, and job_end]
$ lttng-periodlog \
      --period 'job_duration : $evt.$name == "job_start" : $evt.$name == "job_end"' \
      --period 'fetch(job_len) : $evt.$name == "job_start" : $evt.$name == "fetch_end"' \
      [...]

Tackling Some “Real” Problems
● Long Time to First Byte
● Demos on
  – WordPress
  – MariaDB/MySQL
  – Apache httpd
● The stack doesn’t matter... as long as it runs on Linux!

Anatomy of a Request
● request
  – fd = accept4(...)
  – close(fd)

Anatomy of a Request
● url_rewrite
  – fd = accept4(...)
  – newstat(...)

Anatomy of a Request
● db_queries
  – connect(...)
  – newstat(...) -> PHP resumes execution

Anatomy of a Request
● Coarse breakdown of periods
  – The goal is not to benchmark
  – Want to know in which “phase” the problem happens
● request (parent)
  – url_rewrite
  – db_queries
  – rendering
  – send_to_client

Deploying LTTng
[Diagram labels: MariaDB (lttng-sessiond), Apache httpd (lttng-sessiond), Kernel Traces, LTTng session profile, Collector (lttng-relayd)]

Case #1
● Almost all requests are unacceptably slow
● No connectivity problems between clients and servers
● Capture a trace for a couple of seconds on both nodes
  – Stream to my machine for storage and investigation
    ● No need to plan for storage in production
    ● Local buffering can be used
      – No push-back against the kernel if the network bandwidth is too low

Case #1
● Extract a log of all requests from the WordPress node’s trace
$ lttng-periodlog --top --stats --freq \
      --period 'request : $evt.$name == "syscall_exit_accept4" && $evt.procname == "apache2" : $evt.pid == $begin.$evt.pid && $evt.$name == "syscall_entry_close" && $evt.fd == $begin.$evt.ret'

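To make the request period above concrete, here is a small Python sketch that pairs each successful accept4() return in an apache2 process with the close() of the returned file descriptor in the same process, then lists the slowest requests. It only mirrors the logic of the expression; the function names and the event dictionaries (`name`, `timestamp`, `pid`, `procname`, `ret`, `fd`) are assumptions for illustration.

```python
def request_log(events, procname="apache2"):
    """Return one record per request: accept4() exit to close() of the same fd."""
    open_requests = {}   # (pid, fd) -> timestamp of the accept4() exit
    log = []

    for evt in events:
        if evt["name"] == "syscall_exit_accept4" and evt["procname"] == procname:
            # The return value of accept4() is the new connection's fd.
            open_requests[(evt["pid"], evt["ret"])] = evt["timestamp"]
        elif evt["name"] == "syscall_entry_close":
            begin_ts = open_requests.pop((evt["pid"], evt["fd"]), None)
            if begin_ts is not None:
                log.append({
                    "pid": evt["pid"],
                    "fd": evt["fd"],
                    "begin": begin_ts,
                    "duration": evt["timestamp"] - begin_ts,
                })
    return log


def slowest(log, n=10):
    """Return the n longest requests, longest first."""
    return sorted(log, key=lambda rec: rec["duration"], reverse=True)[:n]
```

Keying open requests on the (pid, fd) pair plays the role of the two `$begin.$evt` comparisons in the end expression.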
Not a tracer’s forte
● Collecting long traces to compute CPU usage is inefficient
● Most monitoring tools would have found this right away
