Investigating System Performance for DevOps Using Kernel Tracing


  1. LinuxCon North America 2016
     Investigating System Performance for DevOps Using Kernel Tracing
     jeremie.galarneau@efficios.com
     @LeGalarneau

  2. Presenter
     ● Jérémie Galarneau
     ● EfficiOS Inc.
       – Head of Support
       – http://www.efficios.com
     ● Maintainer of
       – Babeltrace
       – LTTng-Tools

  3. Projects
     ● LTTng-tools
       – Session configuration daemon
       – Network streaming daemon
       – Command-line interface
     ● Babeltrace
       – CLI trace reader
       – CTF-Writer library
       – CTF reader/writer Python bindings

  4. Previous Talks
     ● Have mostly focused on LTTng tracer internals
       – Scalability
       – Reducing per-event cost
     ● Today
       – How traces can be collected in the real world
       – Tools we have developed to make them useful

  5. LTTng: Two Tracers, Two Groups of Users
     ● User space (LTTng-UST)
       – Some open source projects are already instrumented (node.js, CoreCLR, QEMU, ...)
       – Manually instrument applications (requires knowledge of internals)
       – Often used as a flexible, high-speed logger
       – Understand interactions between user space and kernel space
     ● Kernel space (LTTng-modules)
       – Out-of-tree kernel modules (no kernel patching required, your distro has packages)
       – Leverages mainline kernel instrumentation
       – Only useful to kernel experts... or is it?

  6. Do we need tools?
     ● Traces are huge
       – Recording very fine-grained events
         ● block, sched, irq, net, power, syscalls, ...
       – Manual inspection works, but requires knowledge of application/kernel internals
       – Not always clear what you are looking for

  7. Yes, we do!
     ● User space developers are building custom tools!
       – Know exactly what they are looking for
       – Modelling their application
       – Tracking internal resources (worker threads, memory pools, connections, users, etc.)
     ● Originally text-based tools
       – Piping hundreds of GBs of text traces through grep, sed, perl, awk...
       – Lots of one-off scripts being passed around
       – Unmanageable, hard to maintain, etc.
       – Break when Babeltrace’s text output changes (new event fields)

  8. Babeltrace Python Bindings
     ● Introduced Python bindings to read traces (2013)
       – Provide users with an easy way to “hack something together”
         ● Debugging
         ● Testing
       – Reasonably efficient under most scenarios
       – Scripts are maintained as internal tools
     ● Could we do the same for kernel space?
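As a rough illustration of the kind of quick script slide 8 has in mind, the sketch below counts events by name. It assumes the legacy Babeltrace 1.x Python bindings (`babeltrace.TraceCollection`, `add_trace`, `events`) are installed; `count_event_names` and `open_trace` are hypothetical helper names, not part of Babeltrace.

```python
from collections import Counter


def count_event_names(events):
    """Count how many times each event name appears in an iterable of events."""
    # Only event.name is read while iterating; no event object is kept afterwards.
    return Counter(event.name for event in events)


def open_trace(path):
    """Open a CTF trace with the Babeltrace 1.x bindings (assumed to be installed)."""
    import babeltrace  # imported lazily so the counting helper works without it

    collection = babeltrace.TraceCollection()
    # add_trace() returns None when the trace cannot be opened.
    if collection.add_trace(path, "ctf") is None:
        raise RuntimeError("could not open trace at " + path)
    return collection
```

A typical use would be `count_event_names(open_trace(trace_path).events).most_common(10)`, where `trace_path` is a placeholder for the directory of a recorded trace.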

  9. LTTng analyses
     ● Development started in early 2014
     ● Collection of utils
     ● Models some kernel subsystems to track their current state
       – Latency statistics and distributions (IO, Scheduling, IRQ)
       – System call statistics
       – IRQ handler duration
       – Top resource users
     https://github.com/lttng/lttng-analyses

  10. LTTng analyses

  11. LTTng analyses

  12. LTTng analyses

  13. LTTng analyses

  14. LTTng analyses - Trace Compass Integration
      ● Invoke custom analyses
      ● LAMI 1.0
        – Open Specification
        – JSON based

  15. Principles – Tracking Resources
      [...]
      ● sched_switch
        – cpu_id = 3
        – next_tid = 1234
      ● syscall_entry_open
        – tid = 1234
        – filename = /etc/ld.so.cache
        – flags = 524288
        – mode = 1
      ● syscall_exit_open
        – tid = 1234
        – ret (fd) = 22
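The resource-tracking idea on slide 15 can be sketched as a small state machine: `sched_switch` tells us which thread runs on each CPU, so later system call events on that CPU can be attributed to that thread. This is a minimal sketch and not the lttng-analyses implementation. The dict-based event layout, the function name `track_open_files`, and the assumption that both `open` events occur on CPU 3 are all hypothetical.

```python
def track_open_files(events):
    """Replay events in order and return {tid: {fd: filename}}.

    Each event is a dict with "name", "cpu_id" and a "fields" dict; this layout
    is an assumption made for the sketch, not a Babeltrace data structure.
    """
    current_tid = {}   # cpu_id -> tid currently running on that CPU
    pending_open = {}  # tid -> filename of an open() that has not returned yet
    fd_tables = {}     # tid -> {fd: filename}

    for event in events:
        cpu = event["cpu_id"]
        fields = event["fields"]
        if event["name"] == "sched_switch":
            current_tid[cpu] = fields["next_tid"]
            continue
        tid = current_tid.get(cpu)
        if tid is None:
            continue  # no sched_switch seen yet on this CPU: owner unknown
        if event["name"] == "syscall_entry_open":
            pending_open[tid] = fields["filename"]
        elif event["name"] == "syscall_exit_open":
            filename = pending_open.pop(tid, None)
            # A negative return value is an error code, not a file descriptor.
            if filename is not None and fields["ret"] >= 0:
                fd_tables.setdefault(tid, {})[fields["ret"]] = filename
    return fd_tables


# The three events shown on the slide, in the assumed dict layout.
SLIDE_15_EVENTS = [
    {"name": "sched_switch", "cpu_id": 3, "fields": {"next_tid": 1234}},
    {"name": "syscall_entry_open", "cpu_id": 3,
     "fields": {"filename": "/etc/ld.so.cache", "flags": 524288, "mode": 1}},
    {"name": "syscall_exit_open", "cpu_id": 3, "fields": {"ret": 22}},
]
```

Calling `track_open_files(SLIDE_15_EVENTS)` yields a table in which thread 1234 holds file descriptor 22 for `/etc/ld.so.cache`.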

  16. Principles – Tracking Latencies
      [Timeline: the read() duration spans syscall_entry_read, sched_switch, sched_switch, syscall_exit_read]
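The timeline on slide 16 suggests measuring a `read()` from its entry event to its exit event, even when the thread is switched out in between. A minimal sketch under assumptions: events are plain dicts, timestamps are made-up nanosecond values, and `read_durations` is a hypothetical name. The `prev_tid` and `next_tid` fields follow the `sched_switch` event.

```python
def read_durations(events):
    """Return a list of (tid, duration, times_switched_out), one per read()."""
    current_tid = {}  # cpu_id -> tid running on that CPU
    in_flight = {}    # tid -> [entry_timestamp, times_switched_out]
    results = []

    for event in events:
        name, cpu, ts = event["name"], event["cpu_id"], event["timestamp"]
        if name == "sched_switch":
            prev_tid = event["prev_tid"]
            if prev_tid in in_flight:
                in_flight[prev_tid][1] += 1  # left the CPU while inside read()
            current_tid[cpu] = event["next_tid"]
            continue
        tid = current_tid.get(cpu)
        if tid is None:
            continue  # cannot attribute the syscall to a thread yet
        if name == "syscall_entry_read":
            in_flight[tid] = [ts, 0]
        elif name == "syscall_exit_read" and tid in in_flight:
            start, switches = in_flight.pop(tid)
            results.append((tid, ts - start, switches))
    return results


# Hypothetical trace: thread 1234 blocks in read() on CPU 0 and resumes on CPU 1.
EXAMPLE_EVENTS = [
    {"name": "sched_switch", "cpu_id": 0, "timestamp": 100, "prev_tid": 0, "next_tid": 1234},
    {"name": "syscall_entry_read", "cpu_id": 0, "timestamp": 150},
    {"name": "sched_switch", "cpu_id": 0, "timestamp": 200, "prev_tid": 1234, "next_tid": 0},
    {"name": "sched_switch", "cpu_id": 1, "timestamp": 900, "prev_tid": 0, "next_tid": 1234},
    {"name": "syscall_exit_read", "cpu_id": 1, "timestamp": 950},
]
```

Keying the in-flight table by thread rather than by CPU is what lets the exit event be matched even after the thread migrates to another CPU.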

  17. Custom Periods
      $ lttng-periodlog --period [NAME[(PARENT)]]:BEGIN_EXPR[:END_EXPR]
                        --period-aggregate PERIODS
                        --period-aggregate-by PERIOD
                        --period-group-by FIELD
                        [...]
      ● Example for read() durations
      $ lttng-periodlog --top --freq --stats \
          --period 'read_duration : $evt.$name == "syscall_entry_read" : $evt.$name == "syscall_exit_read"' \
          --period-captures 'read_duration : read_size = $evt.count'
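To make the notion of a period concrete, here is a minimal sketch of matching a begin expression and an end expression over a stream of events, with a captured value and a few statistics. It only illustrates the concept and is not how lttng-periodlog is implemented. The dict-based events, the `tid` field on each event, the per-thread matching in the end predicate, and capturing `count` from the begin event are all assumptions.

```python
def find_periods(events, begin, end, capture=None):
    """Return one dict per completed period.

    begin(event) -> bool marks a period start; end(event, begin_event) -> bool
    marks its end; capture(begin_event), if given, attaches an extra value.
    """
    open_begins = []  # begin events still waiting for their end event
    periods = []
    for event in events:
        # Close the oldest open period whose end expression matches.
        for index, begin_event in enumerate(open_begins):
            if end(event, begin_event):
                del open_begins[index]
                period = {"begin": begin_event["timestamp"],
                          "duration": event["timestamp"] - begin_event["timestamp"]}
                if capture is not None:
                    period["capture"] = capture(begin_event)
                periods.append(period)
                break
        if begin(event):
            open_begins.append(event)
    return periods


def summarize(periods):
    """Basic statistics over period durations, or None if there are no periods."""
    durations = [period["duration"] for period in periods]
    if not durations:
        return None
    return {"count": len(durations), "min": min(durations),
            "max": max(durations), "avg": sum(durations) / len(durations)}


# Hypothetical events: two threads issue overlapping read() calls.
READ_EVENTS = [
    {"name": "syscall_entry_read", "timestamp": 1000, "tid": 1, "count": 4096},
    {"name": "syscall_entry_read", "timestamp": 1200, "tid": 2, "count": 512},
    {"name": "syscall_exit_read", "timestamp": 1500, "tid": 2},
    {"name": "syscall_exit_read", "timestamp": 4000, "tid": 1},
]

READ_PERIODS = find_periods(
    READ_EVENTS,
    begin=lambda evt: evt["name"] == "syscall_entry_read",
    end=lambda evt, begin_evt: (evt["name"] == "syscall_exit_read"
                                and evt["tid"] == begin_evt["tid"]),
    capture=lambda begin_evt: begin_evt["count"],
)
```

Passing the begin event to the end predicate mirrors the `$begin` references used in the request example on slide 26; without it, overlapping periods from different threads could not be told apart.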

  18. Custom Periods
      [Diagram: the job_len period is split into fetch, process and post sub-periods, delimited by the events job_start, fetch_end, processing_end and job_end]
      $ lttng-periodlog \
          --period 'job_duration : $evt.$name == "job_start" : $evt.$name == "job_end"' \
          --period 'fetch(job_len) : $evt.$name == "job_start" : $evt.$name == "fetch_end"' \
          [...]

  19. Tackling Some “Real” Problems
      ● Long Time to First Byte
      ● Demos on
        – WordPress
        – MariaDB/MySQL
        – Apache httpd
      ● The stack doesn’t matter... as long as it runs on Linux!

  20. Anatomy of a Request
      ● request
        – fd = accept4(...)
        – close(fd)

  21. Anatomy of a Request
      ● url_rewrite
        – fd = accept4(...)
        – newstat(...)

  22. Anatomy of a Request
      ● db_queries
        – connect(...)
        – newstat(...) -> PHP resumes execution

  23. Anatomy of a Request
      ● Coarse breakdown of periods
        – The goal is not to benchmark
        – Want to know in which “phase” the problem happens
      ● request (parent)
        – url_rewrite
        – db_queries
        – rendering
        – send_to_client

  24. Deploying LTTng
      [Diagram: the MariaDB and Apache httpd nodes, each running lttng-sessiond with an LTTng session profile, send kernel traces to a Collector running lttng-relayd]

  25. Case #1
      ● Almost all requests are unacceptably slow
      ● No connectivity problems between clients and servers
      ● Capture a trace for a couple of seconds on both nodes
        – Stream to my machine for storage and investigation
          ● No need to plan for storage in production
          ● Local buffering can be used
        – No push-back against the kernel if the network bandwidth is too low

  26. Case #1
      ● Extract a log of all requests from the WordPress node’s trace
      $ lttng-periodlog --top --stats --freq \
          --period 'request : $evt.$name == "syscall_exit_accept4" && $evt.procname == "apache2" : $evt.pid == $begin.$evt.pid && $evt.$name == "syscall_entry_close" && $evt.fd == $begin.$evt.ret'
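The `request` period on slide 26 starts when an `apache2` process returns from `accept4()` and ends when the same process closes the returned file descriptor. A minimal sketch of that matching rule, assuming dict-based events that already carry `pid` and `procname` (hypothetical layout and timestamps); `request_periods` is an illustrative name, not part of lttng-analyses.

```python
def request_periods(events):
    """Pair each apache2 accept4() return with the close() of the same fd."""
    open_requests = {}  # (pid, fd) -> accept4 exit event that began the period
    requests = []
    for evt in events:
        if evt["name"] == "syscall_exit_accept4" and evt["procname"] == "apache2":
            # Begin expression: the return value of accept4() is the new socket fd.
            open_requests[(evt["pid"], evt["ret"])] = evt
        elif evt["name"] == "syscall_entry_close":
            # End expression: same pid as the begin event, and fd == $begin.$evt.ret.
            begin = open_requests.pop((evt["pid"], evt["fd"]), None)
            if begin is not None:
                requests.append({"pid": evt["pid"], "fd": evt["fd"],
                                 "begin": begin["timestamp"],
                                 "duration": evt["timestamp"] - begin["timestamp"]})
    return requests


# Hypothetical events from two apache2 workers that both got fd 12 from accept4().
REQUEST_EVENTS = [
    {"name": "syscall_exit_accept4", "timestamp": 10, "pid": 500, "procname": "apache2", "ret": 12},
    {"name": "syscall_exit_accept4", "timestamp": 20, "pid": 501, "procname": "apache2", "ret": 12},
    {"name": "syscall_entry_close", "timestamp": 35, "pid": 501, "procname": "apache2", "fd": 12},
    {"name": "syscall_entry_close", "timestamp": 40, "pid": 500, "procname": "apache2", "fd": 7},
    {"name": "syscall_entry_close", "timestamp": 910, "pid": 500, "procname": "apache2", "fd": 12},
]

# Slowest request first, similar in spirit to the --top option.
SLOWEST_FIRST = sorted(request_periods(REQUEST_EVENTS),
                       key=lambda request: request["duration"], reverse=True)
```

Indexing open periods by `(pid, fd)` keeps requests from different workers apart even when they reuse the same descriptor number, and unrelated `close()` calls are ignored.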

  27. Not a tracer’s forte
      ● Collecting long traces to compute CPU usage is inefficient
      ● Most monitoring tools would have found this right away
