LTTng, Filling the Gap Between Kernel Instrumentation and a Widely Usable Kernel Tracer

Mathieu Desnoyers
École Polytechnique de Montréal
mathieu.desnoyers@polymtl.ca

Michel R. Dagenais
École Polytechnique de Montréal
michel.dagenais@polymtl.ca

Abstract

This paper presents an overview of tracing requirements stated by the LTTng user-base. It presents LTTng as a tracer having a wide user-base, with needs different from those of kernel developers. It presents tracing infrastructure as being made of distinct parts which can be categorized as either common core-kernel infrastructure (instrumentation, tracing time-source) or tracer-specific, driver-like code (trace management, buffering mechanism). This paper builds the case for mainlining LTTng into the Linux kernel by explaining the specific user requirements LTTng fulfills, its degree of maturity and the number of users it has. This case is then supported by showing that most of its code-base does not affect the kernel core.

1 Introduction

With current systems becoming increasingly multi-core and complex, the need for tools to help understand performance and latency problems is clear[2]. However, a kernel tracing solution usable by the large community of Linux users has not yet made its way into the mainline Linux kernel.

This paper will detail the various tracing solutions currently available in the Linux kernel and explain the distinction between core kernel tracing infrastructure (which must be shared and common, and has the largest impact on the kernel code-base) and the trace data transport and trace management infrastructures, which can be categorized as driver code.

It will then explain how the LTTng user-base differs from the target user-base of most of the non-core tracing facilities currently present in the mainline kernel, thereby building the case for mainlining the LTTng "driver" code.

2 Tracing Infrastructure in the Mainline Kernel

This section details the tracing infrastructure integrated in Linux kernel 2.6.30-rc3.

The first category, instrumentation mechanisms, includes Kprobes, Kernel Markers and Tracepoints. Kprobes[9] allow dynamically inserting breakpoints into the Linux kernel, to which handlers can be connected. The Linux Kernel Markers[3] allow adding ad-hoc instrumentation along with a format string and a variable argument list. This allows easy addition of instrumentation at the source code level.
the instrumentation by requiring an instrumen- However, this does not apply to core kernel tation declaration to be added into a system- code with very good reasons : modifications wide header. Tracepoints are meant to allow to the core kernel code can have broad impacts the kernel instrumentation process to be man- on the kernel and on many kernel maintainers. aged by the subsystem maintainers, along with Furthermore, they are, by nature, hard to isolate the overview of the overall community. A third in a specific module. static instrumentation mechanism present in the Therefore, the core kernel code is “jealously kernel is the function tracer, which allows in- kept” from external contributions, and those strumentation of function entry and exit by the typically have to go through a very thorough compiler with an almost non-existing overhead round of review before being accepted. when dynamically disabled. Looking at the Linux Trace Toolkit Next Gen- The second category, kernel tracers, is currently eration patchset 2 , one might wonder which part being integrated under the Ftrace[8] umbrella. of it could be considered as core kernel code It includes principally the block I/O tracer[1], and which parts are self-contained drivers. the memory I/O tracer, kmemtrace, KVMtrace (tracing Linux KVM), the wakeup tracer and The core kernel modifications present in the the event tracer. The number of such tracers is LTTng patchset are limited to the kernel in- increasing from one kernel version to another. strumentation infrastructure, the instrumenta- The approach taken here is to let tracers attach tion per se and the trace clock. to Ftrace to provide a data output. One tracer can be selected as the “current” tracer at any Most of the instrumentation infrastructure : given time. Their primary user-base is meant to kernel markers and tracepoints, have been be kernel developers. The motto Ftrace follows merged into the mainline kernel already. 
The is to include everything needed to use the tracer “Immediate Values” patches aim at diminishing within the kernel, to make sure kernel develop- the performance impact of dormant instrumen- ers do not run into userspace package depen- tation, which will become increasingly useful dency problems. as more tracepoints are added into the ker- nel. The LTTng instrumentation touches vari- ous kernel subsystems and is being submitted for integration into the mainline kernel. The 3 Core Kernel vs Driver Code time-base LTTng uses (trace clock) aims at pro- viding a reliable and fast time-base suitable for the needs of tracing. Ingo Molnar proposed a As a general guideline coming from the Linux trace clock implementation which is currently kernel maintainers 1 , the kernel community is in mainline but does not have the reliability actively trying to make it easier for contribu- and performance characteristics provided by tors to have their kernel drivers merged into the the LTTng implementation. Those pieces of in- mainline Linux kernel. The linux-next tree in- frastructure will therefore have to be submitted cludes a staging drivers section with this pre- for mainlining as “core kernel” modifications. cise goal : to integrate new drivers into the ker- nel tree early in their development. This is however where stops the LTTng core kernel intrusiveness. LTTng code-base is 1 Referring to Andrew Morton and Greg Kroah- 2 http://www.lttng.org Hartman at LFCS2009[7] 2
This is, however, where the LTTng intrusiveness into the core kernel stops. The LTTng code-base is made primarily of self-contained kernel modules which aim at using as few pieces of kernel infrastructure as possible, so the instrumentation coverage can be maximized. Therefore, the trace session management code, the ring-buffer data extraction mechanism and the buffer layout are all self-contained in kernel modules that do not affect the rest of the Linux kernel code-base. We can therefore claim that the LTTng trace management mechanism should be considered for mainlining under the same criteria that apply to drivers.

Given that Ftrace provides parts of the features provided by LTTng, one might wonder if mainlining LTTng would in fact duplicate features already provided by Ftrace. The following section will show how the LTTng and Ftrace user-bases and requirements differ.

4 Case for LTTng Mainlining

This section will show how LTTng fits the various requirements for kernel code to be considered for mainlining, namely: fulfilling specific user requirements, having a large user-base and being actively developed by a group of contributors.

4.1 LTTng Specific User Requirements

LTTng fulfills tracing requirements from developers, system administrators, technical support and users running Linux as their operating system. Those requirements include the ability to analyze and debug application, library and kernel performance system-wide. Among these users, some of the most demanding need to trace high-performance computing multi-core application workloads (Google servers). At the other end of the spectrum, embedded system developers need to fit within very limited memory and bandwidth resources (Nokia embedded products). LTTng is used in the field on Siemens production systems to gather a continuous flight-recorder trace of the system's behavior into circular buffers, providing meaningful bug reports from the end-user site to the Siemens technical support teams.

The tracing solution currently existing in the Linux kernel, Ftrace, mainly targets kernel developers. It focuses primarily on providing specialized tracers for kernel behavior, to debug kernel-level problems occurring, for example, at the scheduler, driver, memory management or block layer levels. We will see below that this difference in target users makes it more difficult to share some common pieces of low-level infrastructure.

Ftrace currently relies on some assumptions about tracer usage specific to kernel users. Primarily, data is written into physical pages, which limits the maximum event size to a page and requires padding to be added whenever an event would cross a page boundary. The buffer locking scheme is specialized for kernel tracing, which makes it unlikely to be reusable for user-space tracing. For instance, assumptions are made about preemption being disabled at the tracing site, which do not hold in user-space. Ftrace uses many function calls, which are costly performance-wise on many architectures. For example, on a 64-bit Intel Xeon, adding a function call to the LTTng tracer fast path slows down tracer execution by 20%. In terms of buffer space usage, the ring_buffer infrastructure reserves precious event header bytes to encode the type and size of events (2 bits for event type, 3 bits for event length and 27 bits for time delta). The event payload itself is a multiple of 32 bits. The locking mechanism currently used by Ftrace is a per-cpu spinlock with interrupts disabled; this en-