Deep into the Core: Introducing LTTng


In one of our previous posts, we touched on the problems of tracing and profiling the Linux kernel.

Today we would like to return to this topic and talk in detail about one interesting tool: the LTTng kernel tracer, developed by the Canadian programmer and researcher Mathieu Desnoyers. With it, you can collect information about events both in kernel space and in user space.


LTTng can be used, for example, to solve the following problems (the list is far from complete):

  • analyzing interprocess communication in the system;
  • analyzing how user-space applications interact with the kernel;
  • measuring the time the kernel spends servicing application requests;
  • analyzing system behavior under high load.


LTTng has long been included in the official repositories of modern Linux distributions. It is used in practice by companies such as Google, Siemens, Boeing and others, yet there are practically no publications about it in Russian.
In this article, we will look at how LTTng works and show how kernel tracing is done using it.

A bit of history



In the previous article, we examined the tracepoints mechanism in detail and even gave code examples. What we did not mention is that the tracepoints mechanism was created in the course of work on LTTng.

Back in 1999, IBM employee Karim Yaghmour began work on the LTT (Linux Trace Toolkit) project. The idea behind LTT was the following: statically instrument the most important fragments of kernel code and thereby obtain information about the operation of the system. A few years later, this idea was picked up and developed by Mathieu Desnoyers as part of the LTTng (Linux Trace Toolkit next generation) project. The first LTTng release took place in 2005.

The words "next generation" appear in the tool's name for a reason: Desnoyers has made a huge contribution to the development of Linux tracing and profiling mechanisms. He added static instrumentation for critical kernel functions: this is how the kernel markers mechanism came about, which later evolved into the familiar tracepoints. Tracepoints are actively used in LTTng; it is thanks to this mechanism that tracing is possible without placing a significant additional load on the system.
Based on this work, Desnoyers prepared a dissertation, which he defended in 2009.

Mathieu Desnoyers continues to work on improving LTTng. The latest stable version, 2.7, was released in October 2015. Version 2.8 is currently in release candidate status and is available for download here.

Installation



LTTng is included in the repositories of most modern Linux distributions, and you can install it in the standard way. In recent versions of popular distributions (for example, the recently released Ubuntu 16.04), the latest version, 2.7, is available out of the box:

$ sudo apt-get install lttng-tools lttng-modules-dkms


If you are using an older distribution release, you will need to add a PPA repository to install LTTng 2.7:

$ sudo apt-add-repository ppa:lttng/ppa
$ sudo apt-get update
$ sudo apt-get install lttng-tools lttng-modules-dkms 


The lttng-tools package contains the following utilities:

  • babeltrace - a utility for viewing traces in CTF (Common Trace Format);
  • lttng-sessiond - a daemon that manages tracing sessions;
  • lttng-consumerd - a daemon that takes event data from the ring buffer and saves it;
  • lttng-relayd - a daemon that transmits data over the network.


The lttng-modules-dkms package includes numerous kernel modules that interact with the kernel's built-in tracing and profiling mechanisms. You can view the list of loaded LTTng modules with the command:

$ lsmod | grep lttng


All these modules can be divided into three groups:

  • modules for working with tracepoints;
  • modules for working with the ring buffer;
  • probe modules for dynamic kernel tracing.


The official repositories also offer the LTTng-UST package, which is used to trace events in user space. We will not cover it in this article; interested readers are referred to an article that serves as a pretty good introduction to the topic.
All LTTng commands must be executed either as the root user or as a user belonging to the special tracing group (created automatically during installation).
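
For example, to trace without constantly resorting to root privileges, you can add your user account to this group (a minimal sketch; you will need to log out and back in for the new group membership to take effect):

$ # add the current user to the tracing group created during installation
$ sudo usermod -aG tracing $USER
$ # after re-login, verify the membership
$ id -nG | grep -w tracing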

Key concepts: sessions, events, and channels



The way all LTTng components interact can be represented by the following diagram:

[Diagram: interaction of LTTng components]

To better understand how all this works, let's make a short theoretical digression and examine the basic concepts.

As mentioned above, tracing in LTTng is based on both static and dynamic code instrumentation.

During execution of the instrumented code, a special probe function is called, which reads the session state and writes information about events into channels.

Let us clarify what "session" and "channel" mean. A session is a tracing run with user-defined parameters. The figure above shows only one session, but LTTng can run several sessions simultaneously. You can stop a session at any time, change its settings, and then start it again.
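
For instance, a stopped session can be given additional events before being resumed (a minimal sketch; the session name demo and the chosen events are illustrative):

$ lttng create demo                   # create a named session
$ lttng enable-event -k sched_switch  # enable a kernel event
$ lttng start                         # begin tracing
$ lttng stop                          # pause tracing
$ lttng enable-event -k sched_wakeup  # change the session's settings
$ lttng start                         # resume with the new settings
$ lttng destroy                       # remove the session when done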

Each session uses channels to carry trace data. A channel is a set of events with certain attributes and additional context information. A channel's attributes (we discuss them in more detail below) include the buffer size, the trace mode, and the buffer flush period.

Why are channels needed? First of all, to maintain the shared ring buffer into which events are written during tracing and from which they are then taken by the consumer daemon. The ring buffer, in turn, is divided into numerous sections (sub-buffers) of the same size. Event data is written to a section until it is full; after that, recording continues in another section. Full sections are picked up by the consumer daemon, which saves their data to disk (or transmits it over the network).

Ideally, situations where all sections are full and there is nowhere to write data should not occur; in practice, however, they sometimes do. The official documentation states bluntly that LTTng puts performance first: it is better to lose some events than to slow down the system. When creating a channel, you can choose one of two modes of operation for when the buffer overflows:

  • discard mode - new events are not recorded until one of the sections is freed;
  • overwrite mode - the oldest events are deleted, and new ones are written in their place.


You can read more about setting up channels here.
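
Channel parameters are chosen when the channel is created, before events are attached to it. Below is a minimal sketch (the channel name my_channel is illustrative, and the sizes are examples rather than recommendations):

$ lttng create
$ # a kernel-space channel that overwrites the oldest events on overflow,
$ # with 8 sub-buffers of 1 MiB each
$ lttng enable-channel -k --overwrite --num-subbuf 8 --subbuf-size 1M my_channel
$ # the -c option attaches the event to this particular channel
$ lttng enable-event -k -c my_channel sched_switch
$ lttng start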

Tracing events



Let's move on to practice and launch our first tracing session.
You can view the list of kernel events available for tracing as follows (we show only a small fragment; the actual list is much longer):

$ lttng list --kernel
Kernel events:
-------------
      writeback_nothread (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      writeback_queue (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      writeback_exec (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      writeback_start (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      writeback_written (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      writeback_wait (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      writeback_pages_written (loglevel: TRACE_EMERG (0)) (type: tracepoint)
            …….
      snd_soc_jack_irq (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      snd_soc_jack_report (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      snd_soc_jack_notify (loglevel: TRACE_EMERG (0)) (type: tracepoint)
      snd_soc_cache_sync (loglevel: TRACE_EMERG (0)) (type: tracepoint)


Let's try to trace a specific event - for example, sched_switch, which fires on every context switch. First, create a session:

$ lttng create test_session
Session test_session created.
Traces will be written in /user/lttng-traces/test_session-20151119-134747


The session has been created. All data collected during it will be saved under /user/lttng-traces/test_session-20151119-134747. Next, activate the event we are interested in:

$ lttng enable-event -k sched_switch
Kernel event sched_switch created in channel channel0


Next, execute:

$ lttng start
Tracing started for session test_session
$ sleep 1
$ lttng stop


Information about all sched_switch events will be saved in a separate file. You can view its contents as follows:

$ lttng view


This list of events will be very long, and it will include far too much information. Let's narrow the query and get information only about the events that occurred while the sleep command was running:

$ babeltrace /user/lttng-traces/test_session-20151119-134747 | grep sleep
[13:53:23.278672041] (+0.001249216) luna sched_switch: { cpu_id = 0 }, { prev_comm = "sleep", prev_tid = 10935, prev_prio = 20, prev_state = 1, next_comm = "swapper/0", next_tid = 0, next_prio = 20 }
[13:53:24.278777924] (+0.005448925) luna sched_switch: { cpu_id = 0 }, { prev_comm = "swapper/0", prev_tid = 0, prev_prio = 20, prev_state = 0, next_comm = "sleep", next_tid = 10935, next_prio = 20 }
[13:53:24.278912164] (+0.000134240) luna sched_switch: { cpu_id = 0 }, { prev_comm = "sleep", prev_tid = 10935, prev_prio = 20, prev_state = 0, next_comm = "rcuos/0", next_tid = 8, next_prio = 20 }


The output contains information about all sched_switch events registered in the kernel during the session. Each record consists of several fields. The first field is the timestamp; the second is the so-called delta, the time elapsed between the previous and the current event. The cpu_id field indicates the number of the CPU on which the event was recorded. This is followed by additional context information.
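
Since babeltrace produces plain text, the trace is easy to post-process with standard command-line tools. For example, a rough sketch of counting which processes were switched in most often during the session (field names as in the output above):

$ babeltrace /user/lttng-traces/test_session-20151119-134747 \
      | grep -o 'next_comm = "[^"]*"' | sort | uniq -c | sort -rn | head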

When tracing is finished, the session should be destroyed:

$ lttng destroy


Tracing system calls



To trace system calls, the lttng enable-event command has a special option: --syscall. In the article on ftrace, we looked at an example of tracing the chroot system call. Let's try to do the same with LTTng:

$ lttng create 
$ lttng enable-event -k --syscall chroot
$ lttng start


Now create an isolated environment using the chroot command.
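If you do not have a suitable chroot tree at hand, a minimal one can be assembled from a statically linked shell (a sketch assuming a static busybox binary at /bin/busybox, e.g. from the busybox-static package; the directory name test matches the filename field in the output below):

$ mkdir -p test/bin
$ cp /bin/busybox test/bin/
$ sudo chroot test /bin/busybox sh   # this invocation triggers the chroot system call
$ exit

After leaving the isolated environment, execute: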

$ lttng stop
$ lttng view
[12:05:51.993970181] (+?.?????????) andrei syscall_entry_chroot: { cpu_id = 0 }, { filename = "test" }
[12:05:51.993974601] (+0.000004420) andrei syscall_exit_chroot: { cpu_id = 0 }, { ret = 0 }
[12:06:53.373062654] (+61.379088053) andrei syscall_entry_chroot: { cpu_id = 0 }, { filename = "test" }
[12:06:53.373066648] (+0.000003994) andrei syscall_exit_chroot: { cpu_id = 0 }, { ret = 0 }
[12:07:36.701313906] (+43.328247258) andrei syscall_entry_chroot: { cpu_id = 1 }, { filename = "test" }
[12:07:36.701325202] (+0.000011296) andrei syscall_exit_chroot: { cpu_id = 1 }, { ret = 0 }


As you can see, the output contains information about every entry into and exit from the chroot system call. Compared to the ftrace output, it looks somewhat more structured and human-readable.

Dynamic tracing



Above, we looked at examples of static tracing using the tracepoints mechanism.

To trace events in Linux, you can also use the dynamic tracing mechanism kprobes (short, as you might guess, for kernel probes). It allows you to add new trace points (probes) to the kernel "on the fly". It is on kprobes that the well-known SystemTap tool is based. It works like this: to add a probe to the kernel, you write a script in a special C-like language; the script is then translated into C code, which is compiled into a separate kernel module. Using such a tool in practice is rather cumbersome.
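
For comparison, here is roughly what watching the same chroot calls looks like in SystemTap (a sketch, assuming SystemTap and the kernel's debug symbols are installed; under the hood stap compiles this one-liner into a temporary kernel module):

$ sudo stap -e 'probe syscall.chroot { printf("%s(%d) chroot(%s)\n", execname(), pid(), argstr) }'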

Starting with kernel version 3.10, kprobes support has been available in ftrace. Thanks to this, dynamic tracing became possible without writing scripts or building extra modules.
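
The ftrace interface for this is the kprobe_events file in debugfs. A minimal sketch (run as root; assuming debugfs is mounted at /sys/kernel/debug; the probe name myprobe is arbitrary):

$ cd /sys/kernel/debug/tracing
$ echo 'p:myprobe do_sys_open' >> kprobe_events   # probe the entry of do_sys_open
$ echo 1 > events/kprobes/myprobe/enable          # turn the new event on
$ cat trace | head                                # look at the collected events
$ echo 0 > events/kprobes/myprobe/enable          # turn it off
$ echo '-:myprobe' >> kprobe_events               # and remove the probe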

kprobes support is also implemented in LTTng. Two types of probes are supported: kprobes proper ("basic" probes that can be inserted almost anywhere in the kernel) and kretprobes (placed at the exit from a function, giving access to its return value).

Let's look at a few practical examples. In LTTng, the --probe option is used to set "basic" probes:

$ lttng create
$ lttng enable-event --kernel sys_open --probe sys_open+0x0
$ lttng enable-event --kernel sys_close --probe sys_close+0x0
$ lttng start
$ sleep 1     
$ lttng stop


The output obtained as a result of the trace will look like this (here is a small fragment):

…..
[12:45:26.842026294] (+0.000028311) andrei sys_close: { cpu_id = 1 }, { ip = 0xFFFFFFFF81209830 }
[12:45:26.842036177] (+0.000009883) andrei sys_open: { cpu_id = 1 }, { ip = 0xFFFFFFFF8120B940 }
[12:45:26.842064681] (+0.000028504) andrei sys_close: { cpu_id = 1 }, { ip = 0xFFFFFFFF81209830 }
[12:45:26.842097484] (+0.000032803) andrei sys_open: { cpu_id = 1 }, { ip = 0xFFFFFFFF8120B940 }
[12:45:26.842126464] (+0.000028980) andrei sys_close: { cpu_id = 1 }, { ip = 0xFFFFFFFF81209830 }
[12:45:26.842141670] (+0.000015206) andrei sys_open: { cpu_id = 1 }, { ip = 0xFFFFFFFF8120B940 }
[12:45:26.842179482] (+0.000037812) andrei sys_close: { cpu_id = 1 }, { ip = 0xFFFFFFFF81209830 }


The output contains the ip field: the address of the probed function in the kernel.
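
Addresses like this can be mapped back to kernel symbols through /proc/kallsyms (a sketch; note that kallsyms prints lowercase hexadecimal addresses, and reading it may require root):

$ sudo grep -i '^ffffffff81209830' /proc/kallsyms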

Using the --function option, you can set dynamic probes on the entry to and exit from a function, for example:

$ lttng enable-event call_rcu_sched -k --function call_rcu_sched
.....
[15:31:39.092335027] (+0.000000742) cs31401 call_rcu_sched_return: { cpu_id = 0 }, { }, { ip = 0xFFFFFFFF810E7B10, parent_ip = 0xFFFFFFFF810A206D }
[15:31:39.092398089] (+0.000063062) cs31401 call_rcu_sched_entry: { cpu_id = 0 }, { }, { ip = 0xFFFFFFFF810E7B10, parent_ip = 0xFFFFFFFF810A206D }
[15:31:39.092398883] (+0.000000794) cs31401 call_rcu_sched_return: { cpu_id = 0 }, { }, { ip = 0xFFFFFFFF810E7B10, parent_ip = 0xFFFFFFFF810A206D }


The above output contains one more field, parent_ip: the address of the function that called the probed function.

In this section we have given only the simplest examples, but the capabilities of LTTng are much broader: by combining different instrumentation methods, you can obtain information about the operation of the system that is difficult or impossible to get with other tools. You can read more about this and see interesting examples in this article on LWN.net.

Conclusion



LTTng is an interesting and promising tool, and one article is clearly not enough to describe all of its features. We will therefore be grateful for any comments and additions.
And if you use LTTng in practice, we invite you to share your experience in the comments.

For everyone who wants to get to know LTTng better, here is a selection of links to interesting and useful materials:



If for one reason or another you can't leave comments here, welcome to our corporate blog.
