Learning eBPF Tracing: Tutorial and Examples


Hi, Habr! Here is a translation of an article by Brendan Gregg on learning eBPF.

There were at least 24 eBPF talks at the Linux Plumbers conference. eBPF has quickly become not just an invaluable technology but also an in-demand skill. Perhaps you would like to set a goal for the new year: learn eBPF!

The name eBPF ought to stand for something meaningful, such as Virtual Kernel Instruction Set (VKIS), but because of its origins it stands for extended Berkeley Packet Filter. It can be used in many areas: network performance, firewalls, security, tracing, and device drivers. Some of these, such as tracing, have plenty of free documentation online; others do not yet. The term tracing refers to performance analysis and observability tools that can generate per-event information. You may have already used a tracer: tcpdump and strace are specialized tracers.

In this post, I describe how to learn eBPF for tracing, grouped into sections for beginners, experienced users, and advanced users. In summary:

Beginners: run bcc tools
Experienced: develop bpftrace tools
Advanced: develop bcc tools, contribute to bcc and bpftrace

Beginners

1. What are eBPF, bcc, bpftrace and iovisor?

eBPF does for Linux what JavaScript does for HTML (sort of). Instead of a static HTML site, JavaScript lets you define mini-programs that run on events, such as mouse clicks, in a safe virtual machine in the browser. With eBPF, instead of a fixed kernel, you can now write mini-programs that run on events, such as disk I/O, in a safe virtual machine in the kernel. In reality, eBPF is more like the v8 virtual machine that runs JavaScript than like JavaScript itself. eBPF is part of the Linux kernel.

Programming directly in eBPF is incredibly hard, just like coding in v8 bytecode. But nobody codes in v8 bytecode: everyone writes JavaScript, or often a framework on top of JavaScript (jQuery, Angular, React, etc.). It is the same with eBPF: people use it and write code through frameworks. For tracing, the main ones are bcc and bpftrace. They do not live in the kernel code base; they live in a Linux Foundation project on GitHub called iovisor.

2. What is an example of eBPF tracing?

This eBPF-based tool shows completed TCP sessions with their process ID (PID), command name (COMM), kilobytes sent and received (TX_KB, RX_KB), and duration in milliseconds (MS):

# tcplife

PID   COMM       LADDR        LPORT RADDR         RPORT TX_KB RX_KB MS
22597 recordProg 127.0.0.1    46644 127.0.0.1     28527     0     0 0.23
3277  redis-serv 127.0.0.1    28527 127.0.0.1     46644     0     0 0.28
22598 curl       100.66.3.172 61620 52.205.89.26  80        0     1 91.79
22604 curl       100.66.3.172 44400 52.204.43.121 80        0     1 121.38
22624 recordProg 127.0.0.1    46648 127.0.0.1     28527     0     0 0.22
3277  redis-serv 127.0.0.1    28527 127.0.0.1     46648     0     0 0.27
22647 recordProg 127.0.0.1    46650 127.0.0.1     28527     0     0 0.21
3277  redis-serv 127.0.0.1    28527 127.0.0.1     46650     0     0 0.26
[...]

eBPF is not what makes this possible: I could rewrite tcplife to use older kernel technologies. But if I did, we would never run such a tool in production because of the performance overhead, security issues, or both. eBPF made this tool practical: it is efficient and safe. For example, it does not trace every packet, as older techniques did, which can add too much overhead. Instead, it traces only TCP session events, which are far less frequent. This keeps the overhead so low that we can run the tool in production 24x7.
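Because tcplife prints one line per session, its output is easy to post-process. Here is a minimal Python sketch that parses tcplife-style lines and counts sessions per command. It assumes the nine whitespace-separated columns shown above and a COMM value without spaces; the names Session, parse_tcplife, and sessions_per_command are hypothetical and are not part of bcc.

```python
from collections import Counter
from typing import List, NamedTuple


class Session(NamedTuple):
    pid: int
    comm: str
    laddr: str
    lport: int
    raddr: str
    rport: int
    tx_kb: int
    rx_kb: int
    ms: float


def parse_tcplife(text: str) -> List[Session]:
    """Parse tcplife-style output, skipping the header and other non-data lines."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a data row has exactly nine columns and starts with a numeric PID.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        pid, comm, laddr, lport, raddr, rport, tx_kb, rx_kb, ms = fields
        sessions.append(Session(int(pid), comm, laddr, int(lport), raddr,
                                int(rport), int(tx_kb), int(rx_kb), float(ms)))
    return sessions


def sessions_per_command(sessions: List[Session]) -> Counter:
    """Count completed sessions for each command name."""
    return Counter(s.comm for s in sessions)


# Three rows copied from the sample output, plus the header and the elision marker.
SAMPLE = """\
PID COMM LADDR LPORT RADDR RPORT TX_KB RX_KB MS
22597 recordProg 127.0.0.1 46644 127.0.0.1 28527 0 0 0.23
3277 redis-serv 127.0.0.1 28527 127.0.0.1 46644 0 0 0.28
22598 curl 100.66.3.172 61620 52.205.89.26 80 0 1 91.79
[...]
"""
```

Calling sessions_per_command(parse_tcplife(SAMPLE)) gives a per-command session count; the same records could just as easily be filtered by duration or remote port.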

3. How can I use it?

Beginners should start with bcc. See the bcc installation instructions for your operating system. On Ubuntu, it looks something like this:

# sudo apt-get update
# sudo apt-get install bpfcc-tools
# sudo /usr/share/bcc/tools/opensnoop

PID   COMM        FD ERR PATH
25548 gnome-shell 33   0 /proc/self/stat
10190 opensnoop   -1   2 /usr/lib/python2.7/encodings/ascii.x86_64-linux-gnu.so
10190 opensnoop   -1   2 /usr/lib/python2.7/encodings/ascii.so
10190 opensnoop   -1   2 /usr/lib/python2.7/encodings/asciimodule.so
10190 opensnoop   18   0 /usr/lib/python2.7/encodings/ascii.py
10190 opensnoop   19   0 /usr/lib/python2.7/encodings/ascii.pyc
25548 gnome-shell 33   0 /proc/self/stat
29588 device poll  4   0 /dev/bus/usb
^C

I finished by running opensnoop to check that the tools work. If you have come this far, you have used eBPF!
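In opensnoop output, ERR is the errno of a failed open: the rows with FD -1 and ERR 2 are lookups of files that do not exist. As a small illustration, this Python sketch (standard library only) turns such a value into a readable name. The helper describe_open_error is hypothetical and is not part of bcc.

```python
import errno
import os


def describe_open_error(err: int) -> str:
    """Return a readable description of an opensnoop ERR value (0 means success)."""
    if err == 0:
        return "OK"
    # errno.errorcode maps numbers to symbolic names such as ENOENT.
    name = errno.errorcode.get(err, "UNKNOWN")
    return f"{name}: {os.strerror(err)}"


if __name__ == "__main__":
    for err in (0, 2):
        print(err, describe_open_error(err))
```

The exact message text comes from the platform's C library, so it may differ between systems; the symbolic name for 2 is ENOENT.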

Companies such as Netflix and Facebook install bcc on all servers by default. Maybe you will want to do the same.

4. Is there a beginner's guide?

Yes, I wrote a bcc tutorial, which is a good starting point for beginners to eBPF tracing:

bcc Tutorial

As a beginner, you do not need to write any eBPF code. bcc already contains more than 70 tools that you can use right away. The tutorial walks you through eleven of them: execsnoop, opensnoop, ext4slower (or btrfs*, xfs*, zfs*), biolatency, biosnoop, cachestat, tcpconnect, tcpaccept, tcpretrans, runqlat, and profile.

Once you have tried these, just be aware that many more tools exist.

They are also fully documented with man pages and example files. The example files (*_example.txt in bcc/tools) contain screenshots with explanations: for example, biolatency_example.txt. I wrote many of these example files (as well as man pages and tools), which amounts to roughly 50 extra blog posts that you will find in the bcc repository.

What is missing is real production examples. I wrote this documentation when eBPF was so new that it was only available in our test environments, so most of the examples are synthetic. Over time, we will add real-world examples. This is an area where beginners can help: if you solve a problem with one of the tools, consider writing a post and sharing the screenshots, or adding them as an example file.

For experienced users

At this point, you should have bcc running, have tried these tools, and be interested in customizing them and writing your own. The best path is to switch to bpftrace, which has a high-level language that is much easier to learn. The downside is that it is not as customizable as bcc, so you may eventually run into limitations and want to switch back to bcc.

Refer to the bpftrace installation instructions. This is a newer project, so at the time of writing, packages are not yet available for all systems. In the future, it should just be apt-get install bpftrace or something similar.

1. bpftrace tutorial

I developed a tutorial that teaches bpftrace through a series of one-liners:

bpftrace One-Liners Tutorial

There are 12 lessons that will teach you how to work with bpftrace, step by step. Here is an example:

# bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%d %s\n", pid, str(args->filename)); }'
Attaching 1 probe...
181 /proc/cpuinfo
181 /proc/stat
1461 /proc/net/dev
1461 /proc/net/if_inet6
^C

This uses the open syscall tracepoint to trace the PID and path of each opened file.
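The one-liner has a simple shape: a probe name followed by an action block in braces. As a rough Python sketch of that structure, the helper below assembles the same command from its two parts. The name bpftrace_command is hypothetical; only the -e option and the program text come from the example above.

```python
import shlex
from typing import List


def bpftrace_command(probe: str, action: str) -> List[str]:
    """Build the argument list for: bpftrace -e 'probe { action }'."""
    program = f"{probe} {{ {action} }}"
    return ["bpftrace", "-e", program]


# The probe and action from the one-liner above.
OPEN_PROBE = "tracepoint:syscalls:sys_enter_open"
OPEN_ACTION = r'printf("%d %s\n", pid, str(args->filename));'

argv = bpftrace_command(OPEN_PROBE, OPEN_ACTION)

if __name__ == "__main__":
    # Print a shell-quoted form of the command instead of running it,
    # since running bpftrace needs root and an installed bpftrace.
    print(shlex.join(argv))
```

On a machine with bpftrace installed, the list could be passed to subprocess.run as root; here it is only printed.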

2. bpftrace reference manual

For details on bpftrace, I wrote a reference guide containing examples of the syntax, probes, and built-ins:

bpftrace Reference Guide

It is terse on purpose: I try to fit a heading, a summary, and a screenshot on one screen. I think an entry is too long if you look something up and have to page down several times.

3. bpftrace in examples

There are over 20 tools in the bpftrace repository, which you can browse as examples:

bpftrace Tools

For example:

# cat tools/biolatency.bt

[...]
BEGIN
{
    printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}

kprobe:blk_account_io_start
{
    @start[arg0] = nsecs;
}

kprobe:blk_account_io_completion
/@start[arg0]/
{
    @usecs = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}

Like bcc, these tools have man pages and example files. For example, biolatency_example.txt.
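The logic of biolatency.bt is: record a timestamp when an I/O starts, and on completion add the elapsed microseconds to a power-of-two histogram. The following Python sketch imitates that logic in user space to show the idea. It is an illustration under simplifying assumptions, not how bpftrace is implemented: the dictionaries stand in for the @start and @usecs maps, and log2_bucket, on_io_start, and on_io_completion are hypothetical names.

```python
from collections import Counter
from typing import Dict, Tuple

start: Dict[int, int] = {}   # stands in for the @start map, keyed by request id
usecs: Counter = Counter()   # stands in for the @usecs histogram


def log2_bucket(value: int) -> Tuple[int, int]:
    """Return (low, high) of the power-of-two bucket [low, high) containing value."""
    if value <= 0:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)
    return (low, low * 2)


def on_io_start(req: int, now_ns: int) -> None:
    """Like the first kprobe: remember when this request started."""
    start[req] = now_ns


def on_io_completion(req: int, now_ns: int) -> None:
    """Like the second kprobe: bucket the latency, then drop the map entry."""
    # Approximates the /@start[arg0]/ filter: skip requests with no recorded start.
    if req not in start:
        return
    usecs[log2_bucket((now_ns - start[req]) // 1000)] += 1
    del start[req]
```

Feeding in a start at 1000 ns and a completion at 4500 ns for the same request adds one count to the [2, 4) microsecond bucket, and a completion without a recorded start is ignored.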

For advanced users

1. Learn bcc development

I created two documents to help: the bcc Python Developer Tutorial and the bcc Reference Guide.

There are also many examples in bcc/tools/*.py. A bcc tool consists of two parts: the BPF code for the kernel, written in C, and the user-space tool, written in Python (or Lua, or C++). Developing bcc tools is fairly advanced and may involve some gritty kernel or application internals.
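To illustrate the two-part structure, here is a minimal hello-world style sketch in Python. It assumes bcc is installed and that run() is called as root; the names BPF_PROGRAM and run, and the choice of the clone syscall as the event, are made up for this illustration. The kernel part is the C text; the user-space part loads it, attaches it to a kprobe, and prints the trace output.

```python
# Kernel part: BPF code written in C, kept as a string for bcc to compile.
BPF_PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\n");
    return 0;
}
"""


def run() -> None:
    """User-space part: load the C code, attach it, and print trace output."""
    # Imported here so the module loads even where bcc is not installed.
    from bcc import BPF

    b = BPF(text=BPF_PROGRAM)
    # Assumption for this sketch: trace the clone syscall, so a line is
    # printed whenever a new process or thread is created.
    b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
    b.trace_print()  # blocks and prints messages until interrupted with Ctrl-C
```

Calling run() as root on a system with bcc should print one line per clone call; real tools in bcc/tools follow the same split but pass structured data back to Python instead of using bpf_trace_printk.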

2. Contributing

Help is appreciated with the bcc and bpftrace issues.

For bpftrace, I created the bpftrace Internals Development Guide. It is difficult because you are coding in LLVM IR, but if you are up for a challenge...

There is also the kernel side of eBPF (aka BPF): if you browse the bcc and bpftrace issues, you will see several requests for kernel enhancements, for example under the kernel tag in bpftrace. Also watch the netdev mailing list for the latest kernel BPF developments, which are added to net-next before they are merged into mainline Linux.

In addition to writing code, you can also take part in testing, packaging, blog posts and discussions.

Conclusion

eBPF can do many different things. In this post, I covered learning eBPF for tracing and performance analysis. In summary:

Beginners: run bcc tools
Experienced: develop bpftrace tools
Advanced: develop bcc tools, contribute to bcc and bpftrace

I also have a separate eBPF Tracing Tools page that covers all of this in more detail. Good luck!
