Exploring netfilter: writing our own match module based on xt_string to search for several patterns



I recently noticed that Habr has very little information about developing kernel modules. I found only a few articles on the topic.

It has always surprised me that people who know C reasonably well are afraid of the kernel code and avoid even reading it, as if 60% of it were assembler (which is actually not that complicated either). So I plan to write a series of articles on developing or refining existing netfilter and iptables modules.

I hope they will be interesting for beginner kernel developers, driver writers, or just people who want to try their hand at a new area of development.

What we will do

As the title of the article says, we will write a simple iptables module based on xt_string. xt_string is a netfilter module that can search for a sequence of bytes in a packet. However, in my opinion, it lacks the ability to search for several byte sequences in a given order. Well, since it is GPL, what stops us from adding that ability?

So in this article we will write such a module and call it xt_wildstring. For the sake of some shameless PR, it can be used as follows:

iptables -I FORWARD -p tcp --dport 80 --tcp-flags ACK,PSH ACK,PSH -m wildstring --wildstring "reductor*price*carbonsoft.ru" -j DROP

I am writing this article in parallel with the development itself.
It should be noted right away: this module was not written for production. It is only a simple example that shows how to quickly set up development and testing of kernel modules and get to know netfilter a little more deeply.

Briefly about netfilter and iptables

Typically, an iptables module consists of two parts: kernelspace and userspace. The kernelspace part is a Linux kernel module that can be loaded dynamically; it is what processes packets once we add a rule to iptables. The userspace part is the iptables extension itself, which lets you create rules and pass them to the Linux kernel.

Netfilter modules can be divided into three categories:
  • Hooks are essentially the default chains and tables placed on the packet's path through the kernel
  • Matches are modules that return true or false and let you express conditions, for example which protocol the packet belongs to
  • Targets are modules that perform some action on the packet; the best known are ACCEPT and DROP, although there are many more
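As a rough mental model of how matches and targets fit together, here is a small userspace sketch. It is not the netfilter API: the names toy_verdict, toy_rule, toy_match_http_get and toy_eval are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdicts; real netfilter has many more targets. */
enum toy_verdict { TOY_CONTINUE, TOY_ACCEPT, TOY_DROP };

/* A match only answers yes or no for a given packet payload. */
typedef bool (*toy_match_fn)(const unsigned char *data, size_t len);

/* A rule pairs one match with one target verdict. */
struct toy_rule {
	toy_match_fn match;
	enum toy_verdict target;
};

/* Example match: true if the payload starts with "GET ". */
bool toy_match_http_get(const unsigned char *data, size_t len)
{
	return len >= 4 && memcmp(data, "GET ", 4) == 0;
}

/* Walk the chain; the first rule whose match fires decides the verdict. */
enum toy_verdict toy_eval(const struct toy_rule *chain, size_t n,
			  const unsigned char *data, size_t len)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (chain[i].match(data, len))
			return chain[i].target;
	return TOY_CONTINUE;	/* no rule matched: fall through */
}
```

A chain with the single rule { toy_match_http_get, TOY_DROP } drops a payload that begins with "GET " and lets everything else fall through.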

Where these modules live in the sources:

Netfilter is part of the Linux kernel source and, in version 2.6.32, is spread over several directories:
/usr/src/linux/net/netfilter/ - most of the match modules.
/usr/src/linux/net/ipv4/netfilter/ - some of the target modules.
/usr/src/linux/include/linux/netfilter/ - the headers for both.

The iptables (userspace) modules are located in the
/usr/src/iptables/extensions/

directory. The headers of the kernelspace and userspace modules must be identical, so it is better if they are a single file.

Now let's move from theory to practice

We will not reinvent the wheel; that is not what the GPL was invented for. We take the xt_string module from the latest CentOS 6 kernel, as one of the most stable at the moment.

There turned out to be a lot of information about setting up the build system for the module and the test bench, so I hid it under a spoiler. If anything is unclear, or you are curious about where and how things are built, run and tested, it makes sense to look under it.

Build system and test bench setup.
Preparing a build and debugging system

Yes, many people dream of a convenient IDE for Linux kernel development. But, alas, I have not found anything worthwhile. One of the reasons is relatively simple: a segfault in the kernel gives us a kernel panic, and we lose a lot of time rebooting if the panic happens on our working machine. That is why development is usually done in a virtual machine, or on a separate test bench if the code is written for specific hardware. Our module is hardware-independent, so we set up virtual machines.

Installing CentOS on two virtual machines

So that our brain does not sit idle during a kernel panic when something fails (and failures are guaranteed), we proceed as follows. Install two virtual machines that have access to the Internet and to each other. One will be the build machine for the module, and the second will be the test bench.

Getting the Linux and iptables sources on the build machine

By the way, we need a few good and useful programs on the build machine.

yum install git ncurses-devel make gcc rpm-build indent

Now we will bookmark one of the most useful repositories for anyone developing for CentOS:

http://vault.centos.org/

From here we will take the kernel and iptables src.rpm packages.

rpm -i http://vault.centos.org/6.4/os/Source/SPackages/kernel-2.6.32-358.el6.src.rpm
rpm -i http://vault.centos.org/6.4/os/Source/SPackages/iptables-1.4.7-9.el6.src.rpm

Then go to /root/rpmbuild/SPECS/ and unpack the sources with the CentOS patches applied.

rpmbuild -bp iptables.spec
rpmbuild -bp kernel.spec

In /root/rpmbuild/BUILD/ we will see directories with the Linux kernel and iptables sources.

Now the whole kernel has to be built at least once, so that later only the net/netfilter/ directory needs rebuilding when we change our module. For convenience and familiarity, we make symlinks:

ln -s /root/rpmbuild/BUILD/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/ /usr/src/linux
ln -s /root/rpmbuild/BUILD/iptables-1.4.7/ /usr/src/iptables

Go to /usr/src/linux. First, we generate a config.

make menuconfig

We save it and build the entire kernel. By the way, rpmbuild or make may hang at gpg: keyring `./pubring.gpg' created. To avoid this, let's pretend that random is urandom.

rm -f /dev/random
ln -s /dev/urandom /dev/random

And the actual build:

make prepare
make -j 3 
make modules_install

In general, it is a good idea to keep the module source code in a Git repository; mine lives in ~/GIT/wildstring/.

Rebooting the test bench after a kernel panic

There are two ways to do this. In my opinion, the most correct one is to set the /proc/sys/kernel/panic parameter to 2. But the panic output matters to us, so if necessary you can run a script on the host system along these lines:

# $ip and $name are the address and the libvirt domain name of the test bench
while true; do
	if ! ping -qc 1 "$ip"; then
		virt-viewer "$name"
		sleep 2
		virsh destroy "$name"
		virsh start "$name"
		sleep 60
	fi
done

Checking that the module works

test_wildstring() {
	iptables -F OUTPUT
	rmmod xt_wildstring
	insmod xt_wildstring.ko
	iptables -I OUTPUT -p tcp --dport 80 -m wildstring --wildstring "opensource*carbonsoft" -j DROP
	wget -t 1 -T 1 http://carbonsoft.ru/opensource/
	iptables -nvL OUTPUT
}

if [ "$1" = 'while' ]; then
	while true; do
		test_wildstring
		sleep 1
	done
else
	test_wildstring
fi

It can be used as follows:

Single run:
./test_wildstring.sh

Endless loop:
./test_wildstring.sh while

Copying string from Linux and iptables

We find the modules we need and copy them to our repository.

cp -v /usr/src/linux/net/netfilter/xt_string.c ~/GIT/wildstring/xt_wildstring.c
mkdir -p ~/GIT/wildstring/include/linux/netfilter/
cp -v /usr/src/linux/include/linux/netfilter/xt_string.h ~/GIT/wildstring/include/linux/netfilter/xt_wildstring.h
cp -v /usr/src/iptables/extensions/libxt_string.c ~/GIT/wildstring/libxt_wildstring.c

Writing a Makefile

We describe how to build the kernel module and the iptables module, plus targets for code formatting, cleaning the working directory, and a couple of other things.

obj-m += xt_wildstring.o

# Target names userspace, clean and indent are assumed; the original names were lost.
all: module lib

module:
	cp include/linux/netfilter/xt_wildstring.h /usr/src/linux/include/linux/netfilter/xt_wildstring.h
	$(MAKE) -C /lib/modules/2.6.32/build M=$(PWD) modules

lib:
	cp libxt_wildstring.c /usr/src/iptables/extensions
	cp include/linux/netfilter/xt_wildstring.h /usr/src/iptables/include/linux/netfilter/xt_wildstring.h
	$(MAKE) -C /usr/src/iptables/extensions
	cp /usr/src/iptables/extensions/libxt_wildstring.so libxt_wildstring.so

userspace:
	gcc userspace_wildstring.c -o userspace
	rm -f userspace

install:
	scp xt_wildstring.ko root@
	scp libxt_wildstring.so root@

clean:
	rm -f *~ *.ko *.so *.mod.c *.ko.unsigned *.o modules.order Module.symvers

indent:
	Lindent *.c include/linux/netfilter/xt_wildstring.h

Comments on the Makefile:
  • 2.6.32 is hardcoded because uname -r gives 2.6.32-358.0.1.el6.x86_64, but I do not have those exact sources at hand, so the symlink /lib/modules/2.6.32-358.0.1.el6.x86_64/build will not work.
  • Since I am not a Makefile guru and have not come up with a clean and correct way to build libxt_wildstring.so the same way as xt_wildstring.ko, I decided not to bother and wrote this target as plain shell commands.
  • For scp in the install target to work without a password, you need to generate SSH keys on the build machine and copy the public key to the test bench.
  • The Lindent script is copied from /usr/src/linux/scripts/Lindent to /usr/local/bin, because it is used often. I recommend always using it when writing Linux kernel code: when in Rome, do as the Romans do. Better still, run it before each commit.

Removing the clutter with .gitignore

Untracked files in git status are somewhat annoying, so create ~/GIT/wildstring/.gitignore:

*.o
*.so
.*
*.ko
*.ko.unsigned
*.mod.c
!.gitignore

Renaming to wildstring

So that the module does not conflict with the original, it makes sense to rename it and all its functions from string to wildstring. An important point: everything has to be edited, namely the header, the userspace module, and the kernelspace module. Here grep comes to the rescue:

grep -ri string xt_wildstring.c | grep -vi wildstring

Extending match info structure

And again, a little theory: each match module has its own match-info structure, which is filled in from the parameters passed from userspace. It is described in the header file (xt_wildstring.h).

The standard xt_string.h looks as follows:
#ifndef _XT_STRING_H
#define _XT_STRING_H

#define XT_STRING_MAX_PATTERN_SIZE 128
#define XT_STRING_MAX_ALGO_NAME_SIZE 16

enum {
	XT_STRING_FLAG_INVERT		= 0x01,
	XT_STRING_FLAG_IGNORECASE	= 0x02
};

struct xt_string_info
{
	__u16 from_offset; // offset from the start of the packet data where the search begins
	__u16 to_offset; // offset from the start of the packet data where the search ends
	char	algo[XT_STRING_MAX_ALGO_NAME_SIZE]; // search algorithm to use
	char 	pattern[XT_STRING_MAX_PATTERN_SIZE]; // the pattern we are looking for
	__u8 patlen; // pattern length, filled in automatically
	union {
		struct {
			__u8 invert; // inversion flag, as in ! -m string --string "something"
		} v0;
		struct {
			__u8 flags; // I do not remember exactly what this is
		} v1;
	} u;
	/* Used internally by the kernel
	 * Text search config.
	 * A rather amusing field, considering its purpose, but who said that
	 * only Java programmers suffer from config mania?
	 * Let us at least rejoice that it is not in XML.
	 */
	struct ts_config __attribute__((aligned(8))) *config;
};

#endif /*_XT_STRING_H*/

Multiplying several fields of the xt_wildstring_info structure in xt_wildstring.h

First, we add pointers to the substrings. They are pointers rather than character arrays as in the original, because the second and third pointers can be empty when a pattern without asterisks is passed to the module. By analogy, we add variables to store the substring lengths, plus a text search config structure for each pattern. As a result, the structure looks like this:

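/* Constant names assume the string -> wildstring rename described above. */
enum {
	XT_WILDSTRING_FLAG_INVERT	= 0x01,
	XT_WILDSTRING_FLAG_IGNORECASE	= 0x02
};

struct xt_wildstring_info {
	__u16 from_offset;
	__u16 to_offset;
	/* kept from the original; the code below still reads algo and pattern */
	char	algo[XT_WILDSTRING_MAX_ALGO_NAME_SIZE];
	char 	pattern[XT_WILDSTRING_MAX_PATTERN_SIZE];
	/* pointers to the patterns */
	char 	*pattern_part1;
	char 	*pattern_part2;
	char 	*pattern_part3;
	__u8 patlen;
	/* pattern lengths */
	__u8 patlen_part1;
	__u8 patlen_part2;
	__u8 patlen_part3;
	union {
		struct {
			__u8 invert;
		} v0;
		struct {
			__u8 flags;
		} v1;
	} u;
	/* Used internally by the kernel */
	/* the original config should no longer be needed */
	struct ts_config __attribute__((aligned(8))) *config;
	struct ts_config __attribute__((aligned(8))) *config_part1;
	struct ts_config __attribute__((aligned(8))) *config_part2;
	struct ts_config __attribute__((aligned(8))) *config_part3;
};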

Starting to use the new header fields

Go to xt_wildstring.c.

Now it is time to use what we added to the header. To begin with, let's deal with preparing and destroying the search configs.

Here again, a little theory. As a rule, a match module contains the following functions and structures:

  • init - initializes the module when it is loaded;
  • exit - tears the module down when it is unloaded;
  • mt - the function that checks a packet;
  • mt_check - a function that validates how the module is invoked when a rule is added;
  • mt_destroy - a function that releases resources when a rule is deleted;
  • mt_reg - a structure with pointers to mt_check, mt and mt_destroy, plus additional information about the module;
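The roles in the list above can be modelled in plain userspace C. This is only a sketch of the lifecycle (check when a rule is added, match for every packet, destroy when the rule is removed); it does not use the real xt_match structure, and every name in it (toy_info, toy_match, toy_check, toy_mt, toy_destroy, toy_mt_reg) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-rule data, playing the role of the match-info structure. */
struct toy_info {
	const char *pattern;	/* supplied by "userspace" */
	size_t patlen;		/* filled in by the check callback */
	char *prepared;		/* resource owned by the rule */
};

/* Hypothetical analogue of the registration structure (mt_reg). */
struct toy_match {
	const char *name;
	bool (*check)(struct toy_info *info);		/* rule added */
	bool (*match)(const struct toy_info *info,
		      const char *data, size_t len);	/* per packet */
	void (*destroy)(struct toy_info *info);		/* rule removed */
};

/* Validate the rule and allocate what the match will need later. */
bool toy_check(struct toy_info *info)
{
	if (!info->pattern || info->pattern[0] == '\0')
		return false;	/* reject the rule */
	info->patlen = strlen(info->pattern);
	info->prepared = malloc(info->patlen + 1);
	if (!info->prepared)
		return false;
	memcpy(info->prepared, info->pattern, info->patlen + 1);
	return true;
}

/* Return true if the prepared pattern occurs anywhere in the data. */
bool toy_mt(const struct toy_info *info, const char *data, size_t len)
{
	size_t i;

	for (i = 0; i + info->patlen <= len; i++)
		if (memcmp(data + i, info->prepared, info->patlen) == 0)
			return true;
	return false;
}

/* Release whatever toy_check allocated. */
void toy_destroy(struct toy_info *info)
{
	free(info->prepared);
	info->prepared = NULL;
}

const struct toy_match toy_mt_reg = {
	.name = "toy",
	.check = toy_check,
	.match = toy_mt,
	.destroy = toy_destroy,
};
```

The point of the sketch is the ownership rule: anything allocated in the check callback must be released in the destroy callback, because the match callback runs many times in between and should not allocate at all.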

In the original xt_string, a rule is added and removed as follows:

In string_mt_check (adding a rule), the ts_config structure (ts stands for text search) is built from the pattern and the search algorithm. The function that searches the packet data (skb_find_text) takes it as a parameter. The memory occupied by this structure is freed by textsearch_destroy, which is called from string_mt_destroy when the rule is removed from the chain.

Adding a couple of textsearch_prepare calls to xt_wildstring_check

Before changing anything, we stub out the original wildstring_mt function, which is what actually checks a packet as it passes through the rule. Even small changes elsewhere break it, because it depends on them heavily, and at this stage it is not important to us.

static bool
wildstring_mt(const struct sk_buff *skb, const struct xt_match_param *par)
{
	return false;
#if 0
	/* original body of the function, temporarily disabled */
#endif
}

First, we prepare our ts_conf in the xt_wildstring_check function, which is called when a rule is added to iptables. We copy the pointer to the start of the pattern into a temporary variable and walk through it with strsep, which splits the string on a given set of delimiter characters. If a token is found, we compute its length and use it to prepare the text search parameters.

s = (char *) conf->pattern;
conf->pattern_part1 = strsep(&s, delim);
if (!conf->pattern_part1)
	return false; /* the first element must always be present */
conf->patlen_part1 = strlen(conf->pattern_part1);
ts_conf = textsearch_prepare(conf->algo, conf->pattern_part1,
		conf->patlen_part1, GFP_KERNEL, flags);
if (IS_ERR(ts_conf))
	return false;
conf->config_part1 = ts_conf;

We fill in the next two ts_conf in the same way. The only difference is that an empty pattern pointer is no longer an error: we return true and simply work with fewer patterns.

And destroying them in wildstring_mt_destroy

This function is called when the rule is removed from iptables. To release the search configs when the rule is deleted, we multiply the destroy calls in the same way.
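The splitting step can be tried out in userspace before it goes anywhere near the kernel. The sketch below uses the libc strsep, which behaves like the kernel function of the same name; split_wildstring and MAX_PARTS are hypothetical names for this example, and the limit of three parts simply mirrors the three pattern_part fields.

```c
#define _DEFAULT_SOURCE	/* for strsep() on glibc */
#include <stddef.h>
#include <string.h>

#define MAX_PARTS 3

/*
 * Split buf in place on '*' into at most MAX_PARTS tokens.
 * Returns the number of parts found; unused slots are set to NULL.
 * buf is modified: every '*' that ends a token becomes '\0'.
 * Anything after the third token is ignored.
 */
size_t split_wildstring(char *buf, char *parts[MAX_PARTS],
			size_t lens[MAX_PARTS])
{
	char *s = buf;
	size_t n = 0;
	size_t i;

	for (i = 0; i < MAX_PARTS; i++) {
		parts[i] = NULL;
		lens[i] = 0;
	}
	/* strsep() sets s to NULL once the last token has been returned. */
	while (n < MAX_PARTS && s != NULL) {
		char *tok = strsep(&s, "*");

		parts[n] = tok;
		lens[n] = strlen(tok);
		n++;
	}
	return n;
}
```

Note that strsep returns an empty token for two adjacent delimiters, so "a**b" yields "a", "" and "b". Whether an empty part should be rejected when the rule is added is a design decision the sketch leaves open.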

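static void wildstring_mt_destroy(const struct xt_mtdtor_param *par)
{
	struct xt_wildstring_info *conf = WILDSTRING_TEXT_PRIV(par->matchinfo);

	if (conf->pattern_part1)
		textsearch_destroy(conf->config_part1);
	if (conf->pattern_part2)
		textsearch_destroy(conf->config_part2);
	if (conf->pattern_part3)
		textsearch_destroy(conf->config_part3);
}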

Finishing the match function

So, the module now loads and unloads successfully, rules can be added and deleted, and there is no kernel panic. We return to the wildstring_mt function that we stubbed out earlier and add a search for all the patterns passed to it.
First, we need a variable to keep the offset at which the previous substring was found.

unsigned int skb_find = 0;

Admittedly, this is not the best name; something like tmp_from_offset or wildstring_from_offset would be much clearer, but it is already in the commits on GitHub, so, alas, it is too late. Now, instead of returning the result of the first search, we assign it to our new variable and check it. If nothing was found, we return false, and so on until we have gone through all the given patterns.

memset(&state, 0, sizeof(struct ts_state));
skb_find = skb_find_text((struct sk_buff *)skb, conf->from_offset,
		conf->to_offset, conf->config_part1, &state);
if (skb_find == UINT_MAX)
	return false;

We repeat this for config_part2 and config_part3, except that we first check whether pattern_part2 and pattern_part3 are present and return true if one of them is missing.
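The search logic as a whole can be checked in userspace with a naive substring search standing in for skb_find_text. The names find_from and wild_match are hypothetical, and the choice to continue each search right after the end of the previous match is an assumption of this sketch, not something taken from the module.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Naive search for pat (patlen bytes) in data[from..len).
 * Returns the offset of the first match, or (size_t)-1 if there is none,
 * mirroring how skb_find_text() signals "not found" with UINT_MAX.
 * An empty pattern never matches in this sketch.
 */
size_t find_from(const unsigned char *data, size_t len, size_t from,
		 const char *pat, size_t patlen)
{
	size_t i;

	if (patlen == 0 || patlen > len)
		return (size_t)-1;
	for (i = from; i + patlen <= len; i++)
		if (memcmp(data + i, pat, patlen) == 0)
			return i;
	return (size_t)-1;
}

/*
 * True if every non-NULL part occurs in data, in the given order.
 * A NULL part ends the list early, like a pattern with fewer asterisks.
 */
bool wild_match(const unsigned char *data, size_t len,
		const char *const parts[], size_t nparts)
{
	size_t from = 0;
	size_t i;

	for (i = 0; i < nparts && parts[i] != NULL; i++) {
		size_t patlen = strlen(parts[i]);
		size_t pos = find_from(data, len, from, parts[i], patlen);

		if (pos == (size_t)-1)
			return false;
		from = pos + patlen;	/* assumption: continue after the match */
	}
	return true;
}
```

With the parts "reductor", "price" and "carbonsoft.ru", a buffer that contains them in that order matches, while a buffer that contains the same words in reverse order does not.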

Finishing up and checking

Next we fix all the compilation errors. In general, it is better to compile as often as possible and, at each logical milestone, to run the module test in an endless loop until the next part is added or we notice that a kernel panic has happened. This is worth doing because the cost of an error is much higher, and much more time passes between writing the code and verifying that it fully works, than with most userspace utilities. That is why so much attention was paid at the very beginning of the article to a convenient build and debugging setup on the test bench: as everyone knows, no matter how good a thing is inside, if it is inconvenient to use, it will not be used.

We test on a couple of test cases using wget or curl. When creating a rule, it is important to remember that in an HTTP request the GET line comes before the Host header, so the pattern has to be written somewhat backwards:
  • "something*html*example.com"
  • "pron*avi*yoursite"
  • "reductor*scheme*carbonsoft.ru"

That is, we add the rule:

iptables -I OUTPUT -p tcp --dport 80 -m wildstring --wildstring "reductor*scheme*carbonsoft" -j DROP

and try to download the page:

wget -t 1 -T 1 http://www.carbonsoft.ru/products/reductor/carbon-reductor/#scheme

Bingo: the download fails, and iptables -nvL OUTPUT shows an increased packet counter.

Why not lists?

An attentive and experienced reader may object: why all these contortions and crutches, when you could use lists, add and remove a structure consisting of pattern, patlen and config, and then walk the list with list_for_each_entry? But the purpose of the article is to show how a netfilter module is put together, and working with lists in the Linux kernel would add one more concept to the module that has to be understood. Besides, something has to be left to the reader as an exercise.


So, we have learned how to write kernel modules for netfilter. Isn't that great?
The module can be used not only for HTTP but also for many other protocols; perhaps I will add examples later in the comments.

Sources can be found in the opensource section on our website.
