Is there any benefit to custom kernels?
Many have heard of optimized, patched kernels; the two I know of are Zen Kernel and pf-kernel. Besides adding new features (TuxOnIce, aufs support), they can improve performance thanks to a different CPU scheduler (BFS) and I/O scheduler (BFQ). In this post I want to compare pf-kernel with the stock kernels in Ubuntu and Arch Linux, and also describe how to build and install pf-kernel on Ubuntu. I don't see much point in testing Zen Kernel: first, the project looks abandoned, and second, its patch set is very similar to pf-kernel's.
Tests
Arch Linux
Let's start with the Arch Linux test on a netbook.
UnixBench results on the stock kernel (3.0-ARCH):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 3432673.5 | lps | 10.0 s | 7 | 116700.0 | 294.1 |
Double-Precision Whetstone | 821.7 | MWIPS | 10.2 s | 7 | 55.0 | 149.4 |
Execl throughput | 1048.3 | lps | 29.7 s | 2 | 43.0 | 243.8 |
File Copy 1024 bufsize 2000 maxblocks | 120834.3 | Kbps | 30.0 s | 2 | 3960.0 | 305.1 |
File Copy 256 bufsize 500 maxblocks | 36417.8 | Kbps | 30.0 s | 2 | 1655.0 | 220.0 |
File Copy 4096 bufsize 8000 maxblocks | 290993.0 | Kbps | 30.0 s | 2 | 5800.0 | 501.7 |
Pipe throughput | 240124.9 | lps | 10.0 s | 7 | 12440.0 | 193.0 |
Pipe-based context switching | 21672.7 | lps | 10.0 s | 7 | 4000.0 | 54.2 |
Process creation | 2885.9 | lps | 30.0 s | 2 | 126.0 | 229.0 |
Shell Scripts (1 concurrent) | 738.5 | lpm | 60.0 s | 2 | 42.4 | 174.2 |
Shell Scripts (8 concurrent) | 135.6 | lpm | 60.4 s | 2 | 6.0 | 226.1 |
System call overhead | 600176.7 | lps | 10.0 s | 7 | 15000.0 | 400.1 |
System Benchmarks Index Score | | | | | | 221.1 |
And here is the same test for pf-kernel (3.0-pf):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 3700926.6 | lps | 10.0 s | 7 | 116700.0 | 317.1 |
Double-Precision Whetstone | 846.1 | MWIPS | 10.2 s | 7 | 55.0 | 153.8 |
Execl throughput | 1343.2 | lps | 29.6 s | 2 | 43.0 | 312.4 |
File Copy 1024 bufsize 2000 maxblocks | 127468.0 | Kbps | 30.0 s | 2 | 3960.0 | 321.9 |
File Copy 256 bufsize 500 maxblocks | 37622.9 | Kbps | 30.0 s | 2 | 1655.0 | 227.3 |
File Copy 4096 bufsize 8000 maxblocks | 342606.2 | Kbps | 30.0 s | 2 | 5800.0 | 590.7 |
Pipe throughput | 296672.7 | lps | 10.0 s | 7 | 12440.0 | 238.5 |
Pipe-based context switching | 41227.5 | lps | 10.0 s | 7 | 4000.0 | 103.1 |
Process creation | 3969.3 | lps | 30.0 s | 2 | 126.0 | 315.0 |
Shell Scripts (1 concurrent) | 861.1 | lpm | 60.1 s | 2 | 42.4 | 203.1 |
Shell Scripts (8 concurrent) | 159.4 | lpm | 60.2 s | 2 | 6.0 | 265.6 |
System call overhead | 642005.3 | lps | 10.0 s | 7 | 15000.0 | 428.0 |
System Benchmarks Index Score | | | | | | 264.6 |
As you can see, the overall index rose from 221.1 to 264.6, an increase of about 20%.
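The percentage can be checked from the two index scores alone. The snippet below is only a convenience sketch: the function name pct_gain is my own, and it simply divides the new index by the old one with awk.

```shell
# Percentage gain of a new UnixBench index score over an old one.
# pct_gain is a hypothetical helper name, not part of UnixBench.
pct_gain() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

pct_gain 221.1 264.6     # Arch Linux, 3.0-ARCH vs 3.0-pf: prints 19.7
pct_gain 2580.4 3050.7   # Ubuntu, 2.6.38-11-generic vs 2.6.38-pf8: prints 18.2
```

Both figures round to the 20% and 18% quoted in the text.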
Ubuntu
Now the same tests on Ubuntu.
On the stock kernel (2.6.38-11-generic):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 39162082.2 | lps | 10.0 s | 7 | 116700.0 | 3355.8 |
Double-Precision Whetstone | 9143.1 | MWIPS | 9.9 s | 7 | 55.0 | 1662.4 |
Execl throughput | 11472.2 | lps | 29.8 s | 2 | 43.0 | 2668.0 |
File Copy 1024 bufsize 2000 maxblocks | 1041722.3 | Kbps | 30.0 s | 2 | 3960.0 | 2630.6 |
File Copy 256 bufsize 500 maxblocks | 327345.4 | Kbps | 30.0 s | 2 | 1655.0 | 1977.9 |
File Copy 4096 bufsize 8000 maxblocks | 1730411.9 | Kbps | 30.0 s | 2 | 5800.0 | 2983.5 |
Pipe throughput | 4204868.3 | lps | 10.0 s | 7 | 12440.0 | 3380.1 |
Pipe-based context switching | 738528.0 | lps | 10.0 s | 7 | 4000.0 | 1846.3 |
Process creation | 32309.9 | lps | 30.0 s | 2 | 126.0 | 2564.3 |
Shell Scripts (1 concurrent) | 11023.5 | lpm | 60.0 s | 2 | 42.4 | 2599.9 |
Shell Scripts (8 concurrent) | 1425.4 | lpm | 60.0 s | 2 | 6.0 | 2375.7 |
System call overhead | 5723850.3 | lps | 10.0 s | 7 | 15000.0 | 3815.9 |
System Benchmarks Index Score | | | | | | 2580.4 |
On pf-kernel (2.6.38-pf8):
Test | Score | Unit | Time | Iters. | Baseline | Index |
---|---|---|---|---|---|---|
Dhrystone 2 using register variables | 71269301.5 | lps | 10.0 s | 7 | 116700.0 | 6107.1 |
Double-Precision Whetstone | 9175.2 | MWIPS | 9.9 s | 7 | 55.0 | 1668.2 |
Execl throughput | 12014.6 | lps | 30.0 s | 2 | 43.0 | 2794.1 |
File Copy 1024 bufsize 2000 maxblocks | 1580881.5 | Kbps | 30.0 s | 2 | 3960.0 | 3992.1 |
File Copy 256 bufsize 500 maxblocks | 428842.2 | Kbps | 30.0 s | 2 | 1655.0 | 2591.2 |
File Copy 4096 bufsize 8000 maxblocks | 2315055.5 | Kbps | 30.0 s | 2 | 5800.0 | 3991.5 |
Pipe throughput | 4389021.4 | lps | 10.0 s | 7 | 12440.0 | 3528.2 |
Pipe-based context switching | 831655.8 | lps | 10.0 s | 7 | 4000.0 | 2079.1 |
Process creation | 34789.6 | lps | 30.0 s | 2 | 126.0 | 2761.1 |
Shell Scripts (1 concurrent) | 11890.9 | lpm | 60.0 s | 2 | 42.4 | 2804.5 |
Shell Scripts (8 concurrent) | 1506.4 | lpm | 60.0 s | 2 | 6.0 | 2510.7 |
System call overhead | 5815793.6 | lps | 10.0 s | 7 | 15000.0 | 3877.2 |
System Benchmarks Index Score | | | | | | 3050.7 |
The increase was 18% (2580.4 to 3050.7), which in my opinion is quite noticeable. Why was the gain slightly smaller in the second test? Most likely because this test ran on x86_64, where the stock kernel already uses more processor-specific optimizations (SSE and others) than the stock kernel built for Pentium Pro does on the Intel Atom.
All this suggests that building your own kernel is worthwhile. The gains are roughly the same on two quite different processors: Intel Atom N270 and Core 2 Duo E8500.
I will not describe installing the kernel on Arch: it is as simple as it gets, and I am sure its users will have no trouble.
Build and install pf-kernel for Ubuntu
Download the kernel source for your version from kernel.org. Note: you need the base release without stable-series patches (for 2.6.38.11, download just 2.6.38).
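As a rough sketch of the version rule above, the helper below strips the distribution suffix and the stable-series number from a version string. The name base_version is my own, not part of any tool, and the function assumes the usual kernel.org numbering: three components for base releases in the 2.6 series, two from 3.0 onward.

```shell
# Reduce a kernel version string to its base release.
# base_version is a hypothetical helper; the numbering rule is an assumption
# that holds for 2.6.x and 3.x kernels.
base_version() {
    v=${1%%-*}    # drop a distro suffix such as -11-generic
    case "$v" in
        2.6.*) printf '%s\n' "$v" | cut -d. -f1-3 ;;   # 2.6.38.11 -> 2.6.38
        *)     printf '%s\n' "$v" | cut -d. -f1-2 ;;   # 3.0.4     -> 3.0
    esac
}

base_version 2.6.38.11       # prints 2.6.38
base_version "$(uname -r)"   # base release of the running kernel
```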
Download the pf-kernel patch for that kernel version from here.
Unpack both archives and, from the root of the kernel source tree, apply the patch (substitute the path to the unpacked pf-kernel patch):
patch -p1 < PATH_TO_PF_KERNEL_PATCH
Copy your current config into the kernel source directory:
cp /boot/config-$(uname -r) .config
Optionally run localmodconfig, which disables every module that is not currently loaded; this can speed up the build considerably:
make localmodconfig
If it complains that /sbin/lsmod does not exist, create a symlink (as root):
ln -s /bin/lsmod /sbin/lsmod
Configure the kernel:
make menuconfig
Enable BFS, BFQ and, if you want it, TuxOnIce; in the processor section it is also worth selecting the optimization for your own processor.
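After leaving menuconfig it can be worth confirming that the choices actually landed in .config. The sketch below is one way to do that with grep. The function name check_options is my own, and the option names in the usage comment (CONFIG_SCHED_BFS, CONFIG_IOSCHED_BFQ) are my assumption about how the BFS and BFQ patches name their symbols, so check them against your patched tree.

```shell
# Report whether each given option is set to y or m in a kernel config file.
# check_options is a hypothetical helper, not a kernel build tool.
check_options() {
    config=$1
    shift
    for opt in "$@"; do
        if grep -q "^${opt}=[ym]" "$config"; then
            echo "$opt enabled"
        else
            echo "$opt missing"
        fi
    done
}

# Usage, from the kernel source root (option names are assumptions):
#   check_options .config CONFIG_SCHED_BFS CONFIG_IOSCHED_BFQ
```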
Apply the fix for kernels from kernel.org; it comments out the line in scripts/setlocalversion that prints "+":
sed -r -i -e 's/echo "\+"/#echo "+"/' scripts/setlocalversion
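Before editing the file in place, the same substitution can be previewed on standard output. This is only a sketch: preview_setlocalversion_fix is a name I made up, and the scratch file stands in for scripts/setlocalversion so nothing in the kernel tree is touched.

```shell
# Print a file with the substitution applied, without modifying it.
# preview_setlocalversion_fix is a hypothetical helper name.
preview_setlocalversion_fix() {
    sed -E 's/echo "\+"/#echo "+"/' "$1"
}

# Demonstration on a scratch file that mimics the relevant lines.
scratch=$(mktemp)
printf '%s\n' '    echo "+"' '    return' > "$scratch"
preview_setlocalversion_fix "$scratch"
rm -f "$scratch"
```

In the kernel tree the call would be preview_setlocalversion_fix scripts/setlocalversion, and piping the result through diff against the original file shows exactly which lines change.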
Clean the directory:
make-kpkg clean
Build:
CONCURRENCY_LEVEL=$(getconf _NPROCESSORS_ONLN) fakeroot make-kpkg --initrd --append-to-version=-pf kernel_image kernel_headers
That is really all there is to it. Install the kernel with dpkg -i *.deb, reboot and select it in the bootloader.
UPDATE:
Zen Kernel showed a practically identical result, slightly better in places but overall by no more than 5%, and then crashed before it had even finished all the tests (a test run takes about 40 minutes).
A certain Mr.z strongly doubted that the calculations were correct. Here in the table you can see the gain for each individual test as well as the average gain, not just the gain in the index. The numbers came out almost exactly the same.
For IoGa, WiseLord and gnomeby: comparing the vanilla kernel with a vanilla kernel built for my architecture showed a performance gain, if any, no larger than the measurement error; there is almost no difference.
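The per-test comparison mentioned in the update can be reproduced from the Index columns of the tables above. UnixBench reports its overall score as the geometric mean of the per-test indices (the Arch table is consistent with that), so the geometric mean of the per-test ratios equals the ratio of the two overall scores. That may be why the averaged figure came out so close to the index gain. The sketch below assumes input lines of the form "stock_index pf_index"; the name mean_gain is my own.

```shell
# Read "stock_index pf_index" pairs on stdin, print the gain for each test
# and the geometric mean of the gains. mean_gain is a hypothetical helper.
mean_gain() {
    awk 'NF == 2 {
             ratio = $2 / $1
             n++
             printf "test %d: %+.1f%%\n", n, (ratio - 1) * 100
             sum += log(ratio)
         }
         END { if (n) printf "geometric mean: %+.1f%%\n", (exp(sum / n) - 1) * 100 }'
}

# First three Arch Linux rows (Dhrystone, Whetstone, Execl), stock vs pf.
mean_gain <<'EOF'
294.1 317.1
149.4 153.8
243.8 312.4
EOF
```

Feeding in all twelve index pairs from one distribution gives the per-test breakdown and an average to compare with the gain in the overall index.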