Open source: code humor, code tricks, NOT code nonsense
While tinkering with various open source software, I occasionally come across all sorts of interesting things: sometimes it is just a funny comment, sometimes something witty in a broader sense. Collections like this periodically appear both on the wider Internet and on Habr: there is, for instance, a well-known StackOverflow question about comments in code, and a collection of funny names of legal entities and place names was recently published here. I will try to structure and lay out what I have gradually accumulated. Under the cut, quotes from QEMU, the Linux kernel, and more await you.
Linux kernel
I think it is no secret to many that emails from the Linux Kernel Mailing List regularly get taken apart for quotes. So let's take a look at the code. The kernel build system greets us with a surprise right away: as you know, projects built with Autoconf have a Makefile with two standard cleanup targets, clean and distclean. Naturally, the kernel is not built with Autoconf, and menuconfig alone is worth a lot, so there are more targets here: clean, distclean and mrproper. Yes, "Mr. Proper": cleans kernels twice as fast.
Speaking of configuring the system: once upon a time I was surprised to come across, next to the self-explanatory targets like allnoconfig, allyesconfig (I suspect it can compile in a lot of debugging stuff, so I would not dare boot the result on real hardware...) and allmodconfig, the mysterious target randconfig. "They're mocking me," I thought, and then told an acquaintance about this observation. He replied that it was probably quite a meaningful command, meant not for an actual build but for testing the correctness of the dependencies between options. Today I would call it fuzzing of the configuration parameters.
However, there is life in the kernel outside the build system too: the documentation sometimes has not only technical but also a certain artistic value. Suppose you want to warn hibernation users about its fragility and the risk of data loss if certain rules are not followed. I would have dully written something like WARNING: <insert a couple of the most boring lines>. But the developer who wrote this took a different approach:
Some warnings, first.
* BIG FAT WARNING *********************************************************
*
* If you touch anything on disk between suspend and resume...
* ...kiss your data goodbye.
*
* If you do resume from initrd after your filesystems are mounted...
* ...bye bye root partition.
* [this is actually same case as above]
*
* ...
Little tricks
Not surprisingly, not all code can be compiled with optimizations: when I tried to force them on for all object files, I naturally ran into some entropy source or something of the sort, which raised an #error if optimization was turned on. Well, that's cryptography for you. But how about code that does not compile if you turn off all optimizations, inlining, and so on? How is that possible? Here is such a static assert:
/* SPDX-License-Identifier: GPL-2.0 */
// ...
/*
 * This function doesn't exist, so you'll get a linker error if
 * something tries to do an invalidly-sized xchg().
 */
extern void __xchg_called_with_bad_pointer(void);

static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
{
    unsigned long ret, flags;

    switch (size) {
    case 1:
#ifdef __xchg_u8
        return __xchg_u8(x, ptr);
#else
        local_irq_save(flags);
        ret = *(volatile u8 *)ptr;
        *(volatile u8 *)ptr = x;
        local_irq_restore(flags);
        return ret;
#endif /* __xchg_u8 */
    // ...
    default:
        __xchg_called_with_bad_pointer();
        return x;
    }
}
Apparently, the assumption is that whenever this function is called with a constant size argument, it is inlined and collapses into a single branch of the switch, and that for a valid size this branch is never default:.
In an unoptimized build, this function causes a linker error practically by design...
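For comparison, here is a minimal sketch of how a similar size check could be written with C11 _Static_assert, which does not depend on the optimization level. This is not what the kernel does: xchg_checked, xchg_sized, STATIC_ASSERT_EXPR and XCHG_SIZE_OK are hypothetical names made up for the illustration, and the exchange itself is deliberately non-atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Static assertion usable inside an expression: the anonymous struct
 * exists only to host the _Static_assert declaration. */
#define STATIC_ASSERT_EXPR(cond, msg) \
    ((void)sizeof(struct { _Static_assert(cond, msg); int unused; }))

#define XCHG_SIZE_OK(n) ((n) == 1 || (n) == 2 || (n) == 4 || (n) == 8)

/* Hypothetical, NON-atomic stand-in for a real exchange primitive:
 * stores x into *ptr and returns the previous value. */
uint64_t xchg_sized(uint64_t x, volatile void *ptr, size_t size)
{
    uint64_t ret;

    switch (size) {
    case 1:
        ret = *(volatile uint8_t *)ptr;
        *(volatile uint8_t *)ptr = (uint8_t)x;
        return ret;
    case 2:
        ret = *(volatile uint16_t *)ptr;
        *(volatile uint16_t *)ptr = (uint16_t)x;
        return ret;
    case 4:
        ret = *(volatile uint32_t *)ptr;
        *(volatile uint32_t *)ptr = (uint32_t)x;
        return ret;
    case 8:
        ret = *(volatile uint64_t *)ptr;
        *(volatile uint64_t *)ptr = x;
        return ret;
    default:
        /* The macro below rejects every other size at compile time. */
        assert(!"unreachable when called through xchg_checked()");
        return x;
    }
}

/* The size is checked during compilation, then the call is made. */
#define xchg_checked(ptr, x) \
    (STATIC_ASSERT_EXPR(XCHG_SIZE_OK(sizeof(*(ptr))), "invalidly-sized xchg"), \
     xchg_sized((uint64_t)(x), (ptr), sizeof(*(ptr))))
```

With this variant, calling xchg_checked() on a pointer to, say, a 3-byte structure is rejected by the compiler itself, at any optimization level, instead of surfacing later as a missing symbol at link time.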
Did you know
- ... that the kernel has a JIT compiler for bytecode loaded from user mode? This technology is called eBPF and is used for routing, tracing, and more. By the way, if you are not afraid of experimental kernel tools, take a look at the bpftools package.
- ... that the kernel can be kept busy for about five CPU days? There is a system call, sendfile, that copies bytes from one file descriptor to another. If you give it the same descriptor twice and set the right offset in the file, it will keep re-copying the same data until it has copied 2 GB.
- ... that there is a variant of hibernation carried out by a user process? I would not be surprised if this way the image could be saved to network storage as well.
QEMU
In general, having read Robert Love's book on Linux kernel internals and then dug into the QEMU source code, I felt a certain déjà vu. There were lists embedded in structures by value (rather than via pointers, as taught in introductory programming courses), a certain RCU subsystem (I did not fully understand what it is, but the kernel has one too), and probably a lot more that is similar.
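To make "lists embedded in structures by value" concrete, here is a minimal sketch of an intrusive doubly linked list. It is only loosely modelled on the kernel's list_head; the names list_node, task, list_add_tail and sum_ids are hypothetical, and QEMU's own list macros look different.

```c
#include <assert.h>
#include <stddef.h>

/* The link lives inside the element; the list knows nothing about the payload. */
struct list_node {
    struct list_node *prev, *next;
};

/* Recover a pointer to the enclosing structure from a pointer to its member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty circular list is a head that points to itself. */
void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
void list_add_tail(struct list_node *node, struct list_node *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical payload: the link is a member by value, not a pointer to a node. */
struct task {
    int id;
    struct list_node link;
};

/* Walk the list and add up the ids of all tasks on it. */
int sum_ids(struct list_node *head)
{
    int sum = 0;

    for (struct list_node *n = head->next; n != head; n = n->next)
        sum += container_of(n, struct task, link)->id;
    return sum;
}
```

Because the link is part of the element, adding an element to a list needs no extra allocation, and one structure can sit on several lists at once by embedding several links.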
What is the first thing a tidy person wants to read when starting work on a project? Probably the coding style. And already in this, one might say, ceremonial document we see:
1. Whitespace
Of course, the most important aspect in any coding style is whitespace.
Crusty old coders who have trouble spotting the glasses on their noses
can tell the difference between a tab and eight spaces from a distance
of approximately fifteen parsecs. Many a flamewar has been fought and
lost on this issue.
The same document also takes on the perennial question of maximum line length:
Lines should be 80 characters; try not to make them longer.
...
Rationale:
- Some people like to tile their 24" screens with a 6x4 matrix of 80x24
xterms and use vi in all of them. The best way to punish them is to
let them keep doing it.
...
(Hmm... That is twice as many on each axis as I sometimes use. Is this some kind of Linux HD?)
There is plenty more of interest in there; do read it.
More tricks
They say C is a low-level language. But with enough contortions, it can work wonders of compile-time code generation without any Scala or even C++.
For example, the QEMU codebase hides a file called softmmu_template.h. When I saw this name, I thought it was meant to be copied into my own TCG backend implementation and edited until it turned into a correct TLB implementation. How wrong I was! Here is how it is actually used:
accel/tcg/cputlb.c:
#define DATA_SIZE 1
#include "softmmu_template.h"

#define DATA_SIZE 2
#include "softmmu_template.h"

#define DATA_SIZE 4
#include "softmmu_template.h"

#define DATA_SIZE 8
#include "softmmu_template.h"
As you can see, it is all sleight of hand and no C++. But this is a fairly simple example. How about something more complicated?
There is a file called tcg/tcg-opc.h. Its content is rather mysterious and looks something like this:
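The same "one body, several sizes" idea can be squeezed into a single file for illustration. The sketch below swaps the repeated #include for a function-generating macro, so it is an analogue rather than QEMU's actual mechanism; DEFINE_LOAD and the load_uN functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One "template" body, instantiated once per access width.
 * memcpy keeps the load valid for unaligned addresses. */
#define DEFINE_LOAD(BITS)                       \
    uint##BITS##_t load_u##BITS(const void *p)  \
    {                                           \
        uint##BITS##_t v;                       \
        assert(p != NULL);                      \
        memcpy(&v, p, sizeof v);                \
        return v;                               \
    }

DEFINE_LOAD(8)   /* defines load_u8()  */
DEFINE_LOAD(16)  /* defines load_u16() */
DEFINE_LOAD(32)  /* defines load_u32() */
DEFINE_LOAD(64)  /* defines load_u64() */
```

The header-based approach scales better once the template body grows to hundreds of lines, since a multi-line macro of that size is painful to edit and to debug.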
...
DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
DEF(setcond_i32, 1, 2, 1, 0)
DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
/* load/store */
DEF(ld8u_i32, 1, 1, 1, 0)
DEF(ld8s_i32, 1, 1, 1, 0)
DEF(ld16u_i32, 1, 1, 1, 0)
DEF(ld16s_i32, 1, 1, 1, 0)
...
In fact, everything is very simple. It is used like this:
tcg/tcg.h:
typedef enum TCGOpcode {
#define DEF(name, oargs, iargs, cargs, flags) INDEX_op_ ## name,
#include "tcg-opc.h"
#undef DEF
    NB_OPS,
} TCGOpcode;
Or like this:
tcg/tcg-common.c:
TCGOpDef tcg_op_defs[] = {
#define DEF(s, oargs, iargs, cargs, flags) \
{ #s, oargs, iargs, cargs, iargs + oargs + cargs, flags },
#include "tcg-opc.h"
#undef DEF
};
It is even strange that no other uses turned up. And note that there are no clever code generation scripts involved here: only C, only hardcore.
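This pattern is commonly known as X-macros. Here is a single-file sketch under a few assumptions: the operation list lives in a macro, OP_LIST, instead of a separate header, and the names OP_LIST, op_defs and find_op, as well as the three operations, are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

/* The single source of truth: name, output args, input args, constant args. */
#define OP_LIST(DEF)   \
    DEF(mov,  1, 1, 0) \
    DEF(movi, 1, 0, 1) \
    DEF(add,  1, 2, 0)

/* First expansion: an enum of opcode indices. */
enum opcode {
#define DEF_ENUM(name, oargs, iargs, cargs) OP_##name,
    OP_LIST(DEF_ENUM)
#undef DEF_ENUM
    NB_OPS
};

struct op_def {
    const char *name;
    int nb_oargs, nb_iargs, nb_cargs, nb_args;
};

/* Second expansion: a table of descriptors, indexed by the enum above. */
const struct op_def op_defs[NB_OPS] = {
#define DEF_ROW(name, oargs, iargs, cargs) \
    { #name, oargs, iargs, cargs, oargs + iargs + cargs },
    OP_LIST(DEF_ROW)
#undef DEF_ROW
};

/* Look an opcode up by name; returns NB_OPS when the name is unknown. */
enum opcode find_op(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < NB_OPS; i++)
        if (strcmp(op_defs[i].name, name) == 0)
            return (enum opcode)i;
    return NB_OPS;
}
```

Adding an operation means adding one line to the list; the enum and the table can never drift out of sync.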
Did you know
- ... that QEMU can work not only in full system emulation mode, but can also run a single process built for a different architecture, which communicates with the host kernel?
Java, JVM and all-all-all
But why do I keep going on about Linux? Let's talk about something cross-platform. The JVM, for example. Many developers in this ecosystem have probably already heard of GraalVM. If you have not, then in two words: it is epic. So, having said that about Graal, let's move on to the good old JVM.
Sometimes the JVM needs to stop all managed threads (a particularly tricky garbage collection stage, or something else), but the catch is that threads can only be stopped at so-called safepoints. As explained here, an ordinary check of a global variable takes a long time, including some shamanism with memory barriers. What did the developers do? They limited themselves to a single read of a variable.
There is a joke language called HQ9+. It was created as a "very convenient teaching programming language": it makes the typical tasks given to students very easy to do:
- on the command 'H' the interpreter prints Hello, World!
- on 'Q' it prints the text of the program itself (a quine)
- on '9' it prints the lyrics of the song 99 Bottles of Beer
- on '+' it increments an internal variable by one
- it cannot do anything else, and why would it?..
How does the JVM reach its goal with a single instruction? Very simply: when it needs to stop the threads, it unmaps the memory page holding this variable; the threads fault with SIGSEGV, and the JVM parks them and resumes them when the "maintenance" is over. I remember that on StackOverflow the interview question "How do you crash a JVM?" received this answer:
JNI. In fact, with JNI, crashing is the default mode of operation. You have to work extra hard to get it not to crash.
Jokes aside, sometimes that is really how it goes in the JVM.
Well, since I mentioned code generation in Scala, and we are now talking about this ecosystem, here is an interesting fact: code generation in Scala (the kind done with macros) works roughly as follows. You write code in Scala using the compiler API and compile it. Then, on the next compiler run, you simply put the resulting code generator on the classpath of the compiler itself, and the compiler, on seeing a special directive, calls it, passing in the syntax trees it got at the call site. In return it receives an AST, which is substituted at the call site.
Features of licensing ideologies
I like the free software ideology, but it also has some funny features.
Once, about ten years ago, I updated my Debian stable and, trying to recall the syntax of some command, typed man <command>, to which I got an exhaustive description along the lines of "[program name] is a program whose documentation is distributed under the GNU GFDL license with invariant sections, which is not DFSG-free." In other words, this program was written by some evil proprietary-software people from some FSF... (The discussion can now be found by googling.)
And one small but important library is considered non-free software by some distributions, because the author added to a standard permissive license the requirement that the program be used for good, not evil. Joking aside, I would probably be wary of such a thing in production myself: who knows what ideas about good and evil the author has.
Miscellaneous
Peculiarities of compiler construction in the age of Moore's law
The stern LLVM developers have limited the supported alignment:
The maximum alignment is 1 << 29.
As they say, first it makes you laugh, then it makes you think: the first thought is, who on earth needs 512 MiB alignment? Then I read about writing a kernel in Rust, where they suggest aligning the "page table" structure to 4096 bytes. And once you read Wikipedia, it gets even better:
... a full mapping hierarchy of 4 KiB pages for the whole 48-bit space would take a bit more than 512 GiB of memory (about 0.195% of the 256 TiB virtual space).
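The numbers are easy to check. Below is a sketch of the arithmetic, assuming the usual x86-64 layout that the quote is about: 4 KiB pages, 8-byte entries, and 512 entries per table. All identifiers are hypothetical, including MAX_ALIGN_FROM_QUOTE, which merely restates the 1 << 29 limit quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_SHIFT        12                 /* 4 KiB pages */
#define PT_PAGE_SIZE         4096
#define PT_ENTRY_SIZE        8                  /* bytes per table entry */
#define PT_ENTRIES_PER_TABLE 512
#define MAX_ALIGN_FROM_QUOTE (UINT64_C(1) << 29)

/* A page table aligned to its own size, as in the 4096-byte example. */
struct page_table {
    _Alignas(PT_PAGE_SIZE) uint64_t entries[PT_ENTRIES_PER_TABLE];
};
_Static_assert(sizeof(struct page_table) == PT_PAGE_SIZE,
               "one table fills exactly one page");
_Static_assert(_Alignof(struct page_table) == PT_PAGE_SIZE,
               "the table is page-aligned");

/* Bytes needed to map a whole va_bits-wide address space with 4 KiB pages:
 * the leaf entries first, then every level of tables above them. */
uint64_t full_hierarchy_bytes(unsigned va_bits)
{
    assert(va_bits > PT_PAGE_SHIFT && va_bits <= 64);

    uint64_t entries = UINT64_C(1) << (va_bits - PT_PAGE_SHIFT);
    uint64_t total = 0;

    for (;;) {
        total += entries * PT_ENTRY_SIZE;
        if (entries <= PT_ENTRIES_PER_TABLE)
            break;                              /* reached the top-level table */
        entries /= PT_ENTRIES_PER_TABLE;
    }
    return total;
}
```

For 48 bits the leaf level alone takes 2^36 entries of 8 bytes, that is 2^39 bytes or 512 GiB; the three upper levels add 1 GiB, 2 MiB and 4 KiB, which is where "a bit more than 512 GiB" comes from. Against that, an alignment limit of 512 MiB no longer looks so extravagant.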
Format version: how to store it?
Once I decided to find out why export did not work in a certain program, and it turned out that it does work... Or does it?
Having run the backend commands by hand, I realized that in principle everything was in order; it was just that the version should have been passed as "2.0", while what actually went out was "2". Anticipating a trivial fix by editing a string constant, I discovered the function double getVersion(): well, why not, there is a major, there is a minor, there is even a dot! In the end, though, the fix was not much harder than expected: rather than just increasing the output precision, I changed the data type and passed the version through as a string.
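The failure mode is easy to reproduce. The sketch below assumes the double was printed with a shortest-representation format such as C's %g; the source does not say how the program actually formatted it, and struct version and both functions are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Naive approach: the version is a double, printed in its shortest form. */
int format_double_version(double v, char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%g", v);   /* 2.0 comes out as "2" */
}

/* Safer approach: keep major and minor as separate integers. */
struct version {
    int major;
    int minor;
};

int format_version(struct version v, char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%d.%d", v.major, v.minor);
}
```

Raising the output precision would only hide the problem: as doubles, versions 2.1 and 2.10 are the same number, so no format string can tell them apart.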
About the difference between theorists and practitioners
I think I have already seen somewhere on Habr a translation of an article about the smallest C program that crashes on startup, but still: int main; The symbol main exists, and technically control can be transferred to it. sirikid rightly noted that even the bytes of int are superfluous here. In general, even when talking about a program 9 bytes long, it is better not to throw around claims that it is the smallest... True, the program will crash, but the rules are fully satisfied.
So, we can crash what should work, but what about running the unrunnable?
$ ldd /bin/ls
linux-vdso.so.1 (0x00007fff93ffa000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f0b27664000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0b2747a000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f0b27406000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0b27400000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0b278e9000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0b273df000)
$ /lib/x86_64-linux-gnu/libc.so.6
... and libc answers in a human voice:
GNU C Library (Ubuntu GLIBC 2.28-0ubuntu1) stable release version 2.28.
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 8.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.
Programmers play golf
There is a whole StackExchange site dedicated to Code Golf: competitions along the lines of "solve this problem with the smallest penalty, which depends on the size of the source code". The format itself invites very sophisticated solutions, but sometimes they get too sophisticated. So one of the questions collects the standard forbidden loopholes. I especially like this one:
Using MetaGolfScript
MetaGolfScript is a family of programming languages. For example, the empty program in MetaGolfScript-209180605381204854470575573749277224 prints "Hello, World!".
One-liners
An uninitialized bool that crashes the program, or what undefined behavior can lead to
... which, by the way, is magic. Someone in LLVM made an April Fools' joke, and someone else turned it into a talk: My Little Optimizer: Undefined Behavior is Magic
Remember the controversial call to praise open source developers in the issue tracker? In Binaryen it was implemented simply and tastefully:
Issue: This project is AMAZING
Resolution: wontfix, works as intended
In February 2009, someone asked on StackOverflow: "How can I do such-and-such in a cross-browser way? Or is it that three working browsers is great, and who uses Chrome anyway?"
Finally, where does the title of this article come from? It is a paraphrase of a trick from the output of emcc, the Emscripten compiler:
$ emcc --help
...
emcc: supported targets: llvm bitcode, javascript, NOT elf
(autoconf likes to see elf above to enable shared object support)