When is program code admirable?

The topic of ideal code often sparks debate among experienced programmers. That made it all the more interesting to get the opinion of Igor Marnat, director of development at Parallels RAS. Below the cut is his personal view on the topic. Enjoy!

By way of introduction, I would like to explain why I decided to write this short article. Before writing it, I put the question from the title to several developers. I have worked with most of them for more than five years, with some a little less, and I trust their professionalism and experience unconditionally. All of them have more than ten years of experience in industrial development, and all work at Russian and international software companies.

Some colleagues found it difficult to answer (some are still thinking), while others gave one or two examples right away. I asked those who gave examples a clarifying question: “What exactly caused this admiration?” Their answers matched the results of the next stage of my small study, in which I searched the web for answers to this question in various formulations close to the title of the article. The articles answered in much the same way as my colleagues did.

The developers' answers, like the articles I found, concerned the readability and structure of the code, the elegance of its logical constructs, the use of all the features of modern programming languages, and adherence to a particular coding style.

When I asked myself about “divine code”, the answer surfaced immediately, from the subconscious. I thought of two code examples that I worked with a long time ago (more than ten years ago) but that still fill me with admiration and some awe. Having considered the reasons for admiring each of them, I formulated several criteria, which are discussed below. I will touch on the first example only briefly; the second I would like to examine in more detail. By the way, all these criteria are discussed to varying degrees in every developer's reference book, “Code Complete” by Steve McConnell, but this article is noticeably shorter.

90s example

The first example is the implementation of the V.42bis modem protocol. This protocol was developed in the late 80s and early 90s. The interesting idea its developers embodied is stream compression of data during transmission over an unreliable (telephone) line. Stream compression differs fundamentally from file compression. When compressing a file, the archiver can analyze the entire data set, determine the best approach to compressing and encoding it, and write the result to a file as a whole, without worrying about possible loss of data or metadata. When decompressing, in turn, the entire data set is again available, and integrity is ensured by a checksum. With stream compression, only a small window of data is available to the compressor, and there is no guarantee against data loss.

The authors of the algorithm found an elegant solution whose description takes literally a few pages. Many years have passed, but I am still impressed by the beauty and elegance of their approach.

This example concerns the algorithm rather than code per se, so we will not dwell on it further.

Linux is the head of everything!

I would like to analyze the second example of perfect code in more detail. This is the Linux kernel code: code that, as of this writing, runs all 500 supercomputers on the TOP500 list, runs on every second phone in the world, and controls most of the servers on the Internet.

Consider, for example, the file memory.c from the Linux kernel, which belongs to the memory management subsystem.

1. The source code is easy to read. It is written in a very simple style that is easy to follow and hard to get wrong. Capital letters are used only for preprocessor directives and macros; everything else is written in lowercase, and words in names are separated by underscores. This is probably the simplest possible coding style, short of having no style at all. At the same time, the code is perfectly readable. The indentation and the approach to commenting can be seen in any fragment of any kernel file, for example:

static void tlb_remove_table_one(void *table)
{
        /*
         * This isn't an RCU grace period and hence the page-tables cannot be
         * assumed to be actually RCU-freed.
         *
         * It is however sufficient for software page-table walkers that rely on
         * IRQ disabling. See the comment near struct mmu_table_batch.
         */
        smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
        __tlb_remove_table(table);
}

2. There are not many comments in the code, but those that exist are usually useful. As a rule, they describe not the action itself, which is obvious from the code (a classic example of a useless comment is “cnt++; // increment counter”), but its context: why this is done here, why it is done this way, under what assumptions it is used, and which other places in the code it is connected to. For example:

/**
 * tlb_gather_mmu - initialize an mmu_gather structure for page-table tear-down
 * @tlb: the mmu_gather structure to initialize
 * @mm: the mm_struct of the target address space
 * @start: start of the region that will be removed from the page-table
 * @end: end of the region that will be removed from the page-table
 *
 * Called to initialize an (on-stack) mmu_gather structure for page-table
 * tear-down from @mm. The @start and @end are set to 0 and -1
 * respectively when @mm is without users and we're going to destroy
 * the full address space (exit/execve).
 */
void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                        unsigned long start, unsigned long end)
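
The difference between the two kinds of comments can be sketched in a few lines. This is an illustrative example, not kernel code: RING_SIZE, count_event and ring_next are hypothetical names. The first comment merely restates the statement next to it, while the second records the assumption that makes the code correct.

```c
#include <assert.h>

/* Hypothetical ring buffer size; ring_next() relies on it being a power of two. */
#define RING_SIZE 8u

/* The assumption is also enforced at compile time (C11 static_assert). */
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0u,
              "RING_SIZE must be a power of two");

/* Useless comment style: the comment only repeats what the code says. */
unsigned int count_event(unsigned int cnt)
{
        cnt++; /* increment counter */
        return cnt;
}

/*
 * Useful comment style: the comment explains why the code looks this way.
 *
 * The index wraps with a bit mask instead of '%'. This is valid only
 * because RING_SIZE is a power of two, in which case
 * (idx + 1) & (RING_SIZE - 1) equals (idx + 1) % RING_SIZE.
 * If RING_SIZE ever stops being a power of two, this must change too.
 */
unsigned int ring_next(unsigned int idx)
{
        return (idx + 1u) & (RING_SIZE - 1u);
}
```

With RING_SIZE set to 8, ring_next(7) wraps back to 0. A reader who later changes the size learns from the comment, and from the compile-time check, what else depends on it.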

Another use of comments in the kernel is to describe the change history, usually at the beginning of a file. The kernel's history goes back almost thirty years, and some passages are simply interesting to read; you feel like a part of that history:

/*
 * demand-loading started 01.12.91 - seems it is high on the list of
 * things wanted, and it should be easy to implement. - Linus
 */
 
/*
 * Ok, demand-loading was easy, shared pages a little bit tricker. Shared
 * pages started 02.12.91, seems to work. - Linus.
 *
 * Tested sharing by executing about 30 /bin/sh: under the old kernel it
 * would have taken more than the 6M I have free, but it worked well as
 * far as I could see.
 *
 * Also corrected some "invalidate()"s - I wasn't doing enough of them.
 */

3. The kernel code uses special macros to validate data. They are also used to check the context in which the code is running. These macros work much like the standard assert, with two differences: they fire when the condition is true (assert fires when its expression is false), and the developer can choose which action is taken. The general approach to data handling in the kernel is that everything coming from user space is checked, and for erroneous data the corresponding error value is returned. In this case, WARN_ON can be used to write a record to the kernel log. BUG_ON is usually quite useful when debugging new code and bringing the kernel up on new architectures.

The BUG_ON macro usually prints the contents of the registers and the stack and stops either the entire system or the process in whose context the call occurred. The WARN_ON macro simply writes a message to the kernel log when the condition is true. There are also WARN_ON_ONCE and a number of other macros whose behavior is clear from their names.

void unmap_page_range(struct mmu_gather *tlb,
….
        unsigned long next;

        BUG_ON(addr >= end);
        tlb_start_vma(tlb, vma);


int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
…
        unsigned long end = addr + size;
        int err;

        if (WARN_ON(addr >= end))
                return -EINVAL;

An approach in which data from unreliable sources is checked before use, and the system's response to “impossible” situations is anticipated and defined, makes the system much easier to debug and operate. You can regard this approach as an implementation of the “fail early and loudly” principle.
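
The same idea can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel's implementation: USER_WARN_ON, USER_BUG_ON, PAGE_SZ, pages_in_range and count_pages are hypothetical names, and the real macros do considerably more (stack dumps, architecture-specific traps). It only illustrates the split between "report and reject bad input" and "stop on a broken internal invariant".

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical page size used only for this illustration. */
#define PAGE_SZ 4096ul

/* Helper for USER_WARN_ON: log when cond is true and pass cond through. */
static int user_warn_on(int cond, const char *expr, const char *file, int line)
{
        if (cond)
                fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
        return cond;
}

/* Log and continue; evaluates to nonzero when the condition is true. */
#define USER_WARN_ON(cond) \
        user_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Log and terminate: for states that must never happen. */
#define USER_BUG_ON(cond)                                            \
        do {                                                         \
                if (cond) {                                          \
                        fprintf(stderr, "BUG: %s at %s:%d\n", #cond, \
                                __FILE__, __LINE__);                 \
                        abort();                                     \
                }                                                    \
        } while (0)

/* Internal helper: callers must pass an already validated, non-empty range. */
static unsigned long pages_in_range(unsigned long addr, unsigned long end)
{
        USER_BUG_ON(addr >= end);
        return (end - addr + PAGE_SZ - 1) / PAGE_SZ;
}

/*
 * Entry point for untrusted input: an empty or wrapped-around range is
 * logged and rejected with -EINVAL instead of being passed further down.
 */
long count_pages(unsigned long addr, unsigned long size)
{
        unsigned long end = addr + size;

        if (USER_WARN_ON(addr >= end))
                return -EINVAL;
        return (long)pages_in_range(addr, end);
}
```

Here count_pages(0x1000, 0x2000) returns 2, while a zero size or an overflowing addr + size produces a warning on stderr and -EINVAL. The USER_BUG_ON inside pages_in_range would fire only if some future caller skipped the validation.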

4. All core components of the kernel provide users with information about their state through a simple interface, the virtual file system /proc.

For example, information about the state of memory is available in the file /proc/meminfo:

user@parallels-vm:/home/user$ cat /proc/meminfo
MemTotal:        2041480 kB
MemFree:           65508 kB
MemAvailable:     187600 kB
Buffers:           14040 kB
Cached:           246260 kB
SwapCached:        19688 kB
Active:          1348656 kB
Inactive:         477244 kB
Active(anon):    1201124 kB
Inactive(anon):   387600 kB
Active(file):     147532 kB
Inactive(file):    89644 kB
….

The information above is collected and processed in several source files of the memory management subsystem. For example, the first field, MemTotal, is the value of the totalram field of the sysinfo structure, which is filled in by the si_meminfo function in page_alloc.c.
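
User-space code can read the same figure without parsing /proc/meminfo. The sketch below uses the Linux sysinfo(2) system call, whose struct sysinfo also has a totalram field. The helper name total_ram_kb is hypothetical, and it is an assumption that on an ordinary system the result matches MemTotal.

```c
#include <sys/sysinfo.h>

/*
 * Return total RAM in kB as reported by sysinfo(2), or 0 on failure.
 * totalram is expressed in units of mem_unit bytes, so it is scaled
 * to bytes first and then converted to kB.
 */
unsigned long long total_ram_kb(void)
{
        struct sysinfo info;

        if (sysinfo(&info) != 0)
                return 0;
        return (unsigned long long)info.totalram * info.mem_unit / 1024;
}
```

Printing the result with printf("%llu kB\n", total_ram_kb()) gives a number that can be compared with the MemTotal line in the output above.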

Obviously, collecting, storing, and exposing such information takes effort from the developer and adds some overhead to the system. At the same time, the benefit of convenient and simple access to such data is invaluable, both while developing the code and while operating it.

The development of almost any system should start with a mechanism for collecting and exposing information about the internal state of its code and data. This will help greatly during development and testing and, later, in operation.

As Linus said, “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”

5. All code is read and discussed by several developers before it is committed. The change history of the source code is recorded and available. Any line can be traced back to its origin: what changed, who changed it, when, why, and what issues the developers discussed. For example, the change https://github.com/torvalds/linux/commit/1b2de5d039c883c9d44ae5b2b6eca4ff9bd82dac#diff-983ac52fa16631c1e1dfa28fc593d2ef to memory.c made a small optimization (the call that write-protects memory is skipped if the memory is already write-protected).

It is always important for a developer working with code to understand its context: under what assumptions the code was written, and what changed and when. Only then is it clear which scenarios might be affected by the changes the developer is about to make.

6. All important elements of the kernel code's life cycle are documented and available, from the coding style to the contents and release schedule of stable kernel versions. Every developer and user who wants to work with the kernel code in one capacity or another has all the necessary information.

These points seemed important to me; in essence, they shaped my enthusiasm for the kernel code. Obviously, the list is very short and can be extended. But in my opinion, the points above cover key aspects of the life cycle of any source code from the viewpoint of a developer working with it.

What I would like to say in conclusion: kernel developers are smart and experienced, and they have succeeded. The proof is the billions of devices running Linux.

Be like the kernel developers, use best practices, and read Code Complete!

P.S. By the way, what are your own criteria for ideal code? Share your thoughts in the comments.
