HotWaterMusic January 17, 2013 at 16:23

The exceptional beauty of Doom 3 source code

Transfer
Recovery mode

Today you are waiting for a story about the source code of Doom 3 and how beautiful it is.
Yes, beautiful . Let me explain.

After the release of my Dyad video game, I decided to take a short break. I finally read some books and watched the films that I had put so far in the far box. Then I worked on the European version of Dyad , but all this time I was mostly waiting for feedback from Sony's quality department, so I had more than enough time. After a month of such hanging out, I seriously thought about what should I do next. I remembered that I had long been planning to separate the pieces of source code that I wanted to use in my new project from Dyad .

When I just started working on Dyad , it was a "transparent" game engine with good functionality, and it turned out like this thanks to my experience on previous projects. Toward the end of the development of the game, he turned into a hopeless mess.

Over the last 6 weeks of developing DyadI added 13k lines of code. Only one source of the main menu MainMenu.cc has inflated to 25,501 lines. Once upon a time, beautiful code turned into a real mess of all sorts of #ifdef, pointers to functions, ugly SIMDs and assembler inserts - and I discovered a new term for myself “code entropy”. With a sad look at all this, I went on a journey through the Internet in search of other projects that would help me understand how other developers are beautifully managing hundreds of thousands of lines of code. But after I looked at the code for a pair of large game engines, I was just discouraged; my "terrible" source code was even clean compared to the rest!

I continued my search, unsatisfied with such a result. In the end, I came across an interesting analysis of the Doom 3 source codefrom id Software , written by Fabian Sanglard .

I spent several days studying the Doom 3 source code and reading Fabian's articles, after which I tweeted:

I spent some time studying the Doom3 source. This is probably the most intuitive and cutest code I've ever seen.

And that was true. Until this moment, I never cared about the source code. Yes, I really don’t like to call myself a “programmer” too much. It works out pretty well for me, but for me, programming lasts exactly until everything starts working. After looking at the Doom 3 source code, I really learned to value good programmers.

***
To get you some idea: Dyad contains 193k lines of code, all in C ++. Doom 3 - 601k, Quake III - 229k and Quake II - 136k. These are big projects.

When I was asked to write this article, I used it as an excuse to read some more source code for other games and articles about programming standards. After several days of my research, I would be embarrassed by my own tweet, and this made me think - so what is it worth considering as “beautiful” source code? I asked several of my fellow programmers what they thought it meant. Their answers were obvious, but it still makes sense to quote them here:

The code must be grouped locally and uniformly functional: One function must do exactly one thing. It should be clear what a particular function does.
Local code should explain, or at least point to, the architecture of the entire system.
The code should be documented "on its own." Comments should be avoided in all possible situations. Comments duplicate the work of both reading and writing code. If you need to comment on something, then most likely it should be rewritten from scratch.

For idTech 4, the code standards are publicly available ( .doc ) and I can recommend them as decent reading. I will go over most of these standards and try to explain how they make Doom 3 code so beautiful.

Universal parsing and parsing

One of the smartest things I've seen in Doom is the use of their lexical analyzer and parser throughout the program. All resource files are ascii files with a single syntax including: scripts, animation files, configs, etc .; everything is the same. This allows you to read and process all files with the same piece of code. The parser is especially reliable, and supports the main subset of C ++. Adherence to a single parser and lexical analyzer helps other components of the engine not to worry about serializing data, since the code responsible for this part of the application has already been written. Thanks to this, the rest of the code becomes much clearer.

Const and strict parameters (Rigid Parameters)

The Doom code is strict enough, but (in my opinion) not strict enough with respect to const. Const serves several reasons that I'm sure too many programmers are ignoring. My rule is: "const should be used everywhere, except in cases where it cannot be used." My dream is that all variables in C ++ are const by default. Doom almost always adheres to a no-output policy for parameters; it means that all parameters passed to the function are either input or output, and never combine this role in one person. This simple trick allows you to see what happens to a variable as quickly as you pass it to a function. For example:

Just defining this function makes me happy!

Out of a few things that immediately catch my eye, much is already becoming clear:

idPlane is passed to the function as an immutable argument. I can safely use the same plane after calling this function without checking for idPlane changes.
I know that epsilon will not be changed inside the function (despite the fact that it can be copied without problems to another variable and used to initialize it - this method will be unproductive)
front, back, frontOnPlaneEdges and backOnPlaceEdges are OUTPUT variables. They will be recorded.
the final const modifier after the parameter list is my favorite. It indicates that idSurface :: Split () will not be able to change the surface itself. This is one of my favorite features in C ++ that I miss so much in other languages. It allows me to do the following:
void f (const idSurface & s) {
s.Split (....);
}

if Split would not be defined as Split (...) const; this code would not compile. Now I will always know that any call to f () will not change the surface, even if f () is passed to the surface by another function or calls any of the Surface :: method () methods. Const tells me a lot about this feature and also gives hints about the overall system architecture. One reading of the declaration of this function makes it clear that surfaces can be dynamically separated by planes. Instead of changing the original surface, new surfaces will be returned to us - front and back, and, possibly, side frontOnPlaneEdges and backOnPlaneEdges.

The rule of using const and the lack of "input-output" parameters in my assessment is one of the most important things that separate good code from amazing. Such an approach makes it easier not only to understand the system itself, but also to change or refactor it.

Minimalistic comments

This paragraph, of course, is still more concerned with code writing style issues, but nevertheless, Doom has such a wonderful thing as the lack of excessive commenting. In my practice, I have come across too much code that is very similar to the following:

Such techniques, in my opinion, are a very, very annoying thing. Why? Because I can already name what this code does - just look at its name. If it is not clear to me the purpose of the method from its name, then its name should be changed. If the name is too long, shorten it. If it cannot be changed and already shortened - well, then you can use the comment. All schoolchildren are taught that commenting is good; but it is not so. Comments are bad, until they are needed. And they are extremely rare. The creators of Doom have done a responsible job in order to keep the number of comments to a minimum. Using idSurface :: Split () as an example, let's look at how it is commented out:

// divides the surface into front and back surfaces, the surface itself remains unchanged
// frontOnPlaneEdges and backOnPlaneEdges optionally store the vertex indices that lie on the edges of the separating plane
// returns SIDE_?

The first line is completely redundant. We already know all this from the definition of a function. The second and third lines carry some new information. We could remove the second line, but this can cause potential ambiguity.

Mostly Doom Codevery severe in relation to their own comments, which makes it much easier to read. I know that this may be a matter of style for some people, but it seems to me that there is definitely a “right” way to do this. For example, what should happen if someone changes a function and removes a constant at the end? In this case, the function call will change for the external code, and now the comment will be unrelated to the code. Extraneous comments harm the readability and accuracy of the code, so that the code gets worse.

Indentation

Doom is not inclined to waste the free vertical space of the screen.
Here is an example from t_stencilShadow :: R_ChopWinding ():

I can read the whole algorithm without problems, because it fits on 1/4 of my screen, leaving the other 3/4 in order to understand how this code can relate to its surroundings. I have seen too much of this in my life:

There will be another remark here that falls under the category of “style”. I programmed for more than 10 years in the style of the last example, and forced myself to switch to a more compact code only six years ago while working on one of the projects. I'm glad I switched in time.

The second method takes 18 lines in relation to 11 lines in the first. Almost twice as many lines of code with the samefunctionality. Also, the next piece of code clearly won't fit on my screen. And what's in it?

This code does not make any sense without the previous piece with a loop. If id did not save vertical space, then their code would become much more difficult to read, support and would immediately lose in beauty.

Another thing that id decided to accept as a permanent rule, I also strongly support it, is the decision to always use {}, even when it is not necessary. I have seen too much code like this:

I could not find any examples in id codewhere they would have missed at least {}. If we omit the additional {}, then parsing the while () block will take several times more time than it should. In addition, any revision turns into a real misery - just imagine that I need to insert an if-condition in the else if (c> d) path.

Minimal use of templates

id violated one of the greatest prohibitions in the C ++ world. They rewrote all the required STL functions. Personally, I am with STL in a “one step from love to hate” relationship. In Dyad, I used it in debug builds to manage dynamic resources. In the release, I packed all the resources so that it became possible to load them as quickly as possible, and they stopped using the STL functionality. STL is a pretty handy thing because it gives you access to basic data structures; its main trouble is that its use leads to ugly code and is prone to errors. For example, take a look at the std :: vector class. Let's say if I need to iterate over all the elements:

In C ++ 11, the same thing looks much easier:

Personally, I don’t like the use of auto, it seems to me that it makes the code easier to write, but harder to read. I sometimes used auto in past years, but now it seems to me that this was the wrong decision. I'm not going to even start discussing the absurdity of some of the STL algorithms, such as std: for_each or std :: remove_if.

Removing a value from std :: vector is also terrible:

Imagine that every programmer must type this line correctly every time!

idremoves all ambiguity: they roll out their own base containers, string class, etc. They try to make them more specific than their counterparts in STL, perhaps in order to make them easier to understand. They are minimally templated and use their own memory allocators. And the STL code is overwhelmed by the constant use of templates so much that it is simply impossible to read.

C ++ code quickly becomes unmanageable and ugly, so programmers constantly have to work hard to get the opposite effect. And so you understand how far things can go, look at this STL source code. The Microsoft and GCC STL implementations are one of the scariest sources I've ever seen. Even if the programmer blows off any dust particles from the template code, the code still turns into a complete mess. For example, look at the Loki library from Andrei Alexandrescu, or the boost libraries - these lines were written by one of the best C ++ programmers in the world, and even his efforts to make them as beautiful as possible could only degenerate into an ugly and completely unreadable code.

How does id solve this problem ?? They simply do not try to bring everything to a “common denominator”, over-generalizing their functions. They have the HashTable and HashIndex classes, the first requires the key type to be const char *, and the second - the pair int-> int. In the case of C ++, this decision is considered to be bad - “should” have a single class HashTable, and write in it two different processing for KeyType = const char * and. But what id did is also correct, and moreover, made their code many times more beautiful.

It’s easy to make sure of this, just trace the contrast between the “good C ++ programming style” to generate the hash and the way id worked with it.

To many, it would seem a good idea to create a special class of calculations that can be passed as a parameter to a HashTable:

it can be set as a specific type:

Now you can pass ComputeHashForType as a HashComputer for a HashTable:

I did it in my own way. It looks like a smart decision, but ... how ugly! What if we deal with a large number of parameters in the template? With a memory allocator? With debugging? Then we get something like this:

Brutal definition of a function, right?

So what is all this for? I could hardly find a method name without a bright syntax highlight. It is likely that the definition of a function will take up more space than its body. Definitely hard to read and not too pretty.

I saw how other engines handle a mess-like method of offloading function arguments using billions of typedefs. This is even worse! Maybe the code “right in front of you” will become more understandable, but the gap between the system and the current code will be even greater than it was before, and this code will no longer indicate the design of the entire system - which violates our beauty principle. For example, we have code:

and

you used them together and did something like this:

Perhaps the StringHashTable memory allocator called StringAllocator is not conducive to global memory, which may confuse you. You will have to look through the entire code, find out that StringHashTable is actually a typedef from intricate templates, go through the source code of the template, find another allocator, find its description ... a nightmare, just a nightmare.

Doom goes against the principles of C ++ logic: the code is written as specific as possible, uses generalizations only where it makes sense. What makes a HashTable from Doomwhen does he need to generate a hash or something else? It calls idStr :: GetHash () because the only type of keys it accepts is const char *. What happens if another key is needed? It seems to me that they template the key and simply force key.getHash (), and the compiler ensures that the key types have int getHash () method.

Residues in the “inheritance” from C

I don’t know exactly how many of the id programmers in the 90s work in the company now, but at least John Carmack himself has extensive programming experience in C. All id games before Quake III were written in C. I have met C ++ programmers who don’t had a lot of programming experience in C, so their code was too C ++ zirovanny. The previous example was just one of many - here are others that I find quite often:

using get / set methods too often
using stringstream
excessive operator overload.

id strictly monitors all of these cases.

It often happens that someone creates a class in this way:

It is a waste of lines of code and the subsequent time to read it. Such an option will eat more of your time than

What if you often have to increase var by a certain number n?

Compared to the

first example, it is much easier to write and read.

id does not use stringstream. stringstream contains one of the most important “bastardizations” of operator overload, which I have encountered: <<.

For example,

It's not beautiful. This method has a strong advantage: you can determine the equivalent of the toString () function from Java for a particular class that will affect the class variables, but the syntax will become too inconvenient, and id decides not to use this method. Opting for printf () instead of stringstream makes the code easier to read, and I think that is the right choice.

Much better!

The syntax of the << operator for SomeClass comes to the ridiculous:

[Note: John Carmack once noticed that statistical code analysis programs helped to find out that their common bug was caused by incorrect matching of parameters in printf (). I wonder if they switched to stringstream in Ragebecause of this? .. GCC and clang both give a similar error message when using the -Wall flag, so you can see for yourself without resorting to expensive analyzers to find these errors.]

Another principle that Doom code does so beautiful, this is a minimal use of operator overload. This very popular and convenient feature introduced in C ++ allows you to do something like this:

Without overloading, these operations will become less obvious and will require more time to write and read. Here doomand stops. I saw the code that goes further. I saw code that overloads the '%' operator to denote the scalar product of vectors, or the Vector * Vector operator that performs vector multiplication. It makes no sense to start an operator * for such an action, which will be feasible only in 3D. After all, if you want to do some_2d_vec * some_2d_vec, then what will you order to do? What about 4d or more? That is why the principle of minimal interference from id is correct - it does not leave us any ambiguities.

Horizontal indentation

One of the most important things I learned from Doom source code was a simple style change. I'm used to that my classes look something like this:

According to the code standard for Doom 3 , id use real tabs, which matches 4 spaces. The same default tab allows all programmers to align the definitions of their classes horizontally without too much thought:

They prefer not to include definitions of inline functions inside class definitions. The only case I met was when the code was written on the same line as the function declaration. Most likely, such a practice is not the norm and is not approved. This way of organizing class definitions makes them easy to read. It may take you a little longer to retype what you already entered to define the methods:

I myself am against an excessive set of characters on the keyboard. The main thing I need is to do my work as quickly as possible - but in this situation, a little busting with typing when defining a class pays off more than once or more if the programmer has to look at the class definition. There are a few more examples of the coding style described in the Doom 3 Coding Standards ( .doc ) document, which is responsible for all the beauty of Doom 3 source code .

Method Names

In my opinion, the rules for naming methods in Doom still lack something. Personally, I like to use this rule in my work: all method names must begin with verbs, and exceptions are only those cases when this cannot be done.

For example:

much better than:

Yes, he is incredibly beautiful.

I am glad that this article was released because it allowed me to think about what we mean by the beauty of the code. To be honest, I'm not sure that I understood at least something. It is possible that all my assessments are too subjective. For myself, I noted at least a couple of very important things - the style of indentation and the constant use of constants.

Many of the choices in the style of the code are my personal preferences; I have no doubt that other programmers will turn out to be completely different. In my opinion, the burden of choosing the style of writing the code lies entirely on the shoulders of the person who is going to write and read it, but you should think about this from time to time.

I would like to advise everyone to look into the source code of Doom 3, because you don’t see such source code every day: here you have a complete set, from system architecture design to tabs between characters.

Sean McGrath ( by Shawn McGrath State ) - the game developer, who lives in Toronto, creator of the popular psychedelic games for Playstation 3 - Puzzle Racing Dyad . We advise you to take a look at his game and follow him on Twitter .

Notes

Note John Carmack

Thank you! A few comments:

I continue to think that in a certain way, Quake 3 code is still better, since it became the pinnacle of my C-style evolution - unlike the first attempt to program the engine in C ++, but this can only be my illusion due to the small amount lines in the first, or due to the fact that I no longer looked into it for a dozen years. I think that “good C ++” is better than “good C” in terms of readability, with the rest being equivalent languages.

I kind of “stuck” it with C ++ in Doom 3 - the fact is that I was an experienced C programmer with OOP skills left over from NeXT and Objective-C, so I started writing C ++ code without complete learning all the principles of using the language. Looking back, I can notice that I regret that I did not read Effective C ++ and something else on this topic. A couple of other programmers had enough C ++ experience, but they mostly followed my stylistic choices.

I did not trust templates for many years, and now I use them with caution, but I somehow decided that the charms of strict typing outweigh the scales in the direction opposite to the strange code in the header files. So the debate around the STL still does not subside in our id, and now they have received an additional “twinkle”. Returning to the days when Doom 3 began, I can almost certainly say that using STL would definitely be a bad idea, but now ... there are a lot of reasonable arguments for, even in the case of games.

Now I have become a terrible const nazi, and I report to any programmer who does not make a variable or parameter a constant, if they could be it.

Касаемо лично меня — моя собственная эволюция направляет меня в сторону более функционального стиля программирования, что подразумевает отучивание от большого числа старых привычек и отход от некоторых приемов ООП.

[www.altdevblogaday.com]

Прим. переводчика

Я сам наткнулся на блог Фабиена около полутора лет тому назад, и могу смело порекомендовать его всем интересующимся — если и не ради вдумчивого чтения, то хотя бы ради вдохновения.
По поводу «чистого» кода — в Твиттере я спрашивал не так давно у Кармака, чего бы он порекомендовал почитать по теме. Он настоятельно советовал книгу «Art of the Readable Code» (Amazon).

Теги: