Why is GNU make bad?

    GNU make is a well-known utility for automatically building projects. In the UNIX world it is the de facto standard for this task. It never became as popular among Windows developers, which led to the appearance of analogues such as Microsoft's nmake.

    However, despite its popularity, make is a deeply flawed tool. Its reliability is questionable; its performance is poor, especially on large projects; and the makefile language itself is abstruse while lacking many basic features found in most other programming languages.

    Of course, make is not the only utility to automate the build. Many other tools have been created to remove the limitations of make. Some of them are definitely better than the original make, but this has had little effect on the popularity of make. The purpose of this document, in simple terms, is to talk about some of the problems associated with make, so that they do not come as a surprise to you.

    Most of the arguments in this article relate to the original UNIX make and GNU make. Since GNU make today is most likely much more common, when we mention make or “makefiles,” we will mean GNU make.

    The article also assumes that the reader is already familiar with make at a basic level and understands concepts such as "rules", "targets" and "dependencies".

    Language design


    Anyone who has ever written a makefile has most likely run into the best-known "feature" of its syntax: tabs. Every line describing a command in a rule must begin with a tab character. Spaces will not do; it must be a tab. Unfortunately, this is only one of the strange aspects of the make language.

    Recursive make

    Recursive make is a common pattern in which a makefile rule starts another make session. Since each make session reads only its own top-level makefile, this is a natural way to describe the build of a project consisting of several sub-projects.
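
    A minimal sketch of the pattern (the directory names here are illustrative):

    SUBDIRS := libfoo app

    all: $(SUBDIRS)

    $(SUBDIRS):
            $(MAKE) -C $@

    .PHONY: all $(SUBDIRS)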

    "Recursive make" creates so many problems that an entire article has been written to explain why this approach is bad. It identifies many difficulties (some of which are mentioned below), yet writing makefiles that avoid recursion is genuinely hard.

    Parser

    Most programming-language parsers follow the same model. First the source text is "scanned": comments and whitespace are thrown away and the input (written in a fairly free form) is turned into a stream of "tokens" such as punctuation characters, identifiers and reserved words. The token stream is then "parsed" against the language grammar, which defines which combinations and orderings of tokens are valid. Finally, the resulting parse tree is interpreted, compiled, and so on.

    The make parser does not follow this standard model. You cannot parse a makefile without executing it at the same time: variable substitution can occur anywhere, and since the value of a variable is not known in advance, parsing cannot proceed without evaluating it. As a result, writing a separate utility that parses makefiles is a very non-trivial task; you essentially have to implement the whole language.
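
    One small sketch of the consequence (the variable and directory names here are made up): you cannot even tell which files a makefile will read without evaluating it, because the list is computed during evaluation and may come from the environment.

    EXTRA_MAKEFILES ?= $(wildcard local/*.mk)
    -include $(EXTRA_MAKEFILES)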

    The language also has no clear notion of tokens. For example, consider how a comma is treated.

    Sometimes a comma is part of a string and has no special status:
    X = y,z 


    Sometimes a comma separates the strings being compared in a conditional:
    ifeq ($(X),$(Y))


    Sometimes a comma separates function arguments:
    $(filter %.c,$(SRC_FILES))


    But sometimes, even among function arguments, a comma is just part of a string:
    $(filter %.c,a.c b.c c.cpp d,e.c)

    (since filter takes only two arguments, the last comma does not introduce a new argument; it simply becomes one more character of the second argument)

    Whitespace follows equally obscure rules. Sometimes spaces are significant, sometimes they are not. Strings are not enclosed in quotation marks, so it is not visually clear which spaces matter. Because there is no "list" data type (only strings exist), spaces have to serve as the separator between list items. As a result the logic becomes excessively complicated as soon as, for example, a file name contains a space.

    The following example illustrates the confusing whitespace handling: an obscure trick is required to create a variable that ends with a space. (Trailing spaces are normally stripped by the parser, but this happens before variable substitution, not after it.)
    NOTHING :=
    SPACE := $(NOTHING) $(NOTHING)
    CC_TARGET_PREFIX := -o$(SPACE)
    # now rules can be written as $(CC_TARGET_PREFIX)$@


    And that was only commas and spaces. Very few people understand all the intricacies of the make parser.

    Uninitialized and environment variables

    If a makefile refers to an uninitialized variable, make does not report an error. Instead, it takes the value from the environment variable of the same name. If no such environment variable exists, the value is simply assumed to be the empty string.

    This leads to two kinds of problems. First, typos are not caught and are not treated as errors (you can make make warn about such cases, but this behavior is off by default, and sometimes uninitialized variables are used intentionally). Second, environment variables can unexpectedly affect your makefile code. You cannot know for sure which variables a user may have set, so for safety you must initialize every variable before referencing it or appending to it with +=.
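
    A small sketch of the typo problem (the variable names are made up):

    SRC_DIR := src
    SRCS := $(wildcard $(SRCDIR)/*.c)   # typo: SRCDIR silently expands to "", no error is reported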

    There is also a confusing difference between invoking make as "make FOO=1" and as "export FOO=1; make". In the first case the line "FOO = 0" in the makefile has no effect! To make it take effect you have to write "override FOO = 0" instead.
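
    A minimal illustration, assuming a makefile containing only the following (FOO is a made-up name):

    FOO = 0
    all:
            @echo FOO=$(FOO)

    Running "export FOO=1; make" prints FOO=0, because the makefile assignment wins over the environment. Running "make FOO=1" prints FOO=1: the command-line definition silently overrides the makefile unless the assignment is written as "override FOO = 0".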

    Conditional Expression Syntax

    One of the main drawbacks of the makefile language is its limited support for conditionals (conditionals are particularly important for writing cross-platform makefiles). Newer versions of make do support an "else if" syntax. Still, conditionals come in only four basic forms: ifeq, ifneq, ifdef and ifndef. If your condition is more complex and needs "and / or / not" logic, you have to write more cumbersome code.

    Suppose we need to detect Linux/x86 as the target platform. The following hack is the usual way to express an "and" condition:
    ifeq ($(TARGET_OS)-$(TARGET_CPU),linux-x86)
        foo = bar
    endif 


    An "or" condition is not so simple. Suppose we need to detect x86 or x86_64, and instead of "foo = bar" we have more than ten lines of code that we do not want to duplicate. We have several options, each of which is bad:
    
    # Short, but obscure
    ifneq (,$(filter x86 x86_64,$(TARGET_CPU)))
      foo = bar
    endif
    # Verbose, but clearer
    ifeq ($(TARGET_CPU),x86)
      TARGET_CPU_IS_X86 := 1
    else ifeq ($(TARGET_CPU),x86_64)
      TARGET_CPU_IS_X86 := 1
    else
      TARGET_CPU_IS_X86 := 0
    endif
    ifeq ($(TARGET_CPU_IS_X86),1)
      foo = bar
    endif


    Many places in makefiles could be simplified if the language supported full boolean expressions.

    Two kinds of variables

    There are two kinds of variable assignment in make. ":=" evaluates the right-hand side immediately. The plain "=" evaluates it later, when the variable is used. The first kind is what most other programming languages do and is, as a rule, more efficient, particularly when the expression is expensive to evaluate. The second kind, however, is the one used in most makefiles.

    There are legitimate reasons to use "=" (deferred evaluation), but often it can be avoided with a more careful makefile structure. Even leaving performance aside, deferred evaluation makes makefile code harder to read and understand.

    Normally you can read a program from beginning to end, in the order in which it executes, and know exactly what state it is in at any point. With deferred evaluation you cannot know a variable's value without knowing what happens later in the program: the value can effectively change without any direct assignment to the variable. If you try to hunt for errors in the makefile with "debug output", for example like this:
    $(warning VAR=$(VAR))
    ... you may not get what you need.
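
    A small sketch of the difference between the two assignment kinds (the variable names are made up):

    B := first
    A = $(B)      # deferred: re-expanded every time A is used
    C := $(B)     # immediate: fixed to "first" at this point
    B := second
    # from here on, $(A) expands to "second" while $(C) is still "first"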

    Pattern Substitution and File Search

    Some rules use the % sign to stand for the stem of a file name, in order to describe how some files are generated from others. For example, the rule "%.o: %.c" compiles .c files into object files with the .o extension.

    Suppose we need to build the object file foo.o, but the source file foo.c is located somewhere other than the current directory. make has a vpath directive that tells it where to look for such files. Unfortunately, if a file named foo.c exists in more than one of those directories, make may pick the wrong one.

    The following standard makefile pattern breaks if two source files with the same name live in different directories. The problem is that the conversion "source file name => object file name" loses information, yet make's design requires it to perform the reverse mapping.

    O_FILES := $(patsubst %.c,%.o,$(notdir $(C_FILES)))
    vpath %.c $(sort $(dir $(C_FILES)))
    $(LIB): $(O_FILES) 


    Other missing features

    make has no data types other than strings: no booleans, no lists, no dictionaries.
    There is no concept of “scope”. All variables are global.
    Support for loops is limited. $(foreach) evaluates an expression several times and concatenates the results, but you cannot use $(foreach) to create, for example, a group of rules.
    User-defined functions exist, but they have the same limitations as foreach: they can only perform variable substitution and cannot use the full language syntax or introduce new dependencies (a small sketch follows).
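
    For example, $(foreach) can only build up a single string (the names below are made up); the loop body is pure text substitution and cannot by itself introduce new rules:

    MODULES := net ui db
    OBJ_LIST := $(foreach m,$(MODULES),build/$(m).o)   # "build/net.o build/ui.o build/db.o"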

    Reliability


    make's reliability is low, especially on large projects or during incremental builds. Sometimes the build fails with a strange error and you have to resort to "magic spells" such as make clean and hope that everything gets fixed. Sometimes (a more dangerous situation) everything looks fine, but something was not recompiled and your application crashes after it starts.

    Missing Dependencies

    You must tell make about every dependency of every target. If you do not, it will not rebuild the target when a dependency changes. For C/C++, many compilers can generate dependency information in a format make understands. For other tools the situation is much worse. Say we have a Python script that imports other modules. A change to the script changes its output; that much is obvious and easy to express in the makefile. But a change in one of the imported modules can also change the script's output. Describing all of these dependencies completely and keeping them up to date is a non-trivial task.
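
    For C code this is usually solved by asking the compiler itself to emit dependency fragments. A common sketch, assuming a gcc- or clang-style compiler and made-up file names:

    CFLAGS += -MMD -MP            # write foo.d next to foo.o while compiling
    OBJS := foo.o bar.o
    -include $(OBJS:.o=.d)        # pull the generated dependency files in, if they exist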

    Using the "last modification time" of files

    make decides that a target needs rebuilding by comparing its "last modification time" with that of its dependencies. The contents of the files are not examined at all, only their timestamps. But this file system information is not always reliable, especially in a networked environment. The system clock may drift, and other programs sometimes set file modification times explicitly, overwriting the "real" value. When this happens, make does not rebuild targets that need rebuilding, and the result is only a partial recompilation.

    Command line options

    When a program's command line changes, its output can change too (for example, a changed -D option passed to the C preprocessor). make will not recompile anything in this case, which results in an incorrect incremental build.

    You can try to protect yourself by making every target depend on the makefile itself. However, this approach is unreliable: you may miss a target. Moreover, makefiles may include other makefiles, which in turn include still more makefiles; you would have to list them all and keep that list up to date. In addition, many makefile changes are trivial: you most likely do not want to recompile the entire project just because you changed a comment in the makefile.
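
    The brute-force version of this workaround looks something like the following (OBJS is a made-up variable); note that any makefile edit now rebuilds everything:

    $(OBJS): Makefile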

    Inheriting environment variables and depending on them

    Not only does every environment variable become a make variable; these variables are also passed on to every program make runs. Since every user has their own set of environment variables, two users running the same build can get different results.
    Changing any environment variable passed to a child process can change its output. That is, such a change should trigger a rebuild, but make will not do it.

    Multiple Concurrent Sessions

    If you run two instances of make in the same directory at the same time, they will collide while trying to compile the same files. Most likely one of them (or even both) will fail.

    Editing files during a build

    If you edit and save a file while make is running, the result is unpredictable. Maybe make will pick up the changes correctly, maybe not, and you will need to run make again. Or, if you are unlucky, the save can happen at just the wrong moment, so that some targets need rebuilding but subsequent make runs never notice it.

    Leftover files

    Suppose your project originally used the file foo.c, but later this file was removed from the project and from the makefile. The stale object file foo.o remains. This is usually harmless, but such files accumulate over time and occasionally cause problems; for example, they may be picked up by mistake during a vpath search. Another example: suppose one of the files previously generated by make during the build is now checked into the version control system, and the rule that generated it has been removed from the makefile. Version control systems usually refuse to overwrite a file when an unversioned file with the same name already exists (for fear of destroying something important). If you did not notice the error message, did not delete the file by hand and did not check the source directory out again, you will keep building with the stale generated copy instead of the version-controlled one.

    File Name Normalization

    The same file can be referred to by different paths. Even ignoring hard and symbolic links, foo.c, ./foo.c, ../bar/foo.c and /home/user/bar/foo.c can all point to the same file. make should treat them as the same file, but it does not.
    The problem is even worse under Windows, where the file system is not case sensitive.

    Consequences of an interrupted or failed rebuild

    If the build fails partway through, further incremental builds may be unreliable. In particular, if a command returns an error, make does not delete the partially written output file! If you run make again, it may decide the file no longer needs rebuilding and try to use it. make has a special target (.DELETE_ON_ERROR) that makes it delete such files, but it is not enabled by default.
    Pressing Ctrl-C during a build can also leave your source tree in an inconsistent state.
    Every time you run into problems with an incremental build, doubt creeps in: if one file was not rebuilt correctly, who knows how many others are in the same state? In that situation you may need to start over with make clean. The trouble is that make clean gives no guarantees either (see above); you may end up checking the source tree out again into a fresh directory.

    Performance


    make's performance does not scale well (it degrades non-linearly) as the size of the project grows.

    Incremental Build Performance

    You might hope that rebuilding a project takes time proportional to the number of targets that need rebuilding. Unfortunately, this is not the case.
    Because the result of an incremental build does not always inspire confidence, users end up doing a full rebuild more or less regularly: sometimes on demand (if something will not build, try make clean; make), and sometimes routinely, out of paranoia. It is better to be sure and wait for a full rebuild than to risk having some part out of sync with the sources.
    The "last modification time" of a file can change without the file's contents changing. This leads to unnecessary recompilations.
    A poorly written makefile may declare too many dependencies, so targets get recompiled even when their real dependencies have not changed. Careless use of "phony" targets (which are always rebuilt) is another source of errors.
    Even if your makefiles contain no errors and your incremental builds are perfectly reliable, performance is still not ideal. Suppose you edited a single .c file (not a header) in a large project. If you type make at the root of the project, make has to parse all the makefiles, invoking itself recursively many times, and walk all the dependencies to find out what needs rebuilding. The compiler invocation itself can take far less time than all of this overhead.

    Recursive make and performance

    Careless use of recursive make can hurt, for example in the following scenario. Suppose your project contains the source code of two executables, A and B, which both depend on library C. The top-level makefile naturally has to recurse into the directories A and B. We would also like to be able to run make inside A or B alone when we want to build only one of the executables, so the makefiles in A and B must recursively invoke make in ../C as well. And when make is run from the root of the project, we end up entering C twice!
    In this example that does not look too scary, but in large projects make can descend into some directories dozens of times, and each time the makefile there must be read, parsed and all of its dependencies checked. make has no built-in mechanism to prevent this.
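
    A sketch of the layout described above (all names are illustrative):

    # top-level Makefile: recurse into A and B
    all:
            $(MAKE) -C A
            $(MAKE) -C B

    # A/Makefile and B/Makefile each contain a rule like this, so that they can be
    # built standalone - which is exactly why a top-level build enters C twice:
    libs:
            $(MAKE) -C ../C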

    Parallel Make

    The "parallel launch" of the make promises a big increase in speed, especially on modern processors with many cores. Unfortunately, reality is far from promises.
    The text output of a parallel make is hard to read. It is hard to tell which warning or line of output belongs to which command when several processes are writing to the same terminal at once.
    Parallel make is particularly sensitive to correctly specified dependencies. If two rules are not connected by a dependency, make assumes they can run in any order. A sequential make behaves predictably: if A depends on B and C, then B is built first, then C, then A. make is, of course, free to build C before B, but in sequential mode the order is at least fixed.
    In parallel mode B and C may (but need not) be built at the same time. If C in fact depends on B but this dependency is not written in the makefile, then building C will most likely fail (but not necessarily; it depends on timing).
    Parallel make exposes missing-dependency problems in makefiles. That in itself is a good thing, because those problems cause other failures, and being able to catch and fix them is useful. But in practice, on large projects, the experience of running a parallel make is disappointing.
    Parallel make also interacts poorly with recursive make. Each make session is independent: it parallelizes its own work and has no view of the complete dependency tree. We have to find a compromise between reliability and performance. On the one hand, we want to parallelize not just a single makefile but the whole set of makefiles; but since make knows nothing about cross-makefile dependencies, fully parallelizing the sub-makes does not work.
    Some sub-makes can run in parallel; others must run sequentially. Specifying these dependencies is awkward, and it is very easy to miss a few. There is a temptation to fall back to walking the makefile tree sequentially and parallelizing only within a single makefile at a time, but that greatly reduces overall performance, particularly for incremental builds.

    Automatic dependency generation for Microsoft Visual C++

    Many compilers, such as GCC, can emit dependency information in a format make understands. Unfortunately, Microsoft Visual C++ does not. It has a special switch, /showIncludes, but an extra script is needed to translate its output into make's format. That means running a separate script for every C file, and starting, say, a Python interpreter per file is not an instant operation.

    Built-in Rules

    make contains a huge number of built-in rules. They let small makefiles be a bit shorter, but medium and large projects usually override them. They hurt performance, because make has to wade through all these extra patterns when looking for a rule to build a file. Many of them are obsolete - for example, the rules for checking files out of the RCS and SCCS version control systems. Only a handful of people use them, yet these rules slow down everyone else's builds.
    You can disable them from the command line with make -r, but that is not the default. You can also disable them with a directive in the makefile, but that is not the default either, and many people forget to do it.
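
    One commonly seen way to do this from inside the makefile is a sketch like the following (the MAKEFLAGS trick is a convention rather than something make advertises):

    MAKEFLAGS += -r     # ask make to drop its built-in rules
    .SUFFIXES:          # clear the suffix list, disabling the old-style suffix rules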

    Other


    There are a few other complaints about make that do not fall into the previous categories.

    Silence is golden

    According to Eric Raymond, "one of the oldest and most persistent design rules of the UNIX world is that when a program has nothing interesting or unexpected to say, it should say nothing. Well-behaved programs do their work unobtrusively, with a minimum of fuss and bother. Silence is golden." make does not follow this rule.
    When you run make, its output contains every command it runs plus everything those commands write to stdout and stderr. That is too much: important warnings and errors drown in this stream, and the text often scrolls past too quickly to be readable.
    You can reduce this output considerably by running make -s, but that is not the default behavior. There is also no intermediate mode in which make reports what it is doing without printing the full command lines.
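
    The usual hand-rolled middle ground is a sketch like this: silence each command with a leading "@" and print a short status line instead (the names are illustrative):

    %.o: %.c
            @echo "  CC      $@"
            @$(CC) $(CFLAGS) -c $< -o $@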

    Rules with Multiple Outputs

    Some tools generate more than one output file, but a make rule can describe how to build only one target at a time. If you try to write a separate rule for such an additional file, make cannot see the connection between the two rules.
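
    A sketch of the problem, with a hypothetical generator "gen" that writes both out.c and out.h from spec.txt. The obvious formulation is really two independent rules that happen to share a recipe, so make may run the tool twice (and a parallel make may even run both copies at once):

    out.c out.h: spec.txt
            gen spec.txt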

    Warnings That Should Be Errors

    make prints a warning, but does not stop, when it detects a circular dependency. This almost certainly indicates a serious error in the makefile, but make treats the situation as a minor nuisance.
    Similarly, make prints a warning (and keeps going) when it finds two rules describing how to build the same target; it simply ignores one of them. Again, this is a serious bug in the makefile, but make does not see it that way.
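
    A sketch of the second case (the file names are made up): make merely warns that the first recipe is being overridden and then silently uses only the second one.

    foo.o: foo.c
            $(CC) -c foo.c -o foo.o
    foo.o: foo-generated.c
            $(CC) -c foo-generated.c -o foo.o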

    Creating Directories

    It is very convenient to put the output files for different configurations into different directories, so that you do not need to rebuild the entire project when you switch configurations. For example, you can put "debug" binaries into a "debug" directory, and likewise for the "release" configuration. But before you can put files into these directories, you have to create them.
    It would be great if make did this automatically - obviously a target cannot be built if its directory does not exist yet - but make does not.
    Calling mkdir -p $(dir $@) in every rule is not very practical: it is inefficient, and you also have to make sure that an already existing directory is not treated as an error.
    You can try to solve the problem this way:
    debug/%.o: %.c debug
            $(CC) -c $< -o $@
    debug:
            mkdir $@


    It looks workable: if "debug" does not exist, it is created before debug/foo.o is compiled. But it only looks that way. Creating a new file in a directory changes the directory's "last modification time". Suppose we compile two files, debug/foo.o and debug/bar.o. Creating debug/bar.o updates the modification time of the debug directory, which now becomes newer than debug/foo.o, so on the next make run debug/foo.o will be recompiled for no reason. And if recompilation works by deleting the old file and creating a new one (rather than overwriting the existing file), you get an endless series of unnecessary recompilations.
    The solution is to depend on a file inside the directory (for example, debug/dummy.txt) rather than on the directory itself. This requires extra steps in the makefile (touch debug/dummy.txt) and can conflict with make's ability to delete intermediate files automatically. And if you are not careful to add this extra dependency (on dummy.txt) to every target, you will run into problems when running make in parallel.
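
    Newer versions of GNU make (3.80 and later) offer another workaround: an "order-only" prerequisite, written after a "|", requires the directory to exist but ignores its timestamp, which avoids the spurious rebuilds described above. A sketch:

    debug/%.o: %.c | debug
            $(CC) -c $< -o $@
    debug:
            mkdir -p $@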

    Conclusions


    make is a popular utility with many flaws. It can simplify your life, or it can complicate it. If you are working on a large software product, you should consider alternatives to make. If you have to use make, you should at least be aware of its shortcomings.

    PS: everything above is a translation of this article. I had long been meaning to write a post on the same topic, but after reading the article I decided a translation would serve better. Not all of the author's arguments are make-specific (and some apply to absolutely all tools of this kind), but every programmer who has to work with make should know and understand these pitfalls.
