Criticism of the article "How to C in 2016"

Original author: Keith S. Thompson

From the translator:

This publication is the third and final article in a series that arose spontaneously after the translation of "How to C in 2016" was published in the Inoventica Services blog. It takes issue with some of the theses of the original and completes the overall "picture" of opinions on the questions and C coding practices raised by the author of the first publication. The link to the English-language original was provided by user ImpureThought, for which special thanks to him. The second publication, pointed out, I believe, by the user CodeRush, familiar to many, can be found here.

Matt (whose last name, as far as I can tell, is not given on his website) published the article "How to C in 2016", which later appeared on Reddit and Hacker News; it was on the latter that I came across it.

Yes, one can debate C programming endlessly, but there are points here I clearly disagree with. This critical article is written in the spirit of constructive discussion. It is quite possible that in some cases Matt is right and I am mistaken.

I do not quote Matt's entire publication. In particular, I decided to omit some points with which I agree. Let's get started.

The first rule of programming in C is to not use it if you can get by with other tools.

I do not agree with this statement, but this is too broad a topic for discussion.

When programming in C, clang defaults to C99, so no extra options are required.

That depends on the clang version: clang 3.5 defaults to C99, clang 3.6 to C11. (I am not sure how strict either of them is out of the box.)

If you need to use a specific standard with gcc or clang, do not overcomplicate things: use -std=cNN -pedantic.

gcc-5 defaults to -std=gnu11, but in practice you should specify c99 or c11 without the GNU extensions.

Well, unless you do want to use gcc-specific extensions, which is, in principle, a perfectly legitimate thing to do.

If you find yourself typing char, int, short, long, or unsigned into new code, you are making a mistake.

Excuse me, of course, but this is nonsense. int, in particular, is the most natural integer type for the current platform. If we are talking about a fast signed integer of at least 16 bits, there is nothing wrong with using int (or you could turn to int_least16_t, which will most likely be the same type, but IMHO that is more verbose than it is worth).
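
To make the point concrete, here is a minimal sketch (not from the original article; the variable names are mine): both declarations are equally standard, and on most platforms they name the same type.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* int is guaranteed to be at least 16 bits: fine for a small counter. */
    int counter = 0;

    /* Spells out the same requirement explicitly: the smallest type
       with at least 16 bits, which on most platforms is int anyway. */
    int_least16_t counter2 = 0;

    counter++;
    counter2++;
    printf("%d %d\n", counter, (int)counter2);
    return 0;
}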

In modern programs, you must first #include <stdint.h> and only then use the standard data types.

The fact that int does not have «std» in its name does not mean we are dealing with something non-standard. Types such as int, long, and others are built into the C language. The typedefs defined in <stdint.h> appeared later, as an addition. That does not make them any less "standard" than the built-in types, although in some ways they remain second-class relative to the latter.

float - standard 32-bit floating point
double - standard 64-bit floating point

float and double are indeed very commonly the IEEE 32-bit and 64-bit floating-point types, particularly on modern systems, but do not bet on that when programming in C. I have worked on systems where float was 64 bits.

Please note: no more char. In the C programming language, char is usually not only misnamed but also misused.

Unfortunately, the conflation of characters and bytes in C is unavoidable, and here we are simply stuck. The char type is defined as exactly one byte, where a "byte" is at least 8 bits.

Software developers routinely use char to mean "byte", even when they are performing unsigned byte manipulations. It is much cleaner to use uint8_t for individual unsigned byte/octet values and uint8_t * for a sequence of unsigned byte/octet values.

If you mean bytes, use unsigned char. If you mean octets, choose uint8_t. When CHAR_BIT > 8, uint8_t cannot be defined at all, and therefore code using it will not compile (perhaps that is exactly what you want). If you work with quantities of at least 8 bits, use uint_least8_t. If you assume that bytes are octets, add something like this to the code:

#include <limits.h>
#if CHAR_BIT != 8
    #error "This program assumes 8-bit bytes"
#endif

Please note: POSIX requires CHAR_BIT == 8.

In the C programming language, string literals ("hello") have type char *.

No, string literals have type char []. In particular, "hello" has type char [6]. Arrays are not pointers.
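
A quick way to check this for yourself (my own sketch, not from either article):

#include <stdio.h>

int main(void) {
    /* The literal "hello" has type char[6]: five characters plus '\0'. */
    printf("%zu\n", sizeof "hello");   /* prints 6 */

    /* In most expressions the array decays to char*, which is where the
       confusion comes from; but the literal itself is an array. */
    const char *p = "hello";
    printf("%zu\n", sizeof p);         /* size of a pointer, e.g. 8 */
    return 0;
}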

Do not ever type the word unsigned into your code. You now know how to write decent code without the awkward C convention of multi-word types, which not only hurts readability but also calls the usability of the finished product into question.

Many types in C have names consisting of several words. There is nothing wrong with that. If you are too lazy to type a few extra characters, that is no reason to stuff the code with all kinds of abbreviations.

Who would want to type unsigned long long int when you can restrict yourself to a simple uint64_t?

On the one hand, you can write unsigned long long, omitting the int. At the same time, know that these are different things: unsigned long long is at least 64 bits and may or may not contain padding bits, while uint64_t is exactly 64 bits, has no padding bits, and is not required to exist on every implementation.

unsigned long long is a built-in C type. Any specialist working with this programming language is familiar with it.

Or try uint_least64_t, which may or may not be the same type as unsigned long long.

These types are much more specific and precise in meaning; they convey the author's intent better and are compact, which matters both for usage and for readability.

Of course, the intN_t and uintN_t types are much more specific. But specificity is not always what the code needs. Do not specify what does not matter to you. Choose uint64_t only when you really need exactly 64 bits, no more and no less.

Sometimes exact-width types are required, for example when you need to conform to an externally imposed format (where byte order, element alignment, and the like matter; C does not provide a way to describe such parameters). Most often it is enough to specify a certain range of values, for which the built-in types or the [u]int_leastN_t types are suitable.
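
As an illustration of that distinction, a sketch of mine (the variable names and the scenario are assumptions): an exact-width type where an external format demands it, a least-width type where only the range matters.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Exact width: matching an externally imposed format, e.g. a file
   header that stores the record length in exactly 32 bits. */
static uint32_t wire_length = 1024;

/* Range only: "at least 16 bits" is all this counter needs, so the
   least-width type is the more portable choice. */
static uint_least16_t retry_count = 3;

int main(void) {
    printf("%" PRIu32 " %" PRIuLEAST16 "\n", wire_length, retry_count);
    return 0;
}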

The correct type for pointer math in this case is uintptr_t, defined in <stddef.h>.

What a terrible mistake.

Let's start with the minor error: uintptr_t is defined in <stdint.h>, not in <stddef.h>.

Now the fundamental one. An implementation where void* cannot be converted to some integer type without loss of information is simply not required to define uintptr_t at all (such implementations are extremely rare, if they exist at all).

Instead of:

long diff = (long)ptrOld - (long)ptrNew;


Yes, that’s not how things are done.

Use:

ptrdiff_t diff = (uintptr_t)ptrOld - (uintptr_t)ptrNew;


But this option is no better.

If you want the difference in elements, write:

ptrdiff_t diff = ptrOld - ptrNew;

If you need the difference in bytes, choose something like:

ptrdiff_t diff = (char*)ptrOld - (char*)ptrNew;

If ptrOld and ptrNew do not point into the same object, or just past its end, the behavior of the pointer subtraction is undefined. Converting to uintptr_t guarantees at least some well-defined result, but that result can hardly be called useful. Pointer comparison and other pointer arithmetic make sense only when both pointers refer to the same object or just past its end (exception: == and != work fine for pointers to different objects).
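
For contrast, a small sketch of mine showing the well-defined case, where both pointers refer into the same object:

#include <stddef.h>
#include <stdio.h>

int main(void) {
    double arr[10];
    double *first  = &arr[2];
    double *second = &arr[7];

    /* Well defined, since both pointers point into the same array. */
    ptrdiff_t elems = second - first;                    /* 5 elements */
    ptrdiff_t bytes = (char *)second - (char *)first;    /* 5 * sizeof(double) */

    printf("%td elements, %td bytes\n", elems, bytes);
    return 0;
}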

In such situations it is rational to turn to intptr_t, the integer data type corresponding to the word size of your platform.

But no. The notion of "word size" is too vague. intptr_t is a signed integer type to which void* can be converted and back without loss of information. It may well be wider than void* requires.

On 32-bit platforms, intptr_t turns into int32_t.

It happens, but not always.

On 64-bit platforms, intptr_t takes the form int64_t.

And again, it is likely, but not necessary.
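
The only conversion the standard actually promises is the round trip; a minimal sketch of mine:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int x = 42;
    void *p = &x;

    /* Guaranteed (where intptr_t exists): void* -> intptr_t -> void*
       yields a pointer equal to the original. Nothing is promised
       about intptr_t being int32_t or int64_t on a given platform. */
    intptr_t n = (intptr_t)p;
    void *q = (void *)n;

    printf("round trip ok: %d\n", p == q);
    return 0;
}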

Essentially, size_t is something like "an integer capable of holding the largest array index".

Nooo.

which also means it can hold the largest memory offset in your program.

Yes, this data type can hold the size of the largest object the program can create (there is an opinion that even this is not strictly guaranteed, but in practice you may assume it is). It can therefore hold any memory offset, as long as all offsets are made within a single object.

In any case, size_t has practically the same characteristics as uintptr_t on all modern platforms, so on 32-bit platforms size_t turns into uint32_t, and on 64-bit ones into uint64_t.

Most likely, but not necessary.

More specifically, size_t can hold the size of any single object, while uintptr_t can hold any pointer value, which means it can distinguish the byte addresses of distinct objects. Most modern systems use a single monolithic address space, so in theory the maximum size of an object equals the total size of memory; but nothing in the C standard requires this. You may well encounter a situation where, on a 64-bit system, objects cannot exceed 32 bits.

By emphasizing the word "modern" we automatically set aside both older alternatives (such as x86 with its segmented addressing and near and far pointers) and possible future systems, which may well conform to the C standard while falling outside the definition of "modern".
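
To summarize the safe, portable uses of size_t, a small sketch of mine: object sizes and array indices.

#include <stddef.h>
#include <stdio.h>

int main(void) {
    double values[] = {1.0, 2.0, 3.0};

    /* sizeof yields a size_t: the size of any single object fits in it. */
    size_t count = sizeof values / sizeof values[0];

    /* size_t is likewise the natural type for indices into an object. */
    for (size_t i = 0; i < count; i++)
        printf("%zu: %g\n", i, values[i]);
    return 0;
}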

Never cast types while printing. Always use the proper type specifiers.

That is one way to do it, but by no means the only workable one (and you will surely agree that you still need the cast to void * for "%p").

Raw pointer value: %p (modern compilers print it in hexadecimal; first cast the pointer to void *).

Great tip, except that the output format is implementation-defined. It is usually a hexadecimal value, but do not assume that nothing else can be produced.
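
For reference, the portable pattern looks like this (my sketch):

#include <stdio.h>

int main(void) {
    int x = 0;
    int *p = &x;

    /* %p expects a void*; the cast matters because pointer
       representations may differ between types. The output format is
       implementation-defined: usually hex, but not guaranteed. */
    printf("%p\n", (void *)p);
    return 0;
}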

     printf("Local number: %" PRIdPTR "\n\n", someIntPtr);

The name someIntPtr suggests the type int*, but it actually has type intptr_t.

There is a variation on this theme that spares you from memorizing the endless combinations of macro names:

some_signed_type n;      /* stands for any signed integer type */
some_unsigned_type u;    /* stands for any unsigned integer type */
printf("n = %jd, u = %ju\n", (intmax_t)n, (uintmax_t)u);

intmax_t and uintmax_t are, as a rule, 64-bit. The conversions are much cheaper than the physical I/O anyway.

Note: the % lands inside the body of the format string literal, while the type specifier stays outside it.

They are both parts of the format string. The macros are defined as string literals, which get concatenated with adjacent string literals.
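
To see what the compiler sees, a sketch of mine; the assumption that PRIdPTR expands to "ld" holds on a typical 64-bit Linux, but not everywhere:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    intptr_t n = 42;

    /* After preprocessing, "n = %" PRIdPTR "\n" is the single string
       literal "n = %ld\n" on a platform where PRIdPTR is "ld". The %
       and the length modifier both end up inside the format string. */
    printf("n = %" PRIdPTR "\n", n);
    return 0;
}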

Modern compilers support #pragma once

But no one says that you must use this directive. Even the page he refers to makes no such recommendation. Its section on include guards describes #ifndef and says not a word about #pragma once; #pragma once only flashes by in the next section, "Alternatives to include guards", and even there it is merely noted that this is not a portable option.

This feature is supported by all compilers on all platforms and is a much more efficient mechanism than manually writing header guard code.

And who makes such recommendations? The #ifndef idiom may not be ideal, but it is reliable and portable.
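
For reference, the idiom in question (the guard name is my own invention):

/* foo.h: the portable #ifndef include-guard idiom */
#ifndef FOO_H_INCLUDED
#define FOO_H_INCLUDED

int foo_count(void);

#endif /* FOO_H_INCLUDED */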

IMPORTANT: If your structure has padding bytes, the {0} method will not zero them. For example, struct thing will have 4 bytes of padding after counter (on a 64-bit platform), because structures are padded out in word-size increments. If you need to zero the entire structure, including the padding bytes, use memset(&localThing, 0, sizeof(localThing)), since sizeof(localThing) == 16 bytes even though only 8 + 4 = 12 bytes are actually usable.

The matter is more subtle than it looks. There is usually no reason to pay any attention to the padding bytes. If you still want to devote your precious time to them, use memset to zero them. Note, though, that clearing a structure with memset, even granted that the integer members really do get the value zero, does not guarantee the same effect for floating-point members or pointers: those should become 0.0 and NULL respectively (although on most systems all-bits-zero does give exactly that).
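
A sketch of mine showing the two options side by side, using the struct layout from Matt's example:

#include <stdint.h>
#include <string.h>

struct thing {
    uint64_t count;
    uint32_t counter;
    /* on a typical 64-bit platform: 4 bytes of padding here */
};

int main(void) {
    /* Members are zeroed; the padding bytes are left unspecified. */
    struct thing a = {0};

    /* Every byte is zeroed, padding included. */
    struct thing b;
    memset(&b, 0, sizeof b);

    (void)a;
    (void)b;
    return 0;
}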

C99 allows variable length array initializers.

No, C99 does not allow initializers for VLAs (variable length arrays). But Matt, in fact, is not writing about VLA initializers; he only mentions VLAs themselves.

Variable length arrays are a controversial feature. Unlike malloc, they provide no way to detect an allocation failure. So if you need to allocate N bytes of data, this:

{
    unsigned char *buf = malloc(N);
    if (buf == NULL) { /* allocation failed */ }
    /* ... */
    free(buf);
}

is, at least arguably, safer than this:

{
    unsigned char buf[N];
    /* ... */
}

Yes, a failed VLA allocation is fraught with serious problems. But much the same can be said about practically any feature of any programming language.

The same questions arose with the old fixed-length arrays. As long as you check the size before creating the array, a VLA of size N is just as harmless as a fixed-length array of the same size. Moreover, fixed-length arrays are typically declared larger than the expected number of elements, to leave room for the largest expected data set; with a VLA you can allocate exactly as much space as the data requires. So here I agree with Matt's recommendation.

With one caveat: in C11, VLA support became optional. I doubt that many C11 compilers will actually drop variable length arrays, except for small embedded systems, but this feature is worth remembering if you plan to write maximally portable code.
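
A sketch of mine combining both cautions: validate the size before creating the VLA, and fall back to malloc where a C11 implementation defines __STDC_NO_VLA__ (the limit constant is hypothetical):

#include <stdlib.h>

#define MAX_SANE_N 4096  /* hypothetical upper bound, chosen for this sketch */

void process(size_t n) {
    if (n == 0 || n > MAX_SANE_N)
        return;                    /* validate the size before allocating */
#ifdef __STDC_NO_VLA__
    /* A C11 implementation may omit VLAs entirely. */
    unsigned char *buf = malloc(n);
    if (buf == NULL)
        return;                    /* allocation failure is detectable here */
    /* ... use buf ... */
    free(buf);
#else
    unsigned char buf[n];          /* no worse than a fixed-size array now */
    /* ... use buf ... */
    (void)buf;
#endif
}

int main(void) {
    process(128);
    return 0;
}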

If a function works with *arbitrary* source data and a length to process, do not restrict the type of that parameter.

Obviously wrong:

void processAddBytesOverflow(uint8_t *bytes, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        bytes[0] += bytes[i];
    }
}

Use instead:

void processAddBytesOverflow(void *input, uint32_t len) {
    uint8_t *bytes = input;
    for (uint32_t i = 0; i < len; i++) {
        bytes[0] += bytes[i];
    }
}

I agree: void* is the ideal type for a parameter pointing at an arbitrary chunk of memory. Take the mem* functions in the standard library (but len should be size_t, not uint32_t).

By declaring the source data type as void * and re-casting to the actual type needed inside the function body, you protect the users of your library: they don't have to think about what happens inside it.

A small note: there is no cast in Matt's function. What we actually see is an implicit conversion from void* to uint8_t*.

Some readers have pointed out alignment problems with this example.

And they were mistaken. Treating a given chunk of memory as a sequence of bytes is always safe.

C99 provides us with <stdbool.h>, in which true equals 1 and false equals 0.

Yes, and in addition bool, which serves as an alias for the built-in type _Bool.

For success/failure return values, functions should return true or false, not an int32_t return type with manually entered 1 and 0 (or, worse still, 1 and -1; how are you supposed to tell: is 0 success and 1 failure? Or is 0 success and -1 failure?).

There is a widespread convention, particularly on Unix-like systems, that a function returns 0 on success and some non-zero value (often -1) on failure. In many situations different non-zero results indicate different kinds of errors. When adding new functions to an existing interface, it is important to follow that established convention (0 for success makes sense because, generally speaking, there is only one way for a function to succeed, but many ways for it to fail).

A function that answers a yes/no question should return true or false. Just do not confuse those with success/failure results.

A bool function should be named as an assertion: in English, a phrase that reads as a yes/no question, for example is_foo() or has_widget(). A function that performs some action, where what you need to know is whether it succeeded, answers a different kind of question. In some languages the reasonable approach there is to throw exceptions. In C you have to follow the established conventions, including returning a zero value for success.
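
A sketch of mine with the two naming conventions side by side (both functions are hypothetical):

#include <stdbool.h>
#include <stddef.h>

/* Predicate: named as a yes/no question, returns true or false. */
bool is_empty(const char *s) {
    return s == NULL || s[0] == '\0';
}

/* Action: follows the Unix-style convention, 0 for success and a
   non-zero value for failure. */
int store_widget(const char *path) {
    if (path == NULL)
        return -1;  /* failure */
    /* ... do the actual work ... */
    return 0;       /* success */
}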

The only program that can correctly format C code in 2016 is clang-format. clang-format's default settings are an order of magnitude better than those of any other automatic C code formatter.

I have not used clang-format myself. I will have to give it a try.

But I would like to voice a few fundamental points regarding the formatting of C-code:

  • Opening braces go at the end of the line;
  • Spaces, not tabs;
  • 4 columns per indentation level;
  • Braces always (except for isolated cases where putting the statement directly on the line improves readability);
  • Follow the conventions of the project you are working on.

I rarely turn to automatic formatting tools. Maybe in vain?

Never use malloc. You should always use calloc.

Not so fast. Zeroing all bits of the allocated memory is a rather arbitrary action, and, as a rule, not a good idea. If the code is written correctly, you never access an object without first assigning it a value. With calloc, any bug of that kind in the code will read zeros, which makes it easy to mistake an error for valid data. Does that sound like an improvement to the code?

Zeroing memory often means that a program with a bug runs deterministically; by definition, that is not correct behavior. But deterministic errors are easier to track down.

True, if the code is written without errors. But if you follow a defensive strategy when writing code, it may be worth filling the allocated memory with some known invalid value instead.

On the other hand, if zeroing all the bits happens to solve your task, by all means use calloc.
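
The two strategies side by side, in a sketch of mine (the 0xAA fill value is an arbitrary choice):

#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t n = 64;

    /* calloc: all bits zero. A bug that reads uninitialized memory now
       sees zeros, which can masquerade as valid data. */
    unsigned char *a = calloc(n, 1);

    /* Defensive alternative: fill with a conspicuous garbage value so
       uninitialized reads stand out in a debugger. */
    unsigned char *b = malloc(n);
    if (b != NULL)
        memset(b, 0xAA, n);

    free(a);
    free(b);
    return 0;
}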



P.S.
We also invite readers to join a guided tour of our cloud data center next week. The announcement of the event on Habr is here.
