Initialization in C++ is seriously bonkers. Just start with C

https://mikelui.io/2019/01/03/seriously-bonkers.html
  • Translation
I was recently reminded why I think it's a bad idea to teach C++ to beginners. It's a bad idea because C++ is an objective mess, albeit a beautiful, twisted, tragic and wondrous mess. Despite the current state of the community, this article is not a polemic against modern C++. Rather, it is partly a follow-up to Simon Brand's article "Initialization in C++ is bonkers", and partly a message to every student who wants to begin their education by gazing into the abyss.

Typical objections from students when they are told they will be learning C:

  • "Does anyone still use that?"
  • "That's stupid"
  • "Why are we learning C?"
  • "We should be learning something better, like C++" (laughter)

It seems many students think learning C doesn't matter much (author's note: it does), and that they should start with C++ instead. Let's look at just one of the reasons why this is an absurd proposition: initializing a goddamn variable. In the original article, Simon Brand assumed the reader was already familiar with the weirdness of initialization in versions before C++11. Here we will look at some of it and go a little further.

Let me first say that this article is my personal opinion and not the official position of Drexel University, where I teach in the Department of Electrical and Computer Engineering. My lectures are usually part of the engineering curriculum rather than computer science, so they lean toward systems programming and embedded systems.

Summary in one GIF


u/AlexAlabuzhev on Reddit managed to sum up the whole article in one GIF. (I believe this is originally Timur Doumler's work.)


I have nothing against C++, but there is a lot in it that you don't need at an early stage.

That's it. Go home. Walk the dog. Do your laundry. Call your mom and tell her you love her. Try a new recipe. Nothing more to read here, folks. Really, think about how bad engineers (that is, me) are at communicating ideas...

There, I tried my best to talk you out of it!

So you're still here? What a trooper. If I could, I'd give you a medal! And some tasty chocolate milk!

Now back to our regularly scheduled... programming.

Initialization in C


Introduction


First, let's look at initialization in C, since C++ behaves similarly for compatibility reasons. This will be pretty quick, because C is so boring and simple (ahem). Initialization in C gets drilled into every newcomer because it works differently from many newer statically typed languages, which either default to sensible values or produce a compilation error.

#include <stdio.h>
int main() {
    int i;
    printf("%d\n", i);
}

Any self-respecting C programmer knows that this initializes i to an indeterminate value (for all intents and purposes, i is uninitialized). The usual advice is to initialize variables where they are defined, for example int i = 0;, and variables should always be initialized before use. No matter how many times you repeat, shout, yell or gently remind students of this, some of them still believe the variable is initialized to 0.

Great, let's try another simple example.

#include <stdio.h>
int i;
int main() {
    printf("%d\n", i);
}

Surely it's the same thing? We have no idea what the value of i is; it could be anything.

Nope.

Since the variable has static storage duration, it is initialized to (unsigned) zero. Why, you ask? Because the standard says so. Pointer types get similar treatment (they become null pointers), but I'm not even going to discuss that in this article.
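C++ keeps this rule, so it can be checked with a short C++ sketch (C++ is the language used for all the examples added here). The helper functions below are hypothetical; they only read objects with static storage duration, which is well-defined, unlike reading the uninitialized local variable from the first example:

```cpp
// Namespace-scope objects have static storage duration, so they are
// zero-initialized before anything else happens.
int global_i;          // becomes 0
int* global_ptr;       // becomes a null pointer

int read_global() {
    return global_i;
}

bool global_ptr_is_null() {
    return global_ptr == nullptr;
}

// A function-local `static` also has static storage duration,
// so the same zero-initialization rule applies to it.
int read_local_static() {
    static int local_static_i;  // becomes 0, initialized exactly once
    return local_static_i;
}
```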

Okay, let's look at a struct.

#include <stdio.h>
struct A {
    int i;
};
int main() {
    struct A a;
    printf("%d\n", a.i);
}

Same thing: a is not initialized, and we get a warning at compile time.

$ gcc -Wuninitialized a.c
a.c: In function ‘main’:
a.c:9:5: warning: ‘a.i’ is used uninitialized in this function [-Wuninitialized]
     printf("%d\n", a.i);

C offers several simple ways to initialize an object: 1) with a helper function, 2) at the point of definition, or 3) by assigning a global default value.

struct A {
    int i;
} const default_A = {0};
void init_A(struct A *ptr) {
    ptr->i = 0;
}
int main() {
    /* helper function */
    struct A a1;
    init_A(&a1);
    /* during definition;
     * Initialize each member, in order. 
     * Any other uninitialized members are implicitly
     * initialized as if they had static storage duration. */
    struct A a2 = {0};
    /* Error! (Well, technically) Initializer lists are 'non-empty' */
    /* struct A a3 = {}; */
    /* ...or use designated initializers if C99 or later */
    struct A a4 = {.i = 0};
    /* default value */
    struct A a5 = default_A;
}

That's almost everything you need to know about initialization in C, and it is already enough to cause plenty of sneaky bugs in student projects. Problems are certain to appear if you assume that everything is initialized to 0 by default.
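The rule stated in the a2 comment above (members without a matching initializer are initialized as if they had static storage duration) carries over to C++ aggregates. A minimal sketch with a hypothetical Point3 type:

```cpp
// Hypothetical aggregate used only for this illustration.
struct Point3 {
    int x;
    int y;
    int z;
};

// Only x gets an explicit initializer. y and z have no matching
// initializer, so they end up zero, just like objects with static
// storage duration.
Point3 make_partial() {
    Point3 p = {7};
    return p;
}
```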

C++ Initialization


Act 1. Our hero begins the journey.


If you can't wait to learn all the horrors of C++, learn how to initialize variables first. We get the same behavior as in C with the previous code, but with a few caveats in the rules. Whenever I use C++-specific jargon, it is deliberate: I am not naming things arbitrarily but pointing at the vast number of new... features C++ has compared to C. Let's start simple:

#include <iostream>
struct A {
    int i;
};
int main() {
    A a;
    std::cout << a.i << std::endl;
}

Here C and C++ behave almost the same. In C, an object of type A is simply created, and its value could be anything. In C++, a is default-initialized, which means the struct is constructed with its default constructor. Because A is so trivial, it has an implicitly-defined default constructor, which in this case does nothing. The implicitly-defined default constructor "has exactly the same effect" as:

struct A {
    A(){}
    int i;
};

To check for an uninitialized value, look at the compile-time warnings. At the time of writing, g++ 8.2.1 gave good warnings, while clang++ 7.0.1 gave nothing in this case (with -Wuninitialized set). Note that optimization is turned on (-O2) so that more of the following examples trigger warnings.

$ g++ -Wuninitialized -O2 a.cpp
a.cpp: In function ‘int main()’:
a.cpp:9:20: warning: ‘a.A::i’ is used uninitialized in this function [-Wuninitialized]
     std::cout << a.i << std::endl;

This is exactly what we would expect from C. So how do we initialize A::i?

Act 2. Our hero stumbles


Perhaps we can use the same methods as in C? After all, C++ is a superset of C, right? (ahem)

#include <iostream>
struct A {
    int i;
};
int main() {
    A a = {.i = 0};
    std::cout << a.i << std::endl;
}

$ g++ -Wuninitialized -O2 -pedantic-errors a.cpp
a.cpp: In function ‘int main()’:
a.cpp:9:12: error: C++ designated initializers only available with -std=c++2a or -std=gnu++2a [-Wpedantic]
     A a = {.i = 0};

There's the rub. Designated initializers are not supported in C++ until C++20, the C++ standard scheduled for release in 2020. Yes, C++ gets this feature 21 years after C did. Note that I added -pedantic-errors to turn off non-standard gcc extensions.

What about this?

#include <iostream>
struct A {
    int i;
};
int main() {
    A a = {0};
    std::cout << a.i << std::endl;
}

$ g++ -Wuninitialized -O2 -pedantic-errors a.cpp
$

Well, at least that works. We can also write A a = {}; with the same effect of zero-initializing a.i. This is because A is an aggregate type. What is that?

Before C++11, an aggregate type is (essentially) either a plain C-style array or a struct that looks like a plain C struct: no private or protected members, no base classes, no user-declared constructors, no virtual functions. An aggregate type gets aggregate initialization. What does that mean?

  1. Each member is initialized, in order, from the corresponding element of the braced initializer list.
  2. Each member without a corresponding element in the list is value-initialized.

Great, and what does that mean? If a member is of another class type with a user-provided constructor, that constructor is called. If it is of a class type without a user-provided constructor, like A, it is recursively value-initialized. If it is of a built-in type, like int i, it is zero-initialized.
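A small sketch of those three cases, using hypothetical types: a nested aggregate member, a member whose class has a user-provided constructor, and a built-in member, none of which has a matching element in the braced list:

```cpp
// Hypothetical types for illustration.
struct Inner {           // aggregate, no user-provided constructor
    int x;
};

struct Tagged {          // has a user-provided default constructor
    Tagged() : v(42) {}
    int v;
};

struct Outer {           // aggregate holding one of each kind of member
    int a;
    Inner inner;
    Tagged tagged;
    double d;
};

// Only `a` has a matching initializer. The other members fall back to:
//   inner  -> recursively initialized, so inner.x == 0
//   tagged -> its user-provided constructor runs, so tagged.v == 42
//   d      -> built-in type, so 0.0
Outer make_outer() {
    Outer o = {5};
    return o;
}
```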

Hoooray! Finally we have some kind of default value: zero! Wow.

After C++11, things look different... we'll come back to that later.

Hard to remember and confusing? Note that every version of C++ has its own set of rules. It is confusing. It's damn messy and nobody likes it. These rules usually work out so that things behave as if you had zero-initialized the members. In practice, though, it's better to initialize everything explicitly. I have no quarrel with aggregate initialization itself, but I dislike having to wade through the depths of the standard to know exactly what happens during initialization.

Act 3. The hero wanders into a cave


Fine, let's initialize A the C++ way, with constructors (solemn music)! We can give the member i of A an initial value in a user-provided default constructor:

struct A {
    A() : i(0) {}
    int i;
};

This initializes i in the member initializer list. A dirtier way is to assign the value inside the constructor body:

struct A {
    A() { i = 0; }
    int i;
};

Since the constructor body can do almost anything, it's better to keep initialization in the member initializer list (which is technically part of the constructor's body).

In C++11 and later, you can use default member initializers (seriously, use them whenever you can).

struct A {
    int i = 0; // default member initializer, available in C++11 and later
};
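A short sketch, with a hypothetical Counter type, of how default member initializers interact with constructors: a defaulted constructor picks them up, and a constructor that initializes a member in its member initializer list overrides that member's default while the other members keep theirs:

```cpp
// Hypothetical type for illustration.
struct Counter {
    int count = 0;   // default member initializer (C++11 and later)
    int step = 1;    // default member initializer

    Counter() = default;                   // uses both defaults
    explicit Counter(int s) : step(s) {}   // overrides step only; count stays 0
};
```

With this design, a default-initialized Counter is never left with indeterminate members, even without braces.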

Now the default constructor ensures that i is set to 0 whenever an A is default-initialized. Finally, if we want to let users of A choose an initial value for i, we can write another constructor for that. Or combine the two with a default argument:

#include <iostream>
struct A {
    A(int i = 0) : i(i) {}
    int i;
};
int main() {
    A a1;
    A a2(1);
    std::cout << a1.i << " " << a2.i << std::endl;
}

$ g++ -pedantic-errors -Wuninitialized -O2 a.cpp
$ ./a.out
0 1

Note: you cannot write A a(); to call the default constructor, because it is parsed as the declaration of a function named a that takes no arguments and returns an A. Why? Because someone once wanted to allow function declarations inside compound-statement blocks, and now we're stuck with it.
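A small sketch of this "most vexing parse", assuming the A with the defaulted int argument from the example above; std::is_function from <type_traits> confirms at compile time what the parenthesized form actually declares. The two function names are hypothetical:

```cpp
#include <type_traits>

// Same shape as the A from the previous example.
struct A {
    A(int i = 0) : i(i) {}
    int i;
};

// Returns true because `A a();` below declares a *function* named `a`
// (taking no arguments and returning A); no object is created.
bool parens_declare_a_function() {
    A a();  // compilers typically warn about this ("vexing parse")
    return std::is_function<decltype(a)>::value;
}

// Braces cannot be parsed as a function declaration,
// so this really creates an object via the default constructor.
int braces_create_an_object() {
    A a{};
    return a.i;
}
```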

Great! That's it. Mission accomplished. You've been given a push and are ready to continue your adventures in the world of C++, armed with a handy survival guide to initializing variables. Turn around and move on!

Act 4. The hero continues to plunge into darkness.


We could stop here. But if we want to use the modern features of modern C++, we have to go deeper. In fact, my version of g++ (8.2.1) defaults to gnu++1y, which is C++14 plus some GNU extensions. This version of g++ also fully supports C++17. "Does that matter?" you may ask. Boy, pull on your waders and follow me into the thick of it.

Every version since C++11 implements a newfangled way of initializing objects called list initialization. Feel a chill running down your spine? It is also called uniform initialization. There are several good reasons to use this syntax. One funny quote from the FAQ:

C++11 uniform initialization is not perfectly uniform, but it's very nearly so.

List initialization uses curly braces ({thing1, thing2, ...}, which is called a braced-init-list) and looks like this:

#include <iostream> 
struct A {
    int i;
};
int main() {
    A a1;      // default initialization -- as before
    A a2{};    // direct-list-initialization with empty list
    A a3 = {}; // copy-list-initialization with empty list
    std::cout << a1.i << " " << a2.i << " " << a3.i << std::endl;
}

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
a.cpp: In function ‘int main()’:
a.cpp:9:26: warning: ‘a1.A::i’ is used uninitialized in this function [-Wuninitialized]
     std::cout << a1.i << " " << a2.i << " " << a3.i << std::endl;

Hey, did you catch that? Only a1.i remained uninitialized. Clearly, list initialization works differently from simply calling a constructor.

A a{}; behaves the same as A a = {};. In both cases, a is initialized with an empty braced-init-list. Also, A a = {}; is no longer called aggregate initialization; it is now copy-list-initialization (sigh). We already said that A a; creates an object with an indeterminate value by calling the default constructor.

Here is what happens on lines 7 and 8 (remember, this is after C++11):

  1. List initialization of A leads to step 2.
  2. Aggregate initialization is triggered, since A is an aggregate type.
  3. Since the list is empty, all members are initialized from empty lists.
    1. int i{} value-initializes i to 0.

And if the list is not empty?

#include <iostream>
struct A {
    int i;
};
int main() {
    A a1{0}; 
    A a2{{}};
    A a3{a1};
    std::cout << a1.i << " " << a2.i << " " << a3.i << std::endl;
}

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
$

a1.i is initialized to 0, a2.i is initialized from an empty list, and a3 is copy-constructed from a1. You know what a copy constructor is, right? Then you also know about move constructors, rvalue references, forwarding references, prvalues, xvalues, glvalues... never mind.

Unfortunately, the definition of an aggregate has changed in every version since C++11 (C++20 changes it once more, as we'll see below). Depending on which C++ standard you use, something may or may not be an aggregate. The trend is toward loosening the rules. For example, public base classes in aggregates are allowed starting with C++17, which in turn complicates the rules for aggregate initialization. Just great!
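As a sketch of the C++17 change, assuming a compiler in C++17 mode or later (Base and Derived are hypothetical names): a struct with a public base class is an aggregate, and the base subobject gets its own nested braces, initialized before the derived members:

```cpp
// Requires C++17 or later: aggregates may have public base classes.
#include <type_traits>

struct Base {
    int a;
};

struct Derived : Base {   // public base class (struct inheritance is public)
    int b;
};

static_assert(std::is_aggregate<Derived>::value,
              "Derived is an aggregate in C++17 and later");

// The first element initializes the Base subobject, the second initializes b.
Derived make_derived() {
    return Derived{{1}, 2};
}

// An empty list leaves nothing indeterminate: a == 0 and b == 0.
Derived make_empty() {
    return Derived{};
}
```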

How are you feeling? Need some water? Clenching your fists? Maybe take a break and go outside?

Act 5. Goodbye, common sense


What happens if A is not an aggregate type?

In short, an aggregate is:

  • an array, or
  • a struct / class / union with
    • no private or protected non-static data members
    • no user-provided constructors (since C++20: no user-declared constructors)
    • no virtual functions
    • no default member initializers (C++11 only; allowed from C++14 on)
    • no base classes (public base classes are allowed since C++17)
    • no inherited constructors (using Base::Base;, C++17)
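The checklist can be probed with the C++17 trait std::is_aggregate. A sketch under that assumption (C++17 or later; all type names are hypothetical), limited to cases whose answer is the same in C++17 and C++20:

```cpp
// Requires C++17 or later for std::is_aggregate.
#include <type_traits>

struct Plain { int i; };                                   // aggregate
struct WithDefaultMemberInit { int i = 0; };               // aggregate since C++14
struct UserProvidedCtor { UserProvidedCtor() {} int i; };  // not: user-provided constructor
class PrivateMember { int i; };                            // not: private member
struct VirtualFunction { virtual void f() {} int i; };     // not: virtual function

constexpr bool plain_is_aggregate = std::is_aggregate<Plain>::value;
constexpr bool default_init_is_aggregate =
    std::is_aggregate<WithDefaultMemberInit>::value;
constexpr bool user_ctor_is_aggregate = std::is_aggregate<UserProvidedCtor>::value;
constexpr bool private_is_aggregate = std::is_aggregate<PrivateMember>::value;
constexpr bool virtual_is_aggregate = std::is_aggregate<VirtualFunction>::value;
```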

So a non-aggregate might look like this:

#include <iostream>
struct A {
    A(){};
    int i;
};
int main() {
    A a{};
    std::cout << a.i << std::endl;
}

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
a.cpp: In function ‘int main()’:
a.cpp:8:20: warning: ‘a.A::i’ is used uninitialized in this function [-Wuninitialized]
     std::cout << a.i << std::endl;

Here A has a user-provided constructor, so list initialization works differently.

In line 7, the following happens:

  1. List initialization of A leads to step 2.
  2. A non-aggregate with an empty braced-init-list triggers value-initialization; go to step 3.
  3. A user-provided default constructor is found, so it is called. It does nothing here, so a.i is not initialized.

What is a user-provided constructor?

struct A {
    A() = default;
};

This is not a user-provided constructor. It's as if no constructor were declared at all, and A is an aggregate.

struct A {
    A();
};
A::A() = default;

This is a user-provided constructor. It's as if we had written A(){} in the class body, so A is not an aggregate.

And guess what? In C++20 the wording changed: aggregates must now have no user-declared constructors :). What does this mean in practice? I'm not sure! Let's move on.
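One difference between the two forms that does not depend on the standard version can be checked with type traits. A sketch with hypothetical names: defaulting on the first declaration keeps the default constructor trivial, while the out-of-line = default makes it user-provided and non-trivial. Initializing the in-class version with {} yields i == 0 both before C++20 (aggregate initialization) and from C++20 on (value-initialization, which zero-initializes because the constructor is not user-provided):

```cpp
#include <type_traits>

struct DefaultedInClass {
    DefaultedInClass() = default;   // not user-provided
    int i;
};

struct DefaultedOutOfLine {
    DefaultedOutOfLine();           // declared here...
    int i;
};
DefaultedOutOfLine::DefaultedOutOfLine() = default;  // ...defaulted here: user-provided

constexpr bool in_class_trivial =
    std::is_trivially_default_constructible<DefaultedInClass>::value;     // true
constexpr bool out_of_line_trivial =
    std::is_trivially_default_constructible<DefaultedOutOfLine>::value;   // false

// Braces zero the member here in every standard from C++11 on.
int in_class_braces_value() {
    DefaultedInClass d{};
    return d.i;
}
```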

How about the following:

#include <iostream>
class A {
    int i;
    friend int main();
};
int main() {
    A a{};
    std::cout << a.i << std::endl;
}

A is a class, not a struct, so i is private, and we had to declare main as a friend. That makes A not an aggregate. It's just an ordinary class type. That means a.i will remain uninitialized, right?

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
$

Dang. And just when we thought we were getting the hang of this. It turns out that a.i is initialized to 0, even though aggregate initialization is not involved:

  1. List initialization of A; go to step 2.
  2. A non-aggregate class type with a default constructor and an empty braced-init-list triggers value-initialization; go to step 3.
  3. No user-provided default constructor is found, so the object is zero-initialized; go to step 4.
  4. The object is then default-initialized if the implicitly-defined default constructor is non-trivial (here that condition fails, so nothing happens).
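The same value-initialization path can be sketched without the friend main, using a hypothetical class with a private member and a public getter. Both Hidden{} and Hidden() value-initialize, and since there is no user-provided default constructor the object is zero-initialized first:

```cpp
// Hypothetical class: the private member makes it a non-aggregate.
class Hidden {
public:
    int get() const { return i; }
private:
    int i;   // no default member initializer
};

// Empty braces: value-initialization, so i is zero-initialized.
int from_braces() {
    Hidden h{};
    return h.get();
}

// Empty parentheses on a temporary are value-initialization too.
int from_parens() {
    return Hidden().get();
}
```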

What if we try aggregate initialization:

#include <iostream>
class A {
    int i;
    friend int main();
};
int main() {
    A a = {1};
    std::cout << a.i << std::endl;
}

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
a.cpp: In function ‘int main()’:
a.cpp:7:13: error: could not convert ‘{1}’ from ‘<brace-enclosed initializer list>’ to ‘A’
     A a = {1};

A is not an aggregate, so the following happens:

  1. List initialization of A; go to step 2.
  2. Look for a suitable constructor.
  3. There is no way to convert 1 to A, so compilation fails.

As a bonus, a naughty little example:

#include <iostream>
struct A {
    A(int i) : i(i) {}
    A() = default;
    int i;
};
int main() {
    A a{};
    std::cout << a.i << std::endl;
}

There are no private members here, unlike in the previous example, but there is a user-provided constructor, as in the example before that, so A is not an aggregate. A user-provided constructor rules out zero-initialization, right?

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
$

Nope! Let's go through the steps:

  1. List initialization of A; go to step 2.
  2. A non-aggregate class type with a default constructor and an empty braced-init-list triggers value-initialization; go to step 3.
  3. No user-provided default constructor is found (this is the detail I glossed over above), so the object is zero-initialized; go to step 4.
  4. The object is then default-initialized if the implicitly-defined default constructor is non-trivial (here that condition fails, so nothing happens).

One last example:

#include <iostream>
struct A {
    A(){}
    int i;
};
struct B : public A {
    int j;
};
int main() {
    B b = {};
    std::cout << b.i << " " << b.j << std::endl;
}

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
a.cpp: In function ‘int main()’:
a.cpp:11:25: warning: ‘b.B::<anonymous>.A::i’ is used uninitialized in this function [-Wuninitialized]
     std::cout << b.i << " " << b.j << std::endl;

b.j is initialized and b.i is not. What is going on in this example? I don't know! All of b's bases and members should be zero-initialized here. I asked a question on Stack Overflow, and at the time of publishing I had not received a firm answer beyond a possible compiler bug; the consensus was that it is a compiler bug. These rules are subtle and complicated for everyone. For comparison, the clang static analyzer (not the regular compiler) does not warn about uninitialized values at all. Draw your own conclusions.

... (stares at you blankly) (the stare turns into a polite smile) well, let's dive even deeper!

Act 6. The Abyss


C++11 introduced something called std::initializer_list. It has its own type: std::initializer_list<T>, obviously. You can create one with a braced-init-list. And by the way, a braced-init-list used for list initialization has no type. Don't confuse initializer_list with list initialization or with braced-init-list! They are all also related to member initializer lists and default member initializers, in that they help initialize non-static data members, but they are very different. Related, but different! Easy, right?

#include <initializer_list>
#include <iostream>
struct A {
    template <typename T>
    A(std::initializer_list<T>) {}
    int i;
};
int main() {
    A a1{0};
    A a2{1, 2, 3};
    A a3{"hey", "thanks", "for", "reading!"};
    std::cout << a1.i << a2.i << a3.i << std::endl;
}

$ g++ -std=c++17 -pedantic-errors -Wuninitialized -O2 a.cpp
a.cpp: In function ‘int main()’:
a.cpp:12:21: warning: ‘a1.A::i’ is used uninitialized in this function [-Wuninitialized]
     std::cout << a1.i << a2.i << a3.i << std::endl;
                     ^
a.cpp:12:29: warning: ‘a2.A::i’ is used uninitialized in this function [-Wuninitialized]
     std::cout << a1.i << a2.i << a3.i << std::endl;
                             ^
a.cpp:12:37: warning: ‘a3.A::i’ is used uninitialized in this function [-Wuninitialized]
     std::cout << a1.i << a2.i << a3.i << std::endl;

Okay. A has a single template constructor that takes a std::initializer_list<T>. Every time, a user-provided constructor that does nothing is called, so i stays uninitialized. T is deduced from the elements of the list, and a new constructor is instantiated for each deduced type.

  • For a1, {0} is deduced as std::initializer_list<int> with one element, 0.
  • For a2, {1, 2, 3} is deduced as std::initializer_list<int> with three elements.
  • For a3, the braced-init-list is deduced as std::initializer_list<const char*> with four elements.

Note: A a{}; would be an error, because T cannot be deduced. We would have to write something like A a{std::initializer_list<int>{}};. Alternatively, we could make the constructor a non-template, as in A(std::initializer_list<int>){}.

std::initializer_list behaves roughly like a typical STL container, but with only three member functions: size, begin and end. The iterators returned by begin and end can be dereferenced, incremented and compared as usual. This is useful when you want to initialize an object with lists of varying length:

#include <vector>
#include <string>
int main() {
    std::vector<int> v_1_int{5};
    std::vector<int> v_5_ints(5);
    std::vector<std::string> v_strs = {"neato!", "blammo!", "whammo!", "egh"};
}

std::vector<T> has a constructor that accepts a std::initializer_list<T>, so we can easily initialize vectors as shown above.

Note: v_1_int is created by the constructor that takes a std::initializer_list<int>, here holding a single element, 5.

v_5_ints is created by the size_t count constructor, which creates a vector of count (5) elements and value-initializes them (here, all of them are 0).
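A sketch that checks the two vector constructors and shows a hypothetical class (Sum) consuming a std::initializer_list of any length through size, begin and end. Sum has no default constructor, so Sum{} falls through to the initializer_list constructor with an empty list:

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Braces prefer the initializer_list constructor: one element, 5.
std::vector<int> one_five() { return std::vector<int>{5}; }

// Parentheses pick the size_t count constructor: five value-initialized ints.
std::vector<int> five_zeros() { return std::vector<int>(5); }

// Hypothetical type: accepts a list of any length.
class Sum {
public:
    Sum(std::initializer_list<int> values) : count_(values.size()) {
        // begin() and end() are plain pointers to const int.
        for (const int* it = values.begin(); it != values.end(); ++it) {
            total_ += *it;
        }
    }
    int total() const { return total_; }
    std::size_t count() const { return count_; }

private:
    int total_ = 0;
    std::size_t count_;
};
```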

Okey-dokey, last example:

#include <initializer_list>
#include <iostream>
struct A {
    A(std::initializer_list<int> l) : i(2) {}
    A(int i = 1) : i(i) {}
    int i;
};
int main() {
    A a1;
    A a2{};
    A a3(3);
    A a4 = {5};
    A a5{4, 3, 2};
    std::cout << a1.i << " "
              << a2.i << " "
              << a3.i << " "
              << a4.i << " "
              << a5.i << std::endl;
}

At first glance, this isn't too hard. We have two constructors: one takes a std::initializer_list<int>, and the other takes an int with a default argument. Before looking at the output below, try to predict the value of each i.

Thought about it? Let's see what happens.

$ g++ -std=c++11 -pedantic-errors -Wuninitialized -O2 a.cpp
$ ./a.out
1 1 3 2 2

a1 should be easy: it's plain default-initialization, which picks the default constructor and its default argument. a2 uses list initialization with an empty list. Since A has a default constructor (thanks to the default argument), a2 is value-initialized, which simply calls that constructor. If A did not have that constructor, the std::initializer_list constructor would be called with an empty list. a3 uses parentheses rather than a braced-init-list, so overload resolution matches 3 with the constructor taking an int. Next, a4 uses list initialization, for which overload resolution strongly favors the constructor taking a std::initializer_list. Obviously, a5 cannot be matched to a single int, so the same constructor is used as for a4.

Epilogue


I hope you realize that this article is (mostly) a polemic and, I hope, a little informative. You can ignore many of the nuances described here, and the language will behave predictably if you remember to initialize variables before use and to initialize data members during construction. You don't need to study every corner case of C++ to write sound code; you will run into pitfalls and idioms as you go anyway. To be clear, list initialization is a good thing. With {}, if you wrote a default constructor, it is called and should initialize everything. Otherwise, everything is zero-initialized, and then the default member initializers take effect. Uninitialized behavior also has to stay, because somewhere out there is code that relies on uninitialized variables.
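A sketch of that last rule, using a hypothetical class with private members (so it is not an aggregate) and no user-provided constructor: with {}, the object is first zero-initialized, and then the implicit default constructor applies the default member initializer:

```cpp
// Hypothetical class: private members, no user-provided constructor.
class Settings {
public:
    int retries() const { return retries_; }
    int timeout_ms() const { return timeout_ms_; }

private:
    int retries_;              // no default member initializer
    int timeout_ms_ = 1000;    // default member initializer
};

// Value-initialization: zero-initialize everything, then run the implicit
// default constructor, which applies timeout_ms_'s default member initializer.
Settings make_settings() {
    Settings s{};
    return s;
}
```

Writing Settings s; instead would skip the zeroing step and leave retries_ indeterminate.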

Hopefully I've managed to show that C++ is a big, difficult language (for many historical reasons). This whole article was about the nuances of initialization. Just initializing variables. And we didn't even cover the whole topic; we only briefly described 5 kinds of initialization. In the original article, Simon mentions 18 kinds of initialization.

I would not want to teach beginners to program in C++. This article had no room for systems programming concepts, debates about programming paradigms, problem-solving methodologies or fundamental algorithms. If you're interested in C++, take a course specifically on C++, but keep in mind that it will teach that particular language. If you're interested in "C with classes" or "C with namespaces", first learn how this is implemented and how identifier collisions are handled in C.

C is an excellent, clear, fast, well supported and widely used language for solving problems in various fields. And it definitely does not have 18 types of initialization.



By the way, I completely forgot that I had discussed exactly this topic a month ago. That's the subconscious for you.



Discussion of this article and criticism in various forums:

  1. Lobste.rs
  2. Hacker News
  3. Reddit

Responding to the most common criticism: yes, you can learn sensible ways to initialize variables and never run into the abyss. That's exactly why I wrote in the epilogue that list initialization is a good thing. Personally, I rarely use templates, but I still use C++. That's not the point. The point is that a beginner can completely ignore the STL and use the C standard library, and ignore references, exceptions and inheritance. Then you're approaching C with classes, except that it isn't C, and you still don't understand pointers, memory allocation, the stack, the heap or virtual memory. And whenever I actually need C, I have to switch to a different language that I could have learned from the start. If you're going to use C++, use C++. But if you want to use C++ without all of C++'s features, then just learn C. And to repeat the first paragraph: I am not against C++. We see the warts on our loved ones and love them anyway.

And that is all I can say about it.
