Internal and External Linkage in C++

http://www.goldsborough.me/c/c++/linker/2016/03/30/19-34-25-internal_and_external_linkage_in_c++/
Good day everyone!

We present a translation of an interesting article, prepared as part of the C++ Developer Course. We hope you will find it as useful and interesting as our students did.

Go.

Have you ever come across the terms internal and external linkage? Want to know what the extern keyword is used for, or how declaring something static affects the global scope? Then this article is for you.

In a nutshell

A translation unit consists of an implementation file (.c / .cpp) and all the header files (.h / .hpp) it includes. If an object or function has internal linkage within a translation unit, then that symbol is visible to the linker only within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. Using the static keyword in the global namespace gives a symbol internal linkage. The extern keyword gives it external linkage.
By default, the compiler gives symbols the following linkage:

  • Non-const global variables: external linkage;
  • Const global variables: internal linkage;
  • Functions: external linkage.



Basics

Let's first talk about two simple concepts needed to discuss linkage.

  • The difference between a declaration and a definition;
  • Translation units.

A note on terminology: we will use the term “symbol” for any “code entity” that the linker works with, for example a variable or a function (or classes / structs, but we will not focus on those).

Declaration vs. Definition

Let us briefly discuss the difference between declaring and defining a symbol. A declaration tells the compiler that a particular symbol exists and allows that symbol to be used wherever its exact memory address or storage is not required. A definition tells the compiler what a function's body contains or how much memory to allocate for a variable.

In some situations a declaration is not enough for the compiler, for example when a class data member is held by value (that is, it is neither a reference nor a pointer). A pointer to a declared (but undefined) type, by contrast, is allowed, since it needs a fixed amount of memory (for example, 8 bytes on 64-bit systems) regardless of the type pointed to. Dereferencing that pointer requires a definition. Likewise, to declare a function, all parameter types (whether taken by value, reference, or pointer) and the return type only need to be declared, not defined. Definitions of the return type and the parameter types are needed only to define the function.
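
The following is a minimal sketch of these rules in one file. The types Engine and Car and the function horsepower are hypothetical and exist only to illustrate the point.

```cpp
#include <cassert>

struct Engine;                      // declaration only: Engine is an incomplete type here

struct Car {
    Engine* engine;                 // fine: a pointer has a fixed size
    // Engine spare;                // would not compile here: the size of Engine is unknown
};

int horsepower(const Car& car);     // declaring a function needs no definitions

struct Engine { int power; };       // definition: the compiler now knows the layout

int horsepower(const Car& car) {
    assert(car.engine != nullptr);  // guard against dereferencing a null pointer
    return car.engine->power;       // dereferencing requires the definition above
}
```

Moving the definition of Engine below the body of horsepower would break the build, because the member access needs the complete type.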

Functions

The difference between a function declaration and a function definition is quite obvious.

int f();               // declaration
int f() { return 42; } // definition

Variables

With variables, it is a little different: the declaration and the definition are usually not separate. Most importantly, this:

int x;

Not only declares x, but also defines it. No constructor is involved, since int is a built-in type. (In C++, unlike Java, variables of simple types such as int are not always initialized to 0 by default. A global x like the one above is zero-initialized, but a local variable declared the same way holds whatever garbage lies at the memory address the compiler allotted to it.)

You can, however, explicitly separate the declaration of a variable from its definition using the extern keyword.

extern int x; // declaration
int x = 42;   // definition

However, when you add an initializer to an extern declaration, the statement becomes a definition and the extern keyword becomes redundant.

extern int x = 5; // the same as int x = 5;
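
Put together, a small single-file sketch of the variable rules could look as follows. The function names twice_x and self_check and the variable y are illustrative assumptions, not part of the original example.

```cpp
#include <cassert>

extern int x;          // declaration: x exists somewhere, no storage is reserved here
extern int x;          // repeating a declaration is harmless

int twice_x() {        // x can be used once it has been declared
    return 2 * x;
}

int x = 42;            // the one definition: storage plus initial value
int y;                 // also a definition; at namespace scope y starts out as 0

// Small self-check; call it from main() to verify the claims above.
void self_check() {
    assert(twice_x() == 84);
    assert(y == 0);
}
```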

Forward Declaration

C++ has the concept of forward-declaring a symbol. This means that we declare the type and name of a symbol for use in situations that do not require its definition. That way we do not need to include the full definition of a symbol (usually a header file) unless it is really necessary, which reduces our dependency on the file containing the definition. The main advantage is that when the file with the definition changes, the file in which we forward-declare the symbol does not need to be recompiled (and neither, therefore, do all the other files that include it).

Example

Suppose we have a function declaration (called a prototype) for f, which accepts an object of type Class by value:

// file.hpp
void f(Class object);

Including the definition of Class right away would be the naive approach. But since we have only declared f so far, it is enough to give the compiler a declaration of Class. The compiler can then recognize the function by its prototype, and we can get rid of file.hpp's dependency on the file containing the definition of Class, say class.hpp:

// file.hpp
class Class;
void f(Class object);

Suppose file.hpp is included in 100 other files, and suppose we change the definition of Class in class.hpp. If class.hpp were included in file.hpp, then file.hpp and all 100 files that include it would need to be recompiled. With the forward declaration of Class, the only files that need to be recompiled are those that actually include class.hpp, such as the implementation file in which f is defined.
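
To see the whole arrangement at once, here is a single-file sketch in which comments mark where each piece would live. The members of Class, the global last_value and the function self_check are assumptions added only so that the effect of calling f can be observed.

```cpp
#include <cassert>

// --- file.hpp (sketch) ---
class Class;                      // forward declaration instead of #include "class.hpp"
void f(Class object);             // a prototype may name an incomplete type

// --- class.hpp (sketch) ---
class Class {
public:
    explicit Class(int value) : value_(value) {}
    int value() const { return value_; }
private:
    int value_;
};

// --- implementation file (sketch): defining and calling f needs the full definition ---
int last_value = 0;               // hypothetical global, only here to observe the call
void f(Class object) { last_value = object.value(); }

// Small self-check; call it from main() to try the example.
void self_check() {
    f(Class(7));
    assert(last_value == 7);
}
```

Only the last part depends on the layout of Class, so only that part would need rebuilding when class.hpp changes.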

Frequency of use

An important difference between a declaration and a definition is that a symbol can be declared many times but defined only once. You can forward-declare a function or class as often as you like, but there can be only one definition. This is called the One Definition Rule (ODR). In C++, the following works:

int f();
int f();
int f();
int f();
int f();
int f();
int f() { return 5; }

And this does not:

int f() { return 6; }
int f() { return 9; }

Translation Units

Programmers typically work with header and implementation files. Compilers do not: they work with translation units (TU for short), sometimes called compilation units. The definition of such a unit is quite simple: it is any file passed to the compiler after it has been preprocessed. To be precise, it is the file that results from the preprocessor expanding macros, conditionally including source code depending on #ifdef and #ifndef directives, and copy-pasting the contents of every #include file.

Consider the following files:

header.hpp:

#ifndef HEADER_HPP
#define HEADER_HPP
#define VALUE 5
#ifndef VALUE
struct Foo { private: int ryan; };
#endif
int strlen(const char* string);
#endif /* HEADER_HPP */

program.cpp:

#include "header.hpp"
int strlen(const char* string) {
	int length = 0;
	while(string[length]) ++length;
	return length + VALUE;
}

The preprocessor will produce the following translation unit, which is then passed to the compiler:

int strlen(const char* string);
int strlen(const char* string) {
	int length = 0;
	while(string[length]) ++length;
	return length + 5;
}

Linkage

With the basics covered, we can move on to linkage. In general, linkage is the visibility of symbols to the linker when it processes files. Linkage can be either external or internal.

External Linkage

When a symbol (variable or function) has external linkage, it is visible to the linker from other files, that is, it is “globally” visible and accessible to all translation units. This means that you must define such a symbol in one specific place in one translation unit, usually in an implementation file (.c / .cpp), so that it has exactly one visible definition. If you define the symbol at the same time as you declare it, or place the definition in the same file as the declaration, you risk annoying the linker. As soon as you include that file in more than one implementation file, the definition is added to more than one translation unit, and your linker will cry.

The extern keyword in C and C++ (explicitly) declares that a symbol has external linkage.

extern int x;
extern void f(const std::string& argument);

Both symbols have external linkage. As noted above, const global variables have internal linkage by default and non-const global variables have external linkage. This means that int x; is the same as extern int x;, right? Not quite. int x; is actually equivalent to extern int x{}; (using uniform / brace initialization syntax to avoid the most vexing parse), since int x; not only declares x but also defines it. Therefore, leaving extern off int x; at global scope is just as bad as defining a variable while declaring it extern:

int x;          // is the same as
extern int x{}; // and both will most likely cause a linker error.

extern int x;   // whereas this only declares the integer variable, which is fine

Bad Example

Let's declare a function f with external linkage in file.hpp and define it there as well:

// file.hpp
#ifndef FILE_HPP
#define FILE_HPP
extern int f(int x);
/* ... */
int f(int x) { return x + 1; }
/* ... */
#endif /* FILE_HPP */

Note that you do not need to add extern here, since all functions are implicitly extern. Separating the declaration from the definition is not required either. So let's simply rewrite it like this:

// file.hpp
#ifndef FILE_HPP
#define FILE_HPP
int f(int x) { return x + 1; }
#endif /* FILE_HPP */

Code like this might be written before reading this article, or after reading it under the influence of alcohol or heavy substances (for example, cinnamon buns).

Let's see why this is a bad idea. Suppose we have two implementation files, a.cpp and b.cpp, both of which include file.hpp:

// a.cpp
#include "file.hpp"
/* ... */

// b.cpp
#include "file.hpp"
/* ... */

Now let the compiler do its work and generate two translation units for the two implementation files above (remember that #include literally means copy / paste):

// TU A, from a.cpp
int f(int x) { return x + 1; }
/* ... */

// TU B, from b.cpp
int f(int x) { return x + 1; }
/* ... */

This is where the linker steps in (linking happens after compilation). The linker takes the symbol f and looks for its definition. Today it is in luck: it finds as many as two! One in translation unit A, the other in B. The linker freezes with happiness and tells you something like this:

duplicate symbol __Z1fv in:
/path/to/a.o
/path/to/b.o

The linker finds two definitions for the single symbol f. Since f has external linkage, it is visible to the linker when processing both A and B. Obviously, this violates the One Definition Rule and causes an error. More precisely, it causes a duplicate symbol error, which you will run into about as often as the undefined symbol error that occurs when you declare a symbol but forget to define it.
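
One common way out, sketched here in a single file with comments standing in for the separate files, is to keep only the declaration in the header and move the definition into exactly one implementation file. The file name file.cpp and the function self_check are assumptions made for this sketch.

```cpp
#include <cassert>

// --- file.hpp (sketch): declaration only, safe to include everywhere ---
int f(int x);

// --- what a.cpp and b.cpp see after #include "file.hpp" ---
int f(int x);                 // as seen by a.cpp
int f(int x);                 // as seen by b.cpp; repeating a declaration is harmless

// --- file.cpp (hypothetical): the single definition in the whole program ---
int f(int x) { return x + 1; }

// Small self-check; call it from main() to try the example.
void self_check() { assert(f(41) == 42); }
```

With this layout the linker finds exactly one definition of f, no matter how many files include the header.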

Usage

The standard use case for declaring variables extern is global variables. Suppose you are working on a self-baking cake. Surely there are global variables associated with the cake that should be available in different parts of your program, say, the clock rate of the edible circuitry inside your cake. This value is naturally needed in various places to keep all the chocolate electronics in sync. The (evil) C way to declare such a global variable is a macro:

#define CLK 1000000

A C++ programmer who is disgusted by macros would rather write real code. For example:

// global.hpp
namespace Global
{
	extern unsigned int clock_rate;
}

// global.cpp
namespace Global
{
	unsigned int clock_rate = 1000000;
}

(A modern C++ programmer will want to use digit separators: unsigned int clock_rate = 1'000'000;)
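
As a sketch of how another part of the program might consume this global, consider the single-file version below. The file oven.cpp and the functions ticks_for and self_check are hypothetical additions for illustration.

```cpp
#include <cassert>

// --- global.hpp ---
namespace Global {
    extern unsigned int clock_rate;   // declaration, visible to every file that includes it
}

// --- oven.cpp (hypothetical user of the global) ---
// Number of clock ticks that elapse in the given number of milliseconds.
unsigned int ticks_for(unsigned int milliseconds) {
    return Global::clock_rate / 1000 * milliseconds;
}

// --- global.cpp ---
namespace Global {
    unsigned int clock_rate = 1000000;   // the single definition
}

// Small self-check; call it from main() to try the example.
void self_check() { assert(ticks_for(2) == 2000); }
```

The user code compiles against the declaration alone; the linker later connects it to the one definition in global.cpp.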

Internal Linkage

If a symbol has internal linkage, it is visible only within the current translation unit. Do not confuse visibility here with access rights such as private. Visibility means that the linker can use this symbol only while processing the translation unit in which the symbol was declared, and not later (as it can with symbols that have external linkage). In practice, this means that when you declare a symbol with internal linkage in a header file, each translation unit that includes this file gets its own unique copy of the symbol, as if you had redefined the symbol in each translation unit. For objects, this means that the compiler literally allocates a completely new, unique copy for each translation unit, which obviously can lead to high memory consumption.

To declare a symbol with internal linkage, C and C++ provide the static keyword. This use differs from the use of static in classes and functions (or, more generally, in any block).

Example

Let's give an example:

header.hpp:

static int variable = 42;

file1.hpp:

void function1();

file2.hpp:

void function2();

file1.cpp:

#include "header.hpp"

void function1() { variable = 10; }

file2.cpp:

#include "header.hpp"

void function2() { variable = 123; }

main.cpp:

#include "header.hpp"
#include "file1.hpp"
#include "file2.hpp"
#include <iostream>

auto main() -> int {
	function1();
	function2();
	std::cout << variable << std::endl;
}

Each translation unit that includes header.hpp receives its own unique copy of variable, because it has internal linkage. There are three translation units:

  1. file1.cpp
  2. file2.cpp
  3. main.cpp

Calling function1 sets file1.cpp's copy of variable to 10. Calling function2 sets file2.cpp's copy of variable to 123. The value printed in main.cpp, however, is unaffected and remains 42.

Anonymous namespaces

In C++, there is another way to declare one or more symbols with internal linkage: anonymous namespaces. Such a namespace ensures that the symbols declared inside it are visible only in the current translation unit. In essence, it is just a way to declare several symbols static at once. For a while, using the static keyword to declare a symbol with internal linkage was deprecated in favor of anonymous namespaces. It was later un-deprecated, however, because it is convenient for declaring a single variable or function with internal linkage. There are a few minor differences that I will not dwell on.

In any case, this:

namespace { int variable = 0; }

does (almost) the same thing as:

static int variable = 0;
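
A slightly larger sketch shows both spellings side by side. Apart from variable, every name here (bump, Tally, other_variable, bump_other, bump_both) is made up for illustration. One of the minor differences is visible too: an anonymous namespace can also hold a type, which static cannot be applied to.

```cpp
#include <cassert>

namespace {
    int variable = 0;                   // internal linkage
    int bump() { return ++variable; }   // internal linkage as well
    struct Tally { int hits = 0; };     // a type local to this translation unit
}

static int other_variable = 0;                          // the static spelling for one symbol
static int bump_other() { return ++other_variable; }

// Function with external linkage built on the hidden helpers.
int bump_both() {
    Tally tally;
    tally.hits += bump();
    tally.hits += bump_other();
    assert(tally.hits % 2 == 0);        // both counters always advance together
    return tally.hits;
}
```

Only bump_both is visible to other translation units; everything else stays private to this file.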

Usage

So when should you use internal linkage? Using it for objects is a bad idea. Memory consumption for large objects can be very high because a copy is made for each translation unit. But mostly it just causes weird, unpredictable behavior. Imagine that you have a singleton (a class of which you create only one instance) and suddenly several instances of your singleton appear (one for each translation unit).

However, internal linkage can be used to hide helper functions that are local to a translation unit from the global scope. Suppose there is a helper function foo in file1.hpp that you use in file1.cpp. At the same time, you have a function foo in file2.hpp, used in file2.cpp. The first and second foo are different, but you cannot think of other names. You can therefore declare them static. As long as you do not include both file1.hpp and file2.hpp in the same translation unit, this hides the two foos from each other. Without static, they would implicitly have external linkage, and the definition of the first foo would collide with the definition of the second, causing a linker error for violating the One Definition Rule.
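
A minimal sketch of one of the two files is shown below. The second file appears only as a comment because it would be a separate translation unit. The bodies of foo and the functions compute1, compute2 and self_check are hypothetical.

```cpp
#include <cassert>

// --- file1.cpp (sketch) ---
// Helper with internal linkage: the linker never sees this foo, so another
// implementation file may define its own static foo without a clash.
static int foo(int value) { return value * 2; }

// Function with external linkage, built on the private helper.
int compute1(int value) { return foo(value) + 1; }

// --- file2.cpp (sketch, shown as a comment because it is a separate file) ---
// static int foo(int value) { return value - 1; }
// int compute2(int value) { return foo(value) * 3; }

// Small self-check; call it from main() to try the example.
void self_check() { assert(compute1(20) == 41); }
```

If the static keyword were dropped from both files, each object file would export a symbol for foo(int) and linking them together would fail with a duplicate symbol error.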
