Pointers, references, and arrays in C and C++: dotting the i's

In this post, I will try to sort out, once and for all, such subtle concepts in C and C++ as pointers, references, and arrays. In particular, I will answer the question of whether C arrays are pointers or not.

Conventions and Assumptions


  • I will assume the reader understands that, for example, references exist in C++ but not in C, so I will not constantly point out which language (C/C++ or C++ specifically) I am talking about; this will be clear from context;
  • I also assume that the reader already knows C and C++ at a basic level and knows, for example, the syntax for declaring a reference. This post is a meticulous analysis of fine details;
  • I will denote types the way a declaration of a variable named TYPE of that type would look. For example, I will write the type "array of 2 ints" as int TYPE[2];
  • I will assume that we are mostly dealing with ordinary data types such as int TYPE, int *TYPE and so on, for which the operations =, &, * and others are not overloaded and have their usual meaning;
  • “Object” will always mean “anything that is not a reference”, not “an instance of a class”;
  • Unless stated otherwise, C89 and C++98 are implied.


Pointers and references


Pointers. I will not explain what pointers are. :) We will assume that you know this. Let me only remind you of the following (all code examples are assumed to be inside some function, for example, main):

int x;
int *y = &x; // You can take the address of any variable with the address-of operator "&". This operator returns a pointer
int z = *y; // A pointer can be dereferenced with the dereference operator "*". This operator returns the object the pointer points to


Let me also remind you that char is exactly one byte in every C and C++ standard, i.e. sizeof (char) == 1 (although the standards do not guarantee that a byte contains exactly 8 bits :)). Further, if we add a number to a pointer to some type T, the actual numeric value of the pointer increases by that number times sizeof (T). That is, if p has type T *TYPE, then p + 3 is equivalent to (T *)((char *)p + 3 * sizeof (T)). The same applies to subtraction.
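
The pointer arithmetic rule above can be checked with a small sketch. The function names same_address and check_pointer_step are made up for this illustration and do not come from the original text; the array contents are arbitrary.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if p + n lands on the same address as the manual
// byte-offset computation described in the text.
bool same_address (int *p, std::size_t n)
{
  int *by_arithmetic = p + n;
  int *by_bytes = (int *)((char *)p + n * sizeof (int));
  return by_arithmetic == by_bytes;
}

// Checks the rule on a small local array (hypothetical helper).
bool check_pointer_step ()
{
  int a[4] = {10, 20, 30, 40};
  assert (same_address (a, 3));
  // Subtraction is measured in elements, while the byte distance
  // is the element count times sizeof (int)
  return (a + 3) - a == 3
      && (char *)(a + 3) - (char *)a == (std::ptrdiff_t)(3 * sizeof (int));
}
```

Calling check_pointer_step () from main should return true on any conforming implementation, whatever sizeof (int) happens to be.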

References. Now about references. References are the same thing as pointers, but with a different syntax and some other important differences, which are discussed below. The following code is no different from the previous one, except that it uses references instead of pointers:
int x;
int &y = x;
int z = y;


If a reference appears to the left of the assignment sign, there is no way to tell whether we want to assign to the reference itself or to the object it refers to. Therefore such an assignment always assigns to the object, not to the reference. This does not apply to reference initialization: there, of course, the reference itself is initialized. Consequently, once a reference has been initialized there is no way to change the reference itself, i.e. a reference is always constant (though its object need not be).
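
A minimal sketch of the difference between initializing a reference and assigning through it. The function name assign_through_reference and the variable other are assumptions made for this example only.

```cpp
#include <cassert>

// Returns the final value of x after "assigning to the reference".
int assign_through_reference ()
{
  int x = 1;
  int other = 2;
  int &y = x;   // initialization: y is bound to x
  y = other;    // assignment: copies 2 into x; y still refers to x
  other = 3;    // does not affect x or y
  assert (&y == &x);  // the reference was never re-bound
  return x;
}
```

The function returns 2: the assignment y = other changed the object x, not the binding of y.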

Lvalue. Expressions that can be assigned to are called lvalues in C, C++, and many other languages (short for “left value”, i.e. one that goes to the left of the equals sign). All other expressions are called rvalues. Variable names are obviously lvalues, but they are not the only ones. The expressions a[i + 2], some_struct.some_field, *ptr, and *(ptr + 3) are lvalues too.

The surprising fact is that references and lvalues are, in a sense, the same thing. Let's reason it through. What is an lvalue? It is something that can be assigned to. That is, it is some fixed place in memory where you can put something. That is, an address. That is, a pointer or a reference (as we already know, pointers and references are two syntactically different ways of expressing the concept of an address in C++). And a reference rather than a pointer, because a reference can be placed to the left of the equals sign, and that means assignment to the object the reference refers to. So an lvalue is a reference.

And what is a reference? It is one of the syntaxes for an address, i.e., again, a place where you can put something. And a reference can appear to the left of the equals sign. So a reference is an lvalue.

Okay, but (almost any) variable can also appear to the left of the equals sign. So is (such) a variable a reference? Almost. An expression that consists of a variable is a reference.

In other words, suppose we have declared int x. Now x is a variable of type int TYPE and nothing else. It is an int, period. But if I now write x + 2 or x = 3, then in these expressions the subexpression x has type int &TYPE. Otherwise this x would be no different from, say, 10, and (like 10) nothing could be assigned to it.

This principle (“an expression that is a variable is a reference”) is my own invention. That is, I have not seen it in any textbook, standard, etc. Nevertheless, it simplifies a lot, and it is convenient to treat it as true. If I were implementing a compiler, I would simply treat variables in expressions as references, and quite possibly this is what real compilers do.

Moreover, it is convenient to assume that a special data type for lvalues (i.e., a reference) exists even in C. That is what we will assume from here on. The concept of a reference simply cannot be expressed syntactically in C: a reference cannot be declared.

The principle “any lvalue is a reference” is also my invention. But the principle “any reference is an lvalue” is a perfectly legitimate, universally accepted one (of course, the reference must refer to a mutable object, and that object must permit assignment).

Now, given these conventions, let us state the rules for working with references precisely: if we have declared, say, int x, then the expression x has type int &TYPE. If this expression (or any other expression of reference type) appears to the left of the equals sign, it is used as a reference; in almost all other cases (for example, in x + 2) x is automatically converted to type int TYPE (another operation that does not convert a reference to its object is &, as we will see below). Only a reference can appear to the left of the equals sign. Only a reference can initialize a (non-const) reference.

The * and & operations. Our conventions let us take a fresh look at the * and & operations. The following now becomes clear: the * operation can only be applied to a pointer (this much was always known), and it returns a reference to the same type. & is always applied to a reference and returns a pointer to the same type. Thus, * and & turn pointers and references into each other. That is, they essentially do nothing at all and merely replace an entity of one syntax with an entity of the other! So, generally speaking, it is not quite correct to call & the address-of operation: it can only be applied to an already existing address, and it merely changes the syntactic form of that address.

Note that pointers and references are declared as int *x and int &x. Thus the principle “declaration mimics use” is confirmed once again: the declaration of a pointer reminds you how to turn it into a reference, and the declaration of a reference reminds you of the reverse.

Note also that &*EXPR (where EXPR is an arbitrary expression, not necessarily a single identifier) is equivalent to EXPR whenever it makes sense (i.e., whenever EXPR is a pointer), and *&EXPR is likewise equivalent to EXPR whenever it makes sense (i.e., whenever EXPR is a reference).
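
The two equivalences can be tried out in a short sketch. The function name check_inverse_operators is hypothetical and the values are arbitrary.

```cpp
#include <cassert>

// Checks that & and * undo each other for a pointer and for an lvalue.
bool check_inverse_operators ()
{
  int x = 7;
  int *p = &x;
  bool pointer_round_trip = (&*p == p);  // pointer -> lvalue -> pointer
  *&x = 8;                               // lvalue -> pointer -> lvalue, assigns to x
  assert (*p == 8);                      // p still points to x
  return pointer_round_trip && x == 8;
}
```

Note that *&x can stand to the left of the equals sign, which fits the view that * yields an lvalue.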

Arrays


So, there is such a data type as an array. Arrays are defined, for example, like this:
int x[5];

In C89 and C++98 the expression in square brackets must be a compile-time constant. Moreover, in a definition like this one a number must actually be there: empty square brackets are not allowed.

Just as all local variables (remember, we assume all code examples are inside functions) live on the stack, so do arrays. That is, the code above allocated, directly on the stack, a whole memory block of size 5 * sizeof (int), which holds our entire array. Do not think that this code declared some pointer that points to memory located somewhere far away on the heap. No, we declared an array, a real one. Right here on the stack.

What does sizeof (x) equal? Of course, it equals the size of our array, i.e. 5 * sizeof (int). If we write
struct foo
{
  int a[5];
  int b;
};

then, again, the space for the array is allocated entirely inside the structure, and sizeof of that structure will confirm it.
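
A small sketch of these size claims. The struct foo repeats the one from the text so that the example is self-contained; check_sizes is a name invented for this illustration. The comparison for the structure uses >= because a compiler is allowed to add padding.

```cpp
#include <cassert>
#include <cstddef>

struct foo
{
  int a[5];
  int b;
};

// Checks that an array occupies its full size, both on its own
// and as a member of a structure.
bool check_sizes ()
{
  int x[5] = {0};
  foo s = {{0}, 0};
  return sizeof (x) == 5 * sizeof (int)
      && sizeof (s.a) == sizeof (x)                   // the member is a whole array
      && sizeof (foo) >= sizeof (x) + sizeof (int)    // array plus b, maybe padding
      && sizeof (x) / sizeof (x[0]) == 5;             // element count from sizes
}
```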

You can take the address of an array (&x), and it will be a genuine pointer to the place where the array is located. The type of the expression &x is, as is easy to see, int (*TYPE)[5]. The array's zeroth element sits at its very beginning, so the address of the array itself and the address of its zeroth element coincide numerically. That is, &x and &(x[0]) are numerically equal (I wrote the expression &(x[0]) rather casually here; it is actually not that simple, and we will come back to it). But these expressions have different types, int (*TYPE)[5] and int *TYPE, so they cannot be compared with ==. You can, however, use the void * trick: the following expression is true: (void *)&x == (void *)&(x[0]).
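
The two pointer types and the void * comparison can be written out as a sketch. The names check_array_address, whole and first are assumptions made for this example.

```cpp
#include <cassert>

// Compares the address of the whole array with the address of
// its zeroth element.
bool check_array_address ()
{
  int x[5] = {0};
  int (*whole)[5] = &x;      // pointer to the array, type int (*)[5]
  int *first = &(x[0]);      // pointer to the zeroth element, type int *
  // whole == first would not compile: the pointer types differ.
  assert (*first == 0);
  return (void *)whole == (void *)first;
}
```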

All right, suppose I have convinced you that an array is just an array and nothing else. Where does all the confusion between pointers and arrays come from, then? The thing is that an array name is converted to a pointer to its zeroth element in almost every operation.

So, we have declared int x[5]. If we now write x + 0, this converts our x (which had type int TYPE[5] or, more precisely, int (&TYPE)[5]) to &(x[0]), i.e., to a pointer to the zeroth element of the array x. Now our x has type int *TYPE.

Casting an array name to void * or applying == to it also first converts the name to a pointer to the first element, therefore:
&x == x // compile error, different types: int (*TYPE)[5] and int *TYPE
(void *)&x == (void *)x // true
x == x + 0 // true
x == &(x[0]) // true
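
One way to observe the conversion is to compare sizeof before and after it, since sizeof is one of the few operations that does not convert the array name. This is only a sketch; check_decay is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Shows that x alone is an array, while x + 0 is already a pointer.
bool check_decay ()
{
  int x[5] = {0};
  bool sizes_ok = sizeof (x) == 5 * sizeof (int)       // the array itself
               && sizeof (x + 0) == sizeof (int *);    // converted to a pointer
  int *p = x;                                          // same conversion on initialization
  assert (p == x + 0);
  return sizes_ok
      && p == &(x[0])
      && (void *)&x == (void *)x;
}
```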


The [] operation. The notation a[b] is always equivalent to *(a + b) (recall that we are not considering overloads of operator[] and other operations). Thus, the notation x[2] means the following:
  • x[2] is equivalent to *(x + 2)
  • x + 2 is one of those operations in which the array name is converted to a pointer to its first element, so that conversion happens
  • Next, in accordance with my explanation above, x + 2 is equivalent to (int *)((char *)x + 2 * sizeof (int)), i.e., x + 2 means “advance the pointer x by two ints”
  • Finally, the result is dereferenced, and we obtain the object located at this shifted pointer


The types of the expressions involved are as follows:
x // int (&TYPE)[5], after conversion: int *TYPE
x + 2 // int *TYPE
*(x + 2) // int &TYPE
x[2] // int &TYPE


Note also that what stands to the left of the square brackets need not be an array; it can be any pointer. For example, you can write (x + 2)[3], and it will be equivalent to x[5]. Note also that *a and a[0] are always equivalent, both when a is an array and when a is a pointer.
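
The indexing equivalences can be checked in a short sketch. The array here has six elements (an assumption of this example, unlike the five-element x in the text) so that index 5 stays inside the array; check_indexing is a made-up name.

```cpp
#include <cassert>

// Checks that a[b] and *(a + b) agree for an array and for a pointer.
bool check_indexing ()
{
  int x[6] = {0, 10, 20, 30, 40, 50};
  int *p = x;
  assert (x[2] == 20);
  return x[2] == *(x + 2)
      && (x + 2)[3] == x[5]   // any pointer may stand left of the brackets
      && *x == x[0]           // a is an array
      && *p == p[0];          // a is a pointer
}
```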

Now, as promised, I return to &(x[0]). It is now clear that in this expression x is first converted to a pointer, then [0] is applied to that pointer according to the algorithm above, yielding a value of type int &TYPE, and finally & converts it to type int *TYPE. So explaining the somewhat simpler concept of array-to-pointer conversion by means of this complex expression (inside which that very conversion is already performed) was a bit of a cheat.

And now a trick question: what is &x + 1? Well, &x is a pointer to the whole array, so + 1 steps over the whole array. That is, &x + 1 is (int (*)[5])((char *)&x + sizeof (int [5])), i.e. (int (*)[5])((char *)&x + 5 * sizeof (int)) (here int (*)[5] is int (*TYPE)[5]). So &x + 1 is numerically equal to x + 5, not x + 1 as one might think. Yes, the result points to memory outside the array (immediately after the last element), but who cares? After all, C does not check array bounds anyway. Note also that the expression *(&x + 1) == x + 5 is true. It can also be written like this: (&x)[1] == x + 5. Also true is *((&x)[1]) == x[5] or, equivalently, (&x)[1][0] == x[5] (unless, of course, we get a segmentation fault for touching memory that is not ours :)).
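
The numeric claim about &x + 1 can be checked without touching memory past the array. In this sketch the pointers are only formed and compared, never dereferenced; check_whole_array_step, next and end are names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>

// Checks that stepping a pointer-to-array by one moves it by the
// size of the whole array.
bool check_whole_array_step ()
{
  int x[5] = {0};
  int (*next)[5] = &x + 1;   // one past the whole array
  int *end = x + 5;          // one past the last element
  assert ((void *)next != (void *)(x + 1));
  return (void *)next == (void *)end
      && (char *)next - (char *)&x == (std::ptrdiff_t)(5 * sizeof (int));
}
```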

An array cannot be passed to a function as an argument. If you write int x[2] or int x[] in the function header, it is equivalent to int *x, and a pointer will always be passed to the function (sizeof of the parameter will be the same as that of a pointer). The array size specified in the header is ignored. You can freely write int x[2] in the header and pass an array of length 3.
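
A sketch of this adjustment, under the assumption that the caller passes the real length separately. The function sum, its parameter n and the helper check_array_parameter are invented for this example. Assigning sum to a function pointer that takes int * compiles only because int x[2] in the header really means int *x.

```cpp
#include <cassert>
#include <cstddef>

// Declared with int x[2], but the parameter type is really int *.
// The real length has to be passed separately (n is an assumption
// of this sketch).
int sum (int x[2], std::size_t n)
{
  int total = 0;
  for (std::size_t i = 0; i != n; ++i)
    total += x[i];
  return total;
}

bool check_array_parameter ()
{
  int a[3] = {1, 2, 3};
  // The "2" in the header is ignored, so the function type uses int *
  int (*as_pointer_fn) (int *, std::size_t) = &sum;
  assert (as_pointer_fn (a, 3) == 6);
  return sum (a, 3) == 6;   // an array of length 3 is accepted
}
```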

However, in C++ there is a way to pass a reference to an array to a function:
void f (int (&x)[5])
{
  // sizeof (x) here equals 5 * sizeof (int)
}
int main (void)
{
  int x[5];
  f (x); // OK
  f (x + 0); // Error: a pointer cannot be passed
  int y[7];
  f (y); // Error: wrong size
}

With this kind of passing you still pass only a reference, not the array, i.e. the array is not copied. Still, there are a few differences from ordinary pointer passing. A reference to an array is passed; you cannot pass a pointer in its place. You must pass an array of exactly the specified size. Inside the function, the reference to the array behaves exactly like a reference to an array; for example, it has the sizeof of an array.

And most interestingly, this kind of passing can be used like this:
// Computes the length of an array
template <typename t, size_t n> size_t len (t (&a)[n])
{
  return n;
}
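
A possible way to exercise such a length template is sketched here. The template is repeated with std::size_t and an unnamed parameter so that the block compiles on its own; check_len and the two arrays are assumptions of the example.

```cpp
#include <cassert>
#include <cstddef>

// Length of an array, deduced from the type of the array reference.
template <typename T, std::size_t N>
std::size_t len (T (&)[N])
{
  return N;
}

bool check_len ()
{
  int x[5] = {0};
  double y[7] = {0.0};
  assert (len (x) == sizeof (x) / sizeof (x[0]));
  // len (x + 0) would not compile: a pointer is not an array reference
  return len (x) == 5 && len (y) == 7;
}
```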

The std::end function in C++11 is implemented in a similar way for arrays.

“Pointer to an array”. Strictly speaking, a “pointer to an array” is precisely a pointer to an array and nothing else. In other words:
int (*a)[2]; // This is a pointer to an array. A genuine one. It has type int (*TYPE)[2]
int b[2];
int *c = b; // This is not a pointer to an array. It is just a pointer. A pointer to the first element of some array
int *d = new int[4]; // This is not a pointer to an array either. It is a pointer

However, sometimes the phrase "pointer to an array" informally means a pointer to the memory area in which the array is located, even if the type of this pointer is not suitable. According to this informal understanding, c and d (and b + 0) are pointers to arrays.

Multidimensional arrays. If we declare int x[5][7], then x is not an array of 5 pointers pointing somewhere far away. No, x is a single monolithic 5 x 7 block placed on the stack. sizeof (x) equals 5 * 7 * sizeof (int). The elements are laid out in memory as follows: x[0][0], x[0][1], x[0][2], x[0][3], x[0][4], x[0][5], x[0][6], x[1][0] and so on. When we write x[0][0], events unfold like this:
x // int (&TYPE)[5][7], after conversion: int (*TYPE)[7]
x[0] // int (&TYPE)[7], after conversion: int *TYPE
x[0][0] // int &TYPE

The same goes for **x. Note that in expressions such as x[0][0] + 3 and **x + 3, memory is actually read only once (despite the two asterisks), at the moment the final reference of type int &TYPE is converted to plain int TYPE. That is, if we looked at the assembly code generated for the expression **x + 3, we would see that data is fetched from memory only once. **x + 3 can also be written as *(int *)x + 3.
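
The layout described above can be checked by measuring byte offsets inside the block. This is a sketch; check_row_major is a hypothetical name and the fill values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>

// Checks that a 5 x 7 array is one contiguous block stored row by row.
bool check_row_major ()
{
  int x[5][7];
  for (int i = 0; i != 5; ++i)
    for (int j = 0; j != 7; ++j)
      x[i][j] = i * 7 + j;

  // Byte distance from the start of the block to x[2][3]
  std::ptrdiff_t offset = (char *)&x[2][3] - (char *)x;
  assert (x[2][3] == 2 * 7 + 3);
  return sizeof (x) == 5 * 7 * sizeof (int)
      && offset == (std::ptrdiff_t)((2 * 7 + 3) * sizeof (int))
      && **x == x[0][0];
}
```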

Now let's look at this situation:
int **y = new int *[5];
for (int i = 0; i != 5; ++i)
  {
    y[i] = new int[7];
  }


What is y now? y is a pointer to an array (in the informal sense!) of pointers to arrays (again, in the informal sense). No single 5 x 7 block appears anywhere here; there are 5 blocks of size 7 * sizeof (int), which may be far apart from one another. What is y[0][0], then?
y // int **&TYPE
y[0] // int *&TYPE
y[0][0] // int &TYPE

Now, when we write y[0][0] + 3, memory is read twice: once from the array y and then from the array y[0], which may be far from the array y. The reason is that no array-name-to-pointer conversion takes place here, unlike in the example with the multidimensional array x. Therefore **y + 3 is not equivalent to *(int *)y + 3 here.

Let me explain once more. x[2][3] is equivalent to *(*(x + 2) + 3). And y[2][3] is equivalent to *(*(y + 2) + 3). But in the first case our task is to find the “third element in the second row” of a single 5 x 7 block (of course, elements are numbered from zero, so this third element is in a sense the fourth :)). The compiler works out that the required element is actually at position 2 * 7 + 3 in this block and fetches it. That is, x[2][3] is equivalent to ((int *)x)[2 * 7 + 3] or, equivalently, *((int *)x + 2 * 7 + 3). In the second case, it first fetches element 2 of the array y, and then element 3 of the resulting array.

In the first case, when we compute x + 2, we shift by 2 * sizeof (int [7]) at once, i.e., by 2 * 7 * sizeof (int). In the second case, y + 2 is a shift by 2 * sizeof (int *).

In the first case, (void *)x and (void *)*x (and (void *)&x!) are the same pointer; in the second case they are not.
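
The differences between the monolithic x and the pointer-based y can be put side by side in one sketch. The function name check_jagged_vs_block and the stored value 42 are assumptions of this example; the allocation loop follows the one in the text.

```cpp
#include <cassert>
#include <cstddef>

// Compares a true 5 x 7 array with an array of pointers to rows.
bool check_jagged_vs_block ()
{
  int x[5][7] = {{0}};
  int **y = new int *[5];
  for (int i = 0; i != 5; ++i)
    y[i] = new int[7];

  y[2][3] = 42;
  bool indexing = *(*(y + 2) + 3) == 42;

  // x + 2 skips two rows of 7 ints; y + 2 skips two pointers
  bool steps =
      (char *)(x + 2) - (char *)x == (std::ptrdiff_t)(2 * 7 * sizeof (int))
      && (char *)(y + 2) - (char *)y == (std::ptrdiff_t)(2 * sizeof (int *));

  // For x all three addresses coincide; for y they do not
  bool block_same = (void *)x == (void *)*x && (void *)x == (void *)&x;
  bool jagged_differs = (void *)y != (void *)*y;

  for (int i = 0; i != 5; ++i)
    delete[] y[i];
  delete[] y;

  assert (indexing);
  return indexing && steps && block_same && jagged_differs;
}
```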
