About fundamental mistakes in the design of programming languages
One day I came across an article claiming that the most expensive mistake in programming language design was the decision to mark the end of a string in C with a null byte. A translation of that article is available on Habr (although I think I read a different one). The article surprised me a little. First, in those days of saving every bit of memory, who could simply go and allocate another 2-4 bytes in each string to store its size? Second, the decision has no especially catastrophic consequences for the programmer. I can think of two mistakes it invites: allocating too little memory for a string (forgetting the room for the terminating null) and writing the string incorrectly (forgetting the terminating null itself). Compilers already warn about the first mistake, and using library functions helps avoid the second. That is all the trouble there is.
A far bigger problem dating from the design of C (and later C++), it seems to me, is something else: the for statement. For all its apparent harmlessness, it is a storehouse of potential errors and problems.
Let's recall its classic application:
for (int i = 0; i < vec.size(); i++)
{...}
What could go wrong here?
1. Although the example with int usually appears on the first pages of textbooks, using int here is usually wrong. We mostly iterate over arrays, vectors, and lists. That is, first, we need an unsigned type, and second, we need a type that matches the maximum size of the collection in use. So, for a vec of type std::vector<T>, the right thing to write would be
for (std::vector<T>::size_type i = 0; i < vec.size(); i++)
Tell me, how often do you write this? Exactly. It looks so scary that few people have the willpower to write it everywhere. As a result, we have millions of incorrectly written loops. What is that, if not a mistake in the design of a programming language?
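To make the point concrete, here is a minimal sketch of the "correct" spelling in a complete function. The helper name sum_by_index and the choice of std::vector<int> are my assumptions for illustration only.

```cpp
#include <vector>

// Hypothetical helper: sums a vector by index. The counter uses the
// container's own unsigned size_type, so the comparison with size()
// does not mix signed and unsigned operands.
long long sum_by_index(const std::vector<int>& vec)
{
    long long total = 0;
    for (std::vector<int>::size_type i = 0; i < vec.size(); i++) {
        total += vec[i];
    }
    return total;
}
```

The loop header alone is already longer than the body, which is exactly the complaint.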
2.
for (int i = 0; i < vec.size(); i++)
All programmers are taught to name variables properly. Teachers rap students on the knuckles for names like "a, b, temp, var, val, abra_kadabra", and senior colleagues do the same to juniors. There is an exception, however: “Well, if it is a loop counter, you can just call it i or j.” Whoa, stop! So variables must be given proper names in all cases... except these, where for some reason they need no clear name and a single cryptic letter will do? Why did it turn out this way? Because if we made the programmer name the variable “currentRowIndex”, it would have to be written three times in the for loop:
for (int currentRowIndex = 0; currentRowIndex < vec.size(); currentRowIndex++)
As a result, the line grows from under 40 characters to nearly 80, which is inconvenient both to read and to write. So we write i. That in turn means we use j in the inner for loop, and in something like the Floyd-Warshall algorithm Wikipedia recommends k for the third level of nesting, and so on. On top of the obvious obscurity of such code, we also get copy-paste errors. Just try writing a matrix multiplication without once mixing up i and j, each of which means a column in one place in the code and a row of the matrix in another.
We live with all this because of the poor design of the for loop.
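For comparison, here is a sketch of the matrix multiplication mentioned above with descriptive counter names. The Matrix alias and the multiply function are hypothetical, and the sketch assumes rectangular matrices whose inner dimensions agree.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: multiplies two rectangular matrices.
// Assumes every row of `left` has exactly right.size() elements.
Matrix multiply(const Matrix& left, const Matrix& right)
{
    const std::size_t rowCount = left.size();
    const std::size_t innerCount = right.size();
    const std::size_t columnCount = right.empty() ? 0 : right[0].size();

    Matrix product(rowCount, std::vector<double>(columnCount, 0.0));
    for (std::size_t row = 0; row < rowCount; row++) {
        for (std::size_t column = 0; column < columnCount; column++) {
            for (std::size_t inner = 0; inner < innerCount; inner++) {
                product[row][column] += left[row][inner] * right[inner][column];
            }
        }
    }
    return product;
}
```

The names make it harder to swap a row for a column, but each one still has to be typed three times per loop header.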
3.
for (int i = 0; i < vec.size(); i++)
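The trouble with the for loop is that, as a rule, we need to start iterating from element zero. Except in those cases where we need to start from the first, the second, one found earlier, the last, a cached one, and so on. The programmer's well-drilled hand habitually writes (or pastes) = 0, and then it takes debugging and some cursing to change that familiar = 0 to the option actually needed. You will say that for is not to blame and the programmer is simply careless? I disagree. Ask the same programmer to write the same code with do/while or while, and he will get it right the first time. In that case there is no tired pattern in front of his eyes: every do/while or while loop is fairly unique, and each time the programmer thinks about where the loop starts and by what criterion it stops. The design of the for loop makes this need to think seem optional, which is why it is almost always neglected.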
4.
for (int i = 0; i < vec.size(); i++)
A convenient feature of the for loop is that the variable i is created in the scope of the loop and destroyed on exit. In general this is good, and it sometimes lets you save memory or make use of RAII. But it does not work at all when we need to find something in the loop and stop. We can stop, but to return the index of the element found we need an additional variable, or a definition of i before the loop. An extra variable is an unjustified cost in the cases where nothing is found. Declaring i before the loop breaks the coherence of the code: the first section of the for stays empty, which forces the reader to study the code above and work out whether this is a mistake or was intended.
It may seem like nitpicking, but to me the for loop lacks a way to return the index value on an early stop. It could look like some kind of post-block (like else for a while loop) in which the last value of the iteration counter is available, or a function in the spirit of GetLastError() that returns the value the variable i had when break was called.
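A sketch of the workaround with the counter declared before the loop. The helper find_index is hypothetical, and returning vec.size() as the "not found" marker is my own convention here, chosen by analogy with end() iterators.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of the first element equal to
// `wanted`, or vec.size() when nothing is found.
std::size_t find_index(const std::vector<int>& vec, int wanted)
{
    std::size_t i = 0;            // declared outside so it survives the loop
    for (; i < vec.size(); i++) { // first section left empty on purpose
        if (vec[i] == wanted) {
            break;
        }
    }
    return i;                     // equals vec.size() if the loop ran to the end
}
```

Note the comment that has to explain the empty first section: without it, the reader is left guessing.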
5.
for (int i = 0; i < vec.size(); i++)
Checking the condition in the second block of the for statement does not look logical, because on every iteration of the loop except the first, the counter increment (the third block) is performed first and the condition check (the second block) only after it. The condition check sits in the second block to emphasize that on the first iteration it runs immediately after the counter i is initialized; only with that explanation does everything look more or less logical. As a result, we have a loop whose syntax is centered on its first iteration and does not reflect what happens on all the subsequent ones (of which there are usually many times more). Such is the design of the for statement.
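The actual order of execution can be made visible with a small trace. This is only a sketch: trace_for and trace_while are hypothetical helpers that append a letter each time a part of the loop runs (I for initialization, C for the condition check, B for the body, S for the step).

```cpp
#include <string>

// Records the order in which the parts of a for statement execute.
std::string trace_for(int limit)
{
    std::string trace;
    int i = 0;
    for (trace += 'I'; (trace += 'C', i < limit); (trace += 'S', i++)) {
        trace += 'B';
    }
    return trace;
}

// The same loop spelled as a while: the step visibly sits at the end
// of the body, right before the next condition check.
std::string trace_while(int limit)
{
    std::string trace;
    int i = 0;
    trace += 'I';
    while ((trace += 'C', i < limit)) {
        trace += 'B';
        trace += 'S';
        i++;
    }
    return trace;
}
```

For limit = 2 both functions produce ICBSCBSC: after the first iteration, every condition check is preceded by a step, even though the for header lists them the other way round.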
6.
for (int i = 0; i < vec.size(); i++)
"Less than". Or "less than or equal"? Or "not equal"? Up to ".size()" or up to ".size() - 1"? Yes, the answers to these questions are easy to find, but tell me, why can such questions arise at all? And in those rare cases when you need to write a non-standard variant, how do you let your fellow programmers know that it is not a mistake and is exactly what you meant to write?
7.
for (int i = 0; i < vec.size(); i++)
This is, in general, the only place where we tell the loop which collection we are going to walk over. And even then we mention it only in the context of its size: this many steps need to be taken. Meanwhile, inside the loop we may well be walking over a vector vec2, which, by the law of meanness, will have exactly the same length in the debug build and a different one in release, so we will find the bug much later than we should have.
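For contrast, a minimal sketch using the range-based for from C++11, where the collection is named exactly once and there is no index to apply to the wrong vector. The helper double_all is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: doubles every element in place. The loop names
// the collection itself, not just its size, so the body cannot
// accidentally index into some other vector.
void double_all(std::vector<int>& vec)
{
    for (int& value : vec) {
        value *= 2;
    }
}
```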
8.
for (int i = 0; i < vec.size(); i++)
What names people come up with for the number of elements in a collection! Yes, the STL with its size() is fairly consistent, but other libraries use length(), count(), number(), and totalSize(), all in various CamelCase and under_score spellings. As a result, to use the concept of “collection size” we have to hand knowledge of this particular collection's implementation to the for loop. And when the collection is replaced with another one, every for has to be rewritten.
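One way to keep that knowledge out of the loop is to ask for the element count through iterators instead of a member function. The sketch below uses a hypothetical element_count template and assumes the collection works with std::begin and std::end; a library type that offers only length() or count() would still need its own adapter.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: counts elements without knowing whether the
// collection spells its size as size(), length() or anything else.
// Works for built-in arrays and for containers with begin()/end().
template <typename Collection>
std::size_t element_count(const Collection& collection)
{
    return static_cast<std::size_t>(
        std::distance(std::begin(collection), std::end(collection)));
}
```

The same call then works for int raw[3], std::vector<int> and std::list<int>, although for containers without random access it costs a full traversal.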
9.
for (int i = 0; i < vec.size(); i++)
Here, of course, we have the holy war over the prefix and postfix forms of increment. If you want to quarrel with a colleague and spend half a day recalling the language standard and studying how modern compilers optimize the code, welcome to the good old "++i vs i++" thread. There are plenty of places to talk about it (Habr is one of them), but was it really necessary to build this dispute into the third block of a for statement that is used thousands of times in practically every project?
10.
for (;;)
Here we also have the classic debate between “But this is the most efficient way to write an infinite loop!” and “It looks disgusting, while (true) is much more expressive.” More holy wars for the god of holy wars!
11.
for (int i = 0; i++; i < vec.size())
This code compiles. Some compilers issue a warning, but none reports an error. The swapped second and third blocks do not catch the eye, because everything written there is familiar: an increment, a condition check. The for statement resembles a hardware connector into which the plug can be inserted either way up, yet it works in only one orientation and burns out in the other.
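A small sketch of what the swapped version actually does. Both function names are hypothetical, and a compiler may warn that the comparison in the third block has no effect.

```cpp
#include <cstddef>
#include <vector>

// Counts body executions with the blocks in the intended order.
int count_iterations_correct(const std::vector<int>& vec)
{
    int iterations = 0;
    for (std::size_t i = 0; i < vec.size(); i++) {
        iterations++;
    }
    return iterations;
}

// The same loop with the second and third blocks swapped. The condition
// is now `i++`, which yields 0 (false) on the very first check, so the
// body never runs, whatever the size of the vector.
int count_iterations_swapped(const std::vector<int>& vec)
{
    int iterations = 0;
    for (std::size_t i = 0; i++; i < vec.size()) {
        iterations++;
    }
    return iterations;
}
```

In this particular form the loop silently does nothing; with a non-zero starting value it would instead run until the counter wraps around to zero.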
A significant part of the subsequent evolution of programming languages looks like an attempt to fix for. Higher-level languages (and later C++) introduced for-each constructs. Standard libraries gained algorithms for searching and modifying collections. C++11 introduced auto type deduction, basically to remove the need to write a wild
std::vector<T>::iterator
in every loop. Functional languages proposed replacing loops with recursion. Dynamic languages proposed dropping the type declaration in the first block. Everyone tried to fix the situation somehow, when it could have been designed better from the start.
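As an illustration of those later fixes, here is a minimal sketch that replaces hand-written loops with standard algorithms and auto. The helper names contains_negative and sum_all are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: searches with std::find_if; `auto` stands in for
// the full iterator type name.
bool contains_negative(const std::vector<int>& vec)
{
    auto position = std::find_if(vec.begin(), vec.end(),
                                 [](int value) { return value < 0; });
    return position != vec.end();
}

// Hypothetical helper: sums the elements with std::accumulate, with no
// counter, start value or stop condition to get wrong.
int sum_all(const std::vector<int>& vec)
{
    return std::accumulate(vec.begin(), vec.end(), 0);
}
```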