Why does "=" mean assignment?

Original author: Hillel Wayne
Let's look at the following code:

a = 1
a = a + 1
print(a)

In FP circles, this aspect of imperative programming is often criticized: “How can a = a + 1? That's like saying 1 = 2. Mutable assignment makes no sense.”

Here we see a mismatch in notation: "equals" should mean equality, but in practice it means assignment. I agree with this criticism and consider it bad notation. But I also know that some languages write a := a + 1 instead of a = a + 1. Why isn't that notation the norm?

The usual answer is "because C does it that way". But that just passes the buck: how many of us know why C does it that way? Let's figure it out together!

The big four


In the early 1960s, there were four dominant high-level languages: COBOL, FORTRAN II, ALGOL-60, and LISP. At that time, programmers split assignment into two classes: initialization, when you first define a variable, and reassignment, when you change the value of an existing variable.

So, let's add comments to our Python example and get the following code:

a = 1 # Initialization
a = a + 1 # Reassignment
print(a)

At the time, people did not use exactly these terms for the operations, but in essence this is what every programmer did. The table below shows which operators each language used for them, and how each tested for equality.
Language   Initialization   Assignment   Equality
FORTRAN    =                =            .EQ.
COBOL      INITIALIZE       MOVE [1]     EQUAL
ALGOL      N/A              :=           =
LISP       let              set          equal

ALGOL had no separate initialization statement. Instead, you declared a variable of a given type and then used the assignment statement to give it a value. You could write integer x; x := 5; but not x := 5;. The only language on the list that used = for assignment was FORTRAN, which makes it look like a good candidate for answering our question.

But you and I know that C descends from ALGOL, which means that at some point someone decided to abandon the := assignment operator and change the meaning of = from an equality test ...

ALGOL spawns CPL


ALGOL-60 is quite possibly one of the most influential programming languages in the history of computer science. It is quite possibly also one of the most useless. The core language specification intentionally provided no input/output facilities. You could "hardcode" the inputs and measure the outputs, but if you needed to do anything useful with them, you had to find a compiler that extended the base language. ALGOL was designed for studying algorithms, so it "broke" when you tried to do anything else with it.

However, it turned out to be such a "strong" language that others wanted to generalize it for use in business and industry. The first such attempt was made by Christopher Strachey and the University of Cambridge. The resulting language, CPL, added a lot of innovative features to ALGOL, most of which we would come to deeply regret. One of them was the initializing definition, in which a variable could be declared and initialized in a single statement. Now, instead of writing integer x; x := 5; you could just write integer x = 5. Just great!

But here we switched from := to =. That is because CPL had three kinds of variable initialization:

  • = meant initialization by value.
  • ≃ meant initialization by reference, so if x ≃ y, then reassigning x also changed y. But if you wrote x ≃ y + 1 and tried to reassign x, the program would crash.
  • ≡ meant initialization by substitution, i.e. turning x into a function that takes no arguments (a niladic function) and evaluates the right-hand side every time it is used. Nowhere is it explained what should happen if you try to reassign x, and, believe me, I am not too keen to find out.
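
The three kinds of initialization listed above can be loosely imitated in Python. This is only an analogy, not CPL semantics: the Cell class and all variable names are hypothetical helpers introduced for illustration, and Python has no direct counterpart to ≃ or ≡.

```python
# Hypothetical Python analogy for CPL's three kinds of initialization.
# Cell and every name below are illustrative; none of them come from CPL.

class Cell:
    """A mutable box standing in for a variable's storage location."""
    def __init__(self, value):
        self.value = value

y = Cell(10)

# "=" : initialization by value. x_val gets its own copy of y's current value.
x_val = Cell(y.value)

# "≃" : initialization by reference. x_ref and y share one storage location.
x_ref = y

# "≡" : initialization by substitution. x_sub re-evaluates "y + 1" on each use.
x_sub = lambda: y.value + 1

x_ref.value = 20          # reassigning through the reference also changes y

by_value = x_val.value    # still the old copy
shared = y.value          # updated through x_ref
substituted = x_sub()     # recomputed from the new value of y

print(by_value, shared, substituted)  # prints: 10 20 21
```

The by-value copy keeps the old value, the by-reference name sees the change, and the substituted expression is recomputed on use.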

The problem: = was now used for both initialization and equality. Fortunately, in practice the two uses were clearly separated in CPL: wherever you wrote =, it was obvious which one was meant.

Just a year later, Ken Iverson would create APL, which used the symbol ← for all kinds of assignment. Since most keyboards do not have such a key and never did, Iverson himself soon abandoned it: his next language, J, uses =: [2] for assignment. However, APL deeply influenced S, which in turn deeply influenced R, which is why <- is the preferred assignment operator in R.

CPL spawns BCPL


CPL was a wonderful language with only one slight flaw: nobody managed to write an implementation of it. Several people partially implemented various subsets of its "features", but the language turned out to be too large and complicated for the compilers of that era. So it is not surprising that Martin Richards decided to strip out the unnecessary complexity and created BCPL. The first BCPL compiler appeared in 1967 ... and the first CPL compiler appeared only in 1970.

Among the many simplifications, the "three kinds of initialization" rules were scrapped. Richards believed that initialization by substitution was a niche feature that could be replaced by functions (the same, in his view, applied to initialization by reference). So he merged them all into a plain =, except for named global memory addresses, which used :. As in CPL, = was also the equality test. For reassignment he used :=, just as CPL and ALGOL did. Many of the languages that followed adopted the same convention: = for initialization, := for reassignment, = for equality. But it reached the masses when Niklaus Wirth created Pascal, which is why today we call this notation "Pascal style".

As far as I know, BCPL was also the first "weakly typed" language, since its only data type was the machine word [3]. This made the compiler much more portable at the cost of a potential increase in logic errors, but Richards hoped that better process and descriptive naming would counteract this. On top of all that, BCPL was where curly braces first appeared as block delimiters.

BCPL spawns B


Ken Thompson wanted BCPL to run on the PDP-7. Although BCPL had a "compact compiler", it was still four times larger than the minimum working memory of the PDP-7 (16 kB instead of 4 kB). So Thompson needed to create a new, more minimalistic language. For personal aesthetic reasons, he also wanted to minimize the number of characters in source code. This shaped the design of B more than anything else; it is why operators like ++ and -- appeared in it.

Leaving aside named global memory addresses, BCPL always used = for initialization and := for reassignment. Thompson decided that these could be merged into a single token used for all kinds of assignment, and chose = because it was shorter. However, this introduced some ambiguity: if x was already declared, was x = y an assignment or an equality test? And that is not all: in some cases it was supposed to be both at once! So he was forced to add a new token, ==, as the sole way of expressing "is equal to". As Thompson himself put it:
"Since assignment is about twice as frequent as equality testing in typical programs, it's appropriate that the operator be half as long."

In the time between BCPL and B, Simula 67, the first object-oriented language, was created. Simula followed ALGOL's convention of strictly separating initialization from reassignment. Around the same time, Alan Kay began work on Smalltalk, which added blocks but kept the same syntax.

Thompson (joined by Dennis Ritchie) released the first version of B around 1969. So until roughly 1971, most new languages used := for assignment.

B begets C


... the rest is history.

Well, there is one more thing worth mentioning. ML came out a year later, and as far as I know it was the first language to pay serious attention to pure functions and the absence of mutation. But it still had an escape hatch in the form of reference cells, which could be reassigned to new values with the := operator.
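
To make the idea of a reference cell concrete, here is a rough Python imitation. It is a sketch under stated assumptions: Ref, get, set_, and increment are hypothetical names invented for this example, and the ML fragments in the comments use Standard ML notation, which may differ from the original ML.

```python
# Hypothetical Python analogy for ML-style reference cells.
# Ref, get, set_, and increment are illustrative names, not part of ML
# or of any library.

class Ref:
    """An explicit mutable cell: the only thing here that gets reassigned."""
    def __init__(self, value):
        self._value = value

    def get(self):           # plays the role of dereferencing, !r
        return self._value

    def set_(self, value):   # plays the role of r := value
        self._value = value

def increment(n):
    """A pure function: returns a new value and mutates nothing."""
    return n + 1

counter = Ref(1)                          # like: val counter = ref 1
counter.set_(increment(counter.get()))    # like: counter := !counter + 1
result = counter.get()
print(result)  # prints: 2
```

The point of the design is that mutation is confined to the explicitly marked cell, while ordinary names and functions stay free of it.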

Since 1980, we have seen a rise in the popularity of new correctness-oriented imperative languages, in particular Eiffel and Ada, both of which use := for assignment.

If you look at the whole picture, = was never the "natural choice" for an assignment operator. Almost all the languages in the ALGOL family tree used := for assignment instead, perhaps because = was so closely associated with equality. Nowadays most languages use = because C uses it, and we can trace that story back to CPL, which was quite a mess.

Notes


1. This is where COBOL gets very strange. It has several operators that can mutate implicitly, like ADD TO and COMPUTE. COBOL is a bad language.
2. I like to think that it was a jab at :=, but in fact this notation is consistent with the rest of the language, which uses . and : as verb suffixes.
3. Later, a keyword for a floating-point type was added to BCPL. And when I say "later", I mean 2018.
