mikhanoid November 6, 2009 at 15:16

On the question of identifiers

We are slowly developing a language here. Besides a huge number of syntactic and semantic questions, there are also what might be called interface questions to settle: how attractive the code looks, how quickly a reader grasps what is written, and so on. One such question is which characters a programmer may use in identifiers, and whether identifiers should be case sensitive. The question is not trivial, and here is why.

A short aside :). I have almost always written in the this_is_the_variable style, and if I had not seen the Plan9 code I would have had no questions: we would have made identifiers in the language “like in C”. But it so happened that I read the Plan9 sources and found that I understand them much more easily than the Linux sources. And this despite the fact that Plan9 names typically look like wrblock, lzput, hufftabinit, quotefmtinstall, while Linux names look like spin_lock_irqsave, rt_mutex_adjust_prio_chain, dma_chan_busy, seq_puts. Why is that? Trying to explain it to myself produced some thoughts that, I dare to hope, will be useful to someone.

As you know, there are several popular lexical schemes for naming variables:

this_is_the_var

thisIsTheVar

thisisthevar

Which one makes code easier to read is an open question. The standard point of view is that this_is_the_var is the best option, because you can immediately pick out the words that make up the identifier. But whether that is good or bad is debatable, because...

First, should an identifier try to convey its meaning by describing the process it abstracts? For example, everyone knows that printf is printf, and nobody really thinks of it as print_values_with_formatting_on_standard_output. Likewise, everyone knows that stdout is stdout. Does it make sense to pack an identifier's meaning into its name, or is a program easier to read and write when the meaning of an identifier follows from the program text? And if the latter is true, do long names actually get in the way of perceiving and understanding the text? After all, with this_is_the_variable the reader has to work on two levels: evaluate the meaning of the phrase that makes up the identifier, and evaluate how the identifier relates to the rest of the program. As examples:

while ((current_character = getc(stdin)) != EOF)
{
	do_something ();
}

and

while ((c = getc(stdin)) != EOF)
{
	do_something ();
}

The examples are simple, but in the first case you first have to read current_character, understand that it is the current character, and then connect that understanding with how getc works; after that, every time current_character occurs in the text, the reader has to re-evaluate this mental construction (this is not a scientific fact, just my own non-specialist hypothesis). In the second example this does not happen: the meaning of c is 'hieroglyphic', that is, it is not put into the identifier by appeal to an external language but arises right here, in the text of the program (in its image, one could say, hence hieroglyphic). Is this useful? I personally do not know, but it is probably worth thinking about.

Secondly, and this complements the previous point, identifiers in the this_is_the_variable style simply make it harder for the brain to perceive an identifier as a single whole. Reading through GitHub, for example, I have sometimes spent a relatively long time reading a line syllable by syllable, trying to work out where a variable declaration begins. Or compare lpfnWndProc with window_event_handling_procedure_ptr: which one is perceived as a whole?

Thirdly, long identifiers that describe something in detail physically expand the field that needs to be analyzed in order to understand the meaning of what is written.

All this leads to the question: should underscores be allowed in identifiers at all, given that they encourage programmers toward verbosity and long-winded names?

Another question: should identifiers be case sensitive? The generally accepted answer is yes, they should. But here, too, there is room for doubt. For example, case insensitivity gives programmers more freedom when working together: one finds it more convenient to write lpfnWndProc, another lpfnwndproc, and a third marks different kinds of loops with different spellings of for: say, foR for a walk over a list and FOR for an iterative search for a numerical solution.
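To make the idea concrete, here is a minimal sketch in C of how a case-insensitive language might decide that two spellings name the same thing. The function name ident_equal_nocase is hypothetical, and the sketch assumes ASCII-only identifiers; it is not taken from the language discussed in this post.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when two identifiers are the same
 * name under case-insensitive comparison (ASCII only, by assumption).
 */
bool ident_equal_nocase(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0') {
		/* Cast to unsigned char: tolower is undefined for negative values. */
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
		a++;
		b++;
	}
	/* Equal only if both strings ended at the same position. */
	return *a == *b;
}
```

Under such a rule foR and FOR are the same word to the compiler, so the difference in spelling carries meaning only for the human reader.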

A small digression: numbers are, after all, algorithms, and that view is much more natural than numbers as coordinates or numbers as values.

While following various disputes and arguments around this topic, I came across the remark that case-insensitive identifiers without underscores are a bad idea, because there is a huge number of libraries written in C (or assembler) for which case and underscores matter. But in our language it will be possible to create identifiers like this: ID.'вам нужен такой идентификатор? они есть у нас!' (“do you need an identifier like this? we have them!”), and they can be used to talk to libraries in C and any other case-sensitive language. So is it worth supporting underscores and case sensitivity in identifiers at all? And would the absence of these features lead to better code, and then to a better understanding of it?
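As a rough illustration of how the two kinds of identifier could coexist, the C sketch below maps a source spelling to a symbol name. Only the ID.'...' form comes from the post; the function name ident_symbol, the folding to lowercase, and the letters-and-digits rule for plain identifiers are all assumptions made for the example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: convert an identifier as spelled in the source
 * into the symbol name a compiler might use. Returns false if the
 * spelling is not accepted or does not fit into out.
 */
bool ident_symbol(const char *src, char *out, size_t outsz)
{
	static const char prefix[] = "ID.'";
	size_t plen = sizeof prefix - 1;
	size_t len = strlen(src);

	if (strncmp(src, prefix, plen) == 0) {
		/* Quoted form: needs a closing quote; the text is kept verbatim. */
		if (len < plen + 2 || src[len - 1] != '\'')
			return false;
		size_t n = len - plen - 1;
		if (n + 1 > outsz)
			return false;
		memcpy(out, src + plen, n);
		out[n] = '\0';
		return true;
	}

	/* Plain form (assumed rule): letters and digits only, not starting
	 * with a digit, folded to lowercase so that case does not matter. */
	if (len == 0 || len + 1 > outsz || isdigit((unsigned char)src[0]))
		return false;
	for (size_t i = 0; i < len; i++) {
		unsigned char ch = (unsigned char)src[i];
		if (!isalnum(ch))
			return false;
		out[i] = (char)tolower(ch);
	}
	out[len] = '\0';
	return true;
}
```

With these assumptions, lpfnWndProc becomes lpfnwndproc, a bare spin_lock is rejected, and ID.'spin_lock_irqsave' yields exactly the name a C library exports.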

That is how the text turned out. Thanks for your attention.

P.S. Here is what Rob Pike, one of the Plan9 developers, wrote about C programming style: www.lysator.liu.se/c/pikestyle.html
