Fancy Range Operator

    I must warn you that this is another article with no revelations in it. For the super-geeks who know the whole perldoc by heart it will be absolutely useless, so, dear super-geeks, feel free to walk on by without telling me that all this is in the docs. I already know that. :-) This article is for everyone else: for those who have not fully mastered perldoc, or mastered it but did not understand it, or understood it but did not remember it.

    I think many people know about the so-called range operator, written as .. (two dots), which lets you quickly build lists of consecutive elements. For example, the following code creates an array of 35 numbers: 3, 4, 5, ..., 37:
    my @arr = 3 .. 37;
    Besides numbers, you can use strings. In that case the elements are generated with the so-called magic string increment, so you can, for example, specify a range of letters: 'a' .. 'z'.
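    Here is a minimal sketch of the list-context behaviour just described. The variable names are illustrative, and the 'x' .. 'ab' range is an extra example (not from the text) that shows the magic increment rolling 'z' over to 'aa'.

```perl
use strict;
use warnings;

# List context: the range expands into every value from left to right.
my @numbers = 3 .. 37;          # 35 elements: 3, 4, ..., 37
my @letters = 'a' .. 'e';       # magic string increment: a b c d e
my @pairs   = 'x' .. 'ab';      # 'z' rolls over to 'aa': x y z aa ab

print scalar(@numbers), "\n";   # prints 35
print "@letters\n";             # prints "a b c d e"
print "@pairs\n";               # prints "x y z aa ab"
```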

    However, the range operator can also be used in scalar context, where it takes Boolean expressions as operands and returns a Boolean result. This is where the fun begins, because in this form it is a stateful operator: its result depends not only on the current values of the left and right operands, but also on the history of previous evaluations of this expression!
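    A quick sketch of that statefulness, assuming a made-up list in which only the words start and end act as markers. The grep block is evaluated in Boolean (scalar) context, so .. works there as a flip-flop: b and c match neither regex, yet they are selected because the operator is still "on" when it reaches them.

```perl
use strict;
use warnings;

# Hypothetical input: only "start" and "end" are special.
my @items = qw(a start b c end d);

# In scalar context .. turns on when the left operand is true and
# stays on until the right operand is true (inclusive).
my @inside = grep { /^start$/ .. /^end$/ } @items;

print "@inside\n";   # prints "start b c end"
```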

    I could, of course, start with the formal definition that specifies how the range operator computes its result, but I personally had to re-read that definition five or six times before I finally got the point. So let's approach it from the other side. Imagine that we are processing a file line by line and need to do something with certain blocks in it (for example, skip multi-line comments). What is a block? An almost arbitrary run of lines enclosed between two markers that denote its beginning and end. To be specific, let's take C/C++-style comments (and, for simplicity, assume that a comment and code never share a line). Here is the code we will be processing:
    01:  int i = 10, j, k;
    02:  for (j = i; j < 2 * i; ++j) {
    03:      /*
    04:        Here we will perform
    05:        some very complex
    06:        and incomprehensible actions. */
    07:      k = j * j;
    08:      printf("Result[%d]: %d\n", j, k);
    09:      /* Result printed. */
    10:  }
    Our task is to write code that prints all the non-comment lines of this text. What would a C programmer do when processing it line by line? Probably declare a variable holding the current state, namely whether we are inside a comment (that is, we have read the "/*" marker but have not yet met "*/"), print or skip each line depending on that variable, and remember to update it as soon as a marker of the right kind shows up. And what would a Perl programmer do? Use the range operator and write something like this:
    while (my $line = <>) {   # <> reads the files named on the command line, or STDIN
        if (($line =~ m/^\s*\/\*/) .. ($line =~ m/\*\/\s*$/)) {
            # $line is a comment: skip it.
        }
        else {
            # $line is code: print it.
            print $line;
        }
    }
    What does this code do? In short, exactly what we need. The left operand of the range operator matches the beginning of a comment, the right one matches its end. The range operator itself returns true if and only if execution is currently inside the "interval" that starts when the left operand fires and ends when the right operand fires. So the operator fully lives up to its name: it defines a logical range.
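    To try the loop without an input file, here is a self-contained sketch that feeds it the sample C fragment (line numbers dropped) from an in-memory array instead of reading <>. The @source array and the @code accumulator are illustrative additions, not part of the original snippet.

```perl
use strict;
use warnings;

# The sample C fragment as a list of lines.
my @source = (
    'int i = 10, j, k;',
    'for (j = i; j < 2 * i; ++j) {',
    '    /*',
    '      Here we will perform',
    '      some very complex',
    '      and incomprehensible actions. */',
    '    k = j * j;',
    '    printf("Result[%d]: %d\n", j, k);',
    '    /* Result printed. */',
    '}',
);

my @code;
for my $line (@source) {
    if (($line =~ m/^\s*\/\*/) .. ($line =~ m/\*\/\s*$/)) {
        next;                   # $line is a comment: skip it
    }
    push @code, $line;          # $line is code: keep it
}

print "$_\n" for @code;
```

    It should print lines 1, 2, 7, 8 and 10 of the sample.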

    Now we can give the formal definition (a free translation of excerpts from the official perlop documentation):
    Each .. operator maintains its own Boolean state. It is false as long as its left operand is false. Once the left operand becomes true, the range operator becomes true and stays true until the right operand becomes true, AFTER which the range operator becomes false again.
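    perlop also specifies what the scalar .. actually returns: the empty string for false, or a sequence number starting at 1 for true, with the string "E0" appended to the last number of each range (it still has the same numeric value, but gives you something to test for). A small sketch over the numbers 1 to 7 with a hypothetical range "from 3 to 5":

```perl
use strict;
use warnings;

# Record what the scalar .. returns for each input value.
my @results;
for my $n (1 .. 7) {
    my $state = ($n == 3) .. ($n == 5);   # on at 3, off after 5
    push @results, "$n:'$state'";
}
print "@results\n";
# prints "1:'' 2:'' 3:'1' 4:'2' 5:'3E0' 6:'' 7:''"
```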
    Now let's work out the magic that turns this somewhat vague definition into the familiar notion of a range. To do that, let's play debugger and execute the program step by step, reading the input file line by line. For brevity, I will call the left and right operands (the Boolean expressions matching the beginning and end of a comment) MB and ME (marker begin / marker end).
    1. int i = 10, j, k;
      On this line both MB and ME are false, so the .. operator is false as well. Thus this line is not a comment.
    2. for (j = i; j < 2 * i; ++j) {
      Similarly, MB and ME are both false, so the whole expression is false too.
    3.     /*
      And here MB finally fires, while ME is still false. By the definition, the range operator now becomes true, and we conclude that this line is a comment.
    4.       Here we will perform
      On this line MB and ME again find nothing and return false. But since the operator has already switched to the true state, it stays there until ME becomes true. So we again get true: this line is a comment.
    5.       some very complex
      ME has still not become true, so the .. operator keeps returning true: we have not reached the end of the comment yet.
    6.       and incomprehensible actions. */
      And here ME finally fires. The range operator sighs, returns true one last time, and then switches back to its initial state. For this line we still get true, which matches our idea of how multi-line comments are structured: this is the last line of the comment, but still part of it, and per our spec it must not be printed.
    7.     k = j * j;
      The free ride is over, sir. MB and ME are both false, the operator is back in its initial state and returns false, so this line is not a comment.
    8.     printf("Result[%d]: %d\n", j, k);
      ... and neither is this one, for the same reasons.
    9.     /* Result printed. */
      This line is very interesting: MB and ME both fire at once. What does that change? Essentially nothing. By definition the .. operator returns true and remembers that, but since the right operand is also true, it immediately switches back to its initial state: the comment began and ended on the same line.
    10. }
      Neither MB nor ME matches this line, and since the operator has already switched back, it returns false here, marking the line as code.
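    The walkthrough can be checked mechanically. This sketch (the @source and @trace names are made up for illustration) records what the operator returns on each of the ten sample lines; a dash stands for the empty string, i.e. "not a comment".

```perl
use strict;
use warnings;

# The ten sample lines from the walkthrough, in order.
my @source = (
    'int i = 10, j, k;',
    'for (j = i; j < 2 * i; ++j) {',
    '    /*',
    '      Here we will perform',
    '      some very complex',
    '      and incomprehensible actions. */',
    '    k = j * j;',
    '    printf("Result[%d]: %d\n", j, k);',
    '    /* Result printed. */',
    '}',
);

my @trace;
for my $i (0 .. $#source) {
    my $line  = $source[$i];
    # MB .. ME, exactly as in the loop above.
    my $state = ($line =~ m/^\s*\/\*/) .. ($line =~ m/\*\/\s*$/);
    my $shown = $state eq '' ? '-' : $state;   # '-' means "code"
    push @trace, $shown;
    printf "%2d  %-4s %s\n", $i + 1, $shown, $line;
}
```

    Lines 3 to 6 come out as 1, 2, 3 and 4E0, and line 9 as 1E0, because the sequence number restarts with each new range.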
    I hope this simple example has helped you see what is going on. In effect, the operator carries the very same state variable that our imaginary C programmer had to declare explicitly and then keep up to date by hand.
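    For comparison, here is a sketch of the explicit-state approach our imaginary C programmer would take, written in Perl next to the range-operator version. Both function names and the short input are hypothetical; the point is that $in_comment is exactly the state that .. keeps for you.

```perl
use strict;
use warnings;

# Hypothetical input with one multi-line and one single-line comment.
my @source = ('int k;', '/*', '  a multi-line', '  comment */',
              'k = 1;', '/* one-liner */', '}');

# The "C programmer" way: an explicit state flag.
sub strip_with_flag {
    my @code;
    my $in_comment = 0;                      # the explicit state variable
    for my $line (@_) {
        $in_comment = 1 if !$in_comment && $line =~ m/^\s*\/\*/;
        if ($in_comment) {
            # The end marker may be on the same line as the start marker.
            $in_comment = 0 if $line =~ m/\*\/\s*$/;
            next;                            # comment line: skip it
        }
        push @code, $line;
    }
    return @code;
}

# The Perl way: the .. operator keeps the state itself.
sub strip_with_range {
    my @code;
    for my $line (@_) {
        next if ($line =~ m/^\s*\/\*/) .. ($line =~ m/\*\/\s*$/);
        push @code, $line;
    }
    return @code;
}

my @by_flag  = strip_with_flag(@source);
my @by_range = strip_with_range(@source);
print "$_\n" for @by_flag;   # int k; / k = 1; / }
```

    One caveat the explicit flag avoids: the state of .. belongs to that particular operator in the program text, not to a call of the function, so if one input ended inside an unterminated comment, the next call to strip_with_range would start out "inside a comment".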

    Of course, the range operator is not limited to line-by-line or word-by-word text processing; its uses are limited only by your imagination. For example, it can pick out ranges in data arrays when the boundaries are defined by conditions more complex than a simple greater-than or less-than comparison.
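    For instance, here is a sketch assuming a hypothetical series of temperature readings, where a "hot spell" starts at the first reading above 30 and ends at the first reading below 25 after it (both thresholds are arbitrary):

```perl
use strict;
use warnings;

# Hypothetical sensor readings.
my @temps = (10, 12, 31, 35, 28, 22, 19, 33, 20);

# Each spell starts at a reading above 30 and ends at (and includes)
# the next reading below 25.
my @hot = grep { ($_ > 30) .. ($_ < 25) } @temps;

print "@hot\n";   # prints "31 35 28 22 33 20"
```

    Note that 28 is selected even though it is neither above 30 nor below 25: it falls inside a spell that has started but not yet ended.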

    It is also worth mentioning that besides the two-dot operator (..) there is also a three-dot one (...). It behaves exactly like the two-dot version except for one thing: when the left operand becomes true while the operator is in its initial state, the right operand is not tested on that same evaluation, only from the next one on. So if we used ... in our example, the comment on line 9 of the file would not be considered closed; it would continue to the end of the file (more precisely, to the next line containing an end-of-comment marker, but our example file has no such line). A typical use case is a block whose beginning and end are marked by the same special line. The two-dot operator is powerless there, while the three-dot operator fits perfectly.
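    A sketch of the difference, assuming a made-up input in which the line --- both opens and closes a block:

```perl
use strict;
use warnings;

# Hypothetical input: the same line "---" opens and closes a block.
my @lines = ('a', '---', 'b', 'c', '---', 'd');

# Two dots: the right operand is tested on the same line that turned
# the operator on, so each "---" opens and immediately closes a block.
my @two = grep { /^---$/ .. /^---$/ } @lines;

# Three dots: the right operand is not tested until the next line,
# so the first "---" opens the block and the second one closes it.
my @three = grep { /^---$/ ... /^---$/ } @lines;

print "@two\n";     # prints "--- ---"
print "@three\n";   # prints "--- b c ---"
```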

    Finally, a small lecture: please do not forget about readability and self-documenting code. Don't use a language feature just because the language has it. If some block really does stand out clearly when data is processed sequentially, this operator lets you describe it concisely, clearly and elegantly. But if you start shoving the range operator everywhere just because it is so original and unusual and nobody else has it, believe me, nothing good will come of it.
