newpavlov July 29, 2016 at 15:58

Rust: for and iterators

Translation
Tutorial

In this article we will discuss for loops, as well as related concepts of iterators and “iterable objects”.

Depending on your previous experience with other programming languages, these concepts may seem very familiar in terms of syntax and semantics, or they may be completely new and puzzling. Their closest analogues can be found in Python, but I think programmers coming from Java, C# or (modern) C++ will also see a lot of overlap with what their languages offer.

The basics

In Rust, the for loop syntax is almost Spartan in its brevity:

let v = vec!["1", "2", "3"];
for x in v {
    println!("{}", x);
}

(The variant of the for loop with two semicolons simply does not exist in Rust. Just as in Python, we can either iterate over a range or use while or loop for more complicated cases.)

The code above will print three lines with 1, 2, 3. Perhaps less obvious is the fact that the vector v was moved into the loop during its execution. Attempting to use this vector after the loop produces a compile error:

<anon>:6:22: 6:23 error: use of moved value: `v` [E0382]
<anon>:4         println!("{}", x);
<anon>:5     }
<anon>:6     println!("{:?}", v);
                              ^

Ownership of the vector and its elements has been completely and irrevocably moved into the loop. Although quite unexpected compared to other languages, this behavior is fully consistent with Rust's general policy of "move by default."

But if you are not yet fully accustomed to the rules of moving and borrowing, this may still come as a surprise, because for the most part moves are associated with function calls and their context. In most cases, to simplify understanding, you can think of the for loop above as similar to a for_each function:

for_each(v, |x| println!("{}", x));

This view also hints at how to avoid moving the value into the loop. Instead of passing the vector itself, we can pass only a reference to it:

for_each_ref(&v, |x| println!("{}", x));

By translating this code back into the loop form:

for x in &v {
    println!("{}", x);
}
println!("{:?}", v);

We will get rid of the compiler error.
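As a rough sketch of this mental model: for_each and for_each_ref are not standard library functions, so the definitions below are assumptions made purely for illustration. One takes the vector by value, the other takes a reference:

```rust
// Hypothetical helpers: the article uses these names only as a mental model.

/// Takes the vector by value, so the caller loses ownership of it.
fn for_each<T, F: FnMut(T)>(v: Vec<T>, mut f: F) {
    for x in v {
        f(x);
    }
}

/// Takes a reference, so the caller keeps ownership of the vector.
fn for_each_ref<T, F: FnMut(&T)>(v: &Vec<T>, mut f: F) {
    for x in v {
        f(x);
    }
}

fn main() {
    let v = vec!["1", "2", "3"];

    for_each_ref(&v, |x| println!("{}", x));
    println!("{:?}", v); // still usable: `v` was only borrowed

    for_each(v, |x| println!("{}", x));
    // println!("{:?}", v); // would not compile: `v` was moved into for_each
}
```

The only difference between the two signatures is Vec<T> versus &Vec<T>, which is exactly the difference between writing v and &v in the loop header.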

Iterators and Iterable Objects

It is important to note that the added ampersand (&) is by no means part of the for loop syntax. We simply changed the object we iterate over: instead of Vec<T>, the vector itself, we pass &Vec<T>, an immutable reference to it. As a consequence, the type of x changes from T to &T, i.e. it is now a reference to an element. (This does not affect the body of the loop thanks to "deref coercion".)

So it turns out that Vec<T> and &Vec<T> are both "iterable objects." The usual way to implement this in programming languages is to introduce a special object, an “iterator”.

The iterator keeps track of which element it points to at the moment and supports at least the following operations:

Getting the current item
Moving to the next item
Signaling that the items have run out

Some languages provide separate iterator methods for each of these tasks, but in Rust it was decided to combine them into one. Looking at the documentation for the Iterator trait, you will see that implementing a single method, next, is enough to satisfy it.
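To make this concrete, here is a minimal sketch of a hand-written iterator. The Countdown type is hypothetical and exists only for this example; the point is that next alone covers all three operations listed above:

```rust
/// Hypothetical iterator that counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // `next` does all three jobs at once: it returns the current item,
    // advances the iterator, and signals exhaustion by returning `None`.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let countdown = Countdown { remaining: 3 };
    for x in countdown {
        println!("{}", x);
    }
}
```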

Removing the syntactic sugar

But how exactly are iterator objects created from iterable objects?

In typical Rust fashion, this task is delegated to another trait called IntoIterator:

// (simplified)
trait IntoIterator {
    fn into_iter(self) -> Iterator;
}

A distinctive feature of Rust is that into_iter, the only method of this trait, not only creates an iterator from the collection but also essentially consumes the original collection, leaving the resulting iterator as the only way to access its elements. (How can we tell? Because into_iter takes self as its argument, not &self or &mut self, which means that ownership of the object is transferred into the method.)
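For comparison with the simplified declaration, the sketch below implements the real trait, with its two associated types, for a hypothetical Playlist wrapper. The type and its field are assumptions introduced for this example only:

```rust
/// Hypothetical wrapper type, used only to illustrate the trait.
struct Playlist {
    songs: Vec<String>,
}

impl IntoIterator for Playlist {
    type Item = String;
    type IntoIter = std::vec::IntoIter<String>;

    // Takes `self` by value: the playlist is consumed, and only the
    // returned iterator can reach the songs afterwards.
    fn into_iter(self) -> Self::IntoIter {
        self.songs.into_iter()
    }
}

fn main() {
    let playlist = Playlist {
        songs: vec!["first".to_string(), "second".to_string()],
    };
    for song in playlist {
        println!("{}", song);
    }
    // `playlist` has been moved and can no longer be used here.
}
```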

(translator's note: from here on, the author does not discuss in detail the difference between the into_iter, iter and iter_mut methods that collections provide for creating iterators. The first moves the collection into the iterator; the second borrows it immutably, so iteration yields immutable references to the elements; the third borrows it mutably, which allows the elements of the collection to be changed during iteration.)
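The three methods can be compared side by side in a short sketch. The function names are hypothetical; only iter, iter_mut and into_iter come from the standard library:

```rust
// All function names here are hypothetical, chosen for illustration.

/// iter(): borrows the elements immutably and yields `&i32`.
fn sum_by_ref(v: &[i32]) -> i32 {
    v.iter().sum()
}

/// iter_mut(): borrows the elements mutably and yields `&mut i32`.
fn scale_in_place(v: &mut [i32], factor: i32) {
    for x in v.iter_mut() {
        *x *= factor;
    }
}

/// into_iter(): consumes the vector and yields owned `i32` values.
fn into_strings(v: Vec<i32>) -> Vec<String> {
    v.into_iter().map(|x| x.to_string()).collect()
}

fn main() {
    let mut v = vec![1, 2, 3];
    println!("{}", sum_by_ref(&v));
    scale_in_place(&mut v, 10);
    println!("{:?}", v);
    let strings = into_strings(v); // `v` is moved here
    println!("{:?}", strings);
}
```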

This behavior protects us from a very common mistake called iterator invalidation, which is probably fairly well known to C++ programmers. Because the collection is essentially “converted” into an iterator, the following become impossible:

Existence of more than one iterator pointing to the collection
Changing a collection while one of the iterators is in scope

Do all these “moves” and “borrows” sound familiar to you? Earlier, I noted that when iterating over a vector in a for loop, we essentially move it “inside the loop”.

As you may have guessed, when iterating over a vector we actually call IntoIterator::into_iter on that vector and get an iterator back. By calling next on each iteration, we keep looping until we get None.

So the loop:

for x in v {
    // loop body
}

is essentially just syntactic sugar for the following expression:

let mut iter = IntoIterator::into_iter(v);
loop {
    match iter.next() {
        Some(x) => {
            // loop body
        },
        None => break,
    }
}

You can clearly see that v cannot be used not only after the loop has ended, but even before it begins. This is because we moved the vector into the iterator iter using the into_iter method of the trait... IntoIterator!

Simple, right? :)

The for loop is syntactic sugar for calling IntoIterator::into_iter followed by repeatedly calling Iterator::next.
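A small runnable sketch can confirm that the two forms behave the same. The function names with_for and desugared are hypothetical, and the hand-written expansion only approximates what the compiler generates; the real expansion differs in some details:

```rust
/// Doubles every element using an ordinary `for` loop.
fn with_for(v: Vec<i32>) -> Vec<i32> {
    let mut out = Vec::new();
    for x in v {
        out.push(x * 2);
    }
    out
}

/// Does the same with the loop written out by hand.
fn desugared(v: Vec<i32>) -> Vec<i32> {
    let mut out = Vec::new();
    let mut iter = IntoIterator::into_iter(v); // `v` is moved here
    loop {
        match iter.next() {
            Some(x) => out.push(x * 2),
            None => break,
        }
    }
    out
}

fn main() {
    let v = vec![1, 2, 3];
    assert_eq!(with_for(v.clone()), desugared(v));
}
```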

Ampersand

However, this behavior is not always desirable. But we already know a way to avoid it. Instead of iterating over the vector itself, we use a reference to it:

for x in &v {
    // loop body
}

(translator's note: equivalent to for x in v.iter() {...})

Everything we talked about above still applies here, including the desugaring. The into_iter method is called as before, with one difference: instead of the vector, it receives a reference to it:

// (simplified)
impl IntoIterator for &Vec<T> {
    fn into_iter(self) -> Iterator<Item=&T> { ... }
}

Thus, the resulting iterator will produce references to the elements of the vector (&T), not the elements themselves (T). And because self above is also a reference, the collection is not moved anywhere, so we can safely access it after the loop ends.

The same goes for mutable references:

for x in &mut v {
    // loop body
}

(translator's note: equivalent to for x in v.iter_mut() {...})

The only difference is that into_iter is now called for &mut Vec<T>. Accordingly, a result of the form Iterator<Item=&mut T> allows us to modify the elements of the collection.

To support these two cases, we did not need any additional compiler support, as everything is already covered by the same trait.

Expanding the syntactic sugar of a loop through IntoIterator works the same way both for collection objects themselves and for mutable and immutable references to them.
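As a brief sketch of both borrowing forms in one place (the function double_then_sum is hypothetical), note that the vector is still owned by the function after both loops and can be returned:

```rust
/// Doubles each element in place, then sums the result.
fn double_then_sum(mut v: Vec<i32>) -> (Vec<i32>, i32) {
    // Mutable borrow: `x` has type `&mut i32`.
    for x in &mut v {
        *x *= 2;
    }

    // Immutable borrow: `x` has type `&i32`.
    let mut sum = 0;
    for x in &v {
        sum += *x;
    }

    // `v` was never moved, so it can still be returned.
    (v, sum)
}

fn main() {
    let (v, sum) = double_then_sum(vec![1, 2, 3]);
    println!("{:?} {}", v, sum);
}
```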

What about the iter method?

So far, we have only talked about for loops, which represent a very imperative style of computation.

If you are more inclined to functional programming, you may have seen and written various constructions combining methods like these:

let doubled_odds: Vec<_> = numbers.iter()
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

Methods like map and filter are called iterator adapters, and they are all defined on the Iterator trait. They are not only numerous and expressive, but can also be supplied by third-party crates.

In order to take advantage of the adapters, we first need to get an iterator. We know that loops usually get it through into_iter, so in principle we can use the same approach here:

let doubled_odds: Vec<_> = IntoIterator::into_iter(&numbers)
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

To improve readability and reduce the amount of code, collections usually provide an iter method, which is shorthand for the expression above. This is the method you will usually see in expressions like the one above.

v.iter() is nothing more than shorthand for IntoIterator::into_iter(&v).
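The equivalence can be checked with a short sketch. Both function names are hypothetical; the adapter chain is the one from the snippets above, wrapped so that it can be run:

```rust
/// Uses the `iter` shorthand.
fn doubled_odds_iter(numbers: &Vec<i32>) -> Vec<i32> {
    numbers
        .iter()
        .filter(|&x| x % 2 != 0)
        .map(|&x| x * 2)
        .collect()
}

/// Calls the trait method explicitly on a reference to the vector.
fn doubled_odds_into_iter(numbers: &Vec<i32>) -> Vec<i32> {
    IntoIterator::into_iter(numbers)
        .filter(|&x| x % 2 != 0)
        .map(|&x| x * 2)
        .collect()
}

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];
    assert_eq!(doubled_odds_iter(&numbers), doubled_odds_into_iter(&numbers));
    println!("{:?}", doubled_odds_iter(&numbers));
}
```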

How about both?

One last thing worth noting: Rust does not dictate whether to use iterators or loops for working with collections. With optimizations turned on in release mode, both approaches should compile to equally efficient machine code, with closures inlined and loops unrolled where necessary.

Thus, the choice of approach is nothing more than a matter of style and habit. Sometimes the right solution is to mix both approaches, which Rust lets you do without any trouble:

fn print_prime_numbers_upto(n: i32) {
    println!("Prime numbers lower than {}:", n);
    for x in (2..n).filter(|&i| is_prime(i)) {
        println!("{}", x);
    }
}

As before, this is possible thanks to the desugaring through the IntoIterator trait. In this case, Rust simply converts the iterator into itself.

Iterators themselves are also “iterable objects”, thanks to a “transparent” implementation of the IntoIterator trait.
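The article does not define is_prime, so the sketch below supplies a naive trial-division version as an assumption, and uses a hypothetical primes_below function that returns the values instead of printing them. It shows a for loop consuming an adapter chain directly:

```rust
/// Hypothetical helper: naive trial-division primality test.
fn is_prime(n: i32) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    // `d <= n / d` is the same check as `d * d <= n`, without overflow.
    while d <= n / d {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Collects the primes in `2..n` by mixing a loop with an adapter.
fn primes_below(n: i32) -> Vec<i32> {
    let mut primes = Vec::new();
    // `(2..n).filter(...)` is already an iterator; calling `into_iter`
    // on it returns the iterator itself, so `for` accepts it directly.
    for x in (2..n).filter(|&i| is_prime(i)) {
        primes.push(x);
    }
    primes
}

fn main() {
    println!("{:?}", primes_below(20));
}
```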

Finally

If you want to know more about iterators and loops, the official documentation is the best source. Although mastering all the iterator adapters is by no means necessary to write effective Rust code, a careful look at the documentation for the collect method and the associated FromIterator trait will very likely prove useful.
