Intervals: the upcoming evolution of C ++

From the sandbox

The C ++ 20 standard will appear soon, in which, most likely, they will add the concept of intervals ( ranges ), but few know what they are and what they eat. I couldn’t find the Russian-speaking sources available to a wide audience about this beast, so in this article I would like to tell you more about it based on the lecture by Arno Schödl “From Iterators to Ranges: The Upcoming Evolution Of the STL” from Meeting C ++ 2015- th year. I will try to make this article as clear as possible for those who first come across this concept, and at the same time I will talk about all sorts of chips like interval adapters for those who are already familiar with this concept and want to learn more.

Libraries with ranges

At the time of this writing, there are three main libraries that implement the intervals:

The first library, in fact, is the progenitor of this concept (which is not surprising, after all, what is missing in the Boost library collection :)). The second - a library Eric nibbler ( by Eric Niebler ), about it will be discussed later. Finally, the latest library, as you might guess, was written by the think-cell company, which can be said to have developed and improved Boost.Range.

Why are intervals our future?

For those who are not yet familiar with the concept of interval, we define this non-trivial concept as something that has a beginning and an end (a pair of iterators ).

Let's now consider the following problem: there is a vector, it is necessary to remove from it all repeating elements. Under the current standard, we would solve it like this:

std::vector<T> vec=...;
std::sort( vec.begin(), vec.end() );
vec.erase( std::unique( vec.begin(), vec.end() ), vec.end() );

In this case, we specify the name of the vector as much as 6 times! However, using the concept of intervals (by combining iterators at the beginning and end of a vector into one object), you can write many times simpler by specifying the desired vector only once :

tc::unique_inplace( tc::sort(vec) );

What is the interval at the moment within the current standard?

In the C ++ 11 standard, a range-based for loop and universal access to the beginning / end of containers was added, and in the last C ++ 17 standard, nothing new was added at intervals.

for ( int& i : <range_expression> ) {
 ...
}

std::begin/end(<range_expression>)

Future intervals

Let us now dwell on the previously mentioned library Range V3. Erik Nibler, her creator, created the Range's Technical Specification as his home project , modifying the algorithm library to support the intervals. It looks like this:

namespace ranges {
    template< typename Rng, typename What > 
    decltype(auto) find( Rng && rng, What const& what ) {
        returnstd::find( 
            ranges::begin(rng),
            ranges::end(rng),
            what 
        );
    }
}

On his website there is some kind of preview of what he wants to standardize, this is the Range V3 .

What, after all, can range be considered?

First of all, containers (vector, string, list, etc.), because they have a beginning and an end. It is clear that containers own their elements, that is, when we turn to containers, we turn to all their elements. Similarly, when copying and declaring permanent (deep copying and constancy). Secondarily , views can also be considered as intervals. Views is just a pair of iterators pointing to the beginning and the end, respectively. Here is their simplest implementation:

template<typename It> 
structiterator_range {
    It m_itBegin;
    It m_itEnd;
    It begin()const{
        return m_itBegin; 
    }
    It end()const{ 
        return m_itEnd;
    } 
};

Views, in turn, only refer to the elements, so copying and constancy are lazy (this does not affect the elements).

Interval adapters

At this, the inventors of the intervals did not dwell, because otherwise this concept would be quite useless. Therefore we introduced a concept such as adapters interval ( range adaptors ).

Transformer adapter

Consider the following problem: Let the vector int 's be given , in which you need to find the first element equal to 4:

std::vector<int> v;
auto it = ranges::find(v, 4);

Now let's imagine that the type of the vector is not int, but some kind of complex self-written structure, but in which there is an int, and the task is still the same:

structA {int id;
    double data; 
};
std::vector<A> v={...};
auto it = ranges::find_if(
    v,
    [](A const& a) { return a.id == 4; }
);

It is clear that these two codes are similar in semantics, however, they differ significantly in syntax, because in the latter case we had to manually write a function that runs through the int field . But if you use a transforming adapter ( transform adapter ), then everything looks much more succinctly:

structA {int id;
    double data; 
};
std::vector<A> v={...}; 
auto it = ranges::find(
    tc::transform(v, std::mem_fn(&A::id)), 
    4);

In fact, the transforming adapter “transforms” our structure by creating a wrapper class around the int field. It is clear that the pointer points to the id field , but if we wanted it to point to the whole structure, then we need to add .base () at the end . This command encapsulates a field, which is why the pointer can run through the entire structure:

auto it = ranges::find(
    tc::transform(v, std::mem_fn(&A::id)), 
    4).base();

Here is an example implementation of a transforming adapter (it consists of iterators, each of which has its own functor):

template<typename Base, typename Func> 
structtransform_range {structiterator {private:
        Func m_func; // в каждом итератореdecltype( tc::begin(std::declval<Base&>()) ) m_it; 
    public:
        decltype(auto) operator*() const { 
            return m_func(*m_it);
        }
        decltype(auto) base() const {
            return (m_it); 
        }
        ... 
    };
};

Filter adapter

And if in the last task we needed to find not the first such element, but to “filter” the entire field of int 's for the presence of such elements? In this case, we would use the filter adapter ( filter adaptor ):

tc::filter( v,
    [](A const& a) { return4 == a.id; } 
);

Note that the filter is performed lazily during iterations.

But its naive implementation (something like this is implemented in Boost.Range):

template<typename Base, typename Func> 
structfilter_range {structiterator {private:
        Func m_func; // функтор и ДВА итератораdecltype( ranges::begin(std::declval<Base&>()) ) m_it;
        decltype( ranges::begin(std::declval<Base&>()) ) m_itEnd;
    public:
        iterator& operator++() {
            ++m_it;
            while( m_it != m_itEnd && !static_cast<bool>(m_func(*m_it)) )
                ++m_it; 
            return *this;
        }
        ... 
    };
};

As we can see, it already requires two iterators instead of one, as it was in the transforming adapter. The second iterator is necessary in order not to accidentally overstep the bounds of the container during iterations.

Some optimizations

Well, what does the iterator from tc :: filter (tc :: filter (tc :: filter (...))) look like ?

Boost.Range

As part of the implementation above, it looks like this:

Not to look nervous!

m_func3 

m_it3

 m_func2 

 m_it2

 m_func1 

 m_it1; 

 m_itEnd1;

 m_itEnd2 

 m_func1

 m_it1;

 m_itEnd1; 

m_itEnd3

 m_func2 

 m_it2

 m_func1 

 m_it1; 

 m_itEnd1;

 m_itEnd2 

 m_func1

 m_it1; 

 m_itEnd1;

Obviously, it is terribly inefficient.

Range v3

Let's think about how to optimize this adapter. The idea of Eric Niebler was that we should put general information (a functor and a pointer to the end) into an adapter object, and then we can store a link to this adapter object and the desired iterator. Then, as part of this implementation, the triple filter would look like :

*m_rng

m_it

Tyk

m_rng3

m_it3

 m_rng2 

 m_it2

 m_rng1 

 m_it1

This is still not perfect, although at times faster than the previous implementation.

think-cell index concept

Now consider the solution of the company think-cell. They introduced the so-called concept of indexes ( index concept ) to solve this problem. An index is an iterator that performs all the same operations as a regular iterator, but does so by referring to intervals.

template<typename Base, typename Func>
structindex_range {
    ...
    using Index = ...;
    Index begin_index()const;
    Index end_index()const;
    voidincrement_index( Index& idx )const; 
    voiddecrement_index( Index& idx )const; 
    reference dereference( Index& idx )const;
    ...
};

We show how you can combine an index with a regular iterator.

It is clear that a regular iterator can also be considered an index. In the opposite direction, compatibility can be implemented for example:

template<typename IndexRng>
structiterator_for_index {
    IndexRng* m_rng;
    typename IndexRng::Index m_idx;
    iterator& operator++() {
        m_rng.increment_index(m_idx); 
        return *this;
    }
    ... 
};

Then the triple filter will be implemented super-efficiently:

template<typename Base, typename Func> 
structfilter_range {
    Func m_func; 
    Base& m_base;
    using Index = typename Base::Index;
    voidincrement_index( Index& idx )const{
        do {
            m_base.increment_index(idx);
        } while ( idx != m_base.end_index() 
            && !m_func(m_base.dereference_index(idx)) );
    }
};

template<typename IndexRng>
structiterator_for_index {
    IndexRng* m_rng;
    typename IndexRng::Index m_idx; 
    ...
};

As part of this implementation, the algorithm will work quickly regardless of the depth of the filter.

Intervals with lvalue and rvalue containers

Now let's see how the intervals work with lvalue and rvalue containers:

lvalue

Range V3 and think-cell behave the same with lvalue. Suppose we have this code:

auto rng = view::filter(vec, pred1);
bool b = ranges::any_of(rng, pred2);

Here we have a pre-declared vector that lies in memory (lvalue), and we need to create an interval and then somehow work with it. We create a view using view :: filter or tc :: filter and become happy, there are no errors, and we can use this view, for example, in any_of.

Range V3 and rvalue

However, if our vector had not yet been remembered (for example, if we only created it), and we would have the same task, then we would try to write and faced with an error:

auto rng = view::filter(create_vector(), pred1); // не скомпилируетсяbool b = ranges::any_of(rng, pred2);

Why did it arise? View will be a hanging link to rvalue due to the fact that we create a vector and directly put in the filter, that is, the filter will have an rvalue link, which will then point to something unknown when the compiler goes to the next line and an error occurs. In order to solve this problem, in Range V3 came up with an action :

auto rng = action::filter(create_vector(), pred1); // теперь скомпилируетсяbool b = ranges::any_of(rng, pred2);

Action does everything at once, that is, it simply takes a vector, filters it by predicate and puts it in the interval. However, the disadvantage is that it is no longer lazy, and think-cell tried to correct this minus.

think-cell and rvalue

Think-cell made it so that instead of view a container is created there:

auto rng = tc::filter(creates_vector(), pred1); 
bool b = ranges::any_of(rng, pred2);

As a result, we do not encounter a similar error, because in their implementation the filter collects the rvalue container instead of the link, so this happens lazily. In Range V3, they didn’t want to do that, because they were afraid that there would be errors due to the fact that the filter behaves either as a view or as a container, but the think-cell is convinced that programmers understand how the filter behaves, and most of the errors arise precisely because of this "laziness."

Generator intervals

Let us generalize the concept of intervals. In fact, there are intervals without iterators. They are called generator ranges . Suppose we have a GUI widget (interface element), and we call the move widget. We have a window that asks to move its widget, we also have a button in the list box , and another window should also flip through its widgets, that is, we call traverse_widgets , which connects the elements to the functor ( you can say, there is an enumeration function where you connect the functor, and the function lists in this functor all the elements that it has ).

template<typename Func>
voidtraverse_widgets( Func func ){
    if (window1) {
        window1->traverse_widgets(std::ref(func));
    }
    func(button1);
    func(listbox1);
    if (window2) {
        window2->traverse_widgets(std::ref(func));
    }
}

This is somewhat similar to the interval of widgets, but there are no iterators here. Writing them directly would be inefficient and, above all, very difficult. In this case, we can say that such constructions are also considered as intervals. Then for such cases the use of useful interval methods takes place, such as any_of :

mouse_hit_any_widget=tc::any_of(
    [] (auto func) { traverse_widgets(func); },
    [] (autoconst& widget) {
        return widget.mouse_hit();
    }
);

think-cell tries to implement the methods so that they have the same interface for all types of intervals:

namespace tc {
    template< typename Rng >
    boolany_of( Rng const& rng ){
        bool bResult = false;
        tc::enumerate( rng, [&](bool_context b) {
            bResult = bResult || b;
        } );
        return bResult;
    }
}

Using tc :: enumerate , the difference between the intervals is hidden, since such an implementation adheres to the concept of internal iteration (what the concepts of external and internal iteration are described in more detail in the lecture), however, such an implementation has its drawbacks, namely std :: any_of stops as soon as true is encountered . They try to solve this problem, for example, by adding exceptions (the so-called interrupted generator intervals ).

Conclusion

I hate the range-based for loop, because it motivates people to write it wherever it is needed and where it is not needed, because of which the brevity of the code often worsens, for example, people write this:

bool b = false; 
for (int n : rng) {
    if ( is_prime(n) ) {
        b = true;
        break;
    }
}

instead of this:

bool b = tc::any_of( rng, is_prime );

Tags: