Reflection in C++14

This article is a transcript (with minor corrections) of the talk “A bit of magic for C++14” by Anton "antoshkka" Polukhin.

I was recently tinkering with C++ and accidentally discovered a couple of new metaprogramming techniques that make reflection possible in C++14. Let's start with a couple of motivating examples. Suppose you have some POD structure with a few fields in it:

struct complicated_struct {
    int i;
    short s;
    double d;
    unsigned u;
};

The number of fields and their names do not matter. What matters is that with this structure we can write the following piece of code:

#include <iostream>
#include "magic_get.hpp"
struct complicated_struct { /* … */ };
int main() {
    using namespace pod_ops;
    complicated_struct s {1, 2, 3.0, 4};
    std::cout << "s == " << s << std::endl; // Compile time error?
}

This is the main function: in it we create a variable of our structure type, initialize it with aggregate initialization, and then try to write this variable to std::cout. At this point we should, in theory, get a compilation error: we have not defined a stream output operator for our structure, so the compiler does not know how to print it. However, the code compiles and prints the contents of the structure:

antoshkka@home:~$ ./test
s == {1, 2, 3.0, 4}


We can go back to the code and change the names of the fields, the name of the structure, the name of the variable, anything we like: the code will keep working and will correctly print the contents of the structure. Let's see how it works.

The operator is defined in the header file magic_get.hpp; it works with any data type:

template <class Char, class Traits, class T>
std::basic_ostream<Char, Traits>&
    operator<<(std::basic_ostream<Char, Traits>& out, const T& value)
{
    flat_write(out, value);
    return out;
}

This operator calls the flat_write function. The flat_write function prints the curly braces and has one dense line in the middle:

template <class Char, class Traits, class T>
void flat_write(std::basic_ostream<Char, Traits>& out, const T& val) {
    out << '{';
    detail::flat_print_impl<0, flat_tuple_size<T>::value >::print(out, val);
    out << '}';
}

In the middle of that dense line is flat_tuple_size<T>::value. Note that the standard library has std::tuple_size, which gives the number of elements in a tuple. Here, however, T is not a tuple, not std::tuple, but a user type. flat_tuple_size gives the number of fields in the user type.

Let's look further at what the print function does:

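template <std::size_t FieldIndex, std::size_t FieldsCount>
struct flat_print_impl {
    template <class Stream, class T>
    static void print (Stream& out, const T& value) {
        if (!!FieldIndex) out << ", ";
        out << flat_get<FieldIndex>(value);         // std::get<FieldIndex>(value)
        flat_print_impl<FieldIndex + 1, FieldsCount>::print(out, value);
    }
};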

The print function prints a comma or not, depending on the index of the field we are working with. Then comes the call to the flat_get function, with a comment saying that it works like std::get, that is, it returns a field of the structure by index. A natural question arises: how does it work?

It works as follows: the stream output operator determines the number of fields in your structure, iterates over the fields by index, and prints each field by its index. That is how we get the output you saw at the beginning of the article.
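To make the index-driven loop concrete, here is a minimal sketch of the same recursion applied to an ordinary std::tuple, where std::get and the element count already exist. The names print_impl and to_braced_string are hypothetical and are not part of magic_get; the terminating specialization is an addition, since the listing above omits it.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>

// Prints element FieldIndex, then recurses to the next index.
template <std::size_t FieldIndex, std::size_t FieldsCount>
struct print_impl {
    template <class Stream, class Tuple>
    static void print(Stream& out, const Tuple& value) {
        if (FieldIndex != 0) out << ", ";      // no comma before the first element
        out << std::get<FieldIndex>(value);
        print_impl<FieldIndex + 1, FieldsCount>::print(out, value);
    }
};

// The recursion stops when the index reaches the number of elements.
template <std::size_t FieldsCount>
struct print_impl<FieldsCount, FieldsCount> {
    template <class Stream, class Tuple>
    static void print(Stream&, const Tuple&) {}
};

// Hypothetical helper: formats a tuple as "{a, b, c}".
template <class... Ts>
std::string to_braced_string(const std::tuple<Ts...>& t) {
    std::ostringstream out;
    out << '{';
    print_impl<0, sizeof...(Ts)>::print(out, t);
    out << '}';
    return out.str();
}
```

Calling to_braced_string(std::make_tuple(1, 2, 3.5, 4u)) walks indices 0 to 3. The rest of the article is about providing the two missing pieces, the field count and get-by-index, for plain structures.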

Next, let's see how to write flat_get and flat_tuple_size so that they work with user structures: how to determine the number of fields in a structure and access the structure field by field:

/// Returns const reference to a field with index `I`
/// Example usage: flat_get<0>(my_structure());
template <std::size_t I, class T>
decltype(auto) flat_get(const T& val) noexcept;
/// `flat_tuple_size` has a member `value` that contains fields count
/// Example usage: std::array<int, flat_tuple_size<my_structure>::value > a;
template <class T>
using flat_tuple_size = /* ... */;

Let's start with the simple part: counting the number of fields in a structure. We have a POD structure T:

static_assert(std::is_pod<T>::value, "");

For this structure we can write the expression:

T { args... }

This is aggregate initialization of the structure. The expression compiles if the number of arguments is less than or equal to the number of fields in the structure and each argument is convertible to the type of the corresponding field.
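The rule can be checked directly at compile time. Below is a minimal C++14 sketch; the trait is_brace_constructible and its helpers are hypothetical names invented for this illustration, not part of the library.

```cpp
#include <type_traits>
#include <utility>

struct point { int x; double y; };

// C++14 stand-in for std::void_t.
template <class...> struct make_void { using type = void; };

// Primary template: T{ args... } is not a valid expression.
template <class T, class Enable, class... Args>
struct brace_init_impl : std::false_type {};

// Chosen only when the expression T{ args... } compiles.
template <class T, class... Args>
struct brace_init_impl<
    T,
    typename make_void<decltype(T{ std::declval<Args>()... })>::type,
    Args...> : std::true_type {};

template <class T, class... Args>
using is_brace_constructible = brace_init_impl<T, void, Args...>;

// Fewer arguments than fields, or exactly as many: fine.
static_assert(is_brace_constructible<point>::value, "");
static_assert(is_brace_constructible<point, int>::value, "");
static_assert(is_brace_constructible<point, int, double>::value, "");
// More arguments than fields: does not compile, so the trait is false.
static_assert(!is_brace_constructible<point, int, double, int>::value, "");
// Right count, but the second argument does not convert to double.
static_assert(!is_brace_constructible<point, int, const char*>::value, "");
```

The last two assertions are the two failure modes the counting trick has to deal with: too many arguments, and arguments of the wrong type.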

From this abracadabra we will try to get the number of fields in the structure T. How? We take our structure T and try to initialize it with some huge number of arguments. This does not compile. We drop one of the arguments and try again. This does not compile either, but sooner or later we reach a number of arguments equal to the number of fields in the structure, and then it compiles. At that moment we just need to remember the number of arguments, and we are done: we have the number of fields in the structure. That is the basic idea. Let's go into the details.

How many arguments should we start with? If our structure T contains only char, unsigned char and other types of size 1 byte, then the number of fields in T equals the size of the structure. If it has other fields, for example an int or a pointer, then the number of fields is less than the size of the structure. So sizeof(T) is an upper bound on the number of fields.

This gives us the number of arguments to start aggregate initialization with: we initialize our structure T with sizeof(T) arguments. If that fails to compile, we drop one argument and try again; if it compiles, we have found the number of fields in the structure. One problem remains: even if we guess the number of arguments correctly, the code still will not compile, because we also need to know the exact type of each field.

Let's use a workaround. We write a structure with an implicit conversion operator to any type:

struct ubiq {
    template <class Type>
    constexpr operator Type&() const;
};
int i = ubiq{};
double d = ubiq{};
char c = ubiq{};

This means that instances of this structure convert to any type: int, double, std::string, std::vector, any user type, anything at all.

The whole recipe: we take the structure T and try to aggregate-initialize it with sizeof(T) arguments, each of which is an instance of our ubiq structure. During aggregate initialization each ubiq instance converts to the type of the corresponding field of T, so all we have to find is the number of arguments. If it does not compile with that many arguments, we drop one and try again. If it compiles, we count the arguments, and that is the result.

Now a little code. We change the ubiq structure slightly: we add a template parameter to make it easier to use with variadic templates. We also need std::make_index_sequence (a C++14 facility that expands to std::index_sequence, a compile-time sequence of the indices 0, 1, ..., N-1).

Ready to see the scary code? Go.

Only two functions:

// #1
template <class T, std::size_t I0, std::size_t... I>
constexpr auto detect_fields_count(std::size_t& out, std::index_sequence<I0, I...>)
    -> decltype( T{ ubiq_constructor<I0>{}, ubiq_constructor<I>{}... } )
{ out = sizeof...(I) + 1;      /*...*/ }
// #2
template <class T, std::size_t... I>
constexpr void detect_fields_count(std::size_t& out, std::index_sequence<I...>) {
    detect_fields_count<T>(out, std::make_index_sequence<sizeof...(I) - 1>{});
}

Both functions are called detect_fields_count. The first one is more specialized, so when the compiler sees a call to detect_fields_count, it considers the first function the better match and tries to use it first.

This function has a trailing return type: its return type is the decltype of T with aggregate initialization. If we guessed the number of arguments, this expression compiles, we get into the body of the function, and we write the number of arguments into the output variable out. If it does not work out (we did not guess the number of arguments), the compiler treats this not as an error but as a substitution failure, and looks for another function with the same name that is less specialized. It takes function #2. Function #2 drops one of the indices (that is, reduces the number of arguments by one) and calls detect_fields_count again. Again, either the first or the second function is called. This way we go through the argument counts and find the number of fields in the structure. That was the easy part.
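Putting the pieces together, the counting step can be sketched as a complete C++14 program fragment. The two detect_fields_count overloads follow the listing above; the body of ubiq_constructor, the `return T{};` in overload #1 and the wrapper fields_count are assumptions filled in for this sketch, and the library's real code differs in details.

```cpp
#include <cstddef>
#include <utility>

// Converts to a reference to any type. Declared only: it is used
// exclusively inside decltype, so it never needs a definition.
template <std::size_t>
struct ubiq_constructor {
    template <class Type>
    constexpr operator Type&() const noexcept;
};

// #1: viable only if T can be aggregate-initialized with sizeof...(I) + 1 arguments.
template <class T, std::size_t I0, std::size_t... I>
constexpr auto detect_fields_count(std::size_t& out, std::index_sequence<I0, I...>)
    -> decltype( T{ ubiq_constructor<I0>{}, ubiq_constructor<I>{}... } )
{
    out = sizeof...(I) + 1;
    return T{};   // assumption: the return value is never used
}

// #2: fallback, retries with one argument fewer.
template <class T, std::size_t... I>
constexpr void detect_fields_count(std::size_t& out, std::index_sequence<I...>) {
    detect_fields_count<T>(out, std::make_index_sequence<sizeof...(I) - 1>{});
}

// Hypothetical wrapper: starts from the upper bound sizeof(T).
template <class T>
constexpr std::size_t fields_count() noexcept {
    std::size_t count = 0;
    detect_fields_count<T>(count, std::make_index_sequence<sizeof(T)>{});
    return count;
}

struct complicated_struct { int i; short s; double d; unsigned u; };
struct two_chars { char a; char b; };

static_assert(fields_count<complicated_struct>() == 4, "");
static_assert(fields_count<two_chars>() == 2, "");
```

This sketch only handles structures whose fields are fundamental types; empty structures, arrays and nested aggregates need extra care.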

Now the hard part: how do we get the type of a field inside the structure T?

We already have our expression with aggregate initialization of T, and we pass ubiq instances into it. For each ubiq instance an implicit conversion operator is called, and inside that operator we know the type of the field. All we need now is to somehow capture this information and carry it out to an outer scope where we can work with it, outside the aggregate initialization of T. Unfortunately, C++ has no mechanism for storing a type in a variable. More precisely, there are std::type_index and std::type_info, but they are useless at compile time: we cannot get the type back out of them later.

Let's try to get around this limitation. To do that, recall what a POD is (very roughly: the standardization committee likes to change the definition every three years).

A POD structure is, roughly, a structure all of whose fields have the same access specifier: public, private, or protected (we are only interested in public fields). All the fields of this structure are either other POD structures or fundamental types: pointers, int, std::nullptr_t. Let's forget about pointers for a couple of minutes; then it turns out that there are very few fundamental types, fewer than 32, which means we can assign an identifier (an integer) to each fundamental type. We can write this number into an output array, carry that array out of the implicit conversion operator, and convert the number back into a type outside. That is the simple idea.

On to the implementation. To do this, we change our ubiq structure:

template <std::size_t I>
struct ubiq_val {
    std::size_t* ref_;
    template <class Type>
    constexpr operator Type() const noexcept {
        ref_[I] = typeid_conversions::type_to_id(identity<Type>{});
        return Type{};
    }
};

It now holds a pointer to the output array. The array has a terrible name, ref_, but that is how it turned out. The implicit conversion operator has also changed: it now calls the type_to_id function, which converts the type to an identifier, and we write this identifier into the output array ref_. What remains is to generate a bunch of type_to_id functions. We will do this with a macro:

#define BOOST_MAGIC_GET_REGISTER_TYPE(Type, Index)              \
    constexpr std::size_t type_to_id(identity<Type>) noexcept { \
        return Index;                                           \
    }                                                           \
    constexpr Type id_to_type( size_t_<Index > ) noexcept {     \
        Type res{};                                             \
        return res;                                             \
    }                                                           \
    /**/

The macro generates a type_to_id function, which turns a type into an identifier, and an id_to_type function, which turns the identifier back into a type. The macro is not visible to the user: as soon as we have used it, we undefine it. We register the fundamental types (not all are listed here):

BOOST_MAGIC_GET_REGISTER_TYPE(unsigned char         , 1)
BOOST_MAGIC_GET_REGISTER_TYPE(unsigned short        , 2)
BOOST_MAGIC_GET_REGISTER_TYPE(unsigned int          , 3)
BOOST_MAGIC_GET_REGISTER_TYPE(unsigned long         , 4)
BOOST_MAGIC_GET_REGISTER_TYPE(unsigned long long    , 5)
BOOST_MAGIC_GET_REGISTER_TYPE(signed char           , 6)
BOOST_MAGIC_GET_REGISTER_TYPE(short                 , 7)
BOOST_MAGIC_GET_REGISTER_TYPE(int                   , 8)
BOOST_MAGIC_GET_REGISTER_TYPE(long                  , 9)
BOOST_MAGIC_GET_REGISTER_TYPE(long long             , 10)
...
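A minimal self-contained sketch of this registry is shown below. The definitions of identity and size_t_ are assumptions made for the sketch (the article only shows how they are used), and the macro name REGISTER_TYPE is shortened for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Assumed helpers: an empty tag carrying a type, and an integral constant alias.
template <class T> struct identity { using type = T; };
template <std::size_t N>
using size_t_ = std::integral_constant<std::size_t, N>;

#define REGISTER_TYPE(Type, Index)                                \
    constexpr std::size_t type_to_id(identity<Type>) noexcept {   \
        return Index;                                             \
    }                                                             \
    constexpr Type id_to_type(size_t_<Index>) noexcept {          \
        Type res{};                                               \
        return res;                                               \
    }

REGISTER_TYPE(unsigned char, 1)
REGISTER_TYPE(short,         7)
REGISTER_TYPE(int,           8)
#undef REGISTER_TYPE

// Type -> identifier is an ordinary constexpr call.
static_assert(type_to_id(identity<short>{}) == 7, "");

// Identifier -> type goes through overload resolution plus decltype.
static_assert(std::is_same<decltype(id_to_type(size_t_<8>{})), int>::value, "");
static_assert(std::is_same<
    decltype(id_to_type(size_t_<type_to_id(identity<unsigned char>{})>{})),
    unsigned char>::value, "");
```

The `Type res{}; return res;` form matters: `return Type{};` would not compile after macro expansion for multi-word types such as `unsigned char`.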

We do not use zero; I will explain why later. All the fundamental types are now registered. Next we write a function that turns the type T into an array of identifiers of the field types inside T. The most interesting part is in the body of this function:

template <class T, std::size_t N, std::size_t... I>
constexpr auto type_to_array_of_type_ids(std::size_t* types) noexcept
    -> decltype(T{ ubiq_constructor<I>{}... })
{
    T tmp{ ubiq_val< I >{types}... };
    return tmp;
}

Here the temporary variable is aggregate-initialized, and we pass ubiq instances to it. This time they hold a pointer to the output array: types is the output array into which the identifiers of the field types are written. After this line (after the temporary variable is initialized), the types array holds the type identifier of each field. The type_to_array_of_type_ids function is constexpr, so all of this can be used at compile time. Beauty! Now we have to turn the identifiers back into types. This is done like this:

template <class T, std::size_t... I>
constexpr auto as_tuple_impl(std::index_sequence<I...>) noexcept {
    constexpr auto a = array_of_type_ids<T>();              // #0
    return std::tuple<                                      // #3
        decltype(typeid_conversions::id_to_type(            // #2
            size_t_<a[I]>{}                                 // #1
        ))...
    >{};
}

Line #0: here we get the array of identifiers. The type of the variable a is something similar to std::array, but heavily reworked so that it can be used in constexpr expressions (because we are in C++14 and not C++17, where most of the constexpr problems of std::array are fixed).

In line #1 we create an integral constant from an element of the array. size_t_ is an alias (a using declaration) for std::integral_constant whose first parameter is std::size_t and whose second parameter is our a[I]. In line #2 we convert the identifier back into a type, and in line #3 we create a std::tuple whose element types exactly match the field types of the structure T that we looked inside. Now we can do something rather crude: for example, reinterpret_cast the user structure to the tuple, and then work with the user structure as with a tuple. It is a little ugly, though: reinterpret_cast.

A warning: do not try to copy and run this, because the code is somewhat simplified. For example, std::tuple does not specify the order in which its elements are stored, constructed and destroyed: some implementations initialize the elements from the last to the first and store them in reverse order, which is why std::tuple does not work here. You have to write your own tuple.
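For readers who want to see the two steps work end to end, here is a reduced C++14 sketch that does compile. It only recovers the tuple type and does not reinterpret_cast anything. The helper id_array, the three-type registry and the explicit field count passed as an index sequence are all assumptions made to keep the sketch short.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template <class T> struct identity {};
template <std::size_t N>
using size_t_ = std::integral_constant<std::size_t, N>;

// Tiny registry; the identifiers follow the table in the article.
constexpr std::size_t type_to_id(identity<unsigned>) noexcept { return 3; }
constexpr std::size_t type_to_id(identity<short>) noexcept    { return 7; }
constexpr std::size_t type_to_id(identity<int>) noexcept      { return 8; }
constexpr unsigned id_to_type(size_t_<3>) noexcept { return 0; }
constexpr short    id_to_type(size_t_<7>) noexcept { return 0; }
constexpr int      id_to_type(size_t_<8>) noexcept { return 0; }

// Records the identifier of whatever field type it is converted to.
template <std::size_t I>
struct ubiq_val {
    std::size_t* ref_;
    template <class Type>
    constexpr operator Type() const noexcept {
        ref_[I] = type_to_id(identity<Type>{});
        return Type{};
    }
};

// Hypothetical constexpr-friendly array for C++14.
template <std::size_t N>
struct id_array { std::size_t data[N]; };

// Step 1: field types -> array of identifiers.
template <class T, std::size_t... I>
constexpr id_array<sizeof...(I)> array_of_type_ids(std::index_sequence<I...>) noexcept {
    id_array<sizeof...(I)> ids{};
    T tmp{ ubiq_val<I>{ids.data}... };   // each conversion writes one identifier
    (void)tmp;
    return ids;
}

// Step 2: array of identifiers -> std::tuple of the field types.
template <class T, std::size_t... I>
constexpr auto as_tuple_impl(std::index_sequence<I...>) noexcept {
    constexpr auto a = array_of_type_ids<T>(std::index_sequence<I...>{});
    return std::tuple<decltype(id_to_type(size_t_<a.data[I]>{}))...>{};
}

struct sample { int i; short s; unsigned u; };

static_assert(array_of_type_ids<sample>(std::make_index_sequence<3>{}).data[1] == 7, "");
static_assert(std::is_same<
    decltype(as_tuple_impl<sample>(std::make_index_sequence<3>{})),
    std::tuple<int, short, unsigned>>::value, "");
```

In a full implementation the index sequence would come from the field count computed earlier rather than being written by hand.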

Let's move on. What do we do with pointers: pointers to const, pointers to pointers to int, and so on?

We have the type_to_id function. It returns std::size_t, and most of the bits of that std::size_t are unused: we only needed enough of them to number fewer than 32 fundamental types. So the remaining bits can be used to encode information about pointers. For example, if our user structure has a field of type unsigned char, then in binary its identifier looks like this. The least significant bits hold the unsigned char identifier, which is one, as we assigned it in the macro:

unsigned char c0; // 0b00000000 00000000 00000000 00000001

If we have a pointer to unsigned char, the most significant bits now record that it is a pointer:

unsigned char* c1; // 0b00100000 00000000 00000000 00000001

If we have a pointer to const, the most significant bits record that it is a pointer to const:

const unsigned char* c2; // 0b01000000 00000000 00000000 00000001

If we add one more level of indirection (another pointer), the next most significant bits change to record the extra pointer:

const unsigned char** c3; // 0b01000100 00000000 00000000 00000001

If we change the underlying type, the most significant bits stay the same and the least significant bits now hold the identifier seven, which means that we are dealing with short:

const short** s0; // 0b01000100 00000000 00000000 00000111

Now we add the functions that convert a type to an identifier (and set these bits accordingly):

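template <class Type>
constexpr std::size_t type_to_id(identity<Type*>);
template <class Type>
constexpr std::size_t type_to_id(identity<const Type*>);
template <class Type>
constexpr std::size_t type_to_id(identity<const volatile Type*>);
template <class Type>
constexpr std::size_t type_to_id(identity<volatile Type*>);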

And add the inverse functions that convert the identifier back to the type:

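template <std::size_t Index> constexpr auto id_to_type(size_t_<Index >,
if_extension<Index, native_const_ptr_type> = 0) noexcept;
template <std::size_t Index> constexpr auto id_to_type(size_t_<Index >,
if_extension<Index, native_ptr_type> = 0) noexcept;
template <std::size_t Index> constexpr auto id_to_type(size_t_<Index >,
if_extension<Index, native_const_volatile_ptr_type> = 0) noexcept;
template <std::size_t Index> constexpr auto id_to_type(size_t_<Index >,
if_extension<Index, native_volatile_ptr_type> = 0) noexcept;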

Here if_extension is std::enable_if wrapped in aliases and a lot of magic. The magic is that, depending on the identifier, it allows only one of the listed functions to be called.
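The idea of spending spare bits on pointer information can be tried out with a much simpler layout than the one shown above. In the sketch below, which is an assumption made for illustration and not the library's encoding, the two lowest bits of an identifier are a tag (0 for a fundamental type, 1 for pointer, 2 for pointer to const) and the remaining bits hold the identifier of the pointee. The class template id_to_type is likewise a hypothetical replacement for the overloaded functions.

```cpp
#include <cstddef>
#include <type_traits>

template <class T> struct identity {};

// Fundamental types: tag 0 in the low two bits, index in the rest.
constexpr std::size_t type_to_id(identity<unsigned char>) noexcept { return 1u << 2; }
constexpr std::size_t type_to_id(identity<short>) noexcept         { return 7u << 2; }

template <class Type>
constexpr std::size_t type_to_id(identity<Type*>) noexcept;
template <class Type>
constexpr std::size_t type_to_id(identity<const Type*>) noexcept;

// Pointer: encode the pointee first, then append tag 1.
template <class Type>
constexpr std::size_t type_to_id(identity<Type*>) noexcept {
    return (type_to_id(identity<Type>{}) << 2) | 1u;
}
// Pointer to const: more specialized overload, appends tag 2.
template <class Type>
constexpr std::size_t type_to_id(identity<const Type*>) noexcept {
    return (type_to_id(identity<Type>{}) << 2) | 2u;
}

// Inverse mapping, selected by the tag bits with std::enable_if.
template <std::size_t Id, class Enable = void>
struct id_to_type;
template <> struct id_to_type<(1u << 2)> { using type = unsigned char; };
template <> struct id_to_type<(7u << 2)> { using type = short; };

template <std::size_t Id>
struct id_to_type<Id, std::enable_if_t<(Id & 3u) == 1u>> {
    using type = typename id_to_type<(Id >> 2)>::type*;
};
template <std::size_t Id>
struct id_to_type<Id, std::enable_if_t<(Id & 3u) == 2u>> {
    using type = const typename id_to_type<(Id >> 2)>::type*;
};

// unsigned char* -> ((1 << 2) << 2) | 1 == 17
static_assert(type_to_id(identity<unsigned char*>{}) == 17, "");
// Round trip through two levels of indirection.
static_assert(std::is_same<
    id_to_type<type_to_id(identity<const short**>{})>::type,
    const short**>::value, "");
```

As in the article, only one decoding branch is enabled for any given identifier, so the lookup is unambiguous.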

What to do with enums, I do not know. The only thing I could come up with was to use std::underlying_type. That is, we lose the information about which enum it is: we cannot register all user enums in our list of fundamental types, that is simply impossible. Instead, we only encode how the enum is stored. If it is stored as int, we save it as int; if the user wrote enum class : char, we get char and encode only char. The information about the enum type is lost.
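A minimal sketch of that fallback, assuming a registry that knows only unsigned char and int. The enable_if-constrained overload is written for this illustration and is not copied from the library.

```cpp
#include <cstddef>
#include <type_traits>

template <class T> struct identity {};

constexpr std::size_t type_to_id(identity<unsigned char>) noexcept { return 1; }
constexpr std::size_t type_to_id(identity<int>) noexcept           { return 8; }

// Enums are not registered; they are encoded as their underlying type.
template <class Type>
constexpr std::size_t type_to_id(
    identity<Type>,
    typename std::enable_if<std::is_enum<Type>::value>::type* = nullptr) noexcept
{
    return type_to_id(identity<typename std::underlying_type<Type>::type>{});
}

enum class color : unsigned char { red, green };
enum class big : int { a };

// The identifier says "unsigned char"; the fact that it was `color` is gone.
static_assert(type_to_id(identity<color>{}) == 1, "");
static_assert(type_to_id(identity<big>{}) == 8, "");
```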

Nested structures and classes have the same problem: we cannot register them all in our list of fundamental types. So we simply look inside the nested class as well and encode all of its fields as if they were fields of our top-level class.

Say we have a structure a with a field whose type is structure b; we look inside b and pull all the fields of b up into a. I am simplifying: there is also a lot of logic to handle alignment so that nothing breaks.

This is done by adding one more type_to_id function:

template <class Type>
constexpr auto type_to_id(identity<Type>, typename std::enable_if<
    !std::is_enum<Type>::value && !std::is_empty<Type>::value>::type*) noexcept
{
    return array_of_type_ids<Type>(); // Returns array!
}

This time it returns an array (all the previous overloads returned size_t). We need to change our ubiq structure so that it can work with arrays, and add logic for computing the offset, where to write it, and which substructure we are working with. All of this is long and not very interesting; these are technical details.

What does this give us, and where can it be used? You do not have to write any of it yourself, because there is a ready-made library that implements all of this. Here is what the library gives you.

First, comparisons: you no longer need to write comparisons for POD structures by hand. There are three ways to use them. In the simplest one you write nothing at all: you include one header file, and all POD structures get comparisons out of the box.

There are heterogeneous comparisons: two structures with the same fields but different data types can be compared with each other.

There is a universal hash function: pass it any user structure, and it computes a hash of it.

I/O operators: what we saw in the introduction. It is all there, and it works.

When I talked about this metaprogramming magic for the first time, the developers of some piece of hardware were very happy (I can't remember which one). They said they had 1000 different flat structures representing different protocols; that is, each structure maps one-to-one to a protocol. For each of these structures they had three serializers (depending on which hardware and which wires are used downstream), so they had 3,000 serializers. They were very unhappy about this. Using the library, they were able to reduce the 3,000 serializers to 3. They were insanely happy.

These metaprogramming tricks open up the possibility of basic reflection: you can write new type traits, for example is_continuous_layout, is_padded, has_unique_object_representations (as in C++17).

You can write nifty punch_hole functions (they are not in the library) that find the unused bits and bytes in a user structure and return a reference to them, so that others can use them.

Finally, you can write more generic algorithms: for example, you can modify boost::spirit so that it parses directly into a user structure, without having to declare that structure with the boost::fusion and boost::spirit macros. By the way, one of the boost::spirit developers came up to me and said: “It's brilliant! I want this thing, give me a link to the library.” I gave it to him.

A couple of examples. Here is a rather scary structure:

namespace foo {
    struct comparable_struct {
        int i; short s; char data[50]; bool bl; int a,b,c,d,e,f;
    };
} // namespace foo
std::set<foo::comparable_struct> s;

It has a bunch of fields. The problem is that we want to put this structure into some container, for example std::set. Without the library you would have to write a scary comparator for this structure. We could use std::tie to write the comparator, but if the structure changes, we have to remember to make the corresponding changes everywhere. Hell. Better not to think about it. With the library everything works out of the box: take the structure, put it into std::set, and it just works. Serialization of this structure also works:

std::set<foo::comparable_struct> s = { /* ... */ };
std::ofstream ofs("dump.txt");
for (auto& a: s)
    ofs << a << '\n';

Just stream the variables into the ofstream without thinking about anything.

Deserialization also works: read the values from the stream back into the structure and insert it into whatever container you want to use:

std::set<foo::comparable_struct> s;
std::ifstream ifs("dump.txt");
foo::comparable_struct cs;
while (ifs >> cs) {
    char ignore = {};
    ifs >> ignore;
    s.insert(cs);
}

Beauty: minimum code.
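For contrast, this is roughly what the hand-written std::tie comparator mentioned above looks like for a small structure. The structure record and the helper count_unique are made up for this illustration.

```cpp
#include <cstddef>
#include <set>
#include <tuple>

struct record { int i; short s; bool bl; };

// Every field is listed by hand, twice. Adding a field to `record`
// without updating this function silently changes the ordering.
constexpr bool operator<(const record& l, const record& r) noexcept {
    return std::tie(l.i, l.s, l.bl) < std::tie(r.i, r.s, r.bl);
}

// std::set uses operator< to order elements and to detect duplicates.
inline std::size_t count_unique() {
    std::set<record> s{ {1, 2, true}, {1, 2, true}, {1, 3, false} };
    return s.size();
}

static_assert(record{1, 2, true} < record{1, 3, false}, "");
```

With the library the comparator is derived from the field list itself, so there is nothing to keep in sync.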

My favorite example, because it is the most pointless one but looks beautiful, is the flat_tie function: it lets you initialize your structure from a std::tuple.

template <class T>
auto flat_tie(T& val) noexcept;
struct my_struct { int i; short s; };
my_struct s;
flat_tie(s) = std::tuple<int, short>{10, 11};

Now, inside the variable s, my_struct::i holds the value 10 and my_struct::s holds 11.
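The mechanism behind this is ordinary assignment to a tuple of references. The sketch below shows it with a hand-written stand-in; tie_fields and make_from_tuple are hypothetical names, and the real flat_tie builds the tuple of references for any structure automatically.

```cpp
#include <tuple>

struct my_struct { int i; short s; };

// Hand-written stand-in for flat_tie: a tuple of references to the fields.
inline std::tuple<int&, short&> tie_fields(my_struct& val) noexcept {
    return std::tie(val.i, val.s);
}

// Assigning to the tuple of references writes through to the fields.
inline my_struct make_from_tuple() {
    my_struct s{};
    tie_fields(s) = std::tuple<int, short>{10, 11};
    return s;
}
```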

So at this point we have a library that gives you reflection of a sort, working in C++14, and a bunch of examples of where it can be used.

But the library uses reinterpret_cast. And I do not like reinterpret_cast: it cannot be used in constexpr functions, and besides, it is ugly.

So let's try to fix it. We will fix this quickly and simply: we will just fit it into half a megabyte of code. We will do this in C++17. The idea is this: C++17 adds structured bindings. This is a feature that lets us decompose a structure into its fields and get references to the fields. The only problem is that you need to know the exact number of fields in the structure, otherwise it will not compile, and structured bindings cannot be used with enable_if. But at the beginning of the article we already learned how to get the number of fields in a structure. We turn the number of fields into a type and use tag dispatching:

template <class T>
constexpr auto as_tuple(T& val) noexcept {
  typedef size_t_<fields_count<T>()> fields_count_tag;
  return detail::as_tuple_impl(val, fields_count_tag{});
}

We generate a bunch of as_tuple_impl functions:

template <class T>
constexpr auto as_tuple_impl(T&& val, size_t_<1>) noexcept {
  auto& [a] = std::forward<T>(val);
  return detail::make_tuple_of_references(a);
}
template <class T>
constexpr auto as_tuple_impl(T&& val, size_t_<2>) noexcept {
  auto& [a,b] = std::forward<T>(val);
  return detail::make_tuple_of_references(a,b);
}

The second parameter of these as_tuple_impl functions encodes the number of fields in the structure T. If we determine that T has one field, the first as_tuple_impl function is called. It uses a structured binding to take out the first field, makes a tuple holding a reference to that field, and returns the tuple to the user. If the structure has two fields, the structured binding for two fields is used. That is the second function: it decomposes the user structure into fields a and b and returns a tuple holding references to the first field and the second field. Beauty!

The best part is that all of this is constexpr, so now we can write a std::get that works with user structures and their fields, all at compile time. Incredible beauty. The only problem is that the code with structured bindings has not been tested yet: compilers do not yet support structured bindings. So this is a beautiful theory, and most likely it will work with a couple of changes.
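Assuming a compiler with C++17 structured bindings, the whole chain can be sketched as follows. The field counting is the same two-overload trick as earlier; std::tie stands in for detail::make_tuple_of_references, and fields_count, point, sample_point and set_x_through_tuple are names made up for this sketch.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template <std::size_t N>
using size_t_ = std::integral_constant<std::size_t, N>;

template <std::size_t>
struct ubiq_constructor {
    template <class Type>
    constexpr operator Type&() const noexcept;   // used only inside decltype
};

template <class T, std::size_t I0, std::size_t... I>
constexpr auto detect_fields_count(std::size_t& out, std::index_sequence<I0, I...>)
    -> decltype( T{ ubiq_constructor<I0>{}, ubiq_constructor<I>{}... } )
{ out = sizeof...(I) + 1; return T{}; }

template <class T, std::size_t... I>
constexpr void detect_fields_count(std::size_t& out, std::index_sequence<I...>) {
    detect_fields_count<T>(out, std::make_index_sequence<sizeof...(I) - 1>{});
}

template <class T>
constexpr std::size_t fields_count() noexcept {
    std::size_t count = 0;
    detect_fields_count<T>(count, std::make_index_sequence<sizeof(T)>{});
    return count;
}

// One overload per field count; only 1 and 2 are written out here.
template <class T>
constexpr auto as_tuple_impl(T& val, size_t_<1>) noexcept {
    auto& [a] = val;
    return std::tie(a);
}
template <class T>
constexpr auto as_tuple_impl(T& val, size_t_<2>) noexcept {
    auto& [a, b] = val;
    return std::tie(a, b);
}

// Tag dispatching on the number of fields.
template <class T>
constexpr auto as_tuple(T& val) noexcept {
    using fields_count_tag = size_t_<fields_count<std::remove_cv_t<T>>()>;
    return as_tuple_impl(val, fields_count_tag{});
}

struct point { int x; double y; };
constexpr point sample_point{1, 2.5};

// Reading a field by index at compile time.
static_assert(std::get<0>(as_tuple(sample_point)) == 1);
static_assert(std::get<1>(as_tuple(sample_point)) == 2.5);

// The tuple holds references, so writes go through to the structure.
constexpr int set_x_through_tuple() {
    point p{1, 2.5};
    std::get<0>(as_tuple(p)) = 7;
    return p.x;
}
static_assert(set_x_through_tuple() == 7);
```

No reinterpret_cast is involved, which is what makes the constexpr use possible.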

Original report by Anton antoshkka Polukhin - https://youtu.be/jDI5CHKFKd0
Precise and Flat Reflection Library (magic_get)
