Qualab February 10, 2013 at 12:24

Type conversion in Boost.Python: converting between familiar C++ and Python types

Tutorial

This article is not a continuation of the story about C++ API wrappers. There will be no wrappers today, although logically this is the third part of that story.
Today there will be a sea of blood, the dismemberment of existing types and their magical transformation into their familiar counterparts in the other language.
I am not talking about the existing string conversion; no, we will write our own converters.
We will turn the usual Python datetime.datetime into boost::posix_time::ptime from the Boost libraries and vice versa; to hell with it, we will turn the entire datetime library into Boost types! And so as not to get bored, we will sacrifice the built-in Python 3.x bytes class, which simply has no converter in Boost.Python yet, and then brutally use the byte array conversion inside a new converter from Python's uuid.UUID to boost::uuids::uuid. Yes, a converter can be used inside a converter!
Do you want blood, Colosseum?!

Instead of an introduction

In case anyone hasn't noticed, Boost.Python does a great job of turning a bunch of scalars into objects of the corresponding Python classes. If you want to compare, write in pure C, use the C-API directly and let it fry your brain. Spend a lot of time on it, and you will come to appreciate the comfort of modern technology: the convenience of an easy chair, the necessity of a hot bath and a TV remote. As for fans of wooden benches, ice-hole bathing and torch splinters, let them carry on with their folk art.
So, there is such a thing as the built-in converters of Boost.Python: converters for built-in types from Python to C++ and back, partially implemented in $(BoostPath)\libs\python\src\converter and $(BoostPath)\boost\python\converter. There are many of them, and they solve roughly 95% of the problems of working with the built-in types of Python and C++. There is string conversion too, which is of course not ideal, but if in C++ we work with UTF-8 strings or wide strings, everything will be fast, correct and invisible, that is, convenient to use.
Almost everything the built-in converters do not cover is solved by wrapping your own classes. Boost.Python offers a truly monstrously simple way to describe class wrappers in a kind of meta-language that even looks like a Python class:

class_<Some>( "Some" )
    .def( "method_A", &Some::method_A, args( "x", "y", "z" ) )
    .def( "method_B", &Some::method_B, args( "u", "v" ) )
;

Everything is great, but there is one "but"...
... one big and wonderful "but": C++ and Python are languages with their own libraries. In C++, the headers

#include <boost/date_time.hpp>
#include <boost/uuid/uuid.hpp>

have de facto counterparts in Python:

import datetime
import uuid

So, a lot of things in your C++ code may already be tied specifically to, say, the boost::gregorian::date class, while in Python, in turn, a lot is tied to its counterpart, the datetime.date class. Working in Python with a boost::gregorian::date wrapper, exposed with all its methods and overloaded operators, and trying to slip its instances in place of the usual datetime.date: I don't even know what to call that. It is not a crutch, it is dancing with a grenade. And this grenade will explode, gentlemen of the jury. On the Python side, you need to work with the built-in date and time library.
If you are reading this while looking at your own code, where you pull the fields of a Python datetime into C++ one by one via extract, there is nothing to smirk about: everything described in the paragraph above applies to you just as much. Even if you have your own mega date/time class in C++, it is better to write a type converter than to unpack fields one at a time in some home-grown method.
In general, if there is a native type on the Python side and an established type on the C++ side that implements the core logic with the same functionality, you need a converter.
You really need a converter.

What is a converter

A converter is a conversion, registered in Boost.Python, from a C++ type to a Python type or vice versa. On the C++ side you use familiar types, fully confident that in Python they will become the corresponding type. Converters are usually written in both directions, but writing a conversion from C++ to Python is an order of magnitude easier, as you will see for yourself. The thing is that creating an instance in C++ means constructing it in memory you are handed, which is often a non-trivial task, while creating an object in Python is extremely simple. So let's start with the conversion from C++ to Python.

Type conversion from C++ to Python

To convert from C++ to Python, you need a structure with a static convert method that takes a reference to the C++ type and returns PyObject*, the common type for any object in the Python C-API and the pointer wrapped inside boost::python::object.
Let's make it a template structure right away, since we are planning a mass slaughter:

template< typename T >
struct type_into_python
{
    static PyObject* convert( T const& );
};

All that is required is to implement the specialized method of the template structure, for example for the type boost::posix_time::ptime:

template<> PyObject* type_into_python<ptime>::convert( ptime const& );

and register the converter inside BOOST_PYTHON_MODULE when declaring the module:

    to_python_converter< ptime, type_into_python<ptime> >();
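The mechanism behind this (a primary template that only declares convert, plus one explicit specialization of that member per type) can be tried without Boost.Python at all. In the minimal sketch below, the hypothetical type_into_text plays the role of type_into_python and returns a std::string instead of a PyObject*; point is likewise a made-up example type.

```cpp
#include <string>

// Primary template: convert() is only declared, never defined, so a type
// without its own specialization fails at link time instead of silently compiling.
template< typename T >
struct type_into_text
{
    static std::string convert( T const& );
};

// Hypothetical stand-in for a C++ type we want to expose.
struct point
{
    int x;
    int y;
};

// One explicit specialization of the member function per supported type.
template<>
std::string type_into_text< int >::convert( int const& v )
{
    return std::to_string( v );
}

template<>
std::string type_into_text< point >::convert( point const& p )
{
    return "(" + std::to_string( p.x ) + ", " + std::to_string( p.y ) + ")";
}
```

Calling type_into_text< double >::convert would compile but fail to link, since no specialization defines it; the same thing happens if a converter is registered for a type whose convert was never implemented.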

Well, having said A, let's say B. The converter implementation for boost::posix_time::ptime will look something like this:

template<> PyObject* type_into_python<ptime>::convert( ptime const& t )
{
    auto d = t.date();
    auto tod = t.time_of_day();
    auto usec = tod.total_microseconds() % 1000000;
    return PyDateTime_FromDateAndTime( d.year(), d.month(), d.day(), tod.hours(), tod.minutes(), tod.seconds(), usec );
}

Important! When registering the module, we must import the datetime C-API:

    PyDateTime_IMPORT;
    to_python_converter< ptime, type_into_python<ptime> >();

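Without the PyDateTime_IMPORT line, nothing will take off: the PyDateTime_* macros go through the PyDateTimeAPI pointer, which stays null until PyDateTime_IMPORT fills it in. Since PyDateTimeAPI is declared static in datetime.h, every .cpp file that uses these macros needs its own PyDateTime_IMPORT.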

We were generally lucky that the Python C-API has a ready-made function that creates a PyObject* for a new datetime.datetime from its components, essentially an analogue of the datetime class constructor. Less luckily, Boost has a rather "fun" API for the ptime class. The class turned out to be not entirely self-sufficient: you have to extract the date and the time from it, which are separate components there, and the time is represented as time_duration, which is an analogue not so much of datetime.time as of datetime.timedelta! On the whole, this does not prevent us from representing the types of the datetime library in C++, but it is rather unpleasant that boost::posix_time::time_duration does not provide direct access to microseconds and milliseconds. Instead, you have to either work "cunningly" with the fractional_seconds() method, or bluntly do the terrible thing and take total_microseconds() % 1000000. I haven't decided yet which is worse; I just don't like how time_duration is designed. We will map it to the datetime.time class, and leave the similar datetime.timedelta class alone for now.
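To see why the modulo trick yields the microsecond field, here is the same arithmetic written against std::chrono instead of time_duration. std::chrono is swapped in only so the sketch compiles without Boost, and split_duration and time_fields are hypothetical names. Minutes and seconds are taken modulo 60, while hours are deliberately not wrapped at 24.

```cpp
#include <chrono>
#include <cstdint>

// Fields in the shape PyTime_FromTime / PyDateTime_FromDateAndTime expect.
struct time_fields
{
    std::int64_t hours;
    std::int64_t minutes;
    std::int64_t seconds;
    std::int64_t usec;
};

// Split a duration into h/m/s/us, mirroring what the converter passes on:
// hours(), minutes(), seconds() and total_microseconds() % 1000000.
inline time_fields split_duration( std::chrono::microseconds d )
{
    const std::int64_t total = d.count();
    time_fields f;
    f.usec    = total % 1000000;                   // sub-second part
    f.seconds = ( total / 1000000 ) % 60;
    f.minutes = ( total / ( 60LL * 1000000 ) ) % 60;
    f.hours   = total / ( 3600LL * 1000000 );      // not wrapped at 24
    return f;
}
```

Because C++ % keeps the sign of its left operand, a negative duration produces negative fields (for -1.5 s: seconds == -1, usec == -500000), which datetime.time would reject anyway.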

Conversion from Python to C++

Heh, my friends, this is where it gets really difficult. Stock up on validol and fasten your seatbelts.
Everything seems to be exactly the same: we make a structure template with two methods, convertible and construct, which check convertibility and construct the C++ object respectively. Actually, it does not matter what the methods are called; the main thing is to refer to them during registration, and it is most convenient to do that in the constructor of our template structure:

template< typename T >
struct type_from_python
{
    type_from_python()
    {
        converter::registry::push_back( convertible, construct, type_id<T>() );
    }
    static void* convertible( PyObject* );
    static void construct( PyObject*, converter::rvalue_from_python_stage1_data* );
};

Actually, when declaring a module, it is enough to call the constructor of this structure. And, of course, you need to implement these methods for each convertible type, for example for ptime:

template<> void* type_from_python<ptime>::convertible( PyObject* );
template<> void  type_from_python<ptime>::construct( PyObject*, converter::rvalue_from_python_stage1_data* );
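The idea of registering in the constructor can be illustrated with a toy registry. Everything in this sketch is hypothetical (toy_registry, toy_entry, type_from_text and the string-based signatures); it only imitates the shape of converter::registry::push_back, which takes a convertible function, a construct function and a type_info for the target type.

```cpp
#include <map>
#include <new>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical signatures, loosely modelled on convertible/construct.
using convertible_fn = bool (*)( std::string const& );
using construct_fn   = void (*)( std::string const&, void* storage );

struct toy_entry
{
    convertible_fn convertible;
    construct_fn   construct;
};

// Toy registry keyed by the C++ target type.
inline std::map< std::type_index, toy_entry >& toy_registry()
{
    static std::map< std::type_index, toy_entry > r;
    return r;
}

template< typename T >
struct type_from_text
{
    // Constructing the struct is all it takes to register the converter.
    type_from_text()
    {
        toy_registry()[ std::type_index( typeid( T ) ) ] = toy_entry{ convertible, construct };
    }
    static bool convertible( std::string const& );
    static void construct( std::string const&, void* storage );
};

template<>
bool type_from_text< int >::convertible( std::string const& s )
{
    return !s.empty() && s.find_first_not_of( "0123456789" ) == std::string::npos;
}

template<>
void type_from_text< int >::construct( std::string const& s, void* storage )
{
    new( storage ) int( std::stoi( s ) );  // build the value in caller-provided memory
}
```

As in the module code, a bare type_from_text< int >(); statement is enough to register the pair.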

Let's take a look at the implementation of the convertibility test method and the ptime construction method right away:

template<> void* type_from_python<ptime>::convertible( PyObject* obj )
{
    return PyDateTime_Check( obj ) ? obj : nullptr;
}
template<> void type_from_python<ptime>::construct( PyObject* obj, converter::rvalue_from_python_stage1_data* data )
{
    auto storage = reinterpret_cast< converter::rvalue_from_python_storage<ptime>* >( data )->storage.bytes;
    date date_only( PyDateTime_GET_YEAR( obj ), PyDateTime_GET_MONTH( obj ), PyDateTime_GET_DAY( obj ) );
    time_duration time_of_day( PyDateTime_DATE_GET_HOUR( obj ), PyDateTime_DATE_GET_MINUTE( obj ), PyDateTime_DATE_GET_SECOND( obj ) );
    time_of_day += microsec( PyDateTime_DATE_GET_MICROSECOND( obj ) );
    new(storage) ptime( date_only, time_of_day );
    data->convertible = storage;
}

With the convertible method everything is clear: if it is a datetime, it passes; if not, return nullptr and we are done.
But the construct method will be equally hairy for absolutely every type!
Even if you have your own MyDateTime type, you will have to construct it in place, with placement new, exactly where you are allowed to put it! Look at this funny operator here:

    new(storage) ptime( date_only, time_of_day );

This is placement new. It constructs your new object at the specified address. That address still has to be computed, and here is the way we are offered to get the desired pointer:

    auto storage = reinterpret_cast< converter::rvalue_from_python_storage<ptime>* >( data )->storage.bytes;

I will not comment on this. Just remember it.
Everything else is just extra computation to call the straightforward constructor of the not-quite-self-sufficient ptime class.
Do not forget to fill in one more field at the end:

    data->convertible = storage;

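Again, I don't know how to put it more gently: just remember that it is important and that the field must be filled in. Setting it to storage tells Boost.Python that an object now lives in that buffer, so Boost.Python will call its destructor once the converted value is no longer needed. Think of it as a small unpleasantness on the way to universal happiness.
Examples of how others do this can be found here, here and here, in the FAQ section of the Boost.Python website.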
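What happens around construct() can be imitated in plain C++. In this sketch, stage1_data and rvalue_storage<T> are hypothetical stand-ins that only mimic the layout of converter::rvalue_from_python_stage1_data and rvalue_from_python_storage<T>: stage-1 data first, then suitably aligned raw bytes for T, which is why a cast from the stage-1 pointer reaches the storage. The caller destroys the object only when convertible points at the storage, and since the object came from placement new, that means an explicit destructor call rather than delete.

```cpp
#include <new>
#include <string>

// Hypothetical stand-ins for Boost.Python's stage-1 data and storage.
struct stage1_data
{
    void* convertible;  // construct() sets this to the address of the new object
};

template< typename T >
struct rvalue_storage
{
    stage1_data stage1;                               // must come first for the cast below
    alignas( T ) unsigned char bytes[ sizeof( T ) ];  // raw memory, no T in it yet
};

// Same shape as the article's construct(): recover the storage from the
// stage-1 pointer, build the object in place, then publish its address.
inline void construct_string( char const* src, stage1_data* data )
{
    auto storage = reinterpret_cast< rvalue_storage< std::string >* >( data )->bytes;
    new( storage ) std::string( src );  // placement new into the raw bytes
    data->convertible = storage;
}

// What the caller does afterwards: use the object, then destroy it only
// if construct() actually put one into the storage.
inline std::string use_and_destroy( rvalue_storage< std::string >& s )
{
    using T = std::string;
    std::string copy;
    if( s.stage1.convertible == s.bytes )
    {
        T* p = static_cast< T* >( s.stage1.convertible );
        copy = *p;
        p->~T();  // explicit destructor call, no delete
        s.stage1.convertible = nullptr;
    }
    return copy;
}
```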

Converting datetime types there and back

All in all, date and time on their own are quite simple. Thanks to our template structures, all we need is to implement the following specialized methods for date and time_duration:

template<> PyObject* type_into_python<date>::convert( date const& );
template<> void*     type_from_python<date>::convertible( PyObject* );
template<> void      type_from_python<date>::construct( PyObject*, converter::rvalue_from_python_stage1_data* );
template<> PyObject* type_into_python<time_duration>::convert( time_duration const& );
template<> void*     type_from_python<time_duration>::convertible( PyObject* );
template<> void      type_from_python<time_duration>::construct( PyObject*, converter::rvalue_from_python_stage1_data* );

The task is simple: it boils down to splitting the previous methods into pairs, one for the date and one for the time.
For boost::gregorian::date and datetime.date:

template<> PyObject* type_into_python<date>::convert( date const& d )
{
    return PyDate_FromDate( d.year(), d.month(), d.day() );
}
template<> void* type_from_python<date>::convertible( PyObject* obj )
{
    // datetime.datetime is a subclass of datetime.date, so it passes too (its time part is dropped)
    return PyDate_Check( obj ) ? obj : nullptr;
}
template<> void type_from_python<date>::construct( PyObject* obj, converter::rvalue_from_python_stage1_data* data )
{
    auto storage = reinterpret_cast< converter::rvalue_from_python_storage<date>* >( data )->storage.bytes;
    new(storage) date( PyDateTime_GET_YEAR( obj ), PyDateTime_GET_MONTH( obj ), PyDateTime_GET_DAY( obj ) );
    data->convertible = storage;
}

And for boost::posix_time::time_duration and datetime.time:

template<> PyObject* type_into_python<time_duration>::convert( time_duration const& t )
{
    // hours() is not wrapped at 24, and PyTime_FromTime raises ValueError outside
    // 0..23 hours, so only durations in [0, 24h) convert cleanly
    auto usec = t.total_microseconds() % 1000000;
    return PyTime_FromTime( t.hours(), t.minutes(), t.seconds(), usec );
}
template<> void* type_from_python<time_duration>::convertible( PyObject* obj )
{
    return PyTime_Check( obj ) ? obj : nullptr;
}
template<> void type_from_python<time_duration>::construct( PyObject* obj, converter::rvalue_from_python_stage1_data* data )
{
    auto storage = reinterpret_cast< converter::rvalue_from_python_storage<time_duration>* >( data )->storage.bytes;
    time_duration* t = new(storage) time_duration( PyDateTime_TIME_GET_HOUR( obj ), PyDateTime_TIME_GET_MINUTE( obj ), PyDateTime_TIME_GET_SECOND( obj ) );
    *t += microsec( PyDateTime_TIME_GET_MICROSECOND( obj ) );
    data->convertible = storage;
}
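One caveat for this pair: datetime.time only accepts hours 0..23 and non-negative fields, while a time_duration can be negative or longer than a day (12:00 plus 18:00 gives 30 hours), in which case PyTime_FromTime raises ValueError. A hypothetical guard, written against a plain microsecond count so it compiles without Boost, could either reject such values or wrap them into a single day; which policy fits your module is an assumption left to you.

```cpp
#include <cstdint>

// Microseconds in one day; a datetime.time must stay strictly below this.
constexpr std::int64_t usec_per_day = 24LL * 60 * 60 * 1000000;

// Hypothetical check: can this duration become a datetime.time
// without PyTime_FromTime raising ValueError?
constexpr bool fits_time_of_day( std::int64_t total_usec )
{
    return total_usec >= 0 && total_usec < usec_per_day;
}

// Alternative policy: wrap the duration into [0, 24h), like a wall clock.
// Adding usec_per_day before the second % keeps negative inputs non-negative.
constexpr std::int64_t wrap_to_day( std::int64_t total_usec )
{
    return ( total_usec % usec_per_day + usec_per_day ) % usec_per_day;
}
```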

Registering all this stuff in our module will look something like this:

BOOST_PYTHON_MODULE( ... )
{
    ...
    PyDateTime_IMPORT;
    to_python_converter< ptime, type_into_python<ptime> >();
    type_from_python< ptime >();
    to_python_converter< date, type_into_python<date> >();
    type_from_python< date >();
    to_python_converter< time_duration, type_into_python<time_duration> >();
    type_from_python< time_duration >();
    ...
}

Testing the date and time conversion

It's time to put our mega-conversion to work. Let's write a bunch of pointless functions that take a date/time as input and return a date/time as output:

ptime tomorrow();
ptime day_before( ptime const& the_moment );
date last_day_of_this_month();
date year_after( date const& the_day );
time_duration delta_between( ptime const& at, ptime const& to );
time_duration plus_midday( time_duration const& the_moment );

Let's declare them in our module so they can be called from Python:

    def( "tomorrow", tomorrow );
    def( "day_before", day_before, args( "moment" ) );
    def( "last_day_of_this_month", last_day_of_this_month );
    def( "year_after", year_after, args( "day" ) );
    def( "delta_between", delta_between, args( "at", "to" ) );
    def( "plus_midday", plus_midday, args( "moment" ) );

Here is what these functions of ours do (although in reality this no longer matters; the input/output types are what matter):

ptime tomorrow()
{
    return microsec_clock::local_time() + days( 1 );
}
ptime day_before( ptime const& that )
{
    return that - days( 1 );
}
date last_day_of_this_month()
{
    date today = day_clock::local_day();
    date next_first_day = (today.month() == Dec) ? date( today.year() + 1, 1, 1 ) : date( today.year(), today.month() + 1, 1 );
    return next_first_day - days( 1 );
}
date year_after( date const& the_day )
{
    return the_day + years( 1 );
}
time_duration delta_between( ptime const& at, ptime const& to )
{
    return to - at;
}
time_duration plus_midday( time_duration const& the_moment )
{
    return time_duration( 12, 0, 0 ) + the_moment;
}
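The date arithmetic in last_day_of_this_month (take the first day of the next month, rolling December over into January of the next year, then step back one day) can be checked without Boost. This sketch swaps boost::gregorian::date for Howard Hinnant's public-domain days_from_civil / civil_from_days algorithms, which map a proleptic Gregorian date to a day count since 1970-01-01 and back; civil_date and last_day_of_month are hypothetical names.

```cpp
// Hypothetical plain date triple standing in for boost::gregorian::date.
struct civil_date
{
    long y;
    unsigned m;
    unsigned d;
};

// Howard Hinnant's days_from_civil: days since 1970-01-01.
inline long days_from_civil( long y, unsigned m, unsigned d )
{
    y -= m <= 2;  // treat Jan/Feb as months 11/12 of the previous year
    const long era = ( y >= 0 ? y : y - 399 ) / 400;
    const unsigned yoe = static_cast< unsigned >( y - era * 400 );            // [0, 399]
    const unsigned doy = ( 153 * ( m > 2 ? m - 3 : m + 9 ) + 2 ) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;               // [0, 146096]
    return era * 146097 + static_cast< long >( doe ) - 719468;
}

// Howard Hinnant's civil_from_days: the inverse of days_from_civil.
inline civil_date civil_from_days( long z )
{
    z += 719468;
    const long era = ( z >= 0 ? z : z - 146096 ) / 146097;
    const unsigned doe = static_cast< unsigned >( z - era * 146097 );
    const unsigned yoe = ( doe - doe / 1460 + doe / 36524 - doe / 146096 ) / 365;
    const long y = static_cast< long >( yoe ) + era * 400;
    const unsigned doy = doe - ( 365 * yoe + yoe / 4 - yoe / 100 );
    const unsigned mp = ( 5 * doy + 2 ) / 153;
    const unsigned d = doy - ( 153 * mp + 2 ) / 5 + 1;
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;
    return civil_date{ y + ( m <= 2 ), m, d };
}

// Same idea as last_day_of_this_month(): first day of the next month, minus one day.
inline civil_date last_day_of_month( long y, unsigned m )
{
    const long next_y = ( m == 12 ) ? y + 1 : y;
    const unsigned next_m = ( m == 12 ) ? 1 : m + 1;
    return civil_from_days( days_from_civil( next_y, next_m, 1 ) - 1 );
}
```

Leap years fall out automatically: stepping back from March 1 lands on February 29 only when the year has one.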

For instance, here is a simple script (in Python 3.x):

from someconv import *
from datetime import *
# test datetime.datetime <=> boost::posix_time::ptime
t = tomorrow(); print( 'Tomorrow at same time:', t )
for _ in range(3): t = day_before(t); print( 'Day before that moment:', t )
# test datetime.date <=> boost::gregorian::date
d = last_day_of_this_month(); print( 'Last day of this month:', d )
for _ in range(3): d = year_after(d); print( 'Year after that day:', d )
# test datetime.time <=> boost::posix_time::time_duration
at = datetime.now()
to = at + timedelta( seconds=12*60*60 )
dt = delta_between( at, to )
print( "Delta between '{at}' and '{to}' is '{dt}'".format( at=at, to=to, dt=dt ) )
t0 = time( 6, 30, 0 )
t1 = plus_midday( t0 )
print( t0, "plus midday is:", t1 )

It should run correctly and finish by printing the correct dates and times. The test script will, of course, be attached. (I am not including the output here, so as not to scare anyone with how much has already been written!)
You can, in principle, go ahead and write your own test functions; they will all work as they should if you did everything right.
Just in case, at the end I will post a link to the project along with the test script.

A byte array as a byte vector in C++

Generally speaking, the example below is extremely harmful. A standard std::vector of an element type narrower than int is extremely inefficient. The losses on copying, and consequently on vector::resize(), are catastrophic, simply because the copying is done byte by byte. Even with all optimizations turned on, this leads to losses of up to 170% on plain copying compared to memcpy() (measured in an MSVS v10 Release build). That is not particularly pleasant for a frequently used piece of code, especially since the copying is not visible and resize() sometimes happens implicitly. You get "entertaining" performance dips, in the sense that you will have plenty to do hunting down the slowdowns in a large system.

The example below is purely academic if you need manic code optimization somewhere and are writing part of the module in C++ precisely for that. If performance is not your main concern, you can safely use this conversion.
For Python 2.x this section is irrelevant in principle: back then byte arrays were called strings. It will be much more interesting for you to read about working with unicode and converting it to a standard C++ string here on PyWiki.
But for Python 3.x this conversion replaces a huge chunk of C-API code with the usual vector<byte> (byte being an unsigned 8-bit integer, uint8_t).

So, again we use our wonderful template structures and rejoice:

typedef uint8_t byte;
typedef vector<byte> byte_array;
...
template<> PyObject* type_into_python<byte_array>::convert( byte_array const& );
template<> void*     type_from_python<byte_array>::convertible( PyObject* );
template<> void      type_from_python<byte_array>::construct( PyObject*, converter::rvalue_from_python_stage1_data* );

As before, we add the converter registration to our module declaration:

BOOST_PYTHON_MODULE( ... )
{
    ...
    to_python_converter< byte_array, type_into_python<byte_array> >();
    type_from_python< byte_array >();
}

And the simplest implementation: we just use our knowledge of the PyBytes C-API and the std::vector methods:

template<> PyObject* type_into_python<byte_array>::convert( byte_array const& ba )
{
    const char* src = ba.empty() ? "" : reinterpret_cast<const char*>( &ba.front() );
    return PyBytes_FromStringAndSize( src, ba.size() );
}
template<> void* type_from_python<byte_array>::convertible( PyObject* obj )
{
    return PyBytes_Check( obj ) ? obj : nullptr;
}
template<> void type_from_python<byte_array>::construct( PyObject* obj, converter::rvalue_from_python_stage1_data* data )
{
    auto storage = reinterpret_cast< converter::rvalue_from_python_storage<byte_array>* >( data )->storage.bytes;
    byte* dest; Py_ssize_t len;
    PyBytes_AsStringAndSize( obj, reinterpret_cast<char**>( &dest ), &len );
    new(storage) byte_array( dest, dest + len );
    data->convertible = storage;
}
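The buffer handling in these two functions does not depend on Python, so it can be checked separately. In this sketch std::string stands in for the PyBytes payload (to_bytes_payload and from_bytes_payload are hypothetical names), and vector::data() replaces the empty() / &front() dance: unlike &front(), data() may be called on an empty vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::uint8_t byte;
typedef std::vector< byte > byte_array;

// C++ -> "bytes": copy the vector into a char buffer (std::string stands in
// for the copy PyBytes_FromStringAndSize would make).
inline std::string to_bytes_payload( byte_array const& ba )
{
    const char* src = reinterpret_cast< const char* >( ba.data() );
    return std::string( src, ba.size() );
}

// "bytes" -> C++: the same range constructor construct() uses, dest .. dest + len.
inline byte_array from_bytes_payload( const char* buf, std::size_t len )
{
    const byte* dest = reinterpret_cast< const byte* >( buf );
    return byte_array( dest, dest + len );
}
```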

Additional comments are hardly needed; for the PyBytes C-API, I refer you here.

Converting uuid.UUID to boost::uuids::uuid and vice versa

You will laugh, but we made our work so much easier by creating those two templates at the very beginning that it again comes down to implementing three methods:

using namespace boost::uuids;
...
template<> PyObject* type_into_python<uuid>::convert( uuid const& );
template<> void*     type_from_python<uuid>::convertible( PyObject* );
template<> void      type_from_python<uuid>::construct( PyObject*, converter::rvalue_from_python_stage1_data* );

Out of habit, we add two new lines to the module declaration, registering the conversion there and back:

    to_python_converter< uuid, type_into_python<uuid> >();
    type_from_python< uuid >();

And now the most interesting part: the C-API will not help us here, it is more likely to get in the way. It is easiest to go through boost::python::import, importing the Python module "uuid" and the class "UUID" from that module.

static object py_uuid = import( "uuid" );
static object py_uuid_UUID = py_uuid.attr( "UUID" );
template<> PyObject* type_into_python<uuid>::convert( uuid const& u )
{
    return incref( py_uuid_UUID( object(), byte_array( u.data, u.data + sizeof(u.data) ) ).ptr() );
}
template<> void* type_from_python<uuid>::convertible( PyObject* obj )
{
    return PyObject_IsInstance( obj, py_uuid_UUID.ptr() ) ? obj : nullptr;
}
template<> void type_from_python<uuid>::construct( PyObject* obj, converter::rvalue_from_python_stage1_data* data )
{
    auto storage = reinterpret_cast< converter::rvalue_from_python_storage<uuid>* >( data )->storage.bytes;
    byte_array ba = extract<byte_array>( object( handle<>( borrowed( obj ) ) ).attr( "bytes" ) );
    uuid* res = new(storage) uuid;
    memcpy( res->data, &ba.front(), ba.size() );
    data->convertible = storage;
}
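The copy at the end of construct() trusts that bytes has exactly 16 elements. That holds for uuid.UUID, so a check is purely defensive, but a checked version is cheap. In this sketch uuid16 and uuid_from_bytes are hypothetical; uuid16 stands in for boost::uuids::uuid, which likewise keeps its value in a 16-byte data array.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for boost::uuids::uuid: 16 raw bytes.
struct uuid16
{
    std::uint8_t data[ 16 ];
};

// Copy UUID.bytes into the C++ object, refusing anything that is not
// exactly 16 bytes instead of overrunning or under-filling the buffer.
inline uuid16 uuid_from_bytes( std::vector< std::uint8_t > const& ba )
{
    uuid16 u;
    if( ba.size() != sizeof( u.data ) )
        throw std::invalid_argument( "UUID.bytes must be exactly 16 bytes" );
    std::memcpy( u.data, ba.data(), sizeof( u.data ) );
    return u;
}
```

How such an error should be reported back to Python from a real construct() is a separate design decision not covered here.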

Sorry for using global variables; usually this is done in a singleton, with Py_Initialize() and Py_Finalize() in the constructor and destructor respectively. But since this is a purely educational example and, so far, it is used only from Python, we can get by with this quick approach. Once again, I apologize, but the code is clearer this way.

Since the behavior of these methods is very different from everything above, it is worth describing in more detail what is actually going on.
In py_uuid we stored the uuid module object from the Python standard library.
In py_uuid_UUID we stored the uuid.UUID class object, that is, the class itself as such. Applying parentheses to this object calls the constructor and creates an object of this type, which is what we will do later. But the class itself is also useful to us in the convertible method, to check whether the argument is a UUID.

From C++ to Python everything is clear: just call the constructor, pass None as the first parameter (a default-constructed boost::python::object is exactly None), and our byte array from the previous section as the second one, which is UUID's bytes parameter. If you have Python 2.x, the code changes a bit and becomes simpler: just pass a string and pretend it is a byte array.

When checking a Python object for convertibility, the PyObject_IsInstance() function helps us a lot.
We take the PyObject* pointer to the uuid.UUID type using the ptr() method of boost::python::object. This is where the class object itself came in handy. In fact, classes in Python are objects like any other, and that's great. Thank you for such a logical and understandable language.

Here is the conversion code from Python to C++; at first glance it is not at all clear what is happening on this line:

    byte_array ba = extract<byte_array>( object( handle<>( borrowed( obj ) ) ).attr( "bytes" ) );

Here, in fact, everything is extremely simple. From the uuid.UUID object that came in as PyObject*, we create a full-fledged boost::python::object. Pay attention to the construction handle<>( borrowed( obj ) ): here it is very important not to lose the borrowed call. Without it, the handle assumes it owns a reference it never received, and its destructor releases that reference, which can destroy the object that was passed in.
So, from PyObject* we got a boost::python::object referring to the argument of type uuid.UUID. We take the bytes attribute of this object and pull a byte_array out of it via extract. That's it, we have the contents.
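The reference counting behind borrowed can be shown with a toy model. This is not the CPython or Boost.Python API: toy_object, toy_handle and borrowed_tag are hypothetical. The only rule imitated is that a handle<> gives up one reference in its destructor, so for a pointer it did not receive a new reference with (a borrowed one, like obj here) it must take one first, which is what wrapping in borrowed() arranges.

```cpp
// Toy reference-counted object (hypothetical, not PyObject).
struct toy_object
{
    long refcnt = 1;
    bool alive = true;
};

inline void toy_decref( toy_object* o )
{
    if( --o->refcnt == 0 )
        o->alive = false;  // a real runtime would free the object here
}

// Tag playing the role of borrowed(): "I do not own this reference".
struct borrowed_tag {};

// Toy handle: always releases one reference when destroyed, like handle<>.
class toy_handle
{
public:
    explicit toy_handle( toy_object* o ) : obj_( o ) {}                   // takes over an owned reference
    toy_handle( borrowed_tag, toy_object* o ) : obj_( o ) { ++obj_->refcnt; }  // acquires its own reference
    ~toy_handle() { toy_decref( obj_ ); }
    toy_handle( toy_handle const& ) = delete;
    toy_handle& operator=( toy_handle const& ) = delete;

private:
    toy_object* obj_;
};
```

With the tag, the count returns to where it started; without it, the handle's destructor drops a reference the caller still believes it has, and the object dies under the caller's feet.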
Lovers of doing everything through serialization and deserialization can contort themselves with a conversion to a string and back. lexical_cast() to help them, and a millstone around their necks. Remember that creating strings and serializing in C++ is inherently a very expensive operation.
Python 2.x users will get bytes directly as a string: strings back then were, as in C/C++, essentially char* buffers.
After that everything is simple: fill in the array (excuse the unsafe copying) and hand the finished object back to C++.

Testing the byte array and UUID conversions

Let's write a few more functions that drive our types back and forth between C++ and Python:

byte_array string_to_bytes( string const& src );
string bytes_to_string( byte_array const& src );
uuid random_uuid();
byte_array uuid_bytes( uuid const& src );

Let's declare them in our module so they can be called from Python:

BOOST_PYTHON_MODULE( someconv )
{
    ...
    def( "string_to_bytes", string_to_bytes, args( "src" ) );
    def( "bytes_to_string", bytes_to_string, args( "src" ) );
    def( "random_uuid", random_uuid );
    def( "uuid_bytes", uuid_bytes, args( "src" ) );
    ...
}

Their behavior is not that important, but to make the result clear, let's honestly show their implementation:

byte_array string_to_bytes( std::string const& src )
{
    return byte_array( src.begin(), src.end() );
}
string bytes_to_string( byte_array const& src )
{
    return string( src.begin(), src.end() );
}
uuid random_uuid()
{
    static random_generator gen_uuid;
    return gen_uuid();
}
byte_array uuid_bytes( uuid const& src )
{
    return byte_array( src.data, src.data + sizeof(src.data) );
}

Then a test script like this (in Python 3.x):

from someconv import *
from uuid import *
...
# test bytes <=> std::vector
print( bytes_to_string( b"I_must_be_string" ) )
print( string_to_bytes( "I_must_be_byte_array" ) )
print( bytes_to_string( " - Привет!".encode() ) )
print( string_to_bytes( " - Пока!" ).decode() )
print( bytes_to_string( string_to_bytes( " - Ну пока!" ) ) )
# test uuid.UUID <=> boost::uuids::uuid
u = random_uuid()
print( 'Generated UUID (C++ module):', uuid_bytes(u) )
print( 'Generated UUID (in Python): ', u.bytes)

should run correctly and produce something like:

I_must_be_string
b'I_must_be_byte_array'
 - Привет!
 - Пока!
 - Ну пока!
Generated UUID (C++ module): b'\xf1B\xdb\xa9



By the way, if as an experiment you remove borrowed from the UUID conversion from Python to C++, you will crash exactly on the last line, because the object will already have been destroyed and there will be nothing to take the bytes attribute from.

Summary

We have learned not only to write converters, but also to generalize them, keep the effort of writing them to a minimum, and use one converter inside another. We now know what they are, how to use them, and where they are vital.

The link to the project is here (~207 KB). It is an MSVS v11 project configured to build with Python 3.3 x64.

Useful links

Boost.Python documentation
How to write a string converter
Unicode conversion in Python 2.x
Converting arrays between C++ and Python
Another way to convert date/time
