Performance in Python. Easy way

Original author: James Robert
  • Transfer
  • Tutorial
I always knew that one of the advantages of Python is the ability to rewrite the most inhibitory pieces of C code and increase the speed of the program to unattainable interpreted language heights. But I have never tried it myself, because thought it was too complicated. After reading this article I don’t think so anymore.

Programmers familiar with ctypes are unlikely to find something interesting here, but for beginners, I ask for cat.

Ctypes is a Python mechanism for importing functions from external libraries.
% timeit - IPython shell magic function that measures the execution time of an expression in Python


Ctypes is great! Let's start with a small banal example: summing numbers in a certain range.
Here is a Python implementation of this function
def sumrange(arg):
    return sum(xrange(arg))

Excellent! But what if we try to summarize a really large range of numbers, for example from 0 to 10 ** 8 (i.e. 100,000,000)
In [2]: %timeit sumrange(10**2)
1000000 loops, best of 3: 1.53 us per loop
In [3]: %timeit sumrange(10**8)
1 loops, best of 3: 9.77 s per loop
In [4]: %timeit sumrange(10**9)
1 loops, best of 3: 97.8 s per loop

Not so much fun anymore. Let's try something else:
def sumrange2(arg):
    x = i = 0
    while i < arg:
        x += i
        i += 1
    return x

What will come of it?
In [10]: %timeit sumrange2(10**2)
100000 loops, best of 3: 10.5 us per loop
In [11]: %timeit sumrange2(10**8)
1 loops, best of 3: 18.5 s per loop

Wow ... It's even worse ... This time I won’t even try 10 ** 9.
So how do we speed up execution? Just don’t offer mathematical optimizations ... we are in the new world of computers! ( in the original: don't suggest math tricks ... this is the new world of computing! )
Yes, I know that the complexity of the algorithm is a constant and does not depend on the value of the argument, n * (n + 1) / 2. But the article is not devoted to this.

What about ctypes?

#include 
unsigned long long sumrange(unsigned long long arg)
{
    unsigned long long i, x;
    x = 0;
    for (i = 0; i < arg; i++) {
        x = x + i;
    }
    return x;
}

Save it with the name sumrange.c and compile it (we will not use optimizations for the purity of the experiment):
$ gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c

Import into Python what happened:
import ctypes
sumrange_ctypes = ctypes.CDLL('./sumrange.so').sumrange
sumrange_ctypes.restype = ctypes.c_ulonglong
sumrange_ctypes.argtypes = ctypes.c_ulonglong,


And the Oscar gets ...

In [15]: %timeit sumrange_ctypes(10**2)
1000000 loops, best of 3: 1.28 us per loop
In [16]: %timeit sumrange_ctypes(10**8)
1 loops, best of 3: 381 ms per loop
In [17]: %timeit sumrange_ctypes(10**9)
1 loops, best of 3: 3.79 s per loop
In [18]: %timeit sumrange_ctypes(10**10)
1 loops, best of 3: 37.8 s per loop


Summary Summary:
10 ** 210 ** 810 ** 910 ** 10
Pure Python Method # 11.53 μs9.77 s97.8 s-
Pure Python Method 210.5 μs18.5 s--
ctypes1.28 μs381 ms3.79 s37.8 s


Infernal performance boost!

For Node.js hackers, there is the ctypes equivalent - FFI (Foreign Function Interface): github.com/rbranson/node-ffi

Also popular now: