Performance in Python. Easy way
- Transfer
- Tutorial
I always knew that one of the advantages of Python is the ability to rewrite the most inhibitory pieces of C code and increase the speed of the program to unattainable interpreted language heights. But I have never tried it myself, because thought it was too complicated. After reading this article I don’t think so anymore.
Programmers familiar with ctypes are unlikely to find something interesting here, but for beginners, I ask for cat.
Ctypes is a Python mechanism for importing functions from external libraries.
% timeit - IPython shell magic function that measures the execution time of an expression in Python
Ctypes is great! Let's start with a small banal example: summing numbers in a certain range.
Here is a Python implementation of this function
Excellent! But what if we try to summarize a really large range of numbers, for example from 0 to 10 ** 8 (i.e. 100,000,000)
Not so much fun anymore. Let's try something else:
What will come of it?
Wow ... It's even worse ... This time I won’t even try 10 ** 9.
So how do we speed up execution? Just don’t offer mathematical optimizations ... we are in the new world of computers! ( in the original: don't suggest math tricks ... this is the new world of computing! )
Yes, I know that the complexity of the algorithm is a constant and does not depend on the value of the argument, n * (n + 1) / 2. But the article is not devoted to this.
What about ctypes?
Save it with the name sumrange.c and compile it (we will not use optimizations for the purity of the experiment):
Import into Python what happened:
And the Oscar gets ...
Summary Summary:
Infernal performance boost!
For Node.js hackers, there is the ctypes equivalent - FFI (Foreign Function Interface): github.com/rbranson/node-ffi
Programmers familiar with ctypes are unlikely to find something interesting here, but for beginners, I ask for cat.
Ctypes is a Python mechanism for importing functions from external libraries.
% timeit - IPython shell magic function that measures the execution time of an expression in Python
Ctypes is great! Let's start with a small banal example: summing numbers in a certain range.
Here is a Python implementation of this function
def sumrange(arg):
return sum(xrange(arg))
Excellent! But what if we try to summarize a really large range of numbers, for example from 0 to 10 ** 8 (i.e. 100,000,000)
In [2]: %timeit sumrange(10**2)
1000000 loops, best of 3: 1.53 us per loop
In [3]: %timeit sumrange(10**8)
1 loops, best of 3: 9.77 s per loop
In [4]: %timeit sumrange(10**9)
1 loops, best of 3: 97.8 s per loop
Not so much fun anymore. Let's try something else:
def sumrange2(arg):
x = i = 0
while i < arg:
x += i
i += 1
return x
What will come of it?
In [10]: %timeit sumrange2(10**2)
100000 loops, best of 3: 10.5 us per loop
In [11]: %timeit sumrange2(10**8)
1 loops, best of 3: 18.5 s per loop
Wow ... It's even worse ... This time I won’t even try 10 ** 9.
So how do we speed up execution? Just don’t offer mathematical optimizations ... we are in the new world of computers! ( in the original: don't suggest math tricks ... this is the new world of computing! )
Yes, I know that the complexity of the algorithm is a constant and does not depend on the value of the argument, n * (n + 1) / 2. But the article is not devoted to this.
What about ctypes?
#include
unsigned long long sumrange(unsigned long long arg)
{
unsigned long long i, x;
x = 0;
for (i = 0; i < arg; i++) {
x = x + i;
}
return x;
} Save it with the name sumrange.c and compile it (we will not use optimizations for the purity of the experiment):
$ gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.cImport into Python what happened:
import ctypes
sumrange_ctypes = ctypes.CDLL('./sumrange.so').sumrange
sumrange_ctypes.restype = ctypes.c_ulonglong
sumrange_ctypes.argtypes = ctypes.c_ulonglong,
And the Oscar gets ...
In [15]: %timeit sumrange_ctypes(10**2)
1000000 loops, best of 3: 1.28 us per loop
In [16]: %timeit sumrange_ctypes(10**8)
1 loops, best of 3: 381 ms per loop
In [17]: %timeit sumrange_ctypes(10**9)
1 loops, best of 3: 3.79 s per loop
In [18]: %timeit sumrange_ctypes(10**10)
1 loops, best of 3: 37.8 s per loop
Summary Summary:
| 10 ** 2 | 10 ** 8 | 10 ** 9 | 10 ** 10 | |
|---|---|---|---|---|
| Pure Python Method # 1 | 1.53 μs | 9.77 s | 97.8 s | - |
| Pure Python Method 2 | 10.5 μs | 18.5 s | - | - |
| ctypes | 1.28 μs | 381 ms | 3.79 s | 37.8 s |
Infernal performance boost!
For Node.js hackers, there is the ctypes equivalent - FFI (Foreign Function Interface): github.com/rbranson/node-ffi