How I switched from Python to Julia (and why)

A little background about Python

Python is a great language. I tried several languages before it: Pascal at school; C, C with classes, and C++ at university. The last two (three) instilled in me a persistent aversion to programming: instead of solving the problem, you are busy with allocations and destructors (scary words from the past), thinking in terms of low-level primitives. My opinion is that C is not suitable for solving educational and scientific problems (at least in the field of mathematics). I am sure some will object, but I am not trying to impose anything on anyone; I am just expressing my opinion.

Python was a revelation in its time. For the first time in my life, I was writing several levels of abstraction higher than is customary in C. The distance between the problem and the code shrank as never before.

I probably would have written in Python all my life if I hadn't suddenly had to implement the NIST statistical tests. The task seems very simple: there is an array several (>= 10) megabytes long, and there is a set of tests that need to be applied to this array.

What's good about numpy?

For working with arrays, Python has the good numpy package, which is essentially a high-level interface over fast libraries like LAPACK. It would seem the entire gentleman's set for scientific computing is available (Sympy, Numpy, Scipy, Matplotlib); what more could you ask for?

Everyone who has dealt with numpy knows that it is good, but not at everything. You have to try to make your operations vectorized (as in R), i.e. in a sense uniform over the whole array. If your task for some reason does not fit into this paradigm, you have problems.

What kind of non-vectorized tasks am I talking about? Take the same NIST, for instance: compute the length of the linear feedback shift register using the Berlekamp-Massey algorithm; compute the length of the longest run of consecutive ones, and so on. I do not rule out that there is some ingenious solution that reduces such a task to a vectorized one.
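To give a feel for why this kind of task resists vectorization, here is a minimal sketch of the Berlekamp-Massey algorithm over GF(2) in plain Python (an illustrative implementation, not the NIST reference code): every step depends on a data-dependent "discrepancy" computed from previous bits, so there is no obvious array-wide formulation.

```python
def berlekamp_massey(s):
    """Length of the shortest LFSR (linear complexity) generating bit list s."""
    n = len(s)
    c = [0] * n  # current connection polynomial
    b = [0] * n  # previous connection polynomial
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        # Discrepancy: does the current LFSR predict bit i correctly?
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:  # prediction failed: adjust the connection polynomial
            t = c[:]
            for j in range(n - (i - m)):
                if b[j]:
                    c[j + i - m] ^= 1
            if L <= i // 2:
                L, m, b = i + 1 - L, i, t
    return L
```

For example, the all-ones sequence has linear complexity 1, and the alternating sequence 0,1,0,1,... has complexity 2.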

An example from the same NIST: I needed to count the number of "switches" in a sequence, where by a "switch" I mean a change from a run of ones (...1111...) to a run of zeros (...000...), and vice versa. To do this, you can take the original sequence without its last element (x[:-1]), subtract from it the sequence shifted by one (x[1:]), and then count the nonzero elements of the resulting sequence. All together it looks like:

np.count_nonzero(x[:-1] - x[1:])

This may look like an entertaining mental exercise, but in essence something unnatural is happening here, a trick that will be non-obvious to the reader after a fairly short time. Not to mention that it is slow anyway: nobody has repealed memory allocation.
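For comparison, a straightforward loop version of the same count (a sketch, assuming x is a 0/1 array): it states the intent directly and allocates no temporary array, but in pure Python it runs far slower than the numpy one-liner.

```python
import numpy as np

def switches_loop(x):
    # Count positions where the value differs from its predecessor.
    count = 0
    for i in range(1, len(x)):
        if x[i] != x[i - 1]:
            count += 1
    return count

def switches_vectorized(x):
    # The numpy trick from above: fast, but allocates a temporary array.
    return np.count_nonzero(x[:-1] - x[1:])
```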

Non-vectorized operations in Python are slow. How do you deal with them if numpy is not enough? Several solutions are usually offered:

  1. Numba jit. If it worked as described on Numba's main page, I think that would be the end of the story. After all this time I have somewhat forgotten what exactly went wrong with it; the bottom line is that the speedup was not as impressive as I expected, alas.
  2. Cython. OK, raise your hand if you believe that Cython is a really beautiful, elegant solution that does not destroy the very meaning and spirit of Python. I don't think so; if you have gotten as far as Cython, you can stop deceiving yourself and move on to something less sophisticated like C++ or C.
  3. Rewrite the bottlenecks in C and call them from your favorite Python. OK, but what if my whole program is solid computation and bottlenecks? Don't suggest C! Not to mention that with this solution you need to know not one but two languages well: Python and C.

Here comes Julia!

Having struggled with the existing solutions and not finding (or failing to program) anything good, I decided to rewrite it in something "faster". In fact, if you are writing number-crunching code in the 21st century and want normal array support, vectorized operations out of the box, and so on, the choice is not particularly wide:

  1. Fortran. And don't laugh: who among us hasn't called BLAS/LAPACK from their favorite language? Fortran is a really good (better than C!) language for SCIENTIFIC computation. I didn't take it because since Fortran's heyday a lot of things have been invented and added to programming languages; I was hoping for something more modern.
  2. R suffers from the same problems as Python (vectorization).
  3. Matlab - probably yes; unfortunately, I have no money to check.
  4. Julia is a dark horse: will it take off or won't it? (A natural question until the stable 1.0 release.)

Some obvious advantages of Julia

  1. It looks like Python, at least it is just as "high-level", with the ability, if necessary, to go down to the bits in the numbers.
  2. There is no messing around with memory allocations and things like that.
  3. A powerful type system. Type annotations are optional and applied very sparingly. You can write a program without specifying a single type, and if you do it RIGHT, the program will be fast. But there are nuances.
  4. It's easy to write custom types that are as fast as built-in types.
  5. Multiple dispatch. You can talk about it for hours, but in my opinion it is the best thing about Julia: it is the basis for the design of all programs and, in general, the basis of the language's philosophy.
    Thanks to multiple dispatch, many things are written very uniformly.

    Examples with multiple dispatch
    rand()        # random number between 0 and 1
    rand(10)      # array of 10 random numbers between 0 and 1
    rand(3, 5)    # 3 x 5 matrix of random numbers between 0 and 1

    using Distributions
    d = Normal()  # normal distribution with parameters 0, 1
    rand(d)       # one normally distributed random number
    rand(d, 10)   # array of length 10 ... and so on

    That is, the first argument can be any (univariate) distribution from Distributions, and the function call remains literally the same. And not only a distribution (you can pass the RNG itself as a MersenneTwister object, and much more). Another (in my view, illustrative) example is navigation in DataFrames without loc/iloc.
  6. Arrays are native, built in. Vectorization out of the box.
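For readers coming from Python: the closest standard-library analogue is functools.singledispatch, which dispatches on the type of the first argument only. A minimal sketch (the Normal class and rand function here are illustrative stand-ins, not the Distributions.jl API) shows the flavor of the rand(...) examples above:

```python
import random
from functools import singledispatch

class Normal:
    """Illustrative stand-in for a distribution object."""
    def __init__(self, mu=0.0, sigma=1.0):
        self.mu, self.sigma = mu, sigma

@singledispatch
def rand(spec):
    raise TypeError(f"cannot sample from {spec!r}")

@rand.register
def _(spec: int):
    # rand(10) -> list of 10 uniform numbers in [0, 1)
    return [random.random() for _ in range(spec)]

@rand.register
def _(spec: Normal):
    # rand(Normal()) -> one normally distributed number
    return random.gauss(spec.mu, spec.sigma)
```

Unlike Julia, this dispatches only on the first argument's type; true multiple dispatch considers the types of all arguments at once.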

Cons that it would be a crime to keep silent about!

  1. A new language. On the one hand, a minus, of course; in some ways it is perhaps immature. On the other hand, it has learned from the mistakes of many past languages, stands "on the shoulders of giants", and has absorbed many interesting new things.
  2. You are unlikely to write fast programs right away. There are some non-obvious pitfalls that are easy enough to deal with: you step on a rake, ask the community for help, step on it again, and so on. Mostly it is type instability, more type instability, and global variables. As far as I can judge from my own experience, a programmer who wants to write fast Julia goes through several stages: a) writing as in Python. This is fine and possible, but sometimes there will be performance problems; b) writing as in C: manically annotating types wherever possible; c) gradually understanding where types are needed (in very measured doses) and where they get in the way.
  3. The ecosystem. Some packages are raw, in the sense that something is constantly breaking somewhere; some are mature but not compatible with each other (conversions to other types are needed, for example). On the one hand, this is obviously bad; on the other hand, Julia has many interesting and bold ideas precisely because "we stand on the shoulders of giants": a tremendous amount of experience in stepping on rakes, of "how not to do it", has been accumulated, and the package developers (partially) take it into account.
  4. Arrays are numbered from 1. Ha, kidding, this is of course a plus! No, seriously, what does it matter which index the numbering starts from? You get used to it in 5 minutes. Nobody complains that integers are called "integer" and not "whole". These are all matters of taste. And besides, take the same Cormen on algorithms: everything there is numbered from one, and sometimes it is inconvenient to rework it in Python.

Well, what about the speed?

It's time to remember what all this was started for.

Note: I have safely forgotten Python by now, so write your improvements in the comments; I will try to run them on my laptop and look at the execution times.

So, two small examples with microbenchmarks.

Something vectorized
Task: the function receives a vector X of 0s and 1s. Transform it into a vector of 1s and (-1)s (1 -> 1, 0 -> -1) and count how many coefficients of the Fourier transform of this vector exceed the boundary in absolute value.

def process_fft(x, boundary):
    return np.count_nonzero(np.abs(np.fft.fft(2*x - 1)) > boundary)
%timeit process_fft(x, 2000)
84.1 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

using FFTW  # fft lives in the FFTW package as of Julia 1.0

function process_fft(x, boundary)
    count(t -> t > boundary, abs.(fft(2 .* x .- 1)))
end
@benchmark process_fft(x, 2000)
  memory estimate:  106.81 MiB
  allocs estimate:  61
  --------------
  minimum time:     80.233 ms (4.75% GC)
  median time:      80.746 ms (4.70% GC)
  mean time:        85.000 ms (8.67% GC)
  maximum time:     205.831 ms (52.27% GC)
  samples:          59
  evals/sample:     1

Nothing surprising to see here: nobody really does the computation themselves anyway; everyone hands it off to well-optimized Fortran routines.

Something non-vectorized
Input: an array of 0s and 1s. Find the length of the longest run of consecutive ones.

def longest(x):
    maxLen = 0
    currLen = 0
    # This will count the number of ones in the block
    for bit in x:
        if bit == 1:
            currLen += 1
        else:
            maxLen = max(maxLen, currLen)
            currLen = 0
    return max(maxLen, currLen)
%timeit longest(x)
599 ms ± 639 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

function longest(x)
    maxLen = 0
    currLen = 0
    # This will count the number of ones in the block
    for bit in x
        if bit == 1
            currLen += 1
        else
            maxLen = max(maxLen, currLen)
            currLen = 0
        end
    end
    return max(maxLen, currLen)
end
@benchmark longest(x)
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.094 ms (0.00% GC)
  median time:      9.248 ms (0.00% GC)
  mean time:        9.319 ms (0.00% GC)
  maximum time:     12.177 ms (0.00% GC)
  samples:          536
  evals/sample:     1

The difference is obvious to the naked eye. Suggestions for tuning and/or vectorizing the numpy version are welcome. I also want to point out that the programs are almost identical. For this example I did not annotate a single type in Julia (compare with the previous one); everything runs fast anyway.
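As a starting point for such suggestions, here is one possible way to vectorize the longest-run computation in numpy (a sketch of the standard run-boundary trick, not taken from the original program). Note that it still allocates several temporary arrays, which illustrates the earlier point about hidden allocation:

```python
import numpy as np

def longest_vectorized(x):
    # Pad with zeros so every run of ones has both a start and an end boundary.
    y = np.concatenate(([0], x, [0]))
    d = np.diff(y)
    starts = np.flatnonzero(d == 1)   # indices where a run of ones begins
    ends = np.flatnonzero(d == -1)    # indices just past where a run ends
    return int((ends - starts).max()) if starts.size else 0
```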

I also want to note that the versions presented here did not go into the final program; they were optimized further. They are given as examples, without unnecessary complications (preallocating extra memory in Julia to do rfft in place, and so on).

What happened in the end?

I will not show the final code for Python or Julia: it is a secret (at least for now). At the moment I stopped polishing it, the Python version ran all the NIST tests on an array of 16 megabytes of binary digits in ~50 seconds. The Julia version currently runs all the tests on the same volume in ~20 seconds. It may well be that I am a fool and there was room for optimization, etc. But this is me as I am, with all my advantages and disadvantages, and in my opinion one should compare not the spherical-in-a-vacuum speed of programming languages, but how fast I personally can program in them (without any really gross errors).

Why did I write all this?

People here got interested, and I decided to write it up when the time came. In the future I am thinking of going through Julia more thoroughly, with examples of solving some typical problems. What for? The more languages, good and different, the better; let everyone find something that suits them personally.
