beduin01 February 22, 2016 at 13:04

D std.ndslice as Python Numpy replacement

Recovery mode

Foreword: I have been writing in Python for over 6 years and can call myself a professional in this language. Recently, I even wrote a book about him . However, for the last 8 months I switched to D and for 4 months I have been actively involved in the development of this language in terms of expanding the standard Phobos library. I also participated in the code review of the std.ndslice module, which will be discussed.

std.ndslice, like Numpy, is designed to work with multidimensional arrays. However, unlike Numpy, ndslice has an extremely low overhead because it is based on ranges (ranges) that are used everywhere in the regular library. Ranges allow you to avoid unnecessary copying procedures, as well as beautifully organize lazy calculations.

In this article, I would like to talk about the advantages that std.ndslice gives compared to Numpy.

Why should Python programmers look towards D?

Because D allows you to write almost the same simple and clear code as Python, while this code will work much faster than Python code.

The following example creates a range of numbers from 0 to 999 using a function iota(an analog in Python is called xrange) and returns an array of dimension 5x5x40.

import std.range : iota;
import std.experimental.ndslice;
void main() {
    auto slice = sliced(iota(1000), 5, 5, 40);
}

Although D is a statically typed language, and an explicit type indication increases the readability of the code, so Python programmers will find it easier to use automatic typing using a keyword auto.

The function slicedreturns a multidimensional slice. At the input, it can accept both simple arrays and ranges. As a result, we got a 5x5x40 cube consisting of numbers from 0 to 999.

A few words about what Ranges is. It is more correct to translate them into Russian as ranges. Ranges allow you to describe the rules for enumerating data of any type of data, whether it be a class or structure. It’s enough for you to implement the function:, frontreturning the first element of the range popFront,, moving to the next element andemptyreturns a boolean value indicating that the iterated sequence is empty. Rangesallow you to perform lazy busting by accessing data as needed. The Ranges concept is described in more detail here .

Please note that no empty memory allocations! This is because it iotaalso allows you to generate lazy ranges, and slicedin lazy mode it also takes data from iotaand processes it as it arrives.

As you can see std.ndslice works a little differently than Numpy. Numpy creates its own type for data, while std.ndslice provides a way to manipulate existing data. This allows you to use the same data throughout your program without wasting resources on useless memory allocation! It’s easy to guess that this has a very serious effect on the performance of your solutions.

Let's look at the following example. In it, we will receive data from stdin, filter only unique rows, sort them, and then return to stdout.

import std.stdio;
import std.array;
import std.algorithm;
void main() {
    stdin
        // get stdin as a range
        .byLine(KeepTerminator.yes)
        .uniq
        // stdin is immutable, so we need a copy
        .map!(a => a.idup)
        .array
        .sort
        // stdout.lockingTextWriter() is an output range, meaning values can be
        // inserted into to it, which in this case will be sent to stdout
        .copy(stdout.lockingTextWriter());
}

If you want to deal with lazy range generation in more detail, I recommend reading this article.

Since it slicehas three dimensions, it is a range that returns a range of ranges. This is clearly seen in the following example:

import std.range : iota;
import std.stdio : writeln;
import std.experimental.ndslice;
void main() {
    auto slice = sliced(iota(1000), 5, 5, 40);
    foreach (item; slice) {
        writeln(item);
    }
}

The result of his work will be as follows (ellipses for short):

[[0, 1, ... 38, 39], [40, 41, ... 78, 79], [80, 81, ... 118, 119], [120, 121, ... 158, 159], [160, 161, ... 198, 199]]
...
[[800, 801, ... 838, 839], [840, 841, ... 878, 879], [880, 881, ... 918, 919], [920, 921, ... 958, 959], [960, 961, ... 998, 999]]

The loop foreachworks almost the same as forin Python. However, in D, you can use it both in the C style and in the manner of loops in Python, but without trouble with enumerateor xrange.

Using UFCS (Uniform Function Call Syntax) you can rewrite the code in the following manner:

import std.range : iota;
import std.experimental.ndslice;
void main() {
    auto slice = 1000.iota.sliced(5, 5, 40);
}

UFCS allows you to record method calls in a chain and write:

a.func(b)

instead:

func(a, b)

Let's now generate an empty project using the dub package manager . We will write dub initin the command and in the file \source\app.d:

import std.experimental.ndslice;
void main() {
}

Currently std.experimental.ndslice;located in the section std.experimental. This does not mean that it is raw. This means that he needs to insist.

Now we will assemble the project with the command:

dub

The D ndslice module is very similar to Numpy:

a = numpy.arange(1000).reshape((5, 5, 40))
b = a[2:-1, 1, 10:20]

Equivalently:

auto a = 1000.iota.sliced(5, 5, 40);
auto b = a[2 .. $, 1, 10 .. 20];

Now let's create a two-dimensional array and get the middle of each column.

Python:

import numpy
data = numpy.arange(100000).reshape((100, 1000))
means = numpy.mean(data, axis=0)

import std.range;
import std.algorithm.iteration;
import std.experimental.ndslice;
import std.array : array;
void main() {
    auto means = 100_000.iota
        .sliced(100, 1000)
        .transposed
        .map!(r => sum(r) / r.length)
        .array;
}

In order for this code to not work in lazy mode, I had to call a method array. However, in a real application, the result will not be calculated while it is used in another part of the program.

There are currently no built-in statistics modules in Phobos . Therefore, the example uses a simple lambda to find the average value. The function map!has an exclamation mark at the end. This means that this is a boilerplate function. It allows you to generate code at the compilation stage based on the expression specified in its body. Here is a good article on how to work themselves into the D templates .

Although the code in D turned out to be a little more verbose than in Python, but thanksmap!the code will work with any input that is a range. While Python code will only work with special arrays from Numpy.

Here I must say that after this test it turned out that Python lost D at times and after much discussion at Hacker News, I realized that I made a mistake and the comparison was not entirely correct. iotadynamically creates the data that the function takes sliced. And accordingly, we do not touch the memory until the moment of its last relocation. D also returns an array with a data type longwhile Numpy from double. As a result, I rewrote the code and brought the value of the array to 1000 000 instead of 10 000. Here's what happened:

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;
enum test_count = 10_000;
double[] means;
int[] data;
void f0() {
    means = data
        .sliced(100, 10000)
        .transposed
        .map!(r => sum(r, 0L) / cast(double) r.length)
        .array;
}
void main() {
    data = 1_000_000.iota.array;
    auto r = benchmark!(f0)(test_count);
    auto f0Result = to!Duration(r[0] / test_count);
    f0Result.writeln;
}

Tests conducted on a 2015 MacBook Pro with a 2.9 GHz Intel Core Broadwell i5. In Python, I used a function %timeitin D to measure speed std.datetime.benchmark. I compiled with the help of all LDC v0.17 with the following keys: ldmd2 -release -inline -boundscheck=off -O. Or if you use dub then options will be analogous to these keys dub --build=release-nobounds --compiler=ldmd2.

Here are the results of the first test:

Python: 145 µs
LDC:      5 µs
D is 29x faster

Here is the test of the corrected version:

Python: 1.43 msec
LDC:  628    μs
D is 2.27x faster

Agree that it’s not a bad difference considering that Numpy is written in C, and in D everyone scolds the garbage collector.

How does D avoid Numpy problems?

Yes, Numpy is fast, but fast only when compared to simple Python arrays. But the problem is that it is only partially compatible with these simple arrays.

The Numpy library is somewhere on the side of the rest of Python. She lives her life. It uses its own functions, it works with its types. For example, if we need to use an array created in Python in NumPy, we will need to use one np.asarraythat will copy it into a new variable. A quick github search revealed that thousands of projects use this crutch. While the data could be simply transferred from one function to another without these empty copies.

import numpy as np
a = [[0.2,0.5,0.3], [0.2,0.5,0.3]]
p = np.asarray(a)
y = np.asarray([0,1])

They are trying to solve this problem by rewriting part of the Python standard library to use Numpy types. However, this is still not a complete solution, which leads to wonderful jokes when writing:

sum(a)

instead:

a.sum()

we get a 10x drop in speed.

D simply does not have such problems by design. It is a compiled, statically typed language. During code generation, all types of variables are known. In std.ndslice itself, you get full access to all the functions of the Phobos library, for example, such wonderful things as std.algorithm and std.range. Oh yes, you can use the D templates to generate code right at the compilation stage.

Here is an example:

import std.range : iota;
import std.algorithm.iteration : sum;
import std.experimental.ndslice;
void main() {
    auto slice = 1000.iota.sliced(5, 5, 40);
    auto result = slice
        // sum expects an input range of numerical values, so to get one
        // we call std.experimental.ndslice.byElement to get the unwound
        // range
        .byElement
        .sum;
}

You take and just use the function sumand it just takes and works, just like any other function from the base library.

In Python itself, in order to get a list of defined lengths initialized with a specific value, we need to write:

a = [0] * 1000

Numpy will be completely different:

a = numpy.zeros((1000))

And if you do not use Numpy, then your code will be 4 times slower than the unnecessary copy operation that eats memory. Range comes to our aid in D, which allows us to do the same operation quickly and without empty copy operations:

auto a = repeat(0, 1000).array;

And if necessary, we can immediately call ndslice:

auto a = repeat(0, 1000).array.sliced(5, 5, 40);

The main advantage of Numpy at present is its prevalence. Now it is used really very widely from banking systems to machine learning. There are a lot of books, examples and articles on it. However, the mathematical possibilities in D will obviously be expanded very soon. So the author of ndslice said that he is currently working on BLAS (Basic Linear Algebra Subprograms) for Phobos, which will also be fully integrated with ndslice and the standard library.

A powerful mathematical subsystem will allow you to very effectively solve a number of problems where you need to work with big data. For example, computer vision systems. A prototype of one of these systems is already being developed and is called DCV .

Image Processing on D

The following example will show how the median filter works to eliminate noise in the image. The function movingWindowByChannelcan also be used in other filters where the use of a sliding window is required. movingWindowByChannelallows you to navigate through the image using a sliding window. Each such window is passed to a filter that calculates pixel values based on the selected zone.

This function does not process areas with partial overlap. However, with its help, you can calculate the values in them too. To do this, create an enlarged image with the edges reflecting the borders of the original image and then process it.

/**
Params:
    filter = unary function. Dimension window 2D is the argument.
    image = image dimensions `(h, w, c)`,
        where с is the number of channels in the image
    nr = number of rows in the window
    nс = number of columns in the window
Returns:
    image dimensions `(h - nr + 1, w - nc + 1, c)`,
        where с is the number of channels in the image.
        Dense data layout is guaranteed.
*/
Slice!(3, C*) movingWindowByChannel(alias filter, C)
(Slice!(3, C*) image, size_t nr, size_t nc)
{
    // local imports in D work much like Python's local imports,
    // meaning if your code never runs this function, these will
    // never be imported because this function wasn't compiled
    import std.algorithm.iteration: map;
    import std.array: array;
    // 0. 3D
    // The last dimension represents the color channel.
    auto wnds = image
        // 1. 2D composed of 1D
        // Packs the last dimension.
        .pack!1
        // 2. 2D composed of 2D composed of 1D
        // Splits image into overlapping windows.
        .windows(nr, nc)
        // 3. 5D
        // Unpacks the windows.
        .unpack
        // 4. 5D
        // Brings the color channel dimension to the third position.
        .transposed!(0, 1, 4)
        // 5. 3D Composed of 2D
        // Packs the last two dimensions.
        .pack!2;
    return wnds
        // 6. Range composed of 2D
        // Gathers all windows in the range.
        .byElement
        // 7. Range composed of pixels
        // 2D to pixel lazy conversion.
        .map!filter
        // 8. `C[]`
        // The only memory allocation in this function.
        .array
        // 9. 3D
        // Returns slice with corresponding shape.
        .sliced(wnds.shape);
}

The following function is an example of how you can calculate the median of an object. The function is greatly simplified in order to increase readability.

/**
Params:
    r = input range
    buf = buffer with length no less than the number of elements in `r`
Returns:
    median value over the range `r`
*/
T median(Range, T)(Range r, T[] buf)
{
    import std.algorithm.sorting: sort;
    size_t n;
    foreach (e; r) {
        buf[n++] = e;
    }
    buf[0 .. n].sort();
    immutable m = n >> 1;
    return n & 1 ? buf[m] : cast(T)((buf[m - 1] + buf[m]) / 2);
}

Well, now Main itself:

void main(string[] args)
{
    import std.conv: to;
    import std.getopt: getopt, defaultGetoptPrinter;
    import std.path: stripExtension;
    // In D, getopt is part of the standard library
    uint nr, nc, def = 3;
    auto helpInformation = args.getopt(
        "nr", "number of rows in window, default value is " ~ def.to!string, &nr,
        "nc", "number of columns in window, default value is equal to nr", &nc);
    if (helpInformation.helpWanted)
    {
        defaultGetoptPrinter(
            "Usage: median-filter [] []\noptions:",
            helpInformation.options);
        return;
    }
    if (!nr) nr = def;
    if (!nc) nc = nr;
    auto buf = new ubyte[nr * nc];
    foreach (name; args[1 .. $])
    {
        import imageformats; // can be found at code.dlang.org
        IFImage image = read_image(name);
        auto ret = image.pixels
            .sliced(cast(size_t)image.h, cast(size_t)image.w, cast(size_t)image.c)
            .movingWindowByChannel
                !(window => median(window.byElement, buf))
                 (nr, nc);
        write_image(
            name.stripExtension ~ "_filtered.png",
            ret.length!1,
            ret.length!0,
            (&ret[0, 0, 0])[0 .. ret.elementsCount]);
    }
}

If not all the examples are shown you clear recommend that you read the free version of the wonderful book Programming in D .

PS If you know how to translate this publication into the status of "translations", then write in private.

Tags:

D std.ndslice as Python Numpy replacement

Why should Python programmers look towards D?

How does D avoid Numpy problems?

Image Processing on D

Also popular now: