Go for big data

Original author: Daniel W.

In this post, we’ll talk about using the Intel Data Analytics Acceleration Library (Intel DAAL) with the Go programming language for batch, online, and distributed processing.

Some of the most advanced infrastructure projects are built with Go, including Kubernetes, Docker, Consul, etcd, and many others. Go is becoming the preferred language for DevOps, web servers, and microservices. It is easy to learn, easy to deploy, very fast, and comes with an excellent set of development tools.

Data processing and analysis play a growing role in business, so resource-intensive computational algorithms need to be implemented at every level of a company's infrastructure, including the levels where Go is used. A natural question arises: how do you integrate machine learning, distributed data transformation, and online data analysis into Go-based systems?

One way to process data in Go reliably, quickly, and at scale is to use the Intel Data Analytics Acceleration Library (Intel DAAL) in Go programs. This library provides batch, online, and distributed processing algorithms for a range of useful tasks.



Because Go works well with C/C++, you can bring this functionality into Go programs without too much difficulty. We also gain significantly in speed, because these libraries are already optimized for Intel architecture. As shown here, in certain operations, such as principal component analysis, Intel DAAL can run seven times faster than Spark with MLlib. Very nice! It would be very useful to have that kind of power in Go applications.

Install Intel DAAL


Intel DAAL is available as open source; follow these instructions to install it. On my Linux computer, it was incredibly simple.

  1. Download the source code.
  2. Run the installation script.
  3. Set the necessary environment variables (you can also use the provided shell script for this).

Before integrating Intel DAAL into any Go program, it makes sense to check that everything works correctly. You can use the various getting started guides in the Intel DAAL documentation for this. In particular, these guides provide an example Intel DAAL application for the Cholesky decomposition algorithm. Below we will try to recreate it in Go. The original C++ example of the Cholesky decomposition algorithm looks like this.

/****************************************************************************
!  Copyright(C) 2014-2017 Intel Corporation. All Rights Reserved.
!
!  The source code, information and material ("Material") contained herein is
!  owned by Intel Corporation or its suppliers or licensors, and title to such
!  Material remains with Intel Corporation or its suppliers or licensors. The
!  Material contains proprietary information of Intel or its suppliers and
!  licensors. The Material is protected by worldwide copyright laws and treaty
!  provisions. No part of the Material may be used, copied, reproduced,
!  modified, published, uploaded, posted, transmitted, distributed or disclosed
!  in any way without Intel's prior express written permission. No license
!  under any patent, copyright or other intellectual property rights in the
!  Material is granted to or conferred upon you, either expressly, by
!  implication, inducement, estoppel or otherwise. Any license under such
!  intellectual property rights must be express and approved by Intel in
!  writing.
!
!  *Third Party trademarks are the property of their respective owners.
!
!  Unless otherwise agreed by Intel in writing, you may not remove or alter
!  this notice or any other notice embedded in Materials by Intel or Intel's
!  suppliers or licensors in any way.
!
!****************************************************************************
!  Content:
!    Cholesky decomposition sample program.
!***************************************************************************/
#include "daal.h"
#include <iostream>
using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace daal::services;
const size_t dimension = 3;
double inputArray[dimension *dimension] =
{
    1.0, 2.0, 4.0,
    2.0, 13.0, 23.0,
    4.0, 23.0, 77.0
};
int main(int argc, char *argv[])
{
    /* Create input numeric table from array */
    SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));
    /* Create the algorithm object for computation of the Cholesky decomposition using the default method */
    cholesky::Batch<> algorithm;
    /* Set input for the algorithm */
    algorithm.input.set(cholesky::data, inputData);
    /* Compute Cholesky decomposition */
    algorithm.compute();
    /* Get pointer to Cholesky factor */
    SharedPtr<Matrix<double> > factor =
        staticPointerCast<Matrix<double>, NumericTable>(algorithm.getResult()->get(cholesky::choleskyFactor));
    /* Print the first element of the Cholesky factor */
    std::cout << "The first element of the Cholesky factor: "
              << (*factor)[0][0];
    return 0;
}

Try compiling and running this code to make sure Intel DAAL is installed correctly. It will also give you an idea of what we are going to do in Go. Any questions and problems related to installing Intel DAAL can be discussed on the Intel DAAL forum (for me personally, this forum turned out to be an extremely useful resource when I started working with Intel DAAL).

Using Intel DAAL in Go Programs


There are several possible ways to use the Intel DAAL library in Go programs.

  1. Invoke Intel DAAL directly from Go through a wrapper function.
  2. Create a reusable library with specific Intel DAAL functionality.

Below I demonstrate both of these approaches. All source code is available here. This is just one example. It would be nice if, over time, we could add other Go programs that use Intel DAAL to this repository. Feel free to experiment and send pull requests. I would be very interested to see what you create.

If you have not used Go before, I recommend getting familiar with the language before continuing with this article. Note that Go does not even need to be installed on your local computer to start learning it. You can take the online introduction to Go and use the Go Playground website, and then, when you are ready, install Go on your local computer.

Call the Intel DAAL library directly from Go


Go provides a tool called cgo that allows you to create Go packages that invoke C code. In this case, we will use cgo to let our Go program interact with the Intel DAAL library.

By the way, using cgo in Go programs comes with certain limitations, which are discussed in sufficient detail on the Internet (in particular, see the discussion by Dave Cheney or this article from Cockroach Labs). When deciding to use cgo, always take these limitations into account, or at least keep them in mind. In this case, we are willing to accept the limitations of cgo in order to take advantage of the optimized, distributed Intel DAAL library: they are more than paid back by the performance gains in cases with a high computational load or large amounts of data.

To integrate the Cholesky decomposition algorithm from Intel DAAL into a Go program, you will need to create the following folder structure (in your $GOPATH directory).

cholesky
├── cholesky.go
├── cholesky.hxx
└── cholesky.cxx


The cholesky.go file is our Go program that will use the Cholesky decomposition algorithm from the Intel DAAL library. The cholesky.cxx and cholesky.hxx files are the C++ definitions and declarations that include Intel DAAL and tell cgo which Intel DAAL functionality we will use. Let's look at each of them.

First, the *.cxx file.

#include "cholesky.hxx"
#include "daal.h"
#include <iostream>
using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace daal::services;
int choleskyDecompose(int dimension, double inputArray[]) {
    /* Create input numeric table from array */
    SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));
    /* Create the algorithm object for computation of the Cholesky decomposition using the default method */
    cholesky::Batch<> algorithm;
    /* Set input for the algorithm */
    algorithm.input.set(cholesky::data, inputData);
    /* Compute Cholesky decomposition */
    algorithm.compute();
    /* Get pointer to Cholesky factor */
    SharedPtr<Matrix<double> > factor =
        staticPointerCast<Matrix<double>, NumericTable>(algorithm.getResult()->get(cholesky::choleskyFactor));
    /* Return the first element of the Cholesky factor */
    return (*factor)[0][0];
}

Now the *.hxx file.


#ifndef CHOLESKY_H
#define CHOLESKY_H
// __cplusplus gets defined when a C++ compiler processes the file.
// extern "C" is needed so the C++ compiler exports the symbols w/out name issues.
#ifdef __cplusplus
extern "C" {
#endif
int choleskyDecompose(int dimension, double inputArray[]);
#ifdef __cplusplus
}
#endif
#endif

These files define the C++ wrapper function choleskyDecompose, which uses the Intel DAAL Cholesky decomposition algorithm to decompose the input matrix and return the first element of the Cholesky factor (as in the example in the Intel DAAL Getting Started Guide). Note that the input here is a flat array whose length is the square of the matrix dimension (i.e., a 3 x 3 matrix corresponds to an input array of length 9). You need to include extern "C" in the *.hxx file. With it, the C++ compiler "knows" that it must export the corresponding names defined in our C++ files without C++ name mangling, so that cgo can find them.

After defining the Cholesky decomposition wrapper function in the *.cxx and *.hxx files, you can call this function directly from Go. cholesky.go looks like this.

package main
// #cgo CXXFLAGS: -I$DAALINCLUDE
// #cgo LDFLAGS: -L$DAALLIB -ldaal_core -ldaal_sequential -lpthread -lm
// #include "cholesky.hxx"
import "C"
import (
	"fmt"
	"unsafe"
)
func main() {
	// Define the input matrix as an array.
	inputArray := [9]float64{
		1.0, 2.0, 4.0,
		2.0, 13.0, 23.0,
		4.0, 23.0, 77.0,
	}
	// Get the first Cholesky decomposition factor.
	data := (*C.double)(unsafe.Pointer(&inputArray[0]))
	factor := C.choleskyDecompose(3, data)
	// Output the first Cholesky decomposition factor to stdout.
	fmt.Printf("The first Cholesky decomp. factor is: %d\n", factor)
}

Let's go through this step by step to understand what is happening. First you need to tell Go to use cgo when compiling the program, and to compile with certain flags.

// #cgo CXXFLAGS: -I$DAALINCLUDE
// #cgo LDFLAGS: -L$DAALLIB -ldaal_core -ldaal_sequential -lpthread -lm
// #include "cholesky.hxx"
import "C"

The import "C" line is required: "C" is a pseudo-package that signals the use of cgo. If a comment immediately precedes the import "C" statement, that comment (called the preamble) is used as a header when compiling the C++ components of this package.

Using CXXFLAGS and LDFLAGS, you can specify the compile and link flags that cgo should use, and you can pull in the declaration of our C++ function with // #include "cholesky.hxx". To compile this example, I used Linux and gcc, which is reflected in the flags above. However, you can follow this guide to determine how to build your application with Intel DAAL.

After that, you can write Go code in the same way as for any other program, and access our wrapper function as C.choleskyDecompose().

// Define the input matrix as an array.
inputArray := [9]float64{
	1.0, 2.0, 4.0,
	2.0, 13.0, 23.0,
	4.0, 23.0, 77.0,
}
// Get the first Cholesky decomposition factor.
data := (*C.double)(unsafe.Pointer(&inputArray[0]))
factor := C.choleskyDecompose(3, data)
// Output the first Cholesky decomposition factor to stdout.
fmt.Printf("The first Cholesky decomp. factor is: %d\n", factor)

One peculiarity here, which comes from using cgo, is that you need to convert the pointer to the first element of our float64 array to an unsafe pointer, which can then be explicitly converted to a *C.double pointer (compatible with C++) for our choleskyDecompose function. Wrapping the address in an unsafe pointer lets us bypass the type safety restrictions that normally apply in Go programs.

Excellent! We now have a Go program that calls the Cholesky decomposition algorithm from the Intel DAAL library. It's time to build and run it. This can be done in the usual way with go build.

$ ls
cholesky.cxx  cholesky.go  cholesky.hxx
$ go build
$ ls
cholesky  cholesky.cxx  cholesky.go  cholesky.hxx
$ ./cholesky 
The first Cholesky decomp. factor is: 1
$ 

And there is the result! As expected, the first element of the Cholesky factor is 1. We have successfully used the Intel DAAL library directly from Go. However, our Go program looks rather odd with its unsafe pointers and C code fragments, and it is a one-off solution. Now let's try to implement the same functionality as a reusable Go package that can be imported like any other Go package.

Create a reusable Go package with Intel DAAL


To create a Go package containing Intel DAAL functionality, we will use SWIG. In addition to cgo, Go can invoke SWIG during a build to compile Go packages that implement C/C++ functionality. For such a build, you will need to create the following folder structure.

choleskylib
├── cholesky.go
├── cholesky.hxx
├── cholesky.cxx
└── cholesky.swigcxx

The *.cxx and *.hxx wrapper files can stay the same, but you now need to add a *.swigcxx file. This file looks like this.

%{
#include "cholesky.hxx"
%}
%include "cholesky.hxx"

SWIG now generates wrapper code for the Cholesky decomposition function, which lets us use this code as a Go package.

In addition, because we are creating a reusable Go package (rather than a standalone application), the *.go file does not include package main or a main function. It only has to declare the name of our package. In this case, let's call it cholesky. Now cholesky.go will look like this.

package cholesky
// #cgo CXXFLAGS: -I$DAALINCLUDE
// #cgo LDFLAGS: -L$DAALLIB -ldaal_core -ldaal_sequential -lpthread -lm
import "C"

(Again, the build flags are specified in the preamble.)

Now you can build the package and install it locally.

$ ls
cholesky.cxx  cholesky.go  cholesky.hxx  cholesky.swigcxx
$ go install
$ 

This command compiles all the binaries and libraries that a Go program using this package will need. Go "sees" that there is a *.swigcxx file in our folder and automatically uses SWIG to build the package.

Great! Now we have a Go package that uses Intel DAAL. Let's see how importing and using the package works.

package main
import (
	"fmt"
	"github.com/dwhitena/daal-go/choleskylib"
)
func main() {
	// Define the input matrix as an array.
	inputArray := [9]float64{
		1.0, 2.0, 4.0,
		2.0, 13.0, 23.0,
		4.0, 23.0, 77.0,
	}
	// Get the first Cholesky decomposition factor.
	factor := cholesky.CholeskyDecompose(3, &inputArray[0])
	// Output the first Cholesky decomposition factor to stdout.
	fmt.Printf("The first Cholesky decomp. factor is: %d\n", factor)
}

Nice! This code is much cleaner than calling Intel DAAL directly. You can import the Cholesky algorithm package like any other Go package and call the wrapped function as cholesky.CholeskyDecompose(...). In addition, all the unsafe parts are handled automatically by SWIG. Now you can simply pass the address of the first element of our original float64 array to cholesky.CholeskyDecompose(...).

This program, like any other Go program, can be compiled and run with the go build command:

$ ls
main.go
$ go build
$ ls
example  main.go
$ ./example 
The first Cholesky decomp. factor is: 1
$ 

Hurrah! Everything works. Now you can use this package in any other Go program that needs the Cholesky decomposition algorithm.

Conclusions and Resources


Using Intel DAAL, cgo, and SWIG, we were able to integrate the optimized Cholesky decomposition algorithm into Go programs. Of course, the possibilities are not limited to this one algorithm. In the same way, you can create Go programs and packages that use any of the algorithms implemented in Intel DAAL. You can build batch, online, and distributed neural networks, clustering, boosting, collaborative filtering, and other features directly in Go applications.

All the code used above is available here.

Go Programming Resources

  • Join Gophers on Slack and talk with other members of the #data-science channel who work on big data, data analysis, machine learning, and similar solutions using Go.
  • Visit the GopherData organization’s website, where users interact with developers of data management, processing, analysis, machine learning, and visualization tools for Go.
  • Follow GopherData on Twitter.
  • Use (and contribute to) the growing list of tools for Go.

DAAL Resources


About the author


Daniel (@dwhitena) holds a PhD, is an experienced data scientist, and works for Pachyderm (@pachydermIO). He develops modern distributed data pipelines that include predictive models, data visualization, statistical analysis, and more. He has spoken at conferences around the world (ODSC, Spark Summit, Datapalooza, DevFest Siberia, GopherCon, and others), teaches data science and analysis at Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and actively contributes to various open source data mining projects.
