Serve everything

Not so long ago, in a rather distant galaxy, on a provincial planet, there lived the famous descendants of monkeys, who were so lazy that they decided to invent artificial intelligence. "Why not?" they thought. "It would be nice to have an Overmind as an adviser, a 'brain' that thinks for you whenever needed; your problems would be solved quickly, and better than any living creature could ever manage..." And, without thinking about the consequences, they began to reverse-engineer their monkey brains and take the cognitive process apart brick by brick. They thought and thought and, believe it or not, came up with a model of the neuron, then mathematical learning algorithms, and then neural networks of various topologies followed. Of course, it did not work all that well: compared with natural intelligence these models had plenty of shortcomings, but for a certain range of problems they delivered acceptable accuracy. Little by little, digitized and serialized skills began to appear in the form of neural network models. Today, dear lovers of the history of the universe, we will talk about putting those skills to work.

Much has already been written on Habr about creating and training neural network models (skills), so we will not dwell on that today. Having trained or obtained serialized AI skills, we expect to use them in our target information systems, and here a problem arises: what works on a laboratory bench cannot be transferred to production in its original form. You have to bring along the entire associated technology stack, and often make significant modifications to the target platform (there are exceptions such as CoreML, but that is a special case limited to Apple hardware). Moreover, there are a great many tools for developing and serializing models; does everyone really need to build a separate integration solution for each of them? Besides, even in the laboratory you often want quick inference from a model without waiting for the entire development stack to load.
As a way of addressing these problems, I would like to tell you about a relatively new open-source tool that may prove useful in your AI-related projects.

0Mind (read "ZeroMind") is a free skill server. It is a modular, universal, easily extensible application server with elements of a framework for serving (highly available inference of) heterogeneous machine learning models. The server is written in Python 3 and uses Tornado for asynchronous request processing. Regardless of which machine learning framework was used to prepare and serialize a model, 0Mind makes it easy to use a skill or a group of skills through a unified REST API. In essence, the solution is an asynchronous web server with a REST API unified for working with AI skill models, plus a set of adapters for various machine learning frameworks. You may have worked with TensorFlow Serving; 0Mind is a similar solution, but it is not tied to the TensorFlow stack and can serve several models from different frameworks on the same port. So instead of embedding an entire model-inference technology stack into the target information system, you can use a simple and familiar REST API for the skill you need; moreover, the prepared model stays on the server and does not end up in the software distribution. To avoid getting tangled in terminology, let's move on to usage examples and start casting console spells.


Everything is simple here:

git clone 0Mind

Now we have a local copy of the server. Install the dependencies:

cd 0Mind
pip3 install -r requirements.txt

Or if you use Conda:

conda install --yes --file requirements.txt

An important caveat: the server supports several machine learning frameworks, but to avoid listing all of them as dependencies and installing them alongside the server, you decide for yourself which frameworks you will use on the host with the 0Mind instance and install and configure those tools independently.


The entry point is the server's main executable script.
It accepts the startup option -c (or --config_file) with the path to a configuration file. By default, 0Mind uses configs/model_pool_config.json as its configuration file. The server also uses the file configs/logger.json to configure Python's standard logging module.

For demonstration purposes we can leave the default configuration file intact. Read more about configuration in the official documentation.

The main server settings are: id, host, port, tasks.

id - (number) unique identifier of the model pool (used for balancing and addressing in a distributed network of pools)
host - (string) network address or domain name of the given host
port - (number) the port on which to host the 0Mind service (must be free on the given host)
tasks - (list of objects) list of tasks loaded with the service (may be empty). The default config loads the CNN_MNIST demo model prepared with Keras, which we will use to demonstrate the capabilities.

Additional (optional) configuration parameters:

model_types - (list of strings) restricts the types of models that can be loaded into this pool to the listed ones. If the list is empty, there are no restrictions.

debug - (boolean) enables or disables Tornado's debug mode. In debug mode, extended error information is returned in case of errors, which is useful when developing extensions.
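Putting these parameters together, a minimal configs/model_pool_config.json might look roughly like this. The task fields are taken from the demo task shown later in this article; the port value and the exact schema are assumptions, so check the official documentation before relying on them:

```json
{
  "id": 1,
  "host": "127.0.0.1",
  "port": 5885,
  "debug": false,
  "model_types": [],
  "tasks": [
    {
      "id": "1",
      "model_file": "ML/models/mnist_cnn_model.keras",
      "model_type": "keras",
      "input_filters": {
        "conv2d_1_input:0": ["i_img_file_to_ns_arr.ImageFileToNormAndScaledNPArrayFilter"]
      },
      "output_filters": {}
    }
  ]
}
```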


The main things to know about 0Mind are the list of supported frameworks and the REST API features.

Requests to the REST API can be made from a browser or with HTTP utilities. In this guide, as in the server documentation, we will use cURL as the simplest and most accessible tool on open systems.

At the moment, the 0Mind API has only 10 requests:

1. http://$HOST:$PORT/info - general information about the 0Mind instance
2. http://$HOST:$PORT/info/system - system information about the host running 0Mind
3. http://$HOST:$PORT/info/task - information about the specified task
4. http://$HOST:$PORT/info/tasks - list of tasks of the 0Mind instance
5. http://$HOST:$PORT/model/list - list of identifiers of the models loaded into the pool
6. http://$HOST:$PORT/model/info - interface information about a model
7. http://$HOST:$PORT/model/load - loads a new model into the pool
8. http://$HOST:$PORT/model/drop - unloads a previously loaded model from the pool
9. http://$HOST:$PORT/model/predict - requests inference from the model
10. http://$HOST:$PORT/command/stop - stops the 0Mind service and terminates its process
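For scripted clients, these endpoints can be assembled into URLs with a tiny helper. A minimal sketch; the host and port values here are placeholders, not 0Mind defaults:

```python
# Minimal sketch of a URL builder for the 0Mind REST API.
# HOST and PORT are placeholders -- substitute your instance's values.
HOST = "127.0.0.1"
PORT = 5885

ENDPOINTS = [
    "info", "info/system", "info/task", "info/tasks",
    "model/list", "model/info", "model/load",
    "model/drop", "model/predict", "command/stop",
]

def url_for(endpoint: str, host: str = HOST, port: int = PORT) -> str:
    """Build the full URL for a known 0Mind API endpoint."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown 0Mind endpoint: {endpoint}")
    return f"http://{host}:{port}/{endpoint}"

print(url_for("model/list"))  # http://127.0.0.1:5885/model/list
```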


You can start a server instance, for example, like this:


For example, let's get general information about a running server instance:

curl http://$HOST:$PORT/info


{"service": "ModelPool", "id": 1, "options": {"debug": false}, "version": [1, 1, 4]}

OK, now let's find out which models are loaded into the pool:

curl http://$HOST:$PORT/model/list


{"id": 1, "check_sum": "4d8a15e3cc35750f016ce15a43937620", "models": ["1"]}

Now let's inspect the interface of the loaded model with identifier "1":


{"inputs": {"0": {"name": "conv2d_1_input:0", "type": "float32", "shape": [null, 28, 28, 1]}}, "outputs": {"0": {"name": "dense_2/Softmax:0", "type": "float32", "shape": [null, 10]}}, "tool": "keras"}

It remains to find out which filters the model was loaded with. To do this, we request the details of the task that loaded the model with identifier "1":


{"id": "1", "model_file": "ML/models/mnist_cnn_model.keras", "model_type": "keras", "input_filters": {"conv2d_1_input:0": ["i_img_file_to_ns_arr.ImageFileToNormAndScaledNPArrayFilter"]}, "output_filters": {}}

As you can see, our model has one input filter, i_img_file_to_ns_arr.ImageFileToNormAndScaledNPArrayFilter, applied to the input named conv2d_1_input:0. This filter simply converts the specified image file into a tensor and scales it to match the model input. Filters are another useful generalized tool in 0Mind: since data preprocessing and postprocessing are often the same across models, you can accumulate a library of such filters and reuse them with other models by specifying the ones you need in the model-loading task.
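To give an idea of what such a filter does conceptually, here is a rough, hypothetical sketch of the normalize-and-scale step. This is not 0Mind's actual filter code: the real filter also decodes the image file, and the nearest-neighbour scaling and [0, 1] range used here are assumptions:

```python
def norm_and_scale(pixels, target_h=28, target_w=28):
    """Hypothetical sketch of an input filter: nearest-neighbour resize of a
    grayscale pixel grid (values 0-255) plus normalization to [0.0, 1.0]."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(target_h):
        row = []
        for x in range(target_w):
            sy = y * src_h // target_h    # nearest-neighbour source row
            sx = x * src_w // target_w    # nearest-neighbour source column
            row.append(pixels[sy][sx] / 255.0)
        out.append(row)
    # Shape the result like the model input: [batch, height, width, channels].
    return [[[[v] for v in row] for row in out]]

# A tiny 2x2 checkerboard becomes a 1x28x28x1 tensor with values in [0, 1].
tensor = norm_and_scale([[0, 255], [255, 0]])
```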

Data output from the model (inference)

Well, now we have all the information needed for inference and can request output from the model. As input we use an image from the test set included in the 0Mind distribution, samples/image5.png:

curl -d '{"conv2d_1_input:0": [{"image_file": "samples/image5.png"}]}' -H "Content-Type:application/json" -X POST

The model's only input, conv2d_1_input:0, with the i_img_file_to_ns_arr.ImageFileToNormAndScaledNPArrayFilter filter, receives data in the format accepted by that filter: [{"image_file": "samples/image5.png"}]. In response, 0Mind returns the model output:

{"result": {"dense_2/Softmax:0": [[2.190017217283827e-21, 1.6761866200587505e-11, 2.2447325167271673e-14, 0.00011080023978138342, 1.881280855367115e-17, 0.9998891353607178, 1.6690393796396863e-16, 9.67975005705668e-12, 1.1265206161566871e-13, 2.086113400079359e-13]]}, "model_time": 0.002135753631591797}

So, the model's only output, "dense_2/Softmax:0" (see the model information above), gave us the model's confidence vector over the classes of this image. As you can see, the highest probability, about 0.9999, falls on the class with index 5 (the classes are the digits 0-9), which corresponds to the digit 5. The model thus successfully recognized the handwritten digit with high confidence. The inference time on the 0Mind host was 0.002135753631591797 seconds, since inference ran on an ordinary x86 CPU.
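The mapping from the confidence vector to the predicted digit is a plain argmax; a quick check against the response above:

```python
# Confidence vector from the dense_2/Softmax:0 output above.
confidence = [2.190017217283827e-21, 1.6761866200587505e-11,
              2.2447325167271673e-14, 0.00011080023978138342,
              1.881280855367115e-17, 0.9998891353607178,
              1.6690393796396863e-16, 9.67975005705668e-12,
              1.1265206161566871e-13, 2.086113400079359e-13]

# The index of the highest confidence is the predicted class (digits 0-9).
predicted = max(range(len(confidence)), key=confidence.__getitem__)
print(predicted)  # 5
```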

Dynamic loading and unloading of models

Now unload our model from the pool:


{"result": true, "unload_time": 0.000152587890625, "memory_released": 0, "model_id": "1"}

We load the same model again, but now with a different identifier ("new") and with the output filter io_argmax.ArgMaxFilter, which returns the index of the most probable class from the model's confidence vector. We also have to change the names of the model's inputs and outputs; this is due to a peculiarity of Keras:

curl -d '{"id": "new", "output_filters": {"dense_2_1/Softmax:0": ["io_argmax.ArgMaxFilter"]}, "model_file": "ML/models/mnist_cnn_model.keras", "input_filters": {"conv2d_1_input_1:0": ["i_img_file_to_ns_arr.ImageFileToNormAndScaledNPArrayFilter"]}, "model_type": "keras"}' -H "Content-Type:application/json" -X POST

{"result": true, "load_time": 0.45618462562561035, "memory_consumed": 16183296, "model_id": "new"}

And now we ask the model to recognize two images at once in a single request, samples/image5.png and samples/image1.png:

curl -d '{"conv2d_1_input_1:0": [{"image_file": "samples/image5.png"}, {"image_file": "samples/image1.png"}]}' -H "Content-Type:application/json" -X POST

{"result": {"dense_2_1/Softmax:0": [5, 1]}, "model_time": 0.003907206535339355}

Once again, the demo model made no mistakes.
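For clients that script such requests, the body of a batch /model/predict call is just a list of filter-format objects under each input name. A sketch of building it programmatically; the helper name is mine, and the input name matches the reloaded "new" model:

```python
import json

def build_predict_body(input_name, image_files):
    """Build the JSON body for a batch /model/predict request: one
    filter-format object ({"image_file": path}) per image, listed
    under the model's input name."""
    return json.dumps({input_name: [{"image_file": f} for f in image_files]})

body = build_predict_body("conv2d_1_input_1:0",
                          ["samples/image5.png", "samples/image1.png"])
print(body)
```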


Extending 0Mind is straightforward thanks to its modular architecture, the use of popular tools, and the project's good code conventions. The main extension points are:

  1. Adapters - shim classes for integrating new machine learning and neural network frameworks.
  2. Filters - data handlers for the inputs and outputs of skill models.
  3. Request handlers - add new functionality to 0Mind API requests and responses.
