Machine learning and mobile development

    As a rule, data scientists have only a vague idea of mobile development, and mobile developers do not do machine learning. Andrei Volodin, an engineer at Prisma AI, lives at the junction of these two worlds, and he told the hosts of the Podlodka podcast what that is like.

    Taking advantage of the moment, Stas Tsyganov (Tutu.ru) and Gleb Novik (Tinkoff Bank) first of all made it clear, once and for all, that nobody trains neural networks on mobile devices. They also established that machine learning, unfortunately, is not magic; and they discussed modern techniques like deep learning, reinforcement learning, and capsule networks.

    And since Podlodka is an audio show about mobile development, they eventually brought the conversation around to mobile and found out how all of this works on mobile devices.

    Below is the text version of this conversation; the podcast recording is here.

    About Andrei Volodin, cocos2d and Fiber2d


    GLEB: Tell us a little about yourself, please. What do you do?

    ANDREY: I'm a mobile developer, but I do very little classical iOS development; my responsibilities hardly include working with UIKit. I am the main developer of the Cocos2d game engine, which is quite popular on GitHub. At the moment I hold the position of GPU engineer at Prisma. My responsibilities include running neural networks on GPUs and working with augmented reality, in particular with ARKit.

    GLEB: Cool! The cocos2d part is particularly interesting. As far as I know, this framework appeared quite a long time ago.

    ANDREY: Yes, around 2009.

    GLEB: Have you been working on it from the very beginning?

    ANDREY: No, I became the main developer only in 2015; before that I was a core contributor. Apportable, which funded the development, went bankrupt, the people who were paid to work on it left, and I became the lead. Now I administer the forum, help new users with their problems, and the last few releases were made by me. So at the moment I am the main maintainer.

    GLEB: But is cocos2d still alive?

    ANDREY: Not really anymore, primarily because it is written in Objective-C and there is a lot of legacy in it. I still maintain my old games written with it, and other developers maintain their own legacy projects. Among current engines you may have heard of Fiber2d; that is also my project.

    Fiber2d is the first Swift game engine to be ported to Android. We shipped a game written entirely in Swift on both iOS and Android. You can read about it on GitHub as well. It is the next milestone in the evolution of the cocos2d community.

    About machine learning, in simple terms


    GLEB: Let's gradually move toward today's topic. Today we will talk about machine learning and everything around it, both related and unrelated to mobile. To begin with, let's figure out what machine learning actually is. We will try to explain it in the simplest possible terms, because not all mobile developers are familiar with it. Can you tell us what it is?

    ANDREY: By the classical definition, machine learning is the search for patterns in a dataset. A classic example is neural networks, which are very popular right now, and among them the networks used for classification. A simple example of a classification task is determining what is shown in a picture: there is some image, and we want to understand what is in it - a dog, a cat, or something else.

    Writing this with ordinary code is very hard, because it is not clear how you would even do it. So mathematical models are used instead, and these are what we usually call machine learning. The idea is that certain patterns are extracted from a large number of examples, and then, using those patterns, you can make predictions with some accuracy on new examples that were not in the original dataset. That is it, in a nutshell.

    GLEB: So training is about changing a model with the help of a training dataset?

    ANDREY: During training the model architecture, as a rule, stays fixed: you choose some architecture and train it. If we take neural networks as an example (machine learning is not limited to them), initially, roughly speaking, all the weights are zeros or some other identical values. As we feed our data into the learning framework, the weights change slightly with each new example, and in the end they settle into a trained model.
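    A minimal sketch of that weight-update loop, assuming a one-parameter linear model and plain NumPy (the toy dataset, learning rate and variable names are illustrative, not something from the conversation):

        import numpy as np

        # toy dataset: y = 2*x plus a little noise
        xs = np.linspace(0, 1, 100)
        ys = 2 * xs + np.random.normal(0, 0.05, size=xs.shape)

        w = 0.0   # the single "weight" starts at zero
        lr = 0.1  # learning rate: how far each example nudges the weight

        for x, y in zip(xs, ys):
            pred = w * x               # the model's prediction for this example
            grad = 2 * (pred - y) * x  # gradient of the squared error with respect to w
            w -= lr * grad             # shift the weight slightly after every example

        print(w)  # ends up close to 2: the pattern extracted from the data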

    STAS: And the ultimate purpose of this model is to quickly produce a result when you feed it data that was not in the training sample?

    ANDREY: Yes, but it is not only about speed. Some tasks simply could not be solved any other way - classification, for instance, is very nontrivial. Before classification networks took off, there were essentially no solutions for understanding what is shown in a picture. So in some areas it is a revolutionary technology.

    About manual labor and machine learning


    STAS: I recently explained to my grandmother what machine learning is. She initially thought that machine learning is when a machine teaches someone. I explained that it is actually the other way around: we are trying to teach the machine to perform some task.

    I pictured the problems that machine learning solves. Before machine learning took off, most of them were done by people. Moreover, it was considered not exactly low-skilled work, but not very high-tech either, so to speak - mostly fairly simple operations that a person can do. Is that a fair way to picture it?

    ANDREY: You could put it that way. In fact, that kind of work is still needed, only now it goes into preparing datasets for machine learning. Indeed, in some areas, medicine for example, machine learning makes it possible to smooth out routine tasks and ease the process somewhat. But not always. I would not say machine learning is only about easing dull work; sometimes it does fairly intellectual work.

    STAS: Can you give an example of such intellectual work?

    ANDREY: For example, our Prisma application - surely many people have used it (this is not an advertisement!). It is not that this is intellectual work in the strict sense, but people do redraw photographs as paintings, and the neural network does the same: you give it an ordinary photo and get something new. You can argue about whether the result is beautiful, but it is undeniably something a person either cannot do at all or would need a tremendous amount of time for.

    About history


    GLEB: Yes, I think that is a great example. It is probably worth turning to history for a moment. How long has this field been developing? It seems to me it goes back almost to the very beginning of programming - in any case, a very, very long time.

    ANDREY: Yes, in general most of the concepts being applied now were already developed back in the 90s. Naturally there are new algorithms today, and the quality of the old ones has improved. And although it feels as if a sudden interest in machine learning appeared out of nowhere, people have in fact been interested in it for a long time.

    Progress in the early stages was constrained by the fact that these are mostly mathematical models, and the mathematics itself had long since stabilized in terms of new discoveries.

    The current explosion is solely due to the fact that the power of the hardware around us has grown enormously, primarily thanks to the use of GPUs. Because today we can do massive parallel computation, new technologies have appeared: machine learning, cryptocurrencies, and so on.

    For the most part, the current interest and the current wave exist simply because this has become possible. These computations could have been done before, but they took catastrophically long. Now they take a reasonable amount of time, so everyone has started using them.

    About hardware


    STAS: I am taking a course right now where, among other things, I have to train all sorts of models. I train some of them on my work MacBook. Yes, in some cases you have to wait maybe five minutes, and the models are not the best - the average accuracy is around 85% - but most importantly, they work. Obviously in production you would want that percentage higher, and this might not quite cut it.

    ANDREY: Yes, such models are probably not very interesting; most likely these are the simplest kinds of predictions. In reality, our training set can weigh 90 GB, for example, and training on it can take a week. Companies like Nvidia boast that they have released a new special Tesla GPU and you can now train Inception V3 in 24 hours! That is considered a real breakthrough, because it used to take several weeks.

    The bigger the dataset and the more complex the structure of the model, the longer training takes. But the performance problem is not only that. In principle, if you really need to, you can wait a month. The problem is inference - how you then apply this neural network. During actual use it also has to show good performance.

    STAS: Because, among other things, I want everything to work on mobile devices, and work quickly.

    ANDREY: I don't think it originally developed with mobile applications in mind. The boom started around 2011, and back then these were still desktop solutions. But now the community's genuine interest is sustained by the fact that it has become possible to run networks in real time on iPhones, among other devices.

    GLEB: Stas, you said that the end result depends on how powerful your GPU is and on the system in general. Does that mean it does not work otherwise?

    ANDREY: It does work, but I am not sure a model can be trained on a low-powered machine.

    GLEB: By the way, I remember that five years ago, when the neural network boom was just starting, our professors said that everything new is just the well-forgotten old: all of this already existed in the 70s and 80s, and it would not work now if it did not work then. It seems they were wrong after all.

    ANDREY: Yes. For some tasks machine learning has really taken off. Objectively, we can say that it works.

    About deep learning


    GLEB: There is a fashionable term - deep learning. How is it different from what we have been talking about so far?

    ANDREY: I would not say there is a difference. There are simply subsets of machine learning, and there are a huge number of them. What is called deep learning is the part of machine learning commonly referred to as neural networks. It is called deep because neural networks have many layers, and the more layers there are, the deeper the network. That is where the name comes from.

    But there are other kinds of machine learning. For example, tree-based machine learning is still successfully used for face tracking, because it is much faster than neural networks. It is also used for ranking, advertising, and other things.
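    As a hedged illustration of the tree-based flavor he mentions, here is a small scikit-learn sketch on a toy dataset (nothing here comes from an actual face-tracking or ranking system):

        from sklearn.datasets import load_digits
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split

        X, y = load_digits(return_X_y=True)  # small built-in image dataset
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # an ensemble of shallow decision trees, no neural network involved
        model = GradientBoostingClassifier(n_estimators=100, max_depth=3)
        model.fit(X_train, y_train)

        print(model.score(X_test, y_test))   # accuracy on held-out examples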

    So deep learning is not something separate. It is a subset of machine learning, which itself includes many things; deep learning has simply become the most popular part of it today.

    About the theory of neural networks


    STAS: I wanted to talk a little about the theory of neural networks; I will try to keep it simple. You said they have many layers. In theory, if we have one layer and some objects lying on a plane, with one layer we can essentially divide that plane into two parts, right?

    ANDREY: No, not really.

    STAS: Then what does a large number of layers give us, in simple terms?

    ANDREY: What is a neural network? Let's break it down. It is just a mathematical function that takes a set of numbers as input and produces a set of numbers as output - that's all.

    What is inside? The most popular networks nowadays are convolutional ones, in which convolutions happen: essentially lots of matrices multiplied together with the results summed up, and these operations are performed in every layer. In addition, between the layers there is a so-called activation, which is precisely what allows neural networks to be deep.

    Since a composition of linear transformations is itself a linear transformation, if you stacked ten purely linear layers they could still be represented as a single linear layer. So that the layers do not collapse like that, certain mathematical operations are placed between them that make the function nonlinear. This is needed to increase the effective number of parameters.

    Roughly speaking, a neural network is simply a huge array of numbers that is then applied, in some way, to our data - to a picture, for example. But a picture is also just a set of numbers: a grid of pixels. When we train the network we are tuning, say, 15 million parameters (each number is a separate parameter), each of which can be nudged slightly to the left or right using certain heuristics. It is thanks to this huge number of parameters that such impressive results are obtained.

    Deep learning is needed precisely so that there are many of these parameters and everything does not collapse into a single layer.
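    A small sketch of that collapse argument, assuming plain NumPy (the shapes are arbitrary and chosen only for illustration):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.normal(size=(1, 8))    # input: just a row of numbers

        W1 = rng.normal(size=(8, 16))
        W2 = rng.normal(size=(16, 4))

        # two linear layers with no activation in between...
        two_layers = x @ W1 @ W2
        # ...are exactly the same as one linear layer with the combined matrix
        one_layer = x @ (W1 @ W2)
        print(np.allclose(two_layers, one_layer))   # True: the layers collapse

        # a nonlinearity between them (ReLU here) breaks that equivalence
        relu = lambda t: np.maximum(t, 0)
        deep = relu(x @ W1) @ W2
        print(np.allclose(deep, one_layer))         # False: now depth actually matters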

    GLEB: It seems more or less clear.

    ANDREY: Deep learning is a subset of machine learning. But for some reason a lot of hype has built up around it - especially a while ago, when it seemed you could hear about deep learning from every corner. I do not know whether that is justified or not.

    GLEB: I think that such popularity is due to the fact that it gives impressive results.

    About tasks


    STAS: With neural networks, most machine learning tasks can be solved, right?

    ANDREY: Yes.

    STAS: Let's talk then, what tasks can be solved using machine learning methods?

    ANDREY: Actually, this is a touchy subject, because you really do need to stop idealizing and romanticizing what is happening. As I said, there is no artificial intelligence here. It is a purely mathematical model, a mathematical function that multiplies things together, and so on.

    From the outside it seems that machine learning has now settled into certain categories of tasks. These are, for example, classification (the example we talked about at the very beginning), object tracking, and segmentation. The latter is used in our Sticky AI application: it cuts out the person and removes the background. There is also biomedical segmentation, where, for example, cancer cells are detected. There are generative networks, which learn to turn a set of random numbers into something new. There are style-transfer tasks and others.

    But at the moment there is no convenient platform or infrastructure for applying machine learning. Say you have some problem that you, as a human, solve easily, but as a programmer you cannot, because of its complexity and because you cannot simply write an imperative algorithm for it. At the same time, you cannot train a neural network for it either, primarily because of a lack of data. To train a network you need large datasets with many examples, and moreover highly formalized ones, described according to a certain specification, and so on. On top of that, you need an architecture for this neural network.

    That is, you first need to formalize the input data as numbers, design the architecture of the model itself, then formalize the output data as numbers and somehow interpret it. For that you need a fairly serious mathematical toolkit and a general understanding of how everything works. So it seems to me that outside specialized companies like ours, the use of neural networks currently sags a little.

    Some tasks that had not been solved before, neural networks have learned to solve very well. But it is not as if neural networks came along and solved the whole range of previously unsolved problems.

    GLEB: In what areas do you see global tasks for which neural networks are not suitable at all?

    ANDREY: It is hard to answer off the top of my head. We do run into tasks we work on where it is impossible to train a neural network. For example, the game industry is now very interested in machine learning, and there are even some networks that drive game AI. But in AAA games this is not used yet, because at this point it is still impossible to train the artificial intelligence of some abstract soldier to behave like a person, so that it looks natural. It's complicated.

    About DotA


    STAS: Have you heard that artificial intelligence is already winning at DotA?

    ANDREY: Yes, but that is still a somewhat different thing. DotA is a fairly mathematical game; it can be described formally. I do not want to offend anyone, but in essence it is the same kind of game as checkers: there are certain rules, and you just play by them.

    But there are still difficulties in creating any kind of natural behavior, related primarily to the small amount of data and the small number of engineers who know how to do it.

    For example, engineers at Google are trying to teach a 3D model of a person to walk using neural networks - just to make it move. It always looks awful; people do not walk like that.

    About TensorFlow


    STAS: You said that right now there is in fact no easy and cheap way to solve machine learning problems without understanding machine learning at all - one way or another, you have to know your stuff. I would like to ask about TensorFlow. It seems Google is trying to make it so that even people who do not really understand all of this, and who do not have a deep background, can solve some simple tasks. Tell us what TensorFlow is, and do you think that is possible?

    ANDREY: Let's take it in order. TensorFlow is actually not the easiest option of all. It is one of the most popular so-called learning frameworks - a general-purpose machine learning framework, not only for neural networks. And it is not the highest-level framework out there either: there is, for example, Keras, a higher-level abstraction over TensorFlow, where you can do the same things with much less code.

    Typical tasks are solved quite simply, in particular because GitHub is already full of examples and repositories. For example, if your company needs image search for a bookstore, you are basically fine: you go to GitHub, there are examples of how to extract features from a picture, you write a search over those features - done!
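    A hedged sketch of that "features plus search" recipe, assuming Keras with a pretrained MobileNetV2 as the feature extractor (the bookstore scenario and the file names are purely illustrative):

        import numpy as np
        from tensorflow.keras.applications import MobileNetV2
        from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
        from tensorflow.keras.preprocessing import image

        # a pretrained network used purely as a feature extractor, no training here
        extractor = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")

        def features(path):
            img = image.load_img(path, target_size=(224, 224))
            x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
            return extractor.predict(x)[0]

        # "index" the catalog: one feature vector per book cover (hypothetical files)
        catalog = {p: features(p) for p in ["cover1.jpg", "cover2.jpg", "cover3.jpg"]}

        def search(query_path):
            q = features(query_path)
            # cosine similarity against every indexed cover, best match first
            score = lambda v: np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
            return sorted(catalog, key=lambda p: score(catalog[p]), reverse=True)

        print(search("photo_of_a_cover.jpg"))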

    This is true of a large number of tasks. If you can map your problems onto problems that machine learning has already solved, or nearly solved, in typical ways, then you are in good shape. But if you have something genuinely unique, something you cannot code directly, and you also need what is commonly called artificial intelligence, then I would not say TensorFlow is the simplest path. Understanding it takes a decent amount of time.

    STAS: As I understand it, on Google's side they offer compute capacity along with the framework?

    ANDREY: Yes, they are developing very actively now, and in general they are trying to get everybody in the community up to speed on neural networks. They have a platform where, if you are willing to share your results, they will give you servers and so on. The only condition is that you have to share everything with everyone; that is, the platform is not intended for commercial use.

    About problems


    GLEB: Let's go over the problems that currently exist in machine learning. What is most pressing? Data, first of all.

    ANDREY: Yes. Recently I even saw a meme - a sad picture with the caption "When you found a cool architecture but there is no dataset to train it on." That problem really does exist.

    GLEB: OK, but say we have a large dataset on which the model trains for a week. How do you deal with that time problem? Roughly speaking, for us it is like compilation, except compilation takes at worst 15 minutes on some heavy iOS project, and usually less. Here it is the same situation, only it is weeks. If you tweak something, all that time is spent again.

    ANDREY: Two weeks is probably a reality that is already behind us; now it is a day, two, or three. But your fears are well founded. Indeed, R&D work often goes like this: they think, "Oh, let's try this!", change something, start training, and two days later find out the metric dropped by 0.5%: "OK, that didn't work, let's try this!" - and they wait again. That problem really exists, and there is no getting away from it.

    STAS: I would like to return to the story about my poor laptop, and to the idea that the quality of a model and the success of its predictions are probably related to training time. Is there such a correlation? Let me give an example. You first make sure your model more or less solves your problem - say, at 70%. Then you realize everything will work better if you add more features. The more features, the longer everything trains. So you let it run for a day, but you are fairly sure the model will work better.

    ANDREY: Yes, that kind of optimization works mostly in the early stages. When you are improving an existing model, it does not work that way: just to test some improvement we sometimes have to retrain the model from scratch. That really is a time-consuming process.

    STAS: I felt that all my examples are at the second grade level.

    About the work of developers


    GLEB: That reminds me of another well-known comic about compiling - except here, "the network is training."



    STAS: If training takes a whole day, what do the developers do during that time? You come in in the morning, make yourself a coffee, kick off the training, go home - see you tomorrow?

    ANDREY: In general, the work directly related to machine learning is not much like classical development. I barely do it myself, but I can speak from my colleagues' experience.

    First of all, these are people with a very strong mathematical background. Mostly they graduated from applied mathematics departments, not computer science. A professional trait of theirs is that they constantly read scientific papers and are constantly looking for new hypotheses.

    In reality this work leans more toward science. To program even a fairly complex neural network you need to write about 500 lines of Python, and fairly repetitive ones at that. These people usually do not think of themselves as classic coders; they do not have repositories with branches and all of that. They think in somewhat different categories, and their work is different. They often do very hardcore things, and they always have something to do while the network is training.

    This is from what I see from the outside. I'm still more a developer, I write more code and integrate the results of their work into mobile applications. But I can say with confidence that this is quite significantly different from the classical work of a programmer.

    STAS: Does that mean that ultimately the model written in Python gets rewritten into something faster, say C++, to optimize the code so that training goes faster?

    ANDREY: No, not exactly. Usually a model is trained in some learning framework, say TensorFlow, and the result is a model in the TensorFlow format. Then there are several options for how to run it.

    The first option is to run it wherever TensorFlow itself can run. To do that you need to compile the TensorFlow core - a static library of about 1 GB - put it where you want, and run it.

    Naturally, this option has limitations; on iOS, for example, it is difficult and slow. That is why there are so-called converters which, from the models of different learning frameworks - Caffe, Torch, TensorFlow, it does not matter - can extract weights that you can then apply and use in your production setup.

    This is also already a thing of the past, but literally a year and a half ago the work of developers like me looked as follows. R&D would train some network and end up with a model. They would extract the weights from it and write them down simply as buffers of real numbers. Then they handed them to the developer (me) and said, "Code up this net!" - and you wrote out all the layers, loaded the weights from the buffers, and ran the whole thing. But nobody rewrites the neural network itself in C++.

    So these are two separate stages:

    • Training, which is an abstraction - it does not matter which framework it runs on.
    • Running the trained model, which can be done in many different ways (a conversion sketch follows below).
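    As a hedged illustration of the converter route described above (one possible path, assuming a toy Keras model and a recent version of the coremltools package, not any company's actual pipeline):

        import coremltools as ct
        from tensorflow import keras

        # stage 1: training happens in whatever framework you like (toy model here)
        model = keras.Sequential([
            keras.layers.Dense(64, activation="relu", input_shape=(10,)),
            keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        # model.fit(...) would run on the training servers, not on a phone

        # stage 2: convert the trained model into a format the mobile runtime understands
        mlmodel = ct.convert(model, convert_to="neuralnetwork")
        mlmodel.save("TinyRegressor.mlmodel")   # this file ships inside the app bundle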

    About getting datasets


    GLEB: Let's get back to the current problems. We talked about the data problem and started to talk about engineers.

    STAS: Yes, tell us, how do you solve the problem of getting datasets? You are not going to sit there and label a million examples yourself, right?

    ANDREY: Unfortunately, that is pretty much exactly how it works. The only solution - no other exists, and none will for a very long time - is manual labor. Usually it looks like this: there is an army of freelancers who are given a million photos; they determine by eye, "this photo is a cat, this one is a dog"; it is recorded in some format and handed over to the companies. There are special marketplaces, for example Yandex.Toloka, where you can post a data-labeling task for a fee. Amazon also has its own marketplace, which mostly employs workers from India. It is a pretty big industry.

    Right now this is solved only by manual labor and money. To collect a dataset you need a decent budget to pay for the manual labor, and then you use the labeled data.

    STAS: That reminds me of the story about Google and their captcha, where they used their users to transcribe photos for Street View. So that comes from the same place.

    ANDREY: Yes, sometimes companies are sly about it and quietly use you as their freelance labelers.

    About the data scientist profession


    GLEB: And what about the engineers? I want to understand how the market looks in general - are there too few of them, too many, where do you find them?

    STAS: My bet is that there are not enough.

    ANDREY: Yes, there is a certain shortage, though not so much of engineers as of data scientists. There are not enough people who can do something really cool. It is a hype topic now. At HSE, for instance, all the machine learning tracks are overcrowded, everyone wants to study it, every second student does a machine learning thesis. It is everywhere - even in economics already!

    But there are not enough hardcore people; they get snapped up immediately. If you show even minimal skill, you just leave for Mountain View right away, because they are waiting for you there. That really is true.

    STAS: You just shoveled more coal into the hype train.

    ANDREY: It will probably die down sooner or later, but right now the data scientist specialty is consistently popular. If you do cool work, companies are willing to pay serious money.

    GLEB: There are certain stereotypes about what programmers are like. Are data scientists any different?

    ANDREY: I would not say they look different, but they are not like us. Programmers are mostly very meticulous: they like everything neatly organized, a perfect Git repository and all that. A data scientist is more of a mathematician; the main thing for them is to find a solution to the problem. They do not have strange things like code review or unit tests - they do not bother with any of that. For them programming is only a tool, and their main activity is more intellectual.

    STAS: I was just about to say that they do not have the cult of tools that many developers have.

    ANDREY: Yes, and they change frameworks almost every week: as soon as something new appears - Caffe2, PyTorch - they try it! They do not have this attitude of "I am a TensorFlow data scientist."

    About GPU engineers


    The main problem is actually with developers like me. It seems to me it is hard to find a company interested in a developer who writes Swift but deals not with UIKit but with hardcore low-level things. Honestly, off the top of my head I do not even know who could have such vacancies in principle.

    Obviously these are certain technology startups, but startups cannot hire everyone. In this sense I really value my job, even if it is not always the most profitable. There are few engineers who can do this, but the demand is also very small. So there is a trade-off between interesting projects and enterprise work.

    If you are interested in working on something unique, more complex than laying out screens, you should definitely look for such vacancies. I think more of them will appear over the years, because the market is growing. But I still have little idea where you could go right now.

    ANDREY: In this sense I really do value my job, because work like this is quite hard to find. GPU development, after all, is mostly needed by game companies. Being a relatively classic developer who nevertheless works on hardcore things - that labor market is very small and narrow. I do not think you can get hired straight away as a junior GPU engineer or a trainee. You first have to figure it out yourself, gain experience, and come to a company already able to solve problems on your own.

    About the hardware shortage


    STAS: Time to talk about hardware - specifically, the hardware shortage.

    GLEB: Because all the GPUs have been bought up for Bitcoin mining, you mean?

    ANDREY: Yes, Sbertech recently claimed that the entire shortage on the GPU market was because of them.

    GLEB: And what do they need them for - machine learning or mining?

    ANDREY: I will not claim to know exactly, but in my opinion it is still for training models.

    GLEB: In general, is there a problem that not everyone can afford the necessary equipment?

    ANDREY: Usually nobody does this at home. Most often it is a rented server, say on Amazon, with elastic pricing depending on how many resources you use. In reality you can do machine learning from a MacBook Air: you simply start the computation on the server through the terminal, somewhere far away it runs on hot GPUs, and then you download the result - that is all.

    In reality nobody trains anything on laptops, and hardly anyone keeps an Nvidia Titan at home either, because it does not pay off: new GPUs come out every year. Basically everything runs on servers.

    About testing


    GLEB: How are things with testing? I have a friend who works at Nvidia and tests the performance of programs that train networks. That is all I have heard about testing in machine learning. What processes exist in this respect?

    ANDREY: Right now, for the most part, it looks like the equivalent of our world's debugging with lots of print statements - you do a lot of it by hand. But recently at the NIPS conference it was announced that some folks at Stanford had built a repository for networks and models that collects metrics across several different iterations and variations; you can compare them and see what changes and how. So something exists now, but the infrastructure is still rather immature: there are many heterogeneous tools that do not work well together.

    But there is progress. The ONNX standard, which describes neural networks in a unified way, is now being rolled out and has already been adopted by many companies. Still, the infrastructure is immature. As far as I can tell, a lot is done manually: to test, you run the model and see whether it works or not. Of course you can compute metrics, but sometimes there is simply subjective eyeballing of the results. That still happens.

    About benchmarks


    STAS: I was thinking about what benchmarks could be used. Do I understand correctly that once you have trained a model on a specific dataset and specific hardware, you can judge how long it will take to run?

    ANDREY: No, that does not affect anything at all. An untrained and a trained model take the same amount of time to run: the trained model differs from the untrained one only in the numbers inside, but the amount of numbers is the same. Speaking specifically in neural network terms, the speed depends directly on the "thickness" of the model - how many layers there are, how wide they are, and so on.

    There are lightweight networks that hold relatively few parameters, and heavy ones that hold more. It is always a trade-off: it is not the case that you take twice as many layers and get a result twice as good - the relationship is nonlinear. You always have to find a balance so that for a given dataset the network is not too thin, so it can pick up enough features and learn properly, but also not too thick, because then there simply is not enough data to make use of all of it.
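    A small sketch of what that "thickness" means in raw parameter counts, assuming Keras (the two toy architectures are invented for illustration):

        from tensorflow import keras

        def tiny_cnn(width):
            # same depth, different thickness: width controls how wide each layer is
            return keras.Sequential([
                keras.layers.Conv2D(width, 3, activation="relu", input_shape=(224, 224, 3)),
                keras.layers.Conv2D(width * 2, 3, activation="relu"),
                keras.layers.GlobalAveragePooling2D(),
                keras.layers.Dense(10),
            ])

        thin = tiny_cnn(8)
        thick = tiny_cnn(64)

        # the parameter count (and the per-image inference cost) grows much faster
        # than linearly with the width of the layers
        print(thin.count_params(), thick.count_params())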

    About reinforcement learning and the game of Go


    GLEB: So far we have talked about the most general aspects of machine learning. But there have been breakthroughs, especially in the last couple of years - AlphaGo, for example. I thought that was deep learning, but it turns out the correct term is reinforcement learning.

    ANDREY: Reinforcement learning is learning driven by rewards. It is a kind of training that works a little differently. Suppose you are teaching a model to, say, find the way out of a maze or play checkers. You take your algorithm, your structure, put it into this environment and define in advance the set of actions that can be performed there. The model tries out options - going left or right, roughly speaking. Reinforcement learning consists in the fact that every time the machine performs an action, you either punish it or tell it "Well done!"
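    A minimal sketch of that reward-and-punish loop, assuming tabular Q-learning on a tiny one-dimensional "maze" (the layout and constants are invented for illustration):

        import random

        # a 1-D corridor: states 0..4, the exit is at state 4, actions are left/right
        N_STATES, EXIT = 5, 4
        ACTIONS = [-1, +1]

        Q = [[0.0, 0.0] for _ in range(N_STATES)]   # value of each (state, action) pair
        alpha, gamma, eps = 0.5, 0.9, 0.2           # learning rate, discount, exploration

        for _ in range(500):                        # episodes
            s = 0
            while s != EXIT:
                if random.random() < eps:
                    a = random.randrange(2)                    # explore
                else:
                    a = max(range(2), key=lambda i: Q[s][i])   # exploit
                s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
                r = 1.0 if s2 == EXIT else -0.01    # reward at the exit, small penalty otherwise
                Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
                s = s2

        # after training, the greedy action in every state is "go right" (index 1)
        print([max(range(2), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)])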

    Indeed, there is the AlphaGo algorithm - for those who do not know, there is a board game called Go. For a long time it remained the one game in which a machine could not beat a human: there are so many combinations that brute-forcing them all takes far too long. A human, quite literally, does not enumerate the options, and so can find moves faster. AlphaGo, built on a neural network trained with reinforcement learning, became the first algorithm able to beat a professional Go player.

    Of course, reinforcement learning is not being developed just so a machine can win at Go; there are more important use cases, and it is a trending area of machine learning that is actively developing right now.

    GLEB: If I am not mixing anything up, the number of combinations is on the order of 10^170 - an astronomical number. I watched the very first AlphaGo vs. Lee Sedol match. He was not the very best player in the world, but one of the strongest. When they beat him, the whole community, of course, deflated. There was a feeling of: damn, they got to us after all!

    When they played again a year later, the program had become completely monstrous: it had trained against itself, without the initial data, without any initial supervision. On top of that it had become many times stronger than the first version, and it could now run on a single computer, whereas before a distributed cluster of machines was required. In short, we were beaten once and for all!

    About genetic algorithms


    STAS: I want to ask about genetic algorithms. From the description it sounds like genetic algorithms could also be counted as reinforcement learning. The way I picture them: there is a generation, we take each individual in that generation, it performs some task, we score its behavior, and based on those scores we select the best ones. Then we cross their traits, add a little mutation, and get a new generation. We repeat these steps, trying to raise the fitness of each member of the generation. It seems similar in spirit. Does that count as reinforcement learning or not?
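    A minimal sketch of the loop Stas describes, assuming we are just maximizing a toy fitness function (everything here is illustrative):

        import random

        TARGET = [1] * 20                      # toy goal: a string of all ones

        def fitness(genome):
            return sum(g == t for g, t in zip(genome, TARGET))

        def crossover(a, b):
            cut = random.randrange(len(a))
            return a[:cut] + b[cut:]

        def mutate(genome, rate=0.05):
            return [1 - g if random.random() < rate else g for g in genome]

        population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]

        for generation in range(50):
            # score every individual and keep the best half as parents
            population.sort(key=fitness, reverse=True)
            parents = population[:15]
            # breed a new generation with crossover plus a little mutation
            population = parents + [
                mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(15)
            ]

        print(fitness(population[0]), population[0])   # best individual found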

    ANDREY: No, genetic algorithms are still a somewhat different thing.

    STAS: Do they belong to machine learning?

    ANDREY: I would not say so. I will not claim this with certainty, but we covered genetic algorithms at university like everyone else, and it seems to me that they are somewhat simpler and more ad hoc - imperative, in short. That is, we know in advance what the input will be and what the output will be. In machine learning things are somewhat different: there is probability, prediction accuracy, and everything in that spirit.

    People who know the terminology better than I do may correct me, but off the top of my head I would say no.

    STAS: So genetic algorithms are not really used to solve most real-world problems?

    ANDREY: Right, they are more of an algorithmic tool, and I have rarely encountered them in practice.

    About capsule networks


    GLEB: There is another subset of machine learning - so-called capsule networks. Again, we will not go too deep. Tell us, literally in two words, what they are and why they are trending now.

    ANDREY: This is a brand new topic, only a few months old. Geoffrey Hinton published a paper saying that current convolutional networks are a road to nowhere, and proposing a new vision of how things should evolve. The community took the statement ambiguously and split into two camps: some say it is overhyped, others that it is the next big thing.

    But to explain it in really simple terms: how do convolutional networks work? Take, for example, networks that work with images. There is a convolution - a stack of matrices that slides across the image with a certain stride, as if scanning it. At each small step the convolution is applied to that patch, and each patch turns into a new, conditional "pixel" of much higher dimensionality; the operation is repeated for every filter.
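    A stripped-down sketch of that sliding operation, assuming plain NumPy, a single-channel image and a single 3x3 filter (real networks stack many such filters and channels):

        import numpy as np

        def conv2d(img, kernel, stride=1):
            kh, kw = kernel.shape
            out_h = (img.shape[0] - kh) // stride + 1
            out_w = (img.shape[1] - kw) // stride + 1
            out = np.zeros((out_h, out_w))
            for i in range(out_h):
                for j in range(out_w):
                    # take the patch under the kernel, multiply elementwise, sum up
                    patch = img[i * stride:i * stride + kh, j * stride:j * stride + kw]
                    out[i, j] = np.sum(patch * kernel)
            return out

        img = np.random.rand(8, 8)              # toy single-channel "image"
        edge_kernel = np.array([[1, 0, -1],
                                [1, 0, -1],
                                [1, 0, -1]])    # a simple vertical-edge filter
        print(conv2d(img, edge_kernel).shape)   # (6, 6): one new "pixel" per patch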

    The problem with convolutional networks, though, is that all the data entering the first layer travels all the way to the end - maybe not in full, but it all has influence and all reaches the final stage. Roughly speaking, if you need to identify some part of an image, say a single cat, you do not need to scan the whole image. It is enough to locate, at some point, the zone where the cat most likely is, and then look only at that, the way a person does.

    That is how capsule networks work. I will not try to explain their internals like an expert, but from what I understood: inside capsule networks there are certain trees, and each subsequent capsule accepts only relevant data as input. That is, not everything we originally took as input passes through them; with each new layer (I am not sure how to put it in capsule-network terminology) only the data that is really needed - only the important pieces - gets processed. That is the key difference between convolutional and capsule networks.

    GLEB: It sounds interesting, but I do not quite understand - is this only about images?

    ANDREY: No, it applies to everything; I used images just to explain. The key idea is: let's not push all the data and all the features forward, only those relevant to the next layer.

    More about games


    STAS: I heard that after AlphaGo those guys are going to beat everyone at StarCraft?

    ANDREY: I have to disappoint you - I do not really follow that. It is not that esports particularly interests me, but it is already clear that this is the future. For example, there are already startups that teach people to play DotA: like a personal coach they analyze how you play and tell you where you fall short, using their own models trained on esports data. There are betting startups that predict who will win, and so on.

    A lot of people work in this area now, primarily because a lot of money moves through it. But personally I find it completely uninteresting, so unfortunately I do not follow the news and trends.

    STAS: What do you think is the difficulty in creating good artificial intelligence for strategy games? Am I right that it mostly comes down to the enormous number of options?

    ANDREY: Yes. We actually touched on this already, when I explained that this kind of artificial intelligence is not yet used in AAA games, even though it exists in AlphaGo and perhaps somewhere else.

    The game of Go, for all its complexity, comes down to placing a stone each turn in order to surround the opponent's stones, whereas StarCraft is a very complex thing: you can send your units along a practically unlimited number of trajectories, build different combinations of structures, and so on. All of these are parameters.

    Another difficulty is that neural networks do not always think the way a person does. When we build a unit, for example, we remember it; but many networks start from scratch every time. Of course there are recurrent networks that can remember what they have already done; they are used in particular for translation and text, where, as the sentence is generated, the network uses more and more of the context.

    There are enormous difficulties here, because all that information and all those options need to be formalized - you need a training dataset such that the model still responds somewhat adequately to your opponent's actions, of which there can also be a million, unlike in Go or chess.

    STAS: Understood - lots of parameters.

    GLEB: But here is what I do not understand: clearly DotA has fewer parameters, but it is still similar in the sense that units can be sent anywhere, and so on.

    STAS: Andrey's point was that, for one thing, you control a single unit there, so the number of options is much smaller.

    ANDREY: To be honest, I have never played Dota 2, but the original, as far as I know, is a highly deterministic game: there are three lanes and towers that need to be destroyed.

    GLEB: Yes, but in StarCraft - though I do not play it at all - there are also certain routes and the same kinds of units. You say there are many of them, but most likely they are always moved around in groups, so roughly the same thing happens.

    STAS: You still need to position each unit correctly during a battle. The moment they are not moved as one group but placed individually, the number of parameters immediately grows.

    ANDREY: The problem is that you are thinking in categories like "place a unit", and keep forgetting that a neural network is just matrices - numbers being multiplied. You have to formalize things like objectives. Say there is a StarCraft map and some objective on it - defeat the player, or whatever else. All of this has to be represented as mathematical primitives, and that is the hardest part.

    If it really were artificial intelligence, the gap between DotA and StarCraft would be minimal - StarCraft is maybe a bit more complicated mechanically, but still roughly the same. But because we operate with numbers, it is much harder to formalize.

    About networks learning from each other


    STAS: I have one last question before we move on to mobile. I do not know what it is properly called, but there is an approach where one neural network essentially watches another and tries to find patterns.

    ANDREY: I will not try to explain how it works, but I know for sure that there are very cool algorithms - I sometimes hear about them at work - where two neural networks learn from each other. That area of expertise is completely beyond me, but it all sounds great. As far as I know, it is used for generative networks. Unfortunately, I cannot say more than that.

    STAS: Fine. You have given us the key search terms - Gleb and the readers can easily google the rest.

    About mobile phones (Apple)


    GLEB: Let's move on to mobile, which we have been building up to for a while. First of all, what can we actually do with machine learning on mobile devices?

    ANDREY: By the way, is your podcast for iOS developers?

    GLEB: We are not an iOS podcast. Yes, Stas?

    STAS: Yes, it is for mobile developers in general. Why do you ask?

    ANDREY: Because the situation on the two platforms is simply very different. Apple, which has always been good at integrating software and hardware - it is famous for that - hopped onto the machine learning hype train very elegantly.

    In 2014 Apple introduced the Metal API, with things like computer vision detectors and so on baked into it. With the arrival of iOS 10, all of this made it possible to add a large set of layers, activations, and other neural network operators - in particular for convolutional neural networks - to the Metal Performance Shaders framework.

    That gave a huge boost, because computations on a GPU are, as a rule, several times faster than on the CPU. Once Apple made it possible to run computations on mobile GPUs, and to do it fast, you no longer had to write your own mathematical operators and so on. That was a very strong move. A year later they released CoreML (we will talk about it a bit later).

    Apple had a very good foundation. I do not know whether they had that vision all along or it just worked out that way, but they are now objectively the leader in machine learning on mobile devices.

    About mobile phones (Android)


    What runs relatively well and in real time on iOS unfortunately does not run as well on Android. That is not only because Android is bad. There are other factors - above all the fact that Android has a very diverse ecosystem: there are weak devices and strong ones, and you cannot tune for all of them.

    While Metal is supported on all iOS devices, on Android it is trickier: one device supports one version of OpenGL, another supports a different one or none at all; some have Vulkan, some do not. Every manufacturer has its own drivers, which of course are not particularly optimized but merely meet the standard at a minimal level. It even happens that you run a neural network on an Android GPU and it is no faster than on the CPU, because working with shared memory is very inefficient, and so on.

    On Android things are not great right now. It is rather surprising, because Google is one of the leaders in the field, yet the platform lags a little here. Android simply lacks a high-quality implementation of modern machine learning capabilities.

    For us, for example, not even all the features of our application work the same way on both platforms: what is fast on iOS is slower on Android, even on flagship devices of comparable power. In that sense Android as a platform is currently lagging.

    About CoreML


    STAS: Since we have mentioned CoreML, it would probably be right to also talk about TensorFlow Lite.

    ANDREY: CoreML is actually a dark horse. When it came out last year, at first everyone said, "Wow, cool!" But then it became clear that it was just a thin wrapper over Metal. Companies seriously involved in machine learning, including ours, had long had their own solutions; in our tests, for instance, our own solutions beat CoreML in speed and other parameters.

    But the main problem with CoreML was that it could not be customized. Sometimes you need a complex layer in a neural network that, say, Metal does not provide, and you have to write it yourself. In CoreML there was no way to plug in your own layers, so you had to drop down to Metal at the lower level and write everything yourself.

    Recently CoreML added this capability, and now the framework has become much more interesting. If you are a developer at a company, or on an app, with no machine learning work at all, you can now spin up a network in literally two lines and quickly run it on the GPU. Performance tests show CoreML results comparable to custom solutions and bare Metal.

    So CoreML works quite well. It is a bit immature, it has bugs, but it gets better every month. Apple rolls out updates actively - not in the way we are used to, where Apple's frameworks get updated once a year or with mid-cycle iOS versions. CoreML is updated continuously, and in that sense everything is great.

    TensorFlow Lite provides a converter to CoreML, and CatBoost also supports conversion to CoreML. In short, Apple once again did things the right way: they released an open-source converter and said, "Let's have converters to CoreML for everything" - and many learning frameworks got on board.

    At first there was some skepticism about CoreML. At the last WWDC the most frequent question to the CoreML developers was: "Why don't you allow models to be downloaded from the internet? Why don't you allow them to be encrypted?" It was possible to extract these models and, effectively, steal intellectual property.

    Now all of that has been fixed, functionality has been added, and at the moment CoreML is clearly the leading platform in this sense.

    STAS: Can you elaborate on that? So now you no longer have to bundle the model - you can simply download it from somewhere?

    ANDREY: Yes, that is possible now. Earlier, when we asked about this, the developers would smile and say, "Just look at the headers." There really were initializers you could pass files to, and everything would build.

    CoreML models are put together in an interesting way. They are essentially ordinary binaries that store the weights, but on top of that Swift files are generated which define implicit classes. You use these classes in your application, and the compiler compiles the model into a set of files.

    Now, using certain hacks and approaches, you can make such a model portable: you can protect your intellectual property with encryption and reduce the size of the application.

    Overall, CoreML is now moving in the right direction. Not everything can be done legitimately from the App Review point of view, and not everything can be done easily, without hacks, but you can see the developers improving the framework.

    STAS: Cool! I wanted to add that CoreML looks like a solution for typical tasks. Relatively speaking, it is convenient when you want to do something simple with machine learning in your application. If it is a typical task, Apple seems to have tried to make the whole path as simple as possible, provided you can find a ready-made model, data, and so on. It really is a story about typical tasks, because for those, presumably, everything is already available.

    ANDREY: For typical tasks it is simply great! Without exaggeration, you really do need just two lines of code to run a model. In that sense, yes, it is very cool, especially for indie developers or companies that do not have an R&D department on staff but still want to add something cool.
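    For the desktop side of that workflow, here is a hedged sketch of loading and sanity-checking a converted model with the coremltools Python package (prediction requires macOS; the file name and the input name "image" are hypothetical, and the equivalent on-device Swift call is similarly short):

        import coremltools as ct
        from PIL import Image

        # load the .mlmodel produced earlier by a converter (hypothetical file name)
        model = ct.models.MLModel("StyleTransfer.mlmodel")

        # run a single prediction as a sanity check before shipping the model;
        # the dictionary key must match the model's declared input name
        img = Image.open("test_photo.jpg").resize((256, 256))
        out = model.predict({"image": img})
        print(out.keys())   # names of the model's outputs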

    But that part is less interesting, because typical tasks had already been solved on GitHub with Metal - you could just copy that code and compile it, albeit with a bit more effort.

    It is important that now this framework is moving not only towards classic everyday tasks, but also towards complex solutions. This is really cool!

    About training on mobile devices


    GLEB: Are you saying that after Metal appeared it became possible to train models on mobile phones?

    ANDREY: No, it has never been possible to train on mobile phones - it makes no sense; you can only run models there. If I said otherwise, I misspoke. On mobile phones, of course, nobody trains anything.

    STAS: I have not heard anything about training on a phone either.

    GLEB: Neither have I, but I was wondering about it. Intuitively it does seem like a strange idea. But are there really no interesting problems where it would be relevant?

    ANDREY: It is hard to imagine any. If there is anything of the sort, it is distributed learning - there are even scientific papers on how to do it. But I take it you are asking about training on data collected on that same phone? Even if you collected enough of it (which will not happen), training would take so long it would never finish, and nobody is going to port training code to mobile platforms - why would they? Training always happens on servers, and inference happens on devices.

    STAS: But in the end it does work out that way: if you are a company and you want something like that, you need data, and you can collect it from your users - that is, periodically upload it yourself.

    ANDREY: Yes, but it works a little differently. You gather data from all your users in one place, on your own server, train there, and then push the finished model back out. It is not the case that every device trains something itself.

    STAS: On the other hand, the phone would heat up nicely - handy in winter, though it would probably take a very long time.

    About mobile phones and the future


    GLEB: Are there any other interesting things in terms of applying machine learning on mobile devices? We have talked about what we already have now. It would be interesting to look a little into the future - what super-features, super-resolution and the like would we want to get on our mobile platforms?

    ANDREY: Right now, oddly enough, performance is the bottleneck - iPhones simply cannot handle a lot of what we would like to run. We will have to wait a while longer before more complex tasks can be run.

    There are problems with real time. For example, even in our flagship application, live video with style transfer does not work with every style, because it is too slow and too heavy. There are bottlenecks tied to the current level of the hardware.

    CoreML itself is developing strongly; I think it will be fine in the future. Most of all I want the industry to calm down and start standardizing: more common formats, converters, conventions - more things that work equally well on Android and iOS, because that is very important for business. We often find ourselves unable to ship a cool feature simply because we cannot roll it out on iOS only or on Android only.

    It would be great, and good for everyone, if active, healthy competition got going so that things worked well everywhere, on both Android and iOS, and GitHub stopped being flooded with endless learning frameworks. Right now it is a kind of madness: even Uber has its own framework, Horovod; Apple has its own; everyone has their own framework, some have several. It seems to me all of this raises the entry barrier and makes conversion harder, including conversion to mobile - so for the future I just want steady improvement and development across the board.

    I do not think there will be a revolution in the near future. I am not an expert, and perhaps I have no right to say this, but from what I can see, nothing radically new is on the horizon. I just want steady improvement of what we have now.

    About learning machine learning


    GLEB: What do you advise people who are not really into the subject but want to try? What should they read or watch, how should they get into the topic? Stas has already mentioned the courses from Yandex and MIPT.

    ANDREY: If you asked my colleagues, they would tell you to read Bishop (Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006) and other fairly heavy books. But because I have a mathematical background - I worked with 3D graphics for a long time, where there is also a lot of linear algebra - I am more or less prepared. In any case, it is quite a comprehensive subject. You are very lucky if you are studying it at university: go to those lectures, listen, and you will not regret it.

    But if that moment has already passed, or you ended up in a different department or faculty, then I definitely recommend self-study. There are courses that have become the de facto standard, for example Andrew Ng's machine learning course on Coursera. There is no point even giving a link - it comes first in every list.

    You definitely need to go through several such courses to understand, at least at an intuitive level, how it works inside, and to try building your own modest models - start with digit recognition on the MNIST dataset. It is like Hello World, only in the machine learning world.
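    That Hello World, as a hedged Keras sketch (the hyperparameters are arbitrary; Keras ships MNIST as a built-in dataset):

        from tensorflow import keras

        # MNIST: 60,000 training images of handwritten digits, 28x28 pixels each
        (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

        model = keras.Sequential([
            keras.layers.Flatten(input_shape=(28, 28)),     # image -> vector of numbers
            keras.layers.Dense(128, activation="relu"),     # one hidden layer
            keras.layers.Dense(10, activation="softmax"),   # probabilities for digits 0-9
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

        model.fit(x_train, y_train, epochs=3)
        print(model.evaluate(x_test, y_test))   # loss and accuracy on unseen digits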

    This is probably not an area where you can just create a project, jump in, poke at something and see what happens. Here you have to take a more fundamental approach: master the body of knowledge needed as a foundation, and then build up expertise.

    STAS: And after the courses?

    ANDREY: After the courses there are advanced courses, also from Andrew Ng! There is also the Kaggle platform, where competitions are held that you can join to practice training models. Once you are a bit more experienced and can train the classical architectures, from that point on you should be reading semi-scientific or scientific papers to understand the finer points - if you see yourself in the role of a data scientist.

    If you are a mobile developer who just wants to get a feel for this, that level will probably be enough. I have learned roughly up to that level and do not go further - I do not need to dive into the R&D process itself; I do exactly what a developer does. But I still needed the background. At first the team also taught me, running seminars on the basics for programmers so that we would get up to speed quickly.

    But getting a minimum level of knowledge is quite simple: go through a few courses, try your hand at Kaggle, do a few other things - and you will basically be ready to tackle 90% of the problems out there.

    Results


    GLEB: Let's wrap up, then. It seems to me we discussed the machine learning landscape quite concisely and at the same time clearly and interestingly - at least as far as our programmer hands could reach.

    • We learned that machine learning is simply matrix multiplication. Of course not quite, but yes - it is a mathematical model with no magic inside.
    • We recalled fashionable things like deep learning.
    • We discussed more modern techniques like reinforcement learning.
    • We talked a little about capsule networks.
    • We discussed the current problems of machine learning, above all that getting good data is essential for good training.
    • We talked a little about the market and the fact that there are few engineers. The demand, admittedly, is not huge either, but Mountain View is still waiting!
    • And in the end we slid into our favorite topic, mobile development, and learned that CoreML is great and developing very quickly.

    Many thanks to Andrei Volodin for telling us all this.

    By the way, Andrei plans to give a detailed talk on this topic at AppsConf 2018, which will be held on October 8 and 9 in Moscow.

    The program committee has already received more than 80 submissions, but the Call for Papers is still open - submit your proposals until August 3. As a reminder, we are looking for hardcore, applied and, in places, hype-worthy talks, and we carry out a rigorous selection.
