Machine Learning: State of the Art



    In 2015, a new word entered the world of art: "inceptionism". Machines learned to redraw paintings, and by 2016 Prisma had already been downloaded by millions of people. Today we talk about art, machine learning, and artificial intelligence with Ivan Yamshchikov, co-author of the acclaimed Neural Defense.



    Meet Ivan Yamshchikov. He received a PhD in Applied Mathematics from the Brandenburg University of Technology (Cottbus, Germany). He is currently a researcher at the Max Planck Institute (Leipzig, Germany) and an analyst/consultant at Yandex.

    - Neurona, Neural Defense, and Pianola: how did such a serious passion for creative AI begin? At what point did you decide to engage with this topic in earnest?

    Ivan Yamshchikov:
    I would not call it a serious passion for creative artificial intelligence. It is just that Alexei Tikhonov once shared his ideas for a neuro-poet, and in the summer of 2016 we decided to record the album "Neural Defense" together. Since then it has become clear that the field is much wider, and now I work on artificial intelligence on an ongoing basis.

    This is an incredibly interesting topic, and it is moving fast right now: the history of AI has seen several "winters", periods of disappointment after unjustified and overstated expectations. We are now in the third period of intense interest in AI, and it is possible that we will soon face a similar problem again. Despite this, there has been a genuine qualitative leap in the performance of many systems: machine translation, aggregation systems, autonomous systems.

    - The idea of combining mathematical models with music and painting is not new, so why did the approach take off right now?

    Ivan Yamshchikov: This is perhaps one of my favorite topics: when you played shooters in the 90s, you unconsciously helped the development of AI.

    Video cards (GPUs) evolved to serve graphical interfaces and games, but at some point people realized they could be used for parallel computing, and CUDA appeared. It was first used in scientific computing, in fluid dynamics, where a number of models can be computed well on a graphics card, and the same computation on a CPU would be several times more expensive. A few years later it turned out that neural networks also parallelize perfectly and can be trained on graphics cards. This boom in scientific computing made it possible to build neural networks of sizes that were previously out of reach.

    Cloud computing also played a role: now you do not even have to buy a GPU, you can rent one, and in the same way you can rent the right amount of CPU. This lowered the entry threshold, and in technology it is always like this: when the threshold is lower, results appear much faster.

    As for painting, the key article here is Neural Artistic Style, written by researchers from Tübingen. Their experiments showed that one layer of the neural network accumulates features responsible for style (how it is drawn), while another accumulates features responsible for semantics (what is drawn, the content). The famous Prisma application grew out of this article.
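    To give a hands-on feel for that idea, here is a minimal sketch, assuming TensorFlow 2.x and the pretrained VGG19 bundled with Keras, of pulling "content" features from a deeper layer and "style" features (via Gram matrices) from earlier layers. The layer choices are illustrative and not the exact setup of the original work.

```python
# Minimal content/style feature split in the spirit of the Tübingen paper.
# Assumes TensorFlow 2.x; layer names are the standard Keras VGG19 layers.
import tensorflow as tf

vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False

content_layer = "block5_conv2"                      # deeper layer: "what" is in the image
style_layers = ["block1_conv1", "block2_conv1",
                "block3_conv1", "block4_conv1"]     # earlier layers: "how" it is drawn

extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=[vgg.get_layer(name).output for name in [content_layer] + style_layers],
)

def gram_matrix(features):
    # Channel-to-channel correlations summarize "style" regardless of layout.
    channels = tf.shape(features)[-1]
    flat = tf.reshape(features, (-1, channels))
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(tf.size(flat), tf.float32)

def content_and_style(image):
    # image: float tensor of shape (1, H, W, 3) with values in [0, 255]
    outputs = extractor(tf.keras.applications.vgg19.preprocess_input(image))
    return outputs[0], [gram_matrix(o) for o in outputs[1:]]
```

    Stylization then amounts to optimizing a new image so that its content features match one photo and its Gram matrices match another; that optimization loop is omitted here.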

    And we decided to make music because we love literature and poetry. Yegor Letov was chosen because we love him and wanted to try to imitate his style. In general, these are purely aesthetic preferences.

    In general, working with music is much more pragmatic than working with text: when you work with a dictionary, it is based on one-hot encoding (all words are numbered, and the i-th word is a vector with 1 in the i-th position and 0 everywhere else). After processing a set of documents, you get a space of very high dimension. The dimension is then artificially reduced using a number of methods, for example word2vec ( https://ru.wikipedia.org/wiki/Word2vec ; https://habrahabr.ru/post/249215/ ).
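    As a toy illustration of that encoding, here is a tiny sketch in plain NumPy with a made-up four-word vocabulary; a real pipeline would train something like word2vec on a large corpus rather than use a random matrix.

```python
# One-hot encoding of words versus a dense low-dimensional embedding.
# Vocabulary and embedding size are made up for illustration.
import numpy as np

vocab = ["sun", "night", "road", "guitar"]        # in practice: tens of thousands of words
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_index[word]] = 1.0                  # 1 at the i-th position, 0 elsewhere
    return v

print(one_hot("night"))                           # [0. 1. 0. 0.], dimension equals vocabulary size

# An embedding replaces these sparse vectors with dense ones of a few hundred
# dimensions; here a random 4x3 matrix stands in for a trained word2vec model.
embedding = np.random.randn(len(vocab), 3)
print(embedding[word_to_index["night"]])          # dense 3-dimensional vector
```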

    One way or another, we are talking about a space of several hundred dimensions, not a three- or four-dimensional one. A space of that dimension is usually hard to work with: some regions have a high data density, while others, on the contrary, are too sparse, so the structure turns out to be very complex. If we talk about music and take notes, each note is a combination of an octave and a pitch: 12 pitches (with sharps and flats) per octave and 4-5 octaves. From this point of view, the space has a much lower dimension.
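    For comparison, a note fits into a far smaller space. A minimal sketch (the ranges are illustrative) of packing a pitch class and an octave into a single small index:

```python
# A note as (octave, pitch class): 12 pitch classes times roughly 4-5 octaves
# gives only ~48-60 states, versus tens of thousands of words in a text vocabulary.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def encode_note(name, octave):
    """Map e.g. ("A", 4) to a single integer index in a small, dense space."""
    return octave * len(PITCH_CLASSES) + PITCH_CLASSES.index(name)

def decode_note(index):
    octave, pitch = divmod(index, len(PITCH_CLASSES))
    return PITCH_CLASSES[pitch], octave

print(encode_note("A", 4))    # 57
print(decode_note(57))        # ('A', 4)
```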

    And if you approach the melody as a whole, you can scale the data so that the parameter space is dense: there will be few gaps. When we experimented with neural networks on different types of data, we found that this property of music lets us understand better and faster whether the trained model works at all, so it was a pragmatic choice.

    - Where is the best place for a technical person to get acquainted with creative AI? Are there good resources, courses, or lectures on this topic?

    Ivan Yamshchikov: Let us eat the elephant one piece at a time. First, neural networks != artificial intelligence. On the other hand, neural networks are one of the most popular topics, and a lot of material on them is available. There are also courses and materials on AI and machine learning. The main Russian-language ones: the joint course by HSE and Yandex, Vorontsov's course on machine learning, Vetrov's course on Bayesian methods, and Lempitsky's course on deep learning; in English: courses on Udacity (including TensorFlow) and on Coursera.

    There are no courses on creative AI as such: the topic lies at the intersection of science and art, and most of the questions at this junction are open.

    What I really recommend looking at, and what is worth spending time on, are machine learning courses (see above), including deep learning.

    - Many say that modern machine learning methods simply copy pieces of already created works and combine them according to learned canons. Has there been something in the works created by AI that really surprised you?

    Ivan Yamshchikov: In general, this is of course reasonable and justified criticism, but I have two counterarguments to it: a technical one and a philosophical one.

    Let's start with the technical one. Previously we did not know how to do this, and now we have learned. The fact that we can now create such things technically is a breakthrough. Maybe not from the point of view of art history, but from a purely technical one, for sure.

    From a philosophical perspective: a postmodernist does the same. If we live in the era of postmodernism, then virtually any author in some sense copies, imitates, or is inspired by prior experience. And if we consider the problem of learning in general (of course, there is no fully formalized mathematical apparatus for everything), it is the transformation of a stream of information into knowledge. Knowledge is information that has been filtered, organized, and ranked in a certain way. If you look deeper, the basis of any learning, including human learning, is experience combined and transformed. It may seem that something new has appeared, but in reality it follows from a combination of previously acquired experience.

    As for the things that surprised me: I have a favorite line from Neurona, "The God who's always welcome to Iraq", a completely unexpected line.

    There is a term in psychology, apophenia: the ability to find patterns where none exist. In this sense, machine creativity today certainly appeals to apophenia: the stronger this trait is in a person, the more interesting machine creativity is to them.

    - Continuing the previous question: in the AlphaGo vs. Lee Sedol match, AlphaGo played on the 5th line, a move no human would have made (which caused a storm in the Go community). What examples are there in AI-created works of something clearly not inherent in human style?

    Ivan Yamshchikov: A huge amount of data is available to a human: tactile and taste sensations, smells, and much more. This huge experience available to humans in a sense determines consciousness. Machines have no such volume of data, and accordingly their existence is much less diverse and interesting than a human's. As a result, texts written by a machine differ radically from those written by people.

    The fundamental question is whether we have understood the principle by which a person creates texts, and here the scientific community has no unequivocal and clear answer. The problem of generating discrete sequences, be it text or music, is an open question at the very edge of scientific knowledge, and different people around the world are grappling with it right now.

    - A number of technical experts name the absence of objective quality criteria as one of the problems in evaluating the output of a creative AI. How do people generally evaluate the quality of generated music and pictures drawn using neural networks?

    Ivan Yamshchikov:
    I really like this question, and if you have any ideas, come over and we will write an article together. I am not kidding.

    Right now the working criterion for quality control is collective assessment by people. 400 thousand people listened to Neural Defense in the first week, and based on the distribution of ratings and comments one can judge whether people liked it and how close to the original it turned out.

    Speaking in more detail, technically there are two cases to consider: supervised and unsupervised learning. In the first case we have reference answers, labeled by people or verified earlier, which the algorithm tries to match; in the second we do not. If there are answers, then for each specific task you can introduce some metric of similarity to the answer and objectively measure the result. If there are none, it is completely unclear how to introduce such a metric.
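    As a toy illustration of the supervised case, the sketch below scores a generated line against a human reference by simple token overlap. The metric is purely illustrative (real tasks use their own metrics), and in the unsupervised case there is simply no reference to compare against.

```python
# Toy similarity metric for the supervised case: compare a generated line
# against a reference answer by token overlap (Jaccard index).
def jaccard_similarity(generated: str, reference: str) -> float:
    a, b = set(generated.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

print(jaccard_similarity("the god who is always welcome",
                         "a god who is always welcome in iraq"))  # ~0.56
```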


    - It is extremely interesting: how exactly were the texts of Neural Defense generated? Is it possible to describe the mathematical apparatus of the generator intuitively?

    Ivan Yamshchikov:
    Yandex employee Yuri Zelenkov developed a number of poetic heuristics that evaluate rhythm and rhyme in Russian. We used a combination of these heuristics and an LSTM network (Long Short-Term Memory: https://habrahabr.ru/company/wunderfund/blog/331310/) that read a corpus of Russian poetry: it was fed pairs of <poem, author>, and the corpus contained all the Russian poetry we could find, roughly from Pushkin to the present day, including Russian rock and pop music. However, even this amount of data was not enough, so we had the machine read each text in random order so that every poem was read 10 times. This allowed us to substantially increase the amount of data and noticeably improved the quality.
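    For readers who want the general flavor, here is a minimal character-level LSTM generator in Keras. It is a sketch of the generic approach only: the <poem, author> conditioning, the repeated randomized reading of the corpus, and the rhyme and rhythm heuristics of the actual Neural Defense pipeline are not reproduced, and the file name is hypothetical.

```python
# Minimal character-level LSTM text generator (generic sketch, not the
# Neural Defense pipeline). Assumes TensorFlow 2.x.
import numpy as np
import tensorflow as tf

corpus = open("poems.txt", encoding="utf-8").read()  # hypothetical file with poem texts
chars = sorted(set(corpus))
char_to_id = {c: i for i, c in enumerate(chars)}

seq_len = 40
x = np.array([[char_to_id[c] for c in corpus[i:i + seq_len]]
              for i in range(len(corpus) - seq_len)])
y = np.array([char_to_id[corpus[i + seq_len]] for i in range(len(corpus) - seq_len)])

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(chars), 64),                 # characters -> dense vectors
    tf.keras.layers.LSTM(256),                                 # reads the 40-character window
    tf.keras.layers.Dense(len(chars), activation="softmax"),   # next-character distribution
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, batch_size=128, epochs=10)
```

    Sampling characters one by one from the softmax output then yields new text; in the real project the author label was added to the input and candidate lines were additionally filtered by the poetic heuristics.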

    Then we feed an author to the input and say: "Now do it like this author." And we fed Yegor Letov to the input. I will talk about this in more detail at the SmartData 2017 conference, where I will reveal many details.

    When we generated English texts for Neurona, poetic heuristics were no longer used. Lesha Tikhonov proposed including the phonetics of words in the latent feature space that forms inside the neural network, and the algorithm itself "understood" what can be rhymed and how.

    - AI already plays poker and Go, redraws pictures and videos, writes music and poems. What comes next? What is the next unconquered peak for creative AI?

    Ivan Yamshchikov: There is already a short film based on a plot created by an RNN. Unfortunately, it is rather mediocre. People still do not know how to "explain" the concept of a plot to a neural network.

    But as far as beautiful applications go, everything is limited only by the author's imagination. Right now the most promising direction seems to me to be interaction with the network, that is, creating objects that interact with the viewer or listener.

    In a sense, computer games are the art of the future. While playing, you live through the story in different ways, i.e. the experience is individual. Similar interactivity in art is the next step.

    For example, when you listen to music and it sounds like a live concert, you can interact with the singer, the band, or the music itself. A simple example: Yandex.Music or Spotify could adjust the rhythm and the music to your mood, or specifically select tracks for, say, sports.

    If you recall Live Photos from Apple, these are, in fact, several variants of a frame. Accordingly, one can imagine that when a musician records an album, in a sense he will record several versions or variations of a composition within certain limits. Then the track will be able to adapt to the listener's mood, relying on some external data. The analogy is quite simple: if you sit down with friends and sing a song with a guitar, the same song will come out differently depending on your mood, but it will still be the same song. I am sure something similar can already be implemented technologically in music as well.

    - One of the most popular topics for discussion now is working in a machine + human pair; such games are held at chess and Go tournaments. Are there interesting examples of human + machine collaboration in art?

    Ivan Yamshchikov: In general, this is already happening: people already create music on computers. I periodically discuss this topic with, so to say, skeptics who worry that robots will replace people and paint apocalyptic scenarios. I try to reassure everyone with the following argument. Usually we create a machine to do what we do poorly. When we need a machine that digs the earth well, we do not build a huge man with a shovel; we build an excavator, which digs the earth much better than that huge man would.

    There is a cognitive trap here: when we talk about artificial intelligence, we assume it will look like human intelligence, that is, like ours, only bigger!

    For example, when science fiction writers at the beginning of the 19th century tried to imagine the future, it contained a gigantic airship, not an airplane. In general, a person easily foresees quantitative changes but can hardly imagine qualitative ones. It is easy to imagine that everything will be faster, cheaper, bigger (or smaller)... But leaps in technology are poorly predicted. It seems to me the same thing is happening now with artificial intelligence.

    But a qualitative (and not just quantitative) breakthrough is already happening in applications related to understanding what a person wants: what they are looking for, what they are going to buy. And on the basis of text generation we can build fundamentally different ways of communicating with a person. People will be able to use new tools as they appear. For example, a programmer will have smart assistants for writing code; for an artist this could be a system for selecting colors; and for a composer, a system that provides inspiration and helps convey emotion correctly in a work.



    If machine learning topics are as close to you as they are to us, we want to draw your attention to a number of key talks at the upcoming SmartData 2017 conference, which will be held on October 21 in St. Petersburg:

