History and experience of using machine translation. Yandex lecture
In September, Yandex held the sixth Hyperbaton, its conference on everything related to technical documentation. We are publishing several of the Hyperbaton lectures that, in our opinion, will be of most interest to Habr readers.
Svetlana Kayushina, head of the documentation and localization department:
- It seems there is hardly anyone left in the world who translates entirely by hand. Today we want to talk about the tools and approaches that help companies organize an effective localization process and help translators solve their everyday tasks more easily: machine translation, evaluating the effectiveness of machine translation engines, and translation automation tools for translators.
Let's start with our colleagues' talk. I invite Irina Rybnikova and Anastasia Ponomareva; they will tell you about Yandex's experience of introducing machine translation into our localization processes.
Irina Rybnikova:
- Thank you. We will talk about the history of machine translation and about how we use it at Yandex.
Back in the 17th century, scientists speculated about the existence of some language that connects all other languages, but that is probably too far back. Let's come closer to the present. We all want to understand the people around us: wherever we go, we want to read the signs, the announcements, the information about concerts. The idea of the Babel fish occupies the minds of scientists and appears in literature and cinema, everywhere. We want to shorten the time it takes to access information. We want to read articles about Chinese technology, understand any site we come across, and we want it here and now.
In this context, it is impossible not to talk about machine translation, because it is exactly what helps to solve this task.
The starting point is considered to be 1954, when in the United States 60 sentences on the general topic of organic chemistry were translated from Russian into English on an IBM 701 machine, using only 250 glossary terms and six grammar rules. This was called the Georgetown experiment, and it stunned the public so much that newspapers were full of headlines claiming that in another three to five years the problem would be completely solved and everyone would be happy. But as you know, things went a bit differently.
Rule-based machine translation appeared in the 1970s. It was also built on bilingual dictionaries, plus the very sets of rules that helped describe any language. Any language, but with limitations.
It required serious linguistic experts who wrote those rules. This is quite difficult work, and even so the approach could not take context into account or completely cover any language; on the other hand, it relied on experts, and high computing power was not yet required.
As for quality, a classic example is a Bible quotation that was translated back then. The quality was still not good enough, so people kept working on it. In the 1990s, the statistical translation model, SMT, appeared. It operated on the probabilistic distribution of words and sentences, and it was fundamentally different in that it knew nothing about rules or about linguistics at all. It received an enormous number of parallel texts, paired in one language and the other, and then made decisions on its own. It was easy to maintain: no crowds of experts were needed, no long waits; you could feed in the data and get a result.
The requirements for input data were moderate, from 1 to 10 million segments (segments being sentences or short phrases). But the difficulties remained: context was still not taken into account, and things were far from smooth. In Russia, for example, there were cases like this.
I also like the example of the GTA game translations; the result was great. Things did not stand still. An important milestone was 2016, when neural machine translation was launched. It was quite an epoch-making event that changed life significantly. My colleague, after looking at the translations and how we use them, said: "Cool, it speaks in my words." And it really was great.
What are its characteristics? High entry requirements in terms of training material, and it is difficult to maintain inside a company, but the significant increase in quality is what it was all launched for. Only high-quality translation can solve the tasks at hand and make life easier for everyone involved in the process, above all for translators, who do not want to fix bad translations: they want to take on new creative tasks and leave the routine phrases to the machine.
When it comes to evaluating machine translation quality, there are two approaches. The first is expert assessment, a linguistic analysis of the texts: real linguists and experts check them for fidelity to the meaning and for correct language. In some cases experts were simply brought in, given the translated text to read, and asked to evaluate how effective it was from that point of view.
What are the characteristics of this method? No reference translation is required: we look at the finished translated text right away and can evaluate it objectively along any dimension. But it is expensive and slow.
There is a second approach: automatic reference metrics. There are many of them, each with its pros and cons. I will not go into detail; you can read more about them later using these keywords.
What is their characteristic? Essentially, this is a comparison of machine-translated texts against some reference translation. These are quantitative metrics that show the discrepancy between the reference translation and what the machine produced. They are fast, cheap, and quite convenient to apply. But they have their own peculiarities.
In practice, hybrid methods are most often used now: first something is evaluated automatically, then the error matrix is analyzed, and then an expert linguistic analysis is carried out on a smaller corpus of texts.
Lately it has also become common practice to bring in not linguists but ordinary users. An interface is built that asks which translation you like better. Or, when you go to an online translator and enter text, you can often vote on whether you like the result and whether this approach works. In effect, we are all now training these engines, and everything we give them to translate they use for training and for improving their quality.
I would like to tell you how we use machine translation in our work. I hand over to Anastasia.
Anastasia Ponomareva:
- In the Yandex localization department we realized quickly enough that machine translation technology has great potential, and we decided to try using it in our everyday tasks. Where did we start? We decided to run a small experiment: to translate the same texts through an ordinary neural network translator, and also to put together a trained, customized machine translator. To do this, we prepared a Russian–English corpus of the texts that we at Yandex had localized into these languages over the years. Then we took this corpus to our colleagues from Yandex.Translator and asked them to train an engine on it.
When the engine was trained, we translated another batch of texts and, as Irina said, had experts evaluate the results. We asked translators to look at grammar, style, spelling, and how well the meaning was conveyed. But the key moment came when one of the translators said: "I recognize my style, I recognize my translations."
To back up these impressions, we decided to calculate statistical indicators. First we calculated the BLEU score for translations made with the ordinary neural network engine and got this figure (0.34). It seemed that it had to be compared with something, so we went back to our colleagues from Yandex.Translator and asked what BLEU score is considered the threshold for translations made by a real person. It is 0.6 and above.
Then we decided to check the results for the trained engine. We got 0.5. The results are really encouraging.
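To make those numbers concrete, here is a minimal sketch of how such a corpus-level BLEU score can be computed with the open-source sacrebleu library. The file names are hypothetical and the library choice is ours for illustration; it is not necessarily the tooling used in the experiment described above.

```python
# Minimal sketch: corpus-level BLEU of machine output against human reference translations.
# Assumes two plain-text files with one segment per line, aligned line by line
# (hypothetical file names; sacrebleu is one common choice, not necessarily the tool used here).
import sacrebleu

with open("machine_output.en", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]

with open("reference_human.en", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams; here there is just one.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])

# sacrebleu reports BLEU on a 0-100 scale; divide by 100 to compare it
# with figures like 0.34, 0.5 and 0.6 quoted above.
print(f"BLEU = {bleu.score / 100:.2f}")
```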
Let me give an example. This is a real Russian phrase from the Direct documentation. First it was translated with the ordinary neural network engine, and then with the neural network engine trained on our texts. Already in the very first line we notice that the type of advertising traditional for Direct has not been recognized. In the trained engine our own translation appears, and even the abbreviation is almost correct.
We were very inspired by the results and decided that we should probably use the engine for other language pairs and other texts, not just that basic set of technical documentation. A few months of experiments followed. We ran into a lot of peculiarities and problems; these are the most frequent ones we had to solve.
I will talk about each of them in more detail.
If you, like us, are thinking of building a customized engine, you will need a sufficiently large amount of high-quality parallel data. An engine can be trained on volumes starting from 10 thousand sentences; in our case, we prepared 135 thousand parallel sentences.
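For illustration, here is a rough sketch of the kind of cleanup that can be applied when assembling a parallel corpus before training: dropping empty segments, exact duplicates, and badly misaligned pairs. The file name, thresholds, and filtering rules are our assumptions, not a description of how the corpus mentioned above was actually prepared.

```python
# Rough sketch: clean a tab-separated "source<TAB>target" file of parallel segments.
# File name, thresholds and filtering rules are illustrative assumptions.

def load_pairs(path):
    """Yield (source, target) pairs from a TSV file, skipping malformed lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 2:
                yield parts[0].strip(), parts[1].strip()

def clean_pairs(pairs, max_ratio=3.0):
    """Drop empty segments, exact duplicates and pairs with suspicious length ratios."""
    seen = set()
    for src, tgt in pairs:
        if not src or not tgt:
            continue                      # empty segment on either side
        if (src, tgt) in seen:
            continue                      # exact duplicate
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue                      # likely misaligned pair
        seen.add((src, tgt))
        yield src, tgt

if __name__ == "__main__":
    cleaned = list(clean_pairs(load_pairs("ru_en_memory.tsv")))
    print(f"{len(cleaned)} parallel segments kept")
```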
Your engine will not show equally good results on all types of text. On technical documentation, with its long, structured sentences, on user documentation, and even on interface texts, with their short but unambiguous button labels, you will most likely be fine. But you may, like us, run into problems with marketing texts.
We ran an experiment translating music playlists and got this example.
This is what the machine translator thinks about "star manufacturers." And what exactly is a "shock of labor"?
When you translate with a machine engine, context is ignored. Here is an example that is no longer so funny but quite real, from the Direct technical documentation. It would seem that when you read technical documentation, it is clear from the context what the word refers to. But no, the machine engine did not get it.
You also have to keep in mind that the quality and meaning of the translation depend heavily on the source language. Translate a phrase into French from Russian and you get one result; translate a similar phrase with the same meaning from English and you get a different one.
If your texts, like ours, contain a large number of tags, markup, and other technical elements, you will most likely have to watch them, edit the output, and write some scripts.
Here are examples of real strings from the Browser. The information in parentheses is technical and should not be translated, in particular the plural forms. In English they are in English, and in German they must also remain in English, yet they get translated. You have to keep track of these things.
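One common workaround, shown here only as a sketch under our own assumptions rather than as the actual scripts mentioned above, is to mask placeholders and markup with inert tokens before sending a string to the engine and to restore them afterwards.

```python
# Sketch: protect placeholders and markup from translation by masking them with
# inert tokens before calling an MT engine and restoring them afterwards.
# The placeholder pattern, token format and example string are illustrative assumptions.
import re

PLACEHOLDER = re.compile(r"(\{[^}]*\}|<[^>]+>|%\d*\$?[sd])")

def mask(text):
    """Replace each placeholder with __PH0__, __PH1__, ... and remember the originals."""
    saved = []
    def repl(match):
        saved.append(match.group(0))
        return f"__PH{len(saved) - 1}__"
    return PLACEHOLDER.sub(repl, text), saved

def unmask(text, saved):
    """Put the original placeholders back into the translated text."""
    for i, original in enumerate(saved):
        text = text.replace(f"__PH{i}__", original)
    return text

masked, saved = mask("Downloaded {count} {file|files} from <b>Disk</b>")
# translated = mt_engine.translate(masked)   # hypothetical call to an MT engine
translated = masked                           # stand-in so the sketch runs as is
print(unmask(translated, saved))
```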
The machine engine knows nothing about your naming conventions. For example, we have an agreement that Yandex.Disk is always written in Latin script in every language. But in French it turns into the French word for disk.
Abbreviations are sometimes recognized correctly and sometimes not. In this example, BY, which refers to Belarus in the technical requirements for advertising, turns into the English preposition "by."
One of my favorite examples is new and borrowed words. Here is a great one: the word "disclaimer," apparently "originally Russian." The terminology will have to be verified for each portion of the text.
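As a simple safety net for naming conventions, abbreviations, and agreed terms, a post-check can flag segments where a protected term did not survive translation. Again, this is a hypothetical sketch; the glossary contents and example segments below are invented for illustration.

```python
# Sketch: flag translated segments where protected names or agreed terms went missing.
# The glossary and the example segments are illustrative assumptions.

# Terms that must appear verbatim in every target language.
DO_NOT_TRANSLATE = ["Yandex.Disk", "BY"]

def missing_terms(source, target, protected=DO_NOT_TRANSLATE):
    """Return the protected terms present in the source but absent from the target."""
    return [term for term in protected if term in source and term not in target]

pairs = [
    ("Files are stored in Yandex.Disk.", "Les fichiers sont stockés sur le disque."),
    ("Requirements for BY advertisers.", "Exigences pour les annonceurs BY."),
]

for source, target in pairs:
    missing = missing_terms(source, target)
    if missing:
        print(f"Check terminology in: {target!r} (missing: {', '.join(missing)})")
```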
And one more, less significant problem: outdated spelling.
The internet used to be a novelty, all texts wrote "Internet" with a capital letter, and when we trained our engine, "Internet" was capitalized everywhere. Now it is a new era and the internet is written with a lowercase letter. If you want your engine to keep up and write "internet" with a lowercase letter, you will have to retrain it.
We did not despair; we solved these problems. First, we changed the corpus of texts and tried translating other topics. We passed our comments to our colleagues from Yandex.Translator, they retrained the neural network, and we looked at the results, evaluated them, and asked for refinements, for example in tag recognition and HTML markup handling.
Let me show some real use cases. Machine translation works well for us on technical documentation. Here is a real case.
Here is a phrase in English and in Russian. The translator who worked on this documentation was very encouraged by the apt choice of terminology. Here is another example.
The translator appreciated the choice of "is" instead of a dash, the fact that the sentence structure was changed to a natural English one, the accurate choice of a term, and the word "you," which is not in the original but makes the translation genuinely natural English.
Another case is translating interface texts on the fly. One of our services decided not to bother with localization and to translate texts right at load time. But the engine changes roughly once a month, and the translation of the word "delivery" kept changing in rotation. We suggested that the team connect not the general neural network engine but ours, trained on technical documentation, so that the same term would always be used: the one agreed with the team and already used in the documentation.
How does all this affect the financial side? It has historically been the case that the Russian–Ukrainian pair needs only minimal editing of the Ukrainian translation, so a couple of months ago we decided to switch to a post-editing workflow. This is how our savings are growing. September is not over yet, but we estimate that we have already cut our post-editing costs for Ukrainian by about a third, and we plan to post-edit almost everything except marketing texts. Over to Irina to wrap up.
Irina:
- It is becoming obvious to everyone that machine translation has to be used, that it is already our reality, and that it cannot be excluded from our processes and interests. But there are a few things to think about.
First, decide on the types of documents and the context you are working with. Is this technology suitable for you?
Second: we talked about Yandex.Translator because we are on good terms with the team and have direct access to the developers, and so on, but in fact you need to decide which engine will be optimal specifically for you, for your language and your subject matter. The next talk will be devoted to that topic. Be prepared for the fact that there are still difficulties; engine developers are working on them together, but for now they still come up.
I would also like to look at what awaits us in the future. But in fact it is not the future, it is our present, what is happening here and now. We all need customization for our own terminology and our own texts, and this is now becoming publicly available. Everyone is working toward a setup where you do not have to go inside a company and negotiate with the developers of a specific engine about how to optimize it for you; you will be able to get this from public engines via an API.
Customization concerns not only the texts but also the terminology: tailoring the terminology to your own needs. This is quite an important point. The second topic is interactive translation: while the translator is translating, the technology predicts the next words based on the source language and the source text. This kind of suggestion can make the work much easier.
What exists now is still really expensive. Everyone is thinking about how to train engines far more efficiently, on smaller volumes of text. This is being worked on everywhere. I think the topic is very interesting, and it will only get more interesting.
We have collected several articles that may interest you. Thank you!
- Two models are better than one. The experience of Yandex.Translator
- How Yandex used artificial intelligence technology to translate web pages
- Machine translation. From the Cold War to deep learning