To blame or not to blame. Debriefing

Mikhail Daychik, Google Technical Program Manager.

Recently, an error was discovered in Google Translate: phrases of the same form, "X is to blame", were translated in one of three seemingly random ways. Two of them were variants of "to blame", and the third, most unpleasantly, was "not to blame".

The error could be seen, for example, when translating the following phrases:

“USA is to blame” - “USA is not to blame”
“Russia is to blame” - “Russia is to blame”
“Google is to blame” - “Google is not to blame”
“Italy is to blame” - “Italy is to blame”

To explain where this error might have come from, we should briefly describe how Google Translate works and how it fundamentally differs from other translation tools.

Conventional translators, as a rule, convert grammatical constructions from one language to another using rigidly prescribed rules.
An example of such a rule would be "if the original uses the present perfect tense, then the Russian translation must use the corresponding form of this verb."
These rules can be simpler or more complicated. Some recognize complex constructions and change the word order in the final text. But in any case, in traditional translators they are written by hand. This approach has its advantages and disadvantages. One of the disadvantages is the enormous amount of work required to cover so many languages with such rules.
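As a rough illustration of the hand-written approach described above, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the two rules, the tiny lexicon, the Russian source phrases and the `translate` helper are hypothetical and are not taken from any real translation system.

```python
import re

# Hypothetical hand-written rules: a pattern over the source sentence
# paired with a target-language template. Order matters: the more
# specific negated rule is tried first.
RULES = [
    (re.compile(r"^(?P<subj>\w+) не виноват[аыо]?$"), "{subj} is not to blame"),
    (re.compile(r"^(?P<subj>\w+) виноват[аыо]?$"), "{subj} is to blame"),
]

# Hypothetical lexicon for subjects; unknown words are passed through as-is.
LEXICON = {"Россия": "Russia", "Италия": "Italy", "США": "USA"}


def translate(sentence):
    """Return the translation from the first matching rule, or None."""
    for pattern, template in RULES:
        match = pattern.match(sentence)
        if match:
            subject = match.group("subj")
            return template.format(subj=LEXICON.get(subject, subject))
    return None  # no hand-written rule covers this sentence


print(translate("Россия виновата"))
print(translate("Google не виноват"))
```

Every new construction, and every new language pair, needs more rules of this kind written by a person, which is where the cost of the approach comes from.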

Google Translate is fundamentally different. We use a set of statistical heuristics, for example “this sequence of words is usually translated like this”, supplemented by a number of auxiliary rules that generalize over groups of words. There are far more of these rules than entries in a traditional dictionary, so they are not curated by hand but generated automatically.
Initially, to train Google Translate, we took a set of texts whose translations were as close as possible to the originals. Later, to improve the rules, we gave users the ability to send us their own translations of phrases that Translate had rendered incorrectly.
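The heuristic “this sequence of words is usually translated like this” can be sketched as a frequency table built from already-translated phrase pairs. This is a toy model only, not Google Translate's actual algorithm: the function names, the Russian source phrases and all the counts below are assumptions invented for the example.

```python
from collections import Counter, defaultdict


def build_phrase_table(aligned_pairs):
    """Count how often each source phrase was paired with each target phrase."""
    table = defaultdict(Counter)
    for source, target in aligned_pairs:
        table[source][target] += 1
    return table


def best_translation(table, source):
    """Pick the most frequently observed target phrase, or None if unseen."""
    candidates = table.get(source)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training pairs (made-up counts).
clean_pairs = (
    [("виноват", "is to blame")] * 5
    + [("не виноват", "is not to blame")] * 4
)
# The same data plus a few wrongly matched pairs.
noisy_pairs = clean_pairs + [("виноват", "is not to blame")] * 6

print(best_translation(build_phrase_table(clean_pairs), "виноват"))
print(best_translation(build_phrase_table(noisy_pairs), "виноват"))
```

In this toy model nobody writes the rule by hand, so nobody reviews it either: if enough wrongly matched pairs end up in the data, the most frequent choice flips to the opposite meaning.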

It is still not entirely clear where the rules that produced the “USA is not to blame” translation came from, but we assume they came from user suggestions.
In any case, the error was fixed as quickly as possible and, I hope, will not happen again.

Timeline of events:

16:28 I (Mikhail Daychik) received a message about a bug.
16:57 The bug was handed over to the support team. Since the support team is in a different time zone, it was the middle of the night for them.
17:32 The bug was escalated, and an engineer from the Translate technical support team was woken up to deal with it.
19:18 The fix was ready and testing began.
19:39 The patch began to roll out to data centers.
~20:10 The changes took effect worldwide.

Amendment: After clarifying the cause of the erroneous translation with the Google Translate team, we found that user suggestions had no effect on this translation. The error was in fact the result of the statistical algorithm incorrectly matching the phrase "to blame" while processing the training data.
