inlingo_l10n May 25, 2019 at 10:16 pm

How localization works at Netflix (translation)


Hello, Habr! Here is a translation of “Localization Technologies at Netflix”, an article by the Netflix team about its internal localization processes and the tools built specifically for them.

The localization program at Netflix is based on three principles: impeccable linguistics, a harmonious atmosphere in the team, and advanced technologies.

We are not afraid to experiment with new processes and tools or to challenge the accepted norms of localization; that is how we have come this far. Working at Netflix means being a pioneer.

In this article, we talk about two technologies that will lead us to WORLD domination...

Netflix Global String Repository

Netflix owes its success not only to quality content but also to the way we deliver it. A large part of that success is an intuitive, easy-to-use, localized user interface (UI). Netflix is available on many platforms: the web, Apple iOS, Google Android, Sony PlayStation, Microsoft Xbox, Sony and Panasonic TVs, and so on. Each platform has its own internationalization requirements, which is a serious challenge for our team.

Here are some cases in which UI localization is required:

adding a new language
adding new features
changing existing texts and data

Translating UI text is not a fully automated process. During translation, localization managers work with the development team to understand what each string refers to, which languages it must be translated into, and when the localized files are due. It gets much more complicated when several features are developed in parallel in different Git branches.

Once the translation is complete, the application is built, tested, and published to the platform. Some devices also require third-party approval (from Apple, for example). All of this delays releases, and urgent changes are especially painful.

But what if we made the localization process open to all stakeholders, developers and localizers alike? What if we no longer had to rebuild the application every time a text changes?

To solve these problems, we developed a global repository for UI strings called the Global String Repository. It stores localized strings, which applications can retrieve at run time. We integrated the Global String Repository into the localization process so that the two complement each other.

The Global String Repository isolates work through localization packages and namespaces. A package holds string data for all languages. A namespace is a placeholder for the packages a team is currently working on. During development, a default namespace is used. The workflow looks like this:

A developer changes the English version of a string in a package within a namespace
The translation process starts automatically
Linguists complete the translation
The translated strings become available in the package within that namespace
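
The package and namespace model above can be illustrated with a small in-memory sketch in Python. The class, its method names, the "signup-redesign" namespace, and the English fallback rule are all assumptions made for this example; the article does not describe the actual Global String Repository API.

```python
class StringRepository:
    """Toy model: namespace -> package -> string key -> locale -> text."""

    def __init__(self):
        self._data = {}

    def put(self, namespace, package, key, locale, text):
        # Create the nested levels on demand, then store the text.
        package_data = self._data.setdefault(namespace, {}).setdefault(package, {})
        package_data.setdefault(key, {})[locale] = text

    def get(self, namespace, package, key, locale, fallback_locale="en"):
        # Assumed behavior: fall back to English while a translation is pending.
        translations = self._data.get(namespace, {}).get(package, {}).get(key, {})
        if locale in translations:
            return translations[locale]
        return translations.get(fallback_locale)

    def pending_locales(self, namespace, package, key, target_locales):
        # Locales that still need work from linguists.
        translations = self._data.get(namespace, {}).get(package, {}).get(key, {})
        return [loc for loc in target_locales if loc not in translations]


repo = StringRepository()
# Step 1: a developer changes the English string in a feature namespace.
repo.put("signup-redesign", "signup", "cta.start", "en", "Start now")
# Steps 2-4: once linguists finish, the translation appears in the same package.
repo.put("signup-redesign", "signup", "cta.start", "de", "Jetzt starten")
```

Because each feature writes into its own namespace, parallel work in different Git branches does not overwrite strings that another team is still translating.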

An application can integrate with the Global String Repository in one of two ways:

At run time: changes reach the UI quickly
At build time: the Global String Repository is used only for localization, and the string data is packaged with the build

The Global String Repository supports build-time integration by exposing localized data through a REST API.

We expose the Global String Repository through the Netflix API, so it is subject to the same scaling requirements as our other metadata APIs. For applications that integrate at run time, it is a critical component. With 60 million users running Netflix on different devices, the Global String Repository is a priority.

Like the rest of Netflix, the Global String Repository has a microservice architecture. The microservice is a Java web application, backed by Apache Cassandra and Elasticsearch, that is hosted in three AWS regions. We collect statistics for each API request.

The Global String Repository UI is built with Node.js, Bootstrap, and Backbone and is hosted in AWS.

On the client side, the Global String Repository offers a REST API for retrieving data and a Java client with built-in caching.
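
To show what a client with built-in caching might look like, here is a minimal Python sketch. The real client is written in Java and its interface is not described in the article, so the class name, the time-to-live policy, and the injected fetch function (which stands in for the REST call) are all assumptions.

```python
import time


class CachingStringClient:
    """Caches string packages per (package, locale) for a limited time."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical stand-in for the REST request
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}         # (package, locale) -> (expires_at, strings)

    def get_strings(self, package, locale):
        cache_key = (package, locale)
        now = self._clock()
        entry = self._cache.get(cache_key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no network round trip
        strings = self._fetch(package, locale)
        self._cache[cache_key] = (now + self._ttl, strings)
        return strings


calls = []


def fake_fetch(package, locale):
    # Records each simulated request so cache hits can be observed.
    calls.append((package, locale))
    return {"cta.start": "Start now"}


fake_time = [0.0]
client = CachingStringClient(fake_fetch, ttl_seconds=60, clock=lambda: fake_time[0])
client.get_strings("signup", "en")
client.get_strings("signup", "en")   # served from the cache
```

A short time-to-live is one way to balance the two goals in the text: run-time integration still picks up changed strings quickly, while most lookups never reach the service.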

Although we have come a long way and are actively developing the Global String Repository, there is still room to improve. Here is what we are working on now:

Support for quantity strings (plurals) and gender-dependent strings
More resilient technical solutions
Better scalability
Export to multiple formats (Android XML, Microsoft .resx, and so on)
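
As an illustration of the export item, the sketch below renders a dictionary of strings in the Android string resource layout (a resources element containing named string elements). The function name and sample keys are made up for this example, and Android-specific escaping, such as apostrophes, is deliberately left out.

```python
import xml.etree.ElementTree as ET


def to_android_strings_xml(strings):
    """Render {key: text} as an Android-style <resources> document."""
    resources = ET.Element("resources")
    for key in sorted(strings):
        element = ET.SubElement(resources, "string", {"name": key})
        element.text = strings[key]
    # XML special characters such as "&" are escaped by ElementTree.
    return ET.tostring(resources, encoding="unicode")


english = {"signup_title": "Create account", "terms": "Terms & Conditions"}
xml_text = to_android_strings_xml(english)
```

Other formats, such as .resx, would need their own writers on top of the same key-to-text data.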

The Global String Repository is not tied to the Netflix business domain, so we plan to release it as open source software.

Hydra

Netflix is a global service that supports many locales in countless combinations across devices and UIs, so manual testing does not scale. Previously, localizers and UI developers tested everything by hand on different devices, from consoles to iOS and Android. That is how we checked that every string fit its context and the UI (for example, whether any text was truncated).

But Netflix's philosophy is to strive for excellence, which pushes us to rethink how we work. That is how Hydra was born.

The idea behind Hydra is to catalog every possible unique screen and let anyone view exactly the set of screens they need by filtering, for example, on device and locale. A German localization specialist, for instance, can set up filters to see the entire flow that unregistered users go through on the PS3, the website, and Android. The same screens can be reviewed at the pace at which a user would open them on their device.

Working with screens in Hydra

Hydra does not capture screens itself; it only catalogs and displays them. To get screens into the Hydra catalog, we use our UI automation. With Jenkins CI, data-driven tests run in parallel in all supported locales and create screenshots that are published to Hydra with the appropriate metadata (page name, feature area, UI platform, and one critical piece of metadata: a unique screen definition).

The unique screen definition makes it possible to build a complete catalog of screens without false matches. It also makes it easier to compare screens over time, because each screen image is compared only with earlier images of the same screen. The definition of a unique screen differs from UI to UI; for a browser, it is a combination of page name, browser, resolution, locale, and development environment.
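
One way to picture the unique screen definition for the browser UI is as a composite key. The Python sketch below assumes the five browser fields listed above; the class names, field names, and image reference strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrowserScreenId:
    """Hypothetical unique screen definition for the browser UI."""
    page_name: str
    browser: str
    resolution: str
    locale: str
    environment: str


class ScreenCatalog:
    def __init__(self):
        # One entry per unique screen; each holds its images, oldest first.
        self._screens = {}

    def publish(self, screen_id, image_ref):
        self._screens.setdefault(screen_id, []).append(image_ref)

    def history(self, screen_id):
        return list(self._screens.get(screen_id, []))

    def __len__(self):
        return len(self._screens)


catalog = ScreenCatalog()
login_da = BrowserScreenId("login", "chrome", "1280x720", "da-DK", "test")
login_de = BrowserScreenId("login", "chrome", "1280x720", "de-DE", "test")
catalog.publish(login_da, "build-101/login.png")
catalog.publish(login_da, "build-102/login.png")  # same screen, newer image
catalog.publish(login_de, "build-102/login.png")  # different locale, new screen
```

Two runs of the same screen land under one key, which is what keeps the catalog free of duplicates and gives each screen a history to compare against.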

Technology

Hydra is a full-stack web application hosted in AWS. The Java back-end has two main functions: it processes incoming screenshots and serves data to the front-end via the REST API.

When UI automation sends a screen to Hydra, the image file itself is written to S3, which provides practically unlimited storage, and the much smaller metadata is written to an RDS database so that it can later be queried via the REST API. The REST endpoints map query string parameters to MySQL queries.

For example:

REST/v1/lists/distinctList?item=feature&selectors=uigroup,TVUI;area,signupwizard;locale,da-DK

The parameters of this request select the required data from the database:

select distinct feature where uigroup = 'TVUI' AND area = 'signupwizard' AND locale = 'da-DK'
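
The mapping from query string to query can be sketched in Python as follows. This is not Hydra's code: the table name "screens", the column whitelist, and the sample rows are assumptions, and SQLite stands in for MySQL so that the example runs without a database server.

```python
import sqlite3
from urllib.parse import unquote

TABLE = "screens"  # hypothetical table name
ALLOWED_COLUMNS = {"feature", "uigroup", "area", "locale"}


def build_distinct_query(query_string):
    """Turn 'item=...&selectors=col,value;col,value' into SQL plus parameters."""
    params = {}
    for part in query_string.split("&"):
        name, _, value = part.partition("=")
        if name:
            params[name] = unquote(value)
    item = params["item"]
    if item not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {item}")
    conditions, values = [], []
    for selector in params.get("selectors", "").split(";"):
        if not selector:
            continue
        column, _, value = selector.partition(",")
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        conditions.append(f"{column} = ?")  # values are bound, never interpolated
        values.append(value)
    sql = f"SELECT DISTINCT {item} FROM {TABLE}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, values


sql, values = build_distinct_query(
    "item=feature&selectors=uigroup,TVUI;area,signupwizard;locale,da-DK"
)

connection = sqlite3.connect(":memory:")
connection.execute(f"CREATE TABLE {TABLE} (uigroup TEXT, area TEXT, locale TEXT, feature TEXT)")
connection.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?)",
    [
        ("TVUI", "signupwizard", "da-DK", "planSelection"),
        ("TVUI", "signupwizard", "da-DK", "planSelection"),
        ("TVUI", "signupwizard", "da-DK", "payment"),
        ("TVUI", "signupwizard", "de-DE", "giftcard"),
        ("WEB", "signupwizard", "da-DK", "login"),
    ],
)
features = sorted(row[0] for row in connection.execute(sql, values))
```

Column names are checked against a whitelist because they cannot be bound as parameters, while the values are passed as bound parameters to avoid SQL injection.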

The JavaScript front-end, built with Knockout.js, lets users select filters and view the screens that match them. Both the filter contents and the matching screens come from calls to the REST endpoints described above.

Application scaling

Once Hydra is set up and the automation is running, adding a new locale is as easy as adding one line to an existing properties file, which feeds a Data Provider in the TestNG framework. Screens in the new locale appear after the next Jenkins builds run.
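
The idea of a properties file driving data-driven tests can be sketched in Python as below. The real setup uses TestNG in Java; the "locale." key prefix, the file contents, and the page names here are invented for illustration.

```python
from itertools import product

# Hypothetical properties file: one line per supported locale.
PROPERTIES = """\
# supported locales
locale.en-US=English (United States)
locale.de-DE=German (Germany)
"""

PAGES = ["signup", "login"]  # hypothetical pages covered by the automation


def parse_locales(properties_text):
    """Collect locale codes from keys of the form 'locale.<code>'."""
    locales = []
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, _value = line.partition("=")
        if key.strip().startswith("locale."):
            locales.append(key.strip()[len("locale."):])
    return locales


def screenshot_cases(properties_text, pages):
    # Plays the role of a data provider: one test case per (locale, page).
    return list(product(parse_locales(properties_text), pages))


before = screenshot_cases(PROPERTIES, PAGES)
after = screenshot_cases(PROPERTIES + "locale.da-DK=Danish (Denmark)\n", PAGES)
```

Adding the single da-DK line is enough to generate a test case for every page in the new locale; no test code changes.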

What's next?

We need a feature that notifies us when a screen has changed. At the moment, when a string changes, nothing reports it automatically. With such notifications, Hydra could act as a kind of work queue: localization specialists would log in and see only the set of screens that have changed.
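
A naive way to detect changed screens is to compare a digest of each new screenshot with the digest stored for the same screen. The Python sketch below is only an illustration of that idea, with made-up screen identifiers and byte strings in place of real images; a byte-exact comparison would flag any pixel difference, so a real system would likely need something more tolerant.

```python
import hashlib


def image_digest(image_bytes):
    return hashlib.sha256(image_bytes).hexdigest()


def review_queue(known_digests, new_screens):
    """Return the screens whose image is new or differs from the stored digest."""
    queue = []
    for screen_id, image_bytes in new_screens.items():
        new_digest = image_digest(image_bytes)
        if known_digests.get(screen_id) != new_digest:
            queue.append(screen_id)
            known_digests[screen_id] = new_digest  # remember the latest version
    return sorted(queue)


known = {}
first_run = review_queue(known, {"login/da-DK": b"image-v1", "signup/da-DK": b"image-v1"})
second_run = review_queue(known, {"login/da-DK": b"image-v1", "signup/da-DK": b"image-v2"})
```

After the second run, only the screen whose image changed would be queued for review.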

Another planned feature is mapping individual string keys to the screens on which they appear. A translator could then change a string, search by its key, and see the screens affected by the change, previewing in advance how the string looks in context.

We are not afraid to tackle complex problems. Netflix is becoming a global service, and our localization team is growing. Challenges like these help us attract the most talented people and build a team that can do what is considered impossible.
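
The key-to-screen mapping can be thought of as an inverted index. In the Python sketch below, the screen names and string keys are hypothetical, and the input (which keys each screen displays) is assumed to be recorded by the UI automation.

```python
from collections import defaultdict


def build_key_index(screen_keys):
    """Invert {screen: [string keys]} into {string key: {screens}}."""
    index = defaultdict(set)
    for screen, keys in screen_keys.items():
        for key in keys:
            index[key].add(screen)
    return index


def screens_for_key(index, key):
    # .get avoids creating empty entries for unknown keys.
    return sorted(index.get(key, ()))


index = build_key_index({
    "signup/step1": ["cta.start", "legal.terms"],
    "signup/step2": ["cta.continue", "legal.terms"],
    "login": ["cta.signin"],
})
affected = screens_for_key(index, "legal.terms")
```

A search for a key then returns every screen a translator should check after editing that string.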
