An Alternative Way to Localize Websites: Content Mutated by a CDN

Introduction


Most web developers have at some point had to translate a website into several languages. The task is fairly simple, and the solution is usually routine. I am sure many will agree that localization is a boring, uncreative part of a project.

In this article, I would like to put forward an alternative model of website translation for discussion. In one sentence, the principle is: a CDN that translates content on its way between the user and the origin server.

The need for translations


I doubt that the value of a multilingual website needs proving, but I will nevertheless devote one short paragraph to it.

Any website is, by default, accessible to three billion Internet users, simply because it is on the Internet. If you sell something on your site, then by adding a language you effectively enter a new market. At a minimum, you will want a version in the local language (the language of the territory where you do business) and an English version, because according to W3Techs, English accounts for half of all Internet content.

Existing methods


Translation Files


There are different options, from special formats such as GNU gettext to simple text files that your framework can read. The result is roughly the same: when text is output, a function is called that looks up the translation in a dictionary.

PHP example:

// gettext:
echo _("Привет, мир!");
// Laravel 5 framework
echo trans('common.hello_world');
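To make the lookup step concrete, here is a minimal sketch of what such a function does with Laravel-style dotted keys. The in-memory $dictionaries array and the lookup_translation() name are hypothetical; real frameworks load dictionaries from files and add caching, pluralization and placeholder replacement on top.

```php
<?php
// Hypothetical in-memory dictionaries: locale => file ("common") => key => text.
// Real frameworks load these from files such as lang/ru/common.php.
$dictionaries = [
    'en' => ['common' => ['hello_world' => 'Hello, world!']],
    'ru' => ['common' => ['hello_world' => 'Привет, мир!']],
];

/**
 * Looks up a dotted key like "common.hello_world" for the given locale.
 * Returns the key itself when no translation exists, so missing strings stay visible.
 */
function lookup_translation(array $dictionaries, string $locale, string $key): string
{
    $node = $dictionaries[$locale] ?? [];
    // Descend one level per dot-separated segment
    foreach (explode('.', $key) as $segment) {
        if (!is_array($node) || !array_key_exists($segment, $node)) {
            return $key;
        }
        $node = $node[$segment];
    }
    return is_string($node) ? $node : $key;
}
```

Returning the key for a missing entry mirrors what Laravel's trans() does, which makes untranslated strings easy to spot on the page.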

Advantages of the method:

  • A method proven over decades (read: familiar);
  • Individual files can easily be handed to third-party translators;
  • Little impact on the code.

Cons of the method:

  • As a rule, changes cannot be made on the fly;
  • gettext dictionaries need to be compiled, and the resulting files committed to the repository;
  • Managing texts is relatively inconvenient and slow: the larger the project, the more files and the more confusing the hierarchy;
  • There are no standard mechanisms for a team of translators to work on translations together;
  • As a rule, two schemes are used in parallel: translations for the backend and translations for the frontend (JavaScript);
  • From time to time, HTML tags end up in the translation strings, because it was inconvenient for programmers to pull them out.

Database translations


Unlike the first method, translations are stored in the project database, which allows adjustments on the fly. As a rule, the developers have to build a translation management panel for administrators, which takes extra time.

Advantages of the method:

  • Easier to organize teamwork;
  • Changes can be made on the fly.

Cons of the method:

  • It is harder to hand sections over to third-party translators;
  • Frontend texts are still translated separately from backend texts;
  • From time to time, HTML tags end up in the translation strings, because it was inconvenient for programmers to pull them out.

Client-side translation via JavaScript


A relatively new method, currently offered by several Western startups. All you have to do is include an external JavaScript file, which then replaces texts in the DOM based on translations provided (or approved) in advance.

Advantages of the method:

  • Easy installation with little to no programming;
  • Frontend and backend are translated simultaneously from one translation repository;
  • There are no HTML tags in the text repository, because all texts are extracted from the DOM after the fact.

Cons of the method:

  • Search engines do not see the additional languages;
  • Sharing a link to a translated page on social networks is also impossible;
  • Additional network load (read: risk of delays) when the site is opened.

CDN translator


This is exactly what I am putting up for discussion in this article. What if we insert a “layer” between users and sites: an edge server capable of translating web content? Services like CloudFlare can already make small changes to client pages, for example, injecting the Google Analytics code. What if we go a step further and allow users to replace texts and links?

Conventional CDN behavior:

  1. The client requests address X;
  2. If X is in the cache, it is returned from the cache immediately;
  3. If X is not in the cache, the edge server requests it from the origin site and returns the response to the client. Depending on the origin's response headers and the rules configured for the site, resource X may then be cached.



CDN Translator Behavior:

  1. The client requests address X;
  2. If X is in the cache, it is returned from the cache immediately, as is;
  3. If X is not in the cache, the edge server requests it from the origin site and then applies the mutation rules: it replaces links and substitutes translated texts. Depending on the origin's response headers and the rules configured for the site, resource X may then be cached.
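As a rough illustration of these three steps, the flow could be sketched in PHP as follows. The TranslatingEdge class, its in-memory array cache and the two closures (one fetching from the origin, one applying the mutation rules) are assumptions made for this sketch; a real edge server would use shared cache storage and interpret the origin's caching headers in detail.

```php
<?php
/**
 * Sketch of the CDN translator flow. Assumptions: an in-memory array stands in
 * for the edge cache, and both closures are hypothetical hooks supplied by the caller.
 */
final class TranslatingEdge
{
    /** @var array<string, string> translated bodies keyed by URL */
    private array $cache = [];

    public function __construct(
        private \Closure $fetchOrigin, // fn(string $url): array{body: string, cacheable: bool}
        private \Closure $mutate       // fn(string $body): string, replaces links and texts
    ) {}

    public function handle(string $url): string
    {
        // Step 2: a cached entry is already translated, so it is served as is
        if (isset($this->cache[$url])) {
            return $this->cache[$url];
        }
        // Step 3: fetch from the origin and apply the mutation rules
        $origin = ($this->fetchOrigin)($url);
        $body = ($this->mutate)($origin['body']);
        // Cache only what the origin allows to be cached
        if ($origin['cacheable']) {
            $this->cache[$url] = $body;
        }
        return $body;
    }
}
```

Because the translated result is what gets cached, the expensive mutation runs once per resource rather than once per request.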

Step 3 in detail


After receiving a response from the origin site, the edge server has to decide how to translate it. Suggested tactics:

  1. Check the Content-Type header. If its value is not in the list of supported types, do not try to transform the content;
  2. Check the size of the response. If it exceeds a configured limit, do not try to transform the content;
  3. Parse and edit the content. For an HTML page, for example: walk through all DOM nodes that have a child text node and request the translated text from the repository, passing the source text and its context as parameters;
  4. Having replaced the necessary pieces of content, return the result to the user. If the headers and rules allow it, cache the result.
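Tactics 1-3 could be sketched with PHP's built-in DOM extension. The translate_response() name, the size limit, the supported-type list and the $lookup callable (standing in for the repository API) are assumptions for this sketch, and the context passed to the repository is simplified to "URL path:parent tag name". Caching (tactic 4) is left out.

```php
<?php
// Assumed limits for this sketch; a real service would configure them per site.
const SUPPORTED_TYPES = ['text/html'];
const MAX_BODY_BYTES = 2 * 1024 * 1024;

/**
 * Applies tactics 1-3 to an origin response and returns the (possibly) translated body.
 * $lookup is a hypothetical repository client: fn(string $text, string $context): ?string,
 * returning null when the repository has no translation.
 * Note: DOMDocument::loadHTML() reads input as ISO-8859-1 unless the page declares
 * its charset (e.g. <meta charset="UTF-8">); this sketch assumes pages do.
 */
function translate_response(string $contentType, string $body, string $path, callable $lookup): string
{
    // Tactic 1: only transform supported types (ignore parameters such as "; charset=UTF-8")
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    if (!in_array($mime, SUPPORTED_TYPES, true)) {
        return $body;
    }
    // Tactic 2: leave empty or oversized responses untouched
    if ($body === '' || strlen($body) > MAX_BODY_BYTES) {
        return $body;
    }
    // Tactic 3: parse the page, tolerating the invalid markup common on real sites
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every non-blank text node outside <script> and <style>
    $query = '//text()[normalize-space()][not(ancestor::script or ancestor::style)]';
    foreach ($xpath->query($query) as $node) {
        $source = trim($node->data);
        // Simplified context: "URL path:parent tag name"
        $context = $path . ':' . $node->parentNode->nodeName;
        $translated = $lookup($source, $context);
        if ($translated !== null) {
            // Replace only the trimmed text, keeping the surrounding whitespace
            $node->data = str_replace($source, $translated, $node->data);
        }
    }
    return $doc->saveHTML();
}
```

Skipping script and style nodes matters: their text is code, and "translating" it would break the page.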

It would be logical to implement the repository as a standalone RESTful API, and it would be convenient to specify the context in the form URL:selector. For example, if we want to translate “Main page” as “Home” in the head block of any page whose URL starts with /news, the context is “/news*:head”. The world is so used to CSS/jQuery-style selectors that almost any developer will be able to pick up this syntax right away.
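The URL:selector context format lends itself to a simple matcher. In this sketch, the hypothetical context_matches() function uses PHP's fnmatch() for the URL part and, as a deliberate simplification, treats the selector as a single tag name that must appear among the text node's ancestors; supporting full CSS selectors would require a CSS-to-XPath converter.

```php
<?php
/**
 * Checks whether a translation context like "/news*:head" applies to a text node,
 * given the page path and the tag names of the node's ancestors (innermost first).
 * Simplification (assumption): the selector part is a single tag name, not full CSS.
 */
function context_matches(string $context, string $path, array $ancestorTags): bool
{
    // Split at the first colon: URL pattern on the left, selector on the right
    $pos = strpos($context, ':');
    if ($pos === false) {
        // No selector: the context applies to the whole page
        return fnmatch($context, $path);
    }
    $urlPattern = substr($context, 0, $pos);
    $selector = strtolower(substr($context, $pos + 1));
    // Without FNM_PATHNAME, "*" also matches "/", so "/news*" covers nested paths
    if (!fnmatch($urlPattern, $path)) {
        return false;
    }
    return in_array($selector, array_map('strtolower', $ancestorTags), true);
}
```

Splitting at the first colon keeps pseudo-classes such as :first-child usable in the selector part, as long as URL patterns themselves contain no colon.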

Since the edge server fetches translations from the repository API, it is only logical to also provide SDKs and packages for popular languages and frameworks. Website owners then get a choice: translate content via the CDN, or through our class in their existing code.

Suppose we have a PHP application built on the Laravel framework. Supporting existing code is trivial: we redeclare the trans() helper function with our own implementation, which looks up translations not in local text files but in the remote API. To avoid a delay on every request, we use a cache or a separate proxy process.
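The cached remote lookup described above could look roughly like this: a small client that asks the API and keeps answers in an in-process cache with a time-to-live. RemoteTranslator, its fetch closure (standing in for the HTTP call to the repository) and the default TTL are assumptions for illustration, not an existing package.

```php
<?php
/**
 * Hypothetical client for the remote translation repository. Answers are kept
 * in an in-process cache so repeated lookups do not hit the API every time.
 */
final class RemoteTranslator
{
    /** @var array<string, array{value: string, expires: int}> */
    private array $cache = [];

    /**
     * @param \Closure $fetch hypothetical API call: fn(string $key, string $locale): ?string
     */
    public function __construct(private \Closure $fetch, private int $ttlSeconds = 300) {}

    public function get(string $key, string $locale, ?int $now = null): string
    {
        $now ??= time();
        $cacheKey = $locale . '|' . $key;
        $hit = $this->cache[$cacheKey] ?? null;
        // Serve from cache while the entry is still fresh
        if ($hit !== null && $hit['expires'] > $now) {
            return $hit['value'];
        }
        // Fall back to the key itself, as Laravel's trans() does for missing strings
        $value = ($this->fetch)($key, $locale) ?? $key;
        $this->cache[$cacheKey] = ['value' => $value, 'expires' => $now + $this->ttlSeconds];
        return $value;
    }
}

// A redeclared trans() helper would delegate to RemoteTranslator::get() (wiring not shown).
```

An in-process cache only lives as long as the PHP worker, which is why a shared cache or a separate proxy process is the more realistic option for production.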

Similarly, we can change the contents of JavaScript objects, graphics, and so on.

Advantages of the method:

  • Complete separation of the application from its translations: the application does not even know other language versions exist, and programmers can focus on the main product;
  • Backend and frontend content is translated simultaneously from one translation repository;
  • Images can easily be translated as well;
  • It is very easy to serve translated versions of the site on other (separate) domains;
  • Compatible with any existing CDN service; the two can be chained;
  • Compatible with search engines and social networks;
  • There are no HTML tags in the text repository, because all texts are extracted from the DOM after the fact;
  • Teamwork is easy to organize.

Cons of the method:

  • I could not find any, but I would be very glad for help with this!

YouTube video


To explain the concept clearly, I recorded a very short video showing my prototype of such a translation system. The narration is in English, with Russian subtitles.



Implementation


I have already checked that the proposed method is feasible and practical: I wrote a primitive version of the edge application in PHP with Lumen.

Here is my controller method, which receives a request from the user and returns the translated response:

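/**
 * Receives a request from the user, fetches the page from the origin site
 * and returns the translated response.
 *
 * @param Request $request
 * @param WebClientInterface $crawler
 * @param MutatorInterface $mutator
 * @param TranslatorInterface $translator
 * @return Response
 */
public function show(Request $request, WebClientInterface $crawler, MutatorInterface $mutator, TranslatorInterface $translator)
{
    // $request->client['origin'] is the origin site's base URL, attached to the request earlier (e.g. by middleware).
    // $request->url() drops the query string, so it is appended separately.
    $query = $request->getQueryString();
    $url = $request->client['origin'] . parse_url($request->url(), PHP_URL_PATH)
        . ($query !== null ? '?' . $query : '');
    // Fetch the page from the origin; a failed fetch becomes 502 Bad Gateway
    $response = $crawler->makeRequest($request->getMethod(), $url);
    if ($response === false) {
        abort(502);
    }
    $mutator->initWithWebRequest($response);
    // Translate texts only for responses that can be translated
    if ($response->isTranslatable()) {
        $mutator->translateText($translator);
    }
    // Cache the result only if the response is cacheable
    if ($response->isCacheable()) {
        $mutator->cache(60);
    }
    // Make links point at the edge host instead of the origin
    $mutator->replaceLinks($request->client['origin'], $request->getSchemeAndHttpHost());
    return (new Response($mutator->getBody(), $mutator->getStatusCode()))
        ->withHeaders($mutator->getHeaders());
}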
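The implementation of replaceLinks() is not shown here. One plausible way to do such a step, sketched as a hypothetical standalone function, is to rewrite absolute URLs that point at the origin so that they point at the edge host instead; relative links already resolve against the edge host and need no change.

```php
<?php
/**
 * Hypothetical link rewriter: replaces absolute URLs pointing at $origin
 * (e.g. "https://example.com") with $edge (e.g. "https://ru.example.net").
 */
function replace_links(string $body, string $origin, string $edge): string
{
    $origin = rtrim($origin, '/');
    $edge = rtrim($edge, '/');
    // The lookahead stops "https://example.com" from matching "https://example.community"
    $pattern = '~' . preg_quote($origin, '~') . '(?![\w.-])~i';
    // A callback avoids treating "$" or "\" in the edge URL as backreferences
    return preg_replace_callback($pattern, fn (array $m): string => $edge, $body);
}
```

A regular expression is enough for a sketch; a production version would more likely rewrite href and src attributes through the same DOM pass that translates the texts.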

I am sure many will doubt this paradigm because of the CPU load: after all, the very reason nginx is reluctant to mutate response contents is that doing so would seriously hurt performance. In general, translating like this, after the fact, is certainly more expensive in terms of resources.

My arguments are as follows. First, IT resources have been getting steadily cheaper over the last 5-10 years; this is the era of $5 servers, and for many sites a modest increase in load is not a big deal. Second, if I do take on this project, performance optimization will be one of its priorities. There are surely plenty of places to improve!

Conclusion


The industry always moves toward optimization, convenience and cost savings. I believe the proposed method of localizing web applications may well become the dominant one within 5-10 years.

More broadly, the CDN as an architecture may find ever more new applications: CloudFlare brought DDoS protection to the world, and Imgix generates responsive images on the fly.
