TigraSan October 28, 2013 at 01:54

An i18n recipe. Base: Babel, json with coffee and grunt, hbs to taste

Tutorial

In my previous post I wrote about why I had to make pybabel-hbs, an extractor of gettext strings from handlebars templates.

A little later I needed to extract the same from json.
This is how pybabel-json came about.
pip install pybabel-json or on github

It uses the javascript lexer built into babel. There were nuances there too, but this post is not about them: the code is less interesting than that of the hbs plugin and hardly deserves attention.

This post is about what the whole localization toolchain looks like from start to finish, including what to do with data from the database or from some other not quite static place.
From start to finish includes:
(I must say that none of the items is mandatory; each of them can easily be plugged into any application on its own, as needed)

- Babel. A set of utilities for localizing applications.
- Grunt. A task runner.
- coffeescript. It needs no introduction; all the client code is written in coffee, and strings have to be extracted from it too.
- handlebars. Templates.
- json. String storage.
- Jed. A gettext client for js.
- po2json. A utility that converts .po files into the .json format supported by Jed.

A bit about gettext and myths

gettext started out as a set of utilities for localizing applications; today I would also call gettext a generally accepted format (generally accepted, not the only one).
The bare essence can be described like this: there are strings in English that pass through a gettext function and come out as strings in the target language, following that language's rules for plural forms, plus the ability to specify a context and a domain.
It is important to note that the strings themselves are the keys, rather than some constant like USER_WELCOME_MESSAGE that gets turned into text somewhere.

Not everyone needs context, and I have not implemented it in my babel plugins yet; if you need it, pull requests are welcome.
There will be a couple of words about domains later.
ngettext, on the other hand, is absolutely necessary for many, if not for all.
And now on to the myths.

Ноль яблок. Zero apples
Одно яблоко. One apple
Два яблока. Two apples
Пять яблок. Five apples

This simple example should show all fans of language constants a la “USER_WELCOME_MESSAGE”, which are then handed over for translation, that things are not as simple as they seem at first glance.

The rules predefined and described in babel decide which string is selected.
For example, this is the rule for English:

"Plural-Forms: nplurals=2; plural=(n != 1)\n"

And this is for Russian:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && "
"n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)\n"

The great and mighty Russian language :)
Don't be afraid, you won't have to write this by hand for, say, Japanese.

So, about the myths.
Several times I have heard the opinion that you can build the main site in Russian, wrap the Russian strings in gettext calls, and add English later.
If you have your own crutches built on those same language constants, have no sentences that decline with numbers, and use an ugly format like “You have apples: 1”, then of course you can make Russian the base language.
If you want to show the user slightly nicer messages, such as “You have 1 apple”, “You have 7 apples”, then the base language should be English.

Why? It's all about the apples.
The plural is not always one and the same form, and the singular form is not used only for the number one.
English is simple in this regard; Russian is not.

ngettext by default expects exactly English as the key. Moreover, ngettext accepts only two strings as input, singular and plural, not an array of plural forms.

So if you still want Russian as the default, you will at least have to maintain a Russian-to-Russian translation file in which the string “You have %s apples” turns into the correct declension. Yes, it can be done, but it is crooked.
When you change a string, you will have to remember that you changed only the key, not the Russian text, and go edit the Russian language file in parallel. In short, don't do this. ngettext works best with English as the source language.

While we are at it, here is what .po entries look like, first for Russian and then for English:

msgid "You have %(apples_count)d apple"
msgid_plural "You have %(apples_count)d apples"
msgstr[0] "У вас %(apples_count)d яблоко"
msgstr[1] "У вас %(apples_count)d яблока"
msgstr[2] "У вас %(apples_count)d яблок"

msgid "You have %(apples_count)d apple"
msgid_plural "You have %(apples_count)d apples"
msgstr[0] ""
msgstr[1] ""

That is, the number of msgstr entries depends on the language configuration. There may well be a language with a dozen plural forms ...
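To make the link between the Plural-Forms header and the msgstr[N] entries more tangible, here is a minimal Python sketch. The two rule functions are hand-translated from the headers quoted above, and the names plural_en, plural_ru, RU_FORMS and pick_form are invented for this illustration; real code would let gettext or Babel evaluate the header instead.

```python
# Hand-written equivalents of the two Plural-Forms expressions quoted above.
# All names here are illustrative; gettext/Babel normally do this for you.


def plural_en(n):
    """English: index 0 for exactly one item, index 1 otherwise."""
    return int(n != 1)


def plural_ru(n):
    """Russian: three forms, chosen by the last one or two digits of n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# msgstr[0], msgstr[1], msgstr[2] from the Russian .po entry above.
RU_FORMS = [
    "У вас %(apples_count)d яблоко",
    "У вас %(apples_count)d яблока",
    "У вас %(apples_count)d яблок",
]


def pick_form(forms, rule, n):
    """Select the msgstr whose index the rule returns, then fill it in."""
    return forms[rule(n)] % {"apples_count": n}


if __name__ == "__main__":
    for n in (0, 1, 2, 5, 11, 21, 22):
        print(pick_form(RU_FORMS, plural_ru, n))
```

Note that 21 lands on the same form as 1, while 11 does not, which is exactly why a plain singular/plural pair is not enough for Russian.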

OK, So Where Do I Start?

Anyone who still shows users wrongly declined messages about their 3 apples should by now be motivated enough to start:

pip install babel

The hard part is behind us.

What remains:
- Replace all text in the code with gettext calls
- Run babel over the code
- From the resulting .pot file, create a .po file for each target language.

So what actually needs translating?

The question is not as simple as it seems at first glance.

The simple part is templates and code:
- Django and flask templates: extractors exist.
- Python and javascript: supported by babel out of the box.
- handlebars and json: I had to write the extractors myself, see the links at the beginning of the post.
- coffeescript: the recipe is further below.
- Everything else: google will help.

Once again, the simple part is the code: all strings need to be wrapped in gettext / ngettext calls in the form each extractor requires. As a rule, extractors also let you override which function names they look for.
For example, I have this:

pybabel extract -F babel.cfg -o messages.pot -k "trans" -k "ntrans:1,2" -k "__" .

trans and ntrans are specified for javascript, and __ for python, where this function is used to pass a string through transparently (more on this later).

That is, every
print("apple") must become print(gettext("apple"))
and every
print("I have %s apples") must become print(ngettext("I have %s apple", "I have %s apples", num_of_apples) % num_of_apples)

I must note here that I never use unnamed parameters and do not recommend them to anyone.
In my case only named ones are used, so it should look like this:

Python:

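from gettext import gettext, ngettext

num_of_apples = 3  # example value

print(gettext("I have an apple!"))
# ngettext picks the form; the named parameter is then filled in with %
print(ngettext(
    "I have %(apples_count)d apple",
    "I have %(apples_count)d apples",
    num_of_apples
) % {"apples_count": num_of_apples})

The standard gettext is used here; flask and django have their own wrappers.

Javascript: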

console.log(i18n.trans("I have an apple!"))
console.log(i18n.ntrans("I have %(apples_count)d apple","I have %(apples_count)d apples",num_of_apples,{apples_count:num_of_apples}));

Here and in coffee, proxies to the Jed methods are used, taken from here:
github.com/tigrawap/pybabel-hbs/blob/master/client_side_usage/i18n.coffee
Parameters are substituted into the string by the sprintf built into Jed

Coffeescript:

console.log i18n.trans "I have an apple!"
console.log i18n.ntrans "I have %(apples_count)d apple", "I have %(apples_count)d apples", num_of_apples, 
        apples_count:num_of_apples

Handlebars:

{{#trans}}
I have an apple!
{{/trans}}
{{#ntrans num_of_apples apples_count=num_of_apples}}
  I have %(apples_count)d apple
{{else}}
   I have %(apples_count)d apples
{{/ntrans}}

JSON string store:

{
    "anykey": "I have an apple!",
    "another_any_key": {
        "type": "gettext_string",
        "funcname": "ngettext",
        "content": "I have %(apples_count)d apple",
        "alt_content": "I have %(apples_count)d apples"
    }
}

Offtopic: this format is explained in the pybabel-json documentation.

As you have probably noticed, num_of_apples is repeated twice in every call.
The reason is that it is passed once as the argument to ngettext, which uses it to decide which string to pick, and a second time as a parameter of the string itself, alongside any other parameters substituted into that string.
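The two roles of the number can be seen in a small Python sketch. The helper ntrans below is hypothetical: it borrows its name from the javascript proxy, but it is not part of gettext or of any of the plugins mentioned here. It assumes the standard library gettext, which simply falls back to the English strings when no catalog is installed.

```python
from gettext import ngettext


def ntrans(singular, plural, n, **params):
    """Hypothetical helper: n picks the form, params fill in the string."""
    # First use of the number: choose singular or plural (or a translation).
    template = ngettext(singular, plural, n)
    # Second use: the number is just one of the named parameters.
    return template % params


if __name__ == "__main__":
    for count in (1, 7):
        print(ntrans(
            "I have %(apples_count)d apple",
            "I have %(apples_count)d apples",
            count,
            apples_count=count,
        ))
```

Keeping the two uses separate is deliberate: the string may contain other parameters, and the number that selects the form does not have to appear in the text at all.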

- As I said, this is the simple part: wrapping the existing text.
Next, you need to:

1) Replace all buttons whose labels are baked into images with buttons that carry real text. Everyone knows that image buttons with text are bad. But often you have to put up with them, because it is faster and the designer wants it that way :)
- This item should be clear: tedious, but necessary

2) A much more interesting point: what to do with strings that look constant but are not quite constant?
As an example, here is our case: genres for songs. They look dynamic, since they are stored in the database, but in fact they are rarely changing static data that would be nice to pull out and send for translation.

This is exactly the need that gave rise to pybabel-json.
The same approach solves other translation problems too, such as error messages returned by a third-party server. You could call them static, but it is static text that we do not control and that needs to be neatly wrapped for translation.
All that is needed is to create a .json file, e.g.
errors.json
with the content

{
    "from_F_service": [
        "Connection error",
        "Access denied"
    ],
    "from_T_service": [
        "Oops, it is too long"
    ]
}

The strings get no keys of their own, just plain arrays of strings.
The worst that can happen if the service changes a message is that the user sees the untranslated version. As a rule, that is a trifle.

With data in the database the situation is similar. In your build-push-deploy system, whatever it is (you do have one, right?), at the same level as the commands that assemble everything and run babel, add a script before those commands that pulls the needed data out of the database and builds a similar json; the babel run that follows will then pick the data up.
Needless to say, such files should be added to .gitignore or its equivalent, so that they never end up in source control.
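A pre-extraction script of that kind could look roughly like the Python sketch below. Everything specific in it is an assumption made for the example: the genres table, its name column, the sqlite3 stand-in database and the dump_strings function name. The output has the same shape as errors.json above, a group name mapped to a plain array of strings.

```python
import json
import os
import sqlite3
import tempfile


def dump_strings(conn, query, group, path):
    """Write the first column of `query` to `path` as {group: [strings]}."""
    rows = conn.execute(query).fetchall()
    # Drop empty values and duplicates; sort so the file is stable between runs.
    strings = sorted({row[0] for row in rows if row[0]})
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({group: strings}, fh, ensure_ascii=False, indent=4)
    return strings


if __name__ == "__main__":
    # Stand-in database; a real script would connect to the application's DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genres (name TEXT)")
    conn.executemany("INSERT INTO genres VALUES (?)", [("Rock",), ("Jazz",)])
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "genres.json")
        dump_strings(conn, "SELECT name FROM genres", "genres", target)
        with open(target, encoding="utf-8") as fh:
            print(fh.read())
```

In a real build the target path would point somewhere the json extractor from babel.cfg looks, and that path would be listed in .gitignore.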

All strings obtained this way must go through a gettext call when they are displayed.
That is, gettext() in python, and Jed or the proxy methods shown earlier in js.

It should also be noted that sometimes you want, or need, to do things in the reverse order.
That is, mark in the code that a string must be translated, while the translation itself runs somewhere else.
Here is an example in python:

class SomeView(MainView):
      title=gettext("This view title")

If you write code like this, you risk getting a class whose title is translated once and for all: in English if the class was created at server start, or, say, in Chinese if it was created dynamically but cached on the first call.

In such cases you want to mark the string for translation but translate it in the one right place.
The right place is where the object is created, not the class,
i.e.


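from gettext import gettext


def __(string, *k, **kwargs):
    # Marker only: the extractor picks the string up, nothing is translated here
    return string


class MainView(SomeParent):
    def __init__(self):
        # ....
        self.title = gettext(self._title)
        # ....


class SomeView(MainView):
    _title = __("This view title")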
That is, the string extractor treats __ as a marker of a translatable string, the function itself does nothing, and the translation runs at the right moment.
This way everything stays in one place and looks neat.

This applies to many languages, including coffeescript and javascript if you write for node.js.
For the browser it matters less, because by the time the class is created the target language should already be known.

But in any case it is more correct to translate in the constructor rather than at class creation time.

I think that covers all the sources of translatable strings I know of; let's assume all of this has been done.

Glue it all together

Now you can try to put all of this together. There are a few simple steps:
0) Create an empty catalog of source strings, so that nothing complains later about a missing file

touch messages.pot

1) Create the .po files for the target languages. This is done once and should not be part of the build. .po files contain both the source strings and their translations, one file per language.

pybabel init -i messages.pot -d path/i18n -l es
# This command creates a .po for Spanish in the path/i18n/es directory (including the i18n directory itself if needed)
# Repeat for each language, or do them all at once: (By the way, can anyone suggest how to do this without echo? echo feels like a crutch to me)
echo {es,en,fr,de,ja} | xargs -n1 pybabel init -i messages.pot -d path/i18n -l

2) Create / update the .pot file, the main repository of strings. This should not be part of the build either; run it when you need fresh .po files to send for translation.

python/node/your_language update_translation_jsons
# the update of data from the DB mentioned earlier
pybabel extract -F babel.cfg -o messages.pot -k "trans" -k "ntrans:1,2" -k "__" .
# extract new strings
# trans - for the javascript extractor, ntrans - likewise
# __ - for the "transparent" marker function in python
# babel.cfg - the babel config saying what to take and from where
pybabel update -i messages.pot -d path/i18n/
# update the .po files for all languages
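For those who prefer a script to shell one-liners, steps 1 and 2 can also be driven from Python. This is only a sketch: the function names, the dry_run switch and the constants are invented here, the pybabel arguments are copied from the commands above, and the database export step is left out because it is project-specific.

```python
import subprocess

LANGUAGES = ["es", "en", "fr", "de", "ja"]  # same list as in the shell example
POT_FILE = "messages.pot"
I18N_DIR = "path/i18n"


def init_commands(languages=LANGUAGES):
    """One `pybabel init` command line per target language (run once)."""
    return [
        ["pybabel", "init", "-i", POT_FILE, "-d", I18N_DIR, "-l", lang]
        for lang in languages
    ]


def refresh_commands():
    """Extract strings into the .pot, then merge them into every .po."""
    return [
        ["pybabel", "extract", "-F", "babel.cfg", "-o", POT_FILE,
         "-k", "trans", "-k", "ntrans:1,2", "-k", "__", "."],
        ["pybabel", "update", "-i", POT_FILE, "-d", I18N_DIR],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run_all(init_commands() + refresh_commands())
```

The loop over LANGUAGES also sidesteps the echo-and-xargs trick entirely.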

It is worth showing an example babel.cfg here. This is the mapping file that tells babel how and from which files to extract strings:


[python: path/backend/notifier.py]
[hbs: path/static/**.hbs]
[json: path/static/i18n/src/**.json]
[javascript: path/static/**.coffee_js]
encoding = utf-8

3) Run all the .po files through po2json to get the .json files that Jed accepts.
This can and should be part of the build.
What you must not do is commit them to git; they do not belong there.

How exactly to feed in all the .po files and where to put the results is up to the user.
I run them in grunt, like the rest of the build.
The grunt-po2json that is on github and in the grunt repository is broken: it does not support rename, which I need because I find it more convenient when all the resulting .json files end up in one directory. I fixed it locally, but still need to send the fix as a pull request ...

Of course, it is much simpler to install po2json ( npm install po2json ) and put something like this into the build script:

printf '%s\n' es en fr de ja | xargs -I {} po2json /path/i18n/{}/LC_MESSAGES/messages.po /path/to/build/i18n/{}.json

Thoughts that did not fit into the flow but deserve attention

Throughout the post I promised "more on that later" several times, but later no suitable place turned up.

For example:
coffeescript has no extractor of its own, because when static assets are built, coffeescript is compiled (or translated) into javascript.
So it is enough to run the .js string extraction after the translation into javascript.
In my case things are slightly different still: next to every coffee file there is a coffee_js file, created by grunt watch at the moment of editing (it also restarts the static assets build, but that is a topic for a separate post :)); these files naturally stay out of git. These are the files the strings are extracted from.

- Domains were also mentioned.
In the end, domains are just different files: messages.pot / messages.po = the domain messages.
You can create several domains and either bind them all to one Jed instance or create several Jed instances and route calls to them.
But for that you would need to extend the handlebars helpers or whatever other wrapper you use ... I have never had such a need, and as a rule I prefer not to build too much in advance :)

- A small footnote to this text from the introductory block:

If you want to show the user slightly nicer messages, such as “You have 1 apple”, “You have 7 apples”, then the base language should be English.

Keep in mind that in the ngettext call you must write “You have %(apples_count)d apple”, not “You have one apple”,
because for both 1 and 21 the resulting string must use the first form, that is, “You have %d apple”

- It is also important to dwell on one issue that I have not yet managed to solve automatically:
the “empty string” entry that babel creates (the .po file header that declares the language and the plural rule) is not compatible with the format Jed expects.
Jed expects a key named «plural_forms», while babel produces Plural-Forms.
So you need to patch either babel's output, or Jed's input, or something in between.
But first, look through the configuration options of both.
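One possible "in between" fix is a small post-processing step over the generated json. The Python sketch below rests on an assumption about the file layout that should be checked against real po2json output: a dictionary of domains, each holding the header under the empty-string key. The function name normalize_header is invented for the example.

```python
def normalize_header(locale_data):
    """Return a copy in which header keys such as 'Plural-Forms'
    are renamed to the 'plural_forms' style.

    Assumed layout (verify against your po2json output):
    {domain: {"": {header fields}, msgid: [...], ...}}
    """
    fixed = {}
    for domain, messages in locale_data.items():
        messages = dict(messages)  # shallow copy, the input stays untouched
        header = messages.get("")
        if isinstance(header, dict):
            messages[""] = {
                key.lower().replace("-", "_"): value
                for key, value in header.items()
            }
        fixed[domain] = messages
    return fixed


if __name__ == "__main__":
    sample = {
        "messages": {
            "": {"Plural-Forms": "nplurals=2; plural=(n != 1)"},
            "apple": [None, "apple"],
        }
    }
    print(normalize_header(sample)["messages"][""])
```

Such a step would run in the build right after po2json and before the files are handed to Jed.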

If I missed or failed to describe something, write in the comments and add to it.
The goal was not to analyze each utility in detail; the goal was to tell you that they exist and how and why they work together.
The rest will find its place in the comments
