afan March 23, 2009 at 5:56 pm

Idea: formatting function for convenient string localization

Problem: when applications are translated into other languages (most often we deal with Russian localization of English-language products), support for plural forms is usually the first thing to suffer. For example, "1 note, 2 notes, 5 notes" (three different noun forms in Russian) or the supposedly universal "1 file(s)", etc. The fact is that in most programming languages the tooling rarely goes beyond sprintf() or some kind of template engine, and plural support has to be coded by hand every time: if N = 1, then "1 note", otherwise "N notes". Doing this every time is tedious. The gettext framework partially solves the problem, since it has the concept of several variants of a localized string, but this does not make life much easier: a single string shown to the user can contain several number-dependent parts ("Found 23 files in 3 folders"), and the pieces still have to be glued together anyway.

So the following idea came to me: why not come up with a common format, a micro-language (similar to the well-established format strings of format() or sprintf()), that addresses this problem and simplifies writing localized code in the future?

As you know, different languages have different numbers of plural forms. English has two ("1 file", "many files"); Russian has three ("1 file", "2 files", "many files"). Arabic, as Pootle tells us, has as many as six. Therefore, we need to be able to specify, directly in the string, a set of several substrings together with the parameter that determines which substring is chosen.
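
To make the "two forms versus three forms" point concrete, here is a minimal sketch of the selection logic in Python. The function names are hypothetical; the rules follow the plural formulas commonly used in gettext catalogs for English and Russian, and each function returns a zero-based index of the plural form for a given number.

```python
def plural_index_en(n):
    """English: 2 forms. Index 0 for exactly 1, index 1 for everything else."""
    return 0 if n == 1 else 1


def plural_index_ru(n):
    """Russian: 3 forms, chosen by the last digits of the number.

    Index 0: 1, 21, 31, ...      (but not 11)
    Index 1: 2-4, 22-24, ...     (but not 12-14)
    Index 2: everything else     (0, 5-20, 25-30, ...)
    """
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


if __name__ == "__main__":
    for n in (1, 2, 5, 11, 21, 22, 25):
        print(n, plural_index_en(n), plural_index_ru(n))
```

Note that the Russian rule depends on the last digits rather than on the value itself, which is why a plain "if N = 1" check is not enough.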

The proposed substring format with multiple variants:
{%COUNTER%|FORM0|FORM1|FORM2[|FORM3][|FORM4][|...]}

where
%COUNTER% is a variable name whose value is a non-negative integer (0, 1, 2, 3, ...)
FORM0 is the variant of the string for COUNTER = 0 (a special case that usually requires a separate message)
FORM1, FORM2, etc. are the alternative texts for each plural form of the given language: two variants for English, three for Russian, and so on.

Example in English:
{%F%|No files|1 file|%F% files} found in {%D%|0 folders|1 folder|%D% folders}.

This gives the following output lines when different values of %F% and %D% are substituted:
%F% = 0, %D% = 1 => No files found in 1 folder.
%F% = 1, %D% = 2 => 1 file found in 2 folders.

The same example of a line translated into Russian:

{%F%|Не найдено файлов|1 файл найден|%F% файла найдено|%F% файлов найдено} в {%D%|0 папках|1 папке|%D% папках|%D% папках}.

This gives the following output lines when different values of %F% and %D% are substituted:
%F% = 0, %D% = 1 => Не найдено файлов в 1 папке. (No files found in 1 folder.)
%F% = 1, %D% = 2 => 1 файл найден в 2 папках. (1 file found in 2 folders.)
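
The proposed format is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the names format_plural, english_rule and russian_rule are made up here, values are passed as a dictionary, FORM0 is used when the counter is 0, and the remaining forms are picked with gettext-style plural rules.

```python
import re

# One variant group: {%NAME%|FORM0|FORM1|...}
_GROUP = re.compile(r"\{%(\w+)%\|([^{}]*)\}")
# A plain variable reference inside a chosen form: %NAME%
_VAR = re.compile(r"%(\w+)%")


def english_rule(n):
    # Assumed rule: 2 plural forms.
    return 0 if n == 1 else 1


def russian_rule(n):
    # Assumed rule: 3 plural forms, chosen by the last digits.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


def format_plural(template, values, rule):
    """Expand every {%NAME%|FORM0|FORM1|...} group in template."""

    def substitute_vars(text):
        return _VAR.sub(lambda m: str(values[m.group(1)]), text)

    def expand(match):
        n = values[match.group(1)]
        forms = match.group(2).split("|")
        # FORM0 is the special case for zero; the rest follow the rule.
        index = 0 if n == 0 else 1 + rule(n)
        return substitute_vars(forms[index])

    return _GROUP.sub(expand, template)


EN = "{%F%|No files|1 file|%F% files} found in {%D%|0 folders|1 folder|%D% folders}."
RU = ("{%F%|Не найдено файлов|1 файл найден|%F% файла найдено|%F% файлов найдено}"
      " в {%D%|0 папках|1 папке|%D% папках|%D% папках}.")

if __name__ == "__main__":
    print(format_plural(EN, {"F": 0, "D": 1}, english_rule))
    print(format_plural(EN, {"F": 1, "D": 2}, english_rule))
    print(format_plural(RU, {"F": 1, "D": 2}, russian_rule))
```

One thing the sketch makes visible: under the assumed Russian rule the first form also applies to 21, 31, 101 and so on, so a template that hardcodes "1 файл найден" in FORM1 would print exactly that for %F% = 21. Writing "%F% файл найден" in the template avoids the problem.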

Note that when the translator receives the entire string, with several such variant groups in it, a) it is easier to understand the context in which each part of the string is used, and b) there is room to maneuver so that the final string sounds better.

It would be great to have an implementation of such a micro language in different programming languages.

I hope someone picks up the idea.

UPD: The result was not long in coming: in this topic, Habr user webdew shares an implementation of the function in C#, for which many thanks.
