wtigga May 7, 2019 at 08:20

Differences between Fluent and gettext

Translation

Continuing the discussion of Fluent's advantages over the more familiar gettext, here is a translation of the official position of Fluent's creators.

Gettext is a localization system deeply rooted in the GNU project and its architectural decisions. The Fluent project sees gettext as a good example of a complete, low-level, platform-independent ecosystem of libraries and tools for managing the full product release cycle, with localization files in a human-readable format. At the same time, the Fluent paradigm leads us to different architectural decisions on important aspects of localization, which in turn lead to a completely different API and life cycle.

In other words, gettext is a great project, but we do not share its views on the approach to localization.

Here are the main differences between gettext and Fluent:

	gettext	Fluent
Message identifier	source string	provided by the developer
Argument binding	positional*	key-based
Translation invalidation	fuzzy matching	identifier change
Data format	human-readable (.po) or compiled (.mo)	human-readable (.ftl)
External arguments	no	rich support
Plural support	special-cased	part of the general variant selector syntax
Scope of plural support	up to the developer, affects all translations	up to the localizer, affects only the specific locale
Designed for	C-family languages*	Web, modern client-side languages
Message references	up to the developer	up to the localizer
Message templates	required (.pot)	no
Localizer comments	no*	full support
Error recovery	fragile	strong recovery logic
Compound messages	no	value + attributes per message
Bidirectional text	no	bidirectional isolation
International formatting	no	explicit and implicit

Social Contract

The most important difference between gettext and Fluent is the message identifier. Gettext uses the source string (usually in English) as the identifier. This choice looks simple, but it imposes many restrictions later on.

First, with this approach any change to the source string invalidates every translation associated with it. This puts heavy pressure on developers never to change source messages, because doing so forces all translations to be updated.

Second, it makes it hard to have several messages with identical source text that must be translated differently. For example, the text of an “Open” button and an “Open” label may need different translations, since the first is a command and the second is a description. Gettext offers an optional context string, msgctxt, for distinguishing messages that share the same source text. This approach makes developers responsible for spotting such situations, which contradicts the principle of separation of concerns.
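To make the collision concrete, here is a minimal sketch in Python. It is a toy model, not the gettext API: the dictionaries, the German strings, and the `lookup` helper are assumptions made purely for illustration. It only shows why a catalog keyed by source text cannot hold two translations of "Open" unless a context becomes part of the key.

```python
# Toy model of a catalog keyed by source string (not the real gettext API).
# The German strings are illustrative assumptions.
by_source = {}
by_source["Open"] = "Öffnen"    # button label: a command
by_source["Open"] = "Geöffnet"  # status label: overwrites the entry above

# Adding a context (the role msgctxt plays) makes the keys distinct.
by_context = {
    ("button", "Open"): "Öffnen",
    ("status", "Open"): "Geöffnet",
}


def lookup(catalog, key):
    """Return the translation, falling back to the source text."""
    source = key[1] if isinstance(key, tuple) else key
    return catalog.get(key, source)


print(lookup(by_source, "Open"))                # only one translation survives
print(lookup(by_context, ("button", "Open")))
print(lookup(by_context, ("status", "Open")))
```

Note that the context keys have to be chosen by whoever writes the calling code, which is the responsibility shift the paragraph above describes.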

Fluent discourages reusing text for precisely this reason. Keeping the source text separate from the translations is also what allows us to introduce compound messages (several strings in one translation unit, attached to a single UI widget) and identifier-based references to other messages.

Fluent establishes an “agreement”, a kind of social contract, between developers and localizers. The developer provides a unique identifier and a set of variables (number of unread messages, username, and so on), and the localizer uses the Fluent syntax to decide how to build the message text for that identifier.

The developer does not need to care how the translations of such messages are implemented. All the developer has to do is request a string by its identifier and receive a single string suited to a specific place in the UI.
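The contract can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Fluent API: `BUNDLES`, `format_message`, the message ID `unread-summary`, and the sample strings are all hypothetical. It shows the developer passing an identifier plus named variables, while each locale is free to arrange those variables in its own order.

```python
# Toy model of the developer/localizer contract (not the Fluent API).
# Each locale maps a message ID to its own pattern with named placeholders.
BUNDLES = {
    "en": {"unread-summary": "Unread messages for {user}: {count}"},
    "de": {"unread-summary": "{count} ungelesene Nachrichten für {user}"},
}


def format_message(locale, message_id, **args):
    """Developer-facing call: an ID and named variables in, one string out."""
    pattern = BUNDLES[locale][message_id]
    return pattern.format(**args)


print(format_message("en", "unread-summary", user="Anna", count=3))
print(format_message("de", "unread-summary", user="Anna", count=3))
```

Because the arguments are bound by key rather than by position, the German pattern can reorder them without any change to the calling code.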

Message Variants

Gettext supports a small set of internationalization features, most notably plurals. But its plural syntax is a special case bolted onto the standard gettext syntax, and it is hard to extend to other situations that call for variants.

Fluent supports a general concept of message variants that are chosen by selectors. Usually the selector is a plural rule, but depending on the grammar of the language there may be others, such as gender, declension, or even the environment, for example the time of day or the operating system. The Fluent syntax lets localizers take all of these factors into account and write text that fits the situation exactly.
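The idea of a general selector can be illustrated with a short Python sketch. It is a toy model, not Fluent's implementation: `select`, `plural_pl`, `new_files_pl`, and `quit_hint` are hypothetical names, and the Polish rule is a simplified version for non-negative integers. The same selection helper serves both a plural category and a platform name.

```python
# Toy model of variant selection (not the Fluent API).

def plural_pl(n):
    """Simplified Polish plural category for non-negative integers."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


def select(key, variants, default):
    """Pick the variant for `key`, or the default variant if none matches."""
    return variants.get(key, variants[default])


def new_files_pl(n):
    # The localizer, not the developer, decides that three variants are needed.
    variants = {
        "one": "{n} nowy plik",
        "few": "{n} nowe pliki",
        "many": "{n} nowych plików",
    }
    return select(plural_pl(n), variants, "many").format(n=n)


def quit_hint(platform):
    # The selector does not have to be a number.
    variants = {
        "macos": "Press Cmd+Q to quit",
        "other": "Press Alt+F4 to quit",
    }
    return select(platform, variants, "other")


for n in (1, 2, 5, 22):
    print(new_files_pl(n))
print(quit_hint("linux"))
```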

External arguments

Gettext does not support external arguments. In other words, you cannot specify how parameters such as numbers or dates should be formatted. To format parameters with gettext, the recommended approach is to take the string it returns and pass it to printf, or to run String.prototype.replace on it.

In Fluent, support for external arguments is at the very core of the syntax. External arguments are not only interpolated but can also serve as selector parameters and be passed to built-in functions. This lets localizers craft much more precise text for specific cases. On top of that, Fluent places FSI/PDI marks around interpolated values (placeables) to keep directionality isolated in bidirectional text, and it prohibits any manipulation of the resulting strings, which reduces the burden on developers.
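The isolation step can be sketched as follows. This is a simplified illustration rather than Fluent's actual code: `isolate` and `format_isolated` are hypothetical helpers, and only the two Unicode control characters are taken from the standard (U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE).

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(value):
    """Wrap an interpolated value so its direction cannot leak outwards."""
    return f"{FSI}{value}{PDI}"


def format_isolated(pattern, **args):
    """Interpolate named arguments, isolating each one."""
    isolated = {name: isolate(value) for name, value in args.items()}
    return pattern.format(**isolated)


# "\u05d3\u05e0\u05d4" is a Hebrew name, written with escapes so the
# source code itself stays left-to-right.
text = format_isolated(
    "{user} added you to {group}",
    user="\u05d3\u05e0\u05d4",
    group="Team A",
)
print(repr(text))
```

Without the isolates, a right-to-left name next to left-to-right text can visually reorder neighbouring punctuation or numbers.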

Isolation of Responsibility

In addition, the way gettext handles plural rules forces the developer to decide whether a message will have plural variants or be a single string. From Fluent's point of view, the developer should not have to deal with such questions. In many cases a single form is enough in English, while other languages need plural variants.

Fluent assumes that a developer building software for many locales should not need this kind of linguistic knowledge, and that each language should have some freedom of its own during localization.

As a result, Fluent stores each translation separately, without “leaking” the requirements of one language into the others, and keeps all translations “opaque” to the developer, who does not need to worry about which features localizers may need for a given string.
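The call-site decision looks roughly like this with Python's standard `gettext` module. The two functions and their message texts are assumptions for illustration; `NullTranslations` is used only so the sketch runs without catalog files.

```python
import gettext

t = gettext.NullTranslations()


def tabs_closed_notice():
    # The developer decided this message never varies by number...
    return t.gettext("Tabs closed")


def tabs_closed_count(n):
    # ...and that this one does, so it has to go through ngettext.
    return t.ngettext("%d tab closed", "%d tabs closed", n) % n


print(tabs_closed_notice())
print(tabs_closed_count(1))
print(tabs_closed_count(4))
```

Whichever function the developer picks applies to every locale: a translator cannot add plural variants to the first message without a code change.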

Translation Invalidation

During the development cycle there are three kinds of change after which a translation may become invalid with respect to the source:

Minor change: does not affect the translation (fixing punctuation or typos).
Medium change: affects the wording of the message but does not make the existing translation incorrect (for example, Show All Bookmarks -> Show Bookmarks Manager).
Significant change: the sentence takes on a new meaning (Click to save -> Click to open).

For architectural reasons, gettext merges all three levels into a single state called a fuzzy match (fuzzy). Any change to the source string, whether major or trivial, invalidates the translations.

In Fluent, unique identifiers make it possible to keep two of these levels separate from the third: when you make small changes to the source text and keep the identifier, the translations remain valid. If the developer changes the identifier, on the other hand, all translations are invalidated and need to be updated.

We believe this architectural decision works better for most production cycles, although we recognize that for medium-level changes the developer has to choose between keeping and changing the identifier (that is, between treating the change as minor or as significant).

We are also considering message versioning, so that a developer can mark a message as updated without completely invalidating its translations. This state would keep the translation valid, on the grounds that an older translation is still better than an untranslated string, while letting tools notify the localizer that the translation needs updating.
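The difference can be shown with a toy model in Python. It is not the gettext or Fluent API: the catalogs, the `find` helper, the message ID, and the German string are assumptions for illustration, and real gettext tooling typically carries the old translation over as a fuzzy match instead of dropping it outright.

```python
# Toy model: what happens to an existing translation when the developer
# fixes a typo in the source text.
old_source = "Click to save you file"
new_source = "Click to save your file"
translation = "Klicken, um die Datei zu speichern"

# Keyed by source string: the corrected text is a brand-new key.
by_source = {old_source: translation}

# Keyed by identifier: the key does not depend on the source text.
by_id = {"save-button-tooltip": translation}


def find(catalog, key):
    """Return the stored translation, or None if it needs (re)translation."""
    return catalog.get(key)


print(find(by_source, new_source))           # None
print(find(by_id, "save-button-tooltip"))    # the translation is still valid
```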

Data format

Gettext uses three file formats: *.po, *.pot and *.mo. This affects how gettext fits into the production cycle, because it adds steps such as extracting and compiling messages.

Fluent uses a single file format, *.ftl, which simplifies adoption and requires no additional steps that could lead to data discrepancies.

Unicode Support

Gettext files can be encoded in UTF-8, and that is largely where its Unicode support ends. It uses its own data set for plurals, cannot format dates and numbers, and offers no help with bidirectional text.

Fluent makes extensive use of standard libraries and algorithms such as CLDR, ICU, and ECMA-402, neatly combining localization and internationalization.

Conclusion

We believe that the Fluent API and syntax represent a significant improvement over gettext, and we recommend that you use them for international software.

