Features of text localization

Original author: Richard Ishida
  • Translation
This article was published on the W3C blog in 2007, yet most software developers still don't seem to think about text length at all. How many times have you seen a message that doesn't fit in its alert box?



Who this article is for: web developers, web project managers, localization specialists, and anyone interested in how changes in text length during localization affect page design.

When text is translated from one language to another, the original and the translation will very likely differ in length. In several situations, these differences recur in predictable ways.

This article provides reference material describing some of these differences. Other articles will examine their implications for web page development and suggest solutions.

In general, the more flexible your layout, the better. Let text flow, and wherever possible avoid small fixed-size boxes or packing them tightly together. Be especially careful when positioning text precisely in a layout. Separate the content of the page from its presentation so that you can easily adapt font sizes, line spacing, and so on for each translation. Keep this in mind also when designing databases and setting the length of text fields.

English and Chinese cause the most problems



English and Chinese text is usually very compact, so translations from these languages will usually take up more space. Sometimes much more.



For example, the Flickr user interface was recently translated into several languages. One of the most frequent messages you see when viewing your photos is the view count, for example "392 views". The table below compares the length of the translated word "views" with the English original.

Language | Translation | Length relative to English
Korean | 조회 | 0.8
English | views | 1
Chinese | 次检视 | 1.2
Portuguese | visualizações | 2.6
French | consultations | 2.6
German | -mal angesehen | 2.8
Italian | visualizzazioni | 3
Because Chinese and Korean glyphs are wide, each Chinese or Korean character is counted as two English characters.
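As a rough check, the last column can be recomputed with a short Python sketch. It assumes the counting rule above can be approximated with Unicode's East Asian Width property: characters classified as Wide (W) or Fullwidth (F) count as two English characters, characters with a non-zero combining class count as zero, and everything else counts as one. The function names are hypothetical, and real on-screen width also depends on the font.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate width in 'English character' units (hypothetical helper)."""
    width = 0
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.combining(ch):
            continue  # non-zero combining class (e.g. most diacritics): no width of its own
        # 'W' (wide) and 'F' (fullwidth) cover CJK ideographs, kana and Hangul syllables
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def ratio_to_english(translation: str, english: str = "views") -> float:
    """Width of a translation relative to the English original, to one decimal."""
    return round(display_width(translation) / display_width(english), 1)


if __name__ == "__main__":
    words = [
        ("Korean", "조회"),
        ("English", "views"),
        ("Chinese", "次检视"),
        ("Portuguese", "visualizações"),
        ("French", "consultations"),
        ("German", "-mal angesehen"),
        ("Italian", "visualizzazioni"),
    ]
    for language, word in words:
        print(f"{language}: {word} -> {ratio_to_english(word)}")
```

For the seven words in the table this gives 0.8, 1.0, 1.2, 2.6, 2.6, 2.8 and 3.0, matching the ratios above (the German count includes the leading hyphen of "-mal angesehen").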

Growth of up to 300%, as with the Italian here, is common for short strings like this one. In 1994, IBM published the Localized Application Design Guide, which gives the following average expansion values for translations from English into European languages (see Volume 1 of the guide).

Characters in English source | Average expansion
Up to 10 | 200-300%
11-20 | 180-200%
21-30 | 160-180%
31-50 | 140-160%
51-70 | 130-140%
Over 70 | 150%
In general, translated text takes up more space, and the shorter the original message, the more likely it is that the translation will be significantly longer.
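The table can also serve as a rough sizing aid, for example when choosing the length of the database text fields mentioned earlier. The sketch below assumes that each percentage is the translated length as a share of the English length (so 300% means three times as long, as with the Italian "visualizzazioni") and deliberately takes the upper end of each range. The names `expansion_percent` and `translation_budget` are hypothetical; the figures are averages, not guarantees, and `len()` counts code points, not display width.

```python
# Rows of the IBM table above: (largest English length in the row, upper bound in %).
# Assumption: a percentage is translated length relative to English length.
EXPANSION_ROWS = [
    (10, 300),
    (20, 200),
    (30, 180),
    (50, 160),
    (70, 140),
]
OVER_70_PERCENT = 150


def expansion_percent(english_length: int) -> int:
    """Upper-bound average expansion for an English string of this length."""
    for max_length, percent in EXPANSION_ROWS:
        if english_length <= max_length:
            return percent
    return OVER_70_PERCENT


def translation_budget(english: str) -> int:
    """Characters to reserve for a translation of `english`, rounded up."""
    n = len(english)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-n * expansion_percent(n) // 100)


if __name__ == "__main__":
    for source in ["views", "FAQ", "Interface language"]:
        print(f"{source!r}: reserve {translation_budget(source)} characters")
```

For example, `translation_budget("views")` returns 15 and `translation_budget("Interface language")` (18 characters) returns 36. Taking the upper bound of each range errs toward extra room.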

Of course, not all strings grow in length, but you need a way to deal with the problem when it does arise. For example, Flickr keeps "FAQ" as "FAQ" in the German and French versions, but uses "Perguntas freqüentes" in Portuguese and "Preguntas frecuentes" in Spanish.

As a rule, the shorter the English word, the more likely its translation will end up squeezed into a cramped space: next to a form field, inside a chart, on a tab of limited width, and so on.

Keep in mind that text expansion is not limited to interfaces whose source text is English or Chinese. If the original application is in Spanish, the term "Idioma de la interfaz" will be shorter in English ("Interface language") but significantly longer in Malay ("Bahasar pegantar untuk penelusuran"). Shorter translations can also cause problems, because they leave excess white space on the page.

When whole paragraphs are translated, the relative expansion is likely to be smaller, but there are still situations worth watching for. For example, will everything you intended still fit on the "first screen"? Will page elements still line up the way you want if blocks grow in height at different rates?

Complicating factors



Besides the unpredictable number of characters in a translation, other factors make it harder to manage text in a layout.

Compound nouns



In some languages, such as Finnish, German and Dutch, a single long "word" is often formed where other languages would use a sequence of several short words.

For example, the English phrase "Input processing features" becomes "Eingabeverarbeitungsfunktionen" in German. If the width of a block is restricted, for example next to form fields, on tabs or buttons, or in narrow columns, the English text wraps easily onto two lines. The German "super-word" cannot be wrapped automatically and can cause a significant layout problem.
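Python's standard `textwrap` module can illustrate the difference. With `break_long_words=False`, it breaks lines only at spaces, roughly like a layout that cannot hyphenate. The 12-character column width and the helper name `overflowing_lines` are hypothetical, and widths are counted in characters rather than pixels.

```python
import textwrap


def overflowing_lines(text: str, width: int) -> list[str]:
    """Wrap `text` at spaces only and return any lines wider than `width`."""
    # break_long_words=False keeps each word intact, like a layout that
    # can only break lines between words.
    lines = textwrap.wrap(text, width=width, break_long_words=False)
    return [line for line in lines if len(line) > width]


english = "Input processing features"
german = "Eingabeverarbeitungsfunktionen"

print(overflowing_lines(english, 12))  # []
print(overflowing_lines(german, 12))   # ['Eingabeverarbeitungsfunktionen']
```

The English phrase wraps into three lines that each fit, while the single 30-character German word has no break opportunity and overflows the column.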

Character width



Chinese, Japanese, Korean and some other scripts use characters that are more complex than those of Latin-based languages. As a result, even if a translated string has the same number of characters as the original, or slightly fewer, it can occupy much more space.

For example, the English word "desktop" becomes "デスクトップ" in Japanese. The Japanese translation is one character shorter, but usually takes up much more horizontal space.

Character height and line spacing



Characters in non-Latin scripts are often much taller than Latin characters. In addition, the features of many writing systems call for larger line spacing.

For example, the figure below shows the same text in English and Thai. Note that both versions have only two lines, but the Thai version takes up much more space. This is partly due to the complexity of the characters, which produces taller glyphs and therefore taller lines, but Thai also typically uses greater leading. Many scripts need much larger line spacing than Latin: Arabic (especially Nastaliq), Chinese, Devanagari (used for Hindi), Japanese, Korean, Tibetan and others.



Think twice before using abbreviations



If you are using abbreviations to fit text into tight spaces, consider seriously whether this is really a good idea. Other languages may not have an equivalent abbreviation, and the translation may be much longer.

In many languages abbreviations are rare. Sometimes this is a matter of linguistic style; in other cases there are practical reasons. For example, Arabic "words" are usually built from a compact root, with prefixes, suffixes and small internal changes that refine the meaning, so abbreviating them without losing meaning is very difficult.

You will also need to give translators a list of the abbreviations and acronyms you use.
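A small script can help assemble that list from your interface strings. This is a minimal sketch under a loud assumption: it treats any word that starts with a capital letter followed only by capitals or digits (such as "FAQ" or "MP3") as an abbreviation. The helper name `collect_abbreviations` and the sample strings are hypothetical, and abbreviations written with a period, like "Max.", are not caught and must be added by hand.

```python
import re

# Heuristic (assumption): a whole word made of a capital letter followed by
# one or more capitals or digits, e.g. "FAQ", "HTTP", "MP3".
ACRONYM = re.compile(r"\b[A-Z][A-Z0-9]+\b")


def collect_abbreviations(ui_strings: list[str]) -> list[str]:
    """Return the sorted, de-duplicated acronyms found in the UI strings."""
    found: set[str] = set()
    for text in ui_strings:
        found.update(ACRONYM.findall(text))
    return sorted(found)


strings = ["Read the FAQ", "Upload via FTP or HTTP", "Max. file size: 10 MB"]
print(collect_abbreviations(strings))  # ['FAQ', 'FTP', 'HTTP', 'MB']
```

Note that "Max." in the last sample string is missed, which is exactly the kind of abbreviation a translator still needs to be told about.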
