Localization and globalization

    This article outlines the aspects of internationalization you need in order to understand how it works in .NET, the peculiarities of Chinese (and other) localizations, and a few amusing cases.

    Terminology


    Here are some freely translated and edited definitions from MSDN:
    Localization is the process of translating application resources into localized versions for each language and set of regional settings that the application supports. Localization should begin only after the localizability requirement is satisfied, i.e. the executable code is separated from all user interface elements.

    Globalization is the process of designing and developing an application that supports a localized user interface and regional data for users from different cultures. Language- and region-specific information may include the writing system, the calendars in use, conventions for formatting dates and times, numbers, currency and physical quantities, sorting rules, and even address formats, telephone number formats, and default paper sizes.

    So whatever a team does to bring an initially non-localizable product into the desired shape goes by the big word globalization, not localization.


    Language and regional standards


    In .NET, the main class that provides information about language and regional standards (called a "locale" in unmanaged code) is System.Globalization.CultureInfo. Alongside it are Calendar, RegionInfo, NumberFormatInfo, DateTimeFormatInfo, and others.

    A culture has a name (essentially a code), and it is convenient to talk about cultures in these terms. The invariant culture has an empty name, so we will refer to it as ivl.
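    As a small illustration of the classes just mentioned, here is a minimal sketch that looks up a culture by name and touches its companion objects. The choice of ru-RU is only an example, and the printed values depend on the .NET version and operating system, so none are shown.

```csharp
using System;
using System.Globalization;

class CultureBasics
{
    static void Main()
    {
        // A culture is looked up by its name (code).
        CultureInfo russian = CultureInfo.GetCultureInfo("ru-RU");
        Console.WriteLine(russian.Name);
        Console.WriteLine(russian.EnglishName);

        // Companion objects that describe formatting, calendar and region.
        NumberFormatInfo numbers = russian.NumberFormat;
        DateTimeFormatInfo dates = russian.DateTimeFormat;
        Calendar calendar = russian.Calendar;
        RegionInfo region = new RegionInfo(russian.Name);

        Console.WriteLine(numbers.NumberDecimalSeparator);
        Console.WriteLine(dates.ShortDatePattern);
        Console.WriteLine(calendar.GetType().Name);
        Console.WriteLine(region.ISOCurrencySymbol);

        // The invariant culture is the one whose name is the empty string.
        Console.WriteLine(CultureInfo.InvariantCulture.Name.Length == 0);
    }
}
```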

    Two thread cultures

    Every Thread instance has two properties: public CultureInfo CurrentCulture {get; set;} and public CultureInfo CurrentUICulture {get; set;}.
    The first culture is used to format numbers, dates, and other regional settings; the second is used by the algorithm that looks up suitable localized resources.

    So why do we need two cultures? There is a reason: for a descendant of Anglo-Saxons who was born and lives in India, the native language is English, and that is the language in which he wants to see program interfaces on his laptop. However, when working in Excel he will most likely deal in rupees (the symbol रु in Hindi), and he knows that the area of his native country is 32,87,590.01 km².
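    The scenario above can be sketched as follows. This is a minimal example assuming the en-IN and en cultures are available on the machine; the exact digit grouping and currency symbol that get printed depend on the platform's culture data, so no output is shown.

```csharp
using System;
using System.Globalization;
using System.Threading;

class TwoCultures
{
    static void Main()
    {
        Thread thread = Thread.CurrentThread;

        // Regional formatting follows India...
        thread.CurrentCulture = new CultureInfo("en-IN");
        // ...while resources are looked up for plain English.
        thread.CurrentUICulture = new CultureInfo("en");

        double area = 3287590.01;

        // Number and currency formatting use CurrentCulture.
        Console.WriteLine(area.ToString("N2"));
        Console.WriteLine(area.ToString("C2"));

        // Resource lookup would start from CurrentUICulture.
        Console.WriteLine(thread.CurrentUICulture.Name);
    }
}
```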



    Tree structure

    Cultures form a tree. That is, every culture has a parent.

    At the root of the tree is the "no" culture: the invariant one. It contains no region information and represents a nonexistent invariant language whose formatting rules are strangely similar to American ones. The parent of the invariant culture is again the invariant culture, and so on until stack overflow.

    At the opposite end are specific cultures. They carry information about the language and script, about the region, and about the formatting of numbers and dates. Examples: ru-RU, en-US, en-IN.

    The parents of specific cultures are neutral cultures. Their purpose is to carry information about the language and script. Before .NET 4.0, neutral cultures could not contain formatting or region information; now this information is taken from the dominant specific culture. Examples: ru, en, mn-Cyrl (Mongolian, Cyrillic), mn-Mong (traditional Mongolian script).
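    To explore the tree yourself, a minimal sketch like the one below walks the Parent property from a given culture up to the invariant one and labels each node. The helper name PrintChain is made up for this example, and the chain you see may differ between .NET versions and operating systems.

```csharp
using System;
using System.Globalization;

class CultureTree
{
    // Hypothetical helper: prints a culture and all of its ancestors.
    static void PrintChain(string name)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(name);
        while (true)
        {
            bool isInvariant = culture.Name.Length == 0;
            string label = isInvariant ? "ivl" : culture.Name;
            string kind = isInvariant ? "invariant"
                        : culture.IsNeutralCulture ? "neutral"
                        : "specific";
            Console.WriteLine(label + " (" + kind + ")");

            // The invariant culture is its own parent, so stop here.
            if (isInvariant)
            {
                break;
            }
            culture = culture.Parent;
        }
    }

    static void Main()
    {
        PrintChain("ru-RU");
    }
}
```

    Passing other names to PrintChain is a quick way to check how deep a particular branch really is.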

    A follow-up question for the attentive reader: who can be the parent of a neutral culture?

    Common misconceptions

    We can easily picture a branch of the culture tree using the example ivl <- ru <- ru-RU. But it is wrong to say that the hierarchy always consists of three cultures. That is what the authors of the book "C# 2005 Programming Language for Professionals" assumed in the example for chapter 17, and at the time it was almost true.

    But languages with several scripts break the stereotype.



    Before .NET 4.0, things were thoroughly confusing: there were specific cultures whose parent was the invariant culture. See the tool.

    Chinese bush



    Chinese is spoken by more than 1.3 billion people and is an official language in the People's Republic of China, the Republic of China (aka Taiwan), and Singapore. And do not forget the special administrative regions: Hong Kong and Macau.

    There are two kinds of Chinese script: simplified (since 1956) and traditional. Traditionally, Chinese was written from top to bottom, with the columns running from right to left. More recently, since 2004, Taiwan has officially stopped using vertical writing. Now the "European" method is used: horizontally, from left to right.

    Let's get back to .NET. The cultures zh-CHS and zh-CHT from .NET 2.0 have been deprecated and replaced by zh-Hans and zh-Hant. In the culture tree, zh-Hans is the parent of zh-CHS so that the fallback process works correctly. In the future, the obsolete cultures may disappear with any patch.

    It is worth emphasizing that both scripts are used in the PRC: traditional in Hong Kong and Macau, simplified across the rest, which is the larger part of the territory.

    Fallback process

    To find suitable resources (text, coordinates and sizes of controls, icons, etc.), a ResourceManager instance looks at Thread.CurrentThread.CurrentUICulture. The UI culture can be either specific or neutral, but Thread.CurrentThread.CurrentCulture can only be a specific culture.

    First, the resource manager tries to find resources whose culture matches the UI culture. If it finds none, it takes the parent culture and repeats the search. If it reaches the invariant culture this way, it has to use the default (neutral) resources, which are often, but not necessarily, located in the main assembly.

    That said, default resources can also be marked with a culture. See MSDN for details.
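    The search order described above can be illustrated with a simplified sketch. This is not how ResourceManager is actually implemented; the dictionary stands in for satellite assemblies, and the names Greetings and Lookup are invented for the example. The empty key plays the role of the default resources.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

class FallbackSketch
{
    // Hypothetical in-memory "resources" keyed by culture name.
    // The empty name holds the default (neutral) resources.
    static readonly Dictionary<string, string> Greetings =
        new Dictionary<string, string>
        {
            { "", "Hello" },
            { "ru", "Привет" }
        };

    // Walks from the UI culture up through its parents until a match is found.
    static string Lookup(CultureInfo uiCulture)
    {
        CultureInfo culture = uiCulture;
        while (true)
        {
            string value;
            if (Greetings.TryGetValue(culture.Name, out value))
            {
                return value;
            }
            if (culture.Name.Length == 0)
            {
                // Reached the invariant culture and found no defaults.
                throw new InvalidOperationException("No default resources.");
            }
            culture = culture.Parent;
        }
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // ru-RU has no entry of its own, so its parent ru is used.
        Console.WriteLine(Lookup(new CultureInfo("ru-RU")));

        // Neither en-US nor en has an entry, so the defaults are used.
        Console.WriteLine(Lookup(new CultureInfo("en-US")));
    }
}
```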

    Blunders from MS

    No. 1

    Allow me to present the following bush of cultures, the Uzbek one:



    It is clear what happened: after 1991, languages that had once been switched to Cyrillic began shedding the Cyrillic alphabet in earnest.

    The CultureInfo class has a property string NativeName, i.e. the name of the culture in the language it describes. For the culture uz-Latn-UZ, the value of NativeName is U'zbek (U'zbekiston Respublikasi), although it really should be O'zbek (O'zbekiston Respublikasi).
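    To check what your own runtime reports, a minimal sketch is enough. It assumes the uz-Latn-UZ culture is available on the machine; the value printed depends on the .NET version and the operating system's culture data, so it may or may not match the one quoted above.

```csharp
using System;
using System.Globalization;
using System.Text;

class NativeNameCheck
{
    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        CultureInfo uzbek = new CultureInfo("uz-Latn-UZ");

        // Name of the culture in its own language, as reported by the platform.
        Console.WriteLine(uzbek.NativeName);

        // English name, for comparison.
        Console.WriteLine(uzbek.EnglishName);
    }
}
```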

    The bug has survived many versions of .NET.

    No. 2

    Let's talk about the former Soviet republic of Moldavia, whose own name for itself is "Moldova". Moldovans speak the Moldovan language, although philologists argue that it is not an independent language but a dialect of Romanian.

    In fact, there are three Romanian languages:
    • Romanian in Romania (Latin);
    • Romanian in Transnistria (Cyrillic), remaining in the form in which it was at the time of the collapse of the Union;
    • Romanian in Moldova (Latin), with its own version of Latinization, which does not coincide with that adopted in Romania.

    It would seem that in .NET we could expect to see three specific Romanian cultures, or at least two, for political reasons (Transnistria). But no, there is no Moldova in the Windows NLS API. There is only the culture ro-RO, Romanian (Romania). This is the locale that Moldovan users work with. Yet Microsoft is present in Moldova.

    And of course, .NET allows you to create your own cultures.

    Interestingly, once upon a time, in the first .NET and on old operating systems, the cultures ru-MO and ro-MO were spotted. Yes, the region code was MO, not MD as it is now. Did the ISO standard change?

    Taboos for localized applications


    The list is not exhaustive, but the examples come from personal experience hunting bugs in localized applications.

    No. 1

    Obviously, you must never hard-code the names of system folders. It would seem that Program Files is not going anywhere. By some quirk, this folder was not renamed in the Russian localization of Windows. But that is not true of every localization!

    In the Spanish localization, the folder is proudly named Archivos de programa. I recommend running that through Google Translate from Spanish into Russian.
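    A safer approach is to ask the system where the folder is. The sketch below uses Environment.GetFolderPath; the application folder name MyApp is a hypothetical placeholder.

```csharp
using System;
using System.IO;

class SystemFolders
{
    static void Main()
    {
        // Ask the OS for the real location instead of assuming "C:\Program Files".
        string programFiles =
            Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles);

        // "MyApp" is a made-up application folder used only for illustration.
        string appDirectory = Path.Combine(programFiles, "MyApp");

        Console.WriteLine(appDirectory);
    }
}
```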

    No. 2

    The real scourge of a globalized and localized application is strings. Concatenated ones. But even when a string uses placeholders, translators working without comments cannot predict what will be substituted into them: "{0}" caused error "{1}".{2}Contact {3}. And by {2} I mean the banal Environment.NewLine.
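    One way to soften the problem is to keep the whole sentence in a single format string and to document every placeholder for the translator. The sketch below is only an illustration: the template text, the argument values, and the idea of keeping the comment next to the resource are assumptions, not a prescribed convention.

```csharp
using System;
using System.Globalization;

class FormatStrings
{
    // Translator note (would normally live in the resource file's comment field):
    // {0} = name of the operation, {1} = error text,
    // {2} = line break (do not translate or remove), {3} = support contact.
    const string ErrorTemplate = "\"{0}\" caused error \"{1}\".{2}Contact {3}.";

    static void Main()
    {
        // Bad: concatenation fixes the word order for every language.
        string concatenated = "\"" + "Import" + "\" caused error \"" + "File not found" + "\".";

        // Better: one template, so a translator can reorder the placeholders.
        string formatted = string.Format(
            CultureInfo.CurrentCulture,
            ErrorTemplate,
            "Import",
            "File not found",
            Environment.NewLine,
            "the administrator");

        Console.WriteLine(concatenated);
        Console.WriteLine(formatted);
    }
}
```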
