ABBYYTeam September 13, 2010 at 11:57

What does a Chinese keyboard look like?

You probably imagine it as something like a pipe organ: a grandiose structure a couple of meters long with hundreds or even thousands of keys. In fact, most Chinese people use a regular Latin QWERTY keyboard. But how is it possible to type so many different characters with it? We asked our colleague Julia Dreisis, whose long-standing love and work connect her with China, to tell us about it.

Background: Typewriters

Over several thousand years, the Chinese managed to bring the number of characters to upwards of 50,000. And although the number of characters needed in everyday life is not measured in tens of thousands, the standard type set of an old printing house still contained 9,000 characters.

For a long time, typesetting followed the principle "a separate printing element for each character." As a result, people had to work with monster machines like this one:

Shuangge typewriter, 1947 (the operating principle was invented by Kyota Sugimoto of Japan in 1915).

Its main element is a bank of characters resting on an ink pad. A mechanical system is mounted above the characters: a handle, a gripping "foot", and a roller holding a sheet of paper. The whole mechanism, roller included, follows the handle and can be moved left, right, forward, and backward by the operator. To type text, the operator spends a long time searching for the desired character with a magnifying glass, positions the mechanism above it, and presses the handle to activate the "foot", which grabs the character and, swinging up, prints it on the paper. At the same time, the roller turns slightly to make room for the next character. Of course, typing on such a machine was extremely slow: an experienced operator could manage no more than 11 characters per minute.

In 1946, the renowned Chinese philologist Lin Yutang proposed a typewriter design built on a completely new principle: decomposing characters into components.

Lin Yutang's electromechanical typewriter, 1946.

Unlike its bulky predecessors, the new typewriter was no larger than its Latin counterparts and had few keys. The keys corresponded not to characters but to their components. In the center of the device was a "magic eye": when the operator pressed a combination of keys, a candidate character appeared in the "eye". To confirm the selection, the operator pressed an additional function key. With only 64 keys, such a machine could easily handle 90,000 characters at a speed of 50 characters per minute!

Although Lin Yutang managed to obtain a US patent for his invention, it never reached mass production. This is not surprising, since producing one such device at the time cost about $120,000. In addition, on the day of the scheduled presentation to the Remington company, the machine refused to work, and even the magic eye did not help. The idea was shelved until better times.

But in the era of widespread computer use, Lin Yutang's idea of decomposing characters into their constituent parts took on a new life. It formed the basis of the structural methods for entering Chinese characters, which we turn to now.
(By the way, in the 1980s the Taiwanese company MiTAC even developed its own structural input method, Simplex, directly on the basis of Lin Yutang's coding system.)

Structural methods

At least a dozen such methods are known, and all of them are based on the graphic structure of the character. Chinese characters are jigsaw puzzles assembled from the same pieces (so-called graphemes). The number of these graphemes is not that large, 208, so they can be squeezed onto a regular keyboard. True, that works out to about 8 graphemes per key, but this problem is easily solved.

One of the most common structural input methods is Wubi zixing (Wubi zixing, "input by five strokes"). How does it work? Fair warning: it is complicated.

All Chinese characters are divided into four groups:

The 5 basic strokes (一, 丨, 丿, 丶, 乙) and 25 other very frequently used characters (each of them has its own key).
Characters whose graphemes are separated by some distance. For example, the character 苗 consists of the graphemes 艹 and 田, with a gap between them (although in print they are slightly "squeezed" together, so it may seem that there is no gap).
Characters whose graphemes are connected to each other. Thus, the character 且 is the grapheme 月 connected to a horizontal stroke; 尺 consists of the grapheme 尸 and a falling stroke.
Characters whose graphemes intersect or overlap each other. For example, the character 本 is the intersection of the graphemes 木 and 一.

So, we have mentally broken the character we want to enter into graphemes. What next? First, take a look at the Wubi keyboard layout:

At first glance, it may seem that the graphemes are placed at random. This is not the case. The keyboard is divided into five zones, one for each basic stroke (in the figure they are marked in different colors). Within each zone, the keys are numbered from the center of the keyboard to the edges. Each number is made up of two digits from 1 to 5, depending on which basic strokes the grapheme is assembled from.
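The numbering scheme can be sketched in a few lines of Python. This is only an illustration: the key rows follow the commonly documented Wubi 86 arrangement, which is an assumption here because the article shows the layout only as an image, and the names `ZONES` and `key_code` are made up for the example. The first digit is taken to be the zone and the second the position of the key within the zone.

```python
# Zones of the Wubi keyboard, one per basic stroke.
# Keys in each zone are listed from the center of the keyboard outward.
# Assumption: this is the widely documented Wubi 86 arrangement.
ZONES = {
    1: ("一", "GFDSA"),  # horizontal stroke
    2: ("丨", "HJKLM"),  # vertical stroke
    3: ("丿", "TREWQ"),  # left-falling stroke
    4: ("丶", "YUIOP"),  # dot
    5: ("乙", "NBVCX"),  # hook / bend
}


def key_code(key: str) -> int:
    """Return the two-digit number of a key: zone digit, then position digit."""
    key = key.upper()
    if len(key) != 1:
        raise ValueError("expected a single key")
    for zone, (_stroke, keys) in ZONES.items():
        if key in keys:
            return zone * 10 + keys.index(key) + 1
    raise ValueError(f"{key!r} is not a grapheme key")


print(key_code("G"))  # 11
print(key_code("Q"))  # 35
print(key_code("Y"))  # 41
```

Under this assumption, 25 of the 26 letter keys carry graphemes, which matches the figure of roughly 8 graphemes per key for 208 graphemes.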

Let's start with the graphemes that are easiest to enter: the main grapheme of each key (shown in large type in the layout). Each of them is one of the 25 frequently used characters mentioned above. To enter such a character, just press the corresponding key four times. Thus 金 = QQQQ, 立 = UUUU, and so on.

A character made of four graphemes is entered by pressing the keys of its graphemes one after another. Thus, 毅 = U + E + M + C. To enter a character consisting of more than four graphemes, enter the first three graphemes and the last one.
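The two rules described so far (four presses for a key's main grapheme; first three graphemes plus the last for longer characters) can be written down as a minimal sketch. The function names are hypothetical, and the caller is assumed to have already decomposed the character into the keys of its graphemes, which is the hard part in practice.

```python
def key_name_code(key: str) -> str:
    """Code for the main grapheme of a key: the key pressed four times."""
    return key.upper() * 4


def wubi_code(grapheme_keys: list[str]) -> str:
    """Code for a character given the keys of its graphemes in writing order.

    Four graphemes: one key each. More than four: first three plus the last.
    Characters with fewer graphemes need an extra disambiguation code and
    are deliberately not handled in this sketch.
    """
    if len(grapheme_keys) < 4:
        raise ValueError("characters with fewer than four graphemes are not handled")
    return "".join(grapheme_keys[:3] + grapheme_keys[-1:]).upper()


print(key_name_code("Q"))                 # QQQQ  (金)
print(wubi_code(["U", "E", "M", "C"]))    # UEMC  (毅)
# Hypothetical five-grapheme character; the keys are placeholders.
print(wubi_code(["A", "B", "C", "D", "E"]))  # ABCE
```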

The hardest characters to enter are those consisting of two or three graphemes. Since there are a lot of them, several characters inevitably end up claiming the same key combination. To tell them apart, the developers came up with a special code. It consists of two digits: the first is the number of the character's last stroke, and the second is the number of the group the character belongs to (recall the four groups into which characters are divided).

Fortunately, when typing most of the frequently used characters you do not have to think about codes, because the character appears on the screen after the first two or three keystrokes. And the 24 most frequent characters can be entered with a single keystroke (each has a key assigned to it).

The drawback of structural input is obvious: it is complicated, and what you read above was only an abridged description! To help people master it, the Chinese even came up with a special mnemonic poem. But the structural method makes touch typing possible, which raises the maximum typing speed to 160 characters per minute. That is why professional typists use it. And do not forget: 160 characters per minute means about 500 keystrokes in that same minute!

For structural input, an ordinary QWERTY keyboard is used most of the time; after all, the placement of the graphemes on it has to be memorized anyway. But occasionally you can see keyboards like this one, with graphemes on the keys:

Admittedly, I never saw such a keyboard during my entire stay in China :)

Phonetic methods

Typewriters using these input methods simply do not exist: phonetic methods owe their existence entirely to computers. With a phonetic method, you enter not the character itself but its pronunciation, and the system then finds the desired character. But here is the problem: Chinese has so many characters that dozens of them can share the same pronunciation. As a rule, the required character has to be picked manually from a list, which makes input rather slow. Predictive systems like T9 come to the rescue.

The most common phonetic method is the famous pinyin (Pinyin). A phonetic input system built on it is included in the standard Asian language pack for Windows (starting with XP; before that, it had to be installed separately). Let's see how it works.

For example, suppose we want to enter the word for "blogger", wangmin.
First we type wang (or wang3, with the tone, to reduce the number of options). After we press the space bar, a character read as wang is substituted. But this is not the wang we need. We right-click on it:

A long list of matches appears. We can strain our eyes looking for our wang there, or simply enter the second syllable of the word, min. The system is smart: it finds the word wangmin in its dictionary and automatically selects the right wang and the right min. Hooray, we did it!
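The lookup behavior in this walkthrough can be mimicked with a toy converter. This is a sketch, not how the Windows input system is actually implemented: the two dictionaries are tiny and hand-made, the candidate order is an assumption, and `CANDIDATES`, `WORDS`, and `convert` are illustrative names.

```python
# Tiny illustrative dictionaries; a real input method ships far larger ones.
# Candidate order is an assumption (most likely character first).
CANDIDATES = {
    "wang": ["王", "往", "网", "望", "忘"],
    "min": ["民", "敏", "闽"],
}
WORDS = {
    ("wang", "min"): "网民",
}


def convert(syllables: list[str]) -> str:
    """Convert pinyin syllables to characters, preferring whole-word matches."""
    word = WORDS.get(tuple(syllables))
    if word is not None:
        return word
    # No known word: fall back to the first candidate for each syllable.
    return "".join(CANDIDATES[s][0] for s in syllables)


print(convert(["wang"]))         # 王 (not the wang we wanted)
print(convert(["wang", "min"]))  # 网民 (the dictionary word fixes both characters)
```

Typing the second syllable gives the system enough context to skip the manual search through the candidate list, which is exactly the shortcut the walkthrough relies on.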

The Google Pinyin and Sogou Pinyin input systems go further: they remember user preferences and suggest words based on the context.
Here is an example of how Google Pinyin analyzes a seemingly wild sequence of letters

and produces the correct text:
"I saw Wang Zhizhi play in the same match as Yao Ming" (these are two Chinese NBA players). It is especially nice that the names are written correctly.

In Taiwan, there is an alternative to the pinyin system: Zhuyin input (Zhuyin). It uses not the Latin alphabet but a phonetic alphabet whose symbols resemble characters. Since the alphabet has few symbols, it is easy to distribute them across the keyboard. Hong Kong has its own romanization of the local dialect, Jyutping, which is also actively used for phonetic input.

The main disadvantage of phonetic input systems is their relatively low typing speed, about 50 characters per minute (compare with Wubi zixing and its 160 characters per minute). The reason is that entering a character by pinyin takes six keystrokes on average, whereas with Wubi zixing four are enough. In addition, touch typing is not possible with this method. You also need to know Pinyin or Zhuyin, which is not a given for every Chinese person: whatever was learned in the first grade of school (if anything) has had time to fade. And it is not always easy to remember how some rare character is read. For these reasons, Wubi zixing is gaining more and more popularity in China. Still, pinyin is easier to learn than structural methods. And for a foreigner, such a system is simply a balm for the soul.
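The speed figures quoted in the article can be put side by side with a little arithmetic. The sketch below uses only numbers from the text (50 and 160 characters per minute, six and four keystrokes per character, about 500 keystrokes per minute); the function name is illustrative.

```python
def keystrokes_per_minute(chars_per_minute: float, keystrokes_per_char: float) -> float:
    """Keystroke rate needed to sustain a given character rate."""
    return chars_per_minute * keystrokes_per_char


# Pinyin: about 50 characters per minute at six keystrokes each.
pinyin_rate = keystrokes_per_minute(50, 6)

# Structural input: 160 characters per minute at no more than four keystrokes each.
wubi_upper_bound = keystrokes_per_minute(160, 4)

# The article quotes about 500 keystrokes for 160 characters,
# which implies this average number of keystrokes per character:
implied_average = 500 / 160

print(pinyin_rate)       # 300
print(wubi_upper_bound)  # 640
print(implied_average)   # 3.125
```

The implied average of roughly three keystrokes per character is consistent with the earlier remark that frequent characters appear after the first two or three keystrokes, so four is an upper bound rather than the typical cost.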

As we can see, phonetic input does not require any special keyboard either: any keyboard with a Latin QWERTY layout will do. The one in front of you, for example, is quite suitable :)

Hybrid methods

These methods combine phonetic and structural input. The simplest example is the yinxing method (Yinxing, "sound and form"). A character is typed by entering its transcription and indicating a graphic element. A limited set of graphic elements is distributed across the keyboard, so in theory remembering them is not difficult.

In practice, hybrid input systems are gradually dying out. They require the user to know both the complex combinatorics of structural systems and the transcription well. It is easier to master one thing perfectly.

So is there a “standard” method?

No, there is not. In mainland China, the structural Wubi method and phonetic pinyin are the most popular. In Taiwan, people favor the phonetic Zhuyin method (since many learned it at school rather than Pinyin) and the outdated structural method Cangjie. It was invented back in 1976 and has kept all its drawbacks ever since: entering punctuation marks is very difficult, you always have to guess the correct way to break up a character, and you must remember a complicated layout (many Taiwanese, in despair, even stick it on their monitors). In Hong Kong, Cangjie is taught at school and is clearly preferred to all other methods.

Recognition-based methods

It turns out that none of the keyboard input methods listed is ideal. It is not surprising that the Chinese have pinned their last hope on recognition. Recognition of both speech and handwriting is now included in the standard language pack of Windows 7. Before use, it is recommended to put the system in "learning mode" for at least 15 minutes, so that it has time to get used to your handwriting and speech.

But recognition-based methods have not gained wide acceptance. Keyboard input is still considered more reliable.

Recognition of spoken Chinese is complicated by the fact that the proportion of people with perfect pronunciation is not that large. Dialectal features crop up here and there and spoil the whole picture. As for foreigners, for whom mastering the four tones is already a feat, there is no need even to mention them.

Handwriting recognition seems simpler, and there are now many PDAs that support this input method. But it has not become widespread either. The fact is that most Chinese people write in a barely legible cursive, and it can be hard for them to switch to slowly drawing each stroke. Often the problem is that they simply do not remember the standard stroke order, because they are used to writing abbreviated forms! So recognition-based input turns out to be suitable mainly for language learners, which online dictionaries actively exploit. For example, the popular Nciku site invites everyone to draw the desired character with the mouse and then choose from the options the system offers:

And yet it exists!

An experimental Chinese keyboard, meaningless and merciless.

This is how you imagined it, isn't it?
Yes, Chinese keyboards with thousands of keys do exist. For obvious reasons, though, they do not go into mass production and remain a kind of curiosity.
But, you must admit, it’s still nice to realize that there is such a keyboard somewhere!

Happy Programmer's Day!

Julia Dreisis,
Leader of the Chinese Lexicographic Descriptions Group

P.S. You can read my other articles on the Chinese language and Chinese culture on the Lingvo team blog.

(Images from wikipedia.org and magazeta.com sites were used in the article)

Update: Julia Dreisis is now on Habr as dalimon; please give her a warm welcome.
