Keyboard Input Using IME

Introduction


Many Asian languages have far more characters than fit on a standard keyboard layout. To enter these characters, a special technology was developed: the Input Method Editor (IME). An IME is a program or operating system component that lets users enter characters that are physically absent from the keyboard.
Although the term "input method editor" (IME) was originally used only in Microsoft Windows, it is now used in other operating systems as well, wherever it is important to distinguish the input method itself from the program that provides it and from the operating system's general support for input methods.
The term "input method" usually means a particular way of entering text from the keyboard in some language, for example Cangjie, Pinyin input, or the use of "dead" keys.
The term "input method editor" usually means a specific program that lets you use an input method (e.g. SCIM or Microsoft IME).

Default IME System


If a writing system has no more than about 100 characters, keyboard input does not require converting several keystrokes into one character, as was the case with typewriters. Modern keyboards fit this case: a QWERTY keyboard, for example, has 102 keys plus several modifiers. If the writing system has more than about 100 characters, however, combinations of typed characters must be converted before the application uses them. This step is called front-end processing (FEP), and the IME is the standard way to perform FEP on Windows.
By default, an IME uses a syllable-based phonetic input map for the selected language. In the usual scenario, the user types the Latin characters that spell the pronunciation of a syllable. If the IME recognizes the syllable, it shows the user a list of candidate words or phrases from which the final version can be selected. The selected word is then sent to the application as a series of Microsoft Windows WM_CHAR messages. Because the IME works at a lower level than a normal application (it intercepts keyboard input), its presence is transparent to the application. Almost any application window can use an IME without knowing that it exists and without any special code.
A single word is entered in two or three stages, depending on the language.
For example, the scenario for Japanese is as follows:
  1. The user types the syllables in Latin characters. For example, the Japanese word for tsunami is typed as "tsunami".
  2. The typed Latin syllables are automatically replaced with hiragana or katakana characters. Hiragana and katakana are syllabaries in which each sound of the Japanese language has its own character. For example, if the user has selected hiragana, "tsu" is automatically replaced by "つ", "na" by "な", and "mi" by "み".
  3. The user can leave the word written in hiragana or convert it to kanji. Conversion to kanji resembles the T9 input system: in the worst case, the user is offered a list of options and must choose one of them. In the tsunami example, the hiragana つなみ turns into the kanji 津波. At this third stage, the conversion is usually controlled with the Space and Enter keys, the arrow keys, and the numeric keypad. For example, Space converts hiragana to kanji, a second Space shows a list of replacement options, and Enter ends the conversion and leaves the entered characters as they are.
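The three stages above can be sketched in a few lines of Python. This is only an illustration, not how any real IME is implemented: the table `ROMAJI_TO_HIRAGANA`, the dictionary `KANJI_CANDIDATES`, and the functions `to_hiragana` and `candidates` are hypothetical names introduced here, and the table covers just a handful of syllables.

```python
# Tiny romaji-to-hiragana table (assumption: a real IME uses a far larger
# table plus extra rules for doubled consonants, long vowels, and "n").
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "tsu": "つ",
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TO_HIRAGANA)

# Hypothetical one-entry dictionary standing in for the IME's word list.
KANJI_CANDIDATES = {"つなみ": ["津波"]}


def to_hiragana(romaji: str) -> str:
    """Stage 2: replace Latin syllables with hiragana, longest match first."""
    text = romaji.lower()
    out = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += size
                break
        else:
            # Unknown input is left as typed, as in an unfinished syllable.
            out.append(text[i])
            i += 1
    return "".join(out)


def candidates(kana: str) -> list:
    """Stage 3: kanji candidates first, then the kana itself ("as is")."""
    return KANJI_CANDIDATES.get(kana, []) + [kana]


kana = to_hiragana("tsunami")
print(kana, candidates(kana))
```

The longest-match loop is one simple design choice: it lets "tsu" win over any shorter prefix without needing a separate state machine.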



The first screenshot shows the user entering a sequence of characters called the "composition string". Note that the two typed symbols were converted into a single symbol, "F", in Notepad.
The second screenshot shows that the user has finished entering syllables and has pressed Space, so the IME suggests a suitable word. The user can confirm the word by pressing Enter.
After Enter is pressed to confirm the word, the application (Notepad in this example) receives the resulting string as WM_IME_CHAR messages. If the application does not process this message, it then receives a standard WM_CHAR message from the IME system.
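The fallback from WM_IME_CHAR to WM_CHAR can be illustrated with a small model. The sketch below does not call any Windows API. It only shows which character values the resulting WM_CHAR messages would carry, under two assumptions: a Unicode window receives one UTF-16 code unit (the character is assumed to be in the Basic Multilingual Plane), and a non-Unicode window receives the bytes of the character in the system code page (code page 932, Shift-JIS, is assumed here), one byte per message. The function name `wm_char_payloads` is hypothetical.

```python
from typing import List


def wm_char_payloads(ch: str, unicode_window: bool,
                     codepage: str = "cp932") -> List[int]:
    """Model the WM_CHAR values produced for one unprocessed WM_IME_CHAR.

    Assumption: `ch` is a single character in the Basic Multilingual Plane.
    """
    if unicode_window:
        # One message carrying the UTF-16 code unit.
        return [ord(ch)]
    # One message per byte of the double-byte (or single-byte) character.
    return list(ch.encode(codepage))


print([hex(v) for v in wm_char_payloads("津", unicode_window=True)])
print([hex(v) for v in wm_char_payloads("津", unicode_window=False)])
```

This is why a non-Unicode application has to know that a conversion step is needed: it must reassemble the bytes before it can treat them as one character.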

Override default IME behavior



Typically, an IME creates its windows through the standard Windows API.
Note: when an application runs in full-screen mode, as is customary for games, standard windows do not work and cannot be displayed on top of the application. To solve this problem, the application must process IME messages itself instead of relying on the IME windows.
An application can also work with the IME library directly by processing IME-related messages and calling the Input Method Manager (IMM).
When a user enters complex characters with an IME layout, the IMM sends messages to the application to notify it of important events, such as the opening of the composition window or a request to show the list of candidate words. An application usually ignores these messages and passes them to the default Windows message handler, which in turn calls the IME library.
The process diagram (Fig. 2) shows how text input proceeds:
  1. When the IMM receives a keystroke from the keyboard driver, it passes the virtual key to the IME system by calling the ImeProcessKey function. If this function returns 0, the keystroke is processed by the operating system and the application themselves: the application receives WM_KEYDOWN and WM_KEYUP messages, followed by WM_CHAR or WM_COMMAND.
  2. If the IME system returns a nonzero result, the IMM passes the keystroke on by calling the ImeToAsciiEx function of the IME library.
  3. The IME system fills in the lpdwTransBuf parameter with the Windows messages that must be passed to the application. The IME system also receives the hIMC parameter, which holds the composition string. During operation, the IME system changes the contents of the memory behind hIMC.
  4. Each time the IMM receives the lpdwTransBuf buffer, it checks whether the buffer contains messages for the application. Typically, the buffer contains a WM_IME_COMPOSITION message, which must be sent to the application whenever the composition string changes.
  5. If the application does not support IME, it does not process the WM_IME_COMPOSITION message, so the application itself would not show the user the text being entered. In this case, the message is passed to the corresponding IME UI window (created by the IME system during initialization in the ImeInquire function), which always exists while the IME is active. The IME window displays the composition string as the user types it.
  6. If the application supports IME, it processes the WM_IME_COMPOSITION message. To get the contents of the composition string, the application calls the ImmGetCompositionString function from Imm32.dll. A WM_IME_COMPOSITION message may also notify the application that a result string has been generated.
  7. If the application retrieves the composition string from the IMM, it must call the DefWindowProc function for the WM_IME_CHAR message, because further processing may generate this message again.
  8. If the application does not support IME, it receives a WM_IME_CHAR message. If the application supports Unicode, the Unicode character is passed in the message parameters. If not, the application receives a WM_CHAR message, but it knows that the character must be converted.
  9. As a result, the application always receives a WM_CHAR message and knows whether the character must be converted.
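The branching in steps 1 to 4 can be modelled with a toy dispatcher. This is a simplified simulation under stated assumptions, not the real IMM: `ToyIme`, `InputContext`, and `imm_dispatch` are hypothetical names, messages are plain tuples, the key-up message is omitted, and the toy IME simply composes letters and commits them on Enter. Only the message name WM_IME_COMPOSITION and the flags GCS_COMPSTR and GCS_RESULTSTR are taken from the Windows API.

```python
from dataclasses import dataclass


@dataclass
class InputContext:
    """Stands in for the memory behind hIMC (holds the composition string)."""
    composition: str = ""


class ToyIme:
    """Hypothetical IME: composes letters, commits the string on Enter."""

    def process_key(self, key: str, ctx: InputContext) -> int:
        # Models ImeProcessKey: nonzero means "the IME wants this key".
        if key.isalpha():
            return 1
        if key == "\r" and ctx.composition:
            return 1
        return 0

    def to_ascii_ex(self, key: str, ctx: InputContext) -> list:
        # Models ImeToAsciiEx: updates the context and returns the list of
        # messages for the application (the role of lpdwTransBuf).
        if key == "\r":
            result, ctx.composition = ctx.composition, ""
            return [("WM_IME_COMPOSITION", "GCS_RESULTSTR", result)]
        ctx.composition += key
        return [("WM_IME_COMPOSITION", "GCS_COMPSTR", ctx.composition)]


def imm_dispatch(ime: ToyIme, ctx: InputContext, key: str) -> list:
    """Models the IMM: ask the IME first, fall back to plain key messages."""
    if ime.process_key(key, ctx) == 0:
        return [("WM_KEYDOWN", key), ("WM_CHAR", key)]
    return ime.to_ascii_ex(key, ctx)


ctx = InputContext()
ime = ToyIme()
for key in ["1", "t", "s", "u", "\r"]:
    print(repr(key), imm_dispatch(ime, ctx, key))
```

Running the loop shows the digit passing straight through as ordinary key messages, the three letters producing composition updates, and Enter producing the result string and clearing the context.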


The IME library is a regular DLL file, usually with the extension ".ime". Each IME system must be registered in the Microsoft Windows registry under "HKEY_LOCAL_MACHINE\SYSTEM\ControlSet00X\Control\Keyboard Layouts". The keyboard context of any application can always be retrieved.

Potential threat when using IME system


Every keyboard layout, including IME systems, is loaded into every process of the operating system. A regular application running in a user environment cannot prevent keyboard modules from being loaded into its address space. An IME library, like any Windows library, must export several functions, and its author is free to implement them in any way. In addition, the DllMain function is always called when the library is loaded.
To implement an attack such as a keylogger, an attacker only needs to create an IME system for some keyboard layout (for example, an IME for US, UK, RU, CH, JP, or KR) that does not display windows or convert characters, i.e. one that is invisible to the user.
As mentioned above, the IMM always passes keystroke codes to the selected IME system by calling the ImeProcessKey and ImeToAsciiEx functions exported from the IME library.
The easiest way to intercept keystrokes is in the ImeProcessKey function, which should always return 0 so that the IME does no further processing of the key. Because the IME library is loaded into all processes, the user cannot notice suspicious activity with standard tools (for example, there is no separate process for the IME library).
Alternatively, if ImeProcessKey returns a nonzero value, keys can be captured in the ImeToAsciiEx function. In that case the scan code of the pressed key must be converted to a character in the desired encoding, which is a trivial operation.

More malicious actions

Signed (legitimate) IME library files can easily be deleted from the user's computer, and an IME library created by cybercriminals can then be installed in place of the legitimate one. This can lead to the following threats:
  • A malicious IME can replace the standard IME for all users.
  • Even if the user changes the default IME, IMEs that have already been selected are not changed. The user must log in again or restart the computer.
  • If the IME library runs in a separate thread, it can keep working, and the user cannot terminate it by standard means.
  • Attackers can set special permissions on registry keys to make the IME library difficult to remove.
  • The IME is loaded even into 16-bit applications and command-line applications.
  • The IME can load the WinSock subsystem to access the Internet.
