SmartCAT: Cloud Technologies for Translators
Translators have interesting work: a great deal of information in different languages constantly passes through their hands. It often happens that the translation of yet another 100-page manual was needed yesterday. If similar texts have been translated before (previous versions of the manual or other technical documentation), the task may be a little easier, but copying and pasting while making sure that every change is accounted for is a chore in its own right. To reuse an existing translation and keep it consistent, there is a special class of programs called CAT tools. CAT stands for Computer-Aided (or Computer-Assisted) Translation. These technologies should not be equated with machine translation, where you enter text in one language, press a button and get its translation: computer-aided translation is a broader concept, and CAT systems draw on existing translations made by a person.
A few days ago, ABBYY Language Services began closed testing of SmartCAT, its own platform for automating the translation process. In this post we will try to tell you a little about what CAT systems can do.
First, CAT tools include various linguistic resources that help translators work with similar texts containing standard phrases and sentences: technical, legal and medical terms, product descriptions and much more. One of the most common resources is Translation Memory, a database of previously translated segments of text (phrases and sentences). Translation memories are created and updated from pairs of parallel texts. Another important resource is glossaries, which contain the terms and concepts adopted by a particular company (or approved for a specific group of projects).
In addition, SmartCAT allows you to work with machine translation technology. Translators abroad have been using this resource for a long time, because it helps speed up translation and increase productivity. In Russia, not everyone understands what can be expected from machine translation, but interest in this technology is growing: this year, participants in many industry conferences (for example, Loc Kit and Translation Forum Russia) discussed the introduction and use of machine translation much more actively than at past events.
All of these linguistic resources simplify the work of a translator who uses a CAT tool. While a text is being translated, SmartCAT offers translation options for individual segments, using substitutions from existing translation memories and from connected glossaries with corporate terminology. The translator can:

- accept a suggested substitution as is
- edit the suggested translation (for example, if the grammatical form needs to change)
- translate the segment in their own way.
The edited version can also be added to the existing translation memory, so that the platform offers it next time. In addition, a separate panel on the right side of the SmartCAT interface displays the machine translation of the selected segment. In most cases it is much easier to edit such “raw” material than to translate from scratch. This is usually called post-editing: the translator or editor checks the machine output, compares it with the original, and brings it up to the language norms or the required level of quality. This approach does not work for literary works, creative texts (slogans, advertising materials, etc.), personal correspondence, and other similar texts.
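The translation memory lookup described above can be illustrated with a small sketch. The post does not say how SmartCAT scores matches, so this example simply assumes a similarity ratio from Python's standard `difflib` module; the sample memory, the `suggest` and `confirm` helpers, and the 0.75 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> stored translation.
TRANSLATION_MEMORY = {
    "Press the power button to turn on the device.":
        "Appuyez sur le bouton d'alimentation pour allumer l'appareil.",
    "Do not expose the device to water.":
        "Ne mouillez pas l'appareil.",
}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two segments (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(segment: str, memory: dict, threshold: float = 0.75) -> list:
    """Return (score, source, translation) tuples at or above threshold, best first."""
    matches = [
        (similarity(segment, source), source, target)
        for source, target in memory.items()
    ]
    matches = [m for m in matches if m[0] >= threshold]
    return sorted(matches, key=lambda m: m[0], reverse=True)


def confirm(segment: str, translation: str, memory: dict) -> None:
    """Store the translator's confirmed version so it is offered next time."""
    memory[segment] = translation


if __name__ == "__main__":
    new_segment = "Press the power button to turn off the device."
    for score, source, target in suggest(new_segment, TRANSLATION_MEMORY):
        print(f"{score:.0%}  {source}  ->  {target}")
```

A near match like the one in the usage example is exactly the case where the translator would edit the suggestion rather than accept it unchanged, and then call something like `confirm` to save the corrected pair.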
CAT tools retain document formatting. Suppose a translator is working on a document with a complex structure that contains multilevel lists, styles, links, and other design elements. SmartCAT stores information about the layout of the source text in special tags that can be left in place when working on the translation, and then the translated text will look the same as the original.
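As a rough illustration of the tag idea, the sketch below swaps inline formatting for numbered placeholders before translation and restores it afterwards. The real SmartCAT tag format is not described here; the HTML-like input and the `{1}`, `{2}` placeholder syntax are assumptions made for the example.

```python
import re

# Assumption: inline formatting arrives as simple HTML-like tags such as <b> or </a>.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")
PLACEHOLDER_PATTERN = re.compile(r"\{(\d+)\}")


def protect_tags(segment: str):
    """Replace each formatting tag with a numbered placeholder like {1}.

    Returns the protected text and the list of original tags in order.
    """
    tags = []

    def replace(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG_PATTERN.sub(replace, segment), tags


def restore_tags(translated: str, tags: list) -> str:
    """Put the original tags back in place of their numbered placeholders."""

    def replace(match):
        return tags[int(match.group(1)) - 1]

    return PLACEHOLDER_PATTERN.sub(replace, translated)


if __name__ == "__main__":
    protected, tags = protect_tags("Click <b>Save</b> to continue.")
    print(protected)
    print(restore_tags("Cliquez sur {1}Enregistrer{2} pour continuer.", tags))
```

Because the placeholders are numbered, the translator can move them within the sentence when the target language needs a different word order, and the formatting still lands on the right words.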
Most CAT tools are desktop programs: they are installed on one computer, and the program can be used only there. To translate on another computer, you need a floating license or some other workaround. SmartCAT has a simple interface and a cloud architecture that offers certain advantages:
- several translators can simultaneously work on one project, even if they are in different parts of the world;
- all necessary materials (translation memory databases, glossaries, etc.) are simultaneously available to all translators of a particular project.

Our platform has a special TranslationConnector module that lets you connect to external resources: content development and authoring systems, electronic document management systems and many others. Thanks to this, you can get a translation of, say, a website or an e-commerce portal in just one click: the task created in the internal system is passed to the translator responsible for it, who makes the necessary changes directly in the system and returns the finished text. Thus, SmartCAT users can work with translations in the interfaces of the systems they already know, and companies can build and run translation processes in the way that suits them best, creating solutions for specific projects on top of the platform. Translation can be done either by an internal team (for example, a translation department) or by an external one (translation companies).
Sometimes translators have to work with PDF documents and images, which is quite inconvenient. The text in such files cannot simply be edited, so before translating you need to run recognition on them, that is, extract the text data. Of course, you can always print the scans, hang them next to the monitor and retype their contents in a text editor, if you do not mind the time and effort. SmartCAT greatly simplifies work with such file formats thanks to its integration with ABBYY OCR technologies: just upload the document to the system and it will automatically extract the text for translation. Translators do not even have to leave the program.
In addition, our CAT tool can measure the performance of translators on specific projects. In March, our colleagues attended the TAUS conference on translation automation. According to most participants, in machine translation post-editing projects you need to track the time and the amount of editing at the level of an individual segment. We decided that it makes sense to monitor not only work with machine translation but the entire translation process, so we added an online project monitoring system to SmartCAT. The platform analyzes various metrics and performance indicators in real time, which provides information for optimizing the work of translators, editors and proofreaders with linguistic materials. In addition, such data helps to evaluate
Now let's talk a little about what our developers did to bring SmartCAT into the world. In particular, they wrote a small but powerful application server with 1200 lines of code, which is a .NET assembly loader running as a Windows service. It can safely shut down or restart if it runs into errors in our code or in third-party components, or some other unpleasant surprise. When that happens, it carefully logs its crash so that it can come back up again. Each plug-in assembly contains a Ninject module with a handler for the part of the business process that does not fit within a web request. This part is represented as a task, which is placed in a queue. For fast and scalable work with job queues in MongoDB and SQL, we developed generalized patterns.
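The general shape of such a host (take a task from the queue, run its handler, log a crash and start again) can be sketched as follows. This is not the actual server, which is a .NET service: the example is in Python, and the in-memory `JobQueue`, the `run_host` loop and the restart limit are hypothetical stand-ins for a queue kept in MongoDB or SQL.

```python
import logging
from collections import deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("task-host")


class JobQueue:
    """Hypothetical in-memory stand-in for a queue kept in MongoDB or SQL."""

    def __init__(self, jobs):
        self._pending = deque(jobs)
        self.done = []

    def claim(self):
        """Return the next pending job without removing it, or None if empty."""
        return self._pending[0] if self._pending else None

    def complete(self, job):
        """Remove a job only after its handler has finished successfully."""
        self._pending.popleft()
        self.done.append(job)


def run_host(queue, handler, max_restarts=3):
    """Process jobs; on a crash, log it and start the loop again.

    Returns the number of restarts that were needed. Gives up and re-raises
    once more than max_restarts crashes have occurred.
    """
    restarts = 0
    while True:
        try:
            while (job := queue.claim()) is not None:
                handler(job)
                queue.complete(job)
            return restarts
        except Exception:
            log.exception("handler crashed; restarting")
            restarts += 1
            if restarts > max_restarts:
                raise


if __name__ == "__main__":
    queue = JobQueue(["import-doc-1", "ocr-doc-2"])
    run_host(queue, handler=lambda job: log.info("processing %s", job))
    print(queue.done)
```

A job is removed from the queue only after its handler returns, so a crash in the middle of processing leaves the job in place to be picked up after the restart.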
In addition, our engineers implemented neat and convenient attribute-based routing in WebAPI 5.0. So that task handlers are not limited by RAM or disk space, we added streaming of data from external file providers (for example, the OCR server) into TranslationConnector, and from there, in turn, the same streaming transfer into MongoDB GridFS.
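The point of streaming is that a file passes through in small pieces instead of being loaded whole. A minimal sketch of the idea, assuming generic file-like objects on both sides (the `stream_copy` helper and the chunk size are illustrative, and `io.BytesIO` stands in for the OCR response and the storage upload stream):

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune to the workload


def stream_copy(source, destination, chunk_size=CHUNK_SIZE):
    """Copy a binary stream chunk by chunk; return the number of bytes copied.

    Only one chunk is held in memory at a time, regardless of file size.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the source is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    recognized = io.BytesIO(b"recognized text " * 10_000)  # stands in for an OCR response
    stored = io.BytesIO()  # stands in for a storage upload stream
    print(stream_copy(recognized, stored), "bytes copied")
```

With this shape, memory use depends on the chunk size rather than on the size of the document, which is why the handlers are no longer bounded by RAM or local disk.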
We also came up with a way to organize config files that makes applications easier to configure during development, testing and operation. For example, the deployed files do not contain credentials for production services and databases: these are connected dynamically from another directory. There are also settings that depend on the specific role of the server and its network connections. All this makes it possible to maintain many handlers on different servers.
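One way to picture this layering is shown below: shared defaults, then role-specific settings, then credentials read from a separate directory. The post does not describe the actual file layout or format, so the JSON files, the names `base.json`, `role.<role>.json` and `credentials.json`, and the merge order are all assumptions made for the sketch.

```python
import json
import tempfile
from pathlib import Path


def load_json(path: Path) -> dict:
    """Read a JSON file, returning an empty dict if it does not exist."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def load_config(app_dir: Path, secrets_dir: Path, role: str) -> dict:
    """Layer settings: shared defaults, then role-specific values, then secrets."""
    config = load_json(app_dir / "base.json")
    config = merge(config, load_json(app_dir / f"role.{role}.json"))
    # Credentials live outside the deployed application directory (assumed layout).
    return merge(config, load_json(secrets_dir / "credentials.json"))


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    app_dir, secrets_dir = root / "app", root / "secrets"
    app_dir.mkdir()
    secrets_dir.mkdir()
    (app_dir / "base.json").write_text('{"db": {"host": "localhost"}, "workers": 1}')
    (app_dir / "role.ocr.json").write_text('{"workers": 4}')
    (secrets_dir / "credentials.json").write_text('{"db": {"password": "example"}}')
    print(load_config(app_dir, secrets_dir, role="ocr"))
```

Keeping the credentials file out of the deployed directory means the same package can be copied to any server, with only the role name and the secrets location differing between machines.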
In the near future, we will try to share more technical details from our developers and explain what advantages these technologies give SmartCAT users. The cloud platform itself is still in closed testing, but anyone interested can apply to take part on the official website.
Denis Frolov
ABBYY Language Services