Antivirus bot for Telegram
Last week, Doctor Web released an antivirus bot for Telegram. As a direct participant in the project, I would like to explain, on behalf of the whole team, why we made this bot, how it works and whether it is time to abandon desktop antivirus software.
Concept
Last summer, Telegram introduced bots and the Telegram Bot API. Chatbots had existed for a long time, but this platform offered such wide scope for integration experiments that only the lazy did not build a bot of their own. There are even some quite exotic examples.
Most of the bots we tried were entertaining (IQ tests or sticker ratings, for instance), informational (sending a weather forecast, a word translation or the address of the nearest ATM), or both at once, say, bots for finding Indian films. They turned out to be convenient, and the format appealed to us so much that we wanted to use it for our own information desk: our bot could describe a threat on request. For example, a user asks the bot what exactly the Linux.Encoder.1 Trojan does and receives a detailed description of the threat in response. But after turning the idea over a little, we found some obvious flaws:
- Messenger messages are an inconvenient format for reading about malware: a description of how a threat works is often very long, with code examples and piles of screenshots.
- The scenario itself seemed artificial: a user discovers a threat on their device, opens Telegram, finds the bot and asks it about the threat instead of simply googling it.
- Different antivirus vendors follow different rules for naming threats, so a user may search under another vendor's name and not find the information they need.
After thinking all this over, we decided to go a step further and create a bot with genuinely practical functionality: an experimental antivirus bot.
The task seemed fascinating and useful. The messenger is responsible for traffic encryption and secure data exchange, and Telegram has proven itself here. The user, however, is responsible for the security of the device Telegram is installed on, and all the usual social engineering tricks still work there. Both computers and smartphones can be infected with a Trojan that, at best, shows tons of ads and, at worst, turns the device into a lifeless lump of plastic and metal.
We conceived a bot that could check files and links on the fly and warn the user if it detects a threat. When antivirus protection is built into, say, email, the antivirus runs either on the mail provider's side or on the user's device. The Bot API makes it possible to organize protection differently, in a new paradigm: the bot runs neither on the user's machine nor on the messaging service's side, and it does not depend on the operating system or on the device's performance. The only requirement is a Telegram client version that supports bots. If a suspicious message arrives in Telegram itself, it can be forwarded to the bot right away, and a dubious link received from another source can just as conveniently be sent to the bot.
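To make the forwarding scenario concrete, here is a minimal Python sketch of pulling checkable items out of a single Bot API update. The field names (message, document, photo, text, caption, entities, caption_entities, url, text_link) follow the public Bot API; the helper functions themselves are hypothetical and are not our production code. A forwarded message arrives as an ordinary message with extra forward fields, so the same extraction applies to it.

```python
from typing import Any


def utf16_slice(text: str, offset: int, length: int) -> str:
    # Bot API entity offsets and lengths count UTF-16 code units, not Python
    # characters, so slice the UTF-16 encoding (2 bytes per code unit).
    raw = text.encode("utf-16-le")
    return raw[offset * 2:(offset + length) * 2].decode("utf-16-le")


def extract_checkable(update: dict[str, Any]) -> dict[str, list[str]]:
    """Collect file_ids and URLs from one Bot API Update (hypothetical helper)."""
    msg = update.get("message") or {}
    files: list[str] = []
    urls: list[str] = []
    if "document" in msg:
        files.append(msg["document"]["file_id"])
    if msg.get("photo"):
        # "photo" lists several sizes of the same image; take the largest one.
        biggest = max(msg["photo"], key=lambda p: p["width"] * p["height"])
        files.append(biggest["file_id"])
    # Media messages carry text in caption/caption_entities, plain ones in text/entities.
    text = msg.get("text") or msg.get("caption") or ""
    entities = msg.get("entities") or msg.get("caption_entities") or []
    for ent in entities:
        if ent["type"] == "url":
            urls.append(utf16_slice(text, ent["offset"], ent["length"]))
        elif ent["type"] == "text_link":
            urls.append(ent["url"])  # the visible text differs from the target
    return {"files": files, "urls": urls}
```

The UTF-16 detail matters in practice: a single emoji before a link shifts every offset by two code units, and slicing the Python string directly would cut the URL in the wrong place.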
Let us say right away that this mechanism is not a complete replacement for an antivirus. The bot cannot stop the user from clicking a dangerous link or launching a file; it can only warn of the danger, whereas an antivirus will protect the device even if a careless victim of social engineering downloads and launches the Trojan immediately. At the same time, Telegram's technically savvy audience may be interested in an antivirus product that does not restrict their actions in any way but provides information on request. We see the bot as a research project, and we are primarily interested in feedback, which is why we are publishing this article here.
Implementation
The bot is built on the Tornado framework, which, like a traffic controller at an intersection, coordinates the data flows between the Telegram Bot API and the private APIs of our Dr.Web services. Initially we took the standard route and used Django. However, Django processes requests synchronously: while a request waits on input and output (receiving the request body, sending a response, working with the database and so on), the worker handling it sits idle and valuable time is wasted. An experiment with the Siege load-testing utility showed us that this model was unsuitable for efficiently processing thousands of short, one-off requests.
So we turned to asynchronous models and chose Tornado, for which asynchrony is essentially the main feature. Today all of the bot's code is asynchronous, including downloading files, checking links and even working with the database: when adding an entry to the database, the bot does not wait for the server's response but carries on with other tasks.
When messages addressed to the bot arrive from the Telegram cloud, we need to parse the links in the received text. It is important to avoid discrepancies between how our parser works (that is, which page the bot will check) and how Telegram's parser works (that is, what the user will actually open by tapping the link in the messenger), so we followed Telegram's link parsing as closely as possible, using the open-source web client as our reference. Their mechanism is probably not limited to that, though, and it periodically raised questions for us (for example, in the iOS app, the link “test.com:8080”, typed without a protocol, appears as “test.com:8080” to the sender but as “Test.com:8080” to the recipient).
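The following sketch shows the kind of normalization this requires: a link written without a protocol, in any letter case, should map to one address. The regular expression is a deliberately simplified assumption of ours, not the Telegram web client's parser, and defaulting to http when no scheme is given is likewise an assumption.

```python
import re

# Simplified link pattern (an assumption, not Telegram's real parser):
# optional http(s) scheme, dotted host, optional :port, optional path.
_LINK_RE = re.compile(
    r"(?<![\w@.])"                         # not inside a word or an e-mail address
    r"(?:(?P<scheme>https?)://)?"
    r"(?P<host>(?:[a-z0-9-]+\.)+[a-z]{2,})"
    r"(?::(?P<port>\d{1,5}))?"
    r"(?P<path>/\S*)?",
    re.IGNORECASE,
)


def extract_links(text: str) -> list[str]:
    """Return normalized URLs, so 'test.com:8080' and 'Test.com:8080' match."""
    links = []
    for m in _LINK_RE.finditer(text):
        scheme = (m.group("scheme") or "http").lower()  # assume http if omitted
        host = m.group("host").lower()  # host names are case-insensitive
        port = f":{m.group('port')}" if m.group("port") else ""
        links.append(f"{scheme}://{host}{port}{m.group('path') or ''}")
    return links
```

Only the scheme and host are lowercased; paths stay as typed because servers may treat them case-sensitively.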
Links and files are then processed in several stages: unpacking archives, expanding shortened links and following redirects. If a link points to a downloadable file, we download it, so the bot can check not only files sent via Telegram but also files available at external links.
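Here is a minimal sketch of just the unpacking stage, using Python's standard zipfile module. The depth and size limits are hypothetical values, included because nested archives and archive bombs are an obvious risk for any service that unpacks untrusted input. Only ZIP is handled; expanding short links and following redirects is not shown.

```python
import io
import zipfile

MAX_DEPTH = 3                  # hypothetical nesting limit
MAX_MEMBER_BYTES = 50 * 2**20  # hypothetical per-member limit (50 MiB, declared size)


def unpack(name: str, data: bytes, depth: int = 0) -> list[tuple[str, bytes]]:
    """Return (path, bytes) for every leaf file, recursing into nested ZIPs."""
    if depth >= MAX_DEPTH or not zipfile.is_zipfile(io.BytesIO(data)):
        return [(name, data)]
    leaves: list[tuple[str, bytes]] = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir() or info.file_size > MAX_MEMBER_BYTES:
                # Directories, and members too large to expand in memory
                # (a real bot should report these rather than skip them).
                continue
            inner = zf.read(info)
            leaves.extend(unpack(f"{name}/{info.filename}", inner, depth + 1))
    return leaves
```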
To spread the load across the servers more efficiently, the bot first checks the file and link caches. Only then does it pass the baton to various Dr.Web technologies through our internal APIs: the Dr.Web Cloud service, the Scanning Engine, Link Checker and the virus signature database. Data exchange is asynchronous and multi-threaded, and as the load grows we can add capacity by bringing in new servers and adjusting a few settings in the configuration files: scalability was built into the bot's architecture from the start.
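A minimal sketch of the cache-first step, assuming files are identified by their SHA-256 digest and that scan is some asynchronous call into the backend. Our internal APIs are not public, so scan is a hypothetical stand-in here. The expiry time is also an assumption: a clean verdict can go stale once the signature database is updated, so cached entries should not live forever.

```python
import hashlib
import time
from typing import Awaitable, Callable

Scanner = Callable[[bytes], Awaitable[str]]  # hypothetical backend call


class CachedChecker:
    """Answer from a SHA-256-keyed verdict cache before asking the backend."""

    def __init__(self, scan: Scanner, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._scan = scan
        self._ttl = ttl          # assumed lifetime of a cached verdict, seconds
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}

    async def check(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]        # cache hit: no backend round trip
        verdict = await self._scan(data)
        self._cache[key] = (verdict, now)
        return verdict
```

The clock is injectable so the expiry logic can be tested without waiting; in the bot it would simply be time.monotonic.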
Finally, the scan results are returned to the bot, which sends them to users while respecting the limits the Telegram Bot API places on how often bots may send messages.
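Exceeding those limits typically gets a bot HTTP 429 (Too Many Requests) responses, so replies have to be paced. Telegram's Bot FAQ recommends roughly one message per second per chat and about 30 messages per second overall; the defaults in this sketch are modeled on that guidance and should be checked against the current documentation. The class and function names are hypothetical.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable


class SendThrottle:
    """Per-chat and global send pacing (limits are assumptions, see above)."""

    def __init__(self, per_chat_interval: float = 1.0, global_per_second: int = 30):
        self.per_chat_interval = per_chat_interval
        self.global_per_second = global_per_second
        self._last_by_chat: dict[int, float] = {}
        self._recent: deque[float] = deque()  # send times within the last second

    def delay(self, chat_id: int, now: float) -> float:
        """Seconds to wait before a message to chat_id may be sent."""
        wait = 0.0
        last = self._last_by_chat.get(chat_id)
        if last is not None:
            wait = max(wait, last + self.per_chat_interval - now)
        # Forget global send times that have left the one-second window.
        while self._recent and self._recent[0] <= now - 1.0:
            self._recent.popleft()
        if len(self._recent) >= self.global_per_second:
            wait = max(wait, self._recent[0] + 1.0 - now)
        return wait

    def record(self, chat_id: int, sent_at: float) -> None:
        self._last_by_chat[chat_id] = sent_at
        self._recent.append(sent_at)


async def throttled_send(throttle: SendThrottle, chat_id: int,
                         send: Callable[[], Awaitable[None]]) -> None:
    loop = asyncio.get_running_loop()
    # Re-check after each sleep: other coroutines may have sent in the meantime.
    while (wait := throttle.delay(chat_id, loop.time())) > 0:
        await asyncio.sleep(wait)
    throttle.record(chat_id, loop.time())  # no await between check and record
    await send()
```

Because there is no await between the final delay check and record, two coroutines on the same event loop cannot both claim the same slot.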
Users can check links and files both in a private chat with the bot (by sending it suspicious content directly or forwarding messages received from other users) and in group chats: once the bot is added as a chat participant, it checks all files and links posted there.
The bot works in two modes: “silent” and normal. In normal mode, the bot responds to every file or link with a message saying that the link is safe or that downloading the file is not recommended. If the bot behaved like this in a group chat, it could get in the way of conversation, so we made a “silent” mode. In this mode, the bot speaks up only when a file or link in the chat contains a threat, warning users against a rash download or click. Verification error messages are also sent in silent mode; otherwise, receiving no response, a user might mistakenly assume that the link or file had been checked and found safe. You can select a mode with the /mode command.
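The reply policy for the two modes fits in a few lines. This sketch is an illustration, not our implementation; in particular, the argument syntax of /mode (/mode silent, /mode normal) is hypothetical. Commands sent in groups may carry the bot's username (/mode@drwebbot), so the parser strips that suffix.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    NORMAL = "normal"
    SILENT = "silent"


class Verdict(Enum):
    CLEAN = "clean"
    THREAT = "threat"
    ERROR = "error"      # the check could not be completed


def should_reply(mode: Mode, verdict: Verdict) -> bool:
    if mode is Mode.NORMAL:
        return True      # normal mode: answer every file and link
    # Silent mode: skip clean results, but never stay quiet about threats or
    # failed checks, so silence is not mistaken for "checked and safe".
    return verdict is not Verdict.CLEAN


def parse_mode_command(text: str) -> Optional[Mode]:
    """Parse '/mode silent' or '/mode@drwebbot normal' (hypothetical syntax)."""
    parts = text.split()
    if len(parts) != 2 or parts[0].split("@", 1)[0].lower() != "/mode":
        return None
    try:
        return Mode(parts[1].lower())
    except ValueError:
        return None
```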
As the API evolves, we will adopt new features if they prove useful for our tasks. Not long ago, Telegram introduced inline mode, which lets people use a bot without adding it to a chat; for now this mechanism does not allow sending a file to the bot for checking, but we are considering how to use it. In upcoming updates we plan to make the bot faster and more reliable, and we are closely following user feedback.
A few words about localization (it is not only my profession but also my passion): our bot can communicate in Russian, English and German. There were no particular difficulties: we use gettext and store the localization files in .po format.
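A minimal sketch of per-user language selection with Python's standard gettext module. It assumes the user's language arrives as an IETF-style tag such as de-AT (current versions of the Bot API expose an optional language_code field on the User object). The directory layout locale/<lang>/LC_MESSAGES/bot.mo and the domain name bot are hypothetical. Note that gettext reads .mo files compiled from the .po sources, for example with msgfmt.

```python
import gettext
from typing import Optional

SUPPORTED = ("ru", "en", "de")  # the bot's three languages
DEFAULT = "en"


def pick_language(language_code: Optional[str]) -> str:
    """Map a tag like 'de-AT' or 'RU' to one of the supported languages."""
    if language_code:
        primary = language_code.replace("_", "-").split("-")[0].lower()
        if primary in SUPPORTED:
            return primary
    return DEFAULT


def translator(language_code: Optional[str], localedir: str = "locale",
               domain: str = "bot") -> gettext.NullTranslations:
    # Looks for <localedir>/<lang>/LC_MESSAGES/<domain>.mo; with fallback=True a
    # missing catalog yields NullTranslations, which returns messages unchanged.
    return gettext.translation(domain, localedir=localedir,
                               languages=[pick_language(language_code)],
                               fallback=True)
```

In a handler this would be used as _ = translator(code).gettext followed by _("File is clean"), so an untranslated string degrades to English rather than failing.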
As a rule, all texts for our products are written in a formal style, so using emojis in resource files was an interesting new experience. In OS X they are supported out of the box, in Ubuntu it was enough to add a font to the system (sudo apt-get install ttf-ancient-fonts), and on Windows some tricks were needed so that translators could see emojis in the localization files. We tried inserting emojis into .po files as codes, but not every platform displayed them (users of the Windows desktop client, for example, saw the text codes instead). There seem to be two reasonable solutions: either pick a .po editor that displays all emojis, or store codes in the files and convert them to emojis on our side. We are leaning toward the second option, but either way the user will never notice this ordeal.
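The second option, storing codes in .po files and converting them on the server, could look like the following sketch. The [U+XXXX] placeholder syntax is our invention for illustration, not a gettext convention: translators see plain ASCII, and the bot substitutes the real characters before sending.

```python
import re

# Hypothetical placeholder syntax kept in .po files: [U+XXXX], with '+'
# joining the code points of a sequence, e.g. [U+26A0+FE0F].
_EMOJI_CODE = re.compile(r"\[U\+([0-9A-Fa-f]{4,6}(?:\+[0-9A-Fa-f]{4,6})*)\]")


def _to_chars(match: re.Match) -> str:
    code_points = [int(cp, 16) for cp in match.group(1).split("+")]
    if any(cp > 0x10FFFF for cp in code_points):
        return match.group(0)  # not a valid code point: leave the text as typed
    return "".join(chr(cp) for cp in code_points)


def render_emoji(text: str) -> str:
    """Replace [U+...] placeholders in a translated string with real characters."""
    return _EMOJI_CODE.sub(_to_chars, text)
```

The conversion would run after gettext lookup, for example render_emoji(_("[U+26A0] Threat detected")), so the .po files never contain the emoji characters themselves.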
Another thing we keep in mind during development: the same emoji looks different on different devices and is not supported everywhere. Emojipedia helped us with this: it shows whether a given emoji exists on a given platform and lets you copy the emoji or its code to paste into a .po file.
One small snag we ran into: Telegram does not allow a bot to be fully localized. The bot description and the hints in the input field are always in a single language (English, in our case). We hope a solution for this will appear in future releases of the Telegram Bot API.
All in all, development, internal testing and localization took a team of seven people three months. Colleagues worked on the bot at a relaxed pace alongside their main duties, so we had plenty of time to “meditate” on its logic. The hardest part under these conditions was load testing: for the main stress test we invited several dozen employees with Telegram accounts, and on a prearranged signal they fed the bot a collection of thousands of files. We hope the influx of curious testers from Habr will not knock us out, but if something does go wrong we will bring additional capacity online, so please don't judge us too harshly.
As far as we know, nobody has built an antivirus bot before, so there is plenty of room for experimentation. We would be glad to hear your thoughts on and experience with our bot: @drwebbot