smart June 1, 2011 at 17:49

The hard life of anti-spammers or how it really happens

The reason for this publication is the recent major changes to the anti-spam mechanism in our mail service. We would like to share the news, but not in the form of a dry press release. So we decided to describe how antispam is arranged in Mail.Ru Mail and, of course, we will gladly answer your questions. So…

Mail.Ru antispam architecture

Mail.Ru has had its own antispam for many years. The desire to develop our own product is understandable: at a certain stage in the project's growth, the requirements for the quality and scalability of the anti-spam mechanism became too demanding to be met even by heavily customized third-party products. Of course, we still use some services and components from third-party providers (for example, to check emails for viruses), but their role is no longer decisive.

The requirements for our own anti-spam were clear and logical: maximum speed and accuracy. Of course, there is no limit to perfection, and the relationship between spammers and their opponents is an eternal struggle of shield and sword. But we can already say with confidence that we have made serious progress toward that goal and are continuing to build momentum.

So, what does the modern antispam of Mail.Ru Mail look like, and how does it work?

First of all, even on the approach to the mail servers, every sender is checked against a database of IP addresses seen in spam mailings. The database is updated dynamically in real time: some IPs are cleared, while others are blacklisted. Accordingly, letters from IP addresses with a "tarnished" reputation are not accepted, and this is how we manage to cut off most botnets.
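As a rough illustration of this first stage, here is a minimal Python sketch of a reputation list that can both block and clear addresses. The class name `IpReputation` and its methods are hypothetical, and the in-memory set stands in for whatever shared, continuously updated store the real system uses.

```python
import ipaddress


class IpReputation:
    """Toy in-memory reputation list (hypothetical, for illustration only)."""

    def __init__(self):
        self._blocked = set()

    def block(self, ip):
        # An address seen in a spam mailing gets blacklisted.
        self._blocked.add(ipaddress.ip_address(ip))

    def clear(self, ip):
        # An address whose reputation has recovered is removed again.
        self._blocked.discard(ipaddress.ip_address(ip))

    def accepts(self, ip):
        # Connections from blacklisted addresses are refused up front.
        return ipaddress.ip_address(ip) not in self._blocked


reputation = IpReputation()
reputation.block("203.0.113.7")
print(reputation.accepts("203.0.113.7"))  # False
reputation.clear("203.0.113.7")
print(reputation.accepts("203.0.113.7"))  # True
```

Parsing addresses with `ipaddress` means that different textual spellings of the same address compare equal.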

If the sender's IP is not in the black list, then the message is received by the server and checked by two anti-spam systems: Kaspersky Anti-Spam (or KAS for short) and the spam filtering system developed in Mail.Ru - MRAS (Mail.Ru Anti-Spam). These two systems always work in parallel.

The name MRAS appears, in particular, in the service headers of almost every letter passing through Mail.Ru. For example, the header “X-Mras: Ok” indicates that no spam signatures were found in the message.
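If you want to look at this header in your own mail, a short sketch with Python's standard `email` module is enough. The sample message and the helper `mras_says_clean` are made up for illustration, and only the "Ok" value mentioned above is handled, since other possible values are not described here.

```python
from email import message_from_string

# A made-up message carrying the service header described above.
raw = (
    "From: sender@example.com\n"
    "To: user@example.org\n"
    "Subject: hello\n"
    "X-Mras: Ok\n"
    "\n"
    "Hi there\n"
)


def mras_says_clean(msg):
    """Return True only when the X-Mras header is present and equals 'Ok'."""
    value = msg.get("X-Mras")  # header name lookup is case-insensitive
    return value is not None and value.strip().lower() == "ok"


print(mras_says_clean(message_from_string(raw)))  # True
```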

When choosing the MRAS architecture, we took the most common approach: collecting samples of spam emails, analyzing them and generating signatures. Simply put, a signature is a piece of significant information in a letter: a phone number, a link, a characteristic phrase or keyword, etc. MRAS evaluates a message against signatures using simple logic: if the message contains signatures specific to spam mailings, the letter is most likely spam.
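The signature idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the regular expressions, the `link:` and `phone:` prefixes and the function names are invented here, and the real MRAS signature formats are not public.

```python
import re

# Deliberately simple patterns; real extraction would be far more robust.
LINK_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{8,}\d")


def extract_signatures(text):
    """Pull links and phone numbers out of the text as normalized signatures."""
    signatures = set()
    for link in LINK_RE.findall(text):
        signatures.add("link:" + link.lower().rstrip(".,;)"))
    for phone in PHONE_RE.findall(text):
        signatures.add("phone:" + re.sub(r"\D", "", phone))
    return signatures


def looks_like_spam(text, known_spam_signatures):
    """A message is suspicious if it shares any signature with known spam."""
    return bool(extract_signatures(text) & known_spam_signatures)


sample = "Call +7 (495) 000-00-00 now or visit http://spam.example/buy."
print(sorted(extract_signatures(sample)))
# ['link:http://spam.example/buy', 'phone:74950000000']
```

Normalizing the phone number to bare digits means the same number written with different spacing or brackets still produces the same signature.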

Separately, it is worth noting the recognition system for graphic spam. Each picture that comes in the letter is analyzed and also decomposed into signatures that are involved in the decision. For example, antispam confidently determines the phone numbers and addresses of sites written graphically, and the algorithm works even with distorted and noisy images.

In addition to signatures, there are so-called rules in MRAS that describe more complex logic. Using the rules in MRAS, you can create filters that take into account multiple message attributes, including service headers, image parameters, link format or patterns, frequency and reputation characteristics of any entity in the message, etc.

When we selected the engine for the implementation of the rules, various options were discussed. The main requirements were: high performance, syntax flexibility and easy extensibility. We found that the Lua embedded interpreter met the above conditions. As a result, we got a powerful and flexible tool, which was useful not only for creating rules. Now, with the help of Lua-scripts, MRAS implements a significant part of business logic, for example, mechanisms for parsing images and frequency shingles, various reputation mechanisms.
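The production rules are written in Lua and are not shown here, so the following Python sketch only illustrates the general shape of a rule: a small predicate over several message attributes at once. Every field name, threshold and rule below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MessageFeatures:
    # All fields are hypothetical examples of attributes a rule might inspect.
    link_count: int
    image_count: int
    sender_reputation: float  # assumed scale: 0.0 (bad) to 1.0 (good)
    has_list_unsubscribe: bool


def rule_image_only_from_unknown_sender(f):
    # Pictures but no links, from a sender with a poor reputation.
    return f.image_count > 0 and f.link_count == 0 and f.sender_reputation < 0.3


def rule_many_links_no_unsubscribe(f):
    # Link-heavy mailing that offers no unsubscribe header.
    return f.link_count >= 10 and not f.has_list_unsubscribe


RULES = [rule_image_only_from_unknown_sender, rule_many_links_no_unsubscribe]


def fired_rules(features):
    """Return the names of all rules that match the given message features."""
    return [rule.__name__ for rule in RULES if rule(features)]


print(fired_rules(MessageFeatures(0, 2, 0.1, False)))
# ['rule_image_only_from_unknown_sender']
```

Keeping each rule as a separate small function is what makes an embedded scripting language attractive: rules can be added or changed without touching the engine that runs them.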

How does MRAS find out about spam mailings?

There are several sources of spam samples for MRAS. The main source is complaints from users clicking the "This is spam" button in the web interface of mail. They are grouped, automatically filtered, and then fed into the decision-making system.

Another very important source is trap mailboxes: specially registered addresses deliberately exposed on the Internet, which receive only spam. Outwardly they look like the mailboxes of ordinary users: they may appear in accounts on My World and other social networks, in posts on forums and guest books, etc. Unscrupulous mailers harvesting address databases on the Internet are likely to pick up several “traps”, and a letter that arrives at one of them will certainly serve as the basis for spam signatures.

Finally, the third source is the group of Mail.Ru Mail analysts, who work around the clock, reviewing in real time user complaints about letters that may be spam, the contents of trap mailboxes, and so on.

Next, what happens to a letter leaving MRAS? Having evaluated the letter's significant signatures, MRAS gives it a final rating, which can take one of three values:

the letter is not spam
the letter may be spam
the letter is definitely spam.

KAS produces the same ratings. If both anti-spam systems consider the message good, it is delivered to the Inbox folder. If one or both systems mark the message as possible spam, it goes to the Spam folder. If at least one of the systems is sure that the letter is spam, the message does not reach the user, and the sender receives a bounce message.
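The routing described above amounts to taking the more severe of the two ratings. Here is a minimal Python sketch of that logic; the names `Verdict` and `route` are invented for illustration, and it assumes that a "definitely spam" rating from either system takes precedence over "possible spam".

```python
from enum import IntEnum


class Verdict(IntEnum):
    NOT_SPAM = 0
    MAYBE_SPAM = 1
    SPAM = 2


def route(mras, kas):
    """Decide what happens to a message given the two systems' ratings."""
    worst = max(mras, kas)  # the more severe rating wins
    if worst == Verdict.SPAM:
        return "reject"  # not delivered; the sender gets a bounce
    if worst == Verdict.MAYBE_SPAM:
        return "Spam"
    return "Inbox"


print(route(Verdict.NOT_SPAM, Verdict.NOT_SPAM))    # Inbox
print(route(Verdict.MAYBE_SPAM, Verdict.NOT_SPAM))  # Spam
print(route(Verdict.MAYBE_SPAM, Verdict.SPAM))      # reject
```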

It is important to note that the same system processes outgoing emails from Mail.Ru servers. So if a user tries to send a spam email, he receives a notification that the message cannot be sent.

Interestingly, MRAS checks a message not only on arrival but also some time after it has landed in the user's mailbox, because new data on spam mailings can change the picture and, accordingly, the system's verdict. So if a message was not detected as spam when MRAS first processed it but is identified as spam a few minutes later, MRAS moves it from the Inbox to the Spam folder. Naturally, this happens strictly before the user opens the mailbox and sees the letters.
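A delayed re-check of this kind might be sketched as follows. The `StoredMessage` type, its fields and the `recheck` function are hypothetical; the sketch only captures the two conditions stated above: newly learned signatures must match, and the user must not have seen the message yet.

```python
from dataclasses import dataclass


@dataclass
class StoredMessage:
    signatures: frozenset  # signatures extracted when the message arrived
    folder: str = "Inbox"
    seen_by_user: bool = False


def recheck(messages, new_spam_signatures):
    """Move unseen Inbox messages that match newly learned spam signatures.

    Returns the number of messages moved to the Spam folder.
    """
    moved = 0
    for message in messages:
        if (
            message.folder == "Inbox"
            and not message.seen_by_user
            and message.signatures & new_spam_signatures
        ):
            message.folder = "Spam"
            moved += 1
    return moved


mailbox = [
    StoredMessage(frozenset({"link:http://spam.example/buy"})),
    StoredMessage(frozenset({"link:http://spam.example/buy"}), seen_by_user=True),
    StoredMessage(frozenset({"phone:74950000000"})),
]
print(recheck(mailbox, {"link:http://spam.example/buy"}))  # 1
```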

All that was said above is an automatic spam filtering system that works for all users. However, different users have different preferences, so recently we introduced an individual (personal) component of spam filtering.

What's new?

It's no secret that with the massive spread of social networks, online games, online stores and other services that actively communicate with their audience via e-mail, mountains of assorted notifications have begun to pile up in users' mailboxes. Our studies show that for modern users, spam is no longer just mass mailings about "printing business cards", "green cards" or "enlarging you-know-what". People consider any unwanted email to be spam, whether it is a tiresome newsletter with an obscure unsubscribe procedure or an Internet service they lost interest in long ago that still regularly shows up in the Inbox.

According to internal Mail.Ru statistics, users receive dozens of assorted mailings every day from social networks, stores and Internet services. An advanced user can easily keep mailings from piling up in the Inbox using filters or blacklists. To make life easier for everyone else, we have implemented personal anti-spam.

Now any user can get rid, once and for all, of annoying mailings from an Internet service, social network or store, i.e. from quite legitimate services. It is enough to select one unwanted letter and click the "This is spam" button, after which all letters from that sender will go straight to the "Spam" folder. And of course, this does not affect delivery of letters to other users: this is a purely individual “for yourself” setting of the anti-spam mechanism.

By the way, the “This is spam” button has a counterpart, without which the personal antispam mechanism would be incomplete. The "This is not spam" button, available for letters in the "Spam" folder, moves a message that landed in Spam by mistake back to the Inbox and whitelists the sender's address. From then on, all letters from this sender are delivered to the Inbox.

Of course, in reality, everything is somewhat more complicated. When building individual black and white lists, we take into account not only the sender's address but also other parameters of the letter. Otherwise, we would make life far too easy for spammers who forge the From header ;)
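To show why keying on more than the From address matters, here is a small Python sketch of per-user lists. Which extra parameters Mail.Ru actually uses is not stated, so the choice of a signing domain as the second key component is purely an assumption, as are the class and method names.

```python
class PersonalFilter:
    """Per-user black and white lists (hypothetical sketch).

    Entries are keyed on the sender address plus a second attribute,
    assumed here to be the domain that signed the message.
    """

    def __init__(self):
        self._black = set()
        self._white = set()

    @staticmethod
    def _key(sender, signing_domain):
        return (sender.lower(), (signing_domain or "").lower())

    def mark_spam(self, sender, signing_domain):
        key = self._key(sender, signing_domain)
        self._white.discard(key)
        self._black.add(key)

    def mark_not_spam(self, sender, signing_domain):
        key = self._key(sender, signing_domain)
        self._black.discard(key)
        self._white.add(key)

    def folder_for(self, sender, signing_domain, default="Inbox"):
        key = self._key(sender, signing_domain)
        if key in self._black:
            return "Spam"
        if key in self._white:
            return "Inbox"
        return default  # fall back to the general anti-spam decision


personal = PersonalFilter()
personal.mark_not_spam("news@shop.example", "shop.example")
# A forged From address without the matching signing domain gains nothing:
print(personal.folder_for("news@shop.example", None, default="Spam"))  # Spam
```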

And of course, in addition to populating individual spam filters, clicks on the “This is spam” and “This is not spam” buttons are also used to train the general anti-spam. So, by clicking these buttons, users help not only themselves but all other users as well.

Interesting facts and figures

The very first days after the launch of personal antispam showed that this feature greatly simplifies users' lives. By the end of its first week, personal anti-spam was sending more than 1,000,000 letters per day to Spam, and of course these are mainly notifications from social networks.

It was interesting for us to analyze what other letters users send to Spam. Here's what the distribution looks like:

By the way, as you would expect, users click the "This is not spam" button 10-20 times less often than "This is spam."

And finally ... how to send mail;)

Many of you are directly involved in web development and, one way or another, send letters to your users. To help ensure that your letters are delivered reliably, we have formulated recommendations for senders. Following them is, of course, not strictly required, but it makes the world a better place ;) General recommendations are at http://help.mail.ru/mail-help/rules/general, and more specific technical requirements are at http://help.mail.ru/mail-help/rules/technical.

The main task of a mail service is to deliver letters to its users. We therefore work hard to eliminate antispam false positives when they occur. If your letters do not reach users, write to abuse@corp.mail.ru. So that we can investigate the problem, please attach a full copy of the letter you sent (with all service headers), as well as the non-delivery response (also in full).

I would like to draw special attention to mechanisms designed to compensate for the shortcomings of email transfer protocols: publishing correct SPF records and, especially, signing each message with DKIM, which we have written about more than once on Habr.

In fact, if all honest senders used these approaches, the global spam situation would improve radically. We therefore urge you to implement these technologies soon, especially since they are quite simple to configure (see, for example, the documentation for configuring DKIM in Exim or one of the DKIM implementations for Postfix).
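For readers who have never seen one, an SPF record is a single line of text published in DNS for the sending domain. The Python sketch below splits such a record into its terms. The record itself is an invented example using documentation addresses, and `parse_spf` is a hypothetical helper, not a full SPF implementation.

```python
def parse_spf(record):
    """Split an SPF TXT record into its terms after checking the version tag."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    return terms[1:]


# Invented example: allow one IPv4 range, include another domain's policy,
# and fail everything else.
example = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.com -all"
print(parse_spf(example))
# ['ip4:192.0.2.0/24', 'include:_spf.example.com', '-all']
```

A record ending in `-all` tells receivers that mail from any server not listed should fail the check, which is what makes a forged sender domain detectable.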

Sergey Martynov,
Head of Mail.Ru Mail
