How the infrastructure of an email newsletter service is built: the Pechkin-mail.ru experience

Email newsletters are an effective tool for increasing sales and supporting a project's marketing. That is why more and more companies are turning to dedicated tools for creating and sending them.
In today's post, we will talk about how the infrastructure of such services is built, using the Pechkin-mail.ru project as an example.
Why such services are needed
A common misconception is that email newsletter services are used by spammers to send offers to buy Viagra. In fact, services like ours (or Mailchimp abroad) exist to send emails to people who have subscribed to a company's newsletter (for example, on its website), that is, people who have expressly agreed to receive these messages.
Companies can send their customers informational messages about how their systems are working (for example, notifications of new comments on Habr) or offer goods and discounts (recipients can unsubscribe from such messages).
Reputable email newsletter services simply do not work with spammers, and they strictly moderate outgoing content so as not to be sanctioned by the mailbox providers themselves.
How it works
In fact, Pechkin-mail.ru does not deliver messages itself. It builds each message as final HTML code, assembles it in MIME format, personalizes it and tracks what happens to it (whether it was opened and so on), but it does not "put" messages into mailboxes.
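As a rough illustration of the steps just described (final HTML, MIME assembly, personalization, open tracking), here is a minimal Python sketch. It is not the service's actual code, which is written in PHP. The tracking URL, the templates and the `build_message` helper are all hypothetical, and a 1x1 image is only one common way to track opens.

```python
from email.message import EmailMessage
from html import escape
from string import Template
from urllib.parse import urlencode

# Hypothetical tracking endpoint, used only for illustration.
TRACKING_URL = "https://tracker.example.com/open"

HTML_TEMPLATE = Template(
    "<html><body><p>Hello, $name!</p>"
    '<img src="$pixel" width="1" height="1" alt=""></body></html>'
)
TEXT_TEMPLATE = Template("Hello, $name!")


def build_message(sender, recipient, name, subject, mailing_id):
    """Build a personalized multipart/alternative message with a tracking pixel."""
    # The pixel URL identifies the mailing and the recipient, so a request
    # for the image can be counted as an "open" for this recipient.
    pixel = TRACKING_URL + "?" + urlencode({"m": mailing_id, "r": recipient})

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Plain-text part first, then the HTML alternative.
    msg.set_content(TEXT_TEMPLATE.substitute(name=name))
    msg.add_alternative(
        HTML_TEMPLATE.substitute(name=escape(name), pixel=escape(pixel)),
        subtype="html",
    )
    return msg


message = build_message(
    "news@example.com", "anna@example.org", "Anna", "March digest", 42
)
```

The resulting `message` object can be serialized with `message.as_bytes()` and handed to whichever delivery provider is in use.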

Many ESPs (email service providers) exist for this task, for example Mandrill, smtp.com, Mailgun and SendGrid. With these cloud mail providers we achieve very high email deliverability, tight control over the pool of IP addresses and low operating costs.
Technology selection
From our experience of building highly loaded web services for business (hundreds and thousands of users), we know that the main technical costs go into maintaining the existing code and developing new features. Put simply, into salaries. That is why it often makes little sense to agonize over the choice of programming language and other technologies. The most important thing to understand is that 99% of the problems can be solved by the right architecture.
Below are the main technical characteristics of the Pechkin-mail project:
Programming language
The service is written in PHP and JS. Some functions also use proprietary C libraries (how they were created is a subject for a separate post).
Databases and OS
A mailing service is an extremely high-load service: we send up to 10 million messages a day, and a significant share of them are tracked (opens, clicks and so on). All of this generates database queries, so problems with the database must be avoided at all costs (failures are simply unacceptable).
We worked on this problem for a long time and settled on a Percona cluster (there is an excellent article on Habr about setting it up). It is a fault-tolerant, horizontally scalable cluster with master-master replication (a nice bonus is "hot" backups with no loss of performance).
Percona is used for business logic and report generation. In addition, we use MongoDB for queues and for live storage of various data while messages are being sent. The OS is Debian, and virtualization is implemented with OpenVZ.
Hardware and bandwidth
As mentioned above, a distinctive feature of mailing services is the large volume of data that has to be processed intensively and often. This means you cannot do without fast disks (we currently use SAS, although SSD is of course the best option).
Pechkin-mail.ru currently runs on Dell PowerEdge R720 DX-150 (64 GB RAM, 4x600 GB SAS), Hetzner EX-5 and Hetzner EX-10 servers. In addition, we use Amazon EC2 (10 instances for peak load) and Selectel Storage, which hosts customers' pictures and serves them to mailing recipients (we recommend it to everyone because of its price).

A fast, high-bandwidth channel is also very desirable: SMTP traffic is heavy, since it often consists of pictures and content sent to hundreds of thousands of recipients in a short time. If an average message is 150 KB (with almost no pictures, since they are hosted on a third-party server), then sending it to just 10,000 subscribers already generates 1.5 GB of traffic. Keep this in mind when designing a system.
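The arithmetic above is easy to check with a few lines of Python. The first helper reproduces the figure from the text. The second is an added illustration only: the 100 Mbit/s link speed is an assumed number, and the estimate ignores SMTP protocol and encoding overhead.

```python
def mailing_traffic_gb(message_size_kb, recipients):
    """Estimate outbound traffic in GB (decimal units: 1 GB = 1,000,000 KB)."""
    return message_size_kb * recipients / 1_000_000


def transfer_seconds(traffic_gb, link_mbit_per_s):
    """Lower bound on transfer time; 1 GB = 8,000 Mbit in decimal units."""
    return traffic_gb * 8_000 / link_mbit_per_s


# The example from the text: a 150 KB message sent to 10,000 subscribers.
estimate_gb = mailing_traffic_gb(150, 10_000)

# Assumed 100 Mbit/s uplink, chosen purely for illustration.
seconds = transfer_seconds(estimate_gb, 100)
```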
Important Functions
A good email newsletter service should be able to successfully solve a number of problems. Below is a list of the most important functions of such tools.
Work with address databases
For any service that processes customer data, the speed and ease of uploading that data matters a great deal. For a mailing service, it is important that address databases containing email addresses, first names, last names and other additional data can be uploaded quickly and easily.
For this purpose we combined Excel import with the commercial LibXL library. A description of the result is published in a separate article on Habr.
In addition, Pechkin lets you segment mailings, for example by fields of the address database, by subscriber activity in previous mailings or by the date the addresses were added to the database.
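A minimal sketch of such segmentation is shown below, assuming subscribers are already loaded into memory. The `Subscriber` fields, the `segment` function and its parameters are hypothetical names chosen for the example. A real service would most likely express the same conditions as database queries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Subscriber:
    email: str
    city: str          # an example field from the address database
    recent_opens: int  # opens counted over previous mailings
    added_on: date     # date the address was added to the database


def segment(subscribers, city=None, min_opens=0, added_after=None):
    """Return subscribers matching every condition that was supplied."""
    result = []
    for s in subscribers:
        if city is not None and s.city != city:
            continue  # filter by a field of the address database
        if s.recent_opens < min_opens:
            continue  # filter by activity in previous mailings
        if added_after is not None and s.added_on <= added_after:
            continue  # filter by the date the address was added
        result.append(s)
    return result


base = [
    Subscriber("anna@example.org", "Moscow", 4, date(2014, 3, 1)),
    Subscriber("boris@example.org", "Kazan", 0, date(2014, 6, 15)),
    Subscriber("vera@example.org", "Moscow", 1, date(2013, 11, 20)),
]

active_moscow = segment(base, city="Moscow", min_opens=2)
recently_added = segment(base, added_after=date(2014, 1, 1))
```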

Create Newsletters
Email layout is a separate and very interesting topic. We have published rules for laying out newsletters and described how to embed YouTube videos in emails.
A plain-text version of the newsletter is also often needed. We decided to automate this task and developed a generator that produces text versions of messages from HTML using the lynx text browser (the solution is described in more detail in a separate post).
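The general idea can be sketched in Python as follows. This is not the generator described above, only an illustration of calling lynx on HTML passed through standard input. The fallback extractor for machines without lynx is an addition for this example and is far cruder than lynx's own rendering.

```python
import shutil
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Crude fallback: collects text nodes, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def fallback_text(html):
    """Extract visible text without lynx, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def html_to_text(html):
    """Render HTML to plain text with lynx when it is installed."""
    if shutil.which("lynx") is None:
        return fallback_text(html)
    # -stdin reads the document from standard input, -dump prints the
    # rendered text, -nolist drops the trailing list of links.
    result = subprocess.run(
        ["lynx", "-stdin", "-dump", "-nolist", "-force_html"],
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The plain-text result can then be attached as the text part of the message next to the HTML version.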
Finally, based on statistics on how effective your mailings are, you can fine-tune the sending itself: for example, send at a specific time, send in uniform batches or create an A/B mailing (two mailings that test the effectiveness of different subject lines, message texts and so on).
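Both ideas, the A/B split and sending in uniform batches, can be sketched in a few lines of Python. The function names and the fixed random seed are assumptions made for this example, and the service's own splitting rules may differ.

```python
import random


def ab_split(recipients, seed=0):
    """Randomly split recipients into two groups of (almost) equal size."""
    shuffled = list(recipients)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def uniform_batches(recipients, batch_count):
    """Split recipients into batch_count batches whose sizes differ by at most one."""
    recipients = list(recipients)
    base, extra = divmod(len(recipients), batch_count)
    batches, start = [], 0
    for i in range(batch_count):
        # The first `extra` batches take one additional recipient each.
        size = base + (1 if i < extra else 0)
        batches.append(recipients[start:start + size])
        start += size
    return batches


addresses = [f"user{i}@example.com" for i in range(10)]
group_a, group_b = ab_split(addresses)
batches = uniform_batches(addresses, 3)
```

Each group would receive its own variant of the subject line or text, and each batch can be sent at its own scheduled time.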
Spam fighting
One of the main tasks of an email marketing service is fighting spam. In Pechkin it is implemented through the following mechanisms:
- Pre-moderation of outgoing mailings. All mailings are reviewed manually by service staff to check that they comply with the moderation policies (most of which are based on the Administrative and Technical rules of mail.ru). Automatic moderation is applied only to the most "trusted" customers.
- Automatic reputation system. We described how it works in one of our previous posts.
- Beacon spam filter control. We work with the spam filters of Mailgun, Cloudmark and many others. In addition, automated APIs are emerging that make it possible to flag potentially "spammy" mailings and monitor them during sending.
- Automatic blocking for spam. If a mail provider's report indicates that a mailing has been detected as spam, the mailing stops immediately and the user is given standard unblocking instructions (including filling out a form on the mail provider's website and explaining the legitimacy of the address databases and content). Depending on the results of the check, either the mailing is unblocked or the account is permanently blocked for spam.
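The last mechanism in the list is essentially a small state machine. The sketch below shows one way to model it in Python. The state names and the two handler functions are hypothetical, and the real service may track more states than these.

```python
from enum import Enum


class MailingState(Enum):
    SENDING = "sending"
    BLOCKED = "blocked"                # stopped after a provider spam report
    UNLOCKED = "unlocked"              # check passed, sending may resume
    ACCOUNT_BANNED = "account_banned"  # check failed, account blocked for spam


def on_provider_report(state, flagged_as_spam):
    """Stop a mailing immediately when a provider reports it as spam."""
    if state is MailingState.SENDING and flagged_as_spam:
        return MailingState.BLOCKED
    return state


def on_review_result(state, legitimate):
    """Apply the outcome of the check to a blocked mailing."""
    if state is not MailingState.BLOCKED:
        return state
    if legitimate:
        return MailingState.UNLOCKED
    return MailingState.ACCOUNT_BANNED


state = on_provider_report(MailingState.SENDING, flagged_as_spam=True)
state_after_review = on_review_result(state, legitimate=True)
```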
No need to reinvent the wheel
A highly loaded service can work well only if it is designed properly, without all kinds of makeshift workarounds. To avoid them, we make active use of affordable cloud solutions (Amazon EC2, Selectel Cloud Storage). This is convenient and inexpensive: for example, during peak periods (more than 30 mailings being sent at once) we bring up additional instances on Amazon EC2.
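A scaling rule of this kind might look like the Python sketch below. The threshold of 30 simultaneous mailings and the cap of 10 instances come from the text, while the number of mailings one extra instance can absorb is an assumed figure for illustration. Actually starting the instances through the cloud provider's API is left out.

```python
import math

PEAK_THRESHOLD = 30        # from the text: more than 30 mailings at a time
MAX_EXTRA_INSTANCES = 10   # from the text: 10 EC2 instances for peak load
MAILINGS_PER_INSTANCE = 5  # assumption: load one extra instance can absorb


def extra_instances_needed(concurrent_mailings):
    """Return how many additional instances to start for the current load."""
    overflow = concurrent_mailings - PEAK_THRESHOLD
    if overflow <= 0:
        return 0  # normal load: the permanent servers are enough
    needed = math.ceil(overflow / MAILINGS_PER_INSTANCE)
    return min(MAX_EXTRA_INSTANCES, needed)


normal_load = extra_instances_needed(30)
peak_load = extra_instances_needed(42)
```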
Thanks to a well-built infrastructure, we achieved a service uptime of 99.98% in 2014, and just two support staff handle the requests of more than 2,500 paying customers.