How the SMSDirect system works
After reading a comparison of SMS services for newsletters here, we decided to share our experience building such a system, which has served us faithfully for several years and is being continuously improved. We hope our experience will be useful to you. Details below for those who are interested.
What is our SMS mailing system?
The main task of the SMSDirect system is to send out bulk SMS mailings and single SMS messages.
The service is a whole complex, the tip of the iceberg being the client-facing part: the website. This is the entry point to the system. It offers a personal account (if you want to upload mailing lists manually), as well as a set of API methods for creating mailings and sending messages.
To enter the system, the client registers and then uses either the API methods or the personal-account interface. The following functions are implemented there: uploading and storing the client's subscriber databases (subscriber lists), and creating and managing mailings over those lists (or over manually entered numbers).
The system's components are a solution capable of processing large volumes of homogeneous data and generating mailings from it, and a reliable mechanism connecting our system with the operators. The principal task here is storing a huge number of user databases, processing them quickly, and providing fast access to them.
Since we handle a very large volume of data, we had to decide how to process, partition, and route it.
After we divided the entire data volume across several internal connections, we needed to ensure that each message reached the right gateway. Accordingly, messages must be distributed among operators in a certain way. This distribution is carried out by a routing complex: an application system that handles sending millions of messages through a given gateway. The split is based on analysis of message attributes: which operator serves the subscriber (determined not just by the number's code, but by the actual serving operator), the message text, and the sender number; that is, the message's properties undergo full analysis and processing. Thus we solve the distribution problem: to the client it looks like a single mailing, but the messages take different routes depending on the subscriber's operator (number) and, for example, the sender number. Several dedicated servers handle traffic routing; they are part of the infrastructure outside the UI and backend.
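To make the idea concrete, here is a toy sketch of attribute-based routing. The prefixes, operator names, and gateway names are hypothetical; the real system also accounts for number portability and per-operator rules.

```python
# Hypothetical prefix table: first two digits of the national number -> operator.
OPERATOR_PREFIXES = {
    "90": "operator_a",
    "91": "operator_a",
    "92": "operator_b",
    "93": "operator_c",
}

# Routing table: (operator, sender) -> gateway; None acts as a wildcard.
ROUTES = [
    (("operator_a", "PROMO"), "gateway_1"),
    (("operator_a", None),    "gateway_2"),
    ((None, None),            "gateway_default"),
]

def detect_operator(msisdn):
    """Guess the operator from the number's prefix (country code stripped)."""
    national = msisdn.lstrip("+")[1:]  # drop the country-code digit, e.g. "7"
    return OPERATOR_PREFIXES.get(national[:2])

def route(msisdn, sender):
    """Pick a gateway by analysing the message's attributes."""
    operator = detect_operator(msisdn)
    for (op, snd), gateway in ROUTES:
        if op in (None, operator) and snd in (None, sender):
            return gateway
    return "gateway_default"
```

A single mailing thus splits by route: the same text sent from `PROMO` to an `operator_a` subscriber and to an `operator_b` subscriber leaves through different gateways.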
Nginx acts as the web server, handling user requests to the UI and API calls.
This is a classic, almost universal solution for such tasks. At a lower level it would seem logical to deploy the Apache web server, but we decided against it to save resources (we do not need all of its functionality, and paying full price for the 1% of its capabilities we actually need would be wasteful) and wrote our own Perl core instead. This is, in fact, the foundation of the system.
Why Perl? The long answer: this is a holy-war question; in truth there is no particular reason to use one language over another. The key issue in a large and complex project is a well-designed architecture, not the language. We have specialists who know and love different languages, and it so happened that when this project was being developed, the experts who knew Perl showed the greatest zeal :) That said, once Perl was chosen, we can note the advantages of the language, the most important of which are undoubtedly CPAN (read: ready-made solutions for almost any task) and the maturity of the language (read: code keeps working across updates; a jab aimed at PHP here).
The core interacts with the web server through the FastCGI interface, which, first, forwards requests accepted by nginx into our core and, second, together with its associated modules, provides the mechanism for running the core as several "daemons"; that is, the core is always running.
The core itself is divided into several separate "daemons", each responsible for its own part of the processes. The main ones are the site with the personal account, the set of API methods, and the functionality responsible for receiving files uploaded by a user (subscriber bases). The core also includes a service part that does not interact with external data and focuses exclusively on internal (service) processes; it is in turn divided into several low-level functions. For example, when a user has uploaded a subscriber database, the system first sees it as just a file of data not yet suitable for work. After the file is uploaded, it is processed: subscriber numbers are checked for correct lengths and prefixes, and empty and too-short numbers are discarded, so the "raw" file turns into a standard internal structure. Files uploaded by the user are saved to disk and remain in the file system, but after our service scripts have run over them, the necessary descriptive information is extracted and stored in the database. This information is structured, so it can be queried with filters on each attribute.
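The normalization step can be sketched as follows. The validation rules here (11-digit numbers with a `79` prefix, as for Russian mobile numbers) are illustrative assumptions, not the system's actual rules.

```python
import re

VALID_PREFIXES = ("79",)   # hypothetical: Russian mobile numbers
EXPECTED_LENGTH = 11

def normalize(raw_line):
    """Return a canonical 11-digit number, or None if the line is unusable."""
    digits = re.sub(r"\D", "", raw_line)
    if digits.startswith("8") and len(digits) == EXPECTED_LENGTH:
        digits = "7" + digits[1:]          # 8xxxxxxxxxx -> 7xxxxxxxxxx
    if len(digits) != EXPECTED_LENGTH or not digits.startswith(VALID_PREFIXES):
        return None
    return digits

def process_upload(lines):
    """Turn a "raw" subscriber list into clean numbers plus summary metadata."""
    numbers, rejected = [], 0
    for line in lines:
        n = normalize(line)
        if n is None:
            rejected += 1
        else:
            numbers.append(n)
    meta = {"total": len(lines), "accepted": len(numbers), "rejected": rejected}
    return numbers, meta
```

The returned metadata is the kind of descriptive information that goes into the database, while the cleaned list itself stays in the file system.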
When processing databases and extracting the required entities from them, or when processing the collected statistics, an enormous amount of sorting is involved. Conventional sort utilities are of little use for this volume and complexity of sorts, so we use MSORT for this.
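MSORT itself is an external sorting utility; to illustrate the underlying idea, here is a toy external merge sort: split the input into chunks that fit in memory, sort each chunk into a run on disk, then k-way merge the runs.

```python
import heapq
import os
import tempfile

def _dump_run(chunk):
    """Write one sorted run to a temporary file and return its path."""
    chunk.sort()
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(line + "\n" for line in chunk)
    return path

def external_sort(lines, chunk_size=100_000):
    """Sort an iterable of lines that may not fit in memory."""
    runs, chunk = [], []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            runs.append(_dump_run(chunk))
            chunk = []
    if chunk:
        runs.append(_dump_run(chunk))
    files = [open(path) for path in runs]
    try:
        # heapq.merge lazily merges the already-sorted runs.
        for line in heapq.merge(*files):
            yield line.rstrip("\n")
    finally:
        for f in files:
            f.close()
        for path in runs:
            os.unlink(path)
```

A real external sorter adds multi-pass merging, key extraction, and tuned buffer sizes, but the structure is the same.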
Since users upload large subscriber lists to us, these need to be stored somewhere; in addition, each mailing generates a huge amount of its own service data: the block of messages generated for the mailing, the statuses received for those blocks, and the final export file. Intermediate files are deleted after the mailing ends. For example, sending 10 million messages produces 3-5 status signals for each of them, which is already 30-50 million records in a file. Overall, we generate billions of records that we must not lose, because at any moment a client may need the details. For this, a separate file storage is attached to our core.
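The arithmetic behind those figures is simple; a quick back-of-the-envelope check:

```python
def status_records(messages, signals_per_message):
    """Estimate how many status records one mailing generates."""
    return messages * signals_per_message

low = status_records(10_000_000, 3)    # lower bound: 30 million records
high = status_records(10_000_000, 5)   # upper bound: 50 million records
```

At this scale even the per-mailing service data is far too large for ad-hoc handling, which is why a dedicated file storage is needed.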
In some cases there is an urgent need to find a specific record in a particular file, be it a database of numbers or some other statistics. Reading the file by exhaustive scan from start to finish is slow and expensive, so in a number of processes we use a largely forgotten tool: Berkeley DB (BDB). It is essentially a hash on disk that maps an identifier to an offset in a file, from which the needed information can be read.
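The same idea can be sketched with Python's `dbm` module standing in for Berkeley DB: an on-disk hash maps a record key to its byte offset in a large flat file, so a lookup costs one hash probe plus one seek and read instead of a full scan. The semicolon-separated record format here is a made-up example.

```python
import dbm

def build_index(data_path, index_path):
    """Record the byte offset of every line, keyed by its first field."""
    with dbm.open(index_path, "n") as idx, open(data_path, "rb") as f:
        offset = 0
        for line in f:
            key = line.split(b";", 1)[0]
            idx[key] = str(offset).encode()
            offset += len(line)

def lookup(data_path, index_path, key):
    """Fetch a single record by seeking straight to its offset."""
    with dbm.open(index_path, "r") as idx:
        offset = int(idx[key.encode()])
    with open(data_path, "rb") as f:
        f.seek(offset)
        return f.readline().rstrip(b"\n").decode()
```

Berkeley DB proper adds transactions, B-tree access methods, and concurrent readers, but the hash-to-offset pattern is exactly this.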
Routing and delivery of information to operators
The core interacts with the internal routing system, which delivers messages to the operators. This part was developed by i-Free and is supported by the company's infrastructure.
We are connected to the operators and transmit SMS messages using their native tools. As a rule, this is SMPP, which the operator provides to its suppliers.
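To give a feel for the protocol: every SMPP PDU starts with a 16-byte header of four big-endian 32-bit integers (per the SMPP 3.4 layout). A minimal sketch of packing that header; a real `submit_sm` body carries many more fields (addresses, encoding, the message itself), which are omitted here.

```python
import struct

SUBMIT_SM = 0x00000004  # command_id for submitting a short message

def pack_pdu(command_id, sequence_number, body=b""):
    """Prepend the 16-byte SMPP header to an already-encoded PDU body."""
    command_length = 16 + len(body)
    # Header fields: command_length, command_id, command_status, sequence_number.
    header = struct.pack(">IIII", command_length, command_id, 0, sequence_number)
    return header + body
```

In practice the routing complex keeps persistent SMPP sessions to each operator and streams millions of such PDUs through them.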