
Fear and Loathing in a Single Startup. Part 1 - Fear
It's time to describe the architecture and the operational quirks of one particular application: partly for myself, so I don't forget, and partly for others, as an illustration of how not to do things. All coincidences are accidental, all characters are fictional; only the technologies used and the, ahem, architectural decisions described are real. Let's go.
We had 2 balancers, 7 application servers, 5 self-written gems, a self-written kernel config and a whole galaxy of programming languages and technologies of every kind and color, a database, as well as todo lists in Basecamp, NodeJS, socket.io and two dozen Redis keys per entity. Not that all of this was necessary for a startup, but once you start writing code, it becomes hard to stop. The only thing that really worried me was ZeroMQ. There is nothing in the world more helpless, irresponsible and depraved than a self-written message bus on top of ZeroMQ, and I knew that sooner or later we would get into that rubbish.
And it all started so innocently: a Ruby on Rails project, a microservice architecture, a separate balancer with a “hot” standby... Only later did I find out that the servers run on Gentoo; that the microservices use a cunning algorithm for registering themselves with the application, which blows up the entire platform if even one service crashes; that there is no monitoring and no documentation; and that the whole thing is populated by robots. But enough of the lyrical digression, let's look at how the project lived when I arrived.
As of today, 90% of what is written below is no longer relevant; this is roughly how things looked about two years ago. The narrative, however, is in the present tense, simply because that is more convenient for me.
As I said, there is a Rails application and a number of backend services. Messages between the Rails application and the services travel over a self-written message bus. Next to it live Redis, which stores a slice of the current state of the entities, and a NodeJS + socket.io application (messaging between the frontend and the backend). Simplified, it looks something like this:

The application operates on a set of entities stored in the “Rails db” database. The services work with the same entities as the main application, but each has its own separate database. The entities are synchronized via ZeroMQ messages that the main application generates on certain events. Each service consists of the service process itself plus a set of event listeners (separate processes, essentially) that receive messages (each listener handles one message type) and tell the service what to do, i.e. which fields in its database to change. The service process itself also accepts a particular message type, through which it sends data from its database back to the main application. Both the services and the main application actively read from and write to Redis; there is no synchronization or locking, so races are possible, and they do happen. So much for the application architecture. Now let's briefly run through the servers and then move on to the most interesting part: the joys of day-to-day operation.
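Since the bus is self-written, I won't try to reproduce its exact protocol here; what follows is only a minimal sketch of what one of those listeners looks like, assuming plain ZeroMQ PUB/SUB with single-frame JSON messages. The endpoint, topic and Redis key names are invented. It also shows exactly where the unguarded read-modify-write against the shared Redis sneaks in:

# Minimal sketch only: the real bus is self-written, so the protocol,
# endpoint, topic and key names below are all invented for illustration.
require 'ffi-rzmq'
require 'redis'
require 'json'

context    = ZMQ::Context.new
subscriber = context.socket(ZMQ::SUB)
subscriber.connect('tcp://broker.internal:5556')         # hypothetical broker address
subscriber.setsockopt(ZMQ::SUBSCRIBE, 'entity.updated')  # one listener = one message type

redis = Redis.new(host: 'broker.internal')               # Redis lives on the broker box

loop do
  raw = ''
  subscriber.recv_string(raw)
  event = JSON.parse(raw.sub('entity.updated ', ''))     # strip the topic prefix

  # 1. Update the service's own copy of the entity (details omitted).
  # 2. Then the dangerous part: a plain read-modify-write on a shared key.
  #    The main application does the same thing with no locking, so two
  #    writers can interleave here and one of the updates is silently lost.
  key   = "entity:#{event['id']}:state"
  state = JSON.parse(redis.get(key) || '{}')
  state['updated_fields'] = event['fields']
  redis.set(key, JSON.generate(state))
end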
Now, the servers. They are ordinary dedicated servers (who said “virtual machines”? Ordinary pieces of iron). A typical configuration is 2x Core i7, 32-64 GB RAM, 2x 1 TB HDD in RAID1. As for the count: two “balancers” (one primary, one hot standby, switched over via a floating IP), three servers for the services, one server for the database, one server for the broker and Redis, and one test server on which all the components run at once. Eight machines in total. There is no private LAN between the servers; all traffic goes through the data center network (some of the servers sit in different racks).
What is wrong with all of the above, you ask? Nothing much, it would seem; but the devil, as usual, is in the details, and in our case the details are everyday operation and support. Below I will simply describe, in no particular order, the points I consider problematic.
Instructions for adding a new server begin with the words “install Gentoo”. Yes, the entire stack runs on Gentoo, with everything that implies. Provisioning is done with a bash script. Configurations are managed with Synctool (rsync on steroids, essentially). Application deployment is a manual git pull on all seven servers. The hostnames of the database, Redis and the broker are hard-coded in about ten places in the main application's code. The separation between production and development is very loose (it takes a programmer 2-3 days to set up a development environment). The services are so tightly coupled to the main application that if one of them crashes, the whole platform goes down. A service itself is anywhere from three to fifteen processes, each of which is simply started by hand inside screen (yes, 15-20 screen sessions run permanently on one of the servers, and a file along the lines of “these are the screen sessions that must be running on this server after a reboot” lives in the documentation repository).
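To put that in perspective: the entire deployment procedure was equivalent to the following few lines, except that nothing like this existed even as a script; it was all typed by hand. Host names and the application path here are invented:

#!/usr/bin/env ruby
# Roughly what "deploy" amounted to: ssh into every box and git pull.
# Host names and /srv/app are made up; there was no inventory file.
HOSTS = %w[lb1 lb2 svc1 svc2 svc3 db broker].freeze

HOSTS.each do |host|
  puts "==> #{host}"
  ok = system("ssh #{host} 'cd /srv/app && git pull'")
  warn "git pull failed on #{host}, go fix it by hand" unless ok
end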
And most importantly: the application's admin panel. Born in the hidden recesses of the mind of a gloomy Teutonic genius, it is an explosive mixture of bash, Ruby and Falcon. Did you know there is a programming language called Falcon? Now you do (follow the link, by the way; the language is worth a look, if only to broaden your horizons). The “admin panel” itself is just a set of SQL queries and pieces of Ruby code that let you manipulate various entity parameters. There is nothing in it that justifies using anything other than bash. As for Falcon, I never did figure out why the hell it is used there; all the Falcon scripts really do is build something like the following under certain conditions and run it through system():
echo "SELECT stuff FROM entities" | psql -U postgres db_name
Conclusion
Everything described above was created by one person over the course of two years. I am not going to judge his skills as a programmer, but as an admin (or devops, if you prefer) he is, frankly, terrible; the Gentoo on production instead of a normal binary distribution speaks for itself. At some point even the business realized that it could not go on living like this, and I was asked to see what could be done to get this pile of garbage off the ground. In the next part I will describe what I did, and how, to minimize the amount of entropy in the Universe.