Site Survival Checklist



    Lately I have been running into the same primitive, easily solved problems on all kinds of web projects, suspiciously often. Different databases, different languages, different business domains and monetization schemes. They are all united by one thing: the slogan "the business won't let us rewrite it." A just-finished (or still ongoing) stage of rapid growth, with the project aggressively squeezing market share out of its competitors, has given birth to a huge pile of so-called govnokod ("shit code"). Dubious architectural decisions either already cause a heap of problems or promise to in the future, but they work. The stream of new requirements leaves no time to clean up even the infrastructure, let alone the code. If this situation sounds familiar, welcome below the cut to reminisce, learn something new and/or teach us something. Some of you will get a laugh out of it, some a cry.

    “This is all only relevant for highload,” a thoughtful and perspicacious reader will say. Well, bad is the web project that doesn't dream of becoming a popular highload one.


    Problem #1: the database


    The most unpleasant problems on any web project are always database-related. Everything else we can scale easily, from DNS balancing to the upstream directive in the nginx config. “What about clustering?” asks the thoughtful reader. That is exactly the problem. Three times now I have seen a cluster built out of abused databases: twice MySQL and once MongoDB. Indexes not configured, tables (collections, what's the difference?) never cleaned up, yet expensive cluster servers are being paid for. And those servers are mostly busy raking through unindexed data and rebuilding unused indexes in the name of entropy.

    This group of problems is especially widespread in shops following the now-fashionable trend of separating backend developers from admins/DevOps/NOC.

    Why is keeping the database exhausted scary? Because you lose everything to hell: orders, customers, SEO page rank. And why overpay the hosting provider?

    Personally, spoiled by a poor childhood, my soul immediately cries out: don't pay the hoster, better pay me.

    Another wonderful thing: if you have a dead-tired database underfoot and, as a result, multi-second web server response times on almost every page, you can put on a whole performance-optimization show while carefully not touching the database at all.

    Problem "n + 1"


    It turns out there are two large classes of database abuse, although until a couple of months ago I personally didn't suspect the funnier of the two could exist. Have you heard of the “n+1 problem”? I vaguely recall something like that from my deep junior childhood. I would never have believed that something like this could make it into a commercial project. The easiest way to characterize the problem is with pseudo-code:

    list = db.query('SELECT * FROM products;')
    for (item in list) {
          orders = db.query('SELECT * FROM orders WHERE product_id = ?;', {product_id: item.id});
          ...
    }
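
    For reference, the standard fix is two queries: collect the product ids, fetch all their orders in one go, and merge the two result sets in code. A minimal sketch of the second query, with names assumed from the pseudo-code above (the id list comes from the first result set):

    SELECT * FROM orders WHERE product_id IN (1, 2, 3, 5, 8);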
    

    The problem is easy to detect. Take the full access log of the web server and the database query log for the same period and bluntly compare them by volume. If 50 MB of access logs turns into a 20 GB query log, the diagnosis is made. The bad news: you will have to modify the code, and with this kind of attitude toward the database, nothing good is likely waiting for you in there.

    The slogan of the problem: webdev is just the conversion of user actions into database queries and the output of the results. That's all.

    Go programmers with their goroutines are affected the most. To a lesser extent it afflicts Rails adepts, apparently because they are used to ActiveRecord taking care of such problems for them. It also shows up among PHP and JS coders.

    This also includes queries of the form:

    SELECT p.*, (SELECT count(*) FROM comments WHERE product_id = p.id) AS comment_count FROM products p WHERE author_id = ?;
    

    Gentlemen, this is not one query. This is also n+1. The absence of a LIMIT is especially wonderful. Rewriting it as a JOIN onto a subquery with GROUP BY is no picnic either, but with proper indexes you can live with it. In the general case this should be two queries plus merging the results in code. By hand, if your ORM can't manage it. If you want, I'll share a lib for that.
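
    For illustration, a sketch of the JOIN-onto-GROUP-BY rewrite just mentioned, using the names from the query above (the LEFT JOIN and the concrete LIMIT are my assumptions: products without comments should still appear, and the missing LIMIT was part of the complaint):

    SELECT p.*, COALESCE(c.cnt, 0) AS comment_count
    FROM products p
    LEFT JOIN (
        SELECT product_id, count(*) AS cnt
        FROM comments
        GROUP BY product_id
    ) c ON c.product_id = p.id
    WHERE p.author_id = ?
    LIMIT 20;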

    Problem with indexes


    For some reason, fellow web developers show enviable persistence in their contempt for properly indexing data.

    The problem is easy to detect: download, for example, dev.mysql.com/doc/mysql-utilities/1.5/en/mysqlindexcheck.html or www.percona.com/doc/percona-toolkit/2.1/pt-duplicate-key-checker.html, run it, and look at how long the list of redundant indexes is. If there are many, the database on this project is not well groomed and you absolutely must audit the queries. If it found nothing, turn on the log of unindexed queries (done differently depending on the database type, Google to the rescue) and study it. If there is nothing criminal there either, we can tentatively consider the indexes laid down neatly and skip to the next step. Although experience suggests that in most cases the princess is in this very castle.
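
    For MySQL, for example, the unindexed-query log can be switched on at runtime; a sketch (such queries land in the slow query log, so it has to be enabled too):

    SET GLOBAL slow_query_log = ON;
    SET GLOBAL log_queries_not_using_indexes = ON;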

    If at least one of those checks produced output, get ready for hard and painstaking work. Unfortunately, you will have to not only lay down indexes where possible, but also modify the code. For MySQL, whose EXPLAIN SELECT cannot show you the rows actually examined, you will need a full query log (long_query_time = 0). If you aggregate this data correctly, you can get nice statistics. For example, I like the sum(Rows_examined) metric: it shows how hard a given query type is hammering the database. And also the ratio of the 95th percentiles of Rows_examined vs Rows_sent: it shows how much that query type can be optimized. You can write the aggregator yourself or use www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html. But be extremely careful: errors in aggregation will lead to a heap of wasted effort. Apply the Cartesian principle of universal doubt: any aggregation mechanism you haven't personally verified should be considered potentially buggy.
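    A sketch of capturing the full query log in MySQL, assuming the slow log is already enabled as above (a zero threshold logs every statement; remember to restore it afterwards, the log grows fast):

    SET GLOBAL long_query_time = 0;
    -- ... collect a representative slice of traffic, then restore, e.g.:
    SET GLOBAL long_query_time = 10;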
    And do not forget that recalculating indexes has a non-zero cost. Sometimes an extra index is worse than a missing one. Minimizing the number of indexes in use is the first task facing anyone who, for example, ignores the law "if you want to use an RDBMS table for a queue, do not use an RDBMS table". But not the last. A queue is beautifully built on the mechanism
    SELECT ... FOR UPDATE SKIP LOCKED;
    already available in PostgreSQL, if anyone is interested.
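
    A minimal sketch of such a queue, assuming a hypothetical job_queue table and PostgreSQL 9.5+; SKIP LOCKED is what lets many workers poll the same table without blocking each other:

    BEGIN;
    SELECT id, payload
      FROM job_queue
     WHERE processed_at IS NULL
     ORDER BY id
     LIMIT 1
       FOR UPDATE SKIP LOCKED;
    -- ... do the work, then mark the job done:
    UPDATE job_queue SET processed_at = now() WHERE id = $1;
    COMMIT;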

    Misunderstanding of indexes is most common among PHP developers. If I'm not too lazy, I'll finish a tutorial on the topic "What indexes cannot do, and why" for PHP lovers.

    Table size optimization


    Remember the claim that the expensive servers are occupied exclusively with increasing entropy: recalculating unused indexes and raking through unindexed data? Add unjustifiably bloated tables to that, and you get a picture whose ruthless, costly futility rivals Skolkovo, or Orwell's war waged to dispose of overproduction. Position number one in my ranking of the most obvious and technically mediocre reasons for table growth is the desire to store obsolete data forever. For example, one social network keeps deleted messages forever. For some reason, in the main, far-from-underloaded table. And yes, in the same database.

    Solution: how about an additional database with a zip prefix? Separate archive tables? mysqldump? A backup?
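    A minimal sketch of the archive-table variant, assuming a messages table with a deleted_at marker and an identically structured messages_archive (one transaction, so nothing slips between the copy and the delete):

    BEGIN;
    INSERT INTO messages_archive
        SELECT * FROM messages WHERE deleted_at IS NOT NULL;
    DELETE FROM messages WHERE deleted_at IS NOT NULL;
    COMMIT;

    In real life you would run this in batches, off-peak, but the principle is the same: dead data out of the hot table.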
    Scary: I have met projects where, thanks to the "keep forever" principle, ALTER TABLE had become impossible because of the bloat of highly loaded centralized tables. How do you maintain such a table? Hand out tambourines to the staff psychics and have everyone else earnestly pray for the table's health? Why live like that?

    Joins


    A query joining 10 tables in web production is grounds for dismissing both its author and its reviewer, IMHO. Gentlemen, UNIONs, JOINs and third-level-nested subqueries are a fine way to show off, or to dig something out of the database for yourself. But they are no way for even a moderately loaded project to talk to its database.

    Not least because of locks. Locks become a problem long before you see your first deadlock in the error log. Yes, the number of locks can be reduced by uniformly sorting conditions like parent_id IN (sorted list of ids).

    Your DBMS is guaranteed to know nothing about your application's business logic; it can only guess at the best way to pull your data out of the dozen tables you requested. Only you know that. What, you can't build a hash index in PHP and join two data arrays in memory? Can't find a library for it? If you can't, I'll show you mine.
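
    A sketch of that idea: two indexed queries instead of one wide JOIN, merged by key in application memory (tables borrowed from the earlier examples; the title column is an assumption):

    SELECT id, title FROM products WHERE author_id = ?;

    SELECT product_id, count(*) AS comment_count
      FROM comments
     WHERE product_id IN (/* ids from the first result */)
     GROUP BY product_id;

    The merge is a hash map from product_id to its row: exactly the hash join the DBMS would have built, except you know which side is small and it can only guess.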

    Table centralization issues


    In the rather complex asynchronous system that any large web project turns into, the database and/or its tables become resources, since they block some types of operations while other types execute. For example, recalculating an index blocks the use of that index, and certainly blocks another recalculation.

    For some reason the word "resource" is used to mean exclusively the hardware characteristics of the systems involved: CPU, bandwidth, RAM. But detecting a permanent or peak shortage of such resources is quite simple; the right tools (munin/monit/sa+sar/htop) and/or a competent admin who knows how to use them will tell you everything about the situation for little money and in very reasonable time.

    Yet for some reason nobody tries to treat the tables of a relational database as resources, although it is the obvious move. If an UPDATE on a table triggers an index recalculation, then, alas, no SELECT using that index will execute until the recalculation completes (well, or until the switchover, if you believe in the steady hands of your DBMS's authors). In PostgreSQL, with its immutable tuples, any UPDATE recalculates all the indexes on the table.
    UPD1: according to terrier, PostgreSQL is fine for those who know how to cook it, and Uber simply made a mistake and is being spiteful. Not all that surprising. Remember Einstein's remark about the infinity of the universe?

    In MySQL things look less scary at first glance. But a well-organized, highly loaded table carries exactly one or two truly important indexes, and most updates will still trigger their recalculation. And why do you need one table for all product types anyway?

    Solution: create a table products2 and ensure that its primary key values do not intersect with the source table's. Enjoy. And if some reasonably massive product type differs from the others in structure, database normalization itself demands that you move it into a separate table.
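    One way to guarantee non-overlapping keys, sketched for PostgreSQL under the assumption that the original ids come from a sequence named products_id_seq: let both tables draw from the same sequence.

    CREATE TABLE products2 (
        id    bigint PRIMARY KEY DEFAULT nextval('products_id_seq'),
        title text NOT NULL
        -- plus the columns specific to this product type
    );

    In MySQL a similar effect can be had by reserving disjoint AUTO_INCREMENT ranges or by distinct auto_increment_offset values.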

    Scary: people who like to centralize tables also tend to crave centralized (micro)services when building architecture. The maxim "do not encapsulate in services, encapsulate in classes" is apparently not written for them. The result is the same: a pile of bottlenecks with non-obvious reasons for existing that prevent the system from scaling as data volumes and/or load grow.

    Slogan: do not encapsulate in services, encapsulate in classes

    Problem #2: code


    On any project, any programmer always considers all the code to be shit, except for the code he is writing right now. And even then, not always.

    Apparently, the amount of good code in the universe is a constant. Some kind of conservation law, like mass-energy or money. So whenever a programmer writes good code somewhere, some other code turns bad at that very moment.

    So don't obsess over code quality beyond a reasonable minimum. CodeSniffer plus first-level code review; the code should satisfy any subjective criteria of "readability".

    Going deeper, it gets worse: for some reason, excess layers of abstraction are not cut off at code review. Nor is pattern overuse. Do you know when a singleton is needed? To limit the availability of a resource whose access is already organized through an existing non-unique class instance. If a singleton is written from scratch, in most cases you end up with an antipattern. Dependency Injection is a pattern that makes it easy to slip in mocks during unit tests and/or to assemble an application from a config file, ZF 1.x style. Otherwise, an antipattern. Repository + Entity driven DB access, in 100% of the cases I have observed, turns into legacy code unfit for further use. Most likely the reason is that the repositories are used statelessly, as a group of functions with a similar purpose, unlike the competing ActiveRecord pattern.
    At the same time, surprisingly, the simpler and genuinely helpful SOLID rules go unused. What SOLID, even encapsulation from the very definition of OOP goes unused. It feels like web devs remember what encapsulation is only for interviews.

    Solution: show how a properly built hierarchy of interfaces (remember designing at the level of abstractions?) lets you create flexible, easily extensible applications.

    The slogan of the problem: excess layers of abstraction suck. Think with your head, not with patterns.

    And yes, you are programmers yourselves. Do you really believe programs without bugs exist? All the tools you use are also layers of abstraction, only architectural and service-level ones. Webdev is the transformation of user actions into database queries and the output of the results. Explain to me, where in this formula is Apache hiding? Do you really enjoy routing scattered across a heap of .htaccess files?

    Problem #3: the frontend


    The most despised class of programmers is JavaScript programmers; there are jokes on the subject. The distinguishing features of projects where frontend culture is somewhere around zero: a pile of included plugin files and/or jQuery mixed with ExtJS, and/or a homegrown MVC bicycle evoking thoughts of early backbone.js. In other words: absolutely unmaintainable code, unreasonably expensive to modify in any way, a priori buggy and held together with crutches.

    Solution: normal ES2015 JavaScript. A single-page application with routing, not a bunch of conflicting jQuery plugins and conditionals. A single entry point. Gradual, evolutionary movement, starting with routing. A reasonable, deliberate choice of architecture-defining technologies. TypeScript, for example, opposes the very idea of JS: the anarchy, discord and glorious mess of rapid, high-quality development. IMHO, of course.

    Problem #4: environment


    I don't understand how a web developer can keep anything but Linux on a work machine. Yes, the interface is terrible, the fonts fall apart, the windows are ugly. X is a textbook example of disgusting architecture. Yes, you have to think and/or google where in Windows and/or macOS it would be enough to push buttons. Well, we are not designers. I won't go further here; I'm not writing this article for the sake of a flame war.

    I don't understand how a developer can have his dev environment set up for him by someone else. Yes, let the newcomer suffer for a couple of days installing the project. His level will be immediately visible from the questions he asks and the problems he fails to solve on his own. And the newcomer, in turn, will understand a lot about the project.

    I don't understand how you can code with warnings turned off. Free early error detection is unnecessary? Is it really easier to hire another department of testers?

    I don't understand how you can live without a beta site. Where will those extra departments of testers graze? And how to provide zero-downtime deployment without a beta, I can't imagine at all.

    I don't understand how you can do without zero-downtime deployment. Does your shop not get fined for site downtime?

    I don't understand how you can keep a project in production without a working integration test system. What, is it so hard to set up Jenkins and drop in a script that, once an hour, will log in / register / check mail / buy / sell and panic via email/SMS/HipChat? You really don't want to learn about problems from someone other than the client? Ah, right, no fines.

    I don't understand how you can keep all the code, configs included, in the web root. Are you sure you have "deny from all" registered everywhere it matters? Not tempted to double-check?

    I don't understand how you can spend money promoting a site whose pages take several seconds to load. Promotion is expensive!

    This concludes the list of problems that can be solved technically. Then begin the ...

    Organizational problems.


    Not all of them are solvable, unfortunately.

    Problem #0: communication


    With business you can only speak the language of money. Business is insensitive to the beauty of solutions, database normalization, CAP theorems and other IT values. It understands money and deadlines.

    Solution: pick several tasks from the backlog such that you can reasonably and honestly tell the client that implementing them on the reworked system will be cheaper, even counting the cost of the rework. If you can't pick and justify such a list of tasks, stand down: it's too early for you to rewrite the system. And better pick fresher tasks, if anything; the client cares about them more, and you won't have to answer the question "why did you only decide this now?".

    Slogan: business never lets you rewrite. Write it properly the first time.

    Problem #1: management


    Govnokod is not born on its own. Look at the Microsoft folks: Bill's authoritarian management style, which permits yelling at subordinates and blowing up their deadlines, gave birth to architectural and/or technological decisions so idiotic that even end users notice they are suffering from the consequences. A telltale sign of the problem is the phrase: "All programmers always want to rewrite everything. We won't."

    What to do: persuade. For example, via articles on Habr. Or leave. This is the only problem that cannot be solved systemically without an axe.

    "Centralization of people" belongs to this same class of problems: the smartest people are so busy and control so many things that they cannot dig deeply into any of them. And everyone forever waits on their decisions. A bottleneck if there ever was one, no?

    Solution: micro-teams.

    And yes, continuing the parallel with application architecture: extra layers of abstraction suck here too. Just like back in the USSR, with two managers for every actual worker. Well, they had an oil-and-gas pipeline to pay for it. Still do.

    The fewer ears information passes through on its way from the source of the requirements to their implementer, the better. Otherwise you end up with the famous tire-swing comic. Plus a couple of departments of superfluous parasites: managers busy waging war with JIRA/Redmine and/or carrying out the orders of other managers.

    The most important thing is not to give up. After all, you may be at the most interesting stage: building the platform of an established, money-making project. A great line in the résumé, if it comes to that, and meanwhile a wonderful financial recharge. Can you imagine, for example, how casually dropping "trimmed a few terabytes off the database" at an interview will raise your rate?

    And of course, anyone who has something to add to and/or correct in this list: welcome to the comments.
