What I don't like about the modern web

The very first step in working with the web is getting data from the user into your server application. Parsing a dozen small form fields can be entrusted to a framework, but what about file uploads?

Take PHP, for example, although the description holds for 99% of other languages and technologies. Suppose we want to let users upload pictures to the site. We add a field of type "file" and... Outwardly everything is very simple: only a few bytes change in the form and in the code, and now instead of form text you can work with files! But it is not that simple. Your file first lands somewhere in /tmp/, and until the request has arrived in full, your script does not get control and there is nothing you can do about it. Say the user uploads an .exe file instead of a picture: you will find out only after the upload has completed. So for a while an attacker can clog your bandwidth and your disk subsystem, pretending to upload useful files, and you will not even know about it. With a caching server in front of the application server the situation is even worse: nginx, for example, buffers the request body to temporary files. First the user's request settles on disk; as soon as it completes, the file is read back and quickly forwarded to the application server (in our case, PHP), where it is saved to disk AGAIN. That is triple use of the disk, even if all the user gets back is the message "you forgot to enter the captcha".
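To make the "too late" problem concrete, here is a minimal sketch (in Python for brevity; the function name and magic-byte table are my own, not from any framework) of the kind of validation an application can only run after the entire body has already been written to disk:

```python
# Minimal sketch: by the time a handler like this runs, the whole request
# body has already been buffered to a temp file, so the bandwidth and disk
# are spent before we can reject anything.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def detect_image_type(path):
    """Return the image type by magic bytes, or None for anything else
    (e.g. an .exe that already wasted your channel and disk)."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, kind in MAGIC.items():
        if head.startswith(magic):
            return kind
    return None
```

The check itself is trivial; the point is *when* it can run: only after the upload is complete.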

And I'm not even talking about the more fun things this approach rules out. A simple feature like an upload progress indicator becomes unavailable. A more complex example: YouTube shows frames from a video that is still uploading and not yet complete. You get no control (and not even a notification!) before the entire video (2 gigabytes, say) has been received. You will not even know that someone burned through 1.5 gigabytes of your disk and then closed the browser, or hit "refresh", without waiting for anything.
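For contrast, a progress indicator is trivial when the application gets control per chunk instead of per request. A hypothetical sketch (the function name and callback are invented for illustration):

```python
def copy_with_progress(src, dst, total, chunk=64 * 1024, report=print):
    """Stream src to dst in chunks, reporting progress after each one.
    Under the classic 'buffer the whole request first' model, a loop like
    this never runs until the full body has arrived, so no indicator and
    no early abort are possible."""
    done = 0
    while True:
        block = src.read(chunk)
        if not block:
            break
        dst.write(block)
        done += len(block)
        report(f"{done}/{total} bytes ({100 * done // total}%)")
```

The same loop is also where you could cut off a bogus upload after the first megabyte instead of the last.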

Of course, there are various crutches of varying degrees of crookedness that solve the typical tasks, such as "get upload statistics as JSON", implemented as web server modules. But such things have to be built by hand and/or tied to the environment, so the application stops being self-contained and grows dependent on specific servers and their modules.


Caches are vital. Caching speeds up your site's responsiveness and reduces the load on it by letting you avoid repeating the same operation for multiple requests. For example, no matter how many times you compute 2 + 2, the answer is always 4; but computing it costs server resources, so it is far more profitable to compute the value once, when the first visitor arrives, write it down somewhere, and hand that ready result to every other user.
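The compute-once idea above can be sketched in a few lines (names and TTL are my own choices, not any particular framework's API):

```python
import time

_cache = {}

def cached(key, compute, ttl=60):
    """Compute-once cache: the first visitor pays for the computation,
    everyone else gets the stored result until it expires."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and hit[1] > now:
        return hit[0]
    value = compute()          # the expensive part, paid once per TTL
    _cache[key] = (value, now + ttl)
    return value
```

Every serious cache adds eviction, locking, and invalidation on top, but this is the whole idea.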

Do not confuse this caching with HTTP headers: those only affect a specific client (and, as originally intended, intermediate proxies), while server-side caching exists to serve the same content to many users.

Alas, delegating caching to an intermediate server no longer pays off. The slightest update to a page means building it from scratch, and given modern realities, with so many dynamic elements per page, virtually every page is unique. And from the other side, GET /somepage.html?someshit=12345 punches straight through the cache: a new page gets generated that does not even look at the parameter, yet the cost of building it is paid anyway, which, once again, an attacker can exploit. So very few people still use intermediate cache servers, and relying on them is already very difficult.
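One common defense against that kind of cache-busting is normalizing the URL into a cache key, keeping only parameters the page actually reads. A sketch (the whitelist and function name are assumptions for illustration):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

ALLOWED = {"page", "lang"}  # params this page actually uses (assumed)

def cache_key(url):
    """Build a cache key from a URL, dropping unknown parameters so that
    junk like ?someshit=12345 maps to the same cache entry as the bare page."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED)
    return parts.path + ("?" + urlencode(kept) if kept else "")
```

With this, the attacker's random query strings all collapse onto one cached entry instead of forcing a rebuild each time.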

That leaves caching everything inside the application. Almost every framework provides its own crutches, plus ready-made ones like memcached and similar stuff. Novice developers are usually ecstatic when they discover that generating a page can involve 500 requests to memcached with no penalty (unlike everyone's favorite MySQL). As a result, the whole codebase gets covered in wrappers that first look for the result in memcached and only then go to MySQL for it. I don't dispute that manual cache control is necessary, but fully manual control invites mistakes: you can simply forget to cache something, and by Murphy's law that something will be the critical spot.
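The wrapper pattern described above is usually called cache-aside. A minimal sketch, with plain dicts standing in for memcached and MySQL (class and key names are mine):

```python
class CacheAside:
    """The ubiquitous wrapper: check the cache first, fall back to the
    database, store the result for next time."""

    def __init__(self, cache, db):
        self.cache = cache  # stands in for memcached
        self.db = db        # stands in for MySQL

    def get(self, key):
        if key in self.cache:
            return self.cache[key]     # cheap hit
        value = self.db[key]           # the slow query
        self.cache[key] = value        # easy to forget in hand-rolled code
        return value
```

Note what the sketch leaves out: invalidation. Once the database row changes, the cache happily keeps serving the stale value, which is exactly the class of bug hand-rolled wrappers breed.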


What interface should a site have? Just don't say "a green one".

Previously, as a rule, a site's presentation was single and indivisible. Large portals did have a "printer-friendly version" button, though, or even a "WAP version", later replaced by a "PDA version": the same plain HTML, only simpler. Though it depends where you look: take Twitter, where that is the only readable version of it. Time passed, and sites had to be adapted not just for printers and phones but for every iPad and HTML5-capable refrigerator. Naturally, developers did not love the prospect of actually building ten versions of one site, so they decided to make something universal. Some kind of API for the site.

What is an API? I don't know. Honestly, I don't. Usually it is some kind of mess where you spit a piece of urlencoded string at a server and get back a piece of JSON / XML / plain text, as luck would have it. No standards, of course: even primitive data types can be anything, and an empty result can likewise be anything from null to "" (empty quotes), or no result at all. It is good if the authors have read the word REST and rushed to implement it, but against the background of everything else it makes little difference. The functionality is just as unclear: where requesting an HTML page gets us everything at once (latest news, comments, likes, etc.), how many requests the API needs is known only to its author; it may well turn out that comments can be fetched but the likes on them cannot. In practice, an API is a way to split client and server work between entirely separate development teams that barely talk to each other. There can be no question of such an API being useful.
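The "one HTML page vs N API calls" complaint can be illustrated with a toy client. Everything here is hypothetical: the endpoint paths, the backend dict, and the missing "likes" endpoint are invented to mirror the situation described above:

```python
def fetch(endpoint, backend):
    """Pretend round trip to a hypothetical API (one call per resource)."""
    if endpoint not in backend:
        raise KeyError(f"endpoint not provided by this API: {endpoint}")
    return backend[endpoint]

def build_article_view(article_id, backend):
    """Assemble what a single HTML page would have delivered in one response:
    three round trips, and graceful degradation where the API has gaps."""
    article = fetch(f"/articles/{article_id}", backend)
    comments = fetch(f"/articles/{article_id}/comments", backend)
    try:
        likes = fetch(f"/articles/{article_id}/likes", backend)
    except KeyError:
        likes = None  # the author simply never exposed likes
    return {"article": article, "comments": comments, "likes": likes}
```

Three requests (and a guess about which ones exist) to reconstruct what one page render gives for free.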

And it often happens that access to the API requires a key. Keys can often be obtained for free, so why not get one? The problem is that this puts us into explicit request accounting, some internal bookkeeping on their side. And who knows when the author decides to monetize it all? It is quite possible that after a while the free keys get disabled or severely throttled, with an offer to switch to a paid plan. Sometimes key issuance stops altogether and the service shuts down, even for paying customers; that happens too. So why bother with an API? It is easier for me to pull down the page and parse it with regular expressions than to put up with this outrage. So I almost never use these APIs of yours, and I am certainly not going to study their quirks.
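The "parse it with regular expressions" approach looks like this (the pattern and sample markup are made up; real scraping is far more brittle than this suggests):

```python
import re

def extract_titles(html):
    """Crude scrape with a regular expression: fragile against markup
    changes, but needs no key, no rate limit, and no API docs."""
    return re.findall(r"<h2[^>]*>(.*?)</h2>", html, flags=re.S)
```

It breaks the day the site changes its markup, but so, in practice, do APIs, only with a deprecation notice.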
