It's all about the combination: the story of one site's security system

One of our customers, a very large online store, asked us to protect its web application from web attacks. Given the scale of the resource, we started looking for the right approach and ultimately settled on a combination of positive and negative security models, implemented with a web application firewall (in this case we used F5 ASM as the WAF, but the approach is universal for most WAFs). Many companies dislike the positive model because they fear, often with good reason, that the resource may become unavailable to some users and sales will drop, and because building the policies takes a lot of time. Without it, however, the protection of a web resource is incomplete.
What are the positive and negative models?
The negative security model is probably the simplest thing a WAF can offer - in terms of logic, at least; the configuration itself can be quite nontrivial. A negative model follows the principle "whatever is not forbidden is allowed". It is usually a collection of security policies and signatures describing known attacks. You cannot rely on this approach alone, for obvious reasons: there will always be plenty of attacks for which no policy or signature has been written. But you should not neglect it either, since it is the first line of defense, especially while you do not yet have a ready-made positive model or it is still under construction.
A positive security model describes the structure of a web resource with all possible restrictions. It works on the principle "whatever is not allowed is forbidden": deviations from the model should be blocked. A positive security model is built around URLs, parameters, and methods. It can be configured very strictly: for each URL you can describe which methods are allowed, which parameters it accepts, what values those parameters may take, and from which URLs it may be reached. Building such a model for a large site is not an easy task, and if the site also changes every week or two, keeping the painstakingly built model up to date is harder still. Bear in mind, too, that no one can guarantee the model is absolutely correct, which means perfectly normal user requests may well be blocked. The conclusion: a positive model is good and safe, but resource availability may suffer. A strictly defined positive model is viable only for small, rarely changing resources; for large and rapidly changing ones it will have to be relaxed for the sake of availability. But more on that later.
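To make the idea concrete, here is a minimal sketch, not tied to any particular WAF, of what a strictly configured positive model might look like as a data structure: for each URL it lists the allowed methods, the expected parameters with their permitted value patterns, and the URLs from which the page may be reached. All URLs, parameter names, and patterns are invented for illustration.

```python
import re

# Illustrative per-URL positive model: allowed methods, parameter value
# patterns, and permitted referrer URLs. Anything not listed is denied.
POSITIVE_MODEL = {
    "/catalog/search": {
        "methods": {"GET"},
        "params": {"q": r"[\w\s-]{1,64}", "page": r"\d{1,3}"},
        "referrers": {"/", "/catalog"},
    },
    "/cart/add": {
        "methods": {"POST"},
        "params": {"item_id": r"\d+", "qty": r"\d{1,2}"},
        "referrers": {"/catalog/search"},
    },
}

def check_request(url, method, params, referrer):
    """Return True if the request conforms to the positive model."""
    rule = POSITIVE_MODEL.get(url)
    if rule is None:                       # unknown URL -> deny
        return False
    if method not in rule["methods"]:      # unexpected method -> deny
        return False
    if referrer not in rule["referrers"]:  # unexpected entry point -> deny
        return False
    for name, value in params.items():
        pattern = rule["params"].get(name)
        if pattern is None or not re.fullmatch(pattern, value):
            return False                   # unknown parameter or bad value
    return True
```

Even this toy version shows why maintenance is painful: every new page, parameter, or navigation path requires an update to the model.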
So using only one of these models has significant drawbacks: a negative model gives a low level of security, while a positive one may cause problems with the availability of the web resource. Combine both, and you can get an acceptable level of both security and availability.
Security System Development
Here is the weekly average distribution of attack types against the customer's online store (based on negative model triggers):


As you have probably gathered, we decided not to depart from the classical approach to protecting a web application and built a hybrid of the positive and negative models. We started with the negative one. F5 ASM ships with a large set of signatures out of the box - just select the appropriate groups from the list. Signatures can be either system-independent or designed for a specific type of system (IIS, Apache, MySQL, MSSQL, etc.). Assigning every signature indiscriminately is impractical: it only increases the load on the WAF and may lead to additional false positives. For example, if you run IIS on Windows, there is little point in assigning the Linux signature group, unless you want to get rid of all the garbage requests hitting the web server even though they cannot do any harm.

Signatures are a relatively accurate method, but don't harbor any illusions: false positives cannot be avoided. For example, we ran into a situation where developers had used the word "shell" in the names of several URLs (in the sense of "bumper" or "cover"). For such cases you will have to configure exceptions. After processing the false positives, setting exceptions in some cases and disabling the signature altogether in others, we had a baseline for protecting the site. I almost forgot: besides signatures, WAFs can validate web requests for RFC compliance, which lets you block garbage and malformed HTTP requests. Naturally, we did not neglect that protection either.
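To illustrate why a keyword-style signature fires on a harmless product URL and how a per-URL exception suppresses it, here is a deliberately naive sketch. The keyword list and exception URLs are invented; a real WAF signature is far more elaborate than a substring match.

```python
# Naive illustration of a keyword signature and a per-URL exception list.
SIGNATURE_KEYWORDS = {"shell", "passwd", "cmd.exe"}   # toy "attack" keywords
SIGNATURE_EXCEPTIONS = {
    # URLs where the keyword is legitimate (e.g. a product named "shell"/cover)
    "/catalog/phone-shell-red",
    "/catalog/phone-shell-black",
}

def signature_hit(url: str) -> bool:
    """Return True if the URL triggers the keyword signature."""
    if url in SIGNATURE_EXCEPTIONS:
        return False                       # explicitly excepted URL
    return any(kw in url.lower() for kw in SIGNATURE_KEYWORDS)

print(signature_hit("/catalog/phone-shell-red"))   # False: exception applies
print(signature_hit("/search?q=/bin/shell"))       # True: signature fires
```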

But finding an approach to the positive security model made us sweat. The site has more than 10 thousand diverse URLs. The very first and wildest idea was to build a full-fledged positive model with a complete list of URLs, but we quickly realized how impractical that was - though not before we had managed to obtain the list automatically.
An important remark: you can build a positive model by hand, but the pleasure is dubious. Modern WAFs have an automatic learning mechanism that builds that very positive model from user requests.
But not all machine learning is equally useful. Yes, for an element to be added to the positive model it must appear in more than one web request and come from different addresses, yet in the end you still get a fair amount of garbage to clean up. Don't forget about URLs and parameters with dynamic elements either: there can be a lot of them, and they can make the positive model unmaintainable. After automatic training we got exactly that - something unmaintainable. The situation was aggravated by the fact that every day anywhere from a few dozen to hundreds of URLs were added to or removed from the web resource, and the system simply could not keep up with the changes.
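As a rough illustration of the learning threshold described above (an element is promoted to the positive model only after appearing in several requests from several distinct source addresses), here is a minimal sketch. The thresholds and data shapes are assumptions for illustration, not the vendor's actual algorithm.

```python
from collections import defaultdict

MIN_HITS = 5        # assumed threshold: minimum number of requests
MIN_SOURCES = 3     # assumed threshold: minimum number of distinct client IPs

# (url, parameter) -> observed client IPs and hit counter
seen_sources = defaultdict(set)
hit_count = defaultdict(int)
positive_model = set()      # learned (url, parameter) pairs

def observe(url: str, param: str, client_ip: str) -> None:
    """Feed one observed request into the learning process."""
    key = (url, param)
    hit_count[key] += 1
    seen_sources[key].add(client_ip)
    if hit_count[key] >= MIN_HITS and len(seen_sources[key]) >= MIN_SOURCES:
        positive_model.add(key)   # enough evidence: accept into the model

# A parameter seen from several clients gets learned,
# a one-off dynamic URL does not.
for i in range(6):
    observe("/catalog/search", "q", f"10.0.0.{i}")
observe("/tmp/session-8f3a9c", "token", "10.0.0.1")
print(("/catalog/search", "q") in positive_model)          # True
print(("/tmp/session-8f3a9c", "token") in positive_model)  # False
```

The same mechanism also shows where the garbage comes from: any dynamic URL or parameter that happens to cross the thresholds gets learned, and a site that adds and removes hundreds of URLs a day quickly outruns it.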
Our second attempt to obtain the list of URLs automatically was to read sitemap.xml, which contained links to the pages we thought we needed. We uploaded the resulting list of URLs via the API. It was pretty large, but still not complete.
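For reference, pulling the URL list out of sitemap.xml takes only a few lines of Python. This sketch just collects the URLs (following nested sitemap indexes); pushing them into the WAF policy via its API is product-specific and omitted, and the sitemap address is a placeholder.

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_sitemap_urls(sitemap_url: str) -> list[str]:
    """Return all page URLs listed in a sitemap, recursing into sitemap indexes."""
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    urls = []
    # A sitemap index points to nested sitemaps; a plain sitemap lists pages.
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        urls.extend(fetch_sitemap_urls(loc.text.strip()))
    for loc in root.findall("sm:url/sm:loc", NS):
        urls.append(loc.text.strip())
    return urls

if __name__ == "__main__":
    urls = fetch_sitemap_urls("https://example-shop.test/sitemap.xml")  # placeholder
    print(len(urls), "URLs collected")
```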
After these experiments we reluctantly abandoned the whitelist of URLs, so as not to generate a pile of policy violations whenever a URL was missing from the list. Instead we took a different approach: protection based on parameters, through which both user data and some internal data are passed. The largest number of attacks is carried out precisely through parameters. Since we had given up the URL whitelist, we decided to protect parameters without binding them to a specific URL. The WAF used in this project supports this through the Global Parameters concept.
A problem can arise if different URLs have parameters with the same name but carrying different data types. In practice this is rare, but if you do run into it, you will have to define those parameters for each URL separately and specify different valid values for them.

In the case of the online store we were protecting, this was the rule rather than the exception: if parameters with the same name appeared on different pages, they contained data of the same type. We also proceeded from the principle that attacking through parameters that can contain only alphanumeric values, with no special characters, is extremely problematic. So the list of protected parameters included only those that may contain special characters; everything else was covered by a wildcard "*". Thus, whenever a parameter that needs special characters appears, we add it to the list; all other parameters are restricted to alphanumeric values only.
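A minimal sketch of the resulting logic, outside any particular WAF: parameters known to need special characters get an explicit allow-list of extra characters, and every other parameter falls under a wildcard rule permitting alphanumerics only. The parameter names and character sets are illustrative.

```python
import re

# Explicitly described parameters: name -> extra characters allowed
# in addition to letters and digits (illustrative values).
SPECIAL_PARAMS = {
    "email":   "@.-_",
    "address": " ,.-/",
    "search":  " %-_",
}

WILDCARD = re.compile(r"[A-Za-z0-9]*")   # the "*" rule: alphanumeric only

def param_allowed(name: str, value: str) -> bool:
    """Check a request parameter against the global-parameter rules."""
    extra = SPECIAL_PARAMS.get(name)
    if extra is None:
        return bool(WILDCARD.fullmatch(value))
    allowed = re.compile(r"[A-Za-z0-9" + re.escape(extra) + r"]*")
    return bool(allowed.fullmatch(value))

print(param_allowed("city", "Khabarovsk"))        # True: wildcard rule
print(param_allowed("city", "{f,fhjdcr"))         # False: special characters
print(param_allowed("email", "user@example.com")) # True: explicitly allowed chars
```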

As a result, we ended up with about 300-400 parameters. A model of that size can already be maintained. Once the model had stabilized and the number of false positives had dropped to an acceptable level, the next stage began - the one many skip because of strained relations between teams, and completely in vain. It is highly desirable to review the resulting model with the developers of the customer's web application; this step can significantly improve the quality of the model. In our case the customer's side had quite capable specialists who, despite a heavy workload, met us halfway. We did not spend time teaching the developers the WAF interface. Instead, we wrote a simple Python script that parses the WAF policy exported as XML and outputs a CSV file listing the parameters and the special characters allowed for them. The file is handed to the developers, who look up the parameter of interest in their dictionary and adjust the allowed values if necessary.
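We cannot publish the exact script, but the idea looks roughly like this. The element and attribute names below are assumptions about the exported policy XML (the layout differs between WAF versions); the point is simply to turn the policy into a CSV that developers can read without touching the WAF interface.

```python
import csv
import xml.etree.ElementTree as ET

def policy_to_csv(policy_xml: str, out_csv: str) -> None:
    """Convert an exported WAF policy (XML) into a parameter/metachar CSV.

    Assumed layout for illustration:
      <policy><parameters><parameter name="...">
          <metachars><metachar>%2C</metachar>...</metachars>
      </parameter></parameters></policy>
    """
    root = ET.parse(policy_xml).getroot()
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["parameter", "allowed_special_chars"])
        for param in root.iter("parameter"):
            name = param.get("name", "")
            # Assume metacharacters are stored URL-encoded (e.g. %2C for a comma).
            chars = [bytes.fromhex(m.text.strip().lstrip("%")).decode("latin-1")
                     for m in param.iter("metachar") if m.text]
            writer.writerow([name, " ".join(chars)])

if __name__ == "__main__":
    policy_to_csv("exported_policy.xml", "parameters.csv")  # placeholder file names
```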
One of the problems with a positive security model is that some parameters are hard to restrict in any way - free-form user input such as comments, for example, especially when the customer does not want to limit users. To protect such parameters you will have to fall back on the negative security model and check the user input for known malicious code. And of course, such parameters must also be carefully validated on the web server side.
Another peculiarity you have to deal with is user input errors. For example, in the city name field a user types "{f,fhjdcr" instead of "Khabarovsk" (the same word typed with the wrong keyboard layout). The request gets blocked, because it violates the allowed parameter values, which permit only alphanumeric characters. It might seem fair enough: the user made the mistake, the user pays for it. But not every customer will be happy with such a WAF reaction, and in that case the positive model will have to be relaxed for the sake of functionality.
As a result, we got a combined system incorporating both models. Then the rather tedious process of keeping it up to date begins: with each patch or release the positive model can change, and new false positives can appear in either model.
After that comes the most crucial stage: enabling blocking mode. Up to this point the WAF, of course, works by generating security events, but it does not actually improve the security of the web resource and does not affect user activity in any way.
In our case, because of the complexity of approvals on the customer's side, enabling blocking dragged on. There was no strong-willed decision that would let us move forward, so for a fairly long period we simply kept the existing models up to date.
Commissioning
Finally the customer made the decision, but put forward a number of rather stringent requirements: the number of false positives had to be minimal and the site's availability maximal. An excellent requirement, but hard to put into practice, and we were given no quantitative metrics against which to fulfill it. On average the site received 100-150 million web requests per day.
As the figure above shows, the negative model triggered on the order of 100-150 thousand times a week, which comes to about 20 thousand triggers a day. Most of these detections are real attacks, in most cases automated. By our observations, false positives accounted for no more than 5%. A caveat right away: we did not review all 20 thousand requests a day; real attacks are fairly easy to weed out by analyzing the attackers' IP addresses. There is no point in reviewing every request from an attacker's address - two or three are enough.
The positive model produced about 50 thousand triggers a day. For the most part they duplicated the alerts of the negative model, and such triggers could easily be excluded from the analysis, but about 20 thousand requests still had to be examined. Here a lot depends on how well the WAF can group events to simplify analysis. In our case the WAF grouped events by special character and then by the parameter in which that character was encountered. From there everything is more or less simple: we look at the parameter/special character combination with the largest number of triggers. If they came from a wide range of IP addresses, that is a sign the alarm may be false, and then we analyze sample requests. In most cases it is possible to tell without outside help whether it is an attack, a user input error, or a gap in the positive model. In disputed cases, don't shy away from talking to the developers (provided, of course, that you have the option). Once the mass triggers have been analyzed, we move on to the single ones. They are usually harder to analyze, especially at the very beginning, when you do not yet know the specifics of the application. At the end of the analysis we get 5 to 10% false positives, which together with the false positives of the negative model comes to up to 6,000 false positives, or about 0.005% of the total number of requests. That figure no longer scares anyone away, and in the end it satisfied the customer. An important remark: this does not mean that the work of 6,000 users will be paralyzed. It merely means that when opening some page or submitting some data, such a user will be redirected to an error page.
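The grouping step itself is easy to reproduce offline if you export the events. Below is a minimal sketch under an assumed event format (parameter, special character, source IP per event), not a specific WAF log schema.

```python
from collections import defaultdict

def summarize(events):
    """Group positive-model events by (special character, parameter).

    `events` is an iterable of (param, special_char, src_ip) tuples -
    an assumed export format for illustration.
    """
    hits = defaultdict(int)
    sources = defaultdict(set)
    for param, char, ip in events:
        key = (char, param)
        hits[key] += 1
        sources[key].add(ip)
    # Sort by volume: mass triggers coming from a wide IP range are the first
    # candidates for being false positives or gaps in the model.
    for key in sorted(hits, key=hits.get, reverse=True):
        char, param = key
        print(f"{char!r} in {param}: {hits[key]} hits from {len(sources[key])} IPs")

summarize([
    ("comment", "'", "203.0.113.7"),
    ("city",    ",", "198.51.100.2"),
    ("city",    ",", "198.51.100.9"),
    ("city",    ",", "203.0.113.40"),
])
```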
Bottom line: we had the numbers and they suited the customer, but it never hurts to play it safe, especially if the client insists on it. We decided to put the protection system into operation in stages. To test blocking we were kindly given a macro-region in which the customer tests its new releases and patches. The traffic there is real production traffic: the regions change periodically and include cities with populations over a million. We copied the existing policy and applied it to the macro-region, then turned on blocking - not all at once, but only for the negative model. We tested for a week; the number of false positives did not exceed the calculated value. After that we enabled blocking based on the negative model for the entire site, again watched it for a week and measured the indicators. Then we repeated the same trick with the positive model: first enabled blocking for the macro-region, watched the results for a week or two, and then enabled blocking for the entire site. It may seem that everything went suspiciously smoothly, but don't forget that this was preceded by a long stage of refining the models, which allowed us to turn on protection quite painlessly.
What if there is no full test zone (a test macro-region, in our case)? You can select a group of internal employees who will also access the resource through the WAF and enable blocking only for that group of users. Such testing is less thorough, but it is still better than nothing.
In the end, we have to admit that with a WAF it is either slow or sloppy: a quality deployment takes time. Many customers are not prepared to wait long enough for a high-quality positive model to take shape, or are unwilling or unable to keep it updated. As a result they either abandon the positive model or leave it to its own devices without enabling blocking. There are also those who rely entirely on the automatic construction of a positive model, and it turns out far from ideal. Neither approach can provide full protection. WAF tuning has to be approached thoroughly - the time spent is worth it.
Andrei Chernykh, expert at the Information Security Center of Jet Infosystems