Load Testing of Web Systems: Continuing the Preparation
This article describes several points (the number of connections and the order of execution, third-party resources in the script, grouping of requests) that you should pay attention to when preparing a high-load test of a web system with a browser-based interface.
Below I look at several script configuration choices that can affect the results of your test.
I want to explain why using the real HTTP methods matters more than substituting their "fast" counterparts. We will also cover the need to check the received data and to build efficient regular expressions for extracting values.
Getting Responses of Real Size
In my opinion, at least in the final round of testing a web system, you should use exactly the requests the browser sends. If the browser sent a GET request, the script should simulate a GET and not replace it with, for example, HEAD. The main difference between these methods is that GET returns the response body, while HEAD returns only the headers. It may seem pointless to download "unnecessary" data such as images, CSS, and fonts, but practice shows that these requests matter just as much.
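To make the difference concrete, here is a minimal sketch using Python's standard `http.server`. It starts a throwaway local server that serves a hypothetical 200 KB test resource, then requests that resource once with HEAD and once with GET. The resource path and size are assumptions for illustration. The sketch only shows that GET makes the server transmit the whole body, while HEAD returns the headers alone.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical test resource: a 200 KB payload standing in for an image, CSS or font file.
PAYLOAD = b"x" * 200_000


class StaticHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD: the same headers as GET, but no body is sent.
        self._send_headers()

    def do_GET(self):
        # GET: headers plus the full body, which the server must actually transmit.
        self._send_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url, method):
    """Return (status, number of body bytes received) for one request."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status, len(resp.read())


def main():
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), StaticHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = f"http://127.0.0.1:{server.server_address[1]}/test-resource"
    try:
        head = fetch(url, "HEAD")
        get = fetch(url, "GET")
    finally:
        server.shutdown()
        server.server_close()
    return head, get


if __name__ == "__main__":
    head, get = main()
    print("HEAD:", head)
    print("GET: ", get)
```

Against a real system, you would point the same two requests at an actual image or stylesheet URL. Only the GET variant reproduces the traffic a browser generates.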
Compare the two request types for the same test resource:
[Figure: HEAD request]
[Figure: GET request]
The pictures show that the GET request takes many times longer than the HEAD request. The server spent longer sending the data for this request and could not serve the next one in the meantime. With a single user, the difference seems insignificant. With 1,000 virtual users, however, the server spends that extra time on every one of them.
Suppose our web server is configured to handle only 100 concurrent connections, and each GET request holds a connection for at least 3 seconds. The first 100 users connect, and the remaining 900 wait for a free connection. As soon as one of the first hundred finishes, its slot goes to the next user in line. The 1,000 users therefore pass through in 10 waves of 100. The thousandth user is in the last wave, so it can start its request only after about 9 × 3 = 27 seconds. With the HEAD method, the thousandth user would get access to the system in about 2 seconds. (These calculations are extremely rough.)
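The queueing estimate above can be written as a small function. This is a rough model under the same assumptions as the text: a fixed number of connection slots, every request holding its slot for the same fixed time, and users served in strict waves. The article does not give the HEAD duration. The value of about 0.22 s used below is back-derived from its "about 2 seconds" figure and is an assumption.

```python
def start_delay(user_index, slots, request_seconds):
    """Seconds until the user_index-th user (1-based) gets a connection slot,
    assuming every request holds its slot for exactly request_seconds and
    users are served strictly in waves of `slots`."""
    waves_before = (user_index - 1) // slots
    return waves_before * request_seconds


# GET case from the text: 100 slots, about 3 s per request.
print(start_delay(1000, 100, 3.0))
# HEAD case: 0.22 s per request is an assumed, back-derived value.
print(round(start_delay(1000, 100, 0.22), 2))
```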
This shows that only the “correct” method reproduces the real load on the server.
The example in the pictures is deliberately synthetic, to make the difference in request execution time more visible. Your requests may not be this heavy. Still, even with responses of 100-200 kilobytes and 5,000 users, the web server can slow down significantly.
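A quick back-of-the-envelope check shows the volume involved. The sketch below computes a lower bound on the pure transfer time for one response per user. The 1 Gbit/s server uplink is an assumed figure, not one from the article. The model also ignores protocol overhead, TCP behavior, and compression.

```python
def min_transfer_seconds(users, response_kb, link_mbit_per_s):
    """Lower bound on the time needed just to push one response to every user
    over a single shared link (1 KB = 1000 bytes, 1 Mbit = 10**6 bits)."""
    total_bits = users * response_kb * 1000 * 8
    return total_bits / (link_mbit_per_s * 10**6)


# 5,000 users x 200 KB over an assumed 1 Gbit/s uplink.
print(min_transfer_seconds(5000, 200, 1000))
```

Even under these ideal assumptions, a single pass of 200 KB responses to 5,000 users occupies the whole link for several seconds. A HEAD-based script would never show this.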
Checking Received Data
Many will say that checking the received data belongs to functional testing, not load testing, and in some respects they are right. In practice, however, the functional part of a web system may stop working correctly under heavy load. For example, a web application may fail to process incoming data properly.
I have encountered situations where a web system reported success with a 200 OK response, but the response body was empty. A deeper investigation of the system showed that this empty response was its intended behavior. Whether the request had actually succeeded could only be determined by the presence of content in the response.
So the only way to verify that the system works correctly under high load is to check the data received, not just the response status.
Today, technologies for dynamic interaction with web systems (AJAX, WebSocket, Flash, Java, etc.) are developing rapidly, and the same request can return different content. You need to be sure that the response text is correct.
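Here is a minimal sketch of such a check, assuming the test tool lets you run custom code on each response. A response counts as a success only if all three conditions hold: the status is 200, the body is not empty, and the body contains a fragment that must appear on the correct page. The function name, the `CheckResult` type, and the marker text are hypothetical. They are not part of any particular load testing tool's API.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    reason: str


def check_response(status, body, expected_marker):
    """Treat a response as successful only if the status is 200, the body is
    non-empty, and the body contains a marker expected on the correct page."""
    if status != 200:
        return CheckResult(False, f"unexpected status {status}")
    if not body:
        # The case from the text: the server says 200 OK but sends nothing.
        return CheckResult(False, "200 OK but empty body")
    if expected_marker not in body:
        return CheckResult(False, "expected content not found")
    return CheckResult(True, "ok")


if __name__ == "__main__":
    print(check_response(200, "", "Order #"))
    print(check_response(200, "<h1>Order #1001 confirmed</h1>", "Order #"))
```

A status-only check would have counted the first response as a success.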
Building "Regular" Regular Expressions
Many load testing tools require regular expressions to verify that the received data is correct. We all know what a regular expression is, but what is a "regular" one? It is an expression that finds the target value in as little time as possible.
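The sketch below illustrates what "regular" means in practice. It uses Python's `re` module to extract a hypothetical `csrf` token value from a line of HTML with three patterns:

- The greedy `.*` runs to the end of the line and then backtracks to the last quote, so it does extra work and captures too much.
- The lazy `.*?` finds the right value but re-tests the closing quote after every character.
- The negated class `[^"]*` stops exactly at the closing quote.

The HTML line and attribute names are made up. Python's `re` supports possessive quantifiers only from version 3.11, so they are left out here.

```python
import re

HTML = '<input name="csrf" value="abc123" type="hidden">'

# Compile once, outside any per-virtual-user loop.
# Greedy: runs to the end of the line, then backtracks to the LAST quote.
GREEDY = re.compile(r'value="(.*)"')
# Lazy: expands one character at a time, checking for the closing quote each step.
LAZY = re.compile(r'value="(.*?)"')
# Negated character class: consumes exactly the value and stops at the quote.
PRECISE = re.compile(r'value="([^"]*)"')

for name, pattern in (("greedy", GREEDY), ("lazy", LAZY), ("precise", PRECISE)):
    print(name, pattern.search(HTML).group(1))
```

The greedy pattern is both slower on long lines and wrong here: it returns the text up to the quote after `hidden`, not the token.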
There are many examples and articles online about the poor performance of certain regular expressions, or of the engines that run them. They explain what lazy, greedy, and possessive ("super-greedy") quantifiers are, why and when to use grouping and alternation, and how to write expressions that the regex engine can process faster. Anyone who finds this important will be able to find that information.
Here I want to demonstrate why expressions need to be built correctly. Consider a sequence of requests that had some delays between them during recording.
In the picture, the delays are marked with green arrows. When the script runs, the tool should reproduce these delays between requests, which ensures a "real" user load.
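Here is a minimal sketch of how a tool might replay recorded think times, assuming the recording is available as a list of (request, delay) pairs. The request names and delays are invented. `send` stands in for whatever actually issues the HTTP request. `sleep` is injectable, so the replay can be tested without waiting.

```python
import time

# Hypothetical recorded scenario: (request, think time in seconds recorded after it).
RECORDED = [
    ("GET /login", 2.0),
    ("POST /login", 5.0),
    ("GET /catalog", 3.5),
    ("GET /item/42", 0.0),
]


def replay(scenario, send, sleep=time.sleep):
    """Send each request in order, then pause for its recorded think time."""
    for request, think_time in scenario:
        send(request)
        if think_time > 0:
            sleep(think_time)


if __name__ == "__main__":
    # Dry run: print the requests and record the pauses instead of sleeping.
    pauses = []
    replay(RECORDED, send=print, sleep=pauses.append)
    print("total think time:", sum(pauses), "s")
```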
As established in the previous section, the received data must be checked, and we do this with regular expressions. So for some of the requests, we write expressions and start checking.
These checks take time to search the data, so we end up with something like the following.
The red arrows show the time spent evaluating regular expressions.
As a result, the total execution time of the script already deviates somewhat from the time a single real user would take. Every virtual user performs such checks, and we run 1,000 of them on one computer, so the processing time grows several-fold. Each virtual user takes operating system resources for its computations. While some users hold those resources, others must wait for them.
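The contention effect can be estimated with a simple model. It assumes that all virtual users reach the same check at the same moment and share the CPU cores of one load generator perfectly, with nothing else running. The 5 ms check cost and 4 cores are assumed values for illustration, not measurements from the article.

```python
def added_seconds_per_step(virtual_users, check_ms, cpu_cores):
    """If all virtual users start a check costing check_ms of CPU time at the
    same moment on one load generator with cpu_cores cores, the last of them
    finishes no earlier than this many seconds later (ideal scheduling)."""
    total_cpu_ms = virtual_users * check_ms
    return total_cpu_ms / cpu_cores / 1000


# Assumed numbers: 1,000 users, a 5 ms regex check, a 4-core load generator.
print(added_seconds_per_step(1000, 5, 4))
# The same users with a check ten times cheaper.
print(added_seconds_per_step(1000, 0.5, 4))
```

The red-arrow delay scales with the cost of each check, which is why a cheaper expression brings the script closer to the recorded timing.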
The more precise our regular expression, the less time it takes to evaluate, and the more realistic the load we generate with a large number of virtual users.
Of course, many tests do not need regular expressions at all, let alone carefully optimized ones. But if you want your test to approximate real user conditions more closely, do not forget about the speed of these auxiliary functions.
Conclusion
We examined three factors that can distort load testing results. Of course, not every situation described will affect metrics such as server connection time, server response time, and the other characteristics you measure.
In most cases, though, we want to know how many real users can execute a given scenario, how fast the server responds, and how much time each user spends on the scenario. These are exactly the parameters that the issues discussed above can distort.
The server may respond to a single request in a few milliseconds. Yet with larger volumes of received data and the processing of that data, the whole script may take several minutes, which is sometimes unacceptable to whoever commissioned the test.