Performance Testing: Pitfalls

I build high-load applications for stock trading. Loaded both in terms of data volume and in the number of users and queries. Naturally, for such applications performance is of paramount importance, and so, consequently, is performance testing.

Watching this testing from the sidelines, I have accumulated a certain amount of information that, in my opinion, will be of interest.

Pitfall 1. The conversion factor

Testing such applications requires deploying a whole network of test machines, which amounts to building a test cluster. Trying to assemble such a cluster from machines physically located in the development office means creating your own data center, with all the costs that entails. A good alternative is a cloud service such as Amazon Web Services.

The natural desire to save on rented hosts or hardware purchases leads to choosing machines with performance far below the production installation. Below by multiples. This is where the conversion factor between synthetic performance indices comes into play. Say the production CPU is 2 times faster, has 4 times as many cores, 6 times the RAM, 3.5 times the disk throughput, and 100 times the network speed. Add these up, divide by the number of indicators, multiply by some correction factor, and you get a conversion factor by which to multiply the performance test results. You can invent a more elaborate formula, for example by assigning each indicator a weight.
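The calculation described above can be sketched in a few lines. All ratios and weights here are the illustrative numbers from the text; the weighted variant exists only to show how arbitrary the result is:

```python
def conversion_factor(ratios, weights=None):
    """Weighted average of production/test performance ratios.

    This is the naive formula from the text: sum the ratios
    (optionally weighted) and divide by the total weight.
    """
    if weights is None:
        weights = [1.0] * len(ratios)
    assert len(ratios) == len(weights)
    return sum(r * w for r, w in zip(ratios, weights)) / sum(weights)

# Production-vs-test ratios from the example: CPU speed, core count,
# RAM, disk throughput, network speed.
ratios = [2.0, 4.0, 6.0, 3.5, 100.0]

# Unweighted: the single 100x network ratio dominates everything.
print(conversion_factor(ratios))  # 23.1

# With (arbitrarily chosen!) weights the factor changes drastically,
# which is precisely the weakness of the method.
print(conversion_factor(ratios, weights=[5, 5, 3, 3, 1]))  # ~9.32
```

The spread between 23.1 and 9.32 for the same hardware shows how much the "factor" depends on weights no one can justify.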

On closer examination, this approach is only suitable for preparing a test suite for future runs on installations close to production, and for catching the most obvious performance problems. (Which is already quite a lot, and important.) Why? Because, at the very least, this approach completely ignores the effect of bottlenecks.

A real-life example. The tests ran on one host, the application under test on another. The test suite included requests returning different amounts of data. Requests returning relatively little data produced satisfactory results, while requests returning large responses did not. Meanwhile, the host of the application under test was nowhere near overloaded. The conclusion suggests itself: the application handles large-output requests poorly, fails to use all the machine's resources, and needs redesigning. But what was really going on? It turned out that the low network speed made transmitting the responses slow, which hit large responses especially hard and simply prevented the tests from putting any real load on the application under test.
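A back-of-the-envelope check could have exposed this bottleneck up front: the test network's bandwidth puts a hard cap on how many responses per second can physically cross the wire, no matter how fast the application is. A sketch, with hypothetical numbers:

```python
def max_responses_per_second(bandwidth_bytes_per_s, response_bytes):
    """Upper bound on throughput imposed by the network alone."""
    return bandwidth_bytes_per_s / response_bytes

# Hypothetical 100 Mbit/s test network, i.e. 12.5 MB/s.
bandwidth = 12.5 * 1024 * 1024

# Small (1 KB) responses: the network allows ~12,800 responses/sec,
# so it is not the limiting factor.
print(max_responses_per_second(bandwidth, 1024))  # 12800.0

# Large (1 MB) responses: the network caps throughput at 12.5/sec,
# long before the application under test is saturated.
print(max_responses_per_second(bandwidth, 1024 * 1024))  # 12.5
```

If the measured throughput sits at this ceiling while the application host idles, the test is measuring the network, not the application.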

This is an example of a bottleneck that manifests itself on a (slow) test installation but is simply impossible in production. Let's call it a "downstream" bottleneck. In any case, a downstream bottleneck is harmless: it costs the tester and developer nothing but wasted time.

But can the opposite happen? Is an "upstream" bottleneck possible, one that is truly dangerous and can cause serious trouble not only for the developer and tester but for the client as well? Imagine that we are entirely satisfied with the performance achieved on the test installation. Say our conversion factor is 5, we must provide 100,000 operations per second, and the test installation delivered 25,000. Everything seems fine; can we sleep peacefully? Not at all! In exactly such a situation a bottleneck can appear that went undetected (and is fundamentally undetectable!) on the test installation, so that the real, effective conversion factor turns out to be not 5 but 3. That is, not 125,000 operations per second but only 75,000: 25% short of the requirement.
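Spelling out the arithmetic of that example (all numbers come from the text above):

```python
required_ops = 100_000   # ops/sec the production system must sustain
measured_ops = 25_000    # ops/sec achieved on the test installation

assumed_factor = 5       # conversion factor we believe in
effective_factor = 3     # real factor once an upstream bottleneck bites

projected = measured_ops * assumed_factor    # 125,000: looks comfortable
actual = measured_ops * effective_factor     # 75,000: misses the target

shortfall = (required_ops - actual) / required_ops
print(projected, actual, shortfall)  # 125000 75000 0.25
```

The projection says there is 25% headroom; reality delivers a 25% shortfall, and nothing in the test results distinguishes the two.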

Moreover, upstream bottlenecks have enormous room to appear: we derive the conversion factor from synthetic indices in which the weights of individual indicators are chosen almost arbitrarily. Misjudge the weight of just one indicator, and...

Pitfall 2. The testing application

Performance testing requires load: load from multiple hosts, from multiple user accounts, sending a large number of different requests. There is a natural desire to automate all of this, either by writing your own utility or by picking a ready-made one, of which, fortunately, there are many.

But while creating a heavy load on the application under test, the testing application is itself under load. This is not entirely obvious, but it is so. Consider an analogy: we all know that any measuring instrument affects the process it measures and thus inevitably introduces error. In the same way, the peculiarities of the test implementation affect the test results, which is not always taken into account. It is natural for testers to look first (and only) at the application under test, not at the tests themselves. I do not mean banal bugs in the tests; I mean the non-obvious impact of the tests themselves on the testing process.

For instance: the testing application sends thousands of requests per second, and each successive request turns out to be slower than the previous one. So the system cannot withstand the load, latencies grow, the request queue lengthens? No. It turned out that the testing application (written in Java) allocated memory for every request and response, and the more requests it sent and responses it received, the slower its memory allocation became, and the fewer requests per unit of time it could send.
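The pattern in that story is a load generator that retains state for every request until its own resource consumption throttles the load it produces. The original tool was in Java; this Python sketch, with hypothetical names and a stubbed-out network call, only illustrates the bug and the fix:

```python
def send_request(i):
    # Stand-in for a real network call; returns a sizable response.
    return b"x" * 10_000

def naive_load_generator(n):
    """Buggy: retains every response it ever received, so the
    generator's memory footprint grows without bound and (in a
    garbage-collected runtime) allocation slows over time."""
    responses = []
    for i in range(n):
        responses.append(send_request(i))  # never released!
    return len(responses)

def fixed_load_generator(n):
    """Fixed: process each response and drop it immediately, so the
    generator's own footprint stays constant regardless of n."""
    processed = 0
    for i in range(n):
        resp = send_request(i)
        if resp:                           # use the response...
            processed += 1
        # ...then let it go out of scope before the next request.
    return processed

print(naive_load_generator(1000), fixed_load_generator(1000))  # 1000 1000
```

Both versions report the same request count; the difference only shows up in the generator's own resource usage, which is exactly why this class of bug goes unnoticed.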

Pitfall 3. The black box

The most general approach is to treat the application under test as a black box: testers do not delve into implementation details and do not take into account the characteristics and relationships of the application's individual components.

This approach is good in the general case. But the necessary preconditions of that general case include a large (almost unlimited) amount of time for testing, which almost never exists in reality. On the contrary, in most cases testers lag behind developers, if only because developers endlessly rework things.

Under such conditions it is important to minimize the time required for testing without, of course, reducing its quality. Solving this problem requires close interaction between testers and developers, during which the developers, who naturally know their own application, can point testers in advance to weak spots, likely bottlenecks, and so on. This, of course, sacrifices the black-box view of the application under test. But, firstly, the sacrifice is justified, and secondly, not every test has to be done jointly with the developers.

This article describes three pitfalls. Perhaps someone would like to share other non-obvious performance testing issues. I look forward to your feedback!

UPD. This article is about non-obvious, non-surface methodological errors in performance testing. If you feel a burning desire to brand and condemn these errors, don't bother: I agree with you in advance. It would be more interesting if you shared other errors of this kind.
