Overview of stress testing and performance testing tools

    As brave people like to say: "From dev to prod is just one step." Experienced people add that this step is called "testing", that it comes in many different forms, and we have no reason not to believe them.



    Load matters: the driver of this truck managed to collapse a bridge under the weight of his vehicle, and the restoration bill came to roughly $21.3M. Fortunately, software testing is cheaper!

    Of course, when talking about testing, we need to understand what we are fighting for and why. We deliberately limited ourselves and decided to talk today exclusively about load testing and performance testing: two topics far apart from each other, yet both extremely interesting in practical terms. We will look at tools for both without tying ourselves to any particular technology stack, so do not be surprised to see Yandex.Tank and BenchmarkDotNet side by side!

    Load testing


    Imagine that you and I have written a service, and now we need to understand what load it will withstand. An old, sad joke about building quality software says that it is better to have software that is guaranteed to work badly than software that works well but with no guarantees: in the first case, we at least know what we can count on. Furthermore, if our service can scale in one way or another, we need to understand how useful that scaling actually is as the load grows and whether the service still handles the tasks the project assigns to it.

    So we point the load at our creation and watch the result closely: we are obviously interested in the moment when the service either starts responding to requests with unacceptable delays, or returns incorrect data, or stops showing signs of life altogether, whether for all requests or only for some of them.

    Let's imagine that we have written a service; for definiteness, let's say it is a web service, although that is not particularly important. To find out what we can count on, we start "bombarding" it with requests, observing both the behavior of the service itself and the load on the servers it runs on. It is good if we know in advance exactly which requests we need to send to the service: in that case we can prepare an array of requests beforehand and then fire it at our application in one go. But if the second request depends on the result of the first (a simple example: the user authorizes first, and the session ID is included in subsequent calls to the service), then the load generator must be able to generate test requests extremely quickly, in real time.
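
    To make the difference concrete, below is a minimal Java sketch of the second, "dependent" case (the URLs, the X-Session-Id header and the response handling are our own assumptions, purely for illustration): the session ID returned by the login call has to be extracted before the next request can even be built, so such requests cannot simply be prepared in advance.

        import java.net.URI;
        import java.net.http.HttpClient;
        import java.net.http.HttpRequest;
        import java.net.http.HttpResponse;

        public class DependentRequestsSketch {
            public static void main(String[] args) throws Exception {
                HttpClient client = HttpClient.newHttpClient();

                // Step 1: authorize; we assume the response body carries a session ID.
                HttpRequest login = HttpRequest.newBuilder()
                        .uri(URI.create("https://example.com/api/login"))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString("{\"user\":\"test\",\"password\":\"test\"}"))
                        .build();
                HttpResponse<String> loginResponse =
                        client.send(login, HttpResponse.BodyHandlers.ofString());

                // Naive extraction of the session ID (a real script would parse the JSON properly).
                String sessionId = loginResponse.body().trim();

                // Step 2: this request can only be built after the first one has completed.
                HttpRequest data = HttpRequest.newBuilder()
                        .uri(URI.create("https://example.com/api/data"))
                        .header("X-Session-Id", sessionId)
                        .GET()
                        .build();
                HttpResponse<String> dataResponse =
                        client.send(data, HttpResponse.BodyHandlers.ofString());
                System.out.println(dataResponse.statusCode());
            }
        }

    A load generator that follows such a scenario has to do all of this, including parsing the previous response, thousands of times per second, which is exactly why "real-time" request generation is a separate requirement for the tool.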

    Taking the circumstances and our knowledge of the system under test into account, we pick the tool (or tools):

    JMeter


    Yes, good old JMeter. For nearly twenty years (!) it has been a frequent choice for many kinds and scenarios of load testing: a convenient GUI, platform independence (thanks to Java!), multithreading support, extensibility, excellent reporting capabilities, and support for many request protocols. Thanks to its modular architecture, JMeter can be extended in whatever direction the user needs, implementing even very exotic test scenarios, and if none of the plugins written by the community over the years fits, you can take the API and write your own. If necessary, JMeter can also do distributed testing, albeit in a limited form, with the load created by several machines at once.
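
    As a sketch of what that extensibility looks like in practice, a custom "Java Request" sampler can be written by extending AbstractJavaSamplerClient (the class name and the measured action below are hypothetical, purely for illustration):

        import org.apache.jmeter.config.Arguments;
        import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
        import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
        import org.apache.jmeter.samplers.SampleResult;

        // A hypothetical sampler: JMeter calls runTest() from every virtual user thread.
        public class MyServiceSampler extends AbstractJavaSamplerClient {

            @Override
            public Arguments getDefaultParameters() {
                Arguments params = new Arguments();
                params.addArgument("host", "localhost"); // shown in the GUI, overridable per test plan
                return params;
            }

            @Override
            public SampleResult runTest(JavaSamplerContext context) {
                SampleResult result = new SampleResult();
                result.sampleStart();                        // start the timer
                try {
                    callMyService(context.getParameter("host")); // whatever exotic protocol we need
                    result.setSuccessful(true);
                } catch (Exception e) {
                    result.setSuccessful(false);
                    result.setResponseMessage(e.getMessage());
                } finally {
                    result.sampleEnd();                      // stop the timer
                }
                return result;
            }

            private void callMyService(String host) {
                // the actual interaction with the system under test goes here
            }
        }

    Packaged as a JAR and dropped into JMeter's lib/ext directory, such a class shows up as an ordinary sampler in the GUI, so even protocols JMeter has never heard of can be load tested this way.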

    One of JMeter's convenient features is its proxy mode: we specify "127.0.0.1:8080" as the proxy in the browser settings and browse the pages of the site we are interested in, while JMeter records all our actions and all related requests into a script that can later be edited as needed; this makes the process of creating HTTP tests noticeably easier.

    By the way, the latest version (3.2), released in April of this year, learned to send test results to InfluxDB using asynchronous HTTP requests. True, starting with version 3.2 JMeter requires Java 8, but that is probably not too high a price for progress.

    Test scripts in JMeter are stored as XML files, which, as it turns out, creates plenty of problems: they are very inconvenient to write by hand (read: you need the GUI to create a test), and they are awkward to work with in version control systems (especially when you need to look at a diff). Products competing in the load testing space, such as Yandex.Tank or Taurus, have learned to create these test files on the fly and hand them to JMeter for execution, thereby using the power and experience of JMeter while letting users write tests as more readable scripts that are easier to keep under version control.

    LoadRunner


    Another well-known product that has long been on the market, whose wider adoption in certain circles was held back by the vendor's licensing policy (by the way, after the merger of the Hewlett Packard Enterprise software division with Micro Focus International, the familiar name HPE LoadRunner has changed to Micro Focus LoadRunner). What is interesting is the logic of test creation, where several (it is probably more accurate to say "many") virtual users simultaneously do something with the application under test. This makes it possible not only to evaluate the application's ability to handle a stream of simultaneous requests, but also to understand how the activity of some users who are busily working with the service affects the work of others. And all this with a wide choice of protocols for interacting with the application under test.

    HP at one time created a very good set of tools for functional and load testing automation which, if necessary, can be integrated into the software development process, and LoadRunner integrates with them (in particular, HP Quality Center and HP QuickTest Professional).

    Some time ago the vendor decided to turn towards those who are not ready to pay for a license right away, and now ships LoadRunner with a free license (limited to 50 virtual users, with a large part of the supported protocols disabled), charging money only for expanding its capabilities further. It is hard to say how much this will raise interest in this undoubtedly entertaining tool when it has such strong competitors.

    Gatling


    A very powerful and serious tool (not named after a rapid-fire machine gun for nothing), primarily because of its performance and the breadth of protocol support "out of the box". For example, where load testing with JMeter will be slow and painful (alas, its WebSocket support is not very fast, which rather conflicts with the whole point of WebSockets), Gatling will almost certainly create the necessary load without any difficulty.

    It should be noted that, unlike JMeter, Gatling has no GUI and is generally considered a tool aimed at an experienced, "competent" audience capable of creating a test scenario in the form of a text file.

    Gatling also has drawbacks for which it is criticized. Firstly, the documentation could be better; secondly, you need to know Scala reasonably well to work with it: both Gatling itself and its test scripts are written in this language. Thirdly, in the past the developers have "occasionally" changed the API drastically, so you could discover that tests written six months earlier no longer worked on the new version or had to be reworked and migrated. Gatling also lacks distributed testing, which limits its possible applications.

    Yandex.Tank


    In short, Yandex.Tank is a wrapper around several load testing utilities (including JMeter) that provides a unified interface for configuring them, launching them and reporting, regardless of which utility is used "under the hood".

    It can monitor the main metrics of the application under test (CPU, memory, swap, etc.) and system resources (free memory / disk space), and it can stop the test based on various explicit criteria ("if the response time exceeds a predefined value", "if the number of errors per unit of time is above X", and so on). It can also display the main test statistics in real time, which is very useful right in the middle of a test.

    The Tank has been used inside Yandex and in other companies for about 10 years. It has been used to bombard very different services, with different requirements for the complexity of test scenarios and the level of load. Almost always a single load generator is enough, even for testing highly loaded services. The Tank supports various load generators, both written specifically for it (Phantom, BFG, Pandora) and widely used third-party ones (JMeter). The modular architecture lets you write your own plugin for the load generator you need and, in general, bolt on almost anything.

    Why use different load generators? Phantom is a fast C++ gun: a single generator of this kind can produce up to hundreds of thousands of requests per second. But to achieve that speed, requests have to be generated in advance, and you cannot use data received from the service under test to build the next request. In cases where you need to execute a complex scenario, or the service uses a non-standard protocol, you should use JMeter, BFG or Pandora.

    BFG, unlike JMeter, has no GUI; its test scenarios are written in Python, which lets you use any libraries (and there is a huge number of them). It often happens that Python bindings have already been written for a service, and then they are convenient to use in load scripts. Pandora is an experimental gun written in Go: fast enough and extensible, suitable for tests over HTTP/2 and for cases where fast scripts are needed.

    Inside Yandex, a dedicated service is used to store and display the results of load tests. A simplified analogue of it, called Overload, is now open to the public: it is completely free and is used, among other things, for testing open-source libraries and for running competitions.

    Taurus


    Taurus is another framework on top of several load testing utilities. You may like this product for its approach, which is similar to Yandex.Tank's, but with a slightly different set of features and, perhaps, a more sensible configuration file format.

    In general, Taurus is a good fit when, say, Gatling's power is needed to create the test, but there is no desire or opportunity to deal with Gatling itself (or to write test scenarios in Scala): it is enough to describe the test in the much simpler Taurus file format and configure it to use Gatling as the load-generation engine, and all the Scala files will be generated automatically. "Automating the automation", so to speak!

    Taurus can be configured to send test statistics to the online service BlazeMeter.com, which displays the data as neat charts and tables. The approach is somewhat unusual, but worth attention: the reporting engine is clearly being improved over time and will gradually present the information even better.

    Performance testing


    The performance of a service or application can and should be tested not only after development is finished, but also during it, in much the same way as we run regular unit or regression tests. Properly organized, regular performance tests answer a rather subtle question: have the latest changes to the application code degraded the performance of the resulting software?

    It would seem that measuring performance is simple! Take a timestamp twice (preferably with high precision), compute the difference, add things up, divide, and everything is ready to be optimized. Not at all! Although the question sounds simple in words, in reality such measurements are quite difficult to make, and comparing the results of different measurements is not always even reasonable. One reason: to compare results, the tests must run on the same initial data, which, among other things, implies rebuilding the test environment for every run; another reason is that comparing a subjective perception of how long a test scenario takes can be inaccurate.
    Yet another reason is the difficulty of isolating the impact that a single module, the one we are editing right now, has on the performance of the whole application. To make matters worse, isolating that influence is even harder when more than one developer is working on the code.
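
    The "simple" way of measuring looks roughly like the sketch below (the workload is a placeholder of our own); the trouble is that a single pair of timestamps describes one run, on one set of data, in one state of the environment, which is exactly why such numbers are hard to compare between runs:

        public class NaiveTiming {
            public static void main(String[] args) {
                long start = System.nanoTime();               // first timestamp
                doWork();                                     // the code whose performance we care about
                long elapsed = System.nanoTime() - start;     // second timestamp minus the first

                // One number for one run: it says nothing about variance, warm-up,
                // the state of caches, or what the rest of the machine was doing.
                System.out.printf("took %.3f ms%n", elapsed / 1_000_000.0);
            }

            private static void doWork() {
                // placeholder for the module under measurement
                long sum = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    sum += i;
                }
                if (sum == 42) System.out.println(sum);       // keeps the JIT from removing the loop entirely
            }
        }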

    One approach in this situation is to carefully build a full-fledged test scenario that replicates how a real client works with the service, and to run it many times while also analyzing the load on the server where the test is running (that way it becomes clear which part of the scenario loads which resources of the test server, which gives extra hints about where performance deserves a closer look). Alas, in a real situation you cannot always afford this, simply because such a big test, repeated 10-20 times, will most likely take too long to run very often, and that completely kills the idea.

    The second approach, better suited to the development process, is to organize limited-scale, "micro-" or even "nano-" testing of individual parts of the code (say, running one method or one function, but a large number of times; in other words, benchmarking). Planning such testing requires extra effort from the developer, but the result pays off both in an overall improvement of code performance and in an understanding of how individual parts of the project behave as work on them, and on other parts, progresses. Here, for example, are a couple of performance testing tools:

    JMH


    JMH (Java Microbenchmark Harness) is a Java harness for building, running and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM. A relatively young framework whose developers tried to take into account all the nuances of the JVM; one of the most convenient tools to have at hand. JMH supports the following types of measurements: Throughput (pure throughput), AverageTime (average run time), SampleTime (percentiles of run time) and SingleShotTime (the time of a single method call, relevant for measuring the "cold" start of the code under test).

    Since we are talking about Java, the framework also takes the JVM's caching and JIT machinery into account: before starting the benchmark, it executes the test code several times to "warm up" the Java machine.
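
    A minimal JMH benchmark that uses the measurement modes listed above might look like the sketch below (the class and the string-concatenation example are ours, purely for illustration); the warm-up iterations are exactly those "warm-up" runs performed before measurement begins:

        import java.util.concurrent.TimeUnit;
        import org.openjdk.jmh.annotations.*;

        @State(Scope.Thread)
        @BenchmarkMode({Mode.Throughput, Mode.AverageTime})   // two of the modes mentioned above
        @OutputTimeUnit(TimeUnit.MICROSECONDS)
        @Warmup(iterations = 5)                               // warm-up runs: JIT compilation, caches, etc.
        @Measurement(iterations = 10)                         // measured iterations
        @Fork(1)                                              // run in a separate, freshly started JVM
        public class StringConcatBenchmark {

            @Benchmark
            public String plusConcat() {
                String s = "";
                for (int i = 0; i < 100; i++) {
                    s += i;
                }
                return s;          // returning the value keeps dead-code elimination from skewing the result
            }

            @Benchmark
            public String builderConcat() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 100; i++) {
                    sb.append(i);
                }
                return sb.toString();
            }
        }

    Such a class is usually built with the JMH Maven archetype or Gradle plugin and then run like a regular application; JMH takes care of forking the JVM, warming it up and collecting the statistics.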

    BenchmarkDotNet


    BenchmarkDotNet takes on the routine of building benchmarks for .NET projects and provides ample opportunities for formatting the results with minimal effort. According to the authors, there is no shortage of feature requests, so BenchmarkDotNet still has plenty of room to grow.

    Today BenchmarkDotNet is primarily a library for benchmarks rather than for performance tests. Serious work is under way so that the library can also be used on a CI server to automatically detect performance regressions, but that work has not yet been completed.

    Google Lighthouse


    Measuring frontend performance has always been a little different: on the one hand, delays are often caused by how quickly the backend reacts; on the other hand, users often judge the whole application precisely by the behavior of the frontend (more precisely, by how quickly it responds), especially when it comes to the web.

    On the web frontend, performance measurement is now moving towards using the Performance API and measuring exactly those parameters that matter for a particular project. The webpagetest.org service, with its support for Performance API marks and measurements, is a great help here: it lets you see the picture not from your own computer but from one of many test locations around the world, and evaluate how the time spent receiving and transmitting data over Internet channels affects the frontend.

    This product would be better described as a tool for checking site pages against Google's recommendations (and best practices in general), both for ordinary websites and for Progressive Web Apps, if not for one of its features: among the checks there is also a test of how the site behaves on a poor-quality connection, as well as with no connection at all. That does not quite count as performance testing as such, but if you think about it, in some cases a web application is perceived as "slow" not because it prepares data slowly, but because the conditions it runs in on the user's machine, in their browser, with their Internet connection, are, alas, far from perfect. Google Lighthouse lets you evaluate exactly this impact.



    Yes, the topic of benchmarks and testing is simply endless. Each of these tools deserves a post of its own, and more than one. However, as you and I know, the most interesting thing is not just to read, but to talk, listen and ask questions of a knowledgeable person who, thanks to their experience, will warn you in advance about the many small and large difficulties lying on the way to mastering a particular technology.

    Therefore, we are pleased to invite you to the Heisenbug 2017 Moscow conference, which will be held on December 8-9, 2017, where, in particular, the following talks will be presented:


    Details and conditions of participation can be found on the conference website.
