QA on prod: why it's cool

    Many consider testing in production a harmful practice: it does not prevent problems from reaching end users, it only confirms that they are there. In addition, the tester is cut off from the standard workflow and techniques used in the test environment. My name is Olya Mikhalchuk, and I am a QA engineer at the fintech company ID Finance. In this post I will explain why testing on prod can significantly help your project.


    Why you need QA on prod if there is a pre-production environment

    During software development there are always several environments to which the application is deployed. The environment used by end users is, as you know, called production. It is usually assumed that testing should be done in a separate environment, most often a QA environment or Staging (pre-prod), to keep errors from reaching users. But there is also a technique called QA on prod, which is very effective at solving problems that are physically impossible to solve in a test environment.


    What problems QA on prod helps solve

    1. Differences between the Staging and Production environments.

    Staging is often considered a copy of the production environment: it is inaccessible to end users but as close as possible to the live environment. When the application is fairly complex, synchronizing and maintaining such a mini-copy becomes time-consuming and is not always a rational use of effort.

    For example, on our project pre-prod is used mainly for functional testing against manually prepared test scenarios. It does not have technical resources comparable to those of the production environment. We also usually do not fully synchronize its configurations and databases with production, which does not get in the way of functional tests. Why don't we simply copy the prod environment? Imagine how many resources it would take to create a copy of, say, Facebook, with the same super-powerful servers, services, databases and configurations as in production. It would effectively mean deploying a second application of the same scale.

    In addition, when integrating with third-party services, the test and live environments always have different settings (the API, for instance). I am not saying that test and staging environments are pointless. It is just impossible to guarantee 100% that if certain tests pass in one environment, services will not fail in another. Additional testing in production helps address this problem.


    2. Real levels of concurrency and load.

    Some errors only show up under a sustained, realistic level of concurrency and load. This applies to memory leaks and to system speed and stability. For example, we once had a performance problem caused by two resource-intensive tasks running in the same time window. The developers optimized both tasks, the team tested them on pre-prod, delivered the changes, and then ran a check on production.
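
    The incident above came down to two heavy jobs sharing a time window. Once the real run times are known (for example from prod logs or the scheduler), a simple check can flag such collisions in advance. This is a minimal sketch: the `JobWindow` model, the `heavy` flag and the job names are hypothetical, and a real schedule would come from your own scheduler.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class JobWindow:
    """Time window in which a scheduled job runs (hypothetical model)."""
    name: str
    start: datetime
    end: datetime
    heavy: bool  # True if the job is resource-intensive


def overlaps(a: JobWindow, b: JobWindow) -> bool:
    # Half-open intervals [start, end) overlap if each starts before the other ends.
    return a.start < b.end and b.start < a.end


def heavy_conflicts(jobs: list[JobWindow]) -> list[tuple[str, str]]:
    """Return pairs of resource-intensive jobs whose run windows overlap."""
    heavy_jobs = [j for j in jobs if j.heavy]
    return [
        (a.name, b.name)
        for a, b in combinations(heavy_jobs, 2)
        if overlaps(a, b)
    ]


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    jobs = [
        JobWindow("report_export", day.replace(hour=2), day.replace(hour=3), True),
        JobWindow("score_recalc", day.replace(hour=2, minute=30), day.replace(hour=4), True),
        JobWindow("cache_warmup", day.replace(hour=2), day.replace(hour=2, minute=10), False),
    ]
    print(heavy_conflicts(jobs))  # [('report_export', 'score_recalc')]
```

    Light jobs are ignored on purpose, so the check only reports the collisions that can actually degrade performance.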

    3. Deployment errors

    By definition, deployment is the installation of a new version of a service's code into the production infrastructure by the team responsible for it. Accordingly, the best way to catch deployment errors is to test during the deployment itself.
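
    One inexpensive way to test during the deployment itself is an automated smoke check that runs right after the rollout. The sketch below assumes the service exposes hypothetical `/health` and `/version` endpoints that return plain text; the `fetch` parameter is injectable, so the same check can be exercised without a live service.

```python
import urllib.error
import urllib.request
from typing import Callable

# A fetcher takes a URL and returns (HTTP status, response body).
Fetcher = Callable[[str], tuple[int, str]]


def http_fetch(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Fetch a URL with the standard library; HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


def smoke_check(base_url: str, expected_version: str,
                fetch: Fetcher = http_fetch) -> list[str]:
    """Run basic post-deploy checks and return a list of failure messages."""
    failures = []
    try:
        status, _ = fetch(f"{base_url}/health")
        if status != 200:
            failures.append(f"/health returned {status}")
        status, body = fetch(f"{base_url}/version")
        if status != 200 or body.strip() != expected_version:
            failures.append(f"/version is {body.strip()!r}, expected {expected_version!r}")
    except OSError as err:  # connection refused, DNS failure, timeout, ...
        failures.append(f"service unreachable: {err}")
    return failures
```

    Calling, say, `smoke_check("https://api.example.com", "2.4.0")` right after the rollout gives immediate feedback on whether the new version actually went live; an empty list means both checks passed.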

    4. Lack of monitoring on pre-prod

    One of the best, and indispensable, ways to make sure the application works as expected is to monitor certain metrics. The simplest and most critical examples are the number of new user registrations per hour, the conversion rate from one target action to the next, and the number of loans issued. Of course, such monitoring only makes sense in the live environment.
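
    To illustrate, the first two metrics above can be computed from raw events as follows. The input format (a list of registration timestamps, and sets of user IDs per funnel step) is an assumption; in practice these numbers usually come from a metrics or BI system rather than hand-written code.

```python
from collections import Counter
from datetime import datetime


def registrations_per_hour(timestamps: list[datetime]) -> Counter:
    """Count registration events per clock hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def conversion_rate(users_at_step_a: set[str], users_at_step_b: set[str]) -> float:
    """Share of users who reached step A and also reached step B
    (for example: submitted an application and then received a loan)."""
    if not users_at_step_a:
        return 0.0  # avoid division by zero when nobody reached step A
    return len(users_at_step_a & users_at_step_b) / len(users_at_step_a)
```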

    5. The ability to analyze how end users actually use the system

    Production is a treasure trove of test cases for the tester. Where possible, the tester can observe and analyze the scenarios end users follow, identify the most critical ones, find the cause of a defect, or spot non-trivial cases worth covering when testing on pre-prod.
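
    A simple way to mine production for test scenarios is to group user events into per-session paths and count the most common ones. The sketch assumes events are available as `(session_id, timestamp, action)` tuples; this format and the action names in it are hypothetical.

```python
from collections import Counter, defaultdict


def top_scenarios(events: list[tuple[str, int, str]],
                  n: int = 3) -> list[tuple[tuple[str, ...], int]]:
    """Group (session_id, timestamp, action) events into per-session action
    sequences and return the n most common sequences with their counts."""
    sessions: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for session_id, ts, action in events:
        sessions[session_id].append((ts, action))
    # Order each session's steps by timestamp, then count identical paths.
    paths = Counter(
        tuple(action for _, action in sorted(steps))
        for steps in sessions.values()
    )
    return paths.most_common(n)
```

    The most frequent paths are natural candidates for regression scenarios, while rare ones point to the non-trivial cases mentioned above.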

    6. More reliable statistics and software quality metrics.

    For example, the number of errors in an application's or component's logs, bug reports, and other reports prepared by a prod tester reflect software quality more realistically than the same reports from the test environment.
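
    As an example of such a metric, error counts per component can be pulled straight from application logs. The log line format `<timestamp> <LEVEL> [<component>] <message>` is an assumption made for this sketch; adjust the regular expression to your own format.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> <LEVEL> [<component>] <message>"
LOG_LINE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]")


def errors_per_component(lines: list[str]) -> Counter:
    """Count ERROR-level log lines per component; unparsable lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == "ERROR":
            counts[m.group("component")] += 1
    return counts
```

    Tracked over time, these counts give a trend that is grounded in real traffic rather than in test data.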

    7. It is always better for a bug on prod to be noticed by “your” tester than by an end user.

    Usually, after a task is delivered, the tester runs basic checks of the new or changed functionality on prod. In addition, our company has a dedicated person for this: a prod tester. I want to stress once again that I do not position QA on prod as a substitute for testing on pre-production; of course, bugs still need to be prevented and preventive measures taken. But such testing can be a great additional technique for ensuring quality on your project.


    Useful QA practices in production that work well on our project

    1. Checking delivered tasks to make sure they are deployed correctly and work in the new environment.

    For example, when we integrate with a new partner, in addition to testing on pre-prod we always check the integration after delivery, because many settings depend on the environment (APIs, URLs, components). There are also third-party issues: errors that are not on our side but on the side of the integrated services.
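
    Part of that post-delivery check can be automated by validating the environment-specific settings themselves. This is a minimal sketch under stated assumptions: the required keys and the "non-prod" hostname markers are hypothetical, and the substring match is only a heuristic.

```python
from urllib.parse import urlparse

# Hypothetical settings every partner integration must define.
REQUIRED_KEYS = ("api_url", "api_key", "callback_url")
# Hostname fragments that suggest a test or sandbox endpoint (an assumption).
NON_PROD_MARKERS = ("sandbox", "staging", "test", "localhost")


def check_prod_config(config: dict[str, str]) -> list[str]:
    """Return problems found in a partner integration config meant for production."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not config.get(key)]
    for key in ("api_url", "callback_url"):
        url = config.get(key)
        if not url:
            continue  # already reported as missing
        parsed = urlparse(url)
        host = parsed.hostname or ""
        if parsed.scheme != "https":
            problems.append(f"{key} is not HTTPS: {url}")
        if any(marker in host for marker in NON_PROD_MARKERS):
            problems.append(f"{key} looks like a non-prod host: {host}")
    return problems
```

    Such a check catches the classic mistake of a sandbox URL leaking into the production config, but it cannot detect problems on the partner's side, which is why a live check after delivery is still needed.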

    2. Logging and audit.

    Good logging helps developers and testers notice a problem before the end user even suspects it, and also points to places that need optimization. An audit trail of actions and changes lets us always find out why the system behaved in a particular way. For example, if a credit policy component cannot make a decision on a loan, the logs are the first place we look to analyze why. This item applies to both production and pre-production environments.
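
    Structured, machine-readable log records make this kind of investigation faster. Below is a minimal sketch using Python's standard `logging` and `json` modules; the logger name, the field names and the choice to log an undecided application as a warning are illustrative assumptions, not an actual schema.

```python
import json
import logging
from typing import Optional

# Hypothetical logger name for the credit policy component.
logger = logging.getLogger("credit_policy")


def log_decision(application_id: str, decision: Optional[str], reasons: list[str]) -> str:
    """Write one structured audit record per credit decision attempt
    and return the serialized record."""
    record = {
        "event": "credit_decision",
        "application_id": application_id,
        "decision": decision,  # None when the policy could not decide
        "reasons": reasons,    # e.g. which rule or data source failed
    }
    line = json.dumps(record)
    level = logging.INFO if decision is not None else logging.WARNING
    logger.log(level, line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(name)s %(message)s")
    log_decision("APP-123", None, ["credit bureau timeout"])
```

    Because every record is valid JSON, the same logs can be filtered by `application_id` during an investigation and aggregated into metrics later.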

    3. Monitoring and alert system

    As I mentioned above, monitoring certain metrics is one of the best ways to make sure everything is OK with the application. Moreover, when a problem occurs, an alert must be sent to the interested parties (for example, if the number of loan applications is 20% below expected, we send an alert to the IT and business departments; if CPU load is above normal, we notify administrators and DevOps engineers). Alerts must be timely and relevant, and they must point to a real problem.
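
    The two example rules above can be expressed as a small evaluation function. This is a sketch only: the recipient group names and the 85% CPU limit are assumptions, and actually delivering the alert (email, chat, pager) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    recipients: tuple[str, ...]
    message: str


def evaluate_alerts(applications: int, expected_applications: int,
                    cpu_percent: float, cpu_limit: float = 85.0) -> list[Alert]:
    """Apply the two example rules from the text and return the alerts to send."""
    alerts = []
    # Rule 1: loan applications at least 20% below the expected number.
    if expected_applications > 0 and applications <= 0.8 * expected_applications:
        drop = 1 - applications / expected_applications
        alerts.append(Alert(("it", "business"),
                            f"loan applications {drop:.0%} below expected"))
    # Rule 2: CPU load above the (assumed) normal level.
    if cpu_percent > cpu_limit:
        alerts.append(Alert(("admins", "devops"),
                            f"CPU load {cpu_percent:.0f}% exceeds {cpu_limit:.0f}%"))
    return alerts
```

    Keeping each rule tied to a specific audience is what keeps alerts relevant: business teams are not paged about CPU spikes, and administrators are not flooded with conversion dips.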

    4. Regression and stability testing.

    A great practice is to periodically run regression tests to make sure nothing has gone wrong anywhere. This can help in narrow, specific cases where monitoring does not reveal a problem.

    5. Reporting and statistics

    As in any testing, reporting and statistics on the results of prod testing make the process more transparent and make software quality and the causes of defects more visible.

    Not all errors can be caught on pre-prod, so some will inevitably reach the live environment. If users find them, the company's reputation suffers and, ultimately, it loses money. Testing on prod helps prevent this.
