How we test search at Yandex. Screenshot-based testing of result blocks
The larger and more complex a service becomes, the more time you have to devote to testing it. So the desire to automate and formalize this process is entirely natural.
Most often, Selenium WebDriver is used to automate the testing of web services, and as a rule it is used to write functional tests. But, as everyone knows, functional tests cannot solve the problem of testing a service's layout, which still requires additional manual, often cross-browser, checks. How can a test evaluate layout correctness? To detect layout regressions, the test needs a reference, which can be an image of the correct layout taken, for example, from the production version of the service. This approach is called screenshot-based testing. It is used rarely, and most often layout is still tested manually. The reason is a set of rather strict requirements on the service, the test execution environment, and the tests themselves.
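To make the idea concrete, here is what the core of the approach can look like. This is a minimal sketch, assuming the Python Selenium bindings and the Pillow imaging library; the URLs are placeholders, and the strict pixel-by-pixel check is deliberately naive (we relax it below):

```python
# A minimal sketch of screenshot-based testing: render the same page on beta
# and on production, then compare the two captures pixel by pixel.
import io

from PIL import Image, ImageChops  # Pillow
from selenium import webdriver

def capture(driver, url):
    """Open the page and return its screenshot as an RGB Pillow image."""
    driver.get(url)
    png = driver.get_screenshot_as_png()
    return Image.open(io.BytesIO(png)).convert("RGB")

driver = webdriver.Firefox()
try:
    beta = capture(driver, "https://beta.example.com/search?text=query")  # placeholder URL
    prod = capture(driver, "https://example.com/search?text=query")       # placeholder URL
finally:
    driver.quit()

# getbbox() returns None when the difference image is completely black,
# i.e. when the two screenshots match exactly.
identical = (beta.size == prod.size
             and ImageChops.difference(beta, prod).getbbox() is None)
print("ok" if identical else "layout differs")
```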
The enriched answers of Yandex services shown in the search results (by old internal tradition we call them "sorcerers") are one more link in which something can break.
Using the testing of sorcerers in search as an example, we will describe what properties a service needs for this approach, what problems arise in screenshot-based testing, and how we solve them.
Testing sorcerers in search
Sorcerer checks take up most of the time allotted for regression testing of desktop search. It is important to make sure that sorcerers are displayed correctly in all major browsers (Firefox, Chrome, Opera, IE9+). However good the functional tests we wrote, we could not significantly reduce the regression time with them. Fortunately, thanks to a few properties, sorcerers are well suited to screenshot-based testing:
- A sorcerer is a fairly isolated piece of page functionality; it depends only weakly on neighboring elements.
- Most sorcerers are static.
- Changes to sorcerers are made relatively rarely, so in most cases you can use the production version of the search as a reference.
For testing to be effective, the Selenium Grid should provide as many browsers of different types and versions as possible: the benefit of each test is multiplied by the number of browsers it runs in. Creating screenshot-based tests takes a lot of time and resources, so you should try to use them as efficiently as possible; otherwise the gain over manual testing may be very small. For our test automation needs, we have deployed a Selenium Grid that provides thousands of browsers of the types we need.
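As an illustration of fanning one scenario out across the grid, here is a sketch assuming the Python Selenium 4 bindings; the hub address is a made-up placeholder, and Opera is omitted for brevity:

```python
# Sketch: run the same scenario against a Selenium Grid hub in several
# browser types, so each test pays off in every browser it covers.
from selenium import webdriver

GRID_HUB = "http://selenium-grid.example.com:4444/wd/hub"  # hypothetical hub

def browser_matrix():
    """One options object per browser family the test must cover."""
    yield webdriver.FirefoxOptions()
    yield webdriver.ChromeOptions()
    yield webdriver.IeOptions()

def run_everywhere(scenario):
    """Multiply the benefit of one test by the number of browsers."""
    for options in browser_matrix():
        driver = webdriver.Remote(command_executor=GRID_HUB, options=options)
        try:
            scenario(driver)  # e.g. a function that drives one sorcerer
        finally:
            driver.quit()
```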
Another problem that should be considered up front is the stability of the service as a whole. When a service lives and develops rapidly (its design and functionality change significantly), fighting the resulting noise will require constant maintenance and may not pay off from one release of the service to the next. As noted above, sorcerers are fairly stable.
So, we want to test sorcerers with screenshots while emulating user actions: clicking active elements, entering text into input fields, switching tabs, and so on. But besides the sorcerer itself, the page contains other, often non-static, elements: snippets, ads, blocks of the vertical searches. In the vast majority of cases the beta and production search pages have visible differences, so comparing whole pages is completely pointless. Yet none of these elements affects the functionality of the sorcerer in any way. We could have hidden individual page elements, but since there are far too many of them, we decided to hide everything on the page except the sorcerer under test, using JavaScript. This also brings indirect benefits: the page is "compressed", so the screenshot is taken and transmitted over the network faster and takes up less memory.
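A sketch of that hiding step, with a hypothetical CSS selector for the sorcerer; the JavaScript runs inside the page via execute_script:

```python
# Hide every element that neither contains nor belongs to the block under
# test, so volatile neighbours (snippets, ads) cannot affect the comparison.
HIDE_ALL_BUT = """
var keep = document.querySelector(arguments[0]);
var all = document.getElementsByTagName('*');
for (var i = 0; i < all.length; i++) {
    var el = all[i];
    if (el !== keep && !el.contains(keep) && !keep.contains(el)) {
        el.style.visibility = 'hidden';
    }
}
"""

def isolate(driver, selector):
    """Leave only the tested block visible on the page."""
    driver.execute_script(HIDE_ALL_BUT, selector)

# isolate(driver, ".serp-item .wizard")  # the selector is a placeholder
```

Using visibility rather than display: none keeps the page geometry intact, so the tested block itself does not shift when its neighbours disappear.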
But even screenshots that looked identical to the human eye turned out to differ under pixel-by-pixel comparison. Without going into the reasons for this browser behavior, we introduced an experimentally chosen threshold for differences in the RGB channels, so that the comparison fires only on visual differences that the human eye can actually see.
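A sketch of such a tolerant comparison, again assuming Pillow; the threshold value here is an arbitrary illustration, not the one we actually tuned:

```python
from PIL import ImageChops

CHANNEL_THRESHOLD = 16  # arbitrary illustrative value

def visually_different(a, b):
    """True only when some pixel deviates by more than the threshold in at
    least one RGB channel; sub-threshold rendering noise between browsers
    is ignored."""
    if a.size != b.size:
        return True
    diff = ImageChops.difference(a.convert("RGB"), b.convert("RGB"))
    # getextrema() returns a (min, max) pair per channel of the diff image.
    return any(channel_max > CHANNEL_THRESHOLD
               for _, channel_max in diff.getextrema())
```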
On the way to full cross-browser testing we solved many problems, caused primarily by peculiarities of OperaDriver and IEDriver (a description of which is beyond the scope of this article).
But despite all our efforts, in a significant share of cases the tests produced false positives for random reasons: network lag, delays in JavaScript and AJAX execution. Similar errors occur with functional tests too, but in screenshot-based tests their influence is greater: if a functional test checks element A and a problem occurs in element B, a false positive may well not occur, which cannot be said of a screenshot-based test.
Here is an example. When you select another cocktail in the "bartender" sorcerer, the new recipe is not drawn instantly: it takes time to receive data over the network via AJAX and to redraw the sorcerer's elements with JavaScript. As a result, the script sometimes failed to bring the sorcerer to the desired state on beta, whereas in production the same script ran without problems, so the two screenshots looked different.
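Besides the re-running described next, this class of flakiness can also be reduced at the source with explicit waits. A sketch with locators invented for the bartender example:

```python
# Instead of shooting immediately after the click, wait until the AJAX
# request has completed and the new recipe is actually rendered. The
# locators are invented for illustration.
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def select_cocktail_and_shoot(driver, name):
    driver.find_element(By.LINK_TEXT, name).click()
    # Block for up to 10 seconds until the redrawn recipe becomes visible.
    WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, ".wizard .recipe"))
    )
    return driver.get_screenshot_as_png()
```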
To rule out randomness, we re-run the tests several times until we are convinced that the problem reproduces consistently. This leads to another requirement for the Selenium Grid: it must have many browsers of each type, because only parallel runs keep the test duration acceptable. In our case, more than three hours of sequential execution turned into 12-15 minutes after parallelization. We also recommend splitting long scenarios into independent short ones: the probability of random failures decreases, and the report becomes easier to analyze.
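The re-run logic itself is tiny. A sketch, where run_and_compare is a hypothetical function that executes one scenario and reports whether the screenshots differed:

```python
# Report a difference only if it survives several consecutive re-runs;
# one-off network or timing glitches will not reproduce every time.
def confirmed_difference(run_and_compare, attempts=3):
    """True only when every attempt shows the same difference."""
    return all(run_and_compare() for _ in range(attempts))
```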
The report itself also has special requirements: when a test returns many screenshots, it is important to present them well. Endless clicking through report subpages would take almost as much time as checking the service manually. There is no universal recipe for a report; we settled on the following:
- The report consists of a single HTML page.
- Sorcerers are grouped into blocks. The contents of a block can be collapsed. Blocks with errors come first.
- Inside a sorcerer's block, its scripts are displayed; successful scripts are collapsed.
- Script logs are available, showing which elements were interacted with, so that the problem can be reproduced.
- When you hover the mouse over a screenshot, the images from beta and production are shown alternately, so that a person can quickly spot the difference (a sketch of this trick follows the list).
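The image swap from the last item needs only a couple of lines in the generated page. A hypothetical sketch of the report generator's template; the file names are placeholders:

```python
# Swap the production screenshot in on hover and the beta one back on
# mouse-out, so a difference "blinks" under the cursor.
SWAP_IMG = (
    '<img src="{beta}" '
    'onmouseover="this.src=\'{prod}\'" '
    'onmouseout="this.src=\'{beta}\'">'
)

html = SWAP_IMG.format(beta="bartender_beta.png", prod="bartender_prod.png")
```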
Only the expected outcomes of a test make it into the report: success, differences detected, or failure to execute the script. Any test errors that prevent reaching one of these outcomes must be eliminated.
Examples of bugs found
By comparing images of the sorcerers' states, we were able to detect bugs of various kinds (in each pair below, the first screenshot is beta and the second is production):
- Shifted text (postal code sorcerer);
- Changed image scaling (event poster sorcerer);
- A CSS regression (a black border on the input fields of the math sorcerer);
- A data regression (translation sorcerer).
As a bonus, we gained the ability to spot changes in translations. The search is available in Russian, Ukrainian, Belarusian, Kazakh, Tatar, English, and Turkish. Tracking the correctness of all the versions is very difficult, but in screenshots the differences in the texts are immediately visible.
So screenshot-based testing can be very useful. But be careful in your assessment: not every service lends itself to this approach, and the effort can be wasted. If you manage to find suitable functionality, there is every chance of reducing the time spent on manual testing.
On November 30 in St. Petersburg we will hold "Test Environment", our first event specifically for testers. There we will talk about how our testing works, what we have done to automate it, how we work with errors, data, and graphs, and much more. Participation is free, but there are only 100 places, so make sure to register in time.