Creating a tool for quickly and efficiently writing autotests on Selenium

"Testing is a fundamental building block of automation." — Rod Johnson

I am no evangelist for web-interface testing, and this essay will be most useful to colleagues who already have experience in this field.

It will also be useful for beginners, as I provide the source code, where you can see how the interaction with Selenium is organized in the final product.

I will talk about how, from scratch and with little development experience, I wrote a platform for running tests, and about the platform itself. I believe my product turned out very effective, which means it should be useful to many and is worth considering.

About the concept

The testing process depends on the information system.

To understand my concept, you need to understand which systems I focus on first of all: systems with specific, linear business processes that are treated as the key ones during regression testing.

So, a system like an SRM. The key business entity is the supplier's offer. The key consideration in regression testing is the integrity of the business process.
The business process starts with the supplier registering in the system; then a commercial proposal is created and goes to the review stage, which is carried out by various types of internal users (each with a unique interface), until the decision on the proposal is returned to the supplier.

It turns out that we pass through a number of different interfaces, almost always working with different ones. In fact, if you look at it directly, it resembles watching a video tape: a process with a beginning and an end, absolutely linear - no branching - and when writing a test we always know the expected result. In other words, looking at this picture we can already conclude that making the tests polymorphic is unlikely to succeed. Given that, when creating the platform for running tests I made the speed of writing tests the key factor.

The concepts that I set for myself:

  1. An autotest should be created as quickly as possible. If you achieve this properly, other aspects, such as reliability and ease of use, follow by themselves.
  2. Tests must be described declaratively and live separately from the code. I did not even see another option. This increases writing speed: once you have a ready-made interpreter - our platform - you do not need to add anything later or dig into the code yet again; you can basically forget about the IDE once the platform is finished. Tests are easier to maintain this way, and easier to learn to write, because development skills are not needed - only an understanding of the markup language. In this form they are understandable to all participants in the process.

What I decided to refuse at the start:

  1. Do NOT wrap the system in a test framework. You can run a process without one. "You want to reinvent the wheel!" many will say. I think differently: the popular test frameworks were created primarily for testing code from the inside, while we are going to test the external part of the system from the outside. It is as if I had a road bike and needed to go down a mountain off-road (crude, but it reflects the train of thought). So we will write the framework ourselves - with blackjack and... (although I am aware that, for example, JUnit 5 is already far better adapted for such tasks).
  2. No wrappers for Selenium. The key library itself is small. It takes just a few hours to go through it fully and realize you need about 5 percent of its functionality. Stop looking everywhere for a way to write less code. In the modern world this desire often leads to absurdity and almost always damages flexibility (I mean the "write less code" approaches, not architectural frameworks).
  3. A beautiful presentation of the results is unnecessary. I added this item because I have run into it more than once. When an autotest completes, I need to know 2 things: the overall result (positive/negative) and, if there was an error, where exactly. Perhaps you also need to keep statistics. Everything else about the results is ABSOLUTELY non-essential. Treating beautiful design as a significant plus, or spending time on it in the early stages, is superfluous showing off.

I'll say a little more about the level of development in the company and the conditions under which the tool was created, to clarify some details.

For confidentiality reasons, I do not disclose the company where I work.

Development has existed in our company for many years, so all processes were established long ago. However, they lag far behind current trends.
All IT people understand that you need to cover code with tests, that autotest scripts should be written while the requirements for future functionality are being agreed, that agile practices save significant time and resources, and that CI simply makes life easier. But all of this reaches us only slowly...

The same goes for the software quality-control service: all tests are performed manually, and if you look at the process "from above", this is the bottleneck of the entire development process.

Assembly description

The platform is written in Java using JDK 12.

The main infrastructure tools - Selenium Web Driver, OJDBC.

For the application to work on a PC, Firefox version 52 must be installed.

Application Build Composition


The application always ships with 3 folders and 2 files:

BuildKit folder - contains:

  • jdk12, through which the application is launched (JVM);
  • geckodriver.exe - for Selenium Web Driver to work with FireFox browser;
  • SprintAutoTest.jar - the application itself

Reports folder - reports are saved here after the application completes a test case. It must also contain the ErrorScreens folder, where a screenshot is saved in case of an error

TestSuite folder - web packages, JavaScript files, and the set of test cases (this folder is covered in detail separately)

• a file containing the config for connecting to the Oracle database and the explicit-wait values for WebDriverWait

• starter.bat - launches the application (automatic launch without manually entering the TestCase name is possible if you pass the TestCase as a parameter at the end).

Brief description of the application

The application can be launched with a parameter (the TestCase name) or without it - in the latter case you must enter the test case name in the console yourself.

An example of the general contents of the bat file for launching without a parameter: start "AutoTest launcher" %cd%\BuildKit\jdk-12\bin\java.exe -Xmx768M --enable-preview -jar %cd%\BuildKit\SprintAutoTest.jar

At general launch, the application looks at the xml files located in the "\TestSuite\TestCase" directory (without looking into subfolders). Primary validation of the xml files happens at this point - checking that the structure is correct (i.e., that all tags are specified correctly from the xml-markup point of view) - and the names from the "testCaseName" tag are collected, after which the user is prompted to enter one of the available test case names. If the input is wrong, the system asks for the name again.

Once the TestCase name is received, the internal model is built: a TestCase (test script) - WebPackage (element storage) pair in the form of Java objects. After the model is built, the TestCase itself (the executable object of the program) is constructed. At this construction stage secondary validation takes place: it checks that all forms specified in the TestCase exist in the associated WebPackage, and that all elements specified in actions exist in the WebPackage within the specified pages. (The structure of TestCase and WebPackage is described in detail below.)

After the TestCase is built, the script is run.

Script operation algorithm (key logic)

TestCase is a collection of Action entities, each of which is in turn a collection of Event entities:

TestCase -> List<Action> -> List<Event>

When a TestCase starts, its Actions run sequentially (each Action returns a boolean result).

When an Action starts, its Events run sequentially (each Event returns a boolean result).

The result of each Event is recorded.

Accordingly, the test ends either when all Actions have completed successfully or when an Action returns false.
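The loop just described can be sketched in Java (class names here are illustrative, not the author's actual ones):

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// An action is a list of events; execution stops at the first failing event.
class ActionSketch {
    final List<BooleanSupplier> events;
    ActionSketch(List<BooleanSupplier> events) { this.events = events; }

    boolean run() {
        for (BooleanSupplier event : events) {
            if (!event.getAsBoolean()) return false; // each event returns a boolean result
        }
        return true;
    }
}

// A test case is a list of actions; it ends when all actions pass
// or the first action returns false.
class TestCaseSketch {
    final List<ActionSketch> actions;
    TestCaseSketch(List<ActionSketch> actions) { this.actions = actions; }

    boolean run() {
        for (ActionSketch action : actions) {
            if (!action.run()) return false;
        }
        return true;
    }
}
```
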

* The failure-recovery mechanism

Because the system I test is ancient and has accumulated quirks and "bugs" that are not real errors - some events do not work on the first try - the platform has a mechanism that can depart from the strictly linear test concept described above (while remaining strictly defined). When such errors are caught, it can first perform additional actions and then repeat the failed ones.

When the application finishes, a report is generated and saved in the "\Reports" directory. In case of an error a screenshot is taken and saved in "\Reports\ErrorScreens".

TestSuite Filling

So, the description of a test. As already mentioned, the main parameter needed for a run is the name of the test to execute. This name is stored in an xml file in the "\TestSuite\TestCase" directory. All test scripts are stored there; there can be any number of them. The test case name is taken not from the file name but from the "testCaseName" tag inside the file.

The TestCase defines what exactly will be done - the actions. All locators are stored in xml files in the "\TestSuite\WebPackage" directory. So, all in the best traditions: actions are stored separately, web-form locators separately.

TestCase also stores the WebPackage name in the “webPackageName” tag.

The overall picture is now visible. To run, you need 2 xml files: a TestCase and a WebPackage. Together they form a pair. The WebPackage is independent; its identifier is the name in its "webPackageName" tag. Hence the first rule: TestCase and WebPackage names must be unique. Once again: our test is in fact a pair of TestCase and WebPackage files linked by the WebPackage name specified in the TestCase. In practice I automate one system, so I tie all my test cases to a single WebPackage containing descriptions of all the forms.

The next layer of logical decomposition is based on a pattern such as Page Object.

Page Object
Page Object is one of the most useful and widely used architectural solutions in automation. This design pattern helps encapsulate work with individual page elements. Page Object essentially models the pages of the application under test as objects.

Separation of logic and implementation

There is a big difference between testing logic (what to check) and its implementation (how to check). An example test scenario: "The user enters an incorrect username or password, presses the login button, and receives an error message." This describes the test's logic, while the implementation includes actions such as finding the input fields on the page, filling them in, checking for the error, etc. If, say, the way the error message is displayed changes, the test scenario is unaffected - you still need to enter incorrect data, press the login button and check for the error - but the implementation is directly affected: the method that receives and processes the error message must change. By separating a test's logic from its implementation, autotests become more flexible and, as a rule, easier to maintain.

*! I cannot claim that this architectural approach is applied here in full. It only concerns decomposing the test-scenario description page by page, which helps write tests faster, allows extra automatic checks on every page, encourages correct locator descriptions (so they are not identical across different pages) and builds a clean logical structure for the test. The platform itself is implemented on the principles of "Clean Architecture".

Next, the structure of WebPackage and TestCase. For them I created a DTD schema (for WebPackage) and an XSD 1.1 schema (for TestCase).

Maintaining these DTD and XSD schemas is what makes the quick-test-writing concept work.

When writing WebPackage and TestCase files, you should use an XML editor with built-in real-time DTD and XSD validation and tag auto-generation, which makes writing an autotest largely automated (all required tags are substituted automatically, drop-down lists of possible values are shown for attributes, and the tags appropriate to the event type are generated).

Once these schemas are "bolted" onto the xml file itself, you can forget about the correctness of the file's structure - provided you use a suitable environment. My choice fell on oXygen XML Editor. Once again: without such a program you will not appreciate the writing speed. IDEA is not well suited for this: it does not handle the XSD 1.1 "alternative" construct, which is key for TestCase.


WebPackage - an xml file describing the elements of web forms, located in the "\TestSuite\WebPackage" directory (there can be any number of files; file names do not matter - only the content does).

DTD (inserted at the beginning of the document):


In general, it looks approximately like this (the tag names are reconstructed approximately from the description below):

<webPackage>
    <form name="пустая_форма_авторизации_при_открытии_тестового_приложения">
        <element type="2">
            <name>наименование_элемента</name>
            <locator>.//div/form/div/div/form/table/tbody/tr/td[text()="Логин"]/following-sibling::td/input</locator>
        </element>
        <element type="2">
            <name>наименование_другого_элемента</name>
            <locator>.//div/form/div/div/form/table/tbody/tr/td[text()="Логин"]/following-sibling::td/input</locator>
        </element>
        .......
    </form>
</webPackage>

As already mentioned, so that the elements do not end up in one heap, everything is decomposed by web form. The key entity is the element.

The element tag has 2 attributes:

  • type
  • alwaysVisible

The type attribute is required and specifies the element's type. In the platform it is stored as a byte.

At the moment, I have implemented the following types for my own needs:

  • 0 - no functional meaning, usually some kind of label
  • 1 - button
  • 2 - input field
  • 3 - checkbox
  • 4 - drop-down list (select) - not actually implemented yet, but a place is reserved for it
  • 5 - the SRM drop-down list: type the name, wait for the value to appear, select it by a specific XPath template - a type specific to my system
  • 6 - SRM select - used in typical functions such as search, etc. - a type specific to my system

The alwaysVisible attribute is optional. It shows whether the element is always present on the form, and can be used during the initial/final validation of an Action (i.e., automatically check that when the form opens it contains all elements that are always on it, and that when the form closes all those elements have disappeared).

Possible values:

  • 0 - by default (or if the attribute is not set) - the element may not be on the page (do not validate)
  • 1 - the element is always present on the page

The locator tag has an optional type attribute.

Possible values:

  • 1 - search for the element by id (accordingly, specify only the id in the locator)
  • 2 - default (or if the attribute is not set) - search by XPath. It is recommended to use only XPath search, because this method combines almost all the advantages of the others and is universal.


TestCase - the xml file that directly describes the test script; it is located in the "\TestSuite\TestCase" directory (there can be any number of files; file names do not matter - only the content does).

The XSD schema:

General form (the tag names are reconstructed approximately from the description below):

<testCase>
    <testCaseName>уникальное_наименование_testCase</testCaseName>
    <webPackageName>название_webPackage_из_webPackageName</webPackageName>
    <actions>
        <action>
            <name>Вход через пустую форму авторизации в интерфейс КМа для инициализации приложения</name>
            <orderNumber>10</orderNumber>
            <runConfiguration formName="пустая_форма_авторизации_при_открытии_тестового_приложения">
                <events>
                    <event type="goToURL">&srmURL;</event>
                </events>
            </runConfiguration>
        </action>
    </actions>
</testCase>

This line shows how to attach the XSD schema so that the XML editor sees it:

In TestCase I also use DTD entities, which are stored separately in the same directory in a file with the .dtd extension. I keep almost all data - the constants - there. I also built the logic so that to launch a new test, with new unique entities created and registered throughout the test, it is enough to change 1 digit in this file.

Its structure is very simple - I will give an example:
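A generic sketch of such an entity file (all names and values here are illustrative; only srmURL is referenced elsewhere in the article):

```xml
<!-- constants.dtd - values are illustrative -->
<!ENTITY testRunNumber "7">
<!ENTITY srmURL "http://test-host/srm/login">
<!ENTITY supplierName "Поставщик_&testRunNumber;">
```

In a TestCase a constant is then referenced as, e.g., &srmURL; inside a tag value. Since entities can reference other entities, bumping testRunNumber by one is enough to make every derived name unique - the "change 1 digit" trick mentioned above.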

Such constants are inserted into a tag's value using the usual entity-reference syntax, and several of them can be combined in one value.

! Recommendation - while writing a testCase, declare these DTD entities inside the document itself, and only after everything works stably move them to a separate file. My xml editor has difficulties otherwise - it cannot find the DTD and ignores the XSD - hence the recommendation.


So, testCase. The outermost (root) tag contains:

  • testCaseName - the name of our test case, this parameter is passed to the application input
  • webPackageName - the name of WebPackage, which is specified in webPackageName (see paragraph above on WebPackage)
  • actions - action entity container



Each action contains:

  • name - it is recommended to specify the form name and the key actions - what and why
  • orderNumber - serial number - a parameter needed to sort actions (introduced because when parsing xml in Java with certain tools, parsing can run in a multi-threaded environment, so the order can get mixed up). When numbering you can leave gaps - only "greater/less" matters for sorting - so an action can later be inserted between existing ones without renumbering everything
  • runConfiguration - the actual description of what will happen within the action



The runConfiguration tag contains:

  • openValidation attribute - optional, default "0"
    • 0 - do not perform initial form validation
    • 1 - perform initial form validation
  • closeValidation attribute - optional, default "0"
    • 0 - do not perform final form validation
    • 1 - perform final form validation
  • formName - the name of the form within which the actions are performed - the formName value from the WebPackage
  • repeatsOnError - optional, how many retries to perform in case of failure
  • events - the container of event entities
  • exceptionBlock - optional - a container of event entities executed in case of an error


The event is the minimal structural unit - this entity describes the actions actually performed.

Each event is distinct and can have its own unique tags and attributes.

The base type contains:

  • type attribute - indicates the type of element
  • hasExceptionBlock attribute - optional, default "0" - needed to implement the failure-recovery mechanism; it indicates that an error may be expected on this event
    • 0 - no error expected
    • 1 - a possible error is expected on the event
  • invertResult attribute - optional, default "0" - indicates that the event's result must be inverted
    • 0 - leave the result of the event
    • 1 - change the result of the event to the opposite

*! The expected-error mechanism
Let me give a trivial example of where I first used it and what is needed to make it work.

Case: captcha input. So far I have not been able to automate the robot check - no one will write me a test captcha service (though honestly it would almost be easier to build a neural network for recognition). So we may make a mistake when entering the captcha. In that case I add a control event that checks that the "incorrect control code" notification element is absent, and I set the hasExceptionBlock attribute on it. On the action I allow several retries (5) beforehand, and after the events I add an exceptionBlock that presses the notification's close button, after which the action is repeated.

Examples from my context.

Here is how I registered the event:


And here exceptionBlock after events
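Based on the captcha description above, such an action looks roughly like this (tag nesting, event types and element names here are illustrative guesses, not taken verbatim from the project):

```xml
<!-- All names here are illustrative -->
<action>
    <name>Ввод капчи</name>
    <orderNumber>50</orderNumber>
    <runConfiguration formName="форма_капчи" repeatsOnError="5">
        <events>
            <event type="userInput">поле_капчи</event>
            <!-- control event: check that the "incorrect control code"
                 notification did NOT appear; an error is possible here -->
            <event type="checkElementsInVisibility" hasExceptionBlock="1">уведомление_о_неверном_коде</event>
        </events>
        <!-- executed only if the control event above fails:
             close the notification, then the action is repeated -->
        <exceptionBlock>
            <event type="clickElement">кнопка_закрытия_уведомления</event>
        </exceptionBlock>
    </runConfiguration>
</action>
```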


And yes, actions on one page can be decomposed into several actions.

+ For those who noticed the 2 parameters in the config - defaultTimeOutsForWebDriverWait and lowTimeOutsForWebDriverWait - here is why they exist. The whole web driver lives in a singleton, and I did not want to create a new WebDriverWait every time, so I keep one extra, fast one for error cases (if you set hasExceptionBlock="1", the explicit wait simply uses the shorter timeout). You must agree: waiting a whole minute just to make sure an error message did not appear is no better than creating a new WebDriverWait each time. Whichever way you look at this situation, it requires a crutch - this is the one I chose.
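The choice between the two waits can be sketched like this (the durations and the class name are illustrative; in the real platform two pre-built WebDriverWait instances presumably live in the Selenium singleton):

```java
import java.time.Duration;

// Models the two explicit-wait configs mentioned above:
// defaultTimeOutsForWebDriverWait and lowTimeOutsForWebDriverWait.
class WaitConfigSketch {
    static final Duration DEFAULT_WAIT = Duration.ofSeconds(60); // normal explicit wait
    static final Duration LOW_WAIT = Duration.ofSeconds(5);      // fast wait for error checks

    // An event flagged with hasExceptionBlock="1" uses the short wait,
    // so confirming that an error message did NOT appear is cheap.
    static Duration waitFor(boolean hasExceptionBlock) {
        return hasExceptionBlock ? LOW_WAIT : DEFAULT_WAIT;
    }
}
```
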

Event types

Here is a minimal set of my events - a kind of scout's kit - with which I can test almost everything in my system.

And now, on to the code, to understand what an event is and how it is built. The code essentially implements the framework. I have 2 classes - DataBaseWrapper and SeleniumWrapper - that describe interaction with the infrastructure components and reflect the platform's features. Here is the interface that SeleniumWrapper implements:

package logic.selenium;

import models.ElementWithStringValue;
import models.webpackage.Element;
import org.openqa.selenium.WebElement;

public interface SeleniumService {
    void initialization(boolean webDriverWait);
    void nacigateTo(String url);
    void refreshPage();
    boolean checkElementNotPresent(Element element);
    WebElement findSingleVisibleElement(Element element);
    WebElement findSingleElementInDOM(Element element);
    void enterSingleValuesToWebField(ElementWithStringValue element);
    void click(Element element);
    String getInputValue(Element element);
    Object jsReturnsValue(String jsFunction);
    //Actions actions
    void doubleClick(Element element);
    void moveMouseToElement(Element element);
    void pressKey(CharSequence charSequence);
    void getScreenShot(String storage);
}

It describes all the Selenium features used, with the platform's tricks layered on top - the main one being the enterSingleValuesToWebField method. Remember that in the WebPackage we specify the element type; this method defines how to react to each type when filling fields. You write it once and forget it. It is the first method you should adapt for yourself: for example, types 5 and 6 as they currently stand only suit my system. If your web application has, say, a filter that you use a lot and that is typical for it - but to use it you must first hover the mouse over a field, wait for some fields to appear, switch to one of them, wait for something there, then go elsewhere and type - you simply describe that mechanism once.

So, in the "package" package there is an abstract class describing the common behavior of an event. To create your own unique event, you create a new class and inherit from this abstract class - dataBaseService and seleniumService already come in the kit - and then you decide what data you need and what to do with it. Accordingly, after creating a new event you need to extend the TestCaseActionFactory factory class and, if possible, the XSD schema; if a new attribute is added, also modify the model itself. In practice it is very easy and fast.
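Sketched with stand-in types (none of these names are the project's real ones), adding a new event amounts to:

```java
// Stand-in for the project's abstract event class: the real one already
// has dataBaseService and seleniumService injected.
abstract class EventSketch {
    protected final Object seleniumService = null; // stub for the injected service
    abstract boolean execute(); // each event returns a boolean result
}

// A custom event: subclass, decide what data you need, implement execute().
class OpenUrlEventSketch extends EventSketch {
    private final String url;
    OpenUrlEventSketch(String url) { this.url = url; }

    @Override
    boolean execute() {
        // here you would call seleniumService methods (navigate, check, ...);
        // this stub only validates its input
        return url != null && !url.isEmpty();
    }
}
```

After this, the factory (TestCaseActionFactory in the real project) would map the new event's type name from the xml to this class.
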

So, a scout set.

goToURL - usually the first action - navigates to the specified link


fillingFields - fills in the specified elements

Special tags:

  • fields - container for field entities
    • field - contains the element tag
      • element - the name of the element from the WebPackage
      • value - the value to enter; it has an optional type attribute (if the element is a checkbox, one of the values "check" or "uncheck" is specified)

  • type attribute - specifies how the value is obtained; optional, default "1"
    • 1 - the specified value is taken as-is
    • 2 - the specified JS function from the "\TestSuite\JS" directory is executed. IMPORTANT - specify the name of the txt file without ".txt" (so far I have found use for JS functions only in this form - in one place, to generate a random INN - but the range of possible applications is wide)
    • 3 - the value is an SQL query, and the program substitutes the first result of that query
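Put together, a fillingFields event might look roughly like this (element names, values and the exact attribute placement are my guesses based on the description above):

```xml
<!-- Sketch only: names and attribute placement are illustrative -->
<event type="fillingFields">
    <fields>
        <field>
            <element>поле_наименования_поставщика</element>
            <value>ООО Ромашка</value> <!-- type="1" (default): literal value -->
        </field>
        <field>
            <element>поле_инн</element>
            <!-- type="3": the first result of the query is substituted -->
            <value type="3">select inn from suppliers where rownum = 1</value>
        </field>
    </fields>
</event>
```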


checkElementsVisibility - checks that the specified elements are present on the form (namely, visible, and not just in the DOM). In the field attribute, either an element from WebPackage or directly xpath can be specified


checkElementsInVisibility - similar to checkElementsVisibility, but vice versa

clickElement - click on the specified element


checkInputValues - check the entered values


dbUpdate - performs an update in the database (oXygen reacts strangely to a single event of the dbUpdate type - I do not know what to do about it and do not understand why)


CheckQueryResultWithUtilityValue - compares the value entered by the user with the value from the database


checkFieldsPresenceByQueryResult - checks for the presence of elements on the form by an XPath pattern. If no pattern is specified, the search uses the pattern .//*[text()[contains(normalize-space(.), "$")]], where "$" is replaced by the value from the database. When describing your own pattern, put "$" where you want the database value substituted. My system has so-called grids containing values usually formed from some kind of view; this event exists to test such grids.


Wait - simple - waits the specified number of milliseconds. Even though this is commonly considered a crutch, I will say for sure: sometimes it is impossible to do without it.


scrollDown - scrolls down from the specified element. It works like this: click the specified element and press the "PgDn" key. In the cases where I had to scroll down, it works fine.


userInput - enters a value into the specified element. The only semi-automatic device in my automation, used only for the captcha. The element to enter the value into is specified; the value itself is entered in a pop-up dialog box.


About the code

So, I tried to build the platform according to the principles of Uncle Bob's Clean Architecture.


application - initialization and launch, plus the configs and the report (do not scold me for the Report class - it was thrown together as quickly as possible)

logic - the key logic plus the Selenium and DB services. The events live here.

models - POJOs for the XML and all auxiliary object classes

utils - singletons for Selenium and the DB

To run the code, you need to download JDK 12 and enable its preview features everywhere. In IDEA this is done through Project Structure -> Modules and Project; do not forget about the Maven runner either. And when launching from the bat file, add --enable-preview (there was an example above).

For everything to start, besides the JDK you will need to download the OJDBC driver and drop the jar into the "SprintAutoTest\src\lib" directory. I do not provide it, because Oracle is strict about this now - registration is required to download it - but I am sure everyone will cope one way or another. (Also make sure all the folders are created, otherwise the report will not be saved.)


So, we have a test launcher for which writing tests is genuinely fast. Within a working week I was able to automate 1.5 hours of manual work, which the robot performs in 5-6 minutes. That is approximately 3,700 lines of concatenated test case and 830 described elements (more than 4,800 lines). The numbers are rough and not carefully measured, but those who do automation will understand that this is a very high rate, especially for robot-unfriendly systems. At the same time I test everything: the business logic, some negative tests along the way for the correctness of filled attributes, and as a bonus I check every web form I pass through, describing all the functional and key elements whether I need them right away or not. (A small digression - I use closeValidation mainly only while writing a test.)

At first glance it seems like a lot of lines of xml, but in practice they are generated semi-automatically, and a mistake can only be made in directly entered parameters (because we effectively have 2 levels of validation: the first is the XML schemas, the second checks the presence of the specified forms and elements at TestCase start-up).

Among the minuses: there are no clear test boundaries. As such, they are absent, and you can blame me that this is just a macro launcher, not tests. To that I say:

In the platform, tests are conceptually divided, from several points of view, into several levels of abstraction:

  • like a cassette tape + the final result - the test ends if "the tape tears"; and, as a recommendation, validate the final result (what we arrive at when testing of the functionality finishes, i.e. validation of the end result of the business process)
  • the description of actions is decomposed page by page (the action entity is a set of events within one page, with intermediate validation of the actions plus automatic initial and final validation of the page)
  • the described events do not define the tests themselves, yet the tests exist - the test designer defines them. Nothing prevents you from deploying the platform so that each event is a test of its own, but that is exactly what I tried to get away from. I removed a couple of events containing trade secrets that were simply more convenient to describe as events, but that does not change the essence
  • each action on a page is a unit test (returning a single result, true or false)

If I compare my approach with something, then the minuses of, for example, the popular Cucumber and the BDD concept itself are more significant to me (specifically when testing systems like mine):

  • We cannot guarantee that tests run to completion on an unstable web application. In my case, for most tests we cannot guarantee execution, and they will fail on "When", which in my opinion is not acceptable at all if we describe testing as a set of tests.
  • And everywhere the same enormous login example is given. In all my practice there has never once been an error in the login (although of course it should be covered, that is for sure). The example is good, but for the remaining tests you need to sculpt endless crutches out of Given and When - in the system I described, real tests would spend 99% on describing intermediate conditions before reaching the test itself: much hassle, little essence, and you even have to climb into the code.

Things I would like to do or think about, but have not yet got around to:

  • running not one but several tests sequentially in one run
  • enhancing entities so that they can generate values for each new test run and persist them during execution
  • centralized version control - not just git, but indicating which version of a test to run, or, again, a smart module that understands which version is current and which is not
  • as I said, starting a new test with new values requires changing 1 digit - automate that, make a smart module for it
  • although it does not bother me much that all locators are stored in one place, a more user-friendly storage structure for them would still be proper
  • I have not dabbled with Selenium Server. It is worth thinking about, along with further adaptation to CI - TeamCity, etc.

Well, that's all. I attach a link to the source code on GitHub.

I will be very glad of constructive criticism, and I hope this project proves genuinely useful.
