Test it: how do we determine which tests to run on pull requests?

    Hi, Habr! My name is Egor Danilenko. I work on the digital platform of the corporate Internet bank Sberbank Business Online, and today I want to tell you about the CI development process we have adopted.

    How does a developer's change reach the release branch? The developer makes changes locally and pushes them to our version control system. We use Bitbucket with our own plugin (we wrote about this plugin earlier here). These changes trigger a build and a test run (unit, integration, functional). If the build has not failed, all the tests have passed, and the review is successful, the pull request is merged into the main branch.

    But over time the number of teams grew, and the number of tests grew proportionally. We understood that with this many teams the problem of a slow pull-request check would only arrive faster, and developing the product would become impossible. At the moment we have about 40 teams. Along with new features, they bring new tests that also need to be run on pull requests.

    We thought it would be great if we knew which tests to run when a specific piece of code changes.

    And that's how we solved this problem.

    Problem statement


    There is a project with tests, and we want to determine which tests to run when “touching” a particular file.

    We all know the JaCoCo code coverage library from EclEmma. We took it as a basis.

    A bit about JaCoCo


    JaCoCo is a library for measuring code coverage by tests. It works by analyzing bytecode. The agent collects execution information and uploads it on request or when the JVM stops.

    There are three modes of data collection:

    1. File system: when the JVM stops, the data is written to a file.
    2. TCP Socket Server: external tools can connect to the JVM and retrieve the data through a socket.
    3. TCP Socket Client: at startup, the JaCoCo agent connects to a given TCP endpoint.

    We chose the second option.

    Solution


    We need the ability to run both the application and the tests themselves with the JaCoCo agent.

    First of all, we add the ability to run tests with a JaCoCo agent to our Gradle build.

    The Java agent is started like this:

    -javaagent:[yourpath/]jacocoagent.jar=[option1]=[value1],[option2]=[value2]
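    For the TCP Socket Server mode we chose, the agent line might look like this (the jar path is illustrative; output, address and port are standard JaCoCo agent options, and 6300 is the agent's default port):

```
-javaagent:build/jacoco/jacocoagent.jar=output=tcpserver,address=127.0.0.1,port=6300
```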

    Add a dependency to our project:

    dependencies {
        compile 'org.jacoco:org.jacoco.agent:0.8.0'
    }
    

    We only need to run with the agent when collecting statistics, so we add a withJacoco flag with the default value false to gradle.properties. We also specify there the directory where the statistics will be collected, and the agent's address and port.
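    For illustration, gradle.properties might then contain something like this (only the withJacoco flag is named in the text; the other property names are our assumption):

```properties
# Only collect coverage statistics when explicitly requested
withJacoco=false
# Illustrative names: where to store statistics, and the agent's endpoint
jacocoCoverageDir=out/coverage
jacocoAddress=127.0.0.1
jacocoPort=6300
```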

    We add construction of the JVM argument with the agent to the test run:

    if (withJacoco.toBoolean()) {
        …
        jvmArgs "-javaagent:${tempPath}=${jacocoArgs.join(',')}".toString()
    }
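    A fuller sketch of how the agent arguments might be assembled in build.gradle (everything beyond the withJacoco flag and the jvmArgs line above is our assumption, including the jacocoAddress/jacocoPort properties and the tempPath variable pointing at an unpacked agent jar):

```groovy
// build.gradle -- illustrative sketch, not our actual build script
test {
    if (withJacoco.toBoolean()) {
        // tempPath is assumed to point at jacocoagent.jar, extracted
        // beforehand from the org.jacoco.agent dependency
        def jacocoArgs = [
                "output=tcpserver",
                "address=${jacocoAddress}",
                "port=${jacocoPort}"
        ]
        jvmArgs "-javaagent:${tempPath}=${jacocoArgs.join(',')}".toString()
    }
}
```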

    Now we need to collect statistics with JaCoCo after each successful test. To do this, we write a TestNG listener.

    public class JacocoCoverageTestNGListener implements ITestListener {
        private static final IntegrationTestsCoverageReporter reporter = new IntegrationTestsCoverageReporter();
        private static final String TEST_NAME_PATTERN = "%s.%s";

        @Override
        public void onTestStart(ITestResult result) {
            reporter.resetCoverageDumpers(String.format(TEST_NAME_PATTERN, result.getInstanceName(), result.getMethod().getMethodName()));
        }

        @Override
        public void onTestSuccess(ITestResult result) {
            reporter.report(String.format(TEST_NAME_PATTERN, result.getInstanceName(), result.getMethod().getMethodName()));
        }
    }
    

    We add the listener to testng.xml and comment it out, since we don't need it during a regular test run.
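    For reference, the registration in testng.xml might look like this (the listener's package is made up), left commented out for regular runs:

```xml
<!DOCTYPE suite SYSTEM "https://testng.org/testng-1.0.dtd">
<suite name="integration-tests">
    <!-- Uncomment for coverage-collection runs:
    <listeners>
        <listener class-name="ru.example.coverage.JacocoCoverageTestNGListener"/>
    </listeners>
    -->
    <test name="all">
        <packages>
            <package name="ru.example.*"/>
        </packages>
    </test>
</suite>
```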

    Now we can run the tests with the JaCoCo agent, and statistics will be collected for each successful test.

    A little more detail on how the reporter that collects the statistics is implemented.
    During initialization, the reporter connects to the agents, creates the directory where the statistics will be stored, and performs the actual collection of statistics.

    Add a report method:

    public void report(String test) {
        reportClassFiles(test);
        reportResources(test);
    }

    The reportClassFiles method creates a jvm folder in the statistics directory, which stores the statistics collected from class files.

    The reportResources method creates a resources folder, which stores the statistics collected for resources (all non-class files).

    The reporter contains all the logic for connecting to the agent, reading data from the socket and writing it to a file. It is implemented with the tools JaCoCo provides, such as org.jacoco.core.runtime.RemoteControlReader / RemoteControlWriter.

    The reportClassFiles and reportResources functions use the common function dumpToFile.

    public void dumpToFile(File file) throws IOException {
        try (Writer fileWriter = new BufferedWriter(new FileWriter(file))) {
            for (RemoteControlReader remoteControlReader : remoteControlReaders) {
                remoteControlReader.setExecutionDataVisitor(new IExecutionDataVisitor() {
                    @Override
                    public void visitClassExecution(ExecutionData data) {
                        if (data.hasHits()) {
                            String name = data.getName();
                            try {
                                fileWriter.write(name);
                                fileWriter.write('\n');
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                });
            }
        }
    }

    The result of the function will be a file with a set of classes / resources that this test affects.

    So, after running all the tests, we have a directory with statistics on class files and resources.

    It remains to write a pipeline for the nightly statistics collection and to add checks on pull requests to the pipeline.

    The build stages themselves are not interesting here, but let's take a closer look at the stage that publishes the statistics.

    stage('Aggregate and parse result') {
    def inverterInJenkins = downloadMavenDependency(
                            url: NEXUS_RELEASE_REPOSITORY,
                            group: 'ХХХ',
                            name: 'coverage-inverter',
                            version: '0',
                            type: 'jar',
                            mavenHome: wsp
    )
    dir('coverage-mapping') {
        gitFullCheckoutRef 'ХХХ', 'ХХХ', 'coverage-mapping', "refs/heads/${params.targetBranch}-integration-tests"
        sh 'rm -rf *'
    }
    sh "ls -lRa ../ХХХ/out/coverage/"
    def inverter = wsp + inverterInJenkins.substring(wsp.length())
    sh "java -jar ${inverter} " +
            "-d ../ХХХ/out/coverage/jvm " +
            "-o coverage-mapping/ХХХ/jvm " +
            "-i coverage-config/jvm-include " +
            "-e coverage-config/jvm-exclude"
    sh "java -jar ${inverter} " +
            "-d ../ХХХ/out/coverage/resources " +
            "-o coverage-mapping/ХХХ/resources " +
            "-i coverage-config/resources-include " +
            "-e coverage-config/resources-exclude"
    gitPush 'ХХХ', 'ХХХ', 'coverage-mapping', "${params.targetBranch}-integration-tests"
    }
    

    In coverage-mapping we need to store, for each file name, the list of tests that need to be run. Since the result of statistics collection is a test name with the set of classes and resources it touches, we need to invert this mapping and exclude unnecessary data (classes from third-party libraries).
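    The inversion step can be sketched in plain Java. This is a minimal illustration of the idea, not the actual coverage-inverter tool (which also handles include/exclude config files); all class, test and package names here are made up:

```java
import java.util.*;

public class CoverageInverter {
    /**
     * Inverts a "test -> files it touches" mapping into a
     * "file -> tests that touch it" mapping, keeping only files
     * under our own package prefix (dropping third-party classes).
     */
    public static Map<String, Set<String>> invert(Map<String, Set<String>> testToFiles,
                                                  String includePrefix) {
        Map<String, Set<String>> fileToTests = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : testToFiles.entrySet()) {
            for (String file : entry.getValue()) {
                if (!file.startsWith(includePrefix)) {
                    continue; // exclude classes from third-party libraries
                }
                fileToTests.computeIfAbsent(file, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return fileToTests;
    }

    public static void main(String[] args) {
        // Each test's dump file lists the classes it executed
        Map<String, Set<String>> testToFiles = new HashMap<>();
        testToFiles.put("PaymentTest.checkLimit",
                new HashSet<>(Arrays.asList("ru/bank/payment/Limit", "org/thirdparty/Util")));
        testToFiles.put("PaymentTest.checkCurrency",
                new HashSet<>(Arrays.asList("ru/bank/payment/Limit", "ru/bank/payment/Currency")));

        // Third-party classes disappear; each remaining file maps to its tests
        System.out.println(invert(testToFiles, "ru/bank/"));
    }
}
```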

    We invert our statistics and push them to our repository.

    Statistics are collected every night and stored in a separate repository for each release branch.

    Bingo!

    Now, when running the tests, we need to find the modified files and determine the tests that need to be run.
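    Picking the tests for a pull request then reduces to taking the union over the changed files. Again, this is a simplified sketch with made-up names, assuming the inverted file-to-tests mapping has already been loaded from coverage-mapping:

```java
import java.util.*;

public class TestSelector {
    /** Returns the union of tests recorded for each changed file. */
    public static Set<String> testsToRun(Map<String, Set<String>> fileToTests,
                                         Collection<String> changedFiles) {
        Set<String> tests = new TreeSet<>();
        for (String file : changedFiles) {
            // A file with no recorded coverage contributes no tests
            tests.addAll(fileToTests.getOrDefault(file, Collections.emptySet()));
        }
        return tests;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fileToTests = Map.of(
                "ru/bank/payment/Limit", Set.of("PaymentTest.checkLimit", "PaymentTest.checkCurrency"),
                "ru/bank/payment/Currency", Set.of("PaymentTest.checkCurrency"));

        System.out.println(testsToRun(fileToTests, List.of("ru/bank/payment/Currency")));
        System.out.println(testsToRun(fileToTests, List.of("ru/bank/unknown/File")));
    }
}
```

    A file that no recorded test touches selects nothing, which is exactly the case where yesterday's statistics (see the problems below) can bite.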

    Problems we encountered:

    • Since JaCoCo works only with bytecode, it cannot collect statistics on files such as .xml, .gradle or .sql out of the box. We had to bolt on our own solutions for those.
    • We have to constantly monitor the freshness of the statistics and the build schedule: if the nightly build fails for some reason, yesterday's statistics will be used for pull-request checks.
