New analyzer suppression mechanism

    PVS-Studio already has a mechanism for suppressing false positives. Functionally, this mechanism suits us completely: we have no complaints about its reliability. However, some of our users and customers have asked for a way to work only with analyzer messages issued for “new”, i.e. newly written, code. This wish is easy to understand: in a large project the analyzer can issue thousands or even tens of thousands of messages for existing code that, of course, nobody is going to edit.



    The ability to mark messages as “false” overlaps to some extent with the wish to work only with “new” messages: in theory, nothing prevents you from marking every message found so far as “false” and from then on working only with messages for newly written code.

    However, the existing mechanism for marking “false” messages has a fundamental usability problem (discussed below) that can get in the way when it is used for this purpose on real projects. In particular, the existing mechanism was not designed for mass markup, which is unavoidable when thousands of analyzer messages have to be processed.

    Because this problem is inherent in the existing approach, it cannot be eliminated while keeping that approach. It therefore makes sense to consider an alternative way of solving the task.

    Suppressing false positives and suppressing the results of previous runs


    A mechanism for matching source code with analyzer diagnostics must be able to associate a line of source code with a specific analyzer diagnostic. It is important that this association survives for a long time, during which both the user's code and the analyzer's diagnostic output can change.

    Such a matching mechanism can be used to solve two tasks:
    1. Suppressing false positives. The user can mark, in a special way, analyzer messages that they consider false, and later filter such messages (for example, hide them). This markup must be preserved across subsequent analyzer runs, including runs after the source code has changed. PVS-Studio has offered this functionality for a long time.
    2. Suppressing the results of previous runs. The user sees only the “fresh” results of an analyzer run, i.e. messages that were already found in previous runs are not displayed. PVS-Studio did not previously support this task; it is discussed in the next section of the article.

    PVS-Studio's existing mechanism for matching source code and diagnostics is based on markers (comments of a special kind) in the source code. It is implemented in the analyzer core (PVS-Studio.exe) and in the IDE plug-ins. The IDE plug-in places the markers in the code initially and also lets you filter the analysis results by these markers. The analyzer core “picks up” markers already present in the code and marks its output accordingly, which preserves the markup from previous runs.
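
    To make the marker idea concrete, here is a minimal sketch of how a tool might recognize such a comment on a source line. It is not the analyzer's actual code: the marker shape (a trailing comment like //-V501, with optional spaces tolerated) and the helper names marked_codes and is_marked_false are assumptions made for illustration.

```python
import re

# Assumption: a marker is a trailing comment such as "//-V501".
# Optional spaces inside the marker are tolerated for this sketch.
MARKER_RE = re.compile(r"//\s*-\s*V(\d+)")


def marked_codes(source_line):
    """Return the set of diagnostic codes marked on this source line."""
    return {"V" + digits for digits in MARKER_RE.findall(source_line)}


def is_marked_false(source_line, code):
    """True if the line carries a marker for the given diagnostic code."""
    return code in marked_codes(source_line)


line = "if (a == a) return 0; //-V501"
print(is_marked_false(line, "V501"))  # marker present for this code
print(is_marked_false(line, "V547"))  # no marker for this code
```

    Because the marker travels with the line itself, it survives any shift or edit of the surrounding code, which is exactly the property the text describes.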

    Let us consider the advantages and disadvantages of the existing mechanism.

    Advantages:
    • Ease of implementation in both the analyzer core and the plug-ins.
    • Intuitive for users of the tool; markers can also be added by hand.
    • The “code - diagnostic” link is guaranteed to survive any subsequent modification of the code by the user.
    • “Free” support for team development: since the markers are stored in the source files, they can be synchronized by the same system that synchronizes the files themselves (for example, a version control system).
    Disadvantages:
    • The code gets cluttered with special comments that have nothing to do with its logic.
    • Version control becomes awkward: the special comments have to be committed to the repository.
    • Mass markup across the whole code base can potentially corrupt the source code.
    These problems make the existing matching mechanism impractical for suppressing the results of previous runs, i.e. for “mass” markup of messages on an existing code base.

    In other words, nobody wants to add 20,000 comments to the code to suppress existing messages and then commit all those changes to version control without reviewing them.

    A new mechanism for matching diagnostics with source code, based on message database files


    As shown above, the main problem of the existing mechanism is that it relies on modifying the user's source code. Both the undoubted advantages of this approach and its disadvantages follow from that fact. Clearly, an alternative approach has to give up modifying user code and store the “analyzer diagnostic - user code” link in some external storage rather than in the source files themselves.

    Storing such links for a long time raises a fundamental problem: over a long period, both the analyzer's diagnostics and the user's source code change, and these changes have to be accounted for. A diagnostic disappearing from the analyzer output is not a problem, since the corresponding message had already been marked as false or unnecessary. Changes in the user's code, however, can lead to a “second coming” of messages that were marked earlier.

    With the code markup mechanism this problem does not arise. No matter how much a code fragment changes, the marker stays in it until the user removes it (deliberately or by mistake), which seems unlikely. Moreover, an experienced user can add such a marker to a new (or changed) code fragment if they know the analyzer will complain about it.

    What exactly is needed to identify an analyzer diagnostic message? The message itself contains the file name, the project, the line number in the file, and checksums of the previous, current, and next lines of code at the place where the diagnostic was issued. To match diagnostics across source code changes, the line number must be ignored, because it changes unpredictably at the slightest modification of the file.

    To store the links between diagnostics and user code described above, we chose to create local “database files”. These files (with the suppress extension) are created next to the project files (vcproj / vcxproj) and contain lists of diagnostics marked as “unnecessary”. Diagnostics are stored without line numbers, and the paths to the files in which they were found are stored relative to the project file. This makes it possible to move suppress files between developers' machines even if the projects are deployed in different locations in the file system. The files can also be kept in version control, because in most cases the project files themselves store paths to source files in the same relative form. The exception here is the generated project files,
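
    The benefit of relative paths can be shown with a small sketch. The directory layouts and the helper names to_stored_path and to_local_path are hypothetical; POSIX-style paths are used only to keep the example deterministic, and nothing here reflects the real suppress file format.

```python
import posixpath


def to_stored_path(source_file, project_dir):
    """Path as it would be written to the database: relative to the project."""
    return posixpath.relpath(source_file, start=project_dir)


def to_local_path(stored_path, project_dir):
    """Resolve a stored relative path against a local project directory."""
    return posixpath.normpath(posixpath.join(project_dir, stored_path))


# Two developers keep the same project in different places (hypothetical paths).
alice = to_stored_path("/home/alice/work/app/src/main.cpp", "/home/alice/work/app")
bob = to_stored_path("/srv/checkout/app/src/main.cpp", "/srv/checkout/app")

print(alice == bob)                               # same stored path on both machines
print(to_local_path(alice, "/srv/checkout/app"))  # resolves inside Bob's checkout
```

    Since both machines produce the same stored path, a database created on one of them still matches messages generated on the other.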

    We used the following fields to identify a message in a suppress file:
    • The text of the analyzer's diagnostic message;
    • The error code of the message;
    • The relative path to the file in which the message was found;
    • The hash sums of the line of code on which the message was issued and of the previous and next lines.
    As you can see, it is the stored hash sums of the source lines that let us tie an analyzer message to the user's code. If the user's code is “shifted”, the analyzer message shifts with it, but the “context” of the message (the code surrounding it) stays the same. If the user edits the code at the place where the message was issued, it is logical to treat that code as “new” and to show the analyzer message for it. If the user has really fixed the error the analyzer pointed to, the message simply disappears. If the suspicious place has not been fixed, the user sees the analyzer message again.
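
    The behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the article does not say which hash function is used or whether whitespace is normalized, so SHA-1 over the stripped line text and the function names line_hash and context_key are choices made for this example.

```python
import hashlib


def line_hash(text):
    """Hash of one source line (assumption: surrounding whitespace is ignored)."""
    return hashlib.sha1(text.strip().encode("utf-8")).hexdigest()


def context_key(lines, index, code, message, rel_path):
    """Identify a message by code, text, file and the hashes of three lines.

    The line number (index) is used only to find the lines; it is not
    part of the key.
    """
    prev_line = lines[index - 1] if index > 0 else ""
    next_line = lines[index + 1] if index + 1 < len(lines) else ""
    return (code, message, rel_path,
            line_hash(prev_line), line_hash(lines[index]), line_hash(next_line))


original = ["int a = 0;", "if (a == a)", "  return;"]
shifted = ["// a new comment", ""] + original             # code moved down by two lines
edited = ["int a = 0;", "if (a == a)", "  return 1;"]     # neighboring line changed

args = ("V501", "identical sub-expressions", "src/main.cpp")
key_original = context_key(original, 1, *args)
key_shifted = context_key(shifted, 3, *args)
key_edited = context_key(edited, 1, *args)

print(key_original == key_shifted)  # a pure shift keeps the key
print(key_original == key_edited)   # an edit of the context changes the key
```

    A pure shift leaves the key intact, so the message stays suppressed, while any edit to one of the three lines produces a different key and the message is treated as belonging to new code.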

    Clearly, relying on hashes of lines in user files brings a number of limitations. For example, if a file contains several identical lines of code, we treat the messages on all of them as suppressed even if only one was marked. The problems and limitations we ran into with this approach are discussed in more detail in the next section.

    The PVS-Studio IDE plug-ins create suppress files automatically during the initial markup of messages and afterwards compare every newly generated diagnostic with those stored in the suppress databases. If, after a re-check, a newly generated message is found in the database, it is not shown to the user.
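
    The comparison step can be pictured as a simple set lookup. The sketch below assumes every message already carries an identifying key that does not include the line number; the dictionary layout, the placeholder keys, and the function names build_suppress_base and only_new are hypothetical and do not describe the plug-ins' real data structures.

```python
def build_suppress_base(messages):
    """Initial markup: remember the key of every message found so far."""
    return {msg["key"] for msg in messages}


def only_new(messages, suppress_base):
    """Keep only the messages whose key is not in the suppress base."""
    return [msg for msg in messages if msg["key"] not in suppress_base]


# First run: everything found is put into the base.
first_run = [
    {"key": ("V501", "src/a.cpp", "ctx-1"), "line": 10},
    {"key": ("V595", "src/b.cpp", "ctx-2"), "line": 42},
]
base = build_suppress_base(first_run)

# Second run: one old message has moved to another line, one message is new.
second_run = [
    {"key": ("V501", "src/a.cpp", "ctx-1"), "line": 17},
    {"key": ("V595", "src/b.cpp", "ctx-2"), "line": 42},
    {"key": ("V522", "src/c.cpp", "ctx-3"), "line": 5},
]
fresh = only_new(second_run, base)
print([msg["key"][0] for msg in fresh])
```

    Only the message with an unknown key reaches the user; the message that merely moved to another line stays hidden because the line number is not part of its key.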

    Statistics on the use of the new suppression mechanism


    Once the first working prototype of the new mechanism was ready, we naturally wanted to see how it would behave on real projects. Rather than wait months or years for enough changes to accumulate, we simply took several past revisions of a few large open-source projects.

    What did we want to see? We took a fairly old revision of a project (depending on how active the developers were, it could be a week or a whole year old), checked it with our analyzer, and put all the resulting messages into the suppress databases. Then we updated the project to its latest head revision and checked it again. Ideally, we should then see only messages for “new” code, i.e. code written during the period in question.

    When checking the first project, we encountered a number of problems and limitations of our methodology. Let's consider them in more detail.

    First, as expected, messages “reappeared” when the code was modified at the place where the message had been issued or on the previous or next line. A change to the message's own line should, of course, “resurrect” the message, but it might seem that a change to the neighboring lines should not. This is one of the main limitations of our approach: we are tied to the text of the source file on these three lines. On the other hand, tying a message to a single line seems impractical, since too many messages could then be confused with one another. In the project statistics given below, we call such messages “paired”, i.e. messages that are already in the suppress databases but have surfaced again.

    Second, another peculiarity (or rather, another limitation) of the new mechanism came to light: messages in header (.h) files are “resurrected” when these files are included from source files in other projects. This limitation exists because the databases are generated at the level of an IDE project. A similar situation arises when new projects in the solution reuse existing header or source files.

    Next, relying on the analyzer message text to identify a message in the databases turned out not to be a good idea. The message text sometimes contains line numbers from the source code (which change when the code shifts) and names of variables from the user's code. We solved the first problem by storing an incomplete message text in the database, with all digits cut out of it. As for variable names, we decided that “resurrecting” a message when a variable is renamed is correct: not only the name but also the definition may have changed, so we treat this as “new” code.
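
    Cutting the digits out of the message text can be sketched in a couple of lines. The message strings below are only illustrative, and normalize_message is a hypothetical name; the real normalization code is not shown in the article.

```python
import re


def normalize_message(text):
    """Drop every digit so that embedded line numbers no longer matter."""
    return re.sub(r"\d", "", text)


before = "The 'ptr' pointer was utilized before it was verified. Check lines: 120, 125."
after_shift = "The 'ptr' pointer was utilized before it was verified. Check lines: 131, 136."
after_rename = "The 'buf' pointer was utilized before it was verified. Check lines: 120, 125."

print(normalize_message(before) == normalize_message(after_shift))   # shift is ignored
print(normalize_message(before) == normalize_message(after_rename))  # rename is not
```

    Note one side effect of this simple rule: digits inside variable names are removed as well, so names that differ only in a digit become indistinguishable in the stored text.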

    Finally, some messages “migrated”: either the code that triggered them was copied to other files, or the files were included in other projects, which essentially overlaps with the limitations already described above.

    Below are the statistics for several projects on which we tested the new system. The large number of diagnostic messages is explained by the fact that all messages were counted, including the diagnostics for 64-bit errors, which unfortunately produce many false positives; nothing can be done about that.
    1. LLVM is a large project: a universal system for analyzing, transforming, and optimizing programs. It has been under active development for more than a year, so a window of just 1.5 months was enough to capture a large number of code changes. The well-known Clang compiler is part of this project. For 1600-1800 project files, 52,000 messages were marked as unnecessary. After 1.5 months, 18,000 new analyzer messages appeared, of which 500 were paired and 500 had migrated to other files;
    2. Miranda is a well-known messaging program for Windows. There have been 11 versions of Miranda since its first release; we took the latest of them, Miranda IM. Unfortunately, because of conflicts in the Miranda development team, this version did not change very often, so we had to take changes over an interval of as much as 2 years. For 700 files, 51,000 messages were marked as unnecessary. Two years later, only 41 new messages had appeared;
    3. ffdShow is a media decoder commonly used for fast and high-precision decoding of video streams. ffdShow is a fairly mature project; at the time of writing, its latest release dated from April 2013. We took the changes over 1 year. For 570 files, 21,000 messages were marked as unnecessary. A year later, 120 new messages had appeared;
    4. Torque3D is a game engine. The project is hardly being developed now, but things used to be different. The latest release at the time of writing was dated May 1, 2007. We took a one-week interval from the period of active development: 43,259 messages were marked as unnecessary, and 222 new ones appeared during that week;
    5. OpenCV is a library of algorithms for computer vision, image processing, and general-purpose numerical computation. It is a fairly dynamic project. We took the changes over 2 months and over a year. 50,948 messages were marked as unnecessary. 1,174 new messages appeared after 2 months and 19,471 after a year.
    What conclusions can we draw from our results?

    As expected, on projects that are not being actively developed we did not see many “new” messages, even over a period as long as a whole year. Note that for such projects we did not count the “paired” and migrated messages.

    The greatest interest for us, of course, lies in “living” projects. The LLVM example shows that the number of “new” messages amounted to roughly 34% of those marked on a version that was only 1.5 months older! However, of these 18,000 new messages only 1,000 (500 migrated + 500 paired) are due to the limitations of our approach, i.e. only about 5% of all new messages.
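
    The percentages can be rechecked from the rounded LLVM figures given in the statistics. Because the inputs are rounded, the results are approximate and land slightly above the 34% and 5% quoted in the text.

```python
# Rounded LLVM figures from the statistics above.
marked = 52_000        # messages marked as unnecessary on the old revision
new_messages = 18_000  # messages that appeared 1.5 months later
paired = 500           # resurfaced because the surrounding lines changed
migrated = 500         # moved to other files or projects

share_new = new_messages / marked
share_limitations = (paired + migrated) / new_messages

print(f"{share_new:.1%}")          # 34.6%
print(f"{share_limitations:.1%}")  # 5.6%
```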

    In our opinion, these figures demonstrate the viability of the new mechanism very well. Of course, the new suppression mechanism is by no means a panacea, but nothing prevents you from combining it with the many suppression and filtering methods that already existed. For example, if a message in a header file starts to “pop up” very often, there is nothing wrong with “killing” it for good by adding a comment of the form //-Vxxx to the line.

    Although the new mechanism is already fairly well debugged and we are ready to show it to our users in the next release, we decided to keep testing it by setting up a regular (nightly) check of the LLVM / Clang project. The new mechanism will let us look only at messages for the “fresh” code of the project, so in theory we can find errors even before the developers discover them. This will show very clearly the real benefit of using static analysis regularly, and it would not be possible without the new suppression system, since reviewing 50,000 messages every day is unrealistic. Watch for reports of fresh bugs found in Clang on our Twitter.
