Validating PVS-Studio with Clang

    Checking PVS-Studio with Clang
    Yes Yes. You heard right. This time the article is "vice versa." We are not checking a project, but checking our analyzer using another tool. In fact, we have done this before. For example, we checked PVS-Studio using Cppcheck, using the analyzer built into Visual Studio, looked at Intel C ++ warnings. But before there was no reason to write an article. Nothing interesting was found. But Clang was able to interest in its diagnostic messages.

    We tested Clang twice with PVS-Studio [ 1 , 2] and each time they found something interesting. But we could not verify PVS-Studio in any way. Clang developers have long been reporting that they are good at building Windows projects developed using Visual C ++. But in practice, somehow it doesn’t work out. Or we are not lucky.

    And the other day we suddenly realized that we can easily test our analyzer using Clang. It was just necessary to approach the task from the other side. Every night we get a command-line version of PVS-Studio for Linux using GCC. And the GCC compiler is very easily replaced with Clang. Accordingly, we can easily try checking PVS-Studio. And really. On the same day, as a bright idea occurred to one of the employees, we had a report on the PVS-Studio verification. Now I sit and talk about the contents of the report and my impressions.

    Impression of html reports


    Of course, I have already dealt with Clang. However, it is difficult to judge the quality of analysis on third-party projects. I often cannot understand if there is a mistake or not. Particularly scary when Clang shows that you need to view a path of 37 points in the source code.

    PVS-Studio code, on the contrary, is familiar to me, and I finally was able to fully work with the report issued by Clang. Unfortunately, this confirmed my earlier impressions that the often shown way to achieve the detected error is redundant and confuses the programmer. I understand that giving key points to program execution and building such a path is an extremely difficult task. For example, we at PVS-Studio are generally afraid to take up such a voluminous task. However, since Clang implements the display of this path, then obviously we need to go further in improving this mechanism.

    Otherwise, such points only knock down, litter the conclusion and complicate the understanding:



    The figure shows the "point N4". Somewhere below is an error. I understand that an error will only occur if the condition is not met. This is what Clang reports. But why display this information? And so it is clear that if the condition is true, then the function will exit and there will be no error. Extra pointless information. There is a lot of such redundant information. Obviously, this mechanism can and should be improved.

    However, I want to give credit to the Clang developers. Often, showing such a path really helps to understand the cause of the error, especially when more than one function is involved in it. And, unambiguously, the Clang developers displayed the way to achieve the error turned out much better than in the Visual Studio 2013 static analyzer. There, often half of the function consisting of 500 lines is tinted. And what this coloring gives is completely incomprehensible.

    The criticality of the errors found


    Checking PVS-Studio is a very good example, what a thankless task to show the benefits of static analysis on a working, well-tested project. In fact, I can “get away” from all the mistakes by saying that:
    • this code is not used now;
    • this code is used extremely rarely or in error handling;
    • this error, but it will not lead to noticeable consequences (its correction does not manifest itself in any way on a huge number of regression tests).


    Having otmazyvatsya, I will keep the appearance that I do not make serious mistakes and with a proud look I can say that Clang is suitable only for beginners in programming.

    Of course, I won’t say that! The fact that no critical errors were found does not mean that Clang is weak in analysis. The absence of such defects is the result of a lot of work on testing by various methods:
    • internal unit tests;
    • diagnostic regression testing (marked up files);
    • testing on sets of * .i files containing various C ++ constructs and various extensions;
    • regression testing on 90 open-source projects;
    • and, of course, static analysis using PVS-Studio.


    After such defense in depth, it is hard to expect that Clang will find 20 dereferences of null pointers and 10 divisions by 0. However, think about it. Even in a carefully tested project, Clang found a few bugs. This means that with regular use, you can avoid a lot of trouble. It is better to fix the error when Clang finds it, than to get from the * .i user the file on which PVS-Studio crashes.

    Naturally, we made conclusions for ourselves. Now my colleague is just setting up the Clang launch on the server and sending logs to the mail if the analyzer finds something.

    About false positives


    In total, the Clang analyzer generated 45 warnings. I don’t want to talk about the number of false positives. I’d better say that 12 places should be corrected.

    The fact is that “false positive” is a relative matter. Formally, the analyzer is right - the code is bad and suspicious. But this does not mean that a defect is found. I will explain with examples.

    First, consider the real false positive:
    #define CreateBitMask(bitNum) ((v_uint64)(1) << bitNum)
    unsigned GetBitCountForRepresntValueLoopMethod(
      v_int64 value, unsigned maxBitsCount)
    {
      if (value == 0)
        return 0;
      if (value < 0)
        return maxBitsCount;
      v_uint64 uvalue = value;
      unsigned n = 0;
      int bit;
      for (bit = maxBitsCount - 1; bit >= 0; --bit)
      {
        if ((uvalue & CreateBitMask(bit)) != 0)
         // Clang: Within the expansion of the macro 'CreateBitMask':
         // The result of the '<<' expression is undefined
        {
          n = bit + 1;
          break;
        }
      ....
    }

    As I understand it, the analyzer reports that the shift operation can lead to undefined behavior. Apparently, Clang got confused in the logic or could not correctly calculate the possible range of values ​​of the variable maxBitsCount. I carefully studied how the GetBitCountForRepresntValueLoopMethod () function was called and could not find a situation where the value in the 'maxBitsCount' variable would be too large. I understand shifts [ 3 ]. So I am sure that there is no error.

    Self-confidence is good, but not enough. Therefore, I entered here such assert ():
    ....
    for (bit = maxBitsCount - 1; bit >= 0; --bit)
    {
      VivaAssert(bit >= 0 && bit < 64);
      if ((uvalue & CreateBitMask(bit)) != 0)
    ....

    And on all tests this assert () did not work. So this is really an example of a false positive for the Clang analyzer.

    The nice consequence of adding assert () was that Clang stopped reporting. It focuses on assert () macros. They tell the analyzer the possible ranges of variable values.

    There are few such true false positives. There are much more of these messages:
    static bool G807_IsException1(const Ptree *p)
    {
      ....
        if (kind == ntArrayExpr) {
          p = First(p);
          kind = p->What();
            // Clang: Value stored to 'kind' is never read
      ....

    The assignment "kind = p-> What ();" is not used. It used to be, and after the changes, this line became superfluous. The analyzer is right. The line is superfluous, and it should be deleted at least so that it does not confuse the person who will accompany this code.

    Another example:
    template<> template<>
    void object::test<11>() {
      ....
      // Нулевой nullWalker не будет использоваться в тестах.
      VivaCore::VivaWalker *nullWalker = 0;
      left.m_simpleType = ST_INT;
      left.SetCountOfUsedBits(32);
      left.m_creationHistory = TYPE_FROM_VALUE;
      right.m_simpleType = ST_INT;
      right.SetCountOfUsedBits(11);
      right.m_creationHistory = TYPE_FROM_EXPRESSION;
      result &= ApplyRuleN1(*nullWalker, left, right, false);
        // Clang: Forming reference to null pointer
      ....
    }

    In a unit test, a null pointer is dereferenced. Yes, you can’t do this and ugly. But really want to. The fact is that it is very difficult to prepare an instance of the VivaWalker class, and in this case the reference to the object is not used at all.

    Both examples show code that works. However, I do not call this a false positive. These are flaws that need to be addressed. However, I would not write these warnings in the “found errors” column either. That is why I say that a false positive is a relative concept.

    Found bugs


    Finally, we got to the section where I will show interesting snippets of code that Clang found inside PVS-Studio.

    Found errors are not critical for the program. I'm not trying to make excuses. But it turned out really so. After editing all warnings, regression tests did not reveal any differences in the work of PVS-Studio.

    However, we are talking about the real mistakes and it's great that Clang was able to find them. I hope now, with its regular launches, it will be able to find more serious mistakes in the fresh PVS-Studio code.

    Using two uninitialized variables


    The code was large and complex. Bringing it in an article makes no sense. I made a synthetic code that reflects the essence of the error.
    int A, B;
    bool getA, getB;
    Get(A, getA, B, getB);
    int TmpA = A; // Clang: Assigned value is garbage or undefined
    int TmpB = B; // Clang: Assigned value is garbage or undefined
    if (getA)
      Use(TmpA);
    if (getB)
      Use(TmpB);

    The Get () function can initialize the variables A and B. Whether it initializes them or not, it notes getA, getB in the variables.

    Regardless of the variables A and B are initialized, their values ​​are copied to TmpA, TmpB. At this point, there is the use of an uninitialized variable.

    Why am I saying that this is a non-critical error? In practice, copying an uninitialized variable of type 'int' does not cause any trouble. Formally, as I understand it, Undefined behavior arises. In practice, the garbage will simply be copied. Further variables with garbage are not used.

    The code was rewritten as follows:
    if (getA)
    {
      int TmpA = A;
      Use(TmpA);
    }
    if (getB)
    {
      int TmpB = B;
      Use(TmpB);
    }

    Uninitialized pointers


    Let's look at the call to the GetPtreePos () function. It passes references to uninitialized pointers.
    SourceLocation Parser::GetLocation(const Ptree* ptree)
    {
      const char *begin, *end;
      GetPtreePos(ptree, begin, end);
        return GetSourceLocation(*this, begin);
    }

    It is not right. The GetPtreePos () function assumes that pointers are initialized to nullptr. Here's how it works:
    void GetPtreePos(const Ptree *p, const char *&begin, const char *&end)
    {
      while (p != nullptr)
      {
        if (p->IsLeaf())
        {
          const char *pos = p->GetLeafPosition();
          if (....)
          {
            if (begin == nullptr) {
                // Clang: The left operand of '==' is a garbage value
              begin = pos;
            } else {
              begin = min(begin, pos);
            }
            end = max(end, pos);
          }
          return;
        }
        GetPtreePos(p->Car(), begin, end);
        p = p->Cdr();
      }
    }

    It only saves you from shame that the GetLocation () function is called when a certain code parsing error occurs in the unit test subsystem. Apparently, this has never happened.

    A good example of how static analysis complements TDD [4].

    Scary explicit casts


    There are three similar functions with scary and incorrect type conversions. Here is one of them:
    bool Environment::LookupType(
      CPointerDuplacateGuard &envGuard, const char* name,
      size_t len, Bind*& t, const Environment **ppRetEnv,
      bool includeFunctions) const
    {
      VivaAssert(m_isValidEnvironment);
      //todo:
      Environment *eTmp = const_cast(this);
      Environment **ppRetEnvTmp = const_cast(ppRetEnv);
      bool r = eTmp->LookupType(envGuard, name, len, t,
                                ppRetEnvTmp, includeFunctions);
      ppRetEnv = const_cast(ppRetEnvTmp);
        // Clang: Value stored to 'ppRetEnv' is never read
      return r;
    }

    Sodom and Gomorrah. We tried to remove the constancy. And then return the received value. But, in fact, in the line “ppRetEnv = const_cast ....” the local variable ppRetEnv simply changes.

    Let me explain where such disgrace came from and how it affects the program.

    PVS-Studio analyzer is based on the OpenC ++ library. It practically did not use the keyword 'const'. At any moment, you could change anything and anywhere using pointers to non-constant objects. PVS-Studio inherited this flaw.

    They fought with him. But not yet to the end. You write const in one place, you need to write in the second place, then in the third and so on. Then it turns out that in certain situations, something needs to be changed using the pointer and it is necessary to divide the function into parts or conduct even more extensive refactoring.

    The last heroic attempt to make const everywhere one of the idealistic collaborators took a week of time and ended in partial failure. It became clear that we needed big changes in the code and editing of some data storage structures. The bringing of light into the kingdom of darkness was stopped. There were several functions of stubs, as considered, the purpose of which is to make compiled code.

    What does this error affect? Oddly enough, it seems no matter what. All unit tests and regression tests did not reveal any changes in the behavior of PVS-Studio after corrections. Apparently, the return value in "ppRetEnv" is not very necessary at work.

    Using a potentially uninitialized variable


    v_uint64 v; // Clang: 'v' declared without an initial value
    verify(GetEscape(p, len - 3, v, notation, &p));
    retValue <<= 8;
    retValue |= v; // Clang: Assigned value is garbage or undefined

    The GetEscape () function may not work correctly and then the variable 'v' will remain uninitialized. The result of GetEscape () for some reason checks the verify () macro. How it happened is already unknown.

    The error goes unnoticed for the following reason. The GetEscape () function does not initialize the variable only if the PVS-Studio analyzer will work with incorrect program text. In the correct program text, the correct ESC sequences are always and the variable is always initialized.

    Himself surprised how it worked


    Ptree *varDecl = bind->GetDecl();
    if (varDecl != nullptr)
    {
      if (varDecl->m_wiseType.IsIntegerVirtualValue())
        varRanges.push_back(....);
      else if (varDecl->m_wiseType.IsPointerVirtualValue())
        varRanges.push_back(....);
      else
        varRanges.push_back(nullptr);
    }
    rangeTypes.push_back(varDecl->m_wiseType.m_simpleType);
      // Clang: Dereference of null pointer

    The varDecl pointer may be nullptr. However, the last line is always executed. And a null pointer dereference may occur: varDecl-> m_wiseType.m_simpleType.

    Why there has never been a fall here is a mystery to me. The only assumption is that we never get here if the object does not store a pointer to a variable declaration (declarator of variable). But you can’t rely on it anyway.

    A very serious mistake was found, which, if not now, then, someday, would certainly prove itself.

    Surprisingly, falls were not seen here either.


    Another amazing place. Apparently, a combination of factors that can lead to dereferencing a null pointer is extremely unlikely. At least, the fall has not been noticed since the writing of this function. But after that, a year and a half has passed. Miracle.
    
    void ApplyRuleG_657(VivaWalker &walker,
      const BindFunctionName *bind,
      const IntegerVirtualValueArray *pReturnIntegerVirtualValues,
      const PointerVirtualValueArray *pReturnPointerVirtualValues,
      const Ptree *body, const Ptree *bodySrc,
      const Environment *env)
    {
      if (body == nullptr || bodySrc == nullptr)
      {
        VivaAssert(false);
        return;
      }
      if (bind == nullptr)
        return;
      if (pReturnIntegerVirtualValues == nullptr &&
          pReturnPointerVirtualValues == nullptr)
        return;
      ....
      size_t integerValueCount = pReturnIntegerVirtualValues->size();
      // Clang: Called C++ object pointer is null

    The pointer pReturnIntegerVirtualValues ​​may well be nullptr.

    At first glance, it seems that the error is in the condition and that the "||" operator should be used:
    if (pReturnIntegerVirtualValues == nullptr &&
        pReturnPointerVirtualValues == nullptr)

    But this is not so. The condition is correct. You just need to check that the pointer is not null before dereferencing the pointer. If it is zero, the integerValueCount variable should be set to 0. The correct option:
    size_t integerValueCount =
      pReturnIntegerVirtualValues != nullptr ?
        pReturnIntegerVirtualValues->size() : 0;

    Amazing So many tests. Runs on 90 open source projects. Plus for the year we checked a lot of other projects. And still, a bug lives in the code. And he would have shown himself at our important potential client.

    Glory to the static analyzers! Glory to Clang!

    Other


    The analyzer revealed some more errors that should be corrected. It is difficult to describe them, but I do not want to make synthetic examples. Plus there are a couple of warnings that are absolutely true, but useless. There I had to turn off the analysis.

    For example, Clang was worried about uninitialized variables when using the RunPVSBatchFileMode () function. But the fact is that batch launch is simply not implemented for Linux, and there is a stub. And we do not plan to implement it yet.

    conclusions


    Use static analyzers in your work.

    I find the PVS-Studio core to be highly tested. However, the Clang static analyzer found 12 real errors. Other places are not errors, but they were code that smells, and I fixed all such places.

    Found errors could prove to be at the most inopportune moment. Plus, I think this analyzer would help to find a lot of errors that were caught by tests. A run of the main regression tests takes about 2 hours. If you find something earlier, that's fine.

    Here's an article with a Clang ad. But he deserved it.

    Just don’t think that other analyzers are of little use. For example, I personally really like the Cppcheck analyzer. It is very easy to use and has very clear diagnostics. But it so happened that he did not find a bunch of errors in PVS-Studio, so I can’t write a similar article about him.

    And, of course, I recommend using the PVS-Studio analyzer in your work. It is extremely convenient for those using Visual C ++ [ 5 ]. Especially do not forget about the automatic analysis that starts after successful compilation of the changed files.

    This article is in English.


    If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Checking PVS-Studio with Clang .

    Additional links:


    1. Andrey Karpov. PVS-Studio vs Clang .
    2. Andrey Karpov. Static analysis should be applied regularly .
    3. Andrey Karpov. Not knowing the ford, do not get into the water. Part Three (about shift operations) .
    4. Andrey Karpov. How static analysis complements TDD .
    5. Andrey Karpov. PVS-Studio for Visual C ++ .


    Have you read the article and have a question?
    Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 . Please see the list.

    Also popular now: