PVS-Studio vs Clang

    PVS-Studio vs CLANG
    We accidentally checked the Clang project. I think the result will be interesting to a number of developers. Details under the cut.

    PVS-Studio now uses an external Microsoft Visual C ++ preprocessor, which is a significant drawback. The Visual C ++ preprocessor is extremely slow and has errors that we cannot fix. Yes, it doesn’t surprise you that the preprocessor works badly, but this does not prevent you from quickly and correctly compiling files in Visual Studio. I don’t know exactly how cl.exe works, but from indirect evidence I can assume that there are two completely different preprocessing algorithms used to compile and create * .i files. Apparently, this is more convenient in terms of compiler architecture.

    Recently, we have been actively looking for an alternative solution for file preprocessing. And apparently, our choice will focus on the preprocessor implemented in Clang. Preliminary measurements show that it works several times faster than cl.exe, which is very nice and useful.

    Clang is a new compiler for C-like languages ​​(C, C ++, Objective-C, Objective-C ++, in the process support for C ++ 11). The development is sponsored by Apple. One of the strengths of the Clang compiler is the large number of rules for static code analysis. In practice, Clang is already used by Apple as a static analysis tool.

    Clang was originally designed to maximize the preservation of information during the compilation process [1]. This feature allows Clang to create detailed context-oriented error messages that are understandable for both programmers and development environments. The modular design of the compiler allows you to use it as part of the development environment for syntax highlighting and refactoring.

    So, for the good, PVS-Studio, perhaps, should be based on Clang, and not on VivaCore [2]. But now it’s too late and everything is not so clear. In this case, we would depend too much on the capabilities of a third-party library. And they are not in a hurry to support Microsoft Specific in the Clang project.

    However, we were distracted. It is most interesting to check one code analyzer with another code analyzer. We did this, since we had already begun to study the Clang project anyway.

    Unfortunately, a full comparison is impossible. It would be great to check out Clang with PVS-Studio and vice versa. And then calculate the number of errors found per one thousand lines. The trouble is that we can check Clang, but he doesn’t. In Clang, MSVC support is experimental. Plus, the matter is complicated by the fact that the features of C ++ 11 are already used in PVS-Studio. A simple attempt to compile leads to errors like 'These language extensions are not yet supported.'

    We'll have to limit ourselves to the fact that we managed to find PVS-Studio in the Clang code. Naturally, what will be described in the article is not all the errors that were found. Clang is about 50 megabytes of source code. I will notify the developers of the defects found and, if they are interested, they will be able to check their project and study the entire list of potential errors. However, they heard about us and discussed some of our diagnostics.

    Let's see what we found interesting. Almost all errors found are Copy-Paste errors.

    Error Copy-Paste N1


    static SDValue PerformSELECTCombine(...)
    {
      ...
      SDValue LHS = N->getOperand(1);
      SDValue RHS = N->getOperand(2);
      ...
      if (!UnsafeFPMath &&
          !DAG.isKnownNeverZero(LHS) && !DAG.isKnownNeverZero(RHS))
      ...
      if (!UnsafeFPMath &&
          !DAG.isKnownNeverZero(LHS) && !DAG.isKnownNeverZero(LHS))
      ...
    }

    PVS-Studio Diagnostics: V501 There are identical sub-expressions '! DAG.isKnownNeverZero (LHS)' to the left and to the right of the '&&' operator. LLVMX86CodeGen x86isellowering.cpp 11635

    At the beginning, the code simultaneously works with LHS and RHS, and then only with LHS. The reason for this, apparently, was a typo or a copied fragment of a line. As I understand it, in the second case, the RHS variable must also be present.

    Error Copy-Paste N2


    MapTy PerPtrTopDown;
    MapTy PerPtrBottomUp;
    void clearBottomUpPointers() {
      PerPtrTopDown.clear();
    }
    void clearTopDownPointers() {
      PerPtrTopDown.clear();
    }

    PVS-Studio Diagnostics: V524 It is odd that the body of 'clearTopDownPointers' function is fully equivalent to the body of 'clearBottomUpPointers' function (ObjCARC.cpp, line 1318). LLVMScalarOpts objcarc.cpp 1322

    Classic Copy-Paste. The function has been copied. Her name was changed, but not the body itself. The correct option:
    void clearBottomUpPointers() {
      PerPtrBottomUp.clear();
    }


    Error Copy-Paste N3


    static Value *SimplifyICmpInst(...) {
      ...
      case Instruction::Shl: {
        bool NUW = LBO->hasNoUnsignedWrap() && LBO->hasNoUnsignedWrap();
        bool NSW = LBO->hasNoSignedWrap() && RBO->hasNoSignedWrap();
      ...
    }

    PVS-Studio Diagnostics: V501 There are identical sub-expressions 'LBO-> hasNoUnsignedWrap ()' to the left and to the right of the '&&' operator. LLVMAnalysis instructionsimplify.cpp 1891

    Most likely, they wanted to write like this:
    bool NUW = LBO->hasNoUnsignedWrap() && RBO->hasNoUnsignedWrap();


    Copy-Paste N4 Error


    Sema::DeduceTemplateArguments(...)
    {
      ...
      if ((P->isPointerType() && A->isPointerType()) ||
          (P->isMemberPointerType() && P->isMemberPointerType()))
      ...
    }

    Diagnostics of PVS-Studio. V501 There are identical sub-expressions 'P-> isMemberPointerType ()' to the left and to the right of the '&&' operator. clangSema sematemplatededuction.cpp 3240

    This is just a case, unlike the fifth example. It is clear that they wanted to write like this:

    (P-> isMemberPointerType () && A-> isMemberPointerType ())

    Error Copy-Paste N5


    static bool CollectBSwapParts(...) {
      ...
      // 2) The input and ultimate destinations must line up: if byte 3 of an i32
      // is demanded, it needs to go into byte 0 of the result.  This means that the
      // byte needs to be shifted until it lands in the right byte bucket.  The
      // shift amount depends on the position: if the byte is coming from the high
      // part of the value (e.g. byte 3) then it must be shifted right.  If from the
      // low part, it must be shifted left.
      unsigned DestByteNo = InputByteNo + OverallLeftShift;
      if (InputByteNo < ByteValues.size()/2) {
        if (ByteValues.size()-1-DestByteNo != InputByteNo)
          return true;
      } else {
        if (ByteValues.size()-1-DestByteNo != InputByteNo)
          return true;
      }
      ...
    }

    PVS-Studio Diagnostics: V523 The 'then' statement is equivalent to the 'else' statement. LLVMInstCombine instcombineandorxor.cpp 1387

    This is an example of a difficult situation. I'm not even sure the end of the error is here or not. Commentary doesn't help me either. Here, the creators must analyze the code. But still, I think that here is an example of an error with Copy-Paste.

    I think that Copy-Paste is enough for now, because there are a number of other types of errors.

    Here, for example, is a classic error in switch


    void llvm::EmitAnyX86InstComments(...) {
      ...
      case X86::VPERMILPSri:
        DecodeVPERMILPSMask(4, MI->getOperand(2).getImm(),
                            ShuffleMask);
        Src1Name = getRegName(MI->getOperand(0).getReg());
      case X86::VPERMILPSYri:
        DecodeVPERMILPSMask(8, MI->getOperand(2).getImm(),
                            ShuffleMask);
        Src1Name = getRegName(MI->getOperand(0).getReg());
        break;
      ... 
    }

    PVS-Studio diagnostics: V519 The 'Src1Name' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 211, 215. LLVMX86AsmPrinter x86instcomments.cpp 215

    In large code, such errors are simply inevitable. The switch statement is very dangerous. You can know as much as you like how it works, but still forget this damn 'break'.

    There are a number of if not mistakes, then clearly stupid or suspicious actions.

    Stupid action N1, which may be a mistake


    AsmToken AsmLexer::LexLineComment() {
      // FIXME: This is broken if we happen to a comment at the end of a file, which
      // was .included, and which doesn't end with a newline.
      int CurChar = getNextChar();
      while (CurChar != '\n' && CurChar != '\n' && CurChar != EOF)
        CurChar = getNextChar();
      ...
    }

    Diagnostics PVS-Studio: V501 There are identical sub-expressions to the left and to the right of the '&&' operator: CurChar! = '\ N' && CurChar! = '\ N' LLVMMCParser asmlexer.cpp 149

    Most likely, the second checking CurChar! = '\ n' is superfluous here. But perhaps this is a mistake and should be written:
    while (CurChar != '\n' && CurChar != '\r' && CurChar != EOF)


    The stupid action of N2, which is definitely not a mistake


    std::string ASTContext::getObjCEncodingForBlock(const BlockExpr *Expr) const {
      ...
      ParmOffset = PtrSize;
      // Argument types.
      ParmOffset = PtrSize;
      ...
    }

    PVS-Studio Diagnostics: V519 The 'ParmOffset' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 3953, 3956. clangAST astcontext.cpp 3956

    The stupid action of N3, which I find it difficult to describe


    static unsigned getTypeOfMaskedICmp(...)
    {
      ...
      result |= (icmp_eq ? (FoldMskICmp_Mask_AllZeroes |
                            FoldMskICmp_Mask_AllZeroes |
                            FoldMskICmp_AMask_Mixed |
                            FoldMskICmp_BMask_Mixed)
                         : (FoldMskICmp_Mask_NotAllZeroes |
                            FoldMskICmp_Mask_NotAllZeroes |
                            FoldMskICmp_AMask_NotMixed |
                            FoldMskICmp_BMask_NotMixed));
      ...
    }

    PVS-Studio Diagnostics:

    V501 There are identical sub-expressions 'FoldMskICmp_Mask_AllZeroes' to the left and to the right of the '|' operator. LLVMInstCombine instcombineandorxor.cpp 505

    V501 There are identical sub-expressions 'FoldMskICmp_Mask_NotAllZeroes' to the left and to the right of the '|' operator. LLVMInstCombine instcombineandorxor.cpp 509

    I do not know if these are simple duplicates or another condition must be written. Difficult to assume that it should really be here.

    There is code that is simply potentially dangerous.


    This code will work as long as the two enum have a similar structure.
    enum LegalizeAction {
      Legal,      // The target natively supports this operation.
      Promote,    // This operation should be executed in a larger type.
      Expand,     // Try to expand this to other ops, otherwise use a libcall.
      Custom      // Use the LowerOperation hook to implement custom lowering.
    };
    enum LegalizeTypeAction {
      TypeLegal,           // The target natively supports this type.
      TypePromoteInteger,  // Replace this integer with a larger one.
      TypeExpandInteger,   // Split this integer into two of half the size.
      ...
    };
    LegalizeTypeAction getTypeAction(LLVMContext &Context, EVT VT) const;
    EVT getTypeToExpandTo(LLVMContext &Context, EVT VT) const {
      ...
      switch (getTypeAction(Context, VT)) {
      case Legal:
        return VT;
      case Expand:
      ...
    }

    PVS-Studio diagnostics:

    V556 The values ​​of different enum types are compared: switch (ENUM_TYPE_A) {case ENUM_TYPE_B: ...}. LLVMAsmPrinter targetlowering.h 268

    V556 The values ​​of different enum types are compared: switch (ENUM_TYPE_A) {case ENUM_TYPE_B: ...}. LLVMAsmPrinter targetlowering.h 270

    Having TypeLegal is consonant with Legal, and the name TypeExpandInteger is consonant with Expand. This was the reason for the typo. The code only works because it’s lucky and the values ​​of these names match.

    Conclusion


    Scary when errors are in the compiler? :)

    PS


    I think I hastened to praise Clang. Just stumbled upon a situation where during preprocessing it spoils the code. We have this fragment in atlcore.h:
    ATLASSUME(p != NULL); // Too expensive to check separately here
    if (*p == '\0')  // ::CharNextA won't increment if we're at a \0 already
      return const_cast<_CharType*>(p+1);

    The Clang preprocessor turns this into:
    do { ((void)0);
    #pragma warning(push)
    #pragma warning(disable : 4548)
     do {__noop(!!(p != 0));} while((0,0)
    #pragma warning(pop)
     ); } while(0); // Too expensive to check separately here if (*p == '\0') // ::CharNextA won't increment if we're at a \0 already
      return const_cast<_CharType*>(p+1);

    He placed the 'if' statement after the comment and it turned out that "if (* p == '\ 0')" is now also a comment. As a result, we have an incorrect code. Eh, there is no happiness in the life of programmers.

    Additional links.


    1. Wikipedia Clang. http://www.viva64.com/go.php?url=713
    2. VivaCore library. http://www.viva64.com/en/vivacore-library/

    Also popular now: