Find bugs in LLVM 8 using the PVS-Studio analyzer

    PVS-Studio and LLVM 8.0.0

    More than two years have passed since the last verification of the LLVM project code using our PVS-Studio analyzer. Let's make sure that the PVS-Studio analyzer is still the leading tool for detecting errors and potential vulnerabilities. To do this, check and find new errors in the LLVM 8.0.0 release.

    Article to be written


    To be honest, I did not want to write this article. It is not interesting to write about a project that we have already repeatedly tested ( 1 , 2 , 3 ). It’s better to write about something new, but I have no choice.

    Every time a new version of LLVM is released or Clang Static Analyzer is updated , questions of the following type appear in our mail:

    Look, the new version of Clang Static Analyzer has learned to find new errors! It seems to me that the relevance of using PVS-Studio is decreasing. Clang finds more errors than before and catches up with the capabilities of PVS-Studio. What do you think about this?

    To this I always want to answer something in the spirit:

    We, too, are not sitting idle!We have significantly improved the capabilities of the PVS-Studio analyzer. So don’t worry, we continue to lead, as before.

    Unfortunately, this is a bad answer. There are no proofs in it. And that is why I am writing this article now. So, the LLVM project has once again been tested and a variety of errors have been found in it. Those that seemed interesting to me, I will now demonstrate. These errors cannot be found by the Clang Static Analyzer (or it is extremely inconvenient to do with it). And we can. And I found and wrote out all these errors in one evening.

    But the writing of the article dragged on for several weeks. I couldn’t force myself to arrange all this in the form of text :).

    By the way, if you are interested in what technologies are used in the PVS-Studio analyzer to detect errors and potential vulnerabilities, then I suggest that you familiarize yourself with this note .

    New and old diagnostics


    As already noted, about two years ago, the LLVM project was once again verified, and the errors found were corrected. Now in this article a new portion of errors will be presented. Why were new bugs found? There are 3 reasons for this:

    1. The LLVM project develops, the old code changes in it, and a new one appears. Naturally, there are new errors in the modified and written code. This demonstrates well that static analysis should be applied regularly, and not from case to case. Our articles show the capabilities of the PVS-Studio analyzer well, but this has nothing to do with improving the quality of the code and reducing the cost of fixing errors. Use a static code analyzer regularly!
    2. We are finalizing and improving existing diagnostics. Therefore, the analyzer can detect errors that were not noticed during previous checks.
    3. PVS-Studio introduced new diagnostics, which were not 2 years ago. I decided to separate them into a separate section to clearly demonstrate the development of PVS-Studio.

    Defects identified by diagnostics that existed 2 years ago


    Fragment N1: Copy-Paste

    static bool ShouldUpgradeX86Intrinsic(Function *F, StringRef Name) {
      if (Name == "addcarryx.u32" || // Added in 8.0
        ....
        Name == "avx512.mask.cvtps2pd.128" || // Added in 7.0
        Name == "avx512.mask.cvtps2pd.256" || // Added in 7.0
        Name == "avx512.cvtusi2sd" || // Added in 7.0
        Name.startswith("avx512.mask.permvar.") || // Added in 7.0     // <=
        Name.startswith("avx512.mask.permvar.") || // Added in 7.0     // <=
        Name == "sse2.pmulu.dq" || // Added in 7.0
        Name == "sse41.pmuldq" || // Added in 7.0
        Name == "avx2.pmulu.dq" || // Added in 7.0
      ....
    }

    PVS-Studio Warning: V501 [CWE-570] There are identical sub-expressions 'Name.startswith ("avx512.mask.permvar.")' To the left and to the right of the '||' operator. AutoUpgrade.cpp 73 It is

    double checked that the name begins with the substring “avx512.mask.permvar.”. In the second test, they clearly wanted to write something else, but forgot to fix the copied text.

    Fragment N2: Typo

    enum CXNameRefFlags {
      CXNameRange_WantQualifier = 0x1,
      CXNameRange_WantTemplateArgs = 0x2,
      CXNameRange_WantSinglePiece = 0x4
    };
    void AnnotateTokensWorker::HandlePostPonedChildCursor(
        CXCursor Cursor, unsigned StartTokenIndex) {
      const auto flags = CXNameRange_WantQualifier | CXNameRange_WantQualifier;
      ....
    }

    PVS-Studio Warning: V501 There are identical sub-expressions 'CXNameRange_WantQualifier' to the left and to the right of the '|' operator. CIndex.cpp 7245

    Because of a typo, the same named constant CXNameRange_WantQualifier is used twice .

    Snippet N3: Confusion over Operator Priorities

    int PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
      ....
      if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian() ? 1 : 0)
        return 0;
      ....
    }

    PVS-Studio Warning: V502 [CWE-783] Perhaps the '?:' Operator works in a different way than it was expected. The '?:' Operator has a lower priority than the '==' operator. PPCTargetTransformInfo.cpp 404

    In my opinion, this is a very beautiful mistake. Yes, I know that I have strange ideas about beauty :).

    Now, according to the priorities of the operators , the expression is calculated as follows:

    (ISD == ISD::EXTRACT_VECTOR_ELT && (Index == ST->isLittleEndian())) ? 1 : 0

    From a practical point of view, such a condition does not make sense, since it can be reduced to:

    (ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian())

    This is a clear mistake. Most likely, 0/1 wanted to compare with the Index variable . To fix the code, add brackets around the ternary operator:

    if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == (ST->isLittleEndian() ? 1 : 0))

    By the way, the ternary operator is very dangerous and provokes logical errors. Be very careful with it and do not be greedy to put parentheses. I considered this topic in more detail here in the chapter “Fear the operator ?: and enclose it in parentheses”.

    Fragment N4, N5: Null pointer

    Init *TGParser::ParseValue(Record *CurRec, RecTy *ItemType, IDParseMode Mode) {
      ....
      TypedInit *LHS = dyn_cast(Result);
      ....
      LHS = dyn_cast(
        UnOpInit::get(UnOpInit::CAST, LHS, StringRecTy::get())
          ->Fold(CurRec));
      if (!LHS) {
        Error(PasteLoc, Twine("can't cast '") + LHS->getAsString() +
                        "' to string");
        return nullptr;
      }
      ....
    }

    PVS-Studio Warning: V522 [CWE-476] Dereferencing of the null pointer 'LHS' might take place. TGParser.cpp 2152

    If the LHS pointer is null, a warning should be issued. However, instead this will dereference this very null pointer: LHS-> getAsString () .

    This is a very typical situation when the error is hidden in the error handler, since no one is testing them. Static analyzers check all reachable code, no matter how often it is used. This is a very good example of how static analysis complements other testing and error protection techniques.

    A similar error processing the RHS pointerallowed in the code below: V522 [CWE-476] Dereferencing of the null pointer 'RHS' might take place. TGParser.cpp 2186

    Snippet N6: Using the pointer after moving

    static Expected
    ExtractBlocks(....)
    {
      ....
      std::unique_ptr ProgClone = CloneModule(BD.getProgram(), VMap);
      ....
      BD.setNewProgram(std::move(ProgClone));                                // <=
      MiscompiledFunctions.clear();
      for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {
        Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first);  // <=
        assert(NewF && "Function not found??");
        MiscompiledFunctions.push_back(NewF);
      }
      ....
    }

    PVS-Studio Warning: V522 [CWE-476] Dereferencing of the null pointer 'ProgClone' might take place. Miscompilation.cpp 601

    At the beginning, the smart pointer ProgClone ceases to own the object:

    BD.setNewProgram(std::move(ProgClone));

    In fact, now ProgClone is a null pointer. Therefore, a dereferencing of the null pointer should occur just below:

    Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first);

    But, in fact, this will not happen! Note that the loop is not actually running.

    At the beginning, the MiscompiledFunctions container is cleared:

    MiscompiledFunctions.clear();

    Next, the size of this container is used in the loop condition:

    for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {

    It is easy to see that the loop does not start. I think this is also a mistake, and the code should be written differently.

    It seems that we met that very famous parity of errors! One mistake disguises another :).

    Fragment N7: Using the cursor after moving

    static Expected TestOptimizer(BugDriver &BD, std::unique_ptr Test,
                                        std::unique_ptr Safe) {
      outs() << "  Optimizing functions being tested: ";
      std::unique_ptr Optimized =
          BD.runPassesOn(Test.get(), BD.getPassesToRun());
      if (!Optimized) {
        errs() << " Error running this sequence of passes"
               << " on the input program!\n";
        BD.setNewProgram(std::move(Test));                       // <=
        BD.EmitProgressBitcode(*Test, "pass-error", false);      // <=
        if (Error E = BD.debugOptimizerCrash())
          return std::move(E);
        return false;
      }
      ....
    }

    PVS-Studio Warning: V522 [CWE-476] Dereferencing of the null pointer 'Test' might take place. Miscompilation.cpp 709

    Again the same situation. At the beginning, the contents of the object are moved, and then it is used as if nothing had happened. I increasingly see this situation in program code, after the semantics of movement appeared in C ++. For this I love the C ++ language! There are more and more new ways to shoot your own leg. PVS-Studio analyzer will always work :).

    Fragment N8: Null Pointer

    void FunctionDumper::dump(const PDBSymbolTypeFunctionArg &Symbol) {
      uint32_t TypeId = Symbol.getTypeId();
      auto Type = Symbol.getSession().getSymbolById(TypeId);
      if (Type)
        Printer << "";
      else
        Type->dump(*this);
    }

    PVS-Studio Warning: V522 [CWE-476] Dereferencing of the null pointer 'Type' might take place. PrettyFunctionDumper.cpp 233

    In addition to error handlers, functions for debugging printouts are usually not tested. Before us is just such a case. The function is waiting for a user who, instead of solving his problems, will be forced to fix it.

    Right:

    if (Type)
      Type->dump(*this);
    else
      Printer << "";

    Fragment N9: Null Pointer

    void SearchableTableEmitter::collectTableEntries(
        GenericTable &Table, const std::vector &Items) {
      ....
      RecTy *Ty = resolveTypes(Field.RecType, TI->getType());
      if (!Ty)                                                              // <=
        PrintFatalError(Twine("Field '") + Field.Name + "' of table '" +
                        Table.Name + "' has incompatible type: " +
                        Ty->getAsString() + " vs. " +                       // <=
                        TI->getType()->getAsString());
       ....
    }

    PVS-Studio Warning: V522 [CWE-476] Dereferencing of the null pointer 'Ty' might take place. SearchableTableEmitter.cpp 614

    I think everything is clear and requires no explanation.

    Fragment N10: Typo

    bool FormatTokenLexer::tryMergeCSharpNullConditionals() {
      ....
      auto &Identifier = *(Tokens.end() - 2);
      auto &Question = *(Tokens.end() - 1);
      ....
      Identifier->ColumnWidth += Question->ColumnWidth;
      Identifier->Type = Identifier->Type;                    // <=
      Tokens.erase(Tokens.end() - 1);
      return true;
    }

    PVS-Studio Warning: V570 The 'Identifier-> Type' variable is assigned to itself. FormatTokenLexer.cpp 249

    It makes no sense to assign a variable to itself. Most likely they wanted to write:

    Identifier->Type = Question->Type;

    Fragment N11: Suspicious break
    void SystemZOperand::print(raw_ostream &OS) const {
      switch (Kind) {
        break;
      case KindToken:
        OS << "Token:" << getToken();
        break;
      case KindReg:
        OS << "Reg:" << SystemZInstPrinter::getRegisterName(getReg());
        break;
      ....
    }

    PVS-Studio Warning: V622 [CWE-478] Consider inspecting the 'switch' statement. It's possible that the first 'case' operator is missing. SystemZAsmParser.cpp 652

    In the beginning there is a very suspicious break statement . Have you forgotten to write something else here?

    Fragment N12: Checking the pointer after dereferencing

    InlineCost AMDGPUInliner::getInlineCost(CallSite CS) {
      Function *Callee = CS.getCalledFunction();
      Function *Caller = CS.getCaller();
      TargetTransformInfo &TTI = TTIWP->getTTI(*Callee);
      if (!Callee || Callee->isDeclaration())
        return llvm::InlineCost::getNever("undefined callee");
      ....
    }

    PVS-Studio Warning: V595 [CWE-476] The 'Callee' pointer was utilized before it was verified against nullptr. Check lines: 172, 174. AMDGPUInline.cpp 172

    The Callee pointer at the beginning is dereferenced at the time the getTTI function is called .

    And then it turns out that this pointer should be checked for nullptr equality :

    if (!Callee || Callee->isDeclaration())

    But it's already late ...

    Fragment N13 - N ...: Checking the pointer after dereferencing

    The situation discussed in the previous code fragment is not unique. She is found here:

    static Value *optimizeDoubleFP(CallInst *CI, IRBuilder<> &B,
                                   bool isBinary, bool isPrecise = false) {
      ....
      Function *CalleeFn = CI->getCalledFunction();
      StringRef CalleeNm = CalleeFn->getName();                 // <=
      AttributeList CalleeAt = CalleeFn->getAttributes();
      if (CalleeFn && !CalleeFn->isIntrinsic()) {               // <=
      ....
    }

    PVS-Studio Warning: V595 [CWE-476] The 'CalleeFn' pointer was utilized before it was verified against nullptr. Check lines: 1079, 1081. SimplifyLibCalls.cpp 1079

    And here:

    void Sema::InstantiateAttrs(const MultiLevelTemplateArgumentList &TemplateArgs,
                                const Decl *Tmpl, Decl *New,
                                LateInstantiatedAttrVec *LateAttrs,
                                LocalInstantiationScope *OuterMostScope) {
      ....
      NamedDecl *ND = dyn_cast(New);
      CXXRecordDecl *ThisContext =
        dyn_cast_or_null(ND->getDeclContext());         // <=
      CXXThisScopeRAII ThisScope(*this, ThisContext, Qualifiers(),
                                 ND && ND->isCXXInstanceMember());     // <=
      ....
    }

    PVS-Studio Warning: V595 [CWE-476] The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 532, 534. SemaTemplateInstantiateDecl.cpp 532

    And here:

    • V595 [CWE-476] The 'U' pointer was utilized before it was verified against nullptr. Check lines: 404, 407. DWARFFormValue.cpp 404
    • V595 [CWE-476] The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 2149, 2151. SemaTemplateInstantiate.cpp 2149

    And then it became of no interest to me to study the warnings with the number V595. So I don’t know if there are any similar errors besides those listed here. Most likely there is.

    Fragment N17, N18: Suspicious shift

    static inline bool processLogicalImmediate(uint64_t Imm, unsigned RegSize,
                                               uint64_t &Encoding) {
      ....
      unsigned Size = RegSize;
      ....
      uint64_t NImms = ~(Size-1) << 1;
      ....
    }

    Предупреждение PVS-Studio: V629 [CWE-190] Consider inspecting the '~(Size — 1) << 1' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 260

    Возможно, это не ошибка, и код работает именно так, как и задумано. Но это явно очень подозрительное место, и его нужно проверить.

    Допустим, переменная Size равна 16, и тогда автор кода планировал получить в переменной NImms значение:

    1111111111111111111111111111111111111111111111111111111111100000

    Однако, на самом деле, получится значение:

    0000000000000000000000000000000011111111111111111111111111100000

    The fact is that all calculations occur using the 32-bit unsigned type. And only then, this 32-bit unsigned type will be implicitly expanded to uint64_t . In this case, the most significant bits will be zero.

    You can fix the situation like this:

    uint64_t NImms = ~static_cast(Size-1) << 1;

    Similar situation: V629 [CWE-190] Consider inspecting the 'Immr << 6' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 269

    Snippet N19: Missing else keyword ?

    void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands) {
      ....
      if (Op.isReg() && Op.Reg.RegNo == AMDGPU::VCC) {
        // VOP2b (v_add_u32, v_sub_u32 ...) dpp use "vcc" token.
        // Skip it.
        continue;
      } if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {    // <=
        Op.addRegWithFPInputModsOperands(Inst, 2);
      } else if (Op.isDPPCtrl()) {
        Op.addImmOperands(Inst, 1);
      } else if (Op.isImm()) {
        // Handle optional arguments
        OptionalIdx[Op.getImmTy()] = I;
      } else {
        llvm_unreachable("Invalid operand type");
      }
      ....
    }

    PVS-Studio Warning: V646 [CWE-670] Consider inspecting the application's logic. It's possible that 'else' keyword is missing. AMDGPUAsmParser.cpp 5655

    No errors here. Since the then-block of the first if ends with continue , it doesn’t matter if there is an else keyword or not. In any case, the code will work the same. However, missing an else makes the code more obscure and dangerous. If in the future continue disappears, then the code will start to work in a completely different way. In my opinion, it is better to add else .

    Fragment N20: Four typos of the same type

    LLVM_DUMP_METHOD void Symbol::dump(raw_ostream &OS) const {
      std::string Result;
      if (isUndefined())
        Result += "(undef) ";
      if (isWeakDefined())
        Result += "(weak-def) ";
      if (isWeakReferenced())
        Result += "(weak-ref) ";
      if (isThreadLocalValue())
        Result += "(tlv) ";
      switch (Kind) {
      case SymbolKind::GlobalSymbol:
        Result + Name.str();                        // <=
        break;
      case SymbolKind::ObjectiveCClass:
        Result + "(ObjC Class) " + Name.str();      // <=
        break;
      case SymbolKind::ObjectiveCClassEHType:
        Result + "(ObjC Class EH) " + Name.str();   // <=
        break;
      case SymbolKind::ObjectiveCInstanceVariable:
        Result + "(ObjC IVar) " + Name.str();       // <=
        break;
      }
      OS << Result;
    }

    PVS-Studio warnings:

    • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + Name.str ()' expression. Symbol.cpp 32
    • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC Class)" + Name.str ()' expression. Symbol.cpp 35
    • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC Class EH)" + Name.str ()' expression. Symbol.cpp 38
    • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC IVar)" + Name.str ()' expression. Symbol.cpp 41

    By chance, the + operator is used instead of the + = operator. The result is meaningless designs.

    Fragment N21: Undefined behavior

    static void getReqFeatures(std::map &FeaturesMap,
                               const std::vector &ReqFeatures) {
      for (auto &R : ReqFeatures) {
        StringRef AsmCondString = R->getValueAsString("AssemblerCondString");
        SmallVector Ops;
        SplitString(AsmCondString, Ops, ",");
        assert(!Ops.empty() && "AssemblerCondString cannot be empty");
        for (auto &Op : Ops) {
          assert(!Op.empty() && "Empty operator");
          if (FeaturesMap.find(Op) == FeaturesMap.end())
            FeaturesMap[Op] = FeaturesMap.size();
        }
      }
    }

    Try to find the dangerous code yourself. And this is a picture for distraction, so as not to immediately look at the answer:

    Hmmm ...


    PVS-Studio Warning: V708 [CWE-758] Dangerous construction is used: 'FeaturesMap [Op] = FeaturesMap.size ()', where 'FeaturesMap' is of 'map' class. This may lead to undefined behavior. RISCVCompressInstEmitter.cpp 490

    Problem line:

    FeaturesMap[Op] = FeaturesMap.size();

    If the Op element is not found, a new element is created in the map and the number of elements in this map is written there. It's just not known whether the size function will be called before or after adding a new element.

    Fragment N22-N24: Reassignments

    Error MachOObjectFile::checkSymbolTable() const {
      ....
      } else {
        MachO::nlist STE = getSymbolTableEntry(SymDRI);
        NType = STE.n_type;                              // <=
        NType = STE.n_type;                              // <=
        NSect = STE.n_sect;
        NDesc = STE.n_desc;
        NStrx = STE.n_strx;
        NValue = STE.n_value;
      }
      ....
    }

    PVS-Studio Warning: V519 [CWE-563] The 'NType' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 1663, 1664. MachOObjectFile.cpp 1664

    I think there is no real error here. Just superfluous repeated assignment. But still a blunder.

    Similarly:

    • V519 [CWE-563] The 'B.NDesc' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 1488, 1489. llvm-nm.cpp 1489
    • V519 [CWE-563] The variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 59, 61. coff2yaml.cpp 61

    Fragment N25-N27: More

    reassignments Now let's look at a slightly different reassignment option.

    bool Vectorizer::vectorizeLoadChain(
        ArrayRef Chain,
        SmallPtrSet *InstructionsProcessed) {
      ....
      unsigned Alignment = getAlignment(L0);
      ....
      unsigned NewAlign = getOrEnforceKnownAlignment(L0->getPointerOperand(),
                                                     StackAdjustedAlignment,
                                                     DL, L0, nullptr, &DT);
      if (NewAlign != 0)
        Alignment = NewAlign;
      Alignment = NewAlign;
      ....
    }

    PVS-Studio Warning: V519 [CWE-563] The 'Alignment' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 1158, 1160. LoadStoreVectorizer.cpp 1160

    This is a very strange code that seems to contain a logical error. At the beginning, the Alignment variable is assigned a value depending on the condition. And then the assignment occurs again, but now without any verification.

    Similar situations can be seen here:

    • V519 [CWE-563] The 'Effects' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 152, 165. WebAssemblyRegStackify.cpp 165
    • V519 [CWE-563] The 'ExpectNoDerefChunk' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 4970, 4973. SemaType.cpp 4973

    Fragment N28: Always a true condition

    static int readPrefixes(struct InternalInstruction* insn) {
      ....
      uint8_t byte = 0;
      uint8_t nextByte;
      ....
      if (byte == 0xf3 && (nextByte == 0x88 || nextByte == 0x89 ||
                           nextByte == 0xc6 || nextByte == 0xc7)) {
        insn->xAcquireRelease = true;
        if (nextByte != 0x90) // PAUSE instruction support             // <=
          break;
      }
      ....
    }

    PVS-Studio warning : V547 [CWE-571] Expression 'nextByte! = 0x90' is always true. X86DisassemblerDecoder.cpp 379

    Verification does not make sense. The nextByte variable is always not equal to 0x90 , which follows from the previous check. This is some kind of logical mistake.

    Fragment N29 - N ...: Always True / False Conditions

    The analyzer gives a lot of warnings that the whole condition ( V547 ) or part of it ( V560) is always true or false. Often these are not real mistakes, but simply inaccurate code, the result of the deployment of macros and the like. Nevertheless, it makes sense to look at all these warnings, as from time to time there are real logical errors. For example, this piece of code is suspicious:

    static DecodeStatus DecodeGPRPairRegisterClass(MCInst &Inst, unsigned RegNo,
                                       uint64_t Address, const void *Decoder) {
      DecodeStatus S = MCDisassembler::Success;
      if (RegNo > 13)
        return MCDisassembler::Fail;
      if ((RegNo & 1) || RegNo == 0xe)
         S = MCDisassembler::SoftFail;
      ....
    }

    PVS-Studio Warning: V560 [CWE-570] A part of conditional expression is always false: RegNo == 0xe. ARMDisassembler.cpp 939 The

    constant 0xE is the value 14 in decimal. Checking RegNo == 0xe does not make sense, because if RegNo> 13 , then the function will complete its execution.

    There were many other warnings with identifiers V547 and V560, but, as with V595 , I was not interested in studying these warnings. It was already clear that I had enough material to write an article :). Therefore, it is not known how many errors of this type can be detected in LLVM using PVS-Studio.

    I will give an example why it is boring to study these responses. The analyzer is absolutely right in issuing a warning for the following code. But this is not a mistake.

    bool UnwrappedLineParser::parseBracedList(bool ContinueOnSemicolons,
                                              tok::TokenKind ClosingBraceKind) {
      bool HasError = false;
      ....
      HasError = true;
      if (!ContinueOnSemicolons)
        return !HasError;
      ....
    }

    PVS-Studio warning: V547 [CWE-570] Expression '! HasError' is always false. UnwrappedLineParser.cpp 1635

    Fragment N30: ​​Suspicious return

    static bool
    isImplicitlyDef(MachineRegisterInfo &MRI, unsigned Reg) {
      for (MachineRegisterInfo::def_instr_iterator It = MRI.def_instr_begin(Reg),
          E = MRI.def_instr_end(); It != E; ++It) {
        return (*It).isImplicitDef();
      }
      ....
    }

    PVS-Studio Warning: V612 [CWE-670] An unconditional 'return' within a loop. R600OptimizeVectorRegisters.cpp 63

    This is either a mistake, or a specific technique, which is intended to explain something to programmers reading the code. This design does not explain anything to me and looks very suspicious. It’s better not to write like that :).

    Are you tired? Then time to make tea or coffee.

    Coffee


    Defects detected by new diagnostics


    I think 30 triggering of old diagnostics is enough. Let us now see what is interesting can be found with new diagnostics that appeared in the analyzer after a previous check. In total, 66 general purpose diagnostics were added to the C ++ analyzer during this time.

    Fragment N31: Unreachable code

    Error CtorDtorRunner::run() {
      ....
      if (auto CtorDtorMap =
              ES.lookup(JITDylibSearchList({{&JD, true}}), std::move(Names),
                        NoDependenciesToRegister, true))
      {
        ....
        return Error::success();
      } else
        return CtorDtorMap.takeError();
      CtorDtorsByPriority.clear();
      return Error::success();
    }

    PVS-Studio Warning: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. ExecutionUtils.cpp 146

    As you can see, both branches of the if statement end with a call to the return statement . Accordingly, the CtorDtorsByPriority container will never be emptied .

    Snippet N32: Unreachable Code

    bool LLParser::ParseSummaryEntry() {
      ....
      switch (Lex.getKind()) {
      case lltok::kw_gv:
        return ParseGVEntry(SummaryID);
      case lltok::kw_module:
        return ParseModuleEntry(SummaryID);
      case lltok::kw_typeid:
        return ParseTypeIdEntry(SummaryID);                        // <=
        break;                                                     // <=
      default:
        return Error(Lex.getLoc(), "unexpected summary kind");
      }
      Lex.setIgnoreColonInIdentifiers(false);                      // <=
      return false;
    }

    PVS-Studio Warning: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. LLParser.cpp 835

    An interesting situation. Let's look at the beginning of this place:

    return ParseTypeIdEntry(SummaryID);
    break;

    At first glance, it seems that there is no mistake. It seems that the break statement is superfluous here, and you can simply delete it. However, not all so simple.

    The analyzer generates a warning on the lines:

    Lex.setIgnoreColonInIdentifiers(false);
    return false;

    Indeed, this code is unreachable. All cases in switch end with a call to the return statement . And now the pointless lone break doesn't look so harmless! Perhaps one of the branches should end with break , and not with return ?

    Fragment N33: Accidentalization of high bits

    unsigned getStubAlignment() override {
      if (Arch == Triple::systemz)
        return 8;
      else
        return 1;
    }
    Expected
    RuntimeDyldImpl::emitSection(const ObjectFile &Obj,
                                 const SectionRef &Section,
                                 bool IsCode) {
      ....
      uint64_t DataSize = Section.getSize();
      ....
      if (StubBufSize > 0)
        DataSize &= ~(getStubAlignment() - 1);
      ....
    }

    PVS-Studio Warning: V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. RuntimeDyld.cpp 815

    Note that the getStubAlignment function returns an unsigned type . We calculate the value of the expression if we assume that the function returns the value 8:

    ~ (getStubAlignment () - 1)

    ~ (8u-1)

    0xFFFFFFF8u

    Now notice that the DataSize variable has a 64-bit unsigned type. It turns out that during the operation DataSize & 0xFFFFFFF88, all thirty-two high-order bits will be reset. Most likely, this is not what the programmer wanted. I suspect that he wanted to calculate: DataSize & 0xFFFFFFFFFFFFFFFFF8u.

    To fix the error, you should write like this:

    DataSize &= ~(static_cast(getStubAlignment()) - 1);

    Or so:

    DataSize &= ~(getStubAlignment() - 1ULL);

    Fragment N34: Failed explicit cast

    template 
    void scaleShuffleMask(int Scale, ArrayRef Mask,
                          SmallVectorImpl &ScaledMask) {
      assert(0 < Scale && "Unexpected scaling factor");
      int NumElts = Mask.size();
      ScaledMask.assign(static_cast(NumElts * Scale), -1);
      ....
    }

    PVS-Studio Warning: V1028 [CWE-190] Possible overflow. Consider casting operands of the 'NumElts * Scale' operator to the 'size_t' type, not the result. X86ISelLowering.h 1577

    Explicit type conversion is used to prevent overflow when multiplying variables of type int . However, explicit casting here does not protect against overflow. At the beginning, the variables will be multiplied, and only then will the 32-bit multiplication result be expanded to the size_t type .

    Fragment N35: Unsuccessful Copy-Paste

    Instruction *InstCombiner::visitFCmpInst(FCmpInst &I) {
      ....
      if (!match(Op0, m_PosZeroFP()) && isKnownNeverNaN(Op0, &TLI)) {
        I.setOperand(0, ConstantFP::getNullValue(Op0->getType()));
        return &I;
      }
      if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
        I.setOperand(1, ConstantFP::getNullValue(Op0->getType()));        // <=
        return &I;
      }
      ....
    }

    V778 [CWE-682] Two similar code fragments were found. Perhaps, this is a typo and 'Op1' variable should be used instead of 'Op0'. InstCombineCompares.cpp 5507

    This new interesting diagnostics reveals situations when a piece of code was copied and some names began to change in it, but were not corrected in one place.

    Please note that in the second block they changed Op0 to Op1 . But in one place they didn’t fix it. Most likely, it should have been written like this:

    if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
      I.setOperand(1, ConstantFP::getNullValue(Op1->getType()));
      return &I;
    }

    Fragment N36: Confusion in variables

    struct Status {
      unsigned Mask;
      unsigned Mode;
      Status() : Mask(0), Mode(0){};
      Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
        Mode &= Mask;
      };
      ....
    };

    PVS-Studio Warning: V1001 [CWE-563] The 'Mode' variable is assigned but is not used by the end of the function. SIModeRegister.cpp 48

    It is very dangerous to give function arguments the same names as class members. Very easy to get confused. Before us is just such a case. This expression does not make sense:

    Mode &= Mask;

    The argument of the function changes. And that’s it. This argument is no longer used. Most likely, it was necessary to write like this:

    Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
      this->Mode &= Mask;
    };

    Fragment N37: Confusion in variables

    class SectionBase {
      ....
      uint64_t Size = 0;
      ....
    };
    class SymbolTableSection : public SectionBase {
      ....
    };
    void SymbolTableSection::addSymbol(Twine Name, uint8_t Bind, uint8_t Type,
                                       SectionBase *DefinedIn, uint64_t Value,
                                       uint8_t Visibility, uint16_t Shndx,
                                       uint64_t Size) {
      ....
      Sym.Value = Value;
      Sym.Visibility = Visibility;
      Sym.Size = Size;
      Sym.Index = Symbols.size();
      Symbols.emplace_back(llvm::make_unique(Sym));
      Size += this->EntrySize;
    }

    PVS-Studio Warning: V1001 [CWE-563] The 'Size' variable is assigned but is not used by the end of the function. Object.cpp 424

    The situation is similar to the previous one. It should be written:

    this->Size += this->EntrySize;

    Fragment N38-N47: Pointer forgot to check.

    Earlier, we examined examples of V595 diagnostic triggering . Its essence is that the pointer is dereferenced at the beginning, and only then checked. The young diagnostics of the V1004 is the reverse of its meaning, but it also detects a lot of errors. It identifies situations where the pointer was checked at the beginning, and then forgot to do it. Consider such cases found inside LLVM.

    int getGEPCost(Type *PointeeType, const Value *Ptr,
                   ArrayRef Operands) {
      ....
      if (Ptr != nullptr) {                                            // <=
        assert(....);
        BaseGV = dyn_cast(Ptr->stripPointerCasts());
      }
      bool HasBaseReg = (BaseGV == nullptr);
      auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType());  // <=
      ....
    }

    PVS-Studio Warning: V1004 [CWE-476] The 'Ptr' pointer was used unsafely after it was verified against nullptr. Check lines: 729, 738. TargetTransformInfoImpl.h 738

    The Ptr variable can be nullptr , as evidenced by the check:

    if (Ptr != nullptr)

    However, below this pointer is dereferenced without prior verification:

    auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType());

    Consider another similar case.

    llvm::DISubprogram *CGDebugInfo::getFunctionFwdDeclOrStub(GlobalDecl GD,
                                                              bool Stub) {
      ....
      auto *FD = dyn_cast(GD.getDecl());
      SmallVector ArgTypes;
      if (FD)                                                                // <=
        for (const ParmVarDecl *Parm : FD->parameters())
          ArgTypes.push_back(Parm->getType());
      CallingConv CC = FD->getType()->castAs()->getCallConv(); // <=
      ....
    }

    PVS-Studio Warning: V1004 [CWE-476] The 'FD' pointer was used unsafely after it was verified against nullptr. Check lines: 3228, 3231. CGDebugInfo.cpp 3231

    Pay attention to the pointer FD . I am sure the problem is clearly visible, and no special explanation is required.

    And further:

    static void computePolynomialFromPointer(Value &Ptr, Polynomial &Result,
                                             Value *&BasePtr,
                                             const DataLayout &DL) {
      PointerType *PtrTy = dyn_cast(Ptr.getType());
      if (!PtrTy) {                                                   // <=
        Result = Polynomial();
        BasePtr = nullptr;
      }
      unsigned PointerBits =
          DL.getIndexSizeInBits(PtrTy->getPointerAddressSpace());     // <=
      ....
    }

    PVS-Studio Warning: V1004 [CWE-476] The 'PtrTy' pointer was used unsafely after it was verified against nullptr. Check lines: 960, 965. InterleavedLoadCombinePass.cpp 965

    How to protect yourself from such errors? Be careful on Code-Review and use the PVS-Studio static analyzer to regularly check your code.

    It makes no sense to bring other code fragments with errors of this type. I will leave only a list of warnings in the article:

    • V1004 [CWE-476] The 'Expr' pointer was used unsafely after it was verified against nullptr. Check lines: 1049, 1078. DebugInfoMetadata.cpp 1078
    • V1004 [CWE-476] The 'PI' pointer was used unsafely after it was verified against nullptr. Check lines: 733, 753. LegacyPassManager.cpp 753
    • V1004 [CWE-476] The 'StatepointCall' pointer was used unsafely after it was verified against nullptr. Check lines: 4371, 4379. Verifier.cpp 4379
    • V1004 [CWE-476] The 'RV' pointer was used unsafely after it was verified against nullptr. Check lines: 2263, 2268. TGParser.cpp 2268
    • V1004 [CWE-476] The 'CalleeFn' pointer was used unsafely after it was verified against nullptr. Check lines: 1081, 1096. SimplifyLibCalls.cpp 1096
    • V1004 [CWE-476] The 'TC' pointer was used unsafely after it was verified against nullptr. Check lines: 1819, 1824. Driver.cpp 1824

    Фрагмент N48-N60: Не критично, но дефект (возможна утечка памяти)

    std::unique_ptr createISelMutator() {
      ....
      std::vector> Strategies;
      Strategies.emplace_back(
          new InjectorIRStrategy(InjectorIRStrategy::getDefaultOps()));
      ....
    }

    PVS-Studio Warning: V1023 [CWE-460] A pointer without owner is added to the 'Strategies' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-isel-fuzzer.cpp 58

    To add an item to the end of a container of type std :: vector>you can’t just write xxx.push_back (new X) , since there is no implicit conversion from X * to std :: unique_ptr.

    A common solution is to write xxx.emplace_back (new X) , because it compiles: the emplace_back method constructs the element directly from the arguments and therefore can use explicit constructors.

    It's not safe. If the vector is full, then memory is allocated. A memory reallocation operation may fail, resulting in a std :: bad_alloc exception being thrown . In this case, the pointer will be lost, and the created object will never be deleted.

    A safe solution is to create unique_ptr , which will own the pointer before the vector tries to reallocate memory:

    xxx.push_back(std::unique_ptr(new X))

    Starting with C ++ 14, you can use 'std :: make_unique':

    xxx.push_back(std::make_unique())

    This type of defect is not critical for LLVM. If memory cannot be allocated, then the compiler will simply stop working. However, for applications with a long uptime that cannot just end if memory allocation fails, this can be a real nasty bug.

    So, although this code does not pose a practical danger for LLVM, I found it useful to talk about this error pattern and that the PVS-Studio analyzer learned to detect it.

    Other warnings of this type:

    • V1023 [CWE-460] A pointer without owner is added to the 'Passes' container by the 'emplace_back' method. A memory leak will occur in case of an exception. PassManager.h 546
    • V1023 [CWE-460] A pointer without owner is added to the 'AAs' container by the 'emplace_back' method. A memory leak will occur in case of an exception. AliasAnalysis.h 324
    • V1023 [CWE-460] A pointer without owner is added to the 'Entries' container by the 'emplace_back' method. A memory leak will occur in case of an exception. DWARFDebugFrame.cpp 519
    • V1023 [CWE-460] A pointer without owner is added to the 'AllEdges' container by the 'emplace_back' method. A memory leak will occur in case of an exception. CFGMST.h 268
    • V1023 [CWE-460] A pointer without owner is added to the 'VMaps' container by the 'emplace_back' method. A memory leak will occur in case of an exception. SimpleLoopUnswitch.cpp 2012
    • V1023 [CWE-460] A pointer without owner is added to the 'Records' container by the 'emplace_back' method. A memory leak will occur in case of an exception. FDRLogBuilder.h 30
    • V1023 [CWE-460] A pointer without owner is added to the 'PendingSubmodules' container by the 'emplace_back' method. A memory leak will occur in case of an exception. ModuleMap.cpp 810
    • V1023 [CWE-460] A pointer without owner is added to the 'Objects' container by the 'emplace_back' method. A memory leak will occur in case of an exception. DebugMap.cpp 88
    • V1023 [CWE-460] A pointer without owner is added to the 'Strategies' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-isel-fuzzer.cpp 60
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 685
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 686
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 688
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 689
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 690
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 691
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 692
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 693
    • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 694
    • V1023 [CWE-460] A pointer without owner is added to the 'Operands' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 1911
    • V1023 [CWE-460] A pointer without owner is added to the 'Stash' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 2100
    • V1023 [CWE-460] A pointer without owner is added to the 'Matchers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 2702

    Заключение


    In total, I wrote out 60 warnings, after which I stopped. Are there any other defects that PVS-Studio analyzer detects in LLVM? Yes there is. However, when I wrote out the code snippets for the article, it was late evening, or rather, even night, and I decided it was time to round off.

    I hope you were interested, and you will want to try the PVS-Studio analyzer.

    You can download the analyzer and get a trial key on this page .

    Most importantly, use static analysis regularly. One-time checks performed by us in order to popularize the methodology of static analysis and PVS-Studio are not a normal scenario.

    Good luck in improving the quality and reliability of the code!



    If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Finding Bugs in LLVM 8 with PVS-Studio .

    Also popular now: