NCBI Genome Workbench: Scientific Research Under Threat

    Modern computer technology, technical and software solutions - all this greatly facilitates and accelerates the conduct of various scientific studies. Often, computer simulation is the only way to test many theories. Scientific software has its own characteristics. For example, such software is often subjected to very thorough testing, but is poorly documented. However, software is written by people, and people make mistakes. Errors in scientific programs can call into question entire studies. This article will cover dozens of problems found in the code of the NCBI Genome Workbench software package.

    Introduction


    NCBI Genome Workbench offers researchers a wide range of tools for studying and analyzing genetic data. Users can explore and compare data from multiple sources, including NCBI (National Center for Biotechnology Information) databases or their own personal data.

    As mentioned earlier, scientific software is usually well covered by unit tests. When checking this project, 85 directories with test files were excluded from the analysis. This is about a thousand files. Probably, this is due to the requirements for testing various complex algorithms that are invented for various studies. But the quality of the rest of the code (not test) is not at such a high level as we would like. However, as in any project that has not yet taken care of the implementation of static code analysis tools :).

    Data for the review (or even research) of the code was provided by a static code analyzer for C / C ++ / C # / Java - PVS-Studio .

    Just two digits that will ruin your project.




    Based on our database of errors, which currently amounts to more than 12 thousand selected examples, we notice and describe special patterns of writing code that lead to numerous errors. For example, we conducted the following studies:

    1. Last line effect ;
    2. The most dangerous feature in the C / C ++ world ;
    3. Logical expressions in C / C ++. How are the professionals mistaken ;
    4. Evil lives in comparison functions .

    This project initiated the description of the new pattern. We are talking about the numbers 1 and 2 in the names of variables, for example, file1 and file2 , etc. It is very easy to confuse two such variables. This is a special case of typos in the code, but one error leads to the appearance of such errors - the desire to work with variables of the same name, differing only in numbers 1 and 2 at the end of the name.

    Looking ahead a bit, I’ll say that all the listed studies were confirmed in the code of this project: D.

    Consider the first example from the Genome Workbench project:

    V501 There are identical sub-expressions '(! Loc1.IsInt () &&! Loc1.IsWhole ())' || operator. nw_aligner.cpp 480

    CRef<CSeq_align> CNWAligner::Run(CScope &scope, const CSeq_loc &loc1,
                                     const CSeq_loc &loc2,
                                     bool trim_end_gaps)
    {
      if ((!loc1.IsInt() && !loc1.IsWhole()) ||
          (!loc1.IsInt() && !loc1.IsWhole()))
      {
        NCBI_THROW(CException, eUnknown,
                   "Only whole and interval locations supported");
      }
      ....
    }

    We see two variables, loc1 and loc2 . As well as errors in the code: variable loc2 not used, because instead of it once again used loc1 .

    Another example:

    V560 A part of conditional expression is always false: s1.IsSet (). valid_biosource.cpp 3073

    staticbools_PCRPrimerSetLess(const CPCRPrimerSet& s1, const CPCRPrimerSet& s2){
      if (!s1.IsSet() && s1.IsSet()) {
        returntrue;
      } elseif (s1.IsSet() && !s2.IsSet()) {
        returnfalse;
      } elseif (!s1.IsSet() && !s2.IsSet()) {
        returnfalse;
      } elseif (s1.Get().size() < s2.Get().size()) {
        returntrue;
      } elseif (s1.Get().size() > s2.Get().size()) {
        returnfalse;
      } else {
      .....
    }

    The first line of the code mixed up the variables s1 and s2 . Based on the name, this is a comparison function. But such an error can be anywhere, because by calling the variables Number 1 and Number 2 , the programmer will almost certainly make a mistake in the future. And the more uses of such names in the function, the higher the probability of making a mistake.

    Other typos and copy-paste




    V501 There are identical sub-expressions to the left and the right! ”= 'Operator: bd.bit_.bits [i]! = Bd.bit_.bits [i] bm.h 296

    boolcompare_state(const iterator_base& ib)const{
      ....
      if (this->block_type_ == 0
      {
        if (bd.bit_.ptr != ib_db.bit_.ptr) returnfalse;
        if (bd.bit_.idx != ib_db.bit_.idx) returnfalse;
        if (bd.bit_.cnt != ib_db.bit_.cnt) returnfalse;
        if (bd.bit_.pos != ib_db.bit_.pos) returnfalse;
        for (unsigned i = 0; i < bd.bit_.cnt; ++i)
        {
          if (bd.bit_.bits[i] != bd.bit_.bits[i]) returnfalse;
        }
      }
      ....
    }

    I suppose that after all the checks, the sizes of the arrays bits of the bd.bit_ and ib_db.bit_ objects are equal. Therefore, the author of the code wrote one cycle for element-by-element comparison of the bits arrays , but made a typo in the name of one of the compared objects. As a result, the compared objects can be mistakenly considered equal in some situations.

    This example is worthy of the article " Evil lives in comparison functions ."

    V501 There are identical sub-expressions 'CFieldHandler :: QualifierNamesAreEquivalent (field, kFieldTypeSeqId)' to the left | operator. field_handler.cpp 152

    bool CFieldHandlerFactory::s_IsSequenceIDField(conststring& field)
    {
      if (   CFieldHandler::QualifierNamesAreEquivalent(field, kFieldTypeSeqId)
          || CFieldHandler::QualifierNamesAreEquivalent(field, kFieldTypeSeqId)) {
        returntrue;
      } else {
        returnfalse;
      }
    }

    Most likely, one of the checks is superfluous. I did not find in the code variables similar to kFieldTypeSeqId . However, there may be an extra function call due to the operator "||", which degrades performance.

    A couple of similar places with a warning analyzer that need to be checked:

    • V501 There are identical sub-expressions "uF-> GetData (). IsBool ()" variation_utils.cpp 1711
    • V501 There are identical sub-expressions "uF-> GetData (). IsBool ()" variation_utils.cpp 1735

    V766 An item with the same key 'kArgRemote' has already been added. blast_args.cpp 3262

    void
    CBlastAppArgs::x_IssueWarningsForIgnoredOptions(const CArgs& args)
    {
      set<string> can_override;
      ....
      can_override.insert(kArgOutputFormat);
      can_override.insert(kArgNumDescriptions);
      can_override.insert(kArgNumAlignments);
      can_override.insert(kArgMaxTargetSequences);
      can_override.insert(kArgRemote);               // <=
      can_override.insert(kArgNumThreads);
      can_override.insert(kArgInputSearchStrategy);
      can_override.insert(kArgRemote);               // <=
      can_override.insert("remote_verbose");
      can_override.insert("verbose");
      ....
    }

    The analyzer detected the addition of 2 identical values ​​to the set container . Recall that this container stores only unique values, so duplicates are not added to it.

    Code like the one above is often written using the copy-paste method. There may just be an extra value, or perhaps the author forgot to rename one of the variables when copied. When removing an extra call, the insert code is slightly optimized, which, however, is not essential. It is much more important that there may be a serious mistake hiding due to a missing element in the set.

    V523 The 'then' statement is the subsequent code fragment. vcf_reader.cpp 1105

    bool
    CVcfReader::xAssignFeatureLocationSet(....)
    {
      ....
      if (data.m_SetType == CVcfData::ST_ALL_DEL) {
        if (data.m_strRef.size() == 1) {
          //deletion of a single base
          pFeat->SetLocation().SetPnt().SetPoint(data.m_iPos-1);
          pFeat->SetLocation().SetPnt().SetId(*pId);
        }
        else {
          pFeat->SetLocation().SetInt().SetFrom(data.m_iPos-1);
          //-1 for 0-based, //another -1 for inclusive end-point ( i.e. [], not [) )
          pFeat->SetLocation().SetInt().SetTo( 
             data.m_iPos -1 + data.m_strRef.length() - 1); 
          pFeat->SetLocation().SetInt().SetId(*pId);
        }
        returntrue;
      }
      //default: For MNV's we will use the single starting point//NB: For references of size >=2, this location will not//match the reference allele.  Future Variation-ref//normalization code will address these issues,//and obviate the need for this code altogether.if (data.m_strRef.size() == 1) {
        //deletion of a single base
        pFeat->SetLocation().SetPnt().SetPoint(data.m_iPos-1);
        pFeat->SetLocation().SetPnt().SetId(*pId);
      }
      else {
        pFeat->SetLocation().SetInt().SetFrom(data.m_iPos-1);
        pFeat->SetLocation().SetInt().SetTo( 
          data.m_iPos -1 + data.m_strRef.length() - 1); 
        pFeat->SetLocation().SetInt().SetId(*pId);
      }
      returntrue;
    }

    The function contains large and completely identical code fragments. However, they contain different accompanying comments. The code is not optimally written, complicated, and possibly contains an error.

    The entire list of suspicious places with an if-else operator looks like this:

    • V523 The 'then' statement is equivalent to the 'else' statement. blk.c 2142
    • V523 The 'then' statement is the subsequent code fragment. odbc.c 379
    • V523 The 'then' statement is the subsequent code fragment. odbc.c 1414
    • V523 The 'then' statement is equivalent to the 'else' statement. seqdbvol.cpp 1922
    • V523 The 'then' statement is equivalent to the 'else' statement. seqdb_demo.cpp 466
    • V523 The 'then' statement is the subsequent code fragment. blast_engine.c 1917
    • V523 The 'then' statement is equivalent to the 'else' statement. blast_filter.c 420
    • V523 The 'then' statement is equivalent to the 'else' statement. blast_parameters.c 636
    • V523 The 'then' statement is equivalent to the 'else' statement. unordered_spliter.cpp 684
    • V523 The 'then' statement is equivalent to the 'else' statement. bme.cpp 333
    • V523 The 'then' statement is equivalent to the 'else' statement. gme.cpp 484

    / * with security is best be pedantic * /



    V597 The compiler couldn’t need the function call, which is used to flush 'passwd_buf' buffer. The memset_s () function should be used to erase the private data. challenge.c 366

    /**
     * Crypt a given password using schema required for NTLMv1 authentication
     * @param passwd clear text domain password
     * @param challenge challenge data given by server
     * @param flags NTLM flags from server side
     * @param answer buffer where to store crypted password
     */voidtds_answer_challenge(....){
    #define MAX_PW_SZ 14
      ....
      if (ntlm_v == 1) {
        ....
        /* with security is best be pedantic */memset(hash, 0, sizeof(hash));
        memset(passwd_buf, 0, sizeof(passwd_buf));
        memset(ntlm2_challenge, 0, sizeof(ntlm2_challenge));
      } else {
        ....
      }
    }

    As you probably already guessed, in the section title an amusing comment about security from the code is used.

    In short, the memset function will be deleted by the compiler, because cleared buffers are no longer used. And data such as hash or passwd_buf , in fact, will not be overwritten with zeros. For more information about this unobvious compiler mechanism, see the article " Safely cleans private data ."

    V597 The compiler couldn’t delete the memset function call, which is used to flush the answer object. The memset_s () function should be used to erase the private data. challenge.c 561

    static TDSRET
    tds7_send_auth(....){
      ....
      /* for security reason clear structure */memset(&answer, 0, sizeof(TDSANSWER));
      return tds_flush_packet(tds);
    }

    That was not the only example with comments about "security." Judging by the comments, we can assume that security is really important for the project. Therefore, I’m enclosing a whole list of problems that are not small:

    • V597 The compiler couldn’t delete the memset function call, which is used to flush the heap object. The memset_s () function should be used to erase the private data. ncbi_heapmgr.c 1300
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the context. The memset_s () function should be used to erase the private data. challenge.c 167
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the ks object. The memset_s () function should be used to erase the private data. challenge.c 339
    • V597 The compiler could delete the memset function call, which is used to flush the md5_ctx object. The memset_s () function should be used to erase the private data. challenge.c 353
    • V597 The compiler couldn’t delete the memset function call, which is used to flush 'hash' buffer. The memset_s () function should be used to erase the private data. challenge.c 365
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the ks object. The memset_s () function should be used to erase the private data. challenge.c 406
    • V597 The compiler couldn’t make it the function call, which is used to flush 'ntlm_v2_response' object. The memset_s () function should be used to erase the private data. login.c 795
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the answer object. The memset_s () function should be used to erase the private data. login.c 801
    • V597 The compiler couldn’t have the function call, which is used to flush 'packet' buffer. The memset_s () function should be used to erase the private data. numeric.c 256
    • V597 The compiler couldn’t have the function call, which is used to flush 'packet' buffer. The memset_s () function should be used to erase the private data. numeric.c 110
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the pwd buffer. The memset_s () function should be used to erase the private data. getpassarg.c 50
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the context. The memset_s () function should be used to erase the private data. challenge.c 188
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the buf buffer. The memset_s () function should be used to erase the private data. challenge.c 243
    • V597 The compiler couldn’t make it the function call, which is used to flush 'ntlm_v2_hash' buffer. The memset_s () function should be used to erase the private data. challenge.c 309
    • V597 The compiler could delete the memset function call, which is used to flush the md5_ctx object. The memset_s () function should be used to erase the private data. challenge.c 354
    • V597 The compiler couldn’t need the function call, which is used to flush 'passwd_buf' buffer. The memset_s () function should be used to erase the private data. challenge.c 380
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the ks object. The memset_s () function should be used to erase the private data. challenge.c 393
    • V597 The compiler couldn’t delete the memset function call, which is used to flush 'hash' buffer. The memset_s () function should be used to erase the private data. challenge.c 394
    • V597 The compiler couldn’t have the function call, which is used to flush the ntlm2_challenge buffer. The memset_s () function should be used to erase the private data. challenge.c 395
    • V597 The compiler couldn’t delete the memset function call, which is used to flush the ks object. The memset_s () function should be used to erase the private data. challenge.c 419
    • V597 The compiler couldn’t make it the function call, which is used to flush 'ntlm_v2_response' object. The memset_s () function should be used to erase the private data. challenge.c 556

    Suspicious cycles




    V534 It is likely that a variable is being compared inside the 'for' operator. Consider reviewing 'i'. taxFormat.cpp 569

    void CTaxFormat::x_LoadTaxTree(void)
    {
      ....
      for(size_t i = 0; i < alignTaxids.size(); i++) {
        int tax_id = alignTaxids[i];
        ....
        for(size_t j = 0; i < taxInfo.seqInfoList.size(); j++) {
          SSeqInfo* seqInfo = taxInfo.seqInfoList[j];
          seqInfo->taxid = newTaxid;
        }
        ....
      }
      ....
    }

    I think, in the condition of the inner cycle, variable i was randomly crammed. Instead, the variable j should be used .

    V535 The variable 'i' is being used for the loop. Check lines: 302, 309. sls_alp.cpp 309

    alp::~alp()
    {
      ....
      if(d_alp_states)
      {
        for(i=0;i<=d_nalp;i++)           // <=
        {
          if(i<=d_alp_states->d_dim)
          {
            if(d_alp_states->d_elem[i])
            {
              for(i=0;i<=d_nalp;i++)     // <=
              {
                ....
      ....
    }

    Two nested identical cycles, in which the global counter is also reset, look well, very suspiciously. Developers should check out what's going on here.

    Abnormal Array Indexing




    V520 The comma operator ',' in array index expression '[- i2, - k]'. nw_spliced_aligner16.cpp 564

    void CSplicedAligner16::x_DoBackTrace (
        const Uint2* backtrace_matrix, CNWAligner::SAlignInOut* data,
        int i_global_max, int j_global_max)
    {
      ....
      while(intron_length < m_IntronMinSize || (Key & donor) == 0) {
          Key = backtrace_matrix[--i2, --k];
          ++intron_length;
          data->m_transcript.push_back(eTS_Intron);
      }
      ....
    }

    At once I will say that there seems to be no error here (so far, lol). Consider the following line:

    Key = backtrace_matrix[--i2, --k];

    The word 'matrix' and double indexing may suggest that the array is two-dimensional, but it is not. This is a regular pointer to an array of integers. But the diagnosis of the V520 did not just appear. Programmers are really confused about how to index two-dimensional arrays.

    In this case, the author simply decided to save on a single line of code, although he could write this:

    --i2;
    Key = backtrace_matrix[--k];

    V661 A suspicious expression 'A [B == C]'. Probably meant 'A [B] == C'. ncbi_service_connector.c 180

    static EHTTP_HeaderParse s_ParseHeader(constchar* header, ....){
      ....
      if (sscanf(header, "%u.%u.%u.%u%n", &i1, &i2, &i3, &i4, &n) < 4
          ||  sscanf(header + n, "%hu%x%n", &uuu->port, &tkt, &m) < 2
          || (header[m += n]  &&  !(header[m] == '$')  &&
              !isspace((unsignedchar)((header + m)
                                       [header[m] == '$'])))) {
          break/*failed - unreadable connection info*/;
      }
      ....
    }

    Another example of code in which I tried for a long time to understand what was happening: D. The isspace () function checks the character with the index m , but if this character is '$', then the character with the index m + 1 is passed to the function . In this case, the comparison with '$' was already in advance. Perhaps there is no error here, but the code can be accurately rewritten more clearly.

    V557 Array overrun is possible. The 'row' index is pointing beyond array bound. aln_reader.cpp 412

    bool CAlnReader::x_IsGap(TNumrow row, TSeqPos pos, conststring& residue)
    {
      if (m_MiddleSections.size() == 0) {
        x_CalculateMiddleSections();
      }
      if (row > m_MiddleSections.size()) {
          returnfalse;
      }
      if (pos < m_MiddleSections[row].first) {
        ....
      }
      ....
    }

    Here there is a serious mistake. The correct row index check should be like this:

    if (row >= m_MiddleSections.size()) {
      returnfalse;
    }

    Otherwise, it is possible to access data outside of the MiddleSections vector .

    Many more places like this:

    • V557 Array overrun is possible. The 'i' index is pointing beyond array bound. resource_pool.hpp 388
    • V557 Array overrun is possible. The 'row' index is pointing beyond array bound. aln_reader.cpp 418
    • V557 Array overrun is possible. The 'fmt_idx' index is pointing beyond array bound. seq_writer.cpp 384
    • V557 Array overrun is possible. The 'fmt_idx' index is pointing beyond array bound. blastdb_formatter.cpp 183
    • V557 Array overrun is possible. The 'num' index is beyond array bound. newcleanupp.cpp 13035

    How to earn distrust of functions




    V570 The 'm_onClickFunction' variable is assigned to itself. alngraphic.hpp 103

    voidSetOnClickFunctionName(string onClickFunction){
      m_onClickFunction = m_onClickFunction;
    }

    Here there is nothing to even comment on. One can only sympathize with the person who clicked something, clicked, but nothing has changed.

    Two more cases of assigning variables to myself will give a list:

    • V570 The 'iter-> level' variable is assigned to itself. align_format_util.cpp 189
    • V570 The 'd_elements_values ​​[ind]' variable is assigned to itself. sls_alp_data.cpp 1416

    V763 Parameter 'w1' is always rewritten in function body before being used. bmfunc.h 5363

    /// Bit COUNT functortemplate<typename W> structbit_COUNT
    {W operator()(W w1, W w2){
        w1 = 0;
        BM_INCWORD_BITCOUNT(w1, w2);
        return w1;
      }
    };

    The function in which the argument is ground immediately upon entering the function may be misleading by the developers using it. The code should be double-checked.

    Errors when designing classes




    V688 The 'm_qsrc' function argument. compart_matching.cpp 873

    classCElementaryMatching:public CObject
    {
      ....
      ISequenceSource * m_qsrc;
      ....
      voidx_CreateIndex(ISequenceSource *m_qsrc, EIndexMode index_more, ....);
      voidx_CreateRemapData(ISequenceSource *m_qsrc, EIndexMode mode);
      voidx_LoadRemapData(ISequenceSource *m_qsrc, conststring& sdb);
      ....
    };

    Immediately 3 functions of the class contain arguments whose names coincide with the field of the class. This can lead to errors in the function bodies: the programmer may think that he is working with a member of a class, actually changing the value of a local variable.

    V614 Uninitialized variable 'm_BitSet' used. SnpBitAttributes.hpp 187

    /// SNP bit attribute container.classCSnpBitAttributes
    {public:
      ....
    private:
      /// Internal storage for bits.
      Uint8 m_BitSet;
    };
    inline CSnpBitAttributes::CSnpBitAttributes(Uint8 bits) : m_BitSet(bits)
    {
    }
    inline CSnpBitAttributes::CSnpBitAttributes(constvector<char>& octet_string)
    {
      auto count = sizeof(m_BitSet);
      auto byte = octet_string.end();
      do
        m_BitSet = (m_BitSet << 8) | *--byte;
      while (--count > 0);
    }

    One of the constructors works inappropriately with the m_BitSet variable . The fact is that the variable is uninitialized. Its “garbage” value is used at the first iteration of the loop, after which initialization occurs. This is a very serious mistake, leading to an undefined program behavior.

    V603 The object was not used. If you wish to call the constructor, 'this-> SIntervalComparisonResult :: SIntervalComparisonResult (....)' should be used. compare_feats.hpp 100

    //This struct keeps the result of comparison of two exonsstructSIntervalComparisonResult : CObject
    {
    public:
      SIntervalComparisonResult(unsigned pos1, unsigned pos2,
                                FCompareLocs result, int pos_comparison = 0) 
      : m_exon_ordinal1(pos1), m_exon_ordinal2(pos2),
        m_result(result), m_position_comparison(pos_comparison) {}
      SIntervalComparisonResult()
      {
        SIntervalComparisonResult(0, 0, fCmp_Unknown, 0);
      }
      ....
    };

    A long time ago I did not encounter such errors when checking projects. But the problem is still relevant. The mistake is that calling a parameterized constructor in this way results in the creation and deletion of a temporary object. And class fields remain uninitialized. Another constructor should be called via the initialization list (see Delegating constructor ).

    V591 Non-void function should return a value. bio_tree.hpp 266

    /// Recursive assignment
    CBioNode& operator=(const CBioNode& tree)
    {
      TParent::operator=(tree);
      TBioTree* pt = (TBioTree*)tree.GetParentTree();
      SetParentTree(pt);
    }

    The analyzer considers that there is not enough line in the overloaded operator:

    return *this;

    V670 The uninitialized class member 'm_OutBlobIdOrData' is used to initialize the 'm_StdOut' member. Remember that members are initialized in the order of declarations inside a class. remote_app.hpp 215

    classNCBI_XCONNECT_EXPORTCRemoteAppResult
    {public:
      CRemoteAppResult(CNetCacheAPI::TInstance netcache_api,
              size_t max_inline_size = kMaxBlobInlineSize) :
          m_NetCacheAPI(netcache_api),
          m_RetCode(-1),
          m_StdOut(netcache_api, m_OutBlobIdOrData, m_OutBlobSize),
          m_OutBlobSize(0),
          m_StdErr(netcache_api, m_ErrBlobIdOrData, m_ErrBlobSize),
          m_ErrBlobSize(0),
          m_StorageType(eBlobStorage),
          m_MaxInlineSize(max_inline_size)
      {
      }
      ....
    };

    On this fragment of the code 3 warnings of the analyzer are issued immediately. Class fields are initialized not in the order in which they are listed in the initialization list, but in how they are declared in the class. The classic reason for the error is that not all programmers remember or know about this rule. Here and in the initialization list is just the wrong order. One gets the feeling that the list of fields was entered in a random order.

    V746 Object slicing. By reference rather than by value. cobalt.cpp 247

    void 
    CMultiAligner::SetQueries(constvector< CRef<objects::CBioseq> >& queries)
    {
      ....
      try {
        seq_loc->SetId(*it->GetSeqId());
      }
      catch (objects::CObjMgrException e) {
        NCBI_THROW(CMultiAlignerException, eInvalidInput,
                   (string)"Missing seq-id in bioseq. " + e.GetMsg());
      }
      m_tQueries.push_back(seq_loc);
      ....
    }

    Intercepting exceptions by value can result in the loss of some exception information due to the creation of a new object. It is much better and safer to catch an exception by reference.

    Similar places:

    • V746 Object slicing. By reference rather than by value. agp_validate_reader.cpp 562
    • V746 Object slicing. By reference rather than by value. aln_build_app.cpp 320
    • V746 Object slicing. By reference rather than by value. aln_test_app.cpp 458
    • V746 Object slicing. By reference rather than by value. cobalt.cpp 691
    • V746 Object slicing. By reference rather than by value. cobalt.cpp 719
    • V746 Object slicing. By reference rather than by value. cobalt.cpp 728
    • V746 Object slicing. By reference rather than by value. cobalt.cpp 732

    About unreachable code and other code execution issues



    V779 Unreachable code detected. It is possible that an error is present. merge_tree_core.cpp 627

    bool CMergeTree::x_FindBefores_Up_Iter(....)
    {
        ....
        FirstFrame->Curr = StartCurr;
        FirstFrame->Returned = false;
        FirstFrame->VisitCount = 0;
        FrameStack.push_back(FirstFrame);
        while(!FrameStack.empty()) {
            ....
            if(Rel == CEquivRange::eAfter) {
                Frame->Returned = false;
                FrameStack.pop_back();
                continue;
            } 
            elseif(Rel == CEquivRange::eBefore) {
                ....
                continue;
            }
            else {
                if(Frame->VisitCount == 0) {
                    ....
                    continue;
                } else {
                    ....
                    continue;
                }
            }
            Frame->Returned = false; // <=
            FrameStack.pop_back();
            continue;
        }  // end stack loop
        FirstFrame->ChildFrames.clear();
        return FirstFrame->Returned;
    }

    The code of the conditional operator is written in such a way that absolutely all branches of the code end with the continue operator . This led to the fact that in a while loop several lines of unreachable code were formed. These lines look very suspicious. Most likely, such a problem arose after the code was refactored, and now a careful code-review is required here.

    V519 The 'interval interval' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 454, 456. aln_writer.cpp 456

    void CAlnWriter::AddGaps(....)
    {
      ....
      switch(exon_chunk->Which()) {
      case CSpliced_exon_chunk::e_Match:
          interval_width = exon_chunk->GetMatch();
      case CSpliced_exon_chunk::e_Mismatch:
          interval_width = exon_chunk->GetMismatch();
      case CSpliced_exon_chunk::e_Diag:
          interval_width = exon_chunk->GetDiag();
          genomic_string.append(....);
          product_string.append(....);
          genomic_pos += interval_width;
          product_pos += interval_width/res_width;
          break;
        ....
      }
      ....
    }

    The variable interval_width is overwritten several times, because There are no break statements in the case branches . Though a classic, but very bad mistake. A few more suspicious places:



    • V779 Unreachable code detected. It is possible that an error is present. dbapi_driver_utils.cpp 351
    • V779 Unreachable code detected. It is possible that an error is present. net.c 780
    • V779 Unreachable code detected. It is possible that an error is present. bcp.c 1495
    • V779 Unreachable code detected. It is possible that an error is present. remote_blast.cpp 1470
    • V779 Unreachable code detected. It is possible that an error is present. remote_blast.cpp 1522

    V571 Recurring check. The 'if (m_QueryOpts-> filtering_options)' condition was already verified in line 703. blast_options_local_priv.hpp 713

    inlinevoid
    CBlastOptionsLocal::SetFilterString(constchar* f)
    {
      ....
      if (m_QueryOpts->filtering_options)      // <=
      {
        SBlastFilterOptions* old_opts = m_QueryOpts->filtering_options;
        m_QueryOpts->filtering_options = NULL;
        SBlastFilterOptionsMerge(&(m_QueryOpts->filtering_options),
          old_opts, new_opts);
        old_opts = SBlastFilterOptionsFree(old_opts);
        new_opts = SBlastFilterOptionsFree(new_opts);
      } 
      else
      {
         if (m_QueryOpts->filtering_options)   // <=
             m_QueryOpts->filtering_options = 
                 SBlastFilterOptionsFree(m_QueryOpts->filtering_options);
         m_QueryOpts->filtering_options = new_opts;
         new_opts = NULL;
      }
      ....
    }

    Obviously, the else branch requires rewriting. I have a few ideas on what they wanted to do with the m_QueryOpts-> filtering_options pointer , but the code is still kind of tangled. I appeal to the authors of the code.

    Well, the problem does not come alone:

    • V571 Recurring check. The 'if (sleeptime)' condition was already verified in line 205. request_control.cpp 208
    • V571 Recurring check. The 'if (assignValue.empty ())' condition was already verified in line 712. classstr.cpp 718

    Read errors



    V739 EOF should not be compared with a value of the 'char' type. The 'linestring [0]' should be of the 'int' type. alnread.c 3509

    static EBool
    s_AfrpInitLineData(
      ....
      char* linestring = readfunc (pfile);
      ....
      while (linestring != NULL  &&  linestring [0] != EOF) {
        s_TrimSpace (&linestring);
        ....
      }
      ....
    }

    Characters to be compared with EOF should not be stored in variables of type char . Otherwise, there is a risk that a character with a value of 0xFF (255) becomes -1 and will be interpreted in the same way as the end of file (EOF). Also (just in case) it is worth checking the implementation of the readfunc function .

    V663 Infinite loop is possible. The 'cin.eof ()' condition is not a loop from the loop. Consider adding the 'cin.fail ()' function call to the conditional expression. ncbicgi.cpp 1564

    typedefstd::istream CNcbiIstream;
    void CCgiRequest::Serialize(CNcbiOstream& os) const
    {
      ....
      CNcbiIstream* istrm = GetInputStream();
      if (istrm) {
        char buf[1024];
        while(!istrm->eof()) {
          istrm->read(buf, sizeof(buf));
          os.write(buf, istrm->gcount());
        }
      }
    }

    The analyzer has detected a potential error due to which an infinite loop may occur. In the event of a failure in reading the data, a call to the eof () function will always return false . To complete the loop in this case, additional checking of the value returned by the function fail () is necessary .

    Mistakes



    V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. ncbi_connutil.c 1135

    staticconstchar* x_ClientAddress(constchar* client_host,
                                       int/*bool*/ local_host){
      ....
      if ((client_host == c  &&  x_IsSufficientAddress(client_host))
          ||  !(ip = *c  &&  !local_host
                ? SOCK_gethostbyname(c)
                : SOCK_GetLocalHostAddress(eDefault))
          ||  SOCK_ntoa(ip, addr, sizeof(addr)) != 0
          ||  !(s = (char*) malloc(strlen(client_host) + strlen(addr) + 3))) {
          return client_host/*least we can do :-/*/;
      }
      ....
    }

    Pay attention to the expression:

    !local_host ? SOCK_gethostbyname(c) : SOCK_GetLocalHostAddress(eDefault)


    It is not calculated as the programmer expected, because the whole expression looks like this:

    ip = *c  && !local_host ? SOCK_gethostbyname(c) : SOCK_GetLocalHostAddress(...)


    The priority of && operator is higher than that of ?: . For this reason, the code is not executed as intended.

    V561 It's probably better to assign value to 'seq' variable than to declare it anew. Previous declaration: validator.cpp, line 490. validator.cpp 492

    bool CValidator::IsSeqLocCorrectlyOrdered(const CSeq_loc& loc, CScope& scope)
    {
      CBioseq_Handle seq;
      try {
        CBioseq_Handle seq = scope.GetBioseqHandle(loc);
      } catch (CObjMgrException& ) {
        // no way to tellreturntrue;
      } catch (const exception& ) {
        // no way to tellreturntrue;
      }
      if (seq  &&  seq.GetInst_Topology() == CSeq_inst::eTopology_circular) {
        // no way to check if topology is circularreturntrue;
      }
      return CheckConsecutiveIntervals(loc, scope, x_IsCorrectlyOrdered);
    }

    Due to the fact that the programmer declared the new variable seq inside the try / catch section, the other variable seq remains uninitialized and is used below by code.

    V562 It's a bit to compare value with a value of 0: ((((status) & 0x7f) == 0)! = 0. ncbi_process.cpp 111

    bool CProcess::CExitInfo::IsExited(void) const
    {
        EXIT_INFO_CHECK;
        if (state != eExitInfo_Terminated) {
            returnfalse;
        }
    #if   defined(NCBI_OS_UNIX)return WIFEXITED(status) != 0;
    #elif defined(NCBI_OS_MSWIN)// The process always terminates with exit codereturntrue;
    #endif
    }

    Nothing foreshadowed trouble, but WIFEXITED turned out to be a macro, opening like this:

    return (((status) & 0x7f) == 0) != 0;

    It turns out that the function returns the opposite value.

    The code found another such function for which a warning was issued:

    • V562 It's a bit to compare the value of a bool type with a value of 0. ncbi_process.cpp 126

    V595 The 'dst_len' pointer was used before it was verified against nullptr. Check lines: 309, 315. zlib.cpp 309

    bool CZipCompression::CompressBuffer(
      constvoid* src_buf, size_t  src_len,
      void*       dst_buf, size_t  dst_size,
      /* out */size_t* dst_len)
    {
      *dst_len = 0;
      // Check parametersif (!src_len  &&  !F_ISSET(fAllowEmptyData)) {
        src_buf = NULL;
      }
      if (!src_buf || !dst_buf || !dst_len) {
        SetError(Z_STREAM_ERROR, "bad argument");
        ERR_COMPRESS(48, FormatErrorMessage("CZipCompression::CompressBuffer"));
        returnfalse;
      }
      ....
    }

    The dst_len pointer is dereferenced at the very beginning of the function, while further on the code is checked for equality to zero. An error is made in the code that leads to undefined behavior if the dst_len pointer turns out to be nullptr .

    V590 Consider inspecting the 'ch! =' \ 0 '&& ch ==' '' expression. The expression is misprint. cleanup_utils.cpp 580

    boolAsn2gnbkCompressSpaces(string& val){
      ....
      while (ch != '\0' && ch == ' ') {
        ptr++;
        ch = *ptr;
      }
      ....
    }

    The condition for stopping the loop depends only on whether the character ch is a space or not. The expression can be simplified to this:

    while (ch == ' ') {
      ....
    }

    Conclusion


    The use of computer programs in scientific research helps and will help to make discoveries. Let's hope that the most important of them will not be missed because of any typos.

    I invite the developers of the NCBI Genome Workbench project to contact us, and we will provide a full report issued by the PVS-Studio analyzer.

    I hope that this small code study will help to correct many errors and, in general, improve the reliability of the project. Try running PVS-Studio on the code of your projects, if you haven’t done so yet. You might like it :).



    If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov. NCBI Genome Workbench: Scientific Research under Threat

    Also popular now: