Andrey Karpov believes that the Manticore project code is better than the Sphinx project code

    Sphinx vs manticoreMy readers asked me to compare the Manticore and Sphinx projects in terms of code quality. I can do this only in one way I have mastered - to check projects using the PVS-Studio static analyzer and calculate the density of errors in the code. So, I checked C and C ++ code in these projects and, in my opinion, the quality of the Manticore code is higher than the quality of the Sphinx code. Naturally, this is a very narrow view, and I do not pretend to be reliable in my research. However, they asked me, and I made a comparison as I can.

    Sphinx and Manticore


    First, let's get to know the Sphinx and Manticore projects.

    Sphinx is a full-text search engine developed by Andrey Aksyonov and distributed under the GNU GPL license. A distinctive feature is the high speed of indexing and search, as well as integration with existing DBMSs and APIs for common web programming languages.

    I took the source code from here . The size of the project, if you take the code in C and C ++ and do not include third-party libraries - 156 KLOC. Comments make up 10.2%. This means that the “clean code” is 144 KLOC.

    Manticore -this is a fork of Sphinx. Starting as key members of the original Sphinx team, the Manticore team has set itself the following goal - to deliver fast, stable and powerful free software for full-text search.

    I took the source code from here . The size of the project, if you take the code in C and C ++ and do not include third-party libraries - 170 KLOC. Comments make up 10.1%. This means that the “clean code” is 152 KLOC.

    The number of lines of code in the Manticore project is slightly larger, and I will take this into account when evaluating the density of errors found.

    Comparative analysis


    The code for these projects is very similar, and very often the same error is present in both one and the other project. I must say right away that this time I conducted the analysis superficially and studied only the general High level warnings issued by the PVS-Studio analyzer.

    Why am I too lazy to compare projects more carefully? As I said, the projects are very similar, and already when viewing the warnings of the High level I was bored. In general, the picture is clear. The analyzer generated very similar warning lists, but only in the Sphinx project there are a bit more of them. I think the situation will be exactly the same with warnings of other levels.

    In the article I will consider only some fragments of code with errors, which for some reason seemed interesting to me. A more detailed analysis of projects can be performed by their developers. I am ready to provide them with temporary license keys.

    I suggest readers also download the demo version of PVS-Studio and check the code of their projects. I am sure you will find a lot of interesting things in them.

    Common mistakes


    I will start with the errors that were found both in the Sphinx project and in the Manticore.

    CWE-476: NULL Pointer Dereference


    Expr_StrIn_c ( const CSphAttrLocator & tLoc, int iLocator,
                   ConstList_c * pConsts, UservarIntSet_c * pUservar,
                   ESphCollation eCollation )
      : Expr_ArgVsConstSet_c ( NULL, pConsts )
      , ExprLocatorTraits_t ( tLoc, iLocator )
      , m_pStrings ( NULL )
      , m_pUservar ( pUservar )
    {
      assert ( tLoc.m_iBitOffset>=0 && tLoc.m_iBitCount>0 );
      assert ( !pConsts || !pUservar );
      m_fnStrCmp = GetCollationFn ( eCollation );
      const char * sExpr = pConsts->m_sExpr.cstr();      // <=
      ....
    }

    I brought a rather large code snippet, but don’t be afraid, everything is simple here. Note the formal argument to pConsts . This pointer is used in the constructor to initialize the sExpr variable . Moreover, there is no check anywhere in the constructor in case NULL is passed as an argument , i.e. there is no protection against a null pointer. The pConsts variable is just boldly dereferenced.

    Note. There is a check in the form of assert , but it will not help in the Release version, so such a check cannot be considered sufficient.

    Now take a look at the code of the CreateInNode function , where an instance of the Expr_StrIn_c class is created:

    ISphExpr * ExprParser_t::CreateInNode ( int iNode )
    {
      ....
      case TOK_ATTR_STRING:
        return new Expr_StrIn_c ( tLeft.m_tLocator,
                                  tLeft.m_iLocator,
                                  NULL,                   // <=
                                  pUservar,
                                  m_eCollation );
      ....
    }

    The third actual argument is NULL . Accordingly, if this code fragment is executed, the null pointer will be dereferenced.

    The analyzer signals this error by issuing a warning: V522 Dereferencing of the null pointer 'pConsts' might take place. The null pointer is passed into 'Expr_StrIn_c' function. Inspect the third argument. Check lines: 5407, 5946. sphinxexpr.cpp 5407

    This error is interesting because PVS-Studio performs data-flow analysis, examining the bodies of two different functions. However, he is able to perform a much more complex nested analysis. Consider this case.

    First, consider the SendBytes function , in which, in fact, the dereferencing of the null pointer will occur.

    void ISphOutputBuffer::SendBytes ( const void * pBuf, int iLen )
    {
      int iOff = m_dBuf.GetLength();
      m_dBuf.Resize ( iOff + iLen );
      memcpy ( m_dBuf.Begin() + iOff, pBuf, iLen );
    }

    Pay attention to the pBuf pointer . It is not checked anywhere and is immediately passed as the actual argument to the memcpy function . Accordingly, if the pBuf pointer is null, then when the memcpy function is called , reading will occur at the null pointer.

    Why did PVS-Studio decide that there would be an error? To answer this question, we ’ll go up the call chain and consider the SendMysqlOkPacket function .

    void SendMysqlOkPacket ( ISphOutputBuffer & tOut, BYTE uPacketID,
                             int iAffectedRows=0, int iWarns=0,
                             const char * sMessage=NULL,
                             bool bMoreResults=false )
    {
      DWORD iInsert_id = 0;
      char sVarLen[20] = {0};
      void * pBuf = sVarLen;
      pBuf = MysqlPack ( pBuf, iAffectedRows );
      pBuf = MysqlPack ( pBuf, iInsert_id );
      int iLen = (char *) pBuf - sVarLen;
      int iMsgLen = 0;
      if ( sMessage )
        iMsgLen = strlen(sMessage) + 1;
      tOut.SendLSBDword ( (uPacketID<<24) + iLen + iMsgLen + 5);
      tOut.SendByte ( 0 );
      tOut.SendBytes ( sVarLen, iLen );
      if ( iWarns<0 ) iWarns = 0;
      if ( iWarns>65535 ) iWarns = 65535;
      DWORD uWarnStatus = iWarns<<16;
      if ( bMoreResults )
        uWarnStatus |= ( SPH_MYSQL_FLAG_MORE_RESULTS );
      tOut.SendLSBDword ( uWarnStatus );
      tOut.SendBytes ( sMessage, iMsgLen );
    }

    I apologize that I had to bring the whole body of the function. I wanted to show that there is no protection in the function against the argument sMessage being NULL . The sMessage pointer is simply passed to the SendBytes function .

    I also want to note that the default value of the formal argument sMessage is NULL :
    const char * sMessage=NULL,

    This is dangerous in itself. However, the fact that the default argument is NULL does not mean anything. Perhaps the correct arguments are always passed to the function. Therefore, let's move on:

    inline void Ok( int iAffectedRows=0, int iWarns=0,
                    const char * sMessage=NULL,
                    bool bMoreResults=false )
    {
      SendMysqlOkPacket ( m_tOut, m_uPacketID, iAffectedRows,
                          iWarns, sMessage, bMoreResults );
      if ( bMoreResults )
        m_uPacketID++;
    }

    In the Ok function , the sMessage argument is simply passed to the SendMysqlOkPacket function . Let's continue.
    void HandleMysqlMultiStmt (....)
    {
      ....
      dRows.Ok ( 0, 0, NULL, bMoreResultsFollow );
      ....
    }

    Here we end our journey. Only 4 actual arguments are passed to the function. The remaining arguments take a default value. This means that the fifth parameter sMessage - will be NULL , and the null pointer will be dereferenced.

    PVS-Studio analyzer warning that points to this error: V522 Dereferencing of the null pointer 'pBuf' might take place. The null pointer is passed into 'Ok' function. Inspect the third argument. Check lines: 2567, 12267, 12424, 14979. searchd.cpp 2567

    CWE-570: Expression is Always False


    First, consider the ESphBinRead enumeration .

    enum ESphBinRead
    {
      BIN_READ_OK,        ///< bin read ok
      BIN_READ_EOF,       ///< bin end
      BIN_READ_ERROR,     ///< bin read error
      BIN_PRECACHE_OK,    ///< precache ok
      BIN_PRECACHE_ERROR  ///< precache failed
    };

    As you can see, there are no named constants with negative values.

    Just in case, take a look at the ReadBytes function and make sure that it honestly returns values ​​without any tricks.

    ESphBinRead CSphBin::ReadBytes ( void * pDest, int iBytes )
    {
      ....
        return BIN_READ_EOF;
      ....
        return BIN_READ_ERROR;
      ....
      return BIN_READ_OK;
    }

    As you can see, all the values ​​returned by the function are greater than or equal to 0. Now it is the turn of the code with the error:

    static void DictReadEntry (....)
    {
      ....
      if ( pBin->ReadBytes ( pKeyword, iKeywordLen )<0 )
      {
        assert ( pBin->IsError() );
        return;
      }
      ....
    }

    PVS-Studio Warning: V547 Expression is always false. sphinx.cpp 22416

    Such a check does not make sense. The condition is always false, and as a result, incorrect situations are not processed when reading data. Most likely, it should have been written here:
    if ( pBin->ReadBytes ( pKeyword, iKeywordLen ) != BIN_READ_OK)

    This code demonstrates that the author of the code only thinks that the program will handle incorrect situations. In fact, I very often encounter defects in the code that is responsible for handling incorrect / non-standard situations. Therefore, programs often crash when something goes wrong. The error handlers are simply incorrectly written in them.

    There is no mystery why this happens, of course. Testing such sections of the program is difficult and uninteresting. This is one of those cases where the static analyzer can be a good helper, as it checks the code no matter how often it gets control.

    CWE-14: Compiler Removal of Code to Clear Buffers


    static bool GetFileStats (....)
    {
      ....
      struct_stat tStat;
      memset ( &tStat, 0, sizeof ( tStat ) );
      if ( stat ( szFilename, &tStat ) < 0 )
      {
        if ( pError )
          *pError = strerror ( errno );
        memset ( &tStat, 0, sizeof ( tStat ) );   // <=
        return false;
      }
      ....
    }

    PVS-Studio Warning: V597 The compiler could delete the 'memset' function call, which is used to flush 'tStat' object. The memset_s () function should be used to erase the private data. sphinx.cpp 19987

    The compiler has the right to remove the call to the memset function , which, in case of an error in the program, should clear private data in tStat .

    Why the compiler does this, I wrote many times, so I won’t repeat it. For those who have not yet encountered such situations, I propose to read the description of the V597 diagnostic or see the description of the CWE-14 .

    CWE-762: Mismatched Memory Management Routines


    First we need to look at the implementation of two macros:

    #define SafeDelete(_x) \
      { if (_x) { delete (_x); (_x) = nullptr; } }
    #define SafeDeleteArray(_x) \
      { if (_x) { delete [] (_x); (_x) = nullptr; } }

    Now, I think it’s easy for you to find the error in this code yourself:
    int CSphIndex_VLN::DebugCheck ( FILE * fp )
    {
      ....
      CSphRowitem * pInlineStorage = NULL;
      if ( pQword->m_iInlineAttrs )
        pInlineStorage = new CSphRowitem [ pQword->m_iInlineAttrs ];
      ....
      // cleanup
      SafeDelete ( pInlineStorage );
      ....
    }

    PVS-Studio warning: V611 The memory was allocated using 'new T []' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] pInlineStorage;'. sphinx.cpp 19178

    As you can see, memory is allocated for the array and freed as if only one element was created. Instead of the SafeDelete macro, the SafeDeleteArray macro should have been used here .

    Unique bugs


    Above, I looked at a few bugs that find themselves in both Sphinx and Manticore code. In this case, of course, there are errors peculiar to only one project. For example, consider one such case.

    Both projects have a RotateIndexMT function . But it is implemented in different ways. In the implementation of the Sphinx project, this function contains the defect CWE-690 (Unchecked Return Value to NULL Pointer Dereference).

    First, look at the declaration of the CheckServedEntry function :

    static bool CheckServedEntry(const ServedIndex_c * pEntry, // <=
                                 const char * sIndex,
                                 CSphString & sError );

    The first argument is a pointer to a constant object. Therefore, the function cannot change this object and the pointer itself.

    Now the function containing the error:
    static bool RotateIndexMT ( .... )
    {
      ....
      ServedIndex_c * pServed =
        g_pLocalIndexes->GetWlockedEntry ( sIndex );
      pServed->m_sNewPath = "";                            // <=
      if ( !CheckServedEntry ( pServed, sIndex.cstr(), sError ) )
      {
        if ( pServed )                                     // <=
          pServed->Unlock();
        return false;
      }
      ....
    }

    PVS-Studio Warning: V595 The 'pServed' pointer was utilized before it was verified against nullptr. Check lines: 17334, 17337. searchd.cpp 17334

    First, the pServed pointer is dereferenced. Next, the CheckServedEntry function is called , which, as we already found out, cannot change the pServed pointer , passed as the first actual argument.

    Next, the pServed pointer is checked for NULL . Yeah! So the pointer could be null. Therefore, above, before the first dereferencing of a pointer, you need to add a check.

    Another option: checking if (pServed) is unnecessary if the pointer is never NULL. In any case, the code needs to be fixed.

    To summarize


    The Sphinx project is smaller in size than the Manticore project. At the same time, in the Sphinx project, I noticed more errors and “code with a smell” than in the Manticore project.

    Given the size of the projects and the number of defects noticed, I got the following result. Let's take the error density in Manticore as 1. Then, in the Sphinx project, the error density according to my estimates is 1.5.

    My findings. The error density in the Sphinx project is one and a half times higher compared to the Manticore project. Therefore, the quality of the Manticore code is better than that of the Sphinx project. Fork turned out better than the original.

    Once again, this is my subjective opinion, based on a very small amount of information. The density of errors in the code of some components does not yet indicate the quality and reliability of the project as a whole.

    Download and try the PVS-Studio analyzer. It's simple. In the end, even if you write the perfect code, you can always look for errors in the code of your colleagues :).

    Thanks for attention. Subscribe to Twitter or RSS to keep abreast of our new publications.



    If you want to share this article with an English-speaking audience, please use the translation link: Andrey Karpov considers that code of the Manticore project is better than code of the Sphinx project

    Have you read the article and have a question?
    Often our articles are asked the same questions. We collected the answers here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please see the list.

    Also popular now: