Evil lives in comparison functions

    Evil in comparison functions

    Readers may recall my article entitled “The Effect of the Last Line”. It refers to a pattern I noticed: a mistake is most often made in the last line of the same type of text blocks. Now I want to talk about a new interesting observation. It turns out that programmers tend to make a mistake in the functions of comparing two objects. Such a statement seems implausible, however, I will show a huge number of examples of errors that shock the reader. Read the new study, it will be interesting and scary.

    Issue


    I argue that programmers very often make mistakes in fairly simple functions that are designed to compare two objects. This statement is based on the experience of our team checking a large number of open projects in C, C ++ and C #.

    The functions in question are called IsEqual , Equals , Compare , AreEqual, and so on. Or overloaded operators such as == , ! = .

    I noticed that when writing articles I often come across errors related to comparison functions. I decided to investigate this issue in more detail and studied the basethe errors we found. To do this, the database was searched for functions with names containing the words Cmp , Equal , Compare, and so on. The result was very impressive and shocking.

    In general, this story is similar to writing the article “The Effect of the Last Line”. In the same way, I noticed an anomaly and decided to study it in more detail. Unfortunately, unlike the mentioned article, I do not know what numbers and statistics to bring me. Perhaps I’ll come up with: how and what can be calculated - later. Now, I am guided by intuition and can only share my feelings. They say that there are many errors in the comparison functions, and I’m sure that you will have the same feeling when I show a large number of impressive examples.

    Psychology


    Let's go back to the article " The Effect of the Last Line " for a moment . By the way, if you haven’t read it, I suggest pausing and getting to know her. There is a more detailed analysis of this topic: " Explanation of the effect of the last line ."

    In general, we can conclude that the cause of errors in the last lines is due to the fact that the developer has already mentally moved on to the next lines / tasks, instead of focusing on completing the current fragment. As a result, at the end of writing the same type of text blocks in the last of them, he is more likely to make a typo than in the previous ones.

    I believe that when writing a comparison function, the developer often does not focus on it, considering it trivial. In other words, he writes the code automatically without thinking about it. Otherwise, it is not clear how to make a similar error:

    bool IsLuidsEqual(LUID luid1, LUID luid2)
    {
      return (luid1.LowPart == luid2.LowPart) &&
             (luid2.HighPart == luid2.HighPart);
    }

    PVS-Studio analyzer detected this error in the RunAsAdmin Explorer Shim (C ++) project code : V501 There are identical sub-expressions to the left and to the right of the '==' operator: luid2.HighPart == luid2.HighPart RAACommon raacommonfuncs. cpp 1511

    typo. The second line should read: luid1.HighPart == luid2.HighPart .

    Agree, the code is simple. Apparently, the simplicity of the code just spoils everything. The programmer immediately perceives the task of writing such a function standard and uninteresting. He instantly decides in his head how to write a function, and it remains only to implement the code. This is a routine, but, unfortunately, a necessary process in order to move on to writing more important, complex and interesting code. The developer’s thoughts are already there and ... as a result, he makes a mistake.

    In addition, programmers rarely write unit tests for such functions. Again, the deceptive simplicity of these functions fails. It seems redundant to test them, because these functions are so simple and of the same type. A person wrote hundreds of such functions in his life, can he really make a mistake in the next function ?! Maybe it does.

    At the same time I want to draw attention to the fact that the article is not about the code of students who learn to program. We are talking about errors in the code of projects such as GCC, Qt, GDB, LibreOffice, Unreal Engine 4, CryEngine V, Chromium, MongoDB, Oracle VM Virtual Box, FreeBSD, WinMerge, CoreCLR, MySQL, Mono, CoreFX, Roslyn, MSBuild and etc. Everything is very serious.

    I will show you so many diverse examples that you will be scared to go to bed.

    Erroneous patterns in comparison functions


    All errors in the comparison functions will be divided into several patterns. The article deals with errors in projects in the C, C ++ and C # languages, but there is no sense in somehow separating these languages, since many patterns are the same for different languages.

    Pattern: A <B, B> A


    Very often, in the comparison functions, you need to perform the following checks:

    • A <b
    • A> B

    Sometimes, programmers find it more beautiful to use the same <operator, but swap the variable names:

    • A <b
    • B <a

    However, due to carelessness, the following checks are obtained:

    • A <b
    • B> a

    In fact, the same comparison is performed here twice. Perhaps you did not understand what was being discussed, but now we will move on to practical examples, and everything will become clear.

    string _server;
    ....
    bool operator<( const ServerAndQuery& other ) const {
      if ( ! _orderObject.isEmpty() )
        return _orderObject.woCompare( other._orderObject ) < 0;
      if ( _server < other._server )
        return true;
      if ( other._server > _server )
        return false;
      return _extra.woCompare( other._extra ) < 0;
    }

    The PVS-Studio analyzer detected this error in the MongoDB (C ++) project code : V581 The conditional expressions of the 'if' operators located alongside each other are identical. Check lines: 44, 46. parallel.h 46

    This condition:

    if ( other._server > _server )

    It will always be false, since exactly the same check was performed two lines above. The correct version of the code:

    if ( _server < other._server )
      return true;
    if ( other._server < _server )
      return false;

    This error is also found in the Chromium project code (C ++):

    enum ContentSettingsType;
    struct EntryMapKey {
      ContentSettingsType content_type;
      ...
    };
    bool OriginIdentifierValueMap::EntryMapKey::operator<(
        const OriginIdentifierValueMap::EntryMapKey& other) const {
      if (content_type < other.content_type)
        return true;
      else if (other.content_type > content_type)
        return false;
      return (resource_identifier < other.resource_identifier);
    }

    PVS-Studio warning : V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 61, 63. browser content_settings_origin_identifier_value_map.cc 61

    I showed an example in C ++, now it's C # queue. The following error was found in IronPython and IronRuby (C #) code .

    public static int Compare(SourceLocation left,
                              SourceLocation right) {
      if (left < right) return -1;
      if (right > left) return 1;
      return 0;
    }

    PVS-Studio Warning (C #): V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless. SourceLocation.cs 156

    I think the explanation here is unnecessary.

    Note. There is only one example of an error for C #, but there were 2 for C ++. In general, there will be fewer errors for C # code in the article than for C / C ++. However, I do not recommend rushing to the conclusion that C # is much safer. The fact is that the PVS-Studio analyzer learned to analyze C # code relatively recently, and we just checked a lot less projects written in C # than in C and C ++.

    Pattern: class member compares with itself


    Comparison functions often consist of sequential comparisons of field structures / classes. Such a code is prone to errors when a member of a class begins to compare with itself. In this case, two subtypes of error can be distinguished.

    In the first case, they forget to indicate the name of the object and write like this:

    return m_x == foo.m_x &&
           m_y == m_y &&            // <=
           m_z == foo.m_z;

    In the second case, they write the same object name:
    return zzz.m_x == foo.m_x &&
           zzz.m_y == zzz.m_y &&    // <=
           zzz.m_z == foo.m_z;

    Let's get acquainted with this pattern of errors in practice. By the way, note that the incorrect comparison is often found in the last of the same blocks of text: hello to the "last line effect".

    The error was found in the code of the Unreal Engine 4 (C ++) project :

    bool
    Compare(const FPooledRenderTargetDesc& rhs, bool bExact) const
    {
      ....
      return Extent == rhs.Extent
        && Depth == rhs.Depth
        && bIsArray == rhs.bIsArray
        && ArraySize == rhs.ArraySize
        && NumMips == rhs.NumMips
        && NumSamples == rhs.NumSamples
        && Format == rhs.Format
        && LhsFlags == RhsFlags
        && TargetableFlags == rhs.TargetableFlags
        && bForceSeparateTargetAndShaderResource ==
             rhs.bForceSeparateTargetAndShaderResource
        && ClearValue == rhs.ClearValue
        && AutoWritable == AutoWritable;           // <=
    }

    Warning PVS-Studio: V501 There are identical sub-expressions to the left and to the right of the '==' operator: AutoWritable == AutoWritable rendererinterface.h 180 Samba

    project code (C):

    static int compare_procids(const void *p1, const void *p2)
    {
      const struct server_id *i1 = (struct server_id *)p1;
      const struct server_id *i2 = (struct server_id *)p2;
      if (i1->pid < i2->pid) return -1;
      if (i2->pid > i2->pid) return 1;
      return 0;
    }

    PVS-Studio warning : V501 There are identical sub-expressions to the left and to the right of the '>' operator: i2-> pid> i2-> pid brlock.c 1901 MongoDB

    project code (C ++):

    bool operator==(const MemberCfg& r) const {
      ....
      return _id==r._id && votes == r.votes &&
             h == r.h && priority == r.priority &&
             arbiterOnly == r.arbiterOnly &&
             slaveDelay == r.slaveDelay &&
             hidden == r.hidden &&
             buildIndexes == buildIndexes;        // <=
    }

    Warning PVS-Studio: V501 There are identical sub-expressions to the left and to the right of the '==' operator: buildIndexes == buildIndexes rs_config.h 101 Geant4 Software

    project code (C ++):

    inline G4bool G4FermiIntegerPartition::
    operator==(const G4FermiIntegerPartition& right)
    {
      return (total == right.total &&
              enableNull == enableNull &&          // <=
              partition == right.partition);
    }

    PVS-Studio warning : V501 There are identical sub-expressions to the left and to the right of the '==' operator: enableNull == enableNull G4hadronic_deex_fermi_breakup g4fermiintegerpartition.icc 58 LibreOffice

    project code (C ++):

    class SvgGradientEntry
    {
      ....
      bool operator==(const SvgGradientEntry& rCompare) const
      {
        return (getOffset() == rCompare.getOffset()
               && getColor() == getColor()            // <=
               && getOpacity() == getOpacity());      // <=
      }
      ....
    }

    PVS-Studio warning : V501 There are identical sub-expressions to the left and to the right of the '==' operator: getColor () == getColor () svggradientprimitive2d.hxx 61 Chromium

    project code (C ++):

    bool FileIOTest::MatchesResult(const TestStep& a,
                                   const TestStep& b) {
      ....
      return (a.data_size == a.data_size &&             // <=
              std::equal(a.data, a.data + a.data_size, b.data));
    }

    PVS-Studio warning : V501 There are identical sub-expressions to the left and to the right of the '==' operator: a.data_size == a.data_size cdm_file_io_test.cc 367 FreeCAD

    project code (C ++):

    bool FaceTypedBSpline::isEqual(const TopoDS_Face &faceOne,
                                   const TopoDS_Face &faceTwo) const
    {
      ....
      if (surfaceOne->IsURational() != 
          surfaceTwo->IsURational())
        return false;
      if (surfaceTwo->IsVRational() !=         // <= 
          surfaceTwo->IsVRational())           // <=
        return false;
      if (surfaceOne->IsUPeriodic() != 
          surfaceTwo->IsUPeriodic())
        return false;
      if (surfaceOne->IsVPeriodic() != 
          surfaceTwo->IsVPeriodic())
        return false;
      if (surfaceOne->IsUClosed() != 
          surfaceTwo->IsUClosed())
        return false;
      if (surfaceOne->IsVClosed() != 
          surfaceTwo->IsVClosed())
        return false;
      if (surfaceOne->UDegree() != 
          surfaceTwo->UDegree())
        return false;
      if (surfaceOne->VDegree() != 
          surfaceTwo->VDegree())
        return false;
      ....
    }

    PVS-Studio warning : V501 There are identical sub-expressions 'surfaceTwo-> IsVRational ()' to the left and to the right of the '! =' Operator. modelrefine.cpp 780 Serious Engine

    project code (C ++):

    class CTexParams {
    public:
      inline BOOL IsEqual( CTexParams tp) {
        return tp_iFilter     == tp.tp_iFilter &&
               tp_iAnisotropy == tp_iAnisotropy &&             // <=
               tp_eWrapU      == tp.tp_eWrapU &&
               tp_eWrapV      == tp.tp_eWrapV; };
      ....
    };

    PVS-Studio warning : V501 There are identical sub-expressions to the left and to the right of the '==' operator: tp_iAnisotropy == tp_iAnisotropy gfx_wrapper.h 180 Qt

    project code (C ++):

    inline bool qCompare(QImage const &t1, QImage const &t2, ....)
    {
      ....
      if (t1.width() != t2.width() || t2.height() != t2.height()) {
      ....
    }

    PVS-Studio warning : V501 There are identical sub-expressions to the left and to the right of the '! =' Operator: t2.height ()! = T2.height () qtest_gui.h 101 FreeBSD (C)

    project code :

    static int
    compare_sh(const void *_a, const void *_b)
    {
      const struct ipfw_sopt_handler *a, *b;
      a = (const struct ipfw_sopt_handler *)_a;
      b = (const struct ipfw_sopt_handler *)_b;
      ....
      if ((uintptr_t)a->handler < (uintptr_t)b->handler)
        return (-1);
      else if ((uintptr_t)b->handler > (uintptr_t)b->handler) // <=
        return (1);
      return (0);
    }

    PVS-Studio warning : V501 There are identical sub-expressions '(uintptr_t) b-> handler' to the left and to the right of the '>' operator. ip_fw_sockopt.c 2893 Mono

    project code (C #):

    static bool AreEqual (VisualStyleElement value1, 
                          VisualStyleElement value2)
    {
      return
        value1.ClassName == value1.ClassName && // <=
        value1.Part == value2.Part &&
        value1.State == value2.State;
    }

    PVS-Studio Warning: V3001 There are identical sub-expressions 'value1.ClassName' to the left and to the right of the '==' operator. ThemeVisualStyles.cs 2141 Mono

    project code (C #):

    public int ExactInference (TypeSpec u, TypeSpec v)
    {
      ....
      var ac_u = (ArrayContainer) u;
      var ac_v = (ArrayContainer) v;
      ....
      var ga_u = u.TypeArguments;
      var ga_v = v.TypeArguments;
      ....
      if (u.TypeArguments.Length != u.TypeArguments.Length) // <=
        return 0;
      ....
    }

    PVS-Studio Warning: V3001 There are identical sub-expressions 'u.TypeArguments.Length' to the left and to the right of the '! =' Operator. generic.cs 3135 MonoDevelop

    project code (C #):

    Accessibility DeclaredAccessibility { get; }
    bool IsStatic { get; }
    private bool MembersMatch(ISymbol member1, ISymbol member2)
    {
      if (member1.Kind != member2.Kind)
      {
        return false;
      }
      if (member1.DeclaredAccessibility !=          // <=1
          member1.DeclaredAccessibility             // <=1
       || member1.IsStatic != member1.IsStatic)     // <=2
      {
        return false;
      }
      if (member1.ExplicitInterfaceImplementations().Any() ||  
          member2.ExplicitInterfaceImplementations().Any())
      {
        return false;
      }
      return SignatureComparer
        .HaveSameSignatureAndConstraintsAndReturnTypeAndAccessors(
           member1, member2, this.IsCaseSensitive);
    }

    PVS-Studio Warning: V3001 There are identical sub-expressions 'member1.IsStatic' to the left and to the right of the '! =' Operator. CSharpBinding AbstractImplementInterfaceService.CodeAction.cs 545 Haiku

    Project Code (C ++):

    int __CORTEX_NAMESPACE__ compareTypeAndID(....)
    {
      int retValue = 0;
      ....
      if (lJack && rJack)
      {
        if (lJack->m_jackType < lJack->m_jackType)           // <=
        {
          return -1;
        }
        if (lJack->m_jackType == lJack->m_jackType)          // <=
        {
          if (lJack->m_index < rJack->m_index)
          {
            return -1;
          }
          else
          {
            return 1;
          }
        }
        else if (lJack->m_jackType > rJack->m_jackType)
        {
          retValue = 1;
        }
      }
      return retValue;
    }

    PVS-Studio warning : V501 There are identical sub-expressions to the left and to the right of the '<' operator: lJack-> m_jackType <lJack-> m_jackType MediaJack.cpp 783

    Just below, there is another exact same error. As I understand it, in both cases they forgot to replace lJack with rJack . CryEngine V

    Project Code (C ++):

    bool
    CompareRotation(const Quat& q1, const Quat& q2, float epsilon)
    {
      return (fabs_tpl(q1.v.x - q2.v.x) <= epsilon)
          && (fabs_tpl(q1.v.y - q2.v.y) <= epsilon)
          && (fabs_tpl(q2.v.z - q2.v.z) <= epsilon)     // <=
          && (fabs_tpl(q1.w - q2.w) <= epsilon);
    }

    Warning PVS-Studio: V501 There are identical sub-expressions to the left and to the right of the '-' operator: q2.vz - q2.vz entitynode.cpp 93

    Pattern: calculating pointer size instead of structure / class size


    This type of error occurs in C and C ++ programs and is associated with incorrect use of the sizeof operator . The error is not calculating the size of the object, but the size of the pointer. Example:

    T *a = foo1();
    T *b = foo2();
    x = memcmp(a, b, sizeof(a));

    Instead of the size of the structure T , the size of the pointer is calculated. The size of the pointer depends on the data model used , but usually 4 or 8 bytes. As a result, more or less bytes of memory are compared than the structure occupies.

    The correct code is:

    x = memcmp(a, b, sizeof(T));

    or

    x = memcmp(a, b, sizeof(*a));

    Now let's get to practice. Here is how such an error looks in the code of the CryEngine V (C ++) project :

    bool
    operator==(const SComputePipelineStateDescription& other) const
    {
      return 0 == memcmp(this, &other, sizeof(this));
    }

    PVS-Studio Warning: V579 The memcmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. graphicspipelinestateset.h 58 Unreal Engine 4 (C ++)

    project code :

    bool FRecastQueryFilter::IsEqual(
      const INavigationQueryFilterInterface* Other) const
    {
      // @NOTE: not type safe, should be changed when
      // another filter type is introduced
      return FMemory::Memcmp(this, Other, sizeof(this)) == 0;
    }

    PVS-Studio Warning: V579 The Memcmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. pimplrecastnavmesh.cpp 172

    Pattern: repeating arguments of the form Cmp (A, A)


    Comparison functions often call other comparison functions. In this case, one of the possible errors is that a link / pointer to the same object is transmitted twice. Example:

    x = memcmp(A, A, sizeof(T));

    Here, the object A will be compared with itself, which, of course, does not make sense.

    We start with the error found in the GDB (C) debugger code :

    static int
    psymbol_compare (const void *addr1, const void *addr2,
                     int length)
    {
      struct partial_symbol *sym1 = (struct partial_symbol *) addr1;
      struct partial_symbol *sym2 = (struct partial_symbol *) addr2;
      return (memcmp (&sym1->ginfo.value, &sym1->ginfo.value,    // <=
                      sizeof (sym1->ginfo.value)) == 0
              && sym1->ginfo.language == sym2->ginfo.language
              && PSYMBOL_DOMAIN (sym1) == PSYMBOL_DOMAIN (sym2)
              && PSYMBOL_CLASS (sym1) == PSYMBOL_CLASS (sym2)
              && sym1->ginfo.name == sym2->ginfo.name);
    }

    Warning PVS-Studio: V549 The first argument of 'memcmp' function is equal to the second argument. psymtab.c 1580 CryEngineSDK

    project code (C ++):

    inline bool operator != (const SEfResTexture &m) const
    {
      if (stricmp(m_Name.c_str(), m_Name.c_str()) != 0 ||   // <=
          m_TexFlags != m.m_TexFlags || 
          m_bUTile != m.m_bUTile ||
          m_bVTile != m.m_bVTile ||
          m_Filter != m.m_Filter ||
          m_Ext != m.m_Ext ||
          m_Sampler != m.m_Sampler)
        return true;
      return false;
    }

    Warning PVS-Studio: V549 The first argument of 'stricmp' function is equal to the second argument. ishader.h 2089 PascalABC.NET

    project code (C #):

    private List enum_consts = new List();
    public override bool IsEqual(SymScope ts)
    {
      EnumScope es = ts as EnumScope;
      if (es == null) return false;
      if (enum_consts.Count != es.enum_consts.Count) return false;
      for (int i = 0; i < es.enum_consts.Count; i++)
        if (string.Compare(enum_consts[i],
                           this.enum_consts[i], true) != 0)
          return false;
      return true;
    }

    PVS-Studio Warning: V3038 The 'enum_consts [i]' argument was passed to 'Compare' method several times. It is possible that other argument should be passed instead. CodeCompletion SymTable.cs 2206

    Here I will explain a little. Error in the actual arguments to the Compare function :

    string.Compare(enum_consts[i], this.enum_consts[i], true)

    The fact is that enum_consts [i] and this.enum_consts [i] are one and the same. As I understand it, the correct call should be like this:

    string.Compare(es.enum_consts[i], this.enum_consts[i], true)

    or

    string.Compare(enum_consts[i], es.enum_consts[i], true)

    Pattern: repeating checks A == B && A == B


    A common programming error is when the same check is performed twice. Example:

    return A == B &&
           C == D &&   // <=
           C == D &&   // <=
           E == F;

    In such cases, 2 options are possible. The first is harmless: one comparison is superfluous, and you can simply delete it. The second is worse: they wanted to check some other variables, but they were sealed up.

    In any case, such a code deserves close attention. Let me scare you harder and show that such an error can be found even in the GCC (C) compiler code :

    static bool
    dw_val_equal_p (dw_val_node *a, dw_val_node *b)
    {
      ....
      case dw_val_class_vms_delta:
        return (!strcmp (a->v.val_vms_delta.lbl1,
                         b->v.val_vms_delta.lbl1)
                && !strcmp (a->v.val_vms_delta.lbl1,
                            b->v.val_vms_delta.lbl1));
      ....
    }

    PVS-Studio warning : V501 There are identical sub-expressions '! Strcmp (a-> v.val_vms_delta.lbl1, b-> v.val_vms_delta.lbl1)' to the left and to the right of the '&&' operator. dwarf2out.c 1428

    The strcmp function is called two times with the same set of arguments. Unreal Engine 4 (C ++)

    project code :

    FORCEINLINE
    bool operator==(const FShapedGlyphEntryKey& Other) const
    {
      return FontFace == Other.FontFace 
        && GlyphIndex == Other.GlyphIndex   // <=
        && FontSize == Other.FontSize
        && FontScale == Other.FontScale
        && GlyphIndex == Other.GlyphIndex;  // <=
    }

    PVS-Studio Warning: V501 There are identical sub-expressions 'GlyphIndex == Other.GlyphIndex' to the left and to the right of the '&&' operator. fontcache.h 139 Serious Engine

    project code (C ++):

    inline BOOL CValuesForPrimitive::operator==(....)
    {
      return (
     (....) &&
     (vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType) &&
     ....
     (vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType) &&
     ....
    );

    PVS-Studio warning : V501 There are identical sub-expressions '(vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType)' to the left and to the right of the '&&' operator. worldeditor.h 580 Oracle VM Virtual Box (C ++)

    Project Code :

    typedef struct SCMDIFFSTATE
    {
      ....
      bool  fIgnoreTrailingWhite;
      bool  fIgnoreLeadingWhite;
      ....
    } SCMDIFFSTATE;
    /* Pointer to a diff state. */
    typedef SCMDIFFSTATE *PSCMDIFFSTATE;
    /* Compare two lines */
    DECLINLINE(bool) scmDiffCompare(PSCMDIFFSTATE pState, ....)
    {
      ....
      if (pState->fIgnoreTrailingWhite    // <=
       || pState->fIgnoreTrailingWhite)   // <=
        return scmDiffCompareSlow(....);
      ....
    }

    PVS-Studio Warning: V501 There are identical sub-expressions 'pState-> fIgnoreTrailingWhite' to the left and to the right of the '||' operator. scmdiff.cpp 238

    Pattern: misuse of value returned by memcmp function


    The memcmp function returns the following values ​​of type int :

    • <0 - buf1 less than buf2;
    • 0 - buf1 identical to buf2;
    • > 0 - buf1 greater than buf2;

    Note. “Greater than 0” means any numbers, and not at all 1. These numbers can be: 2, 3, 100, 256, 1024, 5555, 65536, and so on. This means that this result cannot be placed in a variable of type char or short . Significant bits may be dropped, which will violate the program execution logic.

    It also follows from this that the result cannot be compared with constants 1 or -1. In other words, doing it wrong is:

    if (memcmp(a, b, sizeof(T)) == 1)
    if (memcmp(x, y, sizeof(T)) == -1)

    Correct comparisons:

    if (memcmp(a, b, sizeof(T)) > 0)
    if (memcmp(a, b, sizeof(T)) < 0)

    The insidiousness of the described errors is that the code can successfully work for a long time. Errors can begin to manifest themselves when switching to a new platform or when changing the compiler version. ReactOS

    Project Code (C ++):

    HRESULT WINAPI CRecycleBin::CompareIDs(....)
    {
      ....
      return MAKE_HRESULT(SEVERITY_SUCCESS, 0,
       (unsigned short)memcmp(pidl1->mkid.abID,
                              pidl2->mkid.abID,
                              pidl1->mkid.cb));
    }

    PVS-Studio Warning: V642 Saving the 'memcmp' function result inside the 'unsigned short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. recyclebin.cpp 542 Firebird

    project code (C ++):

    SSHORT TextType::compare(ULONG len1, const UCHAR* str1,
                             ULONG len2, const UCHAR* str2)
    {
      ....
      SSHORT cmp = memcmp(str1, str2, MIN(len1, len2));
      if (cmp == 0)
        cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0));
      return cmp;
    }

    PVS-Studio Warning: V642 Saving the 'memcmp' function result inside the 'short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. texttype.cpp 338 CoreCLR

    Project Code (C ++):

    bool operator( )(const GUID& _Key1, const GUID& _Key2) const
      { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }

    PVS-Studio warning : V698 Expression 'memcmp (....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp (....) <0' instead. sos util.cpp 142 OpenToonz

    project code (C ++):

    bool TFilePath::operator<(const TFilePath &fp) const
    {
      ....
      char differ;
      differ = _wcsicmp(iName.c_str(), jName.c_str());
      if (differ != 0)
        return differ < 0 ? true : false;
      ....
    }

    PVS-Studio Warning: V642 Saving the '_wcsicmp' function result inside the 'char' type variable is inappropriate. The significant bits could be lost, breaking the program's logic. tfilepath.cpp 328

    Pattern: invalid null reference checking


    This error pattern is typical for C # programs. In comparison functions, type casts are sometimes performed using the as operator . The error lies in the fact that inattentiveness to null is not checked for a new link, but the original one. Consider a synthetic example:

    ChildT foo = obj as ChildT;
    if (obj == null)
      return false;
    if (foo.zzz()) {}

    Checking if (obj == null) protects against the situation if the obj variable contained a null reference. However, there is no protection against the case if it turns out that the as operator will return a null reference. The correct code should look like this:

    ChildT foo = obj as ChildT;
    if (foo == null)
      return false;
    if (foo.zzz()) {}

    As a rule, an error occurs due to the carelessness of the programmer. Similar errors are possible in C and C ++ programs, but I did not find any such cases in the database of errors we found. MonoDevelop

    Project Code (C #):

    public override bool Equals (object o)
    {
      SolutionItemReference sr = o as SolutionItemReference;
      if (o == null)
        return false;
      return (path == sr.path) && (id == sr.id);
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'o', 'sr'. MonoDevelop.Core SolutionItemReference.cs 81 CoreFX

    Project Code (C #):

    public override bool Equals(object comparand)
    {
      CredentialHostKey comparedCredentialKey =
                                      comparand as CredentialHostKey;
      if (comparand == null)
      {
        // This covers also the compared == null case
        return false;
      }
      bool equals = string.Equals(AuthenticationType,
            comparedCredentialKey.AuthenticationType, ....
      ....
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'comparand', 'comparedCredentialKey'. CredentialCache.cs 4007 Roslyn

    Project Code (C #):

    public override bool Equals(object obj)
    {
      var d = obj as DiagnosticDescription;
      if (obj == null)
        return false;
      if (!_code.Equals(d._code))
        return false;
      ....
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'd'. DiagnosticDescription.cs 201 Roslyn

    Project Code (C #):

    protected override bool AreEqual(object other)
    {
      var otherResourceString = other as LocalizableResourceString;
      return
        other != null &&
        _nameOfLocalizableResource == 
          otherResourceString._nameOfLocalizableResource &&
        _resourceManager == otherResourceString._resourceManager &&
        _resourceSource == otherResourceString._resourceSource &&
        ....
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'other', 'otherResourceString'. LocalizableResourceString.cs 121 MSBuild

    Project Code (C #):

    public override bool Equals(object obj)
    {
       AssemblyNameExtension name = obj as AssemblyNameExtension;
       if (obj == null)  // <=
       {
         return false;
       }
       ....
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'name'. AssemblyRemapping.cs 64 Mono

    project code (C #):

    public override bool Equals (object o)
    {
      UrlMembershipCondition umc = (o as UrlMembershipCondition);
      if (o == null)                                      // <=
        return false;
      ....
      return (String.Compare (u, 0, umc.Url, ....) == 0); // <=
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'o', 'umc'. UrlMembershipCondition.cs 111 Media Portal 2 (C #)

    Project Code :

    public override bool Equals(object obj)
    {
      EpisodeInfo other = obj as EpisodeInfo;
      if (obj == null) return false;
      if (TvdbId > 0 && other.TvdbId > 0)
        return TvdbId == other.TvdbId;
      ....
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'other'. EpisodeInfo.cs 560 NASA World Wind

    Project Code (C #):

    public int CompareTo(object obj)
    {
      RenderableObject robj = obj as RenderableObject;
      if(obj == null)                                 // <=
        return 1;
      return this.m_renderPriority.CompareTo(robj.RenderPriority);
    }

    PVS-Studio Warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'robj'. RenderableObject.cs 199

    Pattern Type: Wrong Cycles


    Some functions compare collections of items. Naturally, for their comparison, various cycle options are used. If you write the code inattentively, as happens with comparison functions, then it is easy to confuse something with loops. Consider several such situations. Trans-Proteomic Pipeline (C ++)

    Project Code :

    bool Peptide::operator==(Peptide& p) {
      ....
      for (i = 0, j = 0;
           i < this->stripped.length(), j < p.stripped.length();
           i++, j++) { 
      ....
    }

    PVS-Studio Warning: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. tpplib peptide.cpp 191

    Note that the condition uses the comma operator. The code is clearly incorrect, since the condition to the left of the comma is not taken into account. That is, the condition on the left is calculated, but its result is not used in any way. Qt

    Project Code (C ++):

    bool equals( class1* val1, class2* val2 ) const
    {
      ...
      size_t size = val1->size();
      ...
      while ( --size >= 0 ){
        if ( !comp(*itr1,*itr2) )
          return false;
        itr1++;
        itr2++;
      }
      ...
    }

    PVS-Studio warning : V547 Expression '- size> = 0' is always true. Unsigned type value is always> = 0. QtCLucene arrays.h 154 CLucene

    project code (C ++):

    class Arrays
    {
      ....
       bool equals( class1* val1, class2* val2 ) const{
         static _comparator comp;
         if ( val1 == val2 )
           return true;
         size_t size = val1->size();
         if ( size != val2->size() )
           return false;
         _itr1 itr1 = val1->begin();
         _itr2 itr2 = val2->begin();
         while ( --size >= 0 ){
           if ( !comp(*itr1,*itr2) )
             return false;
           itr1++;
           itr2++;
         }
       return true;
      }
      ....
    }

    PVS-Studio warning : V547 Expression '- size> = 0' is always true. Unsigned type value is always> = 0. arrays.h 154 Mono

    project code (C #):

    public override bool Equals (object obj)
    {
      ....
      for (int i=0; i < list.Count; i++) {
        bool found = false;
        for (int j=0; i < ps.list.Count; j++) {     // <=
          if (list [i].Equals (ps.list [j])) {
            found = true;
            break;
          }
        }
        if (!found)
          return false;
      }
      return true; 
    }

    PVS-Studio Warning: V3015 It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i' corlib-net_4_x PermissionSet.cs 607

    Apparently, a typo was made, and, in fact, in the condition of the nested loop, the variable j should be used , not i :

    for (int j=0; j < ps.list.Count; j++)

    Pattern: A = getA (), B = GetA ()


    Often in comparison functions you have to write this type of code:

    if (GetA().x == GetB().x && GetA().y == GetB().y)

    To reduce the size of the conditions or for optimization, intermediate variables are used:

    Type A = GetA();
    Type B = GetB();
    if (A.x == B.x && A.y == B.y)

    At the same time, they inadvertently make a mistake and initialize temporary variables with the same value:

    Type A = GetA();
    Type B = GetA();

    Let's see how such errors look in the code of real applications. LibreOffice

    project code (C ++):

    bool CmpAttr(
      const SfxPoolItem& rItem1, const SfxPoolItem& rItem2)
    {
      ....
      bool bNumOffsetEqual = false;
      ::boost::optional oNumOffset1 =
            static_cast(rItem1).GetNumOffset();
      ::boost::optional oNumOffset2 =
            static_cast(rItem1).GetNumOffset();
      if (!oNumOffset1 && !oNumOffset2)
      {
        bNumOffsetEqual = true;
      }
      else if (oNumOffset1 && oNumOffset2)
      {
        bNumOffsetEqual = oNumOffset1.get() == oNumOffset2.get();
      }
      else
      {
        bNumOffsetEqual = false;
      }
      ....
    }

    PVS-Studio warning : V656 Variables 'oNumOffset1', 'oNumOffset2' are initialized through the call to the same function. It's probably an error or un-optimized code. Check lines: 68, 69. findattr.cxx 69 Qt

    project code (C ++):

    AtomicComparator::ComparisonResult
    IntegerComparator::compare(const Item &o1,
                               const AtomicComparator::Operator,
                               const Item &o2) const
    {
      const Numeric *const num1 = o1.as();
      const Numeric *const num2 = o1.as();
      if(num1->isSigned() || num2->isSigned())
      ....
    }

    PVS-Studio Warning: V656 Variables 'num1', 'num2' are initialized through the call to the same function. It's probably an error or un-optimized code. Consider inspecting the 'o1.as <Numeric> ()' expression. Check lines: 220, 221. qatomiccomparators.cpp 221

    Pattern: failed code copying


    Many of the errors mentioned earlier can be called the consequences of the failed Copy-Paste. They fell under some kind of erroneous pattern, and I decided that it would be logical to describe them in the corresponding sections. However, I still have a few errors that clearly arose due to unsuccessful copying of the code, but which I do not know how to classify. So I just compiled these errors here. CoreCLR

    Project Code (C ++):

    int __cdecl Compiler::RefCntCmp(const void* op1, const void* op2)
    {
      ....
      if (weight1)
      {
        ....
        if (varTypeIsGC(dsc1->TypeGet()))
        {
          weight1 += BB_UNITY_WEIGHT / 2;
        }
        if (dsc1->lvRegister)
        {
          weight1 += BB_UNITY_WEIGHT / 2;
        }
      }
      if (weight1)
      {
        ....
        if (varTypeIsGC(dsc2->TypeGet()))
        {
          weight1 += BB_UNITY_WEIGHT / 2;       // <=
        }
        if (dsc2->lvRegister)
        {
          weight2 += BB_UNITY_WEIGHT / 2;
        }
      }
      ....
    }

    PVS-Studio Warning: V778 Two similar code fragments were found. Perhaps, this is a typo and 'weight2' variable should be used instead of 'weight1'. clrjit lclvars.cpp 2702

    The function was long, so it is thoroughly shortened for the article. If we consider the code of this function, it is noticeable that part of the code was copied, but in one place the programmer forgot to replace the variable weight1 with weight2 . WPF samples

    code by Microsoft project (C #):

    public int Compare(GlyphRun a, GlyphRun b)
    {
      ....
      if (aPoint.Y > bPoint.Y)      // <=
      {
        return -1;
      }
      else if (aPoint.Y > bPoint.Y) // <=
      {
        result = 1;
      }
      else if (aPoint.X < bPoint.X)
      {
        result = -1;
      }
      else if (aPoint.X > bPoint.X)
      {
        result = 1;
      }
      ....
    }

    PVS-Studio warning : V3003 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 418, 422. txtserializerwriter.cs 418 PascalABC.NET

    project code (C #):

    public void CompareInternal(....)
    {
      ....
      else if (left is int64_const)
        CompareInternal(left as int64_const, right as int64_const);
      ....
      else if (left is int64_const)
        CompareInternal(left as int64_const, right as int64_const);  
      ....
    }

    PVS-Studio warning : V3003 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 597, 631. ParserTools SyntaxTreeComparer.cs 597 SharpDevelop

    project code (C #):

    public int Compare(SharpTreeNode x, SharpTreeNode y)
    {
      ....
      if (typeNameComparison == 0) {
        if (x.Text.ToString().Length < y.Text.ToString().Length)
          return -1;
        if (x.Text.ToString().Length < y.Text.ToString().Length)
          return 1;
      }  
      ....
    }

    PVS-Studio Warning: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless NamespaceTreeNode.cs 87 Coin3D

    project code (C ++):

    int
    SbProfilingData::operator == (const SbProfilingData & rhs) const
    {
      if (this->actionType != rhs.actionType) return FALSE;
      if (this->actionStartTime != rhs.actionStopTime) return FALSE;
      if (this->actionStartTime != rhs.actionStopTime) return FALSE;
      ....
    }

    PVS-Studio Warning: V649 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains function return. This means that the second 'if' statement is senseless. Check lines: 1205, 1206. sbprofilingdata.cpp 1206 Spring

    project code (C ++):

    bool operator < (const aiFloatKey& o) const
      {return mTime < o.mTime;}
    bool operator > (const aiFloatKey& o) const
      {return mTime < o.mTime;}

    PVS-Studio Warning: V524 It is odd that the body of '>' function is fully equivalent to the body of '<' function. assimp 3dshelper.h 470

    And the last particularly interesting piece of code that the PVS-Studio code analyzer found in the MySQL (C ++) project :

    static int rr_cmp(uchar *a,uchar *b)
    {
      if (a[0] != b[0])
        return (int) a[0] - (int) b[0];
      if (a[1] != b[1])
        return (int) a[1] - (int) b[1];
      if (a[2] != b[2])
        return (int) a[2] - (int) b[2];
      if (a[3] != b[3])
        return (int) a[3] - (int) b[3];
      if (a[4] != b[4])
        return (int) a[4] - (int) b[4];
      if (a[5] != b[5])
        return (int) a[1] - (int) b[5]; // <=
      if (a[6] != b[6])
        return (int) a[6] - (int) b[6];
      return (int) a[7] - (int) b[7];
    }

    Warning PVS-Studio: V525 The code containing the collection of similar blocks. Check items '0', '1', '2', '3', '4', '1', '6' in lines 680, 682, 684, 689, 691, 693, 695. sql records.cc 680

    Most likely, the programmer wrote the first comparison, then the second and he became bored. Therefore, he copied a block of text to the clipboard:

    if (a[1] != b[1])
      return (int) a[1] - (int) b[1];

    And I inserted it into the program text as many times as needed. Then he changed the indices, but in one place he made a mistake, and the result was an incorrect comparison:

    if (a[5] != b[5])
      return (int) a[1] - (int) b[5];

    Note. I discuss this error in more detail in the mini-book " The main issue of programming, refactoring, and all that " (see the chapter "Do not take over the work of the compiler").

    Pattern: Equals method incorrectly processes null reference


    In C #, it is customary to implement Equals methods so that they correctly handle the situation if a null reference comes as an argument. Unfortunately, the implementation of not all methods complies with this rule. GitExtensions

    Project Code (C #):

    public override bool Equals(object obj)
    {
      return GetHashCode() == obj.GetHashCode(); // <=
    }

    PVS-Studio warning : V3115 Passing 'null' to 'Equals (object obj)' method should not result in 'NullReferenceException'. Git.hub Organization.cs 14 PascalABC.NET

    Project Code (C #):

    public override bool Equals(object obj)
    {
      var rhs = obj as ServiceReferenceMapFile;
      return FileName == rhs.FileName;
    }

    PVS-Studio warning : V3115 Passing 'null' to 'Equals' method should not result in 'NullReferenceException'. ICSharpCode.SharpDevelop ServiceReferenceMapFile.cs 31

    Miscellaneous Other Errors


    G3D Content Pak (C ++) Project Code :

    bool Matrix4::operator==(const Matrix4& other) const {
      if (memcmp(this, &other, sizeof(Matrix4) == 0)) {
        return true;
      }
      ...
    }

    PVS-Studio Warning: V575 The 'memcmp' function processes '0' elements. Inspect the 'third' argument. graphics3D matrix4.cpp 269

    One closing parenthesis is out of place. As a result, the number of bytes compared is calculated by the expression sizeof (Matrix4) == 0 . The size of any class is greater than 0, which means that the result of the expression is 0. Thus, 0 bytes are compared.

    The correct option:

    if (memcmp(this, &other, sizeof(Matrix4)) == 0) {

    Wolfenstein 3D Project Code (C ++):

    inline int operator!=( quat_t a, quat_t b )
    {
      return ( ( a.x != b.x ) || ( a.y != b.y ) ||
               ( a.z != b.z ) && ( a.w != b.w ) );
    }

    PVS-Studio Warning: V648 Priority of the '&&' operation is higher than that of the '||' operation. math_quaternion.h 167

    In one place, apparently, they accidentally wrote the && operator instead of || . FlightGear

    Project Code (C):

    static int tokMatch(struct Token* a, struct Token* b)
    {
      int i, l = a->strlen;
      if(!a || !b) return 0;
      ....
    }

    PVS-Studio Warning: V595 The 'a' pointer was utilized before it was verified against nullptr. Check lines: 478, 479. codegen.c 478

    If you pass NULL as the first argument to the function , the null pointer will be dereferenced, although the idea is that the function should return 0 . WinMerge

    Project Code (C ++):

    int TimeSizeCompare::CompareFiles(int compMethod,
                                      const DIFFITEM &di)
    {
      UINT code = DIFFCODE::SAME;
      ...
      if (di.left.size != di.right.size)
      {
        code &= ~DIFFCODE::SAME;
        code = DIFFCODE::DIFF;
      }
      ...
    }

    PVS-Studio Warning: V519 The 'code' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 79, 80. Merge timesizecompare.cpp 80 ReactOS

    project code (C ++):

    #define IsEqualGUID(rguid1, rguid2) \
      (!memcmp(&(rguid1), &(rguid2), sizeof(GUID)))
    static int ctl2_find_guid(....)
    {
      MSFT_GuidEntry *guidentry;
      ...
      if (IsEqualGUID(guidentry, guid)) return offset;
      ...
    }

    PVS-Studio Warning: V512 A call of the 'memcmp' function will lead to underflow of the buffer 'guidentry'. oleaut32 typelib2.c 320

    The pointer is the first argument to the macro. As a result, the address of the pointer is calculated, which makes no sense.

    The correct option:

    if (IsEqualGUID(*guidentry, guid)) return offset;

    IronPython and IronRuby Project Code (C #):

    public static bool Equals(float x, float y) {
      if (x == y) {
        return !Single.IsNaN(x);
      }
      return x == y;
    }

    PVS-Studio Warning: V3024 An odd precise comparison: x == y. Consider using a comparison with defined precision: Math.Abs ​​(A - B) <Epsilon. FloatOps.cs 1048

    It is not clear what is the point of a special check for NaN . If the condition (x == y) is fulfilled, then this means that both x and y are different from NaN , because NaN is not equal to any other value, including itself. It seems that checking for NaN is just superfluous and the code can be reduced to:

    public static bool Equals(float x, float y) {
      return x == y;
    }

    Mono Project Code (C #):

    public bool Equals (CounterSample other)
    {
      return
        rawValue         == other.rawValue         &&
        baseValue        == other.counterFrequency &&   // <=
        counterFrequency == other.counterFrequency &&   // <=
        systemFrequency  == other.systemFrequency  &&
        timeStamp        == other.timeStamp        &&
        timeStamp100nSec == other.timeStamp100nSec &&
        counterTimeStamp == other.counterTimeStamp &&
        counterType      == other.counterType;
    }

    PVS-Studio Warning: V3112 An abnormality within similar comparisons. It is possible that a typo is present inside the expression 'baseValue == other.counterFrequency'. System-net_4_x CounterSample.cs 139

    How do these programs work at all?


    Looking through all these errors, it seems surprising that all these programs generally work. Indeed, the comparison functions perform an extremely important and responsible task in programs.

    There are several explanations for the health of programs with such errors:

    1. In many functions, only part of the object is incorrectly compared. In this case, a partial comparison is enough for most tasks in this program.
    2. Не возникает (пока) ситуаций, когда функция будет работать неправильно. Например, это относится к функциям, которые не защищены от нулевых указателей или те, где результат вызова функции memcmp помещается в переменную типа char. Программе просто везёт.
    3. Рассмотренная функция сравнения используется крайне редко или вообще не используется.
    4. А кто сказал, что программа работает? Многие программы действительно делают что-то неправильно!

    Рекомендации


    I demonstrated how many errors can be found in comparison functions. And from this it follows that the performance of such functions must be checked using unit tests.

    Be sure to write unit tests for comparison operators, for Equals functions, and so on.

    I am sure that many before reading this article it seemed that such tests were redundant and still they would not reveal any errors: after all, the comparison functions at first glance are so simple ... Well, now I have shown all the horror that may be hidden in them.

    Reviews of the code and the use of static analysis tools will not be redundant either.

    Conclusion


    This article mentioned a lot of famous projects that are developed by highly qualified specialists. These projects are thoroughly tested using various methodologies. And still, this did not prevent the PVS-Studio analyzer from finding errors in them. This suggests that PVS-Studio can be a great addition to other methodologies used to improve the quality and reliability of the code.

    Visit our website and try PVS-Studio .



    If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. The Evil within the Comparison Functions

    Also popular now: