Glibc library test experiment

    glibc and PVS-Studio
    We conducted an experiment to check the glibc library using PVS-Studio. The purpose of the experiment is to see how successfully the analyzer can check Linux projects. It may be bad so far. There are a lot of false positives due to the use of non-standard extensions. However, it was still possible to find something interesting.

    glibc


    glibc - GNU C Library (GNU library). Glibc is a C library that provides system calls and basic functions such as open, malloc, printf, etc. The C library is used for all dynamically linked programs. It is written by the Free Software Foundation for GNU operating systems. glibc is released under the GNU LGPL license.

    Adapted from Wikipedia: glibc .

    Not so long ago, news appeared on the Internet that a new version of the glibc library was released. This encouraged us to check this library using our PVS-Studio analyzer.. Namely, the glibc-2-19-90 version was checked. Unfortunately, for a couple of weeks I was distracted, so I found the time to write an article only now. I was busy with a large comparison of several static analyzers. This is a very important task for us because they do not stop asking what we are better than Cppcheck and the static analyzer in Visual Studio 2013. Therefore, glibc had to wait a bit.

    We did not expect to find something terrible and really did not find. The glibc library is very high quality and verified by many parsers. At a minimum, these are:
    • Coverity
    • Clang;
    • Cppcheck.
    So finding at least something is already a great achievement.

    What is the complexity of the analysis


    Unfamiliar with the internal kitchen of static analysis tools, they seem like very simple utilities. This is not true. These are very complex programs.

    Tools such as RATS can be confusing . If someone looked at the RATS code, then they saw that it was just a search for specific function names in files. This tool is also called a static code analyzer. However, it is very far from what real code analyzers do. Static analysis is not a search with regular expressions at all [ 1 ].

    We repeatedly repeated that the Linux version is not at all the same as a recompiled executable module [ 2]. There is a gap between the executable module and the software product. One of the obstacles is support for specific extensions and the like.

    What is it, a third-party person is completely incomprehensible. Here he sees, in the program, the call to strcmp ():
    cmpres = strcmp (newp->from_string, root->from_string);

    And he does not even suspect what horror this line may reveal after preprocessing and what non-standard extensions will be used. Specifically, in our case, the string turns into this:
    cmpres = __extension__ ({ size_t __s1_len, __s2_len;
      (__builtin_constant_p (newp->from_string) &&
      __builtin_constant_p (root->from_string) &&
      (__s1_len = strlen (newp->from_string),
      __s2_len = strlen (root->from_string),
      (!((size_t)(const void *)((newp->from_string) + 1) -
      (size_t)(const void *)(newp->from_string) == 1) ||
      __s1_len >= 4) &&
      (!((size_t)(const void *)((root->from_string) + 1) -
      (size_t)(const void *)(root->from_string) == 1) ||
      __s2_len >= 4)) ?
      __builtin_strcmp (newp->from_string, root->from_string) :
      (__builtin_constant_p (newp->from_string) &&
      ((size_t)(const void *)((newp->from_string) + 1) -
      (size_t)(const void *)(newp->from_string) == 1) &&
      (__s1_len = strlen (newp->from_string), __s1_len < 4) ?
      (__builtin_constant_p (root->from_string) &&
      ((size_t)(const void *)((root->from_string) + 1) -
      (size_t)(const void *)(root->from_string) == 1) ?
       __builtin_strcmp (newp->from_string, root->from_string) :
      (__extension__ ({ const unsigned char *__s2 =
      (const unsigned char *) (const char *) (root->from_string);
      int __result = (((const unsigned char *) (const char *)
      (newp->from_string))[0] - __s2[0]);
      if (__s1_len > 0 && __result == 0) {
      __result = (((const unsigned char *) (const char *)
      (newp->from_string))[1] - __s2[1]);
      if (__s1_len > 1 && __result == 0) { __result =
      (((const unsigned char *) (const char *)
      (newp->from_string))[2] - __s2[2]);
      if (__s1_len > 2 && __result == 0)
      __result = (((const unsigned char *)
      (const char *) (newp->from_string))[3] -
      __s2[3]); } } __result; }))) :
      (__builtin_constant_p (root->from_string) &&
      ((size_t)(const void *)((root->from_string) + 1) -
      (size_t)(const void *)(root->from_string) == 1) &&
      (__s2_len = strlen (root->from_string), __s2_len < 4) ?
      (__builtin_constant_p (newp->from_string) &&
      ((size_t)(const void *)((newp->from_string) + 1) -
      (size_t)(const void *)(newp->from_string) == 1) ?
      __builtin_strcmp (newp->from_string, root->from_string) :
      (- (__extension__ ({ const unsigned char *__s2 =
      (const unsigned char *) (const char *) (newp->from_string);
      int __result = (((const unsigned char *) (const char *)
      (root->from_string))[0] - __s2[0]);
      if (__s2_len > 0 && __result == 0) { __result =
      (((const unsigned char *) (const char *)
      (root->from_string))[1] - __s2[1]);
      if (__s2_len > 1 && __result == 0)
      { __result = (((const unsigned char *)
      (const char *) (root->from_string))[2] -
      __s2[2]); if (__s2_len > 2 && __result == 0)
      __result = (((const unsigned char *) (const char *)
      (root->from_string))[3] - __s2[3]); } } __result; })))) :
      __builtin_strcmp (newp->from_string, root->from_string))));
    });

    The analyzer is not ready for such a turn of events and sometimes produces meaningless false messages on such constructions.

    I will explain about false messages in a simpler example. Suppose we have a line of code:
    assert(MAP_FAILED == (void *) -1);

    The assert () macro expands to the following code:
    ((((void *) -1) == (void *) -1) ? (void) (0) :
      __assert_fail ("((void *) -1) == (void *) -1",
        "loadmsgcat.c", 840, __PRETTY_FUNCTION__));

    The PVS-Studio analyzer generates a false warning regarding the comparison (((void *) -1) == (void *) -1):

    V501 There are identical sub-expressions to the left and to the right of the '==' operator: ((void *) - 1) == (void *) - 1 loadmsgcat.c 840

    We are not surprised at this. We already went through all this while developing for programs built using Visual C ++. There are also many interesting and unusual things. A lot of work needs to be done to teach the analyzer to understand what is what. You need to teach him to understand that he is dealing with an "assert" macro that is harmless and just checks that the MAP_FAILED macro is "(void *) -1". All this has already been done for Visual C ++. But for Linux, no.

    The ability to work correctly with such constructions consists of a huge part of the work of supporting other compilers. Outwardly, this work is not visible. But it requires studying the features of the compiler and standard libraries. These features need to be studied, supported and tested.

    I hope I opened a small click so that you can look into hell. In the future, I plan to write a series of articles that will show the complexity of developing static analysis tools. I think it will be interesting.

    Suspicious code fragments found


    Although the glibc project is being tested by many tools, it was still possible to find something interesting. Let's look at these sections of code.

    Strange expression


    char *DCIGETTEXT (....)
    {
      ....
      /* Make CATEGORYVALUE point to the next element of the list. */
      while (categoryvalue[0] != '\0' && categoryvalue[0] == ':')
        ++categoryvalue;
      ....
    }

    V590 Consider inspecting this expression. The expression is excessive or contains a misprint. dcigettext.c 582 The

    condition can be simplified to:
    while (categoryvalue[0] == ':')

    Perhaps this is not a mistake and the first part of the condition (categoryvalue [0]! = '\ 0') is simply superfluous. However, it should suddenly be written like this:
    while (categoryvalue[0] != '\0' && categoryvalue[0] != ':')

    Pointer dereferencing before validation


    Not necessarily, this place is dangerous. Perhaps the pointer can never be zero. But nonetheless:
    static enum clnt_stat
    clntraw_call (h, proc, xargs, argsp, xresults, resultsp, timeout)
         CLIENT *h;
         u_long proc;
         xdrproc_t xargs;
         caddr_t argsp;
         xdrproc_t xresults;
         caddr_t resultsp;
         struct timeval timeout;
    {
      struct clntraw_private_s *clp = clntraw_private;
      XDR *xdrs = &clp->xdr_stream;
      ....
      if (clp == NULL)
        return RPC_FAILED;
      ....
    }

    V595 The 'clp' pointer was utilized before it was verified against nullptr. Check lines: 145, 150. clnt_raw.c 145

    Next to this file you can see a similar defect: V595 The 'clp' pointer was utilized before it was verified against nullptr. Check lines: 232, 235. clnt_raw.c 232

    Another example of a dangerous code:
    int
    __nss_getent_r (....)
    {
      ....
      if (res && __res_maybe_init (&_res, 0) == -1)
      {
        *h_errnop = NETDB_INTERNAL;
        *result = NULL;
        return errno;
      }
      ....
      if (status == NSS_STATUS_TRYAGAIN
          && (h_errnop == NULL || *h_errnop == NETDB_INTERNAL)
          && errno == ERANGE)
    }

    V595 The 'h_errnop' pointer was utilized before it was verified against nullptr. Check lines: 146, 172. getnssent_r.c 146

    If the condition if (res && __res_maybe_init (& _res, 0) == -1) is fulfilled, the function returns information about the error. In doing so, it dereferences the pointers 'h_errnop' and 'result'. However, these pointers may well be NULL. This conclusion can be drawn by examining the code below.

    Dangerous optimization (vulnerability)


    char *
    __sha256_crypt_r (key, salt, buffer, buflen)
         const char *key;
         const char *salt;
         char *buffer;
         int buflen;
    {
      ....
      unsigned char temp_result[32]
      ....
      memset (temp_result, '\0', sizeof (temp_result));
      ....
      .... // temp_result далее не используется
    }

    V597 The compiler could delete the 'memset' function call, which is used to flush 'temp_result' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha256-crypt.c 385

    The compiler has the right to remove the memset () function call when compiling the Release version. More precisely, he is not only in law, but also obliged to do this in order to optimize. The buffer 'temp_result' is not used anywhere after calling the memset () function, therefore, the function call itself is also superfluous.

    We are dealing with a vulnerability since private data will not be cleared. The memset () function should be replaced with a more suitable one. The analyzer offers RtlSecureZeroMemory (), which of course is not in Linux. But there are analogues.

    Similar situation: V597 The compiler could delete the 'memset' function call, which is used to flush 'temp_result' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha512-crypt.c 396

    Undefined behavior


    It would seem that the glibc library should be written as portable as possible. However, we find in it many shear constructions that cannot be called safe from the point of view of portability.

    Here's what the C standard about shifts tells us:

    The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

    The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 * 2 pow E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 * 2 pow E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

    5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2 pow E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

    It follows from the standard that it is wrong to shift negative numbers. However, this is a very common operation in the glibc library.

    Left shift example:
    static void init_cacheinfo (void)
    {
      ....
      count_mask = ~(-1 << (count_mask + 1));
      ....
    }

    V610 Undefined behavior. Check the shift operator '<<. The left operand '-1' is negative. cacheinfo.c 645

    Example of a shift to the right:
    utf8_encode (char *buf, int val)
    {
      ....
      *buf = (unsigned char) (~0xff >> step);
      ....
    }

    The expression "~ 0xff" is of type 'int' and is -256.

    Here is a list of all the places where you can observe incorrect shifts:
    • strxfrm_l.c 68
    • clock_nanosleep.c 38
    • ifaddrs.c 786
    • xdr_intXX_t.c 35
    • xdr_intXX_t.c 41
    • private.h 327
    • private.h 331
    • zic.c 696
    • zdump.c 212
    • zdump.c 216
    • timer_create.c 47
    • timer_create.c 49
    • loop.c 331
    • loop.c 437
    • mktime.c 207
    • mktime.c 208
    • mktime.c 211
    • mktime.c 212
    • mktime.c 230
    • mktime.c 298
    • mktime.c 298
    • ld-collate.c 298


    Using an uninitialized variable


    static int send_vc(....)
    {
      ....
      int truncating, connreset, resplen, n;
      ....
      #ifdef _STRING_ARCH_unaligned
        *anssizp2 = orig_anssizp - resplen;
        *ansp2 = *ansp + resplen;
      #else
      ....
    }
    V614 Uninitialized variable 'resplen' used. res_send.c 790

    Incorrect string formatting


    In some places, '% u' is used to print signed variables. In some places, '% d' is used to print unsigned variables. These are of course trifles, but they are also worth mentioning.

    Example:
    typedef unsigned int __uid_t;
    typedef __uid_t uid_t;
    int
    user2netname (...., const uid_t uid, ....)
    {
      ....
      sprintf (netname, "%s.%d@%s", OPSYS, uid, dfltdom);
      ....
    }

    V576 Incorrect format. Consider checking the fourth actual argument of the 'sprintf' function. The SIGNED integer type argument is expected. netname.c 51

    Other related messages:
    • Consider checking the second actual argument of the 'printf' function. The SIGNED integer type argument is expected. locarchive.c 1741
    • Consider checking the fourth actual argument of the 'printf' function. The SIGNED integer type argument is expected. locarchive.c 1741
    • Consider checking the fifth actual argument of the 'fprintf' function. The SIGNED integer type argument is expected. res_debug.c 236
    • Consider checking the third actual argument of the 'sprintf' function. The UNSIGNED integer type argument is expected. inet_net_ntop.c 134
    • Consider checking the fourth actual argument of the 'sprintf' function. The SIGNED integer type argument is expected. ns_print.c 500
    • Consider checking the fifth actual argument of the 'sprintf' function. The SIGNED integer type argument is expected. ns_print.c 500
    • Consider checking the third actual argument of the 'sprintf' function. The SIGNED integer type argument is expected. ns_print.c 572
    • Consider checking the fourth actual argument of the 'sprintf' function. The SIGNED integer type argument is expected. ns_print.c 572
    • Consider checking the fifth actual argument of the 'sprintf' function. The SIGNED integer type argument is expected. ns_print.c 572
    • Consider checking the third actual argument of the 'sprintf' function. The UNSIGNED integer type argument is expected. ns_print.c 628
    • Consider checking the fourth actual argument of the 'sprintf' function. The UNSIGNED integer type argument is expected. ns_print.c 628
    • Consider checking the fifth actual argument of the 'sprintf' function. The UNSIGNED integer type argument is expected. ns_print.c 628
    • Consider checking the third actual argument of the 'sprintf' function. The SIGNED integer type argument is expected. ns_print.c 645
    • Consider checking the third actual argument of the 'sprintf' function. The UNSIGNED integer type argument is expected. ns_print.c 685
    • Consider checking the second actual argument of the 'printf' function. The SIGNED integer type argument is expected. nis_print.c 209
    • Consider checking the second actual argument of the 'printf' function. The SIGNED integer type argument is expected. sprof.c 480


    Conclusion


    Not a good project was chosen to start checking code from the Linux world. It is too high quality. :) It is difficult to write an interesting article about mistakes. But it doesn’t matter. we are waiting for many other well-known and interesting projects in Linux, which we will check to demonstrate the capabilities of the PVS-Studio analyzer.

    Sitelinks


    1. Andrey Karpov. Static analysis and regular expressions .
    2. Dmitry Tkachenko. Conversation with Andrei Karpov, Technical Director of PVS-Studio and CppCat .


    Answers on questions
    Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 .

    Also popular now: