Leo Tolstoy and static code analysis

    PVS-Studio vs Apache HTTP Server
    This time using PVS-Studio we checked Apache HTTP Server. As expected, they found errors in it. There are few errors. This is also expected.

    Other developers are faced with the same situation when testing PVS-Studio on their projects. Unfortunately, the first conclusion that I want to draw after seeing only a few mistakes is that such an instrument is of little use to him. Now I have come up with a good analogy that explains why this is not so.

    War and Peace
    Imagine the following situation. We have a work of Leo Tolstoy, for example, " War and Peace". We want to test the spelling checker built into Microsoft Word on this book. We run the analysis, we find only a few real errors and a large number of false positives. In addition, Leo Tolstoy liked to write long sentences. Microsoft Word, having failed to understand the text, will emphasize a lot of green, suggesting the presence of punctuation errors. We look at all this and turn our nose. We do not like that we almost did not find mistakes. We do not like that many names are considered incorrect. We do not like the analysis of the item We conclude that spell checking in Word is an absolutely useless thing to work in.

    Let us examine the incorrectness of such a study of the tool.

    We forget that people ALREADY WORKED on errors in the book. And they spent a lot of time and energy on it. Leo Tolstoy copied his drafts and exterminated the mistakes. The editor worked, turning the text into a book. Something was corrected in subsequent editions.

    Of course, the program will not find all the errors in the book. People can do it better. But on the other hand, the program does this much faster, enormously reducing the waste of human power. The true benefit of such a tool is its regular use. I wrote a paragraph, noticed an underlined error, and instantly corrected it. And now, rereading the text, a person can correct not trivial typos, but focus on the content and subtle mistakes.

    The comments about false positives are also not convincing. It is enough to add a word unknown to the program to the dictionary once, and it will no longer bother the author of the text. Well, for the rest of the obviously false warnings, you can always right-click and specify “ignore”.

    I think it’s stupid to argue about the benefits of text verification mechanisms built into programs such as Word, Thunderbird, TortoiseSVN. We use them and do not worry that they will find few errors in the book “War and Peace”.

    The situation is the same with static analysis of the source code of programs. It makes no sense to check a project like Apache HTTP Serverand try to evaluate the possible benefits of the analysis. Apache HTTP Server code is old by programming standards. As an Apache HTTP Server project, it has been around for over 15 years. The quality of the code is confirmed by a huge number of projects built on its basis. Naturally, a huge number of errors were fixed due to the widespread use of this code by many programmers and a long development period.

    It is logical to assume that many of the errors found earlier could be found much easier if static analysis were used. The analogy with Word is complete. It is more useful to constantly look for errors than to do it one-time.

    As in the Word editor, PVS-Studio false errors are easily fixed. It is enough to click on the false message and hide it (a special comment will be automatically added to the code).

    We make the last comparison in terms of finding errors. Word Editor can emphasize errors immediately. PVS-Studio analyzer does this after compiling the files. Previously, the syntax is too complex to parse the incomplete code. However, this is enough. You can consider PVS-Studio as a means of increasing warnings issued by the compiler. By the way, now PVS-Studio can do this in different versions of Visual Studio: 2005, 2008, 2010. So I invite you to try incremental analysis .

    In general, the analogy with Word, in my opinion, is complete. If someone does not agree, then I am ready to continue the discussion in the comments or in the mail (karpov [@] viva64.com).

    I finished the main idea, which I irresistibly wanted to share. However, I will disappoint readers if I write nothing about the errors found in Apache HTTP Server. So now a little about the errors found.

    Here is an example of extra sizeof ().
    PSECURITY_ATTRIBUTES GetNullACL(void)
    {
      PSECURITY_ATTRIBUTES sa;
      sa  = (PSECURITY_ATTRIBUTES) LocalAlloc(LPTR,
               sizeof(SECURITY_ATTRIBUTES));
      sa->nLength = sizeof(sizeof(SECURITY_ATTRIBUTES));
      ...
    }

    PVS-Studio diagnostic message: V568 It's odd that the argument of sizeof () operator is the 'sizeof (SECURITY_ATTRIBUTES)' expression. libhttpd util_win32.c 115 The

    size of the SECURITY_ATTRIBUTES structure is not set correctly. As a result, correct operation with such a structure is impossible.

    However, I suspect that most of the examples I give are in code that rarely gets control. Therefore, you should not worry much. This is the following example. The error is in the function associated with error handling.
    apr_size_t ap_regerror(int errcode, const ap_regex_t *preg,
      char *errbuf, apr_size_t errbuf_size)
    {
      ...
      apr_snprintf(errbuf, sizeof errbuf,
                   "%s%s%-6d", message, addmessage, (int)preg->re_erroffset);
      ...
    }

    PVS-Studio diagnostic message: V579 The apr_snprintf function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the second argument. libhttpd util_pcre.c 85

    The point is this. Instead of the size of the buffer, the size of the pointer is passed to the function. As a result, the error message cannot be generated correctly. In practice, the message will be 3 or 7 characters long.

    There are also classic Copy-Paste errors.
    static int magic_rsl_to_request(request_rec *r)
    {
      ...
      if (state == rsl_subtype || state == rsl_encoding ||
          state == rsl_encoding) {
      ...
    }

    PVS-Studio diagnostic message: V501 There are identical sub-expressions 'state == rsl_encoding' to the left and to the right of the '||' operator. mod_mime_magic mod_mime_magic.c 787

    In fact, the comparison should be like this:
    if (state == rsl_subtype || state == rsl_separator ||
        state == rsl_encoding) {

    There are errors that will manifest themselves only in the Windows operating system.
    typedef UINT_PTR SOCKET;
    static unsigned int __stdcall win9x_accept(void * dummy)
    {
      SOCKET csd;
      ...
      do {
          clen = sizeof(sa_client);
          csd = accept(nsd, (struct sockaddr *) &sa_client, &clen);
      } while (csd < 0 && APR_STATUS_IS_EINTR(apr_get_netos_error()));
      ...
    }

    PVS-Studio diagnostic message: V547 Expression 'csd <0' is always false. Unsigned type value is never <0. libhttpd child.c 404

    The accept function in the Visual Studio header files returns a value that has an unsigned SOCKET type. Therefore, checking 'csd <0' is not allowed, because its result is always false (false). The returned values ​​must be explicitly compared with various constants, for example, with SOCKET_ERROR.

    There are a number of errors that can be attributed to potential vulnerabilities. I don’t know, it is possible, at least theoretically, to use them for attack. Nevertheless, I will give a couple of similar examples.
    typedef  size_t apr_size_t;
    APU_DECLARE(apr_status_t) apr_memcache_getp(...)
    {
      ...
      apr_size_t len = 0;
      ...
      len = atoi(length);
      ...
      if (len < 0) {
        ...
      }
      else {
        ...  
      }
    }

    PVS-Studio diagnostic message: V547 Expression 'len <0' is always false. Unsigned type value is never <0. aprutil apr_memcache.c 814 The

    condition "len <0" in this code is always false, since the variable 'len' has an unsigned type. Accordingly, if you input a string with a negative number, then apparently a failure will occur.

    Apache also has a lot of code where various data buffers are nullified. Most likely, this is due to security. That's just zeroing itself often does not occur due to an error of the following form:
    #define MEMSET_BZERO(p,l)       memset((p), 0, (l))
    void apr__SHA256_Final(sha2_byte digest[], SHA256_CTX* context) {
      ...
      MEMSET_BZERO(context, sizeof(context));
      ...
    }

    PVS-Studio diagnostic message: V512 A call of the 'memset' function will lead to underflow of the buffer '(context)'. apr sha2.c 560

    Only the first bytes in the buffer are reset, since the sizeof () operator returns the size of the pointer, not the data buffer. I counted at least 6 such errors.

    Other errors are more boring, so I won’t write about them. Thanks for attention. Come try PVS-Studio. And we still provide keys for developers of free open-source projects and for those who really want to try the analyzer in full force. Write us.

    Also popular now: