FoxCanFly March 19, 2015 at 17:41

Verifying Vim with PVS-Studio on GNU / Linux

The reader might think that this is another article about checking another project from the world of free software, but in fact, the article is not so much about checking, but about the practice of using the PVS-Studio analyzer in a fully GNU / Linux environment. It is no coincidence that Vim became the choice of the project for verification, for he also served in this matter.

First, a little about Vim

Vim ( http://vim.org ) is a cross-platform free text editor with a 30-year history, which is the successor to vi and comes from the world of Unix systems.

Vim is widely used in administration and development; in many GNU / Linux distributions it is the default editor. Vim differs from other text editors in its orientation to using exclusively a keyboard, a text interface, and rich expansion options through a system of plugins written in Vim Script.

Now about the test itself

One of the options for checking projects for Linux is integration into the build system, for example, GNU Make. To check vim, this method was chosen. For each compiler call, an analyzer call was added to the makefile. For convenience, this call was wrapped in a Make variable in a similar way:

#PVS Studio vars
PVS_CFLAGS = $(ALL_CFLAGS)
PVS_INCFLAGS = -I$(srcdir)
PVS_STUDIO = ~/PVS-Studio/PVS-Studio –cfg \
    ~/PVS-Studio/PVS-Studio_vim.cfg --source-file \
    $< --cl-params $(PVS_CFLAGS) -c $(PVS_INCFLAGS) $<

Next, the project is assembled in normal mode with the make command (if desired, you can add a separate target for analysis, for example, ".analysis"). The result, in addition to the assembled project, is a raw analyzer log.

Note. The analyzer can run in parallel with parallel assembly of the project. Each running instance of the analyzer appends its portion of messages to the log. Therefore, keep in mind that the analyzer does not clear the raw log file. Therefore, before restarting the scan, you must delete the log from the previous scan yourself.

It is impossible to work with a raw log. There are a lot of duplicate messages in this log (when one .h file is included in several .cpp files). To change the analysis parameters, you have to, after editing the configuration file, run the test again, which seriously affects the time for large projects. Even if we just wanted to, for example, turn off warnings related to files from a certain directory. To solve this problem, a log-parser utility was written in C ++ that processes the raw PVS-Studio log, removes duplicate messages, applies filters to messages from its options file and displays analyzer warnings in one of the supported formats. The utility works very quickly (even on the verification logs of very large projects, its operation time does not exceed 2-3 seconds),

If necessary, you can easily add new output formats. At the moment there are two of them: xml and the so-called errorfile. As far as I understand, he does not have an official name, this is the format in which many programs for Linux give out messages, for example, grep, gcc compilation errors, etc. It was he who came in handy.

Unlike Windows, where the vast majority of developers use Visual Studio, in the GNU / Linux world there are many IDEs, text editors, and more with their share of users. There is no consensus and a preponderance of the parties, and everyone chooses an instrument to his liking. Nevertheless, when checking projects, it is important not only to get the results of the analysis, but also to be able to conveniently work with them, as this option is provided by the integration of PVS-Studio with Visual Studio. The error message format described above is a kind of standard for Linux programs and most editors and development environments have one or another of its support, but, unfortunately, in most cases this support exists only for reading compiler messages from stderr during assembly,

This is where the Vim editor came in handy. Of course, you can develop an appropriate plug-in for any of the other tools, but it turned out that Vim had this feature initially. Figure 1 - Running vim with analysis results. Enough after running the analyzer and the log processing utility, run vim -q

, And then in the editor that opens, execute the command to create a buffer with errors. For example: cw 20. And now we are already getting a comfortable environment for working with analyzer messages and code navigation. Of course, I had to spend several hours studying Vim itself, since I had not worked in it before, and its principles of use are very different from the more familiar text editors. But in the end I can say that I was very imbued with the convenience of its use and from the category of obscure gizmos for aliens, he moved into the category of powerful tools for work. Accordingly, for a long time I did not have to worry about choosing a project for verification - of course, Vim itself. The vim code turned out to be very high-quality, there were no obvious bugs in it (although the style of writing the code in some places is rather controversial, but, I think, you can make a discount on the age of the project. Nevertheless, a number of places were discovered that are worth paying attention to. Let's talk about them in more detail.

Extra check

    if (ptr == NULL)
    {
        if (compl_leader != NULL)
            ptr = compl_leader;
        else
            return;  /* nothing to do */
    }
    if (compl_orig_text != NULL)
    {
        p = compl_orig_text;
        for (len = 0; p[len] != NUL && p[len] == ptr[len]; ++len)
        ;
#ifdef FEAT_MBYTE
        if (len > 0)
            len -= (*mb_head_off)(p, p + len);
#endif
        for (p += len; *p != NUL; mb_ptr_adv(p))
            AppendCharToRedobuff(K_BS);
    }
    else
        len = 0;
    if (ptr != NULL)
        AppendToRedobuffLit(ptr + len, -1);

V595 Analyzer Warning (1) The 'ptr' pointer was utilized before it was verified against nullptr. Check lines: 3922, 3933.

The ptr pointer has already been checked for NULL; if this condition is true, the comp_leader pointer has been assigned to it, which is definitely not zero. A second check is not required.

Weird memset

/*
* If requested, store and reset the global values controlling
* the exception handling (used when debugging). Otherwise avoid
* clear it to a bogus compiler warning when the optimizer
* uses inline functions...
*/
if (flags & DOCMD_EXCRESET)
  save_dbg_stuff(&debug_saved);
else
  vim_memset(&debug_saved, 0, 1);

where debug_saved is a structure object

struct dbg_stuff
{
    int        trylevel;
    int        force_abort;
    except_T    *caught_stack;
    char_u    *vv_exception;
    char_u    *vv_throwpoint;
    int        did_emsg;
    int        got_int;
    int        did_throw;
    int        need_rethrow;
    int        check_cstack;
    except_T    *current_exception;
};

Analyzer Message: V512 (1) A call of the 'memset' function will lead to underflow of the buffer '& debug_saved'.

It is difficult to say why it was necessary to reset only the first byte of the structure. If this is used as some kind of flag, it is better to define it as a separate field of the structure (union can be).

Suspicious cycle

/* check for out-of-memory */
for (i = 0; i < num_names; ++i)
{
  if (names[i] == NULL)
  {
    for (i = 0; i < num_names; ++i)
      vim_free(names[i]);
    num_names = 0;
  }
}

Analyzer Message: V535 (1) The variable 'i' is being used for this loop and for the outer loop. Check lines: 1893, 1897.

In the outer and inner loops, the same counter passes through the same array i. It is clear that the first time the condition if (names [i] == NULL) is triggered, the next step of the external loop will not be executed, but an outsider will have to strain to understand this code correctly, and the originality of its writing suggests that the behavior meant the author. In other words, although there is no error, the code smells a bit. Perhaps the 'break' statement would do better to stop the loop.

Scopes

char_u *p, *old;
//...
{
    char_u        buffer[BUFLEN + 1];
    //...
    for (p = buffer; p < buffer + len; p += l)
    //...

Analyzer Warning: V507 (2) Pointer to local array 'buffer' is stored outside the scope of this array. Such a pointer will become invalid.

There are a lot of similar places in the code in Vim (to the question of style). The pointer p declared at the very beginning of the function (in some places generally with a global scope) stores a pointer to an array that exists only in a smaller scope and which will be deleted when it leaves its code block. During a quick inspection, it seemed to me that after exiting the scope of the buffer, the pointer p is used only after assigning it a new value, but you can skip this somewhere. Why should I do this instead of declaring another variable inside the scope of buffer (is it really to save space on the stack?), I don’t understand. Such code is poorly maintained and uncomfortable to read.

Error with signed and unsigned types in one expression

for (cu = 1; cu <= 255; cu++)
    if (VIM_ISDIGIT(cu))
        regc(cu);

Where

#define VIM_ISDIGIT(c) ((unsigned)(c) - '0' < 10)

Analyzer Warning: V658 (2) A value is being subtracted from the unsigned variable. This can result in an overflow. In such a case, the '<' comparison operation can potentially behave unexpectedly. Consider inspecting the '(unsigned) (cu) -' 0 '<10' expression.

This code is more like a dirty hack. When evaluating the expression ((unsigned) - '0' <10), the result of the subtraction will be a value of type unsigned and when comparing both parts of the expression will also be converted to unsigned. Accordingly, when the variable cu is less than the numerical value 0, overflow occurs. In this case, the code behaves correctly, it fulfills its purpose (checking whether the character is a digit), but I do not think that there are reasons to use such tricks in your code unnecessarily. The cycle could be made from '0' and dispense with unsigned.

The pointer is initialized to NULL and does not change anywhere, moreover, it is used

char_u    *retval = NULL;
//...
if (round == 2)
  vim_strncpy(retval, s, len); //первое использование retval
//...
if (retval == NULL)
{

Analyzer Warning: V595 (1) The 'retval' pointer was utilized before it was verified against nullptr. Check lines: 7903, 7907.

This already looks like an error. The analyzer speaks of an unnecessary check, but in fact, the problem is primarily in another. The retval pointer is initialized to 0, and I did not find a single place in this function where its value would change. At the same time, strncpy is done many times in it. Then he suddenly decided to check for NULL.

Insecure use of realloc

/* TODO: check for vim_realloc() returning NULL. */
l->t = vim_realloc(l->t, newlen * sizeof(nfa_thread_T));

V701 (2) analyzer warning realloc () possible leak: when realloc () fails in allocating memory, original pointer 'l-> t' is lost. Consider assigning realloc () to a temporary pointer.

A very common error in many projects, the essence of this error is described in detail in the analyzer message. Fortunately, judging by the comment, this will be fixed soon. Elsewhere in the Vim code, realloc is used correctly.

Some examples of false positives

if (ireg_icombine && len == 0)
{
  /* If \Z was present, then ignore composing characters.
   * When ignoring the base character this always matches. */
   if (len == 0 && sta->c != curc)
     result = FAIL;

V560 (2) A part of conditional expression is always true: len == 0.

V571 (2) Recurring check. The 'len == 0' condition was already verified in line 6032.

if (VIsual_active)
{
  if (VIsual_active
      && (VIsual_mode != wp->w_old_visual_mode
      || type == INVERTED_ALL))

V571 (2) Recurring check. The 'VIsual_active' condition was already verified in line 1515.

There are several more places with similar checks. They do not cause much interest for discussion, in most cases they do not cause harm, but in some cases a logical error may be hidden, therefore such places should be paid attention to.

Just bad code, only the first byte is filled in the structure

#ifdef FEAT_TAG_BINS
  /* This is only to avoid a compiler warning for using search_info
  * uninitialised. */
  vim_memset(&search_info, 0, (size_t)1);
#endif

V512 (1) A call of the 'memset' function will lead to underflow of the buffer '& search_info'.

The commentary explains why this was done, but the method is rather strange. There are many more beautiful ways to avoid compiler warnings.

Bad practice using short names

extern char *UP, *BC, PC;

Analyzer Warning: V707 (2) Giving short names to global variables is considered to be bad practice. It is suggested to rename 'UP', 'BC', 'PC' variables.

This practice is not uncommon in Vim code. Many variables have 1- and 2-letter names, often with a large scope, and in this case with a global one. In combination with functions with a length of 500+ lines, this leads to very difficult to read code.

Strange assignment i in condition

int i = 2; /* index in s[] just after [ or CSI */
//...
if (n >= 8 && t_colors >= 16
    && ((s[0] == ESC && s[1] == '[')
        || (s[0] == CSI && (i = 1) == 1))
    && s[i] != NUL
    && (STRCMP(s + i + 1, "%p1%dm") == 0
    || STRCMP(s + i + 1, "%dm") == 0)
    && (s[i] == '3' || s[i] == '4'))

Analyzer Warning: V560 (2) A part of conditional expression is always true: (i = 1) == 1.

It is difficult to say whether this is a mistake or just a strange way to assign i to one. Writing so clearly is not worth it.

Conclusion

In conclusion, I would like to say that now checking projects using PVS-Studio under GNU Linux without using a Windows machine is quite real and quite convenient. Vim also helped in this number, for which he was subjected to a similar verification test first.

This article is in English.

If you want to share this article with an English-speaking audience, then please use the link to the translation: Anton Tokarev. Analyzing Vim by PVS-Studio in GNU / Linux .

Have you read the article and have a question?

Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 . Please see the list.

Tags: