Andrey2008 August 21, 2013 at 14:53

PVS-Studio finally got to Boost

We have long wanted to check out the Boost library. We were not sure that the results of the verification would be enough for the article. However, the desire did not disappear. Twice we tried to do this, but retreated, not understanding how to replace the compiler call with the PVS-Studio.exe call. Now we were armed with new tools, and the third attempt was successful. So, is it possible to find errors in Boost?

Boost

Boost is a collection of freeware libraries that extend the functionality of C ++. The project was created after the adoption of the C ++ standard, when many were unhappy that some libraries were not included in the standard. The project is a kind of "testing ground" for various extensions of the language, and some libraries are candidates for inclusion in the following C ++ standards. References:

Boost library site .
Wikipedia The Boost .
Download Boost .

Boost is a "heavy" code in which complex patterns are actively used. This library is in some ways a test for compilers. It often happens that a compiler is able to compile only part of the projects from the modern Boost library.

However, we didn’t have problems parsing the code. In extreme cases, the analyzer can allow itself to silently skip the very complicated design. It turned out to be difficult for us to integrate into the assembly itself.

Let me remind you of the existing options for checking projects using PVS-Studio.

If there is an ordinary project for Visual Studio or Embarcadero C ++ Builder

Everything is simple here. Project analysis can be performed directly from the development environment. Another option is to launch PVS-Studio from the command line and get a file with the scan results at the output. This mode is useful for continuous integration systems (e.g. Cruise Control, Draco.NET, or Team Foundation Build). This mode is described in more detail in the documentation. On interaction with continuous integration systems, see here .

If there is no project (or the project is essentially a disguised makefile)

In this case, you need to create a build mode when, instead of the compiler (or with it), the code analyzer is launched. The output also produces a report file. The necessary magic spells are also described in the documentation. In this case, the magician must be very careful, study the manual in detail and not forget a single character.

It was this approach that we should have used to test Boost. It’s a pity, we were magicians of insufficient level or just lazy. I could not figure out the assembly system in order to pass all the necessary parameters to the console version of the analyzer.

A new third option for checking projects came to our aid

My colleague mentioned this new mode in a recent article “ From the basement of the secret laboratory of PVS-Studio developers ”. No need to fully integrate into the assembly system. Simply get the preprocessed * .i files . It is much simpler. That is what we did.

Then, using the prototype of the new tool (PVS-Studio Standalone), we performed an analysis of all * .i files and received a long-awaited report. In the same program, you can conveniently work with the list of errors and make corrections to the code.

I hope we will include this tool in the distribution through several versions. Perhaps this will happen in PVS-Studio 5.10.

A few words about the regime, which is not, but which we dream about

We are slowly getting to tracking compiler actions. This mode will also apply to the PVS-Studio Standalone tool. All compiler launches will be tracked and the keys to run it will be collected. Thus, the following steps will suffice. Order - "watch." Build a project using any build system. To say - "ready." And the analyzer will know how to check the project now. Of course, if the project structure or parameters change, this process will have to be repeated. Okay, dreaming, and now back to the Boost test.

Sense of hopelessness

I was ready that I couldn’t write an article about Boost. This sad assumption had the following premises.

A large number of compilers and tools

Boost is built by a large number of compilers. Some partially, some completely. I did not study this issue. But, as I understand it, Boost builds quite well using Visual C ++, Intel C ++, Sun Studio, Compaq C ++, GCC, Clang. Each compiler has its own unique diagnostic capabilities. And their total use should provide a very high quality code. One compiler will find error A, the second error B, and so on.

Moreover, the Boost library is a kind of testing ground for various tools and static code analyzers. Since Boost actively uses various modern features of the C ++ language, it is always interesting to know whether the tool will be able to deal with such code. As a result, Boost is tested and cross-checked by various code analyzers.

To look for errors and typos after so many compilers and other tools is almost hopeless.

A large number of users

The Boost library is used in many projects. At one time, we ourselves used Boost in the PVS-Studio project (then Viva64). We used mechanisms for working with regular expressions, with configuration files, and a couple of other little things. Then we realized that regular expressions are a dead end, and they gradually disappeared from the code. Carrying Boost just because of the configuration files was doubtful. Moreover, some unpleasant shortcomings became clear. For example, you cannot use '#' in the file name. This character counts as the beginning of a comment. In our particular case, Boost did not take root, however, this is certainly a very useful library.

Since the library is widely used, errors should be quickly detected by programmers. Errors can only remain in rarely used code fragments or in exotic subsystems that have few users.

Patterns

Boost has many template classes. They are almost impossible to verify unless they are instantiated. For example, Visual C ++ does not parse template classes at all if they are not used. You can write any rubbish in an unused class and the file is successfully compiled enough to keep the number of opening and closing brackets and quotation marks (), <>, {}, [], "", ''.

Analyzing the template class, we have to compromise between the number of false positives and skipping errors. Let me explain with a simple example what the complexity is.

template 
bool IsNAN(T x) { return x != x; }

This function determines whether the value of a variable is Not-a-Number . This function makes sense for float / double / long double types. For integer types, such a comparison is meaningless, and indicates the presence of an error.

What if the type is unknown? Insoluble question. For full verification, all templates should be used in all possible cases. Moreover. Templates still need to be parsed. And this is a difficult task. For example, PVS-Studio has a number of problems with this. Something he can make out and even try to instantiate, but something not.

In any case, template analysis is a very thankless task. And in Boost they are full of them.

Odds assessment

Evaluating and reflecting on all of the above, I was pessimistic. I assumed that there might not be anything sensible at all. Or there is one mistake that you can’t make an article from anyway.

Finding 3-4 bugs in Boost I would consider a huge achievement.

Let's see what PVS-Studio 5.06 managed to find in Boost 1.55 (the version was downloaded at the development stage).

Boost Test Results

Of course, not a lot of errors or suspicious places were found. But I think the result of the analysis is simply magnificent.

Fragment N1. Typo

point3D operator/(const point3D &p1, const point3D &p2)
{
  return point3D( p1.x/p2.x , p1.y/p2.y , p1.z/p1.z );
}

PVS-Studio diagnostic message: V501 There are identical sub-expressions to the left and to the right of the '/' operator: p1.z / p1.z lorenz_point.cpp 61

In the third argument of the function, the variable 'p1.z' is divided on myself. Most likely, it should be divided into 'p2.z'.

Fragment N2. Error initializing class members

BOOST_UBLAS_INLINE
compressed_matrix_view(const compressed_matrix_view& o) :
  size1_(size1_), size2_(size2_),
  nnz_(nnz_),
  index1_data_(index1_data_),
  index2_data_(index2_data_),
  value_data_(value_data_)
{}

The first diagnostic message from PVS-Studio (there’s no point in giving the rest): V546 Member of a class is initialized by itself: 'size1_ (size1_)'. sparse_view.hpp 193

Members of the class are initialized by themselves. And it turns out that they remain uninitialized. Probably the data from the 'o' object should have been used. It seems to me that the constructor should look like this:

BOOST_UBLAS_INLINE
compressed_matrix_view(const compressed_matrix_view& o) :
  size1_(o.size1_), size2_(o.size2_),
  nnz_(o.nnz_),
  index1_data_(o.index1_data_),
  index2_data_(o.index2_data_),
  value_data_(o.value_data_)
{}

Fragment N3. Inaccuracy in freeing up memory

static std::basic_string get(char const* source = "")
{
  ....
  std::auto_ptr result (new wchar_t[len+1]);
  ....
}

PVS-Studio diagnostic message: V554 Incorrect use of auto_ptr. The memory allocated with 'new []' will be cleaned using 'delete'. tree_to_xml.ipp 71 The

container 'std :: auto_ptr' is an inappropriate type for storing a pointer to an array of objects. To destroy the array, the operator 'delete' will be used, not 'delete []'. Most likely not a fatal, but a real mistake. The danger of such errors is well described in the article " delete, new [] in C ++ and urban legends about their combination ."

A similar error is present here: generate_static.hpp 53.

Fragment N4. Classics of the genre. SOCKET

In my opinion, it is rarely possible to meet a project where there is not at least one error associated with the use of the SOCKET type. Let me remind you the essence of the trouble. Often the status of the operation is checked as follows:

SOCKET s = Foo();
if (s < 0) { Error(); }

Such verification is illegal. You should compare the variable with the constant SOCKET_ERROR. However, programmers are often lazy and write “socket <0” or “socket> = 0”.

On Linux, the SOCKET type is iconic and similar hackwork gets away with. On Windows, the SOCKET type is unsigned. As a result, conditions are always false and error situations are not handled in any way.

There was such a mistake in Boost.

typedef SOCKET socket_type;
class socket_holder
{
  ....
  socket_type socket_;
  ....
  socket_type get() const { return socket_; }
  ....
};
template 
boost::system::error_code accept(....)
{
  ....
  // On success, assign new connection to peer socket object.
  if (new_socketnew_socket.get() >= 0)
  {
    if (peer_endpoint)
      peer_endpoint->resize(addr_len);
    if (!peer.assign(impl.protocol_, new_socket.get(), ec))
      new_socket.release();
  }
  return ec;
}

PVS-Studio diagnostic message: V547 Expression 'new_socket.get ()> = 0' is always true. Unsigned type value is always> = 0. win_iocp_socket_service.hpp 436

On Windows, this piece of code will not work as the programmer expected. The condition "new_socketnew_socket.get ()> = 0" is always satisfied.

Fragment N5. Typo

void set_duration_style(duration_style style)
{
  duration_style_ == style;
}

PVS-Studio diagnostic message: V607 Ownerless expression 'duration_style_ == style'. base_formatter.hpp 51

I think there is nothing special to comment on. Relying on the name of the function, it seems to me that should be written here: "duration_style_ = style". Just a typo.

Look at this error and imagine how much nerves and time PVS-Studio can save with regular use. We all make such mistakes all the time. Then we seek and rule them. We do not remember them, because they are too small. But they add up to the hours and days spent by the programmer meaninglessly. PVS-Studio is good at looking for typos. Try incremental analysis mode. And when the tool shows you a triple of such blunders right after compilation, you will love it.

You do not make such mistakes? Check this example again (and various other examples ). I think this code was not written by a student. It's just that a person’s brain is not perfect, and we all make mistakes. This is normal and there is nothing wrong with playing with various auxiliary tools: static code analyzers, TDD methodology, code reviews.

Fragment N6. Potentially dangerous reading from a stream

template< typename CharT >
basic_settings< CharT > parse_settings(std::basic_istream< CharT >& strm)
{
  ....
  string_type line;
  while (!strm.eof())
  {
     std::getline(strm, line);
     const char_type* p = line.c_str();
     parser.parse_line(p, p + line.size());
     line.clear();
     ++line_number;
  }
  ....
}

PVS-Studio diagnostic message: V663 Infinite loop is possible. The 'cin.eof ()' condition is insufficient to break from the loop. Consider adding the 'cin.fail ()' function call to the conditional expression. settings_parser.cpp 285

This code does what it should do - reads data from a file. The analyzer does not like that the code can lead to a loop. I do not know how to accurately simulate a dangerous situation, but try to guess. Let the file be on a network drive. At the time of reading, the connection is disconnected. The function 'eof ()' will return 'false'. After all, the file has not been read to the end. To catch such situations, it is recommended to use the 'eof ()' function paired with the 'fail ()' function. In the given fragment, the function 'fail ()' is not called anywhere, which means that an unpleasant situation is possible when the program is running. The reliability of programs and their robustness to errors are made up of such trifles.

And another potentially dangerous place: V663 Infinite loop is possible. The 'cin.eof ()' condition is insufficient to break from the loop. Consider adding the 'cin.fail ()' function call to the conditional expression. adjacency_list_io.hpp 195

Fragment N7. Suspicious Subtraction

template<> 
struct identity_element
{
  static boost::gregorian::date_duration value()
  { 
    return
      boost::gregorian::date(boost::gregorian::min_date_time) -
      boost::gregorian::date(boost::gregorian::min_date_time); 
  }
};

PVS-Studio analyzer message: V501 There are identical sub-expressions 'boost :: gregorian :: date (boost :: gregorian :: min_date_time)' to the left and to the right of the '-' operator. gregorian.hpp 57

It seems to me, or will this function always return 0?

Conclusion

I believe that the PVS-Studio analyzer proved to be excellent. Finding at least something in Boost is a huge achievement. And the analyzer did it!

Tags: