Going for the record: Chromium's fifth test

    It would seem that Chromium has been reviewed by us repeatedly. An attentive reader will ask a logical question: “Why do we need another check? Wasn't that enough? ” Undoubtedly, the Chromium code is notable for its cleanliness, as we were convinced every time during the verification, but errors inevitably continue to be detected. Repeated checks demonstrate well that the more static analysis is used, the better. It is good if the project is checked every day. Even better, if the analyzer is used by programmers directly during work (automatic analysis of the changed code).

    A little background


    Chromium has been tested with PVS-Studio four times already:


    Previously, all checks were performed by the Windows version of the PVS-Studio analyzer. Recently, PVS-Studio also works under Linux, so this version was used for analysis.

    During this time, the project grew in size: in the third check, the number of projects reached 1169. At the time of writing, there were 4420. The size of the source code also increased noticeably to 370 MB (in 2013, the size was 260 MB).

    For four checks, the highest quality code for such a large project was noted. Has the situation changed after two and a half years? Not. The quality is still on top. However, due to the large amount of code and its constant development, we again find many errors.

    How to check?


    Let us dwell on how to check the Chromium. This time we will do it on Linux. After loading the sources using depot_tools and preparing (for details , see the Building section), we assemble the project:

    pvs-studio-analyzer trace -- ninja -C out/Default chrome

    Next, execute the command (this is one line):

    pvs-studio-analyzer analyze -l /path/to/PVS-Studio.lic 
    -o /path/to/save/chromium.log -j

    where the -j flag starts multithreaded analysis. The recommended number of threads is the number of physical CPU cores plus one (for example, on a quad-core processor the flag will look like "-j5").

    As a result, the PVS-Studio analyzer report will be received. Using the PlogConverter utility, which is part of the PVS-Studio distribution kit, it can be converted into one of three readable formats: xml, errorfile, tasklist. We will use tasklist format to view messages. In the current scan, only general analysis alerts of all levels (High, Medium, Low) will be viewed. The conversion command will look like this (in one line):

    plog-converter -t tasklist -o /path/to/save/chromium.tasks
    -a GA:1,2,3 /path/to/saved/chromium.log

    You can read more about all the parameters of PlogConverter here . Downloading the tasklist “chromium.tasks” to QtCreator (must be pre-installed) is done using the command:

    qtcreator path/to/saved/chromium.tasks

    It is highly recommended to start viewing the report with warnings of the High and Medium levels - the likelihood that there will be erroneous instructions in the code will be very high. Low level warnings can also indicate potential errors, but they have a higher probability of false positives, so when writing articles, they are usually not studied.

    Viewing the report itself in QtCreator will look like this:


    Figure 1 - Working with the results of the analyzer from under QtCreator (click on the image to enlarge)

    What did the analyzer tell?


    After checking the Chromium project, 2,312 warnings were received. The following diagram shows the distribution of alerts by severity level:

    Figure 2 - Distribution of warnings by importance levels

    Briefly comment on the diagram below: 171 warnings of High level, 290 warnings of Medium level, 1851 warnings of Low level were received.

    Despite a fairly large number of warnings, for such a gigantic project this is not enough. The total number of lines of source code (SLOC) without libraries is 6468751. If we take into account only warnings of the High and Medium level, then among them I can probably point out 220 real errors. Figures are numbers, but in reality we get an error density of 0.034 errors per 1000 lines of code. This, of course, is the density of not all errors, but only those that PVS-Studio finds. Or rather, the mistakes that I noticed while looking at the report.

    As a rule, on the other projects we get used tolower density of errors found. Chromium developers are great! However, you should not relax: there are mistakes, and they are far from harmless.

    Let us consider in more detail the most interesting errors.

    Newly Found Errors


    Copy paste




    Analyzer Warning : V501 There are identical sub-expressions 'request_body_send_buf_ == nullptr' to the left and to the right of the '&&' operator. http_stream_parser.cc 1222

    bool HttpStreamParser::SendRequestBuffersEmpty() 
    {
      return request_headers_ == nullptr 
          && request_body_send_buf_ == nullptr 
          && request_body_send_buf_ == nullptr;  // <=
    }

    Classics of the genre. The programmer twice compared the request_body_send_buf_ pointer with nullptr . This is probably a typo, and another member of the class should be compared with nullptr .

    Analyzer Warning : V766 An item with the same key '"colorSectionBorder"' has already been added. ntp_resource_cache.cc 581

    void NTPResourceCache::CreateNewTabCSS() 
    {
      ....
      substitutions["colorSectionBorder"] =             // <=
          SkColorToRGBAString(color_section_border); 
      ....
      substitutions["colorSectionBorder"] =             // <=
          SkColorToRGBComponents(color_section_border); 
      ....
    }

    The analyzer reports a suspicious double initialization of the object using the “colorSectionBorder” key . The substitutions variable in this context is an associative array. In the first initialization, the color_section_border variable of type SkColor (defined as uint32_t ) is converted to the RGBA string representation (judging by the name of the SkColorToRGBAString method ) and stored by the key “colorSectionBorder” . When reassigned, color_section_border is converted to another string format ( SkColorToRGBComponents method) and is recorded using the same key. This means that the previous value for the key "colorSectionBorder" will be destroyed. Perhaps this was intended, but then one of the initializations should be removed. Otherwise, you should save the color components with a different key.

    Note. By the way, this is the first error found using the V766 diagnostics in a real project. The type of error is quite specific, but the Chromium project is so large that it can meet exotic defects.

    Invalid work with pointers



    I invite readers to stretch a bit and try to find a mistake on their own.

    // Returns the item associated with the component |id| or nullptr
    // in case of errors.
    CrxUpdateItem* FindUpdateItemById(const std::string& id) const;
    void ActionWait::Run(UpdateContext* update_context,
                         Callback callback)
    {
      ....
      while (!update_context->queue.empty()) 
      {
          auto* item = 
            FindUpdateItemById(update_context->queue.front());
          if (!item)
          {
            item->error_category = 
              static_cast(ErrorCategory::kServiceError); 
            item->error_code =
              static_cast(ServiceError::ERROR_WAIT);
            ChangeItemState(item, CrxUpdateItem::State::kNoUpdate);
          } else {
            NOTREACHED();
          }
          update_context->queue.pop();
      }
      ....
    }

    Analyzer Warning : V522 Dereferencing of the null pointer 'item' might take place. action_wait.cc 41

    Here, the programmer decided to explicitly shoot himself in the foot. The code sequentially bypasses the queue queue , which contains identifiers in string representations. The identifier is retrieved from the queue, then the FindUpdateItemById method should return a pointer to an object of type CrxUpdateItem by identifier. If the method FindUpdateItemById an error occurs, it will be returned nullptr , which then branch into razymenuetsya then operator of the if .

    The correct code might look like this:

    ....
    while (!update_context->queue.empty()) 
    {
      auto* item = 
        FindUpdateItemById(update_context->queue.front());
      if (item != nullptr)
      { 
        ....
      }
      ....
    }
    ....

    Analyzer Warning : V620 It's unusual that the expression of sizeof (T) * N kind is being summed with the pointer to T type. string_conversion.cc 62

    int UTF8ToUTF16Char(const char *in, int in_length, uint16_t out[2]) 
    {
      const UTF8 *source_ptr = reinterpret_cast(in);
      const UTF8 *source_end_ptr = source_ptr + sizeof(char);
      uint16_t *target_ptr = out;
      uint16_t *target_end_ptr = target_ptr + 2 * sizeof(uint16_t); // <=
      out[0] = out[1] = 0;
      ....
    }

    The analyzer detected a code with suspicious address arithmetic. As the name implies, the function converts a character from the encoding UTF-8 to UTF-16. The current Unicode 6.x standard expands the UTF-8 character to four bytes. In this regard, the UTF-8 character is decoded as 2 UTF-16 characters (UTF-16 character is hardcoded with two bytes). For decoding, 4 pointers are used - pointers to the beginning and end of the in and out arrays . Pointers to the end of arrays in the code act like STL iterators: they refer to the memory area after the last element in the array. If in the case of source_end_ptr the pointer was received correctly, then with target_end_ptreverything is not so "rosy." It was understood that it should refer to the memory area after the second element of the out array (that is, shift relative to the out pointer by 4 bytes), but instead the pointer will refer to the memory area after the fourth element (offset out by 8 bytes).

    We illustrate the words spoken, as it should have been:



    How it actually happened:



    The correct code should look like this:

    int UTF8ToUTF16Char(const char *in, int in_length, uint16_t out[2]) 
    {
      const UTF8 *source_ptr = reinterpret_cast(in);
      const UTF8 *source_end_ptr = source_ptr + 1;
      uint16_t *target_ptr = out;
      uint16_t *target_end_ptr = target_ptr + 2;
      out[0] = out[1] = 0;
      ....
    }

    The analyzer also found another suspicious place:

    • V620 It's unusual that the expression of sizeof (T) * N kind is being summed with the pointer to T type. string_conversion.cc 106

    Various errors




    I suggest again to stretch and try to find an error in the code yourself.

    CheckReturnValue& operator=(const CheckReturnValue& other)
    {
      if (this != &other)
      {
        DCHECK(checked_);
        value_ = other.value_;
        checked_ = other.checked_;
        other.checked_ = true;
      }
    }

    Analyzer Warning : V591 Non-void function should return a value. memory_allocator.h 39

    The code above has undefined behavior : the C ++ standard says that any non-void method must return a value. What is in the code above? The assignment operator checks for assignment to itself (comparing objects by their pointers) and copying fields (if the pointers are different). However, the method did not return a reference to itself ( return * this ).

    There were a couple more places in the project where the non-void method did not return a value:

    • V591 Non-void function should return a value. sandbox_bpf.cc 115
    • V591 Non-void function should return a value. events_x.cc 73

    Analyzer Warning : V583 The '?:' Operator, regardless of its conditional expression, always returns one and the same value: 1. configurator_impl.cc 133

    int ConfiguratorImpl::StepDelay() const 
    {
      return fast_update_ ? 1 : 1;
    }

    The code always returns a delay equal to one. Perhaps this hurt for the future, but so far there is no use for such a ternary operator.

    Analyzer Warning : V590 Consider inspecting the 'rv == OK || rv! = ERR_ADDRESS_IN_USE 'expression. The expression is excessive or contains a misprint. udp_socket_posix.cc 735

    int UDPSocketPosix::RandomBind(const IPAddress& address) 
    {
      DCHECK(bind_type_ == DatagramSocket::RANDOM_BIND 
          && !rand_int_cb_.is_null());
      for (int i = 0; i < kBindRetries; ++i) {
        int rv = DoBind(IPEndPoint(address,
                                   rand_int_cb_
                                   .Run(kPortStart, kPortEnd)));
        if (rv == OK || rv != ERR_ADDRESS_IN_USE) // <=
          return rv;
      }
      return DoBind(IPEndPoint(address, 0));
    }

    The analyzer warns of a possible excessive comparison. In the code, a random port is bound to an IP address. If the binding was successful, the loop stops (which means the number of attempts to bind the port to the address). Based on the logic, you can leave one of the comparisons (now the cycle stops if the binding was successful, or the error of using the port by another address is not returned).

    Analyzer Warning : V523 The 'then' statement is equivalent to the 'else' statement.

    bool ResourcePrefetcher::ShouldContinueReadingRequest(
      net::URLRequest* request,
      int bytes_read
    ) 
    {
      if (bytes_read == 0) {  // When bytes_read == 0, no more data.
        if (request->was_cached())
          FinishRequest(request); // <=
        else
          FinishRequest(request); // <=
        return false;
      }
      return true;
    }

    Parser reported the identical statements in the branches then and else operator the if . What can this lead to? Based on the code, an uncached URL request ( net :: URLRequest * request ) will complete as well as a cached one. If it should be so, then you can remove the branch operator:

    ....
    if (bytes_read == 0) {  // When bytes_read == 0, no more data.
      FinishRequest(request); // <=
      return false;
    }
    ....

    If a different logic was implied, then the wrong method will be called, which can lead to "sleepless nights" and a "sea of ​​coffee."

    Analyzer Warning : V609 Divide by zero. Denominator range [0..4096]. addr.h 159

    static int BlockSizeForFileType(FileType file_type)
    {
      switch (file_type)
      {
        ....
        default:
          return 0; // <=
      }
    }
    static int RequiredBlocks(int size, FileType file_type)
    {
      int block_size = BlockSizeForFileType(file_type);
      return (size + block_size - 1) / block_size; // <=
    }

    What do we see here? This code can lead to a subtle error: in the RequiredBlocks method, the block_size variable is divided by the value (calculated using the BlockSizeForFileType method ). The method BlockSizeForFileType over the transferred value transfer FileType in the switch is the operator returns some value, but it was also provided by default - 0. Imagine that the transfer FileType decided to expand and add new value, but added a new case Q branch in a switchoperator. This will lead to undefined behavior: according to the C ++ standard, division by zero does not cause a program exception. Instead, a hardware exception will be raised that cannot be caught with the standard try / catch block (instead, signal handlers are used, more details can be found here and here ).

    Analyzer Warning : V519 The '* list' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 136, 138. util.cc 138

    bool GetListName(ListType list_id, std::string* list) 
    {
      switch (list_id) {
        ....
        case IPBLACKLIST:
          *list = kIPBlacklist;
          break;
        case UNWANTEDURL:
          *list = kUnwantedUrlList;
          break;
        case MODULEWHITELIST:
          *list = kModuleWhitelist; // <=
        case RESOURCEBLACKLIST:
          *list = kResourceBlacklist;
          break;
        default:
          return false;
      }
      ....
    }

    A typical mistake when writing a switch statement . It is expected that if the variable list_id takes the MODULEWHITELIST value from the ListType enumeration , then the line at the list pointer is initialized with the value kModuleWhitelist and the switch statement is interrupted . However, because of the missed break statement , the transition to the next RESOURCEBLACKLIST branch will occur , and the string kResourceBlacklist will actually be saved in * list .

    conclusions


    Chromium remains a tough nut. Nevertheless, the PVS-Studio analyzer can find errors in it again and again. When using static analysis methods, errors can be found even at the stage of writing code, before the testing stage.

    What tools can be used for static code analysis? In fact, there are a lot of tools . Naturally, I suggest trying PVS-Studio: the product is very conveniently integrated into the Visual Studio environment, or it can be used with any build system. And more recently, the Linux version has become available. More information about Windows and Linux versions can be found here and here .


    If you want to share this article with an English-speaking audience, then please use the translation link: Phillip Khandeliants. Heading for a Record: Chromium, the 5th Check .

    Have you read the article and have a question?
    Often our articles are asked the same questions. We collected the answers here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please see the list.

    Also popular now: