Chromium: Using Inaccurate Data

    PVS-Studio, Common Weakness EnumerationWe bring to your attention a series of articles devoted to recommendations for writing high-quality code on the example of errors found in the Chromium project. This is the fifth part, which will be devoted to errors in the use of unverified or incorrectly verified data. A very large number of vulnerabilities exist due to the use of unverified data, which makes this topic interesting and relevant.

    In fact, the cause of the vulnerability can be an error of almost any type, even an ordinary typo. Actually, if a found error is classified according to the Common Weakness Enumeration , then it means a potential vulnerability.

    Starting with version 6.21, the PVS-Studio analyzer learned to classify errors found according to the Common Weakness Enumeration and assign them the corresponding CWE ID.

    Perhaps readers have already noticed that in previous articles, in addition to the warning number Vxxx, I also provided a CWE ID. This means that the previously discussed errors could theoretically cause vulnerability. The probability of this is small, but it is. Interestingly, we were able to match one or another CWE ID with almost every warning issued by PVS-Studio. This means that without planning it ourselves, we created an analyzer that is able to detect a large number of weaknesses :).

    Conclusion. PVS-Studio analyzer helps prevent many types of vulnerabilities in advance. Publication on this subject: "How can PVS-Studio help in searching for vulnerabilities? ".

    In this article I collected errors that could potentially lead to security problems. I warn you that the choice of errors is very conditional and subjective. It may well turn out that some vulnerability is masked as an error that I called a trivial typo in one of the previous articles

    So, let's look at some security defects I noticed while parsing the report issued by PVS-Studio for the Chromium project, as I wrote in the introductory article, I looked at the report quite fluently, so there may be other errors that I have not noticed. The task of the article in general terms is to show how some errors can lead to the fact that the program begins to process incorrect or unverified data. I have not yet decided what is the best name for such data, and for now I will use the term "false data".

    Error Examples


    Chromium project.

    InstallUtil::ConditionalDeleteResult
    InstallUtil::DeleteRegistryValueIf(....) {
      ....
      ConditionalDeleteResult delete_result = NOT_FOUND;
      ....
      if (....) {
        LONG result = key.DeleteValue(value_name);
        if (result != ERROR_SUCCESS) {
          ....
          delete_result = DELETE_FAILED;
        }
        delete_result = DELETED;
      }
      return delete_result;
    }

    PVS-Studio Warning: V519 CWE-563 The 'delete_result' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 381, 383. install_util.cc 383

    The function returns an invalid status. As a result, other parts of the program will assume that the function successfully deleted a certain value. The error is that the DELETE_FAILED status is always replaced with the DELETED status .

    The error can be fixed by adding the else keyword :

    if (result != ERROR_SUCCESS) {
      ....
      delete_result = DELETE_FAILED;
    } else {
      delete_result = DELETED;
    }

    Perhaps the considered error does not reflect the essence of inaccurate data very well. In this function, the creation of false data occurs, and not their verification or use. So let's look at another, more appropriate error.

    PDFium library (used by Chromium).

    CPVT_WordRange Intersect(const CPVT_WordRange& that) const {
      if (that.EndPos < BeginPos || that.BeginPos > EndPos ||
          EndPos < that.BeginPos || BeginPos > that.EndPos) {
        return CPVT_WordRange();
      }
      return CPVT_WordRange(std::max(BeginPos, that.BeginPos),
                            std::min(EndPos, that.EndPos));
    }

    PVS-Studio warnings:

    • V501 CWE-570 There are identical sub-expressions 'that.BeginPos> EndPos' to the left and to the right of the '||' operator. cpvt_wordrange.h 46
    • V501 CWE-570 There are identical sub-expressions 'that.EndPos <BeginPos' to the left and to the right of the '||' operator. cpvt_wordrange.h 46

    The condition is spelled incorrectly. To make it easier to notice the error, we reduce the condition:

    if (E2 < B1 || B2 > E1 || E1 < B2 || B1 > E2)

    Note that (E2 <B1) and (B1> E2) are one and the same. Similarly (B2> E1), this is the same as (E1 <B2) .

    It turns out that not all necessary checks are performed, and then an incorrect range can be generated, which, in turn, will affect the functioning of the program.

    Now let's look at a large and complex piece of code from the RE2 regular expression library (used by Chromium). To be honest, I don’t even understand what is happening here, but there is definitely an abnormal check in the code.

    First, show how some types are declared. If this is not done, then the code will not be very clear.

    typedef signed int Rune;
    enum
    {
      UTFmax = 4,
      Runesync = 0x80,
      Runeself = 0x80,
      Runeerror = 0xFFFD,
      Runemax = 0x10FFFF,
    };

    And now a function with an anomaly.
    char*
    utfrune(const char *s, Rune c)
    {
      long c1;
      Rune r;
      int n;
      if(c < Runesync)    /* not part of utf sequence */
        return strchr((char*)s, c);
      for(;;) {
        c1 = *(unsigned char*)s;
        if(c1 < Runeself) {  /* one byte rune */
          if(c1 == 0)
            return 0;
          if(c1 == c)                // <=
            return (char*)s;
          s++;
          continue;
        }
        n = chartorune(&r, s);
        if(r == c)
          return (char*)s;
        s += n;
      }
      return 0;
    }

    The PVS-Studio analyzer generates a warning on the line that I noted with the comment "// <=". Message: V547 CWE-570 Expression 'c1 == c' is always false. rune.cc 247

    Let's try to figure out why the condition is always false. First, pay attention to these lines:

    if(c < Runesync)
      return strchr((char*)s, c);

    If the variable c <0x80 , then the function will stop its work. If the function does not complete its work, but continues it, then we can say for sure that the variable c> = 0x80 .

    Now we look at the condition:

    if(c1 < Runeself)

    The condition (c1 == c) , marked by the comment "// <=", is fulfilled only if c1 <0x80 .

    So here is what we know about variable values:

    • c> = 0x80
    • c1 <0x80

    It follows that the condition c1 == c is always false. But this is very suspicious. It turns out that the utfrune function in the regular expression library does not work as planned. The consequences of such an error are unpredictable.

    LibVPX video codec (used by Chromium).

    #define VP9_LEVELS 14
    extern const Vp9LevelSpec vp9_level_defs[VP9_LEVELS];
    typedef enum {
      ....
      LEVEL_MAX = 255
    } VP9_LEVEL;
    static INLINE int log_tile_cols_from_picsize_level(
      uint32_t width, uint32_t height)
    {
      int i;
      const uint32_t pic_size = width * height;
      const uint32_t pic_breadth = VPXMAX(width, height);
      for (i = LEVEL_1; i < LEVEL_MAX; ++i) {
       if (vp9_level_defs[i].max_luma_picture_size >= pic_size &&
           vp9_level_defs[i].max_luma_picture_breadth >= pic_breadth)
       {
         return get_msb(vp9_level_defs[i].max_col_tiles);
       }
      }
      return INT_MAX;
    }

    PVS-Studio warnings:

    • V557 CWE-119 Array overrun is possible. The value of 'i' index could reach 254. vp9_encoder.h 931
    • V557 CWE-119 Array overrun is possible. The value of 'i' index could reach 254. vp9_encoder.h 932
    • V557 CWE-119 Array overrun is possible. The value of 'i' index could reach 254. vp9_encoder.h 933

    The vp9_level_defs array consists of 14 elements. In the loop, the variable i , used as the index of the array, changes from 0 to 254. Result: the border of the array is exited.

    Well, if this code leads to Access Violation . But in practice, most likely, some random data located after the vp9_level_defs array will begin to be processed .

    Another similar error of using data outside the array was encountered in the SQLite library (used in Chromium).

    First, note that the yy_shift_ofst array contains 455 elements.

    static const short yy_shift_ofst[] = {
      /*   0 */ 355, 888, 1021, 909, 1063, 1063, 1063, 1063, 20, -19,
      ....
      /* 450 */ 1440, 1443, 1538, 1542, 1562,
    }

    Two macros are also of interest to us:

    #define YY_SHIFT_COUNT    (454)
    #define YY_MIN_REDUCE     993

    The YY_SHIFT_COUNT macro defines the maximum index that can be used to access the elements of the yy_shift_ofst array . It is not 455, but 454, since the numbering of elements goes from 0. The

    macro YY_MIN_REDUCE , equal to 993, has nothing to do with the size of the yy_shift_ofst array .

    Function containing weak validation:

    static unsigned int yy_find_shift_action(....)
    {
      int i;
      int stateno = pParser->yytos->stateno;
      if( stateno>=YY_MIN_REDUCE ) return stateno;      // <=
      assert( stateno <= YY_SHIFT_COUNT );
      do {
        i = yy_shift_ofst[stateno];                     // <=
      ....
    }

    PVS-Studio Warning: V557 CWE-125 Array overrun is possible. The value of 'stateno' index could reach 992. sqlite3.c 138802

    The protection has been made in the function that the index when accessing the array should not be more than a certain value. Due to a typo, or for another reason, the wrong constant is used. You should use a constant equal to 454, but instead the index value is compared with 993.

    As a result, you can access the array abroad and read arbitrary invalid data.

    Note. Below is the correct assert , but it will not help in the Release version.

    Most likely, the check should be rewritten as follows:

    if (stateno > YY_SHIFT_COUNT)
    {
      assert(false);
      return stateno;
    }

    ICU project (used by Chromium).
    UVector*
    ZoneMeta::createMetazoneMappings(const UnicodeString &tzid) {
      UVector *mzMappings = NULL;
      ....
      if (U_SUCCESS(status)) {
        ....
        if (U_SUCCESS(status)) {
          ....
          while (ures_hasNext(rb)) {
            ....
            if (mzMappings == NULL) {
              mzMappings = new UVector(
                deleteOlsonToMetaMappingEntry, NULL, status);
              if (U_FAILURE(status)) {
                delete mzMappings;
                uprv_free(entry);
                break;
              }
            }
            ....
          }
          ....
        }
      }
      ures_close(rb);
      return mzMappings;
    }

    PVS-Studio Warning: V774 CWE-416 The 'mzMappings' pointer was used after the memory was released. zonemeta.cpp 713

    The code is complex, I find it difficult to say for sure whether there is a real mistake or not. However, as I understand it, a situation is possible when the function returns a pointer to an already freed memory block. The correct invalid status handler should nullify the pointer:

    if (U_FAILURE(status)) {
      delete mzMappings;
      mzMappings = nullptr;
      uprv_free(entry);
      break;
    }

    Now it turns out that the function returned a pointer to the freed memory. This memory can be anything and the use of this invalid pointer will lead to undefined program behavior.

    The following function of the Chromium project incorrectly implements protection against negative values.

    void AXPlatformNodeWin::HandleSpecialTextOffset(LONG* offset) {
      if (*offset == IA2_TEXT_OFFSET_LENGTH) {
        *offset = static_cast(GetText().length());
      } else if (*offset == IA2_TEXT_OFFSET_CARET) {
        int selection_start, selection_end;
        GetSelectionOffsets(&selection_start, &selection_end);
        if (selection_end < 0)
          *offset = 0;
        *offset = static_cast(selection_end);
      }
    }

    PVS-Studio Warning: V519 CWE-563 The '* offset' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 3543, 3544. ax_platform_node_win.cc 3544

    If the value of the selection_end variable is negative, then the function should return 0 . However, due to a typo, 0 is not written to the right place. The correct code should be like this:

    if (selection_end < 0)
      selection_end = 0;
    *offset = static_cast(selection_end);

    Due to this error, the function can return a negative number, although it should not. This is a negative number that can “leak” through the check, and there is inaccurate data.

    Other bugs


    To be honest, I do not really like the examples that I cited in the previous section of the article. There are few of them, and they do not very well reflect the essence of errors in using false data. I think that over time I will make a separate article where I will show more vivid examples of errors, collecting them from various open projects.

    By the way, it was possible to include more examples of errors in the article, but I already “spent” them when writing the previous articles, but I don’t want to repeat them. For example, in the article “Chromium: typos”, there was such a fragment:

      if(!posX->hasDirtyContents() ||
         !posY->hasDirtyContents() ||
         !posZ->hasDirtyContents() ||
         !negX->hasDirtyContents() ||
         !negY->hasDirtyContents() ||          // <=
         !negY->hasDirtyContents())            // <=

    Because of this typo, the object referenced by the negZ pointer is not checked . As a result, the program will work with false data.

    Also in this article, I did not consider situations where inaccurate (corrupted) data occurs due to the lack of a pointer check that the malloc function returns . If the function malloc returned a NULL, it does not mean that the only possible error is a NULL pointer dereference. There are more insidious situations. Schematically they look like this:

    int *ptr = (int *)malloc(100 * sizeof(int));
    ptr[1234567] = 42;

    There will be no dereferencing of the null pointer. Here the data will be recorded, it is not clear where and the destruction of some data.

    This is an interesting story and I will devote it to the next separate article.

    Recommendations


    A variety of errors lead to the emergence and use of unreliable (unverified, corrupted) data. There can be no universal advice here. You can, of course, write: do not make mistakes in the code! But there is no use from such advice :).

    So why then did I write this article and highlight this type of error?

    So you know about them. Knowing the existence of a problem in itself helps prevent it. If someone does not know about a problem, it does not mean that it is not there. This picture will be a good illustration:

    Figure 2



    What can still be advised:

    1. Update the libraries used in your project. In new versions, various errors that are vulnerabilities can be fixed. However, we must admit that the vulnerability may appear just in the new version, but in the old one. But still, a better solution would be to upgrade the libraries. More people know about old vulnerabilities than about new ones.
    2. Carefully check all input data, especially coming from somewhere outside. For example, all data coming from somewhere over the network must be checked very carefully.
    3. Use various code verification tools. For example, the Chromium project clearly lacks the use of the PVS-Studio static analyzer :).
    4. Explain to colleagues that "a simple coding error does not mean a non-fearsome error ." If your team develops responsible applications, then you should focus on the quality of the code and destroy all, even harmless-looking errors.

    Note about PVS-Studio


    As I already said, PVS-Studio analyzer already helps prevent vulnerabilities by detecting errors even at the stage of writing code. But we want more and soon we will seriously improve PVS-Studio by introducing the concept of “using unverified data” into Data Flow analysis.

    We even already reserved a special number for this important diagnosis: V1010. Diagnostics will allow you to identify errors when data was received from an unreliable source (for example, sent over a network) and used without proper verification. The lack of all necessary input checks is often the cause of vulnerability detection in applications. We wrote more about this recently in the article “ PVS-Studio 2018: CWE, Java, RPG, macOS, Keil, IAR, MISRA ” (see the section “Potential Vulnerabilities, CWE”).

    New diagnostics will significantly strengthen the analyzer in identifying potential vulnerabilities. Most likely, the V1010 diagnostics will correspond to the identifier CWE-20 (Improper Input Validation).

    Conclusion


    I invite you and your colleagues to read the article " 42 recommendations " on our website . After it, the programmer will not turn into a security expert, but will learn a lot of new and useful information. Especially these articles will be useful to developers who have just recently mastered the C or C ++ language and who do not suspect how deep the rabbit hole they hit.

    I plan to update 42 Tips and turn them into 50 Tips. Therefore, I invite you to subscribe to my @Code_Analysis twitter and our RSS feed so as not to miss this and other interesting articles on our blog.



    If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Chromium: Use of Untrusted Data .

    Also popular now: