Perl 5: as in macros, errors were hidden


    Perl 5 was chosen to supplement the list of open source programming languages ​​that were tested using the PVS-Studio static code analyzer. This article is about errors found and the difficulties of viewing the analysis results. The number of macros in the code is so great that it creates the feeling that the code is written not in C language, but in some strange dialect of it. Despite the difficulties in viewing the code, it was possible to gather up interesting problems, which will be discussed in this article.

    Introduction


    Perl is a high-level interpreted dynamic general purpose programming language (Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages). Development of Perl 5 began in 1994. After a couple of decades, C code with numerous macros causes nervousness among modern programmers.

    Perl 5 source code was taken from the official repository (the blead branch ). To check the project, the PVS-Studio static code analyzer was used . The analysis was conducted on the Linux operating system, but the analyzer is also available for Windows and macOS.

    Viewing the results of the analysis was not an easy task. The point is that the analyzer checks the preprocessed .ifiles in which all preprocessor directives have already been disclosed, but gives warnings on files with source code. This is the correct behavior of the analyzer; nothing needs to be changed, but many warnings are issued for macros! And behind the macros there is an unreadable code.

    The ternary operator doesn't work the way you think.


    V502 Perhaps the '?:' Operator was different. The '?:' Operator has a lower limit than the '-' operator. toke.c 9494

    STATIC char *
    S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni){
      ....
      if ((s <= PL_bufend - (is_utf8)
                              ? UTF8SKIP(s)
                              : 1)
            && VALID_LEN_ONE_IDENT(s, PL_bufend, is_utf8))
      {
        ....
      }
      ....
    }

    Let's start the review with a beautiful error. Every few reviews of the code have to repeat that the ternary operator has almost the lowest priority in the calculations.

    Consider a fragment with an error:

    s <= PL_bufend - (is_utf8) ? UTF8SKIP(s) : 1

    The order of operations that the programmer expects:
    1. ?:
    2. -
    3. <=

    What actually happens:
    1. -
    2. <=
    3. ?:

    Keep a label with the priorities of operations: "The priority of operations in the C / C ++ language ".

    V502 Perhaps the '?:' Operator was different. The '?:' Operator has a lower limit than the '==' operator. re_exec.c 9193

    STATIC I32
    S_regrepeat(pTHX_ regexp *prog, char **startposp, const regnode *p,
                regmatch_info *const reginfo, I32 max _pDEPTH){
      ....
      assert(STR_LEN(p) == reginfo->is_utf8_pat ? UTF8SKIP(STRING(p)) : 1);
      ....
    }

    Simple code with a similar error. But if you do not know the priorities of operations, then you can make a mistake in the expression of any size.

    Another place with assert:

    • V502 Perhaps the '?:' Operator was different. The '?:' Operator has a lower limit than the '==' operator. re_exec.c 9286

    V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. pp_hot.c 3036

    PP(pp_match)
    {
      ....
      MgBYTEPOS_set(mg, TARG, truebase, RXp_OFFS(prog)[0].end);
      ....
    }

    And here is a warning to the macro ... To understand what is happening there, even the implementation of the macro will not help, because it uses a few more macros!

    Therefore, I attach a fragment of the preprocessed file for this line of code:

    (((targ)->sv_flags & 0x00000400) && (!((targ)->sv_flags & 0x00200000) ||
    S_sv_only_taint_gmagic(targ)) ? (mg)->mg_len = ((prog->offs)[0].end),
    (mg)->mg_flags |= 0x40 : ((mg)->mg_len = (((targ)->sv_flags & 0x20000000)
    && !__builtin_expect(((((PL_curcop)->cop_hints + 0) & 0x00000008) ?
    (_Bool)1 :(_Bool)0),(0))) ? (ssize_t)Perl_utf8_length( (U8 *)(truebase),
    (U8 *)(truebase)+((prog->offs)[0].end)) : (ssize_t)((prog->offs)[0].end),
    (mg)->mg_flags &= ~0x40));

    Somewhere here, the analyzer doubted the correct use of the ternary operator (there are 3 of them), but I did not find the strength to understand what was being done in this code. We have already seen that the developers make such mistakes, so here it can also be very likely.

    Three more uses of this macro:

    • V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. pp_ctl.c 324
    • V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. regexec.c 7335
    • V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. re_exec.c 7335

    Note colleagues Andrei Karpov. I have been meditating on this code for 10 minutes and am inclined to believe that there is no error here. But in any case, it is extremely painful to read such code, and it is better not to write like this.

    Errors in conditions


    V523 The 'then' statement is equivalent to the 'else' statement. toke.c 12056

    static U8 *
    S_add_utf16_textfilter(pTHX_ U8 *const s, bool reversed){
      ....
      SvCUR_set(PL_linestr, 0);
      if (FILTER_READ(0, PL_linestr, 0)) {
        SvUTF8_on(PL_linestr);
      } else {
        SvUTF8_on(PL_linestr);
      }
      PL_bufend = SvEND(PL_linestr);
      return (U8*)SvPVX(PL_linestr);
    }

    I think you can do without examining the contents of the macros to make sure that there are suspiciously duplicated code fragments.

    V564 The '|' operator is applied to bool type value. You've probably forgotten to include the parentheses or intended to use the '||' operator. op.c 11494

    OP *
    Perl_ck_rvconst(pTHX_ OP *o){
      ....
      gv = gv_fetchsv(kidsv,
        o->op_type == OP_RV2CV
          && o->op_private & OPpMAY_RETURN_CONSTANT
            ? GV_NOEXPAND
            : iscv | !(kid->op_private & OPpCONST_ENTERED), iscv // <=
            ? SVt_PVCV
            : o->op_type == OP_RV2SV
          ? SVt_PV
          : o->op_type == OP_RV2AV
              ? SVt_PVAV
              : o->op_type == OP_RV2HV
            ? SVt_PVHV
            : SVt_PVGV);
      ....
    }

    Very strange code. The expression "iscv | ! (kid-> op_private & OPpCONST_ENTERED) ”is not used at all. There is clearly some typo here. For example, perhaps you should write here:

    : iscv = !(kid->op_private & OPpCONST_ENTERED), iscv // <=

    V547 Expression 'RETVAL == 0' is always true. Typemap.c 710

    XS_EUPXS(XS_XS__Typemap_T_SYSRET_pass);
    XS_EUPXS(XS_XS__Typemap_T_SYSRET_pass)
    {
      dVAR; dXSARGS;
      if (items != 0)
        croak_xs_usage(cv,  "");
      {
        SysRet  RETVAL;
    #line 370 "Typemap.xs"
        RETVAL = 0;
    #line 706 "Typemap.c"
        {
          SV * RETVALSV;
          RETVALSV = sv_newmortal();
          if (RETVAL != -1) {          // <=if (RETVAL == 0)           // <=
              sv_setpvn(RETVALSV, "0 but true", 10);
            else
              sv_setiv(RETVALSV, (IV)RETVAL);
          }
          ST(0) = RETVALSV;
        }
      }
      XSRETURN(1);
    }

    The variable RETVAL is checked twice in a row. At the same time, from the code it is clear that this variable is always zero. Perhaps, in one or both conditions, they wanted to check the RETVALSV pointer , but made a typo.

    Throw in warnings about the sizeof operator


    The analyzer has several types of diagnostic rules that look for errors using the sizeof operator . On the Perl 5 project, two such diagnostics issued a total of about a thousand warnings. In this case, it is not the analyzer that is to blame, but the macros.

    V568 It's odd that the sizeof () operator is the 'len + 1' expression. util.c 1084

    char *
    Perl_savepvn(pTHX_ constchar *pv, I32 len){
      ....
      Newx(newaddr,len+1,char);
      ....
    }

    There are many similar macros in the code. I chose one for an example, we are interested in the argument "len + 1".

    The macro preprocessor is expanded to the following code:

    (newaddr = ((void)(__builtin_expect(((((( sizeof(size_t) < sizeof(len+1) ||
    sizeof(char) > ((size_t)1 << 8*(sizeof(size_t) - sizeof(len+1)))) ?
    (size_t)(len+1) : ((size_t)-1)/sizeof(char)) > ((size_t)-1)/sizeof(char))) ?
    (_Bool)1 : (_Bool)0),(0)) && (S_croak_memory_wrap(),0)),
    (char*)(Perl_safesysmalloc((size_t)((len+1)*sizeof(char))))));

    The analyzer warning is issued on the sizeof (len +1) construct . The fact is that no calculations in the arguments of the sizeof operator are performed. In a similar code many macros are revealed. This is probably the old legacy code, in which no one wants to touch anything, but current developers continue to use old macros, suggesting a different behavior.

    Null pointer dereferencing


    V522 Dereferencing of the null pointer 'sv' might take place. pp_ctl.c 577

    OP * Perl_pp_formline(void){
      ....
      SV *sv = ((void *)0);
      ....
      switch (*fpc++) {
      ....
      case4:
        arg = *fpc++;
        f += arg;
        fieldsize = arg;
        if (mark < sp)
          sv = *++mark;
        else {
          sv = &(PL_sv_immortals[2]);
          Perl_ck_warner( (28 ), "....");
        }
        ....
        break;
      case5:
      {
        constchar *s = item = ((((sv)->sv_flags & (....)) == 0x00000400) ? ....
        ....
      }
      ....
    }

    This code snippet is completely taken from the preprocessed file, because it is impossible to verify the existence of a problem from the source file, again due to macros.

    The sv pointer when declared is initialized to zero. The analyzer found that when passing by the value of 5 in the switch statement , this pointer is dereferenced, which was not previously initialized. The change of the sv pointer is present in the branch with the value 4 , but at the end of this block of code there is a break statement . Most likely, this place requires writing additional code.

    V595It was verified against nullptr. Check lines: 15919, 15920. op.c 15919

    voidPerl_rpeep(pTHX_ OP *o){
      ....
      OP *k = o->op_next;
      U8 want = (k->op_flags & OPf_WANT);   // <=if (   k                              // <=
          && k->op_type == OP_KEYS
          && (   want == OPf_WANT_VOID
              || want == OPf_WANT_SCALAR)
          && !(k->op_private & OPpMAYBE_LVSUB)
          && !(k->op_flags & OPf_MOD)
      ) {
      ....
    }

    In this code snippet, the analyzer detected a pointer k , which is dereferenced one line before it is checked for validity. This can be either an error or an extra code.

    Diagnostics V595 finds a lot of warnings in any project, Perl 5 is no exception. It’s impossible to put all this into an article, so we’ll confine ourselves to one example, and developers, if they wish, will check the project themselves.

    miscellanea


    V779 Unreachable code detected. It is possible that an error is present. universal.c 457

    XS(XS_utf8_valid);
    XS(XS_utf8_valid)
    {
      dXSARGS;
      if (items != 1)
        croak_xs_usage(cv, "sv");
      else {
        SV * const sv = ST(0);
        STRLEN len;
        constchar * const s = SvPV_const(sv,len);
        if (!SvUTF8(sv) || is_utf8_string((const U8*)s,len))
          XSRETURN_YES;
        else
          XSRETURN_NO;
      }
      XSRETURN_EMPTY;
    }

    In the line with XSRETURN_EMPTY, the analyzer detected an unreachable code. In this function there are two operators return and croak_xs_usage - a macro that is expanded into the noreturn function:

    voidPerl_croak_xs_usage(const CV *const cv, constchar *const params)
      __attribute__((noreturn));

    In similar places in the Perl 5 code, the NOT_REACHED macro is used to specify an unreachable branch .

    V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. inffast.c 296

    void ZLIB_INTERNAL inflate_fast(z_streamp strm, unsigned start){
      ....
      unsignedlong hold;         /* local strm->hold */unsigned bits;              /* local strm->bits */
      ....
      hold &= (1U << bits) - 1;
      ....
    }

    The analyzer detected a suspicious operation when working with bitmasks. A variable of smaller width than the variable hold is used as a bit mask . This results in the loss of the most significant bits. Developers should pay attention to this code.

    Conclusion


    Picture 6



    Finding errors through macros was very difficult. Viewing the report took a lot of time and effort. Nevertheless, the article includes very interesting cases, similar to real mistakes. The report of the analyzer is quite large, there surely is still a lot of interesting things. But I can’t see it further :). I advise the developer to check the project on their own and eliminate the defects that they can detect.

    PS We definitely want to support this interesting project, and we are ready to provide developers with a license for several months.



    If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov. Perl 5: How to Hide Errors in Macros

    Also popular now: