Docotic.Pdf: What problems will PVS-Studio detect in a mature project?

    Docotic.Pdf and PVS-studio

    Quality is important to us. And we have heard about PVS-Studio. All this led to the desire to check Docotic.Pdf and find out what else can be improved.

    Docotic.Pdf is a general purpose library for working with PDF files. Written in C #, no unsafe code, no external dependencies other than .NET runtime. It works both under .NET 4+ and under .NET Standard 2+.

    The library has been in development for a little over 10 years and it has 110 thousand lines of code without taking tests, examples and other things into account. For static analysis, we constantly use Code Analysis and StyleCop. Several thousand automatic tests protect us from regressions. Our clients from different countries and different industries trust the quality of the library.

    What problems will PVS-Studio detect?

    Installation and first impression

    I downloaded the trial version from the PVS-Studio website. Pleasantly surprised by the small size of the installer. Installed with the default settings: analysis engines, separate PVS-Studio, integration into Visual Studio 2017.

    After installation, nothing started, and two shortcuts with the same icons were added to the Start menu: Standalone and PVS-Studio. For a moment I thought about what to start. I launched Standalone and was unpleasantly surprised by the interface. The 200% scale set for my Windows is supported crookedly. A part of the text is too small, a part of the text does not fit in the space reserved for it. The name, Unicorn and Actions list are cropped at any window size. Even with full screen.

    Well, okay, I decided to open my project file. Suddenly, the File menu did not find such an option. There I was offered only to open separate files. Thank you, I thought, I'll try another option. I launched PVS-Studio - they showed me a window with blurred text. Scale 200% again made itself felt. Text reported: look for me in “Three Crowns”; look for the PVS-Studio menu in Visual Studio. Well, opened the Studio.

    Opened the solution. Indeed, there is a PVS-Studio menu, and in it there is an opportunity to check “Current Project”. I made the project I needed current and started the check. Below in the Studio a window appeared with the results of the analysis. A window with the progress of the checkout also appeared in the background, but I did not immediately find it. At first there was a feeling that the test did not start or immediately ended.

    Result of the first check

    The analyzer checked all 1253 project files in approximately 9 minutes and 30 seconds. By the end of the check, the file counter did not change as quickly as at the beginning. Perhaps there is some non-linear dependence of the scan duration on the number of files to be scanned.

    Information about 81 High, 109 Medium and 175 Low warnings appeared in the results window. If you count the frequency, you get 0.06 High warnings / file, 0.09 Medium warnings / file, and 0.14 Low warnings / per file. Or
    0.74 High warnings per thousand lines of code, 0.99 Medium warnings per thousand lines of code, and 1.59 Low warnings per thousand lines of code.

    Here in this article it is indicated that in CruiseControl.NET with its 256 thousand lines of code, the analyzer found 15 High, 151 Medium and 32 Low warnings.

    It turns out that as a percentage for Docotic.Pdf, more warnings were issued in each of the groups.

    What is found?

    Warnings Low I decided to disregard at this stage.

    I sorted out warnings on the Code column and it turned out that V3022 "Expression is always true / false" and V3063 "If it is evaluated", became the absolute frequency record holder in terms of frequency . In my opinion, they are about one thing. In total, these two warnings give 92 cases out of 190. Relative frequency = 48%.

    The division logic into High and Medium is not entirely clear. I expected V3072 "The 'A' class containing IDisposable members doesn’t itself implement IDisposable" and V3073 "Not all IDisposable members are properly disposed. Call 'Dispose' when disposing 'A' class' in the High group, for example. But it tastes, of course.

    I was surprised that V3095 “The object was used before it was verified against null. Check lines: N1, N2 ”is marked twice as High and once as Medium. Bug

    Trust but check

    It is time to check whether the received warnings are justified. Are any real errors found? Are there any incorrect warnings?

    I divided the warnings found into the groups below.

    Important warnings

    Their correction increased the stability of work, solved problems with memory leaks, etc. Real errors / imperfections.

    There were 16 of them, which is about 8% of all warnings.

    I will give some examples.

    V3019 "Possibly incorrectly variable compared to null after type conversion using 'as' keyword. Check variables 'color', 'indexed' »

    publicoverrideboolIsCompatible(ColorImpl color)
        IndexedColorImpl indexed = color as IndexedColorImpl;
        if (color == null)
        return indexed.ColorSpace.Equals(this);

    As you can see, instead of indexed, the variable color is compared with null. This is wrong and can lead to NRE.

    V3080 “Possible null dereference. Consider inspecting 'cstr_index.tile_index' ”

    A small fragment for illustration:

    if (cstr_index.tile_index == null)
        if (cstr_index.tile_index[0].tp_index == null)
            // ..

    Obviously, the first condition implied! = Null. In the current view, the code will call NRE each time it is called.

    V3083 “Unsafe invocation of event 'OnProgress', NullReferenceException is possible. Consider assigning an event to a local variable before invoking it. ”

        if (OnProgress != null)
            OnProgress(this, new EventArgs());

    The warning helped correct a potential exception. Why can it occur? There is a good explanation on Stackoverflow .

    V3106 “Possibly index is out of bounds. The '0' index is pointing beyond 'v' bound »

    var result = new List<FontStringPair>();
    for (int i = 0; i < text.Length; ++i)
        var v = new List<FontStringPair>();

    The error is that the result of createPairs is ignored, and instead an appeal is made to the empty list. Apparently, initially createPairs accepted the list as a parameter, but an error was made during the method changes.

    V3117 “Constructor parameter 'validateType' is not used A

    warning was issued for code similar to

    publicSomeClass(IDocument document, bool validateType = true)
        : base(document, true)
        m_provider = document;

    The warning itself does not seem important. But the problem is more serious than it seems at first glance. In the process of adding the optional parameter validateType, the call to the constructor of the base class was forgotten to be fixed.

    V3127 „Two similar code fragments were found. Perhaps this is a typo and 'range' variable should be used instead of 'domain' “

    privatevoidfillTransferFunction(PdfStreamImpl function)
        // ..
        PdfArrayImpl domain = new PdfArrayImpl();
        function.Add(Field.Domain, domain);
        PdfArrayImpl range = new PdfArrayImpl();
        function.Add(Field.Range, domain);
        // ....

    Perhaps a warning will not be issued if parts of the code are slightly more different. But in this case, an error was found when using copy-paste.

    Theoretical / formal warnings

    They are either correct, but fixing them does not fix any specific errors and does not affect the readability of the code. Or they point to places where there could be a mistake, but there is none. For example, the order of the parameters is intentionally changed. For such alerts do not need to change anything in the program.

    There were 57 of them, which is about 30% of all warnings. I will give examples for cases that deserve attention.

    V3013 “It is the odd that the body of the BeginText function is fully equivalent to the body of the EndText function (166, line 171)“


    Both body functions are actually the same. But it is right. And is it really so strange if the bodies of functions from one line coincide?

    V3106 „Possible negative index value. The value of 'c1' index could reach -1 “

    freq[256] = 1;
    // ....
    c1 = -1;
    v = 1000000000L;
    for (i = 0; i <= 256; i++)
        if (freq[i] != 0 && freq[i] <= v)
            v = freq[i];
            c1 = i;
    // ....
    freq[c1] += freq[c2];

    It agree, I resulted a slice of not the most clear algorithm. But, in my opinion, in this case, the analyzer suffers in vain.

    V3107 "Identical expression 'neighsum' to the left and the right of compound assignment" The

    warning was caused by a completely banal code:

    neighsum += neighsum;

    Yes, it can be rewritten via multiplication. But there is no error.

    V3109 „The 'l_cblk.data_current_size' sub-expression The expression is incorrect or it can be simplified. “

    /* Check possible overflow on size */if ((l_cblk.data_current_size + l_seg.newlen) < l_cblk.data_current_size)
        // ...

    A comment in the code explains the intent. False alarm again.

    Reasonable warnings

    Their correction had a positive effect on the readability of the code. That is, reduced unnecessary conditions, checks, etc. The impact on how the code works is not obvious.

    There were 103 of them, which is about 54% of all warnings.

    V3008 „The 'l_mct_deco_data' variable is assigned values ​​twice successively. Perhaps this is a mistake “

    if (m_nb_mct_records == m_nb_max_mct_records)
        l_mct_deco_data = (OPJ_INT32)m_nb_mct_records;
    l_mct_deco_data = (OPJ_INT32)m_nb_mct_records;

    Rights analyzer: assignment inside if is not necessary.

    V3009 „It’s odd that this method always returns one and the same value“

    privatestaticboolopj_dwt_decode_tile(opj_tcd_tilecomp_t tilec, OPJ_UINT32 numres)
        if (numres == 1U)
        // ...returntrue;

    On the advice of the analyzer, the method has been changed and returns nothing more.

    V3022 „Expression '! Add' is always true“

    privatevoidaddToFields(PdfDictionaryImpl controlDict, booladd)
        // ...if (add)
            // ..return;
        if (!add)
            // ...
        // ...

    Indeed, there is no point in the second if. The condition will always be true.

    V3029 "The conditional expression of the" if "statements followed alongside each other are identical"

    if (stroke)
        extGState.OpacityStroke = opacity;
    if (stroke)
        state.AddReal(Field.CA, opacity);

    It is not clear how such a code came about. But now we fixed it.

    V3031 Anonymous check can be simplified. The '||' operator is surrounded by opposite expressions “

    This is a nightmare condition:

    if (!(cp.m_enc.m_tp_on != 0 &&
        ((!opj_codec_t.OPJ_IS_CINEMA(cp.rsiz) && (t2_mode == J2K_T2_MODE.FINAL_PASS)) || opj_codec_t.OPJ_IS_CINEMA(cp.rsiz))))
        // ...

    After the change is much better

    if (!(cp.m_enc.m_tp_on != 0 &&
        (opj_codec_t.OPJ_IS_CINEMA(cp.rsiz) || t2_mode == J2K_T2_MODE.FINAL_PASS)))
        // ...

    V3063 „A part of the conditional expression is always true if it is evaluated: x! = Null“
    V3022 „Expression 'x! = Null' is always true“

    Here I took the warnings that checking for null does not make sense. Is it right to do so - the question is ambiguous. Below I have described the essence of the issue.

    Unwarranted Warnings

    False positives. Due to errors in the implementation of a specific test or some lack of the analyzer.

    14 of them were issued, which is about 7% of all warnings.

    V3081 „The 'i' counter is not used inside a nested loop. Consider inspecting usage of 'j' counter “

    A slightly simplified version of the code for which this warning was issued:

    for (uint i = 0; i < initialGlyphsCount - 1; ++i)
        for (int j = 0; j < initialGlyphsCount - i - 1; ++j)
            // ...

    Obviously, i is used in a nested loop.

    V3125 “The object was used after it was verified against null“

    The code for which such a warning is issued:

    privatestaticintCompare_SecondGreater(cmapEncodingRecord er1, cmapEncodingRecord er2)
        if (er1 == er2)
        if (er1 != null && er2 == null)
        if (er1 == null && er2 != null)
        return er1.CompareTo(er2);

    er1 cannot be null at the time of calling CompareTo ().

    Another code for which this warning is issued:

    privatestaticvoidrealloc(refint[][] table, int newSize)
        int[][] newTable = newint[newSize][];
        int existingSize = 0;
        if (table != null)
            existingSize = table.Length;
        for (int i = 0; i < existingSize; i++)
            newTable[i] = table[i];
        for (int i = existingSize; i < newSize; i++)
            newTable[i] = newint[4];
        table = newTable;

    table cannot be null in a loop.

    V3134 „Shift by [32.255] bits is greater than the size of the 'UInt32' type of expression '(uint) 1'“

    A piece of code for which this warning is issued:

    byte bitPos = (byte)(numBits & 0x1F);
    uint mask = (uint)1 << bitPos;

    It can be seen that bitPos can have a value from the range [0..31], but the analyzer believes that it can have a value from the range [0..255], which is incorrect.

    I will not give other similar cases, since they are equivalent.

    Additional thoughts about some checks

    It seemed undesirable to warn that 'x! = Null' is always true in cases where x is the result of calling some method. Example:

    private X GetX(int y)
        if (y > 0)
           returnnew X(...);
        if (y == 0)
           returnnew X(...);
        thrownew ArgumentOutOfRangeException(nameof(x));
        // ...
        X x = GetX(..);
        if (x != null)
           // ...

    Yes, formally the analyzer is right: x will always not be null, because GetX will either return a full instance, or throw an exception. But will the code improve removing null checks? What if GetX changes later? Is Method required to know the GetX implementation?

    Inside the team opinions are divided. It has been suggested that the current method has a contract in which it should not return null. And it makes no sense to write redundant code "for the future" with each call. And if the contract changes - it is necessary to update the calling code.

    In support of this opinion, the following argument was given: checking for null is how to wrap each call in try / catch in case the method starts throwing exceptions in the future.

    As a result, according to the principle of YAGNI, decided not to hold on to the checks and deleted them. All warnings were moved from theoretical / formal to reasonable.

    I would be glad to read your opinion in the comments.


    Static analysis is a useful thing. PVS-Studio allows you to find real errors.

    Yes, there are unfounded / incorrect warnings. Still, PVS-Studio found real errors in the project, where Code Analysis is already used. Our product is fairly well covered with tests, our clients test it in one way or another, but static analysis still benefits the robots do it better .

    Finally, some statistics.

    Top 3 Unwarranted Warnings

    V3081 „The 'X' counter is not used inside a loop loop. Consider inspecting usage of 'Y' counter “
    1 out of 1 found to be unsubstantiated.

    V3125 „ The object was used against it. Check lines: N1, N2 “
    9 out of 10 found to be unfounded

    V3134 „ Shift by N bits is greater than the size of type “
    4 out of 5 are found to be unreasonable

    Top 3 Important Warnings

    V3083 Unsafe invocation of event, NullReferenceException is possible. Consider
    5 out of 5 found important

    V3020 “An unconditional 'break / continue / return / goto' within a loop”
    V3080 “Possible null dereference”
    2 out of 2 found important

    V3019 “It is possible that an incorrect variable is compared with the after-type conversion using 'as' keyword “
    V3127 „ Two similar code fragments were found. Perhaps, this is a typo and 'X' variable should be used instead of 'Y' “
    1 out of 1 is considered important

    Also popular now: