Checking a CDK project with IntelliJ IDEA static analyzer

    I decided to test the IntelliJ IDEA static Java code analyzer and with its help checked the project The Chemistry Development Kit . Here I will give some errors that I found. I think that some of them are typical for Java-programs in general, so they may be interesting.


    The Chemistry Development Kit is an open source Java library for solving problems of chemoinformatics and bioinformatics. When I was engaged in bioinformatics, we actively used it. The project has been developed for more than 20 years, it has dozens of authors, and the quality of the code there is very uneven. However, the project has unit tests, and pom.xml contains integration with the JaCoCo coverage analyzer . In addition, there are set up plug-ins of as many as three static analyzers: FindBugs , PMD , Checkstyle . More interesting to check, what warnings remain.


    A static Java code analyzer built into IntelliJ IDEA is not inferior to specialized static analysis tools, and in some ways surpasses them. In addition, virtually all static analysis features are available in Community Edition , the free and open source IDE. In particular, the free version gives all the warnings described in this article.


    By default, static analysis is performed continuously in the code editing mode, so if you write code in IntelliJ IDEA, you will correct a lot of errors literally seconds after they were allowed, even before running the tests. You can check the entire project or its part in batch mode using the Analyze | Inspect Code or run a separate inspection using Analyze | Run Inspection by Name . In this case, some inspections become available, which, because of the complexity, do not work in edit mode. However, there are few such inspections.


    Many IntelliJ IDEA inspections do not report a bug, but rather an inaccuracy in the code or offer a more idiomatic, beautiful, or fast alternative. This is useful when you are constantly working in the IDE. However, in my case it is better to start with messages that warn about real bugs. Mostly interesting is the Java | Probable Bugs , although there are other categories in which to rummage, for example, Numeric Issues .


    I will talk only about some interesting warnings.


    1. Unary plus


    Unary pluses plucked as much as 66 in the project. To write +1instead of just 1sometimes you want for beauty. However, in some cases, the unary plus comes out, if instead they +=wrote=+ :


    int totalCharge1 = 0;
    while (atoms1.hasNext()) {
      totalCharge1 = +((IAtom) atoms1.next()).getFormalCharge();
    }
    Iterator<IAtom> atoms2 = products.atoms().iterator();
    int totalCharge2 = 0;
    while (atoms2.hasNext()) {
      totalCharge2 = +((IAtom) atoms2.next()).getFormalCharge();
    }

    The obvious misprint, which resulted in ignoring all iterations of the loop except the last. It may seem strange that not “space equals plus space” is written, but “space equals space plus”. However, the strangeness disappears if you delve into history . Initially, “equal” and “plus” were really close, but in 2008 we went through an automatic formatter, and the code changed. Here, by the way, the moral for static analyzers is that it is reasonable to issue warnings based on strange formatting, but if the code is automatically formatted, the warnings will disappear and the bugs will remain.


    2. Integer division with reduction to fractional


    A rather offensive mistake, but static analyzers find it well. Here is an example :


    angle = 1 / 180 * Math.PI;

    Unfortunately, the angle was not one degree at all, but zero. Similar error :


    Integer c1 = features1.get(key);
    Integer c2 = features2.get(key);
    c1 = c1 == null ? 0 : c1;
    c2 = c2 == null ? 0 : c2;
    sum += 1.0 - Math.abs(c1 - c2) / (c1 + c2); // integer division in floating-point context

    It seems that both the number c1and c2the non-negative, which means that the difference between the module will never exceed the amount. Therefore, the result will be 0 if both numbers are nonzero, or 1 if one of them is 0.


    3. Call Class.getClass ()


    Sometimes people call a method getClass()on an object of type Class. The result is again an object of type Classwith a constant value Class.class. This is usually a mistake: getClass()no need to call. For example, here :


    public <T extends ICDKObject> T ofClass(Class<T> intf, Object... objects){
        try {
            if (!intf.isInterface()) 
                thrownew IllegalArgumentException("expected interface, got " + 
                                                    intf.getClass());
        ...

    If an exception occurs, reporting it will be absolutely useless. By the way, errors in the error handling procedure are often found by static analysis in old projects: as a rule, error handling procedures are tested the worst.


    4. Calling toString () on an array


    This is a classic of the genre: toString () for arrays is not redefined, and its result is rather useless. Usually this can be found in diagnostic messages .


    int[]        dim             = {0, 0, 0};
    ...
    return"Dim:" + dim + " SizeX:" + grid.length + " SizeY:" + grid[0].length + " SizeZ:"...

    It’s hard to notice the problem because it’s dim.toString()implicit, but the string concatenation delegates to it. Immediately proposed a fix - wrap in Arrays.toString(dim).


    5. The collection is read, but not filled.


    This is also often found in the code base, which is not constantly checked by the static analyzer. Here is a simple example :


    final Set<IBond> bondsToHydrogens = new HashSet<IBond>();
    // ... 220 строк логики, но bondsToHydrogens нигде не заполняется!for (IBond bondToHydrogen : bondsToHydrogens) // в цикл не зайдём
      sgroup.removeBond(bondToHydrogen);

    Obviously, the filling just missed. In static analyzers, there are simpler tests that talk about an unused variable, but the variable is used here, so they are silent. Need a smarter inspection, which knows about the collection.


    6. On the contrary: fill, but do not read


    Reverse cases are also possible. Here is an example with an array :


    int[] tmp = newint[trueBits.length - 1];
    System.arraycopy(trueBits, 0, tmp, 0, i);
    System.arraycopy(trueBits, i + 1, tmp, i, trueBits.length - i - 1);

    The inspection knows that the third argument of the arraycopy method is used only to write the array, and after that the array is not used at all. Judging by the logic of the code, the line is missing trueBits = tmp;.


    7. Integer versus ==


    This is an insidious bug, because the small values ​​of Integer objects are cached, and everything can work well, until one day the number exceeds 127. This problem may not be noticeable at all :


    for (int a = 0; a < cliqueSize; a++) {
      for (int b = 0; b < vecSize; b += 3) {
        if (cliqueList.get(a) == compGraphNodes.get(b + 2)) {
          cliqueMapping.add(compGraphNodes.get(b));
          cliqueMapping.add(compGraphNodes.get(b + 1));
        }
      }
    }

    Well, it would seem that some objects are compared in some lists, maybe everything is fine. You must be careful to see that these lists are objects of type Integer.


    8. Duplicate in Map


    In this inspection, the picture is worth a thousand words. See a mistake ?


    Is that better?


    9. The result of the method is not used.


    The result of some methods is stupid not to use, as IDEA readily reports :


    currentChars.trim();

    Probably meant currentChars = currentChars.trim();. Since strings in Java are immutable, if the result is not reassigned, nothing will happen. It occurs more, for example , str.substring(2).


    By the way, this is a rather complicated inspection. In addition to a previously prepared list of methods, we sometimes try to automatically determine the methods whose result is worth using. Here an interprocedural analysis is required, both in the source text and in the bytecode of the libraries. And all this is done on the fly in the process of editing the code!


    10. Unreachable switch branches


    // if character is out of scope don'tif (c > 128)
        return0;
    switch (c) {
        case'\u002d': // hyphencase'\u2012': // figure dashcase'\u2013': // en-dashcase'\u2014': // em-dashcase'\u2212': // minusreturn'-'; // 002ddefault:
            return c;
    }

    Since we have excluded characters with a code greater than 128, the branches are \u2012-\u2212unreachable. It seems not to be excluded.


    11. Unreachable condition


    An absolutely wonderful problem in the chain of conditions :


    if (oxNum == 0) {
        if (hybrid.equals("sp3")) { ... } elseif (hybrid.equals("sp2")) return47;
    } elseif (oxNum == 1 && hybrid.equals("sp3"))
        return47;
    elseif ((oxNum == 2 && hybrid.equals("sp3")) 
            || (oxNum == 1 && hybrid.equals("sp2"))
            || (oxNum == 0 && hybrid.equals("sp"))) // вот это вот недостижимоreturn48;
    elseif ((oxNum == 3 && hybrid.equals("sp3")) 
            || (oxNum >= 2 && hybrid.equals("sp2"))
            || (oxNum >= 1 && hybrid.equals("sp"))) return49;

    In complex conditional logic, this happens quite often: we check a condition that cannot be true, because its fragment has already been checked above. Here we have a separate branch oxNum == 0, and otherwise we check oxNum == 0 && hybrid.equals("sp"), which, of course, cannot be.


    12. We write to the array of zero length


    Sometimes IntelliJ IDEA will notice if you are writing to an array outside its size :


    Point3d points[] = new Point3d[0]; // завели массив на 0 элементовif (nwanted == 1) {
        points = new Point3d[1];
        points[0] = new Point3d(aPoint);
        points[0].add(new Vector3d(length, 0.0, 0.0));
    } elseif (nwanted == 2) {
        // а тут пытаемся в него писать — исключение неминуемо
        points[0] = new Point3d(aPoint);
        points[0].add(new Vector3d(length, 0.0, 0.0));
        points[1] = new Point3d(aPoint);
        points[1].add(new Vector3d(-length, 0.0, 0.0));
    }

    13. Length check after indexing


    Another common problem with the order of actions and again during error handling :


    publicvoidsetParameters(Object[] params)throws CDKException {
        if (params.length > 1) {
            thrownew CDKException("...");
        }
        if (!(params[0] instanceof Integer)) { // раз прочитали нулевой элементthrownew CDKException("The parameter must be of type Integer");
        }
        if (params.length == 0) return; // то длина точно не нуль
        maxIterations = (Integer) params[0];
    }

    In the case of an empty array, the author of the code wanted to go out quietly, but because of the check, he would come out, loudly banging an ArrayIndexOutOfBoundsException. Obviously, the order of checks violated.


    14. Check for null after a call.


    And again, the order of actions was violated, this time with null :


    while (!line.startsWith("frame:") && input.ready() && line != null) {
        line = input.readLine();
        logger.debug(lineNumber++ + ": ", line);
    }

    IDEA writes that is line != nullalways true. It happens that the verification is really redundant, but here the code looks as if null can really be.


    15. Disjunction instead of conjunction


    People often confuse logical operators AND and OR. The CDK project is no exception :


    if (rStereo != 4 || pStereo != 4 || rStereo != 3 || pStereo != 3) { ... }

    Whatever the equal rStereoand pStereo, it is clear that they cannot be equal to the four and the three at the same time, therefore this condition is always true.


    16. And again disjunction instead of conjunction


    A similar error , but caught by another message:


    if (getFirstMapping() != null || !getFirstMapping().isEmpty()) { ... }

    We can get to the right side only if we getFirstMapping()return null, but in this case we are guaranteed a NullPointerException, which is what IDEA warns about. By the way, here we rely on the stability of the results of the method getFirstMapping(). Sometimes we use heuristics, but here the stability is directly analyzed. Since the class is final, the method cannot be redefined. IDEA checks its body return firstSolution.isEmpty() ? null : firstSolutionand determines that stability is reduced to the stability of a method Map#isEmptythat has been previously annotated as stable.


    17. Hierarchy of interfaces and instanceof


    When checking an object for belonging to any interface, do not forget that interfaces can inherit each other :


    if (object instanceof IAtomContainer) {
        root = convertor.cdkAtomContainerToCMLMolecule((IAtomContainer) object);
    } elseif (object instanceof ICrystal) {
        root = convertor.cdkCrystalToCMLMolecule((ICrystal) object);
    } ...

    The interface ICrystalextends the interface IAtomContainer, so the second branch is obviously unattainable: if a crystal comes here, it will fall into the first branch.


    18. Bypassing the empty list


    Probably the author of this code is not very familiar with the Java language:


    List<Integer> posNumList = new ArrayList<Integer>(size);
    for (int i = 0; i < posNumList.size(); i++) {
        posNumList.add(i, 0);
    }

    The size parameter ArrayListspecifies the initial size of the internal array. This is used to optimize to reduce the number of allocations, if you know in advance how many elements are added there. In this case, in fact, the elements in the list do not appear, and the method size()returns 0. Therefore, the next cycle with an attempt to initialize the elements of the list with zeroes is completely useless.


    19. Do not forget to initialize the fields


    The analyzer checks constructors in a special way, taking into account field initializers. Due to this, the following error was found :


    publicclassIMatrix{
      publicdouble[][] realmatrix;
      publicdouble[][] imagmatrix;
      publicint        rows;
      publicint        columns;
      publicIMatrix(Matrix m){
        rows = m.rows;
        columns = m.columns;
        int i, j;
        for (i = 0; i < rows; i++)
          for (j = 0; j < columns; j++) {
            realmatrix[i][j] = m.matrix[i][j]; // NullPointerException гарантирован
            imagmatrix[i][j] = 0d;
          }
      }
    }

    Despite the fact that the fields are public, here no one could penetrate and initialize them in front of the designer. Therefore, IDEA boldly warns that a call to an array element will cause a NullPointerException.


    20. Do not repeat twice


    Repeated conditions also happen quite often. Here is an example :


    if (commonAtomCount > vfMCSSize && commonAtomCount > vfMCSSize) {
      returntrue;
    }

    Such bugs are insidious, because you never know, the second condition is simply superfluous, or the author wanted to check something else. If this is not fixed right away, then it can be hard to figure it out. This is another reason why static analysis should be used constantly.


    I reported some of these bugs to the project's bug tracker . It is curious that when the authors of the project corrected a part, they themselves used the IntelliJ IDEA analyzer, found other problems about which I did not write, and also began to correct them . I think this is a good sign: the authors realized the importance of static analysis.


    Also popular now: