Static analysis of PHP code using Symfony2 as an example

annotation


The need for static analysis in large projects has already been written more than once and, mainly, with a focus on strictly typed languages, for example, here and here .

With PHP, the situation is more complicated: they have already written about static analysis of PHP code, but in general the tools are much poorer here, and the dynamic nature of the language makes the development-testing process more difficult. For comparison, in the same Java, compiling a project in itself helps to find errors, and in PHP typing is weak, so even tests can skip errors.

At our company, we partially solved this problem by incorporating static analysis and strong typing in the development and verification process. After that, I wanted to see what was happening in other projects.

Instruments


I will skip the well-known utilities ( PHP Mess Detector , PHP Copy / Paste Detector , PHP CodeSniffer , PHP Analyzer , Pfff ) - we used them at a certain stage, but the results were not worth the effort, so let's move on to smart tools that will bring benefit almost immediately.

Theoretically we could use SensioLabs Insight, but corporate rules do not allow you to give the code to a third party. Thus, our basic requirement is back to the classic integration of static analysis into the IDE. In our case, this is PhpStorm (you can put a trial version for 30 days if you want to check your project). By default, a good set of rules for static analysis is available, but this, frankly, is not enough for enterprise-level products, so we put the extension Php Inspections (EA Extended) . This extension is the main tool of our analysis, and all subsequent examples are for him.

What is symfony2


Symfony - is opensource-framework written in PHP5 ( details of the architecture ). For review, version 2.7 was selected.

Symfony2 code analysis


To analyze the code, we take only the kernel components located in the src / Symfony / Component folder.

Our analysis found 6924 problems of varying severity and different categories (the extension contains about 40 inspections, but not all of them found problems).
For comparison, in version 2.3 there were 5727 problems (20% less), although these can be both new components and new tests.

Debriefing


It makes no sense to consider all cases, and there will be false positives, so we will consider only a few interesting pieces of code (in relation to the type of problem).

Problem Type - Architecture: class re-implements interface of a superclass


namespace Symfony\Component\Translation\Loader;
// проблема здесь
class MoFileLoader extends ArrayLoader implements LoaderInterface {
    ...
}
// родительский класс
class ArrayLoader implements LoaderInterface {
    ...
}

It is not clear: either the interface was introduced to the parent class later, or the developers simply overlooked the inheritance of the classes, but re-inheriting the interface in the child class does not make sense.

Problem Type - Architecture: class overrides a field of superclass


namespace Symfony\Component\Translation\Dumper;
class IcuResFileDumper extends FileDumper {
    ...
    // проблема здесь
    protected $relativePathTemplate = '%domain%/%locale%.%extension%';
    ...
}
// родительский класс
abstract class FileDumper implements DumperInterface {
    protected $relativePathTemplate = '%domain%.%locale%.%extension%';
    ...
}

It was necessary to override the default value, which we see.

The problem is that the attribute is duplicated to the child region, although initially it is an attribute of the parent class. It should be implemented like this:

use Symfony\Component\Translation\MessageCatalogue;
class IcuResFileDumper extends FileDumper {
    ...
    public function __construct() {
        $this->relativePathTemplate = '%domain%/%locale%.%extension%';
    }
    ...
}


Problem Type - Probable bugs: missing parent constructor / clone call


namespace Symfony\Component\HttpFoundation;
class FileBag extends ParameterBag {
    ...
    // проблема здесь
    public function __construct(array $parameters = array()) {
        $this->replace($parameters);
    }
    ...
    public function replace(array $files = array()) {
        $this->parameters = array();
        $this->add($files);
    }
    ...
}
// родительский класс
class ParameterBag implements \IteratorAggregate, \Countable {
    ...
    public function __construct(array $parameters = array()) {
        $this->parameters = $parameters;
    }
    ...
}

In OOP, calling the parent constructor / destructor is a common practice. In this case, it seems that the developers simply overlooked the code. It should be implemented like this:
class FileBag extends ParameterBag {
    ...
    public function __construct(array $parameters = array()) {
        parent::__construct();
        $this->add($files);
    }
    ...
}


Problem Type - Control flow: loop which does not loop


namespace Symfony\Component\Templating\Loader;
class ChainLoader extends Loader {
    ...
    public function isFresh(TemplateReferenceInterface $template, $time) {
        foreach ($this->loaders as $loader) {
            // проблема здесь
            return $loader->isFresh($template, $time);
        }
        return false;
    }
    ...
}

It is possible that this is the desired behavior of the method, but the class name and the loop indicate more likely an error - only the first iteration will always be checked.

Problem Type - Control flow: ternary operator simplification


class CrawlerTest extends \PHPUnit_Framework_TestCase {
    ...
    public function testReduce() {
        ...
        $nodes = $crawler->reduce(function ($node, $i) {
            // проблема здесь
            return $i == 1 ? false : true;
        });
        ...
    }
    ...
}

This, of course, is not a mistake, but there is no special meaning in such code either.
In the test fragment, the return can be simplified as “return $ i! = 1;”.

Problem Type - Performance: elvis operator can be used


namespace Symfony\Component\HttpKernel\DataCollector;
class RequestDataCollector extends DataCollector implements EventSubscriberInterface {
    ...
    public function collect(Request $request, Response $response, \Exception $exception = null) {
        ...
        $this->data = array(
            ...
            // проблема здесь
            'content_type' => $response->headers->get('Content-Type') ? $response->headers->get('Content-Type') : 'text/html',
         ...
    }
    ...
}

This is also not a mistake, but a simplification of the code, and can be rewritten as "'content_type' => $ response-> headers-> get ('Content-Type')?: 'Text / html'".
Basically, the new Symfony2 code already uses this operator, and this piece of code has not yet been noticed.

AlexPTS suggested a very simple option:
    $response->headers->get('Content-Type', 'text/html')


Type of problem - Control flow: not optimal if conditions


This is a separate class of errors and non-optimal code. There are a lot of variations of problems, because conditional structures in legacy projects and after active refactoring is a headache for any architect.

Here are a couple of examples of what this analyzer finds.
namespace Symfony\Component\HttpKernel\Profiler;
abstract class BaseMemcacheProfilerStorage implements ProfilerStorageInterface {
    ...
    public function find($ip, $url, $limit, $method, $start = null, $end = null) {
            ...
            // проблема здесь
            if ($ip && false === strpos($itemIp, $ip) || $url && false === strpos($itemUrl, $url) || $method && false === strpos($itemMethod, $method)) {
                ...
            }
            ...
     }
     ...
}

A vivid example of how to confuse everyone at once. In fact, the structure is as follows:
    ($ip && false === strpos($itemIp, $ip)) 
    || 
    ($url && false === strpos($itemUrl, $url)) 
    || 
    ($method && false === strpos($itemMethod, $method))


Duplicate calls to methods and functions:
namespace Symfony\Component\Form\Extension\Core\DataMapper;
class PropertyPathMapper implements DataMapperInterface {
    ...
    public function mapFormsToData($forms, &$data) {
        ...
                // проблема здесь
                if ($form->getData() instanceof \DateTime && $form->getData() == $this->propertyAccessor->getValue($data, $propertyPath)) {
                    ...
                }
                if (!is_object($data) || !$config->getByReference() || $form->getData() !== $this->propertyAccessor->getValue($data, $propertyPath)) {
                    $this->propertyAccessor->setValue($data, $propertyPath, $form->getData());
                }
        ...
     }
}

"$ form-> getData ()" is called several times, although it would be more logical to save the result in a local variable.

Instead of a conclusion


Static code analysis (including PHP code) is a powerful and useful practice, although expensive from the point of view of organizing the development process and training teams.

But with everyday use, this practice works very effectively:
  • reduces the number of errors;
  • helps to objectively choose components for projects;
  • evaluate the new project and understand which side to approach it;
  • raise the level of your teams.

On the example of opensource, all of these points are just as relevant as in enterprise, just a quick glance at the problems found.

For PHP, the situation with static analysis tools has improved markedly over the last couple of years, and it looks like it will continue to improve.

Also popular now: