Mirantis_OpenStack March 12, 2016 at 17:09

On the importance of catching fleas: what is the Global OpenStack Bug Smash for?

Authors: Igor Marnat, Ilya Stechkin

From March 7 to 9, the Global OpenStack Bug Smash for the Mitaka release took place. Mirantis hosted two venues for the event, one of them at the company's Moscow office. This was the first time Russia joined the Bug Smash, and we are glad that it happened with our direct participation.
Why is it so important to the community that large market players such as Intel, Rackspace, Mirantis, IBM, HPE, Huawei, CESI, and SUSE support the work of reviewing and improving code quality? What place does this event occupy in the process of building the OpenStack platform?

Feature development and bug fixing are two different but closely connected processes. That is true of any software product. With open source, though, there is a nuance: anyone who can create something new will prefer to do just that rather than spend time reading someone else's code and correcting it.

Even in literature, music, and the visual arts there are those who create works and those who analyze them (critics of every stripe). But while the productivity of a writer, composer, or artist does not always benefit from critics' scrutiny, a software product always improves, becoming cleaner and more stable, when someone looks at it with fresh eyes. The essence of the OpenStack development process is that it is conducted in the open and every change is reviewed by several people.

In addition, every release cycle carries technical debt that hinders the development of new functionality. A simple example from behind the scenes at Mirantis: prior to the release of MOS 7.0, the master node was based on an outdated version of CentOS. This slowed product improvement and kept us from implementing some interesting features. In MOS 8.0 we valiantly paid off this technical debt by updating the master node, which lets developers use the latest library versions and lets users receive updates on time.

However, experience shows that technical debt accumulates from release to release, and developers are forced to divide their time between building new functionality and paying off part of the debt. As a result, “catching fleas” gets only whatever time is left over, because pressure from product management is always directed at developing cool new features that can be sold, not at fixing boring old bugs in what has already been sold.

Critical bugs are fixed first. These are bugs that keep a feature from working at all. Next come bugs of high importance (high bugs). With such a bug present, the feature works, but not quite as it should: it needs “crutches”, that is, significant workarounds that compensate for the error. For example, a user has to restart a service so that it continues to work correctly.

Bugs below “high” rarely reach the attention of developers, who are always on the tight schedule of a new release while also fixing high and critical bugs in previous releases. As a result, lower-priority bugs can hang around for years, carried over from release to release. The problem is that medium bugs tend to concern usability and are often triggered by users themselves, so their presence spoils the experience of working with the ecosystem's projects (customer experience).

Here is an example of such a bug with a FOUR-YEAR history: “dhcp server defaults to gateway for filtering when unset”. Here is a “younger” one, only two years old: “Enable metadata when create server groups”. Both bugs belong to the Nova project (official name: OpenStack Compute). In total this project, which is extremely important for the ecosystem, has 483 open bugs of varying severity at the time of writing! So the only hope is that once per release cycle developers set their other work aside to go bug hunting, and the code becomes cleaner.
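The way lower-priority bugs pile up can be illustrated with a toy model. The sketch below is only an illustration under made-up assumptions: the importance levels, the fixed per-release fixing capacity, and the incoming bug counts are all hypothetical and are not taken from any OpenStack project data.

```python
from collections import Counter

# Hypothetical importance levels; a lower number means it is fixed earlier.
PRIORITY = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def simulate(releases, capacity, incoming):
    """Return the open backlog after a number of release cycles.

    releases: how many release cycles to simulate
    capacity: how many bugs the team can fix per release (assumed fixed)
    incoming: new bugs filed per release, keyed by importance level
    """
    backlog = Counter()
    for _ in range(releases):
        backlog.update(incoming)  # new bugs arrive during the cycle
        remaining = capacity
        # Spend the fixing capacity strictly in priority order.
        for level in sorted(PRIORITY, key=PRIORITY.get):
            fixed = min(backlog[level], remaining)
            backlog[level] -= fixed
            remaining -= fixed
    return backlog


# Made-up numbers: 18 new bugs per release, capacity to fix only 10.
backlog = simulate(
    releases=4,
    capacity=10,
    incoming={"critical": 3, "high": 6, "medium": 5, "low": 4},
)
for level in sorted(PRIORITY, key=PRIORITY.get):
    print(level, backlog[level])
```

With these assumed numbers the critical and high queues are emptied every cycle, while the medium and low queues only grow. A dedicated bug-fixing event corresponds to a one-off increase in capacity that is spent on the lower levels.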

Let us now place the Bug Smash within the quality assurance (QA) process. QA is often thought to consist solely of testing. However, experienced developers (including those working at proprietary product companies such as Cisco) know that testing is only one part of QA. Many bugs can be caught when other developers check the quality of the code (code review). Code review typically precedes testing, which means that an error found during review is cheaper to fix.

It is widely known that the sooner a problem is found, the cheaper it is to fix. For example, according to data cited in Steve McConnell's book “Code Complete” (published in Russian as “Perfect Code”), fixing a mistake at the testing stage costs ten times more than fixing it at the coding stage. Testing is labor-intensive and therefore not cheap: you have to set up a lab with the right specifications, write a test scenario, run the tests, and fix the problems they uncover.

The most expensive mistake is the one the user finds: the one the reviewers overlooked and the testers did not catch. In this case the chain of correction begins with support. Support engineers receive the customer's request and diagnose the problem, which most likely means repeating the testing procedure: they set up a lab, and so on, as described above.

The most advanced users, those with people from the OpenStack community on their teams, find bugs themselves and report them to the community. But because these bugs are neither critical nor high, developers rarely get a chance to work on them. The circle closes.

Thus it is hard to overstate the importance of the OpenStack Bug Smash, a marathon held in each OpenStack release cycle that lets developers set aside time for the bugs that usually stay outside their field of view.

Everyone benefits: users, who finally get their problems solved; contributing companies, which save money by detecting and fixing errors in the code early; and the ecosystem as a whole, since rising customer satisfaction opens new opportunities for businesses that build and deploy OpenStack-based solutions.

Another significant outcome of the event is attracting new contributors and spreading knowledge about OpenStack more widely. In New York, at a similar event, the first day was spent training newcomers who had come to learn how to work with OpenStack, and bug fixing started only on the second day. In Moscow we also set aside a day to work with those who are just beginning their immersion in OpenStack. Good luck and stable code!
