The Madness and Success of Oracle Database Code

    This week, Hacker News users discussed the question “What is the worst code you have ever seen that nevertheless worked?” (Reddit users joined in later). The comments contain plenty of "funny" stories about the kind of thing we all run into from time to time, but the one that drew the most attention was about the code of the “advanced database management system used by most of the Fortune 100 companies”.

    The well-deserved winner in the “Lovecraftian horrors” category was the story of a former Oracle developer who worked on Oracle Database while version 12.2 was in development. At that time the code base was 25 million lines of C, and changing a single one of those lines broke thousands of previously written tests.

    Over the years, several generations of programmers worked on the code, regularly under tight deadlines, and as a result the code turned into a real nightmare. Today it consists of complex “pieces” of code responsible for logic, memory management, context switching, and much more, tied together by thousands of different flags. The whole code base is riddled with mysterious macros that cannot be deciphered without a notebook in which you write down what the relevant parts of each macro do. As a result, a developer can spend a day or two just figuring out what a macro actually does.

    To predict how the code will behave in a given situation, you have to understand and remember the values and effects of 20 (and sometimes hundreds of) flags. The situation is made worse by the fact that different developers introduced their own types that were essentially the same thing (for example, int32), and hardly anyone dares to touch such legacy (this was certainly already the case in the Oracle 8i codebase).
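
    To give a sense of scale, here is a minimal sketch in Python (not taken from the Oracle code) of why the number of flags matters. It assumes the flags are independent booleans, which is a simplification, and the flag names and the `handle_row` function are invented purely for illustration.

```python
from itertools import product


def flag_combinations(n_flags: int) -> int:
    """Number of distinct on/off settings for n independent boolean flags."""
    return 2 ** n_flags


def behaviours(flag_names, handler):
    """Enumerate every flag setting and record what the handler does for it."""
    table = {}
    for values in product((False, True), repeat=len(flag_names)):
        flags = dict(zip(flag_names, values))
        table[values] = handler(flags)
    return table


def handle_row(flags):
    """Hypothetical handler: the flag names are made up for this example."""
    if flags["legacy_mode"] and not flags["fast_path"]:
        return "slow legacy path"
    if flags["fast_path"] and flags["skip_checks"]:
        return "unchecked fast path"
    return "default path"


if __name__ == "__main__":
    names = ["legacy_mode", "fast_path", "skip_checks"]
    # Three flags already give eight settings to reason about.
    for setting, result in behaviours(names, handle_row).items():
        print(setting, "->", result)
    # Twenty independent flags give over a million settings.
    print(flag_combinations(20))
```

    With three flags the full table can still be read at a glance; with 20 it cannot, which is presumably why the developer in the story ends up relying on the test suite rather than on reasoning about the code.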

    The question arises: how does Oracle Database still manage to stay afloat with all this? The secret is millions of tests. A full run takes 20 to 30 hours, even though the tests are distributed across a test cluster of 100-200 servers.
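
    As a rough back-of-the-envelope estimate, the figures above can be turned into server-hours per full test run. This is only a sketch: it assumes every server in the cluster is busy for the whole run, which the story does not state.

```python
def server_hours(wall_clock_hours: float, servers: int) -> float:
    """Total machine time, assuming all servers are busy for the whole run."""
    return wall_clock_hours * servers


if __name__ == "__main__":
    low = server_hours(20, 100)   # shortest run on the smallest cluster
    high = server_hours(30, 200)  # longest run on the largest cluster
    print(f"One full test run costs roughly {low:.0f} to {high:.0f} server-hours")
```

    Under that assumption, a single full run costs somewhere between 2000 and 6000 server-hours.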

    The team that worked on the product in the late 90s and adhered to the ideas of TDD (test-driven development) held the following view: “automated tests mean that you don’t have to write code that you can understand; let the tests do the thinking instead”. Later developers were forced to follow the principles this team laid down, and now we can see in practice what the idea has led to in the long run, with all its pluses and minuses.

    Today, fixing a single bug in Oracle Database takes from several weeks to several months. First, the developer spends a few days just working out the relevant flags (whose mysterious interaction causes the bug), after which he often has to add a flag of his own to handle the specific scenario that triggered the bug.

    Then he submits the code for testing and, the next day, calmly switches to another task while waiting for the test cluster to build the new Oracle DB and run all the tests against it. If the developer is lucky, about 100 tests will "turn red"; if not (which happens more often), around 1000, and he will have to work out which of his assumptions about the existing code turned out to be wrong. He may well find that he needs to study a dozen or more other flags that were involved, in non-obvious ways, in the code he changed.

    He will have to repeat this process for a couple of weeks before luck finally smiles on him and all the tests pass. After that, he has to write dozens of tests of his own, to make sure that a developer who touches his code in the future does not break the “fix”. The changes then go to review, which can take from several weeks to a couple of months, after which the fix is finally merged into the main branch.

    Because it takes at least a day to build the DBMS and run the tests, each developer is expected to work on 2-3 bugs simultaneously, switching between them while waiting for test results.

    If you thought that life is easier for developers who add new functionality to the DBMS, you are mistaken. Adding even a small feature such as a new authentication mode can take from 6 months to a year, and in especially severe cases up to two years.

    In the case described, TDD keeps the "spaghetti" code, which is already extremely hard to understand, from falling apart, and still yields a working product. At the same time, costs keep growing, and the quality of new code often leaves much to be desired. The DBMS is developed not only by a US team but also by a team in India, so some Oracle developers habitually blame the latter for the quality of the code. Others disagree and, based on the changelog, state that code quality does not depend on a team's geography and that bad code periodically “arrives” from both teams. The really serious problem for the product is developers who treat the project as an “entry into the industry” and work on the DBMS for no more than 1-2 years; in that time it is impossible to understand the intricacies of the project in any depth.

    According to another developer, who ported the Oracle 8i codebase to one of the Unix variants in the late 1990s, the code was already a tangle of “spaghetti” that was utterly impossible to understand. Yet another developer, who worked with the DBMS code in the late 80s, claims that the codebase back then was a huge heap of C source files and makefiles for the build, many of which were far more complex than the code of the kernel itself. Of course, it is worth being realistic: the situation is unlikely to be any better in comparable industry-leading products that have been in development for several decades.
