jeje August 5, 2009 at 17:39

Storing code in the database, or assembling code from bricks

This article was written by Napolsky. For well-known reasons, he could not publish it himself. If you liked the article, you know how to reward the author.

In this post I will describe an approach to web programming that I have been developing, at the heart of which is storing code in a database. A few notes on the text that follows:

- "Page code" means the executable (PHP) code of a page.
- All performance figures refer to raw page generation time, without opcode accelerators, caching systems, etc.

How it all began

To explain why this is needed at all, let me quickly go over the path that led me to store code in a database. As it happened, I did not start web programming by writing scripts or modules for existing systems. I went straight to writing my own website engine from scratch. By then I had two years of C++ experience, so of course I tried to knock together my own object-oriented engine (even though PHP's OOP at the time existed in name only :)). Within reason, I really love reinventing the wheel, especially big wheels. And before adopting a ready-made solution, I always ask myself: "Could I write this better?"

In general, reinventing the wheel is very useful, especially for beginners, whose priority is to grow professionally rather than to deliver code on time and within budget. Only writing your own solutions shows you how things are built from the inside, at the lowest level. That, in turn, teaches you the complexity, resource cost and speed of different approaches, which ultimately helps you choose the right tools for the problem. For example, at university we had to write our own push_back for dynamic arrays, so that we would never forget how much can hide behind seemingly simple, trivial operations.

The result was an engine built on a fairly classic scheme: folders with classes, modules, templates and so on. Accordingly, all of it was endlessly included during page generation. Like many programmers, I have an optimizer living inside me, and the overhead of this approach started to bother me. What I disliked most was having to include a lot of "unnecessary" code on each page. This was "dead" code that would obviously never run there, for example a whole library when the page uses only one of its functions.

Have you ever thought about how much "dead" code a page carries? In my experience, there is usually 7-15 times more of it than code that actually executes when the page is requested. Take a comment class, for example. It will have render(), delete(), edit(), add(), compress(), answer() methods and so on. Yet in a single run of the script, as a rule, only one of these methods is called (delete() when deleting, edit() when editing, etc.), and the rest are never called. Now imagine how much of this extra code piles up on a single page.

At first I tried to optimize by "cutting" and "gluing" large libraries and classes to fit the needs of individual pages. That reduced the number of includes and the amount of "dead" code. But this was, of course, a dead end. Time passed, and more and more projects were built on this engine (may they rest in peace :)). The number and size of included files grew along with them, and so did page generation time. I thought more and more often about how to get rid of "dead" code. And then a bold idea came to me, one that even seemed crazy at first. What if…

The birth of an idea

What if we split the code into the smallest independent parts, so that a page gets only what it really needs? That means separating out every function, every class (ideally, every class method) and so on. We would get many small "bricks" from which to assemble the page. That way we could get rid of "dead" code and includes completely. I was really excited about the idea, but I had more questions than answers. How do I build it, will it work, what pitfalls await in the implementation, and how fast will such a system be? In short, I did not yet have the slightest idea how to implement it or how it would behave. But it was certainly worth a try.

Warrior's path

The idea is that once everything is broken down into the smallest possible pieces of code, we can assemble anything from them. How to store the code "bricks" was never in question. They were no longer code as such but essentially data with a set of attributes, so the natural choice was a database. I will describe how such a system works as simply and abstractly as possible, conveying only the essence.

1. Brick storage

Everything is simple here. Each function, class (or better yet, each class method), module controller, module view, etc. is a separate row in the database. In the simplest case, the table may look like id | code | name | componentType, where componentType is the kind of brick (function, class, module, ...).
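To make the table layout concrete, here is a minimal sketch. It assumes a plain PHP array as a stand-in for the database table, with one row per brick and the same columns as above. The helpers `findBrick()` and `bricksOfType()` and all brick names are hypothetical illustrations, not the author's code. Storing a method body on its own, as the `Comment::*` rows do, assumes that a later generation step wraps it in a class.

```php
<?php
// Sketch only: an in-memory array stands in for the bricks table
// (id | code | name | componentType). A real system would query a database.
$bricks = [
    ['id' => 1, 'name' => 'formatDate', 'componentType' => 'function',
     'code' => 'function formatDate($ts) { return date("Y-m-d", $ts); }'],
    ['id' => 2, 'name' => 'Comment::render', 'componentType' => 'method',
     'code' => 'public function render() { return "<p>" . $this->text . "</p>"; }'],
    ['id' => 3, 'name' => 'Comment::delete', 'componentType' => 'method',
     'code' => 'public function delete() { /* ... */ }'],
    ['id' => 4, 'name' => 'commentsView', 'componentType' => 'module',
     'code' => 'echo "comments block";'],
];

// Look up a single brick by its unique name (hypothetical helper).
function findBrick(array $bricks, string $name): ?array
{
    foreach ($bricks as $brick) {
        if ($brick['name'] === $name) {
            return $brick;
        }
    }
    return null;
}

// Select every brick of one kind, e.g. all methods (hypothetical helper).
function bricksOfType(array $bricks, string $type): array
{
    return array_values(array_filter(
        $bricks,
        fn(array $b): bool => $b['componentType'] === $type
    ));
}
```

Because each brick is a row, adding an attribute is just a matter of adding a column (here, an extra array key).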

2. Dependency storage

Since the code of one brick can call another brick (for example, function-to-function, module-to-function, or even page-to-module dependencies), you also need to store the relations between bricks. A relations table does this; in the simplest case it has the form id | parentId | childId. This solves the problem of correctly collecting bricks for nested structures:

function A() { B(); }

Here the relations table contains a record saying that A needs B, so whenever A is included, B is included automatically.
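The article does not say how the relations are walked. A natural choice is a depth-first traversal that follows parentId → childId records, emits each brick after its dependencies, and skips bricks it has already seen. That handles shared dependencies and does not loop on cycles. The sketch below assumes plain arrays in place of both tables; `collectBricks()` and the brick names are hypothetical.

```php
<?php
// Sketch only: arrays stand in for the bricks table (keyed by id)
// and the relations table (id | parentId | childId).
$bricks = [
    1 => ['name' => 'A', 'code' => 'function A() { return B() + C(); }'],
    2 => ['name' => 'B', 'code' => 'function B() { return C() * 2; }'],
    3 => ['name' => 'C', 'code' => 'function C() { return 20; }'],
    4 => ['name' => 'D', 'code' => 'function D() { return 0; }'],
];
$deps = [
    ['id' => 1, 'parentId' => 1, 'childId' => 2], // A needs B
    ['id' => 2, 'parentId' => 2, 'childId' => 3], // B needs C
    ['id' => 3, 'parentId' => 1, 'childId' => 3], // A also calls C directly
];

/**
 * Return the ids of $rootId and everything it transitively needs,
 * dependencies before the bricks that use them. The $seen set makes
 * each brick appear only once and stops infinite recursion on cycles.
 */
function collectBricks(int $rootId, array $deps, array &$seen = []): array
{
    if (isset($seen[$rootId])) {
        return [];
    }
    $seen[$rootId] = true;
    $order = [];
    foreach ($deps as $row) {
        if ($row['parentId'] === $rootId) {
            $order = array_merge($order, collectBricks($row['childId'], $deps, $seen));
        }
    }
    $order[] = $rootId; // the brick itself goes after its dependencies
    return $order;
}
```

For brick A this yields C, B, A, each exactly once, while the unrelated brick D is never pulled in. D is exactly the "dead" code the approach aims to avoid.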

3. Page code generation

So we have all the bricks, but how do we assemble page code from them? For that we need a separate script that builds working page code out of "bricks" that are useless on their own. Let's call this script Codegen. How it works depends on what you want to assemble from your bricks, and how. This is one of the strengths of the approach: you can assemble fundamentally different page code from the same bricks, even a "classic" architecture. To avoid misunderstanding, Codegen generates the page code once, not every time the page is requested.

The output is a single monolithic block of generated code for each page. Depending on how Codegen is written, you can get all the code a page needs up front. Alternatively, some parts can be loaded while the page runs, via eval from the database.
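Here is a minimal sketch of a Codegen that produces one monolithic PHP source string per page. It makes several assumptions not in the article: arrays stand in for both tables, relations are `[parentId, childId]` pairs, and a page is itself a brick whose code runs after its dependencies. The names `resolve()`, `generatePage()` and the bricks are hypothetical.

```php
<?php
// Sketch only: arrays stand in for the bricks table (keyed by id) and the
// relations table ([parentId, childId] pairs). Names are hypothetical.
$bricks = [
    1 => ['name' => 'page:news',    'code' => 'echo renderTitle("News");'],
    2 => ['name' => 'renderTitle',  'code' => 'function renderTitle($t) { return "<h1>" . escapeHtml($t) . "</h1>"; }'],
    3 => ['name' => 'escapeHtml',   'code' => 'function escapeHtml($s) { return htmlspecialchars($s, ENT_QUOTES); }'],
    4 => ['name' => 'unusedHelper', 'code' => 'function unusedHelper() { return 42; }'],
];
$deps = [[1, 2], [2, 3]]; // page needs renderTitle, renderTitle needs escapeHtml

// Depth-first walk: dependencies first, each brick at most once.
function resolve(int $id, array $deps, array &$seen): array
{
    if (isset($seen[$id])) {
        return [];
    }
    $seen[$id] = true;
    $order = [];
    foreach ($deps as [$parent, $child]) {
        if ($parent === $id) {
            $order = array_merge($order, resolve($child, $deps, $seen));
        }
    }
    $order[] = $id;
    return $order;
}

// Hypothetical Codegen: concatenate the page's bricks into one PHP source
// string. Meant to run once at build time, not on every request.
function generatePage(int $pageId, array $bricks, array $deps): string
{
    $seen = [];
    $parts = ['<?php', "// generated from brick #$pageId; do not edit by hand"];
    foreach (resolve($pageId, $deps, $seen) as $id) {
        $parts[] = '// brick: ' . $bricks[$id]['name'];
        $parts[] = $bricks[$id]['code'];
    }
    return implode("\n", $parts) . "\n";
}
```

At build time, something like `file_put_contents('news.php', generatePage(1, $bricks, $deps))` would write the page file. Requests then run that file directly, with no includes and no database lookups for code, and `unusedHelper` never reaches it.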

Reaping the benefits

This lets us achieve two main results:
- no includes at all on the page
- zero "dead" code

Here is what this achieved in my particular case:

- code per page dropped from 12,000-14,000 lines to 1,500-2,000
- includes per page dropped from 16-22 to 0
- page generation time dropped from 0.25-0.3 to 0.04-0.05 seconds, roughly a sixfold speedup (again, this is compared with the classic engine without caching; with caching, the gain would be smaller)

Pros and cons

Let us look in detail at the pros and cons of storing code in a database.

Cons
- No full IDE support. Since the code is stored in the database, you need an interface for writing and editing it (I use a web interface). Personally, this has never been a real inconvenience for me. All the tools I need (syntax highlighting, hotkeys, ...) are easy to implement in a web interface. For those who need more, though, there is still no full replacement for an IDE.

- Harder debugging. This follows from the previous point. It gets worse if you load some code dynamically from the database and run it with eval, because then tracking down an error can be really difficult.

- Support. As with anything unconventional, other developers will not be able to support your project. This is a real problem, and only wider adoption can solve it.
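The eval debugging problem above can be eased somewhat by tagging every error with the brick it came from. In PHP 7 and later, a syntax error inside eval'd code throws a `ParseError`, which can be caught and rethrown with the brick's name. The sketch below assumes an array in place of the table and assumes that a brick's name equals the function it defines. `loadBrick()` and both bricks are hypothetical.

```php
<?php
// Sketch only: an array stands in for the bricks table, keyed by name.
// Assumption: each brick defines a function with the same name.
$bricks = [
    'greet'  => 'function greet($n) { return "Hello, $n"; }',
    'broken' => 'function broken() { return 1 +; }', // deliberate syntax error
];

/**
 * Load a brick at runtime with eval(). A ParseError from the eval'd code
 * is rethrown with the brick name, so the message points at a table row
 * instead of an anonymous "eval()'d code" location.
 */
function loadBrick(string $name, array $bricks): void
{
    if (function_exists($name)) {
        return; // already loaded during this request
    }
    if (!isset($bricks[$name])) {
        throw new RuntimeException("Unknown brick '$name'");
    }
    try {
        eval($bricks[$name]);
    } catch (ParseError $e) {
        throw new RuntimeException(
            "Brick '$name', line {$e->getLine()}: {$e->getMessage()}", 0, $e
        );
    }
}
```

Runtime errors thrown later from inside an eval'd function still report "eval()'d code" as their file. Storing the brick name alongside each generated or eval'd chunk, for example as a comment, remains helpful.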

In a related discussion, several more drawbacks were pointed out, which I will try to dispute:

Source code lives in files, so you can apply any file operation to it

Honestly, I cannot think of anything you can do with a file that you cannot do with a row in a database. On the contrary, a database row is far more flexible than a file.

Distribution / backup / update

... is much easier and faster with an SQL dump (a single file) than with the many files of a classic setup.

Security: direct code injection if something goes wrong

This problem also seems far-fetched to me. Use separate database users for the engine database and the site databases.

Backups: believe it or not, sometimes nobody makes them, and then all your site "customizations" go down the drain if the database breaks

Once Codegen has run, the engine no longer needs the database to work. In other words, the site keeps working even with the database switched off.

Pros
- Speed. For me this was the deciding factor. The first time I compared the speed of the old "classic" engine with the new one, I was shocked by the result.
- Flexibility at the macro level. The smaller and simpler the parts a construction set is made of, the more complex the things you can build from it.
- Attributes for pieces of code. Since our bricks are stored in a table, we can give each of them any attribute by adding the corresponding column. This is a very important feature that opens up new possibilities in development.
- Arbitrary processing of executable code before it runs. Since all our code passes through Codegen, we can modify it there however we like. For example, we can apply language packs at the page generation stage. Or suppose some snippet occurs all over the code, such as

if (!$user->isAdmin()) { ErrorLog('insufficient permissions'); return; }

Instead, you can simply write everywhere

_CHECKADMIN

At the generation stage, simply replace it with the full code. Code preprocessing leaves plenty of room for a programmer's imagination.
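A minimal sketch of that preprocessing step follows. It expands macro tokens such as `_CHECKADMIN` and language-pack placeholders in a brick's code with PHP's `strtr()`. Given an array, `strtr()` tries longer keys first and never rescans text it has already substituted. The macro table, the `{{L_...}}` placeholder syntax and `preprocess()` are hypothetical illustrations, not the author's implementation.

```php
<?php
// Hypothetical macro table: short token => the code it stands for.
$macros = [
    '_CHECKADMIN' => 'if (!$user->isAdmin()) { ErrorLog("insufficient permissions"); return; }',
];
// Hypothetical language pack: placeholder => translated string.
$langPack = [
    '{{L_ACCESS_GRANTED}}' => 'Access granted',
];

/**
 * Expand macros and language placeholders in one brick's code at
 * generation time, before the code is written into the page file.
 */
function preprocess(string $code, array $macros, array $langPack): string
{
    return strtr($code, $macros + $langPack);
}
```

Plain substring replacement will also hit a token that happens to appear inside a string literal or a longer identifier. A stricter preprocessor could walk the code with `token_get_all()` and substitute only whole tokens.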

Conclusion

In this article I wanted to show that storing code in a database is not as hopeless as it might seem at first glance. Alongside the obvious drawbacks, it has unique advantages that widen what is possible in web programming. Importantly, this holds not just in theory but in practice: I have been using this approach for the past three years. In my opinion, that is long enough to test whether it survives real-world conditions. I do not claim that storing code in a database is better than the classic approach. But I do believe it is a genuinely competitive concept, and work in this area could spur the emergence of fundamentally new frameworks and CMSs with unique capabilities.

P.S. If there is interest, I can continue this topic with a description of my own implementation of this approach.
