What should you take into account when designing a system so that the result is not excruciatingly painful?

    This article describes problems in database design, and to some extent in application design as a whole, that become harder and harder to solve as a project grows. These are points that are important to consider at the design stage rather than later. Or, if you do think about them later, let it be over a cup of tea with the words “Remember how we decided to do this right from the start? How much time we saved ourselves!”, and not with the sensation of a toothache and a painful flinch at every memory. As the system and the number of users grow, the database design becomes harder to change, and the changes themselves become broader and more time-consuming.

    Many successful projects today grew out of small startups that later achieved commercial success and became large international companies. This kind of growth has become possible over the last 20 years, mainly thanks to the Internet and its effect of “erasing borders”: there are now global web and mobile applications that can be used in any country. Previously, if an application was meant to be international, it was usually designed with that requirement in mind from the start. Of course, you can take an evolutionary approach and add the necessary functions and scaling as the project grows. But to make later changes easier, the design should account from the outset for the scale of a few basic functions that are hard to change afterwards.

    I have worked on two startup projects that took off and grew from small regional projects into large companies with millions of users; both are now high-load systems. To my surprise, they had many problems in common, although the applications were written by different teams for different users. The shared problems show up in the databases as a startup legacy: growing pains which reveal that the project was originally planned as a small one.



    There is no need to build a project of international scale right away, but it is important to lay down the basic functionality that will, if necessary, make it easier to turn the project into an international high-load system.

    So, the main mistakes:

    1. The lack of localization.

      Our world has long been global, and software applications know no borders. When a project becomes popular not just in one country but around the world, the ability to localize it matters a great deal. While the foundation for localization is being laid during development, it is important to choose the right data types for information that varies from country to country. Without such a foundation, localizing an application that is already in active use will be difficult. There are 4 main points to consider when localizing.

      • Time zones.

        Store dates and times either in UTC or together with the client’s time zone. Remember that every server-side date must also be converted to the client’s time zone when it is displayed. This sounds ridiculously obvious. Nevertheless, on one of the projects, when we suggested that the owner build in support for other regions from the start, he said he was not planning for that kind of growth. Then the growth came, and so did the hair-tearing over not having built it in right away.
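
As a minimal sketch of this rule, the snippet below stores a timezone-aware UTC timestamp and converts it only at display time. The function names and the UTC+9 client are assumptions made for illustration, not part of the original projects.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def now_utc() -> datetime:
    """Timestamp to store: always timezone-aware UTC."""
    return datetime.now(timezone.utc)


def to_client_time(stored: datetime, client_tz: tzinfo) -> datetime:
    """Convert a stored UTC timestamp to the client's zone, for display only."""
    if stored.tzinfo is None:
        raise ValueError("stored timestamps must be timezone-aware")
    return stored.astimezone(client_tz)


# Hypothetical client in a UTC+9 region. A fixed offset keeps the sketch
# dependency-free; zoneinfo.ZoneInfo("Asia/Tokyo") would be the usual choice,
# because named zones also carry daylight saving rules.
client_tz = timezone(timedelta(hours=9))
stored = datetime(2024, 1, 15, 22, 30, tzinfo=timezone.utc)
local = to_client_time(stored, client_tz)
print(local.isoformat())  # 2024-01-16T07:30:00+09:00
```

Rejecting naive timestamps at the boundary is a deliberate choice here: a value without zone information cannot be converted reliably once clients appear in other regions.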

      • Different languages

        For text fields, use a Unicode-capable type such as NVarchar; this will make working with the application easier later on. Pay attention to the rules your database uses for sorting and comparing strings. Do not simply accept the default collation: choose one that correctly compares diacritics, ideographic characters, and characters of non-standard width, since end users will be entering data of their own. You say your application will work only in the USA, or only in Europe? Then ask yourself whether there will be any data the end user can enter, even just a comment line or a personal note. If so, store strings in Unicode. The world has long been mixed, and nothing guarantees that a Japanese user will not add something to your database. The ultimate goal is for user data to be saved correctly and then displayed correctly.
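
To illustrate why comparison and sorting rules matter, here is a small Python sketch of the same effect outside the database. It is only an approximation of what a real collation does; the helper names are hypothetical, and a production system would rely on the database collation itself.

```python
import unicodedata


def normalize(text: str) -> str:
    """Canonical composed form, so equal-looking strings compare equal."""
    return unicodedata.normalize("NFC", text)


def accent_insensitive_key(text: str) -> str:
    """Rough sort key: drop combining marks and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


composed = "caf\u00e9"     # 'é' as a single code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent
print(composed == decomposed)                        # False
print(normalize(composed) == normalize(decomposed))  # True

names = ["zebra", "\u00c9clair", "apple"]
print(sorted(names))                              # ['apple', 'zebra', 'Éclair']
print(sorted(names, key=accent_insensitive_key))  # ['apple', 'Éclair', 'zebra']
```

Plain code-point ordering places the accented word after "zebra", which is the kind of result a poorly chosen default collation can produce.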

      • Currency

        If your database stores monetary amounts, clarify in which currency they are stored, and build in a mechanism that lets you introduce another currency without changing the application. This does not mean implementing the full functionality of exchange rate conversion, rate loading, and other tools that are redundant at the first stage. It means storing a currency indicator next to each amount in the table, so that when a user’s country changes, amounts in different currencies can still be told apart in the database. Keep in mind that some countries allow transactions in several currencies.
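
One way to picture "an amount always travels with its currency" is the sketch below. The `Money` class and `totals_by_currency` helper are illustrative assumptions, and no conversion between currencies is attempted, in line with the advice above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code such as "USD" or "EUR"

    def __add__(self, other: "Money") -> "Money":
        # Amounts in different currencies must never be summed silently.
        if self.currency != other.currency:
            raise ValueError(f"cannot add {self.currency} to {other.currency}")
        return Money(self.amount + other.amount, self.currency)


def totals_by_currency(payments):
    """Sum payments per currency without converting between them."""
    totals = {}
    for p in payments:
        totals[p.currency] = totals.get(p.currency, Decimal("0")) + p.amount
    return totals


payments = [
    Money(Decimal("10.50"), "USD"),
    Money(Decimal("0.25"), "USD"),
    Money(Decimal("99.00"), "EUR"),
]
print(totals_by_currency(payments))
```

Using `Decimal` rather than floating point avoids rounding surprises in stored amounts; the currency code plays the role of the extra column recommended in the text.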

      • Natural keys

        Almost all canonical database textbooks describe how ineffective artificial keys are and how beautiful natural keys are. But the world has not yet reached the level of globalization at which everyone could rely on a natural key: a natural key requires a classification of the object being keyed that is unified and stays the same all over the world.

        For example, in one of the projects we considered using the TIN (taxpayer identification number) as a natural key for counterparties, but abandoned the idea. I cannot put into words how glad we were later, when the company began to grow and work with legal entities outside Russia.
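
A possible shape for such a table, sketched with Python's built-in `sqlite3` module: the primary key is artificial, and the tax number is an ordinary attribute that is unique only within a country. The table name, column names, and sample values are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE counterparty (
        id      INTEGER PRIMARY KEY,  -- surrogate key, meaningless outside the DB
        name    TEXT NOT NULL,
        country TEXT NOT NULL,        -- country code
        tax_id  TEXT,                 -- TIN or a local equivalent; may be unknown
        UNIQUE (country, tax_id)      -- the natural identifier is only a constraint
    )
    """
)

rows = [
    ("Romashka LLC", "RU", "7700000000"),   # made-up Russian TIN
    ("Example GmbH", "DE", "DE000000000"),  # made-up number in a different format
    ("Sole Trader", "GB", None),            # no tax number on file
]
conn.executemany(
    "INSERT INTO counterparty (name, country, tax_id) VALUES (?, ?, ?)", rows
)
conn.commit()

for row in conn.execute("SELECT id, name, country, tax_id FROM counterparty"):
    print(row)
```

Because other tables would reference `id`, a counterparty with a differently formatted or missing tax number fits without any schema change.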

    2. Data types

      The right choice of data types is the key to the success of the application and, therefore, to the health and sleep of the developer. When choosing the size of a data type, plan for the largest plausible number of users and operations.

      For example, for a table listing only your own company’s employees, a primary key of type INT is acceptable, since the probability of the company growing to 2 billion people is very small. However, if you are building a database of the employees of your clients’ companies, use a larger data type such as BIGINT: 500 thousand employees who each change jobs four times will create 2 million records in your database, and with a large number of clients the table of client employees may well grow past 2 billion rows. The same rule applies to tables that store user logs and other business operations, whose volume can increase a million times over as the application grows.
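
The arithmetic behind this choice can be checked with a few lines of Python. The limits are those of signed 32-bit and 64-bit integers; the growth factor of 1,500 is an assumed figure for illustration, not a number from the projects described.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647: upper bound of a signed 32-bit INT
INT64_MAX = 2**63 - 1  # upper bound of a signed 64-bit BIGINT


def projected_rows(employees: int, job_changes: int, growth_factor: int = 1) -> int:
    """Estimated number of records the table may eventually hold."""
    return employees * job_changes * growth_factor


def smallest_key_type(rows: int) -> str:
    """Smallest integer key type that can number this many rows."""
    if rows <= INT32_MAX:
        return "INT"
    if rows <= INT64_MAX:
        return "BIGINT"
    raise ValueError("row count exceeds BIGINT range")


today = projected_rows(500_000, 4)
print(today, smallest_key_type(today))  # 2000000 INT

later = projected_rows(500_000, 4, growth_factor=1_500)
print(later, smallest_key_type(later))  # 3000000000 BIGINT
```

The key type should be picked from the projected figure, not from today's row count, since changing a primary key type on a large live table is exactly the kind of change that is painful later.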

    3. Synchronous response to user actions

      When designing a system, consider which user actions will change or delete a large number of system objects once the system grows. For example, if a user has 2000 contacts and sends them all an invitation, will the system send the invitations immediately, or will it tell the user that the request is being processed and update the status to completed once the work is done?

      If processing is immediate, server performance problems may appear as the system grows. In addition, when all requests are processed synchronously, the system load depends directly on user actions. For example, under standard load the system uses 10% of its resources to process all requests in normal mode. When a user issues a super-request, such as an invitation to 500 thousand users at once, the load can jump to 100%. The example is somewhat synthetic, but the point should be clear: the system will stop processing the requests of all other users properly until it finishes with this huge request.

      Think about which user actions can be large-scale and make them asynchronous. In the invitation example, the system does not start sending immediately but creates a new task, “send invitations to users”. Once the task is created, the system sends the invitations in the background and informs the user when the task is complete. For the user’s convenience, notifying them about what an asynchronous system is doing is also quite important.
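
The task-based flow described above can be sketched with an in-process queue and a worker thread. This is a toy model under stated assumptions: `send_invitation` is a placeholder, the task store is a plain dictionary, and a real system would more likely use a persistent queue and a separate worker process.

```python
import queue
import threading
import uuid

tasks = {}                  # task_id -> {"status": ..., "sent": ...}
task_queue = queue.Queue()


def send_invitation(contact):
    """Placeholder for the real delivery call (email, push, and so on)."""
    return True


def request_invitations(contacts):
    """Called from the request handler: record the task and return at once."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "processing", "sent": 0}
    task_queue.put((task_id, list(contacts)))
    return task_id


def worker():
    """Background loop that performs the slow work outside the request."""
    while True:
        task_id, contacts = task_queue.get()
        for contact in contacts:
            if send_invitation(contact):
                tasks[task_id]["sent"] += 1
        tasks[task_id]["status"] = "completed"
        task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

task_id = request_invitations(f"contact{i}@example.com" for i in range(2000))
print("accepted task", task_id)  # the user sees "request is being processed"

task_queue.join()                # in this demo only: wait for the worker
print(tasks[task_id])            # {'status': 'completed', 'sent': 2000}
```

The request handler returns as soon as the task is queued, so one very large request no longer blocks everyone else; the stored status is what the interface would poll or push to the user.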

      Building such capabilities into the system protects it from overload during large-scale user actions, which matters for corporate systems where such actions come from your key customers.

    4. Accumulation of unnecessary data

      At the start of a project, developers usually do not think about how much disk space the system consumes, since disk space is quite affordable. No one thinks about saving disk space until the system reaches its first million users. Until then, you can afford the luxury of storing a log of every user action forever.

      In practice, it may be important to keep a user’s action logs for only 1-2 weeks, in case the user contacts technical support; after that the information can be deleted. So think through and introduce a process for deleting old data right away, before too much of it accumulates. If the process is not defined from the start, there is a high chance of ending up in a situation where a huge stream of new data pours into the database and the system cannot keep up with removing the unnecessary data. The deletion mechanism must be efficient enough not only to outpace the stream of new data, but also to delete the huge volume of information accumulated over past years within a reasonable time.
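
A retention job of this kind might look like the sketch below, which deletes expired rows in small batches so that no single transaction grows too large. The table layout, the two-week retention period, and the batch size are assumptions chosen for the example; SQLite stands in for whatever database the system actually uses.

```python
import sqlite3
import time

RETENTION_SECONDS = 14 * 24 * 3600  # two weeks, as in the example above
BATCH_SIZE = 500                    # assumed value; tune for the real system


def purge_old_logs(conn, now, batch_size=BATCH_SIZE):
    """Delete expired log rows in batches; return how many were removed."""
    cutoff = now - RETENTION_SECONDS
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM action_log WHERE id IN ("
            "  SELECT id FROM action_log WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # short transactions keep locks brief
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE action_log (id INTEGER PRIMARY KEY, created_at INTEGER, action TEXT)"
)
# The index lets the purge find old rows without scanning the whole table.
conn.execute("CREATE INDEX idx_action_log_created ON action_log (created_at)")

now = int(time.time())
old = [(now - 30 * 24 * 3600, "login")] * 1200  # a month old: expired
fresh = [(now - 3600, "login")] * 50            # an hour old: kept
conn.executemany(
    "INSERT INTO action_log (created_at, action) VALUES (?, ?)", old + fresh
)
conn.commit()

print(purge_old_logs(conn, now))                                      # 1200
print(conn.execute("SELECT COUNT(*) FROM action_log").fetchone()[0])  # 50
```

Running such a job on a schedule from day one keeps each run small; the same loop can also work through a multi-year backlog, just over many more batches.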

      The rule to delete old junk data should apply to all new features.

    5. Lack of tests and documentation.

      Startup projects are usually in a hurry to launch. They are also usually built by a relatively small team of developers who know the entire business logic of the application by heart and do not see why they should spend time writing documentation, and sometimes tests. The work follows the principle “we’ll launch now and write it later.” However, as soon as the system launches, a huge number of new tasks appear, and no time is left for writing documentation.

      If the project develops successfully, the company begins to grow very quickly and to recruit new employees. Bringing new developers into a team without documentation is very difficult and time-consuming. There may come a time when the members of the original team spend all their working hours explaining the functionality of the system to new employees.

      We strongly recommend writing documentation immediately. Subsequently, it will pay off in full.
      As for tests: once the development team grows, you need to be sure that developing new functions or modifying existing ones does not break the rest of the system’s functionality. Tests are the easiest and most effective way to verify this.

    6. Scaling

      As the load on the system increases, it becomes necessary to scale it. So during design, consider how the system could be scaled if the load grows, say, 10 times. A design made with scalability in mind will make it easier to adapt to the new realities of a growing system.

      It would be nice to ask yourself and the developers the following questions:

      • How will system performance increase if you increase the hardware capacity?

      • Is it possible to create a copy of the system on another server and enable it to work? If not, what needs to be changed in the design to make this possible?

      • How large is the scale of the changes needed to break down the system into several parts and rebuild the architecture using the principles of SOA or microservices?

      This does not mean the system must be designed as a large-scale one from the start, but it is important to lay the foundation for scaling it later. Spread a thin layer of straw where you might fall; do not stack sheaves everywhere right away.

    So, in the initial stages of system design, it is important to strike a balance between working for the future and working for the present. The present matters more, because there are specific requirements and tasks that must be implemented, usually in a short time. The future may never come, or may turn out completely different from what was expected. If you were asked to build a city bus, you do not need to make it convertible into a spaceship, because the bus may never have that future. The ideal, therefore, is to lay a foundation for later splitting the system, which can subsequently be used to separate it into services and modules.
