A failover system built from junk

    Here is how it went.
    Company N needed a cheap and reliable system for storing and processing data. A few words about the data: it is received from clients (I'll skip exactly what it is; think of something like tax reporting) and must be kept for many years. Searches over this information were needed fairly often, and changes to data entered within the past couple of hours were needed even more often. Losing information was unacceptable under any circumstances, including fire or earthquake. Until then, everything had been done on paper and stored in big binders, and a whole department of senseless and merciless people existed just to sort through them.

    All of this was to be automated. The most interesting part: the development work was paid quite decently, but no money at all was allocated for hardware. We were asked to run everything on the existing machines. The fleet consisted of a dozen hopelessly obsolete monsters, and those were what the database server and the backup server had to run on.


    The machines were mostly Pentium III boxes with 128 MB of RAM, 16 MB video cards and 10 GB hard drives. Not exactly server hardware, to put it mildly.

    The first proposal was to back up every record as it was written, that is, to mirror the database server onto the backup server. Given the amount of data and the slowness of the machines, the first tests showed unpleasantly long query times. The database itself did not help speed much either: it was fully normalized and compact, so reads required a lot of joins.
    Since records were edited quite often, this was a real problem.

    After some brainstorming, we decided on a somewhat twisted solution.
    The end result:
    Server A. The main working server of the system: it stores the database for the current day and runs all the core logic.
    Server B. Stores the database for all time except the current day. At midnight, the data from A is moved into B. The scripts for reporting over arbitrary periods live here.
    Server C. The backup server. After the midnight transfer, it makes a backup of B and erases the previous backup. It also mirrors A continuously.
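    The post does not show the nightly jobs, so here is a minimal sketch of the two midnight steps, assuming SQLite (Python's built-in sqlite3) as a stand-in for the MySQL databases actually used. The table name reports, its columns, and the functions midnight_rollover and backup_history are hypothetical. The continuous mirroring of A to C is not sketched here.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical single-table schema; the real system used MySQL with a fully
# normalized schema. AUTOINCREMENT keeps ids unique across days, so a row
# copied to B never collides with older history already stored there.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS reports ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, client TEXT, payload TEXT)"
)


def midnight_rollover(server_a: sqlite3.Connection,
                      server_b: sqlite3.Connection) -> int:
    """Move the current day's rows from A into B; return how many moved."""
    rows = server_a.execute(
        "SELECT id, client, payload FROM reports").fetchall()
    # Commit on B first: if we crash before deleting from A, the rows simply
    # exist on both servers, and re-running is harmless (same ids replaced).
    with server_b:
        server_b.executemany(
            "INSERT OR REPLACE INTO reports (id, client, payload) "
            "VALUES (?, ?, ?)", rows)
    # Delete only the rows we copied, so writes arriving meanwhile survive.
    with server_a:
        server_a.executemany("DELETE FROM reports WHERE id = ?",
                             [(row[0],) for row in rows])
    return len(rows)


def backup_history(server_b: sqlite3.Connection, backup_dir: Path,
                   stamp: str) -> Path:
    """Server C's nightly job: full copy of B, then drop older copies."""
    new_path = backup_dir / f"history-{stamp}.sqlite"
    target = sqlite3.connect(new_path)
    try:
        server_b.backup(target)  # sqlite3.Connection.backup, Python 3.7+
    finally:
        target.close()
    # The old backup is erased only after the new one is complete.
    for old in backup_dir.glob("history-*.sqlite"):
        if old != new_path:
            old.unlink()
    return new_path


if __name__ == "__main__":
    a, b = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for conn in (a, b):
        conn.execute(SCHEMA)
    with a:
        a.execute("INSERT INTO reports (client, payload) VALUES ('acme', 'Q1')")
    print("moved", midnight_rollover(a, b), "rows")
    with tempfile.TemporaryDirectory() as d:
        print("backup at", backup_history(b, Path(d), "day-1").name)
```

    The ordering is the point of the design: data is always written to its new home and committed before the old copy is deleted, so a crash at any step leaves at most a duplicate, never a gap.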

    Total: if any single server was lost, we still had all the information, because A + B together hold the complete data set, and so does C on its own.
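    The "any single server" claim can be checked mechanically. This is a small sketch, not part of the original system: HOLDINGS models which data each server keeps under the layout described above, and the names survives and fatal_combinations are hypothetical. It ignores the short window during the midnight transfer.

```python
from __future__ import annotations

from itertools import combinations

# Which data each server holds, following the A/B/C layout above.
# "today" = current day's data, "history" = everything before today.
HOLDINGS = {
    "A": {"today"},
    "B": {"history"},
    "C": {"today", "history"},  # live mirror of A + nightly backup of B
}
REQUIRED = {"today", "history"}


def survives(lost: set[str]) -> bool:
    """True if the servers still up together hold every required data set."""
    remaining: set[str] = set()
    for name, data in HOLDINGS.items():
        if name not in lost:
            remaining |= data
    return REQUIRED <= remaining


def fatal_combinations(k: int) -> list[tuple[str, ...]]:
    """All ways of losing exactly k servers that also lose data."""
    return [combo for combo in combinations(sorted(HOLDINGS), k)
            if not survives(set(combo))]


print(fatal_combinations(1))  # []
print(fatal_combinations(2))  # [('A', 'C'), ('B', 'C')]
```

    In this model losing any one server is survivable, while losing C together with A or B is not, which fits the decision to keep C physically apart from the others.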

    Server C was placed at a separate site in case of disaster (the customer is quite the paranoid).

    All servers ran Linux, Apache and MySQL. The main code is written in PHP, the backup scripts in Python.

    There was no fire, and no earthquake either. The system is now in its second year of running without problems. Later we added a "red button": the option to keep working at your own risk if one of the machines is lost. So far it has not been needed :)
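    The post does not describe how the "red button" works. One way to picture it is a guard in the write path that refuses writes while redundancy is broken unless an operator explicitly accepts the risk. Everything here is an assumption of the sketch: the names check_write_allowed and DegradedModeError, and the rule that writes are impossible without A.

```python
from __future__ import annotations

ALL_SERVERS = frozenset({"A", "B", "C"})


class DegradedModeError(RuntimeError):
    """Raised when writes must be refused."""


def check_write_allowed(alive: set[str], red_button: bool) -> str:
    """Return the mode writes will run in, or raise DegradedModeError."""
    missing = ALL_SERVERS - alive
    if not missing:
        return "normal"
    if "A" in missing:
        # Hypothetical policy: A takes all current-day writes, so without it
        # there is nowhere to write, red button or not.
        raise DegradedModeError("server A is down; writes are impossible")
    lost = ", ".join(sorted(missing))
    if red_button:
        # The operator explicitly accepted the risk of running with fewer copies.
        return f"degraded (missing {lost})"
    raise DegradedModeError(
        f"missing {lost}; press the red button to continue at your own risk")
```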

    Maybe this is reinventing the wheel, or just excessive paranoia, but everything worked, passed every crash test, and everyone was satisfied.

    %username%, how would you improve this system?

    UPD: Please suggest which blog to move this post to.
