dmitry_demchina April 11, 2019 at 18:13

Database business logic with SchemaKeeper

From the sandbox

The purpose of the article is to use the schema-keeper library as an example to show tools for simplifying database development in PHP projects using PostgreSQL DBMSs.

The following issues will be addressed:

In what form to store a dump of the database structure in the version control system (hereinafter - VCS)
How to track changes in the database structure after saving the dump
How to transfer changes in the database structure to other environments without conflicts and giant migration files
How to establish a process of parallel work on a project by several developers
How to safely deploy more changes to the database structure to production environment

The information from this article will primarily be useful to developers who want to make the most of PostgreSQL’s capabilities, but face problems maintaining the business logic in the database.

The article will not describe the advantages or disadvantages of storing business logic in a database. It is assumed that the choice has already been made by the reader.

SchemaKeeper is designed for working with stored procedures written in PL / pgSQL . Testing with other languages was not carried out, so the use may not be as effective or impossible.

In what form to store a dump of the database structure in VCS

The schema-keeper library provides the saveDump function , which saves the structure of objects from the database as separate text files. The output creates a directory containing the database structure, divided into grouped files that are easy to add to VCS.

Consider converting objects from a database to files with a few examples:

Object type	Scheme	Title	Relative file path
Table	public	accounts	`./public/tables/accounts.txt`
Stored procedure	public	auth (hash bigint)	`./public/functions/auth(int8).sql`
Representation	booking	tariffs	`./booking/views/tariffs.txt`

File content is a textual representation of the structure of a specific database object. For example, for stored procedures, the contents of the file will be the complete definition of the stored procedure, starting with a block CREATE OR REPLACE FUNCTION.

As can be seen from the table above, the file path stores information about the type, scheme, and name of the object. This approach makes it easy to navigate through the dump and code review changes to the database.

The extension .sqlfor files with the source code of the stored procedures is selected so that the IDE automatically provides tools for interacting with the database when opening the file.

How to track changes in the database structure after saving the dump

Having saved a dump of the current database structure in VCS, we get the opportunity to check whether changes were made to the database structure after creating the dump. The schema-keeper library provides a verifyDump function for detecting database structure changes , which returns information about differences without side effects.

An alternative way to check is to call the function again saveDump, specifying the same directory, and check for changes in VCS. Since the objects from the database are stored in separate files, VCS will show only changed objects. The main disadvantage of this method is the need to overwrite files to see the changes.

How to transfer changes in the database structure to other environments without conflicts and giant migration files

Thanks to the deployDump function , the source code of stored procedures is edited just like the rest of the application source code. Modification of the stored procedure occurs by making changes to the corresponding file, which is automatically reflected in the version control system.

For example, to create a new stored procedure in the scheme, it is publicenough to create a new file with the extension .sqlin the directory public/functions, place the source code of the stored procedure, including the block CREATE OR REPLACE FUNCTION, into it , then call the function deployDump. Similarly, a stored procedure is modified and deleted. Thus, the code simultaneously enters both the VCS and the database.

If an error appears in the source code of the stored procedure, it deployDumpwill not be executed by throwing an exception. Mismatch of stored procedures between the dump and the current database is not possible when using deployDump.

When creating a new stored procedure, there is no need to manually enter the correct file name. It is enough that the file has an extension .sql. The correct name can be obtained from the return value of the function deployDump, and used to rename the file.

deployDumpchanges the parameters of the function or the return type without additional actions, while with the classical approach it would have to be executed first DROP FUNCTION, and only then CREATE OR REPLACE FUNCTION.

Unfortunately, in some situations it is deployDumpnot able to automatically apply the changes. For example, if a trigger function that is used by at least one trigger is deleted. Such situations are resolved manually using migration files.

If schema-keeper is responsible for transferring changes in stored procedures , then migration files are used to transfer other changes in the structure. For example, the doctrine / migrations library is suitable .

Migrations must be applied prior to launch deployDumpin order to make changes to the structure and resolve possible problem situations.

Work with migrations will be described in more detail in the following sections.

How to establish a process of parallel work on a project by several developers

Let's create a script for full database initialization, which developers run on local machines to bring the structure of local databases in line with the dump stored in VCS. We divide the initialization of the local database into 3 steps:

Import a file with a basic structure, which will be called, for example, base.sql
Application of migrations
Call deployDump

base.sqlIs a starting point on top of which migrations are applied and executed deployDump, that is base.sql + миграции + deployDump = актуальная структура БД. Used base.sqlexclusively when initializing the database from scratch. Such a file is generated using pg_dump .

We will call the script for complete database initialization refresh.sh. The developer's workflow is as follows:

Running in your environment refresh.shto get the current database structure
Start of work on the task, modification of the local database to the needs of the new functionality ( ALTER TABLE ... ADD COLUMNand so on)
After the task is completed, a function call is made saveDumpto commit changes made to the database in VCS
Restart refresh.sh, then verifyDumpto display a list of changes to be included in the migration
Transferring structure changes to the migration file, starting again refresh.shand verifyDump, and if the migration is completed correctly, verifyDumpwill show the absence of differences between the local database and the saved dump

The process described above is compatible with the principles gitflow. Each branch in VCS contains its own version of the dump, and when branches merge, dumps merge. Mergers are performed without additional actions, but if changes were made in the branches, for example, to the same table, a conflict is possible.

Consider a conflict situation using the example of a develop branch , from which branch1 and branch2 are branched , which do not conflict with develop , but conflict with each other. The task is to merge branch1 and branch2 into develop. For such a case, it is recommended that you first merge branch1 into develop , and then merge develop into branch2 , resolving the conflicts in branch2 , and then merge branch2 into develop. At the stage of conflict resolution, you may have to fix the migration file in branch2so that it matches the final dump, which includes the results of the mergers.

How to safely deploy more changes to the database structure to production environment

The presence in the VCS dump of the current database structure allows you to check the production base for exact compliance with the required structure. This ensures that all the changes that the developers intended were transferred to the production base.

Since DDL in PostgreSQL is transactional , it is recommended that you follow the deployment order so that, in the event of an unexpected error, it is "painless" to perform ROLLBACK:

Start transaction
Perform all migrations in a transaction
In the same transaction, execute deployDump
Without completing the transaction, execute verifyDump. If there are no errors, execute COMMIT. If there are errors, executeROLLBACK

These steps are quite easy to integrate into existing approaches to deploy applications, including zero-downtime.

Conclusion

Thanks to the methods described above, you can squeeze the maximum performance out of “PHP + PostgreSQL” projects, while sacrificing a relatively small amount of development convenience in comparison with the implementation of all business logic in the main application code. Moreover, data processing in PL / pgSQL often looks more transparent and requires less code than the same functionality written in PHP.

Tags: