How to use JSON restrictions when working with PostgreSQL

    Earlier on our Habr blog we wrote about the development of our product, Hydra, a billing system for telecom operators, and also looked at questions of working with infrastructure and adopting new technologies. For example, we examined the advantages of Clojure and the situations in which MongoDB is and is not worth using.

    Today we will focus on working with JSON, and in particular on applying constraints to it. The developer Magnus Hagander published an interesting post on this topic in his blog; below we present its main ideas.

    Hagander writes that a hallway conversation at a conference got him thinking about whether it is possible to combine the advantages of SQL and NoSQL databases. In particular, his interlocutors asked about applying PostgreSQL's advanced constraints to JSON data. "If you think this might work in a particular case, it most likely will," Hagander is convinced.

    However, while UNIQUE and CHECK constraints are quite straightforward, exclusion constraints are far less so.

    Beyond the technical side, there is of course the question of whether this is worth doing at all. The more constraints you add to JSON data, the more structured it becomes. On the other hand, there are databases whose main selling point is a dynamic schema, yet which still require indexes and constraints on keys (unlike PostgreSQL, where JSONB remains schemaless even after indexing).

    In PostgreSQL, indexes and constraints can be defined not only on columns but directly on arbitrary expressions, provided those expressions are immutable, that is, their result depends only on the input values and not on any state external to the expression. This applies to JSONB as well.

    Consider a standard table containing JSON:

    postgres=# CREATE TABLE jsontable (j jsonb NOT NULL);
    postgres=# CREATE INDEX j_idx ON jsontable USING gin(j jsonb_path_ops);

    Of course, declaring a table with a single JSONB field like this rarely turns out to be a good idea in practice. In reality there is usually more information, and more than just one JSONB field is needed, but this variant will do for our example.

    By creating a standard inverted (GIN) index with the jsonb_path_ops operator class, you get fully schemaless indexing of the JSONB as efficiently as possible. This index will not be used in the current example, but in real development it is one of the main reasons to use JSONB. Let us add some data in the form of calendar entries to demonstrate the constraints; we will use a semi-fixed schema. In addition, a UUID is used here as the record's identifier, as is commonly done:
    postgres=# INSERT INTO jsontable (j) VALUES ($${
      "uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41",
      "start": "2015-03-08 10:00",
      "end": "2015-03-08 11:00",
      "title": "test"
    }$$);
    INSERT 0 1
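
    As a side note, here is a sketch (our illustration, not part of Hagander's post) of what the GIN index created above is actually for: containment queries with the @> operator are exactly what jsonb_path_ops accelerates. With the data above, a search by title could look like this:

    postgres=# SELECT j->>'uuid' FROM jsontable WHERE j @> '{"title": "test"}';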

    The first thing to look at is whether the uuid field can be kept unique. The identifier must be unique across the whole table, but as practice shows this requirement is not always observed, so to guarantee there are no duplicates a constraint is needed. It is quite simple:

    postgres=# CREATE UNIQUE INDEX j_uuid_idx ON jsontable(((j->>'uuid')::uuid));

    Here a unique B-tree index is created on the value extracted from the uuid field. This index can be used both for key lookups and for rejecting duplicate keys. The j->>'uuid' operator extracts the value of the uuid field as text, and the ::uuid cast then converts it to the built-in type for unique identifiers.

    In principle, the constraint could be created directly on the text value, but it is much more efficient to cast to uuid, since a UUID is stored and compared as a 128-bit integer.
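
    For illustration (our sketch, not part of the original post): a lookup that repeats the indexed expression allows the planner to use j_uuid_idx instead of scanning the table:

    postgres=# SELECT j->>'title' FROM jsontable
               WHERE (j->>'uuid')::uuid = '4e9cf085-09a5-4b4f-bc99-bde2d2d51f41';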

    postgres=# INSERT INTO jsontable (j) VALUES ($${"uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41", "start": "2015-03-08 11:00", "end": "2015-03-08 12:00", "title": "test2"}$$);
    ERROR:  duplicate key value violates unique constraint "j_uuid_idx"
    DETAIL:  Key (((j ->> 'uuid'::text)::uuid))=(4e9cf085-09a5-4b4f-bc99-bde2d2d51f41) already exists.

    One problem remains: there is no check that the field exists at all. You can insert records into the table that simply have no uuid field. This is because the ->> operator returns NULL for a missing key, and NULLs do not violate uniqueness (one NULL is not equal to another NULL). To eliminate this gap, you can add a CHECK constraint:

    postgres=# ALTER TABLE jsontable ADD CONSTRAINT uuid_must_exist CHECK (j ? 'uuid');
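
    As a quick illustration (our example; the exact error text may differ slightly between PostgreSQL versions), an insert without a uuid field is now rejected:

    postgres=# INSERT INTO jsontable (j) VALUES ($${"title": "no uuid"}$$);
    ERROR:  new row for relation "jsontable" violates check constraint "uuid_must_exist"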

    With this constraint in place, you can no longer insert records without a uuid field, and the unique index created in the previous step guarantees there are no duplicates. The cast to the uuid type verifies that the data has the correct format. Together, this set of indexes and constraints replicates the functionality of a classic column declared as uuid NOT NULL UNIQUE.
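
    For comparison, a sketch of the classic relational equivalent of this pair of constraints (our example):

    postgres=# CREATE TABLE classictable (id uuid NOT NULL UNIQUE);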

    However, there are also more complex exclusion constraints; that is exactly why a calendar entry was chosen as the example.
    The idea is to make overlapping entries in the table impossible. For an ordinary relational schema this is easy to achieve: you simply create a regular exclusion constraint on a range column.

    For JSON data you can apply a method similar to the one above, building the constraint on top of an extraction expression. There is one problem, though: constraints on expressions can only be created if the expressions are immutable, and the cast from text to timestamp is not immutable.

    The reason the text-to-timestamp cast is not immutable is that it can convert values that depend on external state, for example:

    postgres=# SELECT 'today'::timestamp;
          timestamp
    ---------------------
     2016-03-08 00:00:00
    (1 row)

    This is a good illustration of why the conversion is mutable: the result changes every day, whereas a value used in an index must not change (unless the indexed value itself is explicitly updated, of course).

    If it is known for certain that the database contains no such values, the conversion is effectively immutable for that data. If this is known exactly, you can create a small wrapper function that declares the conversion immutable:

    postgres=# CREATE FUNCTION immutable_tstamp(t text) RETURNS timestamp
        LANGUAGE sql IMMUTABLE
        AS $$ SELECT t::timestamptz AT TIME ZONE 'UTC' $$;
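
    A quick way to try the wrapper (our illustration; note that interpreting the input literal still depends on the session's TimeZone setting, which is why the caveat about the data applies):

    postgres=# SELECT immutable_tstamp('2015-03-08 10:00');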

    This function performs an immutable conversion to a timestamp, hard-coded to the UTC time zone. In addition, it makes sense to implement a CHECK constraint that ensures the data contains only values that really are immutable.
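
    One possible sketch of such a check (our assumption of what it might look like, mirroring the uuid_must_exist constraint above): require that both time fields are present, so the exclusion constraint below never sees a missing key:

    postgres=# ALTER TABLE jsontable
      ADD CONSTRAINT start_end_must_exist
      CHECK (j ? 'start' AND j ? 'end');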

    In the next step this function is combined with tsrange() to build an expression-based exclusion constraint that guarantees the absence of overlapping entries. The constraint extracts the start and end times with the immutable conversion function, builds a valid timestamp range from them with tsrange(), and compares the ranges using the overlap operator (&&):

    postgres=# ALTER TABLE jsontable
      ADD CONSTRAINT overlapping_times
      EXCLUDE USING gist(
        tsrange(immutable_tstamp(j->>'start'), immutable_tstamp(j->>'end')) WITH &&
      );

    As a result, we have forbidden the addition of records with overlapping time fields:

    postgres=# insert into jsontable (j) values ($${
      "uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f43",
      "start": "2015-03-08 10:30",
      "end": "2015-03-08 11:30",
      "title": "test"
    }$$);
    ERROR:  conflicting key value violates exclusion constraint "overlapping_times"
    DETAIL:  Key (tsrange(immutable_tstamp(j ->> 'start'::text), immutable_tstamp(j ->> 'end'::text)))=(["2015-03-08 10:30:00","2015-03-08 11:30:00")) conflicts with existing key (tsrange(immutable_tstamp(j ->> 'start'::text), immutable_tstamp(j ->> 'end'::text)))=(["2015-03-08 11:00:00","2015-03-08 12:00:00")).

    With this function and constraint we have replicated the functionality of an ordinary exclusion constraint defined as EXCLUDE USING gist(r WITH &&), where r is a column of the corresponding range type.
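
    For comparison (our sketch), the classic relational version would store the range directly and declare the constraint inline:

    postgres=# CREATE TABLE classiccalendar (
      r tsrange,
      EXCLUDE USING gist (r WITH &&)
    );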

    So the answer to the original question, "is it possible to combine the advantages of SQL and NoSQL databases?", is yes, at least as long as you use a DBMS that has the capabilities of both kinds, namely PostgreSQL.

    An important point: if, as in our example, the schema is known in advance, the system will work faster and more efficiently with the fields stored relationally. You should, of course, use the right tool for the job, so JSON is only warranted if the schema is at least semi-dynamic. Still, the ability to declare constraints on part of the schema is very useful even when the data is not relational and things are not as fast. In the end, the whole point of a dynamic schema is its flexibility.

    Our experience

    We use JSON in PostgreSQL in several projects. In particular, in a business-process management project we store the values of process variables in such fields; their structure is determined at the time the product is deployed, not while it is being developed.

    Table fields are accessed through the PostgreSQL adapter of the Ruby on Rails framework. Reading and writing work natively for Ruby, through hashes and arrays, so data from the field can be used without extra transformations.

    The article covers an important aspect, the integrity of stored data; but besides simply storing data you often need to search it, and for that PostgreSQL also offers expression indexes, which can significantly speed up data access for frequent queries.
