What was frozen at feature freeze 2019. Part I. JSONPath


    After the commit event 2019-03, the feature was frozen (feature freeze). We have this almost traditional section: we already wrote about last year’s freeze . Now, the results of 2019: which of the new will be included in PostgreSQL 12. In this part of the JSONPath review, examples and fragments from the report "Postgres 12 in Etudes", which Oleg Bartunov read at Saint Highload ++ in St. Petersburg on April 9 this year, are used.

    Jsonpath


    Everything related to JSON (B) is relevant, in demand in the world, in Russia, and this is one of the most important areas of development at Postgres Professional. The jsonb type, functions and operators for working with JSON / JSONB appeared in PostgreSQL version 9.4, they were made by a team led by Oleg Bartunov.

    The SQL / 2016 standard provides for working with JSON: JSONPath is mentioned there - a set of data addressing tools inside JSON; JSONTABLE - means of converting JSON to regular tables; A large family of functions and operators. Despite the fact that JSON in Postgres has been supported for a long time, in 2017 Oleg Bartunov and his colleagues began to work on supporting the standard. Complying with the standard is always good. Of everything that is described in the standard, only one but the most important patch is JSONPath in version 12, so we will talk about it first of all.

    In ancient times, people used JSON, storing it in text fields. In 9.3, a special data type for JSON appeared, but the functionality associated with it was not rich, and requests with this type worked slowly due to the time spent on parsing the text representation of JSON. This stopped many potential Postgres users who preferred NoSQL databases. Postgres productivity increased at 9.4 when, thanks to O. Bartunov, A. Korotkov, and F. Sigaev, Postgres introduced a binary version of JSON - the jsonb type.
    jsonb does not need to be parsed every time, so working with it is much faster. Of the new functions and operators that arose simultaneously with it, some work only with a new, binary type, such as, for example, the important operator of occurrence @>, which checks if an element or array is in the given JSONB:

    SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;

    gives TRUE, since the array on the right side enters the array on the left. But

    SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;

    will give FALSE, since the level of nesting is different, it must be set explicitly. Is existence operator introduced for jsonb type ? (a question mark) that checks whether a string is an object key or an element of an array at the top level of JSONB values, as well as two more similar operators (details here ). They are supported by GIN indexes with two classes of GIN operators. The operator -> (arrow) allows you to "navigate" through JSONB, it returns a value by key or, if it is an array, by index. There are several more operators for moving. But there is no way to organize filters that work similarly to WHERE. It was a breakthrough: thanks to jsonb, Postgres began to grow in popularity as an RDBMS with NoSQL features.

    In 2014, A. Korotkov, O. Bartunov and F. Sigaev developed the jsquery extension, which was included as a result in Postgres Pro Standard 9.5 (and in later versions of Standard and Enterprise). It provides additional, very broad features for working with json (b). This extension defines the query language for extracting data from json (b) and indexes to speed up these queries. This functionality was required by users, they were not ready to wait for the standard and the inclusion of new features in the vanilla version. The practical value is also evidenced by the fact that the development was sponsored by Wargaming.net. The extension implements a special type - jsquery.

    A query in this language is compact and looks like this:

    SELECT '{"apt":[{"no": 1, "rooms":2}, {"no": 2, "rooms":3}, {"no": 3, "rooms":2}]}'::jsonb @@ 'apt.#.rooms=3'::jsquery;

    We are asking here whether there are “three rubles” in the apartment building. The jsquery type must be specified because the @@ operator is now also in the jsonb type. The description is here , and the presentation with many examples is here .

    Total: Postgres already had everything for working with JSON, and then the SQL: 2016 standard appeared. It turned out that its semantics are not so different from ours in the jsquery extension. It is possible that the authors of the standard even glanced at jsquery, inventing JSONPath. Our team had to implement a little differently what we already had and, of course, a lot of new things too.

    More than a year ago, at the March commitfest, the fruits of our programming efforts were offered to the community in the form of 3 large patches with support for the SQL: 2016 standard :

    SQL / JSON: JSONPath;
    SQL / JSON: functions;
    SQL / JSON: JSON_TABLE.

    But to develop a patch is not the whole business, promoting them is also not easy, especially if the patches are large and affect many modules. Many iterations of revision revision are required, the patch must be promoted, as commercial companies do, investing a lot of resources (man-hours). The chief architect of Postgres Professional, Alexander Korotkov, took it upon himself (since he now has the status of a committer) and secured the adoption of the JSONPath patch - the main one in this series of patches. The second and third are now in the status of Needs Review. The focused JSONPath allows you to work with the JSON (B) structure and is flexible enough to highlight its fragments. Of the 15 points prescribed in the standard, 14 are implemented, and this is more than in Oracle, MySQL and MS SQL.

    JSONPath notation differs from Postgres statements for working with JSON and JSQuery notation. The hierarchy is indicated by dots:

    $ .abc (in postgres 11 notation, you would have to write 'a' -> 'b' -> 'c');
    $ - the current context of the element - in fact, the expression with $ defines the json (b) region that is to be processed, including the one in the filter, the rest of this is not available for work;
    @ - the current context in the filter expression - iterates over the paths available in the expression with $;
    [*] - an array;
    * - wildcard, in the expression with $ or @ means any value of the path segment, but taking into account the hierarchy;
    ** - as part of the expression with $ or @ can mean any value of the path segment without taking into account the hierarchy - it is convenient to use it if you do not know the level of nesting of elements;
    operator "?" allows you to organize a filter similar to WHERE:
    $ .abc? (@ .x> 10);
    $ .abcxtype (), as well as size (), double (), ceiling (), floor (), abs (), datetime (), keyvalue () are methods.
    A query with the jsonb_path_query function (about functions below) might look like this:

    SELECT jsonb_path_query_array('[1,2,3,4,5]', '$[*] ? (@ > 3)');
     jsonb_path_query_array 
    ------------------------
     [4, 5]
    (1 row)

    Although a special patch with functions is not commited, the JSONPath patch already has key functions for working with JSON (B):


    jsonb_path_exists('{"a": 1}', '$.a') возвращает true (вызывается оператором "?")
    jsonb_path_exists('{"a": 1}', '$.b') возвращает false
    jsonb_path_match('{"a": 1}', '$.a == 1') возвращает true (вызывается оператором "@>")
    jsonb_path_match('{"a": 1}', '$.a >= 2') возвращает false
    jsonb_path_query('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 2)') возвращает 3, 4, 5
    jsonb_path_query('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 5)') возвращает 0 записей
    jsonb_path_query_array('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 2)') возвращает [3, 4, 5]
    jsonb_path_query_array('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 5)') возвращает []
    jsonb_path_query_first('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 2)') возвращает 3
    jsonb_path_query_first('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 5)') возвращает NULL

    Note that equality in JSONPath expressions is a single "=", while in jsquery it is double: "==".

    For more elegant illustrations, we will generate JSONB in ​​a single-column house plate:

    CREATE TABLE house(js jsonb);
    INSERT INTO house VALUES
    ('{
    	"address": {
    		   "city":"Moscow",
    		   "street": "Ulyanova, 7A"
    	},
    	"lift": false,
    	"floor": [
    		 {
    			"level": 1,
    		 	"apt": [
    		       	       {"no": 1, "area": 40, "rooms": 1},
    		       	       {"no": 2, "area": 80, "rooms": 3},
    		       	       {"no": 3, "area": 50, "rooms": 2}
    			]
    		},
    		{
    			"level": 2,
    			"apt": [
    		       	       {"no": 4, "area": 100, "rooms": 3},
    		       	       {"no": 5, "area": 60, "rooms": 2}
    			]
    		}
    	]
    }');


    Fig. 1 Housing JSON tree with allocated leaf apartments.

    This is a strange JSON: it has a confused hierarchy, but it is taken from life, and in life it is often necessary to work with what is, and not with what should be. Armed with the capabilities of the new version, we will find apartments on the 1st and 2nd floors, but not the first in the list of floor apartments (on the tree they are highlighted in green):

    SELECT jsonb_path_query_array(js, '$.floor[0, 1].apt[1 to last]')
    FROM house;
    ---------------------
    [{"no": 2, "area": 80, "rooms": 3}, {"no": 3, "area": 50, "rooms": 2}, {"no": 5, "area": 60, "rooms": 2}]
    

    In PostgreSQL 11, you have to ask this:

    SELECT jsonb_agg(apt) FROM (
            SELECT apt->generate_series(1, jsonb_array_length(apt) - 1) FROM (
                 SELECT js->'floor'->unnest(array[0, 1])->'apt' FROM house
            ) apts(apt)
    ) apts(apt);

    Now a very simple question: are there lines containing (anywhere) the value “Moscow”? Really simple:

    SELECT jsonb_path_exists(js, '$.** ? (@ == "Moscow")') FROM house;

    In version 11, you would have to write a huge script:

    WITH RECURSIVE t(value) AS (
          SELECT * FROM house UNION ALL (
                SELECT COALESCE(kv.value, e.value) AS value
                FROM t
                LEFT JOIN LATERAL jsonb_each (
                    CASE WHEN jsonb_typeof(t.value) = 'object' THEN t.value
                    ELSE NULL END
                ) kv ON true
                LEFT JOIN LATERAL jsonb_array_elements (
                    CASE WHEN jsonb_typeof(t.value) = 'array' THEN t.value
                    ELSE NULL END
                ) e ON true
                WHERE kv.value IS NOT NULL OR e.value IS NOT NULL
          )
    ) SELECT EXISTS (SELECT 1 FROM t WHERE value = '"Moscow"');


    Fig.2 Tree of Housing JSON, Moscow was found!

    We are looking for any apartment on any floor with an area of ​​40 to 90 sq.m:

    select jsonb_path_query(js, '$.floor[*].apt[*] ? (@.area > 40 && @.area < 90)') FROM house;
             jsonb_path_query          
    -----------------------------------
     {"no": 2, "area": 80, "rooms": 3}
     {"no": 3, "area": 50, "rooms": 2}
     {"no": 5, "area": 60, "rooms": 2}
    (3 rows)
    

    We are looking for apartments with rooms after the 3rd, using our housing jason:

    SELECT jsonb_path_query(js, '$.floor.apt.no ? (@>3)') FROM house;
     jsonb_path_query 
    ------------------
     4
     5
    (2 rows)

    And here is how jsonb_path_query_first works:

    SELECT jsonb_path_query_first(js, '$.floor.apt.no ? (@>3)') FROM house;
     jsonb_path_query_first 
    ------------------------
     4
    (1 row)

    We see that only the first value is selected that satisfies the filter condition.

    The Boolean JSONPath operator for JSONB @@ is called the match operator. It computes the JSONPath predicate by calling the jsonb_path_match_opr function.

    Another boolean operator is @? - this is a check of existence, answers the question whether the JSONPath expression will return SQL / JSON objects, it calls the jsonb_path_exists_opr function:

    проверка '[1,2,3]' @@ '$[*] == 3' возвращает true;
    и '[1,2,3]' @? '$[*] @? (@ == 3)' - тоже true

    The same result can be achieved using different operators:

    
    js @? '$.a' эквивалентно js @@ 'exists($.a)'
    js @@ '$.a == 1' эквивалентно js @? '$ ? ($.a == 1)'

    The beauty of JSONPath Boolean operators is that they are supported, accelerated by GIN indices. jsonb_ops and jsonb_path_ops are the corresponding operator classes. In the example, we disable SEQSCAN, since we have a microtable, on large tables the optimizer itself will select the Bitmap Index:

    SET ENABLE_SEQSCAN TO OFF;
    CREATE INDEX ON house USING gin (js);
    EXPLAIN (COSTS OFF) SELECT * FROM house
          WHERE js @? '$.floor[*].apt[*] ? (@.rooms == 3)';
    QUERY PLAN
    --------------------------------------------------------------------------------
    Bitmap Heap Scan on house
       Recheck Cond: (js @? '$."floor"[*]."apt"[*]?(@."rooms" == 3)'::jsonpath)
       -> Bitmap Index Scan on house_js_idx
           Index Cond: (js @? '$."floor"[*]."apt"[*]?(@."rooms" == 3)'::jsonpath)
    (4 rows)

    All functions of the form jsonb_path_xxx () have the same signature:

    jsonb_path_xxx(
    js jsonb,
    jsp jsonpath,
    vars jsonb DEFAULT '{}',
    silent boolean DEFAULT false
    )

    vars is a JSONB object for passing JSONPath variables:

    SELECT jsonb_path_query_array('[1,2,3,4,5]', '$[*] ? (@ > $x)',
    vars => '{"x": 2}');
    jsonb_path_query_array
    ------------------------
    [3, 4, 5]

    It’s hard to do without vars when we make a join involving a jsonb type field in one of the tables. Let's say we make an application that looks for suitable apartments for employees in that house who have written down their requirements for the minimum area in the questionnaire:

    CREATE TABLE demands(name text, position text, demand int);
    INSERT INTO demands VALUES ('Саша','босс', 85), ('Паша','младший программист', 45);
    SELECT jsonb_path_query(js, '$.floor[*].apt[*] ? (@.area >= $min)', vars => jsonb_build_object('min', demands.demand)) FROM house, demands WHERE name = 'Паша';
    -[ RECORD 1 ]----+-----------------------------------
    jsonb_path_query | {"no": 2, "area": 80, "rooms": 3}
    -[ RECORD 2 ]----+-----------------------------------
    jsonb_path_query | {"no": 3, "area": 50, "rooms": 2}
    -[ RECORD 3 ]----+-----------------------------------
    jsonb_path_query | {"no": 4, "area": 100, "rooms": 3}
    -[ RECORD 4 ]----+-----------------------------------
    jsonb_path_query | {"no": 5, "area": 60, "rooms": 2}

    Lucky Pasha can choose from 4 apartments. But it is worth changing 1 letter in the request - from "P" to "C", and there will be no choice! Only 1 apartment will do.


    One more keyword remains: silent is a flag that suppresses error handling; they are on the conscience of the programmer.

    SELECT jsonb_path_query('[]', 'strict $.a');
    ERROR: SQL/JSON member not found
    DETAIL: jsonpath member accessor can only be applied to an object

    Mistake. But this will not be an error:

    SELECT jsonb_path_query('[]', 'strict $.a', silent => true);
    jsonb_path_query
    ------------------
    (0 rows)

    By the way, about errors: in accordance with the standard, arithmetic errors in expressions do not give error messages, they are on the conscience of the programmer:

    SELECT jsonb_path_query('[1,0,2]', '$[*] ? (1/ @ >= 1)');
    jsonb_path_query
    ------------------
    1
    (1 row)

    When calculating the expression in the filter, the array values ​​are sorted, among which there are 0, but dividing by 0 does not generate an error.

    Functions will work differently depending on the selected mode: Strict or Lax (in the translation “non-strict” or even “loose”, it is selected by default). Suppose we are looking for a key in Lax mode in JSON, where it is obviously not:

    SELECT jsonb '{"a":1}' @? 'lax $.b ? (@ > 1)';
    ?column?
    ----------
    f
    (1 row)

    Now in Strict mode:

    SELECT jsonb '{"a":1}' @? 'strict $.b ? (@ > 1)';
    ?column?
    ----------
    (null)
    (1 row)

    That is, where under liberal mode we got FALSE, with strict we got NULL.

    In Lax mode, an array with a complex hierarchy [1,2, [3,4,5]] always expands to [1,2,3,4,5]:

    SELECT jsonb '[1,2,[3,4,5]]' @? 'lax $[*] ? (@ == 5)';
    ?column?
    ----------
    t
    (1 row)

    In Strict mode, the number “5” will not be found, since it is not at the bottom of the hierarchy. To find it, you have to modify the request, replacing "@" with "@ [*]":

    SELECT jsonb '[1,2,[3,4,5]]' @? 'strict $[*] ? (@[*] == 5)';
    ?column?
    ----------
    t
    (1 row)

    In PostgreSQL 12, JSONPath is a data type. The standard does not say anything about the need for a new type, it is a property of implementation. With the new type, we get full-fledged work with jsonpath with the help of operators and indexes accelerating their work, which already exist for JSONB. Otherwise, JSONPath would have to be integrated at the level of the executor and optimizer code.

    You can read about SQL / JSON syntax, for example, here .

    Oleg Bartunov's blog post is about SQL / JSON standard-2016 conformance for PostgreSQL, Oracle, SQL Server and MySQL.

    Here's a presentation on SQL / JSON.

    And here is an introduction to SQL / JSON.

    Also popular now: