JSONB queries in PostgreSQL

Original author: schinkel
  • Transfer
Earlier I wrote how to enable jsonb support in postgres / psycopg2 . Today I experimented with how to request data in columns like JSON.
There is documentation on this subject , but it was not entirely clear to me how the various operations work:

CREATETABLE json_test (
  idserial primary key,
  data jsonb
);
INSERTINTO json_test (data) VALUES 
  ('{}'),
  ('{"a": 1}'),
  ('{"a": 2, "b": ["c", "d"]}'),
  ('{"a": 1, "b": {"c": "d", "e": true}}'),
  ('{"b": 2}');


The request worked, let's get all the data to check:

SELECT * FROM json_test;
 id |                 data                 
----+--------------------------------------
  1 | {}
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(5 rows)  

Now let's filter the results. There are several operators that we can use, and we will see later why we chose jsonb as the type.

Equality
In jsonb, we can verify that two JSON objects are identical:

SELECT * FROM json_test WHEREdata = '{"a":1}';
 id | data 
----+------
  1 | {"a": 1}
(1 row)


Limitations
We can also get a json object containing another, i.e. "Being a subset":

SELECT * FROM json_test WHEREdata @> '{"a":1}';

Says: - Give us all the objects starting with key a and values ​​1:

 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

Limitations in both directions:
In this case, the query will display an empty object and an exact match for the second:

SELECT * FROM json_test WHEREdata <@ '{"a":1}';
 id |   data   
----+----------
  1 | {}
  2 | {"a": 1}
(2 rows)


Existence of a key / element
The last batch of operators will check the existence of a key (or an element of type string in an array).

 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(3 rows)

Get the objects that have any keys from the list:

SELECT * FROM json_test WHEREdata ?| array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(4 rows)

And all the values ​​of objects that have exact matching keys from the list:

SELECT * FROM json_test WHEREdata ?& array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

Keys for bypassing
You can also filter records matching key-> path. In simple cases, using restriction operators can be simpler, but not in complex ones, they cannot be dispensed with. We can use these operations in SELECT, but it’s more interesting to apply them in the WHERE clause.

SELECT * FROM json_test WHEREdata ->> 'a' > '1';

We get all the entries of the values ​​of the associative element with the key a equal to 1.
Note the need to use a text value, not a number:

 id |           data            
----+---------------------------
  3 | {"a": 2, "b": ["c", "d"]}
(1 row)

We can make a comparison between the primitives of objects and arrays:

SELECT * FROM json_test WHEREdata -> 'b' > '1';
 id |                 data                 
----+--------------------------------------
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(3 rows)

It turns out that arrays and objects are more than numbers.
We can also see a deeper path:

SELECT * FROM json_test WHEREdata#> '{b,c}' = '"d"';

We get an object where element b has a child object c, and c is equal to the string "d":

id |                 data                 
----+--------------------------------------
  4 | {"a": 1, "b": {"c": "d", "e": true}}

There are also versions of these statements that return text, not a JSON object. In the case of the last request, this means that we do not need to compare with the JSON object (in the version when we really want to get a string):

SELECT * FROM json_test WHEREdata#>> '{b,c}' = 'd';id |                 data----+--------------------------------------4 | {"a": 1, "b": {"c": "d", "e": true}}
(1row)

Thus, up to this point, everything is fine. We can work with different data, and the same data can be used in jsonb indexes too. However, a more attentive reader may have noticed that we are dealing with JSON data that has an object path from the root. This does not have to be like this: arrays are also valid JSON, indeed any of the valid examples are:

SELECT'null'::json, 
  'true'::json, 
  'false'::json, 
  '2'::json,
  '1.0001'::json,
  '"abc"'::json, 
  '1E7'::jsonb;

Note the last entry, which is a jsonb type and converts to canonical form:

 json | json | json  | json |  json   | json  |  jsonb   
------+------+-------+------+---------+-------+----------
 null | true | false | 2    | 1.00001 | "abc" | 10000000
(1 row)

JSON null is also different from SQL NULL.
So what happens when we store objects of a mixed “type” in a JSON column?

INSERTINTO json_test (data) 
VALUES ('[]'), ('[1,2,"a"]'), ('null'), ('1E7'), ('"abc"');
SELECT * FROM json_test;
id |                 data                 
----+--------------------------------------
  1 | {}
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
  6 | []
  7 | [1, 2, "a"]
  8 | null
  9 | 10000000
 10 | "abc"
(10 rows)

The whole structure was deduced without problems. Let's see if we can work with these objects and queries?
The equality check works fine:

SELECT * FROM json_test WHEREdata = '{"a":1}';
SELECT * FROM json_test WHEREdata = 'null';

Limitations also work as expected:

SELECT * FROM json_test WHEREdata @> '{"a":1}';
SELECT * FROM json_test WHEREdata <@ '{"a":1}';

Keys and existing items also work. Not surprisingly, a single request will match the elements in the array, as well as the keys in the object:

SELECT * FROM json_test WHEREdata ? 'a';
 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  7 | [1, 2, "a"]
(4 rows)


SELECT * FROM json_test WHEREdata ?| array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
  7 | [1, 2, "a"]
(5 rows)


SELECT * FROM json_test WHEREdata ?& array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

But as soon as we started to make the keys or elements of 'get' we get problems;

(Apparently, the author of the article had PotgreSQL 9.4 betta version installed at the time of writing, so some of the queries had errors, checked on 9.4.1 all queries were processed):

SELECT * FROM json_test WHEREdata ->> 'a' > '1';
ERROR: cannot call jsonb_object_field_text 
       (jsonb ->> textoperator) on an array

You can still use key-path bypass if you have non-scalar values:

SELECT * FROM json_test WHEREdata#> '{b,c}' = '"d"';ERROR:  cannot callextractpathfrom a scalar
SELECT * FROM json_test WHEREdata#> '{b,c}' = '"d"' AND id < 8;id |                 data----+--------------------------------------4 | {"a": 1, "b": {"c": "d", "e": true}}
(1row)

Note the syntax for key path, for strings (must have json keys) or integer (in array indices).
This imposes very severe restrictions. I do not know how such things work in MondgoDB.

But in the future, if you store data in arrays and in json objects in one column, then in the future there may be some problems. But not all is lost. You can get strings based on base objects:

SELECT * FROM json_test WHEREdata @> '{}';
 id |                 data                 
----+--------------------------------------
  1 | {}
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(5 rows)

Then you can combine this request with the request above:

SELECT * FROM json_test WHEREdata @> '{}'ANDdata ->> 'a' > '1';
 id |           data            
----+---------------------------
  3 | {"a": 2, "b": ["c", "d"]}
(1 row)

Indeed, in Postgres you don’t even have to be sure that data @> '{} comes first.
But what if we only need array data types? It turns out you can use the same trick:

SELECT * FROM json_test WHEREdata @> '[]';
 id |    data     
----+-------------
  6 | []
  7 | [1, 2, "a"]
(2 rows)

And it can still be combined with other operators:

SELECT * FROM json_test WHEREdata @> '[]'ANDdata ->> 1 = '2';
 id |    data     
----+-------------
  7 | [1, 2, "a"]
(1 row)

Well, the @> operator entry is only available for jsonb columns, so you won’t be able to request mixed data for regular json columns.

What's next?

Considering jsonb in Postgres was a third-party project, I am currently working on json (b) queries in ORM django. With Django 1.7, you can write in the search functions, something like:

# Exact
MyModel.objects.filter(data={'a': 1})
MyModel.objects.exclude(data={})
# Key/element existence
MyModel.objects.filter(data__has='a')
MyModel.objects.filter(data__has_any=['a', 'b'])
MyModel.objects.filter(data__has_all=['a', 'b'])
# Sub/superset of key/value pair testing
MyModel.objects.filter(data__contains={'a': 1})
MyModel.objects.filter(data__in={'a': 1, 'b': 2})
# Get element/field (compare with json)
MyModel.objects.filter(data__get=(2, {'a': 1}))
# Get element/field (compare with scalar, including gt/lt comparisons)
MyModel.objects.filter(data__get=(2, 'a'))
MyModel.objects.filter(data__get__gt=('a', 1))
# key path traversal, compare with json or scalar.
MyModel.objects.filter(data__get=('{a,2}', {'foo': 'bar'}))
MyModel.objects.filter(data__get=('{a,2}', 2))
MyModel.objects.filter(data__get__lte=('{a,2}', 2))

But I'm not sure that the names from the last set will work. The name “get” seems a little universal, and maybe we could use different names to substitute the input type, although only integer and string are valid.

Also popular now: