Designing a Document-Oriented API in PostgreSQL (Part 1)

Published on December 04, 2015


This article is a translation; the original article, by Rob Conery, is here.

Postgres, as many people know, supports JSON as a data storage type, and with the release of 9.4 it now supports JSON storage in a binary format: jsonb.

This is great news for anyone who wants to step beyond simple "storing JSON as text". jsonb supports indexing with GIN indexes, and comes with special query operators that can take advantage of those indexes.
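If you haven't seen these in action yet, here is a minimal sketch (the table and column names are purely illustrative; the query operator comes up again below):

-- a table with a binary JSON column
create table my_docs(
  id serial primary key,
  document_field jsonb not null
);

-- a GIN index using jsonb_path_ops, optimized for the @> containment operator
create index idx_my_docs on my_docs using GIN(document_field jsonb_path_ops);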

Who cares?


It was fun discovering jsonb in Postgres and seeing what it is capable of. Which, in its own way, is a problem: getting acquainted with something and musing over it is not enough to actually get work done.

It should be said that other systems (such as RethinkDB) come with a huge amount of built-in functionality to help you save documents, query those documents, and optimize access to them. Postgres has some interesting capabilities in this direction too, but what you get "out of the box" for writing queries is a little... lacking, to be honest.

Let's look at this query:

select document_field -> 'my_key' from my_docs
where document_field @> '{"some_key" : "some_value"}';

It hints at the awkwardness of the whole situation when it comes to JSON and Postgres: it is all strings. SQL obviously cannot recognize JSON, so you have to format it as a string. Which in turn means that working with JSON directly in SQL is a pain. Of course, if you have a good query builder, the problem is eased to some extent... but it still exists.

Moreover, document storage is quite free-form. Do you use a single jsonb field? Or several fields within a larger table structure? It is all up to you, which, of course, is not bad, but too much freedom of choice can be paralyzing too.
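To make the choice concrete, both of these are valid layouts (hypothetical tables):

-- option 1: the document is essentially the whole table
create table docs_only(
  body jsonb not null
);

-- option 2: a document hanging off a larger relational structure
create table users(
  id serial primary key,
  email varchar(255) not null,
  profile jsonb
);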

So why bother with this at all? If you want to use a document-oriented database, then use a document-oriented database. I agree with that... but there is one really compelling reason to use Postgres (at least for me)...


Postgres is ACID-compliant. So you can expect it to actually write your data and, most likely, not lose it.

In addition, Postgres is a relational database, which means that, if you want, you can move to a stricter schema over time. There are a number of reasons why you might choose Postgres; let's assume the choice is made and it's time to start working with documents and jsonb.
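For example, "hardening" part of a document into a typed column later might look something like this (a hypothetical migration sketch):

-- promote a key from the document into a real, constrained column
alter table my_docs add column email varchar(255);
update my_docs set email = body ->> 'email';
alter table my_docs alter column email set not null;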

A Better API


As for me, I would like to see more features that support the idea of working with documents. At the moment we have built-in tools for dealing with the JSON types, but nothing that supports a higher level of abstraction.

That does not mean we cannot build such an API with our own hands... which is what I did. Here is how it starts...

A Document-Oriented Table


I want to store documents in a table that carries some metadata, as well as additional ways of working with the information, namely: full-text search.

The structure of the table itself is open to debate - we are building this abstraction, after all! Let's start with this:

create table my_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);

There is some duplication here. The document itself is stored in the body field, including its id, which is also stored as the primary key (that is required - this is still Postgres, after all), as illustrated below. I accept the duplication, however, for the following reasons:

  • This API belongs to me, so I can be sure everything stays synchronized.
  • This is how document-oriented systems do it.
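In other words, a saved row ends up looking something like this (illustrative values):

select id, body ->> 'id' as body_id, body ->> 'name' as name from my_docs;

-- id | body_id | name
------+---------+------
--  1 | 1       | Test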

Saving a Document


Here is what I would like from a save_document function...

  • Create tables on the fly
  • Create appropriate indexes
  • Create timestamps and a search field (for the full-text index)

This can be achieved by writing our own save_document function and, for fun, I will use PLV8 - javascript inside the database. In fact, I will create two functions - one to create the table itself, the other to save the document.

First, create_document_table:

create function create_document_table(name varchar, out boolean)
as $$
  var sql = "create table " + name + "(" + 
    "id serial primary key," + 
    "body jsonb not null," + 
    "search tsvector," + 
    "created_at timestamptz default now() not null," + 
    "updated_at timestamptz default now() not null);";
  plv8.execute(sql);
  // index the document body for containment queries, and the tsvector for full-text search
  plv8.execute("create index idx_" + name + " on " + name + " using GIN(body jsonb_path_ops)");
  plv8.execute("create index idx_" + name + "_search on " + name + " using GIN(search)");
  return true;
$$ language plv8;

This function creates the table and the related indexes - one for the jsonb field in our document-oriented table, the other for the tsvector full-text field. Note that I build the SQL strings on the fly and execute them with plv8.execute - this is how things work with javascript inside Postgres.
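Calling it directly is as simple as (the table name here is hypothetical):

select create_document_table('customers');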

Next, let's create our save_document function:

create function save_document(tbl varchar, doc_string jsonb)
returns jsonb
as $$
  // the jsonb argument arrives as a string inside plv8, so parse it first
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;

  // create the table on the fly if it doesn't exist yet
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  if(id){
    // the document already has an id - update it in place
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;",doc_string,id);
  }else{
    // insert first, then write the generated primary key back into the document body
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", doc_string);
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *",JSON.stringify(doc),id);
  }

  return result[0] ? result[0].body : null;
$$ language plv8;

I am sure this function looks a little strange, but if you read it line by line it should make sense. One thing worth asking, though: why is JSON.parse() being called?

That is because the jsonb value Postgres hands us here is not JSON - it is a string. Outside of our PLV8 function this is still the Postgres world, and Postgres passes JSON around as a string (storing it in jsonb in a binary format). So when our document lands in our function, it arrives as a string that needs to be parsed if we want to work with it as a JSON object in javascript.
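A throwaway function makes the round-trip visible (a sketch, consistent with the behavior described here; the function name is made up):

create function echo_document(doc jsonb)
returns jsonb
as $$
  var parsed = JSON.parse(doc);   // the jsonb argument arrives as a string
  parsed.touched = true;          // now it is a javascript object we can modify
  return JSON.stringify(parsed);  // and it has to go back out as a string
$$ language plv8;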

In the insert case, you can see that I have to synchronize the document's id with the primary key that was generated. A bit cumbersome, but it works well.

You can also see that for both the initial insert and the update, the argument passed to plv8.execute is doc_string. This, again, is because JSON values must be handled as strings in Postgres.

It really can be confusing. If I pass in doc (our JSON.parse'd object), plv8 converts it to [object Object]. Which is strange.

Moreover, if I try to return a javascript object from this function (say, our doc variable), I get an error that it is not a valid format for the JSON type. Which is baffling.

In the end, I simply return the data from the query result - believe it or not, that is a string, and I can pass it straight through as the result. It is worth noting that everything plv8.execute returns comes back as elements you can work with like javascript objects.
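For example, inside any plv8 function (a sketch):

  var rows = plv8.execute("select id, body from my_docs limit 1");
  // rows is a javascript array; each row behaves like a plain object
  var firstId = rows.length > 0 ? rows[0].id : null;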

Result


It works really well! And fast. If you want to try it out yourself, you will need to install the PLV8 extension and then run:

create extension plv8;
select * from save_document('test_run', '{"name" : "Test"}');

You should see a new table and a new entry in that table.

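For illustration, querying the new table should show something like this (the exact id and timestamps will vary):

select id, body from test_run;

-- id |            body
------+----------------------------
--  1 | {"id": 1, "name": "Test"}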

Future plans


In the next article I will add some additional features, namely:

  • Automatically updating the search field
  • Inserting multiple documents using arrays

This is a good start!