Postgres search using ZomboDb and elasticsearch

At some point in the development of the project, the question arose of searching for a large number of texts. Moreover, the texts have different lengths: from tweets to large articles. At first, the built-in Postgres _ts vector was chosen as the main search engine . For a search by simple rules it was quite enough. The array of texts grew at a high speed, and the search rules became more complicated, so the built-in engine no longer covered the requirements.


Yes, there is sphinx, it has excellent integration with Postgres, but the goal was to find a solution for using elasticsearch with Postgres. Why? elasticsearch showed good results in some case-cases of the project. Yes, and there was already a server with it for storing logstash logs. There was also a desire to find such a tool that will completely take over the synchronization of data.


As a result, the ZomboDb project was found on the vast expanses of the network , which just fit the requirements.


Project page on github .


Install extension


This section is a retelling of the official instructions.


Currently supported package versions:


PackageVersions
Elasticsearch1.7.1+ (not 2.0)
Postgres9.3, 9.4, 9.5

My configuration: Postgres 9.4, elasticsearch 1.7.5


  1. From the page you need to download and install the package with the plugin for Postgres ( deb or rpm )
  2. In postgresql.conf add the line:


    local_preload_libraries = 'zombodb.so'`

  3. Restart the database and create extension:


    psql db_name -c "CREATE EXTENSION zombodb;"

  4. Next, from the same page, you need to download the plugin for elasticsearch and install it:


    bin/plugin -i zombodb -u file:///path/to/zombodb-plugin-X.X.X.zip

  5. Add to elasticsearch.yml :


    threadpool.bulk.queue_size: 1024
    threadpool.bulk.size: 12
    http.max_content_length: 1024mb
    index.query.bool.max_clause_count: 1000000

  6. Restart elasticsearch.

This completes the installation


Create a test index


Suppose there is a table in which the tags lie:


CREATE TABLE public.tags (
  id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('tags_id_seq'::regclass),
  word CHARACTER VARYING(100) NOT NULL
);

Index creation:


CREATE INDEX tags_zdb_search_idx
ON tags
USING zombodb (zdb('tags', tags.ctid), zdb(tags))
WITH (url='http://localhost:9200/');

As a result of this query, an index is created and the data immediately goes to elasticsearch.
A query that finds the words mom and dad :


SELECT *
FROM tags
WHERE zdb('tags', ctid) ==> 'word:(мама,папа)';

Where word is the name of the field that will be searched. The search is implemented using the ==> operator .


ZomboDb also provides phrase and fulltext domains based on the text type . Using custom domains, you can define mapping for elasticsearch.


Query Language


Using queries, you can search for individual fields of an indexed table, as well as for all fields.
Queries support logical operations ( and , or , not ), brackets.
It is possible to use various search operators. For example, request


SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> 'text:папа';

where the operator is a colon, will return texts containing the word dad . More like this and fuzzy like this
operations are also supported through the operators: @ and: @ ~ respectively. Example:


SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> '(text:@папа and title:@мама) or text:тетя';

Also, there is support for comparison operators:


SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> 'comments > 10';

Detailed description of the query language in the documentation .


conclusions


A project is a good product that works out of the box. It is well documented, updated (the latest version of Postgres is supported, the last commit at the time of writing of the article 27 days ago). If it works well and stably in production, I will write a wrapper for sqlalchemy.


Thanks for attention!


UPD whipped up sqlalchemy extension


Also popular now: