
What's New in PostgreSQL 11: Embedded Web Search

Continuing the topic of interesting features of the upcoming PostgreSQL 11 release, I would like to talk about the new built-in function websearch_to_tsquery . The corresponding patch was developed by Victor Drobny and Dmitry Ivanov, with edits from Fedor Sigaev. Let's see what is implemented in this patch.
It would seem that full-text search has been in PostgreSQL for a long time, and it works very well. What else could be added?
Imagine that you are making an online store based on PostgreSQL and you need to search by product. Here you have a form with a search query. To search the database from this query, you need to somehow build a tsvector. You can do this using the to_tsquery function. But to_tsquery expects the string to be in a specific format: In other words, in this case, you have to write a function that converts the user's query into a query that is understandable by to_tsquery. Inconveniently. In part, the plainto_tsquery and phraseto_tsquery functions solve this problem: But there is a problem with them. The fact is that the user can intuitively use quotes or, say, some kind of Boolean operators, because it works for Google, Yandex and other search engines. Let's see what happens in this case:
=# select to_tsquery('foo bar baz');
ERROR: syntax error in tsquery: "foo bar baz"
=# select to_tsquery('foo & bar & baz');
to_tsquery
-----------------------
'foo' & 'bar' & 'baz'
=# select plainto_tsquery('foo bar baz');
plainto_tsquery
-----------------------
'foo' & 'bar' & 'baz'
=# select phraseto_tsquery('foo bar baz');
phraseto_tsquery
---------------------------
'foo' <-> 'bar' <-> 'baz'
=# select plainto_tsquery('"foo bar" -baz or qux');
plainto_tsquery
-------------------------------
'foo' & 'bar' & 'baz' & 'qux'
Everything is broken! Oh. Is it really necessary to write your own parser?
So that it does not have to be written from scratch for every application, starting with PostgreSQL 11, the corresponding parser will now be directly in the DBMS: Besides the fact that websearch_to_tsquery understands quotes, minus signs and Boolean operators, it is interesting in that it ignores any attempts to make a syntax error. That is, you will never get an error, there will always be some kind of tsquery in the output: Another of the features of the function is the fact that it ignores any brackets. That is, it’s not like this:
=# select websearch_to_tsquery('"foo bar" -baz or qux');
websearch_to_tsquery
----------------------------------
'foo' <-> 'bar' & !'baz' | 'qux'
=# select websearch_to_tsquery('-"foo bar" ((( baz or or qux !@#$%^&*_+-=');
websearch_to_tsquery
--------------------------------------
!( 'foo' <-> 'bar' ) & 'baz' | 'qux'
=# select websearch_to_tsquery('foo and (bar or baz)');
websearch_to_tsquery
-----------------------
'foo' & 'bar' | 'baz'
This behavior was chosen for the reason that normal people (not IT people: trollface :) in practice do not use brackets anyway. Just ignoring them, we greatly simplify the implementation of the feature “make sure that the function never ends with an error”, and parsing the request will work faster. Perhaps a flag that includes support for parentheses will appear in future versions.
Such a simple but useful feature. At a minimum, it makes full-text search in PostgreSQL no worse than it was before, and allows developers to simplify the code of their applications.