plumqqz February 16, 2016 at 15:50

Using PostgreSQL Functions as Parameterized Views

In daily work, the task often arises of clearly and simply referring to large lists of columns and expressions in a sample, and / or dealing with bulky and unclear conditions in a sentence where. Usually, representations are used for these purposes, which is quite convenient and clear. You can compare the query:

select v.* from v_active_user vau, v_detailized_user v where v.id=vau.id

which is clearly understood as “we take active users and obtain detailed information on them” and the same request, but, so to speak, in expanded form:

select 
 u.first_name, u.last_name, u.patronymic_name,
 format('%s %s %s', u.first_name, u.last_name, u.patronymic_name) as fio,
 birth_date,
 (extract(days from now() - birth_date)/365.25)::int as full_years_count,
 hire_date,
 (current_date - hire_date) as work_time,
 (select string_agg(format('%s %s', ch.first_name, ch.last_name),', ') 
    from children ch 
   where array[u.id] <@ array[ch.mother_id, ch.father_id]) as children,
 dep.id as dep_id,
 coalesce(dep.name, get_localized_message('Внештатный сотрудник - неприменимо')) as dep_name
 ....
 .... as last_vaction_date,
 .... as salary_history
 from usr u join dep on u.dep_id=dep.id
 ....
 where u.state='active' and not exists(select * from pending_fires ...)
 and not exists(select * from usr_vacation uv where ...)
 and exists(...)
 and col123<>col321
 ...

Requests of this kind - with a large list of received columns and expressions based on them, with difficult conditions and which in real life are often aggravated by historical strata - are often completely unreadable and incomprehensible. Perhaps it is worth noting that the very change in the concept of “active” (for example, remove or add remote workers or employees on maternity leave, etc.) may not be a nontrivial, but very tedious task; and it’s unlikely to affect the number of errors quite favorably; and changing the list of columns or just expressions has similar consequences. Perhaps we can say that if the expression is select * from tablestrictly unacceptable for tables , then it is probably even preferable for representations of this kind. Well for some, at least.

Let's consider another problem. Suppose we have a simple user table:

create table usr(
 id serial primary key,
 name text,
 added timestamptz
)

and friends table:

create table friend(
 usr_id int references usr(id),
 friend_usr_id int references usr(id), 
 primary key(usr_id, friend_usr_id)
)

Required:
Get a specific user with a list of friends.

The solution is trivial:

select u.id, array(select f.friend_usr_id from friend f where f.usr_id=u.id) as friends
from usr u

Since this operation is required quite often, we create a view for it:

create view v_usr_with_friends as
select u.id, array(select f.friend_usr_id from friend f where f.usr_id=u.id) as friends
from usr u

Everything is fine, but a new requirement has appeared: to get a user with a list of friends who are simultaneously friends of another user (for example, viewing):

select u.id, 
       array(select f.friend_usr_id 
               from friend f 
              where f.usr_id=u.id
                and exists(select * from usr_friend f2 where f2.usr_id=another_usr_id and f2.friend_usr_id=f.friend_usr_id) 
              ) as friends
from usr u

Unfortunately, it is impossible to create a view based on this request - it is impossible to pass the identifier of the second user as a parameter; but it is possible to get around this limitation using the Cartesian product:

create or replace view usr_with_common_friends as
select u2.id as another_usr_id,
       u.id, 
       array(select f.friend_usr_id 
               from friend f 
              where f.usr_id=u.id
                and exists(select * from friend f2 where f2.usr_id=u2.id and f2.friend_usr_id=f.friend_usr_id) 
              ) as friends
from usr u, usr u2

Using the resulting view is completely natural:

select * from usr_with_common_friends where id=1 and another_usr_id=2

There is a new requirement: it is required to receive not just mutual friends, but common friends who have registered in the indicated period of time. Since it is not possible to create a table with all possible time intervals, you will have to create a function:

create or replace function usr_with_common_friends_created_at_tr(tr tstzrange)
 returns table(another_usr_id int, id int, friends int[]) as
 $code$
    select u2.id as another_usr_id,
           u.id, 
           array(select f.friend_usr_id 
                   from friend f, usr u3
                  where f.usr_id=u.id
                    and exists(select * from friend f2 where f2.usr_id=u2.id and f2.friend_usr_id=f.friend_usr_id)
                    and u3.id=f.friend_usr_id
                    and u3.added <@ tr
                  ) as friends
    from usr u, usr u2;   
 $code$
 language sql
**stable**

Use is also quite convenient:

select * from usr_with_common_friends_created_at_tr(tstzrange(now() - make_interval(years:=1), now())) where id=1 and another_usr_id=2

It would seem that a query using this function will work uncomplicatedly - first, the function will return all possible rows, and then they will be filtered by condition. Let's get a look:

explain
 select * from usr_with_common_friends_created_at_tr(tstzrange(now() - make_interval(years:=1), now())) where id=1 and another_usr_id=2

Plan:

QUERY PLAN
Nested Loop  (cost=0.30..67.17 rows=1 width=8)
  ->  Index Only Scan using usr_pkey on usr u  (cost=0.15..8.17 rows=1 width=4)
        Index Cond: (id = 1)
  ->  Index Only Scan using usr_pkey on usr u2  (cost=0.15..8.17 rows=1 width=4)
        Index Cond: (id = 2)
  SubPlan 1
    ->  Nested Loop  (cost=19.44..50.82 rows=1 width=4)
          ->  Hash Join  (cost=19.29..30.01 rows=6 width=8)
                Hash Cond: (f.friend_usr_id = f2.friend_usr_id)
                ->  Bitmap Heap Scan on friend f  (cost=4.24..14.91 rows=11 width=4)
                      Recheck Cond: (usr_id = u.id)
                      ->  Bitmap Index Scan on friend_pkey  (cost=0.00..4.24 rows=11 width=0)
                            Index Cond: (usr_id = u.id)
                ->  Hash  (cost=14.91..14.91 rows=11 width=4)
                      ->  Bitmap Heap Scan on friend f2  (cost=4.24..14.91 rows=11 width=4)
                            Recheck Cond: (usr_id = u2.id)
                            ->  Bitmap Index Scan on friend_pkey  (cost=0.00..4.24 rows=11 width=0)
                                  Index Cond: (usr_id = u2.id)
          ->  Index Scan using usr_pkey on usr u3  (cost=0.15..3.46 rows=1 width=4)
                Index Cond: (id = f.friend_usr_id)
                Filter: (added <@ tstzrange((now() - '1 year'::interval), now()))

Surprisingly, this is not so - the server managed to deploy the function directly to the request body. Yes, in some cases Postgresql can embed a function body directly in a query.
In what cases does this happen?

Scalar Functions:

The function is implemented in SQL ( LANGUAGE SQL) as a simple select, returning a scalar type
Function marked as immutableorstable
Function does not contain subqueries
Function not marked as security definer
The function has no specific set(n., set enable_seqscan=offEtc.)
The function returns only one column.
The return type must match the type of the function.
And a number of restrictions (see the list below for a complete list)

This can be useful for encapsulating simple, but cumbersome logic, for example:

create or replace function is_system_catalog_table_name(r anyelement) returns boolean as
$code$
  select substring(r.relname from 1 for 3)='pg_' -- выражение, конечно, может быть куда более замысловатым
$code$
immutable
language sql

Inquiry:

explain analyze
  select * from pg_class pc where is_system_catalog_table_name(pc)

Plan:

Seq Scan on pg_class pc  (cost=0.00..6955.59 rows=827 width=201)
  Filter: ("substring"((relname)::text, 1, 3) = 'pg_'::text)

As you can see, there is no function call here - the function code is inserted directly into the request body. This function can be considered as a kind of macro.

At the same time, I would like to draw attention to the compact syntax of the function call record - the record is immediately passed as a parameter, moreover, it is accepted not as a strictly defined type ( pg_classin this case), but as an arbitrary type with a column relname.

For instance:

create table t(id serial, relname text);
insert into t(relname) select relname from pg_class limit 100;
explain
  select * from t pc where is_system_catalog_table_name(pc)
Seq Scan on t pc  (cost=0.00..2.50 rows=1 width=25)
  Filter: ("substring"(relname, 1, 3) = 'pg_'::text)

Table functions have similar, but much milder restrictions:

The function is implemented in SQL ( LANGUAGE SQL)
Function immutableorstable
Function not security definer
Function not strict
No specific set
The body of the function contains a single select(and only select, insert/ update/ are deletenot allowed)
The return column types must match the types in the function declaration.
And a number of rather specific limitations

Thus, embedding the function body directly in the request implemented in Postgres makes it possible to efficiently implement the “representation with parameters” design, which is absent in the standard, but nevertheless demanded.

Interestingly, DB2 and SQL Server also use functions that are embedded in the query to solve the "presentation with parameters" task.

Links:
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions
https://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.sql.ref.doc/doc /r0000935.html

Tags:

Using PostgreSQL Functions as Parameterized Views

Also popular now: