zzashpaupat August 6, 2015 at 13:52

PostgreSQL 9.5: what's new? Part 1. INSERT ... ON CONFLICT DO NOTHING / UPDATE and ROW LEVEL SECURITY

From the sandbox

Part 2. TABLESAMPLE
Part 3. GROUPING SETS, CUBE, ROLLUP PostgreSQL 9.5 is expected to be released
in the 4th quarter of 2015 . As always, the new version, ~~in addition to new bugs,~~ brings new features and "buns". In this article, two of them will be considered, namely INSERT ... ON CONFLICT DO NOTHING / UPDATE and Row-level security. The second alpha version has already been released, so the most impatient ones can install it and try new functionality.
Download here

INSERT ... ON CONFLICT DO NOTHING / UPDATE
Row-level security

INSERT ... ON CONFLICT DO NOTHING / UPDATE

He is colloquially UPSERT. Allows in case of a conflict during insertion to update the fields or ignore the error.

What was previously proposed to be implemented using the stored function will now be available out of the box. You can use the ON CONFLICT DO NOTHING / UPDATE clause in an INSERT statement . In this case, the expression indicates separately conflict_target (by what field / condition the conflict will be considered) and conflict_action (what to do when the conflict occurred: DO NOTHING or DO UPDATE SET ).

The full syntax for an INSERT statement would be :

[ WITH [ RECURSIVE ] with_query [, ...] ]
INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ]
    { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
    [ ON CONFLICT [ conflict_target ] conflict_action ]
    [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
where conflict_target can be one of:
    ( { column_name_index | ( expression_index ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]
    ON CONSTRAINT constraint_name
and conflict_action is one of:
    DO NOTHING
    DO UPDATE SET { column_name = { expression | DEFAULT } |
                    ( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) |
                    ( column_name [, ...] ) = ( sub-SELECT )
                  } [, ...]
              [ WHERE condition ]

For us, the fun begins after ON CONFLICT.

Let's look at some examples. Let's create a table in which credentials of certain persons will lie:

CREATE TABLE account
(
  id bigserial,
  name varchar,
  surname varchar,
  address varchar,
  PRIMARY KEY (id),
  CONSTRAINT unique_person UNIQUE (name, surname, address)
);
Query returned successfully with no result in 31 ms.

Run the insert request

INSERT INTO account (id, name, surname, address)
VALUES (1, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT (id) DO NOTHING;
Query returned successfully: one row affected, 12 ms execution time.
SELECT * FROM ACCOUNT;

id	name	surname	address
1	Vasya	Pupkin	Moscow Kremlin

Here, conflict_target is (id) and conflict_action is DO NOTHING .
If you try to execute this request a second time, then the insertion will not occur, and it will not give any error message:

Query returned successfully: 0 rows affected, 12 ms execution time.

If we did not specify ON CONFLICT (id) DO NOTHING , we would get an error:

INSERT INTO account (id, name, surname, address)
VALUES (1, 'Вася', 'Пупкин', 'Москва, Кремль');
********** Error **********
ERROR: duplicate key value violates unique constraint "account_pkey"
SQL state: 23505
Detail: Key (id)=(1) already exists.

The same behavior (as with ON CONFLICT (id) DO NOTHING ) will be in the request:

INSERT INTO account (id, name, surname, address)
VALUES (DEFAULT, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT (name, surname, address) DO NOTHING;
Query returned successfully: 0 rows affected, 12 ms execution time.

In it, we already take the default id value (from the sequence), but specify another conflict_target - in three fields that are uniquely constrained.

As mentioned above, you can also specify conflict_target using the ON CONSTRAINT construct , specifying the constraint name directly:

INSERT INTO account (id, name, surname, address)
VALUES (DEFAULT, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT ON CONSTRAINT unique_person DO NOTHING;
Query returned successfully: 0 rows affected, 11 ms execution time.

This is especially useful if you have an exclusion constraint, which you can refer to only by name and not by a set of columns, as is the case with the uniqueness constraint.

If you have built a partial unique index, then this can also be specified in the condition. Suppose that only people with the name Vasya will have unique combinations of surname + address in our table:

ALTER TABLE account DROP CONSTRAINT unique_person; 
CREATE UNIQUE INDEX unique_vasya ON account (surname, address) WHERE name='Вася';

Then we can write a query like this:

INSERT INTO account (id, name, surname, address)
VALUES (DEFAULT, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT (surname, address) WHERE name='Вася' DO NOTHING;
Query returned successfully: 0 rows affected, 12 ms execution time.

And finally, if you want DO NOTHING to be triggered when any uniqueness / exception conflict occurs during insertion, then this can be written as follows:

INSERT INTO account (id, name, surname, address)
VALUES (DEFAULT, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT DO NOTHING;
Query returned successfully: 0 rows affected, 12 ms execution time.

It is worth noting that it is impossible to set multiple conflict_action , so if one of them is specified, and the other works, then there will be an error when inserting:

INSERT INTO account (id, name, surname, address)
VALUES (DEFAULT, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT (id) DO NOTHING;
********** Error **********
ERROR: duplicate key value violates unique constraint "unique_person"
SQL state: 23505
Detail: Key (name, surname, address)=(Вася, Пупкин, Москва, Кремль) already exists.

Let's move on to the DO UPDATE SET features .

To DO UPDATE SET unlike DO NOTHING indication conflict_action necessary.

The DO UPDATE SET construct updates the fields that are specified in it. The values of these fields can be specified explicitly, set by default, obtained from a subquery, or taken from a special EXCLUDED expression , from which you can take data that was originally proposed for insertion.

INSERT INTO account (id, name, surname, address)
VALUES (1, 'Петя', 'Петров', 'Москва, Кремль')
ON CONFLICT (id)
DO UPDATE SET
name='Петя',
surname='Петров';
Query returned successfully: one row affected, 11 ms execution time.
SELECT * FROM ACCOUNT;

id	name	surname	address
1	Petya	Petrov	Moscow Kremlin

INSERT INTO account AS a (id, name, surname, address)
VALUES (1, 'Петя', 'Петров', 'Москва, Кремль')
ON CONFLICT (id)
DO UPDATE SET
name=EXCLUDED.name || ' (бывший ' || a.name || ')',
surname=EXCLUDED.surname || ' (бывший ' || a.surname || ')';
Query returned successfully: one row affected, 13 ms execution time.
SELECT * FROM ACCOUNT;

id	name	surname	address
1	Petya (former Vasya)	Petrov (former Pupkin)	Moscow Kremlin

INSERT INTO account (id, name, surname, address)
VALUES (1, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT (id) DO UPDATE SET
name=DEFAULT,
surname=DEFAULT;
Query returned successfully: one row affected, 11 ms execution time.
SELECT * FROM ACCOUNT;

id	name	surname	address
1	Null	Null	Moscow Kremlin

INSERT INTO account (id, name, surname, address)
VALUES (1, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT (id) DO UPDATE SET
name=(SELECT some_field FROM other_table LIMIT 1);

The WHERE clause can also be used . For example, we want the name field not to be updated if the text “Kremlin” already exists in the address field in the table row, otherwise it was updated:

INSERT INTO account AS a (id, name, surname, address)
VALUES (1, 'Вася', 'Пупкин', 'Москва, Кремль')
ON CONFLICT (id) DO UPDATE SET
name=EXCLUDED.name
WHERE a.name not like '%Кремль%';
Query returned successfully: 0 rows affected, 12 ms execution time.

And if we want the name field not to be updated, if the Kremlin text is in the address field of the inserted data, otherwise it was updated:

INSERT INTO account AS a (id, name, surname, address)
VALUES (1, 'Вася', 'Пупкин', 'Москва, Красная площадь')
ON CONFLICT (id) DO UPDATE SET
name=EXCLUDED.name
WHERE EXCLUDED.name not like '%Кремль%';
Query returned successfully: one row affected, 11 ms execution time.
SELECT * FROM ACCOUNT

id	name	surname	address
1	Vasya	Null	Moscow Kremlin

ROW LEVEL SECURITY

Row-level security or row-level security is a mechanism for delimiting access to database information, which allows users to restrict access to individual rows in tables.

This functionality may be of interest to those who use databases with a large number of users.

It works as follows: the rules for a specific table are described, according to which access to specific rows is limited when certain commands are executed, using the CREATE POLICY expression . Each rule contains a certain logical expression that must be true so that the string is visible in the request. The rules are then activated using the expression ALTER TABLE ... ENABLE ROW LEVEL SECURITY . Then, when trying to access, for example, during a SELECT query, it is checked whether the user has the right to access a specific line and if not, they are not shown to him. The superuser can see all rows by default, since he has the BYPASSRLS flag set by default, which means that for this role checks will not be performed.

The syntax for the CREATE POLICY expression is :

CREATE POLICY name ON table_name
    [ FOR { ALL | SELECT | INSERT | UPDATE | DELETE } ]
    [ TO { role_name | PUBLIC | CURRENT_USER | SESSION_USER } [, ...] ]
    [ USING ( using_expression ) ]
    [ WITH CHECK ( check_expression ) ]

Rules are created for specific tables, so there can be several rules in the database with the same name for different tables.
After the FOR expression, it is indicated for which particular requests the rule is applied, by default - ALL , that is, for all requests.

After TO - for which roles, by default - PUBLIC , that is, for all roles.

Next, a Boolean expression is specified in the USING expression, which must be true so that a particular string is visible to the user in queries that use existing data ( SELECT , UPDATE , DELETE) If the boolean expression returns null or false, then the line will not be visible.

The WITH CHECK expression specifies a Boolean expression that must be true for a query that adds or modifies data ( INSERT or UPDATE ) to succeed. If the Boolean expression returns null or false, then there will be an error. The WITH CHECK expression is executed after BEFORE triggers (if present) and before any other checks. Therefore, if the BEFORE trigger modifies the string so that the condition does not return true, there will be an error. For successful UPDATE, it is necessary that both conditions return true, including if the requestINSERT ... ON CONFILCT DO UPDATE A conflict will occur and the request will try to modify the data. If the WITH CHECK expression is omitted, the condition from the USING expression will be substituted for it .
Under conditions, aggregate or window functions cannot be used.

Usually, you need to control access based on which database user is requesting data, therefore, functions that return information about the system ( System Information Functions ) are useful to us .

Let's move on to the examples:

Add the db_user field to the account table , fill this field for an existing record, and add new records:

ALTER TABLE account ADD COLUMN db_user varchar;
Query returned successfully with no result in 16 ms.
UPDATE account SET db_user='pupkin' WHERE surname='Пупкин';
INSERT INTO account (name, surname, address, db_user)
VALUES ('Петр', 'Петров', 'Москва, Красная площадь', 'petrov'), 
('Иван', 'Сидоров', 'Санкт-Петербург, Зимний дворец', 'sidorov');
Query returned successfully: 2 rows affected, 31 ms execution time.

Create the roles:

CREATE ROLE pupkin WITH LOGIN PASSWORD 'pupkin';
CREATE ROLE petrov WITH LOGIN PASSWORD 'petrov';
Query returned successfully with no result in 31 ms.

Create a rule and enable RLS on the table:

CREATE POLICY select_self ON account
FOR SELECT
USING (db_user=current_user);
ALTER TABLE account ENABLE ROW LEVEL SECURITY;
Query returned successfully with no result in 12 ms.

In this query, we created a rule according to which, in the SELECT query, the user will see only those rows in which the value of the db_user field matches the name of the current database user.

We execute the request from the postgres user:

SELECT * FROM account

id	name	surname	address	db_user
1	Vasya	Pupkin	Moscow Kremlin	pupkin
5	Peter	Petrov	Moscow Red Square	petrov
6	Ivan	Sidorov	St. Petersburg, Winter Palace	sidorov

We execute the same request from pupkin:

id	name	surname	address	db_user
1	Vasya	Pupkin	Moscow Kremlin	pupkin

Let's create a rule according to which only pupkin user can insert lines with the last name “Pupkin”:

CREATE POLICY insert_update_pupkin ON account
WITH CHECK (surname<>'Пупкин' OR current_user='pupkin')

Let's try to execute a request from pupkin:

INSERT INTO account (name, surname, address)
VALUES ('Дмитрий', 'Пупкин', 'Киев, Майдан')
Query returned successfully: one row affected, 13 ms execution time.

Check:

select * from account;

id	name	surname	address	db_user
1	Vasya	Pupkin	Moscow Kremlin	pupkin

Oops! We forgot to specify the db_user field and we will not see the record that we inserted. Well, let's fix this logic with a trigger in which we will fill the db_user field with the name of the current user:

CREATE OR REPLACE FUNCTION fill_db_user() RETURNS TRIGGER AS 
$BODY$
BEGIN
  NEW.db_user = current_user;
  RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
CREATE TRIGGER fill_db_user BEFORE INSERT ON account
FOR EACH ROW EXECUTE PROCEDURE fill_db_user();

Try again:

INSERT INTO account (name, surname, address)
VALUES ('Иван', 'Пупкин', 'Киев, Майдан');
select * from account;

id	name	surname	address	db_user
1	Vasya	Pupkin	Moscow Kremlin	pupkin
21	Ivan	Pupkin	Kiev, Maidan	pupkin

Let's try to change the data about Ivan Pupkin by user petrov:

UPDATE account SET db_user='petrov'
WHERE id=21
Query returned successfully: 0 rows affected, 13 ms execution time.

As you can see, the data has not changed, this happened because the USING clause from the select_self rule was not fulfilled.

If several rules correspond to one request, then they are combined via OR .

It is worth noting that the rules only apply to explicit queries to tables and do not apply to checks performed by the system (constaints, foreign keys, etc.). This means that the user, using queries, determine that a value exists in the database. For example, if a user can insert into a table that refers to another table from which he cannot make SELECT. In this case, he can try to make INSERT in the first table and, based on the result (an insert occurred or an error occurred while checking the referential integrity), determine whether a value exists in the second table.

There are many options for using row-level security:

several applications with different functionality use the same database
multiple instances of the same application with different rights
access by role or user group
etc.

In the following parts I plan to consider such new features of PostgreSQL 9.5 as:

Part 2. TABLESAMPLE
SKIP LOCKED
BRIN indexes
GROUPING SETS, CUBE, ROLLUP
New features for JSONB
IMPORT FOREIGN SCHEMA
other

Tags:

PostgreSQL 9.5: what's new? Part 1. INSERT ... ON CONFLICT DO NOTHING / UPDATE and ROW LEVEL SECURITY

INSERT ... ON CONFLICT DO NOTHING / UPDATE

ROW LEVEL SECURITY

Also popular now: