Comparing Postgres Schemas and Related View Issues

Original author: Greg Sabino Mullane
  • Transfer
Comparing the schemas of two or more different Postgres databases is a common task, but it can become more tricky if these databases run on different versions of Postgres. A quick and canonical way to compare schemes is to use the same pg_dump program to interact with each database with the --schema-only parameter. This method works great, but there are some pitfalls, especially when copying views .

image
( Photo taken by Philippe Vieux-Jeanton )

Premise


Let's start with some assumptions about how this problem was discovered. We have a copy that is in the process of updating the version of Postgres from 9.2 to 9.6 (the latest version at the time of writing). Using pg_upgrade was not possible, since it was planned not only to include checksums of data , but also to change the encoding to UTF-8. A number of factors, especially the encoding change, meant that the typical update process is pg_dump old_database | psql new_database is not possible. Thus, we have a very specific program that accurately migrates parts of the data, performing actions on them along the way.

Problem


As a final assessment of sanity, we wanted to make sure that the final schema of the database updated to version 9.6 is as identical as possible to the current schema of the product database of version 9.2. When comparing the output of pg_dump, we quickly discovered a problem with the way the views were displayed. Version 9.2 uses a very meager, single-line output, while version 9.6 uses a multi-line “beautifully drawn” variation. Needless to say, this meant that none of the views matched when comparing the output of pg_dump.

The problem lies in the pg_get_viewdef () system functionwhich is used by pg_dump to return the human-readable and Postgres-recognizable version of the view. To demonstrate the problem and solution, create a simple presentation on both bases, and then compare them using pg_dump:

$ psql -p 5920 vtest -c \
'create view gregtest as select count(*) from pg_class where reltuples = 0'
CREATE VIEW
$ psql -p 5960 vtest -c \
'create view gregtest as select count(*) from pg_class where reltuples = 0'
CREATE VIEW
$ diff -u <(pg_dump vtest -x -p 5920 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only)
--- /dev/fd/70          2016-09-29 12:34:56.019700912 -0400
+++ /dev/fd/72          2016-09-29 12:34:56.019720902 -0400
@@ -2,7 +2,7 @@
 -- PostgreSQL database dump
 --
--- Dumped from database version 9.2.18
+-- Dumped from database version 9.6.0
 -- Dumped by pg_dump version 9.6.0
 SET statement_timeout = 0;
@@ -35,22 +35,14 @@
 --
 CREATE VIEW gregtest AS
-SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision);
+ SELECT count(*) AS count
+   FROM pg_class
+  WHERE (pg_class.reltuples = (0)::double precision);

The only difference besides the server version is the view, which does not match at all, which is why the diff utility is concerned. (For the purposes of this article, all secondary lines are removed from the output).

As mentioned earlier, the culprit is the pg_get_viewdef () function. His job is to present the filling of the presentation in an adequate, readable way. There are two main changes that she makes with this conclusion: adding brackets and adding indentation with spaces. In recent versions, despite the fact that the documents hinted, indentation (beautiful output) cannot be turned off, which means that there is no easy way to force a server with version 9.6 to give the difference in views in one line, as the server with version 9.2 does by default. Moreover, there are five versions of the pg_get_viewdef function, each of which takes different arguments:

  1. view name
  2. view name and boolean argument
  3. Oid
  4. OID and boolean argument
  5. OID and integer argument

In Postgres version 9.2, the pg_get_viewdef (text, boolean) version will turn on and off the indentation; moreover, you can see that indentation is not added by default:

$ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest')"
 SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision);
$ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest',false)"
 SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision);
$ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest',true)"
 SELECT count(*) AS count                        +
   FROM pg_class                                 +
  WHERE pg_class.reltuples = 0::double precision;

In Postgres version 9.6, however, you always come across a “pretty” display, no matter which of the five versions of the function you choose and what arguments you pass them! Here is the call to the same function as in the example above on version 9.6:

$ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest')"
  SELECT count(*) AS count
   FROM pg_class
  WHERE (pg_class.reltuples = (0)::double precision);
$ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest',false)"
  SELECT count(*) AS count
   FROM pg_class
  WHERE (pg_class.reltuples = (0)::double precision);
$ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest',true)"
  SELECT count(*) AS count
   FROM pg_class
  WHERE pg_class.reltuples = 0::double precision;

Solutions


When I first encountered this problem, three solutions surfaced in my mind:

  1. Write a script that will transform and normalize the output of the circuit
  2. Change Postgres source code to change pg_get_viewdef behavior
  3. Get pg_dump's call to pg_get_viewdef function in such a way as to get identical output

Initially, I thought that a quick Perl script would be the easiest way. And by the time I got one working version of the script, it was a lot of pain to turn the output from “beautiful” to “ugly”, especially spaces and the use of brackets. The brute force approach, by simply removing all parentheses and square brackets, extra spaces from rules and definitions of representations, almost worked, but the conclusion was pretty ugly hard to read, and there were still problems with extra spaces.

Approach number two, modifying the Postgres source code, is actually quite simple. At some point, the source code was changed in such a way that the insertion of the gaps was forced into the “on” state. Changing a single character in src / backend / utils / adt / ruleutils.c everything decided:

- #define PRETTYFLAG_INDENT    2
+ #define PRETTYFLAG_INDENT    0

Although this solution fixed the indentation and white space problem, the brackets are still different and this is not so easy to solve. In general, not the best solution.

The third approach was to change the source code of pg_dump. In particular, it uses the pg_get_viewdef (oid) function format. By changing this format to pg_get_viewdef (oid, integer), the function format and inputting an argument of 0, both version 9.2 and version 9.5 output the same thing:

$ psql vtest -p 5920 -tc "select pg_get_viewdef('gregtest'::regclass, 0)"
  SELECT count(*) AS count                        +
    FROM pg_class                                 +
   WHERE pg_class.reltuples > 0::double precision;
$ psql vtest -p 5960 -tc "select pg_get_viewdef('gregtest'::regclass, 0)"
  SELECT count(*) AS count                        +
    FROM pg_class                                 +
   WHERE pg_class.reltuples > 0::double precision;

This modified version will reproduce the same pattern in our test database:

$ diff -u <(pg_dump vtest -x -p 5920 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only)
--- /dev/fd/80               2016-09-29 12:34:56.019801980 -0400
+++ /dev/fd/88               2016-09-29 12:34:56.019881988 -0400
@@ -2,7 +2,7 @@
 -- PostgreSQL database dump
 --
--- Dumped from database version 9.2.18
+-- Dumped from database version 9.6.0
 -- Dumped by pg_dump version 9.6.0
 SET statement_timeout = 0;

The best solution, according to my colleague David Christensen, is simply to make Postgres do all the heavy work by itself using the import / export magic. By the end of the day, pg_dump's output is not only human-readable, but also designed so that Postgres can recognize it. Thus, we can feed the old version 9.2 schema to the temporary base of version 9.6, then turn around and copy it. As a result, we have identical calls to pg_get_viewdef () for both schemes. Here it is on our test bases:

$ createdb -p 5960 vtest92
$ pg_dump vtest -p 5920 | psql -q -p 5960 vtest92
$ diff -s -u <(pg_dump vtest92 -x -p 5960 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only)
Files /dev/fd/63 and /dev/fd/62 are identical

Output


Trying to compare the schemes of different versions can be quite difficult, so it’s better not to try. Copying and restoring schemes is a cheap operation, so just copy both schemes to one server, and then make a comparison.

Also popular now: