Another new feature of pg_filedump: restoring the PostgreSQL directory
In a previous article, we learned how to use pg_filedump utility to recover data, or at least some of it, from a completely killed PostgreSQL database. It was assumed that from somewhere we know the numbers of segments corresponding to the table. If we know some of the contents of the table, its segments are really not difficult to find, for example, with a simple grep. However, in a more general case, this is not so easy to do. In addition, it was assumed that we know the exact layout of the tables, which is also far from a fact. So, recently, my colleagues and I made a new patch for pg_filedump, which allows us to solve these problems.
So, let's say we want to restore a table called test. If we don’t remember the name of the table, it’s not scary, so using the method described below you can get the names of all tables in the database. Information about the tables is stored in the pg_class catalog table, the segment of which is always numbered 1259.
Using the latest version of pg_filedump, we can read pg_class as follows:
./pg_filedump -D name,oid,oid,oid,oid,oid,oid,~ /path/to/base/16384/1259 | grep COPY | grep test
Pay attention to the list of types for decoding, which we pass to pg_filedump:
name,oid,oid,oid,oid,oid,oid,~
Here at the beginning we pass the type names of the first seven columns of the table (the pg_class scheme is known and described in the documentation ), and the tilde says to ignore the remaining columns. In this case, they are still not interesting to us; there is no need to list them all.
Output Example:
COPY: test 220016387010016385COPY: test 220016387010016385COPY: test_pkey 2200001040316391
The last column is relfilenode, i.e. the segment number. We need him! Remember, 16385.
But wait, we don’t know the table layout. The pg_attribute catalog table will help us find it, the relfilenode of which is hardcoded and equal to 1249. By the way, you can look at the relfilenode of all catalog tables in the pg_class.h file .
Open the dock by pg_attribute , decode:
./pg_filedump -D oid,name,oid,int,smallint,~ /path/to/base/16384/1249 | grep COPY | grep 16385
Output Example:
COPY: 16385 k 23-14COPY: 16385 v 25-1-1COPY: 16385 ctid 2706COPY: 16385 xmin 2804COPY: 16385 cmin 2904COPY: 16385 xmax 2804COPY: 16385 cmax 2904COPY: 16385 tableoid 2604
As you can see, the table has two columns with the names k and v (the rest are system columns, they are needed for MVCC to work, and that's all). Here 23 and 25 are atttypid, that is, column types. But how to understand what these types are?
The answer is contained in the pg_type catalog table (relfilenode = 1247, dock ):
./pg_filedump -i -D name,~ /path/to/base/16384/1247 | grep -A5 -E 'OID: (23|25)'
Output Example:
XMIN: 1 XMAX: 0 CID|XVAC: 0OID: 23
Block Id: 0 linp Index: 8 Attributes: 30 Size: 32
infomask: 0x0909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID)
t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x07COPY: int4--
XMIN: 1 XMAX: 0 CID|XVAC: 0OID: 25
Block Id: 0 linp Index: 10 Attributes: 30 Size: 32
infomask: 0x0909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID)
t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x07COPY: text
So, now we have all the necessary information on hand. The table is called test, has relfilenode 16385 and contains two columns - k with type int4 and v with type text. Now we can dump its contents, as described in the previous article .
I hope that in practice you will never need this knowledge :) If you have questions or additions, I will be glad to see them in the comments!