
MVCC-2. Layers, Files, Pages
Last time we talked about data consistency, looked at how the different transaction isolation levels appear from the user's point of view, and saw why that matters. Now we begin to explore how PostgreSQL implements snapshot-based isolation and multiversioning.
In this article we will look at how data is physically laid out in files and pages. This takes us away from the topic of isolation, but the digression is necessary for understanding the material that follows: we need to know how low-level data storage works.
Relations
If you look inside tables and indexes, it turns out that they are organized in a similar way. Both are database objects that contain data consisting of rows.
That a table consists of rows is beyond doubt; for an index this is less obvious. Yet imagine a B-tree: it consists of nodes that hold indexed values and references to other nodes or to table rows. These nodes can be regarded as index rows, and in fact that is what they are.
There are actually more objects organized in a similar way: sequences (essentially single-row tables) and materialized views (essentially tables that remember their query). And there are also ordinary views, which do not store data themselves but are in all other respects similar to tables.
In PostgreSQL, all these objects go by the common name relation. The word is rather unfortunate, because it is a term from relational theory. You can draw a parallel between a relation and a table (or view), but certainly not between a relation and an index. Yet so it happened: PostgreSQL's academic roots make themselves felt. It seems that at first only tables and views were meant, and the rest accreted over time.
Further on, for simplicity, we will talk only about tables and indexes, but the other relations are organized in exactly the same way.
Layers (forks) and files
As a rule, each relation has several layers (forks). Layers come in several types, and each of them contains a particular kind of data.
If a layer exists, it is at first represented by a single file. The file name is a numeric identifier, to which a suffix corresponding to the layer name may be appended.
The file gradually grows, and when its size reaches 1 GB, the next file of the same layer is created (such files are sometimes called segments). The segment number is appended to the end of the file name.
The 1 GB file size limit arose historically, to support various file systems, some of which cannot handle large files. The limit can be changed when building PostgreSQL (
./configure --with-segsize
). Thus, a single relation can be represented on disk by several files. For example, a small table will have three of them.
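To illustrate the segment naming, here is a small Python sketch (not PostgreSQL code) that computes which file of the main layer holds a given block, assuming the default 8 KB pages and 1 GB segments:

```python
# Sketch: map a relation block number to its segment file name,
# assuming the default 8 KB block size and 1 GB segment size.
BLOCK_SIZE = 8192                          # bytes per page
SEGMENT_BLOCKS = (1 << 30) // BLOCK_SIZE   # 131072 blocks per 1 GB segment

def segment_file(relfilenode: int, block: int) -> str:
    """Return the file name (relative to the database directory)
    that holds the given block of the relation's main layer."""
    segno = block // SEGMENT_BLOCKS
    # Segment 0 has no suffix; later segments append ".1", ".2", ...
    return str(relfilenode) if segno == 0 else f"{relfilenode}.{segno}"

print(segment_file(41496, 0))        # first page -> "41496"
print(segment_file(41496, 200000))   # beyond 1 GB -> "41496.1"
```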
All files of objects belonging to one tablespace and one database are stored in a single directory. Keep this in mind, since file systems usually do not cope well with a large number of files in one directory.
Note also that files, in turn, are divided into pages (or blocks), usually of 8 KB. We will discuss the internal structure of pages below.

Now let's look at the types of layers.
The main layer is the data itself: the table or index rows. The main layer exists for any relation (except views, which contain no data).
The file names in the main layer consist of the numeric identifier alone. Here is an example path to the file of the table we created last time:
=> SELECT pg_relation_filepath('accounts');
pg_relation_filepath
----------------------
base/41493/41496
(1 row)
Where do these identifiers come from? The base directory corresponds to the pg_default tablespace, the next subdirectory corresponds to the database, and the file we are interested in is already in it:
=> SELECT oid FROM pg_database WHERE datname = 'test';
oid
-------
41493
(1 row)
=> SELECT relfilenode FROM pg_class WHERE relname = 'accounts';
relfilenode
-------------
41496
(1 row)
The path is relative: it is resolved from the data directory (PGDATA). Moreover, almost all paths in PostgreSQL are resolved relative to PGDATA. Thanks to this, you can safely move PGDATA to another location; nothing ties it down (although you may need to set the library path in LD_LIBRARY_PATH).
Let's look further in the file system:
postgres$ ls -l --time-style=+ /var/lib/postgresql/11/main/base/41493/41496
-rw------- 1 postgres postgres 8192 /var/lib/postgresql/11/main/base/41493/41496
An initialization layer exists only for unlogged tables (created with UNLOGGED) and their indexes. Such objects are no different from ordinary ones, except that operations on them are not recorded in the write-ahead log (WAL). This makes them faster to work with, but if a failure occurs, the data cannot be restored to a consistent state. Therefore, during recovery PostgreSQL simply removes all layers of such objects and writes the initialization layer in place of the main layer. The result is an empty "stub". We will discuss WAL in detail, but in another series.
The accounts table is logged, so it has no initialization layer. But for the sake of the experiment we can turn logging off:
=> ALTER TABLE accounts SET UNLOGGED;
=> SELECT pg_relation_filepath('accounts');
pg_relation_filepath
----------------------
base/41493/41507
(1 row)
As the example shows, the ability to switch logging on and off on the fly involves rewriting the data into files with different names.
The initialization layer has the same name as the main layer, but with the suffix "_init":
postgres$ ls -l --time-style=+ /var/lib/postgresql/11/main/base/41493/41507_init
-rw------- 1 postgres postgres 0 /var/lib/postgresql/11/main/base/41493/41507_init
The free space map is a layer that keeps track of the free space available inside pages. This space changes constantly: it decreases when new row versions are added and increases during vacuuming. The free space map is used when inserting new row versions, to quickly find a suitable page where the data to be added will fit.
Free space map files have the suffix "_fsm". The file does not appear right away, but only when needed. The easiest way to make it appear is to vacuum the table (we will discuss why in due course):
=> VACUUM accounts;
postgres$ ls -l --time-style=+ /var/lib/postgresql/11/main/base/41493/41507_fsm
-rw------- 1 postgres postgres 24576 /var/lib/postgresql/11/main/base/41493/41507_fsm
The visibility map is a layer in which pages that contain only current row versions are marked with a single bit. Roughly speaking, this means that when a transaction tries to read a row from such a page, the row can be shown without checking its visibility. We will examine in detail how this works in the following articles. Visibility map files have the suffix "_vm":
postgres$ ls -l --time-style=+ /var/lib/postgresql/11/main/base/41493/41507_vm
-rw------- 1 postgres postgres 8192 /var/lib/postgresql/11/main/base/41493/41507_vm
Pages
As we already said, files are logically divided into pages.
Typically a page is 8 KB in size. The size can be changed within certain limits (16 KB or 32 KB), but only at build time (
./configure --with-blocksize
). A built and running instance can work with pages of only one size. Regardless of which layer the files belong to, the server uses them in roughly the same way: pages are first read into the buffer cache, where processes can read and modify them; then, when needed, they are written back to disk.
Each page has internal markup and generally contains the following sections:
       0 +------------------------------------+
         | header                             |
      24 +------------------------------------+
         | array of pointers to row versions  |
   lower +------------------------------------+
         | free space                         |
   upper +------------------------------------+
         | row versions                       |
 special +------------------------------------+
         | special area                       |
pagesize +------------------------------------+
The sizes of these sections are easy to find out using the pageinspect extension:
=> CREATE EXTENSION pageinspect;
=> SELECT lower, upper, special, pagesize FROM page_header(get_raw_page('accounts',0));
lower | upper | special | pagesize
-------+-------+---------+----------
40 | 8016 | 8192 | 8192
(1 row)
Here we are looking at the header of the very first (zero) page of the table. Besides the sizes of the other areas, the header contains other information about the page, but it does not interest us yet.
At the bottom of the page is the special area, empty in our case. It is used only for indexes, and not even for all of them. "Bottom" here matches the picture; it might be more accurate to say "at the higher addresses".
Above the special area are the row versions: the very data we store in the table, plus some overhead information.
At the top of the page, right after the header, is the table of contents: an array of pointers to the row versions present in the page.
Between the row versions and the pointers there may be free space (which is what the free space map tracks). Note that there is no fragmentation inside the page: all free space is always a single contiguous chunk.
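As a sketch of this arithmetic, the free space and the number of row versions on the page can be derived from the header fields shown above. The 24-byte header and 4-byte pointer sizes used here match this layout but are stated as assumptions:

```python
# Sketch: interpret the page_header() fields from the example above.
# Free space inside the page is the single gap between the pointer
# array (which ends at "lower") and the row versions (which start at "upper").
def page_free_space(lower: int, upper: int) -> int:
    return upper - lower

def pointer_count(lower: int, header_size: int = 24, pointer_size: int = 4) -> int:
    # Each line pointer is 4 bytes; the array starts right after the header.
    return (lower - header_size) // pointer_size

print(page_free_space(40, 8016))  # 7976 bytes free
print(pointer_count(40))          # 4 pointers -> 4 row versions on the page
```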
Pointers
Why are pointers to row versions needed? The thing is that index rows must somehow refer to row versions in the table. Clearly, the reference must contain a file number, a page number within the file, and some indication of the row version. An offset from the start of the page could serve as that indication, but it is inconvenient: we would be unable to move a row version within the page, because that would break existing references, which in turn would lead to fragmentation of space inside pages and other unpleasant consequences. Therefore, the index refers to the pointer's number, and the pointer holds the current position of the row version within the page. This is indirect addressing.
Each pointer occupies exactly 4 bytes and contains:
- a reference to the row version;
- the length of that row version;
- several bits that define the status of the row version.
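As an illustration, here is a Python sketch of how three such fields can be packed into one 4-byte word. The field widths mirror PostgreSQL's ItemIdData (a 15-bit offset, 2 status bits, a 15-bit length), though the exact in-memory bit order is compiler- and platform-dependent:

```python
# Sketch: pack a line pointer's three fields into a single 32-bit word,
# mirroring the widths of PostgreSQL's ItemIdData bit fields.
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = 0, 1, 2, 3  # status bit values

def pack_item_id(off: int, flags: int, length: int) -> int:
    assert 0 <= off < 2**15 and 0 <= flags < 4 and 0 <= length < 2**15
    return off | (flags << 15) | (length << 17)

def unpack_item_id(word: int):
    return word & 0x7FFF, (word >> 15) & 0x3, (word >> 17) & 0x7FFF

# A "normal" row version of 121 bytes located at offset 8016 in the page:
word = pack_item_id(8016, LP_NORMAL, 121)
print(unpack_item_id(word))  # (8016, 1, 121)
```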
Data format
The on-disk data format exactly matches the in-memory representation. A page is read into the buffer cache "as is", without any transformation. As a consequence, data files from one platform are incompatible with other platforms.
For example, the x86 architecture uses least-significant-byte-first order (little-endian), z/Architecture uses the reverse order (big-endian), and on ARM the byte order is switchable.
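The difference is easy to see with a short Python sketch serializing the same 4-byte integer in both byte orders:

```python
import struct

# Sketch: the same 4-byte integer laid out in little-endian (x86)
# and big-endian (z/Architecture) byte order.
value = 0x01020304
little = struct.pack('<I', value)  # least significant byte first
big = struct.pack('>I', value)     # most significant byte first
print(little.hex())  # 04030201
print(big.hex())     # 01020304
```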
Many architectures align data on machine word boundaries. For example, on a 32-bit x86 system, integers (the integer type, occupying 4 bytes) are aligned on 4-byte word boundaries, as are double-precision floating-point numbers (the double precision type, 8 bytes). On a 64-bit system, double values are aligned on 8-byte word boundaries. This is another source of incompatibility.
Because of alignment, the size of a table row depends on the order of its fields. Usually the effect is barely noticeable, but in some cases it can lead to a significant increase in size. For example, if fields of type char(1) and integer are interleaved, 3 bytes are usually wasted between them. For more on this, see Nikolai Shaplov's talk "What's inside it".
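A quick Python sketch, using a ctypes structure as a stand-in for row layout, shows the padding hole on a typical platform (the 4-byte int alignment here is an assumption that holds on common x86/ARM systems):

```python
import ctypes

# Sketch: a 1-byte field followed by a 4-byte integer leaves a
# 3-byte alignment hole, just like char(1) before integer in a row.
class CharThenInt(ctypes.Structure):
    _fields_ = [('c', ctypes.c_char),   # occupies byte 0
                ('n', ctypes.c_int)]    # aligned up to offset 4

print(CharThenInt.n.offset)        # 4 -> bytes 1..3 are wasted padding
print(ctypes.sizeof(CharThenInt))  # 8 instead of 5
```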
Row Versions and TOAST
We will talk in detail about how row versions are organized internally next time. For now, the only important thing is that each version must fit entirely on a single page: PostgreSQL has no way to "continue" a row onto the next page. Instead, a technique called TOAST (The Oversized Attributes Storage Technique) is used. The name itself hints that a row can be sliced into toasts.
Joking aside, TOAST comprises several strategies. "Long" attribute values can be moved to a separate service table, after first being sliced into small toast chunks. Another option is to compress a value so that the row version still fits on an ordinary table page. Or both: first compress, then slice and move.
For each main table, when needed, a separate TOAST table is created (together with a special index for it), one table for all attributes. The need is determined by the presence of potentially long attributes in the table. For example, if a table has a column of type numeric or text, the TOAST table is created right away, even if no long values are ever stored.
Since the TOAST table is essentially a regular table, it has the same set of layers. And this doubles the number of files that "serve" the table.
Initially, the strategies are determined by the column data types. You can view them with the
\d+
command in psql, but since it also displays a lot of other information, we will query the system catalog instead:
=> SELECT attname, atttypid::regtype, CASE attstorage
WHEN 'p' THEN 'plain'
WHEN 'e' THEN 'external'
WHEN 'm' THEN 'main'
WHEN 'x' THEN 'extended'
END AS storage
FROM pg_attribute
WHERE attrelid = 'accounts'::regclass AND attnum > 0;
attname | atttypid | storage
---------+----------+----------
id | integer | plain
number | text | extended
client | text | extended
amount | numeric | main
(4 rows)
The strategy names have the following meanings:
- plain - TOAST is not used (applied to data types that are known to be "short", such as integer);
- extended - both compression and storage in a separate TOAST table are allowed;
- external - long values are stored in the TOAST table without compression;
- main - long values are compressed first, and moved to the TOAST table only if compression did not help.
In general terms, the algorithm is as follows. PostgreSQL wants at least 4 rows to fit on a page. Therefore, if the row size exceeds a quarter of the page, header included (2040 bytes for a standard 8 KB page), TOAST must be applied to some of the values. We proceed in the order described below and stop as soon as the row no longer exceeds the threshold:
- First, we go through the attributes with the external and extended strategies, from the longest to the shortest. Extended attributes are compressed (if that helps), and if a value by itself exceeds a quarter of the page, it is moved to the TOAST table immediately. External attributes are handled the same way, but without compression.
- If after the first pass the row version still does not fit, we move the remaining external and extended attributes to the TOAST table.
- If that does not help either, we try compressing the attributes with the main strategy, keeping them in the table page.
- Only if the row is still not short enough after that are the main attributes moved to the TOAST table.
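The pass order above can be sketched in Python. This is a toy model, not the real code: the fixed 2040-byte threshold is taken from the text, while the attribute sizes and the assumption that compression halves a value are purely illustrative:

```python
# Toy sketch of the TOAST pass order described above.
THRESHOLD = 2040  # quarter of a standard 8 KB page, header included

def toast_row(attrs):
    """attrs: dict name -> {'strategy', 'size', 'compressible'}.
    Returns (in-page row size, names moved to the TOAST table)."""
    moved = []
    def row_size():
        return sum(a['size'] for n, a in attrs.items() if n not in moved)

    # Pass 1: external/extended attributes, longest first; compress
    # extended ones, move any value that alone exceeds a quarter page.
    for name, a in sorted(attrs.items(), key=lambda kv: -kv[1]['size']):
        if row_size() <= THRESHOLD:
            break
        if a['strategy'] in ('extended', 'external'):
            if a['strategy'] == 'extended' and a['compressible']:
                a['size'] //= 2  # assume compression halves the value
            if a['size'] > THRESHOLD:
                moved.append(name)
    # Pass 2: move remaining external/extended values out entirely.
    for name, a in attrs.items():
        if row_size() <= THRESHOLD:
            break
        if a['strategy'] in ('extended', 'external') and name not in moved:
            moved.append(name)
    # Pass 3: compress 'main' attributes, keeping them in the page.
    for name, a in attrs.items():
        if row_size() <= THRESHOLD:
            break
        if a['strategy'] == 'main' and a['compressible']:
            a['size'] //= 2
    # Pass 4: as a last resort, move 'main' attributes out too.
    for name, a in attrs.items():
        if row_size() <= THRESHOLD:
            break
        if a['strategy'] == 'main' and name not in moved:
            moved.append(name)
    return row_size(), moved

attrs = {
    'id':     {'strategy': 'plain',    'size': 4,    'compressible': False},
    'client': {'strategy': 'extended', 'size': 3000, 'compressible': False},
}
print(toast_row(attrs))  # (4, ['client'])
```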
Sometimes it may be useful to change the strategy for individual columns. For example, if it is known in advance that the data in a column cannot be compressed, you can set the external strategy for it, saving on futile compression attempts. This is done as follows:
=> ALTER TABLE accounts ALTER COLUMN number SET STORAGE external;
Repeating the query, we get:
attname | atttypid | storage
---------+----------+----------
id | integer | plain
number | text | external
client | text | extended
amount | numeric | main
TOAST tables and indexes live in a separate schema, pg_toast, and are therefore usually not visible. For temporary tables, the pg_toast_temp_N schemas are used, analogous to the usual pg_temp_N.
Of course, nothing prevents us from peeking at the internal mechanics if we wish. Say the accounts table has three potentially long attributes, so it must have a TOAST table. Here it is:
=> SELECT relnamespace::regnamespace, relname
FROM pg_class WHERE oid = (
SELECT reltoastrelid FROM pg_class WHERE relname = 'accounts'
);
relnamespace | relname
--------------+----------------
pg_toast | pg_toast_33953
(1 row)
=> \d+ pg_toast.pg_toast_33953
TOAST table "pg_toast.pg_toast_33953"
Column | Type | Storage
------------+---------+---------
chunk_id | oid | plain
chunk_seq | integer | plain
chunk_data | bytea | plain
It is logical that the plain strategy applies to the toast chunks into which the row is sliced: there is no second-level TOAST.
The TOAST index is hidden more thoroughly, but it is also easy to find:
=> SELECT indexrelid::regclass FROM pg_index
WHERE indrelid = (
SELECT oid FROM pg_class WHERE relname = 'pg_toast_33953'
);
indexrelid
-------------------------------
pg_toast.pg_toast_33953_index
(1 row)
=> \d pg_toast.pg_toast_33953_index
Unlogged index "pg_toast.pg_toast_33953_index"
Column | Type | Key? | Definition
-----------+---------+------+------------
chunk_id | oid | yes | chunk_id
chunk_seq | integer | yes | chunk_seq
primary key, btree, for table "pg_toast.pg_toast_33953"
The client column uses the extended strategy: the values in it will be compressed. Let's check:
=> UPDATE accounts SET client = repeat('A',3000) WHERE id = 1;
=> SELECT * FROM pg_toast.pg_toast_33953;
chunk_id | chunk_seq | chunk_data
----------+-----------+------------
(0 rows)
The TOAST table is empty: the repeated characters compress well, and after compression the value fits on an ordinary table page.
Now let the client name consist of random characters:
=> UPDATE accounts SET client = (
SELECT string_agg( chr(trunc(65+random()*26)::integer), '') FROM generate_series(1,3000)
)
WHERE id = 1
RETURNING left(client,10) || '...' || right(client,10);
?column?
-------------------------
TCKGKZZSLI...RHQIOLWRRX
(1 row)
This sequence cannot be compressed, so it ends up in the TOAST table:
=> SELECT chunk_id,
chunk_seq,
length(chunk_data),
left(encode(chunk_data,'escape')::text, 10) ||
'...' ||
right(encode(chunk_data,'escape')::text, 10)
FROM pg_toast.pg_toast_33953;
chunk_id | chunk_seq | length | ?column?
----------+-----------+--------+-------------------------
34000 | 0 | 2000 | TCKGKZZSLI...ZIPFLOXDIW
34000 | 1 | 1000 | DDXNNBQQYH...RHQIOLWRRX
(2 rows)
As you can see, the data is sliced into chunks of 2000 bytes.
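The slicing itself can be sketched in a few lines of Python; the 2000-byte chunk size matches the output above but is taken here as an assumption:

```python
# Sketch: slicing a long value into 2000-byte chunks, as seen in the
# TOAST table contents above (the chunk size is an assumption).
TOAST_CHUNK_SIZE = 2000

def toast_chunks(value: bytes):
    """Return (chunk_seq, chunk_data) pairs for a long value."""
    return [(seq, value[i:i + TOAST_CHUNK_SIZE])
            for seq, i in enumerate(range(0, len(value), TOAST_CHUNK_SIZE))]

chunks = toast_chunks(b'A' * 3000)
print([(seq, len(data)) for seq, data in chunks])  # [(0, 2000), (1, 1000)]
```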
When a "long" value is accessed, PostgreSQL automatically, transparently to the application, reconstructs the original value and returns it to the client.
Of course, slicing, compression, and the subsequent reconstruction consume quite a lot of resources. Therefore, storing bulky data in PostgreSQL is not a great idea, especially if it is accessed frequently and does not require transactional logic (for example, scanned originals of accounting documents). A potentially better alternative is to store such data in the file system, keeping only the names of the corresponding files in the database.
The TOAST table is read only when the "long" value is accessed. Moreover, TOAST has its own row versioning: if an update does not touch the "long" value, the new row version will reference the same value in the TOAST table, which saves space.
Note that TOAST works only for tables, not for indexes. This imposes a limit on the size of indexed keys.
You can read more about the internal organization of data in the documentation.

To be continued.