Basics of thin provisioning of volumes on storage systems (or the 3PAR thin provisioning anniversary)

    This year marks 10 years since the first 3PAR storage system with Thin Provisioning technology was sold. Despite the technology having become very popular and in demand, I have yet to come across an intelligible description of how it works at a low level. In this article I will try to shed light on what I consider the “darkest” side of thin provisioning: the technical basis of the technology, that is, exactly how the host interacts with the storage system. These technologies are no longer exclusive to 3PAR, since they are now industry standards, but because thin provisioning first appeared in 3PAR, I will allow myself to give these arrays all the laurels.


    Why thin provisioning is needed


    For those who missed the previous 10 years, here is a brief reminder of what thin provisioning is and why it is needed; everyone else can skip this section with a clear conscience.

    Thin provisioning is a storage virtualization technology that maximizes storage utilization. It reduces the amount of disk space that is allocated but not directly used to store application data. In particular, file systems are never 100% full under normal conditions: a certain amount of free space must always be available to keep the file system functioning normally and to leave headroom for data growth. This essentially unused space is allocated for every logical volume on the storage system. Logical volumes for which disk space is allocated in full at creation time are called "thick" volumes.


    Thick volumes have the following disadvantages:
    • The space allocated to one logical volume (but not used) cannot be used by another volume. With data growing rapidly on one logical volume, sooner or later we will run into its size limit, and the fact that plenty of unused disk space sits on other logical volumes will not help us. In other words, free disk space is not a common pool from which any volume can draw capacity when needed; it is essentially rigidly attached to each volume. Besides being terribly wasteful, this scheme is also inconvenient when capacity needs to be redistributed between volumes.
    • Since accurately predicting the growth of application data is often very difficult, thick volume sizes are usually chosen with a substantial margin. According to various studies, the utilization rate of storage systems with thick volumes ranges from 30 to 50 percent. Yet the unused disk space costs real money that could be spent on far more useful things.
    • When replicating or taking snapshots of thick volumes, the disk array processes unused host blocks as well as used ones, even though replication could copy only the occupied blocks, and a snapshot could skip copying a free block (see copy-on-write) and simply mark it there as unoccupied. Replication of only the occupied blocks is implemented in 3PAR arrays.



    To solve these problems, thin provisioning and thin reclamation were invented; we will now discuss both in more detail.

    How thin provisioning works


    The concept of thin provisioning is simple and boils down to the following:

    • When a logical volume (LUN) is created on the disk array, its full capacity is not allocated up front; only the LUN LBA -> backend physical address mapping table is initialized. The storage administrator specifies the maximum possible size of the volume and the fill threshold at which a warning will be issued.
    • New data blocks are allocated to the logical volume as it fills up.
    • When the server frees data blocks, it must report the freed blocks to the storage system so that they can be returned to the shared pool. This technology is called thin reclamation and is described below.
    • When the server asks for the volume size (SCSI READ CAPACITY), the storage system reports the maximum volume size set by the storage administrator.
    • The sum of the maximum sizes of all volumes on the storage system can exceed the physically available space on the storage system.



    From the foregoing, it is not hard to picture how thin provisioning works. When the storage system receives a SCSI WRITE command (encapsulated in FC, SAS, iSCSI, or another transport), it allocates the next chunk of space and writes the data from the WRITE there. In the case of 3PAR, blocks are allocated with a size of 16 KiB.
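    To make these mechanics concrete, here is a deliberately toy Python sketch of a thin volume (the class, names, and page allocator are my own illustration, not 3PAR internals):

```python
import itertools

PAGE = 16 * 1024  # allocation granularity; 3PAR allocates in 16 KiB blocks

class ThinVolume:
    """Toy model of a thin LUN: capacity is promised, not pre-allocated."""

    backend_pages = itertools.count()  # stand-in for the shared backend pool

    def __init__(self, max_blocks, warn_pct=80, block_size=512):
        self.max_blocks = max_blocks   # what READ CAPACITY will report
        self.warn_pct = warn_pct       # admin-set fill-warning threshold
        self.block_size = block_size
        self.map = {}                  # LBA extent -> backend page; starts empty

    def read_capacity(self):
        # The host always sees the full administratively defined size,
        # no matter how little backend space is actually allocated.
        return self.max_blocks, self.block_size

    def write(self, lba, data):
        extent = (lba * self.block_size) // PAGE
        if extent not in self.map:     # first touch of this extent:
            self.map[extent] = next(self.backend_pages)  # allocate a page
        # ... the payload would now be written to page self.map[extent] ...
        used = len(self.map) * PAGE
        if used * 100 >= self.warn_pct * self.max_blocks * self.block_size:
            print(f"warning: volume crossed the {self.warn_pct}% threshold")
```

    Because backend_pages is shared by all volumes, the sum of max_blocks across volumes can exceed the physical pool, which is exactly the oversubscription described above.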

    How thin reclamation works


    And now for the much more interesting and less obvious part: how the host interacts with the storage system to return freed disk space to the shared pool. This interaction is an extremely important nuance, since only the host knows which blocks may be discarded and which may not. Thin reclamation technology was first implemented on 3PAR arrays and is today an industry standard approved by the InterNational Committee for Information Technology Standards (INCITS). The document is called T10 SBC-3 and extends the SCSI standard with new commands for interacting with storage systems (these commands were added in revision 18 of the document on February 23, 2009). A similar mechanism also exists for ATA/SATA devices (the TRIM command).

    To implement thin provisioning, the standard provides three SCSI commands:
    1. UNMAP
    2. WRITE SAME
    3. GET LBA STATUS

    The standard requires every storage system with thin provisioning to support at least the UNMAP command or the WRITE SAME command with the unmap bit. Let us look at the API the standard describes.

    UNMAP

    Tells the storage system to free one or more ranges of consecutive LBAs (Logical Block Addresses). The storage system should mark these LBAs as free (unmapped, in SCSI terms), release the space on the backend, and wipe the previously stored data in a background process in case those blocks are later allocated to another host. This command carries only service information: a list of pairs of “LBA address” and “number of logical blocks”.
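    As a sketch of the wire format (field layout as I read it from SBC-3; the helper name is mine, and actually issuing the command needs an OS pass-through mechanism such as Linux SG_IO, which is out of scope here):

```python
import struct

def build_unmap(extents):
    """Build the UNMAP (opcode 42h) CDB plus its parameter list.

    extents: iterable of (lba, nblocks) pairs to deallocate.
    """
    # Each block descriptor: 8-byte LBA, 4-byte block count, 4 reserved bytes.
    descriptors = b"".join(
        struct.pack(">QI4x", lba, nblocks) for lba, nblocks in extents
    )
    # Header: UNMAP data length (all bytes that follow these first 2),
    # block descriptor data length, then 4 reserved bytes.
    params = struct.pack(">HH4x", len(descriptors) + 6, len(descriptors)) + descriptors
    # 10-byte CDB: opcode, 5 reserved bytes, group number,
    # 2-byte parameter list length, control byte.
    cdb = struct.pack(">B5xBHB", 0x42, 0, len(params), 0)
    return cdb, params

# Ask the array to free 256 blocks at LBA 0x1000 and 1024 blocks at LBA 0x8000.
cdb, params = build_unmap([(0x1000, 256), (0x8000, 1024)])
```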


    WRITE SAME

    If for some reason the host does not want to use the UNMAP command, it can achieve a similar effect with the WRITE SAME command, which provides the unmap bit for this purpose. If a WRITE SAME command with the unmap bit set arrives at an array with thin provisioning and the target volume is thin, the array will do the same as for UNMAP. The difference from the UNMAP (42h) command is that WRITE SAME cannot specify many ranges to be freed: only one “LBA address” and “number of logical blocks” pair can be given.
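    A sketch of the corresponding CDB (WRITE SAME(16), opcode 93h, which carries the unmap bit; the helper name is again my own):

```python
import struct

def build_write_same16(lba, nblocks, unmap=True):
    """Build a WRITE SAME(16) (opcode 93h) CDB.

    With the unmap bit (bit 3 of byte 1) set, a thin volume may simply
    deallocate the range. Note the single (lba, nblocks) pair: unlike
    UNMAP, a list of ranges cannot be passed. The command still expects
    one logical block of data-out (the pattern, typically all zeros).
    """
    flags = 0x08 if unmap else 0x00  # the unmap bit
    # 16-byte CDB: opcode, flags, 8-byte LBA, 4-byte number of blocks,
    # group number, control byte.
    return struct.pack(">BBQIBB", 0x93, flags, lba, nblocks, 0, 0)
```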

    Also, do not forget that WRITE SAME is first and foremost a write command. If the unmap bit is not set, the storage system does not support thin provisioning, or the volume is thick, a normal write of the data to the specified LBAs is performed. It follows that in these cases SCSI READ must return exactly the data that was written. Here some vendors, HP among them, are cunning: instead of sequentially writing the same repeated data (for example, zeros), they mark these blocks in the logical volume's metadata as allocated but "filled with zeros". This technique is called zero detection.
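    A minimal sketch of the zero detection idea (heavily simplified; a real array would track this per allocation unit in the volume metadata):

```python
ZERO = object()  # sentinel: extent known to be all zeros, no space consumed

def write_page(mapping, lba, data):
    """Array-side zero detection: store zero pages as metadata, not data."""
    if not any(data):
        mapping[lba] = ZERO          # mark "filled with zeros" in metadata
    else:
        mapping[lba] = bytes(data)   # normal allocate-and-store path

def read_page(mapping, lba, page_size):
    entry = mapping.get(lba, ZERO)   # never-written extents also read as zeros
    return bytes(page_size) if entry is ZERO else entry
```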


    GET LBA STATUS

    This is a service operation (device specific) that uses the SERVICE ACTION IN (9Eh) command code. It lets the server find out:
    1. Whether the volume supports thin provisioning.
    2. The status of a specific block on the storage system (whether real capacity is allocated for it on the backend or not).
    3. The thin provisioning granularity for the volume.
    4. The limits (warning threshold and maximum volume size).

    The command is very useful, for example, for a background host-side search for blocks that are allocated on the array but not used by the host to store data, or when migrating from thick to thin volumes.
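    A sketch of the request and of parsing the returned descriptors (layout as I read it from SBC-3; strictly speaking, GET LBA STATUS itself answers item 2 above, while items 1, 3, and 4 come from companion additions to the standard such as the logical block provisioning VPD page):

```python
import struct

def build_get_lba_status(start_lba, alloc_len=4096):
    """GET LBA STATUS: SERVICE ACTION IN(16) opcode 9Eh, service action 12h."""
    # 16-byte CDB: opcode, service action, 8-byte starting LBA,
    # 4-byte allocation length, reserved byte, control byte.
    return struct.pack(">BBQIxB", 0x9E, 0x12, start_lba, alloc_len, 0)

def parse_lba_status(buf):
    """Yield (lba, nblocks, status) from the returned parameter data.

    status: 0 = mapped (backend capacity allocated), 1 = deallocated,
    2 = anchored, per the SBC-3 provisioning status values.
    """
    (param_len,) = struct.unpack_from(">I", buf, 0)
    offset = 8                    # skip the 4-byte length and 4 reserved bytes
    while offset + 16 <= param_len + 4:
        lba, nblocks, status = struct.unpack_from(">QIB", buf, offset)
        yield lba, nblocks, status & 0x0F
        offset += 16              # each descriptor is 16 bytes
```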


    In conclusion


    I am very glad you have read to the last lines! Unfortunately, I said nothing about thin provisioning support in file systems, databases, and operating systems, nor about when it makes sense to use it at all. That is a very interesting topic, in my opinion, but unfortunately a lengthy one. Maybe I will come back to it later.
