Optimizing backup speed using the file system (read-ahead)

This article is addressed to engineers and consultants who deal with the performance of operations that read files sequentially. The most common case is, of course, backups. The same applies to reading large files from file storage and to some database operations, for example a full table scan (depending on where the data resides).

The examples use the VxFS file system (Symantec). This file system is widely used on server systems and is supported on HP-UX, AIX, Linux, and Solaris.

Why is this needed?

The question is how to get the maximum speed when reading data sequentially from a large file in a single stream (!); backing up a large number of small files is beyond the scope of this article. By sequential reading we mean that blocks of data are requested from the physical disks one after another, in order. We also assume there is no file system fragmentation. This is a reasonable assumption: if the file system holds a few large files that are rarely recreated, they remain practically unfragmented. This is a typical situation for databases such as Oracle. Reading such a file is not much different from reading a raw device.

What is the limit of single-threaded reading?

The fastest modern drives (15K rpm) have a service time of about 5.5 ms (for readers familiar with queuing theory: we assume the wait time is zero).
Let us determine how many I/O operations per second a process (the backup) can perform:

1 / 0.0055 ≈ 182 I/O operations per second (IOPS).

If a process performs operations one after another and each takes 5.5 ms, it will complete 182 of them per second. Suppose the block size is 256 KB. The maximum throughput of this process is then 182 * 256 ≈ 46545 KB/s (about 46 MB/s). Modest, isn't it? It is especially modest for systems with hundreds of physical spindles, from which we expect far higher read speeds. The question is how to improve this. Disk access time cannot be reduced; that is a technological limitation. Running the backup in parallel is not always possible either. To get around this limit, file systems implement a read-ahead mechanism.
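
The same arithmetic as a small Python sketch, handy for estimating the single-stream ceiling for other service times and block sizes:

    # Estimate the throughput ceiling of a single synchronous read stream.
    # Assumes each request waits the full disk service time (no read-ahead).

    def single_stream_limit(service_time_s, block_size_kb):
        iops = 1.0 / service_time_s          # requests per second one process can issue
        return iops, iops * block_size_kb    # (IOPS, KB/s)

    iops, kb_per_s = single_stream_limit(0.0055, 256)
    print("%.0f IOPS, %.1f MB/s" % (iops, kb_per_s / 1024))   # ~182 IOPS, ~45.5 MB/s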

How read-ahead works

In modern *nix systems there are two types of I/O requests: synchronous and asynchronous. On a synchronous request the process blocks until it gets a response from the disk subsystem; on an asynchronous one it does not block and can do something else in the meantime. During sequential reading we read data synchronously. With the read-ahead mechanism enabled, immediately after the synchronous request the file system code issues several more asynchronous ones. Suppose the process requested block number 1000. With read-ahead enabled, blocks 1001, 1002, 1003 and 1004 are requested in addition to block 1000. Thus, when block 1001 is requested, there is no need to wait 5.5 ms for it. With the read-ahead setting you can increase the speed of sequential reads significantly (several times over).
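
VxFS does this prefetching inside the file system itself, but the same idea can be illustrated from user space. A minimal Linux-only sketch, assuming Python 3 and a hypothetical file path: the process hints that access will be sequential, so the kernel's own read-ahead fetches the following blocks asynchronously while the application is busy with the current one.

    import os

    BLOCK = 256 * 1024                      # 256 KB application read size

    fd = os.open("/data/bigfile.dbf", os.O_RDONLY)   # hypothetical file, for illustration

    # Hint that access will be sequential; on Linux this typically enlarges the
    # kernel's read-ahead window, so subsequent blocks are fetched asynchronously.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)

    total = 0
    while True:
        buf = os.read(fd, BLOCK)            # synchronous from the application's point of view
        if not buf:
            break
        total += len(buf)

    os.close(fd)
    print("read %.0f MB" % (total / 2**20))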

How is it configured?

The key read-ahead setting is its size. Looking ahead, I will say that read-ahead has two main problems: too little of it and too much. On VxFS, read-ahead is configured with the read_pref_io and read_nstream parameters of the vxtunefs command. When read-ahead kicks in on VxFS, 4 blocks of read_pref_io size are requested initially. If the process keeps reading sequentially, 4 * read_pref_io * read_nstream is read.

Example: let read_pref_io = 256k and read_nstream = 4.

The initial read-ahead is then: 4 * 256 KB = 1024 KB.
If sequential reading continues: 4 * 4 * 256 KB = 4096 KB.

Note that in the latter case 16 requests of 256 KB each are sent to the disk subsystem almost simultaneously. That is quite a lot, and for a short time it can put a serious load on the array. In general it is hard to give universal advice on setting read_pref_io and read_nstream; the right values always depend on the number of disks in the array and the nature of the load. For some loads read_pref_io = 256k with read_nstream = 32 (a very large read-ahead) works fine; sometimes it is better to turn read-ahead off entirely. Since the parameters are simple and can be changed on the fly, the easiest approach is to find the optimal values experimentally. The one piece of advice that always applies is to set read_pref_io to a power of 2, or at least to a multiple of the data block size in the OS cache; otherwise the consequences can be unpredictable.
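
A small helper along these lines, a sketch only (the factor of 4 follows the VxFS behaviour described above, and the variable names simply mirror the vxtunefs tunables):

    def readahead_window(read_pref_io, read_nstream):
        """(initial, sustained) read-ahead size in bytes, per the VxFS behaviour above."""
        return 4 * read_pref_io, 4 * read_pref_io * read_nstream

    def is_power_of_two(n):
        return n > 0 and (n & (n - 1)) == 0

    read_pref_io, read_nstream = 256 * 1024, 4
    assert is_power_of_two(read_pref_io), "keep read_pref_io a power of two"

    initial, sustained = readahead_window(read_pref_io, read_nstream)
    print("initial: %d KB, sustained: %d KB" % (initial // 1024, sustained // 1024))
    # -> initial: 1024 KB, sustained: 4096 KB, matching the example above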

The effect of the OS buffer cache

When read-ahead fetches data asynchronously, it has to be stored somewhere in memory, and the operating system's file cache is used for that. In some cases the file system is mounted with the file cache disabled (direct I/O); read-ahead is then effectively disabled as well.
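
For illustration, a hedged Linux-only sketch of the same effect at the syscall level (it mirrors, rather than reproduces, a VxFS direct-I/O mount; the file path is hypothetical): opening the file with O_DIRECT bypasses the page cache, so there is nowhere to keep prefetched blocks.

    import mmap, os

    BLOCK = 256 * 1024

    # O_DIRECT requires aligned buffers; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, BLOCK)

    fd = os.open("/data/bigfile.dbf", os.O_RDONLY | os.O_DIRECT)

    total = 0
    while True:
        n = os.readv(fd, [buf])     # data comes straight from the device:
        if n <= 0:                  # the page cache, and with it read-ahead, is bypassed
            break
        total += n

    os.close(fd)
    print("read %.0f MB with the page cache bypassed" % (total / 2**20))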

Key problems with read-ahead:

1) Not enough read-ahead. The block size requested by the application is larger than what read-ahead fetches. For example, the 'cp' command may read in 1024 KB blocks while read-ahead is configured to fetch 256 KB. There is simply not enough prefetched data to satisfy the application, and another synchronous I/O request is needed. In this case enabling read-ahead brings no speed-up.

2) Excessive read-ahead
- Overly aggressive read-ahead can simply overload the disk subsystem, especially if there are few spindles in the back end. A large number of nearly parallel requests fired from the host can flood the disk array, and instead of a speed-up you will see slowdowns.
- Another problem is read-ahead misses: the file system wrongly decides the reads are sequential and pulls unneeded data into the cache. This produces spurious I/O and creates extra load on the disks.
- Since read-ahead data is kept in the file system cache, a large read-ahead can wash more valuable blocks out of the cache, and those blocks will then have to be read from disk again.

3) Conflict between file system read-ahead and disk array read-ahead
Fortunately, this is an extremely rare case. Most modern disk arrays, equipped with cache memory and their own logic, implement a hardware read-ahead mechanism of their own: the array detects sequential reads itself and the controller bulk-reads data from the physical disks into the array cache. This can significantly reduce the response time of the disk subsystem and increase the speed of sequential reads. File system read-ahead, however, looks slightly different from a plain synchronous read stream and can confuse the array controller: it may fail to recognize the access pattern and never enable hardware read-ahead. For example, if the array is connected over a SAN (Storage Area Network) with several paths to it, load balancing may deliver the asynchronous requests to different array ports almost simultaneously. The requests may then be processed by the controller in a different order than they were sent from the server, and the array does not recognize the reads as sequential. Such problems can be the longest and most labor-intensive to solve. Sometimes the fix lies in configuration, sometimes it helps to disable one of the two read-ahead mechanisms (if possible), and sometimes the code of one of the components has to be changed.

An example of the effect of read-ahead

A customer was unhappy with the database backup time. As a test, a single 50 GB file was backed up. Below are the results of three runs with different file system settings.

Directories ... 0
Regular files ... 1
Objects Total ... 1
Total Size ... 50.51 GB

1. Read-ahead turned off (direct I/O)

Run Time ... 0:17:10
Backup Speed ... 71.99 MB/s

2. Default read-ahead settings (read_pref_io = 65536, read_nstream = 1)

Run Time ... 0:05:17
Backup Speed ... 163.16 MB/s

3. Greatly increased read-ahead size (read_pref_io = 262144, read_nstream = 64)

Run Time ... 0:02:27
Backup Speed ... 222.91 MB/s

As the example shows, read-ahead reduced the backup time significantly. Subsequent operation showed that all other tasks on the system ran normally with the large read-ahead size from test 3; no problems caused by excessive read-ahead were observed, so these settings were kept.
