Block device performance analysis with blktrace


    I/O operations are known to be a critical resource for performance in modern Linux systems, and identifying and analyzing performance bottlenecks in them is quite complicated. Specialized utilities are typically used for this purpose. The best known of them are the utilities in the sysstat package (iostat, sar, and others). In some situations, however, the information they provide is not enough. For example, iostat cannot tell you which process is performing a particular operation, yet this information is essential for certain tasks, such as finding and analyzing bottlenecks in data storage systems.

    In 2007, well-known Linux kernel developer Jens Axboe created blktrace, a utility that traces I/O operations and provides the user with detailed information about them. In this article, we take a detailed look at what blktrace can do.


    With blktrace, you can:
    • analyze the performance of block devices;
    • estimate potential resource costs (for example, when setting up software RAID);
    • compare the performance of different hardware configurations;
    • determine the optimal configuration for a specific software environment;
    • evaluate the performance of different file systems: ext4, JFS, XFS, and Btrfs interact with the block I/O subsystem differently, and blktrace helps determine which file system performs best with a specific application or hardware configuration.

    Installation and getting started

    Blktrace is available in most common Linux distributions, so there is no need to build it from source; install it in the standard way with the package manager. The blkparse utility is installed along with blktrace; it presents the collected data in a more convenient, human-readable form.
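    Before the first run, it helps to check the environment: blktrace needs root privileges and reads the kernel's trace buffers through debugfs, which is normally mounted at /sys/kernel/debug. The following is a minimal sketch of such a check, assuming those standard locations; the function names debugfs_mounted and check_blktrace_env are hypothetical and are not part of the blktrace package.

```shell
# Hypothetical pre-flight check before running blktrace. Assumes the usual
# setup: root privileges and debugfs mounted (normally at /sys/kernel/debug).

# Succeeds if a debugfs mount is listed in the given mounts file
# (default /proc/mounts, whose third field is the filesystem type).
debugfs_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}" 2>/dev/null
}

check_blktrace_env() {
    status=0
    for tool in blktrace blkparse; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool (install the blktrace package)" >&2
            status=1
        fi
    done
    if [ "$(id -u)" -ne 0 ]; then
        echo "blktrace must be run as root" >&2
        status=1
    fi
    if ! debugfs_mounted; then
        echo "debugfs is not mounted; try: mount -t debugfs debugfs /sys/kernel/debug" >&2
        status=1
    fi
    return "$status"
}
```

    For example, run check_blktrace_env && echo ready as root before starting a trace; every missing prerequisite is reported on stderr.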

    Now run the following command:
    blktrace -w 30 -d /dev/sdf -o -

    The command-line arguments mean the following:
    • -w sets how long tracing runs (in this case, 30 seconds);
    • -d specifies the device for which I/O statistics will be collected;
    • -o - sends the trace data to standard output instead of saving it to the default per-CPU files (<device>.blktrace.<cpu>).

    You can read more about the command-line arguments and command syntax in the official documentation.

    When tracing finishes, the following summary appears on the screen:
    === sdd ===
    CPU 0: 34 events, 2 KiB data
    CPU 1: 27 events, 2 KiB data
    CPU 2: 41 events, 2 KiB data
    CPU 3: 46 events, 3 KiB data
    CPU 4: 2769 events, 130 KiB data
    CPU 5: 1718 events, 81 KiB data
    CPU 6: 1326 events, 63 KiB data
    CPU 7: 2279 events, 107 KiB data
    CPU 8: 14 events, 1 KiB data
    CPU 9: 12 events, 1 KiB data
    CPU 10: 22 events, 2 KiB data
    CPU 11: 50 events, 3 KiB data
    CPU 12: 455 events, 22 KiB data
    CPU 13: 184 events, 9 KiB data
    CPU 14: 508 events, 24 KiB data
    CPU 15: 1100 events, 52 KiB data
    Total: 10585 events (dropped 0), 497 KiB data

    It shows how many trace events were recorded on each processor core, which has little practical value on its own: no conclusions about the performance of I/O operations can be drawn from it.

    To get more detailed information in an understandable form, pipe the output through the blkparse utility:
    blktrace -w 1 -d /dev/sdf -o - | blkparse -i -

    Now the output will look like this:
    8,32   0    19190    28.774795629  2039  D   R 94229760 + 32 [fio]
    8,32   0    19191    29.927624071     0  C   R 94229760 + 32 [0]

    Next come I/O statistics for each CPU core involved; here is an example for one core:

    CPU15 (8,32):
     Reads Queued:       0,     0KiB    Writes Queued:      64,   354KiB
     Read Dispatches:    0,     0KiB    Write Dispatches:   33,   276KiB
     Reads Requeued:     0              Writes Requeued:     0
     Reads Completed:    0,     0KiB    Writes Completed:    0,     0KiB
     Read Merges:        0,     0KiB    Write Merges:        0,     0KiB
     Read depth:         0              Write depth:        68
     IO unplugs:        22              Timer unplugs:      16

    Total (8,32):
     Reads Queued:       0,     0KiB    Writes Queued:    1908,  7665KiB
     Read Dispatches:    0,     0KiB    Write Dispatches: 1009,  7665KiB
     Reads Requeued:     0              Writes Requeued:     0
     Reads Completed:    0,     0KiB    Writes Completed: 1954,  7655KiB
     Read Merges:        0,     0KiB    Write Merges:        0,     0KiB
     IO unplugs:       612              Timer unplugs:     382

    Throughput (R/W): 0KiB/s / 7701KiB/s
    Events (8,32): 11684 entries
    Skips: 0 forward (0 - 0.0%)

    The event lines at the beginning consist of the following columns:
    1. major and minor device numbers (in our case, 8,32);
    2. the CPU core that handled the event;
    3. the sequence number of the event;
    4. the timestamp, in seconds (with nanosecond precision) relative to the start of the trace;
    5. the process identifier (PID);
    6. the event (action) code (blktrace records the life-cycle events of all I/O, including its own);
    7. RWBS (R for read, W for write, B for barrier, S for synchronous);
    8. the starting sector of the operation + the number of sectors;
    9. the name of the process that performed the operation (in square brackets).
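    To make the column layout concrete, here is a small awk-based sketch that labels each field of an event line. It assumes blkparse's default output format described above and a process name without spaces; the function name label_events is hypothetical.

```shell
# Hypothetical helper: label_events [FILE...]
# Labels the fields of blkparse event lines (reads stdin if no FILE given).
# Only lines of the form "... SECTOR + COUNT [PROCESS]" are printed;
# per-CPU summary lines and events without a sector range are skipped.
label_events() {
    awk '
        NF >= 11 && $8 ~ /^[0-9]+$/ && $9 == "+" {
            printf "dev=%s cpu=%s seq=%s time=%s pid=%s action=%s rwbs=%s sector=%s sectors=%s proc=%s\n",
                $1, $2, $3, $4, $5, $6, $7, $8, $10, $11
        }
    ' "$@"
}
```

    For example, blktrace -w 1 -d /dev/sdf -o - | blkparse -i - | label_events prints one labeled line per event that carries a sector range.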

    The main event codes are:
    • A - the I/O was remapped to another device;
    • C - the operation completed;
    • D - the request was issued to the device driver;
    • F - the operation was front-merged with an adjacent request in the queue;
    • I - the request was inserted into the request queue;
    • M - the operation was back-merged with an adjacent request in the queue;
    • Q - the operation was queued (first seen by the request queue code);
    • T - the queue was unplugged due to a timeout;
    • X - the operation was split into several operations.
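    When scanning long traces, a lookup for these codes can be handy. Below is a minimal sketch covering the codes described above, including D (issue to the device driver), which appears in the sample output; describe_action is a hypothetical helper, and blkparse defines further codes not covered here.

```shell
# Hypothetical helper: describe_action CODE
# Prints a short description of a blkparse action code; returns 1 for
# codes this sketch does not know about.
describe_action() {
    case $1 in
        A) echo "remapped to another device" ;;
        C) echo "completed" ;;
        D) echo "issued to the device driver" ;;
        F) echo "front-merged with a request in the queue" ;;
        I) echo "inserted into the request queue" ;;
        M) echo "back-merged with a request in the queue" ;;
        Q) echo "queued" ;;
        T) echo "queue unplugged due to a timeout" ;;
        X) echo "split into several operations" ;;
        *) echo "unknown action: $1"; return 1 ;;
    esac
}
```

    For example, describe_action M prints "back-merged with a request in the queue".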

    Finally, blkparse prints summary statistics for all I/O operations, with read and write activity shown side by side.

    Auxiliary tools

    Blktrace collects the data and blkparse presents it in human-readable form, but neither of them analyzes it. Analysis and plotting are the job of specialized utilities, first and foremost btt and seekwatcher/iowatcher.


    btt

    The name of this utility is short for "blktrace timeline". It analyzes files containing blktrace output processed by blkparse and extracts from them:
    • the time spent processing an operation before it is placed in the queue;
    • the time spent waiting in the queue;
    • the time spent actually performing the operation.
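    btt calls the last of these intervals D2C: the time from issuing a request to the driver (D) until its completion (C). As a rough illustration only, the following sketch approximates D2C from blkparse's text output by pairing D and C events on the same device and starting sector; the timestamps in the fourth column are in seconds. It assumes at most one in-flight request per starting sector, d2c_latency is a hypothetical name, and btt itself should be preferred for real analysis.

```shell
# Hypothetical helper: d2c_latency [FILE...]
# Rough approximation of the btt D2C interval from blkparse text output.
# Pairs each D (issued to driver) event with the next C (completed) event
# on the same device and starting sector; assumes at most one in-flight
# request per starting sector.
d2c_latency() {
    awk '
        NF >= 11 && $9 == "+" {
            key = $1 " " $8                   # device + starting sector
            if ($6 == "D") {
                issued[key] = $4              # issue timestamp, in seconds
            } else if ($6 == "C" && (key in issued)) {
                d2c = $4 - issued[key]
                delete issued[key]
                total += d2c
                n++
                if (d2c > max) max = d2c
            }
        }
        END {
            if (n == 0) { print "no matched D/C pairs"; exit 1 }
            printf "requests=%d avg_d2c=%.6fs max_d2c=%.6fs\n", n, total / n, max
        }
    ' "$@"
}
```

    For a D event at 28.774795629 and a C event at 29.927624071 on sector 94229760 of device 8,32, it prints requests=1 avg_d2c=1.152828s max_d2c=1.152828s.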

    To get a btt report, first trace the I/O operations with blktrace and save the trace to files:
    blktrace -d /dev/sda -o trace

    (tracing continues until you press Ctrl+C; the data is saved in per-CPU files named trace.blktrace.<cpu>).

    Now process these files with blkparse and save the processed result to a separate file:
    blkparse -i trace -d trace1

    (here -i gives the base name of the trace files, and -d names the binary file to which the processed data will be saved; this binary dump is what btt reads).

    Now we process the resulting output using btt by running the following command:
    btt -i trace1

    The report is printed to the screen as a table. You can read more about the structure of btt output and how to interpret it in the official documentation.
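    The three steps can be combined into one script. The sketch below is a hypothetical wrapper (btt_report is not part of blktrace): it adds -w so that tracing stops on its own after a given number of seconds, writes the raw trace to per-CPU files with -o PREFIX (named PREFIX.blktrace.<cpu>) so that blkparse -i PREFIX can find them, and passes blkparse's binary dump to btt. It must be run as root.

```shell
# Hypothetical wrapper: btt_report DEVICE SECONDS [PREFIX]
# Traces DEVICE for SECONDS seconds, converts the per-CPU trace files into
# the binary dump PREFIX.bin with blkparse, and prints the btt report.
btt_report() {
    dev=$1
    secs=$2
    prefix=${3:-trace}
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    case $secs in
        ''|*[!0-9]*)
            echo "duration must be a whole number of seconds: $secs" >&2
            return 1 ;;
    esac
    if [ "$secs" -eq 0 ]; then
        echo "duration must be greater than zero" >&2
        return 1
    fi
    # Writes PREFIX.blktrace.<cpu> files and stops after $secs seconds
    blktrace -w "$secs" -d "$dev" -o "$prefix" || return 1
    # -i reads those files by base name; -d saves the binary dump btt needs.
    # The human-readable text that blkparse also prints is discarded.
    blkparse -i "$prefix" -d "$prefix.bin" >/dev/null || return 1
    btt -i "$prefix.bin"
}
```

    For example, btt_report /dev/sda 30 sda prints the btt report for 30 seconds of activity on /dev/sda.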

    Seekwatcher / Iowatcher

    The seekwatcher utility was created in 2007 by Chris Mason. It was designed to process blktrace output and plot charts, including animated ones. The seekwatcher project site still exists, but it is now mostly of historical interest.

    Today, Chris Mason is developing a new tool for visualizing blktrace data: iowatcher. You can install iowatcher from its repository. It has minimal dependencies: to create animated graphs, you only need to install ffmpeg or librsvg.

    With iowatcher, you can plot graphs (including animated ones) from blktrace output, as well as from the output of the btt, fio, and mpstat utilities.
    To plot a graph, first run blktrace and save its output to a file:
    blktrace -w 30 -d /dev/sdf -o - > trace.dump

    Then enter the following command:
    iowatcher -t trace.dump -o trace.svg

    iowatcher renders the blktrace data as a graph in SVG format.

    An animated chart can be produced with the following command:
    iowatcher -t trace.dump --movie -o trace.mp4

    You can read more about the command syntax in the iowatcher documentation.

    Where blktrace is used

    Blktrace is used as a supporting tool in software designed to analyze and diagnose performance problems in storage systems.

    For example, LSI produces several models of SSDs in the form of PCI Express cards. To help users choose the best model, the company developed a dedicated product, Nytro Predictor. Nytro Predictor collects information about how applications use storage and, based on it, makes recommendations for improving response time. On Linux systems, blktrace is used as the data collection tool. The collected data is then processed by special algorithms to select the hardware solution that delivers optimal performance.

    Intel offers a similar product built on LSI software components, the Intel RAID SSD Cache Sizing and Performance Prediction Tool. It also uses blktrace to collect statistics.

    Speeding up data access and reducing response time is a pressing problem for social networks with large numbers of users. Facebook engineers have been working actively on it: in 2010 they created Flashcache, a Linux kernel module that uses one block device to cache access to other block devices. Flashcache is licensed under the GPL, and its repository is on GitHub. Its developers used blktrace to analyze the disk accesses made by database applications.
