Intel Enterprise Edition - a Lampshade for Lustre

    Building heavily loaded cluster systems is not an easy task in itself, and it is further complicated by the fact that such solutions must be as well balanced as possible. There is no place for “crutches” and “patches”: every component must squeeze out the maximum number of FLOPS and IOPS. This, of course, applies to one of the critical components of any hardware solution, the file system. In the course of supercomputer development several specialized file systems were created. The most popular of them is Lustre, whose development began in the last century and which is currently supported by Intel. In the 3 years since Intel purchased Whamcloud, the developer of Lustre, the product has gained new functionality and tools. In this post you will learn which ones.

    As already mentioned, Lustre is a fairly widespread system with a long development history; currently, more than 60 percent of Top500 supercomputers use it as their file system. So there is probably no need to spend much time describing it; a very detailed introduction can be found, for example, in the Wiki. Nevertheless, a few words about how Lustre is built are necessary, since they will be needed later.

    So, Lustre is a distributed parallel file system, that is, a set of servers that store their own data and work independently of each other. Its foundation is the management server (MGS) and the metadata server (MDS). The MDS is a repository of metadata (file names and their attributes); the MGS stores the information about which servers the file system resides on. These two servers can run on different machines or on the same one. The objects themselves (pieces of a file's contents) are located on storage devices (OSTs) under the control of storage servers (OSSs). Unlike in traditional file systems, a Lustre inode serves as a key for looking up a structure that describes how the file is split and where its data is located. Thus, an additional layer of abstraction is created.
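
    The division of labor described above can be illustrated with a toy in-memory model. This is only a sketch: the names ToyMDS, ToyOST, Layout, write_file and read_file are invented for the illustration and are not part of any Lustre API. The point it shows is that the metadata server returns only a layout, and the data itself comes from the storage targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical shape of what a metadata lookup returns for one file."""
    stripe_size: int   # bytes per stripe
    objects: tuple     # ((ost_index, object_id), ...) in stripe order


class ToyMDS:
    """Keeps names and layouts only; file contents never pass through it."""

    def __init__(self):
        self._layouts = {}

    def create(self, path, layout):
        self._layouts[path] = layout

    def lookup(self, path):
        return self._layouts[path]


class ToyOST:
    """Keeps data objects addressed by object id."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, data):
        self._objects[object_id] = data

    def get(self, object_id):
        return self._objects[object_id]


def write_file(mds, osts, path, data, stripe_size, ost_indices):
    """Split data round-robin into one object per chosen OST, then record the layout."""
    objects = tuple((i, f"{path}#{n}") for n, i in enumerate(ost_indices))
    buffers = [bytearray() for _ in objects]
    for start in range(0, len(data), stripe_size):
        buffers[(start // stripe_size) % len(objects)] += data[start:start + stripe_size]
    for (ost_index, object_id), buf in zip(objects, buffers):
        osts[ost_index].put(object_id, bytes(buf))
    mds.create(path, Layout(stripe_size, objects))


def read_file(mds, osts, path):
    """Ask the MDS for the layout, then fetch and interleave the objects."""
    layout = mds.lookup(path)
    parts = [osts[i].get(object_id) for i, object_id in layout.objects]
    out = bytearray()
    offset = 0
    while True:
        chunks = [p[offset:offset + layout.stripe_size] for p in parts]
        if not any(chunks):
            return bytes(out)
        for chunk in chunks:
            out += chunk
        offset += layout.stripe_size
```

    For example, after write_file(mds, osts, "/f", data, 100, [0, 2, 3]) the call read_file(mds, osts, "/f") returns the original data, while the MDS itself holds nothing but the layout.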

    The main advantage of this approach is that the fragments of a file are stored on different servers and are requested in parallel. Instead of waiting for the data to be read as one large chunk from one place, Lustre breaks the large chunk into smaller ones and loads them in parallel from different places. These exceptional parallelization capabilities provide the required data processing speed, which is essentially limited only by the bandwidth of the physical connections. The file system does not impose hard locks on files; instead, it ensures data integrity flexibly by means of a dedicated mechanism. This is very similar to synchronizing caches between different processors in a multiprocessor system.
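
    The parallel access described above can be sketched as follows, assuming simple round-robin striping with a fixed stripe size. The functions stripe, locate and parallel_read are hypothetical helpers written for this illustration; the "OSTs" are plain byte strings in memory, and threads stand in for network requests to different servers.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data, stripe_size, stripe_count):
    """Distribute data round-robin into stripe_count objects."""
    buffers = [bytearray() for _ in range(stripe_count)]
    for start in range(0, len(data), stripe_size):
        buffers[(start // stripe_size) % stripe_count] += data[start:start + stripe_size]
    return [bytes(b) for b in buffers]


def locate(offset, stripe_size, stripe_count):
    """Map a file offset to (object index, offset inside that object)."""
    stripe_no = offset // stripe_size
    object_index = stripe_no % stripe_count
    object_offset = (stripe_no // stripe_count) * stripe_size + offset % stripe_size
    return object_index, object_offset


def parallel_read(objects, stripe_size, length):
    """Fetch every stripe of a file of the given length concurrently."""
    stripe_count = len(objects)

    def fetch(start):
        index, object_offset = locate(start, stripe_size, stripe_count)
        size = min(stripe_size, length - start)
        return objects[index][object_offset:object_offset + size]

    starts = range(0, length, stripe_size)
    # pool.map yields results in input order, so the stripes join correctly.
    with ThreadPoolExecutor(max_workers=stripe_count) as pool:
        return b"".join(pool.map(fetch, starts))
```

    With a stripe size of 1 MiB and four objects, locate(5 * 2**20 + 10, 2**20, 4) gives (1, 2**20 + 10): the sixth stripe lands in the second object, one full stripe in. In this model the number of requests in flight is bounded by the number of objects, which mirrors the idea that throughput grows with the number of storage targets until the links saturate.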

    The current limit on the total size of stored data is 512 petabytes.

    Intel, like a number of other software vendors, offers its own solution based on Lustre. At its core, of course, is the Lustre file system itself, free software distributed under the GNU GPL and currently at version 2.7. On top of it come the add-ons and extras created by Intel's High Performance Data Division (HPDD) and included in the Intel Enterprise Edition for Lustre Software package.

    For developers, the most useful part will probably be the Hadoop adapter, which allows you to run MapReduce applications directly on Lustre. This kills two birds with one stone. First, Hadoop users get access to files located on Lustre without having to use Hadoop's native distributed file system or perform additional copy operations.

    Second, the system as a whole becomes simpler and cleaner: Hadoop coexists with Lustre and takes advantage of it without consuming separate storage space of its own. Another useful tool for development and deployment is the set of APIs (including REST), which makes it quick and easy to integrate third-party software and storage systems with Lustre.

    There is even more good news for server and storage administrators. The Intel Enterprise Edition for Lustre Software package contains Intel Manager for Lustre Software, a graphical application for deploying, configuring, monitoring and administering a Lustre system, as well as for detecting faults in it. The manager provides a graphical interface for every action related to managing the file system, and it also visualizes Lustre statistics by numerous criteria, thus showing the system's status. Another tool an administrator needs is the command line interface, which can be scripted to automate routine maintenance and management tasks.

    So, on the basis of a good product that still remains freely available, Intel has created an entire software ecosystem, first of all to make the product more convenient to use and to deploy in integrated solutions. As for the experts, they are free to decide for themselves whether to use plain Lustre or Lustre with a “lampshade” in the form of Intel Enterprise Edition for Lustre Software.

    Our gratitude to Dmitry Eremin, lead developer of the HPDD division, for his help in writing the post.
