Fujitsu ETERNUS CD10000: Ceph without worries

    Many companies today work with huge amounts of data. I'm not talking about Big Data patterns here; the point is simply that a dozen or so terabytes on a single company's servers no longer surprises anyone. Many go further: hundreds of terabytes, petabytes, tens of petabytes... Of course, it's good when your data and the tasks that process it fit the MapReduce model, but far more often all this data is "just files", virtual machine volumes, or data that is already structured and sharded in its own way. In such cases, the company concludes that it needs to deploy a storage system.

    Platforms like OpenStack add to the popularity of storage systems today, because it's nice to manage your servers without worrying that a drive has failed in one of them or that one of the racks has lost power. You also need not worry that the hardware of the one Most Important Server is outdated and that upgrading it means degrading your services to a minimum. Of course, such situations can result from a design error, but let's be honest: we all make such mistakes.

    As a result, the company faces a difficult choice: build a storage system itself on open source software (Ceph, MooseFS, HDFS; there is plenty to choose from, with minimal integration costs, but you will have to spend time on design and deployment), or buy a ready-made proprietary storage system and spend time and effort on integrating it (with the risk that the system will eventually hit the limit of its capacity or performance).

    But what if we take Ceph as the basis (it is hard to think of a data storage task it cannot handle), enlist the support of a Ceph vendor (for example Inktank, which created it), take modern servers with a large number of SAS disks, write a web management interface, and add extra features for efficient deployment and monitoring? It sounds attractive, but it is difficult for the average company, especially one that is not an IT company.

    Fortunately, Fujitsu has already taken care of all this in the form of the ETERNUS CD10000, the first enterprise storage system based on Inktank Ceph Enterprise, which we introduce today.

    The ETERNUS CD10000 is a construction kit of modules. The modules are x86 servers running Linux, Ceph Enterprise, and Fujitsu's own software. This design lets you start with the storage capacity you need and expand it gradually later. There are two types of modules: a data module and a metadata module (more precisely, a management node).

    Storage servers currently come in three models:

    • Basic (12.6 TB in one module, 1 SSD for cache, 2U)
    • Performance (34.2 TB in one module, 2 SSDs for cache, 4U)
    • Capacity (252.6 TB in one module, 1 SSD for cache, 6U)


    Basic and Performance nodes are equipped with 2.5-inch SAS disks, while a Capacity module can hold up to 14 SAS disks and 60 SATA disks at the same time. Storage nodes communicate with each other over InfiniBand; this covers replication, recovery of lost block copies, and the exchange of other service information. You can add storage servers at any time to expand the total capacity, and Ceph Enterprise will redistribute the load across the storage nodes and drives. In total, up to 224 data servers can be installed. Today that amounts to about 56 petabytes, but disk capacities keep growing, and the software itself is in theory limited only to exabytes per storage cloud. An advantage here is that new-generation servers can be added to ETERNUS alongside servers of previous generations, and they will work together. Outdated storage nodes can eventually simply be unplugged: Ceph re-replicates the missing data to the remaining nodes without additional intervention.
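
    The 56-petabyte figure can be checked with simple arithmetic. The sketch below multiplies the maximum number of data servers by the capacity of one Capacity module, both taken from the text above. The replica count of 3 is only an illustrative assumption, not a figure from the article, and the function names are made up for this example.

```python
# Back-of-the-envelope check of the maximum raw capacity quoted in the text.
MAX_STORAGE_NODES = 224     # maximum number of data servers (from the article)
CAPACITY_NODE_TB = 252.6    # raw capacity of one Capacity module in TB (from the article)


def raw_capacity_pb(nodes: int, tb_per_node: float) -> float:
    """Total raw capacity in petabytes, using decimal units (1 PB = 1000 TB)."""
    return nodes * tb_per_node / 1000


def usable_capacity_pb(raw_pb: float, replicas: int) -> float:
    """Capacity left for user data when every block is stored `replicas` times."""
    return raw_pb / replicas


raw = raw_capacity_pb(MAX_STORAGE_NODES, CAPACITY_NODE_TB)
print(f"raw capacity: {raw:.1f} PB")

# Assumption: 3 replicas per block, chosen only for illustration.
print(f"usable with 3 replicas: {usable_capacity_pb(raw, 3):.1f} PB")
```

    The raw figure counts every disk once; whatever replica count is configured divides the space actually available for data.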

    Management nodes store the logs and events generated in the storage system. Installing two of these servers is recommended, but in general the system keeps working even if the management node becomes unavailable.

    The CD10000 has a web interface for carrying out most storage operations and for viewing the status of individual nodes or of the storage system as a whole. The classic CLI, familiar to many administrators who have worked with Ceph directly, is still there. People should have no trouble "communicating" with this system.

    Now for how ETERNUS "talks" to other servers. Starting with hardware: each storage server connects to the regular network through 10-gigabit interfaces, or, to be exact, through dual-port PRIMERGY 10Gb Modular LAN Adapter cards (built on the Intel 82599 chip). They are unlikely to become a bottleneck.

    At the software level, the wishes of users of similar products have been taken into account as well. Storage clients get four interfaces, plus custom ones on request:

    • Librados (a ready-made library for direct access to the storage from applications written in C / C++ / Java / Python / PHP / Ruby)
    • Ceph Object Gateway (RGW - here you will find a REST API compatible with Amazon S3 and Swift)
    • Ceph Block Device (RBD - interface for storing volumes of QEMU / KVM virtual machines)
    • CephFS (POSIX-compatible network file system, with drivers for FUSE)
    • At a client's individual request, an additional interface from a set of standard scenarios can be added to the installation


    Ceph Object Storage (RADOS) is the heart, brain and soul of the Fujitsu ETERNUS CD10000: it handles load balancing between nodes and disks, block replication, restoration of lost replicas, and storage clustering; in short, everything to do with performance and reliability. RAID arrays are not used here in their usual role. It is hard to imagine how long rebuilding one array of dozens of 6 TB disks would take, and how often it would happen.
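
    To put a rough number on that rebuild, here is a minimal estimate of how long it takes just to rewrite one replacement disk from start to finish. The sustained write rate of 150 MB/s is an assumption chosen for illustration, not a measured or vendor figure, and the function name is hypothetical.

```python
def rebuild_hours(disk_tb: float, write_mb_per_s: float) -> float:
    """Hours needed to rewrite a whole disk at a sustained sequential rate."""
    disk_bytes = disk_tb * 1e12               # decimal terabytes, as disk vendors count
    bytes_per_second = write_mb_per_s * 1e6
    return disk_bytes / bytes_per_second / 3600


# Assumption: 150 MB/s sustained write speed for a large nearline disk.
print(f"one 6 TB disk: {rebuild_hours(6, 150):.1f} hours")
```

    This is a lower bound for a single disk in an otherwise idle array: a classic rebuild also has to read the surviving disks and compete with production I/O, so real rebuilds tend to take longer.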

    And what if there are several thousand drives? RADOS handles a disk failure much faster: unlike mdadm, for example, it does not need to re-read every block of the array, including empty ones. It only needs to make additional copies of the blocks that were stored on the disk removed from the storage. Failed storage nodes are handled the same way: RADOS finds the blocks whose number of replicas no longer matches the storage settings. Of course, replicas of a single block are never stored on the same node. For each data set, the block size, the number of copies (replicas) of each block, and the type of media on which the replicas are created (slow, fast, or very fast) are defined in software. For data sets where the main requirement lies in the field of economics,

    Inside RADOS, the CRUSH algorithm (Controlled Replication Under Scalable Hashing) is used: the equipment at each level (racks, servers, disks) is arranged into a hierarchical tree that holds information about the capacity, location, and availability of the disks. Based on this tree, RADOS decides where to store the required number of copies of each block. The system administrator can also edit the tree manually. The same algorithm removes the need for a central directory of block locations: any hardware RADOS participant can answer a request for any data block, which eliminates one more point of failure.
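
    The idea of computing a block's location from the cluster map alone, instead of looking it up in a central table, can be illustrated with a small sketch. This is not the real CRUSH algorithm: it swaps in plain rendezvous (highest-random-weight) hashing, ignores weights and rack-level rules, and uses a made-up cluster map. It only shows that placement is deterministic, that replicas land on different servers, and that removing a server re-places only the blocks that had a copy on it.

```python
import hashlib

# Hypothetical cluster map (rack -> server -> disks); all names are made up.
CLUSTER = {
    "rack1": {"srv1": ["d0", "d1"], "srv2": ["d0", "d1"]},
    "rack2": {"srv3": ["d0", "d1"], "srv4": ["d0", "d1"]},
    "rack3": {"srv5": ["d0", "d1"]},
}


def score(block_id: str, item: str) -> int:
    """Deterministic pseudo-random score for a (block, item) pair."""
    digest = hashlib.sha256(f"{block_id}/{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(block_id: str, cluster: dict, replicas: int = 2) -> list:
    """Pick `replicas` disks for a block, each on a different server."""
    servers = [(rack, srv) for rack, srvs in cluster.items() for srv in srvs]
    # Rank servers by score; the top `replicas` servers hold the copies.
    ranked = sorted(servers, key=lambda rs: score(block_id, rs[1]), reverse=True)
    placement = []
    for rack, srv in ranked[:replicas]:
        # Within a server, the highest-scoring disk stores the copy.
        disk = max(cluster[rack][srv], key=lambda d: score(block_id, f"{srv}/{d}"))
        placement.append((rack, srv, disk))
    return placement


def without_server(cluster: dict, failed: str) -> dict:
    """Return a copy of the cluster map with one server removed."""
    return {rack: {srv: disks for srv, disks in srvs.items() if srv != failed}
            for rack, srvs in cluster.items()}


blocks = [f"block-{i}" for i in range(1000)]
before = {b: place(b, CLUSTER) for b in blocks}
after = {b: place(b, without_server(CLUSTER, "srv3")) for b in blocks}

affected = {b for b in blocks if any(srv == "srv3" for _, srv, _ in before[b])}
moved = {b for b in blocks if before[b] != after[b]}
print(f"{len(affected)} of {len(blocks)} blocks had a replica on srv3")
print("only those blocks were re-placed:", moved == affected)
```

    Any client holding the same map computes the same placement, which is why no lookup service is needed. When `srv3` disappears, the surviving copy of each affected block stays where it was and only the missing replica gets a new home.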

    As a pleasant bonus, the Fujitsu ETERNUS CD10000 can span several data centers. True, the speed of light cannot be cheated: the two halves of the cluster should be no more than 80 kilometers of fiber apart (which still allows placing the cluster in two different cities of the Moscow Region, for example); otherwise the high RTT may keep the storage from working correctly. In this case the storage runs in split-site configuration mode, but it remains a single storage system with a single data set inside.
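
    The 80-kilometer limit can be related to latency with a short calculation. The sketch below estimates the propagation-only round-trip time over optical fiber. The refractive index of 1.47 is an assumed typical value for single-mode fiber, and the function name is made up for this example. Switching, serialization, and software delays are not included.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBER_REFRACTIVE_INDEX = 1.47       # assumption: typical single-mode fiber


def fiber_rtt_ms(distance_km: float,
                 refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Propagation-only round-trip time over a fiber link, in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / refractive_index   # km/s
    return 2 * distance_km / speed_in_fiber * 1000


for km in (10, 80, 500):
    print(f"{km:>4} km of fiber: RTT is at least {fiber_rtt_ms(km):.2f} ms")
```

    Under these assumptions, 80 km of fiber adds a bit under a millisecond to every round trip. A write that has to be acknowledged by a replica at the other site waits for at least one such round trip, so the delay grows in direct proportion to the distance.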

    Thus, we have a storage system that integrates easily into any infrastructure, is fault-tolerant and reliable, is built on high-quality Fujitsu hardware, scales easily to different data volumes and performance requirements, has no performance bottlenecks, and comes with enterprise product status and technical support from a global company with rich experience.

    » Product page on Fujitsu website

    Thank you for your attention, we are ready to answer your questions.
