The future of Linux file systems

    The Vault conference, hosted by the Linux Foundation in Boston in early March of this year, featured a lot of talk about file systems and storage. If you assume nothing new was said there, you are mistaken.




    Linux file systems such as Btrfs, and Linux storage options in general, are constantly evolving. Storage technology on Linux has come a long way since the early days of Linus Torvalds's kernel.
    In recent years, for example, there have been attempts to use flash storage as a server's primary drive, and SSDs to bring data access speeds closer to those of RAM.
    SanDisk, for instance, proposed flash storage based on its InfiniFlash platform as a replacement for hard drives, at a price of $1-2 per gigabyte.



    At the same time, Big Data, cloud computing, and containerization have created a need for new approaches. To meet these challenges, Linux developers are improving existing file systems and storage systems and working on new ones.

    Btrfs
    For example, Chris Mason, a Facebook engineer and one of the Btrfs maintainers, described how Facebook uses this file system. Btrfs is notable because it handles both large numbers of small files and files of up to 16 exabytes, has built-in RAID support, offers built-in compression, and supports a variety of storage devices.
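    Two of the features above can be made concrete with a little arithmetic and a compression test. The sketch below assumes the 16 exabyte limit corresponds to 64-bit byte offsets, i.e. 2^64 bytes, which is 16 binary exabytes (EiB). For compression it uses Python's zlib module, since zlib is one of the algorithms Btrfs can apply transparently; the helper names are hypothetical and this is not how Btrfs itself is implemented.

```python
import os
import zlib

# Assumption: the file-size limit comes from 64-bit byte offsets.
MAX_FILE_BYTES = 2**64
EXBIBYTE = 2**60  # 1 EiB (binary exabyte)


def max_file_size_eib() -> int:
    """Return the 2**64-byte limit expressed in exbibytes."""
    return MAX_FILE_BYTES // EXBIBYTE


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size, using zlib (hypothetical helper)."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    print(max_file_size_eib())  # 16
    # Repetitive, log-like data compresses well; random data does not.
    text_like = b"log line: request served in 12 ms\n" * 1000
    random_like = os.urandom(len(text_like))
    print(f"repetitive data: {compression_ratio(text_like):.3f}")
    print(f"random data:     {compression_ratio(random_like):.3f}")
```

    The contrast between the two ratios is why transparent compression pays off for text-heavy workloads but buys little for data that is already compressed or encrypted.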

    Facebook, of course, makes extensive use of Linux: mainly kernels 3.10 and 3.18, running on its own CentOS-based distribution. For Facebook, Btrfs has been a real lifesaver: the file system is stable and fast under the endless I/O generated by the social network's users. That is the good news. The bad news is that Btrfs performs poorly with traditional database servers such as MySQL, so Facebook uses XFS for those. Both file systems are managed by Gluster, the open source distributed file system.
    Facebook is working closely with leading Linux kernel and Btrfs developers to improve Btrfs performance for database workloads. Mason and his colleagues have already gotten RocksDB to run acceptably on Btrfs.
    RocksDB is a fast key-value store that can serve as the foundation of a client-server database.
    Beyond that, not everything is smooth with Btrfs. It still has bugs that surface in various situations. For example, if you are bold enough to fill a drive almost to capacity, Btrfs can start failing once storage usage passes a certain threshold.
    The Btrfs team is also working on data deduplication. This data-reduction technique works best when storage holds many files that differ only slightly from one another or share many identical pieces, as with backups. As Mason put it: “Not everyone needs it, but those who do really need it!”
    Btrfs is not the only file system seeing serious development, though. John Spray, a lead developer at Red Hat, talked about the Ceph distributed storage system.
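    To illustrate the deduplication idea Mason described for Btrfs, here is a minimal sketch of block-level deduplication: data is split into fixed-size blocks, each block is identified by its SHA-256 digest, and identical blocks are stored only once. The DedupStore class and the 4096-byte block size are assumptions made for illustration; this is not Btrfs's actual implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size (assumption)


class DedupStore:
    """Toy block-level deduplicating store (hypothetical, not Btrfs code)."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}     # digest -> unique block contents
        self.files: dict[str, list[str]] = {}  # file name -> ordered digests

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Keep each distinct block once; later copies only reference it.
            self.blocks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        """Bytes actually kept after deduplication."""
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore()
    backup1 = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    backup2 = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # only the last block differs
    store.write("monday.bak", backup1)
    store.write("tuesday.bak", backup2)
    print(store.stored_bytes())  # 16384: four unique blocks instead of six
```

    Two nearly identical backups cost little more than one, which is exactly the backup-style workload where deduplication shines.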



    Ceph FS
    Ceph is an open source project for flexible, easily scalable, petabyte-scale storage. It pools the disk space of dozens of servers into a single object store that distributes redundant copies of data pseudorandomly across the cluster. On top of this object store, the Ceph developers provide three more components:

    • RADOS Gateway: an S3- and Swift-compatible RESTful interface
    • RBD: a block device with support for thin provisioning and snapshots
    • Ceph FS: a distributed, POSIX-compliant file system


    The underlying object store is called RADOS (Reliable Autonomic Distributed Object Store). Its main components are:
    • OSD (Object Storage Device) daemon: a storage daemon that serves data from an OSD (a physical or logical storage unit). An OSD daemon must run on each storage server in the cluster, and each OSD can be backed by a separate hard drive, a RAID array, an LVM volume, or a Btrfs pool. Three pools are created by default: data, metadata, and rbd.
    • MDS (Metadata Server): stores file system metadata, which is what allows CephFS to present a POSIX file system. If you are not using CephFS, a metadata server is not required.
    • MON (Monitor): a lightweight daemon that handles communication with external applications and clients. It also coordinates the cluster state when a Ceph/RADOS cluster is expanded.
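    The pseudorandom distribution of replicas across OSDs can be sketched with a simple deterministic hash. Ceph itself uses its own placement algorithm, CRUSH, which also accounts for device weights and failure domains; the sketch below swaps in the simpler rendezvous (highest-random-weight) hashing purely to show the principle. The function names and OSD labels are hypothetical.

```python
import hashlib


def _score(object_name: str, osd: str) -> int:
    """Deterministic pseudorandom weight for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Choose `replicas` distinct OSDs for an object via rendezvous hashing."""
    if replicas > len(osds):
        raise ValueError("not enough OSDs for the requested replica count")
    # Rank every OSD by its score for this object and keep the top ones.
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = [f"osd.{i}" for i in range(6)]
    for name in ["photo-001.jpg", "photo-002.jpg", "video-17.mp4"]:
        print(name, "->", place(name, cluster))
```

    Because placement is computed rather than looked up, any client can locate an object without asking a central server, and removing one OSD only moves the objects that had a copy on it.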

    The Ceph developers recommend Btrfs as the file system on the storage nodes, although XFS may be the better choice for production use.

    Since acquiring Inktank, the company behind Ceph, in 2014, Red Hat has been busy preparing CephFS for production use.
    CephFS still has a long way to go: it works, but it lacks important tools for monitoring and for checking and repairing errors.
    Red Hat is currently developing an fsck tool and journaling tools, as well as improving snapshot support, client management, and integration with containers and clouds. For now, according to Spray, only the very brave, or the very foolish, should use CephFS as a file system.

    Odds and ends

    Among the other interesting talks in Boston: Jeff Layton, a lead developer at Primary Data, said he is building power-failure emulation for testing file systems. The feature will be added to the xfstests suite, which, despite its name, supports most of today's popular file systems, not just XFS.
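    The talk summary does not describe how the emulation works, so the following is only a minimal model of the idea behind power-failure testing: a disk whose acknowledged writes sit in a volatile cache until flushed, and a power cut that discards anything not yet flushed. The VolatileCacheDisk class is hypothetical and is not part of xfstests.

```python
from typing import Optional


class VolatileCacheDisk:
    """Hypothetical disk model with a volatile write cache, for crash testing."""

    def __init__(self) -> None:
        self.durable: dict[int, bytes] = {}  # blocks that survived a flush
        self.cache: dict[int, bytes] = {}    # acknowledged but not yet flushed

    def write(self, block: int, data: bytes) -> None:
        self.cache[block] = data

    def read(self, block: int) -> Optional[bytes]:
        # Reads see cached data first, as a real drive would before power loss.
        return self.cache.get(block, self.durable.get(block))

    def flush(self) -> None:
        self.durable.update(self.cache)
        self.cache.clear()

    def power_cut(self) -> None:
        # Simulated power failure: everything not yet flushed is lost.
        self.cache.clear()


if __name__ == "__main__":
    disk = VolatileCacheDisk()
    disk.write(0, b"superblock v1")
    disk.flush()
    disk.write(0, b"superblock v2")  # acknowledged, but only in the cache
    disk.power_cut()
    print(disk.read(0))  # b'superblock v1'
```

    A crash test built on such a model cuts power at many different points and then checks that the file system's on-disk state is still consistent, which is how missing flushes or wrong write ordering get caught.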

    Rik van Riel, a Red Hat developer, discussed the problems of using memory as a data store. Such memory works fine when used either as RAM or as storage, but difficulties arise when you need something more. For example, there is no way to take snapshots for backup while the memory is being used as ordinary RAM.
    There is no solution yet, but developers are working on it. As long as Linux supports so many different file systems and storage systems, there will be plenty of work to do. Technology does not stand still: Linux runs on everything from simple gadgets such as coffee makers to desktops, cloud systems, and supercomputers, each with its own storage requirements.
