Linux virtual file systems: why are they needed and how do they work? Part 1

    Hello! We are opening a new intake for the Linux Administrator course, which starts at the end of April, and this publication is timed to coincide with it. The original material can be found here.

    Virtual file systems are the magic abstraction that makes the Linux philosophy of “everything is a file” possible.

    What is a file system? In the words of Robert Love, an early Linux contributor and author, “A filesystem is a hierarchical storage of data adhering to a specific structure.” However, this description fits VFAT (Virtual File Allocation Table), Git, and Cassandra (a NoSQL database) equally well. So what exactly distinguishes a “file system”?

    File System Basics

    The Linux kernel has specific requirements for an entity that can be considered a file system. It must implement the open(), read(), and write() methods for persistent objects that have names. From the point of view of object-oriented programming, the kernel treats the generic file system as an abstract interface, and these "big three" functions are "virtual", with no default definition. Accordingly, the kernel's default file system implementation is called a virtual file system (VFS).
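
    The abstract-interface idea can be sketched in a few lines of Python. This is only an analogy, not kernel code: the class names GenericFilesystem and MemoryFilesystem are hypothetical, and the real kernel expresses the interface as a C structure of function pointers.

```python
from abc import ABC, abstractmethod


class GenericFilesystem(ABC):
    """Toy stand-in for an abstract filesystem interface (hypothetical name)."""

    @abstractmethod
    def open(self, name):
        """Return a handle for the named object."""

    @abstractmethod
    def read(self, handle, count):
        """Return up to `count` bytes from the handle's current position."""

    @abstractmethod
    def write(self, handle, data):
        """Append `data` and return the number of bytes written."""


class MemoryFilesystem(GenericFilesystem):
    """One concrete implementation: named byte buffers kept in a dict."""

    def __init__(self):
        self._data = {}      # name -> bytearray with the object's contents
        self._handles = []   # handle -> [name, read offset]

    def open(self, name):
        self._data.setdefault(name, bytearray())
        self._handles.append([name, 0])
        return len(self._handles) - 1

    def read(self, handle, count):
        name, offset = self._handles[handle]
        chunk = bytes(self._data[name][offset:offset + count])
        self._handles[handle][1] = offset + len(chunk)
        return chunk

    def write(self, handle, data):
        name, _offset = self._handles[handle]
        self._data[name] += data
        return len(data)
```

    The abstract class cannot be instantiated on its own; only a subclass that supplies all three methods can, which mirrors the way the "big three" have no definition until a concrete file system provides one.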

    If we can open, read, and write to an entity, then it is a file, as the console example above shows.
    VFS is the basis of the famous observation that in Unix-like systems "everything is a file." Consider how odd it is that the tiny /dev/console example above actually works. The picture shows an interactive Bash session. Sending a string to the virtual console device makes it appear on the virtual screen. VFS has other, even odder properties. For example, it is possible to seek in such devices.
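
    The same three calls work on very different entities. The sketch below is a minimal illustration, assuming a Unix-like system where os.devnull names a character device; the helper name write_then_read is made up for this example.

```python
import os
import tempfile


def write_then_read(path, data):
    """Write `data` to `path`, then read back up to len(data) bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(data))
    finally:
        os.close(fd)


def demo():
    """Run the same open/write/read sequence on a regular file and a device."""
    with tempfile.TemporaryDirectory() as tmp:
        from_file = write_then_read(os.path.join(tmp, "example.txt"), b"hello\n")
    from_device = write_then_read(os.devnull, b"hello\n")
    return from_file, from_device


if __name__ == "__main__":
    print(demo())
```

    The regular file returns what was written, while the null device accepts the write and returns nothing on read. The calling code is identical in both cases; only the implementation behind the file descriptor differs.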

    Familiar file systems like ext4, NFS, and /proc all provide definitions of the big-three functions in a C data structure called file_operations. In addition, particular file systems extend and override the VFS functions in the familiar object-oriented way. As Robert Love points out, the VFS abstraction lets Linux users blithely copy files to and from foreign operating systems or abstract entities such as pipes without worrying about their internal data format. On the userspace side, a process can use system calls to copy from a file into the kernel's data structures with the read() method of one file system, and then use the write() method of another file system to output the data.
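
    As a rough userspace illustration of that last point, the following sketch copies bytes from a pipe into a regular temporary file using nothing but read and write calls. The function names copy_fd and demo are hypothetical.

```python
import os
import tempfile


def copy_fd(src_fd, dst_fd, chunk_size=4096):
    """Copy everything from src_fd to dst_fd and return the byte count."""
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:              # an empty read means end of file or closed pipe
            break
        view = memoryview(chunk)
        while view:                # os.write may accept fewer bytes than offered
            written = os.write(dst_fd, view)
            view = view[written:]
            total += written
    return total


def demo():
    """Copy from a pipe into a regular temporary file and read it back."""
    read_end, write_end = os.pipe()
    os.write(write_end, b"sent through a pipe\n")
    os.close(write_end)            # closing the writer signals end of data
    try:
        with tempfile.TemporaryFile() as dst:
            copy_fd(read_end, dst.fileno())
            dst.seek(0)
            return dst.read()
    finally:
        os.close(read_end)


if __name__ == "__main__":
    print(demo())
```

    copy_fd never asks what kind of object is behind either descriptor, so the same loop works between a pipe and a file, or between two different file systems.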

    The function definitions that belong to the basic VFS types are found in the fs/*.c files of the kernel source code, while the subdirectories of fs/ contain the specific file systems. The kernel also contains file-system-like entities such as cgroups, /dev, and tmpfs, which are needed early in the boot process and are therefore defined in the kernel's init/ subdirectory. Note that cgroups, /dev, and tmpfs do not call the "big three" file_operations functions, but read from and write to memory directly.
    The diagram below shows how userspace accesses the various types of file systems commonly mounted on Linux systems. Not shown are constructs such as pipes, dmesg, and POSIX clocks, which also implement struct file_operations and whose accesses therefore pass through the VFS layer.

    VFS is a "shim layer" between system calls and the implementations of specific file_operations, such as ext4 and procfs. The file_operations functions can then communicate either with device drivers or with memory accessors. tmpfs, devtmpfs, and cgroups do not use file_operations, but access memory directly.
    The existence of VFS promotes code reuse, since the basic methods associated with file systems do not need to be re-implemented by every file system type. Code reuse is a widely accepted software engineering practice! However, if the reused code contains serious bugs, all implementations that inherit the common methods suffer from them.

    /tmp: A simple tip

    A simple way to find out which VFSes are present on a system is to run mount | grep -v sd | grep -v :/, which lists all mounted file systems that are neither resident on a disk nor NFS, which is true on most computers. One of the listed VFS mounts will undoubtedly be /tmp, right?
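
    The same filter can be expressed as a small Python sketch. The function name non_disk_mounts is hypothetical, and the script assumes a Unix-like system with a mount command on the PATH.

```python
import subprocess


def non_disk_mounts(mount_output):
    """Mirror `grep -v sd | grep -v :/` on the text printed by `mount`."""
    return [
        line
        for line in mount_output.splitlines()
        if "sd" not in line and ":/" not in line
    ]


if __name__ == "__main__":
    # Assumption: `mount` exists and prints one mounted file system per line.
    output = subprocess.run(
        ["mount"], capture_output=True, text=True, check=True
    ).stdout
    for line in non_disk_mounts(output):
        print(line)
```

    Like the shell pipeline, this is a plain substring match, so it drops any line containing "sd" and keeps disks whose device names follow another scheme, such as NVMe devices.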

    Everyone knows that keeping /tmp on a physical medium is crazy! Source.

    Why is it undesirable to keep /tmp on physical media? Because the files in /tmp are temporary, and storage devices are slower than the memory in which tmpfs is created. Moreover, physical media are more prone to wear from frequent writes than memory is. Finally, files in /tmp may contain sensitive information, so having them disappear on every reboot is a feature.

    Unfortunately, the installation scripts of some Linux distributions still create /tmp on a storage device by default. Do not despair if this is the case on your system. Follow the simple instructions in the Arch Wiki to fix it, and keep in mind that memory allocated to tmpfs is not available for other purposes. In other words, a system with a gigantic tmpfs full of large files can run out of memory and crash. Another tip: when editing the /etc/fstab file, remember that it must end with a newline, otherwise your system will not boot.
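
    The trailing-newline condition is easy to check before rebooting. The sketch below is one possible check; the function names are hypothetical, and the fstab-style line used in the demo is only illustrative sample content.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


def demo():
    """Check two scratch files, one with and one without a final newline."""
    entry = b"tmpfs /tmp tmpfs defaults 0 0"   # illustrative sample line
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "fstab.good")
        bad = os.path.join(tmp, "fstab.bad")
        with open(good, "wb") as handle:
            handle.write(entry + b"\n")
        with open(bad, "wb") as handle:
            handle.write(entry)
        return ends_with_newline(good), ends_with_newline(bad)


if __name__ == "__main__":
    print(ends_with_newline("/etc/fstab"))
```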

    /proc and /sys

    Besides /tmp, the VFSes most familiar to Linux users are /proc and /sys. (/dev resides in shared memory and has no file_operations.) Why these two? Let's look at them in more detail.

    procfs offers userspace a snapshot of the instantaneous state of the kernel and the processes it controls. In /proc, the kernel publishes information about the facilities it provides, such as interrupts, virtual memory, and the scheduler. In addition, /proc/sys is where the parameters configurable with the sysctl command are made available to userspace. The status and statistics of individual processes are reported in per-process directories under /proc/.
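
    The dotted names used by sysctl correspond to paths under /proc/sys, so a parameter can also be read as an ordinary file. The sketch below shows the simple case only; the helper names are hypothetical, and reading a value requires a Linux system with procfs mounted.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as 'kernel.ostype' to its file path."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Read the current value of a parameter (Linux with procfs only)."""
    return sysctl_path(name, root).read_text().strip()


if __name__ == "__main__":
    target = sysctl_path("kernel.ostype")
    if target.exists():
        print(target, "=", read_sysctl("kernel.ostype"))
```

    Parameters whose components themselves contain dots are not handled by this simple split.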

    Here /proc/meminfo is an empty file that nonetheless contains valuable information.

    The behavior of files in /proc shows how unlike on-disk file systems a VFS can be. On the one hand, /proc/meminfo contains the information presented by the free command. On the other hand, it is empty! How can that be? The situation is reminiscent of the famous article “Is the moon there when nobody looks? Reality and the quantum theory,” written by Cornell University physics professor David Mermin in 1985. The truth is that the kernel gathers memory statistics when a process requests them from /proc, and there really is nothing in the files in /proc when no one is looking. As Mermin said, “It is a fundamental quantum doctrine that a measurement does not, in general, reveal a preexisting value of the measured property.” (The question about the moon is left as homework!)
    The apparent emptiness of procfs makes sense, because the information available there is dynamic. The situation with sysfs is different. Let's compare how many files of at least one byte there are in /proc and in /sys.
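
    Both observations can be reproduced with a short script. This is a sketch with hypothetical function names: the first helper compares the size a file claims with the number of bytes a read returns, and the second counts regular files whose claimed size is at least one byte. The /proc and /sys paths in the main block assume a Linux system.

```python
import os
import stat
import tempfile


def reported_and_read_size(path):
    """Return (size claimed by stat, number of bytes actually read)."""
    claimed = os.stat(path).st_size
    with open(path, "rb") as handle:
        actual = len(handle.read())
    return claimed, actual


def count_nonempty_files(root):
    """Count regular files under `root` whose stat size is at least one byte."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            try:
                info = os.lstat(full)
            except OSError:
                continue           # entries in /proc can vanish while we walk
            if stat.S_ISREG(info.st_mode) and info.st_size > 0:
                count += 1
    return count


def demo():
    """Exercise both helpers on an ordinary on-disk directory."""
    with tempfile.TemporaryDirectory() as tmp:
        full = os.path.join(tmp, "full")
        with open(full, "wb") as handle:
            handle.write(b"12345")
        open(os.path.join(tmp, "empty"), "wb").close()
        return reported_and_read_size(full), count_nonempty_files(tmp)


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        print(reported_and_read_size("/proc/meminfo"))
    for root in ("/proc", "/sys"):
        if os.path.isdir(root):
            print(root, count_nonempty_files(root))
```

    On an ordinary disk file the two sizes agree. A file whose contents are generated at read time can claim a size of zero and still return data.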

    procfs has exactly one such file, namely the exported kernel configuration, which is an exception because it needs to be generated only once per boot. /sys, on the other hand, has many larger files, many of which occupy a whole page of memory. Typically, sysfs files contain exactly one number or string, in contrast to the tables of information produced by reading files such as /proc/meminfo.

    The purpose of sysfs is to expose the readable and writable properties of what the kernel calls “kobjects” to userspace. The only purpose of kobjects is reference counting: when the last reference to a kobject is deleted, the system reclaims the resources associated with it. Nevertheless, /sys makes up most of the kernel's famous “stable ABI to userspace,” which no one may ever, under any circumstances, “break.” This does not mean that the files in sysfs are static, which would be contrary to reference counting of volatile objects.
    The kernel's stable ABI constrains what can appear in /sys, not what is actually present at any given moment. Listing the permissions on files in sysfs gives an idea of how the configurable parameters of devices, modules, file systems, and so on can be set or read. Logic compels the conclusion that procfs is also part of the kernel's stable ABI, although the documentation does not state this explicitly.
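
    Reference counting itself can be illustrated with a toy class. This is a loose analogy in Python, not kernel code: the class CountedObject is hypothetical, and its get() and put() methods merely echo the naming of the kernel's kobject_get() and kobject_put().

```python
class CountedObject:
    """Toy reference-counted object; an analogy only, not a kobject."""

    def __init__(self, name, release):
        self.name = name
        self._release = release    # called once, when the last reference goes
        self._refcount = 1         # the creator holds the first reference

    def get(self):
        """Take an additional reference."""
        if self._refcount <= 0:
            raise RuntimeError(f"{self.name} has already been released")
        self._refcount += 1
        return self

    def put(self):
        """Drop one reference and release resources when none remain."""
        if self._refcount <= 0:
            raise RuntimeError(f"{self.name} has already been released")
        self._refcount -= 1
        if self._refcount == 0:
            self._release(self)


def demo():
    """Return the release log after one and after two put() calls."""
    released = []
    obj = CountedObject("example", lambda o: released.append(o.name))
    obj.get()                      # a second user takes a reference
    obj.put()                      # first user is done; object must survive
    after_first_put = list(released)
    obj.put()                      # last reference dropped; release runs
    return after_first_put, released


if __name__ == "__main__":
    print(demo())
```

    The object outlives the first put() because another reference is still held, which is why entries in /sys can appear and disappear as the objects behind them come and go.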

    Files in sysfs each describe exactly one property of an entity and may be readable, writable, or both. The “0” in the file indicates that the SSD is not removable.
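
    A single-value attribute file and its permissions can be inspected as follows. This is a sketch with hypothetical function names; it classifies access by the owner permission bits only and is demonstrated on a scratch file rather than on a real sysfs entry.

```python
import os
import stat
import tempfile


def describe_attribute(path):
    """Return (value, access) for a file holding one number or string."""
    mode = os.stat(path).st_mode
    readable = bool(mode & stat.S_IRUSR)
    writable = bool(mode & stat.S_IWUSR)
    if readable and writable:
        access = "read-write"
    elif readable:
        access = "read-only"
    elif writable:
        access = "write-only"
    else:
        access = "none"
    value = None
    if readable:
        try:
            with open(path, "r") as handle:
                value = handle.read().strip()
        except OSError:
            value = None           # readable by the owner, but not by us
    return value, access


def demo():
    """Describe a scratch file under two different permission settings."""
    with tempfile.TemporaryDirectory() as tmp:
        attr = os.path.join(tmp, "attribute")
        with open(attr, "w") as handle:
            handle.write("0\n")
        os.chmod(attr, 0o444)
        read_only = describe_attribute(attr)
        os.chmod(attr, 0o644)
        read_write = describe_attribute(attr)
        return read_only, read_write


if __name__ == "__main__":
    print(demo())
```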

    The second part of the translation begins with how to observe VFS using the eBPF and bcc tools. In the meantime, we look forward to your comments and, as usual, invite you to an open webinar, which will be held on April 9 by our teacher, Vladimir Drozdetsky.

    Second part.
