Linux virtual file systems: why are they needed and how do they work? Part 1
Hello! We keep launching new cohorts of the courses you already love, and we are glad to announce a new intake for the Linux Administrator course, which starts at the end of April. This publication is timed to coincide with that launch. The original material can be found here.
Virtual file systems are the seemingly magical abstraction that makes the Linux philosophy of “everything is a file” possible.
What is a file system? In the words of early Linux contributor and author Robert Love, “A filesystem is a hierarchical storage of data adhering to a specific structure.” However, that description applies equally well to VFAT (Virtual File Allocation Table), Git, and Cassandra (a NoSQL database). So what exactly distinguishes a file system?
File System Basics
The Linux kernel has specific requirements for an entity to be considered a file system: it must implement the open(), read(), and write() methods for persistent objects that have names. From the point of view of object-oriented programming, the kernel treats the generic file system as an abstract interface, and these big three functions are "virtual", with no default definition. Accordingly, the kernel's default file system implementation is called a virtual file system (VFS).
If we can open, read, and write to an entity, then that entity is considered a file, as the console example above shows.
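The object-oriented view can be made concrete with a toy model. The sketch below is plain Python, not kernel code: ToyFilesystem and ToyMemoryFS are hypothetical names invented for illustration, and the real kernel interface is a C structure with many more operations. It only shows the idea of three "virtual" methods with no default definition that a concrete file system has to supply.

```python
from abc import ABC, abstractmethod


class ToyFilesystem(ABC):
    """Hypothetical stand-in for the generic file system interface."""

    @abstractmethod
    def open(self, name):
        """Return a handle for the named object."""

    @abstractmethod
    def read(self, handle, count):
        """Return up to count bytes from the object behind handle."""

    @abstractmethod
    def write(self, handle, data):
        """Append data to the object and return the number of bytes written."""


class ToyMemoryFS(ToyFilesystem):
    """One concrete implementation: named byte buffers kept in memory."""

    def __init__(self):
        self._data = {}      # name -> bytearray with the contents
        self._names = []     # handle -> name
        self._offsets = []   # handle -> current read offset

    def open(self, name):
        self._data.setdefault(name, bytearray())
        self._names.append(name)
        self._offsets.append(0)
        return len(self._names) - 1

    def read(self, handle, count):
        buf = self._data[self._names[handle]]
        start = self._offsets[handle]
        chunk = bytes(buf[start:start + count])
        self._offsets[handle] += len(chunk)
        return chunk

    def write(self, handle, data):
        self._data[self._names[handle]] += data
        return len(data)


fs = ToyMemoryFS()
handle = fs.open("greeting")
fs.write(handle, b"hello")
print(fs.read(handle, 5))  # prints b'hello'
```

Trying to instantiate ToyFilesystem directly raises TypeError, which mirrors the point that the generic interface has no definition of its own for the big three.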
VFS underlies the famous observation that in Unix-like systems "everything is a file." Consider how strange it is that the small example above with /dev/console actually works. The picture shows an interactive Bash session. Sending a string to the virtual console device makes it appear on the virtual screen. VFS has other, even stranger properties. For example, it is possible to seek in such files.
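The same idea can be tried from userspace with a short Python sketch that writes to a device using exactly the calls it would use for a regular file. It uses /dev/null (through os.devnull) rather than /dev/console, because writing to the console usually needs elevated privileges. The helper names file_kind and write_bytes are illustrative and not part of any library.

```python
import os
import stat


def file_kind(path):
    """Classify a path by the file type stored in its mode bits."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


def write_bytes(path, data):
    """Open, write, and close a path; the calls are the same for any file type."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, data)  # number of bytes accepted by the target
    finally:
        os.close(fd)


if os.path.exists(os.devnull):
    print(file_kind(os.devnull))  # on Linux this is a character device
    print(write_bytes(os.devnull, b"hello\n"))
```

Nothing in write_bytes knows whether the target is a disk file or a device; the dispatch to the right implementation happens below the system call.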
Familiar file systems like ext4, NFS, and /proc all provide definitions of the big three functions in a C data structure called file_operations. In addition, particular file systems extend and override the VFS functions in the familiar object-oriented way. As Robert Love notes, the VFS abstraction lets Linux users blithely copy files to or from foreign operating systems or abstract entities such as pipes without worrying about their internal data format. On behalf of userspace, via a system call, a process can copy from a file into the kernel's data structures using the read() method of one file system, and then use the write() method of another file system to output the data.
The definitions of the functions that belong to the base VFS type are in the fs/*.c files of the kernel source, while the subdirectories of fs/ contain the specific file systems. The kernel also contains file-system-like entities, such as cgroups, /dev, and tmpfs, which are required early in the boot process and are therefore defined in the kernel's init/ subdirectory. Note that cgroups, /dev, and tmpfs do not call the big three file_operations functions, but read from and write to memory directly.
The diagram below shows how userspace accesses the various types of file systems typically mounted on Linux systems. Not shown are constructs such as pipes, dmesg, and POSIX clocks, which also implement the file_operations structure and whose accesses therefore pass through the VFS layer.
VFS is a "shim layer" between system calls and the implementors of specific file_operations, such as ext4 and procfs. The file_operations functions can then communicate either with device drivers or with memory accessors. tmpfs, devtmpfs, and cgroups do not use file_operations, but access memory directly.
The existence of VFS promotes code reuse, since the basic methods associated with file systems do not need to be re-implemented by each file system type. Code reuse is a widely accepted software engineering practice! However, if the reused code contains serious errors, all implementations that inherit the common methods suffer from them.
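The copy described earlier, where read() on one kind of file feeds write() on another, can be sketched from userspace in a few lines of Python. This is only an illustration under simple assumptions: copy_fd is a hypothetical helper, the source is an anonymous pipe, and the destination is a temporary file on whatever file system holds the temporary directory.

```python
import os
import tempfile


def copy_fd(src_fd, dst_fd, chunk_size=4096):
    """Copy bytes from src_fd to dst_fd until end of file; return the total."""
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:  # an empty read means end of file
            break
        view = memoryview(chunk)
        while view:  # os.write may accept fewer bytes than it was given
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(chunk)
    return total


# Source: a pipe, which has no on-disk format at all.
read_end, write_end = os.pipe()
os.write(write_end, b"data from a pipe\n")
os.close(write_end)  # closing the write end lets the reader see end of file

# Destination: an ordinary temporary file.
with tempfile.TemporaryFile() as out:
    copied = copy_fd(read_end, out.fileno())
    os.close(read_end)
    out.seek(0)
    print(copied, out.read())
```

copy_fd never inspects what kind of object sits behind either descriptor, which is the convenience the VFS abstraction provides.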
/tmp: A simple tip
A simple way to find out which VFSes are present on a system is to enter mount | grep -v sd | grep -v :/, which on most computers lists all mounted file systems that are neither resident on a disk nor NFS. One of the listed VFS mounts will undoubtedly be /tmp, right?
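For readers who prefer to see the filter spelled out, here is a rough Python equivalent of the two grep -v stages. The function name virtual_mounts and the sample lines are invented for illustration; real input would be the output of mount on your own machine.

```python
def virtual_mounts(mount_lines):
    """Keep lines containing neither 'sd' nor ':/', like the two grep -v filters."""
    return [
        line
        for line in mount_lines
        if "sd" not in line and ":/" not in line
    ]


# Invented sample in the style of mount output, not taken from a real system.
SAMPLE = [
    "/dev/sda1 on / type ext4 (rw,relatime)",
    "tmpfs on /tmp type tmpfs (rw,nosuid,nodev)",
    "server:/export on /mnt/nfs type nfs4 (rw)",
    "proc on /proc type proc (rw,nosuid,nodev,noexec)",
]

for line in virtual_mounts(SAMPLE):
    print(line)
```

Like the original command, this is a heuristic based on substrings: it hides SCSI/SATA disks named sd* and NFS exports written as host:/path, but a disk with a different device name would still be listed.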
Everyone knows that storing /tmp on a physical medium is crazy! Source.
Why is it undesirable to store /tmp on physical media? Because the files in /tmp are temporary, and storage devices are slower than memory, where tmpfs is created. Moreover, physical media are more subject to wear from frequent writes than memory is. Finally, files in /tmp may contain sensitive information, so having them disappear at every reboot is a feature.
Unfortunately, the installation scripts of some Linux distributions still create /tmp on a storage device by default. Do not despair if this is the case on your system. Follow the simple instructions on the Arch Wiki to fix it, and remember that memory allocated to tmpfs is not available for other purposes. In other words, a system with a gigantic tmpfs holding large files can run out of memory and crash. Another tip: when editing /etc/fstab, remember to end it with a newline, otherwise your system may fail to boot.
/proc and /sys
Besides /tmp, the VFSes most familiar to Linux users are /proc and /sys. (/dev resides in shared memory and has no file_operations.) Why these two in particular? Let's take a closer look.
procfs offers userspace a snapshot of the instantaneous state of the kernel and the processes it controls. In /proc, the kernel publishes information about the facilities it provides, such as interrupts, virtual memory, and the scheduler. In addition, /proc/sys is where the parameters that can be configured with the sysctl command are accessible to userspace. The status and statistics of individual processes are reported in per-process directories under /proc/.
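The link between sysctl and /proc/sys is easy to see from userspace: a dotted setting name corresponds to a file path under /proc/sys. The Python sketch below assumes the simple case where every dot is a path separator; sysctl_path and read_sysctl are hypothetical helper names, and the root parameter exists only so the functions can be tried against a scratch directory.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted name such as 'kernel.ostype' to its file under root."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Read a setting by reading the corresponding file, as any program could."""
    return sysctl_path(name, root).read_text().strip()


if sysctl_path("kernel.ostype").exists():
    print(read_sysctl("kernel.ostype"))
```

The simplification breaks down for names whose components themselves contain dots, so treat this as a sketch of the mapping rather than a replacement for the sysctl command.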
/proc/meminfo is an empty file that nonetheless contains valuable information.
The behavior of /proc files shows how unlike on-disk file systems a VFS can be. On the one hand, /proc/meminfo contains the information presented by the free command. On the other hand, it is empty! How can that be? The situation is reminiscent of a famous article written in 1985 by Cornell University physics professor David Mermin, “Is the moon there when nobody looks? Reality and the quantum theory.” The fact is that the kernel gathers memory statistics when a process requests them from /proc, and there is actually nothing in the files in /proc when no one is looking. As Mermin said, “It is a fundamental quantum doctrine that a measurement does not, in general, reveal a preexisting value of the measured property.” (Think about the moon as homework!)
The apparent emptiness of procfs makes sense because the information there is dynamic. The situation with sysfs is somewhat different. Let's compare how many files of at least one byte there are in /proc and in /sys.
procfs has exactly one, namely the exported kernel configuration, which is an exception because it needs to be generated only once per boot. /sys, on the other hand, has many larger files, many of which occupy a whole page of memory. Typically, sysfs files contain exactly one number or string, unlike the tables of information obtained by reading files such as /proc/meminfo.
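Both observations, the "empty" file with content and the count of files of at least one byte, come down to comparing the size reported by stat with what a read actually returns. The Python sketch below shows one way to check this; the function names are invented for illustration, and the results on a real system depend on the kernel and its configuration.

```python
import os
import stat


def reported_and_actual_size(path):
    """Return (size reported by stat, number of bytes actually read)."""
    reported = os.stat(path).st_size
    with open(path, "rb") as f:
        actual = len(f.read())
    return reported, actual


def count_nonempty_files(root):
    """Count regular files under root whose reported size is at least one byte."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entries can vanish between listing and stat
            if stat.S_ISREG(st.st_mode) and st.st_size >= 1:
                count += 1
    return count


if os.path.exists("/proc/meminfo"):
    print(reported_and_actual_size("/proc/meminfo"))
```

On a typical Linux system the first number for /proc/meminfo is 0 while the second is not, because the content is produced only when the file is read. Calling count_nonempty_files on /proc or /sys works the same way but can take a while, since both trees are large.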
The purpose of sysfs is to expose the readable and writable properties of what the kernel calls "kobjects" to userspace. The sole purpose of kobjects is reference counting: when the last reference to a kobject is deleted, the system reclaims the resources associated with it. Nevertheless, /sys makes up the bulk of the kernel's famous “stable ABI to userspace”, which no one may ever, under any circumstances, “break”. This does not mean that the files in sysfs are static, which would contradict reference counting of volatile objects.
The kernel's stable application binary interface (ABI) instead limits what can appear in /sys, not what is actually present at any given moment. Listing the permissions on files in sysfs gives an idea of how the configurable parameters of devices, modules, file systems, and so on can be set or read. Logic compels the conclusion that procfs is also part of the kernel's stable ABI, although the documentation does not state this explicitly.
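Listing permissions can also be done programmatically. The Python sketch below reads the owner permission bits of a file and reports whether the attribute is readable, writable, or both. The name access_kind is illustrative, and the sysfs path in the example is an assumption that will differ from machine to machine.

```python
import os
import stat


def access_kind(path):
    """Describe owner access from the permission bits, as ls -l would show it."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    readable = bool(mode & stat.S_IRUSR)
    writable = bool(mode & stat.S_IWUSR)
    if readable and writable:
        return "read-write"
    if readable:
        return "read-only"
    if writable:
        return "write-only"
    return "no owner access"


example = "/sys/block/sda/removable"  # assumed path; device names vary
if os.path.exists(example):
    print(example, access_kind(example))
```

The function looks only at the mode bits, so it reports what the kernel advertises for the attribute, not whether the current user is actually allowed to open it.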
Files in sysfs each describe exactly one property of an entity and may be readable, writable, or both. The “0” in the file indicates that the SSD is not removable.
The second part of the translation begins with how to observe VFS using the eBPF and bcc tools. In the meantime, we look forward to your comments and, as usual, invite you to an open webinar, which our instructor Vladimir Drozdetsky will hold on April 9.
Second part.