Finding the mkdir system call code

In my first year with Ubuntu, I became curious about how the kernel of this OS works. I decided to figure it out in all seriousness, downloaded the 80-megabyte source archive, and... that was it! I had no idea where to start or where it would end. I opened random files one after another and got lost immediately. I suspect every experienced Linux user has been through this. Now that I have gained some experience, I would like to share it.

In this article I will explain how I searched for the mkdir system call code.

To begin with, the mkdir function is declared in sys/stat.h. Its prototype (as it appears in glibc) is:

/* Create a new directory named PATH, with permission bits MODE.  */
extern int mkdir (__const char *__path, __mode_t __mode)
     __THROW __nonnull ((1));


This header is part of the POSIX standard. Linux, together with its C library, is almost fully POSIX-compliant, so it has to provide mkdir with exactly this signature.
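From user space, this prototype is all a program ever sees. A minimal sketch of calling it through glibc, assuming Linux and a writable /tmp; try_mkdir is a hypothetical helper name. It returns 0 on success or the negated errno value, so the failure case is easy to inspect:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: call the POSIX mkdir() wrapper and report the outcome
   as 0 on success or -errno on failure (e.g. -EEXIST if the path exists). */
static int try_mkdir(const char *path, mode_t mode)
{
    if (mkdir(path, mode) == 0)
        return 0;
    return -errno;
}
```

Calling try_mkdir twice on the same path should return 0 the first time and -EEXIST the second.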

But even knowing the signature, finding the actual system call code is far from easy...

Indeed, running ack "int mkdir" over the kernel sources returns:

security/inode.c
103: static int mkdir(struct inode *dir, struct dentry *dentry, int mode)

tools/perf/util/util.c
4: int mkdir_p(char *path, mode_t mode)

tools/perf/util/util.h
259: int mkdir_p(char *path, mode_t mode);


Clearly, none of these signatures matches. So where is the mkdir implementation? And, more generally, how do you find the implementation of a system call in the Linux kernel?

System calls do not work like regular functions; strictly speaking, they are not functions at all. Invoking one takes a bit of assembly. On 32-bit x86 the convention is as follows: the system call number goes into the EAX register (system calls are identified by number, not by address), and the arguments go into the remaining registers: the first in EBX, the second in ECX, the third in EDX, the fourth in ESI, the fifth in EDI, and the sixth in EBP. This is why a system call cannot take more than six arguments. Once all the values are in place, the program that wants to make the system call raises software interrupt 128 (in assembly: int 0x80). The interrupt switches the processor into kernel mode and transfers control to an entry point that the kernel set up in advance. As you can see, system calls operate at a lower level than C functions.
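To make the register convention concrete, here is a sketch of issuing mkdir directly with GCC/Clang inline assembly, bypassing the C library. It assumes Linux; raw_mkdir is a hypothetical name. The i386 branch follows the int 0x80 convention described above. On x86-64 the kernel is entered with the syscall instruction instead, with the number in RAX and the first two arguments in RDI and RSI. Other architectures fall back to glibc's generic syscall() function. Note that the kernel itself never sets errno: it returns a negative error code in EAX/RAX.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/stat.h>
#include <sys/syscall.h>  /* SYS_* system call numbers */
#include <unistd.h>       /* syscall(), rmdir() */

/* Hypothetical helper. Returns the kernel's raw result: 0 on success,
   -errno on failure. */
static long raw_mkdir(const char *path, mode_t mode)
{
#if defined(__x86_64__)
    long ret;
    /* x86-64: number in RAX, arguments in RDI and RSI.
       The syscall instruction clobbers RCX and R11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_mkdir), "D"(path), "S"((long)mode)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__i386__)
    long ret;
    /* i386: number in EAX, arguments in EBX and ECX, trap via int 0x80. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"((long)SYS_mkdir), "b"(path), "c"((long)mode)
                      : "memory");
    return ret;
#else
    /* Other architectures: use glibc's generic syscall(). Some (e.g. arm64)
       only have mkdirat, so call it with AT_FDCWD. syscall() reports errors
       via errno, so convert back to the kernel's -errno convention. */
    long r = syscall(SYS_mkdirat, AT_FDCWD, path, mode);
    return r == -1 ? -errno : r;
#endif
}
```

Keeping the kernel's raw return convention makes the result of the trap visible as is: a second call on the same path yields -EEXIST rather than -1.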

The number of each system call can be found in /usr/include/asm*/unistd.h:
#define __NR_mkdir                              83
__SYSCALL(__NR_mkdir, sys_mkdir)


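That is, on x86-64 the mkdir system call is number 83. The numbers differ between architectures: on 32-bit x86, which uses the int 0x80 convention described above, mkdir is number 39.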
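Rather than reading the header by hand, a program can ask the compiler which number the target ABI uses. A small sketch, assuming glibc's <sys/syscall.h>, which exposes the kernel's __NR_mkdir as SYS_mkdir; mkdir_syscall_number is a hypothetical helper. On architectures such as arm64 that only provide mkdirat, SYS_mkdir is not defined at all:

```c
#include <sys/syscall.h>  /* SYS_mkdir, i.e. __NR_mkdir for the target ABI */

/* Hypothetical helper: the mkdir system call number of the ABI this file is
   compiled for, or -1 if the architecture only provides mkdirat. */
static long mkdir_syscall_number(void)
{
#ifdef SYS_mkdir
    return SYS_mkdir;
#else
    return -1;
#endif
}
```

On an x86-64 build this should return 83; on an i386 build, 39.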

If you have written user-space programs for Linux, you most likely know that system calls are usually made through C functions. Where do they come from? They are thin wrappers from the GNU C library (glibc); nearly every system call has one. Inside, the wrappers perform the same kind of trap into the kernel described above.
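The wrappers also translate the kernel's return convention. The kernel returns a negative error code; the usual glibc convention on Linux is that a raw result between -4095 and -1 is negated into errno and the wrapper returns -1. A sketch under that assumption; wrap_kernel_result and my_mkdir are hypothetical names, and my_mkdir relies on glibc's generic syscall() function, which already performs this translation itself:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>       /* syscall(), rmdir() */

/* Hypothetical helper mirroring the wrapper convention: a raw kernel result
   in [-4095, -1] is an error code; store it in errno and return -1.
   Any other value is passed through unchanged. */
static long wrap_kernel_result(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}

/* Hypothetical mkdir wrapper built on glibc's syscall(), which applies the
   convention above before returning. */
static int my_mkdir(const char *path, mode_t mode)
{
#ifdef SYS_mkdir
    return (int)syscall(SYS_mkdir, path, mode);
#else
    return (int)syscall(SYS_mkdirat, AT_FDCWD, path, mode);
#endif
}
```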

So where should we look for mkdir now? In theory, a system call could be implemented purely in assembly, in which case no corresponding C function would exist, but that is not how Linux does it. Every system call has a C declaration in include/linux/syscalls.h:
asmlinkage long sys_mkdir(const char __user *pathname, int mode);


The implementation lives in the relevant part of the kernel. Here you just need to know that mkdir belongs to the VFS (virtual file system) layer and is defined in fs/namei.c:
SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
{
	return sys_mkdirat(AT_FDCWD, pathname, mode);
}
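In other words, mkdir(path, mode) is treated as mkdirat(AT_FDCWD, path, mode): AT_FDCWD tells mkdirat to resolve a relative path against the current working directory. User space can use the same pair of calls. A minimal sketch, assuming POSIX.1-2008 (open with O_DIRECTORY, mkdirat); mkdir_in is a hypothetical helper that creates a directory relative to an open directory descriptor:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>      /* open(), O_DIRECTORY, AT_FDCWD */
#include <sys/stat.h>   /* mkdir(), mkdirat() */
#include <unistd.h>     /* close(), rmdir() */

/* Hypothetical helper: create `name` inside the directory `parent` using a
   directory file descriptor instead of a full path.
   Returns 0 on success, -errno on failure. */
static int mkdir_in(const char *parent, const char *name, mode_t mode)
{
    int dfd = open(parent, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -errno;
    /* Capture the result before close(), which may overwrite errno. */
    int rc = mkdirat(dfd, name, mode) == 0 ? 0 : -errno;
    close(dfd);
    return rc;
}
```

After mkdir_in(base, "a", 0755) succeeds, both mkdir() and mkdirat(AT_FDCWD, ...) on the full path fail with EEXIST, because all three resolve to the same directory.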


SYSCALL_DEFINE2 is one of the SYSCALL_DEFINEx family of macros, where x is the number of arguments the system call takes. The code above calls another system call, sys_mkdirat, which is also defined in fs/namei.c. Note that here the system call is invoked as an ordinary function call, because the calling code is already running in kernel mode.
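The real SYSCALL_DEFINEx machinery in include/linux/syscalls.h does considerably more (tracing metadata, argument sign-extension wrappers), but its core trick, pairing each type with its argument name and pasting sys_ onto the system call name, can be sketched in plain C. The MAP*/DECL helpers below are simplified analogues of the kernel's own mapping macros, and the toy_ system calls are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>   /* AT_FDCWD */

#ifndef __user
#define __user       /* sparse annotation; expands to nothing outside the kernel */
#endif

/* Turn "type, name, type, name, ..." into "type name, type name, ...". */
#define DECL(t, a) t a
#define MAP1(m, t, a) m(t, a)
#define MAP2(m, t, a, ...) m(t, a), MAP1(m, __VA_ARGS__)
#define MAP3(m, t, a, ...) m(t, a), MAP2(m, __VA_ARGS__)
#define MAP(n, ...) MAP##n(__VA_ARGS__)

/* SYSCALL_DEFINEx(2, _name, ...) produces "long sys_name(type arg, ...)". */
#define SYSCALL_DEFINEx(x, sname, ...) long sys##sname(MAP(x, DECL, __VA_ARGS__))
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)

/* Expands to: long sys_toy_mkdirat(int dfd, const char *pathname, int mode) */
SYSCALL_DEFINE3(toy_mkdirat, int, dfd, const char __user *, pathname, int, mode)
{
    (void)mode;
    if (!pathname || pathname[0] == '\0')
        return -ENOENT;
    return dfd == AT_FDCWD ? 0 : -EBADF;   /* toy rule: only AT_FDCWD is valid */
}

/* Expands to: long sys_toy_mkdir(const char *pathname, int mode) */
SYSCALL_DEFINE2(toy_mkdir, const char __user *, pathname, int, mode)
{
    /* Same delegation as in fs/namei.c: a plain C call, no trap needed. */
    return sys_toy_mkdirat(AT_FDCWD, pathname, mode);
}
```

With these definitions, sys_toy_mkdir("newdir", 0755) simply forwards to sys_toy_mkdirat(AT_FDCWD, "newdir", 0755), just as sys_mkdir forwards to sys_mkdirat.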

SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
{
	struct dentry *dentry;
	struct path path;
	int error;
	dentry = user_path_create(dfd, pathname, &path, 1);
	if (IS_ERR(dentry))
		return PTR_ERR(dentry);
	if (!IS_POSIXACL(path.dentry->d_inode))
		mode &= ~current_umask();
	error = security_path_mkdir(&path, dentry, mode);
	if (!error)
		error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
	done_path_create(&path, dentry);
	return error;
}
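The mode handling in mkdirat is simple bit arithmetic. The sketch below mirrors it, assuming has_posix_acl stands in for IS_POSIXACL(): when the file system supports POSIX ACLs, the umask is not applied here but left to the ACL code, which typically applies it only if the parent directory has no default ACL. mkdirat_mode is a hypothetical name:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical mirror of the branch in mkdirat:
   if (!IS_POSIXACL(...)) mode &= ~current_umask(); */
static mode_t mkdirat_mode(mode_t requested, mode_t umask_bits, bool has_posix_acl)
{
    if (!has_posix_acl)
        requested &= ~umask_bits;   /* clear every bit set in the umask */
    return requested;
}
```

For example, a request for 0777 with the common umask 022 yields 0755.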


Now things get interesting! Here we meet the first checks, and control is once again handed off to another function, vfs_mkdir, which is defined in the same fs/namei.c:

int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
	int error = may_create(dir, dentry);
	unsigned max_links = dir->i_sb->s_max_links;
	if (error)
		return error;
	if (!dir->i_op->mkdir)
		return -EPERM;
	mode &= (S_IRWXUGO|S_ISVTX);
	error = security_inode_mkdir(dir, dentry, mode);
	if (error)
		return error;
	if (max_links && dir->i_nlink >= max_links)
		return -EMLINK;
	error = dir->i_op->mkdir(dir, dentry, mode);
	if (!error)
		fsnotify_mkdir(dir, dentry);
	return error;
}


Another check, another handoff. Linux is a heavily layered system in which responsibilities are spread across different subsystems, so it is no surprise that the code above keeps delegating. The last snippet calls dir->i_op->mkdir(dir, dentry, mode). Let's follow the trail! dir has type struct inode *. From the definition of struct inode we learn that i_op is a pointer to struct inode_operations. That structure holds pointers to the functions implementing the operations that can be performed on the inode, and each file system supplies its own implementations. In other words, depending on which file system dir belongs to, its inode_operations structure points to that file system's functions.
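The dispatch through dir->i_op is ordinary C: a table of function pointers filled in by each file system. A heavily simplified model, with hypothetical toy_ types that keep only the fields vfs_mkdir() touches, shows both halves: a generic layer performing the same checks in the same order as vfs_mkdir() (may_create() and the security hook omitted), and two file systems, one of whose directories have no mkdir operation at all:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

#define S_IRWXUGO (S_IRWXU | S_IRWXG | S_IRWXO)   /* 0777, as in the kernel */

/* Hypothetical, simplified stand-ins for struct inode and
   struct inode_operations. */
struct toy_inode;
struct toy_inode_operations {
    int (*mkdir)(struct toy_inode *dir, const char *name, unsigned mode);
};
struct toy_inode {
    const struct toy_inode_operations *i_op;  /* chosen by the file system */
    unsigned i_nlink;
    unsigned s_max_links;                     /* 0 means "no limit" */
};

/* A toy file system that supports mkdir... */
static unsigned toyfs_last_mode;
static int toyfs_mkdir(struct toy_inode *dir, const char *name, unsigned mode)
{
    (void)name;
    toyfs_last_mode = mode;    /* record what the generic layer passed down */
    dir->i_nlink++;            /* the new child's ".." links back to dir */
    return 0;
}
static const struct toy_inode_operations toyfs_dir_iops = { .mkdir = toyfs_mkdir };

/* ...and one whose directories have no mkdir operation. */
static const struct toy_inode_operations nodirs_iops = { .mkdir = NULL };

/* Generic layer: checks in vfs_mkdir() order, then per-fs dispatch. */
static int toy_vfs_mkdir(struct toy_inode *dir, const char *name, unsigned mode)
{
    if (!dir->i_op->mkdir)
        return -EPERM;
    mode &= (S_IRWXUGO | S_ISVTX);   /* keep rwx and sticky, drop setuid/setgid */
    if (dir->s_max_links && dir->i_nlink >= dir->s_max_links)
        return -EMLINK;
    return dir->i_op->mkdir(dir, name, mode);
}
```

Passing mode 06755 through toy_vfs_mkdir reaches the file system as 0755: the setuid and setgid bits are stripped, while the permission and sticky bits survive.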

For ext4, for example, the mkdir implementation is in fs/ext4/namei.c.

And here, at last, is the code we were looking for:
static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
	handle_t *handle;
	struct inode *inode;
	struct buffer_head *dir_block = NULL;
	struct ext4_dir_entry_2 *de;
	struct ext4_dir_entry_tail *t;
	unsigned int blocksize = dir->i_sb->s_blocksize;
	int csum_size = 0;
	int err, retries = 0;
	if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
		csum_size = sizeof(struct ext4_dir_entry_tail);
	if (EXT4_DIR_LINK_MAX(dir))
		return -EMLINK;
	dquot_initialize(dir);
retry:
	handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
					EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
					EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	if (IS_DIRSYNC(dir))
		ext4_handle_sync(handle);
	inode = ext4_new_inode(handle, dir, S_IFDIR | mode,
			       &dentry->d_name, 0, NULL);
	err = PTR_ERR(inode);
	if (IS_ERR(inode))
		goto out_stop;
	inode->i_op = &ext4_dir_inode_operations;
	inode->i_fop = &ext4_dir_operations;
	inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
	if (!(dir_block = ext4_bread(handle, inode, 0, 1, &err))) {
		if (!err) {
			err = -EIO;
			ext4_error(inode->i_sb,
				   "Directory hole detected on inode %lu\n",
				   inode->i_ino);
		}
		goto out_clear_inode;
	}
	BUFFER_TRACE(dir_block, "get_write_access");
	err = ext4_journal_get_write_access(handle, dir_block);
	if (err)
		goto out_clear_inode;
	de = (struct ext4_dir_entry_2 *) dir_block->b_data;
	de->inode = cpu_to_le32(inode->i_ino);
	de->name_len = 1;
	de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len),
					   blocksize);
	strcpy(de->name, ".");
	ext4_set_de_type(dir->i_sb, de, S_IFDIR);
	de = ext4_next_entry(de, blocksize);
	de->inode = cpu_to_le32(dir->i_ino);
	de->rec_len = ext4_rec_len_to_disk(blocksize -
					   (csum_size + EXT4_DIR_REC_LEN(1)),
					   blocksize);
	de->name_len = 2;
	strcpy(de->name, "..");
	ext4_set_de_type(dir->i_sb, de, S_IFDIR);
	set_nlink(inode, 2);
	if (csum_size) {
		t = EXT4_DIRENT_TAIL(dir_block->b_data, blocksize);
		initialize_dirent_tail(t, blocksize);
	}
	BUFFER_TRACE(dir_block, "call ext4_handle_dirty_metadata");
	err = ext4_handle_dirty_dirent_node(handle, inode, dir_block);
	if (err)
		goto out_clear_inode;
	set_buffer_verified(dir_block);
	err = ext4_mark_inode_dirty(handle, inode);
	if (!err)
		err = ext4_add_entry(handle, dentry, inode);
	if (err) {
out_clear_inode:
		clear_nlink(inode);
		unlock_new_inode(inode);
		ext4_mark_inode_dirty(handle, inode);
		iput(inode);
		goto out_stop;
	}
	ext4_inc_count(handle, dir);
	ext4_update_dx_flag(dir);
	err = ext4_mark_inode_dirty(handle, dir);
	if (err)
		goto out_clear_inode;
	unlock_new_inode(inode);
	d_instantiate(dentry, inode);
out_stop:
	brelse(dir_block);
	ext4_journal_stop(handle);
	if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
		goto retry;
	return err;
}
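The rec_len arithmetic in ext4_mkdir() can be checked by hand. A sketch that reproduces the kernel's EXT4_DIR_REC_LEN rounding (an 8-byte entry header of inode, rec_len, name_len and file_type, plus the name, rounded up to a multiple of 4) and the 12-byte struct ext4_dir_entry_tail reserved when metadata_csum is enabled; dotdot_rec_len and EXT4_DIR_TAIL_SIZE are hypothetical names:

```c
/* Mirrors the kernel's EXT4_DIR_REC_LEN(): header + name, 4-byte aligned. */
#define EXT4_DIR_PAD 4
#define EXT4_DIR_ROUND (EXT4_DIR_PAD - 1)
#define EXT4_DIR_REC_LEN(name_len) (((name_len) + 8 + EXT4_DIR_ROUND) & ~EXT4_DIR_ROUND)

/* Hypothetical name for sizeof(struct ext4_dir_entry_tail), i.e. csum_size
   in ext4_mkdir() when metadata_csum is enabled. */
#define EXT4_DIR_TAIL_SIZE 12

/* rec_len that ext4_mkdir() gives "..": everything after the "." entry up
   to the optional checksum tail at the end of the block. */
static unsigned dotdot_rec_len(unsigned blocksize, int metadata_csum)
{
    unsigned csum_size = metadata_csum ? EXT4_DIR_TAIL_SIZE : 0;
    return blocksize - (csum_size + EXT4_DIR_REC_LEN(1));
}
```

For a 4096-byte block with metadata_csum, "." takes 12 bytes, ".." takes 4072, and the tail takes the last 12, so the three entries exactly fill the block.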


Understanding how the kernel works is a useful skill for any Linux user. I hope this article was helpful!

Resources:
Linux system call table
Linux sources 3.6
Discussion on unix.stackexchange.com
