Learning Linux Processes

In this article I would like to talk about the life cycle of processes in the Linux family of operating systems. Using theory and examples, I will look at how processes are born and how they die, and touch briefly on the mechanics of system calls and signals.

This article is aimed mainly at newcomers to system programming and at those who simply want to learn a little more about how Linux processes work.

Everything written below is valid for Debian Linux with kernel 4.15.0.

Introduction

System software interacts with the kernel through special functions called system calls. In rare cases there is an alternative API, such as procfs or sysfs, implemented as virtual file systems.

Process Attributes

Inside the kernel, a process is represented simply as a structure with many fields (struct task_struct, defined in include/linux/sched.h in the kernel sources).
But since this article is about system programming rather than kernel development, we will abstract a little and focus only on the fields that matter to us:

Process ID (pid)
Open File Descriptors (fd)
Signal handlers
Current working directory (cwd)
Environment variables (environ)
Return code

Process life cycle

Birth of a process

Only one process in the system is born in a special way: init is created directly by the kernel. All other processes appear by duplicating the current process with the fork(2) system call. After fork(2) returns, we have two almost identical processes, which differ only in the following points:

fork(2) returns the PID of the child to the parent, and 0 to the child;
The child's PPID (Parent Process ID) is set to the PID of the parent.

Once fork(2) completes, all the resources of the child process are a copy of the parent's resources. Copying a process together with all of its allocated memory pages is expensive, so the Linux kernel uses the Copy-On-Write technique.
All memory pages of the parent are marked read-only and become available to both the parent and the child. As soon as one of the processes writes to a particular page, the page itself is left unchanged; instead, a copy is made and the write goes to the copy. The original is "detached" from that process. Once the read-only original remains "attached" to only one process, the page is made read-write again.

An example of a simple, useless program with fork(2)

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>

int main() {
    int pid = fork();
    switch (pid) {
        case -1:
            perror("fork");
            return -1;
        case 0:
            // Child
            printf("my pid = %i, returned pid = %i\n", getpid(), pid);
            break;
        default:
            // Parent
            printf("my pid = %i, returned pid = %i\n", getpid(), pid);
            break;
    }
    return 0;
}

$ gcc test.c && ./a.out
my pid = 15594, returned pid = 15595
my pid = 15595, returned pid = 0

State "ready"

Immediately after fork(2) completes, the new process goes into the "ready" state.
In effect, the process stands in a queue and waits for the kernel scheduler to let it run on a processor.

State "running"

As soon as the scheduler puts the process on a processor, the "running" state begins. The process may run for the whole time slice (quantum) it was given, or it may give way to other processes by using the sched_yield system call.

Rebirth into another program

Some programs are built so that the parent process creates a child process to solve a task. The child then handles one specific problem, while the parent only delegates tasks to its children. For example, a web server creates a child for each incoming connection and hands the connection over to it.
However, if you need to start another program, you have to resort to the execve(2) system call:

int execve(const char *filename, char *const argv[], char *const envp[]);

or library calls execl(3), execlp(3), execle(3), execv(3), execvp(3), execvpe(3):

int execl(const char *path, const char *arg, ... /* (char *) NULL */);
int execlp(const char *file, const char *arg, ... /* (char *) NULL */);
int execle(const char *path, const char *arg, ...
           /*, (char *) NULL, char *const envp[] */);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execvpe(const char *file, char *const argv[], char *const envp[]);

All of these calls execute the program whose path is given in the first argument. On success, control is transferred to the loaded program and never returns to the original one. The loaded program keeps the attributes of the process, with two exceptions: file descriptors marked with O_CLOEXEC are closed, and signal handlers installed by the old program are reset to their defaults, since the handler code no longer exists in the new program image.

How do you avoid getting lost among all these calls and choose the right one? It is enough to understand the naming logic:

All calls begin with exec.
The fifth letter specifies how arguments are passed:
- l stands for list: all parameters are passed as arg1, arg2, ..., NULL;
- v stands for vector: all parameters are passed in a null-terminated array.
Next may come the letter p, which stands for path. If the file argument does not contain a "/" character, the file is searched for in the directories listed in the PATH environment variable.
The last letter may be e, which stands for environ. In such calls the last argument is a null-terminated array of null-terminated strings of the form key=value: the environment variables that will be passed to the new program.

An example of calling /bin/cat --help via execve

#define _GNU_SOURCE
#include <unistd.h>

int main() {
    char* args[] = { "/bin/cat", "--help", NULL };
    execve("/bin/cat", args, environ);
    // Unreachable
    return 1;
}

$ gcc test.c && ./a.out
Usage: /bin/cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.
*Output truncated*

The exec* family of calls can also run scripts, provided they have execute permission and start with a shebang sequence (#!).

An example of running a script with a spoofed PATH using execle

#define _GNU_SOURCE
#include <unistd.h>

int main() {
    char* e[] = {"PATH=/habr:/rulez", NULL};
    // The terminator of a variadic argument list must be a pointer, hence the cast
    execle("/tmp/test.sh", "test.sh", (char *)NULL, e);
    // Unreachable
    return 1;
}

$ cat test.sh
#!/bin/bash
echo $0
echo $PATH
$ gcc test.c && ./a.out
/tmp/test.sh
/habr:/rulez

By convention, argv[0] matches the first argument (the path or file name) passed to the exec* family functions. However, this convention can be broken.

Example of when cat becomes a dog using execlp

#define _GNU_SOURCE
#include <unistd.h>

int main() {
    // "cat" is the file to run, "dog" becomes argv[0]
    execlp("cat", "dog", "--help", (char *)NULL);
    // Unreachable
    return 1;
}

$ gcc test.c && ./a.out
Usage: dog [OPTION]... [FILE]...
*Output truncated*

A curious reader may notice that the signature int main(int argc, char* argv[]) contains a number, the argument count, while nothing of the kind is passed to the exec* family of functions. Why? Because when a program starts, control is not transferred to main immediately. The kernel counts the entries of argv while performing execve and places the count on the stack of the new program; startup code provided by glibc then runs first, picks up argc and argv, and only after that calls main.

State "waiting"

Some system calls can take a long time, for example I/O. In such cases the process goes into the "waiting" state. As soon as the system call completes, the kernel moves the process back to the "ready" state.
Linux also has a second "waiting" state, in which the process does not respond to signals (uninterruptible sleep). In this state the process becomes "unkillable", and all incoming signals stay pending until the process leaves the state.
The kernel itself chooses which of the two states to put the process into. Most often, processes that request I/O end up in the "waiting (uninterruptible)" state. This is especially noticeable when using a remote disk (NFS) over a slow network connection.

State "stopped"

At any time, you can pause the execution of the process by sending it a SIGSTOP signal. The process will go to the “stopped” state and remain there until it receives a signal to continue working (SIGCONT) or die (SIGKILL). The remaining signals will be queued.

Process completion

No program can terminate itself. It can only ask the system to do so through the _exit system call, or be terminated by the system because of an error. Even returning a number from main() implicitly ends in a call to _exit.
Although the argument of the system call is an int, only the low byte of the number is used as the return code.

State "zombie"

Immediately after a process has terminated (no matter whether correctly or not), the kernel records how the process ended and moves it into the "zombie" state. In other words, a zombie is a process that has already finished but of which the kernel still keeps a record.
Moreover, this is the second state in which a process can safely ignore the SIGKILL signal, because what is dead cannot die again.

Forgetting

The return code and the reason for completing the process are still stored in the kernel and need to be retrieved from there. To do this, you can use the appropriate system calls:

pid_t wait(int *wstatus); /* Equivalent to waitpid(-1, wstatus, 0) */
pid_t waitpid(pid_t pid, int *wstatus, int options);

All the information about how the process terminated fits into a single int. To extract the return code and the reason for termination, use the macros described in the waitpid(2) man page.

An example of correct completion and receipt of a return code

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>

int main() {
    int pid = fork();
    switch (pid) {
        case -1:
            perror("fork");
            return -1;
        case 0:
            // Child
            return 13;
        default: {
            // Parent
            int status;
            waitpid(pid, &status, 0);
            printf("exit normally? %s\n", (WIFEXITED(status) ? "true" : "false"));
            printf("child exitcode = %i\n", WEXITSTATUS(status));
            break;
        }
    }
    return 0;
}

$ gcc test.c && ./a.out
exit normally? true
child exitcode = 13

Example of incorrect termination

Passing NULL as argv[0] causes a crash.

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>

int main() {
    int pid = fork();
    switch (pid) {
        case -1:
            perror("fork");
            return -1;
        case 0:
            // Child: argv[0] is deliberately NULL
            execl("/bin/cat", (char *)NULL);
            return 13;
        default: {
            // Parent
            int status;
            waitpid(pid, &status, 0);
            if (WIFEXITED(status)) {
                printf("Exit normally with code %i\n", WEXITSTATUS(status));
            }
            if (WIFSIGNALED(status)) {
                printf("killed with signal %i\n", WTERMSIG(status));
            }
            break;
        }
    }
    return 0;
}

$ gcc test.c && ./a.out
killed with signal 6

There are cases in which the parent terminates before the child. In such cases init becomes the parent of the child, and it will call wait(2) when the time comes.

After the parent has collected the information about the death of the child, the kernel erases everything it knew about the child, so that another process can soon take its place.

Acknowledgments

Thanks to Sasha “Al” for editing and design assistance;

Thanks to Sasha “Reisse” for clear answers to difficult questions.

They bravely endured the inspiration that attacked me and the flurry of my questions that attacked them.
