Learning Linux Processes
In this article, I would like to talk about the life cycle of processes in the Linux family of operating systems. Using theory and examples, I will look at how processes are born and die, and touch on the mechanics of system calls and signals.
This article is aimed mainly at newcomers to system programming and at those who simply want to learn a little more about how Linux processes work.
Everything written below is valid for Debian Linux with kernel 4.15.0.
Introduction
System software interacts with the system kernel through special functions - system calls. In rare cases, there is an alternative API, for example, procfs or sysfs, implemented as virtual file systems.
Process Attributes
In the kernel, a process is represented simply as a structure with many fields (struct task_struct; its definition can be read in the kernel sources).
But since this article is about system programming rather than kernel development, we will abstract a little and focus only on the process fields that matter to us:
- Process ID (pid)
- Open File Descriptors (fd)
- Signal handlers
- Current working directory (cwd)
- Environment variables (environ)
- Return code
Process life cycle
Process birth
Only one process in the system is born in a special way: init. It is spawned directly by the kernel. All other processes appear by duplicating the current process with the fork(2) system call. After fork(2) returns, we have two almost identical processes that differ in the following points:
- fork(2) returns the PID of the child to the parent, and 0 to the child;
- the child's PPID (Parent Process ID) is set to the PID of the parent.
Once fork(2) completes, all the resources of the child process are a copy of the parent's resources. Copying a process together with all of its allocated memory pages is expensive, so the Linux kernel uses the copy-on-write technique. All pages of the parent's memory are marked read-only and become accessible to both the parent and the child. As soon as one of the processes writes to a page, the page itself is left untouched; instead, the kernel makes a copy and applies the write to that copy. The original is “detached” from that process. Once the read-only original remains “attached” to only one process, the page is made read-write again.
An example of a simple, useless program that uses fork(2)
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>

int main() {
    int pid = fork();
    switch (pid) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        // Child
        printf("my pid = %i, returned pid = %i\n", getpid(), pid);
        break;
    default:
        // Parent
        printf("my pid = %i, returned pid = %i\n", getpid(), pid);
        break;
    }
    return 0;
}
$ gcc test.c && ./a.out
my pid = 15594, returned pid = 15595
my pid = 15595, returned pid = 0
State "ready"
Immediately after fork(2) completes, the new process enters the “ready” state. In effect, the process sits in a queue and waits for the kernel scheduler to let it run on a CPU.
State "running"
As soon as the scheduler dispatches the process, it enters the “running” state. The process may run for its whole allotted time slice (quantum), or it may give way to other processes by using the sched_yield system call.
Rebirth into another program
In some programs, the logic is such that the parent process creates a child process to solve a task. The child solves one specific problem, while the parent only delegates tasks to its children. For example, on an incoming connection a web server creates a child and hands the connection over to it for processing.
However, if you need to start another program, you have to resort to the execve(2) system call:
int execve(const char *filename, char *const argv[], char *const envp[]);
or to the library calls execl(3), execlp(3), execle(3), execv(3), execvp(3), execvpe(3):
int execl(const char *path, const char *arg, ... /* (char *) NULL */);
int execlp(const char *file, const char *arg, ... /* (char *) NULL */);
int execle(const char *path, const char *arg, ...
           /*, (char *) NULL, char * const envp[] */);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execvpe(const char *file, char *const argv[], char *const envp[]);
All of the listed calls execute the program whose path is given in the first argument. On success, control is transferred to the loaded program and never returns to the original one. The loaded program keeps the fields of the process structure, with two exceptions: file descriptors marked O_CLOEXEC are closed, and signals that had handlers installed are reset to their default actions. How do you keep from getting confused by all these calls and choose the right one? It is enough to understand the naming logic:
- All calls begin with exec
- The fifth letter specifies how the arguments are passed:
  - l stands for list: all parameters are passed as arg1, arg2, ..., NULL;
  - v stands for vector: all parameters are passed in a null-terminated array.
- Next may come the letter p, which stands for path. If the file argument does not contain a "/" character, the specified file is searched for in the directories listed in the PATH environment variable.
- Last may come the letter e, which stands for environ. In such calls, the last argument is a null-terminated array of null-terminated strings of the form key=value: the environment variables that will be passed to the new program.
Example of calling /bin/cat --help via execve
#define _GNU_SOURCE
#include <unistd.h>

int main() {
    char* args[] = { "/bin/cat", "--help", NULL };
    execve("/bin/cat", args, environ);
    // Unreachable
    return 1;
}
$ gcc test.c && ./a.out
Usage: /bin/cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.
*Output truncated*
The exec* family of calls also lets you run scripts that have execute permission and start with a shebang sequence (#!).
An example of running a script with a spoofed PATH using execle
#define _GNU_SOURCE
#include <unistd.h>

int main() {
    char* e[] = {"PATH=/habr:/rulez", NULL};
    // The terminating NULL of a variadic call must be cast to a pointer type
    execle("/tmp/test.sh", "test.sh", (char *) NULL, e);
    // Unreachable
    return 1;
}
$ cat test.sh
#!/bin/bash
echo $0
echo $PATH
$ gcc test.c && ./a.out
/tmp/test.sh
/habr:/rulez
By convention, argv[0] is the name of the program being run, that is, it corresponds to the first argument of the exec* family functions. However, this convention can be broken.
Example of when cat becomes a dog using execlp
#define _GNU_SOURCE
#include <unistd.h>

int main() {
    execlp("cat", "dog", "--help", (char *) NULL);
    // Unreachable
    return 1;
}
$ gcc test.c && ./a.out
Usage: dog [OPTION]... [FILE]...
*Output truncated*
A curious reader may notice that the signature int main(int argc, char* argv[]) contains a number, the argument count, yet nothing of the kind is passed to the exec* family of functions. Why? Because when a program starts, control is not transferred to main right away. Before that, the kernel counts the entries of argv while building the new program's stack, and the glibc startup code passes that count to main as argc.
State "waiting"
Some system calls can take a long time, I/O for example. In such cases, the process goes into the “waiting” state. As soon as the system call completes, the kernel moves the process back to the “ready” state.
In Linux, there is also an uninterruptible “waiting” state in which the process does not react to signals. In this state the process becomes “unkillable”, and all incoming signals are queued until the process leaves this state.
The kernel itself chooses which of the two states to move the process into. Most often, processes that request I/O end up in the uninterruptible “waiting” state. This is especially noticeable when working with a remote disk (NFS) over a slow network connection.
State "stopped"
At any time, you can pause a process by sending it the SIGSTOP signal. The process goes into the “stopped” state and remains there until it receives a signal to continue (SIGCONT) or to die (SIGKILL). All other signals are queued.
Process completion
No program can terminate on its own. It can only ask the system to do so through the _exit system call, or be terminated by the system because of an error. Even when main() returns a number, _exit is still called implicitly. Although the system call takes an int argument, only the low byte of that number is used as the return code.
Zombie state
Immediately after a process terminates (whether normally or not), the kernel records how the process ended and moves it into the “zombie” state. In other words, a zombie is a process that has already terminated but whose record is still kept in the kernel.
Moreover, this is the second state in which a process can safely ignore the SIGKILL signal, because what is dead cannot die again.
Forgetting
The return code and the reason for completing the process are still stored in the kernel and need to be retrieved from there. To do this, you can use the appropriate system calls:
pid_t wait(int *wstatus); /* Equivalent to waitpid(-1, wstatus, 0) */
pid_t waitpid(pid_t pid, int *wstatus, int options);
All the information about how a process terminated fits into an int. The macros described in the waitpid(2) man page are used to extract the return code and the reason for termination.
An example of normal termination and retrieval of the return code
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>

int main() {
    int pid = fork();
    switch (pid) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        // Child
        return 13;
    default: {
        // Parent
        int status;
        waitpid(pid, &status, 0);
        printf("exit normally? %s\n", (WIFEXITED(status) ? "true" : "false"));
        printf("child exitcode = %i\n", WEXITSTATUS(status));
        break;
    }
    }
    return 0;
}
$ gcc test.c && ./a.out
exit normally? true
child exitcode = 13
Example of abnormal termination
Passing NULL as argv[0] causes a crash.
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>

int main() {
    int pid = fork();
    switch (pid) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        // Child: argv[0] is deliberately NULL
        execl("/bin/cat", (char *) NULL);
        return 13;
    default: {
        // Parent
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status)) {
            printf("Exit normally with code %i\n", WEXITSTATUS(status));
        }
        if (WIFSIGNALED(status)) {
            printf("killed with signal %i\n", WTERMSIG(status));
        }
        break;
    }
    }
    return 0;
}
$ gcc test.c && ./a.out
killed with signal 6
There are cases in which the parent terminates earlier than the child. In such cases, init becomes the parent of the child, and it will call wait(2) when the time comes. After the parent has collected the information about the child's death, the kernel erases all information about the child so that another process can soon take its place.
Acknowledgments
Thanks to Sasha “Al” for editing and design assistance;
Thanks to Sasha “Reisse” for clear answers to difficult questions.
They bravely endured the inspiration that attacked me and the flurry of my questions that attacked them.