fork() vs. vfork()

Listen up!
After all, if the stars are lit, then does anyone need this?

V.V. Mayakovsky, 1914


I work on software for embedded systems, and I wrote this article to better understand when to use the fork() and vfork() system calls. People are often advised not to use the second one, but it clearly appeared for a reason.

Let's see when and why it is better to use this or that call.

As a bonus, I will describe the vfork()/fork() implementations in our project. My interest is primarily in using these calls in embedded systems, and the main feature of these implementations is that they work without virtual memory. Perhaps Habr readers who are well versed in system programming and embedded systems will offer advice and share their experience.

If you are interested, read on.

Let's start with the definition, that is, with the POSIX standard in which these functions are defined:

fork() creates an exact copy of the process, with the exception of a few variables. On success it returns zero to the child process and the child's process ID to the parent (after which the processes begin to "live their own lives").

vfork() is defined as fork() with the following restriction: the behavior is undefined if the process created by it does any of the following:
  • returns from the function in which vfork() was called;
  • calls any function other than _exit() or exec*();
  • modifies any data other than the variable that stores the value returned by vfork().

In order to understand why there is a system call with such strong limitations, you need to understand what an exact copy of the process is.

One of the first links in a search engine on this topic in Russian is a description of process cloning options in Linux. It follows from it that some parameters can be made common for the parent and child processes:
  • Address space (CLONE_VM);
  • File System Information (CLONE_FS);
  • Table of open files (CLONE_FILES);
  • Signal handler table (CLONE_SIGHAND);
  • Parent process (CLONE_PARENT).

POSIX does not allow variables to be changed after vfork(), which suggests that the difference lies in how the address space is cloned. This link confirms the assumption:
Unlike fork(), vfork() does not create a copy of the parent process; instead, the child shares the parent's address space until it calls _exit() or one of the exec functions.
The parent process is suspended during this time. All the usage restrictions follow from this: the child process must not change any global variables, or indeed any variables shared with the parent process.

In other words, if this statement is true, both processes see the same data after vfork() is called.
Let's do an experiment. If the statement is true, changes made to data in the child process should be visible in the parent process, and vice versa.

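Code that checks the assumption.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
static int create_process(void) {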
    pid_t pid;
    int status;
    int common_variable;
    common_variable = 0;
    pid = fork();
    if (-1 == pid) {
        return errno;
    }
    if (pid == 0) {
        /* Executing in the child process */
        common_variable = 1;
        exit(EXIT_SUCCESS);
    }
    waitpid(pid, &status, 0);
    if (common_variable) {
        puts("vfork(): common variable has been changed.");
    } else {
        puts("fork(): common variable hasn't been changed.");
    }
    return EXIT_SUCCESS;
}
int main(void) {
    return create_process();
}

If you build and run this program, you get the output:
fork(): common variable hasn't been changed.

When fork() is replaced with vfork(), the output changes:
vfork(): common variable has been changed.

Many people rely on this property to pass data between processes, even though POSIX leaves the behavior of such programs undefined. This is probably the source of the problems behind the advice not to use vfork().

Indeed, it is one thing when a developer deliberately changes the value of a variable, and quite another when they forget that the child process must not, for example, return from the function in which vfork() was called (doing so destroys the parent's stack frame). And even when you act deliberately, you use undocumented behavior at your own risk, as always.

Here are a couple of less obvious problems:
  • The book “Secure Programming for Linux and Unix HOWTO” points out that even if the child process changes no data in the high-level language code, the same may not hold for the machine code (for example, because of hidden temporary variables).
  • This blog discusses the following question: what if vfork() is called in a multithreaded application? Consider the Linux implementation of vfork(): the manual says that the parent process is suspended by the call, but in fact only the calling thread is (which, of course, is easier to implement). This means the child process runs in parallel with the parent's other threads, which may, for example, change the privileges of the parent process. At that point things get very bad: there are two processes with different privileges in the same address space, which opens a security hole.

Now consider the functions of the exec*() family. They are the only functions (apart from _exit()) that may be called in a process created by vfork(). They create a new address space and then load code and data from the specified file into it. The old address space is essentially destroyed in the process.
Therefore, if a process is created with fork() and then calls exec*(), creating (copying) the address space in fork() was wasted work. It is also a fairly expensive operation and may account for most of the time spent in fork(). Wikipedia, for example, pays the most attention to this point and, unlike the standard, states explicitly:
The fork() operation creates a separate address space for the child. The child process has an exact copy of all the memory segments of the parent process.

Of course, on most modern systems with virtual memory nothing is actually copied; all memory pages of the parent process are simply marked copy-on-write. However, the kernel still has to walk the entire page table hierarchy, and that takes time.

It follows that vfork() should be faster than fork(), which the Linux man page also mentions.

Let's run another experiment to confirm this. We modify the previous example slightly: add a loop that creates 1000 processes, and remove the shared variable and the screen output.

The resulting code.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
static int create_process(void) {
    pid_t pid;
    int status;
    pid = vfork();
    if (-1 == pid) {
        return errno;
    }
    if (pid == 0) {
        /* child */
        /* after vfork() only _exit() or exec*() may be called */
        _exit(EXIT_SUCCESS);
    }
    waitpid(pid, &status, 0);
    return EXIT_SUCCESS;
}
int main(void) {
    int i;
    for (i = 0; i < 1000; i ++) {
        create_process();
    }
    return EXIT_SUCCESS;
}

Run through the time command.
Output when using fork():
real    0m0.135s
user    0m0.000s
sys     0m0.052s

Output when using vfork():
real    0m0.028s
user    0m0.000s
sys     0m0.016s

The result is impressive, to say the least. The numbers vary slightly from run to run, but vfork() is consistently 4 to 5 times faster.

The conclusions are as follows:
  • fork() is the heavier call, and where vfork() can be used, it is the better choice.
  • vfork() is the less safe call: it is easier to shoot yourself in the foot with it, so it has to be used with care.
  • fork()/vfork() should be used where the process needs separate resources (inodes, user, working directory); in other cases it is worth using pthread*, which is faster still.
  • fork() is best used when a separate address space is really needed. However, it is very difficult to implement on small processor platforms without hardware support for virtual memory.

Before moving on to the second part of the article, I should note that POSIX also has the posix_spawn() function. In effect it combines vfork() and exec(), so it avoids the problems associated with vfork() while not re-creating the address space the way fork() does.

Now let's move on to our implementation of fork () / vfork () without MMU support.

vfork() implementation


When implementing vfork() in our system, we assumed that the call should work as follows: the parent goes into a waiting state, and vfork() returns in the child first; the child wakes the parent when it calls _exit() or exec*(). This means that the child can run on the parent's stack, but with its own resources of other kinds: inodes, a signal table, and so on.

The various kinds of resources in our project are stored in a task (struct task). This structure describes all the resources of a process, including available memory, inodes, and the list of threads belonging to the process. A task always has a main thread, the one created when the task is initialized. A thread in our system is a schedulable entity; my colleague's article covers this in more detail. Since the stack belongs to the thread rather than to the task, there are two possible implementations:
  • Switch the newly created thread to the parent's stack;
  • “Swap” the task for a new one while keeping the same thread of execution.

One way or another, the task will have to be created, or rather inherited from the parent: the table of signals, environment variables, and so on will be cloned. However, the address space will not be inherited.

vfork() returns twice: once in the child and once in the parent. So the registers of the stack frame from which vfork() was called have to be saved somewhere. They cannot simply stay on the stack, since the child process may overwrite these values while it runs. However, the vfork() signature does not provide for a buffer, so the registers are first pushed onto the stack and only then copied somewhere into the parent task. Saving the registers on the stack could be done with a system call, but we decided to do without one and handle it ourselves. Naturally, the vfork() function is written in assembly.
Code for i386 architecture.
vfork:
    subl    $28, %esp;          
    pushl   %ds;
    pushl   %es;
    pushl   %fs;
    pushl   %gs;
    pushl   %eax;    
    pushl   %ebp;    
    pushl   %edi;    
    pushl   %esi;    
    pushl   %edx;    
    pushl   %ecx;    
    pushl   %ebx;          
    movl    PT_END(%esp), %ecx;
    movl    %ecx, PT_EIP(%esp);
    pushf;                      
    popl    PT_EFLAGS(%esp);    
    movl    %esp, %eax;         
    addl    $PT_END+4, %eax;
    movl    %eax, PT_ESP(%esp);
    push    %esp;               
    call    vfork_body

Thus, the registers are first saved on the stack, and then the C function vfork_body() is called. It receives a pointer to a structure holding the register set as its argument.
Mentioned structure for i386.
typedef struct pt_regs {
    /* Pushed by SAVE_ALL. */
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
    uint32_t esi;
    uint32_t edi;
    uint32_t ebp;
    uint32_t eax;
    uint32_t gs;
    uint32_t fs;
    uint32_t es;
    uint32_t ds;
    /* Pushed at the very beginning of entry. */
    uint32_t trapno;
    /* In some cases pushed by processor, in some - by us. */
    uint32_t err;
    /* Pushed by processor. */
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    /* Pushed by processor, if switching of rings occurs. */
    uint32_t esp;
    uint32_t ss;
} pt_regs_t;

The vfork_body() code is architecture-independent. It is responsible for creating the task and saving the registers needed for the return.
Code of vfork_body().
void __attribute__((noreturn)) vfork_body(struct pt_regs *ptregs) {
        struct task *child;
        pid_t child_pid;
        struct task_vfork *task_vfork;
        int res;
        /* can vfork only in single thread application */
        assert(thread_self() == task_self()->tsk_main);
        /* create task description but not start its main thread */
        child_pid = task_prepare("");
        if (0 > child_pid) {
                /* error */
                ptregs_retcode_err_jmp(ptregs, -1, child_pid);
                panic("vfork_body returning");
        }
        child = task_table_get(child_pid);
        /* save ptregs for parent return from vfork() */
        task_vfork = task_resource_vfork(child->parent);
        memcpy(&task_vfork->ptregs, ptregs, sizeof(task_vfork->ptregs));
        res = vfork_child_start(child);
        if (res < 0) {
                /* Could not start child process */
                /* Exit child task */
                vfork_child_done(child, vfork_body_exit_stub, &res);
                /* Return to the parent */
                ptregs_retcode_err_jmp(&task_vfork->ptregs, -1, -res);
        }
        panic("vfork_body returning");
}

A little explanation for the code.
First, the code checks that the task is single-threaded (the problems multithreading causes with vfork() were discussed above). Then a new task is created, and if that succeeds, the registers needed to return from vfork() are saved in the parent task (see task_resource_vfork(child->parent)).
After that, vfork_child_start() is called, which, as the name implies, “starts” the child process. The quotes are deliberate: the task may actually be started later, depending on the specific implementation, and our project has two of them. Before describing them, let's look at the _exit() and exec*() functions.
When either of them is called, the parent thread must be unblocked. We will say that this is the moment the child process starts as a separate entity in the system.

Code of execv()
int execv(const char *path, char *const argv[]) {
    struct task *task;
    /* save starting arguments for the task */
    task = task_self();
    task_resource_exec(task, path, argv);
    /* if vforked then unblock parent and  start execute new image */
    vfork_child_done(task, task_exec_callback, NULL);
    return 0;
}
The other functions of the exec*() family are implemented in terms of execv().

Code of _exit()
void _exit(int status) {
    struct task *task;
    task = task_self();
    vfork_child_done(task, task_exit_callback, (void *)status);
    task_start_exit();
    {
        task_do_exit(task, TASKST_EXITED_MASK | (status & TASKST_EXITST_MASK));
        kill(task_get_id(task_get_parent(task)), SIGCHLD);
    }
    task_finish_exit();
    panic("Returning from _exit");
}

As the code above shows, the parent process is unblocked by vfork_child_done(), which takes a handler as one of its parameters. A particular implementation has to provide the following:
  • vfork_child_start(): called at the start of cloning; it must block the parent process;
  • vfork_child_done(): called when the child process finally starts; it unblocks the parent process;
  • task_exit_callback(): the handler that terminates the child process;
  • task_exec_callback(): the handler that fully launches the child process.


First implementation


The idea of the first implementation is to share not only the stack but also the thread of control. In this case, all that is needed is to “swap” the current thread's task for the child until the child task fully starts, that is, until vfork_child_done() is called.

Code of vfork_child_start()
int vfork_child_start(struct task *child) {
        thread_set_task(thread_self(), child);
        /* mark as vforking */
        task_vfork_start(child);
        /* Restore values of the registers and return 0 */
        ptregs_retcode_jmp(&task_resource_vfork(child->parent)->ptregs, 0);
        panic("vfork_child_start returning");
        return -1;
}

Here is what happens: thread_set_task() binds the current thread of execution (that is, the parent's) to the child process, which only requires changing the corresponding pointer in the current thread's structure. From then on, whenever the thread accesses task resources, it sees the child task rather than the parent. For example, when the thread asks which task it belongs to (task_self()), it gets the child task.

After this, the child task is marked as created by vfork(); vfork_child_done() needs this flag to behave correctly (more on this later).
Then the registers saved when vfork() was called are restored. Recall that, according to POSIX, vfork() must return zero to the child process, which is exactly what the ptregs_retcode_jmp(ptregs, 0) call does.

As already mentioned, when the child process calls _exit() or execv(), the vfork_child_done() function must unblock the parent thread. In addition, the child task has to be prepared to run the requested handler.

Code of vfork_child_done()
void vfork_child_done(struct task *child, void * (*run)(void *), void *arg) {
        struct task_vfork *vfork_data;
        if (!task_is_vforking(child)) {
                return;
        }
        task_vfork_end(child);
        /* pass the handler argument on (e.g. the exit status from _exit()) */
        task_start(child, run, arg);
        thread_set_task(thread_self(), child->parent);
        vfork_data = task_resource_vfork(child->parent);
        ptregs_retcode_jmp(&vfork_data->ptregs, child->tsk_id);
}

Code of the task_exit_callback() handler
void *task_exit_callback(void *arg) {
    _exit((int)arg);
    return arg;
}

Code of the task_exec_callback() handler
void *task_exec_callback(void *arg) {
    int res;
    res = exec_call();
    return (void*)res;
}

vfork_child_done() has to handle the case where exec()/_exit() is called without a preceding vfork(): then it simply returns, because there is no parent to unblock and the caller can proceed directly. If the process was created by vfork(), the following happens. First, task_vfork_end() clears the is_vforking flag on the child task; then the child task's main thread is finally started. Its entry point is the run function, which must be one of the handlers described earlier (task_exec_callback or task_exit_callback); they exist precisely to support vfork(). After that, the current thread is rebound: it points at the parent task again instead of the child. Finally, control returns to the parent from the vfork() call, with the child's process ID as the return value. As mentioned above, this is done by ptregs_retcode_jmp().

Second vfork implementation



The idea of the second implementation is to run a new thread, created together with the new task, on the parent's stack. This happens automatically if the registers saved earlier in the parent thread are restored in the child thread. In this case, real synchronization between threads can be used, as described in the article already mentioned. This solution is certainly more elegant, but also harder to implement: while the parent thread waits, its child runs on the same stack. So for the duration of the wait the parent has to switch to an intermediate stack, where it can safely wait for the child to call _exit() or exec*().

Code of vfork_child_start() for the second implementation
int vfork_child_start(struct task *child) {
        struct task_vfork *task_vfork;
        task_vfork = task_resource_vfork(task_self());
        /* Allocate memory for the new stack */
        task_vfork->stack = sysmalloc(sizeof(task_vfork->stack));
        if (!task_vfork->stack) {
                return -EAGAIN;
        }
        task_vfork->child_pid = child->tsk_id;
        /* Set new stack and go to vfork_waiting */
        if (!setjmp(task_vfork->env)) {
                CONTEXT_JMP_NEW_STACK(vfork_waiting,
                        task_vfork->stack + sizeof(task_vfork->stack));
        }
        /* current stack was broken, can't reach any old data */
        task_vfork = task_resource_vfork(task_self());
        sysfree(task_vfork->stack);
        ptregs_retcode_jmp(&task_vfork->ptregs, task_vfork->child_pid);
        panic("vfork_child_start returning");
        return -1;
}

Explanation of the code:
First, memory is allocated for the stack; then the child's pid (process ID) is saved, since the parent will need it when it returns from vfork().
The setjmp() call makes it possible to come back later to the point on the stack where vfork() was called. As already mentioned, the wait has to happen on an intermediate stack; the switch is done by the CONTEXT_JMP_NEW_STACK() macro, which changes the current stack and transfers control to the vfork_waiting() function. That function starts the child and blocks the parent until vfork_child_done() is called.

Code of vfork_waiting()
static void vfork_waiting(void) {
        struct sigaction ochildsa;
        struct task *child;
        struct task *parent;
        struct task_vfork *task_vfork;
        parent = task_self();
        task_vfork = task_resource_vfork(parent);
        child = task_table_get(task_vfork->child_pid);
        vfork_wait_signal_store(&ochildsa);
        {
                task_vfork_start(parent);
                task_start(child, vfork_child_task, &task_vfork->ptregs);
                while (SCHED_WAIT(!task_is_vforking(parent)));
        }
        vfork_wait_signal_restore(&ochildsa);
        longjmp(task_vfork->env, 1);
        panic("vfork_waiting returning");
}

As the code shows, the current signal handler is saved first. More precisely, the handler for SIGCHLD, the signal sent when the state of a child process changes, is overridden. Here it is used to unblock the parent.

New SIGCHLD Handler
static void vfork_parent_signal_handler(int sig, siginfo_t *siginfo, void *context) {
    task_vfork_end(task_self());
}

The handler is saved and restored with the POSIX sigaction() call.

Saving the handler
static void vfork_wait_signal_store(struct sigaction *ochildsa) {
    struct sigaction sa;
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = vfork_parent_signal_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, ochildsa);
}

Restoring the handler
static void vfork_wait_signal_restore(const struct sigaction *ochildsa) {
    sigaction(SIGCHLD, ochildsa, NULL);
}

After the signal handler is replaced, the task is marked as waiting, and it stays in that state until the child task actually starts, that is, until _exit()/exec*() is called. The entry point of the child task is vfork_child_task(), which restores the previously saved registers and returns from vfork().

Code of vfork_child_task()
static void *vfork_child_task(void *arg) {
    struct pt_regs *ptregs = arg;
    ptregs_retcode_jmp(ptregs, 0);
    panic("vfork_child_task returning");
}

When _exit() or exec*() is called, SIGCHLD is sent, and the signal handler clears the flag that marks the parent as waiting for the child to start. After that, the old SIGCHLD handler is restored, and control returns to vfork_child_start() via longjmp(). Keep in mind that this function's stack frame is corrupted by the time the child has run, so its local variables no longer hold valid values. After the previously allocated stack is freed, vfork() returns the child task's ID.

Checking that vfork() works


To check that vfork() behaves correctly, we wrote a set of tests covering several situations.

Two of them check that vfork() returns correctly when the child process calls _exit() and execv().
First test
TEST_CASE("after called vfork() child call exit()") {
    pid_t pid;
    pid_t parent_pid;
    int res;
    parent_pid = getpid();
    pid = vfork();
    /* When vfork() returns -1, an error happened. */
    test_assert(pid != -1);
    if (pid == 0) {
        /* When vfork() returns 0, we are in the child process. */
        _exit(0);
    }
    wait(&res);
    test_assert_not_equal(pid, parent_pid);
    test_assert_equal(getpid(), parent_pid);
}

Second test
TEST_CASE("after called vfork() child call execv()") {
    pid_t pid;
    pid_t parent_pid;
    int res;
    parent_pid = getpid();
    pid = vfork();
    /* When vfork() returns -1, an error happened. */
    test_assert(pid != -1);
    if (pid == 0) {
        close(0);
        close(1);
        close(2);
        /* When vfork() returns 0, we are in the child process. */
        if (execv("help", NULL) == -1) {
            test_assert(0);
        }
    }
    wait(&res);
    test_assert_not_equal(pid, parent_pid);
    test_assert_equal(getpid(), parent_pid);
}

Another test verifies the use of the same stack by parent and child processes.
Third test
TEST_CASE("parent should see stack modifications made from child") {
    pid_t pid;
    int res;
    int data;
    data = 1;
    pid = vfork();
    /* When vfork() returns -1, an error happened. */
    test_assert(pid != -1);
    if (pid == 0) {
        data = 2;
        /* When vfork() returns 0, we are in the child process. */
        _exit(0);
    }
    wait(&res);
    test_assert_equal(data, 2);
}


However, we also wanted to check correctness on a real third-party program, and chose the well-known dropbear package for this. At configure time it checks for fork(), and if it does not find it, it can use vfork() instead. I should say right away that this was done to support uClinux, not to improve performance.

The OS was configured accordingly (so that dropbear uses vfork()), and an ssh connection was successfully established with both implementations.

Screenshot


P.S. We have also managed to implement fork() itself without an MMU in our project; an article about it is currently being written.
