Memory-mapped files
In this article I'd like to talk about a wonderful thing: files mapped into memory (memory-mapped files, hereinafter MMF).
In some cases they give a noticeable performance boost compared to ordinary buffered file I/O.
MMF is a mechanism that lets you map a file onto a region of memory. When you read from that region, the corresponding bytes are read from the file; writing works the same way.
“Cool, but what does that give me?” you may ask. Let me explain with an example.
Suppose we need to process a large file (tens or even hundreds of megabytes). The task seems trivial: open the file, read it into memory block by block, and process each block. But what actually happens? Each block is first read into the kernel's cache and then copied from there into our buffer, and this repeats for every block. The same data ends up in memory twice (in the cache and in our buffer), and there is a lot of copying. What can we do?
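To make the block-by-block approach concrete, here is a minimal sketch of it in C. The processing step is assumed to be counting newline characters, and the name count_newlines_read is hypothetical; the 64 KiB buffer size is an arbitrary choice. Each read() call copies a block out of the kernel's cache into buf, which is exactly the extra copy described above.

```c
#include <fcntl.h>   /* open() */
#include <unistd.h>  /* read(), close(), ssize_t */

/* Hypothetical task: count '\n' bytes in a file with ordinary read() calls.
   Every read() copies up to sizeof buf bytes from the kernel's cache into buf.
   Returns the count, or -1 if the file cannot be opened or read. */
long count_newlines_read(const char *path)
{
    char buf[64 * 1024];   /* user-space buffer: the extra copy lands here */
    long count = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                count++;
    }
    close(fd);
    return n < 0 ? -1 : count;   /* n < 0 means read() failed */
}
```

For example, count_newlines_read("big.log") returns the number of lines in big.log, or -1 on error.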
This is where MMF comes to the rescue. When we access memory onto which the file is mapped, the data is loaded from disk into the cache (if it is not already there), and the cache pages are then mapped into our program's address space. If those pages are evicted from the cache, the mapping to them is dropped, and the next access simply loads them again. This way we get rid of the copy from the cache into a user buffer. Moreover, we don't need to worry about optimizing disk access: the OS kernel does all the dirty work.
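Here is a sketch of the same kind of processing done through a mapping, again assuming the task is counting newline characters (count_newlines_mmap is a hypothetical name). There is no user-space buffer: the loop reads the file's bytes in place, and page faults bring the data in as it is touched. The descriptor is closed right after mmap(), since a mapping stays valid after its descriptor is closed.

```c
#include <fcntl.h>     /* open() */
#include <sys/mman.h>  /* mmap(), munmap(), MAP_FAILED */
#include <sys/stat.h>  /* fstat(), struct stat, off_t */
#include <unistd.h>    /* close() */

/* Hypothetical task: count '\n' bytes by reading the file through a mapping.
   Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    struct stat st;
    long count = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {   /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    const char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);               /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    for (off_t i = 0; i < st.st_size; i++)   /* pages are faulted in on first touch */
        if (p[i] == '\n')
            count++;
    munmap((void *)p, st.st_size);
    return count;
}
```

The empty-file case is handled separately because mmap() fails when asked to map zero bytes.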
I once ran an experiment. Using Quantify, I measured the speed of a program that copies a large 500 MB file to another file with buffered I/O, and of a program that does the same thing with MMF. The second one turned out to be almost 30% faster (on Solaris; on other OSes the result may differ). Not bad, you'll agree.
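The article doesn't show the buffered program used in this measurement, so what follows is only an assumed reconstruction of such a baseline: a copy through stdio, where fread() and fwrite() move the data through stdio's own buffers in addition to the kernel's cache. The name copy_buffered and the 8 KiB block size are hypothetical choices.

```c
#include <stdio.h>  /* fopen(), fread(), fwrite(), ferror(), fclose() */

/* Hypothetical buffered copy baseline. Returns 0 on success, -1 on any error. */
int copy_buffered(const char *from, const char *to)
{
    char buf[8192];   /* arbitrary block size */
    size_t n;
    int rc = 0;
    FILE *in = fopen(from, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(to, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {   /* short write: give up */
            rc = -1;
            break;
        }
    }
    if (ferror(in))          /* fread() returned 0 because of an error, not EOF */
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)    /* fclose() flushes; a failed flush is a failed copy */
        rc = -1;
    return rc;
}
```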
To use this feature, we have to tell the kernel that we want to map the file into memory. This is done with the mmap() function.
#include <sys/mman.h>
void *mmap(void *addr, size_t len, int prot, int flag, int filedes, off_t off);
It returns the starting address of the mapped region, or MAP_FAILED (not NULL) on failure.
addr - the desired starting address of the mapped region. I'm not sure when this would come in handy; we pass 0, and the kernel picks the address itself.
len - the number of bytes to map.
prot - the protection of the mapped region: readable, writable, executable, or no access (PROT_READ, PROT_WRITE, PROT_EXEC, PROT_NONE). The usual values are PROT_READ and PROT_WRITE, which can be combined with bitwise OR. I won't dwell on this; see the man page for details. I'll only note that the protection cannot allow more access than the mode the file was opened with (for example, PROT_WRITE with MAP_SHARED requires a descriptor opened with O_RDWR).
flag - the attributes of the region. The usual value is MAP_SHARED (stores go to the file); the alternative is MAP_PRIVATE (stores go to a private copy), but not both. For the rest, read the man page. Note, though, that using MAP_FIXED reduces the portability of the application, because its support is optional on POSIX systems.
filedes - as you may have guessed, the descriptor of the file to be mapped.
off - the offset of the mapped region from the beginning of the file; it must be a multiple of the page size.
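Because off must be page-aligned, mapping an arbitrary byte range takes a little arithmetic: round the offset down to a page boundary, map a few extra bytes, and skip them. The sketch below does that and labels each mmap() argument as it is passed; struct window, map_window() and unmap_window() are hypothetical names, and the page size is taken from sysconf(_SC_PAGESIZE).

```c
#include <fcntl.h>      /* open() */
#include <stddef.h>     /* size_t */
#include <sys/mman.h>   /* mmap(), munmap(), MAP_FAILED */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* sysconf(), close() */

/* Hypothetical type: a read-only view of bytes [offset, offset + len) of a file. */
struct window {
    void *base;        /* address returned by mmap(): page-aligned */
    size_t maplen;     /* number of bytes actually mapped */
    const char *data;  /* points at byte `offset` of the file */
};

/* Map an arbitrary byte range of the file at `path`. Returns 0 on success, -1 on error. */
int map_window(const char *path, off_t offset, size_t len, struct window *w)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - offset % page;    /* round down: off must be page-aligned */
    size_t skip = (size_t)(offset - aligned);  /* extra bytes mapped before `offset` */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL,         /* addr: let the kernel choose */
                   skip + len,   /* len: cover the requested range */
                   PROT_READ,    /* prot: matches the O_RDONLY open mode */
                   MAP_SHARED,   /* flag */
                   fd,           /* filedes */
                   aligned);     /* off: a multiple of the page size */
    close(fd);                   /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    w->base = p;
    w->maplen = skip + len;
    w->data = (const char *)p + skip;
    return 0;
}

void unmap_window(struct window *w)
{
    munmap(w->base, w->maplen);
}
```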
Important note: if you plan to use MMF to write to a file, you must set the file's final size, no smaller than the mapped region, before mapping it! Otherwise, accessing the part of the mapping that lies beyond the end of the file raises SIGBUS.
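One way to satisfy this requirement is ftruncate(), which grows a file to a given size (the program below achieves the same with lseek() and a one-byte write()). The following sketch writes a buffer into a new file through a shared mapping; write_via_mmap is a hypothetical name, and the msync(MS_SYNC) call is an optional extra that asks the kernel to write the pages back before the function returns.

```c
#include <fcntl.h>      /* open() */
#include <string.h>     /* memcpy() */
#include <sys/mman.h>   /* mmap(), msync(), munmap() */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate(), close() */

/* Write `len` bytes from `src` into a new file at `path` through a shared mapping.
   Returns 0 on success, -1 on error. */
int write_via_mmap(const char *path, const void *src, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);  /* PROT_WRITE + MAP_SHARED needs O_RDWR */
    if (fd < 0)
        return -1;
    if (len == 0) {                        /* nothing to map; mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    if (ftruncate(fd, (off_t)len) < 0) {   /* set the final size BEFORE touching the mapping */
        close(fd);
        return -1;
    }
    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (dst == MAP_FAILED)
        return -1;
    memcpy(dst, src, len);                 /* without ftruncate() this store would raise SIGBUS */
    int rc = msync(dst, len, MS_SYNC);     /* flush the dirty pages to the file now */
    munmap(dst, len);
    return rc;
}
```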
Below is a program that copies a file using MMF (shamelessly borrowed from the wonderful book "UNIX. Professional Programming", the Russian edition of W. Richard Stevens' "Advanced Programming in the UNIX Environment").
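#include <errno.h>     /* errno */
#include <fcntl.h>     /* open(), O_* flags */
#include <stdarg.h>    /* va_list, for err_sys() */
#include <stdio.h>     /* fprintf(), vfprintf() */
#include <stdlib.h>    /* exit() */
#include <string.h>    /* memcpy(), strerror() */
#include <sys/mman.h>  /* mmap(), MAP_FAILED */
#include <sys/stat.h>  /* fstat(), struct stat, S_I* mode bits */
#include <unistd.h>    /* lseek(), write() */
#define FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)  /* rw-r--r--, as in the book's apue.h */
/* Minimal stand-in for the book's err_sys() from apue.h: print the message and the errno text, then exit */
static void err_sys(const char *fmt, ...)
{
    int saved_errno = errno;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fprintf(stderr, ": %s\n", strerror(saved_errno));
    exit(1);
}
int main(int argc, char *argv[])
{
    int fdin, fdout;
    void *src, *dst;
    struct stat statbuf;
    if (argc != 3) {
        fprintf(stderr, "usage: %s <fromfile> <tofile>\n", argv[0]);
        exit(1);
    }
    if ((fdin = open(argv[1], O_RDONLY)) < 0)
        err_sys("can't open %s for reading", argv[1]);
    if ((fdout = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, FILE_MODE)) < 0)
        err_sys("can't create %s for writing", argv[2]);
    if (fstat(fdin, &statbuf) < 0)  /* get the size of the input file */
        err_sys("fstat error");
    if (statbuf.st_size == 0)       /* empty input: nothing to copy, and mmap() rejects a length of 0 */
        exit(0);
    /* set the size of the output file by writing one byte at its last offset */
    if (lseek(fdout, statbuf.st_size - 1, SEEK_SET) == -1)
        err_sys("lseek error");
    if (write(fdout, "", 1) != 1)
        err_sys("write error");
    if ((src = mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0)) == MAP_FAILED)
        err_sys("mmap error for input");
    if ((dst = mmap(0, statbuf.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0)) == MAP_FAILED)
        err_sys("mmap error for output");
    memcpy(dst, src, statbuf.st_size);  /* copy the file */
    exit(0);                            /* exit() unmaps; MAP_SHARED stores are already in the file */
}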
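The program maps both files in one go, which requires each whole file to fit into the address space; that can be a problem for large files in a 32-bit process. A possible variant, sketched below under that assumption, copies through fixed-size windows instead. copy_mmap_chunked is a hypothetical name; chunk must be a nonzero multiple of the page size because each window's offset is passed to mmap() as off, and the output size is set with ftruncate() rather than lseek() plus write().

```c
#include <fcntl.h>     /* open() */
#include <string.h>    /* memcpy() */
#include <sys/mman.h>  /* mmap(), munmap(), MAP_FAILED */
#include <sys/stat.h>  /* fstat(), struct stat, off_t */
#include <unistd.h>    /* ftruncate(), close() */

/* Copy `from` to `to` through mappings of at most `chunk` bytes at a time.
   `chunk` must be a nonzero multiple of the page size. Returns 0 or -1. */
int copy_mmap_chunked(const char *from, const char *to, size_t chunk)
{
    struct stat st;
    int rc = -1;
    if (chunk == 0)
        return -1;
    int fdin = open(from, O_RDONLY);
    if (fdin < 0)
        return -1;
    int fdout = open(to, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fdout < 0) {
        close(fdin);
        return -1;
    }
    /* the output must reach its final size before any store into its mapping */
    if (fstat(fdin, &st) < 0 || ftruncate(fdout, st.st_size) < 0)
        goto out;
    for (off_t pos = 0; pos < st.st_size; pos += (off_t)chunk) {
        size_t left = (size_t)(st.st_size - pos);
        size_t n = left < chunk ? left : chunk;   /* the last window may be shorter */
        void *src = mmap(NULL, n, PROT_READ, MAP_SHARED, fdin, pos);
        if (src == MAP_FAILED)
            goto out;
        void *dst = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, pos);
        if (dst == MAP_FAILED) {
            munmap(src, n);
            goto out;
        }
        memcpy(dst, src, n);
        munmap(src, n);   /* release this window before mapping the next one */
        munmap(dst, n);
    }
    rc = 0;
out:
    close(fdin);
    close(fdout);
    return rc;
}
```

For example, copy_mmap_chunked("in.dat", "out.dat", 1 << 30) copies in 1 GiB windows, which is a multiple of the common 4 KiB, 16 KiB and 64 KiB page sizes.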
That's all. I hope this article has been helpful. Constructive criticism is welcome.