Is it worth storing files in a MySQL database?
I have watched disputes on this topic flare up on several forums, sometimes more than once. Most of these discussions concerned storing images, less often text files. I am an opponent of this approach, and in this article I will try to give reasonable evidence that storing files in the database is inconvenient and hurts the performance of the system as a whole. Since I mostly work with MySQL, I will look at the question from the point of view of storing files in its databases when developing for the web.
1. RAM. The most important argument against, in my opinion, is that even when you merely need to hand a file to the user (for example, to display an image), you still have to load it into RAM, because all data selected by a query from the database is loaded into RAM. At the time of writing, RAM is the most expensive resource on dedicated servers after the CPU. And I am not even talking about ordinary shared hosting, where in the vast majority of cases each account is allocated a fixed amount of memory, and exceeding it leads to "Fatal error: Allowed memory size of X bytes exhausted (tried to allocate Y bytes)" or to angry letters from support.
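As a rough illustration of the difference, here is a minimal PHP sketch; the files table, its columns and the paths are assumptions for illustration only. With the usual buffered queries, fetching a BLOB through PDO materializes the whole file as a string in PHP memory, while readfile() copies a file from disk to the client in chunks.

    <?php
    // Hypothetical example: serving the same ~50 MB file two ways.

    // 1) From the database: the entire BLOB ends up in PHP memory
    //    and counts against memory_limit.
    $pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'password');
    $stmt = $pdo->prepare('SELECT data FROM files WHERE id = ?');
    $stmt->execute([42]);
    $blob = $stmt->fetchColumn();   // ~50 MB string held in RAM
    echo $blob;

    // 2) From disk: readfile() streams the file to the output buffer
    //    in chunks, so memory usage stays small regardless of file size.
    readfile('/var/www/files/42.jpg');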
2. Serving files. If files are stored in the database, then to give them back to the client you will have to use a script written in one language or another, which has to do the following:
2.1 Open a new connection to the database, the number of which is far from infinite, or grab an existing free connection, which is bad for the same reason;
2.2 Run a query that selects the file contents from the table. Two problems arise here at once:
2.2.1 The file contents are loaded into RAM (see point 1);
2.2.2 Which parameter to use to look the file up in the database. It is logical to assume that a lookup by an integer ID will be the fastest, but then you also have to pass this ID in the link to the file-output script, which, if such links are created by hand, complicates the work.
2.3 Send the required headers and the file contents.
All of these steps, sketched below, are repeated for every file; if a page displays 20 of them and they are requested simultaneously, there is a real risk of not seeing a single one.
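Here is a minimal sketch of such a file-output script in PHP. The credentials, the files table and its id, mime and data columns are assumptions for illustration, not a real schema; the point is only to show steps 2.1-2.3 in one place.

    <?php
    // Hypothetical file-output script, e.g. called as getfile.php?id=123.

    // 2.1 Open (or reuse) a database connection.
    $pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'password');

    // 2.2 Select the file contents by integer ID.
    $stmt = $pdo->prepare('SELECT mime, data FROM files WHERE id = ?');
    $stmt->execute([(int)($_GET['id'] ?? 0)]);
    $file = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($file === false) {
        http_response_code(404);
        exit;
    }

    // 2.2.1 At this point the whole file is already sitting in RAM.

    // 2.3 Send the headers and the contents.
    header('Content-Type: ' . $file['mime']);
    header('Content-Length: ' . strlen($file['data']));
    echo $file['data'];

With files on disk, all of this work, including the extra database connection, is handled by the web server itself.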
This way of storing files also practically deprives you of the following possibilities:
- distributing files across several servers and, for downloads, giving links directly to those servers;
- installing a reverse proxy server to speed up serving "slow clients";
- using a CDN.
3. Dumps. One argument "for" that comes up often is this: "If I need to move the site to another host, I just take a dump and copy the code." Personally, I do not find this argument weighty, for the following reasons:
3.1 Let's finally start writing sites so that they work fast, not just so that they are easy to move;
3.2 Storing any files, and especially binary ones, in the database leads to the following:
3.2.1 as the database grows, the time needed to create its dump, and the size of that dump, grow accordingly;
3.2.2 the presence of binary file contents in a dump badly hurts its readability, and, just as importantly, not every console editor can open a large dump for editing; mcedit, for example, cannot;
3.2.3 you are very likely to run into problems when importing such dumps:
- first, it will be a rather long process;
- second, it can be quite difficult, especially if the site is on a host that allows neither connecting to the database from outside nor ssh access to the server to use the mysql client. In that case you will have to use tools like phpMyAdmin (although from my own experience with it I can say with almost 100% certainty that uploading a large dump of 50 to 100 MB through it is next to impossible) or Sypex Dumper.
4. Access control for files. Another argument raised in many discussions is that when files are stored in the database it is easier to organize per-user access control to them. I partly agree with this argument, because it really is simpler, but firstly, it does not outweigh the first three points, and secondly, you can differentiate access using the methods that I described in this article.
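As a rough sketch of the alternative, the permission check can live in the application while the file itself stays on disk outside the web root. The file_access table, the session check and the storage path below are assumptions for illustration only.

    <?php
    // Hypothetical download script: the database stores only metadata
    // and permissions; the file bytes live on disk outside the web root.
    session_start();

    $pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'password');
    $stmt = $pdo->prepare(
        'SELECT f.path, f.mime
           FROM files f
           JOIN file_access a ON a.file_id = f.id
          WHERE f.id = ? AND a.user_id = ?'
    );
    $stmt->execute([(int)($_GET['id'] ?? 0), (int)($_SESSION['user_id'] ?? 0)]);
    $file = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($file === false) {
        http_response_code(403);   // no such file or no permission
        exit;
    }

    header('Content-Type: ' . $file['mime']);
    readfile('/var/storage/' . $file['path']);   // streamed from disk, not held in RAM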
5. Centralized storage. This, too, is often counted as an advantage of storing files in the database: if many servers work with the same files, it is supposedly much more convenient to keep them in one place. In my opinion this is not an argument either, because even if the files are physically stored on different servers, nothing prevents you from mounting those locations on every server that needs them, thereby creating the same directory structure everywhere.
That's all, but I want to stress once again that I looked at this question from the standpoint of web development, and specifically MySQL. At the same time, I realize that for other areas of programming and other DBMSs, such as MSSQL and Oracle, all or some of my arguments may not hold.