(5-2) Ways to move a large SQL table
Hello! This is my first article, and I am writing it as a junior C# developer, for beginners like myself. So there will be no deep SQL internals here, only practical notes and reflections on solving a rather non-obvious task I had to deal with.
First, I will describe my task as an example of a situation where there is a real need to transfer a large table.
So, imagine that you have a web service and an SQL (MS SQL) database with a table of HTML letters that your service sends to users. Letters are stored for a number of years and cannot be deleted, as they are needed for statistics and analytics. However, every year the number of letters grows, the database grows, and there is less and less space on the SQL server (in our case, another factor was restoring the database to the test environment, whose duration grew proportionally), and something has to be done about it. Fortunately, in our case there was a spare server with plenty of free space (in reality there may not be one, and of course this is a temporary solution, but that is beyond the scope of the article). So the task arose of transferring a really large table; everything I found while looking for similar solutions dealt with tables of around 60-100 GB.
We will consider several ways to solve this problem, though not all of them involve a server-to-server transfer; sometimes you may need to move a table between databases on a single server. Also, some of the methods are purely theoretical: I have not tested all of them in practice, but they should work.
Method -1. Data
No matter how obvious it sounds, you need to know what data you are going to transfer. Most often, data is not stored in the most optimal way, and unnecessary information may be stored alongside it. It is quite possible that in your particular case you can avoid transferring all of it.
First, dropping a column is likely to help, but it is a blocking operation, and it is not always possible to stop the web service. There is an article on Habr that explains how it can be carried out.
Secondly, do not forget about normalization. Perhaps some data can be moved into a dictionary table (in the case of letters, we could have stored not the letter bodies but templates plus the data inserted into them), while the large table stores only the ids of those entries; this can save you a lot of space.
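To illustrate the template idea, here is a minimal sketch in Python using the stdlib `sqlite3` module as a stand-in for MS SQL. The schema and column names are hypothetical, not from the original system: instead of storing every rendered letter body, the big table stores a template id plus the per-letter substitution values.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE templates (
        id   INTEGER PRIMARY KEY,
        body TEXT NOT NULL          -- e.g. 'Hello, {name}! Your order {order_id} shipped.'
    );
    CREATE TABLE letters (
        id          INTEGER PRIMARY KEY,
        template_id INTEGER NOT NULL REFERENCES templates(id),
        params      TEXT NOT NULL   -- JSON with the values substituted into the template
    );
""")
conn.execute(
    "INSERT INTO templates (id, body) VALUES (1, 'Hello, {name}! Your order {order_id} shipped.')"
)
conn.execute(
    "INSERT INTO letters (template_id, params) VALUES (1, ?)",
    (json.dumps({"name": "Ann", "order_id": 42}),),
)

# Reconstruct a letter on demand instead of storing the full body per row.
template, params = conn.execute("""
    SELECT t.body, l.params
    FROM letters l JOIN templates t ON t.id = l.template_id
""").fetchone()
rendered = template.format(**json.loads(params))
print(rendered)  # Hello, Ann! Your order 42 shipped.
```

Each row now costs a small JSON blob and an integer instead of a full HTML document, which is where the space saving comes from.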
Method 0. SELECT INTO
Joke =) Doing it this way you will most likely just bring your database down. However, if we are talking about a small table (then what are you even doing here?), you can try to transfer it with this statement. Also, if you have a test database, you can run an experiment on it to estimate the total time of such a head-on transfer.
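For reference, this is what the head-on copy looks like. In T-SQL it would be `SELECT * INTO dst_table FROM src_table` (across servers, via a linked server); the sketch below uses stdlib `sqlite3`, where the closest equivalent is `CREATE TABLE ... AS SELECT` against an attached database file. All names here are illustrative.

```python
import os
import sqlite3
import tempfile

# Two database files stand in for the two databases/servers.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "src.db")
dst_path = os.path.join(workdir, "dst.db")

src = sqlite3.connect(src_path)
src.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany("INSERT INTO letters (body) VALUES (?)", [("a",), ("b",), ("c",)])
src.commit()

# The single-statement copy. On a really large table this runs as one huge
# operation that hammers I/O and the transaction log -- which is exactly
# why treating it as a serious option is a joke.
src.execute("ATTACH DATABASE ? AS dst", (dst_path,))
src.execute("CREATE TABLE dst.letters AS SELECT * FROM letters")
src.commit()

dst = sqlite3.connect(dst_path)
copied = dst.execute("SELECT COUNT(*) FROM letters").fetchone()[0]
print(copied)  # 3
```

Timing this statement on a test copy of your database is the cheapest way to get a baseline for the total transfer time.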
Method 1. Backup
The most "canonical" method, and the one that solved my problem. We make a backup of the database containing our table, restore it on the other server, and drop everything unnecessary there. Then, if it is possible to stop the web service, you switch writes over to the transferred table and delete the old one* (note that you may later need to query it with joins; for that, google how to link SQL servers). If stopping the service is not an option, record the id of the last letter (for synchronization), keep writing to the old table, and later delete* all the letters that were transferred.
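A sketch of the backup-and-clean flow. In MS SQL this would be `BACKUP DATABASE ... TO DISK` followed by `RESTORE DATABASE` on the other server; here the stdlib `sqlite3` online backup API (`Connection.backup`) stands in for that pair, and the table names are invented for illustration.

```python
import sqlite3

# Source database: the large table plus other tables we do not want to move.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, body TEXT)")
src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO letters (body) VALUES (?)", [("x",), ("y",)])
src.execute("INSERT INTO users (name) VALUES ('Ann')")
src.commit()

# "Backup and restore on the other server": copy the whole database across.
dst = sqlite3.connect(":memory:")
src.backup(dst)

# "Clear all unnecessary": the restored copy contains the entire database,
# so drop everything except the table we actually wanted to transfer.
dst.execute("DROP TABLE users")
dst.commit()

moved = dst.execute("SELECT COUNT(*) FROM letters").fetchone()[0]
print(moved)  # 2
```

The key property of this method is that the copy is made by the database engine itself, page by page, rather than row by row through queries.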
* Deletion is a separate topic for discussion. It may seem that it runs much faster than a transfer, but that is not the case, and in the general case I advise deleting in portions.
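Deleting in portions can be sketched as a loop of small transactions. In T-SQL this is typically `DELETE TOP (@batch) FROM ... WHERE ...` repeated until no rows remain; the equivalent below uses stdlib `sqlite3` with a `LIMIT` subquery, and the batch size is an illustrative value to tune.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO letters (body) VALUES (?)", [(str(i),) for i in range(1000)])
conn.commit()

BATCH = 100  # portion size; tune so each transaction (and its log footprint) stays small

deleted_total = 0
while True:
    # T-SQL analogue: DELETE TOP (@batch) FROM letters WHERE id <= @last_moved_id
    cur = conn.execute(
        "DELETE FROM letters WHERE id IN (SELECT id FROM letters LIMIT ?)", (BATCH,)
    )
    conn.commit()  # each portion is its own transaction, keeping the log small
    deleted_total += cur.rowcount
    if cur.rowcount < BATCH:
        break

print(deleted_total)  # 1000
```

One big `DELETE` would log the whole operation as a single transaction; committing per portion is what keeps the log from ballooning.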
Method 2. SQL Server Management Studio
If you have Management Studio, you can try its built-in tool for exporting and importing data. Personally, I read on Stack Overflow that it hung on a 60 GB table, so I did not risk it.
Method 3. Partition
An improved version of the head-on approach. The idea is to transfer the data in the usual way, with a timer between iterations. You break all the rows into portions (for example, 100k rows each), transfer a portion (after which you could delete it, though I am not sure how safe that is), then sleep, and so on until the bitter end. It is better to transfer from the end of the table, so that you do not have to synchronize the data when you finish. The method is obviously very slow, but this way you transfer everything without stopping the web service. It will most likely be more convenient to implement it not as a SQL script but with the help of some ORM.
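The loop above can be sketched as follows, again using stdlib `sqlite3` with two in-memory databases standing in for the two servers. The batch size and pause are illustrative; the important parts are transferring from the highest id downward and sleeping between portions so the live service is not starved.

```python
import sqlite3
import time

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany("INSERT INTO letters (id, body) VALUES (?, ?)",
                [(i, f"letter {i}") for i in range(1, 251)])
src.commit()

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, body TEXT)")

BATCH = 100          # portion size (the article suggests e.g. 100k rows in real life)
PAUSE_SECONDS = 0.01 # timer between iterations

# Transfer from the end (highest id down): the still-growing head of the
# table never needs to be re-synchronized when the copy finishes.
upper_bound = src.execute("SELECT MAX(id) FROM letters").fetchone()[0]
moved = 0
while True:
    rows = src.execute(
        "SELECT id, body FROM letters WHERE id <= ? ORDER BY id DESC LIMIT ?",
        (upper_bound, BATCH),
    ).fetchall()
    if not rows:
        break
    dst.executemany("INSERT INTO letters (id, body) VALUES (?, ?)", rows)
    dst.commit()
    moved += len(rows)
    upper_bound = rows[-1][0] - 1  # next portion starts below the lowest id copied
    time.sleep(PAUSE_SECONDS)

print(moved)  # 250
```

New rows keep arriving above the starting `MAX(id)` and are simply never touched by the loop, which is why this runs safely alongside a live web service.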
Transferring a large amount of data always takes a certain amount of time, and you should be ready for that. There is no magic way to solve the problem instantly. In each case you have to start from your own volumes and constraints. If none of the methods suits you on its own, consider combining several of them.
At the end I would like to add 2 important notes.
Any movement or deletion of rows in SQL is written to the transaction log so that everything can be rolled back in case of an error (I used to assume this happened only inside an explicit transaction). The log can end up even slightly larger than the data itself. Make sure you have enough space, or reduce logging (for example, via a simpler recovery model), but that is not safe.
Before transferring, make sure the data file and the log file are pre-grown to the required size, since file-growth operations take a significant amount of time; configure them accordingly and you will speed up the transfer.
Thanks to everyone who read this! I will be glad to hear any criticism, comments, and clarifications. Share your own ways of working with big data; this is often very important and necessary information that is not so easy to find.