
Large tables and unique keys
The other day I had a task - to add a new field to the table. It would seem that ALTER TABLE ... It was not there.
I ran this query in the evening, but it still ran in the morning (16 hours were not enough :)
I immediately assumed that the problem was most likely in building a unique index and the huge size of the original table. As you know, with ALTER TABLE, the server creates a temporary file into which it copies the necessary data. SHOW PROCESSLIST has confirmed this.
I will give a more detailed statement of the problem.
Given. There is a data table.
Type - MyISAM.
The table has a VARCHAR field with a length of 140 characters and a unique key on it.
The table has about 150 million rows.
The volume on the disk is 15 GB.
NecessaryAdd a new field to the table.
The search showed that the problem is known and that the usual way can last forever. The size of the index does not allow the cache to be used in RAM, so creating a new table line by line is very slow.
In the blog of Peter Zaitsev , a non-standard “hacker" solution was proposed.
1. Create a new table (for example, test_new) with the desired structure, but without any indexes, even without a primary key; if there is an auto-increment field, then remove auto_increment from its definition.
2. We fill this table with data using INSERT INTO test_new ... SELECT .
3. Create another table with the same structure as test_new, but with all the necessary keys; let's call it test_struct.
4. We go to the directory on the disk where our tables are stored and copy test_struct.frm over test_new.frm and test_struct.MYI over test_new.MYI.
5. Open the mysql client and execute the FLUSH TABLES and REPAIR TABLE test_new queries.
In my case, REPAIR TABLE took 20 minutes.
Possible problem. I also ran out of space in the MySQL temporary directory. In my case, it was / tmp with a capacity of 5 GB. The fact is that REPAIR TABLE creates a temporary file for the new index, which eventually weighed 7 GB, which, of course, did not fit into 5 GB. If there is not enough disk space, MySQL waits 1 minute, and then again tries to find it so many times. Of course, we remain in ignorance, thinking that he is busy with business :)
MySQL can be set to a temporary directory using the tmpdir parameter .
Where can I use this hack? When adding, deleting or changing a field in a very large table where there is a unique index, it is also very large (most likely on a string field). When creating a unique index in a similar table. When recreating such a table from a dump.
Addition. What's the secret? The usual creation of a unique index is line by line, record by record. REPAIR TABLE creates an index at once using sorting, which is significantly faster.
Good luck
I ran this query in the evening, but it still ran in the morning (16 hours were not enough :)
I immediately assumed that the problem was most likely in building a unique index and the huge size of the original table. As you know, with ALTER TABLE, the server creates a temporary file into which it copies the necessary data. SHOW PROCESSLIST has confirmed this.
I will give a more detailed statement of the problem.
Given. There is a data table.
Type - MyISAM.
The table has a VARCHAR field with a length of 140 characters and a unique key on it.
The table has about 150 million rows.
The volume on the disk is 15 GB.
NecessaryAdd a new field to the table.
The search showed that the problem is known and that the usual way can last forever. The size of the index does not allow the cache to be used in RAM, so creating a new table line by line is very slow.
In the blog of Peter Zaitsev , a non-standard “hacker" solution was proposed.
1. Create a new table (for example, test_new) with the desired structure, but without any indexes, even without a primary key; if there is an auto-increment field, then remove auto_increment from its definition.
2. We fill this table with data using INSERT INTO test_new ... SELECT .
3. Create another table with the same structure as test_new, but with all the necessary keys; let's call it test_struct.
4. We go to the directory on the disk where our tables are stored and copy test_struct.frm over test_new.frm and test_struct.MYI over test_new.MYI.
5. Open the mysql client and execute the FLUSH TABLES and REPAIR TABLE test_new queries.
In my case, REPAIR TABLE took 20 minutes.
Possible problem. I also ran out of space in the MySQL temporary directory. In my case, it was / tmp with a capacity of 5 GB. The fact is that REPAIR TABLE creates a temporary file for the new index, which eventually weighed 7 GB, which, of course, did not fit into 5 GB. If there is not enough disk space, MySQL waits 1 minute, and then again tries to find it so many times. Of course, we remain in ignorance, thinking that he is busy with business :)
MySQL can be set to a temporary directory using the tmpdir parameter .
Where can I use this hack? When adding, deleting or changing a field in a very large table where there is a unique index, it is also very large (most likely on a string field). When creating a unique index in a similar table. When recreating such a table from a dump.
Addition. What's the secret? The usual creation of a unique index is line by line, record by record. REPAIR TABLE creates an index at once using sorting, which is significantly faster.
Good luck