Fix and change MySQL encodings

    Fighting the KrakozyabryDue to the fact that quite a lot of people ask for help to fix the problem with MySQL encodings, I decided to write an article describing how to “treat” the most common cases.

    The article describes not how to initially correctly configure MySQL encodings (quite a lot has been written about this already), but about cases when there are quite large tables with incorrect encodings and you need to fix everything.

    The worst thing about incorrectly configured encodings is that it is often difficult to detect a problem, and at first glance it may seem that the site is working correctly and there are no problems.

    A small digression. Sypex viewer


    At some point, tired of sending people to the bulky phpMyAdmin, and the tiny utility Sypex Viewer was written . It is a single PHP file, uses modern Web 2.0 technologies AJAX, JSON and others. The main tasks that were set during the creation are the minimum weight, and the maximum convenience and speed. In the future, examples will show screenshots from it, but all the same actions can be done in phpMyAdmin.

    Data in cp1251 table in latin1


    Probably the most popular issue. When the data is encoded cp1251 (Windows-1251), and the tables have the default encoding latin1. Such situations arise in the following cases:
    • if you upgrade from MySQL version less than 4.1 to newer ones;
    • very often occurs in “bourgeois” scripts, which are completely satisfied with the default encoding, and they “forget” that it would be nice to specify the encoding of both tables and joins;
    • there are also cases when they move from one server (which has the default cp1251 encoding installed, in particular, this is done in Denver) to another (which has the standard encoding latin1).

    As a result, everything seems to be normal on the site, but if you look in the Sypex Viewer, then the Russian characters will look like "krakozyabry" (as they are usually called by users).



    In the article, I will consider one of the options for converting encodings using the free php script Sypex Dumper , as a ready-made solution.
    1. On the “Export” tab, select the desired tables.
    2. The encoding should be auto (other parameters are unimportant, you can add a comment, for example, “Dump before correcting the encoding”).
    3. Click "Run." Now we have a backup (in any case, it is advisable to do it with any database transformations).
    4. Go to the “Import” tab
    5. Select the backup file you just made.
    6. Select the cp1251 encoding and check the option “Encoding correction”.
    7. Click "Run."
    8. That's all we go into Sypex Viewer to make sure that Russian characters are displayed correctly.



    Data and tables in utf8, but latin1 connection encoding


    Now consider a more advanced case. Recently gaining popularity, due to the UTF-8 craze. The creators of the software began to transfer their offspring to UTF-8, but here everything is not as smooth as we would like.

    The problem arises mainly when the tables specify the UTF-8 encoding, the data is in UTF-8, but the connection encoding is set to latin1 by default (a typical example, vBulletin 4, although there is a connection encoding setting in the configs, but it is commented out by default).

    As a result, data is sent to MySQL in UTF-8, but since the encoding of the latin1 connection is specified, MySQL tries to convert the data from latin1 to UTF-8. As a result, Russian characters look like this:



    The situation is more running, but the problem is fixed almost the same as in the first case, only in step 2 you need to select the encoding latin1, and in step 6 you need to choose the utf8 encoding.

    Encoding Change


    Also a common problem of encoding conversion from cp1251 to UTF-8. Before completing this step, be sure to ensure that your Russian characters are displayed correctly in Sypex Viewer or phpMyAdmin, if this is not the case, then correct the encoding first.
    So, again we go to the Sypex Dumper.
    1. In the “Export” tab, select the desired tables.
    2. Set the encoding to which you want to convert the tables, in this case utf8.
    3. Click "Run."
    4. Then we go to the "Import" and select the desired file.
    5. We set utf8 encoding and the option “Encoding correction”.
    6. Click "Run."
    7. That's all the tables in UTF-8. Do not forget that you still need to set the encoding of the connection, convert your scripts and templates to UTF-8, set the correct encoding in the headers.


    Connection encoding


    Do not forget that after correcting the encoding, you need to make sure that your scripts use the correct encoding of the connection (in principle, this will be immediately visible, they will incorrectly show Russian characters without the necessary encoding of the connection). For some, it is set in the settings, in some you have to add it yourself.

    For this, just go through the file search and find where mysql_connect (or mysqli_connect) is called. After this line you need to add a line that indicates the encoding of the connection.

    mysql_query("SET NAMES 'cp1251'");


    Where instead of cp1251, specify the desired encoding of the connection.

    Do not forget to backup before encoding conversions, just like with condoms, it is better if it is not needed and when it is needed, it won’t be there.

    PS Thanks to the Shortiks for the fun content for examples.

    Also popular now: