Fetching custom records in MySQL

    There are seemingly ordinary tasks that can be solved immediately and without hesitation, but with the intensive use of such solutions problems arise, and not small ones. I want to talk about one of these tasks.


    Problem

    We took an outsourcer here to write a small and uncomplicated code in PHP and MySQL. There was one of the tasks - to select several arbitrary records from a table in the MySQL database. And what did this lazy and stupid outsourcer do? Of course I wrote nonsense like this:
    SELECT * FROM tTable ORDER BY RAND() LIMIT 10;
    At first glance, everything is logical and works correctly. 10 random entries are selected. But if you look at the plan for fulfilling this request, it will become clear why I put a dozen obscene curses at the address of a stupid outsourcer.
    During the execution of this query, MySQL writes to the temporary table all (!!!) rows of the original table, with one new field, into which the results of the RAND () function are written - i.e. set of arbitrary values. Then this temporary table is sorted by filesort by the added field with arbitrary values, and then the first 10 records are selected. Full PPC. Now imagine what will happen if there are 10,000 records in the source table. But what if 1,000,000? But what if this sample needs to be done ten times per second. Yes, here any super-duper server will go into thought for a long time.

    But if you are a little savvy (and outsourcers do not want to think, they give up work and go to drink money), then you can come up with an elegant and quick option, the speed of which does not depend on the number of rows in the table.

    The idea

    So, let's start slowly. First we simplify the task, suppose we need to select not 10, but only one record.
    Everything is pretty simple here. We only need to operate with the number of records in the table, because the key can be any (composite, non-numeric), and it can also be "discharged" as a result of deleting records. First, find out the total number of entries in the table:
    SELECT COUNT(*) FROM tTable;
    Next, we simply calculate an arbitrary number from 0 to the number of entries in this table
    rand_row = round(rand() * row_count);
    Now, without any problems, you can select a random record:
    SELECT * FROM tTable LIMIT rand_row, 1;

    PHP solution

    So, they coped with a simplified task. Now you need to overcome the originally set, i.e. select 10 entries. The logic here is simple: you need to count 10 arbitrary numbers from 0 to the number of entries in the table, and then make 10 queries like the previous one and combine them using UNION.
    There are two options for how to do this: you can format it as a piece of PHP code, or you can use it as a MySQL stored procedure.
    In PHP, everything is very simple:
    $row_count = mysql_result(mysql_query('SELECT COUNT(*) FROM tTable;'), 0);
    $query = array();
    while (count($query) < 10) {
        $query[] = '(SELECT * FROM tTable LIMIT '.rand(0, $row_count).', 1)';
    }
    $query = implode(' UNION ', $query);
    $res = mysql_query($query);

    Everything is simple and fast. On the original table with ten thousand records, the performance increase is more than 12 times higher than the initial “lazy" version.
    If there are not so many records in the source table and the appearance of duplicate rows in the selection is unacceptable, then you can first create a list of non-repeating arbitrary values, and then compile a query on them.

    MySQL solution

    Alternatively, you can still do this as a stored procedure:
    CREATE PROCEDURE `spRandomSelect`(IN aSchema VARCHAR(50), IN aTable VARCHAR(50), IN aNumRows INTEGER(11))
        NOT DETERMINISTIC
        READS SQL DАТА
    BEGIN
      DECLARE iQuery VARCHAR(10000);
      DECLARE iNumRows INTEGER(11);

      SET iNumRows = (SELECT `TABLE_ROWS` FROM `information_schema`.`TABLES` t
        WHERE t.`TABLE_SCHEMA` = aSchema AND t.`TABLE_NAME` = aTable);
      SET iQuery = '';
      loop1: LOOP
        SET iQuery = CONCAT(iQuery, '(SELECT * FROM `', aSchema, '`.`', aTable,
          '` LIMIT ', ROUND(RAND(UNIX_TIMESTAMP() + aNumRows) * iNumRows), ', 1)');
        IF aNumRows > 1 THEN
          SET iQuery = CONCAT(iQuery, ' UNION ');
        END IF;
        SET aNumRows = aNumRows - 1;
        IF aNumRows > 0 THEN
          ITERATE loop1;
        END IF;
        LEAVE loop1;
      END LOOP loop1;
      SET @iQuery = iQuery;
      PREPARE iExecStmt FROM @iQuery;
      EXECUTE iExecStmt;
      DRОP PREPARE iExecStmt;
    END;

    The performance of this solution is lower than when preparing a compound query in PHP, but the point is to show the possibility of implementation in pure SQL.

    Also popular now: