Quick selection of random values ​​from large MySQL tables by condition

The task of choosing random rows from a table quite often arises for developers.
In case MySQL is used, it is usually solved in the following way: Such code works extremely slowly for large tables. If the query does not need to use WHERE or the table is small, there are effective solutions, for example habrahabr.ru/post/54176 or habrahabr.ru/post/55864 . But I did not find any ready-made solutions for a large table and the need to filter by condition, receiving new values ​​at each request, so the description of my method is under a cat.

SELECT *
FROM users
WHERE role_id=5
ORDER BY rand()
LIMIT 10







As it turned out, MySQL cannot efficiently select random rows using ORDER BY rand () LIMIT N, where it is necessary to filter rows by condition (although the same MSSQL does an excellent job of selecting random rows from a table with a large number of records).

So, solving the “head-on” problem, the request (in the table 5 million records): The request took 41.3544 seconds, which is unacceptably long. It is impossible to find the maximum and minimum id, and then select random id from the gap in this case: due to the WHERE clause, id are no longer in order and discharged. My solution is the following: a random_seed table is added, containing the id and random_seed fields, filled with random numbers, an index is added to this column, and the index is added to the column by which the selection will take place.

SELECT *
FROM users
WHERE role_id=5
ORDER BY rand()
LIMIT 10





Now, in order to select random strings by condition, the query needs to be changed as follows (in the table 5 million records): The query took 0.0460 seconds, which is more than an acceptable result. The variable random_from_php is generated by the code that calls the request, which provides a random set of values ​​for each request, this number will provide a selection of new random numbers. The random_seed table should have as many values ​​as the table from which you want to take random rows + N records, where N is the maximum possible value of random_from_php. A real example from my work is the selection of random categories of goods from different combinations (total 4,000,000 records):

SELECT
u1.*
FROM
users u1,
random_seed rs
WHERE
u1.role_id=5 AND u1.id=(rs.id+random_from_php)
ORDER BY
rs.random_seed
LIMIT 10






Normal: request“Accelerated” request:
SELECT
oc1. *
FROM
object_category oc1
WHERE oc1.region_id = 6
ORDER BY RAND ()
LIMIT 10
SELECT
oc1. *
FROM
object_category oc1, random_seed rs
WHERE
oc1.id = (rs.id + 564756) AND oc1.region_id = 6
ORDER BY
rs.random_seed
LIMIT 10
Lead time:
1.726s0.007s
1.851s0.010s
1.803s0.006s
1.784s0.008s


Advantages of the method described above:
+ The fastest possible way to select random rows from the table by condition
+ No need to re-generate random numbers for each row in the table
+ Request all the necessary values ​​does not iterate, in one query

Cons:
- The need to enter an additional table
- The need to change familiar queries

Also popular now: