Sequencing in MySQL
Hello! Today there was an interesting problem in data sampling, the solution of which I decided to share.
So, given:
Two tables - shops and products
Roughly speaking - a platform where different stores place their goods.
And so, there was a need to make the delivery of goods on the main page, but so that the user would not see a bunch of goods from one store. Stores need to be alternated.
shops:
products:
A little google - no sensible solution was found. But the thought came up how to implement a selection with alternating stores.
Initially, I will describe the algorithm:
And all this with MySQL tools. And preferably with a single request.
With a plan in mind, you can begin to implement it. What do we need?
Declare our variables:
Next, assign the initial values (number of stores and id of the first store).
Now you can start the selection itself. What do we need?
I got the following request:
More information about what is here:
In each line, we increase our counter by a number equal to the number of stores. That is, if we have 5 stores, then we will get the following numbering: 0, 5, 10, 15, etc.
As soon as we have a new store, we increase the shift by one. Those. for the first store, the shift will be 0, for the second - 1, etc.
When changing the store, we also need to reset our counters and start numbering goods from the beginning, not forgetting to add a shift.
And in the end - to update the current store, the goods of which we number ...
As a result, we get a sample of the type:
Here we see that the counter is added correctly, when changing the store it is reset, a shift is added and the numbering starts from the beginning (taking into account the shift).
Actually, the matter is left to the small. Wrap the resulting selection in a subquery and sort by our counter:
Voila! We got a selection of goods with alternating stores:
There is one minus - the goods at each store go in order. Those. at the beginning we will see the very first product of the first store, then the first product of the second store, the third, fourth, etc. Next will go the second store goods, the third and so on.
In order to get rid of this pattern, we need to mix the goods in the initial sample, wrapping it in another subquery:
So our products will already be mixed before they are numbered.
Actually, this is where the solution to the problem ended. Full request can be viewed under the cut.
For the sake of interest - I looked at the sampling rate. In my opinion, the results were quite good:
10 stores, 10,000 products - ~ 16ms (0.016s)
100 stores, 1,000,000 products - ~ 2568ms (2.568s)
100 stores, 10,000,000 products - 129951ms (2m 9.951s)
I think these are good results , although, of course, you need to test in combat mode.
PS For me, only one unclear question remains. Everything is fine, but what to do with pagination? After all, each next page is a new request.
Accordingly, goods mixed in stores will receive a new serial number and may appear more than once in the sample.
If you have any thoughts on this, I will be grateful to hear them in the comments.
Thank you all for your attention)
So, given:
Two tables - shops and products
Roughly speaking - a platform where different stores place their goods.
And so, there was a need to make the delivery of goods on the main page, but so that the user would not see a bunch of goods from one store. Stores need to be alternated.
shops:
- id int not null auto_increment primary key,
- name varchar (255) null
products:
- id int not null auto_increment primary key,
- shop_id int not null,
- name varchar (255) null,
A little google - no sensible solution was found. But the thought came up how to implement a selection with alternating stores.
Initially, I will describe the algorithm:
- You must select all the products and sort them by store.
- Next, number each product, starting from 1 with an interval equal to the number of stores.
- When numbering, as soon as the goods of one store end, the numbering is reset to zero, shifted by one, and starts again
- Select products by sorting them by numbered field
And all this with MySQL tools. And preferably with a single request.
With a plan in mind, you can begin to implement it. What do we need?
- @i - The counter that will number our products
- @cnt - Number of stores
- @delta - the delta by which the counter is shifted when numbering the goods of the next store
- @cur - id of the current store, to add a delta and reset the counter when numbering a new store
Declare our variables:
set @cnt = 0;
set @i = 0;
set @delta = 0;
set @cur = 0;
Next, assign the initial values (number of stores and id of the first store).
select @cur:=id from shops order by id limit 1;
select @cnt:=count(id) from shops;
Now you can start the selection itself. What do we need?
- It is necessary to number goods with an interval equal to the number of stores.
- At the end of the goods of one store - reset the counter, add a delta and change the current store.
I got the following request:
select id, shop_id, @i:=@i+@cnt as counter,
IF(@cur<>shop_id,@delta:=@delta+1,@delta) as delta,
IF(@cur<>shop_id,@i:=@delta,@i) as cur,
IF(@cur<>shop_id,@cur:=shop_id,@cur) as curshop
from t_product order by shop_id
More information about what is here:
@i:=@i+@cnt
In each line, we increase our counter by a number equal to the number of stores. That is, if we have 5 stores, then we will get the following numbering: 0, 5, 10, 15, etc.
IF(@cur<>shop_id,@delta:=@delta+1,@delta) as delta
As soon as we have a new store, we increase the shift by one. Those. for the first store, the shift will be 0, for the second - 1, etc.
IF(@cur<>shop_id,@i:=@delta,@i) as cur,
When changing the store, we also need to reset our counters and start numbering goods from the beginning, not forgetting to add a shift.
IF(@cur<>shop_id,@cur:=shop_id,@cur) as curshop
And in the end - to update the current store, the goods of which we number ...
As a result, we get a sample of the type:
id | shop_id | counter | delta | cur | curshop |
---|---|---|---|---|---|
43989 | 1 | 10 | 0 | 10 | 1 |
46989 | 1 | 20 | 0 | 20 | 1 |
114172 | 1 | thirty | 0 | thirty | 1 |
83989 | 1 | 40 | 0 | 40 | 1 |
67172 | 1 | fifty | 0 | fifty | 1 |
94672 | 2 | eleven | 1 | eleven | 2 |
6489 | 2 | 21 | 1 | 21 | 2 |
41989 | 2 | 31 | 1 | 31 | 2 |
61672 | 2 | 41 | 1 | 41 | 2 |
97489 | 3 | 12 | 2 | 12 | 3 |
Here we see that the counter is added correctly, when changing the store it is reset, a shift is added and the numbering starts from the beginning (taking into account the shift).
Actually, the matter is left to the small. Wrap the resulting selection in a subquery and sort by our counter:
select id as product_id, shop_id, cur from (
select id, shop_id, @i:=@i+@cnt as counter,
IF(@cur<>shop_id,@start:=@delta+1,@delta) as delta,
IF(@cur<>shop_id,@i:=@delta,@i) as cur,
IF(@cur<>shop_id,@cur:=shop_id,@cur) as curmag
from products order by shop_id
) as A order by cur ;
Voila! We got a selection of goods with alternating stores:
product_id | shop_id | cur |
---|---|---|
4187 | 1 | 10 |
7483 | 2 | eleven |
4045 | 3 | 12 |
9091 | 4 | thirteen |
1457 | 5 | 14 |
2387 | 6 | fifteen |
8109 | 7 | 16 |
1445 | 8 | 17 |
2102 | 9 | 18 |
9245 | 10 | 19 |
6744 | 1 | 20 |
7854 | 2 | 21 |
2164 | 3 | 22 |
There is one minus - the goods at each store go in order. Those. at the beginning we will see the very first product of the first store, then the first product of the second store, the third, fourth, etc. Next will go the second store goods, the third and so on.
In order to get rid of this pattern, we need to mix the goods in the initial sample, wrapping it in another subquery:
select id as product_id, shop_id, cur from (
select id, shop_id, @i:=@i+@cnt as counter,
IF(@cur<>shop_id,@start:=@delta+1,@delta) as delta,
IF(@cur<>shop_id,@i:=@delta,@i) as cur,
IF(@cur<>shop_id,@cur:=shop_id,@cur) as curmag
from products order by shop_id
from (select id, shop_id from products order by shop_id, rand()) as A order by shop_id) as B order by cur;
So our products will already be mixed before they are numbered.
Actually, this is where the solution to the problem ended. Full request can be viewed under the cut.
Full interleaved query
set @cnt = 0;
set @i = 0;
set @start = 0;
set @cur = 0;
select @cur:=id from shops order by id limit 1;
select @cnt:=count(id) from shops;
select id as product_id, shop_id, cur from (
select id, shop_id, @i:=@i+@cnt as counter,
IF(@cur<>shop_id,@start:=@delta+1,@delta) as delta,
IF(@cur<>shop_id,@i:=@delta,@i) as cur,
IF(@cur<>shop_id,@cur:=shop_id,@cur) as curmag
from products order by shop_id
from (select id, shop_id from products order by shop_id, rand()) as A order by shop_id) as B order by cur;
For the sake of interest - I looked at the sampling rate. In my opinion, the results were quite good:
10 stores, 10,000 products - ~ 16ms (0.016s)
100 stores, 1,000,000 products - ~ 2568ms (2.568s)
100 stores, 10,000,000 products - 129951ms (2m 9.951s)
I think these are good results , although, of course, you need to test in combat mode.
PS For me, only one unclear question remains. Everything is fine, but what to do with pagination? After all, each next page is a new request.
Accordingly, goods mixed in stores will receive a new serial number and may appear more than once in the sample.
If you have any thoughts on this, I will be grateful to hear them in the comments.
Thank you all for your attention)