Three Interesting SQL Knowledge Tasks - Solutions

    In a recent post I presented three MySQL problems that I find interesting, hoping to see what solutions Habr readers would come up with. As promised, here are my own solutions. To keep the text interesting and informative, I decided to explain in detail what is done, why, and how. So ...



    First, I will go over the solutions that were proposed in the comments.

    Task 1.

    There was nothing complicated here, yet there were few correct solutions. The most common mistake: few people thought to handle the case where the category itself is not published. Others were caught by wrong grouping or by plain syntax errors in the code. There were also solutions whose EXPLAIN output provoked suicidal thoughts. :-)

    But after the errors were fixed, many of them worked correctly, apart from some “minor” sins.

    Task 2.

    In fact, it turned out to be easier than I intended, but that is mostly my fault: the condition was not precise enough, and no one thought to ask for clarification. As a result it turned into a catch that no one noticed. Here is the thing: the ordi field is responsible for sorting, but I never stated that it is unique (although I never stated the opposite either, and yes, that is my fault). Sorting really is done on this field, but each gallery has its own set of values: if one photo album has 1, 2, 3, then another has not 4, 5, 6 but also 1, 2, 3. Because of this, half of the solutions are both correct and incorrect at the same time.

    As for errors: almost no one checked the publication flags of the parent records (gallery and category). Apart from that, almost everything worked.

    If you take into account the conditions I actually had in mind, this is a rather difficult task, in that the query becomes very cumbersome. It will become clear why below.

    Task 3.

    As expected, there were few solutions. Only one worked relatively well: the one proposed by pharod. The task really is very difficult, and the difficulty here is not in cumbersome queries, as in the second task, but in the idea of the solution itself.

    Now for the solutions.


    To begin with, we will build a template query that correctly takes all the given conditions into account and selects everything. I always do this when the same data from the tables has to be shown in different ways depending on some external conditions. That does not matter right now; I just want to explain, step by step, how I approach this.

    We join all three tables in a single query, specifying the join conditions with JOIN ... ON (...). We get something like the following:

    SELECT
      *
    FROM photo_category as c
      JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
      JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
    GROUP BY g.id
    HAVING COUNT(i.id) > 0
    ORDER BY c.ordi, g.ordi, i.ordi;


    I write the join conditions in ON (...) so that it is immediately clear what relates to what, and so that WHERE is not cluttered (in fact, it makes no difference where they are written). The grouping is meant to filter out empty galleries and categories (for them COUNT(i.id) would be 0, so HAVING rejects them; strictly speaking, the inner JOIN on photo_image already drops such galleries). The sorting fields are in place as well.

    This query satisfies all the initial conditions, although it is not yet a solution to anything. We have made a template. Now let's get going.
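    To see what the template actually filters, here is a minimal runnable sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL, and the table layout and sample rows are my own guess based on the column names used in the queries, not the original schema. The grouping is written in standard form so that it runs on a strict DBMS.

```python
import sqlite3

# Hypothetical schema and data: only the columns used by the template query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photo_category (id INTEGER PRIMARY KEY, title TEXT, ordi INTEGER, is_published INTEGER);
CREATE TABLE photo_gallery  (id INTEGER PRIMARY KEY, c_id INTEGER, title TEXT, ordi INTEGER, is_published INTEGER);
CREATE TABLE photo_image    (id INTEGER PRIMARY KEY, g_id INTEGER, title TEXT, ordi INTEGER, is_published INTEGER);
INSERT INTO photo_category VALUES (1, 'Nature', 1, 1), (2, 'Hidden', 2, 0);
INSERT INTO photo_gallery VALUES
  (10, 1, 'Forest', 1, 1),  -- published, has one published image
  (11, 1, 'Empty',  2, 1),  -- published, no images at all
  (12, 1, 'Draft',  3, 0),  -- unpublished gallery
  (13, 2, 'Orphan', 1, 1);  -- published gallery in an unpublished category
INSERT INTO photo_image VALUES
  (100, 10, 'a', 1, 1), (101, 10, 'b', 2, 0), (102, 12, 'c', 1, 1), (103, 13, 'd', 1, 1);
""")

# The template: inner joins plus publication flags, one row per gallery.
rows = conn.execute("""
    SELECT g.id, COUNT(i.id) AS published_images
    FROM photo_category AS c
      JOIN photo_gallery AS g ON g.c_id = c.id AND c.is_published = 1 AND g.is_published = 1
      JOIN photo_image   AS i ON i.g_id = g.id AND i.is_published = 1
    GROUP BY g.id, c.ordi, g.ordi
    ORDER BY c.ordi, g.ordi
""").fetchall()

print(rows)
```

    Only gallery 10 survives: the empty gallery, the unpublished gallery and the gallery in the unpublished category are all dropped by the joins themselves.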

    The first task.


    We take the template query, which already takes everything into account. We add a filter on the category ID and one more JOIN of the image table, this time on the is_main_foto flag. But since a gallery may have no main photo, we make it a LEFT JOIN:

    SELECT
      *
    FROM photo_category as c
      JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
      JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
      LEFT JOIN photo_image as i2 ON (i2.g_id = g.id AND i2.is_main_foto)
    WHERE c.id = 1
    GROUP BY g.id
    HAVING COUNT(i.id) > 0
    ORDER BY c.ordi, g.ordi, i.ordi;


    Now, if i2.id IS NULL, then i.id will hold some arbitrary photo from the gallery. By the way, a strict DBMS should complain in this case, because it is not clear which row of i to take the value from. Now, for complete happiness, we write the answer using IF:

    SELECT
      c.id as cid,
      c.title as ctitle,
      g.id as gid,
      g.title as gtitle,
      IF (i2.id IS NULL, i.id, i2.id) as image_id
    FROM photo_category as c
      JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
      JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
      LEFT JOIN photo_image as i2 ON (i2.g_id = g.id AND i2.is_main_foto)
    WHERE c.id = 1
    GROUP BY g.id
    HAVING COUNT(i.id) > 0
    ORDER BY c.ordi, g.ordi, i.ordi;


    For those who don’t know how IF works: IF (<condition>, <value if true>, <value if false>). In practice it is a ternary operator.

    That's it, the problem is solved.
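    For a DBMS that does complain about the loose grouping, the same idea can be sketched with aggregates. This is not the query above but a variant under my own assumptions: COALESCE replaces IF, the fallback is the published photo with the lowest id, and the main photo must itself be published. The schema and rows are made up for the demonstration, with sqlite3 standing in for MySQL.

```python
import sqlite3

# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photo_category (id INTEGER PRIMARY KEY, title TEXT, ordi INTEGER, is_published INTEGER);
CREATE TABLE photo_gallery  (id INTEGER PRIMARY KEY, c_id INTEGER, title TEXT, ordi INTEGER, is_published INTEGER);
CREATE TABLE photo_image    (id INTEGER PRIMARY KEY, g_id INTEGER, title TEXT, ordi INTEGER,
                             is_published INTEGER, is_main_foto INTEGER);
INSERT INTO photo_category VALUES (1, 'Nature', 1, 1);
INSERT INTO photo_gallery VALUES (10, 1, 'Forest', 1, 1), (11, 1, 'Sea', 2, 1), (12, 1, 'Empty', 3, 1);
INSERT INTO photo_image VALUES
  (100, 10, 'a', 1, 1, 0), (101, 10, 'b', 2, 1, 1),  -- gallery 10 has a main photo (101)
  (102, 11, 'c', 1, 1, 0), (103, 11, 'd', 2, 1, 0);  -- gallery 11 has none
""")

def galleries_with_cover(conn, category_id):
    """Return (cid, ctitle, gid, gtitle, image_id) for each non-empty published gallery."""
    return conn.execute("""
        SELECT c.id, c.title, g.id, g.title,
               COALESCE(MIN(i2.id), MIN(i.id)) AS image_id  -- main photo, else lowest id (assumption)
        FROM photo_category AS c
          JOIN photo_gallery AS g ON g.c_id = c.id AND c.is_published = 1 AND g.is_published = 1
          JOIN photo_image   AS i ON i.g_id = g.id AND i.is_published = 1
          LEFT JOIN photo_image AS i2
            ON i2.g_id = g.id AND i2.is_main_foto = 1 AND i2.is_published = 1
        WHERE c.id = ?
        GROUP BY c.id, c.title, g.id, g.title, g.ordi
        ORDER BY g.ordi
    """, (category_id,)).fetchall()

covers = galleries_with_cover(conn, 1)
print(covers)
```

    Gallery 10 gets its main photo, gallery 11 falls back to a regular one, and the empty gallery 12 does not appear at all.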

    The second task.

    The idea is not very complicated: get the ordi of the current element, then take the MAX of ordi over the rows that satisfy the selection conditions and have an ordi smaller than the current one, which gives the previous picture; the next one is found the same way with MIN. Without going into details it is all quite simple, but once you do, the queries become rather complex.

    First we get the ordi (sorting field) of the current element - it's very simple:

    SELECT ordi FROM photo_image WHERE id = 1


    Next, we take into account that we must search only within the current gallery, so we need to determine the gallery id (by the task's conditions it can be passed in from outside, but for completeness we will compute it ourselves):

    SELECT g_id FROM photo_image WHERE id = 1


    Now let's build a query for the previous ordi (we are only looking for published photos):

    SELECT MAX(ordi) from photo_image
    WHERE is_published AND g_id = (SELECT g_id FROM photo_image WHERE id = 1)
    AND ordi < (SELECT ordi FROM photo_image WHERE id = 1)


    Similarly for the next one:

    SELECT MIN(ordi) from photo_image
    WHERE is_published AND g_id = (SELECT g_id FROM photo_image WHERE id = 1)
    AND ordi > (SELECT ordi FROM photo_image WHERE id = 1)


    Now that we know the ordi of the next and the previous photo, we need to get the IDs of the corresponding rows, again staying within the current gallery:

    SELECT id, title
    FROM photo_image
    WHERE
      g_id = (SELECT g_id FROM photo_image WHERE id = 1)
        
      AND
      (
        ordi =
        (
          SELECT MAX(ordi) from photo_image
          WHERE is_published AND g_id = (SELECT g_id FROM photo_image WHERE id = 1)
          AND ordi < (SELECT ordi FROM photo_image WHERE id = 1)
        )
        
        OR
        ordi =
        
        (
          SELECT MIN(ordi) from photo_image
          WHERE is_published AND g_id = (SELECT g_id FROM photo_image WHERE id = 1)
          AND ordi > (SELECT ordi FROM photo_image WHERE id = 1)
        )
      );


    That seems to be all, but we still have no checks of the publication conditions. It is also unclear how to tell the next photo from the previous one when the result contains only one row. To solve this, we add the current photo (the one we are computing neighbours for) to the result and reuse our template query. As a result we get:

    SELECT ordi, id, title
    FROM photo_image
    WHERE
      EXISTS
      (
        SELECT
          i.id
        FROM photo_category as c
          JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
          JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
        WHERE i.id = 1
      )
      
      AND
      g_id = (SELECT g_id FROM photo_image WHERE id = 1)
        
      AND
      (
        ordi =
        (
          SELECT MAX(ordi) from photo_image
          WHERE is_published AND g_id = (SELECT g_id FROM photo_image WHERE id = 1)
          AND ordi < (SELECT ordi FROM photo_image WHERE id = 1)
        )
        
        OR
        ordi =
        
        (
          SELECT MIN(ordi) from photo_image
          WHERE is_published AND g_id = (SELECT g_id FROM photo_image WHERE id = 1)
          AND ordi > (SELECT ordi FROM photo_image WHERE id = 1)
        )
        
        OR
        id = 1
      );


    The condition

    EXISTS
    (
      SELECT
        i.id
      FROM photo_category as c
        JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
        JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
      WHERE i.id = 1
    )


    removes everything from the result if the current photo, its gallery or its category is not published. The HAVING clause and the grouping are pointless here, because if the gallery has no pictures the result is empty anyway. The result will contain three, two or one row. The current photo is the row with the given ID, and previous or next is determined by ordi: smaller or larger than that of the row with the given ID.

    It may seem that there are a lot of subqueries and that this will be slow. Actually it is not: MySQL caches repeated subqueries (you can add SQL_CACHE to be sure), so each of the subqueries is executed only once. Given suitable indexes (where would we be without them), this works quite fast.

    The problem is solved.

    Note: I said earlier that I did not know how to solve this problem with a single query. That is true in the sense that in reality I also needed to fetch a large set of parameters for each photo, and if you do that in a single query it becomes simply huge and really slows down. In my real code this is two queries (for the next and the previous photo), though with a common filter.
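    The two-query variant could look roughly like the sketch below. It is not my production code: the helper name neighbour, the schema and the sample rows are hypothetical, sqlite3 stands in for MySQL, ORDER BY ... LIMIT 1 is used instead of MAX/MIN, and ordi is assumed to be unique within a gallery.

```python
import sqlite3

# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photo_category (id INTEGER PRIMARY KEY, is_published INTEGER);
CREATE TABLE photo_gallery  (id INTEGER PRIMARY KEY, c_id INTEGER, is_published INTEGER);
CREATE TABLE photo_image    (id INTEGER PRIMARY KEY, g_id INTEGER, ordi INTEGER, is_published INTEGER);
INSERT INTO photo_category VALUES (1, 1);
INSERT INTO photo_gallery VALUES (10, 1, 1), (11, 1, 1);
INSERT INTO photo_image VALUES
  (100, 10, 1, 1), (101, 10, 2, 0), (102, 10, 3, 1), (103, 10, 4, 1),
  (104, 11, 2, 1);  -- same ordi range, different gallery
""")

def neighbour(conn, image_id, previous):
    """Return the id of the previous or next published photo in the same gallery, or None."""
    # op and direction come from this fixed pair only, never from user input.
    op, direction = ("<", "DESC") if previous else (">", "ASC")
    row = conn.execute(f"""
        SELECT n.id
        FROM photo_image AS cur
          JOIN photo_gallery  AS g ON g.id = cur.g_id AND g.is_published = 1
          JOIN photo_category AS c ON c.id = g.c_id AND c.is_published = 1
          JOIN photo_image    AS n ON n.g_id = cur.g_id AND n.is_published = 1
                                   AND n.ordi {op} cur.ordi
        WHERE cur.id = ? AND cur.is_published = 1
        ORDER BY n.ordi {direction}
        LIMIT 1
    """, (image_id,)).fetchone()
    return row[0] if row else None

print(neighbour(conn, 102, previous=True), neighbour(conn, 102, previous=False))
```

    The joins on the current photo play the role of the common filter: if the photo, its gallery or its category is unpublished, both calls return None.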

    The third task.

    The catch is that you actually need a LIMIT per group. The usual LIMIT will not do, since it limits the result of the whole query and nothing else. Combining per-category queries, each with its own LIMIT, through UNION is of course a solution, but what if there are many categories? Not good. The next thing that comes to mind is filtering with WHERE. But then we need something to filter on: the WHERE clause must be able to tell apart the rows whose position exceeds our N. And how do we get that position? Only by assigning each gallery a sequence number within its category. How do we do that? Only with user variables. I am almost completely sure that this problem and others like it cannot be solved in any other way.

    If we take the template query and modify it slightly, like this:

    SELECT
      @a:=@a+1, c.*, g.*, i.*
    FROM photo_category as c
      JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
      JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
    GROUP BY g.id
    HAVING COUNT(i.id) > 0
    ORDER BY c.ordi, g.ordi, i.ordi;


    Then we get numbered rows: for each row the value of @a increases by one. Excellent. But we need to count within each group, so whenever the category id changes, @a must be reset. We introduce another variable for this, plus a reset condition (and, while we are at it, replace the asterisk with the fields we need):

    SELECT
      IF (@cid=c.id, @a:=@a+1, (@cid:=c.id) AND (@a:=1)) as n,
      c.id as cid, c.title as ctitle,
      g.id as gid, g.title as gtitle
    FROM photo_category as c
      JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
      JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
    GROUP BY g.id
    HAVING COUNT(i.id) > 0
    ORDER BY c.ordi, g.ordi, i.ordi;


    What does this mean? If the value of the variable @cid equals the current category id, then @a is increased by 1 and the result of the expression is @a. Otherwise @cid is set to the current category id, @a is set to 1, and the result of the expression is 1 (that is, the same as @a, although really this is just a lucky coincidence: the AND of two non-zero values happens to be 1). Now it would seem that you can simply filter with the condition WHERE @a <= N, but ... this will not work.

    And here is why. This scheme works correctly only if the rows are already sorted. But sorting is performed after the rows have been processed, not during. It turns out that you need to sort first, and only then number.
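    The dependence on row order is easy to see outside the database. Below is a small illustration in plain Python, my own model of the counter logic rather than anything MySQL executes: current plays the role of @cid, counter plays the role of @a, and the rows are made-up (category id, gallery id) pairs.

```python
def number_within_groups(rows):
    """Mimic the @cid/@a trick: restart the counter whenever the category id changes."""
    current, counter, numbered = None, 0, []
    for cid, gid in rows:
        if cid == current:
            counter += 1                 # same category: @a := @a + 1
        else:
            current, counter = cid, 1    # new category: @cid := cid, @a := 1
        numbered.append((cid, gid, counter))
    return numbered

# Rows in the order the engine happens to produce them (not sorted by category).
unsorted_rows = [(1, "a"), (2, "b"), (1, "c")]

wrong = number_within_groups(unsorted_rows)
right = number_within_groups(sorted(unsorted_rows))

print(wrong)
print(right)
```

    On the unsorted input every row gets number 1, because the category changes on every step; only after sorting does gallery "c" become the second gallery of category 1.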

    Now note that, as I already said, the expression IF (@cid=cid, @a:=@a+1, (@cid:=cid) AND (@a:=1)) is, fortunately, equal to @a in both cases. So it can be used directly in WHERE. Let me remind you that by the task's conditions we also had to show a picture, but this causes no problems, since we first execute the inner query and only then perform the counting and filtering. Here is what we get:

    SELECT q.* FROM
    (
      SELECT -- the query from the first task
        c.id as cid, c.title as ctitle,
        g.id as gid, g.title as gtitle,
        IF (i2.title IS NULL, i.title, i2.title)
        FROM photo_category as c
          LEFT JOIN photo_gallery as g ON (g.c_id=c.id AND c.is_published AND g.is_published)
          LEFT JOIN photo_image as i ON (i.g_id = g.id AND i.is_published)
          LEFT JOIN photo_image as i2 ON (i2.g_id = g.id AND i2.is_main_foto)
        GROUP BY g.id
        HAVING COUNT(i.id) > 0
        ORDER BY c.ordi, g.ordi DESC
    ) as q
    WHERE IF (@cid=q.cid, @a:=@a+1, (@cid:=q.cid) AND (@a:=1)) <= N;


    This will work correctly. The problem is solved. By the way, let me add that user variables are a very powerful tool when used skillfully. Moreover, this query can be called a universal solution for many typical tasks of this kind: the 5 most recently added, best, or most purchased products per category, the 5 best articles of each user, the 10 best-selling books of each genre, and so on, because it makes absolutely no difference what the subquery is.

    Well, that's all. Thanks for your attention. I hope you found it interesting.

    PS: as for the gallery title in the statement of the problem: of course it was a typo :-) Thanks to those who pointed it out, although I could not fix it in time.
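    On servers that support window functions (MySQL 8.0 and later, SQLite 3.25 and later), the same per-category limit can also be expressed with ROW_NUMBER(). The sketch below shows that alternative, not the user-variable query above; it assumes sqlite3 as a stand-in for MySQL, and the schema, rows and the function name top_galleries are made up for the demonstration.

```python
import sqlite3

# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photo_category (id INTEGER PRIMARY KEY, ordi INTEGER, is_published INTEGER);
CREATE TABLE photo_gallery  (id INTEGER PRIMARY KEY, c_id INTEGER, ordi INTEGER, is_published INTEGER);
CREATE TABLE photo_image    (id INTEGER PRIMARY KEY, g_id INTEGER, is_published INTEGER);
INSERT INTO photo_category VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO photo_gallery VALUES (10, 1, 1, 1), (11, 1, 2, 1), (12, 1, 3, 1), (13, 1, 4, 1), (20, 2, 1, 1);
INSERT INTO photo_image VALUES (100, 10, 1), (101, 11, 1), (102, 12, 1), (103, 20, 1);  -- gallery 13 is empty
""")

def top_galleries(conn, n):
    """Return (cid, gid, position) for the first n non-empty galleries of each category."""
    return conn.execute("""
        SELECT cid, gid, rn
        FROM (
          SELECT c.id AS cid, g.id AS gid, c.ordi AS c_ordi,
                 ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY g.ordi DESC) AS rn
          FROM photo_category AS c
            JOIN photo_gallery AS g ON g.c_id = c.id AND c.is_published = 1 AND g.is_published = 1
          WHERE EXISTS (SELECT 1 FROM photo_image AS i
                        WHERE i.g_id = g.id AND i.is_published = 1)
        ) AS q
        WHERE rn <= ?
        ORDER BY c_ordi, rn
    """, (n,)).fetchall()

top2 = top_galleries(conn, 2)
print(top2)
```

    The numbering restarts for every category because of PARTITION BY, and the empty gallery 13 is filtered out before the rows are numbered, so it does not take up a position.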
