We restore the detailed geometry of objects for more accurate assortment validation.

    When dealing with search quality issues, sooner or later we face the challenge of visually validating products. We will skip the simple tasks that a regular classifier can handle and focus on cases that require reasonably accurate object geometry.

    Suppose you only need to select good photos of various objects for later use in e-commerce. By "good" we mean photos with no unnecessary details, in which the main object dominates.

    Why do you need it?

    Any non-standard product image will definitely attract attention, but the reaction of a potential buyer can be either positive or negative. The task of preliminary validation is to reduce (preferably significantly) the likelihood of the negative scenario.

    Below are style “inconsistencies” within one of the categories of the test store.

    To keep it simple: if a T-shirt gets a little lost in the photo, or you find yourself looking at details you do not really need, something is likely to go wrong (or already has).

    Thus, one strategy for preliminary validation can be formulated very simply: photos with dominant products win. All that remains is to let them win.
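    As a rough illustration of "dominant products win", the rule can be reduced to the share of the frame that the product's bounding box occupies. This is only a minimal sketch: the `Box` type, the `dominance` and `is_dominant` helpers, and the 0.4 threshold are assumptions made for the example, not values taken from the original solution.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def dominance(box: Box, image_width: int, image_height: int) -> float:
    """Fraction of the image area covered by the box (0.0 to 1.0)."""
    # Clip the box to the image so out-of-frame detections cannot exceed 1.0.
    left = max(0, box.left)
    top = max(0, box.top)
    right = min(image_width, box.right)
    bottom = min(image_height, box.bottom)
    box_area = max(0, right - left) * max(0, bottom - top)
    return box_area / (image_width * image_height)


def is_dominant(box: Box, image_width: int, image_height: int,
                min_share: float = 0.4) -> bool:
    """Hypothetical rule: accept the photo if the product fills at least min_share of it."""
    return dominance(box, image_width, image_height) >= min_share
```

    With these assumptions, a product box of 800 by 800 pixels in a 1000 by 1000 photo covers 64% of the frame and passes, while a 200 by 200 box covers 4% and is rejected.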

    Early results looked quite good and made it possible to significantly simplify and automate the validation:

    What is wrong with the bounding box approach?

    The main problem is the accuracy of the results. Complex objects, non-standard photos, real life, you know. So even if you have a bounding box, you still do not have enough information.

    The conclusion is somewhat disappointing, since it immediately rules out proven, well-working solutions (or makes them significantly harder to apply). For example, using neural networks to obtain accurate geometry of any kind requires a lot of resources to prepare a training set, and still does not guarantee the necessary accuracy.
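    To make the point about missing information concrete, here is a small sketch that compares two toy object masks with identical bounding boxes. The masks and the `fill_ratio` helper are invented for the illustration; the idea is only that the box alone cannot tell a solid object from a thin or irregular one.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the non-zero cells of a 2D mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def fill_ratio(mask):
    """Share of the bounding box that is actually covered by the object."""
    box = bounding_box(mask)
    if box is None:
        return 0.0
    top, left, bottom, right = box
    box_area = (bottom - top + 1) * (right - left + 1)
    object_area = sum(1 for row in mask for value in row if value)
    return object_area / box_area


# Two toy masks: same bounding box, very different shapes.
solid = [[1, 1, 1, 1] for _ in range(4)]
l_shape = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

print(bounding_box(solid) == bounding_box(l_shape))  # True
print(fill_ratio(solid))    # 1.0
print(fill_ratio(l_shape))  # 0.4375
```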

    With reasonably accurate geometry, however, one could use more complex analysis and validation logic. What is more, one could even take a swing at video (selecting the required segment, automatic cropping, etc.).
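    Automatic cropping is one of the simpler things that become possible once an object mask is available. The sketch below assumes the image and the mask are plain 2D lists of the same size; `crop_to_object` and its `margin` parameter are hypothetical names introduced for this example.

```python
def crop_to_object(image, mask, margin=1):
    """Crop a 2D image (list of rows) to the mask's extent plus a margin.

    The crop window is clamped to the image borders. If the mask is empty,
    the image is returned unchanged.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return image  # nothing detected: leave the image untouched
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    height, width = len(image), len(image[0])
    top = max(0, rows[0] - margin)
    bottom = min(height - 1, rows[-1] + margin)
    left = max(0, cols[0] - margin)
    right = min(width - 1, cols[-1] + margin)
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

    For video, the same function could in principle be applied frame by frame, although the dynamic scenes discussed later in the article would likely need extra smoothing of the crop window.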


    The current solution cannot be called universal, because it relies on a fairly large number of limitations and simplifications.

    Simplification #1: Contrast

    One of the simplifications can be formulated as follows: the object in the photo always contrasts with its background. A contrasting object is easy to find and then scan (adaptively, with a dynamic step, etc.):

    Naturally, the contrast can be increased if necessary, making the solution more stable.

    The example above implements a search for implanted hair: a very strange task that appeared on Stack Overflow and successfully took up one evening.

    Simplification #2: Only one object should be dominant
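    The contrast simplification can be sketched in a few lines: stretch the contrast if needed, then scan the rows with a large step over the background and a one-row step on the object. This is not the code behind the hair example; it is an illustration that assumes a greyscale image stored as a 2D list, an object darker than the background, and hypothetical helper names and default values.

```python
def stretch_contrast(image):
    """Linearly rescale grey levels so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [row[:] for row in image]  # flat image: nothing to stretch
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in image]


def row_span(row, threshold):
    """Return (first, last) column where the row is darker than threshold, or None."""
    hits = [c for c, v in enumerate(row) if v < threshold]
    return (hits[0], hits[-1]) if hits else None


def scan_object(image, threshold=128, coarse_step=4):
    """Scan rows top to bottom with a dynamic step.

    Background rows are skipped in jumps of coarse_step. When a jump lands on
    the object, the scan backs up to the first object row and proceeds one row
    at a time. Returns {row_index: (first_col, last_col)}.
    Objects thinner than coarse_step rows can be missed by the coarse pass.
    """
    spans = {}
    height = len(image)
    r = 0
    while r < height:
        if row_span(image[r], threshold) is None:
            r += coarse_step  # background: keep taking large steps
            continue
        # Hit: step back over object rows that the coarse jump skipped.
        while r > 0 and row_span(image[r - 1], threshold) is not None:
            r -= 1
        # Fine scan, one row at a time, until the object ends.
        while r < height:
            span = row_span(image[r], threshold)
            if span is None:
                break
            spans[r] = span
            r += 1
    return spans
```

    On a low-contrast photo, for example an object at grey level 150 on a background of 180, `scan_object` with the default threshold finds nothing, while `scan_object(stretch_contrast(image))` recovers the object's rows and column extents. That is the sense in which increasing the contrast makes the approach more stable.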

    Here a very small number of products with obvious design decisions suffer, but the remaining cases are handled quite easily:
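    One possible way to check the "single dominant object" assumption is to label the connected regions of a binary mask and compare the largest region with the runner-up. The sketch below uses a plain breadth-first flood fill; the function names and the `min_ratio` of 4 are assumptions made for the illustration.

```python
from collections import deque


def component_areas(mask):
    """Areas of 4-connected components of non-zero cells in a 2D mask, largest first."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    areas = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited object cell.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            area = 0
            while queue:
                r, c = queue.popleft()
                area += 1
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            areas.append(area)
    return sorted(areas, reverse=True)


def has_single_dominant_object(mask, min_ratio=4.0):
    """Hypothetical rule: the largest component is at least min_ratio times the second largest."""
    areas = component_areas(mask)
    if not areas:
        return False  # empty mask: no object at all
    if len(areas) == 1:
        return True
    return areas[0] >= min_ratio * areas[1]
```

    A photo with one large region and a few specks passes this check, while a photo with two regions of similar size fails it, which matches the kind of product shots this simplification gives up on.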

    Difficult cases

    Having worked on this topic for some time, I can confidently say that all cases are complex in their own way. However, dynamic scenes and scenes with varying distances cause the most problems.



    Strangeness 4K Mask RCNN COCO
    YOLOv2 vs YOLOv3 vs Mask RCNN vs Deeplab Xception
    Telegram: RobotsCanSee
    Instagram: RobotsCanSee
