Search for an object in an image using a perceptual hash

    image

    In those days when I still believed that programming for myself at lunchtime or after work, you can create your own startup, I had one project. And the project required such an algorithm for finding an object in an image, so that it could be trained quickly for a new object, and so that it does not consume a lot of computing resources. After reading the article about the perceptual hash ( again paper and two ), I decided, why not use it to limit the number of investigated areas of the image? And he began to build his bike, instead of using the signs of Haar.use a perceptual hash as a filter of image areas so that only those areas where the object you are looking for are most likely to remain after the filter. At the end of the article there is a link to C ++ code using the OpenCV library.

    What is the result?





    In this video you can see the operation of the algorithm for finding an object in the image in action. A blue bounding box is used to highlight an object in a video when video playback is stopped. In this case, you can remember the appearance of the object. Learning the algorithm happens instantly. Also, the size of the blue bounding box affects the maximum and minimum sizes of the object search window in the image. The red bounding box highlights the area in the image where the target is most likely to be located. The algorithm works as follows:

    • Prepare the image for processing (create an integrated image)
    • Calculate the perceptual hash of the image section
    • Check the flag of an array element with a number equal to the perceptual hash
    • If the perceptual hash passes the test, compare the image section with pre-prepared patches

    At the last point, problems arise, since the perceptual hash does not precisely limit the image area, and you simply cannot save the found image area with the image of the object, with the slightest shift you get a big difference. It would be more correct to find key points in two images and make a comparison on them. In addition, the usual enumeration of patches during the comparison of two images slows down the operation of the algorithm, and nevertheless, the result could be called “real time”.

    The panda video shows the operation of a “filter” based on a perceptual hash. Red bounding boxes indicate areas of the image where the hash matches the hash of the desired object (or is within the specified Hamming distance). Alas, this filtering method does a poor job with small objects in the video. But the algorithm shows itself well when working with large objects, such as searching for a face in an image from a laptop’s web camera.



    Features of using the perceptual hash in this algorithm


    If we have infinite enough large computer memory so that each image can be represented as an index of an array, then a comparison of two images could be made if it had an if condition , and in the array itself we could store a flag indicating that this image belongs to the desired object .

    This approach is quite realistic to apply to the perceptual hash.

    Since the image is the index of the array, it will not be necessary to spend time searching for the hash in the array, it will be enough to check the condition that this array element (hash) belongs to the desired class of objects. It is not time consuming to calculate the hash if you prepare the original image (obtain an integral image from it).

    Typically, a perceptual hash uses an image size of 8 x 8 pixels, or 64 bits of data. However, 64 data bits have so many possible combinations that the array will occupy (1.71E11 x cell size ) memory. Another thing is if you take a 5 x 5 pixel image, which will correspond to 2E25 = 33554432 or at least 32 megabytes of memory, which is quite acceptable.

    Noise instead of hamming distance


    Since in this algorithm the perceptual hash of the image of the desired object is the index of the array, instead of the Hamming distance, to evaluate the correspondence of the perceptual hash to the desired one, you can pre-fill the array with flags, i.e. add noise to the hamming distance. This will reduce the amount of computation.

    → The project on github is here

    Also popular now: