The algorithm for finding the displacement of an object in the image

    In a previous article, I described a simple algorithm for recognizing moving objects in a dynamic picture. And now I will describe a slightly different situation. Suppose we already have the coordinates of the object and we need to determine in which direction it has shifted. By coordinates, I mean the coordinates of a conditional point on the object, for simplicity, assuming that the object does not change its orientation in space. The fact that the object itself has shifted can be determined by subtracting a certain area of ​​the frame from the area of ​​another frame. In the last article, I did just that, only for the whole frame.

    It may be tempting to solve this problem “head-on” by sorting through the whole picture. However, such an operation may be too time consuming. Some other algorithm begs.

    So, a little experiment, the same pictures, I assume that the car was moving in a straight line. I take an area of ​​25 by 25 pixels in size, shift it along the assumed path of the car and calculate the total deviation of the brightness of the pixels, here is the program code:

     private void tsmiFindShift_Click(object sender, EventArgs e)
            {
                ImageMatrix matrix1 = create_matrix("D:\\3\\1.png");
                ImageMatrix matrix2 = create_matrix("D:\\3\\2.png");
                Bitmap picture = new Bitmap(matrix1.picture);
                using (var wrapper = new ImageWrapper(picture, true))
                {
                    int x1 = 110;
                    int x2 = x1+25;
                    int y1 = 140;
                    int y2 = y1+25;
                    int dx = 0;
                    int dy = 0;
                    StringBuilder sb = new StringBuilder();
                    for (dx = -100; dx < 200; dx++)
                    {
                        Int64 res = 0;
                        for (int i = x1; i < x2; i++)
                        {
                            for (int j = y1; j < y2; j++)
                            {
                                int light1 = matrix1.matrix[i, j];
                                int light2 = matrix2.matrix[i + dx, j + dy];
                                int light = Math.Abs(light2 - light1);
                                res = res + light;
                            }
                        }
                        sb.AppendFormat("{0}; {1};\n", res, dx);
                    }
                    System.IO.File.WriteAllText("D:\\3\\1.txt", sb.ToString());
                    pbImage.Image = picture;
                }
    

    We look at the received schedule:

    image

    What do we get? At some point, this sum of deviations is minimal, but not zero. Why not zero? Perhaps the direction of the car was not quite in a straight line, it could also be various noises (for example, raindrops, hand shake, and so on). To test this hypothesis, I nevertheless conducted an experiment with a “forehead” search, by sorting out coordinates on a large area of ​​the picture. It turned out that my assumption was true, the car did move in a straight line, since the smallest value of the objective function turned out to be exactly at zero displacement along the y axis, that is, only horizontal movement. It remains to write off all the noise. Although, it would not hurt to see the mask for subtracting frames at the found point:

    image

    In the subtraction area, the edges of the headlights are visible, apparently, just some glare.

    The question arises: is it possible to use something like descent along the gradient or the search for the minimum of the objective function to search for a new position of the object? If you look closely at the graph, we see that the function has a lot of local extrema, and it is not yet clear how to make the algorithm do not confuse the local extremum with the main one.

    Here comes the following idea: what if we use binary search? First we take the function along the x axis, then along the y, find the quadrant where the function is minimal, divide it into four of the same quadrants, and so until we find the desired point. This is much faster than going around all points in the search area.

    For the experiment, another picture was chosen: an ellipse moving simultaneously both horizontally and diagonally:

    image

    We plot the objective function along the x axis:

    image

    Extremum of the function at dx = 24
    And along the y axis:

    image

    Extremum at -34.
    As we see. on the dy axis, the extremum was found correctly, but on x - we found that the object is shifted to a completely different quadrant, the extremum is in the region dx> 0, although in fact the bias was in the direction of decreasing x. So we need some other algorithm.

    You can try to draw a perpendicular straight line through an extremum point and look for an extremum along it. And then through the point to the extremum on the same line draw a perpendicular line. Thus, we have a straight line on which we are looking for an extremum, will be either parallel to the axis Ox, then Oy. So, we try, we find the extremum on Ox, we look on the axis parallel to Oy at dx = 24:

    image

    Extreme at dy = -1. Draw a line through the point dx = 24, dy = -1, parallel to the Ox axis already. we get this result:

    image

    Now we have an extremum already at the point dx = 18. Already a little closer to the truth. The only question is: will an extremum be found in this way and how much more effective will it be than a "dumb search"?

    So, first stupid bust. The algorithm found an extremum at the point dx = -12, dy = -23 for 15556 iterations: The

    image

    total number of iterations in this example is 40401. The runtime on my laptop is about 0.7 seconds. For streaming video processing, this speed is, of course, unacceptable.

    Now let's try to implement the above idea with a gradual approximation:

      private ResStruct search(ImageMatrix matrix1, ImageMatrix matrix2, int x1, int y1, int x2, int y2, StringBuilder sb,  int x_beg, int y_beg, int x_end, int y_end)
            {
                Int64 min_res = 999999999999999999;
                ResStruct result = new ResStruct();
                int min_dx = 999999999;
                int min_dy = 999999999;
                for (int dy = y_beg; dy <= y_end; dy++)
                {
                    for (int dx = x_beg; dx <= x_end; dx++)
                    {
                        Int64 res = 0;
                        for (int i = x1; i < x2; i++)
                        {
                            for (int j = y1; j < y2; j++)
                            {
                                int light1 = matrix1.matrix[i, j];
                                int light2 = matrix2.matrix[i + dx, j + dy];
                                int light = Math.Abs(light2 - light1);
                                res = res + light;
                            }
                        }
                        if (res < min_res) 
                        {
                            min_res = res;
                            min_dx=dx;
                            min_dy=dy;
                        }
                        //sb.AppendFormat("{0}; {1}; {2}; {3};\n", dx, dy, res, min_res);
                    }
                }
                result.dx = min_dx;
                result.dy = min_dy;
                result.res = min_res;
                return result;
            }
            private void tsmiFastFindShift_Click(object sender, EventArgs e)
            {
                ImageMatrix matrix1 = create_matrix("D:\\3\\11.png");
                ImageMatrix matrix2 = create_matrix("D:\\3\\12.png");
                Bitmap picture = new Bitmap(matrix1.picture);
                using (var wrapper = new ImageWrapper(picture, true))
                {
                    int x1 = 140;
                    int x2 = x1 + 25;
                    int y1 = 110;
                    int y2 = y1 + 25;
                    Random rnd=new Random();
                    StringBuilder sb = new StringBuilder();
                    DateTime dt1 = DateTime.Now;
                    Int64 min_res = 999999999999999999;
                    int min_dx = 0;
                    int min_dy = 0;
                    int k=0;
                    bool flag = false;
                    do
                    {
                        ResStruct res;
                        if (flag)
                        {
                            res = search(matrix1, matrix2, x1, y1, x2, y2, sb, min_dx, -100, min_dx, 100);
                        }
                        else
                        {
                            res = search(matrix1, matrix2, x1, y1, x2, y2, sb, -100, min_dy, 100, min_dy);
                        }
                        flag = !flag;
                        if (res.res < min_res)
                        {
                            min_res = res.res;
                            min_dx = res.dx;
                            min_dy = res.dy;
                        }
                        else
                        {
                            //Если у нас не получилось найти минимум меньше текущего (не приблизились к цели ни на сколько)
                           //то просто сдвинем оси на несколько случайных пикселей
                            if (flag) min_dy = min_dy + rnd.Next(10) - 5; else min_dx=min_dx + rnd.Next(10) - 5;
                        }
                        sb.AppendFormat("{0}; {1}; {2}; {3}; {4}\n", res.dx, res.dy, res.res, min_res, k);
                        k++;
                    } while (k < 1000 && min_res>=1);
                    DateTime dt2 = DateTime.Now;
                    MessageBox.Show(dt1.ToString() + " " + dt2.ToString());
                    System.IO.File.WriteAllText("D:\\3\\1.txt", sb.ToString());
                    for (int i = x1; i < x2; i++)
                    {
                        for (int j = y1; j < y2; j++)
                        {
                            int light1 = matrix1.matrix[i, j];
                            int light2 = matrix2.matrix[i + min_dx, j + min_dy];
                            int light = Math.Abs(light2 - light1);
                            wrapper[i, j] = Color.FromArgb(light, light, light);
                        }
                    }
                    pbImage.Image = picture;
                }
            }
    

    This algorithm found offsets for 12,600 iterations (63 to 200):

    image

    This is somewhat faster. True, the algorithm itself is not the most optimal, it can be further optimized. Its other drawback: if you choose the wrong starting points, it freezes and does not find anything. For example, I chose -100, -100 first, and the search did not work.

    If this topic is of interest to you, then I will talk about how to further improve this algorithm and eliminate its shortcomings, as well as how to work with it in real rather than drawn pictures.

    Also popular now: