# Robot Loader Project: Orientation Definition

A month ago I wrote about my robot loader determining its own position. (It’s a pity I posted that article at an unfortunate time, on a Saturday night, so few people saw it.) As I noted there, the wheel sensors let the robot determine its position quite accurately: the slowly accumulating error is corrected as soon as the robot scans the barcode on any of the warehouse shelves. On the other hand, there was nothing to correct the accumulating error in direction.

I discussed my difficulties with a girl with a humanities background and asked what methods of orientation in space she knew of. She told me that at the Science Museum in London she had come across an exhibit devoted to how ants orient themselves by the patterns directly overhead. Visitors were invited to take a mirror and walk around the room, watching the ceiling patterns in the mirror and navigating by them alone. (A map of the ceiling was provided.)

I decided to check: what does my robot see on the ceiling of the warehouse?

Even on such a low-quality recording, the long rectangular windows are perfectly distinguishable. All of them, of course, are parallel to one of the warehouse axes, which means they are parallel to the rows of shelves. It follows that if a window gets into the frame and I can determine its direction, then I get the robot’s heading, accurate up to a 180° ambiguity. The windows are not visible from every point in the warehouse, but they get into the frame often enough for periodic course correction.
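The post does not spell out how that 180° ambiguity gets resolved; presumably the odometry heading, which drifts slowly, is good enough to pick the right half-turn. A minimal Python sketch of that idea (all names are mine, not from the original):

```python
def correct_heading(odometry_deg, window_deg):
    """Resolve the window's 180-degree ambiguity using the odometry heading.

    A window edge gives the warehouse axis only up to 180 degrees, so we
    pick whichever of the two candidate headings (window_deg or
    window_deg + 180) lies closer to the current odometry estimate.
    """
    def angular_diff(a, b):
        # Shortest angular distance between two headings, in degrees
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    candidates = (window_deg % 360.0, (window_deg + 180.0) % 360.0)
    return min(candidates, key=lambda c: angular_diff(c, odometry_deg))
```

If odometry says 85° and the window edge reads 272°, the corrected heading comes out as 92°, since 272° would imply the robot is off by nearly half a turn.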

I am no expert in pattern recognition, and the first thing I did was ask on a Q&A site whether there was anything ready-made for recognizing rectangular windows, say, would OpenCV be applicable? Unfortunately, I got nothing sensible back: the people “in the know” considered it beneath their dignity to spell things out for a newbie.

An American forum: you ask a question, you get an answer.

An Israeli forum: you ask a question, you get a question in return.

A Russian forum: you ask a question, and they explain to you what an asshole you are.

Besides, the robot was controlled from Microsoft Robotics Developer Studio, and I found no ready-made .NET assembly for working with OpenCV, so I decided to write the recognition from scratch. It’s hardly rocket science, after all: just recognize the bright white rectangles.

The warehouse ceiling, as the robot sees it, looks like this:

To start, we separate the bright white window from the dark background. We convert from RGB to YCbCr and threshold at Y = 227 (the threshold was chosen empirically; in an ideal world it would be adapted to the overall brightness of the image). Along the way, we downscale the initial 640x480 image to 320x240 - that is enough for us, and the processing speeds up fourfold.

The code:

```
byte[] bytes = response.Frame;      // raw 640x480 BGR frame, height == 480
int stride = bytes.Length / height; // bytes per scanline
// Bounds [-3..322, -3..242]: three-pixel margins spare us index checks later
byte[,] belong = (byte[,])Array.CreateInstance(typeof(byte), new int[] { 326, 246 }, new int[] { -3, -3 });
int Ythreshold = settings.Ythreshold;
for (int y = 0; y < 240; y++)
{
    int offset = stride * y * 2;
    for (int x = 0; x < 320; x++)
    {
        // Sum each channel over a 2x2 block of source pixels (BGR order)
        int blu = bytes[offset] + bytes[offset + 3] + bytes[offset + stride] + bytes[offset + stride + 3]; offset++;
        int grn = bytes[offset] + bytes[offset + 3] + bytes[offset + stride] + bytes[offset + stride + 3]; offset++;
        int red = bytes[offset] + bytes[offset + 3] + bytes[offset + stride] + bytes[offset + stride + 3]; offset += 4;
        // Compare the block's average luma (Y of YCbCr) against the threshold
        belong[x, y] = (0.299 / 4 * red + 0.587 / 4 * grn + 0.114 / 4 * blu > Ythreshold ? (byte)1 : (byte)0);
    }
}
```

The separation results in the following image:

As usual with video frames, besides the correctly selected window we picked up noise: individual bright pixels on glossy surfaces. At the same time, inside the window there are “dropped-out” pixels that were not bright enough.

We remove the noise with a median-type filter (on the Q&A site I was for some reason advised to use a Gaussian blur; what would blurring accomplish here?) with a radius of 3 pixels and a threshold of 5 bright pixels out of the 21 examined. Such a filter tends to connect bright areas separated by a few dark pixels, so our window of three panes merges into one rectangle.

```
// Count bright neighbours in a radius-3 disc (21 cells) around each pixel
private bool[,] filtered = (bool[,])Array.CreateInstance(typeof(bool), new int[] { 326, 246 }, new int[] { -3, -3 });
int medianThreshold = settings.MedianThreshold;
for (int x = 0; x < 320; x++)
    for (int y = 0; y < 240; y++)
        filtered[x, y] = belong[x - 1, y - 2] + belong[x, y - 2] + belong[x + 1, y - 2] +
            belong[x - 2, y - 1] + belong[x - 1, y - 1] + belong[x, y - 1] + belong[x + 1, y - 1] + belong[x + 2, y - 1] +
            belong[x - 2, y] + belong[x - 1, y] + belong[x, y] + belong[x + 1, y] + belong[x + 2, y] +
            belong[x - 2, y + 1] + belong[x - 1, y + 1] + belong[x, y + 1] + belong[x + 1, y + 1] + belong[x + 2, y + 1] +
            belong[x - 1, y + 2] + belong[x, y + 2] + belong[x + 1, y + 2] > medianThreshold;
```

The bounds of the arrays `belong` and `filtered` - [-3..322, -3..242] - are deliberately chosen with three-pixel “margins” on each side, so that we can work with these arrays without bothering with index checks. After filtering, only the window merged from its three panes, plus a few highlights on a shelf, remain white in the image. Let us postulate that the largest white spot in the frame is necessarily a window: we flood-fill each spot and take the one with the largest area.

```
int biggest_area = 0;
byte area_id = 1, biggest_id = 0; // filled areas get ids starting from 2
Rectangle bounds = new Rectangle();
PointF cg = new PointF(); // centre of mass of the biggest area
Point[] stack = new Point[320 * 240]; // explicit stack for the flood fill
for (int x = 0; x < 320; x++)
    for (int y = 0; y < 240; y++)
        if (filtered[x, y] && belong[x, y] <= 1) // bright and not yet filled
        {
            int area = 0, left = 320, top = 240, right = 0, bottom = 0;
            int sx = 0, sy = 0;
            ++area_id;
            // FloodFill
            int sp = 0;
            stack[0] = new Point(x, y);
            while (sp >= 0)
            {
                Point next = stack[sp--];
                area++;
                sx += next.X;
                sy += next.Y;
                belong[next.X, next.Y] = area_id;
                if (next.X < left) left = next.X;
                if (next.X > right) right = next.X;
                if (next.Y < top) top = next.Y;
                if (next.Y > bottom) bottom = next.Y;
                // Push the four unvisited bright neighbours
                if (filtered[next.X - 1, next.Y] && belong[next.X - 1, next.Y] <= 1) stack[++sp] = new Point(next.X - 1, next.Y);
                if (filtered[next.X, next.Y - 1] && belong[next.X, next.Y - 1] <= 1) stack[++sp] = new Point(next.X, next.Y - 1);
                if (filtered[next.X, next.Y + 1] && belong[next.X, next.Y + 1] <= 1) stack[++sp] = new Point(next.X, next.Y + 1);
                if (filtered[next.X + 1, next.Y] && belong[next.X + 1, next.Y] <= 1) stack[++sp] = new Point(next.X + 1, next.Y);
            }
            if (area > biggest_area)
            {
                biggest_area = area;
                biggest_id = area_id;
                bounds = new Rectangle(left, top, right - left, bottom - top);
                cg = new PointF((float)sx / area, (float)sy / area);
            }
        }
```

The image for recognition is ready: two-level, a white rectangle on a black background. During the fill we even computed the coordinates of its “center of mass” `cg` (the red dot in the image) and its bounding box (the green dot is the box’s center). We can now check how much our white spot resembles a window: the area `biggest_area` should be at least 2000 pixels, and the distance between the two centers should be no more than 20 pixels; otherwise the figure is too asymmetric for a rectangle. But how do we determine its orientation? Rocket scientists would apply the Hough transform, carrying the image into the 4-dimensional parameter space of the rectangle (width, length, tilt angle, offset from the origin) and looking for a maximum there. I took a workers-and-peasants approach instead, and started with a “histogram” of the distances of the rectangle’s points from its geometric center:

```
// c is the centre of the bounding box; Sqr(v) returns v * v
PointF c = new PointF(bounds.Left + (float)bounds.Width / 2, bounds.Top + (float)bounds.Height / 2);
int[] hist = new int[400];
for (int i = 0; i < 400; i++) hist[i] = 0;
int maxdist = 0;
for (int x = bounds.Left; x <= bounds.Right; x++)
    for (int y = bounds.Top; y <= bounds.Bottom; y++)
        if (belong[x, y] == biggest_id)
        {
            int dist = (int)Math.Sqrt(Sqr(x - c.X) + Sqr(y - c.Y));
            hist[dist]++;
            if (dist > maxdist) maxdist = dist;
        }
```
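Before reading the plot, it is worth writing down what shape to expect. This derivation is mine, not from the original post: up to pixel discretization, hist(r) is the arc length of the circle of radius r that lies inside an ideal rectangle with half-width a and half-length b:

```latex
\mathrm{hist}(r) \approx
\begin{cases}
2\pi r, & r \le a \quad \text{(whole circle inside: linear growth)} \\
2r\left(\pi - 2\arccos\tfrac{a}{r}\right), & a < r \le b \quad \text{(slow decrease)} \\
2r\left(\pi - 2\arccos\tfrac{a}{r} - 2\arccos\tfrac{b}{r}\right), & b < r \le \sqrt{a^2+b^2} \quad \text{(sharp drop to zero)}
\end{cases}
```

Each long side that the circle crosses removes an arc of angle 2·arccos(a/r), and each short side an arc of 2·arccos(b/r); at r = √(a²+b²) the circle passes through the corners and nothing remains inside.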

The gray graph is the histogram; the dark bar on it marks its maximum, i.e. the largest number of points lies at that distance from the rectangle’s center. (A dark circle of the corresponding radius is drawn on the rectangle.) It is easy to see that this distance is exactly the half-width of the rectangle: up to it, circles fit entirely inside the rectangle, and the histogram grows linearly (with coefficient 2π); past it, the histogram slowly decreases until the distance reaches the half-length of the rectangle; from then on only four “corners” of each circle fit inside the rectangle, and the histogram drops sharply. Unfortunately, the length cannot be recovered from this “cliff”: the rectangle’s edges come out too ragged, and the tail of the histogram is noisy. So we take another route and consider a circle 5 pixels wider than the rectangle. It has two black “sides” lying exactly on the transverse axis of the rectangle. We carefully find the centers of mass of these “sides”; the difficulty is separating them from each other. We compute the center of mass within each “quadrant” separately, then merge the quadrants into two pairs by the proximity of their centers.

```
int r1 = 0; // radius of the inscribed circle = half-width
for (int x = maxdist; x >= 3; x--)
{
    if (hist[x] > hist[r1]) r1 = x;
}
int rSample = r1 + 5;       // sampling circle, 5 pixels wider than the rectangle
int[] voters = new int[4];  // black-pixel counts per quadrant
Point[] sums = new Point[4]; // coordinate sums per quadrant
Point sampleOrg = new Point(Math.Max((int)(c.X - rSample), 0),
                            Math.Max((int)(c.Y - rSample), 0));
Rectangle sample = new Rectangle(sampleOrg, new Size(
    Math.Min((int)(c.X + rSample), 319) - sampleOrg.X,
    Math.Min((int)(c.Y + rSample), 239) - sampleOrg.Y));
for (int x = sample.Left; x <= sample.Right; x++)
    for (int y = sample.Top; y <= sample.Bottom; y++)
        if (belong[x, y] != biggest_id) // a black pixel
        {
            int dist = (int)Math.Sqrt(Sqr(x - c.X) + Sqr(y - c.Y));
            if (dist > r1 && dist <= rSample) // inside the sampling ring
            {
                int idx = y < c.Y ? (x < c.X ? 1 : 0)  // 0 = top-right, 1 = top-left,
                                  : (x < c.X ? 2 : 3); // 2 = bottom-left, 3 = bottom-right
                voters[idx]++;
                sums[idx].X += x;
                sums[idx].Y += y;
            }
        }
PointF adjusted = new PointF();
int vAbove = voters[0] + voters[1],
    vBelow = voters[2] + voters[3],
    vLeft = voters[2] + voters[1],
    vRight = voters[0] + voters[3],
    allVoters = vAbove + vBelow;
if (allVoters == 0)
{
    // abort: no black pixels found
}
else
{
    if (vAbove > 0 && vBelow > 0)
    {
        // split vertically: one "side" above the centre, the other below
        PointF above = new PointF((float)(sums[0].X + sums[1].X) / vAbove - c.X, (float)(sums[0].Y + sums[1].Y) / vAbove - c.Y),
               below = new PointF((float)(sums[2].X + sums[3].X) / vBelow - c.X, (float)(sums[2].Y + sums[3].Y) / vBelow - c.Y);
        double dAbove = Math.Sqrt(above.X * above.X + above.Y * above.Y),
               dBelow = Math.Sqrt(below.X * below.X + below.Y * below.Y);
        if (dAbove >= r1 && dAbove <= rSample && dBelow >= r1 && dBelow <= rSample)
            // the split is valid
            adjusted = new PointF((above.X * vAbove - below.X * vBelow) / allVoters, (above.Y * vAbove - below.Y * vBelow) / allVoters);
    }
    if (adjusted.X == 0 && adjusted.Y == 0 &&
        vLeft > 0 && vRight > 0)
    {
        // split horizontally: one "side" left of the centre, the other right
        PointF toleft = new PointF((float)(sums[2].X + sums[1].X) / vLeft - c.X, (float)(sums[2].Y + sums[1].Y) / vLeft - c.Y),
               toright = new PointF((float)(sums[0].X + sums[3].X) / vRight - c.X, (float)(sums[0].Y + sums[3].Y) / vRight - c.Y);
        double dLeft = Math.Sqrt(toleft.X * toleft.X + toleft.Y * toleft.Y),
               dRight = Math.Sqrt(toright.X * toright.X + toright.Y * toright.Y);
        if (dLeft >= r1 && dLeft <= rSample && dRight >= r1 && dRight <= rSample)
            // the split is valid
            adjusted = new PointF((toleft.X * vLeft - toright.X * vRight) / allVoters, (toleft.Y * vLeft - toright.Y * vRight) / allVoters);
    }
}
```

The vector `adjusted` now points from the center of the rectangle along its transverse axis. And that is essentially the whole orientation. A bit of pure geometry remains: compute the length of the rectangle and check whether the length-to-width ratio is plausible for a real window.
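The post leaves that last bit of geometry out. One way to do it, sketched in Python: since area = width × length and the half-width is the histogram maximum r1, the length follows directly. The ratio bounds here are my illustrative placeholders, not the post’s real thresholds.

```python
def window_like(area_px, half_width_px, min_ratio=2.0, max_ratio=8.0):
    """Recover the rectangle's length from its area and half-width,
    then check that the aspect ratio is plausible for a window.

    area = width * length  =>  length = area / (2 * half_width).
    The min_ratio/max_ratio bounds are made-up placeholders.
    """
    width = 2.0 * half_width_px
    if width <= 0:
        return False, 0.0
    length = area_px / width
    return min_ratio <= length / width <= max_ratio, length
```

For the 2000-pixel minimum area from above and a half-width of 10, this gives a 100-pixel length and a ratio of 5, comfortably window-like.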

I will demonstrate the result on one more example, where the window does not fully fit into the frame. Since recognition does not rely on the “tail” of the histogram, an incomplete window does not confuse it at all.

**Second example**

Original image:

After separation:

After filtering:

Histogram:

Circle with sides:

Transverse axis:


The code I wrote was wrapped into an MRDS service, `WindowDetector`, modeled on the standard `Technologies\Vision\ColorSegment`. My service attaches to the camera and, on every frame update, sends subscribers an `UpdateFoundWindow` notification with the angle of the window in the frame. The MRDS service that drives the robot subscribes to `WindowDetector` exactly as it subscribes to the barcode scanner, and handles messages from both in the same way - correcting the course in the first case and the position in the second. With the window detector, my robot drove around the warehouse quite briskly:
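The MRDS wiring itself is not shown in the post. Purely to illustrate the two-subscriptions-one-pattern idea described above, here is a framework-free Python sketch (every name is hypothetical; nothing here is MRDS API):

```python
class Pose:
    """The robot's position estimate: coordinates plus heading."""
    def __init__(self, x=0.0, y=0.0, heading=0.0):
        self.x, self.y, self.heading = x, y, heading

class Controller:
    """Receives corrections from two independent absolute references."""
    def __init__(self):
        self.pose = Pose()

    def on_barcode(self, shelf_x, shelf_y):
        # A shelf barcode fixes the accumulated position error
        self.pose.x, self.pose.y = shelf_x, shelf_y

    def on_window(self, axis_angle_deg):
        # A ceiling window fixes the accumulated heading error
        self.pose.heading = axis_angle_deg
```

Both callbacks overwrite the drifting dead-reckoning estimate with an absolute fix; the two sensors simply correct different components of the pose.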

The robot’s fate is a sad one. Just an hour after this video was shot, a wheel sensor got clogged with warehouse dust, and the robot stopped tracking its position. During the initial assembly of the robot this sensor goes in first of all, i.e. to replace it you essentially have to take everything apart and reassemble it anew. I decided to try replacing the sensor on the “live” robot, but through clumsiness I shorted the battery against something inside, and the robot stopped driving altogether. I had to leave it on a warehouse shelf next to the robot from the year before last, the one based on a Roomba. That’s not a shelf any more, it’s a robot cemetery.

So, before leaving Britain, I did not manage to turn Eddie into a working robot loader. But this spring I will most likely return there with spare parts, revive him, and continue the experiments.