Ogoun February 15, 2012 at 14:34

Regular Network Point Clustering

In this article, I will consider two algorithms, the first - directly clustering, the second - the construction of the contour of the cluster in the form of a convex polygon, an applied task for improved perception of the result.

Clustering

In general, the law with this area is quite superficial, so it is expected that such an algorithm has long existed and is somehow named, if anyone knows, please enlighten.

Input data

Point coordinates are just an array of two-dimensional coordinates
R - The maximum distance between points in a cluster - the main indicator for building a cluster
k - Grid coefficient - degree for the number 2, by which the side of the network cell is calculated: a = 2 ^ k

For the algorithm to work effectively, the following condition should be observed:
R <2 ^ k, or modify the algorithm.

Algorithm

Comparison of coordinates of points with cells of a regular network.
Xr = x >> k
Yr = y >> k, where k is the grid coefficient, >> is the bitwise offset. Due to the absence of the division operation at this step, a good gain in speed is obtained. The result of this step will be a dictionary of the form:
[coordinates of the network cell]: [links to points related to this cell]
Building clusters in one pass on points:
- The next point is taken, if it is not tied to a cluster, a new cluster is created and a point is added to it, a link to the cluster is affixed at the point itself.
- A list of cells is taken; it includes: a cell with a point and neighboring cells.
- All points in cells from the received list are selected.
- The distance between the current point and the one checked from the list is found, if the distance <= R then two cases are possible
  - the tested point is not yet included in the cluster => the point is entered in the same cluster as the current point
  - the point being checked already belongs to some cluster that is different from the current => cluster with a large number of points absorbs a cluster with a smaller one.

The result of the work will be a set of clusters in each of which the distance between the points does not exceed R, something like this:

Or more clearly this way: The

points of one cluster are indicated by the same color.

Circuit

It works, but I want to look at the result in a more attractive way, the easiest way, to select the clatter by a contour, preferably in the form of a convex polygon.
The main tool is vector multiplication by three points:


 public static int CrossLinePoint(ClusterPoint l1, ClusterPoint l2, ClusterPoint p)
    {
      return (p.X - l2.X) * (l2.Y - l1.Y) - (p.Y - l2.Y) * (l2.X - l1.X);
    }

If the result of the function is less than 0, then the bend at the midpoint goes clockwise, if more than 0, then against, if it is 0, there is no bend. The meaning itself is not important, only the sign is important.

The algorithm consists of only two steps.

Descriptive outline selection

The center of mass of the cluster is found, without taking into account the weight coefficients. Those. they simply add up all the coordinates, separately in x and y, and are divided by the number of points in the cluster, the obtained coordinates [Xc; Yc] will be the center of mass of the cluster.
We search for the three points furthest from the center of mass that form the initial contour.
The remaining points are sorted, and for each entry into the initial contour is checked, if the point enters it, nothing happens, if outside, it becomes part of the contour.
When adding a point to the contour, the array of contour points is sorted; the angle from the polar coordinates of the point, where the center of coordinates of the cluster acts as the center of coordinates, is taken as the value for sorting.
The result will be a broken contour describing the points that make up the cluster, and the points in the contour are already ordered clockwise relative to the center of mass.

Function to get the angle from polar coordinates:


private float GetPolarAngle(ClusterPoint p)
    {
      float a = 0, x = p.X - center.X, y = p.Y - center.Y;
      if (x > 0 && y >= 0)
      {
        a = (float)Math.Atan(y / x);
      }
      else if (x > 0 && y < 0)
      {
        a = (float)(Math.Atan(y / x) + 2 * Math.PI);
      }
      else if (x < 0)
      {
        a = (float)(Math.Atan(y / x) + Math.PI);
      }
      else if (x == 0 && y > 0)
      {
        a = (float)(Math.PI * .5);
      }
      else if (x == 0 && y < 0)
      {
        a = (float)(Math.PI * 1.5);
      }
      else if (x == 0 && y == 0)
      {
        a = 0;
      }
      return (float)(a * 180 / Math.PI);
    }

Here center is the center of mass of the cluster.

Angular sorting is needed to draw contour points in the correct order, and to determine if a point is included / not included in the contour when constructing the contour. If you leave the dots in random order, the algorithm will not work.

Result of work:

Green dot, this is the center of mass.
Still not very nice, so go to the last step.

Convex outline

The simplest step, in which the points are already ordered clockwise, is made by passing through the points and those in which the bend is counterclockwise are thrown out of the contour. The CrossLinePoint function above is used to determine the bend.
The result of work:

Testing

Graphs of the speed of the algorithm.

The field has a constant size, the density of points increases: The

field increases in size, the density of points is approximately the same:

Processing time is highly dependent on the initial data. If you set a large distance between points in a cluster on a field with a high density of points, the time grows significantly, which is logical, because the number of points compared increases. For example, in the above graphs, the speed of work is three times different. The first shows that 140 seconds were spent for 20,000 points, and in the case when the density does not increase only 40 seconds for the same 20,000 points.
If you were engaged in a similar task and you have your own measurements, I will be glad to see the results in the comments, to compare the effectiveness of the algorithms.

For testing, contour construction was not used. Since the loop search code is not optimized, the time it takes to construct it is several times greater than the time spent by the clusters themselves.

What's next

The algorithm works, but only for a set of points having coordinates, the next step will be an attempt to modify to highlight areas on continuous images (drawings, photographs, etc.).

Implementation

If someone is interested in the algorithm, there is a program and source codes. Win / C #
Win x86
Sources

In addition to the mappings of clusters shown in the article as points and as points with a bounding contour of lines, the program can also display in the form of closed curves, and the filled region of closed curves.

Standard GDI + tools are used.

Tags: