Road recognition through semantic segmentation

    In the previous installment, I experimented with autonomous movement of my home tank. The road was recognized with a color filter, and the resulting mask was fed to the input of a specially trained classifier neural network, which chose whether to go right, left or straight.

    The weak point was recognizing the roadway itself: because color shades vary so much, the decision-making neural network produced strange results. Comments on that article recommended looking into semantic segmentation. The topic turned out to be promising, and using segmentation networks brought advantages but, inevitably, disadvantages too.

    But first things first, starting with a little theory.


    Segmentation is the process of isolating parts of an image. The simplest and most obvious kind is segmentation by color. That method, however, tells you nothing about what is depicted in the picture or where.
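
    To make the limitation concrete, here is a minimal sketch of color segmentation in NumPy. The function name, the toy image and the "asphalt gray" range are all assumptions made up for illustration; they are not taken from the tank's code.

```python
import numpy as np

def color_mask(image, lower, upper):
    """Return a boolean mask of pixels whose channels all lie in [lower, upper]."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    return np.all((image >= lower) & (image <= upper), axis=-1)

# Toy 2x3 BGR image: gray "asphalt" pixels and green "grass" pixels.
image = np.array([
    [[90, 90, 90], [100, 100, 100], [40, 160, 40]],
    [[95, 95, 95], [150, 150, 150], [30, 170, 30]],
], dtype=np.uint8)

# Hypothetical range for "asphalt gray"; a real range has to be tuned by hand.
mask = color_mask(image, lower=(80, 80, 80), upper=(110, 110, 110))
print(mask)
```

    The lighter gray pixel (150, 150, 150) falls outside the range and is lost, even though it could be the same asphalt in brighter light. The mask also says nothing about what the selected pixels actually are.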

    Here is a good article describing primitive approaches.

    Semantic segmentation

    Semantic segmentation splits an image into objects and determines the type of each object.

    It looks something like this:

    The results are very impressive. Let's see what it takes to bring this into real life.
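
    In terms of data, a semantic segmentation network typically produces a score per class for every pixel, and the label of a pixel is the class with the highest score. The sketch below illustrates that idea on a 2x2 toy "image"; the class list and the scores are invented for the example.

```python
import numpy as np

# Hypothetical class list; real networks use their own label set.
CLASSES = ["road", "sidewalk", "sky"]

def label_pixels(scores):
    """scores has shape (num_classes, height, width).

    Returns an array (height, width) with the winning class index per pixel.
    """
    return np.argmax(scores, axis=0)

# Made-up scores for a 2x2 image, one 2x2 plane per class.
scores = np.array([
    [[0.1, 0.2], [0.8, 0.7]],   # road
    [[0.2, 0.1], [0.1, 0.2]],   # sidewalk
    [[0.7, 0.7], [0.1, 0.1]],   # sky
])

class_map = label_pixels(scores)
names = [[CLASSES[i] for i in row] for row in class_map]
print(names)  # [['sky', 'sky'], ['road', 'road']]
```

    Unlike a color mask, the result says both what is in the picture and where: sky in the top row, road in the bottom row.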


    U-net is the best-known segmentation network, originally developed for medical imaging.
    The source

    People quickly realized that the approach is general-purpose.

    There are many articles on the Internet on how to prepare data and train U-net networks:

    However, I did not find a ready-made U-net that I could quickly pick up and experiment with.


    E-net is a younger and lesser-known network, designed specifically for recognizing city streets.


    The most popular datasets for street segmentation (E-net was originally trained on them):

    U-net is now being trained on the same datasets as well.

    Implementation Choice

    The flow of new information on segmentation was quite overwhelming. Instinctively, I wanted to grab onto something simpler. I did not feel enough inner Zen to dig into network architectures and spend time on training. But the PyImageSearch article came with a ready-made, trained neural network, and in a format compatible with OpenCV-DNN at that.

    So I took the path of least resistance.

    Using it is very simple.
    (What worries me most is that the network was trained on 1024x512 images. First, that is more than the camera on the Raspberry delivers; second, the performance needed to process that much data is a concern. In the end, this is exactly where the main problem will turn out to be.)

    We read the neural network from files (the model itself in one, the class names in another, the colors in a third).

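    import logging

    import cv2
    import numpy as np

    # PiConf is the project's configuration module holding the file paths.

    def load_segment_model():
        try:
            # One class name per line.
            classes = None
            with open(PiConf.SEGMENT_CLASSES) as f:
                classes = f.read().strip().split("\n")
            # One "B,G,R"-style comma-separated color per line, one line per class.
            colors = None
            with open(PiConf.SEGMENT_COLORS) as f:
                colors = f.read().strip().split("\n")
            colors = [np.array(c.split(",")).astype("int") for c in colors]
            colors = np.array(colors, dtype="uint8")
            print("[INFO] loading model...")
            net = cv2.dnn.readNet(PiConf.SEGMENT_MODEL)
            return net, classes, colors
        except Exception as e:
            logging.exception("Cannot load segment model")
        return None, None, None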

    We segment the image and, at the same time, overlay the segments on the original image
    (in my case, all classes except the road are invisible).

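    import time

    def segment_image(image_path, seg_net, seg_classes, seg_colors):
        image0 = cv2.imread(image_path)
        # The network expects 1024x512 input.
        image = cv2.resize(image0, (1024, 512), interpolation=cv2.INTER_NEAREST)
        blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (1024, 512), 0, swapRB=True, crop=False)
        # The blob must be set as the network input before the forward pass.
        seg_net.setInput(blob)
        start = time.time()
        output = seg_net.forward()
        end = time.time()
        print("[INFO] inference took {:.4f} seconds".format(end - start))
        (numClasses, height, width) = output.shape[1:4]
        # For every pixel, pick the class with the highest score.
        classMap = np.argmax(output[0], axis=0)
        # Map class indices to colors and scale the mask back to the source size.
        mask = seg_colors[classMap]
        mask = cv2.resize(mask, (image0.shape[1], image0.shape[0]), interpolation=cv2.INTER_NEAREST)
        # Resized class map (not used further in this function).
        classMap = cv2.resize(classMap, (image0.shape[1], image0.shape[0]), interpolation=cv2.INTER_NEAREST)
        # Grayscale mask, shrunk to 128x64 and cropped to the central 64x64 for the classifier.
        gmask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
        gmask = cv2.resize(gmask, (128, 64), interpolation=cv2.INTER_NEAREST)
        gmask = gmask[0:64, 32:96]
        # Blend the original image with the colored mask.
        output = ((0.6 * image0) + (0.4 * mask)).astype("uint8")
        return output, gmask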
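
    One way to make every class except the road invisible is to use a color table that is black everywhere except for the road entry. The sketch below shows the idea; the helper name, the class list and the white road color are assumptions for illustration and are not taken from the actual class and color files.

```python
import numpy as np

def road_only_colors(classes, road_name="road", road_color=(255, 255, 255)):
    """Build a palette where every class is black except the road class."""
    colors = np.zeros((len(classes), 3), dtype="uint8")
    colors[classes.index(road_name)] = road_color
    return colors

# Hypothetical class list; the real one comes from the classes file.
classes = ["unlabeled", "road", "sidewalk"]
colors = road_only_colors(classes)

# A tiny 2x2 class map, as np.argmax would produce it.
class_map = np.array([[0, 1], [1, 2]])

# Indexing the palette with the class map yields a (2, 2, 3) color mask.
mask = colors[class_map]
print(mask[..., 0])
```

    With such a palette, the blended overlay highlights only the road, and the grayscale mask passed to the classifier contains nothing but the road.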


    We take pictures previously recorded by the tank and run the segmentation network on them.


    Only the left side of the sidewalk is recognized as road.

    We shrink the picture and take a 64x64 center crop from it:
    (This is the size expected by the neural network that decides on the direction.)

    The direction network (in fact, a classifier) commands a left turn. Not quite correct, but bearable.
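
    The center crop mentioned above can be sketched in plain NumPy as follows. The helper name is hypothetical; the 128x64 input matches the size the mask is shrunk to before cropping.

```python
import numpy as np

def center_square(mask):
    """Crop the horizontally centered square from a wide mask (height x width)."""
    height, width = mask.shape[:2]
    left = (width - height) // 2
    return mask[:, left:left + height]

# A 64-row, 128-column mask whose central 64 columns are marked as road (255).
small = np.zeros((64, 128), dtype="uint8")
small[:, 32:96] = 255

crop = center_square(small)
print(crop.shape)  # (64, 64)
```

    For a 128x64 mask the crop covers columns 32 to 95, so everything outside the central square is discarded. This is why a road visible only at the far left or right edge of the frame never reaches the classifier.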


    A similar situation: the lower right corner is lost again (the asphalt is also wet there).
    However, most of the road is still recognized.

    The classifier suggests going straight.


    Here the robot ended up in the middle of the sidewalk.

    The road is recognized almost perfectly.

    The classifier commands a right turn (to find the edge of the road on the next step).


    After a little tinkering with the tank firmware, I replaced the color-based road detector with the segmentation network.

    When I ran all this on the Raspberry Pi, the first thing that stood out was the depressing performance.
    Segmenting one image takes 6 seconds; in that time the tank, moving at a brisk trot, overshoots every turn.

    This is what happened in real tests: despite almost perfect recognition of the sidewalk and correct commands from the control network, the tank managed to drift off course while each image was being processed.

    In general, the Raspberry cannot digest images of this size.
    It seems I will have to train a specialized neural network after all.
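
    A quick back-of-the-envelope calculation shows why the delay is fatal. The tank's speed and the tolerable drift below are hypothetical numbers chosen only for illustration; the 6 seconds is the measured segmentation time.

```python
def blind_distance(speed_m_per_s, inference_s):
    """Distance covered while a single frame is being processed."""
    return speed_m_per_s * inference_s

def max_inference_time(speed_m_per_s, tolerance_m):
    """Longest processing time that keeps blind travel within tolerance_m."""
    return tolerance_m / speed_m_per_s

SPEED = 0.5        # m/s, assumed speed, not a measured value
INFERENCE = 6.0    # s, segmentation time observed on the Raspberry Pi
TOLERANCE = 0.25   # m, assumed acceptable drift between two decisions

print(blind_distance(SPEED, INFERENCE))       # 3.0
print(max_inference_time(SPEED, TOLERANCE))   # 0.5
```

    Under these assumptions the tank travels 3 meters on a stale decision, while staying within the tolerance would require processing a frame in about half a second.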

