Deep Learning: not just cats on mobile phones, or how we diagnose locomotive bogies

Published on March 11, 2019



    Just a couple of days ago, Aurorai handed over for trial operation a system for monitoring the bogies of Ermak locomotives and recognizing their defects. The task is non-trivial and very interesting; its first step was to assess the condition of the brake pads and the width of the wheel tires. We managed to solve the problem with an accuracy of 1 mm at locomotive speeds of up to 30 km/h! I want to note that, thanks to the specifics of the setup, we were able to use TTA (test-time augmentation), a vivid example of a Kaggle-style competition hack that usually fits poorly into production, together with semantic segmentation built on an se_resnext50 encoder, which predicts the mask with remarkable accuracy.

    Description of the task

    The task is to create a hardware-software system that detects brake pad defects and reports the data to the shift supervisor.

    Background of the task

    As it turned out, a huge share of the pads, about 80%, are replaced at PTOL facilities (locomotive technical inspection points), and every locomotive goes through this every 72 hours. The bulk of a PTOL check is a foreman's visual inspection of the outer part of the locomotive bogie.



    Plan for solving the problem:

    1. Equipment selection
    2. Data collection
    3. Model training
    5. Server development with a REST API
    6. Android tablet client development
    7. Design and assembly of a rack for cameras and lighting
    7. Trial operation

    Selection of equipment

    Perhaps one of the hardest, if not the hardest, tasks was choosing cameras, lenses, and lighting within a limited budget and timeframe: the MVP had to be ready in a month and a half. A couple of days of googling turned me into an expert on machine vision hardware. The choice fell on Basler cameras and a 6k-lumen pulsed backlight synchronized with the camera. In Basler's favor were 70 frames/sec, resolution up to 1920x1024, and a Python API that greatly simplified integrating all the system components; the only downside is the price, roughly 100 thousand rubles per camera.
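
    For context, a minimal sketch of frame grabbing with Basler's official Python API (pypylon); the article does not show the actual capture code, so treat this as an illustration rather than the production pipeline:

    from pypylon import pylon

    # connect to the first Basler camera found on the bus/network
    camera = pylon.InstantCamera(pylon.TlFactory.GetInstance().CreateFirstDevice())
    camera.Open()
    camera.StartGrabbing(pylon.GrabStrategy_OneByOne)
    while camera.IsGrabbing():
        result = camera.RetrieveResult(5000, pylon.TimeoutHandling_ThrowException)
        if result.GrabSucceeded():
            frame = result.Array  # numpy array, ready to be fed to the model
        result.Release()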

    Choosing lenses was complicated by not knowing the required focal length and viewing angle in advance; I had to take a risk, armed with a lens calculator and a pinch of luck.
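
    The lens-calculator math boils down to the thin-lens approximation. The numbers below are purely illustrative assumptions, not the actual rig parameters:

    # f = sensor_width * working_distance / field_of_view (thin-lens estimate)
    def focal_length_mm(sensor_width_mm, distance_mm, fov_mm):
        return sensor_width_mm * distance_mm / fov_mm

    # e.g. an ~11 mm wide sensor, 700 mm from the bogie, 500 mm field of view:
    print(focal_length_mm(11.3, 700, 500))  # ~15.8 -> take a 16 mm lens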

    Backlight: the required LED pulse duration, the LED type, and the lens parameters were established experimentally. I tried three LED lens variants, with beam angles of 30°, 45°, and 60°, and in the end chose frosted 45° lenses.
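
    The pulse duration matters because of motion blur. A back-of-envelope check, my own arithmetic based on the 30 km/h and 1 mm figures quoted above:

    # motion blur on the object = speed * exposure time
    speed_m_s = 30 / 3.6                 # 30 km/h ~= 8.33 m/s
    max_blur_m = 0.001                   # 1 mm target accuracy
    max_pulse_s = max_blur_m / speed_m_s
    print(max_pulse_s * 1e6)             # ~120 us: the flash must be at most this long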





    Assembling and checking the control signal for the camera's flash backlight



    For the server hardware I took an Intel Core i7-7740X (Kaby Lake), 46 GB of RAM, a 1 TB SSD, and three GTX 1080 Ti cards: enough to run predictions for two three-section locomotives in no more than 2 minutes.

    The jury-rigged cooling of the GPU "sandwich" knocks about 10 degrees off the temperature.



    Data collection

    Creating the dataset is a story of its own; it is not something you can delegate to anyone, so I was dispatched to a distant, little-known town in the depths of our vast homeland. There I photographed about 400 pads on my phone (!!!). Looking ahead, I will say that the valiant depot employees, apparently frightened by the "auditor from Moscow", had replaced all the pads on their locomotives with brand-new ones and covered them with a fresh coat of paint; it was both funny and scary to look at. I had been bracing for the worst, but I still had about 400 photos of completely different pads that I had taken at the Moscow depot.

    All that remained was to believe in a miracle, pile on augmentations, and come up with heuristics for removing spurious segments, of which there were many, since I had not thought about negative examples.

    Expectation:



    Reality:





    It must be said here that there was not a single example of heavily worn pads.

    Model training

    The model with an se_resnext50 encoder and a decoder with an scSE block from this repository performed best of all, but scSE (a PyTorch implementation) had to be removed to speed up prediction, because inference had to finish within a minute. Training used the PyTorch 1.0.1 framework with a large number of augmentations from albumentations, plus a self-written horizontal flip augmentation that also changes the class on reflection.

    from albumentations import (
        Compose, OneOf, CLAHE, IAASharpen, IAAEmboss, RandomBrightnessContrast,
        HueSaturationValue, RGBShift, JpegCompression, RandomGamma, GaussNoise,
        Blur, MotionBlur, MedianBlur, ShiftScaleRotate, Normalize)

    def train_transform(p=1):
        return Compose([
            # color/texture distortions: one of these with probability 0.3
            OneOf([
                CLAHE(clip_limit=2),
                IAASharpen(),
                IAAEmboss(),
                RandomBrightnessContrast(brightness_limit=0.8, contrast_limit=0.8),
                HueSaturationValue(hue_shift_limit=50, sat_shift_limit=50, val_shift_limit=50),
                RGBShift(r_shift_limit=50, g_shift_limit=50, b_shift_limit=50),
                JpegCompression(quality_lower=30),
                RandomGamma(),
                GaussNoise()
            ], p=0.3),
            # blur, imitating motion and focus issues
            OneOf([
                Blur(),
                MotionBlur(),
                MedianBlur(),
            ], p=0.3),
            # geometric jitter
            ShiftScaleRotate(shift_limit=0.2, scale_limit=0.4, rotate_limit=5, p=0.5),
            Normalize(p=1)
        ], p=p)
    
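    The self-written flip mentioned above has to do more than mirror the image: since classes 1 and 2 encode left/right pad orientation (see the next section), reflection must also swap them. A minimal sketch of such an augmentation, my reconstruction rather than the author's code:

    import random
    import numpy as np

    class OrientationAwareHFlip:
        """Horizontal flip that also swaps orientation classes 1 and 2 in the mask."""
        def __init__(self, p=0.5):
            self.p = p

        def __call__(self, image, mask):
            if random.random() < self.p:
                image = np.fliplr(image).copy()
                mask = np.fliplr(mask).copy()
                right, left = mask == 1, mask == 2   # select before rewriting
                mask[right], mask[left] = 2, 1       # mirrored pads change orientation
            return image, mask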

    As the loss function I chose the Lovász-Softmax loss; it behaved almost the same as BCE + Jaccard, but better than plain BCE, which overfits the annotations. Another challenge was choosing an algorithm for determining the serial number of each wheel-and-pad pair. There were options involving metric learning, but I needed to show results quickly, so the idea was to label pads with classes 1 and 2, where 1 means oriented to the right and 2 to the left. The network thus began to predict not only the mask but also the orientation. With simple heuristics it became possible to reliably determine the serial numbers of pads and wheelsets, and then average the predictions: in effect, TTA with a slight shift of the object as it moves and with different lighting angles, which gives good mask accuracy even at a resolution of 320x320.
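    A rough sketch of this "free" TTA, assuming model is the segmentation network and crops are 320x320 tensors of the same pad cut from consecutive frames (both names are mine, for illustration):

    import torch

    def predict_pad_mask(model, crops):
        model.eval()
        with torch.no_grad():
            # per-frame class probabilities for the same, slightly shifted pad
            probs = torch.stack([torch.softmax(model(c.unsqueeze(0))[0], dim=0)
                                 for c in crops])
        return probs.mean(0).argmax(0).cpu().numpy()  # averaged class map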

    A separate task was detecting a wedge-shaped defect of the pads. There were many ideas, from the Hough transform to marking the corners/edges of the pad with points/lines of separate classes. In the end, the winning option was the one the workers themselves use: step back 5 cm from the narrow edge and measure the width; if it is within the normal range, the pad passes. A sketch of this measurement follows.
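    The measurement can be done directly on the predicted mask. A sketch under stated assumptions: the pad lies roughly horizontal in the crop, and mm_per_px comes from the rig calibration:

    import numpy as np

    def pad_width_mm(mask, mm_per_px, offset_mm=50):
        cols = np.where(mask.any(axis=0))[0]   # columns occupied by the pad
        widths = mask.sum(axis=0)              # pad thickness per column, in pixels
        left, right = cols[0], cols[-1]
        narrow = left if widths[left] < widths[right] else right
        step = int(round(offset_mm / mm_per_px))
        col = narrow + step if narrow == left else narrow - step
        return widths[col] * mm_per_px         # width 5 cm from the narrow edge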

    The training pipeline was taken from the MICCAI 2017 Robotic Instrument Segmentation solution. The training process consists of three stages: training with a frozen encoder, training the entire network, and training with CosineAnnealingLR. The first two stages use ReduceLROnPlateau.
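    In PyTorch terms, the three stages could look roughly like this (model.encoder, train_epoch, and validate are assumed helpers, and the epoch counts are illustrative, not the article's values):

    import torch
    from torch.optim.lr_scheduler import ReduceLROnPlateau, CosineAnnealingLR

    # Stage 1: frozen encoder, LR dropped on a validation plateau
    for param in model.encoder.parameters():
        param.requires_grad = False
    optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
    scheduler = ReduceLROnPlateau(optimizer, factor=0.5, patience=3)
    for epoch in range(10):
        train_epoch(model, optimizer)
        scheduler.step(validate(model))

    # Stage 2: the whole network, same plateau-based schedule
    for param in model.parameters():
        param.requires_grad = True
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = ReduceLROnPlateau(optimizer, factor=0.5, patience=3)
    for epoch in range(30):
        train_epoch(model, optimizer)
        scheduler.step(validate(model))

    # Stage 3: finish with cosine annealing
    scheduler = CosineAnnealingLR(optimizer, T_max=10)
    for epoch in range(10):
        train_epoch(model, optimizer)
        scheduler.step()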

    Programming the REST server and the Android client

    For the REST server I chose Flask: you cannot get any simpler, and it is up and running in two minutes. Instead of a database I hand-rolled storage as a simple folder structure plus a current-state file. The tablet application was built in Android Studio; fortunately, its latest versions are simply a developer's paradise.
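    A minimal sketch of what such a server could look like; the route name and the state-file layout are my assumptions, not the actual Aurorai API:

    import json
    from flask import Flask, jsonify

    app = Flask(__name__)
    STATE_FILE = 'state/current.json'   # hypothetical "current state" file

    @app.route('/api/state')
    def current_state():
        # the tablet client polls this endpoint for the latest predictions
        with open(STATE_FILE) as f:
            return jsonify(json.load(f))

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)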

    Designing and assembling a rack for cameras and lighting

    I remembered the old days when I built charging stations for electric cars, and that experience proved very useful: we decided to assemble the rack from structural aluminum profile and 3D-printed parts.





    On to the tests!


    The result exceeded all expectations. To computer vision specialists the task may seem fairly straightforward and simple, but I had my doubts for two reasons: first, the training set was small and contained no boundary cases such as very thin pads; second, the tests took place under very different shooting and lighting conditions.





    The Jaccard index on validation reaches 0.96, and visually the pads are segmented very cleanly; add averaging over several photos and you get very good accuracy in estimating pad width. During the tests it turned out that the system can also handle the bogies of other locomotives, you just need faster cameras:





    In conclusion, I want to say that the technology has shown itself very well and, in my opinion, has great potential for eliminating the human factor, reducing locomotive downtime, and making forecasts.

    Acknowledgments

    Thanks to the ods.ai community; without your help I would not have been able to do all this in such a short time! Many thanks to n01z3, who got me into DL, for his invaluable advice and extraordinary professionalism! Many thanks to the ideological inspirer Vasily Manko (CEO of Aurorai) and to the best designer, Tatyana Brusova.

    See you in the next episode of the story!

    Aurorai, llc