Simplicity and complexity of primitives, or how to determine unnecessary preprocessing for a neural network

This is the third article on the analysis and study of ellipses, triangles and other geometric shapes.
The previous articles raised some very interesting questions from readers, in particular about the complexity or simplicity of various training sequences. These questions are genuinely interesting, for example: how much harder is a triangle to learn than a quadrilateral or another polygon?



Let us try to compare. For comparison we have an excellent idea, proven by generations of students: the shorter the cheat sheet, the easier the exam.

This article is simply the result of curiosity and idle interest; none of it is found in practice, and while there are a couple of ideas useful for practical tasks, there is almost nothing here for copy-pasting. It is a small study of the complexity of training sequences: the author's reasoning and the code are laid out, and you can check, extend, or change everything yourself.

So let's try to find out which geometric figure is harder or easier to segment, i.e. which course of lectures is more comprehensible and better absorbed by the AI.

There are many different geometric figures, but we will compare only triangles, quadrangles and five-pointed stars. We will use a simple method for constructing a training sequence: divide a 128x128 single-channel image into four quarters and randomly place an ellipse in one quarter and, for example, a triangle in another. We will detect the triangle, which has the same coloring as the ellipse, i.e. the task is to train the network to distinguish, say, a quadrangle from an ellipse painted the same way. Here are examples of the pictures we will study.







We will not detect a triangle and a quadrangle in the same picture; we will detect them separately, in different training sequences, each against the background of an ellipse that acts as noise.

For the study we take the classic U-net and three types of training sequences: with triangles, with quadrangles, and with stars.

So, given:

  • three training sequences of picture/mask pairs;
  • a network: the common U-net, widely used for segmentation.

Idea for verification:

  • determine which of the training sequences is “harder” to learn;
  • determine how some preprocessing techniques influence learning.

Let's start: we pick 10,000 picture/mask pairs of quadrangles with ellipses and examine them carefully. We are interested in how short the cheat sheet turns out to be and what its length depends on.

Load the libraries and define the size of the image array
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import math
from tqdm import tqdm
from skimage.draw import ellipse, polygon
from keras import Model
from keras.optimizers import Adam
from keras.layers import Input,Conv2D,Conv2DTranspose,MaxPooling2D,concatenate
from keras.layers import BatchNormalization,Activation,Add,Dropout
from keras.losses import binary_crossentropy
from keras import backend as K
import tensorflow as tf
import keras as keras
w_size = 128
train_num = 10000
radius_min = 10
radius_max = 20


Define the loss and accuracy functions
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred = K.cast(y_pred, 'float32')
    y_pred_f = K.cast(K.greater(K.flatten(y_pred), 0.5), 'float32')
    intersection = y_true_f * y_pred_f
    score = 2. * K.sum(intersection) / (K.sum(y_true_f) + K.sum(y_pred_f))
    return score

def dice_loss(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = y_true_f * y_pred_f
    score = (2. * K.sum(intersection) + smooth) / (K.sum(y_true_f) +
                 K.sum(y_pred_f) + smooth)
    return 1. - score

def bce_dice_loss(y_true, y_pred):
    return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)

def get_iou_vector(A, B):
    # Numpy version
    batch_size = A.shape[0]
    metric = 0.0
    for batch in range(batch_size):
        t, p = A[batch], B[batch]
        true = np.sum(t)
        pred = np.sum(p)
        # deal with the empty mask first
        if true == 0:
            metric += (pred == 0)
            continue
        # non-empty mask case: the union is never empty,
        # hence it is safe to divide by its number of pixels
        intersection = np.sum(t * p)
        union = true + pred - intersection
        iou = intersection / union
        # the iou metric is a stepwise approximation of the real iou over 0.5
        iou = np.floor(max(0, (iou - 0.45)*20)) / 10
        metric += iou
    # take the average over all images in the batch
    metric /= batch_size
    return metric

def my_iou_metric(label, pred):
    # Tensorflow version
    return tf.py_func(get_iou_vector, [label, pred > 0.5], tf.float64)

from keras.utils.generic_utils import get_custom_objects
get_custom_objects().update({'bce_dice_loss': bce_dice_loss })
get_custom_objects().update({'dice_loss': dice_loss })
get_custom_objects().update({'dice_coef': dice_coef })
get_custom_objects().update({'my_iou_metric': my_iou_metric })


We will use the metric from the first article. Let me remind readers that we predict a per-pixel mask, i.e. whether each pixel is “background” or “quadrilateral”, and evaluate whether each prediction is true or false. So four outcomes are possible: we correctly predict that a pixel is background, we correctly predict that it belongs to the quadrilateral, or we are wrong in one of the two directions. Over all pictures and all pixels we count these four outcomes and compute the result; this is the score of the network. The fewer wrong predictions and the more correct ones, the higher the score and the better the network works.
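As a tiny illustration of how the stepwise metric behaves, here is a hand-made example; the 4x4 masks below are purely hypothetical, and get_iou_vector is the function defined above:

# a hypothetical pair of 4x4 masks, just to see the stepwise rounding
y_true = np.zeros((1, 4, 4, 1)); y_true[0, :2, :2, 0] = 1   # 4 true mask pixels
y_pred = np.zeros((1, 4, 4, 1)); y_pred[0, :2, :3, 0] = 1   # 6 predicted pixels
# intersection 4, union 6, raw IoU ~0.67; the stepwise formula floors it down to 0.4
print(get_iou_vector(y_true, y_pred))   # 0.4

Note that for a perfectly predicted mask the formula gives floor((1.0 - 0.45)*20)/10 = 1.1 rather than 1.0, which is apparently why the logged “Accuracy” can exceed 1.0 later in the article.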

We treat the network as a “black box”: we will not look at what happens inside it, how the weights change or how the gradients behave. We will look into the depths of the network later, when we compare networks.

A simple U-net
def build_model(input_layer, start_neurons):
    # 128 -> 64
    conv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(input_layer)
    conv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(conv1)
    pool1 = MaxPooling2D((2, 2))(conv1)
    pool1 = Dropout(0.25)(pool1)
    # 64 -> 32
    conv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(pool1)
    conv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(conv2)
    pool2 = MaxPooling2D((2, 2))(conv2)
    pool2 = Dropout(0.5)(pool2)
    # 32 -> 16
    conv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(pool2)
    conv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(conv3)
    pool3 = MaxPooling2D((2, 2))(conv3)
    pool3 = Dropout(0.5)(pool3)
    # 16 -> 8
    conv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(pool3)
    conv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(conv4)
    pool4 = MaxPooling2D((2, 2))(conv4)
    pool4 = Dropout(0.5)(pool4)
    # Middle
    convm = Conv2D(start_neurons * 16, (3, 3), activation="relu", padding="same")(pool4)
    convm = Conv2D(start_neurons * 16, (3, 3), activation="relu", padding="same")(convm)
    # 8 -> 16
    deconv4 = Conv2DTranspose(start_neurons * 8, (3, 3), strides=(2, 2), padding="same")(convm)
    uconv4 = concatenate([deconv4, conv4])
    uconv4 = Dropout(0.5)(uconv4)
    uconv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(uconv4)
    uconv4 = Conv2D(start_neurons * 8, (3, 3), activation="relu", padding="same")(uconv4)
    # 16 -> 32
    deconv3 = Conv2DTranspose(start_neurons * 4, (3, 3), strides=(2, 2), padding="same")(uconv4)
    uconv3 = concatenate([deconv3, conv3])
    uconv3 = Dropout(0.5)(uconv3)
    uconv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(uconv3)
    uconv3 = Conv2D(start_neurons * 4, (3, 3), activation="relu", padding="same")(uconv3)
    # 32 -> 64
    deconv2 = Conv2DTranspose(start_neurons * 2, (3, 3), strides=(2, 2), padding="same")(uconv3)
    uconv2 = concatenate([deconv2, conv2])
    uconv2 = Dropout(0.5)(uconv2)
    uconv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(uconv2)
    uconv2 = Conv2D(start_neurons * 2, (3, 3), activation="relu", padding="same")(uconv2)
    # 64 -> 128
    deconv1 = Conv2DTranspose(start_neurons * 1, (3, 3), strides=(2, 2), padding="same")(uconv2)
    uconv1 = concatenate([deconv1, conv1])
    uconv1 = Dropout(0.5)(uconv1)
    uconv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(uconv1)
    uconv1 = Conv2D(start_neurons * 1, (3, 3), activation="relu", padding="same")(uconv1)
    uconv1 = Dropout(0.5)(uconv1)
    output_layer = Conv2D(1, (1,1), padding="same", activation="sigmoid")(uconv1)
    return output_layer
# model
input_layer = Input((w_size, w_size, 1))
output_layer = build_model(input_layer, 26)
model = Model(input_layer, output_layer)
model.compile(loss=bce_dice_loss, optimizer=Adam(lr=1e-4), metrics=[my_iou_metric])
model.summary()
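A quick sanity check of the input/output geometry never hurts; a minimal sketch with a dummy all-zero batch (the zero image here is purely illustrative, not part of the data):

# the network should map a 128x128x1 image to a 128x128x1 mask of probabilities
dummy = np.zeros((1, w_size, w_size, 1), dtype='float32')
print(model.predict(dummy).shape)   # expected: (1, 128, 128, 1)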


The function that generates image/mask pairs. A 128x128 grayscale picture is filled with random noise drawn from one of two ranges, either 0.0…0.75 or 0.25…1.0. A quarter of the picture is chosen at random and a randomly oriented ellipse is placed there, and a quadrangle is placed in another quarter; both figures are colored with random noise from the other range.

def next_pair():
    img_l = (np.random.sample((w_size, w_size, 1))*
             0.75).astype('float32')
    img_h = (np.random.sample((w_size, w_size, 1))*
             0.75 + 0.25).astype('float32')
    img = np.zeros((w_size, w_size, 2), dtype='float')
    i0_qua = math.trunc(np.random.sample()*4.)
    i1_qua = math.trunc(np.random.sample()*4.)
    while i0_qua == i1_qua:
        i1_qua = math.trunc(np.random.sample()*4.)
    _qua = w_size // 4
    qua = np.array([[_qua,_qua],[_qua,_qua*3],[_qua*3,_qua*3],[_qua*3,_qua]])
    p = np.random.sample() - 0.5
    r = qua[i0_qua,0]
    c = qua[i0_qua,1]
    r_radius = np.random.sample()*(radius_max-radius_min) + radius_min
    c_radius = np.random.sample()*(radius_max-radius_min) + radius_min
    rot = np.random.sample()*360
    rr, cc = ellipse(
        r, c, 
        r_radius, c_radius, 
        rotation=np.deg2rad(rot), 
        shape=img_l.shape
    )
    p0 = np.rint(np.random.sample()*(radius_max-radius_min) + radius_min)
    p1 = qua[i1_qua,0] - (radius_max-radius_min)
    p2 = qua[i1_qua,1] - (radius_max-radius_min)
    p3 = np.rint(np.random.sample()*radius_min)
    p4 = np.rint(np.random.sample()*radius_min)
    p5 = np.rint(np.random.sample()*radius_min)
    p6 = np.rint(np.random.sample()*radius_min)
    p7 = np.rint(np.random.sample()*radius_min)
    p8 = np.rint(np.random.sample()*radius_min)
    poly = np.array((
        (p1, p2),
        (p1+p3, p2+p4+p0),
        (p1+p5+p0, p2+p6+p0),
        (p1+p7+p0, p2+p8),
        (p1, p2),
    ))
    rr_p, cc_p = polygon(poly[:, 0], poly[:, 1], img_l.shape)
    if p > 0:
        img[:,:,:1] = img_l.copy()
        img[rr, cc,:1] = img_h[rr, cc]
        img[rr_p, cc_p,:1] = img_h[rr_p, cc_p]
    else:
        img[:,:,:1] = img_h.copy()
        img[rr, cc,:1] = img_l[rr, cc]
        img[rr_p, cc_p,:1] = img_l[rr_p, cc_p]
    img[:,:,1] = 0.
    img[rr_p, cc_p,1] = 1.
    return img

Let's create a training sequence of pairs and look at 10 random ones. Let me remind you that the pictures are monochrome, grayscale.

_txy = [next_pair() for idx in range(train_num)]
f_imgs = np.array(_txy)[:,:,:,:1].reshape(-1,w_size ,w_size ,1)
f_msks = np.array(_txy)[:,:,:,1:].reshape(-1,w_size ,w_size ,1)
del(_txy)
# look at 10 random images with their masks
fig, axes = plt.subplots(2, 10, figsize=(20, 5))
for k in range(10):
    kk = np.random.randint(train_num)
    axes[0,k].set_axis_off()
    axes[0,k].imshow(f_imgs[kk].squeeze())
    axes[1,k].set_axis_off()
    axes[1,k].imshow(f_msks[kk].squeeze())



First step. We train on the minimum starting set


The first step of our experiment is simple: we try to train the network to predict only the first 11 pictures.

batch_size = 10
val_len = 11
precision = 0.85
m0_select = np.zeros((f_imgs.shape[0]), dtype='int')
for k in range(val_len):
    m0_select[k] = 1
t = tqdm()
while True:
    fit = model.fit(f_imgs[m0_select>0], f_msks[m0_select>0],
                    batch_size=batch_size, 
                    epochs=1, 
                    verbose=0
                   )
    current_accu = fit.history['my_iou_metric'][0]
    current_loss = fit.history['loss'][0]
    t.set_description("accuracy {0:6.4f} loss {1:6.4f} ".\
                      format(current_accu, current_loss))
    t.update(1)
    if current_accu > precision:
        break
t.close()

accuracy 0.8545 loss 0.0674 lenght 11 : : 793it [00:58, 14.79it/s]

We selected the first 11 pairs from the initial sequence and trained the network on them. It does not matter now whether the network memorizes these particular pictures or generalizes; the main thing is that it recognizes these 11 pictures as we need. Depending on the chosen dataset and target accuracy, training can take a very long time, but in our case only a few iterations are needed. I repeat: it does not matter to us now how or what the network has learned, only that it has reached the required prediction accuracy.

Now let's start the main experiment.


We will build a cheat sheet: we build such cheat sheets separately for all three training sequences and compare their lengths. We take new picture/mask pairs from the constructed sequence and try to predict their masks with the network trained on the already selected pairs. At the beginning these are only 11 picture/mask pairs and the network is trained on them, perhaps not very well. If the mask of a new picture is predicted with acceptable accuracy, we throw this pair out; it contains no new information for the network, which already knows how to compute a mask from this picture. If the prediction accuracy is not sufficient, we add this picture with its mask to our sequence and train the network until acceptable accuracy is again reached on the selected set. In other words, the final length of the selected set, the cheat sheet, is our measure of how much genuinely new information the sequence contains.

batch_size = 50
t_batch_size = 1024
raw_len = val_len
t = tqdm(-1)
id_train = 0
#id_select = 1
while True:
    t.set_description("Accuracy {0:6.4f} loss {1:6.4f}\
     selected img {2:5d} tested img {3:5d} ".
                      format(current_accu, current_loss, val_len, raw_len))
    t.update(1)
    if id_train == 1:
        fit = model.fit(f_imgs[m0_select>0], f_msks[m0_select>0],
                        batch_size=batch_size,
                        epochs=1,
                        verbose=0
                       )
        current_accu = fit.history['my_iou_metric'][0]
        current_loss = fit.history['loss'][0]
        if current_accu > precision:
            id_train = 0
    else:
        t_pred = model.predict(
            f_imgs[raw_len: min(raw_len+t_batch_size,f_imgs.shape[0])],
            batch_size=batch_size
                              )
        for kk in range(t_pred.shape[0]):
            val_iou = get_iou_vector(
                f_msks[raw_len+kk].reshape(1,w_size,w_size,1),
                t_pred[kk].reshape(1,w_size,w_size,1) > 0.5)
            if val_iou < precision*0.95:
                new_img_test = 1
                m0_select[raw_len+kk] = 1
                val_len += 1
                break
        raw_len += (kk+1)
        id_train = 1
    if raw_len >= train_num:
        break
t.close()

Accuracy 0.9338 loss 0.0266 selected img  1007 tested img  9985 : : 4291it [49:52,  1.73s/it]

Here “accuracy” is used in its ordinary sense, not as the standard Keras metric, and the accuracy itself is computed by the “my_iou_metric” function.

Now let's compare the behaviour of the same network with the same parameters on a different sequence, one with triangles.
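The triangle sequence is built with the same next_pair() scheme, only with a three-vertex polygon; the exact generator is not shown here, but a minimal sketch of one possible variant (an assumption, not the original code) could look like this:

import numpy as np
from skimage.draw import polygon

# a sketch (assumption): a random triangle anchored near a quarter centre (r0, c0),
# sized roughly like the quadrangle in next_pair() above
def random_triangle(r0, c0, radius_min=10, radius_max=20, shape=(128, 128)):
    p0 = np.random.sample()*(radius_max - radius_min) + radius_min   # overall size
    r = r0 - (radius_max - radius_min)
    c = c0 - (radius_max - radius_min)
    j1, j2 = np.random.sample(2)*radius_min                          # jitter two vertices
    tri = np.array(((r, c), (r + j1, c + p0), (r + p0, c + j2)))
    return polygon(tri[:, 0], tri[:, 1], shape)                      # rr, cc of the filled triangle

rr_p, cc_p = random_triangle(32, 96)   # e.g. the quarter centred at (32, 96) of a 128x128 image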



And we will get a completely different result.

Accuracy 0.9823 loss 0.0108 selected img  1913 tested img  9995 : : 6343it [2:11:36,  3.03s/it]

The network selected 1913 pictures containing “new” information, i.e. the information density of the triangle pictures is roughly half that of the quadrangle pictures!

Let's check the same with the stars and run the network on the third sequence
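Again, the star generator itself is not listed in the article; as an assumption, a five-pointed star can be drawn as a ten-vertex polygon with alternating outer and inner radii, for example:

import numpy as np
from skimage.draw import polygon

# a sketch (assumption): ten vertices alternating between an outer and an inner radius
def random_star(r0, c0, radius_min=10, radius_max=20, shape=(128, 128)):
    r_out = np.random.sample()*(radius_max - radius_min) + radius_min
    r_in = 0.4*r_out                                   # inner radius of the star
    rot = np.random.sample()*2*np.pi                   # random orientation
    angles = rot + np.arange(10)*np.pi/5
    radii = np.where(np.arange(10) % 2 == 0, r_out, r_in)
    return polygon(r0 + radii*np.cos(angles), c0 + radii*np.sin(angles), shape)

rr_p, cc_p = random_star(96, 32)   # e.g. the quarter centred at (96, 32)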



we get

Accuracy 0.8985 loss 0.0478 selected img   476 tested img  9985 : : 2188it [16:13,  1.16it/s]

As you can see, the star pictures turned out to be the most informative: only 476 pictures ended up in the cheat sheet.

We now have grounds to judge how complex these geometric shapes are for the neural network to perceive. The simplest is the star, with only 476 pictures in the cheat sheet, then the quadrangle with its 1007, and the most difficult turned out to be the triangle: it takes 1913 pictures to learn.

Keep in mind: for us, for people, these are just pictures, but for the network this is a course of lectures on recognition, and the course about triangles turned out to be the hardest.

Now for something serious


At first glance, all these ellipses and triangles look like child's play, sand cakes and Lego. But here is a specific and serious question: if we apply some preprocessing, some filter, to the initial sequence, how will the complexity of the sequence change? For example, take the same ellipses and quadrangles and apply the following preprocessing to them

from scipy.ndimage import gaussian_filter
_tmp = [gaussian_filter(idx, sigma = 1) for idx in f_imgs]
f1_imgs = np.array(_tmp)[:,:,:,:1].reshape(-1,w_size ,w_size ,1)
del(_tmp)
fig, axes = plt.subplots(2, 5, figsize=(20, 7))
for k in range(5):
    kk = np.random.randint(train_num)
    axes[0,k].set_axis_off()
    axes[0,k].imshow(f1_imgs[kk].squeeze(), cmap="gray")
    axes[1,k].set_axis_off()
    axes[1,k].imshow(f_msks[kk].squeeze(), cmap="gray")



At first glance everything is the same, the same ellipses, the same polygons, but the network began to behave quite differently:

Accuracy 1.0575 loss 0.0011    selected img  7963 tested img  9999 : : 17765it [29:02:00, 12.40s/it]

A small explanation is needed: we do not use augmentation, because the shapes of the polygon and of the ellipse are random to begin with. Augmentation would therefore add no new information and makes no sense in this case.

But, as can be seen from the result, a simple gaussian_filter created a lot of problems for the network, generating a lot of new and probably unnecessary information.
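One simple way to inspect what the blur actually did to the data is to compare the pixel-value histograms before and after filtering; a small diagnostic sketch using the f_imgs and f1_imgs arrays already built above:

# compare pixel-value distributions before and after the Gaussian blur
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
ax[0].hist(f_imgs[:100].ravel(), bins=100)
ax[0].set_title('original images')
ax[1].hist(f1_imgs[:100].ravel(), bins=100)
ax[1].set_title('after gaussian_filter')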

Well, for lovers of simplicity in its pure form, let's take the same ellipses and polygons, but without any randomness in the coloring.
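The code for this variant is not shown in the article; a minimal sketch of the idea (an assumption) is to replace the two noise arrays at the top of next_pair() with constant grey levels and leave everything else unchanged:

# a sketch (assumption): constant fills instead of the two noise ranges in next_pair()
img_l = np.full((w_size, w_size, 1), 0.33, dtype='float32')   # instead of noise in 0.0..0.75
img_h = np.full((w_size, w_size, 1), 0.66, dtype='float32')   # instead of noise in 0.25..1.0
# the rest of next_pair() stays the same: the ellipse and the polygon are cut from one
# array and pasted over the other, so the figures differ from the background only in level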



The result suggests that random coloring is by no means a trivial addition.

Accuracy 0.9004 loss 0.0315 selected img   251 tested img  9832 : : 1000it [06:46,  1.33it/s]

The network coped perfectly with the information extracted from just 251 pictures, almost four times fewer than in the case of the noise-painted pictures.

The purpose of the article is to show a tool and examples of how it works on toy problems, Lego in the sandbox. We have obtained a tool for comparing training sequences: we can estimate how much our preprocessing complicates or simplifies a training sequence, and how easy or hard a given primitive in the sequence is to detect.

The possibility of applying this Lego example to real cases is obvious, but the readers' real training sets and networks are the readers' own business.
