Making a neural network: how not to break your brain

    Hi, Habr!

    In this short note I will talk about two pitfalls that are easy to run into and easy to trip over.

    We will build a trivial neural network in Keras and use it to predict the arithmetic mean of two numbers.

    It would seem, what could be simpler? And indeed, there is nothing complicated here, but there are nuances.

    If the topic interests you, welcome under the cut: there are no long, boring descriptions here, just short code and comments on it.

    The solution looks something like this:

    import numpy as np
    from keras.layers import Input, Dense, Lambda
    from keras.models import Model
    import keras.backend as K
    # data generator
    def train_iterator(batch_size=64):
        x = np.zeros((batch_size, 2))
        while True:
            for i in range(batch_size):
                x[i][0] = np.random.randint(0, 100)
                x[i][1] = np.random.randint(0, 100)
            x_mean = (x[::,0] + x[::,1]) / 2
            x_mean_ex = np.expand_dims(x_mean, -1)
            yield [x], [x_mean_ex]
    # model
    def create_model():
        x = Input(name = 'x', shape=(2,))
        x_mean = Dense(1)(x)
        model = Model(inputs=x, outputs=x_mean)
        return model
    # create and train
    model = create_model()
    model.compile(loss=['mse'], optimizer = 'rmsprop')
    model.fit_generator(train_iterator(), steps_per_epoch = 1000, epochs = 100, verbose = 1)
    # predict
    x, x_mean = next(train_iterator(1))
    print(x, x_mean, model.predict(x))
    

    We try to train it ... but nothing comes of it. And this is the point where you can start dancing with a tambourine and lose a lot of time.

    Epoch 1/100
    1000/1000 [==============================] - 2s 2ms/step - loss: 1044.0806
    Epoch 2/100
    1000/1000 [==============================] - 2s 2ms/step - loss: 713.5198
    Epoch 3/100
    1000/1000 [==============================] - 3s 3ms/step - loss: 708.1110
    ...
    Epoch 98/100
    1000/1000 [==============================] - 2s 2ms/step - loss: 415.0479
    Epoch 99/100
    1000/1000 [==============================] - 2s 2ms/step - loss: 416.6932
    Epoch 100/100
    1000/1000 [==============================] - 2s 2ms/step - loss: 417.2400
    [array([[73., 57.]])] [array([[65.]])] [[49.650894]]
    

    Predicted 49, which is far from 65.

    But if we alter the generator a little, everything starts working right away.

    def train_iterator_1(batch_size=64):
        x = np.zeros((batch_size, 2))
        x_mean = np.zeros((batch_size,))
        while True:
            for i in range(batch_size):
                x[i][0] = np.random.randint(0, 100)
                x[i][1] = np.random.randint(0, 100)
            x_mean[::] = (x[::,0] + x[::,1]) / 2
            x_mean_ex = np.expand_dims(x_mean, -1)
            yield [x], [x_mean_ex]
    

    And it is clear that the network converges literally by the third epoch.

    Epoch 1/5
    1000/1000 [==============================] - 2s 2ms/step - loss: 648.9184
    Epoch 2/5
    1000/1000 [==============================] - 2s 2ms/step - loss: 0.0177
    Epoch 3/5
    1000/1000 [==============================] - 2s 2ms/step - loss: 0.0030
    

    The main difference is that in the first case the x_mean array is created anew in memory on every iteration, while in the second it is allocated once, when the generator is created, and after that it is only reused. The sketch below shows why this matters as soon as batches get queued up.
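
    A minimal plain-Python sketch of that effect. This is not Keras internals; it only assumes that fit_generator prefetches several batches into a queue before the model actually consumes them:

    # the "queue" is just a list; we only assume that several batches are
    # pulled from the generator before they are used
    it = train_iterator(batch_size=1)
    prefetched = [next(it) for _ in range(3)]   # "prefetch" three batches
    for xs, ys in prefetched:
        # xs[0] is the same reused array in every entry, so all three show the
        # values of the last generated batch, while each ys[0] was computed earlier
        print(xs[0], ys[0])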

    Let's dig a bit deeper and check whether everything in such a generator is actually correct. It turns out, not quite.
    The following example shows that something is wrong.
    def train_iterator(batch_size=1):
        x = np.zeros((batch_size, 2))
        while True:
            for i in range(batch_size):
                x[i][0] = np.random.randint(0, 100)
                x[i][1] = np.random.randint(0, 100)
            x_mean = (x[::,0] + x[::,1]) / 2
            yield x, x_mean
    it = train_iterator()
    print(next(it), next(it))
    

    (array([[44., 2.]]), array([10.])) (array([[44., 2.]]), array([23.]))

    The mean returned by the first call of the iterator does not match the numbers it was supposedly computed from. In fact, the mean was computed correctly, but because the array is passed by reference, the second call of the iterator overwrote its values, and print() showed what was in the array at that moment rather than what we expected.
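
    A quick way to convince yourself of this is an identity check (a small sketch using the generator above): both calls hand back the very same numpy object.

    it = train_iterator()
    x_first, _ = next(it)
    x_second, _ = next(it)
    print(x_first is x_second)  # True: one buffer, shared by reference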

    There are two ways to fix this. Both have a cost, but both are correct.
    1. Move the creation of the variable x inside the while loop, so that a new array is created on every yield.
    def train_iterator_1(batch_size=1):
        while True:
            x = np.zeros((batch_size, 2))
            for i in range(batch_size):
                x[i][0] = np.random.randint(0, 100)
                x[i][1] = np.random.randint(0, 100)
            x_mean = (x[::,0] + x[::,1]) / 2
            yield x, x_mean
    it_1 = train_iterator_1()
    print(next(it_1), next(it_1))
    

    (array([[82., 4.]]), array([43.])) (array([[77., 34.]]), array([55.5]))


    2. Return a copy of the array.
    def train_iterator_2(batch_size=1):
        x = np.zeros((batch_size, 2))
        while True:
            for i in range(batch_size):
                x[i][0] = np.random.randint(0, 100)
                x[i][1] = np.random.randint(0, 100)
            x_mean = (x[::,0] + x[::,1]) / 2
            yield np.copy(x), x_mean
    it_2 = train_iterator_2()
    print(next(it_2), next(it_2))
    

    (array([[63., 31.]]), array([47.])) (array([[94., 25.]]), array([59.5]))


    Now everything is fine. Let's move on.

    Do we even need expand_dims here? Let's try removing that line; the new code looks like this:

    def train_iterator(batch_size=64):
        while True:
            x = np.zeros((batch_size, 2))
            for i in range(batch_size):
                x[i][0] = np.random.randint(0, 100)
                x[i][1] = np.random.randint(0, 100)
            x_mean = (x[::,0] + x[::,1]) / 2
            yield [x], [x_mean]
    

    Training works just as well, even though the returned data now has a different shape.

    For example, a target used to look like [[49.]] and now it looks like [49.], but inside Keras this apparently gets reduced to the required dimension correctly.
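
    Just to make the shapes concrete, here is a small sketch: without expand_dims the generator yields a (batch,) vector of targets, with it a (batch, 1) column, and Keras apparently matches either one against the (batch, 1) output of Dense(1).

    y = np.zeros((64,))
    print(y.shape)                      # (64,)   - targets without expand_dims
    print(np.expand_dims(y, -1).shape)  # (64, 1) - targets with expand_dims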

    So, we now know what a correct data generator should look like; next, let's play with a Lambda layer and look at how expand_dims behaves there.

    We won't predict anything this time; we'll simply compute the correct value right inside the Lambda layer.

    The code is as follows:

    def calc_mean(x):
        res = (x[::,0] + x[::,1]) / 2
        res = K.expand_dims(res, -1)
        return res
    def create_model():
        x = Input(name = 'x', shape=(2,))
        x_mean = Lambda(lambda x: calc_mean(x), output_shape=(1,))(x)
        model = Model(inputs=x, outputs=x_mean)
        return model
    

    We run it and see that everything is fine:

    Epoch 1/5
    100/100 [==============================] - 0s 3ms/step - loss: 0.0000e+00
    Epoch 2/5
    100/100 [==============================] - 0s 2ms/step - loss: 0.0000e+00
    Epoch 3/5
    100/100 [==============================] - 0s 3ms/step - loss: 0.0000e+00
    

    Now let's change our lambda function a little and remove the expand_dims.

    def calc_mean(x):
        res = (x[::,0] + x[::,1]) / 2
        return res
    

    When the model is compiled there are no errors about dimensions, but the result is different: the loss is computed in some obscure way. So expand_dims does have to be called here; nothing will happen automatically.

    Epoch 1/5
    100/100 [==============================] - 0s 3ms/step - loss: 871.6299
    Epoch 2/5
    100/100 [==============================] - 0s 3ms/step - loss: 830.2568
    Epoch 3/5
    100/100 [==============================] - 0s 2ms/step - loss: 830.8041
    

    And if you look at the result returned by predict(), you can see that the dimension is wrong: the output is [46.], while [[46.]] is expected.
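
    A plausible explanation (an assumption about the loss computation, not something the framework reports): with predictions of shape (batch,) and targets of shape (batch, 1), the subtraction inside MSE broadcasts to a (batch, batch) matrix, so every prediction gets compared with every target. Plain NumPy shows the same broadcasting:

    y_true = np.array([[49.], [60.]])  # target shape (2, 1)
    y_pred = np.array([49., 60.])      # prediction shape (2,)
    print((y_pred - y_true).shape)     # (2, 2): each prediction minus each target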

    Something like that. Thanks to everyone who read this far. And be careful with the details: their effect can be significant.
