Building a neural network: how not to break your brain
Hi, Habr!
In this short note I will describe two pitfalls that are easy to run into and easy to get hurt by.
We will build a trivial neural network in Keras that predicts the arithmetic mean of two numbers.
It would seem nothing could be simpler. And indeed there is nothing complicated here, but there are nuances.
If the topic interests you, welcome under the cut: no long boring descriptions, just short code and comments on it.
The solution looks something like this:
import numpy as np
from keras.layers import Input, Dense, Lambda
from keras.models import Model
import keras.backend as K

# data generator
def train_iterator(batch_size=64):
    x = np.zeros((batch_size, 2))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::, 0] + x[::, 1]) / 2
        x_mean_ex = np.expand_dims(x_mean, -1)
        yield [x], [x_mean_ex]

# model
def create_model():
    x = Input(name='x', shape=(2,))
    x_mean = Dense(1)(x)
    model = Model(inputs=x, outputs=x_mean)
    return model

# create and train
model = create_model()
model.compile(loss=['mse'], optimizer='rmsprop')
model.fit_generator(train_iterator(), steps_per_epoch=1000, epochs=100, verbose=1)

# predict
x, x_mean = next(train_iterator(1))
print(x, x_mean, model.predict(x))
We try to train it... and nothing comes of it. This is exactly the place where you can start dancing with a tambourine and lose a lot of time.
Epoch 1/100
1000/1000 [==============================] - 2s 2ms/step - loss: 1044.0806
Epoch 2/100
1000/1000 [==============================] - 2s 2ms/step - loss: 713.5198
Epoch 3/100
1000/1000 [==============================] - 3s 3ms/step - loss: 708.1110
...
Epoch 98/100
1000/1000 [==============================] - 2s 2ms/step - loss: 415.0479
Epoch 99/100
1000/1000 [==============================] - 2s 2ms/step - loss: 416.6932
Epoch 100/100
1000/1000 [==============================] - 2s 2ms/step - loss: 417.2400
[array([[73., 57.]])] [array([[65.]])] [[49.650894]]
Predicted 49, which is far from 65.
But if we alter the generator a little, everything starts working right away.
def train_iterator_1(batch_size=64):
    x = np.zeros((batch_size, 2))
    x_mean = np.zeros((batch_size,))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean[::] = (x[::, 0] + x[::, 1]) / 2
        x_mean_ex = np.expand_dims(x_mean, -1)
        yield [x], [x_mean_ex]
And you can see that the network converges literally by the third epoch.
Epoch 1/5
1000/1000 [==============================] - 2s 2ms/step - loss: 648.9184
Epoch 2/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0177
Epoch 3/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0030
The main difference is that in the first case the x_mean array is created in memory on every iteration, while in the second it is created once, when the generator is set up, and is then only reused.
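Why does this matter? fit_generator keeps a queue of pre-fetched batches (the max_queue_size argument), so a yielded batch can sit in memory for a while before it is trained on. The snippet below is not from the original code; it only emulates such a prefetch buffer to show what the consumer ends up seeing when the inputs are reused in place but the targets are fresh objects:

# Prefetch a few batches from the first ("broken") generator before using them,
# roughly emulating fit_generator's internal queue.
it = train_iterator()
prefetched = [next(it) for _ in range(3)]

(x0,), (y0,) = prefetched[0]
# x0 has already been overwritten by later iterations, y0 has not,
# so the stored pair is almost certainly inconsistent:
print(np.allclose((x0[::, 0] + x0[::, 1]) / 2, y0[::, 0]))  # -> False

In train_iterator_1 the means are written into a reused buffer as well (np.expand_dims returns a view of it), so queued inputs and targets get overwritten together and stay consistent with each other, which is presumably why training converges.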
Let's look further at whether everything in this generator is actually correct. It turns out that it is not quite.
The following example shows that something is wrong.
def train_iterator(batch_size=1):
    x = np.zeros((batch_size, 2))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::, 0] + x[::, 1]) / 2
        yield x, x_mean
it = train_iterator()
print(next(it), next(it))
(array([[44., 2.]]), array([10.])) (array([[44., 2.]]), array([23.]))
The mean returned by the first iterator call does not match the numbers it was supposedly computed from. In fact the mean was computed correctly, but since the array is passed by reference, the second call to the iterator overwrote its contents, and print() showed what was in the array at that moment, not what we expected.
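A quick way to confirm this by-reference explanation (a small check added for illustration, it is not in the original code) is to compare object identities across two calls:

it = train_iterator()
a = next(it)
b = next(it)
print(a[0] is b[0])   # True: both results hold the very same x array object
print(a[1] is b[1])   # False: x_mean is a freshly created array each time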
There are two ways to fix this. Both cost a little extra, but both are correct.
1. Move the creation of the variable x inside the while loop, so that a new array is created on every yield.
def train_iterator_1(batch_size=1):
    while True:
        x = np.zeros((batch_size, 2))
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::, 0] + x[::, 1]) / 2
        yield x, x_mean
it_1 = train_iterator_1()
print(next(it_1), next(it_1))
(array([[82., 4.]]), array([43.])) (array([[77., 34.]]), array([55.5]))
2. Return a copy of the array.
def train_iterator_2(batch_size=1):
    x = np.zeros((batch_size, 2))
    while True:
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::, 0] + x[::, 1]) / 2
        yield np.copy(x), x_mean
it_2 = train_iterator_2()
print(next(it_2), next(it_2))
(array([[63., 31.]]), array([47.])) (array([[94., 25.]]), array([59.5]))
Now everything is fine. Let's move on.
Do we need expand_dims at all? Let's try removing that line; the new code looks like this:
def train_iterator(batch_size=64):
    while True:
        x = np.zeros((batch_size, 2))
        for i in range(batch_size):
            x[i][0] = np.random.randint(0, 100)
            x[i][1] = np.random.randint(0, 100)
        x_mean = (x[::, 0] + x[::, 1]) / 2
        yield [x], [x_mean]
Training works just as well, although the returned data now has a different shape.
For example, the target used to be [[49.]] and is now [49.], but inside Keras this apparently gets reshaped to the required dimension correctly.
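Just to make the shape difference concrete, here is a tiny NumPy illustration (added for clarity, with a batch of one):

x_mean = np.array([49.0])                # without expand_dims: shape (1,), prints as [49.]
x_mean_ex = np.expand_dims(x_mean, -1)   # with expand_dims: shape (1, 1), prints as [[49.]]
print(x_mean.shape, x_mean_ex.shape)     # (1,) (1, 1)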
So, we know what a correct data generator should look like; now let's play with a Lambda layer and look at how expand_dims behaves there.
We will not predict anything, we simply compute the correct value inside the lambda.
The code is as follows:
def calc_mean(x):
    res = (x[::, 0] + x[::, 1]) / 2
    res = K.expand_dims(res, -1)
    return res

def create_model():
    x = Input(name='x', shape=(2,))
    x_mean = Lambda(lambda x: calc_mean(x), output_shape=(1,))(x)
    model = Model(inputs=x, outputs=x_mean)
    return model
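The training call for this Lambda-only model is not shown here, so below is a sketch of how it was presumably run: the same mse/rmsprop setup as before, fed by the generator that already applies expand_dims to the targets; steps_per_epoch=100 and epochs=5 are assumptions read off the log below.

model = create_model()
model.compile(loss=['mse'], optimizer='rmsprop')
# steps_per_epoch and epochs are guesses matching the log that follows
model.fit_generator(train_iterator_1(), steps_per_epoch=100, epochs=5, verbose=1)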
We run it and see that everything is fine:
Epoch 1/5
100/100 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 2/5
100/100 [==============================] - 0s 2ms/step - loss: 0.0000e+00
Epoch 3/5
100/100 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Now let's change our lambda function a little and remove the expand_dims:
def calc_mean(x):
    res = (x[::, 0] + x[::, 1]) / 2
    return res
The model compiles without any errors about dimensions, but the result is different: the loss is computed as something nonsensical. So expand_dims is needed here; nothing gets fixed up automatically.
Epoch 1/5
100/100 [==============================] - 0s 3ms/step - loss: 871.6299
Epoch 2/5
100/100 [==============================] - 0s 3ms/step - loss: 830.2568
Epoch 3/5
100/100 [==============================] - 0s 2ms/step - loss: 830.8041
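Why is the loss nonsensical rather than just off by a bit? One plausible explanation (my assumption about the mechanism, the post does not state it) is broadcasting inside the element-wise loss: a (batch,) prediction against a (batch, 1) target expands to a (batch, batch) matrix, so the "mse" averages over pairs that were never meant to be compared. A NumPy sketch of the effect:

y_true = np.array([[46.0], [12.0]])   # targets from the generator: shape (2, 1)
y_pred = np.array([46.0, 12.0])       # Lambda output without expand_dims: shape (2,)

diff = y_true - y_pred                # broadcasts to shape (2, 2)
print(diff.shape)                     # (2, 2)
print((diff ** 2).mean())             # 578.0, nonzero even though every prediction is exact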
And if you look at what predict() returns, you can see that the shape is wrong: the output is [46.], while [[46.]] is expected.
That's about it. Thanks to everyone who read this far. And be careful with the details; their effect can be significant.