# How to distinguish birds from flowers. Or flowers from birds

As a weekend project, I wanted to play with a kind of "neural" network (spoiler: there are no neurons in it). And so that I would not later feel excruciating regret over hours spent aimlessly, I decided the network should earn its keep: let it also sort through my home photo archive and at least put the photos of flowers into a separate folder.

## The simplest network

I found the simplest network in the article "Neural network in 11 lines in Python" (a translation by SLY_G of "A Neural Network in 11 lines of Python (Part 1)"; the author also wrote a sequel, "A Neural Network in 13 lines of Python (Part 2 - Gradient Descent)", but the first article is enough here).

In brief: this network has exactly one dependency, NumPy.

The set of inputs is treated as a matrix and the set of outputs as a vector. In the original article, the network multiplies the input matrix (4 x 3) by the input weight matrix (3 x 4), applies the transfer function to the product, and gets the layer matrix (4 x 4). That layer is then multiplied by the output weight matrix (4 x 1) and passed through the function again, giving a (4 x 1) layer, which is the result of the network. In total, leaving out the scalar transfer function, the network performs two matrix multiplications: (4 x 3) · (3 x 4) · (4 x 1) = (4 x 1). As a consequence of the rules of matrix multiplication, one of the dimensions never changes while the network runs, so it is impossible to get a single number at the output.

Therefore, I slightly modified the code from the article: I added a transposition after each multiplication and support for an arbitrary number of layers. This makes it possible to get any combination of input and output dimensions.
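To make the effect of the transposition concrete, here is a minimal sketch that only tracks shapes. The helper name `forward_shapes` is my own and not part of the article's code, and the transfer function is left out because it works element by element and does not change dimensions.

```python
import numpy as np

def forward_shapes(x, syns, transpose=True):
    """Multiply x by each synapse matrix in turn and record the layer shapes.

    Hypothetical helper for illustration; with transpose=True it mimics the
    modified network (a transposition after every multiplication).
    """
    shapes = [x.shape]
    layer = x
    for syn in syns:
        layer = layer.dot(syn)
        if transpose:
            layer = layer.T
        shapes.append(layer.shape)
    return shapes

# Original article: (4 x 3) . (3 x 4) . (4 x 1); the row count 4 never changes.
plain = forward_shapes(np.ones((4, 3)), [np.ones((3, 4)), np.ones((4, 1))],
                       transpose=False)

# With a transposition after each product, a (3 x 4) input can shrink to (1 x 1).
flipped = forward_shapes(np.ones((3, 4)), [np.ones((4, 1)), np.ones((3, 1))])

print(plain)    # [(4, 3), (4, 4), (4, 1)]
print(flipped)  # [(3, 4), (1, 3), (1, 1)]
```

The rule that falls out of this: each synapse matrix has as many rows as the previous layer has columns, and its column count becomes the row count of the next layer.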

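For example, if the input matrix is (3 x 4) and the output should be a single number, we add two synapse matrices, (4 x 1) and (3 x 1): the first product is (3 x 1), which is transposed to (1 x 3), and the second product is (1 x 1). Or, say, a (10 x 8) input can be converted to a (4 x 5) output, for instance with synapse matrices (8 x 5) and (10 x 4). The resulting code is: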

nnmat.py
```python
import numpy as np

def nonlin(x, deriv=False):
    # Sigmoid; with deriv=True, x must already be a sigmoid output
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def fmax(x, deriv=False):
    # Alternative transfer function: max(x, 0) / 3, with a constant slope
    if deriv:
        return 0.33
    return np.maximum(x, 0) / 3

class NN:
    def __init__(self, shapes, func=nonlin):
        # shapes[0] is the input shape, shapes[i] is the shape of layer i
        self.func = func
        self.shapes = shapes
        # synapse shape: (columns of the previous layer, rows of the next layer)
        self.syns = [2*np.random.random((shapes[i-1][1], shapes[i][0])) - 1
                     for i in range(1, len(shapes))]
        self.layers = [np.zeros(shapes[i])
                       for i in range(1, len(shapes))]

    def learn(self, X, y, cycles):
        for j in range(cycles):
            res = self.calc(X)
            prev = y - res
            # walk the layers backwards, propagating the error
            for i in range(len(self.layers)-1, -1, -1):
                l_delta = (prev*self.func(self.layers[i], True)).T
                if i == 0:
                    self.syns[i] += X.T.dot(l_delta)
                else:
                    prev = l_delta.dot(self.syns[i].T)
                    self.syns[i] += self.layers[i-1].T.dot(l_delta)
        return self.layers[-1]

    def calc(self, X):
        # multiply, apply the transfer function, transpose; repeat per layer
        for i in range(len(self.syns)):
            if i == 0:
                self.layers[i] = self.func(np.dot(X, self.syns[i])).T
            else:
                self.layers[i] = self.func(np.dot(self.layers[i-1], self.syns[i])).T
        return self.layers[-1]

if __name__ == '__main__':
    X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
    y = np.array([[0,1,1,0]])
    print('X =', X)
    print('y =', y)
    # layer shapes: (4 x 3) input, (4 x 4) hidden, (1 x 4) output
    nn = NN((X.shape, (y.shape[1], X.shape[0]), y.shape))
    nn.learn(X, y, 1000)
    print('Result =', nn.calc(X).round(2))
```

The output:

```
X = [[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]]
y = [[0 1 1 0]]
Result = [[ 0.02  0.99  0.98  0.02]]
```

So, the network is there; now we need to deal with loading photos. The photos are on disk, mostly in JPG, but there are other formats too. They also differ in size, from 3 Mpx to 16 Mpx, depending on what they were shot with and how they were processed.

At first I tried loading the photos via Qt's QImage class: it works with different formats, can convert between them, and gives direct access to the image data. Surely there is a simpler way in Python, but QImage was something I did not have to figure out. For the network to work with a picture, it has to be converted to a monochrome image and scaled to a standard size.

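```python
from PyQt5.QtCore import Qt
from PyQt5.QtGui import QImage

def readImage(file, imageSize):
    # imageSize is a (width, height) tuple
    img = QImage(file)
    if img.isNull():
        return 0
    img = img.convertToFormat(QImage.Format_Grayscale8)
    img = img.scaled(imageSize[0], imageSize[1], Qt.IgnoreAspectRatio)
    return img
```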

To pass the image to the network, it has to be converted to a numpy.ndarray matrix. QImage.bits() returns a pointer to the image data, where each byte corresponds to a pixel. NumPy has a recarray function that builds an array of records from a buffer, and its view method gives us a numpy.ndarray matrix without copying the data.

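```python
# img is the scaled grayscale QImage from the previous step
srcBi = img.bits()
# one byte per pixel; assumes the row width is a multiple of 4,
# because QImage pads each scan line to 32 bits
srcBi.setsize(img.width() * img.height())
srcBy = bytes(srcBi)
srcW, srcH = img.width(), img.height()
# uint8, so that pixel values 128..255 are not read as negative numbers
srcArr = np.recarray((srcH, srcW), dtype=np.uint8, buf=srcBy).view(dtype=np.uint8, type=np.ndarray)
```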
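As an aside, the same buffer-to-matrix step can be sketched without QImage. The example below assumes the pixels are already available as a `bytes` object with one byte per pixel and no row padding; `raw`, `width` and `height` are made-up stand-ins for real image data. It uses `np.frombuffer`, which wraps the buffer without copying it, and shows why the element type should be unsigned.

```python
import numpy as np

# Stand-in for 8-bit grayscale image data: 2 rows of 3 pixels (assumed layout).
width, height = 3, 2
raw = bytes([0, 64, 128, 192, 255, 10])

# Wrap the buffer as a (height, width) matrix; no pixel data is copied.
pixels = np.frombuffer(raw, dtype=np.uint8).reshape(height, width)

# Read as signed bytes, brightness above 127 wraps around to negative values.
signed = np.frombuffer(raw, dtype=np.int8).reshape(height, width)

# Scale to the 0..1 range before feeding the matrix to the network.
scaled = pixels / 255

print(pixels)
print(signed)
```

An array created this way from `bytes` is read-only, which is fine here because the network only reads its input.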

## Network for images

Feeding a picture straight into the network input would be too expensive, even a reduced one: as I said, the network does matrix multiplication, so a single training cycle on a 400 x 400 image costs 400 x 400 x 400 = 64 million multiplications. Experts recommend using convolution. Wikipedia has a wonderful animated illustration of how it works, which shows that the result has the same dimensions as the original matrix. But I will make my life a little simpler: instead of moving pixel by pixel, I will cut the image into pieces the size of the input matrix and apply the network to them one by one. With matrices, cutting out a piece is quite simple:

``srcArr[x:x+dw, y:y+dw]``

The results of running the network on the pieces are collected into a smaller matrix, and this matrix is passed to the input of a common network. That is, there are two networks: the first works on pieces of the image, the second on the results the first network produced for those pieces.
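The tiling scheme can be sketched with plain NumPy. This is an illustration under my own assumptions, not the article's class: `reduce_tiles` is a hypothetical helper, and a simple mean stands in for the small network that turns each piece into one number.

```python
import numpy as np

def reduce_tiles(image, tile, reducer=np.mean):
    """Cut image into non-overlapping tile-sized pieces and map each to a number.

    `reducer` stands in for the per-piece network; partial pieces at the
    right and bottom borders are ignored.
    """
    th, tw = tile
    rows, cols = image.shape[0] // th, image.shape[1] // tw
    result = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # Pixel offset of a piece is its index times the piece size.
            piece = image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            result[r, c] = reducer(piece)
    return result

image = np.arange(16, dtype=float).reshape(4, 4)
small = reduce_tiles(image, (2, 2))
print(small)  # 2 x 2 matrix of block means
```

For a 400 x 400 image and 40 x 40 pieces this yields a 10 x 10 matrix for the second network.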

Create a primary network:

```python
class ImgNN:
    def __init__(self, shape, resultShape=(16, 16), imageSize=(400, 400)):
        self.resultShape = resultShape
        # number of pieces along each axis
        self.w = imageSize[0] // shape[0]
        self.h = imageSize[1] // shape[1]
        # small network: a piece of size `shape` in, a 1x1 matrix out
        self.net = NN([shape, (1, shape[0]), (1, 1)])
        self.shape = shape
        self.imageSize = imageSize
```

Inside, self.net is created: the network itself, with an input matrix of the given size shape and an output in the form of an elementary 1x1 matrix. Yes, I could have inherited from the NN class, but it was the weekend, I wanted a result quickly, and the architecture had not settled yet. Time to market beats in our hearts!

Running the first network over an image:

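```python
    def calc(self, srcArr):
        w = srcArr.shape[0] // self.shape[0]
        h = srcArr.shape[1] // self.shape[1]
        resArr = np.zeros(self.resultShape)
        for x in range(w):
            for y in range(h):
                # x and y count pieces, so the pixel offset is index * piece size
                # (the listing sliced from x and y directly, which makes pieces overlap)
                px, py = x*self.shape[0], y*self.shape[1]
                a = srcArr[px:px+self.shape[0], py:py+self.shape[1]]
                if a.shape != (self.shape[0], self.shape[1]):
                    continue
                if x >= self.resultShape[0] or y >= self.resultShape[1]:
                    continue
                res = self.net.calc(a)
                resArr[x,y] = res[0,0]
        return resArr
```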

At the output, we have a resArr matrix, with a dimension equal to the number of pieces into which the image was divided. We pass this matrix to the input of the second network, which gives the final result.

```python
y = np.array([[1,0,1,0]])
firstShape = (40, 40)
middleShape = (5, 5)
imageSize = firstShape[0]*middleShape[0], firstShape[1]*middleShape[1]
...
nn = ImgNN(firstShape, resultShape=middleShape, imageSize=imageSize)
nn2 = NN([middleShape, (y.shape[1], middleShape[0]), y.shape])
...
# i is the image matrix prepared from a file
mid = nn.calc(i)
res = nn2.calc(mid)
```

Here you should ask me where I got the first line, and what it means:

``y = np.array([[1,0,1,0]])``

This is the expected output of the network for a positive answer, i.e. when the network believes the input image is a flower. I chose the dimension on the principle of "neither too little nor too much": with a 1x1 output, a single number makes it hard to judge how much the network "doubts" its result. There is no point in a large dimension either, since it would not carry more information. An equal number of zeros and ones gives a clear reference: the closer the output is to it, the better the match. With all ones or all zeros, the network would have an incentive to overfit, pushing all weights up or, respectively, zeroing them out to get the desired result regardless of the input.

## How to train a convolutional network?

I made a training sample from my own photographs by simply sorting them into two directories, flowers and noflowers. I collect the paths to the pictures in two arrays:

```python
import os

fl = [e.path for e in os.scandir('flowers')]
nofl = [e.path for e in os.scandir('noflowers')]
all = fl + nofl  # note: this name shadows the built-in all()
```

Simple networks, including the one in the original article, are usually trained by the traditional method, backpropagation of errors. But to apply it to a convolutional network made of two elementary ones, the accumulated error has to be passed end to end from the second network to the first. There are also other methods for convolutional networks. I was too lazy to rework a working network, at least for now, so I decided to train only the second network and not train the first one at all, leaving it filled with random values. My reasoning: since a person's optic nerves are not trained, there is no need to train the primary network that "looks" at the image.

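```python
for epoch in range(100):
    print('Epoch =', epoch)
    # both networks start from fresh random weights in every epoch
    nn = ImgNN(firstShape, resultShape=middleShape, imageSize=imageSize)
    nn2 = NN([middleShape, (y.shape[1], middleShape[0]), y.shape])
    for f in fl:
        # i is the image matrix loaded from file f (loading step not shown)
        # nn.learn(i, yy, 1)
        mid = nn.calc(i)
        nn2.learn(mid, y, 1000)
```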

In each epoch, right after training, I run the entire sample through the network and see what happened.

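```python
    # still inside the epoch loop
    for f in all:
        # i is the image matrix loaded from file f (loading step not shown)
        mid = nn.calc(i)
        res = nn2.calc(mid)
        delta = abs(y - res)
        v = round(np.std(delta), 3)
```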

If the network has trained correctly, its output should be close to the target [[1,0,1,0]] when the input is a flower, and as different from it as possible, for example [[0,1,0,1]], when there is no flower at the input. The result is then scored: empirically, I treat a deviation of no more than 0.2 from the target as a success, and count the errors. Of all the runs, the one with the fewest errors is selected, and the synapse weights of both networks are saved to files. These files can later be used to load the networks.
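The scoring rule can be sketched as a small function. The names `deviation` and `looks_like_flower` are mine; the formula (standard deviation of the absolute error, rounded to three digits) and the 0.2 threshold are taken from the article's loop, and the two sample outputs are copied from the logs further down.

```python
import numpy as np

TARGET = np.array([[1, 0, 1, 0]])  # expected output for a flower
THRESHOLD = 0.2                    # empirical cut-off from the article

def deviation(res, target=TARGET):
    """Spread of the absolute errors, rounded as in the article's loop."""
    return round(float(np.std(np.abs(target - res))), 3)

def looks_like_flower(res, target=TARGET, threshold=THRESHOLD):
    return deviation(res, target) < threshold

stuck = np.array([[0.98, 0.5, 0.98, 0.5]])
good = np.array([[0.95, 0.13, 0.97, 0.14]])
print(deviation(stuck), looks_like_flower(stuck))  # 0.24 False
print(deviation(good), looks_like_flower(good))    # 0.048 True
```

One property to keep in mind: the standard deviation measures how uneven the errors are, not how large they are, so an output that is off by the same amount in every position, such as [[0, 1, 0, 1]], also gets a deviation of 0.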

```python
        # inside "for f in all:"
        if v > 0.2 and f in fl:
            fails += 1
            failFiles.append(f)
        elif v < 0.2 and f in nofl:
            fails += 1
            failFiles.append(f)
    # after all files of the epoch: remember the best run so far
    if minFails is None or fails < minFails:
        minFails = fails
        lastSyns = nn.net.syns
        lastSyns2 = nn2.syns
    print('fails =', fails, failFiles)
    print('min =', minFails)
    if minFails <= 1:
        print('found!')
        break

# after the last epoch: save the weights of the best run
for i in range(len(lastSyns)):
    np.savetxt('syns_save%s.txt' % i, lastSyns[i])
for i in range(len(lastSyns2)):
    np.savetxt('syns2_save%s.txt' % i, lastSyns2[i])
```
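The article does not show the loading side, so here is a hedged sketch of a round trip. `save_syns` and `load_syns` are hypothetical helpers; the file naming follows the `'syns_save%s.txt'` pattern above. The one detail worth noting is `ndmin=2`: without it, `np.loadtxt` returns a single-column file as a 1-D array, which would no longer fit the matrix multiplications.

```python
import os
import tempfile
import numpy as np

def save_syns(syns, prefix):
    """Write each weight matrix to '<prefix><index>.txt', as in the article."""
    for i, syn in enumerate(syns):
        np.savetxt('%s%s.txt' % (prefix, i), syn)

def load_syns(prefix, count):
    """Read the matrices back; ndmin=2 keeps single-column files 2-D."""
    return [np.loadtxt('%s%s.txt' % (prefix, i), ndmin=2) for i in range(count)]

# Round trip with weights shaped like the first network's: (40 x 1) twice.
syns = [2 * np.random.random((40, 1)) - 1 for _ in range(2)]
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, 'syns_save')
    save_syns(syns, prefix)
    restored = load_syns(prefix, len(syns))

print([s.shape for s in restored])  # [(40, 1), (40, 1)]
```

The restored list could then be assigned to the `syns` attribute of a freshly created network of the same shape.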

## Call it a rose, or not

Full of hope, I launch it and ... wait ..., then wait some more ..., and more ... and get complete nonsense: the network does not learn.

Nothing happened
```
flowers\178.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
flowers\179.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
flowers\180.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
flowers\182.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
flowers\186-2.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
flowers\186.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
flowers\187.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
flowers\190 (2).jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
flowers\190.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
flowers\191.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
flowers\195.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
flowers\199.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
flowers\2.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
flowers\200.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\032.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\085.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
noflowers\088.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\122.JPG res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\123.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\173.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
noflowers\202.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\205.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\cutxml.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.241
noflowers\Getaway.jpg res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
noflowers\IMGP1800.JPG res = [[ 0.98 0.5 0.98 0.5 ]] v = 0.24
noflowers\trq-4.png res = [[ 0.97 0.51 0.97 0.51]] v = 0.239
fails = 14
```

Being a carrier of real, living neurons rather than artificial ones, I eventually realized that the main distinguishing feature of flowers is color (yes, Captain Obvious, thank you for always being there, even though your advice often comes late). So the image should be converted to a color model with a separate hue component (HSV or HSL), and the network trained on hue.
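What the hue component buys can be seen on single pixels. The sketch below uses the standard library's `colorsys` module rather than the image libraries used in this article, and the helper `hue` is my own name: a bright and a dark pixel of the same color get the same hue, while different colors land far apart.

```python
import colorsys

def hue(r, g, b):
    """Hue of an 8-bit RGB pixel as a fraction of the color wheel (0..1)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h

red = hue(255, 0, 0)
dark_red = hue(128, 0, 0)    # same hue, lower brightness
green = hue(0, 255, 0)
blue = hue(0, 0, 255)

print(red, dark_red, green, blue)
```

A grayscale conversion would instead keep only brightness and throw away exactly the information that separates a red petal from a green leaf.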

But it turned out that the QImage class does not know such color spaces. I had to abandon it and load the pictures with OpenCV, which has this capability.

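```python
import cv2

def readImage(file, imageSize):
    # Reconstructed wrapper: the original snippet starts at cv2.resize,
    # so the function name and the imread call are assumptions.
    img = cv2.imread(file)
    small = cv2.resize(img, imageSize)
    hsv = cv2.cvtColor(small, cv2.COLOR_BGR2HSV)
    # keep only the hue channel; for 8-bit images OpenCV stores hue
    # as 0..179, so the result lies in roughly 0..0.7
    return hsv[:,:,0]/255
```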

True, OpenCV flatly refused to work with Russian letters in file names, so I had to rename the files.

I ran it, and the result was disappointing: almost the same as before.

I thought some more and decided that the problem was the overly random values in the first network. I had hoped in vain that the stars would align without my help, so I added a little pre-training for it, only 2 cycles per file. As the target for a positive result I took the identity matrix.

```python
yy = np.zeros(middleShape)
np.fill_diagonal(yy, 1)
...
for f in fl:
    # i is the image matrix loaded from file f (loading step not shown)
    nn.learn(i, yy, 2)  # train the first network just a little
    mid = nn.calc(i)
    nn2.learn(mid, y, 1000)
```

I ran it again and it became much more interesting: the numbers began to change, although I did not reach the ideal.

Best result
```
Epoch = 34
flowers\178.jpg res = [[ 0.86 0.47 0.88 0.47]] v = 0.171
flowers\179.jpg res = [[ 0.87 0.51 0.89 0.5 ]] v = 0.194
flowers\180.jpg res = [[ 0.79 0.69 0.79 0.67]] v = 0.233
flowers\182.jpg res = [[ 0.87 0.53 0.88 0.48]] v = 0.189
flowers\186-2.jpg res = [[ 0.89 0.41 0.89 0.39]] v = 0.144
flowers\186.jpg res = [[ 0.85 0.54 0.83 0.55]] v = 0.194
flowers\187.jpg res = [[ 0.86 0.54 0.86 0.54]] v = 0.199
flowers\190 (2).jpg res = [[ 0.96 0.25 0.97 0.15]] v = 0.089
flowers\190.jpg res = [[ 0.95 0.13 0.97 0.14]] v = 0.048
flowers\191.jpg res = [[ 0.81 0.57 0.82 0.57]] v = 0.195
flowers\195.jpg res = [[ 0.81 0.55 0.79 0.56]] v = 0.177
flowers\199.jpg res = [[ 0.89 0.45 0.89 0.45]] v = 0.171
flowers\2.jpg res = [[ 0.83 0.56 0.83 0.55]] v = 0.195
flowers\200.jpg res = [[ 0.91 0.42 0.89 0.43]] v = 0.163
noflowers\032.jpg res = [[ 0.7 0.79 0.69 0.8 ]] v = 0.246
noflowers\085.jpg res = [[ 0.86 0.53 0.86 0.53]] v = 0.192
noflowers\088.jpg res = [[ 0.86 0.56 0.87 0.53]] v = 0.207
noflowers\122.JPG res = [[ 0.81 0.63 0.81 0.62]] v = 0.218
noflowers\123.jpg res = [[ 0.83 0.59 0.84 0.55]] v = 0.204
noflowers\173.jpg res = [[ 0.83 0.6 0.83 0.58]] v = 0.209
noflowers\202.jpg res = [[ 0.78 0.7 0.8 0.65]] v = 0.234
noflowers\205.jpg res = [[ 0.84 0.77 0.79 0.75]] v = 0.287
noflowers\cutxml.jpg res = [[ 0.81 0.61 0.81 0.63]] v = 0.213
noflowers\Getaway.jpg res = [[ 0.85 0.56 0.85 0.55]] v = 0.202
noflowers\IMGP1800.JPG res = [[ 0.85 0.55 0.86 0.54]] v = 0.199
noflowers\trq-4.png res = [[ 0.7 0.72 0.7 0.71]] v = 0.208
fails = 3 ['flowers\\180.jpg', 'noflowers\\085.jpg', 'noflowers\\IMGP1800.JPG']
min = 3
```

What next? ... And then the weekend ended, and it was time for me to get on with household chores.

## What to do next?

Of course, this network, the way I trained it, and the test dataset have very little to do with real networks and with what data scientists do. This is just a toy for mental gymnastics; do not set high hopes on it.

Here are some further steps toward the desired result (if you need it):

1. Add one or more intermediate layers to the second network, which gives it more freedom in learning. Still, a network built on matrix multiplication is not quite classical: it has fewer synaptic links between layers, and the synapses themselves are not unique.
2. Use near-successful results as a starting point for further training, i.e. keep the synapse weights of the most successful run instead of overwriting everything with random values.
3. Try genetic algorithms: mix and cross over, multiply the successful and reject the unsuccessful.
4. Try other learning methods, of which there are already plenty.
5. Use more information from the original image, for example feed the color and monochrome versions to separate networks at the same time and process the results in a common network.
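Steps 2 and 3 could start from something like the sketch below. It is only an illustration of the idea, not code from the article: `mutate` and `crossover` are hypothetical helpers that operate on lists of weight matrices of the kind stored in `syns`, and the noise scale of 0.05 is an arbitrary choice.

```python
import numpy as np

def mutate(syns, scale, rng):
    """Return a copy of the weight matrices with small Gaussian noise added."""
    return [s + rng.normal(0.0, scale, size=s.shape) for s in syns]

def crossover(a, b, rng):
    """Build a child by taking each weight from parent a or b at random."""
    child = []
    for sa, sb in zip(a, b):
        mask = rng.random(sa.shape) < 0.5
        child.append(np.where(mask, sa, sb))
    return child

rng = np.random.default_rng(0)
parent_a = [2 * rng.random((5, 4)) - 1, 2 * rng.random((5, 1)) - 1]
parent_b = [2 * rng.random((5, 4)) - 1, 2 * rng.random((5, 1)) - 1]

child = mutate(crossover(parent_a, parent_b, rng), scale=0.05, rng=rng)
print([c.shape for c in child])  # [(5, 4), (5, 1)]
```

In each epoch the weights of the best runs so far would be crossed and perturbed instead of being regenerated from scratch, and the run with the fewest errors would be kept as a parent for the next round.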

Source
```python
import numpy as np
from nnmat import *
import os
import sys
from PyQt5.QtGui import *
from PyQt5.QtCore import *
import meshandler  # author's local module, not shown in the article
import random
import cv2

class ImgNN:
    def __init__(self, shape, resultShape=(16, 16), imageSize=(400, 400)):
        self.resultShape = resultShape
        self.w = imageSize[0] // shape[0]
        self.h = imageSize[1] // shape[1]
        self.net = NN([shape, (1, shape[0]), (1, 1)])
        self.shape = shape
        self.imageSize = imageSize

    def learn(self, srcArr, result, cycles):
        for c in range(cycles):
            for x in range(self.w):
                for y in range(self.h):
                    # pixel offset = piece index * piece size (the listing
                    # sliced from x and y directly, so the pieces overlapped)
                    px, py = x*self.shape[0], y*self.shape[1]
                    a = srcArr[px:px+self.shape[0], py:py+self.shape[1]]
                    if a.shape != (self.shape[0], self.shape[1]):
                        print(a.shape)
                        continue
                    self.net.learn(a, result[x,y], 1)

    def calc(self, srcArr):
        resArr = np.zeros(self.resultShape)
        for x in range(self.w):
            for y in range(self.h):
                px, py = x*self.shape[0], y*self.shape[1]
                a = srcArr[px:px+self.shape[0], py:py+self.shape[1]]
                if a.shape != (self.shape[0], self.shape[1]):
                    continue
                if x >= self.resultShape[0] or y >= self.resultShape[1]:
                    continue
                res = self.net.calc(a)
                resArr[x,y] = res[0,0]
        return resArr

    # NOTE: the bodies of learnFile and calcFile, the headers of the two
    # loader functions and the imread call are missing from the listing.
    # They are reconstructed here; the loader names are assumptions.
    def learnFile(self, file, result, cycles):
        return self.learn(readImage(file, self.imageSize), result, cycles)

    def calcFile(self, file):
        return self.calc(readImage(file, self.imageSize))

def readImage(file, imageSize):
    # OpenCV loader: hue channel only
    img = cv2.imread(file)
    small = cv2.resize(img, imageSize)
    hsv = cv2.cvtColor(small, cv2.COLOR_BGR2HSV)
    return hsv[:,:,0]/255

def readImageQt(file, imageSize):
    # earlier QImage loader: grayscale
    img = QImage(file)
    if img.isNull():
        return 0
    img = img.convertToFormat(QImage.Format_Grayscale8)
    img = img.scaled(imageSize[0], imageSize[1], Qt.IgnoreAspectRatio)
    srcBi = img.bits()
    srcBi.setsize(img.width() * img.height())
    srcBy = bytes(srcBi)
    srcW, srcH = img.width(), img.height()
    srcArr = np.recarray((srcH, srcW), dtype=np.uint8, buf=srcBy).view(dtype=np.uint8, type=np.ndarray)
    return srcArr/255

if __name__ == '__main__':
    y = np.array([[1,0,1,0]])
    firstShape = (40, 40)
    middleShape = (10, 10)
    imageSize = firstShape[0]*middleShape[0], firstShape[1]*middleShape[1]
    StartLearn = True
    if not StartLearn:
        pictDir = '2014-05'
        nn = ImgNN(firstShape, resultShape=middleShape, imageSize=imageSize)
        nn2 = NN([middleShape, (y.shape[1], middleShape[0]), y.shape])
        # loading the saved weights into nn.net.syns and nn2.syns
        # is not shown in the listing
        files = [e.path for e in os.scandir(pictDir)]
        for f in files:
            i = readImage(f, imageSize)  # reconstructed line
            mid = nn.calc(i)             # reconstructed line
            res = nn2.calc(mid)
            delta = y-res
            v = round(np.std(delta),3)
            if v < 0.2:
                print('Flower',f)
            else:
                print('No flower',f)
    else:
        fl = [e.path for e in os.scandir('flowers')]
        nofl = [e.path for e in os.scandir('noflowers')]
        all = fl+nofl
        yy = np.zeros(middleShape)
        np.fill_diagonal(yy,1)
        minFails = None
        for epoch in range(100):
            print('Epoch =', epoch)
            nn = ImgNN(firstShape, resultShape=middleShape, imageSize=imageSize)
            nn2 = NN([middleShape, (y.shape[1], middleShape[0]), y.shape])
            for f in fl:
                i = readImage(f, imageSize)  # reconstructed line
                nn.learn(i, yy, 2)
                mid = nn.calc(i)
                nn2.learn(mid, y, 1000)
            fails = 0
            failFiles = []
            for f in all:
                i = readImage(f, imageSize)  # reconstructed line
                mid = nn.calc(i)
                res = nn2.calc(mid)
                delta = abs(y-res)
                v = round(np.std(delta),3)
                #v = round(delta.sum(),3)
                print(f, 'res = ', res.round(2),'v =',v)
                if v > 0.2 and f in fl:
                    fails += 1
                    failFiles.append(f)
                elif v<0.2 and f in nofl:
                    fails +=1
                    failFiles.append(f)
            if minFails is None or fails < minFails:
                minFails = fails
                lastSyns = nn.net.syns
                lastSyns2 = nn2.syns
            print('fails =',fails, failFiles)
            print('min =',minFails)
            if minFails <= 1:
                print('found!')
                break
        for i in range(len(lastSyns)):
            np.savetxt('syns_save%s.txt'%i, lastSyns[i])
        for i in range(len(lastSyns2)):
            np.savetxt('syns2_save%s.txt'%i, lastSyns2[i])
```
