# Algorithm for learning a multilayer neural network using the backpropagation method

The topic of neural networks has already been covered more than once on the hub, but today I would like to introduce readers to the algorithm for training a multilayer neural network using the back propagation method of error and give an implementation of this method.


All the code is available on GitHub: Sovietmade / NeuralNetworks.

In conclusion, I would like to note that the topic of neural networks is by no means exhausted: again and again we see on the pages of Habr mentions of new achievements by scientists in the field of neural networks, of new remarkable developments. For my part, this article was a first step in mastering an interesting technology, and I hope it proves useful to someone.

The neural network training algorithm was taken from an excellent book: Laurene V. Fausett, “Fundamentals of Neural Networks: Architectures, Algorithms, and Applications”.

Let me say right away that I am not an expert in the field of neural networks, so I look forward to constructive criticism, comments, and additions from readers.

#### Theoretical part

This material assumes familiarity with the basics of neural networks; however, I believe the reader can be introduced to the topic without an excessive excursion into neural network theory. So, for those hearing the phrase “neural network” for the first time, I propose viewing a neural network as a weighted directed graph whose nodes (neurons) are arranged in layers, with each node of one layer connected to all nodes of the previous layer. In our case such a graph has an input and an output layer, whose nodes serve as the inputs and outputs respectively. Each node (neuron) has an activation function, the function responsible for computing the signal at the node's output. There is also the notion of a bias: a node whose output is always 1. In this article we consider training a neural network with a “teacher”, that is, supervised learning, in which training proceeds by presenting the network with a sequence of training examples together with the correct responses.

As with most neural networks, our goal is to train the network to achieve a balance between its ability to give the correct response to the input data used in training (memorization) and its ability to produce correct results for inputs that are similar, but not identical, to those used in training (the principle of generalization). Training a network by the error backpropagation method involves three stages: feeding the data to the inputs and propagating it toward the outputs, computing and backpropagating the corresponding error, and adjusting the weights. After training, data is simply fed to the network's inputs and propagated toward the outputs. While training the network can be a rather lengthy process, direct computation of results by a trained network is very fast. There are also numerous variations of the backpropagation method designed to speed up the learning process.

It is also worth noting that a single-layer neural network is significantly limited in which input patterns it can learn, while a multilayer network (with one or more hidden layers) does not have this drawback. Next we describe a standard backpropagation neural network.

##### Architecture

Figure 1 shows a multilayer neural network with one layer of hidden neurons (the $Z$ elements).

The neurons representing the network outputs (denoted $Y_k$) and the hidden neurons may have biases (as shown in the figure). The bias of output neuron $Y_k$ is denoted $w_{0k}$, and that of hidden neuron $Z_j$ is $v_{0j}$. These biases act as weights on connections coming from neurons whose output is always 1 (they are shown in Figure 1 but are usually left implicit). The arrows in Figure 1 show the flow of information during the feedforward phase, from inputs to outputs. During training, error signals propagate in the opposite direction.

##### Algorithm description

The algorithm presented below applies to a neural network with one hidden layer, which is adequate for most applications. As mentioned earlier, training includes three stages: feeding the training data to the network inputs, backpropagating the error, and adjusting the weights. During the first stage, each input neuron $X_i$ receives a signal and broadcasts it to each of the hidden neurons $Z_1, \dots, Z_p$. Each hidden neuron then computes the result of its activation function (network function) and sends its signal $z_j$ to all output neurons. Each output neuron $Y_k$, in turn, computes the result of its activation function, $y_k$, which is nothing other than the output signal of this neuron for the given input data. During training, each output neuron compares its computed value $y_k$ with the target value $t_k$ provided by the teacher, determining the error for the given input pattern. From this error the term $\delta_k$ is computed; it is used when propagating the error from $Y_k$ back to all network elements of the previous layer (the hidden neurons connected to $Y_k$), and later when adjusting the weights of the connections between the output and hidden neurons. An error term $\delta_j$ is computed in the same way for each hidden neuron $Z_j$. Although there is no need to propagate the error to the input layer, $\delta_j$ is used to adjust the weights of the connections between the hidden-layer neurons and the input neurons. After all the $\delta$ terms have been determined, the weights of all connections are adjusted simultaneously.

###### Notation:

The following notation is used in the network training algorithm:

- $x = (x_1, \dots, x_n)$ — the input vector of training data;
- $t = (t_1, \dots, t_m)$ — the vector of target output values provided by the teacher;
- $\delta_k$ — the weight-adjustment component corresponding to the error of output neuron $Y_k$; also, the error information about neuron $Y_k$ that is distributed to the hidden-layer neurons connected to $Y_k$;
- $\delta_j$ — the weight-adjustment component corresponding to the error information propagated from the output layer to hidden neuron $Z_j$;
- $\alpha$ — the learning rate;
- $X_i$ — the input neuron with index $i$; for input neurons the input and output signals are the same, $x_i$;
- $v_{0j}$ — the bias of hidden neuron $Z_j$;
- $Z_j$ — the hidden neuron with index $j$; the total value supplied to its input is denoted $z\_in_j = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$; its output signal (the result of applying the activation function) is denoted $z_j = f(z\_in_j)$;
- $w_{0k}$ — the bias of output neuron $Y_k$;
- $Y_k$ — the output neuron with index $k$; the total value supplied to its input is denoted $y\_in_k = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$; its output signal (the result of applying the activation function) is denoted $y_k = f(y\_in_k)$.

##### Activation function

The activation function in the error backpropagation algorithm must satisfy several important requirements: it must be continuous, differentiable, and monotonically non-decreasing. Moreover, for computational efficiency it is desirable that its derivative be easy to compute. The activation function is also often a saturating function. One of the most commonly used activation functions is the binary sigmoid, with range $(0, 1)$, defined as $f_1(x) = \dfrac{1}{1 + e^{-x}}$, with derivative $f_1'(x) = f_1(x)\,(1 - f_1(x))$.

Another widely used activation function is the bipolar sigmoid, with range $(-1, 1)$, defined as $f_2(x) = \dfrac{2}{1 + e^{-x}} - 1$, with derivative $f_2'(x) = \tfrac{1}{2}\,(1 + f_2(x))\,(1 - f_2(x))$.

##### Learning algorithm

The learning algorithm is as follows:

###### Step 0.

Weight initialization: the weights of all connections are initialized with small random values.

###### Step 1.

While the termination condition of the algorithm is not satisfied, steps 2 through 9 are performed.

###### Step 2.

For each training pair {input data, target value}, steps 3 through 8 are performed.

###### Propagation of data from inputs to outputs:

###### Step 3.

Each input neuron $X_i$ sends its received signal $x_i$ to all neurons of the next (hidden) layer.

###### Step 4.

Each hidden neuron $Z_j$ sums its weighted incoming signals, $z\_in_j = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$, and applies its activation function: $z_j = f(z\_in_j)$. It then sends the result to all elements of the next (output) layer.

###### Step 5.

Each output neuron $Y_k$ sums its weighted incoming signals, $y\_in_k = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$, and applies its activation function to compute its output signal: $y_k = f(y\_in_k)$.
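The feedforward phase (steps 3–5) can be sketched with plain arrays. This is an illustrative sketch, not the article's class-based implementation; the function and parameter names here are invented for the example, and the binary sigmoid is assumed as the activation function:

```
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of the feedforward phase for one hidden layer.
// v[j][i] - weight from input i to hidden neuron j, v0[j] - its bias;
// w[k][j] - weight from hidden neuron j to output k, w0[k] - its bias.
double Sigmoid( double x ) { return 1.0 / ( 1.0 + std::exp( -x ) ); }

std::vector<double> ForwardPass( const std::vector<double>& x,
                                 const std::vector<std::vector<double>>& v, const std::vector<double>& v0,
                                 const std::vector<std::vector<double>>& w, const std::vector<double>& w0 )
{
	std::vector<double> z( v.size( ) );
	for ( size_t j = 0; j < v.size( ); ++j )       // step 4: z_j = f(v0j + sum_i x_i * v_ij)
	{
		double zin = v0[ j ];
		for ( size_t i = 0; i < x.size( ); ++i )
			zin += x[ i ] * v[ j ][ i ];
		z[ j ] = Sigmoid( zin );
	}
	std::vector<double> y( w.size( ) );
	for ( size_t k = 0; k < w.size( ); ++k )       // step 5: y_k = f(w0k + sum_j z_j * w_jk)
	{
		double yin = w0[ k ];
		for ( size_t j = 0; j < z.size( ); ++j )
			yin += z[ j ] * w[ k ][ j ];
		y[ k ] = Sigmoid( yin );
	}
	return y;
}
```

With all weights and biases zero, every net input is 0 and every output is $f(0) = 0.5$, which makes a convenient sanity check.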

###### Backpropagation of the error:

###### Step 6.

Each output neuron $Y_k$ receives its target value $t_k$ — the output value that is correct for the given input signal — and computes the error term $\delta_k = (t_k - y_k)\, f'(y\_in_k)$. It also computes the amount by which the weight of connection $w_{jk}$ will change, $\Delta w_{jk} = \alpha\, \delta_k\, z_j$, as well as the bias correction $\Delta w_{0k} = \alpha\, \delta_k$, and sends $\delta_k$ to the neurons of the previous layer.

###### Step 7.

Each hidden neuron $Z_j$ sums the incoming errors from the neurons of the following layer, $\delta\_in_j = \sum_{k=1}^{m} \delta_k w_{jk}$, and computes its error term by multiplying this value by the derivative of its activation function: $\delta_j = \delta\_in_j\, f'(z\_in_j)$. It also computes the amount by which the weight of connection $v_{ij}$ will change, $\Delta v_{ij} = \alpha\, \delta_j\, x_i$, as well as the bias correction $\Delta v_{0j} = \alpha\, \delta_j$.

###### Step 8. Weight update.

Each output neuron $Y_k$ updates the weights of its connections with the bias element and the hidden neurons: $w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$.

Each hidden neuron $Z_j$ updates the weights of its connections with the bias element and the input neurons: $v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$.
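For a single training pair, steps 6 and 8 for the output layer can be sketched as follows. This is an illustrative sketch with invented names, assuming the binary sigmoid so that $f'(y\_in_k) = y_k\,(1 - y_k)$:

```
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of steps 6 and 8 for the output layer: compute the error terms
// and update the hidden-to-output weights in place.
// z - hidden outputs, y - network outputs, t - targets, alpha - learning rate;
// w[k][j] - hidden-to-output weights, w0[k] - output biases.
void UpdateOutputLayer( const std::vector<double>& z,
                        const std::vector<double>& y,
                        const std::vector<double>& t,
                        double alpha,
                        std::vector<std::vector<double>>& w,
                        std::vector<double>& w0 )
{
	for ( size_t k = 0; k < y.size( ); ++k )
	{
		// step 6: delta_k = (t_k - y_k) * f'(y_in_k), with f' expressed
		// through the sigmoid's output value.
		double delta = ( t[ k ] - y[ k ] ) * y[ k ] * ( 1.0 - y[ k ] );
		for ( size_t j = 0; j < z.size( ); ++j )
			w[ k ][ j ] += alpha * delta * z[ j ];  // step 8: w_jk += alpha * delta_k * z_j
		w0[ k ] += alpha * delta;                   // bias update: w_0k += alpha * delta_k
	}
}
```

The hidden layer is handled the same way, except that $\delta_j$ is first accumulated from the output-layer $\delta_k$ values weighted by $w_{jk}$.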

###### Step 9.

Checking the termination condition of the algorithm.

The termination condition can be either the total squared error of the network output falling below a predetermined threshold during training, or the completion of a specified number of iterations of the algorithm. The algorithm is based on gradient descent: the gradient of a function (here the error, viewed as a function of the connection weights) points in the direction of the function's steepest increase, so stepping in the opposite direction decreases the error most rapidly.
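The update rule in step 8 is precisely a gradient-descent step on the squared error. For a single training pair, writing the error as $E = \frac{1}{2}\sum_{k}(t_k - y_k)^2$, the chain rule gives:

```
\frac{\partial E}{\partial w_{jk}}
  = -(t_k - y_k)\, f'(y\_in_k)\, z_j
  = -\,\delta_k\, z_j,
\qquad\text{so}\qquad
\Delta w_{jk} = -\,\alpha\,\frac{\partial E}{\partial w_{jk}} = \alpha\,\delta_k\, z_j .
```

The hidden-layer rule $\Delta v_{ij} = \alpha\,\delta_j\,x_i$ follows from the same calculation carried one layer further back.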

##### Choosing the initial weights and biases

Random initialization. The choice of initial weights affects whether the network reaches a global (or only a local) minimum of the error, and how quickly it does so. The change of the weight between two neurons involves the derivative of the activation function of the neuron in the following layer and the activation function of the neuron in the preceding layer. Because of this, it is important to avoid initial weights that make the activation function or its derivative zero. The initial weights must also not be too large, or the input signal to each hidden or output neuron will most likely fall into the region where the sigmoid's derivative is very small (the saturation region). On the other hand, if the initial weights are too small, the input signal to the hidden or output neurons will be close to zero, which also makes learning very slow. The standard procedure is to initialize the weights with random values from the interval $(-0.5;\ 0.5)$. The values may be positive or negative, since the final weights obtained after training can be of either sign.

Nguyen–Widrow initialization. The simple modification of the standard procedure presented below promotes faster learning: the weights of the connections between hidden and output neurons, as well as the biases of the output layer, are initialized, as in the standard procedure, with random values from the interval $(-0.5;\ 0.5)$.

We introduce the following notation: $n$ — the number of input neurons, $p$ — the number of hidden neurons, and $\beta$ — the scaling factor: $\beta = 0.7\, p^{1/n}$.

The procedure consists of the following simple steps. For each hidden neuron $Z_j$:

- initialize its weight vector (the connections with the input neurons) with random values: $v_{ij}(\text{old}) \in (-0.5;\ 0.5)$;
- calculate the norm $\|v_j(\text{old})\| = \sqrt{\sum_{i=1}^{n} v_{ij}(\text{old})^2}$;
- reinitialize the weights: $v_{ij} = \dfrac{\beta\, v_{ij}(\text{old})}{\|v_j(\text{old})\|}$;
- set the bias value: $v_{0j}$ is a random number from the interval $(-\beta;\ \beta)$.
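The steps above can be sketched directly. This is an illustrative sketch rather than the article's implementation; the function names and the weight-matrix layout are invented for the example:

```
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Sketch of Nguyen-Widrow initialization for the input-to-hidden weights.
// v[j][i] is the weight from input i to hidden neuron j; returns the biases v0[j].
double RandomInRange( double inLow, double inHigh )
{
	return inLow + ( inHigh - inLow ) * ( std::rand( ) / (double) RAND_MAX );
}

std::vector<double> NguyenWidrowInit( std::vector<std::vector<double>>& v, int inNumOfInputs )
{
	int p = (int) v.size( );
	double beta = 0.7 * std::pow( (double) p, 1.0 / inNumOfInputs ); // scaling factor
	std::vector<double> v0( p );
	for ( int j = 0; j < p; ++j )
	{
		double norm = 0.0;
		for ( int i = 0; i < inNumOfInputs; ++i )
		{
			v[ j ][ i ] = RandomInRange( -0.5, 0.5 );   // standard initialization
			norm += v[ j ][ i ] * v[ j ][ i ];
		}
		norm = std::sqrt( norm );                        // a zero vector is practically impossible here
		for ( int i = 0; i < inNumOfInputs; ++i )
			v[ j ][ i ] = beta * v[ j ][ i ] / norm;     // rescale the row to length beta
		v0[ j ] = RandomInRange( -beta, beta );          // bias in (-beta, beta)
	}
	return v0;
}
```

After the call, every hidden neuron's weight vector has Euclidean length exactly $\beta$, and its bias lies in $(-\beta, \beta)$.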

#### Practical part

I'll start with the implementation of the neuron concept. The input-layer neurons are represented by a base class, while the hidden and output neurons are decorators of that base class. In addition, each neuron stores information about its outgoing and incoming connections, and holds an activation function by composition.

**Neuron interface**

```
/**
 * Neuron base class.
 * Represents a basic element of the neural network, a node in the net's graph.
 * There are several ways to create an object of type Neuron; different
 * constructors suit different situations.
 */
template <typename T>
class Neuron
{
public:
	/**
	 * A default Neuron constructor.
	 * - Description: Creates a Neuron; general purposes.
	 * - Purpose: Creates a Neuron, linked to nothing, with a Linear network function.
	 * - Prerequisites: None.
	 */
	Neuron( ) : mNetFunc( new Linear ), mSumOfCharges( 0.0 ) { };

	/**
	 * A Neuron constructor based on NetworkFunction.
	 * - Description: Creates a Neuron; mostly designed to create an output kind of neurons.
	 * @param inNetFunc - a network function which produces the neuron's output signal;
	 * - Purpose: Creates a Neuron, linked to nothing, with a specific network function.
	 * - Prerequisites: The existence of a NetworkFunction object.
	 */
	Neuron( NetworkFunction * inNetFunc ) : mNetFunc( inNetFunc ), mSumOfCharges( 0.0 ){ };

	Neuron( std::vector<NeuralLink<T> *>& inLinksToNeurons, NetworkFunction * inNetFunc ) :
		mNetFunc( inNetFunc ),
		mLinksToNeurons( inLinksToNeurons ),
		mSumOfCharges( 0.0 ){ };

	/**
	 * A Neuron constructor based on a layer of Neurons.
	 * - Description: Creates a Neuron; mostly designed to create input and hidden kinds of neurons.
	 * @param inNeuronsLinkTo - a vector of pointers to Neurons which represents a layer;
	 * @param inNetFunc - a network function which produces the neuron's output signal;
	 * - Purpose: Creates a Neuron, linked to every Neuron in the provided layer.
	 * - Prerequisites: The existence of std::vector<Neuron<T> *> and NetworkFunction.
	 */
	Neuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc );

	virtual ~Neuron( );

	virtual std::vector<NeuralLink<T> *>& GetLinksToNeurons( ){ return mLinksToNeurons; };
	virtual NeuralLink<T> * at( const int& inIndexOfNeuralLink ){ return mLinksToNeurons[ inIndexOfNeuralLink ]; };
	virtual void SetLinkToNeuron( NeuralLink<T> * inNeuralLink ){ mLinksToNeurons.push_back( inNeuralLink ); };
	virtual void Input( double inInputData ){ mSumOfCharges += inInputData; };
	virtual double Fire( );
	virtual int GetNumOfLinks( ){ return mLinksToNeurons.size( ); };
	virtual double GetSumOfCharges( );
	virtual void ResetSumOfCharges( ){ mSumOfCharges = 0.0; };
	virtual double Process( ){ return mNetFunc->Process( mSumOfCharges ); };
	virtual double Process( double inArg ){ return mNetFunc->Process( inArg ); };
	virtual double Derivative( ){ return mNetFunc->Derivative( mSumOfCharges ); };
	virtual void SetInputLink( NeuralLink<T> * inLink ){ mInputLinks.push_back( inLink ); };
	virtual std::vector<NeuralLink<T> *>& GetInputLink( ){ return mInputLinks; };
	virtual double PerformTrainingProcess( double inTarget );
	virtual void PerformWeightsUpdating( );
	virtual void ShowNeuronState( );

protected:
	NetworkFunction *             mNetFunc;
	std::vector<NeuralLink<T> *>  mInputLinks;
	std::vector<NeuralLink<T> *>  mLinksToNeurons;
	double                        mSumOfCharges;
};

template <typename T>
class OutputLayerNeuronDecorator : public Neuron<T>
{
public:
	OutputLayerNeuronDecorator( Neuron<T> * inNeuron ){ mOutputCharge = 0; mNeuron = inNeuron; };
	virtual ~OutputLayerNeuronDecorator( );

	virtual std::vector<NeuralLink<T> *>& GetLinksToNeurons( ){ return mNeuron->GetLinksToNeurons( ); };
	virtual NeuralLink<T> * at( const int& inIndexOfNeuralLink ){ return ( mNeuron->at( inIndexOfNeuralLink ) ); };
	virtual void SetLinkToNeuron( NeuralLink<T> * inNeuralLink ){ mNeuron->SetLinkToNeuron( inNeuralLink ); };
	virtual double GetSumOfCharges( ){ return mNeuron->GetSumOfCharges( ); };
	virtual void ResetSumOfCharges( ){ mNeuron->ResetSumOfCharges( ); };
	virtual void Input( double inInputData ){ mNeuron->Input( inInputData ); };
	virtual double Fire( );
	virtual int GetNumOfLinks( ){ return mNeuron->GetNumOfLinks( ); };
	virtual double Process( ){ return mNeuron->Process( ); };
	virtual double Process( double inArg ){ return mNeuron->Process( inArg ); };
	virtual double Derivative( ){ return mNeuron->Derivative( ); };
	virtual void SetInputLink( NeuralLink<T> * inLink ){ mNeuron->SetInputLink( inLink ); };
	virtual std::vector<NeuralLink<T> *>& GetInputLink( ){ return mNeuron->GetInputLink( ); };
	virtual double PerformTrainingProcess( double inTarget );
	virtual void PerformWeightsUpdating( );
	virtual void ShowNeuronState( ){ mNeuron->ShowNeuronState( ); };

protected:
	double        mOutputCharge;
	Neuron<T> *   mNeuron;
};

template <typename T>
class HiddenLayerNeuronDecorator : public Neuron<T>
{
public:
	HiddenLayerNeuronDecorator( Neuron<T> * inNeuron ){ mNeuron = inNeuron; };
	virtual ~HiddenLayerNeuronDecorator( );

	virtual std::vector<NeuralLink<T> *>& GetLinksToNeurons( ){ return mNeuron->GetLinksToNeurons( ); };
	virtual void SetLinkToNeuron( NeuralLink<T> * inNeuralLink ){ mNeuron->SetLinkToNeuron( inNeuralLink ); };
	virtual double GetSumOfCharges( ){ return mNeuron->GetSumOfCharges( ); };
	virtual void ResetSumOfCharges( ){ mNeuron->ResetSumOfCharges( ); };
	virtual void Input( double inInputData ){ mNeuron->Input( inInputData ); };
	virtual double Fire( );
	virtual int GetNumOfLinks( ){ return mNeuron->GetNumOfLinks( ); };
	virtual NeuralLink<T> * at( const int& inIndexOfNeuralLink ){ return ( mNeuron->at( inIndexOfNeuralLink ) ); };
	virtual double Process( ){ return mNeuron->Process( ); };
	virtual double Process( double inArg ){ return mNeuron->Process( inArg ); };
	virtual double Derivative( ){ return mNeuron->Derivative( ); };
	virtual void SetInputLink( NeuralLink<T> * inLink ){ mNeuron->SetInputLink( inLink ); };
	virtual std::vector<NeuralLink<T> *>& GetInputLink( ){ return mNeuron->GetInputLink( ); };
	virtual double PerformTrainingProcess( double inTarget );
	virtual void PerformWeightsUpdating( );
	virtual void ShowNeuronState( ){ mNeuron->ShowNeuronState( ); };

protected:
	Neuron<T> *   mNeuron;
};
```

The neural link interface is presented below; each link stores a weight and a pointer to a neuron:

**Neural link interface**

```
template <typename T>
class Neuron;

template <typename T>
class NeuralLink
{
public:
	NeuralLink( ) :
		mWeightToNeuron( 0.0 ),
		mNeuronLinkedTo( 0 ),
		mWeightCorrectionTerm( 0 ),
		mErrorInformationTerm( 0 ),
		mLastTranslatedSignal( 0 ){ };

	NeuralLink( Neuron<T> * inNeuronLinkedTo, double inWeightToNeuron = 0.0 ) :
		mWeightToNeuron( inWeightToNeuron ),
		mNeuronLinkedTo( inNeuronLinkedTo ),
		mWeightCorrectionTerm( 0 ),
		mErrorInformationTerm( 0 ),
		mLastTranslatedSignal( 0 ){ };

	void SetWeight( const double& inWeight ){ mWeightToNeuron = inWeight; };
	const double& GetWeight( ){ return mWeightToNeuron; };
	void SetNeuronLinkedTo( Neuron<T> * inNeuronLinkedTo ){ mNeuronLinkedTo = inNeuronLinkedTo; };
	Neuron<T> * GetNeuronLinkedTo( ){ return mNeuronLinkedTo; };
	void SetWeightCorrectionTerm( double inWeightCorrectionTerm ){ mWeightCorrectionTerm = inWeightCorrectionTerm; };
	double GetWeightCorrectionTerm( ){ return mWeightCorrectionTerm; };
	void UpdateWeight( ){ mWeightToNeuron = mWeightToNeuron + mWeightCorrectionTerm; };
	double GetErrorInFormationTerm( ){ return mErrorInformationTerm; };
	void SetErrorInFormationTerm( double inEITerm ){ mErrorInformationTerm = inEITerm; };
	void SetLastTranslatedSignal( double inLastTranslatedSignal ){ mLastTranslatedSignal = inLastTranslatedSignal; };
	double GetLastTranslatedSignal( ){ return mLastTranslatedSignal; };

protected:
	double        mWeightToNeuron;
	Neuron<T> *   mNeuronLinkedTo;
	double        mWeightCorrectionTerm;
	double        mErrorInformationTerm;
	double        mLastTranslatedSignal;
};
```

Each activation function inherits from an abstract base class and implements the function itself together with its derivative:

**Activation function interface**

```
class NetworkFunction {
public:
NetworkFunction(){};
virtual ~NetworkFunction(){};
virtual double Process( double inParam ) = 0;
virtual double Derivative( double inParam ) = 0;
};
class Linear : public NetworkFunction {
public:
Linear(){};
virtual ~Linear(){};
virtual double Process( double inParam ){ return inParam; };
virtual double Derivative( double inParam ){ return 1; }; // the derivative of f(x) = x is 1
};
class Sigmoid : public NetworkFunction {
public:
Sigmoid(){};
virtual ~Sigmoid(){};
virtual double Process( double inParam ){ return ( 1 / ( 1 + exp( -inParam ) ) ); };
virtual double Derivative( double inParam ){ return ( this->Process(inParam)*(1 - this->Process(inParam)) );};
};
class BipolarSigmoid : public NetworkFunction {
public:
BipolarSigmoid(){};
virtual ~BipolarSigmoid(){};
virtual double Process( double inParam ){ return ( 2 / ( 1 + exp( -inParam ) ) - 1 ) ;};
virtual double Derivative( double inParam ){ return ( 0.5 * ( 1 + this->Process( inParam ) ) * ( 1 - this->Process( inParam ) ) ); };
};
```

A neuron factory is responsible for producing the neurons:

**Neuron factory interface**

```
template <typename T>
class NeuronFactory
{
public:
	NeuronFactory(){};
	virtual ~NeuronFactory(){};
	virtual Neuron<T> * CreateInputNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc ) = 0;
	virtual Neuron<T> * CreateOutputNeuron( NetworkFunction * inNetFunc ) = 0;
	virtual Neuron<T> * CreateHiddenNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc ) = 0;
};

template <typename T>
class PerceptronNeuronFactory : public NeuronFactory<T>
{
public:
	PerceptronNeuronFactory(){};
	virtual ~PerceptronNeuronFactory(){};
	virtual Neuron<T> * CreateInputNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc ){ return new Neuron<T>( inNeuronsLinkTo, inNetFunc ); };
	virtual Neuron<T> * CreateOutputNeuron( NetworkFunction * inNetFunc ){ return new OutputLayerNeuronDecorator<T>( new Neuron<T>( inNetFunc ) ); };
	virtual Neuron<T> * CreateHiddenNeuron( std::vector<Neuron<T> *>& inNeuronsLinkTo, NetworkFunction * inNetFunc ){ return new HiddenLayerNeuronDecorator<T>( new Neuron<T>( inNeuronsLinkTo, inNetFunc ) ); };
};
```

The neural network itself stores pointers to neurons organized by layers (for now the pointers are kept in plain vectors, which should eventually be replaced by layer objects), and it holds an abstract neuron factory as well as a network training algorithm.

**Neural network interface**

```
template <typename T>
class TrainAlgorithm;

/**
 * Neural network class.
 * An object of that type represents a neural network of several types:
 *  - Single layer perceptron;
 *  - Multiple layers perceptron.
 *
 * There are several training algorithms available as well:
 *  - Perceptron;
 *  - Backpropagation.
 *
 * How to use this class:
 * To be able to use the neural network, you have to create an instance of this class, specifying
 * the number of input neurons, output neurons, number of hidden layers and the number of neurons in hidden layers.
 * You can also specify the type of neural network by passing a string with its name, otherwise
 * MultiLayerPerceptron will be used. ( A training algorithm can be changed via public calls );
 *
 * Once the neural network is created, all you have to do is to set the biggest MSE required to achieve during
 * the training phase ( or you can skip this step, then mMinMSE will be set to 0.01 ),
 * and train the network by providing training data with target results.
 * Afterwards you can obtain the net's response by feeding the net with data;
 */
template <typename T>
class NeuralNetwork
{
public:
	/**
	 * A Neural Network constructor.
	 * - Description: A template constructor. T is the data type all the nodes will operate with. Create a neural network by providing it with:
	 * @param inInputs - an integer argument - number of input neurons of the newly created neural network;
	 * @param inOutputs - an integer argument - number of output neurons of the newly created neural network;
	 * @param inNumOfHiddenLayers - an integer argument - number of hidden layers of the newly created neural network, default is 0;
	 * @param inNumOfNeuronsInHiddenLayers - an integer argument - number of neurons in the hidden layers ( note that every hidden layer has the same number of neurons ), default is 0;
	 * @param inTypeOfNeuralNetwork - a const char * argument - the type of neural network to create. The values may be:
	 *  - MultiLayerPerceptron;
	 *  - Default is MultiLayerPerceptron.
	 * - Purpose: Creates a neural network for solving some interesting problems.
	 * - Prerequisites: The template parameter has to be picked based on your input data.
	 */
	NeuralNetwork( const int& inInputs,
			const int& inOutputs,
			const int& inNumOfHiddenLayers = 0,
			const int& inNumOfNeuronsInHiddenLayers = 0,
			const char * inTypeOfNeuralNetwork = "MultiLayerPerceptron" );
	~NeuralNetwork( );
	/**
	 * Public method Train.
	 * - Description: Method for training the network.
	 * - Purpose: Trains the network, so the weights on the links are adjusted to be able to solve the problem.
	 * - Prerequisites:
	 * @param inData - a vector of vectors with data to train with;
	 * @param inTarget - a vector of vectors with target data;
	 *  - the number of data samples and target samples has to be equal;
	 *  - the data and targets have to be in the order you want the network to learn them.
	 */
	bool Train( const std::vector<std::vector<T> >& inData,
			const std::vector<std::vector<T> >& inTarget );
	/**
	 * Public method GetNetResponse.
	 * - Description: Method for actually getting a response from the net by feeding it with data.
	 * - Purpose: By calling this method you make the network evaluate the response for you.
	 * - Prerequisites:
	 * @param inData - a vector of data to feed the net with.
	 */
	std::vector<T> GetNetResponse( const std::vector<T>& inData );
	/**
	 * Public method SetAlgorithm.
	 * - Description: Setter for the algorithm of training the net.
	 * - Purpose: Can be used for dynamic change of the training algorithm.
	 * - Prerequisites:
	 * @param inTrainingAlgorithm - an already created object of type TrainAlgorithm.
	 */
	void SetAlgorithm( TrainAlgorithm<T> * inTrainingAlgorithm ){ mTrainingAlgoritm = inTrainingAlgorithm; };
	/**
	 * Public method SetNeuronFactory.
	 * - Description: Setter for the factory which makes neurons for the net.
	 * - Purpose: Can be used for dynamic change of the neuron factory.
	 * - Prerequisites:
	 * @param inNeuronFactory - an already created object of type NeuronFactory.
	 */
	void SetNeuronFactory( NeuronFactory<T> * inNeuronFactory ){ mNeuronFactory = inNeuronFactory; };
	/**
	 * Public method ShowNetworkState.
	 * - Description: Prints the current state to the standard output: weight of every link.
	 * - Purpose: Can be used for monitoring the weights change during training of the net.
	 * - Prerequisites: None.
	 */
	void ShowNetworkState( );
	/**
	 * Public method GetMinMSE.
	 * - Description: Returns the biggest MSE required to achieve during the training phase.
	 * - Purpose: Can be used for getting the biggest MSE required to achieve during the training phase.
	 * - Prerequisites: None.
	 */
	const double& GetMinMSE( ){ return mMinMSE; };
	/**
	 * Public method SetMinMSE.
	 * - Description: Setter for the biggest MSE required to achieve during the training phase.
	 * - Purpose: Can be used for setting the biggest MSE required to achieve during the training phase.
	 * - Prerequisites:
	 * @param inMinMse - a double value, the biggest MSE required to achieve during the training phase.
	 */
	void SetMinMSE( const double& inMinMse ){ mMinMSE = inMinMse; };
	/**
	 * Friend class.
	 */
	friend class Hebb<T>;
	/**
	 * Friend class.
	 */
	friend class Backpropagation<T>;
protected:
	/**
	 * Protected method GetLayer.
	 * - Description: Getter for a layer by its index.
	 * - Purpose: Can be used by the inner implementation for getting access to the neural network's layers.
	 * - Prerequisites:
	 * @param inInd - an integer index of a layer.
	 */
	std::vector<Neuron<T> *>& GetLayer( const int& inInd ){ return mLayers[inInd]; };
	/**
	 * Protected method size.
	 * - Description: Returns the number of layers in the network.
	 * - Purpose: Can be used by the inner implementation for getting the number of layers in the network.
	 * - Prerequisites: None.
	 */
	unsigned int size( ){ return mLayers.size( ); };
	/**
	 * Protected method GetOutputLayer.
	 * - Description: Returns the output layer.
	 * - Purpose: Can be used by the inner implementation for getting access to the output layer.
	 * - Prerequisites: None.
	 */
	std::vector<Neuron<T> *>& GetOutputLayer( ){ return mLayers[mLayers.size( ) - 1]; };
	/**
	 * Protected method GetInputLayer.
	 * - Description: Returns the input layer.
	 * - Purpose: Can be used by the inner implementation for getting the input layer.
	 * - Prerequisites: None.
	 */
	std::vector<Neuron<T> *>& GetInputLayer( ){ return mLayers[0]; };
	/**
	 * Protected method GetBiasLayer.
	 * - Description: Returns the vector of biases.
	 * - Purpose: Can be used by the inner implementation for getting the vector of biases.
	 * - Prerequisites: None.
	 */
	std::vector<Neuron<T> *>& GetBiasLayer( ){ return mBiasLayer; };
	/**
	 * Protected method UpdateWeights.
	 * - Description: Updates the weights of every link between the neurons.
	 * - Purpose: Can be used by the inner implementation for updating the weights of links between the neurons.
	 * - Prerequisites: None, but it only makes sense when called during the training phase.
	 */
	void UpdateWeights( );
	/**
	 * Protected method ResetCharges.
	 * - Description: Resets the neurons' data received during an iteration of net training.
	 * - Purpose: Can be used by the inner implementation for resetting the neurons' data between iterations.
	 * - Prerequisites: None, but it only makes sense when called during the training phase.
	 */
	void ResetCharges( );
	/**
	 * Protected method AddMSE.
	 * - Description: Changes the MSE during the training phase.
	 * - Purpose: Can be used by the inner implementation for changing the MSE during the training phase.
	 * - Prerequisites:
	 * @param inPortion - a double amount of MSE to be added.
	 */
	void AddMSE( double inPortion ){ mMeanSquaredError += inPortion; };
	/**
	 * Protected method GetMSE.
	 * - Description: Getter for the MSE value.
	 * - Purpose: Can be used by the inner implementation for getting access to the MSE value.
	 * - Prerequisites: None.
	 */
	double GetMSE( ){ return mMeanSquaredError; };
	/**
	 * Protected method ResetMSE.
	 * - Description: Resets the MSE value.
	 * - Purpose: Can be used by the inner implementation for resetting the MSE value.
	 * - Prerequisites: None.
	 */
	void ResetMSE( ){ mMeanSquaredError = 0; };

	NeuronFactory<T> * mNeuronFactory;              /*!< Member responsible for creating neurons @see SetNeuronFactory */
	TrainAlgorithm<T> * mTrainingAlgoritm;          /*!< Member responsible for the way the network will be trained @see SetAlgorithm */
	std::vector<std::vector<Neuron<T> *> > mLayers; /*!< Inner representation of the neural network */
	std::vector<Neuron<T> *> mBiasLayer;            /*!< Container for biases */
	unsigned int mInputs, mOutputs, mHidden;        /*!< Numbers of input, output and hidden units */
	double mMeanSquaredError;                       /*!< Mean squared error, updated on every iteration of the training */
	double mMinMSE;                                 /*!< The biggest mean squared error required for training to stop */
};
```

And finally, the interface of the class responsible for training the network itself:

**Learning Algorithm Interface**

```
template <typename T>
class NeuralNetwork;

template <typename T>
class TrainAlgorithm
{
public:
	virtual ~TrainAlgorithm(){};
	virtual double Train(const std::vector<T>& inData, const std::vector<T>& inTarget) = 0;
	virtual void WeightsInitialization() = 0;
protected:
};

template <typename T>
class Hebb : public TrainAlgorithm<T>
{
public:
	Hebb(NeuralNetwork<T> * inNeuralNetwork) : mNeuralNetwork(inNeuralNetwork){};
	virtual ~Hebb(){};
	virtual double Train(const std::vector<T>& inData, const std::vector<T>& inTarget);
	virtual void WeightsInitialization();
protected:
	NeuralNetwork<T> * mNeuralNetwork;
};

template <typename T>
class Backpropagation : public TrainAlgorithm<T>
{
public:
	Backpropagation(NeuralNetwork<T> * inNeuralNetwork);
	virtual ~Backpropagation(){};
	virtual double Train(const std::vector<T>& inData, const std::vector<T>& inTarget);
	virtual void WeightsInitialization();
protected:
	void NguyenWidrowWeightsInitialization();
	void CommonInitialization();
	NeuralNetwork<T> * mNeuralNetwork;
};
```

All the code is available on GitHub: Sovietmade/NeuralNetworks.

As a conclusion, I would like to note that the topic of neural networks is far from exhausted: again and again we see on the pages of Habr mentions of new achievements and amazing developments in the field. For my part, this article was a first step in mastering an interesting technology, and I hope it proves useful to someone.

##### References:

The neural network learning algorithm was taken from an amazing book:

Laurene V. Fausett, *Fundamentals of Neural Networks: Architectures, Algorithms, and Applications*.