Self-Organizing Incremental Neural Network (SOINN) Algorithm

    Introduction


    One of the tasks of unsupervised learning is to find a topological structure that most accurately reflects the topology of the distribution of the input data. There are several approaches to this problem. For example, the Kohonen Self-Organizing Map algorithm projects a multidimensional space onto a space of lower dimension (usually two-dimensional) with a predetermined structure. Because of the reduction in dimensionality and the predetermined network structure, projection defects arise, and analyzing them is a difficult task. As an alternative to this approach, a combination of competitive Hebbian learning and neural gas is more effective at building a topological structure. But the practical application of this approach is hindered by a number of problems: it requires a priori knowledge of the necessary network size, and it is difficult to apply learning-rate adaptation methods to such a network; excessive adaptation reduces the efficiency of learning new data, while an adaptation rate that is too slow makes the network highly sensitive to noisy data.

    For online or lifelong learning tasks, the methods above are not suitable. The fundamental problem for such tasks is how the system can adapt to new information without damaging or destroying what has already been learned.

    This article discusses the SOINN algorithm, which partially solves the problems mentioned above.


    Overview of the algorithm under consideration


    In unsupervised classification, we need to divide the incoming data into classes without prior knowledge of their number. It is also necessary to determine the topology of the source data.

    So, the algorithm must satisfy the following requirements:
    • Online (lifelong) learning
    • Classification without prior knowledge of the input data
    • Separation of data with fuzzy boundaries and identification of the basic structure of the clusters


    SOINN is a neural network with two layers. The first layer is used to determine the topological structure of clusters, and the second to determine the number of clusters and identify prototype nodes for them. First, the first layer of the network is trained, and then, using the data obtained during the training of the first layer, the second layer of the network is trained.

    For unsupervised classification, it is necessary to determine whether an input image belongs to one of the previously obtained clusters or represents a new one. We assume that two images belong to the same cluster if the distance between them (according to a predetermined metric) is less than a distance threshold T. If T is too large, all input images will be assigned to the same cluster. If T is too small, each image will be isolated as a separate cluster. To obtain a reasonable clustering, T must be greater than the intraclass distance and less than the interclass distance.

    For each of the layers of the SOINN network, we must calculate the threshold T. The first layer does not have a priori knowledge about the structure of the source data, so T is selected adaptively based on knowledge of the structure of the already built network and the current input image. When training the second layer, we can calculate the intraclass and interclass distance and choose the constant value of the threshold T.

    To represent the topological structure in online or lifelong learning tasks, network growth is an important element for reducing error and adapting to changing conditions while preserving old data. At the same time, an uncontrolled increase in the number of nodes leads to network congestion and overfitting of the network as a whole. Therefore, it is necessary to decide when and how to add new nodes to the network, and when to prevent their addition.

    To add nodes, SOINN uses a scheme in which the insertion occurs in the region with the maximum accumulated error. To judge whether an insertion was useful, the "error radius" is used. This evaluation ensures that an insertion reduces the error, controls the growth of the number of nodes, and ultimately stabilizes their number.
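
    As an illustration, here is a minimal sketch of one possible form of this usefulness test (the structure and function names are my own and not taken from the article's implementation): the insertion is kept only if the recomputed error radii of the affected nodes do not exceed the values remembered before the insertion (and the inherited radius for the new node).

    struct NodeStats {
        double E; // accumulated local error
        double M; // accumulated number of signals
        double R; // error radius, R = E / M
    };

    // After inserting node r between q and f and rescaling their E and M,
    // the insertion is kept only if the new error radii do not exceed the
    // reference values; otherwise the caller rolls the insertion back.
    bool insertion_is_useful(NodeStats& q, NodeStats& f, NodeStats& r,
                             double Rq_before, double Rf_before, double Rr_inherited) {
        const double Rq_new = q.E / q.M;
        const double Rf_new = f.E / f.M;
        const double Rr_new = r.E / r.M;
        if (Rq_new > Rq_before || Rf_new > Rf_before || Rr_new > Rr_inherited)
            return false;           // insertion failed
        q.R = Rq_new;               // commit the updated error radii
        f.R = Rf_new;
        r.R = Rr_new;
        return true;
    }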

    To create connections between neurons, the competitive Hebbian rule proposed by Martinetz for Topology Representing Networks (TRN) is used. The competitive Hebbian rule can be described as follows: for each input signal, the two nearest nodes are connected by an edge. It has been proved that the resulting graph optimally represents the topology of the input data. In online or lifelong learning tasks, nodes change their location slowly but constantly. Thus, nodes that are neighbors at an early stage may not be close at a later stage. This makes it necessary to delete connections that have not been updated for a long time.
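
    Below is a minimal sketch of this rule over a simple edge map (my own simplification; the article's implementation stores the graph with the Boost Graph Library): the two winners are connected, the age of that edge is reset to zero, the ages of the other edges of the first winner grow, and edges older than a threshold are removed.

    #include <map>
    #include <utility>

    // Edges are stored as (smaller id, larger id) -> age (illustrative structure).
    using EdgeMap = std::map<std::pair<int, int>, int>;

    static std::pair<int, int> make_edge(int a, int b) {
        return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
    }

    // Competitive Hebbian step: connect the winner s1 with the second winner s2
    // (resetting the age of that edge), age every other edge incident to s1,
    // and drop edges whose age exceeds age_max.
    void hebbian_update(EdgeMap& edges, int s1, int s2, int age_max) {
        edges[make_edge(s1, s2)] = 0;
        for (auto it = edges.begin(); it != edges.end(); ) {
            const bool touches_s1 = (it->first.first == s1 || it->first.second == s1);
            if (touches_s1 && it->first != make_edge(s1, s2))
                ++it->second;
            if (it->second > age_max)
                it = edges.erase(it);
            else
                ++it;
        }
    }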

    In the general case, existing clusters may also overlap. To determine the number of clusters accurately, the input data is assumed to be separable: the probability density in the central part of each cluster is higher than the density between clusters, and overlapping regions of clusters have a low probability density. Cluster separation is performed by removing nodes from regions with low probability density. The SOINN algorithm uses the following strategy for removing such nodes: if the number of signals received so far is an integer multiple of a given parameter (lambda), delete the nodes that have only one topological neighbor and a small accumulated signal count, as well as the nodes that have no neighbors at all.
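
    A minimal sketch of this pruning rule (my own illustration with simplified containers): every lambda signals, nodes with exactly one neighbor and a below-average accumulated signal count are removed, as are isolated nodes.

    #include <cstddef>
    #include <vector>

    struct PruneNode {
        double M = 0.0;          // accumulated number of signals
        int    neighbors = 0;    // number of topological neighbors
        bool   alive = true;
    };

    // Remove nodes from regions of low probability density: a node with exactly
    // one neighbor is deleted if its signal count is below c times the mean
    // signal count, and isolated nodes are deleted unconditionally.
    void prune_low_density(std::vector<PruneNode>& nodes, double c) {
        double mean_M = 0.0;
        std::size_t alive_count = 0;
        for (const PruneNode& n : nodes)
            if (n.alive) { mean_M += n.M; ++alive_count; }
        if (alive_count == 0) return;
        mean_M /= static_cast<double>(alive_count);

        for (PruneNode& n : nodes) {
            if (!n.alive) continue;
            if (n.neighbors == 0 || (n.neighbors == 1 && n.M < c * mean_M))
                n.alive = false;  // edges incident to the node are removed as well
        }
    }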

    Algorithm description


    Now, based on the foregoing, we can give a formal description of the learning algorithm of the SOINN network. The same algorithm is used to train both the first and the second layer. The only differences are that the input for the second layer is generated by the first layer, and that when training the second layer we already know the topological structure of the first layer and can therefore use a constant similarity threshold. A compact code sketch of the main training loop is given after the step-by-step description.

    Notation Used


    • A - set used to store nodes
    • NA - number of nodes in the set A
    • C - set of edges between nodes
    • NC - number of edges in C
    • Wi - n-dimensional weight vector of node i
    • Ei - local accumulated error of node i
    • Mi - local accumulated number of signals of node i
    • Ri - error radius of node i, Ri = Ei / Mi
    • Ci - cluster label of node i
    • Q - number of clusters
    • Ti - similarity threshold of node i
    • Ni - set of topological neighbors of node i
    • age(i, j) - age of the edge between nodes i and j
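
    This notation maps naturally onto data structures. Below is a minimal sketch of them in C++ (my own layout; the article's implementation is based on the Boost Graph Library and may organize the data differently):

    #include <limits>
    #include <map>
    #include <utility>
    #include <vector>

    // One SOINN node, following the notation above (illustrative field names).
    struct SoinnNode {
        std::vector<double> W;   // n-dimensional weight vector, Wi
        double E = 0.0;          // local accumulated error, Ei
        double M = 0.0;          // local accumulated number of signals, Mi
        double R = 0.0;          // error radius, Ri = Ei / Mi
        double T = std::numeric_limits<double>::infinity(); // similarity threshold, Ti
        int    label = -1;       // cluster label, Ci
    };

    // The whole layer: the node set A and the edge set C with edge ages.
    struct SoinnLayer {
        std::vector<SoinnNode> nodes;                 // A (NA = nodes.size())
        std::map<std::pair<int, int>, int> edge_age;  // C, (i, j) -> age(i, j)
    };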


    Algorithm


    1. Initialize the set of nodes A with two nodes, c1 and c2:
      A = {c1, c2}
      Initialize the set of edges C with an empty set:
      C = {}.
    2. Input a new signal x from R ^ n.
    3. Find in the set A the winner node s1 and the second winner s2, i.e. the nodes with the nearest and the second nearest weight vectors (by some given metric). If the distance from x to s1 is greater than the similarity threshold Ts1, or the distance from x to s2 is greater than Ts2, then create a new node with weight vector x and go to step 2.
    4. If the edge between s1 and s2 does not exist, then create it. Set the edge age between them to 0.
    5. Increase the age of all edges emanating from s1 by 1.
    6. Add the distance between the input signal and the winner to the local total error Es1.
    7. Increase the local number of signals of node s1:
      Ms1 = Ms1 +1
    8. Adapt the weight vectors of the winner and its direct topological neighbors:
      Ws1 = Ws1 + e1(t) * (x - Ws1)
      Wsi = Wsi + e2(t) * (x - Wsi), for all i in Ns1
      where e1(t) and e2(t) are the learning rates of the winner and of its neighbors.
    9. Remove edges with an age above a specified threshold.
    10. If the number of input signals generated so far is a multiple of the lambda parameter, insert a new node and delete nodes in areas of low probability density according to the following rules:
      1. Find the node q with the maximum error.
      2. Find the node f with the maximum error among the neighbors q.
      3. Add the node r so that its weight vector is the arithmetic mean of the weights q and f.
      4. Calculate the accumulated error Er, the total number of signals Mr, and the inherited error radius:
        Er = alpha1 * (Eq + Ef)
        Mr = alpha2 * (Mq + Mf)
        Rr = alpha3 * (Rq + Rf)
      5. Reduce the total error of the q and r nodes:
        Eq = beta * Eq
        Ef = beta * Ef
      6. Reduce the accumulated number of signals:
        Mq = gamma * Mq
        Mf = gamma * Mf
      7. Determine whether the insertion of the new node was successful. If the insertion cannot reduce the average error of this area, the insertion has failed: the added node is deleted and all parameters are restored to their previous values. Otherwise, update the error radius of all nodes involved in the insertion:
        Rq = Eq / Mq
        Rf = Ef / Mf
        Rr = Er / Mr.
      8. If the insertion was successful, create the edges q <-> r and r <-> f and delete the edge q <-> f.
      9. Among all nodes in A, find those that have only one topological neighbor and compare their accumulated number of signals with the average over all nodes. If a node has exactly one neighbor and its signal counter is below the adaptive threshold, remove it from the set of nodes: if |Ni| = 1 and Mi < c * sum_{j = 1...NA} Mj / NA (where 0 < c < 1), then remove node i.
      10. Delete all isolated nodes.
    11. After a sufficiently long period LT, each class in the input data will correspond to a connected component of the constructed graph. Update the class labels.
    12. Go to step 2 and continue training.
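
    Below is a compact sketch of steps 2-9 of this loop for a single input signal (my own simplified illustration in plain C++, not the article's Boost-based code). The periodic insertion and deletion of step 10 are omitted, and the similarity thresholds T are assumed to be maintained according to the threshold-calculation algorithm described in the next section.

    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <map>
    #include <utility>
    #include <vector>

    struct Node {
        std::vector<double> W;   // weight vector
        double E = 0.0;          // accumulated error
        double M = 0.0;          // accumulated number of signals
        double T = std::numeric_limits<double>::infinity(); // similarity threshold
    };

    struct Layer {
        std::vector<Node> nodes;
        std::map<std::pair<int, int>, int> edge_age;  // (i, j) -> age
    };

    static double dist(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t k = 0; k < a.size(); ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
        return std::sqrt(s);
    }

    static std::pair<int, int> make_edge(int a, int b) {
        return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
    }

    // Steps 2-9 of the SOINN learning loop for one input signal x.
    // e1 and e2 are the learning rates of the winner and of its neighbors.
    void soinn_step(Layer& layer, const std::vector<double>& x,
                    double e1, double e2, int age_max) {
        if (layer.nodes.size() < 2) {                  // not enough nodes for two winners
            Node n; n.W = x; layer.nodes.push_back(n);
            return;
        }
        // Step 3: find the winner s1 and the second winner s2.
        int s1 = -1, s2 = -1;
        double d1 = std::numeric_limits<double>::infinity(), d2 = d1;
        for (int i = 0; i < static_cast<int>(layer.nodes.size()); ++i) {
            const double d = dist(x, layer.nodes[i].W);
            if (d < d1)      { d2 = d1; s2 = s1; d1 = d; s1 = i; }
            else if (d < d2) { d2 = d;  s2 = i; }
        }
        // Step 3 (continued): x too far from both winners -> it becomes a new node.
        if (d1 > layer.nodes[s1].T || d2 > layer.nodes[s2].T) {
            Node n; n.W = x; layer.nodes.push_back(n);
            return;
        }
        // Step 4: connect s1 and s2, resetting the age of that edge.
        layer.edge_age[make_edge(s1, s2)] = 0;
        // Steps 5 and 9: age the edges emanating from s1 and drop the old ones.
        for (auto it = layer.edge_age.begin(); it != layer.edge_age.end(); ) {
            const bool touches_s1 = (it->first.first == s1 || it->first.second == s1);
            if (touches_s1 && it->first != make_edge(s1, s2)) ++it->second;
            if (it->second > age_max) it = layer.edge_age.erase(it);
            else ++it;
        }
        // Steps 6 and 7: accumulate the error and the signal count of the winner.
        layer.nodes[s1].E += d1;
        layer.nodes[s1].M += 1.0;
        // Step 8: move the winner and its topological neighbors towards x.
        for (std::size_t k = 0; k < x.size(); ++k)
            layer.nodes[s1].W[k] += e1 * (x[k] - layer.nodes[s1].W[k]);
        for (const auto& ea : layer.edge_age) {
            const int nb = (ea.first.first == s1) ? ea.first.second
                         : (ea.first.second == s1) ? ea.first.first : -1;
            if (nb < 0) continue;
            for (std::size_t k = 0; k < x.size(); ++k)
                layer.nodes[nb].W[k] += e2 * (x[k] - layer.nodes[nb].W[k]);
        }
    }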


    Threshold Calculation Algorithm


    For the first layer, the similarity threshold T is calculated adaptively by the following algorithm:
    1. Initialize the similarity threshold of a new node to +infinity.
    2. When a node is the first or the second winner, update its similarity threshold:
      1. If the node has direct topological neighbors (|Ni| > 0), update Ti as the maximum distance from the node to its neighbors:
        Ti = max_{j in Ni} || Wi - Wj ||
      2. If the node has no neighbors, set Ti as the minimum distance from the node to the other nodes in the set A:
        Ti = min_{j in A \ {i}} || Wi - Wj ||
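
    A minimal sketch of this adaptive threshold computation for a single node (function and container names are my own assumptions):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    static double dist(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t k = 0; k < a.size(); ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
        return std::sqrt(s);
    }

    // Adaptive similarity threshold Ti of node i for the first layer:
    // the maximum distance to its topological neighbors if it has any,
    // otherwise the minimum distance to any other node in A.
    double similarity_threshold(const std::vector<std::vector<double>>& weights,
                                const std::vector<int>& neighbors_of_i, int i) {
        double T;
        if (!neighbors_of_i.empty()) {
            T = 0.0;
            for (int j : neighbors_of_i)
                T = std::max(T, dist(weights[i], weights[j]));
        } else {
            T = std::numeric_limits<double>::infinity();
            for (int j = 0; j < static_cast<int>(weights.size()); ++j)
                if (j != i) T = std::min(T, dist(weights[i], weights[j]));
        }
        return T;
    }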

    For the second layer, the similarity threshold is constant and is calculated as follows:
    1. Calculate the intraclass distance:
      dw = (1 / NC) * sum_{(i, j) in C} || Wi - Wj ||
    2. Calculate the interclass distances. The distance between classes Ci and Cj is calculated as follows:
      db(Ci, Cj) = max_{k in Ci, l in Cj} || Wk - Wl ||
    3. As the similarity threshold Tc, take the minimum interclass distance that exceeds the intraclass distance.
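
    A minimal sketch of this constant threshold computation, following the definitions above (container layout and names are my own assumptions: each node carries a class label, and the edges of the first layer are given as index pairs):

    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <map>
    #include <utility>
    #include <vector>

    static double dist(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t k = 0; k < a.size(); ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
        return std::sqrt(s);
    }

    // Constant similarity threshold Tc for the second layer, computed from the
    // trained first layer: the mean edge length dw and, among the interclass
    // distances db that exceed dw, the smallest one.
    double second_layer_threshold(const std::vector<std::vector<double>>& weights,
                                  const std::vector<int>& labels,
                                  const std::vector<std::pair<int, int>>& edges) {
        // Intraclass distance: average length of the edges of the first layer.
        double dw = 0.0;
        for (const auto& e : edges) dw += dist(weights[e.first], weights[e.second]);
        dw /= static_cast<double>(edges.size());

        // Interclass distance for each pair of classes: the maximum distance
        // between a node of one class and a node of the other.
        std::map<std::pair<int, int>, double> db;
        for (std::size_t i = 0; i < weights.size(); ++i)
            for (std::size_t j = 0; j < weights.size(); ++j)
                if (labels[i] < labels[j]) {
                    const double d = dist(weights[i], weights[j]);
                    const auto key = std::make_pair(labels[i], labels[j]);
                    if (db.count(key) == 0 || d > db[key]) db[key] = d;
                }

        // Tc: the smallest interclass distance that exceeds the intraclass one.
        double Tc = std::numeric_limits<double>::infinity();
        for (const auto& kv : db)
            if (kv.second > dw && kv.second < Tc) Tc = kv.second;
        return Tc;
    }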


    Schematic diagram of the algorithm



    In "conditional" diamonds, black arrows correspond to the fulfillment of the condition, and red arrows to the non-fulfillment.

    Experiments


    To conduct experiments with this algorithm, it was implemented in C++ using the Boost Graph Library. The code can be downloaded here.

    For the experiments, the following algorithm parameters were selected:
    1. lambda = 100
    2. age_max = 100
    3. c = 1.0
    4. alpha1 = 0.16
    5. alpha2 = 0.25
    6. alpha3 = 0.25
    7. beta = 0.67
    8. gamma = 0.75
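
    For reference, these settings can be collected into a single configuration structure (a sketch with my own field names):

    // Algorithm parameters used in the experiments (values from the list above).
    struct SoinnParams {
        int    lambda  = 100;   // insertion / deletion period
        int    age_max = 100;   // maximum edge age
        double c       = 1.0;   // coefficient in the one-neighbor deletion rule
        double alpha1  = 0.16;  // coefficient of the inherited error
        double alpha2  = 0.25;  // coefficient of the inherited signal count
        double alpha3  = 0.25;  // coefficient of the inherited error radius
        double beta    = 0.67;  // error reduction coefficient after insertion
        double gamma   = 0.75;  // signal-count reduction coefficient after insertion
    };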


    As test data, the following image was used, representing two classes (ring and circle).

    After feeding 20,000 random points from these classes to the input of the SOINN network, the first layer of the network took the following form.

    Based on the data obtained from the first layer, the second layer was built, taking the following form.


    Conclusions


    In this article, we examined one of the unsupervised learning algorithms based on learning the topology of the data. From the results obtained experimentally, we can conclude that applying this method is justified.
    But still, it is worth noting the disadvantages of the SOINN algorithm:
    1. Since the network layers are trained sequentially one after another, it is difficult to determine the moment when it is worth stopping the training of the first layer and starting to learn the second layer.
    2. When changing the first layer of the network, you must completely retrain the second layer. In this regard, the problem of online learning has not been resolved.
    3. The network has many parameters that need to be selected manually.


    References


    Shen Furao, Osamu Hasegawa, "An incremental network for on-line unsupervised classification and topology learning", 2005
