ptsecurity March 7, 2017 at 17:30

How we analyze vulnerabilities using neural networks and fuzzy logic

Image: Daniel Friedman, Flickr

On our Habr blog we write a lot about introducing DevOps practices into the development and testing of the information security products our company builds. An automation engineer's job is not limited to installing and maintaining services; sometimes it also involves labor-intensive research tasks.

To solve one such task, analyzing vulnerabilities found during competitive analysis tests, we developed our own universal classifier. This article describes how the tool works and what results it delivers.

A bit of theory

First, let us define what classification means in the general case. Classifying an arbitrary object means assigning it to one of two classes, depending on how “similar” it is to the reference examples accepted in the subject area. In other words, the classification problem requires building a function (a classifier) that reports how “similar” our object is to the reference examples of each class (more at the link).

The Euler-Venn diagram for the vulnerability classification problem.

Several theories and tools can be combined to solve a wide range of classification problems:

fuzzy set theory;
fuzzy scales, a tool for fuzzy assessment of object properties;
neural network theory.

Theory of Fuzzy Sets

Fuzzy set theory and fuzzy logic were founded in the 1960s by Lotfi Zadeh. The concept of a “fuzzy set” is best illustrated by trying to explain what “many” means. One instance of something is not many, and neither are two, but three, four or five may already be many. A fuzzy quantity is described mathematically by a so-called membership function, which assigns to each object in the domain a number characterizing its degree of membership in the given fuzzy set.
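The “many” example can be turned into a small membership function. This is only a sketch: the function name `many_membership` and the breakpoints (2 and 5) are our assumptions for illustration, chosen to match the wording above, and the linear shape in between is just one of many possible choices.

```python
def many_membership(count: int) -> float:
    """Degree (0.0 to 1.0) to which `count` items can be called "many".

    Assumed breakpoints: up to 2 items is definitely not "many",
    5 or more definitely is, and the degree grows linearly in between.
    """
    low, high = 2, 5
    if count <= low:
        return 0.0
    if count >= high:
        return 1.0
    return (count - low) / (high - low)


# Print the membership degree for one to six items.
for n in range(1, 7):
    print(n, round(many_membership(n), 2))
```

Unlike an ordinary set, where three items would be either “many” or “not many”, here they are “many” only to a partial degree.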

Fuzzy scales

A fuzzy scale is an ordered collection of fuzzy sets, each of which must carry some meaning. The well-known level scales are an example. A universal fuzzy scale of five levels looks like this:

S = {Min, Low, Med, High, Max}

When working with level scales, we can determine which level a particular value falls on. Such fuzzy scales make it possible to express the values of specific properties as numbers (more at the link).
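A minimal sketch of such a five-level scale is shown below. The level names come from the scale S above; everything else is an assumption made for illustration: the levels are spread evenly over the interval [0, 1], each level is a triangular fuzzy set, and the helper names `level_memberships`, `nearest_level` and `level_to_number` are hypothetical. The real scale definitions in FuzzyClassificator may differ.

```python
LEVELS = ["Min", "Low", "Med", "High", "Max"]
STEP = 1.0 / (len(LEVELS) - 1)  # assumed distance between neighbouring peaks


def level_memberships(x: float) -> dict:
    """Membership of x (expected in [0, 1]) in every level of the scale."""
    result = {}
    for i, name in enumerate(LEVELS):
        peak = i * STEP
        # Triangular set: 1 at the peak, falling to 0 one step away from it.
        result[name] = max(0.0, 1.0 - abs(x - peak) / STEP)
    return result


def nearest_level(x: float) -> str:
    """Name of the level in which x has the highest membership."""
    memberships = level_memberships(x)
    return max(memberships, key=memberships.get)


def level_to_number(name: str) -> float:
    """Number (the peak of the level) used to represent a level."""
    return LEVELS.index(name) * STEP


print(nearest_level(0.3))
print(level_to_number("High"))
```

With this layout a value can be read in both directions: a number is mapped to the level it fits best, and a level named by an expert is mapped back to a number that a neural network can consume.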

Neural networks

Biological neurons are known to accumulate electrical impulses and pass them on through synapses, which connect neurons to one another. Depending on the cell's sensitivity threshold, the electrical signal is either transmitted further or not.

Mathematical neural networks are arranged in a similar way. Any numbers, crisp or fuzzy, can be fed to the inputs of a neuron, where they are multiplied by weighting coefficients. Each neuron has a “response threshold”: the sum of the products of the inputs and weights is passed to the activation function, which produces the result for that neuron. Neurons arranged in layers one after another form a neural network (more at the link).
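The computation performed by a single neuron can be sketched in a few lines. We assume a sigmoid activation function and model the response threshold by subtracting it from the weighted sum; both are common choices rather than details taken from our tool, and the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Smooth activation function that maps any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs, shifted by the threshold, then activated."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(total - threshold)


# A weighted sum exactly at the threshold gives an "undecided" output.
print(neuron([1.0], [1.0], threshold=1.0))
# A weighted sum far above the threshold gives an output close to 1.
print(neuron([1.0, 1.0], [10.0, 10.0]))
```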

To improve the quality of vulnerability analysis in our products, we needed to learn how to assign vulnerabilities to one of two classes: confirmed or unconfirmed. To do this we ran many experiments, which resulted in a neural network that is optimal for this problem.

It consists of four layers. Numbers are fed to its input, and at the output we get two crisp or fuzzy numbers that characterize the level of membership in each of the classes, for example the minimum or the maximum level of “similarity” (more at the link).
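The shape of such a network can be illustrated with a plain forward pass. Only the number of layers (four) and the number of outputs (two) come from the description above; the layer sizes 3, 4, 4, 2, the sigmoid activation and the random untrained weights are assumptions made for this sketch, so the printed numbers carry no meaning beyond showing the data flow.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_network(layer_sizes, seed=0):
    """Random (untrained) weights for every pair of adjacent layers."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(network, inputs):
    """Propagate the inputs through every layer and return the outputs."""
    values = list(inputs)
    for layer in network:
        values = [
            sigmoid(sum(w * v for w, v in zip(neuron_weights, values)))
            for neuron_weights in layer
        ]
    return values


# Four layers: 3 inputs, two hidden layers of 4 neurons, 2 outputs.
net = make_network([3, 4, 4, 2])
confirmed, unconfirmed = forward(net, [0.0, 0.5, 1.0])
print(confirmed, unconfirmed)
```

Each of the two outputs lies between 0 and 1 and can be read as the level of membership in one class.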

Classification Automation

To automate the classification of objects, we developed a dedicated tool, FuzzyClassificator. It is a fuzzy neuroclassifier based on a neural network that processes crisp and fuzzy values. The tool's code is available on GitHub; it requires Pyzo and PyBrain to run (more at the link).

We now use FuzzyClassificator for a specific applied task: classifying vulnerabilities. Vulnerabilities are a good example of objects of a fuzzy nature, which even a human cannot always classify unambiguously.

Any system based on a neural network operates in two stages: training and classification. At the first stage, we scan many different CMSs with many security scanners. The scanners output a large amount of information about vulnerabilities in each CMS; at this point it is impossible to tell whether they are real or false positives. We store the collected data in the TFS database, from which it can be retrieved and encoded in a form the neural network understands.

Then the neural network is trained on the reference data, after which it can be used on the data obtained during the tests of security scanners.
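The two stages can be shown on a deliberately simplified stand-in: a single sigmoid neuron trained by gradient descent instead of the real four-layer network. The feature encoding (two numbers in [0, 1] per finding), the toy reference samples and the function names `train` and `classify` are all invented for this sketch and do not come from our data or from the FuzzyClassificator code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, labels, epochs=2000, rate=0.5):
    """Stage 1: fit the weights and bias to the reference data."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, labels):
            output = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = target - output
            # Move the weights and bias in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, features)]
            bias += rate * error
    return weights, bias


def classify(model, features):
    """Stage 2: degree (0 to 1) to which a new finding looks confirmed."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


# Toy reference data: 1 means a confirmed vulnerability, 0 a false positive.
reference = [[0.9, 0.8], [0.8, 1.0], [0.1, 0.2], [0.2, 0.0]]
labels = [1, 1, 0, 0]

model = train(reference, labels)
print(classify(model, [0.85, 0.9]))
print(classify(model, [0.1, 0.1]))
```

The quality of the result depends entirely on how well the reference data and the encoding reflect the real findings, which is why preparing them takes most of the effort.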

The results

Previously, we had to triage vulnerabilities manually: this was the only way to understand whether our products worked correctly, whether a vulnerability they found actually existed, and how serious it was. The neural network saves up to 70% of the triage time. In particular, this allowed us to increase the number of CMSs scanned and security scanners analyzed for competitive analysis.

This process is automated in our TeamCity system. Testers use a special interface to run FuzzyClassificator and use the neural network in training and classification modes.

An example of a system report at the training stage looks like this:

It includes data on the quality of the trained neural network, that is, how much the network may err in its analysis. The report in “combat” (production) mode of vulnerability analysis looks like this:

All vulnerabilities are summarized in a table showing the neural network's levels of confidence that a particular vulnerability is real or false, along with recommendations for interpreting this data. For example, in the figure above the neural network is ready to confirm the first vulnerability with the minimum level of confidence and to reject it with the maximum level, so it recommends rejecting this finding and marking it Rejected, that is, a false positive for the scanner. Once the neural network produces a result, it also sends it to the TFS database.
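The recommendation logic described here can be sketched as a comparison of two levels on the five-level scale. The level names and the Rejected mark come from the text; the function name `recommend`, the “Confirmed” and “Manual review” marks and the tie-breaking rule are our assumptions for illustration and may not match the actual report.

```python
LEVELS = ["Min", "Low", "Med", "High", "Max"]


def recommend(confirm_level: str, reject_level: str) -> str:
    """Turn two confidence levels into a recommendation for one finding."""
    confirm = LEVELS.index(confirm_level)
    reject = LEVELS.index(reject_level)
    if reject > confirm:
        return "Rejected"  # likely a false positive of the scanner
    if confirm > reject:
        return "Confirmed"  # assumed name for the opposite mark
    return "Manual review"  # assumed behaviour when the levels are equal


# The case from the report: confirm with Min, reject with Max.
print(recommend("Min", "Max"))
```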

Limitations and improvements

Like any tool, FuzzyClassificator has its limitations. Correct classification with it:

depends heavily on the chosen input encoding method;
requires good knowledge of the subject area for which the classification is performed;
requires considerable effort to prepare “good” training input.

We have already optimized the algorithms in the tool's code and in all of its low-level methods, but we are not going to stop there. Our immediate plans are:

porting the tool to CPython;
running the code on the GPU.

Related materials:

P.S. We presented our experience of building the fuzzy classifier at a DevOps meetup held in Moscow in the fall of 2016.

Video:

Slides:

The link contains the slides of all 16 talks given at the event. All slides and video recordings have been added to the table at the end of this announcement post.

Author: Timur Gilmullin
