ptsecurity December 18, 2014 at 15:35

Security Scanners: Automatically Validate Vulnerabilities Using Fuzzy Sets and Neural Networks

There are many information security scanners on the market today from various vendors (including MaxPatrol, XSpider, and the Application Inspector code analyzer made by Positive Technologies). These tools differ in price, scan quality, the types of vulnerabilities they identify, the methods used to find them, and dozens of other parameters.

When building scanners, the methods used to test them play an important role, and competitive analysis of similar products holds a special place among those methods.

As a rule, the output of any security scanner is a list of vulnerabilities detected while analyzing a web application. Because scanners rely on heuristic algorithms, this list also contains false positives: reported vulnerabilities that do not actually exist. This, in turn, means that an information security expert has to be assigned to verify the scanner's results.

To confirm the existence of a vulnerability, it is proposed to use “reference” lists of vulnerabilities contained in similar web applications. The analyst can use such lists to identify the most likely vulnerabilities of the product under test and filter out obvious false positives.

Statement of the problem of fuzzy classification of vulnerabilities

In practice, we propose to treat the confirmation of vulnerabilities from the scanner's list as a task of comparing them with reference samples (standards). If all objects, both standards and vulnerability candidates, can be unambiguously parameterized and represented as vectors, then the problem reduces to the classical problem of classifying elements of a set.

Input data:

A set Vulners of all web application vulnerabilities is given, each of which is described by its feature vector v_i. Vulners contains a subset Candidates: the candidate vulnerabilities found by the scanner.
Each candidate vulnerability can be assigned to one of two classes: I, confirmed vulnerabilities (Ver), and II, unconfirmed vulnerabilities (NVer).
A set Eth of reference vulnerabilities belonging to class I is given.
A set Scales of measuring scales, both clear and fuzzy, is given for assessing the properties of vulnerabilities.

It is required:

Build functions that connect clear and fuzzy scales, so that classification results can be interpreted in different ways.
Build a Classificator function that, for each vulnerability, estimates its membership in the classes of confirmed and unconfirmed vulnerabilities.

The following measuring scales can be used to assess the properties of information systems:

A clear (crisp) scale is the set of real numbers in the interval [0, 1], which can easily be converted to any other kind of clear numerical set (discrete, continuous, unbounded) using various conversion functions.
A fuzzy scale is an ordered set of fuzzy variables of the form FP = {fp_i}, where fp_i are linguistic variables describing the values of the object's properties.
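
The conversion functions mentioned for the clear scale are not spelled out in the article. The sketch below shows three possible choices, one per kind of target set; the function names and the specific formulas (linear rescaling, rounding to levels, the logit function) are assumptions made for illustration only.

```python
import math


def to_interval(x, low, high):
    """Map x from [0, 1] linearly onto the continuous interval [low, high]."""
    return low + x * (high - low)


def to_discrete(x, levels):
    """Map x from [0, 1] onto the integer grades 0 .. levels - 1."""
    return round(x * (levels - 1))


def to_unbounded(x):
    """Map x from the open interval (0, 1) onto the whole real line (logit)."""
    return math.log(x / (1.0 - x))


print(to_interval(0.5, 0, 100))
print(to_discrete(1.0, 5))
print(to_unbounded(0.5))
```

Note that to_unbounded is defined only for values strictly between 0 and 1, so the end points of the clear scale would need separate handling.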

Clear and fuzzy “universal” measuring scales

Input coding

For any classification method, vulnerabilities must first be encoded, that is, represented by a vector v = {v_i} from Vulners. To do this, a formal coding rule is needed that makes it possible to evaluate individual properties of real vulnerabilities on a clear scale S_p.

Define the vulnerability coding matrix M_Vulners, whose rows are individual vulnerability properties (vulner property), whose columns are the numeric codes (code) of property values, and whose cells contain the possible values of the properties. To build such a matrix, select only significant properties that clearly distinguish one automatically detected vulnerability from another. Each information security scanner may, of course, have its own classification of vulnerabilities. However, most of them include properties such as the type of vulnerability, the protocol through which it can be exploited, the implementation channel inside this protocol, the type of vulnerable object, the path to the object on the server, and a network request with an attack vector. All possible values of each property are encoded by non-negative integers.

The matrix M_Vulners can be presented in tabular form. Property values can also be fuzzy, in which case they need to be defuzzified before they are used in further calculations.
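
To make the coding rule more concrete, here is a minimal sketch of a coding matrix and an encoder. The property names, the value lists, and the normalization of integer codes to [0, 1] (dividing by the largest code) are all hypothetical; the article does not publish the actual matrix or say how codes are placed on the scale S_p.

```python
# Hypothetical coding matrix: each property maps to its ordered list of
# possible values; the position of a value in the list is its integer code.
M_VULNERS = {
    "type": ["XSS", "SQL injection", "Path traversal"],
    "protocol": ["HTTP", "HTTPS"],
    "channel": ["GET parameter", "POST parameter", "Cookie", "Header"],
}
PROPERTY_ORDER = ["type", "protocol", "channel"]


def encode(vulnerability):
    """Turn a dict of property values into a feature vector on [0, 1]."""
    vector = []
    for prop in PROPERTY_ORDER:
        values = M_VULNERS[prop]
        code = values.index(vulnerability[prop])  # non-negative integer code
        vector.append(code / (len(values) - 1))   # assumed normalization
    return vector


candidate = {"type": "SQL injection", "protocol": "HTTPS", "channel": "Cookie"}
print(encode(candidate))
```

An unknown property value raises ValueError here, which is probably the safest behavior for a sketch: silently mapping unknown values to some default code would distort the feature vector.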

Construction of a neural network, its training and presentation of results

We will configure the neural network with three values:

Config = <inputs, {layer^l}, outputs>,

where inputs is the number of input parameters, {layer^l} is a set of non-negative integers giving the number of neurons in hidden layer number l, and outputs is the number of output parameters.
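
As an illustration, the three configuration values can be read from a comma-separated string such as the one passed in the config=3,3,2,2 command line parameter later in the article. The parse_config helper below is hypothetical, and the reading of the string as "inputs, hidden layers, outputs" is an assumption based on the definition above.

```python
def parse_config(text):
    """Split 'inputs,layer1,...,layerN,outputs' into its three parts."""
    sizes = [int(part) for part in text.split(",")]
    if len(sizes) < 2 or any(size < 0 for size in sizes):
        raise ValueError("expected at least inputs and outputs, all non-negative")
    return sizes[0], sizes[1:-1], sizes[-1]


inputs, hidden_layers, outputs = parse_config("3,3,2,2")
print(inputs, hidden_layers, outputs)
```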

The output vector (s_I, s_II), whose values lie on the clear scale S_p, can be interpreted in several ways:

The parameter values indicate the degree of confidence, from 0 to 1, that the vulnerability feature vector belongs to each class.
The parameter values, multiplied by 100%, indicate the probability, from 0 to 100%, that the vulnerability feature vector belongs to each class.
The parameter values, fuzzified with the special function Fuzzy(x, S_f), give a linguistic assessment of the degree to which the vulnerability feature vector belongs to each class on the fuzzy scale S_f = {Min, Low, Med, High, Max}.
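
The third interpretation can be sketched as follows. This is not the Fuzzy(x, S_f) function from the FuzzyRoutines library: it assumes triangular membership functions with centers spaced evenly over [0, 1], whereas the real library defines its own membership functions for each label.

```python
S_F = ["Min", "Low", "Med", "High", "Max"]


def membership(x, center, width):
    """Triangular membership: 1 at the center, falling to 0 at center +/- width."""
    return max(0.0, 1.0 - abs(x - center) / width)


def fuzzy(x, scale=S_F):
    """Return the label of the fuzzy scale whose membership at x is highest."""
    width = 1.0 / (len(scale) - 1)
    degrees = {label: membership(x, index * width, width)
               for index, label in enumerate(scale)}
    return max(degrees, key=degrees.get)


# Interpret a hypothetical network output (s_I, s_II) on the fuzzy scale.
print([fuzzy(value) for value in (0.9, 0.03)])
```

With evenly spaced triangles, picking the highest membership amounts to picking the nearest center; differently shaped membership functions would move the boundaries between labels.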

Classifier software implementation

To make neural networks practical for fuzzy classification problems with varying numbers of classes and network structures, the FuzzyClassificator software modules were developed. They are distributed under the GNU GPL v3 license. Download the latest version of FuzzyClassificator on GitHub.

To make the modules easy to use in automation systems, the program is configured through a command line interface. The program description on GitHub provides detailed technical information about the interface commands, the operation of the modules, and the input data. The FuzzyClassificator modules require Pyzo, a free and open source development tool based on Python 3.3.2, which includes many routines for scientific computing, in particular the PyBrain library for working with neural networks.

The main software modules that implement the approaches proposed in the article and the mathematical apparatus:

FuzzyClassificator - implements a command line user interface, receives and processes input data, sets learning and classification modes, provides results.
PyBrainLearning - defines methods for working with fuzzy neural networks, combining the capabilities of the PyBrain library and the author's FuzzyRoutines library.
FuzzyRoutines - contains routines for working with fuzzy sets and fuzzy scales.

Upper A-0 level of the functional IDEF0 model of the FuzzyClassificator program.

Level A0 of the IDEF0 model. The main stages of the FuzzyClassificator

Level A1 IDEF0-model. Subprocesses of the stages of work of the FuzzyClassificator

The learning phase (learning mode) consists of the following steps:

1. Initialization of program objects with user-defined values.

2. Processing input data and preparing the neural network for training:

processing a file with data about the vectors of features of the standards;
preparation of data for training in PyBrain format;
initialization of the parameters of the new neural network PyBrain or its loading from the specified file.

3. Neural network training on the specified standards:

initialization of the PyBrain trainer module;
training the network using a trainer and saving its configuration to a PyBrain file format.

Classification mode consists of the following steps:

1. Initialization of program objects with user-defined values.

2. Processing input data and preparing a neural network for data analysis:

processing a file with data on candidate feature vectors;
loading the configuration of the trained neural network PyBrain from the specified file.

3. Neural network analysis of candidate feature vectors:

activation of a neural network and calculation of levels of vectors belonging to different classes;
interpretation of the results on fuzzy scales and the formation of a report file.
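
The learning and classification steps above can be illustrated with a small self-contained sketch. It does not use PyBrain and is not the FuzzyClassificator implementation: it is a plain-Python multilayer perceptron trained by stochastic backpropagation without momentum, and it skips saving and loading the network. The training rows are the first five rows of the ethalons.dat example, with fuzzy labels replaced by evenly spaced numbers (Min = 0, Low = 0.25, Med = 0.5, High = 0.75, Max = 1), which is an assumption.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(sizes, rng):
    """One weight row per neuron; the last weight in each row is the bias."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(network, inputs):
    """Return the activations of every layer, input layer included."""
    activations = [list(inputs)]
    for layer in network:
        prev = activations[-1] + [1.0]  # the constant 1.0 feeds the bias weight
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(row, prev))) for row in layer])
    return activations


def train(network, samples, epochs, rate):
    """Stochastic backpropagation on squared error (no momentum)."""
    for _ in range(epochs):
        for inputs, targets in samples:
            acts = forward(network, inputs)
            deltas = [(o - t) * o * (1.0 - o)
                      for o, t in zip(acts[-1], targets)]
            for index in range(len(network) - 1, -1, -1):
                layer, below = network[index], acts[index]
                # Propagate the error before this layer's weights change
                # (for the input layer the result is simply discarded).
                next_deltas = [
                    sum(row[i] * d for row, d in zip(layer, deltas))
                    * below[i] * (1.0 - below[i])
                    for i in range(len(below))]
                for row, d in zip(layer, deltas):
                    for i, a in enumerate(below + [1.0]):
                        row[i] -= rate * d * a
                deltas = next_deltas


def classify(network, inputs):
    """Activate the network and return the class membership levels."""
    return forward(network, inputs)[-1]


ETHALONS = [
    ([0.1, 0.2, 0.00], [0.0, 1.0]),
    ([0.2, 0.3, 0.25], [0.0, 1.0]),
    ([0.3, 0.4, 0.50], [0.0, 1.0]),
    ([0.4, 0.5, 0.50], [1.0, 0.0]),
    ([0.5, 0.6, 0.75], [1.0, 0.0]),
]

network = init_network([3, 3, 2, 2], random.Random(0))
train(network, ETHALONS, epochs=1000, rate=0.1)
scores = classify(network, [0.54, 0.57, 0.5])
print(scores)
```

The two numbers printed are the membership levels for classes I and II on the clear scale. How well such a small network separates the classes depends on the initial weights, the learning rate, and the number of epochs, so the values should be treated as illustrative only.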

Input data with the feature vectors of standards and candidates are supplied as ordinary text files with tab-separated values. For example, to specify the training data, you can prepare an ethalons.dat file that contains a header line followed by lines with the values of the reference feature vectors and their class membership.

Values can be set on both clear and fuzzy scales.

File ethalons.dat

input1 input2 input3 1st_class_output 2nd_class_output
0.1 0.2 Min 0 Max 
0.2 0.3 Low 0 Max
0.3 0.4 Med 0 Max
0.4 0.5 Med Max 0
0.5 0.6 High Max 0
0.6 0.7 Max Max 0
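
A file in this format could be read along the following lines. This is a sketch rather than the FuzzyClassificator parser: the load_ethalons helper is hypothetical, only two sample rows are embedded, and fuzzy labels are converted to numbers using evenly spaced values, which is an assumption about defuzzification.

```python
import csv
import io

# Assumed crisp equivalents of the fuzzy labels.
DEFUZZ = {"Min": 0.0, "Low": 0.25, "Med": 0.5, "High": 0.75, "Max": 1.0}

SAMPLE = (
    "input1\tinput2\tinput3\t1st_class_output\t2nd_class_output\n"
    "0.1\t0.2\tMin\t0\tMax\n"
    "0.4\t0.5\tMed\tMax\t0\n"
)


def to_number(cell):
    """Convert a cell that holds either a fuzzy label or a number."""
    return DEFUZZ[cell] if cell in DEFUZZ else float(cell)


def load_ethalons(stream, n_inputs):
    """Return the header and a list of (inputs, outputs) pairs."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    samples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [to_number(cell) for cell in row]
        samples.append((values[:n_inputs], values[n_inputs:]))
    return header, samples


header, samples = load_ethalons(io.StringIO(SAMPLE), n_inputs=3)
print(samples)
```

For a real file, the io.StringIO object would be replaced by an open file handle, for example open("ethalons.dat", newline="").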

The data for analysis can be prepared in the same way as a candidates.dat file, which also contains a header line followed by lines with the values of the candidate feature vectors:

File candidates.dat

input1 input2 input3
0.12 0.32 Min
0.32 0.35 Low
0.54 0.57 Med
0.65 0.68 High
0.76 0.79 Max

When the program finishes, it creates a report file containing the configuration of the neural network and the classification results for each feature vector from the set of candidates.

First, the neural network is trained on the examples above with the following command line parameters:

python FuzzyClassificator.py --learn config=3,3,2,2 epochs=1000 rate=0.1 momentum=0.05

Then the program is run in classification mode with the following command line parameters:

python FuzzyClassificator.py --classify config=3,3,2,2

The output is a report file.

Report file

Neuronet: C:\work\projects\FuzzyClassificator\network.xml 
FuzzyScale = {Min, Low, Med, High, Max}
Min = 
Low = 
Med = 
High = 
Max = 
Classification results for candidates vectors: 
Input: ['0.12', '0.32', 'Min'] Output: ['Min', 'Max']
Input: ['0.32', '0.35', 'Low'] Output: ['Low', 'High']
Input: ['0.54', '0.57', 'Med'] Output: ['Max', 'Min']
Input: ['0.65', '0.68', 'High'] Output: ['Max', 'Min']
Input: ['0.76', '0.79', 'Max'] Output: ['Max', 'Min']

Looking at the data from the candidates.dat file, we can state with a high degree of certainty that a human expert relying only on the data from the ethalons.dat file would arrive at similar classification results.

Conclusion

So, we managed to combine the mathematical apparatus of the theories of fuzzy systems and neural networks to solve the practical problem of vulnerability classification. From the work done, several conclusions can be drawn:

Mathematical methods of classifying on the basis of neural networks are also applicable in the case of classification of vulnerabilities.
To obtain adequate results, it is necessary to correctly build a coding matrix and select the best properties for modeling vulnerabilities.
For the vulnerability classification problem, we recommend a perceptron neural network with two hidden layers whose sizes depend on the number of input parameters: the first hidden layer has as many neurons as there are input parameters, and the second has half as many.
An advantage of the proposed approaches is the use of universal fuzzy scales of linguistic variables, which are applicable both for estimating the values of feature vectors and for interpreting the final levels of class membership.
The proposed fuzzy classification method and the FuzzyClassificator software modules that implement it are universal and can easily be adapted and configured for specific classification objects.
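
The recommended network structure from the conclusions above can be written as a small helper. The recommended_config function is hypothetical, and rounding the second hidden layer up is an assumption inferred from the config=3,3,2,2 example, where three inputs lead to a second hidden layer of two neurons.

```python
import math


def recommended_config(n_inputs, n_outputs):
    """Build a config string with hidden layers of n and ceil(n / 2) neurons."""
    first_hidden = n_inputs
    second_hidden = math.ceil(n_inputs / 2)  # assumed rounding rule
    sizes = [n_inputs, first_hidden, second_hidden, n_outputs]
    return ",".join(str(size) for size in sizes)


print(recommended_config(3, 2))
```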

We will be happy to answer your questions in the comments. For more details, including a description of the mathematical apparatus, see: math-n-algo.blogspot.ru/2014/08/FuzzyClassificator.html.

Posted by Timur Gilmullin, Positive Technologies.
