SIEM depths: out-of-box correlations. Part 2. Data schema as a reflection of the “world” model
This is the second article in the series on the methodology for creating out-of-the-box correlation rules for SIEM systems. In the previous article, we set out this task, described the advantages its solution would bring, and listed the main problems that stand in our way. In this article we begin looking for solutions, starting with the problem of transforming the “world” model and how it manifests itself at the event normalization stage.
A series of articles:
SIEM depths: out-of-box correlations. Part 1. Pure marketing or unsolvable problem?
SIEM depths: out-of-box correlations. Part 2. Data schema as a reflection of the “world” model (this article)
SIEM depths: out-of-box correlations. Part 3.1. Categorization of events
SIEM depths: out-of-box correlations. Part 3.2. Event normalization methodology
SIEM depths: out-of-box correlations. Part 4. System model as a context of correlation rules
SIEM depths: out-of-box correlations. Part 5. Methodology for developing correlation rules
The problem of transforming the “world” model was described in the first article. Let us briefly recall its essence: when an event occurs at the event source (for example, a process starts in the OS), it is recorded in different formats: first in memory, then in the OS event log, and then in the SIEM system. Each stage of processing is accompanied by data loss, because the OS has one model of the “world”, while the OS log has another, limited by its set of fields (the log schema). Thus, one model with a large number of parameters is mapped (transformed) onto another with fewer parameters. Normalizing and storing events in the SIEM is yet another transformation, and it also loses data, because the SIEM has its own model of the “world”.
It is difficult to find a way to transform one model into another without loss. Knowing this limitation, we need an approach to normalization and to choosing the fields of the event schema that does not lose information important for correlation and for the subsequent investigation of information security incidents.
Within the SIEM, the model is represented by a schema: a set of fields in which data from the original event is stored during normalization. Experts later use this schema to create correlation rules. For incident investigators and correlation rule developers to interpret normalized events unambiguously, the schema must satisfy these basic properties:
- be unified for events of any type and source;
- clearly describe who interacted with whom and how;
- preserve the essence and context of the interaction.
When developing normalization rules, information about the interaction must be found in the original event and placed into dedicated fields. The same must be done with the context and essence of the interaction (more on this in the next article).
The question arises: is it possible to single out typical interaction schemes that would fit any event generated by any IT or information security source? If so, what do these schemes look like?
To answer these questions, we turned to analytics and analyzed as many normalization rules as possible that are already developed and running in SIEM solutions, looking for common patterns. This work covered more than 3,000 normalization rules for more than 100 different sources, from solutions such as Positive Technologies MaxPatrol SIEM and Micro Focus ArcSight. The analysis led to the following conclusions:
- Typical interaction schemes exist.
- In each individual event, as a rule, there is information about interaction at both the network level and the application level.
- Typical interaction schemes may vary at different levels, and this should be taken into account.
Network and application level interaction schemes
Below we describe the typical schemes for each level. First, we need to single out the entities found in events; the interaction schemes are then built from them. These entities are:
- Subject. An entity that acts on the Object. For example, a user changing a registry key, or a host with IP 10.0.0.1 sending a packet to a host with IP 20.0.0.1.
- Object. An entity that is acted on by the Subject.
- Source. As a rule, the host that registers the interaction of the Subject with the Object and generates the event itself. For example, the Source may be a firewall that recorded the transfer of packets from the Subject host with IP 10.0.0.1 to the Object host with IP 20.0.0.1.
- Transmitter. Sometimes the SIEM receives events not directly from the Source but from an intermediate server through which the events pass. The simplest example is an intermediate syslog server. A more complex example is a management server acting as the Transmitter, for example, Kaspersky Security Center; in this case the Source is a specific Kaspersky Endpoint Security agent.
However, not all entities are necessarily present in an event at the same time (more on this later), so it is important to agree in advance on how the corresponding schema fields are filled in such cases. This makes it possible to tell apart the cases where fields were left empty through an error by the specialist who developed the normalization rules from the cases where the original event really contained no data about an entity.
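One way such an agreement might look is sketched below in Python. The field names and the `NOT_IN_EVENT` marker are hypothetical and are not taken from any particular SIEM product. The idea is only that "the event had no such entity" is written explicitly, so an unfilled field can be treated as a mistake in the normalization rule.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical marker: the original event carried no data about this entity.
NOT_IN_EVENT = "<not-in-event>"


@dataclass
class NetworkRoles:
    """Network-level roles of one normalized event (illustrative field names)."""
    # None means "the normalization rule never filled this field",
    # which this sketch treats as an error of the rule author.
    subject_ip: Optional[str] = None
    object_ip: Optional[str] = None
    source_ip: Optional[str] = None
    transmitter_ip: Optional[str] = None


def unfilled_fields(roles: NetworkRoles) -> List[str]:
    """Return the names of the fields the normalization rule forgot to fill."""
    return [name for name, value in vars(roles).items() if value is None]


# The event came straight from the Source, and the rule states that explicitly.
complete = NetworkRoles(
    subject_ip="10.0.0.1",
    object_ip="20.0.0.1",
    source_ip="30.0.0.1",
    transmitter_ip=NOT_IN_EVENT,
)

# Here the rule author simply skipped two fields.
incomplete = NetworkRoles(subject_ip="10.0.0.1", source_ip="30.0.0.1")

print(unfilled_fields(complete))
print(unfilled_fields(incomplete))
```

With a convention like this, a simple check over normalized events can flag rules that leave fields unfilled, while events that legitimately lack an entity pass through.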
Let us turn to the interaction schemes and examples of events. For clarity, all examples are based on file logs, syslog messages, or records in a relational database, but the schemes apply equally to other log formats, for example, binary logs.
Network level
The main identifier of network-level entities is the IP address. It is important to understand that there may be other related identifiers: MAC addresses at the data link layer and FQDNs at the application layer. The question arises: do they refer to the same entity or to different ones? Can the same entity change these identifiers over time? A separate article will be devoted to this; for now we simply note that the main identifier for interaction models at the network level is the IP address.
Typical interaction schemes at this level fall into two classes: basic and degenerate.
Basic interaction schemes
Scheme 1. Full scheme of interaction
In this scheme, all the main entities can be distinguished in the event received by the SIEM: Subject, Object, Source, and Transmitter. The Subject acts on the Object. The Source registers (observes) this action and generates an event. The event passes from the Source to the Transmitter and from there to the SIEM.
The event below records the Stonesoft (now Forcepoint) firewall allowing a network interaction between hosts; the event reaches the SIEM not directly but through an intermediate syslog server.
Here:
40.0.0.1 - Transmitter (intermediate syslog server),
30.0.0.1 - Source (firewall node),
10.0.0.1 - Subject (sending UDP packets),
20.0.0.1 - Object (receiving UDP packets).
Scheme 2. Direct collection without a Transmitter
A Transmitter is not always present in the interaction scheme. It usually appears when an intermediate server (for example, a syslog server) is used to transmit events, or when the solution from which events are collected has a centralized management system, for example, Kaspersky Security Center, Check Point Smart Console, or Cisco Prime. In this scheme, events come to the SIEM directly from the Source. Most events are described by this scheme. An example is the event from Scheme 1, if there were no intermediate syslog server and we received events directly from the firewall.
Here:
30.0.0.1 - Source (firewall node),
10.0.0.1 - Subject (sending UDP packets),
20.0.0.1 - Object (receiving UDP packets).
Scheme 3. Interaction with a multitude of Objects
This interaction scheme is quite rare at the network level and, as a rule, is characteristic of network equipment events. In this scheme, one Subject interacts with a set of Objects; such interaction appears in events describing multicast or broadcast distribution.
Note that the set of Objects is sometimes referred to by a common identifier: the subnet address or the broadcast address. Keep this in mind, because when analyzing events, including in correlation rules, it is easy to miss a potentially important interaction: in such a scheme the address of the individual Object is hidden behind the group address.
The following example shows an event from an IGMP Relay server through which a request to join a group address is sent.
Here:
30.0.0.1 - Source (IGMP Relay server),
10.0.0.1 - Subject (requesting membership in a group),
224.0.0.252 - Object (group address).
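A correlation rule or enrichment step could flag such group addresses so that the hidden set of Objects is not overlooked. The sketch below uses only Python's standard `ipaddress` module. The function name and the subnet `20.0.0.0/24` are assumptions made for illustration, since the article does not state the subnets of its example hosts.

```python
import ipaddress


def is_group_address(address: str, network: str) -> bool:
    """Return True if address is multicast or the broadcast address of network."""
    ip = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network)
    return ip.is_multicast or ip == net.broadcast_address


# The multicast group from the IGMP example stands for many Objects.
print(is_group_address("224.0.0.252", "20.0.0.0/24"))
# Directed broadcast of the assumed subnet.
print(is_group_address("20.0.0.255", "20.0.0.0/24"))
# An ordinary host address stands for a single Object.
print(is_group_address("20.0.0.1", "20.0.0.0/24"))
```

An event whose Object address passes this check describes Scheme 3 rather than a one-to-one interaction, and rules that look for a specific Object address may need to treat it separately.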
Degenerate schemes
Subject, Object, and Source are the main entities in the basic interaction schemes. However, in some cases one of these entities is missing from the event.
Scheme 4. Interaction without an Object
This scheme is typical when the Subject reports a change in its own internal state, that is, it acts as both Subject and Object. Such interaction can be observed, for example, in configuration change events or malware detection on a workstation. However, this information is recorded not by the Subject itself but by a centralized management system, and is stored in that system's log.
The example shows Symantec Management Server recording that a Symantec Endpoint Protection agent it manages has detected a malicious file on its node.
Here:
30.0.0.1 - Source (Symantec Management Server),
10.0.0.1 - Subject (Symantec Endpoint Protection agent).
Scheme 5. Combination of the role of the Subject and the Object in the Source
The last degenerate scheme applies when the SIEM receives events from a Source that reports changes in its own internal state, for example, reconfiguration of a device or software, or enabling or disabling a network port. In this scheme the Source also plays the roles of Subject and Object. Unlike the previous scheme, the events here reach the SIEM directly.
In this example, a Cisco IOS-based switch reports that its interface has changed to the UP state.
Here 30.0.0.1 - Source (switch).
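The network-level schemes described so far differ only in which roles can be found in the raw event, so they can be told apart mechanically. The function below is a minimal sketch of that idea. Its name, its parameters, and the use of `None` for "not present in the raw event" are assumptions for illustration. Scheme 3 is not covered, since it is distinguished by the Object being a group address rather than by a missing role.

```python
from typing import Optional


def classify_network_scheme(
    subject: Optional[str],
    obj: Optional[str],
    source: Optional[str],
    transmitter: Optional[str],
) -> str:
    """Name the network-level scheme from the roles found in the raw event.

    None means the raw event carries no data about that role.
    """
    if source is None:
        raise ValueError("every event is expected to have a Source")
    if subject is None and obj is None:
        return "Scheme 5: Source combines Subject and Object"
    if obj is None:
        return "Scheme 4: interaction without an Object"
    if transmitter is None:
        return "Scheme 2: direct collection without a Transmitter"
    return "Scheme 1: full scheme"


# Addresses follow the examples in this section.
print(classify_network_scheme("10.0.0.1", "20.0.0.1", "30.0.0.1", "40.0.0.1"))
print(classify_network_scheme("10.0.0.1", "20.0.0.1", "30.0.0.1", None))
print(classify_network_scheme("10.0.0.1", None, "30.0.0.1", None))
print(classify_network_scheme(None, None, "30.0.0.1", None))
```

Knowing the scheme up front tells the normalization rule which role fields it is expected to fill and which ones it should mark as absent.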
Application level
At this level the same entities interact: the Subject and the Object. However, all information about the Source and the Transmitter stays at the network level and is not reflected at the application level.
Most event types include interactions at the network and application levels at the same time. Note, however, that events generated directly by application software, for example, 1C:Enterprise, Microsoft SQL Server, or Oracle Database, may contain only application-level interactions.
In addition, a new entity, the Resource, appears at the application level.
Resource - an intermediate entity through which the Subject influences the Object without direct interaction. For example, user Alex grants user Bob rights to access the file MyFile. Here Alex is the Subject, Bob is the Object, and MyFile is the Resource. Note that in this example Alex does not interact with Bob directly.
Important: application-level events can contain extended parameters of the Subject, the Object, and the Resource itself. For example, additional parameters of a Resource such as a “file” may be the directory in which it is located or its size.
At the application level, the Subject, Object, and Resource are identified by a name or unique identifier: e-mail address, file name, directory name, or database table name.
Let us consider the additional interaction schemes that are specific to the application level.
Scheme 6. Interaction through a resource
In this scheme, the Subject affects the Object indirectly, through an intermediate Resource. As a rule, events with this scheme are clearly visible in database audit logs and in operations on file and directory access rights at the OS level.
The example shows an entry from the Oracle Database audit log. It records a role being revoked from a user.
Here:
“ALEX” - Subject (name of the user who revokes the role),
“BOB” - Object (name of the user from whom the role is revoked),
“ROLE” - Resource (name of the role being revoked).
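The roles listed above could be held in a small structure like the one sketched here. The class and field names are hypothetical. Resources are stored as a list on the assumption that one event may name more than one Resource, and the second example (several files granted at once) is invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AppInteraction:
    """Application-level roles of one normalized event (illustrative names)."""
    subject: str
    obj: str
    resources: List[str] = field(default_factory=list)

    def triples(self) -> List[Tuple[str, str, str]]:
        """Expand the event into one (Subject, Object, Resource) triple per Resource."""
        return [(self.subject, self.obj, resource) for resource in self.resources]


# The role revocation from the Oracle Database example.
revoke = AppInteraction(subject="ALEX", obj="BOB", resources=["ROLE"])

# Hypothetical bulk operation: Alex grants Bob access to two files at once.
bulk_grant = AppInteraction(subject="Alex", obj="Bob", resources=["MyFile", "OtherFile"])

print(revoke.triples())
print(bulk_grant.triples())
```

Expanding an event into triples lets a rule that watches a single Resource match regardless of how many Resources the original event listed.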
Scheme 7. Interaction with a variety of resources
At the application level, as at the network level, there are event types in which the Subject interacts with the Object through several Resources at once. Very rarely, the number of Objects is also greater than one. Such events appear when bulk operations are recorded, for example, granting one user access to several files, or changing the set of rules included in a policy.
In the example, Security Code vGate, a solution for protecting virtual environments, records the addition of new policies to a policy set.
Here:
“admin@VGATE” - Subject (name of the user who changes the policy set),
“base” - Object (policy set),
“Installing and maintaining file system integrity”, “Checking SNMP agent settings”, “Preventing automatic installation of VMware Tools” - Resources (names of the added policies).
Model of the channel of interaction between the Subject and the Object
In all the schemes we distinguished different entities (Subjects, Objects, Resources, Sources, Transmitters) and noted the so-called interaction channel between them. Let us look more closely at the penultimate component of the large model of the “world” that the SIEM must operate on: the model of the interaction channel between the Subject and the Object. Recall that the last component is the context of interaction (the next article is devoted to it).
So, two entities interact with each other. As part of this interaction, data is transferred from one entity to the other: network packets, files, or control commands. The resulting channel can be pictured as a "pipe" carrying a directed flow of data and commands. This model is clearly visible at the network level but less pronounced at the application level (see the example below).
Data Channel Model
Based on this model, each event that the SIEM receives may contain information describing:
- the parameters of the channel itself, the "pipe";
- the data transmitted through this "pipe".
As a rule, the channel is described by parameters such as session identifier, data transfer protocol, channel establishment time, end time, and duration. The data is characterized by its format, the encryption algorithms used, and the number of packets and bytes transferred.
Consider an example of an event that contains data about the interaction channel. Here is an event from the identification and access control process management system - the Cisco Identity Services Engine (ISE), which records the user's network session as part of the account procedure.
Here:
"Acct-Session-Id = 1A346216", "Acct-Session-Time = 50", "Service-Type = Framed", "Framed-Protocol = PPP" - parameters of the communication channel,
"Acct-Input-Octets = 43525", "Acct-Output-Octets = 122215", "Acct-Input-Packets = 234", "Acct-Output-Packets = 466" - parameters of the data transmitted over the channel.
Example: interaction models and the channel model in a single event
So, we have looked at the interaction schemes at the network and application levels, as well as the model of the interaction channel. Next, we will use an example to show how interaction schemes of different levels are combined in one event and how information about the channel model is used.
Here we see an event from a firewall, the Cisco Adaptive Security Appliance (ASA), in which an outbound TCP connection is recorded.
The example clearly shows that a single event contains entities at both the network level and the application level. At the network level, this is the scheme of interaction between the Subject and the Object that is recorded by the Source. There is no Transmitter.
Here:
30.0.0.1 - Source (Cisco ASA),
10.0.0.1 - Subject (the address of the connecting host),
20.0.0.1 - Object (the address being connected to).
At the application level, there is a simple scheme in which only the Subject and the Object are present:
“ALEX” - Subject (the name of the user who connects),
“BOB” - Object (the name of the user being connected to).
This event also describes the data transmission channel, but not the data itself:
“TCP” - the protocol on which the channel is based,
“136247” - the channel session ID.
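As a rough illustration, the same event could be normalized into a flat set of fields with one group per level. The dotted field names (`net.*`, `app.*`, `channel.*`, `data.*`) and both helper functions are hypothetical; real SIEM schemas name these fields differently.

```python
# Hypothetical flat field names; the values come from the Cisco ASA example in the text.
event = {
    "net.source.ip": "30.0.0.1",    # Source: the Cisco ASA that recorded the event
    "net.subject.ip": "10.0.0.1",   # Subject: the connecting address
    "net.object.ip": "20.0.0.1",    # Object: the address being connected to
    "app.subject.name": "ALEX",
    "app.object.name": "BOB",
    "channel.protocol": "TCP",
    "channel.session_id": "136247",
}


def levels_present(ev: dict) -> list:
    """List the interaction levels for which both Subject and Object are filled in."""
    levels = []
    if ev.get("net.subject.ip") and ev.get("net.object.ip"):
        levels.append("network")
    if ev.get("app.subject.name") and ev.get("app.object.name"):
        levels.append("application")
    return levels


def has_channel_data(ev: dict) -> bool:
    """True if any field describing the transmitted data (not the channel) is set."""
    return any(key.startswith("data.") and value for key, value in ev.items())
```

Here `levels_present(event)` reports both the network and the application level, while `has_channel_data(event)` is false, because the event describes the channel but not the data sent through it.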
Conclusions
How can the typical interaction schemes we have identified help?
- First, when writing correlation rules and analyzing events, the expert must understand which entities are present in each event that enters the SIEM. To make this possible, the entities must be explicitly distinguished at the event normalization stage: Subject, Object, Resource, Source, and Transmitter.
- Second, during normalization it is important to take into account that an event can contain information about interaction at both the network level and the application level. Both interactions can be present in one event at the same time.
- Third, the interaction itself is a composite structure that contains information about the channel that was formed and about the data transmitted over this channel.
Thus, the model of the "world" that is built in the SIEM and represented by a set of fields (the schema) should contain sections describing:
- At the network level:
  - Subject;
  - Object;
  - Source;
  - Transmitter;
  - Interaction channel;
  - Data transmitted over the channel.
- At the application level:
  - Subject;
  - Object or set of Objects;
  - Resource or set of Resources.
For each entity, it is necessary to define a set of properties that uniquely identify it. At the network level, entities are identified by IP address, MAC address, or FQDN; at the application level, by names or IDs. The schema must have dedicated fields for storing these identifiers.
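The sections and identifiers listed above can be sketched as a typed schema. This is one possible layout under our own assumptions, not the schema of any specific product: every class and field name below is hypothetical, and the channel and data fields simply follow the parameters named earlier in the article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NetworkEntity:
    """Network-level entity, identified by IP, MAC, or FQDN."""
    ip: Optional[str] = None
    mac: Optional[str] = None
    fqdn: Optional[str] = None


@dataclass
class AppEntity:
    """Application-level entity, identified by name or ID."""
    name: Optional[str] = None
    entity_id: Optional[str] = None


@dataclass
class Channel:
    """Parameters of the "pipe" itself."""
    session_id: Optional[str] = None
    protocol: Optional[str] = None
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    duration: Optional[int] = None


@dataclass
class ChannelData:
    """Parameters of the data transmitted through the "pipe"."""
    data_format: Optional[str] = None
    encryption: Optional[str] = None
    packet_count: Optional[int] = None
    byte_count: Optional[int] = None


@dataclass
class NetworkLevel:
    subject: NetworkEntity = field(default_factory=NetworkEntity)
    obj: NetworkEntity = field(default_factory=NetworkEntity)
    source: NetworkEntity = field(default_factory=NetworkEntity)
    transmitter: NetworkEntity = field(default_factory=NetworkEntity)
    channel: Channel = field(default_factory=Channel)
    data: ChannelData = field(default_factory=ChannelData)


@dataclass
class ApplicationLevel:
    subject: AppEntity = field(default_factory=AppEntity)
    objects: List[AppEntity] = field(default_factory=list)
    resources: List[AppEntity] = field(default_factory=list)


@dataclass
class NormalizedEvent:
    network: NetworkLevel = field(default_factory=NetworkLevel)
    application: ApplicationLevel = field(default_factory=ApplicationLevel)


# Usage: fields that the original event does not provide simply stay empty.
event = NormalizedEvent()
event.network.subject.ip = "10.0.0.1"
event.application.subject.name = "ALEX"
```

Every role gets its own slot even when an event leaves it empty, so a correlation rule can always address, say, the network-level Object by the same path.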
There are degenerate interaction schemes in which one entity combines several roles at once. When normalizing such events, it is necessary to explicitly define a rule for filling in all the schema fields for every entity involved. Later, this will keep correlation rules from missing some of the interactions.
Let us explain with the case where the Source combines the roles of the Subject and the Object. If only the schema fields responsible for the Source are filled in during normalization, then correlation rules that analyze configuration changes on a specific Object will simply skip the events we need, because the Object fields will be empty.
When writing correlation rules, it is important to clearly understand which scheme and which level of interaction the events we work with belong to. This helps to interpret the roles of the entities involved in the events correctly.
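The effect is easy to reproduce in a toy sketch. Everything here is an assumption made for illustration: the flat field names (`source_ip`, `subject_ip`, `object_ip`, `action`), the filling rule `normalize_combined_roles`, and the simplified rule predicate `config_changed_on`.

```python
def normalize_combined_roles(event: dict) -> dict:
    """Copy the Source address into empty Subject/Object fields.

    Intended only for events in which the Source also plays those roles.
    """
    filled = dict(event)  # work on a copy so the raw event stays untouched
    for role in ("subject_ip", "object_ip"):
        if not filled.get(role):
            filled[role] = filled.get("source_ip")
    return filled


def config_changed_on(event: dict, object_ip: str) -> bool:
    """Toy correlation predicate: a configuration change on a specific Object."""
    return event.get("action") == "config_change" and event.get("object_ip") == object_ip


# An event in which only the Source fields were filled in during normalization.
raw = {"source_ip": "30.0.0.1", "action": "config_change"}

missed = config_changed_on(raw, "30.0.0.1")                            # Object field is empty
caught = config_changed_on(normalize_combined_roles(raw), "30.0.0.1")  # Object field is filled
```

The rule skips the raw event and matches it only after the filling rule has been applied, which is why the rule for combined roles belongs in normalization rather than in each correlation rule.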
As a result, the general scheme capable of describing the entire set of typical interactions looks like this:
Field scheme focused on interactions
The next stage is to include in the SIEM model of the “world” the meaning, or semantics, of the interaction that we observe in the original event. Practice shows that it is not enough to know that the user Alex connected from his workstation to the domain controller: it is important to understand that this was a login attempt and, possibly, a failed one. When writing correlation rules, it is better to operate with the semantics of what is happening, and not just with data from the event fields. Of course, a person can interpret the meaning by looking at the data in the normalized event, but the correlator in the SIEM needs help to do this.
In the next article, we will talk about categorization and how it helps to unambiguously interpret the meaning of the interactions recorded in an event. We will also bring together everything described so far and formulate the basic principles underlying the methodology for normalizing events obtained from different sources.
A series of articles:
SIEM depths: out-of-box correlations. Part 1. Pure marketing or unsolvable problem?
SIEM depths: out-of-box correlations. Part 2. Data schema as a reflection of the “world” model (this article)
SIEM depths: out-of-box correlations. Part 3.1. Categorization of events
SIEM depths: out-of-box correlations. Part 3.2. Event normalization methodology
SIEM depths: out-of-box correlations. Part 4. System model as a context of correlation rules
SIEM depths: out-of-box correlations. Part 5. Methodology for developing correlation rules