Sigma rules: a craft or a new standard for SOC?
I am Sergey Rublev, head of SOC (Security Operations Center) at Infosecurity.
In this article, I will examine in detail the ambitious Sigma Rules project, whose motto is: "Sigma for logs is like Snort for traffic and Yara for files."

I will cover three aspects:
- Applicability of the Sigma rule syntax for maintaining a knowledge base of threat detection scenarios
- Capabilities of the rule generation tools for off-the-shelf SIEM systems
- Value for a SOC of the current content of the public Sigma rule repositories
Once upon a time, in a galaxy far, far away
It all started a few years ago, when the trees were tall and our monitoring team was still small. We ran into a lot of questions, as almost any team does once it grows into a three-person line.

The questions arise for different reasons:
- Team growth
- Staff turnover
- A large number of heterogeneous systems for monitoring
If you have to take over support of a SIEM that someone else has already configured, the number of questions grows like an avalanche.
Use Case Library
Worldwide experience in building monitoring centers has already produced a way to organize this chaos, and its name is the use case library. The purpose of each case is to comprehensively describe the solution to one problem within information security monitoring.
The set of knowledge captured in each case may vary; we start from the following set:
- Objective - the task solved by the case
- Threat - the threat that the detection rule is meant to detect
- Stakeholders - the people interested in this rule: IS / IT / Business
- Data Requirements - the data set required to identify the threat
- Logic - the logic of the threat detection rule
- Testing - a procedure for testing that the detection rule works correctly
- Priority - the priority of handling events for this case (usually calculated from the potential damage of a successfully realized threat)
- Output - a list of actions for handling the alert, a description of the valid outcomes of the handling procedure, and the data to be recorded in its results
Use case example for the task of detecting communication with a botnet command-and-control server (C&C, or simply C2):

The example is significantly simplified; in reality, a properly described case grows into a multi-page document.
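For a machine-friendly view of the same structure, a case can be modelled as a small record type. The sketch below is only an illustration: the `UseCase` class, its field names, and every value in the C2 example are hypothetical and do not reproduce our production format.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class UseCase:
    """One entry of a use case library (hypothetical structure)."""
    objective: str
    threat: str
    stakeholders: List[str]
    data_requirements: List[str]
    logic: str
    testing: str
    priority: str
    output: List[str] = field(default_factory=list)


# Illustrative values only; a real case is far more detailed.
c2_case = UseCase(
    objective="Detect communication with a botnet C&C server",
    threat="Host infected with malware controlled from outside",
    stakeholders=["IS", "IT"],
    data_requirements=["Proxy logs", "Firewall logs", "List of known C&C addresses"],
    logic="Destination address of an outbound connection is in the C&C list",
    testing="Request a test address that was added to the C&C list",
    priority="High",
    output=["Identify the source host", "Isolate the host", "Record the verdict"],
)

# The same record is easy to export for other tools, for example as a dict.
case_dict = asdict(c2_case)
print(sorted(case_dict))
```

Keeping the eight fields explicit makes it easy to check later which of them a given rule format can actually express.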
Once the number of cases exceeded several dozen, we started looking for ready-made tools for maintaining such a knowledge base, preferably ones with a machine-friendly interface in addition to a human-friendly one.
Sigma Project
The Sigma project certainly deserves consideration in the context of the knowledge base on incident detection rules. It started in 2016, and I have been following it almost from the very beginning.
In essence, the project consists of:
- Sigma rules themselves
- Utilities for converting rules into queries for various SIEM systems
The list of supported SIEMs is impressive: almost all popular event analysis solutions are present. Let's go through everything in detail and in order.
Rule syntax
Sigma rules are YAML documents describing a scenario for detecting a specific attack. Syntactically, the rules consist of the following blocks:
Meta information
A descriptive part that structures the rules and simplifies the search for the ones you need.
    title: Access to ADMIN$ Share
    description: Detects access to $ADMIN share
    author: Florian Roth
    falsepositives:
        - Legitimate administrative activity
    level: low
    tags:
        - attack.lateral_movement
        - attack.t1077
    status: experimental
I would also like to note that many rules already carry references to attack techniques from the MITRE ATT&CK framework.
Data Source Declaration
A description of the source whose events the detection logic is built on.
    logsource:
        product: windows
        service: security
Syntactically, it is possible to describe both a specific service of a particular product and a whole category of systems.
Processing Logic Declaration
At the detection logic level, the following are described:
- Searched patterns
- Values of certain fields in the log
- Time frame
- Aggregate Functions
Logic can be trivial, for example, conditions imposed on a set of fields:
    detection:
        selection:
            EventID: 5140
            ShareName: Admin$
        filter:
            SubjectUserName: '*$'
        condition: selection and not filter
and quite complicated:
    detection:
        selection1:
            EventID:
                - 529
                - 4625
            UserName: '*'
            WorkstationName: '*'
        selection2:
            EventID: 4776
            UserName: '*'
            Workstation: '*'
        timeframe: 24h
        condition:
            - selection1 | count(UserName) by WorkstationName > 3
            - selection2 | count(UserName) by Workstation > 3
The expressive power of the language, although not universal, is still quite broad and lets you describe a large number of attack detection cases.
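To make the semantics of the two detection examples more concrete, here is a minimal sketch of how such conditions could be evaluated against events represented as Python dicts. It is not how the Sigma tools work internally: the helpers `matches`, `admin_share_rule` and `count_by` are hypothetical, matching is case-sensitive for simplicity, the `timeframe` is ignored, and `count(field)` is assumed to mean the number of distinct values of that field.

```python
from fnmatch import fnmatchcase


def matches(selection, event):
    """True if every field of the selection matches the event.

    A list of values means "any of them"; '*' is treated as a wildcard.
    """
    for field_name, expected in selection.items():
        if field_name not in event:
            return False
        values = expected if isinstance(expected, list) else [expected]
        actual = str(event[field_name])
        if not any(fnmatchcase(actual, str(v)) for v in values):
            return False
    return True


# The trivial rule: "selection and not filter".
selection = {"EventID": 5140, "ShareName": "Admin$"}
filter_ = {"SubjectUserName": "*$"}


def admin_share_rule(event):
    return matches(selection, event) and not matches(filter_, event)


def count_by(events, selection, counted, group_by, threshold):
    """Sketch of "selection | count(counted) by group_by > threshold"."""
    seen = {}
    for event in events:
        if matches(selection, event):
            seen.setdefault(event.get(group_by), set()).add(event.get(counted))
    return {key for key, vals in seen.items() if len(vals) > threshold}


user_event = {"EventID": 5140, "ShareName": "Admin$", "SubjectUserName": "alice"}
machine_event = {"EventID": 5140, "ShareName": "Admin$", "SubjectUserName": "WS01$"}
print(admin_share_rule(user_event))     # True
print(admin_share_rule(machine_event))  # False

# Four different users failing to log on from WS01, one from WS02.
failed_logons = [
    {"EventID": 4625, "UserName": name, "WorkstationName": "WS01"}
    for name in ("alice", "bob", "carol", "dave")
] + [{"EventID": 4625, "UserName": "erin", "WorkstationName": "WS02"}]
selection1 = {"EventID": [529, 4625], "UserName": "*", "WorkstationName": "*"}
noisy = count_by(failed_logons, selection1, "UserName", "WorkstationName", 3)
print(noisy)  # {'WS01'}
```

The filter on `'*$'` excludes machine accounts, and the aggregation flags only the workstation from which more than three different user names failed to log on.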
Rule Development Tools
In addition to your favorite text editor for YAML, a web UI from SOC Prime is also available; it lets you both validate the syntax of an already written rule and assemble rules manually from graphical blocks.

Sigma as a Knowledge Base Tool
A brief summary.
Currently, the rule syntax concentrates mainly on describing threat detection logic and is not intended for a comprehensive description of a use case; accordingly, you cannot maintain a full-fledged library using Sigma rules alone.
For the use case structure that we have chosen, Sigma covers only half of the fields (Objective, Data Requirements, Logic and Priority).

Converting to various SIEMs
Since we are a SOC service provider, the idea of keeping all our correlation rule content in one universal format and converting it to the required SIEM format at the implementation stage seemed very attractive to us.
The project includes console utilities for generating event queries in the format of various SIEMs. Let's look at what the conversion is and what is under its hood.

The conversion takes place in 3 stages:
- Parsing the rule - I think this one is clear: the YAML document is split into its component blocks.
- Reduction to the SIEM taxonomy - this stage is necessary because normalization is implemented slightly differently in each SIEM system, so the declarations from Sigma rules need to be mapped to the event taxonomy of the selected SIEM.
- Generating a query for the SIEM - for this stage to work, one more component is required: a backend for this SIEM. In essence, the backend is a plug-in for the conversion utility that contains the logic for converting to the final SIEM query format. The detection and logsource blocks are converted using the previously applied field mapping, and additional SIEM-specific information is added.
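The three stages can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the code of the real conversion utility: the mapping tables, the target field names, the `index="wineventlog"` prefix and the output query syntax are all invented for illustration, and only flat conditions without parentheses are handled.

```python
# Stage 1: a parsed rule (in practice this dict would come from a YAML parser).
rule = {
    "logsource": {"product": "windows", "service": "security"},
    "detection": {
        "selection": {"EventID": 5140, "ShareName": "Admin$"},
        "filter": {"SubjectUserName": "*$"},
        "condition": "selection and not filter",
    },
}

# Stage 2: hypothetical mapping of Sigma names to the taxonomy of a target SIEM.
FIELD_MAPPING = {
    "EventID": "event_id",
    "ShareName": "share_name",
    "SubjectUserName": "src_user",
}
LOGSOURCE_MAPPING = {("windows", "security"): 'index="wineventlog"'}


def map_fields(block):
    """Rename the fields of one detection block; unknown names are kept as-is."""
    return {FIELD_MAPPING.get(name, name): value for name, value in block.items()}


# Stage 3: a toy "backend" that renders the query in an invented syntax.
def render_block(block):
    parts = ['{}="{}"'.format(name, value) for name, value in block.items()]
    return "(" + " AND ".join(parts) + ")"


def convert(rule):
    source = rule["logsource"]
    prefix = LOGSOURCE_MAPPING[(source["product"], source["service"])]
    detection = rule["detection"]
    tokens = []
    for token in detection["condition"].split():
        if token in ("and", "or", "not"):
            tokens.append(token.upper())
        else:
            # Any other token is the name of a detection block.
            tokens.append(render_block(map_fields(detection[token])))
    return prefix + " " + " ".join(tokens)


query = convert(rule)
print(query)
# index="wineventlog" (event_id="5140" AND share_name="Admin$") AND NOT (src_user="*$")
```

Even in this toy form it is visible where the effort goes: the backend itself is small, while the mapping tables have to be filled in for every source and every SIEM.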
As a result, starting the conversion utility looks as follows:

The following are passed as parameters:
- Target SIEM
- The rule
- Mapping file for this SIEM
SOC Prime also has a ready-made UI for the conversion function (uncoder.io).

Pitfalls of conversion
After studying the mechanics of conversion, we ran into significant limitations that kept us from converting all our content to the Sigma format:
- The converter works only with the query. A correlation rule in a SIEM covers more aspects: the time window, aggregation, and actions based on the resulting alerts.
- Key features of individual SIEMs, such as ActiveLists, are not taken into account.
- Field mapping is not detailed enough: the mapping configuration describes the fields of only a few sources, so if your rule base covers several dozen different types of event sources, you will have to invest heavily in writing the mapping.
Rule Base
Let's see what the publicly available Sigma rule base contains. Currently, content is actively being added to two repositories:
- The main project repository
- SOC Prime Threat Detection Marketplace
The rule sets in the two repositories partially overlap.
SOC Prime also has a number of rules that are available only with a paid subscription; I do not consider their content in this article.
For analytics, we need the sigmatools library for Python and some programming skills.
To parse and load rules from a directory into a dictionary, you can use the following code:
    import itertools
    import pathlib

    from sigma.parser.collection import SigmaCollectionParser


    def alliter(path):
        """Recursively yield all files under path, skipping hidden entries."""
        for sub in path.iterdir():
            if sub.name.startswith("."):
                continue
            if sub.is_dir():
                yield from alliter(sub)
            else:
                yield sub


    def get_inputs(paths, recursive):
        """Return the list of rule files for the given paths."""
        if recursive:
            return list(itertools.chain.from_iterable(alliter(pathlib.Path(p)) for p in paths))
        return [pathlib.Path(p) for p in paths]


    BASE_PATH = [r'sigma\rules']  # directory with the rules (Windows-style path)
    rules_map = {}
    for sigmafile in get_inputs(BASE_PATH, True):
        # Open each file in a context manager so that it is always closed.
        with sigmafile.open(encoding='utf-8') as f:
            parser = SigmaCollectionParser(f)
            # Take the first rule of the document; keying by title drops duplicates.
            rule = next(iter(parser))
            rules_map[rule['title']] = rule
After deduplicating identical rules, the following picture emerges:

Within the list of unique rules, we get the following distributions:
By type of event source:

Slightly more detailed statistics:
- Windows ~ 80%
- Sysmon ~ 53%
- Proxy ~ 8%
- Linux ~ 4%
The current content focuses mainly on Windows, and on Sysmon in particular; there are few rules for other systems.
By content readiness:

It turns out that the developers of Sigma rules have marked fewer than 20% of all existing rules as stable.
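Distributions like these could be computed from the parsed rules along the following lines. This is a sketch under assumptions: the `rules` dict is a hand-made sample of plain dictionaries standing in for the parsed rule base, and the `share` helper is hypothetical. Counting products and services separately is presumably also why the Windows and Sysmon shares above add up to more than 100%: a Sysmon rule is at the same time a Windows rule.

```python
from collections import Counter

# Hypothetical, hand-made sample standing in for parsed rules keyed by title.
rules = {
    "Rule A": {"logsource": {"product": "windows", "service": "sysmon"}, "status": "experimental"},
    "Rule B": {"logsource": {"product": "windows", "service": "security"}, "status": "stable"},
    "Rule C": {"logsource": {"category": "proxy"}, "status": "experimental"},
    "Rule D": {"logsource": {"product": "linux"}},
}


def share(counter, total):
    """Convert absolute counts to rounded percentages of the total."""
    return {key: round(100 * count / total) for key, count in counter.items()}


by_product = Counter()
by_service = Counter()
by_status = Counter()
for rule in rules.values():
    source = rule.get("logsource", {})
    # Fall back to the category when no product is declared.
    by_product[source.get("product") or source.get("category", "unknown")] += 1
    if "service" in source:
        by_service[source["service"]] += 1
    by_status[rule.get("status", "not set")] += 1

total = len(rules)
product_share = share(by_product, total)
service_share = share(by_service, total)
status_share = share(by_status, total)
print(product_share)
print(service_share)
print(status_share)
```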
To summarize
Public sources contain a large number of rules. They are updated regularly, and rules for detecting indicators, and sometimes even the techniques, of the most high-profile APT campaigns appear quickly.
There are quite a few limitations on applying the rules in real life:
- There are a lot of rules for Microsoft Sysmon, which is rarely used in enterprise environments.
- There are many rules that actually perform IoC checks (hashes, IP addresses, URLs, User Agents). Such rules quickly become obsolete, and there are more effective mechanisms for finding IoCs than rules.
- There is a lot of experimental content, which imposes additional requirements for thorough testing before it is put into production.
At Infosecurity, we use the content of Sigma rules as an additional source of knowledge for more effective incident detection. If we find something interesting, we implement it in our own correlation rules, which take into account both the engine the rules run on (Apache Spark) and the specifics of the infrastructures and security tools we work with.