srublev March 5, 2019 at 11:30

Sigma rules. Craft or new standard for SOC

I am Sergey Rublev, head of SOC (Security Operations Center) at Infosecurity.
In this article, I will examine in detail the ambitious Sigma Rules project, whose motto is: "Sigma for logs is like Snort for traffic and Yara for files."

The article covers three aspects:

Applicability of the Sigma rule syntax for maintaining a knowledge base of threat detection scenarios
Capabilities of the tools that generate rules for off-the-shelf SIEM systems
Value of the current content of the public Sigma rule repositories for a SOC

Once upon a time, in a far, far galaxy

It all started a few years ago, when the trees were tall and our monitoring team was still small. We faced a lot of questions; almost any team that grows into a three-person line goes through this.

The questions arise for different reasons:

Team growth
Staff turnover
A large number of heterogeneous systems for monitoring

If you have to take over support of a SIEM that someone else configured, the number of questions grows like an avalanche.

Use Case Library

Worldwide experience in building monitoring centers has already produced a solution for organizing this chaos, and its name is the use case library. The purpose of each case is to comprehensively describe the solution to one problem within information security monitoring.

The knowledge recorded in each case may vary; we use the following set:

Objective - the task solved by the case
Threat - the threat that the detection rule is meant to detect
Stakeholders - people interested in this rule: information security / IT / business
Data Requirements - the data set required to identify a threat
Logic - threat detection rule logic
Testing - an algorithm for testing the correctness of the detection rule
Priority - priority of processing the events of the case (usually calculated from the potential damage of a successfully realized threat)
Output - a list of actions for investigating the alert, a description of the correct ways to close the investigation, and the data to be recorded in its results

Use case example for the task of detecting communication with the botnet management server (C&C or simply C2):

The example is significantly simplified; in reality, a properly described case grows into a multi-page document.

When the number of cases exceeded several dozen, we started looking for ready-made tools for maintaining such a knowledge base, preferably with a machine-friendly interface in addition to a human-friendly one.
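As an illustration of what a machine-friendly form of such a case might look like, here is a minimal sketch in Python. The field names simply mirror the list above; the `UseCase` class and all values in `c2_case` are hypothetical and are not taken from a real case or from any existing tool.

```python
from dataclasses import asdict, dataclass
import json


@dataclass
class UseCase:
    """One entry of a use case library; fields mirror the list above."""
    objective: str
    threat: str
    stakeholders: list
    data_requirements: list
    logic: str
    testing: str
    priority: str
    output: str


# Illustrative values only; they do not describe a real case.
c2_case = UseCase(
    objective="Detect hosts communicating with botnet C2 servers",
    threat="Malware on an internal host receives commands from a C2 server",
    stakeholders=["information security", "IT"],
    data_requirements=["proxy logs", "firewall logs", "feed of C2 addresses"],
    logic="Destination of an outbound connection is present in the C2 feed",
    testing="From a lab host, request a test address added to the feed",
    priority="high",
    output="Isolate the host; record source host, user and destination",
)

# Serialized form that a tool could store, search or exchange.
print(json.dumps(asdict(c2_case), indent=2))
```

A structure like this keeps the human-readable description and the machine-readable record in one place, which is the property we were looking for in a ready-made tool.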

Sigma Project

The Sigma project certainly deserves consideration in the context of the knowledge base on incident detection rules. It started in 2016, and I have been following it almost from the very beginning.

In essence, the project consists of:

Sigma rules themselves
Utilities for converting rules into queries for various SIEM systems

The SIEM list is impressive: almost all popular event analysis solutions are present. Let us go through everything in detail and in order.

Rule syntax

Sigma rules are YAML documents describing a scenario for detecting a specific attack. Syntactically, the rules consist of the following blocks:

Meta information

A descriptive part that structures the rules and simplifies searching for the ones you need.

title: Access to ADMIN$ Share
description: Detects access to $ADMIN share
author: Florian Roth
falsepositives: 
    - Legitimate administrative activity
level: low
tags:
    - attack.lateral_movement
    - attack.t1077
status: experimental

I would also like to note that many rules already reference the corresponding attack technique from the MITRE ATT&CK framework.

Data Source Declaration

A description of the source whose events the detection logic is built on.

logsource:
    product: windows
    service: security

The syntax makes it possible to describe both a specific service of a particular product and a whole category of systems.

Processing Logic Declaration

At the detection logic level, the following are described:

Searched patterns
Values of certain fields in the log
Time frame
Aggregate Functions

Logic can be trivial, for example, conditions imposed on a set of fields:

detection:
    selection:
        EventID: 5140
        ShareName: Admin$
    filter:
        SubjectUserName: '*$'
    condition: selection and not filter

and quite complicated:

detection:
    selection1:
        EventID:
            - 529
            - 4625
        UserName: '*'
        WorkstationName: '*'
    selection2:
        EventID: 4776
        UserName: '*'
        Workstation: '*'
    timeframe: 24h 
    condition:
        - selection1 | count(UserName) by WorkstationName > 3
        - selection2 | count(UserName) by Workstation > 3

The expressive power of the language, although not universal, is still quite broad and allows you to describe a large number of attack detection cases.
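To make the semantics of the two detection blocks above more tangible, here is a simplified sketch in Python that evaluates them against events represented as dicts. It is not how the Sigma tools work internally. It assumes case-insensitive wildcard matching, treats `count(field)` as a count of distinct values (my reading of the Sigma specification), and assumes the timeframe has already been applied to the list of events. All function names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(event, selection):
    """True if every field test of the selection matches the event.

    A list of values means "any of them"; '*' is a wildcard.
    Values are compared as lowercased strings.
    """
    for name, expected in selection.items():
        if name not in event:
            return False
        actual = str(event[name]).lower()
        options = expected if isinstance(expected, list) else [expected]
        if not any(fnmatchcase(actual, str(o).lower()) for o in options):
            return False
    return True


def admin_share_rule(event):
    """condition: selection and not filter"""
    selection = {"EventID": 5140, "ShareName": "Admin$"}
    filter_ = {"SubjectUserName": "*$"}
    return matches(event, selection) and not matches(event, filter_)


def distinct_count_by(events, selection, counted, group_by, threshold):
    """selection | count(counted) by group_by > threshold

    Returns the groups whose number of distinct `counted` values
    exceeds the threshold.
    """
    seen = {}
    for event in events:
        if matches(event, selection):
            seen.setdefault(event.get(group_by), set()).add(event.get(counted))
    return {group for group, values in seen.items() if len(values) > threshold}
```

For example, `admin_share_rule` fires for a user account opening the share and stays silent for a machine account (a name ending in `$`), while `distinct_count_by` with a threshold of 3 reports a workstation only after failed logons for four different user names.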

Rule Development Tools

In addition to your favorite text editor for YAML, a web UI from SOC Prime is also available. It lets you both validate the syntax of an already written rule and build rules manually from graphical blocks.

Sigma as a Knowledge Base Tool

A brief summary.

Currently, the rule syntax concentrates mainly on describing the threat detection logic and is not intended for a comprehensive description of a use case; accordingly, a full-fledged library cannot be maintained with Sigma rules alone.

For the use case structure that we have chosen, Sigma covers only half of the fields (Objective, Data Requirements, Logic and Priority).

Convert to various SIEM

Since we are a provider of SOC services, the idea of keeping all our correlation rules in a universal format and converting them to the format of the required SIEM at the implementation stage seemed very attractive to us.

The project includes console utilities that generate event queries in the format of various SIEMs. Let us look at what the conversion is and what is under its hood.

The conversion takes place in 3 stages:

Parsing rules - I think this is clear: the YAML document is split into its component blocks
Reduction to SIEM taxonomy - this stage is needed because normalization is implemented slightly differently in each SIEM system, so declarations from Sigma rules have to be reduced to the event taxonomy of the selected SIEM
Generating a query for SIEM - for this stage to work, one more component is required: a backend for this SIEM

In essence, the backend is a plug-in for the conversion utility that contains the logic for converting to the final query format of the SIEM. The detection and logsource blocks are converted using the previously applied field mapping, and additional SIEM-specific information is added.
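The three stages can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the code of the Sigma converter: the rule is given as an already parsed dict (stage 1), `FIELD_MAPPING` and `LOGSOURCE_MAPPING` are hypothetical mappings to an imaginary taxonomy (stage 2), and `to_query` is a hypothetical backend that emits a generic query string that is not the syntax of any particular SIEM (stage 3).

```python
# Stage 1 result: a rule already parsed from YAML into a dict.
rule = {
    "logsource": {"product": "windows", "service": "security"},
    "detection": {
        "selection": {"EventID": 5140, "ShareName": "Admin$"},
        "filter": {"SubjectUserName": "*$"},
        "condition": "selection and not filter",
    },
}

# Stage 2 input: hypothetical mappings to the taxonomy of an imaginary SIEM.
FIELD_MAPPING = {
    "EventID": "event_id",
    "ShareName": "share_name",
    "SubjectUserName": "src_user",
}
LOGSOURCE_MAPPING = {("windows", "security"): 'source="windows-security"'}


def map_fields(block, mapping):
    """Stage 2: rename fields; unmapped names are passed through unchanged."""
    return {mapping.get(name, name): value for name, value in block.items()}


def render_block(block):
    """Render one detection block as a conjunction of field tests."""
    return "(" + " AND ".join(f'{k}="{v}"' for k, v in block.items()) + ")"


def to_query(rule, field_mapping, logsource_mapping):
    """Stage 3: toy backend that handles one condition shape only."""
    detection = rule["detection"]
    if detection["condition"] != "selection and not filter":
        raise NotImplementedError("this sketch handles a single condition shape")
    logsource = rule["logsource"]
    source = logsource_mapping[(logsource["product"], logsource["service"])]
    selection = render_block(map_fields(detection["selection"], field_mapping))
    excluded = render_block(map_fields(detection["filter"], field_mapping))
    return f"{source} {selection} AND NOT {excluded}"


print(to_query(rule, FIELD_MAPPING, LOGSOURCE_MAPPING))
```

Even in this toy form it is visible that the quality of the result depends entirely on the mapping: a field missing from `FIELD_MAPPING` passes through under its Sigma name and will most likely match nothing in the target system.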

As a result, starting the conversion utility looks as follows:

The following are passed as parameters:

Target SIEM
The rule
Mapping file for this SIEM

SOC Prime also has a ready-made UI for the conversion function (uncoder.io).

Pitfalls of conversion

After studying the mechanics of conversion, we ran into significant limitations that kept us from moving all our rules to the Sigma format:

The converter works only with the query. A correlation rule in a SIEM covers more aspects: time window, aggregation, and actions taken when an alert is raised.
Key features of individual SIEMs, such as ActiveLists, are not taken into account.
Field mapping is not detailed enough. The mapping configuration describes the fields of only a few sources; accordingly, if your rule base covers several dozen types of event sources, you will have to invest heavily in writing the mapping.
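The size of the mapping gap can be estimated before committing to a conversion. The sketch below is a hypothetical helper, not part of the Sigma tools: it assumes rules are available as parsed dicts and that the mapping is a plain dict from Sigma field names to SIEM field names, and it lists the fields of each rule that the mapping does not cover.

```python
def detection_fields(rule):
    """Collect the field names used in the detection blocks of a parsed rule."""
    fields = set()
    for name, block in rule["detection"].items():
        if name in ("condition", "timeframe"):
            continue
        # A block is a dict of field tests or a list of such dicts.
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict):
                # Drop value modifiers such as "field|contains" if present.
                fields.update(key.split("|")[0] for key in item)
    return fields


def unmapped_fields(rules, mapping):
    """Map each rule title to the fields that the mapping does not cover."""
    gaps = {}
    for rule in rules:
        missing = detection_fields(rule) - set(mapping)
        if missing:
            gaps[rule["title"]] = sorted(missing)
    return gaps


# Illustrative input: one rule from this article and an incomplete mapping.
rules = [
    {
        "title": "Access to ADMIN$ Share",
        "detection": {
            "selection": {"EventID": 5140, "ShareName": "Admin$"},
            "filter": {"SubjectUserName": "*$"},
            "condition": "selection and not filter",
        },
    },
]
mapping = {"EventID": "event_id", "ShareName": "share_name"}

print(unmapped_fields(rules, mapping))
```

Run over a whole rule base, such a report gives a rough estimate of how much mapping work stands between the public rules and a particular SIEM.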

Rule Base

Let's see what the publicly available Sigma rule base contains. Currently, content is actively being added to two repositories:

The main project repository
SOC Prime Threat Detection Marketplace

The rules in the two repositories partially overlap.
SOC Prime also has a number of rules that are available only with a paid subscription; I do not consider their content in this article.

For analytics, we need the sigmatools library for Python and some programming skills.

To parse and load rules from a directory into a dictionary, you can use the following code:

import itertools
import pathlib

from sigma.parser.collection import SigmaCollectionParser


def alliter(path):
    """Recursively yield all files under path, skipping hidden entries."""
    for sub in path.iterdir():
        if sub.name.startswith("."):
            continue
        if sub.is_dir():
            yield from alliter(sub)
        else:
            yield sub


def get_inputs(paths, recursive):
    """Return the rule files for the given paths as a list of Path objects."""
    if recursive:
        return list(itertools.chain.from_iterable(alliter(pathlib.Path(p)) for p in paths))
    return [pathlib.Path(p) for p in paths]


# Directory with the rules inside a local clone of the Sigma repository
BASE_PATH = [str(pathlib.Path("sigma") / "rules")]

rules_map = {}
for sigmafile in get_inputs(BASE_PATH, True):
    # Close each file as soon as its first rule has been read
    with sigmafile.open(encoding="utf-8") as f:
        parser = SigmaCollectionParser(f)
        rule = next(iter(parser))
    # Rules with the same title overwrite each other
    rules_map[rule["title"]] = rule

Deduplicating the same rules, the following picture emerges:

Within the deduplicated list of rules, we obtain the following distributions.

By type of event source:

Slightly more detailed statistics:

Windows ~ 80%
Sysmon ~ 53%
Proxy ~ 8%
Linux ~ 4%

The current content focuses mainly on Windows, and on Sysmon in particular; there are only a few rules for all other systems.

By content readiness:

It turns out that the developers of Sigma rules have marked fewer than 20% of all existing rules as stable.
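Distributions of this kind can be computed with a few lines of Python once the rules are loaded. The sketch below assumes each rule is available as a plain dict with `logsource` and `status` keys (for example, the parsed YAML); the four sample rules are hypothetical and only demonstrate the calculation, so the resulting numbers have nothing to do with the real repositories.

```python
from collections import Counter


def percentages(values):
    """Return {value: share in percent}, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: round(100 * n / total) for value, n in counts.most_common()}


# Hypothetical, already deduplicated rules reduced to the fields of interest.
rules = [
    {"title": "Rule A", "status": "experimental", "logsource": {"product": "windows"}},
    {"title": "Rule B", "status": "stable", "logsource": {"product": "windows"}},
    {"title": "Rule C", "status": "experimental", "logsource": {"product": "linux"}},
    {"title": "Rule D", "status": "experimental", "logsource": {"category": "proxy"}},
]

# Use the product if it is declared, otherwise fall back to the category.
by_source = percentages(
    r["logsource"].get("product") or r["logsource"].get("category", "unknown")
    for r in rules
)
by_status = percentages(r.get("status", "unknown") for r in rules)

print(by_source)
print(by_status)
```

Counting by `service` in addition to `product` would separate Sysmon from the other Windows sources in the same way.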

To summarize

Publicly available sources contain a large number of rules. They are updated regularly, and rules for detecting the indicators, and sometimes even the techniques, of the most high-profile APT campaigns appear quickly.

There are many limitations on applying the rules in real life:

Many of the rules target Microsoft Sysmon, which is rarely used in enterprise environments.
Many rules actually perform IoC checks (hashes, IP addresses, URLs, User Agents). Such rules quickly become obsolete, and there are more effective mechanisms for finding IoCs than rules.
Much of the content is experimental, so thorough testing is required before it is put into production.

At Infosecurity, we use the content of Sigma rules as an additional source of knowledge for more effective incident detection. If we find something interesting, we implement it within our own correlation rules, which take into account both the engine the rules run on (Apache Spark) and the specifics of the infrastructures and security tools we work with.
