JSOC: the experience of a young Russian MSSP

    As part of the corporate blog, I would like to launch a series of articles about our young (but nevertheless very bright) information security initiative: JSOC (Jet Security Operation Center), a commercial center for monitoring and responding to incidents. In these articles I will try to keep self-promotion to a minimum and focus on practice: our experience and the principles of building services. That said, this is my first “Habr experience”, so please do not judge too harshly.

    SOC - Prerequisites


    I don’t really want to explain why a large Russian company needs a SOC at all (too many articles and studies have already been written on the subject). Statistics are a different matter, though, and it would be a sin not to recall them. For example:
    • in a company of 1 to 5 thousand people, the following is recorded over a year:
      • 90 million IS events;
      • 16,865 events with signs of an incident;
      • 109 real information security incidents;
    • total losses from IS incidents in 2013 amounted to $25 billion;
    • a large company uses at least 15 heterogeneous security tools, and no more than 7 of them actively analyze logs to identify incidents.
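
    To put the first three figures in proportion, here is a small arithmetic sketch. It uses only the numbers quoted above; the variable names and the choice of ratios are illustrative assumptions, not part of the original statistics.

```python
# Yearly figures quoted above for a company of 1 to 5 thousand people.
events = 90_000_000   # IS events
suspected = 16_865    # events with signs of an incident
confirmed = 109       # real information security incidents

# How many events out of every million look suspicious at all.
suspected_per_million = suspected / events * 1_000_000

# Share of suspicious events that turn out to be real incidents.
confirmed_share = confirmed / suspected

# Raw events that have to be processed per confirmed incident.
events_per_incident = events / confirmed

# Average number of suspicious events to triage per calendar day.
suspected_per_day = suspected / 365

print(f"suspected events per million: {suspected_per_million:.0f}")
print(f"share of suspected events confirmed: {confirmed_share:.2%}")
print(f"raw events per confirmed incident: {events_per_incident:,.0f}")
print(f"suspected events per day: {suspected_per_day:.1f}")
```

    In other words, well under one percent of suspicious events become real incidents, which is why automated filtering and correlation matter long before a person looks at anything.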

    Add another 3-4 news headlines on the topic, and the idea that security needs to be monitored, and that IS incidents need to be identified and analyzed, becomes entirely logical and understandable.

    What do security experts advise? Naturally, to make a SIEM solution the core of an existing or planned SOC. This solves several problems at once:
    • bring incidents that other systems record independently into a single incident management core;
    • get a convenient tool for searching for the necessary events, investigating incidents and storing the collected data;
    • identify statistical deviations and slowly developing incidents by analyzing long time intervals and large volumes of data from specific security tools;
    • compare and correlate data from different systems and, as a result, build complex chains of incident detection scenarios and “enrich” the logs of some systems with data from others.
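
    The last point, correlation and enrichment, can be illustrated with a minimal sketch. This is not how any particular SIEM implements its rules: the event fields, the directory contents and the rule itself (malware detected on a host, followed by a blocked outbound connection reported by another system) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str  # system that produced the record, e.g. "antivirus" or "proxy"
    host: str
    kind: str    # normalized event type


# Hypothetical directory used to enrich events with context.
HOST_DEPARTMENT = {"ws-017": "accounting", "srv-db1": "billing"}


def enrich(event: Event) -> dict:
    """Attach directory data that the source system's log does not contain."""
    return {"event": event, "department": HOST_DEPARTMENT.get(event.host, "unknown")}


def correlate(events, first_kind, second_kind, window):
    """Find second_kind events that follow first_kind on the same host,
    reported by a different system, within the given time window."""
    last_first = {}  # host -> most recent event of first_kind
    chains = []
    for event in sorted(events, key=lambda e: e.time):
        if event.kind == first_kind:
            last_first[event.host] = event
        elif event.kind == second_kind:
            first = last_first.get(event.host)
            if (first is not None and first.source != event.source
                    and event.time - first.time <= window):
                chains.append((enrich(first), enrich(event)))
    return chains


start = datetime(2014, 1, 1, 10, 0)
events = [
    Event(start, "antivirus", "ws-017", "malware_detected"),
    Event(start + timedelta(minutes=5), "proxy", "ws-017", "blocked_outbound"),
    Event(start + timedelta(hours=3), "proxy", "srv-db1", "blocked_outbound"),
]
chains = correlate(events, "malware_detected", "blocked_outbound", timedelta(minutes=15))
for first, second in chains:
    print(first["department"], first["event"].kind, "->", second["event"].kind)
```

    Neither event alone says much; the pair, reported by two different systems about the same host within a short window, is what makes a detection scenario.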

    A bit of general methodology


    There are several levels of SOC maturity - SOMM (Security Operations Maturity Model):

    Fig. 1 - SOMM Levels

    Unfortunately, most companies, having taken the first step towards their own incident monitoring center, stop there. According to HP estimates, 24% of SOCs worldwide do not reach level 1, and only 30% of SOCs correspond to the basic level (2). The distribution of SOMM levels by business sector, based on data collected in 13 countries (including Canada, the USA, China, Great Britain, Germany, South Africa and others), is as follows:

    Fig. 2 - Distribution of SOMM levels by business area

    SOC in-house: issues


    At the same time, almost all major Russian companies have gone down the path of deploying a large-scale SIEM solution. Did they manage to build effective SOCs? Unfortunately, most often not: today we know of only four successful SOC launches in Russia.

    As a rule, everyone who starts building their own SOC runs into three facets of the same problem.

    The first is a shortage of staff, which has a variety of causes: from a lack of headcount and the absence of specialized university programs to the difficulty of acquiring the required competencies. De facto, an IT security department today consists of 4-5 people who carry out the entire cycle of work to keep the company secure (from administering security tools to regular risk analysis and developing the company's security strategy). Naturally, with such a workload it is almost impossible to devote proper time to SOC tasks.

    The second is the impossibility of building an effective monitoring process with internal SLAs. Beyond the need to allocate personnel, launching a SOC usually means creating a full-time duty shift in the IT security department that works extended hours or around the clock. That means 2 to 5 new staff positions. Allocating personnel in turn brings the need to constantly manage staff turnover (information security specialists are rarely willing to work night shifts), build processes and control the quality of the work internally.

    The third is the need not only to handle emerging incidents but also to constantly “tune” and adjust the system to a changing infrastructure and emerging security threats. Regardless of the chosen tool, this is a very laborious task for an analyst, who has to keep up to date at all times. And having a person engaged purely in analytics and SOC development is a great luxury (even for a large company).

    Our assessment of current market demand for SOCs, coupled with the nuances described above, led us first to the idea and then to the actual construction of our own commercial SOC.

    Platform selection


    Naturally, when launching the SOC, the first question we faced was: “Which SIEM solution should be the core of our system?” To answer it, we drew up a list of requirements for the system. In particular, it should:
    • allow the accumulated data to be separated physically and logically between resource pools (in our case, between customers), with separate access rights;
    • allow the most complex chains and relationships between events to be built, and use various directories and events to supplement an incident with important information. What we needed here was a framework for building our own incident detection logic rather than ready-made rules and scripts;
    • make it possible to write and develop integration buses both towards source systems (where maximum flexibility in writing connectors to target systems and directories is key) and, through an API, towards external incident management, reporting and visualization systems;
    • allow internal resources to be customized for changing SOC tasks, in particular creating an internal profile for monitoring sources, maintaining and customizing our own incident management, etc. (these developments, by the way, will be the subject of a separate article).
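
    The first requirement, keeping each customer's data separate and checking access rights, can be sketched roughly as follows. This is only an illustration of the idea of logical separation; the class, its methods and the customer and analyst names are hypothetical and do not describe how any SIEM product actually stores data.

```python
class TenantStore:
    """Keeps each customer's events in a separate bucket and
    checks per-analyst access rights on every query."""

    def __init__(self):
        self._events = {}  # customer -> list of events
        self._grants = {}  # analyst -> set of customers they may read

    def grant(self, analyst, customer):
        self._grants.setdefault(analyst, set()).add(customer)

    def add_event(self, customer, event):
        self._events.setdefault(customer, []).append(event)

    def query(self, analyst, customer):
        # Refuse access unless the analyst was explicitly granted this customer.
        if customer not in self._grants.get(analyst, set()):
            raise PermissionError(f"{analyst} has no access to {customer}")
        # Return a copy so callers cannot modify the stored data.
        return list(self._events.get(customer, []))


store = TenantStore()
store.grant("analyst_a", "customer_1")
store.add_event("customer_1", "login failure on vpn gateway")
store.add_event("customer_2", "malware detected on workstation")

print(store.query("analyst_a", "customer_1"))
try:
    store.query("analyst_a", "customer_2")
except PermissionError as error:
    print("denied:", error)
```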

    We opted for the flagship product in the SIEM class, HP ArcSight (and, despite various difficulties over the life of the system, we have never regretted the choice). Technologically, JSOC has long been more than just HP ArcSight. The SIEM core has gradually been surrounded by various useful features: traffic monitoring, IPS/IDS, vulnerability assessment, etc. Along the way we have accumulated a large number of scripts, add-ons and in-house developments, integrated with our own Security Intelligence solution (JiVS), which is:
    • a tool for high-level search for anomalies at the client and for tracking general trends in activity and incidents;
    • a system for monitoring and visualizing how we meet our SLA commitments to the customer;
    • an effective visual dashboard and reporting system for the customer's business management.

    As a result, we have formed the following protection profiles / areas for detecting incidents at customer companies:
    • attacks on the company's external web resources;
    • unauthorized access to systems and applications;
    • comprehensive security for business applications;
    • virus and malware activity in the company’s network, including heuristic detection of zero-day viruses;
    • violation of the policies for using remote access to the company network;
    • illegitimate user actions when accessing the Internet and working with external devices;
    • anomalies in authentication and the use of accounts;
    • and other categories of incidents, depending on the company's infrastructure, its internal information security policies and the security tools in use.

    Infrastructure



    Fig. 3 - JSOC service infrastructure

    After choosing the main technology platform, we had to build the infrastructure and decide where to host it. The experience of our Western colleagues shows that the target availability of the architecture should be at least 99.5% (with maximum resilience to disasters). Geography remained a fundamental constraint: colocation was possible only within the borders of the Russian Federation, which ruled out popular Western providers. On top of this came the natural requirement to secure the infrastructure at every level of access, so by and large we had no choice: we turned to the team of our own data center. A dedicated segment of its large colocation area was allocated to JSOC, where we deployed our architecture and at the same time tightened the security profiles already in place in the data center. The IT infrastructure is deployed in our company's Tier 3 data center, whose availability is 99.8%. As a result, we reached the target availability of our service and gained considerable freedom to operate and adapt the system to our needs.
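
    To make the two availability figures more tangible, the sketch below converts a yearly availability percentage into the downtime it permits. It assumes a 365-day year and that availability is measured over the whole year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


target = allowed_downtime_hours(99.5)       # target from the text
data_center = allowed_downtime_hours(99.8)  # Tier 3 data center figure from the text

print(f"99.5% availability allows about {target:.1f} hours of downtime per year")
print(f"99.8% availability allows about {data_center:.1f} hours of downtime per year")
```

    Under these assumptions the 99.5% target tolerates a little under two days of downtime per year, and the data center's 99.8% leaves less than half of that.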

    Team


    At the initial stage of the service, the JSOC team consisted of 3 people: two monitoring engineers covering the interval from 8:00 to 22:00, and one analyst/administrator who developed the rules. The SLA agreed with customers was also fairly lenient: the reaction time to a detected incident was up to 30 minutes, and the time for analysis, preparing an analytical report and informing the client was up to 2 hours. But after the first months of work we drew several very significant conclusions:
    1. The monitoring shift must work 24/7. Although the volume of incidents is significantly lower in the evening and at night, the most important and critical events (the start of DDoS attacks, the final phases of slow penetration attacks through the external perimeter, malicious actions by contractors, etc.) occur precisely at night, and by the time the morning shift starts they have already lost their relevance.
    2. Analysis of a critical incident should take no more than 30 minutes. Otherwise, the chances of preventing it or significantly reducing the damage drop catastrophically.
    3. To meet the required analysis time, a full-fledged investigation toolkit should be prepared for each incident: active channels with filtered target events, trends showing statistical changes in suspicious activity, and targeted analytical reports that make it possible to analyze activity quickly and make operational decisions.
    4. The team that administers security tools for our clients should be separate from the incident monitoring and detection team. Otherwise, the human factor in the chain “made a configuration change - recorded an incident - marked it as a false positive” could significantly affect the quality of our service.

    In practice, these conclusions led to the creation of a separate structural unit within the Information Security Center of Jet Infosystems, built on a three-level model for each of its tasks: both monitoring and analyzing incidents, and administering security tools. The unit now has more than 30 people, has an established structure (see Fig. 4) and includes:
    • 2 duty shifts that work 24/7: one handles monitoring and analysis of incidents, the other handles system administration;
    • a dedicated development team that is kept away from day-to-day operations for our clients and allows us to keep the service and the threat monitoring profile up to date.


    Fig. 4 - JSOC Organizational Structure

    This organizational structure allowed us to reach the following SLA targets:
    Jet Security Operation Center options                                       Base              Advanced          Premium
    Service time                                                                8x5               24x7              24x7
    Incident detection time (min)                           Critical incidents  15-30             10-20             5-10
                                                            Other incidents     up to 60          up to 60          up to 45
    Basic diagnostics and customer notification time (min)  Critical incidents  45                30                20
                                                            Other incidents     up to 120         up to 120         up to 90
    Time to issue countermeasure recommendations            Critical incidents  up to 2 hours     up to 1.5 hours   up to 45 min
                                                            Other incidents     up to 8 hours     up to 6 hours     up to 4 hours
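
    The SLA targets above lend themselves to a simple lookup structure. The sketch below is an illustration only, not JSOC's actual tooling: it converts all limits to minutes, takes the upper bound of each detection range as the limit, and assumes that all three timers are measured from the same starting moment. The names `SlaLimits`, `SLA` and `breaches` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaLimits:
    detection_min: int       # incident detection time
    notification_min: int    # basic diagnostics and customer notification
    recommendation_min: int  # issuing countermeasure recommendations


# Key: (service option, is the incident critical?). Values come from the table above.
SLA = {
    ("Base", True): SlaLimits(30, 45, 120),
    ("Base", False): SlaLimits(60, 120, 480),
    ("Advanced", True): SlaLimits(20, 30, 90),
    ("Advanced", False): SlaLimits(60, 120, 360),
    ("Premium", True): SlaLimits(10, 20, 45),
    ("Premium", False): SlaLimits(45, 90, 240),
}


def breaches(option, critical, detection, notification, recommendation) -> list:
    """Return the names of the SLA stages whose limit (in minutes) was exceeded."""
    limits = SLA[(option, critical)]
    exceeded = []
    if detection > limits.detection_min:
        exceeded.append("detection")
    if notification > limits.notification_min:
        exceeded.append("notification")
    if recommendation > limits.recommendation_min:
        exceeded.append("recommendation")
    return exceeded


# A critical incident for a Premium customer: detected in 8 minutes,
# customer notified after 25 minutes, recommendations issued after 40 minutes.
result = breaches("Premium", True, detection=8, notification=25, recommendation=40)
print(result)
```

    In this example only the notification stage exceeds its limit (25 minutes against 20), so the function reports a single breached stage.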

    At the moment we serve 12 clients and handle the tasks of ensuring their information security. These are the results of JSOC's first year and a half.



    I hope this material did not read too much like marketing. In future articles, we plan to cover topics such as:
    • SOC availability: what it is, what it consists of and how to measure it;
    • how long the path is from a correlation rule in a SIEM to a working incident detection scenario;
    • organizational issues: what to teach SOC specialists and what not to;
    • a little practice in analyzing incidents.



    To be continued ...;) dryukov
