Artificial intelligence in the service of network security. Part 2

    This is Part 2 of the series; Part 1 is available via the link.

    In our case, the Introspect behavioral analytics system, a product in the User and Entity Behavior Analytics (UEBA) class, serves as a single entry point for a large volume of diverse machine data collected from the existing infrastructure, including from SIEM systems. Using machine learning and artificial intelligence algorithms, it helps security staff automate the routine work of analyzing a large number of incidents.

    Moreover, the system can integrate with existing network access control (NAC) systems to take various actions against the sources of anomalous behavior on the network: disconnect them, throttle them, move them to another VLAN, and so on.



    What data should Introspect receive as input? Almost any, up to and including raw network traffic. For this purpose the system has a dedicated traffic-processing component, the Packet Processor (PP).

    An advantage of receiving data from SIEM systems is that it has already been pre-processed (parsed) by those systems. Introspect works with SIEMs such as SPLUNK, QRadar and ArcSight; support for LogRhythm (raw syslog) and Intel Nitro is next on the roadmap. In addition, the system collects a wide array of data:

    MS Active Directory (AD security logs, AD user, group, group membership), MS LDAP logs
    DHCP logs: MS DHCP, Infoblox DHCP, dnsmasq DHCP
    DNS logs: MS DNS, Infoblox DNS, BIND
    Firewall logs: Cisco ASA (syslog), Fortinet (via SPLUNK), Palo Alto (via SPLUNK), Checkpoint (via SPLUNK), Juniper (via SPLUNK)
    Proxy logs: Bluecoat, McAfee, ForcePoint
    Alerts: FireEye, MS ATA
    VPN logs: Cisco AnyConnect / WebVPN, Juniper VPN (via SPLUNK), Juniper Pulse Secure (via SPLUNK), Fortinet VPN (via SPLUNK), Checkpoint VPN, Palo Alto VPN
    Flow logs: NetFlow v5, v7, v9
    Email logs: IronPort ESA
    Bro logs: connection logs

    A competitive advantage of the system is its ability to operate at the transaction level, i.e. with the network traffic that complements the information received in log messages. This gives the system additional analytical capabilities: analysis of DNS queries and tunneled traffic, and efficient detection of attempts to move sensitive data outside the organization's perimeter. In addition, the system provides packet entropy analytics, analysis of HTTPS headers and files, and analysis of cloud application activity.
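
    To make the packet entropy idea concrete, here is a minimal sketch (not Introspect's actual implementation): encrypted or compressed payloads, typical of tunnels and exfiltration channels, have entropy close to the 8 bits-per-byte maximum, while plain-text protocols score much lower.

        # Shannon entropy of a payload, in bits per byte (0.0 .. 8.0)
        import math
        import os
        from collections import Counter

        def shannon_entropy(payload: bytes) -> float:
            if not payload:
                return 0.0
            counts = Counter(payload)
            total = len(payload)
            return -sum((c / total) * math.log2(c / total) for c in counts.values())

        print(shannon_entropy(b"GET /index.html HTTP/1.1"))  # plain text: roughly 3-4
        print(shannon_entropy(os.urandom(1024)))             # random/encrypted: close to 8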

    The packet processors (PP) mentioned above come in virtual and hardware form, operate at speeds of up to 5-6 Gbps, perform deep packet inspection (DPI) of the raw data, extract contextual information or packet metadata from it, and pass the result on to another component of the system, the Analyzer.
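
    As an illustration of what "extracting packet metadata" means, the sketch below pulls a few fields per packet from an offline capture with scapy. It is purely conceptual: the real PP works at line rate, and its actual output schema is not public, so the record fields here are assumptions.

        # Conceptual packet-metadata extraction from a pcap file (illustrative only)
        from scapy.all import rdpcap, IP, TCP, UDP, DNS

        def packet_metadata(pcap_path: str):
            for pkt in rdpcap(pcap_path):
                if IP not in pkt:
                    continue
                record = {
                    "src": pkt[IP].src,
                    "dst": pkt[IP].dst,
                    "proto": pkt[IP].proto,
                    "length": len(pkt),
                }
                if TCP in pkt:
                    record["sport"], record["dport"] = pkt[TCP].sport, pkt[TCP].dport
                elif UDP in pkt:
                    record["sport"], record["dport"] = pkt[UDP].sport, pkt[UDP].dport
                if DNS in pkt and pkt[DNS].qdcount:
                    qd = pkt[DNS].qd
                    qd = qd[0] if isinstance(qd, list) else qd  # field type varies by scapy version
                    record["dns_query"] = qd.qname.decode(errors="ignore")
                yield record

        # for rec in packet_metadata("capture.pcap"):
        #     forward_to_analyzer(rec)   # hypothetical next step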

    If analysis decisions are to be based not only on logs but also on traffic delivered via SPAN/TAP, or through a packet broker or tap such as Gigamon or Ixia, the PP must be placed at the right points in the network. For maximum effectiveness it should capture all traffic flowing between each user VLAN and the Internet, as well as traffic between users and protected resources or servers holding critical information.

    A necessary and key component of the system is the Analyzer. It processes data from logs, flows, packet metadata, alerts from third-party systems, threat intelligence feeds and other sources.

    The Analyzer can be delivered as a single 2RU appliance, as a horizontally scalable scale-out cluster built from 1RU appliances, or as a cloud solution.

    Logical structure

    Logically, the Analyzer is a horizontally scalable Hadoop platform consisting of several types of nodes: Edge Nodes, Index and Search nodes, and Hadoop data nodes.
    Edge Nodes receive data and write it into Flume channels with HDFS sinks.

    Index and Search nodes retrieve information from three types of stores: HBase, Parquet and Elasticsearch.
    Hadoop data nodes are used for data storage.

    Logically, the system works as follows: packet metadata, flows, logs, alerts and threat feeds are parsed, cached, distilled and correlated. The system binds users to their data, caching the fast-changing user data in HDFS.
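
    As a rough illustration of the "parse, batch, persist" flow (not the actual Flume/HDFS pipeline), the sketch below converts a small batch of parsed events into a columnar Parquet file with pyarrow; in a real deployment such files would land in HDFS and be read by the Index and Search nodes.

        # Batch parsed events and persist them in Parquet (illustrative stand-in
        # for the Flume -> HDFS ingestion path)
        import pyarrow as pa
        import pyarrow.parquet as pq

        batch = {
            "ts":     ["2018-06-01T10:00:00", "2018-06-01T10:00:07"],
            "user":   ["alice", "alice"],
            "event":  ["logon", "dns_query"],
            "src_ip": ["10.0.0.5", "10.0.0.5"],
        }
        table = pa.Table.from_pydict(batch)
        pq.write_table(table, "events-batch-0001.parquet")  # an HDFS path in a real deployment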

    Next, the data goes to the discrete analytics module, where so-called discrete alerts are filtered out based on fixed events or field values in the incoming information. For example, activity of a DNS DGA algorithm or an attempt to log into a blocked account clearly does not require any machine learning analytics to be recognized as a potentially dangerous event. At this stage the behavioral analytics module is attached only as a reader of potential events on the network.
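
    A toy example of such a discrete rule is shown below: it flags an attempt to log into a disabled (blocked) account directly from a parsed log record, with no machine learning involved. The field names are assumptions, and the Windows event ID 4625 with sub-status 0xC0000072 ("account disabled") is used purely for illustration.

        # Discrete alert: logon attempt against a blocked account (illustrative schema)
        def discrete_alerts(parsed_events):
            for ev in parsed_events:
                if ev.get("event_id") == 4625 and ev.get("sub_status") == "0xC0000072":
                    yield {
                        "alert": "Logon attempt to a blocked account",
                        "user": ev.get("user"),
                        "host": ev.get("host"),
                        "severity": "high",
                    }

        events = [
            {"event_id": 4624, "user": "alice", "host": "ws-01"},
            {"event_id": 4625, "sub_status": "0xC0000072", "user": "bob", "host": "ws-02"},
        ]
        for alert in discrete_alerts(events):
            print(alert)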

    The next step is event correlation, indexing and storage in the stores mentioned above. The behavioral analytics engine works on top of the stored information and can operate over defined time periods or compare a given user's behavior with that of other users. This is the so-called baselining mechanism of behavior profiling. Behavior-profiling models are built with machine learning algorithms such as SVD, RBM, BayesNet, K-means and decision trees.
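
    As a minimal sketch of baselining with one of the listed algorithms (K-means), the example below clusters a user's historical daily activity with scikit-learn and flags a new day whose distance to the nearest cluster centre is far above the baseline. The features and thresholds are invented for illustration; Introspect's actual models and feature set are not public.

        # Baselining a user's daily activity with K-means (illustrative only)
        import numpy as np
        from sklearn.cluster import KMeans

        # One row per day: [logon_count, MB_uploaded, distinct_hosts_accessed]
        baseline = np.array([
            [12, 40, 3], [10, 35, 2], [11, 50, 3], [13, 45, 4],   # typical weekdays
            [2, 5, 1],   [1, 3, 1],                               # typical weekends
        ], dtype=float)

        # Two clusters roughly capture the "weekday" and "weekend" profiles.
        kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(baseline)
        baseline_dist = np.min(kmeans.transform(baseline), axis=1)
        threshold = baseline_dist.mean() + 3 * baseline_dist.std()

        new_day = np.array([[140, 900, 25]], dtype=float)         # today's activity
        distance = np.min(kmeans.transform(new_day), axis=1)[0]
        print("deviates from baseline:", distance > threshold)    # True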

    The integrated behavioral analytics model of the product is shown in Figure 1.


    Figure 1

    The diagram shows that the behavioral analytics mechanism is built from four blocks:

    • data sources;
    • conditions of working with the data (access time, the amount of uploaded or downloaded data, the number of e-mail messages, geolocation of the source or destination of the information, VPN connection, etc.);
    • mechanisms for profiling user behavior (evaluation of behavior over time or relative to another employee, the time window over which the analysis is performed, and the mathematical model of behavior profiling: SVD, Restricted Boltzmann Machine (RBM), BayesNet, K-means, decision tree and others);
    • detection of anomalies in traffic using mathematical models such as the Mahalanobis distance and energy distance, and generation of events in the system with a certain priority and kill-chain stage (a minimal Mahalanobis-distance sketch follows this list).
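
    The sketch below scores a new observation against a baseline distribution using the Mahalanobis distance, with numpy only. The traffic features are invented for illustration; a pseudo-inverse is used for numerical robustness.

        # Anomaly scoring with the Mahalanobis distance (illustrative features)
        import numpy as np

        # Baseline observations: [bytes_out_MB, distinct_destinations, dns_queries_per_hour]
        baseline = np.array([
            [40, 12, 310], [35, 10, 260], [50, 14, 330],
            [45, 11, 285], [38, 13, 320], [42, 12, 300],
        ], dtype=float)

        mean = baseline.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(baseline, rowvar=False))

        def mahalanobis(x: np.ndarray) -> float:
            diff = x - mean
            return float(np.sqrt(diff @ cov_inv @ diff))

        print(mahalanobis(np.array([41, 12, 300.0])))    # near the baseline -> small
        print(mahalanobis(np.array([400, 60, 2500.0])))  # far from the baseline -> large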

    Aruba Introspect has over 100 supervised and unsupervised models designed to detect targeted attacks at every stage of the CKC model. For example, the Advanced edition of Introspect detects:

    • Suspicious network activity types: Abnormal Asset Access, Abnormal Data Usage, Abnormal Network Access, Adware Communication, Bitcoin Application (Bitcoin Mining), Botnet (TeslaCrypt, CryptoWall), Cloud Exfiltration, HTTP Protocol Anomaly (Header Misspellings, Header Misordering), Hacker Tool Download, IOC attack types (IOC-STIX Abuse-ch, IOC-STIX CybercrimeTracker, IOC-STIX EmergingThreatsRules and others), Network Scan, P2P Application, Remote Command Execution, SSL Protocol Violation, Spyware Comm, Suspicious Data Usage, Suspicious External Access, Suspicious File, Suspicious Outbound Comm, WebShell, Malware Communication, Command and Control, Lateral Movement, Data Exfiltration, Browser Exploit, Beaconing, SMB Execution, Protocol Violation, Internal Reconnaissance and more.
    • Suspicious access to accounts such as Abnormal Account Activity, Abnormal Asset Access, Abnormal Logon, Privilege Escalation, Suspicious Account Activity, Suspicious User Logon and others
    • Data access via VPN: Abnormal Data Usage, Abnormal Logon, Abnormal User Logon
    • DNS data analysis: Botnet work through various DNS DGA algorithms
    • Email Analysis: Abnormal Incoming Email, Abnormal Outgoing Email, Suspicious Attachment, Suspicious Email

    Next, on the basis of the identified anomalies, an event is assigned a risk score tied to a particular stage of system compromise according to Lockheed Martin's Cyber Kill Chain (CKC). The risk score is determined with a Hidden Markov Model, unlike competitors, which simply increase or decrease the risk score linearly in their calculations.
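
    A toy illustration of the Hidden Markov Model idea is sketched below: the hidden states are CKC stages, the observations are alert categories, and a forward-filtering step maintains a belief over the stages that can drive a risk score. All probabilities are invented for illustration and have nothing to do with Introspect's real model.

        # Forward filtering over kill-chain stages (all numbers illustrative)
        import numpy as np

        stages = ["benign", "infection", "internal_recon", "command_control",
                  "lateral_movement", "exfiltration"]
        alerts = ["none", "malware_comm", "network_scan", "beaconing",
                  "smb_execution", "suspicious_data_usage"]

        # Transition matrix: attacks tend to progress forward through the kill chain.
        A = np.array([
            [0.90, 0.10, 0.00, 0.00, 0.00, 0.00],
            [0.05, 0.60, 0.35, 0.00, 0.00, 0.00],
            [0.05, 0.00, 0.55, 0.40, 0.00, 0.00],
            [0.05, 0.00, 0.00, 0.55, 0.40, 0.00],
            [0.05, 0.00, 0.00, 0.00, 0.55, 0.40],
            [0.05, 0.00, 0.00, 0.00, 0.00, 0.95],
        ])
        # Emission matrix: which alert categories each stage tends to produce.
        B = np.array([
            [0.95, 0.01, 0.01, 0.01, 0.01, 0.01],
            [0.30, 0.60, 0.04, 0.04, 0.01, 0.01],
            [0.30, 0.05, 0.60, 0.03, 0.01, 0.01],
            [0.30, 0.05, 0.02, 0.60, 0.02, 0.01],
            [0.30, 0.02, 0.03, 0.03, 0.60, 0.02],
            [0.30, 0.02, 0.02, 0.03, 0.03, 0.60],
        ])

        belief = np.array([1.0, 0, 0, 0, 0, 0])  # start in the "benign" state
        observed = ["malware_comm", "network_scan", "beaconing", "suspicious_data_usage"]
        for obs in observed:
            belief = (belief @ A) * B[:, alerts.index(obs)]  # forward step
            belief /= belief.sum()
            # Risk grows as probability mass shifts towards later kill-chain stages.
            risk = float(belief @ np.arange(len(stages))) / (len(stages) - 1)
            print(f"{obs:22s} risk = {risk:.2f}")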

    As an attack develops along the CKC model, i.e. through the stages of infection, internal reconnaissance, command & control, privilege escalation, lateral movement and exfiltration, the risk score increases. See Fig. 2.


    Fig.2

    The system has adaptive learning functions: the results of the analytics module are revised or adjusted, either in the risk scoring itself or when an entity is placed on a whitelist.

    Threat information (threat feeds) can be downloaded from external sources using the STIX and TAXII mechanisms; the Anomali service is also supported. Introspect can also download a whitelist of domain names from the Alexa service to reduce false positives when generating alerts.
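
    As a minimal sketch of the whitelist idea, the example below loads a ranked domain list (the Alexa top-sites CSV uses "rank,domain" rows) and drops alerts for whitelisted domains. The file name and the alert structure are assumptions for illustration.

        # Suppress alerts for whitelisted domains (illustrative file name and schema)
        import csv

        def load_whitelist(path: str, top_n: int = 100_000) -> set:
            with open(path, newline="") as f:
                return {row[1].strip().lower()
                        for i, row in enumerate(csv.reader(f)) if i < top_n}

        def suppress_whitelisted(alerts, whitelist):
            for alert in alerts:
                if alert.get("domain", "").lower() not in whitelist:
                    yield alert

        whitelist = load_whitelist("top-1m.csv")
        alerts = [{"domain": "google.com", "type": "beaconing"},
                  {"domain": "xj4k2q9vbl7m.info", "type": "beaconing"}]
        print(list(suppress_whitelisted(alerts, whitelist)))  # only the unknown domain remains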

    The competitive advantages of the system are:

    • a wide variety of input data,
    • the DPI function,
    • correlation of security events with the user rather than the IP address, without additional software,
    • use of the Hadoop/Spark big data platform as the foundation, with virtually unlimited clustering capabilities,
    • analytics-driven results and the ability to investigate incidents with full-context forensics and threat hunting,
    • integration with the existing Clearpass NAC solution,
    • agentless operation (nothing is installed on the endpoint),
    • near-complete independence from the network infrastructure vendor,
    • on-premise operation, with no need to send data to the cloud.

    The system is available in two editions: Standard Edition and Advanced Edition. The Standard Edition is tailored to Aruba Networks equipment and receives log information from AD, AMON, LDAP, firewall and VPN logs.
