Detection of known malicious code in TLS-encrypted traffic (without decryption)

    This article discusses work by a team of Cisco researchers showing that traditional statistical and behavioral analysis methods can detect and attribute malware that uses TLS to encrypt its communication channels, without decrypting or otherwise compromising the TLS session. It also describes the Encrypted Traffic Analytics solution, which implements the principles established in this study.

    The widespread use of TLS by malware has created new problems for security tools, because traditional detection methods based on pattern (signature) matching do not work on encrypted traffic. Nevertheless, TLS still exposes a whole range of parameters to a third-party observer, and these can be used to search for malware both by analyzing traffic on the client side, where the encrypted connection runs, and by analyzing requests to the server toward which the encrypted tunnel is built. We analyze only the establishment of the secure connection, with no access to the confidential data being transmitted and without decrypting it. In most cases, such analysis makes it possible to attribute a connection to a particular malware family fairly accurately, even when we are dealing with a single fully encrypted connection.

    To test this hypothesis, a group of Cisco researchers, Blake Anderson, Subharthi Paul and David McGrew, carried out a detailed study of exactly how malicious and enterprise applications use TLS, "Deciphering Malware's use of TLS (without Decryption)" (a preprint is freely available at arxiv.org/abs/1607.01639). Several million TLS-encrypted connections were analyzed, and the ability to attribute 18 malware families was verified using thousands of unique malware samples and tens of thousands of malicious TLS connections. One of the most important results of this work was verifying that the sandbox detection mechanisms and the other analysis tools used were working correctly.
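
    To make concrete what a passive observer can see, here is a minimal Python sketch (our illustration, not code from the study) that extracts the cleartext ClientHello fields, namely the offered protocol version, cipher suites and extension types, from a single TLS record. It assumes the whole ClientHello fits in one record and performs no validation beyond what slicing and struct provide; `parse_client_hello` is a hypothetical helper name.

```python
import struct

def parse_client_hello(record: bytes) -> dict:
    """Extract the cleartext fields of a TLS ClientHello (sketch; assumes one record)."""
    # TLS record header: content type (1 byte), legacy version (2), length (2)
    content_type, _legacy_version, rec_len = struct.unpack("!BHH", record[:5])
    if content_type != 0x16:  # 22 = handshake
        raise ValueError("not a TLS handshake record")
    body = record[5:5 + rec_len]
    # Handshake header: message type (1 byte), length (3 bytes)
    if body[0] != 0x01:  # 1 = ClientHello
        raise ValueError("not a ClientHello")
    pos = 4
    # For TLS 1.3 this field is fixed at 0x0303; the real version is in an extension
    client_version = struct.unpack("!H", body[pos:pos + 2])[0]
    pos += 2 + 32  # skip the version and the 32-byte client random
    pos += 1 + body[pos]  # skip the length-prefixed session ID
    cs_len = struct.unpack("!H", body[pos:pos + 2])[0]
    pos += 2
    cipher_suites = [struct.unpack("!H", body[i:i + 2])[0]
                     for i in range(pos, pos + cs_len, 2)]
    pos += cs_len
    pos += 1 + body[pos]  # skip the length-prefixed compression methods
    extensions = []
    if pos + 2 <= len(body):  # the extensions block is optional
        end = pos + 2 + struct.unpack("!H", body[pos:pos + 2])[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack("!HH", body[pos:pos + 4])
            extensions.append(ext_type)
            pos += 4 + ext_len
    return {"client_version": client_version,
            "cipher_suites": cipher_suites,
            "extensions": extensions}
```

    Fields like these (which suites are offered, which extensions appear, in what order) are the kind of unencrypted handshake metadata that the analysis relies on.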

    The performance of the malware classifier correlates well with how a given malware family uses TLS: families that make more extensive use of cryptographic functionality are harder to classify.

    We have shown that malware and legitimate applications use TLS differently, and that these differences can be used to build behavioral detection rules or machine-learning classifiers.
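
    As an illustration of the classifier side only, here is a minimal pure-Python logistic-regression sketch. The three features (whether the client offers obsolete cipher suites, a scaled count of advertised extensions, whether the server certificate is self-signed) and the toy training data are hypothetical; they are not the feature set or data used in the study.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def malware_score(w, b, x) -> float:
    """Estimated probability that a connection with features x is malicious."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features per connection:
# [offers obsolete cipher suites (0/1), advertised extensions / 10, self-signed cert (0/1)]
X = [[1, 0.2, 1], [1, 0.3, 1], [1, 0.1, 0],   # toy "malware" connections
     [0, 1.0, 0], [0, 0.9, 0], [0, 1.2, 0]]   # toy "benign" connections
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
```

    A production system would use a mature library and far richer features; the point here is only that handshake differences become numeric inputs to an ordinary classifier.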

    How and where can this information be obtained? It can be collected directly on network devices: switches and routers that export unsampled network telemetry (NetFlow/IPFIX) describing each connection, and that also export the first data packet of an encrypted TLS connection (the Initial Data Packet, IDP) for analysis of TLS metadata. Related DNS and HTTP information can also be collected to increase detection accuracy and reduce errors, together with global reputation and suspicious-behavior data from the cloud reputation center.
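
    A rough Python sketch of this collection model: an unsampled flow cache that updates a record for every packet and keeps the payload of the first data-carrying packet as the IDP. The class and field names are hypothetical and do not correspond to any NetFlow/IPFIX exporter's API; records are keyed unidirectionally by the 5-tuple.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (source IP, destination IP, source port, destination port, IP protocol)
FlowKey = Tuple[str, str, int, int, int]

@dataclass
class FlowRecord:
    packets: int = 0
    octets: int = 0
    first_seen: Optional[float] = None
    last_seen: Optional[float] = None
    idp: Optional[bytes] = None  # payload of the first packet that carries data

class FlowCache:
    """Unsampled, unidirectional flow cache: every packet updates its flow record."""

    def __init__(self) -> None:
        self.flows: Dict[FlowKey, FlowRecord] = {}

    def observe(self, ts: float, key: FlowKey, ip_length: int, payload: bytes) -> FlowRecord:
        rec = self.flows.setdefault(key, FlowRecord())
        rec.packets += 1
        rec.octets += ip_length
        if rec.first_seen is None:
            rec.first_seen = ts
        rec.last_seen = ts
        if rec.idp is None and payload:  # handshake-only packets (SYN, ACK) carry no data
            rec.idp = payload
        return rec

def looks_like_client_hello(idp: Optional[bytes]) -> bool:
    """True if the IDP starts with a TLS handshake record containing a ClientHello."""
    return idp is not None and len(idp) >= 6 and idp[0] == 0x16 and idp[5] == 0x01
```

    Keeping only the IDP, rather than the full packet stream, is what makes TLS metadata available to the analyzer without storing or decrypting application data.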

    The architecture of the solution is as follows:

    [Figure: solution architecture]

    An example of using this technology to collect information about the cryptographic parameters in use (for regulatory compliance, for example a PCI DSS audit):

    [Figure: audit of cryptographic parameters]
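
    A compliance audit of this kind comes down to comparing negotiated parameters with a policy. The sketch below flags connections that use a protocol version below a configurable minimum or one of a few well-known weak cipher suites. The policy (TLS 1.2 as the minimum, the three listed suites) is an illustrative assumption, not a restatement of PCI DSS requirements, and the weak-suite list is far from complete.

```python
# Protocol version codes as they appear on the wire
TLS_VERSIONS = {0x0300: "SSL 3.0", 0x0301: "TLS 1.0", 0x0302: "TLS 1.1",
                0x0303: "TLS 1.2", 0x0304: "TLS 1.3"}

# Illustrative (deliberately incomplete) list of weak cipher suites
WEAK_SUITES = {0x0004: "TLS_RSA_WITH_RC4_128_MD5",
               0x0005: "TLS_RSA_WITH_RC4_128_SHA",
               0x000A: "TLS_RSA_WITH_3DES_EDE_CBC_SHA"}

def audit_connection(version: int, cipher_suite: int, min_version: int = 0x0303) -> list:
    """Return policy findings for one connection's negotiated version and suite."""
    findings = []
    if version < min_version:
        name = TLS_VERSIONS.get(version, hex(version))
        findings.append(f"protocol below policy minimum: {name}")
    if cipher_suite in WEAK_SUITES:
        findings.append(f"weak cipher suite: {WEAK_SUITES[cipher_suite]}")
    return findings
```

    For example, a TLS 1.0 connection negotiating `TLS_RSA_WITH_RC4_128_SHA` produces two findings, while TLS 1.2 with `0xC02F` (`TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`) produces none.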

    Malware detection (using information from the global Cisco Cognitive Threat Analytics (CTA) cloud service):

    [Figure: malware detection using CTA cloud information]

    Malware detection (correlation of global and local information):

    [Figure: malware detection by correlating global and local information]

    An example of an incident investigation:

    [Figure: incident investigation]

    Confirmed threat:

    [Figure: confirmed threat]

    When building machine-learning classifiers to attribute connections to a particular malware family, it became clear that some families are harder to detect than others. Our goal was not only to detect traces of malware in the traffic but to do so optimally, by identifying which parameters support more accurate conclusions for a given malware family and which are less informative.
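
    The article does not say how the informativeness of individual parameters was measured. One standard technique, swapped in here purely for illustration, is information gain (the mutual information between a discrete feature and the family label). The sketch below computes it in pure Python; features with higher gain say more about which family a connection belongs to. The feature names in the usage note are hypothetical.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels) -> float:
    """Reduction in label entropy obtained by knowing a discrete feature's value."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature_values).items():
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional

def rank_features(features: dict, labels) -> list:
    """Sort feature names from most to least informative about the labels."""
    return sorted(features,
                  key=lambda name: information_gain(features[name], labels),
                  reverse=True)
```

    For instance, with families `["A", "A", "B", "B"]`, a hypothetical `tls_version` feature `["1.0", "1.0", "1.2", "1.2"]` has a gain of 1 bit, while a `server_port` feature that is always 443 has a gain of 0, so it ranks last.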

    Finally, we have demonstrated that known malware can be attributed solely by analyzing network traffic, without decrypting the TLS connection.

    Malware-family attribution reached an accuracy of 90.3% when limited to a single encrypted connection, and 93.2% when all available encrypted connections within a five-minute analysis window were analyzed. The first five minutes of activity of the known malware samples were captured with the Cisco ThreatGrid dynamic analysis system. Tens of thousands of unique malware samples were collected and hundreds of thousands of malicious encrypted connections were analyzed. For comparison with the telemetry generated by malicious connections, telemetry was also collected on millions of encrypted TLS connections in enterprise networks.
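
    The higher accuracy for a five-minute window suggests the value of combining evidence across connections. The exact aggregation method is not described here; the sketch below simply assumes that a per-connection classifier already outputs family probabilities, and combines them per host and per 300-second window by summing log-probabilities (a naive-Bayes-style independence assumption). The function name and input format are hypothetical.

```python
import math
from collections import defaultdict

WINDOW_SECONDS = 300  # five-minute analysis window

def attribute_windows(connections) -> dict:
    """connections: iterable of (timestamp, host, {family: probability}) tuples.

    Assumes every probability dict covers the same set of families.
    Returns {(host, window_index): most likely family} for each window.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for ts, host, probs in connections:
        window = int(ts // WINDOW_SECONDS)
        for family, p in probs.items():
            # Clamp to avoid log(0) when a classifier outputs a hard zero
            scores[(host, window)][family] += math.log(max(p, 1e-12))
    return {key: max(fam_scores, key=fam_scores.get)
            for key, fam_scores in scores.items()}
```

    In this scheme a single ambiguous connection can be outvoted: if one connection slightly favors family B but two later connections in the same window strongly favor A, the window is attributed to A.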

    An open-source toolkit, the Joy project, was developed for efficiently collecting network telemetry, preprocessing it and converting it to JSON. It collects all the information needed for the analysis: source and destination IP addresses and ports, protocols, the lengths and inter-arrival times of transmitted packets, the byte distribution and its entropy, and the unencrypted information exchanged while establishing the TLS connection. All analysis is performed at the network level only, with no need to install agents on endpoints.
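
    Two of the features listed above are straightforward to compute from raw packets. The sketch below derives a 256-bin byte distribution with its Shannon entropy (in bits per byte) and a sequence of packet lengths and inter-arrival times. It is a simplified illustration, not Joy's implementation or its JSON output format; the cut-off of 50 packets is an arbitrary assumption.

```python
import math

def byte_distribution(payloads) -> list:
    """Count occurrences of each byte value (0..255) across all payloads."""
    counts = [0] * 256
    for payload in payloads:
        for byte in payload:
            counts[byte] += 1
    return counts

def byte_entropy(counts) -> float:
    """Shannon entropy of a byte distribution, in bits per byte (0.0 to 8.0)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def packet_lengths_and_times(packets, max_packets=50):
    """packets: (timestamp_seconds, payload_length) pairs in arrival order.

    Returns (lengths, inter-arrival gaps in milliseconds), both truncated
    to max_packets; the first gap is 0 by definition.
    """
    pkts = packets[:max_packets]
    lengths = [length for _, length in pkts]
    gaps_ms = [0.0 if i == 0 else (pkts[i][0] - pkts[i - 1][0]) * 1000.0
               for i in range(len(pkts))]
    return lengths, gaps_ms
```

    Well-encrypted payloads tend toward the 8-bit maximum entropy, so the byte distribution is most informative for the cleartext parts of a session and for distinguishing unusual payload encodings.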

    Additional Resources:
    Download the official Cisco Encrypted Traffic Analytics report
